[Ray] Initial import (#872783).

Sebastien Boisvert sebhtml at fedoraproject.org
Mon Jul 15 16:34:03 UTC 2013


commit d550d9ccb0dac44d37808464d0c4cf25c627c4dc
Author: Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca>
Date:   Mon Jul 15 12:32:14 2013 -0400

    Initial import (#872783).

 Ray.manpage.patch |  410 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 Ray.spec          |  194 +++++++++++++++++++++++++
 2 files changed, 604 insertions(+), 0 deletions(-)
---
diff --git a/Ray.manpage.patch b/Ray.manpage.patch
new file mode 100644
index 0000000..35df432
--- /dev/null
+++ b/Ray.manpage.patch
@@ -0,0 +1,410 @@
+--- /dev/null	2012-11-27 10:10:35.990752806 -0500
++++ Ray.1	2012-11-29 21:48:32.447898203 -0500
+@@ -0,0 +1,407 @@
++.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.40.12.
++.TH RAY "1" "November 2012" "Ray 2.1.0" "User Commands"
++
++.SH NAME
++Ray - assemble genomes in parallel using the message-passing interface
++.SH SYNOPSIS
++       mpiexec -n NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test
++
++       mpiexec -n NUMBER_OF_RANKS Ray Ray.conf # with commands in a file
++.SH DESCRIPTION
++
++  The Ray genome assembler is built on top of the RayPlatform, a generic plugin-based
++  distributed and parallel compute engine that uses the message-passing interface
++  for passing messages.
++
++  Ray targets several applications:
++
++    - de novo genome assembly (with Ray vanilla)
++    - de novo meta-genome assembly (with Ray Méta)
++    - de novo transcriptome assembly (works, but not tested a lot)
++    - quantification of contig abundances
++    - quantification of microbiome consortia members (with Ray Communities)
++    - quantification of transcript expression
++    - taxonomy profiling of samples (with Ray Communities)
++    - gene ontology profiling of samples (with Ray Ontologies)
++
++.SH OPTIONS
++
++       -help
++              Displays this help page.
++
++       -version
++              Displays Ray version and compilation options.
++
++  Using a configuration file
++
++    Ray can be launched with
++    mpiexec -n 16 Ray Ray.conf
++    The configuration file can include comments (starting with #).
++
++  K-mer length
++
++       -k kmerLength
++              Selects the length of k-mers. The default value is 21. 
++              It must be odd because reverse-complement vertices are stored together.
++              The maximum length is defined at compilation by MAXKMERLENGTH.
++              Larger k-mers utilise more memory.
++
++  Inputs
++
++       -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
++              Provides two files containing paired-end reads.
++              averageOuterDistance and standardDeviation are automatically computed if not provided.
++
++       -i interleavedSequenceFile [averageOuterDistance standardDeviation]
++              Provides one file containing interleaved paired-end reads.
++              averageOuterDistance and standardDeviation are automatically computed if not provided.
++
++       -s sequenceFile
++              Provides a file containing single-end reads.
++
++  Outputs
++
++       -o outputDirectory
++              Specifies the directory for output files. The default is RayOutput.
++
++  Assembly options (defaults work well)
++
++       -disable-recycling
++              Disables read recycling during the assembly.
++              Reads will be set free in 3 cases:
++              1. the distance did not match for a pair
++              2. the read has not met its mate
++              3. the library population indicates a wrong placement
++              see Constrained traversal of repeats with paired sequences.
++              Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil.
++              First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing, March 26-27 2011, Vancouver, BC, Canada.
++
++       -disable-scaffolder
++              Disables the scaffolder.
++
++       -minimum-contig-length minimumContigLength
++              Changes the minimum contig length, default is 100 nucleotides
++
++       -color-space
++              Runs in color-space
++              Needs csfasta files. Activated automatically if csfasta files are provided.
++
++       -use-maximum-seed-coverage maximumSeedCoverageDepth
++              Ignores any seed with a coverage depth above this threshold.
++              The default is 4294967295.
++
++       -use-minimum-seed-coverage minimumSeedCoverageDepth
++              Sets the minimum seed coverage depth.
++              Any path with a coverage depth lower than this will be discarded. The default is 0.
++
++  Distributed storage engine (all these values are for each MPI rank)
++
++       -bloom-filter-bits bits
++              Sets the number of bits for the Bloom filter
++              Default is 268435456 bits, 0 bits disables the Bloom filter.
++
++       -hash-table-buckets buckets
++              Sets the initial number of buckets. Must be a power of 2.
++              Default value: 268435456
++
++       -hash-table-buckets-per-group buckets
++              Sets the number of buckets per group for sparse storage
++              Default value: 64, must be >= 1 and <= 64
++
++       -hash-table-load-factor-threshold threshold
++              Sets the load factor threshold for real-time resizing
++              Default value: 0.75, must be >= 0.5 and < 1
++
++       -hash-table-verbosity
++              Activates verbosity for the distributed storage engine
++
++  Biological abundances
++
++       -search searchDirectory
++              Provides a directory containing fasta files to be searched in the de Bruijn graph.
++              Biological abundances will be written to RayOutput/BiologicalAbundances
++              See Documentation/BiologicalAbundances.txt
++
++       -one-color-per-file
++              Sets one color per file instead of one per sequence.
++              By default, each sequence in each file has a different color.
++              For files with large numbers of sequences, using one single color per file may be more efficient.
++
++  Taxonomic profiling with colored de Bruijn graphs
++
++       -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
++              Provides a taxonomy.
++              Computes and writes detailed taxonomic profiles.
++              See Documentation/Taxonomy.txt for details.
++
++       -gene-ontology OntologyTerms.txt  Annotations.txt
++              Provides an ontology and annotations.
++              OntologyTerms.txt is fetched from http://geneontology.org
++              Annotations.txt is a 2-column file (EMBL_CDS handle	&	gene ontology identifier)
++              See Documentation/GeneOntology.txt
++  Other outputs
++
++       -enable-neighbourhoods
++              Computes contig neighborhoods in the de Bruijn graph
++              Output file: RayOutput/NeighbourhoodRelations.txt
++
++       -amos
++              Writes the AMOS file called RayOutput/AMOS.afg
++              An AMOS file contains read positions on contigs.
++              Can be opened with software that has a graphical user interface.
++
++       -write-kmers
++              Writes k-mer graph to RayOutput/kmers.txt
++              The resulting file is not utilised by Ray.
++              The resulting file is very large.
++
++       -write-read-markers
++              Writes read markers to disk.
++
++       -write-seeds
++              Writes seed DNA sequences to RayOutput/Rank<rank>.RaySeeds.fasta
++
++       -write-extensions
++              Writes extension DNA sequences to RayOutput/Rank<rank>.RayExtensions.fasta
++
++       -write-contig-paths
++              Writes contig paths with coverage values
++              to RayOutput/Rank<rank>.RayContigPaths.txt
++
++       -write-marker-summary
++              Writes marker statistics.
++
++  Memory usage
++
++       -show-memory-usage
++              Shows memory usage. Data is fetched from /proc on GNU/Linux
++              Needs __linux__
++
++       -show-memory-allocations
++              Shows memory allocation events
++
++  Algorithm verbosity
++
++       -show-extension-choice
++              Shows the choice made (with other choices) during the extension.
++
++       -show-ending-context
++              Shows the ending context of each extension.
++              Shows the children of the vertex where extension was too difficult.
++
++       -show-distance-summary
++              Shows summary of outer distances used for an extension path.
++
++       -show-consensus
++              Shows the consensus when a choice is made.
++
++  Checkpointing
++
++       -write-checkpoints checkpointDirectory
++              Write checkpoint files
++
++       -read-checkpoints checkpointDirectory
++              Read checkpoint files
++
++       -read-write-checkpoints checkpointDirectory
++              Read and write checkpoint files
++
++  Message routing for large number of cores
++
++       -route-messages
++              Enables the Ray message router. Disabled by default.
++              Messages will be routed accordingly so that any rank can communicate directly with only a few others.
++              Without -route-messages, any rank can communicate directly with any other rank.
++              Files generated: Routing/Connections.txt, Routing/Routes.txt and Routing/RelayEvents.txt
++              and Routing/Summary.txt
++
++       -connection-type type
++              Sets the connection type for routes.
++              Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
++               debruijn: a full de Bruijn graph for a given alphabet and diameter
++               hypercube: a hypercube, alphabet is {0,1} and the number of vertices is a power of 2
++               polytope: a convex regular polytope, alphabet is {0,1,...,B-1} and the number of vertices is a power of B
++               group: silly model where one representative per group can communicate with outsiders
++               random: Erdős-Rényi model
++               kautz: a full Kautz graph, which is a subgraph of a de Bruijn graph
++               complete: a full graph with all the possible connections
++              With the type debruijn, the number of ranks must be an integer power of the alphabet size.
++              Examples: 256 = 16*16, 512 = 8*8*8, 49 = 7*7, and so on.
++              Otherwise, use another connection type instead of debruijn.
++              With the type kautz, the number of ranks n must be n=(k+1)*k^(d-1) for some k and d
++
++       -routing-graph-degree degree
++              Specifies the outgoing degree for the routing graph.
++              See Documentation/Routing.txt
++
++  Hardware testing
++
++       -test-network-only
++              Tests the network and returns.
++
++       -write-network-test-raw-data
++              Writes one additional file per rank detailing the network test.
++
++       -exchanges NumberOfExchanges
++              Sets the number of exchanges
++
++       -disable-network-test
++              Skips the network test.
++
++  Debugging
++
++       -verify-message-integrity
++              Checks message data reliability for any non-empty message.
++              add '-D CONFIG_SSE_4_2' in the Makefile to use hardware instruction (SSE 4.2)
++
++       -run-profiler
++              Runs the profiler as the code runs. By default, only show granularity warnings.
++              Running the profiler increases running times.
++
++       -with-profiler-details
++              Shows the number of messages sent and received in each method during each time slice (epoch). Needs -run-profiler.
++
++       -show-communication-events
++              Shows all messages sent and received.
++
++       -show-read-placement
++              Shows read placement in the graph during the extension.
++
++       -debug-bubbles
++              Debugs bubble code.
++              Bubbles can be due to heterozygous sites or sequencing errors or other (unknown) events
++
++       -debug-seeds
++              Debugs seed code.
++              Seeds are paths in the graph that are likely unique.
++
++       -debug-fusions
++              Debugs fusion code.
++
++       -debug-scaffolder
++              Debugs the scaffolder.
++.SH FILES
++
++  Input files
++
++     Note: file format is determined with file extension.
++
++     .fasta
++     .fasta.gz (needs HAVE_LIBZ=y at compilation)
++     .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
++     .fastq
++     .fastq.gz (needs HAVE_LIBZ=y at compilation)
++     .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
++     .sff (paired reads must be extracted manually)
++     .csfasta (color-space reads)
++
++  Output files
++
++  Scaffolds
++
++     RayOutput/Scaffolds.fasta
++     	The scaffold sequences in FASTA format
++     RayOutput/ScaffoldComponents.txt
++     	The components of each scaffold
++     RayOutput/ScaffoldLengths.txt
++     	The length of each scaffold
++     RayOutput/ScaffoldLinks.txt
++     	Scaffold links
++
++  Contigs
++
++     RayOutput/Contigs.fasta
++     	Contiguous sequences in FASTA format
++     RayOutput/ContigLengths.txt
++     	The lengths of contiguous sequences
++
++  Summary
++
++     RayOutput/OutputNumbers.txt
++     	Overall numbers for the assembly
++
++  de Bruijn graph
++
++     RayOutput/CoverageDistribution.txt
++     	The distribution of coverage values
++     RayOutput/CoverageDistributionAnalysis.txt
++     	Analysis of the coverage distribution
++     RayOutput/degreeDistribution.txt
++     	Distribution of ingoing and outgoing degrees
++     RayOutput/kmers.txt
++     	k-mer graph, required option: -write-kmers
++         The resulting file is not utilised by Ray.
++         The resulting file is very large.
++
++  Assembly steps
++
++     RayOutput/SeedLengthDistribution.txt
++         Distribution of seed length
++     RayOutput/Rank<rank>.OptimalReadMarkers.txt
++         Read markers.
++     RayOutput/Rank<rank>.RaySeeds.fasta
++         Seed DNA sequences, required option: -write-seeds
++     RayOutput/Rank<rank>.RayExtensions.fasta
++         Extension DNA sequences, required option: -write-extensions
++     RayOutput/Rank<rank>.RayContigPaths.txt
++         Contig paths with coverage values, required option: -write-contig-paths
++
++  Paired reads
++
++     RayOutput/LibraryStatistics.txt
++     	Estimation of outer distances for paired reads
++     RayOutput/Library<LibraryNumber>.txt
++         Frequencies for observed outer distances (insert size + read lengths)
++
++  Partition
++
++     RayOutput/NumberOfSequences.txt
++         Number of reads in each file
++     RayOutput/SequencePartition.txt
++     	Sequence partition
++
++  Ray software
++
++     RayOutput/RayVersion.txt
++     	The version of Ray
++     RayOutput/RayCommand.txt
++     	The exact command that was provided
++
++  AMOS
++
++     RayOutput/AMOS.afg
++     	Assembly representation in AMOS format, required option: -amos
++
++  Communication
++
++     RayOutput/MessagePassingInterface.txt
++	    	Number of messages sent
++     RayOutput/NetworkTest.txt
++	    	Latencies in microseconds
++     RayOutput/Rank<rank>NetworkTestData.txt
++	    	Network test raw data
++.SH DOCUMENTATION
++
++       - mpiexec -n 1 Ray -help|less (always up-to-date)
++       - This help page (always up-to-date)
++       - The directory Documentation/
++       - Manual (Portable Document Format): InstructionManual.tex (in Documentation)
++       - Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users
++.SH AUTHOR
++       Written by Sébastien Boisvert.
++.SH "REPORTING BUGS"
++       Report bugs to denovoassembler-users at lists.sourceforge.net
++       Home page: <http://denovoassembler.sourceforge.net/>
++.SH COPYRIGHT
++       This program is free software: you can redistribute it and/or modify
++       it under the terms of the GNU General Public License as published by
++       the Free Software Foundation, version 3 of the License.
++
++       This program is distributed in the hope that it will be useful,
++       but WITHOUT ANY WARRANTY; without even the implied warranty of
++       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++       GNU General Public License for more details.
++
++       You have received a copy of the GNU General Public License
++       along with this program (see LICENSE).
++
diff --git a/Ray.spec b/Ray.spec
new file mode 100644
index 0000000..a58b422
--- /dev/null
+++ b/Ray.spec
@@ -0,0 +1,194 @@
+Name:           Ray
+Version:        2.1.0
+Release:        5%{?dist}
+Summary:        Parallel genome assemblies for parallel DNA sequencing
+
+Group:          Applications/Engineering
+License:        GPLv3
+URL:            http://denovoassembler.sourceforge.net/
+Source0:        http://downloads.sourceforge.net/denovoassembler/%{name}-v%{version}.tar.bz2
+Patch0:         Ray.manpage.patch
+
+BuildRequires:  openmpi-devel, bzip2-devel, zlib-devel, mpich2-devel
+
+%description
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+Included:
+ - %{name} de novo assembly of single genomes
+ - %{name} Méta de novo assembly of metagenomes
+ - %{name} Communities microbe abundance + taxonomic profiling
+ - %{name} Ontologies gene ontology profiling
+
+%package common
+Summary:        Parallel genome assemblies for parallel DNA sequencing
+Group:          Applications/Engineering
+
+%description common
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+This sub-package contains common files for Ray.
+
+%package openmpi
+Summary:        %{name} package for Open-MPI
+Group:          Applications/Engineering
+Requires:       openmpi, %{name}-common
+
+%description openmpi
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+This sub-package enables parallel computation using openmpi.
+
+%package mpich2
+Summary:        %{name} package for MPICH2
+Group:          Applications/Engineering
+Requires:       mpich2, %{name}-common
+
+%description mpich2
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+This sub-package enables parallel computation using mpich2.
+
+%package doc
+Summary:        Documentation files
+Group:          Documentation
+Requires:       %{name}-common
+
+%description doc
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+This sub-package includes documentation files.
+
+%package extra
+Summary:        Scripts and XSL sheets for post-processing
+Group:          Applications/Engineering
+Requires:       python, R, %{name}-common
+
+%description extra
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected 
+computers using the message-passing interface (MPI) standard.
+This sub-package contains scripts and XSL sheets for post-processing.
+
+%prep
+%setup -q -n %{name}-v%{version}
+%patch0
+
+%build
+CXXFLAGS="%{optflags} -D MAXKMERLENGTH=32 -D HAVE_LIBZ -D HAVE_LIBBZ2 -D "
+CXXFLAGS+="RAY_VERSION=\\\\\\\"2.1.0\\\\\\\" "
+CXXFLAGS+="-D RAYPLATFORM_VERSION=\\\\\\\"1.1.0\\\\\\\" -I . -I ../%{name}Platform"
+
+%{_openmpi_load}
+make CXXFLAGS="$CXXFLAGS" HAVE_LIBBZ2=y HAVE_LIBZ=y
+cp %{name} %{name}$MPI_SUFFIX
+
+cp README.md README
+cp %{name}Platform/README README.%{name}Platform
+cp %{name}Platform/AUTHORS AUTHORS.%{name}Platform
+
+make clean
+%{_openmpi_unload}
+
+%{_mpich2_load}
+make CXXFLAGS="$CXXFLAGS" HAVE_LIBBZ2=y HAVE_LIBZ=y
+cp %{name} %{name}$MPI_SUFFIX
+make clean
+%{_mpich2_unload}
+
+%install
+rm -rf %{buildroot}
+
+# Ray-common
+mkdir -p %{buildroot}%{_mandir}/man1
+install -m 0644 %{name}.1 %{buildroot}%{_mandir}/man1/%{name}.1
+
+# Ray-openmpi
+%{_openmpi_load}
+mkdir -p %{buildroot}$MPI_BIN
+install -m 0755 %{name}$MPI_SUFFIX %{buildroot}$MPI_BIN
+%{_openmpi_unload}
+
+# Ray-mpich2
+%{_mpich2_load}
+mkdir -p %{buildroot}$MPI_BIN
+install -m 0755 %{name}$MPI_SUFFIX %{buildroot}$MPI_BIN
+%{_mpich2_unload}
+
+# Ray-doc
+mkdir doc
+cp -ar %{name}Platform/Documentation/ doc/%{name}Platform
+chmod 644 doc/%{name}Platform/*
+chmod 644 Documentation/*
+
+# Ray-extra
+mkdir -p %{buildroot}%{_datadir}/%{name}
+cp -r scripts %{buildroot}%{_datadir}/%{name}
+chmod 0755 %{buildroot}%{_datadir}/%{name}/scripts
+
+%clean
+rm -rf %{buildroot}
+
+%files common
+%doc MANUAL_PAGE.txt gpl-3.0.txt LICENSE.txt 
+%doc %{name}Platform/lgpl-3.0.txt
+%doc AUTHORS AUTHORS.%{name}Platform
+%doc README README.%{name}Platform
+%{_mandir}/man1/%{name}.1*
+
+%files openmpi
+%{_libdir}/openmpi/bin/%{name}*
+
+%files mpich2
+%{_libdir}/mpich2/bin/%{name}*
+
+%files doc
+%doc Documentation/*
+%doc doc/%{name}Platform/
+
+%files extra
+%{_datadir}/%{name}/
+
+%changelog
+
+* Fri Nov 29 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-5
+- Added a patch for the man page
+
+* Fri Nov 5 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-4
+- The man page encoding is en_US.UTF-8
+- Added more specific descriptions
+
+* Fri Nov 4 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-3
+- Changed the package name from ray to Ray
+- Renamed README.md to README
+- Added AUTHORS, README.RayPlatform, AUTHORS.RayPlatform
+
+* Fri Nov 4 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-2
+- Added build dependency help2man 
+- Added OMPI_MCA_orte_rsh_agent to pass mock builds
+
+* Fri Nov 3 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-1
+
+- The Spec file was (informally) reviewed by Jussi Lehtola
+- Moved sub-package declarations to the top
+- Added sub-packages common, openmpi, mpich2
+- Removed useless '/' after buildroot
+- Fixed the packaging of Documentation
+- Removed symbols that are not U.S. American English from man page
+- Added Fedora compilation flags (optflags)
+- The Spec file was (informally) reviewed a second time by Jussi Lehtola
+- CXXFLAGS was shortened
+- Replacement of non-ASCII symbols is more compact with sed
+- ray-extra now ships _datadir/ray/ instead of _datadir/ray/scripts/.
+- This is the initial Ray package for Fedora
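
Note on the documented parameter constraints: the man page added by this commit states several numeric rules (odd k-mer length, power-of-2 bucket count, load factor in [0.5, 1), and rank counts for the debruijn and kautz connection types). The following is a minimal Python sketch for checking a planned command line against those rules before submitting a job. It is not part of Ray; every function name is hypothetical, and the lower bounds on the base, k and d (at least 2) are assumptions, since the man page only says "for some k and d".

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of 2 (rule for -hash-table-buckets)."""
    return n > 0 and (n & (n - 1)) == 0


def is_perfect_power(n: int) -> bool:
    """True if n == b**d for integers b >= 2 and d >= 2.

    Assumed reading of the debruijn rule; matches the man page
    examples 256 = 16*16, 512 = 8*8*8 and 49 = 7*7.
    """
    b = 2
    while b * b <= n:
        p = b * b
        while p < n:
            p *= b
        if p == n:
            return True
        b += 1
    return False


def is_kautz_size(n: int) -> bool:
    """True if n == (k + 1) * k**(d - 1) with k >= 2 and d >= 2 (assumed bounds)."""
    k = 2
    while (k + 1) * k <= n:
        size = (k + 1) * k  # d == 2
        while size < n:
            size *= k  # increase d by one
        if size == n:
            return True
        k += 1
    return False


def check_parameters(kmer_length, ranks, connection_type,
                     buckets=268435456, load_factor=0.75):
    """Return a list of human-readable problems; an empty list means no rule is violated."""
    problems = []
    if kmer_length % 2 == 0:
        problems.append("k-mer length must be odd")
    if not is_power_of_two(buckets):
        problems.append("hash table buckets must be a power of 2")
    if not (0.5 <= load_factor < 1):
        problems.append("load factor threshold must be >= 0.5 and < 1")
    if connection_type == "debruijn" and not is_perfect_power(ranks):
        problems.append("debruijn routing needs a rank count of the form b**d")
    if connection_type == "kautz" and not is_kautz_size(ranks):
        problems.append("kautz routing needs n = (k+1)*k**(d-1) ranks")
    return problems
```

For instance, check_parameters(21, 256, "debruijn") reports no problems, while check_parameters(22, 48, "debruijn") flags both the even k-mer length and the rank count.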

