[Ray] Initial import (#872783).
Sebastien Boisvert
sebhtml at fedoraproject.org
Mon Jul 15 16:34:03 UTC 2013
commit d550d9ccb0dac44d37808464d0c4cf25c627c4dc
Author: Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca>
Date: Mon Jul 15 12:32:14 2013 -0400
Initial import (#872783).
Ray.manpage.patch | 410 +++++++++++++++++++++++++++++++++++++++++++++++++++++
Ray.spec | 194 +++++++++++++++++++++++++
2 files changed, 604 insertions(+), 0 deletions(-)
---
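Note on the -k option documented in the man page below: the k-mer length
must be odd, and it cannot exceed the compile-time MAXKMERLENGTH, which the
Ray.spec in this commit sets to 32. A minimal sketch of a pre-flight check
that a wrapper script could run before calling mpiexec follows. The function
name is hypothetical and not part of Ray, and treating MAXKMERLENGTH as an
inclusive bound is an assumption.

```python
# Illustrative pre-flight check; this helper is hypothetical and not part of Ray.
MAX_KMER_LENGTH = 32      # value passed as -D MAXKMERLENGTH in this commit's Ray.spec
DEFAULT_KMER_LENGTH = 21  # default documented for the -k option


def check_kmer_length(k, max_k=MAX_KMER_LENGTH):
    """Return a list of problems with k; an empty list means k looks acceptable."""
    problems = []
    if k < 1:
        problems.append("k must be positive")
    if k % 2 == 0:
        # The man page requires an odd k because reverse-complement
        # vertices are stored together.
        problems.append("k must be odd")
    if k > max_k:
        # Assumption: the compile-time limit is inclusive.
        problems.append(f"k exceeds MAXKMERLENGTH ({max_k})")
    return problems
```

With this build, check_kmer_length(31) reports no problem, while 32 is
rejected for being even and 33 for exceeding the limit.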
diff --git a/Ray.manpage.patch b/Ray.manpage.patch
new file mode 100644
index 0000000..35df432
--- /dev/null
+++ b/Ray.manpage.patch
@@ -0,0 +1,410 @@
+--- /dev/null 2012-11-27 10:10:35.990752806 -0500
++++ Ray.1 2012-11-29 21:48:32.447898203 -0500
+@@ -0,0 +1,407 @@
++.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.40.12.
++.TH RAY "1" "November 2012" "Ray 2.1.0" "User Commands"
++
++.SH NAME
++Ray - assemble genomes in parallel using the message-passing interface
++.SH SYNOPSIS
++ mpiexec -n NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test
++
++ mpiexec -n NUMBER_OF_RANKS Ray Ray.conf # with commands in a file
++.SH DESCRIPTION
++
++ The Ray genome assembler is built on top of the RayPlatform, a generic plugin-based
++ distributed and parallel compute engine that communicates through the
++ message-passing interface (MPI).
++
++ Ray targets several applications:
++
++ - de novo genome assembly (with Ray vanilla)
++ - de novo meta-genome assembly (with Ray Méta)
++ - de novo transcriptome assembly (works, but not tested a lot)
++ - quantification of contig abundances
++ - quantification of microbiome consortia members (with Ray Communities)
++ - quantification of transcript expression
++ - taxonomy profiling of samples (with Ray Communities)
++ - gene ontology profiling of samples (with Ray Ontologies)
++
++.SH OPTIONS
++
++ -help
++ Displays this help page.
++
++ -version
++ Displays Ray version and compilation options.
++
++ Using a configuration file
++
++ Ray can be launched with
++ mpiexec -n 16 Ray Ray.conf
++ The configuration file can include comments (starting with #).
++
++ K-mer length
++
++ -k kmerLength
++ Selects the length of k-mers. The default value is 21.
++ It must be odd because reverse-complement vertices are stored together.
++ The maximum length is defined at compilation by MAXKMERLENGTH.
++ Larger k-mers utilise more memory.
++
++ Inputs
++
++ -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
++ Provides two files containing paired-end reads.
++ averageOuterDistance and standardDeviation are automatically computed if not provided.
++
++ -i interleavedSequenceFile [averageOuterDistance standardDeviation]
++ Provides one file containing interleaved paired-end reads.
++ averageOuterDistance and standardDeviation are automatically computed if not provided.
++
++ -s sequenceFile
++ Provides a file containing single-end reads.
++
++ Outputs
++
++ -o outputDirectory
++ Specifies the directory for output files. The default is RayOutput.
++
++ Assembly options (defaults work well)
++
++ -disable-recycling
++ Disables read recycling during the assembly.
++ Reads will be set free in 3 cases:
++ 1. the distance did not match for a pair
++ 2. the read has not met its mate
++ 3. the library population indicates a wrong placement
++ See Constrained traversal of repeats with paired sequences.
++ Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil.
++ First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing, March 26-27 2011, Vancouver, BC, Canada.
++
++ -disable-scaffolder
++ Disables the scaffolder.
++
++ -minimum-contig-length minimumContigLength
++ Changes the minimum contig length. The default is 100 nucleotides.
++
++ -color-space
++ Runs in color-space.
++ Needs csfasta files. Activated automatically if csfasta files are provided.
++
++ -use-maximum-seed-coverage maximumSeedCoverageDepth
++ Ignores any seed with a coverage depth above this threshold.
++ The default is 4294967295.
++
++ -use-minimum-seed-coverage minimumSeedCoverageDepth
++ Sets the minimum seed coverage depth.
++ Any path with a coverage depth lower than this will be discarded. The default is 0.
++
++ Distributed storage engine (all these values are for each MPI rank)
++
++ -bloom-filter-bits bits
++ Sets the number of bits for the Bloom filter.
++ The default is 268435456 bits; 0 bits disables the Bloom filter.
++
++ -hash-table-buckets buckets
++ Sets the initial number of buckets. Must be a power of 2.
++ Default value: 268435456
++
++ -hash-table-buckets-per-group buckets
++ Sets the number of buckets per group for sparse storage.
++ Default value: 64. Must be >= 1 and <= 64.
++
++ -hash-table-load-factor-threshold threshold
++ Sets the load factor threshold for real-time resizing.
++ Default value: 0.75. Must be >= 0.5 and < 1.
++
++ -hash-table-verbosity
++ Activates verbosity for the distributed storage engine
++
++ Biological abundances
++
++ -search searchDirectory
++ Provides a directory containing fasta files to be searched in the de Bruijn graph.
++ Biological abundances will be written to RayOutput/BiologicalAbundances
++ See Documentation/BiologicalAbundances.txt
++
++ -one-color-per-file
++ Sets one color per file instead of one per sequence.
++ By default, each sequence in each file has a different color.
++ For files with large numbers of sequences, using one single color per file may be more efficient.
++
++ Taxonomic profiling with colored de Bruijn graphs
++
++ -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
++ Provides a taxonomy.
++ Computes and writes detailed taxonomic profiles.
++ See Documentation/Taxonomy.txt for details.
++
++ -gene-ontology OntologyTerms.txt Annotations.txt
++ Provides an ontology and annotations.
++ OntologyTerms.txt is fetched from http://geneontology.org
++ Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier)
++ See Documentation/GeneOntology.txt
++ Other outputs
++
++ -enable-neighbourhoods
++ Computes contig neighborhoods in the de Bruijn graph
++ Output file: RayOutput/NeighbourhoodRelations.txt
++
++ -amos
++ Writes the AMOS file called RayOutput/AMOS.afg
++ An AMOS file contains read positions on contigs.
++ Can be opened with software that has a graphical user interface.
++
++ -write-kmers
++ Writes k-mer graph to RayOutput/kmers.txt
++ The resulting file is not utilised by Ray.
++ The resulting file is very large.
++
++ -write-read-markers
++ Writes read markers to disk.
++
++ -write-seeds
++ Writes seed DNA sequences to RayOutput/Rank<rank>.RaySeeds.fasta
++
++ -write-extensions
++ Writes extension DNA sequences to RayOutput/Rank<rank>.RayExtensions.fasta
++
++ -write-contig-paths
++ Writes contig paths with coverage values
++ to RayOutput/Rank<rank>.RayContigPaths.txt
++
++ -write-marker-summary
++ Writes marker statistics.
++
++ Memory usage
++
++ -show-memory-usage
++ Shows memory usage. Data is fetched from /proc on GNU/Linux
++ Needs __linux__
++
++ -show-memory-allocations
++ Shows memory allocation events
++
++ Algorithm verbosity
++
++ -show-extension-choice
++ Shows the choice made (with other choices) during the extension.
++
++ -show-ending-context
++ Shows the ending context of each extension.
++ Shows the children of the vertex where extension was too difficult.
++
++ -show-distance-summary
++ Shows summary of outer distances used for an extension path.
++
++ -show-consensus
++ Shows the consensus when a choice is made.
++
++ Checkpointing
++
++ -write-checkpoints checkpointDirectory
++ Writes checkpoint files.
++
++ -read-checkpoints checkpointDirectory
++ Reads checkpoint files.
++
++ -read-write-checkpoints checkpointDirectory
++ Reads and writes checkpoint files.
++
++ Message routing for large number of cores
++
++ -route-messages
++ Enables the Ray message router. Disabled by default.
++ Messages will be routed accordingly so that any rank can communicate directly with only a few others.
++ Without -route-messages, any rank can communicate directly with any other rank.
++ Files generated: Routing/Connections.txt, Routing/Routes.txt and Routing/RelayEvents.txt
++ and Routing/Summary.txt
++
++ -connection-type type
++ Sets the connection type for routes.
++ Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
++ debruijn: a full de Bruijn graph for a given alphabet and diameter
++ hypercube: a hypercube, alphabet is {0,1} and the number of vertices is a power of 2
++ polytope: a convex regular polytope, alphabet is {0,1,...,B-1} and the number of vertices is a power of B
++ group: silly model where one representative per group can communicate with outsiders
++ random: Erdős-Rényi model
++ kautz: a full Kautz graph, which is a subgraph of a de Bruijn graph
++ complete: a full graph with all the possible connections
++ With the type debruijn, the number of ranks must be a power of some integer base.
++ Examples: 256 = 16*16, 512=8*8*8, 49=7*7, and so on.
++ Otherwise, use another connection type instead of debruijn.
++ With the type kautz, the number of ranks n must be n=(k+1)*k^(d-1) for some k and d.
++
++ -routing-graph-degree degree
++ Specifies the outgoing degree for the routing graph.
++ See Documentation/Routing.txt
++
++ Hardware testing
++
++ -test-network-only
++ Tests the network and returns.
++
++ -write-network-test-raw-data
++ Writes one additional file per rank detailing the network test.
++
++ -exchanges NumberOfExchanges
++ Sets the number of exchanges
++
++ -disable-network-test
++ Skips the network test.
++
++ Debugging
++
++ -verify-message-integrity
++ Checks message data reliability for any non-empty message.
++ Add '-D CONFIG_SSE_4_2' in the Makefile to use the hardware instruction (SSE 4.2).
++
++ -run-profiler
++ Runs the profiler as the code runs. By default, only shows granularity warnings.
++ Running the profiler increases running times.
++
++ -with-profiler-details
++ Shows the number of messages sent and received in each method during each time slice (epoch). Needs -run-profiler.
++
++ -show-communication-events
++ Shows all messages sent and received.
++
++ -show-read-placement
++ Shows read placement in the graph during the extension.
++
++ -debug-bubbles
++ Debugs bubble code.
++ Bubbles can be due to heterozygous sites, sequencing errors or other (unknown) events.
++
++ -debug-seeds
++ Debugs seed code.
++ Seeds are paths in the graph that are likely unique.
++
++ -debug-fusions
++ Debugs fusion code.
++
++ -debug-scaffolder
++ Debugs the scaffolder.
++.SH FILES
++
++ Input files
++
++ Note: the file format is determined by the file extension.
++
++ .fasta
++ .fasta.gz (needs HAVE_LIBZ=y at compilation)
++ .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
++ .fastq
++ .fastq.gz (needs HAVE_LIBZ=y at compilation)
++ .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
++ .sff (paired reads must be extracted manually)
++ .csfasta (color-space reads)
++
++ Output files
++
++ Scaffolds
++
++ RayOutput/Scaffolds.fasta
++ The scaffold sequences in FASTA format
++ RayOutput/ScaffoldComponents.txt
++ The components of each scaffold
++ RayOutput/ScaffoldLengths.txt
++ The length of each scaffold
++ RayOutput/ScaffoldLinks.txt
++ Scaffold links
++
++ Contigs
++
++ RayOutput/Contigs.fasta
++ Contiguous sequences in FASTA format
++ RayOutput/ContigLengths.txt
++ The lengths of contiguous sequences
++
++ Summary
++
++ RayOutput/OutputNumbers.txt
++ Overall numbers for the assembly
++
++ de Bruijn graph
++
++ RayOutput/CoverageDistribution.txt
++ The distribution of coverage values
++ RayOutput/CoverageDistributionAnalysis.txt
++ Analysis of the coverage distribution
++ RayOutput/degreeDistribution.txt
++ Distribution of ingoing and outgoing degrees
++ RayOutput/kmers.txt
++ k-mer graph, required option: -write-kmers
++ The resulting file is not utilised by Ray.
++ The resulting file is very large.
++
++ Assembly steps
++
++ RayOutput/SeedLengthDistribution.txt
++ Distribution of seed length
++ RayOutput/Rank<rank>.OptimalReadMarkers.txt
++ Read markers.
++ RayOutput/Rank<rank>.RaySeeds.fasta
++ Seed DNA sequences, required option: -write-seeds
++ RayOutput/Rank<rank>.RayExtensions.fasta
++ Extension DNA sequences, required option: -write-extensions
++ RayOutput/Rank<rank>.RayContigPaths.txt
++ Contig paths with coverage values, required option: -write-contig-paths
++
++ Paired reads
++
++ RayOutput/LibraryStatistics.txt
++ Estimation of outer distances for paired reads
++ RayOutput/Library<LibraryNumber>.txt
++ Frequencies for observed outer distances (insert size + read lengths)
++
++ Partition
++
++ RayOutput/NumberOfSequences.txt
++ Number of reads in each file
++ RayOutput/SequencePartition.txt
++ Sequence partition
++
++ Ray software
++
++ RayOutput/RayVersion.txt
++ The version of Ray
++ RayOutput/RayCommand.txt
++ The exact command that was provided
++
++ AMOS
++
++ RayOutput/AMOS.afg
++ Assembly representation in AMOS format, required option: -amos
++
++ Communication
++
++ RayOutput/MessagePassingInterface.txt
++ Number of messages sent
++ RayOutput/NetworkTest.txt
++ Latencies in microseconds
++ RayOutput/Rank<rank>NetworkTestData.txt
++ Network test raw data
++.SH DOCUMENTATION
++
++ - mpiexec -n 1 Ray -help|less (always up-to-date)
++ - This help page (always up-to-date)
++ - The directory Documentation/
++ - Manual (Portable Document Format): InstructionManual.tex (in Documentation)
++ - Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users
++.SH AUTHOR
++ Written by Sébastien Boisvert.
++.SH "REPORTING BUGS"
++ Report bugs to denovoassembler-users at lists.sourceforge.net
++ Home page: <http://denovoassembler.sourceforge.net/>
++.SH COPYRIGHT
++ This program is free software: you can redistribute it and/or modify
++ it under the terms of the GNU General Public License as published by
++ the Free Software Foundation, version 3 of the License.
++
++ This program is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ GNU General Public License for more details.
++
++ You have received a copy of the GNU General Public License
++ along with this program (see LICENSE).
++
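Note on -connection-type in the man page above: debruijn routing needs a
rank count that is a power of some base (256 = 16*16, 512 = 8*8*8, 49 = 7*7),
and kautz routing needs n = (k+1)*k^(d-1). A minimal sketch for checking a
planned rank count before submitting a job follows. Both function names are
hypothetical and not part of Ray. Restricting the base (or k) and the
diameter d to values of at least 2 is an assumption made here so that
trivial cases are excluded.

```python
# Illustrative helpers; hypothetical and not part of Ray.

def debruijn_shapes(ranks):
    """Return every (base, diameter) with base**diameter == ranks.

    Assumption: base >= 2 and diameter >= 2.
    """
    shapes = []
    base = 2
    while base * base <= ranks:
        value, diameter = base * base, 2
        while value < ranks:
            value *= base
            diameter += 1
        if value == ranks:
            shapes.append((base, diameter))
        base += 1
    return shapes


def kautz_shapes(ranks):
    """Return every (k, d) with (k + 1) * k**(d - 1) == ranks.

    Assumption: k >= 2 and d >= 2.
    """
    shapes = []
    k = 2
    while (k + 1) * k <= ranks:
        value, d = (k + 1) * k, 2
        while value < ranks:
            value *= k
            d += 1
        if value == ranks:
            shapes.append((k, d))
        k += 1
    return shapes
```

An empty list from debruijn_shapes(n) suggests picking another connection
type for n ranks, as the man page recommends.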
diff --git a/Ray.spec b/Ray.spec
new file mode 100644
index 0000000..a58b422
--- /dev/null
+++ b/Ray.spec
@@ -0,0 +1,194 @@
+Name: Ray
+Version: 2.1.0
+Release: 5%{?dist}
+Summary: Parallel genome assemblies for parallel DNA sequencing
+
+Group: Applications/Engineering
+License: GPLv3
+URL: http://denovoassembler.sourceforge.net/
+Source0: http://downloads.sourceforge.net/denovoassembler/%{name}-v%{version}.tar.bz2
+Patch0: Ray.manpage.patch
+
+BuildRequires: openmpi-devel, bzip2-devel, zlib-devel, mpich2-devel
+
+%description
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+Included:
+ - %{name} de novo assembly of single genomes
+ - %{name} Méta de novo assembly of metagenomes
+ - %{name} Communities microbe abundance + taxonomic profiling
+ - %{name} Ontologies gene ontology profiling
+
+%package common
+Summary: Parallel genome assemblies for parallel DNA sequencing
+Group: Applications/Engineering
+
+%description common
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+This sub-package contains common files for Ray.
+
+%package openmpi
+Summary: %{name} package for Open-MPI
+Group: Applications/Engineering
+Requires: openmpi, %{name}-common
+
+%description openmpi
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+This sub-package enables parallel computation using openmpi.
+
+%package mpich2
+Summary: %{name} package for MPICH2
+Group: Applications/Engineering
+Requires: mpich2, %{name}-common
+
+%description mpich2
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+This sub-package enables parallel computation using mpich2.
+
+%package doc
+Summary: Documentation files
+Group: Documentation
+Requires: %{name}-common
+
+%description doc
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+This sub-package includes documentation files.
+
+%package extra
+Summary: Scripts and XSL sheets for post-processing
+Group: Applications/Engineering
+Requires: python, R, %{name}-common
+
+%description extra
+%{name} is parallel software that computes de novo genome assemblies with
+next-generation sequencing data.
+%{name} is written in C++ and can run in parallel on numerous interconnected
+computers using the message-passing interface (MPI) standard.
+This sub-package contains scripts and XSL sheets for post-processing.
+
+%prep
+%setup -q -n %{name}-v%{version}
+%patch0
+
+%build
+CXXFLAGS="%{optflags} -D MAXKMERLENGTH=32 -D HAVE_LIBZ -D HAVE_LIBBZ2 -D "
+CXXFLAGS+="RAY_VERSION=\\\\\\\"2.1.0\\\\\\\" "
+CXXFLAGS+="-D RAYPLATFORM_VERSION=\\\\\\\"1.1.0\\\\\\\" -I . -I ../%{name}Platform"
+
+%{_openmpi_load}
+make CXXFLAGS="$CXXFLAGS" HAVE_LIBBZ2=y HAVE_LIBZ=y
+cp %{name} %{name}$MPI_SUFFIX
+
+cp README.md README
+cp %{name}Platform/README README.%{name}Platform
+cp %{name}Platform/AUTHORS AUTHORS.%{name}Platform
+
+make clean
+%{_openmpi_unload}
+
+%{_mpich2_load}
+make CXXFLAGS="$CXXFLAGS" HAVE_LIBBZ2=y HAVE_LIBZ=y
+cp %{name} %{name}$MPI_SUFFIX
+make clean
+%{_mpich2_unload}
+
+%install
+rm -rf %{buildroot}
+
+# Ray-common
+mkdir -p %{buildroot}%{_mandir}/man1
+install -m 0644 %{name}.1 %{buildroot}%{_mandir}/man1/%{name}.1
+
+# Ray-openmpi
+%{_openmpi_load}
+mkdir -p %{buildroot}$MPI_BIN
+install -m 0755 %{name}$MPI_SUFFIX %{buildroot}$MPI_BIN
+%{_openmpi_unload}
+
+# Ray-mpich2
+%{_mpich2_load}
+mkdir -p %{buildroot}$MPI_BIN
+install -m 0755 %{name}$MPI_SUFFIX %{buildroot}$MPI_BIN
+%{_mpich2_unload}
+
+# Ray-doc
+mkdir doc
+cp -ar %{name}Platform/Documentation/ doc/%{name}Platform
+chmod 644 doc/%{name}Platform/*
+chmod 644 Documentation/*
+
+# Ray-extra
+mkdir -p %{buildroot}%{_datadir}/%{name}
+cp -r scripts %{buildroot}%{_datadir}/%{name}
+chmod 0755 %{buildroot}%{_datadir}/%{name}/scripts
+
+%clean
+rm -rf %{buildroot}
+
+%files common
+%doc MANUAL_PAGE.txt gpl-3.0.txt LICENSE.txt
+%doc %{name}Platform/lgpl-3.0.txt
+%doc AUTHORS AUTHORS.%{name}Platform
+%doc README README.%{name}Platform
+%{_mandir}/man1/%{name}.1*
+
+%files openmpi
+%{_libdir}/openmpi/bin/%{name}*
+
+%files mpich2
+%{_libdir}/mpich2/bin/%{name}*
+
+%files doc
+%doc Documentation/*
+%doc doc/%{name}Platform/
+
+%files extra
+%{_datadir}/%{name}/
+
+%changelog
+
+* Thu Nov 29 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-5
+- Added a patch for the man page
+
+* Mon Nov 5 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-4
+- The man page encoding is en_US.UTF-8
+- Added more specific descriptions
+
+* Sun Nov 4 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-3
+- Changed the package name from ray to Ray
+- Renamed README.md to README
+- Added AUTHORS, README.RayPlatform, AUTHORS.RayPlatform
+
+* Sun Nov 4 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-2
+- Added build dependency help2man
+- Added OMPI_MCA_orte_rsh_agent to pass mock builds
+
+* Sat Nov 3 2012 Sébastien Boisvert <sebastien.boisvert.3 at ulaval.ca> - 2.1.0-1
+
+- The Spec file was (informally) reviewed by Jussi Lehtola
+- Moved sub-package declarations to the top
+- Added sub-packages common, openmpi, mpich2
+- Removed useless '/' after buildroot
+- Fixed the packaging of Documentation
+- Removed symbols that are not U.S. American English from man page
+- Added Fedora compilation flags (optflags)
+- The Spec file was (informally) reviewed a second time by Jussi Lehtola
+- CXXFLAGS was shortened
+- Replacement of non-ASCII symbols is more compact with sed
+- ray-extra now ships _datadir/ray/ instead of _datadir/ray/scripts/.
+- This is the initial Ray package for Fedora
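Note on the distributed storage engine options documented in the man page
patch of this commit: -hash-table-buckets must be a power of 2,
-hash-table-buckets-per-group must lie between 1 and 64, and
-hash-table-load-factor-threshold must be >= 0.5 and < 1. A minimal sketch
of a check for these per-rank values follows. The function name and keyword
arguments are hypothetical and not part of Ray; the defaults are the ones
stated in the man page.

```python
# Illustrative check; this helper is hypothetical and not part of Ray.

def check_storage_options(buckets=268435456, buckets_per_group=64,
                          load_factor=0.75):
    """Return a list of problems; an empty list means the values look acceptable."""
    problems = []
    # A positive integer is a power of 2 when exactly one bit is set.
    if buckets < 1 or (buckets & (buckets - 1)) != 0:
        problems.append("-hash-table-buckets must be a power of 2")
    if not 1 <= buckets_per_group <= 64:
        problems.append("-hash-table-buckets-per-group must be >= 1 and <= 64")
    if not 0.5 <= load_factor < 1:
        problems.append(
            "-hash-table-load-factor-threshold must be >= 0.5 and < 1")
    return problems
```

The documented defaults pass this check, since 268435456 is 2^28.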