BOL: All Site Activity

- RAJESH DETROJA@rajeshdetroja
RAJESH DETROJA is now a friend with Anjana 3131 days ago
- Anjana@anjana
Anjana is now a friend with RAJESH DETROJA 3131 days ago
- Anjana@anjana
Anjana asked RNAseq dataset for different cancers stages !! 3131 days ago

Is there any database that provides RNA-seq data for different cancer stages?
- Anjana@anjana
Anjana joined the group R and Bioconductor 3131 days ago
- Anjana@anjana
Anjana posted to the wire 3131 days ago

Happy to join BOL bioinformatics network #Join #BOL #Bioinfo
- Anjana@anjana
Anjana has a new avatar 3131 days ago
- Rahul Nayak@rahul
Rahul Nayak commented on a bookmark R and Bioconductor Tutorial in the group R and Bioconductor 3133 days ago

Learn R by yourself: www.datasciencecentral.com/profiles/blogs/learning-r-in-seven-simple-steps
- Neel@neelam
Neel bookmarked Picard 3133 days ago

Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF...

http://broadinstitute.github.io/picard/
- Poonam Mahapatra@poonam
Poonam Mahapatra bookmarked Easyfig 3133 days ago

Easyfig has moved to github, for newer releases of Easyfig please visit our new webpage - https://mjsull.github.io/Easyfig. Easyfig is a Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use...

http://easyfig.sourceforge.net/
- Poonam Mahapatra@poonam
Poonam Mahapatra commented on the blog Computer simulation of genetic mechanism !! 3134 days ago

Thanks for the list. I came across GPOPSIM: a simulation tool for whole-genome genetic data ( http://bmcgenet.biomedcentral.com/articles/10.1186/s12863-015-0173-4 ), which seems to be a useful tool for the methodological and...
- Jit@jit.aber
Jit bookmarked GATB : Genome Analysis Toolbox with de-Bruijn graph 3134 days ago

The Genome Analysis Toolbox with de-Bruijn graph (GATB) provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very huge...

https://gatb.inria.fr/
- Abhi@abhinav
Abhi bookmarked RASTtk : algorithm for building custom annotation pipelines and annotating batches of genomes 3135 days ago

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes...

http://rast.nmpdr.org/
- Abhi@abhinav
Abhi posted a new ad in the Opportunity Bioinformatics Faculty at TNU 3135 days ago
- Jit@jit.aber
Jit bookmarked Smash: An alignment-free method to find and visualise rearrangements between pairs of DNA sequences 3136 days ago

Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements. The detection is based on conditional exclusive compression, namely using a FCM (Markov model), of high context order (typically 20). For...

http://bioinformatics.ua.pt/software/smash/
- Jit@jit.aber
Jit bookmarked MEDEA: Comparative Genomic Visualization with Adobe Flash 3136 days ago

As the number of sequenced and annotated genomes grows larger, the need to understand, compare, and contrast the data becomes increasingly important. Using the power of the human visual system to detect trends and spot outliers is necessary in such...

http://www.broadinstitute.org/annotation/medea/
- Jit@jit.aber
Jit bookmarked CANU: Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing. 3136 days ago

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level, feel free to use and report issues encountered. Canu is...

https://github.com/marbl/canu
Comments
- Poonam Mahapatra@poonam
  
  Poonam Mahapatra 2380 days ago
  Canu is one of the best de novo assemblers available for long reads - it’s a fork and updated version of the Celera assembler that was used to assemble the human genome.
  It is quite a complex beast that has HPC integration built in - though you can turn this off. However, large assembly jobs are best run in parallel, making HPC integration essential. This can get tough if your cluster has a non-standard configuration.
  Run canu without any options to get help:
  canu
  This produces:
  usage: canu [-version] \
  [-correct | -trim | -assemble | -trim-assemble] \
  [-s <assembly-specifications-file>] \
  -p <assembly-prefix> \
  -d <assembly-directory> \
  genomeSize=<number>[g|m|k] \
  [other-options] \
  [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq
  By default, all three stages (correct, trim, assemble) are computed.
  To compute only a single stage, use:
  -correct - generate corrected reads
  -trim - generate trimmed reads
  -assemble - generate an assembly
  -trim-assemble - generate trimmed reads and then assemble them
  The assembly is computed in the (created) -d <assembly-directory>, with most
  files named using the -p <assembly-prefix>.
  The genome size is your best guess of the genome size of what is being assembled.
  It is used mostly to compute coverage in reads. Fractional values are allowed: '4.7m'
  is the same as '4700k' and '4700000'
  A full list of options can be printed with '-options'. All options
  can be supplied in an optional sepc file.
  Reads can be either FASTA or FASTQ format, uncompressed, or compressed
  with gz, bz2 or xz. Reads are specified by the technology they were
  generated with:
  -pacbio-raw <files>
  -pacbio-corrected <files>
  -nanopore-raw <files>
  -nanopore-corrected <files>
  Complete documentation at http://canu.readthedocs.org/en/latest/
  Canu has three stages which it runs in order:
  
  Correct
  
  Trim
  
  Assemble
  
  By default canu runs these one after the other, but they can be run individually.
  An example “full pipeline” command would be:
  canu -p meta \
  -d meta \
  genomeSize=40m \
  useGrid=false \
  -nanopore-raw /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta
  This puts output in directory meta with prefix “meta”. We estimate the genome size, tell canu NOT to use HPC (as we don’t have one for porecamp) and give it some ONT data as fasta.
  This runs pretty quickly but doesn’t assemble anything. It’s a low coverage synthetic metagenome, so no surprise. It does produce corrected reads though! These could be used in the metagenomics practical (hint!)
  Now try the E coli subset:
  canu -p ecoli -d ecoli genomeSize=4.8m useGrid=false -nanopore-raw /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta
  This one will take a bit longer ;)
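
The commands in the comment above run all three stages at once. As a rough sketch of how a single stage could be requested from a script, the Python helper below assembles the argument list using only the flags shown in the canu usage text (`-correct`, `-trim`, `-assemble`, `-trim-assemble`, `-p`, `-d`, `genomeSize=`, `useGrid=`, and the four read-type flags). The function name `build_canu_command` and its parameters are hypothetical, invented for illustration; the helper only builds the command and does not run canu.

```python
import shlex

# Stage and read-type flags as listed in the canu usage text.
STAGES = {"correct", "trim", "assemble", "trim-assemble"}
READ_TYPES = {"pacbio-raw", "pacbio-corrected", "nanopore-raw", "nanopore-corrected"}


def build_canu_command(prefix, directory, genome_size, reads,
                       read_type="nanopore-raw", stage=None, use_grid=False):
    """Return a canu argument list (hypothetical helper, not part of canu).

    stage=None requests the default behaviour (correct, trim, assemble in order).
    """
    if stage is not None and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if read_type not in READ_TYPES:
        raise ValueError(f"unknown read type: {read_type}")
    if not reads:
        raise ValueError("at least one read file is required")

    cmd = ["canu"]
    if stage is not None:
        cmd.append(f"-{stage}")
    cmd += [
        "-p", prefix,
        "-d", directory,
        f"genomeSize={genome_size}",
        f"useGrid={'true' if use_grid else 'false'}",
        f"-{read_type}",
        *reads,
    ]
    return cmd


# Correction-only run on the E. coli subset from the comment above.
example_cmd = build_canu_command(
    "ecoli", "ecoli", "4.8m",
    ["/vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta"],
    stage="correct",
)
print(shlex.join(example_cmd))
```

Passing the resulting list to `subprocess.run(example_cmd, check=True)` would launch the job on a machine where canu is installed; keeping the arguments as a list avoids shell-quoting problems with file paths.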
- Rahul Nayak@rahul
  
  Rahul Nayak 2304 days ago
  ➜ bin git:(master) ✗ ./canu
  usage: canu [-version] [-citation] \
  [-correct | -trim | -assemble | -trim-assemble] \
  [-s <assembly-specifications-file>] \
  -p <assembly-prefix> \
  -d <assembly-directory> \
  genomeSize=<number>[g|m|k] \
  [other-options] \
  [-pacbio-raw |
  -pacbio-corrected |
  -nanopore-raw |
  -nanopore-corrected] file1 file2 ...
  example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz
  
  To restrict canu to only a specific stage, use:
  -correct - generate corrected reads
  -trim - generate trimmed reads
  -assemble - generate an assembly
  -trim-assemble - generate trimmed reads and then assemble them
  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>. This directory is created if needed. It is not
  possible to run multiple assemblies in the same directory.
  The genome size should be your best guess of the haploid genome size of what is being
  assembled. It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'
  Some common options:
  useGrid=string
  - Run under grid control (true), locally (false), or set up for grid control
  but don't submit any jobs (remote)
  rawErrorRate=fraction-error
  - The allowed difference in an overlap between two raw uncorrected reads. For lower
  quality reads, use a higher number. The defaults are 0.300 for PacBio reads and
  0.500 for Nanopore reads.
  correctedErrorRate=fraction-error
  - The allowed difference in an overlap between two corrected reads. Assemblies of
  low coverage or data with biological differences will benefit from a slight increase
  in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
  gridOptions=string
  - Pass string to the command used to submit jobs to the grid. Can be used to set
  maximum run time limits. Should NOT be used to set memory limits; Canu will do
  that for you.
  minReadLength=number
  - Ignore reads shorter than 'number' bases long. Default: 1000.
  minOverlapLength=number
  - Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.
  A full list of options can be printed with '-options'. All options can be supplied in
  an optional sepc file with the -s option.
  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
  Reads are specified by the technology they were generated with, and any processing performed:
  -pacbio-raw <files> Reads are straight off the machine.
  -pacbio-corrected <files> Reads have been corrected.
  -nanopore-raw <files>
  -nanopore-corrected <files>
  Complete documentation at http://canu.readthedocs.org/en/latest/
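
The help text above says that `genomeSize` accepts fractional values with a `g`, `m` or `k` suffix and is used mainly to estimate read coverage. The Python sketch below illustrates that arithmetic under two assumptions: the suffixes mean powers of 1000 (consistent with '4.7m' equals '4700k' equals '4700000'), and coverage is taken as total read bases divided by genome size. It is not Canu's own code, and the function names `parse_genome_size` and `estimated_coverage` are hypothetical.

```python
from decimal import Decimal

# Assumed multipliers, matching '4.7m' == '4700k' == '4700000' in the help text.
_SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_genome_size(text):
    """Convert a genomeSize string such as '4.7m' into a base count."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty genome size")
    multiplier = 1
    if text[-1] in _SUFFIX:
        multiplier = _SUFFIX[text[-1]]
        text = text[:-1]
    # Decimal avoids floating-point rounding on fractional values.
    return int(Decimal(text) * multiplier)


def estimated_coverage(total_read_bases, genome_size):
    """Approximate coverage as total read bases divided by genome size."""
    return total_read_bases / parse_genome_size(genome_size)


print(parse_genome_size("4.7m"))
print(estimated_coverage(94_000_000, "4.7m"))
```

Because the value only feeds a coverage estimate and is not a target assembly size, a rough guess of the haploid genome size is usually sufficient.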
- Jit@jit.aber
Jit answered the question Comparison of mapping tools ! 3137 days ago

You should check the segemehl algorithm paper http://bioinformatics.oxfordjournals.org/content/early/2014/03/13/bioinformatics.btu146.full.pdf+html , in which they compare the mapping tools. For further detail of the Algo...
- Jit@jit.aber
Jit commented on a bookmark ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and... 3137 days ago

Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines: CEGMA (formally discontinued but still useful), BUSCO (we have issues with fish, seems not to be tailored to that group of...
- Neel@neelam
Neel bookmarked mrFAST: Micro Read Fast Alignment Search Tool 3137 days ago

mrFAST is a read mapper that is designed to map short reads to reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to user defined error threshold,...

http://mrfast.sourceforge.net/manual.html
- Neel@neelam
Neel bookmarked HOMER: Software for motif discovery and next-gen sequencing analysis 3137 days ago

This tutorial covers topics independently of HOMER, and represents knowledge which is important to know before diving head first into more advanced analysis tools such as HOMER. Setting up your computing environment Retrieving and storing...

http://homer.salk.edu/homer/basicTutorial/
