All Site Activity

  • Is there any database that provides RNA-seq data for different cancer stages?
  • Anjana joined the group R and Bioconductor 3131 days ago
  • Anjana posted to the wire 3131 days ago
    Happy to join BOL bioinformatics network #Join #BOL #Bioinfo
  • Anjana has a new avatar 3131 days ago
  • Rahul Nayak commented on a bookmark R and Bioconductor Tutorial in the group R and Bioconductor 3133 days ago
    Learn R by yourself: www.datasciencecentral.com/profiles/blogs/learning-r-in-seven-simple-steps
  • Neel bookmarked Picard 3133 days ago
    Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF...
  • Poonam Mahapatra bookmarked Easyfig 3133 days ago
    Easyfig has moved to github, for newer releases of Easyfig please visit our new webpage - https://mjsull.github.io/Easyfig.  Easyfig is a Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use...
  • Thanks for the list. I came across GPOPSIM: a simulation tool for whole-genome genetic data ( http://bmcgenet.biomedcentral.com/articles/10.1186/s12863-015-0173-4 ), which seems to be a useful tool for the methodological and...
  • The Genome Analysis Toolbox with de-Bruijn graph (GATB) provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very huge...
  • The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes...
  • Abhi posted a new ad in the Opportunity Bioinformatics Faculty at TNU 3135 days ago
  • Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements. The detection is based on conditional exclusive compression, namely using an FCM (Markov model) of high context order (typically 20). For...
  • As the number of sequenced and annotated genomes grows larger, the need to understand, compare, and contrast the data becomes increasingly important. Using the power of the human visual system to detect trends and spot outliers is necessary in such...
  • Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level; feel free to use it and report any issues encountered. Canu is...
    Comments
    • Poonam Mahapatra 2380 days ago

      Canu is one of the best de novo assemblers available for long reads - it’s a fork and updated version of the Celera assembler that was used to assemble the human genome.

      It is quite a complex beast that has HPC integration built in - though you can turn this off. However, large assembly jobs are best run in parallel, making HPC integration essential. This can get tough if your cluster has a non-standard configuration.

      Run canu without any options to get help:

      canu
      

      This produces:

      usage: canu [-version] \
                  [-correct | -trim | -assemble | -trim-assemble] \
                  [-s <assembly-specifications-file>] \
                   -p <assembly-prefix> \
                   -d <assembly-directory> \
                   genomeSize=<number>[g|m|k] \
                  [other-options] \
                  [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq
      
        By default, all three stages (correct, trim, assemble) are computed.
        To compute only a single stage, use:
          -correct       - generate corrected reads
          -trim          - generate trimmed reads
          -assemble      - generate an assembly
          -trim-assemble - generate trimmed reads and then assemble them
      
        The assembly is computed in the (created) -d <assembly-directory>, with most
        files named using the -p <assembly-prefix>.
      
        The genome size is your best guess of the genome size of what is being assembled.
        It is used mostly to compute coverage in reads.  Fractional values are allowed: '4.7m'
        is the same as '4700k' and '4700000'
      
        A full list of options can be printed with '-options'.  All options
        can be supplied in an optional sepc file.
      
        Reads can be either FASTA or FASTQ format, uncompressed, or compressed
        with gz, bz2 or xz.  Reads are specified by the technology they were
        generated with:
          -pacbio-raw         <files>
          -pacbio-corrected   <files>
          -nanopore-raw       <files>
          -nanopore-corrected <files>
      
      Complete documentation at http://canu.readthedocs.org/en/latest/
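The genomeSize suffixes in the help text above can be checked with a small shell helper. This is only a sketch: genome_size_to_bases is a hypothetical function, not part of canu, and it assumes the lowercase g, m and k suffixes shown in the usage line mean 10^9, 10^6 and 10^3 bases.

```shell
#!/bin/sh
# Hypothetical helper (not part of canu): expand a genomeSize value such as
# 4.7m, 4700k or 4700000 into a plain base count.
genome_size_to_bases() {
    printf '%s\n' "$1" | awk '
        {
            value = $0
            mult = 1
            # Look at the last character for an optional g, m or k suffix.
            suffix = substr(value, length(value), 1)
            if (suffix == "k") mult = 1000
            else if (suffix == "m") mult = 1000000
            else if (suffix == "g") mult = 1000000000
            # Drop the suffix before converting to a number.
            if (mult > 1) value = substr(value, 1, length(value) - 1)
            printf "%.0f\n", value * mult
        }'
}

# All three spellings from the help text print 4700000.
genome_size_to_bases 4.7m
genome_size_to_bases 4700k
genome_size_to_bases 4700000
```

The value is only an estimate used for coverage calculations, so rounding to the nearest base is more than precise enough here.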
      

      Canu has three stages which it runs in order:

      • Correct
      • Trim
      • Assemble

      By default canu runs these one after the other, but they can be run individually.
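As a rough sketch of running the stages individually, the script below prints one command per stage using the -correct, -trim and -assemble flags from the help text. It is a dry run, so it works without canu installed. The reads path, the prefix and especially the intermediate file names (correctedReads and trimmedReads) are assumptions for illustration; check what your canu version actually writes into the assembly directory before relying on them.

```shell
#!/bin/sh
# Dry-run sketch: print the three per-stage canu commands instead of running them.
# READS, PREFIX and the intermediate file names are illustrative assumptions.
READS="reads.fasta"
PREFIX="ecoli"
GENOME_SIZE="4.8m"

stage_commands() {
    # Stage 1: correct the raw reads.
    echo "canu -correct -p $PREFIX -d $PREFIX genomeSize=$GENOME_SIZE useGrid=false -nanopore-raw $READS"
    # Stage 2: trim the corrected reads (assumed output name from stage 1).
    echo "canu -trim -p $PREFIX -d $PREFIX genomeSize=$GENOME_SIZE useGrid=false -nanopore-corrected $PREFIX/$PREFIX.correctedReads.fasta.gz"
    # Stage 3: assemble the trimmed reads (assumed output name from stage 2).
    echo "canu -assemble -p $PREFIX -d $PREFIX genomeSize=$GENOME_SIZE useGrid=false -nanopore-corrected $PREFIX/$PREFIX.trimmedReads.fasta.gz"
}

stage_commands
```

Once the paths match your data, the printed lines can be pasted into a terminal one at a time, which makes it easy to inspect the corrected or trimmed reads between stages.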

      An example “full pipeline” command would be:

      canu -p meta \
           -d meta \
           genomeSize=40m \
           useGrid=false \
           -nanopore-raw /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta
      

      This writes output to the directory meta, with files prefixed “meta”. We give an estimated genome size, tell canu NOT to use HPC by setting useGrid=false (as we don’t have one for porecamp), and pass it some ONT data as FASTA.

      This runs pretty quickly but doesn’t assemble anything. It’s a low-coverage synthetic metagenome, so no surprise. It does produce corrected reads though! These could be used in the metagenomics practical (hint!).

      Now try the E. coli subset:

      canu -p ecoli \
           -d ecoli \
           genomeSize=4.8m \
           useGrid=false \
           -nanopore-raw /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta
      

      This one will take a bit longer ;)

    • Rahul Nayak 2304 days ago

      ./canu

      usage: canu [-version] [-citation] \
                  [-correct | -trim | -assemble | -trim-assemble] \
                  [-s <assembly-specifications-file>] \
                   -p <assembly-prefix> \
                   -d <assembly-directory> \
                   genomeSize=<number>[g|m|k] \
                  [other-options] \
                  [-pacbio-raw |
                   -pacbio-corrected |
                   -nanopore-raw |
                   -nanopore-corrected] file1 file2 ...

      example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz


      To restrict canu to only a specific stage, use:
        -correct       - generate corrected reads
        -trim          - generate trimmed reads
        -assemble      - generate an assembly
        -trim-assemble - generate trimmed reads and then assemble them

      The assembly is computed in the -d <assembly-directory>, with output files named
      using the -p <assembly-prefix>. This directory is created if needed. It is not
      possible to run multiple assemblies in the same directory.

      The genome size should be your best guess of the haploid genome size of what is being
      assembled. It is used primarily to estimate coverage in reads, NOT as the desired
      assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

      Some common options:
        useGrid=string
          - Run under grid control (true), locally (false), or set up for grid control
            but don't submit any jobs (remote)
        rawErrorRate=fraction-error
          - The allowed difference in an overlap between two raw uncorrected reads. For lower
            quality reads, use a higher number. The defaults are 0.300 for PacBio reads and
            0.500 for Nanopore reads.
        correctedErrorRate=fraction-error
          - The allowed difference in an overlap between two corrected reads. Assemblies of
            low coverage or data with biological differences will benefit from a slight increase
            in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
        gridOptions=string
          - Pass string to the command used to submit jobs to the grid. Can be used to set
            maximum run time limits. Should NOT be used to set memory limits; Canu will do
            that for you.
        minReadLength=number
          - Ignore reads shorter than 'number' bases long. Default: 1000.
        minOverlapLength=number
          - Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.

      A full list of options can be printed with '-options'. All options can be supplied in
      an optional sepc file with the -s option.

      Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
      Reads are specified by the technology they were generated with, and any processing performed:
        -pacbio-raw         <files>   Reads are straight off the machine.
        -pacbio-corrected   <files>   Reads have been corrected.
        -nanopore-raw       <files>
        -nanopore-corrected <files>

      Complete documentation at http://canu.readthedocs.org/en/latest/
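The help text says all options can also go in a spec file passed with -s. A minimal sketch of that, assuming the spec file is plain text with one option=value pair per line and # for comments (please verify against the documentation for your version). The file name, the option values and the reads path are made up for illustration, and the canu command is only printed, not run.

```shell
#!/bin/sh
# Sketch: collect options from the help text in a spec file and pass it with -s.
# The file name, option values and reads path are illustrative assumptions.
SPEC="canu_spec.txt"

cat > "$SPEC" <<'EOF'
# assumed layout: one option per line
useGrid=false
minReadLength=2000
minOverlapLength=1000
correctedErrorRate=0.16
EOF

# Print the command rather than running it, so this works without canu installed.
echo "canu -s $SPEC -p ecoli -d ecoli genomeSize=4.8m -nanopore-raw reads.fasta"
```

Keeping the options in a file makes it easier to rerun the same settings in a fresh -d directory, since the help text notes that multiple assemblies cannot share one directory.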

  • Jit answered the question Comparison of mapping tools ! 3137 days ago
    You should check the segemehl algorithm paper http://bioinformatics.oxfordjournals.org/content/early/2014/03/13/bioinformatics.btu146.full.pdf+html , in which they compare the mapping tools. For further detail of the Algo...
  • Thanks for reporting the updated tool for assembly validation, you can also try following methods/pipelines CEGMA (formally discontinued but still useful) BUSCO (we have issues with fish, seems not to be tailored to that group of...
  • mrFAST is a read mapper that is designed to map short reads to reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to user defined error threshold,...
  • This tutorial covers topics independently of HOMER, and represents knowledge which is important to know before diving head first into more advanced analysis tools such as HOMER. Setting up your computing environment Retrieving and storing...