Latest activity

  • Rahul Nayak commented on a bookmark R and Bioconductor Tutorial in the group R and Bioconductor 3183 days ago
    Learn R by yourself: www.datasciencecentral.com/profiles/blogs/learning-r-in-seven-simple-steps
  • Neel bookmarked Picard 3183 days ago
    Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF...
  • Poonam Mahapatra bookmarked Easyfig 3183 days ago
    Easyfig has moved to GitHub; for newer releases of Easyfig please visit our new webpage: https://mjsull.github.io/Easyfig. Easyfig is a Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use...
  • Thanks for the list. I came across GPOPSIM: a simulation tool for whole-genome genetic data (http://bmcgenet.biomedcentral.com/articles/10.1186/s12863-015-0173-4), which seems to be a useful tool for the methodological and...
  • The Genome Analysis Toolbox with de-Bruijn graph (GATB) provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very large...
  • The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes...
  • Abhi posted a new ad in the Opportunity Bioinformatics Faculty at TNU 3185 days ago
  • Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements. The detection is based on conditional exclusive compression, namely using an FCM (finite-context Markov model) of high context order (typically 20). For...
  • As the number of sequence and annotated genomes grows larger, the need to understand, compare, and contrast the data becomes increasingly important. Using the power of the human visual system to detect trends and spot outliers is necessary in such...
  • Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently at alpha level; feel free to use it and report any issues encountered. Canu is...
    Comments
    • Poonam Mahapatra 2430 days ago

      Canu is one of the best de novo assemblers available for long reads - it’s a fork and updated version of the Celera assembler that was used to assemble the human genome.

      It is quite a complex beast that has HPC integration built in - though you can turn this off. However, large assembly jobs are best run in parallel, making HPC integration essential. This can get tough if your cluster has a non-standard configuration.

      Run canu without any options to get help:

      canu
      

      This produces:

      usage: canu [-version] \
                  [-correct | -trim | -assemble | -trim-assemble] \
                  [-s <assembly-specifications-file>] \
                   -p <assembly-prefix> \
                   -d <assembly-directory> \
                   genomeSize=<number>[g|m|k] \
                  [other-options] \
                  [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq
      
        By default, all three stages (correct, trim, assemble) are computed.
        To compute only a single stage, use:
          -correct       - generate corrected reads
          -trim          - generate trimmed reads
          -assemble      - generate an assembly
          -trim-assemble - generate trimmed reads and then assemble them
      
        The assembly is computed in the (created) -d <assembly-directory>, with most
        files named using the -p <assembly-prefix>.
      
        The genome size is your best guess of the genome size of what is being assembled.
        It is used mostly to compute coverage in reads.  Fractional values are allowed: '4.7m'
        is the same as '4700k' and '4700000'
      
        A full list of options can be printed with '-options'.  All options
        can be supplied in an optional sepc file.
      
        Reads can be either FASTA or FASTQ format, uncompressed, or compressed
        with gz, bz2 or xz.  Reads are specified by the technology they were
        generated with:
          -pacbio-raw         <files>
          -pacbio-corrected   <files>
          -nanopore-raw       <files>
          -nanopore-corrected <files>
      
      Complete documentation at http://canu.readthedocs.org/en/latest/
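      The help text says fractional genome sizes are allowed ('4.7m' is the same as '4700k' and '4700000'). If you want to sanity-check the number you are passing as genomeSize, a tiny helper can expand it into plain bases. This is a hypothetical sketch (genome_size_to_bases is not a canu command), assuming the suffixes are exactly g, m and k as in the usage line; canu does its own parsing internally.

```shell
#!/bin/sh
# Hypothetical helper, not part of canu: expand a genomeSize value such as
# 4.7m, 4700k or 4700000 into a plain number of bases.
genome_size_to_bases() {
  value="$1"
  suffix="${value#"${value%?}"}"   # last character of the value
  case "$suffix" in
    g) multiplier=1000000000; number="${value%?}" ;;
    m) multiplier=1000000;    number="${value%?}" ;;
    k) multiplier=1000;       number="${value%?}" ;;
    *) multiplier=1;          number="$value" ;;   # no suffix: already in bases
  esac
  # awk does the fractional arithmetic; %.0f rounds to a whole number of bases
  awk -v n="$number" -v m="$multiplier" 'BEGIN { printf "%.0f\n", n * m }'
}

genome_size_to_bases 4.7m      # prints 4700000
genome_size_to_bases 4700k     # prints 4700000
genome_size_to_bases 4700000   # prints 4700000
```

      All three spellings come out as the same number, which is the point the help text makes.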
      

      Canu has three stages which it runs in order:

      • Correct
      • Trim
      • Assemble

      By default canu runs these one after the other, but they can be run individually.

      An example “full pipeline” command would be:

      canu -p meta \
           -d meta \
           genomeSize=40m \
           useGrid=false \
           -nanopore-raw /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta
      

      This puts output in the directory meta with the prefix “meta”. We give an estimated genome size, tell canu NOT to use HPC (useGrid=false, as we don’t have a cluster for porecamp) and give it some ONT data as FASTA.

      This runs pretty quickly but doesn’t assemble anything. It’s a low coverage synthetic metagenome, so no surprise. It does produce corrected reads though! These could be used in the metagenomics practical (hint!)
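      Why did nothing assemble? A rough way to see that the data is low coverage is to divide the total number of read bases by the genome size you gave canu, since the help text says genome size is mostly used to estimate coverage. The sketch below does that for an uncompressed FASTA file. estimate_coverage is a hypothetical helper, not part of canu, and canu's own estimate may differ (for example after it drops short reads).

```shell
#!/bin/sh
# Hypothetical helper, not part of canu: rough coverage = total read bases / genome size.
# Assumes an uncompressed FASTA file and a non-zero genome size already written out in bases.
# Usage: estimate_coverage <reads.fasta> <genome_size_in_bases>
estimate_coverage() {
  fasta="$1"
  genome_bases="$2"
  awk -v g="$genome_bases" '
    BEGIN { g += 0 }             # force numeric genome size
    /^>/ { next }                # skip FASTA header lines
    { total += length($0) }      # sum sequence characters, wrapped lines included
    END { printf "%d bases, ~%.1fx coverage\n", total, total / g }
  ' "$fasta"
}
```

      For the metagenome above you would run something like estimate_coverage /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta 40000000 (that is, 40m written out in bases).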

      Now try the E. coli subset:

      canu -p ecoli \
           -d ecoli \
           genomeSize=4.8m \
           useGrid=false \
           -nanopore-raw /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta
      

      This one will take a bit longer ;)

    • Rahul Nayak 2354 days ago

      ➜ bin git:(master) ✗ ./canu

      usage: canu [-version] [-citation] \
                  [-correct | -trim | -assemble | -trim-assemble] \
                  [-s <assembly-specifications-file>] \
                   -p <assembly-prefix> \
                   -d <assembly-directory> \
                   genomeSize=<number>[g|m|k] \
                  [other-options] \
                  [-pacbio-raw |
                   -pacbio-corrected |
                   -nanopore-raw |
                   -nanopore-corrected] file1 file2 ...

      example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz


      To restrict canu to only a specific stage, use:
        -correct       - generate corrected reads
        -trim          - generate trimmed reads
        -assemble      - generate an assembly
        -trim-assemble - generate trimmed reads and then assemble them

      The assembly is computed in the -d <assembly-directory>, with output files named
      using the -p <assembly-prefix>. This directory is created if needed. It is not
      possible to run multiple assemblies in the same directory.

      The genome size should be your best guess of the haploid genome size of what is being
      assembled. It is used primarily to estimate coverage in reads, NOT as the desired
      assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

      Some common options:
        useGrid=string
          - Run under grid control (true), locally (false), or set up for grid control
            but don't submit any jobs (remote)
        rawErrorRate=fraction-error
          - The allowed difference in an overlap between two raw uncorrected reads. For lower
            quality reads, use a higher number. The defaults are 0.300 for PacBio reads and
            0.500 for Nanopore reads.
        correctedErrorRate=fraction-error
          - The allowed difference in an overlap between two corrected reads. Assemblies of
            low coverage or data with biological differences will benefit from a slight increase
            in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
        gridOptions=string
          - Pass string to the command used to submit jobs to the grid. Can be used to set
            maximum run time limits. Should NOT be used to set memory limits; Canu will do
            that for you.
        minReadLength=number
          - Ignore reads shorter than 'number' bases long. Default: 1000.
        minOverlapLength=number
          - Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.

      A full list of options can be printed with '-options'. All options can be supplied in
      an optional sepc file with the -s option.

      Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
      Reads are specified by the technology they were generated with, and any processing performed:
        -pacbio-raw          <files>   Reads are straight off the machine.
        -pacbio-corrected    <files>   Reads have been corrected.
        -nanopore-raw        <files>
        -nanopore-corrected  <files>

      Complete documentation at http://canu.readthedocs.org/en/latest/
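      To get a feel for what minReadLength does before launching a long run, you can count how many of your reads clear the threshold. This is a hypothetical helper (reads_passing_min_length is not a canu command) that only handles uncompressed FASTA, with the threshold defaulting to the 1000 bases quoted in the help text; canu applies the filter itself, so treat this as a quick preview only.

```shell
#!/bin/sh
# Hypothetical helper, not part of canu: preview the effect of minReadLength
# by counting reads (and their bases) at least min_length bases long.
# Assumes an uncompressed FASTA file; wrapped sequence lines are fine.
# Usage: reads_passing_min_length <reads.fasta> [min_length]
reads_passing_min_length() {
  fasta="$1"
  min_len="${2:-1000}"   # 1000 = default minReadLength in the canu help text
  awk -v min="$min_len" '
    BEGIN { min += 0 }   # force numeric comparison
    # Finish the current record: count it, and keep it if long enough.
    function close_record() {
      if (in_record) {
        n_all++
        if (len >= min) { n_kept++; bases_kept += len }
      }
    }
    /^>/ { close_record(); in_record = 1; len = 0; next }  # header starts a new read
    { len += length($0) }                                   # add sequence line length
    END {
      close_record()
      printf "%d of %d reads kept (%d bases)\n", n_kept, n_all, bases_kept
    }
  ' "$fasta"
}
```

      For example: reads_passing_min_length /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta 1000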

  • Jit answered the question Comparison of mapping tools ! 3186 days ago
    You should check the segemehl algorithm paper http://bioinformatics.oxfordjournals.org/content/early/2014/03/13/bioinformatics.btu146.full.pdf+html , in which they compare the mapping tools. For further detail of the Algo...
  • mrFAST is a read mapper that is designed to map short reads to a reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to user defined error threshold,...
  • This tutorial covers topics independently of HOMER, and represents knowledge which is important to know before diving head first into more advanced analysis tools such as HOMER. Setting up your computing environment Retrieving and storing...
  • The Assembly Likelihood Evaluation (ALE) framework overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and...
    Comments
    • Jit 3186 days ago

      Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines:

      • CEGMA (formally discontinued but still useful)
      • BUSCO (we have had issues with fish; it seems not to be tailored to that group of organisms, and the developers tell us they are fixing it)
      • A linkage map or another map (RAD-tag based)? (Which software?)
      • BioNano Genomics maps can also be used for QC
      • Use a genome browser such as IGV to get a feel for your results: add the assembly, BAM files, annotation and mapped transcripts, and browse
  • Jitendra Prajapati commented on a bookmark Venn Diagrams on R Studio 3187 days ago
    “How can I generate a Venn diagram in R?” by UCLA is also useful: http://www.ats.ucla.edu/stat/r/faq/venn.htm
  • First step: install and load the “VennDiagram” package. # install.packages('VennDiagram') library(VennDiagram) Second step: load the data. Add the file path if “catdoge.csv” is not in the working directory. d <-...
  • Thanks for such useful links. I found the Sofja Kovalevskaja Award very competitive, with plenty of scope for bioinformaticians. Sofja Kovalevskaja Award – Become a research group leader in Germany € 1.65 million for young researchers from...
  • Abhimanyu Singh posted a new ad in the ResearchLabs Desai Lab 3191 days ago
  • Are you seeking funding for research or training in a particular area? Check out the following agencies ... National Science Foundation: For the love of science! Head here when searching for ways to pay for that gargantuan geology or bigtime...
    Comments
    • Jit 3033 days ago

      Bioinformatics funding for Japan

      Promoting science and technology is a key engine to materialize a bright future of Asia and it is vitally important to enhance the exchange of youths in Asian countries and Japan who will play a crucial role in the field of science and technology.


      Based on this concept, “Japan-Asia Youth Exchange Program in Science” (SAKURA Exchange Program in Science) is the program for enhancing exchanges between Asia and Japan of the youths who will play a crucial role in the future field of science and technology through the close collaboration of industry-academia-government by facilitating short-term visits of competent Asian youths to Japan. This program aims at raising the interest of Asian youths toward the leading Japanese science and technologies at Japanese universities, research institutions and private companies.

      More at http://www.ssp.jst.go.jp/EN/outline/index.html

    • Shruti Paniwala 2990 days ago

      The Arturo Falaschi ICGEB Fellowship Programmes for PhD, PostDoc and Short term courses

      The Arturo Falaschi ICGEB Fellowships programme offers long and short-term fellowships for scientists who are nationals of ICGEB Member States to perform research in Trieste, New Delhi or Cape Town.

      More at http://www.icgeb.org/fellowships.html

    • Neel 1809 days ago