    Learn R by yourself: www.datasciencecentral.com/profiles/blogs/learning-r-in-seven-simple-steps
    Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF...
    Easyfig has moved to GitHub; for newer releases of Easyfig please visit our new webpage: https://mjsull.github.io/Easyfig.  Easyfig is a Python application for creating linear comparison figures of multiple genomic loci with an easy-to-use...
  • Thanks for the list. I came across GPOPSIM: a simulation tool for whole-genome genetic data ( http://bmcgenet.biomedcentral.com/articles/10.1186/s12863-015-0173-4 ), which seems to be a useful tool for the methodological and...
  • The Genome Analysis Toolbox with de-Bruijn graph (GATB) provides a set of highly efficient algorithms to analyse NGS data sets. These methods enable the analysis of data sets of any size on multi-core desktop computers, including very huge...
  • The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes...
  • Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements. The detection is based on conditional exclusive compression, namely using a FCM (Markov model), of high context order (typically 20). For...
  • As the number of sequenced and annotated genomes grows larger, the need to understand, compare, and contrast the data becomes increasingly important. Using the power of the human visual system to detect trends and spot outliers is necessary in such...
  • Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level, feel free to use and report issues encountered. Canu is...
    • Poonam Mahapatra 2430 days ago

      Canu is one of the best de novo assemblers available for long reads - it’s a fork and updated version of the Celera assembler that was used to assemble the human genome.

      It is quite a complex beast that has HPC integration built in - though you can turn this off. However, large assembly jobs are best run in parallel, making HPC integration essential. This can get tough if your cluster has a non-standard configuration.

      Run canu without any options to get help:

      canu

      This produces:

      usage: canu [-version] \
                  [-correct | -trim | -assemble | -trim-assemble] \
                  [-s <assembly-specifications-file>] \
                   -p <assembly-prefix> \
                   -d <assembly-directory> \
                   genomeSize=<number>[g|m|k] \
                  [other-options] \
                  [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq
        By default, all three stages (correct, trim, assemble) are computed.
        To compute only a single stage, use:
          -correct       - generate corrected reads
          -trim          - generate trimmed reads
          -assemble      - generate an assembly
          -trim-assemble - generate trimmed reads and then assemble them
        The assembly is computed in the (created) -d <assembly-directory>, with most
        files named using the -p <assembly-prefix>.
        The genome size is your best guess of the genome size of what is being assembled.
        It is used mostly to compute coverage in reads.  Fractional values are allowed: '4.7m'
        is the same as '4700k' and '4700000'
        A full list of options can be printed with '-options'.  All options
        can be supplied in an optional sepc file.
        Reads can be either FASTA or FASTQ format, uncompressed, or compressed
        with gz, bz2 or xz.  Reads are specified by the technology they were
        generated with:
          -pacbio-raw         <files>
          -pacbio-corrected   <files>
          -nanopore-raw       <files>
          -nanopore-corrected <files>
      Complete documentation at http://canu.readthedocs.org/en/latest/
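
      The help text above says that '4.7m', '4700k' and '4700000' all describe the same genome size. The following is a minimal Python sketch of that suffix convention, useful for sanity-checking a value before launching a job. The function name parse_genome_size is hypothetical, and this is not Canu's own parser.

```python
from decimal import Decimal

# Suffixes taken from the usage line: genomeSize=<number>[g|m|k]
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_genome_size(text: str) -> int:
    """Convert a size such as '4.7m', '4700k' or '4700000' to a base count.

    Hypothetical helper for illustration; it only mirrors the convention
    described in the help text.
    """
    text = text.strip().lower()
    if not text:
        raise ValueError("empty genome size")
    suffix = text[-1]
    if suffix in MULTIPLIERS:
        number, scale = text[:-1], MULTIPLIERS[suffix]
    else:
        number, scale = text, 1
    # Decimal avoids floating-point rounding surprises for values like 4.7
    return int(Decimal(number) * scale)


if __name__ == "__main__":
    for value in ("4.7m", "4700k", "4700000"):
        print(value, "->", parse_genome_size(value))
```

      All three inputs in the loop should print the same number of bases.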

      Canu has three stages which it runs in order:

      • Correct
      • Trim
      • Assemble

      By default canu runs these one after the other, but they can be run individually.

      An example “full pipeline” command would be:

      canu -p meta \
           -d meta \
           genomeSize=40m \
           useGrid=false \
           -nanopore-raw /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta

      This puts output in directory meta with prefix “meta”. We estimate the genome size, tell canu NOT to use HPC (as we don’t have one for porecamp) and give it some ONT data as fasta.

      This runs pretty quickly but doesn’t assemble anything. It’s a low coverage synthetic metagenome, so no surprise. It does produce corrected reads though! These could be used in the metagenomics practical (hint!)
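
      Since the genome size is used mostly to compute coverage, a quick estimate of coverage before running can show whether a read set is likely to be too thin to assemble. The sketch below is a rough back-of-the-envelope calculation (total read bases divided by genome size), not Canu's internal computation. The function names and the toy reads are made up for illustration.

```python
import io


def total_bases(fasta_handle) -> int:
    """Sum sequence lengths in a FASTA stream, skipping header lines."""
    return sum(
        len(line.strip()) for line in fasta_handle if not line.startswith(">")
    )


def estimated_coverage(n_bases: int, genome_size: int) -> float:
    """Approximate fold coverage: read bases divided by genome size."""
    return n_bases / genome_size


if __name__ == "__main__":
    # Toy data, not real reads: two records totalling 16 bases
    example = io.StringIO(">read1\nACGTACGT\nACGT\n>read2\nGGCC\n")
    bases = total_bases(example)
    # With a pretend genome of 8 bases this is 2-fold coverage
    print(bases, estimated_coverage(bases, 8))
```

      For a real file, pass an open file handle instead of the StringIO object, and use your genomeSize estimate in bases as the second argument.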

      Now try the E. coli subset:

      canu -p ecoli \
           -d ecoli \
           -nanopore-raw /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta

      This one will take a bit longer ;)
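
      Both commands above follow the same pattern, so they can be assembled programmatically when you have several data sets. This is a minimal Python sketch that only builds the argument list from options shown in the usage text; build_canu_command is a hypothetical helper. Note that the usage lists genomeSize as a required argument; the value 4.7m below is an assumption borrowed from the example in the help text, not a figure from the tutorial.

```python
import shlex

# Read-type flags as listed in the canu usage text
READ_TYPES = (
    "-pacbio-raw",
    "-pacbio-corrected",
    "-nanopore-raw",
    "-nanopore-corrected",
)


def build_canu_command(prefix, directory, genome_size, read_type, reads,
                       use_grid=False):
    """Return a canu argument list (hypothetical helper, does not run canu)."""
    if read_type not in READ_TYPES:
        raise ValueError(f"unknown read type: {read_type}")
    cmd = ["canu", "-p", prefix, "-d", directory, f"genomeSize={genome_size}"]
    if not use_grid:
        # Same switch as in the metagenome example: no HPC integration
        cmd.append("useGrid=false")
    cmd.append(read_type)
    cmd.extend(reads)
    return cmd


if __name__ == "__main__":
    cmd = build_canu_command(
        "ecoli", "ecoli", "4.7m",  # genome size is an assumed value
        "-nanopore-raw",
        ["/vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta"],
    )
    # Print a copy-pasteable shell command instead of executing it
    print(shlex.join(cmd))
```

      To actually launch the job, the list can be handed to subprocess.run(cmd, check=True) on a machine where canu is installed.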

    • Rahul Nayak 2354 days ago

    You should check the segemehl algorithm paper http://bioinformatics.oxfordjournals.org/content/early/2014/03/13/bioinformatics.btu146.full.pdf+html , in which they compare the mapping tools. For further detail of the Algo...
  • Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines: CEGMA (formally discontinued but still useful), BUSCO (we have issues with fish, seems not to be tailored to that group of...
  • mrFAST is a read mapper that is designed to map short reads to a reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold,...
  • This tutorial covers topics independently of HOMER, and represents knowledge which is important to know before diving head first into more advanced analysis tools such as HOMER. Setting up your computing environment Retrieving and storing...
  • Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and...
    • Jit 3186 days ago

      Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines:

      • CEGMA (formally discontinued but still useful)
      • BUSCO (we have issues with fish, seems not to be tailored to that group of organisms, developers tell us they are fixing it)
      • linkage map? or other map (RAD-tag based). (software?)
      • BioNanoGenomics can be used for QC also
      • Use a genome browser to get a feeling for your results, e.g. IGV; add assembly, BAM files, annotation, transcripts mapped and browse
    "How can I generate a Venn diagram in R?" by UCLA is also useful: http://www.ats.ucla.edu/stat/r/faq/venn.htm
  • First step: Install & load “VennDiagram” package. # install.packages('VennDiagram') library(VennDiagram) Second step: Load data Add filepath if “catdoge.csv” is not in working-directory. d <-...
  • Thanks for such useful links. I found the Sofja Kovalevskaja Award very competitive, with plenty of scope for bioinformaticians. Sofja Kovalevskaja Award – Become a research group leader in Germany € 1.65 million for young researchers from...
  • Are you seeking funding for research or training in a particular area? Check out the following agencies ... National Science Foundation: For the love of science! Head here when searching for ways to pay for that gargantuan geology or bigtime...
    • Jit 3033 days ago

      Bioinformatics funding for Japan

      Promoting science and technology is a key engine to materialize a bright future of Asia and it is vitally important to enhance the exchange of youths in Asian countries and Japan who will play a crucial role in the field of science and technology.

      Based on this concept, “Japan-Asia Youth Exchange Program in Science” (SAKURA Exchange Program in Science) is the program for enhancing exchanges between Asia and Japan of the youths who will play a crucial role in the future field of science and technology through the close collaboration of industry-academia-government by facilitating short-term visits of competent Asian youths to Japan. This program aims at raising the interest of Asian youths toward the leading Japanese science and technologies at Japanese universities, research institutions and private companies.

      More at http://www.ssp.jst.go.jp/EN/outline/index.html

    • Shruti Paniwala 2990 days ago

      The Arturo Falaschi ICGEB Fellowship Programmes for PhD, PostDoc and Short term courses

      The Arturo Falaschi ICGEB Fellowships programme offers long and short-term fellowships for scientists who are nationals of ICGEB Member States to perform research in Trieste, New Delhi or Cape Town.

      More at http://www.icgeb.org/fellowships.html

