BOL: Related items

DNA Replication Process [3D Animation]

Sat, 10 May 2014 04:41:22 -0500

See an organised list of all the animations: http://doctorprodigious.wordpress.com/hd-animations/

Opera: An optimal genome scaffolding program

Jit — Mon, 27 Nov 2017 10:18:20 -0600

Opera (Optimal Paired-End Read Assembler) is a sequence assembly program (http://en.wikipedia.org/wiki/Sequence_assembly). It uses information from paired-end or long reads to optimally order and orient contigs assembled from shotgun-sequencing reads.

An updated version called OPERA-LG has been re-engineered with features for the assembly of large and complex genomes.

Song Gao, Denis Bertrand, Burton K. H. Chia and Niranjan Nagarajan. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees. Genome Biology, May 2016, doi: 10.1186/s13059-016-0951-y.

Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

Address of the bookmark: https://sourceforge.net/projects/operasf/

Bioinformatics PhD at CUK Kerala

Sat, 10 May 2014 20:21:22 -0500

Applications are invited from highly motivated students (UGC-CSIR-JRF) with a background in Genomics/Biotechnology/Molecular Microbiology/Biochemistry and Bioinformatics to pursue research leading to a Ph.D. in the following areas:

1. Cancer Genomics

2. Microbial Genetics and Metagenomics

3. Human Infective Diseases

4. Computational Drug Design

Interested candidates may apply to Dr. Ranjith N. Kumavath, Assistant Professor & Head, Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Padannakad (PO), Nileshwar, Kasaragod-671328,Kerala. Email: RNkumavath@gmail.com

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly: it was used by Mick Watson’s group to produce the B. fragilis assembly.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>         directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12          file with interlaced forward and reverse paired-end reads
-1            file with forward paired-end reads
-2            file with reverse paired-end reads
-s            file with unpaired reads
--pe<#>-12            file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1             file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2             file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s             file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>          orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#>                file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12            file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1             file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2             file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s             file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>          orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12          file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1           file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2           file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s           file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>        orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1         file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2         file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger              file with Sanger reads
--pacbio              file with PacBio reads
--nanopore            file with Nanopore reads
--tslr        file with TSLR-contigs
--trusted-contigs             file with trusted contigs
--untrusted-contigs           file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from      restart run with updated options and from the specified check-point ('ec', 'as', 'k', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset             file with dataset description in YAML format
-t/--threads               number of threads
                                [default: 16]
-m/--memory                RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir              directory for temporary files
                                [default: /tmp]
-k                 comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff             coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  <33 or 64>      PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
cap memory at 32 GB (SPAdes terminates if the limit is exceeded)
use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71
only run the assembler, not the read error correction step (for speed)
read 1 and read 2 of the MiSeq data
the nanopore data
put the output in the folder “hybrid_assembly”
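
Once a run like this finishes, a quick sanity check is to count the assembled sequences. The sketch below assumes the contigs end up in hybrid_assembly/contigs.fasta (SPAdes normally writes contigs.fasta and scaffolds.fasta into the output folder, but check your own run). The helper name count_fasta_records is made up for this example.

```shell
# count_fasta_records FILE: count sequences by counting FASTA header lines ('>').
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps a strict (set -e) shell from aborting in that case.
    grep -c '^>' "$1" || true
}

# Assumed location of the contigs produced by the command above.
contigs="hybrid_assembly/contigs.fasta"
if [ -f "$contigs" ]; then
    echo "contigs: $(count_fasta_records "$contigs")"
fi
```

The same helper works on scaffolds.fasta; comparing the two counts gives a rough idea of how much the long reads helped join contigs.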

A Brief Bioinformatics Tutorial

Jit — Wed, 21 May 2014 12:50:09 -0500

This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.

The tutorial is divided into three main parts:

In the Sequence part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.
The Structure part is about molecular visualization, homology modeling and structural domain prediction.
In the Function part, you will be introduced to 3 useful servers for investigating the function of a protein, i.e. finding interactors and co-expressed genes, seeing a phylogenetic profile, easily accessing papers citing your gene, etc.

Throughout all three parts, we will use the S. cerevisiae VPS36 protein as an example.

Address of the bookmark: http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.
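
For readers unfamiliar with the N50 figure quoted here: sort the contig lengths from longest to shortest and add them up until the running sum reaches half of the total assembly length; the length of the contig that gets you there is the N50. The sketch below is a generic illustration of that definition, not part of COPE, and the function names fasta_lengths and n50 are made up for the example.

```shell
# fasta_lengths FILE: print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
    awk '/^>/ { if (seen) print n; n = 0; seen = 1; next }
         { n += length($0) }
         END { if (seen) print n }' "$1"
}

# n50: read lengths (one per line) on stdin and print their N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # first length at which the running sum reaches half the total
                if (2 * running >= total) { print len[i]; exit }
            }
        }'
}

# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches it, so N50 = 70.
printf '%s\n' 80 70 50 40 30 20 | n50    # prints 70
```

On a real assembly this would be used as: fasta_lengths contigs.fasta | n50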

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

Bioinformatics JRF/SRF position at NII

Sun, 25 May 2014 16:54:04 -0500

NATIONAL INSTITUTE OF IMMUNOLOGY, NEW DELHI-110067

Applications are invited for the position of Senior Research Fellow for the following time-bound sponsored project as per the details given below:

1. BTIS project on, “Bioinformatics Center-National Infrastructural Facility in the Area of Immunology” funded by DBT

Senior Research Fellow (P) (One Position only)

Dr. Debasisa Mohanty
Staff Scientist-VI
deb@nii.res.in

Qualifications: M.Sc in Biological Sciences or Biotechnology with at least 04 years of Research experience in Bioinformatics or computational Biology after the master’s degree is essential.

Emoluments: The selected candidates will draw consolidated emoluments as per Institute Rules, depending upon qualifications & experience

Rs. 18,000/- per month consolidated plus 30% HRA if Leading to Ph.D/NET/GATE Qualified otherwise Rs. 14,000/- per month + 30% HRA.

Job description: The candidate should be well versed in programming in PERL/C++/HTML/CGI, web server and portal development, computational analysis of protein structure & function, molecular dynamics simulations and use of high performance computing systems.

GENERAL TERMS AND CONDITIONS:-

1. The candidates selected for the above posts will be on contract for one year or duration of the project whichever is shorter, at a time.
2. No hostel/ housing facility will be provided.
3. Number of posts may vary and shall be need based. Advertisement is no commitment.
4. Applicants may clearly mention the category they belong to i.e. SC/ST/OBC/PH and attach documentary proof of the same.
5. No TA/DA will be paid for attending the interview, if called for.
6. Apart from sending application in the prescribed format given below, candidates should send complete Curriculum Vitae along with the names of three referees. Curriculum Vitae should contain details of the experimental expertise.

HOW TO APPLY: Interested candidates may apply directly, STRICTLY IN THE PRESCRIBED FORMAT GIVEN BELOW, through e-mail, to the Investigator of the project, clearly indicating the name of the project along with their complete C.V., e-mail id, fax numbers, telephone numbers. Only short-listed candidates will be called for interview, and they are required to submit attested copies of all their certificates and a Demand Draft of Rs 100/- drawn on Canara Bank or Indian Bank payable at Delhi/New Delhi in favour of the Director, NII (SC / ST and PH candidates are exempted subject to submission of documentary proof), at the time of interview.

LAST DATE OF RECEIPT OF APPLICATIONS: 06th June, 2014

www1.nii.res.in/sites/default/files/projectappointment-Dr.Mohanty-6June2014.pdf

Tools for bacterial whole genome annotation

Radha Agarkar — Sat, 16 Dec 2017 17:37:47 -0600

RAST – Web tool (upload contigs), uses the subsystems in the SEED database and provides detailed annotation and pathway analysis. Takes several hours per genome but I think this is the best way to get a high quality annotation (if you have only a few genomes to annotate).

Prokka – Standalone command line tool, takes just a few minutes per genome. This is the best way to get good quality annotation in a flash, which is particularly useful if you have loads of genomes or need to annotate a pangenome or metagenome. Note however that the quality of functional information is not as good as RAST, and you will need several extra steps if you want to do functional profiling and pathway analysis of your genome(s), which are built into RAST.

NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

PGAP: NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology based methods. The first version of NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see Pubmed Article) developed in 2005 has been replaced with an upgraded version that is capable of processing a larger data volume. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

BEACON (automated tool for Bacterial GEnome Annotation ComparisON) is a fast tool for the automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.

BlastKOALA: Assigns K numbers to the user's sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be executed in an interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

PAGIT: Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT combines four tools: (i) ABACAS, which classifies and orients contigs and estimates the sizes of the gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within the scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which assists with annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.

MAKER: A portable and easily configurable genome annotation pipeline. MAKER allows smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER's inputs are minimal and its outputs can be directly loaded into a Generic Model Organism Database (GMOD). They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER is available for download and can be tested online via the MAKER Web Annotation Service (MWAS).

MyPro is a software pipeline for high-quality prokaryotic genome assembly and annotation. It was validated on 18 oral streptococcal strains to produce submission-ready, annotated draft genomes. MyPro installed as a virtual machine and supported by updated databases will enable biologists to perform quality prokaryotic genome assembly and annotation with ease.

Bioinformatics JRF vacancy at ICGEB, New Delhi

Wed, 23 Jul 2014 16:07:15 -0500

Junior Research Fellow for a DBT sponsored project entitled "Computational and experimental characterization of stage specific arginine methylation in P. falciparum proteome".

Candidates should have a 1st class MSc/MTech/BTech degree in Bioinformatics. Please send complete CV, quoting Application for RMETH-JRF-2014, by email to Dr. Dinesh Gupta: dinesh@icgeb.res.in

Closing date for applications: 6 August 2014

More at http://www.icgeb.org/tl_files/Vacancies/JRF.pdf

Linux Sort Commands for Bioinformatics

Rahul Nayak — Sat, 31 May 2014 15:41:16 -0500

Almost all scripting languages, such as Perl and Python, have a built-in sort, but unfortunately none of them is as flexible as the sort command. And when it comes to space efficiency, GNU sort stands at the top: it can sort a 20 GB file with less than 2 GB of memory. It is not trivial to implement so powerful a sort yourself.

sort a space-delimited file based on its first column, then the second if the first is the same, and so on:
sort input.txt

sort a huge file (GNU sort ONLY; -S sets the memory buffer size, -T the directory for temporary files):
sort -S 1500M -T $HOME/tmp input.txt > sorted.txt

sort starting from the third column, skipping the first two columns (the old "sort +2" syntax is obsolete and current GNU sort treats +2 as a file name):
sort -k3 input.txt

sort the second column as numbers, descending order; if identical, sort the 3rd as strings, ascending order:
sort -k2,2nr -k3,3 input.txt
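
As a worked example of the two-key sort, here is a minimal sketch on made-up data (gene name, score, label). The function name sort_by_score_then_label and the input lines are invented for illustration. LC_ALL=C is set so the string comparison on the third column is plain byte order regardless of locale.

```shell
# sort_by_score_then_label [FILE...]: sort on column 2 (numeric, descending),
# breaking ties on column 3 (string, ascending).
sort_by_score_then_label() {
    # -k2,2nr : the key is field 2 only, compared as a number, reversed
    # -k3,3   : ties are broken on field 3 only, compared as a string
    LC_ALL=C sort -k2,2nr -k3,3 "$@"
}

printf '%s\n' 'geneA 10 beta' 'geneB 2 alpha' 'geneC 10 alpha' 'geneD 7 gamma' |
    sort_by_score_then_label
# Output:
#   geneC 10 alpha
#   geneA 10 beta
#   geneD 7 gamma
#   geneB 2 alpha
```

Writing the keys as -k2,2 and -k3,3 (rather than -k2 and -k3) matters: without the end position, a key runs to the end of the line.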

sort starting from the 4th character at column 2, as numbers (the b modifier stops the blanks in front of the column from being counted as characters):
sort -k2.4bn input.txt
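
A typical use for a character offset is sorting chromosome names such as chr1, chr2, chr10 by their number. The sketch below uses made-up input and a made-up function name (sort_by_chr_number). It includes the b modifier because, with the default whitespace separator, the blanks in front of a column belong to that column and would otherwise be counted when locating the 4th character.

```shell
# sort_by_chr_number [FILE...]: sort on the number that follows "chr" in column 2.
sort_by_chr_number() {
    # 2.4 : start the key at field 2, character 4
    # b   : skip the blanks in front of the field before counting characters
    # n   : compare the key as a number
    LC_ALL=C sort -k2.4bn "$@"
}

printf '%s\n' 'x chr10' 'y chr2' 'z chr1' | sort_by_chr_number
# Output:
#   z chr1
#   y chr2
#   x chr10
```

With GNU sort, adding --debug to a sort command underlines the part of each line actually used as the key, which is a handy way to check an offset like this.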

More Linux sort command information

If you have any sort commands you'd like to share, please add them to our comments section below. For more help, you can also type:

man sort

or

sort --help

on your Unix/Linux system.