BOL: Related items

Blobsplorer

Jit — Tue, 14 Jun 2016 10:28:58 -0500

Blobsplorer is a tool for interactive visualization of assembled DNA sequence data ("contigs") derived from (often unintentionally) mixed-species pools. It allows the simultaneous display of GC content, coverage, and taxonomic annotation for collections of contigs with a view to separating out those belonging to different taxa.

Blobsplorer is unlikely to be of use on its own as it requires contig data to be supplied in a format that involves considerable preprocessing (see below for a description). The easiest way to use Blobsplorer is as part of a workflow using scripts from here.

Address of the bookmark: http://nematodes.org/martin/blobsplorer/blobsplorer.html

CNIDARIA: fast, reference-free phylogenomic clustering

Shruti Paniwala — Thu, 16 Jun 2016 17:55:17 -0500

Motivation: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but these do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.

Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% accuracy at supra-species level and 78% accuracy for species level.

Availability and Implementation: Cnidaria is written in C++ and Python and is available at http://www.ab.wur.nl/cnidaria.

Contact: Saulo Aflitos - sauloal@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Address of the bookmark: https://github.com/sauloal/cnidaria/wiki

Greengenes database

Jit — Wed, 29 Jun 2016 10:03:31 -0500

The greengenes web application provides access to the 2011 version of the greengenes 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading. The data and tools presented by greengenes can assist the researcher in choosing phylogenetically specific probes, interpreting microarray results, and aligning/annotating novel sequences. If you are an ARB user, you can use greengenes to keep your own local database current.

Address of the bookmark: http://greengenes.lbl.gov/cgi-bin/nph-index.cgi

SRF/ Project Assistant Bioinformatics at NIRRH

Sun, 19 Jun 2016 09:11:13 -0500

SRF/ Project Assistant Bioinformatics recruitment at the National Institute for Research in Reproductive Health (NIRRH)

Title of Project : 1. “Analysis Of The Structures Of Known Antimicrobial Peptides Using Machine Learning Algorithms And Molecular Dynamics Simulations”

Senior Research Fellow /1 Post

Qualification: First class M.Sc. in Bioinformatics/ Biological Sciences from a recognized university with 2 years research experience and CSIR/UGC/ICMR NET qualified OR First class M.Sc. in Bioinformatics/ Biological Sciences from a recognized university with 2 years research experience. Research experience in bioinformatics and wetlab methods.

Age: Not exceeding 35 Years

Pay Scale : Rs.18,000/- + 30% HRA Rs.14,000/- + 30% HRA

Project Assistant (Level-II) /1 Post

Qualification: First class M.Sc. in Bioinformatics/ Biological Sciences/Computer Sciences. Training experience in bioinformatics and wetlab methods.

Age: Not exceeding 28 Years

Pay Scale : Rs.8,000
How to apply
Candidates must bring along with them all the relevant documents in original and one set of attested photocopies of the same and one passport size recent colour photograph.

Walk-in-Interview on 28.06.2016 between 09:00 hrs. and 12:00 hrs.

More at http://www.nirrh.res.in/links/job_oppotunities.htm

Samtools Primer !!

Jit — Thu, 23 Jun 2016 07:18:17 -0500

SAMtools: Primer / Tutorial by Ethan Cerami, Ph.D.

keywords: samtools, next-gen, next-generation, sequencing, bowtie, sam, bam, primer, tutorial, how-to, introduction
Revisions

    1.0: May 30, 2013: First public release on biobits.org.
    1.1: July 24, 2013: Updated with Disqus Comments / Feedback section.
    1.2: December 19, 2014: Multiple updates, including:
        Updated to use samtools 1.1 and bcftools 1.2.
        Updated usage for bcftools.

About

SAMtools is a popular open-source tool used in next-generation sequence analysis. This primer provides an introduction to SAMtools, and is geared towards those new to next-generation sequence analysis. The primer is also designed to be self-contained and hands-on, meaning that you only need to install SAMtools, and no other tools, and sample data sets are provided. Terms in bold are also explained in the glossary at the end of the document.

Address of the bookmark: http://biobits.org/samtools_primer.html

NGS Glossary !!

Jit — Mon, 27 Jun 2016 08:56:18 -0500

alignment: the mapping of a raw sequence read to a location within a reference genome. The mapping occurs because the sequences within the raw read match or align to sequences within the reference genome. Alignment information is stored in the SAM or BAM file formats.

bcftools: a set of companion tools, currently bundled with SAMtools, for identifying and filtering genomic variants.

bowtie: widely used, open source alignment software for aligning raw sequence reads to a reference genome.

BAM Format: binary, compressed format for storing SAM data.

BCF Format: Binary call format. Binary, compressed format for storing VCF data.

CIGAR String: Compact Idiosyncratic Gapped Alignment Report. A compact string that (partially) summarizes the alignment of a raw sequence read to the reference genome. Three core abbreviations are used: M for alignment match; I for insertion; and D for Deletion. For example, a CIGAR string of 5M2I63M indicates that the first 5 base pairs of the read align to the reference, followed by 2 base pairs, which are unique to the read, and not in the reference genome, followed by an additional 63 base pairs of alignment.
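As a rough illustration of the CIGAR entry above, the following sketch splits a CIGAR string into (length, operation) pairs. It handles only the three core operations named in the entry (M, I, D); the function names are hypothetical and are not part of SAMtools.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples.

    Only the three core operations M, I and D are supported in this sketch.
    """
    pairs = re.findall(r"(\d+)([MID])", cigar)
    # Reject strings containing anything other than <number><M|I|D> units.
    if "".join(length + op for length, op in pairs) != cigar:
        raise ValueError("unsupported or malformed CIGAR string: " + cigar)
    return [(int(length), op) for length, op in pairs]

def read_length(cigar):
    """Bases of the read covered by the CIGAR (M and I consume read bases)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MI")

def reference_span(cigar):
    """Bases of the reference covered by the CIGAR (M and D consume reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MD")

print(parse_cigar("5M2I63M"))
print(read_length("5M2I63M"), reference_span("5M2I63M"))
```

For 5M2I63M the read covers 70 bases but spans only 68 bases of the reference, because the 2 inserted bases have no counterpart in the reference.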

FASTA Format: text format for storing raw sequence data. For example, the FASTA file at: http://www.ncbi.nlm.nih.gov/nuccore/NC_008253 contains the entire genome for Escherichia coli 536.

FASTQ Format: text format for storing raw sequence data along with quality scores for each base; usually generated by sequencing machines.

genotype likelihood: the probability that a specific genotype is present in the sample of interest. Genotype likelihoods are usually expressed as a Phred-scaled probability, where P = 10^(-Q/10). For example, if the score for genotype TT (both alleles are T) at position 1,299,132 in human chromosome 12 (reference G) is 37, this translates to a probability of 10^(-37/10) = 0.0001995, meaning that there is very low probability that the reads in your sample support a TT genotype. On the other hand, a genotype of AA at the same position with a score of 0 translates into a probability of 10^(-0/10) = 1, indicating extremely high probability that your sample contains a homozygous mutation of G to A.

mate-pair: in paired-end sequencing, both ends of a single DNA or RNA fragment are sequenced, but the intermediate region is not. The two ends which are sequenced form a pair, and are frequently referred to as mate-pairs.

QNAME: unique identifier of a raw sequence read (also known as the Query Name). Used in FASTQ and SAM files.

paired-end sequencing: sequencing process where both ends of a single DNA or RNA fragment are sequenced, but the intermediate region is not. Particularly useful for identifying structural rearrangements, including gene fusions.

Phred-scaled probability: a scaled value (Q) used to compactly summarize a probability, where P = 10^(-Q/10). For example, a Phred Q score of 10 translates to probability (P) = 10^(-10/10) = 0.1. Phred-scaled probabilities are common in next-generation sequencing, and are used to represent multiple types of quality metrics, including quality of base calls, quality of mappings, and probabilities associated with specific genotypes. The name Phred refers to the original Phred base-calling software, which first used and developed the scale.
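The conversion between a Phred score Q and a probability P can be sketched in a few lines. This is a minimal illustration of the formula in the entry above; the function names are made up for the example.

```python
import math

def phred_to_prob(q):
    """Convert a Phred-scaled value Q to a probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def prob_to_phred(p):
    """Convert a probability P (0 < P <= 1) back to Q = -10 * log10(P)."""
    return -10 * math.log10(p)

for q in (0, 10, 37):
    print(q, phred_to_prob(q))
```

The scores 10 and 37 reproduce the worked examples in this glossary (0.1 and roughly 0.0001995).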

Phred quality score: a score assigned to each base within a sequence, quantifying the probability that the base was called incorrectly. Scores use a Phred-scaled probability metric. For example, a Phred Q score of 10 translates to P = 10^(-10/10) = 0.1, indicating that the base has a 0.1 probability of being incorrect. Higher Phred scores correspond to higher accuracy. In the FASTQ format, Phred scores are represented as single ASCII letters. For details on translating between Phred scores and ASCII values, refer to Table 1 of this useful blog post from Damian Gregory Allis.
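The translation between ASCII letters and Phred scores can be sketched as below. The sketch assumes the common Phred+33 encoding (ASCII code minus 33); older Illumina files used an offset of 64, so the offset is left as a parameter. The function names are hypothetical.

```python
def decode_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores.

    Assumes each character encodes one score as chr(score + offset).
    """
    return [ord(ch) - offset for ch in qual_string]

def encode_quality(scores, offset=33):
    """Turn a list of Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)

print(decode_quality("!+5I"))
```

Under Phred+33, the characters !, +, 5 and I decode to the scores 0, 10, 20 and 40.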

read-length: the number of base pairs that are sequenced in an individual sequence read.

read-depth: the number of sequence reads that pile up at the same genomic location. For example, 30X read-depth coverage indicates that the genomic location is covered by 30 independent sequencing reads. Increased read-depth translates into higher confidence for calling genomic variants.

RNAME: reference genome identifier (also known as the Reference Name). Within a SAM formatted file, the RNAME identifies the reference genome where the raw read aligns.

SAM Flag: a single integer value (e.g. 16), which encodes multiple elements of meta-data regarding a read and its alignment. Elements include: whether the read is one part of a paired-end read, whether the read aligns to the genome, and whether the read aligns to the forward or reverse strand of the genome. A useful online utility decodes a single SAM flag value into plain English.
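Decoding a SAM flag amounts to testing individual bits of the integer. The sketch below uses the bit values defined in the SAM format specification; the descriptions are paraphrased and the function name is hypothetical.

```python
# Bit values as defined in the SAM format specification; wording is paraphrased.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

def decode_sam_flag(flag):
    """Return the descriptions of all bits that are set in a SAM flag."""
    return [desc for bit, desc in SAM_FLAG_BITS if flag & bit]

print(decode_sam_flag(16))
print(decode_sam_flag(99))
```

A flag of 16 has only the reverse-strand bit set, while 99 (1 + 2 + 32 + 64) describes the first read of a properly paired fragment whose mate maps to the reverse strand.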

SAM Format: Text file format for storing sequence alignments against a reference genome. See also BAM Format.

SAMtools: widely used, open source command line tool for manipulating SAM/BAM files. Includes options for converting, sorting, indexing and viewing SAM/BAM files. The SAMtools distribution also includes bcftools, a set of command line tools for identifying and filtering genomic variants. Created by Heng Li, currently of the Broad Institute.

single-read sequencing: sequencing process where only one end of a DNA or RNA fragment is sequenced. Contrast with paired-end sequencing.

VCF Format: Variant call format. Text file format for storing genomic variants, including single nucleotide polymorphisms, insertions, deletions and structural rearrangements. See also BCF format.

Next Generation Sequencing
A high-throughput sequencing method which parallelizes the sequencing process, producing thousands or millions of sequences at once.

Deep Sequencing
Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced.

Paired-End Sequencing
Sequence both ends of the same fragment and keep track of the paired data.

Adapter
Short oligonucleotides which are attached to the DNA to be sequenced. An adapter can provide a priming site for both amplification and sequencing of the adjoining, unknown nucleic acid.

Library
A collection of DNA fragments with adapters ligated to each end.

Bridge Amplification
Generation of in situ copies of a specific DNA molecule on an oligo-decorated solid support.

Emulsion PCR
A method for bead-based amplification of a library. A single adapter-bound fragment is attached to the surface of a bead, and an oil emulsion containing necessary amplification reagents is formed around the bead/fragment component. Parallel amplification of millions of beads with millions of single strand fragments produces a sequencer-ready library.

Alignment
Mapping of sequence reads to a known reference sequence.

Reference sequence/genome
A fully assembled version of a genome that can be used for mapping short DNA sequence reads for comparisons of genomes from various individuals.

Coverage Depth
The number of nucleotides from reads that are mapped to a given position of the reference genome.
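Coverage depth can be computed per position once reads have been aligned. The following is a minimal sketch that assumes each aligned read is given as a hypothetical (start, end) interval on the reference, 0-based with the end excluded; it is not how SAMtools itself computes depth.

```python
def coverage_depth(reads, ref_length):
    """Return the number of reads covering each reference position.

    reads: iterable of (start, end) intervals, 0-based, end-exclusive.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, ref_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the differences gives the depth at each position.
    depth = []
    running = 0
    for change in diff[:ref_length]:
        running += change
        depth.append(running)
    return depth

print(coverage_depth([(0, 5), (3, 8), (4, 6)], 10))
```

The difference-array approach touches each read once, so it stays fast even when many reads overlap the same region.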

Specificity
The percentage of sequences that map to the intended targets out of total bases per run.

Uniformity
The variability in sequence coverage across target regions.

Homopolymer
Uninterrupted stretch of a single nucleotide type (e.g., TTT or GGGGGG)

InDel
InDel stands for Insertion or deletion. A form of structural variation in which a DNA segment is either deleted or inserted.

SNP
SNP stands for Single Nucleotide Polymorphism. A single base difference found when comparing the same DNA sequence from two different individuals.

CSBB-v1.0

Neel — Wed, 29 Jun 2016 07:33:05 -0500

CSBB is a command-line bioinformatics suite for analyzing biological data acquired through a variety of biological experiments. CSBB is implemented in Perl and uses R and Python in the background for specific modules. Its main goal is to let users from the biology and bioinformatics community perform downstream analysis tasks without having to write code. CSBB is currently available on Linux, UNIX, Mac OS and Windows platforms.

Currently CSBB provides 13 modules focused on analytical tasks such as performing upper-quantile normalization on expression data or converting genome-wide gene expression to z-scores when comparing expression data from different platforms.
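For readers unfamiliar with z-scores, the sketch below shows one common way to standardize a list of expression values: subtract the mean and divide by the standard deviation. It is only an illustration of the general idea and is not taken from CSBB; the function name and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to z-scores: (value - mean) / standard deviation.

    Uses the population standard deviation; returns zeros if all values are equal.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because z-scores are unitless, values measured on different platforms can be placed on a comparable scale.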

More at https://github.com/skygenomics/CSBB-v1.0

Address of the bookmark: https://github.com/skygenomics/CSBB-v1.0

NIPGR Hires Research Associate, JRF, Laboratory Assistant

Mon, 04 Jul 2016 20:12:14 -0500

National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg - Delhi, Delhi
₹15,000 a month
National Institute of Plant Genome Research (NIPGR) invites applications for the vacant posts of Research Associate (RA), Junior Research Fellow (JRF) and Laboratory Assistant. Applications for these government jobs (Sarkari Naukri) can be submitted on or before 16 July 2016.
NIPGR Vacancy 2016 Details
1. Research Associate (RA)
Qualification: Ph.D. degree (awarded) in Molecular Biology/Biotechnology/Biochemistry/Plant Science/ Life Sciences/Bioinformatics or a related field, with 03 years of post-doctoral research experience or 02 research papers in journals of international repute. Experience in the areas of functional genomics, proteomics, metabolomics, multiomics and systems biology will be preferred.
Age Limit: As Per Rules
2. Junior Research Fellow (JRF)
Qualification: M.Sc. degree or equivalent in Biotechnology/Biochemistry/Plant Science or Botany/ Life Sciences/Bioinformatics/ Molecular Biology or any other related field. Experience in advanced multiomics, big data analysis, and molecular and systems biology techniques will be given preference.
Age Limit: As Per Rules
3. Laboratory Assistant
Qualification: B.Sc. degree with 05 years of working experience in a government R&D laboratory, assisting in the field of molecular biology and genomics.
Pay Scale: Rs.15000/- Per Month
Age Limit: As Per Rules
How to Apply : Duly filled-in applications in the prescribed application format, along with copies of the required documents, should reach: Dr. Subhra Chakraborty, Staff Scientist-VII, National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, P.O. Box No. 10531, New Delhi – 110067. The last date to submit applications is 16 July 2016.

Source: http://www.nipgr.res.in/careers/vacancies_latest.php#
Form at http://www.nipgr.res.in/files/careers/format_RA_JRF_LA.doc

WiseScaffolder

Poonam Mahapatra — Wed, 13 Jul 2016 08:08:57 -0500

Function

WiseScaffolder is a stand-alone, semi-automatic application for genome scaffolding of pre-assembled contigs using mate-pair data. It also produces editable scaffold maps, which can be used either to build gapped scaffolds or as a common thread for the manual improvement of scaffolds.

Description

WiseScaffolder includes 4 subcommands:

dumpconfig: generates a configuration file that notably specifies the average insert size of the mate-pair library.

preprocess: allows the detection and correction of chimerae and the estimation of contig copy number, and produces valuable outputs for the manual improvement of scaffolds.

scaffold: constitutes the central scaffold-builder and comprises two modules:

i) the interative_scaffold_extender, which works with big, unambiguous contigs or, when they run out, single-copy contigs, and

ii) the small_contig_inserter, which inserts the small contigs within scaffolds.

buildfasta: converts the scaffold map(s) into Fasta sequences.

Address of the bookmark: http://abims.sb-roscoff.fr/wisescaffolder

RA Bioinformatics at National Bureau of Fish Genetic Resources

Mon, 25 Jul 2016 03:14:06 -0500

F.No. 1(16)/2016-Admn. (DBT-BBSRC Project)
Research Associate /JRF Biotechnology Job vacancies in National Bureau of Fish Genetic Resources on contract basis

Research Associate /01 Post

Essential: Ph.D. in Bioinformatics or 03 years research experience after Post Graduation in Bioinformatics with at least one research paper in Science Citation Indexed (SCI) journals.

Desirable: The candidate should have at least 1st Division during Graduation and Post Graduation. Experience in assembly/ analysis/ annotation of genomic/transcriptomic data generated on next generation sequencing platforms and working knowledge of different genomic software tools. Publications in Relevant Field.

Pay Scale : Rs. 36,000/- +20% HRA

Age: 40 years for male and 45 years for female candidates, as on the date of interview

Junior Research Fellow /01 Post

Essential: Master's Degree in Biotechnology/Life Science with Specialization in Molecular Biology with NET qualification.

Desirable: Research Experience in Molecular Biology. 1st Division during Graduation as well as Post Graduation. Publications in Relevant Field.

Pay Scale: Rs. 25,000/-+ 20% HRA for 1st and 2nd year and Rs. 28,000/-+ 20% HRA for 3rd year

Age: 35 years for male and 40 years for female candidates, as on the date of interview.
How to apply
A walk-in-interview will be held on 26.07.2016 at 10:00 hrs. at ICAR-National Bureau of Fish Genetic Resources, Lucknow.

More at http://www.nbfgr.res.in/Recruitments.aspx