BOL: Related items

Introduction to phylogenies in R

Abhi — Wed, 13 Oct 2021 02:27:21 -0500

R phylogenetics is built on the contributed packages for phylogenetics in R, and there are many such packages. Let's begin today by installing a few critical packages, such as ape, phangorn, phytools, and geiger. To get the most recent CRAN version of these packages, you will need to have R 3.3.x installed on your computer!

Address of the bookmark: http://www.phytools.org/Cordoba2017/ex/2/Intro-to-phylogenies.html

How to sequence the human genome - Mark J. Kiel

Fri, 30 May 2014 13:24:11 -0500

View full lesson: http://ed.ted.com/lessons/how-to-sequence-the-human-genome-mark-j-kiel Your genome, every human's genome, consists of a unique DNA sequence of A's, T's, C's and G's that tell your cells how to operate. Thanks to technological advances, scientists are now able to know the sequence of letters that makes up an individual genome relatively quickly and inexpensively. Mark J. Kiel takes an in-depth look at the science behind the sequence. Lesson by Mark J. Kiel, animation by Marc Christoforidis.

Bioinformatics tools to explore SSRs in genomes !

BioStar — Tue, 07 Mar 2023 13:06:15 -0600

There are several bioinformatics tools that can be used to explore Simple Sequence Repeats (SSRs), which are also known as microsatellites. Here are a few examples:

MISA: MISA (MIcroSAtellite) is a web-based tool that can identify SSRs in DNA sequences. It can be used to analyze nucleotide sequences from various organisms and can identify perfect, compound, and imperfect SSRs.
SSR Locator: SSR Locator is a web-based tool that identifies SSRs in both DNA and RNA sequences. It can identify perfect, compound, and imperfect SSRs, and can also filter out low complexity regions.
SciRoKo: SciRoKo is a software tool that can identify SSRs in DNA sequences. It can be used to analyze genomic and transcriptomic sequences from various organisms and can identify perfect, compound, and imperfect SSRs.
Primer3: Primer3 is a web-based tool that designs PCR primers for SSRs. It can design primers for perfect and imperfect SSRs, and can be used to design primers for SSRs in various organisms.
QDD: QDD (Quick Detection of Duplication) is a software tool that can identify SSRs in DNA sequences and can also identify duplicate loci. It can be used to analyze genomic and transcriptomic sequences from various organisms.

These are just a few examples of the many bioinformatics tools available for exploring SSRs. Depending on your specific needs and research questions, you may find that other tools are more appropriate for your analysis.
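To make the idea of a perfect SSR concrete (an uninterrupted tandem repeat of a short motif), here is a minimal shell sketch that is not taken from any of the tools above. The function name find_dinuc_ssrs and the threshold of five repeats are assumptions chosen for illustration, and the pattern relies on back-references in extended regular expressions, which GNU grep supports as an extension. Note that it also reports mononucleotide runs such as AAAAAAAAAA, since "AA" counts as a 2-nt motif here.

```shell
# Hypothetical helper: print perfect SSRs made of a 2-nt motif
# repeated at least 5 times in a row (threshold is an assumption).
find_dinuc_ssrs() {
    # ([ACGT]{2}) captures the motif; \1{4,} requires 4 or more further copies.
    printf '%s\n' "$1" | grep -oE '([ACGT]{2})\1{4,}'
}

# Usage (toy sequence, not real data):
# find_dinuc_ssrs "GGCATATATATATATCCGTGAGAGAGAGAGATT"
```

For real genomes, a dedicated tool from the list above is the better choice, since a single regular expression does not handle compound or imperfect repeats.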

Genomics and Personalized Medicine

Sun, 01 Jun 2014 23:38:42 -0500

(October 20, 2009) Michael Snyder, Professor of Genetics and Chair of the Department of Genetics at Stanford, discusses advances in gene sequencing, the impact of genomics on medicine, the potential for personalized medicine, and efforts at Stanford to further study these issues. Stanford Mini Med School is a series arranged and directed by Stanford's School of Medicine, and presented by the Stanford Continuing Studies program. Featuring more than thirty distinguished faculty, scientists and physicians from Stanford's medical school, the series offers students a dynamic introduction to the world of human biology, health and disease, and the groundbreaking changes taking place in medical research and health care. Stanford University http://www.stanford.edu Stanford University School of Medicine http://med.stanford.edu Stanford Continuing Studies http://continuingstudies.stanford.edu Stanford University Channel on YouTube: http://www.youtube.com/stanford

Alignment-free sequence comparison tools available for next-generation sequencing data analysis

Abhimanyu Singh — Tue, 07 Nov 2017 05:33:33 -0600

kallisto

Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets)

Software (C++)

https://pachterlab.github.io/kallisto/

Sailfish

Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based)

Software (C++)

http://www.cs.cmu.edu/~ckingsf/software/sailfish/

Salmon

Quantification of the expression of transcripts using RNA-seq data (uses k-mers)

https://combine-lab.github.io/salmon/

RNA-Skim

RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mers)

Software (C++)

http://www.csbio.unc.edu/rs/

Variant calling

ChimeRScope

Fusion transcript prediction using gene k-mers profiles of the RNA-seq paired-end reads

Software (Java)

https://github.com/ChimeRScope/ChimeRScope/wiki

FastGT

Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers

Software (C)

https://github.com/bioinfo-ut/GenomeTester4/

Phy-Mer

Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based)

Software (Python)

https://github.com/danielnavarrogomez/phy-mer

LAVA

Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based)

Software (C)

http://lava.csail.mit.edu/

MICADo

Detection of mutations in targeted third-generation NGS data (can distinguish patients’ specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs)

Software (Python)

http://github.com/cbib/MICADo

General mapper

Minimap

Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mers)

Software (C)

https://github.com/lh3/minimap

Assembly

De novo genome assembly

MHAP

Produces highly continuous assembly (fully resolved chromosome arms) from third-generation long and noisy reads (10 kbp) using a dimensionality reduction technique MinHash

Software (Java)

https://github.com/marbl/MHAP

Miniasm

Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout Consensus (OLC) approach without the necessity of an error correction stage (uses minimap)

Software (C)

https://github.com/lh3/miniasm

LINKS

Scaffolding genome assembly with error-containing long sequence (e.g., ONT or PacBio reads, draft genomes)

Software (Perl)

https://github.com/warrenlr/LINKS/

Read clustering

afcluster

Clustering of reads from different genes and different species based on k-mer counts

Software (C++)

https://github.com/luscinius/afcluster

QCluster

Clustering of reads with alignment-free measures (k-mer based) and quality values

Software (C++)

http://www.dei.unipd.it/~ciompin/main/qcluster.html

Reads error correction

Lighter

Correction of sequencing errors in raw, whole genome sequencing reads (k-mer based)

Software (C++)

https://github.com/mourisl/Lighter

QuorUM

Error corrector for Illumina reads using k-mers

Software (C++)

https://github.com/gmarcais/Quorum

Trowel

Software (C++)

https://sourceforge.net/projects/trowel-ec/

Metagenomics

Assembly-free phylogenomics

AAF

Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based)

Software (Python)

https://github.com/fanhuan/AAF

kSNP v3

Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis)

Software (C)

https://sourceforge.net/projects/ksnp/files/

NGS-MC

Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words)

R package

http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html

Species identification/taxonomic profiling

CLARK

Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment

Software (C++)

http://clark.cs.ucr.edu/

FOCUS

Reports organisms present in metagenomic samples and profiles their abundances (uses composition-based approach and non-negative least squares for prediction)

Web service Software (Python)

http://edwards.sdsu.edu/FOCUS/

GSM

Estimation of abundances of microbial genomes in metagenomic samples (k-mer based)

Software (Go)

https://github.com/pdtrang/GSM

Mash

Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on MinHash dimensionality-reduction technique)

Software (C++)

https://github.com/marbl/mash

Kraken

Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database

Software (C++)

https://ccb.jhu.edu/software/kraken/

LMAT

Assignment of taxonomic labels to reads by k-mer searches in a precomputed database

Software (C++/Python)

https://sourceforge.net/projects/lmat/

stringMLST

k-mer-based tool for MLST directly from the genome sequencing reads

Software (Python)

http://jordan.biology.gatech.edu/page/software/stringMLST

Taxonomer

k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples

Web service

http://taxonomer.iobio.io/

Other

d2-tools

Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d2S measure) of metatranscriptomic samples from NGS reads

Software (Python/R)

https://code.google.com/p/d2-tools/

VirHostMatcher

Prediction of hosts from metagenomic viral sequences based on ONF using various distance measures (e.g., d2)

Software (C++)

https://github.com/jessieren/VirHostMatcher

MetaFast

Statistics calculation of metagenome sequences and the distances between them based on assembly using de Bruijn graphs and Bray–Curtis dissimilarity measure

Software (Java)

https://github.com/ctlab/metafast

Adhoc Bioinformatics Faculty Position @ NIT

Tue, 03 Jun 2014 16:19:52 -0500

NATIONAL INSTITUTE OF TECHNOLOGY, DEPARTMENT OF BIOTECHNOLOGY, WARANGAL – 506 021, Andhra Pradesh

No.NITW/BT/2014/adhoc

APPLICATIONS ARE INVITED FOR THE APPOINTMENT OF ADHOC FACULTY ON CONTRACT BASIS IN THE DEPARTMENT OF BIOTECHNOLOGY

Period of Contract: Initially the appointment is for one semester i.e., from July 2014 up to December 2014 only.

Essential Qualifications:

i) B. Tech or equivalent in Biotechnology/ Industrial Biotechnology/ Biochemical Engineering / Chemical Engg. Or M. Sc in Microbiology/ Botany/ Zoology/ Biochemistry/Biotechnology and ii) M. Tech or equivalent in Biotechnology/Industrial Biotechnology/Bioinformatics

Integrated M. Tech in Biotechnology/Industrial Biotechnology/ Bioinformatics

Candidates must possess First class (60% aggregate marks or 6.5 CGPA) at B. Tech/ M. Sc and M. Tech.

Desirable: Ph. D

Pay Package: All selected candidates shall be eligible for a consolidated pay of Rs. 30,000/- per month. Candidates with Ph. D shall be eligible for an additional amount of Rs. 5,000/- per month.

How to apply: Applications on plain paper with attested photocopies of certificates and bio data, along with justification for eligibility, should reach the Head, Department of Biotechnology, National Institute of Technology, Warangal AP 506004 in the form of soft or hard copy on or before 21st June 2014. Email: biotech_hod@nitw.ac.in

Intimation: No separate call letters will be sent to the candidates. All the eligible candidates will be notified on the institute web site on 23rd June 2014. All the eligible candidates are requested to report for the interview to the Head, Department of Biotechnology at 9:00 AM on 27th June 2014.

Joining: Selected candidates will be informed and they are expected to join immediately.

http://www.nitw.ac.in/nitw/announcements/2014/Bio-Adhoc%20Advt.%20May-2014.pdf

List of visualization tools for genome alignments

Rahul Nayak — Fri, 02 Feb 2018 13:25:33 -0600

Genome browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Their integration in analysis pipelines allows the optimization of parameters, which leads to better results. But sometimes we need publication-ready figures of genomes. The following is a list of genome alignment visualization tools that could be useful for the analysis and interpretation of results:

ABySS Explorer

Interactive Java application that uses a novel graph-based representation to display a sequence assembly and associated metadata

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

BamView

Genome browser and annotation tool that allows visualization of sequence features, next-generation sequencing (NGS) data and the results of analyses within the context of the sequence, and also its six-frame translation

http://www.sanger.ac.uk/resources/software/artemis/

DNannotator

Annotation web toolkit for regional genomic sequences

http://bioapp.psych.uic.edu/DNannotator.htm

JVM

Java Visual Mapping tool for NGS reads

http://www.springer.com/cda/content/document/cda_downloaddocument/9789401792448-c2.pdf?SGWID=0-0-45-1487072-p176815501

LookSeq

Web-based visualization of sequences derived from multiple sequencing technologies. Low- or high-depth read pileups and easy visualization of putative single nucleotide and structural variation

http://lookseq.sourceforge.net

MagicViewer

Visualization of short read alignment, identification of genetic variation and association with annotation information of a reference genome

http://bioinformatics.zj.cn/magicviewer/

MapView

Alignments of huge-scale single-end and pair-end short reads

http://omictools.com/mapview-s1367.html

MultiPipMaker

Computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a ‘percent identity plot’ (pip)

http://pipmaker.bx.psu.edu/pipmaker/

PileLineGUI

Handling genome position files in NGS studies

http://sing.ei.uvigo.es/pileline/pilelinegui.html

SAMtools tview

Simple and fast text alignment viewer; NGS compatible

http://www.htslib.org/

SEWAL

Uses a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run

http://www.sourceforge.net/projects/sewal

STAR

A web-based integrated solution to management and visualization of sequencing data

http://wanglab.ucsd.edu/star/browser

SVA

Software for annotating and visualizing sequenced human genomes

http://www.svaproject.org

Integrative Genomics Viewer (IGV)

Visualization of large heterogeneous datasets, providing a smooth and intuitive user experience at all levels of genome resolution

https://www.broadinstitute.org/igv/

ZOOM Lite

NGS data mapping and visualization software

http://bioinfor.com/zoom/lite/

Search Shell Command History

Rahul Nayak — Thu, 12 Jun 2014 17:43:34 -0500

We use a couple of hundred commands on a daily basis. Most of them are repeated several times. The question remains: how do I search old command history under the bash shell and modify or reuse it?

Nowadays almost all modern shells allow you to search the command history, if enabled by the user. Use the history command to display the history list with line numbers. Lines listed with a * have been modified by the user.

Shell history search command

Type history at a shell prompt:
$ history

It will display the list of all previously used commands, each with a serial number.

To search particular command, enter:
$ history | grep command-name
$ history | egrep -i 'scp|ssh|ftp'
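The history builtin is normally empty inside scripts, because non-interactive shells do not record history. As a minimal sketch, the same search can be run against the saved history file instead. The function name search_history is hypothetical, and the default file location assumes bash's usual $HISTFILE or ~/.bash_history; other shells store history elsewhere.

```shell
# Hypothetical helper: case-insensitive search of a saved history file.
# $1 = extended regex to look for
# $2 = history file (assumed default: $HISTFILE, else ~/.bash_history)
search_history() {
    pattern=$1
    file=${2:-${HISTFILE:-$HOME/.bash_history}}
    # -n prefixes each hit with its line number in the file.
    grep -n -i -E -- "$pattern" "$file"
}

# Usage:
# search_history 'scp|ssh|ftp'
```

The line numbers printed here are positions in the file, which need not match the numbers shown by the history command in a running session.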

Emacs Line-Edit Mode Command History Searching

To get previous command containing string, hit [CTRL]+[r] followed by search string:

(reverse-i-search):

To get previous command, hit [CTRL]+[p]. You can also use up arrow key.

CTRL-p

To get next command, hit [CTRL]+[n]. You can also use down arrow key.

CTRL-n

fc command

Apart from the history command, there is the fc command to extract commands from the history. The name fc stands for either "find command" or "fix command".

For example, to list the last 10 commands, enter:
$ fc -l -10
To list commands 130 through 150, enter:
$ fc -l 130 150
To list all commands since the last command beginning with ssh, enter:
$ fc -l ssh
You can edit commands 1 through 5 using vi text editor, enter:
$ fc -e vi 1 5

Delete command history

The -c option causes the history list to be cleared by deleting all of the entries:
$ history -c

Gap filling or Contigs extensions tools !

Rahul Nayak — Fri, 01 Jun 2018 08:07:32 -0500

There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also some tools like GAPresolution that can help to perform local re-assemblies using 454 reads. We used GAPresolution, but it is not very good software; it is useful only in some specific situations.

Take a look at the PRICE software from the DeRisi lab. It's meant to do something very similar. http://derisilab.ucsf.edu/index.php?page=software

You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).

See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/

In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41

Also, SOAPdenovo has a function for scaffolding. Not sure about ABySS.

Here there is a useful explanation of several tools.

https://bioinformaticsonline.com/search?q=scaffolding&entity_type=object&entity_subtype=bookmarks&offset=0&search_type=entities

I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:

1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.

2. Repeat:
a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.
b. Identify unpaired reads that align extending these new paired-end reads.
c. Construct a new assembly C' from C and the new reads identified in (a) and (b).
d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.
e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, Set S = S'.

3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.

4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.
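The loop in step 2 can be sketched as a fixed-point iteration: keep recruiting reads until a full pass adds nothing new. The toy below is only an illustration under strong assumptions. It uses a shared k-mer as a stand-in for real alignment, ignores read pairing, and skips the re-assembly and the 100 kb trimming. The function name recruit_reads, the one-sequence-per-line input files, and the default k of 8 are all hypothetical.

```shell
# Hypothetical helper: iteratively recruit reads that share a k-mer
# with the seed contigs or with any read recruited so far.
# $1 = file of seed contig sequences (one per line, must not be empty)
# $2 = file of reads (one per line), $3 = k (assumed default 8)
recruit_reads() {
    awk -v k="${3:-8}" '
        # Record every k-mer of s as "covered by the current set".
        function add_kmers(s,   i) {
            for (i = 1; i <= length(s) - k + 1; i++)
                kmers[substr(s, i, k)] = 1
        }
        # Return 1 if s shares at least one k-mer with the current set.
        function shares_kmer(s,   i) {
            for (i = 1; i <= length(s) - k + 1; i++)
                if (substr(s, i, k) in kmers)
                    return 1
            return 0
        }
        NR == FNR { add_kmers($0); next }   # first file: seed contigs
        { read[++n] = $0 }                  # second file: candidate reads
        END {
            do {                            # repeat until no read is added
                added = 0
                for (i = 1; i <= n; i++)
                    if (!(i in used) && shares_kmer(read[i])) {
                        used[i] = 1
                        add_kmers(read[i])
                        print read[i]
                        added++
                    }
            } while (added > 0)
        }' "$1" "$2"
}

# Usage:
# recruit_reads seeds.txt reads.txt 8
```

Reads are printed in the order they are recruited, so a read that only overlaps a previously recruited read shows up in a later pass. In practice the k-mer test would be replaced by a real aligner and the recruited set would be handed to an assembler after each pass.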

Bioinformatician’s Pocket Reference !!

RAJESH DETROJA — Sun, 08 Jun 2014 09:56:58 -0500

It is amusing how the brains of bioinformaticians work! Learning a new programming language for days feels so much more fun than making a 5-minute conversation with neighbours (unless under special circumstances!) in our own mother tongue. Today every bioinformatician keeps more than a few languages and core IT toolkits on their plate. It has become mandatory to be able to mould different code snippets to build our own custom workflows, and thus keeping syntax at our fingertips has become essential. Although Google is the best way to get a syntax problem solved, it is not a bad idea to keep reference sheets on our smartphones or stick some printed sheets on the back of the door, the old-fashioned way!

Address of the bookmark: http://infoplatter.wordpress.com/2014/04/06/bioinformaticians-pocket-reference/