BOL: Related items

Data Visualization in Bioinformatics: Useful and Eye-Catching Plots for Data Analysis

LEGE — Sat, 14 Dec 2024 12:41:53 -0600

Data visualization is a cornerstone of bioinformatics, enabling researchers to interpret complex datasets effectively. With a plethora of data types—genomic sequences, expression profiles, protein interactions, and more—the right visualizations can make or break an analysis. This blog highlights some of the most useful and visually compelling plots for bioinformatics data analysis, along with tools to create them.

1. Heatmaps: Exploring Patterns in High-Dimensional Data

Heatmaps are a go-to visualization for representing high-dimensional datasets, such as gene expression or metabolomics data. They use color gradients to display data intensity, making patterns and clusters easily detectable.

Applications: Gene expression analysis, pathway enrichment, methylation studies.
Tools: Seaborn (Python), ComplexHeatmap (R), Morpheus (web-based).

Tip: Add dendrograms to visualize clustering of rows and columns for hierarchical relationships.

2. Volcano Plots: Highlighting Differential Features

Volcano plots are indispensable for identifying significantly differentially expressed genes or proteins. They plot the log2 fold change against –log10(p-value), making it easy to spot statistically significant changes.

Applications: RNA-seq, proteomics, and metabolomics.
Tools: ggplot2 (R), EnhancedVolcano (R), Plotly (Python).

Tip: Use color to highlight significant features and label key genes or proteins.
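
The two axes are simple transforms of per-feature statistics. As a minimal sketch, assuming made-up fold changes and p-values and illustrative cutoffs (|log2FC| >= 1 and p < 0.05, neither of which comes from this post), the plot coordinates and highlight classes could be computed like this:

```python
import math

# Hypothetical per-gene results: (gene, fold change, p-value).
results = [
    ("geneA", 4.0, 0.0001),
    ("geneB", 0.25, 0.001),
    ("geneC", 1.1, 0.5),
    ("geneD", 3.0, 0.2),
]

# Illustrative cutoffs (assumptions; tune them for your own study).
LFC_CUTOFF = 1.0
P_CUTOFF = 0.05


def volcano_point(fold_change, p_value):
    """Return (x, y, label) for one feature on a volcano plot."""
    x = math.log2(fold_change)   # x-axis: log2 fold change
    y = -math.log10(p_value)     # y-axis: -log10(p-value)
    if p_value < P_CUTOFF and x >= LFC_CUTOFF:
        label = "up"
    elif p_value < P_CUTOFF and x <= -LFC_CUTOFF:
        label = "down"
    else:
        label = "ns"             # not significant
    return x, y, label


points = {gene: volcano_point(fc, p) for gene, fc, p in results}
for gene, (x, y, label) in points.items():
    print(f"{gene}\tlog2FC={x:.2f}\t-log10(p)={y:.2f}\t{label}")
```

The label can then be mapped to a color in whichever plotting library you use. Note that geneD has a large fold change but stays "ns" because its p-value misses the cutoff.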

3. PCA Plots: Reducing Complexity with Principal Component Analysis

Principal Component Analysis (PCA) plots are used to reduce dimensionality and uncover trends or clusters in data. They provide insights into sample variability and grouping.

Applications: Transcriptomics, metabolomics, microbiome studies.
Tools: scikit-learn + Matplotlib (Python), prcomp (R), ClustVis (web-based).

Tip: Annotate clusters with metadata to enhance interpretability.

4. Manhattan Plots: Genome-Wide Association Studies

Manhattan plots visualize -log10(p-values) across the genome, making it easy to identify significant associations in genome-wide studies. They resemble city skylines, with the highest peaks indicating loci of interest.

Applications: GWAS, QTL mapping.
Tools: qqman (R), Matplotlib (Python).

Tip: Use alternating colors for chromosomes and highlight significant SNPs for clarity.
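
Before plotting, each SNP needs a position on one continuous x-axis, a -log10(p) height, a chromosome color and a significance flag. The sketch below uses made-up SNPs and assumes the conventional genome-wide threshold of 5e-8, which is not stated in this post:

```python
import math

# Hypothetical association results: (chromosome, base-pair position, p-value).
snps = [
    (1, 1_000, 0.2),
    (1, 5_000, 3e-9),
    (2, 2_000, 0.01),
    (2, 8_000, 1e-6),
    (3, 4_000, 0.5),
]

# Conventional genome-wide significance line (an assumption, not from the post).
GWAS_THRESHOLD = 5e-8

# Approximate each chromosome's length by its largest observed position.
chrom_max = {}
for chrom, pos, _ in snps:
    chrom_max[chrom] = max(chrom_max.get(chrom, 0), pos)

# Offset each chromosome by the summed lengths of the chromosomes before it,
# so that all SNPs share one continuous x-axis.
offsets, running = {}, 0
for chrom in sorted(chrom_max):
    offsets[chrom] = running
    running += chrom_max[chrom]

manhattan = [
    {
        "x": offsets[chrom] + pos,
        "y": -math.log10(p),
        "color": chrom % 2,                 # alternate two colors by chromosome
        "significant": p < GWAS_THRESHOLD,  # candidates for highlighting
    }
    for chrom, pos, p in snps
]
```

The resulting list can be passed to a scatter plot, with a horizontal line drawn at -log10(GWAS_THRESHOLD).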

5. Circular Plots (Circos): Visualizing Genomic Relationships

Circular plots are ideal for visualizing relationships across the genome, such as structural variations, gene duplications, or synteny.

Applications: Comparative genomics, structural variation studies.
Tools: Circos (standalone), RCircos (R), pyCircos (Python).

Tip: Keep the plot clean and avoid overcrowding to maintain readability.

6. Sankey Diagrams: Tracking Data Flows

Sankey diagrams visualize flows or relationships between categories, often used to track changes in gene expression or pathway enrichment across conditions.

Applications: Pathway analysis, gene set enrichment analysis.
Tools: Plotly (Python), networkD3 (R).

Tip: Use gradients or distinct colors to highlight key transitions.

7. Network Graphs: Mapping Interactions

Network graphs represent relationships between entities, such as protein-protein interactions or gene regulatory networks. Nodes represent entities, and edges represent relationships.

Applications: Systems biology, interactomics.
Tools: Cytoscape (standalone), igraph (R), NetworkX (Python).

Tip: Use edge thickness or node size to represent interaction strength or centrality.
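
As a rough illustration of that tip, the sketch below computes degree centrality from an edge list and maps it to node sizes, with interaction scores mapped to edge widths. The protein names, scores and scaling constants are all hypothetical:

```python
from collections import defaultdict

# Hypothetical interactions: (node A, node B, interaction score between 0 and 1).
edges = [
    ("protA", "protB", 0.9),
    ("protA", "protC", 0.7),
    ("protA", "protD", 0.8),
    ("protB", "protC", 0.4),
]

# Build an undirected adjacency map.
neighbors = defaultdict(set)
for a, b, _score in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree centrality: fraction of the other nodes each node is connected to.
n = len(neighbors)
centrality = {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Map centrality to a node size and interaction score to an edge width
# (the constants are arbitrary display choices).
node_size = {node: 100 + 400 * c for node, c in centrality.items()}
edge_width = {(a, b): 1 + 4 * score for a, b, score in edges}
```

Libraries such as NetworkX provide centrality measures directly; the plain-Python version is only meant to show what is being computed.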

8. Violin Plots: Visualizing Data Distribution

Violin plots combine a boxplot with a density plot, showing the distribution and variability of data.

Applications: Single-cell RNA-seq, quantitative trait analysis.
Tools: Seaborn (Python), ggplot2 (R).

Tip: Split violins by groups for side-by-side comparisons.

9. Time-Series Plots: Monitoring Changes Over Time

Time-series plots display changes in variables across time points, useful for tracking gene expression dynamics or metabolic fluxes.

Applications: Time-course experiments, cell cycle studies.
Tools: Matplotlib (Python), ggplot2 (R).

Tip: Smooth the data to highlight trends while avoiding overfitting.
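
One simple way to smooth a series is a centered moving average. This is only one of several options and is not prescribed by this post; the expression values below are made up:

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical expression values at consecutive time points.
expression = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
smoothed = moving_average(expression, window=3)
```

A wider window gives a smoother curve but can flatten genuine short-lived peaks, so it is worth plotting the raw points alongside the smoothed line.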

10. Genome Tracks: Visualizing Genomic Features

Genome tracks display multiple layers of genomic data, such as gene annotations, sequencing coverage, and epigenetic marks.

Applications: ChIP-seq, ATAC-seq, whole-genome sequencing.
Tools: IGV (standalone), pyGenomeTracks (Python).

Tip: Stack related tracks for direct comparisons.

11. UpSet Plots: Visualizing Set Intersections

UpSet plots are a powerful alternative to Venn diagrams for visualizing intersections between multiple datasets.

Applications: Overlap analysis for gene sets, pathways, or variants.
Tools: UpSetR (R), ComplexUpset (R), UpSetPlot (Python).

Tip: Use bar plots to represent the size of each intersection for added clarity.
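
The bars in an UpSet plot usually count elements that belong to exactly one combination of sets. A minimal sketch of that calculation, using placeholder gene sets, might look like this:

```python
from itertools import combinations

# Hypothetical gene sets (names are placeholders).
gene_sets = {
    "setA": {"g1", "g2", "g3", "g4"},
    "setB": {"g3", "g4", "g5"},
    "setC": {"g4", "g5", "g6"},
}


def exclusive_intersections(sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    names = sorted(sets)
    result = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Elements shared by every set in the combination...
            inside = set.intersection(*(sets[n] for n in combo))
            # ...minus anything that also appears in a set outside it.
            others = [sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            members = inside - outside
            if members:
                result[combo] = members
    return result


sizes = {combo: len(m) for combo, m in exclusive_intersections(gene_sets).items()}
```

Because every element falls into exactly one combination, the bar heights sum to the total number of distinct elements, which is a handy sanity check.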

12. Ridge Plots: Comparing Distributions

Ridge plots visualize the distributions of multiple datasets, stacked for easy comparison.

Applications: Transcriptomics, single-cell RNA-seq.
Tools: ggridges (R), Matplotlib (Python).

Tip: Use transparency and consistent scaling for better readability.

13. Chord Diagrams: Visualizing Connections Between Groups

Chord diagrams illustrate relationships between categories, such as shared genes between pathways or overlaps in regulatory elements.

Applications: Pathway overlap, synteny, co-expression networks.
Tools: circlize (R), HoloViews (Python).

Tip: Use distinct colors for each group to emphasize relationships.

14. Treemaps: Hierarchical Data Representation

Treemaps visualize hierarchical data as nested rectangles, with area proportional to data size.

Applications: Ontology enrichment, pathway analysis.
Tools: treemapify (R), Plotly (Python).

Tip: Use colors to represent additional variables, like significance or enrichment scores.

15. t-SNE/UMAP Plots: Dimensionality Reduction for Clustering

t-SNE and UMAP plots project high-dimensional data into two dimensions. t-SNE emphasizes local neighborhood structure, while UMAP tends to retain more of the global structure as well.

Applications: Single-cell transcriptomics, clustering analyses.
Tools: scikit-learn and umap-learn (Python), Seurat (R).

Tip: Combine with metadata annotations for better cluster interpretation.

Bringing It All Together

The choice of visualization can significantly impact the insights gained from bioinformatics data. By selecting plots tailored to your data type and analysis goals, you can effectively communicate your findings and make your research more impactful. Whether you’re a seasoned bioinformatician or a beginner, mastering these visualizations will elevate your analyses and presentations.

Assistant Professor at Jawaharlal Nehru University in Delhi

Wed, 07 May 2014 00:29:22 -0500

Advt. No. RC/48/2014

SCHOOL OF COMPUTATIONAL AND INTEGRATIVE SCIENCES (SC&IS)

ESSENTIAL QUALIFICATION : - M.Sc./M.Tech. in Physics/ Chemistry/ Biology/ Mathematics/ Statistics/ Bioinformatics/ Computational Biology. Ph.D. in the broad areas of Bioinformatics/ Computational Biology. Candidates must have demonstrated capabilities in terms of high impact research publications in either of the above mentioned areas.

Scale of Pay : - 15600-39100/- (PB-III) AGP Rs. 6000/-

For more details on Centre/School, Specializations etc. please visit JNU website www.jnu.ac.in or contact Section Officer, Room Nos. 131-132, Recruitment Cell, Administrative Block, JNU, New Delhi – 110067, Email: recruitmentjnu2013@gmail.com. The last date for the receipt of application is 15 May, 2014.

http://www.jnu.ac.in/Career/

http://www.jnu.ac.in/Career/ADVTNo_RC_48_2014.pdf
Last Apply Date: 15 May 2014

Installing BLAT on Linux !

BioStar — Tue, 11 Sep 2018 08:17:35 -0500

It's been a while since I last installed BLAT, and when I went to the download directory at UCSC (http://users.soe.ucsc.edu/~kent/src/) I found that the latest BLAT is now version 35 and that the file to download is blatSrc35.zip. You can also get pre-compiled binaries at http://hgdownload.cse.ucsc.edu/admin/exe/, and there is a Linux x86_64 executable for my architecture at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/. YMMV, though: BLAT can be a bit of a tricky beast to get going, so I decided to download the source code and compile that.

I will be compiling this code as 'root' as a system tool in /usr/local/src, so do not scream at me for that.

First I created an /usr/local/src/blat directory and I copied the blatSrc35.zip file into that.

Next I used

unzip blatSrc35.zip

to unpack the archive. This gives a directory called blatSrc; now move into that directory:

cd blatSrc

Before you begin, read the README file that comes with the source code.

One thing about building blat is that you need to set the MACHTYPE variable so that the BLAT sources know what type of machine you are compiling the software on.

On most *nix machines, typing

echo $MACHTYPE

will return the machine architecture type.

On my CentOS 6 based system this gave:

x86_64-redhat-linux-gnu

However, what BLAT requires is the 'short value' (i.e. the first part of the MACHTYPE). To correct this, type the following in the bash shell (change the value to the correct MACHTYPE for your system):

MACHTYPE=x86_64
export MACHTYPE

Now running the command:

echo $MACHTYPE

should give the correct short form of the MACHTYPE:

x86_64
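
If you script this step rather than typing it by hand, the short form is just the text before the first hyphen of the full value. A small Python sketch (the helper name is made up for illustration):

```python
def short_machtype(machtype):
    """Return the part of a MACHTYPE value before the first hyphen."""
    return machtype.split("-", 1)[0]


short = short_machtype("x86_64-redhat-linux-gnu")
print(short)  # x86_64
```

A value that is already short, such as x86_64, passes through unchanged.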

Now create the directory lib/$MACHTYPE in the source tree, i.e.:

mkdir -p lib/$MACHTYPE

For my machine, lib/x86_64 already existed, so I did not have to do this, but this is not the case for all architectures. (The -p flag makes the command harmless if the directory is already there.)

The BLAT code assumes that you are compiling BLAT as a non-privileged (i.e. non-root) user. As a result, you must create the directory for the executables to go into:

mkdir -p ~/bin/$MACHTYPE

If you are installing as a normal user, edit your .bashrc to add the following (change the x86_64 to be your MACHTYPE):

export PATH=~/bin/x86_64:$PATH

For me, though, this was not good enough. I wanted the executables in /usr/local/bin where all my other code goes. As a result I did some hackery...

There is a master make template in the inc directory called common.mk and I edited this file with the command:

vi inc/common.mk

I replaced the line

    BINDIR=${HOME}/bin/${MACHTYPE}

with

    BINDIR=/usr/local/bin

then saved and quit. As /usr/local/bin is already in my PATH, I did not need to do anything else.

All the preparation is now done and you can create the blat executables by going into the toplevel of the blat source tree (for me it was /usr/local/src/blat/blatSrc, but change to wherever you unpacked blat into).

Now simply run the command:

make

to compile the code.

BLAT installed cleanly and the executables were all neatly placed in /usr/local/bin, just like I wanted.

Now simply running the command:

blat

on the command line gives me information on blat and sample usage.

Blat is installed and it's installed properly in my system code tree!!!

Memories Can Be Passed Down Through DNA

Sat, 10 May 2014 21:24:10 -0500

The premise of Assassin's Creed is the reliving of other people's memories stored inside DNA. Well, scientists have found that in mice, it actually happens! Anthony is joined by special guest and our friend Tara Long from Hard Science to explain how this process works, and if it might apply to humans as well.

Read More:

Parental olfactory experience influences behavior and neural structure in subsequent generations
http://www.nature.com/neuro/journal/vaop/ncurrent/abs/nn.3594.html
"Using olfactory molecular specificity, we examined the inheritance of parental traumatic exposure, a phenomenon that has been frequently observed, but not understood."

What Is Epigenetics?
http://www.sciencemag.org/content/330/6004/611
"The cells in a multicellular organism have nominally identical DNA sequences (and therefore the same genetic instruction sets), yet maintain different terminal phenotypes. This nongenetic cellular memory, which records developmental and environmental cues (and alternative cell states in unicellular organisms), is the basis of epi-(above)-genetics."

Epigenetics
http://en.wikipedia.org/wiki/Epigenetics

Watch More:

How to Change Your Genes
https://www.youtube.com/watch?v=B5DU9lgbsSE

TestTube Wild Card
http://testtube.com/dnews/dnews-231-how-too-many-screens-affect-our-brain?utm_source=YT&utm_medium=DNews&utm_campaign=DNWC

Is Sexiness Hereditary?
https://www.youtube.com/watch?v=z6STRCncvM8

Bioinformatics JRF/SRF position at NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY

Sun, 11 May 2014 22:29:12 -0500

NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY
LBS CENTRE, PUSA CAMPUS, IARI, NEW DELHI
NEW DELHI – 110 012

WALK-IN INTERVIEWS

Eligible candidates may appear in Walk-in-Interview on May 23, 2014 at 10 AM for the posts of Research Associates & Senior Research Fellows (SRF) in the following DST/DBT/ICAR funded projects.

1 NPTC Project on Bioinformatics and Comparative Genomics

Research Associate (One)

Rs. 24000/- + 30% HRA for masters degree holder with more than 4 years experience

Essential: Ph.D. in Plant Molecular Biology & Biotechnology/Genetics, or candidates who have already submitted their Ph.D. thesis in the above subjects

Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.

Senior Research Fellow (UGC-CSIR/DBT/ICAR NET qualified only): (One)

Rs. 16000/- + 30% HRA and Rs. 18000/- + 30% HRA from 3rd year onwards

Essential:

1. ICAR/UGC-CSIR/DBT NET qualified only

2. M.Sc. (with thesis) in Biotechnology, Life Sciences, Biosciences/Bioinformatics, Genetics/Plant Pathology with experience in molecular biology.

Or M.Sc. with more than 3 years of research experience

3. B.Sc. Agriculture or Biology

Desirable:
1. M.Sc. with thesis
2. Experience in molecular biology, plant tissue culture
3. Bioinformatics knowledge is important

2 DST JC Bose National Fellowship

Research Associate (Bioinformatics) : One

Rs. 22000/- + 30% HRA for 1st & 2nd year, Rs. 23000/- + 30% HRA for 3rd year and Rs. 24000/- + 30% HRA for 4th & 5th year

Essential: M Ph D in Plant Molecular Biology & Biotechnology/Genetics

Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.

Age limit: Max. 35 years (age relaxation of 5 years for SC/ST & women and 3 years for OBC)

The posts are purely temporary in nature and are co-terminous with the project. Initially the offer will be made for one year only and may be extended based on the performance of the candidate. The interview will be held on May 23, 2014 at 10:00 AM at NRCPB, LBS Building, Pusa Campus, IARI, New Delhi - 110012. Candidates must bring four copies of their biodata (in the prescribed proforma), original certificates, attested photocopies of each certificate and an attested copy of a recent passport-size photograph. No TA/DA will be given for appearing in the interview. Only candidates with the essential qualifications will be entertained for the interviews. In case of a large number of applicants, candidates will be short-listed based on academic merit and experience.

Advertisement: http://www.nrcpb.org/sites/default/files/Advertisement%20for%20RA%20and%20SRF%20Position.pdf

A History of Bioinformatics (in the Year 2039)

Wed, 23 Jul 2014 06:37:51 -0500

C. Titus Brown http://video.open-bio.org/video/1/a-history-of-bioinformatics-in-the-year-2039

New born babies get ready to know their whole genome soon!!!

Rahul Agarwal — Thu, 05 Sep 2013 07:24:02 -0500

The USA has launched pilot projects to examine the medical information of newborn babies. The projects are funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Human Genome Research Institute (NHGRI), both parts of the National Institutes of Health.

Awards of $5 million to four grantees have been made in fiscal year 2013 under the Genomic Sequencing and Newborn Screening Disorders research program. The program will be funded at $25 million over five years, as funds are made available.

"Hundreds of US babies will be pioneers in genomic medicine through a US$25-million programme to sequence their genomes soon after they are born."

Source:

http://blogs.nature.com/news/2013/09/scientists-to-sequence-hundreds-of-newborns-genomes.html

http://www.genome.gov/27554919

Scientists map 17,294 proteins produced in human body

Jit — Thu, 29 May 2014 01:57:55 -0500

Indian scientists missed the genomic profiling bus, but they've more than made up for it by creating the first human proteome map, which is an extension of the genomic study. Until now, there was no direct equivalent for the human proteome. Recently, however, two groups presented mass spectrometry-based analyses of human tissues, body fluids and cells that map the large majority of the human proteome.

The Indian scientists working in Bangalore, along with their American counterparts, have mapped more than 17,000 proteins in 30 organs of the human body. Just like the human genome was sequenced around the turn of the millennium, this is an equivalent mapping of the human proteome.

The researchers estimated that there are around 20,500 proteins in the human body. These scientists have profiled 17,294, which accounts for around 84% of the total. Apart from this, the team also traced around 2,500 of the 3,000 proteins that had been categorised as "missing proteins".

The work, done by a group of Indian scientists together with Johns Hopkins University, was published in the renowned journal Nature ( http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html ). Of the 72 people who worked on the project, 46 are Indians.

Reference:

http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html

http://www.proteinatlas.org/ -The antibody-based Human Protein Atlas programme

http://www.humanproteomemap.org/ -Proteogenomic analysis by identifying translated proteins from annotated pseudogenes, non-coding RNAs and untranslated regions.

https://www.proteomicsdb.org/ -Assembled protein evidence for 18,097 genes in ProteomicsDB

GOLD:Genomes Online Database

Jit — Wed, 26 Jul 2017 07:49:29 -0500

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.

https://gold.jgi.doe.gov/


SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly. SPAdes was used to produce the B. fragilis assembly by Mick Watson’s group.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>                 directory to store all the resulting files (required)
--sc                            this flag is required for MDA (single-cell) data
--meta                          this flag is required for metagenomic sample data
--rna                           this flag is required for RNA-Seq data
--plasmid                       runs plasmidSPAdes pipeline for plasmid detection
--iontorrent                    this flag is required for IonTorrent data
--test                          runs SPAdes on toy dataset
-h/--help                       prints this usage message
-v/--version                    prints version

Input data:
--12 <filename>                 file with interlaced forward and reverse paired-end reads
-1 <filename>                   file with forward paired-end reads
-2 <filename>                   file with reverse paired-end reads
-s <filename>                   file with unpaired reads
--pe<#>-12 <filename>           file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>            file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>            file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>            file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                    orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>               file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>           file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>            file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>            file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>            file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                    orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>         file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>          file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>          file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>          file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                  orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>        file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>        file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>             file with Sanger reads
--pacbio <filename>             file with PacBio reads
--nanopore <filename>           file with Nanopore reads
--tslr <filename>               file with TSLR-contigs
--trusted-contigs <filename>    file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction         runs only read error correction (without assembling)
--only-assembler                runs only assembling (without read error correction)
--careful                       tries to reduce number of mismatches and short indels
--continue                      continue run from the last available check-point
--restart-from <cp>             restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output           forces error correction not to compress the corrected reads
--disable-rr                    disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: /tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
max memory is 32 Gb
use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71
only run the assembler, not the correction algorithm (for speed)
read 1 and read 2 of the MiSeq data
the nanopore data
put the output in folder “hybrid_assembly”
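
If you want to launch the same run from a script, one possible sketch is to build the argument list in Python and check the k-mer rule from the usage text (odd and less than 128) before calling SPAdes. The helper names below are made up for illustration; only the flags shown in the command above are used:

```python
def validate_kmers(kmers):
    """Check the constraint from the usage text: k-mer sizes must be odd and below 128."""
    for k in kmers:
        if k % 2 == 0 or k >= 128:
            raise ValueError(f"invalid k-mer size: {k}")
    return kmers


def build_spades_command(r1, r2, nanopore, outdir, threads=4, memory_gb=32,
                         kmers=(31, 51, 71)):
    """Assemble the argument list for the hybrid assembly run shown above."""
    validate_kmers(kmers)
    return [
        "spades.py",
        "-t", str(threads),
        "-m", str(memory_gb),
        "-k", ",".join(str(k) for k in kmers),
        "--only-assembler",
        "-1", r1, "-2", r2,
        "--nanopore", nanopore,
        "-o", outdir,
    ]


cmd = build_spades_command("miseq.1.fastq", "miseq.2.fastq",
                           "minion.fastq", "hybrid_assembly")
print(" ".join(cmd))
# To actually run the assembly (assumes spades.py is on your PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one string avoids shell quoting problems if a file name contains spaces.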