BOL: Related items

2 PhD Position-Bioinformatics Austria

Sun, 09 Feb 2020 03:13:05 -0600

1 position as a University Assistant for 3 years, 30 hours per week, starting as
soon as possible, at the Institute of Biomedical Informatics at Graz University of
Technology

A position for a doctoral candidate is available in Leila Taher’s new lab at the Institute of
Biomedical Informatics at Graz University of Technology (Austria, https://www.bioinfo.tugraz.at).
We develop and apply regulatory genomics and systems biology approaches to analyze large
genomic datasets. Our long-term goal is to gain novel insights into the mechanisms and
evolution of differential gene expression.

Link:
https://www.tugraz.at/fileadmin/user_upload/tugrazExternal/1565e0f6-6c94-4077-a118-f84bc91c4b07/Stellenausschreibung_Bioinfo_FWF_Jan2020_EN.pdf

Perl One liner basics !!

Abhimanyu Singh — Sun, 24 May 2015 09:28:33 -0500

Perl has a ton of command line switches (see perldoc perlrun), but I'm just going to cover the ones you'll commonly need to debug code. The most important switch is -e, for execute (or maybe "engage" :) ). The -e switch takes a quoted string of Perl code and executes it. For example:

$ perl -e 'print "Hello, World!\n"'
Hello, World!

It's important that you use single-quotes to quote the code for -e. This usually means you can't use single-quotes within the one liner code. If you're using Windows cmd.exe or PowerShell, you must use double-quotes instead.

I'm always forgetting what Perl's predefined special variables do, and often test them at the command line with a one liner to see what they contain. For instance do you remember what $^O is?

$ perl -e 'print "$^O\n"'
linux

It's the operating system name. With that cleared up, let's see what else we can do. If you're using a relatively new Perl (5.10.0 or higher) you can use the -E switch instead of -e. This turns on some of Perl's newer features, like say, which prints a string and appends a newline to it. This saves typing and makes the code cleaner:

$ perl -E 'say "$^O"'
linux

Pretty handy! say is a nifty feature that you'll use again and again.

Pathway Analysis

Rahul Agarwal — Fri, 03 Oct 2014 08:51:13 -0500

Pathway analysis is usually performed to annotate genes with functional information and to reveal the underlying biological mechanisms in which they take part. It is not limited to identifying which biological pathways a particular set of expressed genes belongs to; it also discloses the relationships between these genes. As more genomics, transcriptomics and proteomics data become available, the interactions between genes involved in multiple pathways become clearer, as do the relationships between genes, their transcripts, and their gene products. However, existing tools and databases are mainly based on a knowledge-driven approach, in which pathways are identified by correlating the information in one of the pathway knowledge databases (KEGG, Reactome, Panther, BioCarta, GO, NCI, WikiPathways, etc.) with gene expression results for a specific condition, for instance tumors, obesity, or cold-resistant crops/plants.

Introductory Articles/ppt/sources:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002375

http://bioinformatics.mdanderson.org/MicroarrayCourse/Lectures09/Pathway%20Analysis.pdf

http://gettinggeneticsdone.blogspot.de/2012/03/pathway-analysis-for-high-throughput.html

http://davetang.org/muse/tag/pathway/

https://www.biostars.org/p/42219/

http://bioinformatics.ca//files/public/Pathways_2014_Module4_v2.pdf

http://bioinformatics.ca//files/public/Pathways_2014_Module2.pdf

Important Databases and Tools:

GeneMANIA, Cytoscape, IPA and MetaCore (commercial), Pathway Commons, Reactome, Panther, BioCyc, WikiPathways, PathVisio, KEGG, NCI, Stringdb, AmiGO, WebGestalt, ConsensusPathDB, GSEA, Blast2GO

Popular R based tools:

reactome.db, ReactomePA, clusterProfiler, gage, SPIA, topGO, pathview, DOSE, GOStat

More:

http://www.bioconductor.org/help/search/index.html?q=Enrichment+analysis+

Ryan E. Mills Lab

Tue, 26 May 2015 09:29:24 -0500

Our research group is primarily focused on the analysis of whole genome sequence data to identify genetic variation (primarily structural variation) and examine their potential functional impact in disease phenotypes. We are particularly interested in analyzing complex regions of the genome that are not easily resolved through modern sequencing approaches and which may exhibit interesting mechanistic origins.

We are also interested in the large-scale integration of genomic, expression, methylation and proteomic data sets, as well as the application of whole genome sequence analysis in clinical diagnostics.

More at http://millslab.ccmb.med.umich.edu/index.html

iSeqQC: a tool for expression-based quality control in RNA sequencing

BioStar — Sun, 16 Feb 2020 08:47:17 -0600

iSeqQC is an expression-based QC tool that detects outliers produced either by variable laboratory conditions or by dissimilarity within a phenotypic group. It implements several statistical approaches, including unsupervised clustering, agglomerative hierarchical clustering and correlation coefficients, to provide insight into outliers.

http://cancerwebpa.jefferson.edu/iSeqQC/

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3399-8

Address of the bookmark: https://github.com/gkumar09/iSeqQC

Rosenberg lab

Wed, 27 May 2015 17:52:24 -0500

Research in the lab focuses on mathematical, statistical, and computational problems in evolutionary biology and human genetics. Long-term interests of the lab include topics such as:

Human genetic variation
Inference of human evolutionary history from genetic markers
Statistical analysis of population-genetic data
Mathematical models of gene genealogies
Theoretical population genetics
Combinatorics of evolutionary trees
The relationship between gene trees and species trees
The role of human evolutionary genetics in the search for genes that contribute to disease-susceptibility
More at https://web.stanford.edu/group/rosenberglab/index.html

SEASTAR: Systematic Evaluation of Alternative STArt site in RNA

BioStar — Thu, 13 Aug 2020 09:54:27 -0500

SEASTAR (Systematic Evaluation of Alternative STArt site in RNA) is a software package for Transcription Start Site (TSS) identification and quantification using only RNA-seq data. It assembles novel TSSs based only on RNA-Seq data and merges them with known TSSs from a public database. This package enables high-quality TSS identification that is comparable to the highly sophisticated CAGE technology. This package is particularly useful for finding novel TSSs that contribute to transcriptome complexity along with identifying differential promoter utilization.

Version 1.0.0 updates several descriptions and tests. To obtain v0.9.4, visit https://github.com/zhyqin/SEASTAR-0.9.4 for download.

Address of the bookmark: https://github.com/Xinglab/SEASTAR

JRF Bioinformatics @ ICAR - National Research Centre for Orchids Pakyong

Thu, 28 May 2015 19:33:19 -0500

ICAR - National Research Centre for Orchids

Pakyong

F.No:NRCO/Admn/DBT /136 /

Walk-in interviews will be held at ICAR - National Research Centre for Orchids, Pakyong 737106, Sikkim, for the post of 01 (one) Junior Research Fellow under the project ‘DBT’s Twinning programme for the NE’ titled “Assessment of chemical and genetic divergence of some fragrant orchids of north-east India for sustainable improvement of community livelihood”, as indicated below. The appointment will be on a contractual basis and the incumbent shall not have any claim to regular appointment in ICAR.

‘DBT’s Twinning programme for the NE’ titled “Assessment of chemical and genetic divergence of some fragrant orchids of north-east India for sustainable improvement of community livelihood”

Junior Research Fellow (One post)

Essential Qualification : a. MSc (with NET qualification) / M.Tech degree (with or without NET) with minimum 55% marks in Biotechnology/ Bioinformatics/ Molecular Biology or any other related field.

Desirable Qualification: Computer Skills (Linux, Perl, Java, MySQL) with experience in advanced molecular Biology techniques

2nd June 2015

Advertisement: www.nrcorchids.nic.in/Employments/Vacancy%20-%20JRF.pdf

Understanding RNA-Seq Normalization Methods: TPM vs. FPKM vs. CPM

Neel — Wed, 11 Dec 2024 00:59:15 -0600

RNA sequencing (RNA-Seq) is a powerful technology used to study transcriptomes, providing insights into gene expression levels. However, raw RNA-Seq data requires normalization to account for sequencing depth and gene length, enabling accurate comparisons between genes and samples. Among the most widely used normalization methods are TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase Million), and CPM (Counts Per Million). Each method has its unique principles and applications, which we’ll explore in this blog.

Why Normalize RNA-Seq Data?

Normalization is a crucial step in RNA-Seq analysis for the following reasons:

Sequencing depth: Different RNA-Seq experiments produce varying numbers of reads, making direct comparisons between samples misleading.
Gene length: Longer genes inherently generate more reads, irrespective of their actual expression level.
Bias reduction: Normalization mitigates technical biases, enabling meaningful biological interpretation.

TPM (Transcripts Per Million)

TPM measures the proportion of reads mapped to a transcript, normalized by transcript length and sequencing depth. It is calculated as:
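A common way to write this, assuming $c_i$ is the number of reads (or fragments) mapped to transcript $i$ and $\ell_i$ is its length in kilobases (both symbols are notation introduced here, not taken from the original post):

```latex
\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_{j} c_j / \ell_j} \times 10^{6}
```

The denominator sums the length-normalized counts over all transcripts $j$ in the sample, which is why the TPM values of a sample always add up to one million.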

Key Features:

Proportionality: TPM values sum to 1,000,000 across all transcripts in a sample, making it easier to compare between samples.
Intuitive interpretation: TPM values directly represent the abundance of transcripts in a sample.
Preferred for comparisons: TPM facilitates between-sample comparisons better than FPKM.

FPKM (Fragments Per Kilobase Million)

FPKM normalizes read counts by transcript length and sequencing depth, but without enforcing proportionality like TPM. It is defined as:
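A common way to write this, assuming $c_i$ is the number of fragments mapped to transcript $i$, $\ell_i$ is its length in kilobases, and $N = \sum_j c_j$ is the total number of mapped fragments in the sample (notation introduced here for illustration):

```latex
\mathrm{FPKM}_i = \frac{c_i}{\ell_i \cdot N / 10^{6}} = \frac{c_i \times 10^{6}}{\ell_i \, N}
```

Because the scaling uses the raw library size $N$ rather than the sum of length-normalized counts, the FPKM values of a sample do not add up to a fixed total.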

Key Features:

Historical significance: FPKM was one of the first normalization methods used for RNA-Seq.
Single-end vs. paired-end: For single-end sequencing, FPKM is equivalent to RPKM (Reads Per Kilobase Million); for paired-end data, FPKM counts each fragment (read pair) once rather than each read.
Limited utility: FPKM values are not as robust as TPM for cross-sample comparisons due to lack of proportionality.

CPM (Counts Per Million)

CPM normalizes raw read counts by sequencing depth, without considering gene length. It is expressed as:
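A common way to write this, assuming $c_i$ is the read count for gene $i$ and $N = \sum_j c_j$ is the total number of mapped reads in the sample (notation introduced here for illustration):

```latex
\mathrm{CPM}_i = \frac{c_i}{N} \times 10^{6}
```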

Key Features:

Simplicity: CPM is straightforward and computationally less intensive.
Application: Suitable for analyses that do not depend on gene length, such as comparing the same gene across samples in differential expression analysis.
Gene length agnostic: CPM does not correct for gene length, making it less suitable for comparing expression levels between different genes within a sample.
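
To make the three definitions concrete, here is a minimal Perl sketch that computes CPM, FPKM and TPM for a made-up toy data set. The gene names, counts, lengths and the `normalize` helper are all hypothetical and only serve as illustration; real pipelines would take counts from a quantification tool.

```perl
use strict;
use warnings;

# Toy data (made up for illustration): read counts and lengths in kilobases.
my %count  = (geneA => 100, geneB => 200, geneC => 700);
my %len_kb = (geneA => 1,   geneB => 2,   geneC => 7);

# Hypothetical helper: returns a hash ref of gene => { cpm, fpkm, tpm }.
sub normalize {
    my ($count, $len_kb) = @_;
    my $total    = 0;    # total mapped reads (library size)
    my $rate_sum = 0;    # sum of length-normalized counts
    for my $g (keys %$count) {
        $total    += $count->{$g};
        $rate_sum += $count->{$g} / $len_kb->{$g};
    }
    my %out;
    for my $g (keys %$count) {
        my $rate = $count->{$g} / $len_kb->{$g};    # reads per kilobase
        $out{$g} = {
            cpm  => $count->{$g} / $total * 1e6,    # depth only
            fpkm => $rate / $total * 1e6,           # length, then depth
            tpm  => $rate / $rate_sum * 1e6,        # length, then proportion
        };
    }
    return \%out;
}

my $norm = normalize(\%count, \%len_kb);
for my $g (sort keys %$norm) {
    printf "%s\tCPM=%.1f\tFPKM=%.1f\tTPM=%.1f\n",
        $g, @{ $norm->{$g} }{qw(cpm fpkm tpm)};
}
```

In this toy set every gene has the same number of reads per kilobase, so FPKM and TPM come out identical across the three genes, while CPM grows with gene length.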

When to Use Each Method

TPM: Best for comparing expression levels between samples, especially when transcript length and sequencing depth vary.
FPKM: Useful for historical consistency but generally replaced by TPM.
CPM: Ideal for differential expression analysis when gene length normalization is unnecessary.

Conclusion

Choosing the right normalization method depends on the specific objectives of your RNA-Seq analysis. TPM’s proportionality and robustness make it the preferred choice for most applications, while CPM serves well for differential expression studies. Although FPKM paved the way for RNA-Seq normalization, it has largely been supplanted by TPM in modern workflows. Understanding these methods and their nuances ensures accurate and meaningful interpretations of RNA-Seq data.

References:

Li, B., & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics.
Trapnell, C., et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology.
Law, C. W., et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology.

Frequent words problem solution by Perl

Jit — Tue, 09 Jun 2015 23:38:44 -0500

Solved with perl http://rosalind.info/problems/1a/

# Find the most frequent k-mers in a string.
# Given: A DNA string Text and an integer k.
# Return: All most frequent k-mers in Text (in any order).

use strict;
use warnings;

my $string = "ACGTTGCATGTCGCATGATGCATGAGAGCT";
my $kmer   = 4;
my %myHash;     # k-mer => number of occurrences in $string
my $max = 0;    # highest count seen so far

# Slide a window of length $kmer over the string and count each k-mer once.
for (my $aa = 0; $aa <= length($string) - $kmer; $aa++) {
    my $myStr = substr $string, $aa, $kmer;
    next if exists $myHash{$myStr};    # this k-mer was already counted
    my $km = kmerMatch($string, $myStr, $kmer);
    $max = $km if $km > $max;
    $myHash{$myStr} = $km;
}

# Print every k-mer whose count equals the maximum.
foreach my $name (sort keys %myHash) {
    print "$name " if $myHash{$name} == $max;
}
print "\n";

# Count exact (possibly overlapping) matches of $myStr with a sliding window.
sub kmerMatch {
    my ($string, $myStr, $kmer) = @_;
    my $count = 0;
    for (my $aa = 0; $aa <= length($string) - $kmer; $aa++) {
        my $myWin = substr $string, $aa, $kmer;
        $count++ if $myWin eq $myStr;
    }
    return $count;
}
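
The same result can also be obtained in a single pass by letting a hash do the counting, which avoids rescanning the whole string for every window. The sketch below is one possible variant, not the original author's solution; the subroutine name `most_frequent_kmers` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the most frequent k-mers of $text,
# sorted alphabetically so the output order is deterministic.
sub most_frequent_kmers {
    my ($text, $k) = @_;
    my %count;
    # Count every window of length $k in one pass over the string.
    $count{ substr($text, $_, $k) }++ for 0 .. length($text) - $k;

    # Find the highest count among all k-mers.
    my $max = 0;
    for my $n (values %count) {
        $max = $n if $n > $max;
    }

    # Keep only the k-mers that reach the maximum.
    my @best = grep { $count{$_} == $max } keys %count;
    @best = sort @best;
    return @best;
}

# Prints "CATG GCAT" for the sample string used above.
print join(" ", most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)), "\n";
```

This version does work proportional to the length of the string, whereas calling a matching subroutine for each window grows quadratically with it.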