BOL: Related items

Bioinformatics JRF/SRF position at NII

Sun, 25 May 2014 16:54:04 -0500

NATIONAL INSTITUTE OF IMMUNOLOGY, NEW DELHI-110067

Applications are invited for the position of Senior Research Fellow for the following time-bound sponsored project as per the details given below:

1. BTIS project on, “Bioinformatics Center-National Infrastructural Facility in the Area of Immunology” funded by DBT

Senior Research Fellow (P) (One Position only)

Dr. Debasisa Mohanty
Staff Scientist-VI
deb@nii.res.in

Qualifications: M.Sc in Biological Sciences or Biotechnology with at least 04 years of Research experience in Bioinformatics or computational Biology after the master’s degree is essential.

Emoluments: The selected candidates will draw consolidated emoluments as per Institute Rules, depending upon qualifications & experience

Rs. 18,000/- per month consolidated plus 30% HRA if Leading to Ph.D/NET/GATE Qualified otherwise Rs. 14,000/- per month + 30% HRA.

Job description: The candidate should be well versed in programming in PERL/C++/HTML/CGI, web server and portal development, computational analysis of protein structure & function, molecular dynamics simulations and use of high performance computing systems.

GENERAL TERMS AND CONDITIONS:-

1. The candidates selected for the above posts will be on contract for one year or duration of the project whichever is shorter, at a time.
2. No hostel/ housing facility will be provided.
3. Number of posts may vary and shall be need based. Advertisement is no commitment.
4. Applicants may clearly mention the category they belong to i.e. SC/ST/OBC/PH and attach documentary proof of the same.
5. No TA/DA will be paid for attending the interview, if called for.
6. Apart from sending application in the prescribed format given below, candidates should send complete Curriculum Vitae along with the names of three referees. Curriculum Vitae should contain details of the experimental expertise.

HOW TO APPLY: Interested candidates may apply directly, STRICTLY IN THE PRESCRIBED FORMAT GIVEN BELOW, through e-mail, to the Investigator of the project, clearly indicating the name of the project along with their complete C.V., e-mail id, fax numbers and telephone numbers. Only shortlisted candidates will be called for interview, and they are required to submit attested copies of all their certificates and a Demand Draft of Rs 100/- drawn on Canara Bank or Indian Bank payable at Delhi/New Delhi in favour of the Director, NII (SC / ST and PH candidates are exempted subject to submission of documentary proof), at the time of interview.

LAST DATE OF RECEIPT OF APPLICATIONS: 06th June, 2014

www1.nii.res.in/sites/default/files/projectappointment-Dr.Mohanty-6June2014.pdf

Genomicus: genome browser that enables users to navigate in genomes in several dimensions

Jit — Mon, 28 Feb 2022 23:27:37 -0600

Genomicus is a genome browser that enables users to navigate in genomes in several dimensions: linearly along chromosome axes, transversally across different species, and chronologically along evolutionary time.

Once a query gene has been entered, it is displayed in its genomic context in parallel to the genomic context of all its orthologous and paralogous copies in all the other sequenced metazoan genomes. Moreover, Genomicus stores and displays the predicted ancestral genome structure in all the ancestral species within the phylogenetic range of interest.

All the data on extant species displayed in this browser are from Ensembl.

Summary statistics of Genomicus version 105.01: (view species tree in pdf or newick)


Number of extant species	200
Number of extant genes	4303993
Number of ancestral species	196
Number of ancestral genes	4624213
Number of ancestral synteny blocks	83342

Address of the bookmark: https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/cgi-bin/search.pl

Bioinformatics JRF vacancy at ICGEB, New Delhi

Wed, 23 Jul 2014 16:07:15 -0500

Junior Research Fellow for a DBT sponsored project entitled "Computational and experimental characterization of stage specific arginine methylation in P. falciparum proteome".

Candidates should have a 1st class MSc/MTech/BTech degree in Bioinformatics. Please send complete CV, quoting Application for RMETH-JRF-2014, by email to Dr. Dinesh Gupta: dinesh@icgeb.res.in

Closing date for applications: 6 August 2014

More at http://www.icgeb.org/tl_files/Vacancies/JRF.pdf

Π-cyc: A Reference-free SNP Discovery Application using Parallel Graph Search

Jit — Tue, 28 Jan 2020 03:34:23 -0600

Reference-free SNP search for comparative population genomics: multiple samples are run simultaneously. Note: experimental phase; compiles and runs with OpenMPI-1.8.8 and the Intel compiler only.

Cycle enumeration (also known as bubble detection) as part of de novo de Bruijn graph assembly using colours can be impractical for large, error-prone genomes, because the assembly process then produces an excessive number of false-positive cycles. Our solution is to search the graph in multicore shared-memory parallel mode using graph decomposition, then apply a filtering method to generate good-quality SNPs.

https://arxiv.org/abs/1809.06700

https://github.com/redayounsi/2KP2P

/2kp2omp/bin/main_2kp2_K63_C2 -i fastq_files.txt -o fungus_bub.fasta -r stat_fungus.txt -c cov_fungus_hash.txt -k 63 -h 20 -b 100 -g 600 -l 100 -f 16 -t 5.0 -x 1 -v 0 -p 1 -y 1 -u 1

Address of the bookmark: https://github.com/redayounsi/2KP2P

Linux Sort Commands for Bioinformatics

Rahul Nayak — Sat, 31 May 2014 15:41:16 -0500

Almost all scripting languages, such as Perl and Python, have a built-in sort, but none of them is as flexible as the sort command. When it comes to space efficiency, GNU sort stands at the top: it can sort a 20 GB file with less than 2 GB of memory. It is not trivial to implement so powerful a sort yourself.

sort a space-delimited file based on its first column, then the second if the first is the same, and so on:
sort input.txt

sort a huge file (GNU sort ONLY), using a 1500 MB memory buffer and $HOME/tmp for temporary files:
sort -S 1500M -T $HOME/tmp input.txt > sorted.txt

sort starting from the third column, skipping the first two columns:
sort -k3 input.txt

sort the second column as numbers, descending order; if identical, sort the 3rd as strings, ascending order:
sort -k2,2nr -k3,3 input.txt

sort starting from the 4th character at column 2, as numbers:
sort -k2.4n input.txt
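As a small worked illustration of the multi-key command above, the sketch below sorts a few made-up rows (the gene names, counts and chromosome labels are hypothetical sample data, not taken from any real file). LC_ALL=C is an added assumption so that the string comparison on the third column does not depend on the local collation settings.

```shell
# Hypothetical space-delimited rows: name, read count, chromosome.
# Column 2 is sorted as numbers in descending order; ties are broken by
# column 3 sorted as strings in ascending order.
sorted=$(printf '%s\n' \
  'geneA 10 chr2' \
  'geneB 200 chr1' \
  'geneC 10 chr1' \
  'geneD 9 chr3' | LC_ALL=C sort -k2,2nr -k3,3)

printf '%s\n' "$sorted"
```

The row with 200 comes first, the two rows with a count of 10 follow in chr1, chr2 order, and the row with 9 comes last.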

More Linux sort command information

If you have any sort commands you'd like to share, please add them to our comments section below. For more help, you can also type:

man sort

or

sort --help

on your Unix/Linux system.

List of motif discovery tools !

Neel — Tue, 20 Nov 2018 03:54:26 -0600

In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance. For proteins, a sequence motif is distinguished from a structural motif, a motif formed by the three-dimensional arrangement of amino acids which may not be adjacent.
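To make the idea of a sequence motif concrete, here is a minimal sketch that scans a protein sequence for the N-glycosylation sequon (PROSITE pattern N-{P}-[ST]-{P}) written as a regular expression. The sequence is a hypothetical string made up for illustration. This is plain pattern scanning with grep, not one of the discovery tools listed in this post.

```shell
# Hypothetical protein sequence, used only for illustration.
seq='MKNASLLNPTQANGTW'

# N-glycosylation sequon as an extended regular expression:
# Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# grep -o prints each non-overlapping match on its own line.
hits=$(printf '%s\n' "$seq" | LC_ALL=C grep -oE 'N[^P][ST][^P]')

printf '%s\n' "$hits"
```

This prints NASL and NGTW. The stretch NPTQ is not reported, because a proline directly follows the asparagine.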

The following is a list of tools for motif discovery:

2Dsweep -- protein annotation by secondary structure elements

Perform secondary structure predictions on protein sequences.

3D-footprint -- database of DNA-binding protein structures

Find binding specificity information about DNA-protein complexes.

3D-footprint: DNA-binding protein database

Find information about the binding specificity of DNA-binding proteins.

3D-partner -- a web server to infer interacting partners and binding models

Predict interacting partners and binding models.

3MOTIF -- a protein structure visualization system for conserved sequence motifs

Use this web-based sequence motif visualization system to display sequence motif information in its appropriate three-dimensional (3D) context.

AFAWE -- Automatic functional annotation in a distributed Web Services Environment

Protein function prediction and annotation in an integrated environment powered by web service.

ANCHOR -- Prediction of Protein Binding Regions in Disordered Proteins

Find information about protein binding.

ANNIE -- ANNotation and Interpretation Environment for Protein Sequences

Use to predict function from de novo protein sequences.

Active Sequences Collection (ASC) database -- A new tool to assign functions to protein sequences

Search for short active protein sequences with demonstrated biological activities.

Blocks -- Ungapped segments in conserved protein sequences

Search for ungapped segments corresponding to the most highly conserved regions of proteins.

CASTp -- computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues

Identify and measure surface accessible pockets as well as interior inaccessible cavities, for proteins and other molecules.

CSA -- The Catalytic Site Atlas

Search for catalytic residue annotation for enzymes in the Protein Data Bank.

ConFunc -- Conserved residue Protein Function Prediction Server

Predict protein function using Gene Ontology.

ConSurf-DB -- evolutionary conservation profiles of protein structures database

Automatically calculate evolutionary conservation scores of key amino acid residues and map them on protein structures.

DBAli -- A Database of Structure Alignments

Mine the protein structure space.

DILIMOT -- discovery of linear motifs in proteins

Predict short linear motifs (3-8 residues) in a set of protein sequences.

Dasty2 -- an Ajax protein DAS client

A web client for visualizing protein sequence feature information using DAS.

DomainSweep -- protein annotation by domain analysis

Identify the domain architecture within a protein sequence.

E1DS -- catalytic site prediction based on 1D signatures of concurrent conservation

Predict enzyme catalytic site.

ELM -- Eukaryotic Linear Motif Resource

Predict functional sites in eukaryotic proteins.

EXPASY Proteome Tools Collection

Use a collection of tools for protein analyses.

EXPASY-Findmod

Predict potential protein post-translational modifications and find potential single amino acid substitutions in peptides.

EzCatDB -- the Enzyme Catalytic-mechanism Database

Search for information related to the catalytic mechanisms of enzymes.

FFPred -- feature-based function prediction

An integrated feature-based function prediction server for vertebrate proteomes.

FingerPRINT Scan

Identify the closest matching PRINTS sequence motif fingerprints in a protein sequence.

FireDB -- a database of functionally important residues from proteins of known structure

Search for functional annotation of important sites in proteins with known structures.

Frog2 -- a FRee Online druG 3D conformation generator

Produce 3D conformations of small drug compounds.

HGPD -- Human Gene and Protein Database

A database presenting experiment-based results in human proteomics.

HHsenser -- exhaustive transitive profile search using HMM-HMM comparison

Conduct exhaustive intermediate profile searches of a set of homologous protein sequences.

HotSpot Wizard -- Substrate Specificity Hot Spot Identification web server

Design protein mutations in site-directed mutagenesis.

INTREPID -- INformation-theoretic TREe traversal for Protein functional site IDentification

Use for protein functional site identification.

Integrating protein annotation resources through the Distributed Annotation System

Annotate protein using this integrated annotation resource.

InterProScan -- protein domains identifier

Identify protein family (and DNA) domains, patterns, motifs, protein families, and functional sites.

KFC -- Knowledge-based FADE and Contacts

Interactive forecasting of protein interaction hot spots.

MAGIIC-PRO -- detecting functional signatures by efficient discovery of long patterns in protein sequences

Discover long patterns in protein sequences.

MALISAM -- Manual ALIgnments for Structurally Analogous Motifs

Database containing pairs of structural analogs and their alignments.

MEME -- discovering and analyzing DNA and protein sequence motifs

Find sequence patterns in DNA and protein sequences.

MODPROPEP -- a program for knowledge-based modeling of protein-peptide complexes

A web server for knowledge-based modeling of protein-peptide complexes, specifically peptides in complex with major histocompatibility complex (MHC) proteins and kinases.

MeMo -- a web tool for prediction of protein methylation modifications

Predict protein methylation sites.

MegaMotifBase -- a database of structural motifs in protein families and superfamilies

Find structural segments or motifs for protein structures.

Minimotif Miner -- a tool for investigating protein function

Find motifs in a protein sequence.

Motif3D -- Relating protein sequence motifs to 3D structure

Visualize protein sequence motifs on the 3D protein structures.

MotifScan

Find presence of any known protein motif (Prosite and Pfam) in a protein sequence.

MultiBind -- Multiple Alignment of Protein Binding Sites

Recognize spatial chemical binding patterns common to a set of protein structures.

NMT -- The MYR Predictor

Analyze proteins for the presence of N-terminal N-myristoylation site.

NetNGlyc -- N-Glycosylation sites prediction tool

Find the presence of N-Glycosylation sites in human proteins.

NetOGlyc 3.1 -- O-glycosylation sites prediction tool

Find the presence of O-GalNAc (mucin type) glycosylation sites in mammalian proteins.

NetPhos 2.0 -- Phosphorylation sites predictions

Analyze eukaryotic proteins for the presence of serine, threonine and tyrosine phosphorylation sites.

NetPhosK 1.0 Server -- kinase specific eukaryotic protein phosphorylation sites prediction tool

Find possible kinase specific phosphorylation sites in eukaryotic proteins.

NetworKIN -- a resource for exploring cellular phosphorylation networks

NeuroPred -- a tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides

Predict cleavage sites at basic amino acid locations in neuropeptide precursor sequences.

Non-Redundant Patent Sequences - Patented Sequence Database

Find information about patented nucleotide and protein sequences.

O-GLYCBASE

Search for information about glycoproteins with O-linked and C-linked glycosylation sites.

PANDORA -- Protein ANnotation Diagram ORiented Analysis

Find information about protein sequence annotations.

PAR-3D -- Protein Active site Residue - 3D structural motif

A server to predict protein active site residues.

PDBSite -- a database of the 3D structure of protein functional sites

Search for structural and functional information on the protein functional sites.

PDBSiteScan -- A program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins

Search 3D protein fragments similar in structure to known active, binding and posttranslational modification sites.

PEDANT -- Protein Extraction, Description and ANalysis Tool

Conduct genome wide functional and structural analysis.

PHOSIDA -- Phosphorylation site database

Search for phosphorylation data of any protein of interest.

PHOSPHORYLATION SITE DATABASE

Search for information on prokaryotic proteins that undergo serine, threonine, or tyrosine phosphorylation.

PNU -- Protein Naming Utility

Determine correct names for proteins.

POODLE-S -- Prediction Of Order and Disorder by machine LEarning

Web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.

PPISearch -- Protein-Protein Interaction Search

Find homologous protein-protein interactions across multiple species.

PPSearch

Search your query sequence against PROSITE pattern database for protein motifs.

PRIDB -- Protein-RNA Interface DataBase

Find information about protein-RNA complexes from the Protein Data Bank (PDB).

PRINTS and its automatic supplement, prePRINTS -- A compendium of protein fingerprints

Search for protein fingerprints.

PROSITE

Identify protein families and domains for a given protein sequence.

PRRDB -- Pattern Recognition Receptor Database

A comprehensive database of pattern-recognition receptors and their ligands.

PatMatch -- a program for finding patterns in peptide and nucleotide sequences

Search for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences.

PepCyber:P~PEP -- a database of human protein protein interactions mediated by phosphoprotein-binding domains

Database specialized in documenting human PPBD-containing proteins and PPBD-mediated interactions.

PeptideCutter -- protein cleavage sites prediction tool

Predicts potential protease cleavage sites and sites cleaved by chemicals in a given protein sequence.

Phobius -- A combined transmembrane topology and signal peptide predictor

Predict combined transmembrane topology and signal peptides.

Phospho.ELM -- a database of phosphorylation sites

Search for eukaryotic phosphorylation sites.

Phospho3D -- a database of three-dimensional structures of protein phosphorylation sites

Search for 3D structure and functional annotation of phosphorylation sites in proteins.

PhosphoSite -- A bioinformatics resource dedicated to physiological protein phosphorylation

Search the database of in vivo phosphorylation sites of human and mouse proteins.

PolyQ -- Polyglutamine Database

Find information about polyglutamine (polyQ) repeats.

Pratt Protein motif and pattern discovery

Find the presence of protein motifs and patterns in an amino acid sequence.

PrediSi -- Prediction of Signal Peptides and their Cleavage Positions

Predict signal peptide sequences and their cleavage positions in bacterial and eukaryotic amino acid sequences.

ProFunc -- a server for predicting protein function from 3D structure

Predict protein functions based on known structures.

ProMateus -- an open research approach to protein-binding sites analysis

Predict the location of potential protein-protein binding sites for unbound proteins.

ProTeus -- identifying signatures in protein termini

Identify short linear signatures in protein termini.

ProtSweep -- protein annotation by homology

Analyze and identify newly obtained protein sequences.

Protemot -- prediction of protein binding sites with automatically extracted geometrical templates

Predict protein binding sites in a protein sequence based on geometrical analysis of protein tertiary substructures.

QuasiMotiFinder -- protein annotation by searching for evolutionarily conserved motif-like patterns

Search for evolutionarily conserved motif-like patterns in protein sequences.

RNABindR -- software for prediction of RNA binding residues in proteins

Web-based server for analyzing and predicting RNA binding sites in proteins.

SCANMOT -- searching for similar sequences using a simultaneous scan of multiple sequence motifs

Search for similarities between proteins by simultaneous matching of multiple motifs.

SDPpred -- A Tool for Prediction of Amino Acid Residues that Determine Differences in Functional Specificity of Homologous Proteins

Predict residues in protein sequences that determine the proteins' functional specificity.

SDR -- Specificity Determining Residues Database

Predict specificity-determining residues in protein families.

SLiMDisc -- Short, Linear Motif Discovery

Find shared motifs in proteins with a common attribute.

SUMOsp -- a web server for sumoylation site prediction

Conduct in silico sumoylation sites prediction.

SWAKK -- a web server for detecting positive selection in proteins using a sliding window substitution rate analysis

Detect protein sequence sections under positive selection.

ScanProsite

Search for motifs and patterns within protein sequences.

ScanProsite -- detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins

Detect patterns, profiles and motifs in a protein sequence.

ScanSite 2.0 -- Proteome-wide prediction of cell signaling interactions using short sequence motifs

Search for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains.

SePreSA -- SErver for the PREdiction of populations susceptible to Serious Adverse drug reaction

Find information about populations carrying polymorphisms within protein binding pockets that make them susceptible to serious adverse drug reaction (SADR).

Sequence Motif Search

Search for the presence of a motif in either an amino acid sequence or a nucleotide sequence.

Signal-3L -- A 3-layer approach for predicting signal peptides

Predict signal peptides.

SignalP -- Machine learning approaches to the prediction of signal peptides, their cleavage sites, and other protein sorting signals

Predict signal peptides and their cleavage sites.

Sulfinator -- tyrosine sulfation sites prediction tool

Predict the presence of tyrosine sulfation sites in protein sequences.

SuperSite -- Ligand Binding Site Database

Look at protein structure from a ligand and binding site perspective.

Swiss EMBnet node web server

Use a collection of bioinformatics tools at this portal site.

T-REKS -- identification of Tandem REpeats in sequences with a K-meanS based algorithm

Find information about tandem repeats in proteins that carry fundamental biological functions and are related to a number of human diseases.

TMFunction -- The Functional Database of Membrane Proteins

Find information about functional residues in alpha-helical and beta-barrel membrane proteins.

TOPDOM -- Conservatively Located Domains and Motifs in Transmembrane Proteins

Database of domains and motifs with conservative location in transmembrane proteins.

The EMOTIF database

Search for highly conserved and specific protein sequence motifs.

TreeDet -- Predicting Functional Residues in Protein Sequence Alignments

Predict functional sites in protein sequence alignments using different methodologies.

W-ChIPMotifs -- ChIP-based protein Motif discovery web server

Find de novo protein motifs from chromatin immunoprecipitation data.

WebFEATURE -- an interactive web tool for identifying and visualizing functional sites on macromolecular structures

Scan query structures for functional sites in both proteins and nucleic acids.

WebProAnalyst -- an interactive tool for analysis of quantitative structure-activity relationships in protein families

Analyze quantitative structure-activity relationship of related protein families.

eBLOCKs -- enumerating conserved protein blocks to achieve maximal sensitivity and specificity

Search for ungapped alignments of highly conserved regions among a protein family or superfamily.

eF-seek -- prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

Predict the functional sites of proteins.

firestar -- prediction of functionally important residues using structural templates and alignment reliability

An expert system for predicting ligand-binding residues in protein structures.

iMOTdb -- a comprehensive collection of spatially interacting motifs in proteins

Automatically identify spatially interacting motifs among distantly related proteins sharing similar folds and possessing common ancestral lineage.

Next generation sequencing in R or bioconductor environment

John Parker — Mon, 02 Jun 2014 18:03:09 -0500

There are many R and Bioconductor packages for NGS data analysis; some of them are as follows:

Biostrings

The Biostrings package from Bioconductor provides an advanced environment for efficient sequence management and analysis in R. It contains many speed- and memory-efficient string containers, string matching algorithms, and other utilities for fast manipulation of large sets of biological sequences. The objects and functions provided by Biostrings form the basis for many other sequence analysis packages. Documentation

IRanges Overview

IRanges provides the low-level infrastructure and containers for handling sets of integer ranges within Bioconductor's BioC-Seq domain. Its classes and methods provide support for many more high-level packages like GenomicRanges, ShortRead, Rsamtools, etc. Documentation

GenomicRanges Overview

The GenomicRanges package serves as the foundation for representing genomic locations within the Bioconductor project. It is built upon the IRanges infrastructure and defines three major data containers - GRanges, GRangesList and GappedAlignments - which support other important BioC-Seq packages including ShortRead, Rsamtools, rtracklayer, GenomicFeatures and BSgenome. Compared to the IRanges container, the GRanges/GRangesList classes are more flexible and can be extended to store additional information about sequence ranges, such as chromosome identifiers (sequence space), strand information and annotation data. Documentation

Motif Discovery

cosmo

The cosmo package searches a set of unaligned DNA sequences for a shared motif that may function as a transcription factor binding site. The algorithm extends the popular motif discovery tool MEME (Bailey and Elkan, 1995) in that it allows the search to be supervised by specifying a set of constraints that the motif to be discovered must satisfy. Documentation

BCRANK

BCRANK is a method that takes a ranked list of genomic regions as input and outputs short DNA sequences that are overrepresented in some part of the list. The algorithm was developed for detecting transcription factor (TF) binding sites in a large number of enriched regions from high-throughput ChIP-chip or ChIP-seq experiments, but it can be applied to any ranked list of DNA sequences. Documentation

rGADEM: Documentation

MotIV: Documentation

ShortRead

The ShortRead package provides input, quality control, filtering, parsing, and manipulation functionality for short read sequences produced by high throughput sequencing technologies. While support is provided for many sequencing technologies, this package is primarily focused on Solexa/Illumina reads. Documentation

Rsamtools

Rsamtools provides functions for parsing and inspecting samtools BAM formatted binary alignment data. SAM/BAM is quickly becoming a universal standard alignment format, and is now supported by a wide variety of alignment tools. Documentation

Samtools Website
BWA (Burrows-Wheeler Alignment) Website

Additional tools for SNP analysis:

snpMatrix

BSgenome

BSgenome provides an object oriented infrastructure for interacting with a Biostring based genome sequence. BSgenome packages exist for many common genomes, and can be created to represent custom genomes. See the "How to forge a BSgenome data package" Vignette for instructions to create a new BSgenome package if a prebuilt package does not exist for your organism. Documentation

rtracklayer

rtracklayer provides an interface for exporting annotation feature data to various genome browsers and file formats (such as GFF). See the Small RNA Profiling exercise for an example of using rtracklayer to visualize alignment coverage. Documentation

biomaRt

The biomaRt package provides an interface to a growing collection of databases implementing the BioMart software suite (http://www.biomart.org). The package enables online retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas. The data is retrieved automatically via the Internet, so it is recommended that you cache it locally, or check versions if your code will be adversely affected by updates to these data. Documentation

ChIP-Seq Analysis Packages

Bioconductor provides various packages for analyzing and visualizing ChIP-Seq data. Only a small selection of these packages is introduced here. Additional useful introductions to this topic are: BioC ChIP-seq Case Study and BioC ChIP-Seq.

chipseq

The chipseq package combines a variety of HT-Seq packages into a pipeline for ChIP-Seq data analysis. Documentation

BayesPeak

BayesPeak is a peak calling package for identifying DNA binding sites of proteins in ChIP-Seq experiments. Its algorithm uses hidden Markov models (HMM) and Bayesian statistical methods. The following sample code introduces the identification of peaks with the BayesPeak package as well as the incorporation of read coverage information obtained by the chipseq package. Documentation [ Publication ]

PICS

The PICS package applies probabilistic inference to aligned-read ChIP-Seq data in order to identify regions bound by transcription factors. PICS identifies enriched regions by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. The following sample code uses the test data set from the above BayesPeak package in order to compare the results from both methods by identifying their consensus peak set. Documentation [ Publication ]

ChIPpeakAnno

The ChIPpeakAnno package provides batch annotation of the peaks identified from either ChIP-seq or ChIP-chip experiments. It includes functions to retrieve the sequences around peaks, obtain enriched Gene Ontology (GO) terms, and find the nearest gene, exon, miRNA or custom features such as most conserved elements and other transcription factor binding sites supplied by users. The package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stat packages. Documentation

Additional ChIP-Seq Packages

DiffBind: Documentation

MOSAICS: Documentation

iSeq: Documentation

ChIPseqR: Documentation

ChIPsim: Documentation

CSAR: Documentation

ChIP-Seq Pipeline: PICS, rGADEM and MotIV (developer web site)

SPP: ChIP-seq processing pipeline

SPP Tutorial

MACS

SIPeS

RNA-Seq Analysis

Counting Reads that Overlap with Annotation Ranges

The GenomicRanges package provides support for importing into R short read alignment data in BAM format (via Rsamtools) and associating them with genomic feature ranges, such as exons or genes. This way one can quantify the number of reads aligning to annotated genomic regions. The package defines general purpose containers for storing genomic intervals as well as more specialized containers for storing alignments against a reference genome. The two main functions for read counting provided by this infrastructure are countOverlaps and summarizeOverlaps. For their proper usage, it is important to read the corresponding PDF manual. Documentation
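The idea of counting reads that overlap annotated ranges can be sketched outside R with a few lines of awk. This is not the countOverlaps or summarizeOverlaps implementation. It is a simplified illustration on hypothetical coordinates that ignores chromosomes, strand and BAM parsing, and simply counts a read for a feature when the two closed intervals share at least one position.

```shell
# Hypothetical inputs: features as "name start end", reads as "start end".
features=$(mktemp)
reads=$(mktemp)
printf '%s\n' 'exon1 100 200' 'exon2 300 400' > "$features"
printf '%s\n' '90 120' '150 180' '190 310' '500 550' > "$reads"

# First file (reads) is loaded into arrays; for every feature in the second
# file, count the reads whose interval overlaps the feature interval.
counts=$(awk '
  NR == FNR { rs[NR] = $1; re[NR] = $2; n = NR; next }
  {
    c = 0
    for (i = 1; i <= n; i++)
      if (rs[i] <= $3 && re[i] >= $2) c++
    print $1, c
  }' "$reads" "$features")

printf '%s\n' "$counts"
rm -f "$features" "$reads"
```

With these made-up coordinates, exon1 is overlapped by three reads and exon2 by one. The read spanning 190-310 is counted for both features, which is one of the ambiguities the overlap modes of summarizeOverlaps are meant to resolve.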

Differential Gene Expression Analysis with DESeq

The DESeq package contains functions to call differentially expressed genes (DEGs) in count tables based on a model using the negative binomial distribution. It expects as input a data frame with the raw read counts per region/gene of interest (rows) for each test sample (columns). Such a count table can be imported into R or generated from BAM alignment files using the countOverlaps function as introduced above. Documentation

Differential Gene Expression Analysis with edgeR

The edgeR package uses empirical Bayes estimation and exact tests based on the negative binomial distribution to call differentially expressed genes (DEGs) in count data.

Documentation

A variety of additional R packages are available for normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG):

easyRNASeq (simplifies read counting per genome feature)

DEXSeq (Inference of differential exon usage); parathyroidSE explains how to generate exon read counts in R

DEGseq

baySeq (also see: segmentSeq)

Genominator (Bullard et al. 2010)

Detection of Alternative Splice Junctions

Another utility of RNA-Seq experiments is the analysis of splice junctions. The following software suggestions provide this utility:

ERANGE
TopHat

SpliceMap

SplitSeek

DNA-Methylation Data Analysis

methylPipe
bsseq
BiSeq
Much more under BiocViews

HT-Seq Data Visualization

ggbio: ggplot2 extension for genomics data (online manual)

Gviz: Plotting data and annotation information along genomic coordinates

HilbertVis: Hilbert genome plots

GenomeGraphs: Plotting genomic information from Ensembl

TileQC: Flow Cell Quality Visualization

rtracklayer: R interface to genome browsers

genoPlotR: Plotting maps of genes and genomes

Genominator: Tools for storing, accessing, analyzing and visualizing genomic data.

To install all packages

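# biocLite() has been retired; BiocManager is the current Bioconductor installer.
# Some of the packages listed below may no longer be available in recent Bioconductor releases.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("ShortRead", "Biostrings", "IRanges", "BSgenome", "rtracklayer", "biomaRt", "chipseq", "ChIPpeakAnno", "Rsamtools", "BayesPeak", "PICS", "GenomicRanges", "DESeq", "edgeR", "leeBamViews", "GenomicFeatures", "BSgenome.Celegans.UCSC.ce2"))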

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Jit — Wed, 13 Jan 2021 19:29:32 -0600

MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs. MetaEuk combines the fast and sensitive homology search capabilities of MMseqs2 with a dynamic programming procedure to recover optimal exon sets. It reduces redundancies in multiple discoveries of the same gene and resolves conflicting gene predictions on the same strand. MetaEuk is GPL-licensed open source software that is implemented in C++ and available for Linux and macOS. The software is designed to run on multiple cores.

Address of the bookmark: https://github.com/soedinglab/metaeuk

Postdoc position at Centre Méditerranéen de Médecine Moléculaire - Nice - France

Wed, 04 Jun 2014 07:20:57 -0500

The research group of Dr. Michele Trabucchi at the Centre Méditerranéen de Médecine Moléculaire (C3M) at INSERM U1065 (University of Nice Sophia-Antipolis, France) is seeking candidates for a postdoctoral fellow position starting in October 2014 for 3 years, funded by FRM (Fondation pour la Recherche Médicale).
The broad interest of the lab is in understanding the expression control and function of small RNAs in activated myeloid cells (visit our webpage to check research interests and publications of the group : http://www.unice.fr/c3m/EN/Equipe10.html ).

The work will focus on the functional studies of small RNAs by using next-generation sequencing approaches.

Candidates should hold a Ph.D. degree and have a strong background in bioinformatics.
The University of Nice Sophia-Antipolis provides a wide range of facilities and training essential for biomedical research.

Interested applicants should send a PDF with a cover letter stating research interests and qualifications, an updated CV, a summary of previous research experience and contact information for two references to Michele Trabucchi (mtrabucchi@unice.fr).

Homepage: http://www.unice.fr/c3m/EN/Equipe10.html

Ten recommendations for creating usable bioinformatics command line software

RAJESH DETROJA — Sun, 08 Jun 2014 10:06:26 -0500

Bioinformatics software varies greatly in quality. In terms of usability, the command line interface is the first experience a user will have of a tool. Unfortunately, this is often also the last time a tool will be used. Here I present ten recommendations for authors of command line software to follow, which I believe would greatly improve the uptake and usability of their products, waste less of users' time, and improve the quality of scientific analyses.

Address of the bookmark: http://www.gigasciencejournal.com/content/2/1/15?utm_content=buffer25ee0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer