BOL: Related items

List of motif discovery tools

Neel — Tue, 20 Nov 2018 03:54:26 -0600

In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance. For proteins, a sequence motif is distinguished from a structural motif, a motif formed by the three-dimensional arrangement of amino acids which may not be adjacent.
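To make the idea of a sequence pattern concrete, here is a minimal Python sketch that scans a protein sequence for the N-glycosylation pattern N-{P}-[ST]-{P} (PROSITE notation), written as a regular expression. The function name `find_motif` and the toy sequence are made up for illustration; dedicated tools from the list below do far more than a plain pattern scan.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} expressed as a regular expression.
# The lookahead makes overlapping occurrences visible as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYC):
    """Return (1-based start, matched text) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MKNGTLLNPSANASQ"  # made-up sequence, for illustration only
    for start, hit in find_motif(toy):
        print(start, hit)
```

A match only says that the pattern occurs; whether the site is biologically relevant still has to be checked by other means.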

The following is a list of tools for motif discovery:

2Dsweep -- protein annotation by secondary structure elements

Perform secondary structure predictions on protein sequences.

3D-footprint -- database of DNA-binding protein structures

Find binding specificity information about DNA-protein complexes.

3D-footprint: DNA-binding protein database

Find information about the binding specificity of DNA-binding proteins.

3D-partner -- a web server to infer interacting partners and binding models

Predict interacting partners and binding models.

3MOTIF -- a protein structure visualization system for conserved sequence motifs

Use this web-based sequence motif visualization system to display sequence motif information in its appropriate three-dimensional (3D) context.

AFAWE -- Automatic functional annotation in a distributed Web Services Environment

Protein function prediction and annotation in an integrated environment powered by web service.

ANCHOR -- Prediction of Protein Binding Regions in Disordered Proteins

Find information about protein binding.

ANNIE -- ANNotation and Interpretation Environment for Protein Sequences

Use to predict function from de novo protein sequences.

Active Sequences Collection (ASC) database -- A new tool to assign functions to protein sequences

Search for short active protein sequences with demonstrated biological activities.

Blocks -- Ungapped segments in conserved protein sequences

Search for ungapped segments corresponding to the most highly conserved regions of proteins.

CASTp -- computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues

Identify and measure surface accessible pockets as well as interior inaccessible cavities, for proteins and other molecules.

CSA -- The Catalytic Site Atlas

Search for catalytic residue annotations for enzymes in the Protein Data Bank.

ConFunc -- Conserved residue Protein Function Prediction Server

Predict protein function using Gene Ontology.

ConSurf-DB -- evolutionary conservation profiles of protein structures database

Automatically calculate evolutionary conservation scores of key amino acid residues and map them on protein structures.

DBAli -- A Database of Structure Alignments

Mine the protein structure space.

DILIMOT -- discovery of linear motifs in proteins

Predict short linear motifs (3-8 residues) in a set of protein sequences.

Dasty2 -- an Ajax protein DAS client

A web client for visualizing protein sequence feature information using DAS.

DomainSweep -- protein annotation by domain analysis

Identify the domain architecture within a protein sequence.

E1DS -- catalytic site prediction based on 1D signatures of concurrent conservation

Predict enzyme catalytic site.

ELM -- Eukaryotic Linear Motif Resource

Predict functional sites in eukaryotic proteins.

EXPASY Proteome Tools Collection

Use a collection of tools for protein analyses.

EXPASY-Findmod

Predict potential protein post-translational modifications and find potential single amino acid substitutions in peptides.

EzCatDB -- the Enzyme Catalytic-mechanism Database

Search for information related to the catalytic mechanisms of enzymes.

FFPred -- feature-based function prediction

An integrated feature-based function prediction server for vertebrate proteomes.

FingerPRINT Scan

Identify the closest matching PRINTS sequence motif fingerprints in a protein sequence.

FireDB -- a database of functionally important residues from proteins of known structure

Search for functional annotation of important sites in proteins with known structures.

Frog2 -- a FRee Online druG 3D conformation generator

Produce 3D conformations of small drug compounds.

HGPD -- Human Gene and Protein Database

A database presenting experiment-based results in human proteomics.

HHsenser -- exhaustive transitive profile search using HMM–HMM comparison

Conduct exhaustive intermediate profile searches of a set of homologous protein sequences.

HotSpot Wizard -- Substrate Specificity Hot Spot Identification web server

Design protein mutations in site-directed mutagenesis.

INTREPID -- INformation-theoretic TREe traversal for Protein functional site IDentification

Use for protein functional site identification.

Integrating protein annotation resources through the Distributed Annotation System

Annotate protein using this integrated annotation resource.

InterProScan -- protein domains identifier

Identify protein family (and DNA) domains, patterns, motifs, protein families, and functional sites.

KFC -- Knowledge-based FADE and Contacts

Interactive forecasting of protein interaction hot spots.

MAGIIC-PRO -- detecting functional signatures by efficient discovery of long patterns in protein sequences

Discover long patterns in protein sequences.

MALISAM -- Manual ALIgnments for Structurally Analogous Motifs

Database containing pairs of structural analogs and their alignments.

MEME -- discovering and analyzing DNA and protein sequence motifs

Find sequence patterns in DNA and protein sequences.

MODPROPEP -- a program for knowledge-based modeling of protein-peptide complexes

A web server for knowledge-based modeling of protein-peptide complexes, specifically peptides in complex with major histocompatibility complex (MHC) proteins and kinases.

MeMo -- a web tool for prediction of protein methylation modifications

Predict protein methylation sites.

MegaMotifBase -- a database of structural motifs in protein families and superfamilies

Find structural segments or motifs for protein structures.

Minimotif Miner -- a tool for investigating protein function

Find motifs in a protein sequence.

Motif3D -- Relating protein sequence motifs to 3D structure

Visualize protein sequence motifs on the 3D protein structures.

MotifScan

Find presence of any known protein motif (Prosite and Pfam) in a protein sequence.

MultiBind -- Multiple Alignment of Protein Binding Sites

Recognize spatial chemical binding patterns common to a set of protein structures.

NMT -- The MYR Predictor

Analyze proteins for the presence of N-terminal N-myristoylation site.

NetNGlyc -- N-Glycosylation sites prediction tool

Find the presence of N-Glycosylation sites in human proteins.

NetOGlyc 3.1 -- O-glycosylation sites prediction tool

Find the presence of O-GalNAc (mucin type) glycosylation sites in mammalian proteins.

NetPhos 2.0 -- Phosphorylation sites predictions

Analyze eukaryotic proteins for the presence of serine, threonine and tyrosine phosphorylation sites.

NetPhosK 1.0 Server -- kinase specific eukaryotic protein phosphorylation sites prediction tool

Find possible kinase specific phosphorylation sites in eukaryotic proteins.

NetworKIN -- a resource for exploring cellular phosphorylation networks

NeuroPred -- a tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides

Predict cleavage sites at basic amino acid locations in neuropeptide precursor sequences.

Non-Redundant Patent Sequences - Patented Sequence Database

Find information about patented nucleotide and protein sequences.

O-GLYCBASE

Search for information about glycoproteins with O-linked and C-linked glycosylation sites.

PANDORA -- Protein ANnotation Diagram ORiented Analysis

Find information about protein sequence annotations.

PAR-3D -- Protein Active site Residue - 3D structural motif

A server to predict protein active site residues.

PDBSite -- a database of the 3D structure of protein functional sites

Search for structural and functional information on the protein functional sites.

PDBSiteScan -- A program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins

Search 3D protein fragments similar in structure to known active, binding and posttranslational modification sites.

PEDANT -- Protein Extraction, Description and ANalysis Tool

Conduct genome wide functional and structural analysis.

PHOSIDA -- Phosphorylation site database

Search for phosphorylation data of any protein of interest.

PHOSPHORYLATION SITE DATABASE

Search for information on prokaryotic proteins that undergo serine, threonine, or tyrosine phosphorylation.

PNU -- Protein Naming Utility

Determine correct names for proteins.

POODLE-S -- Prediction Of Order and Disorder by machine LEarning

Web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.

PPISearch -- Protein-Protein Interaction Search

Find homologous protein-protein interactions across multiple species.

PPSearch

Search your query sequence against PROSITE pattern database for protein motifs.

PRIDB -- Protein-RNA Interface DataBase

Find information about protein-RNA complexes from the Protein Data Bank (PDB).

PRINTS and its automatic supplement, prePRINTS -- A compendium of protein fingerprints

Search for protein fingerprints.

PROSITE

Identify protein families and domains for a given protein sequence.

PRRDB -- Pattern Recognition Receptor Database

A comprehensive database of pattern-recognition receptors and their ligands.

PatMatch -- a program for finding patterns in peptide and nucleotide sequences

Search for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences.

PepCyber:P~PEP -- a database of human protein-protein interactions mediated by phosphoprotein-binding domains

Database specialized in documenting human PPBD-containing proteins and PPBD-mediated interactions.

PeptideCutter -- protein cleavage sites prediction tool

Predict potential protease cleavage sites and sites cleaved by chemicals in a given protein sequence.

Phobius -- A combined transmembrane topology and signal peptide predictor

Predict combined transmembrane topology and signal peptides.

Phospho.ELM -- a database of phosphorylation sites

Search for eukaryotic phosphorylation sites.

Phospho3D -- a database of three-dimensional structures of protein phosphorylation sites

Search for 3D structure and functional annotation of phosphorylation sites in proteins.

PhosphoSite -- A bioinformatics resource dedicated to physiological protein phosphorylation

Search the database of in vivo phosphorylation sites of human and mouse proteins.

PolyQ -- Polyglutamine Database

Find information about polyglutamine (polyQ) repeats.

Pratt -- protein motif and pattern discovery

Find the presence of protein motifs and patterns in an amino acid sequence.

PrediSi -- Prediction of Signal Peptides and their Cleavage Positions

Predict signal peptide sequences and their cleavage positions in bacterial and eukaryotic amino acid sequences.

ProFunc -- a server for predicting protein function from 3D structure

Predict protein functions based on known structures.

ProMateus -- an open research approach to protein-binding sites analysis

Predict the location of potential protein-protein binding sites for unbound proteins.

ProTeus -- identifying signatures in protein termini

Identify short linear signatures in protein termini.

ProtSweep -- protein annotation by homology

Analyze and identify newly obtained protein sequences.

Protemot -- prediction of protein binding sites with automatically extracted geometrical templates

Predict protein binding sites in a protein sequence based on geometrical analysis of protein tertiary substructures.

QuasiMotiFinder -- protein annotation by searching for evolutionarily conserved motif-like patterns

Search for evolutionarily conserved motif-like patterns in protein sequences.

RNABindR -- software for prediction of RNA binding residues in proteins

Web-based server for analyzing and predicting RNA binding sites in proteins.

SCANMOT -- searching for similar sequences using a simultaneous scan of multiple sequence motifs

Search for similarities between proteins by simultaneous matching of multiple motifs.

SDPpred -- A Tool for Prediction of Amino Acid Residues that Determine Differences in Functional Specificity of Homologous Proteins

Predict residues in protein sequences that determine the proteins' functional specificity.

SDR -- Specificity Determining Residues Database

Predict specificity-determining residues in protein families.

SLiMDisc -- Short, Linear Motif Discovery

Find shared motifs in proteins with a common attribute.

SUMOsp -- a web server for sumoylation site prediction

Conduct in silico sumoylation sites prediction.

SWAKK -- a web server for detecting positive selection in proteins using a sliding window substitution rate analysis

Detect sections of a protein sequence that are under positive selection.

ScanProsite

Search for motifs and patterns within protein sequences.

ScanProsite -- detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins

Detect patterns, profiles and motifs in a protein sequence.

ScanSite 2.0 -- Proteome-wide prediction of cell signaling interactions using short sequence motifs

Search for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains.

SePreSA -- SErver for the PREdiction of populations susceptible to Serious Adverse drug reaction

Find information about populations carrying polymorphisms within protein binding pockets that make them susceptible to serious adverse drug reaction (SADR).

Sequence Motif Search

Search for the presence of a motif in an amino acid or nucleotide sequence.

Signal-3L -- A 3-layer approach for predicting signal peptides

Predict signal peptides.

SignalP -- Machine learning approaches to the prediction of signal peptides, their cleavage sites, and other protein sorting signals

Predict signal peptides and their cleavage sites.

Sulfinator -- tyrosine sulfation sites prediction tool

Predict the presence of tyrosine sulfation sites in protein sequences.

SuperSite -- Ligand Binding Site Database

Look at protein structure from a ligand and binding site perspective.

Swiss EMBnet node web server

Use a collection of bioinformatics tools at this portal site.

T-REKS -- identification of Tandem REpeats in sequences with a K-meanS based algorithm

Find information about tandem repeats in proteins that carry fundamental biological functions and are related to a number of human diseases.

TMFunction -- The Functional Database of Membrane Proteins

Find information about functional residues in alpha-helical and beta-barrel membrane proteins.

TOPDOM -- Conservatively Located Domains and Motifs in Transmembrane Proteins

Database of domains and motifs with conservative location in transmembrane proteins.

The EMOTIF database

Search for highly conserved and specific protein sequence motifs.

TreeDet -- Predicting Functional Residues in Protein Sequence Alignments

Predict functional sites in protein sequence alignments using different methodologies.

W-ChIPMotifs -- ChIP-based protein Motif discovery web server

Find de novo protein motifs from chromatin immunoprecipitation data.

WebFEATURE -- an interactive web tool for identifying and visualizing functional sites on macromolecular structures

Scan query structures for functional sites in both proteins and nucleic acids.

WebProAnalyst -- an interactive tool for analysis of quantitative structure–activity relationships in protein families

Analyze quantitative structure-activity relationship of related protein families.

eBLOCKs -- enumerating conserved protein blocks to achieve maximal sensitivity and specificity

Search for ungapped alignments of highly conserved regions among a protein family or superfamily.

eF-seek -- prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

Predict the functional sites of proteins.

firestar -- prediction of functionally important residues using structural templates and alignment reliability

An expert system for predicting ligand-binding residues in protein structures.

iMOTdb -- a comprehensive collection of spatially interacting motifs in proteins

Automatically identify spatially interacting motifs among distantly related proteins sharing similar folds and possessing common ancestral lineage.

Frequent parameters for bioinformatics tools

BioStar — Tue, 27 Oct 2020 19:42:32 -0500

Third party executable parameters and options.

Trimmomatic

“ILLUMINACLIP:...:2:30:10”

“LEADING:15”

“TRAILING:15”

“SLIDINGWINDOW:4:20”

“MINLEN:20”

“TOPHRED33”
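The Trimmomatic steps above act on per-base Phred quality scores. As a rough illustration of what SLIDINGWINDOW:4:20 followed by MINLEN:20 means, here is a Python sketch. It is a simplified stand-in, not Trimmomatic's actual implementation: the function names are hypothetical, and the rule used here (cut at the start of the first window whose mean quality falls below the threshold) may differ from the real tool in edge cases.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_cut(quals, window=4, min_mean=20):
    """Return how many leading bases to keep.

    Simplified rule: scan from the 5' end and cut at the start of the
    first window whose mean quality is below min_mean.
    """
    for start in range(len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < min_mean:
            return start
    return len(quals)


def passes_minlen(kept_length, minlen=20):
    """Mimic MINLEN: drop reads shorter than minlen after trimming."""
    return kept_length >= minlen


if __name__ == "__main__":
    quals = phred33("I" * 25 + "#" * 5)  # 25 good bases, then 5 poor ones
    kept = sliding_window_cut(quals)
    print(kept, passes_minlen(kept))
```

In this toy read the cut lands inside the high-quality stretch, because the window average already drops once two poor bases enter it.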

Filtlong

--min_length 500

--min_mean_q 85

--min_window_q 65

FastQ Screen

--aligner bowtie2 (bwa for PacBio)

--subset 1000 (for PacBio)

SPAdes

--careful

--disable-gzip-output

--cov-cutoff auto

--phred-offset 33

HGAP

Pbalign.task_options.min_accuracy: 70

Pbalign.task_options.no_split_subreads: false

Genomic_consensus.task_options.min_confidence: 40

falcon_ns.task_options.HGAP_GenomeLength_str: 6000000

Pbcoretools.task_options.read_length: 0

Genomic_consensus.task_options.use_score: 0

Pbalign.task_options.min_length: 50

Pbalign.task_options.algorithm_options: --minMatch 12 --bestn 10 --minPctSimilarity 70.0

Pbalign.task_options.hit_policy: randombest

Pbcoretools.task_options.other_filters: rq >= 0.7

Pbalign.task_options.concordant: false

Genomic_consensus.task_options.min_coverage: 5

falcon_ns.task_options.HGAP_SeedCoverage_str: 30

falcon_ns.task_options.HGAP_AggressiveAsm_bool: false

Genomic_consensus.task_options.algorithm: best

falcon_ns.task_options.HGAP_SeedLengthCutoff_str: -1

Genomic_consensus.task_options.diploid: false

MeDuSa

-random 100

Prokka

--usegenus

--force

--addgenes

--rfam

--rawproduct

cmsearch (taxonomy, 16S)

--rfam

--noali

blastn (taxonomy, 16S)

-evalue 1E-10

blastn (MLST)

-ungapped

-dust no

-evalue 1E-20

-word_size 32

-culling_limit 2

-perc_identity 95

blastp (VF)

-culling_limit 2

RGI (ABR)

--input_type contig

bowtie2 (mapping)

--sensitive

minimap2 (mapping)

-a

-x map-ont

samtools mpileup (SNP detection)

-uRI

bcftools call (SNP detection)

--variants-only

--skip-variants indels

--output-type v

--ploidy 1

-c

SnpSift filter (SNP detection)

"( QUAL >= 30 ) & (( na FILTER ) | (FILTER = 'PASS')) & ( DP >= 20 ) & ( MQ >= 20 )"
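Read as plain logic, the filter expression above keeps a variant only if all four conditions hold. The Python sketch below restates it as a predicate on a dictionary. The record layout and the name `passes_filter` are assumptions made for illustration, as is treating `None` or `"."` as a missing FILTER value; it is not how SnpSift evaluates expressions internally.

```python
def passes_filter(record):
    """Mirror of: QUAL >= 30, FILTER missing or PASS, DP >= 20, MQ >= 20.

    record is a plain dict with keys QUAL, FILTER, DP and MQ
    (a hypothetical layout chosen for this example).
    """
    flt = record.get("FILTER")
    filter_ok = flt is None or flt == "." or flt == "PASS"
    return (
        record["QUAL"] >= 30
        and filter_ok
        and record["DP"] >= 20
        and record["MQ"] >= 20
    )


if __name__ == "__main__":
    example = {"QUAL": 50, "FILTER": "PASS", "DP": 30, "MQ": 60}
    print(passes_filter(example))
```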

SnpEff ann (SNP detection)

-nodownload

-no-intron

-no-downstream

-no SPLICE_SITE_REGION

-upDownStreamLen 250

bcftools consensus (phylogenetic tree)

--haplotype 1

fasttreemp

-nt

-boot 100

roary

-e

-n

-cd 100

-g 100000

Short-read assembly using SPAdes

Abhimanyu Singh — Mon, 31 Jan 2022 07:18:16 -0600

If we only had Illumina reads, we could also assemble them with SPAdes.

You can try this here, or try it later on your own data.

Get data

We will use the same Illumina data as we used above:

illumina_R1.fastq.gz: the Illumina forward reads
illumina_R2.fastq.gz: the Illumina reverse reads

Assemble

Run SPAdes:

spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o spades_assembly_all_illumina

-1 is input file of forward reads
-2 is input file of reverse reads
--careful minimizes mismatches and short indels
--cov-cutoff auto computes the coverage threshold (rather than the default setting, “off”)
-o is the output directory
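If the assembly is launched from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below only rebuilds the command shown above; the helper name `spades_command` is hypothetical, and the actual call is left commented out because it assumes `spades.py` is installed and on the PATH.

```python
import shlex
import subprocess


def spades_command(forward, reverse, outdir):
    """Build the SPAdes call from the tutorial as an argument list."""
    return [
        "spades.py",
        "-1", forward,           # forward reads
        "-2", reverse,           # reverse reads
        "--careful",             # reduce mismatches and short indels
        "--cov-cutoff", "auto",  # let SPAdes choose the coverage threshold
        "-o", outdir,            # output directory
    ]


if __name__ == "__main__":
    cmd = spades_command(
        "illumina_R1.fastq.gz",
        "illumina_R2.fastq.gz",
        "spades_assembly_all_illumina",
    )
    print(shlex.join(cmd))
    # Uncomment on a machine where spades.py is available:
    # subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems if file names contain spaces.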

Results

Move into the output directory and look at the contigs:

infoseq contigs.fasta
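If infoseq is not available, a few lines of Python give a comparable quick look at the contigs. This is a minimal sketch, not a replacement for infoseq: the function names are made up, it assumes a plain FASTA file, and it reports only contig lengths and the N50.

```python
def read_fasta_lengths(lines):
    """Map each sequence name to its length from FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    toy = [">contig_1 toy", "ACGTAC", ">contig_2", "ACG"]  # stand-in data
    sizes = read_fasta_lengths(toy)
    print(sizes, n50(sizes.values()))
```

For real data, pass an open file handle (for example `open("contigs.fasta")`) to `read_fasta_lengths` in place of the toy list.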

Mitochondrial genome assembly tools

Abhi — Wed, 06 Sep 2023 00:37:18 -0500

Mitochondrial genome assembly tools are specialized software and algorithms designed to accurately reconstruct the mitochondrial genome (mitogenome) from sequencing data, typically obtained through techniques like next-generation sequencing (NGS). The mitochondrial genome is relatively small compared to the nuclear genome, making it an ideal target for assembly. Here are some commonly used mitochondrial genome assembly tools:

MitoFinder: MitoFinder is a pipeline that assembles mitochondrial genomes and annotates mitochondrial genes from trimmed sequencing reads.

MitoHiFi: MitoHiFi is a Python pipeline for mitochondrial genome assembly from PacBio high-fidelity (HiFi) reads.

MITObim: MITObim is a tool specifically developed for the iterative assembly of mitochondrial genomes. It starts with a reference mitogenome and iteratively refines the assembly using the read data.

MITOS: MITOS is a web-based platform that provides a pipeline for annotating mitochondrial genomes. It is an annotation tool rather than an assembler, so it is typically applied to a mitogenome that has already been assembled.

MIRA: MIRA (Mimicking Intelligent Read Assembly) is a versatile genome assembly tool that can be used for mitochondrial genome assembly. It supports various sequencing technologies and allows for reference-based or de novo assembly.

NOVOPlasty: NOVOPlasty is a user-friendly tool designed for de novo assembly of organelle genomes, including mitochondria. It uses a seed-and-extend algorithm and is designed primarily for short-read (Illumina) whole-genome data.

MITOS2: MITOS2 is an updated version of the MITOS pipeline, which automates the annotation of mitochondrial genomes. It provides improved accuracy and additional features for mitochondrial genome analysis.

GetOrganelle: While primarily designed for chloroplast genome assembly, GetOrganelle can also be used for mitochondrial genome assembly. It is particularly useful for dealing with high-throughput sequencing data.

SPAdes: SPAdes (St. Petersburg genome assembler) is a versatile genome assembly tool that can be employed for mitochondrial genome assembly, especially when dealing with complex datasets that may contain nuclear mitochondrial DNA sequences (numts).

IDBA-UD: IDBA-UD (Iterative De Bruijn Graph De Novo Assembler) is another de novo assembly tool that can be used for mitochondrial genome assembly, especially in cases with relatively low coverage.

Velvet: Velvet is a de novo assembly tool that can be applied to mitochondrial genome assembly, especially when working with short-read data.

When selecting a mitochondrial genome assembly tool, it's important to consider the specific characteristics of your sequencing data, such as read length and coverage, as well as the complexity of the mitochondrial genome. Additionally, some tools are better suited for specific organisms or research objectives, so choosing the right tool will depend on your particular project requirements.

Popular bioinformatics educational resources

Rahul Nayak — Fri, 04 May 2018 19:43:21 -0500

The following is a list of popular bioinformatics educational resources:

Bii.a-star.edu.sg

Bio research and development. Has course information and research information.

Isb-sib.ch

SIB operates the ExPASy proteomics server and the Swiss node of EMBnet. Teaching activities include a series of post-graduate courses given at the Universities of Geneva and Lausanne, as well as at the EPFL, and a Masters Degree in bioinformatics. Major research areas include the development of integrated databases and software resources in the field of proteomics.

Bioinformatics.ca

Provides information about bioinformatics in Canada. Workshops, certification and resources.

Chickscope.beckman.uiuc.edu

Students raise chicken embryos in the classroom and obtain magnetic resonance images through the Internet.

Bcb.iastate.edu

Program at Iowa State University offering an undergraduate major (BCBio) and a PhD program (BCB).

Bu.edu/bioinformatics/

Interdisciplinary PhD and Masters programs that include an internship at local industry companies. In conjunction with the NE masters program.

Bioinformatics.ubc.ca

A computational biology research centre covering many areas of genomics, proteomics, computer science and statistics. Research, training, news and events, resources and support, director's message, faculty and personnel.

Openhelix.com

Provides onsite training on specific bioinformatics databases and tools. Also offers bioinformatic software testing and research consulting services.

Igb.uci.edu

Specializing in making publicly available software and database services for computational biology.

Bioinformatics.pe.kr

Maintained by Dr. Seyeon Weon, Korea. Provides information on courses, a database archive, a software archive and online resources.

Groups.yahoo.com/group/bimatics/

Bioinformatics group for students interested in and/or working in the bioinformatics/computational biology fields. Offers opportunities to exchange information and share ideas.

Ncbi.nlm.nih.gov/books/NBK22183/

Information about several medically important genes and related diseases. Illustrates the use of bioinformatics in their study.

Bioinfo.mbb.yale.edu/mbb452a/2003/

Bioinformatics course at Yale University. All course slides are available online.

Cs.iastate.edu/~honavar/comp-bio-courses.html

Listing of computational molecular biology course pages that have extensive online course materials.

Bioinf.manchester.ac.uk/dbbrowser/bioactivity/prefacefrm.html

A web-based tutorial associated with "Introduction to bioinformatics" published by Addison Wesley Longman.

Northeastern.edu/bioinformatics/

From the Biology department and in cooperation with Boston University. Emphasis on the ability to integrate knowledge from biological, computational, and mathematical disciplines.

Biocomp.unibo.it/lsbioinfo/

A two year, international master's programme in bioinformatics at the Universita di Bologna, Italy.

Cs.helsinki.fi/bioinformatiikka/mbi/programme.html

A two year Masters Degree Programme in Bioinformatics (MBI) offered by the University of Helsinki and Helsinki University of Technology, Finland.

Ornl.gov/sci/techresources/Human_Genome/education/education.shtml

A resource for introductory information on the Human Genome Project.

His.se/bioinformatics

A one-year, international master's programme in bioinformatics at the University of Skovde, Sweden.

Members.tripod.com/C.elegans/

Resources in biochemical, molecular, cellular, system, and organism biology, including over 25,000 indexed links, accumulated since 2000, from topic menus or from search interface.

Bioinformatics.org/faq/#contents

Summary of basics of bioinformatics for the intelligent newcomer.

Jiscmail.ac.uk/archives/bioinformatics.html

Forum featuring various aspects, events and developments in the bioinformatics field.

Biinoida.blogspot.com

Blog focusing on bioinformatics, biotechnology, pharma regulatory affairs, IPR and clinical trials.

Colorbasepair.com/bioinformatics_courses_tutorials.html

A list of on-line course materials and tutorials for bioinformatics and computational biology.

Geospiza.com/education/

Instructional materials for teaching bioinformatics. These include animated tutorials on topics such as BLAST, finding mutations in a protein, and graphing with MS-Excel.

Bioinformatics.fi

An international, two-year Master's programme jointly managed by the University of Tampere and the University of Turku, Finland.

Perlsource.net

Provides online courses in Perl programming for bioinformatic tools.

Structural polymorphism analysis from NGS data

Sat, 13 Jul 2013 17:12:47 -0500

The LabEx BASC (Biodiversity, Agroecosystems, Society, Climate), a network of 13 laboratories of the Paris-Saclay Scientific Cluster, is seeking a bioinformatician to analyze Next Generation Sequencing (NGS) data. In the context of a flagship project aiming at understanding and improving the adaptive capacity of agroecosystems, it will be critical to establish a link between sequence variation, functional variation, gene/protein expression and phenotypic adaptation.

The successful candidate will be in charge of the detection of polymorphisms including structural variants, of the comparison of multiple and diverse genomes of the same species and of the construction of pan- and core-genomes. These challenging tasks will require bioinformatics developments and implementation of methods for accommodating the high level of repetitiveness of complex genomes. The tools will be integrated into pipelines and made available to end-users through the Galaxy platform. The bioinformatician will therefore also have to provide researchers with advice on their experimental designs in order to ensure compliance of produced datasets with pipeline requirements. He/she will be hosted by a bioinformatics/informatics team (7 people) (http://moulon.inra.fr/index.php/fr/equipestransversales/atelier-de-bioinformatique) which has computational facilities and expertise in NGS data analysis, and will benefit as well from national and international collaborative networks (Aplibio http://www.renabi.fr/platforms/aplibio/, Transplant http://transplantdb.eu, AMAIZING http://www.amaizing.fr/).

The position requires a doctoral degree (PhD) in bioinformatics with strong expertise in script writing (Python/Perl) and pipeline development.

Applicants should send a CV and the names of 2 referees willing to provide a letter of recommendation to joets@moulon.inra.fr.

Postdoc Positions - Mammalian Transcriptome Evolution at SIB

Mon, 12 Aug 2013 19:58:33 -0500

BIOINFORMATICS POSTDOC IN FUNCTIONAL EVOLUTIONARY GENOMICS

Center for Integrative Genomics, University of Lausanne, Switzerland

Two postdoctoral positions (2 years with possible extensions up to 5 years) are available immediately in the evolutionary genomics group of Henrik Kaessmann.

We are seeking highly qualified and enthusiastic applicants with strong skills in computational biology/bioinformatics, preferably also with experience in data mining and comparative or evolutionary genome analysis.

We have been interested in a range of topics related to the functional evolution of genomes from primates (e.g., the emergence of new genes and their functions) and other mammals (e.g., the origin and evolution of mammalian sex chromosomes). In the framework of a recently launched series of projects, a large amount of transcriptome and genome (e.g., epigenome) data are being produced by the wet lab unit of the group using next generation sequencing technologies for a unique collection of tissues from representative mammals and outgroup species (e.g., birds). Topics of current projects based on these data include the origins and/or evolution of protein-coding genes, alternative splicing, microRNAs, long noncoding RNAs, and dosage compensation.

The postdoctoral fellow will perform integrated evolutionary/bioinformatics analyses based on data produced in the lab and available genomic data. The specific project will be developed together with the candidate.

The language of the institute is English, and its members form an international group that is rapidly expanding. The institute is located in Lausanne, a beautiful city at Lake Geneva.

For more information on the group and our institute more generally, please refer to our website: http://www.unil.ch/cig/page7858_en.html

Please submit a CV, statement of research interest, and names of three references to: Henrik Kaessmann (Henrik.Kaessmann@unil.ch).

Webpage : http://www.unil.ch/cig/page7858.html

Research Assistant @ NATIONAL BUREAU OF ANIMAL GENETIC RESOURCES

Tue, 03 Dec 2013 06:17:34 -0600

NATIONAL BUREAU OF ANIMAL GENETIC RESOURCES
Near Basant Vihar G.T. Road Bypass
P.O. Box No.129, Karnal-132001 (Haryana)

WALK-IN-INTERVIEW

A walk-in-Interview is proposed to be held at National Bureau of Animal Genetic Resources, Karnal (Haryana)-132001 at 11:30 AM on 18.12.2013 to select One RA and One SRF as per details given below:

1. One post of Research Associate under DBT sponsored Support under BIPP for the “SanGenix: A comprehensive Next Generation Sequence (NGS) data analysis solution” as Grants in AID. Thepost duration is Upto 31st March 2015 or earlier.

2. One post of Senior Research Fellow under NAIP (Component-4), Bioprospecting of genes and allele mining for abiotic stress tolerance. The post duration is up to 31st March 2014 or earlier.

Essential Qualifications: Ph.D. in Bioinformatics/ Computer Application or
First Class Masters degree in Bioinformatics/ Computer Application with two years experience as evidenced by Publications.

Desirable: Experience in the field of handling Next generation Sequencing Data.

Emolument: Rs. 22,000/- per month + HRA as per admissibility

Age Limit:

40 years for Men
45 years for women as on date of interview

Research Associate: ONE

Duration of engagement: Up to 31st March 2015 or earlier, and co-terminus with the project

Responsibilities: To help the PI for Beta testing and development of the SanGenix Tool for NGS data.

Essential Qualifications: First Class Masters’ degree in Bioinformatics/Biotechnology.

Desirable: Experience in the field of Biotechnology/ Bioinformatics

Emoluments:

Rs. 16,000/- per month + HRA as per admissibility.
Senior Research Fellow: ONE
Duration of engagement: Up to 31st March 2014 or earlier, and co-terminus with the project

Age Limit

35 years for men
40 years for women as on date of interview

Note: Relaxation in age will be admissible for SC/ST & OBC candidates as per Govt. of India /ICAR norms

1. The applicants must bring with them original documents and brief of research work done during post graduation along with a set of photocopy and latest two passport size photographs.
2. A panel of selected candidates will also be made which may be utilized for filling of positions of shorter durations in future if demand arises.
3. Experience certificate in original, if any.
4. The above positions are purely on a temporary basis and are co-terminus with the project. No TA/DA will be paid to attend the interview.
5. Any other clarifications can be had on the date of interview.
6. The Director’s decision will be final and binding in all respects.

Advertisement: http://210.212.93.85/rasrfadvertise.pdf

Special Project Scientist – Sorghum Genomics

Tue, 20 May 2014 00:34:39 -0500

ICRISAT is seeking applications from Indian Nationals for a Special Project Scientist to work on sorghum genomics activities related to sequencing/re-sequencing projects utilizing Next Generation Sequencing platforms.

The Job detail

Advancing the SNP-discovery and polymorphism assessment work across several germplasm panels representing global genetic diversity
Population genetic and genomic analyses, testing the hypothesis related to adaptation in multiple geographic regions
Develop SNP assays from large scale GBS and other re-sequencing data for several target traits utilizing available phenotyping data
Combined analyses of genotypic and phenotypic data for discovery of marker-trait associations, and conducting GWAS
Processing, analyzing, and archiving large-scale genomic data sets, assessing data quality, conducting analyses, interpreting findings, and communicating findings to others including preparation of reports, presentations, posters and journal articles
Providing support to MSc and PhD students on topic related to its major core of research
Any other work assigned by the supervisor

The Person:

PhD in bioinformatics, genetics, computational biology preferably with 1 to 2 years of experience;
familiar with standard bioinformatics tools and scripting languages and emerging and evolving software platforms relevant to bioinformatics and computational biology;
ability to create new analytical pipelines; experience with handling large data sets;
ability to program in at least two of the following: C++, PERL, Python, R, Java.
The candidate will use next-generation sequencing technologies to generate marker data for genetic mapping and transcriptome data for expression QTL mapping, and will be responsible for data generation as well as data analysis.

Period and Remuneration: The assignment is for a period of two years, and can be extended for another year depending on performance. ICRISAT pays a very attractive all inclusive lump sum assignment fee payable in Indian Rupees.

How to Apply: Please send your application by email to icrisatjobs@cgiar.org, stating the job title (Special project Scientist-Sorghum Genomics) clearly in the subject column, addressed to the Director, Human Resources and Operations, ICRISAT, Patancheru, Andhra Pradesh 502 324, India, latest by 10 June 2014. The application should include an up-to-date Curriculum Vitae, a short statement of competencies and experience for the position, and the names and addresses (including phone/e-mail) of three referees. Only short-listed candidates will be contacted.

More at: http://www.icrisat.org/careers/Special-Project-Scientist-Sorghum-Genomics.htm

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and the sequencing data they produce, most bioinformaticians spend much of their time munging and wrangling massive amounts of genomics text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#Add 7 newlines after every line (7 blank lines between lines).

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#Collapse runs of consecutive blank lines into a single blank line (make the file single spaced).

perl -00 -pe '$_.="\n"x4'
#Replace every run of blank lines with a fixed, wider gap (4 extra newlines after each paragraph)

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print lines that match a pattern, each prefixed with its line number left-aligned in a 5-character column (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that match the pattern

perl -alne 'print scalar @F'
#print the total number of fields (words) in each line.

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (15 itself is never produced; use rand($max-$min+1) to include it).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8-character password from the letters a-z and the digits 0-9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random nucleotide sequence 20 bases long.

perl -le 'print "a"x50'
#generate a string of 50 'a' characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ASCII value of each character in the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 positions)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Lowercase each line, then convert only its first letter to uppercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Camel casing: capitalize the first letter of every word

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into classic Mac OS newlines (carriage return only):

perl -pe '/regexp/ && s/foo/bar/'
#Replace the first foo with bar on every line that matches regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/