BOL: Related items

BioPython Cookbook

Jitendra Narayan — Thu, 08 Aug 2013 06:43:02 -0500

If you are planning to start learning BioPython (it does not bite, it swallows :P just kidding), this online cookbook will be really helpful for you.

http://biopython.org/DIST/docs/tutorial/Tutorial.html

NCBI Prokaryotic Genome Annotation Pipeline

Jit — Tue, 16 May 2017 08:56:03 -0500

The NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see Pubmed Article), developed in 2005, has been replaced with an upgraded version that is capable of processing a larger data volume. You can find a more detailed description of the new version of the pipeline in the NCBI Handbook chapter. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

https://www.ncbi.nlm.nih.gov/genome/annotation_prok/


List of pharmacogenomics companies worldwide

Jitendra Narayan — Fri, 09 Aug 2013 13:24:47 -0500

Pharmacogenomics is one of the most promising areas of research. Here is a list of some pharmacogenomics companies worldwide. Feel free to add more pharmacogenomics companies that are not listed here.

Great Pharmacogenomics companies
www.aruplab.com
www.clarientinc.com
www.cns-hts.com
www.dnanow.com
www.dnavision.be
www.dnavision.com
www.dxsdiagnostics.com
www.entrogen.com
www.exiqon.com
www.gene.com
www.genomichealth.com
www.genoptix.com
www.genpathdiagnostics.com
www.gentris.com
www.immunicon.com
www.ingenuity.com
www.lab21.com
www.labcorp.com
www.lion-ag.de
www.lynxgen.com
www.mayoclinic.com
www.mesoscale.com
www.microcide.com
www.mitokor.com
www.monarchlifesciences.com
www.mplnet.com
www.orchidbio.com
www.pebio.com
www.phenomenome.com
www.phenopath.com
www.ppgx.com
www.prometheuslabs.com
www.protogene.com
www.questdiagnostics.com
www.rigelinc.com
www.rii.com
www.saladax.com
www.tmdlab.com
www.transgenomic.com
www.twt.com
www.uslabs.net
www.variagenics.com

Great Equipment Companies for Genomics
www.affymetrix.com
www.illumina.com
www.iontorrent.com
www.sequenom.com
www.appliedbiosystems.com
www.454.com

Genomics in India
www.ganitlabs.in
www.sandor.co.in
www.igib.res.in
www.genotypic.co.in
www.ocimumbio.com
www.abcgenomics.com
www.xcelrisgenomics.com
www.ayugen.com
www.geneombiotech.com

Large Global Whole Genome Companies
www.decode.com
www.23andme.com
www.navigenics.com
www.pathway.com

Global companies offering genomics services
www.asuragen.com
www.baseclear.com
www.agtcenter.com
www.ambrygen.com
www.arosab.com
www.agrf.org.au
www.beckmangenomics.com
www.genomics.cn
www.bsf.a-star.edu.sg
www.cbm.fvg.it
www.cincinnatichildrens.org
www.cofactorgenomics.com
www.covance.com
www.dnalandmarks.ca
www.dnavision.com
www.expressionanalysis.com
www.fasteris.com
www.gatc-biotech.com
www.genesdiffusion.com
www.geneseek.com
www.geneticvisions.com
www.geneworks.com.au
www.genizon.com
www.genoskan.dk/uk
www.gpbio.jp
www.igatechnology.com
www.igenixinc.com
www.auxologico.it
www.lifeandbrain.com
www.macrogen.co.kr/eng
www.gqinnovationcenter.com
www.mftservices.de
www.ncgr.org
www.ramaciotti.unsw.edu.au
www.rikengenesis.jp
www.SABiosciences.com
www.sequensysbio.com
www.servicexs.com
www.snp-genetics.com
www.takara-bio.com
www.gen-probe.com
www.traitgenetics.com

InterpretOmics

Jitendra Narayan — Sun, 11 Aug 2013 10:24:33 -0500

InterpretOmics, a big data analytics startup that focuses on life sciences, has received angel funding of around Rs 10 crore from a group of investors including Singapore's information technology and shipping company, Amarante.

http://www.interpretomics.co/

The 10th North East Bioinformatics Network (NEBINet) Annual Coordinators' Meet

Jit — Sat, 18 Nov 2017 15:02:44 -0600

The 10th North East Bioinformatics Network (NEBINet) Annual Coordinators' Meet organised by the Bioinformatics Centre, St Edmund's College, Shillong and sponsored by the Department of Biotechnology, Government of India, was held at St Edmund's College Auditorium here on Thursday. Meghalaya Governor Ganga Prasad graced the inaugural programme as chief guest.
In his inaugural address, the Governor said the scientific landscape has changed greatly over the years and the thrust areas have undergone a metamorphosis, but the conceptual underpinning of the basic sciences still continues.
"Of late, the activity of basic research has been intricately intertwined with technology. And we are determined to carry forward this change, for it is through technology that science can actually reach the masses in our country and afar, and the changing times have also inculcated a culture of cross-departmental and interdisciplinary research. Science and technology has always played a pivotal role in taking a nation towards greater heights by ways of innovations and inventions," he added.
Prasad also hoped that the discussions, suggestions and sharing of innovative ideas during the two-day 10th NEBINet Annual Coordinators' Meet would open up new avenues for substantial advancement in the biological sciences and provide a platform for a proper and effective delivery mechanism for the common man.
During the inaugural function, Advisor of Department of Biotechnology Dr T Madhan Mohan gave an overview of the NEBINet and Bioinformatics programme.
President of Epygen Biotech FZ LLC, Dubai, UAE, Dr Debayan Ghosh, delivered the keynote address.
St Edmund's College governing body secretary Brother Simon Coelho and St Edmund's College Principal Dr Sylvanus Lamare also spoke during the function.

Postdoc Positions - Mammalian Transcriptome Evolution at SIB

Mon, 12 Aug 2013 19:58:33 -0500

BIOINFORMATICS POSTDOC IN FUNCTIONAL EVOLUTIONARY GENOMICS

Center for Integrative Genomics, University of Lausanne, Switzerland

Two postdoctoral positions (2 years with possible extensions up to 5 years) are available immediately in the evolutionary genomics group of Henrik Kaessmann.

We are seeking highly qualified and enthusiastic applicants with strong skills in computational biology/bioinformatics, preferably also with experience in data mining and comparative or evolutionary genome analysis.

We have been interested in a range of topics related to the functional evolution of genomes from primates (e.g., the emergence of new genes and their functions) and other mammals (e.g., the origin and evolution of mammalian sex chromosomes). In the framework of a recently launched series of projects, a large amount of transcriptome and genome (e.g., epigenome) data are being produced by the wet lab unit of the group using next generation sequencing technologies for a unique collection of tissues from representative mammals and outgroup species (e.g., birds). Topics of current projects based on these data include the origins and/or evolution of protein-coding genes, alternative splicing, microRNAs, long noncoding RNAs, and dosage compensation.

The postdoctoral fellow will perform integrated evolutionary/bioinformatics analyses based on data produced in the lab and available genomic data. The specific project will be developed together with the candidate.

The language of the institute is English, and its members form an international group that is rapidly expanding. The institute is located in Lausanne, a beautiful city at Lake Geneva.

For more information on the group and our institute more generally, please refer to our website: http://www.unil.ch/cig/page7858_en.html

Please submit a CV, statement of research interest, and names of three references to: Henrik Kaessmann (Henrik.Kaessmann@unil.ch).

Webpage : http://www.unil.ch/cig/page7858.html

Bioinformatician needs ten heads !!!

Jitendra Narayan — Sat, 17 Aug 2013 10:30:45 -0500

Bioinformatics demands more and ... lots more knowledge. In that case only Ravan, the ten-headed mythological character from the Ramayan, can be a real bioinformatician. :) :P

The Brent Lab

Fri, 09 Feb 2018 10:55:27 -0600

The Brent Lab is developing and applying computational methods for mapping gene regulation networks, modeling them quantitatively, and engineering new behaviors into them.

Best practices in bioinformatics training for life scientists

Jitendra Narayan — Tue, 13 Aug 2013 15:47:34 -0500

Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts.

Find the full paper at http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full


Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
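
For cross-checking the awk output, the same per-sequence length count can be sketched in Python. This is only an illustration: the function name fasta_lengths is made up here, and the input is assumed to be an iterable of FASTA lines with unique headers.

```python
def fasta_lengths(lines):
    """Return {header: total sequence length} for FASTA lines.

    Hypothetical helper, assuming every header is unique.
    """
    lengths = {}
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            # Sequence may be wrapped over several lines, so accumulate.
            lengths[header] += len(line)
    return lengths


example = [">seq1 demo", "ACGT", "AC", ">seq2", "GGG"]
for name, length in fasta_lengths(example).items():
    print(name, length)
```

To run it on a real file, pass an open file handle instead of the example list, e.g. fasta_lengths(open("filename.fasta")).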

To copy (move, rename, etc) files based on their list in a text file:

while IFS= read -r line; do cp "$line" complete_dataset/"$line"; done < file_list.txt

To split a bam file into sets with mapped and unmapped reads (-b keeps the output in BAM format, header included):

samtools view -b -F 4 sample.bam > sample.mapped.bam
samtools view -b -f 4 sample.bam > sample.unmapped.bam
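
The -F and -f options select reads on bits of the SAM FLAG field; 4 is the "read unmapped" bit. As a rough sketch of what that test amounts to (the names below are illustrative, not part of samtools):

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "read unmapped"


def is_unmapped(flag):
    """True if the unmapped bit is set in a SAM FLAG value."""
    return bool(flag & UNMAPPED)


# A few example FLAG values (illustrative, not taken from a real file).
flags = [0, 4, 16, 77, 99, 141]
mapped = [f for f in flags if not is_unmapped(f)]  # what -F 4 keeps
unmapped = [f for f in flags if is_unmapped(f)]    # what -f 4 keeps
print("mapped:", mapped)
print("unmapped:", unmapped)
```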

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep -c "^>" yourfile.fasta

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do echo "$a"; grep -c "^>" "$a"; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ - | bc
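
The cut step relies on SPAdes-style headers such as >NODE_1_length_1000_cov_12.5, where the fourth underscore-separated field is the contig length. A minimal Python sketch of the same sum, assuming that header layout (the function name is hypothetical):

```python
def assembly_size(lines):
    """Sum the length field of SPAdes-style headers (NODE_<n>_length_<len>_cov_<cov>)."""
    total = 0
    for line in lines:
        if line.startswith(">"):
            # Fields: NODE, index, "length", <length>, "cov", <coverage>
            total += int(line.split("_")[3])
    return total


example = [">NODE_1_length_1000_cov_12.5", "ACGT", ">NODE_2_length_250_cov_3.1", "GG"]
print(assembly_size(example))
```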

To remove everything after the first space on each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

echo $(( $(zcat *.fastq.gz | wc -l) / 4 ))

To count reads in all .fastq files in your current folder:

echo $(( $(cat *.fastq | wc -l) / 4 ))

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File

To convert Illumina FASTQ 1.3 to 1.8:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq
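
The long character table in the sed command is just a shift of every quality character by 31 ASCII positions (Phred+64 to Phred+33). A small Python sketch of that idea follows; the function names are made up for illustration, and unlike the sed table (which covers only '@' to 'i') it shifts any character at or above '@'.

```python
def phred64_to_phred33(quality):
    """Shift a Phred+64 quality string down by 31 to get Phred+33."""
    converted = []
    for char in quality:
        code = ord(char) - 31
        if code < 33:
            raise ValueError("not a Phred+64 quality character: %r" % char)
        converted.append(chr(code))
    return "".join(converted)


def convert_fastq(lines):
    """Convert only the 4th line of each 4-line FASTQ record."""
    return [phred64_to_phred33(line) if i % 4 == 3 else line
            for i, line in enumerate(lines)]


record = ["@read1", "ACG", "+", "@hi"]
print(convert_fastq(record))
```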

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p'
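
Spelled out in Python, the conversion keeps lines 1 and 2 of every 4-line record and swaps the leading @ for >. This is a sketch that assumes strict 4-line records; the function name is hypothetical, and it raises an error on a malformed header where the sed command would silently skip it.

```python
def fastq_to_fasta(lines):
    """Turn 4-line FASTQ records into FASTA header/sequence pairs."""
    fasta = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError("expected a FASTQ header, got: %r" % line)
            fasta.append(">" + line[1:])
        elif i % 4 == 1:
            fasta.append(line)
        # Lines 3 and 4 of each record ("+" and qualities) are dropped.
    return fasta


print(fastq_to_fasta(["@r1", "ACGT", "+", "IIII", "@r2", "GG", "+", "II"]))
```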

To get fastq read length distribution:

awk 'NR%4==2 {print length($1)}' reads.fastq | sort -n | uniq -c
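
The same tally can be sketched in Python with collections.Counter, assuming plain 4-line FASTQ records (the function name is only illustrative):

```python
from collections import Counter


def read_length_distribution(lines):
    """Count how many reads have each length (2nd line of every 4-line record)."""
    return Counter(
        len(line.rstrip("\n"))
        for i, line in enumerate(lines)
        if i % 4 == 1
    )


example = ["@r1", "ACGT", "+", "IIII",
           "@r2", "AC", "+", "II",
           "@r3", "GGTT", "+", "IIII"]
for length, count in sorted(read_length_distribution(example).items()):
    print(count, length)
```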

To deinterleave interleaved fastq file:

cat myfile.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myfile_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myfile_2.fq
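
The paste trick groups eight lines at a time: the first four are read 1, the last four are read 2. A Python sketch of the same grouping, assuming a well-formed interleaved file (the function name is hypothetical):

```python
def deinterleave(lines):
    """Split interleaved FASTQ lines into (read1_lines, read2_lines)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 8 != 0:
        raise ValueError("interleaved FASTQ must have a multiple of 8 lines")
    read1, read2 = [], []
    for start in range(0, len(lines), 8):
        read1.extend(lines[start:start + 4])      # first mate of the pair
        read2.extend(lines[start + 4:start + 8])  # second mate of the pair
    return read1, read2


pair = ["@p1/1", "ACGT", "+", "IIII", "@p1/2", "TTGA", "+", "IIII"]
r1, r2 = deinterleave(pair)
print(r1)
print(r2)
```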

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 + coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25):

cat your.fastq | paste - - - - | awk 'length($2)  >= 21 && length($2) <= 25' | sed 's/\t/\n/g' > filtered.fastq
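
A Python sketch of the same length filter, assuming 4-line records; the function name and its min_len/max_len parameters are illustrative, with defaults set to the 21 and 25 used above:

```python
def filter_fastq_by_length(lines, min_len=21, max_len=25):
    """Keep only records whose sequence length is within [min_len, max_len]."""
    lines = [line.rstrip("\n") for line in lines]
    kept = []
    for start in range(0, len(lines) - 3, 4):
        record = lines[start:start + 4]
        if min_len <= len(record[1]) <= max_len:
            kept.extend(record)
    return kept


reads = ["@a", "A" * 20, "+", "I" * 20,
         "@b", "A" * 21, "+", "I" * 21,
         "@c", "A" * 26, "+", "I" * 26]
print(filter_fastq_by_length(reads))
```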

To print difference between the last and first row in 5th column:

awk '{if (!first){first=$5;}; last=$5;} END {print last-first}' myfile.txt

To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta
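
Because sequences can be wrapped over several lines, the awk script has to keep a running count per record. The same bookkeeping in a Python sketch (function name and the limit parameter are illustrative):

```python
def truncate_fasta(lines, limit=200):
    """Keep at most `limit` bases per record, preserving line wrapping."""
    out = []
    remaining = limit
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            out.append(line)
            remaining = limit  # reset the budget for each new record
        elif remaining > 0:
            piece = line[:remaining]
            out.append(piece)
            remaining -= len(piece)
    return out


example = [">s1", "A" * 150, "C" * 150, "G" * 10, ">s2", "T" * 30]
for row in truncate_fasta(example):
    print(row[:20], len(row))
```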

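To pipe a compressed fasta file directly into makeblastdb (-dbtype is mandatory, and -title and -out are needed when reading from stdin; nucl and mydb here are placeholder values):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb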

To remove sequences with duplicate fasta headers from a fasta file:

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta
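
Note that the awk script compares only the first whitespace-separated field of each header ($1), so two headers with the same ID but different descriptions count as duplicates and the first one wins. A Python sketch of that logic (the function name is hypothetical):

```python
def drop_duplicate_records(lines):
    """Keep the first record for each header ID (first field of the header)."""
    seen = set()
    keep = False
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            key = line.split()[0]  # ID only, description ignored
            keep = key not in seen
            seen.add(key)
        if keep:
            out.append(line)
    return out


example = [">a first copy", "AC", ">b", "GG", ">a second copy", "TT"]
print(drop_duplicate_records(example))
```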