BOL: Related items

Perl one-liners for bioinformaticians !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the rise of NGS technologies and the sequencing data they produce, most bioinformaticians spend their time munging and wrangling massive amounts of genomics text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small Perl programs that fit on a single line and do one thing really well. Typical jobs include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in place, computing statistics, carrying out system administration tasks, and updating a bunch of files at once. Perl one-liners will make you a shell warrior: tasks that used to take minutes can take seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#Insert 7 blank lines after every line

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#Collapse each run of consecutive blank lines into a single blank line

perl -00 -pe '$_.="\n"x4'
#Collapse each run of blank lines to one and add 4 more newlines, leaving 5 blank lines between paragraphs

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print matching lines prefixed with their line number, left-aligned in a 5-character field (perl -ne 'printf "%-5d %s", $., $_' does the same for all lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that match the pattern

perl -alne 'print scalar @F'
#print the number of fields (words) on each line

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t+0 }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t+0 }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#Calculate the GCD (greatest common divisor) of 20 and 35 with Euclid's algorithm; prints 5

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#Calculate the LCM (least common multiple) of 20 and 35; prints 140

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generate 10 random integers from 5 to 14 (the upper bound 15 is never returned)

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generate an 8-character password from the letters a to z and the digits 0 to 9

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generate a random DNA sequence 20 nucleotides long

perl -le 'print "a"x50'
#Generate a string of 50 "a" characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Print the ASCII code of each character in the string "hello world"

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#Convert a list of ASCII codes into a string (prints "coding")

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generate and print the odd numbers from 1 to 100

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate and print the even numbers from 1 to 100

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#Apply ROT13 (shift every letter by 13 places) to the whole file

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Capitalize the first letter of each line and lowercase the rest

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Capitalize the first letter of every word (title case)

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into classic Mac OS (CR) newlines

perl -pe '/regexp/ && s/foo/bar/'
#Replace the first foo with bar on every line that matches regexp

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Bioinformatics algorithms tutorials

John Parker — Tue, 24 Jun 2014 00:10:45 -0500

Useful bioinformatics tutorials, such as:

De Bruijn Graphs for NGS Assembly
Algorithms for PacBio Reads
Software and Hardware Concepts for Bioinformatics
Finding us in Homolog.us (Search Algorithms)
NGS Genome and RNAseq Assembly - a Hands on Primer
Introduction to PERL, Python, R and C/C++ for Bioinformatics

Address of the bookmark: http://www.homolog.us/Tutorials/

COSMOS, our workflow management system for NGS data

Jit — Wed, 23 Jul 2014 07:29:14 -0500

COSMOS is our Python-based management system for implementing large-scale parallel workflows, with a focus on, but not restricted to, large-scale short-read "NGS" sequencing data. It is published open access via Advance Access in Bioinformatics (Gafni et al. 2014) and is also available for download for non-commercial academic and research purposes at:

http://cosmos.hms.harvard.edu/

Address of the bookmark: https://cosmos.hms.harvard.edu/

ArrayGen Bioinformatics Genomics Group

Sun, 28 Sep 2014 14:09:55 -0500

ArrayGen is a global bioinformatics company offering a one-stop solution for microarray design and genomics data analysis. Our novel Array Design Approach Strategy (ADAS) aims to shorten the time lag between the demands of the scientific community and the manufacturing industry, thereby expediting research.

ArrayGen specializes in genomics data analysis and research, because we believe genomics data offers greater precision, predictability, benchmarkability, and analytical power than other forms of biological data. ArrayGen constantly strives to develop new solutions and to plug the existing gaps in the technological advancement of the field.

More at http://www.arraygen.com/

A powerful, yet simple, gene set analysis tool for interpreting RNA-seq and NGS results.

Shani — Thu, 30 Oct 2014 09:19:29 -0500

LifeMap Sciences is introducing GeneAnalytics, our new gene set analysis tool, which is applicable to NGS results and to differentially expressed gene lists from various sources. GeneAnalytics provides gene associations with tissues & cells, diseases, pathways, GO terms and compounds.

Our main advantages over other similar tools are:

GeneAnalytics is very simple and intuitive to use.
GeneAnalytics is based on our proprietary databases (GeneCards, MalaCards, PathCards and LifeMap Discovery), each of which integrates information from a very large number of resources.
GeneAnalytics supplies links for extensive background information on each of the matched results.

I invite you to try it out for free at geneanalytics.genecards.org, and would be happy to hear your comments and thoughts on how we can improve.

Yours,

Shani Ben-Ari Fuchs

LifeMap Sciences Team

Rosalind Bioinformatics problems !!!

Abhi — Thu, 18 Dec 2014 10:32:48 -0600

Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.

http://rosalind.info/problems/list-view/

Address of the bookmark: http://rosalind.info/problems/list-view/

LASTZ

Abhi — Mon, 18 Apr 2016 04:41:55 -0500

LASTZ is a pairwise aligner for DNA sequences. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS technologies such as Roche 454.

More at http://www.bx.psu.edu/~rsharris/lastz/

Thesis: http://www.bx.psu.edu/~rsharris/rsharris_phd_thesis_2007.pdf

Address of the bookmark: http://www.bx.psu.edu/~rsharris/lastz/

Bioinformatics WalkIn at NII

Fri, 04 Sep 2015 21:48:15 -0500

ADVERTISEMENT OF WALK-IN-INTERVIEW

NAME OF THE POST : Bioinformatician (Part time 3 days in a week) (One Position only)

DURATION : One Year

NAME OF THE PROJECT : Next generation sequencing facility

EDUCATIONAL QUALIFICATIONS : At least a Master's degree in Bioinformatics and a Bachelor's degree in any stream of life sciences

REQUIREMENTS :

Around 5 years of experience and a proven track record in next generation sequence data analysis (supported by publications in peer-reviewed journals), with the ability to analyze transcriptomics, ChIP-seq, and small RNA-seq data.

Should have the ability to analyze raw primary data generated by Illumina next generation sequencing platforms and to create and troubleshoot custom analysis pipelines.

Should have the ability to handle all downstream secondary and tertiary data analysis using commercially available as well as open source software (transcriptomics, ChIP-seq, small RNA-seq).

Apart from these, the applicant should have knowledge of the following:

Programming: Perl and Python
Operating system: Linux and Windows
NGS analysis tools: Maq, BWA, Bowtie, SAMtools, BEDTools, MACS, Galaxy, FastQC, Bismark, MEDIPS, TopHat, Cufflinks, AvadisNGS, CLC Genomics Workbench, BaseSpace, Trinity
Statistics: Microsoft Excel and R
Database: MySQL
Genome browser: UCSC, Ensembl, IGV, IGB
Motif analysis tools: MEME Suite, Transfac and RSAT
Functional annotation tools: DAVID, GeneCodis, GeneCards
Networking tools: Cytoscape

EMOLUMENTS : The incumbent will be paid a fee of Rs. 2000/- per sitting/ per day.

SCIENTIST NAME : Dr. Arnab Mukhopadhyay, Staff Scientist V, Next generation sequencing facility

SCIENTIST’S E-MAIL ID : arnab@nii.ac.in

WALK IN INTERVIEW ON : 18th September, 2015

REGISTRATION OF CANDIDATES: 10.30 AM to 11.00 AM

PLEASE NOTE:

1. CANDIDATE MAY FILL UP APPLICATION IN THE PRESCRIBED FORMAT ALONG WITH NECESSARY DOCUMENTS FOR VERIFICATION.
2. APPLICATIONS CONTAINING INCOMPLETE INFORMATION SHALL NOT BE ENTERTAINED.
3. DATE OF PASSING THE EXAMINATIONS MUST BE INDICATED CLEARLY.
4. ONLY REGISTERED CANDIDATES WILL BE INTERVIEWED.
5. NO TA/DA WILL BE PAID FOR ATTENDING THE INTERVIEW.

PRESCRIBED FORM

1. NAME
2. FATHER'S NAME
3. MOTHER'S NAME
4. DATE OF BIRTH
5. SEX (MALE/FEMALE)
6. CATEGORY (SC/ST/OBC/PH)
7. ADDRESS a. (CORRESPONDENCE) b. (PERMANENT)
8. E MAIL, TELEPHONE NO. & MOBILE No (if any)
9. ACADEMIC & PROFESSIONAL QUALIFICATIONS (columns: NAME OF EXAMINATION PASSED WITH SUBJECTS; YEAR OF PASSING; BOARD/UNIVERSITY; PERCENTAGE/DIVISION; REMARKS)
10. PAST EXPERIENCE & PRESENT EMPLOYMENT, IF ANY
11. CANDIDATES SHOULD STATE CLEARLY WHETHER THEY HAVE BEEN AWARDED PH.D DEGREE OR THESIS HAS BEEN SUBMITTED.
12. HAVE YOU APPLIED FOR A POSITION EARLIER IN THE INSTITUTE? IF SO: (1) THE DETAILS OF THE PROJECT AND PROJECT INVESTIGATOR (2) IF CALLED FOR INTERVIEW, RESULTS THEREOF

More at http://www1.nii.res.in/sites/default/files/walkininterview-18sept2015.pdf

Katju Lab

Fri, 26 Feb 2016 03:25:32 -0600

The lab seeks to understand the genetic factors contributing to genomic variation and phenotypic diversity. To this end, we employ molecular and bioinformatic tools to study evolutionary processes at the level of populations, both experimental and natural, and genomes. Our research interests encompass a wide range of topics, including the evolution of organellar and nuclear genomes, gene duplication and the origin of novel function, and the fitness and phenotypic consequences of mutation in evolution. For details regarding ongoing projects, please see the Research page.

http://katjulab.com/research.html

SCALCE

Surabhi Chaudhary — Fri, 15 Apr 2016 05:09:51 -0500

SCALCE (/skeɪlz/, a.k.a. boosting Sequence Compression Algorithms using Locally Consistent Encoding) is a tool for compressing FASTQ files. It is designed specifically for Illumina-generated FASTQ files, but supports any valid FASTQ with consistent read lengths.

More at http://sfu-compbio.github.io/scalce/

Address of the bookmark: http://sfu-compbio.github.io/scalce/