BOL: Related items

sequenceserver

Jit — Fri, 10 Mar 2017 08:51:55 -0600

SequenceServer lets you rapidly set up a BLAST+ server with an intuitive user interface for use locally or over the web.

More at http://sequenceserver.com.

Address of the bookmark: https://github.com/wurmlab/sequenceserver

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Jit — Wed, 19 Apr 2017 10:09:51 -0500

Our work is published in Scientific Reports:

Ye, C. et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. Sci. Rep. 6, 31900; doi: 10.1038/srep31900 (2016).

http://www.nature.com/articles/srep31900

The manual can be downloaded from:

https://github.com/yechengxi/DBG2OLC/raw/master/Manual.docx

To use precompiled versions, please go to:

https://github.com/yechengxi/DBG2OLC/tree/master/compiled

Address of the bookmark: https://github.com/yechengxi/DBG2OLC

SSPACE

Jit — Fri, 05 May 2017 05:42:15 -0500

SSPACE standard is a stand-alone program for scaffolding pre-assembled contigs using NGS paired-read data. It is unique in offering the possibility to manually control the scaffolding process. By using the distance information of paired-end and/or mate-pair data, SSPACE is able to assess the order, distance and orientation of your contigs and combine them into scaffolds. Currently we offer this as a command-line tool in Perl. The input data is given by pre-assembled contig sequences (FASTA) and NGS paired-read data (Illumina/454/SOLiD FASTA or FASTQ). The final scaffolds are provided in FASTA format.

Address of the bookmark: https://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE

Bacterial genome assembly !!

Jit — Fri, 05 May 2017 06:11:22 -0500

This tutorial will serve as an example of how to use free and open-source genome assembly and secondary scaffolding tools to generate high quality assemblies of bacterial sequence data. The bacterial sample used in this tutorial will be referred to simply as “Species” since it is live data. This data is paired-end data, meaning that there are forward and reverse reads, which we will designate as Sample_R1.fastq and Sample_R2.fastq, respectively.

https://github.com/jennomics/WorkflowPaper/blob/master/Genome%20Assembly%20and%20Annotation.md

Address of the bookmark: http://bioinformatics.uconn.edu/bacterial-genome-assembly-tutorial/

NAViGaTOR: Network Analysis, Visualization and Graphing Toronto

Rahul Nayak — Tue, 03 Jul 2018 05:05:55 -0500

NAViGaTOR – Network Analysis, Visualization, & Graphing TORonto is a software system for scalable visualization and analysis of networks. The current version, NAViGaTOR 3, increases modularity, improves scalability, extends input/output options, and brings new network views and analysis algorithms. http://142.150.188.236/navigatorwp/

Address of the bookmark: http://142.150.188.236/navigatorwp/

DIYA: a bacterial annotation pipeline for any genomics lab

Jit — Fri, 30 Jun 2017 08:48:26 -0500

DIY Genomics is an open source bioinformatics consortium intended to bring a collection of tools and libraries into the hands of small-scale genomics labs for the process of sequence assembly and annotation. Projects include DIYA, MGAP, CRISPR, and DIYGV.

http://gmod.org/wiki/Diya

Address of the bookmark: https://sourceforge.net/projects/diyg/

RAVEN: a software suite for Matlab that allows for semi-automated reconstruction of genome-scale models

Jit — Wed, 24 Oct 2018 22:38:05 -0500

The RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox 2 is a software suite for Matlab that allows for semi-automated reconstruction of genome-scale models (GEMs). It makes use of published models and/or the KEGG and MetaCyc databases, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology.

Address of the bookmark: https://github.com/SysBioChalmers/RAVEN

Sr. Bioinformatics Analyst (NGS) at Ocimum

Fri, 17 Nov 2017 07:50:44 -0600

JOB FUNCTION: Bio Tech/R&D/Scientist
INDUSTRY: Biotechnology/Pharmaceutical/Medicine
SPECIALIZATION: Basic Research, Bio-Statistician, Clinical Research
QUALIFICATION:
Any Post Graduate
BA (Arts), B.Com. (Commerce), BE/ B.Tech (Engineering), B.Pharm. (Pharmacy), B.Sc. (Science), BL/LLB, BDS (Dental Surgery), B.Ed. (Education), BHM (Hotel Management), BBA/ BBM/ BBS, B.Arch. (Architecture), BCA (Computer Application), Diploma-Other Diploma, B.Plan. (Planning), BGL, B.V.Sc. (Veterinary Science), Other School/ Graduation, BHMS (Homeopathy), BAMS (Ayurveda)
Job Description

1. Must have a basic understanding of molecular biology and genomics.
2. Experience in application development, or expertise in programming in either Perl or Python.
3. Experience in statistical programming using R/Bioconductor/Matlab.
4. Strong grasp of statistical and mathematical modelling.
5. Experience in designing and developing bioinformatics pipelines.
6. Must have at least 2 years of hands-on experience in NGS data analysis such as RNA-Seq, Exome-Seq, ChIP-Seq and downstream analysis.
7. Knowledge of WGS, WES, targeted re-sequencing, GWAS and population genomics will be preferred.
8. Must have experience working with open-source software/frameworks and commercial software for NGS data analysis and reporting.
9. Should be comfortable handling big data and guiding team members on multiple projects simultaneously.
10. Should have experience coordinating with different groups of clinical research scientists for various project requirements.
11. Ability to work in a team as well as independently with minimal support.

More at http://www3.ocimumbio.com/

Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
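
The command above prints each header followed by its length on the next line, and it skips the length of any sequence that is empty. As a rough sketch of a tabular variant (one "name TAB length" line per sequence), assuming headers start with ">" and using a hypothetical function name and made-up demo data:

```shell
# Hypothetical helper: print "name<TAB>length" for each record of a FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($0, 2); len = 0
            next
        }
        { len += length($0) }                     # sequences may span several lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Made-up two-record FASTA file for illustration.
demo=$(mktemp)
printf '>seq1\nACGT\nACG\n>seq2\nAC\n' > "$demo"
fasta_lengths "$demo"
rm -f "$demo"
```

The tabular form is easier to feed into sort or into the column-average one-liner shown earlier.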

To copy (move, rename, etc) files based on their list in a text file:

while IFS= read -r line; do cp "$line" complete_dataset/"$line"; done < file_list.txt

To split bam files into sets with mapped and unmapped reads:

samtools view -F4 sample.bam > sample.mapped.sam
samtools view -f4 sample.bam > sample.unmapped.sam

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep -c "^>" yourfile.fasta

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do echo "$a"; grep -c "^>" "$a"; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'
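
How this works: seen[$0]++ returns the count before incrementing, so it is 0 (false) the first time a line appears and non-zero afterwards; negating it makes awk print only first occurrences. Order is preserved and no sorting is needed. A small illustration with made-up input, wrapped in a hypothetical function name:

```shell
# Hypothetical wrapper around the one-liner; reads files given as arguments or stdin.
dedup_lines() {
    awk '!seen[$0]++' "$@"
}

# Made-up input with repeats; the first copy of each line is kept, in input order.
printf 'chr1\nchr2\nchr1\nchr3\nchr2\n' | dedup_lines
```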

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc
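
This relies on SPAdes naming records like NODE_1_length_1000_cov_15.5, so the fourth "_"-separated field is the length. A sketch of the same sum done in awk alone (no paste or bc needed), assuming that header layout; the function name and demo file are made up:

```shell
# Hypothetical helper: add up the length field of SPAdes-style FASTA headers.
spades_total_length() {
    # Split header lines on "_" and sum the fourth field (the length).
    awk -F'_' '/^>/ { total += $4 } END { printf "%d\n", total }' "$1"
}

# Made-up assembly with two records of declared length 1000 and 500.
demo=$(mktemp)
printf '>NODE_1_length_1000_cov_15.5\nACGT\n>NODE_2_length_500_cov_8.2\nAC\n' > "$demo"
spades_total_length "$demo"
rm -f "$demo"
```

Note that both versions trust the length written in the header rather than counting bases.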

To remove everything after the first space on each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

zcat *.gz | echo $((`wc -l`/4))

To count reads in all .fastq files in your current folder:

cat *.fastq | echo $((`wc -l`/4))
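
The two commands above work because wc -l inside the backticks inherits the pipe as its standard input; they report a single total across all files. A sketch of a per-file variant for uncompressed files, assuming four lines per read; the function name and demo file are made up:

```shell
# Hypothetical helper: print "file<TAB>read count" for each FASTQ file given.
count_fastq_reads() {
    for f in "$@"; do
        lines=$(wc -l < "$f")
        printf '%s\t%d\n' "$f" "$((lines / 4))"   # four lines per read
    done
}

# Made-up FASTQ file holding two reads.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$tmp/demo.fastq"
count_fastq_reads "$tmp"/*.fastq
rm -rf "$tmp"
```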

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File

To convert Illumina FASTQ 1.3 to 1.8:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p'
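
The 1~4 and 2~4 addresses (first~step) are a GNU sed extension, so the command above may not run with other sed implementations. A sketch of an awk equivalent, assuming strict four-line FASTQ records; the function name and demo reads are made up:

```shell
# Hypothetical helper: convert four-line FASTQ records to FASTA.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading "@" for ">"
         NR % 4 == 2 { print }                      # sequence line
    ' "$@"
}

# Made-up input with two reads.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq_to_fasta
```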

To get fastq read length distribution:

cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort | uniq -c
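
The plain sort above orders lengths as text (so 100 comes before 21). A sketch of a variant that counts inside awk and sorts numerically, printing "length TAB count"; it assumes four-line FASTQ records, and the function name and demo reads are made up:

```shell
# Hypothetical helper: tabulate read lengths, sorted numerically by length.
read_length_histogram() {
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print len "\t" count[len] }' "$@" | sort -n
}

# Made-up input: two reads of length 4 and one of length 8.
printf '@a\nACGT\n+\nIIII\n@b\nGGCC\n+\nIIII\n@c\nACGTACGT\n+\nIIIIIIII\n' | read_length_histogram
```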

To deinterleave interleaved fastq file:

cat myf.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myfile_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myf2.fq
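
The command above needs a shell with process substitution (such as bash). A sketch of an awk-only alternative that writes the first four lines of every eight to one file and the rest to another; it assumes strictly alternating mates with four lines each, and the function and file names are made up:

```shell
# Hypothetical helper: split an interleaved FASTQ file into two mate files.
deinterleave_fastq() {
    # $1: interleaved FASTQ, $2: output for first mates, $3: output for second mates
    awk -v r1="$2" -v r2="$3" '
        (NR - 1) % 8 < 4 { print > r1; next }   # lines 1-4 of each 8-line pair
                         { print > r2 }          # lines 5-8 of each 8-line pair
    ' "$1"
}

# Made-up interleaved file containing one read pair.
tmp=$(mktemp -d)
printf '@p1/1\nAAAA\n+\nIIII\n@p1/2\nCCCC\n+\nIIII\n' > "$tmp/interleaved.fq"
deinterleave_fastq "$tmp/interleaved.fq" "$tmp/reads_1.fq" "$tmp/reads_2.fq"
cat "$tmp/reads_1.fq" "$tmp/reads_2.fq"
rm -rf "$tmp"
```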

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 + coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25):

cat your.fastq | paste - - - - | awk 'length($2)  >= 21 && length($2) <= 25' | sed 's/\t/\n/g' > filtered.fastq
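
The same idea with the length bounds passed as arguments, as a sketch: paste joins each four-line record into one tab-separated line, awk tests the length of the second field (the sequence) and prints the record back as four lines. It assumes header lines contain no tabs; the function name and demo reads are made up:

```shell
# Hypothetical helper: keep reads whose length lies between two bounds (inclusive).
filter_fastq_by_length() {
    # $1: minimum length, $2: maximum length; FASTQ is read from stdin
    paste - - - - | awk -F'\t' -v min="$1" -v max="$2" '
        length($2) >= min && length($2) <= max {
            print $1; print $2; print $3; print $4
        }'
}

# Made-up reads of length 2, 4 and 8; only the length-4 read passes a 3-5 filter.
printf '@a\nAC\n+\nII\n@b\nACGT\n+\nIIII\n@c\nACGTACGT\n+\nIIIIIIII\n' | filter_fastq_by_length 3 5
```

Calling it with 21 and 25 reproduces the bounds used above.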

To print difference between the last and first row in 5th column:

awk '{if (!first){first=$5;}; last=$5;} END {print last-first}' myfile.txt

To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta
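
A sketch of the same logic with the number of bases as a parameter, shown on made-up input with a small value so the truncation is easy to follow: bases are counted per record across wrapped lines, the line that crosses the limit is cut, and later lines of that record are dropped. The function name is hypothetical:

```shell
# Hypothetical helper: keep only the first N bases of every FASTA record.
fasta_head_bases() {
    # $1: number of bases to keep per sequence; remaining arguments: FASTA files (or stdin)
    local n=$1; shift
    awk -v n="$n" '
        /^>/ { seqlen = 0; print; next }        # new record: reset the base counter
        seqlen < n {
            if (seqlen + length($0) > n) $0 = substr($0, 1, n - seqlen)
            seqlen += length($0)
            print
        }' "$@"
}

# Made-up input: s1 has 8 bases over two lines, s2 has 2 bases.
printf '>s1\nACGT\nACGT\n>s2\nAC\n' | fasta_head_bases 5
```

Sequences shorter than the limit are passed through unchanged.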

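To pipe a compressed fasta file directly into makeblastdb (makeblastdb needs -dbtype, and expects -title and -out when reading from stdin; nucl and mydb here are placeholders):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb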

To remove sequences with duplicate fasta headers from a fasta file:

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta
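
How this works: on every header line the flag f is set to true only if that header has not been seen before, and the trailing f prints lines while the flag is true, so the sequence lines of a repeated header are skipped too. Because the key is $1, headers that share the same first word count as duplicates even if their descriptions differ. A small illustration with made-up input and a hypothetical function name:

```shell
# Hypothetical wrapper around the one-liner; reads files given as arguments or stdin.
dedup_fasta_headers() {
    awk '/^>/ { f = !d[$1]; d[$1] = 1 } f' "$@"
}

# Made-up input in which the header s1 appears twice; the second s1 record is dropped.
printf '>s1\nAAAA\n>s2\nCCCC\n>s1\nGGGG\n' | dedup_fasta_headers
```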

Delta: a new Web-based 3D genome visualization and analysis platform

Jit — Wed, 20 Dec 2017 08:49:55 -0600

Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes a Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs.

https://github.com/zhangzhwlab/delta

Address of the bookmark: https://github.com/zhangzhwlab/delta