BOL: Related items

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in GFA format. Unlike mainstream assemblers, miniasm has no consensus step: it simply concatenates pieces of read sequences to generate the final unitig sequences. As a result, the per-base error rate of the assembly is similar to that of the raw input reads.
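To make the "no consensus" point concrete, here is a minimal sketch of building a unitig by concatenating read pieces. The piece tuples `(read_name, start, end, strand)`, the toy reads and the function names are hypothetical and do not reflect miniasm's internal layout representation. Coordinates are assumed to refer to the forward strand of each read.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concat_pieces(reads: dict[str, str], pieces: list[tuple[str, int, int, str]]) -> str:
    """Join read substrings [start, end) in layout order.

    Each piece is a hypothetical (read_name, start, end, strand) tuple; '-' pieces
    are reverse-complemented. No consensus is computed, so any sequencing errors
    inside a piece are copied straight into the unitig.
    """
    parts = []
    for name, start, end, strand in pieces:
        sub = reads[name][start:end]
        parts.append(revcomp(sub) if strand == "-" else sub)
    return "".join(parts)


if __name__ == "__main__":
    toy_reads = {"r1": "ACGTACGTTT", "r2": "GGGAAACCC"}
    layout = [("r1", 0, 6, "+"), ("r2", 3, 9, "-")]
    print(concat_pieces(toy_reads, layout))
```

Because the unitig is just a concatenation, its accuracy can be no better than the reads it is cut from.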

Miniasm is still at an early stage of development. So far it has only been tested on about a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default settings, miniasm assembles 9 of the 12 PacBio data sets and 3 of the 4 ONT data sets into a single contig. The 12 PacBio data sets are the PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. The ONT data were obtained from the Loman Lab.

For a C. elegans PacBio data set (only 40X coverage was used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105 Mb and the N50 is 1.94 Mb. In comparison, HGAP3 produces a 104 Mb assembly with an N50 of 1.61 Mb. A dot plot of the miniasm assembly (X axis) against the HGAP3 assembly (Y axis) shows that the two are broadly comparable, although the HGAP3 consensus sequences are, of course, much more accurate. On the whole data set (assembled in ~30 min), however, the miniasm N50 drops to 1.79 Mb, so miniasm still needs improvement.
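The N50 figures above follow the usual definition: the contig length L such that contigs of length L or longer together cover at least half of the total assembly size. A minimal sketch of that definition (the toy lengths in the usage note are made up, not the C. elegans contigs):

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest and their lengths summed;
    the N50 is the length of the contig at which the running sum first
    reaches half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")
```

For lengths `[1, 2, 3, 4, 5]` the total is 15, and the two longest contigs (5 + 4 = 9) already exceed half of it, so the N50 is 4.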

Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly, respectively. Designed for long, noisy reads, they have no error-correction or consensus step, so the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all-against-all comparison of the reads:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
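The overlaps written to `reads.paf.gz` are in PAF, a tab-separated format whose first 12 columns are fixed (query name, length, start, end, relative strand, target name, length, start, end, number of matching bases, alignment block length, mapping quality); optional tags may follow. A minimal parser sketch, assuming plain or gzip-compressed input. The class and function names are our own, not part of minimap or miniasm.

```python
import gzip
from typing import Iterator, NamedTuple


class PafRecord(NamedTuple):
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse the 12 mandatory PAF columns; any optional tags after them are ignored."""
    f = line.rstrip("\n").split("\t")
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))


def identity(rec: PafRecord) -> float:
    """Fraction of matching bases within the alignment block."""
    return rec.matches / rec.block_len if rec.block_len else 0.0


def read_paf(path: str) -> Iterator[PafRecord]:
    """Yield records from a PAF file; '.gz' files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.strip():
                yield parse_paf_line(line)
```

For example, `sum(1 for _ in read_paf("reads.paf.gz"))` counts the overlaps that miniasm will receive.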

Then we assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

Then count the contigs:

grep -c '^>' reads.fa
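The same GFA-to-FASTA conversion and contig count can be sketched in Python, assuming segment lines of the GFA 1 form `S<TAB>name<TAB>sequence[<TAB>tags]`. Like `fold`, the sketch wraps sequences at 80 columns by default. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def gfa_segments(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) for every segment ('S') line of a GFA file."""
    for line in lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            yield fields[1], fields[2]


def to_fasta(segments: Iterable[tuple[str, str]], width: int = 80) -> str:
    """Render segments as FASTA, wrapping each sequence at `width` characters."""
    out = []
    for name, seq in segments:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

For example, reading `reads.gfa` with `with open("reads.gfa") as fh: segs = list(gfa_segments(fh))` and then taking `len(segs)` gives the same contig count as the `grep` command, and `to_fasta(segs)` gives the FASTA text.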

# Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa

Address of the bookmark: https://github.com/lh3/miniasm

ALPACA: A hybrid strategy for assembly of genomic DNA shotgun sequencing reads.

Seema Singh — Mon, 30 Apr 2018 04:38:40 -0500

ALPACA requires Celera Assembler 8.3 or later. It is recommended to build Celera Assembler from source, because the pre-built binaries CA_8.3rc1 and CA8.3rc2 will not work for large data sets.

The paper describing ALPACA is available at https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3927-8

Address of the bookmark: https://github.com/VicugnaPacos/ALPACA

Bioinformatics Faculty at UNIVERSITY OF HYDERABAD

Wed, 14 Oct 2015 22:53:44 -0500

UNIVERSITY OF HYDERABAD

(A Central University established by an Act of Parliament)

Prof. C.R.Rao Road, P.O. Central University Campus, Gachibowli,

Hyderabad - 500 046

Advt.No. UH/HR/Rectt-2015/02 dt. 12.10.2015

The University invites applications from the Indian citizens for the following positions:

Professor / Associate Professor / Assistant Professor :

Biotechnology & Bioinformatics

Last date : 16th November 2015

More Info : http://www.uohyd.ac.in/images/recruitment/advt-121015.pdf

BLASR: Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application

Jit — Wed, 23 May 2018 06:54:32 -0500

BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, where the divergence between the read and the genome is dominated by insertion and deletion errors.

Here is how I use BLASR to align PacBio reads to the contigs (target.fasta). “target.fasta.sa” is the suffix array of “target.fasta”, generated by sawriter.
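For readers unfamiliar with suffix arrays, here is a naive sketch of what one is and how it supports exact-match lookups of the kind an aligner can use for seeding. This is an illustration only. It does not reproduce sawriter's construction algorithm or its on-disk `.sa` format, and the function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive O(n^2 log n) construction; fine for toy inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text` via binary search on `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

All suffixes that start with a given pattern form one contiguous block of the sorted array, so two binary searches are enough to find every occurrence.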

blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15

The output format option “-m 4” generates the alignment coordinates. It is not fully documented, but I can explain it to you.
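Here is a minimal sketch of reading `target.m4`. It assumes the commonly cited `-m 4` column order: qName, tName, score, percentSimilarity, qStrand, qStart, qEnd, qLength, tStrand, tStart, tEnd, tLength, mapQV. Check this against the output of your BLASR version before relying on it. Any columns after the 13th are ignored, and the field and function names are our own.

```python
from typing import NamedTuple


class M4Hit(NamedTuple):
    q_name: str
    t_name: str
    score: int            # alignment score; more negative is better
    pct_similarity: float
    q_strand: int         # 0 = forward, 1 = reverse (assumed encoding)
    q_start: int
    q_end: int
    q_length: int
    t_strand: int
    t_start: int
    t_end: int
    t_length: int
    map_qv: int


def parse_m4_line(line: str) -> M4Hit:
    """Parse one whitespace-separated -m 4 line (assumed column order)."""
    f = line.split()
    return M4Hit(f[0], f[1], int(f[2]), float(f[3]),
                 int(f[4]), int(f[5]), int(f[6]), int(f[7]),
                 int(f[8]), int(f[9]), int(f[10]), int(f[11]), int(f[12]))


def query_coverage(hit: M4Hit) -> float:
    """Fraction of the read covered by the alignment."""
    return (hit.q_end - hit.q_start) / hit.q_length
```

A filter such as `query_coverage(hit) > 0.8` is one simple way to keep only near-full-length read placements. The threshold is arbitrary.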

I use a server with 24 cores and 48 GB of RAM for the alignment. It took about 2 to 3 hours to align 3G of PacBio reads to 10^6 short-read contigs with a mean length of 3.5 kbp.

Address of the bookmark: http://bix.ucsd.edu/projects/blasr/

SRF/JRF Biotechnology NRCPB - Delhi, Delhi

Fri, 13 Nov 2015 02:52:11 -0600

SRF/JRF job position in National Research Centre on Plant Biotechnology (NRCPB)

JRF /1

Qualification : Master’s degree in Biotechnology / life sciences with a four-year Bachelor’s degree (or) Master’s degree in Biotechnology / life sciences with NET qualification, with 1st Division or 60% marks or an equivalent overall grade point average. Non-NET / Master’s degree with a three-year Bachelor’s degree as per DST/DBT norms. Desirable: Working experience in molecular biology techniques, genome sequence analysis, bioinformatics

Emoluments : Rs.25000

SRF

Qualification : Master’s degree in Biotechnology/Bioinformatics/Life Science with 1st division or 60% marks or an equivalent overall grade point average, with a 4-year Bachelor’s degree or a 5-year integrated Master’s degree. Desirable: Working experience in bioinformatics, genomic analysis

Emoluments : Rs.25000/

Age Limit: 35 years

A walk-in interview will be held on 20th November 2015 at 10 AM at NRCPB, LBS Building, Pusa Campus, New Delhi-110012

More at http://www.nrcpb.org/sites/default/files/Adverdisement_0.pdf

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Jit — Fri, 06 Jul 2018 04:27:49 -0500

ReMILO is a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called a red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO aligns the contigs to the long reads and finds their differences from the long reads to detect further misassemblies.
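The general idea of comparing sequence sets in a shared de Bruijn graph can be illustrated with a plain two-colour graph. This is explicitly not ReMILO's red-black multipositional de Bruijn graph, whose details are not described here. For simplicity, k-mers are taken directly from the sequences (one strand only) rather than from aligned short reads. Each k-mer becomes an edge between its (k-1)-mer prefix and suffix and is labelled with the sources it occurs in. Edges present in a contig but absent from the reference point to the region of a possible misjoin. The colour names and functions are hypothetical.

```python
from collections import defaultdict


def build_colored_dbg(seqs_by_color: dict[str, list[str]], k: int) -> dict[tuple[str, str], set[str]]:
    """Map each de Bruijn edge (k-mer prefix, k-mer suffix) to the colours it occurs in."""
    graph: dict[tuple[str, str], set[str]] = defaultdict(set)
    for color, seqs in seqs_by_color.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                graph[(kmer[:-1], kmer[1:])].add(color)
    return dict(graph)


def edges_only_in(graph: dict[tuple[str, str], set[str]], present: str, absent: str) -> list[tuple[str, str]]:
    """Edges carrying colour `present` but not colour `absent`, sorted."""
    return sorted(e for e, colors in graph.items() if present in colors and absent not in colors)
```

With reference `ACGTTGCA`, contig `ACGTAGCA` and k = 3, the contig-only edges are `GT→TA`, `TA→AG` and `AG→GC`. These are exactly the edges spanning the position where the two sequences differ.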

Address of the bookmark: https://github.com/songc001/remilo

Teaching Careers 2015 at Uttaranchal College of Science & Technology

Mon, 02 Nov 2015 04:00:26 -0600

Teaching, Non-teaching Careers 2015
Uttaranchal College of Science & Technology - Dehra Dun, Uttarakhand
Teaching, Non-teaching Careers 2015 at Uttaranchal College of Science and Technology, Dehradun

The following teaching and non-teaching vacancies are to be filled.

Assistant Professors job vacancies in the disciplines of:
Biotechnology
Bioinformatics

Eligibility:
Master’s degree in the discipline along with a Ph.D.

How to apply:
Candidates who meet the eligibility criteria for the positions above are required to apply within 10 days of the date of the vacancy notification.

Candidates are required to send copies of all documents along with the application.

Contact details:
Uttaranchal College of Science & Technology
Nagal Hatnala, P.O. Kulhan Sahastradhara Road,
Dehradun-248001, Uttarakhand, India

More details are available in:
Dehradun Classifieds e-paper dated 01.11.2015 at page number 40-41

About Employer

Uttaranchal College of Science and Technology, Dehradun, Uttarakhand is affiliated to HNB Garhwal University, Srinagar, Uttarakhand

Employer: Uttaranchal College of Science & Technology
Address: Uttaranchal College of Science & Technology Nagal Hatnala, P.O. Kulhan Sahastradhara Road, Dehradun-248001, Uttarakhand, India
Email: info@ucstdoon.com
URL: http://www.ucstdoon.com
Phone: 0135-2607011, 607413, 3254785, 09719146701

Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads

Rahul Nayak — Fri, 19 Oct 2018 08:23:42 -0500

Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then it applies a heterozygote-calling-like strategy to divide potential groups into haplotypes in a top-down manner. Along a guide tree, it then iteratively merges sibling leaves in a bottom-up manner if they are similar enough. Here, similarity is defined by comparing the second reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble the merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated and real guppy RAD-seq data, we show that Rainbow is more competent than other tools in dealing with RAD-seq data.
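Spaced-seed clustering can be sketched as follows. A binary mask selects "care" positions, so reads that differ only at "don't care" positions get the same key. Using several masks and merging any reads that share a key under any mask tolerates a mismatch at varying positions. The masks, toy reads and function names here are hypothetical. Rainbow's actual seed design and clustering rules are not reproduced.

```python
from collections import defaultdict


def spaced_key(seq: str, mask: str) -> str:
    """Keep only the bases at positions where the mask has '1'."""
    return "".join(base for base, m in zip(seq, mask) if m == "1")


def cluster_reads(reads: dict[str, str], masks: list[str]) -> list[list[str]]:
    """Group reads that share a spaced-seed key under any mask (union-find)."""
    parent = {name: name for name in reads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mask in masks:
        first_seen: dict[str, str] = {}
        for name, seq in reads.items():
            if len(seq) < len(mask):
                continue  # too short to apply this mask
            key = spaced_key(seq, mask)
            if key in first_seen:
                ra, rb = find(first_seen[key]), find(name)
                if ra != rb:
                    parent[rb] = ra
            else:
                first_seen[key] = name

    groups: dict[str, list[str]] = defaultdict(list)
    for name in reads:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())
```

With reads `ACGTACGT` and `ACGAACGT` (one mismatch at position 3), the mask `11101111` ignores that position, so the two reads land in the same cluster.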

Address of the bookmark: https://sourceforge.net/projects/bio-rainbow/files/

Project Assistant II Jobs opportunity in National Chemical Laboratory (NCL) on temporary basis

Fri, 06 Nov 2015 01:36:16 -0600

No. Bio/NCIM/3

Project Assistant II Jobs opportunity in National Chemical Laboratory (NCL) on temporary basis

Project Code No. : GAP312626

Title of the Project : Microbial ecology and distribution of geochemical cycling genes in a hot spring ecosystem

No. of Post : 01

Qualifications : M.Sc./B.Tech/M.Tech in Computational biology/ Bioinformatics from a recognized university with minimum 60 % marks (aggregate)

Desirable : Good computational skills, Linux (command line and GUI) and Unix; Perl / Python / R / C programming. Practical knowledge of the analysis of next-generation sequencing datasets (amplicon sequencing, whole metagenome and complete genome sequencing) with reference to microbes. Analysis and statistical validation of NGS data generated from different chemistry platforms. Some wet-lab experience with microbial systems would be an added advantage, as the project involves some travel.

Emoluments : Rs. 16,000/-

Age Limit : 28 years
How to apply

The application with the above information duly signed together with photo-copies of relevant certificates/testimonials should be addressed to : The Head, NCIM Resource Centre (Attn Dr. M.S. DHARNE), National Chemical Laboratory, Pune 411 008, so as to reach on or before 16th November 2015.

More at http://www.ncl-india.org/files/JoinUs/JobVacancies/TemporaryJobs.aspx?menuid=ql6&childmenustripid=divSubQL6

FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads

Jit — Tue, 28 Jan 2020 03:27:33 -0600

FastGT is a program package for whole-genome genotyping of genome variants directly from raw sequencing reads. It is written in C and runs on Linux. FastGT uses a list of variant-specific k-mer pairs that are unique in the human genome, counts the frequency of these k-mers in the sequencing data and predicts the genotype. All of this takes less than 1 hour on an average low-cost Linux server.
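Here is a toy sketch of the k-mer pair idea. Each variant is represented by one reference-allele k-mer and one alternate-allele k-mer. Both are counted in the reads on either strand, and a genotype is called from their relative counts. The tiny k, the example reads and the `min_depth`/`het_low`/`het_high` thresholds are made up for illustration. FastGT's own k-mer selection and genotyping model are not reproduced here.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads: list[str], kmers: list[str], k: int) -> dict[str, int]:
    """Count occurrences of each k-mer, including reverse-strand hits."""
    wanted = set(kmers) | {revcomp(km) for km in kmers}
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if window in wanted:
                counts[window] += 1
    result = {}
    for km in kmers:
        rc = revcomp(km)
        # Add reverse-strand hits without double-counting palindromic k-mers.
        result[km] = counts[km] + (counts[rc] if rc != km else 0)
    return result


def call_genotype(ref_count: int, alt_count: int, min_depth: int = 3,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Toy genotype rule based on the fraction of alternate-allele k-mers."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return "./."  # too little evidence to make a call
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"
```

Because only the selected k-mers are looked up, no alignment is needed. This is what makes the approach alignment-free.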

http://bioinfo.ut.ee/FastGT/

https://github.com/bioinfo-ut/GenomeTester4/

Address of the bookmark: http://bioinfo.ut.ee/FastGT/