BOL: Related items

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.

Miniasm is still at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. ONT data are acquired from the Loman Lab.

For a C. elegans PacBio data set (only 40X is used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, HGAP3 produces a 104Mb assembly with an N50 of 1.61Mb. This dotter plot gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on the Y axis). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.

Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they have no correction or consensus step, so the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all against all comparison:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
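Before moving on to the layout step, it can help to sanity-check the overlap file. The sketch below assumes the standard tab-delimited PAF layout (column 1 = query name, column 6 = target name, column 11 = alignment block length). The function name paf_summary and the toy records are made up for illustration.

```shell
# Summarise a PAF stream: number of overlaps between distinct reads
# and their mean alignment block length.
paf_summary() {
    awk -F'\t' '
        $1 != $6 { n++; sum += $11 }   # ignore any read-vs-itself record
        END {
            if (n > 0) {
                printf "%d overlaps, mean block length %.1f\n", n, sum / n
            } else {
                print "0 overlaps"
            }
        }' "$@"
}

# Toy input: two real overlaps and one self hit (hypothetical reads r1..r3).
printf 'r1\t1000\t0\t500\t+\tr2\t1200\t700\t1200\t300\t500\t255\nr1\t1000\t0\t1000\t+\tr1\t1000\t0\t1000\t1000\t1000\t255\nr2\t1200\t0\t300\t-\tr3\t900\t600\t900\t200\t300\t255\n' | paf_summary
# prints: 2 overlaps, mean block length 400.0
```

On the real file this would be run as: gzip -dc reads.paf.gz | paf_summary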

Then we can assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

And then count how many contigs:

grep ">" reads.fa | wc -l
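The conversion and counting steps above can be tried on a toy graph first. This is a minimal sketch, assuming only that GFA segment lines start with S and carry the name in column 2 and the sequence in column 3. The function names and the toy segments are hypothetical. Unlike piping through fold, the awk loop wraps only the sequence, so a long header line is never split.

```shell
# Convert GFA segment (S) lines to FASTA, wrapping sequences at 60 columns.
gfa_to_fasta() {
    awk -F'\t' -v width=60 '
        $1 == "S" {
            print ">" $2
            for (i = 1; i <= length($3); i += width)
                print substr($3, i, width)
        }' "$@"
}

# Count FASTA records by counting header lines.
count_contigs() {
    grep -c '^>' "$@"
}

# Toy GFA: two segments and one link line (made-up names and sequences).
printf 'S\tutg000001l\tACGTACGTAC\nS\tutg000002l\tGGGTTTAAAC\nL\tutg000001l\t+\tutg000002l\t-\t4M\n' \
    | gfa_to_fasta | count_contigs
# prints: 2
```

On real output the equivalent calls would be gfa_to_fasta reads.gfa > reads.fa followed by count_contigs reads.fa.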

# Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa
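The N50 values quoted earlier can be reproduced for any assembly once reads.gfa exists. The following is a rough sketch rather than part of miniasm: N50 is taken here as the length of the contig at which the running total, over contigs sorted from longest to shortest, first reaches half of the total assembly size. The function names n50 and gfa_lengths are assumptions made for illustration.

```shell
# Print the sequence length of every segment (S) line in a GFA file.
gfa_lengths() {
    awk -F'\t' '$1 == "S" { print length($3) }' "$@"
}

# Read one contig length per line on stdin and print the N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Toy example: lengths sum to 30, and 10 + 8 = 18 is the first running total >= 15.
printf '2\n10\n6\n8\n4\n' | n50
# prints: 8
```

For the assembly above this would be run as: gfa_lengths reads.gfa | n50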

Address of the bookmark: https://github.com/lh3/miniasm

ARCS: scaffolding genome drafts with linked reads

Rahul Nayak — Tue, 06 Mar 2018 16:35:26 -0600

ARCS is an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS H. sapiens genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts.

Address of the bookmark: https://github.com/bcgsc/ARCS/

URDIP Pune Bioinformatics SRF/PA Openings

Sat, 20 Sep 2014 20:48:50 -0500

CSIR UNIT FOR RESEARCH AND DEVELOPMENT OF INFORMATION PRODUCTS
NCL Campus, S.No.113,114, Pashan, Pune 411 008

ADVERTISEMENT NO. - URDIP/ 5/2014

Learning opportunity for young Science and Engineering professionals to make a career in the Information Science Industry. CSIR has set up a Unit for Research and Development of Information Products (CSIR-URDIP) at Pune to work in the area of Scientific Informatics (ChemBioinformatics/Patent Informatics/Phytoinformatics/Toxinformatics) and related software development projects.

Applications are invited from CSIR - UGC NET Qualified Candidates for consideration as Project Fellow (PF) and/or Senior Project Fellow (SPF), based on experience, to work on existing and new projects at CSIR-URDIP.

Project Fellow

Remuneration - (Rs. 16,000.00 + 20% HRA)

M. Sc. in Biochemistry/Microbiology/Bioinformatics [Post-code A02] only with minimum of 55% marks

Senior Project Fellow

Remuneration - (Rs. 18,000.00 + 20% HRA)

M. Sc. in Biochemistry/Microbiology/Bioinformatics [Post-code A05] only with minimum of 55% marks plus two years research or relevant informatics experience

Please visit www.urdip.res.in/career.htm to apply online by 30th September, 2014.

Only successful candidates who appeared for the NET exam in 2012 and 2013 are eligible to apply.

Advertisement: http://115.112.95.114/urhr/download/Advt5_2014.pdf

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads

Rahul Nayak — Fri, 11 May 2018 05:07:45 -0500

MECAT is an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than state-of-the-art aligners and error correction tools. MECAT can be used to assemble large genomes de novo effectively. For example, on a 32-thread computer with a 2.0 GHz CPU, MECAT takes 9.5 days to assemble a human genome from 54x SMRT data, which is 40 times faster than the current PBcR-Mhap pipeline. MECAT's performance was compared with the PBcR-Mhap pipeline, FALCON and Canu (v1.3) on five real datasets. The quality of the assembled contigs produced by MECAT is the same as or better than that of the PBcR-Mhap pipeline and FALCON.

https://www.nature.com/articles/nmeth.4432

Address of the bookmark: https://github.com/xiaochuanle/MECAT

Bioinformatics SVIMS Project Assistant Walk IN

Sat, 20 Sep 2014 21:02:29 -0500

SRI VENKATESWARA INSTITUTE OF MEDICAL SCIENCES
TIRUPATI, ANDHRA PRADESH, INDIA- 517 507
BIOINFORMATICS CENTRE, DEPARTMENT OF BIOINFORMATICS

Eligible candidates are invited for a walk-in-interview for recruitment of Project Assistant in SVIMS Bioinformatics centre under the BTISnet Project entitled “Creation of Bioinformatics Infrastructure Facility for promotion of Biology teaching through Bioinformatics” on 25.09.2014 at 11 AM in SVIMS, Tirupati. The engagement will be made purely on temporary basis for a period of one year and it can be terminated at any time without notice or without assigning any reason thereof by the Coordinator of the Project. The person engaged shall not be entitled for any claim implicit or explicit for absorption in the University.

1. Name of the post : Project Assistant

2. Qualification :
i) Essential : MSc Bioinformatics/MTech (Biotechnology/Bioinformatics)

ii) Desirable : Experience in Bioinformatics research work (Preference will be given to candidates qualified in BINC/UGC/CSIR/NET/GATE)

3. Remuneration : 16000 + 10% HRA for NET/GATE candidates; 14000 + 10% HRA for M. Tech / M.Sc. candidates

4. Place of posting : Tirupati

5. Duration of the Project : One year

Terms and conditions:

1. Candidates are required to submit their Biodata and relevant certificates in support of their age, educational qualification, etc., before the interview committee, SVIMS University, Tirupati.

2. Candidates called for interview will attend the interview at their own cost.
3. Interim enquiries will not be entertained.
4. The maximum age limit for Project Assistant is 28 years for general category and 33 years for SC and ST category candidates as on 25th September, 2014.

http://svr98.ehostpros.com/~svimsb98/Project%20Assistant_notification.pdf

Cerulean: A hybrid assembly using high throughput short and long reads

Rahul Nayak — Tue, 05 Jun 2018 10:10:15 -0500

Cerulean extends contigs assembled using short read datasets like Illumina paired-end reads using long reads like PacBio RS long reads. Cerulean v0.1 has been implemented with bacterial genomes in mind. The method is fully described in Deshpande, V., Fung, E. D., Pham, S., & Bafna, V. (2013). Cerulean: A hybrid assembly using high throughput short and long reads. arXiv preprint arXiv:1307.7933. http://arxiv.org/abs/1307.7933

Address of the bookmark: https://sourceforge.net/projects/ceruleanassembler/

Nieduszynski Group

Fri, 26 Sep 2014 19:35:06 -0500

Complete, accurate replication of the genome is essential for life. All chromosomes in eukaryotic cells must be duplicated and then segregated to daughter cells to ensure genetic integrity and produce the large number of cells that make up a multicellular organism. We are using genetic, genomic and computational methods to understand how chromosome replication is regulated to ensure genome stability. By focusing on the basic biology that underpins cell growth and division we aim to provide new insights that may help our understanding of diseases such as cancer and congenital disorders.

More http://www.nieduszynski.org/index.php
http://www.path.ox.ac.uk/research/cell-biology-and-pathology/conrad-nieduszynski-group

NGS Online Training

Sat, 27 Sep 2014 07:42:29 -0500

ArrayGen Technologies announces online NGS training available throughout the globe. Now analyze your own NGS datasets from anywhere. For more information, contact us at training@arraygen.com

Please visit our site at www.arraygen.com

SimLoRD: A read simulator for third generation sequencing reads

Aaryan Lokwani — Wed, 22 Aug 2018 10:40:27 -0500

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Reads are simulated from both strands of a provided or randomly generated reference sequence.

The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)
The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a text file with integers (RNA), or using a fixed length.
Quality values and number of passes depend on fragment length.
Provided subread error probabilities are modified according to the number of passes.
Outputs reads in FASTQ format and alignments in SAM format.
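To illustrate the first of the read-length options listed above, here is a small sketch of drawing lengths from a log-normal distribution. It does not use SimLoRD itself: the function name sample_read_lengths, the parameter values (mu = 9.0, sigma = 0.5) and the seed are all hypothetical, and SimLoRD's own parameterisation may differ. A standard normal draw is generated with the Box-Muller transform and then exponentiated.

```shell
# Draw n read lengths from a log-normal distribution.
# Arguments: n, mu, sigma (parameters of the underlying normal), RNG seed.
sample_read_lengths() {
    awk -v n="$1" -v mu="$2" -v sigma="$3" -v seed="$4" 'BEGIN {
        srand(seed)
        pi = atan2(0, -1)
        for (i = 0; i < n; i++) {
            u1 = 1 - rand()   # in (0, 1], so log(u1) is finite
            u2 = rand()
            z = sqrt(-2 * log(u1)) * cos(2 * pi * u2)   # Box-Muller standard normal
            print int(exp(mu + sigma * z)) + 1          # read length of at least 1 bp
        }
    }'
}

# Five illustrative lengths; the exact values depend on the awk implementation.
sample_read_lengths 5 9.0 0.5 42
```

With these made-up parameters the median length is roughly exp(9), or about 8.1 kb.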

Address of the bookmark: https://bitbucket.org/genomeinformatics/simlord/

JRF in Bioinformatics @ INMAS, DRDO,Delhi

Wed, 01 Oct 2014 07:01:07 -0500

Institute of Nuclear Medicine and Allied Sciences (INMAS), Delhi under the aegis of Defence Research and Development Organisation (DRDO), is engaged in research and developmental work in radiation sciences, Neuro-Computing and Medical Image Processing. INMAS is looking for meritorious young researchers for pursuing research in the frontier areas at INMAS. The Institute invites applications from young and meritorious Indian nationals who are creative, have passion and desire to pursue R&D in frontier areas. INMAS possesses ambience of a research cum academic institute coupled with an advanced R&D infrastructure in a mission mode. It provides the best infrastructure, motivation and personality development prospects for talented students, dreaming of unparalleled success in their professional endeavors. INMAS provides state of the art research facilities for undertaking pioneering research with defence applications.

JRF (Maximum Tenure‐ Five Years: 2yrs as JRF and 3yrs as SRF)
A first class Master’s Degree in Bioinformatics (likely 2 posts)
Around Rs 16,000/ Plus 30% HRA (as per rules of funding agency)

Applications are invited from candidates possessing the above qualifications. The upper age limit is as on the last date for receipt of application. (5 years relaxation to SC/ST candidates, 3 years to OBC candidates, and other entitled categories as per Govt rules). Actual No. of vacancies may vary.

The application form can be downloaded from the website www.drdo.gov.in and emailed to inmashrd@gmail.com.
Last date to apply by email is 1700 hrs on 15 Oct 2014
Incomplete applications are liable to be rejected.
Confirmation will be sent to short-listed candidates through email only
Antecedents of selected candidates will be verified.
Written Test will be conducted from 0930-1030 hrs. Latecomers will not be considered.
Candidates will be required to produce certificates/testimonials in original at the time of interview.
It may please be noted that offer of Fellowship does not confer on fellows any right for absorption in DRDO.
Candidates should carry photocopy of Application form sent by email with them.
No TA/DA will be paid for attending interview & on joining.

More at http://drdo.gov.in/drdo/English/jrf29092014.pdf
http://drdo.gov.in/drdo/English/index.jsp?pg=inmas29092014.jsp