BOL: Related items

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences, so the per-base error rate is similar to that of the raw input reads.

Miniasm is still at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. ONT data are acquired from the Loman Lab.

For a C. elegans PacBio data set (only 40X is used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, HGAP3 produces a 104Mb assembly with an N50 of 1.61Mb. A dotter plot gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.

Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they have no correction or consensus step, so the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all-against-all comparison of the reads:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
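
Before assembling, it can be worth a quick look at the overlap file. The following is a minimal sketch, not part of minimap or miniasm: paf_summary is a hypothetical helper name, and it assumes only the standard PAF column layout (column 1 query name, column 6 target name, column 10 matching bases, column 11 alignment block length). The match ratio it prints is a rough indicator only, since the overlapper estimates matching bases approximately.

```shell
# Hypothetical helper (not shipped with minimap/miniasm): summarize a PAF
# overlap stream read from stdin.
# Assumed PAF columns: 1 = query name, 6 = target name,
# 10 = number of matching bases, 11 = alignment block length.
paf_summary() {
    awk -F'\t' '
        $1 != $6 {                        # skip self-hits, if any are present
            n++
            seen[$1] = 1; seen[$6] = 1    # remember every read that overlaps something
            match_sum += $10
            block_sum += $11
        }
        END {
            reads = 0
            for (r in seen) reads++
            ratio = (block_sum > 0) ? match_sum / block_sum : 0
            printf "overlaps=%d reads=%d match_ratio=%.3f\n", n, reads, ratio
        }'
}

# Example use on the file produced above:
#   gzip -dc reads.paf.gz | paf_summary
```

If far fewer reads appear in the summary than are present in reads.fq, many reads have no overlap at all and the assembly is likely to be fragmented.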

Then we can assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

Then count the contigs:

grep ">" reads.fa | wc -l
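
Contig count alone says little about contiguity; the description above reports N50 as well. A minimal sketch for computing it straight from the GFA follows. gfa_stats is a hypothetical helper name, not a miniasm command, and it assumes the GFA was written with -f so that the third field of each S line holds the unitig sequence.

```shell
# Hypothetical helper (not part of miniasm): report contig count, total
# length and N50 from the segment (S) lines of a GFA file.
# Reads the files given as arguments, or stdin when none are given.
# Assumption: field 3 of each S line is the sequence itself (miniasm -f).
gfa_stats() {
    awk '/^S/ { print length($3) }' "$@" |
    sort -rn |                               # longest contig first
    awk '{ len[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "contigs=0 total=0 N50=0"; exit }
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 run += len[i]
                 if (run >= half) {          # this contig crosses 50% of the total
                     printf "contigs=%d total=%d N50=%d\n", NR, total, len[i]
                     exit
                 }
             }
         }'
}

# Example use on the assembly produced above:
#   gfa_stats reads.gfa
```

N50 here is the length of the contig at which the running sum, taken from the longest contig downwards, first reaches half of the total assembly length.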

# Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa

Address of the bookmark: https://github.com/lh3/miniasm

Scientist at Advanced Centre for Treatment, Research and Education in Cancer - Navi Mumbai, Maharashtra

Tue, 30 Aug 2016 04:16:15 -0500

Scientist
Advanced Centre for Treatment, Research and Education in Cancer - Navi Mumbai, Maharashtra
Scientist (One position)
Project: Bioinformatics centre DBT- Sub-DIC at ACTREC
Funding agency: DBT Grant No.232

Duration of the Project: Six months from the date of appointment; can be extended for a further six months.
Essential Qualification and Experience: First-class Master's degree in Bioinformatics or Life Sciences, or an equivalent degree, from a recognized university, with 4 years of R&D experience in Bioinformatics or relevant subjects from recognized institutes.
OR
Ph.D. degree in Bioinformatics or Life Sciences from recognized University.
M.Sc. degree obtained after a one year course will not be considered.
Experience: Research/teaching experience in Bioinformatics or relevant subjects from recognized institute(s).

More at http://www.actrec.gov.in/data%20files/Vacancies/2016/AV-scin-stud-trainee-6-Sept-16.docx

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Jit — Tue, 15 May 2018 07:35:26 -0500

HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy. We found that previously described haplotype assembly methods are specialized for specific read technologies or protocols, with slow or inaccurate performance on others. With this in mind, HapCUT2 is designed for speed and accuracy across diverse sequencing technologies, including but not limited to:

NGS short reads (Illumina HiSeq)
clone-based sequencing (Fosmid or BAC clones)
SMRT reads (PacBio)
Oxford Nanopore reads
10X Genomics Linked-Reads
proximity-ligation (Hi-C) reads
high-coverage sequencing (>40x coverage-per-SNP) using above technologies
combinations of the above technologies (e.g. scaffold long reads with Hi-C reads)

See below for specific examples of command line options and best practices for some of these technologies.

NOTE: At this time HapCUT2 is for diploid organisms only. VCF input should contain diploid variants.

If you use HapCUT2 in your research, please cite:

Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. gr.213462.116 (2016). doi:10.1101/gr.213462.116

Address of the bookmark: https://github.com/vibansal/HapCUT2

SWALO: Scaffolding with assembly likelihood optimization

Jit — Wed, 20 Jun 2018 02:45:16 -0500

SWALO (scaffolding with assembly likelihood optimization) is a method for scaffolding based on likelihood of genome assemblies computed using generative models for sequencing. Please email your questions, comments, suggestions, and bug reports to atif.bd@gmail.com.

Address of the bookmark: https://atifrahman.github.io/SWALO/

Bioinformatics jobs at Chittaranjan National Cancer Institute

Thu, 29 Sep 2016 09:36:33 -0500

Chittaranjan National Cancer Institute Advertisement No.497/2016 Invites Applications For Senior Scientific Officer, Gr. II

Note: Experience in the following field required:

Molecular cancer cytogenetic and genetic toxicology
Molecular drug Designing and targeted therapy
Cancer genomics, proteomics, bioinformatics and next generation sequencing
Therapeutic stem cell research and gene therapy
Molecular cancer immunology and immunotherapy
Molecular epidemiology
Tumor endocrinology
Translation research
Ultra structural/tissue engg/development biology research
Virus and cancer
Molecular pathology

No. of Posts: 11 (Eleven), (SC-1, OBC-3, UR-7)

Location: Kolkata (Calcutta)
Salary: Rs.15600-39100 + Grade Pay Rs.5400/-

For details kindly refer to the Employment News dated 24-30 September, 2016 and in the Institute’s Website: http://www.cnci.org.in

Last date for receipt of applications is 30 days from the date of notification in the Employment News. Director Chittaranjan National Cancer Institute 378, S.P.

IMPUTE2

Jit — Thu, 27 Oct 2016 11:21:44 -0500

IMPUTE2 is a computer program for phasing observed genotypes and imputing missing genotypes. Most people use just a couple of the program's basic functions, but we have also built up a collection of specialized and powerful options. If you are new to IMPUTE2, or indeed to phasing and imputation in general, we suggest that you start by learning the basics.

You should begin by downloading the program from the IMPUTE2 website. You will need to choose the link that matches your computing platform and then follow the instructions for opening the download package.

Once you have done this, you will be ready to try some example analyses on the test data that are provided with the download. The section on Examples shows how to use the most common IMPUTE2 functions. We suggest that you work through these examples and try to understand what the elements of each command are doing. If you don't understand something or would like to know if the program can perform a function that isn't listed, you can read our FAQ or submit a question to our mail list.

When you have learned the basic functionality of the program, you can use several features of this website to prepare your own analysis:

Learn about best practices for imputation.
Download reference data that you can use to impute genotypes in your study.
Look through a complete list of program options.

Address of the bookmark: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html

wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly

BioStar — Mon, 04 Feb 2019 04:53:47 -0600

Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output.

./wtdbg2 -x rs -g 4.6m -t 16 -i reads.fa.gz -fo prefix
./wtpoa-cns -t 16 -i prefix.ctg.lay.gz -fo prefix.ctg.fa

Address of the bookmark: https://github.com/ruanjue/wtdbg2

Research Associate and Junior Research Fellow at North-Eastern Hill University - Tura, Meghalaya

Fri, 28 Oct 2016 09:54:43 -0500

Research Associate and Junior Research Fellow
North-Eastern Hill University - Tura, Meghalaya
₹18,000 a month
Applications are invited for the posts of Research Associate and JRF in the DBT-sponsored Bioinformatics Infrastructure Facility (BIF). The posts are purely temporary and terminable at any time without prior notice or assigning any reason thereof.

Research Associate:
Essential Qualification: Ph.D in Bioinformatics/Biotechnology/Life Science from a recognised university/institute
Pay: Rs.36000/- + admissible 10% HRA per month
Age: Below 35 years

Junior Research Fellow:
Essential Qualification: M.Sc in Bioinformatics/Biotechnology/Life Science from a recognised university/institute
Pay: Rs.18000/- per month
Age: Below 35 years

Last date for receiving applications by mail or post is 08.11.2016

Company Info.
North-Eastern Hill University

Bioinformatics Infrastructure Facility (BIF) Department of RDAP North-Eastern Hill University, Tura Campus Tura-794002, Meghalaya

More at http://www.nehu.ac.in/Advertisements/BIFTuraManpowerAdvt_25102016.pdf

CONTIGuator

BioStar — Fri, 04 Oct 2019 01:27:58 -0500

CONTIGuator is a Python script for Linux environments whose purpose is to speed up the bacterial genome assembly process and to give a first insight into the genome structure using the well-known Artemis Comparison Tool (ACT).

Address of the bookmark: https://sourceforge.net/projects/contiguator/

3D de novo assembly (3D DNA) pipeline

Jit — Sun, 02 Feb 2020 13:41:55 -0600

For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab, see the Genome Assembly Cookbook at http://aidenlab.org/assembly.

For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in (Dudchenko et al., Science, 2017) see the original commit.

For the detailed description of the merge section see https://github.com/theaidenlab/AGWG-merge.

Address of the bookmark: https://github.com/theaidenlab/3d-dna