BOL: Related items

GARM:Genome Assembly, Reconciliation and Merging

Jit — Mon, 19 Dec 2016 06:03:02 -0600

The pipeline is implemented mainly with Perl scripts and modules, together with third-party open-source software such as the AMOS (Myers et al., 2000) and MUMmer (Kurtz et al., 2004) packages. The pipeline was tested on Debian, Ubuntu, Fedora and BioLinux distributions. The method merges contigs or scaffolds from different assemblers using the same or different sequencing technologies. When scaffolds are provided, a search for probable compression or extension (CE) problems in the assemblies can be performed; contigs are joined back into scaffolds after gap recalculation.

Address of the bookmark: http://garm-meta-assem.sourceforge.net/

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and to estimate the distances between adjacent contigs.
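
As a rough illustration of the idea (not pyScaf's actual code), the sketch below sorts contigs by where their best alignment starts on a reference chromosome and takes the distance between consecutive alignments as the gap estimate. The `Hit` record and the `order_and_gaps` function are hypothetical names introduced only for this example, and it assumes each contig has exactly one hit on the same chromosome.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Best alignment of one contig on a reference chromosome (hypothetical record)."""
    contig: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)


def order_and_gaps(hits):
    """Order contigs by reference start; pair each with the estimated gap to the next one."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    layout = []
    for current, following in zip(ordered, ordered[1:]):
        # Gap = distance between the end of this hit and the start of the next.
        # Overlapping hits would give a negative value, so clamp at 0.
        gap = max(0, following.ref_start - current.ref_end)
        layout.append((current.contig, gap))
    if ordered:
        layout.append((ordered[-1].contig, None))  # the last contig has no successor
    return layout


hits = [Hit("c2", 1500, 2400), Hit("c1", 0, 1000), Hit("c3", 2350, 3000)]
print(order_and_gaps(hits))
```

Here `c1` is followed by a 500 bp gap, while `c2` and `c3` overlap on the reference and so get a gap of 0.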

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.
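
The three filtering rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not pyScaf's implementation: the `Match` record, the `filter_matches` function and the threshold values in the example call are hypothetical, and resolving overlaps on the reference by keeping the higher-scoring match is one plausible reading of "removing overlapping matches".

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One contig-to-reference alignment (hypothetical record)."""
    query: str
    ref: str
    ref_start: int
    ref_end: int     # exclusive
    identity: float  # fraction of identical bases in the alignment
    overlap: float   # fraction of the query covered by the alignment
    score: float


def filter_matches(matches, min_identity, min_overlap):
    # 1. Drop matches that do not satisfy the identity and overlap cut-offs.
    passing = [m for m in matches
               if m.identity >= min_identity and m.overlap >= min_overlap]

    # 2. Keep only the best-scoring match for each query contig.
    best = {}
    for m in passing:
        if m.query not in best or m.score > best[m.query].score:
            best[m.query] = m

    # 3. Remove matches that overlap an already kept match on the reference
    #    (assumption: the higher-scoring match wins).
    kept = []
    for m in sorted(best.values(), key=lambda x: x.score, reverse=True):
        clash = any(k.ref == m.ref and m.ref_start < k.ref_end and k.ref_start < m.ref_end
                    for k in kept)
        if not clash:
            kept.append(m)
    return sorted(kept, key=lambda x: (x.ref, x.ref_start))


matches = [
    Match("c1", "chr1", 0, 1000, 0.95, 0.99, 950),
    Match("c1", "chr2", 500, 1500, 0.80, 0.90, 700),   # suboptimal match for c1
    Match("c2", "chr1", 900, 1800, 0.90, 0.95, 800),   # overlaps c1 on chr1
    Match("c3", "chr1", 2000, 2600, 0.50, 0.40, 300),  # fails both cut-offs
    Match("c4", "chr1", 3000, 3500, 0.85, 0.97, 420),
]
kept = filter_matches(matches, min_identity=0.7, min_overlap=0.66)
print([m.query for m in kept])
```

With these illustrative inputs only `c1` and `c4` survive: `c3` fails the cut-offs, the second `c1` match is suboptimal, and `c2` overlaps the better-scoring `c1` match on chr1.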

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, correctly reconstructing all chromosomes in all cases for CANPA and in nearly all cases for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. Note, however, that this is an experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

GKNO

Jit — Tue, 17 Jan 2017 03:35:34 -0600

gkno opens the world of complex bioinformatic analysis to people of all levels of computational expertise. This site contains documentation, tutorials and information on all the tools that comprise gkno.

More at http://gkno.me/

Address of the bookmark: http://gkno.me/

GenomeComp

Jit — Fri, 17 Feb 2017 08:38:32 -0600

GenomeComp is a tool for summarizing, parsing and visualizing the genome-wide sequence comparison results derived from voluminous BLAST textual output, so as to locate the rearrangements, insertions or deletions of genome segments between species or strains.

It can easily be used to compare, parse and visualize large genomic sequences, especially closely related genomes such as those of different species or strains. In addition, it can show other sequence features, such as repeat sequence distributions within one whole-genome DNA sequence, by comparing the genome to itself.

It is a stand-alone graphical user interface (GUI) program which runs on Linux, Unix, Mac OS X (tested on version 10.2.4 only) and Microsoft Windows platforms and is written in Perl/Tk.

Address of the bookmark: http://www.mgc.ac.cn/GenomeComp/

HTSlib

Jit — Wed, 15 Mar 2017 11:38:05 -0500

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:

Samtools: Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
BCFtools: Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
HTSlib: A C library for reading/writing high-throughput sequencing data

Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of htslib so they can be built independently.
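
For orientation, a SAM alignment record is a tab-separated text line with eleven mandatory fields. The sketch below reads one such line in plain Python; it is only an illustration of the format and does not use or mirror the HTSlib API. The names `SamRecord`, `parse_sam_line`, `is_unmapped` and `is_reverse` are hypothetical, and real analyses should rely on Samtools or HTSlib itself.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields (optional tags are ignored in this sketch)."""
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec):
    return bool(rec.flag & 0x4)   # flag bit 0x4: segment unmapped


def is_reverse(rec):
    return bool(rec.flag & 0x10)  # flag bit 0x10: read is reverse-complemented


rec = parse_sam_line("read1\t16\tchr1\t101\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n")
print(rec.rname, rec.pos, is_reverse(rec), is_unmapped(rec))
```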

Address of the bookmark: http://www.htslib.org/

Fool's guide

Poonam Mahapatra — Sun, 02 Apr 2017 14:31:18 -0500

This website and accompanying documents are intended as a tool to help researchers dealing with non-model organisms acquire and process transcriptomic high-throughput sequencing data without having to learn extensive bioinformatics skills. It covers all steps from tissue collection, sample preparation and computer setup, through addressing biological questions with gene expression and SNP data.

http://sfg.stanford.edu/denovo.html

http://sfg.stanford.edu/sequencing.html

http://sfg.stanford.edu/BLAST.html

Address of the bookmark: http://sfg.stanford.edu/guide.html

GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.

Abhimanyu Singh — Tue, 23 May 2017 05:20:32 -0500

GRASS (GeneRic ASsembly Scaffolder) is a novel algorithm for scaffolding second-generation sequencing assemblies that is capable of using diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation-maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.
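
To give a feel for the binary side of such a formulation, here is a deliberately small toy that covers contig orientation only. It is not GRASS's objective or solver: each link between two contigs states whether they should have the same orientation, the cost is the total weight of violated links, and the minimum is found by exhaustive search rather than by expectation-maximization. The link format and the function names are assumptions made for this example.

```python
from itertools import product


def orientation_cost(orient, links):
    """Sum the weights of links whose expected relative orientation is violated."""
    cost = 0.0
    for i, j, same, weight in links:
        if (orient[i] == orient[j]) != same:
            cost += weight
    return cost


def best_orientation(n_contigs, links):
    """Try every 0/1 orientation assignment and return (cost, assignment) with minimal cost."""
    if n_contigs < 1:
        raise ValueError("need at least one contig")
    best = None
    # Contig 0 is fixed to orientation 0: flipping every contig gives the same cost.
    for tail in product((0, 1), repeat=n_contigs - 1):
        orient = (0,) + tail
        cost = orientation_cost(orient, links)
        if best is None or cost < best[0]:
            best = (cost, orient)
    return best


# (contig_i, contig_j, expected_same_orientation, weight); the three links conflict.
links = [(0, 1, True, 5), (1, 2, False, 3), (0, 2, True, 1)]
print(best_orientation(3, links))
```

The three links cannot all be satisfied, so the search sacrifices the weakest one (weight 1). Exhaustive search grows as 2^n and is only workable for a handful of contigs, which is why a real scaffolder needs an approximation.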

Address of the bookmark: https://github.com/AlexeyG/GRASS

Sr. Bioinformatics Analyst (NGS) at Ocimum

Fri, 17 Nov 2017 07:50:44 -0600

JOB FUNCTION: Bio Tech/R&D/Scientist
INDUSTRY: Biotechnology/Pharmaceutical/Medicine
SPECIALIZATION: Basic Research, Bio-Statistician, Clinical Research
QUALIFICATION
Any Post Graduate
BA (Arts), B.Com. (Commerce), BE/ B.Tech (Engineering), B.Pharm. (Pharmacy), B.Sc. (Science), BL/LLB, BDS (Dental Surgery), B.Ed. (Education), BHM (Hotel Management), BBA/ BBM/ BBS, B.Arch. (Architecture), BCA (Computer Application), Diploma-Other Diploma, B.Plan. (Planning), BGL, B.V.Sc. (Veterinary Science), Other School/ Graduation, BHMS (Homeopathy), BAMS (Ayurveda)
Job Description

1. Must have a basic understanding of molecular biology and genomics.
2. Experience in application development, or expertise in programming in either Perl or Python.
3. Experience in statistical programming using R/Bioconductor/Matlab.
4. Strong grasp of statistical and mathematical modelling.
5. Experience in designing and developing bioinformatics pipelines.
6. Must have a minimum of 2 years of hands-on experience in NGS data analysis such as RNA-Seq, Exome-Seq, ChIP-Seq and downstream analysis.
7. Knowledge of WGS, WES, targeted re-sequencing, GWAS and population genomics will be preferred.
8. Must have experience working with open-source software/frameworks and commercial software for NGS data analysis and reporting.
9. Should be familiar with handling big data and with guiding team members on multiple projects simultaneously.
10. Should have experience coordinating with different groups of clinical research scientists for various project requirements.
11. Ability to work in a team as well as independently with minimal support.

More at http://www3.ocimumbio.com/

MIX: Combining multiple assemblies from NGS data

Rahul Nayak — Tue, 08 May 2018 04:58:05 -0500

Mix is a tool that combines two or more draft assemblies without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a path in the extension graph that maximizes the cumulative contig length.
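
The path-selection idea can be illustrated with a much simplified toy, which is not the Mix implementation. Here each vertex is a whole contig rather than a contig extremity, a directed edge `(a, b)` means the end of `a` aligns with the start of `b`, and the best path is found by exhaustive depth-first search, which is only practical for tiny graphs. The function name `best_path` and the example data are hypothetical.

```python
def best_path(lengths, edges):
    """Return (total_length, path) for the simple path with the largest cumulative contig length."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    best = (0, [])

    def extend(path, total, seen):
        nonlocal best
        if total > best[0]:
            best = (total, list(path))
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:  # never visit a contig twice
                seen.add(nxt)
                path.append(nxt)
                extend(path, total + lengths[nxt], seen)
                path.pop()
                seen.remove(nxt)

    for start in lengths:
        extend([start], lengths[start], {start})
    return best


# Contigs from two hypothetical assemblies, A and B, with their lengths in bp.
lengths = {"A1": 1000, "A2": 800, "B1": 1500, "B2": 400}
edges = [("A1", "B1"), ("B1", "A2"), ("A1", "B2")]
print(best_path(lengths, edges))
```

The search prefers A1, B1, A2 (3300 bp in total) over A1, B2 (1400 bp). The toy sums whole contig lengths and ignores the overlapping bases that a real merge would have to account for.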

The Mix algorithm, approach and results were published in BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/14/S15/S16.

Address of the bookmark: https://github.com/cbib/MIX

HALC: High throughput algorithm for long read error correction

Jit — Fri, 08 Jun 2018 10:47:41 -0500

HALC is a high-throughput algorithm for long-read error correction. HALC aligns the long reads to short-read contigs from the same species with a relatively low identity requirement, so that a long-read region can be aligned to at least one contig region, including repeats of its true genome region in the contigs that are sufficiently similar to it (a similar-repeat-based alignment approach). HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC-corrected long reads can thus result in 11.4-60.7% longer assembled contigs than the existing algorithms.

Address of the bookmark: https://github.com/lanl001/halc