BOL: Related items

Genome U-Plot: a whole genome visualization

Rahul Nayak — Fri, 13 Jul 2018 19:50:41 -0500

Genome U-Plot produces clear and intuitive graphs that allow researchers to generate novel insights and hypotheses by visualizing structural variants (SVs) such as deletions, amplifications, and chromoanagenesis events. Its main features are its layered layout, its high spatial resolution, and its improved aesthetic qualities.

https://github.com/gaitat/GenomeUPlot

Address of the bookmark: https://github.com/gaitat/GenomeUPlot

mutatrix: a population genome simulator which generates simulated genomes.

Jit — Tue, 28 Jan 2020 04:06:58 -0600

Genome simulation across a population with zeta-distributed allele frequencies, SNPs, insertions, deletions, and multi-nucleotide polymorphisms.

More at https://github.com/ekg/mutatrix

./mutatrix -S sample -P test/ -p 2 -n 10 reference.fasta
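As a rough illustration of what a zeta-distributed allele frequency means, the sketch below draws, for each variant site, the number of chromosomes carrying the alternate allele from a zeta (Zipf) distribution truncated at the number of chromosomes in the population (10 diploid samples would give 20). The exponent `s`, the function names, and the truncation are assumptions made for illustration; this is not mutatrix's own code.

```python
import random

def truncated_zeta_weights(max_count, s=2.0):
    """Unnormalised zeta weights k**-s for k = 1..max_count."""
    return [k ** -s for k in range(1, max_count + 1)]

def sample_allele_counts(n_sites, n_samples=10, ploidy=2, s=2.0, seed=0):
    """Draw, per variant site, how many chromosomes carry the alternate allele."""
    rng = random.Random(seed)
    max_count = n_samples * ploidy  # total chromosomes in the population
    counts = list(range(1, max_count + 1))
    weights = truncated_zeta_weights(max_count, s)
    return rng.choices(counts, weights=weights, k=n_sites)

draws = sample_allele_counts(n_sites=1000)
singletons = draws.count(1)
print(f"{singletons} of {len(draws)} simulated sites are singletons")
```

Under such a distribution most variants are rare: low allele counts dominate, and common alleles appear only occasionally.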

Address of the bookmark: https://github.com/ekg/mutatrix

VCF Compare !

Rahul Nayak — Wed, 19 Jan 2022 10:30:14 -0600

Compare two BWA mapping methods with the online hg18-mapped data.

We first perform a rapid inspection of the different BAM files using samtools flagstat. Illumina provided chr21 read mappings obtained with their GA IIx deep sequencing platform (<ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam>), aligned to the b36/hg18 reference genome.
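For readers unfamiliar with the kind of summary samtools flagstat gives, here is a minimal sketch that tallies a few similar categories from the FLAG field of SAM records. The bit values come from the SAM specification; the function name, the category names, and the toy flag list are hypothetical, and the real tool reports more categories than shown here.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

def tally_flags(flags):
    """Count a few flagstat-like categories from an iterable of SAM FLAG integers."""
    stats = Counter()
    for flag in flags:
        stats["total"] += 1
        if flag & SECONDARY:
            stats["secondary"] += 1
        if flag & SUPPLEMENTARY:
            stats["supplementary"] += 1
        if flag & DUPLICATE:
            stats["duplicates"] += 1
        if not flag & UNMAPPED:
            stats["mapped"] += 1
        # In this sketch, pairing statistics are restricted to primary records.
        is_primary = not flag & (SECONDARY | SUPPLEMENTARY)
        if is_primary and flag & PAIRED:
            stats["paired"] += 1
            if flag & PROPER_PAIR and not flag & UNMAPPED:
                stats["properly_paired"] += 1
    return stats

# Toy FLAG values standing in for the second column of a SAM file.
toy_flags = [99, 147, 77, 141, 1123, 2064, 256]
for name, count in tally_flags(toy_flags).items():
    print(f"{name}: {count}")
```

Comparing such tallies between two BAM files gives a quick first impression of how the mappings differ before any variant-level comparison.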

Address of the bookmark: https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare

BASE: a practical de novo assembler for large genomes using long NGS reads

Rahul Nayak — Fri, 19 Oct 2018 07:25:21 -0500

BASE is a new de novo assembler. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of the reads sharing the seeds. These consensus sequences are then extended to contigs.

Address of the bookmark: https://github.com/dhlbh/BASE

ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data

LEGE — Wed, 08 May 2024 07:32:45 -0500

ContigExtender was developed to extend contigs, complementing de novo assembly. ContigExtender employs a novel recursive Overlap Layout Candidates (r-OLC) strategy that explores multiple extending paths to achieve longer and highly accurate contigs. ContigExtender is effective at significantly extending contigs in in silico synthesized and real metagenomics datasets.
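The idea of exploring several extension paths can be illustrated with a toy recursive search. This is only a sketch under simplifying assumptions (exact suffix-prefix overlaps, rightward extension only, no coverage or quality checks); it is not the r-OLC algorithm itself, and the function name and parameters are hypothetical.

```python
def extend_right(contig, reads, min_overlap=4, max_depth=10):
    """Return every maximal rightward extension reachable by exact overlaps."""
    candidates = set()
    for read in reads:
        # Try the longest overlap first; the read must contribute at least one new base.
        for olap in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
            if contig.endswith(read[:olap]):
                candidates.add(contig + read[olap:])
                break
    if not candidates or max_depth == 0:
        return [contig]
    results = []
    for candidate in candidates:
        # Each candidate opens its own extension path, explored recursively.
        results.extend(extend_right(candidate, reads, min_overlap, max_depth - 1))
    return results

reads = ["GTACGGA", "CGGATT", "GTACTTT"]
paths = extend_right("ACGTAC", reads)
print(sorted(paths))
print(max(paths, key=len))
```

Here the contig forks into two paths, and the longer one is reached only by following a second read. A real extender would also have to choose between paths using evidence such as read support, which this sketch leaves out.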

More at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953547/

Address of the bookmark: https://github.com/dengzac/contig-extender

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.
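The estimation step described above can be sketched as follows: build bottom-s MinHash sketches of the k-mer sets, estimate the Jaccard similarity j from them, and convert it to an identity estimate through the Mash distance D = -(1/k) ln(2j / (1 + j)). This is a minimal sketch, not MashMap's implementation: winnowing is omitted, canonical k-mers are not used, and the hash function, function names, and parameter values are assumptions chosen for illustration.

```python
import hashlib
import math

def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (hash choice is arbitrary here)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def minhash_sketch(seq, k, s):
    """Bottom-s sketch: the s smallest k-mer hash values."""
    return sorted(kmer_hash(km) for km in kmers(seq, k))[:s]

def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)

def mash_identity(j, k):
    """Identity estimate 1 - D, where D is the Mash distance for Jaccard j."""
    if j <= 0:
        return 0.0
    return max(0.0, 1.0 + math.log(2 * j / (1 + j)) / k)

k, s = 8, 50
ref = "ACGTTGCATGCCGATAGCTAGGCTAACGTTAGCCGATCGATTAGC" * 3
qry = ref[:60] + "T" + ref[61:]  # one substituted position
j = jaccard_estimate(minhash_sketch(ref, k, s), minhash_sketch(qry, k, s), s)
print(f"Jaccard estimate: {j:.3f}, identity estimate: {mash_identity(j, k):.3f}")
```

Because only small sketches are compared, no base-level alignment is ever computed, which is where the speed of this family of methods comes from.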

Address of the bookmark: https://github.com/marbl/MashMap

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads

Rahul Nayak — Fri, 11 May 2018 05:07:45 -0500

MECAT is an ultra-fast mapping, error correction, and de novo assembly tool for single-molecule sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than state-of-the-art aligners and error correction tools. MECAT can be used for effective de novo assembly of large genomes. For example, on a 32-thread computer with a 2.0 GHz CPU, MECAT takes 9.5 days to assemble a human genome from 54x SMRT data, which is 40 times faster than the current PBcR-Mhap pipeline. MECAT's performance was compared with that of the PBcR-Mhap pipeline, FALCON, and Canu (v1.3) on five real datasets. The quality of the assembled contigs produced by MECAT is the same as or better than that of the PBcR-Mhap pipeline and FALCON.

https://www.nature.com/articles/nmeth.4432

Address of the bookmark: https://github.com/xiaochuanle/MECAT

assemblytics: delta file to analyze alignments of an assembly to another assembly or a reference genome

Jit — Thu, 14 Jun 2018 07:31:00 -0500

Download and install MUMmer.

Align your assembly to a reference genome using nucmer (from the MUMmer package):

$ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT

Consult the MUMmer manual if you encounter problems.

Optional: Gzip the delta file to speed up the upload (usually 2-4X faster):

$ gzip OUT.delta

Then use the OUT.delta.gz file for upload.

Upload the .delta or .delta.gz file to Assemblytics.

Important: Use only contigs rather than scaffolds from the assembly. This will prevent false positives when the number of Ns in the scaffolded sequence does not match perfectly to the distance in the reference.

The unique sequence length required represents an anchor for determining if a sequence is unique enough to safely call variants from, which is an alternative to the mapping quality filter for read alignment.

http://assemblytics.com/

Address of the bookmark: http://assemblytics.com/

HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly

BioStar — Thu, 27 Sep 2018 07:08:47 -0500

HaploMerger2 (HM2) can process any diploid assembly, but it is especially suitable for diploid assemblies with high heterozygosity (≥3%), which can be difficult for other tools. This pipeline also implements flexible and sensitive assembly error detection, a hierarchical scaffolding procedure, and a reliable gap-closing method for haploid sub-assemblies.

Source code, executables and the testing dataset are freely available at https://github.com/mapleforest/HaploMerger2/releases/.

Address of the bookmark: https://github.com/mapleforest/HaploMerger2/releases/

Referee: Genome assembly quality scores

Jit — Sun, 04 Nov 2018 16:44:30 -0600

Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.

We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.

Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.
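To make the ten diploid genotypes concrete, the sketch below enumerates them and computes a log-likelihood for each from the bases and Phred qualities observed at one site. The error model (a miscalled base is equally likely to be any of the three other bases, and each allele is sampled with probability 0.5) is a common textbook simplification assumed here for illustration; it is not claimed to be how Referee or its upstream tools compute likelihoods, and the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement("ACGT", 2)]

def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def genotype_log_likelihoods(bases, quals):
    """log10 P(observed bases | genotype) for each of the 10 genotypes."""
    result = {}
    for genotype in GENOTYPES:
        total = 0.0
        for base, q in zip(bases, quals):
            e = phred_to_error(q)
            # Average over the two alleles of the genotype.
            p = sum(0.5 * ((1 - e) if base == allele else e / 3) for allele in genotype)
            total += math.log10(p)
        result[genotype] = total
    return result

likelihoods = genotype_log_likelihoods("AAAAC", [30, 30, 30, 30, 10])
best = max(likelihoods, key=likelihoods.get)
print(GENOTYPES)
print(f"most likely genotype: {best}")
```

A per-site quality score can then be derived by comparing the likelihood of genotypes that contain the called reference base against the alternatives.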

https://github.com/gwct/referee

Address of the bookmark: https://gwct.github.io/referee/#