BOL: Related items

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Jit — Tue, 15 May 2018 07:35:26 -0500

HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy. We found that previously described haplotype assembly methods are specialized for specific read technologies or protocols, with slow or inaccurate performance on others. With this in mind, HapCUT2 is designed for speed and accuracy across diverse sequencing technologies, including but not limited to: NGS short reads (Illumina HiSeq) clone-based sequencing (Fosmid or BAC clones) SMRT reads (PacBio) Oxford Nanopore reads 10X Genomics Linked-Reads proximity-ligation (Hi-C) reads high-coverage sequencing (>40x coverage-per-SNP) using above technologies combinations of the above technologies (e.g. scaffold long reads with Hi-C reads) See below for specific examples of command line options and best practices for some of these technologies. NOTE: At this time HapCUT2 is for diploid organisms only. VCF input should contain diploid variants. If you use HapCUT2 in your research, please cite: Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. gr.213462.116 (2016). doi:10.1101/gr.213462.116

Address of the bookmark: https://github.com/vibansal/HapCUT2

npScarf: real-time scaffolder using SPAdes contigs and Nanopore sequencing reads

Shruti Paniwala — Mon, 11 Jun 2018 05:14:57 -0500

npScarf (jsa.np.npscarf) is a program that connect contigs from a draft genomes to generate sequences that are closer to finish. These pipelines can run on a single laptop for microbial datasets. In real-time mode, it can be integrated with simple structural analyses such as gene ordering, plasmid forming.

Address of the bookmark: http://japsa.readthedocs.io/en/latest/tools/jsa.np.npscarf.html

PureCN: copy number calling and SNV classification using targeted short read sequencing

Jit — Thu, 09 Aug 2018 04:09:37 -0500

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Author: Markus Riester [aut, cre], Angad P. Singh [aut]

Maintainer: Markus Riester

Citation (from within R, enter citation("PureCN")):

Riester M, Singh A, Brannon A, Yu K, Campbell C, Chiang D, Morrissey M (2016). “PureCN: Copy number calling and SNV classification using targeted short read sequencing.” Source Code for Biology and Medicine, 11, 13. doi: 10.1186/s13029-016-0060-z.

Address of the bookmark: http://bioconductor.org/packages/release/bioc/html/PureCN.html

LoRMA: A tool for correcting sequencing errors in long reads

Abhimanyu Singh — Thu, 06 Sep 2018 16:21:01 -0500

An error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.

conda install -c atgc-montpellier lorma

Address of the bookmark: https://gite.lirmm.fr/lorma/lorma-releases/wikis/home

lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data

BioJoker — Tue, 27 Nov 2018 04:43:57 -0600

lordFAST is a sensitive tool for mapping long reads with high error rates. lordFAST is specially designed for aligning reads from PacBio sequencing technology but provides the user the ability to change alignment parameters depending on the reads and application.

lordFAST, a novel long-read mapper that is specifically designed to align reads generated by PacBio and potentially other SMS technologies to a reference. lordFAST not only has higher sensitivity than the available alternatives, it is also among the fastest and has a very low memory footprint.

Address of the bookmark: https://github.com/vpc-ccg/lordfast

Flye: Fast and accurate de novo assembler for single molecule sequencing reads

Rahul Nayak — Sat, 06 Jul 2019 03:48:22 -0500

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.

Address of the bookmark: https://github.com/fenderglass/Flye

FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads

Jit — Tue, 28 Jan 2020 03:27:33 -0600

FastGT is a program package for whole-genome genotyping of genome variants directly from raw sequencing reads. It is written in C and runs in Linux. FastGT uses a list of variant-specific k-mer pairs that are unique in human genome, counts the frequency of k-mers in sequencing data and predicts the genotype. All this takes less than 1 hour on average low-cost Linux server.

http://bioinfo.ut.ee/FastGT/

https://github.com/bioinfo-ut/GenomeTester4/

Address of the bookmark: http://bioinfo.ut.ee/FastGT/

HASLR: a hybrid assembler which uses both second and third generation sequencing reads

BioStar — Mon, 04 May 2020 02:04:03 -0500

HASLR, a hybrid assembler which uses both second and third generation sequencing reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on all the samples compared to other tested assemblers. Furthermore, the generated assemblies in terms of contiguity and accuracy are on par with the other tools on most of the samples. Availability. HASLR is an open source tool available at https://github.com/vpc-ccg/haslr.

Address of the bookmark: https://github.com/vpc-ccg/haslr

Smudgeplot: Inference of ploidy and heterozygosity structure using whole genome sequencing data

Neel — Fri, 25 Feb 2022 04:42:09 -0600

This tool extracts heterozygous kmer pairs from kmer count databases and performs gymnastics with them. We are able to disentangle genome structure by comparing the sum of kmer pair coverages (CovA + CovB) to their relative coverage (CovB / (CovA + CovB)). Such an approach also allows us to analyze obscure genomes with duplications, various ploidy levels, etc.

Smudgeplots are computed from raw or even better from trimmed reads and show the haplotype structure using heterozygous kmer pairs. For example:

Address of the bookmark: https://github.com/KamilSJaron/smudgeplot

ASplice: a scalable and memory-efficient algorithm for de novo transcriptome assembly

Rahul Nayak — Tue, 03 Jul 2018 04:09:46 -0500

With increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their applications to small data sets with few libraries. Texas A&M University researchers develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with moderate amount of physical memory. Availability – A software program that implements the algorithm is available at: http://faculty.cse.tamu.edu/shsze/asplice. Sze SH, Pimsler ML, Tomberlin JK, Jones CD, Tarone AM. (2017) A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. BMC Genomics 18(Suppl 4):387.

Address of the bookmark: http://faculty.cse.tamu.edu/shsze/asplice/