BOL: Related items

LAMSA: fast split read alignment with long approximate matches

Jit — Tue, 15 May 2018 04:44:42 -0500

LAMSA (Long Approximate Matches-based Split Aligner) is a novel split alignment approach that is fast and handles structural variant (SV) events well. It is well suited to aligning long reads (thousands of base pairs or more). LAMSA takes advantage of the rareness of SVs to implement a specifically designed two-step strategy. It first splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then used by a sparse dynamic programming (SDP)-based split alignment approach to handle large or non-co-linear variants. We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than the state-of-the-art long read aligners; meanwhile, it also handles various categories of SVs well. LAMSA is open source and free for non-commercial use. LAMSA was mainly designed by Bo Liu and Yan Gao and developed by Yan Gao at the Center for Bioinformatics, Harbin Institute of Technology, China.
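The idea of chaining fragment alignments co-linearly can be illustrated with a small dynamic program. This is only a sketch and not LAMSA's actual implementation: the `Anchor` type, the `best_colinear_chain` function, the quadratic loop, and the `max_gap` threshold are all assumptions made for illustration. A fragment that cannot join the chain (here the one placed far away on the reference) is the kind of signal a split alignment step would then examine.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """Hypothetical fragment alignment: read offset, reference offset, length."""
    read_pos: int
    ref_pos: int
    length: int


def best_colinear_chain(anchors: List[Anchor], max_gap: int = 10_000) -> List[Anchor]:
    """Return the chain of anchors covering the most bases while staying
    co-linear (increasing in both read and reference coordinates)."""
    if not anchors:
        return []
    order = sorted(anchors, key=lambda a: (a.read_pos, a.ref_pos))
    score = [a.length for a in order]   # best chain score ending at each anchor
    prev = [-1] * len(order)            # back-pointers for traceback
    for j, b in enumerate(order):
        for i in range(j):
            a = order[i]
            read_gap = b.read_pos - (a.read_pos + a.length)
            ref_gap = b.ref_pos - (a.ref_pos + a.length)
            if read_gap < 0 or ref_gap < 0:
                continue                # overlapping or out of order
            if abs(ref_gap - read_gap) > max_gap:
                continue                # gap sizes disagree too much (assumed cutoff)
            candidate = score[i] + b.length
            if candidate > score[j]:
                score[j] = candidate
                prev[j] = i
    # Trace back from the best-scoring anchor.
    j = max(range(len(order)), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


# Four 500 bp fragments of one read; the third maps far away on the reference.
demo = [
    Anchor(0, 1000, 500),
    Anchor(500, 1500, 500),
    Anchor(1000, 90000, 500),
    Anchor(1500, 2500, 500),
]
chain = best_colinear_chain(demo)
print([a.ref_pos for a in chain])  # [1000, 1500, 2500]
```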

Address of the bookmark: https://github.com/hitbc/LAMSA

Indexcov: fast coverage quality control for whole-genome sequencing

Jit — Wed, 29 Aug 2018 09:20:46 -0500

Indexcov is an efficient estimator of whole-genome sequencing coverage that can rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
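As a rough illustration of how sex can be inferred from coverage, the sketch below compares the depth on X and Y with the autosomal median. It is not indexcov's code: the function name, the chromosome labels ("X" and "Y" without a "chr" prefix), the input dictionary of per-chromosome mean depths, and the assumption of a diploid autosomal baseline are all hypothetical.

```python
from statistics import median
from typing import Dict


def infer_sex_chromosome_copies(mean_depth: Dict[str, float]) -> Dict[str, int]:
    """Estimate X and Y copy numbers from per-chromosome mean depth,
    assuming the autosomes are diploid (illustrative assumption)."""
    autosomes = [d for chrom, d in mean_depth.items() if chrom not in ("X", "Y")]
    if not autosomes:
        raise ValueError("need at least one autosome to set the baseline")
    baseline = median(autosomes)
    if baseline <= 0:
        raise ValueError("autosomal baseline depth must be positive")
    # Two autosomal copies correspond to the baseline depth.
    return {
        chrom: round(2 * mean_depth.get(chrom, 0.0) / baseline)
        for chrom in ("X", "Y")
    }


sample_a = {"1": 30.0, "2": 31.0, "3": 29.0, "X": 15.2, "Y": 14.6}
sample_b = {"1": 30.0, "2": 31.0, "3": 29.0, "X": 30.5, "Y": 0.4}
print(infer_sex_chromosome_copies(sample_a))  # {'X': 1, 'Y': 1}
print(infer_sex_chromosome_copies(sample_b))  # {'X': 2, 'Y': 0}
```

Copy numbers other than these two patterns would point to the kind of large-scale chromosomal anomaly mentioned above.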

Address of the bookmark: https://github.com/brentp/goleft

FLAS: fast and high throughput algorithm for PacBio long read self-correction

Jit — Sat, 22 Jun 2019 12:16:39 -0500

FLAS is a wrapper algorithm of MECAT that achieves high-throughput long read self-correction while keeping MECAT's fast speed. FLAS finds additional alignments from MECAT prealigned long reads to improve the correction throughput, and removes misalignments to improve accuracy.

Address of the bookmark: https://github.com/baoe/flas

SeQuiLa-cov: A fast and scalable library for depth of coverage calculations

Jit — Sun, 15 Dec 2019 10:19:35 -0600

The Docker image is available at https://hub.docker.com/r/biodatageeks/. Supplementary information on the benchmarking procedure, as well as test data, is publicly accessible at the project documentation site http://biodatageeks.org/sequila/benchmarking/benchmarking.html#depth-of-coverage. An archival copy of the code and supporting data is also available via the GigaScience database GigaDB.

• Project name: SeQuiLa-cov

• Project home page: http://biodatageeks.org/sequila/

• Source code repository: https://github.com/ZSI-Bio/bdg-sequila

• Operating system: Platform independent

• Programming language: Scala

• Other requirements: Docker

• License: Apache License 2.0

Address of the bookmark: https://academic.oup.com/gigascience/article/8/8/giz094/5543653

MMseqs2: ultra fast and sensitive sequence search and clustering suite

Manisha Mishra — Mon, 18 Jan 2021 10:47:56 -0600

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

Address of the bookmark: https://github.com/soedinglab/MMseqs2

UniAligner: a parameter-free framework for fast sequence alignment

Abhi — Fri, 08 Mar 2024 23:36:12 -0600

UniAligner (formerly, TandemAligner) is the first parameter-free algorithm for sequence alignment that introduces a sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. Classical alignment approaches, such as the Smith-Waterman algorithm, that work well for most sequences, fail to construct biologically adequate alignments of extra-long tandem repeats (ETRs), such as human centromeres and immunoglobulin loci. This limitation was overlooked in the previous studies since the sequences of the centromeres and other ETRs across multiple genomes only became available recently.
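For contrast, a classical Smith-Waterman scorer needs its match, mismatch, and gap values chosen up front and applies them unchanged to every pair of sequences. The sketch below is a minimal textbook version with a linear gap penalty; the default parameter values are arbitrary assumptions, and it does not reproduce UniAligner's sequence-dependent scoring.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b under fixed scoring parameters."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))           # 8
print(smith_waterman_score("ACGT", "ACGT", match=1))  # 4
print(smith_waterman_score("AAAA", "TTTT"))           # 0
```

The same pair of sequences scores differently as soon as the parameters change, which is the dependence a parameter-free method avoids.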

More at https://www.nature.com/articles/s41592-023-01970-4

Address of the bookmark: https://github.com/seryrzu/unialigner

Tools and Methods for Haplotype Phasing

Manisha Mishra — Fri, 04 Sep 2020 20:41:40 -0500

Huge amounts of genotype data are being produced with recent technological advances, both from increasingly comprehensive and inexpensive genome-wide SNP microarrays and from ever more accessible whole-genome and whole-exome sequencing methods. The vast amount of knowledge contained in these results, however, is best exploited through phased haplotypes, which identify the alleles co-located on the same chromosome. Since sequence and SNP array data normally take the form of unphased genotypes, one does not directly observe which of the two parental chromosomes, or haplotypes, a particular allele falls on. Fortunately, new advances in both computational and laboratory methods promise improved determination of haplotype phase. The following tools are useful:

Arlequin: http://cmpg.unibe.ch/software/arlequin3/

BEAGLE: http://faculty.washington.edu/browning/beagle/beagle.html

fastPHASE: http://stephenslab.uchicago.edu/software.html

GENEHUNTER: http://linkage.rockefeller.edu/soft/gh/

The Genome Analysis Toolkit: http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

IMPUTE2: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html

MACH: http://www.sph.umich.edu/csg/abecasis/MACH/

MERLIN: http://www.sph.umich.edu/csg/abecasis/Merlin/

PHASE: http://stephenslab.uchicago.edu/software.html

PL-EM: http://www.people.fas.harvard.edu/~junliu/plem/

“Read-backed phasing” algorithm: http://www.broadinstitute.org/gsa/wiki/index.php/Read-backed_phasing_algorithm

SHAPE-IT: http://www.griv.org/shapeit/
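The ambiguity of unphased genotypes described in the overview above can be made concrete with a small enumeration. This is an illustrative sketch rather than the method of any tool in the list: the function name and the input format (one pair of single-letter alleles per site) are assumptions. With k heterozygous sites there are 2^(k-1) distinct haplotype pairs consistent with the genotypes, and phasing methods have to pick among them.

```python
from itertools import product
from typing import List, Tuple


def compatible_haplotype_pairs(genotypes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List every unordered haplotype pair consistent with unphased genotypes.

    Each genotype is a pair of single-letter alleles observed at one site.
    """
    pairs = set()
    for flips in product((False, True), repeat=len(genotypes)):
        hap1 = "".join(g[1] if flip else g[0] for g, flip in zip(genotypes, flips))
        hap2 = "".join(g[0] if flip else g[1] for g, flip in zip(genotypes, flips))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Two heterozygous sites (A/G and T/C) around one homozygous site (C/C).
unphased = [("A", "G"), ("C", "C"), ("T", "C")]
print(compatible_haplotype_pairs(unphased))
# [('ACC', 'GCT'), ('ACT', 'GCC')]
```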

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Jit — Tue, 15 May 2018 07:35:26 -0500

HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy. We found that previously described haplotype assembly methods are specialized for specific read technologies or protocols, with slow or inaccurate performance on others. With this in mind, HapCUT2 is designed for speed and accuracy across diverse sequencing technologies, including but not limited to:

• NGS short reads (Illumina HiSeq)

• clone-based sequencing (Fosmid or BAC clones)

• SMRT reads (PacBio)

• Oxford Nanopore reads

• 10X Genomics Linked-Reads

• proximity-ligation (Hi-C) reads

• high-coverage sequencing (>40x coverage-per-SNP) using above technologies

• combinations of the above technologies (e.g. scaffold long reads with Hi-C reads)

See below for specific examples of command line options and best practices for some of these technologies.

NOTE: At this time HapCUT2 is for diploid organisms only. VCF input should contain diploid variants.

If you use HapCUT2 in your research, please cite: Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. gr.213462.116 (2016). doi:10.1101/gr.213462.116

Address of the bookmark: https://github.com/vibansal/HapCUT2

Troyanskaya Lab

Tue, 04 Feb 2020 06:40:36 -0600

The goal of our research is to interpret and distill this complexity through accurate analysis and modeling of molecular pathways, particularly those in which malfunctions lead to the manifestation of disease. We are inventing integrative methods for systems-level pathway modeling through integrative analysis of genome-scale datasets. We apply these approaches in studying challenging biological problems, such as how pathways function in diverse cell types and how they change dynamically.

https://function.princeton.edu/

Liftoff: an accurate tool that maps annotations in GFF or GTF between assemblies

Jit — Tue, 30 Jun 2020 21:40:52 -0500

Liftoff is an accurate tool that maps annotations in GFF or GTF between assemblies of the same or closely related species. Unlike current coordinate lift-over tools, which require a pre-generated “chain” file as input, Liftoff is a standalone tool that takes two genome assemblies and a reference annotation as input and outputs an annotation of the target genome.

Address of the bookmark: https://github.com/agshumate/Liftoff