BOL: Related items

chromeister: An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.

Jit — Thu, 03 Feb 2022 04:01:55 -0600

USAGE:

-query: sequence A in FASTA format
-db: sequence B in FASTA format
-out: output matrix
-kmer: integer k > 1 (default 32). Use 32 for chromosomes and genomes, and 16 for small bacteria.
-diffuse: integer z > 0 (default 4). Use 4 for everything; for large plant genomes, you can try 1.
-dimension: size of the output matrix and plot, integer d > 0 (default 1000). Use 1000 for everything except full genome comparisons, where 2000 is recommended.
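To make the -kmer and -dimension options more concrete, here is a toy sketch. It assumes a simple scheme in which k-mers that occur exactly once in each sequence are matched, and their positions are binned into a d x d matrix. It is not chromeister's actual implementation: the function names are hypothetical and the -diffuse option is not modeled.

```python
from collections import Counter

def kmers(seq, k):
    """Yield (position, k-mer) pairs for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]

def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its position."""
    counts = Counter(km for _, km in kmers(seq, k))
    return {km: i for i, km in kmers(seq, k) if counts[km] == 1}

def hit_matrix(query, db, k=32, dim=1000):
    """Bin shared unique k-mer hits into a dim x dim matrix (a toy dotplot)."""
    q = unique_kmer_positions(query, k)
    d = unique_kmer_positions(db, k)
    matrix = [[0] * dim for _ in range(dim)]
    for km, qi in q.items():
        dj = d.get(km)
        if dj is None:
            continue  # k-mer is not unique (or absent) in db
        # Scale sequence coordinates down to matrix cells.
        row = qi * dim // len(query)
        col = dj * dim // len(db)
        matrix[row][col] += 1
    return matrix
```

Comparing a sequence with itself fills only the diagonal. Conserved blocks between two genomes show up as diagonal runs of non-empty cells, and a larger dimension gives finer resolution at the cost of a bigger matrix.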

Address of the bookmark: https://github.com/estebanpw/chromeister

HipSTR: Haplotype inference and phasing for Short Tandem Repeats

BioJoker — Thu, 07 Mar 2019 21:13:06 -0600

HipSTR was specifically developed to deal with the alignment errors and PCR stutter artifacts that complicate STR genotyping, in the hopes of obtaining more robust STR genotypes. In particular, it accomplishes this by:

Learning locus-specific PCR stutter models using an EM algorithm
Mining candidate STR alleles from population-scale sequencing data
Employing a specialized hidden Markov model to align reads to candidate alleles while accounting for STR artifacts
Utilizing phased SNP haplotypes to genotype and phase STRs
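To show how a stutter model can enter genotyping, here is a simplified sketch. HipSTR learns its stutter models with EM, but this toy skips that step and assumes fixed, hypothetical stutter probabilities (up for one extra repeat unit, down for one missing unit). It then picks the diploid genotype with the highest likelihood over reads summarized as observed repeat counts. None of these names are HipSTR's API.

```python
import math
from itertools import combinations_with_replacement

def read_prob(obs, allele, up=0.05, down=0.05):
    """Toy stutter model: P(observed repeat count | true allele)."""
    if obs == allele:
        return 1.0 - up - down
    if obs == allele + 1:
        return up    # stutter added one repeat unit
    if obs == allele - 1:
        return down  # stutter removed one repeat unit
    return 1e-6      # small floor for larger deviations

def genotype(reads, candidates, up=0.05, down=0.05):
    """Return the diploid genotype (a1, a2) with the highest log-likelihood."""
    best, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(sorted(candidates), 2):
        # Each read comes from either haplotype with probability 1/2.
        ll = sum(
            math.log(0.5 * read_prob(r, a1, up, down) + 0.5 * read_prob(r, a2, up, down))
            for r in reads
        )
        if ll > best_ll:
            best, best_ll = (a1, a2), ll
    return best
```

For example, reads with repeat counts [10] * 8 + [12] * 7 + [11, 9] are called as (10, 12). The stray 11 and 9 are explained as stutter rather than as a third allele.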

Address of the bookmark: https://github.com/tfwillems/HipSTR

HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution

Jit — Wed, 07 Feb 2018 09:40:22 -0600

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"

Preprint: http://biorxiv.org/content/early/2016/08/01/062117
Paper: http://genome.cshlp.org/content/27/5/747.full
An IPython notebook that reproduces the results in the paper can be found in the repository.

HINGE is an OLC (Overlap-Layout-Consensus) assembler.
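The overlap and layout stages of OLC assembly can be illustrated with a generic toy: a greedy assembler for error-free reads. HINGE's own repeat-resolution strategy is not reproduced here. The function names and min_len are hypothetical, and the consensus stage is omitted because the toy reads contain no errors.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Layout by repeatedly merging the ordered pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads
```

Greedy merging is easily misled by repeats, since a read from one repeat copy can overlap reads from another. Resolving repeats during layout is exactly the part HINGE addresses.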

Address of the bookmark: https://github.com/HingeAssembler/HINGE

INC-Seq: accurate single molecule reads using nanopore sequencing

Jit — Mon, 27 Nov 2017 10:38:56 -0600

INC-Seq reads enabled accurate species-level classification, identification of species at 0.1 % abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.

Address of the bookmark: https://github.com/CSB5/INC-Seq

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads

Jit — Tue, 12 Jun 2018 08:14:41 -0500

P_RNA_scaffolder is a fast and accurate tool that uses paired-end RNA-sequencing reads to scaffold genomes. It aims to improve the completeness of both protein-coding and non-coding genes. When it was applied to scaffold human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered, equivalent to those in a complete genome, especially for long proteins and long circular RNAs.
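The general idea behind scaffolding with paired-end reads is that two mates mapping to different contigs suggest those contigs belong together. For RNA-seq this happens, for example, when a transcript spans two contigs. Below is a minimal sketch that counts such links. It assumes mate-to-contig mappings have already been computed and ignores orientation and insert distance. The input format and the min_support threshold are hypothetical, and this is not P_RNA_scaffolder's actual algorithm.

```python
from collections import Counter

def contig_links(pairs, min_support=2):
    """Count read pairs whose mates map to different contigs; keep well-supported links.

    pairs: iterable of (contig_of_mate1, contig_of_mate2) tuples (hypothetical format).
    """
    counts = Counter()
    for c1, c2 in pairs:
        if c1 != c2:
            # Treat links as undirected so (a, b) and (b, a) are counted together.
            counts[tuple(sorted((c1, c2)))] += 1
    return {link: n for link, n in counts.items() if n >= min_support}
```

Requiring several supporting pairs per link is a simple guard against chimeric or mis-mapped reads joining unrelated contigs.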

Address of the bookmark: http://www.fishbrowser.org/software/P_RNA_scaffolder/

MSAProbs - Parallel and accurate multiple sequence alignment

Neel — Tue, 09 Jul 2019 23:58:44 -0500

MSAProbs is a well-established, state-of-the-art multiple sequence alignment algorithm for protein sequences. Its design combines pair hidden Markov models and partition functions to calculate posterior probabilities. Assessed on the popular benchmarks BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. In addition, MSAProbs is optimized for shared-memory CPUs with a multi-threaded design, and is further parallelized for distributed-memory systems using MPI to overcome the barrier of high memory overhead and achieve good parallel and data-size scalability.
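The sketch below is a simplified, hypothetical illustration of the partition-function idea. It sums over all global alignments of two sequences, weighting each alignment by exp(score / temp). Forward and backward tables then give the posterior probability that residue i of x is aligned with residue j of y. It uses a plain match/mismatch score with a linear gap penalty rather than a pair HMM or a protein substitution matrix, and all parameter values are illustrative.

```python
import math

def posterior_matrix(x, y, match=1.0, mismatch=-1.0, gap=-2.0, temp=1.0):
    """P(x[i] aligned to y[j]) over all global alignments weighted by exp(score / temp)."""
    n, m = len(x), len(y)
    g = math.exp(gap / temp)

    def w(i, j):
        return math.exp((match if x[i] == y[j] else mismatch) / temp)

    # Forward: F[i][j] = total weight of alignments of x[:i] and y[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w(i - 1, j - 1)
            if i > 0:
                F[i][j] += F[i - 1][j] * g
            if j > 0:
                F[i][j] += F[i][j - 1] * g

    # Backward: B[i][j] = total weight of alignments of x[i:] and y[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += B[i + 1][j + 1] * w(i, j)
            if i < n:
                B[i][j] += B[i + 1][j] * g
            if j < m:
                B[i][j] += B[i][j + 1] * g

    Z = F[n][m]  # partition function over all alignments
    return [[F[i][j] * w(i, j) * B[i + 1][j + 1] / Z for j in range(m)] for i in range(n)]
```

Each row sums to at most 1, because residue x[i] is aligned to at most one y[j] in any alignment. A real implementation would work in log space to avoid overflow on long sequences.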

Address of the bookmark: http://msaprobs.sourceforge.net/homepage.htm#latest

Hifiasm: a haplotype-resolved assembler for accurate HiFi reads

Jit — Thu, 24 Dec 2020 10:03:36 -0600

Hifiasm is a fast haplotype-resolved de novo assembler for PacBio HiFi reads. It can assemble a human genome in several hours and works with the California redwood genome, one of the most complex genomes sequenced so far. Hifiasm can produce primary/alternate assemblies of quality competitive with the best assemblers. It also introduces a new graph binning algorithm and achieves the best haplotype-resolved assembly given trio data.
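Hifiasm's graph binning algorithm operates on the assembly graph and is not reproduced here. For comparison, here is a toy sketch of the simpler read-level trio-binning idea, which is a different, swapped-in technique. Each read is assigned to the parent whose parent-specific k-mers it contains most often. The names and k value are hypothetical, and parental sequences stand in for what would in practice be parental read sets.

```python
def kmer_set(seq, k):
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def bin_reads(reads, paternal, maternal, k=5):
    """Assign each read to a parent by counting parent-specific k-mers."""
    pat_all, mat_all = kmer_set(paternal, k), kmer_set(maternal, k)
    pat = pat_all - mat_all  # k-mers seen only in the father
    mat = mat_all - pat_all  # k-mers seen only in the mother
    bins = {"paternal": [], "maternal": [], "unassigned": []}
    for read in reads:
        km = kmer_set(read, k)
        p, m = len(km & pat), len(km & mat)
        if p > m:
            bins["paternal"].append(read)
        elif m > p:
            bins["maternal"].append(read)
        else:
            bins["unassigned"].append(read)  # no informative k-mers, or a tie
    return bins
```

Reads in homozygous regions carry no parent-specific k-mers and stay unassigned. Binning in the graph rather than per read is one way to handle them.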

Address of the bookmark: https://github.com/chhylp123/hifiasm

dna2bit: an ultra-fast and accurate genomic distance estimation software

LEGE — Sun, 31 Aug 2025 06:24:58 -0500

dna2bit is a software tool written in C++11 that uses OpenMP for parallel computing and the popcount technique for efficient bit manipulation. It has been thoroughly tested with the g++ and clang compilers on both Linux and macOS.
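The 2-bit encoding plus popcount trick can be sketched as follows. This is the general technique, not dna2bit's own distance estimator. It assumes equal-length, gap-free, uppercase A/C/G/T sequences and a hypothetical base-to-code mapping. A Python int stands in for machine words; in C++ one would pack bases into 64-bit words and use a hardware popcount.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # hypothetical 2-bit mapping

def encode(seq):
    """Pack an uppercase A/C/G/T string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def mismatches(a, b):
    """Count positions where equal-length sequences differ, via XOR + popcount."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    x = encode(a) ^ encode(b)  # non-zero 2-bit group wherever bases differ
    # Fold each 2-bit group onto its low bit so every mismatch sets exactly one bit.
    low_bits = int("01" * len(a), 2) if a else 0
    folded = (x | (x >> 1)) & low_bits
    return bin(folded).count("1")  # popcount
```

Dividing the mismatch count by the sequence length gives a simple p-distance. Packing 32 bases per 64-bit word lets a single XOR and popcount compare 32 positions at once, which is where the speed comes from.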

Address of the bookmark: https://github.com/lijuzeng/dna2bit

RapClust: Accurate, Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes

Rahul Nayak — Thu, 04 Oct 2018 17:57:10 -0500

RapClust is a tool for clustering contigs from de novo transcriptome assemblies. It is designed to run downstream of the Sailfish or Salmon tools for rapid transcript-level quantification. Specifically, RapClust relies on the fragment equivalence classes computed by these tools to determine how sequence is shared across the transcriptome and how reads map to potentially related contigs across different conditions.
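Here is a simplified sketch of clustering from equivalence classes. Contigs that appear together in an equivalence class share fragments, so they are linked, and the linked groups become clusters. The toy uses plain connected components (via union-find) in place of whatever clustering RapClust actually applies. The input format and the min_reads threshold are hypothetical.

```python
from collections import defaultdict

def cluster_contigs(eq_classes, min_reads=1):
    """Group contigs that share fragments in any equivalence class.

    eq_classes: list of (contig_ids, read_count) pairs (hypothetical format).
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def union(a, b):
        parent[find(a)] = find(b)

    for contigs, count in eq_classes:
        for c in contigs:
            find(c)  # register every contig, even singletons
        if count >= min_reads:
            first = contigs[0]
            for c in contigs[1:]:
                union(first, c)

    clusters = defaultdict(set)
    for c in parent:
        clusters[find(c)].add(c)
    return sorted(sorted(group) for group in clusters.values())
```

Connected components can chain unrelated contigs through a single weak link. This is why a real tool would weight edges by read support and use a more robust graph clustering.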

Address of the bookmark: https://github.com/COMBINE-lab/RapClust

Cleaner BLAST Databases for More Accurate Results

LEGE — Tue, 23 Apr 2024 01:23:08 -0500

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use the NCBI quality assurance tools listed below to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

Foreign Contamination Screen tool for genome cross-species screening (FCS-GX) detects contamination from foreign organisms in genomes and other sequences using the genome cross-species aligner (GX).
Average Nucleotide Identity (ANI) evaluates the taxonomic classification of prokaryotic genome assemblies. Sequences from genomes marked up as ‘unverified source organism’ are considered suspect and removed.
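The ANI check can be illustrated with a simplified sketch. ANI averages nucleotide identity over the aligned portions of two genomes. The code assumes per-fragment percent identities and aligned fractions have already been computed by some aligner. The filter values and the 95% species threshold are commonly cited conventions used here for illustration, not NCBI's documented parameters, and the function names are hypothetical.

```python
def average_nucleotide_identity(fragments, min_identity=30.0, min_aligned_fraction=0.7):
    """Mean percent identity over fragments that pass the (illustrative) filters.

    fragments: list of (percent_identity, aligned_fraction) tuples, one per query fragment.
    """
    kept = [ident for ident, frac in fragments
            if ident >= min_identity and frac >= min_aligned_fraction]
    if not kept:
        return None  # nothing alignable: ANI is undefined
    return sum(kept) / len(kept)

def same_species(ani_value, threshold=95.0):
    """Illustrative species-level call from an ANI value."""
    return ani_value is not None and ani_value >= threshold
```

An assembly whose ANI to the type material of its claimed species falls well below such a threshold is a candidate for the "unverified source organism" label.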

Ref https://ncbiinsights.ncbi.nlm.nih.gov/2024/04/22/cleaner-blast-databases-more-accurate-results/