BOL: Related items

Termal: a fast and interactive terminal-based viewer for multiple sequence alignments

LEGE — Mon, 22 Sep 2025 23:51:02 -0500

Termal is a fast, interactive, terminal-based viewer for multiple sequence alignments (MSAs), designed for use on remote systems such as high-performance computing (HPC) clusters.

https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbaf208/8257678?login=true

Address of the bookmark: https://github.com/sib-swiss/termal

HipSTR: Haplotype inference and phasing for Short Tandem Repeats

BioJoker — Thu, 07 Mar 2019 21:13:06 -0600

HipSTR was developed specifically to handle the alignment errors and PCR stutter artifacts that make STRs difficult to genotype, with the aim of obtaining more robust STR genotypes. In particular, it accomplishes this by:

Learning locus-specific PCR stutter models using an EM algorithm
Mining candidate STR alleles from population-scale sequencing data
Employing a specialized hidden Markov model to align reads to candidate alleles while accounting for STR artifacts
Utilizing phased SNP haplotypes to genotype and phase STRs
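To make the role of a stutter model concrete, here is a minimal Python sketch. It assumes a simplified, in-frame-only stutter model: a read reports the true repeat count with probability 1 - up - down, and otherwise gains or loses a geometrically distributed number of repeat units. The function names and parameter values are hypothetical. HipSTR learns its locus-specific parameters with EM and aligns reads with an HMM; this sketch skips both and works from fixed parameters and pre-computed repeat counts.

```python
import math
from itertools import combinations_with_replacement


def stutter_prob(observed, true, up, down, p_geom):
    """P(observed repeat count | true allele) under a simple geometric stutter model.

    Hypothetical, simplified model: no stutter with probability 1 - up - down;
    otherwise the count shifts up (prob. `up`) or down (prob. `down`) by k >= 1
    repeat units, with k ~ Geometric(p_geom).
    """
    delta = observed - true
    if delta == 0:
        return 1.0 - up - down
    k = abs(delta)
    size_prob = p_geom * (1.0 - p_geom) ** (k - 1)
    return (up if delta > 0 else down) * size_prob


def best_diploid_genotype(read_counts, candidate_alleles, up=0.05, down=0.10, p_geom=0.9):
    """Return ((allele1, allele2), log_likelihood) for the best diploid genotype.

    Each read is assumed to come from either haplotype with probability 1/2.
    """
    best, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(sorted(candidate_alleles), 2):
        ll = 0.0
        for obs in read_counts:
            p = 0.5 * stutter_prob(obs, a1, up, down, p_geom) + 0.5 * stutter_prob(
                obs, a2, up, down, p_geom
            )
            ll += math.log(p) if p > 0 else -math.inf
        if ll > best_ll:
            best, best_ll = (a1, a2), ll
    return best, best_ll


# Repeat counts observed in reads at one hypothetical locus.
reads = [10, 10, 10, 12, 12, 12, 11]
print(best_diploid_genotype(reads, {10, 11, 12}))
```

With these parameters the sketch prefers the heterozygous genotype (10, 12) and explains the single 11-unit read as stutter from either allele.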

Address of the bookmark: https://github.com/tfwillems/HipSTR

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping paired-end reads using k-mer frequencies. We evaluated the tool on simulated Arabidopsis thaliana paired-end reads at 30× coverage with a 1% base error rate: COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE was applied to real reads for genome assembly, the resulting contigs had fewer errors and a 14-fold higher N50 than contigs assembled from unconnected reads.
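The core operation, connecting two mates whose sequences overlap, can be sketched in Python as follows. This is a plain overlap-and-mismatch merge that assumes uppercase ACGTN reads. It is not COPE's method, which uses k-mer frequencies to decide whether and where to connect a pair. The function names and thresholds are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge two mates whose 3' ends overlap; return None if no acceptable overlap.

    Assumes the fragment is at least as long as each read (no adapter read-through).
    Overlaps are scanned from longest to shortest, and the first one within the
    mismatch budget is accepted; read1's bases are kept inside the overlap.
    """
    mate = revcomp(read2)  # bring read2 onto read1's strand
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[len(read1) - olen:]
        head = mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olen:
            return read1 + mate[olen:]
    return None


print(merge_pair("ACGTTGCA", "GACCTGCA", min_overlap=4))  # ACGTTGCAGGTC
```

Scanning from the longest overlap down is a simple way to avoid accepting a short, spurious match when a longer consistent one exists.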

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads

BioStar — Tue, 04 Feb 2020 23:23:16 -0600

Rcorrector's accuracy is higher than or comparable to that of existing methods, including SEECER, the only other method designed for RNA-seq reads, and it is more time- and memory-efficient. With a 5 GB memory footprint for 100 million reads, it can run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
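The k-mer counting idea behind this kind of error correction can be illustrated with a short Python sketch. It uses a single global count threshold and ignores reverse complements. Rcorrector instead judges k-mers against locally estimated thresholds, which matters for RNA-seq, where expression levels vary widely. The function names and the threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read, counts, k, threshold):
    """Start positions of k-mers whose count falls below a fixed threshold."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < threshold]


reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]  # the last read carries an A->T error at position 4
counts = count_kmers(reads, k=4)
print(weak_kmer_starts("ACGTTCGT", counts, k=4, threshold=2))  # [1, 2, 3, 4]
```

The four weak k-mers are exactly those that span position 4. This shows how a position-level error signal emerges from k-mer counts alone.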

Usage: perl run_rcorrector.pl [OPTIONS]
OPTIONS:
	Required
	-s seq_files: comma-separated files for single-end data sets
	-1 seq_files_left: comma-separated files for the first mate in the paired-end data sets
	-2 seq_files_right: comma-separated files for the second mate in the paired-end data sets
	-i seq_files_interleaved: comma-separated files for interleaved paired-end data sets
	Optional
	-k INT: kmer_length (<=32, default: 23)
	-od STRING: output_file_directory (default: ./)
	-t INT: number of threads to use (default: 1)
	-trim : allow trimming (default: false)
	-maxcorK INT: the maximum number of corrections within a k-bp window (default: 4)
	-wk FLOAT: the proportion of kmers used to estimate the weak kmer count threshold; use a lower value for more divergent genomes (default: 0.95)
	-ek INT: expected number of kmers; does not affect the correctness of program but affects the memory usage (default: 100000000)
	-stdout: output the corrected reads to stdout (default: not used)
	-verbose: output some correction information to stdout (default: not used)
	-stage INT: the stage to start from (default: 0)
		0-start from the beginning (storing kmers in the bloom filter);
		1-start from counting the kmers that appear in the bloom filter;
		2-start from dumping kmer counts into a jf_dump file;
		3-start from error correction.
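A typical paired-end invocation can be assembled from the options above. The Python helper below is a hypothetical convenience wrapper, not part of Rcorrector. It uses only the documented -1, -2, -k, -t and -od flags, and it assumes that run_rcorrector.pl is in the working directory and that perl is on PATH. The file names are placeholders.

```python
import shlex


def rcorrector_command(left, right, k=23, threads=1, outdir="./"):
    """Build an argv list for run_rcorrector.pl on paired-end read files.

    Only flags listed in the usage text are used: -1, -2, -k, -t, -od.
    """
    if len(left) != len(right):
        raise ValueError("each left-mate file needs a matching right-mate file")
    if not 0 < k <= 32:
        raise ValueError("the usage text limits k to <= 32")
    return [
        "perl", "run_rcorrector.pl",
        "-1", ",".join(left),    # comma-separated first mates
        "-2", ",".join(right),   # comma-separated second mates, same order
        "-k", str(k),
        "-t", str(threads),
        "-od", outdir,
    ]


cmd = rcorrector_command(["lib1_1.fq", "lib2_1.fq"], ["lib1_2.fq", "lib2_2.fq"],
                         threads=8, outdir="corrected/")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.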

Address of the bookmark: https://github.com/mourisl/Rcorrector/

FSA: Fast Statistical Alignment

Jit — Mon, 06 Feb 2017 04:26:01 -0600

FSA is a probabilistic multiple sequence alignment algorithm which uses a "distance-based" approach to aligning homologous protein, RNA or DNA sequences. Much as distance-based phylogenetic reconstruction methods like Neighbor-Joining build a phylogeny using only pairwise divergence estimates, FSA builds a multiple alignment using only pairwise estimations of homology. This is made possible by the sequence annealing technique for constructing a multiple alignment from pairwise comparisons, developed by Ariel Schwartz in "Posterior Decoding Methods for Optimization and Control of Multiple Alignments."
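The idea of building a multiple alignment purely from pairwise homology estimates can be illustrated with a deliberately simplified Python sketch. It assumes the pairwise match posteriors are already given; FSA computes them with pair HMMs. The sketch merges alignment columns greedily, from the most to the least confident match, and rejects any merge that would put two residues of one sequence in the same column or contradict sequence order. It is not FSA's implementation: FSA's sequence annealing scores candidate merges with an expected-accuracy objective that also accounts for gaps. The function name and input format are hypothetical.

```python
import heapq


def anneal(seqs, matches):
    """Greedy column merging in the spirit of sequence annealing (simplified sketch).

    seqs:    list of sequences (strings)
    matches: list of (posterior, (seq_i, pos_i), (seq_j, pos_j)) candidate pairs
    Returns the alignment as a list of gapped rows.
    """
    residues = [(s, p) for s, seq in enumerate(seqs) for p in range(len(seq))]
    parent = {r: r for r in residues}        # union-find over residues
    members = {r: {r} for r in residues}     # column representative -> residues

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]    # path halving
            r = parent[r]
        return r

    def successors(col):
        # Columns holding the next residue of some sequence present in `col`.
        return {find((s, p + 1)) for s, p in members[col] if p + 1 < len(seqs[s])}

    def reaches(a, b):
        stack, seen = [a], {a}
        while stack:
            c = stack.pop()
            if c == b:
                return True
            for n in successors(c) - seen:
                seen.add(n)
                stack.append(n)
        return False

    for _, x, y in sorted(matches, key=lambda m: -m[0]):  # most confident first
        a, b = find(x), find(y)
        if a == b:
            continue
        if {s for s, _ in members[a]} & {s for s, _ in members[b]}:
            continue                  # at most one residue per sequence per column
        if reaches(a, b) or reaches(b, a):
            continue                  # merging would contradict sequence order
        parent[b] = a
        members[a] |= members.pop(b)

    # Order the columns with a topological sort (Kahn's algorithm).
    cols = list(members)
    indeg = {c: 0 for c in cols}
    for c in cols:
        for n in successors(c):
            indeg[n] += 1
    ready = [(min(members[c]), c) for c in cols if indeg[c] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, c = heapq.heappop(ready)
        order.append(c)
        for n in successors(c):
            indeg[n] -= 1
            if indeg[n] == 0:
                heapq.heappush(ready, (min(members[n]), n))

    return [
        "".join(next((seq[p] for t, p in members[c] if t == s), "-") for c in order)
        for s, seq in enumerate(seqs)
    ]


print(anneal(["ACGT", "AGT"], [(0.9, (0, 0), (1, 0)), (0.8, (0, 2), (1, 1)), (0.7, (0, 3), (1, 2))]))
```

The order check is what lets the method reject crossing matches. For example, with sequences "AC" and "CA", matching both A's and then both C's would put the two sequences in contradictory orders, so the second merge is refused and one C is left in its own column.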

FSA brings the high accuracies previously available only for small-scale analyses of proteins or RNAs to large-scale problems such as aligning thousands of sequences or megabase-long sequences. FSA introduces several novel methods for constructing better alignments:

FSA uses machine-learning techniques to estimate gap and substitution parameters on the fly for each set of input sequences. This "query-specific learning" alignment method makes FSA very robust: it can produce superior alignments of sets of homologous sequences which are subject to very different evolutionary constraints.
FSA is capable of aligning hundreds or even thousands of sequences using a randomized inference algorithm to reduce the computational cost of multiple alignment. This randomized inference can be over ten times faster than a direct approach with little loss of accuracy.
FSA can quickly align very long sequences using the "anchor annealing" technique for resolving anchors and projecting them with transitive anchoring. It then stitches together the alignment between the anchors using the methods described above.
The included GUI, MAD (Multiple Alignment Display), can display the intermediate alignments produced by FSA, where each character is colored according to the probability that it is correctly aligned.

More information is available in the FAQ.

Address of the bookmark: http://fsa.sourceforge.net/

RaGOO: Fast Reference-Guided Scaffolding of Genome Assembly Contigs

Jit — Sun, 27 Oct 2019 00:57:23 -0500

Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, Lippman ZB, Schatz MC: Fast and accurate reference-guided scaffolding of draft genomes. bioRxiv 2019.

RaGOO is a tool for coalescing genome assembly contigs into pseudochromosomes via minimap2 alignments to a closely related reference genome. The tool focuses on practicality and therefore has the following features:

Good performance. On a MacBook Pro using Arabidopsis data, pseudochromosome construction takes less than a minute and the whole pipeline with SV calling takes ~2 minutes.
Intact ordering and orienting of contigs.
Misassembly correction
GFF lift-over
Structural variant calling with an integrated version of Assemblytics
Confidence scores associated with the grouping, localization, and orientation for each contig.
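The grouping, localization and orientation steps can be sketched in Python, assuming that contig-to-reference alignments have already been reduced to (contig, chromosome, start, end, strand) tuples, for example parsed from minimap2 PAF output. The majority-of-aligned-bases rules and the single grouping confidence value are simplifying assumptions, not RaGOO's exact scoring. The function name is hypothetical.

```python
from collections import defaultdict


def order_and_orient(alignments):
    """Group, order and orient contigs from (contig, chrom, start, end, strand) alignments.

    Returns {chrom: [(contig, strand, grouping_confidence), ...]} in reference order.
    """
    by_contig = defaultdict(list)
    for aln in alignments:
        by_contig[aln[0]].append(aln)

    placed = defaultdict(list)
    for contig, alns in by_contig.items():
        # Grouping: the chromosome receiving the most aligned bases.
        bases = defaultdict(int)
        for _, chrom, start, end, _ in alns:
            bases[chrom] += end - start
        chrom = max(bases, key=bases.get)
        confidence = bases[chrom] / sum(bases.values())

        on_chrom = [a for a in alns if a[1] == chrom]
        # Orientation: the strand supported by more aligned bases.
        fwd = sum(e - s for _, _, s, e, strand in on_chrom if strand == "+")
        rev = sum(e - s for _, _, s, e, strand in on_chrom if strand == "-")
        # Localization: leftmost alignment start on the chosen chromosome.
        position = min(s for _, _, s, _, _ in on_chrom)
        placed[chrom].append((position, contig, "+" if fwd >= rev else "-", round(confidence, 3)))

    return {c: [(ctg, o, conf) for _, ctg, o, conf in sorted(items)] for c, items in placed.items()}


alns = [
    ("ctg1", "chr1", 0, 900, "+"), ("ctg1", "chr2", 0, 100, "+"),
    ("ctg2", "chr1", 5000, 6000, "-"),
    ("ctg3", "chr1", 2000, 2500, "+"), ("ctg3", "chr1", 2600, 3000, "-"),
]
print(order_and_orient(alns))
```

Here ctg1 is assigned to chr1 with confidence 0.9 because 900 of its 1,000 aligned bases land there, and ctg3 is kept forward because its forward alignment is longer than its reverse one.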

Address of the bookmark: https://github.com/malonge/RaGOO

MMseqs2.0: ultra fast and sensitive protein search and clustering suite

Jit — Thu, 22 Mar 2018 10:40:51 -0500

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein sequence sets. MMseqs2 is open-source, GPL-licensed software implemented in C++ for Linux, macOS, and (as a beta version, via Cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10,000 times faster than BLAST; at 100 times BLAST's speed, it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

The MMseqs2 user guide is available as a GitHub wiki or as a PDF file (thanks to pandoc!).

Please cite: Steinegger M and Soeding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi: 10.1038/nbt.3988 (2017).

Address of the bookmark: https://github.com/soedinglab/MMseqs2

LSC: a long-read error correction tool

Jit — Thu, 02 Aug 2018 07:39:46 -0500

Getting Started

These simple steps will help you integrate LSC into your transcriptomics analysis pipeline.

Read the LSC_requirements for running LSC.
Download and set-up the LSC package.
Follow the tutorial to see how LSC works on some example data.
Read the manual if anything is unclear.
You're ready. Happy LSCing!

Latest publication

Kin Fai Au, Jason Underwood, Lawrence Lee and Wing Hung Wong
Improving PacBio Long Read Accuracy by Short Read Alignment [Manuscript]
PLoS ONE 2012. 7(10): e46679. doi:10.1371/journal.pone.0046679
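The principle of correcting a long read with aligned short reads can be sketched as a per-position majority vote. This Python example assumes the short reads have already been placed on the long read without gaps, so it only fixes substitutions. LSC's actual pipeline must also handle the insertions and deletions that dominate PacBio errors. The function name and input format are hypothetical.

```python
from collections import Counter


def pileup_correct(long_read, placements):
    """Majority-vote correction of a long read from gaplessly placed short reads.

    placements: [(offset_in_long_read, short_read), ...]
    Positions with no short-read coverage keep the original base.
    """
    columns = [Counter() for _ in long_read]
    for offset, short in placements:
        for i, base in enumerate(short):
            pos = offset + i
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else base
        for base, col in zip(long_read, columns)
    )


print(pileup_correct("ACGTTCGA", [(0, "ACGTA"), (2, "GTACG"), (3, "TACGA")]))  # ACGTACGA
```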

Address of the bookmark: https://www.healthcare.uiowa.edu/labs/au/LSC/

FRODOCK 2.0: fast protein–protein docking server

Neel — Wed, 17 Oct 2018 04:31:30 -0500

FRODOCK 2.0 is a user-friendly protein–protein docking server based on an improved version of FRODOCK that includes a complementary knowledge-based potential. The web interface provides a very effective tool for exploring and selecting protein–protein models and for interactively screening them against experimental distance constraints. The competitive success rates and efficiency achieved allow the retrieval of reliable potential protein–protein binding conformations that can be further refined with more computationally demanding strategies.
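Screening docking models against experimental distance constraints comes down to checking inter-residue distances in each model. The Python sketch below assumes that each model has been reduced to a dictionary of residue coordinates and each constraint to an upper distance bound. The data layout and function names are hypothetical and are not the FRODOCK server's interface.

```python
import math


def restraint_score(model, restraints):
    """Number of distance restraints a docking model satisfies.

    model:      {residue_id: (x, y, z)} coordinates, e.g. of C-alpha atoms
    restraints: [(residue_a, residue_b, max_distance_angstrom), ...]
    """
    return sum(math.dist(model[a], model[b]) <= max_d for a, b, max_d in restraints)


def screen(models, restraints, min_satisfied=None):
    """Keep models satisfying at least `min_satisfied` restraints (all by default), best first."""
    if min_satisfied is None:
        min_satisfied = len(restraints)
    scored = [(restraint_score(m, restraints), name) for name, m in models.items()]
    return [name for score, name in sorted(scored, key=lambda t: -t[0]) if score >= min_satisfied]


models = {
    "model_1": {"A:10": (0.0, 0.0, 0.0), "B:5": (3.0, 4.0, 0.0)},   # 5 A apart
    "model_2": {"A:10": (0.0, 0.0, 0.0), "B:5": (10.0, 0.0, 0.0)},  # 10 A apart
}
print(screen(models, [("A:10", "B:5", 6.0)]))  # ['model_1']
```

Lowering `min_satisfied` turns the hard filter into a ranking, which is useful when experimental constraints are few or uncertain.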

Address of the bookmark: http://frodock.chaconlab.org/

Shouji: a fast and efficient pre-alignment filter for sequence alignment

Jit — Mon, 04 Nov 2019 07:09:45 -0600

The ability to generate massive amounts of sequencing data continues to overwhelm the processing capacity of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes.

We introduce Shouji, a highly parallel and accurate pre-alignment filter that greatly reduces the need for computationally costly dynamic programming algorithms. The first key idea of the proposed pre-alignment filter is to achieve high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is a hardware accelerator that adopts modern FPGA (field-programmable gate array) architectures to further boost the performance of the algorithm.
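The first key idea can be sketched in software: find the common subsequences shared by a read and a reference segment, and use them to estimate how many edits separate the two. The Python example below is a simplified filter loosely modeled on the neighborhood-map and sliding-window approach described for Shouji. The window width of 4, the update rule, and all names are assumptions, and the FPGA design is out of scope. A pair is passed on to full dynamic-programming alignment only if the estimated edit count is at most the threshold e.

```python
def neighborhood_map(read, ref, e):
    """Rows for diagonal shifts -e..e; 0 marks a match between read[j] and ref[j + shift]."""
    rows = []
    for shift in range(-e, e + 1):
        row = []
        for j in range(len(read)):
            k = j + shift
            row.append(0 if 0 <= k < len(ref) and read[j] == ref[k] else 1)
        rows.append(row)
    return rows


def shouji_like_filter(read, ref, e, window=4):
    """Return (accept, estimated_edits) for a read and a reference segment of equal length."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    rows = neighborhood_map(read, ref, e)
    m = len(read)
    w = min(window, m)
    result = [1] * m  # start by assuming every position is an edit
    for start in range(m - w + 1):
        # Diagonal segment with the most matches (fewest 1s) inside this window.
        best = min((row[start:start + w] for row in rows), key=sum)
        if sum(best) < sum(result[start:start + w]):
            result[start:start + w] = best
    edits = sum(result)
    return edits <= e, edits


print(shouji_like_filter("ACGTTCGTAC", "ACGTACGTAC", e=1))  # (True, 1)
```

Because the filter only counts mismatching positions along the best local diagonals, it is far cheaper than dynamic programming. Its job is to discard clearly dissimilar pairs, not to produce the final alignment.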


Address of the bookmark: https://github.com/CMU-SAFARI/Shouji