BOL: Related items

dna2bit: an ultra-fast and accurate genomic distance estimation software

LEGE — Sun, 31 Aug 2025 06:24:58 -0500

dna2bit is a software tool written in C++11 that uses OpenMP for parallel computing and the popcount technique for efficient bit manipulation. It has been thoroughly tested with the g++ and clang compilers on both Linux and macOS.
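To illustrate why popcount is useful for sequence comparison, here is a minimal Python sketch. It is not taken from dna2bit and does not claim to reproduce its method: the 2-bit encoding table `CODES` and the helpers `pack` and `mismatches` are hypothetical names chosen for illustration. Each base is packed into two bits, two packed sequences are XORed, and the number of differing positions is obtained by counting set bits.

```python
# Hypothetical 2-bit encoding; an assumption for illustration, not dna2bit's scheme.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODES[base]
    return value


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length DNA strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0
    diff = pack(a) ^ pack(b)
    # Mask with a 1 in the low bit of every 2-bit slot: 0b0101...01
    low_bits = int("01" * len(a), 2)
    # Fold the high bit of each slot onto its low bit, so a slot that
    # differs in either bit contributes exactly one set bit.
    collapsed = (diff | (diff >> 1)) & low_bits
    # Population count of the folded value = number of mismatching bases.
    return bin(collapsed).count("1")


print(mismatches("ACGT", "ACGA"))
```

In C++ the same counting step would typically map to a hardware popcount instruction on 64-bit words, which is where the speed comes from.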

Address of the bookmark: https://github.com/lijuzeng/dna2bit

List of bioinformatics packages for NGS analysis !

Rahul Nayak — Sat, 20 Mar 2021 00:28:51 -0500

Package suites gather software packages and installation tools for specific languages or platforms. We have some for bioinformatics software.

Bioconductor – A plethora of tools for analysis and comprehension of high-throughput genomic data, including 1500+ software packages. [ paper-2004 | web ]
Biopython – Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the Open Bioinformatics Foundation. Contains the very useful Entrez package for API access to the NCBI databases. [ paper-2009 | web ]
Bioconda – A channel for the conda package manager specializing in bioinformatics software. Includes a repository with 3000+ ready-to-install (with conda install) bioinformatics packages. [ paper-2018 | web ]
BioJulia – Bioinformatics and computational biology infrastructure for the Julia programming language. [ web ]
Rust-Bio – Rust implementations of algorithms and data structures useful for bioinformatics. [ paper-2016 ]
SeqAn – The modern C++ library for sequence analysis.

Levenshtein and Damerau-Levenshtein distance !

Surabhi Chaudhary — Tue, 28 Sep 2021 04:38:55 -0500

Levenshtein Distance

Also known as Edit Distance, it is the minimum number of edits (deletions, insertions, or substitutions) required to transform a source string into the target one. For example, if the source is “back” and the target is “book”, you need to change “a” to “o” and “c” to “o”, which gives a Levenshtein Distance of 2. Edit Distance is very easy to implement, and it is a popular challenge in coding interviews.
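A minimal Python sketch of the usual dynamic-programming approach is shown here. The function name `levenshtein` is chosen for illustration; the code keeps only two rows of the table at a time.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, or substitutions."""
    # previous[j] = distance between the source prefix seen so far and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("back", "book"))  # 2
```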

Additionally, some frameworks also support the Damerau-Levenshtein distance:

Damerau-Levenshtein distance

It is an extension of Levenshtein Distance that allows one extra operation: transposition of two adjacent characters.

Ex: TSAR to STAR

Damerau-Levenshtein distance = 1 (Switching the S and T positions costs only one operation)

Levenshtein distance = 2 (Replace S by T and T by S)
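The transposition rule can be sketched in Python as follows. Note that this is the restricted variant, often called optimal string alignment, which never edits the same substring twice; the unrestricted Damerau-Levenshtein distance needs a slightly more involved algorithm. The function name `osa_distance` is chosen for illustration.

```python
def osa_distance(source: str, target: str) -> int:
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of source[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Extra operation: swap two adjacent characters.
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("TSAR", "STAR"))  # 1
```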

GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research

Rahul Nayak — Sat, 26 Jan 2019 18:58:16 -0600

GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).

Address of the bookmark: https://github.com/b-brankovics/grabb

Mash: fast genome and metagenome distance estimation using MinHash

Jit — Tue, 12 Dec 2017 17:30:12 -0600

Mash is normally distributed as a dependency-free binary for Linux or OSX (see https://github.com/marbl/Mash/releases). This source distribution is intended for other operating systems or for development. Mash requires C++11 to build, which is available in GCC >= 4.8 and OSX >= 10.7.

See http://mash.readthedocs.org for more information.

Address of the bookmark: https://github.com/marbl/Mash/releases

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Rahul Nayak — Sun, 17 May 2020 10:27:44 -0500

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. GRIDSS includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. GRIDSS calls variants based on alignment-guided positional de Bruijn graph genome-wide break-end assembly, split read, and read pair evidence.

Address of the bookmark: https://github.com/PapenfussLab/gridss

Andi

Jit — Fri, 13 May 2016 05:16:35 -0500

This is the andi program for estimating the evolutionary distance between closely related genomes. These distances can be used to rapidly infer phylogenies for big sets of genomes. Because andi does not compute full alignments, it is so efficient that it scales even up to thousands of bacterial genomes.

This readme covers all necessary instructions for the impatient to get andi up and running. For extensive instructions please consult the manual.

More at https://github.com/evolbioinf/andi/

Address of the bookmark: http://bioinformatics.oxfordjournals.org/content/early/2015/01/13/bioinformatics.btu815.full

MGSE: Mapping-based Genome Size Estimation

Shruti Paniwala — Fri, 17 Jan 2020 02:11:43 -0600

MGSE can harness the power of files generated in genome sequencing projects to predict the genome size. It requires a FASTA file containing a high-continuity assembly and a BAM file with all available reads mapped to this assembly. The script construct_cov_file.py (https://doi.org/10.1186/s12864-018-5360-z) generates a COV file from the (sorted) BAM file (this is also possible via MGSE directly). Next, MGSE uses this COV file to calculate the coverage in the provided reference regions and the total number of mapped bases. Both values are used for the genome size estimation. Providing accurate reference regions is crucial for this estimate.
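The arithmetic behind such an estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MGSE's code: the function name `estimate_genome_size`, its parameters, and the choice of the mean as the coverage summary are all hypothetical, and the input numbers are made up. The idea is that if the reference regions are single-copy, dividing the total number of mapped bases by their average coverage approximates the genome size.

```python
from typing import Sequence


def estimate_genome_size(total_mapped_bases: int,
                         reference_coverages: Sequence[float]) -> float:
    """Estimate genome size as total mapped bases / average reference coverage.

    reference_coverages holds one coverage value per reference region
    (hypothetical input layout, assumed for this sketch).
    """
    if not reference_coverages:
        raise ValueError("at least one reference region is required")
    # Assumption: the mean summarizes the reference coverage.
    average_coverage = sum(reference_coverages) / len(reference_coverages)
    if average_coverage <= 0:
        raise ValueError("average coverage must be positive")
    return total_mapped_bases / average_coverage


# Made-up example: 15 Gbp of mapped bases at about 100x coverage.
print(estimate_genome_size(15_000_000_000, [98.0, 100.0, 102.0]))  # 150000000.0
```

The sketch also shows why the reference regions matter: any bias in their coverage scales the estimate by the same factor.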

Address of the bookmark: https://github.com/bpucker/MGSE

GAM-NGS: genomic assemblies merger for next generation sequencing

Jit — Fri, 19 May 2017 07:44:14 -0500

GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects. With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the sequence that is most likely to be correct.

Address of the bookmark: https://github.com/vice87/gam-ngs

Genomic Open-source Breeding informatics initiative

BioStar — Wed, 06 Jan 2021 19:42:21 -0600

To build open-source genomic data management and analysis tools to enable breeders to implement genomic and marker-assisted selection as part of their routine breeding programs.

To transform breeding by connecting diverse data with precision breeding tools to advance yields and adaptation to local growing conditions, bringing global communities closer to a sustainable, reliable food supply.

Address of the bookmark: http://cbsugobii05.biohpc.cornell.edu/wordpress/