BOL: Related items

SiLiX: implements an ultra-efficient algorithm for the clustering of homologous sequences

Jit — Wed, 12 Dec 2018 09:22:41 -0600

The software package SiLiX implements an ultra-efficient algorithm for the clustering of homologous sequences, based on single transitive links (single linkage) with alignment coverage constraints.

SiLiX adopts a graph-theoretical framework to interpret similarity pairs as edges of a network. A very efficient algorithm, based on the Disjoint Sets Data Structure, allows the computation of sequence families with low time and space requirements.
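As a rough illustration of the idea described above, and not SiLiX's actual code, the sketch below treats each accepted similarity pair as an edge and merges sequences into families with a disjoint-set (union-find) structure. The class and function names, the input tuple layout, and the threshold values are assumptions made for this example.

```python
from collections import defaultdict


class DisjointSets:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root of the set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster(hits, min_identity=0.35, min_coverage=0.80):
    """Single-linkage families from (seq_a, seq_b, identity, coverage) tuples.

    The thresholds are illustrative assumptions; a pair becomes an edge
    only if it passes both the identity and the coverage filter.
    """
    ds = DisjointSets()
    for a, b, identity, coverage in hits:
        ds.find(a)  # make sure both sequences are known,
        ds.find(b)  # even if the pair is rejected
        if identity >= min_identity and coverage >= min_coverage:
            ds.union(a, b)
    families = defaultdict(set)
    for seq in list(ds.parent):
        families[ds.find(seq)].add(seq)
    return sorted(sorted(members) for members in families.values())


example_hits = [
    ("A", "B", 0.90, 0.95),  # accepted
    ("B", "C", 0.50, 0.85),  # accepted, so A, B and C are linked transitively
    ("C", "D", 0.90, 0.40),  # rejected: coverage too low
    ("E", "F", 0.20, 0.99),  # rejected: identity too low
]
print(cluster(example_hits))
```

Because each pair is processed once and then discarded, memory grows with the number of sequences rather than the number of similarity pairs, which is the property the description above attributes to the disjoint-sets approach.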

A parallel version of SiLiX, based on MPI, is also available in this package and has been shown to scale well, which allows the study of very large datasets.

SiLiX is already included in the analysis pipeline for HOGENOM.

Address of the bookmark: http://lbbe.univ-lyon1.fr/SiLiX?lang=fr

Trust But Verify: Sequencing Your Cell Lines Might Reveal an Uninvited Guest

LEGE — Wed, 04 Jun 2025 00:07:57 -0500

High-throughput sequencing has become indispensable in cell biology, enabling detailed insights into chromatin structure, gene expression, and regulatory dynamics. Yet, when faced with unexpectedly low mapping rates to the human genome, researchers often rush to troubleshoot technical parameters—sequencer quality, adapter trimming, or aligner settings.

Before you go down that path, consider this critical biological question:
Are you sequencing human cells—or bacterial contamination?

The Silent Saboteur: Mycoplasma in Cell Cultures

Mycoplasma contamination remains one of the most widespread and underdiagnosed issues in tissue culture work. Studies suggest that 15–35% of cell lines in use may be contaminated, often without visible signs. Unlike other microbial infections, Mycoplasma does not produce cloudiness, odor, or a change in pH. Many researchers won’t detect it unless they specifically test for it.

The consequences, however, are profound. Mycoplasma can significantly alter:

Host gene expression patterns
Cell proliferation rates
Epigenetic profiles and chromatin accessibility
Cytokine signaling and immune responses

In short, it can skew your results, compromise your biological conclusions, and invalidate weeks or months of research.

A Simple Diagnostic Step: Map Against Mycoplasma Genomes

If you encounter poor alignment rates to the human genome, consider mapping your reads to a Mycoplasma reference genome—or better yet, use a combined human + Mycoplasma reference. There have been cases where over half of all reads, initially assumed to be from human cells, were in fact bacterial in origin. This check is fast, easy, and could save your project.
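One way to run this check, sketched here under stated assumptions: after aligning to a combined reference, per-contig mapped-read counts (for example the tab-separated output of `samtools idxstats`: contig name, length, mapped, unmapped) can be summed by organism. The function name and the `myco_` contig prefix are hypothetical; the prefix is whatever naming you chose when building the combined reference.

```python
def contamination_fraction(idxstats_text, contaminant_prefix="myco_"):
    """Fraction of mapped reads assigned to contaminant contigs.

    Expects tab-separated lines: name, length, mapped, unmapped.
    The contig prefix is an assumption about how the combined
    reference was built, not a convention of any tool.
    """
    host = 0
    contaminant = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary line for reads with no contig
            continue
        if name.startswith(contaminant_prefix):
            contaminant += int(mapped)
        else:
            host += int(mapped)
    total = host + contaminant
    return contaminant / total if total else 0.0


# Made-up counts for illustration only.
example = "\n".join([
    "chr1\t248956422\t300\t0",
    "chr2\t242193529\t100\t0",
    "myco_genome1\t800000\t600\t0",
    "*\t0\t0\t50",
])
print(f"{contamination_fraction(example):.1%} of mapped reads are contaminant")
```

A fraction well above zero is a prompt to run a dedicated Mycoplasma test, not a diagnosis in itself, since a small number of reads can map spuriously to bacterial contigs.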

How Contamination Happens—and Persists

Mycoplasma is small (0.1–0.3 μm), lacks a cell wall, and can pass through standard filters undetected. Common sources include:

Contaminated reagents (e.g., FBS)
Infected cell lines obtained from other labs
Poor aseptic technique or shared equipment

Once present, it spreads quickly between cultures and can persist for months, silently affecting results.

Why Treatment Is Difficult

While antibiotics such as Plasmocin or BM-Cyclin are sometimes used, they often offer only partial resolution and may themselves alter cell behavior. In many cases, the best course of action is to discard the contaminated culture and start with a fresh, verified stock.

Practical Recommendations for Researchers

Routinely test for Mycoplasma using PCR, qPCR, or fluorescence-based assays
Incorporate contamination screens into your sequencing QC pipeline
Use combined reference genomes when mapping ambiguous reads
Practice strict aseptic technique and monitor all incoming cell lines
Don’t ignore unexplained data anomalies—they might point to contamination

Closing Thought: Contamination Is a Biological Variable

It’s easy to view poor mapping as a technical issue, but sometimes the problem lies deeper—in the biology itself. Mycoplasma contamination doesn’t just interfere with sequencing; it interferes with science. As a research community, we must treat contamination not as an afterthought, but as a key variable to control.

So next time your reads won’t align, don’t just tune the aligner. Ask if your cells are telling the truth—or if they're hiding something.

Gupta Lab

Sat, 29 Dec 2018 13:18:31 -0600

Work includes: (i) understanding the evolutionary relationships among different prokaryotic and eukaryotic organisms; (ii) understanding the cellular functions of lineage-specific signature proteins, as well as lineage-specific conserved inserts and deletions in important housekeeping proteins, through genetic and biochemical studies; (iii) development of novel diagnostic methods (PCR-based and immunological) for identifying different groups of organisms based on these signature proteins and conserved indels; (iv) use of these lineage-specific probes, which have predictive ability, to identify and explore the presence of different groups of organisms in metagenomic sequences from various environments.

https://fhs.mcmaster.ca/gupta-lab/index.html

Rules for Pango Lineage!

Abhi — Tue, 14 Dec 2021 04:40:26 -0600

All the rules for classifying a lineage.

https://www.pango.network/the-pango-nomenclature-system/statement-of-nomenclature-rules/

Address of the bookmark: https://www.pango.network/the-pango-nomenclature-system/statement-of-nomenclature-rules/

PureCN: copy number calling and SNV classification using targeted short read sequencing

Jit — Thu, 09 Aug 2018 04:09:37 -0500

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Author: Markus Riester [aut, cre], Angad P. Singh [aut]

Maintainer: Markus Riester

Citation (from within R, enter citation("PureCN")):

Riester M, Singh A, Brannon A, Yu K, Campbell C, Chiang D, Morrissey M (2016). “PureCN: Copy number calling and SNV classification using targeted short read sequencing.” Source Code for Biology and Medicine, 11, 13. doi: 10.1186/s13029-016-0060-z.

Address of the bookmark: http://bioconductor.org/packages/release/bioc/html/PureCN.html

Metabuli improves metagenomic read classification

Abhi — Sat, 03 Jun 2023 20:15:04 -0500

Metabuli improves metagenomic read classification through metamers (DNA-AA k-mers), making it both sensitive and specific, recovering 99% and 98% of DNA or AA classifiers.

Metabuli is a metagenomic classifier that jointly analyzes both DNA and amino acid (AA) sequences. DNA-based classifiers can make specific classifications, exploiting point mutations to distinguish close taxa. AA-based classifiers have higher sensitivity in detecting homology between query and reference sequences, leveraging the higher conservation of AA sequences. Metabuli combines the information of both sequence types using a novel k-mer structure, the metamer, to enable both specific and sensitive characterization of metagenomic samples. In addition, it can classify reads against a database of any size, as long as the database fits on the hard disk.

Address of the bookmark: https://github.com/steineggerlab/Metabuli

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.
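To make the estimation step concrete, here is a minimal sketch, not MashMap's code: it estimates Jaccard similarity from bottom-s MinHash sketches and converts it to an identity estimate with the Mash distance, D = -(1/k) ln(2j / (1 + j)). It deliberately omits winnowing and canonical (reverse-complement) k-mers; the function names and the values of k and s are assumptions for illustration.

```python
import hashlib
import math


def kmers(seq, k):
    """Set of all k-mers in seq (forward strand only, a simplification)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest k-mer hashes."""
    return sorted(hash64(km) for km in kmers(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_identity(j, k):
    """Convert a Jaccard estimate to identity via the Mash distance."""
    if j <= 0:
        return 0.0
    distance = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - distance)


# Toy sequences; real use would involve much longer reads and larger k and s.
query = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATTACAGGCT"
target = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATTACAGGCT"
k, s = 5, 10
j = jaccard_estimate(sketch(query, k, s), sketch(target, k, s), s)
print(j, mash_identity(j, k))
```

A mapper built on this idea would report a query-to-region match only when the estimated identity clears the user's threshold, which is why no explicit alignment is needed.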

Address of the bookmark: https://github.com/marbl/MashMap

HISAT2: a fast and sensitive alignment program for mapping next-generation sequencing reads

Rahul Nayak — Tue, 08 May 2018 04:27:22 -0500

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).

more at https://ccb.jhu.edu/software/hisat2/index.shtml

Address of the bookmark: https://github.com/infphilo/hisat2

gSearch: a fast and flexible general search tool for whole-genome sequencing

Jit — Mon, 06 Aug 2018 17:19:15 -0500

gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are subsequently optimized and implemented in a multi-threaded manner.

Address of the bookmark: http://ml.ssu.ac.kr/gSearch/index.html

ANItools web: a web tool for fast genome comparison within multiple bacterial strains

Jit — Wed, 14 Nov 2018 04:34:23 -0600

ANItools is a software package written in Perl that runs on Linux/Unix systems. To compare bacterial genomes and calculate their average nucleotide identity (ANI), you can download and run the program directly, or send us the genome sequences by email and we will do the analysis for you.

https://academic.oup.com/database/article/doi/10.1093/database/baw084/2630454

Address of the bookmark: http://ani.mypathogen.cn/