BOL: Related items

Tools to assess the quality of your assembled genome!

LEGE — Thu, 08 Aug 2024 23:31:18 -0500

FASTA VALIDATOR + SEQKIT RMDUP: FASTA validation
GENOMETOOLS GT GFF3VALIDATOR: GFF3 validation
ASSEMBLATHON STATS: Assembly statistics
GENOMETOOLS GT STAT: Annotation statistics
NCBI FCS ADAPTOR: Adaptor contamination pass/fail
NCBI FCS GX: Foreign organism contamination pass/fail
BUSCO: Gene-space completeness estimation
TIDK: Telomere repeat identification
LAI: Continuity of repetitive sequences
KRAKEN2: Taxonomy classification
HIC CONTACT MAP: Alignment and visualisation of HiC data
MUMMER → CIRCOS + DOTPLOT & MINIMAP2 → PLOTSR: Synteny analysis
MERQURY: K-mer completeness, consensus quality and phasing assessment

Early Genome Screening: The New Health Horoscope!

LEGE — Thu, 02 Jan 2025 19:44:36 -0600

In an era where precision medicine is reshaping healthcare, genome screening is emerging as the modern equivalent of a health horoscope. It offers insights into our biological "stars," unraveling predispositions to various conditions and empowering individuals with knowledge to navigate their health journeys proactively. But how reliable is this "horoscope," and how does it impact our lives?

Understanding Genome Screening

Genome screening involves analyzing an individual's DNA to identify genetic variations that may influence health and disease susceptibility. This can range from simple single-gene tests to comprehensive whole-genome sequencing. By peering into our genetic blueprint, we can uncover risks for conditions like cancer, diabetes, cardiovascular diseases, and even rare genetic disorders.

The process is straightforward: a saliva or blood sample is collected, and advanced sequencing technologies decipher the genetic code. The results provide a personalized health map, guiding lifestyle modifications, preventive measures, or medical interventions.

A Shift from Reactive to Proactive Healthcare

Traditional healthcare often focuses on treating diseases after they manifest. Genome screening flips this model on its head, enabling a shift toward prevention and early intervention. For instance:

Cancer Risk Management: Individuals with BRCA1 or BRCA2 gene mutations can opt for enhanced screening programs or preventive surgeries to mitigate their risk of breast and ovarian cancers.
Cardiovascular Health: Genetic predispositions to conditions like familial hypercholesterolemia can prompt early cholesterol monitoring and lifestyle adjustments.
Rare Diseases: Identifying carriers of genetic disorders can aid in family planning and reduce the incidence of inherited conditions.

The Ethical and Practical Concerns

While genome screening offers incredible promise, it is not without challenges:

Accuracy and Interpretation: Genetic predisposition does not guarantee disease. Misinterpretation of results can lead to unnecessary anxiety or unwarranted medical interventions.
Privacy and Data Security: Genetic data is highly sensitive. Ensuring robust data protection measures is crucial to prevent misuse.
Accessibility and Equity: High costs and limited availability may restrict access to genome screening, exacerbating health disparities.

Balancing Science and Pseudoscience

The comparison of genome screening to horoscopes isn’t entirely unfounded. Both offer predictive insights, but the scientific foundation of genome screening distinguishes it from astrology. Unlike the alignment of celestial bodies, genetic predictions are based on rigorous data and evidence. However, the probabilistic nature of genetic predispositions underscores the importance of interpreting results in conjunction with clinical and lifestyle factors.

The Road Ahead

As genome screening becomes more affordable and integrated into routine healthcare, its potential to transform lives is immense. Policymakers, healthcare providers, and genetic counselors must collaborate to ensure ethical implementation, public awareness, and equitable access.

Imagine a future where your genetic "horoscope" is a trusted guide, not just a prediction. Early genome screening could help chart a healthier path for generations, making it a cornerstone of personalized medicine. After all, our genes might just hold the key to unlocking a future of better health and well-being.

HiTE: a fast and accurate dynamic boundary adjustment approach for full-length Transposable Elements detection and annotation in Genome Assemblies

LEGE — Sat, 20 Sep 2025 09:34:04 -0500

HiTE is a Python tool that uses a dynamic boundary adjustment approach to detect and annotate full-length Transposable Elements (TEs) in genome assemblies. Compared with other tools, HiTE detects a greater number of full-length TEs.

panHiTE

We have developed panHiTE, a comprehensive and accurate pipeline for TE detection in large-scale population genomes. It has been successfully applied to hundreds of plant population genomes, demonstrating its effectiveness and scalability.

For detailed instructions, please refer to the panHiTE tutorial.

Address of the bookmark: https://github.com/CSU-KangHu/HiTE

IONiseR: tools for the quality assessment of data produced by Oxford Nanopore’s MinION sequencer

Jit — Thu, 23 Nov 2017 10:24:19 -0600

This package is intended to provide tools for the quality assessment of data produced by Oxford Nanopore’s MinION sequencer. It includes functions to generate a number of plots for examining the statistics that we think will be useful for this task.

However, nanopore sequencing is an emerging and rapidly developing technology. It is not clear what will be most informative. We hope that IONiseR will provide a framework for visualisation of metrics that we haven’t thought of, and welcome feedback at mike.smith@embl.de.

If you’re not interested in the quality assessment of the raw or event-level data, and want to jump straight to getting FASTQ format files from fast5 files, you can go straight to the final section of this document.

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/IONiseR/inst/doc/IONiseR.html

Deepbinner: a signal-level demultiplexer for Oxford Nanopore reads

Neel — Tue, 27 Nov 2018 03:38:49 -0600

Deepbinner is a tool for demultiplexing barcoded Oxford Nanopore sequencing reads. It does this with a deep convolutional neural network classifier, using many of the architectural advances that have proven successful in image classification. Unlike other demultiplexers (e.g. Albacore and Porechop), Deepbinner identifies barcodes from the raw signal (a.k.a. squiggle), which gives it greater sensitivity and fewer unclassified reads.

Reasons to use Deepbinner:
- To minimise the number of unclassified reads (use Deepbinner by itself).
- To minimise the number of misclassified reads (use Deepbinner in conjunction with Albacore demultiplexing).
- You plan on running signal-level downstream analyses, like Nanopolish. Deepbinner can demultiplex the fast5 files, which makes this easier.
Reasons to not use Deepbinner:
- You only have basecalled reads not the raw fast5 files (which Deepbinner requires).
- You have a small/slow computer. Deepbinner is more computationally intensive than Porechop.
- You used a sequencing/barcoding kit other than the ones Deepbinner was trained on.
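
One way to read "in conjunction with Albacore" is to keep a barcode call only when both tools agree on it, which trades more unclassified reads for fewer misclassified ones. The sketch below illustrates that idea only. The input dictionaries (read ID to barcode label) and the "none" label are hypothetical and do not reflect either tool's actual output format.

```python
def consensus_bins(deepbinner_calls, albacore_calls, unclassified="none"):
    """Keep a barcode only when both demultiplexers assign the same one.

    Both arguments are hypothetical dicts mapping read ID -> barcode label.
    Reads missing from albacore_calls are treated as unclassified.
    """
    merged = {}
    for read_id, deepbinner_bin in deepbinner_calls.items():
        albacore_bin = albacore_calls.get(read_id, unclassified)
        # Any disagreement sends the read to the unclassified bin.
        if deepbinner_bin == albacore_bin:
            merged[read_id] = deepbinner_bin
        else:
            merged[read_id] = unclassified
    return merged


if __name__ == "__main__":
    deepbinner = {"read1": "01", "read2": "02", "read3": "03"}
    albacore = {"read1": "01", "read2": "05"}
    for read_id, label in consensus_bins(deepbinner, albacore).items():
        print(read_id, label)
```

Only read1 keeps its barcode here, because it is the only read on which both sets of calls agree.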

Address of the bookmark: https://github.com/rrwick/Deepbinner

The time has come to revolutionize amino acid sequencing with Nanopore technology

Rahul Agarwal — Mon, 07 Apr 2014 08:01:11 -0500

Amino acid sequencing by Nanopore recognition tunneling method

Address of the bookmark: http://www.eurekalert.org/multimedia/pub/71198.php

LoRDEC: a hybrid error correction program for long, PacBio reads

Jit — Mon, 10 Apr 2017 04:16:09 -0500

LoRDEC is a program to correct sequencing errors in long reads from 3rd generation sequencing with high error rate, and is especially intended for PacBio reads. It uses a hybrid strategy, meaning that it uses two sets of reads: the reference read set, whose error rate is assumed to be small, and the PacBio read set, which is then corrected using the reference set. Typically, the reference set contains Illumina reads.

Usually, errors in PacBio reads include many insertions and deletions, and comparatively fewer substitutions. LoRDEC can correct errors of all these types.
After correction, a larger portion of the sequence of PacBio reads is usable for detecting regions of similarity with other sequences, for aligning them to the contigs of an assembly, etc.

Why is LoRDEC different?

It is efficient and can process large read data sets, including those from eukaryotic or vertebrate species, on a standard computing server, and even works on desktop/laptop computers.
It adopts a novel graph-based approach: it builds a succinct De Bruijn Graph (DBG) representing the short reads, and seeks a corrective sequence for each erroneous region of a long read by traversing chosen paths in the graph.
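
The graph idea can be illustrated with a toy sketch. This is not LoRDEC's implementation: it stores k-mers in a plain Python set rather than a succinct structure, and it picks the bridging path closest in length to the weak region, whereas a real corrector would compare candidate paths to the read more carefully. All function names and parameters (solid_kmers, find_paths, correct_read, slack) are made up for illustration.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=1):
    """Collect k-mers seen at least min_count times in the short reads."""
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def successors(kmer, kmers):
    """Neighbours of a k-mer in the implicit de Bruijn graph."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmers]


def find_paths(source, target, kmers, max_steps):
    """Spell every path from source to target that uses at most max_steps edges."""
    paths = []
    stack = [(source, source)]
    while stack:
        node, seq = stack.pop()
        steps = len(seq) - len(source)
        if node == target and steps > 0:
            paths.append(seq)
            continue
        if steps < max_steps:
            for nxt in successors(node, kmers):
                stack.append((nxt, seq + nxt[-1]))
    return paths


def correct_read(long_read, kmers, k, slack=2):
    """Replace weak stretches between two solid k-mers with a graph path."""
    solid = [i for i in range(len(long_read) - k + 1) if long_read[i:i + k] in kmers]
    corrected = long_read
    # Work right to left so that earlier coordinates stay valid.
    for left, right in reversed(list(zip(solid, solid[1:]))):
        if right - left == 1:
            continue  # adjacent solid k-mers, nothing to bridge
        paths = find_paths(
            long_read[left:left + k], long_read[right:right + k],
            kmers, (right - left) + slack,
        )
        if paths:
            # Simplification: prefer the path closest in length to the region.
            best = min(paths, key=lambda p: abs(len(p) - (right - left + k)))
            corrected = corrected[:left] + best + corrected[right + k:]
    return corrected


if __name__ == "__main__":
    short_reads = ["ACGTTGCA", "TTGCATGG", "CATGGAC"]
    kmers = solid_kmers(short_reads, k=4)
    print(correct_read("ACGTTGAATGGAC", kmers, k=4))
```

In this toy input the long read carries a single substitution, and the k-mers on either side of it anchor a unique path through the graph built from the short reads.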

Address of the bookmark: http://www.atgc-montpellier.fr/lordec/

Hercules: a profile HMM-based hybrid error correction algorithm for long reads

Jit — Mon, 20 Aug 2018 14:14:11 -0500

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several applications require long and accurate reads, including de novo assembly and fusion and structural variation detection. In such cases researchers often combine both technologies, and the more erroneous long reads are corrected using the short reads. Current approaches rely on various graph-based alignment techniques and do not take the error profile of the underlying technology into account. Memory- and time-efficient machine learning algorithms that address these shortcomings have the potential to achieve better and more accurate integration of these two technologies.

Results: We designed and developed Hercules, the first machine learning-based long read error correction algorithm. The algorithm models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read and uses this to correct errors in these reads. Using datasets from two DNA-seq BAC clones (CH17-157L1 and CH17-227A2), and human brain cerebellum polyA RNA-seq, we show that Hercules-corrected reads have the highest mapping rate among all competing algorithms and the highest accuracy when most of the basepairs of a long read are covered with short reads.

Availability: Hercules source code is available at https://github.com/BilkentCompGen/Hercules

Address of the bookmark: https://github.com/BilkentCompGen/Hercules

pbmm2: A minimap2 frontend for PacBio native data formats

BioStar — Tue, 18 Feb 2020 03:36:22 -0600

pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio input and output, provide sets of recommended parameters, generate sorted output on-the-fly, and postprocess alignments. Sorted output can be used directly for polishing with GenomicConsensus, provided that BAM was used as input to pbmm2. Benchmarks show that pbmm2 outperforms BLASR in sequence identity, number of mapped bases, and especially runtime. pbmm2 is the official replacement for BLASR.

Address of the bookmark: https://github.com/PacificBiosciences/pbmm2

Understanding PacBio read names!

BioJoker — Fri, 23 Nov 2018 07:36:46 -0600

m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230 0.99 24
└1┘└─────2─────┘└──3─┘└────────────────4────────────────┘└5┘└6┘└7┘└────8────┘└─9─┘└10┘

1. "m" = movie
2. Time of Run Start (yymmdd_hhmmss)
3. Instrument Serial Number
4. SMRT Cell Barcode
5. Set Number (a.k.a. "Look Number". Deprecated field, used in earlier version of RS)
6. Part Number (usually "p0", "X0" when using expired reagents)
7. ZMW hole number
8. Subread Region (start_stop using polymerase read coordinates)
9. readScore
10. barcodeScore
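
The fields described above can be pulled apart with a few string splits. The sketch below is a minimal parser and assumes the name follows exactly the layout shown in the example line. The class and function names (SubreadRecord, parse_subread_line) are made up for illustration, and the two trailing scores are treated as optional floats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SubreadRecord:
    run_start: datetime
    instrument_serial: str
    smrt_cell_barcode: str
    set_number: str
    part_number: str
    zmw_hole: int
    subread_start: int
    subread_stop: int
    read_score: Optional[float] = None
    barcode_score: Optional[float] = None


def parse_subread_line(line: str) -> SubreadRecord:
    """Parse '<movie>/<zmw>/<start>_<stop> [readScore [barcodeScore]]'."""
    fields = line.split()
    movie, zmw, region = fields[0].split("/")
    # The movie name carries six underscore-separated parts; the first one
    # is the literal "m" followed by the run date (yymmdd).
    date_part, time_part, serial, cell, set_number, part_number = movie.split("_")
    if not date_part.startswith("m"):
        raise ValueError(f"movie name should start with 'm': {movie!r}")
    start, stop = region.split("_")
    return SubreadRecord(
        run_start=datetime.strptime(date_part[1:] + time_part, "%y%m%d%H%M%S"),
        instrument_serial=serial,
        smrt_cell_barcode=cell,
        set_number=set_number,
        part_number=part_number,
        zmw_hole=int(zmw),
        subread_start=int(start),
        subread_stop=int(stop),
        read_score=float(fields[1]) if len(fields) > 1 else None,
        barcode_score=float(fields[2]) if len(fields) > 2 else None,
    )


if __name__ == "__main__":
    example = (
        "m140415_143853_42175_c100635972550000001823121909121417_s1_p0"
        "/553/3100_11230 0.99 24"
    )
    print(parse_subread_line(example))
```

Names that do not follow this layout, such as those produced by later instruments, would need their own parsing rules.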