BOL: Related items

Flye: Fast and accurate de novo assembler for single molecule sequencing reads

Jit — Fri, 04 May 2018 19:16:22 -0500

Flye is a de novo assembler for long and noisy reads, such as those produced by PacBio and Oxford Nanopore Technologies. The algorithm uses an A-Bruijn graph to find the overlaps between reads and does not require them to be error-corrected. After the initial assembly, Flye performs an extra repeat classification and analysis step to improve the structural accuracy of the resulting sequence. The package also includes a polisher module, which produces the final assembly of high nucleotide-level quality.

Address of the bookmark: https://github.com/fenderglass/Flye

HALC: High throughput algorithm for long read error correction

Jit — Fri, 08 Jun 2018 10:47:41 -0500

HALC is a high-throughput algorithm for long-read error correction. It aligns the long reads to short-read contigs from the same species with a relatively low identity requirement, so that a long-read region can be aligned to at least one contig region, including any repeats of its true genome region in the contigs that are sufficiently similar to it (a similar-repeat-based alignment approach). HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC-corrected long reads can thus result in 11.4-60.7% longer assembled contigs than the existing algorithms.

Address of the bookmark: https://github.com/lanl001/halc

The time has come to revolutionize amino acid sequencing with Nanopore technology

Rahul Agarwal — Mon, 07 Apr 2014 08:01:11 -0500

Amino acid sequencing by the Nanopore recognition tunneling method.

Address of the bookmark: http://www.eurekalert.org/multimedia/pub/71198.php

1 Mb-long DNA read with Nanopore technology

Jit — Tue, 19 Dec 2017 18:49:28 -0600

The first continuous DNA read of more than a million bases (>1Mb) has been achieved, using Oxford Nanopore sequencing technology. Congratulations to Martin Smith and collaborators! Read more: http://bit.ly/2j5TNCO

nanofilt: Filtering and trimming of long read sequencing data

Jit — Mon, 30 Jul 2018 12:01:52 -0500

Filtering on quality and/or read length, and optional trimming after passing filters.
Reads from stdin, writes to stdout.

Intended to be used:

directly after fastq extraction
prior to mapping
in a stream between extraction and mapping
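
The filtering idea described above can be sketched in a few lines of Python. This is only an illustration of stdin-to-stdout filtering on read length and mean quality, not nanofilt's own code: the function names, the default thresholds (`min_quality=7.0`, `min_length=500`) and the Phred+33 offset are assumptions made for the example.

```python
import math
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines, "")
        next(lines, "")  # skip the '+' separator line
        qual = next(lines, "")
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged over error probabilities rather than raw scores."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record],
                 min_quality: float = 7.0,
                 min_length: int = 500) -> Iterator[Record]:
    """Keep reads that pass both the length and the mean-quality threshold."""
    for header, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual) >= min_quality:
            yield header, seq, qual


def main(stdin: TextIO, stdout: TextIO,
         min_quality: float = 7.0, min_length: int = 500) -> None:
    """Stream FASTQ from stdin to stdout, dropping reads that fail the filters."""
    for header, seq, qual in filter_reads(read_fastq(stdin), min_quality, min_length):
        stdout.write(f"{header}\n{seq}\n+\n{qual}\n")
```

Calling `main(sys.stdin, sys.stdout)` (after `import sys`) at the bottom of a script would let it sit in a pipe between extraction and mapping, which is the usage pattern listed above.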


Address of the bookmark: https://github.com/wdecoster/nanofilt

Interview Mark Sansom (U. Oxford): Simulations of Membrane Proteins

Mon, 07 Oct 2013 14:34:13 -0500

Workshop in Bioinformatics, 4 June 2012. Campus Vida's research centers organized the Workshop in Bioinformatics in Santiago de Compostela. The event addressed issues such as structural bioinformatics, biological modelling and mining bioinformatics data. Professor Mark Sansom (University of Oxford), of the Structural Bioinformatics and Computational Biochemistry Unit, opened the sessions with the lecture "Multiscale Simulations of Membrane Proteins: Lipid Interactions and Signalling".

Postdoctoral Fellowship at Department of Psychiatry, Warneford Hospital, Oxford

Tue, 01 Sep 2015 05:24:49 -0500

Applications are invited for a postdoctoral research assistant to work in the Translational Neuroscience and Dementia Research Group (TNDRG) on a project using informatics approaches to understand and prevent dementia, specifically on the role of the immune system in Alzheimer’s. The post is for a fixed-term duration of 1 year.

Working with other members of the TNDRG you will analyse complex genomic and epidemiological datasets, evaluating which computational tools are most suitable. You will contribute to the generation of innovative tools for linking epidemiological and multilevel omics datasets, ensuring that computer programs are written in a form that other collaborators can use and expand.

You will have, or be close to completing, a PhD in bioinformatics, neuroscience, machine learning, statistics, epidemiology, neurology or another relevant field. You will have experience programming in R, Matlab, Python, C++, Java or any other imperative, object-oriented or functional language.

Please direct informal enquiries to Dr Alejo Nevado-Holgado (alejo.nevado-holgado@psych.ox.ac.uk).

You will be required to upload a supporting statement explaining how you meet the selection criteria for the post, a CV, and details of two referees as part of your online application.

The closing date for applications is 12.00 midday on 2 September 2015. Interviews will be held on Tuesday 15 September 2015.

https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=118696

Maq: Mapping and Assembly with Quality

Jit — Tue, 22 Nov 2016 04:51:39 -0600

Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. Maq is a project hosted by SourceForge.net. The project page is available at http://sourceforge.net/projects/maq/. Maq was previously known as mapass2.

Run Maq Now

Follow these steps to try Maq. All you need is a reference sequence file in the FASTA format.

Prepare a reference sequence (ref.fasta), preferably a bacterial genome.
Download maq, maq-data and maqview at the download page.
Copy maq, maq.pl and maq_eval.pl to the $PATH or to the same directory.
Simulate diploid reference and read sequences, map reads, call variants and evaluate the results in one go:
```
maq.pl demo ref.fasta calib-30.dat
```
where calib-30.dat is contained in maq-data.

View the alignment:

cd maqdemo/easyrun;
maqindex -i -c consensus.cns all.map;
maqview -c consensus.cns all.map

Even for advanced maq users, running `maq.pl demo` is recommended. You may find something helpful.

Address of the bookmark: http://maq.sourceforge.net

The "Ifs" and "Buts" of NGS Quality Control and Trimming

BioStar — Thu, 02 Jan 2025 20:11:07 -0600

Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.

The "Ifs" of NGS QC and Trimming

Ensures Data Integrity
If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.
Removes Contaminants
If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.
Improves Mapping and Assembly
If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.
Reduces Computational Load
If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.
Prepares for Standardized Analyses
If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.
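
To make the idea of trimming low-quality bases concrete, here is a minimal sketch of a sliding-window quality trim. It is written in the spirit of window-based trimmers and does not reproduce the exact behaviour of any particular tool: the function names, the window size of 4, the mean-quality cutoff of 20 and the Phred+33 encoding are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str,
                        window: int = 4,
                        min_mean: float = 20.0) -> Tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality is too low.

    Reads shorter than the window are returned unchanged.
    """
    scores = phred_scores(qual)
    cut = len(seq)
    for start in range(max(len(scores) - window + 1, 0)):
        win = scores[start:start + window]
        if sum(win) / window < min_mean:
            cut = start  # everything from this window onward is discarded
            break
    return seq[:cut], qual[:cut]
```

For example, `sliding_window_trim("ACGTACGTAC", "IIIIII####")` keeps the first five bases: the window starting at position 5 is the first whose mean drops below 20, so the read is cut there.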

The "Buts" of NGS QC and Trimming

Risk of Over-Trimming
But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.
Bias Introduction
But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.
Loss of Context in Paired-End Reads
But trimming one read in a pair more than the other can lead to loss of pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.
Time and Resource Intensive
But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.
Variable Standards
But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.
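
The paired-end concern above can be illustrated with a small sketch that keeps the two mate files in sync after trimming. It assumes reads are already parsed into `(name, sequence, quality)` tuples and that both inputs list mates in the same order; the function name `split_pairs` and the `min_length=36` default are hypothetical choices for the example.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def split_pairs(mates1: Iterable[Read],
                mates2: Iterable[Read],
                min_length: int = 36) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Separate trimmed read pairs into intact pairs and surviving orphans.

    A pair is kept only if both mates are still long enough; if exactly one
    mate survives it is reported as an orphan instead of silently breaking
    the pairing in the output.
    """
    paired: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    for r1, r2 in zip(mates1, mates2):
        ok1 = len(r1[1]) >= min_length
        ok2 = len(r2[1]) >= min_length
        if ok1 and ok2:
            paired.append((r1, r2))
        elif ok1:
            orphans.append(r1)
        elif ok2:
            orphans.append(r2)
    return paired, orphans
```

Writing `paired` back to two files in the same order preserves the pairing information that tools such as structural variant callers rely on, while `orphans` can be kept separately or discarded.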

Balancing the "Ifs" and "Buts"

To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:

Use QC Tools Wisely: Start with tools like FastQC to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.
Choose Reliable Trimming Tools: Tools like Trimmomatic, Cutadapt, and BBduk offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.
Set Reasonable Parameters: Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.
Test Downstream Effects: Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.
Document Your Workflow: Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.
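
One simple way to check whether the chosen parameters over-trim is to compare read and base counts before and after trimming. The sketch below assumes you already have the read lengths from both stages (with a length of 0 meaning the read was discarded); the function name and the returned keys are illustrative only.

```python
from typing import Dict, Sequence


def retention_report(raw_lengths: Sequence[int],
                     trimmed_lengths: Sequence[int]) -> Dict[str, float]:
    """Return the fraction of reads and of bases that survived trimming."""
    if not raw_lengths or sum(raw_lengths) == 0:
        raise ValueError("raw_lengths must contain at least one non-empty read")
    kept = [n for n in trimmed_lengths if n > 0]  # 0 marks a discarded read
    return {
        "reads_retained": len(kept) / len(raw_lengths),
        "bases_retained": sum(kept) / sum(raw_lengths),
    }
```

A sharp drop in either fraction after tightening a threshold is a hint that the setting may be costing more data than it is worth, and the numbers are easy to store alongside the recorded parameters.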

Conclusion

NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.

Indexcov: fast coverage quality control for whole-genome sequencing

Jit — Wed, 29 Aug 2018 09:20:46 -0500

Indexcov is an efficient estimator of whole-genome sequencing coverage that rapidly identifies samples with aberrant coverage profiles, reveals large-scale chromosomal anomalies, recognizes potential batch effects, and infers the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
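
As a rough illustration of how per-bin coverage estimates can be turned into the checks listed above, the sketch below normalizes bins by the autosomal median, flags chromosomes that deviate from it, and guesses sex from the X chromosome level. It does not reproduce how indexcov derives its estimates: the input dictionary, the function names, the chromosome labels and the thresholds (`tolerance=0.25`, `threshold=0.75`) are all assumptions made for this example.

```python
from statistics import median
from typing import Dict, List


def scaled_coverage(bins: Dict[str, List[float]],
                    autosomes: List[str]) -> Dict[str, List[float]]:
    """Divide every bin by the median autosomal depth, so ~1.0 means two copies."""
    baseline = median(d for chrom in autosomes for d in bins[chrom])
    if baseline == 0:
        raise ValueError("median autosomal depth is zero")
    return {chrom: [d / baseline for d in depths] for chrom, depths in bins.items()}


def aberrant_chromosomes(scaled: Dict[str, List[float]],
                         autosomes: List[str],
                         tolerance: float = 0.25) -> List[str]:
    """List autosomes whose median scaled coverage is far from 1.0."""
    return [c for c in autosomes if abs(median(scaled[c]) - 1.0) > tolerance]


def infer_sex(scaled: Dict[str, List[float]],
              x_name: str = "chrX",
              threshold: float = 0.75) -> str:
    """Guess XX or XY from X coverage: about 1.0 for two copies, about 0.5 for one."""
    return "XX" if median(scaled[x_name]) >= threshold else "XY"
```

With depths of about 30 on most autosomes, about 45 on one chromosome and about 15 on chrX, this would flag the 45x chromosome as a possible gain and report the sample as XY.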

Address of the bookmark: https://github.com/brentp/goleft