BOL: Related items

Wtdbg2: a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore

Neel — Fri, 19 Oct 2018 08:48:43 -0500

Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human genome, and even the 32Gb Axolotl genome, at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy.

Address of the bookmark: https://github.com/ruanjue/wtdbg2

Linux Sort Commands for Bioinformatics

Rahul Nayak — Sat, 31 May 2014 15:41:16 -0500

Almost all scripting languages, such as Perl and Python, have a built-in sort, but none of them is as flexible as the sort command. When it comes to space efficiency, GNU sort stands at the top: it can sort a 20Gb file with less than 2Gb of memory. It is not trivial to implement such a powerful sort yourself.

sort a space-delimited file based on its first column, then the second if the first is the same, and so on:
sort input.txt

sort a huge file (GNU sort ONLY):
sort -S 1500M -T $HOME/tmp input.txt > sorted.txt

sort starting from the third column, skipping the first two columns:
sort -k3 input.txt

sort the second column as numbers, descending order; if identical, sort the 3rd as strings, ascending order:
sort -k2,2nr -k3,3 input.txt

sort starting from the 4th character at column 2, as numbers:
sort -k2.4n input.txt
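
As a rough worked example of the two key-based commands above, the sketch below runs them on small made-up tables (the file contents and the gene and chromosome names are invented for illustration). It passes -t ' ' in the second command so that character offsets are counted from the first character of the field rather than from the preceding blank, and it ends the key at column 2 (-k2.4,2n) instead of letting it run to the end of the line; both are choices made here, not part of the original commands. LC_ALL=C is set only to make string comparisons locale-independent.

```shell
# Hypothetical sample data: name, count, chromosome (space-delimited).
genes=$(mktemp)
cat > "$genes" <<'EOF'
geneA 10 chr2
geneB 200 chr1
geneC 10 chr1
geneD 9 chrX
EOF

# Column 2 as numbers, descending; ties broken by column 3 as strings, ascending.
by_count=$(LC_ALL=C sort -k2,2nr -k3,3 "$genes")
printf '%s\n' "$by_count"
# geneB 200 chr1
# geneC 10 chr1
# geneA 10 chr2
# geneD 9 chrX

# Hypothetical sample data: label, chromosome name.
chroms=$(mktemp)
cat > "$chroms" <<'EOF'
x chr10
y chr2
z chr1
EOF

# Sort on the number that starts at the 4th character of column 2 ("chr" is skipped).
by_chrom=$(LC_ALL=C sort -t ' ' -k2.4,2n "$chroms")
printf '%s\n' "$by_chrom"
# z chr1
# y chr2
# x chr10

rm -f "$genes" "$chroms"
```

Without the numeric key, chr10 would sort before chr2 as a plain string, which is a common surprise when ordering chromosome names.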

More Linux sort command information

If you have any sort commands you'd like to share, please add them to our comments section below. For more help, you can also type:

man sort

or

sort --help

on your Unix/Linux system.

pbmm2: A minimap2 frontend for PacBio native data formats

BioStar — Tue, 18 Feb 2020 03:36:22 -0600

pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio in- and output, provide sets of recommended parameters, generate sorted output on-the-fly, and postprocess alignments. Sorted output can be used directly for polishing using GenomicConsensus, if BAM has been used as input to pbmm2. Benchmarks show that pbmm2 outperforms BLASR in sequence identity, number of mapped bases, and especially runtime. pbmm2 is the official replacement for BLASR.

Address of the bookmark: https://github.com/PacificBiosciences/pbmm2

LRCstats: Long Read Correction Statistics

Jit — Fri, 05 Jan 2018 04:04:20 -0600

LRCstats is an open-source pipeline for benchmarking DNA long read correction algorithms for long reads outputted by third generation sequencing technology such as machines produced by Pacific Biosciences. The reads produced by third generation sequencing technology, as the name suggests, are longer in length than reads produced by next generation sequencing technologies, such as those produced by Illumina. However, long reads are plagued by high error rates, which can cause issues in downstream analysis. Long read correction algorithms reduce the error rate of long reads either through self-correcting methods or using accurate, short reads outputted by next generation sequencing technologies to correct long reads.

Of course, some long read correction algorithms are better than others, and developers will want to compare their algorithm with the others currently available. LRCstats benchmarks these algorithms using long reads produced by simulators (such as SimLoRD or PBSim), which provide the two-way alignments between the uncorrected long reads (uLR) and the corresponding sequences in the reference genome (Ref) in an alignment file. It then aligns the corrected long reads (cLR) to the Ref-uLR two-way alignments with a dynamic programming algorithm to create three-way alignments. Statistics on these three-way alignments, such as the overall error rates of the corrected long reads, are then collected.

https://www.healthcare.uiowa.edu/labs/au/LSC/

Address of the bookmark: https://github.com/cchauve/lrcstats

Postdoc position at Centre Méditerranéen de Médecine Moléculaire - Nice - France

Wed, 04 Jun 2014 07:20:57 -0500

The research group of Dr. Michele Trabucchi at the Centre Méditerranéen de Médecine Moléculaire (C3M) at INSERM U1065 (University of Nice Sophia-Antipolis, France) is seeking candidates for a Postdoctoral fellow position starting in October 2014 for 3 years, funded by FRM (Fondation pour la Recherche Médicale).
The broad interest of the lab is in understanding the expression control and function of small RNAs in activated myeloid cells (visit our webpage to check research interests and publications of the group : http://www.unice.fr/c3m/EN/Equipe10.html ).

The work will focus on the functional studies of small RNAs by using next-generation sequencing approaches.

Candidates should hold a Ph.D. degree and have strong background in bioinformatics.
The University of Nice Sophia-Antipolis provides a wide range of facilities and training essential for biomedical research.

Interested applicants should send a PDF with a cover letter stating research interests and qualifications, an updated CV, a summary of previous research experience and contact information for two references to Michele Trabucchi ( mtrabucchi@unice.fr )

Homepage: http://www.unice.fr/c3m/EN/Equipe10.html

rHAT: a seed-and-extension-based noisy long read alignment tool

Abhimanyu Singh — Sun, 23 Sep 2018 05:12:22 -0500

rHAT is a seed-and-extension-based noisy long read alignment tool. It is suitable for aligning third generation sequencing reads, which are long and have a relatively high error rate, especially PacBio's Single Molecule Real-Time (SMRT) sequencing reads.

Address of the bookmark: https://github.com/dfguan/rHAT

Ten recommendations for creating usable bioinformatics command line software

RAJESH DETROJA — Sun, 08 Jun 2014 10:06:26 -0500

Bioinformatics software varies greatly in quality. In terms of usability, the command line interface is the first experience a user will have of a tool. Unfortunately, this is often also the last time a tool will be used. Here I present ten recommendations for authors of command line software to follow, which I believe would greatly improve the uptake and usability of their products, waste less of users' time, and improve the quality of scientific analyses.

Address of the bookmark: http://www.gigasciencejournal.com/content/2/1/15?utm_content=buffer25ee0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

R 3.5.0 has been Released!

Jit — Thu, 26 Apr 2018 11:31:58 -0500

The latest version of R is a major release! It comes with a ton of new features, including performance and speed improvements.
All R packages will now be byte-compiled, which should speed up packages installed from GitHub.
You may need to re-install all previously installed R packages; old scripts, however, will continue to work normally.

More at https://cran.r-project.org/doc/manuals/r-release/NEWS.html

Bioinformatics algorithms tutorials

John Parker — Tue, 24 Jun 2014 00:10:45 -0500

Useful bioinformatics tutorials, such as:

De Bruijn Graphs for NGS Assembly
Algorithms for PacBio Reads
Software and Hardware Concepts for Bioinformatics
Finding us in Homolog.us (Search Algorithms)
NGS Genome and RNAseq Assembly - a Hands on Primer
Introduction to PERL, Python, R and C/C++ for Bioinformatics

Address of the bookmark: http://www.homolog.us/Tutorials/

Genome Simulation with SLiM and msprime

BioStar — Fri, 31 Jan 2025 12:47:43 -0600

Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are SLiM and msprime. While both serve different purposes, they can be used together with the slendr framework to compare simulation outputs effectively.

Overview of SLiM and msprime

SLiM: Forward Genetic Simulator

SLiM is a free, open-source tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.

Key Features of SLiM:

Simulates population evolution forward in time
Supports custom evolutionary models using an embedded scripting language
Allows modeling of spatial and ecological dynamics
Provides high flexibility and extensibility for user-defined scenarios
Available on GitHub as an open-source project

msprime: Ancestry and Mutation Simulator

msprime is an efficient, open-source tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.

Key Features of msprime:

Implements coalescent simulations for ancestry modeling
Efficiently simulates large population histories
Supports the addition of mutations to genealogies
Developed using an open-source community model
Often faster and more memory-efficient than alternative simulators

Using SLiM and msprime with slendr

Both SLiM and msprime can be integrated with slendr, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.

How They Work Together:

SLiM and msprime simulations can be analyzed within slendr.
The ts_read() function in slendr enables loading and comparing tree sequence outputs from both simulators.
This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.

Performance Considerations

While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its speed and memory efficiency when simulating ancestry and mutations. The choice between the two depends on the research goals:

For detailed evolutionary modeling with selection and recombination: Use SLiM.
For large-scale coalescent simulations with mutations: Use msprime.
For comparing different simulation models and their outputs: Use slendr to integrate SLiM and msprime results.

Conclusion

SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.

For more information, check out the official GitHub repositories for SLiM and msprime, and explore the slendr framework for streamlined simulation workflows.