BOL: Rahul Nayak's bookmarks

Snakemake Tutorials !

Rahul Nayak — Mon, 09 May 2022 05:20:41 -0500

A lesson introducing the Snakemake workflow system for bioinformatics analysis.

Prerequisites

This is an intermediate lesson and assumes learners have already done some bioinformatics:

Familiarity with the BASH command shell, including concepts like pipes, variables and loops.

Knowledge of bioinformatics fundamentals like the FASTQ file format and transcriptome sequencing, in order to understand the example workflow.

No previous knowledge of Snakemake or workflow systems is required.

https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/index.html

Address of the bookmark: https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/aio/index.html

MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (v4) to detect variants.

Rahul Nayak — Wed, 27 Apr 2022 04:34:12 -0500

MUM&Co is able to detect:
Deletions, insertions, tandem duplications and tandem contractions (>=50bp & <=150kb)
Inversions (>=1kb) and translocations (>=10kb)
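The size windows listed above can be read as a simple filter. The sketch below is not taken from MUM&Co, which is a bash script; it only restates the thresholds from the description in Python. The names SIZE_WINDOWS and is_reportable are made up for illustration, and the limits are assumed to be inclusive, as the >= and <= signs suggest.

```python
# Size windows in base pairs, copied from the MUM&Co description.
# None means the description gives no upper limit.
# SIZE_WINDOWS and is_reportable are hypothetical names, not part of MUM&Co.
SIZE_WINDOWS = {
    "deletion": (50, 150_000),
    "insertion": (50, 150_000),
    "tandem_duplication": (50, 150_000),
    "tandem_contraction": (50, 150_000),
    "inversion": (1_000, None),
    "translocation": (10_000, None),
}


def is_reportable(sv_type: str, length_bp: int) -> bool:
    """Return True if an event of this type and length falls inside its window."""
    low, high = SIZE_WINDOWS[sv_type]
    if length_bp < low:
        return False
    return high is None or length_bp <= high


if __name__ == "__main__":
    for sv_type, length in [("deletion", 49), ("deletion", 50), ("inversion", 2_500)]:
        print(sv_type, length, is_reportable(sv_type, length))
```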

Address of the bookmark: https://github.com/SAMtoBAM/MUMandCo

PuffAligner: a fast, efficient and accurate aligner based on the Pufferfish index

Rahul Nayak — Thu, 21 Apr 2022 05:41:39 -0500

PuffAligner is a fast, accurate and versatile aligner built on top of the Pufferfish index. It produces highly sensitive alignments, similar to those of Bowtie2, but much more quickly. While exhibiting similar speed to the ultrafast STAR aligner, PuffAligner requires considerably less memory to construct its index and align reads. PuffAligner strikes a desirable balance with respect to the time, space and accuracy tradeoffs made by different alignment tools and provides a promising foundation on which to test new alignment ideas over large collections of sequences.

Address of the bookmark: https://github.com/COMBINE-lab/pufferfish/tree/cigar-strings

Understanding HiFi Reads !

Rahul Nayak — Thu, 24 Mar 2022 19:48:11 -0500

While little public data is available for either of the new synthetic long read approaches, Illumina showed an example comparison earlier this year at the Festival of Genomics & Biodata conference (FoG 2022). In the IGV screenshot presented (below), synthetic Infinity reads – labeled “Longas” – are at the top, followed by standard Illumina short reads, and PacBio HiFi reads labeled “CCS” depicted at the bottom:

Address of the bookmark: http://pacb.com/blog/the-hifi-difference-true-long-reads-vs-synthetic-long-reads/

Tiara: deep learning-based classification system for eukaryotic sequences

Rahul Nayak — Mon, 14 Mar 2022 23:02:11 -0500

With a large number of metagenomic datasets becoming available, eukaryotic metagenomics has emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity.

Address of the bookmark: https://academic.oup.com/bioinformatics/article/38/2/344/6375939

kebabs: a package for kernel-based analysis of biological sequences via Support Vector Machine (SVM) based methods

Rahul Nayak — Fri, 04 Mar 2022 00:14:11 -0600

The kebabs package provides functionality for kernel-based analysis of biological sequences via Support Vector Machine (SVM) based methods. Biological sequences include DNA, RNA, and amino acid (AA) sequences. Sequence kernels define similarity measures between sequences. The package implements some of the most important kernels for sequence analysis in a flexible and efficient way, and extends the standard position-independent functionality of these kernels so that the similarity measure can also take the position of patterns in the sequences into account.
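To make "sequence kernels define similarity measures" concrete, here is a minimal Python sketch of a position-independent spectrum kernel: two sequences are compared by the counts of the k-mers they share. This is not the kebabs API (kebabs is an R package); the function names, the default k=3 and the cosine-style normalization are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3, normalized: bool = True) -> float:
    """Dot product of the k-mer count vectors of a and b.

    With normalized=True the value is divided by the norms of both
    count vectors, so identical sequences score 1.0.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has not seen.
    dot = sum(n * counts_b[kmer] for kmer, n in counts_a.items())
    if not normalized:
        return float(dot)
    norm = sqrt(sum(n * n for n in counts_a.values())
                * sum(n * n for n in counts_b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(spectrum_kernel("ACGTACGT", "ACGTACGA"))
    print(spectrum_kernel("AAAA", "CCCC"))
```

The position of a k-mer plays no role here; a position-dependent variant of the kind the description mentions would additionally weight matches by where they occur.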

http://www.bioinf.jku.at/software/kebabs/

http://bioconductor.org/packages/release/bioc/vignettes/kebabs/inst/doc/kebabs.pdf

Address of the bookmark: http://www.bioinf.jku.at/software/kebabs/

SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files

Rahul Nayak — Tue, 01 Mar 2022 03:13:33 -0600

A general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. SeqFu is available for Linux and macOS.

A compiled program delivering high-performance analyses
Supports FASTA/FASTQ files, including gzip-compressed files
A growing collection of handy utilities, including tools for quick inspection of datasets

Can be easily installed via conda:

conda install -c bioconda seqfu
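For readers unsure what interleaving and de-interleaving FASTQ files means, the Python sketch below shows the idea on in-memory lines. It is not how SeqFu is implemented (SeqFu is a compiled program) and it does not use SeqFu's command line; the function names are hypothetical, and each record is assumed to span exactly four lines.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_lines, r2_lines):
    """Alternate records from R1 and R2: read 1 forward, read 1 reverse, read 2 forward, ..."""
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield from rec1
        yield from rec2


def deinterleave(lines):
    """Split an interleaved stream back into (R1 lines, R2 lines)."""
    r1, r2 = [], []
    for index, record in enumerate(fastq_records(lines)):
        (r1 if index % 2 == 0 else r2).extend(record)
    if len(r1) != len(r2):
        raise ValueError("odd number of records in interleaved input")
    return r1, r2


if __name__ == "__main__":
    r1 = ["@a/1", "ACGT", "+", "IIII"]
    r2 = ["@a/2", "TGCA", "+", "IIII"]
    print("\n".join(interleave(r1, r2)))
```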

Address of the bookmark: https://telatin.github.io/seqfu2/

Comparative Genomics Workshops !

Rahul Nayak — Tue, 25 Jan 2022 20:39:58 -0600

This meeting's objective was to obtain a big picture look at the current state of the field of comparative genomics with a focus on commonalities across genomic investigations into humans, model organisms (both traditional and non-traditional), agricultural species, wildlife species and microbes.

Address of the bookmark: https://www.genome.gov/event-calendar/perspectives-in-comparative-genomics-and-evolution

CrossMap: program for genome coordinates conversion between different assemblies

Rahul Nayak — Tue, 25 Jan 2022 17:59:32 -0600

CrossMap is a program for genome coordinates conversion between different assemblies (such as hg18 (NCBI36) <=> hg19 (GRCh37)). It supports commonly used file formats including BAM, CRAM, SAM, Wiggle, BigWig, BED, GFF, GTF, MAF, VCF, and gVCF.
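As a rough picture of what converting a coordinate between assemblies involves, the Python sketch below maps a position through a list of ungapped aligned blocks. It is a simplified illustration, not CrossMap's code or interface: the Block class, the convert function and all coordinates are invented, positions are 0-based, and dst_start is assumed to be the leftmost forward-strand base of the block in the target assembly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """One ungapped aligned block between a source and a target assembly (hypothetical)."""
    src_chrom: str
    src_start: int   # 0-based, inclusive
    src_end: int     # exclusive
    dst_chrom: str
    dst_start: int   # leftmost forward-strand base in the target
    dst_strand: str  # "+" or "-"


def convert(chrom: str, pos: int, blocks: List[Block]) -> Optional[Tuple[str, int, str]]:
    """Map a source position to (chrom, pos, strand) in the target, or None if unmapped."""
    for block in blocks:
        if block.src_chrom == chrom and block.src_start <= pos < block.src_end:
            offset = pos - block.src_start
            if block.dst_strand == "+":
                return (block.dst_chrom, block.dst_start + offset, "+")
            # On the reverse strand the block is traversed from its right end.
            length = block.src_end - block.src_start
            return (block.dst_chrom, block.dst_start + length - 1 - offset, "-")
    return None


if __name__ == "__main__":
    # Made-up blocks, not real hg18/hg19 alignments.
    blocks = [
        Block("chr1", 100, 200, "chr1", 1100, "+"),
        Block("chr1", 300, 310, "chr2", 50, "-"),
    ]
    print(convert("chr1", 150, blocks))
    print(convert("chr1", 250, blocks))
```

Positions that fall in a gap between blocks return None, which mirrors the general fact that not every coordinate has a counterpart in another assembly.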

Address of the bookmark: http://crossmap.sourceforge.net/

HIV genome database !

Rahul Nayak — Fri, 21 Jan 2022 05:40:15 -0600

HIV resources

Address of the bookmark: https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html