BOL: Related items

RASTtk: algorithm for building custom annotation pipelines and annotating batches of genomes

Abhi — Wed, 27 Apr 2016 11:07:59 -0500

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

More at http://www.nature.com/articles/srep08365

Address of the bookmark: http://rast.nmpdr.org/

Basic Structure of a Snakemake Pipeline Run

Abhi — Thu, 14 Oct 2021 07:01:38 -0500

/user/snakemake-demo$ ls

config.json data envs scripts slurm-240702.out Snakefile

data = mock data for the snakefile to use
Snakefile = name of the snakemake “formula” file
- Note: By default, snakemake looks for a file named Snakefile in the current working directory. To use a different file, give its name after the -s flag:
  - snakemake -s snakefile.py
envs = directory for storing the conda environments that the workflow will use.
scripts = directory for storing Python scripts called by the snakemake formula.
config.json = JSON file with extra parameters for our snakemake file to use.
cluster.json = JSON file with the specification for running on the HPC.
samples.txt = file we will use later in connection with the config.json file.

Run the snakemake file as a dry run (the example workflow shown above).

This will build a DAG of the jobs to be run without actually executing them.
snakemake --dry-run

Users can also restrict the run to rules of interest by naming them as targets, for example:

snakemake --dry-run all
snakemake --dry-run call
snakemake --dry-run bwa
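To illustrate why naming a target changes the set of jobs, here is a minimal Python sketch of target selection on a rule graph. It is not Snakemake's implementation. The rule names are taken from the commands above, but the dependencies between them (all needs call, call needs bwa) are assumed for illustration, and `jobs_for_target` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical rule graph: each rule maps to the rules whose outputs it needs.
# The rule names come from the commands above; the dependencies are assumed.
RULES = {
    "all": {"call"},
    "call": {"bwa"},
    "bwa": set(),
}


def jobs_for_target(rules, target):
    """Return the rules needed to build `target`, in a valid execution order."""
    needed = {}
    stack = [target]
    while stack:
        rule = stack.pop()
        if rule not in needed:
            needed[rule] = rules[rule]
            stack.extend(rules[rule])
    # static_order() yields a rule only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    for target in ("all", "call", "bwa"):
        print(target, "->", jobs_for_target(RULES, target))
```

Under these assumed dependencies, asking for bwa schedules only bwa, while asking for all pulls in every upstream rule.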

Run the snakemake file in order to produce an image of the DAG of jobs to be run.

snakemake --dag | dot -Tsvg > dag.svg

Run the snakemake workflow (this time not as a dry run).

snakemake --use-conda

Variant Calling Pipeline

LEGE — Sat, 19 Oct 2024 12:23:40 -0500

The variantcalling.nf Nextflow script will take any number of samples with paired-end reads in FASTQ format, map reads using Bowtie2, process BAM files, and finally call variants using BCFtools v1.21 and/or Freebayes v1.3.6. If part of the pipeline fails for a sample, the error is ignored.
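The per-sample error handling described above can be pictured with a short Python sketch. This is not the Nextflow script: `run_samples`, `map_reads`, `call_variants` and the sample names are hypothetical stand-ins. It also assumes that a sample which fails one step skips its remaining steps while the other samples carry on.

```python
def run_samples(samples, steps):
    """Run every step on every sample; a failing sample is recorded, not fatal."""
    finished, ignored = [], {}
    for sample in samples:
        try:
            for step in steps:
                step(sample)
        except Exception as err:  # assumption: any failure is ignored
            ignored[sample] = str(err)
            continue
        finished.append(sample)
    return finished, ignored


def map_reads(sample):
    # Hypothetical stand-in for the read-mapping step.
    if sample == "S2":
        raise RuntimeError("mapping failed")


def call_variants(sample):
    # Hypothetical stand-in for the variant-calling step.
    return None


if __name__ == "__main__":
    done, skipped = run_samples(["S1", "S2", "S3"], [map_reads, call_variants])
    print("finished:", done)
    print("ignored:", skipped)
```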

Pipeline flowchart:

Dependencies (version tested)

Nextflow (24.04.4)
Java (18.0.2.1)
Python (3.10)
Perl (5.32.1)
Bowtie2 (2.5.3)
SAMtools (1.19.2)
GATK4 (4.5)
BCFtools (1.21)
Freebayes (1.3.6)

Address of the bookmark: https://github.com/Tom-Jenkins/nextflow-pipelines/blob/main/docs/variant-calling.md

HiC-Pro: an optimized and flexible pipeline for Hi-C data processing

Jit — Wed, 06 Dec 2017 01:05:21 -0600

HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to the normalized contact maps. Since version 2.7.0, HiC-Pro supports the main Hi-C protocols, including digestion protocols as well as protocols that do not require a restriction enzyme, such as DNase Hi-C. In practice, HiC-Pro can be used to process dilution Hi-C, in situ Hi-C, DNase Hi-C, Micro-C, capture-C, capture Hi-C or HiChIP data.

http://nservant.github.io/HiC-Pro/

Address of the bookmark: http://nservant.github.io/HiC-Pro/

MCAT: Motif Combining and Association Tool

Neel — Sun, 13 Jan 2019 06:27:28 -0600

This is a pipeline for finding motifs in fasta files.
It can be run from the command line as follows:

usage: orange_pipeline_refine.py [-h] [-w W] [--nmotifs NMOTIFS] [--iter ITER] [-c C]
                                 [-s S] [-d] [-ff] [-v V]
                                 positive_seq negative_seq

positional arguments:
  positive_seq  the fasta file for the positive sequences
  negative_seq  the fasta file for the negative sequences
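As a reading aid for the usage text above, the following sketch builds an argparse parser with the same flag names. It is a reconstruction from the usage line only, not MCAT's source. The meanings and types of the options are not given here, so value-taking options are left as plain strings, and the options shown without a value (-d, -ff) are assumed to be on/off switches.

```python
import argparse


def build_parser():
    """Build a parser whose flags mirror the usage text above."""
    parser = argparse.ArgumentParser(prog="orange_pipeline_refine.py")
    # Value-taking options; their meanings and types are not documented above,
    # so they are kept as plain strings in this sketch.
    parser.add_argument("-w")
    parser.add_argument("--nmotifs")
    parser.add_argument("--iter")
    parser.add_argument("-c")
    parser.add_argument("-s")
    # Options shown without a value are assumed to be on/off switches.
    parser.add_argument("-d", action="store_true")
    parser.add_argument("-ff", action="store_true")
    parser.add_argument("-v")
    parser.add_argument("positive_seq",
                        help="the fasta file for the positive sequences")
    parser.add_argument("negative_seq",
                        help="the fasta file for the negative sequences")
    return parser


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call shape.
    args = build_parser().parse_args(["--nmotifs", "5", "-d", "pos.fa", "neg.fa"])
    print(args)
```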

Address of the bookmark: https://github.com/yanshen43/MCAT

HaploTypo: a variant-calling pipeline for phased genomes

Jit — Thu, 19 Dec 2019 07:33:40 -0600

An increasing number of phased (i.e. with resolved haplotypes) reference genomes are available. However, most genetic variant calling tools do not explicitly account for haplotype structure. Here, we present HaploTypo, a pipeline tailored to resolve haplotypes in genetic variation analyses. HaploTypo infers the haplotype correspondence for each heterozygous variant called on a phased reference genome.

Availability and Implementation

HaploTypo is implemented in Python 2.7 and Python 3.5, and is freely available at https://github.com/gabaldonlab/haplotypo, and as a Docker image.

Address of the bookmark: https://github.com/gabaldonlab/haplotypo

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Rahul Nayak — Thu, 14 May 2020 15:13:30 -0500

gapFinisher processes SSPACE-LongRead output to fill gaps after scaffolding. It is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6733440/

Address of the bookmark: https://github.com/kammoji/gapFinisher

SqueezeMeta: a fully automated metagenomics pipeline, from reads to bins

BioStar — Mon, 17 Aug 2020 05:25:10 -0500

SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. SqueezeMeta includes multi-metagenome support, allowing the co-assembly of related metagenomes and the retrieval of individual genomes via binning procedures. Thus, SqueezeMeta features several unique characteristics:

Co-assembly procedure with read mapping for estimation of the abundances of genes in each metagenome
Co-assembly of a large number of metagenomes via merging of individual metagenomes
Includes binning and bin checking, for retrieving individual genomes
The results are stored in a database, where they can be easily exported and shared, and can be inspected anywhere using a web interface.
Internal checks for the assembly and binning steps report on the consistency of contigs and bins, making it possible to spot potential chimeras.
Metatranscriptomic support via mapping of cDNA reads against reference metagenomes

Address of the bookmark: https://github.com/jtamames/SqueezeMeta

JUDI: Just Do It

Jit — Mon, 06 Sep 2021 02:44:35 -0500

judi comes from the idea of bringing the power and efficiency of doit to execute any kind of task under many combinations of parameter settings.
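The idea of running one task under many combinations of parameter settings can be sketched in plain Python. This does not use the JUDI or doit APIs. `run_over_grid`, `describe` and the parameter names are hypothetical and serve only to show the enumeration of combinations.

```python
from itertools import product


def run_over_grid(task, grid):
    """Call `task` once per combination of the parameter values in `grid`."""
    names = list(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = task(**params)
    return results


def describe(sample, aligner):
    # Hypothetical task: a real one would run a tool or write a file.
    return f"{sample}:{aligner}"


if __name__ == "__main__":
    grid = {"sample": ["a", "b"], "aligner": ["bwa", "bowtie2"]}
    for combo, result in run_over_grid(describe, grid).items():
        print(combo, "->", result)
```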

https://github.com/ncbi/JUDI

Address of the bookmark: https://github.com/ncbi/JUDI

SqueezeMeta: a fully automated metagenomics pipeline, from reads to bins

BioStar — Sat, 06 Jul 2024 04:29:16 -0500

Co-assembly procedure with read mapping for estimation of the abundances of genes in each metagenome
Co-assembly of a large number of metagenomes via merging of individual metagenomes
Includes binning and bin checking, for retrieving individual genomes
The results are stored in a database, where they can be easily exported and shared, and can be inspected anywhere using a web interface.
Internal checks for the assembly and binning steps report on the consistency of contigs and bins, making it possible to spot potential chimeras.
Metatranscriptomic support via mapping of cDNA reads against reference metagenomes

Address of the bookmark: https://github.com/jtamames/SqueezeMeta