BOL: Related items

Basic Structure of a Snakemake Pipeline Run

Abhi — Thu, 14 Oct 2021 07:01:38 -0500

/user/snakemake-demo$ ls

config.json data envs scripts slurm-240702.out Snakefile

data = mock data for the snakefile to use
Snakefile = name of the snakemake “formula” file
- Note: By default, snakemake looks for a file named Snakefile in the current working directory. To use a different file, pass its path with the -s option:
  - snakemake -s snakefile.py
envs = directory for storing the conda environments that the workflow will use.
scripts = directory for storing python scripts called by the snakemake formula.
config.json = json format file with extra parameters for our snakemake file to use.
cluster.json = json format file with specification for running on the HPC
samples.txt = file we will use later relating to the config.json file.
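
To make the roles of config.json and samples.txt concrete, here is a minimal Python sketch of how a workflow might combine the two. The keys `samples_file` and `outdir`, the sample names, and the `.vcf` output pattern are assumptions made for illustration; they are not the contents of the demo files.

```python
import json
import tempfile
from pathlib import Path


def load_samples(config_path):
    """Read a JSON config and return (config, sample names listed one per line)."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    # "samples_file" is a hypothetical key pointing at samples.txt.
    samples_path = config_path.parent / config["samples_file"]
    samples = [
        line.strip()
        for line in samples_path.read_text().splitlines()
        if line.strip()
    ]
    return config, samples


def expected_outputs(config, samples):
    """Build one target path per sample, as a final rule might request."""
    # "outdir" and the .vcf suffix are assumptions for illustration.
    return [f"{config['outdir']}/{sample}.vcf" for sample in samples]


def demo():
    """Write throwaway config and sample files, then derive the targets."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "samples.txt").write_text("sampleA\nsampleB\n")
        (tmp / "config.json").write_text(
            json.dumps({"samples_file": "samples.txt", "outdir": "results"})
        )
        config, samples = load_samples(tmp / "config.json")
        return expected_outputs(config, samples)


if __name__ == "__main__":
    print(demo())
```

Inside a Snakefile, the `configfile: "config.json"` directive loads the same JSON into a dictionary named `config`, so the manual `json.loads` step is only needed outside Snakemake.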

Run the snakemake file as a dry run (the example workflow shown above).

This will build a DAG of the jobs to be run without actually executing them.
snakemake --dry-run

A specific rule can be given as the target, so that only that rule and the jobs it depends on are planned. Compare:

snakemake --dry-run all
snakemake --dry-run call
snakemake --dry-run bwa
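
The difference between these targets can be pictured with a small Python sketch of dependency resolution. It is a simplified model, not Snakemake's actual implementation: Snakemake infers dependencies from input and output files, whereas here the rule-to-rule edges are written out by hand. The dependency chain (all needs call, call needs bwa) is an assumption based on the rule names used above.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical rule graph: each rule maps to the rules it depends on.
RULE_DEPS = {
    "all": {"call"},
    "call": {"bwa"},
    "bwa": set(),
}


def plan(target, deps=RULE_DEPS):
    """Return the rules needed for `target`, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        rule = stack.pop()
        if rule in needed:
            continue
        needed[rule] = deps[rule]
        stack.extend(deps[rule])
    # static_order() yields each rule only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    for target in ("all", "call", "bwa"):
        print(target, "->", plan(target))
```

Under these assumptions, asking for bwa plans a single rule, while asking for all plans the whole chain.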

Run the snakemake file in order to produce an image of the DAG of jobs to be run.

snakemake --dag | dot -Tsvg > dag.svg
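
The --dag option prints the job graph as text in the Graphviz DOT language, and dot turns that text into an image. As a rough illustration of what such text looks like, the Python sketch below writes DOT for a hand-made rule graph. The rule names and edges are assumptions, and the real snakemake output carries more detail (node labels and styling, for instance).

```python
# Hypothetical rule graph: each rule maps to the rules it depends on.
EXAMPLE_DEPS = {
    "all": {"call"},
    "call": {"bwa"},
    "bwa": set(),
}


def to_dot(deps):
    """Render a rule-dependency mapping as Graphviz DOT text."""
    lines = ["digraph rules {"]
    for rule in sorted(deps):
        lines.append(f'    "{rule}";')
        for dep in sorted(deps[rule]):
            # The edge points from the dependency to the rule that needs it.
            lines.append(f'    "{dep}" -> "{rule}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(EXAMPLE_DEPS))
```

Saving the printed text as rules.dot and running dot -Tsvg rules.dot > rules.svg renders it the same way as the snakemake command above.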

Run the snakemake workflow (this time not as a dry run).

snakemake --use-conda

YAMP: Yet Another Metagenomic Pipeline

BioStar — Sat, 06 Jul 2024 04:26:00 -0500

YAMP is constructed on Nextflow, a framework based on the dataflow programming model, which allows writing workflows that are highly parallel, easily portable (including on distributed systems), and very flexible and customisable, characteristics which have been inherited by YAMP. New modules can be added easily and the existing ones can be customised -- even though we have already provided default parameters deriving from our own experience.

Address of the bookmark: https://github.com/alesssia/YAMP

Taxoblast: a pipeline to identify contamination in genomic sequences

Jit — Thu, 23 Nov 2017 08:37:15 -0600

Modern genome sequencing strategies are highly sensitive to contamination, making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searches with short sequence fragments. They revealed a number of probable bacterial contaminations as well as hybrid scaffolds that contain both bacterial and algal sequences. This or similar types of analysis, in combination with manual curation, may thus constitute a useful complement to standard bioinformatics analyses prior to submission of genomic data to public repositories. Our analysis pipeline is open-source and freely available at http://sdittami.altervista.org/taxoblast and via SourceForge (https://sourceforge.net/projects/taxoblast).

Address of the bookmark: https://sourceforge.net/projects/taxoblast/files/

MetaPlotR: a Perl/R pipeline for plotting metagenes of nucleotide modifications and other transcriptomic sites

Neel — Mon, 05 Nov 2018 08:12:45 -0600

An increasing number of studies are mapping protein binding and nucleotide modifications sites throughout the transcriptome. Often, these sites cluster in certain regions of the transcript, giving clues to their function. Hence, it is informative to summarize where in the transcript these sites occur. A metagene is a simple and effective tool for visualizing the distribution of sites along a simplified transcript model. In this work, we introduce MetaPlotR, a Perl/R pipeline for creating metagene plots.

Address of the bookmark: https://github.com/olarerin/metaPlotR

HaploTypo: a variant-calling pipeline for phased genomes

Jit — Thu, 19 Dec 2019 07:33:40 -0600

An increasing number of phased (i.e. with resolved haplotypes) reference genomes are available. However, most genetic variant calling tools do not explicitly account for haplotype structure. Here, we present HaploTypo, a pipeline tailored to resolve haplotypes in genetic variation analyses. HaploTypo infers the haplotype correspondence for each heterozygous variant called on a phased reference genome.

Availability and Implementation

HaploTypo is implemented in Python 2.7 and Python 3.5, and is freely available at https://github.com/gabaldonlab/haplotypo, and as a Docker image.

Address of the bookmark: https://github.com/gabaldonlab/haplotypo

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Rahul Nayak — Thu, 14 May 2020 15:13:30 -0500

gapFinisher processes SSPACE-LongRead output to fill gaps after scaffolding. It is based on the controlled use of a previously published gap-filling tool, FGAP, and works on all standard Linux/UNIX command lines.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6733440/

Address of the bookmark: https://github.com/kammoji/gapFinisher

SqueezeMeta: a fully automated metagenomics pipeline, from reads to bins

BioStar — Mon, 17 Aug 2020 05:25:10 -0500

SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. SqueezeMeta includes multi-metagenome support, allowing the co-assembly of related metagenomes and the retrieval of individual genomes via binning procedures. Thus, SqueezeMeta features several unique characteristics:

Co-assembly procedure with read mapping for estimation of the abundances of genes in each metagenome
Co-assembly of a large number of metagenomes via merging of individual metagenomes
Includes binning and bin checking, for retrieving individual genomes
The results are stored in a database, where they can be easily exported and shared, and can be inspected anywhere using a web interface.
Internal checks for the assembly and binning steps report on the consistency of contigs and bins, allowing potential chimeras to be spotted.
Metatranscriptomic support via mapping of cDNA reads against reference metagenomes

Address of the bookmark: https://github.com/jtamames/SqueezeMeta

LncPipe: A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets

LEGE — Fri, 17 Sep 2021 01:57:02 -0500

The pipeline was developed on the popular workflow framework Nextflow and is composed of four core procedures: read alignment, assembly, identification, and quantification. It offers several distinctive features, such as a well-designed lncRNA annotation strategy, optimized computational efficiency, diversified classification, and an interactive analysis report. LncPipe gives users additional control: interrupting the pipeline, resetting parameters from the command line, modifying the main script directly, and resuming analysis from a previous checkpoint.

Ref https://www.lncrnablog.com/lncpipe-a-nextflow-based-pipeline-for-identification-and-analysis-of-long-non-coding-rnas-from-rna-seq-data/

Address of the bookmark: https://github.com/likelet/LncPipe

McClintock: Meta-pipeline to identify transposable element insertions using next generation sequencing data

BioStar — Tue, 27 Oct 2020 00:21:18 -0500

McClintock (https://github.com/bergmanlab/mcclintock) is an integrated bioinformatics pipeline for the detection of TE insertions in whole-genome shotgun data. It automatically runs multiple TE detection methods and standardizes their output. We demonstrate the utility of McClintock by evaluating six TE detection methods using simulated and real genome data from the model microbial eukaryote, Saccharomyces cerevisiae.

Address of the bookmark: https://github.com/bergmanlab/mcclintock