BOL: Related items

FiNGS: Filters for Next Generation Sequencing

Neel — Sat, 27 Feb 2021 01:18:35 -0600

Key features

Filters SNVs from any variant caller to remove false positives
Calculates metrics based on BAM files and provides filtering not possible with other tools
Fully user-configurable filtering (including which filters to use and their thresholds)
Option to use filters identical to ICGC recommendations

FiNGS provides researchers with a tool to reproducibly filter somatic variants that is simple to both deploy and use, with filters and thresholds that are fully configurable by the user. It ingests and emits standard variant call format (VCF) files and will slot into existing sequencing pipelines. It allows users to develop and implement their own filtering strategies and to share them easily with others.

FiNGS reliably improves upon the precision of default variant caller outputs and performs better than other tools designed for the same task.
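
The idea of user-configurable filters and thresholds can be illustrated with a minimal Python sketch. This is not FiNGS code and does not use its API or configuration format: the metric names, threshold values, and helper functions below are all hypothetical, chosen only to show how a threshold table can switch filters on and off.

```python
from dataclasses import dataclass


# Hypothetical per-variant metrics; FiNGS computes its own metrics from BAM files.
@dataclass
class VariantMetrics:
    chrom: str
    pos: int
    tumor_depth: int    # read depth at the site in the tumor sample
    tumor_vaf: float    # variant allele fraction in the tumor sample
    normal_vaf: float   # variant allele fraction in the matched normal


# Illustrative thresholds (assumed values). Removing a key disables that filter.
THRESHOLDS = {
    "min_tumor_depth": 10,
    "min_tumor_vaf": 0.05,
    "max_normal_vaf": 0.03,
}

# Each filter name maps to a test that returns True when the variant passes.
CHECKS = {
    "min_tumor_depth": lambda m, limit: m.tumor_depth >= limit,
    "min_tumor_vaf": lambda m, limit: m.tumor_vaf >= limit,
    "max_normal_vaf": lambda m, limit: m.normal_vaf <= limit,
}


def failed_filters(metrics, thresholds=THRESHOLDS):
    """Return the names of the configured filters that this variant fails."""
    return [name for name, limit in thresholds.items()
            if not CHECKS[name](metrics, limit)]


def passes(metrics, thresholds=THRESHOLDS):
    """A variant passes when it fails none of the configured filters."""
    return not failed_filters(metrics, thresholds)
```

Passing a different dictionary as `thresholds` changes which filters run and how strict they are, which is what makes a filtering strategy easy to share as a small configuration.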

Address of the bookmark: https://github.com/cpwardell/FiNGS

Libraries or management tools for high throughput sequencing data

LEGE — Fri, 04 Oct 2024 02:45:06 -0500

GATB library: the Genome Analysis Toolbox with de Bruijn graph. Many of the tools developed by the GenScale team are based on this library.
These methods enable the analysis of data sets of any size on multi-core desktop computers, including very large volumes of read data from any kind of organism, such as bacteria, plants, animals, and even complex samples (e.g. metagenomes). Among them are the following (the full list is available here: https://gatb.inria.fr/software/):
LRez: C++ Library and toolkit for the barcode-based management and indexation of linked-read datasets.

Variant calling and/or genotyping

DiscoSNP++ and discoSnpRAD: Reference-free small variant discovery (SNPs and indels)
MindTheGap: Detection and assembly of large insertion variants
TakeABreak: reference-free inversion discovery tool
SVJedi: Structural Variant genotyper with long read data
SVJedi-graph: Structural Variant genotyper with long read data using a variation graph

Sequence assembly

MinYS: reference-guided genome assembly in metagenomics data
MTG-link: local assembly tool for linked-read data
Minia: De novo short read assembler
de-novo pipeline: de-novo assembly pipeline (error correction / contigs / scaffolding) for genomes and meta-genomes
Mapsembler2: Targeted assembly (not maintained)

Managing k-mers & indexation

findere: simple strategy for speeding up queries and for reducing false positive calls from any Approximate Membership Query data structure.
- fimpera: extends findere by adding abundance information.
kmtricks: modular tool suite for counting kmers, and constructing Bloom filters or kmer matrices, for large collections of sequencing data.
kmindex: tool for indexing and querying sequencing samples, built on top of kmtricks.
back to sequences: Find sequences (reads, unitigs, genes) related to a set of kmers in large datasets, in a matter of seconds.
Backpack Quotient Filter: k-mer indexing data structure with abundance
short read connector: Detect similar reads from potentially large read set
DSK: counts k-mers in sequences
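
For readers new to k-mers, the following toy Python sketch shows what counting k-mers means. It is an in-memory illustration only and is not how DSK or kmtricks work internally; the function names are made up for this example, and it assumes the common convention of counting each k-mer together with its reverse complement (the "canonical" form).

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(sequences, k):
    """Count canonical k-mers across an iterable of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows containing N or other ambiguous bases
            # Canonical form: the lexicographically smaller of the k-mer
            # and its reverse complement, so both strands count together.
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts
```

For example, `count_kmers(["ACGT"], 2)` sees the windows AC, CG, and GT; GT is the reverse complement of AC, so the result is AC counted twice and CG once.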

Pangenome graph manipulation

Pancat: Pangenome Comparison and Analysis Toolkit
GFAGraphs: a Python library to handle pangenome graph files in GFA format.

Comparative metagenomics with k-mers

Simka and SimkaMin: Comparative metagenomics for large-scale datasets
Comparead & Commet: comparison of metagenomic datasets

Species and bacterial strains identification

ORI: software using long nanopore reads to identify bacteria present in a sample at the strain level
StrainFLAIR: STRAIN-level proFiLing using vArIation gRaph

General-purpose sequencing data manipulation

GASSST: long read mapper
Leon: short read compressor (now included in GATB-core)
Bloocoo: short read corrector
BCALM: Construct compacted de Bruijn graphs (unitigs)

Protein Structure

A_Purva: Contact Map Overlap solver
MD-Jeep: Distance Geometry solver
CSA: Comparative Structural Alignment

Workflow

SLICEE: parallel execution of bioinformatics workflows

Comparative Genomics

CASSIS: detection of rearrangement breakpoints
PLAST: intensive bank-to-bank sequence comparison
DRJBreakpointFinder: detection and precise localization of excision sites in proviral segments

Qualimap2: Evaluating next generation sequencing alignment data

Jit — Tue, 11 Sep 2018 04:44:29 -0500

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of sequencing alignment data and its derivatives, such as feature counts.

Supported types of experiments include:

Whole-genome sequencing
Whole-exome sequencing
RNA-seq (special mode available)
ChIP-seq

Address of the bookmark: http://qualimap.bioinfo.cipf.es/

jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator

Abhimanyu Singh — Fri, 26 Jul 2019 00:58:12 -0500

jackalope simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. All outputs can be written to standard file formats.

A swift, versatile phylogenomic and high-throughput sequencing simulator https://jackalope.lucasnell.com

Address of the bookmark: https://github.com/lucasnell/jackalope

Genomics public data links

Jit — Thu, 13 Feb 2020 00:20:00 -0600

List of publicly available genomics databases hosted on Google Cloud Storage.

More at https://software.broadinstitute.org/gatk/download/bundle

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/GATK/.

ftp://ftp.broadinstitute.org/bundle/hg38/hg38bundle/

Address of the bookmark: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1

Unicycler: Hybrid assembly pipeline for bacterial genomes

Jit — Fri, 10 Nov 2017 03:58:27 -0600

Unicycler is an assembly pipeline for bacterial genomes. It can assemble Illumina-only read sets, where it functions as a SPAdes optimiser. It can also assemble long-read-only sets (PacBio or Nanopore), where it runs a miniasm+Racon pipeline. For the best possible assemblies, give it both Illumina reads and long reads, and it will conduct a hybrid assembly.

Address of the bookmark: https://github.com/rrwick/Unicycler

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Jit — Tue, 15 May 2018 07:35:26 -0500

HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy. We found that previously described haplotype assembly methods are specialized for specific read technologies or protocols, with slow or inaccurate performance on others. With this in mind, HapCUT2 is designed for speed and accuracy across diverse sequencing technologies, including but not limited to:

NGS short reads (Illumina HiSeq)
clone-based sequencing (Fosmid or BAC clones)
SMRT reads (PacBio)
Oxford Nanopore reads
10X Genomics Linked-Reads
proximity-ligation (Hi-C) reads
high-coverage sequencing (>40x coverage-per-SNP) using the above technologies
combinations of the above technologies (e.g. scaffold long reads with Hi-C reads)

See the project README for specific examples of command line options and best practices for some of these technologies.

NOTE: At this time HapCUT2 is for diploid organisms only. VCF input should contain diploid variants.

If you use HapCUT2 in your research, please cite:

Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. gr.213462.116 (2016). doi:10.1101/gr.213462.116

Address of the bookmark: https://github.com/vibansal/HapCUT2

transrate: Understanding your transcriptome assembly

Neel — Fri, 13 Jul 2018 07:49:26 -0500

Transrate is software for de novo transcriptome assembly quality analysis. It examines your assembly in detail and compares it to experimental evidence such as the sequencing reads, reporting quality scores for contigs and assemblies. This allows you to choose between assemblers and parameters, filter out the bad contigs from an assembly, and decide when to stop trying to improve the assembly.

Address of the bookmark: http://hibberdlab.com/transrate/index.html

ALLHiC: Phasing and scaffolding polyploid genomes based on Hi-C data

BioStar — Thu, 20 Dec 2018 12:03:32 -0600

The major problem in scaffolding a polyploid genome is that Hi-C signals are frequently detected between allelic haplotypes, so existing state-of-the-art Hi-C scaffolding programs link the allelic haplotypes together. To solve this problem, we developed a new Hi-C scaffolding pipeline, called ALLHiC, specifically tailored to polyploid genomes. The ALLHiC pipeline comprises five steps: prune, partition, rescue, optimize, and build.

Address of the bookmark: https://github.com/tangerzhang/ALLHiC/wiki

wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly

BioStar — Mon, 04 Feb 2019 04:53:47 -0600

Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output.

./wtdbg2 -x rs -g 4.6m -t 16 -i reads.fa.gz -fo prefix
./wtpoa-cns -t 16 -i prefix.ctg.lay.gz -fo prefix.ctg.fa
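
When the two steps above are run from a script, it can help to build the command lines from a few parameters so that the output of the first step always matches the input of the second. The Python sketch below does only that: the function name and its parameters are hypothetical, and the flags are exactly those in the commands above. Nothing is executed, so it can be tried without wtdbg2 installed.

```python
import shlex


def wtdbg2_commands(reads, prefix, genome_size, preset="rs", threads=16):
    """Build the assembly and consensus command lines as argument lists."""
    # Step 1: assemble raw reads; writes files that start with `prefix`.
    assemble = ["./wtdbg2", "-x", preset, "-g", genome_size,
                "-t", str(threads), "-i", reads, "-fo", prefix]
    # Step 2: derive the consensus from the layout file written by step 1.
    consensus = ["./wtpoa-cns", "-t", str(threads),
                 "-i", f"{prefix}.ctg.lay.gz", "-fo", f"{prefix}.ctg.fa"]
    return assemble, consensus


# Print the commands for inspection instead of running them.
for cmd in wtdbg2_commands("reads.fa.gz", "prefix", "4.6m"):
    print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)`, which stops the script if either program exits with an error.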

Address of the bookmark: https://github.com/ruanjue/wtdbg2