BOL: Related items

VCF Compare !

Rahul Nayak — Wed, 19 Jan 2022 10:30:14 -0600

Compare two BWA mapping methods with the online hg18-mapped data.

We first run a quick inspection of the different BAM files using samtools flagstat. Illumina provided chr21 read mappings obtained with their GA IIx deep sequencing platform (<ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam>, aligned to the b36/hg18 reference genome).
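The flagstat inspection mentioned above can be scripted. The following is a minimal sketch, assuming samtools is installed and on the PATH and that its default text output has one "passed + failed description" line per category. The function names parse_flagstat, flagstat and mapped_fraction are hypothetical helpers, not part of samtools.

```python
import re
import subprocess

# Matches flagstat lines such as "9800 + 12 mapped (98.00% : N/A)".
# A trailing "(x : y)" percentage group is dropped from the category name;
# other parentheticals, e.g. "(mapQ>=5)", are kept so categories stay distinct.
_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \([^()]* : [^()]*\))?$")


def parse_flagstat(text):
    """Return {category: (qc_passed, qc_failed)} from flagstat text output."""
    counts = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            passed, failed, category = match.groups()
            counts[category] = (int(passed), int(failed))
    return counts


def flagstat(bam_path):
    """Run `samtools flagstat` on a BAM file and parse the result."""
    result = subprocess.run(
        ["samtools", "flagstat", bam_path],
        check=True, capture_output=True, text=True,
    )
    return parse_flagstat(result.stdout)


def mapped_fraction(counts):
    """Fraction of QC-passed reads that are mapped (0.0 if there are no reads)."""
    total = next(v[0] for k, v in counts.items() if k.startswith("in total"))
    return counts["mapped"][0] / total if total else 0.0
```

Calling flagstat() on each BAM file and comparing mapped_fraction() gives a quick first impression of how the mapping methods differ, before any variant-level comparison.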

Address of the bookmark: https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare

SALSA: A tool to scaffold long read assemblies with Hi-C

Jit — Fri, 15 Jun 2018 04:01:15 -0500

This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements to the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch.

To use the latest version, first run the following commands:

    cd SALSA
    make

To run the code, you will need Python 2.7, the BOOST libraries, and Networkx (version lower than 1.2).

If you consider using this tool, please cite our publications, which describe the methods used for scaffolding:

Ghurye, J., Pop, M., Koren, S., Bickhart, D., & Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC Genomics, 18(1), 527.

Ghurye, J., Rhie, A., Walenz, B. P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A. M., & Koren, S. (2018). Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, 261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).

Address of the bookmark: https://github.com/machinegun/SALSA

CGView.js is a Circular Genome Viewing tool

LEGE — Wed, 27 Mar 2024 11:16:24 -0500

CGView.js is a Circular Genome Viewing tool for visualizing and interacting with small genomes. This software is an adaptation of the Java program CGView.

CGView.js is the genome viewer of Proksee, an expert system for genome assembly, annotation and visualization.

Features

Circular and linear views of genomes
Capable of drawing genomes up to 10 Mbp with thousands of features and hundreds of contigs
Smooth zooming down to the sequence level
Easily generate features and plots directly from the sequence (e.g. ORFs, GC content and GC skew)
Save high resolution PNG maps up to 8000x8000px
Fully documented API for interacting with CGView.js maps
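
For readers unfamiliar with the sequence-derived plots listed above, here is a minimal Python sketch of how GC content and GC skew are commonly computed over sliding windows. It is not CGView.js code, and the function name gc_stats and its parameters are hypothetical. It uses the standard definitions: GC content is (G + C) / window length, and GC skew is (G - C) / (G + C).

```python
def gc_stats(seq, window=1000, step=None):
    """Return a list of (start, gc_content, gc_skew) tuples, one per window.

    Windows are taken linearly; a viewer for circular genomes would
    typically also wrap windows around the origin, which this sketch omits.
    """
    step = step or window
    seq = seq.upper()
    results = []
    # max(..., 1) ensures a sequence shorter than the window still yields one entry.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / len(chunk) if chunk else 0.0
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results
```

Smaller step values give smoother plots at the cost of more points to draw.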

Address of the bookmark: https://js.cgview.ca/

DAVI: Deep learning-based tool for alignment and single nucleotide variant identification

Jit — Tue, 16 Mar 2021 05:41:33 -0500

DAVI consists of models for both global and local alignment and for variant calling. We have evaluated the performance of DAVI against existing state-of-the-art tool sets and found that its accuracy and performance are comparable to existing tools used for benchmarking. We further demonstrate that while existing tools are based on data generated from a specific sequencing technology, the models proposed in DAVI are generic and can be used across different NGS technologies as well as across different species.

https://iopscience.iop.org/article/10.1088/2632-2153/ab7e19/pdf

Address of the bookmark: https://github.com/gguptaiitd/NEAT

Circlator: automated circularization of genome assemblies using long sequencing reads

Poonam Mahapatra — Tue, 15 May 2018 09:42:32 -0500

A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript. Citation: "Circlator: automated circularization of genome assemblies using long sequencing reads", Hunt et al, Genome Biology 2015 Dec 29;16(1):294. doi: 10.1186/s13059-015-0849-0. PMID: 26714481.

Address of the bookmark: http://sanger-pathogens.github.io/circlator/

Tools for bacterial whole genome annotation

Radha Agarkar — Sat, 16 Dec 2017 17:37:47 -0600

RAST – Web tool (upload contigs), uses the subsystems in the SEED database and provides detailed annotation and pathway analysis. Takes several hours per genome but I think this is the best way to get a high quality annotation (if you have only a few genomes to annotate).

Prokka – Standalone command line tool, takes just a few minutes per genome. This is the best way to get good quality annotation in a flash, which is particularly useful if you have loads of genomes or need to annotate a pangenome or metagenome. Note, however, that the quality of functional information is not as good as with RAST, and you will need several extra steps if you want to do functional profiling and pathway analysis of your genome(s), which is built into RAST.

NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

PGAP: NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology based methods. The first version of NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see Pubmed Article) developed in 2005 has been replaced with an upgraded version that is capable of processing a larger data volume. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

BEACON (automated tool for Bacterial GEnome Annotation ComparisON), a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.

BlastKOALA: Assigns K numbers to the user's sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be executed in an interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

PAGIT: Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT combines four tools: (i) ABACAS, which classifies and orientates contigs and estimates the sizes of gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within the scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which helps with annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.

MAKER: A portable and easily configurable genome annotation pipeline. MAKER allows smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER's inputs are minimal and its outputs can be directly loaded into a Generic Model Organism Database (GMOD). They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER is available for download and can be tested online via the MAKER Web Annotation Service (MWAS).

MyPro is a software pipeline for high-quality prokaryotic genome assembly and annotation. It was validated on 18 oral streptococcal strains to produce submission-ready, annotated draft genomes. MyPro installed as a virtual machine and supported by updated databases will enable biologists to perform quality prokaryotic genome assembly and annotation with ease.

HybPiper

Jit — Fri, 04 Nov 2016 05:02:10 -0500

HybPiper was designed for targeted sequence capture, in which DNA sequencing libraries are enriched for gene regions of interest, especially for phylogenetics. HybPiper is a suite of Python scripts that wrap and connect bioinformatics tools in order to extract target sequences from high-throughput DNA sequencing reads.

Targeted bait capture is a technique for sequencing many loci simultaneously based on bait sequences. The HybPiper pipeline starts with high-throughput sequencing reads (for example from Illumina MiSeq) and assigns them to target genes using BLASTx or BWA. The reads are distributed to separate directories, where they are assembled separately using SPAdes. The main output is a FASTA file of the (in-frame) CDS portion of the sample for each target region, and a separate file with the translated protein sequence.
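
The read-distribution step described above can be illustrated with a short Python sketch. This is not HybPiper's own code: the function name distribute_reads, the in-memory dictionaries, and the "<gene>_reads.fasta" file naming are all assumptions made for illustration. In the real pipeline the read-to-gene assignment comes from BLASTx or BWA.

```python
from collections import defaultdict
from pathlib import Path


def distribute_reads(reads, assignments, out_dir):
    """Write one FASTA file per target gene, each in its own directory.

    reads:       {read_id: sequence}
    assignments: {read_id: gene_name}, e.g. derived from alignment hits
    Returns {gene_name: Path to the FASTA file written}.
    """
    by_gene = defaultdict(list)
    for read_id, gene in assignments.items():
        if read_id in reads:  # ignore assignments for reads we do not have
            by_gene[gene].append(read_id)

    written = {}
    for gene, read_ids in by_gene.items():
        gene_dir = Path(out_dir) / gene
        gene_dir.mkdir(parents=True, exist_ok=True)
        fasta = gene_dir / f"{gene}_reads.fasta"  # hypothetical file name
        with fasta.open("w") as handle:
            for read_id in read_ids:
                handle.write(f">{read_id}\n{reads[read_id]}\n")
        written[gene] = fasta
    return written
```

Each per-gene FASTA file could then be handed to an assembler independently, which is what makes the per-gene directory layout convenient.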

HybPiper also includes post-processing scripts, run after the main pipeline, to extract the intronic regions flanking each exon, investigate putative paralogs, and calculate sequencing depth. For more information, please see our wiki.

HybPiper is run separately for each sample (single or paired-end sequence reads). When HybPiper generates sequence files from the reads, it does so in a standardized directory hierarchy. Many of the post-processing scripts rely on this directory hierarchy, so do not modify it after running the initial pipeline. It is a good idea to run the pipeline for each sample from the same directory. You will end up with one directory per run of HybPiper, and some of the later scripts take advantage of this predictable directory structure.

Address of the bookmark: https://github.com/mossmatters/HybPiper

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

Jit — Tue, 14 Nov 2017 10:26:16 -0600

We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.

Availability and Implementation

The software is implemented in Python 3 and runs in both Python 2.7 and 3.4 on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/.

Address of the bookmark: https://dfast.nig.ac.jp/

jobTree based python wrapper to run the genome simulation tool suite Evolver

Jit — Fri, 08 Dec 2017 16:26:32 -0600

evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to simply running evolver, eSC also automatically creates statistical summaries of the simulation as it runs, including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; and extract pairwise multiple alignment files (.maf) from leaf and branch nodes of a completed simulation and, with the help of mafJoin, join them together into a single maf covering the entire simulation.
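
Since the simulation is driven by a Newick phylogeny and sequences are extracted from its leaf nodes, it can help to list the leaves of a tree before starting a run. Below is a minimal Python sketch, not part of eSC, with the hypothetical function name leaf_names. It assumes unquoted labels and a tree with at least one pair of parentheses.

```python
import re

# In Newick, a leaf label directly follows "(" or ",", whereas internal
# node labels follow ")". Labels end at any Newick structural character.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return leaf labels of a Newick tree in left-to-right order.

    Quoted labels and single-node trees are not handled by this sketch.
    """
    return _LEAF.findall(newick)
```

For example, a tree written as ((human:0.1,chimp:0.1)anc1:0.2,mouse:0.5)root; has the leaves human, chimp and mouse, while anc1 and root are internal nodes.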

Address of the bookmark: https://github.com/dentearl/evolverSimControl

EAGLER: a scaffolding tool for long reads.

Jit — Mon, 04 Jun 2018 05:26:03 -0500

EAGLER is a scaffolding tool for long reads. The scaffolder takes as input a draft genome created by any NGS assembler and a set of long reads. The long reads are used to extend the contigs present in the NGS draft and possibly join overlapping contigs. EAGLER supports both PacBio and Oxford Nanopore reads.
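
To make the idea of joining overlapping contigs concrete, here is a toy Python sketch. It is not EAGLER's algorithm, which works from long-read alignments and has to tolerate sequencing errors. This sketch only merges two contigs on an exact suffix-prefix overlap, and the function names and the min_overlap parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest exact overlap between a suffix of `left`
    and a prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left, right, min_overlap=3):
    """Merge two contigs on their overlap; return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None
```

A practical scaffolder would replace the exact string comparison with an alignment step, since long reads and draft contigs rarely agree base for base.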

The tool should be compatible with most UNIX flavors and has been successfully tested on the following operating systems:

Mac OS X 10.11.1
Mac OS X 10.10.3
Ubuntu 14.04 LTS

https://bib.irb.hr/datoteka/844447.Diplomski_2015_Luka_terbi.pdf

Address of the bookmark: https://github.com/mculinovic/EAGLER