BOL: Related items

TAREAN: A computational tool for identification and characterization of satellite DNA from unassembled short reads

Surabhi Chaudhary — Tue, 15 May 2018 02:53:11 -0500

TAndem REpeat ANalyzer (TAREAN) is a computational pipeline for unsupervised identification of satellite repeats from unassembled sequence reads. The pipeline takes low-pass whole-genome sequence reads and clusters them using a graph-based method. The resulting clusters, which represent all types of repeats, are then examined for circular structures, and putative satellite repeats are reported.

TAREAN can be used in either of two ways:

Install a local instance of the pipeline from its source code, available from the Bitbucket repository.
Use the public Galaxy-based server at https://repeatexplorer-elixir.cerit-sc.cz/. The server is provided within the framework of the ELIXIR CZ project and is maintained by CESNET and CERIT-SC. Simple registration is required to use this service.

Development of TAREAN was supported by the ELIXIR CZ research infrastructure project (MEYS Grant No. LM2015047).

References

Novak, P., Avila Robledillo, L., Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res., doi:10.1093/nar/gkx257

Address of the bookmark: https://bitbucket.org/petrnovak/repex_tarean

BFC: a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data

Jit — Thu, 31 May 2018 09:35:23 -0500

BFC is a standalone high-performance tool for correcting sequencing errors in Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though it also performs well on small genomes. The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al. (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function, which jointly considers penalties on correction, quality and k-mer support. The algorithm was first implemented in the author's fermi assembler and then refined several times in fermi, fermi2 and now BFC.

In the k-mer counting phase, BFC uses a blocked Bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). BFC is named after this use of a Bloom filter, although other correctors such as Lighter and Bless rely on Bloom filters more heavily than BFC does.
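
The k-mer counting idea described above can be illustrated with a minimal Python sketch. This is not BFC's code: it uses a plain (non-blocked) Bloom filter, and the names BloomFilter and count_non_singletons, as well as the filter size and number of hash functions, are assumptions chosen for illustration. A k-mer enters the hash table only on its second sighting, so most singletons never occupy table memory.

```python
import hashlib


class BloomFilter:
    """A plain (non-blocked) Bloom filter over strings; illustrative only."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting a single hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_non_singletons(reads, k):
    """Count k-mers seen at least twice; singletons stay in the filter only."""
    bloom = BloomFilter()
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if bloom.add(kmer):
                # Second or later sighting: the first one lives in the filter,
                # so a new table entry starts from an implied count of 1.
                counts[kmer] = counts.get(kmer, 1) + 1
    return counts


if __name__ == "__main__":
    print(count_non_singletons(["ACGTACGT", "ACGTTTTT"], k=4))
```

Because a Bloom filter can report false positives, an occasional singleton may still reach the table with an overstated count; the filter size trades memory against how often that happens.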

Address of the bookmark: https://github.com/lh3/bfc

SALSA: A tool to scaffold long read assemblies with Hi-C

Jit — Fri, 15 Jun 2018 04:01:15 -0500

This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements to the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch. To use the latest version, first run the following commands:

cd SALSA
make

To run the code, you will need Python 2.7, the BOOST libraries and NetworkX (version lower than 1.2).

If you consider using this tool, please cite the publications that describe the methods used for scaffolding:

Ghurye, J., Pop, M., Koren, S., Bickhart, D., & Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC Genomics, 18(1), 527.
Ghurye, J., Rhie, A., Walenz, B.P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A.M. and Koren, S. (2018). Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, p.261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).

Address of the bookmark: https://github.com/machinegun/SALSA

gSearch: a fast and flexible general search tool for whole-genome sequencing

Jit — Mon, 06 Aug 2018 17:19:15 -0500

gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are optimized and implemented in a multi-threaded manner.

Address of the bookmark: http://ml.ssu.ac.kr/gSearch/index.html

LRCstats: a tool for evaluating long reads correction methods

Aaryan Lokwani — Wed, 22 Aug 2018 11:05:04 -0500

LRCstats is an open-source pipeline for benchmarking correction algorithms for DNA long reads produced by third-generation sequencing technology, such as machines made by Pacific Biosciences. As the name suggests, these reads are longer than those produced by next-generation sequencing technologies such as Illumina. However, long reads suffer from high error rates, which can cause problems in downstream analysis. Long read correction algorithms reduce the error rate either through self-correction or by using accurate short reads from next-generation sequencing to correct the long reads.
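
To make the notion of a read's error rate concrete, here is a minimal Python sketch that compares a read with the true sequence it was derived from, before and after correction. It is not LRCstats' own procedure; the function names edit_distance and error_rate, and the definition of the rate as edit distance divided by the true length, are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca from a
                curr[j - 1] + 1,            # insert cb into a
                prev[j - 1] + (ca != cb),   # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(read, truth):
    """Edits needed to turn the read into the truth, per true base."""
    return edit_distance(read, truth) / len(truth)


if __name__ == "__main__":
    truth = "ACGTACGTAC"
    raw = "ACGTTCGAAC"        # two substitutions
    corrected = "ACGTACGAAC"  # one substitution left after correction
    print(error_rate(raw, truth), error_rate(corrected, truth))
```

A correction method is doing useful work on a read when the second number is lower than the first.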

Address of the bookmark: https://github.com/cchauve/lrcstats

GRSR: a tool for deriving genome rearrangement scenarios from multiple unichromosomal genome sequences

Jit — Fri, 28 Sep 2018 09:35:10 -0500

GRSR is a tool for deriving genome rearrangement scenarios for multiple unichromosomal genomes. It performs the following steps:

Step 1. Run Mugsy to obtain a multiple sequence alignment.
Steps 2 and 3. Extract the coordinates of core blocks, construct synteny blocks and generate signed permutations.
Step 4. Generate pairwise genome rearrangement scenarios and find repeats at the breakpoints of each rearrangement event.
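
The signed permutations and breakpoints mentioned in the steps above can be illustrated with a short Python sketch. It is not taken from GRSR: the function names and the example permutation are hypothetical, and it assumes one genome has already been relabelled so that the other is the identity 1, 2, ..., n. A negative number means the synteny block is on the opposite strand.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation relative to the identity.

    The permutation is framed with 0 and n + 1; every adjacent pair (a, b)
    with b - a != 1 is a breakpoint.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def apply_inversion(perm, start, end):
    """Reverse blocks start..end (0-based, inclusive) and flip their signs."""
    segment = [-x for x in reversed(perm[start:end + 1])]
    return list(perm[:start]) + segment + list(perm[end + 1:])


if __name__ == "__main__":
    genome = [1, -3, -2, 4]               # blocks 2 and 3 are inverted
    print(count_breakpoints(genome))      # breakpoints flank the inversion
    restored = apply_inversion(genome, 1, 2)
    print(restored, count_breakpoints(restored))
```

In this toy case the two breakpoints sit at the boundaries of the inverted segment, which is where a tool would look for repeats.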

Address of the bookmark: https://github.com/DanwangJessica/GRSR

KOALA: KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation

Abhimanyu Singh — Wed, 12 Dec 2018 09:16:55 -0600

KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. BlastKOALA and GhostKOALA assign K numbers to the user's sequence data by BLAST and GHOSTX searches, respectively, against a nonredundant set of KEGG GENES. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to the BlastKOALA server and can be executed in an interactive mode. See Step-by-step Instructions.

Reference: Kanehisa, M., Sato, Y., and Morishima, K. (2016) BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726-731.

Address of the bookmark: https://www.kegg.jp/blastkoala/

Cogent: a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences.

Jit — Tue, 18 Jun 2019 05:33:04 -0500

Cogent is a tool that identifies gene families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check assemblies for the presence of these known coding sequences.

Cogent is a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences. It is designed to be used on Iso-Seq data and in cases where there is no reference genome or the reference genome is highly incomplete.

See a recent presentation on Cogent being applied to the Cuttlefish Iso-Seq data.

Cogent preliminary draft paper (updated 2016Dec version), Supplementary

Please see wiki for details on usage.

Address of the bookmark: https://github.com/Magdoll/Cogent

vt: a variant tool set that discovers short variants from Next Generation Sequencing data.

Jit — Tue, 28 Jan 2020 03:44:43 -0600

vt is a variant tool set that discovers short variants from Next Generation Sequencing data.

https://genome.sph.umich.edu/wiki/Vt

https://github.com/atks/vt

Address of the bookmark: https://genome.sph.umich.edu/wiki/Vt

Filtlong: quality filtering tool for long reads

Radha Agarkar — Wed, 13 May 2020 10:23:55 -0500

Filtlong is a tool for filtering long reads by quality. It can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the filter.
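
The idea of ranking reads by both length and identity can be sketched in Python as follows. This is a toy illustration, not Filtlong's scoring: the names mean_identity and filter_reads, the use of Phred+33 quality strings as a stand-in for identity, and the score (length multiplied by identity) are all assumptions.

```python
import math


def mean_identity(quality):
    """Expected per-base identity from a Phred+33 quality string."""
    error_probs = [10 ** (-(ord(c) - 33) / 10) for c in quality]
    return 1.0 - sum(error_probs) / len(error_probs)


def filter_reads(reads, min_length, keep_fraction):
    """Keep the best-scoring fraction of reads that pass a length cutoff.

    reads is a list of (name, sequence, quality) tuples.
    """
    long_enough = [r for r in reads if len(r[1]) >= min_length]
    # Toy score: longer and more accurate reads rank higher.
    ranked = sorted(
        long_enough,
        key=lambda r: len(r[1]) * mean_identity(r[2]),
        reverse=True,
    )
    n_keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:n_keep]


if __name__ == "__main__":
    reads = [
        ("short", "ACGT", "IIII"),
        ("long_good", "A" * 20, "I" * 20),
        ("long_bad", "A" * 20, "#" * 20),
        ("mid", "A" * 12, "5" * 12),
    ]
    for name, _, _ in filter_reads(reads, min_length=10, keep_fraction=0.5):
        print(name)
```

Here the short read is dropped by the length cutoff, and the long but low-quality read ranks below a shorter, more accurate one.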

Filtlong builds into a stand-alone executable:

git clone https://github.com/rrwick/Filtlong.git
cd Filtlong
make -j
bin/filtlong -h

Address of the bookmark: https://github.com/rrwick/Filtlong