BOL: Related items

“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

Poonam Mahapatra — Mon, 04 Jun 2018 03:45:15 -0500

One code to find them all is a set of Perl scripts that extract useful information about transposable elements (TEs) from RepeatMasker output, retrieve their sequences and compute some quantitative information. The scripts can:

Assemble RepeatMasker hits into complete TE copies, including LTR retrotransposons.
Retrieve the corresponding TE sequences, and their flanking sequences, from the local FASTA files.
Compute summary statistics for each TE family (number of TE copies, genome coverage...).
Assemble ambiguous cases, such as nested TEs, into copies automatically or manually.
Work with a user-defined TE library.
Work with only a user-chosen set of TE families.
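As a rough illustration of the kind of per-family summary described above, here is a minimal Python sketch that reads RepeatMasker .out lines and tallies hits and covered bases per repeat name. It is not the tool's own Perl code. The function name `summarize_repeatmasker` and the sample lines are made up for illustration. It also counts raw hits rather than assembled copies, and overlapping hits would be counted twice.

```python
from collections import defaultdict


def summarize_repeatmasker(lines):
    """Return {repeat_name: (hit_count, covered_bp)} from RepeatMasker .out lines.

    Hypothetical helper: it tallies raw hits, not assembled TE copies.
    """
    counts = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines (their first field is not an integer score).
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])  # position in the query sequence
        name = fields[9]                             # matching repeat
        counts[name] += 1
        covered[name] += end - begin + 1
    return {name: (counts[name], covered[name]) for name in counts}


# Made-up sample in the usual .out column layout.
sample = [
    "   SW   perc perc perc  query  position in query  matching  repeat  position in repeat",
    "score   div. del. ins.  sequence  begin end (left)  repeat  class/family  begin end (left) ID",
    "",
    " 1320 15.6  6.2  0.0 chr1  100  299 (700) + Gypsy1   LTR/Gypsy    1 200 (50) 1",
    "  800 10.0  0.0  0.0 chr1  400  449 (550) C Gypsy1   LTR/Gypsy  (0) 250 201  2",
    "  500 12.0  1.0  0.0 chr1  600  699 (300) + Mariner2 DNA/TcMar    1 100  (0) 3",
]

print(summarize_repeatmasker(sample))
```

Dividing the covered bases by the genome length would give a coverage fraction for each family.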

Address of the bookmark: http://doua.prabi.fr/software/one-code-to-find-them-all

SALSA: A tool to scaffold long read assemblies with Hi-C

Jit — Fri, 15 Jun 2018 04:01:15 -0500

This code scaffolds assemblies using Hi-C data. This version implements some improvements over the original SALSA algorithm. The old version can be found in the old_salsa branch.

To use the latest version, first run the following commands:

cd SALSA
make

To run the code, you will need Python 2.7, the BOOST libraries and Networkx (version lower than 1.2).

If you use this tool, please cite our publications, which describe the methods used for scaffolding:

Ghurye, J., Pop, M., Koren, S., Bickhart, D., & Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC Genomics, 18(1), 527.

Ghurye, J., Rhie, A., Walenz, B. P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A. M., & Koren, S. (2018). Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, 261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).

Address of the bookmark: https://github.com/machinegun/SALSA

gSearch: a fast and flexible general search tool for whole-genome sequencing

Jit — Mon, 06 Aug 2018 17:19:15 -0500

gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are optimized and implemented in a multi-threaded manner.

Address of the bookmark: http://ml.ssu.ac.kr/gSearch/index.html

LRCstats: a tool for evaluating long reads correction methods

Aaryan Lokwani — Wed, 22 Aug 2018 11:05:04 -0500

LRCstats is an open-source pipeline for benchmarking correction algorithms for DNA long reads produced by third generation sequencing technologies, such as the machines made by Pacific Biosciences. As the name suggests, these reads are longer than those produced by next generation sequencing technologies such as Illumina's. However, long reads are plagued by high error rates, which can cause issues in downstream analysis. Long read correction algorithms reduce the error rate either through self-correction or by using accurate short reads from next generation sequencing technologies to correct the long reads.
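To make the idea of a read error rate concrete, here is a minimal Python sketch that compares a read with the reference segment it came from, before and after correction. It assumes the true reference segment is known and uses plain edit distance divided by the reference length. This is a simplification for illustration, not how LRCstats computes its statistics. The function names and sequences are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def error_rate(read, reference):
    """Edits needed to turn the read into the reference, per reference base."""
    return edit_distance(read, reference) / len(reference)


# Made-up sequences: the uncorrected read has two substitutions, the corrected one has one.
reference = "ACGTACGTAC"
uncorrected = "ACGAACGGAC"
corrected = "ACGTACGGAC"

print(error_rate(uncorrected, reference), error_rate(corrected, reference))
```

A correction method that works should lower this figure for most reads.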

Address of the bookmark: https://github.com/cchauve/lrcstats

GRSR: a tool for deriving genome rearrangement scenarios from multiple unichromosomal genome sequences

Jit — Fri, 28 Sep 2018 09:35:10 -0500

GRSR is a tool for deriving genome rearrangement scenarios for multiple unichromosomal genomes. It performs the following steps:

Step 1. Run mugsy to get multiple sequence alignment results.
Steps 2 & 3. Extract the coordinates of core blocks, construct synteny blocks and generate signed permutations.
Step 4. Generate pairwise genome rearrangement scenarios and find repeats at the breakpoints of each rearrangement event.
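As an illustration of how signed permutations expose breakpoints, here is a minimal Python sketch. It is not GRSR's code. It assumes each genome is a linear list of signed synteny block numbers, where the sign gives the block's orientation. The function names are made up. An adjacency counts as conserved when the other genome has it on either strand.

```python
def adjacencies(perm):
    """Set of adjacencies in a signed permutation, in a strand-independent form."""
    result = set()
    for left, right in zip(perm, perm[1:]):
        # (a, b) read on the opposite strand is (-b, -a); keep one canonical form.
        result.add(max((left, right), (-right, -left)))
    return result


def breakpoints(perm_a, perm_b):
    """Adjacencies of perm_a that are not conserved in perm_b."""
    return adjacencies(perm_a) - adjacencies(perm_b)


# Made-up example: blocks 2 and 3 are inverted in the second genome.
genome_a = [1, 2, 3, 4, 5]
genome_b = [1, -3, -2, 4, 5]

print(breakpoints(genome_a, genome_b))
```

In this example the two reported breakpoints lie at the ends of the inverted segment. These are the positions where step 4 looks for repeats.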

Address of the bookmark: https://github.com/DanwangJessica/GRSR

KOALA: KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation

Abhimanyu Singh — Wed, 12 Dec 2018 09:16:55 -0600

KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. BlastKOALA and GhostKOALA assign K numbers to the user's sequence data by BLAST and GHOSTX searches, respectively, against a nonredundant set of KEGG GENES. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to the BlastKOALA server and can be executed in an interactive mode. See Step-by-step Instructions.

Reference: Kanehisa, M., Sato, Y., and Morishima, K. (2016) BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726-731.

Address of the bookmark: https://www.kegg.jp/blastkoala/

Cogent: a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences.

Jit — Tue, 18 Jun 2019 05:33:04 -0500

Cogent is a tool that identifies gene families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check assemblies for the presence of these known coding sequences.

Cogent is a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences. It is designed to be used on Iso-Seq data and in cases where there is no reference genome or the reference genome is highly incomplete.

See a recent presentation on Cogent being applied to the Cuttlefish Iso-Seq data.

Cogent preliminary draft paper (updated 2016Dec version), Supplementary

Please see wiki for details on usage.

Address of the bookmark: https://github.com/Magdoll/Cogent

vt: a variant tool set that discovers short variants from Next Generation Sequencing data.

Jit — Tue, 28 Jan 2020 03:44:43 -0600

vt is a variant tool set that discovers short variants from Next Generation Sequencing data.

https://github.com/atks/vt

Address of the bookmark: https://genome.sph.umich.edu/wiki/Vt

Chromonomer: a tool set for repairing and enhancing assembled genomes through integration of genetic maps and conserved synteny

Jit — Mon, 17 Feb 2020 05:38:46 -0600

Chromonomer is a program designed to integrate a genome assembly with a genetic map. It attempts to identify and remove markers whose order in the genetic map conflicts with their local assembly order. It also attempts to identify scaffolds that, according to the genetic map, have been incorrectly assembled, and to split those scaffolds.

Address of the bookmark: http://catchenlab.life.illinois.edu/chromonomer/

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Jit — Mon, 18 May 2020 10:53:32 -0500

Contig Annotation Tool (CAT) and Bin Annotation Tool (BAT) are pipelines for the taxonomic classification of long DNA sequences and metagenome assembled genomes (MAGs/bins) of both known and (highly) unknown microorganisms, as generated by contemporary metagenomics studies. The core algorithm of both programs involves gene calling, mapping of predicted ORFs against the nr protein database, and voting-based classification of the entire contig / MAG based on classification of the individual ORFs. CAT and BAT can be run from intermediate steps if files are formatted appropriately (see Usage).
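The voting step can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not CAT's or BAT's actual implementation. Each ORF is assumed to carry a lineage (a tuple of taxa from root to tip) and a score used as its vote weight. The contig is assigned the deepest lineage prefix supported by more than a chosen fraction of the total score. The function name, the `min_fraction` parameter and the example hits are all made up.

```python
from collections import defaultdict


def classify_by_vote(orf_hits, min_fraction=0.5):
    """Return the deepest lineage prefix backed by more than min_fraction of the score.

    orf_hits: list of (lineage, score) pairs, one per ORF (hypothetical input layout).
    With min_fraction >= 0.5 at most one prefix per depth can qualify.
    """
    total = sum(score for _, score in orf_hits)
    support = defaultdict(float)
    for lineage, score in orf_hits:
        # Each ORF votes for every ancestor on its lineage.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += score
    best = ()
    for prefix, summed in support.items():
        if summed > min_fraction * total and len(prefix) > len(best):
            best = prefix
    return best


# Made-up ORF classifications for one contig.
hits = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 200),
    (("Bacteria", "Proteobacteria", "Salmonella"), 150),
    (("Bacteria", "Firmicutes", "Bacillus"), 50),
]

print(classify_by_vote(hits))
```

Here the ORFs disagree at the genus level, so the sketch settles on the phylum. The same logic would make a classification of a highly unknown organism stop at a higher rank.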

Address of the bookmark: https://github.com/dutilh/CAT