BOL: Related items

Installing BLAT on Linux!

BioStar — Tue, 11 Sep 2018 08:17:35 -0500

It's been a while since I last installed BLAT, and when I went to the download directory at UCSC (http://users.soe.ucsc.edu/~kent/src/) I found that the latest BLAT is now version 35 and that the file to download was blatSrc35.zip. You can also get pre-compiled binaries at http://hgdownload.cse.ucsc.edu/admin/exe/, and there was a Linux x86_64 executable for my architecture available at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/. YMMV, though: BLAT can be a bit of a tricky beast to get going, so I decided to download the source code and compile that.

I will be compiling this code as 'root' as a system tool in /usr/local/src, so do not scream at me for that.

First I created a /usr/local/src/blat directory and copied the blatSrc35.zip file into it.

Next I used

unzip blatSrc35.zip

to unpack the archive. This gives a directory called blatSrc. Now move into that directory:

cd blatSrc

Before you begin, read the README file that comes with the source code.

One thing about building blat is that you need to set the MACHTYPE variable so that the BLAT sources know what type of machine you are compiling the software on.

On most *nix machines (in the bash shell), typing

echo $MACHTYPE

will return the machine architecture type.

On my CentOS 6 based system this gave:

x86_64-redhat-linux-gnu

However, what BLAT requires is the 'short value' (i.e. the first part of the MACHTYPE). To correct this, type the following in the bash shell (changing the value to the correct MACHTYPE for your system):

MACHTYPE=x86_64
export MACHTYPE

Now running the command:

echo $MACHTYPE

should give the correct short form of the MACHTYPE:

x86_64
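
If you would rather not type the short value by hand, it can be cut out of the long form. This is only a sketch: short_machtype is a name I made up, and it assumes the long value is dash-separated like the one on my system. The uname -m fallback is also an assumption, and its output may not match what the BLAT makefiles expect on every architecture.

```shell
# Sketch only: derive the short MACHTYPE from the long form.
# short_machtype is a hypothetical helper name, not part of BLAT.
short_machtype() {
    # ${1%%-*} removes everything from the first dash onwards,
    # so x86_64-redhat-linux-gnu becomes x86_64.
    printf '%s\n' "${1%%-*}"
}

# Fall back to `uname -m` if the shell does not set MACHTYPE at all.
MACHTYPE=$(short_machtype "${MACHTYPE:-$(uname -m)}")
export MACHTYPE
echo "$MACHTYPE"
```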

Now create the directory lib/$MACHTYPE in the source tree, i.e.:

mkdir lib/$MACHTYPE

For my machine, lib/x86_64 already existed, so I did not have to do this, but this is not the case for all architectures.

The BLAT code assumes that you are compiling BLAT as a non-privileged (i.e. non-root) user. As a result, you must create the directory for the executables to go into (the -p flag also creates ~/bin if it does not exist yet):

mkdir -p ~/bin/$MACHTYPE

If you are installing as a normal user, edit your .bashrc to add the following (change x86_64 to your MACHTYPE):

export PATH=~/bin/x86_64:$PATH
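
Because .bashrc can be read more than once, the directory can end up in PATH several times. The sketch below adds it only when it is missing. prepend_path is a name I made up for illustration, and the x86_64 directory should again be changed to match your MACHTYPE.

```shell
# Sketch only: prepend a directory to PATH unless it is already there.
# prepend_path is a hypothetical helper name.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;        # otherwise put it at the front
    esac
}

prepend_path "$HOME/bin/x86_64"
export PATH
```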

For me, though, this was not good enough. I wanted the executables in /usr/local/bin where all my other code goes. As a result I did some hackery...

There is a master make template in the inc directory called common.mk and I edited this file with the command:

vi inc/common.mk

I replaced the line

    BINDIR=${HOME}/bin/${MACHTYPE}

with

    BINDIR=/usr/local/bin

then saved and quit. As /usr/local/bin is already in my PATH, I did not need to do anything else.
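
If you would rather not open an editor, the same change can be scripted. This is only a sketch: set_bindir is a name I made up, and it assumes the BINDIR line looks like the one shown above and that your sed accepts -i.bak (GNU and BSD sed both do). It rewrites every line that starts with BINDIR=, so check the result before running make.

```shell
# Sketch only: point BINDIR in a make template at a new directory.
# set_bindir is a hypothetical helper; $1 is the template, $2 the new BINDIR.
set_bindir() {
    # -i.bak edits in place and keeps the original as <file>.bak.
    # "|" is the sed delimiter, so $2 must not contain that character.
    sed -i.bak "s|^\([[:space:]]*\)BINDIR=.*|\1BINDIR=$2|" "$1"
}

# Run from the top of the BLAT source tree (skipped if the file is absent).
if [ -f inc/common.mk ]; then
    set_bindir inc/common.mk /usr/local/bin
    grep 'BINDIR=' inc/common.mk
fi
```

The untouched original stays next to the edited file as inc/common.mk.bak, so the change is easy to undo.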

All the preparation is now done and you can create the blat executables by going into the toplevel of the blat source tree (for me it was /usr/local/src/blat/blatSrc, but change to wherever you unpacked blat into).

Now simply run the command:

make

to compile the code.

Blat compiled cleanly and the executables were all neatly placed in /usr/local/bin, just like I wanted.

Now simply running the command:

blat

on the command line gives me information on blat and sample usage.
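
To confirm which copy of blat the shell actually picks up, something like the following can help. It is a minimal sketch: check_tool is a name I made up, and it only relies on the standard command -v lookup.

```shell
# Sketch only: report where a command resolves on PATH.
# check_tool is a hypothetical helper name.
check_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 found at $tool_path"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

check_tool blat || echo "check BINDIR and PATH"
```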

Blat is installed and it's installed properly in my system code tree!!!

Caretta – A multiple protein structure alignment and feature extraction suite

Rahul Nayak — Fri, 18 Dec 2020 02:09:44 -0600

Caretta is a multiple structure alignment suite meant for homologous but sequentially divergent protein families. It consistently returns accurate alignments with higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning.

Address of the bookmark: http://www.bioinformatics.nl/caretta/

IQ-TREE: Efficient software for phylogenomic inference

Jit — Mon, 18 Feb 2019 04:25:11 -0600

IQ-TREE is a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood. It compares favorably to RAxML and PhyML in terms of likelihoods with similar computing time.

IQ-TREE found higher likelihoods in 62.2% to 87.1% of the studied alignments, thus efficiently exploring the tree space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the proportion of alignments for which IQ-TREE obtains higher likelihoods then improves to 73.3–97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree

Address of the bookmark: http://www.iqtree.org/

Flye: Fast and accurate de novo assembler for single molecule sequencing reads

BioJoker — Tue, 02 Apr 2019 21:54:55 -0500

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.

Address of the bookmark: https://github.com/fenderglass/Flye

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast, approximate tool for mapping long reads (PacBio/ONT) or assemblies to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.
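
The conversion from Jaccard similarity to identity can be illustrated with the Mash distance formula, D = -(1/k) ln(2j / (1 + j)), with identity taken as 1 - D. The sketch below is not MashMap's code: mash_identity is a made-up helper name, the formula is the one given in the Mash paper, and MashMap's own implementation may differ in detail.

```shell
# Sketch only: Jaccard similarity -> Mash distance -> identity estimate.
# mash_identity is a hypothetical helper; $1 is the Jaccard estimate j
# (0 < j <= 1) and $2 is the k-mer size k.
mash_identity() {
    awk -v j="$1" -v k="$2" 'BEGIN {
        d = -1 / k * log(2 * j / (1 + j))   # Mash distance
        printf "%.4f\n", 1 - d              # identity as a fraction
    }'
}

mash_identity 0.5 16
```

Even a modest Jaccard similarity corresponds to a high identity, because a single mismatch disrupts up to k overlapping k-mers.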

Address of the bookmark: https://github.com/marbl/MashMap

FastANI: fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI)

Jit — Fri, 13 Jul 2018 17:27:01 -0500

FastANI is developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. FastANI supports pairwise comparison of both complete and draft genome assemblies. Its underlying procedure follows a workflow similar to that described by Goris et al. 2007. However, it avoids expensive sequence alignments and uses MashMap as its MinHash-based sequence mapping engine to compute the orthologous mappings and alignment identity estimates. Based on our experiments with complete and draft genomes, its accuracy is on par with a BLAST-based ANI solver and it achieves a speedup of two to three orders of magnitude. Therefore, it is useful for pairwise ANI computation of a large number of genome pairs. More details about its speed, accuracy and potential applications are described here: "High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries".
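
The definition of ANI as a mean can be shown with a few lines of awk. This is only an illustration of the averaging step, not FastANI itself: mean_identity is a made-up helper name, and the identity values in the example are invented.

```shell
# Sketch only: mean of per-pair identity values, one per line on stdin.
# mean_identity is a hypothetical helper, not part of FastANI.
mean_identity() {
    awk 'NF { sum += $1; n++ }
         END { if (n) printf "%.2f\n", sum / n; else print "NA" }'
}

# Made-up identity values, for illustration only.
printf '98.5\n97.0\n99.5\n' | mean_identity
```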

Address of the bookmark: https://github.com/ParBLiSS/FastANI

STELLAR: fast and exact local alignments

Neel — Wed, 29 Aug 2018 16:00:46 -0500

STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar.

Address of the bookmark: http://www.seqan.de/apps/stellar/

FLAS: fast and high throughput algorithm for PacBio long read self-correction.

Jit — Sat, 22 Jun 2019 12:16:39 -0500

FLAS is a wrapper algorithm for MECAT that achieves high-throughput long-read self-correction while keeping MECAT's speed. FLAS finds additional alignments from MECAT prealigned long reads to improve the correction throughput, and removes misalignments for accuracy.

Address of the bookmark: https://github.com/baoe/flas

FastProNGS: fast preprocessing of next-generation sequencing reads

Rahul Nayak — Sat, 26 Dec 2020 08:35:21 -0600

FastProNGS integrates the quality control process with automatic adapter removal. Parallel processing was implemented to speed up the process by allocating multiple threads. Compared with similar up-to-date preprocessing tools, FastProNGS is by far the fastest.

Address of the bookmark: https://github.com/Megagenomics/FastProNGS

chromeister: An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.

Jit — Thu, 03 Feb 2022 04:01:55 -0600

USAGE:

-query: sequence A in fasta format
-db: sequence B in fasta format
-out: output matrix
-kmer: Integer k>1 (default 32). Use 32 for chromosomes and genomes and 16 for small bacteria
-diffuse: Integer z>0 (default 4). Use 4 for everything; if using large plant genomes you can try using 1
-dimension: Size of the output matrix and plot. Integer d>0 (default 1000). Use 1000 for everything that is not full genome size, where 2000 is recommended
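
The options above can be put together into a command line as sketched below. The executable name CHROMEISTER and the file names are assumptions, and chromeister_cmd is a made-up helper that only prints the command so it can be checked before running. The defaults are the ones given in the usage list.

```shell
# Sketch only: assemble a command line from the options listed above.
# The executable name CHROMEISTER and the file names are assumptions.
chromeister_cmd() {
    # $1 = query FASTA, $2 = db FASTA, $3 = output matrix
    printf '%s -query %s -db %s -out %s -kmer %s -diffuse %s -dimension %s\n' \
        "${CHROMEISTER_BIN:-CHROMEISTER}" "$1" "$2" "$3" \
        "${KMER:-32}" "${DIFFUSE:-4}" "${DIMENSION:-1000}"
}

# Print the command rather than run it, so it can be inspected first.
chromeister_cmd seqA.fasta seqB.fasta dotplot.mat
```

For small bacteria, setting KMER=16 before the call follows the recommendation in the list.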

Address of the bookmark: https://github.com/estebanpw/chromeister