BOL: Related items

LAMSA: fast split read alignment with long approximate matches

Jit — Tue, 15 May 2018 04:44:42 -0500

LAMSA (Long Approximate Matches-based Split Aligner) is a novel split alignment approach that is fast and handles SV events well. It is well suited to aligning long reads (thousands of base pairs or longer). LAMSA takes advantage of the rareness of SVs to implement a specifically designed two-step strategy. First, it splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then used by a sparse dynamic programming (SDP)-based split alignment step that handles large or non-co-linear variants. We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than state-of-the-art long read aligners, while also handling various categories of SVs well. LAMSA is open source and free for non-commercial use. LAMSA was mainly designed by Bo Liu and Yan Gao and developed by Yan Gao at the Center for Bioinformatics, Harbin Institute of Technology, China.
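To illustrate the co-linear step described above, here is a minimal sketch of chaining fragment alignments by dynamic programming. It is not LAMSA's implementation: the anchor tuple layout, the length-based score, and the function name are assumptions made for illustration, and the loop is the plain quadratic form rather than a true sparse DP (which would replace the inner loop with a range query).

```python
from typing import List, Tuple

# Hypothetical fragment alignment: (read_start, ref_start, length).
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring co-linear chain, scoring each anchor by its length."""
    if not anchors:
        return []
    ordered = sorted(anchors)           # sort by read position
    n = len(ordered)
    score = [a[2] for a in ordered]     # best chain score ending at each anchor
    prev = [-1] * n                     # back-pointers for traceback
    for i in range(n):
        ri, gi, li = ordered[i]
        for j in range(i):
            rj, gj, lj = ordered[j]
            # j can precede i only if it ends before i starts
            # on both the read and the reference (co-linearity).
            if rj + lj <= ri and gj + lj <= gi and score[j] + li > score[i]:
                score[i] = score[j] + li
                prev[i] = j
    best = max(range(n), key=lambda k: score[k])
    chain = []
    while best != -1:
        chain.append(ordered[best])
        best = prev[best]
    return chain[::-1]
```

An anchor that maps far away from the rest of the chain on the reference is left out of the chain. In a split aligner, such leftovers are the candidates for the second step that deals with large or non-co-linear variants.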

Address of the bookmark: https://github.com/hitbc/LAMSA

assemblytics: delta file to analyze alignments of an assembly to another assembly or a reference genome

Jit — Thu, 14 Jun 2018 07:31:00 -0500

Download and install MUMmer.
Align your assembly to a reference genome using nucmer (from the MUMmer package):
$ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT
Consult the MUMmer manual if you encounter problems.
Optional: gzip the delta file to speed up the upload (usually 2-4X faster):
$ gzip OUT.delta
Then use the OUT.delta.gz file for upload.
Upload the .delta or .delta.gz file to Assemblytics.

Important: Use only contigs rather than scaffolds from the assembly. This prevents false positives when the number of Ns in the scaffolded sequence does not exactly match the distance in the reference. The unique sequence length required acts as an anchor for deciding whether a sequence is unique enough to safely call variants from; it is an alternative to the mapping quality filter used for read alignment.
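The alignment and compression steps can also be scripted. The following is a minimal sketch, assuming nucmer is installed and on the PATH; the function names are hypothetical, and the nucmer flags are copied from the command quoted in this entry rather than checked against a particular MUMmer version.

```python
import gzip
import shutil
from typing import List


def nucmer_command(reference: str, assembly: str, prefix: str = "OUT") -> List[str]:
    """Build the nucmer argument list quoted in this entry."""
    return ["nucmer", "-maxmatch", "-l", "100", "-c", "500",
            reference, assembly, "-prefix", prefix]


def gzip_file(path: str) -> str:
    """Write a gzip-compressed copy of path next to it and return the new file name."""
    out = path + ".gz"
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out


# Example usage (runs nucmer, so it is left commented out):
# import subprocess
# subprocess.run(nucmer_command("REFERENCE.fa", "ASSEMBLY.fa"), check=True)
# gzip_file("OUT.delta")
```

Unlike the gzip command line tool, gzip_file keeps the uncompressed file in place.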

Address of the bookmark: http://assemblytics.com/

Qualimap2: Evaluating next generation sequencing alignment data

Jit — Tue, 11 Sep 2018 04:44:29 -0500

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives, such as feature counts.

Supported types of experiments include:

Whole-genome sequencing
Whole-exome sequencing
RNA-seq (special mode available)
ChIP-seq

Address of the bookmark: http://qualimap.bioinfo.cipf.es/

Cactus: a reference-free whole-genome multiple alignment program

Jit — Mon, 12 Aug 2019 07:52:33 -0500

Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here: https://doi.org/10.1101/gr.123356.111

Cactus uses substantial resources. For primate-sized genomes (3 gigabases each), you should expect Cactus to use approximately 120 CPU-days of compute per genome, with about 120 GB of RAM used at peak. The requirements scale roughly quadratically, so aligning two 1-megabase bacterial genomes takes only 1.5 CPU-hours and 14 GB RAM.

Address of the bookmark: https://github.com/ComparativeGenomicsToolkit/cactus

parallelLastz: Lastz with multi-threads support.

BioStar — Sat, 22 Aug 2020 05:58:40 -0500

Running Lastz (https://github.com/lastz/lastz) in parallel mode. This program is for a single computer with a multi-core processor.

When the query file is in FASTA format, you can specify multiple threads to process it. This can reduce run time linearly while using almost the same memory as the original lastz program. This is useful when you align a big query file against a huge reference such as the human whole-genome sequence.
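One common way to parallelize over a FASTA query is to split its records into roughly equal batches, one per worker. The sketch below shows that idea only; it is not taken from the parallelLastz source, and the function names and the greedy balancing rule are assumptions.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    records: List[Record] = []
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_records(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy balancing: place the longest records first, each into the lightest chunk."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        k = loads.index(min(loads))   # chunk with the fewest bases so far
        chunks[k].append(rec)
        loads[k] += len(rec[1])
    return chunks
```

Each chunk could then be written to its own file and passed to a separate lastz process. Balancing by total bases rather than by record count keeps one very long sequence from dominating a single worker's run time.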

The program is an extension of the original lastz program, which was written by Bob Harris (the LASTZ guy).

Address of the bookmark: https://github.com/jnarayan81/parallelLastz

Mulan: MUltiple sequence Local AligNment and conservation visualization tool

Rahul Nayak — Thu, 20 Jul 2017 08:02:32 -0500

Mulan performs multiple (2 or more) sequence alignments with an efficient and rapid "full local" alignment strategy that ensures a recapitulation of evolutionary sequence rearrangements (such as inversions and reshuffling) in any of the species. It combines refine and tba tools to align either "draft" or "finished" quality sequences. Mulan provides a dynamic graphical interface to align and visualize conservation profiles for evolutionarily distant and closely related species.

Input formats, automated data upload from the UCSC Genome Browser, gene annotation, annotation of repetitive elements, and progress reports were previously described in the zPicture instructions, and we refer users to those materials for more details. This introduction focuses mainly on novel features unique to Mulan.

Address of the bookmark: https://mulan.dcode.org/mulanInstructions.php

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

Manisha Mishra — Tue, 17 Apr 2018 16:21:20 -0500

AlignGraph is software that extends and joins contigs or scaffolds by reassembling them with the help of a reference genome from a closely related organism.

Using AlignGraph

AlignGraph --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distanceHigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --coverage coverage --part p --fastMap --ratioCheck --iterativeMap --misassemblyRemoval --resume]

Address of the bookmark: https://github.com/baoe/AlignGraph

Hercules: a profile HMM-based hybrid error correction algorithm for long reads

Jit — Mon, 20 Aug 2018 14:14:11 -0500

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several kinds of study, including de novo assembly and fusion and structural variation detection, require reads that are both long and accurate. In such cases researchers often combine both technologies, and the more erroneous long reads are corrected using the short reads. Current approaches rely on various graph-based alignment techniques and do not take the error profile of the underlying technology into account. Memory- and time-efficient machine learning algorithms that address these shortcomings have the potential to achieve better and more accurate integration of these two technologies.

Results: We designed and developed Hercules, the first machine learning-based long read error correction algorithm. The algorithm models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read and uses this to correct errors in these reads. Using datasets from two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) and human brain cerebellum polyA RNA-seq, we show that Hercules-corrected reads have the highest mapping rate among all competing algorithms and the highest accuracy when most of the base pairs of a long read are covered by short reads.

Availability: Hercules source code is available at https://github.com/BilkentCompGen/Hercules

Address of the bookmark: https://github.com/BilkentCompGen/Hercules

HDOCK SERVER

Neel — Tue, 16 Jun 2020 01:54:41 -0500

Protein-protein and protein-DNA/RNA docking based on a hybrid algorithm of template-based modeling and ab initio free docking.

The HDOCK server distinguishes itself from similar docking servers in its ability to support amino acid sequences as input and a hybrid docking strategy in which experimental information about the protein–protein binding site and small-angle X-ray scattering can be incorporated during the docking and post-docking processes.

Address of the bookmark: http://hdock.phys.hust.edu.cn/

NCBI PSI-BLAST Tutorial

Fri, 23 Aug 2013 02:25:02 -0500

http://www.biotechnology.jhu.edu/ Tutorial for PSI-BLAST, an extension of BLAST that uses matrix algebra. BLAST is a cornerstone bioinformatics tool at NCBI. BLAST is the Basic Local Alignment Search Tool and will find protein and DNA sequences that are related to a sequence that the user provides.