BOL: Related items

JUDI: Just Do It

Jit — Mon, 06 Sep 2021 02:44:35 -0500

JUDI comes from the idea of bringing the power and efficiency of doit to executing any kind of task under many combinations of parameter settings.

Address of the bookmark: https://github.com/ncbi/JUDI

SqueezeMeta: a fully automated metagenomics pipeline, from reads to bins

BioStar — Sat, 06 Jul 2024 04:29:16 -0500

SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. SqueezeMeta includes multi-metagenome support, allowing the co-assembly of related metagenomes and the retrieval of individual genomes via binning procedures. SqueezeMeta features several unique characteristics:

Co-assembly procedure with read mapping for estimation of the abundances of genes in each metagenome
Co-assembly of a large number of metagenomes via merging of individual metagenomes
Includes binning and bin checking, for retrieving individual genomes
The results are stored in a database, where they can be easily exported and shared, and can be inspected anywhere using a web interface.
Internal checks for the assembly and binning steps report on the consistency of contigs and bins, making it possible to spot potential chimeras.
Metatranscriptomic support via mapping of cDNA reads against reference metagenomes

Address of the bookmark: https://github.com/jtamames/SqueezeMeta

dipSPAdes: Assembler for Highly Polymorphic Diploid Genomes.

Jit — Wed, 20 Dec 2017 18:35:16 -0600

While the number of sequenced diploid genomes has been steadily increasing in the last few years, assembly of highly polymorphic (HP) diploid genomes remains challenging. As a result, there is a shortage of tools for assembling HP genomes from next-generation sequencing (NGS) data. The initial approaches to assembling HP genomes were proposed in the pre-NGS era and are not well suited for NGS projects. To address this limitation, we developed the first de Bruijn graph assembler, dipSPAdes, for HP genomes that significantly improves on the state-of-the-art assemblers for HP diploid genomes.

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pubmed/25734602

Run miniasm assembler on nanopore reads!

Jit — Mon, 18 Dec 2017 04:07:50 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to that of the raw input reads.

Find the details of repeats within the reads:

fq2fa ONT_A.fastq ONT_A.fasta

minimap2 -xava-ont ONT_A.fasta ONT_A.fasta -t10 -X > AONT.paf

awk '{if($1==$6){print}}' AONT.paf > AONTself.paf

awk '$5=="-"' AONTself.paf | awk '{print $1}'| sort|uniq > invertedrepeat.list
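To see what the two awk filters above select, here is a minimal sketch on a made-up PAF file. The read names, coordinates and the sample.paf, sample_self.paf and sample_inverted.list file names are assumptions for illustration only. The column meanings (1 = query name, 5 = relative strand, 6 = target name) follow the PAF format.

```shell
# Hypothetical three-record PAF sample (tab-separated); all values are invented.
# Columns used below: 1 = query name, 5 = strand, 6 = target name.
printf 'readA\t5000\t100\t900\t+\treadA\t5000\t2100\t2900\t700\t800\t60\n' >  sample.paf
printf 'readA\t5000\t100\t900\t-\treadA\t5000\t3100\t3900\t690\t800\t60\n' >> sample.paf
printf 'readA\t5000\t100\t900\t+\treadB\t6000\t100\t900\t750\t800\t60\n'   >> sample.paf

# Keep only hits where a read aligns to itself (query name equals target name).
awk -F'\t' '$1 == $6' sample.paf > sample_self.paf

# Among the self hits, a "-" strand means the second copy is reverse-complemented,
# so the read carries an inverted repeat. List each such read once.
awk -F'\t' '$5 == "-" {print $1}' sample_self.paf | sort -u > sample_inverted.list

cat sample_inverted.list
```

Here two of the three records are self hits, and readA ends up in the inverted-repeat list because one of its self hits is on the minus strand.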

Generate a few palindrome and repeat plots (showing only repeats longer than 10, 20 and 30 kb):

minidot -f 5 -m 30000 AONTself.paf > AONTself30000.eps
sed 's/_template_pass_FAH31515//' AONTself30000.eps > AONTself30000final.eps

minidot -f 5 -m 20000 AONTself.paf > AONTself20000.eps
sed 's/_template_pass_FAH31515//' AONTself20000.eps > AONTself20000final.eps

minidot -f 5 -m 10000 AONTself.paf > AONTself10000.eps
sed 's/_template_pass_FAH31515//' AONTself10000.eps > AONTself10000final.eps

Assemble with miniasm:

miniasm -f ONT_A.fasta AONT.paf > AONT.gfa
grep '^S' AONT.gfa |awk '{print ">"$2"\n"$3}' > AONT_miniasm.fasta
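The grep/awk step above turns the S (segment) lines of the GFA into FASTA records. A minimal sketch on a made-up GFA file shows the effect. The unitig names, sequences and the sample.gfa and sample_unitigs.fasta file names are assumptions for illustration, not real miniasm output.

```shell
# Hypothetical GFA with two segments; names and sequences are invented.
printf 'H\tVN:Z:1.0\n'                        >  sample.gfa
printf 'S\tutg000001l\tACGTACGT\tLN:i:8\n'    >> sample.gfa
printf 'a\tutg000001l\t0\tread1:1-8\t+\t8\n'  >> sample.gfa
printf 'S\tutg000002l\tGGCC\tLN:i:4\n'        >> sample.gfa

# S lines carry the segment name in field 2 and its sequence in field 3.
# Emit one FASTA header line and one sequence line per segment.
awk -F'\t' '$1 == "S" {print ">" $2; print $3}' sample.gfa > sample_unitigs.fasta

cat sample_unitigs.fasta
```

Only the two S lines are converted. The header (H) and read-layout (a) lines are ignored, just as they are by grep '^S' in the command above.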

minimap2 -xasm10 AONT_miniasm.fasta AONT_miniasm.fasta -t1 -X > AONT_miniasm.paf

awk '{if($1==$6){print}}' AONT_miniasm.paf > AONT_miniasm_self.paf

minidot -f 5 -m 10000 AONT_miniasm_self.paf > AONT_miniasm_self10000.eps

Enjoy the assembly!

Scallop: reference-based transcriptome assembler for RNA-seq

Rahul Nayak — Tue, 08 May 2018 04:23:27 -0500

Scallop is an accurate reference-based transcript assembler. It is highly accurate in assembling multi-exon transcripts as well as lowly expressed transcripts. Scallop achieves this through a novel algorithm that provably preserves all phasing paths from reads and paired-end reads, while also achieving both transcript parsimony and coverage deviation minimization.

The Scallop paper has been published in Nature Biotechnology. The datasets and scripts used in this paper to compare the performance of Scallop and other assemblers are available at scalloptest.

Please also check out the podcast about Scallop (thanks to Roman Cheplyaka for the interview). It is available on both the bioinformatics chat and iTunes.

Address of the bookmark: https://github.com/Kingsford-Group/scallop

GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.

Abhimanyu Singh — Tue, 23 May 2017 05:20:32 -0500

GRASS (GeneRic ASsembly Scaffolder) is a novel algorithm for scaffolding second-generation sequencing assemblies that can use diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation-maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.

Address of the bookmark: https://github.com/AlexeyG/GRASS

Liftoff: an accurate tool that maps annotations in GFF or GTF between assemblies

Jit — Tue, 30 Jun 2020 21:40:52 -0500

Liftoff is an accurate tool that maps annotations in GFF or GTF between assemblies of the same or closely related species. Unlike current coordinate lift-over tools, which require a pre-generated “chain” file as input, Liftoff is a standalone tool that takes two genome assemblies and a reference annotation as input and outputs an annotation of the target genome.

Address of the bookmark: https://github.com/agshumate/Liftoff

ASplice: a scalable and memory-efficient algorithm for de novo transcriptome assembly

Rahul Nayak — Tue, 03 Jul 2018 04:09:46 -0500

With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms specifically designed for transcriptome assembly from high-throughput sequencing data are available, they are very memory-intensive, which limits their application to small data sets with few libraries. Texas A&M University researchers developed a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while using as many RNA-Seq libraries as possible, containing hundreds of gigabases of data. New techniques were developed so that computations can be performed on a computing cluster with a moderate amount of physical memory.

Availability: a software program that implements the algorithm is available at http://faculty.cse.tamu.edu/shsze/asplice.

Sze SH, Pimsler ML, Tomberlin JK, Jones CD, Tarone AM. (2017) A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. BMC Genomics 18(Suppl 4):387.

Address of the bookmark: http://faculty.cse.tamu.edu/shsze/asplice/

The Extensive de novo TE Annotator (EDTA)

Abhimanyu Singh — Thu, 08 Aug 2019 04:05:36 -0500

The EDTA package was designed to filter out false discoveries in raw TE candidates and generate a high-quality non-redundant TE library for whole-genome TE annotations. Selection of the initial search programs was based on benchmarking of annotation performance using a manually curated TE library in the rice genome.

Address of the bookmark: https://github.com/oushujun/EDTA

Tiger genome sequenced

Rahul Agarwal — Tue, 17 Sep 2013 16:48:24 -0500

Fifteen scientists led by Dr Jong Bhak of Genome Research Foundation, South Korea, decoded as many as 3 billion nucleotides (organic molecules that form the basic building blocks of nucleic acids, such as DNA). They identified 20,000 genes related to various functions of the tiger.

The biggest and perhaps most fearsome of the world's big cats, the tiger, shares 95.6 percent of its DNA with humans' cute and furry companions, domestic cats.

The new research showed that big cats have genetic mutations that enabled them to be carnivores. The team also identified mutations that allow snow leopards to thrive at high altitudes.

Reference:

http://www.nbcnews.com/science/your-cat-ferocious-tigers-share-lot-95-6-percent-their-4B11182690

http://timesofindia.indiatimes.com/home/environment/flora-fauna/Gene-mapping-of-tiger-completed/articleshow/22671681.cms

Paper:

http://www.nature.com/ncomms/2013/130917/ncomms3433/full/ncomms3433.html