BOL: Related items

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads

Jit — Tue, 12 Jun 2018 08:14:41 -0500

P_RNA_scaffolder is a fast and accurate tool that uses paired-end RNA-sequencing reads to scaffold genomes. It aims to improve the completeness of both protein-coding and non-coding genes. When the tool was applied to human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and were equivalent to those in a complete genome, especially for long proteins and long circular RNAs.

Address of the bookmark: http://www.fishbrowser.org/software/P_RNA_scaffolder/

LINKS scaffolder bloomfilter setting !

Jit — Fri, 15 Jun 2018 10:39:54 -0500

➜ bin git:(master) ✗ ls -l
total 68
drwxrwxr-x 3 urbe urbe 4096 Jun 15 12:15 lib
-rwxrwxrwx 1 urbe urbe 65141 Jun 15 17:13 LINKS
➜ bin git:(master) ✗ pwd
/home/urbe/Tools/LINKS_1.8.6/bin

➜ bloomfilter git:(master) ✗ swig -Wall -c++ -perl5 BloomFilter.i
➜ bloomfilter git:(master) ✗ g++ -c BloomFilter_wrap.cxx -I/home/urbe/anaconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi/CORE/ -fPIC -Dbool=char -O3
BloomFilter_wrap.cxx:1892:30: fatal error: ../BloomFilter.hpp: No such file or directory
compilation terminated.
➜ bloomfilter git:(master) ✗ cd swig
➜ swig git:(master) ✗ g++ -c BloomFilter_wrap.cxx -I/home/urbe/anaconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi/CORE/ -fPIC -Dbool=char -O3
In file included from BloomFilter_wrap.cxx:1877:0:
../BloomFilter.hpp: In member function ‘void BloomFilter::loadHeader(FILE*)’:
../BloomFilter.hpp:141:59: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
fread(&header, sizeof(struct FileHeader), 1, file);
^
➜ swig git:(master) ✗ g++ -Wall -shared BloomFilter_wrap.o -o BloomFilter.so -O3
➜ swig git:(master) ✗ cd ..
➜ bloomfilter git:(master) ✗ cd ..
➜ lib git:(master) ✗ cd ..
➜ bin git:(master) ✗ ./LINKS
Usage: ./LINKS [v1.8.6]
-f sequences to scaffold (Multi-FASTA format, required)
-s file-of-filenames, full path to long sequence reads or MPET pairs [see below] (Multi-FASTA/fastq format, required)
-m MPET reads (default -m 1 = yes, default = no, optional)
! DO NOT SET IF NOT USING MPET. WHEN SET, LINKS WILL EXPECT A SPECIAL FORMAT UNDER -s
! Paired MPET reads in their original outward orientation <- -> must be separated by ":"
>template_name
ACGACACTATGCATAAGCAGACGAGCAGCGACGCAGCACG:ATATATAGCGCACGACGCAGCACAGCAGCAGACGAC
-d distance between k-mer pairs (ie. target distances to re-scaffold on. default -d 4000, optional)
Multiple distances are separated by comma. eg. -d 500,1000,2000,3000
-k k-mer value (default -k 15, optional)
-t step of sliding window when extracting k-mer pairs from long reads (default -t 2, optional)
Multiple steps are separated by comma. eg. -t 10,5
-o offset position for extracting k-mer pairs (default -o 0, optional)
-e error (%) allowed on -d distance e.g. -e 0.1 == distance +/- 10% (default -e 0.1, optional)
-l minimum number of links (k-mer pairs) to compute scaffold (default -l 5, optional)
-a maximum link ratio between two best contig pairs (default -a 0.3, optional)
*higher values lead to least accurate scaffolding*
-z minimum contig length to consider for scaffolding (default -z 500, optional)
-b base name for your output files (optional)
-r Bloom filter input file for sequences supplied in -s (optional, if none provided will output to .bloom)
NOTE: BLOOM FILTER MUST BE DERIVED FROM THE SAME FILE SUPPLIED IN -f WITH SAME -k VALUE
IF YOU DO NOT SUPPLY A BLOOM FILTER, ONE WILL BE CREATED (.bloom)
-p Bloom filter false positive rate (default -p 0.001, optional; increase to prevent memory allocation errors)
-x Turn off Bloom filter functionality (-x 1 = yes, default = no, optional)
-v Runs in verbose mode (-v 1 = yes, default = no, optional)

Error: Missing mandatory options -f and -s.
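
The MPET layout described under -m in the usage text (one FASTA header, then the two reads joined by ":") can be sketched in Python. This is only an illustration of the record format. The helper name format_mpet_record and the example reads are hypothetical and are not part of LINKS.

```python
def format_mpet_record(template_name, read1, read2):
    """Return one FASTA record with the two MPET reads joined by ':'.

    Hypothetical helper, not part of LINKS: it only mirrors the record
    layout shown in the usage text for -m 1.
    """
    for read in (read1, read2):
        if not read or ":" in read:
            raise ValueError("reads must be non-empty and must not contain ':'")
    return ">{}\n{}:{}\n".format(template_name, read1, read2)


# Made-up reads for illustration only.
record = format_mpet_record("template_name", "ACGACACTATGC", "ATATATAGCGCA")
print(record, end="")
```

Records written this way would go into the file listed under -s, and only when -m 1 is set.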

ERROR fixed

perl: symbol lookup error: /home/urbe/Tools/LINKS_new/bin/./lib/bloomfilter/swig/BloomFilter.so: undefined symbol: Perl_Gthr_key_ptr

My commonly used commands in Bioinformatics

Rahul Nayak — Thu, 26 Jul 2018 04:58:45 -0500

FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:

minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta

This reports Racon's changes in a table. You can exclude indels with the -I option in show-snps.

This process (Racon -> MUMmer -> SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).
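
To summarise the SNP table rather than read it row by row, a small Python sketch along these lines may help. It assumes the tab-delimited show-snps -T output starts each data row with reference position, reference base, query base and query position, and that "." in a base column marks an indel. That layout should be checked against your MUMmer version. The function name classify_changes and the example rows are made up for illustration.

```python
def classify_changes(lines):
    """Count substitutions, insertions and deletions in show-snps -T rows.

    Assumed layout (verify against your MUMmer version): tab-separated
    columns starting with ref position, ref base, query base, query
    position, where '.' in a base column marks an indel.
    """
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows start with a numeric position.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        ref_base, qry_base = fields[1], fields[2]
        if ref_base == ".":
            counts["insertion"] += 1   # base present only in the polished assembly
        elif qry_base == ".":
            counts["deletion"] += 1    # base removed by polishing
        else:
            counts["substitution"] += 1
    return counts


# Made-up rows for illustration, not real Racon output.
example = [
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]\t[FRM]\t[TAGS]",
    "120\tA\tG\t120\t35\t120\t1\t1\tctg1\tctg1",
    "300\t.\tT\t301\t12\t300\t1\t1\tctg1\tctg1",
    "512\tC\t.\t512\t0\t512\t1\t1\tctg1\tctg1",
]
print(classify_changes(example))
```

With real data, the show-snps output could be redirected to a file and the open file handle passed to classify_changes, provided the assumed column layout holds.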

Scribl : HTML5 canvas genomics graphic library

Jit — Thu, 25 Oct 2018 09:38:53 -0500

Scribl is a JavaScript, Canvas-based graphics library that easily generates biological visuals of genomic regions, alignments, and assembly data. Scribl can also be used in conventional offline pipelines, since everything needed to generate charts can be contained in a single HTML file.

Address of the bookmark: http://chmille4.github.io/Scribl/

ASCIIGenome: a genome browser based on a command-line interface, designed to run in console terminals.

Neel — Fri, 09 Nov 2018 13:50:04 -0600

ASCIIGenome is a genome browser based on a command-line interface and designed to run in console terminals.

Since ASCIIGenome does not require a graphical interface, it is particularly useful for quickly visualizing genomic data on remote servers, while offering flexibility similar to popular GUI viewers like IGV.

Documentation is at readthedocs/asciigenome.

Address of the bookmark: https://github.com/dariober/ASCIIGenome

Purge Haplotigs: Pipeline to help with curating heterozygous diploid genome assemblies

Rahul Nayak — Mon, 17 Dec 2018 03:17:20 -0600

Some parts of a genome may have a very high degree of heterozygosity. This causes contigs for both haplotypes of that part of the genome to be assembled as separate primary contigs, rather than as a contig and an associated haplotig. This can be an issue for downstream analysis whether you're working on the haploid or phased-diploid assembly.

Purge Haplotigs identifies pairs of contigs that are syntenic and moves one of them to the haplotig 'pool'. The pipeline uses mapped read coverage and Minimap2 alignments to determine which contigs to keep for the haploid assembly. Dotplots are optionally produced for all flagged contig matches, juxtaposed with read coverage, to help the user determine the proper assignment of any remaining ambiguous contigs. The pipeline will run on either a haploid assembly (e.g. Canu, FALCON or FALCON-Unzip primary contigs) or a phased-diploid assembly (e.g. FALCON-Unzip primary contigs + haplotigs).

Address of the bookmark: https://bitbucket.org/mroachawri/purge_haplotigs

LTR_Finder: an efficient program for finding full-length LTR retrotransposons in genome sequences.

Neel — Sun, 13 Jan 2019 07:05:53 -0600

LTR_Finder is an efficient program for finding full-length LTR retrotransposons in genome sequences.

The program first constructs all exact match pairs with a suffix-array based algorithm and extends them to long, highly similar pairs. The Smith-Waterman algorithm is then used to adjust the ends of LTR pair candidates to get alignment boundaries. These boundaries are re-adjusted using supporting information from the TG..CA box and TSRs, and reliable LTRs are selected. Next, LTR_FINDER tries to identify PBS, PPT and RT inside LTR pairs with built-in aligning and counting modules. RT identification includes a dynamic programming step to handle frameshifts. For other protein domains, LTR_FINDER calls ps_scan (from PROSITE, http://www.expasy.org/prosite/) to locate cores of important enzymes if they occur.
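
For readers unfamiliar with Smith-Waterman, here is a minimal Python sketch of local alignment scoring with a linear gap penalty. It is a textbook version for illustration, not LTR_Finder's implementation. The scoring values (match 2, mismatch -1, gap -2), the function name and the example sequences are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score and its end positions in a and b.

    Textbook Smith-Waterman with a linear gap penalty; the scoring
    parameters are illustrative defaults, not LTR_Finder's settings.
    """
    prev = [0] * (len(b) + 1)
    best, best_end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_end = curr[j], (i, j)
        prev = curr
    return best, best_end


# Made-up sequences that share the substring GATTACA.
print(smith_waterman_score("TTTGATTACA", "GATTACAGG"))
```

The end positions are 1-based offsets into each sequence, which is the kind of information needed to place an alignment boundary. In the example the shared GATTACA contributes seven matches at 2 points each.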

Address of the bookmark: https://github.com/xzhub/LTR_Finder

Apollo: First instantaneous, collaborative genomic annotation editor available on the Web

Jit — Fri, 31 May 2019 19:55:39 -0500

Apollo is a plug-in for the JBrowse Genome Viewer.
In addition to genes and pseudogenes, users can annotate ncRNAs (snRNA, snoRNA, tRNA, rRNA), miRNAs, repeat regions, and transposable elements; each annotation type has its own configuration of the ‘Information Editor’.
History tracking with undo/redo functions is available.
Users are able to directly set an annotation to a specific state, choosing from the ‘History’ display.
Adding and updating PubMed IDs will prompt users with a publication title to confirm their submission.
Gene Ontology (GO) terms are supported and GO ID auto-completion has been incorporated.
Users may access a ‘Recent Changes’ page.
A help page with Apollo-specific content is available.

Address of the bookmark: http://genomearchitect.github.io/

5,700-year-old human genome!

Jit — Thu, 19 Dec 2019 11:22:18 -0600

A landmark in genomics: scientists have done something that had never been done before.

Scientists have reconstructed the genome of an ancient human who lived nearly 5,700 years ago in southern Denmark from birch pitch, an ancient tar-like substance.

By sequencing the sample, researchers discovered not only ancient human DNA but also microbial DNA reflecting the oral microbiome of the person who chewed the pitch, along with plant and animal DNA that may have come from a recent meal.

The DNA sample is comparable in quality to well-preserved teeth and skull bones. The DNA suggests that the chewer was a female, most likely with dark skin, dark brown hair and blue eyes.

https://www.nature.com/articles/s41467-019-13549-9

Artistic reconstruction. (Tom Björklund)

More at https://gizmodo.com/scientists-reconstruct-lola-after-finding-her-dna-in-1840481633

Complete genome sequence of Wuhan seafood market pneumonia virus is out !

Jit — Fri, 31 Jan 2020 02:36:59 -0600

Wuhan-Hu-1 has claimed at least 40 lives and infected at least 1,300 others in China. Cases are now being reported from Thailand, Singapore, Malaysia, South Korea, Japan, Vietnam, Nepal, France, Australia and even as far away as the US. On Jan 10 2020, while news of the first fatality was barely trickling in, the 29,903 letters constituting the viral genome from an affected individual in Wuhan had already been elucidated (a few corrections were made subsequently). All the viral genome sequences from affected individuals are very close to each other. Several are identical and none has more than 5 differences (at least 99.983% similarity). This strongly suggests that transmission into humans came from a single point source and happened very recently, between September and December 2019.
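
The 99.983% figure follows directly from the two numbers quoted above. The short Python check below treats similarity as simple per-base identity over the full genome length, which is an assumption about how the figure was derived.

```python
GENOME_LENGTH = 29903   # letters in the viral genome, from the text above
MAX_DIFFERENCES = 5     # largest number of differences between any two genomes

# Assumption: similarity = fraction of positions that are identical.
identity = 1 - MAX_DIFFERENCES / GENOME_LENGTH
print("{:.3f}%".format(identity * 100))
```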

Check out the detail at https://www.ncbi.nlm.nih.gov/nuccore/MN908947