BOL: Related items

MGcV: the microbial genomic context viewer for comparative genome analysis

Jit — Mon, 29 Jan 2018 04:55:46 -0600

MGcV is an interactive web-based visualization tool tailored to facilitate small-scale genome analysis. To start using MGcV:

Supply your genes/genomic segments/phylogenetic tree of interest in the input box by
- selecting the type of identifier and pasting identifiers (one per line)
- or by using the gene ID search tool
- or with the BLAST search tool
Click "Visualize context".

Consult the documentation to learn more about MGcV.

Address of the bookmark: http://mgcv.cmbi.ru.nl/

Choose your human reference genome carefully

biogeek — Tue, 18 Feb 2020 07:43:32 -0600

Heng Li has described several issues with the human reference genomes distributed by various resources and recommends the following compressed FASTA file as the hg38/GRCh38 human reference genome.

If you map reads to GRCh38 or hg38, use the following:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

There are several other versions of GRCh37/GRCh38. What’s wrong with them? His post collects a list of potential issues.

More at http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
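The recommended file is an analysis set without ALT contigs (hence "no_alt" in its name). As a quick sanity check on whichever reference you end up using, the sketch below lists sequence names ending in `_alt`, the UCSC-style suffix used for ALT contigs. It assumes UCSC-style names, as in the `ucsc_ids` file above; the function name `list_alt_contigs` is hypothetical.

```python
import gzip


def list_alt_contigs(fasta_path):
    """Return the names of FASTA records whose name ends in '_alt'.

    Assumes UCSC-style sequence names, where ALT contigs carry an '_alt'
    suffix. Handles both plain and gzip-compressed (.gz) FASTA files.
    """
    opener = gzip.open if fasta_path.endswith(".gz") else open
    alt_names = []
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, nothing to check
            # The record name is the first whitespace-separated token after '>'.
            tokens = line[1:].split()
            if tokens and tokens[0].endswith("_alt"):
                alt_names.append(tokens[0])
    return alt_names
```

For the `no_alt_analysis_set` file, `list_alt_contigs("GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz")` should return an empty list; a non-empty result on another reference means ALT contigs are present and your aligner needs to handle them.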

Address of the bookmark: http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

Circlator: automated circularization of genome assemblies using long sequencing reads

Poonam Mahapatra — Tue, 15 May 2018 09:42:32 -0500

A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology paper. Citation: "Circlator: automated circularization of genome assemblies using long sequencing reads", Hunt et al., Genome Biology 2015 Dec 29;16(1):294. doi: 10.1186/s13059-015-0849-0. PMID: 26714481.
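Circlator's own method, described in the paper, works from long reads at the contig ends. As a much simpler illustration of what circularizing means, the sketch below trims an exact overlap between the end and the start of a single contig, which is how a circular molecule assembled as a linear contig can look. This is a swapped-in toy technique (the KMP prefix function, used to find the longest prefix that is also a suffix in linear time), not Circlator's algorithm; the function names, the `min_overlap` default and the rule that overlaps longer than half the contig are ignored are all assumptions.

```python
def prefix_function(s):
    """KMP prefix function: pi[i] is the length of the longest proper
    prefix of s[:i + 1] that is also a suffix of it."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def trim_end_overlap(contig, min_overlap=20):
    """Remove one copy of an exact end/start overlap (hypothetical helper).

    Returns (trimmed_contig, k); k == 0 means no overlap of at least
    min_overlap bases was found and the contig is returned unchanged.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    if not contig:
        return contig, 0
    pi = prefix_function(contig)
    k = pi[-1]  # longest proper prefix that is also a suffix
    # Step down to shorter borders until the overlap is at most half the contig.
    while k > len(contig) // 2:
        k = pi[k - 1]
    if k >= min_overlap:
        return contig[:-k], k
    return contig, 0
```

Using the prefix function keeps the search linear in contig length; a naive loop over all candidate overlap lengths would be quadratic on bacterial-sized contigs.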

Address of the bookmark: http://sanger-pathogens.github.io/circlator/

Download blasr version 1.3

Jit — Fri, 15 Jun 2018 03:01:20 -0500

DOWNLOAD LINK: https://github.com/BioInf-Wuerzburg/proovread/raw/master/util/blasr-1.3.1/blasr

I'm running "OPERA-LG_v2.0.5/bin/preprocess_reads.pl" and get the following error:

fail to open file './temporarySam'

[bwa_aln_core] write to the disk... 0.09 sec
[bwa_aln_core] 70778880 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 161.35 sec
[bwa_aln_core] write to the disk... 0.06 sec
[bwa_aln_core] 70989574 sequences have been processed.
[main] Version: 0.7.15-r1140
[main] CMD: bwa aln -t 30 all_p_ctg.fa -
[main] Real time: 2402.523 sec; CPU: 53429.488 sec
[E::hts_open_format] Failed to open file temporarySam
samtools sort: can't open "temporarySam": No such file or directory
[bwa_aln_core] convert to sequence coordinate... 1.00 sec
[bwa_aln_core] refine gapped alignments... 6.07 sec
[bwa_aln_core] print alignments... PREPROCESS:
Fastq format is recognized
[Thu Jun 14 18:16:47 2018] Building bwa index...
bwa index -p all_p_ctg.fa /home/urbe/Tools/OPERA-LG_v2.0.6/all_p_ctg.fa
[Thu Jun 14 18:18:35 2018] Finding the SA coordinates of the reads using BWA aln...
[Thu Jun 14 18:58:37 2018] Generate alignments of reads using bwa sampe...
bwa samse -n 1 all_p_ctg.fa read.sai - | grep '\(^@\|XT:A:U\)' | /usr/local/bin/samtools view -S -h -b -F 0x4 - | /usr/local/bin/samtools sort -@ 20 -no - temporarySam > FALCON-Unzip-Scaff.bam
Mapping long-reads using blasr...
/home/urbe/Tools/SSpace/SSPACE-LongRead_v1-1/blasr -nproc 40 -m 1 -minMatch 5 -bestn 10 -noSplitSubreads -advanceExactMatches 1 -nCandidates 1 -maxAnchorsPerPosition 1 -sdpTupleSize 7 /media/urbe/MyDDrive/ONTdata/allONT/allONT.fasta /home/urbe/Tools/OPERA-LG_v2.0.6/all_p_ctg.fa | cut -d ' ' -f1-5,7-12 | sed 's/ /\t/g' > FALCON-Unzip-Scaff.map
sh: 1: /home/urbe/Tools/SSpace/SSPACE-LongRead_v1-1/blasr: Permission denied
Sorting mapping results...
sort -k1,1 -k9,9g FALCON-Unzip-Scaff.map > FALCON-Unzip-Scaff.map.sort
Analyzing sorted results...
Extracting linking information...
i3 2000 5000
i2 1000 2000
i4 5000 15000
i0 -200 300
i5 15000 40000
i1 300 1000
Repeat detection...
/home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_conflicting_edge.pl pairedEdges_i0 contig_length.dat 100 2
Illegal division by zero at /home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_conflicting_edge.pl line 93.
readline() on closed filehandle FILE at bin/OPERA-long-read.pl line 250.
rm anchor_contig_info.dat contig_length.dat filtered_edges.dat filtered_edges_cov.dat *.sai
rm: cannot remove 'anchor_contig_info.dat': No such file or directory
mv FALCON-Unzip-Scaff.bam FALCON-Unzip-Scaff-with-repeat.bam
/home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_repeat.pl FALCON-Unzip-Scaff-with-repeat.bam repeat.dat | /usr/local/bin/samtools view - -h -S -b > FALCON-Unzip-Scaff.bam
rm FALCON-Unzip-Scaff-with-repeat.bam
/home/urbe/Tools/OPERA-LG_v2.0.6/bin/OPERA-LG config > log
Analyzing 1 library: FALCON-Unzip-Scaff.bam
min library mean : 0
minimum contig length is 500
Current library: 1 out of 7
Analyzing file: pairedEdges_no_repeat_i0
Analyzing file: pairedEdges_no_repeat_i1
Analyzing file: pairedEdges_no_repeat_i2
Analyzing file: pairedEdges_no_repeat_i3
Analyzing file: pairedEdges_no_repeat_i4
Analyzing file: pairedEdges_no_repeat_i5
ln -s results/scaffoldSeq.fasta scaffoldSeq.fasta

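The key line is "Permission denied" when the pipeline calls the blasr binary. To resolve this, try downloading blasr version 1.3 from the link above, make it executable (chmod +x), and re-run :)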
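"Permission denied" from the shell usually means the file lacks execute permission, which is the normal state of a freshly downloaded binary (a filesystem mounted `noexec` can cause the same message). A minimal Python sketch for checking and fixing the permission bits, roughly equivalent to `chmod a+x`, is below; the helper name `ensure_executable` is hypothetical.

```python
import os
import stat


def ensure_executable(path):
    """Add execute permission for user, group and others if any is missing.

    Returns whether the current user can now execute the file, as
    reported by os.access().
    """
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != mode:
        # S_IMODE strips the file-type bits so only permission bits are passed.
        os.chmod(path, stat.S_IMODE(wanted))
    return os.access(path, os.X_OK)
```

For the log above, that would be `ensure_executable("/home/urbe/Tools/SSpace/SSPACE-LongRead_v1-1/blasr")`, or the path of the newly downloaded blasr if you point the pipeline at that instead.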

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads

BioStar — Fri, 07 Sep 2018 05:19:06 -0500

P_RNA_scaffolder is a novel scaffolding tool that uses paired-end RNA-seq reads to scaffold genome fragments. The method is suitable for most genomes. The program uses Illumina paired-end RNA-sequencing reads from the target species. It provides a practical alternative to existing mate-pair-based approaches and protein-based approaches (for instance, PEP_scaffolder) for scaffolding genome sequences. The most important feature of this method is that it improves the completeness of gene regions and long-coding gene regions (for instance, circRNA).
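The basic evidence behind scaffolding with paired-end reads can be sketched as follows: when the two mates of a pair map to different contigs, that pair suggests the contigs are neighbours in the genome (for RNA-seq, that they carry parts of the same transcript). This sketch assumes mate-to-contig assignments have already been extracted from alignments and uses a hypothetical `min_support` threshold; P_RNA_scaffolder's own filtering and ordering rules are not reproduced.

```python
from collections import Counter


def count_contig_links(pairs, min_support=2):
    """Count read pairs whose mates map to different contigs.

    `pairs` yields (contig_of_mate1, contig_of_mate2); None marks an
    unmapped mate. Returns {(contig_a, contig_b): n_pairs} for contig
    pairs supported by at least `min_support` read pairs.
    """
    links = Counter()
    for c1, c2 in pairs:
        if c1 is None or c2 is None or c1 == c2:
            continue  # unmapped mate, or both mates on one contig: no link
        # Sort the names so (a, b) and (b, a) count as the same link;
        # orientation is ignored in this sketch.
        links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}
```

A real scaffolder would also check mate orientation and where on each contig the mates land before ordering and joining contigs.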

Address of the bookmark: http://www.fishbrowser.org/software/P_RNA_scaffolder/#

Synima: a Synteny imaging tool for annotated genome assemblies

Abhimanyu Singh — Tue, 30 Oct 2018 10:49:13 -0500

Synima is written in Perl and uses the graphical features of R. Synima takes orthologues computed from reciprocal best BLAST hits or OrthoMCL, along with DAGchainer, and outputs an overview of genome-wide synteny as a PDF. Each of these programs is included with the Synima package, together with a pipeline for running them. Synima has a range of graphical parameters, including size, colours, order and labels, which are specified in a config file generated by the first run of Synima and can be edited afterwards. Synima runs quickly on the command line to generate informative, publication-quality figures. Synima is open source and freely available from https://github.com/rhysf/Synima under the MIT License.
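Reciprocal best BLAST hits, one of the orthologue sources mentioned above, pair gene a from genome A with gene b from genome B when b is a's best hit against B and a is b's best hit against A. This is a minimal sketch, not the script bundled with Synima; it assumes hits arrive as (query, subject, bitscore) tuples, for example columns 1, 2 and 12 of BLAST `-outfmt 6` output, and it keeps the first-seen hit when scores tie.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    On ties, the first hit seen for a query is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

For example, if a1's best hit is b2 but b2's best hit is a3, neither a1-b2 nor a3-b2 is reported unless a3's best hit is also b2.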

Address of the bookmark: https://github.com/rhysf/Synima

ANItools web: a web tool for fast genome comparison within multiple bacterial strains

Jit — Wed, 14 Nov 2018 04:34:23 -0600

ANItools is a software package of Perl scripts that runs on Linux/Unix systems. If you want to compare bacterial genomes and calculate their average nucleotide identity (ANI), you can download and run this program directly. Alternatively, you can send us the genome sequences by email and we will do the analysis for you.

https://academic.oup.com/database/article/doi/10.1093/database/baw084/2630454
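A BLAST-based ANI calculation typically cuts one genome into fixed-length fragments, aligns each fragment against the other genome, and averages the percent identity of the fragments that align well. The sketch below assumes the commonly used ANIb settings (1020-nt fragments; a fragment counts if its best hit has more than 30% identity over at least 70% of its length); these values are assumptions and may differ from what ANItools uses. The alignment step itself (for example running BLASTN) is not shown, `fragment_hits` is a hypothetical input, and dropping the short trailing piece of the genome is a choice of this sketch.

```python
def fragment_genome(seq, size=1020):
    """Split a genome into consecutive fragments of `size` bases.

    A trailing piece shorter than `size` is dropped in this sketch.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def average_nucleotide_identity(fragment_hits, fragment_len=1020,
                                min_identity=30.0, min_coverage=0.7):
    """Mean percent identity over fragments with a qualifying best hit.

    `fragment_hits` holds one (percent_identity, aligned_length) tuple
    for the best hit of each query fragment. Returns None if no
    fragment passes both filters.
    """
    kept = [pid for pid, aln_len in fragment_hits
            if pid > min_identity and aln_len / fragment_len >= min_coverage]
    if not kept:
        return None
    return sum(kept) / len(kept)
```

Because only the query genome is fragmented, ANI computed this way is not symmetric: comparing A against B can give a slightly different value from B against A.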

Address of the bookmark: http://ani.mypathogen.cn/

Purge Haplotigs: Pipeline to help with curating heterozygous diploid genome assemblies

Rahul Nayak — Mon, 17 Dec 2018 03:17:20 -0600

Some parts of a genome may have a very high degree of heterozygosity. This causes contigs for both haplotypes of that part of the genome to be assembled as separate primary contigs, rather than as a contig and an associated haplotig. This can be an issue for downstream analysis whether you're working on the haploid or phased-diploid assembly.

Purge Haplotigs identifies pairs of syntenic contigs and moves one contig of each pair to the haplotig 'pool'. The pipeline uses mapped read coverage and Minimap2 alignments to determine which contigs to keep for the haploid assembly. Dotplots are optionally produced for all flagged contig matches, juxtaposed with read coverage, to help the user determine the proper assignment of any remaining ambiguous contigs. The pipeline runs on either a haploid assembly (e.g. Canu, FALCON or FALCON-Unzip primary contigs) or a phased-diploid assembly (e.g. FALCON-Unzip primary contigs + haplotigs). The project page shows two examples of how Purge Haplotigs can improve a haploid and a diploid assembly.
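The coverage half of this approach can be sketched as follows. When both haplotypes of a region are assembled as separate primary contigs, reads are split between them, so each contig shows roughly haploid-level (about half) depth instead of diploid-level depth. The sketch assumes user-chosen `low`, `mid` and `high` depth cutoffs read from a depth histogram, plus 80% thresholds; the labels, band boundaries and default percentages are assumptions for illustration, not necessarily Purge Haplotigs' exact rules, and the Minimap2 alignment step is not shown.

```python
def classify_contig(depths, low, mid, high, junk_pct=80.0, suspect_pct=80.0):
    """Label one contig from its per-base read depths (hypothetical rules).

    Depth bands:
      depth < low or depth > high -> junk-level (very low or very high)
      low <= depth < mid          -> haploid-level (reads split between haplotypes)
      mid <= depth <= high        -> diploid-level (both haplotypes' reads here)
    """
    if not depths:
        raise ValueError("contig has no depth values")
    n = len(depths)
    junk = sum(1 for d in depths if d < low or d > high)
    diploid = sum(1 for d in depths if mid <= d <= high)
    if 100.0 * junk / n >= junk_pct:
        return "junk"
    if 100.0 * diploid / n < suspect_pct:
        # Mostly haploid-level depth: a candidate haplotig, still to be
        # confirmed by aligning it against the other contigs.
        return "suspect"
    return "ok"
```

Coverage alone only flags candidates; deciding which contig of a pair to move to the haplotig pool needs the alignment evidence described above.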

Address of the bookmark: https://bitbucket.org/mroachawri/purge_haplotigs

LTR_Finder: an efficient program for finding full-length LTR retrotransposons in genome sequences.

Neel — Sun, 13 Jan 2019 07:05:53 -0600

LTR_Finder is an efficient program for finding full-length LTR retrotransposons in genome sequences.

The program first constructs all exact match pairs with a suffix-array-based algorithm and extends them to long, highly similar pairs. The Smith-Waterman algorithm is then used to adjust the ends of LTR pair candidates to obtain alignment boundaries. These boundaries are re-adjusted using supporting information from the TG..CA box and TSRs, and reliable LTRs are selected. Next, LTR_FINDER tries to identify the PBS (primer binding site), PPT (polypurine tract) and RT (reverse transcriptase) inside LTR pairs using built-in alignment and counting modules. RT identification uses dynamic programming to handle frame shifts. For other protein domains, LTR_FINDER calls ps_scan (from PROSITE, http://www.expasy.org/prosite/) to locate the cores of important enzymes if they occur.
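The Smith-Waterman step is a local-alignment dynamic program: each cell holds the best score of an alignment ending at that position, floored at zero so an alignment can start anywhere. Below is a minimal score-only sketch with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions, not LTR_Finder's parameters, and no traceback is done, so only the best score and the end coordinates of the best local alignment are returned.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Returns (best_score, end_in_a, end_in_b), where the ends are
    exclusive indices into a and b. Keeps only two DP rows in memory.
    """
    cols = len(b) + 1
    prev = [0] * cols          # DP row for the previous character of a
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at zero: a local alignment may start at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best
```

For example, `smith_waterman("TTTACGTTTT", "GGACGGG")` finds the shared "ACG" and reports its end positions, which is the kind of boundary information used to refine LTR ends.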

Address of the bookmark: https://github.com/xzhub/LTR_Finder

Apollo: the first instantaneous, collaborative genomic annotation editor available on the Web

Jit — Fri, 31 May 2019 19:55:39 -0500

Apollo is a plug-in for the JBrowse Genome Viewer.
In addition to genes and pseudogenes, users can annotate ncRNAs (snRNA, snoRNA, tRNA, rRNA), miRNAs, repeat regions, and transposable elements; each annotation type has its own configuration of the ‘Information Editor’.
History tracking with undo/redo functions is available.
Users are able to directly set an annotation to a specific state, choosing from the ‘History’ display.
Adding and updating PubMed IDs will prompt users with a publication title to confirm their submission.
Gene Ontology (GO) terms are supported and GO ID auto-completion has been incorporated.
Users may access a ‘Recent Changes’ page.
A help page with Apollo-specific content is available.

Address of the bookmark: http://genomearchitect.github.io/