BOL: Related items

Prokka: tool for the rapid annotation of prokaryotic genomes

Jit — Mon, 06 Mar 2017 03:49:57 -0600

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.

Address of the bookmark: http://www.vicbioinformatics.com/software.prokka.shtml

COCACOLA (binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge)

Jit — Tue, 07 Mar 2017 08:50:57 -0600

COCACOLA is a general framework that automatically bins contigs into OTUs by combining different types of information: sequence COmposition, CoverAge across multiple samples, CO-alignment to reference genomes, and paired-end read LinkAge. Furthermore, COCACOLA can incorporate customized prior knowledge to improve binning accuracy.
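Composition-based binning typically represents each contig as a normalized k-mer frequency vector. The sketch below illustrates that composition feature only and is not COCACOLA's code. Using tetranucleotides (k = 4) and merging each k-mer with its reverse complement are assumptions, although both are common in binning tools.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> list[str]:
    """All k-mers, keeping the lexicographically smaller of each k-mer/reverse-complement pair."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in kmers})


def composition_vector(contig: str, k: int = 4) -> list[float]:
    """Normalized canonical k-mer frequencies (assumption: k = 4); k-mers with non-ACGT characters are skipped."""
    contig = contig.upper()
    counts = Counter()
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, revcomp(kmer))] += 1
    keys = canonical_kmers(k)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(keys)
    return [counts[km] / total for km in keys]
```

Binning frameworks usually combine vectors like these with per-sample coverage profiles, because composition alone struggles to separate closely related genomes.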

News: Python version of COCACOLA is available now!

Address of the bookmark: https://github.com/younglululu/COCACOLA

Multigenome assembly

Jit — Tue, 14 Mar 2017 04:41:23 -0500

This project contains scripts and tutorials on how to assemble individual microbial genomes from metagenomes, as described in:

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

Mads Albertsen, Philip Hugenholtz, Adam Skarshewski, Gene W. Tyson, Kåre L. Nielsen and Per H. Nielsen

Nature Biotechnology 2013, doi: 10.1038/nbt.2579

See the associated online guide for detailed information.


Address of the bookmark: https://github.com/MadsAlbertsen/multi-metagenome

Download assemblies from NCBI

Bulbul — Mon, 15 May 2017 06:02:32 -0500

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use the check boxes on the left side of the results page to refine the set of genome assemblies of interest. Then open the “Download assemblies” menu, choose the source database (GenBank or RefSeq) and the file type, and start the download. An archive file will be saved to your computer that can be expanded into a folder containing your selected genome data files.

More at https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/08/genome-data-download-made-easy/

CAMSA :: a tool for Comparative Analysis and Merging of Scaffold Assemblies

Rahul Nayak — Thu, 28 Dec 2017 09:10:26 -0600

CAMSA is a tool for Comparative Analysis and Merging of Scaffold Assemblies, distributed both as a standalone software package and as a Python library under the MIT license.

Main features:

- works with any number of scaffold assemblies in a de novo, non-progressive fashion
- works simultaneously with scaffold assemblies obtained from any in silico and in vitro techniques, supporting multiple existing formats via built-in converters
- creates an extensive report with several comparative quality metrics (both at the assembly level and at the level of individual assembly points)
- constructs a merged, combined scaffold assembly
- provides an interactive framework for visual comparative analysis of the given assemblies

Address of the bookmark: https://cblab.org/camsa/

RNA-seq Analysis Workshop Course Materials

Jit — Tue, 03 Jul 2018 08:14:14 -0500

RNA-seq can be roughly divided into two "types":

- Reference genome-based: an assembled genome exists for the species on which the RNA-seq experiment is performed. Reads can be aligned against the reference genome, which significantly improves our ability to reconstruct transcripts. This category obviously includes humans and most model organisms, but excludes the majority of truly biologically interesting species (e.g., the Hyacinth macaw).
- Reference genome-free: no genome assembly is available for the species of interest. In this case, the reads must be assembled into transcripts using de novo approaches. This type of RNA-seq is as much an art as a science, because assembly is heavily parameter-dependent and difficult to do well.

In this lesson we will focus on the reference genome-based type of RNA-seq.

Address of the bookmark: http://chagall.med.cornell.edu/RNASEQcourse/

LRSDAY: Long-read Sequencing Data Analysis for Yeasts

Poonam Mahapatra — Mon, 26 Aug 2019 18:07:33 -0500

Long-read sequencing technologies have become increasingly popular in genome projects due to their strengths in resolving complex genomic regions. As a leading model organism with a small genome and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads.

Address of the bookmark: https://github.com/yjx1217/LRSDAY

BioJupies: Automatically Generates RNA-seq Data Analysis Notebooks

Rahul Nayak — Sun, 20 Dec 2020 11:43:45 -0600

With BioJupies you can produce, in seconds, a customized, reusable, and interactive report from your own raw or processed RNA-seq data through a simple user interface.

BioJupies now supports user accounts! Sign in from the top right corner of the page for access to unlimited private notebooks, RNA-seq datasets and alignment jobs.

Address of the bookmark: https://amp.pharm.mssm.edu/biojupies/

PhyloHerb: Phylogenomic Analysis Pipeline for Herbarium Specimens

LEGE — Wed, 21 Feb 2024 06:15:13 -0600

PhyloHerb is a wrapper program to process genome skimming data collected from plant materials. The outputs include plastid genome (plastome) assemblies, mitochondrial genome assemblies, nuclear ribosomal DNAs (NTS+ETS+18S+ITS1+5.8S+ITS2+28S), alignments of gene and intergenic regions, and a species tree. It is designed as a high-throughput program that can handle lower-quality data. Examples include low-coverage (5x cpDNA) plastome phylogeny, recycling plastid genes from target enrichment data, and retrieving low-copy nuclear genes from medium-coverage (5x nucDNA) genome skimming.

Address of the bookmark: https://github.com/lmcai/PhyloHerb/

Step-by-Step Guide to Detect piRNAs Using Bioinformatics

Abhi — Fri, 13 Dec 2024 11:41:46 -0600

Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs that play crucial roles in silencing transposable elements and regulating gene expression, particularly in germline cells. Detecting piRNAs involves identifying their unique characteristics, such as size, sequence motifs, and association with Piwi proteins, from high-throughput RNA sequencing data.

This post provides a step-by-step guide to detecting piRNAs using bioinformatics tools and workflows.

Step 1: Prepare Your Data

Obtain RNA Sequencing Data
Acquire raw small RNA-seq data in FASTQ format. Datasets can be sourced from repositories like NCBI SRA, the European Nucleotide Archive (ENA) at EMBL-EBI, or specific small RNA sequencing projects.
Quality Control (QC)
Use FastQC to assess the quality of raw reads:

fastqc reads.fastq

Evaluate the per-base quality, adapter content, and overrepresented sequences.
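FastQC's per-base quality module summarizes Phred scores at each read position. The sketch below is a rough illustration of that computation, not FastQC's implementation. It assumes 4-line FASTQ records and Phred+33 quality encoding, which current Illumina pipelines use. Check the encoding of your own data.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines; assumes 4-line records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def mean_quality_per_position(records, offset=33):
    """Mean Phred score at each read position (assumption: Phred+33 encoding)."""
    sums, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

With a file on disk, the call would be `mean_quality_per_position(read_fastq(open("reads.fastq")))`.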
Trimming and Adapter Removal
Use tools like Cutadapt or Trim Galore! to remove adapters and low-quality bases:

cutadapt -a TGGAATTCTCGGGTGCCAAGG -o trimmed_reads.fastq reads.fastq

Re-run FastQC on the trimmed reads to confirm that the adapters have been removed and that the remaining reads are of high quality for downstream analysis.
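To show what 3' adapter removal does, here is a simplified sketch. Cutadapt's real algorithm tolerates sequencing errors, whereas this version only trims exact matches: either a full adapter occurrence, or an adapter prefix of at least 3 nt that runs off the read's 3' end. It then discards reads that become too short. The 18-nt minimum length is an illustrative assumption.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # 3' adapter from the cutadapt command above


def trim_3prime_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Exact-match trimming only; min_overlap must be >= 1."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]  # full adapter found: keep only the insert
    # Partial adapter at the 3' end: try the longest prefix first.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap]
    return seq


def trim_reads(seqs, min_length=18):
    """Trim every read, then drop reads shorter than min_length (assumed cutoff)."""
    trimmed = (trim_3prime_adapter(s) for s in seqs)
    return [s for s in trimmed if len(s) >= min_length]
```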

Step 2: Map Reads to the Genome

Mapping reads to the reference genome is crucial for identifying piRNA loci.

Reference Genome Preparation
Download the genome assembly of your organism from databases like Ensembl, UCSC Genome Browser, or NCBI.
Align Reads
Use Bowtie or STAR for small RNA alignment:

bowtie -v 1 -k 1 --best -S genome_index trimmed_reads.fastq aligned_reads.sam
- -v 1: allows at most one mismatch per alignment.
- -k 1 --best: reports one alignment per read, choosing the best one.
- -S: writes alignments in SAM format.
Convert SAM to BAM
Convert, sort, and index the alignments using SAMtools (IGV and many downstream tools need the index):

samtools view -b aligned_reads.sam | samtools sort -o sorted_reads.bam -
samtools index sorted_reads.bam

Step 3: Identify Small RNAs

piRNAs are characterized by their size (24–32 nt) and strand bias.

Extract Reads by Size
Use tools like BEDtools and SeqKit, or custom scripts, to convert the alignments back to reads and keep those between 24 and 32 nt:

bedtools bamtofastq -i sorted_reads.bam -fq all_reads.fastq
seqkit seq -m 24 -M 32 all_reads.fastq > piRNA_size_reads.fastq
Check for Sequence Bias
piRNAs often have a strong bias for a uridine at the 5’ end (1U bias). Use tools like WebLogo to visualize sequence motifs.
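The size selection and the 1U check can also be written in a few lines of Python. This is a minimal sketch. It assumes the reads are already adapter-trimmed plain sequences, for example the sequence lines of piRNA_size_reads.fastq. The 24–32 nt window follows the text above. A T in DNA reads stands for U in the original RNA.

```python
from collections import Counter


def size_filter(seqs, min_len=24, max_len=32):
    """Keep reads whose length falls in the piRNA range (inclusive bounds)."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def first_base_fraction(seqs):
    """Fraction of reads starting with each base (T in DNA reads = U in RNA)."""
    counts = Counter(s[0].upper() for s in seqs if s)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"} if total else {}


def length_histogram(seqs):
    """Read count per length, sorted by length."""
    return dict(sorted(Counter(len(s) for s in seqs).items()))
```

In a piRNA-rich library, the length histogram should peak inside the 24–32 nt window. The T fraction at position 1 should also be clearly higher than the fractions for the other bases.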

Step 4: Detect Ping-Pong Signature

The ping-pong amplification loop is a hallmark of piRNA biogenesis, characterized by a 10 nt overlap between the 5’ ends of piRNAs on opposite strands.

Generate Overlap Statistics
Use piPipes or a custom script to calculate overlaps (ping_pong_overlap.py is a placeholder name for your own script):

python ping_pong_overlap.py sorted_reads.bam
Visualize Overlap Distribution
Plot the distribution of overlaps to confirm the presence of the 10 nt ping-pong signature.
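The following is a sketch of what such a custom script might compute, written independently of piPipes. Its input format is an assumption: aligned reads as (chromosome, strand, start, end) tuples in 0-based, half-open coordinates, which you could derive from the BAM with pysam or `bedtools bamtobed`. A plus-strand read's 5’ end is its start, and a minus-strand read's 5’ end is end - 1. A plus-strand read with its 5’ end at p and a minus-strand read with its 5’ end at q overlap by q - p + 1 nt. For each overlap length, the script counts such opposite-strand pairs, weighting each pair by the product of the read counts at the two positions. It then summarizes the peak at 10 as a Z-score against the other overlap lengths, here 1 to 30. Published analyses use different background windows.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def five_prime_positions(reads):
    """Count reads per 5' position for each (chrom, strand); coordinates are 0-based, half-open."""
    pos = defaultdict(Counter)
    for chrom, strand, start, end in reads:
        five_prime = start if strand == "+" else end - 1
        pos[(chrom, strand)][five_prime] += 1
    return pos


def overlap_distribution(reads, max_overlap=30):
    """Opposite-strand read pairs whose 5' ends overlap by n nt, for n = 1..max_overlap."""
    pos = five_prime_positions(reads)
    dist = {n: 0 for n in range(1, max_overlap + 1)}
    for chrom in {c for c, _ in pos}:
        plus = pos.get((chrom, "+"), Counter())
        minus = pos.get((chrom, "-"), Counter())
        for p, n_plus in plus.items():
            for n in dist:
                # a minus-strand 5' end at p + n - 1 overlaps this plus-strand 5' end by n nt
                dist[n] += n_plus * minus.get(p + n - 1, 0)
    return dist


def ping_pong_z(dist, signature=10):
    """Z-score of the count at the signature overlap against all other overlap lengths."""
    background = [v for n, v in dist.items() if n != signature]
    sd = pstdev(background)
    if sd == 0:
        return float("inf") if dist[signature] > mean(background) else 0.0
    return (dist[signature] - mean(background)) / sd
```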

Step 5: Annotate piRNA Clusters

piRNAs are often generated from genomic clusters.

Cluster Identification
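Use tools like proTRAC or PIRANHA to identify piRNA-producing clusters. Option names and accepted input formats vary between tools and versions, so treat the command below as schematic and check the tool's documentation: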

proTRAC.pl -s sorted_reads.bam -g genome.fa -o clusters
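Dedicated cluster finders such as proTRAC use considerably more elaborate criteria. As a deliberately simplified illustration of the underlying idea, the sketch below groups mapped piRNA-sized reads, given as (chrom, start, end) tuples, into candidate clusters. Consecutive reads join a cluster when the gap between them is small enough, and only clusters with enough reads are kept. The 1,000-nt gap and 50-read minimum are arbitrary assumptions, not defaults of any tool.

```python
from collections import defaultdict


def call_clusters(reads, max_gap=1000, min_reads=50):
    """Merge nearby reads into BED-like (chrom, start, end, n_reads) clusters (thresholds are assumptions)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in reads:
        by_chrom[chrom].append((start, end))
    clusters = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # close enough to the current cluster: extend it
                cur_end = max(cur_end, end)
                count += 1
            else:
                if count >= min_reads:
                    clusters.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if count >= min_reads:
            clusters.append((chrom, cur_start, cur_end, count))
    return clusters
```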
Annotate Genomic Regions
Annotate the identified clusters using gene annotation files (GTF/GFF). BEDtools intersect can associate piRNA clusters with genes or transposable elements; with -wa -wb it writes each cluster alongside every annotation feature it overlaps:

bedtools intersect -a clusters.bed -b genome_annotation.gtf -wa -wb > annotated_clusters.txt
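Conceptually, `bedtools intersect -wa -wb` pairs each cluster with every feature it overlaps. The sketch below shows that logic in Python. It assumes both inputs are already parsed into (chrom, start, end, name) tuples with 0-based, half-open coordinates. GTF files are 1-based and closed, so if you parse one yourself, subtract 1 from each start first. The brute-force loop is for illustration only and does not replace bedtools on genome-scale files.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def intersect_wa_wb(clusters, features):
    """Pair each cluster with every overlapping feature on the same chromosome (brute force)."""
    return [
        (c, f)
        for c in clusters
        for f in features
        if c[0] == f[0] and overlaps(c[1], c[2], f[1], f[2])
    ]
```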

Step 6: Functional Analysis

Functional analysis of piRNAs can uncover their targets and regulatory roles.

Predict piRNA Targets
Use tools like IntaRNA or RNAhybrid to predict interactions between piRNAs and potential target mRNAs:

RNAhybrid -t target_transcripts.fa -q piRNAs.fa > piRNA_targets.txt
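RNAhybrid predicts hybrids by minimum free energy. The sketch below is a much cruder heuristic, included only to show what complementarity to a target means at the sequence level. It reports positions in a target where the reverse complement of the piRNA's seed occurs. Treating nucleotides 2–8 as the seed is an assumption borrowed from miRNA practice. Real piRNA targeting also depends on pairing beyond the seed.

```python
def to_dna(seq: str) -> str:
    """Uppercase and convert U to T so RNA and DNA inputs compare equally."""
    return seq.upper().replace("U", "T")


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def seed_sites(pirna: str, target: str, seed_start: int = 2, seed_end: int = 8) -> list[int]:
    """0-based target positions complementary to piRNA nt seed_start..seed_end (1-based, inclusive; assumed seed)."""
    pirna, target = to_dna(pirna), to_dna(target)
    site = revcomp(pirna[seed_start - 1:seed_end])
    hits, i = [], target.find(site)
    while i != -1:
        hits.append(i)
        i = target.find(site, i + 1)
    return hits
```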
Enrichment Analysis
Perform GO or KEGG enrichment analysis of target genes using tools like g:Profiler or DAVID.
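g:Profiler and DAVID retrieve the annotations and correct for multiple testing for you. For a single term, the core of an over-representation analysis is often a one-sided hypergeometric test. Here is a minimal standard-library sketch of that test, plus a Benjamini-Hochberg adjustment across terms. It is an illustration only, not the exact statistics either service uses.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): N background genes, K annotated to the term, n targets, k targets annotated."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```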

Step 7: Validation and Visualization

Validate piRNA Candidates
Cross-check the identified piRNAs against known piRNA databases, such as piRBase or piRNAdb.
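The simplest form of this cross-check is an exact-sequence lookup. This sketch assumes you have downloaded known piRNA sequences from such a database as a FASTA file; the file name "known_pirnas.fa" is hypothetical. It then splits your candidates into known and novel sequences.

```python
def read_fasta_sequences(lines):
    """Collect sequences from FASTA lines (multi-line records allowed), normalized to uppercase DNA."""
    seqs, current = set(), []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.add("".join(current).upper().replace("U", "T"))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.add("".join(current).upper().replace("U", "T"))
    return seqs


def split_known_novel(candidates, known):
    """Exact-match lookup of candidate sequences against a set of known sequences."""
    known_hits, novel = [], []
    for seq in candidates:
        (known_hits if seq.upper().replace("U", "T") in known else novel).append(seq)
    return known_hits, novel
```

Usage with the hypothetical file would be `known = read_fasta_sequences(open("known_pirnas.fa"))`. Because piRNA 3' ends are heterogeneous, exact matching will miss isoforms that differ by a few 3' nucleotides.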
Visualize Results
- Use IGV (Integrative Genomics Viewer) to visualize piRNA alignment and clusters on the genome.
- Generate heatmaps or circos plots to present piRNA distributions.

Step 8: Share and Publish Findings

Archive Data
Submit sequencing data to public repositories like SRA or GEO with metadata specifying piRNA-related experiments.
Publish Results
Share findings in journals or conferences, emphasizing novel piRNA candidates, target genes, or regulatory mechanisms.

Conclusion

Detecting piRNAs combines computational and analytical methods to identify these distinctive small RNAs and their roles in gene regulation and transposable element suppression. The steps above outline a practical workflow, from raw reads to annotated clusters and candidate targets, that can contribute to a better understanding of their biological significance.