BOL: Related items

Shasta long read assembler

Jit — Tue, 14 Jan 2020 06:47:07 -0600

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using as input DNA reads generated by Oxford Nanopore flow cells.

Computational methods used by the Shasta assembler include:

Using a run-length representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.
Using in some phases of the computation a representation of the read sequence based on markers, a fixed subset of short k-mers (k ≈ 10).
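
To make the two ideas above concrete, here is a minimal Python sketch. It is not Shasta's implementation: the function names, the toy reads, the marker set, and k = 3 are all assumptions chosen for illustration (Shasta uses k ≈ 10). It shows that two reads differing only in homopolymer lengths collapse to the same run-length encoded base string, and how markers could be located in that string.

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse each homopolymer run into one base plus a repeat count."""
    bases = []
    counts = []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(sum(1 for _ in run))
    return "".join(bases), counts


def find_markers(rle_bases, marker_set, k):
    """Return (position, k-mer) for each k-mer of the encoded read that is a marker."""
    return [
        (i, rle_bases[i:i + k])
        for i in range(len(rle_bases) - k + 1)
        if rle_bases[i:i + k] in marker_set
    ]


# Hypothetical reads: the second one miscounts two homopolymer runs.
true_read = "GATTTTACCAGGG"
noisy_read = "GATTTACCAGGGG"

bases_true, counts_true = run_length_encode(true_read)
bases_noisy, counts_noisy = run_length_encode(noisy_read)

print(bases_true, counts_true)    # GATACAG [1, 1, 4, 1, 2, 1, 3]
print(bases_noisy, counts_noisy)  # GATACAG [1, 1, 3, 1, 2, 1, 4]

# Toy marker set; a real marker set would be a fixed subset of all k-mers.
print(find_markers(bases_true, {"ATA", "CAG"}, k=3))  # [(1, 'ATA'), (4, 'CAG')]
```

Only the repeat counts differ between the two reads, so any step that compares the encoded base strings is unaffected by the homopolymer errors.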

More at https://chanzuckerberg.github.io/shasta/index.html

Address of the bookmark: https://github.com/chanzuckerberg/shasta

Understanding your reads and mapping!

Neel — Wed, 29 Jan 2020 06:29:55 -0600

One of the best tutorials for beginners:

https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2017/Day1/Session4-seqIntro.html

Address of the bookmark: https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2017/Day1/Session4-seqIntro.html

Scallop: reference-based transcriptome assembler for RNA-seq

Rahul Nayak — Tue, 08 May 2018 04:23:27 -0500

Scallop is an accurate reference-based transcript assembler. It is particularly accurate at assembling multi-exon transcripts and lowly expressed transcripts. Scallop achieves this through a novel algorithm that provably preserves all phasing paths from reads and paired-end reads while also achieving transcript parsimony and minimizing coverage deviation.

The Scallop paper has been published in Nature Biotechnology. The datasets and scripts used in the paper to compare the performance of Scallop and other assemblers are available at scalloptest.

Please also check out the podcast about Scallop (thanks to Roman Cheplyaka for the interview). It is available at both the bioinformatics chat and iTunes.

https://github.com/Kingsford-Group/scallop

Address of the bookmark: https://github.com/Kingsford-Group/scallop

Modular, efficient and constant-memory single-cell RNA-seq preprocessing

Jit — Mon, 05 Apr 2021 11:19:43 -0500

With kallisto | bustools you can

Generate a cell x gene or cell x transcript equivalence class count matrix
Perform RNA velocity and single-nuclei RNA-seq analysis
Quantify data from numerous technologies such as 10x, inDrops, and Dropseq.
Customize workflows for new technologies and protocols.
Process feature barcoding data such as CITE-seq, REAP-seq, MULTI-seq, Clicktags, and Perturb-seq.
Obtain QC reports from single-cell RNA-seq data
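
As a rough illustration of what a cell x gene count matrix contains, the sketch below counts distinct UMIs per (cell barcode, gene) pair. It is not how kallisto or bustools are implemented; the record layout, the function name, and the toy barcodes, UMIs, and gene names are assumptions made for this example.

```python
from collections import defaultdict


def count_matrix(records):
    """Count distinct UMIs for every (cell barcode, gene) pair.

    records: iterable of (barcode, umi, gene) tuples, one per assigned read.
    Returns a nested dict: {barcode: {gene: umi_count}}.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in records:
        umis[(barcode, gene)].add(umi)

    matrix = defaultdict(dict)
    for (barcode, gene), seen in umis.items():
        matrix[barcode][gene] = len(seen)
    return dict(matrix)


# Hypothetical reads; the first two share barcode, UMI, and gene (a duplicate).
records = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "CCA", "GeneA"),
    ("AAAC", "GGT", "GeneB"),
    ("CCGT", "TTG", "GeneA"),
]
matrix = count_matrix(records)
print(matrix)
```

Reads that share a barcode, UMI, and gene are counted once, which is the basic idea behind UMI-based quantification.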

The kallisto | bustools workflow is described in:

Páll Melsted*, A. Sina Booeshaghi*, Lauren Liu, Fan Gao, Lambda Lu, Kyung Hoi (Joseph) Min, Eduardo da Veiga Beltrame, Kristján Eldjárn Hjörleifsson, Jase Gehring & Lior Pachter† Modular and efficient pre-processing of single-cell RNA-seq, Nature Biotechnology (2021).

Documentation and tutorials for the kallisto | bustools workflow are available at http://pachterlab.github.io/kallistobustools.

https://www.nature.com/articles/s41587-021-00870-2

Address of the bookmark: https://pachterlab.github.io/kallistobustools/

New RNA Seq tool

Rahul Agarwal — Fri, 25 Apr 2014 10:59:04 -0500

"By removing the time-consuming step of read mapping, the authors reported, Sailfish is able to provide quantification estimates 20–30 times faster than current methods without loss of accuracy."

Tool link:

http://www.cs.cmu.edu/~ckingsf/software/sailfish/

Address of the bookmark: http://www.genengnews.com/gen-news-highlights/lightweight-algorithms-sail-through-rna-sequencing-data/81249765/

Strand Life Sciences announces the release of Strand NGS v3.1 at ASHG 2017

Yeshodari — Mon, 23 Oct 2017 02:36:05 -0500

ORLANDO, USA, Oct 17, 2017 /PRNewswire/

Strand NGS now supports large scale RNA- and small-RNA-Seq and Unique Molecular Identifiers (UMIs) for DNA-, RNA-, and small-RNA-Seq.

Strand Life Sciences announced the latest version release of its bioinformatics flagship product, Strand NGS, at the Annual Meeting of the American Society of Human Genetics today. Two major themes in Strand NGS v3.1 address recent challenges in next generation sequencing (NGS).

The first theme is large-scale RNA-Seq data analysis. Current cross-cohort RNA- and small-RNA-Seq studies span tens of replicates and batches across hundreds of samples, sometimes conducted across several different institutions. For such studies, Strand NGS v3.1 includes confounding variable analysis to eliminate technical effects, including batch effects; the t-SNE plot; profile and heat-map plots of gene-body coverage; and several other notable visual enhancements.

The second new feature is support for Unique Molecular Identifiers, or UMIs, for DNA-, RNA- and small-RNA-Seq. UMI support in Strand NGS is end-to-end, spanning alignment to variant calling in DNA-Seq, and alignment to quantification in RNA- and small-RNA-Seq. The Bioo Scientific, Qiagen, and Rubicon UMI protocols are natively supported, and an intuitive interface allows the specification of custom UMI protocols.

“For liquid biopsies and low-grade FFPE samples, UMI support in DNA-Seq enables the detection of somatic variants at low concentrations. In RNA-Seq, large-scale and UMI support can be used in single-cell-based studies that reveal tumor-cell heterogeneity, even at low concentrations”, says Dr. Vamsi Veeramachaneni, Chief Scientific Officer, Strand Life Sciences.

“At Strand, we are continuously working towards improving the accuracy and efficiency of NGS data analysis. Customers can look forward to Strand NGS becoming available on the cloud in the near future”, says Dr. Ramesh Hariharan, Chief Executive Officer, Strand Life Sciences.

Visit Strand Life Sciences at ASHG booth #1017 to learn more about Strand NGS v3.1 and other products and service offerings from Strand Life Sciences.

About Strand Life Sciences

Strand Life Sciences is a premier life science informatics innovation company. Founded in 2000, Strand is a leader in technology innovations for healthcare using genomics. By enhancing sequence-based diagnostics and clinical genomic data interpretation using a strong foundation of computational, scientific, and medical expertise, Strand is bringing individualized medicine to the world. To know more, visit www.strandls.com

Address of the bookmark: http://www.strand-ngs.com/strand-announce-strandngss-v31

proActiv: Estimation of Promoter Activity from RNA-Seq data

BioStar — Thu, 13 Aug 2020 10:21:44 -0500

proActiv is an R package that estimates promoter activity from RNA-Seq data. It takes aligned reads and genome annotations as input and provides absolute and relative promoter activity as output. The package can be used to identify active promoters and alternative promoters. The details of the method are described in Demircioglu et al.

Additional data on differential promoters in tissues and cancers from TCGA, ICGC, GTEx, and PCAWG can be downloaded here: https://jglab.org/data-and-software/

Address of the bookmark: https://github.com/GoekeLab/proActiv

Exploring RNA Sequence Analysis: Tools for Every Bioinformatician

Neel — Fri, 13 Dec 2024 04:03:04 -0600

RNA sequence analysis has become an essential part of modern biological research. From RNA-seq pipelines to specialized tools for specific RNA types, here's a comprehensive guide to tools you can use to make sense of RNA data.

1. RNA-Seq Analysis Pipelines

RNA-seq is one of the most popular techniques for studying RNA. These tools streamline processing raw sequence data:

FastQC: For quality control of raw RNA-seq reads.
Trimmomatic: For trimming and filtering RNA-seq reads.
HISAT2/STAR: High-performance aligners for RNA-seq reads.
featureCounts: For quantifying gene expression.
DESeq2/edgeR: For differential expression analysis.

2. Transcriptome Assembly and Annotation

For analyzing transcriptomes from non-model organisms or assembling novel transcripts:

Trinity: For de novo transcriptome assembly.
StringTie: For transcript assembly and quantification from RNA-seq alignments.
TransDecoder: To predict coding regions within assembled transcripts.
TAU: Tools for annotating non-coding and coding RNAs.

3. Exploring Non-Coding RNA (ncRNA)

Non-coding RNAs play critical regulatory roles. Dedicated tools for studying them include:

Infernal: For identifying ncRNA sequences based on covariance models.
Rfam: Database and tools for ncRNA families.
miRDeep: For identifying microRNAs in RNA-seq datasets.

4. RNA Structure and Motif Analysis

Structural biology of RNA helps in understanding its function:

RNAfold (ViennaRNA): Predicts secondary structures from RNA sequences.
RNAstructure: Tools for RNA secondary structure prediction and analysis.
MEME Suite: For identifying motifs in RNA sequences.
IntaRNA: For RNA-RNA interaction prediction.

5. RNA Editing and Modifications

Epitranscriptomics is a growing field focusing on RNA modifications:

REDItools: For RNA editing analysis.
m6Aboost: For identifying m6A modifications in RNA.

6. Long-Read RNA Sequencing Analysis

Long-read technologies like Nanopore and PacBio are transforming RNA research:

FLAIR: For isoform-level analysis of long-read RNA-seq data.
NanoMod: For detecting modifications in RNA from Nanopore sequencing.

7. RNA-Protein Interactions

To study RNA-protein interactions and complexes:

RBPmap: For identifying RNA-binding protein motifs.
PARalyzer: For analyzing PAR-CLIP data.

8. Functional Enrichment Analysis

Understanding biological functions and pathways from RNA-seq data:

getENRICH: A tool designed for pathway enrichment analysis of non-model organisms (hypergeometric P-value calculation with FDR correction).
clusterProfiler: For GO and KEGG pathway enrichment analysis.
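
The hypergeometric P-value with FDR correction mentioned above can be sketched in a few lines of Python. This is a generic illustration of an over-representation test followed by the Benjamini-Hochberg procedure, not the code of getENRICH or clusterProfiler; the function names and example numbers are assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes among n selected genes,
    when K of the N background genes belong to the pathway."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 40 of 20000 background genes are in a pathway,
# and 8 of them appear among 200 differentially expressed genes.
p = hypergeom_pvalue(k=8, n=200, K=40, N=20000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In practice one p-value is computed per pathway and the whole list is adjusted together, so the correction depends on how many pathways were tested.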

9. Visualization and Data Sharing

Presenting and sharing RNA sequence analysis results effectively:

IGV: Genome browser for visualizing RNA-seq alignments.
Circos: Circular visualization of RNA-seq data.
Dash Bio: A Python library for creating bioinformatics visualizations.

Conclusion

The bioinformatics landscape for RNA sequence analysis is vast, with tools catering to specific needs. Whether you’re studying coding RNAs, non-coding RNAs, or exploring RNA-protein interactions, the right tools can transform your data into biological insights.

Webinar on RNA-Seq Data Analysis on 28 Feb 2018

Strand — Thu, 22 Feb 2018 06:38:48 -0600

Strand NGS is a biologist-friendly NGS analysis tool with an intuitive workflow for analyzing and visualizing RNA-Seq data. This webinar will give an overview of the workflow, which includes transcriptome/genome alignment, differential expression analysis, and detection of splicing events and gene fusions. Strand NGS also supports novel discovery, such as the identification of novel genes, exons, and splice junctions.
We will highlight Strand NGS features used in the RNA-Seq workflow, such as PCA, sample correlation, clustering, Venn diagrams, CVA, UMI support, and the elastic genome browser, which also support large-scale RNA-Seq data analysis. The tool supports biological contextualization of interesting gene sets through downstream analyses such as GO and pathway analysis. Time-consuming jobs can be set up as pipelines, which automates the analysis and leaves more time for data interpretation.

Details:
Session 1: 28 Feb 2018, 9 AM CET
Session 2: 28 Feb 2018, 8 AM PST
Register here: http://www.strand-ngs.com/webinar_registration

About Speaker:

Dr. Suman Kapoor, Manager, Application Science at Strand Life Sciences, has over 11 years of experience in molecular biology, next-generation sequencing based testing, clinical genomics, and personalized medicine for disease management and prenatal testing. Dr. Suman holds a Ph.D. in Molecular and Cell Biology from the Indian Institute of Science, Bangalore. Prior to joining the Strand NGS team, Suman worked extensively on protein synthesis in eubacteria and gained experience in a CAP and NABL accredited lab validating and interpreting NGS-based diagnostic tests.

Step-by-Step Guide to Detect piRNAs Using Bioinformatics

Abhi — Fri, 13 Dec 2024 11:41:46 -0600

Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs that play crucial roles in silencing transposable elements and regulating gene expression, particularly in germline cells. Detecting piRNAs involves identifying their unique characteristics, such as size, sequence motifs, and association with Piwi proteins, from high-throughput RNA sequencing data.

This blog provides a comprehensive step-by-step guide to detect piRNAs using bioinformatics tools and workflows.

Step 1: Prepare Your Data

Obtain RNA Sequencing Data
Acquire raw small RNA-seq data in FASTQ format. Datasets can be sourced from repositories like NCBI SRA, EMBL-EBI, or specific small RNA sequencing projects.
Quality Control (QC)
Use FastQC to assess the quality of raw reads:

fastqc reads.fastq

Evaluate the per-base quality, adapter content, and overrepresented sequences.
Trimming and Adapter Removal
Use tools like Cutadapt or Trim Galore! to remove adapters and low-quality bases:

cutadapt -a TGGAATTCTCGGGTGCCAAGG -o trimmed_reads.fastq reads.fastq

Ensure the remaining reads are of high quality for downstream analysis.
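
To show what 3' adapter removal does conceptually, here is a simplified Python sketch. It is not Cutadapt's algorithm: Cutadapt tolerates mismatches and indels, while this version only handles exact matches. The function name, the `min_overlap` parameter, and the example insert sequence are assumptions for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove an exact 3' adapter match from a read.

    Handles a full adapter anywhere in the read, and an adapter that is
    cut off by the end of the read (at least min_overlap bases long).
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Check whether the read ends with a prefix of the adapter, longest first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


adapter = "TGGAATTCTCGGGTGCCAAGG"
insert = "TAGCTTATCAGACTGATGTTGACTGA"  # hypothetical 26 nt small RNA

print(trim_3prime_adapter(insert + adapter, adapter) == insert)      # full adapter
print(trim_3prime_adapter(insert + adapter[:8], adapter) == insert)  # truncated adapter
```

After trimming, the read length reflects the length of the original small RNA, which is what the size filter in Step 3 relies on.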

Step 2: Map Reads to the Genome

Mapping reads to the reference genome is crucial for identifying piRNA loci.

Reference Genome Preparation
Download the genome assembly of your organism from databases like Ensembl, UCSC Genome Browser, or NCBI.
Align Reads
Use Bowtie or STAR for small RNA alignment:

bowtie -v 1 -k 1 --best genome_index trimmed_reads.fastq -S aligned_reads.sam
- -v 1: Allows at most one mismatch per alignment.
- -k 1: Reports at most one alignment per read; combined with --best, this is the best alignment found.
Convert SAM to BAM
Convert and sort alignments using SAMtools:

samtools view -Sb aligned_reads.sam | samtools sort -o sorted_reads.bam

Step 3: Identify Small RNAs

piRNAs are characterized by their size (24–32 nt) and strand bias.

Extract Reads by Size
Use tools like BEDtools or custom scripts to filter reads between 24 and 32 nt:

bedtools bamtofastq -i sorted_reads.bam -fq all_reads.fastq
seqkit seq -m 24 -M 32 all_reads.fastq > piRNA_size_reads.fastq
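
If you prefer a custom script for the size filter, a minimal Python sketch could look like the following. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function and helper names are hypothetical, and the example reads are synthetic.

```python
import io


def filter_fastq_by_length(lines, min_len=24, max_len=32):
    """Yield 4-line FASTQ records whose sequence length is in [min_len, max_len]."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if min_len <= len(record[1]) <= max_len:
                yield tuple(record)
            record = []


def make_record(name, length):
    """Build a synthetic FASTQ record of the given read length."""
    return f"@{name}\n{'T' * length}\n+\n{'I' * length}\n"


# In-memory stand-in for a FASTQ file with reads of 21, 26, and 35 nt.
example = io.StringIO(
    make_record("short", 21) + make_record("pirna_like", 26) + make_record("long", 35)
)
kept = list(filter_fastq_by_length(example))
print([rec[0] for rec in kept])  # ['@pirna_like']
```

The same generator works on a real file by passing an open file handle instead of the `io.StringIO` object.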
Check for Sequence Bias
piRNAs often have a strong bias for a uridine at the 5’ end (1U bias). Use tools like WebLogo to visualize sequence motifs.
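
Before drawing a sequence logo, the 1U bias can be checked numerically. The sketch below simply tallies the first base of each read, treating T in the sequencing data as U. The function name and the toy sequences are assumptions for illustration.

```python
from collections import Counter


def first_base_fractions(sequences):
    """Return the fraction of reads starting with each base (T is reported as U)."""
    counts = Counter(seq[0].upper().replace("T", "U") for seq in sequences if seq)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


# Hypothetical reads: three of four start with T/U.
reads = ["TGCATTAGC", "TAAACGGTA", "UGGGATCCA", "ACGTACGTA"]
fractions = first_base_fractions(reads)
print(fractions)  # {'U': 0.75, 'A': 0.25}
```

A U fraction well above that of the other three bases among the 24–32 nt reads is consistent with a piRNA population.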

Step 4: Detect Ping-Pong Signature

The ping-pong amplification loop is a hallmark of piRNA biogenesis, characterized by a 10 nt overlap between piRNAs on opposite strands.

Generate Overlap Statistics
Use the piPipes tool or custom scripts to calculate overlap:

python ping_pong_overlap.py sorted_reads.bam
Visualize Overlap Distribution
Plot the distribution of overlaps to confirm the presence of the 10 nt ping-pong signature.
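
The script name `ping_pong_overlap.py` above stands for a custom script. One way such a script could compute the overlap distribution is sketched below. It is an assumption-laden illustration, not the piPipes method: alignments are given as (chromosome, start, end, strand) tuples with 0-based, half-open coordinates, and extracting them from the BAM file is left out.

```python
from collections import Counter


def ping_pong_histogram(alignments, max_overlap=20):
    """Count plus/minus read pairs by the overlap length of their 5' ends.

    A plus-strand read starting at position s and a minus-strand read ending
    at position s + d (exclusive) have 5' ends that overlap by d nucleotides.
    max_overlap should not exceed the shortest read length.
    """
    plus_starts = Counter()
    minus_ends = Counter()
    for chrom, start, end, strand in alignments:
        if strand == "+":
            plus_starts[(chrom, start)] += 1
        else:
            minus_ends[(chrom, end)] += 1

    hist = Counter()
    for (chrom, start), n_plus in plus_starts.items():
        for overlap in range(1, max_overlap + 1):
            n_minus = minus_ends.get((chrom, start + overlap), 0)
            if n_minus:
                hist[overlap] += n_plus * n_minus
    return hist


# Hypothetical alignments: one pair overlapping by 10 nt, one by 15 nt.
alignments = [
    ("chr1", 100, 126, "+"),
    ("chr1", 84, 110, "-"),
    ("chr1", 200, 226, "+"),
    ("chr1", 190, 215, "-"),
    ("chr2", 90, 110, "-"),  # different chromosome, never paired with chr1 reads
]
hist = ping_pong_histogram(alignments)
print(dict(hist))  # {10: 1, 15: 1}
```

With real data, a clear peak at an overlap of 10 relative to neighboring overlap lengths is the signature to look for.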

Step 5: Annotate piRNA Clusters

piRNAs are often generated from genomic clusters.

Cluster Identification
Use tools like proTRAC or PIRANHA to identify piRNA-producing clusters:

perl proTRAC.pl -map aligned_reads.sam -format SAM -genome genome.fa
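
To give an intuition for cluster identification, the sketch below merges nearby reads into candidate clusters. This is a deliberately naive approach and not the algorithm used by proTRAC; the function name, the `max_gap` and `min_reads` thresholds, and the example coordinates are assumptions.

```python
def call_clusters(alignments, max_gap=1000, min_reads=3):
    """Merge reads on the same chromosome that lie within max_gap of each other.

    alignments: iterable of (chrom, start, end) with 0-based, half-open coordinates.
    Returns a list of (chrom, start, end, read_count) for clusters that
    contain at least min_reads reads.
    """
    clusters = []
    current = None
    for chrom, start, end in sorted(alignments):
        if current and chrom == current[0] and start - current[2] <= max_gap:
            # Extend the open cluster.
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            # Close the previous cluster and start a new one.
            if current and current[3] >= min_reads:
                clusters.append(tuple(current))
            current = [chrom, start, end, 1]
    if current and current[3] >= min_reads:
        clusters.append(tuple(current))
    return clusters


# Hypothetical mapped piRNA-sized reads.
reads = [
    ("chr1", 100, 126), ("chr1", 150, 176), ("chr1", 900, 926),
    ("chr1", 50000, 50026),  # isolated read, too few reads to form a cluster
    ("chr2", 10, 36), ("chr2", 40, 66), ("chr2", 70, 96),
]
clusters = call_clusters(reads)
print(clusters)  # [('chr1', 100, 926, 3), ('chr2', 10, 96, 3)]
```

Dedicated tools add statistical criteria on top of this, such as read density and sequence features, so treat the output of a simple merge only as a first look.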
Annotate Genomic Regions
Annotate the identified clusters using gene annotation files (GTF/GFF). Tools like BEDtools intersect can help associate piRNA clusters with genes or transposable elements:

bedtools intersect -wa -wb -a clusters.bed -b genome_annotation.gtf > annotated_clusters.bed

Step 6: Functional Analysis

Functional analysis of piRNAs can uncover their targets and regulatory roles.

Predict piRNA Targets
Use tools like IntaRNA or RNAhybrid to predict interactions between piRNAs and potential target mRNAs:

RNAhybrid -t target_transcripts.fa -q piRNAs.fa > piRNA_targets.txt
Enrichment Analysis
Perform GO or KEGG enrichment analysis of target genes using tools like g:Profiler or DAVID.

Step 7: Validation and Visualization

Validate piRNA Candidates
Cross-check the identified piRNAs against known piRNA databases, such as piRBase or piRNAdb.
Visualize Results
- Use IGV (Integrative Genomics Viewer) to visualize piRNA alignment and clusters on the genome.
- Generate heatmaps or circos plots to present piRNA distributions.

Step 8: Share and Publish Findings

Archive Data
Submit sequencing data to public repositories like SRA or GEO with metadata specifying piRNA-related experiments.
Publish Results
Share findings in journals or conferences, emphasizing novel piRNA candidates, target genes, or regulatory mechanisms.

Conclusion

Detecting piRNAs involves a combination of computational and analytical methods to identify these unique small RNAs and their roles in gene regulation and transposable element suppression. By following this step-by-step guide, you can confidently navigate the complexities of piRNA detection and contribute to the growing understanding of their biological significance.