<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Bash script to discover piRNA in transcriptome data]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data</guid>
	<pubDate>Fri, 13 Dec 2024 11:47:00 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data</link>
	<title><![CDATA[Bash script to discover piRNA in transcriptome data]]></title>
	<description><![CDATA[<code>#!/bin/bash

# Abort on the first failed command, on unset variables, and on failures inside pipelines
set -euo pipefail

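# Variables (adjust these for your setup)
INPUT_FASTQ=&quot;input_reads.fastq&quot;  # single-end small RNA-seq reads
ADAPTER_SEQ=&quot;TGGAATTCTCGGGTGCCAAGG&quot;  # Illumina small RNA 3&#039; adapter; replace it if your library kit differs
REFERENCE_GENOME=&quot;reference_genome.fa&quot;  # genome FASTA (used by the cluster step)
BOWTIE_INDEX=&quot;reference_index&quot;  # basename of a Bowtie (v1) index built from REFERENCE_GENOME
OUTPUT_DIR=&quot;piRNA_analysis&quot;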
THREADS=4
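
# Optional pre-flight check (a sketch added for convenience, not part of the
# original recipe): stop early if any tool called below is missing from PATH.
for tool in fastqc cutadapt bowtie samtools bedtools seqkit; do
    if ! command -v &quot;$tool&quot; &gt;/dev/null 2&gt;&amp;1; then
        echo &quot;Error: required tool &#039;$tool&#039; was not found in PATH&quot; &gt;&amp;2
        exit 1
    fi
done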

# Create output directory
mkdir -p &quot;$OUTPUT_DIR&quot;

# Step 1: Quality Control
echo &quot;Running FastQC for quality control...&quot;
fastqc &quot;$INPUT_FASTQ&quot; -o &quot;$OUTPUT_DIR&quot;

# Step 2: Adapter Trimming
echo &quot;Trimming adapters with Cutadapt...&quot;
cutadapt -a &quot;$ADAPTER_SEQ&quot; -o &quot;$OUTPUT_DIR/trimmed_reads.fastq&quot; &quot;$INPUT_FASTQ&quot;
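
# Optional (an assumption, not part of the original recipe): build the Bowtie
# index from REFERENCE_GENOME when none exists yet. bowtie-build writes
# .ebwt files for small indexes and .ebwtl files for large ones.
if [ ! -e &quot;$BOWTIE_INDEX.1.ebwt&quot; ] &amp;&amp; [ ! -e &quot;$BOWTIE_INDEX.1.ebwtl&quot; ]; then
    echo &quot;Building Bowtie index...&quot;
    bowtie-build &quot;$REFERENCE_GENOME&quot; &quot;$BOWTIE_INDEX&quot;
fi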

# Step 3: Mapping Reads to Reference Genome
echo &quot;Mapping reads to reference genome using Bowtie...&quot;
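# -v 1: allow at most one mismatch; -k 1 --best: report one alignment per read, taken from the best (fewest-mismatch) alignments
bowtie -v 1 -k 1 --best -p &quot;$THREADS&quot; &quot;$BOWTIE_INDEX&quot; &quot;$OUTPUT_DIR/trimmed_reads.fastq&quot; -S &quot;$OUTPUT_DIR/aligned_reads.sam&quot;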

# Step 4: Convert SAM to BAM and Sort
echo &quot;Converting SAM to BAM and sorting...&quot;
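# -F 4 drops unaligned reads, which Bowtie includes in its SAM output by default
samtools view -b -F 4 &quot;$OUTPUT_DIR/aligned_reads.sam&quot; | samtools sort -o &quot;$OUTPUT_DIR/sorted_reads.bam&quot; -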

# Step 5: Extract Reads of piRNA Size (24-32 nt)
echo &quot;Filtering reads of size 24-32 nt...&quot;
bedtools bamtofastq -i &quot;$OUTPUT_DIR/sorted_reads.bam&quot; -fq &quot;$OUTPUT_DIR/all_reads.fastq&quot;
seqkit seq -m 24 -M 32 &quot;$OUTPUT_DIR/all_reads.fastq&quot; &gt; &quot;$OUTPUT_DIR/piRNA_size_reads.fastq&quot;
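
# Optional sanity check (an addition, not part of the original recipe):
# seqkit stats prints the read count and the min/avg/max read length, so the
# lengths reported here should all lie within 24-32 nt.
seqkit stats &quot;$OUTPUT_DIR/piRNA_size_reads.fastq&quot;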

# Step 6: Export Sequences for Sequence-Bias Analysis (Optional)
echo &quot;Writing piRNA-size sequences as FASTA for sequence-bias analysis (e.g. with WebLogo)...&quot;
seqkit fx2tab &quot;$OUTPUT_DIR/piRNA_size_reads.fastq&quot; | cut -f2 | awk &#039;{print &quot;&gt;seq&quot;NR&quot;\n&quot;$0}&#039; &gt; &quot;$OUTPUT_DIR/piRNA_sequences.fa&quot;
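
# Optional (a sketch, not part of the original recipe): tabulate the first
# nucleotide of each read. Many piRNAs begin with uridine, which appears as
# T in sequencing reads, so a strong T excess at position 1 is a useful sign.
awk &#039;!/^&gt;/ { n[substr($0, 1, 1)]++; total++ }
     END { for (b in n) printf &quot;%s\t%d\t%.1f%%\n&quot;, b, n[b], 100 * n[b] / total }&#039; &quot;$OUTPUT_DIR/piRNA_sequences.fa&quot;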

# Step 7: Identify piRNA Clusters
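# This step needs a dedicated cluster-detection tool such as proTRAC or PIRANHA and is not run by this script.
echo &quot;Skipping piRNA cluster identification (requires proTRAC or PIRANHA; see the commented example in this script)...&quot;
# Example placeholder with proTRAC (option names and accepted input format are illustrative; check the proTRAC documentation for your version):
# proTRAC.pl -s &quot;$OUTPUT_DIR/sorted_reads.bam&quot; -g &quot;$REFERENCE_GENOME&quot; -o &quot;$OUTPUT_DIR/clusters&quot;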

# Step 8: Annotate Clusters
# Annotation depends on the annotation file available for your genome.
# Example placeholder with bedtools intersect (-wa -wb keeps both the cluster and the overlapping annotation record):
# bedtools intersect -wa -wb -a clusters.bed -b genome_annotation.gtf &gt; annotated_clusters.bed

# Step 9: Clean up intermediate files (optional)
echo &quot;Cleaning up intermediate files...&quot;
rm -f &quot;$OUTPUT_DIR/aligned_reads.sam&quot; &quot;$OUTPUT_DIR/all_reads.fastq&quot;

# Done
echo &quot;piRNA discovery pipeline completed! Results are in $OUTPUT_DIR.&quot;</code>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>

</channel>
</rss>