<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: To convert just one specific read group to fastq]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/41025/to-convert-just-one-specific-read-group-to-fastq?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/41025/to-convert-just-one-specific-read-group-to-fastq?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/41025/to-convert-just-one-specific-read-group-to-fastq</guid>
	<pubDate>Fri, 14 Feb 2020 03:35:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/41025/to-convert-just-one-specific-read-group-to-fastq</link>
	<title><![CDATA[To convert just one specific read group to fastq]]></title>
	<description><![CDATA[<code># Stop script on error.
set -uex

# The BioProject accession for the sequencing data.
PROJECT=PRJNA257197

# The number of datasets to subselect from the project.
N=5

# Get the project run information.
esearch -db sra -query $PROJECT  | efetch -format runinfo &gt; runinfo.txt

# Select the first N elements. Keep only valid SRR numbers.
cat runinfo.txt | cut -f 1 -d , | grep SRR | head -$N &gt; selected.txt

# Store the data in the reads folder.
mkdir -p reads

# Download the first 1000 spots of each selected SRR run, split into paired files.
cat selected.txt | parallel fastq-dump -O reads -X 1000 --split-files {}

# Create a directory for bam files
mkdir -p bam

# Generate a separate BAM file for each SAMPLE.
cat selected.txt | parallel &quot;picard FastqToSam F1=reads/{}_1.fastq F2=reads/{}_2.fastq O=bam/{}.bam RG=GROUP-{} LB=LIB-{} SM=SAMPLE_{} QUIET=true 2&gt;&gt; log.txt&quot;

# Merge all the BAM files into one.
samtools merge -f all.bam bam/*.bam

# Investigate the read groups in the header.
echo &quot;&quot;
echo &quot;SAM file header:&quot;
samtools view -H all.bam

echo &quot;&quot;
echo &quot;Number of alignments with read group: GROUP-SRR1972919&quot;
samtools view -c -r GROUP-SRR1972919 all.bam

# To reverse the process, extract the reads into paired FASTQ files; -t copies the read group tag into each FASTQ header.
samtools fastq -t -1 all1.fq -2 all2.fq all.bam

# To convert just one specific read group.
samtools view -b -r GROUP-SRR1972919 all.bam | samtools fastq -t -1 all_SRR1972919_1.fq -2 all_SRR1972919_2.fq -</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>