Question: How to determine @RG tags from a FastQ header?

Nadia Baig
3175 days ago

I downloaded an SRA file and converted it into paired-end FastQ files whose headers look like this:

@HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT

The head and tail of the files are as follows:

Head:

==> ERR042057_1.fastq <==
@HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT
TNACTGTTTCTCTAAAACCTTACAAGAAAAACTAAGCTTCTCTAAACTTGTATTCATTATGGGAGNATGNCA
+HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT
C)CCCGGGGGIIIIIIIIIIIIIHIIIIIIIIIIIGIHIIIIIHHIIIHIHIIHIIHIIHAAA?B#######
@HWUSI-EAS754_0001:4:1:6122:1035#GCCAAT
ANGTTGTCTAAAGTAATAACTCCATTGGCTATTATTTTCTAAACATCTAAAACAAAACTTAACTANCGANAT
+HWUSI-EAS754_0001:4:1:6122:1035#GCCAAT
B(B?BGGGGGIIIICIIIIIIIIIIIIIIIIHIHIIIIIIIIIIIIGIIIIIIFIIIIGIEEECE#######
@HWUSI-EAS754_0001:4:1:6654:1035#GCCAAT
TNTTTAAATAAAAGGTTCTGTATCTTGATCCTGAAGTATCCTGTAATAGCCTACAGTGAAAAGAANACANTT

==> ERR042057_2.fastq <==
@HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT
AAGAAATTTATTCCTGCAGATCAATATTTCCCGCAACCACTTGAGAATTTCTGTGCTACTAAGCATTTGATG
+HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT
IIHHHIIIIHIHIIHGIIIIIIGIHGGG4D<??:@CC?C@CGEDFGEEEEGH>EH<BEEFED8CFBEBEB?#
@HWUSI-EAS754_0001:4:1:6122:1035#GCCAAT
AGAAACTGAGCTGAAGAGGGTAAAAGTCCTTGGCTCAGGTGCTTTTGGAACGGTTTATAAAGTAAGTAAAAA
+HWUSI-EAS754_0001:4:1:6122:1035#GCCAAT
HHGHHHHHHHHHHHHHFHHHHHHFFHEHHHGHGH<DBDB=DDDDDBFCDEB><BB>CBDB=D?BE?8A?A37
@HWUSI-EAS754_0001:4:1:6654:1035#GCCAAT
CTAAACCCCCCCCCACCCCACCCCCCCACCACACCAACCCCACCCACACACCCCAACCACCCTCACACTCTC



Tail:

==> ERR042057_1.fastq <==
+HWUSI-EAS754_0001:4:120:19005:21176#GCCAAT
G>D>DBDDGBGGE@>EGGGGBG;GGGGDBG##########################################
@HWUSI-EAS754_0001:4:120:19024:21176#GCCAAT
GAGAAAGAATTCAAACTGATTTTTCTTTTCTTNNNNNNNNNNNNGGGCACTNNNNNNNNNNNTGGCCCTCCT
+HWUSI-EAS754_0001:4:120:19024:21176#GCCAAT
IIIEIIIHIIIHIIIIIIEIHIIIIHHIII##########################################
@HWUSI-EAS754_0001:4:120:19130:21175#GCCAAT
CGGGGAAGAGCGCCAGCACCGAGGTGCCAGGTNNNNNNNNNNNNGCGGAGAANNNNNNNNNNATCATGCAGT
+HWUSI-EAS754_0001:4:120:19130:21175#GCCAAT
HIGIIHIIIIIIIHDIIHIIGGII@DD?BD##########################################

==> ERR042057_2.fastq <==
+HWUSI-EAS754_0001:4:120:19005:21176#GCCAAT
?AA6A?A?A?A@@<A#########################################################
@HWUSI-EAS754_0001:4:120:19024:21176#GCCAAT
ATTAGCAGACATATATCTTTTCTCTGAAATCTAAATACTTGCAGAAATACTAATTTTCATTTTATATTATGT
+HWUSI-EAS754_0001:4:120:19024:21176#GCCAAT
HGHHFFEHEHHHFHHBDBH:DGGGEHFBHHHDGHHHDHDHBGGGEGGEGGGBGGG(+:;;F<=F@;FE?EGG
@HWUSI-EAS754_0001:4:120:19130:21175#GCCAAT
CAAAGCAGGCCCCACGAGGCTGGCTGCGTGCGGGGTGCTCACCCGTGGCCGGTCCTGCGGGGCCCGCTGATC
+HWUSI-EAS754_0001:4:120:19130:21175#GCCAAT
HFHHHHHHEHHHHHHA3E8GD>GGBBEEBEGG<G>?AD>GGGEEGD>G@DGD3D##################

How do I determine the @RG values?

I'm using the following command to add the @RG tag:

java -jar picard.jar AddOrReplaceReadGroups I=input.bam O=output.bam RGID=? RGLB=? RGPL=illumina RGPU=? RGSM=?

What should the values be for the following @RG parameters?

RGID=?

RGLB=?

RGPL=illumina

RGPU=?

RGSM=?

The link to the SRA data is:

http://www.ncbi.nlm.nih.gov/sra/ERX019190[accn

Thank you in advance.

Answers

To see the read group information for a BAM file, use the following command.

samtools view -H sample.bam | grep '@RG'
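An @RG header line is a tab-separated list of TAG:VALUE pairs, so its output can be hard to read at a glance. As a rough sketch, the helper below splits one such line into one tag per row. The function name split_rg and the sample line are made up for illustration; they are not taken from the asker's BAM file.

```shell
# Hypothetical @RG header line (tab-separated), for illustration only.
rg_line=$(printf '@RG\tID:1\tLB:lib1\tPL:illumina\tPU:unit1\tSM:sample1')

# Print each two-letter TAG:VALUE pair on its own line as "TAG = VALUE".
# The leading "@RG" token does not match the pattern and is skipped.
split_rg() {
    printf '%s\n' "$1" | tr '\t' '\n' | sed -n 's/^\([A-Z][A-Z]\):\(.*\)$/\1 = \2/p'
}

split_rg "$rg_line"
```

On a real file, the line printed by the samtools command above could be passed to the same function instead of the sample line.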

Options of AddOrReplaceReadGroups:

INPUT (String): Input file (BAM or SAM or a GA4GH URL). Required.
OUTPUT (File): Output file (BAM or SAM). Required.
SORT_ORDER (SortOrder): Optional sort order to output in. If not supplied, OUTPUT is in the same order as INPUT. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate}.
RGID (String): Read Group ID. Default value: 1. This option can be set to 'null' to clear the default value.
RGLB (String): Read Group library. Required.
RGPL (String): Read Group platform (e.g. illumina, solid). Required.
RGPU (String): Read Group platform unit (e.g. run barcode). Required.
RGSM (String): Read Group sample name. Required.
RGCN (String): Read Group sequencing center name. Default value: null.
RGDS (String): Read Group description. Default value: null.
RGDT (Iso8601Date): Read Group run date. Default value: null.
RGPI (Integer): Read Group predicted insert size. Default value: null.
RGPG (String): Read Group program group. Default value: null.
RGPM (String): Read Group platform model. Default value: null.
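
The header in the question follows the older Illumina layout, instrument:lane:tile:x:y#index. It therefore carries an instrument name field (HWUSI-EAS754_0001), a lane (4) and an index sequence (GCCAAT), but no sample or library name. The sketch below pulls those fields out and prints candidate values. Building RGID and RGPU from instrument and lane is only one common convention, assumed here; Picard does not require it. RGSM and RGLB cannot be read from the header and would have to come from the SRA/ENA record for the run. The function name rg_from_header is hypothetical.

```shell
# Split an old-style Illumina FastQ header of the form
#   @instrument:lane:tile:x:y#index
# and print suggested read-group values. The ID/PU layout is an assumed
# convention, not a Picard requirement.
rg_from_header() {
    header=${1#@}             # drop the leading '@'
    index=${header#*\#}       # text after '#', e.g. GCCAAT
    index=${index%%/*}        # drop a trailing /1 or /2 if present
    fields=${header%%\#*}     # text before '#'
    instrument=${fields%%:*}  # first colon-separated field
    rest=${fields#*:}         # everything after the first colon
    lane=${rest%%:*}          # second colon-separated field
    echo "RGID=${instrument}.${lane}"
    echo "RGPU=${instrument}.${lane}.${index}"
    echo "RGPL=illumina"
}

rg_from_header '@HWUSI-EAS754_0001:4:1:5605:1034#GCCAAT'
```

The printed values could then be supplied to the AddOrReplaceReadGroups command from the question, alongside an RGSM and RGLB chosen from the run's metadata. Every header shown in the question uses lane 4, which suggests a single read group, though that is worth checking across the whole file.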


Thank you.