<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Converting FASTQ to FASTA]]></title>
	<link>https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta?</link>
	<atom:link href="https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta</guid>
	<pubDate>Fri, 12 Jan 2018 03:49:09 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta</link>
	<title><![CDATA[Converting FASTQ to FASTA]]></title>
	<description><![CDATA[<div id="block-system-main"><div><div><div><div><div><div><p>There are several ways you can convert fastq to fasta sequences. Some methods are listed below.</p><h3>Using SED</h3><p><span><code><span>sed</span></code></span>&nbsp;can be used to selectively print the desired lines from a file, so if you print the first and 2rd line of every 4 lines, you get the sequence header and sequence needed for fasta format.</p><pre>sed -n '1~4s/^@/&gt;/p;2~4p' INFILE.fastq &gt; OUTFILE.fasta
</pre><h3>Using PASTE</h3><p>You can linerize every 4 lines in a tabular format and print first and second field using&nbsp;<span><code>paste</code></span></p><pre>cat INFILE.fastq | paste - - - - |cut -f 1, 2| sed 's/@/&gt;/'g | tr -s "/t" "/n" &gt; OUTFILE.fasta
</pre><h3>EMBOSS:seqret</h3><p>Standard script that can be used for many purposes. One such use is fastq-fasta conversion</p><pre>seqret -sequence reads.fastq -outseq reads.fasta
</pre><p><span><code><span>awk</span></code></span>&nbsp;can be used for conversion as follows:</p><h3>Using AWK</h3><pre>cat infile.fq | awk '{if(NR%4==1) {printf("&gt;%s\n",substr($0,2));} else if(NR%4==2) print;}' &gt; file.fa
</pre><h3>FASTX-toolkit</h3><p><span><code>fastq_to_fasta</code></span>&nbsp;is available in the FASTX-toolkit that scales really well with the huge datasets</p><pre>fastq_to_fasta -h
usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]
# Remember to use -Q33 for illumina reads!
version 0.0.6
       [-h]         = This helpful help screen.
       [-r]         = Rename sequence identifiers to numbers.
       [-n]         = keep sequences with unknown (N) nucleotides.
                   Default is to discard such sequences.
       [-v]         = Verbose - report number of sequences.
                   If [-o] is specified,  report will be printed to STDOUT.
                   If [-o] is not specified (and output goes to STDOUT),
                   report will be printed to STDERR.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA output file. default is STDOUT.
</pre><h3>Bioawk</h3><p>Another option to convert fastq to fasta format using&nbsp;<span><code>bioawk</code></span></p><pre>bioawk -c fastx '{print "&gt;"$name"\n"$seq}' input.fastq &gt; output.fasta
</pre><h3>Seqtk</h3><p>From the same developer, there is another option using a tool called&nbsp;<span><code>seqtk</code></span></p><pre>seqtk seq -a input.fastq &gt; output.fasta
</pre><p>Note that you can use either compressed or uncompressed files for this tool</p></div></div></div></div></div></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

</channel>
</rss>