<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Bioinformatics OneLiner]]></title>
	<link>https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner?</link>
	<atom:link href="https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</guid>
	<pubDate>Tue, 10 Apr 2018 04:13:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</link>
	<title><![CDATA[Bioinformatics OneLiner]]></title>
	<description><![CDATA[<p>To remove all line ends (\n) from a Unix text file:</p><pre>sed ':a;N;$!ba;s/\n//g' filename.txt &gt; newfilename_oneline.txt</pre><p>To get average for a column of numbers (here the second column $2):</p><pre>awk '{ sum += $2; n++ } END { if (n &gt; 0) print sum / n; }'</pre><p>To get sequence length for all sequences in a fasta file:</p><pre>awk '/^&gt;/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \<br />filename.fasta</pre><p>To copy (move, rename, etc) files based on their list in a text file:</p><pre>cat file_list.txt | while read line; do cp "$line" complete_dataset/"$line"; done</pre><p>To split bam files into sets with mapped and unmapped reads:</p><pre>samtools view -F4 sample.bam &gt; sample.mapped.sam<br />samtools view -f4 sample.bam &gt; sample.unmapped.sam</pre><p>To gzip all your fastq files using gnu parallel and gzip:</p><pre>parallel gzip ::: *.fastq</pre><p>To gzip all your fastq files using pigz:</p><pre>pigz *.fastq</pre><p>To count all sequences in a fasta file:</p><pre>grep "^&gt;" yourfile.fasta -c</pre><p>To count all sequences in all fasta files in your current directory:</p><pre>for a in *.fasta; do ls $a; grep "^&gt;" -c $a; done</pre><p>To keep only one copy of duplicated lines:</p><pre>awk '!seen[$0]++'</pre><p>To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:</p><pre>grep "^&gt;" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc</pre><p>To remove everything after the first space at each line, e.g. to to simplify fasta headers:</p><pre>cut -d' ' -f1 &lt; your_file</pre><p>To count reads in a all .fastq.gz files in your current folder (fast, using gnu parallel):</p><pre>parallel "echo {} &amp;&amp; gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz</pre><p>To count reads in a all .fastq.gz files in your current folder:</p><pre>zcat *.gz | echo $((`wc -l`/4))</pre><p>To count reads in a all .fastq files in your current folder:</p><pre>cat *.fastq | echo $((`wc -l`/4))</pre><p>To count base pairs in a all .fastq.gz files in your current folder:</p><pre>zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c </pre><p>To split multifasta file into many fasta files:</p><pre>awk '/^&gt;/ {OUT=substr($0,2) ".fa"}; {print &gt;&gt; OUT; close(OUT)}' Input_File</pre><p>To convert Illumina FASTQ 1.3 to 1.8:</p><pre>sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&amp;'\''()*+,-.\/0123456789:;&lt;=&gt;?@ABCDEFGHIJ/' f.fastq</pre><p>To convert FASTQ to FASTA:</p><pre>sed -n '1~4s/^@/&gt;/p;2~4p' </pre><p>To get fastq read length distribution:</p><pre>cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort | uniq -c</pre><p>To deinterleave interleaved fastq file:</p><pre>cat myf.fq | paste - - - - - - - - | tee &gt;(cut -f 1-4 | tr "\t" "\n" &gt; myfile_1.fq) | cut -f 5-8 | \<br />tr "\t" "\n" &gt; myf2.fq </pre><p>To filter and sort contig identifiers from SPAdes assembly (e.g. here lenght &gt;= 4000 + coverage &gt;=100):</p><pre>grep "^&gt;" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 &gt;= 4000 &amp;&amp; $6 &gt;= 100) print $0 }' | sort -k 4 -n | \<br />sed s"/ /_/"g</pre><p>To append something to all headers of your fasta files:</p><pre>sed 's/&gt;.*/&amp;YOURSTRING/' filename.fasta &gt; new_filename.fasta</pre><p>To replace/squeeze multiple adjacent spaces by only one space:&nbsp;</p><pre>tr -s " " &lt; file</pre><p>To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25.</p><pre>cat your.fastq | paste - - - - | awk 'length($2)&nbsp; &gt;= 21 &amp;&amp; length($2) &lt;= 25' | sed 's/\t/\n/g' &gt; filtered.fastq</pre><p>To print difference between the last and first row in 5th column:</p><pre>awk '{if (!first){first=$5;}; last=$5;} END {print last-first}' myfile.txt</pre><p>To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):</p><pre>awk '/^&gt;/{ seqlen=0; print; next; } seqlen &lt; 200 { if (seqlen + length($0) &gt; 200) $0 = substr($0, 1, 200-seqlen);\<br /> seqlen += length($0); print }' scaffolds.fasta &gt; 200bp_scaffolds.fasta</pre><p>&nbsp;To pipe a compressed fasta file directly into makeblastdb.</p><pre>gunzip -c fasta.gz | makeblastdb -in -</pre><p>To remove sequences with duplicate fasta headers from a fasta file.</p><pre>awk '/^&gt;/{f=!d[$1];d[$1]=1}f' in.fasta &gt; out.fasta</pre>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>