<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43094?offset=10</link>
	<atom:link href="https://bioinformaticsonline.com/related/43094?offset=10" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40994/biological-databases</guid>
	<pubDate>Wed, 12 Feb 2020 01:16:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40994/biological-databases</link>
	<title><![CDATA[Biological databases !]]></title>
	<description><![CDATA[<p>Nowadays there are many genomics databases available around the world. This bookmark collects their links in one place ...</p>
<p>ftp://ftp.ncbi.nih.gov/genomes/</p>
<p>https://hgdownload.soe.ucsc.edu/downloads.html</p><p>Address of the bookmark: <a href="ftp://ftp.ncbi.nih.gov/genomes/" rel="nofollow">ftp://ftp.ncbi.nih.gov/genomes/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32048/json</guid>
	<pubDate>Tue, 04 Apr 2017 08:02:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32048/json</link>
	<title><![CDATA[JSON]]></title>
	<description><![CDATA[<p><strong>JSON</strong>&nbsp;(JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the&nbsp;<a href="http://javascript.crockford.com/">JavaScript Programming Language</a>,&nbsp;<a href="http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf">Standard ECMA-262 3rd Edition - December 1999</a>. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.</p>
<p>JSON is built on two structures:</p>
<ul>
<li>A collection of name/value pairs. In various languages, this is realized as an&nbsp;<em>object</em>, record, struct, dictionary, hash table, keyed list, or associative array.</li>
<li>An ordered list of values. In most languages, this is realized as an&nbsp;<em>array</em>, vector, list, or sequence.</li>
</ul>
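<p>As a minimal sketch of the two structures listed above, the Python standard-library <code>json</code> module maps a JSON object to a <code>dict</code> and a JSON array to a <code>list</code>. The record below, including its field names, is invented purely for illustration.</p>

```python
import json

# Hypothetical record, invented for illustration only.
record = {
    "name": "example_gene",      # name/value pairs form a JSON object
    "exons": [101, 250, 399],    # an ordered list of values forms a JSON array
}

text = json.dumps(record)        # serialize the dict to JSON text
parsed = json.loads(text)        # parse the text back into Python objects

print(type(parsed).__name__)           # the object comes back as a dict
print(type(parsed["exons"]).__name__)  # the array comes back as a list
```

<p>The round trip returns data equal to the original record, which is the property that makes the format convenient for interchange.</p>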
<p>These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.</p><p>Address of the bookmark: <a href="http://json.org/" rel="nofollow">http://json.org/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37198/understanding-blastn-output-format-6</guid>
	<pubDate>Wed, 27 Jun 2018 18:38:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37198/understanding-blastn-output-format-6</link>
	<title><![CDATA[Understanding BLASTn output format 6 !]]></title>
	<description><![CDATA[<h3 id="sites-page-title-header" style="text-align: left;"><span>BLASTn output format 6</span></h3><div id="sites-canvas-main"><div id="sites-canvas-main-content"><div dir="ltr"><div><div><em>BLASTn</em> maps DNA against DNA, for example gene sequences against a reference genome<br /><br /><code><strong>blastn</strong>  -query <span>genes.ffn</span>  -subject <span>genome.fna</span>  -outfmt <strong>6</strong></code></div><h2>BLASTn tabular output format 6</h2>
<p><strong>Column headers:</strong><br /><code>qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore</code><br /></p>
<table border="1" cellspacing="0">
<tbody>
<tr>
<td> 1.</td>
<td> qseqid</td>
<td> query (e.g., gene) sequence id</td>
</tr>
<tr>
<td> 2.</td>
<td> sseqid</td>
<td> subject (e.g., reference genome) sequence id</td>
</tr>
<tr>
<td> 3.</td>
<td> pident</td>
<td> percentage of identical matches</td>
</tr>
<tr>
<td> 4.</td>
<td> length</td>
<td> alignment length</td>
</tr>
<tr>
<td> 5.</td>
<td> mismatch</td>
<td> number of mismatches</td>
</tr>
<tr>
<td> 6.</td>
<td> gapopen</td>
<td> number of gap openings</td>
</tr>
<tr>
<td> 7.</td>
<td> qstart</td>
<td> start of alignment in query</td>
</tr>
<tr>
<td> 8.</td>
<td> qend</td>
<td> end of alignment in query</td>
</tr>
<tr>
<td> 9.</td>
<td> sstart</td>
<td> start of alignment in subject</td>
</tr>
<tr>
<td> 10.</td>
<td> send</td>
<td> end of alignment in subject</td>
</tr>
<tr>
<td> 11.</td>
<td> evalue</td>
<td> <a href="http://www.metagenomics.wiki/tools/blast/evalue">expect value</a></td>
</tr>
<tr>
<td> 12.</td>
<td> bitscore</td>
<td> <a href="http://www.metagenomics.wiki/tools/blast/evalue"><strong>bit score</strong></a></td>
</tr>
</tbody>
</table>
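<p>The twelve default columns in the table above can be read into typed records. The following Python sketch is one possible way to do it, not part of BLAST itself: the names <code>parse_outfmt6</code> and <code>COLUMNS</code> are hypothetical, the two sample rows are made up, and it assumes the default column order with no custom specifiers.</p>

```python
import csv
import io

# Default column order for -outfmt 6, as listed in the table above.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}

def parse_outfmt6(handle):
    """Yield one dict per hit from tab-separated BLAST format 6 text."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit

# Two made-up hits, for illustration only (not real BLAST output).
sample = (
    "gene1\tchr1\t98.50\t200\t3\t0\t1\t200\t1000\t1199\t1e-100\t350\n"
    "gene1\tchr2\t85.00\t120\t18\t0\t10\t129\t500\t619\t2e-20\t95.3\n"
)
hits = list(parse_outfmt6(io.StringIO(sample)))

# Pick the hit with the highest bit score.
best = max(hits, key=lambda h: h["bitscore"])
print(best["sseqid"], best["pident"], best["evalue"])
```

<p>For a real run, an open file handle on the BLAST output can be passed in place of the <code>io.StringIO</code> object. If a custom <code>-outfmt</code> string is used, <code>COLUMNS</code> and the two field sets would need to be changed to match it.</p>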
<p><strong><br /></strong></p>
</div><h2><a name="TOC-Define-your-own-output-format" id="TOC-Define-your-own-output-format"></a>Define your own output format</h2><div><em>by adding the option -outfmt, for example: </em><strong><br /></strong></div>
<p><code><strong>-outfmt</strong> <strong>"6</strong> <span>qseqid sseqid pident qlen length mismatch gapopen evalue bitscore</span><strong>"</strong></code><br /><br /><em><strong>supported format specifiers are:</strong></em><br /><code>qseqid    </code>Query Seq-id<br /><code>qgi       </code>Query GI<br /><code>qacc      </code>Query accession<br /><code>qaccver   </code>Query accession.version<br /><code>qlen      </code>Query sequence length<br /><code>sseqid    </code>Subject Seq-id<br /><code>sallseqid </code>All subject Seq-id(s), separated by a ';'<br /><code>sgi       </code>Subject GI<br /><code>sallgi    </code>All subject GIs<br /><code>sacc      </code>Subject accession<br /><code>saccver   </code>Subject accession.version<br /><code>sallacc   </code>All subject accessions<br /><code>slen      </code>Subject sequence length<br /><code>qstart    </code>Start of alignment in query<br /><code>qend      </code>End of alignment in query<br /><code>sstart    </code>Start of alignment in subject<br /><code>send      </code>End of alignment in subject<br /><code>qseq      </code>Aligned part of query sequence<br /><code>sseq      </code>Aligned part of subject sequence<br /><code>evalue    </code>Expect value<br /><code>bitscore  </code>Bit score<br /><code>score     </code>Raw score<br /><code>length    </code>Alignment length<br /><code>pident    </code>Percentage of identical matches<br /><code>nident    </code>Number of identical matches<br /><code>mismatch  </code>Number of mismatches<br /><code>positive  </code>Number of positive-scoring matches<br /><code>gapopen   </code>Number of gap openings<br /><code>gaps      </code>Total number of gaps<br /><code>ppos      </code>Percentage of positive-scoring matches<br /><code>frames    </code>Query and subject frames separated by a '/'<br /><code>qframe    </code>Query frame<br /><code>sframe    </code>Subject frame<br /><code>btop      </code>Blast traceback operations (BTOP)<br /><code>staxids   
</code>Subject Taxonomy ID(s), separated by a ';'<br /><code>sscinames </code>Subject Scientific Name(s), separated by a ';'<br /><code>scomnames </code>Subject Common Name(s), separated by a ';'<br /><code>sblastnames </code>Subject Blast Name(s), separated by a ';'   (in alphabetical order)<br /><code>sskingdoms  </code>Subject Super Kingdom(s), separated by a ';'     (in alphabetical order) <br /><code>stitle      </code>Subject Title<br /><code>salltitles  </code>All Subject Title(s), separated by a '&lt;&gt;'<br /><code>sstrand   </code>Subject Strand<br /><code>qcovs     </code>Query Coverage Per Subject<br /><code>qcovhsp   </code>Query Coverage Per HSP<br /><strong><br /><em>default values are:</em></strong><br /><code><code>-outfmt "</code>6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"</code></p>
</div></div></div>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</guid>
	<pubDate>Fri, 20 Jul 2018 10:03:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</link>
	<title><![CDATA[Converting a VCF into a FASTA given some reference !]]></title>
	<description><![CDATA[<p>Samtools/BCFtools (Heng Li) provides a Perl script&nbsp;<a href="https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl"><code>vcfutils.pl</code></a>&nbsp;which does this, the function&nbsp;<code>vcf2fq</code>&nbsp;(lines 469-528)</p><p>This script has been modified by others to convert InDels as well, e.g.&nbsp;<a href="https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl">this</a>&nbsp;by David Eccles</p><pre><code><span>./</span><span>vcf2fq</span><span>.</span><span>pl </span><span>-</span><span>f </span><span>&lt;</span><span>input</span><span>.</span><span>fasta</span><span>&gt;</span><span> </span><span>&lt;</span><span>all</span><span>-</span><span>site</span><span>.</span><span>vcf</span><span>&gt;</span><span> </span><span>&gt;</span><span> </span><span>&lt;</span><span>output</span><span>.</span><span>fastq</span><span>&gt;</span></code></pre><p>https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl</p><p>https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42472/maftools-summarize-analyze-and-visualize-maf-files</guid>
	<pubDate>Wed, 23 Dec 2020 05:29:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42472/maftools-summarize-analyze-and-visualize-maf-files</link>
	<title><![CDATA[maftools : Summarize, Analyze and Visualize MAF Files]]></title>
	<description><![CDATA[<p><span>With advances in Cancer Genomics,&nbsp;</span><a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/">Mutation Annotation Format</a><span>&nbsp;(MAF) is being widely accepted and used to store somatic variants detected.&nbsp;</span><a href="http://cancergenome.nih.gov/">The Cancer Genome Atlas</a><span>&nbsp;Project has sequenced over 30 different cancers with sample size of each cancer type being over 200.&nbsp;</span><a href="https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files">Resulting data</a><span>&nbsp;consisting of somatic variants are stored in the form of&nbsp;</span><a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/">Mutation Annotation Format</a><span>. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.</span></p><p>Address of the bookmark: <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html" rel="nofollow">https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36857/%E2%80%9Cone-code-to-find-them-all%E2%80%9D-a-perl-tool-to-conveniently-parse-repeatmasker-output-files</guid>
	<pubDate>Mon, 04 Jun 2018 03:45:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36857/%E2%80%9Cone-code-to-find-them-all%E2%80%9D-a-perl-tool-to-conveniently-parse-repeatmasker-output-files</link>
	<title><![CDATA[“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files]]></title>
	<description><![CDATA[One code to find them all is a set of perl scripts to extract useful information from RepeatMasker about transposable elements, retrieve their sequences and get some quantitative information.

Assemble RepeatMasker hits into complete TE copies, including LTR-retrotransposon
Retrieve corresponding TE sequences, and flanking sequences, from the local fasta files
Compute summary statistics for each TE family (number of TE copies, genome coverage...)
Ambiguous cases such as nested TE can be assembled into copies automatically or manually
Allow for working with a TE user-defined library
Allow for working with only a user-chosen set of TE families


http://doua.prabi.fr/software/one-code-to-find-them-all<p>Address of the bookmark: <a href="http://doua.prabi.fr/software/one-code-to-find-them-all" rel="nofollow">http://doua.prabi.fr/software/one-code-to-find-them-all</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43870/quip-aggressive-compression-of-fastq-sam-and-bam-files</guid>
	<pubDate>Tue, 24 May 2022 06:31:48 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43870/quip-aggressive-compression-of-fastq-sam-and-bam-files</link>
	<title><![CDATA[Quip: Aggressive compression of FASTQ, SAM and BAM files.]]></title>
	<description><![CDATA[<p>This will help us to reduce the amount of drive space we take up and decrease data transfer times</p>
<p dir="auto">Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the&nbsp;<a href="http://en.wikipedia.org/wiki/Fastq">FASTQ</a>&nbsp;and&nbsp;<a href="http://samtools.sourceforge.net/">SAM/BAM</a>&nbsp;formats, compressing large datasets to as little as 15% of their original size.</p><p>Address of the bookmark: <a href="https://github.com/dcjones/quip" rel="nofollow">https://github.com/dcjones/quip</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32420/fastq-format</guid>
	<pubDate>Wed, 03 May 2017 04:23:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32420/fastq-format</link>
	<title><![CDATA[Fastq format]]></title>
	<description><![CDATA[<p><strong>FASTQ format</strong>&nbsp;is a text-based&nbsp;<a href="https://en.wikipedia.org/wiki/File_format" title="File format">format</a>&nbsp;for storing both a biological sequence (usually&nbsp;<a href="https://en.wikipedia.org/wiki/Nucleotide_sequence" title="Nucleotide sequence">nucleotide sequence</a>) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single&nbsp;<a href="https://en.wikipedia.org/wiki/ASCII" title="ASCII">ASCII</a>&nbsp;character for brevity.</p>
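<p>To illustrate the one-character-per-score encoding described above, here is a small Python sketch that turns a quality string back into numeric scores. It assumes the Phred+33 (Sanger) offset, which this bookmark does not state; some older Illumina pipelines used an offset of 64 instead. The function name <code>decode_qualities</code> and the record itself are hypothetical.</p>

```python
# Assumption: qualities use the Phred+33 (Sanger) offset; some older Illumina
# pipelines used +64, so check your data before relying on this value.
PHRED_OFFSET = 33

def decode_qualities(quality_line, offset=PHRED_OFFSET):
    """Map each ASCII character of a FASTQ quality line to its numeric score."""
    return [ord(char) - offset for char in quality_line]

# A made-up four-line FASTQ record, for illustration only.
record = ["@read1", "GATTACA", "+", "IIIIH#!"]
header, sequence, _, quality = record

scores = decode_qualities(quality)
assert len(scores) == len(sequence)  # one quality character per base

# Pair each base with its decoded score.
print(list(zip(sequence, scores)))
```

<p>Under the Phred+33 assumption, the character <code>I</code> decodes to 40 and <code>!</code> decodes to 0.</p>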
<p>It was originally developed at the&nbsp;<a href="https://en.wikipedia.org/wiki/Wellcome_Trust_Sanger_Institute" title="Wellcome Trust Sanger Institute">Wellcome Trust Sanger Institute</a>&nbsp;to bundle a&nbsp;<a href="https://en.wikipedia.org/wiki/FASTA_format" title="FASTA format">FASTA</a>&nbsp;sequence and its quality data, but has recently become the&nbsp;<em>de facto</em>&nbsp;standard for storing the output of high-throughput sequencing instruments such as the&nbsp;<a href="https://en.wikipedia.org/wiki/Illumina_(company)" title="Illumina (company)">Illumina</a>&nbsp;Genome Analyzer.<sup id="cite_ref-Cock2009_1-0"><a href="https://en.wikipedia.org/wiki/FASTQ_format#cite_note-Cock2009-1">[1]</a></sup></p><p>Address of the bookmark: <a href="https://en.wikipedia.org/wiki/FASTQ_format" rel="nofollow">https://en.wikipedia.org/wiki/FASTQ_format</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43661/maftools</guid>
	<pubDate>Fri, 17 Dec 2021 03:18:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43661/maftools</link>
	<title><![CDATA[maftools]]></title>
	<description><![CDATA[<p>With advances in Cancer Genomics, <a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/">Mutation Annotation Format</a> (MAF) is being widely accepted and used to store somatic variants detected. <a href="http://cancergenome.nih.gov">The Cancer Genome Atlas</a> Project has sequenced over 30 different cancers with sample size of each cancer type being over 200. <a href="https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files">Resulting data</a> consisting of somatic variants are stored in the form of <a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/">Mutation Annotation Format</a>. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.</p>
<p>https://www.bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html</p><p>Address of the bookmark: <a href="https://github.com/PoisonAlien/maftools" rel="nofollow">https://github.com/PoisonAlien/maftools</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta</guid>
	<pubDate>Fri, 12 Jan 2018 03:49:09 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35144/converting-fastq-to-fasta</link>
	<title><![CDATA[Converting FASTQ to FASTA]]></title>
	<description><![CDATA[<div id="block-system-main"><div><div><div><div><div><div><p>There are several ways to convert fastq sequences to fasta. Some methods are listed below.</p><h3>Using SED</h3><p><span><code><span>sed</span></code></span>&nbsp;can be used to selectively print the desired lines from a file, so if you print the first and second lines of every four lines, you get the sequence header and sequence needed for fasta format. The <code>first~step</code> address form used here is a GNU sed extension.</p><pre>sed -n '1~4s/^@/&gt;/p;2~4p' INFILE.fastq &gt; OUTFILE.fasta
</pre><h3>Using PASTE</h3><p>You can linearize every four lines into one tab-separated row with&nbsp;<span><code>paste</code></span>, then print the first and second fields.</p><pre>cat INFILE.fastq | paste - - - - | cut -f 1,2 | sed 's/^@/&gt;/' | tr "\t" "\n" &gt; OUTFILE.fasta
</pre><h3>EMBOSS:seqret</h3><p><code>seqret</code> is a general-purpose EMBOSS program for reading and writing sequences. One of its uses is fastq-to-fasta conversion.</p><pre>seqret -sequence reads.fastq -outseq reads.fasta
</pre><h3>Using AWK</h3><p><span><code><span>awk</span></code></span>&nbsp;can be used for conversion as follows:</p><pre>cat infile.fq | awk '{if(NR%4==1) {printf("&gt;%s\n",substr($0,2));} else if(NR%4==2) print;}' &gt; file.fa
</pre><h3>FASTX-toolkit</h3><p><span><code>fastq_to_fasta</code></span>, part of the FASTX-toolkit, scales well to very large datasets.</p><pre>fastq_to_fasta -h
usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]
# Remember to use -Q33 for illumina reads!
version 0.0.6
       [-h]         = This helpful help screen.
       [-r]         = Rename sequence identifiers to numbers.
       [-n]         = keep sequences with unknown (N) nucleotides.
                   Default is to discard such sequences.
       [-v]         = Verbose - report number of sequences.
                   If [-o] is specified,  report will be printed to STDOUT.
                   If [-o] is not specified (and output goes to STDOUT),
                   report will be printed to STDERR.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA output file. default is STDOUT.
</pre><h3>Bioawk</h3><p>Another option is to convert fastq to fasta format using&nbsp;<span><code>bioawk</code></span>:</p><pre>bioawk -c fastx '{print "&gt;"$name"\n"$seq}' input.fastq &gt; output.fasta
</pre><h3>Seqtk</h3><p>From the same developer, there is another option using a tool called&nbsp;<span><code>seqtk</code></span></p><pre>seqtk seq -a input.fastq &gt; output.fasta
</pre><p>Note that you can use either compressed or uncompressed files for this tool</p></div></div></div></div></div></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

</channel>
</rss>