<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43550?offset=70</link>
	<atom:link href="https://bioinformaticsonline.com/related/43550?offset=70" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</guid>
	<pubDate>Tue, 06 Feb 2018 14:54:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</link>
	<title><![CDATA[Awk for Bioinformatician and computational biologist]]></title>
	<description><![CDATA[<p>Awk is a programming language that allows easy manipulation of structured data and is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions. The basic syntax is:</p><blockquote><p><br />awk '/pattern1/ {Actions}<br /> /pattern2/ {Actions}' file</p></blockquote><p><br />Awk works as follows:<br />Awk reads the input files one line at a time.<br />For each line, it tests the patterns in the given order and, when a pattern matches, performs the corresponding action.<br />If no pattern matches, no action is performed.<br />In the above syntax, either the search pattern or the action is optional, but not both.<br />If the search pattern is not given, Awk performs the given actions for each line of the input.<br />If the action is not given, Awk prints every line that matches the given pattern, which is the default action.<br />Empty braces without any action do nothing; the default print is not performed.<br />Statements within Actions should be separated by semicolons.<br />Say you have data/test.tsv with the following contents:</p><p><br />$ cat data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />With a bare print action, Awk prints every line from the file.</p><p><br />$ awk '{print;}' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />To print only the lines that match the pattern contig3:</p><p><br />$ awk '/contig3/' data/test.tsv<br />contig3 ACTTATATATATATA<br />Awk has a number of built-in variables. For each record (i.e. line), it splits the record on whitespace by default and stores the fields in the $n variables. 
If the line has 5 words, they will be stored in $1, $2, $3, $4 and $5. $0 represents the whole line. NF is a built-in variable that holds the total number of fields in a record, so $NF refers to the last field.</p><p><br />$ awk '{print $1","$2;}' data/test.tsv<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT</p><p>$ awk '{print $1","$NF;}' data/test.tsv<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT</p><p><br />Awk has two important special patterns, specified by the keywords BEGIN and END. The syntax is as follows:</p><blockquote><p>BEGIN { Actions before reading the file}<br />{Actions for every line in the file} <br />END { Actions after reading the file }</p></blockquote><p><br />For example,<br />$ awk 'BEGIN{print "Header,Sequence"}{print $1","$2;}END{print "-------"}' data/test.tsv<br />Header,Sequence<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT<br />------- <br />We can also use the conditional operator in a print statement, in the form print CONDITION ? PRINT_IF_TRUE_TEXT : PRINT_IF_FALSE_TEXT. For example, the code below flags sequences with lengths &gt; 14:</p><p>$ awk '{print (length($2)&gt;14) ? 
$0"&gt;14" : $0"&lt;=14";}' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG&gt;14<br />contig2 ACTTTATATATT&lt;=14<br />contig3 ACTTATATATATATA&gt;14<br />contig4 ACTTATATATATATA&gt;14<br />contig5 ACTTTATATATT&lt;=14<br />We can also put 1 after the last block {} to print every line: 1 is a pattern that is always true, so the default action {print $0} runs, which is the same as {print} because print without arguments prints $0. Earlier blocks can modify $0 before it is printed. For example, to replace the third line (NR==3) with its first field, we can use:</p><p>$ awk 'NR==3{$0=$1}1' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT<br />You can have as many blocks as you want, and they are executed on each line in the order they appear. For example, to print $1 three times (here we use printf instead of print because printf does not append an end-of-line character):</p><p>$ awk '{printf $1"\t"}{printf $1"\t"}{print $1}' data/test.tsv<br />contig1 contig1 contig1<br />contig2 contig2 contig2<br />contig3 contig3 contig3<br />contig4 contig4 contig4<br />contig5 contig5 contig5 <br />We can also skip the remaining blocks for a given line by using the next keyword:</p><p>$ awk '{printf $1"\t"}NR==3{print "";next}{print $1}' data/test.tsv<br />contig1 contig1<br />contig2 contig2<br />contig3 <br />contig4 contig4<br />contig5 contig5</p><p>$ awk 'NR==3{print "";next}{printf $1"\t"}{print $1}' data/test.tsv<br />contig1 contig1<br />contig2 contig2</p><p>contig4 contig4<br />contig5 contig5<br />You can also use getline to read another file in addition to the one being processed. In the statement below, the while loop reads each line of data/test.tsv into k until no lines remain:</p><p>$ awk 'BEGIN{while((getline k &lt;"data/test.tsv")&gt;0) print "BEGIN:"k}{print}' data/test.tsv<br />BEGIN:contig1 
ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />BEGIN:contig2 ACTTTATATATT<br />BEGIN:contig3 ACTTATATATATATA<br />BEGIN:contig4 ACTTATATATATATA<br />BEGIN:contig5 ACTTTATATATT<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />You can also store data in memory in an associative array with the syntax VARIABLE_NAME[KEY]=VALUE, which you can later iterate over with a for (INDEX in VARIABLE_NAME) loop (the iteration order is not guaranteed):</p><p>$ awk '{i[$1]=1}END{for (j in i) print j"&lt;="i[j]}' data/test.tsv<br />contig1&lt;=1<br />contig2&lt;=1<br />contig3&lt;=1<br />contig4&lt;=1<br />contig5&lt;=1</p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42552/bioinformatics-workbook</guid>
	<pubDate>Tue, 05 Jan 2021 22:42:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42552/bioinformatics-workbook</link>
	<title><![CDATA[bioinformatics workbook]]></title>
	<description><![CDATA[<p><span>This book assumes that the reader has some knowledge of biology and a basic understanding of the Unix command line. For the beginner, however, the appendix contains introductory material and tips/tricks for common bioinformatic problems, and is referred to throughout the book for more information.</span></p>
<p>https://bioinformaticsworkbook.org/</p><p>Address of the bookmark: <a href="https://bioinformaticsworkbook.org/" rel="nofollow">https://bioinformaticsworkbook.org/</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43323/biostarhandbook</guid>
	<pubDate>Fri, 27 Aug 2021 01:31:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43323/biostarhandbook</link>
	<title><![CDATA[biostarhandbook]]></title>
	<description><![CDATA[<p>A nice book collection for bioinformaticians; highly recommended.</p><p>Address of the bookmark: <a href="https://www.biostarhandbook.com/" rel="nofollow">https://www.biostarhandbook.com/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44179/python-mini-projects</guid>
	<pubDate>Mon, 16 Jan 2023 02:14:03 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44179/python-mini-projects</link>
	<title><![CDATA[Python Mini Projects !]]></title>
	<description><![CDATA[<p><span>There is a directory for each chapter of the book. Each directory contains a&nbsp;</span><code>test.py</code><span>&nbsp;program you can use with&nbsp;</span><code>pytest</code><span>&nbsp;to check that you have written the program correctly. I have included a short README to describe each exercise. If you have problems writing code (or if you would like to support this project!), the book contains details about the skills you need.</span></p>
<p>https://github.com/kyclark/tiny_python_projects</p><p>Address of the bookmark: <a href="https://github.com/kyclark/tiny_python_projects" rel="nofollow">https://github.com/kyclark/tiny_python_projects</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/11175/next-generation-sequencingngs-books</guid>
	<pubDate>Fri, 30 May 2014 04:48:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/11175/next-generation-sequencingngs-books</link>
	<title><![CDATA[Next generation sequencing(NGS) books]]></title>
	<description><![CDATA[<p>NGS platforms employ different technologies to decode the identity or modifications of nucleotides. They evolve quickly and have entered the mainstream.</p>
<p>This bookmark was created to provide links to online NGS books.</p><p>Address of the bookmark: <a href="http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Print_version" rel="nofollow">http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/Print_version</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26309/ratt</guid>
	<pubDate>Sun, 07 Feb 2016 16:09:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26309/ratt</link>
	<title><![CDATA[RATT]]></title>
	<description><![CDATA[<p><strong>RATT</strong> is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.</p>
<p>It was first developed to transfer annotations between different genome assembly versions. However, it can also transfer annotations between strains and even different species, like <em>Plasmodium chabaudi</em> onto <em>P. berghei</em>, between different Leishmania species or <em>Salmonella enterica</em> onto other Salmonella serotypes. <strong>RATT</strong> is able to transfer any entries present on a reference sequence, such as the systematic id or an annotator's notes; such information would be lost in a <em>de novo</em> annotation.</p>
<p>More at http://ratt.sourceforge.net/</p><p>Address of the bookmark: <a href="http://ratt.sourceforge.net/" rel="nofollow">http://ratt.sourceforge.net/</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30149/mypro-a-seamless-pipeline-for-automated-prokaryotic-genome-assembly-and-annotation</guid>
	<pubDate>Thu, 15 Dec 2016 05:47:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30149/mypro-a-seamless-pipeline-for-automated-prokaryotic-genome-assembly-and-annotation</link>
	<title><![CDATA[MyPro: A seamless pipeline for automated prokaryotic genome assembly and annotation]]></title>
	<description><![CDATA[<p>MyPro is an improved genomics software pipeline for prokaryotic genomes. MyPro is user-friendly and requires minimal programming skills. High-quality prokaryotic genome assembly and annotation can be obtained with ease. It performed better than de novo assemblers and contig integration software, producing more contiguous assemblies, higher N50 values and a lower number of contigs.</p>
<p>More at https://sourceforge.net/projects/sb2nhri/files/MyPro/</p><p>Address of the bookmark: <a href="http://www.sciencedirect.com/science/article/pii/S0167701215001207" rel="nofollow">http://www.sciencedirect.com/science/article/pii/S0167701215001207</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34501/dnapipete-de-novo-assembly-annotation-pipeline-for-transposable-elements</guid>
	<pubDate>Sat, 02 Dec 2017 18:25:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34501/dnapipete-de-novo-assembly-annotation-pipeline-for-transposable-elements</link>
	<title><![CDATA[dnaPipeTE: de-novo assembly &amp; annotation Pipeline for Transposable Elements]]></title>
	<description><![CDATA[<p>dnaPipeTE (for de-novo assembly &amp; annotation Pipeline for Transposable Elements) is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is very useful for quantifying the proportion of TEs in newly sequenced genomes, since it does not require genome assembly and works on small datasets (&lt; 1X).</p>
<ul>
<li>
<p>dnaPipeTE is developed by Cl&eacute;ment Goubert, Laurent Modolo and the TREEP team of the LBBE:&nbsp;<a href="http://lbbe.univ-lyon1.fr/-Equipe-Elements-transposables-.html?lang=en">http://lbbe.univ-lyon1.fr/-Equipe-Elements-transposables-.html?lang=en</a></p>
</li>
<li>
<p>You can find the original publication in GBE here:&nbsp;<a href="https://academic.oup.com/gbe/article/7/4/1192/533768">https://academic.oup.com/gbe/article/7/4/1192/533768</a></p>
</li>
</ul>
<p><a href="https://github.com/clemgoub/dnaPipeTE/blob/dev/dnaPipefront.png" target="_blank"><img src="https://github.com/clemgoub/dnaPipeTE/raw/dev/dnaPipefront.png" alt="Front" style="border: 0px;"></a><em>output examples of quantification and TE landscape (relative age) produced by dnaPipeTE</em></p>
<p>Address of the bookmark: <a href="https://github.com/clemgoub/dnaPipeTE" rel="nofollow">https://github.com/clemgoub/dnaPipeTE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38758/roary-the-pan-genome-pipeline</guid>
	<pubDate>Tue, 22 Jan 2019 05:52:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38758/roary-the-pan-genome-pipeline</link>
	<title><![CDATA[Roary: the Pan Genome Pipeline]]></title>
	<description><![CDATA[<p><span>Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples, something which is computationally infeasible with existing methods, without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. To perform this analysis using existing methods would take weeks and hundreds of GB of RAM. Roary is not intended for meta-genomics or for comparing extremely diverse sets of genomes.</span></p><p>Address of the bookmark: <a href="https://sanger-pathogens.github.io/Roary/" rel="nofollow">https://sanger-pathogens.github.io/Roary/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44470/phyloherb-phylogenomic-analysis-pipeline-for-herbarium-specimens</guid>
	<pubDate>Wed, 21 Feb 2024 06:15:13 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44470/phyloherb-phylogenomic-analysis-pipeline-for-herbarium-specimens</link>
	<title><![CDATA[PhyloHerb: Phylogenomic Analysis Pipeline for Herbarium Specimens]]></title>
	<description><![CDATA[<p><span>What is PhyloHerb</span><span>: PhyloHerb is a wrapper program to process&nbsp;</span><span>genome skimming</span><span>&nbsp;data collected from plant materials. The outcomes include the plastid genome (plastome) assemblies, mitochondrial genome assemblies, nuclear ribosomal DNAs (NTS+ETS+18S+ITS1+5.8S+ITS2+28S), alignments of gene and intergenic regions, and a species tree. It is designed to be a high throughput program dealing with lower quality data. Examples include&nbsp;</span><span>low-coverage (5x cpDNA) plastome phylogeny, recycling plastid genes from target enrichment data, retrieving low-copy nuclear genes from medium coverage (5x nucDNA) genome skimming</span><span>.</span></p><p>Address of the bookmark: <a href="https://github.com/lmcai/PhyloHerb/" rel="nofollow">https://github.com/lmcai/PhyloHerb/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>