<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/10394?offset=790</link>
	<atom:link href="https://bioinformaticsonline.com/related/10394?offset=790" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>This protocol describes a de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to reconstruct the original chromosomes from which the DNA originated. De novo genome assembly assumes no prior knowledge of the source DNA sequence length, layout or composition. In a genome sequencing experiment, the DNA of the target organism is broken up into millions of small fragments and read on a sequencing machine. Depending on the sequencing technology used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina-style short-read sequencing typically produces reads of 36 - 150 bp. These reads can be either &ldquo;single-end&rdquo;, as described above, or &ldquo;paired-end&rdquo;.</p><p><strong>Why genome assembly?</strong><br />Determining the DNA sequence of an organism is useful in basic research into how and why organisms live, as well as in applied fields. Because DNA is central to all living things, knowledge of a DNA sequence can be useful in almost any area of biological research. In medicine, for example, it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, the study of pathogens can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be stored as plain sequence text in a FASTA file or, together with their quality scores, in a FASTQ file.&nbsp;FASTQ is the most common read file format because it is what the Illumina sequencing pipeline produces, so it is the format assumed in the rest of this protocol.</p><p><strong>The protocol in a nutshell:</strong> <br />Obtain the sequence read file(s) from the sequencing machine(s). <br />Look at the reads to get an idea of what you have and what their quality is like. 
<br />Clean up and quality-trim the raw data, if required. <br />Choose an appropriate set of assembly parameters. <br />Assemble the data into contigs/scaffolds. <br />Examine the assembly output and assess its quality.</p><p><strong>Read Quality Control:</strong><br />Check the read quality with FastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This step trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />The aim of this part of the protocol is to describe how to assemble the quality-trimmed reads into draft contigs.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A wide range of short-read assemblers is available, each with its own strengths and weaknesses. <br /><em>Some of the available assemblers include:</em><br />Velvet<br />SOAPdenovo<br />MIRA<br />ALLPATHS</p><p>The next step is to assess the quality of the draft contigs and decide what to do with them in the rest of the study.&nbsp;A few things to note about the contigs you have just created:&nbsp;They are draft contigs. 
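</p><p>One simple summary of a draft assembly is its N50: the contig length at which contigs of that length or longer hold at least half of all assembled bases. The Python sketch below is illustrative only and is not part of the original protocol; the function names contig_lengths and n50 are made up for this example, and the input is assumed to be plain FASTA text such as the contigs file written by the assembler.</p>
<pre>
```python
from bisect import bisect_left
from itertools import accumulate


def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # stays None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Contig length at which the running total reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    running = list(accumulate(ordered))  # cumulative bases, longest contig first
    half = running[-1] / 2
    # bisect_left finds the first position whose running total reaches half.
    return ordered[bisect_left(running, half)]


# Hypothetical in-memory example; in practice the lines would come from
# an open FASTA file of contigs.
demo = [">contig_1", "ACGTACGTAC", "ACGTA", ">contig_2", "ACGTACG", ">contig_3", "ACG"]
demo_lengths = contig_lengths(demo)
print(demo_lengths, n50(demo_lengths))
```
</pre>
<p>N50 describes contiguity only and says nothing about correctness, so it complements rather than replaces a check for mis-assemblies.</p><p>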
Mis-assemblies may be present.</p><p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />GapFiller - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi-automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequently used commands for the analysis are listed at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/3868/next-generation-sequencing-ngs-tutorials</guid>
	<pubDate>Sat, 24 Aug 2013 06:01:37 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/3868/next-generation-sequencing-ngs-tutorials</link>
	<title><![CDATA[Next Generation Sequencing (NGS) Tutorials]]></title>
	<description><![CDATA[<p>The Institute of Computational Biomedicine at Cornell University provides an NGS workshop tutorial at&nbsp;<a href="http://chagall.med.cornell.edu/NGScourse/">http://chagall.med.cornell.edu/NGScourse/</a>&nbsp;</p>
<p>You can also add your favourite NGS educational material or workshop tutorial by commenting on this bookmark, for the benefit of other users.&nbsp;</p>
<p>Understanding the basics of genome sequencing:</p>
<p>Tutorial by Luke Jostins.</p>
<p>http://www.genetic-inference.co.uk/blog/2009/04/basics-sequencing-dna-part-1/</p>
<p>http://www.genetic-inference.co.uk/blog/2009/08/basics-sequencing-dna-part-2/</p>
<p>A window into third-generation sequencing</p>
<p>http://hmg.oxfordjournals.org/content/19/R2/R227.full.pdf</p>
<p>==============================================</p>
<p>NGS data analysis pipelines</p>
<ul>
<li><strong>Detecting and annotating genetic variations using the HugeSeq pipeline</strong>&nbsp; DOI: <a href="http://dx.doi.org/10.1038/nbt.2134">10.1038/nbt.2134</a></li>
<li><strong> NARWHAL, a primary analysis pipeline for NGS data</strong> <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/28/2/284?etoc">http://bioinformatics.oxfordjournals.org/cgi/content/abstract/28/2/284?etoc</a></li>
<li><strong>RseqFlow: Workflows for RNA-Seq data analysis</strong>&nbsp; DOI: <a href="http://dx.doi.org/10.1093/bioinformatics/btr441">10.1093/bioinformatics/btr441</a></li>
<li><strong>ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence</strong>&nbsp; DOI: <a href="http://dx.doi.org/10.1186/1471-2164-12-285">10.1186/1471-2164-12-285</a></li>
<li><strong>A framework for variation discovery and genotyping using next-generation DNA sequencing data</strong>&nbsp; PubMed: <a href="http://www.ncbi.nlm.nih.gov/pubmed/21478889">21478889</a></li>
<li><strong>SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects</strong>&nbsp; DOI: <a href="http://dx.doi.org/10.1186/1471-2105-12-134">10.1186/1471-2105-12-134</a> Abstract: <a href="http://www.biomedcentral.com/1471-2105/12/134/abstract">http://www.biomedcentral.com/1471-2105/12/134/abstract</a></li>
<li><strong>WEP: a high-performance analysis pipeline for whole-exome data&nbsp;</strong>http://www.biomedcentral.com/1471-2105/14/S7/S11</li>
<li><strong>DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.&nbsp;</strong>http://www.ncbi.nlm.nih.gov/pubmed/23657089</li>
<li><strong>GATK: a Toolkit for Genome Analysis&nbsp;</strong>http://www.broadinstitute.org/gatk/</li>
<li><strong>Metagenomics</strong>: http://www.nbic.nl/education/nbic-phd-school/course-schedule/ngsmetagenomics/</li>
<li><strong>RNASeq</strong>: http://www.nbic.nl/education/nbic-phd-school/course-schedule/ngsrnaseq/</li>
<li><strong>Bioinformatics and Seq courses</strong>:&nbsp;http://www.isb-sib.ch/training/training-activities-schedule/archive-2013.html</li>
<li><strong>Variant Detection (Model organism) Advanced tutorial</strong> https://docs.google.com/document/pub?id=1CuKkKylVDb03tnN7RSWl5EUzleetn0ctjmvaidPKLxM</li>
<li><strong>Variant Detection Introductory tutorial</strong> https://docs.google.com/document/pub?id=1ZRzrjjOCvtAu3m-IKL-rbJ1f4On60dDL_IEwG7oejdI</li>
<li><strong>Microbial de novo Assembly for Illumina Data Introductory tutorial</strong> https://docs.google.com/document/pub?id=1N3AB9ptISUu4zULqe1kXpVF0BDyGb5f5yzxWSJd_WNM</li>
<li><strong>RNAseq Differential Gene Expression Introductory tutorial</strong> https://docs.google.com/document/pub?id=1KbTiBHtvHLfPRZ39AY3uriazrINA8TJzgjjwn1zPP7Y</li>
</ul>
<blockquote>
<p>"Please add your favourite NGS link in the comment section below for the benefit of the bioinformatics community."&nbsp;</p>
</blockquote><p>Address of the bookmark: <a href="http://chagall.med.cornell.edu/NGScourse/" rel="nofollow">http://chagall.med.cornell.edu/NGScourse/</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/22454/one-page-r-survival-guide</guid>
	<pubDate>Thu, 28 May 2015 21:10:12 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/22454/one-page-r-survival-guide</link>
	<title><![CDATA[One page R survival guide !!]]></title>
	<description><![CDATA[<p><span style="font-style: normal; color: #000000; float: none;">Many documents of this kind have been developed and tested by scientists around the world. I found this one really useful. The data used is available for download as<span>&nbsp;</span></span><a href="http://onepager.togaware.com/data.zip">data.zip</a><span style="font-style: normal; color: #000000; float: none;">.</span></p><p><span style="font-style: normal; color: #000000; float: none;">Reference: http://www.datasciencecentral.com/profiles/blogs/one-page-r-a-survival-guide-to-data-science-with-r</span></p><ul>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Templates for the Data Scientist<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">A Template for Preparing Data:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/DataO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/DataO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">A Template for Building Models:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/ModelsO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/ModelsO.R">R</a></li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Getting Started as a Data Scientist<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Getting Started with R and Rattle:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/StartL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/StartG.pdf">Laboratory</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Introducing and Interacting with R:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/IntroRL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/IntroRR.pdf">Laboratory</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">BasicR - OnePage(R) - Writing R scripts</li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Dealing With Data<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Read Data into R:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/ReadO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/ReadO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Explore and Summarise Data:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/SummaryO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/SummaryO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Transform Data:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/TransformO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/TransformO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><a href="http://togaware.com/onepager/DateTimeRB"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Dealing with Dates and Time:</span></a><span>&nbsp;</span>(<a href="http://onepager.togaware.com/DateTimeR.pdf">PDF</a>,<span>&nbsp;</span><a href="http://onepager.togaware.com/DateTimeR.R">R</a>) Dates and Time</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Visualising Data with GGPlot2:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/GGPlot2O.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/GGPlot2O.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Visualising Data with Maps</span><span>&nbsp;</span>*<a href="http://togaware.com/onepager/MapsO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/MapsO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Spatial<span>&nbsp;</span>(R) Spatial Analysis</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Handling Big Data</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/BigDataO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/BigData.R">R</a></li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Descriptive Analytics<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Cluster Analysis:</span><span>&nbsp;</span>*<a href="http://togaware.com/onepager/ClustersL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/ClustersO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/Clusters.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Association Analysis:</span><span>&nbsp;</span>*<a href="http://togaware.com/onepager/ARulesL.pdf">Lecture</a></li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Predictive Analytics<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Decision Trees:</span><span>&nbsp;</span>*<a href="http://togaware.com/onepager/DTreesL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/DTreesO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/DTreesO.R">R</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/DTreesG.pdf">Rattle</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Ensembles of Decision Trees:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/EnsemblesL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/EnsemblesO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/EnsemblesO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">SVM (R)</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">KernLab (R)</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">NeuralNetworks (R)</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">NNet (R)</li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Model Delivery<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Evaluating Models:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/EvaluationO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/EvaluationO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Evaluation (R)</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Scoring (R)</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">PMML (R) Exporting Models for Deployment</li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Advanced Topics<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Text Mining:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/TextMiningO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/TextMiningO.R">R</a></li>
</ol></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Advanced R Topics<ol style="margin: 0px; padding: 0px 0px 0px 1.5em; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><a href="http://togaware.com/onepager/PlotsB"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Plots</span></a><span>&nbsp;</span>(<a href="http://onepager.togaware.com/Plots.pdf">PDF</a>,<span>&nbsp;</span><a href="http://onepager.togaware.com/Plots.R">R</a>) Miscellaneous Plots</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><a href="http://togaware.com/onepager/FunctionsB"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Functions</span></a><span>&nbsp;</span>(<a href="http://onepager.togaware.com/Functions.pdf">PDF</a>,<span>&nbsp;</span><a href="http://onepager.togaware.com/Functions.R">R</a>) Writing Functions in R</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><a href="http://togaware.com/onepager/ParallelB"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Parallel</span></a><span>&nbsp;</span>(<a href="http://onepager.togaware.com/Parallel.pdf">PDF</a>,<span>&nbsp;</span><a href="http://onepager.togaware.com/Parallel.R">R</a>) Parallel Execution</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Packaging (R) Pulling it Together into a Package</li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Doing R with Style:</span><span>&nbsp;</span>*<a href="http://onepager.togaware.com/StyleO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/StyleO.R">R</a></li>
<li style="margin: 0px; padding: 0px; border: 0px currentColor; font-style: inherit; font-weight: inherit; vertical-align: baseline;"><span style="margin: 0px; padding: 0px; border: 0px none currentcolor; font-style: inherit; font-weight: inherit; vertical-align: baseline;">Literate Data Mining with KnitR:</span><span>&nbsp;</span>*<a href="http://togaware.com/onepager/KnitRL.pdf">Lecture</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/KnitRO.pdf">OnePageR</a><span>&nbsp;</span>- *<a href="http://onepager.togaware.com/KnitRO.R">R</a></li>
</ol></li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44865/snp-analysis-unlocking-the-secrets-in-our-dna</guid>
	<pubDate>Wed, 16 Jul 2025 01:31:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44865/snp-analysis-unlocking-the-secrets-in-our-dna</link>
	<title><![CDATA[SNP Analysis: Unlocking the Secrets in Our DNA]]></title>
	<description><![CDATA[<p>Single Nucleotide Polymorphisms (SNPs) are the most common type of genetic variation in humans and in many other organisms. A single base change in the DNA sequence (for example, an A instead of a G) can influence everything from our eye color to our risk of developing diseases. Analyzing these tiny changes has become central to modern genetics, medicine, agriculture, and evolutionary biology.</p><p><strong>What are SNPs?</strong><br />SNPs (pronounced "snips") are positions in the genome where individuals differ by a single nucleotide. For example:</p><p>Reference: ...A T G C A T G A...<br />Variant:&nbsp; &nbsp; &nbsp;...A T G T A T G A...</p><p>Here, the C in the reference genome has been replaced by a T in the variant.</p><p>SNPs occur roughly every 300&ndash;1,000 bases in the human genome, meaning there are millions of them scattered throughout our DNA. Most SNPs have no effect on health, but some are linked to disease susceptibility, drug response, and other traits.</p>
<p><strong>Why Do We Analyze SNPs?</strong></p><p>1. Medical Genetics</p><ul><li>Identify disease-associated variants (e.g., BRCA1/2 in breast cancer).</li><li>Predict drug response (pharmacogenomics).</li><li>Enable precision medicine by tailoring treatments.</li></ul><p>2. Population Genetics &amp; Ancestry</p><ul><li>Trace human migration and ancestry.</li><li>Study genetic diversity within and between populations.</li></ul><p>3. Agriculture &amp; Animal Breeding</p><ul><li>Select for desirable traits (drought resistance, yield, disease resistance).</li><li>Improve breeding efficiency in livestock.</li></ul><p>4. Evolutionary Biology</p><ul><li>Track natural selection.</li><li>Study adaptation in wild populations.</li></ul>
<p><strong>How is SNP Analysis Performed?</strong><br />SNP analysis can be broadly divided into three steps:</p><p>1. SNP Detection</p><ul><li>Genotyping arrays: Chips that test hundreds of thousands of known SNP positions simultaneously. Fast and affordable, widely used in consumer ancestry testing.</li><li>Whole-genome or whole-exome sequencing: Can detect known and novel SNPs across the genome.</li><li>Targeted sequencing or PCR: For focused analysis of specific regions.</li></ul><p>2. Variant Calling</p><ul><li>Sequencing data is aligned to a reference genome. Bioinformatics tools (e.g., GATK, bcftools) identify positions where the sequenced sample differs from the reference.</li></ul><p>3. Annotation and Interpretation</p><ul><li>Tools (e.g., SnpEff, VEP) predict the functional impact of SNPs.</li><li>Are the SNPs in coding regions? Do they cause amino acid changes? Are they known to be pathogenic?</li><li>Databases like dbSNP, ClinVar, and GWAS Catalog provide information on known associations.</li></ul>
<p><strong>Common Tools for SNP Analysis</strong></p><ul><li>Alignment: BWA, Bowtie2</li><li>Variant Calling: GATK, FreeBayes</li><li>Visualization: IGV, UCSC Genome Browser</li><li>Annotation: SnpEff, VEP</li><li>Statistical Analysis: PLINK, SNPTEST</li></ul>
<p><strong>Challenges in SNP Analysis</strong></p><ul><li>False positives/negatives: Sequencing errors, alignment issues.</li><li>Population stratification: Confounding in association studies.</li><li>Interpretation: Many SNPs have unknown or complex effects.</li></ul><p>Researchers address these with rigorous quality control, large datasets, and increasingly sophisticated statistical models.</p>
<p><strong>The Future of SNP Analysis</strong><br />With advances in sequencing technology and AI-driven analysis, SNP studies are expanding:</p><ul><li>Polygenic risk scores predict disease risk based on thousands of SNPs.</li><li>Large-scale biobanks (e.g., UK Biobank, All of Us) enable powerful genome-wide association studies (GWAS).</li><li>CRISPR and functional assays help validate SNP effects in the lab.</li></ul><p>SNP analysis is at the heart of the genomic revolution, promising insights into biology, health, and evolution at unprecedented scale.</p>
<p><strong>Conclusion</strong><br />From diagnosing rare diseases to designing better crops, SNP analysis is a foundational tool in modern science. As our ability to sequence and interpret genomes improves, so will our understanding of these tiny but mighty variations in DNA.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34463/single-cell-rnaseq-data-analysis-tutorial</guid>
	<pubDate>Mon, 27 Nov 2017 16:24:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34463/single-cell-rnaseq-data-analysis-tutorial</link>
	<title><![CDATA[Single Cell RNAseq data analysis tutorial !!]]></title>
	<description><![CDATA[<p>Bulk RNA-seq:</p><ul>
<li>A major breakthrough in the late 2000s that replaced microarrays and has been widely used since</li>
<li>Measures the&nbsp;average expression level&nbsp;for each gene across a large population of input cells</li>
<li>Useful for comparative transcriptomics, e.g.&nbsp;samples of the same tissue from different species</li>
<li>Useful for quantifying expression signatures from ensembles, e.g.&nbsp;in disease studies</li>
<li>Insufficient&nbsp;for studying heterogeneous systems, e.g.&nbsp;early development studies, complex tissues (brain)</li>
<li>Does&nbsp;not&nbsp;provide insights into the stochastic nature of gene expression</li>
</ul><p>Following are the useful links:</p><p><a href="http://hemberg-lab.github.io/scRNA.seq.course/scRNA-seq-course.pdf" target="_blank">Single Cell RNAseq data analysis Tutorial</a></p><p><a href="https://f1000research.com/articles/5-2122/v2" target="_blank">A step-by-step workflow for low-level analysis of single-cell RNA-seq data</a></p><p><a href="https://www.bioconductor.org/help/workflows/simpleSingleCell/" target="_blank">A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor</a></p><p>SCell: single-cell RNA-seq analysis software</p><p><a href="https://github.com/diazlab/SCell">https://github.com/diazlab/SCell</a></p><p>Beta-Poisson model for single-cell RNA-seq data analyses</p><p><a href="https://github.com/nghiavtr/BPSC">https://github.com/nghiavtr/BPSC</a></p><p>Sincera: A Computational Pipeline for Single Cell RNA-Seq Profiling Analysis</p><p><a href="https://research.cchmc.org/pbge/sincera.html">https://research.cchmc.org/pbge/sincera.html</a></p><p>SC3 &ndash; consensus clustering of single-cell RNA-Seq data</p><p><a href="http://biorxiv.org/content/early/2016/09/02/036558">http://biorxiv.org/content/early/2016/09/02/036558</a></p><p>Citrus: A toolkit for single cell sequencing analysis</p><p><a href="http://biorxiv.org/content/early/2016/09/14/045070">http://biorxiv.org/content/early/2016/09/14/045070</a></p><p>Single-Cell Resolution of Temporal Gene Expression during Heart Development</p><p><a href="http://www.cell.com/developmental-cell/fulltext/S1534-5807%2816%2930682-7">http://www.cell.com/developmental-cell/fulltext/S1534-5807(16)30682-7</a></p><p>Scalable latent-factor models applied to single-cell RNA-seq data separate biological drivers from confounding effects</p><p><a href="http://biorxiv.org/content/early/2016/11/15/087775">http://biorxiv.org/content/early/2016/11/15/087775</a></p><p>Single cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in 
type 2 diabetes</p><p><a href="http://genome.cshlp.org/content/early/2016/11/18/gr.212720.116.abstract">http://genome.cshlp.org/content/early/2016/11/18/gr.212720.116.abstract</a></p><p>SCODE: An efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation</p><p><a href="http://biorxiv.org/content/early/2016/11/21/088856">http://biorxiv.org/content/early/2016/11/21/088856</a></p><p>SCOUP is a probabilistic model to analyze single-cell expression data during differentiation</p><p><a href="https://github.com/hmatsu1226/SCOUP">https://github.com/hmatsu1226/SCOUP</a></p><p>scLVM is a modelling framework for single-cell RNA-seq data</p><p><a href="https://github.com/PMBio/scLVM">https://github.com/PMBio/scLVM</a></p><p>Selective Locally linear Inference of Cellular Expression Relationships (SLICER) algorithm for inferring cell trajectories</p><p><a href="https://github.com/jw156605/SLICER">https://github.com/jw156605/SLICER</a></p><p>SinQC: A Method and Tool to Control Single-cell RNA-seq Data Quality</p><p><a href="http://www.morgridge.net/SinQC.html">http://www.morgridge.net/SinQC.html</a></p><p>TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis</p><p><a href="https://github.com/zji90/TSCAN">https://github.com/zji90/TSCAN</a></p><p>Visualization and cellular hierarchy inference of single-cell data using SPADE</p><p><a href="http://www.nature.com/nprot/journal/v11/n7/full/nprot.2016.066.html">http://www.nature.com/nprot/journal/v11/n7/full/nprot.2016.066.html</a></p><p>OEFinder: Identify ordering effect genes in single cell RNA-seq data</p><p><a href="https://github.com/lengning/OEFinder">https://github.com/lengning/OEFinder</a></p>]]></description>
	<dc:creator>Robert M Willioms</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/39307/awk-for-beginners</guid>
	<pubDate>Fri, 26 Apr 2019 16:19:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/39307/awk-for-beginners</link>
	<title><![CDATA[AWK for beginners !]]></title>
	<description><![CDATA[<p>AWK is a standard tool on every POSIX-compliant UNIX system. It&rsquo;s like flex/lex from the command line, well suited to text-processing tasks and other scripting needs. It has a C-like syntax, but without mandatory semicolons (you should use them anyway, because they are required to separate statements in one-liners, something AWK excels at), manual memory management, or static typing. You can call it from a shell script, or you can use it as a stand-alone scripting language.</p><p>Why use AWK instead of Perl? Readability: AWK is easier to read than Perl. For simple text-processing scripts, particularly ones that read files line by line and split on delimiters, AWK is probably the right tool for the job.</p><div><pre><span>#!/usr/bin/awk -f</span>

<span># Comments are like this</span>


<span># AWK programs consist of a collection of patterns and actions.</span>
<span>pattern1</span> <span>{</span> <span>action</span><span>;</span> <span>}</span> <span># just like lex</span>
<span>pattern2</span> <span>{</span> <span>action</span><span>;</span> <span>}</span>

<span># There is an implied loop and AWK automatically reads and parses each</span>
<span># record of each file supplied. Each record is split by the FS delimiter,</span>
<span># which defaults to white-space (a run of spaces and tabs counts as one).</span>
<span># You can assign FS either on the command line (-F C) or in your BEGIN</span>
<span># pattern</span>

<span># One of the special patterns is BEGIN. The BEGIN pattern is true</span>
<span># BEFORE any of the files are read. The END pattern is true after</span>
<span># an End-of-file from the last file (or standard-in if no files specified)</span>
<span># There is also an output field separator (OFS) that you can assign, which</span>
<span># defaults to a single space</span>
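<span>#</span>
<span># A minimal sketch of setting both separators from the shell (the file name</span>
<span># data.tsv is hypothetical): read tab-separated input and print the first two</span>
<span># fields separated by a comma:</span>
<span>#   awk -F '\t' -v OFS=',' '{ print $1, $2 }' data.tsv</span>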

<span>BEGIN</span> <span>{</span>

    <span># BEGIN will run at the beginning of the program. It's where you put all</span>
    <span># the preliminary set-up code, before you process any text files. If you</span>
    <span># have no text files, then think of BEGIN as the main entry point.</span>

    <span># Variables are global. Just set them or use them, no need to declare..</span>
    <span>count</span> <span>=</span> <span>0</span><span>;</span>

    <span># Operators just like in C and friends</span>
    <span>a</span> <span>=</span> <span>count</span> <span>+</span> <span>1</span><span>;</span>
    <span>b</span> <span>=</span> <span>count</span> <span>-</span> <span>1</span><span>;</span>
    <span>c</span> <span>=</span> <span>count</span> <span>*</span> <span>1</span><span>;</span>
    <span>d</span> <span>=</span> <span>count</span> <span>/</span> <span>1</span><span>;</span> <span># floating-point division (use int() to truncate)</span>
    <span>e</span> <span>=</span> <span>count</span> <span>%</span> <span>1</span><span>;</span> <span># modulus</span>
    <span>f</span> <span>=</span> <span>count</span> <span>^</span> <span>1</span><span>;</span> <span># exponentiation</span>

    <span>a</span> <span>+=</span> <span>1</span><span>;</span>
    <span>b</span> <span>-=</span> <span>1</span><span>;</span>
    <span>c</span> <span>*=</span> <span>1</span><span>;</span>
    <span>d</span> <span>/=</span> <span>1</span><span>;</span>
    <span>e</span> <span>%=</span> <span>1</span><span>;</span>
    <span>f</span> <span>^=</span> <span>1</span><span>;</span>
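
    <span># A small sketch with literal values, to show what the operators yield.</span>
    <span># Division is floating point, so wrap it in int() to truncate.</span>
    <span>print 7 / 2;</span>      <span># prints 3.5</span>
    <span>print int(7 / 2);</span> <span># prints 3</span>
    <span>print 7 % 2;</span>      <span># prints 1</span>
    <span>print 2 ^ 10;</span>     <span># prints 1024</span>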

    <span># Incrementing and decrementing by one</span>
    <span>a</span><span>++</span><span>;</span>
    <span>b</span><span>--</span><span>;</span>

    <span># As a prefix operator, it returns the incremented value</span>
    <span>++</span><span>a</span><span>;</span>
    <span>--</span><span>b</span><span>;</span>

    <span># Note that the semicolons are optional here: a newline also terminates a statement</span>

    <span># Control statements</span>
    <span>if</span> <span>(</span><span>count</span> <span>==</span> <span>0</span><span>)</span>
        <span>print</span> <span>"Starting with count of 0"</span><span>;</span>
    <span>else</span>
        <span>print</span> <span>"Huh?"</span><span>;</span>

    <span># Or you could use the ternary operator</span>
    <span>print</span> <span>(</span><span>count</span> <span>==</span> <span>0</span><span>)</span> <span>?</span> <span>"Starting with count of 0"</span> <span>:</span> <span>"Huh?"</span><span>;</span>

    <span># Blocks consisting of multiple lines use braces</span>
    <span>while</span> <span>(</span><span>a</span> <span>&lt;</span> <span>10</span><span>)</span> <span>{</span>
        <span>print</span> <span>"String concatenation is done"</span> <span>" with a series"</span> <span>" of"</span> <span>\</span>
            <span>" space-separated strings"</span><span>;</span>
        <span>print</span> <span>a</span><span>;</span>

        <span>a</span><span>++</span><span>;</span>
    <span>}</span>

    <span>for</span> <span>(</span><span>i</span> <span>=</span> <span>0</span><span>;</span> <span>i</span> <span>&lt;</span> <span>10</span><span>;</span> <span>i</span><span>++</span><span>)</span>
        <span>print</span> <span>"Good ol' for loop"</span><span>;</span>

    <span># As for comparisons, they're the standards:</span>
    <span># a &lt; b   # Less than</span>
    <span># a &lt;= b  # Less than or equal</span>
    <span># a != b  # Not equal</span>
    <span># a == b  # Equal</span>
    <span># a &gt; b   # Greater than</span>
    <span># a &gt;= b  # Greater than or equal</span>

    <span># Logical operators as well</span>
    <span># a &amp;&amp; b  # AND</span>
    <span># a || b  # OR</span>

    <span># In addition, there's the super useful regular expression match</span>
    <span>if</span> <span>(</span><span>"foo"</span> <span>~</span> <span>"^fo+$"</span><span>)</span>
        <span>print</span> <span>"Fooey!"</span><span>;</span>
    <span>if</span> <span>(</span><span>"boo"</span> <span>!~</span> <span>"^fo+$"</span><span>)</span>
        <span>print</span> <span>"Boo!"</span><span>;</span>

    <span># Arrays</span>
    <span>arr</span><span>[</span><span>0</span><span>]</span> <span>=</span> <span>"foo"</span><span>;</span>
    <span>arr</span><span>[</span><span>1</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>

    <span># You can also initialize an array with the built-in function split()</span>

    <span>n</span> <span>=</span> <span>split</span><span>(</span><span>"foo:bar:baz"</span><span>,</span> <span>arr</span><span>,</span> <span>":"</span><span>);</span>

    <span># You also have associative arrays (actually, they're all associative arrays)</span>
    <span>assoc</span><span>[</span><span>"foo"</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>
    <span>assoc</span><span>[</span><span>"bar"</span><span>]</span> <span>=</span> <span>"baz"</span><span>;</span>

    <span># And multi-dimensional arrays, with some limitations I won't mention here</span>
    <span>multidim</span><span>[</span><span>0</span><span>,</span><span>0</span><span>]</span> <span>=</span> <span>"foo"</span><span>;</span>
    <span>multidim</span><span>[</span><span>0</span><span>,</span><span>1</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>
    <span>multidim</span><span>[</span><span>1</span><span>,</span><span>0</span><span>]</span> <span>=</span> <span>"baz"</span><span>;</span>
    <span>multidim</span><span>[</span><span>1</span><span>,</span><span>1</span><span>]</span> <span>=</span> <span>"boo"</span><span>;</span>
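
    <span># A brief sketch of how this works: AWK joins the subscripts into a single</span>
    <span># string key using the built-in SUBSEP variable, so multidim[0,1] is stored</span>
    <span># under the key 0 SUBSEP 1. The array name parts is only illustrative.</span>
    <span>if ((0, 1) in multidim)</span>
        <span>print "multidim[0,1] is set";</span>
    <span>for (key in multidim) {</span>
        <span>split(key, parts, SUBSEP);</span> <span># recover the two subscripts</span>
        <span>print parts[1], parts[2], multidim[key];</span>
    <span>}</span>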

    <span># You can test for array membership</span>
    <span>if</span> <span>(</span><span>"foo"</span> <span>in</span> <span>assoc</span><span>)</span>
        <span>print</span> <span>"Fooey!"</span><span>;</span>

    <span># You can also use the 'in' operator to traverse the keys of an array</span>
    <span>for</span> <span>(</span><span>key</span> <span>in</span> <span>assoc</span><span>)</span>
        <span>print</span> <span>assoc</span><span>[</span><span>key</span><span>];</span>

    <span># The command line is in a special array called ARGV</span>
    <span>for</span> <span>(</span><span>argnum</span> <span>in</span> <span>ARGV</span><span>)</span>
        <span>print</span> <span>ARGV</span><span>[</span><span>argnum</span><span>];</span>
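
    <span># for-in visits keys in an unspecified order. As a sketch, a counted loop</span>
    <span># over ARGC (described below) walks the arguments in order instead:</span>
    <span>for (i = 0; i &lt; ARGC; i++)</span>
        <span>print i, ARGV[i];</span>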

    <span># You can remove elements of an array</span>
    <span># This is particularly useful to prevent AWK from assuming the arguments</span>
    <span># are files for it to process</span>
    <span>delete</span> <span>ARGV</span><span>[</span><span>1</span><span>];</span>

    <span># The number of command line arguments is in a variable called ARGC</span>
    <span>print</span> <span>ARGC</span><span>;</span>

    <span># AWK has several built-in functions. They fall into three categories. I'll</span>
    <span># demonstrate each of them in their own functions, defined later.</span>

    <span>return_value</span> <span>=</span> <span>arithmetic_functions</span><span>(</span><span>a</span><span>,</span> <span>b</span><span>,</span> <span>c</span><span>);</span>
    <span>string_functions</span><span>();</span>
    <span>io_functions</span><span>();</span>
<span>}</span>

<span># Here's how you define a function</span>
<span>function</span> <span>arithmetic_functions</span><span>(</span><span>a</span><span>,</span> <span>b</span><span>,</span> <span>c</span><span>,</span>     <span>localvar</span><span>)</span> <span>{</span>

    <span># Probably the most annoying part of AWK is that there are no local</span>
    <span># variables. Everything is global. For short scripts, this is fine, even</span>
    <span># useful, but for longer scripts, this can be a problem.</span>

    <span># There is a work-around (ahem, hack). Function arguments are local to the</span>
    <span># function, and AWK allows you to define more function arguments than it</span>
    <span># needs. So just stick local variables in the function declaration, like I</span>
    <span># did above. As a convention, stick in some extra whitespace to distinguish</span>
    <span># between actual function parameters and local variables. In this example,</span>
    <span># a, b, and c are actual parameters, while localvar is merely a local variable.</span>

    <span># Now, to demonstrate the arithmetic functions</span>

    <span># Most AWK implementations have some standard trig functions</span>
    <span>localvar</span> <span>=</span> <span>sin</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>cos</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>atan2</span><span>(</span><span>b</span><span>,</span> <span>a</span><span>);</span> <span># arc tangent of b / a</span>

    <span># And logarithmic stuff</span>
    <span>localvar</span> <span>=</span> <span>exp</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>log</span><span>(</span><span>a</span><span>);</span>

    <span># Square root</span>
    <span>localvar</span> <span>=</span> <span>sqrt</span><span>(</span><span>a</span><span>);</span>

    <span># Truncate floating point to integer</span>
    <span>localvar</span> <span>=</span> <span>int</span><span>(</span><span>5.34</span><span>);</span> <span># localvar =&gt; 5</span>

    <span># Random numbers</span>
    <span>srand</span><span>();</span> <span># Supply a seed as an argument. By default, it uses the time of day</span>
    <span>localvar</span> <span>=</span> <span>rand</span><span>();</span> <span># Random number between 0 and 1.</span>

    <span># Here's how to return a value</span>
    <span>return</span> <span>localvar</span><span>;</span>
<span>}</span>
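
<span># A minimal sketch of the local-variable convention described above (the</span>
<span># function name sum_to is hypothetical). i and total are local because they</span>
<span># are extra parameters that callers never pass; they start out empty.</span>
<span>function sum_to(n,    i, total) {</span>
    <span>for (i = 1; i &lt;= n; i++)</span>
        <span>total += i;</span>
    <span>return total;</span> <span># sum_to(4) returns 10</span>
<span>}</span>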

<span>function</span> <span>string_functions</span><span>(</span>    <span>localvar</span><span>,</span> <span>arr</span><span>)</span> <span>{</span>

    <span># AWK, being a string-processing language, has several string-related</span>
    <span># functions, many of which rely heavily on regular expressions.</span>

    <span># Search and replace, first instance (sub) or all instances (gsub)</span>
    <span># Both return number of matches replaced</span>
    <span>localvar</span> <span>=</span> <span>"fooooobar"</span><span>;</span>
    <span>sub</span><span>(</span><span>"fo+"</span><span>,</span> <span>"Meet me at the "</span><span>,</span> <span>localvar</span><span>);</span> <span># localvar =&gt; "Meet me at the bar"</span>
    <span>gsub</span><span>(</span><span>"e+"</span><span>,</span> <span>"."</span><span>,</span> <span>localvar</span><span>);</span> <span># localvar =&gt; "M.t m. at th. bar"</span>

    <span># Search for a string that matches a regular expression</span>
    <span># index() does the same thing, but doesn't allow a regular expression</span>
    <span>match</span><span>(</span><span>localvar</span><span>,</span> <span>"t"</span><span>);</span> <span># =&gt; 3, since the first 't' is the third character</span>
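
    <span># As a short sketch: match() also sets the built-in variables RSTART and</span>
    <span># RLENGTH, so the matched text itself can be pulled out with substr()</span>
    <span>if (match("foobar", /o+/))</span>
        <span>print substr("foobar", RSTART, RLENGTH);</span> <span># prints "oo"</span>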

    <span># Split on a delimiter</span>
    <span>n</span> <span>=</span> <span>split</span><span>(</span><span>"foo-bar-baz"</span><span>,</span> <span>arr</span><span>,</span> <span>"-"</span><span>);</span> <span># arr[1] = "foo"; arr[2] = "bar"; arr[3] = "baz"; n = 3</span>

    <span># Other useful stuff</span>
    <span>sprintf</span><span>(</span><span>"%s %d %d %d"</span><span>,</span> <span>"Testing"</span><span>,</span> <span>1</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span> <span># =&gt; "Testing 1 2 3"</span>
    <span>substr</span><span>(</span><span>"foobar"</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span> <span># =&gt; "oob"</span>
    <span>substr</span><span>(</span><span>"foobar"</span><span>,</span> <span>4</span><span>);</span> <span># =&gt; "bar"</span>
    <span>length</span><span>(</span><span>"foo"</span><span>);</span> <span># =&gt; 3</span>
    <span>tolower</span><span>(</span><span>"FOO"</span><span>);</span> <span># =&gt; "foo"</span>
    <span>toupper</span><span>(</span><span>"foo"</span><span>);</span> <span># =&gt; "FOO"</span>
<span>}</span>

<span>function</span> <span>io_functions</span><span>(</span>    <span>localvar</span><span>)</span> <span>{</span>

    <span># You've already seen print</span>
    <span>print</span> <span>"Hello world"</span><span>;</span>

    <span># There's also printf</span>
    <span>printf</span><span>(</span><span>"%s %d %d %d\n"</span><span>,</span> <span>"Testing"</span><span>,</span> <span>1</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span>

    <span># AWK doesn't have file handles, per se. It will automatically open a file</span>
    <span># handle for you when you use something that needs one. The string you used</span>
    <span># for this can be treated as a file handle, for purposes of I/O. This makes</span>
    <span># it feel sort of like shell scripting, but to get the same output, the string</span>
    <span># must match exactly, so use a variable:</span>

    <span>outfile</span> <span>=</span> <span>"/tmp/foobar.txt"</span><span>;</span>

    <span>print</span> <span>"foobar"</span> <span>&gt;</span> <span>outfile</span><span>;</span>

    <span># Now the string outfile is a file handle. You can close it:</span>
    <span>close</span><span>(</span><span>outfile</span><span>);</span>

    <span># Here's how you run something in the shell</span>
    <span>system</span><span>(</span><span>"echo foobar"</span><span>);</span> <span># =&gt; prints foobar</span>

    <span># Reads a line from standard input and stores in localvar</span>
    <span>getline</span> <span>localvar</span><span>;</span>

    <span># Reads a line from a pipe (again, use a string so you close it properly)</span>
    <span>cmd</span> <span>=</span> <span>"echo foobar"</span><span>;</span>
    <span>cmd</span> <span>|</span> <span>getline</span> <span>localvar</span><span>;</span> <span># localvar =&gt; "foobar"</span>
    <span>close</span><span>(</span><span>cmd</span><span>);</span>

    <span># Reads a line from a file and stores in localvar</span>
    <span>infile</span> <span>=</span> <span>"/tmp/foobar.txt"</span><span>;</span>
    <span>getline</span> <span>localvar</span> <span>&lt;</span> <span>infile</span><span>;</span> 
    <span>close</span><span>(</span><span>infile</span><span>);</span>
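
    <span># A sketch of reading a whole file: getline returns 1 on success, 0 at end</span>
    <span># of file and -1 on error, so compare against 0 rather than testing for</span>
    <span># truth. The variable line is illustrative (and global, as written here).</span>
    <span>while ((getline line &lt; infile) &gt; 0)</span>
        <span>print line;</span>
    <span>close(infile);</span>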
<span>}</span>

<span># As I said at the beginning, AWK programs consist of a collection of patterns</span>
<span># and actions. You've already seen the BEGIN pattern. Other</span>
<span># patterns are used only if you're processing lines from files or standard</span>
<span># input.</span>
<span>#</span>
<span># When you pass arguments to AWK, they are treated as file names to process.</span>
<span># It will process them all, in order. Think of it like an implicit for loop,</span>
<span># iterating over the lines in these files. These patterns and actions are like</span>
<span># switch statements inside the loop. </span>

<span>/^fo+bar$/</span> <span>{</span>

    <span># This action will execute for every line that matches the regular</span>
    <span># expression, /^fo+bar$/, and will be skipped for any line that fails to</span>
    <span># match it. Let's just print the line:</span>

    <span>print</span><span>;</span>

    <span># Whoa, no argument! That's because print has a default argument: $0.</span>
    <span># $0 is the name of the current line being processed. It is created</span>
    <span># automatically for you.</span>

    <span># You can probably guess there are other $ variables. Every line is</span>
    <span># implicitly split before every action is called, much like the shell</span>
    <span># does. And, like the shell, each field can be accessed with a dollar sign</span>

    <span># This will print the second and fourth fields in the line</span>
    <span>print</span> <span>$</span><span>2</span><span>,</span> <span>$</span><span>4</span><span>;</span>

    <span># AWK automatically defines many other variables to help you inspect and</span>
    <span># process each line. The most important one is NF</span>

    <span># Prints the number of fields on this line</span>
    <span>print</span> <span>NF</span><span>;</span>

    <span># Print the last field on this line</span>
    <span>print</span> <span>$</span><span>NF</span><span>;</span>
<span>}</span>

<span># Every pattern is actually a true/false test. The regular expression in the</span>
<span># last pattern is also a true/false test, but part of it was hidden. If you</span>
<span># don't give it a string to test, it will assume $0, the line that it's</span>
<span># currently processing. Thus, the complete version of it is this:</span>

<span>$</span><span>0</span> <span>~</span> <span>/^fo+bar$/</span> <span>{</span>
    <span>print</span> <span>"Equivalent to the last pattern"</span><span>;</span>
<span>}</span>

<span>a</span> <span>&gt;</span> <span>0</span> <span>{</span>
    <span># This will execute once for each line, as long as a is positive</span>
<span>}</span>

<span># You get the idea. Processing text files, reading in a line at a time, and</span>
<span># doing something with it, particularly splitting on a delimiter, is so common</span>
<span># in UNIX that AWK is a scripting language that does all of it for you, without</span>
<span># you needing to ask. All you have to do is write the patterns and actions</span>
<span># based on what you expect of the input, and what you want to do with it.</span>

<span># Here's a quick example of a simple script, the sort of thing AWK is perfect</span>
<span># for. It will read a name from standard input and then will print the average</span>
<span># age of everyone with that first name. Let's say you supply as an argument the</span>
<span># name of this data file:</span>
<span>#</span>
<span># Bob Jones 32</span>
<span># Jane Doe 22</span>
<span># Steve Stevens 83</span>
<span># Bob Smith 29</span>
<span># Bob Barker 72</span>
<span>#</span>
<span># Here's the script:</span>

<span>BEGIN</span> <span>{</span>

    <span># First, ask the user for the name</span>
    <span>print</span> <span>"What name would you like the average age for?"</span><span>;</span>

    <span># Get a line from standard input, not from files on the command line</span>
    <span>getline</span> <span>name</span> <span>&lt;</span> <span>"/dev/stdin"</span><span>;</span>
<span>}</span>

<span># Now, match every line whose first field is the given name</span>
<span>$</span><span>1</span> <span>==</span> <span>name</span> <span>{</span>

    <span># Inside here, we have access to a number of useful variables, already</span>
    <span># pre-loaded for us:</span>
    <span># $0 is the entire line</span>
    <span># $3 is the third field, the age, which is what we're interested in here</span>
    <span># NF is the number of fields, which should be 3</span>
    <span># NR is the number of records (lines) seen so far</span>
    <span># FILENAME is the name of the file being processed</span>
    <span># FS is the field separator being used, which is " " here</span>
    <span># ...etc. There are plenty more, documented in the man page.</span>

    <span># Keep track of a running total and how many lines matched</span>
    <span>sum</span> <span>+=</span> <span>$</span><span>3</span><span>;</span>
    <span>nlines</span><span>++</span><span>;</span>
<span>}</span>

<span># Another special pattern is called END. It will run after processing all the</span>
<span># text files. Unlike BEGIN, it runs only once the input (standard input if no</span>
<span># files were named) is exhausted, after every record has been read and processed</span>
<span># according to the rules and actions you've provided. The purpose of it is</span>
<span># usually to output some kind of final report, or do something with the</span>
<span># aggregate of the data you've accumulated over the course of the script.</span>

<span>END</span> <span>{</span>
    <span>if</span> <span>(</span><span>nlines</span><span>)</span>
        <span>print</span> <span>"The average age for "</span> <span>name</span> <span>" is "</span> <span>sum</span> <span>/</span> <span>nlines</span><span>;</span>
<span>}</span>
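
<span># A usage sketch, assuming the three rules of this example (BEGIN, $1 == name</span>
<span># and END) are saved on their own as average.awk and the sample data as</span>
<span># people.txt; both file names are hypothetical:</span>
<span>#   echo Bob | awk -f average.awk people.txt</span>
<span># With the five sample records above, the Bob rows sum to 32 + 29 + 72 = 133,</span>
<span># so after the prompt the script prints:</span>
<span>#   The average age for Bob is 44.3333</span>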
</pre></div>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43863/snakemake-tutorials</guid>
	<pubDate>Mon, 09 May 2022 05:20:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43863/snakemake-tutorials</link>
	<title><![CDATA[Snakemake Tutorials !]]></title>
	<description><![CDATA[<p>A lesson introducing the Snakemake workflow system for bioinformatics analysis.</p>
<blockquote>
<h2 id="prerequisites">Prerequisites</h2>
<p>This is an intermediate lesson and assumes learners have already done some bioinformatics:</p>
<ul>
<li>Familiarity with the BASH command shell, including concepts like pipes, variables and loops.</li>
<li>Knowledge of bioinformatics fundamentals like the FASTQ file format and transcriptome sequencing, in order to understand the example workflow.</li>
</ul>
<p>No previous knowledge of Snakemake or workflow systems is required.</p>
<p>https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/index.html</p>
</blockquote><p>Address of the bookmark: <a href="https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/aio/index.html" rel="nofollow">https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/aio/index.html</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/18820/jrfsrf-at-university-of-calcutta</guid>
  <pubDate>Fri, 31 Oct 2014 08:53:10 -0500</pubDate>
  <link></link>
  <title><![CDATA[JRF/SRF at University of Calcutta]]></title>
  <description><![CDATA[
<p>Applications are invited to appear at a walk-in interview for one post of Junior Research Fellow in the DBT (DBT Twinning NER) sponsored project entitled “Protein folding kinetics is a selection force on shaping codon usage bias in the high expression genes”, to be held in the room of the HOD, Department of Biotechnology and the Coordinator, Dr. B. C. Guha Centre for Genetic Engineering and Biotechnology, University College of Science, 35 Ballygunge Circular Road, Kolkata 700019, on 12th November, 2014 at 3:00 p.m.</p>

<p>Essential qualifications: First class M. Sc. in any branch of life sciences and qualified CSIR-UGC NET/GATE Examination.</p>

<p>Desirable qualifications: Practical experience in biochemical and biophysical studies of proteins</p>

<p>Emoluments: as per DBT norms</p>

<p>The project is tenable for two years, initially for one year.</p>

<p>Age: Below 28 years (relaxable in the case of SC/ST/OBC/women candidates)</p>

<p>Candidates are requested to bring two sets of complete applications on plain paper furnishing bio-data and copies of attested certificates along with originals (for verification) on the date of interview.</p>

<p>No TA/DA is admissible for candidates appearing at the interview.</p>

<p>Dr. Rajat Banerjee<br />Assistant Professor<br />Department of Biotechnology and<br />Dr. B. C. Guha Centre for Genetic Engineering and Biotechnology<br />University College of Science<br />35, Ballygunge Circular Road<br />Kolkata 700019</p>

<p>Advertisement: www.caluniv.ac.in/news/jrf_biotech_2.pdf</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/21471/opening-for-raextended-srf-in-bioinformatics-project-by-dbt-at-bose-institute</guid>
  <pubDate>Sun, 01 Mar 2015 00:50:18 -0600</pubDate>
  <link></link>
  <title><![CDATA[Opening for RA/extended SRF in Bioinformatics project by DBT at Bose Institute]]></title>
  <description><![CDATA[
<p>The institute has evolved over the years into a multi-disciplinary research organization that stresses fundamental research in its pursuit of the advancement of knowledge in science and technology, while at the same time developing highly competent scientific manpower for the country. Its staff includes highly qualified and experienced scientists working in the biological, biochemical, chemical and physical sciences, placed in the long-established departments of Physics, Chemistry, Botany, Microbiology, Biochemistry, and Biophysics, and in the research sections on Plant Molecular &amp; Cellular Genetics, Animal Physiology, Immunotechnology and Environmental Science.</p>

<p>A walk-in interview will be held on 4th March 2015 at 11.30 A.M. in the Bioinformatics Centre of Bose Institute, P-1/12, C.I.T. Scheme VII-M, Kolkata-700054, for two (02) positions of Research Associate/Extended Senior Research Fellow in the following two DBT-sponsored projects running under the CoE-Bioinformatics, under the guidance of Prof. Pinakpani Chakrabarti, Bioinformatics Centre.</p>

<p>Position : RA/SRF<br />Project title : 1. “Centre of Excellence (CoE) in Bioinformatics at Bose Institute”; 2. “Setting up of National Facility on Interactive Graphics Computer System (IGCS) for Biomolecular Modeling, Molecular Dynamics &amp; Structures”</p>

<p>Desired Profile : Ph.D. degree in Biological or Chemical Sciences with in-depth understanding of protein structure and dynamics for the R.A. position. Those who have submitted their thesis can be considered for the Extended SRF position.<br />Preferred : Knowledge of computer programming and bioinformatics software.<br />Stipend : For R.A. - Rs. 22,000/- p.m., plus admissible H.R.A. and medical benefit. For Extended SRF - Rs. 20,000/- p.m., plus admissible H.R.A. and medical benefit.<br />Age : For R.A. - below 35 years; for Extended SRF - below 33 years<br />Interested and eligible candidates should appear before the Selection Committee with a typed application addressed to the Sr. Prof. &amp; In-Charge, Registrar's Office, Bose Institute, P-1/12, CIT Scheme VII-M, Kankurgachi, Kolkata-700054, along with bio-data giving details of qualifications (examination passed, year, division, and percentage of marks from Secondary onwards) and attested copies of certificates, mark-sheets and testimonials. Candidates should also bring the original mark-sheets, certificates etc. at the time of the interview.</p>

<p>Walk in Interview : 04.03.15</p>

<p>More at http://www.boseinst.ernet.in/ADVT/14/p_34.pdf</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/21619/research-associate-biotechnologyjrflab-assistant-indian-institute-of-vegetable-research-iivr-varanasi-uttar-pradesh</guid>
  <pubDate>Wed, 11 Mar 2015 08:59:27 -0500</pubDate>
  <link></link>
  <title><![CDATA[Research Associate Biotechnology/JRF/Lab. Assistant  Indian Institute of Vegetable Research (IIVR) - Varanasi, Uttar Pradesh]]></title>
  <description><![CDATA[
<p>F. No.: 2-19/2011-Adm.I </p>

<p>Research Associate Biotechnology /JRF / Lab. Assistant recruitment in Indian Institute of Vegetable Research </p>

<p>Project:<br />Genomics assisted selection of Solanum chilense introgression lines for enhancing drought tolerance in tomato <br />Post Name : Research Associate <br />Qualification : Ph.D. in Biotechnology/Bioinformatics/Genetics &amp; Plant Breeding. M. Tech in Computer Science with at least one research paper in a Science Citation Indexed journal. Desirable: Experience in bioinformatics and next-generation sequence data handling. Familiarity with Linux, R, Perl/Python or other programming languages. Willingness to travel to European partner centers. </p>

<p>Pay Scale : Rs. 36000 for 1st and 2nd year as per rules for Research Associate. Rs. 25000/- for 1st and 2nd year and Rs. 28000 as per rules for Junior Research Fellow. Rs. 7000/- for Lab. Assistant. </p>

<p>Age : Not more than 35 years for men and 40 years for women (relaxable for SC/ST/OBC/PH candidates as per rules) for Research Associate/Junior Research Fellow. Minimum age will be 21 years and maximum age will be 45 years (relaxable for SC/ST/OBC/PH candidates as per rules) for Lab. Assistant.</p>

<p>More at http://iivr.org.in/Job%20Oppurtunities/RA20.03.2015.pdf</p>
]]></description>
</item>

</channel>
</rss>