<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44279?offset=610</link>
	<atom:link href="https://bioinformaticsonline.com/related/44279?offset=610" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	
<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/10127/assistant-professor-at-sardar-patel-university</guid>
  <pubDate>Mon, 21 Apr 2014 21:03:55 -0500</pubDate>
  <link></link>
  <title><![CDATA[Assistant Professor at SARDAR PATEL UNIVERSITY]]></title>
  <description><![CDATA[
<p>SARDAR PATEL UNIVERSITY<br />Centre for Interdisciplinary Studies in Science and Technology</p>

<p>No.: SPU/CISST/Advt./2014-15/519</p>

<p>ADVERTISEMENT for Teaching Positions (Contractual)</p>

<p>Applications for the following Contractual Teaching Position are invited for Centre for Interdisciplinary Studies in Science and Technology (CISST), Sardar Patel University:</p>

<p>2. Assistant Professor (ONE) (Contractual)</p>

<p>For the subject of Bioinformatics</p>

<p>Qualifications:</p>

<p>(I) Good academic record as defined by the concerned university with at least 55 % marks (or an equivalent grade in a point scale wherever grading system is followed) at the Master’s level</p>

<p>(II) Ph.D. degree in the concerned subject or in a relevant interdisciplinary subject<br />from an Indian University, or NET/SLET clearance.</p>

<p>The contractual appointment carries total fixed emoluments of Rs. 30,000/- p.m., without any assurance of a permanent position or related benefits.</p>

<p>An application form in the prescribed proforma, available on the University website (www.spuvvn.edu), should be filled in completely in twelve copies, with self-attested copies of certificates of qualifications and experience. Only one copy of each mark sheet should be attached, with the first copy of the application form. All 12 (twelve) application forms should be sent to the Registrar’s office along with a Demand Draft for the application fee of Rs. 250/- (non-refundable) in favour of “REGISTRAR, SARDAR PATEL UNIVERSITY, VALLABH VIDYANAGAR”. S.C. and S.T. category candidates need not pay the application fee.</p>

<p>Applicants who are in service should apply through their present employers. Candidates called for interview shall be required to attend at their own cost.</p>

<p>In the absence of a suitable candidate, the University may relax the eligibility criteria for a conditional appointment.</p>

<p>The last date of receipt of application by the University is 30th April, 2014</p>

<p>Advertisement: www.spuvvn.edu/careers/CISST%20Advt.%20April%202014.pdf</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43329/postdoc-position-at-kiel-university-germany</guid>
  <pubDate>Sat, 28 Aug 2021 01:16:55 -0500</pubDate>
  <link></link>
  <title><![CDATA[Postdoc position at Kiel University, Germany]]></title>
  <description><![CDATA[
<p>In the Genomic Microbiology Group of Prof. Tal Dagan at the Institute<br />of Microbiology at Kiel University, Germany, a</p>

<p>Postdoc position (m/w/d)</p>

<p>in the field of computational evolutionary microbiology is available,<br />initially for a fixed term of 36 months, starting at the earliest possible<br />date. The weekly working time corresponds to 100% of a full-time position.<br />If the legal requirements under collective bargaining law are met, the<br />salary grouping is up to pay scale 13 TV-L. The teaching obligation<br />amounts to 4 hours.</p>

<p>The Genomic Microbiology Group research interests are focused on<br />microbial genome evolution with an emphasis on the study of lateral gene<br />transfer. In our research we use both computational and experimental<br />approaches (see www.uni-kiel.de/genomik). The position offers the<br />opportunity to develop an independent research profile within the group<br />research focus. The successful applicant is expected to be involved<br />in teaching of bioinformatics and molecular evolution, including the<br />development of teaching materials (lectures/exercises/short videos).</p>

<p>Your profile:<br />· Doctoral or PhD degree in Molecular Evolution, Bioinformatics or<br />related fields.<br />· Knowledge and experience in programming (e.g., Python) and<br />biostatistical analysis (e.g., with R or MatLab).<br />· Any of the following expertise is an advantage: the analysis of<br />genomic or transcriptomic data, phylogenetic reconstruction,<br />comparative genomics.<br />· Good oral and written communication skills in English.<br />· Ability to teach in German is an advantage (alternatively, an<br />indication to do so from the 2nd year on).<br />· Skills and motivation to communicate and interact with other<br />scientists.<br /> <br />The Christian-Albrechts-University sees itself as a modern and<br />cosmopolitan employer. We welcome your application regardless of your age,<br />gender, cultural and social background, religion, ideology, disability<br />or sexual identity. We promote equality of the sexes.</p>

<p>The Christian-Albrechts-University is committed to the employment of<br />people with disabilities. Preference will be given to applications from<br />severely handicapped persons and persons of equal standing, provided<br />they are suitable.</p>

<p>We expressly welcome applications from people with a migration background.</p>

<p>For enquiries regarding the position, teaching obligations and research<br />topic please contact Prof. Tal Dagan: tdagan@ifam.uni-kiel.de.</p>

<p>Applications should be submitted by email to Mrs. Haacks<br />(dhaacks@ifam.uni-kiel.de) as a single PDF and include: (1) a letter of<br />motivation (max 1 page, Arial 11, line spacing 1.15), (2) CV, (3) PhD<br />certificate. Please use 'GMG postdoc application - [your name]'<br />as a subject.</p>

<p>Please refrain from sending us application photos.</p>

<p>Application deadline: August 31, 2021, or until the position is<br />filled. Interviews will take place during September/October 2021. The<br />planned starting date for the position is flexible (but in 2021).</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/10394/bioinformatics-protocols</guid>
	<pubDate>Mon, 05 May 2014 10:21:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/10394/bioinformatics-protocols</link>
	<title><![CDATA[Bioinformatics Protocols]]></title>
	<description><![CDATA[<h2><span> RNA Seq </span></h2>
<p><strong> Basic Galaxy Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/pub?id=1KbTiBHtvHLfPRZ39AY3uriazrINA8TJzgjjwn1zPP7Y">RNA-Seq tutorial</a> based on <a href="http://www.nature.com/protocolexchange/protocols/2327">Trapnell et al. (2012)</a> <em>Nature Protocols</em></li>
</ul>
<dl><dd>In this tutorial we cover the concepts of <a href="http://en.wikipedia.org/wiki/RNA-Seq">RNA-Seq</a> differential gene expression (DGE) analysis using a very small synthetic dataset from a well studied organism.</dd></dl>
<p><strong> Advanced Galaxy Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1fQ1XfeOKhezJUDTzMXtZVY20c3RGoHe-HLvFOGzqU4s/pub">RNA-Seq (Advanced) Tutorial</a></li>
</ul>
<dl><dd>In this tutorial we compare the performance of three statistically-based differential expression tools:</dd><dd>* CuffDiff</dd><dd>* EdgeR</dd><dd>* DESeq2</dd></dl>
<p><strong> Advanced Command Line Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1ayJXtgBP1OXtnV7o7lq4QHKMNk5SdPHFq4hGkqndBtI/pub">Graphical Output with CummeRbund</a> introduces some basic commands using the cummeRbund package of the R programming language</li>
</ul>
<dl><dd>You will need to install R, RStudio and cummeRbund on your PC (explained in the Tutorial). You will learn how to produce graphical output from RNA-Seq analysis previously done using a Cuffdiff analysis.</dd></dl>
<h2><span> Variant Detection </span></h2>
<p><strong> Basic Galaxy Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/pub?id=1ZRzrjjOCvtAu3m-IKL-rbJ1f4On60dDL_IEwG7oejdI">Variant Detection tutorial</a></li>
</ul>
<dl><dd>In this tutorial we cover the concepts of detecting small variants (SNVs and indels) in human genomic DNA using a small set of reads from chromosome 22.</dd></dl>
<p><strong>Advanced Galaxy Tutorial</strong></p>
<ul>
<li><a href="https://docs.google.com/document/pub?id=1CuKkKylVDb03tnN7RSWl5EUzleetn0ctjmvaidPKLxM">Variant Detection (Advanced) Tutorial</a></li>
</ul>
<dl><dd>In this tutorial we compare the performance of three statistically-based variant detection tools:</dd><dd>* SAMtools: Mpileup</dd><dd>* GATK: Unified Genotyper</dd><dd>* FreeBayes</dd><dd>Each of these tools takes as its input a BAM file of aligned reads and generates a list of likely variants in VCF format</dd></dl>
<p><strong>Pipelines</strong> are for those who are comfortable using the UNIX command line, and they often allow more control over branching and iteration logic.</p>
<ul>
<li><a href="https://github.com/claresloggett/variant_calling_pipeline">WGS/exome GATK-based variant calling pipeline</a></li>
</ul>
<dl><dd>This is a basic variant-calling and annotation pipeline developed at the Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne. It is based around BWA, GATK and ENSEMBL and was originally designed for human (or similar) data. The master branch is configured for WGS data; there is an exome branch configured for variant calling in exome data.</dd><dd>To run the pipeline you will need Rubra: <a href="https://github.com/bjpop/rubra">https://github.com/bjpop/rubra</a>. Rubra uses the python Ruffus library: <a href="http://www.ruffus.org.uk/">http://www.ruffus.org.uk/</a>.</dd></dl>
<p><strong>Protocols</strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1lfDYNzHjfDA1pHTHd-0w3xHhg7L4TipT1gRfzgiV8es/pub">Familial Variant Calling</a></li>
</ul>
<dl><dd>In this protocol we discuss and outline the process of calling familial related mutations.</dd></dl>
<ul>
<li><a href="https://docs.google.com/document/d/1PIhm8NrFGaSK0hxpDcp8wUOz11ZkOaHIrpnJshMgDec/pub">Somatic Variant Calling</a></li>
</ul>
<dl><dd>In this protocol we discuss and outline the process of identifying somatic variants or mutations.</dd></dl>
<h2><span> Assembly </span></h2>
<p><strong> Basic Galaxy Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/pub?id=1N3AB9ptISUu4zULqe1kXpVF0BDyGb5f5yzxWSJd_WNM">Genome assembly tutorial</a></li>
</ul>
<dl><dd>In this tutorial we carry out de novo assembly of a microbial genome. We have also written a <a href="https://docs.google.com/document/d/1xs-TI5MejQARqo0pcocGlymsXldwJbJII890gnmjI0o/pub">De novo Genome Assembly for Illumina Data</a> Protocol for a more generic description of the method.</dd></dl>
<p><strong> Protocol </strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1xs-TI5MejQARqo0pcocGlymsXldwJbJII890gnmjI0o/pub">De novo Genome Assembly for Illumina Data</a></li>
</ul>
<dl><dd>In this protocol we discuss and outline the process of de novo assembly for small to medium sized genomes. Use our <a href="https://docs.google.com/document/pub?id=1N3AB9ptISUu4zULqe1kXpVF0BDyGb5f5yzxWSJd_WNM">Genome assembly tutorial</a> to learn a specific case of using Galaxy to carry out de novo assembly of a microbial genome.</dd></dl>
<h2><span> Small RNAs </span></h2>
<p><strong> Basic Galaxy Tutorial </strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1WAObJr7M0m8U-2ku-0Y0Sdt_IHmqd1h8WaJHPhnJ1lM/pub">Quality control for small RNA</a></li>
</ul>
<dl><dd>This tutorial covers initial steps of the workflow for analysis of short RNA expression such as a quality control of the raw reads, processing of the raw reads for the subsequent analysis and initial quality assessment of the library.</dd></dl>
<h2><span> ChIP Seq </span></h2>
<p><strong> Protocol </strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1UPJC8dsiDeP5R9MH9U0IvoDgPF2Q3EOstAuzS3e6WCE/pub">ChIP-Seq</a></li>
</ul>
<dl><dd>In this protocol we discuss ChIP-Seq: a method to analyze the interaction between proteins and DNA.</dd></dl>
<h2><span> Amplicons </span></h2>
<p><strong>Protocol</strong></p>
<ul>
<li><a href="https://docs.google.com/document/d/1uW7JzxG86QzS92hTyeuNsLhX_d1XFbaZPSjh7jWxcSg/pub">Amplicon Alignment</a></li>
</ul>
<dl><dd>In this protocol we discuss and outline the process of aligning custom amplicons using primers for high precision.</dd></dl>
<h2><span> Learn Galaxy </span></h2>
<p><a href="https://docs.google.com/document/d/1wsdJDYfjZVg2uJxm9AHi_j0mY3X1M1F4gB-elkuYL7c/pub">Introduction to Galaxy,</a> for those who are very new to Galaxy.</p>
<p><a href="https://docs.google.com/document/d/1t7vVqa3mdeZYPv5-8hiHBFBYhNiynV_3mWByno9-wUM/pub">Using Histories and Workflows,</a> for those with some Galaxy knowledge.</p>
<p>The Galaxy project website has many <a href="http://wiki.galaxyproject.org/Learn">tutorials</a> and <a href="http://wiki.galaxyproject.org/Learn/Screencasts">screencasts</a> about using Galaxy and the tools, and developing new tools.</p><p>Address of the bookmark: <a href="https://genome.edu.au/wiki/Learn" rel="nofollow">https://genome.edu.au/wiki/Learn</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44516/16srna-database-download</guid>
	<pubDate>Wed, 24 Apr 2024 04:33:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44516/16srna-database-download</link>
	<title><![CDATA[16S rRNA Database Download]]></title>
	<description><![CDATA[<p>Downloading 16S rRNA databases can be crucial for various bioinformatics analyses, especially in microbiome research. However, it's important to note that databases can vary based on your specific needs, such as the taxonomic coverage you require or the type of analysis you're performing. Here's a general guideline on how you can obtain 16S rRNA databases:</p><ol>
<li>
<p><span>NCBI (National Center for Biotechnology Information)</span>:</p>
<ul>
<li>NCBI provides various databases related to genetic information, including 16S rRNA sequences.</li>
<li>You can access the 16S ribosomal RNA sequences from NCBI's Nucleotide database (<a href="https://www.ncbi.nlm.nih.gov/nucleotide/" target="_new">https://www.ncbi.nlm.nih.gov/nucleotide/</a>).</li>
<li>Perform a search using keywords like "16S rRNA" or specific bacterial names to find relevant sequences.</li>
<li>You can download sequences individually or in batches using the provided tools.</li>
</ul>
</li>
<li>
<p><span>GreenGenes</span>:</p>
<ul>
<li>GreenGenes is a widely used 16S rRNA gene sequence database.</li>
<li>You can access it at <a href="http://greengenes.secondgenome.com/" target="_new">http://greengenes.secondgenome.com/</a>.</li>
<li>GreenGenes provides precompiled databases for various purposes, including classification, alignment, and phylogenetic analysis.</li>
</ul>
</li>
<li>
<p><span>SILVA</span>:</p>
<ul>
<li>SILVA (<a href="https://www.arb-silva.de/" target="_new">https://www.arb-silva.de/</a>) is another comprehensive database for ribosomal RNA (rRNA) sequences.</li>
<li>It covers not only 16S rRNA but also other ribosomal RNA sequences.</li>
<li>SILVA provides precompiled databases for various purposes, including taxonomic classification and alignment.</li>
</ul>
</li>
<li>
<p><span>Ribosomal Database Project (RDP)</span>:</p>
<ul>
<li>RDP (<a href="http://rdp.cme.msu.edu/" target="_new">http://rdp.cme.msu.edu/</a>) is a curated database that offers 16S rRNA sequences.</li>
<li>It provides tools for sequence analysis and classification.</li>
<li>You can download sequences and taxonomy information from their website.</li>
</ul>
</li>
<li>
<p><span>QIIME (Quantitative Insights Into Microbial Ecology)</span>:</p>
<ul>
<li>QIIME (<a href="https://qiime2.org/" target="_new">https://qiime2.org/</a>) is a widely used bioinformatics platform for microbiome analysis.</li>
<li>It provides tools for analyzing microbial communities, including processing 16S rRNA sequences.</li>
<li>QIIME often includes its own preprocessed 16S rRNA databases that can be used for analysis within the platform.</li>
</ul>
</li>
</ol><p>Before downloading any database, make sure to read the terms of use and citation requirements, as some databases may have specific usage policies. Additionally, consider the compatibility of the database with your analysis pipeline and software tools.</p><p>NCBI 16S rRNA database location:&nbsp;ftp://ftp.ncbi.nih.gov/blast/db/16SMicrobial.tar.gz</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/10457/assistant-professor-bio-informatics-at-health-and-family-welfare-department-medical-education-in-raipur</guid>
  <pubDate>Wed, 07 May 2014 00:08:38 -0500</pubDate>
  <link></link>
  <title><![CDATA[Assistant Professor (Bio-Informatics) at Health and Family Welfare Department (Medical Education) in Raipur]]></title>
  <description><![CDATA[
<p>Advertisement No.05/2014/ Exam/Dated 17/04/2014</p>

<p>No of vacancies: 01</p>

<p>Pay scale: Rs. 15600 – 39100 + 6600/-</p>

<p>Essential Academic Qualifications / Experience : Good academic record as defined by the concerned university with at least 55% marks (or an equivalent grade in a point scale wherever grading system is followed) at the Master's Degree level in a relevant subject from an Indian University, or an equivalent degree from an accredited foreign university.</p>

<p>Besides fulfilling the above qualifications, the candidate must have cleared the National Eligibility Test (NET) conducted by the UGC, CSIR or similar test accredited by the UGC like SLET/ SET.</p>

<p>Notwithstanding anything contained in sub-clauses (a) and (b) to this Clause, candidates, who are, or have been awarded a Ph.D. Degree in accordance with the University Grants Commission (Minimum Standards and Procedure for Award of Ph.D. Degree) Regulations, 2009, shall be exempted from the requirement of the minimum eligibility condition of NET/SLET/SET for recruitment and appointment of Assistant Professor or equivalent positions in Universities/Colleges/Institutions.</p>

<p>NET/SLET/SET shall also not be required for such Masters Programmes in disciplines for which NET/SLET/SET is not conducted.</p>

<p>Apply online: http://www.psc.cg.gov.in/htm/OA_ME2014.html</p>

<p>Last Date for Online Registration: 22/05/2014</p>

<p>For more details: http://www.psc.cg.gov.in/pdf/Advertisement/ADV_ME2014.pdf</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/41899/stay-at-home-revbayes-workshop</guid>
  <pubDate>Sat, 20 Jun 2020 11:53:24 -0500</pubDate>
  <link></link>
  <title><![CDATA[Stay-at-Home RevBayes Workshop]]></title>
  <description><![CDATA[
<p>Stay-at-Home RevBayes Workshop<br />Location: Anywhere (online-only event)<br />Dates: July 13, 2020 to August 12, 2020<br />Instructors: Joëlle Barido-Sottani, Walker Pett, Josh Justison, Wade Dismukes, Luiza Fabreti, Tracy Heath, Jeremy M. Brown, Rosana Zenil-Ferguson<br />Register: https://iastate.qualtrics.com/jfe/form/SV_02sCYRWbxYK9I5D</p>

<p>Description<br />This free online-only RevBayes workshop will provide an introduction to the theory and use of RevBayes, with a focus on (1) tree inference from molecular data, (2) analyses combining fossil and extant taxa, and (3) evaluating MCMC performance, with advanced topics including assessing model adequacy and macroevolutionary analyses. Additional topics may be added depending on the interests of selected participants. The format will be a combination of interactive video sessions (via Zoom or similar tools), real-time discussions over Slack, self-guided tutorials, and pre-recorded videos.</p>

<p>The initial session will resolve technical issues and present the basics of using RevBayes. Participants will then be expected to work through several tutorials on their own schedule, with the help of pre-recorded materials. A Slack forum will be open for questions and issues. The workshop will conclude with several online Q&amp;A sessions with the instructors. The dates for the interactive sessions are currently tentative and may be adjusted depending on the schedules of the participants and instructors.</p>

<p>We are hoping to identify up to 15 participants for this online course. While we hope we are able to accommodate everyone who applies, we realize that this may not be possible because of time-zones and availability. If the number of applicants exceeds our capacity, we hope to organize a second round of sessions later in the year. Participants will not be charged for the course, but we will request that they commit to completing the tutorials and attending a majority of interactive sessions.</p>

<p>To apply to this course, please go to the registration form and submit your application by July 6, 2020.</p>

<p>More at https://revbayes.github.io/workshops/online2020.html</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/10659/gps-dna-tracking-university-of-sheffield</guid>
	<pubDate>Sat, 10 May 2014 04:33:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/10659/gps-dna-tracking-university-of-sheffield</link>
	<title><![CDATA[GPS DNA tracking - University of Sheffield]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/Aap-s1kle4Q" frameborder="0" allowfullscreen></iframe>University of Sheffield geneticist and bioinformatics expert Dr Eran Elhaik demonstrates the power of his new DNA research, which allows people to discover their genetic homeland from 1000 years ago. Find out more about our biological research here http://www.sheffield.ac.uk/aps]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40946/free-genomics-data</guid>
	<pubDate>Fri, 07 Feb 2020 14:08:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40946/free-genomics-data</link>
	<title><![CDATA[Free Genomics data !]]></title>
	<description><![CDATA[<p><span>The specimens were collected by the Oxford Wytham Woods and Edinburgh Lohse lab teams. DNA extraction and sequencing was carried out by the Sanger Institute Scientific Operations teams. Assemblies were carried out by the Tree of Life team (Shane McCarthy) and colleagues in Pacific Biosciences (Jonas Korlach).</span></p>
<p><a href="https://www.darwintreeoflife.org/an-initial-set-of-raw-genome-assemblies-from-the-darwin-tree-of-life-project/">https://www.darwintreeoflife.org/an-initial-set-of-raw-genome-assemblies-from-the-darwin-tree-of-life-project/</a></p><p>Address of the bookmark: <a href="https://www.darwintreeoflife.org/an-initial-set-of-raw-genome-assemblies-from-the-darwin-tree-of-life-project/" rel="nofollow">https://www.darwintreeoflife.org/an-initial-set-of-raw-genome-assemblies-from-the-darwin-tree-of-life-project/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/10741/managing-and-analyzing-next-generation-sequence-data</guid>
	<pubDate>Sat, 10 May 2014 06:28:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/10741/managing-and-analyzing-next-generation-sequence-data</link>
	<title><![CDATA[Managing and Analyzing Next-Generation Sequence Data]]></title>
	<description><![CDATA[<p>Centralized Bioinformatics Core Facilities provide shared resources for the computational and IT requirements of the investigators in their department or institution. As such, they must be able to effectively react to new types of experimental technology. Recently faced with an unprecedented flood of data generated by the next generation of DNA sequencers, these groups found it necessary to respond quickly and efficiently to the informatics and infrastructure demands. Centralized Facilities newly facing this challenge need to anticipate time and design considerations of necessary components, including infrastructure upgrades, staffing, and tools for data analyses and management ...</p>
<p>More at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369</p><p>Address of the bookmark: <a href="http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369" rel="nofollow">http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/39307/awk-for-beginners</guid>
	<pubDate>Fri, 26 Apr 2019 16:19:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/39307/awk-for-beginners</link>
	<title><![CDATA[AWK for beginners !]]></title>
	<description><![CDATA[<p>AWK is a standard tool on every POSIX-compliant UNIX system. It&rsquo;s like flex/lex, but run from the command line, and it is well suited to text-processing tasks and other scripting needs. It has a C-like syntax, but without mandatory semicolons (you should use them anyway, because they are required when you&rsquo;re writing one-liners, something AWK excels at), manual memory management, or static typing. You can call it from a shell script, or you can use it as a stand-alone scripting language.</p><p>Why use AWK instead of Perl? Readability. AWK is easier to read than Perl. For simple text-processing scripts, particularly ones that read files line by line and split on delimiters, AWK is probably the right tool for the job.</p><div><pre><span>#!/usr/bin/awk -f</span>

<span># Comments are like this</span>


<span># AWK programs consist of a collection of patterns and actions.</span>
<span>pattern1</span> <span>{</span> <span>action</span><span>;</span> <span>}</span> <span># just like lex</span>
<span>pattern2</span> <span>{</span> <span>action</span><span>;</span> <span>}</span>

<span># There is an implied loop and AWK automatically reads and parses each</span>
<span># record of each file supplied. Each record is split by the FS delimiter,</span>
<span># which defaults to white-space (runs of spaces and tabs count as one).</span>
<span># You can assign FS either on the command line (-F C) or in your BEGIN</span>
<span># pattern.</span>

<span># One of the special patterns is BEGIN. The BEGIN pattern is true</span>
<span># BEFORE any of the files are read. The END pattern is true after</span>
<span># an End-of-file from the last file (or standard-in if no files specified)</span>
<span># There is also an output field separator (OFS) that you can assign, which</span>
<span># defaults to a single space</span>

<span>BEGIN</span> <span>{</span>

    <span># BEGIN will run at the beginning of the program. It's where you put all</span>
    <span># the preliminary set-up code, before you process any text files. If you</span>
    <span># have no text files, then think of BEGIN as the main entry point.</span>

    <span># Variables are global. Just set them or use them, no need to declare.</span>
    <span>count</span> <span>=</span> <span>0</span><span>;</span>

    <span># Operators just like in C and friends</span>
    <span>a</span> <span>=</span> <span>count</span> <span>+</span> <span>1</span><span>;</span>
    <span>b</span> <span>=</span> <span>count</span> <span>-</span> <span>1</span><span>;</span>
    <span>c</span> <span>=</span> <span>count</span> <span>*</span> <span>1</span><span>;</span>
    <span>d</span> <span>=</span> <span>count</span> <span>/</span> <span>1</span><span>;</span> <span># division (always floating point, never truncated)</span>
    <span>e</span> <span>=</span> <span>count</span> <span>%</span> <span>1</span><span>;</span> <span># modulus</span>
    <span>f</span> <span>=</span> <span>count</span> <span>^</span> <span>1</span><span>;</span> <span># exponentiation</span>

    <span>a</span> <span>+=</span> <span>1</span><span>;</span>
    <span>b</span> <span>-=</span> <span>1</span><span>;</span>
    <span>c</span> <span>*=</span> <span>1</span><span>;</span>
    <span>d</span> <span>/=</span> <span>1</span><span>;</span>
    <span>e</span> <span>%=</span> <span>1</span><span>;</span>
    <span>f</span> <span>^=</span> <span>1</span><span>;</span>

    <span># Incrementing and decrementing by one</span>
    <span>a</span><span>++</span><span>;</span>
    <span>b</span><span>--</span><span>;</span>

    <span># As a prefix operator, it returns the incremented value</span>
    <span>++</span><span>a</span><span>;</span>
    <span>--</span><span>b</span><span>;</span>

    <span># Note that the semicolons above are optional: a newline also terminates a statement</span>

    <span># Control statements</span>
    <span>if</span> <span>(</span><span>count</span> <span>==</span> <span>0</span><span>)</span>
        <span>print</span> <span>"Starting with count of 0"</span><span>;</span>
    <span>else</span>
        <span>print</span> <span>"Huh?"</span><span>;</span>

    <span># Or you could use the ternary operator</span>
    <span>print</span> <span>(</span><span>count</span> <span>==</span> <span>0</span><span>)</span> <span>?</span> <span>"Starting with count of 0"</span> <span>:</span> <span>"Huh?"</span><span>;</span>

    <span># Blocks consisting of multiple lines use braces</span>
    <span>while</span> <span>(</span><span>a</span> <span>&lt;</span> <span>10</span><span>)</span> <span>{</span>
        <span>print</span> <span>"String concatenation is done"</span> <span>" with a series"</span> <span>" of"</span> <span>\</span>
            <span>" space-separated strings"</span><span>;</span>
        <span>print</span> <span>a</span><span>;</span>

        <span>a</span><span>++</span><span>;</span>
    <span>}</span>

    <span>for</span> <span>(</span><span>i</span> <span>=</span> <span>0</span><span>;</span> <span>i</span> <span>&lt;</span> <span>10</span><span>;</span> <span>i</span><span>++</span><span>)</span>
        <span>print</span> <span>"Good ol' for loop"</span><span>;</span>

    <span># As for comparisons, they're the standards:</span>
    <span># a &lt; b   # Less than</span>
    <span># a &lt;= b  # Less than or equal</span>
    <span># a != b  # Not equal</span>
    <span># a == b  # Equal</span>
    <span># a &gt; b   # Greater than</span>
    <span># a &gt;= b  # Greater than or equal</span>

    <span># Logical operators as well</span>
    <span># a &amp;&amp; b  # AND</span>
    <span># a || b  # OR</span>

    <span># In addition, there's the super useful regular expression match</span>
    <span>if</span> <span>(</span><span>"foo"</span> <span>~</span> <span>"^fo+$"</span><span>)</span>
        <span>print</span> <span>"Fooey!"</span><span>;</span>
    <span>if</span> <span>(</span><span>"boo"</span> <span>!~</span> <span>"^fo+$"</span><span>)</span>
        <span>print</span> <span>"Boo!"</span><span>;</span>
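
    <span># A quick sketch (standard AWK syntax, added here for illustration): the</span>
    <span># right-hand side may also be a regex literal written between slashes,</span>
    <span># which avoids a second layer of string escaping</span>
    <span>if</span> <span>(</span><span>"foo"</span> <span>~</span> <span>/^fo+$/</span><span>)</span>
        <span>print</span> <span>"Fooey again!"</span><span>;</span>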

    <span># Arrays</span>
    <span>arr</span><span>[</span><span>0</span><span>]</span> <span>=</span> <span>"foo"</span><span>;</span>
    <span>arr</span><span>[</span><span>1</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>

    <span># You can also initialize an array with the built-in function split()</span>

    <span>n</span> <span>=</span> <span>split</span><span>(</span><span>"foo:bar:baz"</span><span>,</span> <span>arr</span><span>,</span> <span>":"</span><span>);</span>
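
    <span># A small sketch of what split() leaves behind (standard AWK behaviour):</span>
    <span># it returns the number of pieces and stores them starting at index 1,</span>
    <span># clearing whatever arr held before, so the arr[0] set above is gone</span>
    <span>print</span> <span>n</span><span>;</span>      <span># 3</span>
    <span>print</span> <span>arr</span><span>[</span><span>1</span><span>];</span> <span># foo</span>
    <span>print</span> <span>arr</span><span>[</span><span>3</span><span>];</span> <span># baz</span>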

    <span># You also have associative arrays (actually, they're all associative arrays)</span>
    <span>assoc</span><span>[</span><span>"foo"</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>
    <span>assoc</span><span>[</span><span>"bar"</span><span>]</span> <span>=</span> <span>"baz"</span><span>;</span>

    <span># And multi-dimensional arrays, with some limitations I won't mention here</span>
    <span>multidim</span><span>[</span><span>0</span><span>,</span><span>0</span><span>]</span> <span>=</span> <span>"foo"</span><span>;</span>
    <span>multidim</span><span>[</span><span>0</span><span>,</span><span>1</span><span>]</span> <span>=</span> <span>"bar"</span><span>;</span>
    <span>multidim</span><span>[</span><span>1</span><span>,</span><span>0</span><span>]</span> <span>=</span> <span>"baz"</span><span>;</span>
    <span>multidim</span><span>[</span><span>1</span><span>,</span><span>1</span><span>]</span> <span>=</span> <span>"boo"</span><span>;</span>
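
    <span># A small sketch of one such detail: AWK joins the subscripts into a single</span>
    <span># string key using the built-in SUBSEP variable, so membership is tested</span>
    <span># with a parenthesized subscript list</span>
    <span>if</span> <span>((</span><span>0</span><span>,</span> <span>1</span><span>)</span> <span>in</span> <span>multidim</span><span>)</span>
        <span>print</span> <span>multidim</span><span>[</span><span>0</span><span>,</span> <span>1</span><span>];</span>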

    <span># You can test for array membership</span>
    <span>if</span> <span>(</span><span>"foo"</span> <span>in</span> <span>assoc</span><span>)</span>
        <span>print</span> <span>"Fooey!"</span><span>;</span>

    <span># You can also use the 'in' operator to traverse the keys of an array</span>
    <span>for</span> <span>(</span><span>key</span> <span>in</span> <span>assoc</span><span>)</span>
        <span>print</span> <span>assoc</span><span>[</span><span>key</span><span>];</span>

    <span># The command line is in a special array called ARGV</span>
    <span>for</span> <span>(</span><span>argnum</span> <span>in</span> <span>ARGV</span><span>)</span>
        <span>print</span> <span>ARGV</span><span>[</span><span>argnum</span><span>];</span>
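
    <span># Note that for-in visits keys in no guaranteed order. A sketch that walks</span>
    <span># the arguments in order instead, counting up to ARGC (the number of</span>
    <span># command line arguments)</span>
    <span>for</span> <span>(</span><span>i</span> <span>=</span> <span>0</span><span>;</span> <span>i</span> <span>&lt;</span> <span>ARGC</span><span>;</span> <span>i</span><span>++</span><span>)</span>
        <span>print</span> <span>ARGV</span><span>[</span><span>i</span><span>];</span>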

    <span># You can remove elements of an array</span>
    <span># This is particularly useful to prevent AWK from assuming the arguments</span>
    <span># are files for it to process</span>
    <span>delete</span> <span>ARGV</span><span>[</span><span>1</span><span>];</span>

    <span># The number of command line arguments is in a variable called ARGC</span>
    <span>print</span> <span>ARGC</span><span>;</span>

    <span># AWK has several built-in functions. They fall into three categories. I'll</span>
    <span># demonstrate each of them in their own functions, defined later.</span>

    <span>return_value</span> <span>=</span> <span>arithmetic_functions</span><span>(</span><span>a</span><span>,</span> <span>b</span><span>,</span> <span>c</span><span>);</span>
    <span>string_functions</span><span>();</span>
    <span>io_functions</span><span>();</span>
<span>}</span>

<span># Here's how you define a function</span>
<span>function</span> <span>arithmetic_functions</span><span>(</span><span>a</span><span>,</span> <span>b</span><span>,</span> <span>c</span><span>,</span>     <span>localvar</span><span>)</span> <span>{</span>

    <span># Probably the most annoying part of AWK is that there are no local</span>
    <span># variables. Everything is global. For short scripts, this is fine, even</span>
    <span># useful, but for longer scripts, this can be a problem.</span>

    <span># There is a work-around (ahem, hack). Function arguments are local to the</span>
    <span># function, and AWK allows you to define more function arguments than it</span>
    <span># needs. So just put your local variables in the function declaration, like</span>
    <span># I did above. As a convention, add some extra whitespace to distinguish</span>
    <span># between actual function parameters and local variables. In this example,</span>
    <span># a, b, and c are actual parameters, while localvar is merely a local variable.</span>

    <span># Now, to demonstrate the arithmetic functions</span>

    <span># Most AWK implementations have some standard trig functions</span>
    <span>localvar</span> <span>=</span> <span>sin</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>cos</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>atan2</span><span>(</span><span>b</span><span>,</span> <span>a</span><span>);</span> <span># arc tangent of b / a</span>

    <span># And logarithmic stuff</span>
    <span>localvar</span> <span>=</span> <span>exp</span><span>(</span><span>a</span><span>);</span>
    <span>localvar</span> <span>=</span> <span>log</span><span>(</span><span>a</span><span>);</span>

    <span># Square root</span>
    <span>localvar</span> <span>=</span> <span>sqrt</span><span>(</span><span>a</span><span>);</span>

    <span># Truncate floating point to integer</span>
    <span>localvar</span> <span>=</span> <span>int</span><span>(</span><span>5.34</span><span>);</span> <span># localvar =&gt; 5</span>

    <span># Random numbers</span>
    <span>srand</span><span>();</span> <span># Supply a seed as an argument. By default, it uses the time of day</span>
    <span>localvar</span> <span>=</span> <span>rand</span><span>();</span> <span># Random number between 0 and 1.</span>

    <span># Here's how to return a value</span>
    <span>return</span> <span>localvar</span><span>;</span>
<span>}</span>

<span>function</span> <span>string_functions</span><span>(</span>    <span>localvar</span><span>,</span> <span>arr</span><span>)</span> <span>{</span>

    <span># AWK, being a string-processing language, has several string-related</span>
    <span># functions, many of which rely heavily on regular expressions.</span>

    <span># Search and replace, first instance (sub) or all instances (gsub)</span>
    <span># Both return number of matches replaced</span>
    <span>localvar</span> <span>=</span> <span>"fooooobar"</span><span>;</span>
    <span>sub</span><span>(</span><span>"fo+"</span><span>,</span> <span>"Meet me at the "</span><span>,</span> <span>localvar</span><span>);</span> <span># localvar =&gt; "Meet me at the bar"</span>
    <span>gsub</span><span>(</span><span>"e+"</span><span>,</span> <span>"."</span><span>,</span> <span>localvar</span><span>);</span> <span># localvar =&gt; "M.t m. at th. bar"</span>

    <span># Search for a string that matches a regular expression</span>
    <span># index() does the same thing, but doesn't allow a regular expression</span>
    <span>match</span><span>(</span><span>localvar</span><span>,</span> <span>"t"</span><span>);</span> <span># =&gt; 3, since the 't' is the third character</span>
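
    <span># match() also sets the built-in variables RSTART and RLENGTH, so the</span>
    <span># matched text can be extracted with substr(). A small sketch on a</span>
    <span># throwaway string:</span>
    <span>if</span> <span>(</span><span>match</span><span>(</span><span>"foobar"</span><span>,</span> <span>"o+"</span><span>))</span>
        <span>print</span> <span>substr</span><span>(</span><span>"foobar"</span><span>,</span> <span>RSTART</span><span>,</span> <span>RLENGTH</span><span>);</span> <span># prints "oo"</span>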

    <span># Split on a delimiter</span>
    <span>n</span> <span>=</span> <span>split</span><span>(</span><span>"foo-bar-baz"</span><span>,</span> <span>arr</span><span>,</span> <span>"-"</span><span>);</span> <span># arr[1] = "foo"; arr[2] = "bar"; arr[3] = "baz"; n = 3</span>

    <span># Other useful stuff</span>
    <span>sprintf</span><span>(</span><span>"%s %d %d %d"</span><span>,</span> <span>"Testing"</span><span>,</span> <span>1</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span> <span># =&gt; "Testing 1 2 3"</span>
    <span>substr</span><span>(</span><span>"foobar"</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span> <span># =&gt; "oob"</span>
    <span>substr</span><span>(</span><span>"foobar"</span><span>,</span> <span>4</span><span>);</span> <span># =&gt; "bar"</span>
    <span>length</span><span>(</span><span>"foo"</span><span>);</span> <span># =&gt; 3</span>
    <span>tolower</span><span>(</span><span>"FOO"</span><span>);</span> <span># =&gt; "foo"</span>
    <span>toupper</span><span>(</span><span>"foo"</span><span>);</span> <span># =&gt; "FOO"</span>
<span>}</span>

<span>function</span> <span>io_functions</span><span>(</span>    <span>localvar</span><span>)</span> <span>{</span>

    <span># You've already seen print</span>
    <span>print</span> <span>"Hello world"</span><span>;</span>

    <span># There's also printf</span>
    <span>printf</span><span>(</span><span>"%s %d %d %d\n"</span><span>,</span> <span>"Testing"</span><span>,</span> <span>1</span><span>,</span> <span>2</span><span>,</span> <span>3</span><span>);</span>

    <span># AWK doesn't have file handles, per se. It will automatically open a file</span>
    <span># handle for you when you use something that needs one. The string you used</span>
    <span># for this can be treated as a file handle, for purposes of I/O. This makes</span>
    <span># it feel sort of like shell scripting, but to get the same output, the string</span>
    <span># must match exactly, so use a variable:</span>

    <span>outfile</span> <span>=</span> <span>"/tmp/foobar.txt"</span><span>;</span>

    <span>print</span> <span>"foobar"</span> <span>&gt;</span> <span>outfile</span><span>;</span>

    <span># Now the string outfile is a file handle. You can close it:</span>
    <span>close</span><span>(</span><span>outfile</span><span>);</span>

    <span># Here's how you run something in the shell</span>
    <span>system</span><span>(</span><span>"echo foobar"</span><span>);</span> <span># =&gt; prints foobar</span>

    <span># Reads the next line of input (from the files named on the command line,</span>
    <span># or from standard input if there are none) and stores it in localvar</span>
    <span>getline</span> <span>localvar</span><span>;</span>

    <span># Reads a line from a pipe (again, use a string so you close it properly)</span>
    <span>cmd</span> <span>=</span> <span>"echo foobar"</span><span>;</span>
    <span>cmd</span> <span>|</span> <span>getline</span> <span>localvar</span><span>;</span> <span># localvar =&gt; "foobar"</span>
    <span>close</span><span>(</span><span>cmd</span><span>);</span>

    <span># Reads a line from a file and stores in localvar</span>
    <span>infile</span> <span>=</span> <span>"/tmp/foobar.txt"</span><span>;</span>
    <span>getline</span> <span>localvar</span> <span>&lt;</span> <span>infile</span><span>;</span> 
    <span>close</span><span>(</span><span>infile</span><span>);</span>
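
    <span># getline returns 1 on success, 0 at end of file, and -1 on error, so a</span>
    <span># loop over a file should test for a result greater than 0. A sketch that</span>
    <span># reads the same file line by line:</span>
    <span>while</span> <span>((</span><span>getline</span> <span>localvar</span> <span>&lt;</span> <span>infile</span><span>)</span> <span>&gt;</span> <span>0</span><span>)</span>
        <span>print</span> <span>localvar</span><span>;</span>
    <span>close</span><span>(</span><span>infile</span><span>);</span>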
<span>}</span>

<span># As I said at the beginning, AWK programs consist of a collection of patterns</span>
<span># and actions. You've already seen the BEGIN pattern. Other</span>
<span># patterns are used only if you're processing lines from files or standard</span>
<span># input.</span>
<span>#</span>
<span># When you pass arguments to AWK, they are treated as file names to process.</span>
<span># It will process them all, in order. Think of it like an implicit for loop,</span>
<span># iterating over the lines in these files. These patterns and actions are like</span>
<span># switch statements inside the loop.</span>

<span>/^fo+bar$/</span> <span>{</span>

    <span># This action will execute for every line that matches the regular</span>
    <span># expression, /^fo+bar$/, and will be skipped for any line that fails to</span>
    <span># match it. Let's just print the line:</span>

    <span>print</span><span>;</span>

    <span># Whoa, no argument! That's because print has a default argument: $0.</span>
    <span># $0 holds the current line being processed. It is created</span>
    <span># automatically for you.</span>

    <span># You can probably guess there are other $ variables. Every line is</span>
    <span># implicitly split before every action is called, much like the shell</span>
    <span># does. And, like the shell, each field can be accessed with a dollar sign.</span>

    <span># This will print the second and fourth fields in the line</span>
    <span>print</span> <span>$</span><span>2</span><span>,</span> <span>$</span><span>4</span><span>;</span>

    <span># AWK automatically defines many other variables to help you inspect and</span>
    <span># process each line. The most important one is NF</span>

    <span># Prints the number of fields on this line</span>
    <span>print</span> <span>NF</span><span>;</span>

    <span># Print the last field on this line</span>
    <span>print</span> <span>$</span><span>NF</span><span>;</span>
<span>}</span>

<span># Every pattern is actually a true/false test. The regular expression in the</span>
<span># last pattern is also a true/false test, but part of it was hidden. If you</span>
<span># don't give it a string to test, it will assume $0, the line that it's</span>
<span># currently processing. Thus, the complete version of it is this:</span>

<span>$</span><span>0</span> <span>~</span> <span>/^fo+bar$/</span> <span>{</span>
    <span>print</span> <span>"Equivalent to the last pattern"</span><span>;</span>
<span>}</span>

<span>a</span> <span>&gt;</span> <span>0</span> <span>{</span>
    <span># This will execute once for each line, as long as a is positive</span>
<span>}</span>

<span># You get the idea. Processing text files, reading in a line at a time, and</span>
<span># doing something with it, particularly splitting on a delimiter, is so common</span>
<span># in UNIX that AWK is a scripting language that does all of it for you, without</span>
<span># you needing to ask. All you have to do is write the patterns and actions</span>
<span># based on what you expect of the input, and what you want to do with it.</span>

<span># Here's a quick example of a simple script, the sort of thing AWK is perfect</span>
<span># for. It will read a name from standard input and then will print the average</span>
<span># age of everyone with that first name. Let's say you supply as an argument the</span>
<span># name of this data file:</span>
<span>#</span>
<span># Bob Jones 32</span>
<span># Jane Doe 22</span>
<span># Steve Stevens 83</span>
<span># Bob Smith 29</span>
<span># Bob Barker 72</span>
<span>#</span>
<span># Here's the script:</span>

<span>BEGIN</span> <span>{</span>

    <span># First, ask the user for the name</span>
    <span>print</span> <span>"What name would you like the average age for?"</span><span>;</span>

    <span># Get a line from standard input, not from files on the command line</span>
    <span>getline</span> <span>name</span> <span>&lt;</span> <span>"/dev/stdin"</span><span>;</span>
<span>}</span>

<span># Now, match every line whose first field is the given name</span>
<span>$</span><span>1</span> <span>==</span> <span>name</span> <span>{</span>

    <span># Inside here, we have access to a number of useful variables, already</span>
    <span># pre-loaded for us:</span>
    <span># $0 is the entire line</span>
    <span># $3 is the third field, the age, which is what we're interested in here</span>
    <span># NF is the number of fields, which should be 3</span>
    <span># NR is the number of records (lines) seen so far</span>
    <span># FILENAME is the name of the file being processed</span>
    <span># FS is the field separator being used, which is " " here</span>
    <span># ...etc. There are plenty more, documented in the man page.</span>

    <span># Keep track of a running total and how many lines matched</span>
    <span>sum</span> <span>+=</span> <span>$</span><span>3</span><span>;</span>
    <span>nlines</span><span>++</span><span>;</span>
<span>}</span>

<span># Another special pattern is called END. Unlike BEGIN, which runs before any</span>
<span># input is read, END runs once, after all the files (or standard input) have</span>
<span># been read and processed according to the rules and actions you've provided.</span>
<span># The purpose of it is usually to output some kind of final report, or do</span>
<span># something with the aggregate of the data you've accumulated over the course</span>
<span># of the script.</span>

<span>END</span> <span>{</span>
    <span>if</span> <span>(</span><span>nlines</span><span>)</span>
        <span>print</span> <span>"The average age for "</span> <span>name</span> <span>" is "</span> <span>sum</span> <span>/</span> <span>nlines</span><span>;</span>
<span>}</span>
</pre></div>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>

</channel>
</rss>