<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/15000?offset=830</link>
	<atom:link href="https://bioinformaticsonline.com/related/15000?offset=830" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/38618/canu-genome-assembly-parameters</guid>
	<pubDate>Mon, 07 Jan 2019 08:40:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/38618/canu-genome-assembly-parameters</link>
	<title><![CDATA[CANU genome assembly parameters !]]></title>
	<description><![CDATA[<p>Choose the appropriate parameters for Canu and run it. The assembly will take about an hour. You can use two cores (parameter&nbsp;<code>-maxThreads=2</code>). Because we compute on a single Amazon server, switch off cluster execution with&nbsp;<code>useGrid=false</code>. For your own project, discuss these settings with a local computing guru. Parameters in square brackets&nbsp;<code>[]</code>&nbsp;are optional; the symbol&nbsp;<code>|</code>&nbsp;stands for "or".</p><pre><code>usage:   canu [-correct | -trim | -assemble | -trim-assemble] \
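              [-s &lt;assembly-specifications-file&gt;] \
               -p &lt;assembly-prefix&gt; \
               -d &lt;assembly-directory&gt; \
               genomeSize=&lt;number&gt;[g|m|k] \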
               -maxThreads=2 \
               useGrid=false \
              [other-options] \
               read_file.fastq.gz
</code></pre><p>A default&nbsp;<code>Canu</code>&nbsp;run usually produces a high-quality assembly; an example of a command that was used for testing can be found below. However, there are still many parameters that can be tweaked. For example, depending on whether we want to assemble haplotypes separately or smash them together, we can alter the error correction process.</p><pre><code>canu -p test_asmbl \
     -d asm_test3 \
     genomeSize=2m \
     -maxThreads=2 useGrid=false \
     -pacbio-raw ~/pacbio/dna/sample_reads.fastq.gz</code></pre><p>There is a brilliant&nbsp;<a href="http://canu.readthedocs.io/en/latest/faq.html#what-parameters-can-i-tweak">section in the documentation</a>&nbsp;about parameter tweaking.</p><p>The output directory will contain many files. The most interesting ones are:</p><ul>
<li><code>*.correctedReads.fasta.gz</code>&nbsp;: file containing the input sequences after correction, trimmed and split based on consensus evidence</li>
<li><code>*.trimmedReads.fastq</code>&nbsp;: file containing the sequences after correction and final trimming</li>
<li><code>*.layout</code>&nbsp;: file containing information about read inclusion in the final assembly</li>
<li><code>*.gfa</code>&nbsp;: file containing the assembly graph produced by Canu</li>
<li><code>*.contigs.fasta</code>&nbsp;: file containing everything that could be assembled and is part of the primary assembly</li>
</ul><p>Basic assembly statistics can be read from the reports generated by the assembler, or calculated using standard UNIX command-line tools.</p><p>More at&nbsp;https://canu.readthedocs.io/en/latest/faq.html</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39674/simka-and-simkamin-are-comparative-metagenomics-method-dedicated-to-ngs-datasets</guid>
	<pubDate>Sat, 06 Jul 2019 13:56:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39674/simka-and-simkamin-are-comparative-metagenomics-method-dedicated-to-ngs-datasets</link>
	<title><![CDATA[Simka and SimkaMin are comparative metagenomics method dedicated to NGS datasets]]></title>
	<description><![CDATA[<p>Simka is a de novo comparative metagenomics tool. Simka represents each dataset as a k-mer spectrum and computes several classical ecological distances between them.</p>
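<p>As a rough illustration of the idea of comparing datasets through their k-mer spectra, here is a minimal Python sketch. It is not Simka's implementation: the function names <code>kmer_spectrum</code>, <code>bray_curtis</code> and <code>jaccard_distance</code> are hypothetical, the reads are toy strings, and the two distances are just common examples of an abundance-based and a presence/absence ecological distance.</p>

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every substring of length k across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(spec_a, spec_b):
    """Abundance-based Bray-Curtis dissimilarity between two spectra."""
    total = sum(spec_a.values()) + sum(spec_b.values())
    if total == 0:
        return 0.0  # two empty datasets are treated as identical
    # A Counter returns 0 for k-mers it has never seen.
    shared = sum(min(count, spec_b[kmer]) for kmer, count in spec_a.items())
    return 1 - 2 * shared / total


def jaccard_distance(spec_a, spec_b):
    """Presence/absence Jaccard distance between two spectra."""
    kmers_a, kmers_b = set(spec_a), set(spec_b)
    union = kmers_a.union(kmers_b)
    if not union:
        return 0.0
    return 1 - len(kmers_a.intersection(kmers_b)) / len(union)


if __name__ == "__main__":
    # Toy "datasets" of one read each; real inputs would be FASTQ files.
    sample_a = kmer_spectrum(["ACGTACGT"], k=4)
    sample_b = kmer_spectrum(["ACGTTTTT"], k=4)
    print(bray_curtis(sample_a, sample_b))
    print(jaccard_distance(sample_a, sample_b))
```

<p>Holding whole spectra in memory like this only works for tiny inputs; it is meant to show what the distances measure, not how to compute them at scale.</p>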
<p>Developer:&nbsp;<a href="http://people.rennes.inria.fr/Gaetan.Benoit/">Ga&euml;tan Benoit</a>, PhD, former member of the&nbsp;<a href="http://team.inria.fr/genscale/">Genscale</a>&nbsp;team at Inria.</p>
<p>Contact: claire dot lemaitre at inria dot fr</p>
<p><span>Simka and SimkaMin are comparative metagenomics methods dedicated to NGS datasets.&nbsp;</span><span><a href="https://gatb.inria.fr/software/simka/">https://gatb.inria.fr/software/simka/</a></span></p><p>Address of the bookmark: <a href="https://github.com/GATB/simka" rel="nofollow">https://github.com/GATB/simka</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40604/gapfinisher-a-reliable-gap-filling-pipeline-for-sspace-longread-scaffolder-output</guid>
	<pubDate>Fri, 24 Jan 2020 06:04:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40604/gapfinisher-a-reliable-gap-filling-pipeline-for-sspace-longread-scaffolder-output</link>
	<title><![CDATA[gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output]]></title>
	<description><![CDATA[<p><span>gapFinisher is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines. The authors compare the performance of gapFinisher against two other published gap filling tools, PBJelly and GMcloser. </span></p>
<p><span>gapFinisher can fill gaps in draft genomes quickly and reliably.</span></p><p>Address of the bookmark: <a href="https://github.com/kammoji/gapFinisher" rel="nofollow">https://github.com/kammoji/gapFinisher</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/14011/dynamic-chromosome-breakpoints</guid>
	<pubDate>Wed, 13 Aug 2014 18:38:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/14011/dynamic-chromosome-breakpoints</link>
	<title><![CDATA[Dynamic chromosome breakpoints !!!]]></title>
	<description><![CDATA[<p>Cell division involves the distribution of identical genetic material, DNA, to two daughter cells. During this process, the duplicated deoxyribonucleic acid (DNA) goes through condensation and decondensation. This is followed by nuclear envelope dissolution, mitotic spindle assembly, migration of the sister chromatid pairs to the metaphase plate, division and segregation of identical sets of chromosomes into daughter nuclei, and nuclear envelope reformation.</p><p>In the vital metaphase stage of cell division, the sister chromatids migrate to the centre and line up in a row; they are then pulled apart by the attached microtubules in such a way that half the DNA ends up in each daughter cell. However, before the mitotic spindle-mediated movement starts and pulls the DNA apart, the chromosomes are free to undergo <strong>recombination</strong>, which involves the exchange of genetic material either between multiple chromosomes or between different regions of the same chromosome.</p><p><img src="http://www.sciencelearn.org.nz/var/sciencelearn/storage/images/contexts/uniquely-me/sci-media/images/chromosomes-crossing-over/464438-1-eng-NZ/Chromosomes-crossing-over.jpg" alt="image" width="504" height="342" style="border: 0px; border: 0px;"></p><p>During recombination, each strand is broken precisely, segments are exchanged between the strands, and the resulting recombined molecules are sealed. The places where the strands break are called &ldquo;<strong>chromosomal breakpoints</strong>&rdquo;. Mostly, this process occurs with a high degree of accuracy and at high frequency in both eukaryotic and prokaryotic cells. Occasionally, however, this &ldquo;break and reattach&rdquo; process goes wrong and the reattachment happens in the wrong place, which usually creates a disaster (with few exceptions). These chromosome abnormalities involve the gain, loss or rearrangement of visible amounts of genetic material during cell division. 
These abnormalities are of two types. The first is numerical abnormalities, in which severe disorders are caused by the loss or gain of whole chromosomes, affecting the copy number of hundreds or even thousands of genes. The second is structural abnormalities, which can be unbalanced or balanced. Unbalanced structural abnormalities are similar to numerical abnormalities in that genetic material is either gained or lost. Natural defects in chromosome segregation are linked to cancer and several genetic diseases (http://en.wikipedia.org/wiki/List_of_genetic_disorders). Therefore, the enzymes involved in regulating cell division remain attractive drug targets for many diseases.</p><p><img src="http://upload.wikimedia.org/wikipedia/commons/4/4a/Chromosomal_translocations.svg" alt="image" width="424" height="331" style="border: 0px; border: 0px;"></p><p>Apart from causing certain chromosome abnormalities, this &ldquo;crossing over&rdquo; of segments of maternal and paternal chromosomes to form hybrid chromosomes has evolutionary importance and is considered a driver of genetic variation. Moreover, chromosome breakage in evolution is considered to be non-random (http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0020014). In addition, the study of breakpoint regions and non-breakpoint (stable) regions of chromosomes indicates that the two kinds of region evolved in distinctly different ways (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675965/). 
Such breakage may lead to genetic diseases, or take part in chromosomal rearrangements and contribute to the development of new species.</p><p>I will try to explain genome hotspots, Evolutionary Breakpoint Regions (EBRs), fragile regions and weak fragments in my next blog.</p><p><strong>Software for recombination detection:</strong></p><p><strong>RAT</strong> http://cbr.jic.ac.uk/dicks/software/RAT/</p><p><strong>Breakpointer</strong> https://github.com/ruping/Breakpointer</p><p><strong>RDP</strong> http://web.cbio.uct.ac.za/~darren/rdp.html</p><p><strong>RB-finder</strong> http://www.ncbi.nlm.nih.gov/pubmed/18707535</p><p><strong>LDhat2.0</strong> http://ldhat.sourceforge.net/LDhat2.0/instructions.shtml</p><p><strong>Reference:</strong></p><p>http://www.nature.com/scitable/topicpage/genetic-recombination-514#</p><p>Images: Wikipedia, sciencelearn.org.nz</p><p><strong>Recommended Articles:</strong></p><p>http://www.friendshipcircle.org/blog/2012/05/22/13-chromosomal-disorders-youve-never-heard-of/</p><p>http://web.udl.es/usuaris/e4650869/docencia/segoncicle/genclin98/recursos_classe_%28pdf%29/revisionsPDF/chromosyndromes.pdf</p><p>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775595/table/T2/</p><p>http://learn.genetics.utah.edu/content/disorders/chromosomal/</p><p>http://www.ncert.nic.in/html/learning_basket/biology/cc&amp;cd.pdf</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41730/parliament2-runs-a-combination-of-tools-to-generate-structural-variant-calls-on-whole-genome-sequencing-data</guid>
	<pubDate>Thu, 28 May 2020 21:57:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41730/parliament2-runs-a-combination-of-tools-to-generate-structural-variant-calls-on-whole-genome-sequencing-data</link>
	<title><![CDATA[Parliament2: Runs a combination of tools to generate structural variant calls on whole-genome sequencing data]]></title>
	<description><![CDATA[<p>Parliament2 identifies structural variants in a given sample relative to a reference genome. These structural variants cover large events that are called as deletions of a region, insertions of a sequence into a region, duplications of a region, inversions of a region, or translocations between two regions in the genome.</p>
<p>Parliament2 runs a combination of tools to generate structural variant calls on whole-genome sequencing data. It can run the following callers: Breakdancer, Breakseq2, CNVnator, Delly2, Manta, and Lumpy. Because of synergies in how the programs use computational resources, these are all run in parallel. Parliament2 will produce the outputs of each of the tools for subsequent investigation.</p><p>Address of the bookmark: <a href="https://github.com/dnanexus/parliament2" rel="nofollow">https://github.com/dnanexus/parliament2</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43227/project-associate-i-project-associate-ii-senior-project-associate-igib</guid>
  <pubDate>Thu, 05 Aug 2021 16:11:32 -0500</pubDate>
  <link></link>
  <title><![CDATA[Project Associate-I | Project Associate-II | Senior Project Associate @ IGIB]]></title>
  <description><![CDATA[
<p>Experience in Next Generation Sequencing (NGS) application and interest in Genomics/ Clinical / Translational Applications. OR Good computational programming skills and deep interest in working on interface of Genomics and Clinical application. </p>

<p>Project Scientist-I <br />Experimental / computational analysis experience in high-throughput genomics/ clinical application.</p>

<p>Project Manager <br />Experience in handling large biological projects involving high-throughput genomics/ clinical application.</p>

<p>Scientific Administrative Assistant <br />Lab Work. </p>

<p>More at https://vinodscaria.genomes.in/positionsopen</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The "Ifs" and "Buts" of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.</p>
</li>
<li>
<p><strong>Loss of Context in Paired-End Reads</strong><br />But trimming one read in a pair more than the other can lead to loss of pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37411/my-commonly-used-commands-in-bioinformatics</guid>
	<pubDate>Thu, 26 Jul 2018 04:58:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37411/my-commonly-used-commands-in-bioinformatics</link>
	<title><![CDATA[My commonly used commands in Bioinformatics]]></title>
	<description><![CDATA[<p>FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:</p><pre><code>minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta
</code></pre><p>This reports Racon's changes in a table. You can exclude indels with the&nbsp;<code>-I</code>&nbsp;option in&nbsp;<code>show-snps</code>.&nbsp;</p><p>This process (Racon -&gt; MUMmer -&gt; SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38063/referee-genome-assembly-quality-scores</guid>
	<pubDate>Sun, 04 Nov 2018 16:44:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38063/referee-genome-assembly-quality-scores</link>
	<title><![CDATA[Referee: Genome assembly quality scores]]></title>
	<description><![CDATA[<p>Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.</p>
<p>We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.</p>
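<p>The count of 10 genotypes follows from choosing two alleles out of four bases when order does not matter. The short Python sketch below enumerates them; the helper names <code>diploid_genotypes</code> and <code>genotypes_with_base</code> are hypothetical and are not part of Referee, and the second helper only shows how the genotypes split around a called reference base, not how Referee turns likelihoods into a score.</p>

```python
from itertools import combinations_with_replacement

BASES = "ACGT"


def diploid_genotypes(bases=BASES):
    """Return every unordered pair of alleles, e.g. 'AC' but not 'CA'."""
    return ["".join(pair) for pair in combinations_with_replacement(bases, 2)]


def genotypes_with_base(reference_base, bases=BASES):
    """Return the genotypes that contain the called reference base."""
    return [g for g in diploid_genotypes(bases) if reference_base in g]


if __name__ == "__main__":
    all_genotypes = diploid_genotypes()
    print(len(all_genotypes), all_genotypes)
    # Genotypes that carry the reference base 'A' at a site.
    print(genotypes_with_base("A"))
```

<p>With four bases there are 4 homozygous and 6 heterozygous combinations, which gives the 10 genotypes listed above.</p>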
<h3>Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.</h3>
<p>https://github.com/gwct/referee</p><p>Address of the bookmark: <a href="https://gwct.github.io/referee/#" rel="nofollow">https://gwct.github.io/referee/#</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39726/jackalope-a-swift-versatile-phylogenomic-and-high-throughput-sequencing-simulator</guid>
	<pubDate>Fri, 26 Jul 2019 00:58:12 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39726/jackalope-a-swift-versatile-phylogenomic-and-high-throughput-sequencing-simulator</link>
	<title><![CDATA[jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator]]></title>
	<description><![CDATA[<p><code>jackalope</code> simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations&mdash;the latter of which can include selection, recombination, and demographic fluctuations. <code>jackalope</code> can simulate single, paired-end, or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. All outputs can be written to standard file formats.</p>
<p><span>A swift, versatile phylogenomic and high-throughput sequencing simulator </span> <span><a href="https://jackalope.lucasnell.com">https://jackalope.lucasnell.com</a></span></p><p>Address of the bookmark: <a href="https://github.com/lucasnell/jackalope" rel="nofollow">https://github.com/lucasnell/jackalope</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

</channel>
</rss>