<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38481?offset=40</link>
	<atom:link href="https://bioinformaticsonline.com/related/38481?offset=40" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36880/jvarkit-java-utilities-for-bioinformatics</guid>
	<pubDate>Fri, 08 Jun 2018 09:31:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36880/jvarkit-java-utilities-for-bioinformatics</link>
	<title><![CDATA[Jvarkit : Java utilities for Bioinformatics]]></title>
	<description><![CDATA[A collection of Java toolkits for bioinformatics work:

Jvarkit : Java utilities for Bioinformatics<p>Address of the bookmark: <a href="http://lindenb.github.io/jvarkit/" rel="nofollow">http://lindenb.github.io/jvarkit/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>This protocol describes a de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to reconstruct a representation of the original chromosomes from which the DNA originated. De novo genome assembly assumes no prior knowledge of the source DNA sequence length, structure or composition. In a genome sequencing experiment, the DNA of the target organism is broken into millions of small fragments that are read on a sequencing machine. Depending on the sequencing technology used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina-style short-read sequencing typically produces reads of 36 - 150 bp. These reads can be either &ldquo;single-ended&rdquo;, as described above, or &ldquo;paired-end&rdquo;.</p><p><strong>Why genome assembly?</strong><br />Determining the DNA sequence of an organism is useful in basic research into how and why organisms live, as well as in applied fields. Because DNA is central to living things, knowledge of a DNA sequence can be useful in virtually any biological research. For example, it can be used in medicine to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, research into pathogens can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be stored as plain text in a FASTA file, or together with their quality attributes in a FASTQ file.&nbsp;FASTQ is the most common read file format because it is what the Illumina sequencing pipeline produces. The rest of this protocol therefore focuses on FASTQ.</p><p><strong>The protocol in a nutshell:</strong> <br />Obtain the sequence read file(s) from the sequencing machine(s). <br />Look at the reads - get an idea of what you have and what their quality is like. 
<br />Clean up/quality-trim the raw data, if required. <br />Choose an appropriate set of assembly parameters. <br />Assemble the data into contigs/scaffolds. <br />Examine the assembly output and assess its quality.</p><p><strong>Read Quality Control:</strong><br />Check the quality with FastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This step trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />This part of the protocol explains how to assemble the quality-trimmed reads into draft contigs.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A wide range of short-read assemblers is available, each with its own strengths and weaknesses. <br /><em>Some of the assemblers available include:</em><br />Velvet<br />SOAPdenovo<br />MIRA<br />ALLPATHS</p><p>The next step is to assess the draft set of contigs and decide what to do with it for the remainder of the study.&nbsp;A few things to note about the contigs you just created:&nbsp;They are draft contigs. 
They may contain mis-assemblies.</p><p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />GapFiller - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi-automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequently used commands for the analysis are listed at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43254/quasr-quantification-and-annotation-of-short-reads-in-r</guid>
	<pubDate>Fri, 13 Aug 2021 07:44:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43254/quasr-quantification-and-annotation-of-short-reads-in-r</link>
	<title><![CDATA[QuasR: Quantification and annotation of short reads in R]]></title>
	<description><![CDATA[<p>The <em><a href="https://bioconductor.org/packages/3.14/QuasR">QuasR</a></em> package (short for <em>Qu</em>antify and <em>a</em>nnotate <em>s</em>hort reads in <em>R</em>) integrates the functionality of several <strong>R</strong> packages (such as <em><a href="https://bioconductor.org/packages/3.14/IRanges">IRanges</a></em> <span>(Lawrence et al. 2013)</span> and <em><a href="https://bioconductor.org/packages/3.14/Rsamtools">Rsamtools</a></em>) and external software (e.g.&nbsp;<code>bowtie</code>, through the <em><a href="https://bioconductor.org/packages/3.14/Rbowtie">Rbowtie</a></em> package, and <code>HISAT2</code>, through the <em><a href="https://bioconductor.org/packages/3.14/Rhisat2">Rhisat2</a></em> package). The package aims to cover the whole analysis workflow of typical high throughput sequencing experiments, starting from the raw sequence reads, over pre-processing and alignment, up to quantification. A single <strong>R</strong> script can contain all steps of a complete analysis, making it simple to document, reproduce or share the workflow containing all relevant details.</p><p>Address of the bookmark: <a href="https://www.bioconductor.org/packages/devel/bioc/vignettes/QuasR/inst/doc/QuasR.html" rel="nofollow">https://www.bioconductor.org/packages/devel/bioc/vignettes/QuasR/inst/doc/QuasR.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</guid>
	<pubDate>Tue, 14 Nov 2017 10:26:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</link>
	<title><![CDATA[DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication]]></title>
	<description><![CDATA[<p>We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.</p>
<div>Availability and Implementation</div>
<p>The software is implemented in Python 3 and runs in both Python 2.7 and 3.4&ndash; on Macintosh and Linux systems. It is freely available at&nbsp;<a href="https://github.com/nigyta/dfast_core/" target="">https://github.com/nigyta/dfast_core/</a>&nbsp;under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at&nbsp;<a href="https://dfast.nig.ac.jp/" target="">https://dfast.nig.ac.jp/</a>.</p><p>Address of the bookmark: <a href="https://dfast.nig.ac.jp/" rel="nofollow">https://dfast.nig.ac.jp/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37306/genome-u-plot-a-whole-genome-visualization</guid>
	<pubDate>Fri, 13 Jul 2018 19:50:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37306/genome-u-plot-a-whole-genome-visualization</link>
	<title><![CDATA[Genome U-Plot: a whole genome visualization]]></title>
	<description><![CDATA[<p><span>Genome U-Plot is a tool for producing clear and intuitive graphs that allow researchers to generate novel insights and hypotheses by visualizing SVs such as deletions, amplifications, and chromoanagenesis events. The main features of the Genome U-Plot are its layered layout, its high spatial resolution and its improved aesthetic qualities.&nbsp;</span></p>
<p><span>https://github.com/gaitat/GenomeUPlot</span></p><p>Address of the bookmark: <a href="https://github.com/gaitat/GenomeUPlot" rel="nofollow">https://github.com/gaitat/GenomeUPlot</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37796/grsr-a-tool-for-deriving-genome-rearrangement-scenarios-from-multiple-unichromosomal-genome-sequences</guid>
	<pubDate>Fri, 28 Sep 2018 09:35:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37796/grsr-a-tool-for-deriving-genome-rearrangement-scenarios-from-multiple-unichromosomal-genome-sequences</link>
	<title><![CDATA[GRSR: a tool for deriving genome rearrangement scenarios from multiple unichromosomal genome sequences]]></title>
	<description><![CDATA[<p>GRSR is a tool for deriving genome rearrangement scenarios for multiple uni-chromosomal genomes. It performs the following steps:</p>
<ul>
<li>Step 1. Run Mugsy to get multiple sequence alignment results.</li>
<li>Steps 2 &amp; 3. Extract the coordinates of core blocks, construct synteny blocks and generate signed permutations.</li>
<li>Step 4. Generate pairwise genome rearrangement scenarios and find repeats at the breakpoints of each rearrangement event.</li>
</ul>
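<p>To make the idea of a signed permutation more concrete, here is a minimal Python sketch. It is not GRSR's own code: the function names <code>signed_permutation</code> and <code>count_breakpoints</code>, the block labels and the (block, strand) input layout are all assumptions made for illustration. Each synteny block is numbered by its position in a reference genome, a block on the reverse strand gets a negative sign, and a breakpoint is counted wherever two neighbouring entries are not consecutive.</p>
<div><pre>
```python
from typing import List, Sequence, Tuple


def signed_permutation(reference: Sequence[str],
                       genome: Sequence[Tuple[str, str]]) -> List[int]:
    """Encode a genome as a signed permutation relative to a reference.

    reference: synteny block labels in reference order (hypothetical input).
    genome: (block label, strand) pairs in genome order, strand "+" or "-".
    """
    # Number the blocks 1..n by their position in the reference genome.
    index = {block: i + 1 for i, block in enumerate(reference)}
    # A block on the reverse strand is written with a negative sign.
    return [index[b] if strand == "+" else -index[b] for b, strand in genome]


def count_breakpoints(perm: Sequence[int]) -> int:
    """Count breakpoints of a linear signed permutation.

    The permutation is framed with 0 and n+1; a neighbouring pair (a, b)
    is conserved only when b - a == 1, which also covers inverted runs
    such as (-3, -2).
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


if __name__ == "__main__":
    reference = ["A", "B", "C", "D"]
    # Same blocks, but the segment B..C is inverted in this genome.
    genome = [("A", "+"), ("C", "-"), ("B", "-"), ("D", "+")]
    perm = signed_permutation(reference, genome)
    print(perm, count_breakpoints(perm))
```
</pre></div>
<p>In this toy case the inverted segment B..C is encoded as -3, -2, and the two breakpoints sit at the two ends of the inversion. These are the kinds of positions where Step 4 would look for repeats.</p>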
<p>https://github.com/DanwangJessica/GRSR</p><p>Address of the bookmark: <a href="https://github.com/DanwangJessica/GRSR" rel="nofollow">https://github.com/DanwangJessica/GRSR</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39624/cogent-a-tool-for-reconstructing-the-coding-genome-using-high-quality-full-length-transcriptome-sequences</guid>
	<pubDate>Tue, 18 Jun 2019 05:33:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39624/cogent-a-tool-for-reconstructing-the-coding-genome-using-high-quality-full-length-transcriptome-sequences</link>
	<title><![CDATA[Cogent: a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences.]]></title>
	<description><![CDATA[<div id="yui_3_14_1_1_1560853173251_3865">Cogent is a tool that identifies gene&nbsp;families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check&nbsp;assemblies&nbsp;for the presence of&nbsp;these known coding sequences.</div>
<div>
<p>Cogent is a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences. It is designed to be used on&nbsp;<a href="https://github.com/PacificBiosciences/cDNA_primer/wiki">Iso-Seq data</a>&nbsp;and in cases where there is no reference genome or the ref genome is highly incomplete.</p>
<p>See a&nbsp;<a href="https://www.dropbox.com/s/mn6hwhguh0pqceu/20160106_Cogent_developers_conference_slides_Cuttlefish.pdf?dl=0">recent presentation</a>&nbsp;on Cogent being applied to the Cuttlefish Iso-Seq data.</p>
<p><a href="https://www.dropbox.com/s/kz0gi7qg0w82k9a/20161026_Cogent_manuscript_forGitHub.pdf?dl=0">Cogent preliminary draft paper (updated 2016Dec version)</a>,&nbsp;<a href="https://www.dropbox.com/s/37412o8glvnfhf9/20161026_Cogent_ManuscriptPlusSupplement_forGitHub.pdf?dl=0">Supplementary</a></p>
<p>Please see&nbsp;<a href="https://github.com/Magdoll/Cogent/wiki">wiki</a>&nbsp;for details on usage.</p>
</div><p>Address of the bookmark: <a href="https://github.com/Magdoll/Cogent" rel="nofollow">https://github.com/Magdoll/Cogent</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42267/hapsolo-an-optimization-approach-for-removing-secondary-haplotigs-during-diploid-genome-assembly-and-scaffolding</guid>
	<pubDate>Mon, 26 Oct 2020 21:23:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42267/hapsolo-an-optimization-approach-for-removing-secondary-haplotigs-during-diploid-genome-assembly-and-scaffolding</link>
	<title><![CDATA[HapSolo: An optimization approach for removing secondary haplotigs during diploid genome assembly and scaffolding.]]></title>
	<description><![CDATA[<p><span>Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analysis such as scaffolding.</span></p>
<p><span>More at&nbsp;https://github.com/esolares/HapSolo</span></p><p>Address of the bookmark: <a href="https://github.com/esolares/HapSolo" rel="nofollow">https://github.com/esolares/HapSolo</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43273/understanding-kmer</guid>
	<pubDate>Wed, 18 Aug 2021 04:27:51 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43273/understanding-kmer</link>
	<title><![CDATA[Understanding kmer !]]></title>
	<description><![CDATA[<p><a href="https://en.wikipedia.org/wiki/k-mer">What is a&nbsp;<em>k-mer</em>&nbsp;anyway?</a><span>&nbsp;A&nbsp;</span><em>k-mer</em><span>&nbsp;is just a sequence of&nbsp;</span><em>k</em><span>&nbsp;characters in a string (or nucleotides in a DNA sequence). Now, it is important to remember that to get&nbsp;</span><em>all k-mers</em><span>&nbsp;from a sequence you need to get the first&nbsp;</span><em>k</em><span>&nbsp;characters, then move just a single character for the start of the next&nbsp;</span><em>k-mer</em><span>&nbsp;and so on. Effectively, this will create sequences that overlap in&nbsp;</span><code>k-1</code><span>&nbsp;positions.</span></p><p>Address of the bookmark: <a href="https://bioinfologics.github.io/post/2018/09/17/k-mer-counting-part-i-introduction/" rel="nofollow">https://bioinfologics.github.io/post/2018/09/17/k-mer-counting-part-i-introduction/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</guid>
	<pubDate>Mon, 31 Jan 2022 07:18:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</link>
	<title><![CDATA[Short-read assembly using Spades !]]></title>
	<description><![CDATA[<h2 id="short-read-assembly-a-comparison">If we only had Illumina reads, we could also assemble them with SPAdes.</h2><p>You can try this here, or try it later on your own data.</p><h2 id="get-data">Get data</h2><p>We will use the same Illumina data as above:</p><ul>
<li>illumina_R1.fastq.gz: the Illumina forward reads</li>
<li>illumina_R2.fastq.gz: the Illumina reverse reads</li>
</ul><h2 id="assemble">Assemble</h2><p>Run SPAdes:</p><div><pre>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o spades_assembly_all_illumina
</pre></div><ul>
<li><code>-1</code>&nbsp;is the input file of forward reads</li>
<li><code>-2</code>&nbsp;is the input file of reverse reads</li>
<li><code>--careful</code>&nbsp;tries to reduce the number of mismatches and short indels</li>
<li><code>--cov-cutoff auto</code>&nbsp;computes the coverage threshold (rather than the default setting, &ldquo;off&rdquo;)</li>
<li><code>-o</code>&nbsp;is the output directory</li>
</ul><h2 id="results">Results</h2><p>Move into the output directory and look at the contigs:</p><div><pre>infoseq contigs.fasta</pre></div>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

</channel>
</rss>