<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43859?offset=410</link>
	<atom:link href="https://bioinformaticsonline.com/related/43859?offset=410" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22570/frequent-words-problem-solution-by-perl</guid>
	<pubDate>Tue, 09 Jun 2015 23:38:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22570/frequent-words-problem-solution-by-perl</link>
	<title><![CDATA[Frequent words problem solution by Perl]]></title>
	<description><![CDATA[<div><p>Solved with Perl: <a href="http://rosalind.info/problems/1a/">http://rosalind.info/problems/1a/</a></p><p>#Find the most frequent k-mers in a string.<br />#Given: A DNA string Text and an integer k.<br />#Return: All most frequent k-mers in Text (in any order).<br /><br />use strict;<br />use warnings;<br /><br />my $string="ACGTTGCATGTCGCATGATGCATGAGAGCT";<br />my $kmer=4; <br />my %myHash;<br />my $max=0;<br /><br />#Slide a window of length $kmer along the string and count each k-mer<br />for (my $aa=0; $aa&lt;=(length($string)-$kmer); $aa++) {<br />&nbsp;&nbsp; &nbsp;my $myStr=substr($string, $aa, $kmer);<br />&nbsp;&nbsp; &nbsp;#print "$myStr\n";<br />&nbsp;&nbsp; &nbsp;my $km=kmerMatch ($string, $myStr, $kmer);<br />&nbsp;&nbsp; &nbsp;if ($km &gt; $max) { $max = $km;}<br />&nbsp;&nbsp; &nbsp;#print "$km\t$myStr\n";<br />&nbsp;&nbsp; &nbsp;$myHash{$myStr}=$km;<br />}<br /><br />#Print all k-mers whose count equals the maximum<br />foreach my $name (keys %myHash){<br />&nbsp;&nbsp;&nbsp; print "$name " if $myHash{$name} == $max;<br />}<br />print "\n";<br /><br />sub kmerMatch { #Count exact matches of a k-mer with a sliding window<br />my ($string, $myStr, $kmer)=@_;<br />my $count=0;<br />for (my $aa=0; $aa&lt;=(length($string)-$kmer); $aa++) {<br />&nbsp;&nbsp; &nbsp;my $myWin=substr($string, $aa, $kmer);<br />&nbsp;&nbsp; &nbsp;if ($myWin eq $myStr) {<br />&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;#print "$myWin eq $myStr\n";<br />&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;$count++;<br />&nbsp;&nbsp; &nbsp;}<br />}<br />return $count;<br />}</p></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26424/biotoolbox</guid>
	<pubDate>Fri, 19 Feb 2016 09:14:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26424/biotoolbox</link>
	<title><![CDATA[BioToolbox]]></title>
	<description><![CDATA[<p>This is a collection of libraries and high-quality end-user scripts for bioinformatic analysis, including working with gene annotation, collecting data scores from a variety of modern file formats, and conversion between file formats. The Bio::ToolBox libraries provide a unified, abstracted interface to multiple common gene annotation formats and the collection of data from multiple data files. They rely on BioPerl SeqFeature libraries and related adaptors to access binary file formats including Bam, BigWig, BigBed, and USeq. The Bio::ToolBox package includes scripts for setting up databases of annotation, collecting annotated features, collecting genomic data relative to features, manipulating and analyzing data, and data format conversion.</p>
<p>More at http://cpansearch.perl.org/src/TJPARNELL/</p><p>Address of the bookmark: <a href="http://cpansearch.perl.org/src/TJPARNELL/" rel="nofollow">http://cpansearch.perl.org/src/TJPARNELL/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37592/benchmarking-perl-module</guid>
	<pubDate>Sat, 25 Aug 2018 11:40:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37592/benchmarking-perl-module</link>
	<title><![CDATA[Benchmarking Perl Module !]]></title>
	<description><![CDATA[<p>The Benchmark module is a great tool for measuring how long code takes to run. The output is usually reported in terms of CPU time, which gives us a basis for optimizing our code. With the advent of petascale computing and multicore processors, it is becoming a necessity to know the CPU time taken by our Perl programs.</p><p>This is the simplest way to use the module:</p><blockquote><p>Example 1:</p><p>use Benchmark;</p><p>$first_time = Benchmark-&gt;new;</p><p># your code here ...</p><p>$second_time = Benchmark-&gt;new;</p><p># timediff(T1, T2) returns T1 - T2, so the later time goes first</p><p>$final_difference = timediff($second_time, $first_time);</p><p>print "the code took: ", timestr($final_difference), "\n";</p></blockquote><p>That was a very simple way to get the time difference; we can use it to measure the time taken by any part of a program.</p><blockquote><p>A more sophisticated way:</p><p>use Benchmark;<br />sub first_sub {</p><p>my (@arguments) = @_;</p><p>}</p><p>timethese(100, { first =&gt; 'first_sub(@arguments)' });</p><p>The first argument to timethese is the number of iterations (here, the code is evaluated 100 times).</p></blockquote><p>Hope this very small tutorial on Benchmark will help people get started.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/38029/biologist-versus-computational-biologist</guid>
	<pubDate>Mon, 29 Oct 2018 04:23:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/38029/biologist-versus-computational-biologist</link>
	<title><![CDATA[Biologist versus computational biologist !]]></title>
	<description><![CDATA[<p>This is how it works :)</p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/38029" length="69305" type="image/png" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44803/basics-of-deseq2-differential-expression-made-simple</guid>
	<pubDate>Wed, 28 May 2025 06:47:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44803/basics-of-deseq2-differential-expression-made-simple</link>
	<title><![CDATA[Basics of DESeq2: Differential Expression Made Simple]]></title>
	<description><![CDATA[<p>DESeq2 is a powerful and widely used R package that identifies differentially expressed genes (DEGs) from RNA-seq data. Whether you're comparing treated vs untreated samples, disease vs healthy conditions, or wild-type vs mutant strains, DESeq2 helps you statistically determine which genes are significantly up- or down-regulated.</p><p><strong>What Does DESeq2 Do?</strong><br />DESeq2 analyzes count data, that is, the number of sequencing reads that map to each gene. It:</p><p>Normalizes the data to account for differences in sequencing depth (library size).</p><p>Estimates variance (dispersion) for each gene.</p><p>Fits a model to compare groups (e.g., control vs treated).</p><p>Calculates fold-changes and p-values to determine significance.</p><p><strong>Installing DESeq2</strong></p><p>You can install DESeq2 via Bioconductor in R:</p><p>if (!requireNamespace("BiocManager", quietly = TRUE))<br /> install.packages("BiocManager")<br />BiocManager::install("DESeq2")</p><p><strong>Inputs Needed</strong></p><p>A count matrix: genes as rows, samples as columns (raw counts, not normalized).</p><p>A sample metadata table (also called colData): defines the condition/group for each sample.</p><blockquote><p>Example:<br /># Count matrix (rows = genes, columns = samples)<br />counts &lt;- read.csv("counts.csv", row.names = 1)</p><p># Sample metadata (one entry per column of counts, in the same order)<br />colData &lt;- data.frame(<br /> row.names = colnames(counts),<br /> condition = c("control", "control", "treated", "treated")<br />)</p><p>DESeq2 Workflow</p><p>1. Load the package<br />library(DESeq2)<br />2. Create a DESeqDataSet object<br />dds &lt;- DESeqDataSetFromMatrix(countData = counts,<br /> colData = colData,<br /> design = ~ condition)<br />3. Run the differential expression analysis<br />dds &lt;- DESeq(dds)<br />4. Get the results<br />res &lt;- results(dds)<br />head(res)<br />This gives a table with:</p><p>log2FoldChange: how much expression changed (on a log2 scale)</p><p>pvalue: statistical significance</p><p>padj: adjusted p-value (FDR corrected)</p></blockquote><p><strong>Visualization (Optional but Powerful)</strong></p><blockquote><p>MA Plot<br />plotMA(res, ylim = c(-2, 2))</p><p>Volcano Plot (custom)<br />library(ggplot2)<br /># ggplot() needs a plain data.frame, so convert the results object first<br />res_df &lt;- as.data.frame(res)<br />res_df$significant &lt;- res_df$padj &lt; 0.05<br />ggplot(res_df, aes(x=log2FoldChange, y=-log10(padj), color=significant)) +<br /> geom_point() +<br /> theme_minimal()</p><p>Heatmap of Top Genes<br />library(pheatmap)<br />topgenes &lt;- head(order(res$padj), 20)<br />vsd &lt;- vst(dds, blind=FALSE)<br />pheatmap(assay(vsd)[topgenes, ])</p><p>Tips for Best Results<br />Use raw counts (not normalized or TPM/RPKM values).</p><p>Have replicates: DESeq2 relies on variance estimates, so at least 3 per group is ideal.</p><p>Watch out for batch effects: include them in your design if needed (e.g., ~ batch + condition).</p></blockquote><p><strong>Summary</strong></p><table><tr><th>Step</th><th>Purpose</th></tr><tr><td>DESeqDataSetFromMatrix()</td><td>Load your data into DESeq2</td></tr><tr><td>DESeq()</td><td>Run the differential expression analysis</td></tr><tr><td>results()</td><td>Extract the output (log fold change, p-values, etc.)</td></tr><tr><td>plotMA() / ggplot2 / pheatmap</td><td>Visualize the results</td></tr></table><p><strong>Final Thoughts</strong><br />DESeq2 is an essential tool for RNA-seq data analysis. It abstracts away much of the complexity of statistical modeling, while still giving you control when needed. Whether you're a bioinformatician or a wet-lab biologist, DESeq2 offers both ease of use and analytical power.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36373/tools-to-predict-the-impact-of-missense-variants</guid>
	<pubDate>Mon, 23 Apr 2018 12:57:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36373/tools-to-predict-the-impact-of-missense-variants</link>
	<title><![CDATA[Tools to Predict the Impact of Missense Variants !]]></title>
	<description><![CDATA[<p><span>Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of&nbsp;</span><em>in silico</em><span>&nbsp;tools have been employed for the task of pathogenicity prediction, including PolyPhen‐2, SIFT, FatHMM, MutationTaster‐2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. </span></p><p><span>A study of 10 tools on five datasets showed that such a comparative evaluation is hindered by two types of circularity: (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluation of these tools, which may lead to overly optimistic results. 
Comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.</span></p><p><span>The following tools are useful for predicting the impact of missense mutations:&nbsp;</span></p><p>PolyPhen‐2 (PP2)<br />&ldquo;Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations&rdquo;</p><p>MutationTaster‐2 (MT2)<br />&ldquo;Evaluation of the disease‐causing potential of DNA sequence alterations&rdquo;</p><p>MutationAssessor (MASS)<br />&ldquo;Predicts the functional impact of amino acid substitutions in proteins, such as mutations discovered in cancer or missense polymorphisms&rdquo;</p><p>LRT<br />&ldquo;Identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein‐coding sequences, which are likely to be unconditionally deleterious&rdquo;</p><p>SIFT<br />&ldquo;Predicts whether an amino acid substitution affects protein function&rdquo;</p><p>GERP++<br />&ldquo;Identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. 
We refer to these deficits as &ldquo;rejected substitutions.&rdquo; Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element&rdquo;</p><p>phyloP<br />&ldquo;Compute conservation or acceleration P values based on an alignment and a model of neutral evolution&rdquo;</p><p>FatHMM unweighted (FatHMM‐U)<br />Predicts &ldquo;functional consequences of both coding variants, that is, nonsynonymous single‐nucleotide variants, and noncoding variants&rdquo;</p><p>FatHMM weighted (FatHMM‐W)<br />Predicts &ldquo;functional consequences of both coding variants, that is, nonsynonymous single‐nucleotide variants, and noncoding variants&rdquo;; its weighting scheme attributes higher tolerance scores to SNVs in proteins, related proteins, or domains that already include a high fraction of pathogenic variants</p><p>Combined Annotation Dependent Depletion (CADD)<br />&ldquo;CADD is a tool for scoring the deleteriousness of single‐nucleotide variants as well as insertion/deletions variants in the human genome&rdquo;</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33482/tardis-toolkit-for-automated-and-rapid-discovery-of-structural-variants</guid>
	<pubDate>Fri, 09 Jun 2017 04:43:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33482/tardis-toolkit-for-automated-and-rapid-discovery-of-structural-variants</link>
	<title><![CDATA[TARDIS: Toolkit for automated and rapid discovery of structural variants]]></title>
	<description><![CDATA[<p>tardis</p>
<p>Toolkit for Automated and Rapid DIscovery of Structural variants</p>
<p>Requirements</p>
<p>zlib (http://www.zlib.net)<br>mrfast (https://github.com/BilkentCompGen/mrfast)<br>htslib (included as submodule; http://htslib.org/)</p>
<p>Fetching tardis</p>
<p>git clone https://github.com/BilkentCompGen/tardis.git --recursive</p>
<p>Address of the bookmark: <a href="https://github.com/BilkentCompGen/tardis" rel="nofollow">https://github.com/BilkentCompGen/tardis</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40611/deepvariant-an-analysis-pipeline-that-uses-a-deep-neural-network-to-call-genetic-variants-from-next-generation-dna-sequencing-data</guid>
	<pubDate>Sat, 25 Jan 2020 13:28:09 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40611/deepvariant-an-analysis-pipeline-that-uses-a-deep-neural-network-to-call-genetic-variants-from-next-generation-dna-sequencing-data</link>
	<title><![CDATA[DeepVariant : an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.]]></title>
	<description><![CDATA[<p><span><span>DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on&nbsp;</span><a href="https://github.com/google/nucleus">Nucleus</a><span>, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the&nbsp;</span><a href="https://www.tensorflow.org/">TensorFlow</a><span>&nbsp;machine learning framework.</span></span></p>
<p><span><a href="https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html">https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html</a></span></p>
<p><span><a href="https://www.biorxiv.org/content/10.1101/092890v6">https://www.biorxiv.org/content/10.1101/092890v6</a></span></p>
<p><span><img src="https://4.bp.blogspot.com/-2KlXZO60sWE/WiGc8qlZfxI/AAAAAAAACOs/s1pNiKI8jsAvJLr1E_po5udDO8eObm_awCLcBGAs/s640/image3.png" width="640" height="427" alt="image" style="border: 0px;"></span></p><p>Address of the bookmark: <a href="https://github.com/google/deepvariant" rel="nofollow">https://github.com/google/deepvariant</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40705/malva-genotyping-by-mapping-free-allele-detection-of-known-variants</guid>
	<pubDate>Tue, 28 Jan 2020 03:39:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40705/malva-genotyping-by-mapping-free-allele-detection-of-known-variants</link>
	<title><![CDATA[MALVA: Genotyping by Mapping-free ALlele Detection of Known VAriants]]></title>
	<description><![CDATA[<p id="p0010">MALVA is able to genotype multi-allelic SNPs and indels without mapping reads</p>
<p id="p0015">MALVA correctly calls more indels than the most widely adopted genotyping pipelines</p>
<p id="p0020">Mapping-free approaches are as accurate as alignment-based ones, while being faster</p>
<p>More at&nbsp;<a href="https://www.sciencedirect.com/science/article/pii/S2589004219302366">https://www.sciencedirect.com/science/article/pii/S2589004219302366</a></p>
<p>Address of the bookmark: <a href="https://github.com/AlgoLab/malva" rel="nofollow">https://github.com/AlgoLab/malva</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27094/smash-an-alignment-free-method-to-find-and-visualise-rearrangements-between-pairs-of-dna-sequences</guid>
	<pubDate>Tue, 26 Apr 2016 12:18:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27094/smash-an-alignment-free-method-to-find-and-visualise-rearrangements-between-pairs-of-dna-sequences</link>
	<title><![CDATA[Smash: An alignment-free method to find and visualise rearrangements between pairs of DNA sequences]]></title>
	<description><![CDATA[<p><strong>Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements</strong><span>. The detection is based on&nbsp;</span><strong>conditional exclusive compression</strong><span>, namely using a finite-context model (FCM, a Markov model) of high context order (typically 20). For visualisation, Smash outputs an&nbsp;</span><strong>SVG image</strong><span>&nbsp;with an&nbsp;</span><strong>ideogram</strong><span>&nbsp;output architecture, where the patterns are represented with several&nbsp;</span><strong>HSV values</strong><span>&nbsp;(only the value component varies). The method works at both small and large scale. Nevertheless, it is directed more at the large scale, since the main aim of the research is to&nbsp;</span><strong>know where, at large scale [chromosome by chromosome], the genomes of several primates are equal/different, having at a glance a map of the entire genomes</strong><span>.</span></p><p>Address of the bookmark: <a href="http://bioinformatics.ua.pt/software/smash/" rel="nofollow">http://bioinformatics.ua.pt/software/smash/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>