<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44742?</link>
	<atom:link href="https://bioinformaticsonline.com/related/44742?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</guid>
	<pubDate>Mon, 16 Jun 2025 01:44:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</link>
	<title><![CDATA[What is Data Science? — A Bioinformatics Perspective]]></title>
	<description><![CDATA[<p>In today&rsquo;s era of big biology, we&rsquo;re generating more data than ever before&mdash;genomes, transcriptomes, proteomes, metabolomes, microbiomes&hellip; you name it. But raw biological data doesn&rsquo;t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.</p><p><strong>So, What Is Data Science?</strong><br />At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.</p><p>Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes&mdash;these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.</p><p><strong>Data Science Meets Bioinformatics</strong><br />Bioinformatics is inherently a data science domain. 
From genomics to systems biology, every field in modern biology relies on data science techniques to:</p><ul><li>Clean and process massive datasets</li><li>Discover patterns in high-dimensional data</li><li>Build predictive models (e.g., for disease classification)</li><li>Visualize complex biological networks and trends</li><li>Integrate diverse data types (e.g., transcriptomic + epigenomic data)</li></ul><p><strong>The Bioinformatics Toolkit</strong><br />Here&rsquo;s what data science typically looks like in bioinformatics:</p><table><thead><tr><th>Task</th><th>Data science role</th></tr></thead><tbody><tr><td>Sequence alignment</td><td>Efficient algorithms, indexing, parallel processing</td></tr><tr><td>Gene expression analysis</td><td>Statistical modeling (e.g., DESeq2, limma)</td></tr><tr><td>Variant calling</td><td>Data filtering, probabilistic models</td></tr><tr><td>Clustering of cells in single-cell data</td><td>Unsupervised learning</td></tr><tr><td>Protein structure prediction</td><td>Deep learning models (e.g., AlphaFold)</td></tr><tr><td>Metagenomics</td><td>Data integration, classification, dimensionality reduction</td></tr></tbody></table><p>Common tools include Python, R, Bioconductor, scikit-learn, pandas, Seurat, and TensorFlow, often working together in reproducible workflows.</p><p><strong>It&rsquo;s Not Just About Coding</strong><br />A common misconception is that bioinformatics is just programming or scripting. 
But being a data scientist in bioinformatics also means:</p><ul><li>Understanding experimental design</li><li>Asking biologically meaningful questions</li><li>Choosing the right statistical or machine learning models</li><li>Communicating findings effectively (e.g., plots, dashboards, papers)</li></ul><p>In other words, data science in bioinformatics is where biology, statistics, and computer science converge.</p><p><strong>Why It Matters</strong><br />The real power of data science in bioinformatics is its ability to scale discovery.</p><p>Instead of studying one gene, we can study thousands.</p><p>Instead of analyzing one species, we can explore entire ecosystems.</p><p>Instead of waiting months for lab results, we can generate hypotheses in days.</p><p>From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.</p><p><strong>Final Thoughts</strong><br />If you&rsquo;re a biologist who&rsquo;s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground, and data science is your toolkit.</p><p>In bioinformatics, data science isn&rsquo;t just useful. It&rsquo;s essential.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44873/bakrep-denglish-blend-of-bakterien-repository-simplifies-access-to-this-data</guid>
	<pubDate>Wed, 13 Aug 2025 02:31:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44873/bakrep-denglish-blend-of-bakterien-repository-simplifies-access-to-this-data</link>
	<title><![CDATA[BakRep (Denglish blend of Bakterien &amp; Repository) simplifies access to consistently processed bacterial genome data]]></title>
	<description><![CDATA[<p>2,438,386 bacterial genomes at your fingertips: consistently processed and characterized, enriched with metadata, and accessible via a flexible search engine.</p>
<p>BakRep (Denglish blend of Bakterien &amp; Repository) simplifies access to this data. It integrates enriched genomic information with metadata, accessible via a flexible search engine.</p>
<h1>Key features</h1>
<ul>
<li>Assembly statistics: ensure data quality with genome-based key metrics</li>
<li>Taxonomic classification: robust, purely genome-based classifications (<a href="https://gtdb.ecogenomic.org/" target="_blank">GTDB</a>)</li>
<li><a href="https://pubmlst.org/">MLST</a>: subtyping for deeper insights into genetic variation</li>
<li>Annotation: comprehensive &amp; taxonomy-independent (<a href="https://bakta.computational.bio/" target="_blank">Bakta</a>)</li>
<li>Metadata: full original submission records</li>
</ul>
<p>Address of the bookmark: <a href="https://bakrep.computational.bio/" rel="nofollow">https://bakrep.computational.bio/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35272/biocircosjs-is-an-open-source-interactive-javascript-library-to-interactive-display-biological-data-on-the-web</guid>
	<pubDate>Fri, 19 Jan 2018 15:03:51 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35272/biocircosjs-is-an-open-source-interactive-javascript-library-to-interactive-display-biological-data-on-the-web</link>
	<title><![CDATA[BioCircos.js is an open-source JavaScript library for interactively displaying biological data on the web]]></title>
	<description><![CDATA[<p><a href="http://bioinfo.ibp.ac.cn/biocircos/index.php">BioCircos.js</a>&nbsp;is an open-source&nbsp;<code>JavaScript</code>&nbsp;library that provides an easy way to display biological data interactively on the web. It implements an&nbsp;<code>SVG</code>-based visualization using the open-source JavaScript framework jquery.js. BioCircos.js is cross-platform and works in all major web browsers (<strong>Internet Explorer</strong>,&nbsp;<strong>Mozilla Firefox</strong>,&nbsp;<strong>Google Chrome</strong>,&nbsp;<strong>Safari</strong>,&nbsp;<strong>Opera</strong>). Its speed depends on the client&rsquo;s hardware and web browser; for the smoothest user experience, we recommend&nbsp;<strong>Google Chrome</strong>.</p>
<p>BioCircos.js provides&nbsp;<strong>SNP</strong>,&nbsp;<strong>CNV</strong>,&nbsp;<strong>HEATMAP</strong>,&nbsp;<strong>LINK</strong>,&nbsp;<strong>LINE</strong>,&nbsp;<strong>SCATTER</strong>,&nbsp;<strong>ARC</strong>,&nbsp;<strong>TEXT</strong>, and&nbsp;<strong>HISTGRAM</strong>&nbsp;modules to display genome-wide genetic variations (SNPs, CNVs and chromosome rearrangements), gene expression and biomolecule interactions. BioCircos.js also provides a&nbsp;<strong>BACKGROUND</strong>&nbsp;module to display background and axis circles, as well as tooltips that show detailed information about SVG elements.</p>
<p><a href="http://bioinfo.ibp.ac.cn/biocircos/document/demo/pages/paper01.html">Demo</a></p><p>Address of the bookmark: <a href="http://bioinfo.ibp.ac.cn/biocircos/document/index.html" rel="nofollow">http://bioinfo.ibp.ac.cn/biocircos/document/index.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41464/phytozome-v121-plant-science-community-hub-for-accessing-palnts-genomic-data</guid>
	<pubDate>Tue, 17 Mar 2020 07:30:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41464/phytozome-v121-plant-science-community-hub-for-accessing-palnts-genomic-data</link>
	<title><![CDATA[Phytozome v12.1: plant science community hub for accessing plant genomic data]]></title>
	<description><![CDATA[<p>Phytozome, the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute, provides JGI users and the broader plant science community a hub for accessing, visualizing and analyzing JGI-sequenced plant genomes, as well as selected genomes and datasets that have been sequenced elsewhere. As of release v12.1.6, Phytozome hosts 93 assembled and annotated genomes, from 82 Viridiplantae species. More than half of these genomes have been sequenced, assembled and/or annotated with JGI Plant Science program resources. By integrating this large collection of plant genomes into a single resource and performing comprehensive and uniform annotation and analyses, Phytozome facilitates accurate and insightful comparative genomics studies.</p><p>Address of the bookmark: <a href="https://phytozome.jgi.doe.gov/pz/portal.html" rel="nofollow">https://phytozome.jgi.doe.gov/pz/portal.html</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/38226/ncbi-to-assist-in-virus-hunting-data-science-hackathon</guid>
	<pubDate>Thu, 15 Nov 2018 12:55:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/38226/ncbi-to-assist-in-virus-hunting-data-science-hackathon</link>
	<title><![CDATA[NCBI to assist in Virus Hunting Data Science Hackathon]]></title>
	<description><![CDATA[<p>The NCBI Hackathons team is pleased to announce the second installment of the&nbsp;<a href="https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/30/ncbi-southern-california-genomics-hackathon-january/" target="_blank">SoCal Bioinformatics Hackathon</a>. From January 9-11, 2019, the&nbsp;<a href="https://www.ncbi.nlm.nih.gov/" target="_blank">NCBI</a>&nbsp;will help run a bioinformatics hackathon in Southern California hosted by the&nbsp;<a href="http://www.csrc.sdsu.edu/" target="_blank">Computational Sciences Research Center</a>&nbsp;at&nbsp;<a href="http://www.sdsu.edu/" target="_blank">San Diego State University</a>!</p><p>The organizers are specifically looking for people with experience in computational virus hunting or adjacent fields to identify known, taxonomically definable and novel viruses in a few hundred thousand metagenomic datasets that will be placed on cloud infrastructure. This event is for researchers, including students and postdocs, who already work with bioinformatics data or develop pipelines for virological analyses of high-throughput experiments. If this describes you, please&nbsp;<a href="https://goo.gl/forms/kDnSG0IAZD62XQRe2" target="_blank">apply</a>! The event is open to anyone selected for the hackathon and willing to travel to SDSU.</p><p>Source: <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2018/11/09/ncbi-sdsu-virus-hunting-data-science-hackathon-january-2019/" target="_blank">https://ncbiinsights.ncbi.nlm.nih.gov/2018/11/09/ncbi-sdsu-virus-hunting-data-science-hackathon-january-2019/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35418/karyoploter-plot-whole-genomes-with-arbitrary-data</guid>
	<pubDate>Fri, 02 Feb 2018 03:24:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35418/karyoploter-plot-whole-genomes-with-arbitrary-data</link>
	<title><![CDATA[karyoploteR: plot whole genomes with arbitrary data]]></title>
	<description><![CDATA[<p><a href="http://bioconductor.org/packages/karyoploteR">karyoploteR</a>&nbsp;is an R package for creating karyoplots, that is, representations of whole genomes with arbitrary data plotted on them. It is inspired by the R base graphics system and does not depend on other graphics packages. karyoploteR aims to offer an easy way to plot data along the genome, giving a broad genome-wide view that facilitates the identification of genome-wide relations and distributions.</p><p>Address of the bookmark: <a href="https://bernatgel.github.io/karyoploter_tutorial/" rel="nofollow">https://bernatgel.github.io/karyoploter_tutorial/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34488/scripts-for-the-analysis-of-hgt-in-genome-sequence-data</guid>
	<pubDate>Wed, 29 Nov 2017 16:44:10 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34488/scripts-for-the-analysis-of-hgt-in-genome-sequence-data</link>
	<title><![CDATA[Scripts for the analysis of horizontal gene transfer (HGT) in genome sequence data]]></title>
	<description><![CDATA[<p>Scripts for the analysis of horizontal gene transfer (HGT) in genome sequence data.</p><p>Address of the bookmark: <a href="https://github.com/reubwn/hgt" rel="nofollow">https://github.com/reubwn/hgt</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36271/heap-a-highly-sensitive-and-accurate-snp-detection-tool-for-low-coverage-high-throughput-sequencing-data</guid>
	<pubDate>Thu, 19 Apr 2018 08:06:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36271/heap-a-highly-sensitive-and-accurate-snp-detection-tool-for-low-coverage-high-throughput-sequencing-data</link>
	<title><![CDATA[Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data]]></title>
	<description><![CDATA[<p>Heap is a tool for robust, sensitive and accurate SNP calling, particularly from low-coverage NGS data; the reads must be aligned to the reference genome sequence in advance. To reduce false-positive SNPs, Heap determines genotypes and calls SNPs at each site, except for sites at either end of a read and sites containing a minor allele supported by only one read. In a performance comparison with existing tools, Heap achieved the highest F-scores on low-coverage (7X) restriction-site associated DNA sequencing (RAD-seq) reads from sorghum and rice individuals. This should facilitate cost-effective GWAS and GP studies in the NGS era. Code and documentation for Heap are freely available from&nbsp;<a href="https://github.com/meiji-bioinf/heap">https://github.com/meiji-bioinf/heap</a>&nbsp;and the authors&rsquo; web site (<a href="http://bioinf.mind.meiji.ac.jp/lab/en/tools.html">http://bioinf.mind.meiji.ac.jp/lab/en/tools.html</a>).</p><p>Address of the bookmark: <a href="https://github.com/meiji-bioinf/heap" rel="nofollow">https://github.com/meiji-bioinf/heap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37259/epiviz-an-interactive-visualization-tool-for-functional-genomics-data</guid>
	<pubDate>Mon, 09 Jul 2018 05:27:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37259/epiviz-an-interactive-visualization-tool-for-functional-genomics-data</link>
	<title><![CDATA[Epiviz: an interactive visualization tool for functional genomics data.]]></title>
	<description><![CDATA[<p>Epiviz is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. It also includes data from the&nbsp;<a href="http://barcode.luhs.org/" target="_blank">Gene Expression Barcode project</a>&nbsp;for transcriptome visualization. It has a flexible plugin framework, so users can add&nbsp;<a href="http://d3js.org/" target="_blank">d3</a>&nbsp;visualizations. A video tour is available&nbsp;<a href="http://youtu.be/099c4wUxozA" target="_blank">here</a>.</p>
<p><a href="https://bioconductor.org/packages/release/bioc/html/epivizr.html">https://bioconductor.org/packages/release/bioc/html/epivizr.html</a></p>
<p><a href="https://github.com/epiviz">https://github.com/epiviz</a></p>
<p><a href="https://github.com/epiviz/epiviz">https://github.com/epiviz/epiviz</a></p><p>Address of the bookmark: <a href="https://epiviz.github.io/" rel="nofollow">https://epiviz.github.io/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37835/variantbam-filtering-and-profiling-of-next-generational-sequencing-data-using-region-specific-rules</guid>
	<pubDate>Thu, 04 Oct 2018 16:30:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37835/variantbam-filtering-and-profiling-of-next-generational-sequencing-data-using-region-specific-rules</link>
	<title><![CDATA[VariantBam: Filtering and profiling of next-generation sequencing data using region-specific rules]]></title>
	<description><![CDATA[<p>VariantBam is a tool to extract or count specific sets of sequencing reads from next-generation sequencing files. To save money, disk space and I/O, one may not want to store an entire BAM file on disk. In many cases, it is more efficient to store only those reads or read pairs that intersect some region around the variant locations. Alternatively, if your scientific question focuses on only one aspect of the data (e.g., breakpoints), many reads can be removed without losing the information relevant to the problem.</p>
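<p>The idea of keeping only reads near variant locations can be sketched in Python. This is a minimal illustration under stated assumptions, not VariantBam's implementation or interface: reads are modelled by a hypothetical <code>Read</code> tuple with 0-based, half-open coordinates, variants are hypothetical <code>(chrom, pos)</code> pairs, and the 100 bp padding default is arbitrary. A read is kept if it overlaps a padded window around any variant; with <code>keep_mates=True</code>, every record sharing that read's name is kept too, so pairs stay together.</p>

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical stand-in for an aligned read (not a VariantBam or pysam type)."""
    name: str   # read/query name shared by both mates of a pair
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (half-open)


def padded_regions(variants, pad):
    """Turn (chrom, pos) variant positions into half-open windows of +/- pad bases."""
    return [(chrom, max(0, pos - pad), pos + pad + 1) for chrom, pos in variants]


def overlaps(read, region):
    """Half-open intervals [s1, e1) and [s2, e2) overlap iff e1 > s2 and e2 > s1."""
    chrom, start, end = region
    return read.chrom == chrom and read.end > start and end > read.start


def keep_reads_near_variants(reads, variants, pad=100, keep_mates=True):
    """Return only the reads that intersect a padded window around any variant.

    With keep_mates=True, every record sharing a name with an overlapping read
    is also kept, so read pairs stay together.
    """
    regions = padded_regions(variants, pad)
    hits = [r for r in reads if any(overlaps(r, reg) for reg in regions)]
    if not keep_mates:
        return hits
    hit_names = {r.name for r in hits}
    return [r for r in reads if r.name in hit_names]


# Usage: one variant on chr1; only pairA has a mate inside the 100 bp window.
reads = [
    Read("pairA", "chr1", 950, 1050),
    Read("pairA", "chr1", 1300, 1400),
    Read("pairB", "chr1", 5000, 5100),
    Read("pairC", "chr2", 990, 1090),
]
kept = keep_reads_near_variants(reads, [("chr1", 1000)], pad=100)
print([(r.name, r.start) for r in kept])  # prints [('pairA', 950), ('pairA', 1300)]
```

<p>The linear scan over every window keeps the sketch short. For a real BAM file, one would instead query only the padded regions through the file's index (for example with samtools or pysam) rather than comparing every read against every window.</p>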
<p>Address of the bookmark: <a href="https://github.com/broadinstitute/VariantBam" rel="nofollow">https://github.com/broadinstitute/VariantBam</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>