<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/37672?offset=230</link>
	<atom:link href="https://bioinformaticsonline.com/related/37672?offset=230" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38634/eyechrom-visualizing-chromosome-count-data-from-plants</guid>
	<pubDate>Tue, 08 Jan 2019 10:20:54 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38634/eyechrom-visualizing-chromosome-count-data-from-plants</link>
	<title><![CDATA[EyeChrom: Visualizing Chromosome Count Data From Plants]]></title>
	<description><![CDATA[<p><span>It's goal is to show chromosmal data per genus. Select the genus, and the plot will show the records found for it in the Chromosome Counts Database. note: Report an issue via Gihub: github.com/roszenil/CCDBcurator and github.com/RodrigoRivero/EyeChrom</span></p>
<p>https://bsapubs.onlinelibrary.wiley.com/doi/pdf/10.1002/aps3.1207</p><p>Address of the bookmark: <a href="http://eyechrom.com:3838/EyeChrom/" rel="nofollow">http://eyechrom.com:3838/EyeChrom/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41328/deephic-a-generative-adversarial-network-for-enhancing-hi-c-data-resolution</guid>
	<pubDate>Tue, 03 Mar 2020 01:12:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41328/deephic-a-generative-adversarial-network-for-enhancing-hi-c-data-resolution</link>
	<title><![CDATA[DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution]]></title>
	<description><![CDATA[<p><strong>DeepHiC</strong> is a GAN-based model for enhancing Hi-C data resolution. We developed this server for helping researchers to enhance their own low-resolution data by a few steps of clicks. <em>Ab initio</em> training could be performed according to our published <a href="https://github.com/omegahh/DeepHiC">code</a>. We provided trained models for various depth of low-coverage sequencing Hi-C data. The depth of input data is estimated by its distribution comparing with those of the downsampled Hi-C data we used in training</p><p>Address of the bookmark: <a href="http://sysomics.com/deephic" rel="nofollow">http://sysomics.com/deephic</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</guid>
	<pubDate>Wed, 15 Sep 2021 21:15:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</link>
	<title><![CDATA[Reference Sequence Resource!]]></title>
	<description><![CDATA[<p><span>The ENCODE project uses Reference Genomes from&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/genome/browse/reference/">NCBI</a><span>&nbsp;or&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/downloads.html">UCSC</a><span>&nbsp;to provide a consistent framework for mapping high-throughput sequencing data.&nbsp;In general, ENCODE data are mapped consistently to 2 human (GRCH38, hg19) and 2 mouse (mm9/mm10) genomes for historical comparability.&nbsp;</span><em>Drosophia melanogaster</em><span>&nbsp;experiments are mapped to either dm3 or dm6 and&nbsp;</span><em>Caenorhabdilis elegans&nbsp;</em><span>experiments are mapped to ce10 or ce11.&nbsp;T</span></p><p>Address of the bookmark: <a href="https://www.encodeproject.org/data-standards/reference-sequences/" rel="nofollow">https://www.encodeproject.org/data-standards/reference-sequences/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</guid>
	<pubDate>Thu, 02 Jan 2025 11:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</link>
	<title><![CDATA[Large Language Models in Bioinformatics: Transforming Data Analysis and Interpretation]]></title>
	<description><![CDATA[<p>The integration of artificial intelligence (AI) into bioinformatics has ushered in a new era of computational biology. Among the most transformative advancements are large language models (LLMs), such as GPT and BERT, which leverage deep learning to process and interpret vast amounts of text data. These models are reshaping bioinformatics by enhancing data analysis, hypothesis generation, and literature mining.</p><h3>Understanding Large Language Models</h3><p>LLMs are AI systems trained on extensive datasets of natural language. Their ability to model context, identify patterns, and generate coherent language has proven invaluable across domains, including bioinformatics. By fine-tuning these models on biological datasets, researchers can unlock insights into molecular biology, systems biology, and beyond.</p><h3>Key Applications of LLMs in Bioinformatics</h3><h4>1. <strong>Annotating Biological Data</strong></h4><p>Annotating genomic and proteomic data is fundamental yet labor-intensive. LLMs streamline this process by extracting functional annotations from literature and databases, predicting gene and protein functions, and providing automated insights.</p><h4>2. <strong>Mining Scientific Literature</strong></h4><p>The exponential growth of publications presents a challenge for researchers to stay updated. LLMs can process large volumes of text to extract key findings, summarize papers, and identify trends, thereby facilitating efficient literature reviews.</p><h4>3. <strong>Predicting Gene and Protein Functions</strong></h4><p>By leveraging sequence data and annotations, LLMs can predict the functions of uncharacterized genes and proteins. This capability is particularly useful for studying non-model organisms and orphan genes.</p><h4>4. <strong>Drug Discovery and Repurposing</strong></h4><p>LLMs enable pattern recognition across chemical, genomic, and clinical datasets, identifying novel drug candidates and repurposing existing drugs for new therapeutic targets. They can simulate interactions between drugs and biological molecules, accelerating the discovery pipeline.</p><h4>5. <strong>Generating Hypotheses for Research</strong></h4><p>LLMs analyze complex datasets to propose testable hypotheses. For example, they can predict protein-protein interactions, identify regulatory motifs, or model evolutionary processes in genomes.</p><h3>Advantages of LLMs in Bioinformatics</h3><ul>
<li>
<p><strong>Scalability:</strong> LLMs process massive datasets rapidly, reducing the time required for data analysis.</p>
</li>
<li>
<p><strong>Versatility:</strong> These models adapt to diverse bioinformatics tasks, from genomic annotation to network analysis.</p>
</li>
<li>
<p><strong>Contextual Insights:</strong> By synthesizing information across disparate datasets, LLMs provide integrative insights into biological systems.</p>
</li>
</ul><h3>Challenges in Applying LLMs</h3><p>Despite their promise, LLMs face limitations:</p><ul>
<li>
<p><strong>Data Quality and Bias:</strong> Inaccurate or biased datasets can affect model predictions, necessitating rigorous data curation.</p>
</li>
<li>
<p><strong>Interpretability:</strong> Understanding the decision-making process of LLMs remains a critical challenge, especially in high-stakes fields like genomics and medicine.</p>
</li>
<li>
<p><strong>Resource Intensity:</strong> Training and deploying LLMs require substantial computational power, which can limit accessibility.</p>
</li>
<li>
<p><strong>Ethical Concerns:</strong> Handling sensitive genomic data raises privacy and security issues, emphasizing the need for ethical guidelines.</p>
</li>
</ul><h3>Future Prospects</h3><p>The continued development of LLMs tailored for bioinformatics promises exciting advancements. Specialized models trained on omics data, open-access platforms, and interdisciplinary collaborations will expand the utility of LLMs. Moreover, integrating LLMs with other AI technologies, such as graph neural networks and reinforcement learning, can unlock deeper biological insights.</p><h3>Conclusion</h3><p>Large language models are revolutionizing bioinformatics by addressing longstanding challenges in data annotation, literature mining, and function prediction. Their ability to analyze complex biological datasets efficiently positions them as indispensable tools for modern research. As bioinformatics embraces AI, the synergy between LLMs and biological sciences holds the potential to unravel the complexities of life with unprecedented precision and scale.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/2791/ncbi-psi-blast-tutorial</guid>
	<pubDate>Fri, 23 Aug 2013 02:25:02 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/2791/ncbi-psi-blast-tutorial</link>
	<title><![CDATA[NCBI PSI-BLAST Tutorial]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/T3kHEieyylk" frameborder="0" allowfullscreen></iframe>http:--www.biotechnology.jhu.edu-
Tutorial for PSI-BLAST, an extension of BLAST that uses matrix algebra. BLAST is a cornerstone bioinformatics tool at NCBI. BLAST is the
Basic Local Alignment Search tool and will protein and DNA sequences that
are related to a sequence that the user provides.]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/22793/sequencing-by-xpansion</guid>
	<pubDate>Wed, 17 Jun 2015 20:58:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/22793/sequencing-by-xpansion</link>
	<title><![CDATA[Sequencing By Xpansion]]></title>
	<description><![CDATA[<p>Sequencing By Xpansion (SBX) is a DNA sequencing method that uses a simple biochemical reaction to encode the sequence of a DNA molecule into a highly measurable surrogate called an Xpandomer. This single molecule approach produces enough Xpandomer in a single drop reaction to sequence an entire human genome 1000X over. To achieve this, an Xpandomer replaces each DNA sequence with a sequence of large, high signal reporter molecules using the SBX molecular expansion technology. The DNA sequence is then read out as the Xpandomer reporters pass sequentially through a nanopore detector. SBX is a molecular engineering platform that benefits from core design principles that separate the multiple molecular functions. This systems approach enables efficient development and incorporation of improvements to SBX and is key to reconfiguring and optimizing Xpandomer measurement for different detection platforms.</p><p>http://www.stratosgenomics.com/stratos-genomics-technology</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27696/methylkit</guid>
	<pubDate>Fri, 03 Jun 2016 10:09:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27696/methylkit</link>
	<title><![CDATA[methylKit]]></title>
	<description><![CDATA[<p><em>methylKit</em> is an <a href="http://en.wikipedia.org/wiki/R_%28programming_language%29">R</a> package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from <a href="http://www.nature.com/nprot/journal/v6/n4/abs/nprot.2010.190.html">RRBS</a> and its variants, but also target-capture methods such as <a href="http://www.halogenomics.com/sureselect/methyl-seq">Agilent SureSelect methyl-seq</a>. In addition, methylKit can deal with base-pair resolution data for 5hmC obtained from Tab-seq or oxBS-seq. It can also handle whole-genome bisulfite sequencing data if proper input format is provided.</p><p>Address of the bookmark: <a href="https://github.com/al2na/methylKit" rel="nofollow">https://github.com/al2na/methylKit</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28121/kaiju</guid>
	<pubDate>Mon, 27 Jun 2016 11:23:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28121/kaiju</link>
	<title><![CDATA[Kaiju]]></title>
	<description><![CDATA[<p>Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.</p>
<p>By default, Kaiju uses either the available complete genomes from NCBI RefSeq or the microbial subset of the non-redundant protein database <em>nr</em> used by NCBI BLAST, optionally also including fungi and microbial eukaryotes.</p>
<p>Kaiju translates reads into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment. The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a protein database comprising 4821 microbial genomes. Kaiju can also be used for querying any other protein database without taxonomic classification, using either protein or nucleotide queries.</p>
<p>Kaiju is described in <a href="http://www.nature.com/ncomms/2016/160413/ncomms11257/full/ncomms11257.html">Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. <em>Nat. Commun.</em> 7:11257</a> (open access).</p><p>Address of the bookmark: <a href="http://kaiju.binf.ku.dk/" rel="nofollow">http://kaiju.binf.ku.dk/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/31566/software-and-tools-to-detect-structure-variation-with-long-reads</guid>
	<pubDate>Wed, 15 Mar 2017 14:31:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/31566/software-and-tools-to-detect-structure-variation-with-long-reads</link>
	<title><![CDATA[Software and Tools to detect structure variation with long reads !!]]></title>
	<description><![CDATA[<p>Uncovering the connection between genetics and heritable diseases requires an approach that looks at all the variant bases and types in a genome. While a PacBio&nbsp;<em>de novo</em>&nbsp;assembly resolves the most novel SV variants. 8-10X PacBio coverage of single genomes or trios reveals triple the SVs detectable by short-read data.</p><p>With&nbsp;<span style="text-decoration: underline;"><a href="http://www.pacb.com/smrt-science/">Single Molecule, Real-Time (SMRT) Sequencing</a></span>, you can access structural variations having a broad range of sizes, types, and GC content with the ability to:</p><ul>
<li>Uncover missing heritability linked to structural variation</li>
<li>Unambiguously identify genomic context and variant breakpoints at the sequence level to unravel the genetic etiology of disease</li>
<li>Resolve structural variation across the complete size spectrum with basepair resolution</li>
</ul><p>Following are the SV tools, which can assist you to achieve your goal.</p><p><strong>Sniffles:</strong>&nbsp;Structural variation caller using third generation sequencing</p><p>Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs using evidence from split-read alignments, high-mismatch regions, and coverage analysis. Please note the current version of Sniffles requires sorted output from BWA-MEM (use -M and -x parameter) or NGM-LR with the optional SAM attributes enabled!&nbsp;</p><p>More at&nbsp;https://github.com/fritzsedlazeck/Sniffles</p><p><strong style="font-size: 12.8px;"><br />MultiBreak-SV:</strong> It identifies structural variants from next-generation paired end data, third-generation long read data, or data from a combination of sequencing platforms.</p><p>There are two pieces of software in this release: (1) a pre-processor that takes machineformat (.m5) BLASR files, and (2) MultiBreak-SV. For installation and usage instructions, see doc/MultiBreakSV-Manual.txt.</p><p>More at&nbsp;https://github.com/raphael-group/multibreak-sv</p><p><strong style="font-size: 12.8px;"><br />Parliament:</strong>&nbsp;A Structural Variation Tool. Why ask a single sv-detection approach to find every variant when you can have a parliament of tools deciding?</p><p>Publication about the algorithm and &ldquo;&hellip;the first long-read characterization of structural variation in a diploid human personal genome&hellip;&rdquo; (HS1011) -&nbsp;<a href="http://www.biomedcentral.com/1471-2164/16/286">&ldquo;Assessing structural variation in a personal genome&mdash;towards a human reference diploid genome&rdquo;</a></p><p>More at&nbsp;https://sourceforge.net/projects/parliamentsv/</p><p>https://www.dnanexus.com/papers/Parliament_Info_Sheet.pdf</p><p><br /><strong>PBHoney:</strong>&nbsp;the structural variation discovery tool&nbsp;<br /><br />PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.</p><p>Read The Paper&nbsp;<a href="http://www.biomedcentral.com/1471-2105/15/180/abstract" target="_blank">http://www.biomedcentral.com/1471-2105/15/180/abstract</a></p><p>More at&nbsp;https://sourceforge.net/projects/pb-jelly/</p><p><strong><br />SMRT-SV:</strong> Structural variant and indel caller for PacBio reads</p><p>Structural variant (SV) and indel caller for PacBio reads based on methods from&nbsp;<a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html">Chaisson et al. 2014</a>.</p><p>SMRT-SV provides an official software package for tools described in&nbsp;<a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html">Chaisson et al. 2014</a>&nbsp;and adds several key features including the following.</p><ul>
<li>Unified variant calling user interface with built-in cluster compute support</li>
<li>Small indel calling (2-49 bp)</li>
<li>Improved inversion calling (<code>screenInversions</code>)</li>
<li>Quality metric for SV calls based on number of local assemblies supporting each call</li>
<li>Higher sensitivity for SV calls using tiled local assemblies across the entire genome instead of "signature" regions</li>
<li>Genotyping of SVs with Illumina paired-end reads from WGS samples</li>
</ul><p>More at&nbsp;https://github.com/EichlerLab/pacbio_variant_caller</p>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34565/fogsaa-fast-optimal-global-sequence-alignment-algorithm</guid>
	<pubDate>Fri, 08 Dec 2017 14:41:08 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34565/fogsaa-fast-optimal-global-sequence-alignment-algorithm</link>
	<title><![CDATA[FOGSAA: Fast Optimal Global Sequence Alignment Algorithm]]></title>
	<description><![CDATA[<p>Sequence alignment algorithms are widely used to infer similarirty and the point of differences between pair of sequences. FOGSAA is a fast Global alignment algorithm. It is basically a branch and bound approach which starts branch expansion in a greedy way taking the symbols from the given pair of sequences (protein or nucleotide) and results in an optimal alignment faster than conventional dymanic programming techniques. It is also better than the heuristic methods with respect to alignment quality.</p><p>Address of the bookmark: <a href="http://www.isical.ac.in/~bioinfo_miu/FOGSAA.htm" rel="nofollow">http://www.isical.ac.in/~bioinfo_miu/FOGSAA.htm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>