<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/31526?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/31526?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30833/dnasp-v5-a-software-for-comprehensive-analysis-of-dna-polymorphism-data</guid>
	<pubDate>Mon, 06 Feb 2017 04:45:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30833/dnasp-v5-a-software-for-comprehensive-analysis-of-dna-polymorphism-data</link>
	<title><![CDATA[DnaSP v5: a software for comprehensive analysis of DNA polymorphism data]]></title>
	<description><![CDATA[<p><span>DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.</span></p><p>Address of the bookmark: <a href="http://www.ub.edu/dnasp/" rel="nofollow">http://www.ub.edu/dnasp/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31087/bedtools</guid>
	<pubDate>Fri, 24 Feb 2017 04:50:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31087/bedtools</link>
	<title><![CDATA[bedtools]]></title>
	<description><![CDATA[<p>Collectively, the&nbsp;<strong>bedtools</strong>&nbsp;utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable&nbsp;<em>genome arithmetic</em>: that is, set theory on the genome. For example,&nbsp;<strong>bedtools</strong>&nbsp;allows one to&nbsp;<em>intersect</em>,&nbsp;<em>merge</em>,&nbsp;<em>count</em>,&nbsp;<em>complement</em>, and&nbsp;<em>shuffle</em>&nbsp;genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g.,&nbsp;<em>intersect</em>&nbsp;two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.</p>
<p><strong>bedtools</strong>&nbsp;is developed in the&nbsp;<a href="http://quinlanlab.org/">Quinlan laboratory</a>&nbsp;at the&nbsp;<a href="http://www.utah.edu/">University of Utah</a>&nbsp;and benefits from fantastic contributions made by scientists worldwide.</p><p>Address of the bookmark: <a href="http://bedtools.readthedocs.io/en/latest/index.html" rel="nofollow">http://bedtools.readthedocs.io/en/latest/index.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31156/splitbam-splits-a-bam-by-chromosomes</guid>
	<pubDate>Tue, 28 Feb 2017 09:01:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31156/splitbam-splits-a-bam-by-chromosomes</link>
	<title><![CDATA[splitbam: splits a BAM by chromosomes]]></title>
	<description><![CDATA[<p><strong>splitbam</strong>&nbsp;splits a BAM by chromosomes.</p>
<p>Using the reference sequence dictionary (<code>*.dict</code>), it also creates empty BAM files when no SAM record is found for a chromosome. A pair of 'mock' SAM records can also be added to those empty BAMs to prevent some tools (like samtools) from crashing.</p>
<h1>Usage</h1>
<p><code>java -jar splitbam.jar -p OUT/__CHROM__/__CHROM__.bam -R ref.fasta (bam|sam|stdin)</code></p>
<h1>Options</h1>
<ul>
<li>-h help; This screen.</li>
<li>-R (indexed reference file) REQUIRED.</li>
<li>-u (unmapped chromosome name): default:Unmapped</li>
<li>-e | --empty : generate EMPTY bams for chromosome having no read mapped</li>
<li>-m | --mock : if option '-e', add a mock pair of sam records to the empty bam</li>
<li>-p (output file/bam pattern) REQUIRED. MUST contain&nbsp;<strong><code>__CHROM__</code></strong>&nbsp;and end with .bam</li>
<li>-s assume input is sorted.</li>
<li>-x | --index create index.</li>
<li>-t | --tmp (dir) tmp file directory</li>
<li>-G (file) chrom-group file (see below)</li>
</ul><p>Address of the bookmark: <a href="https://code.google.com/archive/p/jvarkit/wikis/SplitBam.wiki" rel="nofollow">https://code.google.com/archive/p/jvarkit/wikis/SplitBam.wiki</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31345/prokka-tool-for-the-rapid-annotation-of-prokaryotic-genomes</guid>
	<pubDate>Mon, 06 Mar 2017 03:49:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31345/prokka-tool-for-the-rapid-annotation-of-prokaryotic-genomes</link>
	<title><![CDATA[Prokka: tool for the rapid annotation of prokaryotic genomes]]></title>
	<description><![CDATA[<p>Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and it scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.</p>
<p>Address of the bookmark: <a href="http://www.vicbioinformatics.com/software.prokka.shtml" rel="nofollow">http://www.vicbioinformatics.com/software.prokka.shtml</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31375/cocacola-binning-metagenomic-contigs-using-sequence-composition-read-coverage-co-alignment-and-paired-end-read-linkage</guid>
	<pubDate>Tue, 07 Mar 2017 08:50:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31375/cocacola-binning-metagenomic-contigs-using-sequence-composition-read-coverage-co-alignment-and-paired-end-read-linkage</link>
	<title><![CDATA[COCACOLA (binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge)]]></title>
	<description><![CDATA[<p>COCACOLA is a general framework that combines different types of information to automatically bin contigs into OTUs: sequence COmposition, CoverAge across multiple samples, CO-alignment to reference genomes, and paired-end read LinkAge. Furthermore, COCACOLA can incorporate customized prior knowledge to improve binning accuracy.</p>
<p>News: Python version of COCACOLA is available now!</p><p>Address of the bookmark: <a href="https://github.com/younglululu/COCACOLA" rel="nofollow">https://github.com/younglululu/COCACOLA</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32048/json</guid>
	<pubDate>Tue, 04 Apr 2017 08:02:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32048/json</link>
	<title><![CDATA[JSON]]></title>
	<description><![CDATA[<p><strong>JSON</strong>&nbsp;(JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the&nbsp;<a href="http://javascript.crockford.com/">JavaScript Programming Language</a>,&nbsp;<a href="http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf">Standard ECMA-262 3rd Edition - December 1999</a>. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.</p>
<p>JSON is built on two structures:</p>
<ul>
<li>A collection of name/value pairs. In various languages, this is realized as an&nbsp;<em>object</em>, record, struct, dictionary, hash table, keyed list, or associative array.</li>
<li>An ordered list of values. In most languages, this is realized as an&nbsp;<em>array</em>, vector, list, or sequence.</li>
</ul>
<p>These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.</p><p>Address of the bookmark: <a href="http://json.org/" rel="nofollow">http://json.org/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32465/tetra-nucleotide-analysis</guid>
	<pubDate>Thu, 04 May 2017 05:07:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32465/tetra-nucleotide-analysis</link>
	<title><![CDATA[Tetra-Nucleotide Analysis]]></title>
	<description><![CDATA[<p>A tetra-nucleotide is a fragment of DNA sequence with 4 bases (e.g. AGTC or TTGG). Pride&nbsp;<em>et al.</em>&nbsp;(2003) showed that the frequencies of tetra-nucleotides in bacterial genomes contain useful, albeit weak, phylogenetic signals. Even though tetra-nucleotide analysis (TNA) uses information from the whole genome, it cannot replace alignment-based phylogenetic methods such as&nbsp;<a href="https://chunlab.wordpress.com/orthoani/">OrthoANI</a>&nbsp;or&nbsp;16S rRNA phylogeny. However, TNA can be useful for&nbsp;phylogenetic characterization when whole-genome or 16S rRNA gene information is not available. For example, a partial genomic fragment obtained from a metagenome can be identified by TNA (Teeling&nbsp;<em>et al.</em>, 2004). TNA is also fast enough to be&nbsp;used&nbsp;as a search engine against a large genome database.</p><p>Address of the bookmark: <a href="https://chunlab.wordpress.com/tetra-nucleotide-analysis/" rel="nofollow">https://chunlab.wordpress.com/tetra-nucleotide-analysis/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32730/ncbi-prokaryotic-genome-annotation-pipeline</guid>
	<pubDate>Tue, 16 May 2017 08:56:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32730/ncbi-prokaryotic-genome-annotation-pipeline</link>
	<title><![CDATA[NCBI Prokaryotic Genome Annotation Pipeline]]></title>
	<description><![CDATA[<p>NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).</p>
<p>Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.</p>
<p>NCBI has developed an automatic prokaryotic genome annotation pipeline that combines&nbsp;<em>ab initio</em>&nbsp;gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP;&nbsp;<a href="https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=18416670">see PubMed article</a>), developed in 2005, has been replaced with an upgraded version that is capable of processing a larger data volume. You can find a more detailed description of the new version of&nbsp;the pipeline in&nbsp;the <a href="https://www.ncbi.nlm.nih.gov/books/NBK174280/">NCBI Handbook chapter</a>. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.</p>
<p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/genome/annotation_prok/" rel="nofollow">https://www.ncbi.nlm.nih.gov/genome/annotation_prok/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</guid>
	<pubDate>Thu, 02 Jan 2025 11:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</link>
	<title><![CDATA[Large Language Models in Bioinformatics: Transforming Data Analysis and Interpretation]]></title>
	<description><![CDATA[<p>The integration of artificial intelligence (AI) into bioinformatics has ushered in a new era of computational biology. Among the most transformative advancements are large language models (LLMs), such as GPT and BERT, which leverage deep learning to process and interpret vast amounts of text data. These models are reshaping bioinformatics by enhancing data analysis, hypothesis generation, and literature mining.</p><h3>Understanding Large Language Models</h3><p>LLMs are AI systems trained on extensive datasets of natural language. Their ability to model context, identify patterns, and generate coherent language has proven invaluable across domains, including bioinformatics. By fine-tuning these models on biological datasets, researchers can unlock insights into molecular biology, systems biology, and beyond.</p><h3>Key Applications of LLMs in Bioinformatics</h3><h4>1. <strong>Annotating Biological Data</strong></h4><p>Annotating genomic and proteomic data is fundamental yet labor-intensive. LLMs streamline this process by extracting functional annotations from literature and databases, predicting gene and protein functions, and providing automated insights.</p><h4>2. <strong>Mining Scientific Literature</strong></h4><p>The exponential growth of publications makes it hard for researchers to stay up to date. LLMs can process large volumes of text to extract key findings, summarize papers, and identify trends, thereby facilitating efficient literature reviews.</p><h4>3. <strong>Predicting Gene and Protein Functions</strong></h4><p>By leveraging sequence data and annotations, LLMs can predict the functions of uncharacterized genes and proteins. This capability is particularly useful for studying non-model organisms and orphan genes.</p><h4>4. <strong>Drug Discovery and Repurposing</strong></h4><p>LLMs enable pattern recognition across chemical, genomic, and clinical datasets, identifying novel drug candidates and repurposing existing drugs for new therapeutic targets. They can simulate interactions between drugs and biological molecules, accelerating the discovery pipeline.</p><h4>5. <strong>Generating Hypotheses for Research</strong></h4><p>LLMs analyze complex datasets to propose testable hypotheses. For example, they can predict protein-protein interactions, identify regulatory motifs, or model evolutionary processes in genomes.</p><h3>Advantages of LLMs in Bioinformatics</h3><ul>
<li>
<p><strong>Scalability:</strong> LLMs process massive datasets rapidly, reducing the time required for data analysis.</p>
</li>
<li>
<p><strong>Versatility:</strong> These models adapt to diverse bioinformatics tasks, from genomic annotation to network analysis.</p>
</li>
<li>
<p><strong>Contextual Insights:</strong> By synthesizing information across disparate datasets, LLMs provide integrative insights into biological systems.</p>
</li>
</ul><h3>Challenges in Applying LLMs</h3><p>Despite their promise, LLMs face limitations:</p><ul>
<li>
<p><strong>Data Quality and Bias:</strong> Inaccurate or biased datasets can affect model predictions, necessitating rigorous data curation.</p>
</li>
<li>
<p><strong>Interpretability:</strong> Understanding the decision-making process of LLMs remains a critical challenge, especially in high-stakes fields like genomics and medicine.</p>
</li>
<li>
<p><strong>Resource Intensity:</strong> Training and deploying LLMs require substantial computational power, which can limit accessibility.</p>
</li>
<li>
<p><strong>Ethical Concerns:</strong> Handling sensitive genomic data raises privacy and security issues, emphasizing the need for ethical guidelines.</p>
</li>
</ul><h3>Future Prospects</h3><p>The continued development of LLMs tailored for bioinformatics promises exciting advancements. Specialized models trained on omics data, open-access platforms, and interdisciplinary collaborations will expand the utility of LLMs. Moreover, integrating LLMs with other AI technologies, such as graph neural networks and reinforcement learning, can unlock deeper biological insights.</p><h3>Conclusion</h3><p>Large language models are revolutionizing bioinformatics by addressing longstanding challenges in data annotation, literature mining, and function prediction. Their ability to analyze complex biological datasets efficiently positions them as indispensable tools for modern research. As bioinformatics embraces AI, the synergy between LLMs and biological sciences holds the potential to unravel the complexities of life with unprecedented precision and scale.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/41905/research-associate-bioinformatics-in-iisc-recruitment-2020</guid>
  <pubDate>Tue, 23 Jun 2020 21:53:34 -0500</pubDate>
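  <link>https://bioinformaticsonline.com/opportunity/view/41905/research-associate-bioinformatics-in-iisc-recruitment-2020</link>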
  <title><![CDATA[Research Associate Bioinformatics in IISc Recruitment 2020]]></title>
  <description><![CDATA[
<p>Research Associate Bioinformatics in IISc Recruitment 2020</p>

<p>Essential Qualifications: Ph.D. (Bioinformatics/ Biophysics/ Biotechnology or any other stream of biological/ physical sciences) with a minimum of two publications in reputed peer reviewed journals in the area of structural bioinformatics or biophysics or biomolecular modeling/ simulation.</p>

<p>Job description: Development of bioinformatics tools and algorithms/software for structure-based analysis of biomolecular systems. Programmatic access to major biomolecular databases using APIs. Knowledge-based prediction and analysis of biomolecular structure, function and interactions. Docking/simulations for inhibitor design.</p>

<p>Desirable Qualifications (Research Associate/s): i) Strong computer programming skills (in Python/Perl/PHP or C++, object-oriented database management systems like MySQL, or scripting languages under a Linux/UNIX environment).</p>

<p>ii) Extensive experience in computational analysis of biomolecular structure/interactions and use of advanced biomolecular simulation software. iii) Adequate knowledge of major databases, web servers and software in the area of biomolecular structure/function and drug design. iv) Familiarity with parallel programming environments and experience using high-end HPC clusters.</p>

<p>Candidates must highlight their experience in the above-mentioned fields/topics in their CV. The initial appointment will be for a period of 1 year, subject to extension after a review of performance.</p>

<p>Emoluments: As per DST, GOI norms and commensurate with experience.</p>

<p>More at https://www.iisc.ac.in/positions-open/</p>
]]></description>
</item>

</channel>
</rss>