<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/39200?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/39200?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/37586/julia-programming-language-a-python-and-r-rival</guid>
	<pubDate>Sat, 25 Aug 2018 04:46:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/37586/julia-programming-language-a-python-and-r-rival</link>
	<title><![CDATA[Julia Programming Language, a Python and R rival]]></title>
	<description><![CDATA[<p>Big data has grown to become one of the most lucrative fields, and data scientists are among the most sought-after professionals. They are usually hired to analyze, manage and parse large volumes of data. Doing this with traditional techniques is not a walk in the park, which is why most data scientists prefer programming languages such as R and Python. However, there is one more programming language that can do the job: Julia.</p><p>What Is Julia Language?</p><p>Julia is a programming language that came into the limelight in 2012. It is a general-purpose programming language designed with scientific computing in mind. Julia was meant to be an alternative to Python, R and other programming languages mainly used for manipulating data, and it has numerous features that reduce the complexity of numerical computation.&nbsp;</p><p>Julia builds on the best features of Python and R while avoiding some of their weaknesses. This explains why it is viewed as an alternative to these programming languages. For instance, it aims for the readability and simplicity of Python while running faster.</p><p>Julia is popular among data scientists and mathematicians. Its core features resemble those found in most data analysis software, and its syntax is close to standard mathematical notation.</p><p>Key Features Of Julia Language<br />Uses JIT Compilation<br />Parallelism<br />Dynamic Typing<br />Simple Syntax<br />Allows Metaprogramming<br />Access to Libraries<br />1-Based Array Indexing</p><p>Julia Vs Python And R Programming Languages<br />1. Speed<br />Julia is typically faster than both Python and R. Speed is a critical concern in big data programming. Julia's high speed comes from its JIT compiler. 
In Python, you need to install external libraries to achieve similar speed.</p><p>2. Syntax<br />Julia has a math-friendly syntax. Because the syntax resembles mathematical formulas, it is well suited to mathematical and scientific computations. This can make it easier to learn than Python for users with a mathematical background.</p><p>3. Parallelism<br />Although both Python and R support parallelism, Julia supports it at the language level. This lets Julia make fuller use of the processor than Python and R typically can.</p><p>4. Versatility<br />Julia is more versatile than Python and R. It allows a programmer to move between different code and functions with ease.</p><p>The only area in which Python and R are superior to Julia is community. Given that Julia is a newer programming language, it has a smaller community than languages that have been around for years.</p><p>Overall, Julia is a strong alternative for big data projects. Despite its small community, it is a programming language that you can learn easily.</p>]]></description>
	<dc:creator>Radha Agarkar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41328/deephic-a-generative-adversarial-network-for-enhancing-hi-c-data-resolution</guid>
	<pubDate>Tue, 03 Mar 2020 01:12:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41328/deephic-a-generative-adversarial-network-for-enhancing-hi-c-data-resolution</link>
	<title><![CDATA[DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution]]></title>
	<description><![CDATA[<p><strong>DeepHiC</strong> is a GAN-based model for enhancing Hi-C data resolution. We developed this server to help researchers enhance their own low-resolution data in a few clicks. <em>Ab initio</em> training can be performed with our published <a href="https://github.com/omegahh/DeepHiC">code</a>. We provide trained models for various depths of low-coverage Hi-C sequencing data. The depth of the input data is estimated by comparing its distribution with those of the downsampled Hi-C data we used in training.</p><p>Address of the bookmark: <a href="http://sysomics.com/deephic" rel="nofollow">http://sysomics.com/deephic</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</guid>
	<pubDate>Wed, 15 Sep 2021 21:15:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</link>
	<title><![CDATA[Reference Sequence Resource!]]></title>
	<description><![CDATA[<p><span>The ENCODE project uses reference genomes from&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/genome/browse/reference/">NCBI</a><span>&nbsp;or&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/downloads.html">UCSC</a><span>&nbsp;to provide a consistent framework for mapping high-throughput sequencing data.&nbsp;In general, ENCODE data are mapped consistently to 2 human (GRCh38, hg19) and 2 mouse (mm9, mm10) genomes for historical comparability.&nbsp;</span><em>Drosophila melanogaster</em><span>&nbsp;experiments are mapped to either dm3 or dm6, and&nbsp;</span><em>Caenorhabditis elegans&nbsp;</em><span>experiments are mapped to ce10 or ce11.</span></p><p>Address of the bookmark: <a href="https://www.encodeproject.org/data-standards/reference-sequences/" rel="nofollow">https://www.encodeproject.org/data-standards/reference-sequences/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</guid>
	<pubDate>Thu, 02 Jan 2025 11:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</link>
	<title><![CDATA[Large Language Models in Bioinformatics: Transforming Data Analysis and Interpretation]]></title>
	<description><![CDATA[<p>The integration of artificial intelligence (AI) into bioinformatics has ushered in a new era of computational biology. Among the most transformative advancements are large language models (LLMs), such as GPT and BERT, which leverage deep learning to process and interpret vast amounts of text data. These models are reshaping bioinformatics by enhancing data analysis, hypothesis generation, and literature mining.</p><h3>Understanding Large Language Models</h3><p>LLMs are AI systems trained on extensive datasets of natural language. Their ability to model context, identify patterns, and generate coherent language has proven invaluable across domains, including bioinformatics. By fine-tuning these models on biological datasets, researchers can unlock insights into molecular biology, systems biology, and beyond.</p><h3>Key Applications of LLMs in Bioinformatics</h3><h4>1. <strong>Annotating Biological Data</strong></h4><p>Annotating genomic and proteomic data is fundamental yet labor-intensive. LLMs streamline this process by extracting functional annotations from literature and databases, predicting gene and protein functions, and providing automated insights.</p><h4>2. <strong>Mining Scientific Literature</strong></h4><p>The exponential growth of publications presents a challenge for researchers to stay updated. LLMs can process large volumes of text to extract key findings, summarize papers, and identify trends, thereby facilitating efficient literature reviews.</p><h4>3. <strong>Predicting Gene and Protein Functions</strong></h4><p>By leveraging sequence data and annotations, LLMs can predict the functions of uncharacterized genes and proteins. This capability is particularly useful for studying non-model organisms and orphan genes.</p><h4>4. 
<strong>Drug Discovery and Repurposing</strong></h4><p>LLMs enable pattern recognition across chemical, genomic, and clinical datasets, identifying novel drug candidates and repurposing existing drugs for new therapeutic targets. They can simulate interactions between drugs and biological molecules, accelerating the discovery pipeline.</p><h4>5. <strong>Generating Hypotheses for Research</strong></h4><p>LLMs analyze complex datasets to propose testable hypotheses. For example, they can predict protein-protein interactions, identify regulatory motifs, or model evolutionary processes in genomes.</p><h3>Advantages of LLMs in Bioinformatics</h3><ul>
<li>
<p><strong>Scalability:</strong> LLMs process massive datasets rapidly, reducing the time required for data analysis.</p>
</li>
<li>
<p><strong>Versatility:</strong> These models adapt to diverse bioinformatics tasks, from genomic annotation to network analysis.</p>
</li>
<li>
<p><strong>Contextual Insights:</strong> By synthesizing information across disparate datasets, LLMs provide integrative insights into biological systems.</p>
</li>
</ul><h3>Challenges in Applying LLMs</h3><p>Despite their promise, LLMs face limitations:</p><ul>
<li>
<p><strong>Data Quality and Bias:</strong> Inaccurate or biased datasets can affect model predictions, necessitating rigorous data curation.</p>
</li>
<li>
<p><strong>Interpretability:</strong> Understanding the decision-making process of LLMs remains a critical challenge, especially in high-stakes fields like genomics and medicine.</p>
</li>
<li>
<p><strong>Resource Intensity:</strong> Training and deploying LLMs require substantial computational power, which can limit accessibility.</p>
</li>
<li>
<p><strong>Ethical Concerns:</strong> Handling sensitive genomic data raises privacy and security issues, emphasizing the need for ethical guidelines.</p>
</li>
</ul><h3>Future Prospects</h3><p>The continued development of LLMs tailored for bioinformatics promises exciting advancements. Specialized models trained on omics data, open-access platforms, and interdisciplinary collaborations will expand the utility of LLMs. Moreover, integrating LLMs with other AI technologies, such as graph neural networks and reinforcement learning, can unlock deeper biological insights.</p><h3>Conclusion</h3><p>Large language models are revolutionizing bioinformatics by addressing longstanding challenges in data annotation, literature mining, and function prediction. Their ability to analyze complex biological datasets efficiently positions them as indispensable tools for modern research. As bioinformatics embraces AI, the synergy between LLMs and biological sciences holds the potential to unravel the complexities of life with unprecedented precision and scale.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34867/magic-blast-a-tool-for-mapping-large-next-generation-rna-or-dna-sequencing-runs-against-a-whole-genome-or-transcriptome</guid>
	<pubDate>Tue, 26 Dec 2017 22:23:39 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34867/magic-blast-a-tool-for-mapping-large-next-generation-rna-or-dna-sequencing-runs-against-a-whole-genome-or-transcriptome</link>
	<title><![CDATA[Magic-BLAST: a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome.]]></title>
	<description><![CDATA[<p>Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.</p>
<p>Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pubmed/26109056">(http://www.ncbi.nlm.nih.gov/pubmed/26109056)</a>, and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.</p><p>Address of the bookmark: <a href="https://ncbi.github.io/magicblast/" rel="nofollow">https://ncbi.github.io/magicblast/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36921/breakpointer-using-local-mapping-artifacts-to-support-sequence-breakpoint-discovery-from-single-end-reads</guid>
	<pubDate>Tue, 12 Jun 2018 12:41:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36921/breakpointer-using-local-mapping-artifacts-to-support-sequence-breakpoint-discovery-from-single-end-reads</link>
	<title><![CDATA[Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads]]></title>
	<description><![CDATA[<p>Breakpointer is a fast tool for locating sequence breakpoints from the alignment of single-end (SE) reads produced by next-generation sequencing (NGS). It uses a heuristic method to search for local mapping signatures created by insertions/deletions (indels) or more complex structural variants (SVs).</p><p>Address of the bookmark: <a href="https://github.com/ruping/Breakpointer" rel="nofollow">https://github.com/ruping/Breakpointer</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43711/vcf-compare</guid>
	<pubDate>Wed, 19 Jan 2022 10:30:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43711/vcf-compare</link>
	<title><![CDATA[VCF Compare !]]></title>
	<description><![CDATA[<h2><span>Compare two&nbsp;<strong>BWA</strong>&nbsp;mapping methods with the online hg18-mapped data</span></h2>
<p>We first perform a quick inspection of the different BAM files using&nbsp;<strong>samtools flagstat</strong>. Illumina provided chr21 read mappings obtained with their&nbsp;<strong>GA IIx</strong>&nbsp;deep sequencing platform (&lt;<a href="ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam" target="_blank">ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam</a>&gt;, aligned to the b36/hg18 reference genome).</p><p>Address of the bookmark: <a href="https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare" rel="nofollow">https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27080/mrfast-micro-read-fast-alignment-search-tool</guid>
	<pubDate>Tue, 26 Apr 2016 03:50:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27080/mrfast-micro-read-fast-alignment-search-tool</link>
	<title><![CDATA[mrFAST:  Micro Read Fast Alignment Search Tool]]></title>
	<description><![CDATA[<p><span>mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads within a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find&nbsp;</span><strong><span style="text-decoration: underline;">'all'</span></strong><span>&nbsp;mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked.</span></p>
<p><span>More at&nbsp;http://mrfast.sourceforge.net/manual.html</span></p><p>Address of the bookmark: <a href="http://mrfast.sourceforge.net/manual.html" rel="nofollow">http://mrfast.sourceforge.net/manual.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29912/maq-mapping-and-assembly-with-quality</guid>
	<pubDate>Tue, 22 Nov 2016 04:51:39 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29912/maq-mapping-and-assembly-with-quality</link>
	<title><![CDATA[Maq: Mapping and Assembly with Quality]]></title>
	<description><![CDATA[<p><strong>Maq</strong>&nbsp;stands for&nbsp;<em>Mapping and Assembly with Quality</em>.&nbsp;It builds assemblies by mapping short reads to reference sequences. Maq is a project hosted by&nbsp;<a href="http://sourceforge.net/">SourceForge.net</a>. The project page is available at&nbsp;<a href="http://sourceforge.net/projects/maq/">http://sourceforge.net/projects/maq/</a>. Maq was previously known as mapass2.</p>
<h2>Run Maq Now</h2>
<p>Follow these steps to try Maq. All you need is a reference sequence file in the FASTA format.</p>
<ol>
<li>Prepare a reference sequence (ref.fasta), preferably a bacterial genome.</li>
<li>Download maq, maq-data and maqview at the&nbsp;<a href="http://sourceforge.net/project/showfiles.php?group_id=191815">download page</a>.</li>
<li>Copy maq, maq.pl and maq_eval.pl to a directory in your $PATH or to the same directory.</li>
<li>Simulate diploid reference and read sequences, map reads, call variants and evaluate the results in one go:
<pre>maq.pl demo ref.fasta calib-30.dat
</pre>
where&nbsp;<em>calib-30.dat</em>&nbsp;is contained in maq-data.</li>
<li>View the alignment:
<pre>cd maqdemo/easyrun;
maqindex -i -c consensus.cns all.map;
maqview -c consensus.cns all.map</pre>
</li>
</ol>
<p><strong>Even for advanced maq users, running `maq.pl demo' is recommended. You may find something helpful.</strong></p><p>Address of the bookmark: <a href="http://maq.sourceforge.net" rel="nofollow">http://maq.sourceforge.net</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</guid>
	<pubDate>Wed, 23 May 2018 06:54:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</link>
	<title><![CDATA[BLASR: Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application]]></title>
	<description><![CDATA[<p><span>BLASR (Basic Local Alignment with Successive Refinement) is a tool for mapping Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, where the divergence between the read and the genome is dominated by insertion and deletion errors.</span></p>
<p>Here is how I use blasr to align PacBio reads to the contigs (target.fasta). The &ldquo;target.fasta.sa&rdquo; file is the suffix array of &ldquo;target.fasta&rdquo;, generated by sawriter.</p>
<blockquote>
<p>blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15</p>
</blockquote>
<p>The output format option &ldquo;-m 4&rdquo; generates the alignment coordinates. It is not fully documented, but I can explain it to you.&nbsp;</p>
<p>I use a 24-core server with 48G of RAM for the alignment. It took about 2 to 3 hours to align 3G of PacBio reads to 10^6 short-read contig sequences with a mean length of 3.5 kbp.</p><p>Address of the bookmark: <a href="http://bix.ucsd.edu/projects/blasr/" rel="nofollow">http://bix.ucsd.edu/projects/blasr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>