<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/30093?offset=280</link>
	<atom:link href="https://bioinformaticsonline.com/related/30093?offset=280" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27323/cutadapt</guid>
	<pubDate>Fri, 13 May 2016 04:54:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27323/cutadapt</link>
	<title><![CDATA[cutadapt]]></title>
	<description><![CDATA[<p>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.</p>
<p>Cleaning your data in this way is often required: Reads from small-RNA sequencing contain the 3&rsquo; sequencing adapter because the read is longer than the molecule that is sequenced. Amplicon reads start with a primer sequence. Poly-A tails are useful for pulling out RNA from your sample, but often you don&rsquo;t want them to be in your reads.</p>
<p>Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Adapter sequences can contain IUPAC wildcard characters. Also, paired-end reads and even colorspace data is supported. If you want, you can also just demultiplex your input data, without removing adapter sequences at all.</p>
<p>Cutadapt comes with an extensive suite of automated tests and is available under the terms of the MIT license.</p>
<p>If you use cutadapt, please cite <a href="http://dx.doi.org/10.14806/ej.17.1.200">DOI:10.14806/ej.17.1.200</a> .</p><p>Address of the bookmark: <a href="https://cutadapt.readthedocs.io/en/stable/installation.html#quickstart" rel="nofollow">https://cutadapt.readthedocs.io/en/stable/installation.html#quickstart</a></p>]]></description>
	<dc:creator>Radha Agarkar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</guid>
	<pubDate>Mon, 13 Jun 2016 05:47:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</link>
	<title><![CDATA[BBMap/BBTools package: Multipurpose tool designed for converting reads or other nucleotide data between different formats.]]></title>
	<description><![CDATA[<div id="post_message_148585"><a href="https://sourceforge.net/projects/bbmap/" target="_blank">Reformat</a>is a member of the <a href="https://sourceforge.net/projects/bbmap/" target="_blank">BBMap/BBTools package</a>. It is a multipurpose tool designed for converting reads or other nucleotide data between different formats. It supports, and can inter-convert:<br /> <br /> fastq<br /> fasta<br /> fasta+qual<br /> sam<br /> scarf (an old Illumina format)<br /> bam (if samtools is installed)<br /> gzip<br /> zip<br /> ascii-33 (sanger)<br /> ascii-64 (old Illumina)<br /> paired files<br /> interleaved files<br /> <br /> It is multithreaded and can process data at over 500 megabytes per second, and can accept streams from standard in and write to standard out, allowing it to be easily dropped into the middle of a pipeline for format conversion. Reformat autodetects formats based on file extensions and content, making it very easy to use; and the autodetection can be overridden, allowing flexibility for people who don't like to follow naming conventions, or out-of-spec fastq files with qualities values like -17 or 120.<br /> <br /> The program has been gradually expanded, and can now perform various other functions. None of these will break pairing, if the input is paired.<br /> <br /> Quality trimming (either or both ends)<br /> Quality filtering<br /> Fixed-length trimming<br /> Generation of histograms (base composition, quality, etc)<br /> Subsampling (to a fraction of input reads, or an exact number of reads or bases)<br /> Changing fasta line-wrapping length<br /> Reverse-complementing (all reads or only read 2)<br /> Adding /1 and /2 suffix to read names<br /> GC-content filtering<br /> Length-filtering<br /> Testing for corrupted interleaved files<br /> <br /> Reformat is compatible with any platform that supports Java 1.7 or higher. It also has a bash shellscript for simpler invocation. Typical usage examples:<br /> <br /> Reformat fastq into fasta:<br /> <strong>reformat.sh in=x.fq out=y.fa</strong><br /> <br /> Interleave paired reads:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out=y.fq</strong><br /> <br /> Note - you can actually use a shortcut if paired read files have the same name with a 1 and a 2. This is equivalent to the above command:<br /> <strong>reformat.sh in=x#.fq out=y.fq</strong><br /> <br /> De-interleave reads:<br /> <strong>reformat.sh in=x.fq out1=y1.fq out2=y2.fq</strong><br /> <br /> Verify that interleaving appears correct, assuming Illumina namimg conventions:<br /> <strong>reformat.sh in=x.fq vint</strong><br /> <br /> Convert ASCII-33 to ASCII-64:<br /> <strong>reformat.sh in=x.fq out=y.fq qin=33 qout=64</strong><br /> <br /> Quality-trim paired reads to Q10 on the left and right ends and discard reads shorter than 50bp after trimming:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out1=y1.fq out2=y2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlength=50</strong><br /> <br /> Subsample 10% of the first 20000 pairs in an interleaved file:<br /> <strong>reformat.sh in=x.fq out=y.fq reads=20000 samplerate=0.1 int=t</strong><br /> (in this case "int=t" overrides interleaving autodetection, to ensure reads are treated as pairs)<br /> <br /> Pipe in a gzipped sam file and pipe out fasta:<br /> <strong>reformat.sh in=stdin.sam.gz out=stdout.fa</strong><br /> <br /> Reverse-complement reads:<br /> <strong>reformat.sh in=x.fq out=y.fq rcomp</strong><br /> <br /> For reformatting a file with very long sequences, Reformat will need more memory; just add the additional flag "-Xmx2g". For example, to change the line-wrapping length on the human genome (which has individual sequences over 200Mbp long) to 70 characters:<br /> <strong>reformat.sh -Xmx2g in=HG19.fa.gz out=HG19_wrapped.fa.gz fastawrap=70</strong><br /> <br /> For additional functions, please run the shellscript with no arguments, or just read it with a text editor. If you have any questions, please post them in this thread.<br /> <br /> For people using a non-bash terminal, you may need to type "bash reformat.sh" instead of just "reformat.sh".<br /> For users of Windows or other platforms that do not support bash shellscripts, replace "reformat.sh" with "java -ea -Xmx200m /path/to/bbmap/current/ jgi.ReformatReads"<br /> for example,<br /> <strong>java -ea -Xmx200m C:\bbmap\current\ jgi.ReformatReads in=x.fq out=y.fa</strong><br /> <br /> Reformat can be downloaded with BBTools here:<br /> <a href="https://sourceforge.net/projects/bbmap/" target="_blank">https://sourceforge.net/projects/bbmap/</a></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32152/upsetr-shiny-app</guid>
	<pubDate>Fri, 14 Apr 2017 06:19:54 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32152/upsetr-shiny-app</link>
	<title><![CDATA[UpSetR Shiny App!]]></title>
	<description><![CDATA[<p>UpSetR generates static&nbsp;<a href="http://vcg.github.io/upset/?dataset=0&amp;duration=1000&amp;orderBy=subsetSize&amp;grouping=groupByIntersectionSize&amp;selection=">UpSet plots</a>. The UpSet technique visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes.</p>
<h4>To begin, input your data using one of the three input styles.</h4>
<ol>
<li>"File" takes a correctly formatted.csv file.</li>
<li>"List" takes up to 6 different lists that contain unique elements, similar to that used in the web applications BioVenn&nbsp;<a href="http://www.biomedcentral.com/content/pdf/1471-2164-9-488.pdf">(Hulsen et al., 2008)</a>&nbsp;and jvenn&nbsp;<a href="http://www.biomedcentral.com/content/pdf/1471-2105-15-293.pdf">(Bardou et al., 2014)</a></li>
<li>"Expression" takes the input used by the venneuler R package&nbsp;<a href="https://cran.r-project.org/web/packages/venneuler/venneuler.pdf">(Wilkinson, 2015)</a></li>
</ol><p>Address of the bookmark: <a href="https://gehlenborglab.shinyapps.io/upsetr/" rel="nofollow">https://gehlenborglab.shinyapps.io/upsetr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28119/kraken-ultrafast-metagenomic-sequence-classification-using-exact-alignments</guid>
	<pubDate>Mon, 27 Jun 2016 11:01:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28119/kraken-ultrafast-metagenomic-sequence-classification-using-exact-alignments</link>
	<title><![CDATA[Kraken: ultrafast metagenomic sequence classification using exact alignments]]></title>
	<description><![CDATA[<p>Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of <em>k</em>-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at <a href="http://ccb.jhu.edu/software/kraken/" target="pmc_ext">http://ccb.jhu.edu/software/kraken/</a>.</p>
<p>Krona</p>
<p>https://sourceforge.net/p/krona/home/krona/</p><p>Address of the bookmark: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053813/" rel="nofollow">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053813/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32483/cla-contig-layout-authenticator</guid>
	<pubDate>Fri, 05 May 2017 05:58:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32483/cla-contig-layout-authenticator</link>
	<title><![CDATA[CLA: Contig-Layout-Authenticator]]></title>
	<description><![CDATA[<p><span>To improve upon the shortcomings associated with the construction of draft genomes with Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better assembled genomes. Moreover, CLA also hints at probable misassemblies and contaminations, for the users to cross-check before constructing the consensus draft. The CLA pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated and compared favorably with other widely-used scaffolding and ordering tools using both simulated and real sequence datasets. CLA is a user friendly tool that requires a single command line input to generate ordered scaffolds.</span></p>
<p><span>Script&nbsp;https://sourceforge.net/projects/c-l-authenticator/files/</span></p><p>Address of the bookmark: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155459" rel="nofollow">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155459</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40613/genome-in-a-bottle-giab-consortium</guid>
	<pubDate>Sat, 25 Jan 2020 13:50:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40613/genome-in-a-bottle-giab-consortium</link>
	<title><![CDATA[Genome in a Bottle (GIAB) Consortium]]></title>
	<description><![CDATA[<p><span>The</span><a href="http://www.genomeinabottle.org/"> Genome in a Bottle (GIAB) Consortium</a><span> is a public-private-academic consortium hosted by </span><a href="http://www.nist.gov/" target="_blank">NIST</a><span> to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice. </span></p>
<p><span><a href="https://www.nist.gov/news-events/news/2016/09/nist-releases-new-family-standardized-genomes">https://www.nist.gov/news-events/news/2016/09/nist-releases-new-family-standardized-genomes</a></span></p><p>Address of the bookmark: <a href="https://jimb.stanford.edu/giab/" rel="nofollow">https://jimb.stanford.edu/giab/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/29208/srf-bioinformatics-job-position-in-national-institute-of-plant-genome-research-nipgr</guid>
  <pubDate>Mon, 19 Sep 2016 05:43:38 -0500</pubDate>
  <link></link>
  <title><![CDATA[SRF Bioinformatics job position in National Institute of Plant Genome Research (NIPGR)]]></title>
  <description><![CDATA[
<p>SRF Bioinformatics job position in National Institute of Plant Genome Research (NIPGR)<br />Title : “Transcriptome and small RNA diversity analysis of developing seed contrasting rice varieties” <br />Qualification : Candidates having M.Sc./M.Tech. degree or equivalent (with minimum 60% marks) in Bioinformatics with a minimum of two years of post M.Sc./M.Tech research experience are eligible to apply.<br />No. of Post : 01<br />How to apply<br />Application should reach to Dr. Pinky Agarwal, Staff Scientist, National Institute of Plant Genome Research (NIPGR) Aruna Asaf Ali Marg, P.O. Box NO. 10531, New Delhi - 110067 on or before 30/09/2016</p>

<p>More at http://www.nipgr.res.in/careers/vacancies_latest.php#</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29210/cgview-circular-genome-viewer</guid>
	<pubDate>Mon, 19 Sep 2016 07:52:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29210/cgview-circular-genome-viewer</link>
	<title><![CDATA[CGView - Circular Genome Viewer]]></title>
	<description><![CDATA[<p>GView is a Java package used to display and navigate bacterial genomes. GView is useful for producing high-quality genome maps for use in publications and websites, or as a visualization tool in a sequence annotation pipeline. Users can interact with the genome using a powerful pan-and-zoom interface, or GView can write static images of a genome to a file. GView can draw a genome using either circular or linear layouts. For examples of some of the images GView can produce, see the <a href="https://www.gview.ca/bin/view/GView/ImageGallery">Image Gallery</a>. GView is a re-write of <a href="http://wishart.biology.ualberta.ca/cgview/" target="_top">CGView</a>, a circular genome viewer written by Paul Stothard. The goal of GView is to provide greater user interaction, and more flexibility in how the genome map is rendered. To aid with easily configuring the display of a genome, a style editor has been included to provide an intuitive, user-friendly graphical user interface for customizing genome maps. Styling attributes such as colours or fonts for the various map elements can be adjusted in real time. Customized styles can be saved for later use or for application to other genome maps using GView's <a href="https://www.gview.ca/bin/view/GViewDocumentation/GViewGSS">custom file format</a>.</p><p>Address of the bookmark: <a href="http://wishart.biology.ualberta.ca/cgview/" rel="nofollow">http://wishart.biology.ualberta.ca/cgview/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</guid>
	<pubDate>Thu, 02 Jan 2025 11:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44751/large-language-models-in-bioinformatics-transforming-data-analysis-and-interpretation</link>
	<title><![CDATA[Large Language Models in Bioinformatics: Transforming Data Analysis and Interpretation]]></title>
	<description><![CDATA[<p>The integration of artificial intelligence (AI) into bioinformatics has ushered in a new era of computational biology. Among the most transformative advancements are large language models (LLMs), such as GPT and BERT, which leverage deep learning to process and interpret vast amounts of text data. These models are reshaping bioinformatics by enhancing data analysis, hypothesis generation, and literature mining.</p><h3>Understanding Large Language Models</h3><p>LLMs are AI systems trained on extensive datasets of natural language. Their ability to model context, identify patterns, and generate coherent language has proven invaluable across domains, including bioinformatics. By fine-tuning these models on biological datasets, researchers can unlock insights into molecular biology, systems biology, and beyond.</p><h3>Key Applications of LLMs in Bioinformatics</h3><h4>1. <strong>Annotating Biological Data</strong></h4><p>Annotating genomic and proteomic data is fundamental yet labor-intensive. LLMs streamline this process by extracting functional annotations from literature and databases, predicting gene and protein functions, and providing automated insights.</p><h4>2. <strong>Mining Scientific Literature</strong></h4><p>The exponential growth of publications presents a challenge for researchers to stay updated. LLMs can process large volumes of text to extract key findings, summarize papers, and identify trends, thereby facilitating efficient literature reviews.</p><h4>3. <strong>Predicting Gene and Protein Functions</strong></h4><p>By leveraging sequence data and annotations, LLMs can predict the functions of uncharacterized genes and proteins. This capability is particularly useful for studying non-model organisms and orphan genes.</p><h4>4. <strong>Drug Discovery and Repurposing</strong></h4><p>LLMs enable pattern recognition across chemical, genomic, and clinical datasets, identifying novel drug candidates and repurposing existing drugs for new therapeutic targets. They can simulate interactions between drugs and biological molecules, accelerating the discovery pipeline.</p><h4>5. <strong>Generating Hypotheses for Research</strong></h4><p>LLMs analyze complex datasets to propose testable hypotheses. For example, they can predict protein-protein interactions, identify regulatory motifs, or model evolutionary processes in genomes.</p><h3>Advantages of LLMs in Bioinformatics</h3><ul>
<li>
<p><strong>Scalability:</strong> LLMs process massive datasets rapidly, reducing the time required for data analysis.</p>
</li>
<li>
<p><strong>Versatility:</strong> These models adapt to diverse bioinformatics tasks, from genomic annotation to network analysis.</p>
</li>
<li>
<p><strong>Contextual Insights:</strong> By synthesizing information across disparate datasets, LLMs provide integrative insights into biological systems.</p>
</li>
</ul><h3>Challenges in Applying LLMs</h3><p>Despite their promise, LLMs face limitations:</p><ul>
<li>
<p><strong>Data Quality and Bias:</strong> Inaccurate or biased datasets can affect model predictions, necessitating rigorous data curation.</p>
</li>
<li>
<p><strong>Interpretability:</strong> Understanding the decision-making process of LLMs remains a critical challenge, especially in high-stakes fields like genomics and medicine.</p>
</li>
<li>
<p><strong>Resource Intensity:</strong> Training and deploying LLMs require substantial computational power, which can limit accessibility.</p>
</li>
<li>
<p><strong>Ethical Concerns:</strong> Handling sensitive genomic data raises privacy and security issues, emphasizing the need for ethical guidelines.</p>
</li>
</ul><h3>Future Prospects</h3><p>The continued development of LLMs tailored for bioinformatics promises exciting advancements. Specialized models trained on omics data, open-access platforms, and interdisciplinary collaborations will expand the utility of LLMs. Moreover, integrating LLMs with other AI technologies, such as graph neural networks and reinforcement learning, can unlock deeper biological insights.</p><h3>Conclusion</h3><p>Large language models are revolutionizing bioinformatics by addressing longstanding challenges in data annotation, literature mining, and function prediction. Their ability to analyze complex biological datasets efficiently positions them as indispensable tools for modern research. As bioinformatics embraces AI, the synergy between LLMs and biological sciences holds the potential to unravel the complexities of life with unprecedented precision and scale.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29270/blast-ring-image-generator-brig</guid>
	<pubDate>Fri, 30 Sep 2016 09:18:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29270/blast-ring-image-generator-brig</link>
	<title><![CDATA[BLAST Ring Image Generator (BRIG)]]></title>
	<description><![CDATA[<p>BRIG is a free cross-platform (Windows/Mac/Unix) application that can display circular comparisons between a large number of genomes, with a focus on handling genome assembly data. The application is available at: <a href="http://sourceforge.net/projects/brig">http://sourceforge.net/projects/brig</a></p>
<p>If you have any questions or comments, post them on <a href="http://sourceforge.net/tracker/?group_id=328245">one of the trackers</a> on BRIG&rsquo;s SourceForge page: <a href="http://sourceforge.net/tracker/?group_id=328245">http://sourceforge.net/tracker/?group_id=328245</a>.</p>
<p>Features:</p>
<ul>
<li>Images show similarity between a central reference sequence and other sequences as concentric rings.</li>
<li>BRIG will perform all BLAST comparisons and file parsing automatically via a simple GUI.</li>
<li>Contig boundaries and read coverage can be displayed for draft genomes; customized graphs and annotations can be displayed.</li>
<li>Using a user-defined set of genes as input, BRIG can display gene presence, absence, truncation or sequence variation in a set of complete genomes, draft genomes or even raw, unassembled sequence data.</li>
<li>BRIG also accepts SAM-formatted read-mapping files enabling genomic regions present in unassembled sequence data from multiple samples to be compared simultaneously</li>
</ul><p>Address of the bookmark: <a href="http://brig.sourceforge.net/" rel="nofollow">http://brig.sourceforge.net/</a></p>]]></description>
	<dc:creator>Anjana</dc:creator>
</item>

</channel>
</rss>