BOL: Related items

UpSetR Shiny App!

Jit — Fri, 14 Apr 2017 06:19:54 -0500

UpSetR generates static UpSet plots. The UpSet technique visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes.
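The aggregation behind such a plot can be sketched in a few lines of Python. This is only an illustration, assuming the common convention that each element is counted once, under the exact combination of sets that contain it; the function name `exclusive_intersections` and the toy gene lists are hypothetical and are not part of UpSetR.

```python
def exclusive_intersections(sets):
    """Count elements per exact combination of sets that contain them."""
    # Record, for every element, the names of all sets it belongs to.
    membership = {}
    for name, items in sets.items():
        for item in set(items):
            membership.setdefault(item, set()).add(name)

    # Each element contributes to exactly one combination of set names.
    counts = {}
    for names in membership.values():
        key = tuple(sorted(names))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical input: three named lists of unique elements.
gene_lists = {
    "A": ["g1", "g2", "g3", "g4"],
    "B": ["g3", "g4", "g5"],
    "C": ["g4", "g6"],
}

counts = exclusive_intersections(gene_lists)

# Largest intersections first, as in the bar chart above an UpSet matrix.
for combo, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print("&".join(combo), size)
```

Each printed row corresponds to one column of the matrix layout: the combination of sets and the number of elements belonging to exactly that combination.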

To begin, input your data using one of the three input styles.

"File" takes a correctly formatted .csv file.
"List" takes up to 6 different lists that contain unique elements, similar to those used in the web applications BioVenn (Hulsen et al., 2008) and jvenn (Bardou et al., 2014).
"Expression" takes the input used by the venneuler R package (Wilkinson, 2015).

Address of the bookmark: https://gehlenborglab.shinyapps.io/upsetr/

CLA: Contig-Layout-Authenticator

Jit — Fri, 05 May 2017 05:58:36 -0500

To address the shortcomings of draft genomes constructed from Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better-assembled genomes. CLA also flags probable misassemblies and contaminations for users to cross-check before constructing the consensus draft. The pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated on both simulated and real sequence datasets, where it compared favorably with other widely used scaffolding and ordering tools. CLA is a user-friendly tool that requires a single command-line input to generate ordered scaffolds.

Script https://sourceforge.net/projects/c-l-authenticator/files/

Address of the bookmark: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155459

Large Language Models in Bioinformatics: Transforming Data Analysis and Interpretation

LEGE — Thu, 02 Jan 2025 11:26:29 -0600

The integration of artificial intelligence (AI) into bioinformatics has ushered in a new era of computational biology. Among the most transformative advancements are large language models (LLMs), such as GPT and BERT, which leverage deep learning to process and interpret vast amounts of text data. These models are reshaping bioinformatics by enhancing data analysis, hypothesis generation, and literature mining.

Understanding Large Language Models

LLMs are AI systems trained on extensive datasets of natural language. Their ability to model context, identify patterns, and generate coherent language has proven invaluable across domains, including bioinformatics. By fine-tuning these models on biological datasets, researchers can unlock insights into molecular biology, systems biology, and beyond.

Key Applications of LLMs in Bioinformatics

1. Annotating Biological Data

Annotating genomic and proteomic data is fundamental yet labor-intensive. LLMs streamline this process by extracting functional annotations from literature and databases, predicting gene and protein functions, and providing automated insights.

2. Mining Scientific Literature

The exponential growth of publications makes it difficult for researchers to stay up to date. LLMs can process large volumes of text to extract key findings, summarize papers, and identify trends, thereby facilitating efficient literature reviews.

3. Predicting Gene and Protein Functions

By leveraging sequence data and annotations, LLMs can predict the functions of uncharacterized genes and proteins. This capability is particularly useful for studying non-model organisms and orphan genes.

4. Drug Discovery and Repurposing

LLMs enable pattern recognition across chemical, genomic, and clinical datasets, identifying novel drug candidates and repurposing existing drugs for new therapeutic targets. They can simulate interactions between drugs and biological molecules, accelerating the discovery pipeline.

5. Generating Hypotheses for Research

LLMs analyze complex datasets to propose testable hypotheses. For example, they can predict protein-protein interactions, identify regulatory motifs, or model evolutionary processes in genomes.

Advantages of LLMs in Bioinformatics

Scalability: LLMs process massive datasets rapidly, reducing the time required for data analysis.
Versatility: These models adapt to diverse bioinformatics tasks, from genomic annotation to network analysis.
Contextual Insights: By synthesizing information across disparate datasets, LLMs provide integrative insights into biological systems.

Challenges in Applying LLMs

Despite their promise, LLMs face limitations:

Data Quality and Bias: Inaccurate or biased datasets can affect model predictions, necessitating rigorous data curation.
Interpretability: Understanding the decision-making process of LLMs remains a critical challenge, especially in high-stakes fields like genomics and medicine.
Resource Intensity: Training and deploying LLMs require substantial computational power, which can limit accessibility.
Ethical Concerns: Handling sensitive genomic data raises privacy and security issues, emphasizing the need for ethical guidelines.

Future Prospects

The continued development of LLMs tailored for bioinformatics promises exciting advancements. Specialized models trained on omics data, open-access platforms, and interdisciplinary collaborations will expand the utility of LLMs. Moreover, integrating LLMs with other AI technologies, such as graph neural networks and reinforcement learning, can unlock deeper biological insights.

Conclusion

Large language models are revolutionizing bioinformatics by addressing longstanding challenges in data annotation, literature mining, and function prediction. Their ability to analyze complex biological datasets efficiently positions them as indispensable tools for modern research. As bioinformatics embraces AI, the synergy between LLMs and biological sciences holds the potential to unravel the complexities of life with unprecedented precision and scale.

bpRNA: large-scale automated annotation and analysis of RNA secondary structure

Rahul Nayak — Wed, 23 May 2018 03:24:33 -0500

bpRNA is a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, and easily interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature.

The bpRNA code is written in Perl and requires the Graph Perl module. Several additional scripts for analysis are included. The source code is available at http://github.com/hendrixlab/bpRNA.
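To give a feel for what parsing a secondary structure into stems involves, here is a minimal Python sketch. It is not bpRNA's implementation (which is Perl and graph-based); it assumes the structure is given in dot-bracket notation, with extra bracket types marking pseudoknotted pairs, and the function names `parse_dot_bracket` and `group_stems` are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    # One stack per bracket type, so crossing (pseudoknotted) pairs
    # written with a different bracket type are matched independently.
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, char in enumerate(structure, start=1):
        if char in openers:
            stacks[char].append(pos)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character '{char}' at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{open_}' at position {stack[-1]}")
    return sorted(pairs)


def group_stems(pairs):
    """Group directly stacked pairs (i, j), (i+1, j-1), ... into stems."""
    stems = []
    for i, j in pairs:
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


# A toy structure with a pseudoknot: the "[" pairs cross the "(" pairs.
toy = "((..[[..))..]]"
toy_pairs = parse_dot_bracket(toy)
toy_stems = group_stems(toy_pairs)
for stem in toy_stems:
    print(stem)
```

A full annotation would go further and classify the unpaired regions between stems into hairpins, bulges, internal loops and multiloops; this sketch stops at base pairs and stems.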

Address of the bookmark: http://github.com/hendrixlab/bpRNA

MAKER

Jitendra Narayan — Sun, 07 Feb 2016 15:59:24 -0600

MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.

More at http://www.yandell-lab.org/software/maker.html

Address of the bookmark: http://www.yandell-lab.org/software/maker.html

NCBI Remap

Jit — Thu, 11 Feb 2016 11:02:26 -0600

NCBI Remap is conceptually similar to liftOver in that it manages conversions between a pair of genome assemblies, but it uses different methods to achieve these mappings. It is available through a simple web interface, or you can use the NCBI Remap API.
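As a rough illustration of what converting a coordinate between two assemblies means, the Python sketch below maps a position through a list of aligned blocks. It is a toy model under stated assumptions, not the method used by Remap or liftOver: the `Block` type, the `remap_position` function and the block coordinates are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """An ungapped stretch aligned between the two assemblies (hypothetical)."""
    src_start: int  # 0-based start on the source assembly, inclusive
    src_end: int    # end on the source assembly, exclusive
    dst_start: int  # 0-based start of the same stretch on the target assembly


def remap_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a source position to the target assembly, or None if unaligned."""
    for block in blocks:
        if block.src_start <= pos < block.src_end:
            # Inside an ungapped block the offset from the start is preserved.
            return block.dst_start + (pos - block.src_start)
    return None


# Toy alignment: the target assembly has 200 extra bases after position 1000.
toy_blocks = [
    Block(src_start=0, src_end=1000, dst_start=0),
    Block(src_start=1000, src_end=2500, dst_start=1200),
]

print(remap_position(500, toy_blocks))
print(remap_position(1500, toy_blocks))
print(remap_position(3000, toy_blocks))
```

Positions that fall outside every block have no counterpart in the target assembly, which is why the function returns `None` rather than guessing a coordinate.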

More at http://www.ncbi.nlm.nih.gov/genome/tools/remap

API http://www.ncbi.nlm.nih.gov/genome/tools/remap/docs/api

Address of the bookmark: http://www.ncbi.nlm.nih.gov/genome/tools/remap

WEGO: a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results

BioStar — Sun, 12 Apr 2020 10:02:22 -0500

WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results. As the GO vocabulary grew in popularity, WEGO was widely adopted and used in many studies. We therefore released an updated version, WEGO 2.0, in 2018. Here are some of the changes we made:
1. The limit on the number of input files has been removed. Users can now upload as many files as they want in one operation.
2. We have added reference data for 9 species for users to select.
3. Besides the traditional WEGO histogram, WEGO 2.0 outputs an additional type of bar graph showing GO terms with significant differences in gene numbers.

Address of the bookmark: http://wego.genomics.org.cn/

Bioinformatics Scientist, Production Bioinformatics @ South San Francisco, CA

Thu, 19 Aug 2021 08:45:24 -0500

Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better evaluate and build hardened, production-quality pipelines, optimize data quality, and automate lab and bioinformatics processes. Our ideal candidate is an organized problem solver with a background in developing and building novel production-quality bioinformatics tools and packages. Excellent communication skills and a proven ability to work independently are equally required.

More at https://boards.greenhouse.io/twistbioscience/jobs/3135495?gh_src=9ecc0b941us

Bpipe - a tool for running and managing bioinformatics pipelines

Radha Agarkar — Sat, 21 May 2016 22:42:16 -0500

Bpipe provides a platform for running big bioinformatics jobs that consist of a series of processing stages - known as 'pipelines'.

January 20th, 2016 - New! Bpipe 0.9.9 released!

Bpipe has been published in Bioinformatics! If you use Bpipe, please cite:

Sadedin S, Pope B & Oshlack A, Bpipe: A Tool for Running and Managing Bioinformatics Pipelines, Bioinformatics

Address of the bookmark: http://docs.bpipe.org/

MetaPred2CS

Manisha Mishra — Fri, 03 Mar 2017 05:15:07 -0600

MetaPred2CS Web server is a meta-predictor based on a Support Vector Machine (SVM) that combines 6 individual sequence-based protein-protein interaction prediction methods to predict prokaryotic two-component system protein-protein interactions (PPIs). The methods implemented in MetaPred2CS are 2 co-evolutionary methods, in-silico two-hybrid (i2h) and mirror tree (MT), and 4 genomic context-based methods: phylogenetic profiling (PP), gene fusion (GF), gene neighbourhood (GN) and gene operon (GO).

http://metapred2cs.ibers.aber.ac.uk/

Address of the bookmark: https://github.com/martinjvickers/MetaPred2CS