BOL: Related items

NVIDIA and Arc Institute Unveil Evo 2: A Breakthrough AI for DNA Design

BioStar — Fri, 21 Feb 2025 10:39:47 -0600

NVIDIA and the Arc Institute have introduced Evo 2, a groundbreaking AI model designed to understand, predict, and generate DNA sequences. This marks a major advancement in computational biology, offering scientists an unprecedented tool to decode the genetic blueprint of life and even design entirely new biological systems.

The Power of Evo 2: AI Meets DNA

Evo 2 is the largest AI model for biology ever created, trained on an astonishing 9.3 trillion DNA "letters" (nucleotides) carefully selected from genomes spanning the entire tree of life. This massive dataset ensures that Evo 2 can recognize patterns and relationships in genetic sequences at an unparalleled scale.

For the first time, scientists can design DNA with AI, moving beyond simple sequence analysis to active DNA generation. Evo 2 enables researchers to predict, modify, and even create entire genetic sequences, opening new possibilities in medicine, agriculture, and synthetic biology.

Decoding the Dark Genome

One of the biggest challenges in genetics is understanding the non-coding regions of DNA—vast stretches of the genome that do not code for proteins but play crucial roles in regulating gene expression. These regions control when and how genes are activated, influencing everything from development to disease.

Evo 2 is designed to decode these non-coding elements, helping researchers uncover their functions and use this knowledge to develop gene-based therapies, synthetic life forms, and precision agriculture solutions.

From Reading DNA to Writing It

To put Evo 2’s impact into perspective:

Previous AI models could "read" DNA like a book, analyzing genetic sequences and identifying patterns.
Evo 2 can "write" entirely new DNA, designing functional genes, chromosomes, and even full genomes from scratch.

This means scientists can now engineer biological systems with AI, designing new proteins, metabolic pathways, and genetic circuits to address real-world challenges.

A Step Toward Generative Biology

The Arc Institute describes Evo 2 as a major step toward "generative biology"—a revolutionary approach where AI is used to create novel biological structures rather than just analyzing existing ones. This could lead to breakthroughs such as:

New medicines: AI-generated enzymes and proteins tailored for targeted therapies.
Disease-resistant crops: Genetically optimized plants for higher yield and climate resilience.
Synthetic organisms: Custom-designed microbes for bioremediation, biofuel production, and industrial applications.

An Open-Source Revolution

Unlike many proprietary AI models, Evo 2 is open source, making its capabilities accessible to researchers worldwide. This democratization of AI-driven biology means that scientists from different disciplines can collaborate, experiment, and innovate, accelerating discoveries in genetic engineering and synthetic biology.

With Evo 2, the boundaries of what’s possible in DNA design, genetic engineering, and biological innovation are being redrawn. The future of life sciences is no longer just about understanding life’s code—it’s about writing it.

PANNZER: a fully automated service for functional annotation of prokaryotic and eukaryotic proteins of unknown function.

BioStar — Thu, 13 Aug 2020 09:57:24 -0500

PANNZER (Protein ANNotation with Z-scoRE) is a fully automated service for functional annotation of prokaryotic and eukaryotic proteins of unknown function. The tool is designed to predict the functional description (DE) and GO classes.

PANNZER2 processes bacterial proteomes in minutes and eukaryotic proteomes in an hour. You can use AAI-profiler to summarize a proteome's species neighbors and reveal taxonomic identity or contamination.

Address of the bookmark: http://ekhidna2.biocenter.helsinki.fi/sanspanz/#

High Density Sheep SNP Genotyping Chip released!!!

Rahul Agarwal — Tue, 03 Sep 2013 13:58:04 -0500

If you are working on sheep genomics, there is good news for you. FarmIQ, in conjunction with Illumina and the International Sheep Genomics Consortium (ISGC), is today announcing completion of the “Ovine Infinium® HD SNP BeadChip”, a high-definition SNP chip for the sheep genome. The OvineSNP50 BeadChip features over 54,241 evenly spaced probes that target SNPs, offering more than sufficient SNP density for genome-wide association studies and other applications such as genome-wide selection, determination of genetic merit, identification of quantitative trait loci, and comparative genetic studies.

The BeadChip was developed in collaboration with leading ovine researchers from AgResearch, Baylor, UCSC, CSIRO, and the USDA as part of the International Sheep Genomics Consortium. It features over 54,241 evenly spaced probes that target single nucleotide polymorphisms (SNPs). More than 18,000 of these markers were discovered through sequencing reduced representation libraries with the Illumina Genome Analyzer IIx. A set of 600 SNPs was identified by BAC end sequencing and validated with Illumina GoldenGate Genotyping Assays across 403 animals from 23 breeds. The remaining SNPs were derived from the draft ovine genome.

Summer 2016

Sun, 21 Feb 2016 06:17:55 -0600

REU at Fordham University- Summer 2016

An NSF-funded REU to study Y-chromosome diversity and sex-biased dispersal in wild brown rats (Rattus norvegicus) is available in the Munshi-South Lab at Fordham University. Our lab is currently investigating rat evolution at scales ranging from landscape genetics of individual cities to global patterns of diversity. Development of resources for investigating Y-chromosome diversity will support many of these studies. The REU student will work with the lab to bioinformatically identify Y-chromosome SNPs, design SNPtype assays, extract DNA, genotype samples, and analyze data.

We seek applicants interested in bioinformatics, evolutionary biology, and related disciplines. Applicants must have taken a college-level genetics course. This REU will require attention to detail, reliability, independence, and critical thinking.

This position is based at Fordham University's field station, the Louis Calder Center, in Armonk, NY. The Calder Center is located approximately 25 miles north of New York City in a protected woodland area. Housing will be provided at the Calder Center for the duration of the REU (May 23 to Aug 12, 2016). Additionally, the student will receive a $6,000 stipend. The selected student will participate in professional development activities through the Calder Center's REU program, including presentation of results at a research colloquium at the end of the summer.

To apply, please send a one-page personal statement about your scientific interests and how this REU will support your professional goals, unofficial transcripts including a list of Spring 2016 courses, and the names of two professional references (including title, address, phone number, and email address) as a single PDF (with your last name in the file name) to Dr. Jason Munshi-South (jmunshisouth@fordham.edu).

Applications are due March 4th, 2016.

Jason Munshi-South

MALVA: Genotyping by Mapping-free ALlele Detection of Known VAriants

Jit — Tue, 28 Jan 2020 03:39:22 -0600

MALVA is able to genotype multi-allelic SNPs and indels without mapping reads

MALVA correctly calls more indels than the most widely adopted genotyping pipelines

Mapping-free approaches are as accurate as alignment-based ones, while being faster

More at https://www.sciencedirect.com/science/article/pii/S2589004219302366

Address of the bookmark: https://github.com/AlgoLab/malva

Perl in a day !!

Jitendra Narayan — Sat, 10 Aug 2013 21:14:03 -0500

This PDF-based tutorial is a good resource for learning the basics of Perl in a day.

http://ritg.med.harvard.edu/training/perl/RC_Perl_Intro.pdf

R and Bioconductor Tutorial

Jitendra Narayan — Fri, 23 Aug 2013 08:23:59 -0500

This tutorial is intended to introduce users quickly to the basics of R, focusing on a few common tasks that biologists need to perform some basic analysis: load a table, plot some graphs, and perform some basic statistics. More extensive tutorials can be found on the project website and via bioconductor (not covered here).

If you find new pages, you can add more tutorial links in the comments.

Address of the bookmark: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual

A Brief Bioinformatics Tutorial

Jit — Wed, 21 May 2014 12:50:09 -0500

This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.

The tutorial is divided into three main parts:

In the Sequence part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.
The Structure part is about molecular visualization, homology modeling and structural domain prediction.
In the Function part, you will be introduced to 3 useful servers for investigating the function of a protein, e.g. finding interactors and co-expressed genes, viewing a phylogenetic profile, and easily accessing papers citing your gene.

Throughout all three parts, we will use the S. cerevisiae VPS36 protein as an example.

Address of the bookmark: http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html

Pimp your brain: Bioinformatics

Wed, 20 Aug 2014 22:09:21 -0500

Jan Lisec from the Max Planck Institute of Molecular Plant Physiology explains, in this "Pimp your brain" episode, what bioinformatics is and why it is so important and indispensable for biological research. In the video series "Pimp your brain", scientists from the Max Planck Institute of Molecular Plant Physiology describe their research. More videos from the series are available on www.youtube.com/playlist?list=PL-l9VItC9Gn2Ur2Xj6PTOAkjLUlVPbIOO More videos are available on www.mpimp-golm.mpg.de

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part; it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’:

x <- c(1, 2, 3, 4, 5)

same as

x <- c(1:5)

creates the vector x with the numbers from 1 to 5.

You can see what is in an object at any time by typing its name;

x

will produce the output ‘[1] 1 2 3 4 5’

Note that text values (character strings) need to be quoted

daysofweek <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

Usually, however, you want to input data from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table("filename")

Now mydata is a data frame with multiple vectors (columns).

Each vector can be identified by the default syntax

# if any of these is typed, it will print to screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things about the file:

The file is a plain text file (there are functions to read Excel files; not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows (see the note on row names below).

If any of these is false, we need to tell the function.

If it has a header row

mydata <- read.table("filename", header=TRUE) # header=T also works

Note that there is a comma between the different parts of the function's arguments.

If there is one less column in the header row, then R assumes that the first column of data after the header contains the row names.

Now the vectors (columns) are identified by their name

# if any of these is typed, it will print to screen

mydata$A
mydata$B
mydata$C

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)
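
As a small sketch of the attach shortcut described above, the example below builds an illustrative data frame (the values are made up) and computes the same mean three ways. The with() call is a base R alternative that the tutorial itself does not cover; it is shown here only for comparison.

```r
# Illustrative data frame; the values are made up for this sketch.
mydata <- data.frame(A = c(1, 2, 3, 4), B = c(10, 20, 30, 40))

# Full form: name the data frame each time.
mean_full <- mean(mydata$B)

# attach() puts the columns on the search path, so B can be used directly.
attach(mydata)
mean_attached <- mean(B)
detach(mydata)   # undo the attach once finished, to avoid name clashes later

# with() evaluates one expression inside the data frame without attaching it.
mean_with <- with(mydata, mean(B))

print(c(mean_full, mean_attached, mean_with))
```

All three values should agree. Detaching afterwards keeps column names from lingering on the search path once you move on to another data set.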

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table("filename", header=TRUE, sep="\t")

This is really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns. However, if the columns are of uneven length, it will tell you.

If you know that the file has uneven columns

mydata <- read.table("filename", header=TRUE, sep="\t", fill=TRUE)

This causes R to fill empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with only header=T.
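
To try the read.table options discussed above without needing a file of your own, here is a minimal sketch that first writes a tiny tab-separated file to a temporary location and then reads it back. The file contents and the column names A, Label and C are invented for illustration.

```r
# Write a small tab-separated file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("A\tLabel\tC",
             "1\tsample one\t10",
             "2\tsample two\t20",
             "3\tsample three"),      # last row is one field short
           tmp)

# header = TRUE : the first line holds the column names
# sep = "\t"    : split on tabs only, so "sample one" stays in one column
# fill = TRUE   : pad the short row instead of stopping with an error
mydata <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE)

print(mydata)
summary(mydata$A)
```

Here the padded value lands in the numeric column C, where it is read as NA. Removing fill = TRUE from the call should make read.table stop with an error about the short line.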

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’.

Histogram of A

hist(mydata$A)

If there was more data we could increase the number of vertical columns with the option, breaks=50 (or another relevant number).
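
The effect of the breaks option can be sketched with simulated values standing in for a larger column A (the data here are random numbers, not real measurements):

```r
set.seed(42)                                # make the simulated data reproducible
values <- rnorm(1000, mean = 50, sd = 10)   # simulated measurements, not real data

# plot = FALSE returns the bin counts without drawing; drop it to see the plot.
h_default <- hist(values, plot = FALSE)
h_fine    <- hist(values, breaks = 50, plot = FALSE)

# 'breaks' is treated as a suggestion, so the number of bars may not be exactly 50.
length(h_default$counts)
length(h_fine$counts)
```

Because R rounds the break points to "pretty" values, a single number passed to breaks gives roughly, not exactly, that many bars.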

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui): RIGHT click on the image and select the format you want.

The following instructions work for everyone.

You need to create a new device of the type of file you need, then send the data to that device.

To save as a png file (easy to load into the likes of PowerPoint, also great for web applications):

png("filename")
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf("filename")
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen; the output is going to the file.
Also, it may not be saved immediately, but it will be once the device is turned off (or R is quit).

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc.) device, thus forcing the data to be saved
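
Putting the steps above together, a minimal sketch of the open device, plot, close device sequence could look like this. The vectors A and B and the output file name are invented for illustration; pdf() is used here, and png() follows the same pattern.

```r
# Made-up values standing in for columns A and B of 'mydata'.
A <- c(2, 4, 4, 5, 7, 9)
B <- c(1, 3, 6, 6, 8, 12)

# Hypothetical output file, placed in R's temporary directory.
outfile <- file.path(tempdir(), "boxplot_example.pdf")

pdf(outfile)      # open the pdf device; nothing is drawn on screen
boxplot(A, B, names = c("Value A", "Value B"), ylab = "Count of Something")
dev.off()         # close the device so the file is written to disk

file.exists(outfile)
```

Calling dev.off() straight after the plot means the file is complete before you carry on working in the same session.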