Genomics for Bioinformaticians

Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research on single genes does not fall within the definition of genomics unless the aim of this genetic, pathway, and functional analysis is to elucidate the gene's effect on, place in, and response to the entire genome's networks.

Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage, and bioinformatic analysis in the 1970s and 1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression under various conditions. The most important tools here are microarrays and bioinformatics. The study of the full set of proteins in a cell type or tissue, and of its changes under various conditions, is called proteomics. A related concept is materiomics, which is defined as the study of the material properties of biological materials (e.g. hierarchical protein structures and materials, mineralized biological tissues, etc.) and their effect on macroscopic function and failure in their biological context, linking processes, structure and properties at multiple scales through a materials science approach. The term 'genomics' itself is thought to have been coined by Dr. Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, ME), over beer at a meeting held in Maryland on the mapping of the human genome in 1986.

The NHGRI vision for the future of genomics, the outcome of almost two years of intense discussions with literally hundreds of scientists and members of the public, has three major areas of focus: Genomics to Biology, Genomics to Health, and Genomics to Society.

Genomics to Biology: 
The human genome sequence provides foundational information that will now allow the development of a comprehensive catalog of all of the genome's components, determination of the function of all human genes, and deciphering of how genes and proteins work together in pathways and networks.

Genomics to Health:
Completion of the human genome sequence offers a unique opportunity to understand the role of genetic factors in health and disease, and to apply that understanding rapidly to prevention, diagnosis, and treatment. This opportunity will be realized through such genomics-based approaches as identification of genes and pathways and determining how they interact with environmental factors in health and disease, more precise prediction of disease susceptibility and drug response, early detection of illness, and development of entirely new therapeutic approaches.

Genomics to Society: 
Just as the Human Genome Project (HGP) has spawned new areas of research in basic biology and in health, it has created new opportunities for exploring the ethical, legal, and social implications (ELSI) of such work. These include defining policy options regarding the use of genomic information in both medical and non-medical settings, and analyzing the impact of genomics on such concepts as race, ethnicity, kinship, individual and group identity, health, disease, and "normality" for traits and behaviors.

This vision for the future of genomics is not just about the NHGRI. It encompasses the whole field of genomics, including the work of all the other Institutes and Centers at the NIH and of a number of other federal agencies. All of the NIH Institutes are already taking full advantage of the sequence and will apply its data to the better understanding of both rare and common diseases, almost all of which have a genetic component. A recent example of the way that the HGP and the knowledge and new technologies it has spawned are already facilitating science is the extremely rapid sequencing by groups in Canada and at the Centers for Disease Control and Prevention (CDC) in Atlanta of the genome of the virus that causes Severe Acute Respiratory Syndrome (SARS). The sequencing of the SARS virus genome provides insight into this new and deadly disease at a speed never before possible in science. In turn, this should lead to the rapid development of diagnostic tests and, in time, vaccines and effective treatments.

Links to additional material available on the Net

Genomes and genomics:

Bioinformatics and Genomics:

Structural genomics tutorial:

Comparative Genomics Tutorial:


Tools and resources for identifying protein families, domains and motifs

Bioinformatics Tools 
Tips, Tutorials, and Terminology for Using Selected Resources in Genome Database Guide:

A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes:

Free Online Tutorials Teach Anyone How to Use Genome Databases:

Circos to create concise, explanatory, unique and print-ready visualizations of your data:

Genomics and Comparative Genomics Learning Module:

Computational Challenges in Comparative Genomics: A Tutorial:

A Comparative Genomics Resource for Grains:

PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants:


Software for Genomics

  1. Artemis: A free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence and its six-frame translation.
  2. Chromas: Displays and prints chromatogram files from ABI automated DNA sequencers, and Staden SCF files, which the analysis programs for ALF, Li-Cor and Visible Genetics OpenGene sequencers can create.
  3. Glimmer: A system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
  4. GlimmerHMM: A fast and accurate gene finder based on a GHMM architecture, developed specifically for eukaryotes. It incorporates splice site models adapted from the GeneSplicer program and uses interpolated Markov models for evaluating the coding regions.
  5. GlimmerM: A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations.
  6. MUMmer: A system for rapidly aligning entire genomes, whether in complete or draft form.
  7. pDRAW: pDRAW32 is being developed as a spare-time hobby project. It is far from finished, but because it has reached a point where it could be helpful for many labs, it is now available to the scientific community.
  8. Sequin: A stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. It is capable of handling simple submissions that contain a single short mRNA sequence, as well as complex submissions.
  9. Staden: The Staden Package consists of a series of tools for DNA sequence preparation (pregap4), assembly (gap4), editing (gap4) and DNA/protein sequence analysis (spin).
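
The six-frame translation mentioned in the Artemis entry can be illustrated with a short Python sketch. This is not how Artemis itself is implemented; it is only a minimal example that assumes the standard genetic code and an input made of the letters A, C, G and T. The function names (reverse_complement, translate, six_frame_translation) are made up for this illustration, and any codon containing another character is translated as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons appear as "*" in the output, so an open reading frame shows up as a run of amino acids between two "*" characters in one of the six frames. Gene finders such as Glimmer go further and score candidate regions with statistical models rather than relying on stop codons alone.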
