A. Introduction and Relevance

Future insights into basic plant biological processes, and the ability to genetically manipulate plants for agronomic improvement, will depend to a large extent on our ability to identify genes controlling fundamental developmental and metabolic processes. In a number of plant species, genes controlling a wide range of agronomically important traits have been identified by mutational analysis and placed on classical genetic linkage maps. A major stumbling block to continued progress in plant biology is the difficulty of isolating these genes. In most cases, while the mutant phenotypes and genetic map locations are known, virtually nothing is known about the nature of the gene products, so the isolation of these genes relies solely on their mutant phenotype and genetic map position. In this context, clone-based physical maps have greatly facilitated the identification and cloning of genes important both to human health and to agriculture. In addition, physical maps have revealed significant information about the complex organization and evolution of genomes and the molecular structure of chromosomes.

Because most plant species have relatively large genomes with a high content of repeated sequences, cloning by chromosome walking is difficult. Thus, positional cloning projects and the construction of genomic physical maps in plants have centered largely on the reference plant Arabidopsis thaliana (15). Plant geneticists adopted Arabidopsis as a model organism some years ago because of its small diploid genome, low repetitive DNA content, and rapid reproductive cycle (35). The Arabidopsis genome, at ~120 megabases (Mb), is among the smallest known plant genomes and is the obvious choice for initiating a plant genome project. Such a project was indeed initiated in 1996 with the establishment of the Arabidopsis Genome Initiative (AGI), an international effort to sequence the Arabidopsis genome by the end of 2000 (3,12).
This effort has produced 50 Mb of highly accurate genomic sequence and is ahead of schedule. It is anticipated that the entire genome will be completely sequenced by the end of 2000.

1. Development of Molecular Resources for Plant Biology Derived from the Sequencing of the Arabidopsis Genome

The Arabidopsis Genome Initiative (AGI) will identify all the genes (~25,000) necessary for the growth and development of a plant. For this vast structural resource to be useful in future basic studies aimed at understanding how plants grow and function, new resources derived from the genomic sequence must be developed. These new resources include full-length cDNAs for all the genes in the genome. Having a full-length cDNA for each gene allows the function of each encoded protein to be studied using molecular and biochemical approaches. It also allows the precise determination of the molecular mass of each protein encoded by the genome. Knowing the exact molecular mass will facilitate the identification of the gene for a protein isoform purified by conventional protein purification procedures followed by mass spectrometry. In addition, the isolation of full-length cDNAs for all the genes in the genome will allow the construction of a single cDNA library in which each gene is represented in equimolar amounts. Consequently, individual labs will no longer need to construct their own cDNA libraries to isolate their favorite cDNA. Furthermore, the entire genome will be experimentally annotated by the isolation of full-length cDNAs.
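The link between a full-length cDNA and a protein's molecular mass can be illustrated with a minimal sketch. It assumes the open reading frame starts at the first ATG and ends at the first in-frame stop codon, uses the standard genetic code and standard average residue masses, and ignores signal-peptide cleavage and post-translational modifications, all of which shift the mass actually measured by mass spectrometry. The function names `translate_orf` and `protein_mass` are hypothetical and chosen only for illustration.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(cdna):
    """Translate from the first ATG to the first in-frame stop codon.

    Assumption: the sequence contains only A, C, G, T (or U), and the
    first ATG is the true start codon.
    """
    seq = cdna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # stop codon ends the reading frame
            break
        protein.append(residue)
    return "".join(protein)


def protein_mass(protein):
    """Average molecular mass (Da) of an unmodified polypeptide."""
    if not protein:
        return 0.0
    return sum(RESIDUE_MASS[residue] for residue in protein) + WATER
```

For example, `protein_mass(translate_orf(cdna))` gives a predicted mass that could be compared against a measured mass to shortlist candidate genes for a purified isoform.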
2. DNA Array-Based Methods for Monitoring Gene Expression and Coding Region Identification: Technical Details

New genomic technologies allow the monitoring of RNA expression levels for all of the coding regions in a genome. These DNA array-based methods have made great advances in recent years (43). Three basic technologies are currently in use:

a. Nylon Microarrays. This is by far the simplest technology and has been in use for a considerable time. Recent advances in mechanical spotting techniques have increased the utility of the nylon methods. Commercially available equipment such as the Biomek 1000 or 2000 can be used for spotting. The use of two radioactive labels has also greatly improved this methodology. However, while this is a commonly used technology, it has a number of disadvantages. One is that nylon is not dimensionally stable. Several years ago we developed software that recognizes DNA spots and computationally corrects for changes in the size of the nylon. We were also able to quantitatively measure levels of hybridization in high-density spotted arrays. During these studies, we found that the data quality was far less than desirable for three reasons: 1) the amount of DNA that adheres to the nylon surface is variable; 2) the hybridizability of the adhered DNA is variable; and 3) the background varies across the nylon surface. Together, these factors greatly increased the errors in measurement and reduced the quantitative quality of the data.

Despite these disadvantages, the method still has utility for low-density screens. The lower quality of the quantitative data may not matter for qualitative screening that looks for a few positive clones. If the resulting positive clones are confirmed by a subsequent analysis, nylon filter technology may continue to be suitable for such qualitative analysis. However, the generally lower quality of the quantitative data renders nylon filter technology unsuitable for whole-genome scans.
The considerable expense and effort required to conduct these scans does not justify the small savings that can result from using nylon filters. Furthermore, the frequent mistakes that can occur because of the nylon filters greatly increase the cost of this technology. We also feel that the lower quality of the quantitative data does not justify its use in a central database that provides information to other researchers. For these reasons, we do not propose to use this method for our experiments.

The problems associated with this technology led us to initiate, in 1990, an extensive collaboration with scientists at Affymetrix, begun even before the company was founded, to explore other methods of producing arrays. We were successful in producing the first high-density oligonucleotide array, using yeast gene sequences, and later arrayed Arabidopsis cDNAs on glass to accurately quantitate RNA expression levels (43). Scientists at Affymetrix further improved the arrays using photolithographic techniques, and we have used the Affymetrix arrays for analyzing yeast gene expression (54). Since we have had extensive experience with all three technologies and have made numerous contributions to the development of each, we feel we are in a good position to compare them (9,17,30,37,44,47,53).

b. Glass Microarrays. Glass microarrays give superior data compared to the nylon filters. The technique uses two-color fluorescence to help improve the quality of the data. The background fluorescence is quite low and uniform. The amount of DNA that sticks at each spot, however, is variable. This variability is in part compensated by the use of two-color fluorescence. The two colors are used to compare the levels of RNA in two different expression states, for example, with and without hormone.
Mark Schena, a postdoctoral fellow in the laboratory of Ron Davis, was the first to develop the glass microarray technology (43) and has continued to make improvements. In comparing the RNA levels from two different expression states, we found that the original fluorescent labels were not equally incorporated. We therefore changed the dye chemistry to use Cy3 and Cy5, which required changing the two lasers and the band-pass filters. Although this pair of dyes works considerably better than the original dye pair, differential incorporation of the two dyes is still occasionally observed. Therefore, the two RNA preparations must be labeled twice, once with each dye. The hybridizations are conducted simultaneously, and the ratios between the two fluorescence signals are determined simultaneously. By using the ratio of the two dyes, variability in the amount of DNA spotted has less impact on the quantitation. However, because of the variable DNA amounts at each spot, it is possible to hybridize to saturation for one sample at a given spot but not for the other sample. Such saturation can give misleading quantitation, and it is difficult to control for such mistakes.

In addition, this technology still suffers from the requirement of having a large inventory of DNAs to be arrayed. This inventory must be maintained without cross-contamination, degradation, or mislabeling. Although many researchers trivialize these problems, they are quite significant if the number of samples to be arrayed is large, as in Arabidopsis. Maintaining this infrastructure generates a considerable hidden cost, and the number of errors that can occur decreases the value of this technology. Since high-quality arrays demand maintenance of this infrastructure, glass microarray technology is worthwhile only for groups that generate a very large number of arrays.
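The reason for labeling each RNA preparation once with each dye can be shown with a short sketch. It assumes a simple multiplicative model in which one dye is incorporated (or detected) more efficiently than the other by a constant factor at a given spot, and it ignores background subtraction and the saturation problem described above. The function name `dye_swap_log_ratio` and its arguments are hypothetical.

```python
import math


def dye_swap_log_ratio(fwd_cy5, fwd_cy3, rev_cy5, rev_cy3):
    """Dye-bias-corrected log2 ratio (sample A / sample B) for one spot.

    Forward hybridization: sample A labeled with Cy5, sample B with Cy3.
    Reverse hybridization: sample A labeled with Cy3, sample B with Cy5.
    Assumption: the dye bias is the same multiplicative factor in both
    hybridizations, so it cancels when the two log ratios are subtracted.
    """
    forward = math.log2(fwd_cy5 / fwd_cy3)  # log2(A/B) + dye bias
    reverse = math.log2(rev_cy5 / rev_cy3)  # log2(B/A) + dye bias
    return (forward - reverse) / 2.0


# Illustrative numbers only: A is 4-fold more abundant than B, and the
# Cy5 signal is inflated 1.5-fold relative to Cy3.
corrected = dye_swap_log_ratio(600.0, 100.0, 150.0, 400.0)
uncorrected = math.log2(600.0 / 100.0)
```

In this made-up case the single forward hybridization overstates the difference between the samples, while the dye-swapped pair recovers the 4-fold (log2 = 2) difference.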
In addition, because of the shortcomings inherent in the method, there is a need to develop methods for evaluating array quality and for verifying that the appropriate DNA sequences were arrayed at the designated positions. Such quality control increases the cost.

c. Photolithographic Oligonucleotide Arrays. This is a new and extremely complex technology (14). The infrastructure required to produce microarrays with this technology currently far exceeds what an academic laboratory can reasonably maintain. However, the recent commercial availability of oligonucleotide chips from Affymetrix vastly improves the utility of this technology. The current general approach for monitoring gene expression is to synthesize approximately 20 oligonucleotides for each coding region. The oligonucleotides are generally 25 nucleotides long. Because considerable standardization is used in the manufacturing of the chip, its variability is extremely low. Since the entire surface is covered with oligonucleotides at extremely high density, the background fluorescence is also extremely low and uniform. In addition, since the oligonucleotides are synthesized with a long linker arm, each base of the oligonucleotide is equally accessible to hybridization, and the kinetics of hybridization are equivalent to those in solution. We prefer to use this technology because: 1) the chips are extremely reproducible; 2) the same chip can be made available to all researchers; and 3) the data quality obtained from these chips is, in our experience, considerably better than that of the two previously described technologies.
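The probe layout described above (roughly 20 oligonucleotides of 25 nucleotides per coding region) can be sketched as follows. This is only an illustration under a simplifying assumption: probes are spaced evenly along the coding sequence. Actual chip design would presumably also weigh sequence uniqueness and hybridization behavior, which this sketch ignores. The function name `pick_probes` and its parameters are hypothetical.

```python
def pick_probes(coding_seq, n_probes=20, probe_len=25):
    """Return up to n_probes evenly spaced probes of length probe_len.

    Assumption: even spacing is a stand-in for real probe selection,
    which is not described in the text.
    """
    last_start = len(coding_seq) - probe_len
    if last_start < 0 or n_probes < 1:
        return []  # sequence too short to hold a single probe
    if last_start == 0 or n_probes == 1:
        return [coding_seq[:probe_len]]
    # Never request more probes than there are distinct start positions.
    n = min(n_probes, last_start + 1)
    # Integer arithmetic keeps start positions exact and distinct.
    starts = [(i * last_start) // (n - 1) for i in range(n)]
    return [coding_seq[s:s + probe_len] for s in starts]


# Made-up 500-nucleotide sequence used only to exercise the function.
example_seq = "ACGT" * 125
example_probes = pick_probes(example_seq)
```

The first probe starts at the beginning of the sequence and the last probe ends at its final base, so the set spans the whole coding region.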
© SIGnAL 2001-2021