PROJECT DESCRIPTION

Future insights into basic plant biological processes, and the ability to genetically manipulate plants for agronomic improvement, will depend to a large extent on our ability to identify genes controlling fundamental developmental and metabolic processes. In a number of plant species, genes controlling a wide range of agronomically important traits have been identified by mutational analysis and placed on classical genetic linkage maps. A major stumbling block to continued progress in plant biology is the difficulty of isolating these genes. In most cases, while the mutant phenotypes and genetic map locations are known, virtually nothing is known about the nature of the gene products. The isolation of these genes therefore relies solely on their mutant phenotype and genetic map position. In this context, clone-based physical maps have greatly facilitated the identification and cloning of genes important both to human health and to agriculture. In addition, physical maps have revealed significant information about the complex organization and evolution of genomes and the molecular structure of chromosomes. Because most plant species have relatively large genomes with a high content of repeated sequences, cloning by chromosome walking is difficult. Thus, positional cloning projects and the construction of genomic physical maps in plants have centered largely on the reference plant Arabidopsis thaliana (15). Plant geneticists adopted Arabidopsis as a model organism some years ago because of its small diploid genome, low repetitive DNA content, and rapid reproductive cycle (35). The Arabidopsis genome, at ~120 megabases (Mb), is among the smallest known plant genomes and is the obvious choice for initiating a plant genome project. Such a project was initiated in 1996 with the establishment of the Arabidopsis Genome Initiative (AGI), an international effort to sequence the Arabidopsis genome by the end of 2000 (3,12). This effort has produced 50 Mb of highly accurate genomic sequence and is ahead of schedule. It is anticipated that the entire genome will be completely sequenced by the end of 2000.

1. Development of Molecular Resources for Plant Biology Derived from the Sequencing of the Arabidopsis Genome

The Arabidopsis Genome Initiative (AGI) will identify all the genes (~25,000) necessary for the growth and development of a plant. For this vast structural resource to be useful in future basic studies of how plants grow and function, new resources derived from the genomic sequence must be developed. These new resources include full-length cDNAs for all the genes in the genome. A full-length cDNA for each gene makes it possible to study the function of the encoded protein using molecular and biochemical approaches. It also allows the precise determination of the molecular mass of each protein encoded by the genome. Knowing the exact molecular mass will facilitate identification of the gene for a protein isoform purified by conventional protein purification procedures and analyzed by mass spectrometry. In addition, the isolation of full-length cDNAs for all the genes in the genome will allow the construction of a single cDNA library in which each gene is represented in equimolar amounts. Consequently, individual labs will no longer need to construct cDNA libraries to isolate their favorite cDNA. Furthermore, the entire genome will be experimentally annotated by the isolation of full-length cDNAs.
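The mass-matching step mentioned above can be illustrated with a minimal sketch. It assumes that predicted protein masses have already been computed from full-length cDNAs; the gene identifiers, the mass values, and the tolerance below are invented for illustration and are not taken from the proposal.

```python
def candidate_genes(observed_mass_da, predicted_masses_da, tolerance_ppm=100.0):
    """Return gene IDs whose predicted protein mass lies within a relative
    tolerance (parts per million) of an observed mass, best match first.

    predicted_masses_da: dict mapping a gene ID to its predicted mass in daltons.
    """
    hits = []
    for gene_id, predicted in predicted_masses_da.items():
        error_ppm = abs(observed_mass_da - predicted) / predicted * 1e6
        if error_ppm <= tolerance_ppm:
            hits.append((error_ppm, gene_id))
    # Sort by mass error so the closest prediction comes first.
    return [gene_id for _, gene_id in sorted(hits)]


# Hypothetical masses for two closely related isoforms and an unrelated protein.
predicted = {"geneA": 52310.4, "geneB": 52318.9, "geneC": 48002.1}
print(candidate_genes(52319.0, predicted, tolerance_ppm=50.0))
```

With a tight tolerance only one of the two hypothetical isoforms is retained, which is the sense in which an exact predicted mass helps assign a purified protein to its gene.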
Developing such a resource requires a technology for monitoring global expression patterns of the Arabidopsis genome during plant growth and development. Such a technology, known as DNA-array technology (43), is currently emerging and can be used for monitoring global expression patterns as well as for defining the boundaries of the various transcriptional units in the genome. Defining these boundaries makes it possible to construct a full-length cDNA clone from the mRNA encoded by each unit. In this proposal, the SPP consortium (Stanford-Penn-PGEC) has two goals: first, to develop an inexpensive Arabidopsis Affymetrix chip to be used by the plant biology community for monitoring gene expression; and second, to isolate full-length cDNAs for 15,000 Arabidopsis genes using the Affymetrix chip as a tool. Below we offer some technical discussion associated with these two goals.

2. DNA Array-Based Methods for Monitoring Gene Expression and Coding Region Identification: Technical Details

New genomic technologies allow the monitoring of RNA expression levels for all of the coding regions in a genome. These DNA array-based methods have made great advances in recent years (43). There are three basic technologies that are currently in use:

a. Nylon Microarrays. This is, by far, the simplest technology and has been in use for a considerable time. Recent advances in mechanical spotting techniques have increased the utility of the nylon methods. Commercially available equipment such as the Biomek 1000 or 2000 can be used for spotting. The use of two radioactive labels has also greatly improved this methodology. However, while this is a commonly used technology, it has a number of disadvantages. One is that nylon is not dimensionally stable. Several years ago we developed software that recognized DNA spots and computationally corrected for changes in the size of the nylon. We were also able to quantitatively measure levels of hybridization in high-density spotted arrays. During these studies, we found that the data quality was far less than desirable due to: 1) variability in the amount of DNA that adheres to the nylon surface; 2) variability in the hybridizability of the adhered DNA; and 3) variable background across the nylon surface. Together, these factors greatly increased the errors in measurement and reduced the quantitative quality of the data. Despite these disadvantages, the method still has utility for low-density screens. The lower quality of quantitative data may not be relevant for qualitative screening to look for a few positive clones. If the resulting positive clones are confirmed by a subsequent analysis, the nylon filter technology may continue to be suitable for such qualitative analysis. However, the generally lower quality of quantitative data renders nylon filter technology unsuitable for whole-genome scans. The considerable expense and effort required to conduct these scans are not justified by the small savings that can result from using nylon filters. Furthermore, the frequent mistakes that can occur because of the nylon filters greatly increase the cost of this technology. Also, we feel that the lower quality of the quantitative data does not justify its use in a central database that provides information to other researchers. For these reasons, we do not propose to use this method for our experiments.

The problems associated with this technology led us to initiate, in 1990, an extensive collaboration with scientists at Affymetrix, begun even before the company was founded, to explore other methods of producing arrays. We were successful in producing the first high-density oligonucleotide array using yeast gene sequences, and later used cDNAs arrayed on glass to accurately quantitate RNA expression levels using genes from Arabidopsis (43). Scientists at Affymetrix further improved the arrays using photolithographic techniques, and we have used the Affymetrix arrays for analyzing yeast gene expression (54). Since we have had extensive experience with all three technologies and have made numerous contributions to the development of all three of them, we feel we are in a good position to compare them (9,17,30,37,44,47,53).

b. Glass Microarrays. Glass microarrays give superior data compared to the nylon filters. The technique uses two-color fluorescence to help improve the quality of the data. The background fluorescence is quite low and uniform. The amount of DNA that sticks at each spot, however, is variable. This variability is in part compensated by the use of two-color fluorescence. The two colors are used to compare the levels of RNA in two different expression states, for example, with and without hormone. Mark Schena, a postdoctoral fellow in the laboratory of Ron Davis, was the first to develop the glass microarray technology (43) and has continued to make improvements. In comparing the RNA levels from two different expression states, we found that the original fluorescent labels were not equally incorporated. Therefore, we changed the dye chemistry to use Cy3 and Cy5, which required changing the two lasers and the band-pass filters. Although this pair of dyes works considerably better than the original dye pair, differential incorporation of the two dyes is still occasionally observed. Therefore, the two RNA preparations must be labeled twice, once with each dye. The hybridizations are conducted simultaneously and the ratio between the two fluorescence signals is determined at the same time. Because the ratio of the two dyes is used, variability in the amount of DNA spotted has less impact on the quantitation. However, because of the variable DNA amounts at each spot, it is possible to hybridize to saturation for one sample at one spot, but not for the other sample. Such saturation can give misleading quantitation, and it is difficult to control for such mistakes. In addition, this technology still suffers from the requirement of having a large inventory of DNAs to be arrayed. This inventory must be maintained without cross-contamination, degradation, or mislabeling. Although many researchers frequently trivialize these problems, they are quite significant if the number of samples to be arrayed is large, as in Arabidopsis. Maintaining this infrastructure generates a considerable amount of hidden cost, and the number of errors that can occur decreases the value of this technology. Since high-quality arrays demand maintenance of infrastructure, glass microarray technology is valuable only if one generates a very large number of arrays. In addition, because of the shortcomings inherent in the method, there is a need to develop methods for evaluating array quality and for verifying that the appropriate DNA sequences were arrayed at the designated positions. Such required quality control increases the cost.
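The reasoning behind labeling each RNA preparation twice, once with each dye, can be shown with a short worked sketch. It assumes a simple multiplicative model in which each spot signal is the product of RNA abundance, a dye-specific gain, and the amount of DNA at the spot; the function name and the numbers are illustrative assumptions, not the analysis actually used in the cited work.

```python
import math


def dye_swap_log_ratio(a_cy5, b_cy3, a_cy3, b_cy5):
    """Log2 ratio of sample A to sample B at one spot, averaged over a
    dye-swapped pair of hybridizations.

    Hybridization 1: A labeled with Cy5, B labeled with Cy3.
    Hybridization 2: A labeled with Cy3, B labeled with Cy5.
    Under a multiplicative dye bias, the bias enters the two log ratios
    with opposite signs and cancels in the average.
    """
    forward = math.log2(a_cy5 / b_cy3)
    reverse = math.log2(a_cy3 / b_cy5)
    return (forward + reverse) / 2.0


# Illustrative numbers: true abundances 400 vs 100 (a 4-fold difference),
# Cy5 incorporated 1.5x more efficiently than Cy3, and different amounts of
# spotted DNA on the two slides (0.8 and 1.3 in arbitrary units).
a_true, b_true = 400.0, 100.0
gain_cy5, gain_cy3 = 1.5, 1.0
dna_slide1, dna_slide2 = 0.8, 1.3

ratio = dye_swap_log_ratio(
    a_cy5=a_true * gain_cy5 * dna_slide1,
    b_cy3=b_true * gain_cy3 * dna_slide1,
    a_cy3=a_true * gain_cy3 * dna_slide2,
    b_cy5=b_true * gain_cy5 * dna_slide2,
)
print(round(ratio, 6))
```

Within each hybridization the amount of spotted DNA cancels in the ratio, and the dye gain cancels across the swapped pair. Saturation at a spot violates the multiplicative model, so this averaging does not correct for it.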

c. Photolithographic Oligonucleotide Arrays. This is a new and extremely complex technology (14). The infrastructure required to produce microarrays with this technology currently far exceeds what can reasonably be maintained by an academic laboratory. However, the recent commercial availability of oligonucleotide chips from Affymetrix vastly improves the utility of this technology. The current general approach for monitoring gene expression using this technology is to synthesize approximately 20 oligonucleotides for each coding region. The oligonucleotides are generally 25 nucleotides long. Because considerable standardization is used in the manufacturing of the chip, its variability is extremely low. Since the entire surface is covered with oligonucleotides at extremely high density, the background fluorescence is also extremely low and uniform. In addition, since the oligonucleotides are synthesized with a long linker arm, each base of the oligonucleotide is equally accessible to hybridization and the kinetics of hybridization are equivalent to those in solution. We prefer to exploit this technology because: 1) the chips are extremely reproducible; 2) the same chip can be made available to all researchers; and 3) the data quality obtained from these chips is, in our experience, considerably better than that obtained with the two previously described technologies.
Because the oligonucleotide arrays use sequences 25 nucleotides long, they show considerably lower levels of hybridization if there is a single base pair mismatch (except at the ends of the sequence). This aspect of the technology makes it very suitable for monitoring the expression level of genes that have other closely related sequences in the genome. When the entire genome sequence is known, one can readily design specific probes for measuring the expression of each gene based on the decreased hybridization caused by the mismatch. Unfortunately, even a few mismatches are not sufficient to block hybridization to a target sequence that is at extremely high concentration, such as rRNA in preparations of total RNA. However, using oligonucleotide probe sequences designed to have little similarity to the known abundant RNA sequences can circumvent this difficulty. Since the chips are extremely reproducible and the amount of oligonucleotide at each location is the same, there is no need for two-color fluorescence. This allows the use of a single probe with extremely high fluorescent efficiency. The currently used labeling technique utilizes biotinylation followed by streptavidin coupled to phycoerythrin (54). This method increases the sensitivity of detection by approximately one order of magnitude. The increase in sensitivity is quite important in analyzing RNA levels from the entire genome. Many important genes are expressed at extremely low levels that can be readily detected by the Affymetrix technology. Therefore, at this time, we believe this technology is, by far, the preferred technology for the generation of databases to be used by the scientific community. Because all investigators will have access to exactly the same chip and the hybridization results from different laboratories can be directly compared, we believe this method will be the dominant technology of the future. There has been some concern among researchers about the cost of the Affymetrix chip. This concern is not justified, because the production of glass slide microarrays requires considerable costly infrastructure that is not required for use of the Affymetrix chip. Frequently, scientists compare the cost of producing a successful microarray to the purchase price of an Affymetrix array. However, they usually do not add the cost of microarray failures, the cost of making and maintaining the large number of clones, the student's time in manufacturing arrays, and the cost of quality control. All of these costs are part of the purchase price of an Affymetrix array. Furthermore, in our collaboration with Affymetrix, we will work with them in designing a new chip format (see below) that should reduce the cost about 10-fold and thus be considerably cheaper than what is possible for glass slide microarrays. In addition, the generation of generally usable databases with the Affymetrix chip can be achieved without excessive duplication of experimental results.
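The probe-design idea described in this section (about 20 probes of 25 nucleotides per coding region, chosen to have little similarity to abundant RNAs and to closely related genes) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Affymetrix design procedure: it compares sense-strand sequences only, uses mismatch counts as a stand-in for hybridization behavior, and the function names and the `min_mismatches` threshold are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def fewest_mismatches(probe, sequence):
    """Smallest mismatch count between the probe and any same-length window
    of the sequence. A sequence shorter than the probe counts as fully
    mismatched."""
    k = len(probe)
    if len(sequence) < k:
        return k
    return min(hamming(probe, sequence[i:i + k])
               for i in range(len(sequence) - k + 1))


def select_probes(coding_seq, off_targets, probe_len=25, n_probes=20,
                  min_mismatches=3):
    """Pick up to n_probes windows of coding_seq that differ from every
    off-target sequence (for example rRNA or a related gene) by at least
    min_mismatches positions, spread roughly evenly along the coding region."""
    candidates = [coding_seq[i:i + probe_len]
                  for i in range(len(coding_seq) - probe_len + 1)]
    kept = [p for p in candidates
            if all(fewest_mismatches(p, s) >= min_mismatches
                   for s in off_targets)]
    if len(kept) <= n_probes:
        return kept
    step = len(kept) / n_probes
    return [kept[int(i * step)] for i in range(n_probes)]


# Toy example with short sequences so the behavior is easy to follow.
toy_probes = select_probes("ACGTACGTTTGGCCAA", ["ACGTA"],
                           probe_len=5, n_probes=3, min_mismatches=2)
print(toy_probes)
```

In the toy example, windows identical to the off-target sequence or differing from it by a single base are rejected, and the remaining windows are sampled evenly. A real design would also need to account for reverse complements, melting temperature, and the reduced effect of mismatches near the probe ends noted above.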

