Simple Sequence Length Polymorphisms

Assignment of 30 microsatellite loci to the linkage map of Arabidopsis

Abstract

Thirty microsatellite loci were assigned to the Arabidopsis linkage 
map. The existence of microsatellite sequences in the Arabidopsis 
genome was confirmed by searching the EMBL and GenBank 
databases for di- and mono-nucleotide tracts. Initially, primers were 
synthesized flanking an (AT)n repeat in the intron of the gene 
encoding basic chitinase and an (AG)n repeat in the 5' untranslated 
region of the vacuolar ATPase 57 kd nucleotide binding subunit 
cDNA and these were subsequently found to detect polymorphisms 
between different Arabidopsis ecotypes by the polymerase chain 
reaction (PCR). After demonstrating the presence of microsatellites in 
Arabidopsis and their utility for genetic mapping, systematic 
screening for (CA)n and (GA)n sequences was carried out on marker-
selected plasmid libraries and a small-insert genomic library in 
lambda ZapII using poly (dA.dC)/ poly (dG.dT) and poly (dA.dG)/ 
poly (dC.dT) as probes. Clones hybridizing to these probes were 
sequenced and PCR primers flanking the repeats were selected using 
the PRIMER program (Whitehead Institute). PCR was carried out on 
the ecotypes Columbia and Landsberg erecta, the parental strains of a 
set of recombinant inbred lines, in order to look for useful 
polymorphisms. Surprisingly, of 18 (CA)n repeats (n>13), only one 
was polymorphic. In contrast, 25 out of 30 (GA)n repeats, 2 out of 3 
(AT)n repeats and 2 out of 4 (A)n repeats were polymorphic. The 
majority of the (CA)n repeats were complex, with adjacent short di-, 
tri- or tetra-nucleotide repeats, whereas most of the (GA)n, (TA)n 
and (A)n repeats were simple. The (CA)n repeats were also 
refractory to PCR analysis, requiring extensive optimization of PCR 
conditions, whereas the other repeat classes were mostly amplified 
with a single set of standard conditions. Where polymorphisms were 
detected, genomic DNAs from subsets of 48 or 96 recombinant inbred 
lines were amplified and the loci were placed on the map by 
comparison of the resulting strain distribution patterns (SDP) with 
SDPs of an existing set of restriction fragment length polymorphisms 
(RFLPs). Chromosomal assignments were made using the program R.I. 
Plant Manager (K.F. Manly and R.W. Elliot (1991) Mammalian Genome 
1: 123) and two-, three- and multipoint linkage analyses were 
carried out using the program MAPMAKER 3.0 (Lander et al (1987) 
Genomics 1: 174; Lincoln et al (1992) Whitehead Inst. Tech. Report 
3rd ed.). As estimated by plaque hybridization, (GA)n and (CA)n 
repeats are relatively abundant in Arabidopsis, although much less 
so than in mammalian genomes, with, on average, one repeat every 
244 and 430 kilobase pairs, respectively.

Introduction 

	Genetic mapping in mammals has undergone a transformation 
since the discovery of simple sequence length polymorphisms 
(SSLPs) (Weber and May, 1989; Litt and Luty, 1989; Tautz, 1989) 
and their exploitation as linkage markers (Reviewed by Hearne et al; 
Human linkage map - Nature genome issue; Mouse paper, Rat paper). 
The many benefits of SSLPs should apply equally to plant studies 
where there is also a need for abundant, highly informative, 
randomly distributed markers that can be assayed by the 
polymerase chain reaction (PCR) and distributed between 
laboratories as primer sequences. The adoption of Arabidopsis 
thaliana as a model system for plant genetics and molecular biology 
makes it desirable to have a dense linkage map of broad utility for 
this organism. A linkage map of SSLPs would have three obvious 
uses in Arabidopsis. 
	The first of these would be the rapid mapping of mutations, 
which is currently carried out using classical markers, restriction 
fragment length polymorphisms (Chang et al, 1988; Nam et al, 1989) 
or random amplified polymorphic DNAs (Williams et al, 1990). 
Classical markers are simple to use and require no use of molecular 
biology but can suffer from ambiguous scoring and interference 
between the marker phenotype and the phenotype to be mapped. In 
addition, only a few markers can be reliably followed in a single 
cross, meaning that many crosses have to be made to arrive at a 
location for the gene of interest. RAPDs are easily generated, simple 
to score and their use is amenable to automation, but they are 
generally dominant in nature meaning that they cannot be used in 
the F2 or backcross populations that are commonly used for 
mapping. For these reasons, RFLPs and related codominant cleaved 
amplified polymorphic sequences (CAPS, Konieczny and Ausubel, 
1993) are more commonly used. Unlike with RFLPs, mapping with 
SSLPs can be accomplished with small preparations of miniprep DNA 
made from single seedlings or leaf pieces, and polymorphisms are 
visualized by electrophoresis rather than blotting and hybridization. 
DNA preparation, PCR, analysis of the amplification products, and 
determination of map position can be accomplished in two days. 
CAPS are a logical extension of RFLPs that use PCR technology but 
their generation requires the prior generation of an RFLP and the 
complete sequence of the RFLP probe. For these reasons they have so 
far been limited to cloned genes. The second use of an SSLP map 
would be as an ordered set of sequence tagged sites (STSs) for 
construction of a physical map by STS content mapping (Olson et al, 
1989). The assembly of yeast artificial chromosomes (YACs) into 
contiguous physical maps can be complicated by false positive and 
negative results, by the chimaeric nature of some YACs and by STSs 
that detect sequences at multiple locations in the genome. Prior 
knowledge of the relative order of the STSs provides a means of 
detecting some of these errors; therefore it would be desirable to 
have high confidence in this linkage order. The small amounts of 
template DNA required and the simple nature of the PCR assay 
make this feasible through scoring large mapping populations. 
Thirdly, their multiple alleles and probable selective neutrality make 
these ideal markers for population and evolutionary studies. 
	The existence of microsatellite repeat sequences has been 
shown in several plant species, beginning with Beckman and Soller 
(1989) who by database searching showed the existence of (AT)n, 
(GA)n and (CG)n repeats in potato, and Condit and Hubbell (1991), 
who estimated the abundance of (CA)n and (GA)n repeats in corn and 
in four species of tropical trees. Akkaya et al (1992) demonstrated 
that (AT)n and (ATT)n repeats are polymorphic in soybean, and 
mapped the first microsatellite loci in a plant species. Lagercrantz et 
al (1993) estimated the frequency of plant microsatellite sequences 
in the EMBL database, showing that these elements are less frequent 
in plant genomes than in mammals, with, on average, one repeat 
longer than 20 bp every 29 kb, compared to a similar figure of 6 kb 
in mammals. The most abundant plant microsatellite was found to be 
(A)n, followed by (AT)n then (AG)n, with (CA)n repeats being 
relatively scarce compared to mammalian genomes (Lagercrantz et al, 
1993).
	In this study we investigate the utility of microsatellites as 
tools for genetic mapping in Arabidopsis thaliana, estimate the 
abundance of (CA)n and (GA)n sequences in the genome, assign 30 
microsatellites of various types to the linkage map and provide 
polymorphism data for these 30 repeats in six ecotypes.

Materials and Methods

Identification and isolation of microsatellites

Database search. To identify microsatellites in previously sequenced 
Arabidopsis DNA, the GenBank (release 76.0) and EMBL (release 
23.0) nucleic acid databases were searched using 20 nucleotide 
queries corresponding to all possible di- and mono-nucleotides. 
Searches were carried out on a Sun Sparc2 workstation by FASTA 
(Pearson and Lipman, 1988) under the GCG package (Genetics 
Computer Group, 1991) and by regular expressions as part of DNA 
workbench, an interactive DNA and protein analysis program 
(Tisdall, 1993). 
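
The regular-expression part of such a search can be illustrated with a 
short Python sketch. It is not the DNA Workbench code; the function 
name find_simple_repeats and the exact patterns are assumptions made 
for this example. It reports mono- and dinucleotide tracts of at least 
20 nucleotides in a single sequence.

```python
import re


def find_simple_repeats(seq, min_len=20):
    """Return (start, end, unit, length) for mono- and dinucleotide tracts.

    Hypothetical helper for illustration; coordinates are 0-based and
    end-exclusive. Only tracts spanning at least `min_len` nt are reported.
    """
    seq = seq.upper()
    hits = []
    # Mononucleotide tracts such as (A)n: one base repeated min_len times or more.
    mono = r"([ACGT])\1{%d,}" % (min_len - 1)
    for m in re.finditer(mono, seq):
        hits.append((m.start(), m.end(), m.group(1), len(m.group(0))))
    # Dinucleotide tracts such as (GA)n. The lookahead (?!\2) forces the two
    # bases of the unit to differ, so runs like AAAA are not counted twice.
    n_units = (min_len + 1) // 2
    di = r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (n_units - 1)
    for m in re.finditer(di, seq):
        hits.append((m.start(), m.end(), m.group(1), len(m.group(0))))
    return sorted(hits)
```

The repeat unit is reported in the phase in which it is first met, so 
an (AG)n tract may appear as either AG or GA depending on its first 
base.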

Construction of plasmid library. 5 ug of genomic DNA of the Columbia 
(Col-0) ecotype was digested to completion with AluI, RsaI, TaqI and 
EcoRV in 1X KGB (potassium glutamate buffer, Sambrook et al, 1989) 
and the resulting fragments were rendered blunt by treatment with 
the Klenow fragment of E. coli DNA polymerase I. After 
phenol/chloroform extraction and ethanol precipitation, the DNA was 
separated on 2% agarose and the 200-500 base pairs (bp) fraction 
(representing 15-25% of the genome) was purified with GlasPac 
(National Scientific). In two subsequent steps, cohesive NotI/EcoRI 
adaptors were ligated on and the 5' ends were phosphorylated by T4 
polynucleotide kinase. The DNA was separated from excess adaptors 
by chromatography through a Sephacryl S-300 cDNA spun column 
(Pharmacia) according to the manufacturer's protocol. To remove any 
remaining adaptors, the DNA was run out for a short distance into a 
2% agarose gel and purified a second time with GlasPac. The inserts 
were ligated to EcoRI treated and de-phosphorylated pBluescript KS+ 
and portions of the ligation reactions were introduced into E.coli 
strain CJ236 (dut-1, ung-1, thi-1, relA1; pCJ105 (Cmr)) by 
electroporation and plated on LB plates containing ampicillin. 
Approximately 100 000 colonies were pooled, suspended in 10 ml LB 
broth containing 7% dimethyl sulfoxide (DMSO), frozen in 200 ul 
aliquots in liquid nitrogen and stored at -80C. 

Construction of marker selected libraries. (CA)n and (GA)n marker 
selected libraries were constructed essentially according to Ostrander 
et al (1992) as follows: Single stranded phage were prepared by 
inoculating 2 ml of 2X YT broth containing ampicillin with 1 ul of the 
pooled bacteria, super-infecting with the helper phage VCSM13 and 
selecting for infected bacteria by kanamycin selection during 
overnight incubation. The uracylated single stranded DNA (ssDNA) 
was purified from culture supernatant by standard methods 
(Sambrook et al, 1989). Approximately 500 ng of uracylated ssDNA 
was mixed with 5 pmol of the phosphorylated oligonucleotide (CT)10 
or (GT)10 in a 100 ul reaction mixture containing 1X Taq polymerase 
buffer (Promega) and 200 uM deoxy-ribonucleotides. This mixture 
was heated to 95C for 5 minutes, cooled to 60C for 2 minutes, during 
which 1 unit of Taq polymerase was added, and then incubated at 
72C for 30 minutes. After phenol/chloroform extraction, ethanol 
precipitation and drying, the DNA was taken up in 50 ul of 1X 
ligation buffer (Promega) containing 1 mM ATP and 1 unit of T4 DNA 
ligase and incubated for 2 hours at room temperature to repair the 
single strand nicks remaining after the primer extension. The DNA 
was concentrated by ethanol precipitation, resuspended in water and 
aliquots were electroporated into E. coli strain DH5α (supE44 
ΔlacU169 (φ80 lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) 
to generate libraries enriched for clones containing (CA)n and (GA)n 
repeats.

Construction of a small-insert Lambda ZapII library. To generate a 
library fully representative of the genome that combined the 
efficiency of bacteriophage lambda cloning and the convenience of 
plasmids with small inserts, DNA that was randomly digested with 
DNAse was cloned into lambda ZapII. 10 ug of genomic DNA was 
partially digested with DNAse I in the presence of 10 mM manganese 
chloride. After repair of the ends with T4 DNA polymerase, the DNA 
was run out on a 2% agarose gel and the 300-700 bp fraction was cut 
out. Purification of the size selected DNA and ligation of adaptors was 
as above. The DNA was ligated to dephosphorylated Lambda ZapII 
vector arms and the ligation was packaged using Gigapack Gold 
packaging extract. 2 x 10^6 clones were amplified by plating on E. coli 
strain LE392 and eluting the phage in SM buffer. As determined by 
PCR of random clones using T7 and T3 primers, the library contains 
70% recombinants with inserts averaging 500 bp.

Hybridization screening for (CA)n and (GA)n microsatellites. The 
marker selected plasmid libraries and lambda ZapII library were 
screened by colony and plaque hybridization (Sambrook et al, 1989), 
respectively, using poly (dA.dC)/ poly (dG.dT) and poly (dA.dG)/ 
poly (dC.dT) as probes, prepared by random hexamer labeling 
(Feinberg and Vogelstein, 1983). Prehybridization of nitrocellulose (S 
and S) or nylon (Magna) filters was done in 7% sodium dodecyl 
sulfate (SDS), 0.5 M sodium phosphate pH 7.2, 1% BSA (Sigma fraction 
V) overnight at 60C. Hybridization was done overnight in the same 
solution containing 1-2 x 10^6 cpm/ml of probe. The filters were 
washed in 2x SSPE (Sambrook et al, 1989), once for 20 minutes at 
room temperature and twice for 30 minutes each at 55C, and positive 
plaques or colonies were identified by autoradiography. For ZapII 
clones, pBluescript plasmids were recovered by in vivo excision using 
the Stratagene ExAssist/SOLR system. Miniprep plasmid DNA was 
sequenced using modified T7 DNA polymerase (Sequenase version 2) 
and autoradiography or with an Applied Biosystems 373A 
instrument. 

Plant material. The Arabidopsis ecotypes Columbia (Col-0), Landsberg 
erecta (Ler), Wassilewskija (Ws-0), Niederzenz (Nd-0), and RLD were 
used as sources of genomic DNA, which was prepared according to 
Ausubel (RED BOOK) from bulked plant material or from leaf pieces 
or individual seedlings by the method of Edwards et al (1991). 
Genomic DNA of Nossen (No-0) was a gift from Tom Mitchell-Olds.

Polymerase chain reaction and polymorphism determination. PCR 
primers flanking microsatellite repeat sequences were selected using 
the PRIMER program (Eric Lander, Whitehead Institute) and either 
synthesized in house on an Applied Biosystems XXX or purchased 
from Research Genetics Inc., Huntsville Alabama. Microsatellites were 
amplified from genomic DNA in 20 ul reactions containing 1-10 ng 
genomic DNA, 5 picomoles of each primer, 200 uM 
deoxyribonucleotides, 50 mM KCl, 10 mM Tris-Cl pH 9, 0.01% gelatin, 
0.1% Triton X-100 and 2 units of Taq polymerase. The final 
concentration of magnesium chloride was usually 2 mM, but was 
varied for some primer pairs. The DNA in a 10 ul volume of water 
was heated to 100C for 5 minutes along with a 12 ul pellet of 
paraffin wax and then cooled to room temperature. After the wax 
had solidified over the DNA, the remaining reagents were added in a 
10 ul volume, and the reaction was heated to 94C for three minutes 
to melt the wax, providing a hot start. Standard cycling conditions 
were: 94C for 15 seconds, 55C for 15 seconds and 72C for 30 seconds, 
repeated 40 times. The annealing temperature was modified for 
some primer pairs as described in the results. Amplification was 
done in a Perkin Elmer Cetus 480 or in a Bios Biosycler oven. Length 
variation between PCR products from different ecotypes was 
assessed by analyzing 4 ul of PCR reactions on 4% agarose gels. When 
no polymorphisms were detected in this way, one of the primers was 
labeled using [γ-32P]ATP and the radioactive PCR products were 
analyzed by 6% denaturing polyacrylamide gel electrophoresis and 
autoradiography.

Linkage mapping. A set of recombinant inbred strains derived from a 
cross between Col-0 and Ler was obtained from Dr. C. Dean (John 
Innes Institute). These strains are F8 by single seed descent and so 
are expected to be greater than 99% homozygous. Primer pairs 
detecting polymorphisms between Ler and Col-0 were used to 
amplify genomic DNA from subsets of 48 or 96 of the recombinant 
inbreds and each strain was scored for the parental alleles. The data 
were entered into the program RI Plant Manager 2.4 (K. Manly, 
Manly and Elliot, 1991) which assigned linkage positions for the 
microsatellites in relation to an existing set of approximately 60 RFLP 
markers (C. Dean, personal communication). Two-, three- and 
multipoint linkage analyses were carried out using the program 
MAPMAKER 3.0 (Lander et al, 1987; Lincoln et al, 1992) running on a 
Sun Sparc2 workstation.
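
The expectation that F8 lines are more than 99% homozygous follows 
from the halving of heterozygosity with each generation of selfing. 
The Python sketch below is given only to make that arithmetic 
explicit; the function name is an assumption of this example, and it 
considers only loci that were heterozygous in the F1.

```python
def expected_homozygosity(generation):
    """Expected homozygous fraction in generation F<generation> under selfing.

    Hypothetical helper: assumes the locus was heterozygous in the F1 and
    that every later generation was produced by self-fertilization, so that
    heterozygosity is halved once per generation after the F1.
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    heterozygous = 0.5 ** (generation - 1)
    return 1.0 - heterozygous


f8 = expected_homozygosity(8)  # 1 - (1/2)**7 = 0.9921875, about 99.2%
```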

Results

Microsatellite sequences in previously cloned DNA

	Searches of the GenBank and EMBL databases revealed 
fourteen Arabidopsis entries with mono- or dinucleotide repeats 
greater than 20 nucleotides long. The locus identifications, accession 
numbers and the repeating units are shown in table 1. The most 
common motif is (AT)n with seven entries, followed by (AG)n and 
(A)n with three entries each, and (CA)n with one entry. PCR primers 
flanking the repeats were synthesized for all of these with the 
exception of ATHCRBAA and ATCRB, which are members of a gene 
family and so considered hazardous for genetic analysis due to the 
danger of amplifying DNA from more than one location in the 
genome. ATGBF3 and ATHMYB0 were also omitted since they were 
previously reported to be non-polymorphic (Konieczny and Ausubel, 
1993). Primers flanking the (A)36 repeat in ATHACS were kindly 
provided by A. Theologis (Plant Gene Expression Center, Albany, CA).
	Genomic DNA of the Columbia ecotype was successfully 
amplified using all ten of the primer pairs tested; however, the 
results for ATATSG were inconsistent and this locus was not studied 
further. In the case of ATHATPC1 and S45384S1, a Landsberg allele 
could not be amplified even after attempts were made to optimize 
the PCR conditions. In theory, these loci could be mapped as 
dominant markers but in the absence of an internal control, lack of 
amplification cannot unequivocally be taken to mean a true negative 
result so no attempt was made to map these loci. Of the remaining 
seven microsatellites, all but ATHPRECA were found to be 
polymorphic between Columbia and Landsberg, permitting 
assignment of a linkage position to these loci. 

Isolation of (CA)n and (GA)n containing plasmid and lambda clones.

	The marker selection procedure provided approximately 10-
fold enrichment for (CA)n and (GA)n containing plasmid clones, as 
estimated from the frequency of positive hybridization signals in the 
primary plasmid library and in the marker-selected libraries. This 
level of enrichment was sufficient to make large scale isolation of 
these clones straightforward; however, the enrichment was also 
accompanied by bias in the distribution of clones in the marker 
selected libraries. Sequencing of 79 (CA)n-containing independently 
picked clones revealed only 34 unique sequences, and several of 
these were sequenced four, five or six times. A smaller sample from 
the (GA)n marker selected library was examined but a similar 
pattern was noted. Since the enrichment by the marker selection 
procedure was only modest and accompanied by considerable 
redundant sequencing, the small insert lambda ZapII library was 
used as the source of the majority of microsatellites. 

PCR amplification and polymorphism determination.

	After discarding false positive clones and those containing 
microsatellites less than 20 nucleotides long, in total, primers were 
selected for 22 (CA)n, 6 (AT)n, 4 (A)n and 37 (GA)n sequences. 
Amplification was initially carried out on genomic DNA from 
Columbia and Landsberg using the standard PCR conditions and 
analyzing the products on 4% agarose gels to check for amplification 
and the presence of polymorphisms. In cases where agarose gels 
revealed no polymorphism, the PCR was repeated with one of the 
primers end-labeled with 32P and denaturing 6% polyacrylamide 
gels were run to check for polymorphisms. Where amplification was 
seen in only one of the ecotypes or not at all, the PCR conditions were 
varied by altering the annealing temperature and/or the magnesium 
concentration in an effort to determine optimum conditions. 
	The first set of clones to be studied contained (CA)n repeats, 
which, almost without exception, were very difficult to amplify, 
requiring extensive optimization of the PCR conditions. In 3 out of 22
cases no amplification could be achieved, while in the remaining 18, 
multiple amplification products were mostly obtained. The number 
of bands was reduced to two or three after optimizing the 
conditions but only in three cases was a single band obtained. Of the 
18 primer pairs that gave amplification, only one detected a 
polymorphism between Columbia and Landsberg. Of five (AT)n 
sequences studied, ATEAT1 and ATHCHIB detected polymorphisms 
between Columbia and Landsberg, whereas S45384S1 and ATHATPC1 
were amplified only from Columbia DNA, and amplification of 
ATATSG was unreliable. Of four (A)n sequences studied, all were 
successfully amplified, those in ATHACS and ATHGENEA being 
polymorphic, while those in ATHPRECA and nga78 (a clone originally 
identified as putatively containing an (AG)n tract) were not. 
	After discovering that the (CA)n class of repeats was mostly 
uninformative, attention was turned exclusively to the (GA)n class, 
37 of which were studied. Of these, seven were unamplifiable, five 
were amplifiable but non-polymorphic between Columbia and 
Landsberg, and the remaining twenty five were polymorphic. These 
results are summarized in table 2.
	The primer pairs that detected polymorphisms are shown in 
table 3. These were used to amplify genomic DNA from six commonly 
used laboratory ecotypes using the end-labeled PCR primer method. 
The amplification products were separated on 6% denaturing 
polyacrylamide gels and their sizes estimated by comparison with a 
sequencing ladder. Table 4 shows PCR product sizes for these six 
ecotypes, amplified using primer pairs for thirty microsatellites. In 
the great majority of cases, amplification was successful with all six 
ecotypes, failing only 5 times, twice each with Niederzenz and RLD 
and once with Nossen. In one case, nga248 amplified from RLD, two 
alleles were detected; all other loci were homozygous in all ecotypes. 
The number of alleles detected ranged from 2 to 6 with a mean of 
4.16. The primer pairs flanking the (AT)n repeat in the basic 
chitinase gene intron (ATHCHIB) were used to assess polymorphisms 
across a larger sample of 20 ecotypes. The results, shown in figure 
1A, show 12 alleles in 19 samples that were amplified. One ecotype, 
Ei-5, was heterozygous at this locus.

Abundance of (CA)n and (GA)n sequences in the genome.

	The abundance of (CA)n and (GA)n sequences in the genome 
was estimated by plaque hybridization. Discounting the 30% non-
recombinants in the ZapII library, (CA)n and (GA)n containing clones 
were detected at frequencies of one in 860 and one in 488 
respectively. With an average insert size of 500 bp this indicates that 
these sequences are found, on average, every 430 and 244 kb 
respectively.
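	The conversion from clone frequency to average spacing is a 
simple product of the number of recombinant clones screened per 
positive and the mean insert size. The Python sketch below restates 
that arithmetic with the figures given above; the function name is an 
assumption of this example.

```python
def mean_spacing_kb(clones_per_positive, mean_insert_bp):
    """Average distance between repeats in kb.

    Hypothetical helper: assumes one repeat-containing clone is found per
    `clones_per_positive` recombinant clones and that inserts sample the
    genome at random with a mean length of `mean_insert_bp`.
    """
    return clones_per_positive * mean_insert_bp / 1000.0


ca_kb = mean_spacing_kb(860, 500)  # (CA)n: 430.0 kb
ga_kb = mean_spacing_kb(488, 500)  # (GA)n: 244.0 kb
```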

Linkage mapping

	Each of the primer pairs in table 3 was used to amplify 
DNA from 48, or in some cases, 96 recombinant inbred strains 
derived from a Columbia X Landsberg erecta cross (C. Dean, personal 
communication).  Where the size difference between the Landsberg 
and Columbia alleles permitted, the amplification products were 
analyzed on 4% agarose gels, otherwise on 6% denaturing 
polyacrylamide gels. Figure 1B shows an example of one such 
experiment with segregation of Landsberg and Columbia alleles of 
microsatellite nga128 in 24 RI lines.
	The strain distribution patterns were analyzed by the program 
RI Plant Manager 2.4 (K. Manly, Manly and Elliot, 1991) running on a 
Macintosh IIci in order to make initial linkage assignments relative 
to a set of approximately 60 RFLP markers (Lister and Dean, 1993). 
Two-, three- and multipoint linkage analyses were then carried out 
using the program MAPMAKER 3.0 (Lander et al, 1987; Lincoln et al, 
1992) running on a Sun Sparc2 workstation. All of the microsatellite 
markers were found to be linked to at least two other markers at 
greater than LOD 3.0, by two point analysis, and were unequivocally 
assigned to a single chromosome. Multipoint analysis established a 
single linkage group for chromosomes two, three, four and five, plus 
two linkage groups on chromosome one. The maximum likelihood 
position of marker GAP-B is between the two chromosome one 
linkage groups, in agreement with Lister and Dean (1993). Figure 2 
shows the maximum-likelihood linkage maps of all five 
chromosomes. The microsatellite loci assigned in this study are 
boxed. 
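	To illustrate what a two-point comparison of strain distribution 
patterns involves, the following Python sketch counts discordant lines 
between two SDPs, converts the recombinant fraction among RI lines (R) 
to a per-meiosis recombination fraction (r) using the Haldane and 
Waddington relation for selfed lines, R = 2r/(1 + 2r), and computes a 
LOD score against free recombination. It is a simplified illustration 
and does not reproduce the calculations of RI Plant Manager or 
MAPMAKER; the function name and the 'C'/'L'/'-' coding of SDPs are 
assumptions of this example.

```python
import math


def two_point_ri(sdp_a, sdp_b):
    """Return (n_scored, R, r, lod) for two strain distribution patterns.

    Hypothetical helper. Each SDP is a string with one character per RI
    line: 'C' (Columbia allele), 'L' (Landsberg allele) or '-' (not scored).
    """
    pairs = [(a, b) for a, b in zip(sdp_a, sdp_b) if a in "CL" and b in "CL"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines scored for both markers")
    k = sum(1 for a, b in pairs if a != b)  # discordant (recombinant) lines
    big_r = k / n                           # recombinant fraction among RI lines
    # Haldane-Waddington relation for selfing, solved for r and capped at 0.5.
    r = big_r / (2.0 - 2.0 * big_r) if big_r < 0.5 else 0.5

    def log10_likelihood(p):
        total = 0.0
        if k:
            total += k * math.log10(p)
        if n - k:
            total += (n - k) * math.log10(1.0 - p)
        return total

    # Likelihood at the estimate (never above 0.5) versus no linkage (R = 0.5).
    lod = log10_likelihood(min(big_r, 0.5)) - log10_likelihood(0.5)
    return n, big_r, r, lod
```

With 48 lines scored, identical SDPs give a LOD of about 14.4, so the 
LOD 3.0 threshold used here tolerates a number of recombinant lines.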

Discussion

	Thirty polymorphic microsatellite loci have been assigned to 
the Arabidopsis linkage map. Of these, six were obtained from 
previously cloned sequences and the remainder were obtained by 
screening genomic DNA libraries. The most abundant class of 
microsatellite longer than 20 nucleotides found by database 
searching were (AT)n repeats, of which seven were detected. (AG)n 
and (A)n repeats were approximately half as abundant as (AT)n 
elements in the databases, with three each, whereas only one (CA)n 
repeat was found. Prevalence of (AT)n repeats seems to be a general 
feature of plant genomes (Lagercrantz et al, 1993; Morgante and 
Olivieri, 1993), as does relative paucity of (CA)n repeats, which are 
the most common dinucleotides in mammalian DNA (Beckmann and 
Weber, 1992). The frequencies of (CA)n and (GA)n repeats in the 
Arabidopsis genome were estimated by plaque hybridization to be 
one every 430 and 244 kb, respectively. The reported frequencies 
for these repeats in other higher plants range from one every 86-
300 kb for (CA)n and one every 17-125 kb for (GA)n (Condit and 
Hubbell, 1991; Lagercrantz et al, 1993) making the Arabidopsis 
genome the least rich in these repeats. It is likely that our estimates 
of (CA)n and (GA)n repeat frequencies are low, since they were made 
from fairly stringent hybridization experiments which excluded most 
repeats of n<15 from detection. Also, an amplified library was used 
for the analysis, which raises the possibility of a biased distribution of 
clones. However, the relative paucity of these sequences perhaps 
should not be surprising since Arabidopsis has the smallest genome 
of any higher plant and is generally characterized by having 
lower quantities of repetitive DNA (REF). The greater abundance of 
(GA)n repeats compared to (CA)n repeats appears to be a consistent 
feature of plant genomes (Lagercrantz et al, 1993).
	Attempts to use poly (AT) as a hybridization probe were 
largely unsuccessful, probably due to the self-complementarity of this 
sequence and also to high background resulting from the low 
stringency conditions used to accommodate the instability of the AT 
base pairs. No estimates of (AT)n or (A)n microsatellite frequency 
were made, but given the frequent occurrence of these sequences in 
database entries of plant DNA, they probably represent a large 
untapped pool of polymorphisms. 
	Initial efforts were inspired by the success of (CA)n repeats as 
polymorphic markers in mammalian studies; therefore the discovery 
that these sequences are very conserved in length between the 
Columbia and Landsberg ecotypes of Arabidopsis was very 
surprising. Interestingly, lack of polymorphism was correlated with 
complex repeat structure and difficult PCR amplification. All but 
three of the (CA)n elements studied are compound repeats, with 
short di-, tri-, or in a few cases, tetranucleotide repeats adjacent to 
the major run of (CA)n. The majority also required extensive 
optimization of the PCR conditions, requiring annealing temperatures 
of 60-64 C. The optimum conditions for each primer pair also 
differed between ecotypes meaning that comparison of allele sizes 
across a range of ecotypes was infeasible.
	In contrast to (CA)n, (GA)n repeats were found to be highly 
polymorphic. 83% of primer pairs giving amplification with both 
Landsberg and Columbia ecotypes also detected a polymorphism 
between them. Unlike the (CA)n repeats, the (GA)n class were 
without exception simple in structure, and were mostly amplified 
with a single set of conditions, requiring no optimization. Why repeat 
class, complexity of structure and ease of amplification should be 
correlated with polymorphism is unclear. The (CA)n repeats were 
more readily amplified from DNA cloned in plasmids, which may 
indicate that some higher order structure is the cause of difficult 
amplification from the genomic DNA. 
	Amplification of DNA from six common ecotypes showed that 
the microsatellites in this study are highly polymorphic. There was 
no obvious correlation of polymorphism information content with 
repeat length up to 50 nucleotides; some of the shorter repeats 
(n=13-16) had 4-6 alleles, while some longer repeats (n=23-25) had 
only 2-3 alleles. However, repeats longer than 52 nucleotides had a 
mean of 5 alleles and none had fewer than 4 alleles. The mean number 
of alleles for all markers was 4.16. These results mean that randomly 
selected microsatellites are likely to be informative in any given 
mapping population, and will be especially useful for studying the 
evolutionary relationships between the many ecotypes of 
Arabidopsis thaliana.
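	As an illustration of how allele number and polymorphism 
information content (PIC) could be tabulated from product sizes such 
as those in table 4, the Python sketch below uses the standard 
formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The 
function name and the input convention are assumptions of this 
example, and each ecotype is treated as contributing a single allele, 
which ignores the rare heterozygous samples.

```python
from collections import Counter
from itertools import combinations


def allele_stats(sizes):
    """Return (number_of_alleles, PIC) for one locus.

    Hypothetical helper. `sizes` holds one PCR product size in bp per
    ecotype, with None where amplification failed.
    """
    observed = [s for s in sizes if s is not None]
    if not observed:
        raise ValueError("no ecotype was amplified at this locus")
    counts = Counter(observed)
    freqs = [c / len(observed) for c in counts.values()]
    pic = 1.0 - sum(p * p for p in freqs)
    pic -= sum(2.0 * p * p * q * q for p, q in combinations(freqs, 2))
    return len(counts), pic
```

A locus with a single allele gives a PIC of zero, and the value rises 
as alleles become more numerous and more evenly distributed.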
	Using the set of recombinant inbred lines developed by Lister 
and Dean (1993) all thirty markers were assigned unequivocally to 
one chromosome, and linkage for each was established to 
neighboring markers at greater than LOD 3.0. Mapping the 
microsatellites was straightforward since in most cases the 
Columbia/Landsberg polymorphism could be confidently scored using 
agarose gel electrophoresis, and the 48 PCR reactions normally used 
for mapping could be accommodated in one gel. Ten, one, seven, two 
and ten markers were assigned to the five chromosomes 
respectively. Given the size of the sample, this distribution appears 
biased against chromosomes two and four, but more markers will 
need to be mapped before this can be stated unequivocally. Also, 
since (GA)n clones were the majority in this study, very little 
information on the chromosomal distribution of (AT)n and (A)n 
clones is available.
	Three point and multi point analyses were carried out to 
generate maximum likelihood maps of each chromosome. Based on 
this map, XX% of the XXX cM map is within 20 cM of one 
microsatellite marker and XX% is within 30 cM, meaning that most of 
the genome can be scanned for linkage with any given mutation 
using this limited set of markers. 
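	The proportion of a map lying within a given distance of a 
marker can be obtained by merging the intervals around each marker on 
each chromosome. The Python sketch below shows one way to do this; 
the function name and the input layout are assumptions of this 
example, and any chromosome lengths and marker positions supplied to 
it in testing are hypothetical and are not the values of the map 
reported here.

```python
def covered_fraction(chromosomes, window_cm):
    """Fraction of total map length within `window_cm` of at least one marker.

    Hypothetical helper. `chromosomes` maps a chromosome name to a tuple
    (length_cM, [marker positions in cM]).
    """
    total = 0.0
    covered = 0.0
    for length, positions in chromosomes.values():
        total += length
        end_of_last = 0.0  # right edge of the region already counted
        for p in sorted(positions):
            start = max(p - window_cm, end_of_last, 0.0)
            stop = min(p + window_cm, length)
            if stop > start:
                covered += stop - start
                end_of_last = stop
    if total == 0.0:
        raise ValueError("total map length is zero")
    return covered / total
```

Overlapping windows are counted once, and windows are truncated at the 
chromosome ends.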

	In this study we have confirmed that mono- and dinucleotide 
microsatellites are present in the Arabidopsis genome, estimated 
their abundance, demonstrated a high probability of finding 
polymorphisms between different ecotypes and assigned thirty new 
markers to the Arabidopsis linkage map. It is clear that these 
markers will be very useful for two of their proposed uses, linkage 
mapping of mutations and population studies, due to their high rate 
of polymorphism and distribution among the chromosomes. The 
other potential use, as a dense STS set for construction of a physical 
map is still an open question, since our estimate of the abundance of 
useful markers may mean that there are too few to construct a map 
of sufficient density. However, the abundance of (A)n and (AT)n 
microsatellites, as well as tri- and tetranucleotide repeats is as yet 
unknown and these most likely represent a large pool of 
undiscovered polymorphisms that will contribute to this goal.