Simple Sequence Length Polymorphisms
Assignment of 30 microsatellite loci to the linkage map of Arabidopsis
Abstract
Thirty microsatellite loci were assigned to the Arabidopsis linkage
map. The existence of microsatellite sequences in the Arabidopsis
genome was confirmed by searching the EMBL and GenBank
databases for di- and mono-nucleotide tracts. Initially, primers were
synthesized flanking an (AT)n repeat in the intron of the gene
encoding basic chitinase and an (AG)n repeat in the 5' untranslated
region of the vacuolar ATPase 57 kd nucleotide binding subunit
cDNA and these were subsequently found to detect polymorphisms
between different Arabidopsis ecotypes by the polymerase chain
reaction (PCR). After demonstrating the presence of microsatellites in
Arabidopsis and their utility for genetic mapping, systematic
screening for (CA)n and (GA)n sequences was carried out on marker-
selected plasmid libraries and a small-insert genomic library in
lambda ZapII using poly (dA.dC)/ poly (dG.dT) and poly (dA.dG)/
poly (dC.dT) as probes. Clones hybridizing to these probes were
sequenced and PCR primers flanking the repeats were selected using
the PRIMER program (Whitehead Institute). PCR was carried out on
the ecotypes Columbia and Landsberg erecta, the parental strains of a
set of recombinant inbred lines, in order to look for useful
polymorphisms. Surprisingly, of 18 (CA)n repeats (n>13), only one
was polymorphic. In contrast, 25 out of 30 (GA)n repeats, 2 out of 3
(AT)n repeats and 2 out of 4 (A)n repeats were polymorphic. The
majority of the (CA)n repeats were complex, with adjacent short di-,
tri- or tetra-nucleotide repeats, whereas most of the (GA)n, (AT)n
and (A)n repeats were simple. The (CA)n repeats were also
refractory to PCR analysis, requiring extensive optimization of PCR
conditions, whereas the other repeat classes were mostly amplified
with a single set of standard conditions. Where polymorphisms were
detected, genomic DNAs from subsets of 48 or 96 recombinant inbred
lines were amplified and the loci were placed on the map by
comparison of the resulting strain distribution patterns (SDP) with
SDPs of an existing set of restriction fragment length polymorphisms
(RFLPs). Chromosomal assignments were made using the program R.I.
Plant Manager (K.F. Manly and R.W. Elliot (1991) Mammalian Genome
1: 123) and two-, three- and multipoint linkage analyses were
carried out using the program MAPMAKER 3.0 (Lander et al (1987)
Genomics 1: 174; Lincoln et al (1992) Whitehead Inst. Tech. Report
3rd ed.). As estimated by plaque hybridization, (GA)n and (CA)n
repeats are relatively abundant in Arabidopsis, although much less
so than in mammalian genomes, with, on average, one repeat every
244 and 430 kilobase pairs respectively.
Introduction
Genetic mapping in mammals has undergone a transformation
since the discovery of simple sequence length polymorphisms
(SSLPs) (Weber and May, 1989, Litt and Luty, 1989, Tautz, 1989)
and their exploitation as linkage markers (Reviewed by Hearne et al;
Human linkage map - Nature genome issue; Mouse paper, Rat paper).
The many benefits of SSLPs should apply equally to plant studies
where there is also a need for abundant, highly informative,
randomly distributed markers that can be assayed by the
polymerase chain reaction (PCR) and distributed between
laboratories as primer sequences. The adoption of Arabidopsis
thaliana as a model system for plant genetics and molecular biology
makes it desirable to have a dense linkage map of broad utility for
this organism. A linkage map of SSLPs would have three obvious
uses in Arabidopsis.
The first of these would be the rapid mapping of mutations,
which is currently carried out using classical markers, restriction
fragment length polymorphisms (Chang et al, 1988; Nam et al, 1989)
or random amplified polymorphic DNAs (Williams et al, 1990).
Classical markers are simple to use and require no use of molecular
biology but can suffer from ambiguous scoring and interference
between the marker phenotype and the phenotype to be mapped. In
addition, only a few markers can be reliably followed in a single
cross, meaning that many crosses have to be made to arrive at a
location for the gene of interest. RAPDs are easily generated, simple
to score and their use is amenable to automation, but they are
generally dominant in nature meaning that they cannot be used in
the F2 or backcross populations that are commonly used for
mapping. For these reasons, RFLPs and related codominant cleaved
amplified polymorphic sequences (CAPS, Konieczny and Ausubel,
1993) are more commonly used. Unlike with RFLPs, mapping with
SSLPs can be accomplished with small preparations of miniprep DNA
made from single seedlings or leaf pieces, and polymorphisms are
visualized by electrophoresis rather than blotting and hybridization.
DNA preparation, PCR, analysis of the amplification products, and
determination of map position can be accomplished in two days.
CAPS are a logical extension of RFLPs that use PCR technology but
their generation requires the prior generation of an RFLP and the
complete sequence of the RFLP probe. For these reasons they have so
far been limited to cloned genes. The second use of an SSLP map
would be as an ordered set of sequence tagged sites (STSs) for
construction of a physical map by STS content mapping (Olson et al,
1989). The assembly of yeast artificial chromosomes (YACs) into
contiguous physical maps can be complicated by false positive and
negative results, by the chimaeric nature of some YACs and by STSs
that detect sequences at multiple locations in the genome. Prior
knowledge of the relative order of the STSs provides a means of
detecting some of these errors; it would therefore be desirable to
have high confidence in this linkage order. The small amounts of
template DNA required and the simple nature of the PCR assay
make this feasible through scoring large mapping populations.
Thirdly, their multiple alleles and probable selective neutrality make
these ideal markers for population and evolutionary studies.
The existence of microsatellite repeat sequences has been
shown in several plant species, beginning with Beckman and Soller
(1989) who by database searching showed the existence of (AT)n,
(GA)n and (CG)n repeats in potato, and Condit and Hubbell, (1991),
who estimated the abundance of (CA)n and (GA)n repeats in corn and
in four species of tropical trees. Akkaya et al (1992) demonstrated
that (AT)n and (ATT)n repeats are polymorphic in soybean, and
mapped the first microsatellite loci in a plant species. Lagercrantz et
al (1993) estimated the frequency of plant microsatellite sequences
in the EMBL database, showing that these elements are less frequent
in plant genomes than in mammals, with, on average, one repeat
longer than 20 bp every 29 kb, compared to a similar figure of 6 kb
in mammals. The most abundant plant microsatellite was found to be
(A)n, followed by (AT)n then (AG)n, with (CA)n repeats being
relatively scarce compared to mammalian genomes (Lagercrantz et al,
1993).
In this study we investigate the utility of microsatellites as
tools for genetic mapping in Arabidopsis thaliana, estimate the
abundance of (CA)n and (GA)n sequences in the genome, assign 30
microsatellites of various types to the linkage map and provide
polymorphism data for these 30 repeats in six ecotypes.
Materials and Methods
Identification and isolation of microsatellites
Database search. To identify microsatellites in previously sequenced
Arabidopsis DNA, the GenBank (release 76.0) and EMBL (release
23.0) nucleic acid databases were searched using 20 nucleotide
queries corresponding to all possible di- and mono-nucleotides.
Searches were carried out on a Sun Sparc2 workstation by FASTA
(Pearson and Lipman, 1988) under the GCG package (Genetics
Computer Group, 1991) and by regular expressions as part of DNA
workbench, an interactive DNA and protein analysis program
(Tisdall, 1993).
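The regular-expression part of such a search can be illustrated with a
minimal Python sketch. This is not the DNA workbench code; the function
name find_microsatellites and its min_len parameter are hypothetical,
and the 20-nucleotide threshold is taken from the query length given
above. Dinucleotide tracts are matched in either phase, so an (AG)n
tract and the same tract read as (GA)n are reported once.

```python
import re
from itertools import combinations

def find_microsatellites(seq, min_len=20):
    """Return (start, unit, length) for each mono- or dinucleotide tract.

    start is the 0-based offset of the tract, unit is the repeat unit as
    it reads at the start of the tract, and length is in nucleotides.
    """
    seq = seq.upper()
    hits = []
    # Mononucleotide tracts: a run of one base at least min_len long.
    for base in "ACGT":
        for m in re.finditer("%s{%d,}" % (base, min_len), seq):
            hits.append((m.start(), base, len(m.group())))
    # Dinucleotide tracts: strictly alternating runs of two different
    # bases. The optional flanking bases let the run start or end in
    # either phase; the length filter is applied afterwards.
    core = max(1, (min_len - 2) // 2)
    for x, y in combinations("ACGT", 2):
        pattern = "%s?(?:%s%s){%d,}%s?" % (y, x, y, core, x)
        for m in re.finditer(pattern, seq):
            tract = m.group()
            if len(tract) >= min_len:
                hits.append((m.start(), tract[:2], len(tract)))
    return sorted(hits)

# A made-up sequence with a 24-nt (AG)n tract and a 25-nt (A)n tract.
example = "GGC" + "AG" * 12 + "TTC" + "A" * 25 + "CG"
print(find_microsatellites(example))
```

Only one strand is scanned here; a (CT)n tract on the given strand is
reported as such rather than as its (AG)n complement.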
Construction of plasmid library. 5 ug of genomic DNA of the Columbia
(Col-0) ecotype was digested to completion with AluI, RsaI, TaqI and
EcoRV in 1X KGB (potassium glutamate buffer, Sambrook et al, 1989)
and the resulting fragments were rendered blunt by treatment with
the Klenow fragment of E. coli DNA polymerase I. After
phenol/chloroform extraction and ethanol precipitation, the DNA was
separated on 2% agarose and the 200-500 base pairs (bp) fraction
(representing 15-25% of the genome) was purified with GlasPac
(National Scientific). In two subsequent steps, cohesive NotI/EcoRI
adaptors were ligated on and the 5' ends were phosphorylated by T4
polynucleotide kinase. The DNA was separated from excess adaptors
by chromatography through a Sephacryl S-300 cDNA spun column
(Pharmacia) according to the manufacturer's protocol. To remove any
remaining adaptors, the DNA was run out for a short distance into a
2% agarose gel and purified a second time with GlasPac. The inserts
were ligated to EcoRI treated and de-phosphorylated pBluescript KS+
and portions of the ligation reactions were introduced into E.coli
strain CJ236 (dut-1, ung-1, thi-1, relA1; pCJ105 (Cmr)) by
electroporation and plated on LB plates containing ampicillin.
Approximately 100 000 colonies were pooled, suspended in 10 ml LB
broth containing 7% dimethyl sulfoxide (DMSO), frozen in 200 ul
aliquots in liquid nitrogen and stored at -80C.
Construction of marker selected libraries. (CA)n and (GA)n marker
selected libraries were constructed essentially according to Ostrander
et al (1992) as follows: Single stranded phage were prepared by
inoculating 2 ml of 2X YT broth containing ampicillin with 1 ul of the
pooled bacteria, super-infecting with the helper phage VCSM13 and
selecting for infected bacteria by kanamycin selection during
overnight incubation. The uracylated single stranded DNA (ssDNA)
was purified from culture supernatant by standard methods
(Sambrook et al, 1989). Approximately 500 ng of uracylated ssDNA
was mixed with 5 pmol of the phosphorylated oligonucleotide (CT)10
or (GT)10 in a 100 ul reaction mixture containing 1X Taq polymerase
buffer (Promega) and 200 uM deoxy-ribonucleotides. This mixture
was heated to 95C for 5 minutes, cooled to 60C for 2 minutes, during
which 1 unit of Taq polymerase was added, and then incubated at
72C for 30 minutes. After phenol/chloroform extraction, ethanol
precipitation and drying, the DNA was taken up in 50 ul of 1X
ligation buffer (Promega) containing 1 mM ATP and 1 unit of T4 DNA
ligase and incubated for 2 hours at room temperature to repair the
single strand nicks remaining after the primer extension. The DNA
was concentrated by ethanol precipitation, resuspended in water and
aliquots were electroporated into E. coli strain DH5α (supE44
ΔlacU169 (φ80 lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1)
to generate libraries enriched for clones containing (CA)n and (GA)n
repeats.
Construction of a small-insert Lambda ZapII library. To generate a
library fully representative of the genome that combined the
efficiency of bacteriophage lambda cloning and the convenience of
plasmids with small inserts, DNA that was randomly digested with
DNase I was cloned into lambda ZapII. 10 ug of genomic DNA was
partially digested with DNase I in the presence of 10 mM manganese
chloride. After repair of the ends with T4 DNA polymerase, the DNA
was run out on a 2% agarose gel and the 300-700 bp fraction was cut
out. Purification of the size selected DNA and ligation of adaptors was
as above. The DNA was ligated to dephosphorylated Lambda ZapII
vector arms and the ligation was packaged using Gigapack Gold
packaging extract. 2 x 10^6 clones were amplified by plating on E. coli
strain LE392 and eluting the phage in SM buffer. As determined by
PCR of random clones using T7 and T3 primers, the library contains
70% recombinants with inserts averaging 500 bp.
Hybridization screening for (CA)n and (GA)n microsatellites. The
marker selected plasmid libraries and lambda ZapII library were
screened by colony and plaque hybridization (Sambrook et al, 1989),
respectively, using poly (dA.dC)/ poly (dG.dT) and poly (dA.dG)/
poly (dC.dT) as probes, prepared by random hexamer labeling
(Feinberg and Vogelstein, 1983). Prehybridization of nitrocellulose (S
and S) or nylon (Magna) filters was done in 7% sodium dodecyl
sulfate (SDS), 0.5 M sodium phosphate pH 7.2, 1% BSA (Sigma fraction
V) overnight at 60C. Hybridization was done overnight in the same
solution containing 1-2 x 10^6 cpm/ml of probe. The filters were
washed in 2x SSPE (Sambrook et al, 1989), once for 20 minutes at
room temperature and twice for 30 minutes each at 55C, and positive
plaques or colonies were identified by autoradiography. For ZapII
clones, pBluescript plasmids were recovered by in vivo excision using
the Stratagene ExAssist/SOLR system. Miniprep plasmid DNA was
sequenced using modified T7 DNA polymerase (Sequenase version 2)
and autoradiography or with an Applied Biosystems 373A
instrument.
Plant material. The Arabidopsis ecotypes Columbia (Col-0), Landsberg
erecta (Ler), Wassilewskija (Ws-0), Niederzenz (Nd-0), and RLD were
used as sources of genomic DNA, which was prepared according to
Ausubel (RED BOOK) from bulked plant material or from leaf pieces
or individual seedlings by the method of Edwards et al (1991).
Genomic DNA of Nossen (No-0) was a gift from Tom Mitchell-Olds.
Polymerase chain reaction and polymorphism determination. PCR
primers flanking microsatellite repeat sequences were selected using
the PRIMER program (Eric Lander, Whitehead Institute) and either
synthesized in house on an Applied Biosystems XXX or purchased
from Research Genetics Inc., Huntsville Alabama. Microsatellites were
amplified from genomic DNA in 20 ul reactions containing 1-10 ng
genomic DNA, 5 picomoles of each primer, 200 uM
deoxyribonucleotides, 50 mM KCl, 10 mM Tris-Cl pH 9, 0.01% gelatin,
0.1% Triton X-100 and 2 units of Taq polymerase. The final
concentration of magnesium chloride was usually 2 mM, but was
varied for some primer pairs. The DNA in a 10 ul volume of water
was heated to 100C for 5 minutes along with a 12 ul pellet of
paraffin wax and then cooled to room temperature. After the wax
had solidified over the DNA, the remaining reagents were added in a
10 ul volume, and the reaction was heated to 94C for three minutes
to melt the wax, providing a hot start. Standard cycling conditions
were: 94C for 15 seconds, 55C for 15 seconds and 72C for 30 seconds,
repeated 40 times. The annealing temperature was modified for
some primer pairs as described in the results. Amplification was
done in a Perkin Elmer Cetus 480 or in a Bios Biosycler oven. Length
variation between PCR products from different ecotypes was
assessed by analyzing 4 ul of PCR reactions on 4% agarose gels. When
no polymorphisms were detected in this way, one of the primers was
labeled using [γ-32P]ATP and the radioactive PCR products were
analyzed by 6% denaturing polyacrylamide gel electrophoresis and
autoradiography.
Linkage mapping. A set of recombinant inbred strains derived from a
cross between Col-0 and Ler was obtained from Dr. C. Dean (John
Innes Institute). These strains are F8 by single seed descent and so
are expected to be greater than 99% homozygous. Primer pairs
detecting polymorphisms between Ler and Col-0 were used to
amplify genomic DNA from subsets of 48 or 96 of the recombinant
inbreds and each strain was scored for the parental alleles. The data
were entered into the program RI Plant Manager 2.4 (K. Manly,
Manly and Elliot, 1991) which assigned linkage positions for the
microsatellites in relation to an existing set of approximately 60 RFLP
markers (C. Dean, personal communication). Two-, three- and
multipoint linkage analyses were carried out using the program
MAPMAKER 3.0 (Lander et al, 1987; Lincoln et al, 1992) running on a
Sun Sparc2 workstation.
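The expectation that F8 lines are more than 99% homozygous follows from
the halving of heterozygosity at each generation of selfing. The short
Python sketch below works through that arithmetic; the function name
expected_homozygosity is hypothetical, and the calculation assumes
strict self-fertilization from a fully heterozygous F1 with no
selection.

```python
def expected_homozygosity(generation):
    """Expected fraction of segregating loci that are homozygous in an
    F<generation> line produced by repeated selfing of an F1 hybrid.

    The F1 is heterozygous at every segregating locus, and each round
    of selfing halves the expected heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or greater")
    return 1.0 - 0.5 ** (generation - 1)

for g in (2, 4, 6, 8):
    print("F%d: %.2f%% homozygous" % (g, 100 * expected_homozygosity(g)))
```

For F8 this gives 1 - (1/2)^7, or about 99.2%, in line with the figure
quoted above.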
Results
Microsatellite sequences in previously cloned DNA
Searches of the GenBank and EMBL databases revealed
fourteen Arabidopsis entries with mono- or dinucleotide repeats
greater than 20 nucleotides long. The locus identifications, accession
numbers and the repeating units are shown in table 1. The most
common motif is (AT)n with seven entries, followed by (AG)n and
(A)n with three entries each, and (CA)n with one entry. PCR primers
flanking the repeats were synthesized for all of these with the
exception of ATHCRBAA and ATCRB, which are members of a gene
family and so considered hazardous for genetic analysis due to the
danger of amplifying DNA from more than one location in the
genome. ATGBF3 and ATHMYB0 were also omitted since they were
previously reported to be non-polymorphic (Konieczny and Ausubel,
1993). Primers flanking the (A)36 repeat in ATHACS were kindly
provided by A. Theologis (Plant Gene Expression Center, Albany, CA).
Genomic DNA of the Columbia ecotype was successfully
amplified using all ten of the primer pairs tested; however, the
results for ATATSG were inconsistent and this locus was not studied
further. In the case of ATHATPC1 and S45384S1, a Landsberg allele
could not be amplified even after attempts were made to optimize
the PCR conditions. In theory, these loci could be mapped as
dominant markers but in the absence of an internal control, lack of
amplification cannot unequivocally be taken to mean a true negative
result so no attempt was made to map these loci. Of the remaining
seven microsatellites, all but ATHPRECA were found to be
polymorphic between Columbia and Landsberg, permitting
assignment of a linkage position to these loci.
Isolation of (CA)n and (GA)n containing plasmid and lambda clones.
The marker selection procedure provided approximately 10-
fold enrichment for (CA)n and (GA)n containing plasmid clones, as
estimated from the frequency of positive hybridization signals in the
primary plasmid library and in the marker-selected libraries. This
level of enrichment was sufficient to make large scale isolation of
these clones straightforward; however, the enrichment was also
accompanied by bias in the distribution of clones in the marker
selected libraries. Sequencing of 79 (CA)n-containing independently
picked clones revealed only 34 unique sequences, and several of
these were sequenced four, five or six times. A smaller sample from
the (GA)n marker selected library was examined but a similar
pattern was noted. Since the enrichment by the marker selection
procedure was only modest and accompanied by considerable
redundant sequencing, the small insert lambda ZapII library was
used as the source of the majority of microsatellites.
PCR amplification and polymorphism determination.
After discarding false positive clones and those containing
microsatellites less than 20 nucleotides long, in total, primers were
selected for 22 (CA)n, 6 (AT)n, 4 (A)n and 37 (GA)n sequences.
Amplification was initially carried out on genomic DNA from
Columbia and Landsberg using the standard PCR conditions and
analyzing the products on 4% agarose gels to check for amplification
and the presence of polymorphisms. In cases where agarose gels
revealed no polymorphism, the PCR was repeated with one of the
primers end-labeled with 32P and denaturing 6% polyacrylamide
gels were run to check for polymorphisms. Where amplification was
seen in only one of the ecotypes or not at all, the PCR conditions were
varied by altering the annealing temperature and/or the magnesium
concentration in an effort to determine optimum conditions.
The first set of clones to be studied contained (CA)n repeats,
which, almost without exception, were very difficult to amplify,
requiring extensive optimization of the PCR conditions. In 3 out of 22
cases no amplification could be achieved, while in the remaining 18,
multiple amplification products were mostly obtained. The number
of bands was reduced to two or three after optimizing the
conditions but only in three cases was a single band obtained. Of the
18 primer pairs that gave amplification, only one detected a
polymorphism between Columbia and Landsberg. Of five (AT)n
sequences studied, ATEAT1 and ATHCHIB detected polymorphisms
between Columbia and Landsberg, whereas S45384S1 and ATHATPC1
were amplified only from Columbia DNA, and amplification of
ATATSG was unreliable. Of four (A)n sequences studied, all were
successfully amplified, those in ATHACS and ATHGENEA being
polymorphic, while those in ATHPRECA and nga78 (a clone originally
identified as putatively containing an (AG)n tract) were not.
After discovering that the (CA)n class of repeats was mostly
uninformative, attention was turned exclusively to the (GA)n class,
37 of which were studied. Of these, seven were unamplifiable, five
were amplifiable but non-polymorphic between Columbia and
Landsberg, and the remaining twenty five were polymorphic. These
results are summarized in table 2.
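The contrast between repeat classes can be made explicit by expressing
each result as the fraction of amplifiable primer pairs that detected a
Columbia/Landsberg polymorphism. The Python sketch below uses only the
counts stated in the text above; the dictionary layout and the function
name polymorphic_fraction are hypothetical, and the (AT)n class is left
out because its counts are reported less uniformly.

```python
# (amplifiable primer pairs, pairs polymorphic between Col and Ler),
# taken from the counts given in the text.
counts = {
    "(CA)n": (18, 1),
    "(GA)n": (37 - 7, 25),   # 37 studied, 7 unamplifiable
    "(A)n": (4, 2),
}

def polymorphic_fraction(amplified, polymorphic):
    """Fraction of amplifiable primer pairs that detect a polymorphism."""
    if amplified <= 0:
        raise ValueError("need at least one amplifiable primer pair")
    return polymorphic / amplified

for repeat_class, (amplified, polymorphic) in counts.items():
    pct = 100 * polymorphic_fraction(amplified, polymorphic)
    print("%s: %d of %d polymorphic (%.0f%%)"
          % (repeat_class, polymorphic, amplified, pct))
```

On these counts roughly 6% of (CA)n, 83% of (GA)n and 50% of (A)n
primer pairs were informative.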
The primer pairs that detected polymorphisms are shown in
table 3. These were used to amplify genomic DNA from six commonly
used laboratory ecotypes using the end-labeled PCR primer method.
The amplification products were separated on 6% denaturing
polyacrylamide gels and their sizes estimated by comparison with a
sequencing ladder. Table 4 shows PCR product sizes for these six
ecotypes, amplified using primer pairs for thirty microsatellites. In
the great majority of cases, amplification was successful with all six
ecotypes, failing only 5 times, twice each with Niederzenz and RLD
and once with Nossen. In one case, nga248 amplified from RLD, two
alleles were detected; all other loci were homozygous in all ecotypes.
The number of alleles detected ranged from 2 to 6 with a mean of
4.16. The primer pairs flanking the (AT)n repeat in the basic
chitinase gene intron (ATHCHIB) were used to assess polymorphisms
across a larger sample of 20 ecotypes. The results, shown in figure
1A, show 12 alleles in 19 samples that were amplified. One ecotype,
Ei-5, was heterozygous at this locus.
Abundance of (CA)n and (GA)n sequences in the genome.
The abundance of (CA)n and (GA)n sequences in the genome
was estimated by plaque hybridization. Discounting the 30% non-
recombinants in the ZapII library, (CA)n and (GA)n containing clones
were detected at frequencies of one in 860 and one in 488
respectively. With an average insert size of 500 bp this indicates that
these sequences are found, on average, every 430 and 244 kb
respectively.
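The conversion from clone frequency to genomic spacing is a single
multiplication, sketched here in Python. The function name
repeat_spacing_kb is hypothetical; the inputs are the frequencies among
recombinant clones and the mean insert size reported above, and the
estimate assumes that inserts sample the genome uniformly.

```python
def repeat_spacing_kb(clones_per_positive, mean_insert_bp):
    """Average distance between repeats, in kb, when one clone in
    clones_per_positive recombinants carries the repeat and inserts
    average mean_insert_bp base pairs."""
    return clones_per_positive * mean_insert_bp / 1000.0

print("(CA)n: one per %.0f kb" % repeat_spacing_kb(860, 500))
print("(GA)n: one per %.0f kb" % repeat_spacing_kb(488, 500))
```

With the reported frequencies this reproduces the 430 kb and 244 kb
spacings.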
Linkage mapping
Each of the primer pairs in table 3 was used to amplify
DNA from 48, or in some cases, 96 recombinant inbred strains
derived from a Columbia X Landsberg erecta cross (C. Dean, personal
communication). Where the size difference between the Landsberg
and Columbia alleles permitted, the amplification products were
analyzed on 4% agarose gels, otherwise on 6% denaturing
polyacrylamide gels. Figure 1B shows an example of one such
experiment with segregation of Landsberg and Columbia alleles of
microsatellite nga128 in 24 RI lines.
The strain distribution patterns were analyzed by the program
RI Plant Manager 2.4 (K. Manly, Manly and Elliot, 1991) running on a
Macintosh IIci in order to make initial linkage assignments relative
to a set of approximately 60 RFLP markers (Lister and Dean, 1993).
Two-, three- and multipoint linkage analyses were then carried out
using the program MAPMAKER 3.0 (Lander et al, 1987; Lincoln et al,
1992) running on a Sun Sparc2 workstation. All of the microsatellite
markers were found to be linked to at least two other markers at
greater than LOD 3.0, by two point analysis, and were unequivocally
assigned to a single chromosome. Multipoint analysis established a
single linkage group for chromosomes two, three, four and five, plus
two linkage groups on chromosome one. The maximum likelihood
position of marker GAP-B is between the two chromosome one
linkage groups, in agreement with Lister and Dean (1993). Figure 2
shows the maximum-likelihood linkage maps of all five
chromosomes. The microsatellite loci assigned in this study are
boxed.
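The comparison of strain distribution patterns that underlies the
two-point analysis can be sketched in Python as follows. This is a
simplified illustration and not the algorithm of RI Plant Manager or
MAPMAKER: the function name two_point_ri and the 'C'/'L'/'-' scoring
code are hypothetical, the conversion from the fraction of discordant
lines R to the recombination fraction r uses the Haldane and Waddington
relation for selfed RI lines, r = R / (2(1 - R)), and the LOD score
compares the observed R with the no-linkage value of 0.5.

```python
import math

def two_point_ri(sdp_a, sdp_b):
    """Compare two SDPs scored over the same recombinant inbred lines.

    Each SDP is a string with one character per line: 'C' (Columbia
    allele), 'L' (Landsberg allele) or '-' (not scored).
    Returns (lines compared, recombination fraction r, LOD score).
    """
    if len(sdp_a) != len(sdp_b):
        raise ValueError("SDPs must cover the same set of lines")
    # Use only lines scored for both markers.
    pairs = [(a, b) for a, b in zip(sdp_a, sdp_b)
             if a in "CL" and b in "CL"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines scored for both markers")
    discordant = sum(1 for a, b in pairs if a != b)
    R = discordant / n
    if R >= 0.5:
        return n, 0.5, 0.0            # no evidence of linkage
    r = R / (2.0 * (1.0 - R))         # selfed RI lines
    lod = n * math.log10(2.0)         # denominator: R = 0.5
    if discordant:
        lod += (discordant * math.log10(R)
                + (n - discordant) * math.log10(1.0 - R))
    return n, r, lod

# Made-up SDPs for 48 lines, differing in 4 of them.
marker = "C" * 24 + "L" * 24
rflp = "L" * 4 + "C" * 20 + "L" * 24
n, r, lod = two_point_ri(marker, rflp)
print("n=%d  r=%.3f  LOD=%.2f" % (n, r, lod))
```

In this made-up case four discordant lines out of 48 still give a LOD
well above the threshold of 3.0 used here.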
Discussion
Thirty polymorphic microsatellite loci have been assigned to
the Arabidopsis linkage map. Of these, six were obtained from
previously cloned sequences and the remainder were obtained by
screening genomic DNA libraries. The most abundant class of
microsatellite longer than 20 nucleotides found by database
searching was (AT)n, of which seven were detected. (AG)n
and (A)n repeats were approximately half as abundant as (AT)n
elements in the databases, with three each, whereas only one (CA)n
repeat was found. Prevalence of (AT)n repeats seems to be a general
feature of plant genomes (Lagercrantz et al, 1993; Morgante and
Olivieri, 1993), as does relative paucity of (CA)n repeats, which are
the most common dinucleotides in mammalian DNA (Beckmann and
Weber, 1992). The frequencies of (CA)n and (GA)n repeats in the
Arabidopsis genome were estimated by plaque hybridization to be
one every 430 and 244 kb, respectively. The reported frequencies
for these repeats in other higher plants range from one every 86-
300 kb for (CA)n and one every 17-125 kb for (GA)n (Condit and
Hubbell, 1991; Lagercrantz et al, 1993) making the Arabidopsis
genome the least rich in these repeats. It is likely that our estimates
of (CA)n and (GA)n repeat frequencies are low, since they were made
from fairly stringent hybridization experiments which excluded most
repeats of n<15 from detection. Also, an amplified library was used
for the analysis which raises the possibility of a biased distribution of
clones. However, the relative paucity of these sequences perhaps
should not be surprising since Arabidopsis has the smallest genome
of any higher plant and is generally characterized by having
lower quantities of repetitive DNA (REF). The greater abundance of
(GA)n repeats compared to (CA)n repeats appears to be a consistent
feature of plant genomes (Lagercrantz et al, 1993).
Attempts to use poly (AT) as a hybridization probe were
largely unsuccessful, probably due to the self-complementarity of this
sequence and also to high background resulting from the low
stringency conditions used to accommodate the instability of the AT
base pairs. No estimates of (AT)n or (A)n microsatellite frequency
were made, but given the frequent occurrence of these sequences in
database entries of plant DNA, they probably represent a large
untapped pool of polymorphisms.
Initial efforts were inspired by the success of (CA)n repeats as
polymorphic markers in mammalian studies, therefore the discovery
that these sequences are very conserved in length between the
Columbia and Landsberg ecotypes of Arabidopsis was very
surprising. Interestingly, lack of polymorphism was correlated with
complex repeat structure and difficult PCR amplification. All but
three of the (CA)n elements studied are compound repeats, with
short di-, tri-, or in a few cases, tetranucleotide repeats adjacent to
the major run of (CA)n. The majority also required extensive
optimization of the PCR conditions, requiring annealing temperatures
of 60-64 C. The optimum conditions for each primer pair also
differed between ecotypes meaning that comparison of allele sizes
across a range of ecotypes was infeasible.
In contrast to (CA)n, (GA)n repeats were found to be highly
polymorphic. 83% of primer pairs giving amplification with both
Landsberg and Columbia ecotypes also detected a polymorphism
between them. Unlike the (CA)n repeats, the (GA)n repeats were
without exception simple in structure, and were mostly amplified
with a single set of conditions, requiring no optimization. Why repeat
class, complexity of structure and ease of amplification should be
correlated with polymorphism is unclear. The (CA)n repeats were
more readily amplified from DNA cloned in plasmids, which may
indicate that some higher order structure is the cause of difficult
amplification from the genomic DNA.
Amplification of DNA from six common ecotypes showed that
the microsatellites in this study are highly polymorphic. There was
no obvious correlation of polymorphism information content with
repeat length up to 50 nucleotides; some of the shorter repeats
(n=13-16) had 4-6 alleles, while some longer repeats (n=23-25) had
only 2-3 alleles. However, repeats longer than 52 nucleotides had a
mean of 5 alleles and none had fewer than 4 alleles. The mean number
of alleles for all markers was 4.16. These results mean that randomly
selected microsatellites are likely to be informative in any given
mapping population, and will be especially useful for studying the
evolutionary relationships between the many ecotypes of
Arabidopsis thaliana.
Using the set of recombinant inbred lines developed by Lister
and Dean (1993) all thirty markers were assigned unequivocally to
one chromosome, and linkage for each was established to
neighboring markers at greater than LOD 3.0. Mapping the
microsatellites was straightforward since in most cases the
Columbia/Landsberg polymorphism could be confidently scored using
agarose gel electrophoresis, and the 48 PCR reactions normally used
for mapping could be accommodated in one gel. Ten, one, seven, two
and ten markers were assigned to the five chromosomes
respectively. Given the size of the sample, this distribution appears
biased against chromosomes two and four, but more markers will
need to be mapped before this can be stated unequivocally. Also,
since (GA)n clones were the majority in this study, very little
information on the chromosomal distribution of (AT)n and (A)n
clones is available.
Three point and multi point analyses were carried out to
generate maximum likelihood maps of each chromosome. Based on
this map, XX% of the XXX cM map is within 20 cM of one
microsatellite marker and XX% is within 30 cM, meaning that most of
the genome can be scanned for linkage with any given mutation
using this limited set of markers.
In this study we have confirmed that mono- and dinucleotide
microsatellites are present in the Arabidopsis genome, estimated
their abundance, demonstrated a high probability of finding
polymorphisms between different ecotypes and assigned thirty new
markers to the Arabidopsis linkage map. It is clear that these
markers will be very useful for two of their proposed uses, linkage
mapping of mutations and population studies, due to their high rate
of polymorphism and distribution among the chromosomes. The
other potential use, as a dense STS set for construction of a physical
map is still an open question, since our estimate of the abundance of
useful markers may mean that there are too few to construct a map
of sufficient density. However, the abundance of (A)n and (AT)n
microsatellites, as well as tri- and tetranucleotide repeats, is as yet
unknown and these most likely represent a large pool of
undiscovered polymorphisms that will contribute to this goal.