Evolution of the trnF(GAA) Gene in Arabidopsis Relatives and the Brassicaceae Family: Monophyletic Origin and Subsequent Diversification of
Molecular Biology and Evolution
* Heidelberg Institute of Plant Science, Heidelberg University, Heidelberg, Germany; Department of Systematic Botany, University of Osnabrück, Germany; Botany Department, Natural History Museum, London, United Kingdom; and Max Planck Institute for Chemical Ecology, Jena, Germany
Correspondence: E-mail: marcus.koch@urz.uni-heidelberg.de.
Abstract
Recently, we used the 5'-trnL(UAA)–trnF(GAA) region of the chloroplast DNA for phylogeographic reconstructions and phylogenetic analysis among the genera Arabidopsis, Boechera, Rorippa, Nasturtium, and Cardamine. Although extensive gene duplications are rare in the chloroplast genomes of higher plants, within these taxa the anticodon domain of the trnF(GAA) gene exhibits extensive duplication, with one to eight tandemly repeated copies in close 5' proximity to the functional gene. Interestingly, even in Arabidopsis thaliana we found six putative pseudogenic copies of the functional trnF gene within the 5'-intergenic trnL-trnF spacer. A reexamination of trnL(UAA)-trnF(GAA) regions from numerous published phylogenetic studies among halimolobine, cardaminoid, and other cruciferous taxa not only revealed extensive trnF gene duplications but also favors the hypothesis of a single origin of trnF pseudogene formation during evolution of the Brassicaceae family 16–21 MYA. Conserved sequence motifs from this tandemly repeated region are codistributed nonrandomly throughout the plastome, and we found some similarities with a DNA sequence duplication in the rps7 gene and its adjacent spacer. Our results demonstrate the potential evolutionary dynamics of a plastidic region that is generally regarded as highly conserved and probably cotranscribed but that, as shown here for several genera among cruciferous plants, is strongly characterized by parallel gains and losses of duplicated trnF copies.
Key Words: Brassicaceae • trnF(GAA) • pseudogenes • phylogeny • gene duplication
Introduction
Among plant systematic and phylogeographic studies the chloroplast genome is widely used and generally accepted as an excellent source for molecular information (Olmstead and Palmer 1994; Newton et al. 1999; Hewitt 2001). There are several reasons for this. First, the uniparental inheritance (maternally in most angiosperms, paternally in gymnosperms; Reboud and Zeyl 1994) ensures orthology of sequences. Biparental inheritance is a rare exception (Johnson and Palmer 1989). Second, even within an individual the possibility of recombination between genomes from individual plastids is extremely low, and there are only a few studies describing the occurrence of multimeric chloroplast DNA (cpDNA) genomes or interchromosomal cpDNA recombination (Govindaraju, Dancik, and Wagner 1989; Dally and Second 1990). Third, dramatic changes in gene content and structure only occurred after the chloroplast genome entered a eukaryotic cell via primary endocytobiosis (Martin et al. 1998), whereas land plant plastomes are highly conserved (Goremykin et al. 2003; Kelch, Driskell, and Mishler 2004). However, some studies indicated that the chloroplast genome in higher plants still has the potential for evolutionary changes as indicated by a radically reduced "minimal plastid" genome (parasitic Epifagus: Wolfe, Morden, and Palmer 1992) or possible DNA recombination (lodgepole pine: Marshall, Newton, and Ritland 2001).
A summary of structural mutations in the chloroplast genome is provided by Vijverberg and Bachmann (1999), and it has been concluded that most structural mutations concern indels shorter than 10 bp. These microstructural changes have been shown to be extremely useful even in resolving deep phylogenies (Graham et al. 2000; Löhne and Borsch 2005) and have been analyzed in more detail in the chloroplast genome of Silene (Ingvarsson, Ribstein, and Taylor 2003). Structural mutations such as gene duplications among higher plant plastomes are rarely described. Those examples involve tRNA genes (e.g., Hipkens et al. 1995; Vijverberg and Bachmann 1999; Drábkova et al. 2004), rpl2 and rpl23 (Bowman, Barker, and Dyer 1988), psbA (Lidholm, Szmidt, and Gustafsson 1991), and psaM (Wakasugi et al. 1994). An overview of losses of chloroplast genes in angiosperms is provided by Millen et al. (2001), and it seems obvious that most of the duplications can be manifested only in rearranged chloroplast genomes, such as those of the grasses, legumes, and conifers. Interestingly, evolutionary dynamics of the chloroplast genomes such as rearrangements and nucleotide substitution rates greatly depend on such large-scale rearrangement, for example, the loss of one copy of the inverted repeat (IR) (Palmer and Thompson 1981, 1982; Perry and Wolfe 2002). Consequently, one of the few reports of pseudogenes comes from Vigna angularis (legume family) and describes a ycf2 gene duplication (Perry et al. 2002).
One of the most widely used plastidic molecular markers in plant systematics and phylogeography is the trnT-trnF region, ever since Taberlet et al. (1991) introduced universal primers to amplify the region comprising the trnT(UGU) gene, the trnL(UAA) gene including a group I intron, the trnF(GAA) gene, and the corresponding two spacers. Interestingly, this region not only provided phylogenetic signal to resolve deep angiosperm phylogeny (e.g., Borsch et al. 2003) but also revealed extensive haplotype variation that helps to elucidate speciation processes at the population level (e.g., Dobeš, Mitchell-Olds, and Koch 2004). The trnL-trnF genes are cotranscribed (Kanno and Hirai 1993), and therefore it can be assumed that intron as well as spacer regions are of functional importance.
The trnL group I intron resembles an ancestral intron type, which can be traced back to a single cyanobacterial endosymbiosis, and this region has been analyzed intensively (Kuhsel, Strickland, and Palmer 1990; Xu et al. 1990; Cech et al. 1992; Paquin et al. 1997; Besendahl et al. 2000; Costa, Paulsrud, and Lindblad 2002). However, fewer efforts have been made to understand the function and evolution of the trnL(UAA)-trnF(GAA) spacer region (Bakker et al. 2000; Hamilton, Braverman, and Soria-Hernanz 2003) and to analyze the relevance of putative promoter elements and mutational hotspots with few structural constraints for evolution and phylogenetic reconstructions (Borsch et al. 2003; Quandt et al. 2004).
It is remarkable that the only examples of trnF gene copy number variation outside the Brassicaceae have been reported from Microseris and Uropappus (Vijverberg and Bachmann 1999) and Taraxacum (Wittzell 1999), all of which are members of the tribe Lactuceae of the Asteraceae family, and from Juncus and Luzula of the Juncaceae family (Drábkova et al. 2004). It has been concluded that polymorphic pseudogenes are not subject to purifying selection in Taraxacum, and in the closely related genera Youngia and Crepis no pseudogenes have been observed (Wittzell 1999). This finding raises the question of whether an initial pseudogene formation could have occurred within a particular lineage of Asteraceae, followed by parallel losses and elimination of particular pseudogene copies. This remains uncertain, although Vijverberg and Bachmann (1999) already concluded that an initial duplication must have occurred in an ancestor of the genera under study and that the duplication is rather ancient. The repetitive nature of the pseudogenes is substantiated by interspersed 4-bp (AATA) motifs in Taraxacum (Wittzell 1999), and a mechanism for the generation of pseudogenes via interchromosomal recombination and intrachromosomal duplications has been proposed (Vijverberg and Bachmann 1999).
In this study we investigated the trnF(GAA) gene and its evolution in cruciferous plants. Recently, we detected extensive trnF(GAA) pseudogene formation among the cruciferous genera Cardaminopsis (meanwhile integrated into a newly defined genus Arabidopsis [O'Kane and Al-Shehbaz 1997, 2003]) and Boechera (Koch, Dobeš, and Matschinger 2003; Dobeš, Mitchell-Olds, and Koch 2004). Therefore, herein we aim (1) to reconstruct the evolutionary history of trnF pseudogenes in Brassicaceae with special emphasis on the genus Arabidopsis (which is the most diverse model known so far), (2) to analyze mutational patterns and sequence motifs in the spacer region that might provide insights into the mechanism of pseudogene formation, and (3) to evaluate the positional occurrence of complete or partial trnF pseudogenes in angiosperm chloroplast genomes in order to assess whether they form transposable elements within the plastome, to demonstrate their taxonomic distribution, and to comment on their phylogenetic utility.
Materials and Methods
The DNA sequence data of the trnL(UAA)-trnF(GAA) region were obtained from two different sources. Most of the sequences were selected from previously published studies on crucifer systematics and evolution (table 1). All these sequences had already been deposited in GenBank, and we refer only to the corresponding publication. The second source is a large-scale study of approximately 750 sequenced accessions that focuses on the phylogeography of the genus Arabidopsis; here we present some selected sequences from it (Matschinger and Koch 2003) that represent most of the variation in trnF copy number (table 1). In addition, we submitted sequence data of numerous taxa from the genera Draba and Arabis to GenBank (unpublished data, AF134196–AF134278). However, none of those sequences contained any pseudogene.
Table 1 Distribution of trnF Pseudogenes Among Cruciferous Plants and Number of Multiplicated trnF Anticodon Domains
Detailed protocols of DNA isolation, polymerase chain reaction, and DNA sequencing are given in Dobeš, Mitchell-Olds, and Koch (2004), and the methods used follow standard procedures.
For halimolobine Brassicaceae and several out-groups we used the trnL(UAA)-trnF(GAA) alignment provided by Bailey, Price, and Doyle (2002) as an example to demonstrate pseudogene copy number distribution in the context of a published and robust phylogeny.
Additionally, we selected trnL-F spacer regions from numerous cruciferous taxa and several species from the order Capparales as out-groups (table 1) to cover as many genera as possible. A National Center for Biotechnology Information GenBank search (using the ENTREZ gateway and "keywords trnF and Brassicaceae" at http://www.ncbi.nlm.nih.gov/entrez/) resulted in 726 sequences. However, only a few publications and studies account for more than 99% of these sequences (Draba, Erophila, Tomostima, Cusickiella, and related taxa: Koch and Al-Shehbaz [2002] [78 sequences]; Draba, Schivereckia, Arabis: Koch unpublished [82 sequences: AF134196–AF134278]; Lepidium, Cardaria, Hymenolobus, Pritzelago, Hornungia and related taxa: Mummenhoff, Brüggemann, and Bowman [2001] [82 sequences]; Lepidium: Lee, Mummenhoff, and Bowman [2002] [58 sequences]; selection of taxa from the order Capparales: Hall, Sytsma, and Iltis [2002] [57 sequences in total, 11 from Brassicaceae]; halimolobine Brassicaceae: Bailey, Price, and Doyle [2002] [47 sequences]; Rorippa and Nasturtium: Bleeker, Weber-Sparenberg, and Hurka [2002] [35 sequences]; Rorippa: Bleeker and Hurka [2001] [76 sequences, characterizing haplotypes from 359 individuals]; Cardamine, Rorippa: Bleeker et al. [2002] [24 sequences]; Cardamine: Lihova et al. [2004] [76 sequences]; selection of taxa from tribe Brassiceae: Yang et al. [2002] [12 sequences]; Brassica relatives and Diplotaxis: Lanner [1998] [15 sequences]; Noccaea, Raparia, and Microthlaspi: Koch and Al-Shehbaz [2004] [26 sequences]; Boechera, Cusickiella and related taxa: Dobeš, Mitchell-Olds, and Koch [2004] [103 sequences, characterizing haplotypes from 654 accessions]). The taxa are summarized in table 1.
TrnF(GAA) Pseudogene Recognition and Copy Number in Arabidopsis thaliana
We used the GenBank accession of the chloroplast genome of Arabidopsis thaliana (AP000423) to select the corresponding trnL(UAA)-trnF(GAA) region (bp positions 46894–48247; with 46894–46928 and 47441–47490 for the two exons of trnL, and 48175–48247 for the trnF gene). Initially, we scored this region with the central anticodon domain of the trnF gene (fig. 1). After the recognition of six pseudogenic copies of this anticodon domain (region D) in the trnL-F spacer region, an alignment of these multiplicated copies and its flanking sequences (regions A–C, E) was done manually (fig. 2a). A blast search in the whole chloroplast genome of A. thaliana using these anticodon copies as query revealed the trnF gene as the only possible source of the pseudogenes.
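The scoring step described above can be pictured as a sliding-window scan of the spacer with the anticodon-domain sequence as query. The following is a minimal sketch only, not the procedure actually used in this study: the mismatch threshold, the function names, and the toy sequences are hypothetical and are not taken from AP000423.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_copies(spacer: str, query: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for every window of the spacer
    that differs from the query at no more than max_mismatches positions."""
    k = len(query)
    hits = []
    for start in range(len(spacer) - k + 1):
        d = hamming(spacer[start:start + k], query)
        if d <= max_mismatches:
            hits.append((start, d))
    return hits


# Hypothetical toy data (assumption): three diverged copies of an 11-bp query
# separated by short linkers. These are NOT the real A. thaliana sequences.
QUERY = "GACTGAAAATC"
SPACER = "TTTT" + "GACTGAAAATC" + "AGTA" + "GACTGAGAATC" + "CCCC" + "GATTGAAGATC" + "AA"

hits = find_copies(SPACER, QUERY)
for start, mismatches in hits:
    print(f"copy at position {start + 1}, {mismatches} mismatch(es)")
```

A scan of this kind finds only ungapped copies; copies that differ by indels, as several of the copies in figure 2b do, still require manual alignment.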
FIG. 1.— Nucleotide sequence and secondary structure of the Arabidopsis thaliana trnF gene. Secondary structuring of the DNA is indicated by symbols f–j and f'–j'.
FIG. 2.— (a) Nucleotide sequence of the Arabidopsis thaliana trnL (second exon), the trnL-F intergenic spacer region, and part of the trnF gene. The duplicated copies have been scored from I to VIII, and the different regions have been named A–E. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 1, Supplementary Material online, and (b) of this figure). *, # search motifs and matches for a blast search within the whole plastome (refer to table 2). + search motifs against the whole plastome (refer to table 2). (b) Alignment of eight types of the trnF pseudogene copies demonstrating single nucleotide and indel polymorphism observed in the trnL-F region of Arabidopsis-Cardaminopsis. The results of a phylogenetic analysis based on this alignment are shown in figure 3. Designation of the regions (A–E) follows (a) of this figure.
Table 2 Distribution of Short DNA Motifs A, B, C (Refer to Fig. 2a) Within the Arabidopsis thaliana Chloroplast Genome (the Position Is Given in bp)
FIG. 5.— Phylogenetic relationships among "halimolobine" crucifers and out-group taxa as published by Bailey, Price, and Doyle (2002). The occurrence of multiplicated trnF anticodon domains is indicated.
In order to obtain information about additional co-occurrence of sequences similar to the flanking regions A–C in the chloroplast genome, we searched for exact matches of highly conserved 7- to 8-bp fragments (fig. 2a) within the A. thaliana chloroplast genome.
Regions A–C were also used for subsequent blast searches against the whole chloroplast genome to identify similarly modularized DNA sequences.
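The exact-match search for short motifs can be sketched as follows. This is an illustrative assumption about how such a search might be set up, not the authors' procedure: searching both strands, the function names, and the toy genome and motif are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def exact_hits(genome: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (1-based start, strand) for every exact occurrence of the
    motif on the forward strand and of its reverse complement."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        pos = genome.find(pattern)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = genome.find(pattern, pos + 1)  # step by 1 to allow overlaps
    return sorted(hits)


# Hypothetical toy genome and 7-bp motif (assumption, not from table 2).
GENOME = "AAAGTATTGCCCCAATACTTT"
MOTIF = "AGTATTG"
print(exact_hits(GENOME, MOTIF))
```

A palindromic motif would be reported once per strand at the same position, so such hits would need to be collapsed before counting.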
TrnF Pseudogenes in the Genus Arabidopsis-Cardaminopsis
A selected number of 11 sequences from the trnL(UAA)-trnF(GAA) region of different Arabidopsis species (the former genus Cardaminopsis, for details refer to O'Kane and Al-Shehbaz 1997) are presented here (table 1). The recognition of duplicated sequences has been performed taking advantage of the results from A. thaliana (fig. 2a), and a corresponding alignment has been generated manually (fig. 1, Supplementary Material online). For a deeper understanding of copy number evolution within the genus Arabidopsis, we separated each single pseudogene copy from each Arabidopsis sequence, aligned them accordingly (fig. 2b), and performed a phylogenetic analysis using a parsimony approach (PAUP4.0b10, Swofford 2000) with the heuristic search settings using the tree-bisection-reconnection option and using the option GAPMODE = MISSING. No additional gap coding has been performed (e.g., as binary character) to minimize a bias caused by the alignment of the multiple sequences itself. The bootstrap option of PAUP (1,000 replicates) was used to assess relative support in the unweighted parsimony analysis.
Halimolobine Brassicaceae
The anticodon domain sequence from A. thaliana has also been used to recognize duplicated copies in a recently published study on the systematics of halimolobine Brassicaceae (Bailey, Price, and Doyle 2002). We used the original alignment to demonstrate anticodon domain copy number distribution and its correspondence to the published phylogenetic hypothesis, which is based not only on the trnL-F region but is also supported by sequence data of the internal transcribed spacers of nuclear ribosomal DNA (ITS 1 and ITS 2) and the pistillata intron.
Comparisons Within the Brassicaceae Family
For all trnL-trnF sequences summarized in table 1 we analyzed (1) the occurrence of duplicated sequences of the anticodon domain of the trnF(GAA) gene and (2) the occurrence of the motifs (A–C, E) as characterized in A. thaliana (fig. 2a). The results were compared with a phylogeny of the whole family. The most comprehensive available phylogeny based on a multiple data set (internal transcribed spacers 1 and 2 of nuclear ribosomal DNA, maturase K, alcohol dehydrogenase, and chalcone synthase; with several genes missing for numerous taxa in various combinations) has been published recently (Koch 2003). However, a more robust phylogenetic framework with fewer taxa but clocklike evolution of the corresponding molecular markers (nuclear-encoded chalcone synthase and alcohol dehydrogenase, and plastidic maturase K) has been elaborated (Koch, Haubold, and Mitchell-Olds 2000, 2001), and these phylogenetic frameworks with their corresponding divergence time estimates have been used to show a timeframe for the first occurrence of trnF gene duplications. For methods of calibrating the molecular clock and computing divergence time estimates refer to Koch, Haubold, and Mitchell-Olds (2000, 2001).
Results
TrnF Pseudogenes in A. thaliana
In A. thaliana we characterized six duplicated sequences in total within the 668-bp trnL(UAA)-trnF(GAA) intergenic spacer with the trnF(GAA) anticodon domain as the most highly conserved element (fig. 2a). These duplicated sequences have been enumerated as copies I to VIII. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 2b, fig. 1, Supplementary Material online).
However, the neighboring regions of the anticodon-like sequence (indicated as regions A–C, E) showed only low similarity to the different regions of the trnF(GAA) gene (acceptor stem, D domain, and T domain [fig. 1]), with the exception of 3–6 base pairs at the 5'- and 3'-flanking regions of the D and T domains (fig. 2a).
Of particular interest is a common AGTA motif and its modifications (ATTA, AGGA, CGTA, GGTA), which are frequently found at the 5' end of the different duplicated regions A–C (fig. 2a).
Multiple trnF Pseudogene Copies in Cardaminopsis-Arabidopsis
The twelve different trnL-F spacer sequences of the Arabidopsis species revealed 2–8 (table 1) pseudogenic copies among the different species (figs. 1 and 2, Supplementary Material online). The copy most similar to the trnF gene is pseudogene no. VII (fig. 3a and b and table 3). However, this copy is not present in two Cardaminopsis haplotypes analyzed herein (fig. 1, Supplementary Material online). Such losses of a particular pseudogene were found for all of the eight pseudogene copies in one or another accession. It has been shown recently that A. thaliana and the remaining representatives of the newly defined genus Arabidopsis (former genus Cardaminopsis) diverged from each other roughly 5.8 MYA (Koch, Haubold, and Mitchell-Olds 2001), which provides a good time frame for the evolution of the copies differing in their modularized structure of regions A–E (fig. 2a).
Table 3 Simple Pairwise p-Distance (Mean) and Standard Deviation (Sequence Distance, PAUP4.0b10) of the Different trnF Copies (Region C–E, Refer to Fig. 2a and b) Compared to the Original trnF Gene Among the Twelve Arabidopsis Accessions Investigated Herein
A parsimony analysis using all pseudogene copies separately provides more detailed evidence for their evolutionary history (fig. 3a and b). Copies I, VI, VII, and VIII of A. thaliana cluster together with the corresponding copies of Cardaminopsis (see alignment in fig. 1, Supplementary Material online), indicating that these copies existed prior to the split of the two phylogenetic lineages 5.8 MYA. This is also supported by the fact that at least copies I, VII, and VIII have identical or very similar gap positions (not included in our analysis). The only exception is copy VI, of which only part of the A. thaliana sequence (regions A', C, D, and E, refer to fig. 2a) is homologous to the Cardaminopsis copy VI sequences. Cardaminopsis copy VI served as source for copy V type 2. The parsimony analysis also indicated that copies II, III, and IV of Cardaminopsis most likely evolved independently from the most similar copy VIII. Copy V type 1 is identical to copy V type 2 in its structural alignment and gap information; however, phylogenetic analysis based on single nucleotide polymorphisms placed this copy close to copy IV from Cardaminopsis. This is best explained by two independent duplication events.
FIG. 3.— TrnF pseudogene copy number evolution among Arabidopsis species (Arabidopsis thaliana and former Cardaminopsis). For details of the alignment refer to figure 2b. (a) 50% Majority Rule Consensus Tree based on regions C, D, E only (tree length 44, consistency index [CI] 0.62, 88 trees). (b) 50% Majority Rule Consensus Tree based on the entire region A–E (tree length 82, CI 0.64, 1,000 trees). (c) hypothetical model of successive copy number evolution in A. thaliana and members of the former genus Cardaminopsis.
Consequently, A. thaliana copies II and III evolved independently from Cardaminopsis copies II and III. A schematic summary of pseudogene copy evolution based on parsimony analysis is provided in figure 3c.
Occurrence and Distribution of Pseudogenes Among Cruciferous Plants
Duplicated copies of the trnF(GAA) anticodon domain have been detected in numerous genera of the mustard family, and a summary is given in table 1. We obtained the original alignments from most of the studies listed in table 1, and we were able to search for the duplicated regions directly within these alignments. These searches revealed several findings. The alignment of a phylogenetic study of the order Capparales (Hall, Sytsma, and Iltis 2002) showed that a few sequences (Nasturtium, Barbarea, and Capsella) ended at the 3' end with the first trnF(GAA) pseudogene copy, and the authors did not provide the entire sequence of the trnL-F spacer region including the "true" trnF(GAA) gene. However, this had no effect on their conclusions and results on Capparales systematics. From these three sequences we could therefore estimate only a minimum number of pseudogene copies. Fortunately, all these species have been included in other studies and have been analyzed on a broader scale (e.g., Bailey, Price, and Doyle 2002; Bleeker et al. 2002). A similar situation has been found in Cardamine (Lihova et al. 2004). The alignment of the spacer region ended with a pseudogene and not with the trnF(GAA) gene as indicated, and here we also provide a minimum number of pseudogenes for the taxa analyzed.
In many other cases we found no duplicated anticodon domains (e.g., Draba, Arabis, Noccaea, and others, table 1). Interestingly, in all cases of lacking pseudogenes we also did not find any of the repetitive motifs B, C, and E in the corresponding trnL(UAA)-trnF(GAA) spacer region. However, the prominent motif A/A' is always present in close 5' proximity of the functional trnF gene.
The distribution of the pseudogenic trnF(GAA) tandem repeats is fully congruent with previously published phylogenies (fig. 4) of the Brassicaceae family (Koch 2003; summarized in Koch, Al-Shehbaz, and Mummenhoff 2003), and it is obvious that a first pseudogene copy evolved only once at the base of a highly supported monophyletic lineage (fig. 4). This is the first time that a reliable marker (molecular or morphological) has been described that separates this taxonomically notoriously difficult family, with a relatively deep split in time of approximately 18.5 (mean estimate of matK and chs from node A, fig. 4) to 16 MYA (mean estimate of matK and chs from node B, fig. 4). Divergence time estimates have been taken from previous investigations (Koch, Haubold, and Mitchell-Olds 2001).
FIG. 4.— Phylogenetic relationships among cruciferous plants based on chs and matK sequence data (redrawn from Koch, Haubold, and Mitchell-Olds 2001). Some genera have been added according to Koch (2003) and their phylogenetic position is indicated by a dotted line. Filled circles indicate taxa with trnF pseudogenes. Taxa marked with open circles have been proved to contain no pseudogenes.
However, "non" pseudogene–containing taxa remain paraphyletic in respect to the pseudogene carrying taxa.
The Example "Halimolobine" Brassicaceae
The analysis of the alignment provided by Bailey, Price, and Doyle (2002) revealed varying numbers of a pseudogenic anticodon domain from 1 to 6 (fig. 5). Enumeration of pseudogene anticodon copies followed their occurrence within the alignment and has been adapted to the copy enumeration I to VIII in A. thaliana and Cardaminopsis. This analysis demonstrates that all copies (except for copy number 1) either have been constituted independently several times or have been lost several times in parallel throughout their evolution. Interestingly, the different regions A, A', B, and C are present at the 5' end of the first pseudogene copy in all taxa carrying the pseudogenes. A comparison with taxa that do not carry a pseudogene copy demonstrates that among all cruciferous taxa analyzed herein only region A is highly conserved, whereas regions B, A', and C are missing in non–pseudogene carrying species (data not shown).
TrnF(GAA) Pseudogene Evolution in Angiosperms
We screened the trnL-trnF alignment of Borsch et al. (2003) covering all major groups of angiosperms for trnF pseudogene (or partial anticodon domain) insertion, and none of these taxa contained any duplications. In addition, we also screened this alignment for the different regions A, B, and C occurring in all cruciferous taxa showing anticodon domain duplications. However, none of these regions could be identified with a significant sequence identity among all noncruciferous taxa analyzed by Borsch et al. (2003).
This is also true for those Asteraceae (Microseris, Uropappus, Taraxacum) that represent the only examples of trnF gene copy number variation outside the Brassicaceae (Vijverberg and Bachmann 1999; Wittzell 1999).
However, in these cases the entire trnF gene has been duplicated, which is in sharp contrast to the Brassicaceae with extensive duplication of the trnF anticodon domain only.
Discussion
TrnF Pseudogene Characterization in Cruciferous Plants
The trnF(GAA) pseudogenes from cruciferous plants are quite different from those characterized in Microseris and Uropappus (Vijverberg and Bachmann 1999). In these species the whole gene including both acceptor stem regions has been tandemly duplicated with a sequence identity to the original trnF gene varying from 88% to 99%. A similar situation was found in Taraxacum (Wittzell 1999), with a sequence identity ranging from 80% to 92%. In contrast, in A. thaliana several different repetitive motifs occur (A–C, E) as indicated in figure 2a, which are not part of the functional trnF gene. It is notable that these motifs are also conserved among a variety of different taxa exclusively characterized by anticodon domain duplications, as shown by the halimolobine species data set (fig. 2, Supplementary Material online, e.g., alignment positions 966–1072). The majority of these motifs are not found in trnL-F spacer regions of cruciferous plants lacking such duplications. The only exception is the 5' region A/A'. This motif of 22 base pairs (fig. 2a) is present in all trnL-F spacer regions in closest proximity to the functional trnF gene. Blast searches for regions A, B, and C against the whole chloroplast genome sequence of A. thaliana (AP000423) revealed no significant hits, with the exception of parts of region A matching parts of the rps7 gene and its neighboring trnV-rps7 spacer (table 2 and fig. 2a). Interestingly, the situation changes when we select shorter motifs (7–8 bp) from regions A, B, and C to search for identical motifs throughout the A. thaliana chloroplast genome (table 2 and fig. 2a). As expected for shorter search strings, the number of hits increased greatly. It is also obvious from the summary scores in table 2 that the single hits are randomly distributed all over the plastome, which in this case is 154,478 bp in size. However, the three selected 7- to 8-bp motifs revealed a significant nonrandom clustering. Out of 25 hits in total, 13 co-occur in similar regions of the plastome, a finding for which we have no explanation so far. From these results we can conclude that (1) the flanking sequence regions A–C of the trnF(GAA) anticodon domain are unique and have not been simply transferred from other regions of the plastome and (2) the occurrence of region A in all cruciferous taxa regardless of any anticodon domain duplication provides evidence that the duplicated sequences resulted from rearrangement of the trnF gene and its neighboring areas. However, the finding that the only significant matches of the blast search concern region A (table 2) and, moreover, that these matches are found in a coding gene (rps7) and its neighboring spacer region (spacer trnV-rps7) might indicate that a sequence like that of region A might have driven the first corresponding duplication.
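One way to examine whether hits of different motifs co-occur more often than expected by chance is a simple Monte Carlo comparison against uniformly random positions. The sketch below illustrates that idea only; it is not the test used to obtain the result reported above. The window size, the clustering statistic, the function names, and the example hit positions are all hypothetical, and the circularity of the plastome is ignored for simplicity. Only the genome length is taken from the text.

```python
import random

GENOME_LENGTH = 154_478  # plastome size quoted in the text


def clustered_count(hits, window):
    """Count hits lying within `window` bp of a hit from a different motif.
    `hits` is a list of (position, motif_label) tuples."""
    count = 0
    for i, (pos_i, motif_i) in enumerate(hits):
        if any(motif_j != motif_i and abs(pos_j - pos_i) <= window
               for j, (pos_j, motif_j) in enumerate(hits) if j != i):
            count += 1
    return count


def monte_carlo_p_value(hits, window, n_sim=2000, seed=1):
    """Estimate how often uniformly random positions (same motif labels)
    give at least as many clustered hits as observed."""
    rng = random.Random(seed)
    observed = clustered_count(hits, window)
    at_least = 0
    for _ in range(n_sim):
        simulated = [(rng.randrange(1, GENOME_LENGTH + 1), motif) for _, motif in hits]
        if clustered_count(simulated, window) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_sim + 1)


# Hypothetical hit positions (assumption): NOT the values from table 2.
example_hits = [(1000, "A"), (1200, "B"), (1350, "C"),
                (50000, "A"), (50100, "C"), (120000, "B")]
observed, p_value = monte_carlo_p_value(example_hits, window=500)
print(observed, p_value)
```

The outcome of such a comparison depends strongly on the chosen window, so any window should be fixed before looking at the data.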
A comparison of the distribution of trnF anticodon duplications among cruciferous plants implies a single origin of an initial duplication within a monophyletic lineage (fig. 4). The dates of divergence between nodes A (approximately 18 MYA) and B (approximately 16 MYA) provide time estimates (Koch, Haubold, and Mitchell-Olds 2000, 2001). The phylogenetic hypothesis shown in figure 4 comprises only a limited set of taxa. However, our finding of the distribution of anticodon duplications among cruciferous plants is also fully consistent with a large-scale phylogeny provided recently (Koch 2003) and not shown here.
The consistent co-occurrence of flanking regions with duplicated anticodon domains can be studied in more detail using the halimolobine crucifer data set as an example (Bailey, Price, and Doyle 2002). A. thaliana anticodon pseudogene copy 1 (fig. 2a) is present in all species included in this study (fig. 5, cf. fig. 2, Supplementary Material online: alignment positions 966–1100). Pairwise sequence identity of this pseudogene copy 1 among the different species is always higher than its identity to the original trnF gene (data not shown), which also provides good evidence for the monophyletic origin of the first pseudogene copy.
In addition to the duplicated pseudogenes (anticodon domains) and the neighboring regions A–E, we were also able to characterize promoter elements that show high similarity to a putative sigma70-type bacterial promoter motif (–35 TTGACA/–10 GAGGAT) (Quandt et al. 2004). In a comprehensive study across land plants, this motif has been found consistently (Quandt et al. 2004), and it has been speculated that this promoter represents the original trnF(GAA) gene promoter. However, it has been concluded that the trnF(GAA) gene is cotranscribed with trnL(UAA) (Kanno and Hirai 1993), and consequently the –35 TTGACA/–10 GAGGAT promoter motif in front of the trnF(GAA) gene should be nonfunctional. Our data largely support this conclusion because the –10 element and the –35 element are present in several trnL-F spacer sequences of the genus Arabidopsis (fig. 1, Supplementary Material online: position 198–203, position 900–905), and all duplications are inserted between these two elements. Consequently, it is hardly conceivable that they are still functional.
TrnF Pseudogene Copy Number Evolution: The Genus Arabidopsis
We can only speculate about the mode of origin of the first pseudogene copy, which dates back roughly 17 MYA. However, the example from Cardaminopsis-Arabidopsis provides more detailed insights into the dynamics of subsequent copy number evolution (figs. 3a–c). In all eight cases the newly arisen copy was placed between already existing pseudogenes (fig. 3c), and none moved further downstream of the 5' end of copy I. The parsimony analysis did not recover all groups with high bootstrap support, but tree topologies are congruent when different proportions of the total pseudogene region are selected (fig. 3a vs. fig. 3b), which might indicate that in most cases the total region has been subjected to several duplication events. However, we cannot exclude additional recombination events, and the example of copy V might indicate such a situation: relative position and gap structure are totally conserved between both copies (fig. 2b), but parsimony analysis does not recognize them as orthologues (fig. 3a and b).
Similarly, genetic distances are not always congruent with our hypothesis of trnF pseudogene evolution (table 3). If copy I is regarded as the ancestral type, one might expect it to show the highest sequence distance from the original functional gene. This is not the case: copies VI and VIII show significantly higher distance values than copies I or VII. Our sequence distance values yield a mutation rate for regions C–E (fig. 2a) of between 2.4 × 10⁻⁸ and 3.8 × 10⁻⁸ mutations/site/year. These values exceed the normal mutation rate of the entire trnL-intron–trnL-F spacer region by a factor of 20 (3.6 × 10⁻⁹ to 7.7 × 10⁻⁹, e.g., calculated in Mummenhoff et al. 2004), which can be at least partly explained by structural mutations, such as recombination resulting in new copies, raising the apparent mutation rate of single nucleotides. Our data also suggest, speculatively, that the high conservation of the 5' region of the first pseudogene copy (as well as of the 3' part and, by selection, the trnF gene) might result from these regions not being prone to recombination, in contrast to the region in between.
However, further research is needed to understand the underlying evolutionary mechanisms.
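Rates such as those quoted above are commonly derived from a pairwise distance and a divergence time. The following sketch assumes the widely used relation r = d / (2T), in which the distance d accumulates along both lineages since their split at time T. Whether the authors used exactly this relation is not stated in the text, and the input values are hypothetical.

```python
def rate_per_site_per_year(distance: float, divergence_years: float) -> float:
    """Rate r = d / (2T) for a pairwise distance d and a divergence time T in years."""
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    if distance < 0:
        raise ValueError("distance cannot be negative")
    return distance / (2.0 * divergence_years)


# Hypothetical example: 3.4% pairwise distance, split dated to 17 MYA.
example_rate = rate_per_site_per_year(0.034, 17e6)
```

Because the rate scales linearly with the distance, any structural event that inflates the apparent distance between two copies (for example, recombination between copies) inflates the inferred rate by the same proportion.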
Phylogenetic Utility of trnF Pseudogenes
The evolutionary history of the Brassicaceae on a family-wide scale is still poorly understood (Koch 2003; Koch, Al-Shehbaz, and Mummenhoff 2003). The most important conclusion of the various phylogenetic studies published so far is that traditional classification schemes based on morphology, embryology, or cytology often do not reflect phylogenetic relationships, depending on the taxonomic level considered. The occurrence of trnF pseudogenes among cruciferous plants is the first character defining a significant split in the deep Brassicaceae phylogeny roughly 16–18 MYA. The corresponding clade comprises taxa from various artificially delimited tribes (Sisymbrieae, Arabideae, Lepidieae) as defined by traditional taxonomists such as Janchen (1942), Schulz (1936), or Hayek (1911). Future molecular studies might substantiate our findings on a family-wide scale and help clarify the systematic situation in the mustard family, as has been done on the basis of structural mutations in the chloroplast genome in various families (Asteraceae: Jansen and Palmer 1987; Fabaceae: Bruneau, Doyle, and Palmer 1990; Doyle, Lavin, and Bruneau 1992; Poaceae: Doyle et al. 1992; Doyle, Doyle, and Palmer 1995; and reviewed by D. E. Soltis and P. S. Soltis 1998).
Acknowledgements
This work was supported by grants from the Austrian Science Foundation—FWF (GEN-15609 and GEN-14463) and the German Science Foundation—DFG (Ko-2302/1-1) to M.K. We also thank all authors providing us with their original DNA sequence alignments.
References
Bailey, C. D., R. A. Price, and J. J. Doyle. 2002. Systematics of the halimolobine Brassicaceae: evidence from three loci and morphology. Syst. Bot. 27:318–332.
Bakker, F. T., A. Culham, R. Gomez-Martinez, J. Carvalho, J. Compton, R. Dawtrey, and M. Gibby. 2000. Pattern of nucleotide substitution in angiosperm cpDNA trnL(UAA)-trnF(GAA) regions. Mol. Biol. Evol. 17:1146–1155.
Besendahl, A., Y.-L. Qiu, J. Lee, J. D. Palmer, and D. Bhattacharya. 2000. The cyanobacterial origin and vertical transmission of the plastid tRNALeu group-I-intron. Curr. Genet. 37:12–23.
Bleeker, W., A. Franzke, K. Pollmann, A. H. D. Brown, and H. Hurka. 2002. Phylogeny and biogeography of Southern Hemisphere high-mountain Cardamine species (Brassicaceae). Aust. Syst. Bot. 15:575–581.
Bleeker, W., and H. Hurka. 2001. Introgressive hybridization in Rorippa (Brassicaceae): gene flow and its consequences in natural and anthropogenic habitats. Mol. Ecol. 10:2013–2022.
Bleeker, W., C. Weber-Sparenberg, and H. Hurka. 2002. Chloroplast DNA variation and biogeography in the genus Rorippa Scop. (Brassicaceae). Plant Biol. 4:104–111.
Borsch, T., K. W. Hilu, D. Quandt, V. Wilde, C. Neinhuis, and W. Barthlott. 2003. Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol. 16:558–576.
Bowman, C. M., R. F. Barker, and T. A. Dyer. 1988. The location and possible evolutionary significance of small dispersed repeats in wheat ctDNA. Curr. Genet. 10:931–941.
Bruneau, A., J. J. Doyle, and J. D. Palmer. 1990. A chloroplast DNA structural mutation as a subtribal character in the Phaseoleae (Leguminosae). Syst. Bot. 15:378–386.
Cech, T. R., D. Herschlag, J. A. Piccirilli, and A. M. Pyle. 1992. RNA catalysis by a group I ribozyme: developing a model for transition state stabilization. J. Biol. Chem. 267:17479–17482.
Costa, J. L., P. Paulstrud, and P. Lindblatt. 2002. The cyanobacterial tRNALeu(UAA) intron: evolutionary patterns in a genetic marker. Mol. Biol. Evol. 19:850–857.
Dally, A. M., and G. Second. 1990. Chloroplast DNA diversity in wild and cultivated species of rice (Genus Oryza, section Oryza). Cladistic-mutation and genetic-distance analysis. Theor. Appl. Genet. 80:209–222.
Dobe, C., T. Mitchell-Olds, and M. Koch. 2004. Extensive chloroplast haplotype variation indicates Pleistocene hybridization and radiation of North American Arabis drummondii, A. xdivaricarpa, and A. holboellii (Brassicaceae). Mol. Ecol. 13:349–370.
Doyle, J. J., J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson. 1992. Chloroplast DNA inversions and the origin of the grass family (Poaceae). Proc. Natl. Acad. Sci. USA 89:7722–7726.
Doyle, J. J., J. L. Doyle, and J. D. Palmer. 1995. Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20:272–294.
Doyle, J. J., M. Lavin, and A. Bruneau. 1992. Contributions of molecular data to papillionoid legume systematics. Pp. 223–251 in P. S. Soltis, D. E. Soltis, and J. J. Doyle, eds. Molecular systematics of plants. Chapman and Hall, New York.
Drábkova, L., J. Kirschner, Č. Vlček, and V. Pačes. 2004. TrnL-trnF intergenic spacer and trnL intron define major clades within Luzula and Juncus (Juncaceae): importance of structural mutations. J. Mol. Evol. 59:1–10.
Goremykin, V. V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20:1499–1505.
Govindaraju, D. R., B. R. Dancik, and D. B. Wagner. 1989. Novel chloroplast DNA polymorphism in a sympatric region of two pines. J. Evol. Biol. 2:49–59.
Graham, S. W., P. A. Reeves, C. E. Burns, and R. G. Olmstead. 2000. Microstructural changes in non-coding DNA: interpretation, evolution and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161:S83–S96.
Hall, J. C., K. J. Sytsma, and H. H. Iltis. 2002. Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Am. J. Bot. 89:1826–1842.
Hamilton, M. B., J. M. Braverman, and D. F. Soria-Hernanz. 2003. Patterns and relative rates of nucleotide and insertion/deletion evolution at six chloroplast intergenic regions in New World species of the Lecythidaceae. Mol. Biol. Evol. 20:1710–1721.
Hayek, A. 1911. Entwurf eines Cruciferensystems auf phylogenetischer Grundlage. Beih. Bot. Centralbl. 27:127–335.
Hewitt, G. M. 2001. Speciation, hybrid zones and phylogeography—or seeing genes in space and time. Mol. Ecol. 10:537–549.
Hipkens, V. D., K. A. Marshall, D. B. Neale, W. H. Rottmann, and S. H. Strauss. 1995. A mutation hotspot in the chloroplast genome of a conifer (Douglas fir: Pseudotsuga) is caused by variability in the number of direct repeats from a partially duplicated tRNA gene. Curr. Genet. 27:527–579.
Ingvarsson, P. K., S. Ribstein, and D. R. Taylor. 2003. Molecular Evolution of insertions and deletions in the chloroplast genome of Silene. Mol. Biol. Evol. 20:1737–1740.
Janchen, E. 1942. Das System der Cruciferen. Österr. Bot. Z. 91:1–28.
Jansen, R. K., and J. D. Palmer. 1987. A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proc. Natl. Acad. Sci. USA 84:5818–5822.
Johnson, L. B., and J. D. Palmer. 1989. Heteroplasmy of chloroplast DNA in Medicago. Plant Mol. Biol. 12:3–11.
Kanno, A., and A. Hirai. 1993. A transcription map of the chloroplast genome from rice (Oryza sativa). Curr. Genet. 23:166–174.
Kelch, D. G., A. Driskell, and B. Mishler. 2004. Inferring phylogeny using genomic characters: a case study using land plant plastomes. Monogr. Syst. Bot. Missouri Bot. Gard. 98:3–12.
Koch, M. 2003. Molecular phylogenetics, evolution and population biology in the Brassicaceae. Pp. 1–35 in A. K. Sharma and A. Sharma, eds. Plant genome: biodiversity and evolution, Vol. 1. Phanerogams. Science Publishers, Inc., Enfield, N.H.
Koch, M., and I. A. Al-Shehbaz. 2002. Molecular data indicate complex intra- and intercontinental differentiation of American Draba (Brassicaceae). Ann. Mo. Bot. Gard. 89:88–109.
———. 2004. Taxonomic and phylogenetic evaluation of the American "Thlaspi" species: identity and relationship to the Eurasian genus Noccaea (Brassicaceae). Syst. Bot. 29:375–384.
Koch, M., I. A. Al-Shehbaz, and K. Mummenhoff. 2003. Molecular systematics, evolution and population biology in the mustard family (Brassicaceae). Ann. Mo. Bot. Gard. 90:151–171.
Koch, M., C. Dobe, and M. Matschinger. 2003. The trnF(GAA) gene in cruciferous plants: extensive duplication, variation in copy number and parallel evolution. Palmarum Hortus Francofurtensis 7:54.
Koch, M., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17:1483–1498.
———. 2001. Molecular systematics of the cruciferae: evidence from coding plastome matK and nuclear CHS sequences. Am. J. Bot. 88:534–544.
Kuhsel, M. G., R. Strickland, and J. D. Palmer. 1990. An ancient group I intron shared by eubacteria and chloroplasts. Science 250:1570–1573.
Lanner, C. 1998. Relationships of wild Brassica species with chromosome number 2n=18, based on comparison of the DNA sequence of the chloroplast intergenic region between trnL(UAA) and trnF(GAA). Can. J. Bot. 76:228–237.
Lee, J.-Y., K. Mummenhoff, and J. L. Bowman. 2002. Allopolyploidization and evolution of species with reduced floral structures in Lepidium L. (Brassicaceae). Proc. Natl. Acad. Sci. USA 99:16835–16840.
Lidholm, J., A. Szmidt, and P. Gustafsson. 1991. Duplication of the psbA gene in the chloroplast genome of two Pinus species. Mol. Gen. Genet. 226:345–352.
Lihova, J., J. Fuertes-Aguilar, K. Marhold, and G. Nieto-Feliner. 2004. Origin of the disjunct tetraploid Cardamine amporitana (Brassicaceae) assessed with nuclear and chloroplast DNA sequence data. Am. J. Bot. 91:1231–1242.
Löhne, C., and T. Borsch. 2005. Molecular evolution and phylogenetic utility of the petD group II intron: a case study in basal angiosperms. Mol. Biol. Evol. 22:1–16.
Marshall, H. D., C. Newton, and K. Ritland. 2001. Sequence-repeat polymorphisms exhibit the signature of recombination in lodgepole pine chloroplast DNA. Mol. Biol. Evol. 18:2136–2138.
Martin, W., B. Stoebe, V. Goremykin, S. Hansmann, M. Hasegawa, and K. V. Kowallik. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:161–165.
Matschinger, M., and M. Koch. 2003. Molecular systematics, phylobiogeography and evolution of the genus Cardaminopsis Hayek (Brassicaceae), the closest relatives of the model plant Arabidopsis thaliana (L.) Heynh. Palmarum Hortus Francofurtensis 7:196.
Millen, R. S., R. G. Olmstead, K. L. Adams et al. (12 co-authors). 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658.
Mummenhoff, K., H. Brüggemann, and J. Bowman. 2001. Chloroplast DNA phylogeny and biogeography of the genus Lepidium (Brassicaceae). Am. J. Bot. 88:2051–2063.
Mummenhoff, K., P. Linder, N. Friesen, J. L. Bowman, J.-Y. Lee, and A. Franzke. 2004. Molecular evidence for bicontinental hybridogenous genomic constitution in Lepidium sensu stricto (Brassicaceae) species from Australia and New Zealand. Am. J. Bot. 91:254–261.
Newton, A. C., T. R. Allnutt, A. C. M. Gillies, A. J. Lowe, and R. A. Ennos. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends Ecol. Evol. 14:140–145.
O'Kane, S. L. Jr., and I. A. Al-Shehbaz. 1997. A synopsis of Arabidopsis (Brassicaceae). Novon 7:323–327.
O'Kane, S., and I. A. Al-Shehbaz. 2003. Phylogenetic position and generic limits of Arabidopsis (Brassicaceae) based on sequences of nuclear ribosomal DNA. Ann. Mo. Bot. Gard. 90:603–612.
Olmstead, R. G., and J. D. Palmer. 1994. Chloroplast DNA systematics: a review of methods and data analysis. Am. J. Bot. 81:1205–1224.
Palmer, J. D., and W. F. Thompson. 1981. Rearrangements in the chloroplast genomes of mung bean and pea. Proc. Natl. Acad. Sci. USA 78:5533–5537.
———. 1982. Chloroplast DNA rearrangements are more frequent when a large inverted repeat sequence is lost. Cell 29:537–550.
Paquin, B., S. D. Kathe, S. A. Nierzwicki-Bauer, and D. A. Shub. 1997. Origin and evolution of group I introns in cyanobacterial tRNA genes. J. Bacteriol. 179:6798–6806.
Perry, A. S., S. Brennan, D. J. Murphy, T. A. Kavanagh, and K. H. Wolfe. 2002. Evolutionary re-organization of a large operon in Adzuki bean chloroplast DNA caused by inverted repeat movement. DNA Res. 9:157–162.
Perry, A. S., and K. H. Wolfe. 2002. Nucleotide substitution rates in legume chloroplast DNA depend on the presence of the inverted repeat. J. Mol. Evol. 55:501–508.
Quandt, D., K. Müller, M. Stech, J.-P. Frahm, W. Frey, K. W. Hilu, and T. Borsch. 2004. Molecular evolution of the chloroplast trnL-F region in land plants. Monogr. Syst. Bot. Missouri Bot. Gard. 98:13–37.
Reboud, X., and C. Zeyl. 1994. Organelle inheritance in plants. Heredity 72:132–140.
Schulz, O. E. 1936. Cruciferae. Pp. 227–658 in A. Engler and K. Prantl, eds. Die natürlichen Pflanzenfamilien, Vol. 17B. Verlag von Wilhelm Engelmann, Leipzig.
Soltis, D. E., and P. S. Soltis. 1998. Choosing an approach and appropriate gene for phylogenetic analysis. Pp. 1–42 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular systematics of plants II. DNA sequencing. Kluwer Academic Publishers, London.
Swofford, D. L. 2000. PAUP* 4.0b10. Sinauer Associates, Sunderland, Mass.
Taberlet, P., L. Gielly, and G. Pautou. 1991. Universal primers for amplification of three non-coding chloroplast regions. Plant Mol. Biol. 17:1105–1109.
Vijverberg, K., and K. Bachmann. 1999. Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genome of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol. Biol. Evol. 16:1329–1340.
Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91:9794–9798.
Wittzell, H. 1999. Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions. Mol. Ecol. 8:2023–2035.
Wolfe, K. H., C. W. Morden, and J. D. Palmer. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc. Natl. Acad. Sci. USA 89:10648–10652.
Xu, M.-Q., S. D. Kathe, H. Goodrich-Blair, S. A. Nierzwicki-Bauer, and D. A. Shub. 1990. Bacterial origin of a chloroplast intron: conserved self-splicing group-I introns in cyanobacteria. Science 250:1566–1570.
Yang, Y.-W., P.-Y. Tai, Y. Chen, and W.-H. Li. 2002. A study of the phylogeny of Brassica rapa, B. nigra, Raphanus sativus, and their related genera using noncoding regions of the chloroplast DNA. Mol. Phylogenet. Evol. 23:268–275.
(Marcus A. Koch*, Christop)
Key Words: Brassicaceae • trnF(GAA) • pseudogenes • phylogeny • gene duplication
Introduction
In plant systematic and phylogeographic studies the chloroplast genome is widely used and generally accepted as an excellent source of molecular information (Olmstead and Palmer 1994; Newton et al. 1999; Hewitt 2001). There are several reasons for this. First, uniparental inheritance (maternal in most angiosperms, paternal in gymnosperms; Reboud and Zeyl 1994) ensures orthology of sequences. Biparental inheritance is a rare exception (Johnson and Palmer 1989). Second, even within an individual the possibility of recombination between genomes from individual plastids is extremely low, and there are only a few studies describing the occurrence of multimeric chloroplast DNA (cpDNA) genomes or interchromosomal cpDNA recombination (Govindaraju, Dancik, and Wagner 1989; Dally and Second 1990). Third, dramatic changes in gene content and structure occurred only after the chloroplast genome entered a eukaryotic cell via primary endocytobiosis (Martin et al. 1998), whereas land plant plastomes are highly conserved (Goremykin et al. 2003; Kelch, Driskell, and Mishler 2004). However, some studies indicate that the chloroplast genome in higher plants still has the potential for evolutionary change, as shown by a radically reduced "minimal plastid" genome (parasitic Epifagus: Wolfe, Morden, and Palmer 1992) or possible DNA recombination (lodgepole pine: Marshall, Newton, and Ritland 2001).
A summary of structural mutations in the chloroplast genome is provided by Vijverberg and Bachmann (1999), who concluded that most structural mutations concern indels <10 bp. These microstructural changes have been shown to be extremely useful even in resolving deep phylogenies (Graham et al. 2000; Löhne and Borsch 2005) and have been analyzed in more detail in the chloroplast genome of Silene (Ingvarsson, Ribstein, and Taylor 2003). Structural mutations such as gene duplications are rarely described among higher plant plastomes. The known examples involve tRNA genes (e.g., Hipkens et al. 1995; Vijverberg and Bachmann 1999; Drábkova et al. 2004), rpl2 and rpl23 (Bowman, Barker, and Dyer 1988), psbA (Lidholm, Szmidt, and Gustafsson 1991), and psaM (Wakasugi et al. 1994). An overview of losses of chloroplast genes in angiosperms is provided by Millen et al. (2001), and it seems that most of the duplications become established only in rearranged chloroplast genomes, such as those of the grasses, legumes, and conifers. Interestingly, evolutionary dynamics of chloroplast genomes, such as rearrangements and nucleotide substitution rates, depend greatly on such large-scale rearrangement, for example, the loss of one copy of the inverted repeat (IR) (Palmer and Thompson 1981, 1982; Perry and Wolfe 2002). Consistent with this, one of the few reports of pseudogenes comes from Vigna angularis (legume family) and describes a ycf2 gene duplication (Perry et al. 2002).
The trnT-trnF region has been one of the most widely used plastidic molecular markers in plant systematics and phylogeography since Taberlet et al. (1991) introduced universal primers to amplify the region comprising the trnT(UGU) gene, the trnL(UAA) gene including a group I intron, the trnF(GAA) gene, and the corresponding two spacers. Interestingly, this region not only provided phylogenetic signal to resolve deep angiosperm phylogeny (e.g., Borsch et al. 2003) but also revealed extensive haplotype variation that elucidates speciation processes at the population level (e.g., Dobe, Mitchell-Olds, and Koch 2004). The trnL-trnF genes are cotranscribed (Kanno and Hirai 1993), and therefore it can be assumed that intron as well as spacer regions are of functional importance.
The trnL group I intron resembles an ancestral intron type, which can be traced back to a single cyanobacterial endosymbiosis, and this region has been analyzed intensively (Kuhsel, Strickland, and Palmer 1990; Xu et al. 1990; Cech et al. 1992; Paquin et al. 1997; Besendahl et al. 2000; Costa, Paulstrud, and Lindblatt 2002). However, less effort has been devoted to understanding the function and evolution of the trnL(UAA)-trnF(GAA) spacer region (Bakker et al. 2000; Hamilton, Braverman, and Soria-Hernanz 2003) and to analyzing the relevance of putative promoter elements and mutational hotspots with few structural constraints for evolution and phylogenetic reconstructions (Borsch et al. 2003; Quandt et al. 2004).
It is remarkable that the only examples of trnF gene copy number variation outside the Brassicaceae have been reported from Microseris and Uropappus (Vijverberg and Bachmann 1999) and Taraxacum (Wittzell 1999), all members of the tribe Lactuceae of the Asteraceae family, and from Juncus and Luzula of the Juncaceae family (Drábkova et al. 2004). It has been concluded that polymorphic pseudogenes are not subject to purifying selection in Taraxacum, and in the closely related genera Youngia and Crepis no pseudogenes have been observed (Wittzell 1999). This finding raises the question of whether an initial pseudogene formation could have occurred within a particular lineage of Asteraceae, followed by parallel losses and elimination of particular pseudogene copies. This remains uncertain, although Vijverberg and Bachmann (1999) already concluded that an initial duplication must have occurred in an ancestor of the genera under study and that the duplication is rather ancient. The repetitive nature of the pseudogenes is substantiated by interspersed 4-bp (AATA) motifs in Taraxacum (Wittzell 1999), and a mechanism for the generation of pseudogenes via interchromosomal recombination and intrachromosomal duplications has been proposed (Vijverberg and Bachmann 1999).
In this study we investigated the trnF(GAA) gene and its evolution in cruciferous plants. Recently, we detected extensive trnF(GAA) pseudogene formation among the cruciferous genera Cardaminopsis (meanwhile integrated into a newly defined genus Arabidopsis [O'Kane and Al-Shehbaz 1997, 2003]) and Boechera (Koch, Dobe, and Matschinger 2003; Dobe, Mitchell-Olds, and Koch 2004). Therefore, herein we aim (1) to reconstruct the evolutionary history of trnF pseudogenes in Brassicaceae with special emphasis on the genus Arabidopsis (which is the most diverse model known so far), (2) to analyze mutational patterns and sequence motifs in the spacer region that might provide insights into the mechanism of pseudogene formation, and (3) to evaluate the positional occurrence of complete or partial trnF pseudogenes in angiosperm chloroplast genomes in order to assess whether they form transposable elements within the plastome, to demonstrate their taxonomic distribution, and to comment on their phylogenetic utility.
Materials and Methods
The DNA sequence data of the trnL(UAA)-trnF(GAA) region were obtained from two different sources. Most of the sequences were selected from previously published studies on crucifer systematics and evolution (table 1). All these sequences had already been deposited in GenBank, and we refer only to the corresponding publication. The second source is a large-scale study of approximately 750 sequenced accessions focusing on the phylogeography of the genus Arabidopsis; here we present selected sequences from it (Matschinger and Koch 2003) representing most of the variation in trnF copy number (table 1). In addition, we submitted sequence data of numerous taxa from the genera Draba and Arabis to GenBank (unpublished data, AF134196–AF134278). However, none of those sequences contained any pseudogene.
Table 1 Distribution of trnF Pseudogenes Among Cruciferous Plants and Number of Multiplicated trnF Anticodon Domains
Detailed protocols of DNA isolation, polymerase chain reaction, and DNA sequencing are given in Dobe, Mitchell-Olds and Koch (2004), and the methods used follow standard procedures.
For halimolobine Brassicaceae and several out-groups we used the trnL(UAA)-trnF(GAA) alignment provided by Bailey, Price, and Doyle (2002) as an example to demonstrate pseudogene copy number distribution in the context of a published and robust phylogeny.
Additionally, we selected trnL-F spacer regions from numerous cruciferous taxa and several species from the order Capparales as out-groups (table 1) to cover as many genera as possible. A National Center for Biotechnology Information GenBank search (using the ENTREZ gateway and the keywords "trnF and Brassicaceae" at http://www.ncbi.nlm.nih.gov/entrez/) resulted in 726 sequences. More than 99% of these sequences derive from only a few publications and studies (Draba, Erophila, Tomostima, Cusickiella, and related taxa: Koch and Al-Shehbaz [2002] [78 sequences]; Draba, Schivereckia, Arabis: Koch unpublished [82 sequences: AF134196–AF134278]; Lepidium, Cardaria, Hymenolobus, Pritzelago, Hornungia and related taxa: Mummenhoff, Brüggemann, and Bowman [2001] [82 sequences]; Lepidium: Lee, Mummenhoff, and Bowman [2002] [58 sequences]; selection of taxa from the order Capparales: Hall, Sytsma, and Iltis [2002] [57 sequences in total, 11 from Brassicaceae]; halimolobine Brassicaceae: Bailey, Price, and Doyle [2002] [47 sequences]; Rorippa and Nasturtium: Bleeker, Weber-Sparenberg, and Hurka [2002] [35 sequences]; Rorippa: Bleeker and Hurka [2001] [76 sequences, characterizing haplotypes from 359 individuals]; Cardamine, Rorippa: Bleeker et al. [2002] [24 sequences]; Cardamine: Lihova et al. [2004] [76 sequences]; selection of taxa from tribe Brassiceae: Yang et al. [2002] [12 sequences]; Brassica relatives and Diplotaxis: Lanner [1998] [15 sequences]; Noccaea, Raparia, and Microthlaspi: Koch and Al-Shehbaz [2004] [26 sequences]; Boechera, Cusickiella and related taxa: Dobe, Mitchell-Olds, and Koch [2004] [103 sequences, characterizing haplotypes from 654 accessions]). The taxa are summarized in table 1.
TrnF(GAA) Pseudogene Recognition and Copy Number in Arabidopsis thaliana
We used the GenBank accession of the chloroplast genome of Arabidopsis thaliana (AP000423) to select the corresponding trnL(UAA)-trnF(GAA) region (bp positions 46894–48247; with 46894–46928 and 47441–47490 for the two exons of trnL, and 48175–48247 for the trnF gene). Initially, we scanned this region for the central anticodon domain of the trnF gene (fig. 1). After the recognition of six pseudogenic copies of this anticodon domain (region D) in the trnL-F spacer region, these duplicated copies and their flanking sequences (regions A–C, E) were aligned manually (fig. 2a). A BLAST search of the whole chloroplast genome of A. thaliana using these anticodon copies as queries revealed the trnF gene as the only possible source of the pseudogenes.
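The coordinates given above are 1-based and inclusive, as is usual for GenBank annotations. A minimal sketch of how such coordinates translate into feature lengths and into a slice of a genome string follows; the helper names are hypothetical, and the genome sequence itself is not reproduced here.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def extract(genome: str, start: int, end: int) -> str:
    """Return the subsequence for 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates outside the sequence")
    return genome[start - 1:end]  # Python slices are 0-based and end-exclusive


# Coordinates quoted in the text for accession AP000423.
FEATURES = {
    "trnL exon 1": (46894, 46928),
    "trnL exon 2": (47441, 47490),
    "trnF gene": (48175, 48247),
    "whole region": (46894, 48247),
}

lengths = {name: feature_length(*span) for name, span in FEATURES.items()}
for name, length in lengths.items():
    print(f"{name}: {length} bp")
```

With these coordinates the two trnL exons are 35 and 50 bp long, the trnF gene is 73 bp long, and the whole region spans 1,354 bp.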
FIG. 1.— Nucleotide sequence and secondary structure of the Arabidopsis thaliana trnF gene. Secondary structuring of the DNA is indicated by symbols f–j and f'–j'.
FIG. 2.— (a) Nucleotide sequence of the Arabidopsis thaliana trnL (2. exon), the trnL-F intergenic spacer region, and part of the trnF gene. The duplicated copies have been scored from I to VIII, and the different regions have been named A–E. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 1 and (b) of this figure). *, # search motifs and matches for a blast search within the whole plastome (refer to table 2). + search motifs against the whole plastome (refer to table 2). (b) Alignment of eight types of the trnF pseudogene copies demonstrating single nucleotide and indel polymorphism observed in the trnL-F region of Arabidopsis-Cardaminopsis. The results of a phylogenetic analysis based on this alignment are shown in figure 5. Designation of the several regions (A–E) follows (a) of this figure.
Table 2 Distribution of Short DNA Motifs A, B, C (Refer to Fig. 2a) Within the Arabidopsis thaliana Chloroplast Genome (the Position Is Given in bp)
FIG. 5.— Phylogenetic relationships among "halimolobine" crucifers and out-group taxa as published by Bailey, Price, and Doyle (2002). The occurrence of multiplicated trnF anticodon domains is indicated.
In order to obtain information about additional co-occurrence of sequences similar to the flanking regions A–C in the chloroplast genome, we searched for exact matches of highly conserved 7- to 8-bp fragments (fig. 2a) within the A. thaliana chloroplast genome.
Regions A–C were also used for subsequent BLAST searches against the whole chloroplast genome to identify similarly modularized DNA sequences.
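A search for exact matches of short motifs can be sketched as follows. The sketch assumes that both strands are of interest and that overlapping matches should be reported; the source does not specify either point. The motif and the toy sequence are hypothetical and are not the motifs marked in figure 2a.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact(genome: str, motif: str) -> list[tuple[int, str]]:
    """1-based start positions of all exact matches, with strand.

    Matches on the minus strand are reported by their position on the
    plus strand. A palindromic motif is reported once per strand.
    """
    genome = genome.upper()
    hits = []
    for strand, query in (("+", motif.upper()), ("-", reverse_complement(motif))):
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = genome.find(query, pos + 1)  # step by one to allow overlaps
    return sorted(hits)


# Hypothetical 8-bp motif and toy sequence, for illustration only.
toy_genome = "TTGATTTTCAGAAACCTGAAAATCAA"
hits = find_exact(toy_genome, "GATTTTCA")
print(hits)  # [(3, '+'), (17, '-')]
```

For motifs of only 7 to 8 bp, some exact matches are expected by chance in a genome of this size, so hit positions are informative mainly when several motifs co-occur in the same neighborhood.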
TrnF Pseudogenes in the Genus Arabidopsis-Cardaminopsis
A selection of 11 sequences from the trnL(UAA)-trnF(GAA) region of different Arabidopsis species (the former genus Cardaminopsis; for details refer to O'Kane and Al-Shehbaz 1997) is presented here (table 1). Duplicated sequences were recognized using the results from A. thaliana (fig. 2a), and a corresponding alignment was generated manually (fig. 1, Supplementary Material online). For a deeper understanding of copy number evolution within the genus Arabidopsis, we separated each single pseudogene copy from each Arabidopsis sequence, aligned the copies accordingly (fig. 2b), and performed a phylogenetic analysis using a parsimony approach (PAUP* 4.0b10, Swofford 2000) with heuristic search settings, the tree-bisection-reconnection option, and the option GAPMODE = MISSING. No additional gap coding (e.g., as binary characters) was performed, to minimize bias caused by the alignment of the multiple copies itself. The bootstrap option of PAUP (1,000 replicates) was used to assess relative support in the unweighted parsimony analysis.
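The resampling step behind a nonparametric bootstrap can be sketched in a few lines. This is only an illustration of the principle (alignment columns drawn with replacement) and not a reproduction of the PAUP implementation; the tree search that would follow each replicate is omitted, and the alignment below is hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so columns stay intact.
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Hypothetical alignment of three pseudogene copies; "-" marks a gap.
toy_alignment = {
    "copy_I": "GATTTTCAGT",
    "copy_II": "GATT-TCAGT",
    "copy_III": "GACTTTCAAT",
}

replicate = bootstrap_alignment(toy_alignment, random.Random(1))
```

In a full analysis, each of the 1,000 replicates would be subjected to the same parsimony search, and the support for a group would be the fraction of replicate trees in which it appears.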
Halimolobine Brassicaceae
The anticodon domain sequence from A. thaliana was also used to recognize duplicated copies in a recently published study on the systematics of halimolobine Brassicaceae (Bailey, Price, and Doyle 2002). We used the original alignment to show the distribution of anticodon domain copy number and its correspondence to the published phylogenetic hypothesis, which is based not only on the trnL-F region but also on sequence data from the internal transcribed spacers of nuclear ribosomal DNA (ITS 1 and ITS 2) and the pistillata intron.
Comparisons Within the Brassicaceae Family
For all trnL-trnF sequences summarized in table 1 we analyzed (1) the occurrence of duplicated sequences of the anticodon domain of the trnF(GAA) gene and (2) the occurrence of the several motifs (A–C, E) as characterized in A. thaliana (fig. 2a). The results were compared with a phylogeny of the whole family. The most comprehensive available phylogeny, based on a multiple data set (internal transcribed spacers 1 and 2 of nuclear ribosomal DNA, maturase K, alcohol dehydrogenase, and chalcone synthase; with several genes missing for numerous taxa in various combinations), has been published recently (Koch 2003). However, a more robust phylogenetic framework with fewer taxa but clocklike evolution of the corresponding molecular markers (nuclear-encoded chalcone synthase and alcohol dehydrogenase, and plastidic maturase K) has been elaborated (Koch, Haubold, and Mitchell-Olds 2000, 2001), and these phylogenetic frameworks with their corresponding divergence time estimates have been used to provide a time frame for the first occurrence of trnF gene duplications. For methods of calibrating the molecular clock and computing divergence time estimates refer to Koch, Haubold, and Mitchell-Olds (2000, 2001).
Results
TrnF Pseudogenes in A. thaliana
In A. thaliana we characterized six duplicated sequences in total within the 668-bp trnL(UAA)-trnF(GAA) intergenic spacer, with the trnF(GAA) anticodon domain as the most highly conserved element (fig. 2a). These duplicated sequences have been enumerated as copies I to VIII. Copies IV–VI refer to the copies found in the other Arabidopsis species investigated, but they are not, or only partially, present in A. thaliana (fig. 2b, fig. 1, Supplementary Material online).
However, the regions neighboring the anticodon-like sequence (indicated as regions A–C, E) showed only low similarity to the different regions of the trnF(GAA) gene (acceptor stem, D domain, and T domain [fig. 1]), with the exception of 3–6 base pairs at the 5'- and 3'-flanking regions of the D and T domains (fig. 2a).
Of particular interest is a common AGTA motif and its modifications (ATTA, AGGA, CGTA, GGTA), which is frequently found at the 5' end of the different duplicated regions A–C (fig. 2a).
Multiple trnF Pseudogene Copies in Cardaminopsis-Arabidopsis
The twelve trnL-F spacer sequences of the several Arabidopsis species revealed 2–8 pseudogenic copies per species (table 1; figs. 1 and 2, Supplementary Material online). The copy most similar to the trnF gene is pseudogene no. VII (fig. 3a and b and table 3). However, this copy is not present in two Cardaminopsis haplotypes analyzed herein (fig. 1, Supplementary Material online). Such losses of a particular pseudogene were found for all of the eight pseudogene copies in one or another accession. It has been shown recently that A. thaliana and the remaining representatives of the newly defined genus Arabidopsis (former genus Cardaminopsis) diverged from each other roughly 5.8 MYA (Koch, Haubold, and Mitchell-Olds 2001), which provides a good time frame for the evolution of the several copies differing in their modularized structure of regions A–E (fig. 2a).
Table 3 Simple Pairwise P Value (Mean) and Standard Deviation (Sequence Distance, PAUP4.0b10) of the Different trnF Copies (Region C–E, Refer to Fig. 2a and b) Compared to the Original trnF Gene Among the Twelve Arabidopsis Accessions Investigated Herein
A parsimony analysis using all pseudogene copies separately provides more detailed evidence for their evolutionary history (fig. 3a and b). Copies I, VI, VII, and VIII of A. thaliana cluster together with the corresponding copies of Cardaminopsis (see alignment in fig. 1, Supplementary Material online), indicating that these copies existed prior to the split of the two phylogenetic lineages 5.8 MYA. This is also supported by the fact that at least copies I, VII, and VIII have identical or very similar gap positions (not included in our analysis). The only exception is copy VI, of which only part of the A. thaliana sequence (regions A', C, D, and E, refer to fig. 2a) is homologous to the Cardaminopsis copy VI sequences. Cardaminopsis copy VI served as the source for copy V type 2. The parsimony analysis also indicated that copies II, III, and IV of Cardaminopsis most likely evolved independently from the most similar copy VIII. Copy V type 1 is identical to copy V type 2 in its structural alignment and gap information; however, phylogenetic analysis based on single nucleotide polymorphisms placed this copy close to copy IV from Cardaminopsis. This is best explained by two independent duplication events.
FIG. 3.— TrnF pseudogene copy number evolution among Arabidopsis species (Arabidopsis thaliana and former Cardaminopsis). For details of the alignment refer to figure 2b. (a) 50% Majority Rule Consensus Tree based on regions C, D, E only (tree length 44, consistency index [CI] 0.62, 88 trees). (b) 50% Majority Rule Consensus Tree based on the entire region A–E (tree length 82, CI 0.64, 1,000 trees). (c) hypothetical model of successive copy number evolution in A. thaliana and members of the former genus Cardaminopsis.
Consequently, A. thaliana copies II and III evolved independently from Cardaminopsis copies II and III. A schematic summary of pseudogene copy evolution based on parsimony analysis is provided in figure 3c.
Occurrence and Distribution of Pseudogenes Among Cruciferous Plants
Duplicated copies of the trnF(GAA) anticodon domain have been detected in numerous genera of the mustard family, and a summary is given in table 1. We obtained the original alignments from most of the studies listed in table 1 and were able to search for the duplicated regions directly within these alignments. These searches revealed several findings. In the alignment from a phylogenetic study of the order Capparales (Hall, Sytsma, and Iltis 2002), a few sequences (Nasturtium, Barbarea, and Capsella) ended at the 3' end with the first trnF(GAA) pseudogene copy, and the authors did not provide the entire sequence of the trnL-F spacer region including the "true" trnF(GAA) gene. This had no effect on their conclusions and results on Capparales systematics, but for these three sequences we could only estimate a minimum number of pseudogene copies. Fortunately, all these species have been included in other studies and have been analyzed on a broader scale (e.g., Bailey, Price, and Doyle 2002; Bleeker et al. 2002). A similar situation has been found in Cardamine (Lihova et al. 2004). The alignment of the spacer region ended with a pseudogene and not with the trnF(GAA) gene as indicated, and here we also provide a minimum number of pseudogenes for the taxa analyzed.
In many other cases we found no duplicated anticodon domains (e.g., Draba, Arabis, Noccaea, and others, table 1). Interestingly, in all cases lacking pseudogenes we also did not find any of the repetitive motifs B, C, and E in the corresponding trnL(UAA)-trnF(GAA) spacer region. However, the prominent motif A/A' is always present in close 5' proximity of the functional trnF gene.
The distribution of the pseudogenic trnF(GAA) tandem repeats is fully congruent with previously published phylogenies (fig. 4) of the Brassicaceae family (Koch 2003; summarized in Koch, Al-Shehbaz, and Mummenhoff 2003), and it is evident that a first pseudogene copy evolved only once at the base of a highly supported monophyletic lineage (fig. 4). This is the first reliable marker (molecular or morphological) described that divides this taxonomically notoriously difficult family at a relatively deep split, dated to approximately 18.5 MYA (mean estimate of matK and chs from node A, fig. 4) to 16 MYA (mean estimate of matK and chs from node B, fig. 4). Divergence time estimates have been redrawn from previous investigations (Koch, Haubold, and Mitchell-Olds 2001).
FIG. 4.— Phylogenetic relationships among cruciferous plants based on chs and matK sequence data (redrawn from Koch, Haubold, and Mitchell-Olds 2001). Some genera have been added according to Koch (2003) and their phylogenetic position is indicated by a dotted line. Filled circles indicate taxa with trnF pseudogenes. Taxa marked with open circles have been proved to contain no pseudogenes.
However, the taxa lacking pseudogenes remain paraphyletic with respect to the pseudogene-carrying taxa.
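The distinction drawn here, one monophyletic group of pseudogene-carrying taxa and a non-monophyletic remainder, can be checked mechanically on any rooted tree. The Python sketch below uses an invented five-taxon topology with hypothetical taxon labels; it does not encode the tree of figure 4.

```python
def leaves(tree) -> set:
    """Return the set of leaf names below a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def clades(tree):
    """Yield the leaf set of every subtree, from the root down to single leaves."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group) -> bool:
    """True if some subtree contains exactly the taxa in `group`."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree: "p" taxa carry pseudogenes, "n" taxa do not.
toy_tree = ("n1", ("n2", (("p1", "p2"), "p3")))
with_pseudogenes = {"p1", "p2", "p3"}
without_pseudogenes = {"n1", "n2"}

carriers_form_clade = is_monophyletic(toy_tree, with_pseudogenes)
non_carriers_form_clade = is_monophyletic(toy_tree, without_pseudogenes)
```

In this toy case the carriers form a clade while the non-carriers do not. Because the only descendants of their common ancestor that are excluded form a single clade, the non-carriers are paraphyletic rather than polyphyletic.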
The Example "Halimolobine" Brassicaceae
The analysis of the alignment provided by Bailey, Price, and Doyle (2002) revealed varying numbers of a pseudogenic anticodon domain, from 1 to 6 (fig. 5). Enumeration of pseudogene anticodon copies followed their occurrence within the alignment and has been adapted to the copy enumeration I to VIII in A. thaliana and Cardaminopsis. This analysis demonstrates that all copies (except for copy number 1) either have arisen independently several times or have been lost several times in parallel throughout their evolution. Interestingly, the different regions A, A', B, and C are present at the 5' end of the first pseudogene copy in all taxa carrying the pseudogenes. A comparison with taxa that do not carry a pseudogene copy demonstrates that among all cruciferous taxa analyzed herein only region A is highly conserved, and regions B, A', and C are missing in species lacking pseudogenes (data not shown).
TrnF(GAA) Pseudogene Evolution in Angiosperms
We screened the trnL-trnF alignment of Borsch et al. (2003) covering all major groups of angiosperms for trnF pseudogene (or partial anticodon domain) insertion, and none of these taxa contained any duplications. In addition, we also screened this alignment for the different regions A, B, and C occurring in all cruciferous taxa showing anticodon domain duplications. However, none of these regions could be identified with a significant sequence identity among all noncruciferous taxa analyzed by Borsch et al. (2003).
This is also true for those Asteraceae (Microseris, Uropappus, Taraxacum) that represent the only examples of trnF gene copy number variation outside the Brassicaceae (Vijverberg and Bachmann 1999; Wittzell 1999).
However, in these cases the entire trnF gene has been duplicated, which is in sharp contrast to the Brassicaceae with extensive duplication of the trnF anticodon domain only.
Discussion
TrnF Pseudogene Characterization in Cruciferous Plants
The trnF(GAA) pseudogenes from cruciferous plants are quite different from those characterized in Microseris and Uropappus (Vijverberg and Bachmann 1999). In these species the whole gene including both acceptor stem regions has been tandemly duplicated with a sequence identity to the original trnF gene varying from 88% to 99%. A similar situation was found in Taraxacum (Wittzell 1999), with a sequence identity ranging from 80% to 92%. In contrast, in A. thaliana several different repetitive motifs occur (A–C, E) as indicated in figure 2a, which are not part of the functional trnF gene. It is notable that these motifs are also conserved among a variety of different taxa exclusively characterized by anticodon domain duplications, as shown by the halimolobine species data set (fig. 2, Supplementary Material online, e.g., alignment positions 966–1072). The majority of these motifs are not found in trnL-F spacer regions of cruciferous plants lacking such duplications. The only exception is the 5' region A/A'. This motif of 22 base pairs (fig. 2a) is present in all trnL-F spacer regions in closest proximity to the functional trnF gene. BLAST searches for regions A, B, and C against the whole chloroplast genome sequence of A. thaliana (AP000423) revealed no significant hits, with the exception of parts of region A matching parts of the rps7 gene and also its neighboring trnV-rps7 spacer (table 2 and fig. 2a). Interestingly, the situation changes when we select shorter motifs (7–8 bp) from regions A, B, and C to search for identical motifs throughout the A. thaliana chloroplast genome (table 2 and fig. 2a). As expected because of the shorter search strings, the number of hits increased greatly. It is also obvious from the summary scores in table 2 that the single hits are randomly distributed all over the plastome, which in this case is 154,478 bp in size. However, the three selected 7- to 8-bp motifs revealed a significant nonrandom clustering. Out of 25 hits in total, 13 co-occur in similar regions of the plastome, a finding for which we have no explanation so far. From these results we can conclude that (1) the flanking sequence regions A–C of the trnF(GAA) anticodon domain are unique and have not simply been transferred from other regions of the plastome and (2) the occurrence of region A in all cruciferous taxa regardless of any anticodon domain duplication provides evidence that the duplicated sequences resulted from rearrangement of the trnF gene and its neighboring areas. However, the finding that the only significant matches of the BLAST search concern region A (table 2) and, moreover, that these matches are found in a coding gene (rps7) and its neighboring spacer region (spacer trnV-rps7) might indicate that a sequence like that of region A drove the first corresponding duplication.
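Why shorter search strings yield many more hits can be seen from a simple counting argument, sketched below in Python. The sketch assumes equal base frequencies and independent positions, which is only a rough approximation for a plastome, and the option to count both strands is likewise an assumption; the function name is invented.

```python
def expected_hits(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected exact occurrences of one fixed k-mer in a random sequence.

    Assumes each base occurs with probability 1/4, independently per site."""
    positions = genome_length - k + 1          # possible start positions
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


PLASTOME_BP = 154_478  # plastome size quoted in the text

for k in (7, 8, 22):
    print(f"k = {k:2d}: {expected_hits(PLASTOME_BP, k):.3g} expected hits")
```

Under this crude model a 7- or 8-bp motif is expected to occur several times by chance alone, whereas a 22-bp string such as region A/A' is expected essentially never. Chance occurrence of short motifs is therefore unremarkable in itself, and it is the clustering of hits that calls for explanation.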
A comparison of the distribution of trnF anticodon duplications among cruciferous plants implies a single origin of an initial duplication within a monophyletic lineage (fig. 4). The divergence dates of nodes A (approximately 18 MYA) and B (approximately 16 MYA) bracket the timing of this initial duplication (Koch, Haubold, and Mitchell-Olds 2000, 2001). The phylogenetic hypothesis shown in figure 4 comprises only a limited set of taxa. However, our finding on the distribution of anticodon duplications among cruciferous plants is also fully consistent with a large-scale phylogeny provided recently (Koch 2003) and not shown here.
The consistent co-occurrence of flanking regions with duplicated anticodon domains can be studied in more detail using the halimolobine crucifer data set as an example (Bailey, Price, and Doyle 2002). A. thaliana anticodon pseudogene copy 1 (fig. 2a) is present in all species included in this study (fig. 5, cf. fig. 2, Supplementary Material online: alignment positions 966–1100). Pairwise sequence identity of this pseudogene copy 1 among the different species is always higher than its identity to the original trnF gene (data not shown), which also provides good evidence for the monophyletic origin of the first pseudogene copy.
In addition to the duplicated pseudogenes (anticodon domains) and the neighboring regions A–E, we were also able to characterize promoter elements that show high similarity to a putative sigma70-type bacterial promoter motif (–35 TTGACA/–10 GAGGAT) (Quandt et al. 2004). In a comprehensive study across land plants, this motif has been found consistently (Quandt et al. 2004), and it has been speculated that this promoter represents the original trnF(GAA) gene promoter. However, it has been concluded that the trnF(GAA) gene is cotranscribed with trnL(UAA) (Kanno and Hirai 1993), and consequently the –35 TTGACA/–10 GAGGAT promoter motif in front of the trnF(GAA) gene should be nonfunctional. Our data largely support this conclusion because the –10 element and the –35 element are present in several trnL-F spacer sequences of the genus Arabidopsis (fig. 1, Supplementary Material online: position 198–203, position 900–905), and all duplications are inserted between these two elements. Consequently, it is hardly conceivable that they are still functional.
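The spacing argument can be made explicit with a short Python sketch. It locates the two hexamers in a sequence and counts the positions between them. The toy promoter is invented, the helper names are hypothetical, and the coordinates 198–203 and 900–905 are alignment positions, so the resulting figure includes alignment gaps and overstates the distance in any single sequence.

```python
def find_all(seq: str, motif: str) -> list[int]:
    """Return 1-based start positions of every exact occurrence of motif."""
    seq, motif = seq.upper(), motif.upper()
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i + 1)
        i = seq.find(motif, i + 1)
    return starts


def gap_between(upstream_start: int, downstream_start: int, element_len: int = 6) -> int:
    """Count positions lying strictly between two elements of equal length."""
    return downstream_start - (upstream_start + element_len)


# Invented toy promoter: the two hexamers separated by a 17-base spacer.
toy = "CC" + "TTGACA" + "A" * 17 + "GAGGAT" + "CC"
minus35_start = find_all(toy, "TTGACA")[0]
minus10_start = find_all(toy, "GAGGAT")[0]
toy_gap = gap_between(minus35_start, minus10_start)

# Alignment coordinates quoted in the text (198-203 and 900-905).
alignment_gap = gap_between(198, 900)
```

Bacterial sigma70-type promoters typically have a spacer of about 17 bp between the two elements, as in the toy example. The quoted alignment coordinates are separated by several hundred columns, which is consistent with the view that the motif is no longer functional.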
TrnF Pseudogene Copy Number Evolution: The Genus Arabidopsis
We can only speculate about the mode of origin of the first pseudogene copy, which dates to roughly 17 MYA. However, the example from Cardaminopsis-Arabidopsis provides more detailed insights into the dynamics of subsequent copy number evolution (fig. 3a–c). In all eight cases the newly arisen copy was placed between already existing pseudogenes (fig. 3c), and they did not move further downstream of the 5' end of copy I. The parsimony analysis did not recover all groups with high bootstrap support, but tree topologies are congruent when different proportions of the total pseudogene region are selected (fig. 3a vs. fig. 3b), which might indicate that in most cases the total region has been subjected to several duplication events. However, we cannot exclude additional recombination events, and the example of copy V might indicate such a situation: relative position and gap structure are totally conserved between both copies (fig. 2b), but parsimony analysis does not recognize them as orthologues (fig. 3a and b).
Similarly, genetic distances are not always congruent with our hypothesis of trnF pseudogene evolution (table 3). If copy I is regarded as the ancestral type, one might expect this copy to show the highest sequence distance from the original functional gene. This is not the case: copies VI and VIII show significantly higher distance values than copies I or VII. Our sequence distance values provide a mutation rate for regions C–E (fig. 2a) varying between 2.4 × 10^-8 and 3.8 × 10^-8 mutations/site/year. These values exceed the normal mutation rate of the entire trnL-intron–trnL-F spacer region (3.6 × 10^-9 to 7.7 × 10^-9, e.g., calculated in Mummenhoff et al. 2004) by a factor of roughly 3 to 11, which can be at least partly explained by an increase of the mutation rate of single nucleotides through structural mutations such as recombination resulting in new copies. Our data also permit the speculation that the high conservation of the 5' region of the first pseudogene copy (as well as of the 3' part and, by selection, of the trnF gene) results from these regions being less prone to recombination than the region in between.
However, further research is needed to understand the underlying evolutionary mechanisms.
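The two quantities discussed above, pairwise p distances and the comparison of rate ranges, are easy to reproduce. The Python sketch below is illustrative only: the handling of gaps (sites with a gap in either sequence are skipped) is an assumption and may differ from the PAUP settings used for table 3, the two aligned strings are invented, and the function names are hypothetical. The rate ranges are those quoted in the text.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among aligned, gap-free positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped sites are ignored
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def fold_range(fast: tuple, slow: tuple) -> tuple:
    """Smallest and largest ratio between two (low, high) rate ranges."""
    return fast[0] / slow[1], fast[1] / slow[0]


# Invented aligned toy sequences: one difference in five comparable sites.
toy_distance = p_distance("ACGT-A", "ACTTTA")

# Rate ranges quoted in the text (mutations/site/year).
pseudogene_rates = (2.4e-8, 3.8e-8)   # regions C-E
spacer_rates = (3.6e-9, 7.7e-9)       # trnL intron and trnL-F spacer
low_fold, high_fold = fold_range(pseudogene_rates, spacer_rates)
```

With the two ranges as quoted, the ratio of the pseudogene rate to the spacer rate lies between roughly 3 and 11, depending on which ends of the ranges are compared.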
Phylogenetic Utility of trnF Pseudogenes
The evolutionary history of the Brassicaceae on a family-wide scale is still poorly understood (Koch 2003; Koch, Al-Shehbaz, and Mummenhoff 2003). The most important conclusion of the various phylogenetic studies published so far is that traditional classification schemes based on morphology, embryology, or cytology often do not reflect phylogenetic relationships, depending on the taxonomic level considered. The occurrence of trnF pseudogenes among cruciferous plants is the first character defining a significant split in the deep Brassicaceae phylogeny roughly 16–18 MYA. The corresponding clade comprises taxa from various artificially delimited tribes (Sisymbrieae, Arabideae, Lepidieae) as defined by traditional taxonomists such as Janchen (1942), Schulz (1936), or Hayek (1911). Future molecular studies might substantiate our findings on a family-wide scale and help clarify the systematic situation in the mustard family, as has been done on the basis of structural mutations in the chloroplast genome in various families (Asteraceae: Jansen and Palmer 1987; Fabaceae: Bruneau, Doyle, and Palmer 1990; Doyle, Lavin, and Bruneau 1992; Poaceae: Doyle et al. 1992; Doyle, Doyle, and Palmer 1995; and reviewed by D. E. Soltis and P. S. Soltis 1998).
Acknowledgements
This work was supported by grants from the Austrian Science Foundation—FWF (GEN-15609 and GEN-14463) and the German Science Foundation—DFG (Ko-2302/1-1) to M.K. We also thank all authors providing us with their original DNA sequence alignments.
References
Bailey, C. D., R. A. Price, and J. J. Doyle. 2002. Systematics of the halimolobine Brassicaceae: evidence from three loci and morphology. Syst. Bot. 27:318–332.
Bakker, F. T., A. Culham, R. Gomez-Martinez, J. Carvalho, J. Compton, R. Dawtrey, and M. Gibby. 2000. Pattern of nucleotide substitution in angiosperm cpDNA trnL(UAA)-trnF(GAA) regions. Mol. Biol. Evol. 17:1146–1155.
Besendahl, A., Y.-L. Qiu, J. Lee, J. D. Palmer, and D. Bhattacharya. 2000. The cyanobacterial origin and vertical transmission of the plastid tRNALeu group-I-intron. Curr. Genet. 37:12–23.
Bleeker, W., A. Franzke, K. Pollmann, A. H. D. Brown, and H. Hurka. 2002. Phylogeny and biogeography of Southern Hemisphere high-mountain Cardamine species (Brassicaceae). Aust. Syst. Bot. 15:575–581.
Bleeker, W., and H. Hurka. 2001. Introgressive hybridization in Rorippa (Brassicaceae): gene flow and its consequences in natural and anthropogenic habitats. Mol. Ecol. 10:2013–2022.
Bleeker, W., C. Weber-Sparenberg, and H. Hurka. 2002. Chloroplast DNA variation and biogeography in the genus Rorippa Scop. (Brassicaceae). Plant Biol. 4:104–111.
Borsch, T., K. W. Hilu, D. Quandt, V. Wilde, C. Neinhuis, and W. Barthlott. 2003. Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol. 16:558–576.
Bowman, C. M., R. F. Barker, and T. A. Dyer. 1988. The location and possible evolutionary significance of small dispersed repeats in wheat ctDNA. Curr. Genet. 10:931–941.
Bruneau, A., J. J. Doyle, and J. D. Palmer. 1990. A chloroplast DNA structural mutation as a subtribal character in the Phaseoleae (Leguminosae). Syst. Bot. 15:378–386.
Cech, T. R., D. Herschlag, J. A. Piccirilli, and A. M. Pyle. 1992. RNA catalysis by a group I ribozyme: developing a model for transition state stabilization. J. Biol. Chem. 267:17479–17482.
Costa, J. L., P. Paulsrud, and P. Lindblad. 2002. The cyanobacterial tRNALeu(UAA) intron: evolutionary patterns in a genetic marker. Mol. Biol. Evol. 19:850–857.
Dally, A. M., and G. Second. 1990. Chloroplast DNA diversity in wild and cultivated species of rice (Genus Oryza, section Oryza). Cladistic-mutation and genetic-distance analysis. Theor. Appl. Genet. 80:209–222.
Dobeš, C., T. Mitchell-Olds, and M. Koch. 2004. Extensive chloroplast haplotype variation indicates Pleistocene hybridization and radiation of North American Arabis drummondii, A. ×divaricarpa, and A. holboellii (Brassicaceae). Mol. Ecol. 13:349–370.
Doyle, J. J., J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson. 1992. Chloroplast DNA inversions and the origin of the grass family (Poaceae). Proc. Natl. Acad. Sci. USA 89:7722–7726.
Doyle, J. J., J. L. Doyle, and J. D. Palmer. 1995. Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst. Bot. 20:272–294.
Doyle, J. J., M. Lavin, and A. Bruneau. 1992. Contributions of molecular data to papillionoid legume systematics. Pp. 223–251 in P. S. Soltis, D. E. Soltis, and J. J. Doyle, eds. Molecular systematics of plants. Chapman and Hall, New York.
Drábková, L., J. Kirschner, Č. Vlček, and V. Pačes. 2004. TrnL-trnF intergenic spacer and trnL intron define major clades within Luzula and Juncus (Juncaceae): importance of structural mutations. J. Mol. Evol. 59:1–10.
Goremykin, V. V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20:1499–1505.
Govindaraju, D. R., B. R. Dancik, and D. B. Wagner. 1989. Novel chloroplast DNA polymorphism in a sympatric region of two pines. J. Evol. Biol. 2:49–59.
Graham, S. W., P. A. Reeves, C. E. Burns, and R. G. Olmstead. 2000. Microstructural changes in non-coding DNA: interpretation, evolution and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161:S83–S96.
Hall, J. C., K. J. Sytsma, and H. H. Iltis. 2002. Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Am. J. Bot. 89:1826–1842.
Hamilton, M. B., J. M. Braverman, and D. F. Soria-Hernanz. 2003. Patterns and relative rates of nucleotide and insertion/deletion evolution at six chloroplast intergenic regions in New World species of the Lecythidaceae. Mol. Biol. Evol. 20:1710–1721.
Hayek, A. 1911. Entwurf eines Cruciferensystems auf phylogenetischer Grundlage. Beih. Bot. Centralbl. 27:127–335.
Hewitt, G. M. 2001. Speciation, hybrid zones and phylogeography—or seeing genes in space and time. Mol. Ecol. 10:537–549.
Hipkins, V. D., K. A. Marshall, D. B. Neale, W. H. Rottmann, and S. H. Strauss. 1995. A mutation hotspot in the chloroplast genome of a conifer (Douglas fir: Pseudotsuga) is caused by variability in the number of direct repeats from a partially duplicated tRNA gene. Curr. Genet. 27:527–579.
Ingvarsson, P. K., S. Ribstein, and D. R. Taylor. 2003. Molecular evolution of insertions and deletions in the chloroplast genome of Silene. Mol. Biol. Evol. 20:1737–1740.
Janchen, E. 1942. Das System der Cruciferen. Österr. Bot. Z. 91:1–28.
Jansen, R. K., and J. D. Palmer. 1987. A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proc. Natl. Acad. Sci. USA 84:5818–5822.
Johnson, L. B., and J. D. Palmer. 1989. Heteroplasmy of chloroplast DNA in Medicago. Plant Mol. Biol. 12:3–11.
Kanno, A., and A. Hirai. 1993. A transcription map of the chloroplast genome from rice (Oryza sativa). Curr. Genet. 23:166–174.
Kelch, D. G., A. Driskell, and B. Mishler. 2004. Inferring phylogeny using genomic characters: a case study using land plant plastomes. Monogr. Syst. Bot. Missouri Bot. Gard. 98:3–12.
Koch, M. 2003. Molecular phylogenetics, evolution and population biology in the Brassicaceae. Pp. 1–35 in A. K. Sharma and A. Sharma, eds. Plant genome: biodiversity and evolution, Vol. 1. Phanerogams. Science Publishers, Inc., Enfield, N.H.
Koch, M., and I. A. Al-Shehbaz. 2002. Molecular data indicate complex intra- and intercontinental differentiation of American Draba (Brassicaceae). Ann. Mo. Bot. Gard. 89:88–109.
———. 2004. Taxonomic and phylogenetic evaluation of the American "Thlaspi" species: identity and relationship to the Eurasian genus Noccaea (Brassicaceae). Syst. Bot. 29:375–384.
Koch, M., I. A. Al-Shehbaz, and K. Mummenhoff. 2003. Molecular systematics, evolution and population biology in the mustard family (Brassicaceae). Ann. Mo. Bot. Gard. 90:151–171.
Koch, M., C. Dobeš, and M. Matschinger. 2003. The trnF(GAA) gene in cruciferous plants: extensive duplication, variation in copy number and parallel evolution. Palmarum Hortus Francofurtensis 7:54.
Koch, M., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17:1483–1498.
———. 2001. Molecular systematics of the cruciferae: evidence from coding plastome matK and nuclear CHS sequences. Am. J. Bot. 88:534–544.
Kuhsel, M. G., R. Strickland, and J. D. Palmer. 1990. An ancient group I intron shared by eubacteria and chloroplasts. Science 250:1570–1573.
Lanner, C. 1998. Relationships of wild Brassica species with chromosome number 2n=18, based on comparison of the DNA sequence of the chloroplast intergenic region between trnL(UAA) and trnF(GAA). Can. J. Bot. 76:228–237.
Lee, J.-Y., K. Mummenhoff, and J. L. Bowman. 2002. Allopolyploidization and evolution of species with reduced floral structures in Lepidium L. (Brassicaceae). Proc. Natl. Acad. Sci. USA 99:16835–16840.
Lidholm, J., A. Szmidt, and P. Gustafsson. 1991. Duplication of the psbA gene in the chloroplast genome of two Pinus species. Mol. Gen. Genet. 226:345–352.
Lihova, J., J. Fuertes-Aguilar, K. Marhold, and G. Nieto-Feliner. 2004. Origin of the disjunct tetraploid Cardamine amporitana (Brassicaceae) assessed with nuclear and chloroplast DNA sequence data. Am. J. Bot. 91:1231–1242.
Löhne, C., and T. Borsch. 2005. Molecular evolution and phylogenetic utility of the petD group II intron: a case study in basal angiosperms. Mol. Biol. Evol. 22:1–16.
Marshall, H. D., C. Newton, and K. Ritland. 2001. Sequence-repeat polymorphisms exhibit the signature of recombination in lodgepole pine chloroplast DNA. Mol. Biol. Evol. 18:2136–2138.
Martin, W., B. Stoebe, V. Goremykin, S. Hansmann, M. Hasegawa, and K. V. Kowallik. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:161–165.
Matschinger, M., and M. Koch. 2003. Molecular systematics, phylobiogeography and evolution of the genus Cardaminopsis Hayek (Brassicaceae), the closest relatives of the model plant Arabidopsis thaliana (L.) Heynh. Palmarum Hortus Francofurtensis 7:196.
Millen, R. S., R. G. Olmstead, K. L. Adams et al. (12 co-authors). 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658.
Mummenhoff, K., H. Brüggemann, and J. Bowman. 2001. Chloroplast DNA phylogeny and biogeography of the genus Lepidium (Brassicaceae). Am. J. Bot. 88:2051–2063.
Mummenhoff, K., P. Linder, N. Friesen, J. L. Bowman, J.-Y. Lee, and A. Franzke. 2004. Molecular evidence for bicontinental hybridogenous genomic constitution in Lepidium sensu stricto (Brassicaceae) species from Australia and New Zealand. Am. J. Bot. 91:254–261.
Newton, A. C., T. R. Allnutt, A. C. M. Gillies, A. J. Lowe, and R. A. Ennos. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends Ecol. Evol. 14:140–145.
O'Kane, S. L. Jr., and I. A. Al-Shehbaz. 1997. A synopsis of Arabidopsis (Brassicaceae). Novon 7:323–327.
O'Kane, S., and I. A. Al-Shehbaz. 2003. Phylogenetic position and generic limits of Arabidopsis (Brassicaceae) based on sequences of nuclear ribosomal DNA. Ann. Mo. Bot. Gard. 90:603–612.
Olmstead, R. G., and J. D. Palmer. 1994. Chloroplast DNA systematics: a review of methods and data analysis. Am. J. Bot. 81:1205–1224.
Palmer, J. D., and W. F. Thompson. 1981. Rearrangements in the chloroplast genomes of mung bean and pea. Proc. Natl. Acad. Sci. USA 78:5533–5537.
———. 1982. Chloroplast DNA rearrangements are more frequent when a large inverted repeat sequence is lost. Cell 29:537–550.
Paquin, B., S. D. Kathe, S. A. Nierzwicki-Bauer, and D. A. Shub. 1997. Origin and evolution of group I introns in cyanobacterial tRNA genes. J. Bacteriol. 179:6798–6806.
Perry, A. S., S. Brennan, D. J. Murphy, T. A. Kavanagh, and K. H. Wolfe. 2002. Evolutionary re-organization of a large operon in Adzuki bean chloroplast DNA caused by inverted repeat movement. DNA Res. 9:157–162.
Perry, A. S., and K. H. Wolfe. 2002. Nucleotide substitution rates in legume chloroplast DNA depend on the presence of the inverted repeat. J. Mol. Evol. 55:501–508.
Quandt, D., K. Müller, M. Stech, J.-P. Frahm, W. Frey, K. W. Hilu, and T. Borsch. 2004. Molecular evolution of the chloroplast trnL-F region in land plants. Monogr. Syst. Bot. Missouri Bot. Gard. 98:13–37.
Reboud, X., and C. Zeyl. 1994. Organelle inheritance in plants. Heredity 72:132–140.
Schulz, O. E. 1936. Cruciferae. Pp. 227–658 in A. Engler and K. Prantl, eds. Die natürlichen Pflanzenfamilien, Vol. 17B. Verlag von Wilhelm Engelmann, Leipzig.
Soltis, D. E., and P. S. Soltis. 1998. Choosing an approach and appropriate gene for phylogenetic analysis. Pp. 1–42 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular systematics of plants II. DNA sequencing. Kluwer Academic Publishers, London.
Swofford, D. L. 2000. PAUP* 4.0b10. Sinauer Associates, Sunderland, Mass.
Taberlet, P., L. Gielly, and G. Pautou. 1991. Universal primers for amplification of three non-coding chloroplast regions. Plant Mol. Biol. 17:1105–1109.
Vijverberg, K., and K. Bachmann. 1999. Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genome of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol. Biol. Evol. 16:1329–1340.
Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91:9794–9798.
Wittzell, H. 1999. Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions. Mol. Ecol. 8:2023–2035.
Wolfe, K. H., C. W. Morden, and J. D. Palmer. 1992. Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc. Natl. Acad. Sci. USA 89:10648–10652.
Xu, M.-Q., S. D. Kathe, H. Goodrich-Blair, S. A. Nierzwicki-Bauer, and D. A. Shub. 1990. Bacterial origin of a chloroplast intron: conserved self-splicing group-I introns in cyanobacteria. Science 250:1566–1570.
Yang, Y.-W., P.-Y. Tai, Y. Chen, and W.-H. Li. 2002. A study of the phylogeny of Brassica rapa, B. nigra, Raphanus sativus, and their related genera using noncoding regions of the chloroplast DNA. Mol. Phylogenet. Evol. 23:268–275.
(Marcus A. Koch*, Christop)