Molecular Biology and Evolution (分子生物学进展), 2004, Issue 4
Rapid Evolution of a Pollen-Specific Oleosin-Like Gene Family from Arabidopsis thaliana and Closely Related Species
     * Department of Genetics and Evolution, Max-Planck-Institute of Chemical Ecology, Jena, Germany

    Department of Biology, University College London, London, U.K.

    E-mail: schmid@ice.mpg.de.

    Abstract

    It has been shown in a variety of species that genes expressed in reproductive tissues evolve rapidly, which often appears to be the result of positive Darwinian selection. We investigated the evolution of a family of seven pollen-specific oleosin-like proteins (or oleopollenins) in Arabidopsis thaliana and two closely related species. More than 30 kb of a genomic region that harbors the complete, tandemly repeated oleopollenin cluster were sequenced from Arabidopsis lyrata ssp. lyrata and Boechera drummondii. A phylogenetic analysis of the complete gene cluster from these three species and from Brassica oleracea confirmed its rapid evolution resulting from gene duplication and gene loss events, numerous amino acid substitutions, and insertions/deletions in the coding sequence. Independent duplications were inferred in the lineages leading to Arabidopsis and to Brassica, and gene loss was inferred in the lineage leading to B. drummondii. Comparisons of the ratio of nonsynonymous (dN) to synonymous (dS) divergence revealed that the oleopollenins are among the most rapidly evolving proteins currently known from Arabidopsis and that they may evolve under positive Darwinian selection. Reverse transcriptase polymerase chain reaction analysis demonstrated the expression of oleopollenins in flowers of the outcrossing A. lyrata, the selfing B. drummondii, and the apomictic Boechera holboellii, suggesting that oleopollenins play an important role in species with different breeding systems. These results are consistent with a putative function in species recognition, but further analyses of protein function and sequence variation in species with different breeding systems are necessary to reveal the underlying causes for the rapid evolution of oleopollenins.

    Key Words: oleopollenin · reproductive protein · positive selection · comparative genomics · Arabidopsis · speciation

    Introduction

    The rapid divergence of reproductive traits between species suggests that the evolution of reproductive isolation is a key step in speciation. The incompatibility of heterospecific gametes has long been considered one of the most important mechanisms of prezygotic reproductive isolation and thus of speciation (Dobzhansky 1937). This hypothesis is supported by the observation that genes expressed in reproductive tissues tend to evolve more rapidly than genes expressed in other tissues (reviewed by Swanson and Vacquier 2002a). In animal species, many reproductive proteins diverge as a result of adaptive evolution, i.e., because of the preferential fixation of amino acid replacements (Lee, Ota, and Vacquier 1995; Metz and Palumbi 1996; Bielawski and Yang 2001; Swanson et al. 2001a, 2001b; Torgerson, Kulathinal, and Singh 2002; Swanson, Nielsen, and Yang 2003; Civetta 2003). These include proteins that mediate the interaction between male and female gametes during the first steps of the fertilization process. Although several explanations for the rapid evolution of reproductive proteins have been proposed (e.g., sexual conflict, reproductive isolation, sexual selection, pathogen attacks; reviewed by Howard 1999; Swanson and Vacquier 2002b), differentiating between these hypotheses is difficult and remains a largely unresolved problem. Compared with animal species, less is known about the evolution of reproductive proteins in plants, except for proteins involved in self-incompatibility (SI), which are characterized by great allelic diversity within species (Schopfer, Nasrallah, and Nasrallah 1999; Schopfer and Nasrallah 2000).

    The sexual reproduction of plants begins with pollination, which involves the adhesion of pollen grains to the stigma, followed by hydration of the pollen and the subsequent outgrowth of the pollen tube (reviewed by Lord and Russell 2002). Several recognition steps are believed to differentiate between compatible and incompatible pollen. In Arabidopsis thaliana, the first recognition step occurs during pollen-stigma adhesion and can differentiate between pollen from the Brassicaceae and pollen from other plant families (Zinkl et al. 1999). Several studies have suggested that the pollen coat is involved in adhesion to the stigma and in the recognition process (Luu et al. 1999; Doughty, Wong, and Dickinson 2000; Takayama et al. 2000), but there is also contradictory evidence from Arabidopsis (Zinkl et al. 1999).

    The pollen coat forms the outer, extracellular layer of the pollen grain and fills the pits of the exine wall layer (Heslop-Harrison et al. 1974). It is a complex mixture of lipid and protein molecules (Piffanelli, Ross, and Murphy 1998; Piffanelli and Murphy 1998). In A. thaliana, more than 90% of pollen coat proteins are lipases and oleosin-like proteins (Mayfield et al. 2001). Pollen-specific oleosin-like proteins, or oleopollenins (Murphy 2001), are a family of proteins whose members are highly expressed in the tapetal cells of the anthers (Robert et al. 1994; Ross and Murphy 1996), where they are synthesized as full-length proteins. Oleopollenins are encoded by a tandemly repeated gene family in A. thaliana and in B. oleracea (de Oliveira et al. 1993; Mayfield et al. 2001). The encoded proteins consist of two domains: a highly conserved N-terminal oleosin-like domain and a highly variable C-terminal domain, whose size varies from 7 to 37 kDa. The N-terminal domain contains a 70-amino acid hydrophobic motif, whereas the C-terminal domain tends to be highly repetitive and contains repeats similar to those found in structural proteins (Ross and Murphy 1996; Murphy and Ross 1998). After protein synthesis, oleopollenins are associated with cytosolic lipid bodies (tapetosomes) until the tapetal cells undergo programmed cell death. At this point, the oleopollenins are cleaved into the oleosin-like domains and the repetitive C-terminal domains, which form the mature pollenins (Ting, Ratnayake, and Huang 1998; Hernández-Pinzón et al. 1999). The former are degraded, and the latter are combined with other products from the tapetal cells to build the pollen coat. The different pollenins are the most abundant protein class in the pollen coat of B. oleracea and A. thaliana (Murphy and Ross 1998; Mayfield et al. 2001).

    The exact function of the oleopollenins is not known. The mature pollenins evidently have an important function because (1) they are very abundant in the pollen coat, (2) no pseudogene was found in a sequence survey of this cluster in five accessions of A. thaliana (Mayfield et al. 2001), and (3) a knock-out mutant of the AtGRP17 oleopollenin showed a significant delay in pollen hydration compared with the wild type (Mayfield and Preuss 2000). It has been speculated that the oleosin-like domain of the full-length protein stabilizes the oil bodies in the tapetosomes or, alternatively, acts as a signaling sequence for the mature pollenin (Murphy 2001). The pollenin domains may function as water channels that facilitate the permeation of water through the lipidic pollen coat during pollen hydration. Alternatively, they may act as species-specific signaling molecules that bind receptors on the pistil surface and trigger further steps in the pollination process.

    Mayfield et al. (2001) suggested that this class of proteins may be involved in species recognition by discriminating between compatible and incompatible pollen. This hypothesis is based on three of their observations: a lack of sequence conservation in the pollenin domain between A. thaliana and Brassica napus, substantial sequence variation within species, and the abundance of oleosin-like proteins in the pollen coat. If the hypothesis is correct, the oleopollenin genes would be expected to show rapid, selection-driven divergence between closely related species. To test for a possible role of this protein family in species recognition as indicated by patterns of sequence evolution, we sequenced the oleopollenin gene cluster from Arabidopsis lyrata and Boechera drummondii, which are closely related to A. thaliana (ca. 5 and 10 Myr divergence, respectively; Koch, Haubold, and Mitchell-Olds 2000, 2001). We also analyzed the expression patterns of oleopollenins in these species. In addition, we assessed the role of positive selection in their sequence evolution by comparing the levels of synonymous (dS) and nonsynonymous (dN) substitution (Yang and Bielawski 2000). Finally, we compared rates of evolution between paralogs and in different evolutionary lineages to investigate the effects of functional divergence after duplication and of the breeding system on the evolution of this family of reproductive proteins.

    Materials and Methods

    Plant Material

    The following accessions were used for the expression analysis:

    * A. thaliana: the Ler and In-0 (Innsbruck, Austria) accessions.
    * A. lyrata ssp. lyrata: an accession collected at Mount Baker, Washington.
    * A. lyrata ssp. petraea: an individual from Plech, Fränkische Schweiz, Germany.
    * Boechera holboellii: individuals collected near Challis, Idaho (ploidy levels 2n and 3n), near Anaconda, Montana (ploidy level 2n + x), and in the Highwood Mountains south of Butte, Montana (ploidy level 3n + x).

    All accessions except Ler were collected by T. M.-O. The ploidy levels were determined by fluorocytometry (T. Sharbel, personal communication).

    BAC Libraries

    The BAC library from A. lyrata ssp. lyrata was made available to us by Dr. June Nasrallah (Cornell University). The average insert size is 110–120 kbp, and inserts were cloned into the pBeloBAC11 vector. The BAC library from B. drummondii was constructed by LION Biosciences (Heidelberg, Germany) and cloned into the pCLD04541 vector.

    Clone Identification and Sequencing

    BAC clones containing the region homologous to the oleopollenin gene family were identified by filter hybridization of gridded nylon membranes containing the clones of the BAC libraries. The conserved oleosin-like domain of the AtGRP17 (TAIR-ID: At5g07530) gene was used as a probe. Radioactive labeling and hybridization were done according to standard protocols (Sambrook and Russell 2001). Several positive clones were isolated from each library, amplified in overnight culture, and shotgun cloned. To verify that the correct BAC clone had been picked, twelve shotgun clones from every BAC clone were sequenced from both ends. These sequences were compared to the Arabidopsis genome using BLASTN (National Center for Biotechnology Information Blast2.0). Among the BAC clones that matched the AtGRP17-containing region on chromosome 5, one was picked from each species for further sequencing.

    The selected BAC clones were subjected to separate partial digestions with Tsp509I and Sau3AI, size-fractionated into 1–3-kb and 5–9-kb fragments, and cloned into a pUC19 vector. To identify subclones that mapped to the oleopollenin gene cluster, several hundred subclones were sequenced and compared to A. thaliana with BLASTN. Subclones covering the region of interest were selected for complete sequencing using primer walking or transposon-based sequencing (GeneJumper kit, Invitrogen). Base calling, sequence assembly, and editing were done with the phred, phrap, and consed suite of programs (Ewing et al. 1998; Gordon, Abajian, and Green 1998).

    Sequence Annotation

    The A. lyrata and B. drummondii sequences were annotated by comparison to the homologous A. thaliana genomic region using BLASTN and WISE (Birney, Thompson, and Gibson 1996) programs. Because of the rapid evolution of the repetitive second exon of the oleopollenin genes, the computer-based determination of the correct open reading frame (ORF) was difficult for some genes in the B. drummondii sequence. For this reason, reverse transcriptase–polymerase chain reaction (RT-PCR) with intron-spanning primer pairs and sequencing of the resulting PCR fragments was employed to determine the correct exon-intron boundaries and the correct reading frames. Sequences were submitted to GenBank under accession numbers AY292860 and AY292861.

    The nomenclature of this gene family is not fully standardized, and no naming guidelines are available. For this reason, we named each gene using the first letters of the genus and species names followed by the unique identifier from the corresponding A. thaliana TAIR-ID. For example, the ortholog of At5g07530 in B. drummondii is called Bd5g07530.
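    The naming rule can be written as a small helper. This is a minimal sketch. The function name ortholog_name is hypothetical, and the helper assumes that the species is given as a binomial and that the A. thaliana identifier begins with the prefix "At".

```python
def ortholog_name(species: str, tair_id: str) -> str:
    """Name an ortholog following the convention described above (hypothetical helper).

    The first letters of the genus and of the species epithet replace the
    'At' prefix of the A. thaliana TAIR identifier.
    """
    parts = species.split()
    if len(parts) < 2:
        raise ValueError(f"expected a binomial species name, got {species!r}")
    if not tair_id.startswith("At"):
        raise ValueError(f"expected an A. thaliana TAIR ID, got {tair_id!r}")
    genus, epithet = parts[0], parts[1]
    # Subspecies qualifiers such as 'ssp. lyrata' are ignored.
    return genus[0].upper() + epithet[0].lower() + tair_id[2:]


print(ortholog_name("Boechera drummondii", "At5g07530"))  # Bd5g07530
print(ortholog_name("Arabidopsis lyrata ssp. lyrata", "At5g07600"))  # Al5g07600
```

    Genes of B. oleracea keep their published BoGRP names because not all of them can be assigned to A. thaliana orthologs.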

    Evolutionary Analysis

    Sequences were aligned with MLAGAN (Brudno et al. 2003), and the alignments were visually inspected and improved. Phylogenetic reconstruction was performed on both DNA and protein sequences with three methods: parsimony (dnapars program of PHYLIP; Felsenstein 1989), TREE-PUZZLE (Schmidt et al. 2002), and Neighbor-Joining (neighbor program of PHYLIP). The resulting phylogenies were then compared. Tests of selection were conducted with the codeml program of the PAML package, which uses maximum likelihood (ML) estimation and takes transition/transversion bias and unequal codon usage into account (Yang 1997). To estimate the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site, the F3X4 codon model was used. We estimated $\omega = d_N/d_S$ and calculated the likelihood for all pairs of aligned sequences from A. thaliana and A. lyrata using both codon-substitution models. In cases where $\omega > 1$, we tested for positive selection by calculating the likelihood of a strictly neutral model ($\omega = 1$) and comparing the two likelihood values in a likelihood ratio test (LRT). The test statistic is $2\Delta\ell = 2(\ell_1 - \ell_0)$, where $\ell_1$ and $\ell_0$ are the log-likelihoods with $\omega$ estimated freely and with $\omega$ fixed at 1, respectively. It was compared to a $\chi^2$ distribution with one degree of freedom.
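    The pairwise test can be sketched as follows. This is an illustration of the LRT arithmetic only. The two log-likelihoods would come from codeml runs with $\omega$ free and with $\omega$ fixed at 1. The function name and the numbers in the usage line are hypothetical, not values from this study.

```python
import math
from typing import Tuple


def lrt_omega_equals_one(lnl_free: float, lnl_neutral: float) -> Tuple[float, float]:
    """LRT of a strictly neutral model (omega = 1) against a free omega.

    lnl_free    -- log-likelihood with omega estimated freely (l1)
    lnl_neutral -- log-likelihood with omega fixed at 1 (l0)
    Returns the statistic 2*(l1 - l0) and its P value under chi-square, 1 d.f.
    """
    stat = 2.0 * (lnl_free - lnl_neutral)
    # The free model nests the neutral one, so a negative statistic can only
    # arise from optimizer noise; clamp it at zero.
    stat = max(stat, 0.0)
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one gene pair (not values from this study).
stat, p = lrt_omega_equals_one(lnl_free=-2345.6, lnl_neutral=-2346.4)
print(f"2*delta_l = {stat:.2f}, P = {p:.3f}")  # 2*delta_l = 1.60, P = 0.206
```

    At the 5% level, $\omega = 1$ is rejected only when $2\Delta\ell$ exceeds 3.84.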

    Tests for positive selection in the oleosin-like domain (71 codons) were based on multiple alignments of this domain. They used both the site-specific and the branch ML models (Yang 1998; Nielsen and Yang 1998; Yang et al. 2000), with F3X4 as the codon-substitution model. For the site-specific models, we compared M1 (neutral) with M2 (selection) and with M3 (discrete), and M7 (beta) with M8 (beta & $\omega$). Each comparison was an LRT in which the degrees of freedom equal the difference in the numbers of parameters between the two models (for an explanation of the models, see Yang et al. 2000). We also conducted a test of selection with a modified model M8, in which the null model M8A (beta & $\omega = 1$) was compared against M8B (beta & $\omega > 1$) (see Swanson, Nielsen, and Yang 2003).
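    The degrees-of-freedom rule can be sketched by tabulating the free parameters of each $\omega$ distribution. The counts follow the model definitions of Yang et al. (2000), with K = 3 classes for M3. The chi-square tail is computed in closed form for integer degrees of freedom. The function names and the optional mixture null are assumptions of this sketch. Because $\omega = 1$ lies on the boundary of the M8B parameter space, a 50:50 mixture of a point mass at 0 and $\chi^2_1$ is sometimes used as the null distribution for M8A vs. M8B; this halves the $\chi^2_1$ P value.

```python
import math
from typing import Tuple

# Free parameters of the omega distribution in each site model. Branch lengths,
# kappa, and codon frequencies are shared by all models and cancel out.
N_OMEGA_PARAMS = {
    "M1": 1,   # p0; omega0 = 0 and omega1 = 1 are fixed
    "M2": 3,   # p0, p1, omega2
    "M3": 5,   # p0, p1, omega0, omega1, omega2 (K = 3)
    "M7": 2,   # p, q of the beta distribution
    "M8": 4,   # p0, p, q, omega_s
    "M8A": 3,  # p0, p, q; omega_s fixed at 1
    "M8B": 4,  # p0, p, q, omega_s (> 1 allowed)
}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of chi-square with integer df (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus sum_r sqrt(2/pi) e^(-x/2) x^(r-1/2) / (2r-1)!!
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def site_model_lrt(null: str, alt: str, stat: float,
                   boundary_mixture: bool = False) -> Tuple[int, float]:
    """Return (d.f., P) for a site-model LRT with statistic 2*(l_alt - l_null)."""
    df = N_OMEGA_PARAMS[alt] - N_OMEGA_PARAMS[null]
    p = chi2_sf(stat, df)
    if boundary_mixture:
        # 50:50 mixture of a point mass at 0 and chi-square(1): halve P.
        if df != 1:
            raise ValueError("the mixture null is implemented for 1 d.f. only")
        p /= 2.0
    return df, p


for null, alt, stat in [("M1", "M2", 43.8), ("M1", "M3", 48.56),
                        ("M7", "M8", 9.62), ("M8A", "M8B", 5.3)]:
    df, p = site_model_lrt(null, alt, stat)
    print(f"{null} vs. {alt}: d.f. = {df}, P = {p:.2g}")
```

    The loop uses the $2\Delta\ell$ values reported in the Results. It reproduces the reported degrees of freedom and gives P values of about $3 \times 10^{-10}$, $7 \times 10^{-10}$, and 0.008 for the first three comparisons. For M8A vs. M8B, $\chi^2_1$ gives about 0.02 and the mixture null about 0.01.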

    Expression Analysis

    RNA was isolated from four different tissues: leaves, stems, flowers, and seeds. We sampled flowers at different stages of development, including closed flower buds and fertilized flowers with outgrowing siliques. RNA was extracted using a standard protocol with cell lysis in TRIZOL reagent, purification with phenol-chloroform, and ethanol precipitation. For RT-PCR, the Omniscript kit (Qiagen, Hildesheim, Germany) was used. The cDNA was generated with an oligo-dT primer (VT18) and Omniscript Reverse Transcriptase. PCR amplification was done either with a pair of gene-specific primers or with a gene-specific primer combined with a 3' RACE primer (dT17-Ro-Ri: ATACCTCTAACCATCACCGAATTCGACTCACTATAGGGATTTTTTTTTTTTTTTT). A standard PCR protocol was used (cycling scheme: 4 min at 94°C; 40 cycles of 30 s each at 94°C, X°C, and 72°C; final extension for 7 min at 72°C). X denotes the annealing temperature, which differed for every primer pair. Primers were designed to span an intron so that PCR products derived from cDNA could be distinguished from products derived from contaminating genomic DNA. The PCR products were analyzed on an agarose gel, and the presence or absence of bands was scored.
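    The logic behind intron-spanning primers can be sketched as a band classifier. A product amplified from genomic DNA is longer than the cDNA product by exactly the intron length. The 186-bp intron length of At5g07530 is taken from the Results. The 400-bp cDNA amplicon, the 25-bp sizing tolerance, and the function names are hypothetical.

```python
from typing import Tuple


def expected_band_sizes(cdna_amplicon_bp: int, intron_bp: int) -> Tuple[int, int]:
    """Expected product sizes for a primer pair that spans one intron.

    Spliced mRNA lacks the intron, so genomic DNA yields a product that is
    longer by exactly the intron length.
    """
    return cdna_amplicon_bp, cdna_amplicon_bp + intron_bp


def classify_band(observed_bp: int, cdna_amplicon_bp: int, intron_bp: int,
                  tolerance_bp: int = 25) -> str:
    """Label a gel band as 'cDNA', 'genomic', or 'unexplained'."""
    if intron_bp <= 2 * tolerance_bp:
        # The two expected bands would overlap within the sizing tolerance.
        raise ValueError("intron too short to separate cDNA and genomic products")
    cdna_bp, genomic_bp = expected_band_sizes(cdna_amplicon_bp, intron_bp)
    if abs(observed_bp - cdna_bp) <= tolerance_bp:
        return "cDNA"
    if abs(observed_bp - genomic_bp) <= tolerance_bp:
        return "genomic"
    return "unexplained"


# Hypothetical 400-bp cDNA amplicon spanning the 186-bp intron of At5g07530.
for band in (405, 590, 1000):
    print(band, classify_band(band, cdna_amplicon_bp=400, intron_bp=186))
```

    The length check matters most for short introns. With a small intron, the cDNA and genomic bands could not be told apart on an agarose gel.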

    Results

    Sequencing of the Oleopollenin Cluster from Arabidopsis lyrata and Boechera drummondii

    We sequenced and annotated the region that spans the oleopollenin cluster in A. lyrata and B. drummondii. Figure 1 shows a schematic comparison of the oleopollenin region from A. thaliana, A. lyrata, B. drummondii, and Brassica oleracea. The A. thaliana sequence was obtained from GenBank accession AL163912 and the B. oleracea sequence from accession AY028608 (Mayfield et al. 2001). The single-copy genes At5g07500 and At5g07610 were used to define the boundaries of the cluster because they are colinear across species. The length of this region is 34,615 bp in A. thaliana, 33,919 bp in A. lyrata, and 31,901 bp in B. drummondii. The comparison among the four species shows variation in copy number and in the length of coding sequences (see table 1 in the Supplementary Material online). Gene number is conserved between A. thaliana and A. lyrata but not in B. drummondii. In B. drummondii, the first two oleopollenin genes (At5g07510 and At5g07520) are missing, and only the first exon of At5g07600 appears to be present. Because of the rapid sequence divergence and the length differences, it was difficult in both species to identify the second exon of the oleopollenin genes by sequence alignment alone. For this reason, the exon-intron structure and the correct coding sequence of all members of the oleopollenin family in A. lyrata and B. drummondii were determined by RT-PCR with intron-spanning primers and sequencing of the resulting PCR products. The exon-intron structure is conserved among members of the cluster, but a comparison of intron and exon lengths revealed substantial differences. Introns and coding sequences are approximately the same size in A. thaliana and A. lyrata. In B. drummondii, by contrast, introns are significantly longer and coding sequences shorter than in A. thaliana (see table 1 in the Supplementary Material online). The largest differences were observed between At5g07530 (AtGRP17) and Bd5g07530. At5g07530 has an intron length of 186 bp and a coding sequence (CDS) length of 1,632 bp; Bd5g07530 has an intron length of 548 bp and an ORF length of 1,023 bp. The shorter coding sequences are mostly due to size differences in the second exon, which encodes the highly repetitive glycine/proline-rich domain. Despite these length differences, the relative proportions of the most frequent amino acids in this domain (Gly, Ser, Pro, and Lys) are roughly the same among orthologs (data not shown).
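    The length comparison and the composition check can be sketched in a few lines. The CDS lengths of 1,632 and 1,023 bp are from the text and correspond to 544 and 341 codons, about 37% fewer codons in Bd5g07530. The text does not state whether the stop codon is counted. The repeat segments in the usage lines are hypothetical and show how two sequences of different lengths can share the same composition.

```python
from collections import Counter
from typing import Dict


def codon_count(cds_bp: int) -> int:
    """Number of codons in a coding sequence whose length is a multiple of 3."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds_bp // 3


def composition(protein: str, residues: str = "GSPK") -> Dict[str, float]:
    """Fraction of selected residues (default Gly, Ser, Pro, Lys) in a protein."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in residues}


# CDS lengths of the AtGRP17 orthologs reported above.
at_codons = codon_count(1632)  # At5g07530
bd_codons = codon_count(1023)  # Bd5g07530
print(at_codons, bd_codons, round(1 - bd_codons / at_codons, 2))

# Hypothetical repeat-rich segments: different lengths, same composition.
print(composition("GSGPK" * 6))
print(composition("GSGPK" * 4))
```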

    FIG. 1. Schematic outline of the oleopollenin gene cluster in Arabidopsis thaliana and close relatives. Oleopollenin genes are indicated by gray boxes. Arrows indicate significant relationships based on the phylogenetic analysis of the oleosin-like domain as shown in figure 4. Dashed arrows indicate evolutionary relationships with weak bootstrap support in the DNA parsimony analysis

    Table 1 Values of $\omega$ Based on Pairwise Comparisons Between A. thaliana and A. lyrata.

    Three additional genes without sequence similarity to the oleopollenins are located between the oleopollenin genes At5g07560 and At5g07600. The first of these, At5g07570, was annotated as encoding a glycine-rich, repetitive protein; it is present in A. lyrata but absent in B. drummondii. The next gene, At5g07580, is conserved in A. lyrata and B. drummondii, but in A. lyrata a substitution leads to a premature stop codon in the N-terminal region of the protein. At5g07590 encodes a WD-repeat-like protein and is conserved in all three species.

    Analysis of Oleopollenin Gene Family Expression in Different Tissues

    Expression analysis by RT-PCR confirmed that oleopollenin genes are expressed in the flowers of A. thaliana, A. lyrata, and B. holboellii, a very close relative of B. drummondii (Koch, Bishop, and Mitchell-Olds 1999). B. holboellii is predominantly apomictic (Koch, Dobes, and Mitchell-Olds 2003). In this species, we analyzed expression in individuals that differed in ploidy level (fig. 2) and did not detect any difference among them. This observation suggests that the function of these proteins is maintained in individuals with different ploidy levels. None of the genes was expressed in any other adult tissue in the species surveyed.

    FIG. 2. Expression of four oleopollenin genes in eight individuals of the predominantly apomictic species Boechera holboellii. The ploidy level as determined by fluorocytometry is indicated on the left side of the panel. Two individuals were analyzed for every ploidy level, except for Bd5g07560. Intron-spanning primers were used to detect contamination with genomic DNA, which appears as bands of higher molecular weight on the panel

    Phylogenetic Reconstruction

    For the phylogenetic analysis of the gene family, only the oleosin-like region was used (fig. 3) because the second exon could not be aligned reliably between distant species. We obtained nearly identical tree topologies with different reconstruction methods (fig. 4). Orthology-paralogy relationships between the homologs from A. thaliana, A. lyrata, and B. drummondii could be resolved unambiguously because of high bootstrap support values. The only variation in the tree topology concerned the BoGRP3 gene. When DNA sequences were used for reconstruction, this gene formed a node with the BoGRP4/BoGRP5 group; when protein sequences were used, it was grouped with the At5g07530-related genes.

    FIG. 3. Alignment of the amino acid sequences of the oleosin domain from A. thaliana, A. lyrata, Boechera drummondii, and Brassica oleracea. The gene nomenclature is explained in Materials and Methods

    FIG. 4. Majority-rule phylogeny of the oleosin-like domain of oleopollenin genes from A. thaliana (At), A. lyrata (Al), Boechera drummondii (Bd), and Brassica oleracea (Bo). Support values from the Tree-Puzzle analysis and bootstrap values from the DNA parsimony analysis are shown. Values are shown only for nodes with at least one value > 50. The tree topology was obtained with the dnapars program of the PHYLIP package. The branch lengths were estimated with the codeml program of the PAML package using the one-ratio model (M0), which assumes equal dN/dS ratios in all branches. Bold branches result from gene duplication events, and thin branches result from speciation events. Note that the tree is unrooted and that only its topology is used for comparing different likelihood models

    Mapping the phylogenetic tree onto the gene cluster shows that the oleopollenin genes from A. lyrata and B. drummondii are colinear with their orthologs in A. thaliana (fig. 1). The genes at both ends of the cluster (At5g07510, At5g07520, and At5g07600) form a subclade with strong bootstrap support, which indicates a common origin through relatively recent duplication events. B. drummondii lacks orthologs of At5g07510 and At5g07520. However, we found short sequence stretches with weak but significant similarity to these genes in B. drummondii. We therefore hypothesize that both genes were present in the common ancestor of A. thaliana, A. lyrata, and B. drummondii and were deleted in the lineage leading to B. drummondii.

    Not all of the B. oleracea genes could be assigned to orthologs in A. thaliana. We observe colinearity between BoGRP1 and BoGRP2 and the corresponding A. thaliana genes. Despite low bootstrap support, BoGRP3 appears to be the ortholog of At5g07530 because of its position in the cluster (fig. 1). BoGRP4 and BoGRP5 appear to have arisen by duplication of BoGRP3 after the B. oleracea lineage split from the common ancestor it shares with Arabidopsis and Boechera. In contrast, At5g07510, At5g07520, At5g07600, and possibly At5g07530 appear to have originated by duplication of a common ancestor in the Arabidopsis/Boechera lineage.

    A comparison of the gene order shown in figure 1 and the phylogenetic relationships in figure 4 suggests that after each gene duplication, the duplicate tends to be next to the original copy, a pattern that can be expected if unequal crossing-over is the main driving force for the gene family expansion. With the exception of the duplication that generated At5g07600, all other gene duplications appear to conform to this pattern.
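    The adjacency comparison can be sketched as a distance calculation between the members of each duplicate pair along the cluster. The gene order is the A. thaliana order shown in figure 1. The two duplicate pairs are hypothetical placeholders for sister pairs read off a gene tree; the actual pairs follow from figure 4. The function name is also hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Gene order in the A. thaliana cluster (fig. 1), including the three
# non-oleopollenin genes At5g07570-At5g07590.
AT_ORDER = ["At5g07510", "At5g07520", "At5g07530", "At5g07540", "At5g07550",
            "At5g07560", "At5g07570", "At5g07580", "At5g07590", "At5g07600"]
NON_OLEOPOLLENIN = ["At5g07570", "At5g07580", "At5g07590"]


def duplicate_distances(order: Sequence[str],
                        pairs: Sequence[Tuple[str, str]],
                        skip: Sequence[str] = ()) -> Dict[Tuple[str, str], int]:
    """Distance, in genes, between the two members of each duplicate pair.

    Genes listed in `skip` are removed first, so a distance of 1 means the
    duplicate sits next to its sister copy among the remaining genes.
    """
    skip_set = set(skip)
    kept = [g for g in order if g not in skip_set]
    pos = {g: i for i, g in enumerate(kept)}
    return {(a, b): abs(pos[a] - pos[b]) for a, b in pairs}


# Hypothetical sister pairs, for illustration only.
pairs = [("At5g07510", "At5g07520"), ("At5g07520", "At5g07600")]
for pair, d in duplicate_distances(AT_ORDER, pairs, NON_OLEOPOLLENIN).items():
    print(pair, d, "adjacent" if d == 1 else "not adjacent")
```

    Under unequal crossing-over, most distances would be expected to equal 1. A pair involving At5g07600 is the kind of exception described above.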

    Coding Sequence Divergence

    The comparison of the coding sequences revealed a rapid sequence divergence and substantial length differences in the glycine/proline-rich domain of orthologous genes (fig. 5), which made alignments between B. oleracea and the other species impossible. Alignments between A. thaliana, A. lyrata, and B. drummondii were possible but not unambiguous in some regions.

    FIG. 5. Schematic protein sequence alignment of At5g07530 (AtGRP17). Every vertical line corresponds to an amino acid residue. The sequence from A. thaliana was taken as the reference. Amino acid residues that are identical (i.e., conserved) in A. lyrata or B. drummondii are shown in black, and nonconserved residues in gray. The white regions are alignment gaps due to indels. The figure was created with the TeXshade package (Beitz 2000)

    To test the hypothesis that divergent selection may drive the evolution of the oleopollenin genes, we calculated values of $\omega$ for pairwise alignments between A. thaliana and A. lyrata (table 1). Estimates of $\omega$ for complete coding sequences are highly variable among paralogs, which indicates different levels of evolutionary constraint. One gene (At5g07510) had $\omega > 1$, but this value was not significantly different from $\omega = 1$ in an LRT. When the two exons were analyzed separately, they showed highly variable values of dN, dS, and $\omega$ among paralogs (table 1). Two genes had $\omega > 1$ in the first exon and one gene in the second exon, but none of these values was significantly different from $\omega = 1$ in an LRT. There was no correlation between codon number and the values of $\omega$, dN, or dS, suggesting that other factors cause the observed differences in sequence divergence.

    We also tested for different rates of evolution in the oleosin-like domain using site-specific and branch models (table 2). We first tested the hypothesis that positive selection has acted on particular sites in the oleosin-like domain during the evolution of this gene family by comparing different models in LRTs. All comparisons involving models of selection were significant: M1 vs. M2 ($2\Delta\ell = 43.8$, d.f. = 2, $P = 3 \times 10^{-10}$), M1 vs. M3 ($2\Delta\ell = 48.56$, d.f. = 4, $P = 7 \times 10^{-10}$), M7 vs. M8 ($2\Delta\ell = 9.62$, d.f. = 2, $P = 0.008$), and M8A vs. M8B ($2\Delta\ell = 5.3$, d.f. = 1, $P = 0.01$). All models that allow for sites under selection identify at least three sites with high posterior probabilities ($P \geq 0.95$).
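    The posterior probabilities used to flag sites can be sketched with a naive empirical Bayes calculation of the kind described by Nielsen and Yang (1998) and Yang et al. (2000). The posterior probability of each $\omega$ class at a site is its proportion times the site likelihood under that class, normalized over all classes. The class proportions, the $\omega$ values, the per-class site log-likelihoods, and the function names below are hypothetical; in practice these quantities come from the fitted codeml model.

```python
import math
from typing import List, Sequence, Tuple


def class_posteriors(proportions: Sequence[float],
                     log_site_liks: Sequence[float]) -> List[float]:
    """Posterior probability of each omega class at one site (naive empirical Bayes).

    P(class k | site) = p_k f(x | omega_k) / sum_j p_j f(x | omega_j),
    computed on the log scale because site likelihoods are very small.
    All proportions must be strictly positive.
    """
    log_joint = [math.log(p) + lf for p, lf in zip(proportions, log_site_liks)]
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def selected_sites(proportions: Sequence[float], omegas: Sequence[float],
                   log_liks_by_site: Sequence[Sequence[float]],
                   cutoff: float = 0.95) -> List[Tuple[int, float]]:
    """(site, posterior) for sites whose posterior for the omega > 1 classes is >= cutoff."""
    positive = [k for k, w in enumerate(omegas) if w > 1.0]
    hits = []
    for site, log_liks in enumerate(log_liks_by_site, start=1):
        post = class_posteriors(proportions, log_liks)
        p_sel = sum(post[k] for k in positive)
        if p_sel >= cutoff:
            hits.append((site, p_sel))
    return hits


# Hypothetical M2-like estimates (omega0 = 0, omega1 = 1, omega2 free) for two sites.
props = [0.7, 0.2, 0.1]
omegas = [0.0, 1.0, 3.5]
sites = [[math.log(1e-7), math.log(1e-6), math.log(1e-4)],
         [math.log(1e-3), math.log(5e-4), math.log(1e-4)]]
print(selected_sites(props, omegas, sites))
```

    In this toy example, only the first site passes the 0.95 cutoff, with a posterior of about 0.97 for the $\omega > 1$ class.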

    Table 2 Maximum Likelihood Parameter Estimates of the Oleosin-Like Domain.

    Comparisons of branch models revealed no significant differences between lineages in the gene phylogeny. We first compared the one-ratio (M0) model with the free-ratio model. Although the free-ratio model had a higher likelihood, the difference was not significant ($2\Delta\ell = 56.15$, d.f. = 43, $P = 0.086$). We then compared the one-ratio model with a two-ratio model that assigns branches to two groups: those resulting from speciation and those resulting from gene duplication (fig. 4). The LRT again was not significant ($2\Delta\ell = 11.24$, d.f. = 43, $P = 1.0$). In a separate set of analyses, we also applied LRTs of branch models to the complete sequences of genes At5g07530, At5g07540, At5g07550, and At5g07560 from A. thaliana, A. lyrata, and B. drummondii. None of the four comparisons was significant, providing no evidence for rate differences among lineages in these genes (results not shown).

    Discussion

    The comparative sequence analysis revealed that the oleopollenin gene family evolves very rapidly within the Brassicaceae. Mechanisms of evolution include (1) gene duplication and loss in different lineages, (2) frequent insertions and deletions in the coding sequences, and (3) a large number of amino acid replacements. Our results extend the findings of Mayfield et al. (2001). Oleopollenins are highly variable not only within species but also between closely related species such as A. lyrata and B. drummondii. In addition, our ML analysis of synonymous and nonsynonymous divergence indicates that at least some of these changes result from positive selection acting on these genes.

    Gene duplication has long been considered an important mechanism for the evolution of functional divergence (Ohno 1970). The evolutionary analysis of the oleopollenin gene cluster showed that it is very dynamic with respect to copy number. The cluster consists of seven genes in both A. thaliana and A. lyrata, four in B. drummondii, and five in B. oleracea, all of which are expressed in flower tissue. The phylogenetic reconstruction based on the 71 codons of the hydrophobic oleosin-like domain helped to clarify the orthology-paralogy relationships of oleopollenin genes among species. There is strong evidence for independent duplication events in the lineages leading to Arabidopsis and Brassica. In A. thaliana and A. lyrata, these include At5g07510, At5g07520, and At5g07600. In B. oleracea, BoGRP3, BoGRP4, and BoGRP5 appear to have originated by duplication in the lineage leading to that species. The relationships among the paralogous groups could not be reconstructed with high confidence because of the small number of codons available for the phylogenetic analysis and the weak bootstrap support. The small number of oleopollenin genes in B. drummondii is interesting because it appears to be the result of gene loss. In B. drummondii, we found a short region with weak sequence similarity to At5g07520 and to the first, but not the second, exon of At5g07600. None of these gene fragments appears to be expressed. The available data suggest a gene loss in the lineage leading to B. drummondii, but additional sequence data from closely related species are necessary to resolve this issue. Both selfing and apomixis occur in the B. drummondii/B. holboellii species complex (Roy 1995). It is unclear at this stage whether the small oleopollenin family and the shorter coding sequences are related phenomena that may indicate a degeneration of this gene family in that complex. However, our expression analysis in B. drummondii and the predominantly apomictic sibling species B. holboellii indicates that the remaining four genes are expressed, and presumably functional, in these species. Most B. holboellii pollen is infertile (Koch, Dobes, and Mitchell-Olds 2003), so the expression of the oleopollenin genes appears to be unaffected by apomixis.

    In their comparison of intraspecific variation in the oleopollenin gene family, Mayfield et al. (2001) already observed that insertions and deletions frequently occur in the second exon of the oleopollenins. We observe the same pattern in the interspecific comparisons, particularly between A. thaliana and B. drummondii, where up to one-third of the amino acid residues are missing in the latter species. The loss of length results from fewer copies of short amino acid repeat motifs in the second exon, but the overall amino acid composition does not differ significantly between species. These indels may contribute to the generation of species-specific variants, but in that case the observed high level of intraspecific polymorphism would not be expected. Alternatively, the indels may simply reflect neutral divergence resulting from a propensity for indel mutations in repetitive coding sequences. However, none of the observed insertions/deletions causes a frameshift, which suggests that the oleopollenin genes evolve under purifying selection. No currently available test of selection takes indels into account. Functional studies are therefore necessary to investigate the biological consequences of indels in oleopollenin proteins.
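    The frameshift criterion can be sketched as a scan of a pairwise DNA alignment for gap runs whose length is not a multiple of 3. This is a simplification, assumed for illustration. It treats each gap run independently, so two nearby compensating frameshifts would both be reported even though the downstream frame is restored. The toy alignment and the function names are hypothetical.

```python
from typing import List, Tuple


def gap_runs(aligned: str) -> List[Tuple[int, int]]:
    """(start, length) of every run of '-' characters in an aligned sequence."""
    runs, start = [], None
    for i, ch in enumerate(aligned):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(aligned) - start))
    return runs


def frameshifting_gaps(seq_a: str, seq_b: str) -> List[Tuple[str, int, int]]:
    """Gap runs in a pairwise DNA alignment whose length is not a multiple of 3."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [(name, start, length)
            for name, seq in (("a", seq_a), ("b", seq_b))
            for start, length in gap_runs(seq)
            if length % 3 != 0]


ref = "ATGGGTTCAGGTCCAAAA"
in_frame = "ATG---TCAGGTCCA---"  # two 3-bp gaps: reading frame preserved
shifted = "ATGG-TTCAGGTCCAAAA"   # one 1-bp gap: frameshift
print(frameshifting_gaps(ref, in_frame))
print(frameshifting_gaps(ref, shifted))
```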

    The analysis of sequence conservation and divergence revealed several patterns. First, although the oleopollenins are among the most rapidly evolving proteins currently known from A. thaliana when compared with A. lyrata (Wright, Lauga, and Charlesworth 2002; Barrier et al. 2003), it is also evident that these proteins evolve under purifying selection. This is true not only for the conserved oleosin-like domain but also for the glycine/proline-rich domains, which appear to be the functional protein domains in the mature pollen (Tind et al. 1998; Hernández-Pinzón et al. 1999). Despite the numerous insertions/deletions in the second exon, there is sufficient sequence conservation to allow an alignment between A. thaliana and B. drummondii. Second, both synonymous and nonsynonymous divergence vary among paralogs, but this variation is not explained by any correlation with GC content or codon usage bias. Some of the pairwise comparisons yielded dN/dS ratios larger than 1, but none differed significantly from the neutral expectation ω = dN/dS = 1 in a likelihood ratio test (LRT). However, a test of selection based on the dN/dS ratio of pairwise sequence comparisons has little power to detect selection (Endo, Ikeo, and Gojobori 1996; Sharp 1997; Akashi 1999; Crandall et al. 1999), because many proteins contain a high proportion of highly conserved amino acids and may evolve under positive selection at only a subset of sites, as may be the case in the glycine-rich domains of the oleopollenin proteins.
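    Why a gene-wide pairwise ratio can miss site-restricted positive selection can be illustrated with a toy calculation. Assuming, for simplicity, that synonymous rates and nonsynonymous opportunities are equal across codons, the gene-wide dN/dS is approximately the proportion-weighted mean of the per-site ratios ω. The site classes below are invented for illustration and are not estimates for the oleopollenins.

```python
def mean_omega(site_classes):
    """Approximate gene-wide dN/dS as the weighted mean of per-class omega.

    site_classes: list of (proportion_of_codons, omega) pairs whose
    proportions sum to 1. Assumes (hypothetically) equal synonymous rates
    and equal nonsynonymous opportunity per codon across classes.
    """
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


# Hypothetical protein: 90% strongly constrained codons (omega = 0.1)
# and 10% of codons under positive selection (omega = 3.0).
classes = [(0.9, 0.1), (0.1, 3.0)]
print(round(mean_omega(classes), 2))  # 0.39
```

    Even though one codon in ten evolves with ω = 3, the averaged ratio stays well below 1, so a pairwise comparison would not suggest positive selection; this is the motivation for site-based likelihood models.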

    The ML analysis of the oleosin-like domain suggests the existence of positive selection in this domain, because all models that allow for positive selection fit the data significantly better than their respective neutral models. Most amino acid residues identified as being under selection map to the region of the oleosin-like domain that is probably immersed in the triacylglycerol interior of the tapetal oil bodies (Alexander et al. 2002). However, because the 3D structure of the oleosin domain is hypothetical, conclusions about the functional effects of selected amino acid replacements must remain speculative. Rates of divergence do not differ among the lineages of the gene genealogy, as indicated by nonsignificant LRTs of branch models (table 2). In addition, accelerated rates of amino acid evolution do not seem to be associated with gene duplication events, as indicated by the comparison of the two-ratio model with the one-ratio model. This finding contrasts with other studies that showed accelerated evolution after gene duplication (Ohta 1994; Li 1995) and may result from the low power of such tests for short sequences such as the oleosin-like domain. We were also interested in whether differences in breeding system affect rates of evolution, because selection pressures on reproductive genes may be expected to differ between outcrossing and selfing species. In particular, we expected more rapid evolution in the lineage leading to A. lyrata, because evolutionary theory predicts that natural selection should be more efficient in outcrossing than in selfing species (reviewed by Charlesworth and Wright 2001). Comparison of the complete coding sequences of A. thaliana, A. lyrata, and B. drummondii provided no support for this hypothesis: the observed differences in lineage-specific dN and dS between A. thaliana and A. lyrata (using B. drummondii as the outgroup) are not significant.
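    The model comparisons discussed here are likelihood ratio tests: twice the log-likelihood difference between two nested models is compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters (Yang 1998; Yang et al. 2000). The sketch below uses only the Python standard library and handles the df = 1 case (for example, a two-ratio versus one-ratio branch model) and the df = 2 case (for example, a site-model pair such as M7 versus M8). The function names and the log-likelihood values are hypothetical placeholders, not the values behind table 2.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi2_2 is exponential with mean 2
        return math.exp(-x / 2.0)
    raise NotImplementedError("this sketch only handles df = 1 or 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for two nested models.

    lnl_alt is the log-likelihood of the richer model, e.g. one that adds
    a class of sites with omega > 1 to a neutral or nearly neutral model.
    """
    # Clamp at zero: optimizer noise can make the richer model look
    # marginally worse even though it nests the null model.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a site-model pair (df = 2).
stat, p = likelihood_ratio_test(lnl_null=-1250.4, lnl_alt=-1243.1, df=2)
print(f"2dlnL = {stat:.2f}, p = {p:.4f}")
```

    For other degrees of freedom, a general χ² survival function (e.g., from SciPy) would replace `chi2_sf`.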

    In summary, our analyses provide additional evidence that the oleopollenins are a very rapidly evolving gene family with a high degree of evolutionary dynamics. Although we were able to show that positive Darwinian selection plays a role in the evolution of this gene family, several questions remain unanswered and require further investigation. The main obstacle to a better understanding of the evolutionary forces acting on the oleopollenin gene family is that the molecular function of the oleopollenins in the mature pollen grain is unknown. If, on the one hand, the pollenin domains act as ligands for receptors on the pistil and are involved in species recognition, then the rapid sequence divergence and numerous indels observed in some of the oleopollenins may contribute to the species specificity of such interactions. Similar patterns of sequence evolution have been observed in other reproductive proteins (Swanson and Vacquier 2002a, 2002b). On the other hand, if their main function is to maintain the structural integrity of the pollen coat, most sequence variation may be essentially neutral and not primarily involved in recognition between pollen and pistil. This explanation is consistent with the limited species specificity of the pollen-stigma adhesion process (Heizmann, Luu, and Dumas 2000) and with the observation of numerous amino acid and indel substitutions within species (Mayfield et al. 2001).

    Further studies are needed to improve our understanding of the potential role of oleopollenin genes in species recognition. These include functional approaches that evaluate the phenotypic effects of sequence variation in these genes, and evolutionary studies that compare levels of intraspecific and interspecific polymorphism in species with different breeding systems. For example, the hypothesis that genes involved in species recognition should have a lower level of polymorphism in an outbreeding species (e.g., A. lyrata) than in a selfing species (e.g., A. thaliana) needs to be tested. If these studies confirm the non-neutral evolution of oleopollenin genes, these genes may be excellent candidates for investigating the spatial scale, time frame, and demographic mechanisms of positive selection in plants.

    Acknowledgements

    This work was funded by Human Frontier Science Program grant RGY0055/2001-M to Z.Y. and K.S. and by an Emmy Noether Fellowship of the DFG (German Science Foundation) to K.S. We are grateful to Antje Figuth and Domenica Schnabelrauch for help with the sequencing and to Jürgen Kroymann and Heiko Vogel for discussions. K.S. thanks Charles Aquadro (Cornell University) for discussions and for his hospitality during the writing of this article.

    Literature Cited

    Akashi, H. 1999. Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.

    Alexander, L., R. Sessions, A. Clarke, A. Tatham, P. Shewry, and J. Napier. 2002. Characterization and modelling of the hydrophobic domain of a sunflower oleosin. Planta 214:546-551.

    Barrier, M., C. Bustamante, J. Yu, and M. Purugganan. 2003. Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163:723-733.

    Beitz, E. 2000. TeXshade: shading and labeling of multiple sequence alignments using LaTeX2e. Bioinformatics 16:135-139.

    Bielawski, J., and Z. Yang. 2001. Positive and negative selection in the DAZ gene family. Mol. Biol. Evol. 18:523-529.

    Birney, E., J. Thompson, and T. Gibson. 1996. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 24:2730-2739.

    Brudno, M., C. Do, G. Cooper, M. Kim, E. Davydov, NISC Comparative Sequencing Program, E. Green, A. Sidow, and S. Batzoglou. 2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13:721-731.

    Charlesworth, D., and S. Wright. 2001. Breeding systems and genome evolution. Curr. Opin. Genet. Dev. 11:685-690.

    Civetta, A. 2003. Positive selection within sperm-egg adhesion domains of fertilin: an ADAM gene with a potential role in fertilization. Mol. Biol. Evol. 20:21-29.

    Crandall, K., C. Kelsey, H. Imamichi, H. Lane, and N. Salzman. 1999. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol. Biol. Evol. 16:372-382.

    de Oliveira, D., L. Franco, C. Simoens, J. Seurinck, J. Coppieters, J. Botterman, and M. Van Montagu. 1993. Inflorescence-specific genes from Arabidopsis thaliana encoding glycine-rich proteins. Plant J. 3:495-507.

    Dobzhansky, T. 1937. Genetics and the origin of species. Columbia University Press, New York.

    Doughty, J., H. Wong, and H. Dickinson. 2000. Cysteine-rich pollen coat proteins (PCPs) and their interactions with stigmatic S (incompatibility) and S-related proteins in Brassica: putative roles in SI and pollination. Ann. Bot. 85:161-169.

    Endo, T., K. Ikeo, and T. Gojobori. 1996. Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.

    Ewing, B., L. Hillier, M. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175-185.

    Felsenstein, J. 1989. PHYLIP—phylogeny inference package (version 3.2). Cladistics 5:164-166.

    Gordon, D., C. Abajian, and P. Green. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195-202.

    Heizmann, P., D.-T. Luu, and C. Dumas. 2000. The clues to species specificity of pollination among Brassicaceae. Sex Plant Reprod. 13:157-161.

    Hernández-Pinzón, I., J. Ross, K. Barnes, A. Damant, and D. Murphy. 1999. Composition and role of tapetal lipid bodies in the biogenesis of the pollen coat of Brassica napus. Planta 208:588-598.

    Heslop-Harrison, J., R. Knox, and Y. Heslop-Harrison. 1974. Pollen-wall proteins: exine-held fractions associated with the incompatibility response in cruciferae. Theor. Appl. Genet. 44:133-137.

    Howard, D. 1999. Conspecific sperm and pollen precedence and speciation. Annu. Rev. Ecol. Syst. 30:109-132.

    Koch, M., J. Bishop, and T. Mitchell-Olds. 1999. Molecular systematics and evolution of Arabidopsis and Arabis. Plant Biol. 1:529-537.

    Koch, M., C. Dobes, and T. Mitchell-Olds. 2003. Multiple hybrid formation in natural populations: concerted evolution of the internal transcribed spacer of nuclear ribosomal DNA (ITS) in North American Arabis divaricarpa (Brassicaceae). Mol. Biol. Evol. 20:338-350.

    Koch, M., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis and related genera (Brassicaceae). Mol. Biol. Evol. 17:1483-1498.

    Koch, M., B. Haubold, and T. Mitchell-Olds. 2001. Molecular systematics of the Brassicaceae: evidence from plastidic matK and nuclear Chs sequences. Am. J. Bot. 88:534-544.

    Lee, Y.-H., T. Ota, and V. Vacquier. 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231-238.

    Li, W. 1995. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Lord, E., and S. Russell. 2002. The mechanisms of pollination and fertilization in plants. Annu. Rev. Cell Dev. Biol. 18:81-105.

    Luu, D.-T., D. Mary-Mazars, M. Trick, C. Dumas, and P. Heizmann. 1999. Pollen-stigma adhesion in Brassica spp. involves SLG and SLR1 glycoproteins. Plant Cell 11:251-262.

    Mayfield, J., A. Fiebig, S. Johnstone, and D. Preuss. 2001. Gene families from the Arabidopsis thaliana pollen coat proteome. Science 292:2482-2485.

    Mayfield, J., and D. Preuss. 2000. Rapid initiation of Arabidopsis pollination requires the oleosin-domain protein GRP17. Nature Cell Biol. 2:128-130.

    Metz, E., and S. Palumbi. 1996. Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Mol. Biol. Evol. 13:397-406.

    Murphy, D. 2001. The biogenesis of lipid bodies in animals, plants and microorganisms. Prog. Lipid Res. 40:325-438.

    Murphy, D., and J. Ross. 1998. Biosynthesis, targeting and processing of oleosin-like proteins, which are major pollen coat components in Brassica napus. Plant J. 13:1-16.

    Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.

    Ohta, T. 1994. Further examples of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138:1331-1337.

    Piffanelli, P., and D. Murphy. 1998. Novel organelles and targeting mechanisms in the anther tapetum. Trends Plant Sci. 3:250-253.

    Piffanelli, P., J. Ross, and D. Murphy. 1998. Biogenesis and function of the lipidic structures of pollen grains. Sex Plant Reprod. 11:65-80.

    Robert, L., J. Gerster, S. Allard, L. Cass, and J. Simmonds. 1994. Molecular characterization of two Brassica napus genes related to oleosins which are highly expressed in the tapetum. Plant J. 6:927-933.

    Ross, J., and D. Murphy. 1996. Characterization of anther-expressed genes encoding a major class of extracellular oleosin-like proteins in the pollen coat of Brassicaceae. Plant J. 9:625-637.

    Roy, B. 1995. The breeding systems of six species of Arabis (Brassicaceae). Am. J. Bot. 82:869-877.

    Sambrook, J., and D. Russell. 2001. Molecular cloning. Cold Spring Harbor Press, Cold Spring Harbor, N.Y.

    Schmidt, H., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.

    Schopfer, C., and J. Nasrallah. 2000. Self-incompatibility. Prospects for a novel putative peptide-signaling molecule. Plant Physiol. 124:935-939.

    Schopfer, C., M. Nasrallah, and J. Nasrallah. 1999. The male determinant of self-incompatibility in Brassica. Science 286:1697-1699.

    Sharp, P. 1997. In search of molecular Darwinism. Nature 385:401-404.

    Swanson, W., A. Clark, H. Waldrip-Dail, M. Wolfner, and C. Aquadro. 2001a. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc. Natl. Acad. Sci. USA 98:7375-7379.

    Swanson, W., R. Nielsen, and Z. Yang. 2003. Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20:18-20.

    Swanson, W., and V. Vacquier. 2002a. The rapid evolution of reproductive proteins. Nature Rev. Genet. 3:137-144.

    Swanson, W., and V. Vacquier. 2002b. Reproductive protein evolution. Annu. Rev. Ecol. Syst. 33:161-179.

    Swanson, W., Z. Yang, M. Wolfner, and C. Aquadro. 2001b. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA 98:2509-2514.

    Takayama, S., H. Shiba, M. Iwano, K. Asano, M. Hara, F.-S. Che, M. Watanabe, K. Hinata, and A. Isogai. 2000. Isolation and characterization of pollen coat proteins of Brassica campestris that interact with S locus–related glycoprotein 1 involved in pollen-stigma adhesion. Proc. Natl. Acad. Sci. USA 97:3765-3770.

    Tind, J., S. Wu, C. Ratnayake, and A. Huang. 1998. Constituents of the tapetosomes and elaioplasts in Brassica campestris tapetum and their degradation and retention during microsporogenesis. Plant J. 16:541-551.

    Torgerson, D., R. Kulathinal, and R. Singh. 2002. Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Mol. Biol. Evol. 19:1973-1980.

    Wright, S., B. Lauga, and D. Charlesworth. 2002. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol. Biol. Evol. 19:1407-1420.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.

    Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

    Yang, Z., and J. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

    Yang, Z., R. Nielsen, N. Goldman, and A.-M. Krabbe Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

    Zinkl, G., B. Zwiebel, D. Grier, and D. Preuss. 1999. Pollen-stigma adhesion in Arabidopsis: a species-specific interaction mediated by lipophilic molecules in the pollen exine. Development 126:5431-5440.

    (Manja Schein*, Ziheng Yan)