当前位置: 首页 > 期刊 > 《基因杂志》 > 2003年第2期 > 正文
编号:10585816
Selection on Rapidly Evolving Proteins in the Arabidopsis Genome
http://www.100md.com 《基因杂志》2003年第2期
     a Department of Genetics, North Carolina State University, Raleigh, North Carolina 27695o+!q&, 百拇医药

    b Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695o+!q&, 百拇医药

    c Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdomo+!q&, 百拇医药

    ABSTRACTo+!q&, 百拇医药

    Genes that have undergone positive or diversifying selection are likely to be associated with adaptive divergence between species. One indicator of adaptive selection at the molecular level is an excess of amino acid replacement fixed differences per replacement site relative to the number of synonymous fixed differences per synonymous site ({omega} = Ka/Ks). We used an evolutionary expressed sequence tag (EST) approach to estimate the distribution of {omega} among 304 orthologous loci between Arabidopsis thaliana and A. lyrata to identify genes potentially involved in the adaptive divergence between these two Brassicaceae species. We find that 14 of 304 genes (~ 5%) have an estimated {omega} > 1 and are candidates for genes with increased selection intensities. Molecular population genetic analyses of 6 of these rapidly evolving protein loci indicate that, despite their high levels of between-species nonsynonymous divergence, these genes do not have elevated levels of intraspecific replacement polymorphisms compared to previously studied genes. A hierarchical Bayesian analysis of protein-coding region evolution within and between species also indicates that the selection intensities of these genes are elevated compared to previously studied A. thaliana nuclear loci.

    THE genetic architecture of species differences has been the subject of intense study in the last few years (ORR and COYNE 1992 ; HAAG and TRUE 2001 ; WU 2001 ). There has been a concerted effort to identify genes responsible for adaptive differences between species to examine the genetic mechanisms that accompany evolutionary diversification (HAAG and TRUE 2001 ) and even speciation (WU 2001 ). Adaptive morphological and physiological differences between species should leave a signature of positive selection at the molecular level and permit an analysis of evolutionary divergence at both the molecular genetic and the phenotypic levels (HAAG and TRUE 2001 ). By investigating loci whose sequences have been shaped by positive selection, it may be possible to unravel the evolutionary genetic mechanisms that underlie adaptive divergence between species and the origins and evolution of species differences.j2y0, http://www.100md.com

    One indicator of adaptive selection at the molecular level is an excess of amino acid replacement fixed differences per replacement site relative to the number of synonymous fixed differences per synonymous site ( = Ka/Ks; LI et al. 1981 , LI et al. 1985 ; HUGHES and YEAGER 1998 ). Purifying selection on amino acid variation, for example, causes a decrease in the rate of amino acid fixation and thus an inferred {omega} < 1. If most amino acid variation is neutral, such as in pseudogenes, 1 (LI et al. 1981 ). Strong diversifying or positive selection operating on amino acid variation is associated with > 1 (HUGHES and YEAGER 1998 ; ANISIMOVA et al. 2001 ). Empirically, the distribution of {omega} varies radically among different classes of genes: in plant Brassicaceae species, the mean is ~ 0.14 (TIFFIN and HAHN 2002 ), while for many mammalian pseudogenes {omega} has been shown to cluster around 1.0 (BUSTAMANTE et al. 2002A ). In contrast, values of > 1 have been observed in gamete recognition protein-coding genes (SWANSON and VACQUIER 1995 ), loci associated with host-parasite interaction (HUGHES 1991 ), and genes involved in adaptation to specific environments (MESSIER and STEWART 1997 ). The elevated values of {omega} in these latter genes, which encode rapidly evolving proteins, are believed to arise from selection for divergence in protein structure and function.

    Identifying genes with increased values of will be facilitated by evolutionary genomic approaches that permit investigators to sample and compare large numbers of genes between species genomes to search for those loci characterized by rapid evolution (ENDO et al. 1996 ; SWANSON et al. 2001 ; TIFFIN and HAHN 2002 ). Although whole-genome sequences are not generally available from two closely related species, rapidly evolving proteins can be identified using expressed sequence tags (ESTs). An evolutionary EST approach has been used, for example, in demonstrating that genes encoding accessory gland-specific proteins in Drosophila species evolve faster than other loci in the genome, possibly as a result of selection pressures associated with mate choice and intersexual genomic conflict (SWANSON et al. 2001 ).ue, http://www.100md.com

    Our objective is to examine whether an evolutionary EST approach can identify genes with increased selection intensities in the Arabidopsis genome. Expressed sequence tags from developing inflorescences of Arabidopsis lyrata were compared to the whole-genome sequence of the model plant A. thaliana to estimate the distribution of nonsynonymous and synonymous substitution differences between these two Brassicaceae species. Fourteen genes with > 1, which encode rapidly evolving proteins, were identified between these two plant taxa. These genes have accelerated rates of nonsynonymous substitutions that may be associated with adaptive evolution since the divergence of these two species 5.2 MYA (KOCH et al. 2000 ). Molecular population genetic analysis of six of these rapidly evolving protein loci confirms that the selection intensities on protein sequence change in these genes are significantly higher than those in previously studied Arabidopsis nuclear loci.

    MATERIALS AND METHODSwp)&8, 百拇医药

    Isolation and sequencing of expressed sequence tags:wp)&8, 百拇医药

    Seeds from individuals of a population of A. lyrata in Karhumaki, Russia were obtained from Outi Savolainen and Helmi Kuittinen. Total RNA was extracted from the A. lyrata inflorescences using the RNeasy plant mini kit (QIAGEN, Valencia, CA) and a cDNA library constructed in the plasmid vector pCMV-PCR using the PCR cDNA library construction kit (Stratagene, La Jolla, CA). Plasmid DNA from cDNA clones was isolated using the REAL Prep96 BioRobot kit (QIAGEN) with the BioRobot 9600 (QIAGEN) and sequenced from the 5' end using an ABI Prism 3700 96-capillary automated sequencer (Perkin-Elmer, Norwalk, CT). Sequences were edited on the basis of Phred (EWING et al. 1998 ) quality scores, with a Phred scoring threshold of 20. Ambiguous base calls were visually confirmed against the chromatograms. These EST sequences are deposited in GenBank (accession nos. BQ834040, BQ834596 and BQ839827– BQ839830).

    Analyses of ESTs:4i, 百拇医药

    A. thaliana sequences homologous to the high-quality A. lyrata EST sequences were identified by BLAST analysis against the A. thaliana whole-genome coding database found at The Arabidopsis Information Resource database , using a maximum expected value (E) of e-5. The GenBank nonredundant nucleotide sequence database was also searched to find the closest matching A. thaliana genomic bacterial artificial chromosome (BAC) clone sequence. The top matches from each database were visually aligned with their matching A. lyrata EST sequence. Calculations were made on the basis of pairwise comparisons between the A. lyrata EST and A. thaliana coding region genomic DNA sequence. EST-based nonsynonymous and synonymous distances were calculated using the modified NEI and GOJOBORI 1986 method as implemented in MEGA 2.1 (KUMAR et al. 1993 ). A software package was developed to carry out a permutation analysis of nonsynonymous/synonymous substitution ratios for genes incorporating a modified Nei-Gojobori method (ZHANG et al. 1998 ). Since there are several possible sources of sequence error in the experimental acquisition of the EST sequence data, we refer to these estimates of nonsynonymous and synonymous substitutions as K*a and K*s, respectively. In general, and, where {epsilon} is an error term that reflects the empirical error in sequence determination. Sequence comparisons using duplicate ESTs suggest that {epsilon} ~ 0.2% (M. BARRIER, unpublished results), although this may be an overestimate since some of these differences may arise from allelic variation.

    Isolation and sequencing of alleles:4, http://www.100md.com

    Six genes with K*a/ K*s values >1 were chosen for molecular population genetic analysis. Primers were designed to amplify 1- to 2-kb regions of these genes on the basis of the A. thaliana sequences in the selected comparisons (see Table S1 at ). Leaf tissue from 10–13 A. thaliana ecotypes was obtained from single-seed propagated material provided by the Arabidopsis Biological Resource Center (see Table S2 at ). DNA was isolated from these A. thaliana ecotypes as well as 2–5 A. lyrata individuals (see above), using the DNeasy plant mini kit (QIAGEN). PCR of A. thaliana samples was performed with Taq DNA polymerase (Eppendorf, Madison, WI), using standard protocols. A. thaliana samples were sequenced directly via cycle sequencing with primers in both directions. PCR of A. lyrata samples was performed with the error-correcting Tgo polymerase (Roche, Indianapolis), using the manufacturer's amplification protocol. The error rate of error-correcting polymerases is <1 in 7000 bp (OLSEN et al. 2002 ). Amplified A. lyrata products were cloned into pCR4Blunt-TOPO vector using the Zero Blunt TOPO PCR cloning kit (Invitrogen, San Diego). Plasmid miniprep DNA was isolated using the QIAprep miniprep kit (QIAGEN) and sequenced via cycle sequencing. DNA sequencing was conducted with a Prism 3700 96-capillary automated sequencer (Perkin-Elmer). GenBank accession numbers for these population data are AY140430, AY140431, AY140432, AY140433, AY140434, AY140435, AY140436, AY140437, AY140438, AY140439, AY140440, AY140441, AY140442, AY140443, AY140444, AY140445, AY140446 and AY140459, AY140460, AY140461, AY140462, AY140463, AY140464, AY140465, AY140466, AY140467, AY140468, AY140469, AY140470, AY140471, AY140472, AY140473, AY140474, AY140475, AY140476, AY140477, AY140478, AY140479, AY140480, AY140481, AY140482, AY140483, AY140484, AY140485, AY140486, AY140487, AY140488, AY140489, AY140490, AY140491, AY140492, AY140493, AY140494, AY140495, AY140496, AY140497, AY140498, AY140499, AY140500, AY140501, AY140502, AY140503, AY140504, AY140505, AY140506, AY140507, AY140508, AY140509, AY140510, AY140511, AY140512, AY140513, AY140514, AY140515, AY140516, AY140517, AY140518, AY140519, AY140520, AY140521, AY140522, AY140523, AY140524, AY140525, AY140526, AY140527, AY140528, AY140529, AY140530, AY140531.

    Molecular population genetic data analysis:'ez, 百拇医药

    Sequences of A. thaliana and A. lyrata populations were aligned and visually corrected. All polymorphisms were visually checked against chromatographs or via resequencing. Analysis of polymorphism and divergence was carried out using DnaSP 3.5 (ROZAS and ROZAS 1999 ). A. thaliana species-wide silent site nucleotide diversity, (NEI 1987 ), and {theta} (WATTERSON 1975 ) were estimated. The McDonald-Kreitman test (MCDONALD and KREITMAN 1991 ) was performed to test for neutral evolution in the protein-coding region. A hierarchical Bayesian method is utilized to analyze McDonald-Kreitman-type tables for 12 previously studied (BUSTAMANTE et al. 2002B ) and 6 rapidly evolving A. thaliana protein genes to estimate selection coefficients for replacement changes under a Poisson random field model (SAWYER and HARTL 1992 ). Details of the analytical approach utilized here are described in the accompanying Appendix.'ez, 百拇医药

    RESULTS AND DISCUSSION

    Expressed sequence tags in A. lyrata:1#, 百拇医药

    A small collection of ESTs were isolated and sequenced to obtain genes expressed in developing inflorescences in A. lyrata. From a cDNA library of ~ 2800 colonies, 768 clones were sequenced. From this sequence collection, 561 good-quality sequences of at least 200 bp in length were subjected to further analysis. A. thaliana orthologs to these A. lyrata ESTs were identified by BLAST searches of the whole-genome A. thaliana coding sequence database; 78 of the sequences did not match a clear A. thaliana ortholog and were not considered further. Ninety-five duplicate EST matches for A. thaliana coding sequences were also eliminated. The GenBank nonredundant nucleotide database was also searched for A. thaliana BAC clone sequences containing genes homologous to the A. lyrata EST sequences. By aligning the genomic BAC clone sequence with the A. lyrata and A. thaliana coding sequences, the boundaries of noncoding regions were located. After eliminating 84 sequence alignments with <150 bp of coding sequence, 304 unique ESTs remained for further analysis (see Table S3 at ). Although these ESTs represent coding region fragments, we subsequently refer to these as genes.

    The A. lyrata EST sequences were assigned to different functional categories using the classifications of the orthologous A. thaliana sequences from the TAIR database (ARABIDOPSIS GENOME INITIATIVE 2000). Unclassified genes and those whose classifications were ambiguous were not included in this comparison. Of the >25,000 sequences in the A. thaliana coding sequence database, only slightly >4000 have thus far been unambiguously classified (ARABIDOPSIS GENOME INITIATIVE 2000). Eighty-seven of the 304 unique A. lyrata ESTs matched an A. thaliana coding sequence that has yet to be classified. In determining the count for each functional category, multiple categories listed for a single sequence were each counted as an equal fraction of the sample count. On the basis of this analysis, the range of functional categories represented by the 304 unique ESTs appears to be representative of those observed for the entire A. thaliana gene set (see 1).?!s!aq, 百拇医药

    fig.ommitted?!s!aq, 百拇医药

    Figure 1. Comparison of functional classifications of genes in the A. thaliana genome and the A. lyrata evolutionary ESTs.

    Distribution of nucleotide substitution rates between A. thaliana and A. lyrata::^ask1, 百拇医药

    Comparisons of the coding sequences of the 304 ESTs from A. lyrata with the whole genome sequence from A. thaliana allow us to compare the distribution of the rates of nucleotide substitution between these two species. The distributions of both synonymous and nonsynonymous substitutions are shown in 2. The mean length of aligned coding sequences is 111 ± 2.19 codons. The distribution of the number of synonymous nucleotide substitutions ranges from 0.000 to 0.552 synonymous substitutions per synonymous site. The distribution of K*s has one mode between 0.10 and 0.15 and has a long tail. The mean synonymous substitution distance is 0.119 ± 0.004 (see 2A), which is comparable to estimates observed in previous comparisons of A. thaliana/A. lyrata loci. If we assume a divergence time of 5.2 MYA for the two species (KOCH et al. 2000 ), this indicates that the average synonymous mutation rate for Arabidopsis nuclear genes is ~ 1.1 x 10-8 substitutions/site/year.

    fig.ommitted$, 百拇医药

    Figure 2. Distributions of (A) synonymous (K*s) and (B) nonsynonymous (K*a) sequence distances, as well as (C) * estimates, among 304 orthologs between A. thaliana and A. lyrata.$, 百拇医药

    The distribution of the number of nonsynonymous substitutions between the two species differs from that observed for synonymous substitutions. The nonsynonymous distance distribution has a mode of <0.050 nonsynonymous substitutions/nonsynonymous site, and the frequency decreases with increasing nonsynonymous substitution distances until ~ 0.150 (see 2B). The range of K*a is 0.000–0.159 nonsynonymous substitutions/nonsynonymous site, with a mean of 0.025 ± 0.002. As expected, the mean K*a < K*s, which reflects that action of purifying selection that prevents many nonsynonymous mutations from reaching fixation between species. The mean rate of nonsynonymous substitutions in nuclear genes between the two species is 0.24 x 10-8 substitutions/site/year, which is approximately fivefold lower than the average synonymous mutation rate.

    The distribution in evolutionary rates between species assumes that the comparisons are for orthologs between the two species. Identifying the correct orthologs between species will be confounded by gene duplications immediately at or prior to the most recent common ancestor of A. thaliana and A. lyrata, followed by deletion of one of the duplicate gene copies in the A. thaliana genome. It is unclear what the rate of gene duplication/deletion is within these species genomes.@;, 百拇医药

    Variation in selective constraints among Arabidopsis genes:@;, 百拇医药

    The historical action of selection on a gene can be inferred from the relative ratio of nonsynonymous to synonymous substitutions, {omega} = Ka/Ks (LI et al. 1981 , LI et al. 1985 ; HUGHES and YEAGER 1998 ; ANISIMOVA et al. 2001 ). The evolutionary EST data can be used to estimate the distribution of the selection parameter {omega} * between A. thaliana and A. lyrata. The value of is ascertained for each of the 304 orthologous pairs between the two species, corrected for sequencing errors. The estimates of {omega} * range from 0.00 to 2.59; the distribution is similar to that of K*a in that the most genes have a low * value, and the distribution decreases with increasing * (see 2C). The mean value of {omega} *, obtained by bootstrap resampling of K*a and K*s pair values 100,000 times (BUSTAMANTE et al. 2002A ), is 0.213 [95% confidence interval (C.I.): 0.187 * 0.241]. This is higher than previous estimates of {omega} ~ 0.14 for a set of comparisons between A. thaliana and Brassica rapa (TIFFIN and HAHN 2002 ) and 0.18 for four gene comparisons between A. thaliana and A. lyrata (LAWTON-RAUH et al. 1999 ).

    Of the 304 orthologous gene pairs in this evolutionary EST study, 37 genes (12%) have * = 0. These represent genes whose encoded proteins are under very strong selective constraint. At the other extreme, 14 genes (5%) have * values >1, suggesting that these genes may have accelerated rates of nonsynonymous substitutions associated with higher selection intensities on amino acid replacement changes. These loci include genes that encode RNA and zinc-finger helicases, extensin-like proteins, and zinc-finger transcription factors. More than half of these rapidly evolving protein loci (8 out of 14 genes) encode hypothetical or putative proteins of unknown function. Genes with > 1 have a higher mean K*a (0.075 ± 0.011) and a lower mean K*s (0.042 ± 0.009) than genes that comprise the entire EST data set. The increased {omega} estimates for these genes thus stem from having both elevated absolute levels of nonsynonymous substitutions and lower levels of synonymous substitutions.

    It is possible that identifying 14 loci with {omega} * > 1 may not represent genes subject to gene-specific selection mechanisms, but may simply be due to stochastic sampling from a set of 304 loci. To test this possibility, we approximated the distribution of K*a/K*s under the null hypothesis that all genes evolved according to some common evolutionary process such as selection. To approximate this null distribution, 1000 data sets were simulated, each of which consisted of 304 simulated gene pairs. The 304 simulated gene pairs were generated by sampling the aligned A. thaliana and A. lyrata codons from the EST data set without replacement. The lengths of the 304 simulated genes matched the lengths of the 304 genes in the actual EST data set. For every pseudodata set, the * ratio was separately estimated for all 304 simulated genes, and the number of these 304 {omega} * estimates that exceeded 1 was then counted. None of the 1000 simulated data sets yielded as many as 14 genes with {omega} * > 1 (P < 0.001; see 3). In fact, 10 was the highest number of genes with K*a/K*s estimates >1 and this occurred only once. The mean number of genes with {omega} * > 1 under this null hypothesis was 2.121 with a sample standard error of 0.047. This indicates that finding 14 genes with {omega} * > 1 by chance in a set of 304 loci is improbable under the null hypothesis that the action of evolutionary forces is homogenous across all loci.

    fig.ommitted%}m, http://www.100md.com

    Figure 3. Distribution of number of loci with * > 1 in permutation analysis. The mean for 1000 permuted gene data sets and the observed value from the actual observed distribution are indicated.%}m, http://www.100md.com

    Molecular population genetics of Arabidopsis loci that encode rapidly evolving proteins:%}m, http://www.100md.com

    One other approach to confirm the roles of positive or diversifying selection in the evolution of rapidly evolving protein genes is to examine the levels and patterns of nucleotide variation at these loci within and between species (MCDONALD and KREITMAN 1991 ; SCHMID and AQUADRO 2001 ). If the elevated levels of replacement differences between species arise from neutral processes, we should also observe a comparable increase in intraspecific replacement polymorphisms. Moreover, molecular population genetics also provides methods of selection analysis that determine whether genes are evolving according to the predictions of the neutral theory or have been subject to adaptive selection (NIELSEN 2001 ).

    Analysis of within-species nucleotide diversity in A. thaliana was undertaken for six genes identified from the evolutionary EST analysis as having {omega} * > 1 (see 1). These genes have a range of {omega} * from 1.28 to 2.03 and were chosen to encompass the range of high {omega} * values. These sampled genes include the Arabidopsis NAC2 transcriptional activator (XIE et al. 2000 ) and a locus homologous to the human p55.11 protein-coding gene (BOLDIN et al. 1995 ). Four others are hypothetical or putative proteins predicted in the A. thaliana genome annotation and represent genes of unknown function. All these genes are found in the A. thaliana EST sequence database, indicating that they are expressed in the developing plant. The genome annotation of the correct reading frame for one gene (AT2G04410) is ambiguous; we have relied on the original genome annotation to identify the reading frame for this locus and this choice does not significantly affect our results.%fh.ho, 百拇医药

    fig.ommitted

    Table 1. Nucleotide variation levels of Arabidopsis thaliana genes encoding rapidly evolving proteinsoo, 百拇医药

    Alleles of each of these genes were isolated and sequenced from 10–13 ecotypes in A. thaliana and 2–5 individuals from a Russian population of A lyrata. The latter sample was included to provide an interspecific comparison; the sample sizes from the A. lyrata population were too small to permit a meaningful assessment of diversity for these genes in this species. The sequenced portions range in size from ~ 0.4 to 1.6 kb and include exon sequences that encompass the protein-coding regions that display the high * values observed in the evolutionary EST analysis. The number of codons analyzed in these molecular population genetic data sets ranges from 84 to 341 codons. For four of the six genes, the amount of coding sequence assayed in the molecular population genetic analysis was ~ 1.5–6 times greater than the size of the sequenced ESTs. The other two had coding sequence lengths nearly equal to the length in the evolutionary EST analysis. Levels of within-species nucleotide diversity at silent sites, , for these six rapidly evolving protein genes ranged from 0.0003 to 0.0140 in A. thaliana, with a mean of 0.0050 ± 0.0022 (see 1). This is slightly lower but comparable to the mean of ~ 0.007 observed for other previously studied A. thaliana nuclear genes (MIYASHITA et al. 1998 ; PURUGGANAN and SUDDITH 1998 ; AGUADE 2001 ; OLSEN et al. 2002 ). The mean value for silent-site {theta} is 0.0057 ± 0.0023.

    The mean {omega} estimates for the regions sequenced in this population genetic survey are all, except for the unknown gene AT2G04410, <1 (see 1). This reduction in {omega} values compared to those obtained in the EST study may arise from the longer lengths of coding sequences analyzed for most of the genes in the molecular population genetic study and underlying heterogeneities in selective constraint across these loci. All of the {omega} values, however, are greater than the mean for A. thaliana genes. Moreover, the mean Ks from this expanded sequence region is 0.09 ± 0.02 substitutions/site, which is close to the mean for the entire data set (mean ). Permutation analysis indicates that the mean Ks of the genes in the molecular population genetic study is not significantly different from the mean Ks of the EST data set (P < 0.22). Thus, the molecular population genetic analysis is based on sequence information whose synonymous substitution rate is comparable to the mean for A. thaliana genes.

    Elevated levels of fixed replacement differences among rapidly evolving Arabidopsis protein genes:2.!fi^, http://www.100md.com

    The relative levels of within- to between-species polymorphisms in nucleotide sites that encode a gene's products provide information on the selective forces that act in protein-coding regions (MCDONALD and KREITMAN 1991 ; BUSTAMANTE et al. 2002B ). Levels of within-species replacement and synonymous polymorphisms as well as fixed differences between A. thaliana and A. lyrata in six rapidly evolving protein genes are shown in 2. The levels of evolutionary change observed for these six rapidly evolving protein loci can be compared with the levels and patterns of nucleotide variation observed among 12 other previously studied A. thaliana genes (BUSTAMANTE et al. 2002B ). In this study, the latter genes represent a set of Arabidopsis nuclear loci chosen without regard to their rates of protein evolution. The previously studied genes have a mean Ka of 0.020 ± 0.014 and a mean Ks of 0.148 ± 0.014. These genes have values ranging from 0.03 to 0.30.

    fig.ommittedwcr{{, 百拇医药

    Table 2. Replacement and synonymous changes at rapidly evolving A. thaliana geneswcr{{, 百拇医药

    The posterior distributions of the interspecies divergence time, t, between A. thaliana and A. lyrata are comparable between the two gene classes (for the method, see the Appendix). For the previously studied gene class, the mean of the posterior distribution for t1 equaled 8.6 in multiples of twice the effective population size with 95% highest posterior credibility interval (CI) of [6.9 t1 11.2]. Using data for the rapidly evolving protein genes only, the posterior distribution of t2 has mean 9.5 and 95% highest posterior (HP) CI of [7.0 t2 13.9]. Using all of the data, we get 95% HPCI of [7.41 t 11.22] for t with the posterior distribution having a mean of 9.16. The similarity in interspecies divergence time estimates suggests that the data from both gene classes compare loci of similar divergence times and are thus likely orthologous, and not paralogous, between A. thaliana and A. lyrata. This also indicates that the levels of synonymous substitutions between the two gene classes give comparable estimates of divergence time, indicating that the levels of interspecific synonymous divergence are comparable between both gene classes.

    As expected, there is a significant elevation in the levels of fixed replacement differences between A. thaliana and A. lyrata in these six rapidly evolving protein genes. Among these loci, a total of 89 of the 168 coding region differences between these two species (53%) result in amino acid replacements in the encoded proteins. In contrast, only 123 of 373 fixed differences (33%) in previously studied Arabidopsis nuclear genes are replacement differences. The contrast in relative levels of replacement to synonymous fixed differences between these two gene classes is significant (Fisher's exact test, P < 2 x 10-5).p+o:d&d, 百拇医药

    By comparison, the relative levels of within-species replacement to synonymous nucleotide polymorphisms within A. thaliana do not differ significantly between the rapidly evolving protein loci and previously studied nuclear genes. Among the genes in this study, 24 of the 38 intraspecific coding region polymorphisms (63%) are replacement polymorphisms. Among 12 previously studied A. thaliana nuclear genes, 108 of 212 polymorphisms (51%) are replacement changes. The relative levels of within-species replacement to synonymous site polymorphisms are not significantly different between the two gene classes (Fisher's exact test, P < 0.22). These results indicate that while the rapidly evolving protein loci have increased levels of fixed replacement differences this is not accompanied by a significant increase in relative levels of intraspecific replacement polymorphisms.

    Rapidly evolving protein genes display elevated selection intensities in protein-coding regions:k-s%+, 百拇医药

    Selection in a specific protein-coding gene is conventionally detected in a test of homogeneity (the McDonald-Kreitman test) that examines within- and between-species replacement to synonymous nucleotide changes (MCDONALD and KREITMAN 1991 ). Despite the overall increase in between-species fixed replacement differences between A. thaliana and A. lyrata in these six rapidly evolving protein genes, none of these individual genes show evidence of positive selection (Fisher's exact tests, P < 0.10–1.00).k-s%+, 百拇医药

    Although none of these individual genes show evidence of positive selection, each gene contains information regarding the selective forces that act on amino acid replacements (BUSTAMANTE et al. 2002B ). Using the cell entries from a conventional McDonald-Kreitman contingency table it is possible to estimate the four parameters in a Poisson random field model (S for synonymous sites, R for replacement sites, t for interspecies divergence time, and for replacement sites selection coefficient; BUSTAMANTE et al. 2002B ). For a set of McDonald-Kreitman-type tables from the same species pairs, we can also model variation in selection among genes. This information can be analyzed using a hierarchical Bayesian framework to describe the probability distribution of the selection intensity, , for each individual Arabidopsis gene (BUSTAMANTE et al. 2002B ; see Appendix). These selection intensities can be considered as the relative levels of selection on amino acid replacements with respect to synonymous site changes. Variation in selection among genes is modeled as normally distributed with unknown mean, µ, and variance, 2, for both the rapidly evolving protein and the previously studied gene classes.

    The Markov chain Monte Carlo (MCMC) sampling scheme described in the Appendix was used to draw from the joint posterior probability distribution of several parameters in the model given the data in the McDonald-Kreitman tables for all 18 genes. These include the mean and variance parameters for previously studied (µ1, 1) and rapidly evolving protein loci (µ2, 2), the scaled species divergence parameter (t), the vectors of mutation rates at synonymous sites (S) and replacement sites ({theta} R), and the vector of selection coefficients . At completion of the sampling scheme, we have 10,000 quasi-independent vectors for each parameter in the model drawn from the joint probability distribution of the parameters given the data.w^, 百拇医药

    The distribution of the sampled values of for each locus [i(1), i(2), ... , i(10,000) for 1 i 18] within and between the two classes of Arabidopsis genes provides several striking results. The means of the draws for each gene in the class of previously studied Arabidopsis loci ranged from -2.285 to +0.941. Six of these loci have 95% HPCIs that are entirely <0 (see 4). These negative selection intensities suggest that most amino acid replacements are slightly deleterious and persist due to the inbreeding associated with the predominant selfing observed in this species (BUSTAMANTE et al. 2002B ). In contrast, the means of the samples for the rapidly evolving protein loci ranged from -0.823 to 0.856. Three of the six rapidly evolving protein genes (50%) have < 0. Only one of these genes, which encodes a protein of unknown function, has a 95% HPCI <0. Three genes have > 0, although the 95% HPCIs for all of these also encompass 0.

    fig.ommittedf/:, 百拇医药

    Figure 4. Selection intensities of A. thaliana nuclear genes. The open and solid circles are values for previously studied and rapidly evolving A. thaliana protein genes, respectively. The bars indicate the 95% highest posterior credible intervals of estimates. The genes from left to right are AP3, PgiC, PI, AP1, ChiA, CAL, TFL1, FAH1, hypothetical gene (AT1G67140), Adh1, F3H, unknown gene (AT2G04410), putative gene (AT4G15950), hypothetical gene (AT3G22060), LFY, p55.11-like gene(AT4G28470), NAC2, and CHI.f/:, 百拇医药

    The posterior distribution of µ, the average selective effect of amino acid replacement changes for rapidly evolving protein genes, shows a shift in the positive direction (see 5). The previously studied Arabidopsis loci have a posterior mean for the average selective effect of amino acid replacements 1) of -0.9622. We find that the posterior probability that µ1 > 0, P(µ1 0), is 0.02. In contrast, the posterior distribution for µ2 (the average selective effect of amino acid replacement in the rapidly evolving protein genes) has a mean of +0.1201 and P(µ2 ">=" 0) ~ 0.56. The posterior mean of the difference between the average selective effects 1 - µ2) is -1.0823. This analysis indicates that amino acid replacement changes in these rapidly evolving protein genes are more beneficial (or less deleterious) than those found in previously studied nuclear loci.

    fig.ommitted7'wrrpv, http://www.100md.com

    Figure 5. Distribution of selection coefficients for A. thaliana nuclear genes. µ1 (solid line) and µ2 (dotted line) are the selection coefficient distributions for previously studied and rapidly evolving A. thaliana protein loci.7'wrrpv, http://www.100md.com

    Evolution of rapidly evolving protein genes in the genome:7'wrrpv, http://www.100md.com

    Approximately 5% of the inflorescence-expressed genes examined in this evolutionary study have values of * > 1 between A. thaliana and A. lyrata orthologs and are potential candidates for genes associated with adaptive divergence between these two species. The high proportion of genes with * > 1 suggests that rapidly evolving protein-coding loci may represent a significant portion of genes in eukaryotic genomes. This proportion, however, is higher than that observed in a similar analysis between A. thaliana and B. rapa, two species that last shared a common ancestor 35 MYA (TIFFIN and HAHN 2002 ). On the basis of 218 coding sequences from a floral EST-based B. rapa data set, no gene was identified with > 1. A larger study using 3595 gene sequences across all species represented in DNA databases also indicates that the number of genes that have > 1 is <0.5%, suggesting the number of genes in this class may be very low (ENDO et al. 1996 ). In these two studies, however, the comparisons were between distantly related taxa. It is likely that the ability to identify genes with > 1 may be facilitated by using more closely related species, where the signature for accelerated nonsynonymous substitution, possibly arising from positive selection, may be more readily apparent. Indeed, a study of male accessory gland ESTs from closely related Drosophila species has identified 11% of genes with > 1 (SWANSON et al. 2001 ).

    Molecular population genetic analysis confirms the increased selection intensities associated with genes that display accelerated rates of nonsynonymous evolution. In previously studied A. thaliana nuclear genes, most replacement changes are slightly deleterious and their estimated selection intensities are generally negative (BUSTAMANTE et al. 2002B ). Many A. thaliana nuclear loci studied to date possess high levels of within-species replacement polymorphisms (PURUGGANAN and SUDDITH 1998 ), few of which go to fixation and contribute to differences between A. thaliana and A. lyrata. In contrast, the class of rapidly evolving protein genes that were identified in this evolutionary EST study as having accelerated rates of nonsynonymous evolution generally possesses higher selection intensities on amino acid replacements. This is evident in the shift of the distribution of selection coefficients, µ, toward positive values compared to the distribution of previously studied A. thaliana genes (see 5).

    Positive selection associated with the fixation of protein sequence variants may explain the increased selection intensities on these genes. The increase in selection intensities for rapidly evolving protein genes may also arise, however, from neutral evolutionary forces on replacement polymorphisms, leading to increased fixation of amino acid changes. This is underscored by the distribution of selection coefficients, µ2, for the rapidly evolving protein gene class, which while shifted to the positive direction is nevertheless centered near µ = 0 (see 5). All the rapidly evolving protein genes used in the molecular population genetic study, however, are expressed in both species and there are no premature stop codons in these loci. This suggests that if neutral evolution underlies the increased selection intensities in these rapidly evolving protein loci, they do not appear to be associated with pseudogene formation.|m}, 百拇医药

    It is likely that both neutral evolution and positive selection may be responsible for the rapidly evolving protein genes identified in this evolutionary EST study. Nevertheless, evolutionary ESTs do appear to provide a general genomic approach to identify loci associated with increased selection intensities on protein sequence, some of which may underlie adaptive evolution between these species. It should be noted that while some of the loci with {omega} * > 1 identified in this evolutionary EST study may be associated with adaptive divergence, this may underestimate the role of positive or diversifying selection in shaping gene structure and function in the genome. The criteria of {omega} * > 1 as an indicator of selection can be overly stringent, as it requires that amino acid fixations occur throughout the gene and does not recognize adaptive fixation of small numbers of replacement changes. Moreover, it does not identify genes in which the selective force acts on regulatory regions of the gene, which is believed to be a major factor in adaptive divergence between species (DOEBLEY and LUKENS 1998 ; SUCENA and STERN 2000 ).

    The function of the genes that encode rapidly evolving proteins remains largely unknown. Of the three genes in the molecular population genetic analysis that have selection intensities >0, one encodes a previously unknown protein while the other two are homologous to known genes in A. thaliana or other eukaryotic organisms. One gene is NAC2, which belongs to a family of transcription factors required in Arabidopsis development (XIE et al. 2000 ). The other gene encodes a protein homologous to the human p55.11 protein that binds to the tumor necrosis factor p55 receptor (BOLDIN et al. 1995 ). The precise functions of these genes in Arabidopsis remain to be elucidated. Expression studies as well as phenotypic analysis of T-DNA insertion mutants for these genes may permit an assessment of their functions and may also provide clues as to the traits that are the targets of selection in the divergence between A. thaliana and A. lyrata.(xn, 百拇医药

    ACKNOWLEDGMENTS(xn, 百拇医药

    The authors thank the NCSU Genome Research Laboratory for use of its facilities and the Purugganan laboratory and three anonymous reviewers for helpful suggestions that improved the article. We are also grateful to Jeff Thorne for suggesting the permutation analysis. M.B. was funded by a National Institutes of Health training grant on the genetics of complex traits and C.D.B. by a Marshall-Sherfield Fellowship from the Marshall Commemoration Commission. This work was funded in part by grants from the National Science Foundation Integrated Research Challenges in Environmental Biology program to M.D.P, J. I. Schmitt, and T. F. C. Mackay; and from the Alfred P. Sloan Foundation to M.D.P.

    Manuscript received June 10, 2002; Accepted for publication November 11, 2002.ugg+, 百拇医药

    APPENDIXugg+, 百拇医药

    It has been shown (SAWYER and HARTL 1992 ) that under the assumptions of the Poisson random field setting, the sampling distributions for the number of polymorphic sites within species, S, and fixed differences between two species, K, are independent Poisson-distributed random variables with ratesugg+, 百拇医药

    where is the scaled selection coefficient for new mutations (2Nes), is the scaled mutation rate (4Ne·µ), t is the scaled time since species divergence (number of generations since divergence/2Ne), Ne is the effective population size, and n and m are the sample sizes from the two species. The functions F(, n) and G(, n) are as previously described (SAWYER and HARTL 1992 ; BUSTAMANTE et al. 2002B ).ugg+, 百拇医药

    Using the cell entries from a conventional McDonald-Kreitman table (MCDONALD and KREITMAN 1991 ), it is possible to estimate four parameters in such a model: {theta} S (mutation rate for silent sites), R (mutation rate for replacement sites), t, and (selection intensity for replacement sites) assuming silent sites are neutral (i.e., = 0 for all silent sites). For a set of such tables from the same species pairs, it is also possible to model variation in selection among genes by specifying a distribution for and estimating the parameters of this hyperdistribution given the data in all of the tables. A convenient form to use is the normal distribution since selection coefficients can be either positive or negative. It should also be noted that the species divergence time is a shared parameter across all the tables in such an analysis.

    Hierarchical model:6*av)v, 百拇医药

    The analysis we present is based on a description of the joint and marginal posterior probability distributions of the following model, described in three parts.6*av)v, 百拇医药

    Part 1: Let be the vector of selection coefficients, with 1, ... , 12 being the set of previously studied loci (BUSTAMANTE et al. 2002B ) and 13, ... , 18 representing rapidly evolving protein loci. R and S are the corresponding vectors of mutation rates at replacement and silent sites. Denote the mean and variance of the distribution of among previously studied genes as µ1 and 21 and use µ2 and 22 for the analogous quantities for the rapidly evolving protein genes.6*av)v, 百拇医药

    Part 2: Set a truncated uniform prior distribution for t on (0, T), where T is chosen on the basis of prior information on the upper bound for the species divergence time. We used T = 100, corresponding to an upper bound of between 20 and 200 million years ago.

    Part 3: Assume a normal conjugate prior probability distribution for the mean and variance parameters for each of the two classes of genes so that$5-]5, http://www.100md.com

    where µ0, {kappa} 0, {nu} 0, and {sigma} 20 are parameters of the prior distributions for µi and {sigma} 2i, Inv - 2 refers to an inverse 2 distribution, and i {1, 2} indexes the two sets of hyperparameters for both classes of genes. The notation is borrowed from GELMAN et al. 1997 . Note that if 0 and {nu} 0 are chosen to be small and {sigma} 20 to be large, the prior distribution will be uninformative. In our runs we used .$5-]5, http://www.100md.com

    Results for conditional posterior distributions:$5-]5, http://www.100md.com

    The joint posterior distribution of interest p(, R, S, t, µ1, 21, µ2, 22|data) can be approximated using a Markov chain Monte Carlo sampling scheme similar to that implemented in BUSTAMANTE et al. 2002B using the following results:

    Result 1: The conditional posterior distribution of µ1|21, depends only on the entries in {gamma} that are members of the class i and can be shown to be normally distributed asz01d%, http://www.100md.com

    where i is the arithmetic average of the entries in for class i and Ji is the number of genes in the class.z01d%, http://www.100md.com

    Result 2: The marginal posterior distribution of 2i conditional on , which depends only on the sample variance of the entries in that are members of the class i and the parameters of the prior distribution, has an inverse 2 distribution with parameters Ji and 2Ji as given in GELMAN et al. 1997 .z01d%, http://www.100md.com

    Result 3: Using independent Gamma prior distributions with parameters 0 and ß0 for each of the mutation rates yields independent Gamma posterior distributions conditional on t and with parameters 0 + K + S and

    The mean of this distribution is {alpha} /ß and the variance is {alpha} /ß2. As such, if {alpha} 0 and ß0 are chosen to be small, the prior will be uninformative. For all our analysis, we used {alpha} = ß = 0.001.|', http://www.100md.com

    Result 4: The posterior distribution p(t|R, S, , data) is proportional to the likelihood function at the point (t, R, S, ) if t < T and 0 otherwise.|', http://www.100md.com

    Result 5: The joint conditional posterior distribution p(|R, S, t, µ, {sigma} 2, data) factors into the product of the individuals' conditional distributions p(j|Rj, Sj, t, µi, {sigma} 2i, KRj, SRj, data). Furthermore, the conditional posterior distribution p(j|Rj, Sj, t, µi, {sigma} 2i, KRj, SRj) for a given gene j in class i is proportional to the product of the likelihood for the gene given Rj, Sj, j, and t, and the probability density of a normal distribution with mean µi and variance {sigma} 2i, at the point j.

    Markov chain Monte Carlo algorithm::), 百拇医药

    Given the model and results outlined above, it is then possible to sample from p(R, S, t, µ1, 21, µ2, 22|data) using the following algorithm (METROPOLIS et al. 1953 ) for each chain. The algorithm we employ in this analysis has the following steps.:), 百拇医药

    Step 1: Initialize by drawing a value for j for 1 j J1 + J2 independently from a normal distribution with mean near 0 and a reasonably large variance. We used several starting values for the mean in the range [-5, 5] and the variance in the range [1, 100].:), 百拇医药

    Step 2: Using the values in , update 2i for i {1, 2} by sampling from the conditional distribution of 2i|, which is inverse-2 distributed as detailed above.:), 百拇医药

    Step 3: Using the values in and 2i for i {1, 2}, update µi by sampling a new value from a normal distribution with the updated parameters in Result 1 above.xpx7$m, 百拇医药

    Step 4: Update t by using Metropolis sampling.xpx7$m, 百拇医药

    a. Sample a proposal value t' from a U(t - t, t + t) distribution.xpx7$m, 百拇医药

    b. If p(t'|R, S, data) > p(t|R, S, |data), set t = t'. Otherwise, set t = t' with probability proportional to the ratio of these two quantities.xpx7$m, 百拇医药

    Step 5: Update each j in by using J1 + J2 independent Metropolis steps as follows.xpx7$m, 百拇医药

    a. Sample a proposal value 'j from a U(j, - , j + ).xpx7$m, 百拇医药

    b. If p('j|Rj, Sj, t, µi, {sigma} 2i, SRj, KRj) > p(j|Rj, Sj, t, µi, {sigma} 2i, SRj, KRj), set . Otherwise, set with probability proportional to their ratio.

    Step 6: For each gene, draw a value for Rj and Sj using the result that the posterior distribution for j|j, t is a Gamma distribution with parameters as described in Result 3 above.#7^h[?, 百拇医药

    Step 7: Repeat steps 2–6.#7^h[?, 百拇医药

    We used the above algorithm to approximate the joint posterior distributions using 10 different starting points (i.e., 10 different chains) run for 10,000 steps each after an initial 2000-step burn-in and retention of draws every 10 steps (for a total of 10,000 draws for each parameter in the model). For the Metropolis step for updating t, we used a proposal distribution with {delta} t = 1.0, which gave a rejection rate of ~ 26.36% for the 100,000 draws retained after the initial burn-in. To measure convergence we used a statistic that was below 1.01 for all parameters in the model before samples were retained (conventionally one retains after 1.2 or less), illustrating that the 10 chains had converged well before we retained our samples.

    LITERATURE CITED&}9//zj, 百拇医药

    AGUADE, M., 2001 Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana. Mol. Biol. Evol. 18:1-9.&}9//zj, 百拇医药

    ANISIMOVA, M., J. P. BIELAWSI, and Z. H. YANG, 2001 Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18:1585-1591.&}9//zj, 百拇医药

    Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.. (2000) Nature 408:796-815.&}9//zj, 百拇医药

    BOLDIN, M. P., I. L. METT, and D. WALLACH, 1995 A protein related to a proteasomal subunit binds to the intracellular domain of the p55 TNF receptor upstream to its ‘death domain’. FEBS Lett. 367:39-44.&}9//zj, 百拇医药

    BUSTAMANTE, C. D., R. NIELSEN, and D. L. HARTL, 2002a A maximum likelihood method for analyzing pseudogene evolution: implications for silent site evolution in humans and rodents. Mol. Biol. Evol. 19:110-117.&}9//zj, 百拇医药

    BUSTAMANTE, C. D., R. NIELSEN, S. SAWYER, K. M. OLSEN, and M. D. PURUGGANAN et al., 2002b The cost of inbreeding in Arabidopsis. Nature 416:531-534.

    DOEBLEY, J. and L. LUKENS, 1998 Transcriptional regulators and the evolution of plant form. Plant Cell 10:1075-1082.+q-u, 百拇医药

    ENDO, T., K. IKEO, and T. GOJOBORI, 1996 Large-scale selection for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.+q-u, 百拇医药

    EWING, B., L. HILLIER, M. C. WENDL, and P. GREEN, 1998 Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175-185.+q-u, 百拇医药

    GELMAN, A., J. S. CARLIN, H. S. STERN and D. B. RUBIN, 1997 Bayesian Data Analysis. Chapman & Hall, Boca Raton, FL.+q-u, 百拇医药

    HAAG, E. S. and J. R. TRUE, 2001 Perspective: From mutants to mechanisms? Assessing the candidate gene paradigm in evolutionary biology. Evolution 55:1077-1084.+q-u, 百拇医药

    HUGHES, A. L., 1991 Circumsporozoite protein genes of malaria parasites (Plasmodium spp.): evidence for positive selection on immunogenic regions. Genetics 127:345-353.+q-u, 百拇医药

    HUGHES, A. L. and M. YEAGER, 1998 Natural selection at major histocompatibility complex loci of vertebrates. Annu. Rev. Genet. 32:415-435.

    KOCH, M., B. HAUBOLD, and T. MITCHELL-OLDS, 2000 Comparative evolutionary analysis of the chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis and related genera. Mol. Biol. Evol. 17:1483-1498.+%1[p/, http://www.100md.com

    KUMAR, S., K. TAMURA and M. NEI, 1993 MEGA: Molecular Evolutionary Genetics Analysis, Version 1.0. Pennsylvania State University, University Park, PA.+%1[p/, http://www.100md.com

    LAWTON-RAUH, A., E. S. BUCKLER, and M. D. PURUGGANAN, 1999 Patterns of molecular evolution among paralogous floral homeotic genes. Mol. Biol. Evol. 16:1037-1045.+%1[p/, http://www.100md.com

    LI, W.-H., T. GOJOBORI, and M. NEI, 1981 Pseudogenes as a paradigm of neutral evolution. Nature 292:237-239.+%1[p/, http://www.100md.com

    LI, W.-H., C.-I WU, and C. C. LUO, 1985 A new method for estimating synonymous and nonsynonymous rates on nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.+%1[p/, http://www.100md.com

    MCDONALD, J. and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

    MESSIER, W. and C. B. STEWART, 1997 Episodic adaptive evolution of primate lysozymes. Nature 385:151-154.y@., http://www.100md.com

    METROPOLIS, N., A. ROSENBLUTH, M. ROSENBLUTH, A. TELLER, and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1092.y@., http://www.100md.com

    MIYASHITA, N. T., A. KAWABE, H. INNAN, and R. TERAUCHI, 1998 Intra- and interspecific DNA variation and codon bias of the Alcohol Dehydrogenase (Adh) locus in Arabis and Arabidopsis species. Mol. Biol. Evol. 15:1420-1429.y@., http://www.100md.com

    NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.y@., http://www.100md.com

    NEI, M. and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.y@., http://www.100md.com

    NIELSEN, R., 2001 Statistical tests of selective neutrality in the age of genomics. Heredity 86:641-647.y@., http://www.100md.com

    OLSEN, K. M., A. WOMACK, A. R. GARRETT, J. I. SUDDITH, and M. D. PURUGGANAN, 2002 Contrasting evolutionary forces in the Arabidopsis thaliana floral developmental pathway. Genetics 160:1641-1650.

    ORR, H. A. and J. A. COYNE, 1992 The genetics of adaptation: a reassessment. Am. Nat. 140:725-742.*@ct, 百拇医药

    PURUGGANAN, M. D. and J. I. SUDDITH, 1998 Molecular population genetics of floral homeotic loci: departures from the equilibrium-neutral model at the AP3 and PI genes of Arabidopsis thaliana. Genetics 151:839-848.*@ct, 百拇医药

    ROZAS, J. and R. ROZAS, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.*@ct, 百拇医药

    SAWYER, S. A. and D. L. HARTL, 1992 Population genetics of polymorphism and divergence. Genetics 132:1161-1176.*@ct, 百拇医药

    SCHMID, K. J. and C. F. AQUADRO, 2001 The evolutionary analysis of "orphans" from the Drosophila genome identifies rapidly diverging and incorrectly annotated genes. Genetics 159:589-598.*@ct, 百拇医药

    SUCENA, E. and D. L. STERN, 2000 Divergence of larval morphology between Drosophila sechellia and its sibling species caused by cis-regulatory evolution of ovo/shaven-baby. Proc. Natl. Acad. Sci. USA 97:4530-4534.

    SWANSON, W. J. and V. D. VACQUIER, 1995 Extraordinary divergence and positive Darwinian selection in a fusagenic protein coating the acrosomal process of abalone spermatozoa. Proc. Natl. Acad. Sci. USA 92:4957-4961.ui.45, http://www.100md.com

    SWANSON, W. J., A. G. CLARK, H. M. WALDRIP-DAIL, M. F. WOLFNER, and C. F. AQUADRO, 2001 Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc. Natl. Acad. Sci. USA 98:7375-7379.ui.45, http://www.100md.com

    TIFFIN, P. and M. HAHN, 2002 Coding sequence divergence between two closely-related plant species—Arabidopsis thaliana and Brassica rapa ssp. pekinensis. J. Mol. Evol. 54:746-753.ui.45, http://www.100md.com

    WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.ui.45, http://www.100md.com

    WU, C.-I, 2001 The genic view of the process of speciation. J. Evol. Biol. 14:851-865.ui.45, http://www.100md.com

    XIE, Q., G. FRUGIS, D. COLGAN, and N. H. CHUA, 2000 Arabidopsis NAC1 transduces auxin signal downstream of TIR1 to promote lateral root development. Genes Dev. 14:3024-3036.ui.45, http://www.100md.com

    ZHANG, J., H. F. ROSENBERG, and M. NEI, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713.(Marianne Barrier Carlos D. Bustamante Jiaye Yu and Michael D. Purugganan)