Differential Selection of Genes of Cucumber Mosaic Virus Subgroups
     Station de Pathologie Végétale, Institut National de la Recherche Agronomique, Montfavet Cedex, France

    E-mail: moury@avignon.inra.fr.

    Abstract

    Cucumber mosaic virus (CMV) has an extremely broad plant-host range, a large number of vector species, and a wide geographical distribution. CMV is, therefore, a model by which to understand plant virus adaptation. The selective constraints exerted on the five proteins expressed from the CMV genome were evaluated by application of newly developed maximum-likelihood algorithms to analyze sequences available in data banks. The ratio between nonsynonymous and synonymous substitution rates (ω) was used to detect positive selection on particular codon sites. Amino acid sequences were conserved, with ω ranging from 0.07 to 0.60 in different proteins. However, a small proportion of amino acids in proteins 1a, 2a, and 3b (the coat protein, CP) were positively selected (ω > 1). Moreover, the evolution of the CP in the three subgroups of CMV strains revealed different selection profiles along the sequence and significantly different speeds of evolution at many positions. Constraints exerted by aphid transmission, rather than plant adaptation, seemed to be responsible for these patterns of evolution in the CP.

    Key Words: positive selection • cucumber mosaic virus • Cucumovirus • insect transmission • epidemiology

    Introduction

    Cucumber mosaic virus (CMV), the type species of the genus Cucumovirus, family Bromoviridae, is an exceptional model by which to study virus evolution and adaptation because of its extremely broad host range (more than 1,000 species of plants) and its worldwide distribution. Much information is available on the biology and ecology of CMV (Palukaitis et al. 1992; Gallitelli 2000) and on its genome structure and functions, and a large number of sequences is available (e.g., more than 80 coat protein sequences). CMV has a tripartite, positive-sense RNA genome with five open reading frames (ORFs) (fig. 1). RNA 1 encodes protein 1a, which is necessary for viral replication and contains helicase and methyl transferase motifs (Kadaré and Haenni 1997). RNA 2 encodes protein 2a, the viral polymerase (Ishihama and Barbier 1994, O'Reilly and Kao 1998), and protein 2b, expressed from a subgenomic RNA (Ding et al. 1994). ORF 2b overlaps with the C-terminus of ORF 2a, and protein 2b is involved in the long-distance movement of CMV in the plant (Soards et al. 2002) and in the inhibition of posttranscriptional gene silencing (Béclin et al. 1998; Brigneti et al. 1998). RNA 3 encodes two proteins, the cell-to-cell movement protein (protein 3a) and the coat protein (CP) (protein 3b), which is expressed from a subgenomic RNA. The CP of CMV is multifunctional and is involved in aphid transmission, as well as in cell-to-cell and systemic movement (Chen and Francki 1990; Suzuki et al. 1991; Kaplan, Zhang, and Palukaitis 1998; Schmitz and Rao 1998). In addition, CMV can harbor satellite RNAs that do not encode proteins but can affect the symptoms induced by the virus (Kaper and Waterworth 1977; Gonsalves, Provvidenti, and Edwards 1982; Yoshida, Goto, and Iizuka 1985). CMV is transmitted mainly by aphids in a nonpersistent manner (Palukaitis et al. 1992) and through seeds in some plant species, a property that seems to be determined by RNA 1 (Hampton and Francki 1992).

    FIG. 1. Genome organization of CMV and functions associated with the different proteins. Codon positions subjected to positive selection are indicated above the ORF boxes for strains in subgroups IA and IB or below the ORF boxes for strains in subgroup II of CMV. Codons undergoing positive selection for which a biological function (aphid transmission) has been demonstrated are underlined. Nucleotide (nt) and codon position numbers are given for the Fny strain. Subgenomic RNAs are not shown.

    Phylogenetic and diversity studies reveal three subgroups within the CMV (Palukaitis et al. 1992; Palukaitis and Zaitlin 1997; Roossinck, Zhang, and Hellwald 1999). Subgroups I and II are quite distantly related, and their genomes have approximately 75% nucleotide identity. Subgroup I can be further divided into subgroups IA and IB that are more closely related (92% to 95% nucleotide identity) (Roossinck 2002). Subgroup IB consists essentially of East Asian CMV isolates, whereas the other two subgroups are distributed worldwide. Some mechanisms of CMV evolution and adaptation have been explored. Reassortment between RNAs (Roossinck 2002) and recombinations in the 5' and 3' nontranslated regions and between ORFs 3a and 3b have been shown to contribute to CMV evolution (Fraile et al. 1997; Roossinck, Zhang, and Hellwald 1999; Chen, Goldbach, and Prins 2002). Recombination in the 3' nontranslated regions was shown to correlate with plant-host adaptation (Chen, Goldbach, and Prins 2002). Conversely, there was no evidence of recombination within the coding regions of the three RNAs (Candresse et al. 1997).

    By measuring the selective pressure exerted on the proteins encoded by the CMV genome, I show here that rapid amino acid substitutions also contribute to CMV evolution. Comparing the evolution patterns of the different subgroups of CMV suggests that selective constraints are exerted differently on them. This study also illustrates that such analyses, still rarely conducted with plant viruses, can establish bridges between diversity and phylogenetic data, which are usually mainly descriptive, and the study of protein function and structure, as well as virus epidemiology.

    Materials and Methods

    Multiple Sequence Alignments and Phylogeny Estimations

    Sequence data were obtained from GenBank in April 2002 (accession numbers are available in the Supplementary Material online at www.mbe.oupjournals.org). Almost all the sequences correspond to CMV isolates collected on different plants and not to virus sequences that represent the heterogeneity of CMV populations within individual plants. Thus, these sequences correspond either to the most common sequence in these populations or to the consensus sequence of the whole populations. Each protein was analyzed separately. The nucleotide sequences were first aligned using ClustalW version 1.8 (Thompson, Higgins, and Gibson 1994) and then checked by hand. A few codons corresponding to gaps or to unreliable alignments were excluded from further analyses (four codons of ORF 1a, six codons of ORF 2a, and 11 codons of ORF 2b). Two codons were also excluded when the CP sequences of CMV strains in subgroups I and II were compared. Alignments are available from the author on request. Phylogeny construction and evaluation were done by application of the neighbor-joining (NJ), the Fitch and Margoliash, and the maximum-parsimony (MP) methods in the PHYLIP software package (Felsenstein 1993). One thousand bootstrap replications were used to place confidence estimates on groups contained in the most-parsimonious unrooted trees. Nodes with a low reliability (bootstrap support below 70%) (Hillis and Bull 1993) were collapsed and the subsequent tree topology was used for analyses of codon substitution.
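
    The collapsing of weakly supported nodes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run with PHYLIP: the tree representation (nested dictionaries with hypothetical `support` and `children` keys, leaves as plain strings) and the function name `collapse_low_support` are invented for the example.

```python
def collapse_low_support(node, threshold=70.0):
    """Return a copy of the tree in which every internal node whose
    bootstrap support is below `threshold` is dissolved into its parent.

    Assumed (hypothetical) representation:
      leaf          -> str (taxon name)
      internal node -> {"support": float or None, "children": [...]}
    The root carries support None and is never collapsed.
    """
    if isinstance(node, str):
        return node
    new_children = []
    for child in node["children"]:
        child = collapse_low_support(child, threshold)
        weak = (
            not isinstance(child, str)
            and child["support"] is not None
            and child["support"] < threshold
        )
        if weak:
            # Attach the grandchildren directly, creating a polytomy.
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    return {"support": node["support"], "children": new_children}


# Toy tree: the clade (C, (D, E)) has only 40% support and is collapsed.
toy_tree = {
    "support": None,
    "children": [
        {"support": 95.0, "children": ["A", "B"]},
        {"support": 40.0, "children": ["C", {"support": 80.0, "children": ["D", "E"]}]},
    ],
}
collapsed = collapse_low_support(toy_tree)
```

    In the toy tree, the well-supported clades (A, B) and (D, E) survive, while taxon C becomes a direct child of the root.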

    Estimation of the Selective Pressures on Proteins

    The method used for measuring the selective pressure on protein-coding sequences was previously described (Yang and Bielawski 2000; Hurst 2002; Moury et al. 2002). The ratio (ω) of nonsynonymous (amino acid–altering) to synonymous (silent) substitution rates provides an estimate of the selective pressure on the encoded protein (Kimura 1983). A maximum-likelihood (ML) method that utilizes models of sequence evolution can be employed to calculate ω ratios and to identify amino acid sites as conserved, neutral, or positively selected (Yang et al. 2000). Instead of averaging across all codon sites, Yang et al.'s (2000) method allows estimations of ω on a codon-by-codon basis. This method originally employed 14 models that use statistical distributions to account for variable ω ratios among codon sites. Models M0, M1, M2, M3, M7, and M8 were shown to be sufficient for accurate selection analysis (Yang et al. 2000). Models M0, M1, and M7 do not allow for the existence of positively selected sites. M0 calculates a single ω ratio (between 0 and 1) averaged over all sites, M1 accounts for neutral evolution by estimating the proportion of conserved (ω = 0) and neutral (ω = 1) sites, and M7 uses a discrete beta distribution between 0 and 1 to model different ω ratios between sites. Alternatively, models M2, M3, and M8 account for positive selection by using parameters that can estimate ω > 1. Models M2 and M8 extend M1 and M7, respectively, through the addition of two parameters that have the potential to estimate ω > 1 for an extra class of sites. M3 provides the most sensitive test for positive selection by estimating an ω ratio for a predetermined number of classes (three in these analyses). The first step in identifying amino acid sites under positive selection is to test whether sites exist with ω > 1 by application of likelihood ratio tests (LRTs) to compare nested models. M0 and M1 are both special cases of M2 and M3, and M7 is a special case of M8. Such nested models can be compared by LRTs. Once positively selected sites have been shown to exist, the second step is to use Bayesian methods to locate their position. Sites having high posterior probabilities (P > 90%) of belonging to a site class with ω > 1 are good candidates for positively selected sites. The methods and models described here were implemented within the CODEML program of the PAML version 3.0c package (Yang 1997). To avoid artifactual detection of positive selection, occurrence of substitution saturation at the three positions in the codons and recombination events within the different ORFs were checked, as previously described (Moury et al. 2002), and each program was run at least three times with different initial values for ω to avoid local ML estimates.
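
    As a rough illustration of how a likelihood ratio test between two nested codon models works, the Python sketch below computes the test statistic 2(lnL_alternative − lnL_null) and compares it with a chi-square distribution. It is a sketch under assumptions, not CODEML output: the log-likelihood values are invented, the function names are hypothetical, and the degrees of freedom are taken as the difference in free parameters between models (2 for M1 vs. M2 and M7 vs. M8, 4 for M0 or M1 vs. a three-class M3). Because these values are even, a closed-form chi-square tail probability can be used without external libraries.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Closed form valid only for even, positive df:
        exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires an even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a null model nested in an alternative."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods (not values from this study):
# M7 (null) vs. M8 (alternative), 2 extra parameters.
stat_m7_m8, p_m7_m8 = likelihood_ratio_test(-1000.0, -990.0, df=2)
# M0 (null) vs. three-class M3 (alternative), 4 extra parameters.
stat_m0_m3, p_m0_m3 = likelihood_ratio_test(-1000.0, -990.0, df=4)
```

    A small p-value means that the model allowing an extra class of sites fits significantly better than the null model; whether that class has ω > 1 must still be checked in the parameter estimates.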

    Evolutionary Rate Shifts Among CMV Subgroups

    The LRT developed by Knudsen and Miyamoto (2001) was used to detect specific amino acid or nucleotide sites that evolve at different rates in different subgroups of CMV sequences. A significant rate difference between two subgroups at a given site would, thereby, mean that the function of this position could be different in the two groups and/or that evolutionary constraints differ between CMV subgroups. The likelihood of the null hypothesis, which assumes that a given position evolves at the same rate in the two sequence subgroups, is compared with the likelihood of the alternative hypothesis (different rates in the two subgroups). The number of sites with rate differences detected by the LRT at a given P significance level can be compared with the number of sites expected by chance (P × l, where l is the length of the alignment) to assess the number of positions with significantly different rates. Because of limited numbers of sequences in some data sets, pairwise comparisons between CMV subgroups were performed on nucleotide and amino acid sequences of the CP and 3a protein only. The program is available at www.daimi.au.dk/compbio/rateshift and allows analysis of 30 sequences at the same time. Consequently, for the CP of subgroup IA CMV strains, two random subsets of 30 sequences (among 44) were analyzed and revealed only a few differences.
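
    The comparison with the number of sites expected by chance is simple arithmetic, sketched here in Python. The first function is the product of the significance level and the alignment length described above, applied to the thresholds and alignment lengths reported in the Results for the CP amino acid comparisons. The second function, a binomial tail probability that treats sites as independent, is an added illustration and an assumption of this sketch, not part of the published analysis; both function names are hypothetical.

```python
from math import comb


def expected_by_chance(p_threshold, alignment_length):
    """Number of sites expected to pass the LRT by chance alone (P x l)."""
    return p_threshold * alignment_length


def prob_at_least(k, p_threshold, alignment_length):
    """Binomial probability of k or more chance detections.

    Assumes sites are independent, which is a simplification.
    """
    n, p = alignment_length, p_threshold
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Thresholds and alignment lengths reported for the CP amino acid comparisons.
exp_ii_vs_i = expected_by_chance(0.0005, 216)   # about 0.1 site
exp_ia_vs_ib = expected_by_chance(0.002, 218)   # about 0.4 site
# Chance of seeing two or more false positives at P < 0.0005 over 216 sites.
tail_two_sites = prob_at_least(2, 0.0005, 216)
```

    When the observed number of sites clearly exceeds the expected count, the rate differences are unlikely to be artifacts of multiple testing.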

    Nucleotide Frequencies and Codon Usage

    Nucleotide, dinucleotide, and codon frequencies in the CP sequences of the CMV strains in subgroups IA and IB were calculated by use of DAMBE version 4.0.75 (Xia and Xie 2001). The theoretical codon distributions for each amino acid within each codon site affected by evolutionary rate shift between subgroups IA and IB were calculated from the average codon frequencies in each subgroup. For each amino acid at these sites, deviation of the observed codon distribution in the two CMV subgroups from the theoretical distribution was evaluated by a χ² test.
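
    The goodness-of-fit comparison described above can be sketched as follows. This is a minimal Python illustration with invented counts and frequencies for the three isoleucine codons; the function name `codon_chi_square` is hypothetical, and the sketch only computes the χ² statistic and its degrees of freedom, leaving the lookup of the critical value to the reader.

```python
def codon_chi_square(observed_counts, expected_freqs):
    """Chi-square statistic comparing observed synonymous codon counts at a
    site with a theoretical distribution.

    observed_counts: dict codon -> count observed at the site
    expected_freqs:  dict codon -> expected relative frequency (must be > 0
                     and sum to 1 over the synonymous codons)
    Returns (statistic, degrees_of_freedom).
    """
    total = sum(observed_counts.get(codon, 0) for codon in expected_freqs)
    statistic = 0.0
    for codon, freq in expected_freqs.items():
        expected = freq * total
        observed = observed_counts.get(codon, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(expected_freqs) - 1


# Invented example: isoleucine codons at one site across 60 sequences.
observed = {"AUU": 10, "AUC": 20, "AUA": 30}
expected = {"AUU": 0.2, "AUC": 0.3, "AUA": 0.5}
chi2_stat, chi2_df = codon_chi_square(observed, expected)
```

    With these invented numbers the statistic is far below the 5% critical value for 2 degrees of freedom (5.99), so the observed codon distribution would not be judged different from the theoretical one.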

    RNA Structural Constraints

    The secondary structure of the CP-coding sequences and of the corresponding subgenomic RNA 4 sequences of five IA and five IB strains was predicted by use of the mFOLD version 3.1 program (Zuker 1989) with the temperature parameter set at 30°C. Based on free energy values, the three most stable structures were examined for each sequence. Secondary-structure predictions can be independently supported by occurrence of nucleotide covariation, in which a nucleotide substitution of a base-paired sequence is matched by a substitution in the paired sequence that preserves binding. For each nucleotide site of the CP-coding sequence potentially affected by evolutionary rate shifts between subgroups IA and IB of CMV, I searched for covarying nucleotide sites in the alignment of the CP ORF. This search was done by hand with the help of Microsoft Excel.
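
    The covariation search was done by hand, but the underlying idea can be sketched in Python. The function below is a hypothetical automation with a simplified criterion, not the procedure actually followed: for two alignment columns it counts the fraction of sequence pairs in which both columns differ while each sequence keeps a Watson-Crick or G-U pair. The function name, the toy sequences, and the choice to accept G-U wobble pairs are all assumptions of this sketch.

```python
from itertools import combinations

# Base pairs considered compatible with a helix (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def covariation_fraction(sequences, i, j):
    """Fraction of sequence pairs showing compensatory change at columns i, j.

    A pair of sequences counts as covarying when both columns differ between
    the two sequences and each sequence can still form a base pair (i, j).
    """
    covarying = 0
    total = 0
    for a, b in combinations(sequences, 2):
        total += 1
        both_changed = a[i] != b[i] and a[j] != b[j]
        both_paired = (a[i], a[j]) in PAIRS and (b[i], b[j]) in PAIRS
        if both_changed and both_paired:
            covarying += 1
    return covarying / total if total else 0.0


# Toy aligned RNA sequences: columns 0 and 4 switch between G-C and A-U.
toy_alignment = ["GAAAC", "AAAAU", "GAAAC"]
fraction = covariation_fraction(toy_alignment, 0, 4)
```

    In the toy alignment, two of the three sequence pairs show a compensatory G-C to A-U change, giving a fraction of 2/3; a low fraction argues against a conserved base pair between the two columns.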

    Results

    Mean Selective Pressures on the Different Proteins Encoded by CMV

    The trees representing the different ORFs in the CMV genome were almost the same as those previously presented (Roossinck, Zhang, and Hellwald 1999; Roossinck 2002), except for ORFs 3a and 3b as more sequences were included in this analysis (fig. 2). This development did not change the overall distribution in CMV subgroups. Subgroups I and II are clearly monophyletic (fig. 2). Subgroup IA is monophyletic within group I, whereas subgroup IB, as previously defined (Roossinck, Zhang and Hellwald 1999), is not monophyletic, but composed of those clades and strains in subgroup I that do not belong to subgroup IA (fig. 2). Similar consensus trees were obtained with the different phylogenetic methods used (data not shown). The branching pattern of the trees representing the different ORFs has been used to produce hypotheses concerning the evolutionary constraints exerted on the corresponding proteins (Roossinck 2002). However, no measures of these constraints have been published. Moreover, variations in population size also affect the topology of phylogenetic trees independently of any variation in evolutionary constraint (Emerson, Paradis, and Thébaud 2001). Therefore, I have made a more precise study of the selective constraints exerted on CMV. To avoid any bias in the evaluation of the selective pressures, occurrence of substitution saturation was checked in the different sequence sets. There was no evidence for substitution saturation at any of the three positions in the codons when sequences from CMV strains that belong to subgroups IA and IB were compared. However, substitution saturation occurred when sequences from CMV strains that belong to subgroup II were compared with sequences from CMV strains in other subgroups (data not shown). As a consequence, sequences from subgroup II were analyzed separately for the CP, and sequences from subgroups IA and IB only were analyzed for other proteins because there were too few sequences available for subgroup II. 
There was no evidence of recombination within ORFs, as was previously reported (Candresse et al. 1997).

    FIG. 2. Phylogenetic relationships of CMV coat protein sequences. The distance method of neighbor-joining was used to construct the tree topology. All bootstrap values of 70% or greater are indicated on the tree. The scale bar indicates the numbers of nucleotide substitutions per site

    Nonsynonymous/synonymous substitution rates were evaluated with CODEML for the different models (tables 1–4). Likelihood values and parameter estimates are detailed in Supplementary Material online. The mean ω ratios differed greatly between the different ORFs. For ORF 1a, the mean ω was estimated at 0.07 for the different models, except for M1 (0.14), which fitted the data poorly (M2 and M3 rejected M1 in LRTs [table 2]). For ORF 2a, mean ω estimates were approximately twice as large as estimates for ORF 1a (between 0.13 and 0.14, except M1, which fitted the data poorly when compared with M2 and M3 [table 1]). ORF 2b showed the largest mean ω estimates (0.43–0.60). Overlapping ORFs often show high ω estimates (García-Arenal, Fraile, and Malpica 2001; Guyader and Giblot Ducray 2002) because the synonymous substitution rate is reduced in these overlaps as synonymous substitutions in one ORF would frequently change amino acids in the other ORF. However, the overlapping domain of the 2b protein did not seem to evolve faster than the nonoverlapping one (data not shown).

    Table 1 Model Parameter Estimates for Positive Selection in ORFs 1a, 2a, 2b, and 3a of Subgroups IA and IB of CMV.

    Table 2 Likelihood Ratio Tests (LRTs) for Positive Selection in ORFs 1a, 2a, 2b, and 3a of Subgroups IA and IB of CMV.

    Evolution of ORF 3a was largely constrained with a mean ω estimate of 0.10, except for model M1, which fitted the data poorly in comparison with models M2 and M3 (table 1). For ORF 3b (the CP), each CMV subgroup could be analyzed separately because of the large number of sequences available. Evolution of the CP of subgroup IB strains was more constrained (mean ω estimates varied between 0.09 and 0.10, except for model M1, which fitted the data poorly in comparison with models M0, M2, and M3) than that of subgroup IA (mean ω estimates varied between 0.19 and 0.31) and especially of subgroup II (mean ω estimates varied between 0.35 and 0.47) strains.

    Selection Analysis of ORFs 1a, 2a, 2b, and 3a

    Selection analysis of ORFs 2b and 3a data sets did not identify any positively selected site (table 1). In contrast, positive selection was identified in ORFs 1a and 2a data sets with models M3 and M8. For both ORFs, M2 was unable to detect a positively selected class, because its extra parameters were used to account for a relatively large class (29% for ORF 1a and 33% for ORF 2a) of fairly conserved amino acid sites (ω = 0.1 for ORF 1a and 0.2 for ORF 2a). For both ORFs, M3 detected about 1% of sites under weak positive selection (ω ≈ 2). M3 was able to reject M0 and M1 in LRTs but was unable to reject M2. M8, which also predicted a small proportion of sites with similar positive selection, was able to reject M7, which confirms the significance of positive selection in both ORFs. M8 predicted that eight sites in ORF 1a (aligning with amino acid sites 249, 256, 259, 448, 550, 551, 553, and 697 of strain Fny [accession number D00356]) and that two sites in ORF 2a (aligning with amino acid sites 270 and 851 of strain Fny [accession number D00355]) belonged to the positively selected class with P > 90%, whereas M3 predicted that sites 249, 256, 259, 448, 553, and 697 in ORF 1a and site 270 in ORF 2a were positively selected with P > 90% (table 6 and fig. 1).

    Table 6 Amino Acid Sites Putatively Affected by Positive Selection in the Different ORFs of CMV.

    Selection Analysis of the 3b (CP) ORF

    Evidence for strong positive selection was obtained in the CP of CMV strains that belong to subgroups IA and II (table 3). Models M2, M3, and M8 estimated that a small proportion of sites was under positive selection (ω = 4.7 to 6.2 for subgroup IA and ω = 7.7 to 11.6 for subgroup II). For both subgroups, M2 and M3 rejected M1 and M0 in LRTs, whereas M8 rejected M7 (table 4). Bayesian methods assigned sites 25, 28, 65, and 205 of subgroup IA sequences to the positively selected class estimated by M2, M3, and M8 with P > 90% (table 6). Sites 17 and 214 were also assigned to the positively selected class, although with lesser significance. These last two sites are probably affected by weaker positive selection, which was confirmed when CP sequences of subgroups IA and IB were analyzed together (see below). Sites 41 and 44 of the CP of subgroup II strains were assigned to the positively selected class estimated by M3 and M8 with P > 90%, whereas only site 44 was shown to be positively selected with P > 90% by M2 (table 6). For the CP of CMV strains belonging to subgroup IB, weak (ω = 1.9) positive selection was detected for a small proportion of sites with M3 and M8 but not M2. M3 was able to reject M0 and M1 in LRTs but did not reject M2. M8 was able to reject M7, which confirmed occurrence of positive selection. Sites 82 and 137 were shown to belong to the positively selected class detected with M3 and M8 with P > 90%. When subgroups IA and IB were analyzed together, 4.4% of the CP sites were affected by weak, although significant, positive selection with ω = 1.4 (data not shown). This class of sites comprised those sites previously identified in either subgroup together with sites 17, 76, and 214 with P > 90% (table 6).

    Table 3 Model Parameter Estimates for Positive Selection in the Coat Protein Gene of the Three Subgroups of CMV Strains.

    Comparison of the Evolution of the CP in the Different Subgroups of CMV

    The fact that different amino acid positions in the CP, depending on the subgroups, appeared to be under positive selection suggests that selective pressures and substitution rates at particular sites could vary between subgroups. To confirm this possibility, an LRT was used to detect evolution rate shifts at particular sites between the three CMV subgroups. Comparison of subgroup II with either subgroup IA or IB was not very informative, because of the large distance between the sequences in these groups. Only positions corresponding to amino acids 41 and 44 were shown to evolve significantly faster in subgroup II than in subgroup IA or IB (P < 0.0005; among the 216 aligned amino acids of the CP, 0.1 ≈ 0.0005 × 216 would have been detected by chance). Comparison of CP amino acid sequences revealed that only two positions (25 and 205) evolved significantly faster (P < 0.002; among the 218 aligned amino acids of the CP, 0.4 ≈ 0.002 × 218 would have been detected by chance) in subgroup IA than in subgroup IB. Comparison of nucleotide sequences in these two subgroups revealed that 38 nucleotide sites evolved at different rates between groups at the 5% significance threshold (table 5). Whatever the significance threshold between 0.01% and 3%, there was a large excess of sites affected by different evolution rates in comparison with what would have been expected by chance (fig. 3). This finding is consistent with contrasted evolution patterns of the CP sequences between subgroups IA and IB. Among these sites, six corresponded to nonsynonymous mutations that affect amino acids previously shown to be under positive selection in subgroups IA or IB of CMV strains, which strengthens their significance (tables 3 and 5). Surprisingly, substitution rate shifts at the majority of the 38 sites corresponded to synonymous mutations (table 5). 
The RNA itself, apart from its protein-coding capacity, may contribute to viral fitness because of its nucleotide composition (Eyre-Walker 1999) or to preserve secondary structures (Simmonds and Smith 1999) and could explain why some synonymous mutations appeared at different rates in the CP gene between different CMV subgroups. However, neither differences in nucleotide or dinucleotide composition, in codon usage, nor in predicted RNA structures between subgroups IA and IB explained the vast majority of evolutionary rate shifts (table 5 and data not shown). Concerning the models of RNA structures, it is plausible that these were not realistic enough because they do not take into account folding during RNA synthesis or the strong RNA-CP interactions that are present in CMV virions. Nevertheless, for each nucleotide affected by evolutionary rate shifts between subgroups IA and IB, I did not identify significant covariation with any other nucleotide in the CP-coding sequence that would preserve base pairing (nucleotide covariation occurred always in less than half of the pairs of sequences compared [data not shown]).

    Table 5 The 38 Nucleotide Positions in the Coat Protein of CMV with Significant (P < 5%) Evolutionary Rate Shifts Between Subgroups IA and IB.

    FIG. 3. Number of nucleotide sites affected by evolution rate shifts in the coat protein gene between subgroup IA and subgroup IB of CMV strains estimated with Knudsen and Miyamoto's (2001) algorithm. White bars indicate values expected by chance. Black bars indicate estimated values

    Discussion

    To detect variations in the ω ratio between sites, I used the ML method implemented in PAML (Yang 1997). This method was successfully applied to detection of positive selection in a number of genes. For example, predicted positively selected sites correlated well with known structure and function of chitinases involved in plant defense against pathogens (Bishop, Dean, and Mitchell-Olds 2000) or virus epitopes recognized by the immune system (Haydon et al. 2001). Computer simulations evaluating the power and accuracy of PAML (Anisimova, Bielawski, and Yang 2001, 2002; Suzuki and Nei 2002) suggest that detection of positive selection in the CMV genome is all the more significant because (1) multiple models of heterogeneous selective pressures among sites detected positive selection with high significance, (2) at least for the CP, positive selection was strong for a number of amino acids, and (3) the divergence between sequences was relatively low. Another method of detecting positive selection at specific amino acid sites, based on MP, was developed by Suzuki and Gojobori (1999). However, this method requires more sequences than are currently available for most of the CMV proteins and is excessively conservative (Suzuki and Nei 2002). With these limitations in mind, I analyzed the CP sequence set for subgroups IA and IB together using Suzuki and Gojobori's (1999) method implemented in ADAPTSITE (Suzuki, Gojobori, and Nei 2001) and found that two sites, also detected by PAML, had a probability above 90% of being under positive selection, namely sites 25 (P = 0.94) and 137 (P = 0.93). The contribution of these two sites to adaptive evolution may, therefore, be particularly strong.

    On the average, the evolutionary constraints exerted on proteins 1a, 2a, and 3a are larger than those exerted on proteins 2b and 3b. These conclusions agree with those of Roossinck (2002) based on the topology of the trees representing these ORFs. The different proteins encoded by the CMV genome play different roles in the infection cycle of the virus. They are all involved in different steps of infection within the plant (replication, movement, or seed infection) and the CP also interacts with the aphid vectors, which allows plant-to-plant transmission of the virus. Regarding this diversity of function and of interaction, it is not surprising that evolutionary constraints vary between these proteins. What is surprising is that the evolutionary constraints also varied between subgroups of CMV. The average selective pressure on the CP differed largely between the three subgroups (table 3). Confirming this finding is the fact that a small proportion of amino acid sites were shown to be under strong positive selection in subgroups IA and II, whereas only weak positive selection was shown in subgroup IB. Moreover, the positively selected sites are different in the three subgroups, belong to different structural domains of the CP (Wikoff et al. 1997; Smith et al. 2000); and almost all of them were indeed shown to evolve faster in one subgroup than in others (table 5). CMV strains belonging to subgroups IA and II are distributed worldwide but can show host preferences (Quiot et al. 1979) and different temperature sensitivities (Douine et al. 1979). Twenty-three out of the 25 strains in CMV subgroup IB were collected in East Asia. These biological differences and/or these distribution variations could be the reasons for different evolution patterns.

    Although CMV can be transmitted by a very large number of aphid species (Edwardson and Christie 1991), and although it shows an extremely wide host range, some degree of specificity exists both for transmission (Perry, Zhang, and Palukaitis 1998) and plant infection (Leroux et al. 1979; Shintaku, Zhang, and Palukaitis 1992; Suzuki et al. 1995; Szilassy, Salánki, and Balázs 1999; Takeshita, Suzuki, and Takanami 2001; Kobori et al. 2002). Adaptation to the plant or to the vector could then explain why diversifying selection affects several sites in the CMV genome. Also, concerted evolution between the CMV proteins or between amino acids within a protein could be internal constraints that drive such rapid amino acid substitutions. No particular functions have been attributed to the positively selected sites in proteins 1a and 2a of CMV. Two clusters of three positively selected sites are noticeable in protein 1a (at positions 249, 256, and 259 and at positions 550, 551, and 553). The comparison of protein 1a of CMV with the structure of the corresponding protein of brome mosaic virus (BMV) (O'Reilly and Kao 1998), which also belongs to the family Bromoviridae, indicates that positively selected amino acid sites 249, 256, 259, and 448 belong to the methyltransferase-like domain, that sites 550, 551, and 553 are located in a putative flexible hinge that separates the methyltransferase-like and the helicase-like domains, and that site 697 belongs to the helicase-like domain. Analogy to a functional model proposed for protein 1a of BMV by O'Reilly et al. (1998) further suggests that variations of the positively selected amino acids located in the methyltransferase-like or helicase-like domains can affect physical interactions at different levels (between domains of a single 1a protein, between two different 1a proteins, or between proteins 1a and 2a). 
The amino acids located between the methyltransferase-like and helicase-like domains could be directly involved in binding with viral or nonviral ligands because they are exposed on the surface of the RNA-dependent RNA polymerase complex of BMV (Dohi et al. 2002). However, the lack of covariation between amino acids subjected to positive selection within protein 1a, between proteins 1a and 2a, or between protein 1a and the CP (data not shown) suggests that these amino acids could be involved in adaptation of CMV through interaction with nonviral ligands.

    For the CP, almost all amino acids subjected to positive selection (except amino acids 76, 82, and 137) are buried in the folded CP or between subunits in assembled virions (Smith et al. 2000). Variations at these positions may indirectly affect the CP structure and CMV fitness. Amino acids 25, 76, and 214 are subjected to positive selection in subgroups IA and/or IB and were shown to affect transmission by aphids (Perry, Zhang, and Palukaitis 1998). Substitutions at these three amino acid positions affected transmission efficiency by Myzus persicae rather than by Aphis gossypii (Perry, Zhang, and Palukaitis 1998). In this study, rapid evolution of the amino acid at position 25 in the CP was detected in all independent analyses with high significance. In subgroups IA and IB, amino acid 25 is a serine or a proline, the substitution of which may imply a substantial structural change in the protein. The fact that this position aligns with a one-amino-acid gap in the CP of subgroup II strains (data not shown) supports the view that this region is flexible. These data suggest that aphid transmission could be a major evolutionary constraint on the CP of CMV. At least two mechanisms can account for differential selection through nonpersistent aphid transmission: (1) different aphid species or populations can select and propagate different components in virus populations because of different affinities in binding to different CMV virions, or (2) even with identical affinities between aphids and CMV variants, tradeoffs between aphid transmissibility and accumulation within host plants can accelerate diversifying selection in the CMV genome. The first mechanism was shown with strains and mutants of CMV obtained in the laboratory with two different aphid species (Perry, Zhang, and Palukaitis 1998). These different CMV variants did not seem, however, to accumulate at different titers in the plants (Perry, Zhang, and Palukaitis 1998). Conversely, amino acids that are exposed on the surface of the virus and whose variation drastically affects aphid transmission (Perry, Zhang, and Palukaitis 1998; Liu et al. 2002) were not detected by positive selection analyses, which suggests that variations at these sites confer excessively large fitness penalties in natural populations of CMV.

    Amino acid positions in the CP or other ORFs that were previously associated with plant host adaptation or symptom variations, namely amino acid positions 129, 162, and 193 of the CP (Shintaku, Zhang, and Palukaitis 1992; Suzuki et al. 1995; Ryu, Kim, and Palukaitis 1998; Szilassy, Salánki, and Balázs 1999; Takeshita, Suzuki, and Takanami 2001; Kobori et al. 2002), amino acid positions 51 and 240 of protein 3a (Kaplan, Gal-On, and Palukaitis 1997; Takeshita, Suzuki, and Takanami 2001), and amino acid positions 631 and 641 of protein 2a (Kim and Palukaitis 1997) did not belong to positively selected classes of amino acids. This finding suggests that infection of plants belonging to a wide diversity of species does not, on the whole, shape CMV evolution.

    Supplementary Material

    Accession numbers of sequences analyzed. Estimates of parameters and likelihood values for selection analysis of the five proteins expressed from the cucumber mosaic virus (CMV) genome.

    Table 4 Likelihood Ratio Tests (LRTs) for Positive Selection in the Coat Protein Gene of the Three Subgroups of CMV Strains.
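    As a reading aid for the likelihood ratio tests named in the table title, the following minimal sketch shows how such a test statistic is commonly formed from the log-likelihoods of two nested models. It assumes, for illustration only, that the alternative model adds two free parameters, so that the statistic is compared with a chi-square distribution with 2 degrees of freedom; the function names and the log-likelihood values are hypothetical placeholders, not results or code from this study.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    lnl_null: log-likelihood of the simpler (null) model.
    lnl_alt:  log-likelihood of the nested, more general model.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 df.

    For exactly 2 degrees of freedom the survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    The choice of 2 df is an assumption made for this sketch.
    """
    return math.exp(-x / 2.0)


# Placeholder log-likelihoods (NOT values from the study).
lnl_null = -1000.0
lnl_alt = -995.0

stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf_df2(stat)
print(f"2*delta(lnL) = {stat:.2f}, P = {p_value:.4f}")
```

    A small P value would indicate that the model allowing a class of positively selected sites fits the data significantly better than the null model; the degrees of freedom should be adjusted to the actual difference in the number of free parameters between the models compared.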

    Acknowledgements

    I thank J.-P. Bouchet for his very valuable help with computation, M. Roossinck, F. García-Arenal, M. Tepfer, and C. Desbiez for greatly improving previous versions of the manuscript, and M. Jacquemond, O. Pierrugues, and H. Lecoq for helpful discussions about the results.

    Literature Cited

    Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18:1585-1592.

    Anisimova, M., J. P. Bielawski, and Z. Yang. 2002. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19:950-958.

    Béclin, C., R. Berthomé, J.-C. Palauqui, M. Tepfer, and H. Vaucheret. 1998. Infection of tobacco or Arabidopsis plants by CMV counteracts systemic post-transcriptional silencing of nonviral (trans)genes. Virology 252:313-317.

    Bishop, J. G., A. M. Dean, and T. Mitchell-Olds. 2000. Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc. Natl. Acad. Sci. USA 97:5322-5327.

    Brigneti, G., O. Voinnet, W.-X. Li, L.-H. Ji, S.-W. Ding, and D. C. Baulcombe. 1998. Viral pathogenicity determinants are suppressors of transgene silencing in Nicotiana benthamiana. EMBO J. 17:6739-6746.

    Candresse, T., F. Revers, O. Le Gall, S. A. Kofalvi, J. Marcos, and V. Pallás. 1997. Systematic search for recombination events in plant viruses and viroids. Pp. 20–25 in M. Tepfer and E. Balázs, eds. Virus-resistant transgenic plants: potential ecological impact. INRA/Springer-Verlag, Paris, France.

    Chen, B., and R. I. B. Francki. 1990. Cucumovirus transmission by the aphid Myzus persicae is determined solely by the viral coat protein. J. Gen. Virol. 71:939-944.

    Chen, Y.-K., R. Goldbach, and M. Prins. 2002. Inter- and intramolecular recombinations in the cucumber mosaic virus genome related to adaptation to alstroemeria. J. Virol. 76:4119-4124.

    Ding, S.-W., B. J. Anderson, H. R. Haase, and R. H. Symons. 1994. New overlapping gene encoded by the cucumber mosaic virus genome. Virology 198:593-601.

    Dohi, K., K. Mise, I. Furusawa, and T. Okuno. 2002. RNA-dependent RNA polymerase complex of brome mosaic virus: analysis of the molecular structure with monoclonal antibodies. J. Gen. Virol. 83:2879-2890.

    Douine, L., G. Marchoux, J. B. Quiot, and P. Clément. 1979. Phénomènes d'interférence entre souches du Virus de la Mosaïque du Concombre (CMV). II. Effet de la température d'incubation sur la multiplication de deux souches de sensibilités thermiques différentes, inoculées simultanément ou successivement à un hôte sensible: Nicotiana tabacum var. Xanthi n.c. Ann. Phytopathol. 11:421-430.

    Edwardson, J. R., and R. G. Christie. 1991. Cucumoviruses. Pp. 293–319 in CRC Handbook of viruses infecting legumes. CRC Press, Boca Raton, Fla.

    Emerson, B. C., E. Paradis, and C. Thébaud. 2001. Revealing the demographic histories of species using DNA sequences. Trends Ecol. Evol. 16:707-716.

    Eyre-Walker, A. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.

    Felsenstein, J. 1993. PHYLIP (phylogenetic inference package). Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Fraile, A., J. L. Alonso-Prados, M. A. Aranda, J. J. Bernal, J. M. Malpica, and F. García-Arenal. 1997. Genetic exchange by recombination or reassortment is infrequent in natural populations of a tripartite RNA plant virus. J. Virol. 71:934-940.

    Gallitelli, D. 2000. The ecology of cucumber mosaic virus and sustainable agriculture. Virus Res. 71:9-21.

    García-Arenal, F., A. Fraile, and J. M. Malpica. 2001. Variability and genetic structure of plant virus populations. Ann. Rev. Phytopathol. 39:157-186.

    Gonsalves, D., R. Provvidenti, and M. C. Edwards. 1982. Tomato white leaf: the relation of an apparent satellite RNA and cucumber mosaic virus. Phytopathology 72:1533-1538.

    Guyader, S., and D. Giblot Ducray. 2002. Sequence analysis of potato leafroll virus isolates reveals genetic stability, major evolutionary events and differential selection pressure between overlapping reading frame products. J. Gen. Virol. 83:1799-1807.

    Hampton, R. O., and R. I. B. Francki. 1992. RNA-1 dependent seed transmissibility of cucumber mosaic virus in Phaseolus vulgaris. Phytopathology 82:127-130.

    Haydon, D. T., A. D. Bastos, N. J. Knowles, and A. R. Samuel. 2001. Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics 157:7-15.

    Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.

    Hurst, L. D. 2002. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 18:486-487.

    Ishihama, A., and P. Barbier. 1994. Molecular anatomy of viral RNA-directed RNA polymerases. Arch. Virol. 134:235-258.

    Kadaré, G., and A.-L. Haenni. 1997. Virus-encoded RNA helicases. J. Virol. 71:2583-2590.

    Kaper, J. M., and H. E. Waterworth. 1977. Cucumber mosaic virus associated RNA 5: causal agent for tomato necrosis. Science 196:429-431.

    Kaplan, I. B., A. Gal-On, and P. Palukaitis. 1997. Characterization of cucumber mosaic virus. III. Localization of sequences in the movement protein controlling systemic infection in cucurbits. Virology 230:343-349.

    Kaplan, I. B., L. Zhang, and P. Palukaitis. 1998. Characterization of cucumber mosaic virus. V. Cell-to-cell movement requires capsid protein but not virions. Virology 246:221-231.

    Kim, C.-H., and P. Palukaitis. 1997. The plant defense response to cucumber mosaic virus in cowpea is elicited by the viral polymerase gene and affects virus accumulation in single cells. EMBO J. 16:4060-4068.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, United Kingdom.

    Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. USA 98:14512-14517.

    Kobori, T., M. Miyagawa, K. Nishioka, S. T. Ohki, and T. Osaki. 2002. Amino acid 129 of cucumber mosaic virus coat protein determines local symptom expression and systemic movement in Tetragonia expansa, Momordica charantia and Physalis floridana. J. Gen. Plant Pathol. 68:81-88.

    Leroux, J.-P., J.-B. Quiot, H. Lecoq, and M. Pitrat. 1979. Mise en évidence et répartition dans le Sud-Est de la France d'un pathotype particulier du virus de la mosaïque du concombre. Ann. Phytopathol. 11:431-438.

    Liu, S., X. He, G. Park, C. Josefsson, and K. L. Perry. 2002. A conserved capsid protein surface domain of cucumber mosaic virus is essential for efficient aphid vector transmission. J. Virol. 76:9756-9762.

    Moury, B., C. Morel, E. Johansen, and M. Jacquemond. 2002. Evidence for diversifying selection in potato virus Y and in the coat protein of other potyviruses. J. Gen. Virol. 83:2563-2573.

    O'Reilly, E. K., and C. C. Kao. 1998. Analysis of RNA-dependent RNA polymerase structure and function as guided by known polymerase structures and computer predictions of secondary structure. Virology 252:287-303.

    O'Reilly, E. K., Z. Wang, R. French, and C. C. Kao. 1998. Interactions between the structural domains of the RNA replication proteins of plant-infecting RNA viruses. J. Virol. 72:7160-7169.

    Palukaitis, P., M. J. Roossinck, R. G. Dietzgen, and R. I. B. Francki. 1992. Cucumber mosaic virus. Adv. Virus Res. 41:281-348.

    Palukaitis, P., and M. Zaitlin. 1997. Replicase-mediated resistance to plant virus disease. Adv. Virus Res. 48:349-377.

    Perry, K. L., L. Zhang, and P. Palukaitis. 1998. Amino acid changes in the coat protein of cucumber mosaic virus differentially affect transmission by the aphids Myzus persicae and Aphis gossypii. Virology 242:204-210.

    Quiot, J. B., J.-C. Devergne, L. Cardin, M. Verbrugghe, G. Marchoux, and G. Labonne. 1979. Ecologie et épidémiologie du Virus de la Mosaïque du Concombre dans le Sud-Est de la France. VII. Répartition de deux types de populations virales dans les cultures sensibles. Ann. Phytopathol. 11:359-373.

    Roossinck, M. J. 2002. Evolutionary history of cucumber mosaic virus deduced by phylogenetic analyses. J. Virol. 76:3382-3387.

    Roossinck, M. J., L. Zhang, and K.-H. Hellwald. 1999. Rearrangements in the 5' nontranslated region and phylogenetic analyses of cucumber mosaic virus RNA 3 indicate radial evolution of three subgroups. J. Virol. 73:6752-6758.

    Ryu, K. H., C.-H. Kim, and P. Palukaitis. 1998. The coat protein of cucumber mosaic virus is a host range determinant for infection in maize. Mol. Plant-Microbe Interact. 11:351-357.

    Schmitz, I., and A. L. N. Rao. 1998. Deletions in the conserved amino-terminal basic arm of cucumber mosaic virus coat protein disrupt virion assembly but do not abolish infectivity and cell-to-cell movement. Virology 248:323-331.

    Shintaku, M. H., L. Zhang, and P. Palukaitis. 1992. A single amino acid substitution in the coat protein of cucumber mosaic virus induces chlorosis in tobacco. Plant Cell 4:751-757.

    Simmonds, P., and D. B. Smith. 1999. Structural constraints on RNA virus evolution. J. Virol. 73:5787-5794.

    Smith, T. J., E. Chase, T. Schmidt, and K. L. Perry. 2000. The structure of cucumber mosaic virus and comparison to cowpea chlorotic mottle virus. J. Virol. 74:7578-7586.

    Soards, A. J., A. M. Murphy, P. Palukaitis, and J. P. Carr. 2002. Virulence and differential local and systemic spread of cucumber mosaic virus in tobacco are affected by the CMV 2b protein. Mol. Plant-Microbe Interact. 15:647-653.

    Suzuki, M., S. Kuwata, J. Kataoka, C. Masuta, N. Nitta, and Y. Takanami. 1991. Functional analysis of deletion mutants of cucumber mosaic virus RNA 3 using an in vitro transcription system. Virology 183:106-113.

    Suzuki, M., S. Kuwata, C. Masuta, and Y. Takanami. 1995. Point mutations in the coat protein of cucumber mosaic virus affect symptom expression and virion accumulation in tobacco. J. Gen. Virol. 76:1791-1799.

    Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.

    Suzuki, Y., T. Gojobori, and M. Nei. 2001. ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17:660-661.

    Suzuki, Y., and M. Nei. 2002. Simulation study of the reliability and robustness of the statistical methods of detecting positive selection at single amino acid sites. Mol. Biol. Evol. 19:1865-1869.

    Szilassy, D., K. Salánki, and E. Balázs. 1999. Stunting induced by cucumber mosaic cucumovirus-infected Nicotiana glutinosa is determined by a single amino acid residue in the coat protein. Mol. Plant-Microbe Interact. 12:1105-1113.

    Takeshita, M., M. Suzuki, and Y. Takanami. 2001. Combination of amino acids in the 3a protein and the coat protein of cucumber mosaic virus determines symptom expression and virus spread in bottle gourd. Arch. Virol. 146:697-711.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Wikoff, W. R., C. J. Tsai, G. Wang, T. S. Baker, and J. E. Johnson. 1997. The structure of cucumber mosaic virus: cryoelectron microscopy, X-ray crystallography, and sequence analysis. Virology 232:91-97.

    Xia, X., and Z. Xie. 2001. DAMBE: software package for data analysis in molecular biology and evolution. J. Heredity 92:371-373.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comp. Appl. Biosci. 13:555-556.

    Yang, Z., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

    Yang, Z., R. Nielsen, N. Goldman, and A.-M. Krabbe Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

    Yoshida, K., T. Goto, and N. Iizuka. 1985. Attenuated isolates of cucumber mosaic virus produced by satellite RNA and cross-protection between attenuated isolates and virulent ones. Ann. Phytopath. Soc. Jpn. 51:238-242.

    Zuker, M. 1989. On finding all suboptimal foldings of an RNA molecule. Science 244:48-52.

    (Benoît Moury)