Molecular Evolution of the Phytochrome Gene Family in Sorghum: Changing Rates of Synonymous and Replacement Evolution
Advances in Molecular Biology, 2004, Issue 4
* Plant and Invertebrate Ecology Department, Institute of Arable Crops Research-Rothamsted, Harpenden, Hertfordshire, United Kingdom
Institute of Genomic Diversity, Cornell University
E-mail: sk20@cornell.edu.
Abstract
The photoreceptor phytochromes, encoded by a small gene family, are responsible for controlling the expression of a number of light-responsive genes and photomorphogenic events, including agronomically important phenotypes such as flowering time and shade-avoidance behavior. The understanding and control of flowering time are particularly important goals in sorghum cultivar development for diverse environments, and naturally occurring variation in the phytochrome genes might prove useful in breeding programs. Also of interest is whether variation observed at the phytochrome loci in domesticated sorghum, or in particular races, is a result of human selection. Population genetic studies can reveal evidence of such selection in patterns of polymorphism and divergence. In this study we report a population genetic analysis of the PHY gene family in Sorghum bicolor (L.) Moench in a diverse panel including both cultivated and wild accessions. We show that the level of nucleotide variation in all gene family members is about half the average for this species, consistent with purifying selection acting on these loci. However, the rate of amino acid substitution is accelerated at PHYC compared to the other two loci. In comparisons to a closely related sorghum species, PHYC shows a pattern of intermediate frequency amino acid changes that differ from the patterns observed in comparisons across longer evolutionary distances. There is also a departure from expected patterns of polymorphism and divergence at synonymous sites in PHYC, although the data do not fit a simple model of directional or diversifying selection. Cultivated sorghum has a level of variation similar to that of wild relatives (ssp. verticilliflorum), but many polymorphisms are subspecies-specific, including several amino acid variants.
Key Words: population genetics, selection, polymorphism, sorghum, phytochrome, evolution
Introduction
The photoreceptor phytochromes are chromoproteins responsible for controlling the expression of a number of light-responsive genes and photomorphogenic events (Smith 2000). Classification of the five phytochrome genes of Arabidopsis, PHYA-E (Clack, Mathews, and Sharrock 1994), has provided a framework for the classification of homologous genes in different plant species. Phylogenetic analyses of phytochrome sequences in angiosperms have shown that the evolutionary history of the PHY gene family in Arabidopsis is representative of other flowering plants (Mathews, Lavin, and Sharrock 1995). Because PHYA, B, and C genes are found widely in angiosperms, the duplications giving rise to these three family members were inferred to have occurred prior to the radiation of the angiosperm lineage (Kolukisaoglu et al. 1995; Mathews, Lavin, and Sharrock 1995). This inference is supported by the detection of PHYA and PHYC in the earliest diverging angiosperm species (Mathews and Donoghue 2000), although PHYC has been found to be absent from at least one plant group (Howe et al. 1998). The first major duplication in the gene family gave rise to two lineages. One subsequently split into PHYA and PHYC, while the other split into PHYB and PHYE. PHYE is missing from some groups, such as monocots and poplars (Mathews and Sharrock 1996; Howe et al. 1998), while PHYA and PHYB have duplicated in others; thus, most angiosperms have three or four phytochromes, and some (e.g., Arabidopsis, tomato) have five (Mathews and Sharrock 1997).
Many processes are mediated through signals from phyA and phyB, including de-etiolation, flowering, control of seed germination, hypocotyl gravitropic orientations, shade avoidance, and regulation of the photosynthetic apparatus (Smith 2000). Characterization of the specific functions of individual phytochrome proteins has been accomplished in large part by the analysis of mutants in Arabidopsis. These studies indicate that phyA plays a primary role in far-red light perception and signal transduction (Nagatani, Reed, and Chory 1993), whereas phyB mediates most responses induced by pulsed or continuous red light (Reed et al. 1994). Studies of PHYA and PHYB mutants in grasses and eudicots suggest that this divergence in function is maintained throughout angiosperms (Childs et al. 1997; Devlin et al. 1997; Weller et al. 2000). PHYC null mutants have recently been described (Monte et al. 2003), and reveal that phyC is involved in response to continuous red light, with a photosensory specificity similar to that of phyB. With respect to the phenotypes of seedling development and flowering response, the effects of null mutations at PHYC alone are not dramatic, and are undetectable in a PHYB null background. PhyC appears to interact with both phyA and phyB in perception of day length and flowering response, although its effects are relatively modest.
Variation in evolutionary rate has been observed both within and between the phytochrome loci (Mathews and Sharrock 1996; Alba et al. 2000). Comparisons within both eudicots and grasses (specialized monocots) have shown that in these plant groups PHYC has evolved faster than PHYA, which in turn has evolved faster than PHYB. At all three loci, the C-terminal domain, which is involved in dimerization and signal transduction, evolves at least twice as fast as the N-terminal domain, which is responsible for photosensory specificity. These conclusions are based on comparisons involving relatively large evolutionary distances (e.g., maize to rice, soybean to tomato). However, in monocots other than grasses and in all early-diverging dicots, the photosensory domain of PHYA has evolved more rapidly than that of PHYC, and an episode of positive selection on PHYA occurred early in the divergence of this gene pair (Mathews, Burleigh, and Donoghue 2003).
Three PHY genes have been characterized in sorghum (PHYA, PHYB, and PHYC). Fine scale mapping of the ma3R allele in S. bicolor indicates that the Ma3 maturity gene encodes PHYB, and truncation of the PHYB message in the ma3R allele corresponds to reduced photoperiod sensitivity (Childs et al. 1997). Because control of flowering time and shade avoidance have been important goals in cultivar development for diverse environments (Morgan et al. 2002), variation at the phytochrome loci in domesticated sorghum may be a result of human selection on these traits. Population genetic studies can reveal evidence of such selection in patterns of polymorphism and divergence. A study of variation in the PHY genes of sorghum also provides an opportunity to examine the molecular evolution of this gene family on shorter time scales: within S. bicolor, between S. bicolor and its close relative S. propinquum, and between sorghum and maize, from which it diverged about 16.5 MYA (Gaut and Doebley 1997).
In this study we report a population genetic analysis of the PHY gene family in Sorghum bicolor (L.) Moench in a panel of accessions chosen to represent S. bicolor's diversity with respect to race and geographic range in Africa, its center of origin. We use polymorphism and divergence data to test for evidence of selection, and we examine rates of amino acid evolution within functionally defined regions. Finally, we compare the variation between wild and cultivated accessions to look for evidence of selection associated with domestication and breeding.
Materials and Methods
Sorghum Accessions
The accessions used in this analysis are shown in table 1. A panel of 15 lines was chosen to include representatives of two of S. bicolor's subspecies (ssp.) (bicolor, verticilliflorum) and a close relative of S. bicolor, Sorghum propinquum (Kunth.) Hitchc., as an outgroup. The samples of ssp. bicolor and verticilliflorum included representatives of racial diversity chosen across a broad geographical range. Two U.S. inbred lines (RTx430 and BTx623) were also included in the ssp. bicolor sample. Genomic DNA from young leaves was isolated using a modification of the method of Doyle and Doyle (1987).
Table 1 Accessions and Their Geographical Origins.
DNA Sequences
Primers were designed in S. bicolor to amplify PHYA, B, and C based on sequence data available in GenBank (PHYA—U56729; PHYB—AF182394; PHYC—U56731) and primers made available for PHYB from Kevin Childs (Texas A&M University). A total of 3,834 bp was sequenced in PHYA, 7,214 bp in PHYB and 4,419 bp in PHYC. Sequence data for each gene encompassed all coding exons and the intervening introns. All the primers and their sequences are available from the authors. Polymerase chain reaction (PCR) amplifications were performed using 10 ng of genomic DNA, 4 pmol of each primer, 0.5 U Taq polymerase (Promega), 2.5 mM MgCl2 in a volume of 20 μl under the following conditions: 2 min at 94°C, followed by 30 cycles of 1 min at 94°C, 1 min at 55°C, and 2 min at 72°C, followed by a final 10 min extension at 72°C. The PCR products were cloned (TOPO cloning kit, Invitrogen) and sequenced. Two clones were sequenced for each amplicon to ensure that PCR errors were identified. DNA fragments were re-amplified and re-sequenced when singletons were detected. The sequencing was performed on an ABI 3700 DNA sequencer (PerkinElmer). Sequences have been deposited into GenBank under accession numbers AY466067–AY466100 and AY466452–AY466468.
Sequence Analysis
Polymorphism and divergence measures and tests of neutrality were calculated with DnaSP version 3.0 (Rozas and Rozas 1999). Evidence for non-neutral evolution was investigated using the HKA test (Hudson, Kreitman, and Aguade 1987) and the MK test (McDonald and Kreitman 1991). Maize sequences were kindly provided by Moira Sheehan. For the HKA test, as an alternative to testing the PHY loci against one putatively neutral locus, we compared them to a pooled data set of 12 short loci from across the sorghum genome (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). These 12 loci were chosen from a set of 95 regions based on two criteria: (1) a Tajima's D statistic with an absolute value less than 1.0, and (2) an HKA statistic with a P value greater than 0.2 when tested against the other 95 loci combined. There is clearly substantial recombination present in this reference "locus," but the HKA test's assumption of no recombination is conservative. For these tests we used a subsample of 10 accessions from each study so that the test and reference samples were very similar to each other: each contained five accessions representing the five bicolor races, the two referenced inbred lines, and one accession each of races verticilliflorum, arundinaceum, and aethiopicum. The reference data were: segregating sites = 33, length = 3,767, sample size = 10, average pairwise divergence = 39.4.
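As a rough illustration of how the reference data quoted above translate into a per-site diversity estimate, the following sketch computes Watterson's estimator, S / (a_n L), from the number of segregating sites, the sample size, and the sequence length. The text does not say which estimator was reported for the reference loci, so the choice of Watterson's estimator is an assumption made for illustration, and the function name is hypothetical; this is not the DnaSP implementation.

```python
def wattersons_theta(segregating_sites: int, sample_size: int, length: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2 or length <= 0:
        raise ValueError("need at least two sequences and a positive length")
    # Harmonic-number correction for the sample size.
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / (a_n * length)


# Reference data as given in the text: S = 33, n = 10, L = 3,767 bp.
theta_ref = wattersons_theta(segregating_sites=33, sample_size=10, length=3767)
print(f"{theta_ref:.4f}")  # prints 0.0031
```

The estimate depends only on the three summary numbers, which is why a pooled set of short loci can be treated as a single reference "locus" for this purpose.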
Tests for positive selection were conducted with the codeml program from PAML (Yang 1997). The likelihood of Model B, with the lineages leading to S. propinquum and S. bicolor allowed to have a different rate of amino acid evolution from the rest of the tree, was compared to the likelihood of the discrete model with only two site classes (see Yang and Nielsen [2002]). Data sets included one allele each from maize, rice, and S. propinquum, and one or two alleles from S. bicolor, chosen such that the same polymorphism did not appear twice in the data. The PHYC data set also included an allele from wheat.
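Comparisons of nested codon models such as the one described above are usually made with a likelihood ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution. The sketch below shows that calculation under stated assumptions: the function name and the log-likelihood values are hypothetical, the degrees of freedom must be supplied by the user according to the models being compared, and only the closed-form cases of one and two degrees of freedom are handled. It is not part of PAML.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """P value for a likelihood ratio test of two nested models.

    lnl_null and lnl_alt are the maximized log-likelihoods of the simpler
    and the more general model; df is the difference in free parameters.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0  # the more general model fits no better
    if df == 1:
        # Chi-square survival function with 1 d.f.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square survival function with 2 d.f.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical log-likelihoods, for illustration only.
p_value = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(f"{p_value:.3f}")
```

A P value above the chosen threshold means that allowing the sorghum lineages their own rate of amino acid evolution does not improve the fit significantly.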
Results
Levels of Nucleotide Polymorphism in the PHY Gene Family
We surveyed genomic sequence variation across the three phytochrome genes in a diverse sample of S. bicolor accessions. Figure 1 shows the haplotype at variable sites for each accession (no heterozygotes were observed). A summary of nucleotide diversity in the PHY gene family is shown in table 2, with the samples partitioned into the following groups: all S. bicolor (n = 15); ssp. bicolor (n = 7); ssp. verticilliflorum (n = 8). The level of total sequence diversity is similar among loci and groups, varying less than twofold in all comparisons. However, this similarity in total diversity masks marked differences that are revealed when the data are analyzed by functional class. For example, PHYC has reduced variation at synonymous sites and elevated variation at noncoding sites, whereas PHYB has relatively little variation at replacement sites. Tajima's (1989) D, an index of the frequency spectrum that has an expectation of about zero under the assumption of neutrality, is not significantly different from zero in any of the samples.
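For readers unfamiliar with the statistic, a minimal sketch of Tajima's D is given here. It contrasts the average number of pairwise differences with the number of segregating sites scaled by sample size, so that an excess of rare variants gives negative values and an excess of intermediate-frequency variants gives positive values. The function name and the toy haplotypes are hypothetical and are not taken from figure 1; the values reported in this study were obtained with DnaSP, not with this code.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's (1989) D for a set of aligned haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns carrying more than one allele.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # k: average number of differences over all pairs of sequences.
    pair_diffs = [
        sum(x != y for x, y in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ]
    k = sum(pair_diffs) / len(pair_diffs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / math.sqrt(variance)


# Toy alignments (hypothetical): all singletons versus balanced variants.
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]
balanced_variants = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare_variants) < 0, tajimas_d(balanced_variants) > 0)
```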
FIG. 1. DNA sequence variation at the phytochrome genes. The numbers refer to nucleotide position starting with the initiation codon. r = a replacement site; s = a synonymous site; i = an intron site
Table 2 DNA Sequence Variation at the PHY Loci.
The average level of total sequence variation across the S. bicolor genome is about 0.0023, based on a sample that included subspecies bicolor and verticilliflorum and covered both coding and noncoding sequences (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). Variation at the PHY genes in this similar sample is about half of that value. Tested against that data set by the method of Hudson, Kreitman, and Aguade (1987; see Methods), total diversity at the PHY genes is not unusual, given their level of divergence to S. propinquum, and is consistent with a lower neutral mutation rate at these loci. When we tested the PHY loci against each other, using the entire sample of 15 S. bicolor, PHYC has somewhat reduced variation compared to PHYA (P = 0.14) and PHYB (P = 0.22), but neither result is significant.
Amino Acid Variation
The phytochrome protein can be divided into two domains with distinct and separable functions: the N-terminal photosensory domain (amino acids 1–623 for PHYA and PHYC, amino acids 1–673 for PHYB) and the C-terminal signal transduction domain, which is the remainder of the protein (Smith 2000). The distribution of replacement polymorphism in these domains differs among the PHY loci in sorghum. At PHYA, four out of five amino acid variants fall in the C-terminal domain, similar to the pattern of divergence observed over larger evolutionary distances (Alba et al. 2000). These variants are all at intermediate frequency, suggesting that they are not likely to be deleterious. In contrast, most replacement polymorphisms at PHYB and PHYC fall in the N-terminal domain (three out of four and six out of seven, respectively), inconsistent with longer-term evolutionary trends.
Under the null hypothesis of selective neutrality, the ratio of synonymous to replacement variation should be the same within and between species. The distribution of synonymous and replacement polymorphisms in each PHY locus, as well as the number of fixed differences to S. propinquum, are shown in table 3A. The ratio of replacement to synonymous polymorphisms differs among the loci, with PHYB showing the strongest pattern of purifying selection and PHYC showing the least. When the intraspecific data are compared with fixed differences to S. propinquum by Fisher's exact test (McDonald and Kreitman 1991), PHYC is found to depart from the neutral expectation (P = 0.03), apparently in the direction of too many amino acid polymorphisms. However, comparing the pattern at PHYC to that of PHYA and PHYB, the departure can also be interpreted as an excess of fixations at synonymous sites. This hypothesis can be tested by comparing polymorphism and divergence at synonymous sites and noncoding sites. Table 3B shows that this comparison is significant (P = 0.02), suggesting that the departure at PHYC is likely due to unusual evolution of synonymous sites.
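The McDonald-Kreitman comparison reduces to a 2 x 2 contingency table (site class by polymorphic versus fixed) evaluated with Fisher's exact test. The sketch below implements a two-sided version of that test with the standard library only. The function name is hypothetical, and the counts in the example are invented for illustration; they are not the values in table 3.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    For an MK-style table, rows could be replacement and synonymous sites,
    and columns polymorphic and fixed differences.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # holding all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1.0 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: replacement (polymorphic 6, fixed 2),
# synonymous (polymorphic 3, fixed 10).
p_mk = fisher_exact_two_sided(6, 2, 3, 10)
print(f"{p_mk:.3f}")
```

The same function applies to the second comparison described above by replacing the replacement-site row with noncoding sites.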
Table 3 Tests of Polymorphism and Divergence.
Divergence Between Wild and Cultivated Subspecies
Our sample can be divided into two subsamples that represent the cultivated subspecies bicolor and the wild subspecies verticilliflorum, thought to be ancestral to cultivated sorghum (Kimber 2000). In table 4, we show the distribution of variation between the two subspecies. At all three loci, the wild sample has more segregating sites than the cultivated one, although the sequence diversity is not reduced proportionately. This may be due, at least in part, to the differential sampling of races in the two subspecies, or it may reflect a real difference in allele frequencies.
Table 4 Differentiation Between Wild and Cultivated S. bicolor.
Table 4 also shows that most polymorphisms are not shared between subspecies. In fact, many polymorphisms are found only in one race, as can be seen by inspection of figure 1. This non-random distribution of variation reflects the population structure that underlies the subspecific and racial classification system. Interestingly, in PHYC there are three amino acid polymorphisms that appear to be specific to races verticilliflorum (positions 1099 and 1310) and aethiopicum (4416), though the sample sizes are too small for this observation to be significant. Two other replacement polymorphisms (484 and 569) are found only in the inbred lines.
Rates of Evolution
The three phytochrome genes play different roles in the plant, and they have different patterns of amino acid variation within S. bicolor. We were interested to test whether they are also evolving at different rates relative to their orthologs in maize, which shared a common ancestor with sorghum about 16.5 MYA, and in which each PHY gene is duplicated, presumably as a result of an allopolyploidization event involving two diploid ancestors (Gaut and Doebley 1997). Each S. bicolor gene was compared to the two homologous maize genes and, in all three cases, the S. bicolor gene was more closely related to one of the maize genes than the two maize genes were to each other. The rates presented in table 5 are based on comparison of the S. bicolor gene to its more similar Zea homolog. In all cases, the maize gene that was closest in terms of nucleotide similarity was also the closest in terms of amino acid similarity. Concordant with both the intrageneric and inter-family data of Alba et al. (2000), these analyses show that PHYB is most conserved at the amino acid level, while PHYC is evolving most rapidly, with a ka/ks ratio almost three times higher than that of PHYB. To compare our data with that of Alba et al. (2000), who estimated ka/year for the PHY genes, we divided ka by 33 Myr, twice the divergence time of sorghum and maize. The rates we obtained are similar to those obtained by Alba et al. (2000). The rates of synonymous substitution at the three phytochrome genes are similar to each other and are only about one-half to two-thirds the rate observed in comparisons of maize and sorghum sequences at the waxy and mdh loci (Gaut and Doebley 1997).
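The conversion from pairwise divergence to an absolute rate is a single division: because substitutions accumulate along both lineages since the common ancestor, the per-site divergence is divided by twice the divergence time (2 x 16.5 Myr = 33 Myr for sorghum and maize). A minimal sketch follows; the function name and the ka value in the example are hypothetical and are not taken from table 5.

```python
def substitutions_per_site_per_year(divergence: float, divergence_time_years: float) -> float:
    """Absolute rate from pairwise divergence: k / (2T).

    The factor of two accounts for change along both lineages
    descending from the common ancestor.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return divergence / (2.0 * divergence_time_years)


SORGHUM_MAIZE_DIVERGENCE_YEARS = 16.5e6  # from the text (Gaut and Doebley 1997)

# Hypothetical ka value, for illustration only.
rate = substitutions_per_site_per_year(0.033, SORGHUM_MAIZE_DIVERGENCE_YEARS)
print(f"{rate:.1e}")  # prints 1.0e-09
```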
Table 5 Rates of Nucleotide and Amino Acid Evolution.
The distribution of replacement variation within functional domains was unexpected. We were interested to know whether these trends were limited to polymorphism data or whether they could also be observed in divergence from S. propinquum and from Zea mays. Estimates of ka by functional domain (table 6) show that the increased rate of protein evolution in the N-terminal domain is observed at PHYB and PHYC in comparisons with S. propinquum, but such an increase is not observed at any loci in comparisons with Zea mays.
Table 6 Rates of Synonymous (Ks) and Nonsynonymous (Ka) Substitution in the Major Functional Domains.
To test whether there is any evidence of positive selection on amino acid changes in the phytochrome genes, we analyzed the data with PAML (Yang 1997). Specifically, we tested whether the ratio of nonsynonymous to synonymous changes along the sorghum lineages was significantly different from that along the branches to maize and rice. Models that allowed for such differences did not fit the data better than models that assumed that all branches had the same ratios, and no amino acids showed evidence of selection.
Discussion
The level of nucleotide diversity at all PHY genes is lower than the average observed for a large number of unlinked sites sampled in a similar panel of sorghum accessions (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). Phylogenetic studies have shown that phytochrome genes experienced rapid evolution after the duplication events that produced the gene family prior to the divergence of the angiosperms (Alba et al. 2000; Yang and Nielsen 2002), but that the major evolutionary force on phytochromes appears to have been purifying selection in conjunction with strong positive selection on a few sites (Yang and Nielsen 2002). Total variation at the PHY genes in sorghum is low, and synonymous site evolution is low, but some partitions of the data indicate that rates of evolution are not constant in this lineage (table 6) and that some aspect of function may still be evolving, particularly at PHYC. These findings are discussed in more detail in the following sections.
Synonymous Site Evolution at PHYC
A departure from the neutral expectation was detected in comparisons of polymorphism and divergence across functional classes at PHYC; however, the observed pattern is not easily interpreted in terms of models of directional, diversifying, or slightly deleterious evolution. The excess of fixations at synonymous sites compared to polymorphisms suggests that positive selection has accelerated these fixations. HKA tests of variation at PHYC also show that, while noncoding variation and replacement variation are both consistent with the neutral expectation, synonymous variation is not (P = 0.06). The direction of the departure indicates either that synonymous variation is lower than expected or that synonymous divergence is higher than expected. Furthermore, all synonymous polymorphisms at PHYC are singletons, while much of the noncoding and replacement variation is at intermediate frequency.
A similar phenomenon, namely excess synonymous divergence, has been reported for the ADH locus in Arabidopsis, in comparisons between A. thaliana and A. korshinskyi, and between A. thaliana and A. flagellosa (Miyashita et al. 1998). The authors evidently did not consider this a departure from neutrality, as they offered no interpretation. At the Notch locus in Drosophila (DuMont et al. 2004), excess synonymous fixations are also observed, and result in changes from preferred to unpreferred codons in the D. melanogaster lineage. Whether these codons are themselves the targets of selection or are being fixed by some sort of hitchhiking process is not clear. In the case of PHYC, codon bias as measured by ENC (effective number of codons) is 55.8, intermediate between that of PHYA (53.3) and PHYB (58.2). All of these figures represent fairly low codon bias.
The pattern of sequence evolution at PHYC within the genus Sorghum contrasts with the pattern seen over the longer divergence time between sorghum and maize, during which all three loci have very similar rates of synonymous substitution. This suggests that a change in selection has occurred since the divergence of maize and sorghum. Because we are not able to assign mutations to the S. bicolor and S. propinquum lineages, it is not possible to know whether this change in selection is specific to S. bicolor.
Amino Acid Evolution
Protein changes in the maize/sorghum lineage are occurring about twice as rapidly at PHYC as at PHYA and PHYB. Patterns of within-species amino acid variation also differ across the three genes. PHYA and PHYC have similar levels of replacement variation, but PHYA is more variable at the C-terminus of the protein, a pattern observed at all three loci across diverse angiosperm lineages (Alba et al. 2000). PHYC, in contrast, is most variable in the photosensory domain near the N-terminus, a trend that is maintained in divergence from S. propinquum, but not in divergence from Z. mays (table 6). These observations suggest that phyC may have experienced a change in selective constraint or be diverging in function in the sorghum lineage. (Note that the lower ka/ks ratio in the comparison of S. bicolor to S. propinquum is due to the unusually elevated rate of ks.) In other lineages, the presence of PHYE, PHYD, or duplicated PHYB loci provides more opportunities for functional divergence in the red-light response. In sorghum, with only three PHY loci, phyC may play a number of different roles in the life of the plant, as suggested by the work of Monte et al. (2003). PHYC may thus be a target of adaptive evolution.
Although phyB is known to be involved in photoperiod response, we found no evidence of new variation associated with either human or natural selection at this locus. Amino acid variants in PHYB are at low frequency, and purifying selection appears to be strongest on this gene. In contrast, PHYA and PHYC both have a number of amino acid variants at intermediate frequency, suggesting that they are not deleterious. At PHYA, there is no association of amino acid variants with population structure, but at PHYC several of the variants appear to have a race-specific distribution. This may simply be due to population structure, because several silent sites (e.g., 3395 and 3748 in PHYA) also show a race-specific distribution. Alternatively, it is possible that different alleles may be favored in different environments. Larger samples of these races, as well as population samples of cultivated races, would be needed to explore this question adequately. Note that, at PHYC, the variants are associated with races of the wild subspecies, and their distribution could therefore not be due to human selection.
It should be noted that genetic studies of natural variation in photoperiod response have not found much evidence of a role for the phytochrome genes. In Arabidopsis, for example, a large number of loci are associated with quantitative trait loci (QTLs) for flowering time (Koornneef et al. 1998), but none of them maps to PHYA or PHYB. In Scots Pine, a survey of the N-terminal domain of two phytochrome genes (O and P) found no association of amino acid variation with time of bud set (García-Gil, Mikkonen, and Savolainen 2003). Naturally occurring variation in flowering time appears to be complex, and the genes involved may vary across environments (Weinig et al. 2002). The predominant role of purifying selection in evolution of the phytochrome genes suggests that most natural variation in photoperiod response occurs downstream of these receptors.
Sorghum Domestication
Cultivated sorghum is believed to have been derived from the wild subspecies verticilliflorum (Kimber 2000). If domestication caused a bottleneck in cultivated sorghum, we would expect to see a reduction in diversity compared to the wild relatives. In maize, for example, Adh1 has 83% of the diversity found in its progenitor, teosinte, whereas glb1 has 60% (Hilton and Gaut 1998) and te1 has 57% (White and Doebley 1999). Studies of allozyme and restriction fragment length polymorphism (RFLP) variation have provided evidence of a domestication bottleneck in sorghum (Aldrich and Doebley 1992; Aldrich et al. 1992; Cui et al. 1995), based on both the number of alleles per locus and total heterozygosity. No comparable reduction is observed at the phytochrome loci: partitioning the variation within subspecies, similar levels of nucleotide diversity are found within ssp. bicolor and ssp. verticilliflorum for each PHY gene (tables 2 and 4).
Although diversity in wild and cultivated sorghum is similar, and there are no fixed differences between the two forms, most of the polymorphisms present are not shared (table 4). This finding is in contrast to RFLP studies, in which 60% of the alleles were detected in both cultivated and wild sorghum (Cui et al. 1995). The different pattern of divergence detected in this study may simply be due to chance, or to the different sampling strategies employed. A more interesting possibility is that it may be due to a different evolutionary history at the PHY loci. If variants at the PHY loci are either slightly deleterious (PHYB), or under diversifying selection (PHYC), then they would not be expected to show the same distribution as neutral markers.
Acknowledgements
We thank K. Childs for sharing information on primer sequences; W. Rooney, M. Tuinstra, and A. Paterson for providing sorghum material and seed; M. Sheehan for providing maize PHY sequences; S. Mitchell and V. Bauer Dumont for discussion; R. G. Reeves, A. Wong, and R. Nielsen for help with implementation and interpretation of PAML; and S. Mitchell, T. Brutnell, J. Labate, C. Aquadro, and two anonymous reviewers for comments on earlier versions of this manuscript. This research was supported by National Science Foundation grant DBI 0115903.
Literature Cited
Alba, R., P. M. Kelmenson, M. M. Cordonnier-Pratt, and L. H. Pratt. 2000. The phytochrome gene family in tomato and the rapid differential evolution of this family in angiosperms. Mol. Biol. Evol. 17:362-373.
Aldrich, P. R., and J. Doebley. 1992. Restriction fragment variation in the nuclear and chloroplast genomes of cultivated and wild Sorghum bicolor. Theor. Appl. Genet. 85:293-302.
Aldrich, P. R., J. Doebley, K. F. Schertz, and A. Stec. 1992. Patterns of allozyme variation in cultivated and wild Sorghum bicolor. Theor. Appl. Genet. 85:451-460.
Childs, K. L., F. R. Miller, M. M. Cordonnier-Pratt, L. H. Pratt, P. W. Morgan, and J. E. Mullet. 1997. The sorghum photoperiod sensitivity gene, Ma3, encodes a phytochrome B. Plant Physiol. 113:611-619.
Clack, T., S. Mathews, and R. A. Sharrock. 1994. The phytochrome apoprotein family in Arabidopsis is encoded by five genes: the sequences and expression of PHYD and PHYE. Plant Mol. Biol. 25:413-427.
Cui, Y. X., G. W. Xu, C. W. Magill, K. F. Schertz, and G. E. Hart. 1995. RFLP-based assay of Sorghum bicolor (L.) Moench genetic diversity. Theor. Appl. Genet. 90:787-796.
Devlin, P. F., D. E. Somers, P. H. Quail, and G. C. Whitelam. 1997. The Brassica rapa elongated internode (EIN) gene encodes phytochrome B. Plant Mol. Biol. 34:537-547.
Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure for small amounts of leaf tissue. Phytochem. Bull. 19:11-15.
DuMont, N., V. Bauer, J. C. Fay, P. P. Calabrese, and C. F. Aquadro. 2004. DNA variability and divergence at the Notch locus region of Drosophila melanogaster and D. simulans: a case of accelerated synonymous site divergence. Genetics (in press).
García-Gil, M. R., M. Mikkonen, and O. Savolainen. 2003. Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Mol. Ecol. 12:1195-1206.
Gaut, B. S., and J. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94:6809-6814.
Hamblin, M. T., S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich. 2004. Comparative population genetics of the panicoid grasses: Sequence polymorphism, linkage disequilibrium, and selection in a diverse sample of Sorghum bicolor. Genetics (in press).
Hilton, H., and B. S. Gaut. 1998. Speciation and domestication in maize and its wild relatives: evidence from the globulin-1 gene. Genetics 150:863-872.
Howe, G. T., P. A. Bucciaglia, W. P. Hackett, G. R. Furnier, M. M. Cordonnier-Pratt, and G. Gardner. 1998. Evidence that the phytochrome gene family in black cottonwood has one PHYA locus and two PHYB loci but lacks members of the PHYC/F and PHYE subfamilies. Mol. Biol. Evol. 15:160-175.
Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
Kimber, C. 2000. Origins of domesticated sorghum and its early diffusion to India and China. Pp. 3–98 in W. C. Smith and R. A. Frederiksen, eds. Sorghum. John Wiley, New York.
Kolukisaoglu, H. U., S. Marx, C. Wiegmann, S. Hanelt, and H. A. Schneider-Poetsch. 1995. Divergence of the phytochrome gene family predates angiosperm evolution and suggests that Selaginella and Equisetum arose prior to Psilotum. J. Mol. Evol. 41:329-337.
Koornneef, M., C. Alonso-Blanco, A. J. M. Peeters, and W. Soppe. 1998. Genetic control of flowering time in Arabidopsis. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:345-370.
Mathews, S., J. G. Burleigh, and M. J. Donoghue. 2003. Adaptive evolution in the photosensory domain of phytochrome A in early angiosperms. Mol. Biol. Evol. 20:1087-1097.
Mathews, S., and M. J. Donoghue. 2000. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int. J. Plant Sci. 161:S41-S55.
Mathews, S., M. Lavin, and R. A. Sharrock. 1995. Evolution of the phytochrome gene family and its utility for phylogenetic analyses of angiosperms. Ann. Missouri Bot. Garden 82:296-321.
Mathews, S., and R. A. Sharrock. 1996. The phytochrome gene family in grasses (Poaceae): a phylogeny and evidence that grasses have a subset of the loci found in dicot angiosperms. Mol. Biol. Evol. 13:1141-1150.
Mathews, S., and R. A. Sharrock. 1997. Phytochrome gene diversity. Plant, Cell Environ. 20:666-671.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.
Miyashita, N. T., A. Kawabe, H. Innan, and R. Terauchi. 1998. Intra- and interspecific DNA variation and codon bias of the alcohol dehydrogenase (Adh) locus in Arabis and Arabidopsis species. Mol. Biol. Evol. 15:1420-1429.
Monte, E., J. M. Alonso, J. R. Ecker, Y. Zhang, X. Li, J. Young, S. Austin-Phillips, and P. H. Quail. 2003. Isolation and characterization of phyC mutants in Arabidopsis reveals complex crosstalk between phytochrome signaling pathways. Plant Cell 15:1962-1980.
Morgan, P. W., S. A. Finlayson, K. L. Childs, J. E. Mullet, and W. L. Rooney. 2002. Opportunities to improve adaptability and yield in grasses: lessons from sorghum. Crop Sci. 42:1791-1799.
Nagatani, A., J. W. Reed, and J. Chory. 1993. Isolation and initial characterization of Arabidopsis mutants that are deficient in phytochrome A. Plant Physiol. 102:269-277.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Reed, J. W., A. Nagatani, T. D. Elich, M. Fagan, and J. Chory. 1994. Phytochrome A and phytochrome B have overlapping but distinct functions in Arabidopsis development. Plant Physiol. 104:1139-1149.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Smith, H. 2000. Phytochromes and light signal perception by plants—an emerging synthesis. Nature 407:585-591.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Weinig, C., M. C. Ungerer, L. A. Dorn, N. C. Kane, Y. Toyonaga, S. S. Halldorsdottir, T. F. Mackay, M. D. Purugganan, and J. Schmitt. 2002. Novel loci control variation in reproductive timing in Arabidopsis thaliana in natural environments. Genetics 162:1875-1884.
Weller, J. L., M. E. Schreuder, H. Smith, M. Koornneef, and R. E. Kendrick. 2000. Physiological interactions of phytochromes A, B1 and B2 in the control of development in tomato. Plant J. 24:345-356.
White, S. E., and J. F. Doebley. 1999. The molecular evolution of terminal ear1, a regulatory gene in the genus Zea. Genetics 153:1455-1462.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.
Yang, Z., and R. Nielsen. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917.
(Gemma M. White*, Martha T)
Key Words: population genetics • selection • polymorphism • sorghum • phytochrome • evolution
Introduction
The photoreceptor phytochromes are chromoproteins responsible for controlling the expression of a number of light-responsive genes and photomorphogenic events (Smith 2000). Classification of the five phytochrome genes of Arabidopsis, PHYA-E (Clack, Mathews, and Sharrock 1994), has provided a framework for the classification of homologous genes in different plant species. Phylogenetic analyses of phytochrome sequences in angiosperms have shown that the evolutionary history of the PHY gene family in Arabidopsis is representative of other flowering plants (Mathews, Lavin, and Sharrock 1995). Because PHYA, B, and C genes are found widely in angiosperms, the duplications giving rise to these three family members were inferred to have occurred prior to the radiation of the angiosperm lineage (Kolukisaoglu et al. 1995; Mathews, Lavin, and Sharrock 1995). This inference is supported by the detection of PHYA and PHYC in the earliest diverging angiosperm species (Mathews and Donoghue 2000), although PHYC has been found to be absent from at least one plant group (Howe et al. 1998). The first major duplication in the gene family gave rise to two lineages. One subsequently split into PHYA and PHYC, while the other split into PHYB and PHYE. PHYE is missing from some groups, such as monocots and poplars (Mathews and Sharrock 1996; Howe et al. 1998), while PHYA and PHYB have duplicated in others; thus, most angiosperms have three or four phytochromes, and some (e.g., Arabidopsis, tomato) have five (Mathews and Sharrock 1997).
Many processes are mediated through signals from phyA and phyB, including de-etiolation, flowering, control of seed germination, hypocotyl gravitropic orientations, shade avoidance, and regulation of the photosynthetic apparatus (Smith 2000). Characterization of the specific functions of individual phytochrome proteins has been accomplished in large part by the analysis of mutants in Arabidopsis. These studies indicate that phyA plays a primary role in far-red light perception and signal transduction (Nagatani, Reed, and Chory 1993), whereas phyB mediates most responses induced by pulsed or continuous red light (Reed et al. 1994). Studies of PHYA and PHYB mutants in grasses and eudicots suggest that this divergence in function is maintained throughout angiosperms (Childs et al. 1997; Devlin et al. 1997; Weller et al. 2000). PHYC null mutants have recently been described (Monte et al. 2003), and reveal that phyC is involved in response to continuous red light, with a photosensory specificity similar to that of phyB. With respect to the phenotypes of seedling development and flowering response, the effects of null mutations at PHYC alone are not dramatic, and are undetectable in a PHYB null background. PhyC appears to interact with both phyA and phyB in perception of day length and flowering response, although its effects are relatively modest.
Variation in evolutionary rate has been observed both within and between the phytochrome loci (Mathews and Sharrock 1996; Alba et al. 2000). Comparisons within both eudicots and grasses (specialized monocots) have shown that in these plant groups PHYC has evolved faster than PHYA, which in turn has evolved faster than PHYB. At all three loci, the C-terminal domain, which is involved in dimerization and signal transduction, evolves at least twice as fast as the N-terminal domain, which is responsible for photosensory specificity. These conclusions are based on comparisons involving relatively large evolutionary distances (e.g., maize to rice, soybean to tomato). However, in monocots other than grasses and in all early-diverging dicots, the photosensory domain of PHYA has evolved more rapidly than that of PHYC, and an episode of positive selection on PHYA occurred early in the divergence of this gene pair (Mathews, Burleigh, and Donoghue 2003).
Three PHY genes have been characterized in sorghum (PHYA, PHYB, and PHYC). Fine scale mapping of the ma3R allele in S. bicolor indicates that the Ma3 maturity gene encodes PHYB, and truncation of the PHYB message in the ma3R allele corresponds to reduced photoperiod sensitivity (Childs et al. 1997). Because control of flowering time and shade avoidance have been important goals in cultivar development for diverse environments (Morgan et al. 2002), variation at the phytochrome loci in domesticated sorghum may be a result of human selection on these traits. Population genetic studies can reveal evidence of such selection in patterns of polymorphism and divergence. A study of variation in the PHY genes of sorghum also provides an opportunity to examine the molecular evolution of this gene family on shorter time scales: within S. bicolor, between S. bicolor and its close relative S. propinquum, and between sorghum and maize, from which it diverged about 16.5 MYA (Gaut and Doebley 1997).
In this study we report a population genetic analysis of the PHY gene family in Sorghum bicolor (L.) Moench in a panel of accessions chosen to represent S. bicolor's diversity with respect to race and geographic range in Africa, its center of origin. We use polymorphism and divergence data to test for evidence of selection, and we examine rates of amino acid evolution within functionally defined regions. Finally, we compare the variation between wild and cultivated accessions to look for evidence of selection associated with domestication and breeding.
Materials and Methods
Sorghum Accessions
The accessions used in this analysis are shown in table 1. A panel of 15 lines was chosen to include representatives of two of S. bicolor's subspecies (ssp.) (bicolor, verticilliflorum) and a close relative of S. bicolor, Sorghum propinquum (Kunth.) Hitchc., as an outgroup. The samples of ssp. bicolor and verticilliflorum included representatives of racial diversity chosen across a broad geographical range. Two U.S. inbred lines (RTx430 and BTx623) were also included in the ssp. bicolor sample. Genomic DNA from young leaves was isolated using a modification of the method of Doyle and Doyle (1987).
Table 1 Accessions and Their Geographical Origins.
DNA Sequences
Primers were designed in S. bicolor to amplify PHYA, B, and C based on sequence data available in GenBank (PHYA—U56729; PHYB—AF182394; PHYC—U56731) and primers made available for PHYB from Kevin Childs (Texas A&M University). A total of 3,834 bp was sequenced in PHYA, 7,214 bp in PHYB and 4,419 bp in PHYC. Sequence data for each gene encompassed all coding exons and the intervening introns. All the primers and their sequences are available from the authors. Polymerase chain reaction (PCR) amplifications were performed using 10 ng of genomic DNA, 4 pmol of each primer, 0.5 U Taq polymerase (Promega), 2.5 mM MgCl2 in a volume of 20 μl under the following conditions: 2 min at 94°C, followed by 30 cycles of 1 min at 94°C, 1 min at 55°C, and 2 min at 72°C, followed by a final 10 min extension at 72°C. The PCR products were cloned (TOPO cloning kit, Invitrogen) and sequenced. Two clones were sequenced for each amplicon to ensure that PCR errors were identified. DNA fragments were re-amplified and re-sequenced when singletons were detected. The sequencing was performed on an ABI 3700 DNA sequencer (PerkinElmer). Sequences have been deposited into GenBank under accession numbers AY466067–AY466100 and AY466452–AY466468.
Sequence Analysis
Polymorphism and divergence measures and tests of neutrality were calculated with DnaSP version 3.0 (Rozas and Rozas 1999). Evidence for non-neutral evolution was investigated using the HKA test (Hudson, Kreitman, and Aguade 1987) and the MK test (McDonald and Kreitman 1991). Maize sequences were kindly provided by Moira Sheehan. For the HKA test, as an alternative to testing the PHY loci against one putatively neutral locus, we compared them to a pooled data set of 12 short loci from across the sorghum genome (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). These 12 loci were chosen from a set of 95 regions based on two criteria: (1) a Tajima's D statistic with an absolute value less than 1.0, and (2) an HKA statistic with a P value greater than 0.2 when tested against the other 95 loci combined. There is clearly substantial recombination present in this reference "locus," but the HKA test's assumption of no recombination is conservative. For these tests we used a subsample of 10 accessions from each study so that the test and reference samples were very similar to each other: each contained five accessions representing the five bicolor races, the two referenced inbred lines, and one accession each of races verticilliflorum, arundinaceum, and aethiopicum. The reference data were: segregating sites = 33, length = 3,767, sample size = 10, average pairwise divergence = 39.4.
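As a rough illustration of the summary statistics named above, the following Python sketch computes the number of segregating sites, the average number of pairwise differences, per-site nucleotide diversity, and Tajima's D from a gap-free alignment. It is not the DnaSP implementation; the function names and the toy alignment are assumptions made for illustration only, and the formulas follow Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns with more than one state (gaps are not handled)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity: k divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's (1989) D; returns 0.0 when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment (not data from this study): three singleton variants.
toy_alignment = ["AAAA", "AAAT", "AATA", "ATAA"]
print(segregating_sites(toy_alignment), nucleotide_diversity(toy_alignment),
      tajimas_d(toy_alignment))
```

In this toy case every variant is a singleton, so D is negative; a reference-locus filter like the one described above would then keep a locus only if the absolute value of D is below 1.0.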
Tests for positive selection were conducted with the codeml program from PAML (Yang 1997). The likelihood of Model B, with the lineages leading to S. propinquum and S. bicolor allowed to have a different rate of amino acid evolution from the rest of the tree, was compared to the likelihood of the discrete model with only two site classes (see Yang and Nielsen [2002]). Data sets included one allele each from maize, rice, and S. propinquum, and one or two alleles from S. bicolor, chosen such that the same polymorphism did not appear twice in the data. The PHYC data set also included an allele from wheat.
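Comparisons of nested codon models of this kind are usually evaluated with a likelihood ratio test. The sketch below shows only that final step, assuming the two log-likelihoods have already been read from codeml output; the log-likelihood values are hypothetical, and the degrees of freedom are left as an input because the text does not state the value used.

```python
from math import erfc, exp, sqrt


def lrt_p_value(lnl_null, lnl_alt, df):
    """Return (2*delta lnL, P) for a likelihood ratio test with df of 1 or 2.

    Closed-form chi-square tail probabilities are used so that only the
    standard library is needed; other df values are not supported here.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        return stat, erfc(sqrt(stat / 2.0))
    if df == 2:
        return stat, exp(-stat / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# Hypothetical log-likelihoods, for illustration only (not values from this study).
statistic, p_value = lrt_p_value(lnl_null=-5000.0, lnl_alt=-4999.0, df=2)
print(statistic, p_value)
```

A large P value here would mean that the branch-specific model does not fit significantly better than the simpler one.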
Results
Levels of Nucleotide Polymorphism in the PHY Gene Family
We surveyed genomic sequence variation across the three phytochrome genes in a diverse sample of S. bicolor accessions. Figure 1 shows the haplotype at variable sites for each accession (no heterozygotes were observed). A summary of nucleotide diversity in the PHY gene family is shown in table 2, with the samples partitioned into the following groups: all S. bicolor (n = 15); ssp. bicolor (n = 7); ssp. verticilliflorum (n = 8). The level of total sequence diversity (π) is similar among loci and groups, varying less than twofold in all comparisons. However, this similarity in total diversity masks marked differences that are revealed when the data are analyzed by functional class. For example, PHYC has reduced variation at synonymous sites and elevated variation at noncoding sites, whereas PHYB has relatively little variation at replacement sites. Tajima's (1989) D, an index of the frequency spectrum that has an expectation of about zero under the assumption of neutrality, is not significantly different from zero in any of the samples.
FIG. 1. DNA sequence variation at the phytochrome genes. The numbers refer to nucleotide position starting with the initiation codon. r = a replacement site; s = a synonymous site; i = an intron site
Table 2 DNA Sequence Variation at the PHY Loci.
The average level of total sequence variation (π) across the S. bicolor genome is about 0.0023, based on a sample that included subspecies bicolor and verticilliflorum and covered both coding and noncoding sequences (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). Variation at the PHY genes in this similar sample is about half of that value. Tested against that data set by the method of Hudson, Kreitman, and Aguade (1987; see Methods), total diversity at the PHY genes is not unusual, given their level of divergence to S. propinquum, and is consistent with a lower neutral mutation rate at these loci. When we tested the PHY loci against each other, using the entire sample of 15 S. bicolor, PHYC had somewhat reduced variation compared to PHYA (P = 0.14) and PHYB (P = 0.22), but neither result is significant.
Amino Acid Variation
The phytochrome protein can be divided into two domains with distinct and separable functions: the N-terminal photosensory domain (amino acids 1–623 for PHYA and PHYC, amino acids 1–673 for PHYB) and the C-terminal signal transduction domain, which is the remainder of the protein (Smith 2000). The distribution of replacement polymorphism in these domains differs among the PHY loci in sorghum. At PHYA, four out of five amino acid variants fall in the C-terminal domain, similar to the pattern of divergence observed over larger evolutionary distances (Alba et al. 2000). These variants are all at intermediate frequency, suggesting that they are not likely to be deleterious. In contrast, most replacement polymorphisms at PHYB and PHYC fall in the N-terminal domain (three out of four and six out of seven, respectively), inconsistent with longer-term evolutionary trends.
Under the null hypothesis of selective neutrality, the ratio of synonymous to replacement variation should be the same within and between species. The distribution of synonymous and replacement polymorphisms in each PHY locus, as well as the number of fixed differences to S. propinquum, are shown in table 3A. The ratio of replacement to synonymous polymorphisms differs among the loci, with PHYB showing the strongest pattern of purifying selection and PHYC showing the weakest. When the intraspecific data are compared with fixed differences to S. propinquum by Fisher's exact test (McDonald and Kreitman 1991), PHYC is found to depart from the neutral expectation (P = 0.03), apparently in the direction of too many amino acid polymorphisms. However, comparing the pattern at PHYC to that of PHYA and PHYB, the departure can also be interpreted as an excess of fixations at synonymous sites. This hypothesis can be tested by comparing polymorphism and divergence at synonymous sites and noncoding sites. Table 3B shows that this comparison is significant (P = 0.02), suggesting that the departure at PHYC is likely due to unusual evolution of synonymous sites.
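The comparison described above reduces to a 2 × 2 contingency table (polymorphic versus fixed, replacement versus synonymous) evaluated with Fisher's exact test. The following sketch shows one way to carry out that calculation with the Python standard library; the function names are assumptions, and the counts in the example are hypothetical rather than the values in table 3.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


def mk_test(poly_replacement, poly_synonymous, fixed_replacement, fixed_synonymous):
    """McDonald-Kreitman style test: polymorphism versus divergence by site class."""
    return fisher_exact_two_sided(poly_replacement, poly_synonymous,
                                  fixed_replacement, fixed_synonymous)


# Hypothetical counts for illustration only (not taken from table 3).
print(mk_test(poly_replacement=6, poly_synonymous=2,
              fixed_replacement=3, fixed_synonymous=11))
```

The same function can be reused for the synonymous versus noncoding comparison by substituting the two site classes in the columns.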
Table 3 Tests of Polymorphism and Divergence.
Divergence Between Wild and Cultivated Subspecies
Our sample can be divided into two subsamples that represent the cultivated subspecies bicolor and the wild subspecies verticilliflorum, thought to be ancestral to cultivated sorghum (Kimber 2000). In table 4, we show the distribution of variation between the two subspecies. At all three loci, the wild sample has more segregating sites than the cultivated one, although the sequence diversity is not reduced proportionately. This may be due, at least in part, to the differential sampling of races in the two subspecies, or it may reflect a real difference in allele frequencies.
Table 4 Differentiation Between Wild and Cultivated S. bicolor.
Table 4 also shows that most polymorphisms are not shared between subspecies. In fact, many polymorphisms are found only in one race, as can be seen by inspection of figure 1. This non-random distribution of variation reflects the population structure that underlies the subspecific and racial classification system. Interestingly, in PHYC there are three amino acid polymorphisms that appear to be specific to races verticilliflorum (positions 1099 and 1310) and aethiopicum (4416), though the sample sizes are too small for this observation to be significant. Two other replacement polymorphisms (484 and 569) are found only in the inbred lines.
Rates of Evolution
The three phytochrome genes play different roles in the plant, and they have different patterns of amino acid variation within S. bicolor. We were interested to test whether they are also evolving at different rates relative to their orthologs in maize, which shared a common ancestor with sorghum about 16.5 MYA, and in which each PHY gene is duplicated, presumably as a result of an allopolyploidization event involving two diploid ancestors (Gaut and Doebley 1997). Each S. bicolor gene was compared to the two homologous maize genes and, in all three cases, the S. bicolor gene was more closely related to one of the maize genes than the two maize genes were to each other. The rates presented in table 5 are based on comparison of the S. bicolor gene to its more similar Zea homolog. In all cases, the maize gene that was closest in terms of nucleotide similarity was also the closest in terms of amino acid similarity. Concordant with both the intrageneric and inter-family data of Alba et al. (2000), these analyses show that PHYB is most conserved at the amino acid level, while PHYC is evolving most rapidly, with a ka/ks ratio almost three times higher than that of PHYB. To compare our data with those of Alba et al. (2000), who estimated ka/year for the PHY genes, we divided ka by 33 Myr, twice the divergence time of sorghum and maize. The rates we obtained are similar to those obtained by Alba et al. (2000). The rates of synonymous substitution at the three phytochrome genes are similar to each other and are only about one-half to two-thirds the rate observed in comparisons of maize and sorghum sequences at the waxy and mdh loci (Gaut and Doebley 1997).
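The rate conversion described above is simple arithmetic: divergence between two lineages accumulates along both branches, so the per-year rate is the pairwise estimate divided by twice the divergence time (2 × 16.5 Myr = 33 Myr). A minimal sketch follows, in which the function names and the ka and ks values are hypothetical and serve only to show the calculation.

```python
def rate_per_site_per_year(divergence, divergence_time_years):
    """Convert a pairwise divergence estimate (e.g., ka) into a per-year rate.

    The divisor is twice the divergence time because substitutions
    accumulate independently on both lineages since the common ancestor.
    """
    return divergence / (2.0 * divergence_time_years)


def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous to synonymous divergence."""
    return ka / ks


# Hypothetical values for illustration only (not taken from table 5).
example_ka = 0.033
example_ks = 0.110
sorghum_maize_divergence_years = 16.5e6  # from Gaut and Doebley (1997), as cited

print(rate_per_site_per_year(example_ka, sorghum_maize_divergence_years))
print(ka_ks_ratio(example_ka, example_ks))
```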
Table 5 Rates of Nucleotide and Amino Acid Evolution.
The distribution of replacement variation within functional domains was unexpected. We were interested to know whether these trends were limited to polymorphism data or whether they could also be observed in divergence from S. propinquum and from Zea mays. Estimates of ka by functional domain (table 6) show that the increased rate of protein evolution in the N-terminal domain is observed at PHYB and PHYC in comparisons with S. propinquum, but such an increase is not observed at any loci in comparisons with Zea mays.
Table 6 Rates of Synonymous (Ks) and Nonsynonymous (Ka) Substitution in the Major Functional Domains.
To test whether there is any evidence of positive selection on amino acid changes in the phytochrome genes, we analyzed the data with PAML (Yang 1997). Specifically, we tested whether the ratio of nonsynonymous to synonymous changes along the sorghum lineages was significantly different from that along the branches to maize and rice. Models that allowed for such differences did not fit the data better than models that assumed that all branches had the same ratios, and no amino acids showed evidence of selection.
Discussion
The level of nucleotide diversity at all PHY genes is lower than the average observed for a large number of unlinked sites sampled in a similar panel of sorghum accessions (M. T. Hamblin, S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich, in press). Phylogenetic studies have shown that phytochrome genes experienced rapid evolution after the duplication events that produced the gene family prior to the divergence of the angiosperms (Alba et al. 2000; Yang and Nielsen 2002), but that the major evolutionary force on phytochromes appears to have been purifying selection in conjunction with strong positive selection on a few sites (Yang and Nielsen 2002). Total variation at the PHY genes in sorghum is low, and synonymous site evolution is low, but some partitions of the data indicate that rates of evolution are not constant in this lineage (table 6) and that some aspect of function may still be evolving, particularly at PHYC. These findings are discussed in more detail in the following sections.
Synonymous Site Evolution at PHYC
A departure from the neutral expectation was detected in comparisons of polymorphism and divergence across functional classes at PHYC; however, the observed pattern is not easily interpreted in terms of models of directional, diversifying, or slightly deleterious evolution. The excess of fixations at synonymous sites compared to polymorphisms suggests that positive selection has accelerated these fixations. HKA tests of variation at PHYC also show that, while noncoding variation and replacement variation are both consistent with the neutral expectation, synonymous variation is not (P = 0.06). The direction of the departure indicates either that synonymous variation is lower than expected or that synonymous divergence is higher than expected. Furthermore, all synonymous polymorphisms at PHYC are singletons, while much of the noncoding and replacement variation is at intermediate frequency.
A similar phenomenon, namely excess synonymous divergence, has been reported for the ADH locus in Arabidopsis, in comparisons between A. thaliana and A. korshinskyi, and between A. thaliana and A. flagellosa (Miyashita et al. 1998). The authors evidently did not consider this a departure from neutrality, as they offered no interpretation. At the Notch locus in Drosophila (DuMont et al. 2004), excess synonymous fixations are also observed, and result in changes from preferred to unpreferred codons in the D. melanogaster lineage. Whether these codons are themselves the targets of selection or are being fixed by some sort of hitchhiking process is not clear. In the case of PHYC, codon bias as measured by ENC (effective number of codons) is 55.8, intermediate between that of PHYA (53.3) and PHYB (58.2). All of these figures represent fairly low codon bias.
The pattern of sequence evolution at PHYC within the genus Sorghum contrasts with the pattern seen over the longer divergence time between sorghum and maize, during which all three loci have very similar rates of synonymous substitution. This suggests that a change in selection has occurred since the divergence of maize and sorghum. Because we are not able to assign mutations to the S. bicolor and S. propinquum lineages, it is not possible to know whether this change in selection is specific to S. bicolor.
Amino Acid Evolution
Protein changes in the maize/sorghum lineage are occurring about twice as rapidly at PHYC as at PHYA and PHYB. Patterns of within-species amino acid variation also differ across the three genes. PHYA and PHYC have similar levels of replacement variation, but PHYA is more variable at the C-terminus of the protein, a pattern observed at all three loci across diverse angiosperm lineages (Alba et al. 2000). PHYC, in contrast, is most variable in the photosensory domain near the N-terminus, a trend that is maintained in divergence from S. propinquum, but not in divergence from Z. mays (table 6). These observations suggest that phyC may have experienced a change in selective constraint or be diverging in function in the sorghum lineage. (Note that the lower ka/ks ratio in the comparison of S. bicolor to S. propinquum is due to the unusually elevated rate of ks.) In other lineages, the presence of PHYE, PHYD, or duplicated PHYB loci provides more opportunities for functional divergence in the red-light response. In sorghum, with only three PHY loci, phyC may play a number of different roles in the life of the plant, as suggested by the work of Monte et al. (2003). PHYC may thus be a target of adaptive evolution.
Although phyB is known to be involved in photoperiod response, we found no evidence of new variation associated with either human or natural selection at this locus. Amino acid variants in PHYB are at low frequency, and purifying selection appears to be strongest on this gene. In contrast, PHYA and PHYC both have a number of amino acid variants at intermediate frequency, suggesting that they are not deleterious. At PHYA, there is no association of amino acid variants with population structure, but at PHYC several of the variants appear to have a race-specific distribution. This may simply be due to population structure, because several silent sites (e.g., 3395 and 3748 in PHYA) also show a race-specific distribution. Alternatively, it is possible that different alleles may be favored in different environments. Larger samples of these races, as well as population samples of cultivated races, would be needed to explore this question adequately. Note that, at PHYC, the variants are associated with races of the wild subspecies, and their distribution could therefore not be due to human selection.
Genetic studies of natural variation in photoperiod response have not found much evidence of a role for the phytochrome genes. In Arabidopsis, for example, a large number of loci are associated with quantitative trait loci (QTLs) for flowering time (Koornneef et al. 1998), but none of them maps to PHYA or PHYB. In Scots pine, a survey of the N-terminal domain of two phytochrome genes (O and P) found no association of amino acid variation with time of bud set (García-Gil, Mikkonen, and Savolainen 2003). Naturally occurring variation in flowering time appears to be complex, and the genes involved may vary across environments (Weinig et al. 2002). The predominant role of purifying selection in evolution of the phytochrome genes suggests that most natural variation in photoperiod response occurs downstream of these receptors.
Sorghum Domestication
Cultivated sorghum is believed to have been derived from the wild subspecies verticilliflorum (Kimber 2000). If domestication caused a bottleneck in cultivated sorghum, we would expect to see a reduction in diversity compared to the wild relatives. In maize, for example, Adh1 has 83% of the diversity found in its progenitor, teosinte, whereas glb1 has 60% (Hilton and Gaut 1998) and te1 has 57% (White and Doebley 1999). Studies of allozyme and restriction fragment length polymorphism (RFLP) variation have provided evidence of a domestication bottleneck in sorghum (Aldrich and Doebley 1992; Aldrich et al. 1992; Cui et al. 1995), based on both the number of alleles per locus and total heterozygosity. No comparable reduction is observed at the phytochrome loci: partitioning the variation within subspecies, similar levels of nucleotide diversity are found within ssp. bicolor and ssp. verticilliflorum for each PHY gene (tables 2 and 4).
Although diversity in wild and cultivated sorghum is similar, and there are no fixed differences between the two forms, most of the polymorphisms present are not shared (table 4). This finding is in contrast to RFLP studies, in which 60% of the alleles were detected in both cultivated and wild sorghum (Cui et al. 1995). The different pattern of divergence detected in this study may simply be due to chance, or to the different sampling strategies employed. A more interesting possibility is that it may be due to a different evolutionary history at the PHY loci. If variants at the PHY loci are either slightly deleterious (PHYB), or under diversifying selection (PHYC), then they would not be expected to show the same distribution as neutral markers.
Acknowledgements
We thank K. Childs for sharing information on primer sequences; W. Rooney, M. Tuinstra, and A. Paterson for providing sorghum material and seed; M. Sheehan for providing maize PHY sequences; S. Mitchell and V. Bauer Dumont for discussion; R. G. Reeves, A. Wong, and R. Nielsen for help with implementation and interpretation of PAML; and S. Mitchell, T. Brutnell, J. Labate, C. Aquadro, and two anonymous reviewers for comments on earlier versions of this manuscript. This research was supported by National Science Foundation grant DBI 0115903.
Literature Cited
Alba, R., P. M. Kelmenson, M. M. Cordonnier-Pratt, and L. H. Pratt. 2000. The phytochrome gene family in tomato and the rapid differential evolution of this family in angiosperms. Mol. Biol. Evol. 17:362-373.
Aldrich, P. R., and J. Doebley. 1992. Restriction fragment variation in the nuclear and chloroplast genomes of cultivated and wild Sorghum bicolor. Theor. Appl. Genet. 85:293-302.
Aldrich, P. R., J. Doebley, K. F. Schertz, and A. Stec. 1992. Patterns of allozyme variation in cultivated and wild Sorghum bicolor. Theor. Appl. Genet. 85:451-460.
Childs, K. L., F. R. Miller, M. M. Cordonnier-Pratt, L. H. Pratt, P. W. Morgan, and J. E. Mullet. 1997. The sorghum photoperiod sensitivity gene, Ma3, encodes a phytochrome B. Plant Physiol. 113:611-619.
Clack, T., S. Mathews, and R. A. Sharrock. 1994. The phytochrome apoprotein family in Arabidopsis is encoded by five genes: the sequences and expression of PHYD and PHYE. Plant Mol. Biol. 25:413-427.
Cui, Y. X., G. W. Xu, C. W. Magill, K. F. Schertz, and G. E. Hart. 1995. RFLP-based assay of Sorghum bicolor (L.) Moench genetic diversity. Theor. Appl. Genet. 90:787-796.
Devlin, P. F., D. E. Somers, P. H. Quail, and G. C. Whitelam. 1997. The Brassica rapa elongated internode (EIN) gene encodes phytochrome B. Plant Mol. Biol. 34:537-547.
Doyle, J. J., and J. L. Doyle. 1987. A rapid DNA isolation procedure for small amounts of leaf tissue. Phytochem. Bull. 19:11-15.
DuMont, N., V. Bauer, J. C. Fay, P. P. Calabrese, and C. F. Aquadro. 2004. DNA variability and divergence at the Notch locus region of Drosophila melanogaster and D. simulans: a case of accelerated synonymous site divergence. Genetics (in press).
García-Gil, M. R., M. Mikkonen, and O. Savolainen. 2003. Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Mol. Ecol. 12:1195-1206.
Gaut, B. S., and J. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94:6809-6814.
Hamblin, M. T., S. E. Mitchell, G. M. White, J. Gallego, R. Kukatla, R. A. Wing, A. H. Paterson, and S. Kresovich. 2004. Comparative population genetics of the panicoid grasses: Sequence polymorphism, linkage disequilibrium, and selection in a diverse sample of Sorghum bicolor. Genetics (in press).
Hilton, H., and B. S. Gaut. 1998. Speciation and domestication in maize and its wild relatives: evidence from the globulin-1 gene. Genetics 150:863-872.
Howe, G. T., P. A. Bucciaglia, W. P. Hackett, G. R. Furnier, M. M. Cordonnier-Pratt, and G. Gardner. 1998. Evidence that the phytochrome gene family in black cottonwood has one PHYA locus and two PHYB loci but lacks members of the PHYC/F and PHYE subfamilies. Mol. Biol. Evol. 15:160-175.
Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
Kimber, C. 2000. Origins of domesticated sorghum and its early diffusion to India and China. Pp. 3–98 in W. C. Smith and R. A. Frederiksen, eds. Sorghum. John Wiley, New York.
Kolukisaoglu, H. U., S. Marx, C. Wiegmann, S. Hanelt, and H. A. Schneider-Poetsch. 1995. Divergence of the phytochrome gene family predates angiosperm evolution and suggests that Selaginella and Equisetum arose prior to Psilotum. J. Mol. Evol. 41:329-337.
Koornneef, M., C. Alonso-Blanco, A. J. M. Peeters, and W. Soppe. 1998. Genetic control of flowering time in Arabidopsis. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:345-370.
Mathews, S., J. G. Burleigh, and M. J. Donoghue. 2003. Adaptive evolution in the photosensory domain of phytochrome A in early angiosperms. Mol. Biol. Evol. 20:1087-1097.
Mathews, S., and M. J. Donoghue. 2000. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int. J. Plant Sci. 161:S41-S55.
Mathews, S., M. Lavin, and R. A. Sharrock. 1995. Evolution of the phytochrome gene family and its utility for phylogenetic analyses of angiosperms. Ann. Missouri Bot. Garden 82:296-321.
Mathews, S., and R. A. Sharrock. 1996. The phytochrome gene family in grasses (Poaceae): a phylogeny and evidence that grasses have a subset of the loci found in dicot angiosperms. Mol. Biol. Evol. 13:1141-1150.
Mathews, S., and R. A. Sharrock. 1997. Phytochrome gene diversity. Plant, Cell Environ. 20:666-671.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.
Miyashita, N. T., A. Kawabe, H. Innan, and R. Terauchi. 1998. Intra- and interspecific DNA variation and codon bias of the alcohol dehydrogenase (Adh) locus in Arabis and Arabidopsis species. Mol. Biol. Evol. 15:1420-1429.
Monte, E., J. M. Alonso, J. R. Ecker, Y. Zhang, X. Li, J. Young, S. Austin-Phillips, and P. H. Quail. 2003. Isolation and characterization of phyC mutants in Arabidopsis reveals complex crosstalk between phytochrome signaling pathways. Plant Cell 15:1962-1980.
Morgan, P. W., S. A. Finlayson, K. L. Childs, J. E. Mullet, and W. L. Rooney. 2002. Opportunities to improve adaptability and yield in grasses: lessons from sorghum. Crop Sci. 42:1791-1799.
Nagatani, A., J. W. Reed, and J. Chory. 1993. Isolation and initial characterization of Arabidopsis mutants that are deficient in phytochrome A. Plant Physiol. 102:269-277.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Reed, J. W., A. Nagatani, T. D. Elich, M. Fagan, and J. Chory. 1994. Phytochrome A and phytochrome B have overlapping but distinct functions in Arabidopsis development. Plant Physiol. 104:1139-1149.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Smith, H. 2000. Phytochromes and light signal perception by plants—an emerging synthesis. Nature 407:585-591.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Weinig, C., M. C. Ungerer, L. A. Dorn, N. C. Kane, Y. Toyonaga, S. S. Halldorsdottir, T. F. Mackay, M. D. Purugganan, and J. Schmitt. 2002. Novel loci control variation in reproductive timing in Arabidopsis thaliana in natural environments. Genetics 162:1875-1884.
Weller, J. L., M. E. Schreuder, H. Smith, M. Koornneef, and R. E. Kendrick. 2000. Physiological interactions of phytochromes A, B1 and B2 in the control of development in tomato. Plant J. 24:345-356.
White, S. E., and J. F. Doebley. 1999. The molecular evolution of terminal ear1, a regulatory gene in the genus Zea. Genetics 153:1455-1462.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.
Yang, Z., and R. Nielsen. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917.
(Gemma M. White*, Martha T)