Average Allozyme Heterozygosity in Vertebrates Correlates with Ka/Ks Measured in the Human-Mouse Lineage
     * School of Biological Sciences, University of Wales Swansea, Singleton Park, Swansea, SA2 8PP, UK

    CSIRO Marine Research, Hobart, GPO Box 1538 Tasmania, 7001 Australia

    E-mail: d.o.f.skibinski@swansea.ac.uk.

    Abstract

    It is well established that different allozyme proteins vary in heterozygosity in averages made over large numbers of species. For example, the enzyme 6-phosphogluconate dehydrogenase has a much higher average heterozygosity than glutamate dehydrogenase. Allozyme data alone provide insufficient power to determine the evolutionary cause of such a difference. Many studies have now been carried out on the DNA sequences coding for allozymes. These have identified diverse selective and nonselective causes of polymorphisms at individual loci. However, the studies are mainly in a small number of model species; thus, it is difficult to identify from these DNA studies specific causes of global average heterozygosity differences among allozyme proteins. Here we demonstrate that estimates of average heterozygosity for 37 allozyme proteins in vertebrates correlate positively with Ka and Ka/Ks but not with Ks, measured in the human-mouse lineage. The values of Ka/Ks are less than 0.25, and Ka/Ks is negatively correlated with subunit number (quaternary structure), a measure of structural constraint. Proteins with lower levels of constraint have higher values of both Ka/Ks and heterozygosity. These results better support the hypothesis that differences in average allozyme diversity between proteins are more closely related to differences in the level of purifying selection than to differences in the underlying mutation rate or level of positive selection.

    Key Words: allozyme • heterozygosity • Ka • Ks • constraint • substitution

    Introduction

    The use of protein electrophoresis to study allozyme variation has been an important method for studying genetic diversity in natural populations for the last forty years. However, it has been suggested that, compared with the analysis of DNA sequences, the statistical analysis of allozyme data has little power for identifying unambiguously the causes of genetic diversity (Lewontin 1991). In the last two decades, many studies, using a variety of tests, have focused on the DNA sequences coding for allozyme proteins. These have been more successful in pinpointing the evolutionary forces acting on individual allozyme loci and indicate a great diversity of results in terms of the action of these forces. At some loci, positive directional or balancing selection has been detected, and at others the variation is neutral or nearly neutral (Hey 1999).

    In this paper we focus on the question as to why there are differences in average heterozygosity among the allozyme proteins scored in electrophoretic studies. For example, 6-phosphogluconate dehydrogenase examined in 507 vertebrate species has a mean (standard error [SE]) heterozygosity of 0.110 (0.007), whereas glutamate dehydrogenase in 188 vertebrate species has a mean of only 0.031 (0.008) (Ward, Skibinski, and Woodwark 1992). This question is distinct from the question as to why there are differences in heterozygosity between individual allozyme loci in a particular species or population. The DNA studies carried out for allozyme proteins do not provide a clear consensus answer to the first question because of the diversity of the results obtained. In addition, the number of organisms analyzed for DNA sequence variation is rather limited compared with allozyme surveys; most results for the former come from Drosophila and widely studied mammals such as humans.

    Differences in average heterozygosity between allozyme proteins could result from differences in selective constraint such that proteins with higher heterozygosity have a greater number of neutral or nearly neutral mutations and less purifying selection. Alternatively, the differences in heterozygosity might reflect differences between proteins in the underlying mutation rate. A third possibility is that the differences reflect variation in the propensity of particular proteins to respond to environmental influences; fluctuating selection due to changing environments (Gillespie 1991) or balancing selection might then lead to average differences in heterozygosity between allozyme proteins.

    For protein coding sequences, the synonymous rate (Ks) is often regarded as a measure of the underlying mutation rate (Miyata, Yasunaga, and Nishida 1980), though it may be influenced by other factors (Williams and Hurst 2002). By contrast, the nonsynonymous rate (Ka) or the ratio Ka/Ks (which corrects for variation in Ks among proteins) is often regarded either as a measure of the amount of purifying selection on the protein or as a measure of the amount of positive selection. For most genes, nonsynonymous rates are lower than synonymous rates and are much more variable from gene to gene; this is thought to reflect differences in the extent of selective constraint and purifying selection among proteins (Kimura 1983; Graur and Li 2000). For allozyme data there are no directly analogous methods of measuring mutation rate or constraint. However, DNA databases of some organisms now contain sequence information for all the allozyme proteins regularly examined in electrophoretic studies. Here we estimate Ka and Ks for these allozyme proteins by comparing human and mouse DNA sequences. As the coalescence time for the different protein genes in this lineage will be quite similar, differences in relative rates between proteins should reflect differences in selection or mutation rate rather than time elapsed. The correlations between these Ka and Ks estimates and average heterozygosity for allozyme proteins in mammals and other vertebrates are examined.

    Materials and Methods

    Heterozygosity values and information on quaternary structure for enzymes and other proteins scored routinely in allozyme studies were taken from tables VIII and IX in Ward, Skibinski, and Woodwark (1992). The allozyme databases used are available with documentation on request from the authors.

    LocusLink at the National Center for Biotechnology Information (NCBI) site was used to search for proteins by name (e.g., phosphoglucomutase). In some cases sequences from both human and mouse could be obtained in this way. If not, the human (or mouse) sequence was used in a BLAST search to identify the most closely similar coding sequence for that enzyme from mouse (or human). Even when both sequences could be obtained from LocusLink, BLAST was still used to check for the most closely similar coding sequences between species for that protein. Using this approach, mouse and human sequences could be obtained for almost all the proteins referred to in Ward, Skibinski, and Woodwark's (1992) table VIII. Sequences could obviously not be obtained for some very general categories used in allozyme studies, for example "general protein" or "esterase." In addition, a small number of proteins scored for very few species in table VIII were excluded. Sequences were also collected for the rat and cow using BLAST. Unfortunately, the DNA databases are less complete for these species, and sequences for only a proportion of the proteins referred to in table VIII could be obtained. Thus, the present study deals almost exclusively with human and mouse sequences.

    The proteins used are: alcohol dehydrogenase (ADH, EC 1.1.1.1), glycerol-3-phosphate dehydrogenase (G3PDH, 1.1.1.8), iditol (sorbitol) dehydrogenase (IDDH, 1.1.1.14), lactate dehydrogenase (LDH, 1.1.1.27), malate dehydrogenase (MDH, 1.1.1.37), NADP-malate dehydrogenase (malic enzyme) (ME, 1.1.1.40), isocitrate dehydrogenase (IDH, 1.1.1.42), phosphogluconate dehydrogenase (6PGDH, 1.1.1.44), glucose dehydrogenase (GLUDH, 1.1.1.47), glucose-6-phosphate dehydrogenase (G6PDH, 1.1.1.49), xanthine dehydrogenase (XDH, 1.1.1.204), glyceraldehyde-3-phosphate dehydrogenase (GAPDH, 1.2.1.12), glutamate dehydrogenase (GDH, 1.4.1.2–4), diaphorase (DIA, 1.8.1.4), catalase (CAT, 1.11.1.6), superoxide dismutase (SOD, 1.15.1.1), purine nucleoside phosphorylase (NP, 2.4.2.1), aspartate aminotransferase (glutamate oxalate transaminase) (AAT, 2.6.1.1), alanine aminotransferase (ALAT, 2.6.1.2), hexokinase (HK, 2.7.1.1), creatine kinase (CK, 2.7.3.2), adenylate kinase (AK, 2.7.4.3), alkaline phosphatase (ALP, 3.1.3.1), acid phosphatase (ACP, 3.1.3.2), amylase (AMY, 3.2.1.1), leucine aminopeptidase (LAP, 3.4.11.1), adenosine deaminase (ADA, 3.5.4.4), aldolase (ALD, 4.1.2.13), fumarase (FUM, 4.2.1.2), aconitase (ACO, 4.2.1.3), triose phosphate isomerase (TPI, 5.3.1.1), mannose phosphate isomerase (MPI, 5.3.1.8), glucose phosphate isomerase (GPI, 5.3.1.9), phosphoglucomutase (PGM, 5.4.2.2), haemoglobin (HB), transferrin (TF), and albumin (ALB).

    For many allozyme proteins, two or more loci have been sequenced. For example, there are both liver and intestinal forms of alkaline phosphatase. Those loci thought unlikely to be scored in routine vertebrate allozyme surveys have been excluded. Subsequently, when more than one locus remained per protein, the values for the estimates relating to substitution rates and protein size were averaged over the loci sequenced for that protein. Examples are the soluble and mitochondrial forms of enzymes such as malate dehydrogenase, malic enzyme, isocitrate dehydrogenase, superoxide dismutase, and aconitase. The subcellular locations of these enzymes are often unspecified in allozyme surveys. Accession numbers of all the sequences used in the study are available on request from the authors.

    The sequences were translated and aligned in BioEdit (Hall 1999) using ClustalW (Thompson, Higgins, and Gibson 1994). The amino acid alignment was tidied by eye and the initiation and termination codons removed. The alignment was then back translated. Nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated using three different methods, that of Nei and Gojobori (1986) implemented in DnaSP (Rozas and Rozas 1999; NG86), that of Comeron (1995) implemented in K-Estimator (Comeron 1999; C95), and the maximum likelihood method of Goldman and Yang (1994) implemented in PAML (Yang 1997; ML). The number of nonsynonymous substitutions resulting from radical charge changes was calculated using the program HON-NEW (Zhang 2000). Radical charge changes are those nonsynonymous substitutions that result in the replacement of one amino acid by another of a different electrical charge. This latter method may be most relevant to allozyme variation where allele product mobility differences reflect differences in charge. Significance levels for Pearson's parametric and Spearman's nonparametric correlation coefficients are indicated in the text below as *P < 0.05, **P < 0.01, and ***P < 0.001.
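
    The distinction between conservative and radical charge changes can be illustrated with a minimal sketch. It assumes the usual three charge classes (positive: R, H, K; negative: D, E; all other residues neutral) and simply counts observed amino acid differences in an aligned pair of protein sequences. It is not a reimplementation of HON-NEW, which estimates per-site substitution rates; the function names are hypothetical.

```python
# Assumed charge classes: positive = R, H, K; negative = D, E; rest neutral.
POSITIVE = set("RHK")
NEGATIVE = set("DE")


def charge_class(aa):
    """Return '+', '-' or '0' for a one-letter amino acid code."""
    if aa in POSITIVE:
        return "+"
    if aa in NEGATIVE:
        return "-"
    return "0"


def count_replacements(seq_a, seq_b):
    """Count (conservative, radical) differences between two aligned
    protein sequences of equal length. Positions with a gap ('-') in
    either sequence are skipped. These are raw counts of observed
    differences, not corrected substitution rates."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    conservative = 0
    radical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-" or a == b:
            continue
        if charge_class(a) == charge_class(b):
            conservative += 1
        else:
            radical += 1
    return conservative, radical


# Toy aligned fragments (invented for illustration only).
print(count_replacements("MKDLA", "MRDEA"))
```

    In the toy pair, K to R keeps the positive charge (conservative), whereas L to E replaces a neutral residue with a negative one (radical); only the second kind of change would be expected to alter electrophoretic mobility.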

    Results and Discussion

    Table 1 gives basic variables for each of the 37 proteins analyzed. As expected (Miyata, Yasunaga, and Nishida 1980; Li, Wu, and Luo 1985), the average value of Ka for radical charge changes in the human-mouse lineage (mean Ka = 0.046, SE = 0.006) is very much less than the average synonymous rate (mean Ks = 0.584, SE = 0.020). These compare with average values for Ka and Ks of 0.090 and 0.460 measured in a large sample of 1,880 human-mouse orthologs (Makalowski and Boguski 1998). Also as expected, the variability in Ka (range 0.004 to 0.154, coefficient of variation 74.7%) is very much greater than the variability in Ks (range 0.323 to 0.879, coefficient of variation 20.8%).
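
    As a rough consistency check, the coefficient of variation can be recovered from a reported mean and standard error, assuming SE = SD / sqrt(n) with n = 37 proteins. The sketch below uses only the rounded values quoted above, so the results are approximate; the helper name is hypothetical.

```python
import math


def cv_from_se(mean, se, n):
    """Coefficient of variation (%) from a mean and its standard error,
    assuming se = sd / sqrt(n)."""
    sd = se * math.sqrt(n)
    return 100.0 * sd / mean


cv_ks = cv_from_se(0.584, 0.020, 37)  # synonymous rate
cv_ka = cv_from_se(0.046, 0.006, 37)  # radical-charge nonsynonymous rate
print(round(cv_ks, 1), round(cv_ka, 1))
```

    For Ks the rounded inputs reproduce the reported 20.8%. For Ka they give a value somewhat above the reported 74.7%, a gap that is plausibly due to the SE and mean being rounded to few significant figures.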

    Table 1 Allozyme Proteins Used Together with Some of the Variables Analyzed.

    Table 2 gives Pearson's correlation coefficients calculated between average allozyme heterozygosity estimates for mammals or other taxonomic groupings and various statistics relating to nonsynonymous and synonymous substitutions in the human-mouse lineage. Positive, and often significant, correlations are observed between allozyme heterozygosity and Ka or the number of nonsynonymous substitutions. The highest correlations occur for radical charge differences, although differences between the significant correlation coefficients are quite small. Given that Ka is a per site measure that does not take account of protein size, lower correlations might be expected than with the number of nonsynonymous substitutions; however, there is no marked difference.

    Table 2 Pearson's Correlation Coefficients Between Average Allozyme Heterozygosity Estimates for Mammals or Other Taxonomic Groupings with Statistics Relating to Nonsynonymous and Synonymous Substitutions in the Human-Mouse Lineage.

    Significant correlations of the same magnitude are also observed for Ka/Ks. Figure 1 plots the data points giving the correlation of 0.620 between Ka/Ks and mean heterozygosity for pooled vertebrates. Figure 2 is presented to demonstrate how selection of subsets of proteins might influence the value of the correlation coefficient. The proteins were ranked according to the number of vertebrate species contributing data for that protein. The correlation was then computed for the top ten proteins and plotted against the number of species contributing data as a proportion of the number of species contributing data summed up over all proteins. The correlations and proportions were then recalculated and plotted for the top eleven, top twelve, and so on until all proteins were included. Thus, the bottom left-hand point indicates that about 60% of species contributed to the ten most commonly assessed allozyme proteins, and for this subset of data the correlation between heterozygosity and Ka/Ks is about 0.15. The plotted points cluster more tightly to the right of the graph, as the proteins included later on have data contributed by relatively fewer species. The graph shows that the coefficient reaches a relatively stable value and does not fluctuate appreciably as proteins scored for fewer species are finally included. In figure 1, the coefficient of determination is 36%, and in factor analysis 81% of the joint variation in the two variables is accounted for by the first principal component.
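
    The procedure behind figure 2 can be sketched as follows. Proteins are ranked by the number of species contributing data, and Pearson's correlation is recomputed as each further protein is added. The tuple layout and function names are assumptions made for this illustration, and any data passed in would have to come from the original tables.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cumulative_correlations(proteins, start=10):
    """proteins: list of (n_species, heterozygosity, ka_ks) tuples.

    Returns one (k, proportion, r) triple per subset, where the subset
    holds the k proteins scored in the most species, proportion is the
    share of all species-by-protein records in that subset, and r is
    the correlation between heterozygosity and Ka/Ks within it."""
    ranked = sorted(proteins, key=lambda p: p[0], reverse=True)
    total = sum(p[0] for p in ranked)
    points = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        proportion = sum(p[0] for p in subset) / total
        r = pearson([p[1] for p in subset], [p[2] for p in subset])
        points.append((k, proportion, r))
    return points
```

    Plotting r against proportion for the returned points gives a curve of the kind described above, with later points crowding toward the right because the proteins added last contribute few species.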

    FIG. 1. Relationship between average heterozygosity for 37 allozyme proteins averaged over many vertebrate species and Ka/Ks measured in the human-mouse lineage

    FIG. 2. Pearson's correlation between average heterozygosity and Ka/Ks plotted against an increasingly larger proportion of allozyme data and number of allozyme proteins contributing data to the correlation (see text for further explanation). Contour lines for 5% and 1% significance of the correlation coefficient are shown

    The large and significant correlation between average allozyme heterozygosity for pooled vertebrates and Ka/Ks is the main result from this study. The correlation might be caused by a common factor with which the two variables covary positively. One possibility is selective constraint, with proteins of higher Ka/Ks and average heterozygosity experiencing less intense purifying selection and having relatively more mutations that are neutral or nearly neutral. Another possibility is that the increasing values of both variables reflect an increasing level of positive selection.

    In the present study, the average values of Ka/Ks for the NG86, C95, ML, and radical change methods are 0.100, 0.124, 0.094, and 0.078, respectively. These compare well with an average value of 0.093 for genes in a study of 1,581 human-mouse orthologs (Zhang and Li 2004) that, in common with most allozymes, are expressed in many tissues (housekeeping genes). These values, lower than observed for tissue-specific genes (e.g., liver, 0.233; lung, 0.255), could reflect higher selective constraint and purifying selection in such housekeeping genes. With respect to the causes of variation in heterozygosity, Hughes et al. (2003) observed significantly lower variation (over 1,442 human single nucleotide polymorphisms (SNPs)) for radical amino acid compared with synonymous changes, consistent with widespread purifying selection.

    Observations of Ka/Ks > 1 provide strong evidence for positive directional selection. However, Endo, Ikeo, and Gojobori (1996) identified only 0.5% of 3,595 gene groups to have experienced positive selection using this criterion, suggesting that positive selection is uncommon. The maximum value of Ka/Ks observed here is 0.241, for transferrin. An overall value of Ka/Ks < 1 for a gene does not rule out the possibility of positive selection occurring in one part of it. For example, Hellmann et al. (2003) observed higher divergence in 5' untranslated regions than expected from SNP diversity in a study comparing 1,226 transcripts between humans and chimps; Fay, Wyckoff, and Wu (2002) reported for Drosophila that genes with higher Ka/Ks have excess amino acid divergence (assessed against polymorphisms); and Clark et al. (2003) observed accelerated evolution in several functional gene classes in the human lineage in a study of 7,645 human-chimp-mouse orthologs. Thus, positive selection might be prevalent, even if Ka/Ks < 1. A sliding window test for positive selection within the genes studied here has been performed. Windows of width 40–100 codons with steps of 20–50, depending on coding sequence length, were used. For the radical charge method, Ka/Ks ranges from 0.012 to 0.241 for the 37 individual allozyme proteins. In the sliding window analysis the range is 0 to 0.501 for a total of 477 separate windows. The analysis has thus failed to find evidence for positive selection in any gene. The method does, however, have low power compared with codon-specific methods (Suzuki and Gojobori 1999; Yang and Nielsen 2002), but these require multiple alignments of many sequences.
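
    A sliding window scan of this kind can be sketched as below. The window boundaries follow the width and step described above, but the handling of the final partial window (an extra window aligned to the end of the sequence) is an assumption, as is the `estimate_ka_ks` callback, which stands in for whichever Ka/Ks estimator is applied to each window.

```python
def codon_windows(n_codons, width, step):
    """Return (start, end) codon index pairs, end exclusive.

    Sequences no longer than one window yield a single window. If the
    regular steps leave the tail uncovered, one extra window aligned to
    the end of the sequence is added (an assumption of this sketch)."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    if n_codons <= width:
        return [(0, n_codons)]
    starts = list(range(0, n_codons - width + 1, step))
    if starts[-1] + width < n_codons:
        starts.append(n_codons - width)
    return [(s, s + width) for s in starts]


def windows_above_one(n_codons, width, step, estimate_ka_ks):
    """Return (start, end, ratio) for windows whose Ka/Ks exceeds 1.

    estimate_ka_ks(start, end) is a hypothetical callback returning the
    ratio for that codon range, or None when it is undefined
    (for example, when Ks is zero in the window)."""
    hits = []
    for start, end in codon_windows(n_codons, width, step):
        ratio = estimate_ka_ks(start, end)
        if ratio is not None and ratio > 1.0:
            hits.append((start, end, ratio))
    return hits


print(codon_windows(110, 40, 20))
```

    An empty result from `windows_above_one` for every gene corresponds to the outcome reported here, where no window reached Ka/Ks = 1.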

    There are other considerations in assessing the possible role of positive selection. Directional selection can drive variation from populations, as in a selective sweep, for example. In this circumstance, genes with higher Ka should have a lower rather than the higher average heterozygosity observed here. However, a positive correlation between genetic diversity and divergence can occur with environmentally caused fluctuations in selection coefficients (Gillespie 1991; Skibinski and Ward 1998), though there is as yet no strong evidence that allozyme loci are subject in a pervasive manner to this form of selection.

    Many studies have related allozyme heterozygosity to constraint inferred through protein structural considerations pertaining to quaternary structure or subunit size (Ward, Skibinski, and Woodwark 1992, and references therein). The structure of a multimeric protein is constrained in the surface regions where subunits interact; by contrast, in monomeric proteins the entire surface of the protein lacks this constraint, perhaps leading to a relatively higher overall neutral mutation rate and the observed higher average heterozygosity. If the correlated increase in Ka/Ks and average heterozygosity is caused by a decrease in selective constraint, then proteins with higher values of both variables should have lower structural constraint. This is observed in the present study. Pearson's correlation between Ka/Ks and subunit number is r = –0.350* (Spearman's r = –0.351*). The contrast is greatest between monomeric (Ka/Ks = 0.106) and multimeric allozymes (Ka/Ks = 0.067); the difference is significant in a t-test (P = 0.039) and a one-tailed Mann-Whitney test (P = 0.034). The alternative hypothesis, that increasing Ka/Ks reflects increasing positive selection, must explain why the less structurally constrained proteins show more positive selection. The concept that proteins differ in the extent to which they are "environmentally challenged" (Gillespie 1991) offers a possible rationale.
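
    The two correlation coefficients reported throughout can be computed as in the following sketch. Spearman's coefficient is obtained here as Pearson's correlation of average ranks, which handles the many ties that arise when one variable is subunit number. The example values are invented for illustration and are not the study data.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation as Pearson's r of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: subunit number against Ka/Ks for six proteins.
subunits = [1, 1, 2, 2, 4, 4]
ka_ks = [0.12, 0.09, 0.08, 0.10, 0.05, 0.06]
print(pearson(subunits, ka_ks), spearman(subunits, ka_ks))
```

    A negative value from both functions, as in this toy case, is the pattern expected if proteins with more subunits are under stronger constraint.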

    Balancing selection, both modern and ancient, might also cause parallel increases in average heterozygosity and Ka/Ks, as in the case of the MHC complex (Hughes and Nei 1989; Garrigan and Hedrick 2003). The question then is whether balancing selection is sufficiently prevalent, not only in humans and mice but also in vertebrates generally, to give the observed correlation. Although there are instances of balancing selection on allozyme loci in some animals, for example Drosophila, there is as yet no relevant evidence (from the allozyme or DNA literature) for the vast majority of the species that contribute to the mean allozyme heterozygosity estimates used in the current analysis. This hypothesis must also be reconciled with the negative correlation between Ka/Ks and subunit number.

    For the individual vertebrate groups shown in table 2, significant correlations in mammals occur between average allozyme heterozygosity and ‘number of nonsynonymous substitutions,’ Ka, and Ka/Ks. Some significant but generally lower values are obtained for other groups, for example birds. The correlation values tend to be lower for the specific taxonomic groups (fish, birds, reptiles, and amphibia) compared with the values obtained when these groups are pooled (vertebrates minus mammals). This can be explained if the drift (stochastic) error in the estimates of heterozygosity for individual allozyme loci within species is averaged out by the pooling. Correlations between Ka/Ks for radical charge changes and heterozygosity were also computed for two human allozyme studies (Harris, Hopkinson, and Robson 1974; King and Wilson 1975) and two mouse studies (Britton-Davidian and Thaler 1978; Britton-Davidian et al. 1989). Pearson's correlations for the human studies are r = 0.114 (Spearman's r = 0.198) for 15 allozymes and r = 0.073 (Spearman's r = 0.034) for 18 allozymes; values for the mouse studies are r = –0.136 (Spearman's r = 0.023) for 12 allozymes and r = –0.100 (Spearman's r = 0.039) for 17 allozymes. These values are all low and nonsignificant. If these correlations are recalculated using the vertebrate averages instead, but only for the corresponding allozymes scored in each of the above studies, the values are much higher: r = 0.642** (Spearman's r = 0.500) and r = 0.362 (Spearman's r = 0.285) for the allozymes scored in the human studies, and r = 0.496 (Spearman's r = 0.572) and r = 0.613** (Spearman's r = 0.642**) for the allozymes scored in the mouse studies. Again, the existence of substantial stochastic variation can explain the low and nonsignificant correlation values for the individual human and mouse studies, demonstrating the value of pooling over many species.

    The correlations of Ks with heterozygosity are mostly low and nonsignificant (table 2). There is only one significant correlation, and this is only significant at the 5% level; given the remaining 27 nonsignificant correlations, this is quite possibly a statistical artefact. Thus, there is no evidence that a substantial part of the variation in average heterozygosity between proteins is caused by variation in the underlying mutation rate on a per site basis. Multiplying Ks by the number of nonsynonymous sites should give a more realistic estimate of the neutral mutation rate that takes account of protein size. The correlations of this quantity with heterozygosity tend to be higher than those for Ks but are all nonsignificant.
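
    The remark that a single result significant at the 5% level may be an artefact can be made concrete with a short calculation. The sketch assumes 28 tests in total (the one significant correlation plus the 27 nonsignificant ones) and treats them as independent, which is a simplification because the correlations share data.

```python
def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more nominally significant results among
    n_tests independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = 28
alpha = 0.05
p_any = prob_at_least_one_false_positive(n_tests, alpha)
expected_false_positives = n_tests * alpha
print(round(p_any, 3), round(expected_false_positives, 1))
```

    Under these assumptions the chance of at least one spurious significant correlation is about 0.76, and about 1.4 such results are expected, so one significant value among 28 is unremarkable.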

    The amount of variation in average allozyme heterozygosity explained statistically by Ka/Ks, quaternary structure, coding sequence length, and the number of radical sites has been investigated by linear regression (table 3). When the independent variables are individually put first into the equation, the coefficient of determination for Ka/Ks is appreciably higher than that for the other variables. After Ka/Ks, of the other variables, only quaternary structure seems to be influential, though there are borderline significant results for coding sequence length. When quaternary structure is then put first into the equation, coding sequence length (added second) explains negligible variation, whereas Ka/Ks (added third) explains a substantial and significant amount for mammals and vertebrates overall. These results suggest that although some of the variation in average heterozygosity explained by Ka/Ks can be attributed to quaternary structure, a significant proportion is explained by an as yet unidentified factor whose effects are reflected in variation in Ka/Ks. Together, quaternary structure and Ka/Ks explain a substantial proportion (42.3%) of the variation in average allozyme heterozygosity for vertebrates.
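
    The logic of entering predictors in sequence can be illustrated for the two-predictor case, where the total coefficient of determination follows from the three pairwise correlations by the standard formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The sketch below is a simplification of the regressions in table 3, which involve more variables, and the input values in the usage line are hypothetical rather than taken from the study.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared of a regression of y on x1 and x2, computed from the
    pairwise correlations r(y, x1), r(y, x2) and r(x1, x2)."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12
    return numerator / (1.0 - r_12 ** 2)


def sequential_r2(r_y1, r_y2, r_12):
    """Variance explained by x1 entered first, then the additional
    variance explained by x2 entered second."""
    first = r_y1 ** 2
    total = r2_two_predictors(r_y1, r_y2, r_12)
    return first, total - first


# Hypothetical correlations: y = heterozygosity, x1 = subunit number,
# x2 = Ka/Ks. These numbers are invented for illustration.
first, added = sequential_r2(-0.40, 0.60, -0.35)
print(round(first, 3), round(added, 3))
```

    When the second increment remains clearly positive after the first predictor has been entered, the second predictor accounts for variation that the first does not, which is the pattern described above for Ka/Ks after quaternary structure.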

    Table 3 Average Allozyme Heterozygosity Explained by Regression on Ka/Ks (for Radical Sites), Quaternary Structure (Number of Subunits), Coding Sequence (Cds) Length, and the Number of Radical Sites.

    Values of Ka/Ks for radical charge changes were also computed for the lineages leading to the mouse and to the rat, and also from their common ancestor to humans for a smaller sample of 30 proteins. Pearson's correlations of these lineage Ka/Ks values with vertebrate heterozygosity were, respectively, r = 0.319 (Spearman's r = 0.432*), r = 0.539** (Spearman's r = 0.362*), and r = 0.460* (Spearman's r = 0.283); with mammal heterozygosity they were, respectively, r = 0.248 (Spearman's r = 0.404*), r = 0.369* (Spearman's r = 0.298), and r = 0.432* (Spearman's r = 0.271). These results are broadly consistent with those obtained for the larger sample of proteins for the human-mouse lineage.

    The correlation coefficients obtained in the present study are affected by many sources of error. The allozyme averages are based on large samples of loci, but the contributing species and populations do not overlap exactly for the different proteins; relative values of Ka/Ks for the human-mouse lineage may not be representative of those in other mammalian and vertebrate lineages; the estimates of Ka and Ks and allozyme heterozygosity will be subject to sampling and stochastic error; the technique of protein electrophoresis does not detect all nonsynonymous variation; electrophoretic alleles can be composite with different amino acid sequences giving the same charge, and there is evidence that some allozyme proteins might characteristically have more of this hidden variation than others (Barbadilla, King, and Lewontin 1996). Any future work in which some of these errors are reduced might well result in correlations higher than those observed in the present study. Finally, it should be emphasised that this paper focuses on the cause of a difference in average heterozygosity between allozyme proteins, not the cause of differences in heterozygosity between the individual loci contributing data for each protein. The current study does not throw direct light on the evolutionary forces responsible for the allozyme variation observed at individual loci within particular populations or species. However, if selective constraint has some importance in relation to average heterozygosity differences, it would be reasonable to assume for it some importance in relation to differences between individual loci.

    Acknowledgements

    We thank Martin Lercher for helpful comments on an earlier version of this manuscript.

    Literature Cited

    Barbadilla, A., L. M. King, and R. C. Lewontin. 1996. What does electrophoretic variation tell us about protein variation? Mol. Biol. Evol. 13:427-432.

    Britton-Davidian, J., J. H. Nadeau, H. Croset, and L. Thaler. 1989. Genic differentiation and origin of Robertsonian populations of the house mouse (Mus musculus domesticus Rutty). Genet. Res. Camb. 53:29-44.

    Britton-Davidian, J., and L. Thaler. 1978. Evidence for the presence of two sympatric species of mice (Genus Mus L.) in southern France based on biochemical genetics. Biochem. Genet. 16:213-225.

    Clark, A. G., S. Glanowski, and R. Nielsen, et al. (14 co-authors). 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960-1963.

    Comeron, J. M. 1995. A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J. Mol. Evol. 41:1152-1159.

    Comeron, J. M. 1999. K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763-764.

    Endo, T., K. Ikeo, and T. Gojobori. 1996. Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.

    Fay, J. C., G. J. Wyckoff, and C.-I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-1026.

    Garrigan, D., and P. W. Hedrick. 2003. Perspective: detecting adaptive molecular polymorphisms: lessons from the MHC. Evolution 57:1707-1722.

    Gillespie, J. H. 1991. The causes of molecular evolution. Oxford University Press, Oxford.

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

    Graur, D., and W.-H. Li. 2000. Fundamentals of molecular evolution. Sinauer Associates, Sunderland, Mass.

    Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95-98.

    Harris, H., D. A. Hopkinson, and E. B. Robson. 1974. The incidence of rare alleles determining electrophoretic variants: data on 43 enzyme loci in man. Ann. Hum. Genet. 37:237-253.

    Hellmann, I., S. Zollner, W. Enard, I. Ebersberger, B. Nickel, and S. Paabo. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 13:831-837.

    Hey, J. 1999. The neutralist, the fly and the selectionist. Trends Ecol. Evol. 14:35-38.

    Hughes, A. L., and M. Nei. 1989. Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA. 86:958-962.

    Hughes, A. L., B. Packer, R. Welch, A. W. Bergen, S. J. Chanock, and M. Yeager. 2003. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc. Natl. Acad. Sci. USA 100:15754-15757.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

    King, M.-C., and A. C. Wilson. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107-116.

    Lewontin, R. C. 1991. Twenty-five years ago in genetics: electrophoresis in the development of evolutionary genetics: milestone or millstone? Genetics 128:657-662.

    Li, W.-H., C.-I. Wu, and C.-C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.

    Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407-9412.

    Miyata, T., T. Yasunaga, and T. Nishida. 1980. Nucleotide sequence divergence and functional constraint in mRNA evolution. Proc. Natl. Acad. Sci. USA 77:7328-7332.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

    Skibinski, D. O. F., and R. D. Ward. 1998. Are polymorphism and evolutionary rate of allozyme proteins limited by mutation or selection? Heredity 81:692-702.

    Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Ward, R. D., D. O. F. Skibinski, and M. Woodwark. 1992. Protein heterozygosity, protein structure, and taxonomic differentiation. Evol. Biol. 26:73-157.

    Williams, E. J. D., and L. D. Hurst. 2002. Is the synonymous substitution rate in mammals gene-specific? Mol. Biol. Evol. 19:1395-1398.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.

    Yang, Z., and R. Nielsen. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917.

    Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56-68.

    Zhang, L., and W.-H. Li. 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. 21:236-239.