Gene Genealogies When the Sample Size Exceeds the Effective Size of the Population

Molecular Biology and Evolution, 2003, No. 2

    Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts

    Abstract

    We study the properties of gene genealogies for large samples using a continuous approximation introduced by R. A. Fisher. We show that the major effect of large sample size, relative to the effective size of the population, is to increase the proportion of polymorphisms at which the mutant type is found in a single copy in the sample. We derive analytical expressions for the expected number of these singleton polymorphisms and for the total number of polymorphic, or segregating, sites that are valid even when the sample size is much greater than the effective size of the population. We use simulations to assess the accuracy of these predictions and to investigate other aspects of large-sample genealogies. Lastly, we apply our results to some data from Pacific oysters sampled from British Columbia. This illustrates that, when large samples are available, it is possible to estimate the mutation rate and the effective population size separately, in contrast to the case of small samples in which only the product of the mutation rate and the effective population size can be estimated.

    Key Words: coalescent theory • genealogies • effective population size • multiple mergers

    Introduction

    Although the history of population genetics dates back more than one hundred years, the genealogical approach that characterizes modern work emerged only during the 1970s in response to newly available genetic data. It was soon formalized as the coalescent by Kingman (1982a, 1982b) and studied extensively from a more biological standpoint by Hudson (1983) and Tajima (1983). The coalescent is intuitively appealing, has a relatively simple mathematical structure, and is easily applied to data. Thus it has led to impressive advances and now frames most work in population genetics. A number of tests of the coalescent null model have been proposed, among them Tajima's (1989) D and the statistics of Fu and Li (1993). Because of the overwhelming historical importance of the neutral theory of molecular evolution (Kimura 1983), these tests are often mistakenly viewed as tests of selective neutrality only. However, the standard coalescent model involves a long list of assumptions, and when the model is rejected it is difficult to distinguish among several possible explanations.

    In addition to natural selection, demographic factors like population subdivision, population growth, and population decline can cause the model to be rejected. Accepting their lack of specificity, the fact that Tajima's D and the statistics of Fu and Li (1993) have power to detect these deviations can be viewed as advantageous, because subdivision and changes in size are important biological properties of populations. Here we consider an assumption of the coalescent that has mostly been overlooked: the assumption that the sample size is much smaller than the effective size of the population (n << N_e). We derive expressions for the expected number of singleton polymorphisms and the expected total number of polymorphisms in a sample that can be as large as or larger than the effective size of the population. Under the infinite sites model of mutation, we find that the main effect of large sample size is to increase the number of singletons in the sample relative to coalescent predictions. The increase in the relative number of singletons will give negative values of the statistics mentioned above, and thus will be indistinguishable by these tests from other factors such as population growth. This is clearly undesirable and suggests that the genealogical approach to population genetics should be expanded to include the possibility that the sample size is not much smaller than the effective size of the population.

    We use a continuous approximation for the sample size divided by the effective size (x = n/N_e) that was previously employed by Fisher (1930) and Watterson (1975). Fisher (1930) studied variability maintained in a large population by the introduction of a single mutant each generation. He used what is now known as the infinite sites model of mutation with free recombination between sites and derived expected values of the numbers of mutants at low frequency (singletons, doublets, etc.), as well as the total number of polymorphisms maintained. In modern terms, Fisher's solution applies when the parameter θ is equal to 2, because θ is defined to be the mutation rate per gene copy times twice the number of gene copies in the population. Here we assume a haploid population, so θ = 2N_e u, but the results can be applied to diploid organisms if θ = 4N_e u. The fact that Fisher assumed exactly one mutant entered the population each generation is irrelevant in comparing predictions about expected levels of polymorphism. He simply assumed that there was no variability in the mutation process, whereas today we model mutations in the population as a Poisson process with rate θ/2 per generation. Another difference between Fisher's approach and the modern genealogical one concerns recombination. Under neutrality, however, the expected values derived by Fisher (1930) and Watterson (1975) and those reported by us below do not depend on the recombination rate because the marginal distribution of genealogies at every site is the same regardless of recombination. Predictions about the variances of these quantities would depend on the recombination rate.

    Table 1 shows Fisher's (1930) predictions for the expected numbers of mutants in one through five copies in the entire population. Fisher used what is now known as the Wright-Fisher model, in which the effective size of the population is identical to the census size. Thus, table 1 predicts the pattern of variability in a sample whose size is the same as the effective size of the population. The values in the table are scaled in terms of θ. That is, they hold for θ = 1, and predictions for other values are obtained simply by multiplying these values by θ. The column marked "Coalescent" shows what is now clear are the predictions of the standard coalescent model: that θ singletons are expected, θ/2 doublets, θ/3 triplets, and so on. The coalescent predictions are surprisingly close to the actual values, even when the entire population is sampled. They are off by a little more than 12% for singletons, 4.6% for doublets, and by less than 1% for all other classes of mutations. This is surprising because a fundamental property of the coalescent (that at most one common ancestor event can occur in a single generation) does not hold for large samples. We show below, however, that these differences between coalescent predictions and reality can be quite large when the sample size is greater than the effective size of the population.

    table omitted

    Table 1. Coalescent and x = 1 Predictions for the Expected Number of Mutant Factors Maintained in Low Count in the Population when θ = 2.

    It is generally accepted that in many cases the effective size of a population will be less than its actual size (Hedrick 2000), although one exception to this is when the population is subdivided. This raises the possibility that the sample size in empirical population genetics studies might exceed the effective size of the population. This is likely already the case for hypervariable region 1 of human mitochondrial DNA (mtDNA), for which there are n = 9388 sequences available (as of June 2002) and N_e may only be about 5000. The work we present here shows that the main effect of this will be to increase the proportion of singleton polymorphisms in the sample. Beckenbach (1994) proposed that sample sizes larger than the effective population size could explain seemingly odd patterns of genetic variation in samples of mtDNA data from Pacific oysters, Crassostrea gigas, from British Columbia. We reanalyze those data below and show that they are in fact consistent with small N_e. However, the mutation rate needed to reconcile the dichotomy between abundant polymorphisms and small N_e indicates that n > N_e is not the only explanation for the observed pattern.

    Theory

    Let x = n/N_e be the scaled sample size from a population of effective size N_e. To allow that x could be greater than 1, we assume a population of constant size N in which only N_e individuals (N ≥ N_e) reproduce and the other N - N_e die without reproducing. Generations are assumed to be discrete; each generation all adults die and are replaced by offspring. We assume that the types of these N offspring are obtained by random sampling with replacement among the N_e individuals that do reproduce. Although we assume a haploid organism, another way to think of this is that N_e individuals each produce a very large number of "gametes" and the next generation (of N individuals) is a random sample from this gamete pool. If N = N_e, then this model is identical to the usual Wright-Fisher model. We assume that mutations occur at rate u per gene copy per generation, and we use the scaled mutation rate θ = 2N_e u, because any mutations that happen in the germ lines of the N - N_e individuals that do not reproduce are lost.

    We seek expressions for the expected number of singleton polymorphisms E[η₁] and the expected total number of polymorphic or segregating sites E[S]. Because of the Poisson nature of the mutation process, we have

        E[\eta_1] = \theta L_1(x)    (1)

        E[S] = \theta L(x)    (2)

    where L₁(x) and L(x) are the expected lengths of all the external branches in the genealogy of the sample and the expected total length of the genealogy of the sample, respectively, measured in units of 2N_e generations. Under the standard coalescent model (in which x → 0), we have L₁ = 1 and L = \sum_{i=1}^{n-1} 1/i, and these results can be obtained in a number of different ways. Here, we take a backwards-looking "balls in boxes" approach. That is, the genealogy of the sample is generated by throwing n balls into N_e boxes, allowing for coalescent events, and repeating this procedure each generation with the remaining ancestral lineages until the most recent common ancestor of the sample is reached. This is a standard method under the coalescent, but when n is large, multiple coalescent events can occur in the same generation.

    To obtain L₁(x) and L(x) here we follow Fisher (1930) and Watterson (1975) and consider a continuous approximation of the scaled sample size as N_e goes to infinity for a given x = n/N_e. In this case, we can use the fact that the scaled number of ancestors of the sample of size x converges in probability to its asymptotic mean 1 - e^-x as N_e goes to infinity; see page 267 in Watterson (1975). Therefore, in the case of L(x) we have the following recursion over a single generation,

        L(x) = \frac{x}{2} + L(1 - e^{-x})    (3)

    in which the time parameter is suppressed because we assume that the population is at equilibrium. In words, equation (3) says that the expected total branch length of the genealogy of a sample of size n in this limit is equal to the lengths of branches between now and the previous generation, n/(2N_e) = x/2, when time is measured in units of 2N_e generations, plus the expected total branch length of the genealogy of the N_e(1 - e^-x) lineages remaining one generation in the past.

    Fisher (1930) and Watterson (1975) found solutions for L(x) using series approximations near x = 0. These solutions do not hold when x is large but are quite good for x < 2 (see Simulations, below). Here we use the fact that 1 - e^-x is less than 1 for all x, together with Watterson's (1975) results and equation (3) to make predictions for any value of x. In the present notation, Watterson's (1975) equation 1.4b gives

        [equation (4), defining L*(x), omitted]

    in which g(x) is given by Watterson's (1975) equation 2.24, and where γ = 0.57721566... is Euler's constant. Thus, we use equation (3), but replace the second term on the right with L*(1 - e^-x) to make L(x) accurate for all x.
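    As an illustration of how a recursion of this kind can be evaluated in practice, the following Python sketch follows the scaled number of ancestors back in time, adding x/2 for each generation, until the scaled size is small, and then closes the recursion with a small-x approximation. The function name `total_length`, the cutoff `eps`, and the closing expression ln(N_e x) + γ + x/12 + x^2/72 are assumptions of this sketch. The closing expression consists of the leading terms found by substituting a series into the recursion, with the constant chosen to agree with the standard coalescent sum for small x; it is not a transcription of the published equation (4).

```python
import math

EULER_GAMMA = 0.5772156649015329


def total_length(x, n_e, eps=0.05):
    """Approximate expected total genealogy length, in units of 2*N_e generations.

    x is the scaled sample size n / N_e and n_e is the effective size.
    """
    if x <= 0 or n_e <= 0:
        raise ValueError("x and n_e must be positive")
    length = 0.0
    # One generation back adds x/2 and leaves a scaled number
    # 1 - exp(-x) of ancestral lineages.
    while x > eps:
        length += x / 2.0
        x = -math.expm1(-x)  # 1 - exp(-x), computed accurately
    # Small-x closure (an assumption of this sketch, see the text above).
    return length + math.log(n_e * x) + EULER_GAMMA + x / 12.0 + x * x / 72.0
```

    For n = N_e = 2 (x = 1) this sketch returns roughly 1.37, and for small x it stays close to the standard coalescent sum.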

    Watterson (1975) did not consider mutant allele frequencies, and Fisher (1930) derived the expectations only for mutants in low copy number and assuming x = 1. However, a solution for L₁(x) can be obtained, again via a recursive equation over a single generation. In this case it is necessary to weight the contributions of ancestral lineages by the probability that they have just one descendent in the sample. We obtain

        L_1(x) = \frac{x}{2} + \frac{x e^{-x}}{1 - e^{-x}} L_1(1 - e^{-x})    (5)

    The first term on the right represents the increment to L₁ in the first generation looking back. It is the same as the first term on the right in equation (3) because all these first-generation branches have just one descendent in the sample. Some proportion of the ancestors of the sample will have one descendent in the sample, but others will have two, three, four, etc. The number of ancestors that have a single descendent in the sample is the same as the number of boxes that contain exactly one ball when n balls are thrown into N_e boxes. Like the case of the total scaled number of ancestors (1 - e^-x) above, the scaled number of ancestors that have one descendent in the sample of size x converges in probability to its asymptotic mean xe^-x as N_e goes to infinity; see Feller (1968, p. 59). Thus, the term multiplying L₁(1 - e^-x) on the right side of equation (5) is equal to the proportion of ancestral lineages that have just one descendent in this limit.

    By successively taking derivatives with respect to x on both sides of equation (5), we can obtain a series approximation to the function L₁(x) near x = 0. This is the method Watterson (1975) used to obtain his equation (2.24) for g(x). Here we obtain

        [equation (6), the series approximation L₁*(x), omitted]

    and this number of terms is sufficient to give L₁*(1) = 1.120439, which is close to the value obtained by Fisher (1930) shown in table 1. Of course, we cannot expect a series approximation near x = 0 to be accurate for larger x, so we use equation (5), but put L₁*(1 - e^-x) on the right in place of L₁(1 - e^-x). This gives L₁(1) = 1.120458, which matches Fisher's (1930) result to six decimal places and makes L₁(x) accurate for any x.
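    The same strategy (a series near zero, carried to larger x by the one-generation recursion) can be sketched numerically. In the Python example below, each step back adds x/2 and weights the remaining external length by the proportion of ancestors with a single descendent, x e^-x / (1 - e^-x). The function name `external_length`, the cutoff `eps`, and the closing series 1 + x/12 + x^2/36 + 37x^3/4320 are assumptions of this sketch: the series gives the leading terms obtained by matching powers of x in that recursion, and it is not claimed to reproduce the published equation (6) term for term.

```python
import math


def external_length(x, eps=0.05):
    """Approximate expected total external branch length (units of 2*N_e generations)."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Follow the scaled number of ancestors back until it is small.
    path = []
    while x > eps:
        path.append(x)
        x = -math.expm1(-x)  # 1 - exp(-x)
    # Small-x closure: leading series terms (an assumption of this sketch).
    value = 1.0 + x / 12.0 + x * x / 36.0 + 37.0 * x ** 3 / 4320.0
    # Unwind the recursion from the oldest generation to the present.
    for xk in reversed(path):
        weight = xk / math.expm1(xk)  # equals x e^-x / (1 - e^-x)
        value = xk / 2.0 + weight * value
    return value
```

    At x = 1 the sketch returns a value close to 1.12, in line with the singleton excess of a little more than 12% noted for table 1; for large x the result is dominated by the x/2 contributed by the most recent generation.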

    Because L(x) and L₁(x) can be computed, we can use a simple moment method to jointly estimate θ and x. Namely, we equate the observed values of η₁ and S with their expectations (1) and (2), and solve numerically for θ and x. Because x = n/N_e and n is always known, estimating θ and x is equivalent to estimating N_e and u. It is also possible, using the simulations described in the next section, to estimate the likelihood surface for the observed S and η₁ or a posterior distribution of θ and x by Monte Carlo integration over genealogies. We apply both these methods to some mtDNA data from Pacific oysters under Application to Oyster Data, below.
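    A minimal sketch of such a moment method is given below in Python. It assumes that the ratio of expected external length to expected total length increases with x over the search interval, so that x can be found by bisection on the observed ratio of singletons to segregating sites, after which θ follows from S. All function names, the search bounds, and the small-x closing approximations inside the two length functions are assumptions of this sketch rather than the authors' implementation.

```python
import math

EULER_GAMMA = 0.5772156649015329


def _ancestor_path(x, eps=0.05):
    """Scaled numbers of ancestors, generation by generation, down to eps."""
    path = []
    while x > eps:
        path.append(x)
        x = -math.expm1(-x)  # 1 - exp(-x)
    return path, x


def total_length(x, n_e):
    path, x0 = _ancestor_path(x)
    # Closure near zero is an assumption of this sketch.
    closure = math.log(n_e * x0) + EULER_GAMMA + x0 / 12.0 + x0 * x0 / 72.0
    return sum(path) / 2.0 + closure


def external_length(x):
    path, x0 = _ancestor_path(x)
    value = 1.0 + x0 / 12.0 + x0 * x0 / 36.0  # closure, also an assumption
    for xk in reversed(path):
        value = xk / 2.0 + xk / math.expm1(xk) * value
    return value


def moment_estimates(n, s, eta1, lo=0.01, hi=None, iters=60):
    """Return (theta, x) matching the observed S and number of singletons."""
    if hi is None:
        hi = n / 2.0  # keeps N_e = n / x at 2 or more
    target = eta1 / s

    def ratio(x):
        return external_length(x) / total_length(x, n / x)

    if not ratio(lo) < target < ratio(hi):
        raise ValueError("observed singleton fraction is outside the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    theta = s / total_length(x, n / x)
    return theta, x
```

    Called as `moment_estimates(141, 50, 31)`, the sketch should land near the point estimates quoted for the oyster data (θ about 5.8, x a little under 11, N_e about 13); small differences are expected because the closing approximations used here are not the published ones.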

    Simulations

    We performed simulations to assess the accuracy of these analytical approximations over a range of values of N_e and to investigate other properties of these large-sample genealogies. The simulations built sample genealogies under the discrete-generations model by randomly choosing the parents of all ancestral lineages each generation. If there are k lineages, this is equivalent to throwing k balls into N_e boxes. The number of balls in each occupied box determines the number of common ancestor or coalescent events, and the full genealogy of the sample was recorded. While k is not small relative to N_e, there can be many coalescent events per generation. The program, written in the C programming language, is available.
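    The procedure just described can be re-created in a few lines. The Python sketch below is not the authors' C program; it is a minimal illustration in which each ancestral lineage is represented only by its number of descendents in the sample, so that the total and external branch lengths can be accumulated without storing the full genealogy. The function names, the seed handling, and the choice to return only these two lengths are assumptions of this sketch.

```python
import random


def simulate_lengths(n, n_e, rng):
    """Simulate one genealogy; return (total, external) length in units of 2*N_e generations."""
    # Each lineage is stored as its number of descendents in the sample.
    lineages = [1] * n
    total = 0.0
    external = 0.0
    while len(lineages) > 1:
        # Every lineage present in this generation contributes 1/(2*N_e).
        total += len(lineages) / (2.0 * n_e)
        external += sum(1 for d in lineages if d == 1) / (2.0 * n_e)
        # Throw the lineages (balls) into N_e parents (boxes);
        # lineages that land in the same box coalesce.
        boxes = {}
        for d in lineages:
            parent = rng.randrange(n_e)
            boxes[parent] = boxes.get(parent, 0) + d
        lineages = list(boxes.values())
    return total, external


def mean_lengths(n, n_e, replicates, seed=1):
    """Average total and external lengths over independent replicates."""
    rng = random.Random(seed)
    total_sum = 0.0
    external_sum = 0.0
    for _ in range(replicates):
        total, external = simulate_lengths(n, n_e, rng)
        total_sum += total
        external_sum += external
    return total_sum / replicates, external_sum / replicates
```

    For n = N_e = 2 both averages should be close to 1, the exact value for that case, and for larger N_e with n = N_e the average external length should be near 1.12.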

    Figure 1 compares simulation results with the predictions from equations (3) and (5) using expressions (4) and (6) on the right-hand sides as described above. The results (4) and (6) using the series approximations near x = 0 are also shown. In the case of L(x), the predictions of the standard (n << N_e) coalescent are shown as well. The coalescent prediction for L₁(x) is equal to 1 for all x. The simulations presented in figure 1 were performed with N_e = 1000 over a range of n from 500 to 10,000 (x = 0.5 to x = 10). As expected, equations (4) and (6) do not perform well when x is large. In addition, the predictions of the standard coalescent are good only for small x. The predictions using equations (3) and (5) together with equations (4) and (6) are accurate for all x.

    fig. omitted

    FIG. 1. Comparison of simulations to analytical results for (A) the total length of external branches and (B) the total length of the genealogy. Dots are the average values among ten thousand simulation replicates, and solid curves plot the theoretical expectations derived in the text. The dashed curve below in (B) is the expectation from the coalescent, and the other dashed lines are series approximations for the expectations around x = 0 (see text for details).

    Whereas the theory of the previous section focused on singleton polymorphisms, and this is certainly the major effect, figure 2 shows that other components of the site-frequency distribution can also differ markedly from the predictions of the standard coalescent. Looking at equation (5), we can see that, when x is large, nearly all the singleton polymorphisms will be the result of mutations that occurred in the immediately previous generation. Prior to that, few lineages will have only one descendent in the sample. In fact, so many coalescent events will occur in that first generation that doublet, triplet, etc., polymorphisms will be underrepresented relative to the standard coalescent. This is evident in table 1, which displays results for x = 1. In general, there will be a mode in the site-frequency distribution at mutant counts close to x/(1 - e^-x), approximately x when x is large, which is the expected number of balls per occupied box when n balls are thrown into N_e boxes or, equivalently, the expected number of descendents per lineage. Figure 2 shows this effect when x = 10.

    fig. omitted

    FIG. 2. The expected proportion of segregating sites at which the mutant base is present in counts ranging from 2 to 20 in a sample of n = 10000. Black bars are averages of ten thousand simulation replicates with N_e = 1000 (x = 10), and grey bars are the analytical prediction of the coalescent. This is just the far left edge of the distribution; mutant counts can be as large as 9999. The values for singleton mutants are not shown; they are 0.397 for the simulated data and 0.102 for the coalescent prediction.

    We also used simulations to examine the accuracy of the theoretical predictions when N_e is not large. It might have been expected that our results using a continuous approximation for x = n/N_e would not be accurate for smaller n and N_e. Surprisingly, our results give accurate predictions over a very broad range of N_e. We do not display these results, but note that the worst case we examined was n = N_e = 2. The correct result here is E[S] = E[η₁] = θ, whereas our results predict that E[S] = 1.37θ and E[η₁] = 1.12θ.

    Application to Oyster Data

    It is typical to seek an explanation whenever data show an excess of singleton polymorphisms relative to the predictions of the coalescent, for instance whenever Tajima's D is negative. The results presented under Theory, above, show that a sample size close to or larger than the effective size of the population can explain an excess of singletons. Thus, if such a pattern is observed, for instance if Tajima's D is significantly negative, it may be appropriate to fit the model we considered here to the data. Note that if the excess of singletons is greater than about 12% (table 1), the model will estimate N_e to be less than the sample size n. Thus, the present model should probably not be applied if n is small.

    Boom, Boulding, and Beckenbach (1994) sampled n = 141 Pacific oysters, C. gigas, from British Columbia and performed restriction enzyme digests of their mtDNA. Subsequently, Beckenbach (1994) analyzed the pattern of these restriction fragment length polymorphisms (RFLPs) in the context of the infinite alleles mutation model. He proposed that sample sizes larger than the effective population size could explain the overabundance of low-frequency haplotypes (i.e., ones found in a single copy, or a few copies, in the sample of n = 141) in British Columbian C. gigas. Beckenbach (1994) used simulations to show that large sample size can explain such a pattern, with most single-copy haplotypes resulting from mutations in the immediately previous generation and the few middle-frequency haplotypes resulting from mutations that occurred earlier in the history.

    To illustrate the application of our results, we reanalyzed the data of Boom, Boulding, and Beckenbach (1994), but from the perspective of the infinite sites mutation model we have assumed. The RFLP haplotype frequency data in table 1 of Boom, Boulding, and Beckenbach (1994) and the lists of fragment sizes in their table 2 were used to estimate that the data are the result of S = 50 mutations and that for η₁ = 31 of these the mutant type is found in only a single copy in the sample. Equating these to their expectations (1) and (2) and solving numerically, we obtain point estimates of θ = 5.8 and x = 10.8, and thus N_e = n/x = 13. We also used our simulation program to estimate the likelihood surface for these data using Monte Carlo integration (over genealogies). A grid of paired (N_e, θ) values was examined, and for each of these we computed the log-likelihood of the data by averaging its value over 50,000 replicate genealogies. The likelihood for each simulated genealogy is easily computed by recording its values of L and L₁ and using the fact that, given these values, S - η₁ and η₁ are independent Poisson random variables with parameters θ(L - L₁) and θL₁, respectively. Figure 3 shows the result. Note that figure 3, rescaled, could be interpreted as a posterior distribution of N_e and θ under a Bayesian approach.
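    The per-genealogy likelihood described above can be written down directly. The Python sketch below takes a list of simulated (total, external) branch lengths for one value of N_e and returns the log of the average Poisson likelihood for a given θ. It reads "averaging its value" as averaging the likelihood over genealogies before taking the logarithm, which is an interpretation made for this sketch, as are the function names and the log-sum-exp arrangement used for numerical stability.

```python
import math


def log_poisson_pmf(k, mean):
    """Log of the Poisson probability of k events with the given mean."""
    if mean <= 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_likelihood(theta, s, eta1, lengths):
    """Monte Carlo log-likelihood of (S, eta1) for one theta.

    lengths holds (total, external) branch lengths, in units of
    2*N_e generations, simulated under the N_e being evaluated.
    """
    terms = []
    for total, external in lengths:
        internal = max(total - external, 0.0)
        # Singletons arise on external branches, all other mutations on internal ones.
        term = log_poisson_pmf(eta1, theta * external)
        term += log_poisson_pmf(s - eta1, theta * internal)
        terms.append(term)
    peak = max(terms)
    if peak == -math.inf:
        return -math.inf  # no simulated genealogy can produce the data
    # Log of the mean likelihood, computed stably.
    return peak + math.log(sum(math.exp(t - peak) for t in terms) / len(terms))
```

    Evaluating this function over a grid of θ values, with a fresh set of simulated lengths for each N_e, gives a surface of the kind described; a single genealogy with lengths (8.6, 5.4) and θ = 5.8 is a convenient hand check against the two Poisson terms for S = 50 and η₁ = 31.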

    fig. omitted

    FIG. 3. Contour plot of the likelihood surface for the data (n = 141, S = 50, η₁ = 31) of Boom, Boulding, and Beckenbach (1994). Contours are drawn every three log-likelihood units from the maximum, which is marked with an x.

    Discussion

    The genealogies of large samples, where n is on the order of or even greater than the effective size of the population, differ from those of smaller samples because multiple coalescent events occur in single generations. Most of these occur in the first few generations looking back. Multiple coalescent events are, in fact, the sole cause of the differences between the patterns we have described and the predictions of the coalescent. The two main effects of large sample size, when only single-site patterns are considered, are that singleton polymorphisms are relatively more abundant in large samples and that there is a mode in the site-frequency distribution for mutant counts around n/N_e. These effects become quite pronounced when n > N_e, and are surprisingly mild when n ≈ N_e. Mutations which have occurred in the immediately previous generation are the source of the excess singletons, and the expected number of these is θx/2, or nu. In the standard coalescent, this number is negligible in comparison to the expected number of singletons (θ) and the expected number of segregating sites (θ \sum_{i=1}^{n-1} 1/i), but for large samples these recent mutations can account for the bulk of polymorphisms in the sample.

    The mode in the site-frequency distribution is similar to the pattern recently described for samples from a single local population in a metapopulation subject to local extinction and recolonization. In both cases, this is the result of multiple coalescent events in a single generation. The mutant count at this mode is equal to the expected number of descendents per ancestral lineage when ancestors are chosen by randomly throwing n balls into N_e boxes (in the metapopulation case, the propagule size k replaces N_e). This highlights a potential problem with the coalescent approach to studying population bottlenecks, in which it is assumed that the bottleneck merely rescales coalescent times. More generally, this could be a problem whenever populations change in size over time. When the sample size or the number of ancestral lineages at the time of the bottleneck is not smaller than the effective size of the bottleneck population, it will be important to allow for simultaneous coalescent events. A rigorous but abstract theory of coalescents with such multiple mergers is being developed, as well as general theory of such processes both forward and backward in time, but so far without attention to making predictions about measures of genetic variation.

    The application of our model and results to the Pacific oyster mtDNA data of Boom, Boulding, and Beckenbach (1994) shows that an excess of singleton polymorphisms can lead to estimates of the effective size of the population that are smaller than the sample size. An interesting aspect of the present work is that, given appropriate data (i.e., where n > N_e), it will be possible to estimate N_e and u separately, in contrast to the case of small samples, in which only the composite parameter θ can be estimated. However, in this case, the parameter estimates themselves indicate that n > N_e is not the (only) explanation for the observed pattern. Namely, we estimate u to be equal to θ/(2N_e) = 5.8/26 = 0.2 per generation. Although it is difficult to say how many sites in the mtDNA were effectively surveyed in the restriction digests of Boom, Boulding, and Beckenbach (1994), this value of u is unrealistically large. Some other phenomenon, such as recent population growth or natural selection, must be the source of (at least some of) the excess singletons in this sample.

    We did not present an analysis of the obvious data for this: the 9388 sequences of hypervariable region 1 of human mitochondrial DNA mentioned in the Introduction. A preliminary analysis of these data revealed that, in contrast to the oyster data, they showed a deficiency of singletons rather than an excess. Still, it seems likely that n > N_e for these human mtDNA data. Assuming this is so, one possible explanation for the absence of the predicted pattern is that hypervariable region 1 of human mitochondrial DNA does not conform to the infinite sites model. If the nu mutations expected in the immediately previous generation were to occur mostly at some small number of hypermutable sites, then those sites would have mutant counts greater than 1.

    As molecular technologies develop even further to allow easy measurement of genetic variation, it will become even more important to model large-sample genealogies and to develop efficient methods of analysis. Although the simplicity of the standard coalescent will be lost, the work presented here shows that a continuous approximation for x = n/N_e, first used by Fisher (1930) and then later by Watterson (1975), can give useful analytical results.

    Acknowledgements

    We thank Andy Beckenbach for alerting us to his very relevant work and for aid in interpreting the oyster RFLP data. We also thank Matt Hare and Simon Tavaré for helpful discussions. Two anonymous reviewers gave helpful comments on the manuscript. This work was supported by grants DEB-9815367 and DEB-0133760 from the National Science Foundation to J.W.

    Golding, Associate Editor

    Literature Cited

    Beckenbach, A. T. 1994. Mitochondrial haplotype frequencies in oysters: neutral alternatives to selection models. Pp. 188-198 in B. Golding, ed. Non-neutral evolution. Chapman & Hall, New York.

    Boom, J. D. G., E. G. Boulding, and A. T. Beckenbach. 1994. Mitochondrial DNA variation in introduced populations of Pacific oyster, Crassostrea gigas, in British Columbia. Can. J. Fish. Aquat. Sci. 51:1608-1614.

    Donnelly, P., and T. G. Kurtz. 1999. Particle representations for measure-valued population models. Ann. Prob. 27:166-205.

    Ewens, W. J. 1972. The sampling theory of selectively neutral alleles. Theor. Pop. Biol. 3:87-112.

    Feller, W. 1968. An introduction to probability theory and its applications, Vol. 1. 3rd edition. John Wiley & Sons, New York.

    Fisher, R. A. 1930. The distribution of gene ratios for rare mutations. Proc. R. Soc. Edinb. 50:205-220.

    Fu, Y.-X. 1995. Statistical properties of segregating sites. Theor. Pop. Biol. 48:172-197.

    Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693-709.

    Harris, H. 1966. Enzyme polymorphism in man. Proc. R. Soc. Lond. Ser. B 164:298-310.

    Hartl, D. L., and A. G. Clark. 1997. Principles of population genetics. 3rd edition. Sinauer Associates, Sunderland, Mass.

    Hawks, J., K. Hunley, S.-H. Lee, and M. Wolpoff. 2000. Population bottlenecks and Pleistocene human evolution. Mol. Biol. Evol. 17:2-22.

    Hedrick, P. W. 2000. Genetics of populations. Jones and Bartlett, Sudbury, Mass.

    Hudson, R. R. 1983. Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203-217.

    Karlin, S., and J. McGregor. 1972. Addendum to paper of W. Ewens. Theor. Pop. Biol. 3:113-116.

    Kimura, M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to the steady flux of mutations. Genetics 61:893-903.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

    Kingman, J. F. C. 1982a. The coalescent. Stochastic Process. Appl. 13:235-248.

    Kingman, J. F. C. 1982b. On the genealogy of large populations. J. Appl. Prob. 19A:27-43.

    Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to the study of genic diversity in natural populations II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54:595-609.

    Nielsen, R. 2001. Statistical tests of neutrality in the age of genomics. Heredity 86:641-647.

    Pitman, J. 1999. Coalescents with multiple collisions. Ann. Prob. 27:1870-1902.

    Sagitov, S. 1999. The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Prob. 36:1116-1125.

    Schweinsberg, J. 2000. Coalescents with simultaneous multiple collisions. Electron. J. Prob. 5:1-50.

    Simonsen, K. L., G. A. Churchill, and C. F. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.

    Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460.

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

    Takahata, N. 1995. A genetic perspective on the origin and history of humans. Annu. Rev. Ecol. Syst. 26:343-372.

    Wakeley, J. 1993. Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J. Mol. Evol. 37:613-623.

    Wakeley, J., and N. Aliacar. 2001. Gene genealogies in a metapopulation. Genetics 159:893-905. Corrigendum (Fig. 2): 160:1263-1264.

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Pop. Biol. 7:256-276.

    Wright, S. 1943. Isolation by distance. Genetics 28:114-138.

    Accepted for publication October 9, 2002. (John Wakeley and Tsuyoshi Takahashi)
