    Acta Genetica Sinica (《遗传学报》), 2000, No. 5 (http://www.100md.com)

    CLC number: Q943  Document code: A  Article ID: 0379-4172(2000)05-0440-09

    A Preliminary Approach to the Theory of Geographical Gene Genealogy

    for Plant Genomes with Three Different Modes of Inheritance

    and Its Application

    HU XinSheng

    (Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, P R China)
    Abstract: This paper extends existing gene genealogy theory, developed for samples of genes drawn at random from geographically discrete or continuously distributed populations, to hermaphrodite outcrossing plants. The three plant genomes (nuclear, chloroplast and mitochondrial DNA) have different modes of inheritance and are considered separately because their migration rates differ. It is shown that, under certain assumptions, existing coalescent theory can be applied to plants by appropriately reparametrizing the effective population size and migration rate for each genome. One result is that the ratio of pollen to seed flow can be estimated from a sample of n (n≥2) genes in terms of the numbers of segregating sites between and within populations that are discretely distributed in space. Another result of theoretical interest is that, in the discrete model of population structure, the mean coalescent time is shortest for paternally inherited genes (cpDNA in conifers) and, under certain conditions, longest for maternally inherited genes (cpDNA in angiosperms and mtDNA in conifers and angiosperms). However, these results are difficult to obtain in the model of a population that is continuously distributed in space.
    Key words: gene genealogies; plant geography; biparental inheritance; paternal inheritance; maternal inheritance

    The gene genealogy, or coalescent process, is an important way of describing the evolutionary process of a population. Unlike traditional population genetic theory, which was developed in terms of the inbreeding coefficient[1] or the probability of identity by descent[2], genealogical analysis focuses on the times at which two or more genes have a common ancestor in the past. It has been argued that coalescent analysis has many advantages over some theories of traditional population genetics[3]. The results of traditional and genealogical theories are equivalent when both describe the same phenomenon of biological evolution[4]. However, the two methods apply to different types of genetic data: traditional theory uses allele frequency data, while genealogical analysis employs DNA sequence data.
    Since the introduction of coalescence theory[5~7], there have been extensive studies on this process[8,9]. These studies focused at first on completely isolated populations. The coalescent process was then addressed for those samples randomly chosen from partially isolated populations[10~15] and from populations with a continuous distribution in space[16,17].

    However, the coalescent process for a sample taken from partially isolated or continuously distributed populations of hermaphrodite plants is more complicated. This is because gene flow among plant populations can be mediated by seed flow, by pollen flow, or by both. Furthermore, migration is asymmetric among the three differently inherited plant genomes[18,19]. In most conifers, chloroplast DNA (cpDNA) is paternally inherited and mitochondrial DNA (mtDNA) is maternally inherited[20], while nuclear DNA (nDNA) is biparentally inherited. Thus, models of the coalescent process that incorporate both seed and pollen flow can provide insight into the evolutionary process of geographically structured or unstructured plant populations. This paper therefore applies existing gene genealogy theory to plant populations. A simple case, the genealogy under L (L≥2) partially isolated populations, is considered first using Nei and Takahata's method[14]. Results of the coalescent process for continuously distributed populations[16,17] are then extended to plant species. Finally, practical implications of these theoretical results are discussed.
    1 General assumptions

    Of the three genomes, the paternally and maternally inherited organelle genomes are assumed to be haploid, and the biparentally inherited nuclear genome is assumed to be diploid. Only selectively neutral genes are considered. Linkage disequilibrium is assumed to be absent among genes of the three modes of inheritance.

    The basic biological framework for investigating genealogy in discretely distributed plant populations with seed and pollen flow is outlined below. We begin with the adults in each subpopulation at generation t. These adults produce pollen grains and ovules. Pollen dispersal occurs among subpopulations. In each subpopulation, pollen grains, including the migrant fraction, randomly fertilise ovules (random mating assumption). The seeds so formed then disperse among subpopulations, so that each subpopulation contains a small proportion of migrant seeds. After seed flow, a fixed number of seeds is sampled, and these seeds grow to form the adults of the next generation, t+1. This process continues from generation to generation.
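    The life cycle above implies a different per-generation migration probability for each genome. The following Python sketch traces one gene copy back through a single generation (seed flow, then pollen flow). It is an illustration rather than part of the original derivation: the function name is hypothetical, and it assumes that every gene in an immigrant seed counts as immigrant. To first order in small rates it gives mp/2+ms, mp+ms and ms for biparentally, paternally and maternally inherited genes.

```python
def effective_migration(m_pollen: float, m_seed: float) -> dict:
    """Probability that a gene copy in an established seed came from
    outside the subpopulation one generation earlier.

    m_pollen: immigrant fraction of the local pollen pool (assumed input).
    m_seed:   immigrant fraction of the established seeds (assumed input).
    """
    local_seed = 1.0 - m_seed  # the seed itself was produced locally
    return {
        # Nuclear gene: in a local seed, the copy is pollen-derived with
        # probability 1/2, and that pollen is immigrant with m_pollen.
        "biparental": m_seed + local_seed * m_pollen / 2.0,
        # Paternally inherited organelle: always pollen-derived.
        "paternal": m_seed + local_seed * m_pollen,
        # Maternally inherited organelle: moves only inside seeds.
        "maternal": m_seed,
    }


if __name__ == "__main__":
    for mode, rate in effective_migration(0.1, 0.01).items():
        print(f"{mode}: {rate:.4f}")
```

    For any positive pollen migration rate the three values are ordered paternal > biparental > maternal, which is the asymmetry the later sections rely on.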
    2 Theoretical analysis

    2.1 Discretely distributed populations

    2.1.1 The existing results For a population with constant effective size (Ne) per generation (Wright-Fisher model), the mean coalescent time for a sample of n individual genes, E(T), is $E(T)=4N_e(1-1/n)$ and its variance is $V(T)=16N_e^2\sum_{i=2}^{n}1/[i(i-1)]^2$[21]. If a constant neutral mutation rate, μ, is assumed, the expected total number of segregating sites, E(S), and its variance, V(S), can be obtained, i.e. $E(S)=4N_e\mu a$ and $V(S)=E(S)[1+E(S)b/a^2]$, where $a=\sum_{i=1}^{n-1}1/i$ and $b=\sum_{i=1}^{n-1}1/i^2$.
    The effective population size of haploid genes is assumed to be half that of diploid genes. Therefore, estimates of the above parameters for paternally and maternally inherited haploid organelle genes can be obtained by replacing Ne with Ne/2 in equations mentioned above.
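    As a numerical illustration of these standard results, the sketch below evaluates E(T), V(T), E(S) and V(S) for a diploid or haploid locus. The function names and the `haploid` flag are illustrative choices, and the sums a and b are taken to be the usual harmonic sums over i=1,...,n-1.

```python
def harmonic_sums(n: int) -> tuple:
    """Return (a, b) with a = sum 1/i and b = sum 1/i^2 for i = 1..n-1."""
    a = sum(1.0 / i for i in range(1, n))
    b = sum(1.0 / i ** 2 for i in range(1, n))
    return a, b


def coalescent_summary(n_e: float, mu: float, n: int, haploid: bool = False) -> dict:
    """Mean and variance of coalescent time and of segregating sites.

    n_e: effective population size; mu: neutral mutation rate per generation;
    n: number of sampled genes (n >= 2).
    For haploid organelle genes Ne is replaced by Ne/2, as in the text.
    """
    if n < 2:
        raise ValueError("at least two genes are needed")
    size = n_e / 2.0 if haploid else n_e
    a, b = harmonic_sums(n)
    mean_t = 4.0 * size * (1.0 - 1.0 / n)
    var_t = 16.0 * size ** 2 * sum(1.0 / (i * (i - 1)) ** 2 for i in range(2, n + 1))
    mean_s = 4.0 * size * mu * a
    var_s = mean_s * (1.0 + mean_s * b / a ** 2)
    return {"E_T": mean_t, "V_T": var_t, "E_S": mean_s, "V_S": var_s}


if __name__ == "__main__":
    print(coalescent_summary(1000, 1e-6, 10))
    print(coalescent_summary(1000, 1e-6, 10, haploid=True))
```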

    2.1.2 The case in plant population Now we consider the case where the population is subdivided into L subpopulations. It can be seen from the above equations that the coalescent analysis of a sample of n genes randomly drawn from the population can be obtained by substituting the appropriate effective population size for Ne. This is exactly the approach used by Nei and Takahata[14], who pointed out that the effective size of the whole population in this case can be calculated following Wright[23], i.e.
    Ne=LN/(1-Gst) (1)

    where N is the effective subpopulation size and Gst is the average inbreeding coefficient between random gametes within subdivisions relative to gametes of the total population. Following the method used by Wright[23], it can be shown that equation (1) still holds for haploid genes[24].

    Thus, once the expression of Gst for plant populations is obtained, the effective population size for each of the three plant genomes can be calculated according to equation (1). For the finite island model, Gst is shown to be of the form Gst=1/(…), where the number of gene copies per subpopulation is 2N for biparentally inherited diploid genes and N for uniparentally inherited haploid genes, μ is the mutation rate, and the migration rate is mp/2+ms for biparentally inherited diploid genes, mp+ms for paternally inherited haploid genes, and ms for maternally inherited haploid genes; ms and mp are the migration rates of seed and pollen in any subpopulation, respectively. Following an argument similar to that of Nei and Takahata[14], the effective population size is then given by equation (2).
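    To make the substitution into equation (1) concrete, the sketch below uses a common finite island approximation, Gst ≈ 1/[1 + 2·copies·m·(L/(L-1))²], with mutation neglected. This specific form is an assumption made for illustration and may differ in detail from the expression intended here; only the last step, Ne=LN/(1-Gst), is equation (1) itself. The function names are hypothetical.

```python
def gst_island(copies: float, m_eff: float, n_subpops: int) -> float:
    """Assumed finite island approximation of Gst (mutation neglected).

    copies: gene copies per subpopulation (2N diploid, N haploid).
    m_eff:  migration rate for the genome (mp/2+ms, mp+ms or ms).
    """
    if n_subpops < 2:
        raise ValueError("the island model needs at least two subpopulations")
    alpha = (n_subpops / (n_subpops - 1.0)) ** 2
    return 1.0 / (1.0 + 2.0 * copies * m_eff * alpha)


def effective_size(n_sub: float, n_subpops: int, gst: float) -> float:
    """Equation (1): Ne = L * N / (1 - Gst)."""
    return n_subpops * n_sub / (1.0 - gst)


if __name__ == "__main__":
    N, L, mp, ms = 100, 10, 0.02, 0.005
    genomes = {
        "biparental": (2 * N, mp / 2 + ms),
        "paternal": (N, mp + ms),
        "maternal": (N, ms),
    }
    for mode, (copies, m) in genomes.items():
        g = gst_island(copies, m, L)
        print(mode, round(g, 4), round(effective_size(N, L, g), 1))
```

    Under this assumed form, lower migration gives higher Gst and therefore a larger effective size for the whole population, which is why the maternally inherited genome tends to have the deepest genealogy.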
    Therefore, the mean coalescent time and its variance, and the expected number of segregating sites and its variance, can be obtained immediately for each of the three plant genomes by substituting equation (2) into the existing results. For example, the mean coalescent time and the mean number of segregating sites are given by equations (3) and (4), respectively, where μ is μB for biparentally inherited nuclear genes, μP for paternally inherited genes and μM for maternally inherited genes. In particular, if only two genes are randomly sampled from the L subpopulations, then a equals 1 and equation (4) is the same as that obtained by Strobeck[25] for biparentally inherited nuclear genes in the finite island model. Equation (4) thus provides the general case for a sample of n (n>1) genes.
    2.2 Continuously distributed populations

    Barton and Wilson[16,17] presented a method for calculating coalescent times suitable for a continuously distributed population. Because of the difficulties in modelling populations that are continuously distributed, an ideal mathematical model of the biological situation is not available[2,23,26]. Neither Wright's isolation by distance model[23] nor Malécot's model[2] can avoid clumping of the population, because population density is not regulated[26]. However, the clumping can be avoided by considering the dispersal behaviour of offspring[27]. In this section we first incorporate seed and pollen dispersal into the results obtained by Barton and Wilson[16,17]. The coalescent process is then re-analysed purely on the basis of Wright's isolation by distance model[23].
    A key parameter in calculating coalescent times is the neighborhood size (Nb)[16,17]. When applied to plant populations, Nb can be obtained by following Crawford's calculation[28,29], i.e. $4\pi(\sigma_p^2/2+\sigma_s^2)d$, $2\pi(\sigma_p^2+\sigma_s^2)d$ and $2\pi\sigma_s^2 d$ for biparentally, paternally and maternally inherited genes, respectively. Here $\sigma_p^2$ and $\sigma_s^2$ are the variances of the distances between parents and offspring for pollen and seeds, respectively, in two-dimensional space. Dispersal of both pollen and seed is assumed to follow a normal distribution with mean zero and variances $\sigma_p^2$ and $\sigma_s^2$, respectively, and d is the effective population density. Suppose that pollen and ovules mate at random within any neighborhood in each generation. Following Wright's idea, the neighborhood size at ancestral generation t in two-dimensional space is the product of t and Nb, i.e. tNb. These assumptions were used by Barton and Wilson[16] in deriving their equation (11a). Thus, by putting these parameters into the formula obtained by Barton and Wilson[16], the coalescent probability of any pair of genes at any generation t in the past is immediately available.
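    The three neighborhood sizes can be tabulated directly. The sketch below assumes the expressions Nb=4π(σ²p/2+σ²s)d, 2π(σ²p+σ²s)d and 2πσ²s·d for biparentally, paternally and maternally inherited genes, together with Wright's scaling tNb for ancestral generation t; the function and argument names are illustrative only.

```python
import math


def neighbourhood_sizes(var_pollen: float, var_seed: float, density: float) -> dict:
    """Neighborhood size Nb for each mode of inheritance.

    var_pollen, var_seed: dispersal variances of pollen and seed.
    density: effective population density d (individuals per unit area).
    """
    return {
        "biparental": 4.0 * math.pi * (var_pollen / 2.0 + var_seed) * density,
        "paternal": 2.0 * math.pi * (var_pollen + var_seed) * density,
        "maternal": 2.0 * math.pi * var_seed * density,
    }


def ancestral_neighbourhood(t: int, nb: float) -> float:
    """Wright's two-dimensional scaling: neighborhood size t generations back."""
    return t * nb


if __name__ == "__main__":
    sizes = neighbourhood_sizes(var_pollen=4.0, var_seed=1.0, density=0.5)
    for mode, nb in sizes.items():
        print(mode, round(nb, 2), round(ancestral_neighbourhood(10, nb), 1))
```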
    However, if we rely purely on Wright's isolation by distance model[23], a simple alternative way to calculate the coalescent times can be obtained. In the following we consider biparentally inherited nuclear genes only. For maternally and paternally inherited haploid genes, the analyses below require modification by replacing the neighborhood size term for diploid nuclear genes (2Nb) with that for haploid organelle genes. Let f(t) be the probability of coalescence at generation t in the past. For a sample of n individual genes, according to Wright's isolation by distance model, the probability of n distinct ancestors at generation k in the past, g(k), is given by equation (5).
    If no pair of genes coalesces during the past t-1 generations, but a common ancestor occurs at generation t in the past, the probability of coalescence at time t, f(t), is given by equations (6a), (6b) and (6c).

    If the population size is fixed per generation (Wright-Fisher model), equation (6a) becomes the standard result of coalescent theory in a completely isolated population. Compared with Barton and Wilson's[16,17] model, equations (6a,b,c) provide no additional information on the geographical positions of the sampled genes, but they extend the original coalescent theory to a plant population that is continuously distributed in space.

    3 Implication and discussion

    Two implications follow from the above results. Firstly, the theoretical results make it possible to address how the mode of inheritance and seed and pollen flow influence the coalescent process, provided that mutation rates are approximately the same in some regions of the different genomes and that a molecular clock exists[30]. These effects can be addressed by comparing the mean coalescent times of genes with different modes of inheritance, because the expected number of segregating sites is completely determined by the sum of the branch lengths of a gene genealogy, E(Ttot), i.e. E(S)=μE(Ttot)[22]. Since migration rates are asymmetric and population sizes, or neighbourhood sizes, differ among the three plant genomes, the expected coalescent times differ from one another.
    Denote the mean coalescent times of biparentally, paternally and maternally inherited genes by E(TB), E(TP) and E(TM), respectively. For populations that are discretely distributed in space, it can be shown from equation (4) that E(TM)>E(TP) and E(TB)>E(TP). If the condition 2NLms(2ms/mp+1)<(1-1/L)², or roughly 2NLms<1, is satisfied, then we further obtain E(TM)>E(TB). The value of 2NLms is therefore important in determining the relative coalescent times of biparentally and maternally inherited genes. Among the three genomes, the mean coalescent time is shortest for paternally inherited genes and, under particular conditions, longest for maternally inherited genes.
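    The ordering of the three mean coalescent times can be explored numerically. The sketch below is illustrative only: it assumes the finite island approximation Gst ≈ 1/[1 + 2·copies·m/(1-1/L)²] with mutation neglected, combines it with equation (1), and uses E(T)=4Ne(1-1/n) for diploid genes and 2Ne(1-1/n) for haploid genes. The function name and the parameter values are hypothetical.

```python
def mean_coalescent_times(n_sub, n_subpops, m_pollen, m_seed, sample_size):
    """E(TB), E(TP), E(TM) under an assumed finite island approximation."""
    if m_seed <= 0 or n_subpops < 2 or sample_size < 2:
        raise ValueError("need m_seed > 0, L >= 2 and n >= 2")
    c = (1.0 - 1.0 / n_subpops) ** 2
    scale = 1.0 - 1.0 / sample_size

    def whole_population_ne(copies, m_eff):
        gst = 1.0 / (1.0 + 2.0 * copies * m_eff / c)  # assumed form of Gst
        return n_subpops * n_sub / (1.0 - gst)         # equation (1)

    ne_b = whole_population_ne(2.0 * n_sub, m_pollen / 2.0 + m_seed)
    ne_p = whole_population_ne(n_sub, m_pollen + m_seed)
    ne_m = whole_population_ne(n_sub, m_seed)
    return {
        "biparental": 4.0 * ne_b * scale,  # diploid: 4 Ne (1 - 1/n)
        "paternal": 2.0 * ne_p * scale,    # haploid: Ne replaced by Ne/2
        "maternal": 2.0 * ne_m * scale,
    }


if __name__ == "__main__":
    # Rare seed migration: maternal genealogy expected to be the deepest.
    print(mean_coalescent_times(100, 10, 0.02, 0.0001, 10))
    # Frequent seed migration: biparental genealogy becomes the deepest.
    print(mean_coalescent_times(1000, 10, 0.01, 0.01, 10))
```

    In both parameter sets the paternally inherited genome has the shortest mean coalescent time, while the relative order of the maternal and biparental genomes depends on how rare seed migration is.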
    However, in the case of a population that is continuously distributed in space, the mean coalescent time for biparentally inherited diploid nuclear genes, E(TB), follows from equation (6c) as a sum over generations, and analogous expressions can be obtained for paternally and maternally inherited genes. Although the sum for E(TB) is convergent, it is difficult to obtain a simple relationship among the three plant genomes in terms of the mean coalescent times, which is also the case in Barton and Wilson's model[16,17].
    Secondly, the results obtained in this paper make it possible to estimate the ratio of pollen to seed flow. If the same number of selectively neutral genes or genetic markers, randomly drawn from L subpopulations, are sequenced for each of the three plant genomes, the number of segregating sites can be estimated[5,31]. Denote the expected total number of segregating sites within subpopulations by E(SB) (=4NμBLa), E(SP) (=2NμPLa) and E(SM) (=2NμMLa) for biparentally, paternally and maternally inherited genes, respectively. Estimates of these parameters can be calculated using DNA sequence data[31]. Therefore, the ratio of mutation rates between two different genomes can be estimated under certain assumptions. For example, the ratio of mutation rates between biparentally and paternally inherited genes and its variance are given by equations (7a) and (7b).
    Equation (7b) is obtained from Kendall and Stuart's formula[32] together with the assumption of independence between different genomes.
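    A calculation of this kind can be sketched as follows. The code is an illustration under stated assumptions, not a transcription of equations (7a) and (7b): it takes E(SB)=4NμBLa and E(SP)=2NμPLa with the same N, L and a, so that μB/μP is estimated by SB/(2SP), and it uses the first-order (delta method) variance of a ratio of two independent estimates. The function and argument names are hypothetical.

```python
def mutation_rate_ratio(s_b: float, s_p: float, var_b: float, var_p: float) -> tuple:
    """Estimate mu_B / mu_P and an approximate variance.

    s_b, s_p:     estimated within-subpopulation segregating sites for
                  biparentally and paternally inherited genes.
    var_b, var_p: variances of those two estimates (assumed independent).
    """
    if s_b <= 0 or s_p <= 0:
        raise ValueError("segregating-site estimates must be positive")
    ratio = s_b / (2.0 * s_p)
    # First-order approximation: Var(X/Y) ~ (X/Y)^2 [Var X / X^2 + Var Y / Y^2];
    # the constant factor 1/2 is already included in `ratio`.
    variance = ratio ** 2 * (var_b / s_b ** 2 + var_p / s_p ** 2)
    return ratio, variance


if __name__ == "__main__":
    print(mutation_rate_ratio(20.0, 5.0, 16.0, 4.0))
```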

    Similarly, estimates of the expected number of segregating sites among subpopulations can be obtained for biparentally, paternally and maternally inherited genes. According to equation (4), the ratio of pollen to seed flow is then given by equation (8).
    If the numbers of sampled genes differ among the three plant genomes, it is still possible to estimate the ratio of pollen to seed flow by modifying equation (8). This result extends those obtained by Hu and Ennos[29] to a sample of n (n≥2) genes. The variance of the ratio of pollen to seed flow can also be estimated by the bootstrap method, or by Kendall and Stuart's formula[32], which is not presented here because the calculation is lengthy.
    To apply equation (8) in practice, we first sample n (n≥2) genes for any two of the three types of inherited genes from different subpopulations. These genes are then sequenced, and the total numbers of segregating sites within and between subpopulations are calculated using the method introduced by Tajima[31]. The ratio of pollen to seed flow can then be estimated according to equation (8). If the sample size, n, is large, a series of ratios of pollen to seed flow can be calculated from bootstrap samples generated from the original sample, from which the variance of the ratio is obtained.
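    The workflow just described can be sketched in Python. Everything specific here is an assumption made for illustration and is not equation (8) itself: segregating sites are counted as polymorphic columns of an alignment, within- and whole-sample counts are taken for the same sample size so that 1-Gst = S_within/S_total by equation (1), and (1-Gst)/Gst is treated as proportional to the number of gene copies times the migration rate, which gives mp/ms = A_B/A_M - 2 for nuclear versus maternally inherited markers. All function names are hypothetical.

```python
import random


def segregating_sites(seqs) -> int:
    """Number of alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pollen_seed_ratio(within_b, total_b, within_m, total_m) -> float:
    """Pollen/seed flow ratio from nuclear (b) and maternal (m) markers.

    Uses A = S_within / (S_total - S_within) = (1 - Gst) / Gst for each
    genome and the assumed relation mp/ms = A_B / A_M - 2.
    """
    a_b = within_b / (total_b - within_b)
    a_m = within_m / (total_m - within_m)
    return a_b / a_m - 2.0


def bootstrap_variance(seqs, statistic, replicates=200, seed=1) -> float:
    """Variance of `statistic` over bootstrap resamples of the sequences."""
    rng = random.Random(seed)
    values = [statistic(rng.choices(seqs, k=len(seqs))) for _ in range(replicates)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "TCGA", "ACGA"]
    print(segregating_sites(sample))
    print(bootstrap_variance(sample, segregating_sites))
    print(pollen_seed_ratio(within_b=60, total_b=80, within_m=20, total_m=100))
```

    The same bootstrap helper can be given a statistic that recomputes the ratio from resampled data, which corresponds to the variance estimate described above.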
    A set of universal primers for amplifying polymorphic non-coding regions of mtDNA and cpDNA in plants has been reported[33,34]. These primers make it convenient to obtain the required markers for plant organelle genomes. The use of such selectively neutral genetic markers to address population structure has already been reported in plant species[35,36]. It is therefore possible that the theoretical results obtained in this paper will be applied in practical work in the foreseeable future.
    References

    [1]Wright S. Evolution and the Genetics of Populations, Vol. 2: The Theory of Gene Frequencies. The University of Chicago Press, Chicago, 1969, 169~210.

    [2]Malécot G. The Mathematics of Heredity. Translated by Yermanos D M. Freeman, San Francisco, 1969.

    [3]Harding R M. New phylogenies: an introductory look at the coalescent. In: New Uses for New Phylogenies. Harvey P H, Brown A J L, Smith J M et al. Oxford University Press, 1996, 15~22.
    [4]Slatkin M. Inbreeding coefficients and coalescence times. Genetical Research, 1991, 58:167~175.

    [5]Watterson G A. On the number of segregating sites in genetical models without recombination. Theoretical Population Biology, 1975, 7:256~276.

    [6]Kingman J F C. On the genealogy of large populations. Journal of Applied Probability, 1982a, 19A:27~43.

    [7]Kingman, J F C. The coalescent. Stochastic Processes and their Applications, 1982b, 13:235~248.
    [8]Tavaré S. Line-of-descent and genealogical processes, and their applications in population genetics models. Theoretical Population Biology, 1984, 26:119~164.

    [9]Takahata N, Nei M. Gene genealogy and variance of interpopulational nucleotide differences. Genetics, 1985, 110:325~344.

    [10]Takahata N. The coalescent in two partially isolated diffusion populations. Genetical Research, 1988, 52:213~222.
    [11]Takahata N. Genealogy of neutral genes and spreading of selected mutations in a geographically structured population. Genetics, 1991, 129:585~595.

    [12]Takahata N, Slatkin M. Genealogy of neutral genes in two partially isolated populations. Theoretical Population Biology, 1990, 38:331~350.

    [13]Slatkin M, Maddison W P. Detecting isolation by distance using phylogenies of genes. Genetics, 1990, 126:249~260.
    [14]Nei M, Takahata N. Effective population size, genetic diversity, and coalescence time in subdivided populations. Journal of Molecular Evolution, 1993, 37:240~244.

    [15]Notohara M. The coalescent and the genealogical process in geographically structured population. Journal of Mathematical Biology, 1990, 29:59~75.

    [16]Barton N H, Wilson I. Genealogies and geography. Philosophical Transactions of the Royal Society of London Series B Biological Sciences, 1995a, 349:49~59.
    [17]Barton N H, Wilson I. Genealogies and geography. In: New Uses for New Phylogenies. Harvey P H, Brown A J L, Smith J M et al. Oxford University Press, 1995b, 23~56.

    [18]Petit R J, Kremer A, Wagner D B. Finite island model for organelle and nuclear genes in plants. Heredity, 1993, 71:630~641.

    [19]Ennos R A. Estimating the relative rates of pollen and seed migration among plant populations. Heredity, 1994, 72:250~259.

    [20]Mogensen H L. The hows and whys of cytoplasmic inheritance in seed plants. American Journal of Botany, 1996, 83:383~404.

    [21]Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics, 1983, 105:437~460.

    [22]Hudson R R. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology, 1992, 7:1~44.

    [23]Wright S. Isolation by distance. Genetics, 1943, 28:114~138.
    [24]Hu X S. Genetic Marker Studies of the Larix gmelinii Complex and the Development of Genetic Marker Theory for Plant populations, (Ph. D. thesis), University of Edinburgh, UK, 1998, 277.

    [25]Strobeck C. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics, 1987, 117:149~153.

    [26]Felsenstein J. A pain in the torus: some difficulties with the model of isolation by distance. American Naturalist, 1975, 109:359~368.
    [27]Kawata M. Effective population size in a continuously distributed population. Evolution, 1995, 49:1046~1054.

    [28]Crawford T J. The estimation of neighbourhood parameters for plant populations. Heredity, 1984, 52:273~283.

    [29]Hu X S, Ennos R A. On estimation of the ratio of pollen to seed flow among plant populations. Heredity, 1997, 79:541~552.

    [30]Kimura M. Molecular evolutionary clock and the neutral theory. Journal of Molecular Evolution, 1987, 26:24~33.
    [31]Tajima F. Measurement of DNA polymorphism. In: Mechanisms of Molecular Evolution. Ed. by Takahata N, Clark A G. Japan Scientific Societies Press, Sinauer Associates, Inc., 1993, 37~59.

    [32]Kendall M G, Stuart A. The Advanced Theory of Statistics. Vol. 1 Distribution Theory. Charles Griffin and Company Limited, London, 1969, 132~133.

    [33]Taberlet P, Gielly L, Pautou G et al. Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology, 1991, 17:1105~1109.
    [34]Demesure B, Sodzi N, Petit R J. A set of universal primers for amplification of polymorphic non-coding regions of mitochondrial and chloroplast DNA in plants. Molecular Ecology, 1995, 4:129~131.

    [35]Jhnk N, Siegismund H R. Population structure and post-glacial migration routes of Quercus robur and Quercus petraea in Denmark, based on chloroplast DNA analysis. Scandinavian Journal of Forest Research, 1997, 12:130~137.

    [36]Ferris C, Oliver R P, Davy A J et al. Using chloroplast DNA to trace postglacial migration routes of oaks into Britain. Molecular Ecology, 1995, 4:731~738.

    1998-09-29

    1999-07-12