Evolutionary Dynamics of Human Retroviruses Investigated Through Full-Genome Scanning
Molecular Biology and Evolution
Rega Institute for Medical Research, KULeuven, Leuven, Belgium
Correspondence: E-mail: philippe.lemey@uz.kuleuven.ac.be.
Abstract
To test hypotheses on the differences in retroviral genetic diversity, we compared the evolutionary dynamics of the human immunodeficiency virus type 1 (HIV-1) group M and the primate T-cell lymphotropic virus (PTLV) using a full-genome analysis. Evolutionary rates and nonsynonymous/synonymous substitution rate ratios were estimated across the genome using a maximum likelihood sliding window approach, and molecular clock properties were investigated. We confirm a remarkable difference in genetic stability and selective pressure at the interhost level. While there is evidence for adaptive evolution in HIV-1, the evolution of PTLV is almost exclusively characterized by negative selection or nearly neutral processes. For both retroviruses, evolutionary rate estimates across the genome reflect the differential selective constraints. However, based on the relationship between evolutionary rate and selective pressure and based on the comparison of synonymous substitution rates, the differences in rate between HIV-1 and PTLV cannot be explained by selective forces only. Several evolutionary and statistical assumptions, examined using a Bayesian coalescent method, were shown to have little influence on our inference.
Key Words: evolutionary rate • HIV • HTLV • natural selection • retrovirus
Introduction
The retroviruses comprise a variety of enveloped RNA viruses that infect a wide range of animal species and cause a wide spectrum of diseases. The common denominator of the retrovirus family is its replication strategy including the reverse transcription of the RNA genome into double-stranded DNA and the integration of this DNA into the genome of the host cell. The possibility of culturing T-cells in vitro has led to the identification of two human pathogenic retroviruses, an oncovirus called human T-cell lymphotropic virus (HTLV) (Poiesz et al. 1980) and a lentivirus called human immunodeficiency virus (HIV) (Barre-Sinoussi et al. 1983). In all infected patients, HIV causes an acquired immunodeficiency syndrome (AIDS) (Barre-Sinoussi 1996), while HTLV infection can lead to adult T-cell leukemia or tropical spastic paraparesis in a minority of affected individuals (Yoshida, Miyoshi, and Hinuma 1982; Gessain et al. 1985; Osame et al. 1986). Both retroviruses originate from cross-species transmissions from simians to humans (Sharp, Robertson, and Hahn 1995; Vandamme, Salemi, and Desmyter 1998).
Although both viruses have a comparable morphology, life cycle, and genetic structure, their evolutionary strategy is markedly different. In particular, HTLV and HIV differ greatly in the rate at which they are accumulating nucleotide substitutions over time. For example, the genetic variation in envelope sequences of a single HIV-infected patient is greater than the variation among all HTLV-II–infected Amerindians (Pedroza Martins, Chenciner, and Wain-Hobson 1992). This might seem surprising because both viruses have a highly error-prone reverse transcriptase, and thus the capacity of generating considerable sequence diversity. However, HTLV does not seem to exploit this capacity but chiefly maintains its high proviral load through clonal expansion of the HTLV-infected cells (Wattel et al. 1995). Therefore, the cellular DNA polymerase that—in contrast to the reverse transcriptase—possesses a proofreading mechanism, mainly replicates HTLV genomes. Although clonal expansion has been frequently demonstrated in HTLV-infected patients (Wattel et al. 1995; Cavrois et al. 1996; Cavrois et al. 1998; Gabet et al. 2000), great debate still remains on the contribution of reverse transcription in HTLV evolution. A few studies have provided evidence that persistent virion replication plays an important role in maintaining the high proviral load of HTLV-1 (Taylor et al. 1999; Machuca and Soriano 2000). Based on these results and mathematical modeling studies (Wodarz and Bangham 2000), it has been argued that selection forces, like the immune response and the limited availability of appropriate target cells during transmission and persistence, are mainly responsible for the limited sequence diversity (Overbaugh and Bangham 2001).
As a consequence of the remarkable difference in genetic stability, different concepts are used to quantify the evolutionary rate of HIV and HTLV. HIV sequences sampled at different time points usually show a statistically significant accumulation of genetic differences over time (e.g., Shankarappa et al. 1999), a temporal aspect that can be used to estimate evolutionary rates (Drummond et al. 2003). In general, fast evolving viruses, to which this reasoning can be applied, are considered as measurably evolving populations (Drummond et al. 2003). The low fixation rate for HTLV makes serial sampling over several years, or even decades, useless (Gessain, Gallo, and Franchini 1992). Here, the molecular clock needs to be calibrated using a known date for a node in the HTLV phylogeny. Unfortunately, no fossil records are available to accomplish this task for viruses. In the case of primate T-cell lymphotropic viruses (PTLV), researchers have relied upon phylogeographical dispersal patterns (Holmes 2004), and used an anthropological migration date of the human host to calibrate a molecular clock for the viral phylogeny (Yanagihara et al. 1995; Salemi, Desmyter, and Vandamme 2000). In this study, we use a scanning approach to obtain systematic evolutionary rates across the HIV-1 and PTLV genome. Using the same approach we quantify the intensity of selective pressure across the genome based on the nonsynonymous/synonymous substitution rate ratio (dN/dS). For both viruses, we investigate the relationship between evolutionary rate and selective pressure at the interhost level. Based on this relationship, we predict the evolutionary rate under different selective constraints and show that the latter cannot explain the observed difference in sequence diversity between both retroviruses. A Bayesian coalescent approach applied to the HIV data indicated that several strong assumptions of the scanning analysis had little influence on our inference.
Materials and Methods
Sequence Data
Fifty-six full-length coding sequences were selected as representatives for the HIV-1 group M subtypes, in order to obtain a wide and homogeneous spread in time with respect to the sampling years. (Accession numbers: AY043175, M62320, AF004885, AF069671, U51190, AF286237, K02007, AJ006287, AF042100, M93258, M38429, U63632, M17449, AF042101, U21135, M17451, AF110959, AF110967, AF290028, AF110969, AF110972, AF110974, AF286231, AF067154, AF067158, AF067157, AB023804, AF286233, AF286234, AF286227, AF286224, U88824, M27323, K03454, U88822, AF005494, AF075703, AJ249238, AJ249236, AJ249237, AF084936, U88826, AF061642, AF190128, AF005496, AF082394, AJ249239, AJ249235, AF286228, U52953.) This data set includes sequences from two Australian transmission chains with known transmission dates and sampling dates: a mother-child (MC) transmission pair, for which the date of transmission was 1983, and a cohort of patients infected between 1982 and 1984 by blood products obtained from the same donor, designated as the Sydney Blood Bank Cohort III (SBBC III) (Lemey et al. 2003).
The HTLV data set contains 29 strains representative for PTLV-1, PTLV-2, and PTLV-3. These sequences include all coding regions in the PTLV genome (gag, protease, polymerase, envelope, and tax). (Accession numbers: AF139170, AF412314, J02029, U19949, L36905, NC_003323, L11456, AF074965, Y13051, X89270, AF033817, L02534, L03561, M10060, L20734, Y07616, NC_001815, U90557, AF042071, AF326583, AF139382, Z46900, M86840, AF074966, AF259264, Y14365, AF326584, M67514, AY590142.)
Phylogenetic Analyses
Sequences were aligned using ClustalW (Thompson, Higgins, and Gibson 1994) and manually edited according to their codon-reading frame in Se-Al (http://evolve.zoo.ox.ac.uk). Appropriate nucleotide substitution models were determined with Modeltest v3.06 based on hierarchical likelihood ratio testing (Posada and Crandall 1998). For both the complete HIV-1 and PTLV alignments, the general time reversible model with gamma distributed rate variation among sites and a proportion of invariant sites was selected. Phylogenetic trees were reconstructed in PAUP*4.0b10 (Swofford 1998), using a maximum likelihood approach: model parameters were estimated on an initial neighbor-joining (NJ) tree, and tree topologies were evaluated using a heuristic search approach that implemented both tree bisection-reconnection and nearest-neighbor interchange perturbations. The reliability of the internal branches in the trees was evaluated using 1,000 NJ bootstrap replicates.
Evolutionary Rates and Molecular Clock Testing
Substitution rates were estimated using maximum likelihood in a sliding window fashion (window size = 801 bp, step size = 81 bp) in a nonoverlapping full-genome DNA alignment. For the HIV data set, evolutionary rates and divergence times with 95% confidence intervals (CI) were estimated under the single rate dated tip (SRDT) model developed by Rambaut (2000). For the PTLV data set, the molecular clock was calibrated by setting the divergence time between the HTLV-1c subtype (MEL5), detected exclusively in Melanesia and Australia, and the other subtypes at 50,000 years ago. To calculate CIs, we used a time interval (40,000–60,000 years ago) that expresses the uncertainty on the earliest human migration to these islands (Roberts, Jones, and Smith 1990; Cavalli-Sforza, Menozzi, and Piazza 1994; Van Dooren, Salemi, and Vandamme 2001). The molecular clock hypothesis was tested using a likelihood ratio test (LRT). Calculations were performed using the PAML package (Yang 1997) and Rhino (http://evolve.zoo.ox.ac.uk/). A Perl script to generate input files for the sliding window analyses is available on request.
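The window layout described above can be sketched in a few lines of Python. This is an illustration only, not the script referred to in the text: the function names, the 0-based half-open coordinates, the choice to drop a final partial window, and the example alignment length are all assumptions.

```python
def sliding_windows(alignment_length, window_size=801, step_size=81):
    """Return (start, end) column ranges, 0-based with end exclusive.

    Assumption: only full-size windows are kept, so a trailing
    partial window at the 3' end of the alignment is dropped.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    windows = []
    start = 0
    while start + window_size <= alignment_length:
        windows.append((start, start + window_size))
        start += step_size
    return windows


def slice_alignment(sequences, start, end):
    """Cut the same column range out of every aligned sequence."""
    return {name: seq[start:end] for name, seq in sequences.items()}


# Hypothetical alignment length, used only to show the call.
example_windows = sliding_windows(alignment_length=8670)
print(len(example_windows), example_windows[0], example_windows[-1])
```

Both 801 and 81 are multiples of 3, so every window starts and ends on a codon boundary when the alignment itself is in frame, which is what allows the same windows to be reused for codon-based analyses.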
dN/dS Scanning and Identification of Positively Selected Sites
The dN/dS was estimated using a maximum likelihood method based on a codon substitution model that assumes a single dN/dS among sites and among lineages (Nielsen and Yang 1998). The same sliding window approach was used as for the evolutionary rate scanning (window size = 801 bp, step size = 81 bp), but branch lengths of the fixed topology were optimized by codeml for each window. Approximate rates of synonymous substitutions/codon site/year were estimated for the HIV-1 data set by dividing the total number of expected synonymous substitutions per codon site for all branches in the tree, estimated using codeml, by the total number of years represented by all branches of the tree, estimated for the complete genome sequences using the SRDT model implemented in baseml. Positively selected sites in the nonoverlapping full-genome DNA alignment were identified under a codon substitution model that allows for variable nonsynonymous/synonymous substitution rate ratios (dN/dS) among sites (Nielsen and Yang 1998; Yang et al. 2000). The LRT was used to evaluate whether an unconstrained discrete distribution to model heterogeneous dN/dS ratios among sites (M3) is significantly better than assuming a single dN/dS ratio for all sites (M0). If the dN/dS ratio for any site class is above 1, the Bayes theorem is used to calculate the posterior probability that each site, given its data, is from such a site class. All calculations were performed using the PAML package (Yang 1997).
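The Bayes step mentioned above can be illustrated with a short Python sketch. It is not the PAML implementation: the function name is hypothetical, and the class proportions and per-class site likelihoods in the example are invented numbers that would normally come from the fitted codon model.

```python
def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each dN/dS site class for one codon site.

    class_proportions: prior proportion p_k of each site class.
    site_likelihoods:  likelihood f(x | class k) of the data at this site.
    Bayes' theorem gives P(k | x) = p_k * f_k / sum_j (p_j * f_j).
    """
    if len(class_proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per site class")
    joint = [p * f for p, f in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]


# Hypothetical three-class discrete model; the last class is taken
# to be the one with dN/dS > 1.
proportions = [0.60, 0.31, 0.09]
likelihoods = [1.0e-6, 4.0e-6, 2.0e-5]
posteriors = site_class_posteriors(proportions, likelihoods)
print(posteriors)
```

A site would then be reported as positively selected when the posterior for the class with dN/dS above 1 exceeds a chosen cutoff; the cutoff itself is a reporting decision and is not fixed by the sketch.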
Bayesian Estimation of Evolutionary Rates
The HIV-1 data set, partitioned in 10 nonoverlapping windows (window size = 867 bp), was analyzed using a Bayesian coalescent framework for the joint estimation of population parameters, substitution parameters, dates of divergence, and tree topology (Drummond et al. 2003). The program BEAST was used to perform Metropolis-Hastings Markov Chain Monte Carlo (MCMC) sampling that integrates over different coalescent trees (Drummond and Rambaut 2003). An exponential growth model was used as demographic function describing the change in population size over time. Two independent MCMC chains were run for 12.5 × 10⁶ generations, sampling every 1,000th generation. The burn-in was set after sampling 10⁶ generations.
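As a small worked example of the sampling scheme, the following Python sketch counts how many MCMC samples remain after burn-in. It reads the chain length as 12.5 million generations and the burn-in as the first one million generations; the function name is hypothetical, and whether the initial state is logged as an extra sample is ignored here.

```python
def retained_samples(chain_length, sample_every, burnin_generations):
    """Number of logged samples left after discarding the burn-in."""
    if sample_every <= 0:
        raise ValueError("sample_every must be positive")
    if not 0 <= burnin_generations <= chain_length:
        raise ValueError("burn-in must lie within the chain")
    total = chain_length // sample_every
    discarded = burnin_generations // sample_every
    return total - discarded


per_chain = retained_samples(12_500_000, 1_000, 1_000_000)
both_chains = 2 * per_chain  # two independent chains were run
print(per_chain, both_chains)
```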
Results
Phylogenetic trees were reconstructed for the full-length PTLV and HIV-1 sequences using maximum likelihood (ML) methods (figs. 1 and 2). The root of the PTLV tree was placed on the branch leading to PTLV-1, as suggested in a previous amino acid analysis that included a bovine lymphotropic virus strain as an out-group (Salemi, Desmyter, and Vandamme 2000). This root also yielded the highest likelihood under the molecular clock assumption in our analysis (data not shown). For the HIV-1 group M tree, we chose the root that resulted in the highest likelihood under the SRDT model. Both transmission chains (MC and SBBCIII) are each represented as a highly supported monophyletic cluster. We assumed both phylogenetic reconstructions to be reliable evolutionary hypotheses and subsequently used them to perform the full-genome scanning.
FIG. 1.— Maximum likelihood phylogenetic tree for 29 full-genome PTLV strains. Types and subtypes of the viral strains are indicated at the tips of the tree. Numbers at the nodes indicate the percentage of bootstrap samples (of 1,000) in which the right cluster is supported (only values >80% are shown). The node used for calibration of the molecular clock is indicated with an arrow.
FIG. 2.— Maximum likelihood phylogenetic tree for 56 full-genome HIV-1 group M strains. Subtypes of the viral strains are indicated at the tips of the tree. Numbers at the nodes indicate the percentage of bootstrap samples (of 1,000) in which the right cluster is supported (only values >80% are shown). The clusters of both transmission chains (MC and SBBCIII) are indicated with rectangles.
Using a sliding window approach, evolutionary rates were estimated across the complete coding genomes using the SRDT model (fig. 3a and b). Not only do the substitution rates differ by orders of magnitude between the two retroviruses, but the variability across the genome also shows distinct patterns. For HIV-1, the estimates vary between 4.27 × 10⁻⁴ and 2.71 × 10⁻³ substitutions/site/year with a relatively low evolutionary rate in pol, a high rate in env and accessory genes, and an intermediate rate in gag. These differences are in agreement with expectations of varying selective forces or functional constraints along the HIV genome. PTLV rates were estimated under the single rate (SR) model using an anthropological calibration date. In contrast to HIV-1 rates, the highest PTLV rates are observed in the integrase part of pol (6.64 × 10⁻⁷ substitutions/site/year), while the lowest rate is observed in the 3' end of env (2.64 × 10⁻⁷). It should be noted that the CIs for evolutionary rates are estimated differently for HIV and PTLV and are therefore not directly comparable. While the CIs for PTLV rely upon a normal approximation of the maximum likelihood estimates, more realistic CIs for HIV are estimated by determining the range of evolutionary rates that we would be unable to reject under an LRT (Rambaut 2000). We have also provided estimates of the synonymous substitution rate per codon site per year across the genome of both retroviruses. These estimates are not based on a clock model applied to each window but on the temporal information in the complete genome alignment (see Materials and Methods). Therefore, the synonymous rate might only have an approximate scaling but its pattern of variability across the genome is still very useful for relative comparisons. Because the synonymous substitutions are expected to be approximately neutral, their substitution rate should not be influenced by selective constraints.
As expected, there is no obvious relationship between the variability in nucleotide substitution rates and the variability in synonymous substitution rates across the genome for both HIV-1 and PTLV. For PTLV, there is a particular decrease in synonymous substitutions at the border region between pol and env and in tax. The latter can be explained by the fact that the tax reading frame partly overlaps with the rex gene.
FIG. 3.— Results of the full-genome scanning for HIV-1 group M and PTLV. (a) HIV-1 evolutionary rates estimated under the SRDT model using a sliding window approach, with a window size of 801 bp and an increment of 81 bp, are plotted in black. CIs were only estimated for every 10th window to reduce computational burden. Synonymous substitution rates per codon site per year are plotted in gray. (b) PTLV evolutionary rates (in black) with CIs (dotted lines) estimated, using the early migration of Melanesian settlers as calibration, according to the same sliding window approach. The synonymous substitution rate is plotted in gray according to the secondary y axis. (c) Molecular clock scanning for the HIV-1 group M data set: LRT statistics for the SRDT model against the DR model and for the SR model against the SRDT model. The upper and lower horizontal bars represent the 95% confidence limit, according to the χ² distribution, for the test statistic under the null hypothesis in the former and latter comparison, respectively. In addition, maximum likelihood estimates for the MRCA of the MC pair are plotted (secondary y axis). (d) Molecular clock scanning for the PTLV data set: LRT statistics for the SR model against the DR model. The horizontal bar represents the 95% confidence limit for the test statistic under the null hypothesis. (e) and (f) Nonsynonymous/synonymous substitution rate ratio scanning for the HIV-1 and PTLV genomes, respectively. The positively selected sites, identified for the HIV-1 data set, are indicated as vertical bars across the genome. The height of the bars represents the posterior probability that the site is from the class of positively selected sites.
Because the evolutionary rates were estimated under the rate constancy assumption, we tested the molecular clock hypothesis in each window using an LRT (fig. 3c and d). For the HIV-1 data, the molecular clock test compares the SRDT model against the different rates (DR) model. Although the molecular clock was significantly rejected for the full-length HIV-1 group M sequences (P < 0.001), there appears to be considerable variability in the LRT statistic along the genome (fig. 3c). There are small regions, most pronounced in pol, where the molecular clock hypothesis cannot be significantly rejected. An LRT comparing the SR model against the DR model indicates that there is also little evidence for rate constancy among lineages in the PTLV phylogeny (fig. 3d). Only in the tax gene the molecular clock could not be significantly rejected.
For the HIV-1 data, we also performed an LRT of the SRDT model against the SR model that makes no accommodation for the temporal sampling of the isolates (fig. 3c). If the SR model is significantly rejected in favor of the SRDT, it follows that incorporating isolation dates into an SR model significantly improves the likelihood in these windows. Although clocklike behavior was hardly observed, incorporating isolation dates can be considered as a significant improvement for the clock model in almost all genes. Only in the protease part of pol, there is a small region where the sequences cannot be considered temporally distinct so that using differences in isolation times to estimate substitution rates and test the molecular clock is unjustified. Simulations have shown that even when the clock is rejected, the confidence limits may sometimes still include the true substitution rate, provided that the variation among lineages is small (Jenkins et al. 2002). Here, we have no knowledge about the true rate, but we do know the time of transmission between the MC pair included in our HIV-1 data set. On the secondary axis, the ML estimates for the most recent common ancestor (MRCA) of the MC pair are plotted along the genome. Because this node was consistently estimated as 1983 in previous analyses (Lemey et al. 2003), in agreement with the transmission date, we considered the absolute difference between the "true" date and the ML estimate as a measure of over- or underestimation. Interestingly, this difference had a weak positive correlation with the likelihood of clock rejection (r = 0.36) and a weak negative correlation with the likelihood of SR rejection in favor of SRDT (r = –0.39). This suggests that the more temporal information the sequences contain and the more clocklike this information, the better our estimates.
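The per-window likelihood ratio tests can be illustrated with a short, standard-library Python sketch. This is not the PAML or Rhino code: the function names are hypothetical, the log-likelihoods and degrees of freedom in the example are invented, and the degrees of freedom (the difference in free parameters between the two nested models) must be supplied by the caller.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from the closed forms for df = 1 (odd) or df = 2 (even).
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(z)), 0.5
    else:
        q, a = math.exp(-z), 1.0
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (LRT statistic, p-value) for two nested models."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical window: clock model (null) against the different rates model.
stat, p_value = likelihood_ratio_test(lnl_null=-10250.0, lnl_alt=-10230.0, df=27)
print(stat, p_value)
```

The clock model is rejected for a window when the p-value falls below the chosen significance level, which is equivalent to the statistic exceeding the 95% limit drawn as a horizontal bar in the scanning plots.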
We investigated the selective constraints across the genome by estimating the dN/dS ratio using the same sliding window approach (fig. 3e and f). For PTLV, these ratios fluctuate between 0.031 and 0.128; for HIV-1, the values range between 0.129 and 0.723. Although this confirms a difference in selective constraints between both retroviruses, this difference is less pronounced than the difference in evolutionary rates. The dN/dS ratios are considerably higher for HIV-1; however, they do not exceed the threshold for positive selection. If only a few amino acid sites would be positively selected (e.g., Hughes and Nei 1988), an average dN/dS for a gene region is usually not sensitive enough to uncover adaptive evolution at the molecular level (Nielsen and Yang 1998). Therefore, we also tested for positively selected sites in the complete nonoverlapping DNA alignments using codon substitution models that allow the selection regimes to vary across codon sites (Nielsen and Yang 1998; Yang et al. 2000). For both retroviruses, an unconstrained discrete distribution to model heterogeneous dN/dS ratios among sites (M3) fits significantly better than a model that assumes a single dN/dS among sites (M0) (table 1). The parameters, estimated for the three classes in the discrete model, are markedly different between HIV-1 and PTLV. For HIV-1, the discrete model suggests 9% of sites in the complete genome under positive selection with dN/dS = 2.51, while no class of positively selected sites is identified for PTLV. The positively selected sites, identified by an empirical Bayes approach and plotted in figure 3e, are distributed all over the HIV genome with the highest density in env and accessory genes.
Table 1 Log-Likelihood Values and Parameter Estimates for the Codon Substitution Models Applied to the HIV-1 and PTLV Complete Genome Data
For both viral genomes, the dN/dS pattern appears to correlate with the evolutionary rate pattern. Linear regression analysis confirms a significant relationship between evolutionary rate and dN/dS for HIV-1 (P < 0.01, R2 = 0.46) and PTLV (P < 0.01, R2 = 0.58). For HIV-1, the regression markedly improved (R2 = 0.69) when only data points were included for which the MRCA estimate for the MC pair did not deviate more than 5 years from the actual transmission event. Based upon these relationships, we calculated evolutionary rate prediction intervals for dN/dS values between 0 and 1. By plotting these prediction intervals onto the same log-scale (fig. 4), we illustrate that the HIV-1 and PTLV evolutionary rates would consistently be about 3 logs different, independent of the dN/dS value. A similar conclusion can be reached by investigating the synonymous substitution rate for both retroviruses. As expected, there is no strong correlation between the synonymous rate and the dN/dS values (r = –0.08 and r = –0.39 for HIV-1 and PTLV, respectively), and the synonymous rate fluctuates within ranges that are also about 3 logs different between HIV-1 and PTLV.
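The regression and prediction-interval step can be sketched in Python as follows. This is a generic ordinary least squares example under stated assumptions, not the authors' analysis: the data points are invented, the function names are hypothetical, whether rates were regressed on a raw or log scale is not specified here, and the Student t critical value is passed in by the caller (3.182 is the two-sided 95% value for 3 degrees of freedom).

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "n": n, "mean_x": mean_x, "sxx": sxx,
        "slope": slope, "intercept": intercept,
        "r_squared": 1.0 - sse / syy,          # share of variance explained
        "resid_sd": math.sqrt(sse / (n - 2)),  # residual standard deviation
    }


def prediction_interval(fit, x0, t_crit):
    """Interval for a new observation at x0: (lower, predicted, upper)."""
    y_hat = fit["intercept"] + fit["slope"] * x0
    spread = math.sqrt(1.0 + 1.0 / fit["n"]
                       + (x0 - fit["mean_x"]) ** 2 / fit["sxx"])
    half_width = t_crit * fit["resid_sd"] * spread
    return y_hat - half_width, y_hat, y_hat + half_width


# Hypothetical per-window values: dN/dS and rate (in 1e-3 subst./site/year).
dnds = [0.1, 0.2, 0.3, 0.4, 0.5]
rate = [1.0, 1.4, 2.1, 2.4, 3.1]
fit = fit_line(dnds, rate)
low, mid, high = prediction_interval(fit, x0=0.25, t_crit=3.182)
print(fit["r_squared"], low, mid, high)
```

Note that overlapping sliding windows are not independent observations, so intervals from such a fit are likely to be narrower than they should be.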
FIG. 4.— Prediction intervals for the evolutionary rate of HIV-1 and PTLV based on the linear regression between dN/dS and evolutionary rate.
To obtain the relationship between dN/dS and evolutionary rate, we made several strong assumptions. The sliding window analysis was performed using a single phylogenetic tree across the complete genome. This approach not only ignores the error in phylogenetic reconstruction but also assumes a single evolutionary history for all genes. Especially for HIV, the latter might be problematic because recombination is a relatively frequent event in the evolution of this virus (Robertson et al. 1995). Moreover, the sliding window approach results in estimates for overlapping data partitions, thereby violating the data independency assumption in the linear regression model. To assess the impact of these (violated) assumptions, we applied a novel Bayesian coalescent method on nonoverlapping data partitions of the HIV-1 full-genome alignment. For the partitioned genome, we compared a model that assumes a single phylogeny for each locus (linked) with a recently developed multilocus model that accommodates an independent genealogical history for each locus while sharing the same demographic history (unlinked) (Lemey et al. 2004). This Bayesian approach also accommodates for phylogenetic error in each partition and allows estimating appropriate CIs on the parameters. The estimates for the evolutionary rate and date for the MRCA are listed in table 2. Both for the linked and the unlinked model, linear regression analysis indicates a significant relationship between evolutionary rate and dN/dS (P = 0.01), both with the same amount of variance explained by the dN/dS (R2 = 0.61). So, this relationship appears to be robust to some violations in the assumptions of our ML scanning approach. In comparison to the unlinked model, the evolutionary rates are only marginally lower and date for the MRCA is only marginally earlier for the linked model.
Table 2 Estimates of Evolutionary Rates and Dates for the Most Recent Common Ancestor of HIV-1 Group M Using the Bayesian Coalescent Method
Discussion
We present here a comparative analysis of HIV-1 and PTLV evolutionary rates and selective constraints. Although both pathogens have many retroviral features in common, their evolutionary dynamics show remarkable differences. Using a scanning approach we have provided systematic evolutionary rates across the HIV-1 and PTLV genome. The range of the evolutionary rate estimates and the variability across the genome are quantitatively similar to previous studies based on single or multiple genes (Korber et al. 1997; Salemi et al. 2001). It should be noted that the extent of rate variability in sliding window analysis depends on the window size. For example, smaller window sizes might reveal subtler differences in evolutionary rate, but this might also result in a loss of the temporal distinction between the sequences. HIV, with a nucleotide substitution rate ranging from 4.27 × 10⁻⁴ to 2.71 × 10⁻³ substitutions/site/year, has one of the fastest evolving genomes (Wain-Hobson 1993). This lentivirus owes its evolutionary potential to a combination of a high mutation rate (Mansky and Temin 1995; Gao et al. 2004), a short generation time (Ho et al. 1995; Wei et al. 1995), and a large number of infected cells (Buckley et al. 2001). With a range of 2.64 × 10⁻⁷ to 6.64 × 10⁻⁷ substitutions/site/year, PTLV evolutionary rates are several orders of magnitude lower than HIV-1. PTLV is also subjected to stronger purifying selection than HIV. About 10% of the sites in the HIV genome appear to be positively selected, in agreement with the findings of widespread adaptive evolution in the HIV-1 genome (Yang, Bielawski, and Yang 2003). No class of positively selected sites was inferred for the complete PTLV genome.
The relationship between dN/dS and evolutionary rate we demonstrated is an expected one. However, it forms the basis for further comparative analyses between HIV-1 and PTLV. Extrapolating on this relationship, HIV-1 and PTLV evolutionary rates are about 3 logs different, independent of the dN/dS ratio (fig. 4). A similar conclusion was obtained by comparing the rates of synonymous substitution. Therefore, different selective constraints do not provide an adequate explanation for the observed differences in evolutionary rate. Instead, the reason should most probably be sought in the underlying process by which genetic variation is generated. Differences in mutation rate between HIV (3.5 × 10⁻⁵ per base per cycle) and HTLV (7 × 10⁻⁶ per base per cycle) are also insufficient to explain the enormous substitution rate difference (Mansky and Temin 1995; Mansky 2000). It has been argued that the number of successive replication cycles is probably more important than mutation rate in establishing viral genetic variation (Coffin 1990). However, HTLV maintains high proviral loads while remaining genetically stable (Wattel et al. 1992; Albrecht et al. 1998; Gabet et al. 2000). This discrepancy has been resolved by the finding of clonal expansion of the infected cells (Wattel et al. 1995). Cell-associated provirus replication makes use of a DNA polymerase with proofreading capacity and generates only limited genetic variation. It has been suggested through a squirrel monkey model that HTLV-1 infection is characterized by a transient phase of reverse transcription followed by the persistent multiplication of infected cells (Mortreux et al. 2001). However, the exact contribution of replication through reverse transcription has yet to be elucidated. Recent findings suggest an important role for persistent virion replication in maintaining high proviral loads, and other factors are limiting genetic diversity for HTLV (Taylor et al. 1999; Wodarz and Bangham 2000; Overbaugh and Bangham 2001). For example, cells that start to express the transactivator protein Tax after infection are likely to be killed by a Tax-specific cytotoxic T lymphocyte (CTL) response before completing the viral replication cycle (Bangham 2000; Hanon et al. 2000). Such mechanisms are not selective constraints in an evolutionary sense because they act irrespective of the phenotype of newly generated variants (except in the case of a specific CTL escape mutant in tax). Therefore, our analysis using dN/dS ratios is not able to distinguish between such constraints and predominant clonal expansion.
The difference in natural selection between HIV-1 and PTLV most probably results from a different impact of the host immune system. It is well known that HIV successfully fixes mutations to evade immune responses (generated by neutralizing antibodies, T-helper cells, and CTL). HTLV is able to transform cells and spreads through cell-to-cell contact, suggesting a limited exposure to selection pressure exerted by antibodies (Bangham 2003). However, HTLV is persistently transcribed, and there is a strong CTL response to HTLV-1 with tax as the dominant target antigen (Kannagi et al. 1991). Niewiesk et al. (1995) showed that CTL selection favored the emergence of variant Tax sequences. The latter, however, appeared defective in their transactivating activity (Niewiesk et al. 1995). This suggests that functional constraints, and thus purifying selection, might not allow for significant immune escape. However, immune escape for HTLV infection needs to be further investigated.
We are aware that this analysis compares viral populations with a distinct epidemiological and demographic history. HIV-1 group M originated through a relatively recent cross-species transmission of simian immunodeficiency virus from chimpanzees to humans (Gao et al. 1999; Korber et al. 2000; Salemi et al. 2001), resulting in an explosive spread in the human population. PTLV viruses have frequently crossed the species barrier between humans and simians (Vandamme, Salemi, and Desmyter 1998), and the contemporary strains are the result of evolution during a considerably longer time span (Salemi, Desmyter, and Vandamme 2000; Van Dooren, Salemi, and Vandamme 2001). Due to the genetic stability of HTLV, we have chosen to analyze a comprehensive data set including HTLV-1, HTLV-2, and interspersed simian T-cell lymphotropic virus (STLV) sequences. A calibration date for a node in the phylogeny was provided by anthropological information (Yanagihara et al. 1995; Salemi, Desmyter, and Vandamme 2000). HIV sequences sampled at different time points usually have a statistically significant accumulation of genetic differences over time, which allows estimating the rate of molecular evolution (Drummond et al. 2003). The PTLV and HIV-1 data sets inevitably represent very different scales of evolution. While the time to the most recent common ancestor (TMRCA) is around 70 years for HIV-1 group M (Korber et al. 2000), the TMRCA for the PTLV phylogeny is about 4 orders of magnitude larger (Salemi 2000). Therefore, we also attempted to analyze an HTLV-1 subset excluding all simian strains. However, molecular clock estimates were not powerful enough to correlate with dN/dS estimates (data not shown). Crossing the species barrier might also have had its influence on the evolutionary parameters we have inferred for PTLV.
However, in the light of recent findings it seems plausible that the effect of different hosts on the evolution of the virus is subtler than the differences we observe between PTLV and HIV-1. Gabet, Gessain, and Wattel (2003) have shown that, as for HTLV-1, STLV-1 combines extremely high proviral loads with inter- and intra-animal genetic stability. Moreover, the same paradoxical combination for this simian oncovirus could also be explained by the demonstration of clonal expansion in vivo (Gabet, Gessain, and Wattel 2003).
The sliding window approach estimated evolutionary parameters under a single-tree topology. However, frequent recombination might result in different phylogenies along the HIV-1 genome. Moreover, due to overlapping data in the sliding window analyses, the windows cannot be considered as completely independent and we might be too confident in the relation between evolutionary rate and dN/dS. To address this, we also estimated evolutionary rates using a Bayesian coalescent method that allows comparing linked or unlinked evolutionary histories among nonoverlapping partitions of the HIV-1 genome. Although the discrete model of unlinked evolutionary histories will not fully accommodate recombination, our comparison is at least expected to indicate a possible bias of assuming a single evolutionary history. As in the sliding window analysis, these rates were also significantly correlated with the dN/dS values. Thus for HIV-1, this relationship appears to be robust to some of our statistical model assumptions. Interestingly, the date for the MRCA of HIV-1 group M (1929, CI: 1920–1938) is in perfect agreement with previous estimates (Korber et al. 2000; Salemi et al. 2001), and, considering the CIs, this estimate is only marginally earlier than the MRCAs for the single loci (table 2). Previous simulation studies have suggested that assuming a single evolutionary history will result in an overestimation of the time to the MRCA when recombination has significantly shaped the sequence data (Schierup and Forsberg 2001; Worobey 2001). Our findings suggest that this effect of recombination can be noticeable when estimating rates and dates for HIV sequences, but it might be less severe than expected. A full discussion of estimates under the unlinked model compared to simulation results is available elsewhere (Lemey et al. 2004).
In conclusion, our scanning approach can reveal the relationship between selective pressure and evolutionary rate, which provides useful information on the evolutionary dynamics of viral populations.
Acknowledgements
This work was supported by the Flemish Fonds voor Wetenschappelijk Onderzoek (FWO G.0288.01); P.L. was supported by the Flemish Institute for Promotion and Innovation through Science and Technology in Flanders (IWT-Vlaanderen).
References
Albrecht, B., N. D. Collins, G. C. Newbound, L. Ratner, and M. D. Lairmore. 1998. Quantification of human T-cell lymphotropic virus type 1 proviral load by quantitative competitive polymerase chain reaction. J. Virol. Methods 75:123–140.
Bangham, C. R. 2000. The immune response to HTLV-I. Curr. Opin. Immunol. 12:397–402.
Bangham, C. R. 2003. The immune control and cell-to-cell spread of human T cell lymphotropic virus type 1. J. Gen. Virol. 84:3177–3189.
Barre-Sinoussi, F. 1996. HIV as the cause of AIDS. Lancet 348:31–35.
Barre-Sinoussi, F., J. C. Chermann, F. Rey et al. (12 co-authors). 1983. Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science 220:868–871.
Buckley, T. R., C. Simon, H. Shimodaira, and G. K. Chambers. 2001. Evaluating hypotheses on the origin and evolution of the New Zealand alpine cicadas (Maoricicada) using multiple-comparison tests of tree topology. Mol. Biol. Evol. 18:223–234.
Cavalli-Sforza, L., L. Menozzi, and A. Piazza. 1994. The history and geography of human genes. Princeton University Press, Princeton, N.J.
Cavrois, M., A. Gessain, S. Wain-Hobson, and E. Wattel. 1996. Proliferation of HTLV-1 infected circulating cells in vivo in all asymptomatic carriers and patients with TSP/HAM. Oncogene 12:2419–2423.
Cavrois, M., I. Leclercq, O. Gout, A. Gessain, S. Wain-Hobson, and E. Wattel. 1998. Persistent oligoclonal expansion of human T-cell leukemia virus type 1-infected circulating cells in patients with tropical spastic paraparesis/HTLV-1 associated myelopathy. Oncogene 17:77–82.
Coffin, J. M. 1990. Genetic variation in avian retroviruses. Dev. Biol. Stand. 72:123–132.
Drummond, A. J., G. K. Nicholls, A. G. Rodrigo, and W. Solomon. 2002. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161:1307–1320.
Drummond, A., and A. Rambaut. 2003. BEAST v1.0. (http://evolve.zoo.ox.ac.uk/beast/).
Drummond, A. J., O. G. Pybus, A. Rambaut, R. Forsberg, and A. G. Rodrigo. 2003. Measurably evolving populations. Trends Ecol. Evol. 18:481–488.
Gabet, A. S., A. Gessain, and E. Wattel. 2003. High simian T-cell leukemia virus type 1 proviral loads combined with genetic stability as a result of cell-associated provirus replication in naturally infected, asymptomatic monkeys. Int. J. Cancer 107:74–83.
Gabet, A. S., F. Mortreux, A. Talarmin, Y. Plumelle, I. Leclercq, A. Leroy, A. Gessain, E. Clity, M. Joubert, and E. Wattel. 2000. High circulating proviral load with oligoclonal expansion of HTLV-1 bearing T cells in HTLV-1 carriers with strongyloidiasis. Oncogene 19:4954–4960.
Gao, F., E. Bailes, D. L. Robertson et al. (12 co-authors). 1999. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436–441.
Gao, F., Y. Chen, D. N. Levy, J. A. Conway, T. B. Kepler, and H. Hui. 2004. Unselected mutations in the human immunodeficiency virus type 1 genome are mostly nonsynonymous and often deleterious. J. Virol. 78:2426–2433.
Gessain, A., F. Barin, J. C. Vernant, O. Gout, L. Maurs, A. Calender, and G. de The. 1985. Antibodies to human T-lymphotropic virus type-I in patients with tropical spastic paraparesis. Lancet 2:407–410.
Gessain, A., R. C. Gallo, and G. Franchini. 1992. Low degree of human T-cell leukemia/lymphoma virus type I genetic drift in vivo as a means of monitoring viral transmission and movement of ancient human populations. J. Virol. 66:2288–2295.
Hanon, E., S. Hall, G. P. Taylor, M. Saito, R. Davis, Y. Tanaka, K. Usuku, M. Osame, J. N. Weber, and C. R. Bangham. 2000. Abundant tax protein expression in CD4+ T cells infected with human T-cell lymphotropic virus type I (HTLV-I) is prevented by cytotoxic T lymphocytes. Blood 95:1386–1392.
Ho, D. D., A. U. Neumann, A. S. Perelson, W. Chen, J. M. Leonard, and M. Markowitz. 1995. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373:123–126.
Holmes, E. C. 2004. The phylogeography of human viruses. Mol. Ecol. 13:745–756.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170.
Jenkins, G. M., A. Rambaut, O. G. Pybus, and E. C. Holmes. 2002. Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J. Mol. Evol. 54:156–165.
Kannagi, M., S. Harada, I. Maruyama, H. Inoko, H. Igarashi, G. Kuwashima, S. Sato, M. Morita, M. Kidokoro, and M. Sugimoto. 1991. Predominant recognition of human T cell leukemia virus type I (HTLV-I) pX gene products by human CD8+ cytotoxic T cells directed against HTLV-I-infected cells. Int. Immunol. 3:761–767.
Korber, B., I. Loussert-Ajaka, J. Blouin, and S. Saragosti. 1997. A comparison of HIV-1 group M and group O functional and immunogenic domains in the gag p24 protein and the C2V3 region of the envelope protein. Theoretical and Biophysical Group, Los Alamos National Laboratory, Los Alamos, N.M. (Part IV):63–79.
Korber, B., M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288:1789–1796.
Lemey, P., O. G. Pybus, A. Rambaut, A. J. Drummond, D. L. Robertson, P. Roques, M. Worobey, and A. M. Vandamme. 2004. The molecular population genetics of HIV-1 group O. Genetics 167:1059–1068.
Lemey, P., M. Salemi, B. Wang, M. Duffy, W. H. Hall, N. K. Saksena, and A. M. Vandamme. 2003. Site stripping based on likelihood ratio reduction is a useful tool to evaluate the impact of non-clock-like behavior on viral phylogenetic reconstructions. FEMS Immunol. Med. Microbiol. 39:125–132.
Machuca, A., and V. Soriano. 2000. In vivo fluctuation of HTLV-I and HTLV-II proviral load in patients receiving antiretroviral drugs. J. Acquir. Immune Defic. Syndr. 24:189–193.
Mansky, L. M. 2000. In vivo analysis of human T-cell leukemia virus type 1 reverse transcription accuracy. J. Virol. 74:9525–9531.
Mansky, L. M., and H. M. Temin. 1995. Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virol. 69:5087–5094.
Mortreux, F., M. Kazanji, A. S. Gabet, B. de Thoisy, and E. Wattel. 2001. Two-step nature of human T-cell leukemia virus type 1 replication in experimentally infected squirrel monkeys (Saimiri sciureus). J. Virol. 75:1083–1089.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Niewiesk, S., S. Daenke, C. E. Parker, G. Taylor, J. Weber, S. Nightingale, and C. R. Bangham. 1995. Naturally occurring variants of human T-cell leukemia virus type I Tax protein impair its recognition by cytotoxic T lymphocytes and the transactivation function of Tax. J. Virol. 69:2649–2653.
Osame, M., K. Usuku, S. Izumo, N. Ijichi, H. Amitani, A. Igata, M. Matsumoto, and M. Tara. 1986. HTLV-I associated myelopathy, a new clinical entity. Lancet 1:1031–1032.
Overbaugh, J., and C. R. Bangham. 2001. Selection forces and constraints on retroviral sequence variation. Science 292:1106–1109.
Pedroza Martins, L., N. Chenciner, and S. Wain-Hobson. 1992. Complex intrapatient sequence variation in the V1 and V2 hypervariable regions of the HIV-1 gp 120 envelope sequence. Virology 191:837–845.
Poiesz, B. J., A. F. Ruscetti, P. A. Gazdar, P. A. Bunn, J. A. Minna, and R. C. Gallo. 1980. Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma. Proc. Natl. Acad. Sci. USA 77:7415–7419.
Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818.
Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395–399.
Roberts, R. G., R. Jones, and M. A. Smith. 1990. Report of thermoluminescence dates supporting the arrival of people between 50 and 60 kya in southern Australia. Nature 345:153.
Robertson, D. L., P. M. Sharp, F. E. McCutchan, and B. H. Hahn. 1995. Recombination in HIV-1. Nature 374:124–126.
Salemi, M., J. Desmyter, and A. M. Vandamme. 2000. Tempo and mode of human and simian T-lymphotropic virus (HTLV/STLV) evolution revealed by analyses of full-genome sequences. Mol. Biol. Evol. 17:374–386.
Salemi, M., K. Strimmer, W. W. Hall, M. Duffy, E. Delaporte, S. Mboup, M. Peeters, and A. M. Vandamme. 2001. Dating the common ancestor of SIVcpz and HIV-1 group M and the origin of HIV-1 subtypes using a new method to uncover clock-like molecular evolution. FASEB J. 15:276–278.
Schierup, M. H., and R. Forsberg. 2001. Recombination and phylogenetic analysis of HIV-1. Pp. 231–245 in Origins of HIV and emerging persistent viruses, Rome.
Shankarappa, R., J. B. Margolick, S. J. Gange et al. (12 co-authors). 1999. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 73:10489–10502.
Sharp, P. M., D. L. Robertson, and B. H. Hahn. 1995. Cross-species transmission and recombination of ‘AIDS’ viruses. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 349:41–47.
Swofford, D. L. 1998. PAUP* 4.0—phylogenetic analysis using parsimony (*and other methods). Sinauer Assoc., Sunderland, Mass.
Taylor, G. P., S. E. Hall, S. Navarrete et al. (12 co-authors). 1999. Effect of lamivudine on human T-cell leukemia virus type 1 (HTLV-1) DNA copy number, T-cell phenotype, and anti-tax cytotoxic T-cell frequency in patients with HTLV-1-associated myelopathy. J. Virol. 73:10289–10295.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Van Dooren, S., M. Salemi, and A. M. Vandamme. 2001. Dating the origin of the African human T-cell lymphotropic virus type-I (HTLV-I) subtypes. Mol. Biol. Evol. 18:661–671.
Vandamme, A. M., M. Salemi, and J. Desmyter. 1998. The simian origins of the pathogenic human T-cell lymphotropic virus type I. Trends Microbiol. 6:477–483.
Wain-Hobson, S. 1993. The fastest genome evolution ever described: HIV variation in situ. Curr. Opin. Genet. Dev. 3:878–883.
Wattel, E., M. Mariotti, F. Agis, E. Gordien, F. F. Le Coeur, L. Prin, P. Rouger, I. S. Chen, S. Wain-Hobson, and J. J. Lefrere. 1992. Quantification of HTLV-1 proviral copy number in peripheral blood of symptomless carriers from the French West Indies. J. Acquir. Immune Defic. Syndr. 5:943–946.
Wattel, E., J. P. Vartanian, C. Pannetier, and S. Wain-Hobson. 1995. Clonal expansion of human T-cell leukemia virus type I-infected cells in asymptomatic and symptomatic carriers without malignancy. J. Virol. 69:2863–2868.
Wei, X., S. K. Ghosh, M. E. Taylor et al. (12 co-authors). 1995. Viral dynamics in human immunodeficiency virus type 1 infection. Nature 373:117–122.
Wodarz, D., and C. R. Bangham. 2000. Evolutionary dynamics of HTLV-I. J. Mol. Evol. 50:448–455.
Worobey, M. 2001. A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol. Biol. Evol. 18:1425–1434.
Yanagihara, R., N. Saitou, V. R. Nerurkar, K. J. Song, I. Bastian, G. Franchini, and D. C. Gajdusek. 1995. Molecular phylogeny and dissemination of human T-cell lymphotropic virus type I viewed within the context of primate evolution and human migration. Cell. Mol. Biol. 41(Suppl. 1):S145–S161.
Yang, W., J. P. Bielawski, and Z. Yang. 2003. Widespread adaptive evolution in the human immunodeficiency virus type 1 genome. J. Mol. Evol. 57:212–221.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.
Yoshida, M., I. Miyoshi, and Y. Hinuma. 1982. Isolation and characterization of retrovirus from cell lines of human adult T-cell leukemia and its implication in the disease. Proc. Natl. Acad. Sci. USA 79:2031–2035.
(Philippe Lemey, Sonia Van)
Introduction
The retroviruses comprise a variety of enveloped RNA viruses that infect a wide range of animal species and cause a wide spectrum of diseases. The common denominator of the retrovirus family is its replication strategy including the reverse transcription of the RNA genome into double-stranded DNA and the integration of this DNA into the genome of the host cell. The possibility of culturing T-cells in vitro has led to the identification of two human pathogenic retroviruses, an oncovirus called human T-cell lymphotropic virus (HTLV) (Poiesz et al. 1980) and a lentivirus called human immunodeficiency virus (HIV) (Barre-Sinoussi et al. 1983). In all infected patients, HIV causes an acquired immunodeficiency syndrome (AIDS) (Barre-Sinoussi 1996), while HTLV infection can lead to adult T-cell leukemia or tropical spastic paraparesis in a minority of affected individuals (Yoshida, Miyoshi, and Hinuma 1982; Gessain et al. 1985; Osame et al. 1986). Both retroviruses originate from cross-species transmissions from simians to humans (Sharp, Robertson, and Hahn 1995; Vandamme, Salemi, and Desmyter 1998).
Although both viruses have a comparable morphology, life cycle, and genetic structure, their evolutionary strategy is markedly different. In particular, HTLV and HIV differ greatly in the rate at which they are accumulating nucleotide substitutions over time. For example, the genetic variation in envelope sequences of a single HIV-infected patient is greater than the variation among all HTLV-II–infected Amerindians (Pedroza Martins, Chenciner, and Wain-Hobson 1992). This might seem surprising because both viruses have a highly error-prone reverse transcriptase, and thus the capacity of generating considerable sequence diversity. However, HTLV does not seem to exploit this capacity but chiefly maintains its high proviral load through clonal expansion of the HTLV-infected cells (Wattel et al. 1995). Therefore, the cellular DNA polymerase that—in contrast to the reverse transcriptase—possesses a proofreading mechanism, mainly replicates HTLV genomes. Although clonal expansion has been frequently demonstrated in HTLV-infected patients (Wattel et al. 1995; Cavrois et al. 1996; Cavrois et al. 1998; Gabet et al. 2000), great debate still remains on the contribution of reverse transcription in HTLV evolution. A few studies have provided evidence that persistent virion replication plays an important role in maintaining the high proviral load of HTLV-1 (Taylor et al. 1999; Machuca and Soriano 2000). Based on these results and mathematical modeling studies (Wodarz and Bangham 2000), it has been argued that selection forces, like the immune response and the limited availability of appropriate target cells during transmission and persistence, are mainly responsible for the limited sequence diversity (Overbaugh and Bangham 2001).
As a consequence of the remarkable difference in genetic stability, different concepts are used to quantify the evolutionary rate of HIV and HTLV. HIV sequences sampled at different time points usually show a statistically significant accumulation of genetic differences over time (e.g., Shankarappa et al. 1999), a temporal aspect that can be used to estimate evolutionary rates (Drummond et al. 2003). In general, fast evolving viruses, to which this reasoning can be applied, are considered as measurably evolving populations (Drummond et al. 2003). The low fixation rate for HTLV makes serial sampling over several years, or even decades, useless (Gessain, Gallo, and Franchini 1992). Here, the molecular clock needs to be calibrated using a known date for a node in the HTLV phylogeny. Unfortunately, no fossil records are available to accomplish this task for viruses. In the case of primate T-cell lymphotropic viruses (PTLV), researchers have relied upon phylogeographical dispersal patterns (Holmes 2004), and used an anthropological migration date of the human host to calibrate a molecular clock for the viral phylogeny (Yanagihara et al. 1995; Salemi, Desmyter, and Vandamme 2000). In this study, we use a scanning approach to obtain systematic evolutionary rates across the HIV-1 and PTLV genome. Using the same approach we quantify the intensity of selective pressure across the genome based on the nonsynonymous/synonymous substitution rate ratio (dN/dS). For both viruses, we investigate the relationship between evolutionary rate and selective pressure at the interhost level. Based on this relationship, we predict the evolutionary rate under different selective constraints and show that the latter cannot explain the observed difference in sequence diversity between both retroviruses. A Bayesian coalescent approach applied to the HIV data indicated that several strong assumptions of the scanning analysis had little influence on our inference.
Materials and Methods
Sequence Data
Fifty-six full-length coding sequences were selected as representatives for the HIV-1 group M subtypes, in order to obtain a wide and homogeneous spread in time with respect to the sampling years. (Accession numbers: AY043175, M62320, AF004885, AF069671, U51190, AF286237, K02007, AJ006287, AF042100, M93258, M38429, U63632, M17449, AF042101, U21135, M17451, AF110959, AF110967, AF290028, AF110969, AF110972, AF110974, AF286231, AF067154, AF067158, AF067157, AB023804, AF286233, AF286234, AF286227, AF286224, U88824, M27323, K03454, U88822, AF005494, AF075703, AJ249238, AJ249236, AJ249237, AF084936, U88826, AF061642, AF190128, AF005496, AF082394, AJ249239, AJ249235, AF286228, U52953.) This data set includes sequences from two Australian transmission chains with known transmission dates and sampling dates: a mother-child (MC) transmission pair, for which the date of transmission was 1983, and a cohort of patients infected between 1982 and 1984 by blood products obtained from the same donor, designated as the Sydney Blood Bank Cohort III (SBBC III) (Lemey et al. 2003).
The HTLV data set contains 29 strains representative for PTLV-1, PTLV-2, and PTLV-3. These sequences include all coding regions in the PTLV genome (gag, protease, polymerase, envelope, and tax). (Accession numbers: AF139170, AF412314, J02029, U19949, L36905, NC_003323, L11456, AF074965, Y13051, X89270, AF033817, L02534, L03561, M10060, L20734, Y07616, NC_001815, U90557, AF042071, AF326583, AF139382, Z46900, M86840, AF074966, AF259264, Y14365, AF326584, M67514, AY590142.)
Phylogenetic Analyses
Sequences were aligned using ClustalW (Thompson, Higgins, and Gibson 1994) and manually edited according to their codon-reading frame in Se-Al (http://evolve.zoo.ox.ac.uk). Appropriate nucleotide substitution models were determined with Modeltest v3.06 based on hierarchical likelihood ratio testing (Posada and Crandall 1998). For both the complete HIV-1 and PTLV alignments, the general time reversible model with gamma distributed rate variation among sites and a proportion of invariant sites was selected. Phylogenetic trees were reconstructed in PAUP*4.0b10 (Swofford 1998), using a maximum likelihood approach: model parameters were estimated on an initial neighbor-joining (NJ) tree, and tree topologies were evaluated using a heuristic search approach that implemented both tree bisection-reconnection and nearest-neighbor interchange perturbations. The reliability of the internal branches in the trees was evaluated using 1,000 NJ bootstrap replicates.
Evolutionary Rates and Molecular Clock Testing
Substitution rates were estimated using maximum likelihood in a sliding window fashion (window size = 801 bp, step size = 81 bp) in nonoverlapping full-genome DNA alignment. For the HIV data set, evolutionary rates and divergence times with 95% confidence intervals (CI) were estimated under the single rate dated tip (SRDT) model developed by Rambaut (2000). For the PTLV data set, the molecular clock was calibrated by setting the divergence time for the HTLV-1c subtype (MEL5), detected exclusively in Melanesia and Australia, and other subtypes at 50,000 years ago. To calculate CIs, we used a time interval (40,000–60,000 years ago) that expresses the uncertainty on the earliest human migration to these islands (Roberts, Jones, and Smith 1990; Cavalli-Sforza, Menozzi, and Piazza 1994; Van Dooren, Salemi, and Vandamme 2001). The molecular clock hypothesis was tested using a likelihood ratio test (LRT). Calculations were performed using the PAML package (Yang 1997) and Rhino (http://evolve.zoo.ox.ac.uk/). A Perl script is available on request to generate input files for the sliding window analyses.
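The window layout described above can be sketched in a few lines of Python. This is only an illustration and not the authors' Perl script: the function name `sliding_windows`, the 0-based half-open coordinates, and the choice to drop a trailing partial window are assumptions. Only the window size (801 bp) and step size (81 bp) come from the text. Both values are multiples of 3, so each window starts and ends on a codon boundary provided the alignment itself is in frame.

```python
def sliding_windows(alignment_length, window=801, step=81):
    """Return (start, end) pairs, 0-based and half-open, for full windows only.

    Assumption: a trailing window shorter than `window` is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    windows = []
    start = 0
    while start + window <= alignment_length:
        windows.append((start, start + window))
        start += step
    return windows


# Hypothetical alignment length, chosen only to keep the example small.
example = sliding_windows(1000)
```

Each returned pair can be used to slice every sequence in the alignment before the window is written to an input file for the likelihood program.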
dN/dS Scanning and Identification of Positively Selected Sites
The dN/dS was estimated using a maximum likelihood method based on a codon substitution model that assumes a single dN/dS among sites and among lineages (Nielsen and Yang 1998). The same sliding window approach was used as for the evolutionary rate scanning (window size = 801 bp, step size = 81 bp), but branch lengths of the fixed topology were optimized by codeml for each window. Approximate rates of synonymous substitutions/codon site/year were estimated for the HIV-1 data set by dividing the total number of expected synonymous substitutions per codon site for all branches in the tree, estimated using codeml, by the total number of years represented by all branches of the tree, estimated for the complete genome sequences using the SRDT model implemented in baseml. Positively selected sites in the nonoverlapping full-genome DNA alignment were identified under a codon substitution model that allows for variable nonsynonymous/synonymous substitution rate ratios (dN/dS) among sites (Nielsen and Yang 1998; Yang et al. 2000). The LRT was used to evaluate whether an unconstrained discrete distribution to model heterogeneous dN/dS ratios among sites (M3) is significantly better than assuming a single dN/dS ratio for all sites (M0). If the dN/dS ratio for any site class is above 1, the Bayes theorem is used to calculate the posterior probability that each site, given its data, is from such a site class. All calculations were performed using the PAML package (Yang 1997).
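The Bayes step mentioned above can be illustrated with a short sketch. It assumes that, for one site, the fitted proportion of each dN/dS class and the likelihood of the site data under each class are already available; the function name and all numbers below are hypothetical and are not taken from the PAML output of this study.

```python
def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each dN/dS site class for a single site.

    class_proportions: estimated prior proportion of each class (sums to 1).
    site_likelihoods:  likelihood of the site data under each class.
    """
    if len(class_proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is needed per site class")
    joint = [p * lik for p, lik in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every class")
    return [j / total for j in joint]


# Hypothetical three-class example; the third class stands for dN/dS > 1.
posteriors = site_class_posteriors([0.7, 0.2, 0.1], [0.01, 0.02, 0.08])
```

In this made-up case the third class has a prior proportion of only 0.1 but the highest posterior, because the site data are much more likely under it than under the other two classes.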
Bayesian Estimation of Evolutionary Rates
The HIV-1 data set, partitioned in 10 nonoverlapping windows (window size = 867 bp), was analyzed using a Bayesian coalescent framework for the joint estimation of population parameters, substitution parameters, dates of divergence, and tree topology (Drummond et al. 2003). The program BEAST was used to perform Metropolis-Hastings Markov Chain Monte Carlo (MCMC) sampling that integrates over different coalescent trees (Drummond and Rambaut 2003). An exponential growth model was used as demographic function describing the change in population size over time. Two independent MCMC chains were run for 12.5 × 10^6 generations, sampling every 1,000th generation. The burn-in was set after sampling 10^6 generations.
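The sampling scheme above implies a fixed number of posterior samples per chain, which the following sketch works out. It rests on assumptions that the text does not state: the burn-in is taken to mean discarding the first 10^6 generations, the initial state is not counted as a sample, and the two chains are pooled only to show the combined total.

```python
def retained_samples(chain_length, sample_every, burn_in_generations):
    """Number of samples kept per chain after discarding the burn-in.

    Assumption: the state at generation 0 is not counted as a sample.
    """
    total = chain_length // sample_every
    discarded = burn_in_generations // sample_every
    return total - discarded


per_chain = retained_samples(12_500_000, 1_000, 1_000_000)
both_chains = 2 * per_chain  # pooling the two chains is an assumption
```

Under these assumptions each chain contributes 11,500 samples, or 23,000 if the two chains are pooled.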
Results
Phylogenetic trees were reconstructed for the full-length PTLV and HIV-1 sequences using maximum likelihood (ML) methods (figs. 1 and 2). The root of the PTLV tree was placed on the branch leading to PTLV-1 as suggested in a previous amino acid analysis including a bovine lymphotropic virus strain as an out-group (Salemi, Desmyter, and Vandamme 2000). This root also yielded the highest likelihood under the molecular clock assumption in our analysis (data not shown). For the HIV-1 group M tree, we chose the root that resulted in the highest likelihood under the SRDT model. Both transmission chains (MC and SBBCIII) are represented as a highly supported monophyletic cluster. We assumed both phylogenetic reconstructions to be reliable evolutionary hypotheses and subsequently used them to perform the full-genome scanning.
FIG. 1.— Maximum likelihood phylogenetic tree for 29 full-genome PTLV strains. Types and subtypes of the viral strains are indicated at the tips of the tree. Numbers at the nodes indicate the percentage of bootstrap samples (of 1,000) in which the right cluster is supported (only values >80% are shown). The node used for calibration of the molecular clock is indicated with an arrow.
FIG. 2.— Maximum likelihood phylogenetic tree for 56 full-genome HIV-1 group M strains. Subtypes of the viral strains are indicated at the tips of the tree. Numbers at the nodes indicate the percentage of bootstrap samples (of 1,000) in which the right cluster is supported (only values >80% are shown). The clusters of both transmission chains (MC and SBBCIII) are indicated with rectangles.
Using a sliding window approach, evolutionary rates were estimated across the complete coding genomes using the SRDT model (fig. 3a and b). Not only are the substitution rates of different orders of magnitude for both retroviruses but also the variability across the genome shows distinct patterns. For HIV-1, the estimates vary between 4.27 × 10^-4 and 2.71 × 10^-3 substitutions/site/year with a relatively low evolutionary rate in pol, a high rate in env and accessory genes, and an intermediate rate in gag. These differences are in agreement with expectations of varying selective forces or functional constraints along the HIV genome. PTLV rates were estimated under the single rate (SR) model using an anthropological calibration date. In contrast to HIV-1 rates, the highest PTLV rates are observed in the integrase part of pol (6.64 × 10^-7 substitutions/site/year), while the lowest rate is observed in the 3' end of env (2.64 × 10^-7). It should be noted that the CIs for evolutionary rates are estimated differently for HIV and PTLV and therefore not directly comparable. While the CIs for PTLV rely upon a normal approximation of the maximum likelihood estimates, more realistic CIs for HIV are estimated by determining the range of evolutionary rates, which we would be unable to reject under an LRT (Rambaut 2000). We have also provided estimates of the synonymous substitution rate per codon site per year across the genome of both retroviruses. These estimates are not based on a clock model applied to each window but on the temporal information in the complete genome alignment (see Materials and Methods). Therefore, the synonymous rate might only have an approximate scaling but its pattern of variability across the genome is still very useful for relative comparisons. Because the synonymous substitutions are expected to be approximately neutral, their substitution rate should not be influenced by selective constraints.
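To make the size of the rate difference concrete, the window extremes reported above can be compared directly. The sketch below uses only the four rate values quoted in the text; the variable names are illustrative, and comparing the extremes of the two ranges is one possible summary, not an analysis performed in the study.

```python
import math

# Lowest and highest window estimates quoted in the text (substitutions/site/year).
hiv_rates = (4.27e-4, 2.71e-3)
ptlv_rates = (2.64e-7, 6.64e-7)

# Smallest and largest fold difference between any HIV-1 and any PTLV window.
min_fold = min(hiv_rates) / max(ptlv_rates)
max_fold = max(hiv_rates) / min(ptlv_rates)

# The same comparison expressed in orders of magnitude.
min_orders = math.log10(min_fold)
max_orders = math.log10(max_fold)
```

Even the slowest HIV-1 window is several hundred times faster than the fastest PTLV window, and the most extreme pair differs by about four orders of magnitude.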
As expected, there is no obvious relationship between the variability in nucleotide substitution rates and the variability in synonymous substitution rates across the genome for both HIV-1 and PTLV. For PTLV, there is a particular decrease in synonymous substitutions at the border region between pol and env and in tax. The latter can be explained by the fact that the tax reading frame partly overlaps with the rex gene.
FIG. 3.— Results of the full-genome scanning for HIV-1 group M and PTLV. (a) HIV-1 evolutionary rates estimated under the SRDT model using a sliding window approach, with a window size of 801 bp and an increment of 81 bp, are plotted in black. CIs were only estimated for every 10th window to reduce computational burden. Synonymous substitution rates per codon site per year are plotted in gray. (b) PTLV evolutionary rates (in black) with CIs (dotted lines) estimated, using the early migration of Melanesian settlers as calibration, according to the same sliding window approach. The synonymous substitution rate is plotted in gray according to the secondary y axis. (c) Molecular clock scanning for the HIV-1 group M data set: LRT statistics for the SRDT model against the DR model (-?-) and for the SR model against SRDT model (--). The upper and lower horizontal bars represent the 95% confidence limit, according to the χ² distribution, for the test statistic under the null hypothesis in the former and latter comparison, respectively. In addition, maximum likelihood estimates for the MRCA of the MC pair are plotted (--, secondary y axis). (d) Molecular clock scanning for the PTLV data set: LRT statistics for the SR model against the DR model. The horizontal bar represents the 95% confidence limit for the test statistic under the null hypothesis. (e) and (f) Nonsynonymous/synonymous substitution rate ratio scanning for the HIV-1 and PTLV genomes, respectively. The positively selected sites, identified for the HIV-1 data set, are indicated as vertical bars across the genome. The height of the bars represents the posterior probability that the site is from the class of positively selected sites.
Because the evolutionary rates were estimated under the rate constancy assumption, we tested the molecular clock hypothesis in each window using an LRT (fig. 3c and d). For the HIV-1 data, the molecular clock test compares the SRDT model against the different rates (DR) model. Although the molecular clock was significantly rejected for the full-length HIV-1 group M sequences (P < 0.001), there appears to be considerable variability in the LRT statistic along the genome (fig. 3c). There are small regions, most pronounced in pol, where the molecular clock hypothesis cannot be significantly rejected. An LRT comparing the SR model against the DR model indicates that there is also little evidence for rate constancy among lineages in the PTLV phylogeny (fig. 3d). Only in the tax gene could the molecular clock not be significantly rejected.
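The per-window clock test is a standard likelihood ratio test between nested models, which can be sketched as below. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the log-likelihoods in the usage lines are made up, and the degrees of freedom must be supplied by the caller as the difference in the number of free parameters between the two models being compared.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df > 0.
    Uses the recurrence Q(a+1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a+1),
    with h = x/2, started from Q(1, h) = exp(-h) or Q(1/2, h) = erfc(sqrt(h))."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def likelihood_ratio_test(lnl_null, lnl_alt, df, alpha=0.05):
    """Return (statistic, p-value, reject_null) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    p = chi2_sf(stat, df)
    return stat, p, p < alpha


# Made-up log-likelihoods for a clock (null) and a non-clock (alternative) model.
print(likelihood_ratio_test(-1000.0, -990.0, df=2))
```

Applied window by window, the statistic traces a curve along the genome like the one described for fig. 3c and d, and the critical value is a horizontal line at the 95% point of the chi-square distribution.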
For the HIV-1 data, we also performed an LRT of the SRDT model against the SR model, which makes no accommodation for the temporal sampling of the isolates (fig. 3c). If the SR model is significantly rejected in favor of the SRDT model, it follows that incorporating isolation dates into an SR model significantly improves the likelihood in these windows. Although clocklike behavior was hardly observed, incorporating isolation dates significantly improved the clock model in almost all genes. Only in the protease part of pol is there a small region where the sequences cannot be considered temporally distinct, so that using differences in isolation times to estimate substitution rates and test the molecular clock is unjustified. Simulations have shown that even when the clock is rejected, the confidence limits may sometimes still include the true substitution rate, provided that the variation among lineages is small (Jenkins et al. 2002). Here, we have no knowledge about the true rate, but we do know the time of transmission between the MC pair included in our HIV-1 data set. On the secondary axis, the ML estimates for the most recent common ancestor (MRCA) of the MC pair are plotted along the genome. Because this node was consistently estimated as 1983 in previous analyses (Lemey et al. 2003), in agreement with the transmission date, we considered the absolute difference between the "true" date and the ML estimate as a measure of over- or underestimation. Interestingly, this difference had a weak positive correlation with the likelihood of clock rejection (r = 0.36) and a weak negative correlation with the likelihood of SR rejection in favor of SRDT (r = –0.39). This suggests that the more temporal information the sequences contain and the more clocklike this information, the better our estimates.
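The correlation described above pairs, for each window, the absolute dating error of the MC-pair node with an LRT statistic. A minimal sketch of that calculation is given below; the function names are hypothetical and the per-window values are invented purely to show the mechanics. Only the reference year 1983 is taken from the text.

```python
import math


def abs_dating_error(mrca_estimates, true_year=1983):
    """Absolute difference between each window's MRCA estimate and the known date."""
    return [abs(year - true_year) for year in mrca_estimates]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented per-window values, for illustration only.
mrca_years = [1983.5, 1981.0, 1986.2, 1979.4, 1984.1]
clock_lrt = [40.0, 55.0, 48.0, 90.0, 35.0]
print(pearson_r(abs_dating_error(mrca_years), clock_lrt))
```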
We investigated the selective constraints across the genome by estimating the dN/dS ratio using the same sliding window approach (fig. 3e and f). For PTLV, these ratios fluctuate between 0.031 and 0.128; for HIV-1, the values range between 0.129 and 0.723. Although this confirms a difference in selective constraints between the two retroviruses, this difference is less pronounced than the difference in evolutionary rates. The dN/dS ratios are considerably higher for HIV-1; however, they do not exceed the threshold for positive selection. If only a few amino acid sites are positively selected (e.g., Hughes and Nei 1988), an average dN/dS for a gene region is usually not sensitive enough to uncover adaptive evolution at the molecular level (Nielsen and Yang 1998). Therefore, we also tested for positively selected sites in the complete nonoverlapping DNA alignments using codon substitution models that allow the selection regimes to vary across codon sites (Nielsen and Yang 1998; Yang et al. 2000). For both retroviruses, an unconstrained discrete distribution to model heterogeneous dN/dS ratios among sites (M3) fits significantly better than a model that assumes a single dN/dS among sites (M0) (table 1). The parameters estimated for the three classes in the discrete model are markedly different between HIV-1 and PTLV. For HIV-1, the discrete model suggests that 9% of sites in the complete genome are under positive selection with dN/dS = 2.51, while no class of positively selected sites is identified for PTLV. The positively selected sites, identified by an empirical Bayes approach and plotted in figure 3e, are distributed all over the HIV genome with the highest density in env and accessory genes.
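The empirical Bayes step can be illustrated with a short Python sketch. Given the estimated proportion of each dN/dS class and the likelihood of one codon site under each class, the posterior probability of a class is its proportion times its site likelihood, normalized over all classes. In the example below only the 9% proportion of the positively selected class comes from the text; the other two proportions and all site likelihoods are invented, and the function name is hypothetical.

```python
def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each site class for one codon site:
    P(class k | site) = p_k * L_k / sum_j (p_j * L_j)."""
    joint = [p * lik for p, lik in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Three classes under a discrete model; the last is the positively selected class.
proportions = [0.60, 0.31, 0.09]   # 0.09 from the text, the rest assumed
likelihoods = [1e-6, 4e-6, 3e-5]   # invented likelihoods of one site per class
posteriors = site_class_posteriors(proportions, likelihoods)
print(posteriors[-1])  # posterior probability that the site is positively selected
```

The last value is the kind of quantity shown as bar height in figure 3e. Even a class that holds a small share of sites can receive a high posterior at a site whose data are much better explained by that class.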
Table 1 Log-Likelihood Values and Parameter Estimates for the Codon Substitution Models Applied to the HIV-1 and PTLV Complete Genome Data
For both viral genomes, the dN/dS pattern appears to correlate with the evolutionary rate pattern. Linear regression analysis confirms a significant relationship between evolutionary rate and dN/dS for HIV-1 (P < 0.01, R² = 0.46) and PTLV (P < 0.01, R² = 0.58). For HIV-1, the regression markedly improved (R² = 0.69) when only data points were included for which the MRCA estimate for the MC pair did not deviate by more than 5 years from the actual transmission event. Based upon these relationships, we calculated evolutionary rate prediction intervals for dN/dS values between 0 and 1. By plotting these prediction intervals on the same log scale (fig. 4), we illustrate that the HIV-1 and PTLV evolutionary rates would consistently be about 3 logs different, independent of the dN/dS value. A similar conclusion can be reached by investigating the synonymous substitution rate for both retroviruses. As expected, there is no strong correlation between the synonymous rate and the dN/dS values (r = –0.08 and r = –0.39 for HIV-1 and PTLV, respectively), and the synonymous rate fluctuates within ranges that are also about 3 logs different between HIV-1 and PTLV.
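The regression and prediction intervals described above follow the usual ordinary least squares formulas, sketched here in Python. This is a generic illustration under stated assumptions: the function names are hypothetical, the five data points are invented, and the critical value 3.182 is the two-sided 95% point of the t distribution with 3 degrees of freedom, which matches n − 2 for this toy data set only. Whether the rates were transformed before fitting is not stated in the text, so the sketch fits the values as given.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return {"n": n, "mean_x": mean_x, "sxx": sxx, "slope": slope,
            "intercept": intercept, "r2": 1.0 - sse / sst,
            "s": math.sqrt(sse / (n - 2))}  # residual standard error


def prediction_interval(fit, x0, t_crit):
    """Interval for a new observation at x0; t_crit is the t quantile for n - 2 df."""
    y0 = fit["intercept"] + fit["slope"] * x0
    half = t_crit * fit["s"] * math.sqrt(
        1.0 + 1.0 / fit["n"] + (x0 - fit["mean_x"]) ** 2 / fit["sxx"])
    return y0 - half, y0, y0 + half


# Invented (dN/dS, rate) pairs; rates in arbitrary units.
dnds = [0.1, 0.2, 0.3, 0.4, 0.5]
rate = [1.0, 2.1, 2.9, 4.2, 4.8]
fit = fit_line(dnds, rate)
print(fit["slope"], fit["r2"], prediction_interval(fit, 0.3, t_crit=3.182))
```

The interval is narrowest at the mean dN/dS and widens toward the extremes, which is why predictions extrapolated to dN/dS values near 0 or 1 carry more uncertainty than those in the observed range.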
FIG. 4.— Prediction intervals for the evolutionary rate of HIV-1 and PTLV based on the linear regression between dN/dS and evolutionary rate.
To obtain the relationship between dN/dS and evolutionary rate, we made several strong assumptions. The sliding window analysis was performed using a single phylogenetic tree across the complete genome. This approach not only ignores the error in phylogenetic reconstruction but also assumes a single evolutionary history for all genes. Especially for HIV, the latter might be problematic because recombination is a relatively frequent event in the evolution of this virus (Robertson et al. 1995). Moreover, the sliding window approach results in estimates for overlapping data partitions, thereby violating the independence assumption of the linear regression model. To assess the impact of these (violated) assumptions, we applied a novel Bayesian coalescent method to nonoverlapping data partitions of the HIV-1 full-genome alignment. For the partitioned genome, we compared a model that assumes a single phylogeny for all loci (linked) with a recently developed multilocus model that accommodates an independent genealogical history for each locus while sharing the same demographic history (unlinked) (Lemey et al. 2004). This Bayesian approach also accommodates phylogenetic error in each partition and allows appropriate CIs to be estimated for the parameters. The estimates for the evolutionary rate and the date for the MRCA are listed in table 2. For both the linked and the unlinked model, linear regression analysis indicates a significant relationship between evolutionary rate and dN/dS (P = 0.01), with the same amount of variance explained by dN/dS (R² = 0.61). This relationship therefore appears to be robust to some violations of the assumptions of our ML scanning approach. In comparison to the unlinked model, the evolutionary rates are only marginally lower and the date for the MRCA is only marginally earlier for the linked model.
Table 2 Estimates of Evolutionary Rates and Dates for the Most Recent Common Ancestor of HIV-1 Group M Using the Bayesian Coalescent Method
Discussion
We present here a comparative analysis of HIV-1 and PTLV evolutionary rates and selective constraints. Although both pathogens have many retroviral features in common, their evolutionary dynamics show remarkable differences. Using a scanning approach, we have provided systematic evolutionary rate estimates across the HIV-1 and PTLV genomes. The range of the evolutionary rate estimates and the variability across the genome are quantitatively similar to previous studies based on single or multiple genes (Korber et al. 1997; Salemi et al. 2001). It should be noted that the extent of rate variability in sliding window analysis depends on the window size. For example, smaller window sizes might reveal subtler differences in evolutionary rate, but this might also result in a loss of the temporal distinction between the sequences. HIV, with a nucleotide substitution rate ranging from 4.27 × 10⁻⁴ to 2.71 × 10⁻³ substitutions/site/year, has one of the fastest evolving genomes (Wain-Hobson 1993). This lentivirus owes its evolutionary potential to a combination of a high mutation rate (Mansky and Temin 1995; Gao et al. 2004), a short generation time (Ho et al. 1995; Wei et al. 1995), and a large number of infected cells (Buckley et al. 2001). With a range of 2.64 × 10⁻⁷ to 6.64 × 10⁻⁷ substitutions/site/year, PTLV evolutionary rates are several orders of magnitude lower than those of HIV-1. PTLV is also subjected to stronger purifying selection than HIV. About 10% of the sites in the HIV genome appear to be positively selected, in agreement with the findings of widespread adaptive evolution in the HIV-1 genome (Yang, Bielawski, and Yang 2003). No class of positively selected sites was inferred for the complete PTLV genome.
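The size of the gap between the two rate ranges quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four rate values given in the text; the variable names are arbitrary.

```python
import math

# Substitution rate ranges (substitutions/site/year) as quoted in the text.
hiv_low, hiv_high = 4.27e-4, 2.71e-3
ptlv_low, ptlv_high = 2.64e-7, 6.64e-7

# Smallest gap: slowest HIV-1 window against fastest PTLV window.
min_logs = math.log10(hiv_low / ptlv_high)
# Largest gap: fastest HIV-1 window against slowest PTLV window.
max_logs = math.log10(hiv_high / ptlv_low)

print(round(min_logs, 2), round(max_logs, 2))
```

The two extremes work out to roughly 2.8 and 4.0 log10 units, so even the slowest HIV-1 window evolves several hundred times faster than the fastest PTLV window.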
The relationship between dN/dS and evolutionary rate that we demonstrated is an expected one. However, it forms the basis for further comparative analyses between HIV-1 and PTLV. Extrapolating from this relationship, HIV-1 and PTLV evolutionary rates are about 3 logs different, independent of the dN/dS ratio (fig. 4). A similar conclusion was obtained by comparing the rates of synonymous substitution. Therefore, different selective constraints do not provide an adequate explanation for the observed differences in evolutionary rate. Instead, the reason should most probably be sought in the underlying process by which genetic variation is generated. Differences in mutation rate between HIV (3.5 × 10⁻⁵ per base per cycle) and HTLV (7 × 10⁻⁶ per base per cycle) are also insufficient to explain the enormous substitution rate difference (Mansky and Temin 1995; Mansky 2000). It has been argued that the number of successive replication cycles is probably more important than mutation rate in establishing viral genetic variation (Coffin 1990). However, HTLV maintains high proviral loads while remaining genetically stable (Wattel et al. 1992; Albrecht et al. 1998; Gabet et al. 2000). This discrepancy has been resolved by the finding of clonal expansion of the infected cells (Wattel et al. 1995). Cell-associated provirus replication makes use of a DNA polymerase with proofreading capacity and generates only limited genetic variation. It has been suggested through a squirrel monkey model that HTLV-1 infection is characterized by a transient phase of reverse transcription followed by the persistent multiplication of infected cells (Mortreux et al. 2001). However, the exact contribution of replication through reverse transcription has yet to be elucidated. Recent findings suggest an important role for persistent virion replication in maintaining high proviral loads and indicate that other factors limit genetic diversity for HTLV (Taylor et al. 1999; Wodarz and Bangham 2000; Overbaugh and Bangham 2001). For example, cells that start to express the transactivator protein Tax after infection are likely to be killed by a Tax-specific cytotoxic T lymphocyte (CTL) response before completing the viral replication cycle (Bangham 2000; Hanon et al. 2000). Such mechanisms are not selective constraints in an evolutionary sense because they act irrespective of the phenotype of newly generated variants (unless the variant is a specific CTL escape mutant in tax). Therefore, our analysis using dN/dS ratios is not able to distinguish between such constraints and predominant clonal expansion. The difference in natural selection between HIV-1 and PTLV most probably results from a different impact of the host immune system. It is well known that HIV successfully fixes mutations to evade immune responses (generated by neutralizing antibodies, T-helper cells, and CTL). HTLV is able to transform cells and spreads through cell-to-cell contact, suggesting a limited exposure to selection pressure exerted by antibodies (Bangham 2003). However, HTLV is persistently transcribed, and there is a strong CTL response to HTLV-1 with tax as the dominant target antigen (Kannagi et al. 1991). Niewiesk et al. (1995) showed that CTL selection favored the emergence of variant Tax sequences. The latter, however, appeared defective in their transactivating activity (Niewiesk et al. 1995). This suggests that functional constraints, and thus purifying selection, might not allow for significant immune escape. However, immune escape for HTLV infection needs to be further investigated.
We are aware that this analysis compares viral populations with distinct epidemiological and demographic histories. HIV-1 group M originated through a relatively recent cross-species transmission of simian immunodeficiency virus from chimpanzees to humans (Gao et al. 1999; Korber et al. 2000; Salemi et al. 2001), resulting in an explosive spread in the human population. PTLV viruses have frequently crossed the species barrier between humans and simians (Vandamme, Salemi, and Desmyter 1998), and the contemporary strains are the result of evolution during a considerably longer time span (Salemi, Desmyter, and Vandamme 2000; Van Dooren, Salemi, and Vandamme 2001). Due to the genetic stability of HTLV, we have chosen to analyze a comprehensive data set including HTLV-1, HTLV-2, and interspersed simian T-cell lymphotropic virus (STLV) sequences. A calibration date for a node in the phylogeny was provided by anthropological information (Yanagihara et al. 1995; Salemi, Desmyter, and Vandamme 2000). HIV sequences sampled at different time points usually have a statistically significant accumulation of genetic differences over time, which allows estimating the rate of molecular evolution (Drummond et al. 2003). The PTLV and HIV-1 data sets inevitably represent very different scales of evolution. While the time to the most recent common ancestor (TMRCA) is around 70 years for HIV-1 group M (Korber et al. 2000), the TMRCA for the PTLV phylogeny is about 4 orders of magnitude larger (Salemi, Desmyter, and Vandamme 2000). Therefore, we also attempted to analyze an HTLV-1 subset excluding all simian strains. However, molecular clock estimates were not powerful enough to correlate with dN/dS estimates (data not shown). Crossing the species barrier might also have had its influence on the evolutionary parameters we have inferred for PTLV. However, in the light of recent findings it seems plausible that the effect of different hosts on the evolution of the virus is subtler than the differences we observe between PTLV and HIV-1. Gabet, Gessain, and Wattel (2003) have shown that, as for HTLV-1, STLV-1 combines extremely high proviral loads with inter- and intra-animal genetic stability. Moreover, the same paradoxical combination for this simian oncovirus could also be explained by the demonstration of clonal expansion in vivo (Gabet, Gessain, and Wattel 2003).
The sliding window approach estimated evolutionary parameters under a single-tree topology. However, frequent recombination might result in different phylogenies along the HIV-1 genome. Moreover, due to overlapping data in the sliding window analyses, the windows cannot be considered as completely independent and we might be too confident in the relation between evolutionary rate and dN/dS. To address this, we also estimated evolutionary rates using a Bayesian coalescent method that allows comparing linked or unlinked evolutionary histories among nonoverlapping partitions of the HIV-1 genome. Although the discrete model of unlinked evolutionary histories will not fully accommodate recombination, our comparison is at least expected to indicate a possible bias of assuming a single evolutionary history. As in the sliding window analysis, these rates were also significantly correlated with the dN/dS values. Thus for HIV-1, this relationship appears to be robust to some of our statistical model assumptions. Interestingly, the date for the MRCA of HIV-1 group M (1929, CI: 1920–1938) is in perfect agreement with previous estimates (Korber et al. 2000; Salemi et al. 2001), and, considering the CIs, this estimate is only marginally earlier than the MRCAs for the single loci (table 2). Previous simulation studies have suggested that assuming a single evolutionary history will result in an overestimation of the time to the MRCA when recombination has significantly shaped the sequence data (Schierup and Forsberg 2001; Worobey 2001). Our findings suggest that this effect of recombination can be noticeable when estimating rates and dates for HIV sequences, but it might be less severe than expected. A full discussion of estimates under the unlinked model compared to simulation results is available elsewhere (Lemey et al. 2004).
In conclusion, our scanning approach can reveal the relationship between selective pressure and evolutionary rate, which provides useful information on the evolutionary dynamics of viral populations.
Acknowledgements
This work was supported by the Flemish Fonds voor Wetenschappelijk Onderzoek (FWO G.0288.01); P.L. was supported by the Flemish Institute for Promotion and Innovation through Science and Technology in Flanders (IWT-Vlaanderen).
References
Albrecht, B., N. D. Collins, G. C. Newbound, L. Ratner, and M. D. Lairmore. 1998. Quantification of human T-cell lymphotropic virus type 1 proviral load by quantitative competitive polymerase chain reaction. J. Virol. Methods 75:123–140.
Bangham, C. R. 2000. The immune response to HTLV-I. Curr. Opin. Immunol. 12:397–402.
Bangham, C. R. 2003. The immune control and cell-to-cell spread of human T cell lymphotropic virus type 1. J. Gen. Virol. 84:3177–3189.
Barre-Sinoussi, F. 1996. HIV as the cause of AIDS. Lancet 348:31–35.
Barre-Sinoussi, F., J. C. Chermann, F. Rey et al. (12 co-authors). 1983. Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science 220:868–871.
Buckley, T. R., C. Simon, H. Shimodaira, and G. K. Chambers. 2001. Evaluating hypotheses on the origin and evolution of the New Zealand alpine cicadas (Maoricicada) using multiple-comparison tests of tree topology. Mol. Biol. Evol. 18:223–234.
Cavalli-Sforza, L., L. Menozzi, and A. Piazza. 1994. The history and geography of human genes. Princeton University Press, Princeton, N.J.
Cavrois, M., A. Gessain, S. Wain-Hobson, and E. Wattel. 1996. Proliferation of HTLV-1 infected circulating cells in vivo in all asymptomatic carriers and patients with TSP/HAM. Oncogene 12:2419–2423.
Cavrois, M., I. Leclercq, O. Gout, A. Gessain, S. Wain-Hobson, and E. Wattel. 1998. Persistent oligoclonal expansion of human T-cell leukemia virus type 1-infected circulating cells in patients with tropical spastic paraparesis/HTLV-1 associated myelopathy. Oncogene 17:77–82.
Coffin, J. M. 1990. Genetic variation in avian retroviruses. Dev. Biol. Stand. 72:123–132.
Drummond, A. J., G. K. Nicholls, A. G. Rodrigo, and W. Solomon. 2002. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161:1307–1320.
Drummond, A., and A. Rambaut. 2003. BEAST v1.0. (http://evolve.zoo.ox.ac.uk/beast/).
Drummond, A. J., O. G. Pybus, A. Rambaut, R. Forsberg, and A. G. Rodrigo. 2003. Measurably evolving populations. Trends Ecol. Evol. 18:481–488.
Gabet, A. S., A. Gessain, and E. Wattel. 2003. High simian T-cell leukemia virus type 1 proviral loads combined with genetic stability as a result of cell-associated provirus replication in naturally infected, asymptomatic monkeys. Int. J. Cancer 107:74–83.
Gabet, A. S., F. Mortreux, A. Talarmin, Y. Plumelle, I. Leclercq, A. Leroy, A. Gessain, E. Clity, M. Joubert, and E. Wattel. 2000. High circulating proviral load with oligoclonal expansion of HTLV-1 bearing T cells in HTLV-1 carriers with strongyloidiasis. Oncogene 19:4954–4960.
Gao, F., E. Bailes, D. L. Robertson et al. (12 co-authors). 1999. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436–441.
Gao, F., Y. Chen, D. N. Levy, J. A. Conway, T. B. Kepler, and H. Hui. 2004. Unselected mutations in the human immunodeficiency virus type 1 genome are mostly nonsynonymous and often deleterious. J. Virol. 78:2426–2433.
Gessain, A., F. Barin, J. C. Vernant, O. Gout, L. Maurs, A. Calender, and G. de The. 1985. Antibodies to human T-lymphotropic virus type-I in patients with tropical spastic paraparesis. Lancet 2:407–410.
Gessain, A., R. C. Gallo, and G. Franchini. 1992. Low degree of human T-cell leukemia/lymphoma virus type I genetic drift in vivo as a means of monitoring viral transmission and movement of ancient human populations. J. Virol. 66:2288–2295.
Hanon, E., S. Hall, G. P. Taylor, M. Saito, R. Davis, Y. Tanaka, K. Usuku, M. Osame, J. N. Weber, and C. R. Bangham. 2000. Abundant tax protein expression in CD4+ T cells infected with human T-cell lymphotropic virus type I (HTLV-I) is prevented by cytotoxic T lymphocytes. Blood 95:1386–1392.
Ho, D. D., A. U. Neumann, A. S. Perelson, W. Chen, J. M. Leonard, and M. Markowitz. 1995. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373:123–126.
Holmes, E. C. 2004. The phylogeography of human viruses. Mol. Ecol. 13:745–756.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170.
Jenkins, G. M., A. Rambaut, O. G. Pybus, and E. C. Holmes. 2002. Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J. Mol. Evol. 54:156–165.
Kannagi, M., S. Harada, I. Maruyama, H. Inoko, H. Igarashi, G. Kuwashim, S. Sato, M. Morita, M. Kidokoro, and M. Sugimoto. 1991. Predominant recognition of human T cell leukemia virus type I (HTLV-I) pX gene products by human CD8+ cytotoxic T cells directed against HTLV-I-infected cells. Int. Immunol. 3:761–767.
Korber, B., I. Loussert-Ajaka, J. Blouin, and S. Saragosti. 1997. A comparison of HIV-1 group M and group O functional and immunogenic domains in the gag p24 protein and the C2V3 region of the envelope protein. Theoretical and Biophysical Group, Los Alamos National Laboratory, Los Alamos, N.M. (Part IV):63–79.
Korber, B., M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288:1789–1796.
Lemey, P., O. G. Pybus, A. Rambaut, A. J. Drummond, D. L. Robertson, P. Roques, M. Worobey, and A. M. Vandamme. 2004. The molecular population genetics of HIV-1 group O. Genetics 167:1059–1068.
Lemey, P., M. Salemi, B. Wang, M. Duffy, W. H. Hall, N. K. Saksena, and A. M. Vandamme. 2003. Site stripping based on likelihood ratio reduction is a useful tool to evaluate the impact of non-clock-like behavior on viral phylogenetic reconstructions. FEMS Immunol. Med. Microbiol. 39:125–132.
Machuca, A., and V. Soriano. 2000. In vivo fluctuation of HTLV-I and HTLV-II proviral load in patients receiving antiretroviral drugs. J. Acquir. Immune Defic. Syndr. 24:189–193.
Mansky, L. M. 2000. In vivo analysis of human T-cell leukemia virus type 1 reverse transcription accuracy. J. Virol. 74:9525–9531.
Mansky, L. M., and H. M. Temin. 1995. Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virol. 69:5087–5094.
Mortreux, F., M. Kazanji, A. S. Gabet, B. de Thoisy, and E. Wattel. 2001. Two-step nature of human T-cell leukemia virus type 1 replication in experimentally infected squirrel monkeys (Saimiri sciureus). J. Virol. 75:1083–1089.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Niewiesk, S., S. Daenke, C. E. Parker, G. Taylor, J. Weber, S. Nightingale, and C. R. Bangham. 1995. Naturally occurring variants of human T-cell leukemia virus type I Tax protein impair its recognition by cytotoxic T lymphocytes and the transactivation function of Tax. J. Virol. 69:2649–2653.
Osame, M., K. Usuku, S. Izumo, N. Ijichi, H. Amitani, A. Igata, M. Matsumoto, and M. Tara. 1986. HTLV-I associated myelopathy, a new clinical entity. Lancet 1:1031–1032.
Overbaugh, J., and C. R. Bangham. 2001. Selection forces and constraints on retroviral sequence variation. Science 292:1106–1109.
Pedroza Martins, L., N. Chenciner, and S. Wain-Hobson. 1992. Complex intrapatient sequence variation in the V1 and V2 hypervariable regions of the HIV-1 gp 120 envelope sequence. Virology 191:837–845.
Poiesz, B. J., A. F. Ruscetti, P. A. Gazdar, P. A. Bunn, J. A. Minna, and R. C. Gallo. 1980. Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma. Proc. Natl. Acad. Sci. USA 77:7415–7419.
Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818.
Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395–399.
Roberts, R. G., R. Jones, and M. A. Smith. 1990. Report of thermoluminescence dates supporting the arrival of people between 50 and 60 kya in southern Australia. Nature 345:153.
Robertson, D. L., P. M. Sharp, F. E. McCutchan, and B. H. Hahn. 1995. Recombination in HIV-1. Nature 374:124–126.
Salemi, M., J. Desmyter, and A. M. Vandamme. 2000. Tempo and mode of human and simian T-lymphotropic virus (HTLV/STLV) evolution revealed by analyses of full-genome sequences. Mol. Biol. Evol. 17:374–386.
Salemi, M., K. Strimmer, W. W. Hall, M. Duffy, E. Delaporte, S. Mboup, M. Peeters, and A. M. Vandamme. 2001. Dating the common ancestor of SIVcpz and HIV-1 group M and the origin of HIV-1 subtypes using a new method to uncover clock-like molecular evolution. FASEB J. 15:276–278.
Schierup, M. H., and R. Forsberg. 2001. Recombination and phylogenetic analysis of HIV-1. Pp. 231–245 in Origins of HIV and emerging persistent viruses, Rome.
Shankarappa, R., J. B. Margolick, S. J. Gange et al. (12 co-authors). 1999. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 73:10489–10502.
Sharp, P. M., D. L. Robertson, and B. H. Hahn. 1995. Cross-species transmission and recombination of ‘AIDS’ viruses. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 349:41–47.
Swofford, D. L. 1998. PAUP* 4.0—phylogenetic analysis using parsimony (*and other methods). Sinauer Assoc., Sunderland, Mass.
Taylor, G. P., S. E. Hall, S. Navarrete et al. (12 co-authors). 1999. Effect of lamivudine on human T-cell leukemia virus type 1 (HTLV-1) DNA copy number, T-cell phenotype, and anti-tax cytotoxic T-cell frequency in patients with HTLV-1-associated myelopathy. J. Virol. 73:10289–10295.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Van Dooren, S., M. Salemi, and A. M. Vandamme. 2001. Dating the origin of the African human T-cell lymphotropic virus type-i (HTLV-I) subtypes. Mol. Biol. Evol. 18:661–671.
Vandamme, A. M., M. Salemi, and J. Desmyter. 1998. The simian origins of the pathogenic human T-cell lymphotropic virus type I. Trends Microbiol. 6:477–483.
Wain-Hobson, S. 1993. The fastest genome evolution ever described: HIV variation in situ. Curr. Opin. Genet. Dev. 3:878–883.
Wattel, E., M. Mariotti, F. Agis, E. Gordien, F. F. Le Coeur, L. Prin, P. Rouger, I. S. Chen, S. Wain-Hobson, and J. J. Lefrere. 1992. Quantification of HTLV-1 proviral copy number in peripheral blood of symptomless carriers from the French West Indies. J. Acquir. Immune Defic. Syndr. 5:943–946.
Wattel, E., J. P. Vartanian, C. Pannetier, and H. Wain. 1995. Clonal expansion of human T-cell leukemia virus type I-infected cells in asymptomatic and symptomatic carriers without malignancy. J. Virol. 69:2863–2868.
Wei, X., S. K. Ghosh, M. E. Taylor et al. (12 co-authors). 1995. Viral dynamics in human immunodeficiency virus type 1 infection. Nature 373:117–122.
Wodarz, D., and C. R. Bangham. 2000. Evolutionary dynamics of HTLV-I. J. Mol. Evol. 50:448–455.
Worobey, M. 2001. A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol. Biol. Evol. 18:1425–1434.
Yanagihara, R., N. Saitou, V. R. Nerurkar, K. J. Song, I. Bastian, G. Franchini, and D. C. Gajdusek. 1995. Molecular phylogeny and dissemination of human T-cell lymphotropic virus type I viewed within the context of primate evolution and human migration. Cell. Mol. Biol. 41(Suppl. 1):S145–S161.
Yang, W., J. P. Bielawski, and Z. Yang. 2003. Widespread adaptive evolution in the human immunodeficiency virus type 1 genome. J. Mol. Evol. 57:212–221.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.
Yoshida, M., I. Miyoshi, and Y. Hinuma. 1982. Isolation and characterization of retrovirus from cell lines of human adult T-cell leukemia and its implication in the disease. Proc. Natl. Acad. Sci. USA 79:2031–2035.
(Philippe Lemey, Sonia Van)