Covarion Shifts Cause a Long-Branch Attraction Artifact That Unites Microsporidia and Archaebacteria in EF-1 Phylogenies
http://www.100md.com
分子生物学进展 2004年第7期
* Program in Evolutionary Biology, Canadian Institute for Advanced Research and Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
Department of Bioscience, Nagahama Institute of Bioscience and Technology, 1266 Tamura, Nagahama, Shiga 526-0829, Japan
Department ofMathematics and Statistics and Genome Atlantic, Dalhousie University, Halifax, Nova Scotia, Canada
Departmentof Botany, University of British Columbia, Vancouver, British Columbia, Canada
E-mail: yinagai@dal.ca.
Abstract
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1 (EF-1) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial (outgroup) sequences ("long-branch attraction," or "LBA"). However, no previous studies have explicitly determined the reason(s) why the artifactual allegiance of microsporidia and archaebacteria ("M + A") is recovered by all phylogenetic methods, including maximum likelihood, a method that is supposed to be resistant to classical LBA. Here we show that the M + A affinity can be attributed to those alignment sites associated with large differences in evolutionary site rates between the eukaryotic and archaebacterial subtrees. Therefore, failure to model the significant evolutionary rate distribution differences (covarion shifts) between the ingroup and outgroup sequences is apparently responsible for the artifactual basal position of microsporidia in phylogenetic analyses of EF-1 sequences. Currently, no evolutionary model that accounts for discrete changes in the site rate distribution on particular branches is available for either protein or nucleotide level phylogenetic analysis, so the same artifacts may affect many other "deep" phylogenies. Furthermore, given the relative similarity of the site rate patterns of microsporidian and archaebacterial EF-1 proteins ("parallel site rate variation"), we suggest that the microsporidian orthologs may have lost some eukaryotic EF-1–specific nontranslational functions, exemplifying the extreme degree of reduction in this parasitic lineage.
Key Words: phylogenetic artifact ? long-branch attraction ? EF-1 ? microsporidia ? covarion model ? parallel site rate variation
Introduction
Microsporidia are a monophyletic assemblage of obligate intracellular parasites that generally infect animals (particularly arthropods and fish). These unicellular eukaryotes lack classical mitochondria, peroxisomes and Golgi apparatus at the ultrastructural level (reviewed in Keeling and Fast [2002]). At the molecular level, these organisms appear to possess unusual ribosomes and ribosomal RNA (rRNA). The sedimentation coefficient of microsporidian ribosomes is more similar to that of prokaryotic (i.e., eubacterial or archaebacterial) 70S ribosomes than to that of typical eukaryotic 80S ribosomes (Ishihara and Hayashi 1968). Fusion of the large subunit (LSU) and 5.8S rRNAs is another trait shared between microsporidia and prokaryotes (Vossbrinck and Woese 1986). Furthermore, microsporidia were robustly estimated to be one of the basal lineages among eukaryotes in the first molecular phylogenies that include microsporidia, those of small subunit (SSU) rRNA sequences (Vossbrinck et al. 1987). This basal position was recovered in later SSU phylogenies regardless of the tree reconstruction method, model of nucleotide substitution, or taxonomic sampling (e.g., Cavalier-Smith and Chao [1996] and Kumar and Rzhetsky [1996]). Importantly, similar results were reproduced by early phylogenetic studies based on the amino acid sequences of translation elongation factors 1 and 2 (EF-1 and EF-2) (Kamaishi et al. 1996a, 1996b). Taken together, these results appeared to suggest that microsporidia were one of the earliest offshoots in eukaryotic evolution, diverging from the main trunk of eukaryotic lineages before the mitochondrial endosymbiosis.
However, in the past decade, this evolutionary scenario has drastically changed. Nuclear genes encoding proteins of mitochondrial origin were found in microsporidia, strongly suggesting the secondary loss of typical mitochondria in this lineage (Germot, Philippe, and Le Guyader 1997; Hirt et al. 1997; Peyretaillade et al. 1998; Arisue et al. 2002). Katinka et al. (2001) identified additional genes for mitochondrial proteins in the completed genome of the microsporidian, Encephalitozoon cuniculi. In line with these findings, Williams et al. (2002) recently ultrastructurally identified tiny, relic mitochondria in the microsporidian Trachipleistophora hominis. Secondly, further analyses of protein sequences (except elongation factors) have indicated the evolutionary affinity between microsporidia and fungi (reviewed in Keeling and Fast [2002]). In particular, relatively strong support for a relationship between microsporidia and fungi ("M + F") is recovered in phylogenies based on tubulins (Edlind et al. 1996; Keeling and Doolittle 1996; Keeling, Luker, and Palmer 2000; Keeling 2003), the largest subunit of RNA polymerase II (Hirt et al. 1999), seryl-tRNA synthetase, transcription initiation factor IIB, a subunit of the vacuolar ATPase and a GTPase (Katinka et al. 2001). Multigene phylogenies that include microsporidia also recover the M + F connection with high support values (Baldauf et al. 2000; Hashimoto, Arisue, and Hasegawa 2002). In sum, results accumulating since the late 1990s strongly suggest that microsporidia are fungi or close relatives of fungi. The basal position of microsporidia in SSU rRNA and elongation factor phylogenies has been rationalized as an artifact resulting from "long-branch attraction" (LBA), whereby the fast-evolving microsporidian and the long branch leading to the archaebacterial sequences were artifactually attracted to one another in tree reconstruction (e.g., Embley and Hirt [1998] and Philippe and Laurent [1998]). Under this hypothesis, the relatively simple architecture of microsporidian cells can be interpreted as the result of extensive adaptation to their parasitic lifestyle.
Unlike unweighted maximum-parsimony (MP), maximum-likelihood (ML) methods are generally expected to be statistically consistent if the Markov model of sequence change closely matches reality (for discussion, see Felsenstein [2003]). Simulation studies concur with this result, indicating that ML methods are less susceptible to LBA with finite data than MP or distance methods (e.g., Hasegawa and Fujiwara [1993] and Swofford et al. [2001]). However, in practice, neither infinite data nor perfectly correct models are available, and ML methods can yield biased estimates of phylogenies. For instance, ML methods display sensitivity to LBA when the evolutionary model is violated by the data, such as failure to model among-site rate variation (ASRV) (e.g., Sullivan and Swofford [2001]). Therefore, microsporidian phylogenies estimated under simple models that assume an equal rate of evolution among sites may be severely affected by LBA. Indeed, the EF-1 ML analyses under a more realistic model that takes into account ASRV failed to statistically reject the M + F relationship, although the basal position of microsporidia was still the ML estimate (Hirt et al. 1999; Hashimoto, Arisue, and Hasegawa 2002). A similar trend was observed in concatenated SSU plus LSU rRNA gene phylogenies (Philippe and Germot 2000). Importantly, the LSU rRNA (rooted) distance phylogeny and EF-2 (unrooted) distance and ML phylogenies under an ASRV model actually recovered the M + F relationship, albeit with weak support values (Hirt et al. 1999; Van de Peer, Ben Ali, and Meyer 2000; Keeling and Fast 2002). These results consistently suggest that the fast-evolving microsporidian sequences incorrectly branch at the base of eukaryotes because of LBA.
ASRV models are thought to be essential for phylogenetic analyses of problematic data sets, including fast-evolving sequences (e.g., Philippe and Laurent [1998]). However, these models are obviously insufficient to describe the complex processes of amino acid and nucleotide evolution. One of the crucial processes ignored in ASRV models is across-tree site rate variation (XTSRV), also called "heterotachy" by some authors (Lopez, Casane, and Philippe 2002). Functional and/or structural constraints on a biological molecule can be changed over evolutionary time, so that the rates at sites may vary across the tree (i.e., among the sequences in an alignment). This evolutionary process was first described as the concomitantly variable codon (covarion) model that allows a site to shift between two states, invariable ("off") or variable ("on"), in a protein over time (Fitch and Markowitz 1970). Explicit formulations of Markov models of covarion evolution have only recently been conceived (Tuffley and Steel 1998; Penny et al. 2001). The original covarion and ASRV models have also been merged into "covarion-like" models for both protein and nucleotide evolution that, rather than having simply invariable and variable sites, allow sites to "switch" between a number of different site rates across subtrees (reviewed in Penny and Hasegawa [2001]). The covarionlike models in Tuffley and Steel (1998), Penny et al. (2001), Galtier (2001), Lopez, Casane and Philippe (2002) and Huelsenbeck (2002) assume XTSRV to be a stochastic process varying across both sites and subtrees with frequent changes in rates throughout the larger tree. Although rates do vary across both sites and subtrees in these models, the process is stationary, implying that the marginal distributions of rates in any two subtrees will be the same. This stationarity of the frequency of changes throughout the larger tree does not provide a natural model for biological molecules subjected to a "covarion shift," where many sites experience a shift in evolutionary rates within a short period of time associated with a modification in functional and/or structural properties (Inagaki et al. 2003). The XTSRV in covarion shift models (e.g., modeled by a bivariate rate distribution in Susko et al. [2002]) should allow different marginal distributions in rates for subtrees on either side of the shift, and the rate shifts should occur with lower frequency in evolutionary time than stochastic covarion drift. Although the two types of covarion-like models are intrinsically different, they are not mutually exclusive and both serve to make some adjustment for covarionlike effects.
In this study, we investigated the impact of XTSRV in the EF-1 data set on the phylogenetic position of microsporidia. We divided the data into the archaebacterial and eukaryotic sub-trees, and identified the sites associated with a large difference in evolutionary rates across the subtrees ("SRN" sites) estimated by the bivariate rate ML method (Susko et al. 2002). The difference in log-likelihood (lnL) between the two tree topologies representing the competing hypotheses of (1) microsporidia as an early diverging eukaryotic lineage versus (2) microsporidia as relatives of fungi then was calculated based on the data sets excluding SRN sites. Similar analyses based on data sets excluding random sites (jackknifing) were then conducted to evaluate the significance of SRN site exclusions. Finally, the data sets with and without SRN sites were analyzed using the ML method.
Philippe and Germot (2000) previously investigated the impact of the "covarion structure" on putative LBA artifacts in a concatenated SSU plus LSU rRNA data set using a similar approach to ours. The authors excluded the "covarion" sites identified by a MP-based method (Lopez, Forterre, and Philippe 1999), and tested whether this procedure lowered the support for putative LBA artifacts such as the basal branching of diplomonads (e.g., Giardia), Trichomonas (a parabasalid), and microsporidia (Philippe and Germot [2000]). However, we have identified two potential deficiencies in these analyses. First, this pioneering work lacks control experiments that would allow the assessment of "covarion" site deletion on support for potential LBA artifacts (i.e., jackknifing of random sites). Another difficulty arises from the tripartition of the data (i.e., archaebacteria, "basal branching," and "crown" eukaryotes) to identify the covarion sites. Therefore, it is unclear which partition of the rRNA data bears the significant rate difference that allegedly produces the LBA artifacts.
By contrast, our results unequivocally illustrate that the failure to model covarion shifts across the archaebacterium-eukaryote split in the data results in an artifactual affinity between the long branches leading to the archaebacterial (outgroup) and microsporidian sequences—a kind of LBA that originates from model misspecification. Furthermore, we detect similarity between the patterns of site rates for the microsporidian and archaebacterial sequences ("parallel site rate variation") relative to eukaryotic homologs, and propose the loss of some eukaryotic EF-1–specific nontranslational functions in the microsporidian lineage to explain this "convergent" property of the two EF-1 subtrees.
Materials and Methods
Data Sets
The experimental details for the isolation of an EF-1 gene from a microsporidian Antonospora locustae (Nosema locustae was renamed to Antonospora locustae; see Slamovits, Williams, and Keeling [2004]) were described in Roger (1996). The novel DNA sequence of Antonospora EF-1 was deposited under DDBJ/EMBL/GenBank accession number AY452734.
We manually aligned the deduced EF-1 amino acid sequences of 22 phylogenetically divergent eukaryotes (19 "short-branch" eukaryotes [SB eukaryotes] and three microsporidia) and 20 archaebacteria. The accession numbers of the sequences considered in this study are listed in the caption of figure 1. To focus on LBA between the microsporidian and archaebacterial EF-1 sequences, other fast-evolving eukaryotic sequences (e.g., those from diplomonads and parabasalids) were not considered in this study. We excluded ambiguously aligned sites and sites that included gaps, creating the "Micro" data set consisting of 42 sequences with 349 sites (table 1). Subsequently, the "Basic" data set was generated from the Micro data by exclusion of the microsporidian sequences for the estimation of conditional mode rates at sites (site rates) and evaluating the rate distribution difference across eukaryotic and archaebacterial subtrees (see below). Smaller data sets, "Micro*" and "Basic*," were derived from the Micro and Basic data sets by exclusion of all archaebacterial sequences except those of Methanosarcina and Aeropyrum, respectively (table 1). These data sets were subjected to computationally intensive ML phylogenetic analyses.
FIG. 1. (A) A histogram of the absolute value of SR calculated between the SB-eukaryotic and archaebacterial subtrees in the Basic data set. Arrows indicate the ranges of top 5% to 40% absolute SR. The absolute value SR range, from which 245 sites were randomly jackknifed to generate the "JK*" data sets, is shaded. (B) The tree topologies examined for the phylogenetic positions of microsporidia. The archaebacterial clade is omitted. The microsporidian clade was attached to the edge A or B, creating the "M + A" or "M + F" tree topology, respectively. Accession numbers for the sequence considered are as follows: Sulfolobus acidocaldaris, X52382; S. solfataricus, X70701; S. tokodaii, BAB65236; Desulfurococcus mobilis, X73582; Aeropyrum pernix, AP000062; Pyrobaculum aerophilum, U94347; uncultured crenarchaeote 74A4, AF393466; Thermococcus celer, X52383; Pyrococcus horikohii, AP000006; P. abysii, AJ248285; P. woesei, X59857; Archaeoglobus fulgidus, AE001039; Methanococcus jannaschii, U67486; M vannielii, X05698; Methanosarcina barkeri, ZP_00079008; Ferroplasma acidarmanus, ZP_000005; Halobacterium sp., AAG20682; H. halobium, D32120; Haloarcula marismotui, X16677; Thermoplasma acidophilum, AL445064; Homo sapiens, BC002845; Caenorhabditis elegans, AAA81688; Schizosaccharomyces pombe, D82571; Podospora anserina, X74799; Arabidopsis thaliana, AB026655; Nicotiana tabacum, D63396; Euglena gracilis, X16890; Trypanosoma brucei, U10562; Acrasis rosea, AF190771; Dictyostelium discoideum, X55972; Mastigamoeba balamuthi, a contig of BE636534, BE636560, BE636596, and BE636646; Entamoeba histolytica, M92073; Porphyra purpurea, U08844; Plasmodium knowlesi, AJ224153; Cryptosporidium parvum, U71180; Stylonychia lemnae, X57926; Blastocystis hominis, AF091355; Phytophthora capsici, AF109666; Cyanophora paradoxa, AF092951; Antonospora (Nosema) locustae, AY452734; Glugea plecoglossi, BAA06865; Encephalitozoon cuniculi, CAD25298
Table 1 Details of the Data Sets Used in This Study.
Site rates of the SB-eukaryotic and archaebacterial subtrees in the Basic data set were estimated by the bivariate ML rate method with the PAM001 amino acid substitution model using Bivar (Susko et al. 2002). In this method, ASRV in the two subtrees was described by a bivariate discrete ML rate distribution with a matrix of 25 x 25 equally spaced rate categories. The upper limit for rates was set as 6.0. The probability of each rate category was estimated from the data.
Absolute values of the difference in bivariate site rates (SR) across the SB-eukaryotic and archaebacterial subtrees were calculated, and then eight sets of sites with top 5% to 40% absolute SR with 5% interval were defined (fig. 1A). Henceforth, the sites that associate with top N% SR across the two subtrees in the Basic data set, as well as their homologous positions in other data sets, are designated as the SRN sites (N = 5, 10, 15, 20, 25, 30, 35, or 40). The SRN sites were excluded from the Micro data set, generating eight "Micro-SRN" data sets. These data sets were subjected to the examination of the impact of SRN sites on the phylogenetic position of microsporidia (see below). Likewise, the "Micro*-SR15," "Basic*-SR15," "Basic-SR15," and "Basic-SR30" data sets were further generated from the Micro*, Basic*, and Basic data sets by exclusion of the SR15 or SR30 sites. The first two data sets were used for ML phylogenetic analyses, and the last two were used for evaluation of the rate distribution difference across the archaebacteria-eukaryote split (see below). The details of the data sets described above are listed in table 1.
Phylogenetic Analyses
All distance analyses in this study were conducted as follows: ML distance matrices were calculated under the JTT amino acid substitution model with ASRV approximated using a gamma () distribution with eight equally probable discrete rate categories (JTT+ model) in Tree-Puzzle version 5.0 (Schmidt et al. 2002). Shape parameters (s) for distributions were estimated from each data set (table 1). Phylogenetic trees were reconstructed from ML distance matrices using the Fitch-Margoliash weighted least squares (Fitch) method implemented in PHYLIP version 3.6a (Felsenstein 1993) with the input sequence order randomized five times and global rearrangements. Distance bootstrap analyses (500 replicates) were conducted using Seqboot, Fitch, Consense in PHYLIP (Felsenstein 1993), Tree-Puzzle (Schmidt et al. 2002), and Puzzleboot version 1.02 (A. J. Roger and M. E. Holder, http://hades.biochem.dal.ca/Rogerlab/Software/Software.html). The parameters and settings for the bootstrap analyses were the same as described above, except a single randomized input sequence order per replicate was performed.
ML trees were reconstructed under the JTT+ model using ProML in PHYLIP (Felsenstein 1993) with global rearrangements and the input sequence order randomized five times. ASRV was modeled using discrete approximation of distributions (eight equally probable categories). All parameters required were estimated from the data using Tree-Puzzle (Schmidt et al. 2002). The parameters and settings for the ML bootstrap analyses (100 replicates) were the same as described above, except (1) a single randomized input sequence order per replicate was performed in the tree search, and (2) ASRV was modeled using discrete distributions with four equally probable rate categories.
Parametric Bootstrap Tests for the Overall Rate Distribution Difference Across Two Subtrees
We have previously established a parametric bootstrap test to examine the rate distribution difference between two subtrees in a combined data set—the overall rate distribution difference is represented by the sum of absolute values of "weighted" SR (henceforth designated as "abrsum") (Susko et al. 2002). The observed value is compared with a null distribution from 1,000 Monte Carlo simulated data sets to evaluate its significance. In this study, the rate distribution difference across the SB-eukaryotic and archaebacterial subtrees was assessed using the Basic, Basic-SR15, and Basic-SR30 data sets (table 1). The details of the parametric bootstrap test are described in Susko et al. (2002).
Comparison of the Two Hypotheses for the Phylogenetic Position of Microsporidia
We generated tree topologies to test two alternative hypotheses for the phylogenetic position of microsporidia: the "M + A" hypothesis (under which microsporidia are one of the earliest diverged lineages among the extant eukaryotes) and the "M + F" hypothesis (under which microsporidia branch with or among fungi). First, a distance phylogenetic tree was reconstructed under the JTT+ model from the Basic data set as described above. The microsporidian clade was then attached to the edge A or B in the optimal distance tree topology (see figure 1B), representing the M + A or M + F hypothesis, respectively. The log-likelihoods (lnLs) of the M + A or M + F trees were calculated from the Micro and Micro-SRN data sets by the ML method under the JTT+ model using Tree-Puzzle (Schmidt et al. 2002). The parameters and branch lengths for each tree were re-estimated from each data set. Difference in the lnLs of the two trees (lnL) was then calculated by subtraction of the former from the latter, assessing the dominance of the M + A topology over the other.
We generated, at each of eight values of N (N = 5, 10, 15, 20, 25, 30, 35, or 40 [see above]), 100 "JK" data sets by randomly deleting (jackknifing) the same number of sites as the SRN sites, using Jacksite (H. Philippe, Université de Montreal). Similarly, we prepared another eight sets of 100 "JK*" data sets as follows. The 245 alignment sites in the Micro data set that correspond neither to the constant sites nor to the SR10 sites in the Basic data set were jackknifed randomly to generate JK* data sets (shaded area in figure 1A). The lnLs between the M + A and M + F trees were calculated from the JK and JK* data sets (null distributions) and compared with the corresponding values obtained from the SRN site exclusion (observed values).
Estimation of Site Rate Pattern of EF-1 Subtrees
We investigated the difference in site rate pattern (1) between the SB-eukaryotic and archaebacterial subtrees, (2) between the SB-eukaryotic and microsporidian subtrees, and (3) between the archaebacterial and microsporidian subtrees. SR between two subtrees and their 95% confidence intervals were estimated using the bivariate ML method (Susko et al. 2002). Because large errors were expected in the estimation of microsporidian site rates as a result of the poor sequence sampling (only three sequences are available for this study), we considered only sites associating with significant SR ("SRS" sites), for which 95 % confidence intervals do not contain zero. The absolute values of these significant SR sites were summed to represent the "overall rate distance" between two subtrees (henceforth designated as "arsum").
Results and Discussion
Robustness of the ML Method Against the Artifactual Microsporidia Plus Archaebacteria Grouping
Microsporidia were estimated to be the earliest diverging lineage among eukaryotes considered (M + A grouping) in the optimal distance trees based on both Micro and Micro* data sets, which differ from each other only in the sampling of archaebacterial sequences (20 versus 2, respectively [table 1]). Likewise, the distance bootstrap analyses on the two data sets display similar results, except the monophyletic clades of the animal sequences and of the euglenozoan sequences (the first two columns in table 2). These results indicate that the selection of archaebacterial sequences has no large impact on the overall relationship among eukaryotic sequences. Thus, the smaller Micro* data set was used for more stringent phylogenetic analyses using the ML method under the JTT+ model.
Table 2 Impact of Phylogenetic Methods and SR15 Site Exclusion on BP Values.
The M + A grouping was recovered in the optimal ML tree based on the Micro* data set (fig. 2A) in accordance with previously published works assessing a microsporidian (Glugea plecoglossi) EF-1 sequence (Kamaishi et al. 1996a, 1996b). The M + A grouping receives relatively high support (BP = 83%) in the distance analysis, whereas the ML analysis gives very weak support (BP = 34%) to the same relationship (fig. 2A). These results clearly indicate that the ML method under the JTT+ model is less susceptible to LBA producing the M + A grouping in the EF-1 data than is the distance method under the same model.
FIG. 2. Phylogenetic analyses of the microsporidian EF-1 sequences. (A) An ML tree from the Micro* data set (349 sites) rooted by the archaebacterial sequences. Only ML BP values over 50% are shown. The BP values for particular nodes are also listed in table 2. (B) An ML tree based on the Micro*-SR15 data set (298 sites) rooted by the archaebacterial sequences. The details were as described in (A)
The ML or distance bootstrap analyses under the JTT+ model fail to resolve deep level relationships among eukaryotic lineages with BPs greater than 50%, except the M + A grouping in the latter analysis (figure 2A and the second and third columns in table 2). Significantly, no other large incongruities between the BP values from the ML method and the corresponding values from the distance method were observed for other major eukaryotic clades recovered, indicating that the ML bootstrap analysis is not overly stringent (the first three columns in table 2).
A Covarion Shift Across the SB-Eukaryotic and Archaebacterial Subtrees
As discussed above, the optimal ML tree reconstructed under the JTT+ model has the artifactual M + A grouping (fig. 2A). Unless significant model misspecification is present, the ML method may be robust against classical LBA from coincidental concurrence of character states between two distantly related fast-evolving lineages. In addition, we detect no significant homoplasy of amino acid characters across the microsporidian and archaebacterial sequences in the EF-1 data set (see Supplementary Material online). Therefore, our phylogenetic analyses under the JTT+ model most probably fails to account for a crucial aspect of the evolutionary process that generated the data.
Our group previously reported a significant rate distribution difference (covarion shift) between the archaebacterial and eukaryotic subtrees in the combined EF-1 tree (Susko et al. 2002; Inagaki et al. 2003). This covarion shift clearly violates the assumption of the JTT+ model that requires that the rates at sites be stationary over the tree. Therefore, we hypothesized that the failure to account for the covarion shift across the two EF-1 subtrees in the phylogenetic analyses was responsible for the artifactual M + A grouping.
We reassessed the covarion shift across the archaebacterium-eukaryote split in the Basic data sets using parametric bootstrap tests (Susko et al. 2002; Inagaki et al. 2003). The observed abrsum value appears to fall outside of the bootstrap distribution (P < 0.001, 5.7 standard errors [SE] away from the mean of bootstrap values [fig. 3A]), suggesting that the overall rate distributions in the two subtrees are significantly different. However, the distance between the observed value and the mean of bootstrap values appears to be reduced to 3.2 and 2.6 SE by both exclusions of the SR15 and SR30 groups of sites respectively, although both P values are still less than 0.01 (fig. 3B and C). These results indicate that the magnitude of the covarion shift across the two subtrees can be mitigated by SRN site exclusion.
FIG. 3. (A–C) Rate distribution differences across the SB-eukaryotic and archaebacterial subtrees. The significance of the overall rate distribution difference across the SB-eukaryotic and archaebacterial subtrees was examined using parametric bootstrap tests (Susko et al. 2002). The results from the analyses based on the Basic, Basic-SR15, and Basic-SR30 data sets are shown in (A), (B), and (C), respectively. The histograms and arrows indicate the bootstrap distributions and observed values, respectively. (D–E) Impact of the SRN site exclusion on the M + A grouping. Log-likelihoods of the M + F and M + A trees was calculated from each of the Micro-SRN data sets, and then the former was subtracted from the latter. The bootstrap distributions generated from 100 JK and JK* data sets are presented using box plots in (D) and (E), respectively. The observed values are indicated by diamond symbols. In the case that the null hypothesis of lowering of lnL solely by the reduction of data was rejected at the 1% or 5% level, the observed values are indicated by open or double diamond symbols, respectively. The lnL = 12.58 from the Micro data set (no site exclusion) is indicated by an arrow on the Y-axis. Details of the two tree topologies considered are described in figure 1B
Impact of SRN Site Exclusion on the Microsporidia Plus Archaebacteria Grouping
The hypothesis that the artifactual M + A grouping was caused by the covarion shift in the EF-1 data was assessed by comparing the M + A and M + F tree topologies using lnLs ("tree lnL comparison"). If our working hypothesis is correct, the dominance of the M + A tree topology over the alternative should be significantly reduced by the SRN site exclusions that diminish the magnitude of the covarion shift (see figures 3A–C).
Without any SRN site exclusions, the M + A tree appears to be better than the alternative tree by 12.58 lnL units (arrow on the Y-axis in figures 3D and E). The lnLs appear to drop from 12.58 to 1.14 lnL units by the progressive exclusion of the SRN sites (figs. 3C and D). In comparison between each observed value and the corresponding null distribution from 100 JK data sets, a significant decrease in 1nL (P < 0.05) was obtained after the exclusion of the SR10 sites (open diamond symbols in figure 3D). P < 0.01 was obtained by the exclusion of the SR35 and SR40 sites (double diamond symbols in figure 3D). We can conclude that the decrease in the lnL because of the exclusion of SRN sites is greater than one would expect based on random deletion.
The SRN sites never contain any of 70 constant sites among 19 SB-eukaryotic and 20 archaebacterial sequences (20% of 349 analyzed sites in the Basic data set). Therefore, to mimic this effect, we created the JK* data sets that contain the sites corresponding to the constant and SR10 sites in the Basic data set. However, the jackknife values calculated from the JK* data sets appear to be similar to those from the JK data sets (fig. 3E). P < 0.01 was obtained after the exclusion of the SR15 sites (double diamond symbols in figure 3E). The tree lnL comparisons clearly indicate that the sites associated with a covarion shift across the SB-eukaryotic and archaebacterial subtrees are responsible for the dominance of the artifactual M + A tree, independent of the jackknife distribution methods used (JK or JK*).
Impact of the SR15 Site Exclusion on ML Phylogeny
We further investigated the impact of the covarion shift on ML phylogenetic analyses using the Micro*-SR15 data set. The optimal ML topology appears to have no M + A grouping (fig. 2B), suggesting that the SR15 site exclusion sufficiently eliminates the source of the LBA uniting the microsporidian and archaebacterial sequences in the Micro* data set. Significantly, much of the phylogenetic information in the original data may remain after the SR15 site exclusion because the ML BP values for the major eukaryotic clades were not drastically changed by the presence or absence of the SR15 sites, except the bootstrap support for the animal plus fungi clade reduced nearly 20% by the site exclusion (the third and fourth columns in table 2). Likewise, these support values appear not to be affected by the presence or absence of the microsporidian sequences, except the apicomplexan clade (compare the third and fifth column or the fourth and sixth columns in table 2). Unfortunately, none of the sequences in the data displays strong affinity (BP > 50%) to the microsporidian sequences in the ML bootstrap analysis (fig. 2B). The true phylogenetic signal in the microsporidian sequences has probably been erased by their extremely high evolutionary rates. It is also possible that the number of phylogenetically informative sites may be relatively small in this data set. Taken together the results from the tree lnL comparisons and ML phylogeny on the Micro*-SR15 data set, we conclude that the artifactual M + A grouping in the EF-1 phylogenies originates from the failure to modeling the covarion shift across the eukaryotic and archaebacterial subtrees.
Our strategy of compensating for a covarion shift in the EF-1 data set is generally applicable to rooted phylogenies. Regardless of the phylogenetic marker or the data type (amino acid or nucleotide sequences), the difference in site rate distribution across ingroup and outgroup sequences may be significant. When the basal branching of a fast-evolving sequence is suspected to be an artifact and an alternative hypothesis is available (e.g., M + A versus M + F tree topologies in this study), one should examine whether model misspecification from covarion shift biases phylogenetic estimates using the strategy described above. The identification and exclusion of sites that exhibit a significant XTSRV across the ingroup and outgroup sequences by various methods (e.g., those described in this study and in Lopez, Forterre, and Philippe [1999]) should be carried out routinely for "deep" phylogenies to compensate or prevent potential artifacts from covarion shifts.
It would be intriguing to assess whether the basal position of microsporidia in SSU rRNA phylogenies is a phylogenetic artifact caused by the covarion shift across the eukaryotic and prokaryotic (outgroup) subtrees using the same strategy developed in this study. If SSU rRNA phylogenies are biased by the failure to model of the covarion shift across the ingroup and outgroup subtrees in a SSU rRNA phylogeny, we can expect significant diminishment of the support for the basal position of microsporidia by SRN exclusion across the two SSU rRNA subtrees. Likewise, the fast-evolving EF-1 sequences from diplomonads and/or parabasalids may be misplaced at the base of the eukaryotic subtree by the same model misspecification. Nevertheless, it is difficult to assess whether the basal positions of diplomonads and parabasalids in the EF-1 phylogeny are artifacts, because, currently, there are no clearly supported alternative hypotheses for their phylogenetic positions.
Parallel Site Rate Variation in EF-1
The rate heterogeneity in the data set including SB-eukaryotic EF-1 sequences ("SB-euk" data set [table 1]) is more extreme than that in the data set including both SB-eukaryotic and microsporidian orthologs ("SB-euk plus Micro" [table 1]); the distribution shape parameters (s) for the former and latter data sets are estimated to be 0.33 and 0.74, respectively. Similarly, the degree of rate heterogeneity in the SB-euk data set ( = 0.33) and that in the data set including 20 archaebacterial sequences are largely different ( = 0.74 ["Arch" in table 1]). These observations circumstantially suggest that the overall evolutionary mode of the microsporidian subtree is more similar to that of the archaebacterial subtree than to that of the SB-eukaryotic subtree. Thus, it is possible that the eukaryote-microsporidian split represented a second covarion shift in the data. To examine the potential parallel variation of evolutionary modes in the EF-1 phylogeny, we estimated and compared the site rates of the three subtrees.
We detected 44 SRS sites with relatively large SR between the archaebacterial and SB-eukaryotic subtrees (arsum = 75.25 [fig. 4A]), corresponding to significant covarion shift identified by the parametric bootstrap test shown in figure 3A. Interestingly, 75 SRS sites with arsum = 58.15 were detected between the SB-eukaryotic and microsporidian subtrees, whereas only 27 SRS sites with arsum = 19.71 were found between the archaebacterial and microsporidian subtrees (figs. 4B and C). These results led us to propose a probable scenario for EF-1 evolution in archaebacteria and eukaryotes. First, the EF-1 site rate pattern was largely differentiated at the archaebacterium-eukaryote transition by acquisition of a number of nontranslational (auxiliary) functions in the eukaryote proteins. Then, after microsporidia split from fungi, extreme reduction of the EF-1 functions occurred in the former lineage, causing in the site rate pattern of the microsporidian EF-1 orthologs to change again. In this view, microsporidian and archaebacterial EF-1 proteins could convergently share relatively similar site rate distributions, a phenomenon we call "parallel site rate variation" or "PSRV."
FIG. 4. Site rate pattern comparisons. (A) SB-eukaryotic (abbreviated as "SB-euk") EF-1 versus archaebacterial ("Arch") EF-1. (B) SB-eukaryotic EF-1 versus microsporidian ("Micro") EF-1. (C) Archaebacterial EF-1 versus microsporidian EF-1. Only sites associated with significant SR are plotted. Hash marks and vertical bars indicate SR and their 95% confidence intervals, respectively
This scenario is congruent with our knowledge about eukaryotic EF-1 functions and biology of microsporidia. Eukaryotic EF-1 is known to have auxiliary functions that are most probably absent in the archaebacterial orthologs, as well as its core function as a translation elongation factor. For instance, biochemical/genetic studies using SB-eukaryotes (e.g., Dictyostelium and ciliates) indicated that eukaryotic EF-1 binds to cytoskeletal proteins and calmodulin and is involved in the ubiquitin-dependent proteolytic system (reviewed in Negrutskii, and El'skaya [1998]). It is also likely that additional (potentially many) eukaryotic EF-1–specific protein-protein interactions remain unidentified. Taken together, it is rational to assume that some sites in the SB-eukaryotic proteins are strictly constrained by the eukaryotic EF-1–specific auxiliary functions, whereas the corresponding sites in the archaebacterial proteins are free to vary because of the primary absence of these functions as schematically drawn in figure 5.
FIG. 5. A scheme for the parallel site rate variation hypothesis between the microsporidian and archaebacterial EF-1. EF-1 in archaebacteria (abbreviated as "Arch," top left) is constrained solely by the core function related to translation (primary lack of nontranslational [auxiliary] functions). The area in EF-1 constrained by this core function is highlighted in gray. At the archaebacterium-eukaryote transition (numeral 1), EF-1 acquired auxiliary functions (top right). The areas constrained from the SB-eukaryotic ("SB-euk") EF-1–specific auxiliary functions are shown in black spots. During the reduction of functionality of microsporidian cellular system (numeral 2), some auxiliary functions in microsporidian ("Micro") EF-1 were lost or weakened (bottom). Steps during eukaryotic evolution are shaded
In contrast to eukaryotes in general, the entire cellular system of microsporidia (including the nuclear genome structure; see Katinka et al. [2001]) is highly reduced, most probably because of their intracellular parasitic lifestyle (Keeling and Fast 2002). Likewise, some of the EF-1 auxiliary functions may have been lost/weakened during the reductive evolution of microsporidia (schematically drawn in figure 5). Actually, 62 SRS sites with small negative SR detected in the site rate pattern comparison between the SB-eukaryotic and microsporidian subtrees confirms this notion (fig. 4B). At these alignment positions, the amino acid character appears to be constant in the SB-eukaryotic subtree but variable in the microsporidian subtree (data not shown), indicating that the microsporidian proteins experience less functional constraints than the SB-eukaryotic orthologs. As the core translational functions of EF-1 are absolutely indispensable for cell viability and cannot be reduced, secondary loss of the auxiliary functions in the microsporidian proteins is the most plausible explanation (fig. 5). Under the parallel absence of eukaryotic EF-1–specific auxiliary functions in microsporidian and archaebacterial EF-1 proteins, the two subtrees could have evolved a relatively similar rate distribution independently.
Obviously, this supposed PSRV in the EF-1 data set should be examined by more accurate site rates estimated from a larger set of microsporidian EF-1 sequences in future studies. In-depth studies of the auxiliary functions of both SB-eukaryotic and microsporidian EF-1 proteins would also be essential for testing this hypothesis.
Conclusion
Here we have shown that the covarion shift across the in-group and out-group sequences most likely is responsible for the artifactual "deep" branching position of microsporidia in EF-1 phylogenies. Furthermore, we detected PSRV between the microsporidian and archaebacterial EF-1 sequences. Covarion shifts, and, in particular, PSRV, need to be pursued as intriguing potential sources of artifacts in many deep phylogenetic inferences. Indeed, a study undertaken by some of us shows analytically that under simple Markov models of sequence change, neighbor-joining and least squares distance methods will succumb to the LBA form of inconsistency when an ASRV model is employed if significant PSRV is occurring (Susko, Inagaki, and Roger, 2004). To further investigate the phylogenetic artifacts that arise from PSRV, studies on data simulated with a more complex protein evolutionary models employing both ASRV and XTSRV will be key.
Supplementary Material
EF-1 alignment considered in this study is shown in the Supplementary Material online.
Acknowledgements
We thank J. Leigh for critical reading and the members of Statistical and Evolutionary Bioinformatics Group (Dalhousie University, Halifax, Canada) for valuable discussions. We also thank H. Philippe (Université de Montreal, Montreal, Canada) for coding the Jacksite program. Y.I. is supported by the "Understanding Prokaryote Genome Diversity and Evolution Project" of Genome Atlantic. A.J.R. is supported as a Scholar by the CIAR program in Evolutionary Biology. This work is supported by operating 227085-00 from NSERC (awarded to A.J.R.). This collaboration is part of a Genome Atlantic/Genome Canada large-scale project.
Literature Cited
Arisue, N., L. B. Sanchez, L. M. Weiss, M. Muller, and T. Hashimoto. 2002. Mitochondrial-type hsp70 genes of the amitochondriate protists, Giardia intestinalis, Entamoeba histolytica and two microsporidians. Parasitol. Int. 51:9-16.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977.
Cavalier-Smith, T., and E. E. Chao. 1996. Molecular phylogeny of the free-living archezoan Trepomonas agilis and the nature of the first eukaryote. J. Mol. Evol. 43:551-562.
Edlind, T. D., J. Li, G. S. Visvesvara, M. H. Vodkin, G. L. McLaughlin, and S. K. Katiyar. 1996. Phylogenetic analysis of beta-tubulin sequences from amitochondrial protozoa. Mol. Phylogenet. Evol. 5:359-367.
Embley, T. M., and R. P. Hirt. 1998. Early branching eukaryotes? Curr. Opin. Genet. Dev. 8:624-629.
Felsenstein, J. 1993. Phylogeny inference package. Version 3.2. Cladistics 5:166.
Felsenstein, J. 2003. Likelihood method. Pp. 248–274 in Inferring phylogenies. Sinauer Associates, Sunderland, Mass.
Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579-593.
Galtier, N. 2001. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol. Evol. 18:866-873.
Germot, A., H. Philippe, and H. Le Guyader. 1997. Evidence for loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasitol. 87:159-168.
Hasegawa, M., and M. Fujiwara. 1993. Relative efficiencies of the maximum likelihood, maximum parsimony and neighbor-joining methods for estimating protein phylogeny. Mol. Phylogenet. Evol. 2:1-5.
Hashimoto, T., N. Arisue, and M. Hasegawa. 2002. Application of molecular phylogenetic inference and associated problems: illustrative data analysis on early eukaryotic evolution. Proc Inst. Stat. Math. 50:45-68.
Hirt, R. P., B. Healy, C. R. Vossbrinck, E. U. Canning, and T. M. Embley. 1997. A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular evidence that microsporidia once contained mitochondria. Curr. Biol. 7:995-998.
Hirt, R. P., J. M. Logsdon, Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580-585.
Huelsenbeck, J. P. 2002. Testing a covariotide model of DNA substitution. Mol. Biol. Evol. 19:698-707.
Inagaki, Y., C. Blouin, E. Susko, and A. J. Roger. 2003. Assessing functional divergence in EF-1 and its paralogs in eukaryotes and archaebacteria. Nucleic Acids Res. 31:4227-4237.
Ishihara, R., and Y. Hayashi. 1968. Some properties of ribosomes from the sporoplasm of Nosema bombycis. J. Invert. Pathol. 11:377-385.
Kamaishi, T., T. Hashimoto, Y. Nakamura, Y. Masuda, F. Nakamura, K. Okamoto, M. Shimizu, and M. Hasegawa. 1996a. Complete nucleotide sequences of the genes encoding translation elongation factors 1 and 2 from a microsporidian parasite, Glugea plecoglossi: implications for the deepest branching of eukaryotes. J. Biochem. (Tokyo) 120:1095-1103.
Kamaishi, T., T. Hashimoto, Y. Nakamura, F. Nakamura, S. Murata, N. Okada, K. Okamoto, M. Shimizu, and M. Hasegawa. 1996b. Protein phylogeny of translation elongation factor EF-1 suggests microsporidians are extremely ancient eukaryotes. J. Mol. Evol. 42:257-263.
Katinka, M. D., S. Duprat, and E. Cornillot, et al. 2001 (17 co-authors). Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414:450-453.
Keeling, P. J. 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38:298-309.
Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:1297-1305.
Keeling, P. J., and N. M. Fast. 2002. Microsporidia: biology and evolution of highly reduced intracellular parasites. Annu. Rev. Microbiol. 56:93-116.
Keeling, P. J., M. A. Luker, and J. D. Palmer. 2000. Evidence from beta-tubulin phylogeny that microsporidia evolved from within the fungi. Mol. Biol. Evol. 17:23-31.
Kumar, S., and A. Rzhetsky. 1996. Evolutionary relationships of eukaryotic kingdoms. J. Mol. Evol. 42:183-193.
Lopez, P., D. Casane, and H. Philippe. 2002. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19:1-7.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496-508.
Negrutskii, B. S., and A. V. El'skaya. 1998. Eukaryotic elongation factor 1: structure, expression, functions and possible aminoacyl-tRNA channeling. Prog. Nucleic Acid Res. Mol. Biol. 60:47-78.
Penny, D., and M. Hasegawa. 2001. Covarion model of molecular evolution. Pp. 473–477 in S. Brenner and J. H. Miller, eds. Encyclopedia of genetics. Academic Press, San Diego.
Penny, D., B. J. McComish, M. A. Charleston, and M. D. Hendy. 2001. Mathematical elegance with biochemical realism: the covarion model of molecular evolution. J. Mol. Evol. 53:711-723.
Peyretaillade, E., V. Broussolle, P. Peyret, G. Metenier, M. Gouy, and C. P. Vivares. 1998. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin. Mol. Biol. Evol. 15:683-689.
Philippe, H., and A. Germot. 2000. Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution. Mol. Biol. Evol. 17:830-834.
Philippe, H., and J. Laurent. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616-623.
Roger, A. J. 1996. Studies on the phylogenetic and gene structure of early-branching eukaryotes. PhD Thesis. Dalhousie University, Halifax, Canada.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.
Slamovits, C. H., B. A. P. Williams, and P. J. Keeling. 2004. Transfer of Nosema locustae (Microporidia) to Antonospora locustae n. comb. based on molecular and ultrastructural data. J Euk. Microbiol. 51:270-213.
Sullivan, J., and D. L. Swofford. 2001. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated? Syst. Biol. 50:723-729.
Susko, E., Y. Inagaki, and A. J. Roger. 2004. On inconsistency of the neighbor-joining, lease squares and minimum evolutions estimation when substitution processes are incorrectly modeled. Mol. Biol. Evol. (in press).
Susko, E., Y. Inagaki, C. Field, M. E. Holder, and A. J. Roger. 2002. Testing for differences in rates-across-sites distributions in phylogenetic subtrees. Mol. Biol. Evol. 19:1514-1523.
Swofford, D. L., P. J. Waddell, J. P. Huelsenbeck, P. G. Foster, P. O. Lewis, and J. S. Rogers. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50:525-539.
Tuffley, C., and M. Steel. 1998. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147:63-91.
Van de Peer, Y., A. Ben Ali, and A. Meyer. 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246:1-8.
Vossbrinck, C. R., J. V. Maddox, S. Friedman, B. A. Debrunner-Vossbrinck, and C. R. Woese. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature 326:411-414.
Vossbrinck, C. R., and C. R. Woese. 1986. Eukaryotic ribosomes that lack a 5.8S RNA. Nature 320:287-288.
Williams, B. A., R. P. Hirt, J. M. Lucocq, and T. M. Embley. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418:865-869.(Yuji Inagaki*,, Edward Su)
濠电姷鏁告慨鐑藉极閹间礁纾块柟瀵稿Т缁躲倝鏌﹀Ο渚&婵炲樊浜濋崑鎰版偣閸ヮ亜鐨烘い锔诲幖閳规垿鎮╃紒妯婚敪濠电偛鐪伴崐婵嬨€佸鑸电劶鐎广儱妫涢崢鍗炩攽閻愭潙鐏ョ€规洦鍓熷鎼佸Χ閸℃劒绨婚梺鍐叉惈閸燁偊宕㈤幘顔界厸閻忕偛澧介埥澶愭煃閽樺妯€鐎规洩绻濋幃娆忣啅椤斾粙鐛庨梻鍌欑閹碱偊藝椤愶箑鐤炬繝濠傚悩濞戙垹绀冩い鏃囧閹芥洖鈹戦悙鏉戠仸閼裤倝鏌$€n亪鍙勯柣鎿冨亰瀹曞爼濡搁敃鈧惃鎴︽⒑缁嬫鍎愰柟姝屽吹缁骞掗弬鍝勪壕闁挎繂绨肩花浠嬫煕閺冩挾鐣辨い顏勫暣婵″爼宕卞Δ鈧ḿ鎴︽⒑缁嬫鍎愰柟鐟版喘瀵鈽夐姀鈥充汗閻庤娲栧ú銏ゅ焵椤掍礁绗氬ǎ鍥э躬瀹曪絾寰勬繝鍌ゆ綋闂備礁鐤囬~澶愬垂閸喚鏆﹂柛顐f礀閻撴﹢鏌i幇闈涘濞存粍鐟╁铏规嫚閺屻儺鈧绱掗悩鑼х€规洘娲熷畷锟犳倷瀹ュ棛鈽夐柍钘夘樀婵偓闁绘﹢娼ф慨锔戒繆閻愵亜鈧牜鏁幒妞濆洭顢涢悙鏉戔偓鍫曟煥閺冨倻鎽傛繛鍫滅矙閺岋綁骞囬浣叉灆闂佺ǹ瀛╁褰掋€冮妷鈺傚€烽柛娆忣槸閺嬬姴顪冮妶鍐ㄧ仾婵☆偄鍟悾鐑藉Ω閳哄﹥鏅i梺缁樺姍濞佳囧箹缁嬪簱鏀介柣妯虹仛閺嗏晠鏌涚€n剙鈻堟鐐存崌椤㈡棃宕卞Δ鍐摌闂備浇顕栭崹濂告倶閹邦優娲敂閸曨収鍚呴梻浣虹帛閿氶柛姘e亾缂備焦顨呴ˇ闈涱潖濞差亜绠伴幖杈剧悼閻g敻姊洪悷鏉挎Щ妞ゆ垵顦悾鐑芥偨缁嬪潡鍞跺┑鐘焺娴滄繂螞閸愵喖鏋侀柛灞剧矌妞规娊鎮楅敐搴濈盎濠碘€炽偢濮婄粯鎷呴崨濠冨枑闂佺ǹ顑囬崑銈夌嵁閹版澘绠柦妯侯槼閹芥洟姊洪崫鍕偍闁搞劌缍婇悰顕€濮€閵堝棛鍘搁梺鎼炲劗閺呮稑鐡┑鐘殿暯閳ь剝灏欓惌娆撴煛瀹€瀣М濠殿喒鍋撻梺瀹犳〃閼宠埖绂掗埡鍐=闁稿本姘ㄥ瓭闂佹寧娲︽禍顏堟偘椤旂晫绡€闁搞儯鍔嶅▍銏ゆ⒑缂佹〒鍦焊濞嗘挻鍋柍褜鍓氭穱濠囨倷椤忓嫧鍋撻弽顓炵濡わ絽鍟壕濠氭煟閺傛寧鎲告い鈺冨厴閺岀喖骞嶉纰辨毉闂佹娊鏀遍崹鍧楀蓟濞戙垺鏅滈悹鍥ㄥ絻缁楀矂姊洪懡銈呮瀾闁革絾娼欑叅妞ゅ繐瀚€瑜旈弻娑㈠Ψ椤旀儳甯ュ┑鐐跺亹閸犳牕顫忓ú顏勭闁告瑥顦伴崕鎾绘⒑閻熸澘鏆辨慨姗堝閹广垹顫濋懜纰樻嫽婵炶揪绲介幊娆掋亹閹烘垵鐝樺銈嗗笒閸婂鎯屽▎鎾寸厵闁绘垶锕╁▓鏇㈡煕婵犲倻浠涢柕鍥у瀵剟宕归瑙勫瘱缂備胶鍋撻崕鎶藉Χ閹间礁钃熼柨鐔哄Т缁€鍐煏婵炲灝鍔楅柛瀣尵缁瑩骞愭惔銏㈡婵犵數濮烽弫鍛婃叏椤撱垹绠柛鎰靛枛缁€瀣煕椤垵浜為柡澶婄秺閺岋絾鎯旈姀鈺佹櫛闂佸摜濮甸悧鐘诲箖閵夆晜鍋傞幖杈剧稻濞堟儳鈹戦濮愪粶闁稿鎸搁埞鎴﹀焺閸愵亝鎲欏銈忛檮婵炲﹪寮诲☉銏犵闁哄鍨规禒鍝ョ磽娴h櫣甯涚紒瀣崌閸┾偓妞ゆ帊鑳堕埊鏇炵暆閿濆懏鍋ラ柟顖氭湰缁绘繈宕戦懞銉︻棃闁诡喒鏅犲畷锝嗗緞鐎n亝鐎鹃梻鍌欐祰濡椼劎娆㈤敓鐘查棷闁挎繂顦埀顑跨窔瀵挳濮€閳╁啯鐝曢梻浣藉Г閿氭い锔跨矙閸┾偓妞ゆ帊鑳堕埥澶愭煃鐟欏嫬鐏撮柡浣哥Ч瀹曠喖顢曢敍鍕礋婵犵數濮幏鍐川椤旂晫褰х紓鍌欑贰閸犳鎮烽敂鐐床婵犻潧顑呴悙濠囨煏婵炑冨暙缁犵増绻濋悽闈浶為柛銊у帶閳绘柨鈽夐姀鈩冩珖闂侀潧鐗嗛ˇ顖毿ч弻銉︾厸闁搞儮鏅涢弸鎴︽煕濞嗗繒绠婚柡宀嬬秮閹垽鎮℃惔婵嗘瀳婵犵妲呴崹顏堝磻閹剧粯鈷掑ù锝堝Г閵嗗啰绱掗埀顒佹媴閾忓湱鐓嬮梺鍦檸閸犳宕戦埡鍌滅瘈闂傚牊渚楅崕蹇曠磼閳ь剟宕橀埞澶哥盎濡炪倖鎸鹃崰搴ㄦ倶閿濆鐓欓悗鐢登归崢瀛樻叏婵犲嫮甯涢柟宄版噽缁數鈧綆浜濋悾浼存⒒娴e憡鎲稿┑顔炬暬楠炴垿宕惰閺嗭附绻濋棃娑冲姛闁汇倐鍋撴繝鐢靛仦閸ㄥ吋銇旈幖浣哥柧闁挎繂娲ㄧ壕浠嬫煕鐏炲墽鎳呴柛鏂跨Т閻f繈鏁愰崨顔间淮闂佺硶鏂傞崹钘夘嚕椤曗偓瀹曞ジ鎮㈤崫鍕闂傚倷鑳堕幊鎾活敋椤撱垹纾块柟瀵稿仧閻熻绻涢幋娆忕仾闁绘挻娲熼弻锟犲磼濮樺彉铏庨梺鍝勬4缂嶄線寮婚垾宕囨殕閻庯綆鍓涢惁鍫ユ倵鐟欏嫭绀冮悽顖涘浮閿濈偛鈹戠€e灚鏅i梺缁橈耿濞佳囩嵁閳ь剛绱撻崒姘偓鎼佸磹閻戣姤鍊块柨鏇炲€堕埀顒€鍟村畷銊ヮ潰閵堝懏鍠橀柡灞芥椤撳ジ宕ㄩ姘曞┑锛勫亼閸婃牜鏁幒妤€绐楁慨姗嗗厳缂傛岸鏌熼柇锕€鏋ょ痪鎯у悑閵囧嫰寮崶褌姹楃紓浣哄Т椤兘寮婚妸銉㈡斀闁割偅绻€濡叉劙姊烘潪鎵槮闁哥喐娼欓悾鐑藉箳閹搭厽鍍甸梺鍛婎殘閸嬬偤骞夐姀銈嗏拻濞达綀妫勯崥鐟扳攽閻愨晛浜剧紓鍌氬€搁崐褰掓嚌妤e啫鐓濈€广儱顦伴弲鏌ユ煕閵夈垺娅囬柣鎾愁儔濮婃椽宕滈懠顒€甯ラ梺鍝ュУ閹瑰洭宕洪埀顒併亜閹哄秶顦︾紒妤佸笚閵囧嫰顢曢敐鍥╃杽濡炪們鍨洪敃銏ゅ箖閳哄拋鏁冮柨婵嗗缂嶅矂姊婚崒姘偓椋庢濮橆兗缂氱憸宥囧弲闂侀潧鐗嗛幃鍨洪鍕庘晠鏌嶆潪鎷屽厡闁告棑绠戦—鍐Χ閸℃鐟ㄩ梺绋款儏濡繂顕i妸鈺傚亱闁割偁鍨婚鏇熺節閵忥絾纭炬い鎴濇搐鐓ら悗鐢电《閸嬫挾鎲撮崟顒傦紭闂佹悶鍔忔慨銈嗙┍婵犲洤閱囬柡鍥╁仧閸婄偤姊洪棃娴ㄥ綊宕曢悜妯侯嚤闁哄洢鍨洪悡鐔兼煟閺傛寧鎲搁柣顓炶嫰椤儻顦虫い銊ョ墦瀵偊顢欓崲澶嬫瀹曨亪宕橀鍡忔(婵犵數濮伴崹濂稿春閺嶎厼绀夐柡宥庡幖绾惧鏌ㄥ┑鍡╂Ч闁绘挻娲熼弻鏇熷緞濡櫣浠╁銈傛櫓閸撴瑩鍩ユ径鎰闁荤喐澹嗗В銏㈢磽娴d粙鍝洪悽顖涱殔宀h儻绠涘☉妯溾晝鎲歌箛娑欏仼濠靛倸鎲¢埛鎴犳喐閻楀牆淇俊顐e灥閳规垿鎮欓埡浣峰闂傚倷娴囬鏍窗濡ゅ啫鍨濋柟鎹愵嚙缁犳澘鈹戦悩鎻掓殭缂佸墎鍋涢…璺ㄦ崉閾忓湱浼囧┑顔硷功閸嬫挾鎹㈠┑瀣仺闂傚牊绋愮划璺侯渻閵堝棙澶勯柛妯圭矙楠炲牓濡搁敂鍝勪簼闂佸憡鍔戦崝宥呂i鍕拺闁告繂瀚埢澶愭煕濡亽鍋㈢€规洖缍婂畷鎺楁倷鐎电ǹ甯惧┑鐘垫暩閸嬬喖宕戦幘鏂ユ瀺闁糕剝绋掗悡鏇㈡倵閿濆啫濡煎┑鈥炽偢濡焦寰勯幇顓犲弳濠电娀娼уΛ娆撳闯瑜版帗鐓涢柛鈩冪懃閺嬫垵菐閸パ嶈含妞ゃ垺绋戦~婵嬪础閻愨晛寰嶅┑鐘愁問閸犳牠鏁冮妷銉富濞寸姴顑冮埀顑跨窔瀵挳濮€閳╁啯鐝抽梻浣虹《閸撴繈鎮烽敃鍌ゆ晣濠电姵纰嶉埛鎺懨归敐鍫燁仩閻㈩垱鐩弻鐔哄枈閸楃偘绨介梺鐟扮畭閸ㄥ綊鍩為幋鐘亾閿濆骸澧紒渚婄畵閺岋絾鎯旈婊呅i梺鍝ュУ閻楁粎鍒掓繝姘櫜闁糕剝鐟ч惁鍫熺節閻㈤潧孝闁稿﹥鎮傞、鏃堫敃閿旂晫鍘甸梺鍝勵儛閸嬪嫭鎱ㄩ崒娑欏弿濠电姴鍟妵婵囶殽閻愭潙濮堥柟顖涙閺佹劙宕掑☉杈ㄧ秾闂傚倸鍊风粈渚€骞栭锔藉亱婵犲﹤鐗嗙粈鍫ユ煟閺冨牜妫戦柡鍡畵閺岋綁鎮㈢粙娆炬闂佸壊鍋掓禍顏堝蓟濞戙垹鍗抽柕濞垮劚鐎涳絽鈹戦悙瀛樼稇閻庢凹鍓熼垾鏃堝礃椤斿槈褔鏌涢埄鍐炬畼闁荤喐鍔欏铏圭磼濡椽鍤嬬紓浣哄У閹告悂顢氶敐澶婄缂佹稑顑嗛弲婊堟⒑閸撴彃浜為柛鐘虫崌瀹曘垽鏌嗗鍡忔嫼闂佽崵鍠撴晶妤呭箚閸垻纾煎璺侯儐缂嶆垿鏌i敐鍥ㄦ毈鐎规洖宕埥澶娾枎閹存繂绗氶梺鑽ゅ枑缁秶鍒掗幘宕囨殾婵犲﹤鍠氬ḿ鈺傘亜閹烘埈妲归柛宥囨暬濮婃椽妫冨ù銉ョ墦瀵彃饪伴崼婵堬紱闂佺ǹ鐬奸崑鐐烘偂閵夆晜鐓熼柡鍌涘閹牏鈧稒绻堥弻锝夋偄閸濄儲鍤傜紓浣哄У閹瑰洭鎮伴鈧浠嬵敇閻愭鍟囨俊鐐€栭幐楣冨磻閻斿摜顩锋い鏍ㄧ矌绾捐棄霉閿濆棗绲诲ù婊呭亾缁绘繈濮€閿濆棛銆愰梺鎰佸灡濞叉繈濡甸幇鏉跨闁瑰濯Σ顖炴⒒娴e懙鍦崲濡ゅ懎纾婚柟閭﹀厴閺嬫棃鏌曢崼婵愭Ч闁抽攱鍨圭槐鎾存媴鐠愵垳绱扮紓浣哄У閻楃娀寮婚敐澶婄厸濠电姴鍊绘禒鈺呮⒑娴兼瑧鎮奸柛蹇旓耿瀵偊骞樼紒妯轰汗闂佽偐鈷堥崜锕€危娴煎瓨鈷掑ù锝堟娴滃綊鏌嶅畡鎵ⅵ鐎规洘绮岄~婵囷紣濠靛洦娅撻梻浣侯攰閹活亪姊介崟顖氱厱闁硅揪闄勯崑锝夋煕閵夘垳宀涢柛瀣崌閹煎綊顢曢妶鍕寜闂傚倸鍊风粈渚€骞夐敓鐘茶摕闁靛⿵瀵屽▓浠嬫煙闂傚顦﹂柣鎾寸箞閺岀喖骞戦幇闈涙缂備胶濯寸紞渚€寮婚敐澶婄疀妞ゆ挾鍠撶粙鍥ь渻閵堝懎顒㈤柟鍛婃倐閸╃偤骞嬮敂钘変汗闁荤姴娉ч崘褏鐭楅梻鍌欑閹芥粓宕抽妷鈺佸瀭闁割偅娲滃畵渚€鏌涢埄鍐槈缁炬儳鍚嬮幈銊╂晲鎼粹€崇缂傚倸绉撮敃顏堢嵁閸愵喖鐓涢柛娑卞幘椤斿矂姊洪崷顓炲妺缂佽鍊规穱濠囨煥鐎n剛鐦堥梺姹囧灲濞佳嗏叴闂備胶枪椤戝棝骞戦崶顒€钃熼柣鏂跨殱閺嬫棃鏌涢…鎴濇灍闁诲繑鎸剧槐鎺撱偅閸愵亞鏆紓浣哄У閻楃姴顕f繝姘亜閻炴稈鈧厖澹曢梺姹囧灮濞呫儵鎮烽悧鍫熺槑闂備浇宕甸崰鎰垝鎼淬垺娅犳俊銈呮噺閸嬪倿鏌ㄩ悢鍝勑㈤柦鍐枑缁绘盯骞嬪▎蹇曚患缂備胶濮垫繛濠囧蓟閿熺姴纾兼慨妯块哺閻ㄦ垿姊洪崫鍕靛剰闁瑰啿閰i崺鐐哄箣閿旇棄鈧兘鏌涘▎蹇fЦ婵炲拑缍佸缁樻媴缁嬭法鐩庣紓浣藉皺閸嬫挾绮氭潏銊х瘈闁搞儜鍜佸晪婵$偑鍊栧Λ浣规叏閵堝應鏋嶉柕蹇嬪€栭埛鎴︽偣閹帒濡兼繛鍛姍閺岀喖宕欓妶鍡楊伓Department of Bioscience, Nagahama Institute of Bioscience and Technology, 1266 Tamura, Nagahama, Shiga 526-0829, Japan
Department ofMathematics and Statistics and Genome Atlantic, Dalhousie University, Halifax, Nova Scotia, Canada
Departmentof Botany, University of British Columbia, Vancouver, British Columbia, Canada
E-mail: yinagai@dal.ca.
Abstract
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1 (EF-1) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial (outgroup) sequences ("long-branch attraction," or "LBA"). However, no previous studies have explicitly determined the reason(s) why the artifactual allegiance of microsporidia and archaebacteria ("M + A") is recovered by all phylogenetic methods, including maximum likelihood, a method that is supposed to be resistant to classical LBA. Here we show that the M + A affinity can be attributed to those alignment sites associated with large differences in evolutionary site rates between the eukaryotic and archaebacterial subtrees. Therefore, failure to model the significant evolutionary rate distribution differences (covarion shifts) between the ingroup and outgroup sequences is apparently responsible for the artifactual basal position of microsporidia in phylogenetic analyses of EF-1 sequences. Currently, no evolutionary model that accounts for discrete changes in the site rate distribution on particular branches is available for either protein or nucleotide level phylogenetic analysis, so the same artifacts may affect many other "deep" phylogenies. Furthermore, given the relative similarity of the site rate patterns of microsporidian and archaebacterial EF-1 proteins ("parallel site rate variation"), we suggest that the microsporidian orthologs may have lost some eukaryotic EF-1–specific nontranslational functions, exemplifying the extreme degree of reduction in this parasitic lineage.
Key Words: phylogenetic artifact ? long-branch attraction ? EF-1 ? microsporidia ? covarion model ? parallel site rate variation
Introduction
Microsporidia are a monophyletic assemblage of obligate intracellular parasites that generally infect animals (particularly arthropods and fish). These unicellular eukaryotes lack classical mitochondria, peroxisomes and Golgi apparatus at the ultrastructural level (reviewed in Keeling and Fast [2002]). At the molecular level, these organisms appear to possess unusual ribosomes and ribosomal RNA (rRNA). The sedimentation coefficient of microsporidian ribosomes is more similar to that of prokaryotic (i.e., eubacterial or archaebacterial) 70S ribosomes than to that of typical eukaryotic 80S ribosomes (Ishihara and Hayashi 1968). Fusion of the large subunit (LSU) and 5.8S rRNAs is another trait shared between microsporidia and prokaryotes (Vossbrinck and Woese 1986). Furthermore, microsporidia were robustly estimated to be one of the basal lineages among eukaryotes in the first molecular phylogenies that include microsporidia, those of small subunit (SSU) rRNA sequences (Vossbrinck et al. 1987). This basal position was recovered in later SSU phylogenies regardless of the tree reconstruction method, model of nucleotide substitution, or taxonomic sampling (e.g., Cavalier-Smith and Chao [1996] and Kumar and Rzhetsky [1996]). Importantly, similar results were reproduced by early phylogenetic studies based on the amino acid sequences of translation elongation factors 1 and 2 (EF-1 and EF-2) (Kamaishi et al. 1996a, 1996b). Taken together, these results appeared to suggest that microsporidia were one of the earliest offshoots in eukaryotic evolution, diverging from the main trunk of eukaryotic lineages before the mitochondrial endosymbiosis.
However, in the past decade, this evolutionary scenario has drastically changed. Nuclear genes encoding proteins of mitochondrial origin were found in microsporidia, strongly suggesting the secondary loss of typical mitochondria in this lineage (Germot, Philippe, and Le Guyader 1997; Hirt et al. 1997; Peyretaillade et al. 1998; Arisue et al. 2002). Katinka et al. (2001) identified additional genes for mitochondrial proteins in the completed genome of the microsporidian, Encephalitozoon cuniculi. In line with these findings, Williams et al. (2002) recently ultrastructurally identified tiny, relic mitochondria in the microsporidian Trachipleistophora hominis. Secondly, further analyses of protein sequences (except elongation factors) have indicated the evolutionary affinity between microsporidia and fungi (reviewed in Keeling and Fast [2002]). In particular, relatively strong support for a relationship between microsporidia and fungi ("M + F") is recovered in phylogenies based on tubulins (Edlind et al. 1996; Keeling and Doolittle 1996; Keeling, Luker, and Palmer 2000; Keeling 2003), the largest subunit of RNA polymerase II (Hirt et al. 1999), seryl-tRNA synthetase, transcription initiation factor IIB, a subunit of the vacuolar ATPase and a GTPase (Katinka et al. 2001). Multigene phylogenies that include microsporidia also recover the M + F connection with high support values (Baldauf et al. 2000; Hashimoto, Arisue, and Hasegawa 2002). In sum, results accumulating since the late 1990s strongly suggest that microsporidia are fungi or close relatives of fungi. The basal position of microsporidia in SSU rRNA and elongation factor phylogenies has been rationalized as an artifact resulting from "long-branch attraction" (LBA), whereby the fast-evolving microsporidian and the long branch leading to the archaebacterial sequences were artifactually attracted to one another in tree reconstruction (e.g., Embley and Hirt [1998] and Philippe and Laurent [1998]). Under this hypothesis, the relatively simple architecture of microsporidian cells can be interpreted as the result of extensive adaptation to their parasitic lifestyle.
Unlike unweighted maximum-parsimony (MP), maximum-likelihood (ML) methods are generally expected to be statistically consistent if the Markov model of sequence change closely matches reality (for discussion, see Felsenstein [2003]). Simulation studies concur with this result, indicating that ML methods are less susceptible to LBA with finite data than MP or distance methods (e.g., Hasegawa and Fujiwara [1993] and Swofford et al. [2001]). However, in practice, neither infinite data nor perfectly correct models are available, and ML methods can yield biased estimates of phylogenies. For instance, ML methods display sensitivity to LBA when the evolutionary model is violated by the data, such as failure to model among-site rate variation (ASRV) (e.g., Sullivan and Swofford [2001]). Therefore, microsporidian phylogenies estimated under simple models that assume an equal rate of evolution among sites may be severely affected by LBA. Indeed, the EF-1 ML analyses under a more realistic model that takes into account ASRV failed to statistically reject the M + F relationship, although the basal position of microsporidia was still the ML estimate (Hirt et al. 1999; Hashimoto, Arisue, and Hasegawa 2002). A similar trend was observed in concatenated SSU plus LSU rRNA gene phylogenies (Philippe and Germot 2000). Importantly, the LSU rRNA (rooted) distance phylogeny and EF-2 (unrooted) distance and ML phylogenies under an ASRV model actually recovered the M + F relationship, albeit with weak support values (Hirt et al. 1999; Van de Peer, Ben Ali, and Meyer 2000; Keeling and Fast 2002). These results consistently suggest that the fast-evolving microsporidian sequences incorrectly branch at the base of eukaryotes because of LBA.
ASRV models are thought to be essential for phylogenetic analyses of problematic data sets, including fast-evolving sequences (e.g., Philippe and Laurent [1998]). However, these models are obviously insufficient to describe the complex processes of amino acid and nucleotide evolution. One of the crucial processes ignored in ASRV models is across-tree site rate variation (XTSRV), also called "heterotachy" by some authors (Lopez, Casane, and Philippe 2002). Functional and/or structural constraints on a biological molecule can be changed over evolutionary time, so that the rates at sites may vary across the tree (i.e., among the sequences in an alignment). This evolutionary process was first described as the concomitantly variable codon (covarion) model that allows a site to shift between two states, invariable ("off") or variable ("on"), in a protein over time (Fitch and Markowitz 1970). Explicit formulations of Markov models of covarion evolution have only recently been conceived (Tuffley and Steel 1998; Penny et al. 2001). The original covarion and ASRV models have also been merged into "covarion-like" models for both protein and nucleotide evolution that, rather than having simply invariable and variable sites, allow sites to "switch" between a number of different site rates across subtrees (reviewed in Penny and Hasegawa [2001]). The covarionlike models in Tuffley and Steel (1998), Penny et al. (2001), Galtier (2001), Lopez, Casane and Philippe (2002) and Huelsenbeck (2002) assume XTSRV to be a stochastic process varying across both sites and subtrees with frequent changes in rates throughout the larger tree. Although rates do vary across both sites and subtrees in these models, the process is stationary, implying that the marginal distributions of rates in any two subtrees will be the same. This stationarity of the frequency of changes throughout the larger tree does not provide a natural model for biological molecules subjected to a "covarion shift," where many sites experience a shift in evolutionary rates within a short period of time associated with a modification in functional and/or structural properties (Inagaki et al. 2003). The XTSRV in covarion shift models (e.g., modeled by a bivariate rate distribution in Susko et al. [2002]) should allow different marginal distributions in rates for subtrees on either side of the shift, and the rate shifts should occur with lower frequency in evolutionary time than stochastic covarion drift. Although the two types of covarion-like models are intrinsically different, they are not mutually exclusive and both serve to make some adjustment for covarionlike effects.
In this study, we investigated the impact of XTSRV in the EF-1 data set on the phylogenetic position of microsporidia. We divided the data into the archaebacterial and eukaryotic sub-trees, and identified the sites associated with a large difference in evolutionary rates across the subtrees ("SRN" sites) estimated by the bivariate rate ML method (Susko et al. 2002). The difference in log-likelihood (lnL) between the two tree topologies representing the competing hypotheses of (1) microsporidia as an early diverging eukaryotic lineage versus (2) microsporidia as relatives of fungi then was calculated based on the data sets excluding SRN sites. Similar analyses based on data sets excluding random sites (jackknifing) were then conducted to evaluate the significance of SRN site exclusions. Finally, the data sets with and without SRN sites were analyzed using the ML method.
Philippe and Germot (2000) previously investigated the impact of the "covarion structure" on putative LBA artifacts in a concatenated SSU plus LSU rRNA data set using a similar approach to ours. The authors excluded the "covarion" sites identified by a MP-based method (Lopez, Forterre, and Philippe 1999), and tested whether this procedure lowered the support for putative LBA artifacts such as the basal branching of diplomonads (e.g., Giardia), Trichomonas (a parabasalid), and microsporidia (Philippe and Germot [2000]). However, we have identified two potential deficiencies in these analyses. First, this pioneering work lacks control experiments that would allow the assessment of "covarion" site deletion on support for potential LBA artifacts (i.e., jackknifing of random sites). Another difficulty arises from the tripartition of the data (i.e., archaebacteria, "basal branching," and "crown" eukaryotes) to identify the covarion sites. Therefore, it is unclear which partition of the rRNA data bears the significant rate difference that allegedly produces the LBA artifacts.
By contrast, our results unequivocally illustrate that the failure to model covarion shifts across the archaebacterium-eukaryote split in the data results in an artifactual affinity between the long branches leading to the archaebacterial (outgroup) and microsporidian sequences—a kind of LBA that originates from model misspecification. Furthermore, we detect similarity between the patterns of site rates for the microsporidian and archaebacterial sequences ("parallel site rate variation") relative to eukaryotic homologs, and propose the loss of some eukaryotic EF-1–specific nontranslational functions in the microsporidian lineage to explain this "convergent" property of the two EF-1 subtrees.
Materials and Methods
Data Sets
The experimental details for the isolation of an EF-1 gene from a microsporidian Antonospora locustae (Nosema locustae was renamed to Antonospora locustae; see Slamovits, Williams, and Keeling [2004]) were described in Roger (1996). The novel DNA sequence of Antonospora EF-1 was deposited under DDBJ/EMBL/GenBank accession number AY452734.
We manually aligned the deduced EF-1 amino acid sequences of 22 phylogenetically divergent eukaryotes (19 "short-branch" eukaryotes [SB eukaryotes] and three microsporidia) and 20 archaebacteria. The accession numbers of the sequences considered in this study are listed in the caption of figure 1. To focus on LBA between the microsporidian and archaebacterial EF-1 sequences, other fast-evolving eukaryotic sequences (e.g., those from diplomonads and parabasalids) were not considered in this study. We excluded ambiguously aligned sites and sites that included gaps, creating the "Micro" data set consisting of 42 sequences with 349 sites (table 1). Subsequently, the "Basic" data set was generated from the Micro data by exclusion of the microsporidian sequences for the estimation of conditional mode rates at sites (site rates) and evaluating the rate distribution difference across eukaryotic and archaebacterial subtrees (see below). Smaller data sets, "Micro*" and "Basic*," were derived from the Micro and Basic data sets by exclusion of all archaebacterial sequences except those of Methanosarcina and Aeropyrum, respectively (table 1). These data sets were subjected to computationally intensive ML phylogenetic analyses.
FIG. 1. (A) A histogram of the absolute value of SR calculated between the SB-eukaryotic and archaebacterial subtrees in the Basic data set. Arrows indicate the ranges of top 5% to 40% absolute SR. The absolute value SR range, from which 245 sites were randomly jackknifed to generate the "JK*" data sets, is shaded. (B) The tree topologies examined for the phylogenetic positions of microsporidia. The archaebacterial clade is omitted. The microsporidian clade was attached to the edge A or B, creating the "M + A" or "M + F" tree topology, respectively. Accession numbers for the sequence considered are as follows: Sulfolobus acidocaldaris, X52382; S. solfataricus, X70701; S. tokodaii, BAB65236; Desulfurococcus mobilis, X73582; Aeropyrum pernix, AP000062; Pyrobaculum aerophilum, U94347; uncultured crenarchaeote 74A4, AF393466; Thermococcus celer, X52383; Pyrococcus horikohii, AP000006; P. abysii, AJ248285; P. woesei, X59857; Archaeoglobus fulgidus, AE001039; Methanococcus jannaschii, U67486; M vannielii, X05698; Methanosarcina barkeri, ZP_00079008; Ferroplasma acidarmanus, ZP_000005; Halobacterium sp., AAG20682; H. halobium, D32120; Haloarcula marismotui, X16677; Thermoplasma acidophilum, AL445064; Homo sapiens, BC002845; Caenorhabditis elegans, AAA81688; Schizosaccharomyces pombe, D82571; Podospora anserina, X74799; Arabidopsis thaliana, AB026655; Nicotiana tabacum, D63396; Euglena gracilis, X16890; Trypanosoma brucei, U10562; Acrasis rosea, AF190771; Dictyostelium discoideum, X55972; Mastigamoeba balamuthi, a contig of BE636534, BE636560, BE636596, and BE636646; Entamoeba histolytica, M92073; Porphyra purpurea, U08844; Plasmodium knowlesi, AJ224153; Cryptosporidium parvum, U71180; Stylonychia lemnae, X57926; Blastocystis hominis, AF091355; Phytophthora capsici, AF109666; Cyanophora paradoxa, AF092951; Antonospora (Nosema) locustae, AY452734; Glugea plecoglossi, BAA06865; Encephalitozoon cuniculi, CAD25298
Table 1 Details of the Data Sets Used in This Study.
Site rates of the SB-eukaryotic and archaebacterial subtrees in the Basic data set were estimated by the bivariate ML rate method with the PAM001 amino acid substitution model using Bivar (Susko et al. 2002). In this method, ASRV in the two subtrees was described by a bivariate discrete ML rate distribution with a matrix of 25 x 25 equally spaced rate categories. The upper limit for rates was set as 6.0. The probability of each rate category was estimated from the data.
Absolute values of the difference in bivariate site rates (SR) across the SB-eukaryotic and archaebacterial subtrees were calculated, and then eight sets of sites with top 5% to 40% absolute SR with 5% interval were defined (fig. 1A). Henceforth, the sites that associate with top N% SR across the two subtrees in the Basic data set, as well as their homologous positions in other data sets, are designated as the SRN sites (N = 5, 10, 15, 20, 25, 30, 35, or 40). The SRN sites were excluded from the Micro data set, generating eight "Micro-SRN" data sets. These data sets were subjected to the examination of the impact of SRN sites on the phylogenetic position of microsporidia (see below). Likewise, the "Micro*-SR15," "Basic*-SR15," "Basic-SR15," and "Basic-SR30" data sets were further generated from the Micro*, Basic*, and Basic data sets by exclusion of the SR15 or SR30 sites. The first two data sets were used for ML phylogenetic analyses, and the last two were used for evaluation of the rate distribution difference across the archaebacteria-eukaryote split (see below). The details of the data sets described above are listed in table 1.
Phylogenetic Analyses
All distance analyses in this study were conducted as follows: ML distance matrices were calculated under the JTT amino acid substitution model with ASRV approximated using a gamma () distribution with eight equally probable discrete rate categories (JTT+ model) in Tree-Puzzle version 5.0 (Schmidt et al. 2002). Shape parameters (s) for distributions were estimated from each data set (table 1). Phylogenetic trees were reconstructed from ML distance matrices using the Fitch-Margoliash weighted least squares (Fitch) method implemented in PHYLIP version 3.6a (Felsenstein 1993) with the input sequence order randomized five times and global rearrangements. Distance bootstrap analyses (500 replicates) were conducted using Seqboot, Fitch, Consense in PHYLIP (Felsenstein 1993), Tree-Puzzle (Schmidt et al. 2002), and Puzzleboot version 1.02 (A. J. Roger and M. E. Holder, http://hades.biochem.dal.ca/Rogerlab/Software/Software.html). The parameters and settings for the bootstrap analyses were the same as described above, except a single randomized input sequence order per replicate was performed.
ML trees were reconstructed under the JTT+ model using ProML in PHYLIP (Felsenstein 1993) with global rearrangements and the input sequence order randomized five times. ASRV was modeled using discrete approximation of distributions (eight equally probable categories). All parameters required were estimated from the data using Tree-Puzzle (Schmidt et al. 2002). The parameters and settings for the ML bootstrap analyses (100 replicates) were the same as described above, except (1) a single randomized input sequence order per replicate was performed in the tree search, and (2) ASRV was modeled using discrete distributions with four equally probable rate categories.
Parametric Bootstrap Tests for the Overall Rate Distribution Difference Across Two Subtrees
We have previously established a parametric bootstrap test to examine the rate distribution difference between two subtrees in a combined data set—the overall rate distribution difference is represented by the sum of absolute values of "weighted" SR (henceforth designated as "abrsum") (Susko et al. 2002). The observed value is compared with a null distribution from 1,000 Monte Carlo simulated data sets to evaluate its significance. In this study, the rate distribution difference across the SB-eukaryotic and archaebacterial subtrees was assessed using the Basic, Basic-SR15, and Basic-SR30 data sets (table 1). The details of the parametric bootstrap test are described in Susko et al. (2002).
Comparison of the Two Hypotheses for the Phylogenetic Position of Microsporidia
We generated tree topologies to test two alternative hypotheses for the phylogenetic position of microsporidia: the "M + A" hypothesis (under which microsporidia are one of the earliest diverged lineages among the extant eukaryotes) and the "M + F" hypothesis (under which microsporidia branch with or among fungi). First, a distance phylogenetic tree was reconstructed under the JTT+ model from the Basic data set as described above. The microsporidian clade was then attached to the edge A or B in the optimal distance tree topology (see figure 1B), representing the M + A or M + F hypothesis, respectively. The log-likelihoods (lnLs) of the M + A or M + F trees were calculated from the Micro and Micro-SRN data sets by the ML method under the JTT+ model using Tree-Puzzle (Schmidt et al. 2002). The parameters and branch lengths for each tree were re-estimated from each data set. Difference in the lnLs of the two trees (lnL) was then calculated by subtraction of the former from the latter, assessing the dominance of the M + A topology over the other.
We generated, at each of eight values of N (N = 5, 10, 15, 20, 25, 30, 35, or 40 [see above]), 100 "JK" data sets by randomly deleting (jackknifing) the same number of sites as the SRN sites, using Jacksite (H. Philippe, Université de Montreal). Similarly, we prepared another eight sets of 100 "JK*" data sets as follows. The 245 alignment sites in the Micro data set that correspond neither to the constant sites nor to the SR10 sites in the Basic data set were jackknifed randomly to generate JK* data sets (shaded area in figure 1A). The lnLs between the M + A and M + F trees were calculated from the JK and JK* data sets (null distributions) and compared with the corresponding values obtained from the SRN site exclusion (observed values).
Estimation of Site Rate Pattern of EF-1 Subtrees
We investigated the difference in site rate pattern (1) between the SB-eukaryotic and archaebacterial subtrees, (2) between the SB-eukaryotic and microsporidian subtrees, and (3) between the archaebacterial and microsporidian subtrees. SR between two subtrees and their 95% confidence intervals were estimated using the bivariate ML method (Susko et al. 2002). Because large errors were expected in the estimation of microsporidian site rates as a result of the poor sequence sampling (only three sequences are available for this study), we considered only sites associating with significant SR ("SRS" sites), for which 95 % confidence intervals do not contain zero. The absolute values of these significant SR sites were summed to represent the "overall rate distance" between two subtrees (henceforth designated as "arsum").
Results and Discussion
Robustness of the ML Method Against the Artifactual Microsporidia Plus Archaebacteria Grouping
Microsporidia were estimated to be the earliest diverging lineage among eukaryotes considered (M + A grouping) in the optimal distance trees based on both Micro and Micro* data sets, which differ from each other only in the sampling of archaebacterial sequences (20 versus 2, respectively [table 1]). Likewise, the distance bootstrap analyses on the two data sets display similar results, except the monophyletic clades of the animal sequences and of the euglenozoan sequences (the first two columns in table 2). These results indicate that the selection of archaebacterial sequences has no large impact on the overall relationship among eukaryotic sequences. Thus, the smaller Micro* data set was used for more stringent phylogenetic analyses using the ML method under the JTT+ model.
Table 2 Impact of Phylogenetic Methods and SR15 Site Exclusion on BP Values.
The M + A grouping was recovered in the optimal ML tree based on the Micro* data set (fig. 2A) in accordance with previously published works assessing a microsporidian (Glugea plecoglossi) EF-1 sequence (Kamaishi et al. 1996a, 1996b). The M + A grouping receives relatively high support (BP = 83%) in the distance analysis, whereas the ML analysis gives very weak support (BP = 34%) to the same relationship (fig. 2A). These results clearly indicate that the ML method under the JTT+ model is less susceptible to LBA producing the M + A grouping in the EF-1 data than is the distance method under the same model.
FIG. 2. Phylogenetic analyses of the microsporidian EF-1 sequences. (A) An ML tree from the Micro* data set (349 sites) rooted by the archaebacterial sequences. Only ML BP values over 50% are shown. The BP values for particular nodes are also listed in table 2. (B) An ML tree based on the Micro*-SR15 data set (298 sites) rooted by the archaebacterial sequences. The details were as described in (A)
The ML or distance bootstrap analyses under the JTT+ model fail to resolve deep level relationships among eukaryotic lineages with BPs greater than 50%, except the M + A grouping in the latter analysis (figure 2A and the second and third columns in table 2). Significantly, no other large incongruities between the BP values from the ML method and the corresponding values from the distance method were observed for other major eukaryotic clades recovered, indicating that the ML bootstrap analysis is not overly stringent (the first three columns in table 2).
A Covarion Shift Across the SB-Eukaryotic and Archaebacterial Subtrees
As discussed above, the optimal ML tree reconstructed under the JTT+ model has the artifactual M + A grouping (fig. 2A). Unless significant model misspecification is present, the ML method may be robust against classical LBA from coincidental concurrence of character states between two distantly related fast-evolving lineages. In addition, we detect no significant homoplasy of amino acid characters across the microsporidian and archaebacterial sequences in the EF-1 data set (see Supplementary Material online). Therefore, our phylogenetic analyses under the JTT+ model most probably fails to account for a crucial aspect of the evolutionary process that generated the data.
Our group previously reported a significant rate distribution difference (covarion shift) between the archaebacterial and eukaryotic subtrees in the combined EF-1 tree (Susko et al. 2002; Inagaki et al. 2003). This covarion shift clearly violates the assumption of the JTT+ model that requires that the rates at sites be stationary over the tree. Therefore, we hypothesized that the failure to account for the covarion shift across the two EF-1 subtrees in the phylogenetic analyses was responsible for the artifactual M + A grouping.
We reassessed the covarion shift across the archaebacterium-eukaryote split in the Basic data sets using parametric bootstrap tests (Susko et al. 2002; Inagaki et al. 2003). The observed abrsum value appears to fall outside of the bootstrap distribution (P < 0.001, 5.7 standard errors [SE] away from the mean of bootstrap values [fig. 3A]), suggesting that the overall rate distributions in the two subtrees are significantly different. However, the distance between the observed value and the mean of bootstrap values appears to be reduced to 3.2 and 2.6 SE by both exclusions of the SR15 and SR30 groups of sites respectively, although both P values are still less than 0.01 (fig. 3B and C). These results indicate that the magnitude of the covarion shift across the two subtrees can be mitigated by SRN site exclusion.
FIG. 3. (A–C) Rate distribution differences across the SB-eukaryotic and archaebacterial subtrees. The significance of the overall rate distribution difference across the SB-eukaryotic and archaebacterial subtrees was examined using parametric bootstrap tests (Susko et al. 2002). The results from the analyses based on the Basic, Basic-SR15, and Basic-SR30 data sets are shown in (A), (B), and (C), respectively. The histograms and arrows indicate the bootstrap distributions and observed values, respectively. (D–E) Impact of the SRN site exclusion on the M + A grouping. Log-likelihoods of the M + F and M + A trees was calculated from each of the Micro-SRN data sets, and then the former was subtracted from the latter. The bootstrap distributions generated from 100 JK and JK* data sets are presented using box plots in (D) and (E), respectively. The observed values are indicated by diamond symbols. In the case that the null hypothesis of lowering of lnL solely by the reduction of data was rejected at the 1% or 5% level, the observed values are indicated by open or double diamond symbols, respectively. The lnL = 12.58 from the Micro data set (no site exclusion) is indicated by an arrow on the Y-axis. Details of the two tree topologies considered are described in figure 1B
Impact of SRN Site Exclusion on the Microsporidia Plus Archaebacteria Grouping
The hypothesis that the artifactual M + A grouping was caused by the covarion shift in the EF-1 data was assessed by comparing the M + A and M + F tree topologies using lnLs ("tree lnL comparison"). If our working hypothesis is correct, the dominance of the M + A tree topology over the alternative should be significantly reduced by the SRN site exclusions that diminish the magnitude of the covarion shift (see figures 3A–C).
Without any SRN site exclusions, the M + A tree appears to be better than the alternative tree by 12.58 lnL units (arrow on the Y-axis in figures 3D and E). The lnLs appear to drop from 12.58 to 1.14 lnL units by the progressive exclusion of the SRN sites (figs. 3C and D). In comparison between each observed value and the corresponding null distribution from 100 JK data sets, a significant decrease in 1nL (P < 0.05) was obtained after the exclusion of the SR10 sites (open diamond symbols in figure 3D). P < 0.01 was obtained by the exclusion of the SR35 and SR40 sites (double diamond symbols in figure 3D). We can conclude that the decrease in the lnL because of the exclusion of SRN sites is greater than one would expect based on random deletion.
The SRN sites never contain any of 70 constant sites among 19 SB-eukaryotic and 20 archaebacterial sequences (20% of 349 analyzed sites in the Basic data set). Therefore, to mimic this effect, we created the JK* data sets that contain the sites corresponding to the constant and SR10 sites in the Basic data set. However, the jackknife values calculated from the JK* data sets appear to be similar to those from the JK data sets (fig. 3E). P < 0.01 was obtained after the exclusion of the SR15 sites (double diamond symbols in figure 3E). The tree lnL comparisons clearly indicate that the sites associated with a covarion shift across the SB-eukaryotic and archaebacterial subtrees are responsible for the dominance of the artifactual M + A tree, independent of the jackknife distribution methods used (JK or JK*).
Impact of the SR15 Site Exclusion on ML Phylogeny
We further investigated the impact of the covarion shift on ML phylogenetic analyses using the Micro*-SR15 data set. The optimal ML topology appears to have no M + A grouping (fig. 2B), suggesting that the SR15 site exclusion sufficiently eliminates the source of the LBA uniting the microsporidian and archaebacterial sequences in the Micro* data set. Significantly, much of the phylogenetic information in the original data may remain after the SR15 site exclusion because the ML BP values for the major eukaryotic clades were not drastically changed by the presence or absence of the SR15 sites, except the bootstrap support for the animal plus fungi clade reduced nearly 20% by the site exclusion (the third and fourth columns in table 2). Likewise, these support values appear not to be affected by the presence or absence of the microsporidian sequences, except the apicomplexan clade (compare the third and fifth column or the fourth and sixth columns in table 2). Unfortunately, none of the sequences in the data displays strong affinity (BP > 50%) to the microsporidian sequences in the ML bootstrap analysis (fig. 2B). The true phylogenetic signal in the microsporidian sequences has probably been erased by their extremely high evolutionary rates. It is also possible that the number of phylogenetically informative sites may be relatively small in this data set. Taken together the results from the tree lnL comparisons and ML phylogeny on the Micro*-SR15 data set, we conclude that the artifactual M + A grouping in the EF-1 phylogenies originates from the failure to modeling the covarion shift across the eukaryotic and archaebacterial subtrees.
Our strategy of compensating for a covarion shift in the EF-1 data set is generally applicable to rooted phylogenies. Regardless of the phylogenetic marker or the data type (amino acid or nucleotide sequences), the difference in site rate distribution across ingroup and outgroup sequences may be significant. When the basal branching of a fast-evolving sequence is suspected to be an artifact and an alternative hypothesis is available (e.g., M + A versus M + F tree topologies in this study), one should examine whether model misspecification from covarion shift biases phylogenetic estimates using the strategy described above. The identification and exclusion of sites that exhibit a significant XTSRV across the ingroup and outgroup sequences by various methods (e.g., those described in this study and in Lopez, Forterre, and Philippe [1999]) should be carried out routinely for "deep" phylogenies to compensate or prevent potential artifacts from covarion shifts.
It would be intriguing to assess whether the basal position of microsporidia in SSU rRNA phylogenies is a phylogenetic artifact caused by the covarion shift across the eukaryotic and prokaryotic (outgroup) subtrees using the same strategy developed in this study. If SSU rRNA phylogenies are biased by the failure to model of the covarion shift across the ingroup and outgroup subtrees in a SSU rRNA phylogeny, we can expect significant diminishment of the support for the basal position of microsporidia by SRN exclusion across the two SSU rRNA subtrees. Likewise, the fast-evolving EF-1 sequences from diplomonads and/or parabasalids may be misplaced at the base of the eukaryotic subtree by the same model misspecification. Nevertheless, it is difficult to assess whether the basal positions of diplomonads and parabasalids in the EF-1 phylogeny are artifacts, because, currently, there are no clearly supported alternative hypotheses for their phylogenetic positions.
Parallel Site Rate Variation in EF-1
The rate heterogeneity in the data set including SB-eukaryotic EF-1 sequences ("SB-euk" data set [table 1]) is more extreme than that in the data set including both SB-eukaryotic and microsporidian orthologs ("SB-euk plus Micro" [table 1]); the distribution shape parameters (s) for the former and latter data sets are estimated to be 0.33 and 0.74, respectively. Similarly, the degree of rate heterogeneity in the SB-euk data set ( = 0.33) and that in the data set including 20 archaebacterial sequences are largely different ( = 0.74 ["Arch" in table 1]). These observations circumstantially suggest that the overall evolutionary mode of the microsporidian subtree is more similar to that of the archaebacterial subtree than to that of the SB-eukaryotic subtree. Thus, it is possible that the eukaryote-microsporidian split represented a second covarion shift in the data. To examine the potential parallel variation of evolutionary modes in the EF-1 phylogeny, we estimated and compared the site rates of the three subtrees.
We detected 44 SRS sites with relatively large SR between the archaebacterial and SB-eukaryotic subtrees (arsum = 75.25 [fig. 4A]), corresponding to significant covarion shift identified by the parametric bootstrap test shown in figure 3A. Interestingly, 75 SRS sites with arsum = 58.15 were detected between the SB-eukaryotic and microsporidian subtrees, whereas only 27 SRS sites with arsum = 19.71 were found between the archaebacterial and microsporidian subtrees (figs. 4B and C). These results led us to propose a probable scenario for EF-1 evolution in archaebacteria and eukaryotes. First, the EF-1 site rate pattern was largely differentiated at the archaebacterium-eukaryote transition by acquisition of a number of nontranslational (auxiliary) functions in the eukaryote proteins. Then, after microsporidia split from fungi, extreme reduction of the EF-1 functions occurred in the former lineage, causing in the site rate pattern of the microsporidian EF-1 orthologs to change again. In this view, microsporidian and archaebacterial EF-1 proteins could convergently share relatively similar site rate distributions, a phenomenon we call "parallel site rate variation" or "PSRV."
FIG. 4. Site rate pattern comparisons. (A) SB-eukaryotic (abbreviated as "SB-euk") EF-1 versus archaebacterial ("Arch") EF-1. (B) SB-eukaryotic EF-1 versus microsporidian ("Micro") EF-1. (C) Archaebacterial EF-1 versus microsporidian EF-1. Only sites associated with significant SR are plotted. Hash marks and vertical bars indicate SR and their 95% confidence intervals, respectively
This scenario is congruent with our knowledge about eukaryotic EF-1 functions and biology of microsporidia. Eukaryotic EF-1 is known to have auxiliary functions that are most probably absent in the archaebacterial orthologs, as well as its core function as a translation elongation factor. For instance, biochemical/genetic studies using SB-eukaryotes (e.g., Dictyostelium and ciliates) indicated that eukaryotic EF-1 binds to cytoskeletal proteins and calmodulin and is involved in the ubiquitin-dependent proteolytic system (reviewed in Negrutskii, and El'skaya [1998]). It is also likely that additional (potentially many) eukaryotic EF-1–specific protein-protein interactions remain unidentified. Taken together, it is rational to assume that some sites in the SB-eukaryotic proteins are strictly constrained by the eukaryotic EF-1–specific auxiliary functions, whereas the corresponding sites in the archaebacterial proteins are free to vary because of the primary absence of these functions as schematically drawn in figure 5.
FIG. 5. A scheme for the parallel site rate variation hypothesis between the microsporidian and archaebacterial EF-1. EF-1 in archaebacteria (abbreviated as "Arch," top left) is constrained solely by the core function related to translation (primary lack of nontranslational [auxiliary] functions). The area in EF-1 constrained by this core function is highlighted in gray. At the archaebacterium-eukaryote transition (numeral 1), EF-1 acquired auxiliary functions (top right). The areas constrained from the SB-eukaryotic ("SB-euk") EF-1–specific auxiliary functions are shown in black spots. During the reduction of functionality of microsporidian cellular system (numeral 2), some auxiliary functions in microsporidian ("Micro") EF-1 were lost or weakened (bottom). Steps during eukaryotic evolution are shaded
In contrast to eukaryotes in general, the entire cellular system of microsporidia (including the nuclear genome structure; see Katinka et al. [2001]) is highly reduced, most probably because of their intracellular parasitic lifestyle (Keeling and Fast 2002). Likewise, some of the EF-1 auxiliary functions may have been lost/weakened during the reductive evolution of microsporidia (schematically drawn in figure 5). Actually, 62 SRS sites with small negative SR detected in the site rate pattern comparison between the SB-eukaryotic and microsporidian subtrees confirms this notion (fig. 4B). At these alignment positions, the amino acid character appears to be constant in the SB-eukaryotic subtree but variable in the microsporidian subtree (data not shown), indicating that the microsporidian proteins experience less functional constraints than the SB-eukaryotic orthologs. As the core translational functions of EF-1 are absolutely indispensable for cell viability and cannot be reduced, secondary loss of the auxiliary functions in the microsporidian proteins is the most plausible explanation (fig. 5). Under the parallel absence of eukaryotic EF-1–specific auxiliary functions in microsporidian and archaebacterial EF-1 proteins, the two subtrees could have evolved a relatively similar rate distribution independently.
Obviously, this supposed PSRV in the EF-1 data set should be examined by more accurate site rates estimated from a larger set of microsporidian EF-1 sequences in future studies. In-depth studies of the auxiliary functions of both SB-eukaryotic and microsporidian EF-1 proteins would also be essential for testing this hypothesis.
Conclusion
Here we have shown that the covarion shift across the in-group and out-group sequences most likely is responsible for the artifactual "deep" branching position of microsporidia in EF-1 phylogenies. Furthermore, we detected PSRV between the microsporidian and archaebacterial EF-1 sequences. Covarion shifts, and, in particular, PSRV, need to be pursued as intriguing potential sources of artifacts in many deep phylogenetic inferences. Indeed, a study undertaken by some of us shows analytically that under simple Markov models of sequence change, neighbor-joining and least squares distance methods will succumb to the LBA form of inconsistency when an ASRV model is employed if significant PSRV is occurring (Susko, Inagaki, and Roger, 2004). To further investigate the phylogenetic artifacts that arise from PSRV, studies on data simulated with a more complex protein evolutionary models employing both ASRV and XTSRV will be key.
Supplementary Material
EF-1 alignment considered in this study is shown in the Supplementary Material online.
Acknowledgements
We thank J. Leigh for critical reading and the members of Statistical and Evolutionary Bioinformatics Group (Dalhousie University, Halifax, Canada) for valuable discussions. We also thank H. Philippe (Université de Montreal, Montreal, Canada) for coding the Jacksite program. Y.I. is supported by the "Understanding Prokaryote Genome Diversity and Evolution Project" of Genome Atlantic. A.J.R. is supported as a Scholar by the CIAR program in Evolutionary Biology. This work is supported by operating 227085-00 from NSERC (awarded to A.J.R.). This collaboration is part of a Genome Atlantic/Genome Canada large-scale project.
Literature Cited
Arisue, N., L. B. Sanchez, L. M. Weiss, M. Muller, and T. Hashimoto. 2002. Mitochondrial-type hsp70 genes of the amitochondriate protists, Giardia intestinalis, Entamoeba histolytica and two microsporidians. Parasitol. Int. 51:9-16.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977.
Cavalier-Smith, T., and E. E. Chao. 1996. Molecular phylogeny of the free-living archezoan Trepomonas agilis and the nature of the first eukaryote. J. Mol. Evol. 43:551-562.
Edlind, T. D., J. Li, G. S. Visvesvara, M. H. Vodkin, G. L. McLaughlin, and S. K. Katiyar. 1996. Phylogenetic analysis of beta-tubulin sequences from amitochondrial protozoa. Mol. Phylogenet. Evol. 5:359-367.
Embley, T. M., and R. P. Hirt. 1998. Early branching eukaryotes? Curr. Opin. Genet. Dev. 8:624-629.
Felsenstein, J. 1993. Phylogeny inference package. Version 3.2. Cladistics 5:166.
Felsenstein, J. 2003. Likelihood method. Pp. 248–274 in Inferring phylogenies. Sinauer Associates, Sunderland, Mass.
Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579-593.
Galtier, N. 2001. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol. Evol. 18:866-873.
Germot, A., H. Philippe, and H. Le Guyader. 1997. Evidence for loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasitol. 87:159-168.
Hasegawa, M., and M. Fujiwara. 1993. Relative efficiencies of the maximum likelihood, maximum parsimony and neighbor-joining methods for estimating protein phylogeny. Mol. Phylogenet. Evol. 2:1-5.
Hashimoto, T., N. Arisue, and M. Hasegawa. 2002. Application of molecular phylogenetic inference and associated problems: illustrative data analysis on early eukaryotic evolution. Proc Inst. Stat. Math. 50:45-68.
Hirt, R. P., B. Healy, C. R. Vossbrinck, E. U. Canning, and T. M. Embley. 1997. A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: molecular evidence that microsporidia once contained mitochondria. Curr. Biol. 7:995-998.
Hirt, R. P., J. M. Logsdon, Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580-585.
Huelsenbeck, J. P. 2002. Testing a covariotide model of DNA substitution. Mol. Biol. Evol. 19:698-707.
Inagaki, Y., C. Blouin, E. Susko, and A. J. Roger. 2003. Assessing functional divergence in EF-1 and its paralogs in eukaryotes and archaebacteria. Nucleic Acids Res. 31:4227-4237.
Ishihara, R., and Y. Hayashi. 1968. Some properties of ribosomes from the sporoplasm of Nosema bombycis. J. Invert. Pathol. 11:377-385.
Kamaishi, T., T. Hashimoto, Y. Nakamura, Y. Masuda, F. Nakamura, K. Okamoto, M. Shimizu, and M. Hasegawa. 1996a. Complete nucleotide sequences of the genes encoding translation elongation factors 1 and 2 from a microsporidian parasite, Glugea plecoglossi: implications for the deepest branching of eukaryotes. J. Biochem. (Tokyo) 120:1095-1103.
Kamaishi, T., T. Hashimoto, Y. Nakamura, F. Nakamura, S. Murata, N. Okada, K. Okamoto, M. Shimizu, and M. Hasegawa. 1996b. Protein phylogeny of translation elongation factor EF-1 suggests microsporidians are extremely ancient eukaryotes. J. Mol. Evol. 42:257-263.
Katinka, M. D., S. Duprat, and E. Cornillot, et al. 2001 (17 co-authors). Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414:450-453.
Keeling, P. J. 2003. Congruent evidence from alpha-tubulin and beta-tubulin gene phylogenies for a zygomycete origin of microsporidia. Fungal Genet. Biol. 38:298-309.
Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:1297-1305.
Keeling, P. J., and N. M. Fast. 2002. Microsporidia: biology and evolution of highly reduced intracellular parasites. Annu. Rev. Microbiol. 56:93-116.
Keeling, P. J., M. A. Luker, and J. D. Palmer. 2000. Evidence from beta-tubulin phylogeny that microsporidia evolved from within the fungi. Mol. Biol. Evol. 17:23-31.
Kumar, S., and A. Rzhetsky. 1996. Evolutionary relationships of eukaryotic kingdoms. J. Mol. Evol. 42:183-193.
Lopez, P., D. Casane, and H. Philippe. 2002. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19:1-7.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496-508.
Negrutskii, B. S., and A. V. El'skaya. 1998. Eukaryotic elongation factor 1: structure, expression, functions and possible aminoacyl-tRNA channeling. Prog. Nucleic Acid Res. Mol. Biol. 60:47-78.
Penny, D., and M. Hasegawa. 2001. Covarion model of molecular evolution. Pp. 473–477 in S. Brenner and J. H. Miller, eds. Encyclopedia of genetics. Academic Press, San Diego.
Penny, D., B. J. McComish, M. A. Charleston, and M. D. Hendy. 2001. Mathematical elegance with biochemical realism: the covarion model of molecular evolution. J. Mol. Evol. 53:711-723.
Peyretaillade, E., V. Broussolle, P. Peyret, G. Metenier, M. Gouy, and C. P. Vivares. 1998. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin. Mol. Biol. Evol. 15:683-689.
Philippe, H., and A. Germot. 2000. Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution. Mol. Biol. Evol. 17:830-834.
Philippe, H., and J. Laurent. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616-623.
Roger, A. J. 1996. Studies on the phylogenetic and gene structure of early-branching eukaryotes. PhD Thesis. Dalhousie University, Halifax, Canada.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.
Slamovits, C. H., B. A. P. Williams, and P. J. Keeling. 2004. Transfer of Nosema locustae (Microporidia) to Antonospora locustae n. comb. based on molecular and ultrastructural data. J Euk. Microbiol. 51:270-213.
Sullivan, J., and D. L. Swofford. 2001. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated? Syst. Biol. 50:723-729.
Susko, E., Y. Inagaki, and A. J. Roger. 2004. On inconsistency of the neighbor-joining, lease squares and minimum evolutions estimation when substitution processes are incorrectly modeled. Mol. Biol. Evol. (in press).
Susko, E., Y. Inagaki, C. Field, M. E. Holder, and A. J. Roger. 2002. Testing for differences in rates-across-sites distributions in phylogenetic subtrees. Mol. Biol. Evol. 19:1514-1523.
Swofford, D. L., P. J. Waddell, J. P. Huelsenbeck, P. G. Foster, P. O. Lewis, and J. S. Rogers. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50:525-539.
Tuffley, C., and M. Steel. 1998. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147:63-91.
Van de Peer, Y., A. Ben Ali, and A. Meyer. 2000. Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246:1-8.
Vossbrinck, C. R., J. V. Maddox, S. Friedman, B. A. Debrunner-Vossbrinck, and C. R. Woese. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature 326:411-414.
Vossbrinck, C. R., and C. R. Woese. 1986. Eukaryotic ribosomes that lack a 5.8S RNA. Nature 320:287-288.
Williams, B. A., R. P. Hirt, J. M. Lucocq, and T. M. Embley. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418:865-869.(Yuji Inagaki*,, Edward Su)