The Evolutionary Fate of Nonfunctional DNA in the Bacterial Endosymbiont Buchnera aphidicola
Molecular Biology and Evolution, 2004, issue 11
Institut Cavanilles de Biodiversitat i Biologia Evolutiva and Departament de Genètica, Universitat de València, 46071 Valencia, Spain
E-mail: francisco.silva@uv.es.
Abstract
Genome size reduction in endosymbiotic bacteria is the main feature linked to adaptation to a host-associated lifestyle. We have analyzed the fate of nonfunctional DNA in Buchnera aphidicola, the primary endosymbiont of aphids. At least 164 gene losses took place during the recent evolution of three B. aphidicola strains, symbionts of the aphids Acyrthosiphon pisum (BAp), Schizaphis graminum (BSg), and Baizongia pistaciae (BBp). A typical pattern starts with the inactivation of a gene, which produces a pseudogene, and is followed by the progressive loss of its DNA. Our results show that during the period from the separation of the Aphidinae and Pemphiginae lineages (86–164 MYA) to the divergence of BAp and BSg (50–70 MYA) the half-life of a pseudogene was 23.9 Myr. For the remaining periods of evolution, the ranges of values obtained for this parameter are of the same order of magnitude. These results reveal that a gene inactivated during B. aphidicola evolution requires 40–60 Myr to become almost completely disintegrated. Moreover, we show a positive correlation between the decrease in GC content and the DNA loss for these nonfunctional DNA regions. When gene losses are classified according to whether a pseudogene or an absent gene is detected in the modern B. aphidicola genomes, the absent genes show a drastic reduction in DNA length, relative to the functional gene, compared with the pseudogenes. Finally, we have also detected a slight reduction in the size of the intergenic regions in the three B. aphidicola strains when they are compared with those of the close relative Escherichia coli.
Key Words: Buchnera aphidicola; DNA loss; genome reduction; gene disintegration; pseudogenes; symbiosis
Introduction
A large number of bacterial species living in association with animal cells have small genomes. They may be pathogens such as Mycoplasma spp. (Fraser et al. 1995; Himmelreich et al. 1996; Chambaud et al. 2001), Chlamydia spp. (Stephens et al. 1998; Kalman et al. 1999), or Rickettsia spp. (Andersson et al. 1998; Ogata et al. 2001), but they may also be mutualistic endosymbionts such as Buchnera aphidicola (Shigenobu et al. 2000; Tamas et al. 2002; van Ham et al. 2003), Wigglesworthia glossinidia (Akman et al. 2002), or Blochmannia floridanus (Gil et al. 2003). Their genomes, ranging in size from 450 kb to 1.2 Mb, seem to have derived from free-living ancestors with larger genomes. After the shift of the bacterial ancestor to a host-associated lifestyle, a process of genome degradation started, causing the loss of many genes either by inactivation through point mutations or by removal of the complete gene (Andersson and Kurland 1998; Andersson and Andersson 1999, 2001; Moran and Wernegreen 2000; Clark et al. 2001; Mira, Ochman, and Moran 2001; Moran and Mira 2001; Silva, Latorre, and Moya 2001; Moran 2002). The newly formed pseudogenes then entered a process of disintegration (Andersson and Andersson 2001; Silva, Latorre, and Moya 2001), which contributed to a further reduction of the genome size. To date, the bacterial species, pathogen or mutualist, with the smallest known genome sizes are M. genitalium (Fraser et al. 1995) and the strain of B. aphidicola from the host Cinara cedri (Gil et al. 2002), with 580 and 450 kb, respectively.
Its close relationship to free-living gamma-Proteobacteria such as Escherichia coli, Salmonella spp., Yersinia pestis, or Vibrio spp. has made the endosymbiont species B. aphidicola the subject of many studies on genome degradation (Shigenobu et al. 2000; Mira, Ochman, and Moran 2001; Moran and Mira 2001; Silva, Latorre, and Moya 2001, 2003; Tamas et al. 2002; van Ham et al. 2003). Previous analyses estimated the minimal gene content of the genome of the ancestor of E. coli and B. aphidicola at 1,818–2,425 genes, indicating that more than 1,000 genes were lost during adaptation to the new lifestyle. Those losses would have occurred by large deletions that simultaneously removed many genes, by gene inactivation and progressive gene disintegration, or by a combination of both processes (Moran and Mira 2001; Silva, Latorre, and Moya 2001).
The availability of the genome sequences of three B. aphidicola strains, symbionts of the aphids Acyrthosiphon pisum (BAp; Shigenobu et al. 2000), Schizaphis graminum (BSg; Tamas et al. 2002), and Baizongia pistaciae (BBp; van Ham et al. 2003), has offered the possibility of studying the fate of the DNA of those genes that were lost throughout the evolution of the genome of the three B. aphidicola strains and thus became part of the nonfunctional DNA. Because B. aphidicola seems to be transmitted only maternally in aphids (Baumann et al. 1995), a parallel evolution of B. aphidicola and aphid lineages has been proposed (Moran et al. 1993). This has permitted an estimation of the divergence times between B. aphidicola strains based on the estimated divergence times of their hosts. Thus, it was estimated that the strains BAp and BSg diverged 50 to 70 MYA (Clark, Moran, and Baumann 1999), because their aphid hosts belong to two tribes of the subfamily Aphidinae (Remaudière and Remaudière 1997). On the other hand, the strain BBp, whose host belongs to the subfamily Pemphiginae (Remaudière and Remaudière 1997), probably diverged at the time of the Aphididae family radiation, which was recently estimated to have taken place 86 to 164 MYA (von Dohlen and Moran 2000; see fig. 1).
FIG. 1.— Phylogenetic relationships of Buchnera aphidicola strains. Divergence times are shown at the bottom. The minimal gene number deduced for the LCSA genome (640) is distributed between the 629 chromosomal genes and the 11 genes contained in the tryptophan and leucine plasmids. The ancestral genome was composed of 603 protein-coding and 37 RNA-specifying genes. The number of lost genes in each internal or external phylogenetic branch is reported inside the squares. All these losses correspond to protein-coding genes. The numbers of genes of the three B. aphidicola genomes are placed to the right below the name of each strain. These numbers include both the chromosomal and plasmid genes, but in the BAp and BSg genomes only one copy of the duplicated genes of the tryptophan plasmid has been counted, either because the number of copies varies between individuals or because it has not been determined. Aphidinae and Pemphiginae are the two aphid subfamilies to which the hosts of the B. aphidicola strains belong. BAp: symbiont of the aphid Acyrthosiphon pisum; BSg: symbiont of Schizaphis graminum; BBp: symbiont of Baizongia pistaciae.
The comparison of the content and order of the genes on the chromosomes of the three B. aphidicola strains revealed an extreme case of genome stability, with just a few chromosomal rearrangements (Tamas et al. 2002; van Ham et al. 2003). It also allowed the reconstruction of the minimal gene content of the genome of the last common symbiont ancestor (LCSA) of the three strains (van Ham et al. 2003). This gene repertoire may be larger, because it cannot include genes that were lost in all three strains. Based on this information, the evolutionary history of gene losses in the genome of B. aphidicola was reconstructed (Silva, Latorre, and Moya 2003), determining that at least 164 gene losses had occurred in the course of the evolution of the three B. aphidicola lineages. The inability of the B. aphidicola chromosome to acquire genes by horizontal gene transfer (HGT) (but see van Ham et al. 2000) supported the hypothesis that every gene present in a B. aphidicola strain was originally present in the chromosome of the LCSA. In addition, based on the information on the gene content of plasmids from several B. aphidicola strains (Lai, Baumann, and Baumann 1994; Bracho et al. 1995; Rouhbakhsh et al. 1996; van Ham, Moya, and Latorre 1997; Silva et al. 1998; van Ham et al. 1999, 2000; Baumann et al. 1999), some genes located in the chromosome of the BBp strain are thought to have been originally located in a plasmid and to have been inserted into the chromosome during lineage evolution (Sabater-Muñoz et al. 2002, 2004; Silva, Latorre, and Moya 2003). The absence of HGT and the near absence of rearrangements in the chromosome allowed the fate of the DNA of each gene to be followed once it was inactivated, even though the high substitution rate of B. aphidicola (Moran 1996; Brynnel et al. 1998) led to a complete loss of sequence similarity between the remnant intergenic region and the original gene.
The aim of this study was to estimate the rate of nonfunctional DNA loss (pseudogene or intergenic region) in the obligate bacterial endosymbiont B. aphidicola, making use of the estimated divergence dates of its strains. We have also tried to verify that the ability to lose DNA at a significant rate has been maintained throughout all B. aphidicola lineages.
Materials and Methods
Definition of Gene Loss Events
We define a gene loss as any event producing the destruction of the function of a gene. Thus, gene loss events are revealed by the absence of an ancestral LCSA gene, or its presence as a pseudogene, in the modern B. aphidicola genomes. When the genomes of the three B. aphidicola strains are compared, gene absence or pseudogenes are frequently found for the same ancestral gene in more than one strain. This may be interpreted as either an ancestral loss prior to strain divergence or as two (or three) convergent losses after it. To differentiate between these two episodes in the BAp and BSg lineage evolution (see fig. 1), we first established that if a functional gene was present in one strain and a pseudogene, or a gene absence, was in the other, then the gene loss took place after the divergence in the latter lineage. On the other hand, if both strains presented the absent gene status, we considered that the gene loss event took place prior to the divergence.
In addition, we established a criterion based on sequence similarity for the cases of pseudogene status in both strains, or pseudogene in one strain and absent gene in the other. To establish the criterion, we determined the range of E values reported by a Blast search (Altschul et al. 1997) for those genes inactivated after BAp and BSg divergence. We selected 47 genes with complete confidence because, in one of the two strains, they had the status of gene (case 1, gene in BAp and pseudogene in BSg; case 2, gene in BSg and pseudogene in BAp; case 3, gene in BAp and absent gene in BSg; case 4, gene in BSg and absent gene in BAp; see table 1). We took the E. coli protein as a reference and performed a TBlastN search against the genomes of BAp and BSg, recording the E value of the region of the genome where the remnant DNA of the gene should be located. The E value range for the 33 pseudogenes (cases 1 and 2) was from 0 to nondetected (E value > 7), with a median value of 2e-74 (Blast notation for 2 × 10^-74), while no hit was detected for any of the 14 absent genes (cases 3 and 4). These results indicate that the substitution rate in B. aphidicola is so high that the pseudogenes formed prior to BAp and BSg divergence have had time to completely lose any sequence similarity with the functional gene. For that reason, the small E values obtained for cvpA, apbE, cmk, bioH, ansA, and hemD under the same conditions (2e-11, 6e-80, 5e-28, 1e-54, 1e-126, and 7e-36, respectively, for the smallest value of BAp or BSg) indicate with high confidence that a functional gene was present at the time of BAp and BSg divergence. Therefore, two convergent gene loss events took place for each of these genes.
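The decision rules of this section can be summarized as a small sketch. The names `Status` and `classify_loss` and the boolean `similarity_detected` are illustrative assumptions, not part of the original analysis. The text does not give an explicit E value cutoff, so the similarity decision is passed in as a flag. The fallback for cases without detectable similarity (an early loss) is one reading of the criterion, not a stated rule.

```python
from enum import Enum


class Status(Enum):
    GENE = "gene"
    PSEUDOGENE = "pseudogene"
    ABSENT = "absent"


def classify_loss(bap: Status, bsg: Status, similarity_detected: bool = False) -> str:
    """Place the loss of one ancestral gene on the BAp/BSg part of the tree.

    similarity_detected: hypothetical flag meaning that a TBlastN search with
    the E. coli protein still finds the remnant region in BAp or BSg.
    """
    if bap is Status.GENE and bsg is Status.GENE:
        return "no loss"
    if bap is Status.GENE:
        # Functional in BAp, so the loss happened in BSg after the split.
        return "late BSg"
    if bsg is Status.GENE:
        return "late BAp"
    if bap is Status.ABSENT and bsg is Status.ABSENT:
        return "early"  # loss prior to the BAp/BSg divergence
    # Pseudogene in both strains, or pseudogene in one and absent in the other:
    # fall back on the sequence-similarity criterion.
    if similarity_detected:
        return "convergent (late BAp and late BSg)"
    return "early"  # assumption: no similarity left implies an old loss


print(classify_loss(Status.GENE, Status.PSEUDOGENE))
print(classify_loss(Status.PSEUDOGENE, Status.ABSENT, similarity_detected=True))
```

Passing the similarity decision as a flag keeps the sketch independent of any particular Blast threshold.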
Table 1 Gene Losses and Gene Status in Buchnera aphidicola Strains
Definition of Deletion Rate
The main goal of this work was to determine the deletion rate during the evolution of the B. aphidicola lineage. A deletion rate is the frequency of deleted nucleotides per Myr. This frequency was estimated on the remnant DNA of the pseudogenes, which in many cases had completely lost their similarity to the DNA of the functional gene. We must not confuse the DNA deletion rate with the gene inactivation rate. The latter could be defined as the frequency of inactivated genes per Myr. This is not a constant parameter, as can be shown when early and late gene losses during the evolution of BAp and BSg lineages are compared (fig. 1).
Analysis of Genes, Pseudogenes, and Remnant DNA Sequences
To determine the DNA loss in the inactivated genes, we estimated the length of each gene (Lg) and the length of the disintegrated DNA region after the reductive process (Ld). For single gene losses, Lg was defined as the number of nucleotides (nt) included between the upstream and downstream adjacent genes, minus the length of an average intergenic region. Thus, at the beginning of the disintegration process, this length contained the upstream and downstream intergenic regions plus the gene and, after a complete disintegration, the remnant DNA would correspond to an average intergenic region (fig. 2 is a diagram of the possible situations). We used 55.1 nt as the size of an average intergenic region, as was estimated for ancient spacers, defined as those flanked by the same genes in B. aphidicola and E. coli (Mira, Ochman, and Moran 2001). In a similar way, Ld was estimated as the number of nucleotides between the upstream and downstream adjacent genes minus 55.1 nt. This DNA length represents the average contribution of the two contiguous genes to the final intergenic region. Each original gene would have some upstream and downstream noncoding nucleotides which will be lost together with the coding region during the gene disintegration process. Thus, the new intergenic region will be formed by a few downstream noncoding nucleotides of the upstream gene and by the upstream noncoding nucleotides of the downstream gene. Both DNA segments would form, on average, the 55.1 nt of the new intergenic region (fig. 2).
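As an illustration of these definitions, the sketch below computes Lg, Ld, and their ratio from the coordinates of the flanking genes. The function names and the coordinates are hypothetical; only the 55.1 nt average spacer comes from the text. Coordinates are assumed to be 1-based and inclusive.

```python
AVERAGE_SPACER_NT = 55.1  # average ancient spacer (Mira, Ochman, and Moran 2001)


def span_between(upstream_end: int, downstream_start: int) -> int:
    """Nucleotides strictly between two flanking genes (1-based coordinates)."""
    return downstream_start - upstream_end - 1


def corrected_length(upstream_end: int, downstream_start: int) -> float:
    """Span between the flanking genes minus one average intergenic region.

    Applied to the reference strain (gene still present) this gives Lg;
    applied to the strain that lost the gene it gives Ld.
    """
    return span_between(upstream_end, downstream_start) - AVERAGE_SPACER_NT


def loss_ratio(reference_flanks, derived_flanks) -> float:
    """Ld/Lg for one lost gene (or one block of adjacent lost genes)."""
    lg = corrected_length(*reference_flanks)
    ld = corrected_length(*derived_flanks)
    return ld / lg


# Hypothetical coordinates: (end of upstream gene, start of downstream gene).
reference = (1000, 2056)  # 1,055 nt between the flanking genes
derived = (5000, 5156)    # 155 nt between the same flanking genes
print(round(loss_ratio(reference, derived), 3))
```

The same functions apply to a block of adjacent lost genes, since only the outer flanking genes enter the calculation.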
FIG. 2.— Diagram of the regions included in the estimation of the number of nucleotides of Ld and Lg. (A) The lost gene retains the status of a pseudogene. (B) The accumulation of nucleotide substitutions produces the loss of similarity between the gene and the disintegrated region. (C) An example of a block including two lost genes. Ig: intergenic region.
For BBp gene losses, we estimated Lg as the average of the BAp and BSg values, except when the gene was present in only one of these two strains, in which case that strain was used. For BAp losses we estimated Lg on BSg and vice versa. When the gene was lost in both strains, we estimated Lg on BBp. In the case of the yadF gene, which was absent in the three strains, we estimated Lg on BTc (see Results).
When several adjacent genes were simultaneously lost in a lineage we treated them as a block, estimating Ld and Lg for the block (see fig. 2). Independently of the number of lost genes contained in the block, we always considered 55.1 as the final average spacer between two functional genes. For the determination of the number of genes in each disintegrated category, the Ld/Lg ratio of the block was assigned to each of the lost genes in the block. When a block contained one or several pseudogenes, and it was possible to identify their ancestral start and stop codons, the block was divided into the maximum possible number of segments to estimate Ld/Lg. GC contents (GCd and GCg) were estimated in the same DNA segments used for estimating Ld and Lg. Both variables were plotted with a logarithmic transformation for the GCd/GCg ratio. For those genes whose inactivation started after the divergence of BAp and BSg, we estimated the GC content of an active B. aphidicola gene and compared it with the GC content of the remnant DNA region. The ansA and hemD genes were not included in the final analyses because there was not a functional gene in BBp to determine the gene GC content (see table 1).
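A minimal sketch of the GC comparison follows, assuming plain nucleotide strings as input. The function names and the example sequences are hypothetical. GCd is measured on the remnant region and GCg on the functional gene.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G+C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return gc / total


def log_gc_ratio(remnant: str, gene: str) -> float:
    """ln(GCd/GCg); negative when the remnant is poorer in G+C than the gene.

    Raises ValueError if the remnant has no G or C at all (log of zero).
    """
    return math.log(gc_content(remnant) / gc_content(gene))


# Hypothetical toy sequences, far shorter than any real gene.
gene = "GGCCAATT"     # GC = 0.5
remnant = "GATTAATT"  # GC = 0.125
print(round(log_gc_ratio(remnant, gene), 3))
```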
Although the position on the chromosome of close to 100 genes that were lost after the divergence of E. coli and B. aphidicola and prior to the formation of the LCSA was known (Silva, Latorre, and Moya 2001), they were not taken into account. We consider that the DNA coming from those genes should have practically disappeared, after more than 150 Myr of evolution.
Intergenic Region Size Analysis
A putative shortening of B. aphidicola intergenic regions relative to those of E. coli was studied. Only ancient spacers were analyzed. They were defined according to Mira, Ochman, and Moran (2001) as those with the same flanking genes in E. coli and B. aphidicola. To perform a homogeneous analysis, only those present in all B. aphidicola strains were measured (n = 195). We tested whether each sample came from a normal distribution with the Kolmogorov-Smirnov test and rejected the null hypothesis of normality (P value reported as .000, i.e., P < .001). In addition, because our data were not independent but repeated measures of the sizes of the same intergenic region in four genomes, we applied a nonparametric repeated measures analysis with the Friedman test, the null hypothesis being that there is no difference in mean ranks among the genomes.
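To make the repeated-measures design concrete, the sketch below computes the Friedman chi-square statistic. Each row is one spacer and each column one genome. This is a plain-Python illustration under stated assumptions, not the software used in the study. It ranks ties by their average rank but applies no tie correction to the statistic, and the spacer sizes are invented. The statistic is then compared with a chi-square distribution with k - 1 degrees of freedom (3 for four genomes).

```python
def average_ranks(values):
    """Ranks within one row (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for n blocks (rows) and k treatments (columns)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical spacer sizes (nt); columns: E. coli, BAp, BSg, BBp.
spacers = [
    [70, 52, 48, 55],
    [120, 90, 85, 95],
    [15, 14, 16, 12],
    [200, 150, 140, 160],
]
print(round(friedman_statistic(spacers), 3))
```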
Results
Identification of Gene Loss Events
The minimal gene content of the genome of the LCSA of B. aphidicola was determined (fig. 1). It was based on the comparison of the genomes of the BAp, BSg, and BBp strains, with the addition of the gene yadF detected in the B. aphidicola strain from Tetraneura caerulescens (BTc; Sabater et al. 2004) but absent in the other three genomes. It was formed by 640 genes, 629 of them being located at the chromosome. Five genes, annotated as pseudogenes in the genome of BSg (Tamas et al. 2002), were reannotated as genes (Gil et al. 2003), based on the existence of putatively functional open reading frames and the essential role of the protein (lig, mfd, and endA), the putative use of unusual start codons (infC), or because the protein was produced by a programmed ribosomal frameshift (prfB).
One hundred sixty-four gene loss events were identified (table 1). Two of the losses of the BBp lineage were not taken into account because they involved plasmid genes (ibpA and repAC). Thus, at least 94 out of the 629 ancestral chromosomal genes were lost through the 100–150 Myr of evolution of the BBp lineage chromosome. In contrast, the lineages of BAp and BSg lost only 32 and 44 genes, respectively, during the same period of time. These events involved only 135 different genes because of several convergent losses that occurred during the evolution of the B. aphidicola lineages. The genes ansA and hemD were lost three times (in BBp, and after the split of BAp and BSg in the lineages referred to from this point on as late BAp and late BSg). The gene yadF was lost twice (in BBp and before the split of BAp and BSg, in the lineage referred to from this point on as early BAp and early BSg). Nine genes were lost twice, in the BBp and late BAp lineages. Eleven genes were lost twice, in the BBp and late BSg lineages. Finally, apbE, cmk, cvpA, and bioH were lost twice, in the late BAp and late BSg lineages (see table 1).
DNA Loss in the BAp and BSg Lineages
Eight genes (bioC, bioF, mutH, norM, pal, uspA, yqgE, and yadF) were analyzed (table 1 and fig. 3A) whose inactivation occurred before the divergence of the BAp and BSg lineages (fig. 1), between 86–164 MYA (the estimated age of the LCSA) and 50–70 MYA (the divergence time of BAp and BSg). Six out of the eight were not adjacent in the LCSA chromosome, while the bioC and bioF genes were contiguous and, for that reason, were treated as a block in the DNA loss analysis. When the length of each gene (Lg) and the length of its homologous disintegrated DNA region (Ld) were compared, in six out of eight cases more than 90% of the nucleotides had been lost, while in the other two genes the remnant DNA was only slightly more than 10% (fig. 3A). The average Ld/Lg ratio for the eight genes was 0.055, with a range from 0 to 0.13. For each gene, the length of the disintegrated region (Ld) was the mean of the lengths for BAp and BSg. These two lengths were very similar, and the difference between the Ld/Lg values of the two strains did not exceed 0.1 for any analyzed gene.
FIG. 3.— DNA loss in Buchnera aphidicola. (A) Aphidinae lineage: genes whose inactivation occurred between LCSA and the divergence of BAp and BSg lineages. (B) Aphidinae lineage: genes whose inactivation occurred after the divergence of BAp and BSg lineages. (C) Pemphiginae lineage: genes whose inactivation occurred during the evolution of BBp lineage. See Materials and Methods for the estimation of the length (nucleotides) of each gene (Lg) and the length of the disintegrated DNA region after the reductive process (Ld). The width of each bar along the abscissa represents a class interval of 0.1, except for the first bar (>1.1). Genes were classified according to the presence or absence of similarity to the functional gene. The total height of each class interval represents the sum of these two conditions. Pseudogenes (similarity is still detected) are light gray columns. Regions without similarity are dark gray columns.
Assuming that the disintegration of these genes took place gradually with an average Ld/Lg ratio of 0.055 for an average period of time of 100 Myr, we have applied the continuous decay formula, Ld = Lg e^(-rt) (Petrov and Hartl 1998), to estimate the deletion rate per nucleotide and Myr (r), with Ld being the length of the disintegrated DNA region at time t and Lg the length of the active gene (at time 0). We estimated a deletion rate of 2.9% per Myr (r = 0.029) and the function that explained the gradual decay as Ld = Lg e^(-0.029t) (fig. 4). This means that the half-life of a pseudogene, the period for losing half of its nucleotides, is 23.9 Myr. This theoretical function would imply that the DNA of a gene that was inactivated during the first stages of B. aphidicola evolution would now be almost completely lost, while the DNA of those genes lost after the divergence of BAp and BSg (see fig. 1) would have a wide range of disintegration, with an Ld/Lg ratio from around 0.13 (70 Myr) to 1.
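The arithmetic behind these figures can be written out explicitly. The derivation below uses only the values given in the text (Ld/Lg = 0.055, t = 100 Myr). The symbol t_{1/2} for the half-life is introduced here for convenience.

```latex
\begin{align*}
L_d &= L_g\, e^{-rt} \\
r &= -\frac{\ln(L_d/L_g)}{t}
   = -\frac{\ln(0.055)}{100\ \text{Myr}}
   \approx 0.029\ \text{Myr}^{-1} \\
t_{1/2} &= \frac{\ln 2}{r}
   \approx \frac{0.693}{0.029\ \text{Myr}^{-1}}
   \approx 23.9\ \text{Myr} \\
\left.\frac{L_d}{L_g}\right|_{t = 70\ \text{Myr}} &= e^{-0.029 \times 70} \approx 0.13
\end{align*}
```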
FIG. 4.— Hypothetical gradual DNA loss function based on the available information for the eight genes whose inactivation occurred between LCSA and the divergence of BAp and BSg lineages (see figs. 1 and 3). Half-life of a pseudogene (the period for losing half of its nucleotides) is 23.9 Myr.
Fifty-six out of 60 genes were analyzed (22 in BAp and 34 in BSg lineages) whose inactivation occurred in the lineages of BAp or BSg, after the divergence of both strains (see table 1). Fifty-three were not adjacent, and four of them were treated as blocks of two genes (znuA-yebA and ygcF-ygcM). When the degree of DNA loss was analyzed (fig. 3B), a wide range of variation was observed, with most of the genes presenting degrees of disintegration smaller than 20%. Forty-four of them, receiving the status of pseudogene, were probably inactivated very recently and, for that reason, nucleotide substitutions have not completely removed their similarity to the functional gene. In contrast, when the DNA loss of the nonpseudogene intergenic regions was analyzed separately, 12 out of 14 presented an Ld/Lg value equal to or smaller than 0.603, and some genes were even almost completely disintegrated. These genes would correspond to those inactivated in the first million years after the divergence of both lineages, in agreement with the gradual decay rate (fig. 4). Although it is very difficult to estimate the half-life of a pseudogene during this period, because the time of disintegration may vary from tens of millions to a few hundred years, we have tried to give an approximate range using exclusively the lengths of the 14 nonpseudogene intergenic regions. We estimated an average Ld/Lg ratio of 0.426 for them. To apply the continuous decay formula, we used an upper bound of 60 Myr (the divergence time) and a lower bound that we decided to fix at 20 Myr. With these figures, we obtained a half-life range from 48.7 Myr to 16.2 Myr, which is of the same order of magnitude as the 23.9 Myr previously estimated for the first period.
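The half-life range follows directly from the continuous decay formula once a disintegration time is assumed. The sketch below reproduces the calculation. The function names are illustrative, while the ratio and the two time bounds are the ones given in the text.

```python
import math


def deletion_rate(ratio: float, t_myr: float) -> float:
    """Rate r (per Myr) solving ratio = exp(-r * t) for a given Ld/Lg ratio."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if t_myr <= 0:
        raise ValueError("t_myr must be positive")
    return -math.log(ratio) / t_myr


def half_life(ratio: float, t_myr: float) -> float:
    """Time (Myr) needed to lose half of the nucleotides at that rate."""
    return math.log(2) / deletion_rate(ratio, t_myr)


# Average Ld/Lg of the 14 nonpseudogene regions with the two assumed times.
for t in (60, 20):
    print(t, round(half_life(0.426, t), 1))  # 60 -> 48.7, 20 -> 16.2

# Early Aphidinae period: ratio 0.055 over 100 Myr.
print(round(half_life(0.055, 100), 1))  # 23.9
```

Because the half-life scales linearly with the assumed disintegration time, the width of the range reflects only the uncertainty in that time, not any variation in the measured ratio.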
A few genes not only showed no reduction but even presented a slight increase in size. They were probably inactivated very recently, and their larger size may be due to several causes: (1) nucleotide insertions; (2) the ancestral gene and intergenic sizes might not match exactly those estimated for the reference B. aphidicola strain; (3) a fraction of the DNA might come from the loss of other nondetectable ancestral genes which, as in the case of yadF, were lost in the three completely sequenced B. aphidicola genomes; and (4) the 5' end of several genes might be incorrectly annotated.
Because many host-associated bacteria display AT-enriched genomes, it has been proposed that a mutational pressure exists in these genomes toward an increase in A+T content (Moran 2002). In B. aphidicola this shift has caused the genic and intergenic G+C contents to be very low: 26% and 15%, respectively, in the genomes of BAp and BSg (Tamas et al. 2002). Once a gene is inactivated, the bias in the nucleotide substitutions produces a decrease in its G+C content. We compared the decrease in G+C content with the DNA loss for the gene losses of this period and estimated the correlation coefficient between Ld/Lg and ln (GCd/GCg) to be 0.765 (fig. 5). A parallel decrease in GC content and length was observed, with an equilibrium for the GC content decrease of around 0.47. This means that on average the final GC content of the analyzed regions is 47% of the initial composition.
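The correlation reported here relates two per-gene quantities, Ld/Lg and ln(GCd/GCg). A minimal sketch of that calculation follows, assuming a Pearson coefficient (the text does not name the estimator). The function name and the five data points are invented for illustration and do not reproduce the 0.765 obtained from the real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five lost genes (not data from the study).
length_ratio = [0.95, 0.80, 0.60, 0.30, 0.10]  # Ld/Lg
gc_ratio = [0.98, 0.85, 0.75, 0.55, 0.50]      # GCd/GCg
log_gc = [math.log(g) for g in gc_ratio]       # logarithmic transformation

print(round(pearson_r(length_ratio, log_gc), 3))
```

A positive coefficient means that regions retaining more of their length also retain more of their original GC content.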
FIG. 5.— Relationship between length (Ld/Lg) and GC content (GCd/GCg) decrease for those genes whose inactivation occurred after divergence of BAp and BSg lineages. A logarithmic transformation has been applied to the GC content decrease parameter. See figure 3 legend for additional information. Correlation coefficient is 0.765.
DNA Loss in the BBp Lineage
The DNA loss in the 94 genes present in the LCSA chromosome that were lost during the evolution of the BBp lineage was analyzed. Fifty-four were analyzed independently, and 40 were analyzed in 13 blocks containing two to eight genes. Because of the putative wide period for the start of disintegration, from 0 to 150 Myr, a large variation in the Ld/Lg values was obtained, but with a large number of genes almost completely disintegrated (fig. 3C). Only 23 out of 94 genes showed Ld/Lg values higher than 0.6. Nearly all pseudogenes were included in this group. No correlation was detected between the decrease in length and the decrease in G+C content for the lost genes of this lineage (data not shown). According to the gradual decay function, this result implies that only a few genes started their disintegration recently, while the large majority have been decaying for more than 50–60 Myr. We also estimated the range of pseudogene half-life exclusively using the nonpseudogene intergenic regions, although in this case with a larger interval due to the upper bound of divergence (86–164 Myr). The average Ld/Lg ratio was 0.358, which yields a pseudogene half-life range of 81.1 to 13.5 Myr when disintegration times from 120 to 20 Myr are used.
Shortening of Ancient Spacers
A comparison of the sizes of the intergenic regions for ancient spacers between E. coli and each of the three B. aphidicola strains is shown in figure 6. The average size (bp ± standard deviation) for BAp (51.1 ± 70.0), BSg (47.4 ± 63.6), and BBp (55.3 ± 76.1) was only slightly smaller than that for E. coli (67.5 ± 98.2). The Friedman test for the four genomes indicated a difference in the mean ranks (P value = .017). When the test was applied to the three B. aphidicola genomes alone, the null hypothesis was not rejected (P value = .103). Therefore, the size distribution of the intergenic regions differs between B. aphidicola and E. coli, while no difference was detected among the B. aphidicola strains.
FIG. 6.— Orthologous ancient spacers in Escherichia coli and Buchnera aphidicola. Relation between the size of the intergenic regions of E. coli and the size of the intergenic regions of BAp, BSg, and BBp. Ancient spacers are defined as those flanked by the same genes in B. aphidicola and E. coli (Mira, Ochman, and Moran 2001). Only those spacers present in all B. aphidicola strains were compared (n = 195). A dashed line marks the 1:1 slope.
Functional Role of the Lost Genes
When the 133 lost chromosomal genes are analyzed according to the Clusters of Orthologous Groups of proteins (COGs) functional classification (Tatusov, Koonin, and Lipman 1997), it can be seen (table 2) that the losses span all of the functional categories, although the numbers are very different. The largest groups of lost genes are involved in coenzyme transport and metabolism (21) and in cell wall and membrane biogenesis (18). The most conserved are, as could be expected, genes implicated in information processing. We found that the majority of these losses were not convergent (106 out of 133 genes) and were independently produced in one lineage of B. aphidicola. These findings would indicate that the differential losses are related to the specific host, due to either its particular diet or its life cycle. A clear example of conservation is the case of the genes involved in the essential amino acid biosynthetic pathways, which are conserved in the three analyzed B. aphidicola genomes because of their nutritional role in the symbiosis, providing essential amino acids to the insect. However, the genes involved in the ornithine pathway have been lost independently in BBp (van Ham et al. 2003). This loss affects genes such as argA, B, C, D, and E and others related to these pathways, such as the pyr and spe genes, implying that once a gene is lost, all the genes involved in the same pathway become nonessential and thus susceptible to elimination.
Table 2 Classification of Lost Genes Based on COGs, and Analysis of Convergent Losses
Discussion
Pulsed-field gel electrophoresis (PFGE) showed that the genomes of several B. aphidicola strains maintained a more or less stable chromosomal size of around 630–643 kb (Wernegreen et al. 2000). This result, additionally supported by the sequences of the BAp and BSg genomes (Shigenobu et al. 2000; Tamas et al. 2002), led to the idea that after an initial phase of adaptation to the endosymbiotic lifestyle, B. aphidicola was unable to lose its nonfunctional DNA and so continue the genome size reduction. Moreover, it was recently proposed that the rate at which genic sequences are erased from the modern B. aphidicola genome is as low as one nucleotide per 10,000 years (Mira, Klasson, and Andersson 2002).
However, a new PFGE analysis including B. aphidicola strains from other aphid subfamilies (Gil et al. 2002), together with the sequencing of the BBp strain with a chromosome of 616 kb (van Ham et al. 2003), revealed a wide range of chromosomal sizes, with 450 kb being the minimum size reported so far for a bacterial species. These results show that the genome of B. aphidicola, at least in some period of the evolution of several lineages, has experienced a process of genome reduction at a non-negligible rate.
Once a gene is inactivated, its DNA is affected by two types of changes: (1) a mutational bias towards A+T, which provokes a decrease in the GC content (Moran 2002); and (2) the deletion of some of its nucleotides. Our work has shown that both processes present some degree of correlation and that, in general, the DNA of the inactivated genes became shorter and A+T richer with time. The rate of DNA loss of B. aphidicola is sufficiently high to produce the complete or almost complete disintegration of the genes in a short period of time. We have shown that, for those genes whose inactivation occurred more than 50–70 MYA in the B. aphidicola from the Aphidinae lineage, the rate of DNA loss was high enough to almost completely remove them, whereas for the genes inactivated after this date it produced a complete loss in a few cases and a partial loss in others. Those genes with inappreciable reductions were mainly pseudogenes that were probably inactivated very recently. However, it is possible that some of them are still functional genes. The production of small amounts of a functionally complete protein is possible for some pseudogenes if they are transcribed and if a significant level of ribosomal frameshifting takes place during translation, as has been described for several E. coli genes (Gurvich et al. 2003).
The absence of a date for the divergence of two or more members of the Pemphiginae did not allow us to determine whether the genes lost at an early stage had been completely erased, but it is evident that a large proportion of the lost genes have lost a large quantity of their nucleotides. For that reason, we believe that in both lineages the DNA of a gene can be almost completely deleted from the genome in 40 to 60 Myr. Our estimate of the half-life of a pseudogene in B. aphidicola, 23.9 Myr, is of the same order as the 14.3 Myr estimated for Drosophila (Petrov and Hartl 1998) but is much smaller than the 615 Myr for Laupala (Petrov et al. 2000) or the 884 Myr for mammals (Petrov and Hartl 1998). This estimate was made for an early period of the evolution of the BAp/BSg lineage using the eight genes that became inactivated during this phase. Using only a subset of the genes whose disintegration started after the BAp/BSg divergence, we obtained a value of the same order (16.2–48.7 Myr). A similar analysis, although affected by the long period of time used for the estimation, gave a value from 13.5 to 81.1 Myr for the BBp lineage. These results show that DNA loss is taking place at a non-negligible rate during the evolution of all B. aphidicola lineages. The disintegration rate for free-living bacteria, or for the initial steps of the adaptation to endosymbiosis, was probably higher because several mechanisms, now lost in B. aphidicola, can produce drastic losses of nucleotides. The main changes are, on the one hand, the loss of an efficient recombinational system (Shigenobu et al. 2000), which, in combination with direct repeats, would produce deletions (Frank, Amiri, and Andersson 2002) and, on the other, the decrease in the frequency of close direct repeats in the genome, which may be the substrate for DNA polymerase slippage. The faster disintegration expected where these mechanisms are still active is probably the reason why it is very difficult to identify pseudogenes in many bacterial species (Lawrence, Hendrix, and Casjens 2001).
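The half-life figures quoted above can be related to elapsed time with a short sketch. It assumes simple first-order (exponential) decay of nonfunctional DNA, which is the usual meaning of a half-life but is an assumption here, since this section does not state the formula used for the estimate. The function names are hypothetical and chosen only for illustration.

```python
import math

def remaining_fraction(elapsed_myr: float, half_life_myr: float) -> float:
    """Fraction of the original nucleotides still present after elapsed_myr,
    assuming exponential decay with the given half-life (an assumption)."""
    return 0.5 ** (elapsed_myr / half_life_myr)

def half_life_from_fraction(fraction_left: float, elapsed_myr: float) -> float:
    """Half-life implied by observing fraction_left (between 0 and 1,
    exclusive) of the original DNA after elapsed_myr."""
    return elapsed_myr * math.log(2) / -math.log(fraction_left)

HALF_LIFE_MYR = 23.9  # estimate reported in the text for the early BAp/BSg branch

# After exactly one half-life, half of the DNA remains by definition.
one_half_life = remaining_fraction(HALF_LIFE_MYR, HALF_LIFE_MYR)

# A region reduced to one quarter after two half-lives implies the same half-life.
implied = half_life_from_fraction(0.25, 2 * HALF_LIFE_MYR)
```

Under this model the two functions are inverses of each other, so a measured Ld/Lg ratio and a branch duration are enough to recover a half-life, and the ranges reported for the later periods would follow from the uncertainty in the divergence dates.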
The size of the B. aphidicola genome is controlled by the importance, or essentiality, of the functions of the DNA sequences that comprise it, whether genes or intergenic regions with some kind of function. Once any of these DNA segments loses its function, a process of gradual DNA loss decreases its length. For that reason, the size of the B. aphidicola chromosome may continue to be reduced, and the limit for this reduction will be set by the minimum number of genes required for bacterial cell life and for the symbiotic contribution to the life of its insect host. It is worth noting that the genomes of the five completely sequenced bacterial endosymbionts of insects share only 313 genes, 277 of them protein-coding (Gil et al. 2003). This minimal set would produce genome sizes as small as 300 kb, with one-third of these genes nonessential for a bacterial cell but required for supporting the survival of its host. A slight decrease in the ancient intergenic spacers was also detected, but its contribution to the total chromosomal reduction will be much smaller than that of the loss and disintegration of genes.
Genome size in bacteria is a balance between several mechanisms that produce the insertion or deletion of small or large DNA segments. Mechanisms producing the insertion or deletion of hundreds or thousands of nucleotides in a single event have a high impact on total genome size. However, these mechanisms seem to have been lost in present-day B. aphidicola genome evolution. The stability of the gene order of its genome is probably due to the lack of elements that disrupt the chromosomal structure, such as transposable elements, phages, large repeats, and probably an efficient homologous recombinational mechanism (Rocha 2003). In addition, the loss of the ability to acquire foreign DNA fragments by horizontal gene transfer drastically reduces the impact of the mechanisms that increase genome size. Therefore, the main mechanisms affecting the evolution of the size of the modern B. aphidicola genomes are mutational events involving a small number of nucleotides (insertions or deletions). In fact, the most frequent polymorphisms detected during the sequencing of the BBp and BSg genomes were small indels with an average size of between one and two nucleotides (Tamas et al. 2002; van Ham et al. 2003). Slipped-mispair errors during DNA replication are probably the main cause of these polymorphisms. In addition, the presence of close repeats (Rocha and Blanchard 2002), which are short repeats (> 8–10 nt) separated by a spacer of several nucleotides, may be important because they generate slightly larger duplications or deletions (up to several hundred nucleotides; Rocha 2003). Although these events take place at a lower rate than single-nucleotide indels, their impact will be much stronger. It has been observed that the genes present in BAp or BSg whose orthologs were lost in the other strain had a slightly higher number of close repeats (larger than 9 nt) than the average gene of the genome (Rocha 2003).
This would indicate that the probability of inactivation of a nonessential gene is higher when it contains more repeats. However, the density of repeated sequences in B. aphidicola, as well as in other host-associated bacteria, is dramatically decreased when compared with free-living bacteria such as E. coli, Salmonella spp., or Bacillus spp. (Tamas et al. 2002).
In contrast, the size of the intergenic regions is not greatly affected by the genome reduction process, and ancient spacers are only slightly smaller in B. aphidicola. This difference disappears in BAp if spacers with annotated regulatory regions in E. coli are excluded (Mira, Ochman, and Moran 2001).
Finally, although insertions and deletions are factors contributing to the evolution of genome size, the evolutionary forces that led to its reduction are a matter for discussion. Several authors have proposed that in many bacterial genomes there is a bias toward net DNA loss, based on a higher number of deletion events (vs. insertions) and/or a higher average size of the deleted segments (vs. inserted; Andersson and Andersson 2001; Lawrence, Hendrix, and Casjens 2001; Mira, Ochman, and Moran 2001; Gregory 2004). If this bias is real, genetic drift could contribute to the fixation of the more abundant deletional mutations. The effect of this mechanism would be very important in B. aphidicola because of its small population sizes and its special manner of vertical transmission, with bottlenecks in each generation (Mira and Moran 2002).
Alternatively or simultaneously, natural selection may be partially or completely responsible for the reduction. If small genomes replicate faster, increasing their frequency in the polyploid B. aphidicola cell (Komaki and Ishikawa 2000), and if these cells divide faster, we can expect deletional mutations to become fixed with time, regardless of whether they are produced at higher, lower, or equal rates to insertions. Although this hypothesis has been proposed several times for the reduction of the genome sizes of obligate cellular bacteria and mitochondrial genomes (Selosse, Albert, and Godelle 2001; Silva, Latorre, and Moya 2001), there are few examples supporting it. A negative correlation was observed between DNA content and division rate for some ciliate spp. (Wickham and Lynn 1990). However, no such correlation was observed between doubling times in laboratory conditions and genome sizes across bacteria belonging to 10 major taxonomic divisions (Mira, Ochman, and Moran 2001), nor for growth rates of E. coli strains varying by as much as 25% in chromosome size (Bergthorsson and Ochman 1998). Because of the small difference in size that a 1-nt indel represents, it seems fully reasonable to accept the conclusion that selection does not differentiate between individual small indels (Gregory 2004). However, studies in Drosophila have shown some evidence that deletions larger than 400 bp may be advantageous (Blumenstiel, Hartl, and Lozovsky 2002). Because of the smaller genome size of bacterial endosymbiont chromosomes, it cannot be completely ruled out that chromosomes made smaller by one or several small deletions are selectively advantageous.
Acknowledgements
This work was supported by grant BMC2003-00305 from the Ministerio de Ciencia y Tecnología (Spain) and grant Grupos03/204 from Generalitat Valenciana (Spain). L.G.-V. was funded by a predoctoral fellowship from Generalitat Valenciana (Spain). We would like to thank the three anonymous reviewers for their comments and suggestions, which have contributed to the improvement of this paper. We also want to thank E. Vercher for support with statistical tests.
References
Akman, L., A. Yamashita, H. Watanabe, K. Oshima, T. Shiba, M. Hattori, and S. Aksoy. 2002. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat. Genet. 32:402–407.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Andersson, J. O., and S. G. Andersson. 1999. Genome degradation is an ongoing process in Rickettsia. Mol. Biol. Evol. 16:1178–1191.
———. 2001. Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol. Biol. Evol. 18:829–839.
Andersson, S. G., and C. G. Kurland. 1998. Reductive evolution of resident genomes. Trends Microbiol. 6:263–268.
Andersson, S. G. E., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Ponten, U. C. M. Alsmark, R. M. Podowski, A. K. Naslund, A.-S. Eriksson, H. H. Winkler, and C. G. Kurland. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396:133–143.
Baumann, L., P. Baumann, N. A. Moran, J. Sandstrom, and M. L. Thao. 1999. Genetic characterization of plasmids containing genes encoding enzymes of leucine biosynthesis in endosymbionts (Buchnera) of aphids. J. Mol. Evol. 48:77–85.
Baumann, P., L. Baumann, C.-Y. Lai, and D. Rouhbakhsh. 1995. Genetics, physiology, and evolutionary relationships of the genus Buchnera: intracellular symbionts of aphids. Annu. Rev. Microbiol. 49:55–94.
Bergthorsson, U., and H. Ochman. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol. Biol. Evol. 15:6–16.
Blumenstiel, J. P., D. L. Hartl, and E. R. Lozovsky. 2002. Patterns of insertion and deletion in contrasting chromatin domains. Mol. Biol. Evol. 19:2211–2225.
Bracho, A. M., D. Martinez-Torres, A. Moya, and A. Latorre. 1995. Discovery and molecular characterization of a plasmid localized in Buchnera sp. bacterial endosymbiont of the aphid Rhopalosiphum padi. J. Mol. Evol. 41:67–73.
Brynnel, E. U., C. G. Kurland, N. Moran, and S. G. E. Andersson. 1998. Evolutionary rates for tuf genes in endosymbionts of aphids. Mol. Biol. Evol. 15:574–582.
Chambaud, I., R. Heilig, S. Ferris et al. (12 co-authors). 2001. The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res. 29:2145–2153.
Clark, M. A., L. Baumann, M. L. Thao, N. A. Moran, and P. Baumann. 2001. Degenerative minimalism in the genome of a psyllid endosymbiont. J. Bacteriol. 183:1853–1861.
Clark, M. A., N. A. Moran, and P. Baumann. 1999. Sequence evolution in bacterial endosymbionts having extreme base compositions. Mol. Biol. Evol. 16:1586–1598.
Frank, A. C., H. Amiri, and S. G. E. Andersson. 2002. Genome deterioration: loss of repeated sequences and accumulation of junk DNA. Genetica. 115:1–12.
Fraser, C. M., J. D. Gocayne, O. White et al. (29 co-authors). 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397–403.
Gil, R., B. Sabater-Munoz, A. Latorre, F. J. Silva, and A. Moya. 2002. Extreme genome reduction in Buchnera spp.: toward the minimal genome needed for symbiotic life. Proc. Natl. Acad. Sci. USA 99:4454–4458.
Gil, R., F. J. Silva, E. Zientz et al. (13 co-authors). 2003. The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proc. Natl. Acad. Sci. USA 100:9388–9393.
Gregory, T. R. 2004. Insertion-deletion biases and the evolution of genome size. Gene 324:15–34.
Gurvich, O. L., P. V. Baranov, J. Zhou, A. W. Hammer, R. F. Gesteland, and J. F. Atkins. 2003. Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. EMBO J. 22:5941–5950.
Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B. C. Li, and R. Herrmann. 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24:4420–4449.
Kalman, S., W. Mitchell, R. Marathe, C. Lammel, J. Fan, R. W. Hyman, L. Olinger, J. Grimwood, R. W. Davis, and R. S. Stephens. 1999. Comparative genomes of Chlamydia pneumoniae and C. trachomatis. Nat. Genet. 21:385–389.
Komaki, K., and H. Ishikawa. 2000. Genomic copy number of intracellular bacterial symbionts of aphids varies in response to developmental stage and morph of their host. Insect. Biochem. Mol. Biol. 30:253–258.
Lai, C.-Y., L. Baumann, and P. Baumann. 1994. Amplification of trpEG: Adaptation of Buchnera aphidicola to an endosymbiotic association with aphids. Proc. Natl. Acad. Sci. USA 91:3819–3823.
Lawrence, J. G., R. W. Hendrix, and S. Casjens. 2001. Where are the pseudogenes in bacterial genomes? Trends Microbiol. 9:535–540.
Mira, A., L. Klasson, and S. G. E. Andersson. 2002. Microbial genome evolution: sources of variability. Curr. Opin. Microbiol. 5:506–551.
Mira, A., and N. A. Moran. 2002. Estimating population size and transmission bottlenecks in maternally transmitted endosymbiotic bacteria. Microb. Ecol. 44:137–143.
Mira, A., H. Ochman, and N. A. Moran. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589–596.
Moran, N. A. 1996. Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873–2878.
———. 2002. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108:583–586.
Moran, N. A., and A. Mira. 2001. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2:research0054.
Moran, N. A., M. A. Munson, P. Baumann, and H. Ishikawa. 1993. A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proc. R. Soc. Lond. B Biol. Sci. 253:167–171.
Moran, N. A., and J. J. Wernegreen. 2000. Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol. Evol. 15:321–326.
Ogata, H., S. Audic, P. Renesto-Audiffren et al. (11 co-authors). 2001. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293:2093–2098.
Petrov, D. A., and D. L. Hartl. 1998. High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups. Mol. Biol. Evol. 15:293–302.
Petrov, D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, and K. L. Shaw. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060–1062.
Remaudière, G., and M. Remaudière. 1997. Catalogue des Aphididae du Monde. Homoptera Aphidoidea. Institut National de la Recherche Agronomique, Paris.
Rocha, E. P., and A. Blanchard. 2002. Genomic repeats, genome plasticity and the dynamics of Mycoplasma evolution. Nucleic Acids Res. 30:2031–2042.
Rocha, E. P. C. 2003. An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction. Genome Res. 13:1123–1132.
Rouhbakhsh, D., C. Lai, C. D. von Dohlen, M. A. Clark, L. Baumann, P. Baumann, N. A. Moran, and D. J. Voegtlin. 1996. The tryptophan biosynthetic pathway of aphid endosymbionts (Buchnera): genetics and evolution of plasmid-associated anthranilate synthase (trpEG) within the Aphididae. J. Mol. Evol. 42:414–421.
Sabater-Muñoz, B., L. Gomez-Valero, R. C. van Ham, F. J. Silva, and A. Latorre. 2002. Molecular characterization of the leucine cluster in Buchnera sp. strain PSY, a primary endosymbiont of the aphid Pemphigus spyrothecae. Appl. Environ. Microbiol. 68:2572–2575.
Sabater-Muñoz, B., R. C. H. J. van Ham, A. Moya, F. J. Silva, and A. Latorre. 2004. Evolution of the leucine gene cluster in Buchnera aphidicola. Insights from chromosomal versions. J. Bacteriol. 186:2646–2654.
Selosse, M., B. Albert, and B. Godelle. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol. Evol. 16:135–141.
Shigenobu, S., H. Watanabe, M. Hattori, Y. Sakaki, and H. Ishikawa. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86.
Silva, F. J., A. Latorre, and A. Moya. 2001. Genome size reduction through multiple events of gene disintegration in Buchnera APS. Trends Genet. 17:615–618.
———. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends Genet. 19:176–180.
Silva, F. J., R. C. H. J. van Ham, B. Sabater, and A. Latorre. 1998. Structure and evolution of the leucine plasmids carried by the endosymbiont (Buchnera aphidicola) from aphids of the family Aphididae. FEMS Microbiol. Lett. 168:43–49.
Stephens, R. S., S. Kalman, C. Lammel et al. (12 co-authors). 1998. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282:754–759.
Tamas, I., L. Klasson, B. Canback, A. K. Naslund, A. S. Eriksson, J. J. Wernegreen, J. P. Sandstrom, N. A. Moran, and S. G. E. Andersson. 2002. 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.
Tatusov, R. L., E. V. Koonin, and D. J. Lipman. 1997. A genomic perspective on protein families. Science. 278:631–637.
van Ham, R. C. H. J., F. Gonzalez-Candelas, F. J. Silva, B. Sabater, A. Moya, and A. Latorre. 2000. Postsymbiotic plasmid acquisition and evolution of the repA1-replicon in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 97:10855–10860.
van Ham, R. C. H. J., J. Kamerbeek, C. Palacios et al. (16 co-authors). 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–586.
van Ham, R. C. H. J., D. Martinez-Torres, A. Moya, and A. Latorre. 1999. Plasmid-encoded anthranilate synthase (TrpEG) in Buchnera aphidicola from aphids of the family Pemphigidae. Appl. Environ. Microbiol. 65:117–125.
van Ham, R. C. H. J., A. Moya, and A. Latorre. 1997. Putative evolutionary origin of plasmids carrying the genes involved in leucine biosynthesis in Buchnera aphidicola (endosymbiont of aphids). J. Bacteriol. 179:4768–4777.
von Dohlen, C. D., and N. A. Moran. 2000. Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alteration. Biol. J. Linn. Soc. 71:689–717.
Wernegreen, J. J., H. Ochman, I. B. Jones, and N. A. Moran. 2000. Decoupling of genome size and sequence divergence in a symbiotic bacterium. J. Bacteriol. 182:3867–3869.
Wickham, S. A., and D. H. Lynn. 1990. Relations between growth-rate, cell-size, and DNA content in colpodean ciliates (Ciliophora, Colpodea). Eur. J. Protistol. 25:345–352.(Laura Gómez-Valero, Ampar)
Introduction
A large number of bacterial species living in association with animal cells have small genomes. They may be pathogens such as Mycoplasma spp. (Fraser et al. 1995; Himmelreich et al. 1996; Chambaud et al. 2001), Chlamydia spp. (Stephens et al. 1998; Kalman et al. 1999), or Rickettsia spp. (Andersson et al. 1998; Ogata et al. 2001), but they may also be mutualistic endosymbionts such as Buchnera aphidicola (Shigenobu et al. 2000; Tamas et al. 2002; van Ham et al. 2003), Wigglesworthia glossinidia (Akman et al. 2002), or Blochmannia floridanus (Gil et al. 2003). Their genomes, ranging in size from 450 kb to 1.2 Mb, seem to have derived from free-living ancestors with larger genomes. After the shift of the bacterial ancestor to a host-associated lifestyle, a process of genome degradation started, causing the inactivation of many genes by either point mutations or the removal of the complete gene (Andersson and Kurland 1998; Andersson and Andersson 1999, 2001; Moran and Wernegreen 2000; Clark et al. 2001; Mira, Ochman, and Moran 2001; Moran and Mira 2001; Silva, Latorre, and Moya 2001; Moran 2002). The newly formed pseudogenes then entered a process of disintegration (Andersson and Andersson 2001; Silva, Latorre, and Moya 2001), which contributed to a further reduction of the genome size. To date, the bacteria, pathogen or mutualist, with the smallest known genomes are M. genitalium (Fraser et al. 1995) and the strain of B. aphidicola from the host Cinara cedri (Gil et al. 2002), with 580 and 450 kb, respectively.
Its close relationship to free-living gamma-Proteobacteria such as Escherichia coli, Salmonella spp., Yersinia pestis, or Vibrio spp. has made the endosymbiont species B. aphidicola the subject of many studies on genome degradation (Shigenobu et al. 2000; Mira, Ochman, and Moran 2001; Moran and Mira 2001; Silva, Latorre, and Moya 2001, 2003; Tamas et al. 2002; van Ham et al. 2003). Previous analyses allowed the minimal gene content of the genome of the ancestor of E. coli and B. aphidicola to be estimated at 1,818–2,425 genes, indicating that more than 1,000 genes were lost during adaptation to the new lifestyle. Those losses would have occurred by large deletions, simultaneously removing many genes, by gene inactivation and progressive gene disintegration, or by a combination of both processes (Moran and Mira 2001; Silva, Latorre, and Moya 2001).
The availability of the genome sequences of three B. aphidicola strains, symbionts of the aphids Acyrthosiphon pisum (BAp; Shigenobu et al. 2000), Schizaphis graminum (BSg; Tamas et al. 2002), and Baizongia pistaciae (BBp; van Ham et al. 2003), has offered the possibility of studying the fate of the DNA of those genes that were lost throughout the evolution of the genomes of the three B. aphidicola strains and so became part of the nonfunctional DNA. Because B. aphidicola seems to be transmitted only maternally in aphids (Baumann et al. 1995), a parallel evolution of B. aphidicola and aphid lineages has been proposed (Moran et al. 1993). This has permitted an estimation of the divergence times between B. aphidicola strains based on the estimated divergence times of their hosts. Thus, it was estimated that the strains BAp and BSg diverged 50 to 70 MYA (Clark, Moran, and Baumann 1999) because their aphid hosts belong to two tribes of the subfamily Aphidinae (Remaudière and Remaudière 1997). On the other hand, the strain BBp, whose host belongs to the subfamily Pemphiginae (Remaudière and Remaudière 1997), probably diverged at the time of the Aphididae family radiation. This radiation was recently estimated to have taken place 86 to 164 MYA (von Dohlen and Moran 2000; see fig. 1).
FIG. 1.— Phylogenetic relationships of Buchnera aphidicola strains. Divergence times are shown at the bottom. The minimal gene number deduced for the LCSA genome (640) is distributed between the 629 chromosomal genes and the 11 genes contained in the tryptophan and leucine plasmids. The ancestral genome was composed of 603 protein-coding and 37 RNA-specifying genes. The number of lost genes in each internal or external phylogenetic branch is reported inside the squares. All these losses correspond to protein-coding genes. The numbers of genes of the three B. aphidicola genomes are placed to the right below the name of each strain. These numbers include both the chromosomal and plasmids genes, but in BAp and BSg genomes only a copy of the duplicated genes of the tryptophan plasmid has been counted, either because the number of copies varies between individuals or because it has not been determined. Aphidinae and Pemphiginae are the two aphid subfamilies to which the hosts of the B. aphidicola strains belong. BAp: symbiont of the aphid Acyrthosiphon pisum; BSg: symbiont of Schizaphis graminum; BBp: symbiont of Baizongia pistaciae.
The comparison of the content and order of the genes on the chromosomes of the three B. aphidicola strains revealed an extreme case of genome stability, with just a few chromosomal rearrangements (Tamas et al. 2002; van Ham et al. 2003). It also allowed the reconstruction of the minimal gene content of the genome of the last common symbiont ancestor (LCSA) of the three strains (van Ham et al. 2003). This gene repertoire may be larger if some genes were lost in all three strains. Based on this information, the evolutionary history of gene losses in the genome of B. aphidicola was reconstructed (Silva, Latorre, and Moya 2003), determining that at least 164 gene losses had occurred in the course of the evolution of the three B. aphidicola lineages. The inability of the B. aphidicola chromosome to acquire genes by horizontal gene transfer (HGT; but see van Ham et al. 2000) supported the hypothesis that every gene present in a B. aphidicola strain was originally present in the chromosome of the LCSA. In addition, based on the information on the gene content of plasmids from several B. aphidicola strains (Lai, Baumann, and Baumann 1994; Bracho et al. 1995; Rouhbakhsh et al. 1996; van Ham, Moya, and Latorre 1997; Silva et al. 1998; van Ham et al. 1999, 2000; Baumann et al. 1999), some genes located in the chromosome of the BBp strain are thought to have been originally in a plasmid and to have been inserted into the chromosome during lineage evolution (Sabater-Muñoz et al. 2002, 2004; Silva, Latorre, and Moya 2003). The absence of HGT and the near absence of rearrangements in the chromosome allowed the fate of the DNA of each gene to be followed, once inactivated, even though the high substitution rate of B. aphidicola (Moran 1996; Brynnel et al. 1998) led to a complete loss of sequence similarity between the remnant intergenic region and the original gene.
The aim of this study was to estimate the rate of nonfunctional DNA loss (pseudogene or intergenic region) in the obligate bacterial endosymbiont B. aphidicola, making use of the estimated divergence dates for its strains. We have also tried to verify that the ability to lose DNA at a significant rate has been maintained throughout all B. aphidicola lineages.
Materials and Methods
Definition of Gene Loss Events
We define a gene loss as any event producing the destruction of the function of a gene. Thus, gene loss events are revealed by the absence of an ancestral LCSA gene, or its presence as a pseudogene, in the modern B. aphidicola genomes. When the genomes of the three B. aphidicola strains are compared, gene absence or pseudogenes are frequently found for the same ancestral gene in more than one strain. This may be interpreted as either an ancestral loss prior to strain divergence or as two (or three) convergent losses after it. To differentiate between these two episodes in the BAp and BSg lineage evolution (see fig. 1), we first established that if a functional gene was present in one strain and a pseudogene, or a gene absence, was in the other, then the gene loss took place after the divergence in the latter lineage. On the other hand, if both strains presented the absent gene status, we considered that the gene loss event took place prior to the divergence.
In addition, we established a criterion based on sequence similarity for the cases of pseudogene status in both strains, or pseudogene in one strain and absent gene in the other. To establish the criterion, we tried to determine the range of E values reported by a BLAST search (Altschul et al. 1997) for those genes inactivated after BAp and BSg divergence. We selected 47 genes whose loss could be placed after the divergence with complete confidence because, in one of the two strains, they had the status of gene (case 1, gene in BAp and pseudogene in BSg; case 2, gene in BSg and pseudogene in BAp; case 3, gene in BAp and absent gene in BSg; case 4, gene in BSg and absent gene in BAp; see table 1). We took the E. coli protein as a reference and performed a TBlastN search against the genomes of BAp and BSg, recording the E value for the region of the genome where the remnant DNA of the gene should be located. The E values for the 33 pseudogenes (cases 1 and 2) ranged from 0 to nondetected (E value > 7), with a median value of 2e-74, while no hit was detected for any of the 14 absent genes (cases 3 and 4). These results indicate that the substitution rate in B. aphidicola is so high that the pseudogenes formed prior to BAp and BSg divergence have had time to completely lose any sequence similarity with the functional gene. For that reason, the small E values obtained for cvpA, apbE, cmk, bioH, ansA, and hemD under the same conditions (2e-11, 6e-80, 5e-28, 1e-54, 1e-126, and 7e-36, respectively, for the smaller value of BAp or BSg) indicate with high confidence that a functional gene was present at the time of BAp and BSg divergence. Therefore, two convergent gene loss events took place for each of these genes.
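The decision rules described in this subsection can be summarized as a small sketch. It is only one reading of the text, not the procedure actually run by the authors: the function name, the status labels, and in particular the numeric cutoff `e_cutoff` are hypothetical, since no explicit E value threshold is stated. Treating a pseudogene without detectable similarity as an ancestral loss is likewise an assumption drawn from the reasoning above.

```python
from typing import Optional

GENE, PSEUDOGENE, ABSENT = "gene", "pseudogene", "absent"

def classify_loss(bap: str, bsg: str,
                  best_e_value: Optional[float] = None,
                  e_cutoff: float = 1e-10) -> str:
    """Place a gene loss relative to the BAp/BSg divergence.

    bap, bsg      -- status of the ancestral gene in each strain
    best_e_value  -- smaller TBlastN E value of the two strains, or None
                     when no hit was detected
    e_cutoff      -- hypothetical threshold; the text gives no fixed value
    """
    if bap == GENE and bsg == GENE:
        return "no loss"
    if bap == GENE:
        return "loss in BSg after divergence"
    if bsg == GENE:
        return "loss in BAp after divergence"
    if bap == ABSENT and bsg == ABSENT:
        return "loss before divergence"
    # Remaining cases: pseudogene in both strains, or pseudogene plus absent.
    # Detectable similarity suggests a functional gene at the divergence.
    if best_e_value is not None and best_e_value <= e_cutoff:
        return "two convergent losses after divergence"
    return "loss before divergence"

# E value quoted in the text for cvpA; the status pair is illustrative only.
cvpa_call = classify_loss(PSEUDOGENE, PSEUDOGENE, best_e_value=2e-11)
```

The similarity test is only consulted when neither strain keeps a functional copy, which mirrors the order in which the criteria are introduced in the text.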
Table 1 Gene Losses and Gene Status in Buchnera aphidicola Strains
Definition of Deletion Rate
The main goal of this work was to determine the deletion rate during the evolution of the B. aphidicola lineage. A deletion rate is the frequency of deleted nucleotides per Myr. This frequency was estimated on the remnant DNA of the pseudogenes, which in many cases had completely lost their similarity to the DNA of the functional gene. We must not confuse the DNA deletion rate with the gene inactivation rate. The latter could be defined as the frequency of inactivated genes per Myr. This is not a constant parameter, as can be shown when early and late gene losses during the evolution of BAp and BSg lineages are compared (fig. 1).
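The distinction between the two rates can be made concrete with a minimal sketch. It reads each definition as a simple average over a branch, which is an assumption; the function names and all numeric inputs are hypothetical and are not values from this study.

```python
def deletion_rate(original_nt: float, remnant_nt: float, elapsed_myr: float) -> float:
    """Average number of nucleotides deleted per Myr from one inactivated region."""
    return (original_nt - remnant_nt) / elapsed_myr

def inactivation_rate(n_inactivated_genes: int, elapsed_myr: float) -> float:
    """Average number of genes inactivated per Myr along one branch."""
    return n_inactivated_genes / elapsed_myr

# Hypothetical region: 1,200 nt when functional, 300 nt left after 60 Myr.
nt_per_myr = deletion_rate(1200, 300, 60)

# Hypothetical branch: 8 genes inactivated over 40 Myr.
genes_per_myr = inactivation_rate(8, 40)
```

The first quantity describes how fast DNA disappears once a gene is already nonfunctional, while the second counts how often genes become nonfunctional, so the two can vary independently between branches.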
Analysis of Genes, Pseudogenes, and Remnant DNA Sequences
To determine the DNA loss in the inactivated genes, we estimated the length of each gene (Lg) and the length of the disintegrated DNA region after the reductive process (Ld). For single gene losses, Lg was defined as the number of nucleotides (nt) included between the upstream and downstream adjacent genes, minus the length of an average intergenic region. Thus, at the beginning of the disintegration process, this length contained the upstream and downstream intergenic regions plus the gene and, after a complete disintegration, the remnant DNA would correspond to an average intergenic region (fig. 2 is a diagram of the possible situations). We used 55.1 nt as the size of an average intergenic region, as was estimated for ancient spacers, defined as those flanked by the same genes in B. aphidicola and E. coli (Mira, Ochman, and Moran 2001). In a similar way, Ld was estimated as the number of nucleotides between the upstream and downstream adjacent genes minus 55.1 nt. This DNA length represents the average contribution of the two contiguous genes to the final intergenic region. Each original gene would have some upstream and downstream noncoding nucleotides which will be lost together with the coding region during the gene disintegration process. Thus, the new intergenic region will be formed by a few downstream noncoding nucleotides of the upstream gene and by the upstream noncoding nucleotides of the downstream gene. Both DNA segments would form, on average, the 55.1 nt of the new intergenic region (fig. 2).
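The definitions of Lg and Ld above amount to the same subtraction applied to two different genomes. The sketch below illustrates this under an assumed coordinate convention (1-based, inclusive gene coordinates on the same strand orientation); the function names and the coordinates are hypothetical, and only the 55.1 nt spacer comes from the text.

```python
AVERAGE_SPACER_NT = 55.1  # average ancient intergenic region (Mira, Ochman, and Moran 2001)

def region_length(upstream_gene_end: int, downstream_gene_start: int) -> float:
    """Nucleotides between two flanking genes, minus the average spacer.

    With 1-based inclusive coordinates (an assumption), the gap between a gene
    ending at position a and a gene starting at position b holds b - a - 1 nt.
    """
    gap = downstream_gene_start - upstream_gene_end - 1
    return gap - AVERAGE_SPACER_NT

def remnant_ratio(lg: float, ld: float) -> float:
    """Ld/Lg: share of the original gene region that is still present."""
    return ld / lg

# Hypothetical coordinates, for illustration only.
lg = region_length(10_000, 11_256)  # strain in which the gene is still intact
ld = region_length(20_000, 20_356)  # strain in which the gene has decayed
ratio = remnant_ratio(lg, ld)
```

A ratio close to 1 corresponds to a recently inactivated pseudogene, and a ratio close to 0 to a region that has shrunk to an ordinary intergenic spacer.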
FIG. 2.— Diagram of the regions included in the estimation of the number of nucleotides of Ld and Lg. (A) The lost gene retains the status of a pseudogene. (B) The accumulation of nucleotide substitutions produces the loss of similarity between the gene and the disintegrated region. (C) An example of a block including two lost genes. Ig: intergenic region.
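The Lg and Ld definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1-based inclusive coordinate convention, and the example coordinates are hypothetical and are not taken from the paper; only the 55.1-nt average spacer comes from the text. The text does not say how a region shorter than the average spacer was handled, so no clamping is applied here.

```python
AVERAGE_SPACER_NT = 55.1  # average ancient intergenic spacer quoted in the text


def region_length(upstream_gene_end, downstream_gene_start,
                  spacer=AVERAGE_SPACER_NT):
    """Nucleotides lying between two flanking genes, minus one average spacer.

    Coordinates are assumed to be 1-based and inclusive (an assumption of
    this sketch), so the bases strictly between the genes number
    downstream_gene_start - upstream_gene_end - 1.
    """
    between = downstream_gene_start - upstream_gene_end - 1
    return between - spacer


def remnant_ratio(reference_flanks, degraded_flanks):
    """Ld/Lg for one lost gene (or one block of adjacent lost genes).

    reference_flanks: (upstream_end, downstream_start) in the strain that
                      still carries the functional gene(s), giving Lg.
    degraded_flanks:  the same pair of flanking genes in the strain where
                      the gene(s) were lost, giving Ld.
    """
    lg = region_length(*reference_flanks)
    ld = region_length(*degraded_flanks)
    return ld / lg


# Hypothetical coordinates, for illustration only.
ratio = remnant_ratio(reference_flanks=(10_000, 11_256),
                      degraded_flanks=(10_000, 10_176))
print(f"Ld/Lg = {ratio:.3f}")
```

For a block of adjacent lost genes the same call applies, with the flanks taken from the functional genes on either side of the whole block, because a single 55.1-nt spacer is subtracted regardless of how many genes the block contains.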
For BBp gene losses, we estimated Lg as the average of BAp and BSg, except when the gene was only present in one of these two strains. For BAp losses we estimated Lg on BSg and vice versa. When the gene was lost in both strains, we estimated on BBp. In the case of the yadF gene that was absent in the three strains, we estimated on BTc (see Results).
When several adjacent genes were simultaneously lost in a lineage we treated them as a block, estimating Ld and Lg for the block (see fig. 2). Independently of the number of lost genes contained in the block, we always considered 55.1 as the final average spacer between two functional genes. For the determination of the number of genes in each disintegrated category, the Ld/Lg ratio of the block was assigned to each of the lost genes in the block. When a block contained one or several pseudogenes, and it was possible to identify their ancestral start and stop codons, the block was divided into the maximum possible number of segments to estimate Ld/Lg. GC contents (GCd and GCg) were estimated in the same DNA segments used for estimating Ld and Lg. Both variables were plotted with a logarithmic transformation for the GCd/GCg ratio. For those genes whose inactivation started after the divergence of BAp and BSg, we estimated the GC content of an active B. aphidicola gene and compared it with the GC content of the remnant DNA region. The ansA and hemD genes were not included in the final analyses because there was not a functional gene in BBp to determine the gene GC content (see table 1).
Although the position on the chromosome of close to 100 genes that were lost after the divergence of E. coli and B. aphidicola and prior to the formation of the LCSA was known (Silva, Latorre, and Moya 2001), they were not taken into account. We consider that the DNA coming from those genes should have practically disappeared, after more than 150 Myr of evolution.
Intergenic Region Size Analysis
A putative shortening of B. aphidicola intergenic regions versus E. coli was studied. Only ancient spacers were analyzed. They were defined according to Mira, Ochman, and Moran (2001) as those with the same flanking genes in E. coli and B. aphidicola. To perform a homogeneous analysis, only those present in all B. aphidicola strains were measured (n = 195). We tested whether each sample came from a normal distribution with the Kolmogorov-Smirnov test and rejected the null hypothesis of normality (P value < .001). In addition, because our data were not independent but repeated measures of the size of the same intergenic region in four genomes, we applied a nonparametric repeated measures analysis with the Friedman test, the null hypothesis being that there is no difference in mean ranks among the genomes.
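To make the repeated-measures logic concrete, the following minimal sketch computes the Friedman statistic from a table with one row per spacer and one column per genome. It is not the authors' procedure (the text does not say which software was used); the function names and the toy spacer sizes are hypothetical, and no tie correction is applied. Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom, where k is the number of genomes.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (without tie correction).

    rows: list of equal-length lists; each row holds the sizes of one
    intergenic region measured in each of the k genomes.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy spacer sizes (nt) in four genomes; hypothetical values for illustration.
toy_spacers = [
    [40, 45, 50, 70],
    [10, 12, 15, 30],
    [90, 95, 99, 120],
    [5, 6, 7, 8],
    [60, 61, 62, 63],
]
print(friedman_statistic(toy_spacers))
```

Ranking within each row is what removes the dependence among the four measurements of the same spacer, which is why this test suits the data better than one that assumes independent samples.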
Results
Identification of Gene Loss Events
The minimal gene content of the genome of the LCSA of B. aphidicola was determined (fig. 1). It was based on the comparison of the genomes of the BAp, BSg, and BBp strains, with the addition of the gene yadF detected in the B. aphidicola strain from Tetraneura caerulescens (BTc; Sabater et al. 2004) but absent in the other three genomes. It comprised 640 genes, 629 of them located on the chromosome. Five genes, annotated as pseudogenes in the genome of BSg (Tamas et al. 2002), were reannotated as genes (Gil et al. 2003), based on the existence of putatively functional open reading frames and the essential role of the protein (lig, mfd, and endA), the putative use of unusual start codons (infC), or because the protein was produced by a programmed ribosomal frameshift (prfB).
One hundred sixty-four gene loss events were identified (table 1). Two of the losses of the BBp lineage were not taken into account because they involved plasmid genes (ibpA and repAC). Thus, at least 94 out of the 629 ancestral chromosomal genes were lost through the 100–150 Myr evolution of the BBp lineage chromosome. In contrast, the lineages of BAp and BSg only lost 32 and 44 genes, respectively, during the same period of time. These events only involved 135 different genes because of several convergent losses that occurred during the evolution of the B. aphidicola lineages. The genes ansA and hemD were lost three times (in BBp and, after the split of BAp and BSg, in the lineages referred to from this point on as late BAp and late BSg). The gene yadF was lost twice (in BBp and before the split of BAp and BSg, in the lineage referred to from this point on as early BAp and early BSg). Nine genes were lost twice, in the BBp and late BAp lineages. Eleven genes were lost twice, in the BBp and late BSg lineages. Finally, apbE, cmk, cvpA, and bioH were lost twice, in the late BAp and late BSg lineages (see table 1).
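The step from 164 loss events to 135 different genes can be checked by counting the repeated events listed in this paragraph. A gene lost three times contributes two repeats, and a gene lost twice contributes one:

```latex
\begin{align*}
\text{repeated events} &= \underbrace{2 \times 2}_{ansA,\ hemD}
  + \underbrace{1}_{yadF}
  + \underbrace{9}_{\text{BBp and late BAp}}
  + \underbrace{11}_{\text{BBp and late BSg}}
  + \underbrace{4}_{\text{late BAp and late BSg}} = 29, \\
\text{different genes} &= 164 - 29 = 135 .
\end{align*}
```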
DNA Loss in the BAp and BSg Lineages
Eight genes (bioC, bioF, mutH, norM, pal, uspA, yqgE, and yadF) were analyzed (table 1 and fig. 3A) whose inactivation occurred before the divergence of BAp and BSg lineages (fig. 1), between 86–164 MYA (the estimated age of the LCSA) and 50–70 MYA (the divergence time of BAp and BSg). Six out of the eight were not adjacent in the LCSA chromosome, while the bioC and bioF genes were contiguous and, for that reason, were treated as a block in the DNA loss analysis. When the length of each gene (Lg) and the length of its homologous disintegrated DNA region (Ld) were compared, in six out of eight cases more than 90% of the nucleotides had been lost, while in the other two genes the remnant DNA was only slightly higher than 10% (fig. 3A). The average Ld/Lg ratio for the eight genes was 0.055, with a range from 0 to 0.13. For each gene, the length of the disintegrated region (Ld) was the mean of the lengths for BAp and BSg. These two lengths were very similar, and the difference in the Ld/Lg value between the two strains was not higher than 0.1 for any analyzed gene.
FIG. 3.— DNA loss in Buchnera aphidicola. (A) Aphidinae lineage: genes whose inactivation occurred between LCSA and the divergence of BAp and BSg lineages. (B) Aphidinae lineage: genes whose inactivation occurred after the divergence of BAp and BSg lineages. (C) Pemphiginae lineage: genes whose inactivation occurred during the evolution of BBp lineage. See Material and Methods for the estimation of the length (nucleotides) of each gene (Lg) and the length of the disintegrated DNA region after the reductive process (Ld). The width of each bar along the abscissa represents a class interval of 0.1, except for the first bar (>1.1). Genes were classified according to the presence or absence of similarity against the functional gene. The total height of each class interval represents the sum of these two conditions. Pseudogenes (similarity is still detected) are light gray columns. Regions without similarity are dark gray columns.
Assuming that the disintegration of these genes took place gradually with an average Ld/Lg ratio of 0.055 for an average period of time of 100 Myr, we have applied the continuous decay formula, Ld = Lg e^(-rt) (Petrov and Hartl 1998), to estimate the deletion rate per nucleotide and Myr (r), with Ld being the length of the disintegrated DNA region at time t and Lg the length of the active gene (at time 0). We estimated a deletion rate of 2.9% per Myr (r = 0.029) and the function that explained the gradual decay as Ld = Lg e^(-0.029t) (fig. 4). This means that the half-life of a pseudogene, the period for losing half of its nucleotides, is 23.9 Myr. This theoretical function would imply that the DNA of a gene that was inactivated during the first stages of B. aphidicola evolution would now be almost completely lost, while the DNA of those genes lost after the divergence of BAp and BSg (see fig. 1) would have a wide range of disintegration, with an Ld/Lg ratio from around 0.13 (70 Myr) to 1.
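The arithmetic behind these figures follows directly from the exponential decay model, and can be reproduced with a short Python sketch. The function names are hypothetical; the inputs (a remnant fraction of 0.055 after an assumed 100 Myr) are the values quoted in the paragraph above.

```python
import math


def deletion_rate(remnant_fraction, elapsed_myr):
    """Solve Ld/Lg = exp(-r * t) for r (per nucleotide per Myr)."""
    return -math.log(remnant_fraction) / elapsed_myr


def half_life(rate):
    """Time (Myr) needed to lose half of the nucleotides at deletion rate r."""
    return math.log(2) / rate


def remnant_after(rate, elapsed_myr):
    """Expected Ld/Lg after elapsed_myr of decay at deletion rate r."""
    return math.exp(-rate * elapsed_myr)


r = deletion_rate(remnant_fraction=0.055, elapsed_myr=100.0)
print(round(r, 3))                        # 0.029
print(round(half_life(r), 1))             # 23.9
print(round(remnant_after(r, 70.0), 2))   # 0.13
```

Because the elapsed time enters the rate linearly, the half-life estimate scales in direct proportion to the assumed disintegration period, which is why the choice of 100 Myr matters for the result.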
FIG. 4.— Hypothetical gradual DNA loss function based on the available information for the eight genes whose inactivation occurred between LCSA and the divergence of BAp and BSg lineages (see figs. 1 and 3). Half-life of a pseudogene (the period for losing half of its nucleotides) is 23.9 Myr.
Fifty-six out of 60 genes were analyzed (22 in BAp and 34 in BSg lineages) whose inactivation occurred in the lineages of BAp or BSg, after the divergence of both strains (see table 1). Fifty-three were not adjacent, and four of them were treated as blocks of two genes (znuA-yebA and ygcF-ygcM). When the degree of DNA loss was analyzed (fig. 3B), a wide range of variation was observed, with most of the genes presenting degrees of disintegration smaller than 20%. Forty-four of them, receiving the status of pseudogene, were probably inactivated very recently and, for that reason, nucleotide substitutions have not completely removed their similarity to the functional gene. On the contrary, when the DNA loss of nonpseudogene intergenic regions was analyzed separately, 12 out of 14 presented an Ld/Lg value equal to or smaller than 0.603, and some genes were even almost completely disintegrated. These genes would correspond to those inactivated in the first million years after the divergence of both lineages, in agreement with the gradual decay rate (fig. 4). Although it is very difficult to estimate the half-life of a pseudogene during this period, because the time of disintegration may vary from tens of millions to a few hundred years, we have tried to give an approximate range using exclusively the lengths of the 14 nonpseudogene intergenic regions. We estimated an average Ld/Lg ratio of 0.426 for them. To apply the continuous decay formula, an upper bound of 60 Myr (the divergence time) and a lower bound that we decided to fix at 20 Myr were used. With these figures, we obtained a range from 48.7 Myr to 16.2 Myr, which is of the same order of magnitude as the 23.9 Myr previously estimated for the first period.
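As a worked check of this range, the half-life can be written directly in terms of the remnant fraction and the assumed disintegration time by combining the continuous decay formula with the definition of half-life. Only the values quoted in the paragraph above are used:

```latex
\begin{align*}
\frac{L_d}{L_g} = e^{-rt}
  \;&\Longrightarrow\;
  r = \frac{-\ln(L_d/L_g)}{t},
  \qquad
  t_{1/2} = \frac{\ln 2}{r} = \frac{t \,\ln 2}{-\ln(L_d/L_g)} \\[4pt]
t = 60\ \text{Myr}: \quad
  t_{1/2} &= \frac{60 \times 0.693}{-\ln 0.426}
           = \frac{41.59}{0.853} \approx 48.7\ \text{Myr} \\[4pt]
t = 20\ \text{Myr}: \quad
  t_{1/2} &= \frac{20 \times 0.693}{0.853} \approx 16.2\ \text{Myr}
\end{align*}
```

The two bounds differ by exactly the factor 60/20 = 3, since for a fixed remnant fraction the half-life is proportional to the assumed time.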
A few genes not only showed no reduction but even presented a slight increase in size. They were probably inactivated very recently and their larger size may be due to several causes: (1) nucleotide insertions; (2) the ancestral gene and intergenic sizes might not match exactly those estimated for the reference B. aphidicola strain; (3) a fraction of the DNA might come from the loss of other nondetectable ancestral genes which, as in the case of yadF, were lost in the three completely sequenced B. aphidicola genomes; and (4) the 5' end of several genes might be incorrectly annotated.
Because many host-associated bacteria display AT-enriched genomes, it has been proposed that a mutational pressure exists in these genomes toward the increase in A+T content (Moran 2002). In B. aphidicola this shift has caused the genic and intergenic G+C content to be very low in the genomes of BAp and BSg: about 26% and 15%, respectively (Tamas et al. 2002). Once a gene is inactivated, the bias in the nucleotide substitutions produces a decrease in the G+C content. We compared the decrease in G+C content with the DNA loss for the gene losses of this period and estimated the correlation coefficient between Ld/Lg and ln (GCd/GCg) to be 0.765 (fig. 5). A parallel decrease in GC and length was observed with an equilibrium for the GC content decrease of around 0.47. This means that on average the final GC content of the analyzed regions is 47% of the initial composition.
FIG. 5.— Relationship between length (Ld/Lg) and GC content (GCd/GCg) decrease for those genes whose inactivation occurred after divergence of BAp and BSg lineages. A logarithmic transformation has been applied to the GC content decrease parameter. See figure 3 legend for additional information. Correlation coefficient is 0.765.
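The relationship plotted in figure 5 is a Pearson correlation between the length ratio and the log-transformed GC ratio. The sketch below shows how such a coefficient could be computed from per-region measurements. It is an illustration only: the function names and the record layout are hypothetical, and the example records are synthetic values constructed so that the two variables are exactly linearly related; they are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def length_gc_correlation(records):
    """Correlate Ld/Lg with ln(GCd/GCg).

    records: iterable of (Ld, Lg, GCd, GCg) tuples, one per lost gene or
    block, with GC contents given as fractions.
    """
    records = list(records)
    length_ratio = [ld / lg for ld, lg, _, _ in records]
    log_gc_ratio = [math.log(gcd / gcg) for _, _, gcd, gcg in records]
    return pearson(length_ratio, log_gc_ratio)


# Synthetic records: ln(GCd/GCg) is set to (Ld/Lg - 1), so the
# correlation is perfect by construction.
synthetic = [(1000 * x, 1000.0, 0.26 * math.exp(x - 1), 0.26)
             for x in (0.2, 0.5, 0.8, 1.0)]
print(length_gc_correlation(synthetic))
```

A positive coefficient means that regions retaining more of their original length also retain more of their original GC content, which is the pattern the text reports for the late BAp and BSg losses.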
DNA Loss in the BBp Lineage
The DNA loss in the 94 genes present in the LCSA chromosome that were lost during the evolution of BBp lineage was analyzed. Fifty-four were analyzed independently, and 40 were analyzed in 13 blocks of genes containing two to eight genes. Because of the putative wide period for the start of disintegration, from 0 to 150 Myr, a large variation in the Ld/Lg values was obtained, but with a large number of genes almost completely disintegrated (fig. 3C). Only 23 out of 94 genes showed Ld/Lg values higher than 0.6. Nearly all pseudogenes were included in this group. No correlation was detected between the decrease in the length and G+C content for the lost genes of this lineage (data not shown). According to the gradual decay function, this result implies that only a few genes started their disintegration recently, while the large majority was decaying for more than 50–60 Myr. We also estimated the range of pseudogene half-life exclusively using the nonpseudogene intergenic regions, although in this case with a larger interval due to the upper bound of divergence (86–164 Myr). The average Ld/Lg ratio was 0.358, which renders a pseudogene half-life range of 81.1 to 13.5 Myr by using disintegration times from 120 to 20 Myr.
Shortening of Ancient Spacers
A comparison of the sizes of the intergenic regions for ancient spacers between E. coli and each of the three B. aphidicola strains is shown in figure 6. The average size (bp ± standard deviation) for BAp (51.1 ± 70.0), BSg (47.4 ± 63.6), and BBp (55.3 ± 76.1) was only slightly smaller than that for E. coli (67.5 ± 98.2). The Friedman test for the four genomes indicated a difference in the mean ranks (P value = .017). When the test was applied exclusively to the three B. aphidicola genomes, the null hypothesis was not rejected (P value = .103). Therefore, the size distribution of the intergenic regions differs between B. aphidicola and E. coli.
FIG. 6.— Orthologous ancient spacers in Escherichia coli and Buchnera aphidicola. Relation between the size of the intergenic regions of E. coli and the size of the intergenic regions of BAp, BSg, and BBp. Ancient spacers are defined as those flanked by the same genes in B. aphidicola and E. coli (Mira, Ochman, and Moran 2001). Only those spacers present in all B. aphidicola strains were compared (n = 195). A dashed line marks the 1:1 slope.
Functional Role of the Lost Genes
On analyzing the nature of the 133 chromosomal lost genes, according to the Clusters of Orthologous Groups of proteins (COGs) functional classification (Tatusov, Koonin, and Lipman 1997), it can be seen (table 2) that the losses embrace all of the functional categories, though the numbers are very different. The largest groups of lost genes are involved in coenzyme transport and metabolism (21) and in cell wall and membrane biogenesis (18). The most conserved are, as might be expected, genes implicated in information processing. We found that the majority of these losses were not convergent (106 out of 133 genes), and they were independently produced in one lineage of B. aphidicola. These findings would indicate that the differential losses are related to the specific host, due to either its particular diet or life cycle. A clear example of conservation is the case of the genes involved in the essential amino acid biosynthetic pathways, which are conserved in the three analyzed B. aphidicola genomes because of their nutritional role in the symbiosis, providing essential amino acids to the insect. However, the genes involved in the ornithine pathway have been lost independently in BBp (van Ham et al. 2003). This loss affects genes such as argA, B, C, D, and E and others related to these pathways, such as the pyr and spe genes, implying that once a gene is lost, all the genes involved in this pathway become nonessential and thus susceptible to elimination.
Table 2 Classification of Lost Genes Based on COGs, and Analysis of Convergent Losses
Discussion
By pulsed-field gel electrophoresis (PFGE), it was shown that the genomes of several B. aphidicola strains maintained a more or less stable chromosomal size of around 630–643 kb (Wernegreen et al. 2000). This result, additionally supported by the sequence of the BAp and BSg genomes (Shigenobu et al. 2000; Tamas et al. 2002), led to the idea that after an initial phase of adaptation to the endosymbiotic lifestyle, B. aphidicola was unable to lose its nonfunctional DNA to continue with the genome size reduction. Moreover, it was recently proposed that the rate at which genic sequences are erased from the modern B. aphidicola genome is as low as one nucleotide per 10,000 years (Mira, Klasson, and Andersson 2002).
However, a new PFGE analysis including B. aphidicola strains from other aphid subfamilies (Gil et al. 2002), together with the sequencing of the BBp strain with a chromosome of 616 kb (van Ham et al. 2003), revealed a wide range of chromosomal sizes, with 450 kb being the minimum size reported so far for a bacterial species. These results show that the genome of B. aphidicola, at least in some period of the evolution of several lineages, has experienced a process of genome reduction at a non-negligible rate.
Once a gene is inactivated, its DNA is affected by two types of changes: (1) a mutational bias towards A+T, which provokes a decrease in the GC content (Moran 2002); and (2) the deletion of some of its nucleotides. Our work has shown that both processes present some degree of correlation and, in general, the DNA of the inactivated genes became shorter and A+T richer with time. The rate of DNA loss of B. aphidicola is sufficiently high to produce the complete or almost complete disintegration of the genes in a short period of time. We have shown that the rate of DNA loss for those genes whose inactivation occurred more than 50–70 MYA in the B. aphidicola from the Aphidinae lineage was high enough to almost completely remove them, and that for the genes inactivated after this date it produced a complete loss in a few cases and a partial loss in others. Those genes with negligible reductions were mainly pseudogenes that were probably inactivated very recently. However, it is possible that some of them were still functional genes. The production of small amounts of a functionally complete protein is possible for some pseudogenes if they are transcribed and if a significant level of ribosomal frameshifting takes place during translation, as has been described for several E. coli genes (Gurvich et al. 2003).
The absence of a date for the divergence of two or more members from the Pemphiginae did not allow us to determine whether the genes lost in an early stage had been completely erased, but it is evident that a large proportion of lost genes have lost a large quantity of their nucleotides. For that reason, we believe that in both lineages the DNA of a gene can be almost completely deleted from the genome in 40 to 60 Myr. Our estimation of the half-life of a pseudogene in B. aphidicola of 23.9 Myr is in the range of the 14.3 Myr estimated for Drosophila (Petrov and Hartl 1998) but is much smaller than the 615 Myr for Laupala (Petrov et al. 2000) or the 884 Myr for mammals (Petrov and Hartl 1998). This estimation was done for an early period of the evolution of the BAp/BSg lineage using the eight genes that became inactivated during this phase. By exclusively using a part of the genes whose disintegration started after BAp/BSg divergence, we obtained a value of the same order (16.2–48.7 Myr). A similar analysis, although affected by the large period of time used for the estimation, rendered a value from 13.5 to 81.1 Myr for the BBp lineage. These results show that DNA loss is taking place at a non-negligible rate during the evolution of all B. aphidicola lineages. The disintegration rate for free-living bacteria or for the initial steps of the adaptation to endosymbiosis was probably higher because several mechanisms, now lost in B. aphidicola, can produce drastic losses of nucleotides. The main changes are, on the one hand, the loss of an efficient recombinational system (Shigenobu et al. 2000), which in combination with direct repeats would produce deletions (Frank, Amiri, and Andersson 2002), and, on the other, the decrease in the frequency of close direct repeats in the genome, which may be the substrate for DNA polymerase slippage. This is probably the reason why it is very difficult to identify pseudogenes in many bacterial species (Lawrence, Hendrix, and Casjens 2001).
What controls the size of the B. aphidicola genome is the importance, or essentiality, of the function of the different DNA sequences that comprise it, either genes or intergenic regions with some kind of function. Once any of these DNA segments loses its function, a process of gradual DNA loss decreases its length. For that reason, the size of the B. aphidicola chromosome may still continue to be reduced, and the limit for this reduction will be associated with the minimum number of genes required for bacterial cell life and the symbiotic contribution to the life of its insect host. It is worth noting that the genomes of the five completely sequenced bacterial endosymbionts of insects share only 313 genes, 277 of them being protein-coding (Gil et al. 2003). This minimal set would produce genome sizes as small as 300 kb, with one-third of these genes nonessential for a bacterial cell but required for supporting the survival of its host. A slight decrease in the ancient intergenic spacers was also detected, but its contribution to the total chromosomal reduction will be much smaller than the loss and disintegration of the genes.
Genome size in bacteria is a balance between several mechanisms that produce the insertion or deletion of small or large DNA segments. Mechanisms producing the insertion or deletion of hundreds or thousands of nucleotides in a single event have a high impact on total genome size. However, these mechanisms seem to have been lost in the present B. aphidicola genome evolution. The stability of the gene order of its genome is probably due to the lack of elements that disrupt the chromosomal structure, such as transposable elements, phages, large repeats, and probably an efficient homologous recombinational mechanism (Rocha 2003). Moreover, the loss of the ability to acquire foreign DNA fragments by horizontal gene transfer events drastically reduces the impact of the mechanisms that increase the genome size. Therefore, the main mechanisms affecting the evolution of the size of the modern B. aphidicola genomes are those mutational events involving a small number of nucleotides (insertions or deletions). In fact, the most frequent polymorphisms detected during the sequencing of the BBp and BSg genomes were small indels with an average size of between one and two nucleotides (Tamas et al. 2002; van Ham et al. 2003). Slipped-mispair errors during DNA replication are probably the main cause of these polymorphisms. In addition, the presence of close repeats (Rocha and Blanchard 2002), which are short repeats (> 8–10 nt) separated by a spacer of several nucleotides, may be important because it generates slightly larger duplications or deletions (up to several hundred nucleotides; Rocha 2003). Furthermore, although these events took place at a lower rate than the single nucleotide indels, their impact will be much stronger. It has been observed that the genes present in BAp or BSg, whose orthologs were lost in the other strain, presented a slightly higher number of close repeats (larger than 9 nt) than the average gene of the genome (Rocha 2003). This would indicate that the probability of inactivation of a nonessential gene is higher when it contains more repeats. However, the density of repeated sequences in B. aphidicola, as well as in other host-associated bacteria, is dramatically decreased when compared with free-living bacteria such as E. coli, Salmonella spp., or Bacillus spp. (Tamas et al. 2002).
On the contrary, the size of the intergenic regions is not greatly affected by the genome reduction process, and ancient spacers are only slightly smaller in B. aphidicola. This difference disappears in BAp if spacers with annotated regulatory regions in E. coli are excluded (Mira, Ochman, and Moran 2001).
Finally, although insertions and deletions are factors contributing to the evolution of genome size, the evolutionary forces that led to its reduction are a matter for discussion. Several authors have proposed that in many bacterial genomes, a bias to the net DNA loss exists based on a higher number of deletion events (vs. insertions) and/or a higher average size of the deleted segments (vs. inserted; Andersson and Andersson 2001; Lawrence, Hendrix, and Casjens 2001; Mira, Ochman, and Moran 2001; Gregory 2004). If this bias is true, genetic drift could contribute to the fixation of the more abundant deletional mutations. The effect of this mechanism would be very important in B. aphidicola, due to the small population sizes and to the special manner of vertical transmission with bottlenecks in each generation (Mira and Moran 2002).
Alternatively or simultaneously, natural selection may be partially or completely responsible for the reduction. If small-size genomes replicate faster, increasing their frequency in the polyploid B. aphidicola cell (Komaki and Ishikawa 2000), and if these cells divide faster, we can expect deletional mutations to become fixed with time, independently of whether they are produced in higher, lower, or equal rates to insertions. Although this hypothesis has been proposed several times for the reduction of the genome sizes of obligate cellular bacteria and mitochondrial genomes (Selosse, Albert, and Godelle 2001; Silva, Latorre, and Moya 2001), there are few examples supporting it. A negative correlation was observed between DNA content and division rate for some ciliate spp. (Wickham and Lynn 1990). However, no such correlation was observed between doubling times in laboratory conditions and genome sizes over bacteria belonging to 10 major taxonomic divisions (Mira, Ochman, and Moran 2001) nor for growth rates of E. coli strains varying in as much as 25% in chromosome size (Bergthorsson and Ochman 1998). Because of the small difference in size that a 1-nt-indel represents, it seems fully reasonable to accept the conclusion that selection does not differentiate between individual small indels (Gregory 2004). However, studies in Drosophila have shown some evidence that deletions larger than 400 bp may be advantageous (Blumenstiel, Hartl, and Lozovsky 2002). Because of the smaller genome size of bacterial endosymbiont chromosomes, it cannot be completely ruled out that chromosomes with small sizes because of one or several small deletions can be selectively advantageous.
Acknowledgements
This work was supported by grant BMC2003-00305 from the Ministerio de Ciencia y Tecnología (Spain) and grant Grupos03/204 from Generalitat Valenciana (Spain). L.G.-V. was funded by a predoctoral fellowship from Generalitat Valenciana (Spain). We would like to thank the three anonymous reviewers for their comments and suggestions, which have contributed to the improvement of this paper. We also thank E. Vercher for support with statistical tests.
References
Akman, L., A. Yamashita, H. Watanabe, K. Oshima, T. Shiba, M. Hattori, and S. Aksoy. 2002. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat. Genet. 32:402–407.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Andersson, J. O., and S. G. Andersson. 1999. Genome degradation is an ongoing process in Rickettsia. Mol. Biol. Evol. 16:1178–1191.
———. 2001. Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol. Biol. Evol. 18:829–839.
Andersson, S. G., and C. G. Kurland. 1998. Reductive evolution of resident genomes. Trends Microbiol. 6:263–268.
Andersson, S. G. E., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Ponten, U. C. M. Alsmark, R. M. Podowski, A. K. Naslund, A.-S. Eriksson, H. H. Winkler, and C. G. Kurland. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396:133–143.
Baumann, L., P. Baumann, N. A. Moran, J. Sandstrom, and M. L. Thao. 1999. Genetic characterization of plasmids containing genes encoding enzymes of leucine biosynthesis in endosymbionts (Buchnera) of aphids. J. Mol. Evol. 48:77–85.
Baumann, P., L. Baumann, C.-Y. Lai, and D. Rouhbakhsh. 1995. Genetics, physiology, and evolutionary relationships of the genus Buchnera: intracellular symbionts of aphids. Annu. Rev. Microbiol. 49:55–94.
Bergthorsson, U., and H. Ochman. 1998. Distribution of chromosome length variation in natural isolates of Escherichia coli. Mol. Biol. Evol. 15:6–16.
Blumenstiel, J. P., D. L. Hartl, and E. R. Lozovsky. 2002. Patterns of insertion and deletion in contrasting chromatin domains. Mol. Biol. Evol. 19:2211–2225.
Bracho, A. M., D. Martinez-Torres, A. Moya, and A. Latorre. 1995. Discovery and molecular characterization of a plasmid localized in Buchnera sp. bacterial endosymbiont of the aphid Rhopalosiphum padi. J. Mol. Evol. 41:67–73.
Brynnel, E. U., C. G. Kurland, N. Moran, and S. G. E. Andersson. 1998. Evolutionary rates for tuf genes in endosymbionts of aphids. Mol. Biol. Evol. 15:574–582.
Chambaud, I., R. Heilig, S. Ferris et al. (12 co-authors). 2001. The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res. 29:2145–2153.
Clark, M. A., L. Baumann, M. L. Thao, N. A. Moran, and P. Baumann. 2001. Degenerative minimalism in the genome of a psyllid endosymbiont. J. Bacteriol. 183:1853–1861.
Clark, M. A., N. A. Moran, and P. Baumann. 1999. Sequence evolution in bacterial endosymbionts having extreme base compositions. Mol. Biol. Evol. 16:1586–1598.
Frank, A. C., H. Amiri, and S. G. E. Andersson. 2002. Genome deterioration: loss of repeated sequences and accumulation of junk DNA. Genetica 115:1–12.
Fraser, C. M., J. D. Gocayne, O. White et al. (29 co-authors). 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397–403.
Gil, R., B. Sabater-Munoz, A. Latorre, F. J. Silva, and A. Moya. 2002. Extreme genome reduction in Buchnera spp.: toward the minimal genome needed for symbiotic life. Proc. Natl. Acad. Sci. USA 99:4454–4458.
Gil, R., F. J. Silva, E. Zientz et al. (13 co-authors). 2003. The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proc. Natl. Acad. Sci. USA 100:9388–9393.
Gregory, T. R. 2004. Insertion-deletion biases and the evolution of genome size. Gene 324:15–34.
Gurvich, O. L., P. V. Baranov, J. Zhou, A. W. Hammer, R. F. Gesteland, and J. F. Atkins. 2003. Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli. EMBO J. 22:5941–5950.
Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B. C. Li, and R. Herrmann. 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24:4420–4449.
Kalman, S., W. Mitchell, R. Marathe, C. Lammel, J. Fan, R. W. Hyman, L. Olinger, J. Grimwood, R. W. Davis, and R. S. Stephens. 1999. Comparative genomes of Chlamydia pneumoniae and C. trachomatis. Nat. Genet. 21:385–389.
Komaki, K., and H. Ishikawa. 2000. Genomic copy number of intracellular bacterial symbionts of aphids varies in response to developmental stage and morph of their host. Insect. Biochem. Mol. Biol. 30:253–258.
Lai, C.-Y., L. Baumann, and P. Baumann. 1994. Amplification of trpEG: Adaptation of Buchnera aphidicola to an endosymbiotic association with aphids. Proc. Natl. Acad. Sci. USA 91:3819–3823.
Lawrence, J. G., R. W. Hendrix, and S. Casjens. 2001. Where are the pseudogenes in bacterial genomes? Trends Microbiol. 9:535–540.
Mira, A., L. Klasson, and S. G. E. Andersson. 2002. Microbial genome evolution: sources of variability. Curr. Opin. Microbiol. 5:506–551.
Mira, A., and N. A. Moran. 2002. Estimating population size and transmission bottlenecks in maternally transmitted endosymbiotic bacteria. Microb. Ecol. 44:137–143.
Mira, A., H. Ochman, and N. A. Moran. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589–596.
Moran, N. A. 1996. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873–2878.
———. 2002. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108:583–586.
Moran, N. A., and A. Mira. 2001. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2:research0054.
Moran, N. A., M. A. Munson, P. Baumann, and H. Ishikawa. 1993. A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proc. R. Soc. Lond. B Biol. Sci. 253:167–171.
Moran, N. A., and J. J. Wernegreen. 2000. Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol. Evol. 15:321–326.
Ogata, H., S. Audic, P. Renesto-Audiffren et al. (11 co-authors). 2001. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293:2093–2098.
Petrov, D. A., and D. L. Hartl. 1998. High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups. Mol. Biol. Evol. 15:293–302.
Petrov, D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, and K. L. Shaw. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060–1062.
Remaudière, G., and M. Remaudière. 1997. Catalogue des Aphididae du Monde. Homoptera Aphidoidea. Institut National de la Recherche Agronomique, Paris.
Rocha, E. P., and A. Blanchard. 2002. Genomic repeats, genome plasticity and the dynamics of Mycoplasma evolution. Nucleic Acids Res. 30:2031–2042.
Rocha, E. P. C. 2003. An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction. Genome Res. 13:1123–1132.
Rouhbakhsh, D., C. Lai, C. D. von Dohlen, M. A. Clark, L. Baumann, P. Baumann, N. A. Moran, and D. J. Voegtlin. 1996. The tryptophan biosynthetic pathway of aphid endosymbionts (Buchnera): genetics and evolution of plasmid-associated anthranilate synthase (trpEG) within the Aphididae. J. Mol. Evol. 42:414–421.
Sabater-Muñoz, B., L. Gomez-Valero, R. C. van Ham, F. J. Silva, and A. Latorre. 2002. Molecular characterization of the leucine cluster in Buchnera sp. strain PSY, a primary endosymbiont of the aphid Pemphigus spyrothecae. Appl. Environ. Microbiol. 68:2572–2575.
Sabater-Muñoz, B., R. C. H. J. van Ham, A. Moya, F. J. Silva, and A. Latorre. 2004. Evolution of the leucine gene cluster in Buchnera aphidicola. Insights from chromosomal versions. J. Bacteriol. 186:2646–2654.
Selosse, M., B. Albert, and B. Godelle. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol. Evol. 16:135–141.
Shigenobu, S., H. Watanabe, M. Hattori, Y. Sakaki, and H. Ishikawa. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86.
Silva, F. J., A. Latorre, and A. Moya. 2001. Genome size reduction through multiple events of gene disintegration in Buchnera APS. Trends Genet. 17:615–618.
———. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends Genet. 19:176–180.
Silva, F. J., R. C. H. J. van Ham, B. Sabater, and A. Latorre. 1998. Structure and evolution of the leucine plasmids carried by the endosymbiont (Buchnera aphidicola) from aphids of the family Aphididae. FEMS Microbiol. Lett. 168:43–49.
Stephens, R. S., S. Kalman, C. Lammel et al. (12 co-authors). 1998. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282:754–759.
Tamas, I., L. Klasson, B. Canback, A. K. Naslund, A. S. Eriksson, J. J. Wernegreen, J. P. Sandstrom, N. A. Moran, and S. G. E. Andersson. 2002. 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.
Tatusov, R. L., E. V. Koonin, and D. J. Lipman. 1997. A genomic perspective on protein families. Science 278:631–637.
van Ham, R. C. H. J., F. Gonzalez-Candelas, F. J. Silva, B. Sabater, A. Moya, and A. Latorre. 2000. Postsymbiotic plasmid acquisition and evolution of the repA1-replicon in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 97:10855–10860.
van Ham, R. C. H. J., J. Kamerbeek, C. Palacios et al. (16 co-authors). 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–586.
van Ham, R. C. H. J., D. Martinez-Torres, A. Moya, and A. Latorre. 1999. Plasmid-encoded anthranilate synthase (TrpEG) in Buchnera aphidicola from aphids of the family Pemphigidae. Appl. Environ. Microbiol. 65:117–125.
van Ham, R. C. H. J., A. Moya, and A. Latorre. 1997. Putative evolutionary origin of plasmids carrying the genes involved in leucine biosynthesis in Buchnera aphidicola (endosymbiont of aphids). J. Bacteriol. 179:4768–4777.
von Dohlen, C. D., and N. A. Moran. 2000. Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation. Biol. J. Linn. Soc. 71:689–717.
Wernegreen, J. J., H. Ochman, I. B. Jones, and N. A. Moran. 2000. Decoupling of genome size and sequence divergence in a symbiotic bacterium. J. Bacteriol. 182:3867–3869.
Wickham, S. A., and D. H. Lynn. 1990. Relations between growth-rate, cell-size, and DNA content in colpodean ciliates (Ciliophora, Colpodea). Eur. J. Protistol. 25:345–352.
(Laura Gómez-Valero, Ampar)