Nonneutral Evolution of the Transcribed Pseudogene Makorin1-p1 in Mice
     Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor

    E-mail: jianzhi@umich.edu.

    Abstract

    Pseudogenes are nonfunctional relics of formerly functional genes and are thought to evolve neutrally. In some pseudogenes, however, the molecular evolutionary patterns are atypical of neutrally evolving sequences, exhibiting sequence conservation, codon-usage bias, and other features associated with functional genes. Makorin1-p1 is a transcribed pseudogene first identified in the mouse Mus musculus. The transcript of Makorin1-p1 can regulate the stability of the transcript of its paralogous functional gene Makorin1. Specifically, the half-life of Makorin1 mRNA increases significantly in the presence of Makorin1-p1 transcript, and targeted deletion of Makorin1-p1 is lethal in mice. Here, we show that Makorin1-p1 originated after the separation of Mus and Rattus but before the divergence of M. musculus and M. pahari. The transcribed region of Makorin1-p1 exhibits rates of point and indel substitutions that are two to four times lower than those in the untranscribed region, suggesting that the transcribed region is under functional constraint and is not evolving neutrally. Although the transcript of Makorin1-p1 likely functions by its sequence similarity to Makorin1, we find no evidence of gene conversion between them, indicating that functional conservation alone is sufficient to maintain their coordinated evolution. A duplication-degeneration model is proposed to explain how Makorin1-p1 was co-opted into the regulatory system of Makorin1. There are over 10,000 pseudogenes in a typical mammalian genome, and it is plausible that many functional but untranslatable pseudogenes exist. Our results illustrate the potential of using evolutionary analysis to identify such pseudogenes from genome sequences.

    Key Words: substitution rate, neutrality, pseudogene, Makorin, rodents, mice

    Introduction

    Pseudogenes are defined as DNA sequences of formerly functional genes rendered nonfunctional by severe mutations. Operationally, pseudogenes are usually identified by their disrupted open reading frames (ORFs), which are homologous to functional genes. Because of their presumed nonfunctionality, pseudogenes have been regarded as a paradigm of neutral evolution (Li, Gojobori, and Nei 1981). Several examples, however, challenge this traditional perspective. Pseudogenes such as Drosophila yakuba Adh-ψ (Jeffs and Ashburner 1991; Jeffs, Holmes, and Ashburner 1994) and D. melanogaster Lcp-ψ (Pritchard and Schaeffer 1997), α-esterase gene cluster (Robin et al. 2000), and Pglym87-ψ (Currie and Sullivan 1994) exhibit features of molecular evolution atypical of neutrally evolving sequences. Although seemingly deleterious mutations render these pseudogenes untranslatable, characteristics such as retention of codon-usage bias, excess of synonymous substitution rate over nonsynonymous rate, and rates of evolution only slightly higher than those of functional genes have been observed (Balakirev and Ayala 2003).

    Direct evidence for biological function of operationally defined pseudogenes is available in only a few cases. Nitric oxide synthase (NOS) is involved in intracellular signaling in the nervous system and is coexpressed with a NOS-related pseudogene in the snail Lymnaea stagnalis (Korneev, Park, and O'Shea 1999). It was experimentally shown that the pseudogene transcript plays an important role in the regulation of the NOS protein synthesis by forming a stable RNA-RNA duplex with the transcript of the functional NOS gene. The formation of the duplex significantly suppresses the translation of NOS mRNA and is possible because of antisense identity between the transcripts of the functional gene and pseudogene. In another dramatic example, targeted deletion of the pseudogene Makorin1-p1 in mice led to 80% mortality within 2 days of birth (Hirotsune et al. 2003). Affected individuals suffered from severe bone deformity and failure to thrive (Hirotsune et al. 2003). Although Makorin1-p1 (on chromosome 5) showed typical features of a truncated pseudogene derived from the functional gene Makorin1 (on chromosome 6), it was robustly transcribed. Sequence analysis of the approximately 700-bp pseudogene transcript revealed high similarity to the 5' end of Makorin1 mRNA. Hirotsune et al. (2003) experimentally showed that the Makorin1-p1 transcript regulates the stability of Makorin1 mRNA in trans. In the presence of Makorin1-p1 transcript, the half-life of Makorin1 mRNA significantly increased. Although the exact molecular mechanism involved in this expression regulation is not clear, Hirotsune et al. (2003) proposed a plausible RNA-mediated model in which transcripts of Makorin1-p1 and Makorin1 directly compete for a destabilizing factor (e.g., RNase) that binds to the highly similar 5' sequence found in both transcripts (see also Lee [2003]). The pseudogene transcript effectively titrates out the destabilizing factor and protects the coding mRNA from decay.

    This gene regulation model, as well as the high sequence similarity (95.4%) between the Makorin1-p1 transcript and the 5' portion of Makorin1 mRNA, suggests that Makorin1-p1 and Makorin1 evolve in a coordinated fashion. Is it because of concerted evolution? Are there functional constraints on the transcribed and untranscribed portions of Makorin1-p1? To address these questions, we sequenced Makorin1-p1 in five additional Mus species. Our results show functional constraints on transcribed regions of Makorin1-p1, yet no evidence for concerted evolution.

    Materials and Methods

    PCR and Sequencing

    Hirotsune et al. (2003) discovered a transcript (700 bp) of the pseudogene Makorin1-p1 (GenBank accession number AF494488) capable of regulating the expression of its functional paralog Makorin1. We sequenced a continuous genomic region that spans an additional 120 nucleotides 5' and 776 nucleotides 3' of the reported Makorin1-p1 transcript from five rodents: Mus macedonicus, M. caroli, M. cervicolor, M. spretus, and M. spicilegus. The high sequence divergence in the nontranscribed regions prevented us from amplifying additional flanking sequences. We also sequenced portions of Makorin1 that are homologous to the transcribed region of Makorin1-p1 from M. spretus and M. caroli. The primers used for PCR amplifications of Makorin1-p1 were based on M. musculus Makorin1-p1 (NT_039308 contig sequence), and the primer sequences are 5' of the first half: AGCAGCCACGTTTATGTCCATC; 3' of the first half: ACCACCTCCATGCAGATCCC and CTGCTTCTCCTCTTTCTCCTCCA; 5' of the second half: GCTGCCCAGAGGTCACAGC and CTTACTGCCCCTTCCTGCACT; and 3' of the second half: GGGAAGGTGAGGGTAGAGGAAC. PCR primers used in the amplification of Makorin1 were designed according to M. musculus Makorin1 (AF192785), and the primer sequences are 5' exon1: CCATTGCTGTGTGGGATAAACAGTAAT, 3' exon1: GCAACGTCCTACCTGCAGGTGA; 5' exon2: GATTTCTGGTTCTTTGTCTTACAGATA, 3' exon2: AAGAGTTTATAAACAAATCCTTACCTGC; 5' exon3: TAACTGCTTACTTTTCTCTGTCAGATACG, 3' exon3: GTTTACAGGGTCCTCACAATACTTACTAC; 5' exon4: CCTTTACTACCATCTTAAACATTTTCAGC, 3' exon4: CACAAGCTGCTCTGAGCACTTACTT; 5' exon5: CTCTTGTCTCTTCCCTTCTTCCAGTC, 3' exon5: CGAAGATTGGGGAGGAGTCTCACT; and 5' exon6: AGCCTGTTCATTGTTTCCCAGGTC, 3' exon6: AGCCCGGCTGCTCATACCTCA. PCR products were purified, cloned into the pCR4TOPO vector (Invitrogen), and sequenced in both directions by an automated DNA sequencer.

    Sequence Analysis

    The nucleotide sequences of Makorin1-p1 from Mus musculus and the above five Mus species were aligned using ClustalX (Thompson et al. 1997) followed by manual adjustment. A well-supported phylogenetic tree of the six taxa used in our analysis was recently obtained using nuclear and mitochondrial DNA sequences (Lundrigan, Jansa, and Tucker 2002). We used this tree topology in our analysis of substitution rates. Insertion and deletion substitutions were manually counted using the parsimony principle based on the species tree. Nucleotide substitution rates were estimated using MEGA version 2.0 (Kumar et al. 2001). Likelihood ratio tests were used to examine nucleotide substitution rate variation among regions of the sequenced Makorin1-p1 pseudogene. Because likelihood ratio tests may give different results under different assumptions (Zhang 1999), we implemented JC69 (Jukes and Cantor 1969), HKY85 (Hasegawa, Kishino, and Yano 1985), TN93 (Tamura and Nei 1993), and general reversible (GTR [Yang 1997]) nucleotide substitution models. JC69 is the simplest of these four models, and it assumes equal probabilities for all types of nucleotide substitutions. The more complex HKY85 model allows for unequal base frequencies and incorporates different substitution rates for transitions and transversions. TN93 generalizes the HKY85 model to allow for different rates for transitions between purines and between pyrimidines. The GTR model is the most complex and includes three parameters describing nucleotide frequencies and five parameters describing relative substitution rates among nucleotides. We also used a gamma distribution that describes rate variation among sites in GTR. PAML (Yang 1997) was used for the likelihood computation. To test whether there is concerted evolution between Makorin1-p1 and Makorin1, we made a gene tree using the neighbor-joining method (Saitou and Nei 1987) and examined sister relationships among genes. Sequences used in this phylogenetic reconstruction included Makorin1-p1 sequences of all six Mus species mentioned above, Makorin1 sequences from M. musculus, M. caroli, and M. spretus, and representatives of the Makorin gene family from mouse, rat, and human. The number of synonymous nucleotide substitutions per synonymous site (dS) was computed by the modified Nei-Gojobori method (Zhang, Rosenberg, and Nei 1998).

    Results

    Makorin1-p1 has previously been identified only in Mus musculus (Hirotsune et al. 2003). To determine its time of origin, we searched the rat and human genomes but did not find Makorin1-p1. This suggests either that Makorin1-p1 emerged in mice after they diverged from rats or that the gene deteriorated enough that it could not be recovered from the rat and human genomes. To investigate the evolutionary rate and pattern of Makorin1-p1, we sequenced its genomic region that is known to be transcribed (700 bp). In addition, we sequenced an adjacent region upstream of the transcript because the upstream sequence may contain necessary cis regulatory elements for transcription. For comparison, we also sequenced a downstream region of the transcript. Because of high sequence divergence, only 120 and 776 nucleotides were obtained for the two flanking regions, respectively. From 5' to 3', the three regions are referred to as regions A (120 bp), B (700 bp), and C (776 bp) (fig. 1). We were able to obtain these sequences from M. macedonicus (AY699801), M. caroli (AY699804), M. cervicolor (AY699803), M. spretus (AY699805), and M. spicilegus (AY699802). Amplification of Makorin1-p1 from a more distantly related species, M. pahari (AY699800), was only partially successful, as only region A+B of the pseudogene could be recovered. The full-length sequence from M. musculus was downloaded from GenBank. In the following analyses, Mus pahari is excluded, and only the six full-length Makorin1-p1 sequences of M. macedonicus, M. caroli, M. cervicolor, M. spretus, M. spicilegus, and M. musculus are considered.

    FIG. 1.— Nucleotide sequence alignment of Makorin1-p1 from six Mus species. Dots represent identical nucleotides to the Mus musculus sequence, and dashes indicate alignment gaps. The dotted line marks region A, solid line region B (transcribed portion of Makorin1-p1), and dashed line region C. For comparison, the Makorin1 sequence of M. musculus that is homologous to region B of Makorin1-p1 is also presented. The first 20 and last 22 nucleotides in this alignment are primer-encoded sequences and are not used in subsequent analyses.

    We compared the pairwise distances among the six species for region A+B and for region C of the Makorin1-p1 sequences (fig. 2). Ten out of 15 pairwise distances are higher for region C than for region A+B, prompting us to further examine the rate heterogeneity in detail.

    FIG. 2.— Pairwise Jukes-Cantor distances of the six Mus species at the Makorin1-p1 locus. Distances were estimated with the complete-deletion option.
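
    The pairwise distances in figure 2 can be illustrated with a minimal Python sketch, assuming the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The function names and the toy alignment used in the usage note are hypothetical and are not taken from the Makorin1-p1 data; complete deletion is implemented here as dropping every alignment column that contains a gap in any sequence.

```python
import math
from typing import List, Sequence


def complete_deletion(alignment: Sequence[str]) -> List[str]:
    """Drop every column that has a gap ('-') in at least one sequence."""
    kept_columns = [col for col in zip(*alignment) if "-" not in col]
    # Rebuild one string per sequence from the retained columns.
    return ["".join(col[i] for col in kept_columns) for i in range(len(alignment))]


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites at which two gap-free, equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (1969) distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

    For example, `jc69_distance(p_distance(a, b))` applied to two rows returned by `complete_deletion` gives the corrected distance for that pair; the correction is small when p is small, as it is among closely related Mus species.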

    Applying the likelihood ratio test, we examined whether the nucleotide substitution rate varies among the three regions described above. We assumed a particular model of nucleotide substitution for all the regions but allowed the substitution rate to vary among regions. The known species tree (fig. 3) for the six Mus species was used. To test several hypotheses of rate heterogeneity, we first implemented the JC69 model and treated the whole Makorin1-p1 sequence as a single region (A+B+C together). The logarithm of the likelihood (lnL) calculated for this hypothesis was –3023.95 (table 1). Next, we treated regions A and B separately from region C (A+B and C), under the assumption that the same JC69 model applies to the two regions and the relative branch lengths remain constant for the two regions. The lnL for this hypothesis was –3013.01, significantly higher than the previous one (P < 0.00001). The relative substitution rate for region C was 2.01 times that of region A+B. This suggests that region C evolves significantly more rapidly than region A+B. Last, we treated all three regions A, B, and C separately (A, B, C), obtaining lnL = –3012.53. The relative substitution rates in regions A, B, and C were 1, 0.72, and 1.59, respectively. This result suggests that the transcribed region (B) evolves the slowest, and region C evolves 1.5 to 2 times faster than regions A and B. However, the likelihood ratio test did not show this hypothesis to be significantly better than our second hypothesis, implying that regions A and B evolve at a similar pace. Furthermore, we applied HKY85, TN93, and GTR nucleotide substitution models to our data set following a similar strategy to see whether our results are robust. The outcome of these analyses is summarized in table 1. We found nucleotide substitution rates, regardless of the substitution models assumed, to be no different between regions A and B. However, when compared with region C, regions A and B evolve significantly more slowly. This result remained unchanged even when a gamma distribution that describes among-site rate heterogeneity was used (table 1).
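
    The arithmetic behind these comparisons can be sketched in a few lines of Python. This is an illustration only, not the PAML analysis itself: it takes the three JC69 lnL values quoted above and assumes that each nested comparison adds exactly one free rate parameter, so that twice the lnL difference is compared with a chi-square distribution with 1 degree of freedom. The function and constant names are hypothetical.

```python
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def likelihood_ratio_test(lnl_null: float, lnl_alt: float):
    """Return (2 * delta lnL, P value), assuming 1 extra parameter in the alternative."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < 0:
        raise ValueError("a nested alternative cannot have a lower lnL than the null")
    return stat, chi2_sf_df1(stat)


# JC69 log-likelihoods quoted in the text (table 1).
LNL_ONE_RATE = -3023.95      # A+B+C treated as a single region
LNL_TWO_RATES = -3013.01     # A+B versus C
LNL_THREE_RATES = -3012.53   # A, B, and C separately

if __name__ == "__main__":
    for label, null, alt in [
        ("one rate vs. two rates", LNL_ONE_RATE, LNL_TWO_RATES),
        ("two rates vs. three rates", LNL_TWO_RATES, LNL_THREE_RATES),
    ]:
        stat, p = likelihood_ratio_test(null, alt)
        print(f"{label}: 2*dlnL = {stat:.2f}, P = {p:.2g}")
```

    Under that one-degree-of-freedom assumption, the first comparison falls well below the quoted threshold of P < 0.00001, whereas the second does not approach significance, in line with the conclusion that regions A and B evolve at a similar pace.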

    FIG. 3.— Numbers of indels in the evolution of Makorin1-p1 in six Mus species. On each tree branch is the parsimony-inferred number of indels for region A+B, followed by that for region C. The phylogeny follows Lundrigan, Jansa, and Tucker (2002).

    Table 1 Likelihood Ratio Tests of Equal Substitution Rates Among Different Regions of Makorin1-p1

    To compare the Makorin1-p1 substitution rates with the neutral substitution rate in the Mus genome, we computed the synonymous nucleotide substitution rate (dS) from 10 orthologous gene pairs of M. caroli and M. musculus using the modified Nei-Gojobori method (Zhang, Rosenberg, and Nei 1998). Because of the limited sequence data of M. caroli in GenBank, these are the only genes in GenBank that are suitable for this analysis. The 10 dS values vary from 0.025 to 0.126 (table 2), with an average of 0.049. The nucleotide distance between the same species pair for Makorin1-p1 was 0.032 ± 0.006 and 0.107 ± 0.012 for the A+B and C regions, respectively. Although the nucleotide substitution rate in region C is significantly higher than the average synonymous substitution rate of the 10 genes concatenated (P < 0.01), there is one gene (Rnase6) that shows higher dS than the substitution rate of region C. If the 10 genes can be considered as representatives of the entire genome, the substitution rate in region C is at the 10% upper tail of the distribution from all genes in the genome. In other words, Makorin1-p1 is located in a region of relatively high mutation rate. Bustamante, Nielsen, and Hartl (2002) reported effects of GC-content differences between parental functional gene and new pseudogene loci on pseudogene substitution rates. In our case, however, Makorin1 and Makorin1-p1 loci have nearly identical GC content. It is also unlikely that region C is under positive selection, as it is not transcribed and has no known function.

    Table 2 Numbers of Synonymous Substitutions per Synonymous Site (dS) Between Orthologous Genes of M. caroli and M. musculus Calculated from Available Coding Sequences in GenBank

    The nucleotide sequence alignment of Makorin1-p1 (fig. 1) shows the presence of 22 insertions/deletions (indels). To test whether indels are randomly distributed between region A+B and region C, we mapped the indels onto the species tree (fig. 3). We observed five and 17 indels for regions A+B and C, respectively. This translates into 0.006 and 0.025 indel substitutions per site for the two regions, respectively. Following Podlaha and Zhang (2003), we compared the two rates by a chi-square test and found them to be significantly different (χ² = 8.983, P < 0.003). On the other hand, we found no significant difference in indel substitution rate between regions A and B (P > 0.4). This finding, together with the result of a significantly lower nucleotide substitution rate in region A+B than in region C, strongly suggests the presence of selective constraints on region A+B, implying nonneutral evolution of Makorin1-p1.
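
    The logic of the indel comparison can be sketched as a 2 × 2 chi-square test of indel versus non-indel sites in the two regions. The sketch below is a simplified stand-in, not the procedure of Podlaha and Zhang (2003): it uses the nominal region lengths (820 bp for A+B and 776 bp for C) as the numbers of sites, which is an assumption because the exact site counts used in the test are not given here, and it applies a plain Pearson statistic without continuity correction. It is therefore not expected to reproduce the reported statistic exactly. All names are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / denom


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Indel counts from the text; site counts are ASSUMED to equal nominal region lengths.
INDELS_AB, SITES_AB = 5, 820
INDELS_C, SITES_C = 17, 776

stat = chi2_2x2(INDELS_AB, SITES_AB - INDELS_AB, INDELS_C, SITES_C - INDELS_C)
p_value = chi2_sf_df1(stat)

if __name__ == "__main__":
    print(f"chi-square = {stat:.3f}, P = {p_value:.4f}")
```

    Even under these rough assumptions the excess of indels in region C remains significant at the 1% level, which is the qualitative point of the comparison.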

    Although the exact molecular mechanism by which the Makorin1-p1 transcript regulates the stability of Makorin1 mRNA is unknown, the suggested model (Hirotsune et al. 2003) implies that high sequence similarity between them is necessary for the functionality of the pseudogene. That is, Makorin1-p1 must evolve in concordance with Makorin1. To examine whether gene conversion and concerted evolution are responsible for this coordinated evolution, we reconstructed a gene phylogeny using Makorin1-p1 sequences from all six species and the Makorin1 sequences from M. musculus, M. caroli, and M. spretus (fig. 4). Also included in the tree were other members of the Makorin family from human, mouse, and rat. Only regions homologous to Makorin1-p1 region B were used for tree-making. Concerted evolution would generate phylogenetic clustering of the functional gene and pseudogene from the same species. In other words, Makorin1-p1 and Makorin1 of M. musculus, for example, should form a monophyletic group, to the exclusion of Makorin1-p1 of other species. This, however, was not observed. Makorin1-p1 of M. musculus is more closely related to pseudogenes of other Mus species than to Makorin1 of M. musculus, with high bootstrap support. The same result was obtained when the partial sequence of M. pahari Makorin1-p1 was included in the phylogenetic analysis. We noted, however, that the clade of Makorin1 from M. caroli, M. musculus, and M. spretus does not show the same phylogenetic relationships as the clade of Makorin1-p1 for the same taxa. Makorin1 shows a sister relationship of M. spretus to M. musculus, but Makorin1-p1 exhibits a sister relationship of M. spretus to M. caroli. This discrepancy is most likely caused by the limited number of nucleotides (610) used in this phylogenetic analysis. At any rate, our result suggests that gene conversion is not responsible for the presumably coordinated evolution between Makorin1-p1 and Makorin1. Functional conservation and purifying selection may be the sole reason for the high sequence similarity between them. Figure 4 shows that the branch leading to Makorin1-p1 of a Mus species is significantly longer than that to Makorin1 of the same species when the rat Makorin is used as an outgroup (P < 0.001; Tajima's [1993] test). This suggests that although the transcribed region of Makorin1-p1 is under purifying selection, the selection is weaker than that on Makorin1.
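
    The branch-length comparison can be illustrated with a minimal Python sketch of the simplest form of Tajima's (1993) relative rate test. Given two ingroup sequences and an outgroup, it counts the sites at which only the first ingroup sequence differs (m1) and the sites at which only the second differs (m2), and compares (m1 - m2)²/(m1 + m2) with a chi-square distribution with 1 degree of freedom. The function name is hypothetical, and any sequences passed to it in the usage note are toy data, not the Makorin sequences.

```python
import math


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Return (m1, m2, chi-square, P) for Tajima's one-degree-of-freedom test.

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup);
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    Columns containing a gap in any of the three sequences are skipped.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("the three aligned sequences must have equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue
        if a != b and b == o:
            m1 += 1  # change unique to the lineage of seq1
        elif a != b and a == o:
            m2 += 1  # change unique to the lineage of seq2
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    stat = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 d.f.
    return m1, m2, stat, p_value
```

    In the setting described above, `seq1` would be the Makorin1-p1 sequence of a Mus species, `seq2` the Makorin1 sequence of the same species, and `outgroup` the rat Makorin1; a large excess of m1 over m2 corresponds to the longer pseudogene branch.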

    FIG. 4.— Phylogenetic relationships of Makorin homologs from the mouse, rat, and human, including newly obtained Makorin1 and Makorin1-p1 sequences from Mus. The phylogeny was reconstructed using the DNA sequences homologous to Makorin1-p1 region B. The neighbor-joining method (Saitou and Nei 1987) with Kimura's (1980) distances was used. Bootstrap percentages derived from 1,000 replications are shown on interior branches. The GenBank accession numbers for the sequences are Makorin1 (H. sapiens), AF192784; Makorin1 (M. musculus), AF192785; Makorin1 (R. norvegicus), XM216131; Makorin2 (H. sapiens), AF302084; Makorin2 (R. norvegicus), XM238369; Makorin2 (M. musculus), AF277171; Makorin3 (M. musculus), NM011746; Makorin3 (H. sapiens), NM005664; Makorin3 (R. norvegicus), XM218735; Makorin4 (H. sapiens), NM030757; Makorin1 (M. spretus), AY699807; Makorin1 (M. caroli), AY699806; Makorin1-p1 (M. spretus), AY699805; Makorin1-p1 (M. spicilegus), AY699802; Makorin1-p1 (M. caroli), AY699804; Makorin1-p1 (M. macedonicus), AY699801; Makorin1-p1 (M. cervicolor), AY699803; and Makorin1-p1 (M. musculus), AF494488.

    Discussion

    The tree in figure 4 shows that Makorin1-p1 is a duplicate of Makorin1. Because of our limited taxon sampling, it is difficult to infer an accurate time of duplication. Our success in obtaining part of Makorin1-p1 in M. pahari suggests that the duplication event must have occurred before the divergence of M. musculus and M. pahari 8 to 10 MYA (Singh, Barbour, and Berger 1998). Our finding of significantly lower rates of point and indel substitutions in transcribed regions than in untranscribed regions of Makorin1-p1 demonstrates selective constraints on the transcript of this pseudogene. These results are in concordance with Hirotsune et al.'s (2003) finding that the Makorin1-p1 transcript is functionally important. Phylogenetic analysis does not support the hypothesis of gene conversion as the mechanism responsible for the coordinated evolution of Makorin1-p1 and Makorin1.

    Makorin1 belongs to the Makorin family of transcription factors. Putative orthologs of Makorin1 are known in human, mouse, wallaby, chicken, fruitfly, and nematode (Gray, Azama, and Whitmore 2000). Such wide phylogenetic distribution suggests its functional importance. Hirotsune et al.'s (2003) experiments showed a novel Makorin1 regulatory mechanism involving the transcript of the pseudogene Makorin1-p1. Makorin1-p1 knockout mice suffer from severe bone deformities, and most of them die within 2 days of birth. Because Makorin1-p1 is most likely absent in rats, an interesting question is why rats do not suffer from the severe phenotypes that the knockout mice exhibit. How is Makorin1 expression regulated in rats and other species without Makorin1-p1? How and why was Makorin1-p1 co-opted into the regulatory mechanism of Makorin1? Based on experimental evidence, Hirotsune et al. (2003) proposed that Makorin1-p1 functions by titrating out a destabilizing factor that otherwise destabilizes the mRNA of Makorin1. Assuming that this hypothesis is correct, we here propose a duplication-degeneration model (fig. 5) that explains the evolutionary origin of Makorin1-p1's involvement in Makorin1's expression regulation. Before the emergence of Makorin1-p1, the production of Makorin1 mRNA was high enough to titrate out the destabilizing factor. Immediately after Makorin1-p1 was produced by duplication from Makorin1, Makorin1-p1 had the same sequence and expression level as Makorin1. The doubling of the amount of mRNA may have been unnecessary or even slightly deleterious. Thus, degenerate mutations that reduced the expression level of Makorin1 could be fixed. If this happened, Makorin1-p1 became indispensable, as its transcript would be needed to titrate out the destabilizing factor. Mutations that disrupted the ORF of Makorin1-p1 could still be fixed because the transcript could still titrate out the destabilizing factor. This would explain the conservation in both expression pattern and transcript sequence of the pseudogene. This model could be tested in the future by examining the expression levels of Makorin1 and Makorin1-p1 in mice and the expression level of Makorin1 in rats. The hypothesis predicts that the amount of Makorin1 expression in rats is higher than that in mice.

    FIG. 5.— Duplication-degeneration model of the origin of the functional pseudogene Makorin1-p1. Hirotsune et al. (2003) proposed that Makorin1-p1 regulates Makorin1 expression by titrating out a destabilizing factor that otherwise destabilizes the mRNA of Makorin1. Our evolutionary model is based on this hypothesis. Before the emergence of Makorin1-p1, the production of Makorin1 mRNA was high enough to titrate out the destabilizing factor (A). Immediately after duplication of the ancestral Makorin1 gene, both copies had the same expression level (B). The doubling of the amount of mRNA could have been unnecessary or even slightly deleterious. Thus, degenerate mutations that reduced the expression level of Makorin1 could be fixed. If this happened, Makorin1-p1 became indispensable, as its transcript would be needed to titrate out the destabilizing factor (C). Mutations disrupting the ORF of Makorin1-p1 could still be fixed because the transcript could still titrate out the destabilizing factor. This would explain the conservation in both expression pattern and transcript sequence of the pseudogene.

    The discovery that pseudogenes can regulate the expression of their functional homologs opens a new research area on gene regulation. An obvious question arising from this finding is how common this phenomenon is throughout the genome. The number of pseudogenes found in a genome varies greatly among organisms. From analyses of completely sequenced genomes, there seems to be no direct correlation between the number of pseudogenes and that of functional genes. For example, the ratio between these two numbers is 0.05 in the nematode Caenorhabditis elegans (1,100 pseudogenes), 0.035 in the yeast Saccharomyces cerevisiae (220), 0.008 in the fly Drosophila melanogaster (110) (Harrison et al. 2003), 0.45 in mouse (14,000 [Waterston et al. 2002]), and 0.66 in human (20,000 [Torrents et al. 2003]). High rates of genomic DNA deletion in Drosophila and Caenorhabditis were thought to be the reason that very few pseudogenes are detectable in these species (Petrov and Hartl 2000; Petrov 2001). However, the exact process explaining such discrepancies in relative pseudogene numbers between invertebrates and vertebrates is still unknown. Is it possible that many vertebrate "pseudogenes" are actively involved in gene regulation and contribute to genomic and organismal complexity?

    In this report, we provided evidence for evolutionary constraints on the transcribed portion of Makorin1-p1. Many examples of transcribed pseudogenes are known (e.g., Board, Coggan, and Woodcock 1992; Brandt et al. 1993; Furbass and Vanselow 1995; Bard et al. 1995), and it is possible that some of them also play physiological roles. Even retrotransposed genes, which are generally thought to become pseudogenes immediately after retrotransposition, have been shown to be able to evolve into new functional genes (Long et al. 2003). The findings of functional "pseudogenes" are relevant when pseudogenes are used for estimating rates and patterns of mutations (Li, Gojobori, and Nei 1981; Gojobori, Li, and Graur 1982; Li, Wu, and Luo 1984; Graur, Shuali, and Li 1989; Petrov and Hartl 1999). Care should be taken to ensure that such pseudogenes are indeed nonfunctional. Our approach of comparing point and indel substitution rates in transcribed and untranscribed regions of pseudogenes may be used to systematically look for functional "pseudogenes," particularly when complete genome sequences of closely related species are available, and thereby expedite the discovery of more functional elements in genomes.

    Another question arising from the study of Makorin1-p1 is whether we can still call such sequences pseudogenes. The meaning of nonfunctionality in the pseudogene definition is quite ambiguous. From studies on pseudogene evolution mentioned above, functionality can be described on two levels: operational and biological. At the operational level, one can examine whether a particular gene contains frame-shifting indels, premature stop codons, and disrupted splice recognition sites. At the biological level, functionality can be described as the physiological role and fitness effect of the presence of the pseudogene. Of these two, biological function is much more difficult to examine. Decisions on whether to call a particular DNA sequence a gene or pseudogene can be made only after careful scrutiny and experimental investigations of its function. To strictly adhere to the pseudogene definition, only gene relics that lack biological function should be called pseudogenes. Once a biological function of a gene relic is found, depending on whether the functionality involves the DNA sequence or its transcript, such a relic should be called either a regulatory element or a gene, respectively. In the present case, Makorin1-p1 is probably better called an RNA gene, rather than a pseudogene.

    Supplementary Material

    Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession numbers AY699800 to AY699807.

    Acknowledgements

    We thank David Webb for providing helpful comments on an earlier draft. This work was supported by a start-up fund of the University of Michigan and NIH grant GM67030 to J.Z.

    References

    Balakirev, E. S., and F. J. Ayala. 2003. Pseudogenes: are they "junk" or functional DNA? Annu. Rev. Genet. 37:123–151.

    Bard, J. A., S. P. Nawoschik, B. F. O'Dowd, S. R. George, T. A. Branchek, and R. L. Weinshank. 1995. The human serotonin 5-hydroxytryptamine1D receptor pseudogene is transcribed. Gene 153:295–296.

    Board, P. G., M. Coggan, and D. M. Woodcock. 1992. The human Pi class glutathione transferase sequence at 12q13-q14 is a reverse-transcribed pseudogene. Genomics 14:470–473.

    Brandt, P., M. Unseld, U. Eckert-Ossenkopp, and A. Brennicke. 1993. An rps14 pseudogene is transcribed and edited in Arabidopsis mitochondria. Curr. Genet. 24:330–336.

    Bustamante, C. D., R. Nielsen, and D. L. Hartl. 2002. A maximum likelihood method for analyzing pseudogene evolution: implications for silent site evolution in humans and rodents. Mol. Biol. Evol. 19:110–117.

    Currie, P. D., and D. T. Sullivan. 1994. Structure, expression and duplication of genes which encode phosphoglyceromutase of Drosophila melanogaster. Genetics 138:353–363.

    Furbass, R., and J. Vanselow. 1995. An aromatase pseudogene is transcribed in the bovine placenta. Gene 154:287–291.

    Gojobori, T., W. H. Li, and D. Graur. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18:360–369.

    Graur, D., Y. Shuali, and W. H. Li. 1989. Deletions in processed pseudogenes accumulate faster in murids than in humans. J. Mol. Evol. 28:279–285.

    Gray, T. A., K. Azama, K. Whitmore, et al. 2001. Phylogenetic conservation of the Makorin-2 gene, encoding a multiple zinc-finger protein, antisense to the RAF1 proto-oncogene. Genomics 77:119–126.

    Harrison, P. M., D. Milburn, Z. Zhang, P. Bertone, and M. Gerstein. 2003. Identification of pseudogenes in the Drosophila melanogaster genome. Nucleic Acids Res. 31:1033–1037.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Hirotsune, S., N. Yoshida, A. Chen, L. Garret, F. Sugiyama, S. Takahashi, K. Yagami, A. Wynshaw-Boris, and A. Yoshiki. 2003. An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene. Nature 423:91–100.

    Jeffs, P. S., and M. Ashburner. 1991. Processed pseudogenes in Drosophila. Proc. R. Soc. Lond. B Biol. Sci. 244:151–159.

    Jeffs, P. S., E. C. Holmes, and M. Ashburner. 1994. The molecular evolution of the alcohol dehydrogenase and alcohol dehydrogenase-related genes in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 11:287–304.

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

    Korneev, S. A., J. Park, and M. O'Shea. 1999. Neuronal expression of neural nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA transcribed from an NOS pseudogene. J. Neurosci. 19:7711–7720.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.

    Lee, J. T. 2003. Complicity of gene and pseudogene. Nature 423:26–28.

    Li, W. H., T. Gojobori, and M. Nei. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239.

    Li, W. H., C. I. Wu, and C. C. Luo. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21:58–71.

    Long, M., E. Betran, K. Thornton, and W. Wang. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4:865–875.

    Lundrigan, B. L., S. A. Jansa, and P. K. Tucker. 2002. Phylogenetic relationships in the genus Mus, based on paternally, maternally and biparentally inherited characters. Syst. Biol. 51:410–431.

    Petrov, D. A. 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17:23–28.

    Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475–1479.

    ———. 2000. Pseudogene evolution and natural selection for a compact genome. J. Hered. 91:221–227.

    Podlaha, O., and J. Zhang. 2003. Positive selection on protein-length in the evolution of a primate sperm ion channel. Proc. Natl. Acad. Sci. USA 100:12241–12246.

    Pritchard, J. K., and S. W. Schaeffer. 1997. Polymorphism and divergence at a Drosophila pseudogene locus. Genetics 147:199–208.

    Robin, G. C. Q., R. J. Russell, D. J. Cutler, and J. G. Oakeshott. 2000. The evolution of an α-esterase pseudogene inactivated in the Drosophila melanogaster lineage. Mol. Biol. Evol. 17:563–575.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Singh, N., K. W. Barbour, and F. G. Berger. 1998. Evolution of transcription regulatory elements within the promoter of a mammalian gene. Mol. Biol. Evol. 15:312–325.

    Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.

    Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Torrents, D., M. Suyama, E. Zdobnov, and P. Bork. 2003. A genome-wide survey of human pseudogenes. Genome Res. 13:2559–2567.

    Waterston, R. H., K. Lindblad-Toh, E. Birney, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556.

    Zhang, J. 1999. Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol. Biol. Evol. 16:868–875.

    Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708–3713.

    (Ondrej Podlaha and Jianzh)