Molecular Biology and Evolution (《分子生物学进展》), 2005, No. 1
A Globin Gene of Ancient Evolutionary Origin in Lower Vertebrates: Evidence for Two Distinct Globin Families in Animals

Anja Roesner
    Institute of Zoology, Johannes Gutenberg University, Mainz, Germany; and Institute of Molecular Genetics, Johannes Gutenberg University, Mainz, Germany

    Correspondence: E-mail: burmeste@uni-mainz.de.

    Abstract

    Hemoglobin, myoglobin, neuroglobin, and cytoglobin are four types of vertebrate globins with distinct tissue distributions and functions. Here, we report the identification of a fifth and novel globin gene from fish and amphibians, which has apparently been lost in the evolution of higher vertebrates (Amniota). Because its function is presently unknown, we tentatively call it globin X (GbX). Globin X sequences were obtained from three fish species, the zebrafish Danio rerio, the goldfish Carassius auratus, and the pufferfish Tetraodon nigroviridis, and from the clawed frog Silurana tropicalis. Globin X sequences are distinct from vertebrate hemoglobins, myoglobins, neuroglobins, and cytoglobins. Globin X displays the highest identity scores with neuroglobin (26% to 35%), although it is not a neuronal protein, as revealed by RT-PCR experiments on goldfish RNA from various tissues. The distal ligand-binding and the proximal heme-binding histidines (E7 and F8), as well as the conserved phenylalanine CD1, are present in the globin X sequences, but because of extensions at the N-terminus and C-terminus, the globin X proteins are longer than the typical eight α-helical globins and comprise about 200 amino acids. In addition to the conserved globin introns at helix positions B12.2 and G7.0, the globin X genes contain two introns in E10.2 and H10.0. The intron in E10.2 is shifted by 1 bp with respect to the vertebrate neuroglobin gene (E11.0), providing possible evidence for an intron sliding event. Phylogenetic analyses confirm an ancient evolutionary relationship of globin X with neuroglobin and suggest the existence of two distinct globin types in the last common ancestor of Protostomia and Deuterostomia.

    Key Words: globin • hemoglobin • neuroglobin • gene duplication • intron sliding

    Introduction

    Globins are small heme-proteins with a characteristic three-on-three α-helical sandwich structure. They have the ability to bind oxygen and other gaseous ligands between the iron ion of the porphyrin ring and, typically, a histidine of the polypeptide chain (Dickerson and Geis 1983). Globin-type proteins are widespread and occur in archaea, bacteria, plants, fungi, and animals and exhibit an enormous structural and functional diversity (Hardison 1996; Weber and Vinogradov 2001; Freitas et al. 2004). Although globins are famous for their capability to transport and store oxygen (Dickerson and Geis 1983; Wittenberg and Wittenberg 2003), thus sustaining oxidative metabolism in the cell, other functions have been discovered in recent years (Gardner et al. 1998; Sowa et al. 1998; Minning et al. 1999; Flögel et al. 2001).

    In the vertebrates, four types of globin have been identified so far. Here, we refer to all globins collectively as the "globin superfamily." For functionally derived classifications we use the term "type"; the term "family" refers to the distinct globin branches defined here by phylogenetic studies. Hemoglobin (Hb) consists of four globin chains, is located in erythrocytes, and serves for the transport of oxygen from the respiratory surfaces to the inner organs (Dickerson and Geis 1983). Hb is present in all vertebrates investigated so far (Dickerson and Geis 1983), with the exception of some icefish species (Ruud 1954; Sidell et al. 1997). Myoglobin (Mb), mainly localized in the striated and cardiac muscle, is a monomer that acts as a temporary oxygen store and facilitates intracellular oxygen diffusion (Wittenberg 1992; Wittenberg and Wittenberg 2003) but also detoxifies NO (Flögel et al. 2001). Neuroglobin (Ngb) and cytoglobin (Cygb) are two recently discovered vertebrate globin types with poorly defined functions (Burmester et al. 2000, 2002; Kawada et al. 2001; Trent and Hargrove 2002; Pesce et al. 2002; Burmester and Hankeln 2004). Here, we present the identification and molecular analyses of a fifth vertebrate globin type, adding further diversity to the globin superfamily.

    Materials and Methods

    Animals

    All animals were obtained from a local pet shop. Zebrafishes (Danio rerio: Actinopterygii; Ostariophysi; Cypriniformes) and spotted green pufferfishes (Tetraodon nigroviridis: Actinopterygii; Acanthopterygii; Tetraodontiformes) were kept at 28°C in a freshwater aquarium; goldfishes (Carassius auratus: Actinopterygii; Ostariophysi; Cypriniformes) were kept at 16°C. The western clawed frog Silurana (Xenopus) tropicalis (Amphibia; Anura) was dissected immediately after purchase. Animals were killed by decapitation, the tissues were removed and kept frozen at –80°C or immediately used for the experiments.

    Cloning and Sequencing of Globin X cDNAs and Gene

    RNA was extracted from tissues of D. rerio, T. nigroviridis, and C. auratus according to the guanidine hydrochloride method (Chirgwin et al. 1979) and from S. tropicalis by the Qiagen RNeasy kit. Specific oligonucleotide primers were designed according to the aligned sequences and used to amplify the corresponding cDNAs by reverse-transcription PCR experiments applying the OneStep kit according to the manufacturer's instructions (Qiagen) or employing the SuperScript™ reverse transcriptase (Invitrogen). The primer sequences are available from the authors upon request. The PCR products were cloned into the pCR4-TOPO (Invitrogen) or the pGEM-Teasy vectors (Promega), sequenced on both strands using DyeTerminator™ chemistry (Applied Biosystems), and loaded on an ABI3730 capillary sequencer (GENterprise). The missing 3' region of S. tropicalis GbX cDNA was obtained using the RACE system by Invitrogen. Sequences were deposited at the GenBank/EMBL database under the accession numbers AJ635194 (D. rerio), AJ635193 (T. nigroviridis), AJ635195 (C. auratus), and AJ634915 (S. tropicalis). Genomic DNA of T. nigroviridis was obtained using the Stratagene DNA extraction kit. Two sets of nested primer pairs were designed according to the complete T. nigroviridis cDNA and the partial database entries available at GenBank and Genoscope (see Database Analyses). Overlapping genomic fragments were amplified and their sequences were obtained after the cloning of the PCR products as described above. The PCR clones represent two slightly different GbX genes (accession numbers AJ635196 and AJ635197).

    mRNA Expression Analyses

    Total RNA was isolated from various C. auratus tissues. For each tissue, 1.6 μg of total RNA was reverse transcribed with an oligo-dT primer, followed by PCR reactions on 10% of the total synthesis. Primers used for C. auratus GbX are 5'-CAGGGCTGGTTTCAGGATGGGC-3' and 5'-TACATGTCTGAGATGGTCTTCA-3', with 5'-GGATGGGCTGCGCTATTTCGG-3' and 5'-TGAGATGGTCTTCAGATGGCTGTGC-3' as nested primers. The oligonucleotides 5'-GTCCGTGACATCAAGGAGAAGC-3' and 5'-CAGACTCATCGTACTCCTGCTTG-3' were applied for the β-actin standard.
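    How the nested primers relate to the outer primers can be checked directly from the four sequences given above. The following is a minimal Python sketch, not part of the original methods; the function name suffix_prefix_overlap and the variable names are illustrative assumptions. It measures how many bases at the 3' end of each outer primer are shared with the 5' end of the corresponding nested primer.

```python
def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Primer sequences as listed in the text (5' to 3').
OUTER_FWD = "CAGGGCTGGTTTCAGGATGGGC"
NESTED_FWD = "GGATGGGCTGCGCTATTTCGG"
OUTER_REV = "TACATGTCTGAGATGGTCTTCA"
NESTED_REV = "TGAGATGGTCTTCAGATGGCTGTGC"

fwd_overlap = suffix_prefix_overlap(OUTER_FWD, NESTED_FWD)
rev_overlap = suffix_prefix_overlap(OUTER_REV, NESTED_REV)

print(f"forward: nested primer shares {fwd_overlap} nt with the outer primer")
print(f"reverse: nested primer shares {rev_overlap} nt with the outer primer")
```

    In both pairs the nested primer begins inside the 3' end of the outer primer and extends further into the amplicon, so the second-round product is shifted inward relative to the first-round product.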

    Database Analyses

    The BlastN and TBlastN search algorithms (Altschul et al. 1990) were employed to evaluate the sequence databases and trace archives at GenBank (http://www.ncbi.nlm.nih.gov) and the pufferfish genome projects of Takifugu rubripes (http://fugu.hgmp.mrc.ac.uk) and Tetraodon nigroviridis (http://www.genoscope.cns.fr/externe/tetraodon/). The nucleotide sequences were extracted from the databases and assembled with Genedoc 2.6 (Nicholas and Nicholas 1997), using the BlastT results as guidelines. The Web-based tools provided at the ExPASy Molecular Biology Server at the Swiss Institute of Bioinformatics (http://www.expasy.org) were used for the translation of the DNA and further analyses of the protein sequences.

    Sequence Analyses and Phylogenetic Inference

    The GbX protein sequences were manually added to an alignment of selected globin sequences (Burmester et al. 2000, 2002). The sequences used are Ngb from human (Homo sapiens: Chordata; Vertebrata; Mammalia) (HsaNGB, accession number AJ245946), mouse (Mus musculus: Chordata; Vertebrata; Mammalia) (MmuNgb, AJ245945), T. nigroviridis (Chordata; Vertebrata; Actinopterygii) (TniNgb, AJ315609), D. rerio (Chordata; Vertebrata; Actinopterygii) (DreNgb, AJ315610), and rainbow trout (Oncorhynchus mykiss: Chordata; Vertebrata; Actinopterygii) (OmyNgb, AJ547800), nerve Mb of the sea mouse (Aphrodita aculeata: Annelida; Polychaeta) (AacNgb, U46754), sea squirt (Ciona intestinalis: Chordata; Tunicata) Hbs 1 to 4 (CinHb1-4, AJ548500 to AJ548503), bloodworm (Glycera dibranchiata: Annelida; Polychaeta) Hb P3 (GdiHbP3, M55444), earthworm (Lumbricus terrestris: Annelida; Oligochaeta) Hbs B (LteGbB, P02218*) and Hbs D (LteGbD, U55073*), earthworm (Pheretima sieboldi: Annelida; Oligochaeta) Hb I (PsiHb1; S06483*), fanworm (Sabella spallanzanii: Annelida; Polychaeta) Hb (SspHb, AJ131285), lugworm (Arenicola marina: Annelida; Polychaeta) Mb (AmaMb1 [Kleinschmidt and Weber 1998]), beard worm (Lamellibrachia sp.: Pogonophora; Vestimentifera) Hb 2 (LamHb2, P15469*), fruitfly (Drosophila melanogaster: Arthropoda; Insecta) Hb (DmeHb, AJ132818), horse botfly (Gasterophilus intestinalis: Arthropoda; Insecta) Hb (GinHb, AF063938), midge (Chironomus thummi thummi: Arthropoda; Insecta) Hbs (CttHbIIb, AF001292, CttHbVI, P02224*, CttHbVIII, P02227*, CttHbE, P11582*), Atlantic hagfish (Myxine glutinosa: Chordata; Vertebrata; Myxiniformes) Hbs 1 to Hbs 3 (MglHb1, AF156936, MglHb2, AF157494, MglHb3, AF184239), European river lamprey Lampetra fluviatilis Hb (Chordata; Vertebrata; Petromyzontiformes) (LflHb, P02207*), Cygb from human (HsaCYGB, AJ315162), mouse (MmuCygb, AJ315163), and zebrafish (DreCygb, AJ320232), Mb from the gummy shark (Mustelus antarcticus: Chordata; Vertebrata; Chondrichthyes) 
(ManMb, P14399*), tope shark (Galeorhinus australis: Chordata; Vertebrata; Chondrichthyes) (GauMb, P14397*), T. nigroviridis (TniMb, CAAE01011832), bluefin tuna (Thunnus thynnus: Chordata; Vertebrata; Actinopterygii) (TthMb, Q9DD47*), sperm whale (Physeter catodon: Chordata; Vertebrata; Mammalia) (PcaMb, P02185*), mouse (MmuMb, P04247*), and human (HsaMB, M14603); Hbs a from the Port Jackson shark (Heterodontus portusjacksoni: Chordata; Vertebrata; Chondrichthyes) (HpoHbA, P02021*), spiny dogfish (Squalus acanthias: Chordata; Vertebrata; Chondrichthyes) (SacHbA, A24653), rainbow trout (OmyHbA4, D88114), bluefin tuna (TthHbA2, AB093570), the mouse (MmuHbA, V00714) and human (HsaHBA, J00153), Hbs b from H. portusjacksoni (HpoHbB, P02143*), S. acanthias (SacHbB, B24653), carp (Cyprinus carpio: Chordata; Vertebrata; Actinopterygii) (CcaHbB, D88117), zebrafish (DreHbBa1, NM_131020), human (HsaHBB, M36640), and mouse (MmuHbB, AK028067). Asterisks indicate sequences from the protein database. The final alignment is available from the authors upon request.

    The program packages PHYLIP version 3.6b (Felsenstein 2004), Tree-Puzzle version 5.2 (Strimmer and von Haeseler 1996) and MrBayes version 3.0beta4 (Huelsenbeck and Ronquist 2001) were used for the phylogenetic tree reconstructions. Distance matrices were calculated with Tree-Puzzle using the PAM (Dayhoff, Schwartz, and Orcutt 1978) and the WAG (Whelan and Goldman 2001) models of amino acid evolution, each assuming a gamma distribution of substitution rates with eight categories. Neighbor-joining trees were inferred with the program NEIGHBOR from the PHYLIP package. The reliability of the branching pattern was tested by bootstrap analysis (Felsenstein 1985) with 100 replications, employing PUZZLEBOOT (shell script by M. Holder and A. Roger). Bayesian phylogenetic analyses (cf. Huelsenbeck et al. 2001) were performed with MrBayes, using the PAM and WAG models of amino acid evolution and assuming a gamma distribution of substitution rates. Prior probabilities for all trees were equal. Metropolis-coupled Markov chain Monte Carlo (MCMCMC) sampling was performed with one cold and three heated chains that were run for 300,000 generations. Starting trees were random, trees were sampled every 10th generation, and posterior probabilities were estimated on the final 5,000 trees (burn-in = 25,000).
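    The sampling figures quoted for the Bayesian analysis can be cross-checked with simple arithmetic. The sketch below is illustrative only; it assumes that the burn-in of 25,000 is counted in sampled trees rather than in generations, which is the reading under which exactly 5,000 trees remain.

```python
# Values taken from the text.
generations = 300_000
sample_every = 10

# Assumption: the burn-in is expressed in sampled trees, not generations.
burn_in_trees = 25_000

sampled_trees = generations // sample_every
retained_trees = sampled_trees - burn_in_trees

print(f"trees sampled:  {sampled_trees}")
print(f"trees retained: {retained_trees}")
```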

    Substitution rates were calculated based on a PAM matrix under the assumptions that the Tetrapoda and Teleostei diverged about 420 MYA and that the cyprinid fishes (D. rerio and C. auratus) and percomorph pufferfishes (T. nigroviridis) diverged about 120 MYA (Benton 1990; Wittbrodt, Shima, and Schartl 2002). The number of nonsynonymous (dn) and synonymous (ds) nucleotide substitutions per site were obtained by the SNAP program (Ota and Nei 1994), which implements the method of Nei and Gojobori (1986) for the correction of multiple substitutions.
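    A common way to turn a pairwise distance into a rate is r = K / (2T), where K is the corrected number of substitutions per site between two sequences and T is the time since their divergence; the factor 2 accounts for the two lineages. The sketch below assumes this relation, which is not spelled out in the text, and uses a distance of 0.396 that is back-calculated from the rate reported in the Results for fish GbX, not a value taken from the paper.

```python
def substitution_rate(distance_per_site: float, divergence_years: float) -> float:
    """Rate per site per year, assuming r = K / (2T)."""
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance_per_site / (2.0 * divergence_years)


# Divergence time assumed in the text for cyprinids versus pufferfishes.
T_FISH = 120e6  # years

# Hypothetical distance, chosen so that the result matches the reported rate.
K_EXAMPLE = 0.396

rate = substitution_rate(K_EXAMPLE, T_FISH)
print(f"{rate:.2e} substitutions per site per year")
```

    Because T appears in the denominator, an underestimated divergence time inflates the apparent rate in direct proportion.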

    Results

    Identification of a Novel Globin Gene from Fishes and Amphibians

    Systematic Blast searches were carried out on the databases of genomic sequences from three fish species (D. rerio, T. nigroviridis, and Takifugu [Fugu] rubripes) and the amphibian S. tropicalis. Initially, the TBlastN algorithm under the assumption of a BLOSUM 45 substitution matrix was employed, using the human Ngb amino acid sequence (Burmester et al. 2000) as query. For each species, we obtained sequences that correspond to a single novel globin gene. Because the function of the gene product is presently unknown, this novel gene has been termed "globin X" or GbX. Employing the BlastN algorithm, the D. rerio, T. nigroviridis, T. rubripes, and S. tropicalis GbX sequences were compiled from the NCBI trace archives and other resources (see Materials and Methods), and the tentative genomic sequences of the genes were deduced. In addition, an expressed sequence tag (EST) from the catfish Ictalurus punctatus (accession number CK416201) was found that is most likely homologous to GbX. No corresponding sequences were found by TBlastN searches in any of the complete genomes of mouse, human, or chicken, in EST sequences of mammals or birds, or in the trace archives of mammalian genomic sequences.

    The putative coding regions were inferred by comparison with the published Ngb genes and the I. punctatus EST sequence. These data show that the coding region of the T. rubripes GbX gene is not completely covered by the genome sequences in the databases. The actual cDNA sequences of D. rerio, T. nigroviridis, and S. tropicalis GbX were then obtained from total RNA by RT-PCR experiments, also demonstrating that these novel globin genes are in fact expressed in the animals. In addition, we used oligonucleotide primers deduced from the D. rerio sequence to obtain the cDNA of GbX from the goldfish C. auratus.

    Genomic DNA was extracted from T. nigroviridis and the GbX gene was amplified by PCR and sequenced. Two slightly different sequences were obtained from a single animal. Because the available genomic sequences of T. nigroviridis and T. rubripes suggest that GbX is a single-copy gene, the two obtained sequences most likely represent distinct alleles of this gene. The GbX gene of T. nigroviridis comprises 4,301 bp or 4,302 bp (counting from the start ATG to the stop codon). As deduced from the trace files and partial database entries, the GbX genes of T. rubripes and D. rerio are more than 4,390 bp and more than 47,200 bp, respectively (fig. 1). Because both genes are not complete in their intronic regions, these values are lower estimates. The coding regions of S. tropicalis GbX are scattered on different trace files that indicate a gene of less than 7,600 bp. The coding region of T. nigroviridis GbX covers 615 bp, and those of the other GbX genes cover 600 bp. Comparison of the GbX genes with the cDNA sequences shows that the coding regions of fish and amphibian GbX are distributed on five exons (fig. 1). Two of the introns are located at positions B12.2 (i.e., between the second and third base of the 12th codon in globin helix B) and G7.0, which are typical for vertebrate globins (fig. 2). Two other introns are present in positions E10.2 and H10.0.

    FIG. 1.— Genome structure of GbX genes. The intron positions and sizes are indicated. Introns are designated according to the positions in the globin structure (e.g., "B12.2" means codon position 2 within codon 12 of globin helix B). The size bar refers to the T. nigroviridis GbX gene.

    FIG. 2.— Distribution of introns in various metazoan globin genes, shown relative to the globin α-helices A to H. The bold boxes indicate the introns in the GbX genes.

    The cDNAs translate into proteins of 200 (S. tropicalis) and 205 (T. nigroviridis) amino acids, with predicted molecular masses in the range of 23 kDa. Comparison of the deduced GbX protein sequences with other globins shows that the globin fold, which covers about 140 to 150 amino acids of the standard α-helices A through H, is conserved (fig. 3). Because of extensions of about 25 to 30 amino acids at both the N-terminus and the C-terminus, the lengths of fish and amphibian GbX exceed that of the typical globins. Nevertheless, the key residues important for oxygen binding, such as the distal and proximal histidines at positions E7 and F8, as well as the phenylalanine at CD1, are strictly conserved. Computer predictions using the PSORT II program (Nakai and Horton 1999) indicate that the GbX proteins do not contain any signal peptide and are most likely localized in the cytoplasm.
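    The protein lengths follow from the coding-region sizes given earlier (615 bp and 600 bp, which correspond to 205 and 200 codons when the stop codon is not counted), and the quoted mass of roughly 23 kDa can be approximated with a rule of thumb. The sketch below assumes an average residue mass of 110 Da; this is a generic approximation and not the method used by the authors, so it only shows that the figures are mutually consistent.

```python
# Rule-of-thumb assumption, not from the paper: about 110 Da per residue.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Coding-region lengths in bp as reported in the text.
coding_bp = {"T. nigroviridis": 615, "S. tropicalis": 600}

residues = {species: bp // 3 for species, bp in coding_bp.items()}
masses = {species: approx_mass_kda(n) for species, n in residues.items()}

for species in coding_bp:
    print(f"{species}: {residues[species]} aa, about {masses[species]:.1f} kDa")
```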

    FIG. 3.— Comparison of GbX amino acid sequences from pufferfish (TniGbX), zebrafish (DreGbX), goldfish (CauGbX), and frog (XtrGbX) with Ngb from zebrafish (DreNgb), mouse (MmuNgb), and human (HsaNGB). The secondary structure of human neuroglobin is superimposed in the upper row, with α-helices designated A through H; the globin consensus numbering is given below the sequences. Strictly conserved amino acids are shaded in dark gray; residues conserved among GbX are shaded in light gray. The four intron positions in the GbX genes (i.e., B12.2, E10.2, G7.0 and H10.0) are indicated by arrows at the upper row.

    Sequence Comparison and Phylogenetic Analyses of Vertebrate Globins

    The GbX amino acid sequences of the pufferfish T. nigroviridis and the Cypriniformes (D. rerio and C. auratus) share about 69% of the amino acids and are about 89% similar considering isofunctional replacements (based on a PAM 250 matrix). Fish and amphibian globins are 54.5% to 59.4% identical. Ninety-eight residues are strictly conserved among all four GbX proteins. Within the 140-amino acid globin core, 90 amino acids are unchanged. The substitution rates within the fish GbX are 1.65 × 10⁻⁹ amino acid substitutions per site per year (table 1) and 0.63 × 10⁻⁹ if fish and amphibian GbX are compared (data not shown). When only the globin core regions are considered, the substitution rates are estimated to be about half of these values. Levels of selective constraints were measured by calculating the ds/dn ratios. These vary between 5.2 and 10.5 in the coding region of fish GbX and between 8.4 and 34.9 in the DNA sequences coding for the globin domain.
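    Identity percentages such as those quoted above are computed column by column from a pairwise alignment. The sketch below shows one common definition, in which columns containing a gap in either sequence are excluded from the denominator; other definitions exist, and the text does not state which one was used. The two short aligned sequences are invented for illustration and are not GbX sequences.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment (not real data).
seq_a = "MEKL-SE"
seq_b = "MDKLTSE"

identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical over gap-free columns")
```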

    Table 1 Mean Substitution Rates in Selected Fish Globins

    The GbX amino acid sequences were then included in an alignment of selected vertebrate and invertebrate globins (see Materials and Methods). Pairwise comparisons showed that GbX displays the highest degree of sequence similarity with the vertebrate Ngbs, with identity values ranging from 26.0% to 34.6%. Other globins have generally lower similarity scores. For example, the GbX proteins share 18% to 26% amino acids with vertebrate Mbs, 15% to 25% with vertebrate Hbs, and 22% to 26% with the cytoglobins. Various phylogenetic trees were constructed employing the neighbor-joining method and Bayesian inference. Because of the high divergence of the plant, fungal, and bacterial globins, these proteins cannot be used as an out-group for rooting the metazoan globin tree. In a radial phylogram, the GbX proteins consistently group with the vertebrate Ngbs, annelid intracellular globins (Glycera Hbs and the Aphrodita nerve Mb) and the Ciona intestinalis Hbs (fig. 4). The monophyly of this branch (fig. 4, node "A") received modest bootstrap support (43% and 48%) and reasonable Bayesian probability scores (0.83 and 0.78) and was recovered in all additional analyses employing other substitution matrices or additional metazoan globin sequences (data not shown). Together, they are strictly separated from the other metazoan globins, represented here by the protostomian globins from annelids and arthropods (node "C") and by the other vertebrate globins, which include the Hb chains α and β, the Mbs and the Cygbs of the gnathostomian vertebrates, and the agnathan (hagfish and lamprey) Hbs (node "B"). Whereas the branch that comprises the protostomian globins is poorly supported, the monophyly of the vertebrate globins (excluding Ngb and GbX) was recovered with high probability scores in all analyses.

    FIG. 4.— Neighbor-joining tree of selected animal globins, based on a WAG substitution matrix with gamma corrections of rates. See Materials and Methods for proteins and abbreviations. The numbers at the branches represent bootstrap support values; the bar equals 0.1 PAM distance. Neighbor-joining bootstrap support values of node "A" are 43% (PAM model) and 48% (WAG model), 68%/73% for node "B," and 37%/33% for node "C." Bayesian posterior probabilities for node "A" are 0.83 (PAM model) and 0.73 (WAG model), 1.00/1.00 for node "B," and 0.58/0.48 for node "C."

    Expression of Globin X mRNA

    Total RNA was extracted from nine selected goldfish tissues. Using RT-PCR with two pairs of specific GbX primers, weak GbX cDNA signals were detected in the gills, muscle, heart, gut, kidney, spleen, and liver (fig. 5). No detectable amplification products were observed in the eye and the brain RNA. Although these data do not allow a valid quantitative assessment of expression levels, the abundance of the GbX message appears to be generally low. This observation is corroborated by the available EST data from zebrafish. A single sequence (AL921870) was obtained from the EST database of 450,652 entries (as of February 2004) by Blast searches, whereas 11 entries are derived from Ngb, 31 from Cygb, 87 from Mb, and several hundred from the various Hb genes.
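    The EST counts quoted above can be put on a common scale by expressing them per million database entries. The short sketch below uses only the numbers given in the text; the Hb genes are omitted because only "several hundred" is stated. EST frequency is at best a crude proxy for expression level, so the output is an illustration of relative abundance, not a measurement.

```python
# Zebrafish EST counts as reported in the text (February 2004).
TOTAL_ESTS = 450_652
est_counts = {"GbX": 1, "Ngb": 11, "Cygb": 31, "Mb": 87}

# Entries per million ESTs for each globin type.
per_million = {
    name: count / TOTAL_ESTS * 1_000_000 for name, count in est_counts.items()
}

for name, value in per_million.items():
    print(f"{name:>4}: {value:7.2f} ESTs per million")
```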

    FIG. 5.— Analysis of GbX expression. RNA was isolated from the indicated goldfish tissues. The expression of GbX mRNA was determined by RT-PCR (upper row). β-actin mRNA was used as standard control of RNA concentration and integrity (lower row). Note that the multiple bands for β-actin mRNA derive from tissue-specific isoforms.

    Discussion

    For many years, globins have served as an important model for investigating protein and gene evolution, as well as species phylogeny (Graur and Li 2000). So far, four types of globins have been identified in vertebrates, with Ngb and Cygb being the most recent additions to the well-known Hbs and Mbs (Burmester et al. 2000, 2002; Burmester and Hankeln 2004). Ngb represents an ancient globin type related to invertebrate nerve globins and is thought to be involved in neuronal oxygen homeostasis (Burmester et al. 2000; Sun et al. 2001; Schmidt et al. 2003; Fuchs et al. 2004). Cygb shares a more recent common ancestry with Mb and might play a role in collagen synthesis (Nakatani et al. 2004; Schmidt et al. 2004). As we have shown here, a fifth globin type is present in fish and amphibians. This protein, named GbX, is not particularly related to the α and β Hb chains, the Mbs, or the Cygbs from fishes, amphibians, or any other vertebrate, as reflected by the low sequence similarities and the phylogenetic analyses (fig. 4). GbX is widely expressed in many but not all tissues, although its cellular distribution remains to be investigated. It should be emphasized that, despite its distant similarity to Ngb, GbX is not primarily a neuronal protein, and, at least in goldfish, it is not expressed in neural tissues at all (fig. 5). At present, the function(s) of GbX remain uncertain. Nevertheless, GbX provides valuable clues for the evolution of globin proteins and genes.

    Possible Loss of GbX in Amniota

    In the present study, GbX sequences have been identified in fish and amphibians, suggesting that the emergence of this gene dates back to the time before the Tetrapoda diverged from the teleost fishes some 420 MYA (Benton 1990). Extensive database searches did not provide any evidence for the presence of GbX homologs in the available sequences of mammals and birds. Thus, the GbX gene has most likely been lost during the evolution of the Amniota.

    Conserved and Divergent Features of GbX Protein and Genome Structures

    Similar to Cygb (Burmester et al. 2002) and some invertebrate globins (Neuwald et al. 1997), the GbX proteins contain extensions at their N-terminals and C-terminals. No discernible similarity of these extensions to any known sequence in the databases has been found. The N-terminal parts are largely characterized by simple sequences, and in some T. nigroviridis GbX sequences, an indel of one amino acid was observed in the alanine-rich region (fig. 3). Short sequence duplications, possibly generated by replication slippage, have also been noted in the N-terminal extension of mammalian Cygb genes (Burmester et al. 2002).

    Patterns and rates of amino acid and nucleotide substitutions provide evidence of the levels of selective constraints that have shaped the GbX sequences. The overall amino acid substitution rates of GbX are in the range of 10⁻⁹ amino acid substitutions per site per year, similar to those of other vertebrate globins (Graur and Li 2000; Wystub et al. 2004). The globin core of GbX is highly conserved, with rates as low as those in Ngb and Cygb (cf. Wystub et al. 2004). Thus, the low similarity of GbX to other globins probably reflects an ancient divergence of GbX (see below) rather than enhanced evolutionary rates. Selective constraints may also be measured by the ds/dn ratios (Nei and Gojobori 1986), which have been found to be highly variable among vertebrate globin genes (Wystub et al. 2004). Strong selective pressure on a coding region will favor synonymous nucleotide substitutions (ds) over nonsynonymous substitutions that lead to an amino acid replacement (dn) (table 1). While overall GbX ds/dn ratios are in the range of those of vertebrate hemoglobins and myoglobins (Wystub et al. 2004), the ratios in the globin core are particularly high. In summary, these observations indicate that strong purifying selection has been imposed on the globin core and, thus, has shaped GbX structure and most likely also its function.
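    Note that the ratio used here is ds/dn, the inverse of the more widely reported dn/ds (often written ω), so a high value indicates strong purifying selection. The sketch below converts the range reported in the Results for the fish GbX coding region (5.2 to 10.5) into ω; the function name is an illustrative assumption, and the threshold of ω < 1 as a sign of purifying selection is the standard textbook interpretation rather than a statement from this paper.

```python
def omega_from_ds_dn(ds_over_dn: float) -> float:
    """Convert a ds/dn ratio into the conventional dn/ds ratio (omega)."""
    if ds_over_dn <= 0:
        raise ValueError("ds/dn must be positive")
    return 1.0 / ds_over_dn


# ds/dn range reported for the coding region of fish GbX.
reported_ds_dn = (5.2, 10.5)

omegas = [omega_from_ds_dn(r) for r in reported_ds_dn]
for ratio, omega in zip(reported_ds_dn, omegas):
    label = "purifying" if omega < 1.0 else "neutral or positive"
    print(f"ds/dn = {ratio:5.1f}  ->  dn/ds = {omega:.3f}  ({label} selection)")
```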

    As already noted for the other vertebrate globins (Wystub et al. 2004), a comparison of pufferfish (T. nigroviridis) and cyprinid (D. rerio; C. auratus) GbX sequences resulted in higher evolutionary rates than expected (e.g., from a comparison of fish and amphibian globins). On the one hand, replacement rates may be enhanced in the fish lineage; on the other hand, it is conceivable that the divergence time of the Cypriniformes and pufferfishes assumed in this study (approximately 120 MYA [cf. Benton 1990; Wittbrodt, Shima and Schartl 2002]) is in fact an underestimate (Thomas and Touchman 2002). An earlier divergence of these taxa within the teleost fishes would of course explain the apparently higher substitution rates.

    The GbX gene of the pufferfish T. rubripes is not complete, with parts of the first intron and the second exon missing in the databases. Nevertheless, the available genomic sequences of T. rubripes (Aparicio et al. 2002) allow the identification of the neighboring genes. The T. rubripes GbX gene is located on scaffold M000153, between the putative genes coding for hepatocyte growth factor activator inhibitor 1 and phospholipase D3. The human orthologs of these genes are located on chromosomes 15q5 and 19q13.2, respectively, and neither these nor putative paralogous genes are associated with any globin locus in the human genome. Thus, the chromosomal position of GbX is unlikely to be informative with respect to its chromosomal origin in an ancestral taxon.

    Possible Evidence for Intron Sliding in Globins

    Intron positions are considered valuable clues for gene evolution, but the antiquity of introns within globin genes and their positional stability have been a matter of debate (e.g., Hankeln et al. 1997; Logsdon, Stoltzfus, and Doolittle 1998). The GbX genes contain the "classical" globin introns at positions B12.2 and G7.0 (figs. 1–3). These two introns have been found in most vertebrate and invertebrate globins, including all known vertebrate hemoglobin, myoglobin, neuroglobin, and cytoglobin genes (Hardison 1996; Hankeln et al. 1997; Burmester et al. 2000, 2002) and are considered phylogenetically ancient (Dixon and Pohajdak 1992; Hardison 1996). GbX contains two additional introns. Whereas the intron in the H-helix (H10.0) is unique and, thus, has most likely been acquired after the divergence of GbX from the other globins, the central intron in E10.2 of the GbX gene requires attention. Introns in the E-helix have been found in various globin genes, although they are differentially positioned within this region (fig. 2). All four C. intestinalis Hb genes, which are distantly related to GbX (fig. 4), also contain introns in position E10.2 (Ebner, Burmester, and Hankeln 2003). It is noteworthy that the vertebrate Ngb genes, which are the closest relatives of GbX, have an intron in E11.0, which is, thus, shifted by 1 bp compared with the E10.2 position. Although it is possible that these introns arose from two independent insertion events, we must also consider the possibility of intron sliding (Rogers 1986; Stoltzfus et al. 1997). At present, a shift of intron-exon boundaries by 1 bp is not easy to explain, and no known mechanism, except for two coinciding independent mutation events (Stoltzfus et al. 1997), exists that can readily bring about such an intron sliding event. However, given the close phylogenetic relationship of Ngb, GbX and the Ciona Hbs (fig. 4), it is possible that the central intron in these genes actually inserted early in evolution (likely in position E10.2), and that Ngb is in fact one of the few anecdotal examples of intron sliding (Rogozin, Lyons-Weiler, and Koonin 2000).
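    The 1-bp difference between E10.2 and E11.0 follows directly from the intron notation, in which a label such as B12.2 names the helix, the codon within that helix, and the number of bases of that codon lying upstream of the intron. The sketch below is an illustrative helper, not code from the study; the function names are assumptions, and the parser only handles labels of the form used in this article (a single helix letter A to H, a codon number, and a phase of 0, 1, or 2).

```python
import re


def parse_intron(label: str) -> tuple[str, int, int]:
    """Split a label like 'B12.2' into (helix, codon, phase)."""
    match = re.fullmatch(r"([A-H])(\d+)\.([012])", label)
    if match is None:
        raise ValueError(f"unrecognized intron label: {label!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def intron_offset(label: str) -> int:
    """Number of helix nucleotides located upstream of the intron."""
    _, codon, phase = parse_intron(label)
    return (codon - 1) * 3 + phase


def intron_shift(a: str, b: str) -> int:
    """Distance in bp between two intron positions within the same helix."""
    if parse_intron(a)[0] != parse_intron(b)[0]:
        raise ValueError("intron positions lie in different helices")
    return intron_offset(b) - intron_offset(a)


shift = intron_shift("E10.2", "E11.0")
print(f"E10.2 -> E11.0: shifted by {shift} bp")
```

    Under this numbering, E10.2 lies after 29 nucleotides of helix E and E11.0 after 30, which gives the single-base displacement discussed above.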

    Two Families of Animal Globins?

    The globin superfamily is widespread within the animal kingdom, and tracing globin phylogeny would help to understand the functional evolution of these proteins. At present, no out-group is available that would allow a reliable rooting of the phylogenetic tree of the metazoan globins. Moreover, regardless of the reconstruction method used, the resolution of the trees remains poor, as reflected by the low bootstrap support values and Bayesian posterior probabilities (fig. 4). This drawback is largely caused by the poor conservation of the globins over long evolutionary distances and the rather short sequences. Nevertheless, three major branches of animal globins were recovered (fig. 4). The first branch, at node "C," comprises protostomian globins and the second branch (node "B") comprises the vertebrate Hb, Mb, and Cygb proteins but not the Ngb and GbX proteins. It is reasonable to assume that these two branches separated at the time of protostomian-deuterostomian divergence. By contrast, Ngb and GbX are members of a third and more ancient globin branch that comprises proteins from both Protostomia and Deuterostomia (fig. 4, node "A"). Thus, the phylogenetic tree provides evidence that the last common bilaterian ancestor possessed at least two distinct globin genes: the ancestor at node "A" eventually gave rise to the vertebrate Ngb and GbX proteins, the C. intestinalis Hbs, and the annelid intracellular globins, whereas the other ancestral gene was the predecessor of the metazoan globins of the clades "B" and "C." However, at present there are no functional or structural features that clearly delineate these globin families. For example, pentacoordinated (e.g., vertebrate and Glycera Hbs) and hexacoordinated (e.g., neuroglobin and cytoglobin) globins occur in both families (Pesce et al. 2002; Burmester and Hankeln 2004), as well as Mb-type oxygen-storage proteins and Hb-type oxygen-transport proteins. This demonstrates the structural and functional flexibility of globins that is only beginning to be fully appreciated as genomic data from a broad variety of taxa become available.

    Acknowledgements

    A.R. and T.B. wish to thank J. Markl for excellent working facilities. This work is supported by the Deutsche Forschungsgemeinschaft (Bu 956/5 and Ha2103/3) and in part by the EU (QLG3-CT-2002-01548).

    References

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Aparicio, S., J. Chapman, E. Stupka et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.

    Benton, M. J. 1990. Vertebrate paleontology. Unwin Hyman Ltd., London.

    Burmester, T., and T. Hankeln. 2004. Neuroglobin: A respiratory protein of the nervous system. News Physiol. Sci. 19:110–113.

    Burmester, T., B. Ebner, B. Weich, and T. Hankeln. 2002. Cytoglobin: a novel globin type ubiquitously expressed in vertebrate tissues. Mol. Biol. Evol. 19:416–421.

    Burmester, T., B. Weich, S. Reinhardt, and T. Hankeln. 2000. A vertebrate globin expressed in the brain. Nature 407:520–523.

    Chirgwin, J. M., A. E. Przybyla, R. J. MacDonald, and W. J. Rutter. 1979. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18:5294–5299.

    Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345–352 in Dayhoff, M. O., ed. Atlas of protein sequence structure, Vol. 5, Supplement 3. National Biomedical Research Foundation, Washington, DC.

    Dickerson, R. E., and I. Geis. 1983. Hemoglobin: structure, function, evolution, and pathology. Benjamin/Cummings Publishing Co., Menlo Park, Calif.

    Dixon, B., and B. Pohajdak. 1992. Did the ancestral globin gene of plants and animals contain only two introns? TIBS 17:486–488.

    Ebner, B., T. Burmester, and T. Hankeln. 2003. Globin genes are present in Ciona intestinalis. Mol. Biol. Evol. 20:1521–1525.

    Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.

    ———. 2004. PHYLIP (phylogeny inference package). Version 3.6b. Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Flögel, U., M. W. Merx, A. Goedecke, U. Decking, and J. Schrader. 2001. Myoglobin: a scavenger of bioactive NO. Proc. Natl. Acad. Sci. USA 98:735–740.

    Freitas, T. A., S. Hou, E. M. Dioum, J. A. Saito, J. Newhouse, G. Gonzalez, M. A. Gilles-Gonzalez, and M. Alam. 2004. Ancestral hemoglobins in Archaea. Proc. Natl. Acad. Sci. USA 101:6675–6680.

    Fuchs, C., V. Heib, L. Kiger, M. Haberkamp, A. Roesner, M. Schmidt, D. Hamdane, M. C. Marden, T. Hankeln, and T. Burmester. 2004. Zebrafish reveals different and conserved features of vertebrate neuroglobin gene structure, expression pattern and ligand binding. J. Biol. Chem. 279:24116–24122.

    Gardner, P. R., A. M. Gardner, L. A. Martin, and A. L. Salzman. 1998. Nitric oxide dioxygenase: an enzymic function for flavohemoglobin. Proc. Natl. Acad. Sci. USA 95:10378–10383.

    Graur, D., and W.-H. Li. 2000. Fundamentals of molecular evolution, 2nd edition. Sinauer Associates, Sunderland, Mass.

    Hankeln, T., H. Friedl, I. Ebersberger, J. Martin, and E. R. Schmidt. 1997. A variable intron distribution in globin genes of Chironomus: evidence for recent intron gain. Gene 205:151–160.

    Hardison, R. C. 1996. A brief history of hemoglobins: plant, animal, protist, and bacteria. Proc. Natl. Acad. Sci. USA 93:5675–5679.

    Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.

    Huelsenbeck, J. P., F. Ronquist, R. Nielsen, and J. P. Bollback. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310–2314.

    Kawada, N., D. B. Kristensen, K. Asahina, K. Nakatani, Y. Minamiyama, S. Seki, and K. Yoshizato. 2001. Characterization of a stellate cell activation-associated protein (STAP) with peroxidase activity found in rat hepatic stellate cells. J. Biol. Chem. 276:25318–25323.

    Kleinschmidt, T., and R. E. Weber. 1998. Primary structures of Arenicola marina isomyoglobins: molecular basis for functional heterogeneity. Biochim. Biophys. Acta 1383:55–62.

    Logsdon, J. M., A. Stoltzfus, and W. F. Doolittle. 1998. Molecular evolution: recent cases of spliceosomal intron gain? Curr. Biol. 8:R560–563.

    Minning, D. M., A. J. Gow, J. Bonaventura, R. Braun, M. Dewhirst, D. E. Goldberg, and J. S. Stamler. 1999. Ascaris haemoglobin is a nitric oxide-activated ‘deoxygenase’. Nature 401:497–502.

    Nakai, K., and P. Horton. 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci. 24:34–36.

    Nakatani, K., H. Okuyama, Y. Shimahara, S. Saeki, D. H. Kim, Y. Nakajima, S. Seki, N. Kawada, and K. Yoshizato. 2004. Cytoglobin/STAP, its unique localization in splanchnic fibroblast-like cells and function in organ fibrogenesis. Lab. Invest. 84:91–101.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Neuwald, A. F., J. S. Liu, D. J. Lipman, and C. E. Lawrence. 1997. Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25:1665–1677.

    Nicholas, K. B., and H. B. Nicholas Jr. 1997. GeneDoc: analysis and visualization of genetic variation. http://www.psc.edu/biomed/genedoc/.

    Ota, T., and M. Nei. 1994. Variance and covariances of the numbers of synonymous and nonsynonymous substitutions per site. Mol. Biol. Evol. 11:613–619.

    Pesce, A., M. Bolognesi, P. Ascenzi, A. Bocedi, S. Dewilde, L. Moens, T. Hankeln, and T. Burmester. 2002. Neuroglobin and cytoglobin: fresh blood for the vertebrate globin family. EMBO Rep. 3:1146–1151.

    Rogers, J. H. 1986. Introns between protein domains: selective insertion or frameshifting? Trends Genet. 2:223.

    Rogozin, I. B., J. Lyons-Weiler, and E. V. Koonin. 2000. Intron sliding in conserved gene families. Trends Genet. 16:430–432.

    Ruud, J. T. 1954. Vertebrates without erythrocytes and blood pigment. Nature 173:848–850.

    Schmidt, M., A. Giessl, T. Laufs, T. Hankeln, U. Wolfrum, and T. Burmester. 2003. How does the eye breathe? Evidence for neuroglobin-mediated oxygen supply in the mammalian retina. J. Biol. Chem. 278:1932–1935.

    Schmidt, M., F. Gerlach, A. Avivi et al. (11 co-authors). 2004. Cytoglobin is a respiratory protein expressed in connective tissue and neurons that is up-regulated by hypoxia. J. Biol. Chem. 279:8063–8069.

    Sidell, B. D., M. E. Vayda, D. J. Small, T. J. Moylan, R. L. Londraville, M. L. Yuan, K. J. Rodnick, Z. A. Eppley, and L. Costello. 1997. Variable expression of myoglobin among the hemoglobinless Antarctic icefishes. Proc. Natl. Acad. Sci. USA 94:3420–3424.

    Sowa, A. W., S. M. G. Duff, P. A. Guy, and R. D. Hill. 1998. Altering hemoglobin levels changes energy status in maize cells under hypoxia. Proc. Natl. Acad. Sci. USA 95:10317–10321.

    Stoltzfus, A., J. M. Logsdon Jr, J. D. Palmer, and W. F. Doolittle. 1997. Intron "sliding" and the diversity of intron positions. Proc. Natl. Acad. Sci. USA 94:10739–10744.

    Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969.

    Sun, Y., K. Jin, X. O. Mao, Y. Zhu, and D. A. Greenberg. 2001. Neuroglobin is up-regulated by and protects neurons from hypoxic-ischemic injury. Proc. Natl. Acad. Sci. USA 98:15306–15311.

    Thomas, J. W., and J. W. Touchman. 2002. Vertebrate genome sequencing: building a backbone for comparative genomics. Trends Genet. 18:104–108.

    Trent, J. T. 3rd, and M. S. Hargrove. 2002. A ubiquitously expressed human hexacoordinate hemoglobin. J. Biol. Chem. 277:19538–19545.

    Weber, R. E., and S. N. Vinogradov. 2001. Nonvertebrate hemoglobins: functions and molecular adaptations. Physiol. Rev. 81:569–628.

    Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18:691–699.

    Wittbrodt, J., A. Shima, and M. Schartl. 2002. Medaka—a model organism from the far east. Nat. Rev. Genet. 3:53–64.

    Wittenberg, J. B. 1992. Functions of cytoplasmatic hemoglobins and myohemerythrin. Adv. Comp. Environ. Physiol. 13:60–85.

    Wittenberg, J. B., and B. A. Wittenberg. 2003. Myoglobin function reassessed. J. Exp. Biol. 206:2011–2020.

    Wystub, S., B. Ebner, C. Fuchs, B. Weich, T. Burmester, and T. Hankeln. 2004. Interspecies comparison of neuroglobin, cytoglobin and myoglobin: sequence evolution and candidate regulatory elements. Cytogenet. Genome Res. 105:65–78.