Retrotransposon-Gene Associations Are Widespread Among D. melanogaster Populations
     Department of Genetics, University of Georgia, Athens

    E-mail address: mcgene@uga.edu.

    Abstract

    We have surveyed 18 natural populations of Drosophila melanogaster for the presence of 23 retrotransposon-gene–association alleles (i.e., the presence of an LTR retrotransposon sequence in or within 1,000 bp of a gene) recently identified in the sequenced D. melanogaster genome. The identified associations were detected only in the D. melanogaster populations. The majority (61%) of the identified retrotransposon-gene associations were present only in the sequenced strain in which they were first identified. Thirty percent of the associations were detected in at least one of the natural populations, and 9% of the associations were detected in all of the D. melanogaster populations surveyed. Sequence analysis of an association allele present in all populations indicates that selection is a significant factor in the spread and/or maintenance of at least some retroelement-gene associations in D. melanogaster.

    Key Words: Long terminal repeats · retrotransposons · transposable elements · Drosophila melanogaster · genome evolution

    Introduction

    Retrotransposons are eukaryotic transposable elements having a life cycle that includes reverse transcription of an RNA intermediate (Boeke and Stoye 1997). Retrotransposons constitute a significant fraction of most eukaryotic genomes. For example, in species with relatively large genomes, such as humans, it is estimated that nearly half of the genome is composed of retrotransposon sequences (Venter et al. 2001; Lander et al. 2001). The abundance of retrotransposon sequences in the genomes of some plant species is even higher. For example, at least 50% of the maize genome (SanMiguel et al. 1996) and approximately 90% of the genome of some species of lilies (Flavell 1986) are composed of retrotransposons. In species with smaller genomes, such as yeast, nematodes, and fruit flies, the percentage of retrotransposon sequences in the genome is much less, typically ranging from 1% to 10% (Cherry et al. 1997; Celniker et al. 2002; Kaminker et al. 2002; Kidwell 2002).

    While the adaptive significance of transposable elements (TEs) was assumed by the researchers who first discovered them (e.g., McClintock [1951] and Shapiro [1977]), subsequent theoretical and population genetic studies (e.g., Hickey [1982] and Charlesworth and Langley [1989]) called this assumption into question and proposed that retrotransposons and other transposable elements are more appropriately viewed as parasite-like sequences that provide little or no adaptive benefit to their hosts (e.g., Charlesworth [1988]). More recent findings in molecular biology and genomics indicate that this negative verdict may have been premature. Presently, there is a large and growing number of examples of retrotransposon sequences located in or near genes that have been shown to have a significant effect on gene expression (Britten 1996; Brosius 1999; Landry, Medstrand, and Mager 2001; Medstrand, Landry, and Mager 2001; Sorek, Ast, and Graur 2002; Lerman et al. 2003).

    The growing availability of the complete sequence of a variety of genomes is providing an unprecedented opportunity to more objectively assess the contribution of transposable element sequences to gene structure and function. The genomic approach typically begins with the identification of a TE-gene association (i.e., the occurrence of a TE sequence in or near a gene) in a sequenced genome (Maside et al. 2003; Petrov et al. 2003). For example, it has recently been shown that the majority of long terminal repeat (LTR) retrotransposon sequences in C. elegans are located in or near genes (Ganko, Fielman, and McDonald 2001; Ganko et al. 2003). Likewise, recent computational analyses of the sequenced human genome indicate that retrotransposon sequences are located in the coding regions of at least 4% (Nekrutenko and Li 2001) and in the promoter regions of at least 25% (Jordan et al. 2003) of human genes. However, the mere identification of a TE-gene association in a sequenced genome is not, in itself, indicative of adaptive significance because it may only represent an insertional mutant unique to the sequenced strain. If, on the other hand, a given TE-gene association is shown to be in high frequency or fixed within a species or among closely related species, this may be taken as putative evidence that the association is of functional significance. The selective hypothesis can be subsequently tested for each candidate adaptive association by sequence and/or molecular analyses.

    Our laboratory has long been interested in the evolution and significance of transposable elements in Drosophila (e.g., McDonald [1990] and McDonald [1993]). We have recently initiated a study to identify LTR retrotransposon-gene associations (i.e., the presence of an LTR retrotransposon sequence in or within 1,000 bp of a gene) in the sequenced Drosophila melanogaster genome (Ganko et al., unpublished data). We have subsequently initiated a survey of the presence of these LTR retrotransposon-gene associations in natural populations. In this paper, we report the presence of 23 previously unidentified LTR retrotransposon-gene associations in the sequenced Drosophila genome. Using a polymerase chain reaction (PCR)–based approach, we have searched for the presence of each of these associations in 18 natural populations of D. melanogaster. The results indicate that the majority of the associations identified in the sequenced strain involve recently inserted elements and that these associations are endemic to the sequenced strain. In contrast, those associations that were found to be widespread among populations are composed of relatively small retrotransposon fragments that are presumably of much older origin. We present evidence of a selective sweep of a retrotransposon-gene association present in all D. melanogaster populations surveyed.

    Methods

    Identification of LTR-Gene Association in the D. melanogaster Genome

    Using previously identified Drosophila LTR retrotransposon sequences as queries (Bowen and McDonald 2001), sequence retrieval was initiated via BlastN searches (default parameters [Altschul et al. 1997]) against the BDGP (http://www.fruitfly.org) and GenBank (http://www.ncbi.nlm.nih.gov) databases. Hits with E-values less than 1e-10 were annotated on the corresponding genomic clone with MacVector version 7.0 (http://www.gcg.com), and nearby genes were noted. Selected genes within 1 kb of a TE were searched with BLAST against NCBI's EST database and mapped along with predicted transcript structures from FlyBase (http://www.flybase.org).
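
    The 1,000-bp criterion used to call an association can be sketched as a simple interval check. The following is a minimal illustration rather than the pipeline used in this study: the `Feature` type, the function names, and any coordinates passed to them are hypothetical, and the hits are assumed to have been filtered by E-value beforehand.

```python
from dataclasses import dataclass

MAX_GAP_BP = 1_000  # association window stated in the text


@dataclass(frozen=True)
class Feature:
    """A genomic interval (hypothetical container for this sketch)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gap_bp(a: Feature, b: Feature) -> int:
    """Number of bases separating two features; 0 if they overlap or abut."""
    if a.chrom != b.chrom:
        raise ValueError("features are on different chromosomes")
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def find_associations(te_hits, genes, max_gap=MAX_GAP_BP):
    """Return (TE name, gene name, gap) for every TE in or within max_gap bp of a gene."""
    pairs = []
    for te in te_hits:
        for gene in genes:
            if te.chrom != gene.chrom:
                continue
            gap = gap_bp(te, gene)
            if gap <= max_gap:
                pairs.append((te.name, gene.name, gap))
    return pairs
```

    A gap of 0 covers insertions inside the annotated gene boundaries, so the same check handles both the "in" and the "within 1,000 bp" cases.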

    Drosophila Strains

    D. melanogaster strains established from 20 to 30 wild-collected females from Congo (Dimonika), Niger (Niamey), Swaziland (Mbabane), Kenya (Kenia), South Africa (Cape Town), India (Rohtak), Russia (Dushanbe and Dilizhan), Congo (Brazzaville), Ivory Coast (Tai Forest), Australia (Melbourne), Chile (Santiago de Chile), and France (Bordeaux) populations were obtained from Jean David, CNRS Gif-sur-Yvette, France. Strains from Germany (Freiburg), Italy (Frascati), and The Antilles (Rouge) were obtained from Nikolaj Junakovic, Università la Sapienza, Rome, Italy. The California and Athens (Georgia, USA) strains were provided by Daniel Promislow, University of Georgia. The D. melanogaster sequenced strain y1; cn1 bw1 sp1 was obtained from the Bloomington, Indiana, stock center.

    Polymerase Chain Reaction

    PCR primers were designed with MacVector version 7.0 and synthesized by Integrated DNA Technologies (Coralville, Iowa). The primer sequences used in each reaction are listed in table 3 of the Supplementary Material online. Three replicate PCR reactions were carried out per strain, per gene-retrotransposon association. The DNA used in each reaction (100 ng) was separately isolated from 10 flies (five males and five females per isolation) according to previously described methods (Gloor et al. 1993). PCR products for each primer pair were amplified in a 25-μl reaction containing 3 mM MgCl2, 10X PCR buffer supplied by Pierce (Rockford, Ill.), 2% DMSO, 0.2 mM dNTPs, 0.5 μM of each primer, and 0.5 U of Taq DNA polymerase, also supplied by Pierce. The program consisted of an initial incubation at 94°C for 5 min, followed by 35 cycles each consisting of 30 s at 94°C, 1 min at the annealing temperature specific for each primer pair (see table 3 in the Supplementary Material online), and 1 min (per kb of PCR product) at 72°C, and then a final extension of 10 min at 72°C. All reactions were carried out in a Hot Top–equipped Robocycler Gradient 96 (Stratagene, La Jolla, Calif.). Each PCR product (25 μl) was separated on a 1% agarose gel in 0.5X TBE running buffer containing 0.25 μg/ml ethidium bromide. Gels were visualized by UV transillumination.
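
    Because the extension step scales with product length, the programmed run time differs between primer pairs. The sketch below simply adds up the hold times given above; it assumes the extension time is used unrounded and ignores the time the thermocycler spends moving between temperatures. The function name is hypothetical.

```python
def pcr_program_seconds(product_bp: int, cycles: int = 35) -> float:
    """Total programmed hold time, in seconds, for the cycling conditions in the text."""
    initial_denaturation = 5 * 60          # 94 C for 5 min
    denaturation = 30                      # 94 C for 30 s
    annealing = 60                         # primer-specific temperature for 1 min
    extension = 60 * product_bp / 1000     # 72 C for 1 min per kb of product
    final_extension = 10 * 60              # 72 C for 10 min
    per_cycle = denaturation + annealing + extension
    return initial_denaturation + cycles * per_cycle + final_extension
```

    For a 1-kb product this gives 300 + 35 × 150 + 600 = 6,150 s, or about 1 h 43 min of hold time.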

    Sequencing

    PCR products were agarose gel purified (Qiaquick, Qiagen, Valencia, Calif.) and cloned with TOPO TA (Invitrogen, Carlsbad, Calif.). DNA sequencing was performed in the Molecular Genetics Instrumentation Facility at the University of Georgia. Sequencing primers and primers used for amplifying sequenced PCR products, when different from the association primers, are shown in table 3 in the Supplementary Material online. Sequence readouts were checked manually for accurate base calls and were assembled with Sequencher (Gene Codes, Ann Arbor, Mich.). The length of the region analyzed is given according to the expected length in the sequenced strain, and the polymorphic site positions are located relative to this reference sequence. Nucleotide sequences were aligned using ClustalW (MacVector 7.0). As a control for PCR errors, we also sequenced the published y1; cn1 bw1 sp1 strain. Population genetic parameters were obtained using DnaSP version 3.95.7 (Rozas and Rozas 1999).

    Results

    Twenty-three new LTR retrotransposon-gene associations were selected for analysis. Genome sequence analysis resulted in the identification of over 300 LTR retrotransposon-gene associations (Ganko et al., unpublished data). These associations consisted of full-length and smaller, fragmented LTR-retrotransposon sequences located both within genes and adjacent to genes. The 23 associations selected to be representative of the variety of sizes and locations of gene-associated element sequences are shown in table 1. The sequences analyzed include six full-length LTR retrotransposons, nine LTRs, and eight fragments, ranging in size from 281 bp to 9,212 bp. There were two instances in which a single retrotransposon sequence was associated with two genes (a 297 LTR was flanked by the Ken and TM4SF genes; a 659-bp Quasimodo LTR was flanked by the spn3 and CG9333 genes).

    Table 1 LTR-Gene Associations Identified in the Drosophila melanogaster Genome.

    LTR retrotransposon sequences were located within the FlyBase-predicted transcriptional boundaries (including untranslated regions [UTRs], introns, and exons) of 12 genes. Six of these 12 associations were located within introns. Of the remaining six associations, two were in the 3' UTR, three were in the 5' UTR, and one spanned part of an exon and intron. Of the 11 LTR retrotransposon sequences located outside of gene boundaries, six were located 5' to the gene (ranging from 68 to 939 bp upstream from the transcriptional start site) and five were located 3' to the gene (ranging from 8 to 491 bp downstream of the polyA addition site).

    The majority of the LTR retrotransposon-gene associations identified in the sequenced genome were not detected in natural populations. Two sets of PCR primers were designed for each retrotransposon-gene association, one to amplify a portion of the associated gene and the other to amplify a portion of the associated retrotransposon sequence. Appropriate pairs of these gene and retrotransposon primers were combined to detect the presence or absence of each retrotransposon-gene association in strains representing 18 geographically dispersed populations of D. melanogaster.

    More than half (14 of 23, or 61%) of the associations were detected only in the sequenced strain (tables 1 and 2). Of these, the majority of the associated elements (80%) were full-length or nearly full-length in size and had identical or nearly identical LTRs (> 99% sequence identity). This is consistent with the possibility that these elements have inserted in the recent evolutionary past and, thus, represent mutational polymorphism within the sequenced strain. Seven of the 23 associations were detected in some but not all of the D. melanogaster populations (tables 1 and 2). Some of these alleles were found to display slight indel variation in the size of the associated retrotransposon sequence (fig. 4).

    Table 2 Presence or Absence of Retroelement Sequences Associated with 23 Genes in 18 Drosophila melanogaster Strains, Each Representing a Natural Population, and in the Laboratory Stock y1; cn1 bw1 sp1.

    FIG. 4. PCR products showing that seven LTR retrotransposon-gene associations are variably present in the 18 strains analyzed. The numbers above each gel correspond to the 18 D. melanogaster natural populations described in table 2

    Two associations were detected in all 18 D. melanogaster populations. Two of the 23 associations (9%) were detected in all 18 of the D. melanogaster populations surveyed (tables 1 and 2 and fig. 1). One of these associations is a promoter-containing 281-bp Quasimodo LTR fragment located 207 bp 5' to the CTCF (CG8591) gene (fig. 2A). The second is a Beagle fragment (593 bp) located 458 bp 3' to the CG17514 gene (fig. 3). The CTCF (CG8591) gene maps to a euchromatic region (65F6) on chromosome 3L (www.flybase.org), whereas the CG17514 gene maps to constitutive heterochromatin (Hoskins et al. 2002).
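
    The percentages quoted for the three distribution classes follow directly from the counts reported above (14, 7, and 2 of 23). As a quick arithmetic check, with variable names chosen only for this sketch:

```python
# Counts of associations in each distribution class, as reported in the text.
class_counts = {
    "sequenced strain only": 14,
    "some but not all populations": 7,
    "all 18 populations": 2,
}

total = sum(class_counts.values())

# Percentage of the 23 associations in each class, rounded to whole percent.
class_percent = {label: round(100 * n / total) for label, n in class_counts.items()}

for label, pct in class_percent.items():
    print(f"{label}: {class_counts[label]} of {total} ({pct}%)")
```

    Because each value is rounded independently, the three percentages need not sum to exactly 100.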

    FIG. 1. Examples of PCR analyses used to detect the presence of LTR retrotransposon-gene associations across 18 representative natural populations of D. melanogaster and the sequenced strain. Three PCR reactions were performed per strain, per gene; six representative strains are shown. The first lane contains a DNA ladder. (A) PCR products showing that the Quasimodo LTR fragment (281 bp) located 207 bp upstream of the CTCF gene is present in all D. melanogaster populations analyzed. Q = product from Quasimodo-specific primers (expected size = 186 bp); C = product from CTCF-specific primers (expected size = 395 bp); QC = product from Quasimodo F + CTCF R primers (expected size = 1,805 bp). (B) PCR products showing that the Beagle fragment (593 bp) located 458 bp 3' to the heterochromatic CG17514 gene is present in all D. melanogaster populations analyzed. B = product from Beagle-specific primers (expected size = 562 bp); G = product from CG17514-specific primers (expected size = 250 bp); BG = product from CG17514 F + Beagle R primers (expected size = 3,293 bp).

    FIG. 2. (A) Structure of the Quasimodo-CTCF allele in the sequenced D. melanogaster genome. The 281-bp fragment of Quasimodo LTR is associated with the CTCF gene in the D. melanogaster sequenced genome. Arrows represent the position of the primers used to detect the associations in the populations and species studied. The area sequenced is boxed. Two alternative transcripts have been detected for this gene, Ra and Rb, one composed of exons 1 (Ra), 2(Ra), 3 and 4 and the other composed of exon 1 (Rb) and exon 2 (Rb). (B) The sequence of the 281-bp Quasimodo LTR fragment is conserved across all D. melanogaster populations examined, whereas adjacent intronic and exonic sequences are significantly diverged. Vertical numbers represent the position of the polymorphic site; numbering is according to the sequenced strain; zero indicates no polymorphism in a defined area of the sequence

    Sequence Analysis

    To gain insight into the factors that may have contributed to the widespread distribution of the Beagle-CG17514 and Quasimodo-CTCF alleles in D. melanogaster populations, we sequenced various regions in and around the LTR retrotransposon sequence associated with each of these two widely distributed alleles in six geographically diverse populations (Athens, California, Germany, Kenya, India, and Antilles). The resulting sequences were aligned with one another and with that of the sequenced y1; cn1 bw1 sp1 strain.

    We sequenced the 1,144-bp region containing the Beagle fragment and the adjoining 5' flanking region (including the 3' UTR) of CG17514 (fig. 3). We also sequenced an additional 882-bp coding region (exon 3) within the CG17514 gene. This coding region was found to contain the highest number of polymorphic sites among the six natural populations and the sequenced y1; cn1 bw1 sp1 strain (6.1% divergence). The sequences of the region containing the Beagle fragment (6.1% divergence) and of the adjacent intergenic region (3.3% divergence) were also found to be highly polymorphic among the six populations and the sequenced y1; cn1 bw1 sp1 strain.

    In a 1,863-bp region containing the Quasimodo LTR fragment and a portion of the CTCF gene, there were a total of 38 polymorphic sites, of which 14 were small indels. Remarkably, the entire 281-bp Quasimodo LTR fragment was found to be identical in sequence among all six natural population samples (0% divergence), as well as in the sequenced y1; cn1 bw1 sp1 strain. The sequence of the immediately adjacent CTCF exon 1 (5' UTR) was nearly invariant (0.3% divergence) among all strains. Higher levels of intraspecific variation were, however, detected in the more distal intron 1 (2.5% divergence) and exons 2 and 3 (3.0% divergence) of the CTCF gene (fig. 2).
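
    Per-region figures of this kind can be tabulated from a multiple alignment. The sketch below is one simple way to do so and is not necessarily how the values in this paper were computed: it assumes that "percent divergence" means the percentage of aligned columns that are polymorphic, and it counts a column containing a gap as polymorphic, in line with the indels being tallied among the polymorphic sites above. Function names and region coordinates are hypothetical.

```python
def polymorphic_sites(alignment, start=0, end=None):
    """Return 0-based column indices in [start, end) where the aligned sequences differ."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    end = length if end is None else end
    if not 0 <= start < end <= length:
        raise ValueError("invalid region")
    sites = []
    for i in range(start, end):
        states = {seq[i].upper() for seq in alignment}  # a gap ('-') counts as a state
        if len(states) > 1:
            sites.append(i)
    return sites


def percent_polymorphic(alignment, start=0, end=None):
    """Percentage of aligned columns in [start, end) that are polymorphic."""
    end = len(alignment[0]) if end is None else end
    return 100.0 * len(polymorphic_sites(alignment, start, end)) / (end - start)
```

    Calling `percent_polymorphic` once per annotated region (for example, the LTR fragment, exon 1, and intron 1) gives a table that can be compared across regions.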

    FIG. 3. (A) Structure of the Beagle-CG17514 allele in the sequenced D. melanogaster genome. A 593-bp Beagle fragment is located 458 bp downstream of the CG17514 gene in the sequenced D. melanogaster genome. Arrows represent the position of the primers used to detect the associations in the populations and species studied. The area sequenced is boxed. (B) Sequence analysis showing that the Beagle-derived sequence and the gene region (exon and intron) contain a high number of polymorphic sites in the seven strains analyzed. Vertical numbers represent the position of the polymorphic site; numbering is according to the sequenced strain, and zero indicates no polymorphism in a defined area of the sequence.

    Discussion

    Intraspecific patterns of nucleotide and retrotransposon-gene allelic variation appear to be distinct. Several techniques have been used to estimate levels of nucleotide genetic variation within and between species. In the early to middle 1970s, extensive studies of allozyme variation were carried out within and between species of Drosophila (Ayala 1975). The general conclusion was that relatively little nucleotide genetic variation exists between populations of Drosophila species. Local populations of Drosophila were estimated to be greater than 95% identical based on the results of allozyme studies, and this value has been generally supported by subsequent studies based on restriction fragment length polymorphisms (RFLPs) and direct sequencing (Aquadro et al. 1992).

    The results presented in this paper suggest that the story is quite different with regard to retrotransposon insertional variation in or near genes. We estimate that in the sequenced Drosophila melanogaster genome, approximately 2% of the genes (approximately 300 genes) are associated with an LTR retrotransposon sequence (i.e., an LTR retrotransposon sequence in or within 1,000 bp of the gene) (Ganko et al., unpublished data). Our survey suggests that the majority of these associations (61% of those sampled) are endemic to the sequenced strain. Previous studies indicate that the genome of the sequenced strain is a typical D. melanogaster genome with respect to the number and distribution of transposable elements (Kaminker et al. 2002). Insofar as this is correct, our results indicate that although there appears to be a relatively large number of retrotransposon-gene associations present in D. melanogaster genomes, the majority of these variants are likely to be population/strain specific.

    We did find, however, that a significant proportion of the retrotransposon-gene associations present in the sequenced genome are widely distributed among natural populations. Indeed, 39% (nine of 23) of the retrotransposon-gene associations identified in the sequenced strain were detected in at least two populations, and more than 30% (seven of 23) were detected in at least seven out of the 18 populations. Nine percent (two of 23) of the retrotransposon-gene associations were detected in all of the 18 populations surveyed.

    Previous surveys of transposable-element insertion variants using in situ hybridization and RFLP methodologies (Charlesworth and Langley 1989) failed to detect insertion variants that were widespread among D. melanogaster populations. However, the ability of these techniques to detect relatively small insertions is limited, and our results indicate that most of the retrotransposon-gene associations that are widespread among populations are composed of relatively small retrotransposon fragments (tables 1 and 2).

    The majority of the retrotransposon-gene association variants that are unique to one or a few populations are likely of recent evolutionary origin. When LTR retrotransposons initially integrate into genomes, they are generally full-length in size; that is, they are composed of gag, pol, and sometimes env genes flanked by identical LTRs (Boeke and Stoye 1997). Full-length Drosophila LTR retrotransposons are typically 5 to 7 kb in length (Archipoda, Lynbaniskaya, and Ilin 1995). Over time, these full-length elements generally decrease in size because of the gradual accumulation of small deletions or by other mechanisms believed to actively remove transposable element sequences from the genome (Petrov 2002). In our study, more than 60% of the LTR retrotransposon sequences that are unique to the sequenced strain are more than 3,000 bp in length and most are full-length or nearly full-length elements. In addition, retrotransposon-gene associations identified in the sequenced strain that are also present in only a few (1 to 3) of the 18 natural populations surveyed are also composed of full-length or nearly full-length retrotransposons (e.g., roo-DopR and 297-Syn). The degree of sequence identity between the 5' and 3' LTRs of a full-length LTR retrotransposon can be used to estimate the time elapsed since the element transposed (SanMiguel et al. 1998; Jordan and McDonald 1999a, 1999b). All full-length elements found to be associated with genes in our survey displayed greater than 99% LTR sequence identity, indicating that they have been recently inserted. These observations stand in contrast to the fact that those associations more widespread among populations (eight or nine out of 18) are composed of retrotransposon sequences no larger than 659 bp in length. The two associations that were found to be present in all 18 populations surveyed were composed of retrotransposon fragments of only 281 and 593 bp in length, respectively.
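
    The dating logic behind the LTR comparison is commonly expressed as T = K / (2r), where K is the divergence between the two LTRs of one element and r is the substitution rate per site per unit time; the factor of 2 reflects that both LTRs accumulate changes independently after insertion. The sketch below applies that relationship with a Jukes-Cantor correction. The default rate of 0.016 substitutions per site per million years is an illustrative assumption, not a value taken from this paper, and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_age_myr(identity: float, rate_per_site_per_myr: float = 0.016) -> float:
    """Estimate time since insertion (million years) from 5'/3' LTR identity.

    identity is a fraction between 0 and 1; the rate is an assumed parameter.
    """
    p = 1.0 - identity
    k = jukes_cantor_distance(p)
    return k / (2.0 * rate_per_site_per_myr)
```

    Under the assumed rate, LTRs that are 99% identical correspond to an insertion roughly 0.3 million years old, and identical LTRs give an estimate of zero; a different rate rescales these values proportionally.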

    We conclude from these results that most of the retrotransposon-gene associations that are strain/population specific or present in only a few populations are the products of relatively recent insertional events. This is consistent with previous results indicating that essentially all full-length elements present in the D. melanogaster genome are much younger than the age of the species (Bowen and McDonald 2001; Kaminker et al. 2002). As elements age and/or are spread among populations, they appear to become significantly reduced in size. The question remains as to whether or not those retrotransposon fragments that remain associated with genes over relatively long spans of evolutionary time are of adaptive significance.

    Sequence analysis indicates that the widespread Quasimodo-CTCF gene association has undergone a recent selective sweep. There are at least three plausible explanations for the widespread distribution of retrotransposon-gene association alleles among Drosophila melanogaster populations. It is possible that the insertion alleles were present in the common ancestor of present-day populations and have been maintained by chance or selection in some or all populations over evolutionary time. A second possibility is that the insertion allele arose more recently in some population and has been spread to other populations by migration coupled with the action of drift and/or selection. A third, less likely, possibility is that the insertion event occurred independently in many or all of the populations in which the retrotransposon-gene association is currently present. We consider this latter possibility extremely unlikely for at least two reasons. First, new LTR retrotransposon insertions typically involve full-length elements and, as discussed above, all of the associations that are widespread among populations surveyed are composed of relatively small fragments of retrotransposons. Second, the precise insertion site of any given associated retrotransposon sequence is the same among all associated alleles, indicating that each is likely the product of the same insertional event. Under any scenario, if the retrotransposon-gene associations are being maintained or spread by random processes, neutral substitutions would be expected to accumulate among the homologous variants over evolutionary time.

    In an initial effort to assess the relative roles of drift and selection in the maintenance of widespread retrotransposon-gene associations, we have examined the patterns of sequence variation in and around the retroelement sequences in the two associations that were detected in all of the 18 populations surveyed in this study. Figure 3 displays the levels of variation in and around the Beagle retrotransposon sequence associated with the CG17514 gene among six of the 18 geographically diverse populations in which it is found. The level of polymorphism within the Beagle element and adjacent intergenic region is twice as high among populations (6.6%) as in the gene-encoding region (3.3%). This pattern of variation provides no evidence of selection operating on the retrotransposon sequence. The fact that this association is located within a constitutively heterochromatic (and, thus, low recombinogenic) region of the genome may help explain why it has been widely maintained in the species, despite the apparent absence of positive selection.

    In contrast, figure 2 displays the patterns of nucleotide variation in and around the LTR fragment located just upstream of the CTCF gene among these same six populations. The level of sequence variation is significantly reduced in the upstream region immediately adjacent to the fragmented retrotransposon. Indeed, we found that the 281-bp Quasimodo LTR fragment itself is identical in sequence among all six populations (0% divergence). Nucleotide variability remains remarkably low in the intergenic region immediately adjacent to the Quasimodo sequence (0.3%) but gradually increases as a function of distance from the insertion site, reaching a maximum of 3% in the regions of exons 2 and 3 that were sequenced. These results are consistent with a selective sweep centered on the Quasimodo LTR fragment (e.g., Hudson, Saez, and Ayala [1997] and Saez et al. [2003]). Future molecular studies will be required to delineate the likely functional significance of the Quasimodo sequence for CTCF gene expression.
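
    A trough of variation that recovers with distance from a focal site is often visualized with a sliding-window scan along the alignment. The following is a minimal sketch of such a scan, offered as an illustration rather than the analysis performed here; the window and step sizes are arbitrary assumptions, gaps are treated as a character state, and the function name is hypothetical.

```python
def sliding_window_polymorphism(alignment, window=100, step=50):
    """Return (start, end, fraction of polymorphic columns) for windows along an alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Flag each column that shows more than one character state.
    variable = [len({seq[i].upper() for seq in alignment}) > 1 for i in range(length)]
    windows = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        windows.append((start, end, sum(variable[start:end]) / (end - start)))
    return windows
```

    Plotting the third element of each tuple against the window midpoint would show whether variation dips around the insertion site and rises in flanking windows.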


    Acknowledgements

    This research was supported by a National Institutes of Health (NIH) grant to J.F.M. E.W.G. is supported through an NIH Genetics Training Grant.

    Literature Cited

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

    Aquadro, C. F., R. M. Jennings, Jr., M. M. Bland, C. C. Laurie, and C. H. Langley. 1992. Patterns of naturally occurring restriction map variation, dopa decarboxylase activity variation and linkage disequilibrium in the Ddc gene region of Drosophila melanogaster. Genetics 132:443-452.

    Archipoda, I. R., N. V. Lynbaniskaya, and Y. V. Ilin. 1995. Drosophila retrotransposons. R. G. Landes Co., Austin, Tex.

    Ayala, F. J. 1975. Genetic differentiation during the speciation process. Evol. Biol. 8:1-78.

    Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous retroviruses and the evolution of retroelements. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

    Bowen, N. J., and J. F. McDonald. 2001. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res. 11:1527-1540.

    Britten, R. J. 1996. DNA sequence insertion and evolutionary variation in gene regulation. Proc. Natl. Acad. Sci. USA 93:9374-9377.

    Brosius, J. 1999. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238:115-134.

    Celniker, S. E., D. A. Wheeler, and B. Kronmiller, et al. (29 co-authors). 2002. Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 3:RESEARCH0079.

    Charlesworth, B. 1988. The maintenance of transposable elements in natural populations. Basic Life Sci. 47:189-212.

    Charlesworth, B., and C. H. Langley. 1989. The population genetics of Drosophila transposable elements. Annu. Rev. Genet. 23:251-287.

    Cherry, J. M., C. Ball, and S. Weng, et al. (8 co-authors). 1997. Genetic and physical maps of Saccharomyces cerevisiae. Nature 387:67-73.

    Flavell, R. B. 1986. Repetitive DNA and chromosome evolution in plants. Philos. Trans. R. Soc. Lond. Biol. Sci. 312:227-242.

    Ganko, E. W., V. Bhattacharjee, P. Schliekelman, and J. F. McDonald. 2003. Evidence for the contribution of LTR retrotransposons to C. elegans gene evolution. Mol. Biol. Evol. 20:1925-1931.

    Ganko, E. W., K. T. Fielman, and J. F. McDonald. 2001. Evolutionary history of Cer elements and their impact on the C. elegans genome. Genome Res. 11:2066-2074.

    Gloor, G. B., C. R. Preston, D. M. Johnson-Schlitz, N. A. Nassif, R. W. Phillis, W. K. Benz, H. M. Robertson, and W. R. Engels. 1993. Type I repressors of P element mobility. Genetics 135:81-95.

    Hickey, D. A. 1982. Selfish DNA: a sexually-transmitted nuclear parasite. Genetics 101:519-531.

    Hoskins, R. A., C. D. Smith, and J. W. Carlson, et al. (13 co-authors). 2002. Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol. 3:RESEARCH0085.

    Hudson, R. R., A. G. Saez, and F. J. Ayala. 1997. DNA variation at the Sod locus of Drosophila melanogaster: an unfolding story of natural selection. Proc. Natl. Acad. Sci. USA 94:7725-7729.

    Jordan, I. K., and J. F. McDonald. 1999a. Comparative genomics and evolutionary dynamics of Saccharomyces cerevisiae Ty elements. Genetica 107:3-13.

    Jordan, I. K., and J. F. McDonald. 1999b. Tempo and mode of Ty element evolution in Saccharomyces cerevisiae. Genetics 151:1341-1351.

    Jordan, I. K., I. B. Rogozin, G. V. Glazko, and E. V. Koonin. 2003. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19:68-72.

    Kaminker, J. S., C. M. Bergman, and B. Kronmiller, et al. (9 co-authors). 2002. The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol. 3:RESEARCH0084.

    Kidwell, M. G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49-63.

    Lander, E. S., L. M. Linton, and B. Birren, et al. (more than 100 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

    Landry, J. R., P. Medstrand, and D. L. Mager. 2001. Repetitive elements in the 5' untranslated region of a human zinc-finger gene modulate transcription and translation efficiency. Genomics 76:110-116.

    Lerman, D. N., P. Michalak, A. B. Helin, B. R. Bettencourt, and M. E. Feder. 2003. Modification of heat-shock gene expression in Drosophila melanogaster populations via transposable elements. Mol. Biol. Evol. 20:135-144.

    Maside, X., A. W. Lee, and B. Charlesworth. 2003. Inferences on the evolutionary history of the S-element family of Drosophila melanogaster. Mol. Biol. Evol. 20:1183-1187.

    McClintock, B. 1951. Chromosome organization and genetic expression. Cold Spr. Harb. Symp. Quant. Biol. 16:13-47.

    McDonald, J. F. 1990. Macroevolution and retroviral elements. Bioscience 40:183-191.

    McDonald, J. F. 1993. Evolution and consequences of transposable elements. Curr. Opin. Genet. Dev. 3:855-864.

    Medstrand, P., J. R. Landry, and D. L. Mager. 2001. Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J. Biol. Chem. 276:1896-1903.

    Nekrutenko, A., and W. H. Li. 2001. Transposable elements are found in a large number of human protein-coding genes. Trends Genet. 17:619-621.

    Petrov, D. A. 2002. DNA loss and evolution of genome size in Drosophila. Genetica 115:81-91.

    Petrov, D. A., Y. T. Aminetzach, J. C. Davis, D. Bensasson, and A. E. Hirsh. 2003. Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol. Biol. Evol. 20:880-892.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

    Saez, A. G., A. Tatarenkov, E. Barrio, N. H. Becerra, and F. J. Ayala. 2003. Patterns of DNA sequence polymorphism at Sod vicinities in Drosophila melanogaster: unraveling the footprint of a recent selective sweep. Proc. Natl. Acad. Sci. USA 100:1793-1798.

    SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43-45.

    SanMiguel, P., A. Tikhonov, and Y. K. Jin, et al. 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765-768.

    Shapiro, J. A. 1977. DNA insertion elements and the evolution of chromosome primary structure. Trends Biochem. Sci. 2:622-627.

    Sorek, R., G. Ast, and D. Graur. 2002. Alu-containing exons are alternatively spliced. Genome Res. 12:1060-1067.

    Venter, J. C., M. D. Adams, and E. W. Myers, et al. 2001. The sequence of the human genome. Science 291:1304-1351.

    (Lucia F. Franchini, Eric )