Molecular Biology and Evolution, 2005, Issue 4
Expressed Sequence Tag-Linked Microsatellites as a Source of Gene-Associated Polymorphisms for Detecting Signatures of Divergent Selection i
     * Department of Aquaculture, Swedish University of Agricultural Sciences, Umeå, Sweden; Institute of Zoology and Hydrobiology, University of Tartu, Tartu, Estonia; and Department of Biological and Environmental Sciences (Biocentre 3), University of Helsinki, Finland

    Correspondence: E-mail: anti.vasemagi@vabr.slu.se.

    Abstract

    The prediction that selection affects the genome in a locus-specific way, also affecting flanking neutral variation (genetic hitchhiking), enables the use of polymorphic markers in noncoding regions to detect the footprints of selection. However, because the strength of the selective footprint at a locus depends on its distance from the selected site and decays over time due to recombination, using polymorphic markers closely linked to coding regions of the genome should increase the probability of detecting such footprints, as more gene-containing regions are covered. Highly polymorphic microsatellites in the untranslated regions of expressed sequence tags (ESTs) are a potentially useful source of gene-associated polymorphisms that has thus far not been exploited for genome screens in natural populations. In this study, we searched for genetic signatures of divergent selection by screening 95 genomic and EST-derived mini- and microsatellites in eight natural populations of Atlantic salmon (Salmo salar L.) sampled at different spatial scales and inhabiting contrasting natural environments (salt-, brackish, and freshwater habitats). Altogether, we identified nine EST-associated microsatellites that deviated highly significantly from neutral expectations under different statistical methods at various spatial scales and showed similar trends in separate population samples from different environments (salt-, brackish, and freshwater habitats) and sea areas (Barents vs. White Sea). We consider these ESTs the best candidate loci affected by divergent selection, and hence promising candidate genes associated with adaptive divergence in Atlantic salmon. Our results demonstrate that EST-linked microsatellite genome scans provide an efficient strategy for discovering functional polymorphisms, especially in nonmodel organisms.

    Key Words: Adaptation · nonneutral evolution · divergent selection · microsatellite DNA · genetic hitchhiking · outlier loci · EST · Atlantic salmon

    Introduction

    The prediction that selection affects the genome in a locus-specific way (Cavalli-Sforza 1966), also affecting the flanking neutral variation through genetic hitchhiking (Maynard Smith and Haigh 1974), enables the use of polymorphic markers in noncoding regions to detect the footprints of selection (Lewontin and Krakauer 1973). Thus, loci that differ substantially from the rest of the genome in diversity (Schlötterer 2002a; Kauer, Dieringer, and Schlötterer 2003) and/or in population divergence (Beaumont and Nichols 1996; Vitalis, Dawson, and Boursot 2001; Beaumont and Balding 2004) can be flagged as "outlier" loci that are potentially affected by selection (reviewed by Schlötterer et al. 2002b; Luikart et al. 2003; Storz in press).

    Recently, multi-locus screens based on genomic microsatellites have been applied in humans (Kayser, Brauer, and Stoneking 2003; Storz, Payseur, and Nachman 2004) and traditional model organisms (Schlötterer 2002a; Kauer, Dieringer, and Schlötterer 2003). However, the strength of the selective footprint at a microsatellite locus depends on its distance from the selected site and decays over time due to recombination (Wiehe 1998). Therefore, polymorphic markers closely linked to coding regions of the genome should have a higher probability of detecting the footprints of selection, and be more cost effective, than conventional approaches using randomly chosen polymorphic markers, because more gene-containing regions are covered (Vigouroux et al. 2002). In addition, close linkage between a polymorphic marker and a transcribed gene simplifies subsequent sequence analysis of the closest candidate gene, especially when neither a full genome sequence nor a high-density linkage map of the study species is available, as is the case for most nonmodel organisms. Highly polymorphic microsatellites in the untranslated regions of expressed sequence tags (ESTs) (Li et al. 2004) are therefore a potentially useful source of gene-associated polymorphisms. Thus far, however, the use of such gene-associated markers has been limited to linkage mapping studies (e.g., Ruyter-Spira et al. 1996), and their usefulness for identifying genes involved in local adaptation in natural populations has not been evaluated. Given that the number of publicly available ESTs in species other than traditional model organisms is increasing rapidly (e.g., Rise et al. 2004), these loci could serve as a rich source of gene-associated polymorphisms and present a promising alternative to methods that use anonymous markers such as amplified fragment length polymorphisms (AFLPs) (e.g., Wilding, Butlin, and Grahame 2001; Campbell and Bernatchez 2004), especially in species with relatively low gene density and high recombination rates.

    Salmonid fishes are good candidates for assessing the efficiency of using EST-linked microsatellites for genome screens as (1) the large diversity in behavior, immunology, life-history patterns, and other traits among local salmonid populations at various geographical scales has been widely recognized as evidence of adaptation to the local environment (Taylor 1991; Adkison 1995) and (2) a large number of EST sequences are publicly available (Rise et al. 2004). In addition, despite their tendency to evolve local adaptations (Taylor 1991), the number of genes and genomic regions that have been found to associate with adaptive or fitness-related traits in salmonids is limited (e.g., Danzmann, Jackson, and Ferguson 1999; Sakamoto et al. 1999; Langefors et al. 2001; Tao and Boulding 2003).

    In this study, we aimed to detect genetic signatures of selection in free-living populations of Atlantic salmon (Salmo salar L.) by screening 95 tandem repeat markers to identify genes and genomic regions potentially important for local adaptation. More specifically, we used genomic and EST-associated mini- and microsatellites to scan eight wild salmon populations sampled at different spatial scales and inhabiting similar and contrasting natural environments (salt-, brackish, and freshwater habitats) in order to detect molecular signatures of divergent selection. We compared the consistency of the results obtained with four different neutrality tests and evaluated their robustness across a large spatial scale by assessing whether the outlier loci showed similar trends in different population pairs.

    Material and Methods

    Study Populations

    Because the spatial scale of selection is expected to vary among loci, we sampled four closely related wild population pairs (Barents Sea: R. Teno/Tana and R. Tuloma; White Sea: R. Varzuga and R. Kitsa; Baltic Sea: R. Vindelälven and R. Torne/Tornionjoki; landlocked: R. Taipale and R. Syskynjoki) inhabiting distinct natural environments (salt-, brackish, and freshwater habitats), allowing us to detect divergent selection between relatively similar and contrasting environments at both small and large spatial scales (average distance between populations 171 km at the small scale and 671 km at the large scale) (fig. 1). In total, 200 individuals were analyzed (24–28 specimens per population). Total DNA was extracted from ethanol-preserved fin clips using the salt extraction protocol outlined in Aljanabi and Martinez (1997).

    FIG. 1.— Map of Northern Europe showing locations of the studied Atlantic salmon populations. Populations inhabiting salt- (TEN, R. Teno/Tana; TUL, R. Tuloma; VAR, R. Varzuga; KIT, R. Kitsa), brackish (VIN, R. Vindelälven; TOR, R. Torne/Tornionjoki), and freshwater (TAI, R. Taipale; SYS, R. Syskynjoki) habitats during the adult feeding phase are surrounded by dashed, solid, and dotted circles, respectively.

    EST Database Mining and Micro- and Minisatellite Genotyping

    In total, 58,146 Atlantic salmon EST sequences present in the GenBank database were scanned for di-, tri-, and tetranucleotide microsatellite repeats using TANDEM REPEATS FINDER v.3.01 (Benson 1999) with the following parameters: match 2; mismatch 7; indel 7; and minimum alignment score 50. Because EST databases are redundant (i.e., they contain many overlapping sequences from the same gene), the identified microsatellite-containing ESTs were clustered with the CAP3 program (Huang and Madan 1999) using a 40-base pair overlap and a 95% identity criterion in order to identify homologous loci. Primers flanking 8 tetra- and 126 dinucleotide repeat sequences were designed using PRIMER3 software (Rozen and Skaletsky 2000). Similarity searches of the microsatellite-containing EST sequences were conducted using BlastN and BlastX with default parameters as described in Altschul et al. (1990). Detailed amplification procedures and primer sequences are described in Vasemägi, Nilsson, and Primmer (in press). Altogether, 75 EST-associated microsatellites that gave high-quality amplification products were selected for population-wide genotyping on a MegaBACE™ 1000 capillary sequencer (Amersham Biosciences, Buckinghamshire, UK). We also included three major histocompatibility complex (MHC)-linked mini- (MHCII; Stet et al. 2002) and microsatellites (MHCI, TAP2B; Grimholt et al. 2002) in the screening panel because they have been shown to be associated with pathogen resistance and mate choice in Atlantic salmon (Landry et al. 2001; Langefors et al. 2001; Miller et al. 2004) and are hence good a priori candidates for loci under selection. In addition, the same individuals were analyzed with 17 genomic microsatellite loci (Tonteri et al. 2005; A. Tonteri and C. Primmer, unpublished data).
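    As a rough illustration of the screening step, the sketch below finds perfect di-, tri-, and tetranucleotide repeats in an EST sequence with a regular expression. It is not the TANDEM REPEATS FINDER algorithm, which also scores imperfect repeats using the match/mismatch/indel weights given above; the function name and the minimum repeat counts in `MIN_UNITS` are hypothetical choices made for illustration only.

```python
import re

# Hypothetical minimum number of repeat units per motif length (not TRF settings).
MIN_UNITS = {2: 6, 3: 5, 4: 4}


def find_perfect_repeats(seq, min_units=MIN_UNITS):
    """Return sorted (start, motif, n_units) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_units.items()):
        # A k-mer followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "CACA").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


est = "GG" + "CA" * 6 + "TTT" + "GATA" * 4 + "GG"
print(find_perfect_repeats(est))  # [(2, 'CA', 6), (17, 'GATA', 4)]
```

Redundant ESTs would still need to be clustered (the CAP3 step above) before hits from overlapping reads can be treated as distinct loci.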

    Genetic Diversity and Differentiation Measures

    Conformance to Hardy-Weinberg (H-W) equilibrium expectations was tested using exact tests (Guo and Thompson 1992) as implemented in GENEPOP 3.1b (Raymond and Rousset 1995). Gene diversity (Nei 1978) and pairwise FST estimates according to Weir and Cockerham (1984) were calculated with the software Microsatellite-Analyser (Dieringer and Schlötterer 2003). The significance of FST estimates among populations was tested by permuting individuals between samples. Ninety-five percent confidence intervals (CI) of the mean FST estimates were obtained by bootstrapping over loci (1,000 replicates). Heterogeneity in FST estimates among loci was quantified by calculating the 2.5th, 25th (Q1), 75th (Q3), and 97.5th percentiles of the observed FST values. Because one of the neutrality tests applied (see below) assumes that no mutations have occurred since the divergence of two populations from a common ancestral population (Vitalis, Dawson, and Boursot 2001), we determined the spatial scale at which stepwise-like mutations, in addition to genetic drift, have contributed to genetic differentiation among the studied populations. To do so, we tested whether RST = FST using an allele size randomization procedure (10,000 permutations) as implemented in SPAGeDi 1.1 (Hardy and Vekemans 2002). If the observed RST is significantly larger than the RST obtained after randomizing allele sizes, stepwise-like mutations have contributed to the observed differentiation pattern (Hardy et al. 2003).
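    A minimal sketch of two of the calculations above, assuming allele calls are stored as plain lists: Nei's (1978) unbiased gene diversity, n/(n − 1)(1 − Σp²) for n sampled gene copies, and a percentile bootstrap over loci. The helper names are hypothetical. The bootstrap here simply averages per-locus values; a multilocus Weir and Cockerham estimate is usually a ratio of summed variance components, so this is a simplification rather than Microsatellite-Analyser's procedure.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    """Nei's (1978) unbiased gene diversity for a list of n sampled gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_p2)


def bootstrap_mean_ci(per_locus_values, n_boot=1000, level=0.95, seed=1):
    """Percentile CI for the mean over loci, resampling loci with replacement."""
    rng = random.Random(seed)
    k = len(per_locus_values)
    means = sorted(sum(rng.choices(per_locus_values, k=k)) / k for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    return means[round(n_boot * tail)], means[round(n_boot * (1.0 - tail)) - 1]


print(gene_diversity(["152", "152", "156", "160"]))          # one locus, 4 gene copies
print(bootstrap_mean_ci([0.02, 0.05, 0.11, 0.14, 0.30]))     # hypothetical per-locus FST
```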

    Methods for Detection of Divergent Selection

    Spatially varying divergent selection is expected to increase genetic differentiation between populations and to reduce variability at linked loci. To search for signatures of divergent selection, we applied three methods that identify outlier loci based on various estimators of population divergence (Beaumont and Nichols 1996; Vitalis, Dawson, and Boursot 2001; Beaumont and Balding 2004) and an empirical approach based on reduction in genetic diversity (Schlötterer 2002a; Kauer, Dieringer, and Schlötterer 2003). Because of the exploratory nature of multi-locus screens, we did not apply the extremely conservative Bonferroni correction to the obtained significance values; instead, we initially report all loci that fall outside the 99% limits of the neutral expectations. Additionally, we evaluated the status of the identified candidate loci by assessing whether the putative outliers showed similar trends in separate (albeit not statistically independent) population samples from different environments (salt-, brackish, and freshwater habitats) and sea areas (Barents vs. White Sea). Because the neutrality tests are based on different assumptions and parameters, detection of an outlier locus by more than one statistical approach strengthens the candidate status of that locus.
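    The multi-test criterion described above can be expressed as a simple tally. In this sketch the test labels and locus identifiers are hypothetical placeholders: each test contributes the set of loci it flagged at P < 0.01, and a locus is retained when at least `min_tests` tests agree.

```python
from collections import Counter


def multi_test_candidates(outliers_by_test, min_tests=2):
    """Loci flagged as outliers (P < 0.01) by at least `min_tests` different tests."""
    counts = Counter(locus for flagged in outliers_by_test.values() for locus in set(flagged))
    return sorted(locus for locus, n in counts.items() if n >= min_tests)


# Hypothetical outlier sets for one pairwise comparison.
flagged = {
    "FST-test": {"locus_A", "locus_B"},
    "F-test": {"locus_B", "locus_C", "locus_D"},
    "lnRH": {"locus_A", "locus_B"},
}
print(multi_test_candidates(flagged))  # ['locus_A', 'locus_B']
```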

    The first method (hereafter referred to as the "FST-test"), developed by Beaumont and Nichols (1996), calculates Cockerham and Weir's (1993) estimator of FST for each locus in the sample and uses coalescent simulations based on a symmetrical island model of population structure to generate data sets with a mean FST similar to that of the empirical data. To calculate approximate P values for each locus, 100,000 independent loci were generated, and the simulated distribution of FST was compared with the observed FST values conditional on heterozygosity to identify potential outliers, as implemented in the software FDIST 2 (http://www.rubic.reading.ac.uk/mab/software/fdist2.zip). Sample sizes were set to 24 individuals per population in all simulations. Because our pairwise sampling strategy at the large geographical scale (salt-, brackish, and freshwater comparisons; Barents vs. White Sea) likely violates the assumption of equal migration rates, individual populations within each category were pooled (e.g., the R. Vindelälven and R. Torne/Tornionjoki samples were pooled to construct a brackish-water data set), and two subpopulations were simulated assuming a stepwise mutation model. Loci with unusually high FST values conditional on heterozygosity were regarded as potentially under divergent selection.
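    The idea of comparing an observed FST with simulated neutral loci "conditional on heterozygosity" can be pictured with the sketch below. It assumes the simulated loci are already available as (heterozygosity, FST) pairs; the fixed heterozygosity window and the function name are hypothetical, and this is not FDIST 2's own procedure for deriving quantiles.

```python
def conditional_upper_p(h_obs, fst_obs, simulated, window=0.02):
    """Share of simulated neutral loci with heterozygosity within `window` of h_obs
    whose FST is at least fst_obs (an empirical upper-tail P value)."""
    similar = [fst for h, fst in simulated if abs(h - h_obs) <= window]
    if not similar:
        raise ValueError("no simulated loci with comparable heterozygosity")
    return sum(fst >= fst_obs for fst in similar) / len(similar)


# Toy null: 100 loci with H = 0.5 spread over FST 0.00-0.99, plus loci at H = 0.9
# that are ignored when the observed locus has H = 0.5.
simulated = [(0.5, i / 100) for i in range(100)] + [(0.9, 0.99)] * 50
p = conditional_upper_p(0.5, 0.99, simulated)
```

A locus with p below 0.01 would fall outside the 99% limit used in this study; in practice the number of simulated loci (100,000 here) is what makes such small tail proportions stable.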

    The second method, a likelihood-based approach that uses a hierarchical-Bayesian model (hereafter the Bayes test) developed by Beaumont and Balding (2004), is similar to the FST-test of Beaumont and Nichols (1996) but uses more information from the raw data and does not assume the same value of FST for each subpopulation. This method should therefore be more suitable when some populations exhibit lower variability or reduced immigration than others, which is likely the case in our data set at the large spatial scale. We applied the Bayes test to identify potential outliers from neutrality associated with different environments (salt-, brackish, and freshwater habitat comparisons) and sea areas (Barents vs. White Sea). Note that the Bayes test is not a pairwise test, because all populations in a particular analysis are treated separately. We did not apply the Bayes test to closely related population pairs at the local scale, because simulations by Beaumont and Balding (2004) showed no advantage in combining the FST- and Bayes tests (both based on FST estimation) when the same number of subpopulations was used (i.e., false positives increased considerably while very few additional "truly" selected loci were detected). We identified outlier loci potentially subject to divergent selection, and their corresponding posterior "P values", from the proportion of positive locus-effect parameters (αi) among 2,000 Markov chain Monte Carlo outputs, as outlined by Beaumont and Balding (2004).
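    The posterior "P value" described above reduces to a proportion over the MCMC output. A minimal sketch, assuming the sampled values of the locus effect αi for one locus have already been extracted from the chain; the function name, the toy chain, and the 0.99 cutoff (chosen to mirror the 99% level used for the other tests) are illustrative assumptions.

```python
def posterior_positive_share(alpha_draws):
    """Share of MCMC draws in which the locus effect alpha_i is positive."""
    if not alpha_draws:
        raise ValueError("no MCMC draws supplied")
    return sum(a > 0 for a in alpha_draws) / len(alpha_draws)


# Toy chain of 2,000 draws for one hypothetical locus.
draws = [0.4] * 1985 + [-0.1] * 15
share = posterior_positive_share(draws)
is_candidate = share >= 0.99  # hypothetical cutoff mirroring the 99% level
```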

    The third, coalescence-based simulation approach (hereafter referred to as the F-test), developed by Vitalis, Dawson, and Boursot (2001), relies upon a model in which two populations split from a common ancestral population and uses population-specific parameters of population divergence, F (conditional on the number of alleles), to identify putative outlier loci affected by selection. The expected joint distributions of Fpop1 and Fpop2 were generated by performing 100,000–500,000 coalescent simulations for each pairwise comparison using the software DETSEL v.1.0 (Vitalis et al. 2003). The following nuisance parameters were used in different combinations to generate null distributions with a similar number of allelic states as in the observed data set: mutation rate (infinite allele model [IAM]) 0.005, 0.001, and 0.0001; ancestral population size 500, 1,000, and 10,000; population size before the split 50 and 500; time since an assumed bottleneck event 50, 100, and 200 generations; and time since the population split 50 and 100 generations. Loci with six or more alleles were grouped together, because the joint distribution of Fpop1 and Fpop2 becomes tighter as the number of alleles increases (Vitalis, Dawson, and Boursot 2001). Loci that fall outside the specified "probability region" of the simulated data points are reported as potentially affected by selection.
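    For intuition about the quantities the F-test works with, the sketch below computes population-specific divergence estimates for one locus from identity-in-state probabilities. It assumes the moment form Fi = (Qi − Q12)/(1 − Q12), where Qi is the probability that two distinct gene copies within population i are identical in state and Q12 is the corresponding probability for one copy from each population. This form and all function names are illustrative assumptions rather than DETSEL's code, and the coalescent simulation of the null distribution is omitted entirely.

```python
from collections import Counter


def within_identity(alleles):
    """Probability that two distinct gene copies drawn from one sample are identical in state."""
    n = len(alleles)
    return sum(c * (c - 1) for c in Counter(alleles).values()) / (n * (n - 1))


def between_identity(alleles1, alleles2):
    """Probability that one gene copy from each sample is identical in state."""
    c1, c2 = Counter(alleles1), Counter(alleles2)
    return sum(c1[a] * c2[a] for a in c1) / (len(alleles1) * len(alleles2))


def population_specific_f(alleles1, alleles2):
    """Hypothetical moment estimates (F1, F2) of population-specific divergence for one locus."""
    q1, q2 = within_identity(alleles1), within_identity(alleles2)
    q12 = between_identity(alleles1, alleles2)
    if q12 == 1.0:
        raise ValueError("undefined when both samples are fixed for the same allele")
    return (q1 - q12) / (1.0 - q12), (q2 - q12) / (1.0 - q12)


f1, f2 = population_specific_f(["A"] * 10, ["B"] * 10)  # completely diverged samples
```

Identical samples give values slightly below zero, because Qi is computed without replacement while Q12 is not; that small negative bias is expected for a moment estimator of this kind.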

    The fourth, empirical approach (hereafter referred to as the lnRH test) identifies loci that differ in variability from the remainder of the genome using the ratio of gene diversities in two populations (Kauer, Dieringer, and Schlötterer 2003). lnRH has been shown to be approximately normally distributed under neutrality (Kauer, Dieringer, and Schlötterer 2003). Therefore, after standardization (mean = 0; SD = 1), 95% of neutral loci are expected to have values between –1.96 and 1.96 (99% CI between –2.58 and 2.58; 99.9% CI between –3.29 and 3.29). When a locus was monomorphic in one population, we added a single different allele to the sample to avoid a heterozygosity value of zero.
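    The lnRH test can be sketched as follows. The text above describes it as a ratio of gene diversities; the sketch assumes the formulation of Kauer, Dieringer, and Schlötterer (2003), in which each gene diversity H is first transformed to (1/(1 − H))² − 1 before the ratio is taken and log-transformed. Function names and the toy data are hypothetical, and each H must lie strictly between 0 and 1, which is why monomorphic samples receive the single extra allele described above.

```python
import math
from statistics import mean, stdev


def ln_rh(h1, h2):
    """lnRH for one locus from gene diversities h1, h2 (each strictly between 0 and 1)."""
    v1 = (1.0 / (1.0 - h1)) ** 2 - 1.0
    v2 = (1.0 / (1.0 - h2)) ** 2 - 1.0
    return math.log(v1 / v2)


def standardize(values):
    """Rescale values to mean 0 and SD 1 across loci."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def lnrh_outliers(h_pairs, z_crit=2.58):
    """Indices of loci whose standardized lnRH lies outside +/- z_crit (99% under normality)."""
    z = standardize([ln_rh(h1, h2) for h1, h2 in h_pairs])
    return [i for i, zi in enumerate(z) if abs(zi) > z_crit]


# Toy data: 20 loci with similar diversity in both populations, plus one locus
# with strongly reduced diversity in population 1.
pairs = [(0.6 + 0.005 * (i % 5 - 2), 0.6) for i in range(20)] + [(0.05, 0.8)]
print(lnrh_outliers(pairs))  # [20]
```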

    Results

    Genetic Diversity and Population Differentiation

    EST-associated and genomic microsatellites showed relatively similar levels of genetic variation (median gene diversity across populations 0.57 and 0.70, respectively; median number of alleles across populations 4.8 and 5.9, respectively; Mann-Whitney U test, P > 0.05) and differentiation among populations (global FST 0.11 and 0.12, respectively; Mann-Whitney U test, P > 0.05), indicating that both types of markers were generally affected by the same evolutionary forces across the genome. Gene diversity and number of alleles differed significantly among populations from salt-, brackish, and freshwater habitats (Wilcoxon's signed rank test: gene diversity, P < 0.05; number of alleles, P < 0.001) (table 1). Genetic diversity estimates and H-W test results for each locus and population are available in Appendix 1 (Supplementary Material online). Genetic differentiation across loci was highly significant (FST, P < 0.001) between all studied populations, and the level of divergence varied considerably among the geographically proximate population pairs, ranging from 0.02 (White Sea: R. Kitsa vs. R. Varzuga) to 0.14 (landlocked: R. Taipale vs. R. Syskynjoki). Pairwise FST values between more distantly related populations were, on average, larger (table 1). Observed multi-locus RST values were significantly higher than permuted RST estimates at the large spatial scale (salt-, brackish, and freshwater habitat comparisons; Barents vs. White Sea), suggesting that stepwise-like mutations have contributed to micro- and minisatellite divergence at this scale (table 1).

    Table 1 Genetic Diversity (AM, mean number of alleles; H, gene diversity) and Divergence Estimates (RST, genetic differentiation based on allele size; FST, genetic differentiation based on allele identity; F, population-specific divergence) of the Studied Loci in Atlantic Salmon Populations from Different Spatial Scales and Environmental Conditions (salt, brackish, and freshwater habitats)

    Tests for Selection at a Local Geographical Scale

    In total, 18 EST-linked and 4 genomic microsatellites were identified as outliers in four geographically proximate population pair comparisons at the 99% P level using one or more neutrality tests (table 2 and fig. 2). Five EST-associated microsatellite loci (CA047944, CA062621, CA054978, CA054565, CA061621) exhibited significant deviations from the neutral expectations with all three statistical approaches (table 2). The EST locus similar to glycogen debranching enzyme (CA058586) was an outlier in two separate population pairs (table 2). Hence, we consider these six loci as the most promising candidates affected by divergent selection at a small geographical scale. Two MHC-linked markers were also identified as putative outliers in two pairwise population comparisons (table 2).

    Table 2 Candidate Loci for Adaptive Genetic Divergence Between Geographically Proximate (small spatial scale) Atlantic Salmon Populations

    FIG. 2.— Plot of FST values against standardized lnRH estimates for 78 EST-associated (empty bullets) and 17 genomic (black bullets) tandem repeat markers. (A) R. Kitsa versus R. Varzuga. (B) R. Teno/Tana versus R. Tuloma. (C) R. Vindelälven versus R. Torne/Tornionjoki. (D) R. Taipale versus R. Syskynjoki. Dashed lines indicate the 99% CI (–2.58, +2.58) of standardized lnRH estimates. Accession numbers or locus names of putative candidate loci potentially affected by selection (see Results) are indicated.

    Tests for Selection Across a Broad Geographical Scale

    In total, 21 EST-linked and 4 genomic microsatellites were identified as outliers (P < 0.01) by one or more statistical approaches in the large-scale comparisons (table 3 and fig. 3). Fourteen loci deviated from neutral expectations in more than one habitat/sea area comparison. Five EST-associated microsatellite loci (CA058586, CA048136, CA060208, CA062621, CA039588) exhibited significant departures from neutral expectations in at least three of the four outlier tests within a single comparison (table 3). The EST locus similar to glycogen debranching enzyme (CA058586) exhibited a considerable loss of genetic diversity in the Baltic Sea populations (gene diversity 0.02; number of alleles 2) compared with saltwater (gene diversity 0.82; number of alleles 21) and freshwater populations (gene diversity 0.37; number of alleles 5) (Supplementary Material Appendix 1). The departure from neutral expectations at this locus remained significant in the majority of single-population comparisons, showing similar trends in separate populations (table 3). The EST locus CA060208 was an extreme outlier in both comparisons involving landlocked (freshwater) populations. Hence, we consider these five loci the most promising candidates affected by divergent selection at a large geographical scale.

    Table 3 Candidate Loci for Adaptive Genetic Divergence Between Atlantic Salmon Populations from Different Habitats (salt-, brackish, and freshwater) and Geographic Areas (Barents vs. White Sea)

    FIG. 3.— Plot of FST values against standardized lnRH estimates for 78 EST-associated (empty bullets) and 17 genomic (black bullets) tandem repeat markers. (A) Brackish versus saltwater. (B) Brackish versus freshwater. (C) Salt- versus freshwater. (D) Barents versus White Sea. Dashed lines indicate the 99% CI (–2.58, +2.58) of standardized lnRH estimates. Accession numbers or locus names of putative candidate loci potentially affected by selection (see Results) are indicated.

    Outliers Among EST-Associated and Genomic Microsatellites

    Contrary to expectations, anonymous genomic microsatellites were not classified as outliers less frequently than gene-associated loci (χ² test, all neutrality tests, P > 0.05). The identification of two genomic microsatellites (Ssa14, Ssa171) as outliers by two of the three neutrality tests at the local scale suggests that these loci might have been influenced by divergent selection (table 2 and fig. 2C). Additional genotyping of 24 and 20 individuals from the R. Vindelälven and R. Torne/Tornionjoki populations, respectively, further increased the FST estimates between these samples (Ssa14, FST = 0.43; Ssa171, FST = 0.188).

    Discussion

    In this study 17 genomic and 78 EST-associated mini- and microsatellites were screened for the footprints of divergent selection among eight Atlantic salmon populations at different geographical scales occupying either relatively similar or contrasting habitats with the aim of identifying genes and genomic regions potentially important for adaptation. Several genes were identified which serve as promising candidates for adaptive divergence, and hence, "local" adaptation among wild Atlantic salmon populations at different spatial scales and environments.

    Anonymous Versus EST-Targeted Polymorphism Screens for Selection

    Two recent studies that used AFLP scans to search for footprints of divergent selection, in sympatric dwarf and normal ecotypes of lake whitefish (Coregonus clupeaformis) and in snail (Littorina saxatilis) populations that differ in shell shape, found that ca. 1%–5% of screened loci are likely influenced by directional selection (Wilding, Butlin, and Grahame 2001; Campbell and Bernatchez 2004). In the current study, the proportion of outlier loci identified was considerably higher (9 of 78 EST-linked loci [12%]). This implies that EST-associated microsatellite loci could improve the efficiency of genome screens, especially in species with (1) low gene density, where anonymous loci may not be tightly linked to selected loci, and/or (2) high recombination rates, where the signature of selection may be lost rapidly. Concordantly, a recent genome scan in closely related oak (Quercus) species (Scotti-Saintagne et al. 2004) found a substantially higher frequency of outliers among gene-associated loci (21%) than among anonymous markers (9%; genomic microsatellites, AFLPs). In addition, because a number of the markers applied in this study are also polymorphic in other salmonid species (Vasemägi, Nilsson, and Primmer in press), the strategy will be useful in a broad range of salmonids for identifying candidate loci for further sequence analysis to validate the footprints of selection.

    To our knowledge, evidence of divergent selection among contemporary wild Atlantic salmon populations has been reported at only two genes (MEP-2, Verspoor and Jordan 1989; MHCIIβ, Landry and Bernatchez 2001). However, both studies used a limited number of loci as a neutral baseline and did not apply simulations to test whether the observed pattern deviates from neutral expectations.

    In light of the encouraging simulations of Beaumont and Balding (2004), which demonstrated reasonable power of genome scans to identify loci under divergent selection, EST scans may provide a suitable strategy for discovering functionally important genetic variation in both model and nonmodel organisms and present a viable alternative to genome scans that use anonymous genetic markers such as AFLPs. Also, given the relative ease of conducting large-scale multi-locus screens for natural selection (Wilding, Butlin, and Grahame 2001; Campbell and Bernatchez 2004), more emphasis is likely to be directed toward outlier verification and characterization in the future.

    Performance of Neutrality Tests

    The population-specific divergence (F) method of Vitalis, Dawson, and Boursot (2001) revealed a much higher number of outlier loci than the other tests (tables 2 and 3). This discrepancy could arise because (1) the outliers identified by the F-test are real and the other methods failed to detect the signatures of selection at these loci, or (2) most of these outliers are false positives (type I error). Closer examination of the outliers at different spatial scales revealed a striking difference between scales in the number of cases in which the population-specific divergence test was the only method showing deviations from neutrality. At the local scale, the F-test identified only two additional outliers not supported by the FST- or lnRH test, whereas at the broad scale as many as 16 F-test outliers were not supported by any other method (tables 2 and 3). This discrepancy between the population-specific divergence test and the other methods at the large spatial scale suggests that the candidate status of these 16 loci must be treated with considerable caution.

    Interestingly, the consistency with which the same outlier loci were identified by different tests was lower at the large spatial scale for the other methods as well (outlier overlap: F-test vs. FST-test, small scale 48%, large scale 23%; F-test vs. lnRH test, small scale 65%, large scale 31%; FST-test vs. lnRH test, small scale 33%, large scale 17%). Outliers from the hierarchical-Bayesian method, which treats each population in a particular comparison separately, were most congruent with those from the FST-test (outlier overlap: 36%), whereas only a single deviation from neutral expectations was supported simultaneously by the Bayes and lnRH tests at the large scale (outlier overlap: 5%). The high frequency with which the same loci were identified as outliers by several methods at the local scale supports the prediction that comparing closely related populations enhances the efficiency of genome scans for divergent selection because (1) potential selective footprints are less likely to be obscured by mutations and (2) random drift has a reduced effect on the genetic parameters used to infer outlier loci (Beaumont and Nichols 1996; Vitalis, Dawson, and Boursot 2001; Schlötterer 2002a).

    Interpreting Departures from Neutrality

    In the present study, EST-associated tandem repeat markers did not deviate from neutral expectations more frequently than anonymous genomic microsatellite loci. Therefore, (1) some of the genomic microsatellites may be affected by selection, (2) a considerable number of the outliers may be false positives, or (3) both may apply. False positives (type I error) resulting from multiple testing, possible violations of test assumptions, and genome-wide heterogeneity in variability are likely responsible for some of the observed outliers. On the other hand, deviations from neutrality at the genomic microsatellite Ssa14 under several neutrality tests, at both local (Baltic Sea: R. Vindelälven vs. R. Torne/Tornionjoki) and large geographical scales (brackish vs. freshwater; salt- vs. freshwater), suggest that Ssa14 might have been influenced by divergent selection. Whether this locus is linked to any functional gene is currently unknown (Gilbey et al. 2004).

    It is important to note that a significant deviation from neutral expectations under one or multiple tests does not necessarily mean that a particular locus has been affected by selection. We applied three (local scale) or four (large scale) neutrality tests in eight separate comparisons using 95 loci (local scale: 3 × 4 × 95 = 1,140 separate tests; large scale: 4 × 4 × 95 = 1,520 separate tests), which is expected to yield approximately 27 false positives at the 99% P level (0.01 × 2,660 = 26.6). The fact that we found about three times as many deviations at the 99% P level (82 deviations altogether) indicates that it is unlikely that all the outliers are false positives (type I error). As emphasized in earlier studies, significant results with more than one neutrality test only raise the candidate status of a particular locus but do not demonstrate selection per se (e.g., Vigouroux et al. 2002; Schlötterer 2002a; Campbell and Bernatchez 2004). Because the violation of test assumptions is another factor potentially producing false positives, the identified candidate EST loci will serve as a basis for further sequence analysis to validate the role of divergent selection in these genes. In particular, the FST-test of Beaumont and Nichols (1996) is based on a symmetrical island model of population structure, which assumes equal population sizes and migration rates between populations. It is likely that at least some comparisons within our data set (e.g., saltwater vs. freshwater) violate these assumptions, and outliers from the FST-test alone should therefore be treated with caution. On the other hand, the identification of the same outliers with both the FST-test and the Bayes test of Beaumont and Balding (2004), which does not assume equal population sizes and migration rates, strengthens the candidate status of the five loci (CA058586, CA048136, CA060208, CA062621, CA039588).
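    The false-positive arithmetic above can be checked directly. The sketch below recomputes the number of tests and the expected count at P = 0.01, and adds a binomial tail probability for observing 82 or more deviations. That tail assumes all 2,660 tests are independent, which they are not (the same loci are reused across tests and comparisons), so it is only a rough illustration of how far 82 lies from the expectation, not a formal test.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for x in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * log_p + (n - x) * log_q)
        total += math.exp(log_pmf)
    return total


# Local scale: 3 tests x 4 population pairs x 95 loci;
# large scale: 4 tests x 4 comparisons x 95 loci.
n_tests = 3 * 4 * 95 + 4 * 4 * 95
expected_false_positives = 0.01 * n_tests
observed_deviations = 82

tail = binom_upper_tail(observed_deviations, n_tests, 0.01)
print(n_tests, round(expected_false_positives, 1))  # 2660 26.6
print(f"P(X >= {observed_deviations}) if tests were independent: {tail:.1e}")
```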

    The inconsistent results of the F-test and the other methods at the large spatial scale were probably caused largely by mutations at microsatellite loci that occurred after population divergence, as indicated by the RST permutation test of Hardy et al. (2003). In addition, because the F-test is based on the joint distribution of the population-specific divergence estimates conditional on the number of alleles, differences in mutation rate among alleles within a locus may affect the F-test more severely than the lnRH test, which is based on gene diversity. Nevertheless, such within-locus differences in mutation rate are likely to affect the outcome of the lnRH test as well. Therefore, when the putative selective sweep is associated predominantly with the shortest alleles, the outlier status of loci identified using the lnRH test should be treated with caution. Another potentially unrealistic assumption of the F-test is that no migrants have been exchanged since the two populations diverged. However, Vitalis, Dawson, and Boursot (2001) have shown that moderate levels of migration do not increase the false-positive rate (type I error) of the F-test.
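
    To make the dependence of the lnRH test on gene diversity concrete, the sketch below follows the general form of the statistic described by Schlötterer (2002a) and Kauer, Dieringer, and Schlötterer (2003): under the stepwise mutation model, gene diversity H relates to θ through (1/(1 − H))² − 1 = 2θ, and lnRH is the log ratio of this quantity between two populations. It is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes Nei's (1978) unbiased gene diversity, standardization by the mean and standard deviation across loci, and a two-sided 99% cutoff of |z| > 2.58. Published implementations may differ in details (e.g., robust standardization), and the locus names and allele counts are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple


def gene_diversity(allele_counts: List[int]) -> float:
    """Nei's (1978) unbiased gene diversity from allele counts in one sample."""
    n = sum(allele_counts)
    sum_p2 = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_p2)


def smm_term(h: float) -> float:
    """(1/(1-H))^2 - 1, which equals 2*theta under the stepwise mutation model."""
    if not 0.0 < h < 1.0:
        raise ValueError("gene diversity must lie strictly between 0 and 1")
    return (1.0 / (1.0 - h)) ** 2 - 1.0


def ln_rh(counts_pop1: List[int], counts_pop2: List[int]) -> float:
    """Log ratio of SMM-based variability in population 1 vs. population 2."""
    h1 = gene_diversity(counts_pop1)
    h2 = gene_diversity(counts_pop2)
    return math.log(smm_term(h1) / smm_term(h2))


def standardized_ln_rh(loci: Dict[str, Tuple[List[int], List[int]]]) -> Dict[str, float]:
    """Standardize lnRH across loci to mean 0 and SD 1 (assumed scheme)."""
    raw = {name: ln_rh(c1, c2) for name, (c1, c2) in loci.items()}
    values = list(raw.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return {name: (v - mu) / sd for name, v in raw.items()}


def flag_outliers(z_scores: Dict[str, float], cutoff: float = 2.58) -> List[str]:
    """Loci whose standardized lnRH lies beyond the two-sided cutoff."""
    return sorted(name for name, z in z_scores.items() if abs(z) > cutoff)


# Hypothetical data: eleven loci with similar diversity in both populations and
# one locus whose diversity is strongly reduced in population 1.
loci: Dict[str, Tuple[List[int], List[int]]] = {}
for i in range(11):
    shift = i % 3
    loci[f"neutral_{i}"] = ([10, 10, 10, 10], [10 + shift, 10, 10, 10 - shift])
loci["sweep_locus"] = ([37, 1, 1, 1], [10, 10, 10, 10])

z = standardized_ln_rh(loci)
print(flag_outliers(z))
```

    With these invented counts only the hypothetical `sweep_locus` exceeds the cutoff, with a strongly negative lnRH reflecting reduced variability in population 1. Because the statistic is built from gene diversity alone, it ignores allele sizes, which is one reason it may be less sensitive than the F-test to some forms of mutation-rate heterogeneity, although, as noted above, not immune to it.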

    An important direction for future research is therefore the formal testing of how model assumptions affect the identification of outlier loci. In the absence of such information, it has been suggested that a practical way to strengthen the candidate status of outlier loci is to apply simultaneously two or more neutrality tests based on different assumptions and parameter estimation procedures (e.g., Storz, Payseur, and Nachman 2004), and to carry forward to subsequent validation steps (e.g., further sequence analysis of flanking regions) only those outlier loci supported by several methods.
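
    The filtering rule described above amounts to a simple counting step. A minimal sketch, assuming each neutrality test has already produced a set of outlier loci for a given comparison; the test labels, locus names, and the threshold of two supporting tests are hypothetical placeholders, not results from this study.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_outliers(outliers_by_test: Dict[str, Set[str]], min_tests: int = 2) -> List[str]:
    """Return the loci flagged as outliers by at least `min_tests` different tests."""
    support = Counter(locus for flagged in outliers_by_test.values() for locus in flagged)
    return sorted(locus for locus, n in support.items() if n >= min_tests)


# Hypothetical outcome of one pairwise comparison (placeholder names only).
results = {
    "FST (island model)": {"locus_A", "locus_B", "locus_C"},
    "Bayesian FST": {"locus_A", "locus_C"},
    "lnRH": {"locus_C", "locus_D"},
    "F-test": {"locus_E"},
}
print(consensus_outliers(results))               # supported by >= 2 tests
print(consensus_outliers(results, min_tests=3))  # stricter threshold
```

    Simply counting tests treats them as interchangeable; a stricter variant would require support from tests built on different models (e.g., at least one test that does not assume an island model), in line with the rationale for combining methods with different assumptions.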

    Supplementary Material

    Appendix 1 is available online at the MBE web site (http://www.molbiolevol.org).

    Acknowledgements

    We thank Lena Laaksonen and Paula Lehtonen for their help in the laboratory and Anni Tonteri for providing anonymous microsatellite data. Alexey Veselov and Alexander Zubchenko are acknowledged for providing tissue samples from Russian salmon populations, and two anonymous reviewers are thanked for constructive comments on the manuscript. This work was supported by grants from the Oscar and Lili Lamm's Foundation, the Centre of International Mobility, the Academy of Finland, the Finnish Ministry for Agriculture and Forestry, and a NorFA mobility scholarship.

    References

    Adkison, M. D. 1995. Population differentiation in Pacific salmon: local adaptation, genetic drift, or the environment? Can. J. Fish. Aquat. Sci. 52:2762–2777.

    Aljanabi, S. M., and I. Martinez. 1997. Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques. Nucleic Acids Res. 25:4692–4693.

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Beaumont, M. A., and D. J. Balding. 2004. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13:969–980.

    Beaumont, M. A., and R. A. Nichols. 1996. Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. B Biol. Sci. 263:1619–1626.

    Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

    Campbell, D., and L. Bernatchez. 2004. Genomic scan using AFLP markers as a means to assess the role of directional selection in the divergence of sympatric whitefish ecotypes. Mol. Biol. Evol. 21:945–956.

    Cavalli-Sforza, L. L. 1966. Population structure and human evolution. Proc. R. Soc. Lond. B Biol. Sci. 164:362–379.

    Cockerham, C. C., and B. S. Weir. 1993. Estimation of gene flow from F-statistics. Evolution 47:855–863.

    Danzmann, R. G., T. R. Jackson, and M. M. Ferguson. 1999. Epistasis in allelic expression at upper temperature tolerance QTL in rainbow trout. Aquaculture 173:45–58.

    Dieringer, D., and C. Schlötterer. 2003. MICROSATELLITE ANALYSER (MSA): a platform independent analysis tool for large microsatellite data sets. Mol. Ecol. Notes 3:167–169.

    Gilbey, J., E. Verspoor, A. McLay, and D. Houlihan. 2004. A microsatellite linkage map for Atlantic salmon (Salmo salar). Anim. Genet. 35:98–105.

    Grimholt, U., F. Drablos, S. M. Jorgensen, B. Hoyheim, and R. J. M. Stet. 2002. The major histocompatibility class I locus in Atlantic salmon (Salmo salar L.): polymorphism, linkage analysis and protein modelling. Immunogenetics 54:570–581.

    Guo, S. W., and E. A. Thompson. 1992. Performing the exact test for Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361–372.

    Hardy, O. J., N. Charbonnel, H. Fréville, and M. Heuertz. 2003. Microsatellite allele sizes: a simple test to assess their significance on genetic differentiation. Genetics 163:1467–1482.

    Hardy, O. J., and X. Vekemans. 2002. SPAGEDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol. Ecol. Notes 2:618–620.

    Huang, X. Q., and A. Madan. 1999. CAP3: a DNA sequence assembly program. Genome Res. 9:868–877.

    Kauer, M. O., D. Dieringer, and C. Schlötterer. 2003. A microsatellite variability screen for positive selection associated with the "Out of Africa" habitat expansion of Drosophila melanogaster. Genetics 165:1137–1148.

    Kayser, M., S. Brauer, and M. Stoneking. 2003. A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol. Biol. Evol. 20:893–900.

    Landry, C., and L. Bernatchez. 2001. Comparative analysis of population structure across environments and geographical scales at major histocompatibility complex and microsatellite loci in Atlantic salmon (Salmo salar). Mol. Ecol. 10:2525–2539.

    Landry, C., D. Garant, P. Duchesne, and L. Bernatchez. 2001. ‘Good genes as heterozygosity’: the major histocompatibility complex and mate choice in Atlantic salmon (Salmo salar). Proc. R. Soc. Lond. B Biol. Sci. 268:1279–1285.

    Langefors, A., J. Lohm, M. Grahn, O. Andersen, and T. von Schantz. 2001. Association between major histocompatibility complex class IIB alleles and resistance to Aeromonas salmonicida in Atlantic salmon. Proc. R. Soc. Lond. B Biol. Sci. 268:479–485.

    Lewontin, R. C., and J. Krakauer. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74:175–195.

    Li, Y. C., A. B. Korol, T. Fahima, and E. Nevo. 2004. Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol. 21:991–1007.

    Luikart, G., P. R. England, D. Tallmon, S. Jordon, and P. Taberlet. 2003. The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet. 4:981–994.

    Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23–35.

    McConnell, S. K., P. O'Reilly, L. Hamilton, and J. M. Wright. 1995. Polymorphic microsatellite loci from Atlantic salmon (Salmo salar): genetic differentiation of North American and European populations. Can. J. Fish. Aquat. Sci. 52:1863–1872.

    Miller, K. M., J. R. Winton, A. D. Schulze, M. K. Purcell, and T. J. Ming. 2004. Major histocompatibility complex loci are associated with susceptibility of Atlantic salmon to infectious hematopoietic necrosis virus. Environ. Biol. Fishes 69:307–316.

    Nei, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583–590.

    O'Reilly, P. T., L. C. Hamilton, S. K. McConnell, and J. M. Wright. 1996. Rapid analysis of genetic variation in Atlantic salmon (Salmo salar) by PCR multiplexing of dinucleotide and tetranucleotide microsatellites. Can. J. Fish. Aquat. Sci. 53:2292–2298.

    Raymond, M., and F. Rousset. 1995. GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J. Hered. 86:248–249.

    Rise, M. L., K. R. von Schalburg, G. D. Brown et al. (24 co-authors). 2004. Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Res. 14:478–490.

    Rozen, S., and H. J. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Pp. 365–386 in S. Krawetz and S. Misener, eds. Bioinformatics methods and protocols: methods in molecular biology. Humana Press, Totowa, N.J.

    Ruyter-Spira, C. P., R. P. Crooijmans, R. J. Dijkhof, P. A. van Oers, J. A. Strijk, J. J. van der Poel, and M. A. Groenen. 1996. Development and mapping of polymorphic microsatellite markers derived from a chicken brain cDNA library. Anim. Genet. 27:229–234.

    Sakamoto, T., R. G. Danzmann, N. Okamoto, M. M. Ferguson, and P. E. Ihssen. 1999. Linkage analysis of quantitative trait loci associated with spawning time in rainbow trout (Oncorhynchus mykiss). Aquaculture 173:33–43.

    Sanchez, J. A., C. Clabby, and D. Ramos. 1996. Protein and microsatellite single locus variability in Salmo salar L. (Atlantic salmon). Heredity 77:423–432.

    Schlötterer, C. 2002a. A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160:753–763.

    Schlötterer, C. 2002b. Towards a molecular characterization of adaptation in local populations. Curr. Opin. Genet. Dev. 12:683–687.

    Scotti-Saintagne, C., S. Mariette, I. Porth, P. G. Goicoechea, T. Barreneche, C. Bodénès, K. Burg, and A. Kremer. 2004. Genome scanning for interspecific differentiation between two closely related oak species. Genetics 168:1615–1626.

    Slettan, A., I. Olsaker, and O. Lie. 1995. Atlantic salmon, Salmo salar, microsatellites at the Ssosl25, Ssosl85, Ssosl311, Ssosl417 loci. Anim. Genet. 26:281–282.

    ———. 1996. Polymorphic Atlantic salmon, Salmo salar L., microsatellites at the SSOSL438, SSOSL439 and SSOSL444 loci. Anim. Genet. 27:57–58.

    Stet, R. J. M., B. de Vries, K. Mudde, T. Hermsen, J. van Heerwaarden, B. P. Shum, and U. Grimholt. 2002. Unique haplotypes of co-segregating major histocompatibility class II A and class II B alleles in Atlantic salmon (Salmo salar) give rise to diverse class II genotypes. Immunogenetics 54:320–331.

    Storz, J. F. 2005. Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol. Ecol. (in press).

    Storz, J. F., B. A. Payseur, and M. W. Nachman. 2004. Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol. Biol. Evol. 21:1800–1811.

    Tao, W. J., and E. G. Boulding. 2003. Associations between single nucleotide polymorphisms in candidate genes and growth rate in Arctic charr (Salvelinus alpinus L.). Heredity 91:60–69.

    Taylor, E. B. 1991. A review of local adaptation in Salmonidae, with particular reference to Pacific and Atlantic Salmon. Aquaculture 98:185–207.

    Tonteri, A., S. Titov, A. Veselov et al. (10 co-authors). 2005. Phylogeography of anadromous and non-anadromous Atlantic salmon (Salmo salar) from northern Europe. Ann. Zool. Fenn. 42:1–22.

    Vasemägi, A., R. Gross, T. Paaver, M-L. Koljonen, M. Säisä, and J. Nilsson. Analysis of gene associated tandem repeat markers in Atlantic salmon (Salmo salar L.) populations: implications for restoration and conservation in the Baltic Sea. Conserv. Genet. (in press).

    Vasemägi, A., J. Nilsson, and C. Primmer. Seventy five EST-linked Atlantic salmon (Salmo salar L.) microsatellite markers and their cross-amplification in five salmonid species. Mol. Ecol. Notes (in press).

    Verspoor, E., and W. C. Jordan. 1989. Genetic variation at the Me-2 locus in the Atlantic salmon within and between rivers: evidence for its selective maintenance. J. Fish Biol. 35A:205–213.

    Vigouroux, Y., M. McMullen, C. T. Hittinger, K. Houchins, L. Schulz, S. Kresovich, Y. Matsuoka, and J. Doebley. 2002. Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc. Natl. Acad. Sci. USA 99:9650–9655.

    Vitalis, R., K. Dawson, and P. Boursot. 2001. Interpretation of variation across marker loci as evidence of selection. Genetics 158:1811–1823.

    Vitalis, R., K. Dawson, P. Boursot, and K. Belkhir. 2003. DetSel 1.0: a computer program to detect markers responding to selection. J. Hered. 94:429–431.

    Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.

    Wiehe, T. 1998. The effect of selective sweeps on the variance of the allele distribution of a linked multiallele locus: hitchhiking of microsatellites. Theor. Popul. Biol. 53:272–283.

    Wilding, C. S., R. K. Butlin, and J. Grahame. 2001. Differential gene exchange between parapatric morphs of Littorina saxatilis detected using AFLP markers. J. Evol. Biol. 14:611–619.

    (Anti Vasemägi, Jan Ni)