Development of the Expressed Ig CDR-H3 Repertoire Is Marked by Focusing of Constraints in Length, Amino Acid Use, and Charge That Are First
Journal of Immunology, 2005, Issue 12
Abstract
To gain insight into the mechanisms that regulate the development of the H chain CDR3 (CDR-H3), we used the scheme of Hardy to sort mouse bone marrow B lineage cells into progenitor, immature, and mature B cell fractions, and then performed sequence analysis on VH7183-containing Cμ transcripts. The essential architecture of the CDR-H3 repertoire observed in the mature B cell fraction F was already established in the early pre-B cell fraction C. These architectural features include VH gene segment use preference, DH family usage, JH rank order, predicted structures of the CDR-H3 base and loop, and the amino acid composition and average hydrophobicity of the CDR-H3 loop. With development, the repertoire was focused by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity. Unlike in humans, the average length of CDR-H3 increased during development. The majority of this increase came from enhanced preservation of JH sequence. This was associated with an increase in the prevalence of tyrosine. With an accompanying increase in glycine, a shift in hydrophobicity was observed in the CDR-H3 loop from near neutral in fraction C (–0.08 ± 0.03) to mildly hydrophilic in fraction F (–0.17 ± 0.02). Fundamental constraints on the sequence and structure of CDR-H3 are thus established before surface IgM expression.
Introduction
In jawed vertebrates, the adaptive immune system is characterized by the exponential diversity of its Ag receptors (1, 2, 3, 4, 5). In contrast to the receptors of the innate immune system that bind relatively invariant pathogen-associated epitopes (6), diverse Ag receptor repertoires allow recognition of novel or divergent epitopes on pathogens or toxins.
The diversity of Ig, the BCR, is primarily the property of the V domains of the H and L chains (1, 2, 3, 4, 5). Diversity is asymmetrically distributed within each V domain (7, 8). In the primary sequence, three intervals of hypervariability, termed CDRs, are separated from each other by four relatively conserved intervals, termed framework regions (FRs).4 In the native form of the Ab, the FRs create a scaffold that supports the H and L chain CDRs. These CDRs are juxtaposed to form the Ag binding site. CDR-H1, -H2, -L1, and -L2 create the outside border; CDR-L3 forms the base; and CDR-H3 lies at the center of this Ag binding site. CDR-H1, -H2, -L1, and -L2 are entirely encoded by the V gene segment, and are thus initially restricted to germline sequence, whereas CDR-L3 and -H3 are created de novo by VLJL and VHDHJH joining, respectively. The inclusion of a D gene segment and the addition of nongermline-encoded nucleotides (N regions) vastly enhance the potential for both combinatorial and somatic diversity of CDR-H3. Enhanced diversity and a central position within the Ag binding site allow CDR-H3 to often play a critical role in the recognition of Ag (7, 8).
The composition of the functional CDR-H3 repertoire is biased in length, amino acid composition, predicted loop and base structure, and charge (9). The length distributions of murine and human CDR-H3 each approximate a Gaussian curve, but with differing means, suggesting that each species achieves its own preferred CDR-H3 length (9, 10). The average hydrophobicity of the amino acids within the CDR-H3 loop also forms a Gaussian distribution centering on neutrality to mild hydrophilicity (11). This neutral-to-hydrophilic preference reflects enrichment for tyrosine and glycine residues in the CDR-H3 loop in excess of that which would be predicted by random chance alone (9, 11, 12).
Construction of CDR-H3 begins early in B cell progenitors. The various defined stages of B cell development can be viewed, in part, as transitions through a series of checkpoints that test the assembly and function of the V domains (13, 14, 15). A number of studies have established that the CDR-H3 plays a crucial role in these selection processes (16, 17, 18). In humans, repertoire selection during B cell development is associated with a reduction in the distribution and mean length of the expressed CDR-H3 repertoire (19), and loss of highly charged or hydrophobic sequences (20, 21, 22). It has been proposed that the loss of longer sequences as well as those that are enriched for charged amino acids reflects a higher likelihood of self-reactivity in the Igs that bear them (22).
To gain insight into the mechanisms used to regulate the Ab repertoire, to determine when during development constraints on CDR-H3 composition are imposed, and to establish the extent to which murine development resembles that of humans, we sought to establish the pattern of CDR-H3 repertoire development in mice bearing an IgMa H chain repertoire. We used the scheme of Hardy (14) to sort bone marrow B lineage cells into progenitor, immature, and mature B cell fractions. We then cloned, sequenced, and deconstructed the CDR-H3 component of VH7183DJCμ transcripts. We chose to look at RNA message, as this is most representative of the expressed, and thus functional, Ig repertoire. We focused on the VH7183 family because its germline complement in IgHa alleles has been well defined (23); it represents a manageable 10% of the active repertoire (24); patterns of VH7183 use during ontogeny and development have been well established (23, 25, 26); and it contributes to both self and nonself reactivities (reviewed in Ref.27).
We show in this study that the essential architecture of the CDR-H3 repertoire, including patterns of gene segment use, amino acid composition, charge, predicted base and loop structure, and average length, is established very early in B cell development, well before the expression of surface IgM. Development appears to focus the repertoire by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity.
Materials and Methods
Statistical analysis
Differences between populations were assessed, where appropriate, by Student’s t test, two tailed; Fisher’s exact test, two tailed; χ² test; or the Levene test for the homogeneity of variance. Analysis was performed with JMP IN version 5.1 (SAS Institute). Means are accompanied by the SEM.
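As a small illustration of how a mean accompanied by its SEM can be computed, the sketch below uses only the Python standard library. It is not the JMP IN procedure used in the study; the function name `mean_and_sem` and the example values are hypothetical and serve only to show the calculation (sample standard deviation divided by the square root of the sample size).

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a sample.

    The SEM is the sample standard deviation (n - 1 denominator)
    divided by sqrt(n), so at least two values are required.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical CDR-H3 lengths in codons, for illustration only.
example_lengths = [10, 11, 12, 12, 13, 14]
m, sem = mean_and_sem(example_lengths)
print(f"{m:.1f} ± {sem:.1f}")
```

Values reported in the text in the form "12.5 ± 0.2" follow this mean ± SEM convention.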
Results
Cells within the Hardy bone marrow fractions B-F were sorted using the gates shown in Fig. 2. A total of 707 transcripts was sequenced, of which 649 (92%) were unique. Of these, 619 (95%) contained in-frame, open rearrangements. By fraction, there were 66 sequences from B (pro-B), 192 sequences from C (early pre-B), 131 sequences from D (late pre-B), 121 sequences from E (immature B), and 109 sequences from F (mature B).
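The filter for "in-frame, open rearrangements" can be sketched as follows. This is a minimal illustration, assuming that each nucleotide sequence passed in starts at a codon boundary; the function name and the example sequences are hypothetical and are not taken from the study's data set. The last two lines only re-derive the percentages quoted above from the reported counts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_open(nt_seq):
    """True if the sequence length is a multiple of 3 and no codon is a stop.

    Assumes nt_seq begins at a codon boundary (an assumption of this sketch).
    """
    seq = nt_seq.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return all(codon not in STOP_CODONS for codon in codons)


# Hypothetical sequences, for illustration only.
print(is_in_frame_open("GCAAGATACTAC"))  # in frame, no stop codon
print(is_in_frame_open("GCAAGATAGTAC"))  # in frame, but contains TAG
print(is_in_frame_open("GCAAGATACTA"))   # length not a multiple of 3

# Percentages recomputed from the counts reported in the text.
pct_unique = round(100 * 649 / 707)
pct_in_frame_open = round(100 * 619 / 649)
```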
Preferential use of VH7183.10 is established early in B cell development
In accordance with previous studies by other investigators (23, 25, 26, 29), the prevalence of VH7183.1 (VH81X) declined with development (B→F; p < 0.001). VH81X represented 38% of the fraction B sequences, 23% of the fraction C sequences (B→C; p = 0.03), 7% of the fraction D sequences (C→D; p < 0.001), 10% of the fraction E sequences (D→E; p = 0.52), and 2% of the fraction F sequences (E→F; p = 0.02) (Fig. 3A).
VH7183.10 was the most commonly used VH7183 gene segment in fractions C, D, E, and F. VH7183.10 increased from 5% in fraction B to 21% in fraction C (p < 0.01), and then remained relatively unchanged in fractions D, E, and F (31, 20, and 22%; p = 1.0). Changes in the prevalence of VH gene segments other than VH7183.10 were also observed, but none of these individual changes achieved statistical significance.
Patterns of DH use remain relatively unchanged with development
Use of the various DH families did not undergo a significant change with development (Fig. 3B). Using a minimum of 5 nt of identity to assign germline DH origin, we identified members of the DSP and DFL families in 50 and 30% of the transcripts, respectively. DQ52 was used in 4–10% of the transcripts, and DST4 contributed to <2% of the sequences. Due to exonucleolytic nibbling and N addition, we were unable to identify a DH progenitor in the remaining transcripts. The DFL16.1 gene segment was the single most commonly used DH gene segment at all stages of development, representing 20% of sequences in all of the fractions.
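The assignment rule described above (a minimum of 5 nt of identity to a germline DH segment) can be sketched as a longest-shared-run search. This is only an illustration: it assumes that "identity" means a contiguous exact match, and the germline sequences in `TOY_SEGMENTS` are invented placeholders, not the real DFL, DSP, DST4, or DQ52 sequences. The function names are likewise hypothetical.

```python
def longest_shared_run(query, germline):
    """Length of the longest contiguous stretch shared by two sequences."""
    best = 0
    prev = [0] * (len(germline) + 1)
    for q in query:
        cur = [0] * (len(germline) + 1)
        for j, g in enumerate(germline, start=1):
            if q == g:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def assign_dh(junction, germline_segments, min_identity=5):
    """Return the best-matching segment name, or None if no segment
    shares at least min_identity contiguous nucleotides."""
    best_name, best_len = None, 0
    for name, seq in germline_segments.items():
        run = longest_shared_run(junction.upper(), seq.upper())
        if run > best_len:
            best_name, best_len = name, run
    return best_name if best_len >= min_identity else None


# Invented placeholder sequences, NOT real germline DH segments.
TOY_SEGMENTS = {"D_toy_A": "ACGTTGCAAGG", "D_toy_B": "TTTCCCGGGAAA"}

print(assign_dh("GGACGTTGCC", TOY_SEGMENTS))    # shares 7 nt with D_toy_A
print(assign_dh("AAAAACGTAAAA", TOY_SEGMENTS))  # best shared run is only 4 nt
```

Junctions that return `None` in this sketch correspond to the transcripts for which no DH progenitor could be identified because of exonucleolytic nibbling and N addition.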
Increased prevalence of reading frame 2 in fraction B sequences
A shift in reading frame prevalence was observed during B cell development (Fig. 3C). We identified 53 fraction B, 152 fraction C, 88 fraction D, 92 fraction E, and 88 fraction F sequences that contained identifiable DFL or DSP gene segments. Reading frame 1 was the predominant reading frame at all stages of B cell development. However, use of reading frame 1 increased from 57% in fraction B to 70% in fraction C, 68% in fraction D, 78% in fraction E, and 78% in fraction F. Use of reading frame 3 decreased from 17% in fraction B to 12% in fraction C, 18% in fraction D, 11% in fraction E, and 9% in fraction F. Use of reading frame 2 began at 26% in fraction B, and then decreased to 18% in fraction C, 14% in fraction D, 11% in fraction E, and 13% in fraction F. The change in distribution of reading frames between fractions B and F was significant at p = 0.02.
Reading frame 3 typically encodes one or more termination codons. Functional sequences containing RF3 were significantly shorter (11.7 ± 0.3 codons) than those using RF1 (12.7 ± 0.1; p = 0.004) and RF2 (12.5 ± 0.3; p = 0.05). No significant differences were observed in the average length of RF1- and RF2-containing sequences (p = 0.60).
Increased use of JH1 in the transition to the immature B cell stage
A shift in rank order of JH use was observed in the fraction B→C transition (Fig. 3D). In fraction B, JH2 (31%) and JH3 (31%) were the most frequently used JH, followed by JH4 (27%) and JH1 (4%). In fractions C through F, JH4 was the most commonly used sequence (35–40%), followed by relatively equivalent use of JH3 (26–27%) and JH2 (21–28%), and then JH1 (7–16%). The rise in the use of JH1 from fractions B (4%), through C (7%), to D (16%) reached statistical significance (p < 0.03; χ²). Use of JH1 then remained relatively stable (14 and 12% in fractions E and F, respectively).
An increase in average CDR-H3 length with development
The average length of CDR-H3 increased during development from an average of 11.4 ± 0.3 in fraction B to 12.5 ± 0.2 in fraction F (p = 0.01) (Fig. 4A). Mouse DH sequences differ in length. To assess the contribution of the identity and length of the DH on the length of CDR-H3, we compared the average lengths of sequences that contained DFL16.1 with those with DSP gene family members, DQ52, or no identifiable D gene segment, respectively (Fig. 4A). DFL16.1 contains 23 nt, DFL16.2 and the DSP gene segments are two codons shorter with 17 nt, DST4 contains 16 nt, and DQ52 is 4 codons shorter with only 11 nt. For sequences containing DFL16.1, the average length increased from 12.6 ± 0.6 in fraction B to 13.6 ± 0.6 in fraction F (p = 0.29). For sequences containing DSP family members, the length increased from 11.7 ± 0.5 in fraction B to 12.9 ± 0.2 in fraction F (p = 0.01; Student’s t test). For sequences containing DQ52, the length increased from 9.0 ± 0.7 to 11.0 ± 0.5 (p = 0.05).
DFL16.1-containing sequences remained statistically similar in length distribution throughout development (Fig. 4A). With development, DSP-containing sequences converged in length to the DFL16.1 standard, whereas DQ52-containing sequences retained a length disadvantage. In fraction B, sequences containing DFL16.1 were 0.9 codons longer than those that contained DSP gene segments (p = 0.31) and 3.5 codons longer than those that contained DQ52 (p = 0.002). In fraction C, the length distribution of DFL16.1- and DSP-containing sequences significantly diverged (13.1 ± 0.4 vs 11.8 ± 0.3, respectively, p = 0.001). In fractions D and E, DSP-containing sequences increased in length to 12.5 ± 0.3 codons, while no significant changes in the length of DFL16.1-containing sequences were observed. By fraction F, the differences in DFL16.1- and DSP-containing sequences no longer achieved statistical significance (13.6 ± 0.6 vs 12.9 ± 0.2, respectively; p = 0.37). However, DFL16.1 sequences retained a length advantage over those with DQ52 (11.0 ± 0.5; p = 0.01).
The increase in length from fraction B to fraction F reflected, in part, a reduction in the prevalence of sequences whose CDR-H3 length was <9 aa (Fig. 5A). Due to the larger number of sequences, this was best observed in a comparison between fractions C and F. Of the 192 sequences in fraction C, 24 were 8 aa or less, whereas only 3 of 109 sequences were 8 aa or less in fraction F (p < 0.01). This also led to a significant narrowing in the variance of the distribution of lengths (p = 0.01; Levene).
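The comparison of short sequences in fractions C and F is the kind of 2 × 2 question that a two-tailed Fisher's exact test addresses. The sketch below is a standard-library implementation, not the JMP IN routine used in the study. It assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one, so its p value may differ slightly from the one produced by the original software. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts from the text: 24 of 192 fraction C sequences and 3 of 109
# fraction F sequences had a CDR-H3 of 8 aa or less.
p_short = fisher_exact_two_tailed(24, 192 - 24, 3, 109 - 3)
print(f"p = {p_short:.4f}")
```

The small tolerance factor in the comparison guards against floating-point ties between equally likely tables.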
The increase in CDR-H3 length reflected increased preservation of terminal JH sequence
Sequences containing identifiable DH gene segments were deconstructed to assess the contribution of VH, DH, and JH sequence, and of N addition and P junctions to the change in CDR-H3 length (Fig. 6). From fractions B to F, the average length increased by 3.4 nt (p = 0.007), or 1.1 codons. Minor increases in the contribution of the VH sequence (+0.2 nt) and N addition at the 5' and 3' junctions (+0.4 nt each), which reflected one-third of the increase, were observed. However, none of these rather subtle increases in length achieved statistical significance. In contrast, the contribution of JH germline sequence increased by 2.6 nt, or two-thirds of the total increase. On average, JH sequence contributed 10.7 ± 0.6 nt in fraction B and 13.3 ± 0.4 nt in fraction F (p < 0.001; Student’s t test).
The increase in average JH component length reflected both the increased use of JH1 and JH4 and enhanced preservation of 5' terminal nucleotides among sequences that used JH2 or JH3. The four JH sequences differ in length, with JH2 and JH3 contributing up to 14 nt each to CDR-H3, JH1 19 nt, and JH4 20 nt (Fig. 6). We examined the complete database of 619 unique sequences for the contribution of JH. The average length of the 22 sequences that contain JH1 and JH4 in fraction B was 12.8 ± 0.6 vs 12.9 ± 0.4 in 51 sequences from fraction F (p = 0.82). In these CDR-H3 intervals, JH1 and JH4 contributed 14.2 ± 0.9 nt in fraction B vs 15.4 ± 0.6 nt in fraction F (p = 0.25; Student’s t test). The average length of the 44 sequences containing JH2 and JH3 in fractions B was 10.7 ± 0.4 vs 12.1 ± 0.3 in 58 sequences from fraction F (p = 0.001). In these CDR-H3 intervals, JH2 and JH3 contributed 9.2 ± 0.6 nt in fraction B vs 10.9 ± 0.3 nt in fraction F (p = 0.0003).
The convergence in length distribution between DFL16.1- and DSP-containing sequences reflected a balance between an increase in the contribution of JH and loss of 5' terminal DFL16.1 sequence. At the point of greatest divergence in fraction C, the 44 sequences that contain DFL16.1 lost an average of 3.5 ± 0.4 5' terminal nucleotides vs 4.7 ± 0.3 for the 94 sequences that contain DSP (p = 0.04), giving DFL16.1 an average net gain of 1.2 germline nt relative to DSP. By fraction F, 5' terminal loss among the 20 sequences that contain DFL16.1 had increased to an average of 5.5 ± 0.6 nt vs 4.1 ± 0.4 for the 57 sequences that contain DSP (p = 0.05), yielding an average net loss of 1.4 nt relative to DSP. This is a net flip of 2.6 nt, or almost 1 codon.
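The arithmetic behind the "net flip" can be checked directly from the averages quoted above. The sketch below simply restates that calculation; the function name is hypothetical, and results are rounded to one decimal place to match the precision of the reported values.

```python
def net_germline_difference(loss_a, loss_b):
    """Germline nt retained by segment A relative to segment B.

    Each argument is the average number of 5' terminal nucleotides lost;
    a positive result means segment A retains more germline sequence.
    """
    return round(loss_b - loss_a, 1)


# Average 5' terminal loss (nt) reported for DFL16.1 (A) and DSP (B).
fraction_c = net_germline_difference(loss_a=3.5, loss_b=4.7)
fraction_f = net_germline_difference(loss_a=5.5, loss_b=4.1)

# Change in the DFL16.1 advantage between fractions C and F.
flip_nt = round(fraction_c - fraction_f, 1)
flip_codons = flip_nt / 3

print(fraction_c, fraction_f, flip_nt, round(flip_codons, 2))
```

A relative gain of 1.2 nt in fraction C followed by a relative loss of 1.4 nt in fraction F gives the 2.6 nt swing, which is a little under one codon.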
Increased prevalence of tyrosine and glycine in fraction F
The general bias for tyrosine and glycine in the CDR-H3 loop was first apparent in fraction B and intensified during B cell development (Fig. 7). Of the 423 predicted aa in the CDR-H3 loops from fraction B, 135 (32%) were either tyrosine or glycine, whereas of the 814 predicted aa in the loops from fraction F, 324 (40%) were tyrosine or glycine (p = 0.025). Overall, the amino acid composition differed significantly between fraction B and fraction F (p = 0.01, χ², 19 degrees of freedom). Fraction B loops, for example, contained more hydrophobic amino acids (11% valine, isoleucine, or leucine) than fraction F (9%). However, these and other changes in the prevalence of individual amino acids did not achieve statistical significance.
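The tyrosine and glycine proportions above are pooled over all loop residues in a fraction. A minimal sketch of that tally follows, using one-letter amino acid codes; the function name and the two example loops are hypothetical, and the final lines only recompute the reported percentages from the reported counts.

```python
def tyr_gly_fraction(loops):
    """Fraction of all residues in the given loops that are Tyr (Y) or Gly (G)."""
    total = sum(len(loop) for loop in loops)
    if total == 0:
        raise ValueError("no residues to count")
    hits = sum(loop.upper().count("Y") + loop.upper().count("G")
               for loop in loops)
    return hits / total


# Hypothetical loop sequences, for illustration only.
toy_fraction = tyr_gly_fraction(["YYGS", "DGYW"])

# Percentages recomputed from the counts reported in the text.
pct_fraction_b = round(100 * 135 / 423)
pct_fraction_f = round(100 * 324 / 814)
print(toy_fraction, pct_fraction_b, pct_fraction_f)
```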
Shifts in the prevalence of tyrosine and glycine, in length, and in N addition have been associated with changes in the distribution of the predicted structures of the CDR-H3 loop and base (9, 30). These types of changes have been observed as a function of ontogeny as well as of species origin. However, in the adult mouse sequences analyzed in this work, the stability in the relative prevalence of amino acid sequence, length, and N addition was accompanied by stability in the distribution of predicted base and loop structures (data not shown). No significant changes in predicted structure were observed from fractions B to F.
A shift in average hydrophobicity from near neutrality to hydrophilicity
To determine whether there was a global change in the distribution of hydrophobicity with development, we used a normalized Kyte-Doolittle scale (31, 32) to calculate the relative average hydrophobicity of the CDR-H3 loops (Fig. 4B). We observed a shift to neutrality from fraction B (–0.15 ± 0.04) to fraction C (–0.08 ± 0.03) that was followed by a shift toward hydrophilicity in fractions D, E, and F (–0.15 ± 0.03, –0.14 ± 0.03, and –0.17 ± 0.02, respectively). The shift in average hydrophobicity from fraction C to D is significant at p = 0.05, and the shift from fraction C to F is significant at p = 0.02.
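The average-hydrophobicity calculation can be sketched as follows. The raw values are the published Kyte-Doolittle hydropathy indices. The normalization step is an assumption of this sketch: the raw scale is rescaled to a mean of 0 and a standard deviation of 1 across the 20 amino acids, which may not be identical to the normalized scale cited in the text (31, 32), so the scores it produces should not be expected to reproduce the reported values. The names `NORMALIZED` and `average_hydrophobicity` are hypothetical.

```python
from statistics import mean, pstdev

# Raw Kyte-Doolittle hydropathy indices (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed normalization: z-scores over the 20 amino acids.
_mu = mean(KYTE_DOOLITTLE.values())
_sd = pstdev(KYTE_DOOLITTLE.values())
NORMALIZED = {aa: (v - _mu) / _sd for aa, v in KYTE_DOOLITTLE.items()}


def average_hydrophobicity(loop, scale=NORMALIZED):
    """Mean per-residue hydrophobicity of a loop (one-letter codes).

    Positive values are more hydrophobic, negative more hydrophilic.
    """
    loop = loop.upper()
    if not loop:
        raise ValueError("empty loop sequence")
    return mean(scale[aa] for aa in loop)


# Hypothetical loop sequence, for illustration only.
print(round(average_hydrophobicity("YYGSSY"), 2))
```

Averaging per residue, rather than summing, keeps loops of different lengths comparable, which matters here because loop length itself changes with development.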
As in the case of length, the variance in average hydrophobicity decreased with development. This shift in variance was significant at p = 0.03 between fractions B and F, and at p < 0.01 between fractions C and F. The change in variance is due, in part, to the loss of sequences at the extremes (Fig. 5B). In fraction B, one sequence had an average hydrophobicity score greater than 0.6; there were 13 such sequences in fraction C, 3 in fraction D, 4 in fraction E, and none in fraction F. Similarly, 3 sequences in fraction B had an average hydrophobicity score of less than –0.6; there were 6 in fraction C, 6 in fraction D, 2 in fraction E, and none in fraction F. The difference in the prevalence of sequences at the extreme in fraction C (19 of 211) vs fraction F (0 of 110) is significant at p < 0.01. A comparison of highly charged sequences between surface IgM– pre/pro-B cells (fractions B, C, and D) and surface IgM+ B cells (fractions E and F) is also significant at p < 0.05.
Discussion
We have shown in this study that the major patterns of VH, DH, and JH use in the expressed repertoire are already established in progenitor B cells, and thus before the expression of membrane-bound IgM. The relative prevalence of the various DH families remained essentially unchanged from fraction B through fraction F. The rank order of JH prevalence that was first established in fraction C was maintained through fraction F. Consistent with previous reports that focused on rearrangement preference (23, 25, 26, 29), we found a high frequency of transcripts using the VH81X (VH7183.1) gene segment in fraction B. The prevalence of VH81X then steadily diminished in the progression from fraction C to fraction F. A preference for VH7183.10 was established in fraction C, and its prevalence remained remarkably stable from fraction C to fraction F. Thus, the dominant pattern of VH, DH, and JH use remained essentially unchanged from fraction C to F.
Fraction C includes cells that are still at the intermediate DJ rearrangement stage, early pre-B cells that have rearranged their VDJ locus, and pre-B cells that express the pre-BCR (15, 33). Active translation of mRNA is associated with increased transcript abundance due to stabilization of the mRNA by polysomes, a process that is enhanced by B cell activation (34, 35, 36). If the successful assembly of the pre-BCR activates early pre-B cells, it is possible that assembly may similarly enhance μ mRNA abundance and thereby stabilize VH and JH preference. Testing of this hypothesis will require detailed sequence analysis of transcripts and mRNA message abundance from fraction C cells that have been separated on the basis of pre-BCR expression.
Our work confirms and extends a previous observation regarding the enhanced use of DH reading frame 2 at the earliest stages of B cell development (25). The primary mechanism postulated to limit use of reading frame 2 in the expressed repertoire is the ability of Dμ protein to create a pre-BCR complex (37, 38), which can then activate the allelic exclusion signal transduction pathway to prevent further VDJ rearrangement. We speculate that all of the components of this pathway may not be fully active in fraction B, allowing reading frame 2 DJ rearrangements to undergo VDJ recombination. Rearrangement at stages lacking an intact pre-BCR signal transduction complex may represent a mechanism by which suboptimal H chain V domains may enter the immature B cell repertoire.
Unlike humans, in which the average length of CDR-H3 decreases with development (37, 38), we observed an increase in the average length of CDR-H3 with development in mice. On average, however, mouse sequences are significantly shorter than human ones (9). Thus, even in fraction F, the mice lacked the longer CDR-H3 sequences that have been associated with enhanced self-reactivity in humans.
As with humans (19), the changes in sequence that contribute to the increase in length can be subtle. Although the only significant changes affecting overall CDR-H3 length were the enhanced retention of 5' JH terminal sequence and the increased loss of 5' DFL16.1 sequence, given the number of sequences analyzed it is possible that our analysis was underpowered to confirm the statistical significance of other more delicate differences. Adjustments to JH and DH length with development and with ontogeny are also observed in humans, macaques, and chimpanzees (19, 30). They appear to represent common mechanisms used to respond to potentially common selective forces.
The increase in average length of CDR-H3 with development occurred in association with decreased representation of outlier lengths. Although our analysis cannot distinguish whether this phenomenon results from the loss of outliers or positive selection for sequences with lengths closer to the final average length of CDR-H3 in the mature B cell fraction F, the net effect is to decrease the range of diversity, thereby focusing the mature B cell repertoire into what appears to be a preferred range.
As with length, the broad outline of amino acid preference was also established early in B lineage development. Preference for tyrosine and glycine in the CDR-H3 loop was already established in fraction B. With development, the representation of these two amino acids was enhanced. The adjustments to CDR-H3 length, the loss of 5' terminal DFL16.1 and the gain of 5' terminal JH sequence appear to play a role in this process. The sequences of the CDR-H3 loop portion of the four JH gene segments encode YWY, Y, W, and YYA, respectively. Thus, tyrosine represents 50% of the JH amino acids that can contribute to the CDR-H3 loop. Enhanced preservation of 5' JH sequence will enrich for tyrosine at the C terminus of the loop. In reading frame 1, the 5' terminus of DFL16.1 encodes tyrosine. The increased loss of 5' nucleotides with development has the effect of decreasing representation of tyrosine at the amino terminus of the DFL16.1-containing loops. In humans, there is an asymmetric distribution of tyrosine in the CDR-H3 loop, with the C terminus enriched for tyrosine (9). The tyrosine gradient increases with increasing loop length. A similar tyrosine gradient is not obvious in mice. However, DFL16.1 gene segment-containing sequences are among the longest CDR-H3 loops. The combination of loss of 5' DFL16.1-encoded tyrosine and gain of 3' JH-encoded tyrosine may act to prevent a relative enrichment for tyrosine at the amino terminus of the loop. An excess of tyrosine at the beginning of the CDR-H3 loop may be as undesirable in mice as in humans.
The average length of sequences that contain DQ52 converges toward the average for sequences in which the contribution of DH cannot be ascertained. Both of these types of short sequences are less likely to contain tyrosine in the CDR-H3 loop, although glycine is common (data not shown). The role of short, glycine-enriched, tyrosine-depleted CDR-H3 structures is unknown, but they appear clearly distinct from CDR-H3 structures that contain extensive sequence derived from the DFL and DSP gene segments, which are longer and enriched for tyrosine as well as glycine.
The presence of excess charged amino acids has been correlated with self-reactivity (22, 39, 40, 41), especially to dsDNA. Sequences containing charged amino acids have been reported to be sequentially purged from the repertoire (22). As with length and amino acid use, we observed an adjustment and focusing of the charge distribution of CDR-H3 during B cell development. This reflected not only a decrease in the average charge, but also a significant reduction in the contribution of highly charged or highly hydrophobic sequences.
Together, these data show that the essential architecture of the CDR-H3 repertoire, including gene segment use, length, structure, amino acid composition, and average hydrophobicity, is established early in B cell development before the surface expression of membrane-bound IgM. With development, many of these features are fine-tuned and focused into an apparent optimal range of lengths, charge, and amino acid composition.
These observations raise a series of open questions. DH reading frame 1 sequences encode the same amino acids that dominate CDR-H3 throughout bone marrow development. Does germline conservation of DH sequence serve as the deciding factor in dictating the composition of CDR-H3? If germline DH sequence plays a critical role in regulating CDR-H3, what role do individual DH sequences play in the development of the repertoire and the ability to mount humoral immune responses? What are the specific selective pressures and the mechanisms that serve to fine-tune the composition of CDR-H3 as the B cell population matures? And finally, what might be the functional consequences of violating these apparent constraints on CDR-H3 composition? Answering these questions will require targeted manipulation of the germline DH locus, which may shed new light on the role of the DH gene segment in navigating B cell developmental checkpoints and optimizing immune function.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 This work was supported by National Institutes of Health Grants AI42732 (to H.W.S.), AI48115 (to H.W.S.), and HD043327 (to R.L.S.), and P.E. Kempkes-Stiftung (to M.Z.).
2 I.I.I. and R.L.S. contributed equally to the preparation of this manuscript.
3 Address correspondence and reprint requests to Dr. Harry W. Schroeder, Jr., WTI 378, 1530 3rd Avenue South, Birmingham, AL 35294-3300. E-mail address: Harry.Schroeder{at}ccc.uab.edu
4 Abbreviation used in this paper: FR, framework region.
Received for publication December 22, 2004. Accepted for publication March 22, 2005.
References
Tonegawa, S. 1983. Somatic generation of antibody diversity. Nature 302: 575-581.
Alt, F. W., D. Baltimore. 1982. Joining of immunoglobulin heavy chain gene segments: implications from a chromosome with evidence of three D-J heavy fusions. Proc. Natl. Acad. Sci. USA 79: 4118-4122.
Rajewsky, K. 1996. Clonal selection and learning in the antibody system. Nature 381: 751-758.
Hood, L., D. Galas. 2003. The digital code of DNA. Nature 421: 444-448.
Nossal, G. J. V. 2003. The double helix and immunology. Nature 421: 440-444.
Janeway, C. A., Jr, R. Medzhitov. 2000. Innate immune recognition. Annu. Rev. Immunol. 20: 197-216.
Kabat, E. A., T. T. Wu, H. M. Perry, K. S. Gottesman, and C. Foeller. Sequences of Proteins of Immunological Interest. U.S. Department of Health and Human Services, Bethesda, pp. 1–2387.
Padlan, E. A. 1994. Anatomy of the antibody molecule. Mol. Immunol. 31: 169-217.
Zemlin, M., M. Klinger, J. Link, C. Zemlin, K. Bauer, J. A. Engler, H. W. Schroeder, Jr, P. M. Kirkham. 2003. Expressed murine and human CDR-H3 intervals of equal length exhibit distinct repertoires that differ in their amino acid composition and predicted range of structures. J. Mol. Biol. 334: 733-749.
Wu, T. T., G. Johnson, E. A. Kabat. 1993. Length distribution of CDRH3 in antibodies. Proteins Struct. Funct. Genet. 16: 1-7.
(Ivaylo I. Ivanov2,*, Robe)
To gain insight into the mechanisms that regulate the development of the H chain CDR3 (CDR-H3), we used the scheme of Hardy to sort mouse bone marrow B lineage cells into progenitor, immature, and mature B cell fractions, and then performed sequence analysis on VH7183-containing Cμ transcripts. The essential architecture of the CDR-H3 repertoire observed in the mature B cell fraction F was already established in the early pre-B cell fraction C. These architectural features include VH gene segment use preference, DH family usage, JH rank order, predicted structures of the CDR-H3 base and loop, and the amino acid composition and average hydrophobicity of the CDR-H3 loop. With development, the repertoire was focused by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity. Unlike humans, the average length of CDR-H3 increased during development. The majority of this increase came from enhanced preservation of JH sequence. This was associated with an increase in the prevalence of tyrosine. With an accompanying increase in glycine, a shift in hydrophobicity was observed in the CDR-H3 loop from near neutral in fraction C (–0.08 ± 0.03) to mild hydrophilic in fraction F (–0.17 ± 0.02). Fundamental constraints on the sequence and structure of CDR-H3 are thus established before surface IgM expression.
Introduction
In jawed vertebrates, the adaptive immune system is characterized by the exponential diversity of its Ag receptors (1, 2, 3, 4, 5). In contrast to the receptors of the innate immune system that bind relatively invariant pathogen-associated epitopes (6), diverse Ag receptor repertoires allow recognition of novel or divergent epitopes on pathogens or toxins.
The diversity of Ig, the BCR, is primarily the property of the V domains of the H and L chains (1, 2, 3, 4, 5). Diversity is asymmetrically distributed within each V domain (7, 8). In the primary sequence, three intervals of hypervariability, termed CDRs, are separated from each other by four relatively conserved intervals, termed framework regions (FRs).4 In the native form of the Ab, the FRs create a scaffold that supports the H and L chain CDRs. These CDRs are juxtaposed to form the Ag binding site. CDR-H1, -H2, -L1, and -L2 create the outside border; CDR-L3 forms the base; and CDR-H3 lies at the center of this Ag binding site. CDR-H1, -H2, -L1, and -L2 are entirely encoded by the V gene segment, and are thus initially restricted to germline sequence, whereas CDR-L3 and -H3 are created de novo by VLJL and VHDHJH joining, respectively. The inclusion of a D gene segment and the addition of nongermline-encoded nucleotides (N regions) vastly enhance the potential for both combinatorial and somatic diversity of CDR-H3. Enhanced diversity and a central position within the Ag binding site allow CDR-H3 to often play a critical role in the recognition of Ag (7, 8).
The composition of the functional CDR-H3 repertoire is biased in length, amino acid composition, predicted loop and base structure, and charge (9). The length distributions of both murine and human CDR-H3 form Gaussian curves with differing means, suggesting that each species achieves its own preferred CDR-H3 length (9, 10). The average hydrophobicity of the amino acids within the CDR-H3 loop also forms a Gaussian distribution centering on neutrality to mild hydrophilicity (11). This neutral-to-hydrophilic preference reflects enrichment for tyrosine and glycine residues in the CDR-H3 loop in excess of that which would be predicted by chance alone (9, 11, 12).
Construction of CDR-H3 begins early in B cell progenitors. The various defined stages of B cell development can be viewed, in part, as transitions through a series of checkpoints that test the assembly and function of the V domains (13, 14, 15). A number of studies have established that the CDR-H3 plays a crucial role in these selection processes (16, 17, 18). In humans, repertoire selection during B cell development is associated with a reduction in the distribution and mean length of the expressed CDR-H3 repertoire (19), and loss of highly charged or hydrophobic sequences (20, 21, 22). It has been proposed that the loss of longer sequences as well as those that are enriched for charged amino acids reflects a higher likelihood of self-reactivity in the Igs that bear them (22).
To gain insight into the mechanisms used to regulate the Ab repertoire, to determine when during development constraints on CDR-H3 composition are imposed, and to establish the extent to which murine development resembles that of humans, we sought to establish the pattern of CDR-H3 repertoire development in mice bearing an IgMa H chain repertoire. We used the scheme of Hardy (14) to sort bone marrow B lineage cells into progenitor, immature, and mature B cell fractions. We then cloned, sequenced, and deconstructed the CDR-H3 component of VH7183DJCμ transcripts. We chose to look at RNA message, as this is most representative of the expressed, and thus functional, Ig repertoire. We focused on the VH7183 family because its germline complement in IgHa alleles has been well defined (23); it represents a manageable 10% of the active repertoire (24); patterns of VH7183 use during ontogeny and development have been well established (23, 25, 26); and it contributes to both self and nonself reactivities (reviewed in Ref. 27).
We show in this study that the essential architecture of the CDR-H3 repertoire, including patterns of gene segment use, amino acid composition, charge, predicted base and loop structure, and average length, is established very early in B cell development, well before the expression of surface IgM. Development appears to focus the repertoire by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity.
Materials and Methods
Statistical analysis
Differences between populations were assessed, where appropriate, by Student’s t test, two tailed; Fisher’s exact test, two tailed; the χ² test; or the Levene test for the homogeneity of variance. Analysis was performed with JMP IN version 5.1 (SAS Institute). Means are accompanied by the SEM.
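As an illustration of how a value reported as mean ± SEM can be derived, the following minimal Python sketch computes both quantities from a list of CDR-H3 lengths. The input lengths are invented for illustration and are not data from this study; the original analysis was performed in JMP IN, not with this code.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample of at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two observations")
    mean = statistics.fmean(values)
    # SEM = sample standard deviation (n - 1 denominator) divided by sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem

# Hypothetical CDR-H3 lengths in codons, invented for illustration only.
lengths = [11, 12, 13, 12, 10, 14, 12, 13]
mean, sem = mean_sem(lengths)
print(f"{mean:.1f} ± {sem:.1f}")
```

The SEM shrinks with the square root of the sample size, which is one reason the fractions with more sequences tend to carry tighter error estimates.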
Results
Cells within the Hardy bone marrow fractions B-F were sorted using the gates shown in Fig. 2. A total of 707 transcripts was sequenced, of which 649 (92%) were unique. Of these, 619 (95%) contained in-frame, open rearrangements. By fraction, there were 66 sequences from B (pro-B), 192 sequences from C (early pre-B), 131 sequences from D (late pre-B), 121 sequences from E (immature B), and 109 sequences from F (mature B).
Preferential use of VH7183.10 is established early in B cell development
In accordance with previous studies by other investigators (23, 25, 26, 29), the prevalence of VH7183.1 (VH81X) declined with development (B→F; p < 0.001). VH81X represented 38% of the fraction B sequences, 23% of the fraction C sequences (B→C; p = 0.03), 7% of the fraction D sequences (C→D; p < 0.001), 10% of the fraction E sequences (D→E; p = 0.52), and 2% of the fraction F sequences (E→F; p = 0.02) (Fig. 3A).
VH7183.10 was the most commonly used VH7183 gene segment in fractions C, D, E, and F. VH7183.10 increased from 5% in fraction B to 21% in fraction C (p < 0.01), and then remained relatively unchanged in fractions D, E, and F (31, 20, and 22%; p = 1.0). Changes in the prevalence of VH gene segments other than VH7183.10 were also observed, but none of these individual changes achieved statistical significance.
Patterns of DH use remain relatively unchanged with development
Use of the various DH families did not undergo a significant change with development (Fig. 3B). Using a minimum of 5 nt of identity to assign germline DH origin, we identified members of the DSP and DFL families in 50 and 30% of the transcripts, respectively. DQ52 was used in 4–10% of the transcripts, and DST4 contributed to <2% of the sequences. Due to exonucleolytic nibbling and N addition, we were unable to identify a DH progenitor in the remaining transcripts. The DFL16.1 gene segment was the single most commonly used DH gene segment at all stages of development, representing 20% of sequences in all of the fractions.
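The assignment rule described above (at least 5 nt of identity to a germline DH) can be sketched as a longest-shared-run search. This is a minimal sketch under stated assumptions: it treats "identity" as a contiguous exact match, the function names are hypothetical, and the two germline sequences are placeholders rather than real mouse DH gene segments. It is not the authors' actual procedure.

```python
def longest_shared_run(query, germline):
    """Length of the longest contiguous stretch of identical nucleotides shared by two sequences."""
    best = 0
    prev = [0] * (len(germline) + 1)
    for q in query:
        curr = [0] * (len(germline) + 1)
        for j, g in enumerate(germline, start=1):
            if q == g:
                # Extend the run that ended at the previous position of both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def assign_dh(junction, germline_dh, min_identity=5):
    """Return the name of the best-matching DH segment, or None if no run reaches min_identity."""
    best_name, best_len = None, 0
    for name, seq in germline_dh.items():
        run = longest_shared_run(junction, seq)
        if run > best_len:
            best_name, best_len = name, run
    return best_name if best_len >= min_identity else None

# Placeholder sequences for illustration only; NOT real germline DH gene segments.
GERMLINE_DH = {
    "D_long": "TTTACCATGGTAGCAGCTACGTA",
    "D_short": "CTGACTGGCAT",
}

print(assign_dh("GCAAGACATGGTAGCAGGGACTAC", GERMLINE_DH))  # junction retaining an 11 nt stretch of D_long
print(assign_dh("GCAAGAGGGGCCTTTGATTAC", GERMLINE_DH))     # junction with no shared run of 5 nt
```

The first call returns `D_long` and the second returns `None`, mirroring the case in which nibbling and N addition leave no identifiable DH progenitor. A stricter threshold lowers the risk of chance matches at the cost of leaving more sequences unassigned.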
Increased prevalence of reading frame 2 in fraction B sequences
A shift in reading frame prevalence was observed during B cell development (Fig. 3C). We identified 53 fraction B, 152 fraction C, 88 fraction D, 92 fraction E, and 88 fraction F sequences that contained identifiable DFL or DSP gene segments. Reading frame 1 was the predominant reading frame at all stages of B cell development. However, use of reading frame 1 increased from 57% in fraction B to 70% in fraction C, 68% in fraction D, 78% in fraction E, and 78% in fraction F. Use of reading frame 3 decreased from 17% in fraction B to 12% in fraction C, 18% in fraction D, 11% in fraction E, and 9% in fraction F. Use of reading frame 2 began at 26% in fraction B, and then decreased to 18% in fraction C, 14% in fraction D, 11% in fraction E, and 13% in fraction F. The change in distribution of reading frames between fractions B and F was significant at p = 0.02.
Reading frame 3 typically encodes one or more termination codons. Functional sequences containing RF3 were significantly shorter (11.7 ± 0.3 codons) than those using RF1 (12.7 ± 0.1; p = 0.004) and RF2 (12.5 ± 0.3; p = 0.05). No significant differences were observed in the average length of RF1- and RF2-containing sequences (p = 0.60).
Increased use of JH1 in the transition to the immature B cell stage
A shift in rank order of JH use was observed in the fraction B→C transition (Fig. 3D). In fraction B, JH2 (31%) and JH3 (31%) were the most frequently used JH, followed by JH4 (27%) and JH1 (4%). In fractions C through F, JH4 was the most commonly used sequence (35–40%), followed by relatively equivalent use of JH3 (26–27%) and JH2 (21–28%), and then JH1 (7–16%). The rise in the use of JH1 from fractions B (4%), through C (7%), to D (16%) reached statistical significance (p < 0.03; χ²). Use of JH1 then remained relatively stable (14 and 12% in fractions E and F, respectively).
An increase in average CDR-H3 length with development
The average length of CDR-H3 increased during development from an average of 11.4 ± 0.3 in fraction B to 12.5 ± 0.2 in fraction F (p = 0.01) (Fig. 4A). Mouse DH sequences differ in length. To assess the contribution of the identity and length of the DH on the length of CDR-H3, we compared the average lengths of sequences that contained DFL16.1 with those with DSP gene family members, DQ52, or no identifiable D gene segment, respectively (Fig. 4A). DFL16.1 contains 23 nt, DFL16.2 and the DSP gene segments are two codons shorter with 17 nt, DST4 contains 16 nt, and DQ52 is 4 codons shorter with only 11 nt. For sequences containing DFL16.1, the average length increased from 12.6 ± 0.6 in fraction B to 13.6 ± 0.6 in fraction F (p = 0.29). For sequences containing DSP family members, the length increased from 11.7 ± 0.5 in fraction B to 12.9 ± 0.2 in fraction F (p = 0.01; Student’s t test). For sequences containing DQ52, the length increased from 9.0 ± 0.7 to 11.0 ± 0.5 (p = 0.05).
DFL16.1-containing sequences remained statistically similar in length distribution throughout development (Fig. 4A). With development, DSP-containing sequences converged in length to the DFL16.1 standard, whereas DQ52-containing sequences retained a length disadvantage. In fraction B, sequences containing DFL16.1 were 0.9 codons longer than those that contained DSP gene segments (p = 0.31) and 3.5 codons longer than those that contained DQ52 (p = 0.002). In fraction C, the length distribution of DFL16.1- and DSP-containing sequences significantly diverged (13.1 ± 0.4 vs 11.8 ± 0.3, respectively, p = 0.001). In fractions D and E, DSP-containing sequences increased in length to 12.5 ± 0.3 codons, while no significant changes in the length of DFL16.1-containing sequences were observed. By fraction F, the differences in DFL16.1- and DSP-containing sequences no longer achieved statistical significance (13.6 ± 0.6 vs 12.9 ± 0.2, respectively; p = 0.37). However, DFL16.1 sequences retained a length advantage over those with DQ52 (11.0 ± 0.5; p = 0.01).
The increase in length from fraction B to fraction F reflected, in part, a reduction in the prevalence of sequences whose CDR-H3 length was <9 aa (Fig. 5A). Due to the larger number of sequences, this was best observed in a comparison between fractions C and F. Of the 192 sequences in fraction C, 24 were 8 aa or less, whereas only 3 of 109 sequences were 8 aa or less in fraction F (p < 0.01). This also led to a significant narrowing in the variance of the distribution of lengths (p = 0.01; Levene).
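The comparison of short sequences between fractions C and F is a 2×2 count comparison of the kind Fisher's exact test addresses. The following Python sketch implements a two-tailed version from first principles, using the counts quoted above (24 of 192 vs 3 of 109). It assumes the common "sum of tables no more probable than the observed one" definition of the two-tailed p value; the statistical package used in the study may differ in detail, so the result should be read as a consistency check rather than a reproduction.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that x of the col1 items fall in row 1, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: fraction C, fraction F. Columns: CDR-H3 of 8 aa or less, longer than 8 aa.
p = fisher_exact_two_tailed(24, 192 - 24, 3, 109 - 3)
print(f"p = {p:.4f}")
```

With these counts the computed p value falls below 0.01, in line with the significance level reported in the text.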
The increase in CDR-H3 length reflected increased preservation of terminal JH sequence
Sequences containing identifiable DH gene segments were deconstructed to assess the contribution of VH, DH, and JH sequence, and of N addition and P junctions to the change in CDR-H3 length (Fig. 6). From fractions B to F, the average length increased by 3.4 nt (p = 0.007), or 1.1 codons. Minor increases in the contribution of the VH sequence (+0.2 nt) and N addition at the 5' and 3' junctions (+0.4 nt each), which reflected one-third of the increase, were observed. However, none of these rather subtle increases in length achieved statistical significance. In contrast, the contribution of JH germline sequence increased by 2.6 nt, or two-thirds of the total increase. On average, JH sequence contributed 10.7 ± 0.6 nt in fraction B and 13.3 ± 0.4 nt in fraction F (p < 0.001; Student’s t test).
The increase in average JH component length reflected both the increased use of JH1 and JH4 and enhanced preservation of 5' terminal nucleotides among sequences that used JH2 or JH3. The four JH sequences differ in length, with JH2 and JH3 contributing up to 14 nt each to CDR-H3, JH1 19 nt, and JH4 20 nt (Fig. 6). We examined the complete database of 619 unique sequences for the contribution of JH. The average length of the 22 sequences that contain JH1 and JH4 in fraction B was 12.8 ± 0.6 vs 12.9 ± 0.4 in 51 sequences from fraction F (p = 0.82). In these CDR-H3 intervals, JH1 and JH4 contributed 14.2 ± 0.9 nt in fraction B vs 15.4 ± 0.6 nt in fraction F (p = 0.25; Student’s t test). The average length of the 44 sequences containing JH2 and JH3 in fraction B was 10.7 ± 0.4 vs 12.1 ± 0.3 in 58 sequences from fraction F (p = 0.001). In these CDR-H3 intervals, JH2 and JH3 contributed 9.2 ± 0.6 nt in fraction B vs 10.9 ± 0.3 nt in fraction F (p = 0.0003).
The convergence in length distribution between DFL16.1- and DSP-containing sequences reflected a balance between an increase in the contribution of JH and loss of 5' terminal DFL16.1 sequence. At the point of greatest divergence in fraction C, the 44 sequences that contain DFL16.1 lost an average of 3.5 ± 0.4 5' terminal nucleotides vs 4.7 ± 0.3 for the 94 sequences that contain DSP (p = 0.04), giving DFL16.1 an average net gain of 1.2 germline nt relative to DSP. By fraction F, 5' terminal loss among the 20 sequences that contain DFL16.1 had increased to an average of 5.5 ± 0.6 nt vs 4.1 ± 0.4 for the 57 sequences that contain DSP (p = 0.05), yielding an average net loss of 1.4 nt relative to DSP. This is a net flip of 2.6 nt, or almost 1 codon.
Increased prevalence of tyrosine and glycine in fraction F
The general bias for tyrosine and glycine in the CDR-H3 loop was first apparent in fraction B and intensified during B cell development (Fig. 7). Of the 423 predicted aa in the CDR-H3 loops from fraction B, 135 (32%) were either tyrosine or glycine, whereas of the 814 predicted aa in the loops from fraction F, 324 (40%) were tyrosine or glycine (p = 0.025). Overall, the amino acid composition differed significantly between fraction B and fraction F (p = 0.01, χ², 19 degrees of freedom). Fraction B loops, for example, contained more hydrophobic amino acids (11% valine, isoleucine, or leucine) than fraction F (9%). However, these and other changes in the prevalence of individual amino acids did not achieve statistical significance.
Shifts in the prevalence of tyrosine and glycine, in length, and in N addition have been associated with changes in the distribution of the predicted structures of the CDR-H3 loop and base (9, 30). These types of changes have been observed as a function of ontogeny as well as of species origin. However, in the adult mouse sequences analyzed in this work, the stability in the relative prevalence of amino acid sequence, length, and N addition was accompanied by stability in the distribution of predicted base and loop structures (data not shown). No significant changes in predicted structure were observed from fractions B to F.
A shift in average hydrophobicity from near neutrality to hydrophilicity
To determine whether there was a global change in the distribution of hydrophobicity with development, we used a normalized Kyte-Doolittle scale (31, 32) to calculate the relative average hydrophobicity of the CDR-H3 loops (Fig. 4B). We observed a shift to neutrality from fraction B (–0.15 ± 0.04) to fraction C (–0.08 ± 0.03) that was followed by a shift toward hydrophilicity in fractions D, E, and F (–0.15 ± 0.03, –0.14 ± 0.03, and –0.17 ± 0.02, respectively). The shift in average hydrophobicity from fraction C to D is significant at p = 0.05, and the shift from fraction C to F is significant at p = 0.02.
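The average hydrophobicity of a loop is the mean of per-residue scale values over the loop sequence. The Python sketch below shows the calculation with the published Kyte-Doolittle hydropathy index. The normalization step (rescaling to zero mean and unit standard deviation across the 20 amino acids) is an assumption made for illustration; the exact normalization of Refs. 31 and 32 is not spelled out in this section, so the numbers produced here are not directly comparable to the values reported above. The function name is hypothetical.

```python
import statistics

# Kyte-Doolittle hydropathy index; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMED normalization: zero mean and unit SD across the 20 residues.
_values = list(KYTE_DOOLITTLE.values())
_MEAN = statistics.fmean(_values)
_SD = statistics.pstdev(_values)
NORMALIZED = {aa: (v - _MEAN) / _SD for aa, v in KYTE_DOOLITTLE.items()}

def average_hydrophobicity(loop):
    """Mean normalized hydropathy over the residues of a CDR-H3 loop (one-letter codes)."""
    return statistics.fmean(NORMALIZED[aa] for aa in loop.upper())

# Hypothetical loop sequences, invented for illustration only.
for loop in ("YYGSSY", "RDEKG", "LIVAG"):
    print(loop, round(average_hydrophobicity(loop), 2))
```

Because the score is an average, a single charged or strongly hydrophobic residue moves a short loop further from neutrality than a long one, which is worth keeping in mind when comparing fractions that differ in mean CDR-H3 length.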
As in the case of length, the variance in average hydrophobicity decreased with development. This shift in variance was significant at p = 0.03 between fractions B and F, and at p < 0.01 between fractions C and F. The change in variance is due, in part, to the loss of sequences at the extremes (Fig. 5B). In fraction B, there was one sequence whose average hydrophobicity score was greater than 0.6; there were 13 in fraction C, 3 in fraction D, 4 in fraction E, and none in fraction F. Similarly, there were 3 sequences in fraction B with an average hydrophobicity score of less than –0.6; there were 6 in fraction C, 6 in fraction D, 2 in fraction E, and none in fraction F. The difference in the prevalence of sequences at the extremes in fraction C (19 of 211) vs fraction F (0 of 110) is significant at p < 0.01. A comparison of highly charged sequences between surface IgM– pre/pro-B cells (fractions B, C, and D) and surface IgM+ B cells (fractions E and F) is also significant at p < 0.05.
Discussion
We have shown in this study that the major patterns of VH, DH, and JH use in the expressed repertoire are already established in progenitor B cells, and thus before the expression of membrane-bound IgM. The relative prevalence of the various DH families in fraction B remained relatively unchanged from fraction B through fraction F. The rank order of JH prevalence that was first established in fraction C was maintained through fraction F. Consistent with previous reports that focused on rearrangement preference (23, 25, 26, 29), we found a high frequency of transcripts using the VH81X (VH7183.1) gene segment in fraction B. The prevalence of VH81X then steadily diminished in the progression from fraction C to fraction F. A preference for VH7183.10 was established in fraction C, and its prevalence remained remarkably stable from fraction C to fraction F. Thus, the dominant pattern of VH, DH, and JH use remained essentially unchanged from fraction C to F.
Fraction C includes cells that are still at the intermediate DJ rearrangement stage, early pre-B cells that have rearranged their VDJ locus, and pre-B cells that express the pre-BCR (15, 33). Active translation of mRNA is associated with increased transcript abundance due to stabilization of the mRNA by polysomes, a process that is enhanced by B cell activation (34, 35, 36). If the successful assembly of the pre-BCR activates early pre-B cells, it is possible that assembly may similarly enhance μ mRNA abundance and thereby stabilize VH and JH preference. Testing of this hypothesis will require detailed sequence analysis of transcripts and mRNA message abundance from fraction C cells that have been separated on the basis of pre-BCR expression.
Our work confirms and extends a previous observation regarding the enhanced use of DH reading frame 2 at the earliest stages of B cell development (25). The primary mechanism postulated to limit use of reading frame 2 in the expressed repertoire is the ability of Dμ protein to create a pre-BCR complex (37, 38), which can then activate the allelic exclusion signal transduction pathway to prevent further VDJ rearrangement. We speculate that all of the components of this pathway may not be fully active in fraction B, allowing reading frame 2 DJ rearrangements to undergo VDJ recombination. Rearrangement at stages lacking an intact pre-BCR signal transduction complex may represent a mechanism by which suboptimal H chain V domains may enter the immature B cell repertoire.
In contrast to humans, in which the average length of CDR-H3 decreases with development (37, 38), mice showed an increase in the average length of CDR-H3 with development. On average, however, mouse sequences are significantly shorter than human ones (9). Thus, even in fraction F, the mice lacked the longer CDR-H3 sequences that have been associated with enhanced self-reactivity in humans.
As with humans (19), the changes in sequence that contribute to the increase in length can be subtle. Although the only significant changes affecting overall CDR-H3 length were the enhanced retention of 5' JH terminal sequence and the increased loss of 5' DFL16.1 sequence, given the number of sequences analyzed it is possible that our analysis was underpowered to confirm the statistical significance of other more delicate differences. Adjustments to JH and DH length with development and with ontogeny are also observed in humans, macaques, and chimpanzees (19, 30). They appear to represent common mechanisms used to respond to potentially common selective forces.
The increase in average length of CDR-H3 with development occurred in association with decreased representation of outlier lengths. Although our analysis cannot distinguish whether this phenomenon results from the loss of outliers or positive selection for sequences with lengths closer to the final average length of CDR-H3 in the mature B cell fraction F, the net effect is to decrease the range of diversity, thereby focusing the mature B cell repertoire into what appears to be a preferred range.
As with length, the broad outline of amino acid preference was also established early in B lineage development. Preference for tyrosine and glycine in the CDR-H3 loop was already established in fraction B. With development, the representation of these two amino acids was enhanced. The adjustments to CDR-H3 length, the loss of 5' terminal DFL16.1 and the gain of 5' terminal JH sequence appear to play a role in this process. The sequences of the CDR-H3 loop portion of the four JH gene segments encode YWY, Y, W, and YYA, respectively. Thus, tyrosine represents 50% of the JH amino acids that can contribute to the CDR-H3 loop. Enhanced preservation of 5' JH sequence will enrich for tyrosine at the C terminus of the loop. In reading frame 1, the 5' terminus of DFL16.1 encodes tyrosine. The increased loss of 5' nucleotides with development has the effect of decreasing representation of tyrosine at the amino terminus of the DFL16.1-containing loops. In humans, there is an asymmetric distribution of tyrosine in the CDR-H3 loop, with the C terminus enriched for tyrosine (9). The tyrosine gradient increases with increasing loop length. A similar tyrosine gradient is not obvious in mice. However, DFL16.1 gene segment-containing sequences are among the longest CDR-H3 loops. The combination of loss of 5' DFL16.1-encoded tyrosine and gain of 3' JH-encoded tyrosine may act to prevent a relative enrichment for tyrosine at the amino terminus of the loop. An excess of tyrosine at the beginning of the CDR-H3 loop may be as undesirable in mice as in humans.
The average length of sequences that contain DQ52 converges toward the average for sequences in which the contribution of DH cannot be ascertained. Both of these types of short sequences are less likely to contain tyrosine in the CDR-H3 loop, although glycine is common (data not shown). The role of short, glycine-enriched, tyrosine-depleted CDR-H3 structures is unknown, but they appear clearly distinct from CDR-H3 structures that contain extensive sequence derived from the DFL and DSP gene segments, which are longer and enriched for tyrosine as well as glycine.
The presence of excess charged amino acids has been correlated with self-reactivity (22, 39, 40, 41), especially to dsDNA. Sequences containing charged amino acids have been reported to be sequentially purged from the repertoire (22). As with length and amino acid use, we observed an adjustment and focusing of the charge distribution of CDR-H3 during B cell development. This reflected not only a decrease in the average charge, but also a significant reduction in the contribution of highly charged or highly hydrophobic sequences.
Together, these data show that the essential architecture of the CDR-H3 repertoire, including gene segment use, length, structure, amino acid composition, and average hydrophobicity, is established early in B cell development before the surface expression of membrane-bound IgM. With development, many of these features are fine-tuned and focused into an apparent optimal range of lengths, charge, and amino acid composition.
These observations raise a series of open questions. DH reading frame 1 sequences encode the same amino acids that dominate CDR-H3 throughout bone marrow development. Does germline conservation of DH sequence serve as the deciding factor in dictating the composition of CDR-H3? If germline DH sequence plays a critical role in regulating CDR-H3, what role do individual DH sequences play in the development of the repertoire and the ability to mount humoral immune responses? What are the specific selective pressures and the mechanisms that serve to fine-tune the composition of CDR-H3 as the B cell population matures? And finally, what might be the functional consequences of violating these apparent constraints on CDR-H3 composition? The answer to these questions will require targeted manipulation of the germline DH locus, which may shed new light on the role of the DH gene segment in navigating B cell developmental checkpoints and optimizing immune function.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 This work was supported by National Institutes of Health Grants AI42732 (to H.W.S.), AI48115 (to H.W.S.), and HD043327 (to R.L.S.), and P.E. Kempkes-Stiftung (to M.Z.).
2 I.I.I. and R.L.S. contributed equally to the preparation of this manuscript.
3 Address correspondence and reprint requests to Dr. Harry W. Schroeder, Jr., WTI 378, 1530 3rd Avenue South, Birmingham, AL 35294-3300. E-mail address: Harry.Schroeder{at}ccc.uab.edu
4 Abbreviation used in this paper: FR, framework region.
Received for publication December 22, 2004. Accepted for publication March 22, 2005.
References
Tonegawa, S. 1983. Somatic generation of antibody diversity. Nature 302: 575-581.
Alt, F. W., D. Baltimore. 1982. Joining of immunoglobulin heavy chain gene segments: implications from a chromosome with evidence of three D-J heavy fusions. Proc. Natl. Acad. Sci. USA 79: 4118-4122.
Rajewsky, K. 1996. Clonal selection and learning in the antibody system. Nature 381: 751-758.
Hood, L., D. Galas. 2003. The digital code of DNA. Nature 421: 444-448.
Nossal, G. J. V. 2003. The double helix and immunology. Nature 421: 440-444.
Janeway, C. A., Jr, R. Medzhitov. 2000. Innate immune recognition. Annu. Rev. Immunol. 20: 197-216.
Kabat, E. A., T. T. Wu, H. M. Perry, K. S. Gottesman, and C. Foeller. Sequences of Proteins of Immunological Interest. U.S. Department of Health and Human Services, Bethesda, pp. 1–2387.
Padlan, E. A. 1994. Anatomy of the antibody molecule. Mol. Immunol. 31: 169-217.
Zemlin, M., M. Klinger, J. Link, C. Zemlin, K. Bauer, J. A. Engler, H. W. Schroeder, Jr, P. M. Kirkham. 2003. Expressed murine and human CDR-H3 intervals of equal length exhibit distinct repertoires that differ in their amino acid composition and predicted range of structures. J. Mol. Biol. 334: 733-749.
Wu, T. T., G. Johnson, E. A. Kabat. 1993. Length distribution of CDRH3 in antibodies. Proteins Struct. Funct. Genet. 16: 1-7.
(Ivaylo I. Ivanov2,*, Robe)