Nucleic Acids Research, 2005, issue We
GlyProt: in silico glycosylation of proteins
     German Cancer Research Center Heidelberg, Central Spectroscopy–Molecular Modeling, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany

    *To whom correspondence should be addressed. Tel: +49 6221 42 4541; Fax: +49 6221 42 3669; Email: a.bohne@dkfz-heidelberg.de

    ABSTRACT

    GlyProt (http://www.glycosciences.de/glyprot/) is a web-based tool that enables meaningful N-glycan conformations to be attached to all the spatially accessible potential N-glycosylation sites of a known three-dimensional (3D) protein structure. The probabilities of physicochemical properties such as mass, accessible surface and radius of gyration are calculated. The purpose of this service is to provide rapid access to reliable 3D models of glycoproteins, which can subsequently be refined by using more elaborate simulations and validated by comparing the generated models with experimental data.

    INTRODUCTION

    The human genome appears to encode no more than 25 000 proteins (1). This relatively small number of genes compared with the genome of other species has been one of the big surprises to come out of the Human Genome Project. A major challenge is to understand how post-translational events affect the activities and functions of these proteins in relation to health and disease. Among these, glycosylation is by far the most frequent; more than half of all the proteins in the human body have glycan molecules attached (2,3). Glycosylated proteins are ubiquitous components of extracellular matrices and cellular surfaces. Their oligosaccharide moieties are implicated in a wide range of cell–cell and cell–matrix recognition events. N-glycans covalently connected to proteins constitute highly flexible molecules. Therefore, only a small number of glycan structures are available for which sufficient electron density for an entire oligosaccharide chain can be detected (4). Unambiguous structure determination based on NMR-derived geometric constraints alone is often not possible (5). Time-consuming computational approaches such as Monte Carlo calculations and molecular dynamics simulations have been widely used to explore the conformational space accessible to complex carbohydrates (6,7).

    For reasons that are not well understood, not all Asn-X-Ser/Thr sequons are glycosylated. Unfortunately, the unambiguous determination of occupied N-glycosylation sites is experimentally demanding and can vary between different cellular locations. The aims of GlyProt are (i) to evaluate whether a potential N-glycosylation site is spatially accessible, (ii) to generate reasonable three-dimensional (3D) models of glycoproteins with user-definable glycan moieties and (iii) to provide some evidence on how the physicochemical parameters can change between the varying glycoforms of a protein.

    MATERIALS AND METHODS

    The 3D structure of a protein in Protein Data Bank (PDB) format is required as input (see dataflow given in Figure 1). The protein structure can be either taken directly from the PDB or uploaded from a local computer. Potential N-glycosylation sites (sequon: Asn-X-Ser/Thr, where X is not Pro) are automatically detected and highlighted using the one-letter amino acid code. In cases where experimental coordinates with already attached glycans are provided, the internal coordinates (distance between the N of the Asn side chain and the C1 of the attached β-D-GlcpNAc and the torsion angles determining the orientation of the glycan moiety) are displayed.

    Figure 1 Dataflow of GlyProt.

    Orientation of the N-glycans

    The orientation of the attached N-glycan relative to the glycosylation site is described by the four consecutive torsion angles χ1, χ2, ψ and φ (for definition see Table 1). It is well known from the analysis of the experimentally available 3D structures of glycoproteins (8,9) that preferred orientations of the glycan moiety relative to the protein exist (Figure 2). The current version of the PDB contains nearly 3000 N-glycan chains. Conformational maps indicating the populated areas for all four torsion angles can be easily obtained using the GlyTorsion tool (http://www.glycosciences.de/glytorsion/) from the Carbohydrate Structure Suite (10).
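    For readers who want to measure such torsion angles on their own coordinates, the following sketch computes a signed dihedral angle from four points. It uses the standard vector formula and is not taken from GlyProt or GlyTorsion; which four atoms define each angle is given in Table 1 and is not reproduced here, so the points in the example are arbitrary.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Positive values mean a clockwise rotation of the front bond onto
    the rear bond when looking from p1 towards p2.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Arbitrary test points forming a right-angle torsion
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```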

    Table 1 Definition of torsion angles defining the orientation of the glycan moiety relative to the protein and hierarchy of applied torsion angles

    Figure 2 Statistical analysis of the PDB for torsion angles determining the orientation of the glycan moiety relative to the protein.

    It is assumed that the Man3 N-glycan core exhibits one dominant, relatively rigid conformation. This assumption is supported by the analysis of experimentally determined torsion angles for the corresponding glycosidic linkages in the PDB (Table 2 and Figure 3). Only the α1–6 linkage exhibits two significantly populated conformations, whereas the other three linkages each exhibit only one highly populated conformation.

    Table 2 Torsion angles for glycosidic linkages of the N-glycan core region

    Figure 3 Statistical analysis of the PDB for glycosidic torsion angles determining the conformation of the N-glycan core.

    To evaluate whether a potential glycosylation site is spatially accessible, a program written in C is used to connect the Man3 N-glycan core to the protein and test all possible angle sets. The frequency of occurrence of the four relevant torsion angles (Table 1) is used to orient the N-glycan core. Next, the program evaluates whether atoms of the attached glycan moiety overlap with the protein. If spatial overlaps are detected, the model is rejected and the next most frequently observed orientation of the glycan moiety is applied. Table 1 lists the values of the four relevant torsion angles and the succession in which they are applied. This procedure is repeated until a structure with no or minor overlap has been found. If all orientations listed in Table 1 have been applied and all resulting glycoprotein structures exhibit overlapping atoms, it is assumed that the glycosylation site is spatially inaccessible and therefore cannot be glycosylated.
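    The procedure described above can be sketched as a loop over candidate orientations in order of decreasing frequency. This is a minimal Python illustration, not the C program used by GlyProt: the names `has_clash` and `first_accessible_orientation`, the callable `place_glycan` and the 2.0 Å distance cutoff are all assumptions, and the original criterion for "no or minor overlap" is not specified in the text.

```python
import math

def has_clash(glycan_atoms, protein_atoms, min_dist=2.0):
    """True if any glycan atom is closer than min_dist to a protein atom.

    min_dist (in angstroms) is a hypothetical cutoff chosen for illustration.
    The all-pairs scan is simple but quadratic in the number of atoms.
    """
    return any(math.dist(g, p) < min_dist
               for g in glycan_atoms for p in protein_atoms)

def first_accessible_orientation(orientations, place_glycan,
                                 protein_atoms, min_dist=2.0):
    """Try orientations from most to least frequent; return the first
    torsion-angle set without a clash, or None if every set clashes.

    orientations: list of (angles, frequency) pairs.
    place_glycan: hypothetical callable mapping a torsion-angle set to
                  the coordinates of the attached glycan core atoms.
    """
    ranked = sorted(orientations, key=lambda o: o[1], reverse=True)
    for angles, _frequency in ranked:
        glycan_atoms = place_glycan(angles)
        if not has_clash(glycan_atoms, protein_atoms, min_dist):
            return angles
    return None  # site is treated as spatially inaccessible
```

    Returning `None` corresponds to the case in which all listed orientations produce overlapping atoms, so that the site is assumed to be inaccessible.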

    Construction of user-definable glycoproteins

    For each spatially accessible potential N-glycosylation site three options are offered for selecting the N-glycan to be connected. The user can

    (i) select the type of N-glycan (e.g. oligomannose rich, complex, hybrid, very large); by default a typical structure for each class is taken;

    (ii) select an N-glycan from a database of >1000 structures (Figure 4) constructed using SWEET-II (11) and optimized using the TINKER MM3 force field (http://dasher.wustl.edu/tinker/); the database is searchable by N-glycan composition;

    (iii) construct the desired N-glycan using SWEET-II by user input of the desired structure using the extended IUPAC nomenclature.

    If the coordinates provided already contain attached N-glycans, the user can either accept this orientation or use the procedure described above to align the glycan moiety.

    Figure 4 Input spreadsheet (top) used to query the database, which contains >1000 3D structures of N-glycans (bottom). The user indicates the desired glycoform by checking the corresponding selection box.

    RESULTS

    The atomic coordinates of the desired glycoprotein are given in PDB format, and they are immediately displayed using the Java applet Jmol (http://jmol.sourceforge.net/). The coordinates can be downloaded and used as input for many 3D visualization programs (see Figure 5). In addition, some physicochemical parameters for the non-glycosylated and the glycosylated protein are displayed to provide a general delineation of the changes caused by the selected glycoform (see Table 3). The program Surface Racer (12) is used to calculate the solvent accessible surface of both molecules. The generally observed increase of the polar surface area as a result of glycosylation reflects the well-known experience that glycoproteins exhibit higher solubility.
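    As an illustration of one of the reported parameters, the radius of gyration can be computed directly from atomic coordinates and masses. The sketch below uses the usual mass-weighted definition; whether GlyProt applies mass weighting is not stated in the text, so this is an assumption, and the function name is hypothetical.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration.

    coords: list of (x, y, z) tuples; masses: matching list of atomic
    masses. The result has the same length unit as the coordinates.
    """
    total = sum(masses)
    # Centre of mass, one coordinate axis at a time
    com = tuple(sum(m * c[i] for c, m in zip(coords, masses)) / total
                for i in range(3))
    # Mass-weighted mean squared distance from the centre of mass
    msd = sum(m * sum((c[i] - com[i]) ** 2 for i in range(3))
              for c, m in zip(coords, masses)) / total
    return math.sqrt(msd)

if __name__ == "__main__":
    # Two equal masses 2 units apart: each lies 1 unit from the centre
    print(radius_of_gyration([(-1, 0, 0), (1, 0, 0)], [1.0, 1.0]))
```

    Evaluating the function once for the protein alone and once for the glycoprotein coordinates gives the kind of before/after comparison summarized in Table 3.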

    Figure 5 User interface (top) to select the desired glycoform for each glycosylation site. Visualization (bottom) of the constructed glycoprotein. The protein part is given as a cartoon representation; the glycan part as a spacefill model.

    Table 3 Comparison of some characteristic physicochemical properties of the pure Influenza A Subtype N9 Neuraminidase (14) and the constructed glycoform

    DISCUSSION

    GlyProt enables rapid Internet-based access to reasonable 3D models of glycoproteins. Although it is estimated that >50% of all proteins are glycosylated (2,3), only 5% of all PDB entries have attached glycan chains (4). Moreover, only a few entries in the PDB contain X-ray diffraction data with sufficient electron density to detect an entire oligosaccharide chain. The 3D models of glycoproteins constructed with GlyProt can provide some evidence on which areas of a protein are covered by a certain glycoform and whether, for example, a binding site is covered so that the biological activity of a protein may be influenced.

    Simply because of their large size and hydrophilicity, glycans can alter the physicochemical properties of a glycoprotein, making it more soluble, reducing backbone flexibility and thus leading to increased protein stability, protecting it from proteolysis, and so on. The calculation of some characteristic physicochemical parameters will help in the evaluation and explanation of the varying properties of different glycoforms. Of the therapeutic proteins on the market, 60% are glycoproteins (13). Often, the removal of N-glycans results in a protein with a very short half-life and virtually no activity in vivo (13).

    A comprehensive evaluation of the impact of varying glycoforms on protein function is hampered by the high conformational flexibility of glycan structures. Based on the statistical analysis of experimentally known glycan conformations, GlyProt constructs one reasonable conformation out of many possible ones. However, a more realistic analysis would require the complete conformational space that is accessible to a glycan at a given glycosylation site to be scanned. Therefore, we intend to expand the GlyProt service with an option allowing the exploration of the conformational space accessible to an N-glycan that is covalently bound to a specific glycosylation site. A similar approach has already been successfully applied to rapidly generate a representative ensemble of conformations of single N-glycan molecules (6). This algorithm is based on a comprehensive set of conformations of N-glycan fragments that were derived from molecular dynamics simulations. However, this approach would assume a protein conformation that remains unchanged through the attachment of varying glycans. In order to allow conformational changes of the protein backbone, only force-field-based, time-consuming simulation approaches such as molecular dynamics with inclusion of explicit water molecules would be appropriate.

    ACKNOWLEDGEMENTS

    The development of GlyProt is funded by a grant from the German Research Council (Deutsche Forschungsgemeinschaft, DFG) within the digital library program. Funding to pay the Open Access publication charges for this article was provided by DFG.

    REFERENCES

    1. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945.

    2. Apweiler, R., Hermjakob, H., Sharon, N. (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta, 1473, 4–8.

    3. Ben-Dor, S., Esterman, N., Rubin, E. (2004) Biases and complex patterns in the residues flanking protein N-glycosylation sites. Glycobiology, 14, 95–101.

    4. Lütteke, T., Frank, M., von der Lieth, C.W. (2004) Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr. Res., 339, 1015–1020.

    5. Imberty, A. and Perez, S. (2000) Structure, conformation, and dynamics of bioactive oligosaccharides: theoretical approaches and experimental validations. Chem. Rev., 100, 4567–4588.

    6. Frank, M., Bohne-Lang, A., Wetter, T., von der Lieth, C.W. (2002) Rapid generation of a representative ensemble of N-glycan conformations. In Silico Biol., 2, 427–439.

    7. Woods, R.J. (1998) Computational carbohydrate chemistry: what theoretical methods can tell us. Glycoconj. J., 15, 209–216.

    8. Imberty, A. and Perez, S. (1995) Stereochemistry of the N-glycosylation sites in glycoproteins. Protein Eng., 8, 699–709.

    9. Petrescu, A.J., Milac, A.L., Petrescu, S.M., Dwek, R.A., Wormald, M.R. (2004) Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure, and folding. Glycobiology, 14, 103–114.

    10. Lütteke, T., Frank, M., von der Lieth, C.W. (2005) Carbohydrate structure suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res., 33, D242–D246.

    11. Bohne, A., Lang, E., von der Lieth, C.W. (1998) W3-SWEET: carbohydrate modeling by internet. J. Mol. Model., 4, 33–43.

    12. Tsodikov, O.V., Record, M.T., Jr, Sergeev, Y.V. (2002) A novel computer program for fast exact calculation of accessible and molecular surface areas and average surface curvature. J. Comput. Chem., 23, 600–609.

    13. Gerngross, T.U. (2004) Advances in the production of human therapeutic proteins in yeasts and filamentous fungi. Nat. Biotechnol., 22, 1409–1414.

    14. White, C.L., Janakiraman, M.N., Laver, W.G., Philippon, C., Vasella, A., Air, G.M., Luo, M. (1995) A sialic acid-derived phosphonate analog inhibits different strains of influenza virus neuraminidase with different efficiencies. J. Mol. Biol., 245, 623–634.

    (Andreas Bohne-Lang* and Claus-Wilhelm vo)