The years 1954-1955 represented a milestone in the study of collagen structure. Ramachandran1-3 derived the concept of the correct type of triple-helical structure for collagen by applying the newly developed fibre diffraction theory4 to improved collagen fibre diffraction patterns with stereochemical considerations. It was Cohen and Bear5 who first recognised that the fibre diffraction pattern indicated collagen had a helical structure, but the elucidation of the correct structure required the additional insight that the unusually high glycine (Gly) and imino acid contents of collagen must have a structural basis1. The original proposal of three closely packed staggered helical chains, each with three residues per turn, was modified by a supercoiled twist to fit well-orientated diffraction patterns1-3. Independently, in 1955, two other groups proposed related triple-helical structures. Model building based on the structure of polyglycine II led Rich and Crick6,7 to a collagen triple-helical conformation built with more stringent stereochemical criteria. At King’s College, the recording and analysis of highly orientated collagen X-ray patterns, together with the elucidation of the polyproline II structure, led to the proposal of a similar supercoiled structure8. These exciting years have been reviewed in depth9,10, and a recent perspective has been published11-13.
Hydrogen bonding in the triple-helix has been a source of controversy. In terms of solvent accessibility and interactions, formation of the collagen triple-helix is analogous to the formation of secondary structures such as the α-helix and β-sheet conformations14. Pauling and Corey15 proposed that the formation of all possible peptide hydrogen bonds was an important factor in stabilising helical conformations, and a similar maximisation of hydrogen bond formation might be expected for the triple-helix. But from the beginning1,3, it was clear that no intrachain hydrogen bonds were possible in the triple-helix and that it would not be possible to have all backbone NH and CO groups involved in interchain hydrogen bonds. In particular, the NH group of the residue in the Y position points out of the triple-helix, into the surrounding solvent, with no possibility of interaction with other peptide groups. Anxious to maximise intramolecular hydrogen bonding as a stabilising force in the triple-helix, Ramachandran and Kartha1,3 proposed that two interchain NH…O bonds could be formed for every three residues, one involving the Gly NH group (to the carbonyl of the X residue) and the second utilising the NH group of the residue in the X position whenever this was not an imino acid. Rich and Crick7,16 disagreed with this proposal, and found through calculations that only one interchain hydrogen bond, Gly NH…O=C(X), could be formed and that disallowed short contact distances were present if the model was modified to allow a second interchain hydrogen bond. Later, Ramachandran and Chandrasekharan17 suggested that a second interchain hydrogen bond involving NH groups of the residue in the X position could be mediated by a water molecule. Ramachandran also suggested the possibility of interchain CH(Gly)…O=C bonds in polyglycine II and in collagen18,19 and was one of the researchers who proposed that the stabilising effect of hydroxyproline was related to water-mediated hydrogen bonding20-22.
Calorimetric studies on collagen generated evidence that the major stabilising force of the triple-helix is enthalpic, derived from a dominant contribution of hydrogen bonding22. Privalov’s perceptive analysis of hydrogen bonding in the collagen triple-helix helped resolve several contentious issues by laying out the experimental evidence for water-mediated hydrogen bonding.
Recent advances in the understanding of collagen structure have been made through studies on triple-helical peptides of defined sequences. The new information obtained from these studies concerning the basic collagen conformation, its hydrogen bonding, and its sequence-dependent features will be summarised here.
Peptide models of collagen
Studies on model peptides played a critical role in the original determination of the triple-helix conformation and in characterising its features. The structures of both polyglycine II and polyproline II in the 1950s gave insight into the molecular conformation of individual chains, intermolecular hydrogen bonding, and molecular packing. Studies on poly(Gly-Pro-Hyp) and other repeating polytripeptides clarified requirements for triple-helices and their relative stabilities in solution23-25.
With the advent of solid-phase peptide synthesis, the ability to create triple-helical peptides of defined length and sequence led to more quantitative studies and allowed modelling of specific collagen sequences, including cell-binding sites, heparin-binding sites, an epitope for a monoclonal antibody and the collagenase cleavage site26-28. The most stabilising sequence is Gly-Pro-Hyp, and recent peptides have used repeating Gly-Pro-Hyp sequences to promote formation of the triple-helix structure. An additional approach to stabilisation has been cross-linking of three chains, through dilysyl moieties, N-terminal links, or Kemp triacid27-30. One successful and striking example has been the synthesis of a cross-linked heterotrimer which models the type I collagenase cleavage site, and is a substrate for that enzyme31. Fortunately, some peptides have proved to have the high solubility needed for multi-dimensional NMR studies32, and several have formed crystals suitable for X-ray crystallography.
First crystal structure
The first high-resolution triple-helical structure was obtained by X-ray crystallography for a peptide containing repeating Pro-Hyp-Gly motifs with a single Gly to Ala substitution near the centre33. Repeating Pro-Hyp-Gly sequences provided the first and most stable examples of defined triple-helical peptides34, and the replacement of one Gly by an Ala decreased the thermal stability dramatically, from 60°C for (Pro-Hyp-Gly)10 to 30°C35. Unperturbed Pro-Hyp-Gly regions at both ends of the peptide were similar to those predicted from fibre diffraction models, except for a small difference in symmetry (7/2 for the peptides vs. 10/3 for the collagen helix36). The observed distances were consistent with (Gly) NH…CO(Pro) hydrogen bonds, except near the site of the Ala substitution, where a water-mediated linkage is present. In addition, the crystal structure showed evidence for the existence of two repetitive patterns of C-H…O=C hydrogen bonds, with one set occurring between Gly Cα and C=O groups from Gly and Pro in the other chains and the second set connecting Hyp Cα atoms in one chain with Pro C=O groups in the neighbouring chain37. An extensive and ordered water network was observed, involving hydrogen bonds with all available carbonyl and Hyp hydroxyl groups33,38. Such water-mediated bonding joins carbonyl and/or Hyp groups within one chain, between chains within one molecule, and in less regular arrangements, between molecules. The key participation of the Hyp OH groups in this ordered hydration network is likely to be the basis of the stabilising role of Hyp in the triple-helix. The recent high-resolution structure of (Pro-Pro-Gly)10 again shows a highly ordered hydration network, involving the peptide carbonyl groups, surprisingly similar to that seen in the Pro-Hyp-Gly region of the Gly to Ala peptide, even though Hyp is absent39.
Comparison of the hydration patterns in Pro-Pro-Gly and Pro-Hyp-Gly peptides indicates ‘that the existence of localised water surrounding the Y position imino acid is itself not dependent on hydroxyproline, but the presence of hydroxyproline induces the formation of additional water bridges, greater localisation, and a more extensive hydration network’39.
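The 7/2 and 10/3 symmetry labels quoted for the peptide crystal and the fibre model can be turned into more familiar numbers with simple arithmetic. The sketch below assumes the usual reading of an n/m helix as n residues in m turns; the function name `helix_parameters` is illustrative only and does not come from the cited work.

```python
from fractions import Fraction


def helix_parameters(units: int, turns: int) -> tuple[Fraction, Fraction]:
    """Return (residues per turn, twist per residue in degrees) for an n/m helix.

    Assumes the n/m label means `units` residues in `turns` complete turns.
    """
    if units <= 0 or turns <= 0:
        raise ValueError("units and turns must be positive integers")
    residues_per_turn = Fraction(units, turns)
    twist_per_residue = Fraction(360 * turns, units)
    return residues_per_turn, twist_per_residue


# Compare the two symmetries mentioned in the text.
for label, (n, m) in {"peptide crystal (7/2)": (7, 2),
                      "fibre model (10/3)": (10, 3)}.items():
    per_turn, twist = helix_parameters(n, m)
    print(f"{label}: {float(per_turn):.2f} residues/turn, "
          f"{float(twist):.1f} degrees/residue")
```

On this reading the two symmetries differ by only a few degrees of twist per residue, which fits the description of the difference as small.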
It is important to note that the crystallographic structure itself cannot directly give information on molecular stability. The structure results from molecular packing as well as intramolecular interactions, and there is no way to estimate the stabilising effect of the visualised hydration networks or other interactions from the structure alone. However, the ordered hydration network involving Hyp visualised in these triple-helical peptides is strikingly similar to that predicted on thermodynamic grounds22, where the thermodynamic data do provide evidence for stabilisation by such hydrogen bonding.
Recent reports from Raines and his group suggest that the mechanism of Hyp stabilisation is likely to be related to the inductive effect of an electron-withdrawing group on the pyrrolidine ring, rather than to hydration40,41. This is based on the observation that the stabilisation of the triple-helix follows the order fluoroproline > Hyp > Pro in the Y position of peptides, and that this stability order follows the observed trans:cis imide bond ratio.
Since fluoroproline is unlikely to play an important role in a hydration network, the inductive effect is suggested as the principal means of stabilisation. However, it is possible that the high degree of stability of (Pro-Fluoroproline-Gly)10 derives from factors different from those that stabilise (Pro-Hyp-Gly)10 and collagen, e.g. the unfavourable nature of a monomer with fluoroproline in aqueous solution compared with the trimer.
Interestingly, the Gly to Ala peptide molecules were packed in a quasi-hexagonal arrangement in the crystal, with intermolecular distances comparable with those seen in collagen fibrils in tendon or skin33.
No direct contacts were made with atoms in adjacent molecules; the water-mediated bridges appeared to be the determining factor for intermolecular packing. Recent studies on collagen fibrils using osmotic pressure and Raman spectroscopy have emphasised the critical nature of water in the packing of molecules42.
Second hydrogen bond
Gly-Pro-Hyp tripeptides promote triple-helix formation and are highly stabilising, but more varied sequences are needed in collagen to mediate biological functions and to modulate stability to just above physiological temperature. NMR studies and the recent crystallographic solution of a peptide including a nine-residue region from type III collagen (residues 785-793), just C-terminal to the unique collagenase site, give a view into less stabilising and more functional triple-helical regions. Peptide T3-785, a 30-mer of sequence Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly-Ile-Thr-Gly-Ala-Arg-Gly-Leu-Ala-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly, forms a stable triple-helix, with a Tm near 18°C, far below the 60°C Tm of the 30-mer (Pro-Hyp-Gly)10.
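As a quick consistency check on the sequence quoted above, the short sketch below splits T3-785 into tripeptide units, confirms that Gly closes every triplet, and picks out the triplets that contain non-imino residues. The helper names (`split_triplets`, `gly_every_third`) are illustrative and are not taken from the cited studies; the peptide is written in the X-Y-Gly frame exactly as in the text.

```python
# Sequence of peptide T3-785 as quoted in the text (three-letter codes).
T3_785 = (
    "Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly-"
    "Ile-Thr-Gly-Ala-Arg-Gly-Leu-Ala-Gly-"
    "Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly-Pro-Hyp-Gly"
)

IMINO_ACIDS = {"Pro", "Hyp"}


def split_triplets(sequence: str) -> list[tuple[str, ...]]:
    """Split a dash-separated sequence into consecutive (X, Y, Gly) triplets."""
    residues = sequence.split("-")
    if len(residues) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [tuple(residues[i:i + 3]) for i in range(0, len(residues), 3)]


def gly_every_third(triplets: list[tuple[str, ...]]) -> bool:
    """True if every triplet ends in Gly, as the triple-helix requires."""
    return all(triplet[2] == "Gly" for triplet in triplets)


triplets = split_triplets(T3_785)
# Triplets in which X or Y is not an imino acid: the type III collagen insert.
central = [t for t in triplets if not set(t[:2]) <= IMINO_ACIDS]

print(len(T3_785.split("-")), "residues,", len(triplets), "triplets")
print("Gly at every third position:", gly_every_third(triplets))
print("Non-imino triplets:", central)
```

The three non-imino triplets correspond to the nine-residue region from type III collagen, flanked by Pro-Hyp-Gly repeats on both sides.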
15N-enriched residues were incorporated at specific sites in the peptide, and NMR studies showed that the entire peptide was in a stable triple-helix conformation, with a rigid chain as determined from relaxation parameters43,44.
Hydrogen exchange studies were carried out for each residue in the Gly-Leu-Ala sequence of this peptide44. The Gly NH and Leu NH both exchanged slowly (protection factors of ~1000 and ~400 respectively), while the Ala NH exchanged very rapidly (protection factor of ~35), almost as fast as in the unfolded monomer.
These studies confirm the interpretation of hydrogen exchange experiments on collagen22: the one-third of hydrogens that exchange very slowly are from the Gly NH groups, the slowly exchanging hydrogens come from the NH groups of non-imino acids in the X position, and those from the NH group of the Y residue exchange very rapidly.
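To give the protection factors quoted above a rough energetic scale, they can be converted to an apparent free energy of protection using the standard relation ΔG = RT ln(P). This is only a sketch: it assumes that exchange follows the EX2 regime, and the default temperature is a placeholder, since the experimental temperature is not given here. The function name `exchange_free_energy` is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def exchange_free_energy(protection_factor: float,
                         temperature_k: float = 283.15) -> float:
    """Apparent free energy of protection in kJ/mol, from dG = R*T*ln(P).

    Assumes EX2 exchange; the default temperature is an assumed placeholder,
    not a value reported in the text.
    """
    if protection_factor < 1:
        raise ValueError("protection factor must be >= 1")
    return GAS_CONSTANT * temperature_k * math.log(protection_factor) / 1000.0


# Approximate protection factors quoted in the text for the Gly-Leu-Ala region.
protection = {"Gly NH": 1000, "Leu NH (X position)": 400, "Ala NH (Y position)": 35}

for site, factor in protection.items():
    print(f"{site}: P ~ {factor}, dG ~ {exchange_free_energy(factor):.1f} kJ/mol")
```

Because the relation is logarithmic, the Leu NH, with a protection factor of about 400, comes out much closer to the Gly NH than to the Ala NH on this scale.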
The high-resolution structure of peptide T3-785 was determined recently by Rachel Kramer in Helen Berman’s laboratory45,46. The Gly NH groups form hydrogen bonds with the C=O groups of the X residue, as expected, with additional hydrogen bonding of the carbonyl to the Hyp CαH. The backbone NH groups of residues in the X position in the central region of the peptide (Ile, Ala and Leu) all form a hydrogen bond through one water molecule to the C=O group of Gly, similar to the water-mediated interchain hydrogen bonding scheme proposed by Ramachandran and Chandrasekharan17. Together, the NMR and X-ray data indicate that the NH group of Leu791, which has a water-mediated hydrogen bond, exchanges almost as slowly as that of Gly790, which has a direct interchain hydrogen bond.
This suggests a stabilising effect of this water-mediated hydrogen bond, in accordance with the thermodynamic analysis of Privalov22. Studies on various peptides, including a well-defined host-guest set, indicate that it is more favourable to have an imino acid in the X position than an amino acid. This suggests that the enthalpic stabilisation from a second hydrogen bond is less than the entropic stabilisation from imino acids, but that the second hydrogen bond provides a way of increasing stability while allowing functional sequences.
In the Y position, where no interchain hydrogen bond is possible, it is clearly more favourable to have Hyp, which is both entropically favourable and able to participate in the hydration hydrogen bonding network.
Discussion
Application of modern biophysical techniques to peptides of defined sequences has opened the door to molecular characterisation of the collagen triple-helical structure and dynamics. Stabilisation of the triple-helix is seen to be the result of the entropic stabilisation from imino acids, together with the summation of numerous sets of strong and weak hydrogen bonding patterns. Some hydrogen bonding sets are direct, while others link available groups on the peptide backbone and/or hydroxyproline through the mediation of one or more water molecules.
An example of the kinds of hydrogen bonds which have been identified in the crystal structure is shown in figure 1. Of all available hydrogen bonding groups, only the NH in the Y position (when it is not an imino acid) bonds to water that does not further bind to a peptide chain45,46.
Future challenges include the sequence specificity of the collagen features and the alterations in structure resulting from mutations in collagen. The major advances in recombinant DNA technology, together with progress on designed peptides, should provide an impetus to addressing these important challenges.
*Barbara Brodsky
Department of Biochemistry, UMDNJ – Robert Wood Johnson Medical School
Email: brodsky@rwja.umdnj.edu