EP1037963A1 - CRYSTAL COMPRISING HUMAN IMMUNODEFICIENCY VIRUS ENVELOPE GLYCOPROTEIN gp120, COMPOUNDS INHIBITING CD4-gp120 INTERACTION, COMPOUNDS INHIBITING CHEMOKINE RECEPTOR-gp120 INTERACTION, MIMICS OF CD4 AND gp120 VARIANTS - Google Patents

Info

Publication number
EP1037963A1
Authority
EP
European Patent Office
Prior art keywords
gp120
compound
binding
crystal
polypeptide
Prior art date
Legal status
Withdrawn
Application number
EP98959406A
Other languages
German (de)
French (fr)
Other versions
EP1037963A4 (en)
Inventor
Peter D. Kwong
Wayne A. Hendrickson
Joseph G. Sodroski
Richard T. Wyatt
Current Assignee
Columbia University in the City of New York
Dana Farber Cancer Institute Inc
Original Assignee
Columbia University in the City of New York
Dana Farber Cancer Institute Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Columbia University in the City of New York and Dana Farber Cancer Institute Inc
Publication of EP1037963A1
Publication of EP1037963A4
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N23/00 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
    • G01N23/20 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by using diffraction of the radiation by the materials, e.g. for investigating crystal structure; by using scattering of the radiation by the materials, e.g. for investigating non-crystalline materials; by using reflection of the radiation by the materials
    • G01N23/207 Diffractometry using detectors, e.g. using a probe in a central position and one or more displaceable detectors in circumferential positions
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K16/00 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
    • C07K16/08 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from viruses
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K5/00 Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N1/00 Microorganisms, e.g. protozoa; Compositions thereof; Processes of propagating, maintaining or preserving microorganisms or compositions thereof; Processes of preparing or isolating a composition containing a microorganism; Culture media therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N23/00 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
    • G01N23/20 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by using diffraction of the radiation by the materials, e.g. for investigating crystal structure; by using scattering of the radiation by the materials, e.g. for investigating non-crystalline materials; by using reflection of the radiation by the materials
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2740/00 Reverse transcribing RNA viruses
    • C12N2740/00011 Details
    • C12N2740/10011 Retroviridae
    • C12N2740/16011 Human Immunodeficiency Virus, HIV
    • C12N2740/16022 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes

Definitions

  • HIV Human Immunodeficiency Virus
  • gp120 gp120
  • AIDS acquired immunodeficiency syndrome
  • CD4 two cellular receptors of the human host
  • chemokine receptor primarily CXCR-4 or CCR-5, depending on viral strain
  • these high affinity interactions are attractive targets for mimetic drug design.
  • Although the structure of the gp120-binding domain of CD4 and the identity of residues critical to its interaction with gp120 have been known for several years (13,14), this has not been sufficient for the design of potent antagonists (15-17).
  • Because gp120 is the major virus-specific antigen accessible to neutralizing antibodies, knowledge of its structure could also have a considerable impact on vaccine design.
  • the gp120 protein has been an obvious target for structural investigation, and quantities of pure soluble protein have been available for several years, in part a byproduct of vaccine trials. Nevertheless, despite considerable effort, it has resisted crystallographic analysis for more than a decade.
  • the mature gp120 glycoproteins of different HIV-1 strains have approximately 470-490 amino acids (18). Extensive N-linked glycosylation at approximately 20-25 sites accounts for roughly half its mass (18,19). Sequences from many different viral isolates show that it contains five conserved regions (C1-C5), five variable regions (V1-V5) (18,20) and nine conserved disulfide bridges (19). Except for limited N- and C-terminal cleavage, proteolytic digestion does not reveal a sub-domain structure. Indeed, even after extensive proteolytic cleavage, the unreduced protein runs near its native molecular weight on SDS-PAGE (Peter D. Kwong: unpublished data).
  • The variable regions, the V3 loop in particular, appear to be conformationally variable. Conformational change is also evidenced by shedding, the CD4-induced dissociation of gp120 from the surface of the virus, and by ligand-induced variations in monoclonal antibody binding (21,22). These changes may be related to the functional role of gp120 in virus entry.
  • the subject invention provides a crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120, wherein the amino acid sequence is at least 100 amino acids in length.
  • the subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 2.5 angstroms or better than 2.5 angstroms.
  • the subject invention additionally provides a method for producing a crystal suitable for X-ray diffraction comprising: (a) deglycosylating a polypeptide having the amino acid sequence of a portion of a gp120 wherein said portion is produced by deleting or replacing part of the gp120 to reduce the surface loop flexibility; (b) contacting the polypeptide with a ligand so as to form a complex which exhibits restricted conformational mobility; and (c) obtaining a crystal from the complex so formed to produce a crystal suitable for X-ray diffraction.
  • the subject invention also provides the above-described methods, wherein the V1, V2, or V3 loop of the gp120 contained in the polypeptide is partially truncated, deleted or replaced.
  • the subject invention also provides a method for identifying a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the gp120.
  • This invention also provides a method of inhibiting the interaction of HIV-gp120 with CD4 which comprises administering to a mammal in need thereof a compound capable of disrupting two or more of the contacts between gp120 and CD4 as set forth in Figure 54.
  • This invention also provides a method for identifying a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the CD4 binding site of the gp120.
  • This invention also provides a method for designing a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) designing a compound to fit the CD4 binding site.
  • This invention also provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject.
  • This invention provides a method for identifying a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) determining whether a compound would fit into the binding site, a positive fit indicating that the compound is capable of binding to the chemokine receptor binding site of the gp120.
  • This invention also provides a method for designing a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) designing a compound to fit the chemokine receptor binding site.
  • the crystal further comprises a chemokine receptor, a second polypeptide having amino acid sequence of a portion of chemokine receptor, an antibody or a Fab capable of binding to the chemokine receptor binding site or a compound known to be capable of binding to the chemokine receptor binding site, bound to the polypeptide.
  • This invention further provides a method of inhibiting the interaction of HIV-gp120 with chemokine receptor which comprises administering to a mammal in need thereof a compound capable of disrupting two or more of the contacts between gp120 and chemokine receptor as set forth in Figure 55, thereby inhibiting the interaction of HIV-gp120 with chemokine receptor.
  • This invention provides a substance mimicking the gp120-binding domain of CD4 wherein the size of the residue or analog thereof at position 43 is bigger than the size of a phenylalanine so as to increase the affinity for human immunodeficiency virus envelope glycoprotein gp120.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 10 Å across its longest dimension.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof, wherein the residue's longest dimension is longer than phenylalanine's longest dimension.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 15 Å across its longest dimension.
  • This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a cysteine.
  • This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a tyrosine.
  • This invention further provides a pharmaceutical composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the substance of claim 1; and (b) a pharmaceutically acceptable carrier.
  • This invention further provides a method of inhibiting cell entry by HIV, comprising contacting the cells with an effective amount of the above-described substances, thereby inhibiting cell entry by HIV.
  • This invention further provides a method of treating or preventing HIV infection in a subject, comprising administering to the subject an effective amount of the above-described substances, thereby treating or preventing HIV infection.
  • the subject invention provides a variant of gp120 which presents a hidden, conserved, neutralization epitope.
  • the subject invention also provides a composition comprising a variant of gp120 which presents a hidden, conserved, neutralization epitope and a suitable carrier.
  • the subject invention further provides a vaccine comprising a variant of gp120 which presents a hidden, conserved, neutralization epitope and a suitable carrier.
  • the subject invention also provides an antibody induced by a vaccine comprising a variant of gp120.
  • CD4 is in the top left, gp120 is toward the right, and Fab 17b is in the bottom left of the figure.
  • Crystal types A-F are shown and correspond to the crystal types described in the text and Tables 3 and 4.
  • the photomicrograph in A is at twice the magnification.
  • the bar in A corresponds to 25 μm (50 μm for B-F).
  • Lane 1 (Pharmacia Phast system). Lane 1, 2.5 μg of ternary complex purified by gel filtration. The top band is the deglycosylated Δ82 ΔV1/2* ΔV3 ΔC5 gp120, the next two bands are the alkylated and reduced heavy and light chains respectively of the Fab 17b, and the bottom band is the two-domain sCD4 (D1D2). Lane 2, standards: 94, 67, 43
  • Lane 5 dissolved crystals. The gel is silver stained.
  • the HIV-1 entry process The trimeric HIV-1 envelope glycoproteins, anchored in the viral membrane, are depicted, with gp120 in the lower right and gp41 in the upper right. For simplicity, the gp120 variable loops are not shown, but would extend over the outer surface of the envelope glycoprotein complex.
  • the receptors on the target cell, CD4 and chemokine receptor are also shown.
  • the structures of gp120, gp41, and CD4 are adapted from available X-ray crystallographic studies (5,20,21), whereas the chemokine receptor model is hypothetical.
  • Figure 27 The HIV-1 gp120 surface.
  • Figure 27A The HIV-1 gp120 surface.
  • the molecular surface of the HIV-1 gp120 core (20) is shown, with the arrow pointing towards the viral membrane.
  • the inner domain, believed to interact with gp41, and the outer domain, which is probably exposed on the assembled trimer, are on the left and right, respectively.
  • the gp120 surface occluded by CD4 is shown and the gp120 region thought to be involved in chemokine receptor binding (27) is also shown.
  • the location of the base of the V3 loop is shown.
  • the relationship of different surfaces of the gp120 core to the antibody response generated by the gp120 glycoprotein is depicted.
  • the surface of gp120 that interacts with neutralizing antibodies (32) is shown; it spans the inner and outer domains and includes the V2 and V3 variable loops (not shown).
  • the surface of gp120 that interacts with non-neutralizing antibodies is located on the inner domain, and includes gp41-interactive N- and C-terminal gp120 regions (not shown).
  • the heavily glycosylated surface of the gp120 outer domain, which appears to be minimally immunogenic, is also shown.
  • core gp120 Structure of core gp120.
  • the orientation of gp120 in each of the panels shown in this figure is related to Figure 17 by a 90° rotation about a vertical axis.
  • the viral membrane would be oriented above, the target membrane below, and the C-terminal tail of CD4 coming out of the page.
  • the left portion of core gp120 as the "inner" domain
  • the right portion as the "outer" domain
  • the 4-stranded sheet at the bottom left of gp120 as the "bridging sheet."
  • the bridging sheet (β3, β2, β21, β20) can be seen packing primarily over the inner domain, although some surface residues of the outer domain, e.g.
  • FIG 29A Ribbon diagram. Helices and β-strands are depicted; strand β15 makes an antiparallel β-sheet alignment with strand C'' of CD4. The dashed line to the right of the diagram represents the disordered V4 loop. Selected parts of the structure are labeled.
  • Figure 29B
  • Solvent accessibility is indicated for each residue by an open circle if the fractional solvent accessibility is greater than 0.4, a half-closed circle if 0.1 to 0.4, and a closed circle if less than 0.1. Sequence variability observed among primate immunodeficiency viruses is indicated below the solvent accessibility by the number of horizontal hash marks: 1 mark, residues conserved among all primate immunodeficiency viruses; 2 marks, conserved among all HIV-1 isolates; 3 marks, exhibits moderate variation among HIV-1 isolates; and 4 marks, exhibits significant variability among HIV-1 isolates. In assessing conservation, all single atom changes were permitted as well as larger substitutions if the character of the sidechain was conserved (e.g. K to R or F to L).
  • N-linked glycosylation is indicated by "m" for the high mannose additions and "c" for the complex additions observed in mammalian cells (6).
  • Residues of gp120 in direct contact with CD4 are indicated by "*". Direct contact is a more restrictive criterion of interaction than the often used loss of solvent accessible surface; residues of gp120 which show loss of solvent accessible surface but are not in direct contact are 123, 124, 126, 257, 278, 282, 364, 471, 475, 476 and 477. Parts (a) and (b) were drawn with MOLSCRIPT (P. J. Kraulis).
  • Figure 30 CD4-gp120 interactions.
  • Figure 30A Ribbon diagram of gp120 binding to CD4. Residue Phe 43 of CD4 is also depicted reaching into the heart of gp120. From this orientation the recessed nature of the gp120 binding pocket is evident.
  • Electron density in the Phe 43 cavity The 2Fo-Fc electron density map at 2.5 Å, 1.1σ contour, is shown. The orientation is the same as in (a). The foreground has been clipped for clarity, removing the overlying β24-α5 connection. In the upper middle of the picture is the central unidentified density. At the bottom of the picture, Phe 43 of CD4 can be seen reaching up to contact the cavity.
  • the gp120 residues are Trp 427 (with its indole ring partially clipped by foreground slabbing), Trp 112, Val 255, Thr 257, Glu 370 (packing under the Phe 43 ring), Ile 371, and Glu 368 (partially clipped in the bottom right corner). Hydrophobic residues lining the back of the cavity can be partially glimpsed around the central unidentified density.
  • Electrostatic surface of CD4 and gp120 The electrostatic potential is displayed at the solvent accessible surface, which is shaded according to the local electrostatic potential. The slight "puffiness" of the surface arises from the enlarged nature of the solvent accessible surface relative to the standard molecular surface.
  • the gp120 surface On the right, the gp120 surface is shown in an orientation similar to that of Figures 29A and 29C, but rotated -20° around a vertical axis to depict the recessed binding pocket more clearly.
  • a thin yellow Cα worm of CD4 is shown to aid in orientation.
  • the CD4 surface is shown, rotated relative to the gp120 panel by an exact 180° rotation about the vertical axis shown.
  • a thin red Cα worm of gp120 is shown.
  • CD4-gp120 contact surface On the right, the gp120 surface is shown with the surface within 3.5 Å of CD4 (surface-to-atom center distance). This effectively creates an "imprint" of CD4 on the displayed gp120 surface. On the left (180° rotation), the corresponding CD4 surface and gp120 "imprint" is also shown.
  • Figure 30E CD4-gp120 mutational "hot-spots."
  • the surface of gp120 On the right, the surface of gp120 is shown with the surface of gp120 residues shown by substitution to affect CD4 binding highlighted: substantial effect -- residues 257, 368, 370 and 427; moderate effect -- residue 457.
  • Sequence variability mapped to the gp120 surface The sequence variability observed among primate immunodeficiency viruses (Figure 29D) is depicted mapped onto the gp120 surface. Also shown is the carbohydrate: N-acetylglucosamine and fucose residues present in the structure; Asn-proximal N-acetylglucosamines modeled at residues 88, 230, 241, 356, 397, 406, 462. Much of the carbohydrate (22 residues) is hidden on the back side of the outer domain.
  • Figure 30H Phe 43 cavity The surface of the Phe 43 cavity is shown, buried in the heart of gp120.
  • a worm representation of gp120 shows the three stretches that are incorrectly predicted by secondary structure prediction: the ⁇ B loop, bending around the top of the cavity, strands β20-β21 just below the cavity, and strand β15, slightly more distal to the cavity right.
  • the orientation shown here is the same as for the gp120 surfaces in Figure 30C-G.
  • FIG 31A Worm diagram of Fab 17b and gp120.
  • the Fab 17b is shown binding to gp120.
  • the orientation shown is the same as in Figures 29A and 29C.
  • Electrostatic surface The electrostatic potential is displayed at the solvent accessible surface, which is shaded according to the local electrostatic potential.
  • the electrostatic shading is the same scale as that shown in Figure 30C.
  • the surface that corresponds to the 17b epitope is the most electropositive region of the molecule.
  • the V3 loop is truncated here, but sequence analysis shows that it is generally quite positively charged.
  • FIG. 29A and 29C Schematic representation of the gp120 initiation of fusion.
  • a single monomer of core gp120 is depicted in an orientation similar to Figures 29A and 29C.
  • the "3" symbolizes the 3-fold axis, from which gp41 interacts with the gp120 N- and C-termini to generate the functional oligomer.
  • the V1/V2 loops are shown partially occluding the CD4 binding site.
  • a conformational change is depicted as an inner/outer domain shift, with the dark circle denoting the formation of the Phe 43 cavity.
  • HIV-1 gp120 Structure of HIV-1 gp120 with neutralizing antibody and human receptor CD4.
  • Figure 34D View of the molecular surface of the gp120 core inner domain.
  • variability is indicated by the shading scheme used in Figure 34B.
  • the CD4-binding site is to the right of the figure, and the protruding V1/V2 stem is indicated.
  • the conserved molecular surface, which is associated with the inner domain of the gp120 core, is devoid of known N-linked glycosylation. These are modeled in the figure on the right, which is shaded as described in Figure 23B.
  • Figure 35
  • the molecular surface of the gp120 core is shown, from the same perspective as that in Figure 34A.
  • the modeled N-terminal gp120 core residues, V4 loop and carbohydrate structures are included.
  • the variability of the molecular surface is indicated, using the shading scheme described in Figure 34B.
  • the approximate locations of the V2 and V3 variable loops are indicated. Note the well-conserved surfaces near the "Phe 43" cavity and the chemokine receptor-binding site (see Figure 34A).
  • a Cα tracing of the gp120 core, oriented similarly to Figure 34A.
  • the gp120 residues within Figure 37A of the 17b CD4i antibody are shown.
  • the residues implicated in the binding of CD4BS antibodies (20) are shown. Changes in these residues significantly affect the binding of at least 25 percent of the CD4BS antibodies listed in the table from the fourth series of experiments.
  • the residues implicated in 2G12 binding (19) are shown.
  • the V4 variable loop, which contributes to the 2G12 epitope (19), is indicated by dotted lines (see Figure 34A).
  • Figure 35C
  • Approximate locations of the faces of the gp120 core defined by the interaction of gp120 and antibodies.
  • the molecular surface accessible to neutralizing ligands (CD4 and CD4BS, CD4i and 2G12 antibodies) is shown in white.
  • the neutralizing face of the complete gp120 glycoprotein includes the V2 and V3 loops, which reside adjacent to the surface shown (see Figure 35A).
  • the approximate location of the gp120 face that is poorly accessible on the assembled envelope glycoprotein trimer and therefore elicits only non-neutralizing antibodies (5,6) is shown.
  • the approximate location of an immunologically "silent" face of gp120, which roughly corresponds to the highly glycosylated outer domain surface, is also shown.
  • a likely arrangement of the HIV-1 gp120 glycoproteins in a trimeric complex The gp120 core was organized into a trimeric array, based on the criteria discussed in the text.
  • the perspective is from the target cell membrane, similar to that shown in Figure 34C.
  • the CD4 binding pockets are indicated by black arrows, and the chemokine receptor-binding regions are darkly shaded.
  • the lightly shaded areas indicate the more variable, glycosylated surface of the gp120 core.
  • the approximate locations of the 2G12 epitopes are indicated by open arrows.
  • the approximate locations for the V3 loops and V4 regions are shown.
  • the positions of the V5 regions and some complex carbohydrate addition sites are shown.
  • the approximate locations of the large V1/V2 loops, centered on the known positions of the V1/V2 stems, are indicated.
  • On one of the gp120 subunits the positions of the L D and L E loops are indicated.
  • the distance of each of the gp120 monomers from the 3-fold symmetry axis is arbitrary.
  • the HIV gp120 derivative used in the binding assay The wild-type gp120 and gp41 envelope glycoproteins are shown in the upper figure. Conserved (black) and variable (white) regions (25) are indicated.
  • the N-terminal and V1/V2 deletions correspond to those previously described for the HXBc2 gp120 mutants Δ82 and Δ128-194, respectively (8,9).
  • SIG signal peptide.
  • FIG. 38A The radiolabeled wtΔ protein was incubated either with the parental L1.2 cells or with the L1.2-CCR5 cells. Incubations were carried out either in the absence or presence of sCD4 (100 nM). The wtΔ protein bound to the cells is shown. The two bands represent different glycoforms of gp120.
  • the wtΔ protein was incubated with both sCD4 and 17b antibody at the indicated concentrations prior to addition to the L1.2-CCR5 cells.
  • the L1.2-CCR5 cells were incubated with 2D7 anti-CCR5 antibody or MIP-1β at the indicated concentrations prior to incubation with wtΔ-sCD4 complexes.
  • the wtΔ protein bound to the cells is shown.
  • Figure 38C The amount of radiolabeled wtΔ or selected mutant envelope glycoproteins precipitated by a mixture of HIV-1-infected patient sera (Total), precipitated by sCD4 and an anti-CD4 antibody (Bound (sCD4)), or bound to L1.2-CCR5 cells (Bound (CCR5)) is shown.
  • a ribbon drawing of the HIV-1 gp120 glycoprotein (6) complexed with CD4 is shown. The perspective is that from the target cell membrane. The two amino-terminal domains of CD4 are shown. The gp120 inner domain is shown, the outer domain is shown and the "bridging sheet" is shown. The gp120 residues in which changes resulted in a >90% decrease in CCR5 binding are labeled. The V1/V2 stem and base of the V3 loop (strands β12 and β13 and the associated turn) are indicated.
  • FIG. 39A A molecular surface of the gp120 glycoprotein from the same perspective as that of Figure 39A is shown. Shaded surfaces are associated with gp120 residues in which changes resulted in either a ≥75% decrease, a ≥90% decrease or a >50% increase in CCR5 binding, when CD4 binding was at least 50% of that seen for the wtΔ protein.
  • Figure 39C The surface depicted in Figure 39B is shaded according to the degree of conservation observed among primate immunodeficiency viruses (25).
  • Figure 39D The molecular surface of the gp120 glycoprotein is shown, indicating residues in which changes resulted in a ≥70% decrease in 17b antibody binding, in the absence of sCD4.
  • Figure 39E The molecular surface of the gp120 glycoprotein is shown, indicating residues in which changes resulted in a ≥70% decrease in CG10 antibody binding in the presence of sCD4. Residues in which changes significantly decreased CD4 binding (and thus indirectly decreased CG10 binding) are not shown. Images were made with Midas-Plus (Computer Graphics Lab, University of California, San Francisco) and GRASP (26).
  • Mimics of CD4 With Enhanced Affinity For gp120
  • cysteine 43 mutant derivatives produced with the active halogen reaction scheme.
  • Figure 43A General reaction scheme for using a bifunctional reagent to modify the gp120-binding domain of CD4.
  • Reaction scheme for using a bifunctional reagent to modify a residue in the gp120-binding domain of CD4 as applied to a cysteine residue.
  • the residues lining the hydrophobic pocket of gp120 include: Trp (112), Leu (116), Pro (118), Phe (210), Val (255), Ser (375), Asn (377), Phe (382), Ile (424), Met (426), Trp (427), Asn (428), Ala (433), Gly (473), and Met (475)
  • Figure 47 Computer-generated ribbon drawing of the tertiary structure of CD4 and gp120 interacting. CD4 is toward the bottom and gp120 is toward the top.
  • Figure 48 Reaction scheme for chemically modifying tyrosine residues. R1 may be selected from the group shown in Figure 44. An alternative mechanism may be achieved as shown on page 365 of Structure and Protein Chemistry by Jack Kyte (1994), in which a diazonium salt participates in electrophilic aromatic substitution with tyrosine.
  • Figure 49 Schematic showing the structural domains of gp120.
  • the topology for the gp120 (Δ82, ΔV1/2, ΔV3, ΔC5) construct.
  • Figure 55 Provides a detailed list of all the contacts between gp120 (designated here as molecule A) and the Fab 17b (the light chain is designated here as molecule C; the heavy chain is designated here as molecule D).
  • the invention relates to crystals of gp120 suitable for x-ray diffraction.
  • the three dimensional structure of gp120 provides information which has a number of uses; principally related to the development of pharmaceutical compositions which mimic the action of gp120.
  • the crystals comprising a portion of gp120.
  • the portion of gp120 may contain the CD4 binding site.
  • the portion contains the chemokine receptor binding site.
  • the portion of gp120 contains both the CD4 binding site and the chemokine receptor binding site.
  • the portion of gp120 will be at least 100 amino acids long. In a preferred embodiment, the portion is at least 200 amino acids long.
  • the essence of the invention resides in the obtaining of crystals of gp120 of sufficient quality to determine the three dimensional (tertiary) structure of the protein by x-ray diffraction methods.
  • This invention provides crystals of sufficient quality to obtain a determination of the three-dimensional structure of gp120 to high resolution, preferably to the resolution of 2.5 angstroms.
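  • To give a sense of what a resolution figure means in diffraction terms, the sketch below applies Bragg's law (λ = 2d·sin θ) to a lattice spacing d. The wavelength is an assumption (Cu Kα, about 1.5418 Å); the text does not name the X-ray source, and the function name is illustrative only.

```python
import math

# Assumption: Cu K-alpha wavelength in angstroms; the source names no wavelength.
CU_K_ALPHA_A = 1.5418


def bragg_angle_deg(d_spacing_a, wavelength_a=CU_K_ALPHA_A):
    """First-order Bragg angle theta in degrees, from lambda = 2 * d * sin(theta)."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("spacing is not reachable at this wavelength")
    return math.degrees(math.asin(sin_theta))


# Finer spacings (smaller d) diffract to wider angles.
for d in (4.0, 2.5):
    print(f"d = {d} A -> 2*theta = {2 * bragg_angle_deg(d):.1f} degrees")
```

Under this assumption, reflections that carry 2.5 Å information appear at wider scattering angles than those for 4 Å, which is why better-ordered crystals are needed to record them.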
  • the value of crystals of gp120 extends beyond merely being able to obtain a structure for gp120.
  • the knowledge of the structure of gp120 provides a means of investigating the mechanism of action of these proteins in the body. For example, binding of these proteins to various receptor molecules can be predicted by various computer models. Upon discovering that such binding in fact takes place, knowledge of the protein structure then allows chemists to design and attempt to synthesize molecules which mimic the binding of gp120 to its receptors. This is the method of "rational" drug design.
  • One skilled in the art may use one of several methods to screen chemical entities for their ability to associate with gp120. This process may begin by visual inspection of, for example, the active site on the computer screen based on the gp120 coordinates. Docking may be accomplished using software such as Quanta and Sybyl, followed by energy minimization and molecular dynamics with standard molecular mechanics forcefields, such as CHARMM and AMBER.
  • Specialized computer programs may also assist in the process of selecting fragments or chemical entities. These include:
  • GRID [P.J. Goodford, "A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules", J. Med. Chem. 28:849-857 (1985)]. GRID is available from Oxford University, Oxford, UK.
  • MCSS [A. Miranker and M. Karplus, "Functionality Maps of Binding Sites: A Multiple Copy Simultaneous Search Method", Proteins: Structure, Function and Genetics, 11:29-34 (1991)]. MCSS is available from Molecular Systems, Burlington, MA.
  • AUTODOCK [D.S. Goodsell and A.J. Olson, "Automated Docking of Substrates to Proteins by Simulated Annealing", Proteins: Structure, Function, and Genetics, 195-202 (1990)]. AUTODOCK is available from Scripps Research Institute, La Jolla, CA.
  • Assembly may proceed by visual inspection of the relationship of the fragments to each other on the three-dimensional image displayed on a computer screen in relation to the structure coordinates of gp120. This would be followed by manual model building using software such as Quanta or Sybyl.
  • CAVEAT [P.A. Bartlett et al., "CAVEAT: A Program to Facilitate the Structure-Derived Design of Biologically Active Molecules", in "Molecular Recognition in Chemical and Biological Problems", Special Pub., Royal Chem. Soc. 78, pp. 182-196 (1989)]. CAVEAT is available from the University of California, Berkeley, CA.
  • 3D Database systems such as MACCS-3D (MDL Information Systems, San Leandro, CA). This area is reviewed in Y. C. Martin, "3D Database Searching in Drug Design", J. Med. Chem., 35:2145-2154 (1992).
  • inhibitory or other type of binding compounds may be designed as a whole or "de novo" using either an empty active site or optionally including some portion(s) of a known inhibitor(s).
  • These methods include:
  • LUDI [H.-J. Bohm, "The Computer Program LUDI: A New Method for the De Novo Design of Enzyme Inhibitors", J. Comp. Aid. Molec. Design, 6:61-78 (1992)].
  • LUDI is available from Biosym Technologies, San Diego, CA.
  • LEGEND [Y. Nishibata and A. Itai, Tetrahedron, 47:8985 (1991)].
  • LEGEND is available from Molecular Simulations, Burlington, MA.
  • the gp120 or CD4 antagonist may be tested for bioactivity using standard techniques.
  • structure of the invention may be used in binding assays using conventional formats to screen inhibitors.
  • Suitable assays for use herein include, but are not limited to, the enzyme-linked immunosorbent assay (ELISA), or a fluorescence quench assay.
  • Other assay formats may be used; these assay formats are not a limitation on the present invention.
  • the gp120 structure of the invention permits the design and identification of synthetic compounds and/or other molecules which have a shape complementary to the conformation of the gp120 active site of the invention.
  • the coordinates of the gp120 structure of the invention may be provided in machine readable form, the test compounds designed and/or screened and their conformations superimposed on the structure of the invention. Subsequently, suitable candidates identified as above may be screened for the desired gp120 inhibitory bioactivity, stability, and the like.
  • inhibitors may be used therapeutically or prophylactically to block gp120 activity.
  • this invention also provides material which is the basis for the rational design of drugs which mimic the action of gp120.
  • the subject invention provides a crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120.
  • the subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 4 angstroms or better than 4 angstroms.
  • the subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 2.5 angstroms or better than 2.5 angstroms.
  • the subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a CD4 binding site.
  • the subject invention further provides the above-described crystals, further comprising a compound bound to the CD4 site.
  • the subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a chemokine receptor binding site.
  • the subject invention also provides the above-described crystals, further comprising a compound bound to the chemokine receptor binding site.
  • the subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a CD4 binding site and a chemokine receptor binding site.
  • the subject invention also provides the above-described crystals, further comprising a first compound bound to the CD4 binding site of the polypeptide and a second compound bound to the chemokine receptor binding site of the polypeptide.
  • the subject invention also provides the above-described crystals, wherein the first compound is the second compound.
  • the subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 lacking the V1, V2, V3, and C5 regions.
  • the subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the conserved stem of the V1/V2 stem-loop structure.
  • the subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the base of the V3 loop.
  • the subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the C5 region.
  • the subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 with 5% by weight of the carbohydrate residues linked to the gp120 in substantially the same manner as they are linked to gp120 in unmodified gp120.
  • the subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 with 15% by weight of the carbohydrate residues linked to the gp120 polypeptide in substantially the same manner as they are linked to gp120 in unmodified gp120.
  • the subject invention also provides the above-described crystals, further comprising a Fab, a CD4, a polypeptide having amino acid sequence of a portion of CD4, or a combination thereof, bound to the gp120.
  • the subject invention also provides the above-described crystals, wherein the Fab is produced from an antibody to a discontinuous epitope.
  • the subject invention also provides the above-described crystals, wherein the monoclonal antibody is designated 17b.
  • the subject invention additionally provides a method for producing a crystal suitable for X-ray diffraction comprising: (a) deglycosylating a polypeptide having amino acid sequence of a portion of a gp120 wherein said portion is produced by deleting or replacing part of the gp120 to reduce the surface loop flexibility; (b) contacting the polypeptide with a ligand so as to form a complex which exhibits restricted conformational mobility; and (c) obtaining a crystal from the complex so formed to produce a crystal suitable for X-ray diffraction.
  • the subject invention also provides the above-described methods, wherein the V1, V2, or V3 loop of the gp120 contained in the polypeptide is partially truncated, deleted or replaced.
  • the subject invention also provides the above-described methods, wherein the polypeptide lacks the V1, V2, V3 and C5 loop of the gp120.
  • the subject invention also provides the above-described methods, wherein the polypeptide also lacks up to fifty N-terminal amino acids of the gp120 or up to fifty C-terminal amino acids of gp120.
  • the subject invention also provides the above-described methods, wherein the ligand is a Fab, a CD4, or a polypeptide having amino acid sequence of a portion of CD4.
  • the subject invention also provides the above-described methods, wherein the resulting polypeptide after the deglycosylation contains at least 5% of the carbohydrate.
  • the subject invention also provides the crystal produced by the above-described methods.
  • the subject invention also provides a method for identifying a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the gp120.
  • the subject invention also provides a method for designing a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) designing a compound to fit the binding site.
  • the subject invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
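  • Step (a) of the methods above, determining a binding site from atomic coordinates, is commonly approximated by listing the protein residues that have any atom within a distance cutoff of a bound ligand. The sketch below illustrates that idea only. The 4.0 Å cutoff, the function name, and the coordinates are all assumptions made for the example; the coordinates are invented and are not taken from Figure 53.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted (name, number) residues with any atom within `cutoff` angstroms of a ligand atom.

    protein_atoms: iterable of (residue_name, residue_number, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    site = set()
    for res_name, res_num, xyz in protein_atoms:
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            site.add((res_name, res_num))
    return sorted(site, key=lambda residue: residue[1])


# Toy coordinates, invented for illustration only.
protein = [
    ("ASP", 368, (1.0, 0.0, 0.0)),
    ("GLU", 370, (0.0, 3.5, 0.0)),
    ("TRP", 427, (0.0, 0.0, 3.9)),
    ("GLY", 473, (12.0, 0.0, 0.0)),  # too far from the ligand atom to count
]
ligand = [(0.0, 0.0, 0.0)]

site = binding_site_residues(protein, ligand)
print(site)
```

A real analysis would read every atom of the complex from the deposited coordinates rather than one point per residue, and the choice of cutoff changes which residues are reported.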
  • the subject invention also provides a pharmaceutical composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier.
  • "pharmaceutically acceptable carrier" means any of the standard pharmaceutical carriers.
  • suitable carriers are well known in the art and may include, but are not limited to, any of the standard pharmaceutical carriers such as phosphate buffered saline solutions, phosphate buffered saline containing Polysorb 80, water, emulsions such as oil/water emulsion, and various types of wetting agents.
  • Other carriers may also include sterile solutions, tablets, coated tablets, and capsules.
  • Such carriers typically contain excipients such as starch, milk, sugar, certain types of clay, gelatin, stearic acid or salts thereof, magnesium or calcium stearate, talc, vegetable fats or oils, gums, glycols, or other known excipients.
  • Such carriers may also include flavor and color additives or other ingredients.
  • compositions comprising such carriers are formulated by well known conventional methods.
  • the subject invention also provides the above-described methods, wherein the compound is not previously known.
  • the subject invention also provides the compounds identified by the above-described methods.
  • the subject invention also provides the compound designed by the above-described methods.
  • the subject invention also provides a composition comprising the above-described compounds and a suitable carrier.
  • This invention also provides a method of inhibiting the interaction of HIV-gp120 with CD4 which comprises administering to a mammal a compound, with the proviso that the compound is not CD4, capable of disrupting two or more of the contacts between gp120 and CD4 as set forth in Figure 54.
  • This invention also provides a method for identifying a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the CD4 binding site of the gp120.
  • the molecular interaction of HIV with CD4 is between the HIV envelope glycoprotein gp120 and the D1 domain of CD4.
  • the crystal structure of the complex between the deglycosylated core of gp120 and the D1D2 fragment of human CD4 defines this interaction in atomic detail (Nature paper). Although there is an extensive interface between these components, the nexus of the interaction brings together those residues demonstrated by mutational analyses to be those most crucial for binding.
  • Phe 43 and Arg 59 from CD4 and Asp 368, Glu 370 and Trp 427 from gp120 (Nature paper, Fig. 3j).
  • This dominant sub-site comprises gp120 residues 365-368, 370-371, 425-430 and 473.
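  • For computational work, the residue ranges given for the dominant sub-site can be expanded into an explicit set of residue numbers. The following is a minimal sketch; the helper name and the string format are assumptions made for illustration.

```python
def expand_residue_ranges(spec):
    """Expand a string such as '365-368, 473' into a set of residue numbers."""
    residues = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            residues.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            residues.add(int(part))
    return residues


# Ranges as stated in the text for the dominant sub-site of gp120.
SUBSITE = expand_residue_ranges("365-368, 370-371, 425-430, 473")

print(sorted(SUBSITE))
# The gp120 residues named as most crucial for binding all fall inside the sub-site.
print({368, 370, 427} <= SUBSITE)
```

The set covers 13 residues, and it contains Asp 368, Glu 370 and Trp 427, the gp120 residues singled out by the mutational analyses.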
  • Phe 43 closes off a pocket on the HIV surface to form a large cavity (152 Å³) at this interface (Nature paper, Fig. 3b).
  • Residues that line the Phe 43 pocket include Trp 112, Val 255, Thr 257, Glu 370, Phe 382, Tyr 384, Trp 427, Met 475 and main-chain atoms of 256 and 375-377.
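  • As a rough sense of scale for the cavity volume quoted above, the sketch below computes the radius of a sphere with the same volume, using V = (4/3)πr³. Treating the cavity as a sphere is an assumption made purely for illustration; the actual pocket is irregular.

```python
import math

CAVITY_VOLUME_A3 = 152.0  # cubic angstroms, as stated in the text


def equivalent_sphere_radius(volume):
    """Radius of a sphere whose volume equals `volume` (same length unit cubed)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


radius = equivalent_sphere_radius(CAVITY_VOLUME_A3)
print(f"equivalent radius = {radius:.2f} A, diameter = {2 * radius:.2f} A")
```

The equivalent radius is a little over 3.3 Å, which gives a feel for how much extra bulk a modified residue at position 43 could in principle occupy.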
  • the atomic coordinates of the crystallographic model also define the binding surface to be exploited by high-affinity compounds that will have the property to inhibit the gp120-CD4 interaction, and thereby the attachment of HIV to CD4-positive cells.
  • This definition of the surface provides practitioners skilled in the art with the means to design such compounds.
  • Appropriate fragments or chemical entities for the design of such compounds can be formed through the use of specialized computer programs such as GRID, DOCK and LUDI. Computer graphical representations of these entities can then be composed into appropriate chemical compounds, using the crystal structure as a template. Medicinal chemists skilled in the art can then synthesize appropriate chemical compounds to implement these designs.
  • a compound that will bind to the dominant sub-site of the CD4 intermolecular interface will have surface properties that are complementary to the surface properties of the sub-site itself.
  • the surface of the sub-site can be characterized by the GRASP computer program with respect to curvature, electrostatic potential, and hydrophobicity.
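  • GRASP works on the three-dimensional molecular surface, which cannot be reproduced in a few lines. As a much cruder stand-in for the hydrophobicity component only, the sketch below averages Kyte-Doolittle hydropathy values over the side-chain residues named in the text as lining the Phe 43 pocket. Using a per-residue scale instead of a surface calculation is an assumption of this example, not the method described here.

```python
# Kyte-Doolittle hydropathy values for the residue types that appear below.
KYTE_DOOLITTLE = {
    "Trp": -0.9, "Val": 4.2, "Thr": -0.7, "Glu": -3.5,
    "Phe": 2.8, "Tyr": -1.3, "Met": 1.9,
}

# Side-chain residues named in the text as lining the Phe 43 pocket.
POCKET_RESIDUES = [
    ("Trp", 112), ("Val", 255), ("Thr", 257), ("Glu", 370),
    ("Phe", 382), ("Tyr", 384), ("Trp", 427), ("Met", 475),
]


def mean_hydropathy(residues, scale=KYTE_DOOLITTLE):
    """Average hydropathy over (residue_name, residue_number) pairs."""
    return sum(scale[name] for name, _ in residues) / len(residues)


print(f"mean hydropathy = {mean_hydropathy(POCKET_RESIDUES):.2f}")
```

A positive mean indicates a net hydrophobic character on this scale, but the value ignores which atoms are actually exposed, so it is only a first-pass indicator.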
  • the complementary surface to this one i.e., convex vs. concave, positive vs. negative, etc.
  • any compound that has an accessible conformation such as to match the surface that is complementary to the HIV gp120 binding surface is one that has a high probability for inhibitory binding. Since it should be possible for skilled practitioners to design and synthesize such compounds when instructed by the template of the HIV gp120 structure and the CD4 binding elements, these compounds defined by congruence with the complementary surface can be considered inventions by the process hereby defined.
  • This invention also provides a method for designing a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) designing a compound to fit the CD4 binding site.
  • This invention also provides the above-described methods, wherein the crystal further comprises a CD4, a second polypeptide having amino acid sequence of a portion of CD4, or a compound known to be able to bind to the CD4 site of the gp120, bound to the polypeptide.
  • This invention also provides the above-described methods, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
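  • One common way to estimate an interaction energy is to sum a Lennard-Jones 12-6 term and a Coulomb term over all ligand/site atom pairs. The sketch below shows that form only. It is not the energy function of any program named in this document, and the parameters (a single ε and σ for every atom pair, a constant dielectric of 4, the toy coordinates and charges) are assumptions chosen for illustration.

```python
import math

COULOMB_KCAL = 332.06  # kcal*angstrom/(mol*e^2); converts q1*q2/r to kcal/mol


def pair_energy(r, q1, q2, epsilon=0.1, sigma=3.4, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for two atoms r angstroms apart."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_KCAL * q1 * q2 / (dielectric * r)
    return lj + coulomb


def interaction_energy(ligand_atoms, site_atoms):
    """Sum pair energies over all ligand/site pairs; each atom is ((x, y, z), charge)."""
    total = 0.0
    for xyz_l, q_l in ligand_atoms:
        for xyz_s, q_s in site_atoms:
            total += pair_energy(math.dist(xyz_l, xyz_s), q_l, q_s)
    return total


# Toy atoms with invented positions and partial charges.
ligand = [((0.0, 0.0, 0.0), 0.3)]
site = [((4.0, 0.0, 0.0), -0.3), ((0.0, 4.0, 0.0), 0.0)]
print(f"estimated interaction energy = {interaction_energy(ligand, site):.2f} kcal/mol")
```

More negative totals indicate a more favorable estimated fit. Real force fields such as CHARMM or AMBER assign per-atom-type parameters and include further terms.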
  • This invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
  • This invention also provides a pharmaceutical composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier.
  • This invention also provides the above-described methods, wherein the compound is not previously known.
  • This invention also provides the compound identified by the above-described methods.
  • This invention also provides the compound designed by the above-described methods.
  • This invention also provides a composition comprising the above-described compounds and a suitable carrier.
  • This invention also provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject.
  • the above-described compounds are nonpeptidyl.
  • This invention provides a method for identifying a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) determining whether a compound would fit into the binding site, a positive fit indicating that the compound is capable of binding to the chemokine receptor binding site of the gp120.
  • This invention also provides a method for designing a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) designing a compound to fit the chemokine receptor binding site.
  • the crystal further comprises a chemokine receptor, a second polypeptide having amino acid sequence of a portion of chemokine receptor, an antibody or a Fab capable of binding to the chemokine receptor binding site or a compound known to be capable of binding to the chemokine receptor binding site, bound to the polypeptide.
  • This invention also provides the above-described methods, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
  • This invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
  • composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier.
  • This invention also provides the above-described methods, wherein the compound is not previously known.
  • This invention provides compounds identified by the above-described methods. This invention provides compounds designed by above-described methods.
  • composition comprising the above-described compounds and a suitable carrier.
  • this invention provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject, thereby inhibiting Human Immunodeficiency Virus infection.
  • This invention further provides a method of inhibiting the interaction of HIV-gp120 with chemokine receptor which comprises administering to a mammal a compound capable of disrupting two or more of the contacts between gp120 and chemokine receptor as set forth in Figure 55, thereby inhibiting the interaction of HIV-gp120 with chemokine receptor, with the proviso that the compound is not a chemokine receptor.
  • the compound is nonpeptidyl.
  • This invention further provides a method of inhibiting cell entry by HIV, comprising blocking or inhibiting the residues from 2 or more of the sets of the CCR5-binding residues set forth above, thereby inhibiting or preventing gp120 from binding to CCR5 and thereby inhibiting cell entry by HIV.
  • This invention also provides the above described method wherein 3 or more of the sets of the CCR5-binding residues set forth above are blocked or inhibited from interacting with CCR5.
  • This invention also provides the above described methods, wherein the blocking or inhibiting comprises contacting the CCR5-binding residues with an antibody.
  • This invention also provides the above-described methods, wherein the compound is nonpeptidyl.
  • This invention provides a substance mimicking the human immunodeficiency virus envelope glycoprotein gp120-binding region of CD4 wherein the size of a residue or analog thereof, corresponding to the phenylalanine at position 43 in the native CD4, is larger than the size of phenylalanine so as to fill the pocket on gp120 which extends beyond position 43 in the gp120/CD4 complex and increase the affinity for gp120.
  • residue or analog thereof includes amino acids (both individually and as part of a polypeptide chain), modified amino acids, amino acid analogs, and chemical compounds that can be substituted for the amino acids that ordinarily make up the CD4 polypeptide chain. (Also see the discussion of peptidomimetics, synthetic polypeptides, and polypeptide analogs below.)
  • This invention also provides the above-described substance, wherein the substance is a peptidomimetic analog, a synthetic polypeptide, a standard polypeptide, or a polypeptide analog.
  • the substance mimicking the gp120-binding domain of CD4 embraces a wide range of compounds.
  • the present invention also embraces other CD4 polypeptides such as polypeptide analogs of CD4.
  • Such analogs include fragments of CD4.
  • modifications of cDNA and genomic genes can be readily accomplished by well-known site-directed mutagenesis techniques and employed to generate analogs and derivatives of the CD4 polypeptide. Such products share at least one of the biological properties of CD4 but may differ in others.
  • products of the invention include those which are foreshortened by e.g., deletions; or those which are more stable to hydrolysis (and, therefore, may have more pronounced or longer-lasting effects than naturally-occurring products); or which have been altered to delete or to add one or more potential sites for O-glycosylation and/or N-glycosylation or which have one or more cysteine residues deleted or replaced by e.g., alanine or serine residues and are potentially more easily isolated in active form from microbial systems; or which have one or more tyrosine residues replaced by phenylalanine and bind more or less readily to target proteins or to receptors on target cells.
  • polypeptide fragments duplicating only a part of the continuous amino acid sequence or secondary conformations within gp120 which fragments may possess one property of gp120 and not others. It is noteworthy that activity is not necessary for any one or more of the polypeptides of the invention to have therapeutic utility or utility in other contexts, such as in assays of gp120 antagonism.
  • Competitive antagonists may be quite useful in, for example, cases of overproduction of gp120.
  • polypeptide analogs of the invention are reports of the immunological property of synthetic peptides which substantially duplicate the amino acid sequence extant in naturally-occurring proteins, glycoproteins and nucleoproteins. More specifically, relatively low molecular weight polypeptides have been shown to participate in immune reactions which are similar in duration and extent to the immune reactions of physiologically-significant proteins such as viral antigens, polypeptide hormones, and the like. Included among the immune reactions of such polypeptides is the provocation of the formation of specific antibodies in immunologically-active animals [Lerner et al., Cell, 23, 309-310 (1981); Ross et al., Nature, 294, 654-658
  • This invention also provides the above-described substances, wherein the modification increases the hydrophobicity or size of the residue or analog thereof at position 43.
  • This invention also provides the above-described substances, wherein the modification comprises directly or indirectly linking a hydrophobic compound to a residue or analog thereof at position 43 of the domain.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that is bulkier than phenylalanine.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 7 Å across its longest dimension.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 10 Å across its longest dimension.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 15 Å across its longest dimension.
  • This invention provides the above described substance which enhances hydrophobic interactions to residues that line the pocket. In another embodiment, this invention provides the above described substance which enhances hydrogen bonding to residues that line the pocket. In a separate embodiment, this invention provides the above described substance which enhances electrostatic interactions with residues that line the pocket. In a still separate embodiment, this invention provides the above described substance which enhances surface fit with residues that line the pocket.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof, wherein the residue's longest dimension is longer than phenylalanine's longest dimension.
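  • The size thresholds above refer to a residue's longest dimension. Given atomic coordinates for a residue or analog, one simple reading of that quantity is the largest distance between any two of its atom centers. The sketch below implements that reading. Ignoring atomic radii is an assumption of the example, and the coordinates are invented.

```python
import math
from itertools import combinations


def longest_dimension(coords):
    """Largest center-to-center distance between any two atoms, in the input units."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


def exceeds(coords, threshold):
    """True when the longest dimension is larger than `threshold`."""
    return longest_dimension(coords) > threshold


# Toy coordinates in angstroms, invented for illustration.
toy_residue = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 8.0)]

print(f"longest dimension = {longest_dimension(toy_residue):.2f} A")
for limit in (7.0, 10.0, 15.0):
    print(f"larger than {limit} A: {exceeds(toy_residue, limit)}")
```

The same function applied to a phenylalanine side chain from a real structure would give the reference length against which a modified residue could be compared.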
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains a localization of negative charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains a localization of charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
  • This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains at least one additional hydroxyl group.
  • Placing a tyrosine residue at position 43 is an example of a modification resulting in a residue that contains a hydroxyl group.
  • the oxygen of the hydroxyl group has a localization of negative charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
  • the hydrogen of the hydroxyl group has a localization of charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
  • This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a cysteine. This invention further provides that the substitution of the sulfhydryl group of this cysteine.
  • This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a tyrosine. This invention further provides that the substitution of this tyrosine
  • This invention also provides the above-described substances, wherein the modification comprises directly or indirectly linking an adaptor residue or analog thereof to position 43.
  • This invention also provides the above-described substances, wherein the adaptor residue or analog thereof is directly or indirectly linked to a hydrophobic compound, thus forming a complex.
  • This invention also provides the above-described substances, wherein the complex is bulkier than phenylalanine.
  • This invention also provides the above-described substances, wherein the complex is larger than 7 Å across its longest dimension.
  • This invention also provides the above-described substances, wherein the complex's longest dimension is longer than phenylalanine's longest dimension.
  • This invention also provides the above-described substances, wherein the complex is larger than 10 Å across its longest dimension.
  • This invention further provides a pharmaceutical composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the above-described substance; and (b) a pharmaceutically acceptable carrier.
  • the actual effective amount will be based upon the size of the polypeptide, the biodegradability of the polypeptide, the bioactivity of the polypeptide and the bioavailability of the polypeptide. If the polypeptide does not degrade quickly, is bioavailable and highly active, a smaller amount will be required to be effective.
  • the effective amount will be known to one of skill in the art; it will also be dependent upon the form of the polypeptide, the size of the polypeptide and the bioactivity of the polypeptide. Variants of CD4 with lower affinity for gp120 will require higher dosages than variants of CD4 with higher affinity for gp120.
  • One of skill in the art could routinely perform empirical activity tests to determine the bioactivity in bioassays and thus determine the effective amount.
  • a pharmaceutical composition for treating or preventing HIV infection comprising (a) an effective amount of the above-described substances; and (b) a pharmaceutically acceptable carrier.
  • This invention further provides a composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the above-described substances; and (b) a suitable carrier.
  • This invention further provides a pharmaceutical composition for treating or preventing HIV infection, comprising (a) an effective amount of the above-described substances; and (b) a pharmaceutically acceptable carrier.
  • This invention further provides a composition for treating or preventing HIV infection, comprising (a) an effective amount of the above-described substances; and (b) a suitable carrier.
  • This invention further provides a method of inhibiting cell entry by HIV, comprising contacting the cells with an effective amount of the above-described substances, thereby inhibiting cell entry by HIV.
  • This invention further provides a method of treating or preventing HIV infection in a subject, comprising administering to the subject an effective amount of the above-described substances, thereby treating or preventing HIV infection.
  • the invention provides a variant of gp120 which presents a hidden, conserved, neutralization epitope.
  • the amino acid of the above variant at position 375 is changed from a serine to a tryptophan.
  • the variant further comprises one of the following changes: 88N to P, 102E to L, 113D to R, 117K to W, 257T to A, 266A to E, 386N to Q, 395W to S, 421K to L, 470P to G, 475M to S, 485K to V, or a combination thereof.
  • This invention further provides a composition comprising the above-described variant and a suitable carrier.
  • composition as used herein means pharmaceutical compositions comprising therapeutically effective amounts of polypeptide products of the invention together with suitable diluents, preservatives, solubilizers, emulsifiers, adjuvants and/or carriers useful in therapy.
  • a “therapeutically effective amount” as used herein refers to that amount which provides a therapeutic effect for a given condition and administration regimen.
  • compositions are liquids or lyophilized or otherwise dried formulations and include diluents of various buffer content (e.g., Tris-HCl, acetate, phosphate), pH and ionic strength, additives such as albumin or gelatin to prevent adsorption to surfaces, detergents (e.g., Tween 20, Tween 80, Pluronic F68, bile acid salts), solubilizing agents (e.g., glycerol, polyethylene glycol), anti-oxidants (e.g., ascorbic acid, sodium metabisulfite), preservatives (e.g., Thimerosal, benzyl alcohol, parabens), bulking substances or tonicity modifiers (e.g., lactose, mannitol), covalent attachment of polymers such as polyethylene glycol to the protein, complexation with metal ions, or incorporation of the material into or onto particulate preparations of polymeric compounds such as polylactic acid.
  • compositions will influence the physical state, solubility, stability, rate of in vivo release, and rate of in vivo clearance of the administered materials.
  • the choice of compositions will depend on the physical and chemical properties of the protein having the biological activity. For example, a product derived from a membrane-bound form of the protein may require a formulation containing detergent.
  • Controlled or sustained release compositions include formulation in lipophilic depots (e.g., fatty acids, waxes, oils).
  • particulate compositions coated with polymers (e.g., poloxamers or poloxamines)
  • the variants coupled to antibodies directed against tissue-specific receptors, ligands or antigens or coupled to ligands of tissue-specific receptors are also comprehended by the invention.
  • Other embodiments of the compositions of the invention incorporate particulate forms, protective coatings, protease inhibitors or permeation enhancers for various routes of administration, including parenteral, pulmonary, nasal and oral.
  • suitable carriers means any of the standard carriers used in the pharmaceutical industry.
  • suitable carriers are well known in the art and may include, but are not limited to, any of the standard pharmaceutical carriers such as phosphate buffered saline solutions, phosphate buffered saline containing Polysorb 80, water, emulsions such as oil/water emulsions, and various types of wetting agents.
  • Other carriers may also include sterile solutions, tablets, coated tablets, and capsules.
  • Such carriers typically contain excipients such as starch, milk, sugar, certain types of clay, gelatin, stearic acid or salts thereof, magnesium or calcium stearate, talc, vegetable fats or oils, gums, glycols, or other known excipients.
  • Such carriers may also include flavor and color additives or other ingredients.
  • Compositions comprising such carriers are formulated by well known conventional methods.
  • This invention also provides a vaccine comprising the above-described variant.
  • a vaccine may further comprise a suitable adjuvant.
  • Vaccines and adjuvants are well-known to those skilled in the art. Using a vaccine, comprising adjuvants or not, one may induce or stimulate the immune response of an individual.
  • the immune response may vary, e.g. a humoral or cell-mediated immune response.
  • Adjuvants are chemical compounds that enhance the immunogenicity of the vaccine so as to enhance the stimulation and induction of the immune response.
  • the vaccine is administered to a subject.
  • subject means any animal or artificially modified animal capable of becoming HIV-infected. Artificially modified animals include, but are not limited to, SCID mice with human immune systems.
  • the subject is a human. In another embodiment, the subject is a human infected with HIV.
  • a "human infected with HIV" means an individual having at least one of his own cells infected by HIV.
  • an HIV-infected cell is a cell wherein HIV has been produced.
  • a non-HIV-infected subject means a subject not having any cells infected by HIV.
  • a non-HIV-infected subject is an HIV-exposed subject.
  • an HIV-exposed subject is a subject who has HIV present in his body, but has not yet become HIV-infected. For example, a subject may become HIV-exposed upon receiving a needle stick injury with an HIV-contaminated needle.
  • the value of crystals of gp120 extends beyond merely being able to obtain a structure for gp120.
  • the knowledge of the structure of gp120 provides a means of investigating the mechanism of action of these proteins in the body. For example, binding of these proteins to various receptor molecules can be predicted by various computer models. Upon discovering that such binding in fact takes place, knowledge of the protein structure then allows chemists to design and attempt to synthesize molecules which mimic the binding of gp120 to its receptors. This is the method of "rational" drug design. Using such methods, one may determine a variant of gp120 which presents a hidden, conserved, neutralization epitope.
  • This invention further provides an antibody induced by the above-described vaccine.
  • the antibody may be a polyclonal antibody or a monoclonal antibody.
  • an antibody comprises intact immunoglobulin molecules, substantially intact immunoglobulin molecules and those portions of an immunoglobulin molecule that contain the paratope, including those portions known in the art as Fab, Fab', F(ab')2 and F(v), which portions are preferred for use in the therapeutic methods described herein.
  • the antibody is a single-chain antibody.
  • polyclonal antibodies may comprise different sera, whereas a "monoclonal antibody" comprises antibodies each of which will recognize one single epitope. Methods for production of monoclonal antibodies are well-known in the art.
  • the gp120 structure of the invention permits the design and identification of synthetic compounds and/or other molecules which have a shape complementary to the conformation of the gp120 active site of the invention.
  • the coordinates of the gp120 structure of the invention may be provided in machine readable form, the test compounds designed and/or screened and their conformations superimposed on the structure of the invention. Subsequently, suitable candidates identified as above may be screened for the desired gp120 inhibitory bioactivity, stability, and the like.
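As an illustration of coordinates "in machine readable form", the sketch below reads one fixed-column ATOM record of the kind used in Protein Data Bank files. The choice of the PDB format is an assumption made for this example; the text does not name a file format. The sample record (serial number, chain letter G, and coordinates) is hypothetical and is generated in the code so that its column positions are explicit.

```python
def parse_atom_line(line):
    """Parse a PDB-style ATOM/HETATM record using fixed column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "xyz": (
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ),
    }


# Hypothetical record: a C-alpha atom of a serine numbered 375 in chain G.
sample_line = (
    "{:<6}{:>5d} {:<4}{:1}{:>3} {:1}{:>4d}{:1}   {:>8.3f}{:>8.3f}{:>8.3f}"
).format("ATOM", 1, " CA", "", "SER", "G", 375, "", 12.345, -6.789, 0.5)

atom = parse_atom_line(sample_line)
print(atom)
```

Coordinates parsed this way can be passed to whatever superposition or docking routine is used to compare test compounds with the structure.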
  • inhibitors may be used therapeutically or prophylactically to block gp120 activity. Such compounds may prove useful as vaccines.
  • This invention provides a vaccine comprising a polypeptide having 6 or more amino acids in the same spatial proximity to each other as the amino acids from the Phe 43 cavity of naturally occurring gp120.
  • This invention also provides the above-described vaccine, wherein the 6 or more amino acids are identical to the amino acids of naturally occurring gp120.
  • This invention further provides the above-described vaccines, wherein the amino acids are within 1 angstrom of their distances in naturally occurring gp120.
  • This invention also provides the above-described vaccines, wherein the amino acids are within 3 angstroms of their distances in naturally occurring gp120.
  • This invention provides the above-described vaccines, wherein the amino acids are within 5 angstroms of their distances in naturally occurring gp120.
  • This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
  • This invention further provides the above-described vaccines, further comprising a carrier.
  • This invention also provides the above-described vaccines, further comprising an adjuvant.
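The 1, 3 and 5 angstrom criteria above can be read as a tolerance on inter-residue distances. The sketch below assumes that reading: each residue is represented by one point (for example a C-alpha position), every pairwise distance in the candidate polypeptide is compared with the corresponding distance in the reference, and the largest deviation is tested against the tolerance. This interpretation, the function names, and the coordinates are assumptions; only three points are used to keep the example short, whereas the text refers to 6 or more residues.

```python
from itertools import combinations
from math import dist


def max_distance_deviation(reference, candidate):
    """Largest absolute difference between matching pairwise distances."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate need the same number of points")
    worst = 0.0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = dist(reference[i], reference[j])
        d_cand = dist(candidate[i], candidate[j])
        worst = max(worst, abs(d_ref - d_cand))
    return worst


def within_tolerance(reference, candidate, tolerance_angstrom):
    """True if every pairwise distance agrees within the tolerance."""
    return max_distance_deviation(reference, candidate) <= tolerance_angstrom


# Hypothetical residue positions in angstroms.
reference_points = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
candidate_points = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0), (0.0, 3.0, 0.0)]

print(max_distance_deviation(reference_points, candidate_points))
print(within_tolerance(reference_points, candidate_points, 1.0))
```

Comparing distances rather than raw coordinates makes the check independent of how the two structures are oriented, so no prior superposition is needed.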
  • This invention provides a vaccine comprising a polypeptide having 6 or more continuous amino acids from the Phe 43 cavity of gp120.
  • This invention provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
  • This invention also provides the above-described vaccines, further comprising a carrier.
  • This invention further provides the above-described vaccines, further comprising an adjuvant.
  • This invention further provides a vaccine comprising a polypeptide having 6 or more amino acids in the same spatial proximity to each other as the surface accessible amino acids adjacent to the Phe 43 cavity of naturally occurring gp120.
  • This invention also provides the above-described vaccines, wherein the 6 or more amino acids are identical to the amino acids of naturally occurring gp120.
  • This invention provides the above-described vaccines, wherein the amino acids are within 1 angstrom of their distances in naturally occurring gp120.
  • This invention also provides the above-described vaccines, wherein the amino acids are within 3 angstroms of their distances in naturally occurring gp120.
  • This invention further provides the above-described vaccines, wherein the amino acids are within 5 angstroms of their distances in naturally occurring gp120.
  • This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
  • This invention further provides the above-described vaccines, further comprising a carrier.
  • This invention also provides the above-described vaccines, further comprising an adjuvant.
  • This invention also provides the above-described vaccines, wherein the surface accessible amino acids comprise Lysine 432, Proline 369, and Threonine 373.
  • This invention further provides a vaccine comprising a polypeptide having 6 or more continuous surface accessible amino acids adjacent to the Phe 43 cavity of gp120.
  • This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
  • This invention further provides the above-described vaccines, further comprising a carrier. This invention also provides the above-described vaccines, further comprising an adjuvant.
  • Viruses target specific host cells for infection by attaching to cell-surface receptor molecules unique to these cells. These viral receptors have particular roles in the normal functioning of these cells; the virus simply subverts these functions in order to effect entry into the cell. Certain molecules on the viral surface can in turn be the target of antibodies raised by the host in defense against this parasitic attack. Viruses can evade such antibody immunity by mutating their surface proteins. The receptor binding site, however, must remain constant. It therefore evolves to be protected from antibody surveillance.
  • the viral surface protein, gp120 (which appears to be a trimer on the surface of the virion), plays a central role in immune evasion.
  • the precise mechanism of gp120 immune evasion thus far remains unknown, but the structure of the gp120-CD4-Fab 17b complex reveals several crucial features:
  • 1. The CD4 binding site is very large (larger than the typical antibody footprint).
  • 2. The V1/2 variable loop is oriented to mask the CD4 binding site.
  • 3. The V3 variable loop is not near the CD4 binding site (on a monomer), but the tip of this loop could interact with Fab 17b, which marks the second receptor binding site.
  • 4. The CD4 binding site undergoes conformational changes upon CD4 binding.
  • the V1/2 loop occludes the CD4 binding site and allows CD4 binding. With most viruses, which bind to rare cellular receptors, such a mechanism of immune evasion would not work; the virus would not find the proper receptor at high enough frequency to ensure viral propagation. It is the clustering of CD4-positive cells in such places as the thymus which allows this mechanism to function in the particular case of HIV. 2. The virus masks constant regions involved in both CD4 and second receptor binding; the act of CD4 binding induces conformational changes in gp120 which unmask these regions. 3. The V3 loop, which forms part of the conserved second receptor binding site, is one of the regions unmasked by CD4 binding.
  • This invention uses an antigen which mimics the conformation of gp120 on the surface of the HIV virion, with deletions in the variable loop regions to expose the conserved CD4 binding site. It is already known that CD4-binding-site antibodies are widely neutralizing and, moreover, are found in virtually all patients (although they tend to be found only late in the course of infection; the initial antibodies produced early in the course of infection have the V1/2 or V3 loop as epitopes).
  • This invention provides a vaccine composed of a stabilized oligomer of gp120, with truncations in the variable loop regions to expose the conserved CD4 binding site, which would elicit widely neutralizing antibodies against HIV.
  • a gp140 construct (the extracellular portion of gp120 + gp41) with a mutation at the gp120/gp41 consensus cut site. 4. Trimers of GCN4 have been shown to enhance oligomerization. These oligomerization stabilizers could be added to the C-terminal tail of gp120.
  • Stabilization of kinetically hidden epitopes of gp120 uses gp120 which has been stabilized to elicit an immune response. gp120 may undergo conformational changes. However, only very few of these conformations expose a conserved neutralization epitope. This invention aims at using the information from the structure of gp120 to stabilize the hidden neutralization epitope of gp120. Specifically, the epitope may be stabilized by mutating gp120 or, alternatively, by ligand/drug interaction.
  • Specific examples are illustrated below: Example 1:
  • the pocket of gp120 (Figure 51) only forms upon CD4 binding. If the residues lining the pocket are mutated so that the pocket is filled, gp120 may become "stuck" in the CD4-bound conformation even without bound CD4.
  • Such mutations may include changing Ser375 to Trp375, Val255 to Phe255 and Thr257 to Trp257.
  • the residues which line the pocket include: Trp 112, Leu 116, Pro 118, Phe 210, Val 255, Thr 257, Ser 375, Asn 377, Phe 382, Ile 424, Met 426, Trp 427, Asn 428, Ala 433, Gly 473 and Met 475.
  • disulfide bridges which tie protein domains.
  • the topology for gp120 (Δ82, ΔV1/2, ΔV3, ΔC5) is shown in Figure 52. One can see the two domains, the N/C termini including β1, and a barrel around α2. A disulfide formed between β2 and β21 will tie the protein domains. As another example, a disulfide bridge may be formed between the β5-β6 connection and the top of the barrel (e.g. β10).
  • Example 3: Cavities internal to gp120 may be identified once the three-dimensional structure of gp120 is known. Analysis of all atoms within 4 Å of the surface defining each cavity allows mutations to be designed to determine whether any large substitutions are tolerated. Some examples of the analysis are shown below:
  • Val225 to Trp: not as good as 375 (below); modeling shows some clashes with Met 475, although 475 should be able to move.
  • Ser375 to Trp: good fit (Note: the Ser375 mutation is incompatible with the Val225 mutation, so only one can be made at a time).
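The cavity analysis described above starts by collecting the atoms that lie within 4 angstroms of the surface defining a cavity. A minimal sketch of that selection step follows, assuming the cavity surface is represented by a set of sample points; how the cavity surface was actually computed is not stated in the text. The atom labels and all coordinates are hypothetical.

```python
from math import dist


def atoms_near_cavity(atoms, cavity_points, cutoff=4.0):
    """Return labels of atoms within `cutoff` angstroms of any cavity point.

    `atoms` is a list of (label, (x, y, z)) tuples.
    """
    near = []
    for label, xyz in atoms:
        if any(dist(xyz, point) <= cutoff for point in cavity_points):
            near.append(label)
    return near


# Hypothetical cavity surface sample points and atom positions (angstroms).
cavity_surface = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
example_atoms = [
    ("Ser 375 OG", (1.0, 0.0, 0.0)),
    ("Trp 427 CZ2", (0.0, 3.5, 0.0)),
    ("distant atom", (9.0, 9.0, 9.0)),
]

lining_atoms = atoms_near_cavity(example_atoms, cavity_surface)
print(lining_atoms)
```

The residues found this way are the candidates for substitution by larger side chains, which would then be examined by modeling for clashes.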
  • the wild-type phenotype is 1.00, and a decrease in recognition gives a value below 1.
  • the cavity-filling mutant 375S/W clearly exhibits reduced binding of CD4-BS antibodies. While the data look good, two of the CD4-BS antibodies (15e and 21h) still bind with reasonable affinity.
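The scale described above, on which wild type scores 1.00 and reduced recognition scores below 1, can be sketched as a simple ratio of the mutant signal to the wild-type signal. This is an assumption about how the values were derived; the text does not describe any further normalization (for example for expression level). The signal values below are hypothetical, and only the antibody names 15e and 21h come from the text.

```python
def relative_recognition(mutant_signal, wild_type_signal):
    """Express antibody recognition of a mutant relative to wild type (1.00)."""
    if wild_type_signal <= 0:
        raise ValueError("wild-type signal must be positive")
    return mutant_signal / wild_type_signal


# Hypothetical raw binding signals as (wild type, mutant), in arbitrary units.
raw_signals = {
    "15e": (2.0, 1.5),
    "21h": (4.0, 3.0),
    "hypothetical CD4-BS antibody": (2.0, 0.5),
}

relative = {
    antibody: relative_recognition(mutant, wild_type)
    for antibody, (wild_type, mutant) in raw_signals.items()
}
print(relative)
```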
  • the basic idea behind the cavity-filling mutants is to stabilize the CD4-bound conformation of gp120 at the expense of the CD4-free conformation. Additional substitutions may then be made in combination with 375S/W, for example known mutations which exhibit phenotypes similar to 375S/W (see Thali et al. (1995), J. Virol. 67, 3978-3988).
  • 113D/R: The aspartic acid stabilizes the bridging sheet residues Gln428 and Lys429, which are important for maintaining the CD4-bound conformation of gp120.
  • 117K/W: The lysine helps stabilize the bridging sheet conformation, but this substitution may also affect CCR5 binding, so it may not be so good a choice.
  • 257T/A: Since this Thr is basically buried in the CD4-bound conformation, and indeed provides stabilizing hydrogen bonds, the only way to explain the phenotype of the T/A substitution is that Thr257 must be a critical element in maintaining the CD4-minus conformation. The residue is quite close to 375W, so there may be some complications. If one places 375W in its preferred rotamer conformation, it clashes with Thr257 (the mutation is accommodated by a slight change in rotamer conformation or by movement of the 375 backbone). So the T/A substitution may actually help to accommodate the 375W change.
  • 470P/G and 475M/S: Both of these are close to the CD4-binding region, although both are buried and do not interact directly with CD4. Both retain good CD4 binding, so the effect may be conformational.
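The substitutions above are written in a compact form: residue number, wild-type residue, a slash, and the replacement residue (for example 375S/W). The sketch below parses that notation into its parts and spells it out with three-letter residue names. The helper names are hypothetical, and the parser assumes single-letter codes for the 20 standard amino acids.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Residue number, wild-type letter, slash, replacement letter (e.g. "375S/W").
SUBSTITUTION = re.compile(r"^(\d+)([A-Z])/([A-Z])$")


def parse_substitution(code):
    """Split a code such as '375S/W' into (position, wild_type, replacement)."""
    match = SUBSTITUTION.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution: {code!r}")
    position = int(match.group(1))
    wild_type, replacement = match.group(2), match.group(3)
    if wild_type not in THREE_LETTER or replacement not in THREE_LETTER:
        raise ValueError(f"unknown residue letter in {code!r}")
    return position, wild_type, replacement


def describe(code):
    """Render a code such as '257T/A' as 'Thr257 to Ala'."""
    position, wild_type, replacement = parse_substitution(code)
    return f"{THREE_LETTER[wild_type]}{position} to {THREE_LETTER[replacement]}"


codes = ["375S/W", "113D/R", "117K/W", "257T/A", "470P/G", "475M/S"]
for code in codes:
    print(code, "->", describe(code))
```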
  • This invention further provides vaccine design based upon conformational stabilization using the three-dimensional structure. See, e.g., Malakauskas and Mayo, Nature Structural Biology, Vol. 5, p. 470-475, entitled "Design, Structure and Stability of a hyperthermophilic protein variant," the content of which is incorporated into this application by reference.
  • HIV-1 human immunodeficiency virus
  • Sources of likely conformational heterogeneity such as N-linked carbohydrates, flexible or mobile N- and C-termini, and variable internal loops were reduced or eliminated, and ligands such as CD4 and antigen-binding fragments (Fabs) of monoclonal antibodies were used to restrict conformational mobility as well as to alter the crystallization surface.
  • Crystallization by variation and modification. For the more difficult crystallization challenges, which can be defined as those for which conventional screening fails, one typically tries to vary or modify the protein while maintaining biologically important properties. Meaningful results obtain since the integrity of internal structure and functional properties can often tolerate variation at the molecular surface where lattice contacts are made. The probability for success in crystallization is enhanced because flexible or heterogeneous surface features may be removed or because of the fortuitous introduction of lattice interactions.
  • a prescient example that pre-dates the powerful methods of modern molecular biology was John Kendrew's screening of myoglobins from many different organisms until he found one, from sperm whale, that crystallized well (5).
  • human myoglobin requires a Lys to Arg substitution in order to produce crystals suitable for structural analysis (6).
  • crambin forms exceptionally well-ordered crystals despite being a mixture of two isoforms with sequence variation at internal residues (7).
  • HIV-1 induces acquired immunodeficiency syndrome (AIDS) in humans (17,18).
  • the gp120 glycoprotein helps to mediate virus entry into cells through sequential recognition of two cellular receptors, the surface glycoprotein CD4 (19,20) and a chemokine receptor (primarily CXCR4 or CCR5, depending on viral strain) (21-26).
  • Because gp120 is the major virus-specific antigen accessible to neutralizing antibodies, knowledge of the gp120 structure could also have a considerable impact on vaccine design. Despite this interest, and considerable effort over several years with pure soluble protein (available in quantity as a byproduct, in part, of vaccine trials), gp120 has resisted crystallographic analysis.
  • the mature gp120 glycoproteins of different HIV-1 strains typically have 470-490 amino acid residues (32).
  • Extensive N-linked glycosylation at 20-25 sites accounts for roughly half of the gp120 mass (32,33).
  • Sequences from many different viral isolates show that gp120 has five variable regions (V1-V5) interspersed between relatively conserved regions (C1-C5) (32,34) and nine conserved disulfide bridges (33).
  • proteolytic digestion does not reveal a sub-domain structure. Indeed, even after extensive proteolytic cleavage, the unreduced protein runs near its native molecular weight on SDS-PAGE (PDK, unpublished data).
  • the gp120 glycoprotein likely exhibits conformational flexibility. Some of the variable regions, the V2 and V3 loops in particular, are known to be exposed on the surface of the native protein and probably assume multiple conformations. The potential of gp120 to undergo conformational change is also evidenced by shedding, the CD4-induced dissociation of gp120 from the surface of the virus, by ligand-induced variations in monoclonal antibody binding (35,36), and by complex CD4-gp120 binding kinetics (37). These changes may be related to the functional role of gp120 in virus entry.
  • protein ligands namely CD4 and the Fab fragments of several monoclonal antibodies
  • CD4 and the Fab fragments of several monoclonal antibodies were used to restrict conformational mobility.
  • Progressive trials of 18 different gp120 crystallization variants yielded six different crystals, at least one of which is suitable for structural analysis.
  • This paradigm of crystallization, with a focus on protein modification rather than on crystallization screening, may aid in the structural analysis of other conformationally complex proteins.
  • H is defined as the homogeneous fraction of the surface.
  • the probability that both are homogeneous is related to $[(H-\varepsilon_1)^2 \times (H-\varepsilon_2)^2]$, where $H$ is the homogeneous fraction of the surface which may form lattice contacts and $\varepsilon_n$ is a function of the relative size and total number of lattice contacts other than contact $n$ and of the degree and distribution of surface homogeneity; it is related to the occlusion of available surface area upon formation of each lattice contact as well as to the spatial distribution of homogeneous surface over the molecular surface.
  • the probability associated with homogeneous lattice formation is related to:
  • the observed average value of $C$ ($C_{ave}$) is ~4.5 (39), with a minimum theoretical value for the most common space groups of 2 or 3 (39). Since $C$ may be relatively small, lattice contacts may make up only a small proportion of a macromolecule surface, with considerable surface heterogeneity tolerated. Thus, for example, many proteins that pack into well-ordered crystal lattices have disordered regions, with N- and C-termini as well as internal loops being unresolved.
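The two-contact expression given above, a product of squared terms of the form (H minus a per-contact correction), can be evaluated numerically. The sketch below assumes the same form extends to any number of lattice contacts, with one squared term per contact; that generalization is an assumption, since the general equation is not legible in this text. The result is a relative score, not an absolute probability, because the text only says the probability "is related to" this expression. All numerical values are hypothetical.

```python
def homogeneous_lattice_score(h, corrections):
    """Product over lattice contacts of (h - correction_n) squared.

    `h` is the homogeneous fraction of the surface (0 to 1) and
    `corrections` holds one per-contact correction term per lattice contact.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie between 0 and 1")
    score = 1.0
    for correction in corrections:
        term = h - correction
        if term < 0.0:
            raise ValueError("correction larger than homogeneous fraction")
        score *= term ** 2  # two surfaces take part in each contact
    return score


# Hypothetical values: 80% homogeneous surface, correction of 0.1 per contact.
two_contacts = homogeneous_lattice_score(0.8, [0.1, 0.1])
four_contacts = homogeneous_lattice_score(0.8, [0.1, 0.1, 0.1, 0.1])
print(two_contacts, four_contacts)
```

Under these assumptions the score falls quickly as the number of contacts grows, which is why a small number of contacts tolerates considerable surface heterogeneity.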
  • Equation 4 is still not very useful, however, since M (homogeneous) is unknown and molecule-specific. In reducing heterogeneity, however, it seems reasonable to assume that the removed portion, if it were a highly branched carbohydrate or a proteolytically exposed region, is completely heterogeneous. In such cases,
  • Another variant of Equation 4 can be used to estimate the impact of adding a ligand of fixed structure to a molecule that contains heterogeneous portions. This expands the surface available for lattice contacts and effectively dilutes the heterogeneous component. It may be an approach of choice when the heterogeneity is essentially unremovable, such as at the lipid interface of detergent solubilized membrane proteins.
  • the enhancement in overall probability for successful crystallization from a set of n variants can then be calculated relative to the probability for a single variant. If we assume that the probability for crystallization of this individual variant, $i$, is typified by the average for all variants, $P_i \approx P_{ave}$, the enhancement factor is
  • the enhancement is inversely related to the average probability of crystallizing a single variant: $E \sim (P_{ave})^{-1}$
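The relationship between the number of variants and the chance of obtaining at least one crystal can be illustrated numerically. The enhancement equation itself is not legible in this text, so the sketch below is an assumption rather than a reconstruction: it treats the variants as independent trials that each succeed with the average probability, and defines the enhancement as the probability that at least one of n variants crystallizes divided by the single-variant probability. Under that assumption the enhancement approaches the inverse of the average probability as n grows, which agrees with the inverse relationship stated above. The value 0.1 is hypothetical; 18 is the number of variants tried in this work.

```python
def enhancement_factor(p_ave, n_variants):
    """Ratio of P(at least one of n variants crystallizes) to P(single variant).

    Assumes independent variants that each succeed with probability p_ave.
    """
    if not 0.0 < p_ave <= 1.0:
        raise ValueError("p_ave must be in (0, 1]")
    if n_variants < 1:
        raise ValueError("n_variants must be at least 1")
    p_at_least_one = 1.0 - (1.0 - p_ave) ** n_variants
    return p_at_least_one / p_ave


# Hypothetical single-variant probability of 0.1, with 18 variants.
example_enhancement = enhancement_factor(0.1, 18)
print(example_enhancement)        # enhancement for 18 variants
print(1.0 / 0.1)                  # limiting value for very many variants
```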
  • Fab fragments were produced by papain digestion of monoclonal antibodies. Briefly, the antibody was reduced in 100 mM DTT, 100 mM NaCl, 50 mM Tris pH 8.0 for 1 hr at 37°C, and dialyzed (4°C), first in phosphate-buffered saline (PBS) to reduce the DTT concentration to <1 mM, then in alkylating solution (PBS titrated to pH 7.5 with 2 mM iodoacetamide, 48 hr), and subsequently in PBS without iodoacetamide.
  • the reduced and alkylated antibody was concentrated to at least 2 mg/ml and digested with papain using a commercial protocol (Pierce).
  • the gp120 proteins were subjected to digestion with papain, elastase, and subtilisin (Boehringer Mannheim) to assay for proteolytic susceptibility.
  • the gp120 concentration was kept constant and the protease diluted serially (3.3x) from a ratio of 1:10 to 1:1000.
  • the digestion mix was incubated for 1 hr at 37°C and quenched by addition of 1% SDS (1:10 ratio) with immediate heating in boiling water for 2 minutes. Digestion products were analyzed with SDS-polyacrylamide gel electrophoresis (PAGE) with and without DTT reduction.
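The protease dilution series described above (3.3-fold steps from a 1:10 to a 1:1000 ratio) can be tabulated as follows. The number of steps is not stated in the text; five points are assumed here because four 3.3-fold steps from 1:10 reach roughly 1:1000, 3.3 being close to the square root of 10. The function name is hypothetical.

```python
def dilution_series(start_ratio=10.0, factor=3.3, points=5):
    """Return the protease:gp120 ratio denominators for a serial dilution."""
    return [start_ratio * factor ** step for step in range(points)]


series = dilution_series()
for denominator in series:
    print(f"1:{denominator:.0f}")
```

The last point falls slightly beyond 1:1000 because 3.3 is marginally larger than the square root of 10.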
  • Carboxypeptidase Y digestion was used to analyze the C-terminus of gp120. A 1:10 ratio of carboxypeptidase Y (Boehringer Mannheim) to gp120 was incubated for 1 hr at 37°C, pH 7.0. Even though digestion could not be easily seen by SDS-PAGE, the C-terminus of gp120, HXBc2 strain, contains a number of positively charged amino acids, and the extent of the reaction could be monitored by native-PAGE.
  • Drosophila-produced gp120 proteins were deglycosylated enzymatically. Briefly, 0.5 mg/ml of gp120 was incubated with various deglycosylating enzymes (singly or in combination) in 0.5 M NaCl, 100 mM Na acetate, pH 5.7, for 10 hr at 37°C.
  • Endoglycosidase D was used at a concentration of 0.1 U/ml, Endoglycosidase F at 0.25 U/ml, Endoglycosidase H at 0.25 U/ml, and Glycopeptidase F at 0.1 U/ml (all from Boehringer Mannheim).
  • Monoclonal antibody binding assay. The various gp120 glycoproteins were assessed for recognition by a variety of monoclonal antibodies directed against both linear and discontinuous gp120 epitopes by either immunoprecipitation (46) or ELISA (47).
  • the ELISA was performed with both fully glycosylated and deglycosylated ΔV1/2ΔV3 glycoproteins immobilized on ELISA plates using a capture antibody specific for the gp120 carboxyl-terminus, 6205 (International Enzymes) (47).
  • Crystallization. The vapor-diffusion hanging-droplet technique was used for all crystallizations. Small volumes, 0.5 µl protein solution + 0.5 µl reservoir solution, were used for most crystallizations, screenings and final optimizations.
  • Crystal Screen I (Hampton Research) was used, augmented by approximately 20 conditions which tested high protein concentrations (vapor diffusion concentration of the protein at various pH values) as well as mixtures of organic additives (2-5% MPD, PEG 400, or PEG 4000) combined with high ionic strength (2-4 M NaCl, (NH4)2SO4 or Na/KPO4) at pH 5.5-9.5.
  • a subset of 12 different conditions was analyzed in depth to establish the approximate precipitation point of the protein for a variety of different precipitants.
  • the factorial solutions were then individually adjusted to target the observed precipitation point and a full screen of ~70 conditions was set up at 20°C.
  • Type E crystals were grown from the following conditions: Protein (Δ82ΔV1/2*ΔV3ΔC5 gp120, two-domain CD4 (D1D2), Fab 17b, purified as a ternary complex on the Superdex S-200); Droplet (0.5 µl protein solution consisting of ~10 mg/ml protein in gel filtration buffer + 0.4 µl droplet mix containing 0.1 M NaCitrate, 0.02 M NaHepes, 10% isopropanol, 10.5% PEG 5000 monomethylether (Fluka), 0.0075% SeaPrep Agarose (FMC BioProducts), pH 6.4); Reservoir (0.35 M NaCl, 0.1 M NaCitrate, 0.02 M Hepes, 10% isopropanol, 10.5% PEG 5000 monomethylether, pH 6.4).
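The starting composition of the Type E droplet follows from the two volumes that are mixed (0.5 µl protein solution and 0.4 µl droplet mix). The sketch below computes the initial concentrations immediately after mixing. It ignores whatever the gel filtration buffer contributes, since that buffer's composition is not given here, and it does not model the later equilibration against the reservoir. The dictionary keys are hypothetical labels.

```python
def diluted(stock_value, stock_volume_ul, total_volume_ul):
    """Concentration of a component after dilution into the mixed droplet."""
    return stock_value * stock_volume_ul / total_volume_ul


protein_volume_ul = 0.5
mix_volume_ul = 0.4
total_volume_ul = protein_volume_ul + mix_volume_ul

# Approximate protein stock concentration (about 10 mg/ml in the text).
protein_initial_mg_ml = diluted(10.0, protein_volume_ul, total_volume_ul)

# Droplet mix components as listed for the Type E crystals.
droplet_mix = {
    "NaCitrate (M)": 0.1,
    "NaHepes (M)": 0.02,
    "isopropanol (%)": 10.0,
    "PEG 5000 MME (%)": 10.5,
    "SeaPrep agarose (%)": 0.0075,
}

initial_drop = {
    name: diluted(value, mix_volume_ul, total_volume_ul)
    for name, value in droplet_mix.items()
}

print(f"protein: {protein_initial_mg_ml:.2f} mg/ml")
for name, value in initial_drop.items():
    print(f"{name}: {value:.4g}")
```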
  • Variant constructs of the gp120 protein. Variants of gp120 were developed through an iterative cycle which strove to eliminate heterogeneity.
  • the cycle involved recombinant production of gp120 variants, deglycosylation, and then assessment of heterogeneity and flexibility by examinations of glycosylation status, monoclonal antibody binding, and protease sensitivity, leading to the design of new constructs. For example, protease digestion monitored by PAGE indicated susceptibility at the C-terminus, and a form with 15-20 residues removed by carboxypeptidase Y retained CD4 binding activity.
  • a homogeneous product was difficult to make by this method, and primer-based PCR mutagenesis and recombinant expression were used to generate a homogeneous gp120 derivative with a 19-residue C-terminal deletion.
  • sequencing of the initial constructs showed the expected signal cleavage at +31, with four additional amino acids, Gly-Ala-Arg-Ser, added from the signal peptide (a consequence of different processing of the cloning vector signal peptide with gp120).
  • Protease digestion gave a product at +40, indicating flexibility in the N-terminus.
  • V1, V2, and V3 were deleted and replaced with shorter segments, as reported earlier (52,53). Little effect was found on CD4 binding activity (47,53).
  • Three constructs were made which contained deletions of the V1, V2, and V3 loops (Table 2).
  • in the ΔV1/2ΔV3 construct, the entire base and stem of the variable loops V1, V2 and V3 were excised.
  • the conserved stem of the V1/V2 stem-loop structure was retained, restoring the CD4-induced antibody epitopes in the presence of soluble CD4.
  • the base of the V3 loop was retained as well, fully restoring CD4-induced antibody epitopes, even in the absence of soluble CD4.
  • Mass spectroscopy of the deglycosylated Δ82ΔV1/2*ΔV3ΔC5 gp120 showed a molecular mass of 39,000 ± 50 Da, consistent with a mass of 35.4 kDa for the protein (based on the DNA sequence) and 3.6 kDa for the remaining carbohydrate.
  • Carbohydrate analysis showed only fucose and N-acetyl-glucosamine sugars to be present, in a ratio of 1:3.05 ± 0.02, respectively.
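  • The mass balance reported for the deglycosylated protein can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted from the mass spectrometry result; the variable names are illustrative.

```python
# Sketch only: does protein mass + residual carbohydrate match the measured mass?
protein_da = 35_400       # 35.4 kDa, from the DNA sequence (from the text)
carbohydrate_da = 3_600   # 3.6 kDa, remaining carbohydrate (from the text)
measured_da = 39_000      # measured molecular mass (from the text)
tolerance_da = 50         # stated measurement uncertainty (from the text)

expected_da = protein_da + carbohydrate_da
is_consistent = abs(expected_da - measured_da) <= tolerance_da

print(f"expected {expected_da} Da, measured {measured_da} Da, consistent: {is_consistent}")
```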
  • CD4 has a flexible juncture between the second and third extracellular domains (54), and Fabs have a conformationally mobile "elbow bend" between their variable and constant domains (55) .
  • Crystallization was originally devised as a method for deducing the essential crystallization factors from combinations of different conditions (1) .
  • a high probability of success has been reported with as few as 6 different conditions at 4 different concentrations (56) , and commercial kits are available with 50-100 conditions (Hampton Research) .
  • small volume droplets were used, typically 0.5 µl of protein per crystallization trial. With small volumes, 1-2 mg of protein was sufficient to evaluate each gp120 crystallization variant. Smaller volumes were also more efficient at nucleation than larger droplets, perhaps due to higher surface tension effects which may result in a greater range of precipitant concentrations for each droplet to sample. Indeed, droplets that were "spread-out" also showed enhanced nucleation. This explanation may also account for the well-known observation that crystals frequently nucleate from the edges of crystallization droplets.
  • the initial crystallization screens produced six different types of crystals (Fig. 1, Table 5) .
  • crystal types A-D extensive optimization was unable to produce single crystals large enough to be characterized.
  • crystal types E and F single crystals of needle morphology could be grown.
  • the growth of single crystals of type E required the addition of agarose, which was identified during optimization by the additive screening process.
  • further crystallization optimization failed to produce large single crystals, and the best typical crystals were rods with a cross-section of only 30 x 40 µm.
  • a closely related crystallization variant which retained 10 additional amino acids in the stem of the V3 loop, failed to crystallize (Table 4) .
  • Characteristics of gp120 crystals. Single crystals of type E and F were analyzed for diffraction in capillary mounts. Only type E crystals showed diffraction. The needle axis of type E crystals proved to coincide with the a axis, and the rhombohedral cross-section perpendicular to the needle axis proved to be bounded by faces of the form (0 1 1). These could be distinguished from type F crystals, where the cross-section was hexagonal. Gel electrophoresis of type E crystals demonstrated that they contained all the elements of the ternary complex: gp120, D1D2, and Fab 17b (Fig. 2).
  • the resistance of gp120 to crystallization may be related in part to its functional role in eluding the immune system; the mechanisms evolved to prevent the formation of specific immune system:gp120 contacts might also thwart formation of the homogeneous gp120:gp120 contacts needed for crystallization.
  • the protein modifications that most greatly reduced heterogeneity (and thus enhanced the crystallization probability), removal of carbohydrate and substitution of the variable loops (Table 3), have been shown in vivo to enhance the generation of neutralizing antibodies (58,59). It is difficult to evaluate the predictions of the crystallization algorithms derived here in a statistically significant manner.
  • crystallization literature is replete with examples of protein manipulation, from proteolytic digestion, to variation in solvating detergent, to screening of DNA oligonucleotides (38). What distinguishes our efforts is the derivation of a theoretical foundation, which allows the probabilistic assessment of the most effective crystallization approach. Because of the conformational complexity of gp120, we focused on surface modification to eliminate heterogeneity and to present new crystallization variants, coupled to a limited screen of crystallization conditions. The types of crystallization problems embodied in gp120 (Table 3) are not so different from many of the typical problems facing present day crystallographers; from both a theoretical and a practical perspective, the strategy of probability analysis coupled to variational crystallization may be broadly applicable.
  • Solubility: requires ~300 mM NaCl for solubility (++)
  • (+) refers to almost no change in probability after optimization, whereas (+++++) refers to a large change in probability.
  • the scale used here is a qualitative estimate; for more quantitative results, see Table 3.
  • optimization refers to the effect on crystallization of making the protein more chemically homogeneous.
  • optimization refers to the effect of removing or circumventing the particular source of heterogeneity. Table 2.
  • ΔV1/2ΔV3 and ΔV1/2ΔV3ΔC5 constructs were chimeras of strains BH10 and HXBc2. Sequence numbers refer to the translated gp160, with the mature gp120 beginning at +31. N-terminal sequencing showed that all constructs contained 4 additional amino acids, Gly-Ala-Arg-Ser, an artifact of the signal peptide cleavage. GAG here refers to the tripeptide, Gly-Ala-Gly, which was substituted for the removed amino acids.
  • Conformational heterogeneity: conformation restriction with protein ligands such as CD4 and Fabs from conformationally sensitive monoclonal antibodies
  • the molecular weight for the glycosylated gp120 is approximately 90 kDa; the deglycosylated gp120, 60 kDa; and the deglycosylated
  • D1D2 sCD4 refers to two-domain soluble CD4. Antibody epitopes are described in the text.
  • the protein concentration is given as the absorbance (280 nm) of the complex per ml of solution.
  • Crystallization reagents are conditions from Crystal Screen 1 (Hampton Research); the reagent numbers given here refer to the crystallization reagent from this commercial kit.
  • Hanging droplets were 0.5 µl protein (in 0.35 M NaCl, 5 mM Tris pH 7.0, 0.02% NaN3) + 0.5 µl reservoir, except for crystal type B, which used 0.5 µl of 3-fold diluted reservoir.
  • Crystallization reservoirs were 500 µl; an additional 35 µl of 5 M NaCl was added after the droplet was mixed to compensate for the NaCl in the protein solution. All dilutions used H2O, except for crystal type F, where 22.5% isopropanol was used. Crystallizations were set up at room temperature and incubated at 20 °C.
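  • The NaCl compensation step can be illustrated with a short calculation. This is a minimal sketch assuming ideal mixing; it compares the nominal value (ignoring the added volume) with the value after accounting for the 35 µl addition. Variable names are illustrative.

```python
# Sketch only: NaCl added to the reservoir by the 5 M NaCl spike.
reservoir_ul = 500.0   # reservoir volume (from the text)
spike_ul = 35.0        # volume of NaCl stock added (from the text)
spike_molar = 5.0      # concentration of NaCl stock (from the text)

# Nominal estimate: ignore the small increase in reservoir volume
nominal_added_molar = spike_molar * spike_ul / reservoir_ul
# Volume-corrected estimate: dilute into the combined 535 µl
corrected_added_molar = spike_molar * spike_ul / (reservoir_ul + spike_ul)

print(f"nominal NaCl added:   {nominal_added_molar:.3f} M")
print(f"corrected NaCl added: {corrected_added_molar:.3f} M")
```

  • The nominal figure equals the 0.35 M NaCl quoted for the protein solution, which is consistent with the stated purpose of the addition.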
  • PROTEIN 4 CONC RESERVOIR SOLUTION" µl P µl R FK-ORE
  • PS Factorial #28 0.5 0.5 14 200 µl factorial/ 400 µl total volume (2.0 dilution) 1.36
  • PS Factorial #35 0.5 0.5 15 200 µl factorial/ 700 µl total volume
  • PS Factorial #12 .05 0.5 23 200 µl factorial/ 300 µl total volume
  • PS Factorial #29 .05 0.5 24 200 µl factorial/ 500 µl total volume
  • the final volume was made up by water, if there is a volume discrepancy.
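  • The reservoir dilutions in the rows above can be expressed as total volume divided by factorial volume, which reproduces the "2.0 dilution" quoted for Factorial #28. The sketch below applies the same ratio to the other rows; those values are computed here under that assumption and are not stated in the source.

```python
# Sketch only: dilution factor = total reservoir volume / volume of factorial solution.
# Keys are the factorial reagent numbers; values are (factorial µl, total µl) from the rows.
reservoirs = {
    28: (200, 400),
    35: (200, 700),
    12: (200, 300),
    29: (200, 500),
}

dilution_factor = {
    reagent: total_ul / factorial_ul
    for reagent, (factorial_ul, total_ul) in reservoirs.items()
}

for reagent, factor in dilution_factor.items():
    print(f"Factorial #{reagent}: {factor:.1f}-fold dilution")
```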
  • the human immunodeficiency viruses HIV-1 and HIV-2 and simian immunodeficiency viruses (SIVs) are the etiologic agents of acquired immunodeficiency syndrome (AIDS) in their respective human and simian hosts (1).
  • infection with primate immunodeficiency viruses is characterized by an initial phase of high-level viremia, followed by a long period of persistent virus replication at a lower level (2) .
  • Viral persistence occurs despite specific antiviral immune responses, which include the generation of neutralizing antibodies.
  • the primate immunodeficiency viruses, like all retroviruses, are surrounded by an envelope consisting of a host cell-derived lipid bilayer and virus-encoded envelope glycoproteins (3).
  • the viral membrane must be fused with the plasma membrane of the cell, a process mediated by the envelope glycoproteins.
  • the exposed location of these proteins on the virus allows them to carry out their function but also renders them uniquely accessible to neutralizing antibodies.
  • dual selective forces, virus replication and immune pressure have shaped the evolution of the envelope glycoproteins and continue to do so within each infected host.
  • the envelope glycoproteins are synthesized as an approximately 845-870 amino acid precursor in the rough endoplasmic reticulum. N-linked, high-mannose sugar chains are added to form the gp160 glycoprotein, which assembles into oligomers (4-6). The preponderance of evidence suggests that these oligomeric complexes are trimers (4,5).
  • the gp160 trimers are transported to the Golgi apparatus, where cleavage by a cellular protease generates mature envelope glycoproteins: gp120, the exterior envelope glycoprotein, and gp41, the transmembrane glycoprotein (3).
  • the gp41 glycoprotein possesses an ectodomain that is largely responsible for trimerization (7), a membrane-spanning anchor, and a long cytoplasmic tail. Most of the surface-exposed elements of the mature, oligomeric envelope glycoprotein complex are contained on the gp120 glycoprotein. Selected, presumably well-exposed, carbohydrates on the gp120 glycoprotein are modified in the Golgi apparatus by the addition of complex sugars (6). The gp120 and gp41 glycoproteins are maintained in the assembled trimer by non-covalent, somewhat labile interactions between the gp41 ectodomain and discontinuous structures composed of N- and C-terminal gp120 sequences (8).
  • Virus attachment also involves the interaction of the gp120 envelope glycoproteins with specific receptors, the CD4 glycoprotein (11) and members of the chemokine receptor family (12,13) (Fig. 26).
  • the CD4 glycoprotein is expressed on the surface of T lymphocytes, monocytes, dendritic cells, and brain microglia, the main target cells for primate immunodeficiency virus in vivo.
  • the requirement for CD4 binding exhibited by most primate immunodeficiency viruses for efficient entry is consistent with this observed in vivo tropism.
  • A major function of CD4 binding is to induce conformational changes in the gp120 glycoprotein that contribute to the formation and/or exposure of the binding site for the chemokine receptor (13,14).
  • observations that feline immunodeficiency viruses use chemokine receptors but not CD4 for entry (16) raise the distinct possibility that the chemokine receptors represent the primordial, obligate receptors for this retroviral lineage.
  • the use of CD4 as a receptor may have evolved subsequently, allowing the high-affinity chemokine receptor-binding site of primate immunodeficiency viruses to be sequestered from host immune surveillance.
  • the more conserved regions fold into a gp120 core which has been recently crystallized in a complex with fragments of CD4 and a neutralizing antibody (20).
  • the gp120 core is composed of two domains, an inner domain and an outer domain, and a β sheet (the "bridging sheet") that does not properly belong to either domain (Fig. 27a).
  • These names reflect the likely orientation of gp120 in the assembled envelope glycoprotein trimer: the inner domain faces the trimer axis and, presumably, gp41, while the outer domain is mostly exposed on the surface of the trimer. Elements of both domains contribute to CD4 binding.
  • CD4 binds in a recessed pocket on gp120, making extensive contact over approximately 800 Å² of the gp120 surface. Two cavities are evident in the gp120-CD4 interface. A shallow cavity is filled with water molecules, while a deep cavity extends 10-15 Å into the interior of gp120. The opening of this deep cavity is occupied by phenylalanine 43 of CD4, which has been shown by mutagenic analysis to be critical for gp120 binding
  • gp120 residues previously identified as important for CD4 binding (22,23) surround the opening of the deep cavity and contribute to interactions with phenylalanine 43 of CD4.
  • aspartic acid 368 of gp120 forms a salt bridge with arginine 59 of CD4, also shown by mutagenesis to be important for gp120 binding (21).
  • mainchain atoms on gp120 and CD4 form hydrogen bonds bridging the two proteins.
  • the formation of the deep cavity in gp120 likely contributes to the transmission of CD4-induced conformational changes to gp120 elements involved in the interaction with chemokine receptors and/or gp41.
  • the deep cavity may be a useful target for intervention by small molecular weight compounds.
  • CCR5 for entry (12) .
  • CCR5 is an obligate coreceptor, and rare individuals that are genetically deficient in CCR5 expression are relatively resistant to HIV-1 infection (24) .
  • HIV-1 isolates arising later in the course of infection often use other chemokine receptors, frequently CXCR4, in addition to CCR5 (12,24).
  • Studies of chimeric envelope glycoproteins demonstrated that the third variable (V3) loop of gp120 is a major determinant of chemokine receptor choice (12,25).
  • V3-deleted versions of gp120 do not bind CCR5, even though CD4 binding occurs at wild-type levels (14).
  • Antibodies against the V3 loop interfere with gp120-CCR5 binding
  • CD4i epitopes are discussed further below. Recent mutagenic and structural analyses have revealed the existence of a highly conserved gp120 structure that is important for CCR5 binding (20,27) (Fig. 27, a and b). This structure is adjacent to the V3 loop and the CD4i epitopes, and is oriented to face the target cell upon gp120-CD4 binding.
  • the gp41 ectodomain structures reveal an extended, trimeric coiled coil that could potentially bridge the viral and target cell membranes (5) .
  • Interactions of other gp41 helical segments near the membrane-spanning region with the interhelical grooves of the internal coiled coil are important for fusion-related conformational changes in gp41. This interaction can be inhibited by helical peptides that mimic either of the involved gp41 helices (30) and is a potential target for future intervention with small molecular weight compounds.
  • HIV-1 envelope glycoproteins as antigens.
  • the success of these viruses in achieving persistent infections implies that the viral envelope glycoproteins have evolved to be less-than-ideal immunogens and antigens.
  • Structures on the viral envelope glycoproteins that are conserved among diverse viral strains are, in general, poorly exposed to the humoral immune system.
  • the crystal structure of the gp120 core reveals a third, immunologically silent face of gp120 (Fig. 6D).
  • HIV-1 viruses that have been passaged in immortalized cell lines are typically more sensitive to neutralization by antibodies or soluble CD4 than are primary, clinical isolates (34) .
  • a major determinant is the structure of the gp120 major variable loops, V1/V2 and V3 (35).
  • Replacing the V1/V2 and V3 variable loops of a laboratory-adapted virus with those of a neutralization-resistant primary isolate creates a virus similar to the parental primary virus (35).
  • the basis for the decreased sensitivity of primary HIV-1 isolates to neutralization appears to involve a decreased exposure of the relevant gp120 epitopes to soluble CD4 or antibody.
  • gp120 and gp41 contributes to the lability of the functional envelope glycoprotein trimer (8,9).
  • the interactive regions of gp120 and gp41 are particularly immunogenic (37).
  • neutralizing antibodies can be detected in the sera of infected animals or humans (38). These antibodies neutralize the infecting virus but often exhibit little or no activity against other strains of virus. A subset of these strain-restricted antibodies recognize the HIV-1 V3 loop (38). These antibodies can block chemokine receptor binding (14).
  • Other variable gp120 elements can contribute to the epitopes recognized by the strain-restricted neutralizing antibodies. It is known, for example, that antibodies directed against the gp120 V2 loop can also exhibit neutralizing activity (39). The V2 loop-associated neutralization epitopes are typically conformation-dependent.
  • the ability of V2- or V3-directed antibodies to recognize more than one HIV-1 strain (39,40) suggests that these major variable loops assume a finite number of conformations. This is consistent with the functional consequences on virus entry of some changes in these variable structures (41), and with the observation that amino acid substitutions in the variable loops are not random (42).
  • the requirement for chemokine receptor binding probably constrains V3 loop variation.
  • the V2 loop, although dispensable for the replication of some HIV-1 viruses in culture (33), helps protect the V3 loop and the conserved epitopes near the chemokine receptor binding site from neutralizing antibodies.
  • the V2 and V3 loops reside proximal to the chemokine receptor binding site (Fig. 27), masking more conserved gp120 elements and presenting potentially variable epitopes to the immune system.
  • the gp120 residues important for antibody binding are all located within the CD4-binding pocket on gp120 (Fig. 27b), and several of the most important residues are near the opening of the deep cavity (20). Therefore, some broadly neutralizing antibodies can apparently access the more recessed elements of the CD4 binding pocket. This is consistent with the observation that the gp120-CD4 interface is as large as that of a typical antibody-antigen complex (20).
  • CD4i: CD4-induced epitopes
  • the CD4i epitopes are located near conserved gp120 structures important for chemokine receptor interaction (14) (Fig. 27b).
  • CD4 binding has been shown to cause a change in the V2 loop conformation that allows better CD4i epitope exposure (33).
  • the antibodies recognizing the CD4i epitopes must bypass the overlapping V2 and V3 loops (33) . Indeed, as is evident in the current crystal structure (20) , this is accomplished by the protrusion of the CDR3 loop of the antibody heavy chain.
  • Antibodies against CD4i epitopes need to bind viruses before CD4 binding occurs to achieve neutralization (47) .
  • the reason is that once the envelope glycoprotein complex binds cell surface CD4, there are severe steric constraints on the binding of an antibody to the gp120 surface facing the target cell (Fig. 26).
  • Another fairly conserved gp120 neutralization epitope is recognized by the 2G12 antibody (48).
  • Unlike the other characterized HIV-1 neutralizing antibodies, which recognize gp120 structures near or within the receptor-binding sites, the 2G12 antibody apparently binds an epitope in the outer domain (Fig. 27b). Given the variability in this outer domain, the ability of the 2G12 antibody to neutralize a fair number of HIV-1 strains (48) seems paradoxical. The marked sensitivity of 2G12 binding to alterations in gp120 glycosylation provides a clue to this puzzle.
  • the 2G12 antibody may recognize more conserved carbohydrate structures formed as a result of the heavy concentration of N-linked glycosylation in the gp120 outer domain.
  • the apparent rarity with which 2G12-like antibodies are elicited attests to the success of the viral strategy of employing a heavily glycosylated outer domain surface in immune evasion.
  • the HIV-1 envelope glycoproteins as vaccine components. That the human and simian immunodeficiency virus envelope glycoproteins are not ideal immunogens is an expected consequence of the immunological selective forces that drove the evolution of these viruses.
  • the same features of the envelope glycoproteins that dictate poor immunogenicity in natural infections have hampered vaccine development.
  • the lability of the envelope glycoprotein complex has frustrated attempts to present oligomers mimicking the functional spike to the immune system.
  • the disintegration of envelope glycoprotein oligomers contributes to the preferential elicitation of non-neutralizing antibodies by the newly exposed gp120 N- and C-termini.
  • variable loops elicit the majority of neutralizing antibodies, probably due to the exposed nature of these epitopes. It is still unclear whether conserved features in the V2 and V3 variable loops exist that can be exploited in vaccine design, or whether all possible functional configurations of these variable structures need to be represented in a cocktail of immunogens.
  • the discontinuous gp120 structures surrounding the receptor binding sites exhibit a relatively high degree of conservation (20), in keeping with the minimal polymorphism in the host cell receptors.
  • the CD4 binding site constitutes a particularly attractive target. It appears to be accessible to antibodies, more so than the conserved elements of the chemokine receptor-binding region. A large fraction of the broadly neutralizing antibodies that eventually appear in HIV-1-infected individuals is directed against the CD4 binding site (43), indicating the ability of the human immune system to recognize this gp120 region and to generate an appropriate response. Nonetheless, these antibodies have been difficult to elicit in animals and vaccinated humans (49). The reasons for the relatively poor immunogenicity of the CD4 binding site are not yet understood, although several possibilities can be envisioned.
  • Interdomain flexibility may disrupt the CD4BS epitopes and decrease their representation in the pool of immunogens.
  • Masking by variable loops (19,33) and glycosylation may contribute to the recessed nature of the CD4BS epitopes which, even on the crystallized gp120 core, occupy a 20 Å deep canyon (20).
  • Within the CD4-binding pocket not all of the gp120 surface is conserved among HIV-1 strains. Therefore, even when elicited, some CD4BS-directed antibodies may lack the breadth and affinity to be optimal neutralization agents. While many monoclonal antibodies against the CD4 binding site exhibit reasonable potency and breadth (44), whether a polyclonal response against the envelope glycoprotein can be focused to preferentially contain these types of antibodies remains to be seen.
  • HIV-1 envelope glycoproteins have evolved to be inefficient at eliciting effective antiviral antibody responses.
  • the availability of structural information on the conserved HIV-1 gp120 neutralization epitopes should facilitate the modification of this important antigen and allow the rational testing of hypotheses regarding its poor immunogenic properties. These efforts should complement ongoing efforts to improve antigen presentation to the immune system and to create suitable animal models for the screening of vaccine candidates.
  • the structure reveals a cavity-laden CD4-gp120 interface, a conserved binding site for the chemokine receptor, evidence for conformational change upon CD4 binding, the nature of a CD4-induced antibody epitope, and specific mechanisms for immune evasion.
  • Our results provide a framework for understanding the complex biology of HIV entry into cells and will guide efforts to intervene.
  • the crystallized gp120 is from the HXBc2 strain of HIV-1. It has deletions of 52 and 19 residues from the N- and C-termini, respectively; Gly-Ala-Gly tripeptide substitutions for 67 V1/V2-loop residues and 32 V3-loop residues; and the removal of all sugar groups beyond the linkages between the two core N-acetylglucosamine residues.
  • This deglycosylated core gp120 eliminates over 90% of the carbohydrate but retains over 80% of the non-variable-loop protein. Its capacity to interact with CD4 and relevant antibodies is preserved at or near wild-type levels (26).
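  • The modifications that define the crystallized core can be tallied as simple arithmetic. The sketch below counts residues removed and inserted using only the numbers quoted for the construct; the variable names are illustrative, and the totals are derived here rather than stated in the source.

```python
# Sketch only: residue bookkeeping for the crystallized core gp120 construct.
TRIPEPTIDE_LEN = 3          # Gly-Ala-Gly replaces each excised loop segment

n_terminal_deleted = 52     # residues deleted from the N-terminus (from the text)
c_terminal_deleted = 19     # residues deleted from the C-terminus (from the text)
v1v2_loop_replaced = 67     # V1/V2-loop residues replaced (from the text)
v3_loop_replaced = 32       # V3-loop residues replaced (from the text)

residues_removed = (n_terminal_deleted + c_terminal_deleted
                    + v1v2_loop_replaced + v3_loop_replaced)
residues_inserted = 2 * TRIPEPTIDE_LEN   # one tripeptide per loop substitution
net_residues_lost = residues_removed - residues_inserted

print(f"removed {residues_removed}, inserted {residues_inserted}, net loss {net_residues_lost}")
```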
  • the final model, composed of 7877 atoms, comprises residues 90-396 and 410-492 of gp120 (excepting loop substitutions), residues 1-181 of CD4, and residues 1-213 of the light chain and 1-229 of the heavy chain of the 17b monoclonal antibody.
  • 11 N-acetylglucosamine and 4 fucose residues, and 602 water molecules have been placed.
  • the overall structure of the complex of gp120 with D1D2 of CD4 and Fab 17b is as depicted in Fig. 28.
  • the polypeptide chain of gp120 is folded into two major domains plus certain excursions that emanate from this body.
  • the inner domain (inner with respect to the N- and C-termini) features a two-helix, two-strand bundle with a small five-stranded β-sandwich at its termini-proximal end and a projection at the distal end from which the V1/V2 stem emanates.
  • the outer domain is a stacked double barrel that lies alongside the inner domain such that the outer barrel and inner bundle axes are approximately parallel.
  • the proximal barrel of the outer-domain stack is composed from a 6-stranded, mixed-directional β-sheet that is twisted to embrace helix α2 as a 7th barrel stave.
  • the distal barrel of the stack is a 7-stranded antiparallel β-barrel.
  • the two barrels share one contiguous hydrophobic core, and the staves also continue from one barrel to the next except at the domain interface. This interruption is centered at a side between barrels where the chain enters the outer domain with loop LB insinuated as a tongue between strands β16 and β23.
  • the extended segment just preceding LB is like an 8th stave of the distal barrel, but it is slightly out of reach for hydrogen bonding with its β16 and β19 neighbors.
  • the chain returns to complete the inner domain after β24.
  • the proximal end of the outer domain includes variable loops V4 and V5 and loops LD and LE, which are variable in sequence as well. Loop LC is also at this end, close in space to loop LA of the inner domain, although by topology it is at the other end of this domain.
  • the distal end does include the stem of the excised variable loop V3 and also an excursion via loop LF into a β-hairpin, β20-β21, which in turn hydrogen bonds with the V1/V2 stem emanating from the inner domain.
  • Direct interatomic contacts are made between 22 CD4 residues and 26 gp120 amino-acid residues. These include 219 van der Waals contacts and 12 hydrogen bonds. Residues in contact are concentrated in the span from 25 to 64 of CD4, but they are distributed over six segments of gp120 (Figs. 29d & 30i): 1 residue from the V1/V2 stem, loop LD, the beta-15-alpha-3 excursion, the beta-20-beta-21 hairpin, strand beta-23 and the beta-24-alpha-5 connection. These interactions are compatible with previous analyses of mutational data on both CD4 (11, 12, 29) and gp120 (3, 13, 14).
  • gp120 residues that are covered by CD4 are variable in sequence. This variation is accommodated in part by the large interfacial cavity (Fig. 30e).
  • the gp120 residues in contact with this water-filled cavity are especially variable (Fig. 30g).
  • half of the gp120 residues that make contacts with CD4 do so only through main-chain atoms (including Cα) of gp120, and 60% of CD4 contacts are made by gp120 main-chain atoms (Fig. 30f). Included among these are 5 of the 12 hydrogen bonds in the interface.
  • One such contributing element is an antiparallel β-sheet alignment of CD4 strand C" with gp120 strand beta-15 (Figs. 30a & i).
  • Phe 43 interacts with residues Glu 370, Ile 371, Asn 425, Met 426, Trp 427 and Gly 473 as well as Asp 368, but only the contacts with Ile 371 have a conventional hydrophobic character. Those to 425-427 and 473, including Trp 427, are only to backbone atoms. A surprisingly large fraction of the Phe 43 contacts (28%) are to polar groups. The phenyl group is stacked on the carboxylate group of Glu 370, and there are contacts with the carbonyl oxygen atoms of residues 425, 426 and 473 and the NH group of Trp 427.
  • the larger cavity is lined by mostly hydrophilic residues, half derived from gp120 and half from CD4. It is not deeply buried; while formally a cavity in the crystal structure, minor changes in sidechain orientation would make it solvent accessible.
  • the observed electron density and predicted hydrogen bonding are consistent with at least 8 water molecules in the cavity.
  • Residues from gp120 that actually line the cavity (Ala 281, Ser 364, Ser 365, Thr 455, Arg 469) exhibit sequence variability, whereas surrounding this variable patch are conserved residues, the substitution of which affects CD4 binding. These include the critical contact residues Asp 368, Glu 370 and Trp 427, which flank one end of the cavity, and Asp 457 at the other end (Fig. 30e).
  • CD4 residues that line the cavity can be mutated with only moderate effect on gp120 binding, whereas Arg 59 suffers less loss of solvent accessible surface upon gp120 binding but is highly sensitive to mutation.
  • This cavity thus serves as a water buffer between gp120 and CD4 (Fig. 30e).
  • the tolerance for variation in the gp120 surface associated with this cavity produces a variational island (Fig. 30g), or "anti-hot spot", which is centrally located between regions required for CD4 binding, and may help the virus escape from antibodies directed against the CD4 binding site.
  • the "Phe 43" cavity (Fig. 30b & h) is very different in character from the larger binding-interface cavity. It is roughly spherical, with a diameter of ~8 Å (atom center to atom center) across the center of the cavity. It is positioned just beyond Phe 43 of CD4, at the intersection of the inner domain, the outer domain and the bridging sheet. It is relatively deeply buried, extending into the hydrophobic interior of gp120. The phenyl ring of Phe 43 is the only non-gp120 residue contacting this cavity, forming a lid which covers the bottom of the cavity (Fig. 30b).
  • mutations at Thr 257 (no contacts) and Trp 427 (only main-chain contacts) can substantially reduce binding.
  • Changes in cavity-lining residues also affect the binding of antibodies directed against the CD4 binding site.
  • many of the residues that line the cavity interact with elements of the chemokine receptor binding region (see below). It may be that the Phe 43 cavity and the other interdomain cavities form as a consequence of a CD4-induced conformational change (see below).
  • the 17b antibody is a broadly neutralizing human monoclonal isolated from the blood of an HIV-infected individual. It binds to a CD4-induced (CD4i) gp120 epitope that overlaps the chemokine receptor-binding site (20).
  • the interface between Fab 17b and core gp120 in the ternary complex involves a small area of interaction.
  • the solvent accessible area excluded upon binding is only 455 Å² from gp120 and 445 Å² from 17b, which is largely from the heavy chain (371 Å²).
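  • The buried-surface figures for the 17b interface can be broken down with a short calculation. The sketch below derives the heavy-chain share of the 17b contribution and, by subtraction, the light-chain contribution; both derived values are computed here from the quoted areas and are not stated in the source.

```python
# Sketch only: breakdown of solvent accessible area buried at the gp120-17b interface (Å²).
gp120_buried = 455        # area excluded on gp120 (from the text)
fab_buried = 445          # area excluded on 17b (from the text)
heavy_chain_buried = 371  # portion of the 17b area from the heavy chain (from the text)

# The Fab consists of a heavy and a light chain, so the remainder is light-chain area
light_chain_buried = fab_buried - heavy_chain_buried
heavy_chain_fraction = heavy_chain_buried / fab_buried
total_buried = gp120_buried + fab_buried

print(f"light chain: {light_chain_buried} Å²")
print(f"heavy-chain share of 17b area: {heavy_chain_fraction:.1%}")
print(f"total buried area: {total_buried} Å²")
```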
  • the long (15 residue) complementarity-determining region 3 (CDR3) of the heavy chain dominates, but the heavy-chain CDR2 and the light-chain CDR3 also contribute.
  • the 17b contact surface is very acidic (3 Asp, 3 Glu, no Arg or Lys) although hydrophobic contacts (notably a cis proline and tryptophan from the light chain) predominate at the center.
  • the 17b epitope lies across the base of the four-stranded bridging sheet (Fig. 31c & e) . All four strands make substantial contact with 17b, suggesting that the integrity of the bridging sheet is necessary for 17b binding.
  • the gp120 surface that contacts 17b consists of a hydrophobic center surrounded by a highly basic periphery (3 Lys, 1 Arg, and no Asp or Glu) (Fig. 31d). Although this basic gp120 surface complements the acidic 17b surface, only one salt bridge is observed (between Arg 419 of gp120 and Glu 106 of the 17b heavy chain). The rest of the specific contacts occur between hydrophobic and polar residues.
  • the interaction between 17b and gp120 involves a hydrophobic central region flanked on the periphery by charged regions, predominantly acidic on 17b and basic on gp120.
  • CD4-17b contacts: there are no direct CD4-17b contacts, and none of the gp120 residues contacts both 17b and CD4. Rather, CD4 binds on the opposite face of the bridging sheet, providing specific contacts that appear to stabilize its conformation (Fig. 30i and 30j) and may explain in part the CD4 induction of 17b binding.
  • the 17b epitope is well conserved among HIV-1 isolates. Of the 18 residues that show loss in solvent accessible surface upon contact with 17b, 12 residues (67%) are conserved among all HIV-1 viruses. By contrast, only 19 of the 37 gpl20 residues (51%) that show loss of solvent accessible surface upon CD4 binding are similarly conserved.
  • CD4i epitopes tend to be masked from immune surveillance by the adjacent V2 and V3 loops (see accompanying paper). Indeed, in the complex structure, a large gap is seen between gp120 and the tips of the light-chain CDR1 and CDR2 loops. Pointing directly at this gap is the base of the V3 loop.
  • variable loops may need to be bypassed for access to the conserved structures in the bridging sheet.
  • the 17b epitope may be further protected from the immune system by a CD4-induced conformational change (see below).
  • gp120: although monomeric in isolation, gp120 likely exists as a trimeric complex with gp41 on the virion surface.
  • the large electroneutral surface on the inner domain (Fig. 30c) is the probable site of trimer packing based on its lack of glycosylation, its conservation in sequence, the location of CD4 and CCR5 binding sites, and the immune response to this region.
  • the Phe 43 cavity (now a pocket) would present a perplexing structural dilemma.
  • the cavity-lining residues have few structural restrictions, with ample room for larger substitutions into the cavity, yet these residues are highly conserved and inexplicably hydrophobic if exposed in a pocket.
  • This pocket structure is in turn intimately connected to the bridging sheet, itself peculiar in absence of CD4.
  • the backbone amide of bridging- sheet residue 425 is hydrogen-bonded to Glu 370, a critical CD4 contact residue (Fig.
  • core gpl20 may differ in the absence of CD4 comes from comparison with theory.
  • the evolutionary algorithm of PHD (37) gives secondary-structure predictions with 90% estimated reliability for roughly 45% of the core gp120 sequence. Compared to our structure, it is accurate except at three places where it is markedly wrong (four consecutive residues with reliability index greater than 90%). All of these are at the Phe 43 cavity or in contacts with CD4: loop ℒB, strand β15, and the segment of β20 into the turn to β21 (Fig. 30h). Most significantly, the latter segment (residues 422-429) entering the bridging sheet is predicted to be helical.
  • CD4 binds efficiently to a gp120 derivative with both β2 and β3 truncated (38). Since the bridging sheet is most likely not stable in the absence of half its strands, CD4 binding must possess the ability to properly orient strands β20 and β21 from a very different prior conformation.
  • the Phe 43 cavity is at the nexus of the CD4 interface, between the inner domain, the outer domain, and the bridging sheet. As such, Phe 43 itself seems to serve as a keystone without which the structure might collapse. If so, to what state and, in reverse, how does CD4 binding lead to the state seen in this ternary complex? Certainly, it is clear that CD4-gp120 binding kinetics are complex (39), and microcalorimetric analysis reveals unusually large ΔH and compensating TΔS values for soluble CD4 binding to gp120 (M. L. Doyle, personal communication).
  • Fig. 30c stabilizes a nascent complex state, and inserts the Phe 43 to induce formation of the Phe 43 cavity.
  • gp120: analysis of the antigenic structure of gp120 shows that most of the envelope protein surface is hidden from humoral immune responses by glycosylation and oligomeric occlusion (accompanying paper). Most broadly neutralizing antibodies access only two surfaces, one which overlaps the CD4 binding site (shielded by the V1/V2 loop) and the other which overlaps the chemokine receptor binding site (shielded by the V3 loop). Conformational changes in core gp120 provide additional mechanisms for evasion from immune surveillance. In the case of the CD4-binding surface, which contains a high proportion of main-chain atoms in the complex (Fig. 30f), the conformation without CD4 bound may expose underlying side-chain variability (Fig. 30g).
  • Escape may also be provided by the recessed nature of the binding pocket (steric occlusion) (Fig. 30a) and by a topographical surface mismatch, which encloses a variational island or "anti-hot spot" (described above, Fig. 30d).
  • Similar mechanisms may be found in the chemokine receptor region: conformational change may hide the conserved epitope (unformed prior to CD4 binding); steric occlusion may take place between the CD4-anchored viral spike and the proximal target membrane; and an "anti-hot spot" equivalent may camouflage chemokine-receptor binding residues on the V3 loop in surrounding variability.
  • Some of the defenses used to elude antibody-based responses may also help HIV avoid cellular immunity. Understanding the specific gp120 mechanisms of immune evasion may be a prerequisite to the design of effective prophylaxis.
  • the HIV surface proteins function to fuse the viral membrane with the target cell membrane.
  • the gpl20 glycoprotein plays roles crucial to the control and initiation of fusion.
  • One set of roles concerns positioning: locating a cell capable of productive viral infection, anchoring the virus to the cell surface, and orienting the viral spike next to the target membrane.
  • Another set concerns timing: holding the gp41 in a metastable conformation and triggering the coordinate release of the three N-terminal fusion peptides of the trimeric gp41. While it is clear that this is a complex multi-conformational process, the simplicity of the system, composed only of two membranes, the viral oligomer, and two host receptors, raises the possibility that we may be able to understand the entire mechanism.
  • Crystallography has now provided two snapshots: an intermediate state in which gp120 is bound to CD4, described herein; and a probably final, "fusion-active" state of the gp41 ectodomain (40,41).
  • the vast biochemical data concerning the membrane fusion process mediated by the HIV-1 envelope glycoproteins allow us to extend our understanding from these two states .
  • CD4 binding also induces conformational changes in gp120, which result in the creation of a metastable oligomer. Although some of the more flexible gp120 regions and gp41 are missing, the structure of the core gp120-CD4 complex presented here describes this state in atomic detail. CD4 binding results in movement of the V2 loop, which numerous experiments suggest partially occludes the V3 loop and CD4i epitopes (18, 36). It also creates, or at least stabilizes, the bridging sheet on which these epitopes are located (described above for the core).
  • CD4 binding results in changes in the conformation of the V3 region, with the tip of the loop becoming more accessible, as judged by enhanced proteolytic susceptibility and altered exposure of V3 epitopes (19) .
  • the V3 loop together with the uncovered epitopes comprise the chemokine-receptor binding site.
  • CD4 binding not only orients the gpl20 surface implicated in chemokine receptor binding to face the target cell, but it also forms and exposes the site itself.
  • these changes may all result from a single, concerted shift in the relative orientation of the inner and outer domains.
  • This conformational shift may alter the orientation of the N- and C- termini, at the proximal end of the inner domain, perhaps partially destabilizing the oligomeric gpl20/gp41 interface (21) .
  • Such a shift would also alter the relative placement of the V1/V2 stem (in the CD4i site) , which emanates from the inner domain, and the V3 loop, which emanates from the outer domain.
  • mutations that permit an adaptation of HIV-1 to CD4-independent entry using CXCR4 involve sequence changes in both the V1/V2 stem and the V3 loop (42) .
  • the next step in HIV-1 entry is the interaction of the gpl20-CD4 complex with the chemokine receptor (Fig. 32, step 2) .
  • Fig. 32, step 2 interactions between CD4 and chemokine receptor may occur, mutagenic analyses (H.
  • the structure of the gp120/CD4/17b antibody ternary complex described here reveals some of the molecular aspects of HIV-1 entry, including the atomic structure of gp120, the explicit interactions with CD4, and the conserved site of binding for the chemokine receptor. Still unknown are details of the apo state of core gp120, the oligomeric structure, the interaction with the chemokine receptor, the conformational changes that trigger the reorganization of the gp41 ectodomain and the structural basis for insertion of the fusion peptide of gp41 into the target membrane. Further understanding will require snapshots of other intermediates.
  • the conformational complexity and observed intricate domain associations of gp120 may reflect genome restrictions at the protein level akin to those that lead to overlapping reading frames at the transcription level. Multiply protected infection machinery is contained in these condensed intricacies. Its mechanisms frustrate host defenses; understanding them may inspire medical intervention.
  • the two-domain CD4 (D1D2, residues 1-182) was produced in Chinese hamster ovary cells (8), the monoclonal antibody 17b in an Epstein-Barr virus immortalized B-cell clone isolated from an HIV-1 infected individual and fused with a murine B-cell fusion partner (18), and the core gp120 from Drosophila Schneider 2 lines under control of an inducible metallothionein promoter (20).
  • the various biochemical manipulations (e.g. deglycosylation for the gp120 and papain digestion to produce the Fab 17b)
  • protein purification
  • ternary complex crystallization
  • cryoprotectant containing stabilizer (10% ethylene glycol with 10.5% monomethyl-PEG 5,000, 10% isopropanol, 50 mM NaCl, 100 mM Citrate/HEPES buffer pH 6.3)
  • immiscible oil Paratone-N; Exxon
  • Diffraction data were collected at beamline X4A, Brookhaven National Laboratory, using phosphor image plates and a Fuji BAS2000 scanner. To avoid overlap problems from the relatively high mosaicity (~1.0°), oscillation data were collected using a rotation axis that was offset at least 30° from the 197 Å c axis. Although crystals initially diffracted to Bragg spacing of greater than 2 Å, ⁇ axis mosaicity and substantial radiation damage despite cryogenic cooling reduced the overall resolution to ~2.5 Å. Data processing and reduction were performed using DENZO and SCALEPACK (45) (Table 1).
  • crystals were soaked in over 20 different heavy atom solutions and screened for isomorphous replacement using the statistical χ² test in SCALEPACK (45).
  • Derivatives were identified from two heavy atom compounds: 10 mM K3IrCl6 (10 hr equilibration in heavy-atom-containing cryoprotectant stabilizer; 2.8 Å) and 5 mM K2OsCl6 (24 hr soak; 3.5 Å).
  • the K3IrCl6 derivative was modeled as 9 partially occupied sites: two sites of occupancy 0.158 and 0.142, and 7 of less than 0.07. While relatively isomorphous, poor data quality (Rsym of greater than 20% past 3.0 Å) combined with relatively small isomorphous differences (Riso of 12.0%) reduced the quality of phasing. In contrast, the K2OsCl6 derivative had an Riso of 15.6%, but was only isomorphous to roughly 5 Å. It was modeled as 4 sites of occupancy 0.321, 0.207, 0.194 and 0.128, with the highest site at the same position as the second highest site from K3IrCl6.
  • Deviations of the CD4 structure in the complex from the free state were measured by the procedure of Wu et al. (10). Deviations were taken as significant when the root mean square (rms) residue deviation was greater than the overall value and also more than 0.5 Å greater than variation among the free structures. Interatomic contacts were defined as in Zhu et al. (48). Structural alignments were made by visual comparison of the SCOP database, and automatic searches were performed with PrISM (A.-S. Yang and B. Honig).
  • HIV-1 entry co- factor functional cDNA cloning of a seven- transmembrane, G protein-coupled receptor. Science 272, 872-877 (1996) .
  • chemokine receptors as human immunodeficiency virus type 1 coreceptors determined by individual amino acids in the envelope V3 loop. J. Virol. 71, 7136-7139 (1997) .
  • HIV-1 Human immunodeficiency virus
  • AIDS acquired immune deficiency syndrome
  • the HIV-1 envelope glycoproteins, gp120 and gp41, are assembled into a trimeric complex that mediates virus entry into target cells (1). HIV-1 entry depends upon the sequential interaction of the gp120 exterior envelope glycoprotein with the receptors on the cell, CD4 and members of the chemokine receptor family (2-4).
  • the gp120 glycoprotein, which can be shed from the envelope complex, elicits both virus-neutralizing and non-neutralizing antibodies during natural infection. Antibodies that lack neutralizing activity are often directed against the gp120 regions occluded on the assembled trimer and exposed only upon shedding (5,6).
  • Neutralizing antibodies, by contrast, must access the functional envelope glycoprotein complex (7) and typically recognize conserved or variable epitopes near the receptor-binding regions (8-11).
  • conserved neutralization epitopes on gp120, utilizing epitope maps in conjunction with the X-ray crystal structure of a ternary complex that includes a gp120 core, CD4 and a neutralizing antibody (12).
  • a large fraction of the predicted accessible surface of gp120 in the trimer is composed of variable, heavily glycosylated core and loop structures that surround the receptor-binding regions. Understanding the structural basis for the ability of HIV-1 to evade the humoral immune response should assist vaccine design.
  • human and simian immunodeficiency virus gp120 glycoproteins consist of five variable regions (V1-V5) interposed among more conserved regions
  • Variable regions V1-V4 form exposed loops anchored at their bases by disulfide bonds (14) .
  • Neutralizing antibodies recognize both variable and conserved gpl20 structures.
  • the V2 and V3 loops contain epitopes for strain-restricted neutralizing antibodies (15-17). More broadly neutralizing antibodies recognize discontinuous, conserved epitopes in three regions of the gp120 glycoprotein (Table 1). In HIV-1 infected humans, the most abundant of these are directed against the CD4 binding site (CD4BS) and block gp120-CD4 interaction (8,9). Less common are antibodies against epitopes induced or exposed upon CD4 binding (CD4i) (18). Both CD4i and V3 antibodies disrupt the binding of gp120-CD4 complexes to chemokine receptors (10, 11).
  • a third gp120 neutralization epitope is defined by a unique monoclonal antibody, 2G12 (19), which does not efficiently block receptor binding (11).
  • (12) we report the X-ray crystal structure of an HIV-1 gp120 core in a ternary complex with two-domain soluble CD4 and the Fab fragment of the CD4i antibody, 17b.
  • the gp120 core lacks the V1/V2 and V3 variable loops, as well as N- and C-terminal sequences, which interact with the gp41 glycoprotein (6), and is enzymatically deglycosylated
  • the gp120 core binds CD4 and antibodies against CD4BS and CD4i epitopes
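The conservation comparison quoted in the list above (12 of the 18 residues buried by 17b versus 19 of the 37 residues buried by CD4) is simple arithmetic. A minimal Python sketch, in which the function name `conserved_fraction` is chosen here purely for illustration, reproduces the two percentages from the stated counts:

```python
def conserved_fraction(conserved: int, total: int) -> float:
    """Return the percentage of contact residues conserved among isolates."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * conserved / total


# Residue counts as quoted in the text.
epitope_17b = conserved_fraction(12, 18)  # residues losing accessible surface on 17b contact
cd4_site = conserved_fraction(19, 37)     # residues losing accessible surface on CD4 binding

print(f"17b epitope: {epitope_17b:.0f}% conserved")
print(f"CD4 contact surface: {cd4_site:.0f}% conserved")
```

The counts are the only inputs; the sketch makes no assumption about which residues are involved.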

Abstract

The subject invention provides a crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120. The subject invention also provides the above-described crystals, wherein the crystal is arranged in a space group P2221, so as to form a unit cell of dimensions a=71.6 Å, b=88.1 Å, c=196.7 Å, and which effectively diffracts X-rays for determination of the atomic coordinates of the gp120 to a resolution of 2.5 Å or better. The subject invention additionally provides compounds inhibiting the CD4-gp120 interaction, compounds inhibiting chemokine receptor-gp120 interaction, mimics of CD4, gp120 variants and uses thereof.

Description

CRYSTAL COMPRISING HUMAN IMMUNODEFICIENCY VIRUS ENVELOPE GLYCOPROTEIN gp120, COMPOUNDS INHIBITING CD4-gp120 INTERACTION, COMPOUNDS INHIBITING CHEMOKINE RECEPTOR-gp120 INTERACTION, MIMICS OF CD4 AND gp120 VARIANTS
This application is a continuation-in-part of U.S. Serial No. 09/100,631, filed June 18, 1998, a continuation-in-part of U.S. Serial No. 08/976,741, filed November 24, 1997, U.S. Serial No. 09/100,763, filed June 18, 1998, a continuation-in-part of U.S. Serial No. 08/966,987, filed November 10, 1997, U.S. Serial No. 09/100,529, filed June 18, 1998, a continuation-in-part of U.S. Serial No. 08/967,403, filed November 10, 1997, U.S. Serial No. 09/100,762, filed June 18, 1998, a continuation-in-part of U.S. Serial No. 08/966,932, filed November 10, 1997, U.S. Serial No. 09/100,521, filed June 18, 1998, a continuation-in-part of U.S. Serial No. 08/967,148, filed November 10, 1997. The contents of the above-identified applications are incorporated into this application by reference.
The invention disclosed herein was made with United States Government support under National Institutes of Health Grant Nos. AI 31783, AI 39420, AI 28691, CA 06516, AI 41851, AI 40895, GM 5-20394 and CUID 511168. Accordingly, the United States Government has certain rights in this invention.
Various references are referred to within this application. Disclosures of these publications in their entireties are hereby incorporated by reference into this application to more fully describe the state of the art to which this invention pertains.
Background of the Invention
During the first thirty years of protein crystallization, the standard conceptual practice was to treat the protein as a fixed constant and screen it through a multitude of crystallization conditions. Advances in this approach have led to the development of crystallization robots capable of testing thousands of conditions (1,2). While this approach has had success, it fails for many interesting proteins.
One of these is the Human Immunodeficiency Virus (HIV)-1 envelope glycoprotein, gp120. HIV induces acquired immunodeficiency syndrome (AIDS) in humans (3,4). The gp120 glycoprotein helps to mediate virus entry into cells through sequential recognition of two cellular receptors of the human host, CD4 (5,6), and a chemokine receptor (primarily CXCR-4 or CCR-5, depending on viral strain) (7-12). These high affinity interactions are attractive targets for mimetic drug design. Although the structure of the gp120-binding domain of CD4 and the identity of residues critical to its interaction with gp120 have been known for several years (13,14), this has not been sufficient for design of potent antagonists (15-17). Because gp120 is the major virus-specific antigen accessible to neutralizing antibodies, knowledge of its structure could also have a considerable impact on vaccine design.
The gp120 protein has been an obvious target for structural investigation, and quantities of pure soluble protein have been available for several years, a byproduct in part of vaccine trials. Nevertheless, despite considerable effort, it has resisted crystallographic analysis for more than a decade.
The mature gp120 glycoproteins of different HIV-1 strains have approximately 470-490 amino acids (18). Extensive N-linked glycosylation at approximately 20-25 sites accounts for roughly half its mass (18,19). Sequences from many different viral isolates show that it contains five conserved regions (C1-C5), five variable regions (V1-V5) (18, 20) and nine conserved disulfide bridges (19). Except for limited N- and C-terminal cleavage, proteolytic digestion does not reveal a sub-domain structure. Indeed, even after extensive proteolytic cleavage, the unreduced protein runs near its native molecular weight on SDS-PAGE (Peter D. Kwong: unpublished data). Some of the variable regions, the V3 loop in particular, appear to be conformationally variable. Conformational change is also evidenced by shedding, the CD4-induced dissociation of gp120 from the surface of the virus, and by ligand-induced variations in monoclonal antibody binding (21,22). These changes may be related to the functional role of gp120 in virus entry.
The extensive glycosylation and conformational heterogeneity of gp120 suggested that merely screening the protein through ever more exotic crystallization conditions would not produce well-diffracting crystals. We therefore adopted a fundamentally different approach, which we term variational crystallization. This approach relied on radical modification of the protein surface, primarily to reduce heterogeneity, but also as a means of varying potential crystallization lattice contacts. An interactive cycle, involving different biochemical and molecular biological techniques, was used to detect and remove chemical and conformational heterogeneity. In addition, protein ligands, such as CD4 and the Fabs of monoclonal antibodies, were used to restrict conformational mobility. Progressive trials of 18 different gp120 crystallization variants yielded six different crystals. This paradigm of crystallization, with a focus on protein modification rather than on crystallization screening, may aid in the structural analysis of other conformationally complex proteins.
Summary of the Invention
The subject invention provides a crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120, wherein the amino acid sequence is at least 100 amino acids in length.
The subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 2.5 angstroms or better.
The subject invention also provides the above-described crystals, wherein the crystal is arranged in a space group P2221, so as to form a unit cell of dimensions a=71.6 Å, b=88.1 Å, c=196.7 Å, and which effectively diffracts X-rays for determination of the atomic coordinates of the gp120 to a resolution of 2.5 Å or better.
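As an illustration of what the quoted cell dimensions imply, the unit-cell volume can be computed directly. The sketch below assumes the orthorhombic space group P2221 named in the Abstract, for which all cell angles are 90° and the general position has multiplicity 4; the function and variable names are illustrative and do not appear in the application.

```python
# Unit-cell edge lengths in angstroms, as quoted in the text.
A, B, C = 71.6, 88.1, 196.7

# Multiplicity of the general position in space group P222_1.
ASYMMETRIC_UNITS = 4


def orthorhombic_volume(a: float, b: float, c: float) -> float:
    """Volume of an orthorhombic cell: all angles are 90 degrees, so V = a*b*c."""
    return a * b * c


cell_volume = orthorhombic_volume(A, B, C)
asu_volume = cell_volume / ASYMMETRIC_UNITS

print(f"unit-cell volume: {cell_volume:.0f} cubic angstroms")
print(f"volume per asymmetric unit: {asu_volume:.0f} cubic angstroms")
```

Dividing the volume per asymmetric unit by the molecular mass of the crystallized complex would give the Matthews coefficient; that mass is not stated in this passage, so the calculation stops at the volume.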
The subject invention additionally provides a method for producing a crystal suitable for X-ray diffraction comprising: (a) deglycosylating a polypeptide having an amino acid sequence of a portion of a gp120, wherein said portion is produced by deleting or replacing part of the gp120 to reduce the surface loop flexibility; (b) contacting the polypeptide with a ligand so as to form a complex which exhibits restricted conformational mobility; and (c) obtaining a crystal from the complex so formed to produce a crystal suitable for X-ray diffraction.
The subject invention also provides the above-described methods, wherein the V1, V2, or V3 loop of the gp120 contained in the polypeptide is partially truncated, deleted or replaced.
The subject invention also provides a method for identifying a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the gp120.
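Step (b) above does not specify how "fit" is evaluated. One very crude way to picture it is a purely steric test on atomic coordinates: the compound must not overlap any binding-site atom, and enough of its atoms must lie within contact range of the site. The sketch below is an assumption-laden illustration, not the method of the application; the function name, the cutoff values, and the toy coordinates are all hypothetical.

```python
import math


def fits_binding_site(site_atoms, compound_atoms,
                      clash_cutoff=2.5, contact_cutoff=4.5, min_contacts=3):
    """Crude steric test: no clashes and enough close contacts.

    Coordinates are (x, y, z) tuples in angstroms. The cutoffs and
    min_contacts are illustrative assumptions, not values from the text.
    """
    contacts = 0
    for atom in compound_atoms:
        nearest = min(math.dist(atom, s) for s in site_atoms)
        if nearest < clash_cutoff:
            return False  # compound atom overlaps a site atom
        if nearest <= contact_cutoff:
            contacts += 1  # compound atom is within contact range
    return contacts >= min_contacts


# Toy pocket: six site atoms placed 4 angstroms from the origin along each axis.
SITE = [(4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, -4, 0), (0, 0, 4), (0, 0, -4)]

good_compound = [(0, 0, 0), (0.5, 0, 0), (-0.5, 0, 0)]  # sits in the pocket
clashing_compound = [(0, 0, 0), (3, 0, 0), (-0.5, 0, 0)]  # one atom too close
distant_compound = [(20, 20, 20), (21, 20, 20), (22, 20, 20)]  # never touches

print(fits_binding_site(SITE, good_compound))
print(fits_binding_site(SITE, clashing_compound))
print(fits_binding_site(SITE, distant_compound))
```

A real evaluation would use the deposited atomic coordinates and a proper scoring function covering electrostatics, hydrogen bonding, and desolvation; this sketch captures only the geometric idea of a compound fitting into a site.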
This invention also provides a method of inhibiting the interaction of HIV-gp120 with CD4 which comprises administering to a mammal in need thereof a compound capable of disrupting two or more of the contacts between gp120 and CD4 as set forth in Figure 54.
This invention also provides a method for identifying a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having an amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the CD4 binding site of the gp120.
This invention also provides a method for designing a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having an amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) designing a compound to fit the CD4 binding site.
This invention also provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject.
This invention provides a method for identifying a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) determining whether a compound would fit into the binding site, a positive fit indicating that the compound is capable of binding to the chemokine receptor binding site of the gp120.
This invention also provides a method for designing a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and (b) designing a compound to fit the chemokine receptor binding site.
This invention also provides the above-described methods, wherein the crystal further comprises a chemokine receptor, a second polypeptide having an amino acid sequence of a portion of a chemokine receptor, an antibody or a Fab capable of binding to the chemokine receptor binding site or a compound known to be capable of binding to the chemokine receptor binding site, bound to the polypeptide.
This invention further provides a method of inhibiting the interaction of HIV-gp120 with chemokine receptor which comprises administering to a mammal in need thereof a compound capable of disrupting two or more of the contacts between gp120 and chemokine receptor as set forth in Figure 55, thereby inhibiting the interaction of HIV-gp120 with chemokine receptor.
This invention provides a substance mimicking the gp120-binding domain of CD4 wherein the size of the residue or analog thereof at position 43 is bigger than the size of a phenylalanine so as to increase the affinity for human immunodeficiency virus envelope glycoprotein gp120.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 10 Å across its longest dimension.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof, wherein the residue's longest dimension is longer than phenylalanine's longest dimension. This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 15 Å across its longest dimension.
This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a cysteine.
This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a tyrosine.
This invention further provides a pharmaceutical composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the substance of claim 1; and (b) a pharmaceutically acceptable carrier.
This invention further provides a method of inhibiting cell entry by HIV, comprising contacting the cells with an effective amount of the above-described substances, thereby inhibiting cell entry by HIV.
This invention further provides a method of treating or preventing HIV infection in a subject, comprising administering to the subject an effective amount of the above-described substances, thereby treating or preventing HIV infection.
The subject invention provides a variant of gp120 which presents a hidden, conserved, neutralization epitope.
The subject invention also provides a composition comprising a variant of gp120 which presents a hidden, conserved, neutralization epitope and a suitable carrier.
The subject invention further provides a vaccine comprising a variant of gp120 which presents a hidden, conserved, neutralization epitope and a suitable carrier.
The subject invention also provides an antibody induced by a vaccine comprising a variant of gp120.
Brief Description of the Figures
Figures for the First Series of Experiments
Figure 1
Computer-generated ribbon drawing of the tertiary structure of CD4, gp120, and Fab 17b interacting. CD4 is in the top left, gp120 is toward the right, and Fab 17b is in the bottom left of the figure.
Figure 2
Illustration of the locations of CD4, gp120, and Fab 17b in the computer-generated ribbon drawing of Figure 1.
Figure 3
Photomicrographs of crystals containing HIV-1 gp120. Crystal types A-F are shown and correspond to the crystal types described in the text and Tables 3 and 4. The photomicrograph in A is at twice the magnification. The bar in A corresponds to 25 μm (50 μm for B-F).
Figure 4
Polyacrylamide gel electrophoresis (PAGE) of the ternary complex crystals (Type E). A cluster of crystals (0.4 x 0.1 x 0.05 mm) was washed four times with 1 μl of reservoir solution, dissolved in 3 μl of loading buffer, and analyzed by SDS-PAGE on an 8-25% gradient gel (Pharmacia Phast system). Lane 1, 2.5 μg of ternary complex purified by gel filtration. The top band is the deglycosylated Δ82ΔV1/2*ΔV3ΔC5 gp120, the next two bands are the alkylated and reduced heavy and light chains, respectively, of the Fab 17b, and the bottom band is the two-domain sCD4 (D1D2). Lane 2, standards: 94, 67, 43 (diffuse), 30, 20, and 14. Lane 3, supernatant from the crystallization droplet. Lane 4, last wash of crystals. Lane 5, dissolved crystals. The gel is silver stained.
Figure 5 A and B
Crystals formed under condition one described in Table 7.
Figure 6
Crystals formed under condition two described in Table 7.
Figure 7
Crystals formed under condition three described in Table 7.
Figure 8
Crystals formed under condition four described in Table 7.
Figure 9
Crystals formed under condition five described in Table 7.
Figure 10
Crystals formed under condition six described in Table 7.
Figure 11
Crystals formed under condition seven described in Table 7.
Figure 12
Crystals formed under condition eight described in Table 7.
Figure 13
Crystals formed under condition nine described in Table 7.
Figure 14
Crystals formed under condition ten described in Table 7.
Figure 15
Crystals formed under condition eleven described in Table 7.
Figure 16
Crystals formed under condition twelve described in Table 7.
Figure 17
Crystals formed under condition thirteen described in Table 7.
Figure 18
Crystals formed under condition fourteen described in Table 7.
Figure 19
Crystals formed under condition fifteen described in Table 7.
Figure 20
Crystals formed under condition sixteen described in Table 7.
Figure 21
Crystals formed under condition seventeen described in Table 7.
Figure 22
Crystals formed under condition eighteen described in Table 7.
Figure 23
Crystals formed under condition nineteen described in Table 7.
Figure 24
Crystals formed under condition twenty described in Table 7.
Figure 25
Crystals formed under condition twenty-one described in Table 7.
Figures for the Second Series of Experiments
Figures 26A and 26B
The HIV-1 entry process. The trimeric HIV-1 envelope glycoproteins, anchored in the viral membrane, are depicted, with gp120 in the lower right and gp41 in the upper right. For simplicity, the gp120 variable loops are not shown, but would extend over the outer surface of the envelope glycoprotein complex. The receptors on the target cell, CD4 and chemokine receptor, are also shown. The structures of gp120, gp41, and CD4 are adapted from available X-ray crystallographic studies (5,20,21), whereas the chemokine receptor model is hypothetical.
Figure 27
The HIV-1 gp120 surface.
Figure 27A
The molecular surface of the HIV-1 gp120 core (20) is shown, with the arrow pointing towards the viral membrane. The inner domain, believed to interact with gp41, and the outer domain, which is probably exposed on the assembled trimer, are on the left and right, respectively. The gp120 surface occluded by CD4 is shown and the gp120 region thought to be involved in chemokine receptor binding (27) is also shown. The location of the base of the V3 loop is shown.
Figure 27B
Conserved gp120 neutralization epitopes are shown on the gp120 core, which is oriented identically to that in Figure 27A. The location of the epitopes was deduced from mutagenic analysis (45,46,48).
Figure 27C
The approximate location of gp120 structures (20) that contribute to protection from antibody responses is shown. The major variable loops (V2, V3, and V4), the V5 region and the sites of N-linked glycosylation are shown.
Figure 27D
The relationship of different surfaces of the gp120 core to the antibody response generated by the gp120 glycoprotein is depicted. The surface of gp120 that interacts with neutralizing antibodies (32) is shown; it spans the inner and outer domains and includes the V2 and V3 variable loops (not shown). The surface of gp120 that interacts with non-neutralizing antibodies is located on the inner domain and includes gp41-interactive N- and C-terminal gp120 regions (not shown). The heavily glycosylated surface of the gp120 outer domain, which appears to be minimally immunogenic, is also shown.
Figures for the Third Series of Experiments
Figure 28
Overall structure. The ribbon diagram shows gp120, the N-terminal two domains of CD4, and the Fab 17b (light chain and heavy chain). The sidechain of Phe 43 on CD4 is also shown. The prominent CDR3 loop of the 17b heavy chain is evident in this orientation. Although the complete N- and C-termini of gp120 are missing, the positions of the gp120 termini are consistent with the proposal that gp41, and hence the viral membrane, is located towards the top of the diagram. This would position the target membrane at the diagram base. The vertical dimension of gp120 in this orientation is roughly 50 Å. Precisely perpendicular views of gp120 are shown in Figures 29 and 30. Drawn with RIBBONS (49).
Figure 29
Structure of core gp120. The orientation of gp120 in each of the panels shown in this figure is related to Figure 28 by a 90° rotation about a vertical axis. Thus the viral membrane would be oriented above, the target membrane below, and the C-terminal tail of CD4 coming out of the page. In this view, we describe the left portion of core gp120 as the "inner" domain, the right portion as the "outer" domain, and the 4-stranded sheet at the bottom left of gp120 as the "bridging sheet." The bridging sheet (β3, β2, β21, β20) can be seen packing primarily over the inner domain, although some surface residues of the outer domain, e.g. Phe 382, reach in to form part of its hydrophobic core.
Figure 29A
Ribbon diagram. Helices and β-strands are depicted; strand β15 makes an antiparallel β-sheet alignment with strand C'1 of CD4. The dashed line to the right of the diagram represents the disordered V4 loop. Selected parts of the structure are labeled.
Figure 29B
Secondary structure diagram. The schematic is arranged to coincide with the orientation of Figures 29A and 29C. Helices are shown as corkscrews and labeled α1-α5. β-strands are shown as arrows: black and labeled represent the 25 β-strands of core gp120; gray and unlabeled represent the continuation of hydrogen bonding across a sheet; white and labeled represent the C' strand of CD4. Spatial proximity between neighboring strands implies mainchain hydrogen bonding. Loops are labeled ζA-ζF and V1-V5. The labels of loops with high sequence variability are circled. Assignment of secondary structure was accomplished with the Kabsch and Sander algorithm except for β4 and β8, which are both interrupted mid-strand by sidechain-backbone hydrogen bonds; β9, β15, and β25a, all of which have angles or hydrogen bonds that are slightly non-standard; and α4, which hydrogen bonds as a 3-10 helix with the final residue in β-conformation.
Figure 29C
Stereo plot of an α-carbon trace. Every 10th Cα is marked with a filled circle, and every twentieth residue is labeled. Disulfide connections are depicted as ball and stick. Shown are ordered residues, 90-396 and 410-492.
Figure 29D
Structure-based sequence alignment. Shown are the sequences of "HIV-1 B" (core gp120 from clade B, strain HXBc2 used in these studies), "C" (HIV-1 clade C, strain UG268A2), "O" (HIV-1 clade O, strain ANT70), "HIV-2" (strain ROD), and "SIV" (African green monkey isolate, clone GRI-1). The secondary structure assignments are shown as arrows and cylinders, with (x) denoting residues which are disordered in the present structure. The "gars" sequence at the N-terminus and the "gag" sequence in the V1/V2 and V3 loops are consequences of the gp120 truncation. Solvent accessibility is indicated for each residue by an open circle if the fractional solvent accessibility is greater than 0.4, a half-closed circle if 0.1 to 0.4, and a closed circle if less than 0.1. Sequence variability observed among primate immunodeficiency viruses is indicated below the solvent accessibility by the number of horizontal hash marks: 1 mark, residues conserved among all primate immunodeficiency viruses; 2 marks, conserved among all HIV-1 isolates; 3 marks, exhibits moderate variation among HIV-1 isolates; and 4 marks, exhibits significant variability among HIV-1 isolates. In assessing conservation, all single atom changes were permitted as well as larger substitutions if the character of the sidechain was conserved (e.g. K to R or F to L). N-linked glycosylation is indicated by "m" for the high mannose additions and "c" for the complex additions observed in mammalian cells (6). Residues of gp120 in direct contact with CD4 are indicated by "*". Direct contact is a more restrictive criterion of interaction than the often used loss of solvent accessible surface; residues of gp120 which show loss of solvent accessible surface but are not in direct contact are 123, 124, 126, 257, 278, 282, 364, 471, 475, 476 and 477. Parts (a) and (b) were drawn with MOLSCRIPT (P. J. Kraulis).
Figure 30
CD4-gp120 interactions.
Figure 30A
Ribbon diagram of gp120 binding to CD4.
Residue Phe 43 of CD4 is also depicted reaching into the heart of gp120. From this orientation the recessed nature of the gp120 binding pocket is evident.
Figure 30B
Electron density in the Phe 43 cavity. The 2Fo-Fc electron density map at 2.5 Å, 1.1σ contour, is shown. The orientation is the same as in (a). The foreground has been clipped for clarity, removing the overlying β24-α5 connection. In the upper middle of the picture is the central unidentified density. At the bottom of the picture, Phe 43 of CD4 can be seen reaching up to contact the cavity. Moving clockwise around the cavity, the gp120 residues are Trp 427 (with its indole ring partially clipped by foreground slabbing), Trp 112, Val 255, Thr 257, Glu 370 (packing under the Phe 43 ring), Ile 371, and Glu 368 (partially clipped in the bottom right corner). Hydrophobic residues lining the back of the cavity can be partially glimpsed around the central unidentified density.
Figure 30C
Electrostatic surface of CD4 and gp120. The electrostatic potential is displayed at the solvent accessible surface, which is shaded according to the local electrostatic potential. The slight "puffiness" of the surface arises from the enlarged nature of the solvent accessible surface relative to the standard molecular surface. On the right, the gp120 surface is shown in an orientation similar to that of Figures 29A and 29C, but rotated -20° around a vertical axis to depict the recessed binding pocket more clearly. A thin yellow Cα worm of CD4 is shown to aid in orientation. On the left, the CD4 surface is shown, rotated relative to the gp120 panel by an exact 180° rotation about the vertical axis shown. A thin red Cα worm of gp120 is shown.
Figure 30D
CD4-gp120 contact surface. On the right, the gp120 surface is shown with the surface within 3.5 Å of CD4 (surface-to-atom center distance). This effectively creates an "imprint" of CD4 on the displayed gp120 surface. On the left (180° rotation), the corresponding CD4 surface and gp120 "imprint" are also shown.
Figure 30E
CD4-gp120 mutational "hot-spots." On the right, the surface of gp120 is shown with the surface of gp120 residues shown by substitution to affect CD4 binding highlighted: substantial effect -- residues 257, 368, 370 and 427; moderate effect -- residue 457. Also depicted is the surface of the large water-filled cavity at the CD4-gp120 interface. On the left (180° rotation), residues important for gp120 binding are shown on the CD4 surface: substantial effect -- residues 43 and 59; moderate effect -- residues 29, 35, 44, 46, 47.
Figure 30F
Sidechain/mainchain contribution to the gp120 surface. The orientation is the same as the right panel of Figures 30C-30E, and below (Figure 30G), and allows for direct comparison of the CD4-gp120 contact surface. A striking surface concentration of mainchain atoms is seen in the regions corresponding to the CD4 "imprint."
Figure 30G
Sequence variability mapped to the gp120 surface. The sequence variability observed among primate immunodeficiency viruses (Figure 29D) is depicted mapped onto the gp120 surface. Also shown is the carbohydrate: N-acetylglucosamine and fucose residues present in the structure; Asn-proximal N-acetylglucosamines modeled at residues 88, 230, 241, 356, 397, 406, 462. Much of the carbohydrate (22 residues) is hidden on the back side of the outer domain.
Figure 30H
Phe 43 cavity. The surface of the Phe 43 cavity is shown, buried in the heart of gp120. A worm representation of gp120 shows the three stretches that are incorrectly predicted by secondary structure prediction: the ζB loop, bending around the top of the cavity, strands β20-β21 just below the cavity, and strand β15, slightly more distal to the cavity right. The orientation shown here is the same as for the gp120 surfaces in Figure 30C-G.
Figure 30I
Schematic of the CD4-gp120 interface. This schematic of the entire interface shows six discrete segments of gp120 (solid black line) interacting with CD4 (double line). To aid in orientation, secondary structural elements are labeled, as are representative contact residues from each segment of gp120. Arrows indicate mainchain direction. The sidechain of Phe 43 is also shown. The orientation shown is similar to Figure 30A and 30B.
Figure 30J
Schematic of gp120 contacts around Phe 43 and Arg 59 of CD4. Residues on gp120 involved in direct contact with Phe 43 or Arg 59 are depicted. Electrostatic interactions are depicted as dashed lines. Hydrophobic interactions are found between Phe 43 (CD4) and Trp 427, Glu 370, Gly 473, and Ile 371 (all from gp120) and between Arg 59 (CD4) and Val 430 (gp120). The orientation is similar to Figure 30A, 30B, and 30I, but has been rotated for clarity. Sidechains of Phe 43 and Arg 59 as well as those portions of gp120 sidechains which interact with these crucial CD4 residues are drawn with bold lines. (Figure 30A was drawn with RIBBONS49, Figure 30B with the program 04 J and Figures 30B-G with GRASP5J )
Figure 31
Neutralizing antibody 17b-gp120 interface.
Figure 31A
Worm diagram of Fab 17b and gp120. The Fab 17b is shown binding to gp120. The orientation shown is the same as in Figures 29A and 29C.
Figure 31B
Contact surface and V3 loop. The surface of gp120 is shown with any surface within 3.5 Å of Fab 17b (surface-to-atom center) and the surface of the V3 base. The orientation is the same as in Figure 30A.
Figure 31C
Contact surface and V3 loop. The same as Figure 31B, but rotated around a horizontal axis to more clearly depict the 17b epitope.
Figure 31D
Electrostatic surface. The electrostatic potential is displayed at the solvent accessible surface, which is shaded according to the local electrostatic potential. The electrostatic shading is the same scale as that shown in Figure 30C. The surface that corresponds to the 17b epitope is the most electropositive region of the molecule. The V3 loop is truncated here, but sequence analysis shows that it is generally quite positively charged.
Figure 31E
Worm diagram of gp120. The gp120 is shown shaded according to the same scheme given in Figure 30A. The orientation is the same as in Figures 30C and 30D, that is, 90° from Figure 30A.
Figure 32
Schematic representation of the gp120 initiation of fusion. A single monomer of core gp120 is depicted in an orientation similar to Figures 29A and 29C. The "3" symbolizes the 3-fold axis, from which gp41 interacts with the gp120 N- and C-termini to generate the functional oligomer. In the initial state of gp120 (on the surface of a virion), the V1/V2 loops are shown partially occluding the CD4 binding site. Following CD4 binding (now at a target cell, though above the glycocalyx), a conformational change is depicted as an inner/outer domain shift, with the dark circle denoting the formation of the Phe 43 cavity. This conformational change strains the interactions at the N- and C-termini of gp120 with the rest of the oligomer, priming the CD4-bound gp120 core. In the next step (which takes place directly adjacent to the target membrane), the chemokine receptor binds to the bridging sheet and the V3 loop (at the bottom left and right, respectively, of gp120), causing an orientational shift of core gp120 relative to the oligomer. This triggers further steps, which ultimately lead to the fusion of the viral and target membranes.
Figure 33
Structure of HIV-1 gp120 with neutralizing antibody and human receptor CD4.
Figures for the Fourth Series of Experiments
Figure 34A
Structure and orientation of the HIV-1 gp120 core. Cα tracing of the gp120 core, which was crystallized in a ternary complex with two-domain sCD4 and Fab fragment of the 17b antibody (12), is shown. The gp120 core is seen from the perspective of CD4, and is oriented with the viral membrane at the top of the figure and the target cell membrane at the bottom. The N- and C-termini of the truncated gp120 core are labeled, as are the positions of structures related to the gp120 variable regions, V1-V5. The Ld and Le surface loops (12) are shown. The position of the "Phe 43" cavity involved in CD4 binding is indicated by an asterisk. A gp120 surface implicated in binding to the CCR5 chemokine receptor (C. Rizzuto and J. Sodroski, submitted) is indicated. The perspectives in Figures 34B, C and D are indicated.
Figure 34B
View of the molecular surface of the gp120 outer domain, from the perspective indicated in Figure 34A. The molecular surface in the figure on the left is shaded according to the variability observed in gp120 residues among primate immunodeficiency viruses. The variability of the gp120 surface shown is underestimated since the V4 variable loop, which is not resolved in the structure, contributes to this surface. The position of the V5 region is shown. Also note the highly conserved glycosylation site (asparagine 356 and threonine/serine 358) within the Le loop, between the V5 and V4 regions. In the figure on the right, the V4 loop and the carbohydrates are modeled, as described in Materials and Methods.
Figure 34C
View of the gp120 molecular surface facing the target cell. Variability is indicated in the figure on the left, using the shading scheme as in Figure 34B. Note the clear demarcation between the conserved surface, which has been implicated in the formation of CD4i epitopes (18) and in chemokine receptor binding (C. Rizzuto and J. Sodroski, unpublished observations), and the variable surface of the outer domain. The recessed binding site for CD4 is indicated, flanked by the V1/V2 stem, which is labeled. The V4 loop and the carbohydrates are modeled in the figure on the right. The figure is shaded as indicated in Figure 34B; particular carbohydrates referred to elsewhere in this report are labeled.
Figure 34D
View of the molecular surface of the gp120 core inner domain. In the figure on the left, variability is indicated by the shading scheme used in Figure 34B. The CD4-binding site is to the right of the figure, and the protruding V1/V2 stem is indicated. The conserved molecular surface, which is associated with the inner domain of the gp120 core, is devoid of known N-linked glycosylation. These are modeled in the figure on the right, which is shaded as described in Figure 34B.
Figure 35
The spatial relationship of epitopes on the HIV-1 gp120 glycoprotein.
Figure 35A
The molecular surface of the gp120 core is shown, from the same perspective as that in Figure 34A. The modeled N-terminal gp120 core residues, V4 loop and carbohydrate structures are included. The variability of the molecular surface is indicated, using the shading scheme described in Figure 34B. The approximate locations of the V2 and V3 variable loops are indicated. Note the well-conserved surfaces near the "Phe 43" cavity and the chemokine receptor-binding site (see Figure 34A).
Figure 35B
A Cα tracing of the gp120 core, oriented similarly to Figure 34A. The gp120 residues within Figure 37A of the 17b CD4i antibody are shown. The residues implicated in the binding of CD4BS antibodies (20) are shown. Changes in these residues significantly affect the binding of at least 25 percent of the CD4BS antibodies listed in the table from the fourth series of experiments. The residues implicated in 2G12 binding (19) are shown. The V4 variable loop, which contributes to the 2G12 epitope (19), is indicated by dotted lines (see Figure 34A).
Figure 35C
The molecular surface of the gp120 core, oriented and shaded as in Figure 35B, is shown.
Figure 35D
Approximate locations of the faces of the gp120 core, defined by the interaction of gp120 and antibodies. The molecular surface accessible to neutralizing ligands (CD4 and CD4BS, CD4i and 2G12 antibodies) is shown in white. The neutralizing face of the complete gp120 glycoprotein includes the V2 and V3 loops, which reside adjacent to the surface shown (see Figure 35A). The approximate location of the gp120 face that is poorly accessible on the assembled envelope glycoprotein trimer and therefore elicits only non-neutralizing antibodies (5,6) is shown. The approximate location of an immunologically "silent" face of gp120, which roughly corresponds to the highly glycosylated outer domain surface, is also shown.
Figure 36
A likely arrangement of the HIV-1 gp120 glycoproteins in a trimeric complex. The gp120 core was organized into a trimeric array, based on the criteria discussed in the text. The perspective is from the target cell membrane, similar to that shown in Figure 34C. The CD4 binding pockets are indicated by black arrows, and the chemokine receptor-binding regions are darkly shaded. The lightly shaded areas indicate the more variable, glycosylated surface of the gp120 core. The approximate locations of the 2G12 epitopes are indicated by open arrows. The approximate locations for the V3 loops and V4 regions are shown. The positions of the V5 regions and some complex carbohydrate addition sites (asparagines 276, 463, 356, 397 and 406) are shown. The approximate locations of the large V1/V2 loops, centered on the known positions of the V1/V2 stems, are indicated. On one of the gp120 subunits, the positions of the LD and LE loops are indicated. The distance of each of the gp120 monomers from the 3-fold symmetry axis is arbitrary.
Figures for the Fifth Series of Experiments
Figure 37
The HIV gp120 derivative used in the binding assay. The wild-type gp120 and gp41 envelope glycoproteins are shown in the upper figure. Conserved (black) and variable (white) regions (25) are indicated. The wtΔ protein, which is derived from the primary macrophage-tropic YU2 HIV-1 isolate (7), is shown beneath the wild-type envelope glycoproteins. The N-terminal and V1/V2 deletions correspond to those previously described for the HXBc2 gp120 mutants Δ82 and Δ128-194, respectively (8,9). SIG=signal peptide.
Figure 38
The gp120-CCR5 binding assay.
Figure 38A
The radiolabeled wtΔ protein was incubated either with the parental L1.2 cells or with the L1.2-CCR5 cells. Incubations were carried out either in the absence or presence of sCD4 (100 nM). The wtΔ protein bound to the cells is shown. The two bands represent different glycoforms of gp120.
Figure 38B
The wtΔ protein was incubated with both sCD4 and 17b antibody at the indicated concentrations prior to addition to the L1.2-CCR5 cells. The L1.2-CCR5 cells were incubated with 2D7 anti-CCR5 antibody or MIP-1β at the indicated concentrations prior to incubation with wtΔ-sCD4 complexes. The wtΔ protein bound to the cells is shown.
Figure 38C
The amount of radiolabeled wtΔ or selected mutant envelope glycoproteins precipitated by a mixture of HIV-1-infected patient sera (Total), precipitated by sCD4 and an anti-CD4 antibody (Bound (sCD4)), or bound to L1.2-CCR5 cells (Bound (CCR5)) is shown.
Figure 39
Structure of the HIV-1 gp120 region implicated in CCR5 binding.
Figure 39A
A ribbon drawing of the HIV-1 gp120 glycoprotein (6) complexed with CD4 is shown. The perspective is that from the target cell membrane. The two amino-terminal domains of CD4 are shown. The gp120 inner domain is shown, the outer domain is shown and the "bridging sheet" is shown. The gp120 residues in which changes resulted in a >90% decrease in CCR5 binding are labeled. The V1/V2 stem and base of the V3 loop (strands β12 and β13 and the associated turn) are indicated.
Figure 39B
A molecular surface of the gp120 glycoprotein from the same perspective as that of Figure 39A is shown. Shaded surfaces are associated with gp120 residues in which changes resulted in either a ≥ 75% decrease, a ≥ 90% decrease or a > 50% increase in CCR5 binding, when CD4 binding was at least 50% of that seen for the wtΔ protein.
Figure 39C
The surface depicted in Figure 39B is shaded according to the degree of conservation observed among primate immunodeficiency viruses (25).
Figure 39D
The molecular surface of the gp120 glycoprotein is shown, indicating residues in which changes resulted in a ≥ 70% decrease in 17b antibody binding, in the absence of sCD4.
Figure 39E
The molecular surface of the gp120 glycoprotein is shown, indicating residues in which changes resulted in a ≥ 70% decrease in CG10 antibody binding in the presence of sCD4. Residues in which changes significantly decreased CD4 binding (and thus indirectly decreased CG10 binding) are not shown. Images were made with Midas-Plus (Computer Graphics Lab, University of California, San Francisco) and GRASP (26).
Mimics of CD4 With Enhanced Affinity For gp120
Figure 40
Illustration of the gp120-binding domain of CD4 and its interaction with the hydrophobic pocket of gp120.
Figure 41A
Active Halogen Reaction Scheme for modifying cysteine 43 mutants of CD4.
Figure 41B
Pyridyl Disulfide Reaction Scheme for modifying cysteine 43 mutants of CD4.
Figure 42
Some specific examples of cysteine 43 mutant derivatives produced with the active halogen reaction scheme.
Figure 43A
General reaction scheme for using a bifunctional reagent to modify the gp120-binding domain of CD4.
Figure 43B
Reaction scheme for using a bifunctional reagent to modify a residue in the gp120-binding domain of CD4 as applied to a cysteine residue.
Figure 44A and B
Use of 3-(2-pyridyldithio)propionic acid N-hydroxysuccinimide ester (SPDP), a bifunctional reagent, as an adaptor for modifying a residue in the gp120-binding domain of CD4.
Figure 45
Illustration of how modification can improve the fit between the gp120-binding domain of CD4 and the hydrophobic pocket in gp120.
Figure 46
Illustration of some of the residues lining the hydrophobic pocket of gp120. The residues lining the hydrophobic pocket of gp120 include: Trp (112), Leu (116), Pro (118), Phe (210), Val (255), Ser (375), Asn (377), Phe (382), Ile (424), Met (426), Trp (427), Asn (428), Ala (433), Gly (473), and Met (475).
Figure 47
Computer-generated ribbon drawing of the tertiary structure of CD4 and gp120 interacting. CD4 is toward the bottom and gp120 is toward the top.
Figure 48
Reaction scheme for chemically modifying tyrosine residues. R1 may be selected from the group shown in Figure 44. An alternative mechanism may be achieved as shown on page 365 of Structure and Protein Chemistry by Jack Kyte (1994), in which a diazonium salt participates in electrophilic aromatic substitution with tyrosine.
Figure 49
Schematic showing the structural domains of gp120.
gp120 Variants as Vaccine For HIV Infection
Figure 50
Depiction of the gp120 Oligomer.
Figure 51
Depiction of the pocket of gp120 formed after the binding of CD4 to gp120.
Figure 52
The topology for the gp120 (Δ82, ΔV1/2, ΔV3, ΔC5) construct.
Coordinates and Contacts
Figure 53
Shows the atomic coordinate data, obtained by x-ray crystallography, of the ternary complex of HIV-1 gp120 complexed with CD4 and Fab 17b, having space group P2221 and unit cell dimensions a=71.643, b=88.130, c=196.7. The raw data and the coordinates were described in U.S. Serial No. 09/100,764, filed June 18, 1998 and U.S. Serial No. 08/967,708, filed November 10, 1997, to which the subject application claims priority. These documents are subject to public inspection. The contents of these applications are incorporated into this application by reference. The coordinates have been deposited in the Brookhaven Protein Data Bank with the accession code 1gc1. In addition, the coordinates may be obtained on the worldwide web at www.pbd.bnl.gov after inputting "1gc1" for the above coordinates.
Figure 54
Provides a detailed list of all the contacts between gp120 (designated here as molecule A) and CD4 (designated here as molecule B).
Figure 55
Provides a detailed list of all the contacts between gp120 (designated here as molecule A) and the Fab 17b (the light chain is designated here as molecule C; the heavy chain is designated here as molecule D).
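As an illustration only, a contact list of the kind referred to for Figures 54 and 55 could be derived from atomic coordinates roughly as sketched below. This is not the procedure used to prepare those figures: the distance criterion for those lists is not stated here, so the 3.5 Å cutoff is an assumption borrowed from the contact surfaces of Figures 30D and 31B, and the atom labels and coordinates are hypothetical values chosen for the example rather than entries from Figure 53.

```python
import math
from itertools import product

def find_contacts(mol_a, mol_b, cutoff=3.5):
    """Return (label_a, label_b, distance) for every atom pair within `cutoff` angstroms.

    Each molecule is a list of (label, (x, y, z)) tuples.
    The 3.5 angstrom default is an assumed criterion, not one stated for Figures 54 and 55.
    """
    contacts = []
    for (label_a, xyz_a), (label_b, xyz_b) in product(mol_a, mol_b):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            contacts.append((label_a, label_b, round(distance, 2)))
    # Closest pairs first
    return sorted(contacts, key=lambda c: c[2])

# Hypothetical labels and coordinates, for illustration only.
molecule_a = [("A:TRP427:CZ2", (0.0, 0.0, 0.0)), ("A:GLU370:CG", (10.0, 0.0, 0.0))]
molecule_b = [("B:PHE43:CZ", (3.0, 0.0, 0.0)), ("B:ARG59:NH1", (10.0, 3.0, 4.0))]

for label_a, label_b, distance in find_contacts(molecule_a, molecule_b):
    print(label_a, label_b, distance)
```

Raising the cutoff widens the list; the all-pairs loop is adequate for a sketch but would normally be replaced by a spatial index for a full complex.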
Detailed Description of the Invention
The invention relates to crystals of gp120 suitable for x-ray diffraction. The three-dimensional structure of gp120 provides information which has a number of uses, principally related to the development of pharmaceutical compositions which mimic the action of gp120. In an embodiment, the crystals comprise a portion of gp120. The portion of gp120 may contain the CD4 binding site. In another embodiment, the portion contains the chemokine receptor binding site. In a further embodiment, the portion of gp120 contains both the CD4 binding site and the chemokine receptor binding site.
In a separate embodiment, the portion of gp120 will be at least 100 amino acids long. In a preferred embodiment, the portion is at least 200 amino acids long.
The essence of the invention resides in the obtaining of crystals of gp120 of sufficient quality to determine the three-dimensional (tertiary) structure of the protein by x-ray diffraction methods.
This invention provides crystals of sufficient quality to obtain a determination of the three-dimensional structure of gp120 to high resolution, preferably to the resolution of 2.5 angstroms.
The value of crystals of gp120 extends beyond merely being able to obtain a structure for gp120. The knowledge of the structure of gp120 provides a means of investigating the mechanism of action of these proteins in the body. For example, binding of these proteins to various receptor molecules can be predicted by various computer models. Upon discovering that such binding in fact takes place, knowledge of the protein structure then allows chemists to design and attempt to synthesize molecules which mimic the binding of gp120 to its receptors. This is the method of "rational" drug design.
One skilled in the art may use one of several methods to screen chemical entities for their ability to associate with gp120. This process may begin by visual inspection of, for example, the active site on the computer screen based on the gp120 coordinates. Docking may be accomplished using software such as Quanta and Sybyl, followed by energy minimization and molecular dynamics with standard molecular mechanics forcefields, such as CHARMM and AMBER.
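The energy minimization step mentioned above can be illustrated with a deliberately small sketch. It is not CHARMM or AMBER, whose forcefields contain many bonded and non-bonded terms; it assumes a single Lennard-Jones pair interaction between one ligand atom and one fixed receptor atom, and relaxes their separation by steepest descent. The well depth, atom size, step length and starting separation are hypothetical values chosen for the example.

```python
def lj_energy(r, epsilon, sigma):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon, sigma):
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def minimize_separation(r_start, epsilon=0.2, sigma=3.4, step=0.5,
                        tol=1e-10, max_iter=10000):
    """Steepest descent on the pair separation; all parameters are assumed values."""
    r = r_start
    for _ in range(max_iter):
        grad = lj_gradient(r, epsilon, sigma)
        if abs(grad) < tol:
            break
        r -= step * grad  # move downhill along the energy gradient
    return r, lj_energy(r, epsilon, sigma)

# Start slightly too far apart and let the pair relax.
r_opt, e_opt = minimize_separation(r_start=4.5)
print(f"relaxed separation {r_opt:.3f}, energy {e_opt:.3f}")
```

For this potential the minimum lies analytically at a separation of 2^(1/6) times sigma with energy equal to minus epsilon, which gives a simple check on the numerical result.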
Specialized computer programs may also assist in the process of selecting fragments or chemical entities. These include:
GRID [P.J. Goodford, "A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules", J. Med. Chem. 28:849-857 (1985)]. GRID is available from Oxford University, Oxford, UK.
MCSS [A. Miranker and M. Karplus, "Functionality Maps of Binding Sites: A Multiple Copy Simultaneous Search Method", Proteins: Structure, Function and Genetics, 11:29-34 (1991)]. MCSS is available from Molecular Systems, Burlington, MA.
AUTODOCK [D.S. Goodsell and A.J. Olson, "Automated Docking of Substrates to Proteins by Simulated Annealing", Proteins: Structure, Function, and Genetics, 195-202 (1990)]. AUTODOCK is available from Scripps Research Institute, La Jolla, CA.
Once suitable entities or fragments have been selected, they can be assembled into a single compound or inhibitor. Assembly may be preceded by visual inspection of the relationship of the fragments to each other on the three-dimensional image displayed on a computer screen in relation to the structure coordinates of gp120. This would be followed by manual model building using software such as Quanta or Sybyl.
Useful programs to aid one of skill in the art in connecting the individual chemical entities or fragments include:
CAVEAT [P.A. Bartlett et al., "CAVEAT: A Program to Facilitate the Structure-Derived Design of Biologically Active Molecules", in "Molecular Recognition in Chemical and Biological Problems", Special Pub., Royal Chem. Soc. 78, pp. 182-196 (1989)]. CAVEAT is available from the University of California, Berkeley, CA.
3D Database systems such as MACCS-3D (MDL Information Systems, San Leandro, CA) . This area is reviewed in Y. C. Martin, "3D Database Searching in Drug Design", J. Med. Chem., 35:2145-2154 (1992).
Instead of proceeding to build a gp120 inhibitor in a step-wise fashion, one fragment or chemical entity at a time as described above, inhibitory or other types of binding compounds may be designed as a whole or "de novo" using either an empty active site or optionally including some portion(s) of known inhibitor(s). These methods include:
LUDI [H.-J. Bohm, "The Computer Program LUDI: A New Method for the De Novo Design of Enzyme Inhibitors", J. Comp. Aid. Molec. Design, 6:61-78 (1992)]. LUDI is available from Biosym Technologies, San Diego, CA.
LEGEND [Y. Nishibata and A. Itai, Tetrahedron, 47:8985 (1991)]. LEGEND is available from Molecular Simulations, Burlington, MA.
Other molecular modeling techniques may also be employed in accordance with this invention. See, e.g., N.C. Cohen et al, "Molecular Modeling Software and Methods for Medicinal Chemistry", J. Med. Chem., 33:883-894 (1990) . See also, M.A. Navia and M.A. Murcko, "The Use of Structural Information in Drug Design" , Current Opinions in Structural Biology, 2:202-210 (1992). For example, where the structures of test compounds are known, a model of the test compound may be superimposed over the model of the structure of the invention. Numerous methods and techniques are known in the art for performing this step, any of which may be used. See, e.g., P.S. Farmer, Drug Design, Ariens, E.J., ed. , Vol. 10, pp. 119-143 (Academic Press, New York 1980); U.S. Patent No. 5,331,573; U.S. Patent No. 5,500,807; C. Verlinde, Structure, 2:577-587 (1994); and I.D. Kuntz, Science 257:1078-1082 (1992). The model building techniques and computer evaluation systems described herein are not a limitation on the present invention.
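The superposition of a test-compound model on a reference model, as mentioned above, is commonly scored by a root-mean-square deviation (RMSD) over matched atoms. The following is a minimal sketch and is not taken from the cited references: it assumes the atom pairing is already known, removes only the translational offset by aligning centroids (no optimal rotation is computed), and uses made-up coordinates.

```python
import math

def centroid(points):
    """Arithmetic mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def center(points):
    """Translate the points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]

def rmsd_after_centering(model_a, model_b):
    """RMSD over paired atoms after centroid alignment (translation only)."""
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must be non-empty and have the same number of atoms")
    a, b = center(model_a), center(model_b)
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))

# Hypothetical coordinates in angstroms, for illustration only.
test_compound = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted_copy = [(x + 10.0, y - 4.0, z + 2.0) for x, y, z in test_compound]
print(rmsd_after_centering(test_compound, shifted_copy))
```

A full superposition would also optimize the rotation; that step is omitted here to keep the sketch short.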
Thus, using these computer evaluation systems, a large number of compounds may be quickly and easily examined and expensive and lengthy biochemical testing avoided. Moreover, the need for actual synthesis of many compounds is effectively eliminated.
Once identified by the modeling techniques, the gp120 or CD4 antagonist may be tested for bioactivity using standard techniques. For example, the structure of the invention may be used in binding assays using conventional formats to screen inhibitors. Suitable assays for use herein include, but are not limited to, the enzyme-linked immunosorbent assay (ELISA) or a fluorescence quench assay. Other assay formats may be used; these assay formats are not a limitation on the present invention.
In another aspect, the gp120 structure of the invention permits the design and identification of synthetic compounds and/or other molecules which have a shape complementary to the conformation of the gp120 active site of the invention. Using known computer systems, the coordinates of the gp120 structure of the invention may be provided in machine-readable form, the test compounds designed and/or screened, and their conformations superimposed on the structure of the invention. Subsequently, suitable candidates identified as above may be screened for the desired gp120 inhibitory bioactivity, stability, and the like.
Once identified and screened for biological activity, these inhibitors may be used therapeutically or prophylactically to block gp120 activity.
Accordingly, this invention also provides material which is the basis for the rational design of drugs which mimic the action of gp120.
The subject invention provides a crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120.
The subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 4 angstroms or better than 4 angstroms.
The subject invention also provides the above-described crystals, which effectively diffract X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 2.5 angstroms or better than 2.5 angstroms.
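The resolution limits quoted above can be related to diffraction angle through Bragg's law, d = λ / (2 sin θ). The sketch below is illustrative only: the X-ray wavelength is an assumption (a copper K-alpha value is used as an example), since the text here does not state the wavelength used for data collection.

```python
import math

def bragg_theta_deg(d_spacing, wavelength):
    """Bragg angle theta in degrees for a given d-spacing, both in angstroms."""
    s = wavelength / (2.0 * d_spacing)
    if not 0.0 < s <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return math.degrees(math.asin(s))

# Assumed wavelength in angstroms (copper K-alpha); not taken from the source.
ASSUMED_WAVELENGTH = 1.5418

for d in (4.0, 2.5):
    theta = bragg_theta_deg(d, ASSUMED_WAVELENGTH)
    print(f"d = {d} A -> 2*theta = {2 * theta:.1f} degrees")
```

A smaller d-spacing (finer resolution) corresponds to a larger scattering angle, which is why higher-resolution data require well-ordered crystals that still diffract at wide angles.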
The subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a CD4 binding site.
The subject invention further provides the above-described crystals, further comprising a compound bound to the CD4 site.
The subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a chemokine receptor binding site.
The subject invention also provides the above-described crystals, further comprising a compound bound to the chemokine receptor binding site.
The subject invention also provides the above-described crystals, wherein the portion of gp120 comprises a CD4 binding site and a chemokine receptor binding site.
The subject invention also provides the above-described crystals, further comprising a first compound bound to the CD4 binding site of the polypeptide and a second compound bound to the chemokine receptor binding site of the polypeptide.
The subject invention also provides the above-described crystals, wherein the first compound is the second compound.
The subject invention also provides the above-described crystals, wherein the crystal is arranged in space group P222₁, so as to form a unit cell of dimensions a=71.6 Å, b=88.1 Å, c=196.7 Å, and which effectively diffracts X-rays for determination of the atomic coordinates of the gp120 to a resolution of 2.5 Å or better.
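As a worked check on the cell dimensions quoted above: a 222-type space group belongs to the orthorhombic system, in which all cell angles are 90 degrees, so the unit cell volume is simply the product of the three edge lengths. The sketch below uses only the a, b, and c values from the text; the function name is illustrative.

```python
def orthorhombic_cell_volume(a, b, c):
    """Volume of an orthorhombic unit cell (all angles 90 degrees), in cubic angstroms."""
    return a * b * c

volume = orthorhombic_cell_volume(71.6, 88.1, 196.7)
print(f"{volume:.0f} cubic angstroms")
```

The result is roughly 1.24 million cubic angstroms.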
The subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 lacking the V1, V2, V3, and C5 regions.
The subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the conserved stem of the V1/V2 stem-loop structure.
The subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the base of the V3 loop.
The subject invention also provides the above-described crystals, wherein the gp120 variant comprises a portion of the C5 region.
The subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 with 5% by weight of the carbohydrate residues linked to the gp120 in substantially the same manner as they are linked to gp120 in unmodified gp120.
The subject invention also provides the above-described crystals, wherein the polypeptide is a variant of gp120 with 15% by weight of the carbohydrate residues linked to the gp120 polypeptide in substantially the same manner as they are linked to gp120 in unmodified gp120.
The subject invention also provides the above-described crystals, further comprising a Fab, a CD4, a polypeptide having the amino acid sequence of a portion of CD4, or a combination thereof, bound to the gp120.
The subject invention also provides the above-described crystals, wherein the Fab is produced from an antibody to a discontinuous epitope.
The subject invention also provides the above-described crystals, wherein the monoclonal antibody is designated 17b.
The subject invention additionally provides a method for producing a crystal suitable for X-ray diffraction comprising: (a) deglycosylating a polypeptide having the amino acid sequence of a portion of a gp120, wherein said portion is produced by deleting or replacing part of the gp120 to reduce the surface loop flexibility; (b) contacting the polypeptide with a ligand so as to form a complex which exhibits restricted conformational mobility; and (c) obtaining a crystal from the complex so formed to produce a crystal suitable for X-ray diffraction.
The subject invention also provides the above-described methods, wherein the V1, V2, or V3 loop of the gp120 contained in the polypeptide is partially truncated, deleted or replaced.
The subject invention also provides the above-described methods, wherein the polypeptide lacks the V1, V2, V3 and C5 regions of the gp120.
The subject invention also provides the above-described methods, wherein the polypeptide also lacks up to fifty N-terminal amino acids of the gp120 or up to fifty C-terminal amino acids of gp120.
The subject invention also provides the above-described methods, wherein the ligand is a Fab, a CD4, or a polypeptide having the amino acid sequence of a portion of CD4.
The subject invention also provides the above-described methods, wherein the resulting polypeptide after the deglycosylation contains at least 5% of the carbohydrate.
The subject invention also provides the crystal produced by the above-described methods.
The subject invention also provides a method for identifying a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the gp120.
The subject invention also provides a method for designing a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and (b) designing a compound to fit the binding site.
Structure-based drug design is known and has been described previously. See, e.g., Bugg et al. (1993) Sci. Amer., December: 92-98; Giranda (1994) Structure, 2:695-698; Lam et al. (1994) Science 263:380-384; and Navia et al. (1994) Circulation 89(4):1557-1566. The subject invention also provides the above-described methods, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
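To illustrate what an "estimated interaction energy" might look like in practice, the following minimal sketch sums a Lennard-Jones 12-6 term and a Coulomb term over all ligand/site atom pairs. It is an assumption-laden illustration, not the scoring function of any program named in this document: the well depth, contact distance, dielectric constant, charges, and coordinates are all hypothetical placeholders.

```python
import math

# Coulomb constant in kcal*angstrom/(mol*e^2), as commonly used in molecular mechanics.
COULOMB_KCAL = 332.0637

def pair_energy(r, epsilon, sigma, qi, qj, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_KCAL * qi * qj / (dielectric * r)
    return lennard_jones + coulomb

def interaction_energy(ligand, site, epsilon=0.1, sigma=3.5):
    """Sum pair energies over every ligand/site atom pair.

    Each atom is an (x, y, z, charge) tuple. A single epsilon and sigma are
    used for all pairs, which is a simplification for illustration.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for sx, sy, sz, sq in site:
            r = math.dist((lx, ly, lz), (sx, sy, sz))
            total += pair_energy(r, epsilon, sigma, lq, sq)
    return total

# Hypothetical two-atom ligand and two-atom site, coordinates in angstroms.
ligand_atoms = [(0.0, 0.0, 0.0, -0.3), (1.4, 0.0, 0.0, 0.1)]
site_atoms = [(0.0, 4.0, 0.0, 0.3), (1.4, 4.0, 0.0, -0.1)]
print(interaction_energy(ligand_atoms, site_atoms))
```

A more negative total indicates a more favorable estimated interaction under these assumed parameters.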
The subject invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
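Atomic coordinates of this kind are often exchanged as fixed-column Protein Data Bank (PDB) ATOM records. The sketch below assumes that layout, which may or may not match how Figure 53 is actually tabulated; the helper names and the example atom are hypothetical.

```python
def parse_atom_record(line):
    """Parse one fixed-column PDB ATOM/HETATM record; return None for other lines."""
    if line[:6].strip() not in ("ATOM", "HETATM"):
        return None
    return {
        "atom": line[12:16].strip(),      # columns 13-16: atom name
        "residue": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "resseq": int(line[22:26]),       # columns 23-26: residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def format_atom_record(serial, atom, residue, chain, resseq, xyz):
    """Write a minimal ATOM record using the same column layout."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {atom:<4} {residue:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Hypothetical atom, for illustration only (not a coordinate from Figure 53).
example_line = format_atom_record(1, "CA", "PHE", "A", 43, (11.104, 6.134, -6.504))
print(parse_atom_record(example_line))
```

Building the example line with the formatter, rather than typing it by hand, avoids column-alignment mistakes in the fixed-width layout.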
The subject invention also provides a pharmaceutical composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier.
For the purposes of this invention, "pharmaceutically acceptable carriers" means any of the standard pharmaceutical carriers. Examples of suitable carriers are well known in the art and may include, but are not limited to, any of the standard pharmaceutical carriers such as phosphate buffered saline solutions, phosphate buffered saline containing Polysorb 80, water, emulsions such as oil/water emulsions, and various types of wetting agents. Other carriers may also include sterile solutions, tablets, coated tablets, and capsules.
Typically such carriers contain excipients such as starch, milk, sugar, certain types of clay, gelatin, stearic acid or salts thereof, magnesium or calcium stearate, talc, vegetable fats or oils, gums, glycols, or other known excipients. Such carriers may also include flavor and color additives or other ingredients.
Compositions comprising such carriers are formulated by well known conventional methods.
The subject invention also provides the above-described methods, wherein the compound is not previously known.
The subject invention also provides the compounds identified by the above-described methods.
The subject invention also provides the compound designed by the above-described methods.
The subject invention also provides a composition comprising the above-described compounds and a suitable carrier.
This invention also provides a method of inhibiting the interaction of HIV-gp120 with CD4 which comprises administering to a mammal a compound, with the proviso that the compound is not CD4, capable of disrupting two or more of the contacts between gp120 and CD4 as set forth in Figure 54.
This invention also provides a method for identifying a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the CD4 binding site of the gp120.
The molecular interaction of HIV with CD4 is between the HIV envelope glycoprotein gp120 and the D1 domain of CD4. The crystal structure of the complex between the deglycosylated core of gp120 and the D1D2 fragment of human CD4 defines this interaction in atomic detail (Nature paper). Although there is an extensive interface between these components, the nexus of the interaction brings together those residues demonstrated by mutational analyses to be the most crucial for binding: Phe 43 and Arg 59 from CD4 and Asp 368, Glu 370 and Trp 427 from gp120 (Nature paper, Fig. 3j). This dominant sub-site comprises gp120 residues 365-368, 370-371, 425-430 and 473. In addition, Phe 43 closes off a pocket on the HIV surface to form a large cavity (152 Å³) at this interface (Nature paper, Fig. 3b). Residues that line the Phe 43 pocket include Trp 112, Val 255, Thr 257, Glu 370, Phe 382, Tyr 384, Trp 427, Met 475 and main-chain atoms of 256 and 375-377.
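The residue lists above can be turned into explicit selections for use with a coordinate file. The sketch below only expands the residue numbers and ranges given in the paragraph and reports which residues appear in both the dominant sub-site and the Phe 43 pocket lining; the helper name is illustrative.

```python
def expand_residues(spec):
    """Expand a string such as '365-368, 473' into a set of residue numbers."""
    residues = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            residues.update(range(int(start), int(end) + 1))
        else:
            residues.add(int(part))
    return residues

# Residue numbers as listed in the text (gp120 numbering).
DOMINANT_SUBSITE = expand_residues("365-368, 370-371, 425-430, 473")
PHE43_POCKET = expand_residues("112, 255, 257, 370, 382, 384, 427, 475, 256, 375-377")

shared = sorted(DOMINANT_SUBSITE & PHE43_POCKET)
print(len(DOMINANT_SUBSITE), len(PHE43_POCKET), shared)
```

The two selections overlap at residues 370 and 427, both of which the text also names among the residues most crucial for binding.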
The atomic coordinates of the crystallographic model also define the binding surface to be exploited by high-affinity compounds that will have the property to inhibit the gp120-CD4 interaction, and thereby the attachment of HIV to CD4-positive cells. This definition of the surface provides practitioners skilled in the art with the means to design such compounds. Appropriate fragments or chemical entities for the design of such compounds can be found through the use of specialized computer programs such as GRID, DOCK and LUDI. Computer graphical representations of these entities can then be composed into appropriate chemical compounds, using the crystal structure as a template. Medicinal chemists skilled in the art can then synthesize appropriate chemical compounds to implement these designs. Not all such compounds will bind and have inhibitory properties, but a sufficient portion will do so to provide the designed lead compounds for drug discovery. Such leads can then be developed by the methods of structure-based drug design using crystallized complexes between these compounds and deglycosylated core gp120.
A compound that will bind to the dominant sub-site of the CD4 intermolecular interface will have surface properties that are complementary to the surface properties of the sub-site itself. The surface of the sub-site can be characterized by the GRASP computer program with respect to curvature, electrostatic potential, and hydrophobicity. The complementary surface to this one (i.e., convex vs. concave, positive vs. negative, etc.) defines an envelope that will correspond to the binding portion of the molecular surface of an inhibitory compound. Thus, any compound that has an accessible conformation such as to match the surface that is complementary to the HIV gp120 binding surface is one that has a high probability for inhibitory binding. Since it should be possible for skilled practitioners to design and synthesize such compounds when instructed by the template of the HIV gp120 structure and the CD4 binding elements, these compounds defined by congruence with the complementary surface can be considered inventions by the process hereby defined.
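One simple way to express "convex vs. concave, positive vs. negative" numerically is to compare a surface property sampled at matched points on the site and on a candidate compound, and to score how strongly the two sets of values oppose each other. The sketch below uses the negative Pearson correlation for this purpose. It is an illustrative scoring choice, not the GRASP method, and the sampled values are hypothetical.

```python
import math

def complementarity(site_values, ligand_values):
    """Negative Pearson correlation of a property sampled at matched surface points.

    Returns +1.0 when the ligand values are an exact opposite of the site
    values (for example positive vs. negative potential) and -1.0 when they
    are identical in pattern.
    """
    n = len(site_values)
    if n != len(ligand_values) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_site = sum(site_values) / n
    mean_ligand = sum(ligand_values) / n
    cov = sum((s - mean_site) * (l - mean_ligand)
              for s, l in zip(site_values, ligand_values))
    var_site = sum((s - mean_site) ** 2 for s in site_values)
    var_ligand = sum((l - mean_ligand) ** 2 for l in ligand_values)
    if var_site == 0 or var_ligand == 0:
        raise ValueError("a constant sample has no defined correlation")
    return -cov / math.sqrt(var_site * var_ligand)

# Hypothetical electrostatic potential samples at three matched surface points.
site_potential = [1.0, -2.0, 3.0]
ligand_potential = [-1.0, 2.0, -3.0]
print(complementarity(site_potential, ligand_potential))
```

The same function could be applied separately to curvature and hydrophobicity samples, with the individual scores combined as the practitioner sees fit.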
This invention also provides a method for designing a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: (a) determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gp120 capable of binding to CD4; and (b) designing a compound to fit the CD4 binding site.
This invention also provides the above-described methods, wherein the crystal further comprises a CD4, a second polypeptide having the amino acid sequence of a portion of CD4, or a compound known to be able to bind to the CD4 site of the gp120, bound to the polypeptide.
This invention also provides the above-described methods, wherein the fitting is determined by shape complementarity or by estimated interaction energy. This invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
This invention also provides a pharmaceutical composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier.
This invention also provides the above-described methods, wherein the compound is not previously known.
This invention also provides the compound identified by the above-described methods.
This invention also provides the compound designed by the above-described methods.
This invention also provides a composition comprising the above-described compounds and a suitable carrier.
This invention also provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject.
In embodiments of the above-described methods, the above-described compounds are nonpeptidyl.
This invention provides a method for identifying a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gpl20 comprising: (a) determining the chemokine receptor binding site on the gpl20 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gpl20 capable of binding to the chemokine receptor; and (b) determining whether a compound would fit into the binding site, a positive fit indicating that the compound is capable of binding to the chemokine receptor binding site of the gpl20.
This invention also provides a method for designing a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gpl20 comprising: (a) determining the chemokine receptor binding site on the gpl20 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having the amino acid sequence of a portion of gpl20 capable of binding to the chemokine receptor; and (b) designing a compound to fit the chemokine receptor binding site.
This invention also provides the above-described methods, wherein the crystal further comprises a chemokine receptor, a second polypeptide having amino acid sequence of a portion of chemokine receptor, an antibody or a Fab capable of binding to the chemokine receptor binding site or a compound known to be capable of binding to the chemokine receptor binding site, bound to the polypeptide.
This invention also provides the above-described methods, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
This invention also provides the above-described methods, wherein the atomic coordinates are set forth in Figure 53.
This invention also provides a pharmaceutical composition comprising the compound identified by the above-described methods and a pharmaceutically acceptable carrier. This invention also provides the above-described methods, wherein the compound is not previously known.
This invention provides compounds identified by the above-described methods. This invention provides compounds designed by the above-described methods.
This invention also provides a composition comprising the above-described compounds and a suitable carrier.
Additionally, this invention provides a method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the above-described composition to the subject, thereby inhibiting Human Immunodeficiency Virus infection.
This invention further provides a method of inhibiting the interaction of HIV-gp120 with chemokine receptor which comprises administering to a mammal a compound capable of disrupting two or more of the contacts between gp120 and chemokine receptor as set forth in Figure 55, thereby inhibiting the interaction of HIV-gp120 with chemokine receptor, with the proviso that the compound is not a chemokine receptor. In an embodiment, the compound is nonpeptidyl.
Table summarizing the CCR5-binding residues of gp120
SET A: 117, 121, or 123
SET B: 207
SET C: 330
SET D: 419, 420, 421, 422, 437, 438, 440, 441, 442, or 444
This invention further provides a method of inhibiting cell entry by HIV, comprising blocking or inhibiting residues from 2 or more of the sets of CCR5-binding residues set forth above, thereby inhibiting or preventing gp120 from binding to CCR5 and thereby inhibiting cell entry by HIV.
This invention also provides the above-described method, wherein residues from 3 or more of the sets of CCR5-binding residues set forth above are blocked or inhibited from interacting with CCR5.
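The "2 or more sets" and "3 or more sets" criteria can be checked mechanically once the residues contacted by a candidate are known. The sketch below encodes the four sets from the table and counts how many sets contain at least one blocked residue. Treating a set as blocked when any one of its residues is blocked is an interpretation of the "or" in the table, and the function names are illustrative.

```python
# CCR5-binding residue sets of gp120, as listed in the table above.
CCR5_SETS = {
    "A": {117, 121, 123},
    "B": {207},
    "C": {330},
    "D": {419, 420, 421, 422, 437, 438, 440, 441, 442, 444},
}

def sets_blocked(blocked_residues):
    """Names of the sets that contain at least one blocked residue."""
    blocked = set(blocked_residues)
    return sorted(name for name, members in CCR5_SETS.items() if members & blocked)

def meets_threshold(blocked_residues, minimum_sets=2):
    """True when residues from at least `minimum_sets` different sets are blocked."""
    return len(sets_blocked(blocked_residues)) >= minimum_sets

print(sets_blocked({117, 207}), meets_threshold({117, 207}))
```

Blocking several residues that all fall within one set, for example 419 and 420 in set D, does not satisfy the two-set criterion under this reading.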
This invention also provides the above-described methods, wherein the blocking or inhibiting comprises contacting the CCR5-binding residues with an antibody.
This invention also provides the above-described methods, wherein the compound is nonpeptidyl.
This invention provides a substance mimicking the human immunodeficiency virus envelope glycoprotein gp120-binding region of CD4, wherein the size of a residue or analog thereof, corresponding to the phenylalanine at position 43 in the native CD4, is larger than the size of phenylalanine so as to fill the pocket on gp120 which extends beyond position 43 in the gp120/CD4 complex and increase the affinity for gp120.
As used herein, residue or analog thereof includes amino acids (both individually and as part of a polypeptide chain), modified amino acids, amino acid analogs, and chemical compounds that can be substituted for the amino acids that ordinarily make up the CD4 polypeptide chain. (Also see the discussion of peptidomimetics, synthetic polypeptides, and polypeptide analogs below.)
This invention also provides the above-described substance, wherein the substance is a peptidomimetic analog, a synthetic polypeptide, a standard polypeptide, or a polypeptide analog.
As used herein, the substance mimicking the gp120-binding domain of CD4 embraces a wide range of compounds. In addition to naturally-occurring forms of polypeptides derived from CD4, the present invention also embraces other CD4 polypeptides such as polypeptide analogs of CD4. Such analogs include fragments of CD4. Following the procedures of the published application by Alton et al. (WO 83/04053), one can readily design and manufacture genes coding for microbial expression of polypeptides having primary conformations which differ from that herein specified in terms of the identity or location of one or more residues (e.g., substitutions, terminal and intermediate additions and deletions). Alternately, modifications of cDNA and genomic genes can be readily accomplished by well-known site-directed mutagenesis techniques and employed to generate analogs and derivatives of the CD4 polypeptide. Such products share at least one of the biological properties of CD4 but may differ in others.
As examples, products of the invention include those which are foreshortened by, e.g., deletions; or those which are more stable to hydrolysis (and, therefore, may have more pronounced or longer-lasting effects than naturally-occurring products); or which have been altered to delete or to add one or more potential sites for O-glycosylation and/or N-glycosylation; or which have one or more cysteine residues deleted or replaced by, e.g., alanine or serine residues and are potentially more easily isolated in active form from microbial systems; or which have one or more tyrosine residues replaced by phenylalanine and bind more or less readily to target proteins or to receptors on target cells. Also comprehended are polypeptide fragments duplicating only a part of the continuous amino acid sequence or secondary conformations within gp120, which fragments may possess one property of gp120 and not others. It is noteworthy that activity is not necessary for any one or more of the polypeptides of the invention to have therapeutic utility or utility in other contexts, such as in assays of gp120 antagonism. Competitive antagonists may be quite useful in, for example, cases of overproduction of gp120.
Of applicability to polypeptide analogs of the invention are reports of the immunological property of synthetic peptides which substantially duplicate the amino acid sequence extant in naturally-occurring proteins, glycoproteins and nucleoproteins. More specifically, relatively low molecular weight polypeptides have been shown to participate in immune reactions which are similar in duration and extent to the immune reactions of physiologically-significant proteins such as viral antigens, polypeptide hormones, and the like. Included among the immune reactions of such polypeptides is the provocation of the formation of specific antibodies in immunologically-active animals [Lerner et al., Cell, 23, 309-310 (1981); Ross et al., Nature, 294, 654-658 (1981); Walter et al., Proc. Natl. Acad. Sci. USA, 78, 4882-4886 (1981); Wong et al., Proc. Natl. Acad. Sci. USA, 79, 5322-5326 (1982); Baron et al., Cell, 28, 395-404 (1982); Dressman et al., Nature, 295, 185-160 (1982); and Lerner, Scientific American, 248, 66-74 (1983)]. See also Kaiser et al. [Science, 223, 249-255 (1984)], relating to biological and immunological properties of synthetic peptides which approximately share secondary structures of peptide hormones but may not share their primary structural conformation.
This invention also provides the above-described substances, wherein the modification increases the hydrophobicity or size of the residue or analog thereof at position 43.
This invention also provides the above-described substances, wherein the modification comprises directly or indirectly linking a hydrophobic compound to a residue or analog thereof at position 43 of the domain.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that is bulkier than phenylalanine .
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 7 Å across its longest dimension.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 10 Å across its longest dimension.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof larger than 15 Å across its longest dimension.
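Given atomic coordinates for a candidate residue or analog, its longest dimension can be estimated as the largest distance between any two of its atoms and then compared with the 7, 10, and 15 Å thresholds above. The sketch below measures between atom centers, which is an assumption since the text does not say how the dimension is to be measured, and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def longest_dimension(atoms):
    """Largest center-to-center distance between any two atoms, in angstroms."""
    if len(atoms) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(atoms, 2))

def thresholds_exceeded(atoms, thresholds=(7.0, 10.0, 15.0)):
    """Return the thresholds (in angstroms) that the longest dimension exceeds."""
    span = longest_dimension(atoms)
    return [t for t in thresholds if span > t]

# Hypothetical atom positions for an enlarged side chain, for illustration only.
candidate = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
print(longest_dimension(candidate), thresholds_exceeded(candidate))
```

Measuring between van der Waals surfaces instead of atom centers would add roughly the two atomic radii to each value.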
This invention provides the above-described substance which enhances hydrophobic interactions with residues that line the pocket. In another embodiment, this invention provides the above-described substance which enhances hydrogen bonding to residues that line the pocket. In a separate embodiment, this invention provides the above-described substance which enhances electrostatic interactions with residues that line the pocket. In a still separate embodiment, this invention provides the above-described substance which enhances surface fit with residues that line the pocket.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof, wherein the residue's longest dimension is longer than phenylalanine's longest dimension.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains a localization of negative charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains a localization of charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
This invention also provides the above-described substances, wherein the modification results in a residue or analog thereof that contains at least one additional hydroxyl group.
Placing a tyrosine residue at position 43 is an example of a modification resulting in a residue that contains a hydroxyl group. Further, the oxygen of the hydroxyl group has a localization of negative charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120. Further, the hydrogen of the hydroxyl group has a localization of charge so as to render the gp120-binding domain of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a cysteine. This invention further provides for the substitution of the sulfhydryl group of this cysteine.
This invention also provides the above-described substances, wherein the modification involves replacement of the residue at position 43 with a tyrosine. This invention further provides for the substitution of this tyrosine.
This invention also provides the above-described substances, wherein the modification comprises directly or indirectly linking an adaptor residue or analog thereof to position 43.
This invention also provides the above-described substances, wherein the adaptor residue or analog thereof is directly or indirectly linked to a hydrophobic compound, thus forming a complex.
This invention also provides the above-described substances, wherein the complex is bulkier than phenylalanine.
This invention also provides the above-described substances, wherein the complex is larger than 7 Å across its longest dimension.
This invention also provides the above-described substances, wherein the complex's longest dimension is longer than phenylalanine's longest dimension.
This invention also provides the above-described substances, wherein the complex is larger than 10 Å across its longest dimension.
This invention further provides a pharmaceutical composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the above-described substance; and (b) a pharmaceutically acceptable carrier.
The actual effective amount will be based upon the size of the polypeptide, the biodegradability of the polypeptide, the bioactivity of the polypeptide and the bioavailability of the polypeptide. If the polypeptide does not degrade quickly, is bioavailable and highly active, a smaller amount will be required to be effective. The effective amount will be known to one of skill in the art; it will also be dependent upon the form of the polypeptide, the size of the polypeptide and the bioactivity of the polypeptide. Variants of CD4 with lower affinity for gpl20 will require higher dosages than variants of CD4 with higher affinity for gpl20. One of skill in the art could routinely perform empirical activity tests to determine the bioactivity in bioassays and thus determine the effective amount.
Pharmaceutically acceptable carriers are well known to those skilled in the art and have been described supra.
A pharmaceutical composition for treating or preventing HIV infection, comprising (a) an effective amount of the above-described substances; and (b) a pharmaceutically acceptable carrier.
This invention further provides a composition capable of inhibiting cell entry by HIV, comprising (a) an effective amount of the above-described substances; and (b) a suitable carrier.
This invention further provides a pharmaceutical composition for treating or preventing HIV infection, comprising (a) an effective amount of the above-described substances; and (b) a pharmaceutically acceptable carrier.
This invention further provides a composition for treating or preventing HIV infection, comprising (a) an effective amount of the above-described substances; and (b) a suitable carrier.
This invention further provides a method of inhibiting cell entry by HIV, comprising contacting the cells with an effective amount of the above-described substances, thereby inhibiting cell entry by HIV.
This invention further provides a method of treating or preventing HIV infection in a subject, comprising administering to the subject an effective amount of the above-described substances, thereby treating or preventing HIV infection.
The invention provides a variant of gp120 which presents a hidden, conserved, neutralization epitope. In an embodiment, the amino acid of the above variant at position 375 is changed from a serine to a tryptophan. In a further embodiment, the variant further comprises one of the following changes: 88N to P, 102E to L, 113D to R, 117K to W, 257T to A, 266A to E, 386N to Q, 395W to S, 421K to L, 470P to G, 475M to S, 485K to V, or a combination thereof.
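As an illustration only, the substitutions listed in this embodiment can be held in a small data structure that is easier to check and combine than free text. The following is a minimal Python sketch; the names `Substitution`, `parse_substitutions` and `short_name` are hypothetical and are not part of the disclosure. The entry "375S to W" encodes the serine-to-tryptophan change at position 375 described above.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int   # residue number as written in the text
    wild_type: str  # one-letter code of the original residue
    mutant: str     # one-letter code of the replacement residue


# Substitutions as listed in the embodiment above; "375S to W" is the
# Ser -> Trp change at position 375 written in the same notation.
LISTED = (
    "375S to W, 88N to P, 102E to L, 113D to R, 117K to W, 257T to A, "
    "266A to E, 386N to Q, 395W to S, 421K to L, 470P to G, 475M to S, "
    "485K to V"
)

_PATTERN = re.compile(r"^(\d+)([A-Z]) to ([A-Z])$")


def parse_substitutions(text: str) -> list:
    """Parse a comma-separated list such as '88N to P, 102E to L'."""
    subs = []
    for item in text.split(","):
        match = _PATTERN.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {item!r}")
        pos, wild_type, mutant = match.groups()
        subs.append(Substitution(int(pos), wild_type, mutant))
    return subs


def short_name(sub: Substitution) -> str:
    """Format a substitution in the compact form, e.g. '375S/W'."""
    return f"{sub.position}{sub.wild_type}/{sub.mutant}"


substitutions = parse_substitutions(LISTED)
```

Calling `short_name` on each parsed entry yields the compact "position, wild type / mutant" form, which makes it straightforward to enumerate combinations with the position 375 change.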
This invention further provides a composition comprising the above-described variant and a suitable carrier.
In a specific embodiment, "composition" as used herein means pharmaceutical compositions comprising therapeutically effective amounts of polypeptide products of the invention together with suitable diluents, preservatives, solubilizers, emulsifiers, adjuvants and/or carriers useful in therapy. A "therapeutically effective amount" as used herein refers to that amount which provides a therapeutic effect for a given condition and administration regimen. Such compositions are liquids or lyophilized or otherwise dried formulations and include diluents of various buffer content (e.g., Tris-HCl, acetate, phosphate), pH and ionic strength, additives such as albumin or gelatin to prevent absorption to surfaces, detergents (e.g., Tween 20, Tween 80, Pluronic F68, bile acid salts), solubilizing agents (e.g., glycerol, polyethylene glycerol), anti-oxidants (e.g., ascorbic acid, sodium metabisulfite), preservatives (e.g., Thimerosal, benzyl alcohol, parabens), bulking substances or tonicity modifiers (e.g., lactose, mannitol), covalent attachment of polymers such as polyethylene glycol to the protein, complexation with metal ions, or incorporation of the material into or onto particulate preparations of polymeric compounds such as polylactic acid, polyglycolic acid, hydrogels, etc., or onto liposomes, microemulsions, micelles, unilamellar or multilamellar vesicles, erythrocyte ghosts, or spheroplasts. Such compositions will influence the physical state, solubility, stability, rate of in vivo release, and rate of in vivo clearance of the administered materials. The choice of compositions will depend on the physical and chemical properties of the protein having the biological activity. For example, a product derived from a membrane-bound form of the protein may require a formulation containing detergent. Controlled or sustained release compositions include formulation in lipophilic depots (e.g., fatty acids, waxes, oils).
Also comprehended by the invention are particulate compositions coated with polymers (e.g., poloxamers or poloxamines) and the variants coupled to antibodies directed against tissue-specific receptors, ligands or antigens or coupled to ligands of tissue-specific receptors. Other embodiments of the compositions of the invention incorporate particulate forms, protective coatings, protease inhibitors or permeation enhancers for various routes of administration, including parenteral, pulmonary, nasal and oral.
For the purposes of this invention "suitable carriers" means any of the standard carriers used in the pharmaceutical industry. Examples of suitable carriers are well known in the art and may include, but are not limited to, any of the standard pharmaceutical carriers such as phosphate buffered saline solutions, phosphate buffered saline containing Polysorb 80, water, emulsions such as oil/water emulsions, and various types of wetting agents. Other carriers may also include sterile solutions, tablets, coated tablets, and capsules.
Typically such carriers contain excipients such as starch, milk, sugar, certain types of clay, gelatin, stearic acid or salts thereof, magnesium or calcium stearate, talc, vegetable fats or oils, gums, glycols, or other known excipients. Such carriers may also include flavor and color additives or other ingredients. Compositions comprising such carriers are formulated by well known conventional methods.
This invention also provides a vaccine comprising the above-described variant. Such a vaccine may further comprise a suitable adjuvant.
Vaccines and adjuvants are well-known to those skilled in the art. Using a vaccine, comprising adjuvants or not, one may induce or stimulate the immune response of an individual. The immune response may vary, e.g. a humoral or cell-mediated immune response. Adjuvants are chemical compounds that enhance the immunogenicity of the vaccine so as to enhance the stimulation and induction of the immune response.
In a specific embodiment, the vaccine is administered to a subject. As used herein, "subject" means any animal or artificially modified animal capable of becoming HIV-infected. Artificially modified animals include, but are not limited to, SCID mice with human immune systems.
In the preferred embodiment, the subject is a human. In another embodiment, the subject is a human infected with HIV.
As used herein, a "human infected with HIV" means an individual having at least one of his own cells infected by HIV. As used herein, an HIV-infected cell is a cell wherein HIV has been produced. A non-HIV-infected subject means a subject not having any cells infected by HIV. In one embodiment, a non-HIV-infected subject is an HIV-exposed subject. As used herein, an HIV-exposed subject is a subject who has HIV present in his body, but has not yet become HIV-infected. For example, a subject may become HIV-exposed upon receiving a needle stick injury with an HIV-contaminated needle.
In a specific embodiment of the invention, one may first obtain crystals of gp120 of sufficient quality to determine the three dimensional (tertiary) structure of the protein by x-ray diffraction methods. The value of crystals of gp120 extends beyond merely being able to obtain a structure for gp120. The knowledge of the structure of gp120 provides a means of investigating the mechanism of action of these proteins in the body. For example, binding of these proteins to various receptor molecules can be predicted by various computer models. Upon discovering that such binding in fact takes place, knowledge of the protein structure then allows chemists to design and attempt to synthesize molecules which mimic the binding of gp120 to its receptors. This is the method of "rational" drug design. Using such methods, one may determine a variant of gp120 which presents a hidden, conserved, neutralization epitope.
This invention further provides an antibody induced by the above-described vaccine. Specifically, the antibody may be a polyclonal antibody or a monoclonal antibody.
An antibody comprises intact immunoglobulin molecules, substantially intact immunoglobulin molecules and those portions of an immunoglobulin molecule that contain the paratope, including those portions known in the art as Fab, Fab', F(ab')2 and F(v), which portions are preferred for use in the therapeutic methods described herein. In another embodiment, the antibody is a single-chain antibody.
As used herein, "polyclonal antibodies" may comprise different sera whereas "monoclonal antibody" comprises antibodies, each of which will recognize one single epitope. Methods for production of monoclonal antibodies are well-known in the art.
In order to determine variants of gp120 which present a hidden, conserved, neutralization epitope, the gp120 structure of the invention permits the design and identification of synthetic compounds and/or other molecules which have a shape complementary to the conformation of the gp120 active site of the invention. Using known computer systems, the coordinates of the gp120 structure of the invention may be provided in machine readable form, the test compounds designed and/or screened and their conformations superimposed on the structure of the invention. Subsequently, suitable candidates identified as above may be screened for the desired gp120 inhibitory bioactivity, stability, and the like.
Once identified and screened for biological activity, these inhibitors may be used therapeutically or prophylactically to block gp120 activity. Such compounds may prove useful as vaccines.
This invention provides a vaccine comprising a polypeptide having 6 or more amino acids in the same spatial proximity to each other as the amino acids from the Phe 43 cavity of naturally occurring gpl20.
This invention also provides the above-described vaccine, wherein the 6 or more amino acids are identical to the amino acids of naturally occurring gpl20.
This invention further provides the above-described vaccines, wherein the amino acids are within 1 angstrom of their distances in naturally occurring gpl20.
This invention also provides the above-described vaccines, wherein the amino acids are within 3 angstroms of their distances in naturally occurring gpl20.
This invention provides the above-described vaccines, wherein the amino acids are within 5 angstroms of their distances in naturally occurring gpl20.
This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
This invention further provides the above-described vaccines, further comprising a carrier.
This invention also provides the above-described vaccines, further comprising an adjuvant.
This invention provides a vaccine comprising a polypeptide having 6 or more continuous amino acids from the Phe 43 cavity of gpl20.
This invention provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
This invention also provides the above-described vaccines, further comprising a carrier.
This invention further provides the above-described vaccines, further comprising an adjuvant.
This invention further provides a vaccine comprising a polypeptide having 6 or more amino acids in the same spatial proximity to each other as the surface accessible amino acids adjacent to the Phe 43 cavity of naturally occurring gpl20.
This invention also provides the above-described vaccines, wherein the 6 or more amino acids are identical to the amino acids of naturally occurring gpl20.
This invention provides the above-described vaccines, wherein the amino acids are within 1 angstrom of their distances in naturally occurring gpl20.
This invention also provides the above-described vaccines, wherein the amino acids are within 3 angstroms of their distances in naturally occurring gpl20.
This invention further provides the above-described vaccines, wherein the amino acids are within 5 angstroms of their distances in naturally occurring gpl20.
This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
This invention further provides the above-described vaccines, further comprising a carrier.
This invention also provides the above-described vaccines, further comprising an adjuvant.
This invention also provides the above-described vaccines, wherein the surface accessible amino acids comprise Lysine 432, Proline 369, and Threonine 373.
This invention further provides a vaccine comprising a polypeptide having 6 or more continuous surface accessible amino acids adjacent to the Phe 43 cavity of gpl20.
This invention also provides the above-described vaccines, wherein the polypeptide is or is part of a conserved neutralization epitope.
This invention further provides the above-described vaccines, further comprising a carrier. This invention also provides the above-described vaccines, further comprising an adjuvant.
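The distance criterion recited in the vaccine embodiments above (amino acids within 1, 3 or 5 angstroms of their distances in naturally occurring gp120) can be illustrated with a short sketch. This is one possible reading, assumed here for illustration: each residue is reduced to a single representative point, and every pairwise distance in the candidate polypeptide is compared with the corresponding distance in the reference structure. The function names and all coordinates below are hypothetical placeholders, not values from the gp120 structure.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Map each unordered pair of residue labels to its distance (angstroms)."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


def max_distance_deviation(reference, candidate):
    """Largest absolute change in any pairwise distance between the two sets."""
    ref = pairwise_distances(reference)
    cand = pairwise_distances(candidate)
    return max(abs(ref[pair] - cand[pair]) for pair in ref)


def within_tolerance(reference, candidate, tolerance):
    """True if every pairwise distance is preserved to within `tolerance`."""
    return max_distance_deviation(reference, candidate) <= tolerance


# Hypothetical placeholder coordinates for six residues (NOT real gp120 data).
reference = {
    "r1": (0.0, 0.0, 0.0), "r2": (4.0, 0.0, 0.0), "r3": (4.0, 5.0, 0.0),
    "r4": (0.0, 5.0, 0.0), "r5": (2.0, 2.5, 3.0), "r6": (2.0, 2.5, -3.0),
}
# Candidate polypeptide: identical except that r2 is displaced by 0.5 angstrom.
candidate = dict(reference, r2=(4.5, 0.0, 0.0))

for tolerance in (1.0, 3.0, 5.0):
    print(tolerance, within_tolerance(reference, candidate, tolerance))
```

Comparing internal distances rather than absolute coordinates avoids the need to superimpose the two structures first, which is why that design was chosen for this sketch.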
Many animal viruses target specific host cells for infection by attachment to cell surface receptor molecules unique to these cells. These viral receptors have particular roles in the normal functioning of these cells. The virus simply subverts these functions in order to effect entry into the cell. Certain molecules on the viral surface can in turn be the target of antibodies raised by the host in defense against this parasitic attack. Viruses can evade such antibody immunity by mutating their surface proteins. The receptor binding site, however, must remain constant. It therefore evolves to be protected from antibody surveillance.
Application to HIV vaccine:
The viral surface protein, gp120 (which appears to be a trimer on the surface of the virion), plays a central role in immune evasion. The precise mechanism of gp120 immune evasion thus far remains unknown, but the structure of the gp120-CD4-Fab 17b complex reveals several crucial features:
1. The CD4 binding site is very large (larger than the typical antibody footprint).
2. The V1/2 variable loop is oriented to mask the CD4 binding site.
3. The V3 variable loop is not near the CD4 binding site (on a monomer), but the tip of this loop could interact with Fab 17b, which marks the second receptor binding site.
4. The CD4 binding site undergoes conformational changes upon CD4 binding.
From the structure, the following details of the mechanism of gpl20 immune evasion become clear:
1. The V1/2 loop occludes the CD4 binding site yet still allows CD4 binding. With most viruses, which bind to rare cellular receptors, such a mechanism of immune evasion would not work; the virus would not find the proper receptor at high enough frequency to ensure viral propagation. It is the clustering of CD4 positive cells in such places as the thymus which allows this mechanism to function in the particular case of HIV.
2. The virus masks constant regions involved in both CD4 and second receptor binding; the act of CD4 binding induces conformational changes in gp120 which unmask these regions.
3. The V3 loop, which forms part of the conserved second receptor binding site, is one of the regions unmasked by CD4 binding.
This invention uses an antigen which mimics the conformation of gpl20 on the surface of the HIV-virion, with deletions in the variable loop regions to expose the conserved CD4 binding site. It is already known that CD4 -binding site antibodies are widely neutralizing, and moreover, are found in virtually all patients (although they tend to only be found late in the course of infection -- the initial antibodies produced early in the course of infection have the Vl/2 or V3 loop as epitopes) .
This invention provides a vaccine composed of a stabilized oligomer of gp120, with truncations in the variable loop regions to expose the conserved CD4 binding site, which would elicit widely neutralizing antibodies against HIV.
Details: Oligomer stabilization (Figure 50):
1. Appropriately placed cysteine mutations, which would then form stabilizing disulfide bonds.
2. Linkers between consecutive N- and C-termini. (The structure shows that the N- and C-termini of gp120 are relatively close together. A genetically constructed flexible linker of amino acids between the C-terminus of one monomer and the N-terminus of an adjacent monomer would also serve to covalently stabilize the oligomer.)
3. A gp140 construct (the extracellular portion of gp120 + gp41) with a mutation at the gp120/gp41 consensus cut site.
4. Trimers of GCN4 have been shown to enhance oligomerization. These oligomerization stabilizers could be added to the C-terminal tail of gp120.
Loop deletions:
Replacement of the V1/2 loop with the tripeptide Gly-Ala-Gly to expose the CD4 binding site. Replacement of the V3 loop as well.
Stabilization of kinetically hidden epitopes of gp120:
This invention uses gp120 which has been stabilized to elicit an immune response. gp120 may undergo conformational changes. However, only very few of these conformations expose a conserved neutralization epitope. This invention aims at using the information from the structure of gp120 to stabilize the hidden neutralization epitope of gp120. Specifically, the epitope may be stabilized by mutating the gp120 or, alternatively, some epitopes may be stabilized by ligand/drug interaction.
Specific examples are illustrated below:
Example 1:
The pocket of gp120 (Figure 51) only forms upon CD4 binding. If the residues along the pocket are mutated so that the pocket is filled, gp120 may become "stuck" in the CD4-bound conformation even without bound CD4. Such mutations may include changing Ser375 to Trp375, Val255 to Phe255 and Thr257 to Trp257.
The residues which line the pocket include: Trp 112, Leu 116, Pro 118, Phe 210, Val 255, Thr 257, Ser 375, Asn 377, Phe 382, Ile 424, Met 426, Trp 427, Asn 428, Ala 433, Gly 473, Met 475.
Example 2:
Making disulfide bridges which tie protein domains together. The topology for gp120 (Δ82, ΔV1/2, ΔV3, ΔC5) is shown in Figure 52. One can see the two domains: the N/C termini including α1, and a barrel around c-2. A disulfide formed between S2 and S21 will tie the protein domains together. As another example, a disulfide bridge may be formed between the β5-β6 connection and the top of the barrel (e.g., β10).
Example 3:
Cavities internal to gp120 may be determined once the three-dimensional structure of gp120 is known. Analysis of all atoms within 4 Angstroms of the surface defining each cavity allows mutations to be designed to determine if any large substitutions are allowed. Some examples of the analysis follow:
Val255 to Trp - not as good as 375 (below) - modeling shows some clashes with Met 475, although 475 should be able to move.
Ser375 to Trp - good fit (Note: the Ser375 mutation is incompatible with the Val255 mutation, so only one can be made at a time).
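The cavity analysis described in Example 3 (collecting all atoms within 4 Angstroms of the surface defining a cavity) can be sketched as a simple distance filter. The sketch below assumes, for illustration, that the cavity surface is available as a list of sampled points and that atoms are given as labeled coordinates. The function name `atoms_near_cavity`, the atom labels, and all coordinates are hypothetical and do not come from the gp120 structure.

```python
import math


def atoms_near_cavity(atoms, surface_points, cutoff=4.0):
    """Return labels of atoms within `cutoff` angstroms of any surface point."""
    near = []
    for label, xyz in atoms.items():
        # An atom qualifies if it is close to at least one sampled surface point.
        if any(math.dist(xyz, point) <= cutoff for point in surface_points):
            near.append(label)
    return sorted(near)


# Hypothetical sampled points on a cavity surface (NOT real gp120 data).
surface_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# Hypothetical atoms with placeholder labels and coordinates.
atoms = {
    "a1": (0.0, 3.0, 0.0),   # 3.0 angstroms from the first point
    "a2": (5.0, 0.0, 0.0),   # exactly 4.0 angstroms from the second point
    "a3": (0.0, 0.0, 6.0),   # 6.0 angstroms away, outside the cutoff
    "a4": (1.0, 4.5, 0.0),   # 4.5 angstroms away, outside the cutoff
}

lining = atoms_near_cavity(atoms, surface_points)
```

The atoms returned by such a filter identify the residues lining a cavity, which are the candidates for large, cavity-filling substitutions of the kind listed above.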
Following are the antibody binding results:
Table: Binding of the gp120 Variants to CD4BS Antibodies
          CD4BS antibodies
Mutants   F105   15e    IgGb12   21h    F91
255V/W    0.0    0.06   0.51     0.80   0.76
375S/W    0.0    0.36   0.05     0.72   0.0
The wild-type phenotype is 1.00; values below 1 indicate decreased recognition.
Control for CD4 and 17b Binding
Mutants   CD4    17b
255V/W    0.7    0.8
375S/W    0.9    1.0
The above results show clearly that the mutations of Val255 to Trp and Ser375 to Trp cause decreased binding to CD4BS antibodies.
The cavity filling mutant 375S/W clearly exhibits reduced CD4-BS antibody binding. While the data look good, two of the CD4-BS antibodies (15e and 21h) still bind with reasonable affinity.
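The binding values reported above can be tabulated programmatically to separate antibodies whose recognition is strongly reduced from those that still bind. The sketch below is illustrative only: the values are copied from the table (wild type = 1.00), the antibody label "IgGb12" follows the table's column heading, and the cutoff of 0.3 is an assumed, hypothetical threshold chosen so that the split matches the observation in the text that 15e and 21h still bind the 375S/W mutant.

```python
# Relative binding of each gp120 mutant (wild type = 1.00), from the table above.
CD4BS_BINDING = {
    "255V/W": {"F105": 0.0, "15e": 0.06, "IgGb12": 0.51, "21h": 0.80, "F91": 0.76},
    "375S/W": {"F105": 0.0, "15e": 0.36, "IgGb12": 0.05, "21h": 0.72, "F91": 0.0},
}

# Control ligands, from the same table.
CONTROL_BINDING = {
    "255V/W": {"CD4": 0.7, "17b": 0.8},
    "375S/W": {"CD4": 0.9, "17b": 1.0},
}

# Hypothetical cutoff (an assumption, not a value from the text).
THRESHOLD = 0.3


def split_by_binding(row, threshold=THRESHOLD):
    """Return (reduced, retained) antibody names for one mutant's row."""
    reduced = sorted(name for name, value in row.items() if value < threshold)
    retained = sorted(name for name, value in row.items() if value >= threshold)
    return reduced, retained


for mutant, row in CD4BS_BINDING.items():
    reduced, retained = split_by_binding(row)
    print(mutant, "reduced:", reduced, "retained:", retained)
```

With this cutoff, both control ligands (CD4 and 17b) fall in the retained group for both mutants, consistent with the mutants remaining competent for CD4 and 17b binding.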
The basic idea behind the cavity filling mutants is to stabilize the CD4-bound conformation of gp120 at the expense of the CD4-free conformation. Additional substitutions may then be made in combination with 375S/W. For example, one may take the known mutations which exhibit phenotypes similar to 375S/W (see Thali et al. (1995), J. Virol. 67, 3978-3988):
88N/P: Since this substitution is very far from the CD4 interacting surface, the only way to explain the results is that it affects the C1/C5 terminal regions, which in some manner affect the relative stability of the gp120 conformations.
102E/L: This glutamic acid is on the surface of gp120 and appears to stabilize the alpha1/alpha5 helix interaction. However, the stabilization is weak. The only way to explain the observed phenotype is that in the CD4-minus conformation, the glutamic acid is somehow involved in a stabilizing interaction, perhaps to the nearby Arg that in this conformation is just out of reach.
113D/R: The aspartic acid stabilizes the bridging sheet residues Gln428 and Lys429, which are important for maintaining the CD4-bound conformation of gp120.
117K/W: The lysine helps stabilize the bridging sheet conformation, but this substitution may also affect CCR5 binding, so it may not be so good a choice.
257T/A: Since this Thr is basically buried in the CD4-bound conformation, and indeed provides stabilizing hydrogen bonds, the only way to explain the phenotype of the T/A substitution is that Thr257 must be a critical element in maintaining the CD4-minus conformation. The residue is quite close to 375W, so there may be some complications. If one places 375W in its preferred rotamer conformation, it clashes with Thr257 (the mutation is accommodated by a slight change in rotamer conformation or by movement of the 375 backbone). So the T/A substitution may actually help to accommodate the 375W change.
266A/E: The alanine is buried in the interface between the inner and outer domain. The substitution (on the face away from CD4 binding) most certainly affects things conformationally, but since it is disruptive it is difficult to interpret.
386N/Q: This substitution is on the outer face of the outer domain and may not affect conformation. However, it does affect 21h, the epitope of which is closer to the inner domain, so perhaps the loss of carbohydrate has long-range conformational effects.
395W/S: This substitution is also on the outer face of the outer domain. But it affects all the CD4-BS antibodies while retaining good CD4 binding.
421K/L: This is on the bridging sheet. In the CD4-bound conformation, the Leu may pack nicely against Ile423. Although this substitution may reduce CCR5 binding, the fact that it is far from where CD4 binds suggests that its effects may be conformational. (The effect of 421K/D on CCR5 binding may be primarily electrostatic.)
470P/G and 475M/S: Both of these are close to the CD4-binding region, although both are buried and do not interact directly with CD4. Both retain good CD4 binding, so the effect may be conformational.
485K/V: This is at the inner/outer domain interface. There may be steric clashes of the valine (the base of the Lys is buried) which may be disruptive.
This invention further provides vaccine design based upon conformational stabilization using the three-dimensional structure. See, e.g., Malakauskas and Mayo, Nature Structural Biology, Vol. 5, p. 470-475, entitled "Design, structure and stability of a hyperthermophilic protein variant," the content of which is incorporated into this application by reference.
The invention will be better understood by reference to the Experimental Details which follow, but those skilled in the art will readily appreciate that the specific experiments detailed are only illustrative, and are not meant to limit the invention as described herein, which is defined by the claims which follow thereafter.
FIRST SERIES OF EXPERIMENTS
Probability analysis of variational crystallization and its application to gp120, the exterior envelope glycoprotein of type 1 human immunodeficiency virus (HIV-1)
Summary
The extensive glycosylation and conformational mobility of gp120, the envelope glycoprotein of type 1 human immunodeficiency virus (HIV-1), pose formidable barriers for crystallization. To surmount these difficulties, we used probability analysis to determine the most effective crystallization approach and derived equations which show that a strategy, which we term variational crystallization, substantially enhances the overall probability of crystallization for gp120. Variational crystallization focuses on protein modification as opposed to crystallization screening. Multiple variants of gp120 were analyzed with an iterative cycle involving a limited set of crystallization conditions and biochemical feedback on protease sensitivity, glycosylation status, and monoclonal antibody binding. Sources of likely conformational heterogeneity such as N-linked carbohydrates, flexible or mobile N- and C-termini, and variable internal loops were reduced or eliminated, and ligands such as CD4 and antigen-binding fragments (Fabs) of monoclonal antibodies were used to restrict conformational mobility as well as to alter the crystallization surface. Through successive cycles of manipulation involving 18 different variants, we succeeded in growing six different types of gp120 crystals. One of these, a ternary complex composed of gp120, its receptor CD4, and the Fab of the human neutralizing monoclonal antibody 17b, diffracts to a minimum Bragg spacing of at least 2.2 Å and is suitable for structural analysis.
Introduction
In conventional crystallizations of biological macromolecules, the protein or other macromolecular subject is treated as a fixed entity to be tested in a multitude of crystallization conditions. Despite advances such as sophisticated screening procedures (1,2) and crystallization robots (3,4), this approach often fails for components from complex biological systems. One of these, the subject of this study, is the HIV-1 exterior envelope glycoprotein, gp120. In such cases, success may follow if the protein itself is varied. There are, however, many options in this vein and it is not clear how they might be prioritized. By way of background for this study, we first consider various options for the crystallization of conformationally complex macromolecules and then describe the characteristics of gp120.
Crystallization by variation and modification. For the more difficult crystallization challenges, which can be defined as those for which conventional screening fails, one typically tries to vary or modify the protein while maintaining biologically important properties. Meaningful results obtain since the integrity of internal structure and functional properties can often tolerate variation at the molecular surface where lattice contacts are made. The probability for success in crystallization is enhanced because flexible or heterogeneous surface features may be removed or because of the fortuitous introduction of lattice interactions. A prescient example that pre-dates the powerful methods of modern molecular biology was John Kendrew's screening of myoglobins from many different organisms until he found one, from sperm whale, that crystallized well (5). Indeed, human myoglobin requires a Lys to Arg substitution in order to produce crystals suitable for structural analysis (6). Conversely, crambin forms exceptionally well-ordered crystals despite being a mixture of two isoforms with sequence variation at internal residues (7).
There are many notable examples of variation or modification in the crystallization of macromolecules. Systematic variation in the species of origin, as pioneered with myoglobin (5), was also instrumental in the crystallization of the transcription initiating TATA-binding protein (8). Proteolysis is often used to define crystallizable fragments, following the early examples from enzymatic digestions of antibodies that produced crystallizable fragments (reviewed in (9)) and the bromelain release of hemagglutinin from the influenza virus membrane (10). Variation of recombinant constructs, often inspired by proteolytic definition, is now commonplace with the widespread use of molecular biology tools. Systematic variation in the length of DNA oligomers has proved essential in the structural studies of protein-nucleic acid complexes. The work of Jordan and Pabo on λ repressor (11) sets the example for transcription factors, and the principle extends to other complexes as for the nucleosome (12). The use of protein ligands to stabilize another protein of interest for crystallization has also been effective, as in the study of actin through its complex with DNase I (13) and more generally through complexes with antigen-binding Fab fragments of antibodies (reviewed in (14)). The principle that the detergent solubilized lipid interface of membrane proteins is generally unavailable for lattice contacts has led to the concept that crystallizability will be enhanced if the non-variable surface area is increased, and this was demonstrated in practice in the crystallization of a bacterial cytochrome oxidase in complex with an antibody Fv fragment (15). Similarly, the anticipated conformational and compositional heterogeneity in carbohydrate moieties of glycoconjugates is expected to interfere with crystallization, and deglycosylation has proved essential for heavily glycosylated proteins such as human chorionic gonadotropin (16).
Characteristics of HIV gpl20. HIV-1 induces acquired immunodeficiency syndrome (AIDS) in humans (17 , 18) . The gpl20 glycoprotein helps to mediate virus entry into cells through sequential recognition of two cellular receptors, the surface glycoprotein CD4 (19,20) and a chemokine receptor (primarily CXCR4 or CCR5 , depending on viral strain) (21-26) . These high affinity interactions are attractive targets for mimetic drug design. Although the structure of the gpl20-binding domain of CD4 and the identity of residues critical to its interaction with gpl20 have been known for several years (27, 28) , this has not been sufficient for design of potent antagonists (29-31) . As the major virus-specific antigen accessible to neutralizing antibodies, knowledge of the gpl20 structure could also impact considerably on vaccine design. Despite this interest and considerable effort for several years with pure soluble protein, available in quantities as a byproduct in part from vaccine trials, gpl20 has resisted crystallographic analysis .
The mature gp120 glycoproteins of different HIV-1 strains typically have 470-490 amino acid residues (32). Extensive N-linked glycosylation at 20-25 sites accounts for roughly half of the gp120 mass (32,33). Sequences from many different viral isolates show that gp120 has five variable regions (V1-V5) interspersed between relatively conserved regions (C1-C5) (32,34) and nine conserved disulfide bridges (33). Except for limited N- and C-terminal cleavage, proteolytic digestion does not reveal a sub-domain structure. Indeed, even after extensive proteolytic cleavage, the unreduced protein runs near its native molecular weight on SDS-PAGE (PDK, unpublished data).
The gp120 glycoprotein likely exhibits conformational flexibility. Some of the variable regions, the V2 and V3 loops in particular, are known to be exposed on the surface of the native protein and probably assume multiple conformations. The potential of gp120 to undergo conformational change is also evidenced by shedding, the CD4-induced dissociation of gp120 from the surface of the virus, by ligand-induced variations in monoclonal antibody binding (35,36), and by complex CD4-gp120 binding kinetics (37). These changes may be related to the functional role of gp120 in virus entry.
The extensive glycosylation and conformational heterogeneity of gp120 suggested that merely screening the protein through ever more exotic crystallization conditions would not produce well-diffracting crystals. We have analyzed the effectiveness of optimizing different crystallization factors given the specific characteristics of gp120. This led us to a strategy employing radical modification of the protein surface, primarily to reduce heterogeneity but also to create new potential lattice contacts. We derive equations which show that this strategy, which we term variational crystallization, substantially enhances the overall probability of crystallization for gp120. An iterative process, involving both biochemical and molecular biological techniques, was used to detect and remove chemical and conformational heterogeneity. In addition, protein ligands, namely CD4 and the Fab fragments of several monoclonal antibodies, were used to restrict conformational mobility. Progressive trials of 18 different gp120 crystallization variants yielded six different crystals, at least one of which is suitable for structural analysis. This paradigm of crystallization, with a focus on protein modification rather than on crystallization screening, may aid in the structural analysis of other conformationally complex proteins.
Theoretical Analysis
Much of the crystallization literature is anecdotal, reflective perhaps of the diverse nature of proteins. Systematic quantitative studies have necessarily focused on robust, well-characterized systems (38). If a particular protein fails to crystallize, one is faced with a bewildering array of options based on experience with other, often quite different, proteins. In the absence of a comprehensive crystallization theory it is difficult to know how to proceed. Here, we devise an approximate theoretical underpinning for such decisions based on the ratio comparing crystallization probabilities before ($P_i$) and after ($P_f$) a modifying procedure. We define the enhancement in crystallization probability as $\varepsilon = P_f/P_i - 1$, whereby $\varepsilon = 0$ for no change and can reach a maximum, $\varepsilon_{max} = 1/P_i - 1$, that depends on the inverse of the initial probability.
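The definition of the enhancement can be written out as a minimal sketch. The function names are illustrative only and do not appear in the original analysis; the probabilities are treated as plain numbers between 0 and 1.

```python
def enhancement(p_initial: float, p_final: float) -> float:
    """Enhancement in crystallization probability: P_f / P_i - 1."""
    if not 0.0 < p_initial <= 1.0:
        raise ValueError("p_initial must lie in (0, 1]")
    if not 0.0 <= p_final <= 1.0:
        raise ValueError("p_final must lie in [0, 1]")
    return p_final / p_initial - 1.0


def max_enhancement(p_initial: float) -> float:
    """Upper bound on the enhancement, reached when P_f = 1."""
    return enhancement(p_initial, 1.0)


if __name__ == "__main__":
    # No change in probability gives zero enhancement.
    print(enhancement(0.2, 0.2))
    # A protein with an initial 1-in-10 chance has the most room to improve.
    print(max_enhancement(0.1))
```

The lower the initial probability, the larger the bound returned by `max_enhancement`, which is the sense in which the maximum depends on the inverse of the initial probability.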
In evaluating different crystallization strategies, one important consideration is effectiveness. Many factors affect crystallization, and a suitable crystallization approach depends on identifying and dealing with those that are most limiting. For example, if a protein were only 30% pure, the crystallization probability associated with such protein purity would be low and a purification strategy would be key; if a protein were 98% pure, further purification would most likely have little impact on the overall probability of crystallization. Factors that might be expected to affect the crystallization of gp120 are listed in Table 1, along with estimates of the effect of optimizing each factor given the specific characteristics of gp120.
Although identification of limiting crystallization factors can establish rough guidelines as to the appropriateness of a particular crystallization strategy, a better way to evaluate effectiveness (or perhaps to judge the progress of a specific crystallization effort) is by quantitative assessment of the enhancement in crystallization probability. For example, if 80% of all crystallizable proteins crystallize from a core set of 50 conditions (2), a strategy that involves screening ever larger arrays of crystallization conditions could at most enhance the probability of crystallization by only 25% over that for the first 50 conditions (1/0.8 - 1 = 0.25); further screening would yield increasingly diminishing returns. With this screening example, the quantitative enhancement of probability is straightforward to calculate, but it is not immediately apparent for the strategy of variational crystallization, which focuses on protein modification. We can consider two kinds of such modifications: those designed to reduce heterogeneity and those related to expanding the number of crystallization candidates.
Enhancement of surface homogeneity. Crystalline order is explicitly dependent on lattice homogeneity. Reducing heterogeneity can be thought of as increasing the proportion of surface area available for formation of lattice contacts, increasing the probability of crystallization. The probability that a single lattice contact between two molecules may form is in part related to the fraction of surface area that is homogeneous on one molecule multiplied by the fraction homogeneous on the other, i.e.,
$$P(\text{homogeneous contact}) \propto H(\text{molecule 1}) \times H(\text{molecule 2}) \qquad (1)$$
where $H$ is defined as the homogeneous fraction of the surface. Consider the case where the molecule in question is the smallest repeating unit in a crystal, that is, the asymmetric unit. In such a case molecule 1 and molecule 2 are equal, and the above equation reduces to $H^2$, the square of the homogeneous surface fraction. Now consider the same scenario with two lattice contacts; the probability that both are homogeneous is related to $(H-\delta_1)^2 \times (H-\delta_2)^2$, where $H$ is the homogeneous fraction of the surface which may form lattice contacts and $\delta_n$ is a function of the relative size and total number of lattice contacts other than contact $n$ and of the degree and distribution of surface homogeneity; it is related to the occlusion of available surface area upon formation of each lattice contact as well as to the spatial distribution of homogeneous surface over the molecular surface. Generalizing to the case of $C$ lattice contacts, the probability associated with homogeneous lattice formation is related to:
$$P(\text{lattice}) \propto (H-\delta_1)^2 \times (H-\delta_2)^2 \times \cdots \times (H-\delta_C)^2 = \prod_{n=1}^{C} (H-\delta_n)^2 \qquad (2)$$
In the restricted case of one molecule per asymmetric unit, the observed average value of $C$ ($C_{ave}$) is ~4.5 (39), with a minimum theoretical value for the most common space groups of 2 or 3 (39). Since $C$ may be relatively small, lattice contacts may make up only a small proportion of a macromolecule surface, with considerable surface heterogeneity tolerated. Thus, for example, many proteins that pack into well-ordered crystal lattices have disordered regions, with N- and C-termini as well as internal loops being unresolved.
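Equation 2 can be illustrated with a short sketch. It computes only the relative (unnormalized) product over contacts, so it is useful for comparing two surfaces rather than for absolute probabilities. The function name and the example values of H are hypothetical; setting every occlusion term to zero anticipates the simplifying approximation used later in the analysis.

```python
import math
from typing import Sequence


def relative_lattice_probability(h: float, deltas: Sequence[float]) -> float:
    """Relative probability of a homogeneous lattice (Eq. 2).

    h      -- homogeneous fraction of the surface (0..1)
    deltas -- one occlusion term per lattice contact; len(deltas) is C
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    return math.prod((h - d) ** 2 for d in deltas)


if __name__ == "__main__":
    # Hypothetical comparison: four contacts, all occlusion terms neglected.
    contacts = [0.0] * 4
    mostly_ordered = relative_lattice_probability(0.9, contacts)
    half_ordered = relative_lattice_probability(0.6, contacts)
    # With zero deltas the ratio is (0.9 / 0.6) ** 8, roughly 25.6.
    print(mostly_ordered / half_ordered)
```

Because each contact contributes a squared term, even a modest gain in homogeneous surface compounds quickly as the number of contacts rises.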
Given a reduction in surface heterogeneity, what is the change in crystallization probability? Surface area is correlated with molecular mass ($M$) by the power law: surface area = $6.3 \times M^{0.73}$, which on average predicts surface area to within 4% for monomeric proteins (40). The fraction of homogeneous surface can thus be approximated from the ratio of the molecular mass of the homogeneous portion of the protein to its total molecular mass:
$$H \approx [M(\text{homogeneous}) / M(\text{total})]^{0.73} \qquad (3)$$
From Eqs. 2 and 3, it is now possible to estimate the enhancement in probability for crystallization upon reduction of heterogeneity. With the simplifying approximation $\delta_n = 0$, the ratio of the probabilities before ($P_i$) and after ($P_f$) becomes
$$P_f/P_i \approx \frac{[M(\text{homogeneous})_f / M(\text{total})_f]^{1.46 \times C}}{[M(\text{homogeneous})_i / M(\text{total})_i]^{1.46 \times C}} \qquad (4)$$
Equation 4 is still not very useful, however, since M (homogeneous) is unknown and molecule-specific. In reducing heterogeneity, however, it seems reasonable to assume that the removed portion, if it were a highly branched carbohydrate or a proteolytically exposed region, is completely heterogeneous. In such cases,
$M(\text{homogeneous})_f \approx M(\text{homogeneous})_i$ whether or not all heterogeneity has been removed. Assuming that $C \approx C_{ave}$, the enhancement ($\varepsilon$) in probability on removal of a heterogeneous portion becomes
$$\varepsilon = P_f/P_i - 1 \approx [M(\text{total})_i / M(\text{total})_f]^{1.46 \times C_{ave}} - 1 \qquad (5)$$
This last equation allows the change in crystallization probability upon heterogeneity removal to be quantified. For example, consider a situation where a recombinant DNA approach is used to produce a protein with an affinity tag of 10 amino acid residues. Is it important to remove these presumably flexible residues? From Eq. 5, the answer depends on the protein size. For a 100-residue protein, removal of the tag would greatly enhance the crystallization probability: $\varepsilon_r = P_f/P_i - 1 \approx [110/(110-10)]^{1.46 \times 4.5} - 1 = 0.87$, or almost 90%, whereas for a 500-residue protein the enhancement would be minimal, $\varepsilon_r = 0.14$.
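The affinity-tag example can be reproduced with a few lines of code. This is a sketch of Eq. 5 only; the function and constant names are illustrative, and residue counts are used in place of molecular masses on the assumption that mass is proportional to the number of residues.

```python
SURFACE_EXPONENT = 0.73   # exponent of the mass/surface-area power law (Eq. 3)
C_AVE = 4.5               # average number of lattice contacts


def removal_enhancement(total_initial: float,
                        total_final: float,
                        contacts: float = C_AVE) -> float:
    """Enhancement on removing a fully heterogeneous portion (Eq. 5).

    total_initial -- mass (or residue count) before removal
    total_final   -- mass (or residue count) after removal
    """
    if total_final <= 0 or total_initial < total_final:
        raise ValueError("expected total_initial >= total_final > 0")
    exponent = 2 * SURFACE_EXPONENT * contacts   # 1.46 x C
    return (total_initial / total_final) ** exponent - 1.0


if __name__ == "__main__":
    # 10-residue tag on a 100-residue protein, then on a 500-residue protein.
    print(round(removal_enhancement(110, 100), 2))   # 0.87
    print(round(removal_enhancement(510, 500), 2))   # 0.14
```

The same function applies to deglycosylation or loop deletion, provided the removed portion can be treated as completely heterogeneous.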
Another variant of Equation 4 can be used to estimate the impact of adding a ligand of fixed structure to a molecule that contains heterogeneous portions. This expands the surface available for lattice contacts and effectively dilutes the heterogeneous component. It may be an approach of choice when the heterogeneity is essentially unremovable, such as at the lipid interface of detergent-solubilized membrane proteins. One faces the difficulty of estimating the extent of heterogeneity to use Eq. 4, but this might be done by summing the residual variable components in gp120 or by topographical estimates for a membrane protein. (For example, for a sphere embedded symmetrically in a membrane of thickness $h$, $1-H$ = area(heterogeneous)/area(total) = $h / [6Mv/(\pi N_0)]^{1/3}$, where $M$ is molecular mass, $v$ is partial specific volume and $N_0$ is Avogadro's number. Thereby, $1-H = 0.62$ for $h$ = 30 Å and $M$ = 50 kDa.) Then the enhancement in probability on addition of a fixed component becomes
$$\varepsilon_a \approx \left\{ \frac{M(\text{total})_i}{M(\text{total})_f} \times \frac{M(\text{total})_f - M(\text{hetero})_i}{M(\text{total})_i - M(\text{hetero})_i} \right\}^{1.46 \times C_{ave}} - 1 \qquad (6)$$
In the instance of a 50 kDa protein, half of which is heterogeneous, to which a 25 kDa Fv fragment is complexed, $\varepsilon_a = \{[50/75] \times [75-25]/[50-25]\}^{1.46 \times 4.5} - 1 = 5.6$.
Thus if the overall crystallization probability of the protein was initially only 1 chance in 10, assuming all other crystallization probability components remained unchanged, the crystallization probability of the Fv fragment complex would be roughly 1 chance in 2 , a substantial enhancement.
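Both estimates used in this passage, the heterogeneous fraction of a membrane-embedded sphere and the enhancement of Eq. 6, can be checked numerically. The sketch below uses illustrative function names; the partial specific volume of 0.73 cm^3/g is an assumption (the value quoted elsewhere in the text for protein), since the sphere example does not state which value was used.

```python
import math

AVOGADRO = 6.02214076e23   # per mole
C_AVE = 4.5                # average number of lattice contacts


def membrane_heterogeneous_fraction(thickness_a: float,
                                    mass_da: float,
                                    v_bar: float = 0.73) -> float:
    """1 - H for a sphere embedded symmetrically in a membrane.

    thickness_a -- membrane thickness h in angstroms
    mass_da     -- molecular mass in daltons (g/mol)
    v_bar       -- partial specific volume in cm^3/g (assumed value)
    """
    volume_a3 = mass_da * v_bar / AVOGADRO * 1e24      # cm^3 -> cubic angstroms
    diameter_a = (6.0 * volume_a3 / math.pi) ** (1.0 / 3.0)
    return thickness_a / diameter_a


def ligand_enhancement(total_initial: float,
                       hetero_initial: float,
                       ligand_mass: float,
                       contacts: float = C_AVE) -> float:
    """Enhancement on adding a rigid ligand of fixed structure (Eq. 6)."""
    total_final = total_initial + ligand_mass
    ratio = (total_initial / total_final) * (
        (total_final - hetero_initial) / (total_initial - hetero_initial)
    )
    return ratio ** (1.46 * contacts) - 1.0


if __name__ == "__main__":
    print(round(membrane_heterogeneous_fraction(30.0, 50_000.0), 2))   # 0.62
    print(round(ligand_enhancement(50.0, 25.0, 25.0), 1))             # 5.6
```

Under the stated assumption for the partial specific volume, both values agree with those given in the text.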
The accuracy of the quantification is only as good as the approximations, and several of the approximations used here call for further scrutiny. The approximation of molecular mass for surface area was used for the initial protein prior to heterogeneity removal. This is probably an underestimate since the completely heterogeneous portions of the protein would not be expected to fold as compactly as the homogeneous portions. In addition, the approximation that $\delta_n = 0$ tends to underestimate the deleterious influence of heterogeneity on crystallization. Both of these approximations lead to underestimation, but the equations should still predict the correct general trend. For some assumptions, however, the effect is more subtle. For example, the equations were generated assuming one molecule per asymmetric unit. If one considered a tight complex of molecules, the same equations would hold as long as the complex did not have internal symmetry (complexes with internal symmetry show a different average contact number). Finally, the category of heterogeneity is quite broad, and there are some situations, such as segmental flexibility, where these equations may be invalid. For example, in the case of two rigid domains connected by a flexible linker, one would have to consider the possibility that one domain could be fixed relative to the other with a single appropriate contact.
Increase of molecular variants. Another aspect of variational crystallization, the use of multiple variants of the same protein, also increases the probability of crystal formation. In this case, the overall probability of crystallization is exponentially related to the number of variants. Assuming independence of variants (a reasonable assumption with different protein ligands; less valid with minor changes), with $n$ variants and a probability of crystallization for each variant of $P_i$, the overall probability $P_T$ is:
$$P_T = 1 - [(1-P_1) \times (1-P_2) \times \cdots \times (1-P_n)] = 1 - \prod_{i=1}^{n} (1-P_i) \qquad (7)$$
For example, if each variant of a relatively heterogeneous protein had only a 25% chance of crystallizing, the overall probability would be $1 - (1-0.25)^n$; with 15 variants, the probability would increase to almost 99%.
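The overall probability for a set of independent variants (Eq. 7) is simple to evaluate directly. The sketch below uses an illustrative function name and assumes, as the text does, that the variants crystallize independently of one another.

```python
import math
from typing import Iterable


def overall_probability(variant_probabilities: Iterable[float]) -> float:
    """Probability that at least one independent variant crystallizes (Eq. 7)."""
    probabilities = list(variant_probabilities)
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("each probability must lie in [0, 1]")
    return 1.0 - math.prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Fifteen variants, each with a 25% chance of crystallizing.
    print(round(overall_probability([0.25] * 15), 3))   # 0.987
```

The function also accepts unequal probabilities, which is the more realistic case when some variants carry more of the heterogeneity-reducing modifications than others.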
The enhancement in overall probability for successful crystallization from a set of $n$ variants can then be calculated relative to the probability for a single variant. If we assume that the probability for crystallization of this individual variant, $i$, is typified by the average for all variants, $P_i \approx P_{ave}$, the enhancement factor is
$$\varepsilon = P_T/P_i - 1 \approx (1/P_{ave}) - [(1-P_{ave})^n / P_{ave}] - 1 \qquad (8)$$
If one tries many variants such that $(1-P_{ave})^n \ll 1$, then the enhancement is inversely related to the average probability of crystallizing a single variant:
$$\varepsilon \approx (P_{ave})^{-1} - 1 \qquad (9)$$
Thus, the more difficult a protein is to crystallize, the more it benefits from a strategy employing multiple variants.
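The behavior of the multi-variant enhancement can be sketched as follows. The function names are illustrative, and the average single-variant probabilities in the example are hypothetical values chosen only to show the trend.

```python
def variant_enhancement(p_ave: float, n_variants: int) -> float:
    """Enhancement from n variants relative to a single variant (Eq. 8)."""
    if not 0.0 < p_ave <= 1.0 or n_variants < 1:
        raise ValueError("need 0 < p_ave <= 1 and n_variants >= 1")
    overall = 1.0 - (1.0 - p_ave) ** n_variants
    return overall / p_ave - 1.0


def limiting_enhancement(p_ave: float) -> float:
    """Large-n limit of Eq. 8, where (1 - p_ave)**n is negligible."""
    return 1.0 / p_ave - 1.0


if __name__ == "__main__":
    # Hypothetical average probabilities for an easy and a difficult protein.
    for p_ave in (0.5, 0.05):
        for n in (1, 5, 20, 100):
            print(p_ave, n, round(variant_enhancement(p_ave, n), 2))
        print("limit:", round(limiting_enhancement(p_ave), 2))
```

A single variant gives zero enhancement by construction, and the limit is approached more slowly when the average probability is small, so a difficult protein both benefits more and needs more variants to realize the benefit.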
Experimental Procedures
Constructs of gp120. The various recombinant gp120 glycoproteins used for crystallization trials were produced in stable Drosophila Schneider 2 lines under the control of an inducible promoter as previously described (41) (Table 2). Genetic constructs containing various deletions and substitutions were made during the course of dissecting the gp120 domain structure. The procedures for making these constructs and the biological properties of the corresponding protein products are described elsewhere (see references in Table 2).
Protein production and purification. The N-terminal two domains of CD4 (D1D2), residues 1-183, were produced in Chinese hamster ovary (CHO) cells and purified as described previously (27). Human monoclonal antibodies 17b, A32, C11 and F105 (derived from HIV-infected individuals) (42, 43) and mouse monoclonal antibodies L71 and 178.1 (44, 45) were purified by Protein-A affinity chromatography. Secreted gp120 from Drosophila cells was purified by affinity chromatography with the F105 antibody covalently coupled to Sepharose. Following extensive washing with phosphate-buffered saline containing 0.5 M NaCl, gp120 protein was eluted with 0.1 M glycine, pH 2.8, followed by immediate neutralization with Tris buffer.
Protease Digestion. Fab fragments were produced by papain digestion of monoclonal antibodies. Briefly, the antibody was reduced in 100 mM DTT, 100 mM NaCl, 50 mM Tris pH 8.0 for 1 hr at 37 °C, and dialyzed (4 °C), first in phosphate-buffered saline (PBS) to reduce the DTT concentration to ~1 mM, then in alkylating solution (PBS titrated to pH 7.5 with 2 mM iodoacetamide, 48 hr), and subsequently in PBS without iodoacetamide. The reduced and alkylated antibody was concentrated to at least 2 mg/ml and digested with papain using a commercial protocol (Pierce). An additional gel filtration chromatographic step on a Superdex S-200 column (Pharmacia, FPLC) was added to ensure oligomeric homogeneity.
The gp120 proteins were subjected to digestion with papain, elastase, and subtilisin (Boehringer Mannheim) to assay for proteolytic susceptibility. In these assays, the gp120 concentration was kept constant and the protease diluted serially (3.3x) from a ratio of 1:10 to 1:1000. The digestion mix was incubated for 1 hr at 37 °C and quenched by addition of 1% SDS (1:10 ratio) with immediate heating in boiling water for 2 minutes. Digestion products were analyzed by SDS-polyacrylamide gel electrophoresis (PAGE) with and without DTT reduction.
Carboxypeptidase Y digestion was used to analyze the C-terminus of gp120. A 1:10 ratio of carboxypeptidase Y (Boehringer Mannheim) to gp120 was incubated for 1 hr at 37 °C, pH 7.0. Even though digestion could not be easily seen by SDS-PAGE, the C-terminus of gp120, HXBc2 strain, contains a number of positively charged amino acids, and the extent of the reaction could be monitored by native-PAGE.
Deglycosylation. Drosophila-produced gp120 proteins were deglycosylated enzymatically. Briefly, 0.5 mg/ml of gp120 was incubated with various deglycosylating enzymes (singly or in combination) in 0.5 M NaCl, 100 mM Na acetate, pH 5.7, for 10 hr at 37 °C. Endoglycosidase D was used at a concentration of 0.1 U/ml, Endoglycosidase F at 0.25 U/ml, Endoglycosidase H at 0.25 U/ml, and Glycopeptidase F at 0.1 U/ml (all from Boehringer Mannheim). For crystallization variants involving the CD4-gp120 complex, the addition of D1D2 (which lacks carbohydrate) to the deglycosylation cocktail was found to enhance gp120 solubility. The deglycosylation reactions were monitored by following the reduction in molecular weight on SDS-polyacrylamide gel electrophoresis (SDS-PAGE). Deglycosylation was nearly complete within 30 min and plateaued after 3 hr. The extent of deglycosylation was judged by matrix-assisted laser desorption (MALDI) mass spectroscopy, carbohydrate analysis, affinity for concanavalin A, and mobility and band width on SDS-PAGE. Protein aggregation was assayed by native-PAGE, dynamic light scattering, and gel filtration chromatography.
Monoclonal antibody binding assay. The various gp120 glycoproteins were assessed for recognition by a variety of monoclonal antibodies directed against both linear and discontinuous gp120 epitopes, either by immunoprecipitation (46) or by ELISA (47). The ELISA was performed with both fully glycosylated and deglycosylated ΔV1/2ΔV3 glycoproteins immobilized on ELISA plates using a capture antibody specific for the gp120 carboxyl-terminus, 6205 (International Enzymes) (47).
Binary and ternary complex purification. To ensure proper stoichiometry and oligomeric homogeneity, all complexes were purified by gel filtration chromatography on a Superdex S-200 column (Pharmacia, FPLC). This column exhibited good resolution, with routine separation of samples that differed by only 30% in molecular weight. Individual components were first purified separately to ascertain their monomeric status. Components were then combined to form complexes, which were repurified on the same column. A buffer of 0.35 M NaCl, 5 mM Tris/Cl pH 7.0, 0.02% NaN3 was used throughout. Peak fractions were concentrated over Centricon-30 (Amicon) to a final protein concentration of ~10 mg/ml and either aliquoted and stored at -80 °C or used directly for crystallization.
Crystallization. The vapor-diffusion hanging-droplet technique was used for all crystallizations. Small volumes, 0.5 μl protein solution + 0.5 μl reservoir solution, were used for most crystallizations, screenings and final optimizations.
Screening. The Crystal Screen I (Hampton Research) was used, augmented by approximately 20 conditions which tested high protein concentrations (vapor diffusion concentration of the protein at various pHs) as well as mixtures of organic additives (2-5% MPD, PEG 400, or PEG 4000) combined with high ionic strength (2-4 M NaCl, (NH4)2SO4 or Na/KPO4) at pH 5.5-9.5. For each gp120 crystallization variant, a subset of 12 different conditions was analyzed in depth to establish the approximate precipitation point of the protein for a variety of different precipitants. The factorial solutions were then individually adjusted to target the observed precipitation point and a full screen of ~70 conditions was set up at 20 °C. After at least one week of daily observation, screening solutions were recalibrated to account for the observed 20 °C precipitation point and another full screen was set up at 4 °C. If no crystals were observed, the Crystal Screen II (Hampton Research) was set up at 20 °C.
Optimization. In addition to the standard single-variable optimization of crystallization conditions, a factorial-like procedure was used to determine if small amounts of different additives increased crystal quality. Type E crystals were grown from the following conditions: Protein (Δ82ΔV1/2*ΔV3ΔC5 gp120, two-domain CD4 (D1D2), and Fab 17b, purified as a ternary complex on the Superdex S-200); Droplet (0.5 μl protein solution consisting of ~10 mg/ml protein in gel filtration buffer + 0.4 μl droplet mix containing 0.1 M NaCitrate, 0.02 M NaHepes, 10% isopropanol, 10.5% PEG 5000 monomethylether (Fluka), 0.0075% SeaPrep Agarose (FMC BioProducts), pH 6.4); Reservoir (0.35 M NaCl, 0.1 M NaCitrate, 0.02 M Hepes, 10% isopropanol, 10.5% PEG 5000 monomethylether, pH 6.4). The droplet mix was kept at 37 °C to ensure agarose solubility, and the crystallization setup was kept at room temperature. Clumps of crystals appeared within two weeks of incubation at 20 °C and grew for several months to maximal size.
X-ray diffraction characterization. All data were collected at beamline X4A of the National Synchrotron Light Source, Brookhaven National Laboratory. The type E crystals were crosslinked with the vapor diffusion technique of Lusty (48) by placing a crystallization bridge (Hampton Research) with a 25 μl sitting droplet of 1% glutaraldehyde (Sigma) in the reservoir of a standard hanging droplet vapor diffusion crystallization setup for 1 hr at room temperature. The crosslinked crystal was washed with stabilizer (reservoir solution with only 50 mM NaCl) containing 10% ethylene glycol. After approximately 24 hr, the external liquid surrounding the crystal was replaced with paratone-N (Exxon), the crystal mounted in an ethylene loop (Hampton Research) (49), and flash-cooled in the nitrogen stream of a cryostat (details are provided in (50)). Oscillation data were processed with DENZO (51) and scaled with SCALEPACK (51).
Results and Discussion
To address the many problems associated with the crystallization of HIV-1 gp120, we exploited the mutability of the macromolecular surface using tactics that involved protein modification and conformational restriction (Table 3). Several of these tactics contain novel features and are detailed here.
Variant constructs of the gp120 protein. Variants of gp120 were developed through an iterative cycle which strove to eliminate heterogeneity. The cycle involved recombinant production of gp120 variants, deglycosylation, and then assessment of heterogeneity and flexibility by examination of glycosylation status, monoclonal antibody binding, and protease sensitivity, leading to the design of new constructs. For example, protease digestion monitored by PAGE indicated susceptibility at the C-terminus, and a form with 15-20 residues removed by carboxypeptidase Y retained CD4 binding activity. A homogeneous product was difficult to make by this method, and primer-based PCR mutagenesis and recombinant expression were used to generate a homogeneous gp120 derivative with a 19-residue C-terminal deletion. At the N-terminus, sequencing of the initial constructs showed the expected signal cleavage at +31, with four additional amino acids, Gly-Ala-Arg-Ser, added from the signal peptide (a consequence of different processing of the cloning vector signal peptide with gp120). Protease digestion gave a product at +40, indicating flexibility in the N-terminus. Progressive genetic truncation and biochemical analysis identified +83 as a variant that was recognized by conformation-dependent gp120 ligands, whereas +94 exhibited some conformational disruption (46). Thus much of the apparently flexible region at the N-terminus of gp120 could be removed without disrupting the global conformation of the protein.
To further reduce flexibility, the variable loops V1, V2, and V3 were deleted and replaced with shorter segments, as reported earlier (52, 53). Little effect was found on CD4 binding activity (47, 53). Three constructs were made which contained deletions of the V1, V2, and V3 loops (Table 2). In the ΔV1/2ΔV3 construct, the entire base and stem of the variable loops V1, V2 and V3 were excised. In the ΔV1/2*ΔV3 protein, the conserved stem of the V1/V2 stem-loop structure was retained, restoring the CD4-induced antibody epitopes in the presence of soluble CD4. In the ΔV1/2*ΔV3* protein, the base of the V3 loop was retained as well, fully restoring CD4-induced antibody epitopes, even in the absence of soluble CD4.
Deglycosylated forms of gp120. The asparagine-linked carbohydrate on the gp120 glycoprotein produced in Drosophila cells was analyzed. Dionex chromatography showed that the carbohydrate on this protein consisted of $(\text{N-acetyl-glucosamine})_2(\text{fucose})_F(\text{mannose})_M$, with F = 0 or 1 and M = 3 to 9 (JSC, unpublished data). Deglycosylation with enzymes such as Glycopeptidase F (or Endoglycosidase F at pH 5.0), which cleave the glycosidic linkage and convert the N-linked asparagine into an aspartic acid, resulted in gp120 aggregation, although it remained soluble. Cleavage of the β1-4 bonds in the chitobiose core with Endoglycosidases D or H, leaving only a single N-acetyl-glucosamine residue and, potentially, a 1-6 fucose attached to any of the glycosylated asparagine residues, appeared to leave the protein intact as judged by a panel of conformationally sensitive monoclonal antibodies (47). Digestion of full-length constructs with Endoglycosidase H, which has specificity for oligosaccharides with 5-9 mannose residues, removed roughly 60% of the carbohydrate, and addition of Endoglycosidase D, which cleaves oligosaccharides with 3 or 4 mannose residues, removed up to 90% of the carbohydrate. For the variable loop-deleted constructs, all mannose residues were removed with the Endoglycosidase D/H combination as judged by carbohydrate analysis and by the inability of concanavalin A to bind to the deglycosylated protein. Mass spectroscopy of the deglycosylated Δ82ΔV1/2*ΔV3ΔC5 gp120 showed a molecular mass of 39,000 ± 50 Da, consistent with a mass of 35.4 kDa for the protein (based on the DNA sequence) and 3.6 kDa for the remaining carbohydrate. Carbohydrate analysis showed only fucose and N-acetyl-glucosamine sugars to be present, in a ratio of 1:3.05 ± 0.02, respectively. These results suggest that, of the 18 potential asparagine glycosylation sites in the Δ82ΔV1/2*ΔV3ΔC5 gp120, five are unused, nine are modified with N-acetyl-glucosamine and four with N-acetyl-glucosamine (1-6) fucose.
Complexes with gp120 ligands. Protein ligands, CD4 and the Fab fragments of monoclonal antibodies, were used in an attempt to reduce mobility in the overall surface of the protein and, hence, in the potential crystal lattice. This was complicated by the internal mobility of these ligands: CD4 has a flexible juncture between the second and third extracellular domains (54), and Fabs have a conformationally mobile "elbow bend" between their variable and constant domains (55). For CD4, we used a construct containing the N-terminal two domains (1-182), for which we had previous success in structure determination (27). Fabs of the monoclonal antibodies were screened individually, even though combinations of Fabs were possible.
Initial trials with the Fab 178.1, which recognizes a linear epitope in V3 of both free and CD4-bound gp120 (44), gave only crystalline precipitates at best. We also tested the Fab of the anti-CD4 antibody L71, which recognizes the CDR3-like loop in domain D1 (45), but had difficulties preparing ternary complexes, probably due to a destabilization of the CD4-gp120 interaction. Subsequently, we focused on gp120-directed antibodies with discontinuous epitopes, which were more likely to recognize conformationally rigid portions of gp120. Complexes of gp120 proteins with Fabs of C11, which recognizes an epitope spanning C1 and C5 (42), and F105, whose epitope lies within C2, C3, C4, and C5 (overlapping the CD4 binding site) (43), gave only poor crystals (Table 4). We had greater success with 17b, which not only recognizes a discontinuous epitope but discriminates between different conformational states of gp120 (36). The Fab of 17b did not bind the initial gp120 constructs, requiring the restoration of the stem of the V1/V2 loop (constructs ΔV1/2*ΔV3 or ΔV1/2*ΔV3*).
Crystallization. We screened 18 different combinations of gp120 variants and ligands (Table 4), using a limited factorial-based crystallization screen. Factorial screening was originally devised as a method for deducing the essential crystallization factors from combinations of different conditions (1). The empirical observation, however, that most crystallizable macromolecules are able to crystallize from a limited set of common conditions has validated an entirely different process: crystallization screening with a small but diverse collection of fixed conditions (2). A high probability of success has been reported with as few as 6 different conditions at 4 different concentrations (56), and commercial kits are available with 50-100 conditions (Hampton Research).
In conjunction with the limited crystallization screen, small-volume droplets were used, typically 0.5 μl of protein per crystallization trial. With small volumes, 1-2 mg of protein was sufficient to evaluate each gp120 crystallization variant. Smaller droplets were also more efficient at nucleation than larger ones, perhaps due to greater surface tension effects, which may result in a greater range of precipitant concentrations for each droplet to sample. Indeed, droplets that were "spread-out" also showed enhanced nucleation. This explanation may also account for the well-known observation that crystals frequently nucleate from the edges of crystallization droplets.
The initial crystallization screens produced six different types of crystals (Fig. 1, Table 5). For crystal types A-D, extensive optimization was unable to produce single crystals large enough to be characterized. For crystal types E and F, single crystals of needle morphology could be grown. The growth of single crystals of type E, however, required the addition of agarose, which was identified during optimization by the additive screening process. Trials with a variety of agaroses found that SeaPrep, with a gelling point near room temperature, gave the best results. Despite considerable effort, further crystallization optimization failed to produce large single crystals, and the best typical crystals were rods with a cross-section of only 30 x 40 μm. A closely related crystallization variant, which retained 10 additional amino acids in the stem of the V3 loop, failed to crystallize (Table 4).
Characteristics of gp120 crystals. Single crystals of type E and F were analyzed for diffraction in capillary mounts. Only type E crystals showed diffraction. The needle axis of type E crystals proved to coincide with the a axis, and the rhombohedral cross-section perpendicular to the needle axis proved to be bounded by faces of the form (0 1 1). These could be distinguished from type F crystals, where the cross-section was hexagonal. Gel electrophoresis of type E crystals demonstrated that they contained all the elements of the ternary complex: gp120, D1D2, and Fab 17b (Fig. 2).
We were unable to flash-cool the type E crystals with standard cryoprotectants. Satisfactory results were found with a procedure that (i) fortified the crystals with vapor-diffusion glutaraldehyde crosslinking (48), (ii) permeated the crystals with 10% ethylene glycol and (iii) used an immiscible oil, paratone-N, to replace the external solution around the crystals prior to flash-cooling (50). Cryopreserved crystals diffracted to Bragg spacings of better than 2 Å, although the diffraction was anisotropic, with higher mosaicity along the 88 Å b-axis.
Type E crystals were orthorhombic, space group $P222_1$, with unit cell parameters a = 71.25 Å, b = 88.11 Å and c = 196.44 Å (α = β = γ = 90°). Solvent content analysis yielded a value of 58% for one ternary complex in the crystallographic asymmetric unit (assuming partial specific volumes of 0.73 for protein and 0.65 for carbohydrate and the observed total molecular mass of 108.3 kDa for the complex, of which 3.6 kDa is carbohydrate). Diffraction data have been collected to a limit of 2.2 Å spacings (Table 6).
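The solvent content estimate can be checked with a short calculation. The sketch below assumes four asymmetric units per unit cell, which is the number of general positions in space group P222_1, and takes the partial specific volumes to be in cm^3/g. The function names are illustrative, and the mass-weighted averaging of the two partial specific volumes is one plausible reading of how the stated value was obtained.

```python
AVOGADRO = 6.02214076e23   # per mole


def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit cell volume in cubic angstroms for an orthorhombic cell."""
    return a * b * c


def solvent_fraction(cell_volume_a3: float,
                     asymmetric_units: int,
                     mass_da: float,
                     v_bar: float) -> float:
    """Solvent fraction from the crystal packing density.

    v_bar is the partial specific volume in cm^3/g; 1e24 / AVOGADRO converts
    cm^3/g to cubic angstroms per dalton (about 1.66).
    """
    volume_per_dalton = cell_volume_a3 / (asymmetric_units * mass_da)
    molecular_volume_per_dalton = v_bar * 1e24 / AVOGADRO
    return 1.0 - molecular_volume_per_dalton / volume_per_dalton


if __name__ == "__main__":
    total_kda, carbohydrate_kda = 108.3, 3.6
    protein_kda = total_kda - carbohydrate_kda
    # Mass-weighted partial specific volume (assumed averaging scheme).
    v_bar = (protein_kda * 0.73 + carbohydrate_kda * 0.65) / total_kda
    volume = orthorhombic_cell_volume(71.25, 88.11, 196.44)
    print(round(solvent_fraction(volume, 4, total_kda * 1000.0, v_bar), 2))
```

With these assumptions the result is about 0.58, in line with the 58% quoted for one ternary complex per asymmetric unit.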
Conclusions. Our success with gp120 demonstrates the power of variational crystallization. We have derived equations that quantify the effect of this strategy on the overall probability of crystallization and have calculated the corresponding probability enhancements for several of the biochemical and molecular biological manipulations employed in this study. As can be seen (Table 3), the probability of crystallization can be strongly influenced by reducing molecular surface heterogeneity. The influence of using multiple variants is more difficult to quantify since it depends on the individual probability of crystallization for each variant. Nonetheless, our theoretical analysis shows that the effect of multiple variants is greatest for proteins least likely to crystallize.
While the variational approach with gp120 did involve extensive effort, this was primarily a consequence of the difficulty in producing the gp120 glycoprotein, which involved expression levels of only a few mg of gp120 per liter of eukaryotic cell culture. Future advances in molecular biology will no doubt make such projects less arduous; for proteins that can be expressed bacterially, present-day recombinant techniques coupled to affinity or "tag" purifications already make the generation of variants straightforward. A recent example, involving the generation of 11 different variants in the crystallization of an ionotropic glutamate receptor (57), required only a 6-month effort (E. Gouaux, personal communication).
The resistance of gp120 to crystallization may be related in part to its functional role in eluding the immune system; the mechanisms evolved to prevent the formation of specific immune system:gp120 contacts might also thwart formation of the homogeneous gp120:gp120 contacts needed for crystallization. Perhaps relevant to this, the protein modifications that most greatly reduced heterogeneity (and thus enhanced the crystallization probability), removal of carbohydrate and substitution of the variable loops (Table 3), have been shown in vivo to enhance the generation of neutralizing antibodies (58, 59).
It is difficult to evaluate the predictions of the crystallization algorithms derived here in a statistically significant manner. The failure of proteins to crystallize is rarely reported in the literature, and our own results comprise too small a sample to be statistically meaningful. Nonetheless, we note that for gp120 the algorithms predict that crystals are most probable with deglycosylation, variable loop removal, and addition of an ordered protein ligand. Consistent with prediction, for the 6 crystallization variants that did have all of these modifications, three (or 50%) produced crystals, whereas for the 12 variants that did not have these modifications, no crystals (0%) were grown. In addition, theory predicts that well-ordered crystals are most probable when the overall probability of crystallization is highest; Table 4 shows that the crystallization variant that produced the only well-ordered crystals appeared to have the greatest probability of crystallization, producing three different crystal forms, whereas the best of the other variants produced only one form each.
The crystallization literature is replete with examples of protein manipulation, from proteolytic digestion, to variation in solvating detergent, to screening of DNA oligonucleotides (38). What distinguishes our efforts is the derivation of a theoretical foundation, which allows the probabilistic assessment of the most effective crystallization approach. Because of the conformational complexity of gp120, we focused on surface modification to eliminate heterogeneity and to present new crystallization variants, coupled to a limited screen of crystallization conditions. The types of crystallization problems embodied in gp120 (Table 3) are not so different from many of the typical problems facing present-day crystallographers; from both a theoretical and a practical perspective, the strategy of probability analysis coupled to variational crystallization may be broadly applicable.
Subsequent to the submission of this manuscript, the structure determination of type E crystals was reported (63).
References
1. Carter, C. W. Jr., and Carter, C. W. (1979) J. Biol. Chem. 254, 12219-12223.
2. Jancarik, J., and Kim, S. H. (1991) J. Appl. Crystallogr. 24, 409-411.
3. Kelders, H. A., Kalk, K. H., Gros, P., and Hol, W. G. (1987) Protein Eng. 1, 301-303.
4. Morris, D. W., Kim, C. Y., and McPherson, A. (1989) Biotechniques 1, 522-527.
5. Kendrew, J. C., and Parrish, R. G. (1956) Proc. Roy. Soc. A 238, 305-324.
6. Hubbard, S. R., Hendrickson, W. A., Lambright, D. G., and Boxer, S. G. (1990) J. Mol. Biol. 213, 215-218.
7. Teeter, M. M., and Hendrickson, W. A. (1979) J. Mol. Biol. 127, 219-223.
8. Nikolov, D. B., Hu, S. H., Lin, J., Gasch, A., Hoffmann, A., Horikoshi, M., Chua, N. H., Roeder, R. G., and Burley, S. K. (1992) Nature 360, 40-46.
9. Porter, R. R. (1973) Science 87, 713-716.
10. Brand, C. M., and Skehel, J. J. (1972) Nature 87, 713-716.
11. Jordan, S. R., Whitcombe, T. V., Berg, J. M., and Pabo, C. O. (1985) Science 230, 1383-1385.
12. Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F., and Richmond, T. J. (1997) Nature 389, 251-260.
13. Kabsch, W., Mannherz, H. G., Suck, D., Pai, E. F., and Holmes, K. C. (1990) Nature 347, 37-44.
14. Kovari, L. C., Momany, C., and Rossmann, M. G. (1995) Structure 3, 1291-1293.
15. Ostermeier, C., Iwata, S., Ludwig, B., and Michel, H. (1995) Nat. Struct. Biol. 2, 842-846.
16. Lustbader, J. W., Birken, S., Pileggi, N. F., Gawinowicz, M. A., Pollak, S., Cuff, M. E., Yang, W., Hendrickson, W. A., and Canfield, R. E. (1989) Biochem. 1989, 9239-9243.
17. Gallo, R. C., Salahuddin, S. Z., Popovic, M., Shearer, G. M., Kaplan, M., Haynes, B. F., Palker, T. J., Redfield, R., Oleske, J., Safai, B., White, G., Foster, P., and Markham, P. D. (1984) Science 224, 500-503.
18. Barre-Sinoussi, F., Chermann, J. C., Rey, F., Nugeyre, M. T., Chamaret, S., Gruest, J., Dauguet, C., Axler-Blin, C., Vezinet-Brun, F., Rouzioux, C., Rosenbaum, W., and Montagnier, L. (1983) Science 220, 868-871.
19. Dalgleish, A. G., Beverley, P. C. L., Clapham, P. R., Crawford, D. H., Greaves, M. F., and Weiss, R. A. (1984) Nature 312, 763-767.
20. Klatzmann, D., Champagne, E., Chamaret, S., Gruest, J., Guetard, D., Hercend, T., Gluckman, J. C., and Montagnier, L. (1984) Nature 312, 767-768.
21. Zhang, L., Huang, Y., He, T., Cao, Y., and Ho, D. D. (1996) Nature 383, 768.
22. Feng, Y., Broder, C. C., Kennedy, P. E., and Berger, E. A. (1996) Science 272, 872-877.
23. Dragic, T., Litwin, V., Allaway, G. P., Martin, S. R., Huang, Y., Nagashima, K. A., Cayanan, C., Maddon, P. J., Koup, R. A., Moore, J. P., and Paxton, W. A. (1996) Nature 381, 667-673.
24. Deng, H., Liu, R., Ellmeier, W., Choe, S., Unutmaz, D., Burkhart, M., Di Marzio, P., Marmon, S., Sutton, R. E., Hill, C. M., Davis, C. B., Peiper, S. C., Schall, T. J., Littman, D. R., and Landau, N. R. (1996) Nature 381, 661-666.
25. Choe, H., Farzan, M., Sun, Y., Sullivan, N., Rollins, B., Ponath, P. D., Wu, L., Mackay, C. R., LaRosa, G., Newman, W., Gerard, N., Gerard, C., and Sodroski, J. (1996) Cell 85, 1135-1148.
26. Alkhatib, G., Combadiere, C., Broder, C. C., Feng, Y., Kennedy, P. E., Murphy, P. M., and Berger, E. A. (1996) Science 272, 1955-1958.
27. Ryu, S.-E., Kwong, P. D., Truneh, A., Porter, T. G., Arthos, J., Rosenberg, M., Dai, X., Xuong, N., Axel, R., Sweet, R. W., and Hendrickson, W. A. (1990) Nature 348, 419-426.
28. Wang, J. H., Yan, Y. W., Garrett, T. P., Liu, J. H., Rodgers, D. W., Garlick, R. L., Tarr, G. E., Husain, Y., Reinherz, E. L., and Harrison, S. C. (1990) Nature 348, 411-418.
29. Zhang, X., Gaubin, M., Briant, L., Srikantan, V., Murali, R., Saragovi, U., Weiner, D., Devaux, C., Autiero, M., Piatier-Tonneau, D., and Greene, M. I. (1997) Nat. Biotechnol. 15, 150-154.
30. Jarvest, R., Breen, A. L., Edge, C. M., Chaikin, M. A., Jennings, L. J., Truneh, A., Sweet, R. W., and Hertzberg, R. P. (1993) Bioorg. Med. Chem. 3, 2851-2856.
31. Chen, S., Chrusciel, R. S., Nakanishi, H., Raktabutr, A., Johnson, M. E., Sato, A., Weiner, D., Hoxie, J., Saragovi, H. U., Greene, M. I., et al. (1992) Proc. Natl. Acad. Sci. USA 89, 5872-5876.
32. Myers, G., Wain-Hobson, S., Henderson, L., Korber, B., Jeang, K.-T., and Pavlakis, G. (1994), Los Alamos National Laboratory, Los Alamos, New Mexico.
33. Leonard, C. K., Spellman, M. W., Riddle, L., Harris, R. J., Thomas, J. N., and Gregory, T. J. (1990) J. Biol. Chem. 265, 10373-10382.
34. Starcich, B. R., Hahn, B. H., Shaw, G. M., McNeely, P. D., Modrow, S., Wolf, H., Parks, W. P., Josephs, S. F., Gallo, R. C., and Wong-Staal, F. (1986) Cell 45, 637-648.
35. Sattentau, Q. J., Moore, J. P., Vignaux, F., Traincard, F., and Poignard, P. (1993) J. Virol. 64, 7383-7393.
36. Thali, M., Moore, J. P., Furman, C., Charles, M., Ho, D. D., Robinson, J., and Sodroski, J. (1993) J. Virol. 67, 3978-3988.
37. Wu, H., Myszka, D. G., Tendian, S. W., Brouillette, C. G., Sweet, R. W., Chaiken, I. M., and Hendrickson, W. A. (1996) Proc. Natl. Acad. Sci. USA 93, 15030-15035.
38. Ducruix, A., and Giege, R. (1992) Crystallization of nucleic acids and proteins. The Practical Approach Series, Oxford Univ. Press.
39. Wukovitz, S. W., and Yeates, T. O. (1995) Nature Struct. Biol. 2, 1062-1067.
40. Miller, S., Janin, J., Lesk, A. M., and Chothia, C. (1987) J. Mol. Biol. 196, 641-656.
41. Cherbas, L., Moss, R., and Cherbas, P. (1994) Methods Cell Biol. 44, 161-179.
42. Moore, J. P., and Sodroski, J. (1996) J. Virol. 70, 1863-1872.
43. Thali, M., Olshevsky, U., Furman, C., Gabuzda, D., Posner, M., and Sodroski, J. (1991) J. Virol. 65, 6188-6193.
44. Langedijk, J. P., Back, N. K., Kinney-Thomas, E., Bruck, C., Francotte, M., Goudsmit, J., and Meloen, R. H. (1992) Arch. Virol. 126, 129-146.
45. Truneh, A., Buck, D., Cassatt, D. R., Jusczak, R., Kassis, S., Ryu, S.-E., Healey, D., Sweet, R., and Sattentau, Q. J. (1991) J. Biol. Chem. 266, 5942-5948.
46. Wyatt, R., Desjardin, E., Olshevsky, U., Nixon, C., Binley, J., Olshevsky, V., and Sodroski, J. (1997) J. Virol. 71, 9722-9731.
47. Binley, J. M., Wyatt, R., Desjardins, E., Kwong, P. D., Hendrickson, W. A., Moore, J. P., and Sodroski, J. (1998) AIDS Res. Hum. Retroviruses 14, 191-198.
48. Lusty, C. J. (1998) J. Applied Crystallogr. in press.
49. Teng, T. Y. (1990) J. Appl. Crystallogr. 23, 387-391.
50. Kwong, P. D., and Liu, Y. (1998) J. Applied Crystallogr. in press.
51. Otwinowski, Z., and Minor, W. (1997) Methods Enzymol. 276, 307-326.
52. Pollard, S. R., Rosa, M. D., Rosa, J. J., and Wiley, D. C. (1992) EMBO J. 11, 585-591.
53. Wyatt, R., Sullivan, N., Thali, M., Repke, H., Ho, D., Robinson, J., Posner, M., and Sodroski, J. (1993) J. Virol. 67, 4557-4565.
54. Wu, H., Kwong, P. D., and Hendrickson, W. A. (1997) Nature 387, 527-530.
55. Lesk, A. M., and Chothia, C. (1988) Nature 335, 188-190.
56. Stura, E. A., Nemerow, G. R., and Wilson, I. A. (1991) in Freiburg Macromolecular Crystallization Meeting, pp. 1-12, Journal of Crystal Growth.
57. Chen, G.-Q., Sun, Y., Jin, R., and Gouaux, E. (1998) Prot. Science in press.
58. Wyatt, R., Moore, J., Accola, M., Desjardin, E., Robinson, J., and Sodroski, J. (1995) J. Virol. 69, 5723-5733.
59. Reitter, J. N., Means, R. E., and Desrosiers, R. C. (1998) Nature Med. 4, 679-684.
60. Culp, J. S., Johansen, H., Hellmig, B., Beck, J., Matthews, T. J., Delers, A., and Rosenberg, M. (1991) Biotechnology 9, 173-177.
61. Ivey-Hoyle, M., Culp, J. S., Chaikin, M. A., Hellmig, B. D., Matthews, T. J., and Sweet, R. W. (1991) Proc. Natl. Acad. Sci. USA 88, 512-516.
62. Wu, L., Gerard, N. P., Wyatt, R., Choe, H., Parlin, C., Ruffing, N., Borsetti, A., Cardoso, A. A., Desjardin, E., Newman, W., Gerard, C., and Sodroski, J. (1996) Nature 384, 179-183.
63. Kwong, P. D., Wyatt, R., Robinson, J., Sweet, R. W., Sodroski, J., and Hendrickson, W. A. (1998) Nature 393, 648-659.
Table 1. Factors affecting the crystallization of gp120.
Crystallization factor | Specific gp120 characteristics | Optimization†
Protein characteristics:
A. Chemical homogeneity
- other proteins | very clean (affinity purification) | +
- carbohydrate variation | relatively limited, source dependent | ++
- polypeptide variation | N- and C-terminal ragged | ++
B. Conformational heterogeneity
- carbohydrate | N-linked, large and flexible |
- surface loop flexibility | total of ~130 amino acids are flexible |
- N- or C-termini | ~30 amino acids are flexible | ++
- other | function-related conformational mobility | +++
Screening characteristics:
A. Protein
- solubility | requires ~300 mM NaCl for solubility | ++
- stability | stable for over 1 week in cell culture | +
- availability | ~5 mg quantities | ++
B. Screening variations
- protein homologues | many different gp120 isolates | +++
- ligands | many ligands (dozens of monoclonal antibodies available) |
† Estimate of the effect on the crystallization probability of a strategy which optimizes the particular factor. The number of (+) symbols denotes the size of the effect: (+) refers to almost no change in probability after optimization, whereas (+++++) refers to a large change in probability. The scale used here is a qualitative estimate; for more quantitative results, see Table 3. For chemical heterogeneity, optimization refers to the effect on crystallization of making the protein more chemically homogeneous. For conformational heterogeneity, optimization refers to the effect of removing or circumventing the particular source of heterogeneity.
Table 2. The gp120 constructs used for crystallization.
construct name | gp120 strain† | amino acid for constructs§ | reference
Δ30-FL | JRFL | 31-511 | (60)
ΔV1/2ΔV3 | BH10/HXBc2 | 31-120 GAG 204-297 GAG 330-511 | (53)
ΔV1/2ΔV3ΔC5 | BH10/HXBc2 | 31-120 GAG 204-297 GAG 330-492 | (61)
Δ82ΔV1/2*ΔV3ΔC5 | HXBc2 | 83-127 GAG 195-297 GAG 330-492 | (61)
Δ82ΔV1/2*ΔV3*ΔC5 | HXBc2 | 83-127 GAG 195-302 GAG 325-492 | (61)
† The ΔV1/2ΔV3 and ΔV1/2ΔV3ΔC5 constructs were chimeras of strains BH10 and HXBc2.
§ Sequence numbers refer to the translated gp160, with the mature gp120 beginning at +31. N-terminal sequencing showed that all constructs contained 4 additional amino acids, Gly-Ala-Arg-Ser, an artifact of the signal peptide cleavage. GAG here refers to the tripeptide, Gly-Ala-Gly, which was substituted for the removed amino acids.
Table 3. Crystallization problems, variational crystallization solutions, and enhancement of crystallization probability.
Problem | Solution | Probability Enhancement†
N-linked carbohydrate | Protein production in an inducible Drosophila cell line coupled with deglycosylation with Endoglycosidases D and H | 1200%
Surface loop flexibility | Replacement of V1/V2 and V3 loops with tripeptide linkers of Gly-Ala-Gly | 370%
N- and C-terminal heterogeneity | Mutational deletion and proteolytic cleavage analysis coupled to the production of gp120 with truncated N- and C-termini | 50% §
Conformational heterogeneity | Conformation restriction with protein ligands such as CD4 and Fabs from conformationally sensitive monoclonal antibodies | ( p-we ) " 1 ‡
† The probability enhancement, ε, was calculated from the equation:
([ M (total)/ / MW(total)/] 46 x Ca e . 1) with Cave = 4.5, the average observed contact number. For the Drosophila-produced HXBc2, the molecular weight for the glycosylated gp120 is approximately 90 kDa; the deglycosylated gp120, 60 kDa; and the deglycosylated ΔV1/2ΔV3 gp120, 47 kDa.
§ The N-terminus is resistant to proteolysis from +39 to +82, and thus probably adopts an ordered conformation. This number was calculated assuming only the C-terminal 19 and the N-terminal 8 amino acids were disordered.
‡ Dependent on the average probability (Pave) of crystallizing a single variant of gp120. If Pave=10%, the use of many variants would lead to a probability enhancement of 900%.
Table 4. Summary of HIV-1 gp120 crystallization attempts
HIV-1 gp120 construct | glycosylation status§ | cofactors‡ | comments#
Δ61-IIIB (IIIB strain) | glycosylated | - | bad precipitates
 | ~60% deglycosylated | - | bad precipitates
 | ~90% deglycosylated but some Asn to Asp | | precipitates look better but still primarily bad
 | ~60% deglycosylated | D1D2 sCD4 | ok precipitates
 | ~60% deglycosylated | Fab 178.1 | ok precipitates
 | ~60% deglycosylated | D1D2 sCD4 and Fab 178.1 | ok precipitates
 | ~90% deglycosylated | | ok precipitates
Δ30-FL (JRFL strain) | ~90% deglycosylated | - | ok precipitates
 | | D1D2 sCD4 | ok to good precipitates
 | | Fab 178.1 | good precipitates
 | | D1D2 sCD4 and Fab 178.1 | good precipitates, no crystals
ΔV1/2ΔV3 (BH10/HXBc2 strain) | fully deglycosylated | - | good precipitates, no crystals
 | | D1D2 sCD4 | very small, nice looking crystals in PEG 400 (Crystal Type A)
 | | D1D2 sCD4 and Fab C11 | badly formed crystals from (NH4)2SO4 (Crystal Type B)
ΔV1/2ΔV3ΔC5 (BH10/HXBc2 strain) | fully deglycosylated | D1D2 sCD4 | spheroidal crystals in PEG 4000 (Crystal Type C)
 | | Fab F105 | good precipitates, no crystals
Δ82ΔV1/2*ΔV3ΔC5 (HXBc2 strain) | fully deglycosylated | D1D2 sCD4 and Fab 17b | three different types of crystals (Types D-F). Orthorhombic diffract to at least 2.2 Å
Δ82ΔV1/2*ΔV3*ΔC5 (HXBc2 strain) | fully deglycosylated | D1D2 sCD4 and Fab 17b | good precipitates, no crystals
‡ D1D2 sCD4 refers to two-domain soluble CD4. Antibody epitopes are described in the text.
§ The percent deglycosylation reported here refers to the percent of N-linked sites cleaved by endoglycosidase D or H. Thus the "fully deglycosylated" protein still contains N-acetyl glucosamine and fucose additions.
# The correlation between the overall physical characteristics of a precipitate in a crystallization trial and the actual crystallization probability is imprecise. As a consequence, the comments made here describing precipitates are extremely qualitative. "Bad precipitates" indicates that most of the precipitates were yellow to light-yellow in color, indicative of denatured protein. "Good precipitates" indicates that in some conditions, the precipitates appeared to be microcrystalline, but individual crystals could not be discerned. "Ok precipitates" span the continuum between these two extremes.
Table 5. Crystallization conditions for initial gp120 crystals.
All binary and ternary complexes were purified by gel filtration. D1D2 sCD4 refers to the two-domain soluble CD4.
-The protein concentration is given as the absorbance (280 nm) of the complex per ml of solution.
-Most of the reservoirs are conditions from Crystal Screen 1 (Hampton Research); the reagent numbers given here refer to the crystallization reagent from this commercial kit. Hanging droplets were 0.5 μl protein (in 0.35 M NaCl, 5 mM Tris pH 7.0, 0.02% NaN3) + 0.5 μl reservoir, except for crystal type B, which used 0.5 μl of 3-fold diluted reservoir. Crystallization reservoirs were 500 μl; an additional 35 μl of 5 M NaCl was added after the droplet was mixed to compensate for the NaCl in the protein solution. All dilutions used H2O, except for crystal type F, where 22.5% isopropanol was used. Crystallizations were set up at room temperature and incubated at 20°C.
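The NaCl compensation step described above is a simple mixing calculation. The sketch below is illustrative only: it assumes ideal volume additivity and a reservoir that initially contains no NaCl (an assumption, since the individual kit reagents differ), and the function names are hypothetical.

```python
# Mixing arithmetic for adding a concentrated NaCl stock to a reservoir.
# Assumptions: volumes are additive; the reservoir starts with no NaCl
# unless a starting concentration is passed explicitly.

def final_concentration(v_reservoir_ul, v_stock_ul, c_stock_m, c_reservoir_m=0.0):
    """NaCl molarity after adding v_stock_ul of stock to the reservoir."""
    total_ul = v_reservoir_ul + v_stock_ul
    moles_scaled = v_reservoir_ul * c_reservoir_m + v_stock_ul * c_stock_m
    return moles_scaled / total_ul


def stock_volume_for_target(v_reservoir_ul, c_stock_m, c_target_m, c_reservoir_m=0.0):
    """Stock volume (in microliters) needed to reach c_target_m exactly."""
    return v_reservoir_ul * (c_target_m - c_reservoir_m) / (c_stock_m - c_target_m)


if __name__ == "__main__":
    # 500 ul reservoir plus 35 ul of 5 M NaCl, as stated in the text.
    print(f"{final_concentration(500, 35, 5.0):.3f} M after the 35 ul addition")
    print(f"{stock_volume_for_target(500, 5.0, 0.35):.1f} ul would give 0.35 M exactly")
```

Under these assumptions the stated 35 μl addition brings the reservoir close to, though slightly below, the 0.35 M NaCl of the protein solution.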
Table 6. Data collection statistics for Type E crystals of the two-domain CD4 (D1D2) / Fab 17b / Δ82ΔV1/2*ΔV3ΔC5 gp120 complex.
all data | 20-2.2 | 56,195 | 14.5 | 87.4
last shell | 2.48-2.2 | 13,928 | 35.5 | 73.1
TABLE 7
PROTEIN4 CONC RESERVOIR SOLUTION** μlP μlR FK-ORE
A 11.0 Factorial 1 #40 0.5 0.35 5 A,B 500μl factorial/880μl total vol. (1.76 dilution)
Factorial 1 #28 0.5 0.35 500μl factorial/1375μl total vol. (2.75 dilution)
Factorial 1 #18 0.5 0.35 500μl/1375μl total volume (2.75 dilution)
Factorial 1 #14 0.5 0.35 +50μl 100% PEG 400 500μl/550μl total volume (50μl of PEG only)
Factorial 1 #43 + 200μl saturated Am2SO4 0.5 0.35 500μl only/700μl total vol.
PS Factorial #46 0.5 0.35 10 200μl factorial/600μl total volume
PS Factorial #31 0.5 0.5 11 200μl factorial/550μl total volume (2.75 dilution)
Factorial 1 #18 0.5 0.35 12 +50μl pH 4.5 Na Acetate 0.5M / 250μl factorial/688 total volume
5.2 PS Factorial #26 0.5 0.5 13 200μl factorial/800μl total volume (4.0 dilution)
PS Factorial #28 0.5 0.5 14 200μl factorial/400μl total volume (2.0 dilution)
1.36 PS Factorial #35 0.5 0.5 15 200μl factorial/700μl total volume
1.4 PS Factorial #9 0.5 0.5 16 200μl factorial/200μl total volume
Factorial 1 #32 0.5 0.35 17 500μl factorial/900μl total volume
6.2 Factorial 1 #17 0.5 0.35 18 500μl factorial/1300μl total volume
Factorial 1 #18 0.5 0.35 19 500μl factorial/1300μl total volume
Factorial 1 #38 0.5 0.35 20 500μl factorial/1000μl total volume
Factorial 1 #40 0.5 0.35 21 500μl factorial/1200μl total volume
Factorial 1 #46 0.5 0.35 22 500μl factorial/900μl total volume
PS Factorial #12 .05 0.5 23 200μl factorial/300μl total volume
PS Factorial #29 .05 0.5 24 200μl factorial/500μl total volume
Factorial 1 #40 + Factorial 1 #16 0.5 0.35 25 150μl #16 250μl #40/600μl total vol.
Crystals gave good diffraction
**NOTE: To all reservoirs, 5M NaCl was added after droplet setup to bring the final concentration to 350mM.
The final volume was made up with water if there is a volume discrepancy.
PROTEIN
FAB/CD4/ISOLATE (DISTINGUISH gp120); CONSTRUCT NAME
A 17b/D1D2/YU2 (Δ82HYB.); Δ82ΔV1/2*ΔV3ΔC5
B SC17b/D1D2/HxBC2 (Δ82HYB); Δ82ΔV1/2*ΔV3ΔC5
G 17b/D1D2/HxBC2 (+V3); Δ82ΔV1/2*ΔC5
M 17b/D1D2/HxBC2 (+C1C5HYB.); ΔV1/2*ΔV3
N 48D/D1D2/HxBC2 (+C1C5HYB.); ΔV1/2*ΔV3
++:
Factorial 1 is obtained from "Crystal Screen" provided by Hampton Research, 27632 El Lazo Road, Suite 100, Laguna Niguel, CA 92677-3913, United States of America
PS FACTORIAL
# | % OF ADDITIVE | PRECIPITANT | PH | SALT ADDITIVE
9 | 5% Isopropanol | 4M NaCl | 7.5 | 100mM CaCl2
12 | 2% PEG 400 | 2.5M Na/KPO4 | 6.5 |
26 | 10% Isopropanol | 30% PEG 1500 | 6.5 | 100mM CaCl2
28 | 8% MPD | 30% PEG 1500 | 8.5 |
29 | 15% Isopropanol | 25% PEG 3,350 | 4.5 | 200mM A Cit
31 | 15% MPD | 25% PEG 3,350 | 6.5 | 200mM LiSO4
35 | 10% Isopropanol | 20% PEG 8000 | 7.5 | 200mM Am2SO4 ((NH4)2SO4)
46 | 20% PEG 1000 | 40% PEG 400 | 6.5 | 200mM Na/KPO4
MPD IS 2-METHYL-2,4-PENTANEDIOL; Am = NH4
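The dilution factors quoted in Table 7 follow directly from the ratio of total reservoir volume to the volume of factorial reagent. The short check below is a sketch for illustration; the function name is hypothetical and the volume pairs are those rows of Table 7 for which a dilution factor is stated.

```python
# Dilution factor = total reservoir volume / volume of factorial reagent.

def dilution_factor(reagent_ul, total_ul):
    """Fold-dilution of the factorial reagent in the final reservoir."""
    return total_ul / reagent_ul


# (reagent volume, total volume) -> dilution stated in Table 7
STATED = {
    (500, 880): 1.76,
    (500, 1375): 2.75,
    (200, 550): 2.75,
    (200, 800): 4.0,
    (200, 400): 2.0,
}

if __name__ == "__main__":
    for (reagent, total), stated in STATED.items():
        computed = dilution_factor(reagent, total)
        print(f"{reagent} ul in {total} ul: computed {computed:.2f}, stated {stated}")
```

The same ratio can be applied to the rows where only the volumes are listed.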
Second Series of Experiments
The human immunodeficiency viruses (HIV-1 and HIV-2) and simian immunodeficiency viruses (SIVs) are the etiologic agents of acquired immunodeficiency syndrome (AIDS) in their respective human and simian hosts (1). Typically, infection with primate immunodeficiency viruses is characterized by an initial phase of high-level viremia, followed by a long period of persistent virus replication at a lower level (2). Viral persistence occurs despite specific antiviral immune responses, which include the generation of neutralizing antibodies.
The primate immunodeficiency viruses, like all retroviruses, are surrounded by an envelope consisting of a host cell-derived lipid bilayer and virus-encoded envelope glycoproteins (3). For the virus to enter target cells, the viral membrane must be fused with the plasma membrane of the cell, a process mediated by the envelope glycoproteins. The exposed location of these proteins on the virus allows them to carry out their function but also renders them uniquely accessible to neutralizing antibodies. Thus, dual selective forces, virus replication and immune pressure, have shaped the evolution of the envelope glycoproteins and continue to do so within each infected host. The current understanding of the functional features of these proteins is summarized below.
Synthesis and assembly of the envelope glycoproteins.
In the infected cell, the envelope glycoproteins are synthesized as an approximately 845-870 amino acid precursor in the rough endoplasmic reticulum. N-linked, high-mannose sugar chains are added to form the gp160 glycoprotein, which assembles into oligomers (4-6). The preponderance of evidence suggests that these oligomeric complexes are trimers (4,5). The gp160 trimers are transported to the Golgi apparatus, where cleavage by a cellular protease generates the mature envelope glycoproteins: gp120, the exterior envelope glycoprotein, and gp41, the transmembrane glycoprotein (3). The gp41 glycoprotein possesses an ectodomain that is largely responsible for trimerization (7), a membrane-spanning anchor, and a long cytoplasmic tail. Most of the surface-exposed elements of the mature, oligomeric envelope glycoprotein complex are contained on the gp120 glycoprotein. Selected, presumably well-exposed, carbohydrates on the gp120 glycoprotein are modified in the Golgi apparatus by the addition of complex sugars (6). The gp120 and gp41 glycoproteins are maintained in the assembled trimer by non-covalent, somewhat labile interactions between the gp41 ectodomain and discontinuous structures composed of N- and C-terminal gp120 sequences (8). Upon reaching the infected cell surface, a fraction of these envelope glycoprotein complexes are incorporated into budding virus particles. A large number of the complexes disassemble, releasing gp120 and exposing the previously buried gp41 ectodomain. These events contribute to the formation of defective virions, which predominate in any retroviral preparation (9).
Binding of the envelope glycoproteins to the CD4 receptor.
Many cell surface proteins, including adhesion molecules, are incorporated into HIV-1 virions along with the envelope glycoprotein complexes (10). These host cell-derived molecules can assist the attachment of viruses to potential target cells. Virus attachment also involves the interaction of the gp120 envelope glycoproteins with specific receptors, the CD4 glycoprotein (11) and members of the chemokine receptor family (12, 13) (Fig. 26). The CD4 glycoprotein is expressed on the surface of T lymphocytes, monocytes, dendritic cells, and brain microglia, the main target cells for primate immunodeficiency virus in vivo. The requirement for CD4 binding exhibited by most primate immunodeficiency viruses for efficient entry is consistent with this observed in vivo tropism. A major function of CD4 binding is to induce conformational changes in the gp120 glycoprotein that contribute to the formation and/or exposure of the binding site for the chemokine receptor (13, 14). Some HIV-1 and HIV-2 isolates cultured in the laboratory, as well as several primary SIV isolates, no longer depend upon CD4 for efficient entry, and bind to chemokine receptors without requiring interaction with CD4 (15). These examples and the observation that feline immunodeficiency viruses use chemokine receptors but not CD4 for entry (16) raise the distinct possibility that the chemokine receptors represent the primordial, obligate receptors for this retroviral lineage. The use of CD4 as a receptor may have evolved subsequently, allowing the high-affinity chemokine receptor-binding site of primate immunodeficiency viruses to be sequestered from host immune surveillance.
Multiple approaches have yielded insights into the structural basis for CD4 binding by the primate immunodeficiency virus gp120 glycoproteins. Early comparisons of gp120 sequences revealed the existence of five variable (V1-V5) regions interspersed with five conserved regions (17). Intramolecular disulfide bonds in the gp120 glycoprotein result in the incorporation of the first four variable regions into large, loop-like structures (6). Antibody binding studies and deletion mutagenesis have indicated that the major variable loops are well-exposed on the surface of the gp120 glycoprotein (18, 19). The more conserved regions fold into a gp120 core which has been recently crystallized in a complex with fragments of CD4 and a neutralizing antibody (20). The gp120 core is composed of two domains, an inner domain and an outer domain, and a β sheet (the "bridging sheet") that does not properly belong to either domain (Fig. 27a). These names reflect the likely orientation of gp120 in the assembled envelope glycoprotein trimer: the inner domain faces the trimer axis and, presumably, gp41, while the outer domain is mostly exposed on the surface of the trimer. Elements of both domains contribute to CD4 binding. CD4 binds in a recessed pocket on gp120, making extensive contact over approximately 800 Å² of the gp120 surface. Two cavities are evident in the gp120-CD4 interface. A shallow cavity is filled with water molecules, while a deep cavity extends 10-15 Å into the interior of gp120. The opening of this deep cavity is occupied by phenylalanine 43 of CD4, which has been shown by mutagenic analysis to be critical for gp120 binding (21). Most of the gp120 residues previously identified as important for CD4 binding (22,23) surround the opening of the deep cavity and contribute to interactions with phenylalanine 43 of CD4. In addition, aspartic acid 368 of gp120 forms a salt bridge with arginine 59 of CD4, also shown by mutagenesis to be important for gp120 binding (21). Additionally, main-chain atoms on gp120 and CD4 form hydrogen bonds bridging the two proteins. The formation of the deep cavity in gp120 likely contributes to the transmission of CD4-induced conformational changes to gp120 elements involved in the interaction with chemokine receptors and/or gp41. The deep cavity may be a useful target for intervention by small molecular weight compounds.
Chemokine receptor binding
Most primary, clinical isolates of primate immunodeficiency viruses use the chemokine receptor CCR5 for entry (12). For most HIV-1 isolates that are transmitted and that predominate during the early years of infection, CCR5 is an obligate coreceptor, and rare individuals who are genetically deficient in CCR5 expression are relatively resistant to HIV-1 infection (24). HIV-1 isolates arising later in the course of infection often use other chemokine receptors, frequently CXCR4, in addition to CCR5 (12,24). Studies of chimeric envelope glycoproteins demonstrated that the third variable (V3) loop of gp120 is a major determinant of chemokine receptor choice (12,25). V3-deleted versions of gp120 do not bind CCR5, even though CD4 binding occurs at wild-type levels (14). Antibodies against the V3 loop interfere with gp120-CCR5 binding (14). These results support an involvement of the V3 loop in chemokine receptor binding. Other, conserved gp120 structures also appear to play an important role in chemokine receptor binding. The use of CCR5 by a diverse group of immunodeficiency viruses with divergent V3 sequences first suggested the involvement of more conserved gp120 elements (26). Antibodies that recognize conserved, discontinuous gp120 epitopes that are more exposed after CD4 binding are potent inhibitors of gp120-CCR5 interaction (14). These CD4-induced (CD4i) epitopes are discussed further below. Recent mutagenic and structural analyses have revealed the existence of a highly conserved gp120 structure that is important for CCR5 binding (20,27) (Fig. 27, a and b). This structure is adjacent to the V3 loop and the CD4i epitopes, and is oriented to face the target cell upon gp120-CD4 binding.
gp41-mediated membrane fusion.
It is likely that the interaction of the gp120-CD4 complex with the appropriate chemokine receptor promotes additional conformational changes in the envelope glycoprotein complex. By analogy with the influenza hemagglutinin, it has been suggested that the HIV-1 gp41 ectodomain undergoes major conformational changes during virus entry (28). The proposed result of these changes is the insertion of the hydrophobic gp41 amino terminus (the "fusion peptide") into the membrane of the target cell. Mutagenic analysis (23,29) and the recently determined crystal structures of HIV-1 gp41 ectodomain fragments (5) are consistent with this model. The gp41 ectodomain structures reveal an extended, trimeric coiled coil that could potentially bridge the viral and target cell membranes (5). Interactions of other gp41 helical segments near the membrane-spanning region with the interhelical grooves of the internal coiled coil are important for fusion-related conformational changes in gp41. This interaction can be inhibited by helical peptides that mimic either of the involved gp41 helices (30) and is a potential target for future intervention with small molecular weight compounds.
The HIV-1 envelope glycoproteins as antigens.
The exposure of the primate immunodeficiency virus envelope glycoproteins on the surface of virions or infected cells makes them prime targets for antibodies that potentially block key functions of these proteins. However, the success of these viruses in achieving persistent infections implies that the viral envelope glycoproteins have evolved to be less-than-ideal immunogens and antigens. Structures on the viral envelope glycoproteins that are conserved among diverse viral strains are, in general, poorly exposed to the humoral immune system. The conserved gp120 surfaces involved in binding to its three minimally polymorphic ligands, gp41, CD4 and chemokine receptors, each exhibit particular problems with respect to the elicitation of sensitivity to neutralizing antibodies. The moieties involved in gp120-gp41 association are buried in the interior of the functional envelope glycoprotein spike (18, 31, 32). The CD4 binding site is recessed, flanked by variable regions exhibiting considerable glycosylation (19,20). The chemokine receptor-binding site is masked by variable loops, probably V3 and V2 (20,32,33) (Figure 27c). Even in the relatively conserved HIV-1 gp120 core that has been structurally analyzed, the outer domain exhibits a variable, heavily glycosylated surface (20). Since most carbohydrate moieties may appear as "self" to the immune system, this concentrated glycosylation may reduce the potential of a large portion of the gp120 surface to serve as an immunogenic target. Thus, in addition to the neutralizing and nonneutralizing faces of gp120 previously detected by antibody competition analysis (32), the crystal structure of the gp120 core reveals a third, immunologically silent face of gp120 (Fig 6D).
Despite the potential to exert potent antiviral effects, antibodies are not able to suppress virus replication completely in infected hosts. The efficacy of the humoral immune response in limiting virus spread in vivo is compromised by at least two factors: 1) the relative resistance of primary virus isolates to neutralization; and 2) the temporal pattern with which neutralizing antibodies are generated.
Decreased neutralization sensitivity of primary HIV-1 isolates.
HIV-1 viruses that have been passaged in immortalized cell lines are typically more sensitive to neutralization by antibodies or soluble CD4 than are primary, clinical isolates (34). Although other envelope glycoprotein regions can influence this phenotype, a major determinant is the structure of the gp120 major variable loops, V1/V2 and V3 (35). Thus, replacement of the V1/V2 and V3 variable loops of a laboratory-adapted virus with those of a neutralization-resistant primary isolate creates a virus whose neutralization resistance resembles that of the parental primary virus (35). The basis for the decreased sensitivity of primary HIV-1 isolates to neutralization appears to involve a decreased exposure of the relevant gp120 epitopes to soluble CD4 or antibody. This decrease is most apparent in the context of the assembled oligomeric complex (36). A likely explanation for this neutralization resistance is that the major variable loops of primary viruses assume tightly interfacing, "closed" conformations that decrease the accessibility of many gp120 epitopes to antibodies.
The temporal pattern of the antibody response to HIV-1 infection.
The noncovalent nature of the association between gp120 and gp41 contributes to the lability of the functional envelope glycoprotein trimer (8, 9). During natural infections, disassembled envelope glycoproteins apparently elicit most of the antibodies directed against these viral components. The interactive regions of gp120 and gp41 are particularly immunogenic (37). However, since the cognate antibodies cannot bind the assembled, functional envelope glycoprotein complex, they do not exhibit neutralizing activity. Thus, although antibodies against the envelope glycoproteins typically can be detected in the sera of HIV-1-infected individuals by two to three weeks after infection, most of these antibodies lack the ability to inhibit virus infection. By the time that neutralizing antibodies are efficiently elicited, HIV-1 is firmly established in the host.
Several weeks after virus infection, usually after the initial high level of viremia has subsided, neutralizing antibodies can be detected in the sera of infected animals or humans (38). These antibodies neutralize the infecting virus but often exhibit little or no activity against other strains of virus. A subset of these strain-restricted antibodies recognizes the HIV-1 V3 loop (38). These antibodies can block chemokine receptor binding (14). Other variable gp120 elements can contribute to the epitopes recognized by the strain-restricted neutralizing antibodies. It is known, for example, that antibodies directed against the gp120 V2 loop can also exhibit neutralizing activity (39). The V2 loop-associated neutralization epitopes are typically conformation-dependent. The ability of some V2- or V3-directed antibodies to recognize more than one HIV-1 strain (39, 40) suggests that these major variable loops assume a finite number of conformations. This is consistent with the functional consequences on virus entry of some changes in these variable structures (41), and with the observation that amino acid substitutions in the variable loops are not random (42). The requirement for chemokine receptor binding probably constrains V3 loop variation. The V2 loop, although dispensable for the replication of some HIV-1 viruses in culture (33), helps protect the V3 loop and the conserved epitopes near the chemokine receptor binding site from neutralizing antibodies. Thus, the V2 and V3 loops reside proximal to the chemokine receptor binding site (Fig. 27), masking more conserved gp120 elements and presenting potentially variable epitopes to the immune system.
Later in the course of HIV-1 infection of humans, antibodies capable of neutralizing a wider range of HIV-1 isolates appear (43). A subset of the broadly reactive neutralizing antibodies, found in most HIV-1-infected individuals, interferes with the binding of gp120 and CD4 (43). Human monoclonal antibodies derived from HIV-1-infected individuals have been identified that recognize the gp120 glycoproteins from a diverse range of HIV-1 isolates, that block gp120-CD4 binding, and that neutralize virus infection (44). The discontinuous epitopes (the so-called CD4BS epitopes) recognized by many of these human monoclonal antibodies have been characterized by mutagenic analysis (45). The gp120 residues important for antibody binding are all located within the CD4-binding pocket on gp120 (Fig. 27b), and several of the most important residues are near the opening of the deep cavity (20). Therefore, some broadly neutralizing antibodies can apparently access the more recessed elements of the CD4 binding pocket. This is consistent with the observation that the gp120-CD4 interface is as large as that of a typical antibody-antigen complex (20).
A second group of neutralizing antibodies found in a smaller number of HIV-1-infected humans is directed against the CD4-induced (CD4i) epitopes (46). The CD4i epitopes are located near conserved gp120 structures important for chemokine receptor interaction (14) (Fig. 27b). CD4 binding has been shown to cause a change in the V2 loop conformation that allows better CD4i epitope exposure (33). In the absence of CD4, the antibodies recognizing the CD4i epitopes must bypass the overlapping V2 and V3 loops (33). Indeed, as is evident in the current crystal structure (20), this is accomplished by the protrusion of the CDR3 loop of the antibody heavy chain. Antibodies against CD4i epitopes need to bind viruses before CD4 binding occurs to achieve neutralization (47). The reason is that once the envelope glycoprotein complex binds cell surface CD4, there are severe steric constraints on the binding of an antibody to the gp120 surface facing the target cell (Fig. 26).
Another fairly conserved gp120 neutralization epitope is recognized by the 2G12 antibody (48). Unlike the other characterized HIV-1 neutralizing antibodies, which recognize gp120 structures near or within the receptor-binding sites, the 2G12 antibody apparently binds an epitope in the outer domain (Fig. 27b). Given the variability in this outer domain, the ability of the 2G12 antibody to neutralize a fair number of HIV-1 strains (48) seems paradoxical. The marked sensitivity of 2G12 binding to alterations in gp120 glycosylation provides a clue to this puzzle. Despite the variability of the underlying primary amino acid sequence, the 2G12 antibody may recognize more conserved carbohydrate structures formed as a result of the heavy concentration of N-linked glycosylation in the gp120 outer domain. The apparent rarity with which 2G12-like antibodies are elicited attests to the success of the viral strategy of employing a heavily glycosylated outer domain surface in immune evasion.
The HIV-1 envelope glycoproteins as vaccine components.
That the human and simian immunodeficiency virus envelope glycoproteins are not ideal immunogens is an expected consequence of the immunological selective forces that drove the evolution of these viruses. The same features of the envelope glycoproteins that dictate poor immunogenicity in natural infections have hampered vaccine development. The lability of the envelope glycoprotein complex has frustrated attempts to present oligomers mimicking the functional spike to the immune system. As discussed above, the disintegration of envelope glycoprotein oligomers contributes to the preferential elicitation of non-neutralizing antibodies by the newly exposed gp120 N- and C-termini. Regardless of the context in which the envelope glycoproteins are presented, the gp120 variable loops elicit the majority of neutralizing antibodies, probably due to the exposed nature of these epitopes. It is still unclear whether conserved features in the V2 and V3 variable loops exist that can be exploited in vaccine design, or whether all possible functional configurations of these variable structures need to be represented in a cocktail of immunogens.
The discontinuous gp120 structures surrounding the receptor binding sites exhibit a relatively high degree of conservation (20), in keeping with the minimal polymorphism in the host cell receptors. The CD4 binding site constitutes a particularly attractive target. It appears to be accessible to antibodies, more so than the conserved elements of the chemokine receptor-binding region. A large fraction of the broadly neutralizing antibodies that eventually appear in HIV-1-infected individuals is directed against the CD4 binding site (43), indicating the ability of the human immune system to recognize this gp120 region and to generate an appropriate response. Nonetheless, these antibodies have been difficult to elicit in animals and vaccinated humans (49). The reasons for the relatively poor immunogenicity of the CD4 binding site are not yet understood, although several possibilities can be envisioned. Interdomain flexibility may disrupt the CD4BS epitopes and decrease their representation in the pool of immunogens. Masking by variable loops (19, 33) and glycosylation may contribute to the recessed nature of the CD4BS epitopes which, even on the crystallized gp120 core, occupy a 20 Å deep canyon (20). Within the CD4-binding pocket, not all of the gp120 surface is conserved among HIV-1 strains. Therefore, even when elicited, some CD4BS-directed antibodies may lack the breadth and affinity to be optimal neutralization agents. While many monoclonal antibodies against the CD4 binding site exhibit reasonable potency and breadth (44), whether a polyclonal response against the envelope glycoprotein can be focused to preferentially contain these types of antibodies remains to be seen.
The conserved elements near the chemokine receptor-binding site will be a difficult target for vaccine-elicited antibodies. Known monoclonal antibodies to the CD4i epitopes must interact with virus prior to CD4 binding if neutralization is to be achieved (47). Yet these gp120 structures are poorly exposed in the absence of CD4, in large part due to the overlying V2 loop (33). This is consistent with the relative rarity with which these antibodies appear to be elicited in HIV-1-infected humans (46). Attempts to expose these structures better on gp120-based antigens seem warranted.
Summary
The HIV-1 envelope glycoproteins have evolved to be inefficient at eliciting effective antiviral antibody responses. The availability of structural information on the conserved HIV-1 gp120 neutralization epitopes should facilitate the modification of this important antigen and allow the rational testing of hypotheses regarding its poor immunogenic properties. These efforts should complement ongoing efforts to improve antigen presentation to the immune system and to create suitable animal models for the screening of vaccine candidates.
References for the Second Series of Experiments
1. F. Barre-Sinoussi et al., Science 220, 868 (1983); R.C. Gallo et al., ibid. 224, 500 (1984); M.D. Daniel et al., ibid. 228, 1201 (1985); N.L. Letvin et al., ibid. 230, 71 (1985).
2. R.W. Coombs et al., N. Engl. J. Med. 321, 1626 (1989); S.J. Clark et al., ibid. 324, 950 (1991); E.S. Daar, T. Moudgil, R. Meyer, D.D. Ho, ibid. 961 (1991); A.S. Fauci, G. Pantaleo, S. Stanley, D. Weissman, Ann. Int. Med. 124 (1996); D.D. Ho, T. Moudgil, M. Alam, N. Engl. J. Med. 321, 1621 (1989).
3. J.S. Allan et al., Science 228, 1091 (1985); W.G. Robey et al., ibid. 229, 1402 (1985).
4. P.L. Earl, B. Moss, R.W. Doms, J. Virol. 65, 2047 (1991); P.L. Earl, R.W. Doms, B. Moss, Proc. Natl. Acad. Sci. U.S.A. 87, 648 (1990); A. Pinter et al., J. Virol. 63, 2674 (1989); M. Lu, S. Blacklow, P. Kim, Nature Struct. Biol. 2, 1075 (1995); C.D. Weiss, J. Levy, J.M. White, J. Virol. 64, 5674 (1990).
5. D.C. Chan, D. Fass, J.M. Berger, P.S. Kim, Cell 89, 263 (1997); W. Weissenhorn, A. Dessen, S.C. Harrison, J.J. Skehel, D.C. Wiley, Nature 387, 426 (1997).
6. C.K. Leonard et al., J. Biol. Chem. 265, 10373 (1990).
7. P.L. Earl and B. Moss, AIDS Res. Hum. Retroviruses 9, 589 (1993); D. Sugars, C. Wild, T. Greenwell, T. Matthews, J. Virol. 70, (1996).
8. E. Helseth, U. Olshevsky, C. Furman, J. Sodroski, J. Virol. 65, 2119 (1991).
9. J.A. McKeating, A. McKnight, J.P. Moore, J. Virol. 65, 852 (1991); W.P. Tsai, S.R. Conley, H. Kung, R. Garrity, P. Nara, Virology 226, 205 (1996).
10. P.W. Berman and G.R. Nakamura, AIDS Res. Hum. Retroviruses 10, 585 (1994); R. Cantin, J.-F. Fortin, G. Lamontagne, M. Tremblay, Blood 90, 1901 (1997); R. Cantin, J.-F. Fortin, G. Lamontagne, M. Tremblay, J. Virol. 71, 1922 (1997); J.-F. Fortin, R. Cantin, M. Tremblay, ibid. 72, 2105 (1998); Castilleti et al., AIDS Res. Hum. Retroviruses 11, 547 (1995); J.-F. Fortin, R. Cantin, G. Lamontagne, M. Tremblay, J. Virol. 71, 3588 (1997); I. Frank et al., AIDS 10, 1611 (1996); M.M.L. Guo and J.E.K. Hildreth, AIDS Res. Hum. Retroviruses 11, 1007 (1995); L.E. Henderson et al., J. Virol. 61, 629 (1987); J. Hoxie et al., Hum. Immunol. 18, 39 (1987); G. Pantaleo et al., J. Exp. Med. 173, 511 (1991); C.D. Rizzuto and J.G. Sodroski, J. Virol. 71, 4847 (1997); J. Rossio, J. Bess, L.E. Henderson, P. Cresswell, L.A. Arthur, AIDS Res. Hum. Retroviruses 11, 1433 (1995); M. Saifuddin et al., J. Exp. Med. 182, 501 (1995).
11. D. Klatzmann et al., Nature 312, 767 (1984); A.G. Dalgleish et al., ibid. 312, 763 (1984); J.S. McDougal et al., Science 312, 763 (1984); J.S. McDougal et al., Science 231, 382 (1986).
12. Y. Feng, C. Broder, P. Kennedy, E. Berger, Science 272, 872 (1996); H. Choe et al., Cell 85, 1135 (1996); H.K. Deng et al., Nature 381, 661 (1996); T. Dragic et al., ibid. 667 (1996); B.J. Doranz et al., Cell 85, 1149 (1996); G. Alkhatib et al., Science 272, 1955 (1996).
13. Q. Sattentau and J. Moore, J. Exp. Med. 174, 407 (1991); M. Thali et al., J. Virol. 67, 3987 (1993); Q. Sattentau, J. Moore, F. Vignaux, F. Traincard, P. Poignard, ibid. 67, 7383 (1993).
14. A. Trkola et al., Nature 384, 184 (1996); L. Wu et al., ibid. 179 (1996); C. Lapham et al., Science 274, 602 (1996); J.C. Bandres et al., J. Virol. 72, 2500 (1998); C.M. Hill et al., ibid. 71, 6296 (1997).
15. P. Chapman, A. McKnight, R. Weiss, J. Virol. 66, 3531 (1992); J. Reeves and T. Shulz, ibid. 71, 1453 (1997); M.J. Endres et al., Cell 87, 745 (1996); J. Dumonceaux et al., J. Virol. 72, 512 (1998); A.L. Edinger et al., Proc. Natl. Acad. Sci. U.S.A. 94, 14742 (1997); K. Martin et al., Science 278, 1470 (1997).
16. B.J. Willett, M.J. Hosie, J.C. Neil, J.D. Turner, J.A. Hoxie, Nature 385, 587 (1997); B.J. Willett et al., J. Virol. 71, 6407 (1997); M.J. Hosie et al., ibid. 72, 2097 (1998).
17. B.R. Sarcich et al., Cell 45, 637 (1986).
18. J. Moore, Q. Sattentau, R. Wyatt, J. Sodroski, J. Virol. 68, 469 (1994); S. Pollard, M.D. Rosa, J. Rosa, D.C. Wiley, EMBO J. 11, 585 (1992).
19. R. Wyatt et al., J. Virol. 67, 4557 (1993).
20. P. Kwong et al., submitted; P. Kwong et al., in preparation; R. Wyatt et al., in preparation.
21. A. Peterson and B. Seed, Cell 54, 65 (1988); J. Arthos et al., ibid. 57, 469 (1989); M. Brodsky, M. Warton, R.M. Myers, D.R. Littman, J. Immunol. 144, 3078 (1990); A. Ashkenazi et al., Proc. Natl. Acad. Sci. U.S.A. 87, 7150 (1990); H. Choe et al., J. AIDS 5, 204 (1992); S.E. Ryu et al., Nature 348, 419 (1990); J. Wang et al., ibid. 411 (1990); H. Wu, P. Kwong, W. Hendrickson, ibid. 387, 527 (1997).
22. L. Lasky et al., Cell 50, 975 (1987); A. Cordonnier et al., Nature 340, 571 (1989); A. Cordonnier et al., J. Virol. 63, 4464 (1989); U. Olshevsky et al., ibid. 64, 5701 (1990).
23. M. Kowalski et al., Science 237, 1351 (1987).
24. R.I. Connor, K. Sheridan, D. Ceradini, S. Choe, N. Landau, J. Exp. Med. 185, 621 (1997); L. Zhang, Y. Huang, T. He, Y. Cao, D.D. Ho, Nature 383, 768 (1996); A. Bjorndal et al., J. Virol. 71, 7478 (1997); M. Dean et al., Science 273, 1856 (1996); R. Liu et al., Cell 86, 367 (1996); W.A. Paxton et al., Nat. Med. 2, 412 (1996); M. Samson et al., Nature 382, 722 (1996).
25. F. Cocchi et al., Nature Med. 2, 1244 (1996); P.D. Bieniasz et al., EMBO J. 16, 2599 (1997); R. Speck et al., J. Virol. 71, (1997).
26. L. Marcon et al., J. Virol. 71, 2522 (1997); Z. Chen, P. Zhou, D. Ho, N. Landau, P. Marx, ibid. 71, 2705 (1997); F. Kirchhoff et al., ibid. 71, 6509 (1997); J. Rucker et al., ibid. 8999 (1997); N. Sol et al., ibid. 71, 82837 (1997).
27. C. Rizzuto et al., in preparation.
28. C.M. Carr and P.S. Kim, Cell 73, 823 (1993); P. Bullough, F. Hughson, J. Skehel, D.C. Wiley, Nature 371, 37 (1994); W. Weissenhorn et al., EMBO J. 15, 1507 (1996).
29. E. Freed, D. Myers, R. Risser, Proc. Natl. Acad. Sci. U.S.A. 87, 4650 (1990); J. Cao et al., J. Virol. 67, 2747 (1993); J.M. Felser, T. Klimkait, J. Silver, Virology 170, 566 (1989); H. Schaal, M. Klein, P. Gehrmann, O. Adams, A. Scheid, J. Virol. 69, 3308 (1995); M. Delahunty, I. Rhee, E. Freed, J. Bonifacino, Virology 218, 94 (1996); J.W. Dubay, S. Roberts, B. Brody, E. Hunter, J. Virol. 66, 4748 (1992).
30. C.-H. Chen, T.J. Matthews, C.B. McDanal, D.P. Bolognesi, M.L. Greenberg, J. Virol. 69, 3771 (1995); C. Wild, T. Oas, C. McDanal, D. Bolognesi, T. Matthews, Proc. Natl. Acad. Sci. U.S.A. 89, 10537 (1992); S. Jiang, K. Lin, N. Strick, A.R. Neurath, Nature 365, 113 (1993); S. Jiang, K. Lin, N. Strick, A.R. Neurath, BBRC 195, 533 (1993).
31. R. Wyatt et al., J. Virol. 71, 9722 (1997).
32. J. Moore and J. Sodroski, ibid. 70, 1863 (1996).
33. R. Wyatt et al., ibid. 69, 5723 (1995); J. Cao et al., ibid. 71, 9722 (1997).
34. E. Daar, X.L. Li, T. Moudgil, D.D. Ho, Proc. Natl. Acad. Sci. U.S.A. 87, 6574 (1990); P.J. Gomatos et al., J. Immunol. 144, 4183 (1990); T. Wrin et al., J. Virol. 69, 39 (1995); J.R. Mascola et al., J. Inf. Dis. 169, 48 (1994); L. Sawyer et al., J. Virol. 68, 1342 (1994); N. Sullivan, Y. Sun, J. Li, W. Hofmann, J. Sodroski, J. Virol. 69, 4413 (1995); J.P. Moore and D.D. Ho, AIDS 9, S117 (1995); W. O'Brien, S. Mao, Y. Cao, J. Moore, J. Virol. 68, 5264 (1994); Y. Zhang, R. Fredriksson, J. McKeating, E.M. Fenyo, Virology 238, 254 (1997); T. Matthews, AIDS Res. Hum. Retroviruses 10, 631 (1994).
35. T. Morikita et al., AIDS Res. Hum. Retroviruses 13, 1291 (1997); A. Koito, G. Harrow, J. Levy, C. Cheng-Mayer, J. Virol. 68, 2253 (1994); S. Hwang, T. Boyle, H.K. Lyerly, B. Cullen, Science 257, 535 (1992); N. Sullivan et al., submitted.
36. T. Fouts, J. Binley, A. Trkola, J. Robinson, J.P. Moore, J.
37. J. Wang, S. Steel, R. Wisniewolski, C.Y. Yang, Proc. Natl. Acad. Sci. U.S.A. 83, 6159 (1986); J.W. Gnann Jr., J. Nelson, M.A.B. Oldstone, J. Virol. 61, 2639 (1987); S. Karwowska et al., AIDS Res. Hum. Retroviruses 8, 1099 (1992); J. Krowka et al., Clin. Immunol. Immunopathol. 59, 53 (1991); T.J. Palker et al., Proc. Natl. Acad. Sci. U.S.A. 84, 2479 (1987); J.M. Binley et al., AIDS Res. Hum. Retroviruses 12, 911 (1996).
38. P. Nara et al., J. Virol. 61, 3173 (1987); P. Nara et al., ibid. 64, 3779 (1990); A. Gegerfelt, J. Albert, L. Morfeldt-Manson, K. Broliden, E.M. Fenyo, Virology 185, 162 (1991); M. Arendrup et al., J. AIDS 5, 303 (1992); J. Li et al., J. Virol. 69, 7061 (1995).
39. J. McKeating et al., J. Virol. 67, 4932 (1993); M.S.C. Fung et al., ibid. 66, 8489 (1992); J. Moore et al., ibid. 67, 6136 (1993); M. Gorny et al., ibid. 68, 8312 (1994); C. Shotton et al., ibid. 69, 222 (1995); H. Ditzel et al., J. Immunol. 154, 893 (1995).
40. T. Ohno et al., Proc. Natl. Acad. Sci. U.S.A. 88, 10726 (1991); J.P. Moore et al., J. Virol. 69, (1995); M. Gorny et al., ibid. 66, 7538 (1992).
41. E. Helseth et al., J. Virol. 64, 2146 (1990); E. Freed, D. Myers, R. Risser, ibid. 65, 190 (1991); L. Ovanoff et al., AIDS Res. Hum. Retroviruses 7, 595 (1991); K. Page, S. Stearns, D. Littman, J. Virol. 66, 524 (1992); N. Sullivan, M. Thali, C. Furman, D. Ho, J. Sodroski, ibid. 67, 3674 (1993).
42. G. Myers et al., Human Retroviruses and AIDS 1996 (Los Alamos National Laboratory, Los Alamos, N.M.).
43. A. Profy et al., J. Immunol. 144, 4641 (1990); I. Berkower, G. Smith, C. Girl, D. Murphy, J. Exp. Med. 170, 1681 (1989); C.-Y. Kang et al., Proc. Natl. Acad. Sci. U.S.A. 88, 6171 (1991); K.S. Steimer, C.J. Scandella, P.V. Skiles, N.L. Haigwood, Science 254, 105 (1991); J.P. Moore and D.D. Ho, J. Virol. 67, 863 (1993).
44. M. Posner et al., J. Immunol. 146, 4325 (1991); S.A. Tilley, W. Honnen, M. Racho, M. Hilgartner, A. Pinter, Res. Virol. 142, 247 (1991); D.R. Burton et al., Science 266, 1024 (1994); D.D. Ho et al., J. Virol. 65, 489 (1991); S. Karwowska et al., AIDS Res. Hum. Retroviruses 8, 689 (1992).
45. M. Thali et al., J. Virol. 65, 6188 (1991); M. Thali et al., ibid. 66, 5635 (1992); J. McKeating et al., Virology 190, 134 (1992); M. Shutten et al., AIDS 7, 919 (1993).
46. M. Thali et al., J. Virol. 67, 3978 (1993); C.-Y. Kang, K. Harihan, M.R. Posner, P. Nara, J. Immunol. 151, 449 (1993).
47. N. Sullivan et al., submitted.
48. T. Muster et al., J. Virol. 67, 6642 (1993); A. Trkola, ibid. 70, 1100 (1996).
49. J. Rusche et al., Proc. Natl. Acad. Sci. U.S.A. 84, 6924 (1987); J. Klaniecki et al., AIDS Res. Hum. Retroviruses 7, 791 (1991); N. Haigwood et al., J. Virol. 66, 172 (1992).
Third Series of Experiments
The entry of human immunodeficiency virus (HIV) into cells requires sequential interactions of the viral exterior envelope glycoprotein, gp120, with the CD4 glycoprotein and a chemokine receptor on the cell surface. These interactions initiate a fusion of the viral and cellular membranes. Although gp120 can elicit virus-neutralizing antibodies, HIV eludes the immune system. We have solved the X-ray crystal structure at 2.5 Å resolution of an HIV-1 gp120 core complexed with a two-domain fragment of human CD4 and an antigen-binding fragment of a neutralizing antibody that blocks chemokine-receptor binding. The structure reveals a cavity-laden CD4-gp120 interface, a conserved binding site for the chemokine receptor, evidence for conformational change upon CD4 binding, the nature of a CD4-induced antibody epitope, and specific mechanisms for immune evasion. Our results provide a framework for understanding the complex biology of HIV entry into cells and will guide efforts to intervene.
Introduction
Human immunodeficiency viruses, HIV-1 and HIV-2, and the related simian immunodeficiency viruses (SIV) cause the destruction of CD4+ lymphocytes in their respective hosts, resulting in the development of acquired immunodeficiency syndrome (AIDS) (1, 2). The entry of HIV into host cells is mediated by the viral envelope glycoproteins, which are organized into oligomeric, probably trimeric, spikes displayed sparsely on the surface of the virion. These envelope complexes are anchored in the viral membrane by the gp41 transmembrane envelope glycoprotein. The surface of the spike is composed primarily of the exterior envelope glycoprotein, gp120, associated by noncovalent interactions with each subunit of the trimeric gp41 glycoprotein complex (3, 4). When the gp120 sequences of different primate immunodeficiency viruses were initially compared, five variable regions (V1-V5) were identified (5). The first four variable regions form surface-exposed loops that contain disulfide bonds at their bases (6). The conserved gp120 regions form discontinuous structures important for the interaction with the gp41 ectodomain and with the viral receptors on the target cell. Both conserved and variable gp120 regions are extensively glycosylated (6). The variability and glycosylation of the gp120 surface likely modulate the immunogenicity and antigenicity of the gp120 glycoprotein, which is the major target for neutralizing antibodies elicited during natural infection (7).
Entry of primate immunodeficiency viruses into the host cell involves the binding of the gp120 envelope glycoprotein to the CD4 glycoprotein, which serves as the primary receptor. The gp120 glycoprotein binds to the most amino-terminal of the four immunoglobulin-like domains of CD4. Structures of both the N-terminal two domains (8, 9) and the entire extracellular portion of CD4 (10) have been determined, and mutagenesis studies indicate that the CD4 structure analogous to the second complementarity-determining region (CDR2) of immunoglobulins is critical for gp120 binding (11, 12). Conserved gp120 residues important for CD4 binding have likewise been identified by mutagenesis (3, 13, 14).
CD4 binding induces conformational changes in the gp120 glycoprotein, some of which involve the exposure and/or formation of a binding site for specific chemokine receptors. These chemokine receptors, mainly CCR5 and CXCR4 for HIV, serve as obligate second receptors for virus entry (15, 16). The gp120 third variable (V3) loop is the major determinant of chemokine receptor specificity (17). However, other more conserved gp120 structures that are exposed upon engagement of CD4 also appear to be involved in chemokine-receptor binding. This CD4-induced exposure is indicated by the enhanced binding of several gp120 antibodies (18, 19) which, like V3-loop antibodies, efficiently block the binding of gp120-CD4 complexes to the chemokine receptor (20). These are called the CD4-induced (CD4i) antibodies. CD4 binding may trigger additional conformational changes in the envelope glycoproteins. For example, the binding of CD4 to the envelope glycoproteins of some HIV-1 isolates induces the release or "shedding" of the gp120 protein from the complex (21), although the relevance of this process to HIV entry is uncertain.
HIV and related retroviruses belong to a class of enveloped fusogenic viruses that includes corona-, paramyxo- and orthomyxoviruses (e.g. influenza virus), all of which require post-translational cleavage for activation. The transmembrane coat proteins of these viruses (gp41 equivalents) share sequence resemblance, particularly in their N-terminal fusion peptides, and they participate directly in membrane fusion. The ectodomain of gp41 can form a coiled coil resembling that of influenza hemagglutinin HA2 (23, 4, 22), supporting the notion that this class of viruses may share some common aspects with respect to virus entry. In other respects, enveloped viruses tend to be distinctive. They use varying modes of entry (direct membrane penetration for HIV, endocytosis for influenza virus), and even otherwise closely related viruses may use individualized receptors. The exterior coat proteins (gp120 equivalents) are accordingly specialized. Thus, for example, there is no detectable similarity in sequence, nor now in structure, between the receptor-binding portion of HIV and that of murine leukemia virus (23), another retrovirus. Mechanisms for receptor-mediated triggering of fusion may also be virus-specific.
Because of the key role that the gp120 glycoprotein plays in receptor binding and in interactions with neutralizing antibodies, knowledge of the gp120 structure is important for understanding HIV infection and for the design of therapeutic and prophylactic strategies. Here, we report the crystal structure, at 2.5 Å resolution, of an HIV-1 gp120 core bound to a two-domain fragment of the CD4 cellular receptor and to the antigen-binding fragment (Fab) of an antibody, 17b, that is directed against a CD4i epitope. A companion report relates this structure to the antigenic properties of the gp120 envelope proteins (24).
Structure determination
The extensive glycosylation and conformational heterogeneity associated with the HIV-1 gp120 glycoprotein recommended a crystallization strategy aimed at radical modification of the protein surface. We made truncations at the termini and variable loops, in several combinations, of gp120 from different strains; extensively deglycosylated these gp120 variants; and produced complexes with a range of ligands. A theoretical analysis showed that the probability of crystal formation is greatly increased by such reduction of surface heterogeneity and by trials with multiple variants (25). After screening almost twenty combinations of gp120 variants and ligands, we obtained crystals of a ternary complex composed of a truncated form of gp120, the N-terminal two domains (D1D2) of CD4, and an Fab from the human neutralizing monoclonal antibody 17b (18, 25).
The crystallized gp120 is from the HXBc2 strain of HIV-1. It has deletions of 52 and 19 residues from the N- and C-termini, respectively; Gly-Ala-Gly tripeptide substitutions for 67 V1/V2-loop residues and 32 V3-loop residues; and the removal of all sugar groups beyond the linkages between the two core N-acetylglucosamine residues. This deglycosylated core gp120 eliminates over 90% of the carbohydrate but retains over 80% of the non-variable-loop protein. Its capacity to interact with CD4 and relevant antibodies is preserved at or near wild-type levels (26). The crystals are of space group P222₁ (a = 71.6, b = 88.1, c = 196.7 Å) with one ternary complex and 60% solvent in the crystallographic asymmetric unit.
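As a rough consistency check on the stated cell dimensions and solvent content, the following Python sketch computes the orthorhombic unit-cell volume and the protein mass per asymmetric unit implied by the Matthews relation. It is an illustration only, not part of the reported analysis. It assumes four asymmetric units per unit cell (as in primitive orthorhombic space groups of point group 222) and the conventional approximation that the solvent fraction equals 1 - 1.23/V_M; all function and constant names are hypothetical.

```python
# Illustrative sketch only: names and the Matthews approximation are assumptions,
# not taken from the original text.

# Unit-cell edges in angstroms, as quoted for the ternary-complex crystals.
A, B, C = 71.6, 88.1, 196.7
# Assumed number of asymmetric units per cell for a primitive 222 space group.
ASU_PER_CELL = 4
# Solvent fraction quoted for the asymmetric unit.
SOLVENT_FRACTION = 0.60


def cell_volume(a: float, b: float, c: float) -> float:
    """Volume of an orthorhombic cell (all angles 90 degrees), in cubic angstroms."""
    return a * b * c


def matthews_coefficient(solvent_fraction: float) -> float:
    """V_M in cubic angstroms per dalton, from solvent fraction = 1 - 1.23 / V_M."""
    if not 0.0 <= solvent_fraction < 1.0:
        raise ValueError("solvent fraction must be in [0, 1)")
    return 1.23 / (1.0 - solvent_fraction)


def implied_mass_per_asu(a: float, b: float, c: float,
                         asu_per_cell: int, solvent_fraction: float) -> float:
    """Protein mass (daltons) per asymmetric unit implied by cell size and solvent."""
    volume_per_asu = cell_volume(a, b, c) / asu_per_cell
    return volume_per_asu / matthews_coefficient(solvent_fraction)


if __name__ == "__main__":
    mass = implied_mass_per_asu(A, B, C, ASU_PER_CELL, SOLVENT_FRACTION)
    print(f"cell volume: {cell_volume(A, B, C):.0f} cubic angstroms")
    print(f"implied mass per asymmetric unit: {mass / 1000:.0f} kDa")
```

Under these assumptions the implied mass is on the order of 100 kDa per asymmetric unit, which appears compatible with a single complex of the deglycosylated gp120 core, two-domain CD4 and one Fab.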
The ternary structure was solved by a combination of molecular replacement, isomorphous replacement, and density modification techniques. It has been refined to an R-value of 21.0% (5-2.5 Å data > 2σ, R-free = 30.3%). The final model, composed of 7877 atoms, comprises residues 90-396 and 410-492 of gp120 (excepting loop substitutions), residues 1-181 of CD4, and residues 1-213 of the light chain and 1-229 of the heavy chain of the 17b monoclonal antibody. In addition, 11 N-acetylglucosamine and 4 fucose residues, and 602 water molecules have been placed. The overall structure of the complex of gp120 with D1D2 of CD4 and Fab 17b is as depicted in Fig. 28.
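For readers unfamiliar with the statistic quoted above, the conventional crystallographic R-value is the sum of the absolute differences between observed and calculated structure-factor amplitudes divided by the sum of the observed amplitudes; R-free is the same quantity computed over reflections excluded from refinement. The Python sketch below evaluates that formula on made-up amplitudes. The data and the function name are illustrative assumptions, scaling between observed and calculated amplitudes is omitted, and nothing here reproduces the refinement actually performed.

```python
# Illustrative sketch only: toy amplitudes, hypothetical function name.
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    No scale factor is applied between the two sets of amplitudes.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up amplitudes for four reflections.
    observed = [100.0, 50.0, 25.0, 25.0]
    calculated = [90.0, 55.0, 20.0, 30.0]
    print(f"R = {r_factor(observed, calculated):.3f}")
```

Applying the same function to a held-out set of reflections would give the corresponding R-free value.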
Structure of gpl20
The deglycosylated core of gpl20 as dissected from the ternary complex approximates a prolate ellipsoid with dimensions of 50 x 50 x 25ϋ, although its overall profile is more heart-shaped than circular. Its backbone structure is shown in Figs. 29a & c in an orientation precisely perpendicular to that in Fig. 28 (Fig. 31e gives a mutually perpendicular view) . This core gpl20 comprises 25 b strands, 5 a helices and 10 defined loop segments, all organized with the topology shown in Fig. 29b. Specific spans of structural elements are given in Fig. 29d. The structure confirms the chemically determined disulfide bridge assignments (6; Fig. 29c) . The polypeptide chain of gpl20 is folded into two major domains plus certain excursions that emanate from this body. The inner domain (inner with respect to the N- and C-termini) features a two-helix, two-strand bundle with a small five-stranded 3-sandwich at its termini -proximal end and a projection at the distal end from which the V1/V2 stem emanates. The outer domain is a stacked double barrel that lies alongside the inner domain such that the outer barrel and inner bundle axes are approximately parallel .
The proximal barrel of the outer-domain stack is composed from a 6-stranded, mixed-directional /3-sheet that is twisted to embrace helix α2 as a 7th barrel stave. The distal barrel of the stack is a 7-stranded antiparallel β barrel. The two barrels share one contiguous hydrophobic core, and the staves also continue from one barrel to the next except at the domain interface. This interruption is centered at a side between barrels where the chain enters the outer domain with loop λB insinuated as a tongue between strands 316 and 523. The extended segment just preceding λB is like an 8th stave of the distal barrel, but it is slightly out of reach for hydrogen bonding with its β!6 and 319 neighbors. The chain returns to complete the inner domain after β24 .
The proximal end of the outer domain includes variable loops V4 and V5 and loops λD and λE, which are variable in sequence as well. Loop λC is also at this end, close in space to loop λA of the inner domain, although by topology it is at the other end of this domain. The distal end does include the stem of the excised variable loop V3 and also an excursion via loop λF into a β hairpin, β20-β21, which in turn hydrogen bonds with the V1/V2 stem emanating from the inner domain. This completes an antiparallel, 4-stranded "bridging sheet" that stands as a peculiar minidomain in contact with, but distinct from, the inner and outer domains as well as the excised V1/V2 domain. This bridging sheet also participates in the separated interactions of gp120 with both CD4 and the 17b antibody (Fig. 28 and below). One further excursion from the body of the outer domain produces strand β15 and helix α3, which are also important in CD4 binding.
Taken as a whole, the structure of gp120 seen here is novel. Moreover, our domain-level searches have failed to reveal similarity of the inner domain to any known atomic structures, although the missing terminal segments might conceal relationships. We do, however, find a fragmentary similarity for portions of the outer domain with known structures. In particular, part of the protomer of FabA dehydrase (27) is like part of the proximal barrel, and dUTP pyrophosphatase (28) has elements in common with both barrels of the outer domain. In each case the superimposable fraction is limited. For FabA, 45 of its 171 C-alpha atoms superimpose on five segments, but the rest are topologically unrelated. For dUTPase, 41 of its 152 C-alpha atoms appropriately capture 8 of the 15 segments in the outer domain body, but there is no helix corresponding to alpha-2 and the placements of termini are not comparable. Interestingly, several viruses related to HIV encode dUTPases; however, we have not found sequence evidence to support a possible role in coat protein evolution.
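To make "limited" concrete, the superimposable fractions implied by the counts above can be worked out directly. This is a small arithmetic sketch using only the C-alpha counts quoted in the preceding paragraph; the dictionary and variable names are illustrative.

```python
# (superimposed C-alpha atoms, total C-alpha atoms), as quoted in the text.
superpositions = {
    "FabA": (45, 171),
    "dUTPase": (41, 152),
}

# Fraction of each comparison structure that superimposes on the gp120 outer domain.
fractions = {
    name: matched / total for name, (matched, total) in superpositions.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of C-alpha atoms superimpose")
```

In both cases roughly a quarter of the comparison structure superimposes, which is consistent with the description of the similarity as fragmentary.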
This structure of core gp120 should be a prototype for the class. As shown in the structure-based alignment of representative sequences (Fig. 29d), there is substantial conservation despite the noted variability among HIV strains. Thus, even an HIV-2 sequence is 35% identical with that of the HXBc2 strain expressed in this crystallized construct, and the identity level rises to 77% and 51%, respectively, for the more closely related HIV-1 clade C and clade O representatives. The inner domain is appreciably more conserved than the outer domain with 86%, 72% and 45% identity for the respective C, O and HIV-2 comparisons. Variability correlates with the degree of solvent exposure of residues (Fig. 29d), in keeping with the conservation of hydrophobic cores. The seven disulfide bridges retained in core gp120 are absolutely conserved and mostly buried (Fig. 29c). Glycosylation sites are all surface exposed and are conserved above average (Fig. 29d). The previously identified HIV variable segments (5) are all on loops connecting elements of secondary structure, and loops λD and λE are also especially variable. Indeed, λE is more variable than V5 in light of current sequence data. These loops are also relatively mobile as reflected in high B factors or disorder, as in V4. Interestingly, variable segments in the outer domain, including the exposed face of α2, appear to arise from neutral mutation rather than selective pressure since they are on non-immunogenic surfaces, presumably masked by glycosylation.
CD4-gp120 interaction
CD4 is bound into a depression formed at the interface of the outer domain with the inner domain and the bridging sheet of gp120 (Fig. 30a). This interaction buries a total of 742 Å² from CD4 and 802 Å² from gp120. The surface areas that are actually in contact are considerably smaller (Fig. 30d) because an unusual mismatch in surface topography creates large cavities that are occluded in the interface, as described below. There is, however, a general complementarity in electrostatic potential at the surfaces of contact, although the match is imprecise in this respect as well. The focus of CD4 positivity is displaced from the center of greatest negativity on gp120 (Fig. 30c). The binding site is devoid of carbohydrate (Fig. 30g). The structure of CD4 in this complex differs only locally from that in free D1D2 structures and at only a few places: residues 17-20 at the poorly ordered CDR1-like loop and residues 41, 42, 47, 49 and 60, which are at or near the contact site and have low B factors in the gp120-bound state.
Direct interatomic contacts are made between 22 CD4 residues and 26 gp120 amino-acid residues. These include 219 van der Waals contacts and 12 hydrogen bonds. Residues in contact are concentrated in the span from 25 to 64 of CD4, but they are distributed over six segments of gp120 (Figs. 29d & 30i): 1 residue from the V1/V2 stem, loop λD, the beta-15-alpha-3 excursion, the beta-20-beta-21 hairpin, strand beta-23 and the beta-24-alpha-5 connection. These interactions are compatible with previous analyses of mutational data on both CD4 (11, 12, 29) and gp120 (3, 13, 14). Other groups are also involved, including some at gp120 sites that have not been tested, but residues identified as critical for binding do indeed interact with one another (Fig. 30e). Most importantly, Phe 43 and Arg 59 of CD4 make multiple contacts centered on residues Asp 368, Glu 370 and Trp 427 of gp120, which are all conserved among primate immunodeficiency viruses. In fact, 63% of all interatomic contacts come from one span (40-48) in C'C" of CD4, and Phe 43 alone accounts for 23% of the total. Similarly, with respect to gp120, the spans of 365-371 and 425-430 contribute 57% of the total. Of the three CD4 lysine residues implicated in binding (residues 29, 35 & 46), only Lys 29 makes a direct ionic hydrogen bond, and while Asp 457 of gp120 is near to these electropositive groups (Figs. 30e & i) it does not make hydrogen bonds.
Several gp120 residues that are covered by CD4 are variable in sequence. This variation is accommodated in part by the large interfacial cavity (Fig. 30e). The gp120 residues in contact with this water-filled cavity are especially variable (Fig. 30g). Moreover, half of the gp120 residues that make contacts with CD4 do so only through main-chain atoms (including Cβ) of gp120, and 60% of CD4 contacts are made by gp120 main-chain atoms (Fig. 30f). Included among these are 5 of the 12 hydrogen bonds in the interface. One such contributing element is an antiparallel β-sheet alignment of CD4 strand C" with gp120 strand beta-15 (Figs. 30a & i).
Atomic details of the interaction are particularly intricate and unusual for the contacts made between gp120 and the mutationally critical CD4 residues Phe 43 and Arg 59 (Fig. 30j). Arg 59 interacts with Asp 368 and Val 430. The carboxylate group of Asp 368 makes double hydrogen bonds with the guanidinium Nη atoms of Arg 59, but it also hydrogen bonds back to the backbone NH group of residue 44 and it appears to be optimally positioned to receive a CH...O hydrogen bond (3.20 Å) from the Phe 43 ring. Phe 43 interacts with residues Glu 370, Ile 371, Asn 425, Met 426, Trp 427 and Gly 473 as well as Asp 368, but only the contacts with Ile 371 have a conventional hydrophobic character. Those to 425-427 and 473, including Trp 427, are only to backbone atoms. A surprisingly large fraction of the Phe 43 contacts (28%) are to polar groups. The phenyl group is stacked on the carboxylate group of Glu 370, and there are contacts with the carbonyl oxygen atoms of residues 425, 426 and 473 and the NH group of Trp 427. Indeed, at a distance of 3.10 Å, the phenyl contact with O 425 is a second candidate CH...O hydrogen bond. Asp 368 and Glu 370 have their carboxylate groups close together (3.54 Å) and they are, of course, buried in the complex. Even for gp120 excised from the complex, their fractional surface accessibilities are only 44% and 14%, respectively. Glu 370 may therefore be protonated. Perhaps the most extraordinary aspect of this site is the large cavity beyond Cζ of Phe 43 (Figs. 30b & h, and below).
Interfacial cavities
Analysis of the solvent accessible surface of the ternary complex reveals a number of topologically interior surfaces or cavities. Two of these, both at the gp120-CD4 interface, are unusually large. The larger (279 Å³) is formed at the interface between the slightly concave middle of the CC'C" portion of the CD4 sheet, and a groove on gp120 where beta-23 and beta-24 are indented relative to beta-15 and the λD loop (Fig. 30e). The second is from a pocket in the gp120 surface that is plugged by Phe 43 from CD4 (152 Å³). This pocket is itself at the interface between the inner and outer domains of gp120 (Fig. 30h). Several other smaller cavities are also wedged at the interface between the two gp120 domains.
The larger cavity is lined by mostly hydrophilic residues, half derived from gp120 and half from CD4. It is not deeply buried; while formally a cavity in the crystal structure, minor changes in sidechain orientation would make it solvent accessible. The observed electron density and predicted hydrogen bonding are consistent with at least 8 water molecules in the cavity. Residues from gp120 that actually line the cavity (including Ala 281, Ser 364, Ser 365, Thr 455, Arg 469) exhibit sequence variability, whereas surrounding this variable patch are conserved residues, the substitution of which affects CD4 binding. These include the critical contact residues Asp 368, Glu 370 and Trp 427, which flank one end of the cavity, and Asp 457 at the other end (Fig. 30e). Similarly, CD4 residues that line the cavity (e.g., Gln 40 and Lys 35) can be mutated with only moderate effect on gp120 binding, whereas Arg 59 suffers less loss of solvent accessible surface upon gp120 binding but is highly sensitive to mutation. This cavity thus serves as a water buffer between gp120 and CD4 (Fig. 30e). The tolerance for variation in the gp120 surface associated with this cavity produces a variational island (Fig. 30g), or "anti-hot spot", which is centrally located between regions required for CD4 binding, and may help the virus escape from antibodies directed against the CD4 binding site.
The "Phe 43" cavity (Fig. 30b & h) is very different in character from the larger binding-interface cavity. It is roughly spherical, with a diameter of ~8 Å (atom center to atom center) across the center of the cavity. It is positioned just beyond Phe 43 of CD4, at the intersection of the inner domain, the outer domain and the bridging sheet. It is relatively deeply buried, extending into the hydrophobic interior of gp120. The phenyl ring of Phe 43 is the only non-gp120 residue contacting this cavity, forming a lid which covers the bottom of the cavity (Fig. 30b). Other routes of solvent access are possible: past Met 426 under the bridging sheet, or directly through the heart of gp120, at the inner domain-barrel domain interface. Ordered water molecules demarking possible paths of solvent access are found along both routes. Nonetheless, in the cavity itself, only a few water molecules are observed. The center of the cavity is dominated by a large piece of spherical density, which is over 4 Å from any protein atom (Fig. 30b). The size, shape and predicted hydrogen bonding of this density are inconsistent with those expected for water, isopropanol, ethylene glycol, or any of the other major crystallization components. We have been unable to identify the source of this density.
Residues that line the Phe 43 cavity (side chains of Trp 112, Val 255, Thr 257, Glu 370, Phe 382, Tyr 384, Trp 427 and Met 475; main chains of 255-257 and 375-377) are primarily hydrophobic. They are also highly conserved, as much so as the buried gp120 hydrophobic core. Despite a lack of steric hindrance, almost no substitutions to larger residues are found. Given the frequency of gp120 sequence divergence, such conservation strongly implies functional significance. Indeed, although residues that line this cavity provide little direct contact to CD4, they do nevertheless affect the gp120-CD4 interaction. Thus, mutations at Thr 257 (no contacts) and Trp 427 (only main-chain contacts) can substantially reduce binding. Changes in cavity-lining residues also affect the binding of antibodies directed against the CD4 binding site. In addition, many of the residues that line the cavity interact with elements of the chemokine receptor binding region (see below). It may be that the Phe 43 cavity and the other interdomain cavities form as a consequence of a CD4-induced conformational change (see below).
Despite this unusual cavity-laden interface between gp120 and CD4, we believe that this structure reflects the true character of the interaction. Core gp120 binds CD4 with essentially the same affinity (26) and residues identified as critical by mutational analysis on both components are indeed at the focus of contact in the structure. In any case, the missing loops and termini could not conceivably have a role in filling these cavities.
Antibody interface
The 17b antibody is a broadly neutralizing human monoclonal isolated from the blood of an HIV-infected individual. It binds to a CD4-induced (CD4i) gp120 epitope that overlaps the chemokine receptor-binding site (20).
Relative to other antibody-antigen pairs (Fig. 31a-c), the interface between Fab 17b and core gp120 in the ternary complex involves a small area of interaction. The solvent accessible area excluded upon binding is only 455 Å² from gp120 and 445 Å² from 17b, which is largely from the heavy chain (371 Å²). The long (15 residue) complementarity-determining region 3 (CDR3) of the heavy chain dominates, but the heavy-chain CDR2 and the light-chain CDR3 also contribute. Overall, the 17b contact surface is very acidic (3 Asp, 3 Glu, no Arg or Lys) although hydrophobic contacts (notably a cis proline and tryptophan from the light chain) predominate at the center.
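The buried-surface figures quoted for the two gp120 interfaces can be compared with a short calculation. This is a sketch using only the areas stated in the text; it assumes that the 17b total is the sum of heavy-chain and light-chain contributions, so the light-chain value is derived here rather than reported.

```python
# Buried solvent-accessible surface areas in square angstroms, as quoted in the text.
cd4_interface = {"CD4": 742.0, "gp120": 802.0}
fab_interface = {"17b": 445.0, "gp120": 455.0}
heavy_chain_area = 371.0

total_cd4 = sum(cd4_interface.values())
total_17b = sum(fab_interface.values())

# Assumption: the 17b total is heavy chain plus light chain, nothing else.
light_chain_area = fab_interface["17b"] - heavy_chain_area
heavy_chain_share = heavy_chain_area / fab_interface["17b"]

print(f"CD4-gp120 interface buries {total_cd4:.0f} A^2 in total")
print(f"17b-gp120 interface buries {total_17b:.0f} A^2 in total")
print(f"Heavy chain share of the 17b surface: {heavy_chain_share:.0%}")
print(f"Implied light-chain contribution: {light_chain_area:.0f} A^2")
```

On these numbers the antibody interface buries a little under 60% of the area buried at the CD4 interface, with the heavy chain supplying more than four fifths of the 17b contribution.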
On gp120, the 17b epitope lies across the base of the four-stranded bridging sheet (Fig. 31c & e). All four strands make substantial contact with 17b, suggesting that the integrity of the bridging sheet is necessary for 17b binding. The gp120 surface that contacts 17b consists of a hydrophobic center surrounded by a highly basic periphery (3 Lys, 1 Arg, and no Asp or Glu) (Fig. 31d). Although this basic gp120 surface complements the acidic 17b surface, only one salt bridge is observed (between Arg 419 of gp120 and Glu 106 of the 17b heavy chain). The rest of the specific contacts occur between hydrophobic and polar residues. Thus, the interaction between 17b and gp120 involves a hydrophobic central region flanked on the periphery by charged regions, predominantly acidic on 17b and basic on gp120. There are no direct CD4-17b contacts and none of the gp120 residues contacts both 17b and CD4. Rather, CD4 binds on the opposite face of the bridging sheet, providing specific contacts that appear to stabilize its conformation (Fig. 30i and 30j) and may explain in part the CD4-induction of 17b binding.
The 17b epitope is well conserved among HIV-1 isolates. Of the 18 residues that show loss in solvent accessible surface upon contact with 17b, 12 residues (67%) are conserved among all HIV-1 viruses. By contrast, only 19 of the 37 gp120 residues (51%) that show loss of solvent accessible surface upon CD4 binding are similarly conserved. CD4i epitopes tend to be masked from immune surveillance by the adjacent V2 and V3 loops (see accompanying paper). Indeed, in the complex structure, a large gap is seen between gp120 and tips of the light-chain CDR1 and CDR2 loops. Pointing directly at this gap is the base of the V3 loop. In intact gp120, the variable loops may need to be bypassed for access to the conserved structures in the bridging sheet. The 17b epitope may be further protected from the immune system by a CD4-induced conformational change (see below).
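The conservation percentages quoted above follow directly from the residue counts. The snippet below is a simple arithmetic check using only the counts given in the text; the dictionary keys are illustrative labels.

```python
# (conserved residues, residues losing solvent-accessible surface), as quoted in the text.
epitope_counts = {
    "17b epitope": (12, 18),
    "CD4 binding site": (19, 37),
}

# Percentage of contact residues conserved among all HIV-1 viruses.
conservation = {
    site: 100.0 * conserved / total
    for site, (conserved, total) in epitope_counts.items()
}

for site, percent in conservation.items():
    print(f"{site}: {percent:.0f}% conserved")
```

The rounded values reproduce the 67% and 51% figures stated for the 17b epitope and the CD4 binding site, respectively.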
Chemokine receptor site
The site of interaction with the chemokine receptor CCR5 overlaps with the 17b epitope (30). Both are induced upon CD4 binding and both involve highly conserved residues. By mutational analysis, the basic and polar gp120 residues (Lys 121, Arg 419, Lys 421, Gln 422) that contact the 17b heavy chain also are important for CCR5 interaction (30). The hydrophobic and acidic surface of the 17b heavy chain may mimic the tyrosine-rich, acidic N-terminal region of CCR5, which is important for gp120 binding and HIV-1 entry (31, 32). Geometrically, this site is directed at the cellular membrane when gp120 is engaged by CD4. Electrostatic interactions between the basic surface of the bridging sheet and the acidic chemokine receptor (and possibly the acidic headgroups in the target membrane) could drive conformational changes related to virus entry.
Oligomer and gp41 interactions
Although monomeric in isolation, gp120 likely exists as a trimeric complex with gp41 on the virion surface. The large electroneutral surface on the inner domain (Fig. 30c) is the probable site of trimer packing based on its lack of glycosylation, its conservation in sequence, the location of CD4 and CCR5 binding sites, and the immune response to this region. These points are elaborated in the accompanying paper and a model is presented (24).
A large body of mutagenic and antibody-binding analyses suggests that the N- and C-termini of full-length gp120 are the most important regions for interaction with the gp41 glycoprotein (33, 34). From these analyses, we expect that gp41 interactive regions will extend away from core gp120 toward the viral membrane, and that the conserved, electroneutral surface is occluded in the oligomer/gp41 interface. A similar arrangement is seen in influenza hemagglutinin, where the extended N- and C-termini of HA1 interact with the HA2 transmembrane protein (35).
Conformational change in core gp120
There is abundant evidence to suggest that CD4 binding induces a conformational change in gp120. Much of this evidence, however, derives from intact gp120 with variable loops in place or from the oligomeric gp120:gp41 complex. The ternary complex structure provides clues to conformational changes within core gp120 itself. (Although 17b binding could contribute to the gp120 conformation observed in the crystal, the CD4 contacts are much more extensive and multifaceted than those of 17b. These observations argue that CD4 binding plays the major role in the formation of the observed gp120 structure.)
Were the conformation of gp120 seen here preserved in the absence of CD4, the Phe 43 cavity (now a pocket) would present a perplexing structural dilemma. As discussed above, the cavity-lining residues have few structural restrictions, with ample room for larger substitutions into the cavity, yet these residues are highly conserved and inexplicably hydrophobic if exposed in a pocket. This pocket structure is in turn intimately connected to the bridging sheet, itself peculiar in absence of CD4. Thus, for example, the backbone amide of bridging-sheet residue 425 is hydrogen-bonded to Glu 370, a critical CD4 contact residue (Fig. 30j); Ile 424 makes extensive hydrophobic contacts with Phe 382, which lines the pocket from the outer domain; and Trp 427 packs perpendicular to Trp 112, which lines the pocket from the inner domain (Fig. 30b). Nε of Trp 427 is delicately poised for hydrogen-bonding with the π-electrons of the indole ring of Trp 112. Structures such as these would necessarily be very sensitive to orientational shifts between the inner and outer domains.
The characteristics of 17b binding to core gp120 provide additional evidence for a CD4-induced conformational change. We do not observe detectable binding of Fab 17b to core gp120 unless CD4 is present, but then the ternary complex is stable in gel filtration. Since there are no direct CD4-17b contacts in the structure, the effect of CD4 must be to stabilize the bridging-sheet minidomain to which 17b binds. This result is compatible with the binding properties of 17b and other CD4i antibodies to full-length gp120 (18) (see accompanying paper), but it shows that the conformational change is not limited to an unmasking of the antibody epitope by CD4-induced movement of the V2 loop, as initially thought (36). The ability of the 17b antibody to bind full-length gp120 in the absence of CD4, albeit at a lower level, implies that structural elements required for 17b binding can be accessed in the absence of CD4. If we assume that 17b binds in the same way to both full-length and core gp120, as shown by the concordance between the structural contacts (Fig. 31) and epitope mapping data, this suggests that alternative conformations are in a kinetically accessible equilibrium in native gp120.
A further indication that core gp120 may differ in the absence of CD4 comes from comparison with theory. When applied to the many known sequence variants of gp120, the evolutionary algorithm of PHD (37) gives secondary-structure predictions with 90% estimated reliability for roughly 45% of the core gp120 sequence. Compared to our structure, it is accurate except at three places where it is markedly wrong (four consecutive residues with reliability index greater than 90%). All of these are at the Phe 43 cavity or in contacts with CD4: loop λB, strand β15, and the segment of β20 into the turn to β21 (Fig. 30h). Most significantly, the latter segment (residues 422-429) entering the bridging sheet is predicted to be helical. Indeed, residues 427-428 at the β20-β21 turn do have helical character. We also note that CD4 binds efficiently to a gp120 derivative with both β2 and β3 truncated (38). Since the bridging sheet is most likely not stable in the absence of half its strands, CD4 binding must possess the ability to properly orient strands β20 and β21 from a very different prior conformation.
The Phe 43 cavity is at the nexus of the CD4 interface, between the inner domain, the outer domain, and the bridging sheet. As such, Phe 43 itself seems to serve as a keystone without which the structure might collapse. If so, to what state and, in reverse, how does CD4 binding lead to the state seen in this ternary complex? Certainly, it is clear that CD4-gp120 binding kinetics are complex (39), and microcalorimetric analysis reveals unusually large ΔH and compensating TΔS values for soluble CD4 binding to gp120 (M. L. Doyle, personal communication). These exceptional CD4-binding thermodynamics imply a large conformational change and are similar for both full-length and core gp120, which further supports the relevance of the structural observations on core gp120. We imagine that CD4 sees gp120 as an uneven equilibrium of conformational states, makes initial contact through electrostatic interactions (Fig. 30c), stabilizes a nascent complex state, and inserts the Phe 43 to induce formation of the Phe 43 cavity.
Viral evasion of immune surveillance
Analysis of the antigenic structure of gp120 shows that most of the envelope protein surface is hidden from humoral immune responses by glycosylation and oligomeric occlusion (accompanying paper). Most broadly neutralizing antibodies generally access only two surfaces, one which overlaps the CD4 binding site (shielded by the V1/V2 loop) and the other which overlaps the chemokine receptor binding site (shielded by the V3 loop). Conformational changes in core gp120 provide additional mechanisms for evasion from immune surveillance. In the case of the CD4-binding surface, which contains a high proportion of main-chain atoms in the complex (Fig. 30f), the conformation without CD4 bound may expose underlying sidechain variability (Fig. 30g). Escape may also be provided by the recessed nature of the binding pocket (steric occlusion) (Fig. 30a) and by a topographical surface mismatch, which encloses a variational island or "anti-hot spot" (described above, Fig. 30d). Similar mechanisms may be found in the chemokine receptor region: conformational change may hide the conserved epitope (unformed prior to CD4 binding); steric occlusion may take place between the CD4-anchored viral spike and the proximal target membrane; and an "anti-hot spot" equivalent may camouflage chemokine-receptor binding residues on the V3 loop in surrounding variability. Some of the defenses used to elude antibody-based responses may also help HIV avoid cellular immunity. Understanding the specific gp120 mechanisms of immune evasion may be prerequisite to the design of effective prophylaxis.
Mechanistic implications for virus entry
During virus entry, the HIV surface proteins function to fuse the viral membrane with the target cell membrane. The gp120 glycoprotein plays roles crucial to the control and initiation of fusion. One set of roles concerns positioning: locating a cell capable of productive viral infection, anchoring the virus to the cell surface, and orienting the viral spike next to the target membrane. Another set concerns timing: holding the gp41 in a metastable conformation and triggering the coordinate release of the three N-terminal fusion peptides of the trimeric gp41. While it is clear that this is a complex multi-conformational process, the simplicity of the system, composed only of two membranes, the viral oligomer, and two host receptors, raises the possibility that we may be able to understand the entire mechanism. Crystallography has now provided two snapshots: an intermediate state in which gp120 is bound to CD4, described herein; and a probably final, "fusion-active" state of the gp41 ectodomain (40, 41). Although precise structural information is lacking for other intermediates, the vast biochemical data concerning the membrane fusion process mediated by the HIV-1 envelope glycoproteins allow us to extend our understanding from these two states.
The entry process is initiated by the binding of HIV-1 to the cellular receptor CD4 (Fig. 32, step 1). Although the extracellular portion of CD4 has some segmental flexibility, this binding roughly orients the viral spike. This orientation can be simulated by an alignment of the D1D2 CD4 in the ternary complex with the previously solved structure of the four-domain, entire extracellular portion of CD4 (10). Such alignment orients the N- and C-termini of core gp120 towards the viral membrane, while the 17b epitope/chemokine receptor-binding site on the gp120 surface faces the target cell membrane. Such an orientation is consistent with the proposed oligomeric structure and gp41-interactive surfaces described above.
CD4 binding also induces conformational changes in gp120, which result in the creation of a metastable oligomer. Although some of the more flexible gp120 regions and gp41 are missing, the structure of the core gp120-CD4 complex presented here describes this state in atomic detail. CD4 binding results in movement of the V2 loop, which numerous experiments suggest partially occludes the V3 loop and CD4i epitopes (18, 36). It also creates, or at least stabilizes, the bridging sheet on which these epitopes are located (described above for the core). In addition, CD4 binding results in changes in the conformation of the V3 region, with the tip of the loop becoming more accessible, as judged by enhanced proteolytic susceptibility and altered exposure of V3 epitopes (19). The V3 loop together with the uncovered epitopes comprise the chemokine-receptor binding site. Thus, CD4 binding not only orients the gp120 surface implicated in chemokine receptor binding to face the target cell, but it also forms and exposes the site itself. We note that these changes may all result from a single, concerted shift in the relative orientation of the inner and outer domains. This conformational shift may alter the orientation of the N- and C-termini, at the proximal end of the inner domain, perhaps partially destabilizing the oligomeric gp120/gp41 interface (21). Such a shift would also alter the relative placement of the V1/V2 stem (in the CD4i site), which emanates from the inner domain, and the V3 loop, which emanates from the outer domain. Interestingly, mutations that permit an adaptation of HIV-1 to CD4-independent entry using CXCR4 involve sequence changes in both the V1/V2 stem and the V3 loop (42).
The next step in HIV-1 entry is the interaction of the gp120-CD4 complex with the chemokine receptor (Fig. 32, step 2). Although interactions between CD4 and chemokine receptor may occur, mutagenic analyses (H. Choe and J. Sodroski, unpublished observations) and the known examples of CD4-independent virus entry or chemokine-receptor binding suggest that direct gp120 contacts dominate in the interaction with the chemokine receptor. Since most of the chemokine receptor is encased in the host membrane, binding would necessarily move the gp120 bridging sheet close to the target membrane. This movement requires CD4 flexibility since the initial HIV binding at the N-terminal D1 domains probably occurs above the glycocalyx. Reducing flexibility at the D2-D3 juncture or at the D4-membrane juncture of CD4 has been shown to block HIV-1 entry (10, 43).
Chemokine-receptor binding is believed to trigger additional conformational changes in the HIV-1 envelope glycoprotein trimer which lead to exposure of the gp41 ectodomain. Presumably, a signal is transmitted from the cell-associated distal end of gp120 to elements of the inner domain that are likely to be involved in gp120-gp41 or gp120-gp120 association on the trimer. Although further inter-domain shifts may occur in core gp120 after chemokine-receptor binding, the geometrically specific contacts that support the bridging sheet make it unlikely that another shift could occur without destabilizing this important component of the chemokine-receptor binding site. Since the high affinity of interaction makes it likely that both CD4 and chemokine receptor remain bound to gp120 during fusion, we expect that additional conformational changes probably occur between neighboring gp120 protomers in the oligomeric complex. Perhaps the chemokine receptor triggers gp41 exposure by prising gp120 protomers away from the trimer axis, thus exerting a torque on the gp120-gp41 interface. In this regard it is interesting that several of the substitutions that affect chemokine-receptor binding in the context of monomeric gp120 appear to induce gp120 dissociation in an oligomeric context (30).
The structure of the gp120/CD4/17b antibody ternary complex described here reveals some of the molecular aspects of HIV-1 entry, including the atomic structure of gp120, the explicit interactions with CD4, and the conserved site of binding for the chemokine receptor. Still unknown are details of the apo state of core gp120, the oligomeric structure, the interaction with the chemokine receptor, the conformational changes that trigger the reorganization of the gp41 ectodomain and the structural basis for insertion of the fusion peptide of gp41 into the target membrane. Further understanding will require snapshots of other intermediates. The conformational complexity and observed intricate domain associations of gp120, like those of reverse transcriptase (44), the other large HIV translation product, may reflect genome restrictions at the protein level akin to those that lead to overlapping reading frames at the transcription level. Multiply protected infection machinery is contained in these condensed intricacies. Its mechanisms frustrate host defenses; understanding them may inspire medical intervention.
Methods
Protein production, crystallization, and data collection. The two-domain CD4 (D1D2, residues 1-182) was produced in Chinese hamster ovary cells (8), the monoclonal antibody 17b in an Epstein-Barr virus immortalized B-cell clone isolated from an HIV-1 infected individual and fused with a murine B-cell fusion partner (18), and the core gp120 from Drosophila Schneider 2 lines under control of an inducible metallothionein promoter (20). The various biochemical manipulations (e.g. deglycosylation for the gp120 and papain digestion to produce the Fab 17b), protein purification, and ternary complex crystallization are described elsewhere (25). The best crystals were small needles of cross-section only 30-40 μm. These were crosslinked with vapor diffusion glutaraldehyde treatment (C. J. Lusty, personal communication), equilibrated with cryoprotectant containing stabilizer (10% ethylene glycol with 10.5% monomethyl-PEG 5,000, 10% isopropanol, 50 mM NaCl, 100 mM Citrate/HEPES buffer pH 6.3), transferred into immiscible oil (Paratone-N; Exxon), suspended in a small ethylene loop at the end of a mounting pin, and flash-frozen in a cryostat nitrogen stream at 100 K.
Diffraction data were collected at beamline X4A, Brookhaven National Laboratory, using phosphor image plates and a Fuji BAS2000 scanner. To avoid overlap problems from the relatively high mosaicity (~1.0°), oscillation data were collected using a rotation axis that was offset at least 30° from the 197 Å c axis. Although crystals initially diffracted to Bragg spacing of greater than 2 Å, β axis mosaicity and substantial radiation damage despite cryogenic cooling reduced the overall resolution to ~2.5 Å. Data processing and reduction were performed using DENZO and SCALEPACK (45) (Table 1).
Structure determination and refinement. To locate the position of the Fab 17b in the ternary complex crystals, rotational searches with 52 different Fab models were made with the program MERLOT (P. M. Fitzgerald). The Fabs were aligned by superposition of their variable domains to allow comparison of rotational solutions. Even though four models showed greater than 10% discrimination between the highest and second highest solutions, no consistent rotational solution was found. Discrimination between correct and incorrect solutions was achieved by using confirmatory searches with the variable portion of the Fab. This was successful with only one model, molecule B of 1hil. Rigid body refinement of the 1hil solution (XPLOR (46)), allowing each immunoglobulin domain to move independently, produced a Patterson correlation of 24.9%. To locate the position of the two-domain CD4, each of the top 100 possible rotational solutions with each of three different CD4 models (1cdi, 1cdh, 3cd4) was searched for a distinctive translation solution (AMoRe; J. Navaza). The translation searches used the rigid body refined Fab as a partial structure to help discriminate the correct solution. Two distinctive solutions were found: the 25th rotational solution of 3cd4 gave a translation correlation of 0.171 (versus 0.128 for the second highest translation solution), and the 61st rotational solution of 1cdh gave 0.149 (versus 0.140). These two solutions were virtually identical. Rigid body refinement in XPLOR (46) gave a Patterson correlation of 7.9% for the CD4 alone and 32.4% for the Fab and CD4. All molecular replacement and rigid body refinements used 8-4 Å data.
To provide additional phasing, crystals were soaked in over 20 different heavy atom solutions and screened for isomorphous replacement using the statistical χ² test in SCALEPACK (45). Derivatives were identified from two heavy atom compounds: 10 mM K3IrCl6 (10 hr equilibration in heavy-atom-containing cryoprotectant stabilizer; 2.8 Å) and 5 mM K2OsCl6 (24 hr soak; 3.5 Å).
Isomorphism was found to be highest between these heavy atom data sets and a native data set collected at pH 7.0 (cryoprotectant stabilizer buffered with 50 mM BisTris pH 7.0). Heavy atom sites were identified by difference Fourier analysis using the molecular replacement phases, and phasing parameters were refined with MLPHARE (in the CCP4 suite of crystallographic programs). The K3IrCl6 derivative was modeled as 9 partially occupied sites: two sites of occupancy 0.158 and 0.142, and 7 of less than 0.07. While this derivative was relatively isomorphous, poor data quality (Rsym of greater than 20% past 3.0 Å) combined with relatively small isomorphous differences (Riso of 12.0%) reduced the quality of phasing. In contrast, the K2OsCl6 derivative had an Riso of 15.6%, but was only isomorphous to roughly 5 Å. It was modeled as 4 sites of occupancy 0.321, 0.207, 0.194 and 0.128, with the highest site at the same position as the second highest site from K3IrCl6.
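The Riso values quoted above are not defined in the text. As an illustration only, the following sketch assumes the conventional definition, Riso = Σ|F_PH − F_P| / ΣF_P, taken over reflections measured in both the native (F_P) and derivative (F_PH) data sets. The function name and the list-based inputs are hypothetical and do not represent the SCALEPACK or MLPHARE implementation.

```python
def r_iso(native_amplitudes, derivative_amplitudes):
    """Fractional isomorphous difference between two matched amplitude lists.

    Assumes (not stated in the source) the conventional definition
    Riso = sum(|F_PH - F_P|) / sum(F_P), with both lists already scaled
    together and ordered so that equal indices refer to the same reflection.
    """
    if len(native_amplitudes) != len(derivative_amplitudes):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(
        abs(f_ph - f_p)
        for f_p, f_ph in zip(native_amplitudes, derivative_amplitudes)
    )
    denominator = sum(native_amplitudes)
    if denominator == 0:
        raise ValueError("native amplitudes sum to zero")
    return numerator / denominator


# Hypothetical amplitudes for three common reflections.
example_native = [100.0, 200.0, 300.0]
example_derivative = [110.0, 180.0, 330.0]
example_r_iso = r_iso(example_native, example_derivative)
```

With real data the two sets would first need to be placed on a common scale; that step is omitted here.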
The initial combination of model and isomorphous replacement phasing did not produce readily interpretable density for gp120. In order to monitor efforts at phase improvement, we devised an objective assay of density quality that used correlations in a region internal to domain 1 of CD4 between the experimental electron density and the calculated model density (CD4 as positioned by molecular replacement and rigid body refinement). Refinement of heavy atom positions improved this correlation, and provided a starting point for phase improvement, primarily using real-space modification techniques (Table 1). These techniques included automatic concatenation of the unmodeled density (with the program PRISM; D. Agard), reciprocal-space averaging of the PRISM modeled density and real-space model subtraction (implemented using the XPLOR (46) shell language), application of real-space constraints such as solvent flattening, histogram matching and negative density truncation (with the program DM in the CCP4 suite of crystallographic programs), and real-space combinatorial addition of the various experimental density maps (with the program MAPMAN; T.A. Jones). The combinatorial use of these techniques generated greatly improved electron density maps.
At this point, most of the carbon alpha backbone could be modeled (with the program O (47)), defining the secondary structure. Computer-aided sequence alignment (slider routine in O) and secondary structure prediction (PHD (37)) helped to position the amino acid sequence, leaving only regions around the N-terminus (residues 79-100 and residues 215-245), the V1/V2 loop, and the V4 loop uncertain. Iterative rounds of building with O, simulated annealing and positional refinement with XPLOR (46), and addition of ordered solvent clarified the trace.
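The density-quality assay used to monitor phase improvement (correlation between experimental and calculated model density within a region of CD4 domain 1) can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient over grid points selected by a mask; the text does not specify the exact statistic, and the function name and the flattened-list representation of the maps are hypothetical.

```python
import math


def density_correlation(experimental, model, mask):
    """Pearson correlation between two density maps inside a masked region.

    experimental, model: flat sequences of density values on the same grid.
    mask: sequence of booleans, True for grid points inside the region of
    interest (here standing in for the interior of CD4 domain 1).
    The choice of Pearson correlation is an assumption for illustration.
    """
    pairs = [(e, m) for e, m, inside in zip(experimental, model, mask) if inside]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two grid points inside the mask")
    mean_e = sum(e for e, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    covariance = sum((e - mean_e) * (m - mean_m) for e, m in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_m = sum((m - mean_m) ** 2 for _, m in pairs)
    if var_e == 0 or var_m == 0:
        raise ValueError("density is constant inside the mask")
    return covariance / math.sqrt(var_e * var_m)


# Hypothetical five-point maps; the last point lies outside the mask.
example_experimental = [1.0, 2.0, 3.0, 4.0, 100.0]
example_model = [2.0, 4.0, 6.0, 8.0, -5.0]
example_mask = [True, True, True, True, False]
example_cc = density_correlation(example_experimental, example_model, example_mask)
```

Restricting the statistic to a region whose model was left out of phase combination keeps the measure from being biased by the model itself, which is the design choice described in the Table 1 footnote.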
Structure analysis. Deviations of the CD4 structure in the complex from the free state were measured by the procedure of Wu et al. (10). Deviations were taken as significant when the root mean square (rms) residue deviation was greater than the overall value and also more than 0.5 Å greater than the variation among the free structures. Interatomic contacts were defined as in Zhu et al. (48). Structural alignments were made by visual comparison of the SCOP database, and automatic searches were performed with PrISM (A.-S. Yang and B. Honig).
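The two-part significance criterion for CD4 deviations stated above can be written as a short filter. The sketch below is an illustration under stated assumptions: per-residue rms deviations and per-residue variation among the free structures are supplied as parallel lists in Å, and the function and parameter names are hypothetical rather than taken from the procedure of Wu et al. (10).

```python
def significant_deviations(residue_rms, overall_rms, free_variation, margin=0.5):
    """Return indices of residues whose deviation is taken as significant.

    A residue qualifies when its rms deviation (complex versus free CD4)
    is greater than the overall rms value AND more than `margin` (0.5 A in
    the text) greater than the variation among the free structures at that
    residue. All inputs are assumed to be in angstroms.
    """
    if len(residue_rms) != len(free_variation):
        raise ValueError("per-residue lists must have the same length")
    return [
        index
        for index, (deviation, variation) in enumerate(zip(residue_rms, free_variation))
        if deviation > overall_rms and deviation > variation + margin
    ]


# Hypothetical values for three residues.
example_hits = significant_deviations(
    residue_rms=[0.4, 1.5, 2.0],
    overall_rms=1.0,
    free_variation=[0.2, 1.2, 0.8],
)
```

In this example only the third residue passes: the second exceeds the overall value but is within 0.5 Å of the variation already seen among the free structures.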
References for the Third Series of Experiments
1. Barre-Sinoussi, F., et al. Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immunodeficiency syndrome (AIDS). Science 220, 868-871 (1983).
2. Gallo, R.C., et al. Frequent detection and isolation of cytopathic retroviruses (HTLV-III) from patients with AIDS and at risk for AIDS. Science 224, 500-503 (1984).
3. Kowalski, M.L., et al. Functional regions of the envelope glycoprotein of human immunodeficiency virus type 1. Science 237, 1351-1355 (1987).
4. Lu, M., Blacklow, S. & Kim, P. A trimeric structural domain of the HIV-1 transmembrane glycoprotein. Nature Structural Biol. 2, 1075-1082 (1995).
5. Starcich, B.R., et al. Identification and characterization of conserved and variable regions of the envelope gene of HTLV-III/LAV, the retrovirus of AIDS. Cell 45, 637-648 (1986).
6. Leonard, C.K., et al. Assignment of intrachain disulfide bonds and characterization of potential glycosylation sites of the type 1 recombinant human immunodeficiency virus envelope glycoprotein (gp120) expressed in Chinese hamster ovary cells. J. Biol. Chem. 265, 10373-10382 (1990).
7. Profy, A.T., et al. Epitopes recognized by the neutralizing antibodies of an HIV-1-infected individual. J. Immunol. 144, 4641-4647 (1990).
8. Ryu, S.-E., et al. Crystal structure of an HIV-binding recombinant fragment of human CD4. Nature 348, 419-426 (1990).
9. Wang, J.H., et al. Atomic structure of a fragment of human CD4 containing two immunoglobulin-like domains. Nature 348, 411-418 (1990).
10. Wu, H., Kwong, P.D. & Hendrickson, W.A. Dimeric association and segmental variability in the structure of human CD4. Nature 387, 527-530 (1997).
11. Moebius, U., Clayton, L., Abraham, S., Harrison, S. & Reinherz, E. The human immunodeficiency virus gp120 binding site of CD4: Delineation by quantitative equilibrium and kinetic binding studies of mutants in conjunction with a high-resolution CD4 atomic structure. J. Exp. Med. 176, 507-517 (1992).
12. Sweet, R.W., Truneh, A. & Hendrickson, W.A. CD4: its structure, role in immune function and AIDS pathogenesis, and potential as a pharmacological target. Curr. Opin. Biotech. 2, 622-633 (1991).
13. Olshevsky, U., et al. Identification of individual HIV-1 gp120 amino acids important for CD4 receptor binding. J. Virol. 64, 5701-5707 (1990).
14. Cordonnier, A., Montagnier, L. & Emerman, M. Single amino acid changes in HIV envelope affect viral tropism and receptor binding. Nature 340, 571-574 (1989).
15. Moore, J.P. Coreceptors: implications for HIV pathogenesis and therapy. Science 276, 51-52 (1997).
16. Feng, Y., Broder, C.C., Kennedy, P.E. & Berger, E.A. HIV-1 entry cofactor: functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science 272, 872-877 (1996).
17. Speck, R.F., et al. Selective employment of chemokine receptors as human immunodeficiency virus type 1 coreceptors determined by individual amino acids in the envelope V3 loop. J. Virol. 71, 7136-7139 (1997).
18. Thali, M., et al. Characterization of conserved human immunodeficiency virus type 1 (HIV-1) gp120 neutralization epitopes exposed upon gp120-CD4 binding. J. Virol. 67, 3978-3988 (1993).
19. Sattentau, Q.J., Moore, J.P., Vignaux, F., Traincard, F. & Poignard, P. Conformational changes induced in the envelope glycoproteins of human and simian immunodeficiency virus by soluble receptor binding. J. Virol. 64, 7383-7393 (1993).
20. Wu, L., et al. CD4-induced interaction of primary HIV-1 gp120 glycoproteins with the chemokine receptor CCR-5. Nature 384, 179-183 (1996).
21. Moore, J.P., McKeating, J.A., Weiss, R.A. & Sattentau, Q.J. Dissociation of gp120 from HIV-1 virions induced by soluble CD4. Science 250, 1139-1142 (1990).
22. Bullough, P.A., Hughson, F.M., Skehel, J.J. & Wiley, D.C. Structure of influenza haemagglutinin at the pH of membrane fusion. Nature 371, 37-43 (1994).
23. Fass, D., et al. Structure of a murine leukemia virus receptor-binding glycoprotein at 2.0 angstrom resolution. Science 277, 1662-1666 (1997).
24. Wyatt, R., et al. The antigenic structure of the human immunodeficiency virus gp120 envelope glycoprotein. Nature, submitted (1998).
25. Kwong, P.D., et al. Quantitative probability analysis and variational crystallization of gp120, the exterior envelope glycoprotein of the human immunodeficiency virus type 1 (HIV-1). J. Biol. Chem., submitted (1998).
26. Binley, J.M., et al. Analysis of the interaction of antibodies with a conserved, enzymatically deglycosylated core of the HIV-1 gp120 envelope glycoprotein. AIDS Res. Hum. Retroviruses 14, 191-198 (1997).
27. Leesong, M., Henderson, B.S., Gillig, J.R., Schwab, J.M. & Smith, J.L. Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two catalytic activities in one active site. Structure 4, 253-264 (1996).
28. Cedergren-Zeppezauer, E.S., Larsson, G., Nyman, P.O., Dauter, Z. & Wilson, K.S. Crystal structure of a dUTPase. Nature 355, 740-743 (1992).
29. Ryu, S.-E., Truneh, A., Sweet, R.W. & Hendrickson, W.A. Structures of an HIV and MHC binding fragment from human CD4 as refined in two crystal lattices. Structure 2, 59-74 (1994).
30. Rizzuto, C., et al. Identification of a conserved human immunodeficiency virus gp120 glycoprotein structure important for chemokine receptor binding. Science, submitted (1998).
31. Dragic, T., et al. Amino-terminal substitutions in the CCR5 coreceptor impair gp120 binding and human immunodeficiency virus type 1 entry. J. Virol. 72, 279-285 (1998).
32. Farzan, M., et al. A tyrosine-rich region in the N-terminus of CCR5 is important for human immunodeficiency virus type 1 entry and mediates an association between gp120 and CCR5. J. Virol. 72, 1160-1164 (1998).
33. Wyatt, R., et al. Analysis of the interaction of the human immunodeficiency virus type 1 gp120 envelope glycoprotein with the gp41 transmembrane glycoprotein. J. Virol. 71, 9722-9731 (1997).
34. Helseth, E., Olshevsky, U., Furman, C. & Sodroski, J. Human immunodeficiency virus type 1 gp120 envelope glycoprotein regions important for association with the gp41 transmembrane glycoprotein. J. Virol. 65, 2119-2123 (1991).
35. Wilson, I.A., Skehel, J.J. & Wiley, D.C. Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution. Nature 289, 366-373 (1981).
36. Wyatt, R., et al. Involvement of the V1/V2 variable loop structure in the exposure of human immunodeficiency virus type 1 gp120 epitopes induced by receptor binding. J. Virol. 69, 5723-5733 (1995).
37. Rost, B., Sander, C. & Schneider, R. PHD -- an automated mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10, 53-60 (1994).
38. Wyatt, R., et al. Functional and immunologic characterization of human immunodeficiency virus type 1 envelope glycoproteins containing deletions of the major variable regions. J. Virol. 67, 4557-4565 (1993).
39. Wu, H., et al. Kinetic and structural analysis of mutant CD4 receptors that are defective in HIV gp120 binding. Proc. Natl. Acad. Sci. USA 93, 15030-15035 (1996).
40. Chan, D.C., Fass, D., Berger, J.M. & Kim, P.S. Core structure of gp41 from the HIV envelope glycoprotein. Cell 89, 263-273 (1997).
41. Weissenhorn, W., Dessen, A., Harrison, S.C., Skehel, J.J. & Wiley, D.C. Atomic structure of the ectodomain from HIV-1 gp41. Nature 387, 426-430 (1997).
42. Dumonceaux, J., et al. Spontaneous mutations in the env gene of the human immunodeficiency virus type 1 NDK isolate are associated with a CD4-independent entry phenotype. J. Virol. 72, 519-519 (1998).
43. Moir, S., Perreault, J. & Poulin, L. Postbinding events mediated by human immunodeficiency virus type 1 are sensitive to modifications in the D4-transmembrane linked region of CD4. J. Virol. 70, 8019-8028 (1996).
44. Kohlstaedt, L.A., Wang, J., Friedman, J.M., Rice, P.A. & Steitz, T.A. Crystal structure at 3.5 Å resolution of HIV-1 reverse transcriptase complexed with an inhibitor. Science 256, 1783-1790 (1992).
45. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326 (1997).
46. Brunger, A.T. (Yale University, New Haven, 1993).
47. Jones, T.A., Zou, J.Y., Cowan, S.W. & Kjeldgaard, M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110-119 (1991).
48. Zhu, X., et al. Structural analysis of substrate binding by the molecular chaperone DnaK. Science 272, 1606-1614 (1996).
49. Carson, M. Ribbons 2.0. J. Appl. Crystallogr. 24, 958-961 (1991).
50. Nicholls, A., Sharp, K.A. & Honig, B. Protein folding and association: insight from the interfacial and thermodynamic properties of hydrocarbons. Proteins Struct. Funct. Genet. 11, 281-296 (1991).
Table 1. Structure solution

Data collection:
                             Native          K3IrCl6         K2OsCl6
  Resolution limits (Å)      20-2.5          20-2.8          20-3.5
  Total observations         113,966         76,739          25,821
  Unique observations        37,724          28,599          11,982
  Rsym (%)*†                 9.3 (24.7)      11.5 (20.2)     14.3 (18.2)
  Data coverage (%)†         86.0 (62.8)     90.8 (82.9)     72.5 (62.5)

Molecular replacement:
                             Fab             CD4             Fab+CD4
  Model                      1hil            3cd4/1cdh       1hil+1cdh
  Scattering (%)‡            43              18              61
  Rigid-body correlation#    0.249           0.079           0.325

Generation of experimental electron density:
  Phasing procedure                              Correlation coefficientΦ
  Molecular replacement (MR)                     -0.02
  Multiple isomorphous replacement (MIR)         0.34
  Phase combination:
    MIR + MR                                     0.60
    + density modification                       0.66
    + density modification + subtraction         0.69
  Density modelling (concatenation):
    MIR + MR                                     0.65
    + density modification                       0.68
    + density modification + subtraction         0.71
  Combination map addition                       0.73

Refinement statistics:
  R-factors (10-2.5 Å):
    Data cutoff (σ)
    Rcrystal (Rfree) (%)     24.9 (32        22.2 (30.7)     21.2 (29.2)
    Data completeness (%)    85.8            77.3            66.4
  Geometric parameters (rms):
    Bond length (Å)          0.007
    Bond angle (°)           1.59
  B-factors:
                             average         rms bond        rms angle
    mainchain                20.80           1.33            2.31
    sidechain                21.93           1.97            3.01
    waters                   22.31

* Rsym = Σ|Iobs − Iavg| / ΣIavg.
† Numbers in parentheses represent the statistics for the shell comprising the outer 10% (theoretical) of the data.
‡ The percentage of scattering of the initial search model.
# Correlation obtained upon rigid-body refinement of the model against 8-4 Å data.
Φ Correlation in the D1 region of CD4 between the experimental electron density and the calculated model density (from CD4 as positioned by molecular replacement) using 10-2.8 Å data. Correlations in this region (consisting of ~6000 Å³) were used to generate a quantitative measure of the overall quality of the ternary complex experimental electron density. For the purposes of these calculations, the model used for phase combination omitted D1. A correlation of 0.6 is roughly the level of an interpretable protein electron density map, while a well refined structure would give a correlation of about 0.9.
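The Rsym definition given in the Table 1 footnote (sum of absolute deviations of each observation from the mean intensity of its reflection, divided by the sum of the mean intensities) can be illustrated with a short sketch. The dictionary layout, the function name, and the decision to skip reflections measured only once are assumptions made for illustration; this is not the SCALEPACK implementation.

```python
def r_sym(observations):
    """Rsym = sum|Iobs - Iavg| / sum(Iavg), summed over all observations.

    observations: dict mapping a unique reflection identifier (for example
    an (h, k, l) tuple) to the list of intensities measured for it and its
    symmetry mates. Reflections with a single measurement are skipped,
    which is an assumption of this sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        average = sum(intensities) / len(intensities)
        numerator += sum(abs(i_obs - average) for i_obs in intensities)
        denominator += average * len(intensities)
    if denominator == 0:
        raise ValueError("no multiply measured reflections with nonzero intensity")
    return numerator / denominator


# Hypothetical intensities for three unique reflections.
example_observations = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 50.0],
    (0, 0, 1): [75.0],  # measured once, ignored
}
example_r_sym = r_sym(example_observations)
```

Multiplying the result by 100 gives the percentage form used in the table.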
Fourth Series of Experiments
Human immunodeficiency virus (HIV-1) establishes persistent infections in humans leading to the acquired immune deficiency syndrome (AIDS). The HIV-1 envelope glycoproteins, gp120 and gp41, are assembled into a trimeric complex that mediates virus entry into target cells (1). HIV-1 entry depends upon the sequential interaction of the gp120 exterior envelope glycoprotein with the receptors on the cell, CD4 and members of the chemokine receptor family (2-4). The gp120 glycoprotein, which can be shed from the envelope complex, elicits both virus-neutralizing and non-neutralizing antibodies during natural infection. Antibodies that lack neutralizing activity are often directed against the gp120 regions occluded on the assembled trimer and exposed only upon shedding (5,6). Neutralizing antibodies, by contrast, must access the functional envelope glycoprotein complex (7) and typically recognize conserved or variable epitopes near the receptor-binding regions (8-11). Here, we describe the spatial organization of conserved neutralization epitopes on gp120, utilizing epitope maps in conjunction with the X-ray crystal structure of a ternary complex that includes a gp120 core, CD4 and a neutralizing antibody (12). A large fraction of the predicted accessible surface of gp120 in the trimer is composed of variable, heavily glycosylated core and loop structures that surround the receptor-binding regions. Understanding the structural basis for the ability of HIV-1 to evade the humoral immune response should assist vaccine design.
In primary sequence, human and simian immunodeficiency virus gp120 glycoproteins consist of five variable regions (V1-V5) interposed among more conserved regions (13). Variable regions V1-V4 form exposed loops anchored at their bases by disulfide bonds (14).
Neutralizing antibodies recognize both variable and conserved gp120 structures. The V2 and V3 loops contain epitopes for strain-restricted neutralizing antibodies (15-17). More broadly neutralizing antibodies recognize discontinuous, conserved epitopes in three regions of the gp120 glycoprotein (Table 1). In HIV-1 infected humans, the most abundant of these are directed against the CD4 binding site (CD4BS) and block gp120-CD4 interaction (8,9). Less common are antibodies against epitopes induced or exposed upon CD4 binding (CD4i) (18). Both CD4i and V3 antibodies disrupt the binding of gp120-CD4 complexes to chemokine receptors (10,11). A third gp120 neutralization epitope is defined by a unique monoclonal antibody, 2G12 (19), which does not efficiently block receptor binding (11).
In an accompanying article (12), we report the X-ray crystal structure of an HIV-1 gp120 core in a ternary complex with two-domain soluble CD4 and the Fab fragment of the CD4i antibody, 17b. The gp120 core lacks the V1/V2 and V3 variable loops, as well as N- and C-terminal sequences, which interact with the gp41 glycoprotein (6), and is enzymatically deglycosylated (12,21). Despite these modifications, the gp120 core binds CD4 and antibodies against CD4BS and CD4i epitopes (21,22) and thus retains structural integrity. The gp120 core is composed of an inner domain, an outer domain and a third element, the "bridging sheet" (12) (Figure 34a). All three structural elements contribute, either directly or indirectly, to CD4 and chemokine receptor binding (12). Here, the organization of the surface of gp120 is analyzed in light of the known antibody responses directed against this exposed viral glycoprotein.
Variability and glycosylation of the gp120 surface
Although generally well-conserved compared with the five variable regions, some variability in the surface of the gp120 core is evident when the sequences of all primate immunodeficiency viruses are analyzed. This variability is disproportionately associated with the surface of the outer domain proximal to the V4 and V5 regions and removed from the receptor-binding regions (Figure 34a, b, c). The LA, LC, and L surface loops (12) contribute to the variability of this surface. The potential N-linked glycosylation sites present in the gp120 core are concentrated in this variable half of the protein (Figure 34, b and c). In fact, the only conserved residues apparent on this relatively variable surface are asparagine 356 and threonine/serine 358, which constitute a complex carbohydrate addition site within the LE loop (Figure 34, b and c). Since most carbohydrate moieties may appear as "self" to the immune system, the extensive glycosylation of the outer domain surface may render it less visible to immune surveillance. This helps to explain why antibodies directed against this gp120 surface have been identified so infrequently.
The receptor-binding regions retained in the gp120 core are well-conserved among primate immunodeficiency viruses (12). Also highly conserved is the surface of the inner domain spanned by the α1 helix and located opposite the variable surface described above (Figure 34d). This surface is likely to interact with gp41 and/or with N-terminal gp120 segments absent from the gp120 core. This inner domain surface and the receptor-binding regions are devoid of glycosylation.
Conserved gp120 neutralization epitopes
In conjunction with prior mutagenic and antibody competition analyses (5,6,18-21), the gp120 core structure reveals for the first time the spatial positioning of the conserved gp120 neutralization epitopes. Although the major variable loops are either absent (V1/V2 and V3) or poorly resolved (V4) in the gp120 core structure, their approximate positions can be deduced (Figure 35a). The conserved gp120 neutralization epitopes are discussed in relation to these variable loops and to the variable, glycosylated core surface.
a) CD4i epitopes. The gp120 epitope recognized by the CD4i antibody, 17b, can be directly visualized in the crystallized ternary complex (12) (Figure 35b, c). Strands from the gp120 fourth conserved (C4) region and the V1/V2 stem contribute to an antiparallel β-sheet (the "bridging sheet" (see Figure 34a)) that contacts the antibody. The vast majority of gp120 residues previously implicated in formation of the CD4i epitopes (18) (Table 1) are located either within this β-sheet or in nearby structures. With the exception of Thr 202 and Met 434, the gp120 residues in contact with the 17b Fab are highly conserved among HIV-1 isolates (Figure 34c, 2a). The prominent ("male") CDR3 loop of the 17b heavy chain dominates the contacts with gp120, with additional contacts through the heavy chain CDR2 (12).
Unusually, there are minimal 17b light chain contacts, leaving a large gap between the gp120 core and most of the 17b light chain surface. In the complete gp120 glycoprotein, this gap is likely occupied by the V3 loop. This is consistent with the position and orientation of the V3 stem on the gp120 core structure (12), the effect of V3 deletions on the binding of CD4i antibodies in the absence of soluble CD4 (22), the competition of some V3-directed antibodies with CD4i antibodies (5), and the ability of both antibody groups to block chemokine receptor binding (10,11). The chemokine receptor-binding region of gp120 likely consists of elements near or within the "bridging sheet" and the V3 loop (Figure 34a), a model that is supported by recent mutagenic analysis (C. Rizzuto and J. Sodroski, submitted).
The V2 loop likely resides on the side of the 17b epitope opposite the V3 loop (Figure 35a). The V1/V2 loops, which vary from 57 to 86 residues in length (13), are dispensable for HIV-1 replication (22,27), but decrease the sensitivity of viruses to neutralization by antibodies against V3 and CD4i epitopes (27). The latter effect is mediated primarily by the V2 loop (22), suggesting that part of the V2 loop folds back along the V1/V2 stem to mask the "bridging sheet" and adjacent V3 loop. The proximity of the V2 and V3 loops is supported by the observation that, in monkeys infected with simian-human immunodeficiency viruses (SHIVs), neutralizing antibodies are raised against discontinuous epitopes with V2 and V3 components (B. Etemad-Moghadam and J. Sodroski, submitted). The CD4i epitopes are probably masked by the flanking V2 and V3 loops, requiring the evolution of antibodies with protruding ("male") CDRs to access these conserved epitopes. CD4 binding has been suggested to reposition the V1/V2 loops, thus exposing the CD4i epitopes (22). The presence of contacts between the V1/V2 stem and CD4 in the crystal structure (12) is consistent with this model.
b) CD4BS epitopes. CD4 makes a number of contacts within a recessed pocket on the gp120 surface. The gp120-CD4 interface includes two cavities, one water-filled and bounded equally by both proteins, the other extending into the gp120 interior and contacting CD4 only at phenylalanine 43 (Figure 34a) (12). Table 1 and Figure 35b, c show the gp120 residues implicated in the formation of CD4BS epitopes recognized by eight representative antibodies. CD4BS epitopes are uniformly disrupted by changes in Asp 368 and Glu 370 (20), which surround the opening of the "Phe 43 cavity." These residues are located on a ridge at the intersection of the two receptor-binding gp120 surfaces, consistent with competition studies suggesting that CD4BS epitopes overlap both the CD4i epitopes and the binding site for CD4 (5,18). The location of the gp120 residues implicated in the formation of the CD4BS epitopes suggests that important elements of the CD4-binding surface of gp120 are accessible to antibodies.
Some CD4BS antibodies, like IgG1b12, are particularly potent at neutralizing HIV-1 (23). IgG1b12 binding is disrupted by gp120 changes that affect the binding of other CD4BS antibodies but, atypically, is sensitive to changes in the V1/V2 stem-loop structure (24). The observation that some well-conserved residues in the gp120 V1/V2 stem contact CD4 (12) raises the possibility that this protruding structure also contributes to the IgG1b12 epitope. This might increase the ability of the antibody to access the assembled envelope glycoprotein trimer, thus increasing neutralizing capability.
While the CD4BS epitopes and the CD4-binding site overlap, several observations demonstrate that the binding of CD4BS antibodies differs from that of CD4. Changes in Trp 427, a gp120 residue that contacts both the "Phe 43 cavity" and CD4, uniformly disrupt CD4 binding but affect the binding of only some CD4BS antibodies (Table 1). Conversely, some changes in other cavity-lining gp120 residues, Ser 256 and Thr 257, affect the binding of CD4BS antibodies more than the binding of CD4 (20). Since the recessed position of Ser 256 and Thr 257 in the current crystal structure (Figure 35b, c) makes direct contacts with antibody unlikely, either the effects of changes in these residues are indirect or the CD4BS antibodies recognize a gp120 conformation that differs from the CD4-bound state. With respect to the latter possibility, it is interesting that several of the residues implicated in the integrity of the CD4BS epitopes are located in the interface between the inner and outer gp120 domains. CD4BS antibodies might recognize a gp120 conformation in which the spatial relationship between the domains is altered compared with the CD4-bound state, thus allowing better surface exposure of these residues. Differences between the CD4BS epitopes and the CD4-binding site create opportunities for neutralization escape (20). The gp120 residues surrounding the "Phe 43 cavity" are highly conserved among primate immunodeficiency viruses (Figure 35a), but the observed modest variation in adjacent surface-accessible residues (e.g., Pro 369, Thr 373 and Lys 432) could account for decreased recognition of the gp120 glycoprotein from some geographic clades of HIV-1 by CD4BS antibodies (24). Additional potential for variation near or within the CD4BS epitopes is created by the unusual water-filled cavity in the gp120-CD4 binding interface, since CD4 binding can apparently tolerate change in the gp120 residues contacting this cavity (12).
The recessed nature of the CD4 binding pocket on gp120 (Figure 34c) may delay the generation of high-affinity antibodies against the CD4BS epitopes and may afford opportunities to minimize the antiviral efficacy of such antibodies once they are elicited. The degree of recession is probably much greater on the full-length, glycosylated gp120 than is evident on the crystallized gp120 core. The recessed pocket is flanked on one side by the V1/V2 stem-loop structure. The characterization of HIV-1 escape mutants from the IgG1b12 CD4BS antibody and the mapping of several V2 conformational epitopes support a model in which the V2 loop folds back along the V1/V2 stem, with V2 residues 183-188 proximal to Asp 368 and Glu 370. This model is consistent with observations that V1/V2 changes, in combination with V3 changes, can alter the exposure of the adjacent CD4BS epitopes, particularly on the assembled trimer (28).
The high temperature factors associated with the V1/V2 stem (12) imply flexibility in this protruding element (Figure 34c, d), expanding the potential range of space occupied by the V1/V2 stem-loop structure. This could enhance masking of the adjacent CD4BS and CD4i gp120 epitopes and divert antibody responses towards the variable loops.
Glycosylation may modify the interaction of antibodies with CD4BS epitopes. The LD loop, on the rim of the CD4- binding pocket opposite the V1/V2 stem, contains a well- conserved glycosylation site, asparagine 276 (Figure 34c) . Changes in this site and at the adjacent alanine 281 have been associated with escape from the neutralizing activity of patient sera (25) and have been seen in SHIVs extensively passaged in monkeys (26) . Another conserved glycosylation site at asparagine 386 lies adjacent to both CD4BS and CD4i epitopes (Figure 34c) and could diminish antibody responses against those sites. Additionally, in various HIV-1 strains, carbohydrates are added to the V2 loop segment (residues 186-188) thought to be proximal to the CD4BS epitopes. The 2G12 epitope. The integrity of the 2G12 epitope is disrupted by changes in gpl20 glycosylation, either by glycosidase treatment or mutagenic alteration of specific N-linked carbohydrate addition sites (19) . These sites are located on the relatively variable surface of the gpl20 outer domain, opposite to and approximately 25 A away from the CD4 binding site (Figure 35b, c) . The gpl20 glycoprotein synthesized in mammalian cells exhibits a dense concentration of high-mannose sugars in this region (Figure 35a) . Even in the enzymatically deglycosylated gpl20 core, carbohydrate residues constitute much of this surface. 2G12 likely binds at least in part to these carbohydrates, explaining the surprising conservation of the 2G12 epitope despite the variability of the underlying protein surface, which includes the stem of the V3 loop and the V4 variable region. The inclusion of carbohydrate in the epitope might also explain the apparent rarity with which these antibodies are generated. The localization of the 2G12 epitope is consistent with previous studies indicating that 2G12 forms a unique competition group (5,19) and does not interfere with the binding of monomeric gpl20 to either CD4 or chemokine receptors (11) . 
Since the 2G12 epitope is predicted to be oriented towards the target cell upon CD4 binding (see below), the antibody may sterically impair interactions of the oligomeric envelope glycoprotein complex with host cell moieties.
Orientation of gp120 in the trimer
Possible orientations of the exterior glycoproteins in the trimer are significantly constrained by the requirement that observed and deduced binding sites for receptors and neutralizing antibodies, sites of N-linked glycosylation, and variable structures be exposed on the surface of the assembled complex. The two-domain CD4 in the ternary complex structure was aligned to the structure of four-domain CD4 (29) to orient the trimer model with respect to the target cell membrane. The consequences of such a model (Figure 36) are: a) the chemokine receptor-binding sites are clustered at the vertex of the trimer predicted to be closest to the target cell; b) both variable and conserved neutralization epitopes are concentrated on the half of gp120 facing the target cell; c) possibilities for intersubunit interactions among the variable structures that could help mask conserved neutralization epitopes are created; d) the subset of gp120 glycosylation sites to which complex carbohydrates are added in mammalian cells (14) is well-exposed on the outer periphery of the trimer; e) the highly conserved surface near the α1 helix is available for gp41 and/or gp120 protein interactions within the trimers; and f) the surface of the assembled envelope glycoprotein complex is roughly hemispherical, thus minimizing the surface area of the viral spike that is potentially exposed to antibodies.
In summary, the X-ray crystal structure of the gp120 core/two-domain CD4/17b Fab complex provides a framework for visualizing key interactions between HIV-1 and the humoral immune system. Previous antibody competition analyses suggested that the gp120 surface buried in the assembled trimer elicits non-neutralizing antibodies (5,6). By contrast, the binding sites for neutralizing antibodies cluster on a different gp120 surface (5). Our structural studies support the existence of non-neutralizing and neutralizing faces of gp120, and reveal another, immunologically "silent" face of the glycoprotein (Figure 35d). This outer domain surface, along with the major variable loops, contributes to the large fraction of the gp120 surface that is protected against antibody responses by a dense array of carbohydrates and by the capacity for variation. The conserved receptor-binding regions of gp120 represent attractive targets for immune intervention. However, the elicitation of antibodies against these conformation-dependent structures is inefficient. Since the gp120 epitopes near the receptor-binding regions span the inner and outer domains, interdomain conformational shifts may decrease their representation in the immunogen pool. The recessed nature of the CD4-binding site likely contributes to its poor immunogenicity. The sequential recognition of two receptors by primate immunodeficiency viruses allows the conserved elements of the chemokine receptor-binding site to be created or exposed only after CD4 binding has occurred. At that point, it is likely that the proximity of the chemokine receptor-binding site to the cell membrane sterically limits antibody binding. The evolution of primate immunodeficiency viruses that successfully persist despite the host immune response presents challenges to vaccine development. An understanding of the structures of the relevant gp120 epitopes should assist efforts to overcome these hurdles.
Materials and Methods
Graphics. Molecular graphics were produced using MidasPlus (University of California, San Francisco) and GRASP (30).
Assignment of variability. Variability in gp120 residues was assessed using an alignment of sequences derived from approximately 400 HIV-1, HIV-2 and simian immunodeficiency viruses (13). Residues were assigned variability indices and color coded as follows:
Red: conserved in all primate immunodeficiency viruses;
Orange: conserved in all HIV-1, including groups M and O and chimpanzee isolates;
Yellow: some variation among HIV-1 isolates (divergence from the consensus sequence in 1-8 of the 12 HIV-1 groups examined);
Green: variable among HIV-1 isolates (divergence from the consensus sequence in > 9 of the 12 HIV-1 groups examined).
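The four-color scheme above can be sketched as a small classification function. This is only an illustrative sketch, not the procedure actually used to prepare the figures: the function name, its arguments, and the input representation are hypothetical. The text leaves a divergence count of exactly 9 unassigned ("1-8" versus "> 9"), so the sketch assumes that 9 is grouped with the variable (green) class.

```python
def variability_color(conserved_in_all_primate_viruses: bool,
                      conserved_in_all_hiv1: bool,
                      divergent_hiv1_groups: int = 0) -> str:
    """Return the color class for one gp120 residue.

    divergent_hiv1_groups is the number of the 12 HIV-1 groups in which the
    residue diverges from the consensus sequence (hypothetical input format).
    """
    if conserved_in_all_primate_viruses:
        return "red"
    if conserved_in_all_hiv1:
        return "orange"
    if not 1 <= divergent_hiv1_groups <= 12:
        raise ValueError("a non-conserved residue must diverge in 1-12 groups")
    # Assumption: a count of exactly 9 is treated as variable (green),
    # because the text assigns 1-8 to yellow and "> 9" to green.
    return "yellow" if divergent_hiv1_groups <= 8 else "green"


if __name__ == "__main__":
    print(variability_color(False, False, 3))   # yellow
    print(variability_color(False, False, 11))  # green
```

Checking the most conserved class first mirrors the order in which the classes are listed, so a residue conserved in all primate immunodeficiency viruses is never reported as merely HIV-1-conserved.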
Molecular modeling. Residues 88, 89, and 397-409, which are disordered in the ternary complex crystals (12), were built manually using the program TOM. For the V4 loop (residues 397-409), a dominant constraint was the distance between the ordered residues 396 and 410 (Cα-Cα distance of 26.88 Å). For the carbohydrate, examination of the N-linked carbohydrate in several crystal structures (e.g. 1fc2, 1gly, 1lte) showed that the core common to both high-mannose and complex N-linked sugars, (NAG)2(MAN)3, did not differ greatly in conformation after alignment of the first NAG. This core, which represents roughly half the total glycosylation for a typical N-linked site, was built onto each of the 18 consensus N-linked glycosylation sites found on the HXBc2 gp120 core. The stereochemistry of this initial model was refined using simulated annealing in XPLOR. Briefly, the model was heated to between 2,500 K and 3,500 K, and "slow cooled" in steps of 25 K to 300 K. At each step, molecular dynamics were performed with the core gp120 fixed, allowing only the modeled residues and carbohydrate (including any attached Asn) to move. In three separate runs, performing molecular dynamics for 5 fs/step, all steric clashes could be removed and the geometry idealized, with an average root mean square (RMS) carbohydrate movement of only ~3.5 Å. Four subsequent runs were made using dynamic times of between 50 and 75 fs/step. The carbohydrate positions obtained from these runs differed more substantially from those in the starting model (average carbohydrate RMS difference of roughly 8 Å). Two of the models from these longer annealings were much more similar to each other than to the rest (RMS differences in carbohydrate of ~4 Å versus ~8 Å for all other models). One had been heated to 3,500 K with dynamics of 75 fs/step. The other (shown in the figures displayed here) was heated to only 2,500 K with dynamics of 50 fs/step. In general the RMS movement of the NAG sugars was roughly half the RMS movement of the MAN sugars, reflecting greater conformational flexibility further from the protein surface.
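The slow-cooling schedule and the RMS comparisons described above can be illustrated with a short sketch. It is not the XPLOR protocol itself. The function names are hypothetical, and two assumptions are made: that dynamics are run once at every temperature point, including the starting and final temperatures, and that "fs/step" means femtoseconds of dynamics per 25-degree cooling step. Because the gp120 core was held fixed, models share one coordinate frame, so the RMS difference is computed here without superposition.

```python
import math


def cooling_schedule(start_k, end_k=300, step_k=25):
    """Temperatures visited when cooling from start_k to end_k in step_k decrements."""
    temps = []
    t = start_k
    while t >= end_k:
        temps.append(t)
        t -= step_k
    return temps


def total_dynamics_fs(start_k, fs_per_step):
    """Total simulated time, assuming one block of dynamics per temperature point."""
    return len(cooling_schedule(start_k)) * fs_per_step


def rms_difference(coords_a, coords_b):
    """RMS distance between paired atoms of two models in the same frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [sum((p - q) ** 2 for p, q in zip(a, b))
               for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    for start, fs in ((2500, 50), (3500, 75)):
        n = len(cooling_schedule(start))
        print(f"{start} K start: {n} temperature points, "
              f"{total_dynamics_fs(start, fs)} fs of dynamics")
```

Under these assumptions the 2,500 K run visits 89 temperature points and the 3,500 K run visits 129, so even the longest protocol amounts to roughly 10 ps of dynamics.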
References for the Fourth Series of Experiments
1. Allen, J. et al. Identification of the major envelope glycoprotein product of HTLV-II. Science 228, 1091-1094 (1983).
2. Dalgleish, A.G. et al. The CD4 (T4) antigen is an essential component of the receptor for the AIDS retrovirus. Nature 312, 763-767 (1984).
3. Klatzmann, D. et al. T-lymphocyte T4 molecule behaves as the receptor for human retrovirus LAV. Nature 312, 767-768 (1984).
4. Feng, Y., Broder, C.C., Kennedy, P.E. and Berger, E. HIV-1 entry cofactor: Functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science 272, 872-877 (1996).
5. Moore, J.P. and Sodroski, J. Antibody cross-competition analysis of the human immunodeficiency virus type 1 gp120 exterior envelope glycoprotein. J. Virol. 70, 1863-1872 (1996).
6. Wyatt, R. et al. Analysis of the interaction of the human immunodeficiency virus type 1 (HIV-1) gp120 envelope glycoprotein with the gp41 transmembrane glycoprotein. J. Virol. 71, 9722-9731 (1997).
7. Sattentau, Q.J. and Moore, J.P. Human immunodeficiency virus type 1 neutralization is determined by epitope exposure on the gp120 oligomer. J. Exp. Med. 182, 185-196 (1995).
8. Posner, M. et al. An IgG human monoclonal antibody which reacts with HIV-1 gp120, inhibits virus binding to cells, and neutralizes infection. J. Immunol. 146, 4325-4332 (1991).
9. Ho, D. et al. Conformational epitope on gp120 important in CD4 binding and human immunodeficiency virus type 1 neutralization identified by a human monoclonal antibody. J. Virol. 65, 489-493 (1991).
10. Wu, L. et al. CD4-induced interaction of primary HIV-1 gp120 glycoproteins with the chemokine receptor CCR5. Nature 384, 179-183 (1996).
11. Trkola, A. et al. CD4-dependent, antibody-sensitive interactions between HIV-1 and its coreceptor CCR-5. Nature 384, 184-187 (1996).
12. Kwong, P. et al. Nature, submitted.
13. Myers, G. et al. Human retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences. Los Alamos National Laboratory, Los Alamos, N.M. 1996.
14. Leonard, C. et al. Assignment of interchain disulfide bonds and characterization of potential glycosylation sites of the type 1 human immunodeficiency virus envelope glycoprotein (gp120) expressed in Chinese hamster ovary cells. J. Biol. Chem. 265, 10373-10382 (1990).
15. Fung, M.S.C. et al. Identification and characterization of a neutralization site within the second variable region of human immunodeficiency virus type 1 gp120. J. Virol. 66, 848-856 (1992).
16. Putney, S. et al. HTLV-III/LAV-neutralizing antibodies to an E. coli-produced fragment of the virus envelope. Science 234, 1392-1395 (1986).
17. Rusche, J.R. et al. Antibodies that inhibit fusion of human immunodeficiency virus-infected cells bind a 24-amino-acid sequence of the viral envelope gp120. Proc. Natl. Acad. Sci. USA 85, 3198-3202 (1988).
18. Thali, M. et al. Characterization of conserved human immunodeficiency virus type 1 gp120 neutralization epitopes exposed upon gp120-CD4 binding. J. Virol. 67, 3978-3988 (1993).
19. Trkola, A. et al. Human monoclonal antibody 2G12 defines a distinctive neutralization epitope on the gp120 glycoprotein of human immunodeficiency virus type 1. J. Virol. 70, 1100-1108 (1996).
20. Thali, M. et al. Discontinuous, conserved neutralization epitopes overlapping the CD4 binding region of the HIV-1 gp120 envelope glycoprotein. J. Virol. 66, 5635-5641 (1992).
21. Binley, J. et al. Analysis of the interaction of antibodies with a conserved, enzymatically deglycosylated core of the HIV-1 gp120 envelope glycoprotein. AIDS Res. Hum. Retroviruses 14, 191-198 (1997).
22. Wyatt, R. et al. Involvement of the V1/V2 variable loop structure in the exposure of human immunodeficiency virus type 1 gp120 epitopes induced by receptor binding. J. Virol. 69, 5723-5733 (1995).
23. Roben, P. et al. Recognition properties of a panel of human recombinant Fab fragments to the CD4 binding site of gp120 that show differing abilities to neutralize human immunodeficiency virus type 1. J. Virol. 68, 4821-4828 (1994).
24. Moore, J.P. et al. Exploration of antigenic variation in gp120 from clades A through F of human immunodeficiency virus type 1 by using monoclonal antibodies. J. Virol. 68, 8350-8364 (1994).
25. Watkins, B.A. et al. Immune escape by human immunodeficiency virus type 1 from neutralizing antibodies: evidence for multiple pathways. J. Virol. 67, 7493 (1993).
26. Karlsson, G. et al. Characterization of molecularly cloned simian-human immunodeficiency viruses causing rapid CD4+ lymphocyte depletion in rhesus monkeys. J. Virol. 71, 4218 (1997).
27. Cao, J. et al. Replication and neutralization of human immunodeficiency virus type 1 lacking the V1/V2 variable loops of the gp120 envelope glycoprotein. J. Virol. 71, 9808-9812 (1997).
28. Wyatt, R. et al. Functional and immunologic characterization of human immunodeficiency virus type 1 envelope glycoproteins containing deletions of the major variable regions. J. Virol. 67, 4557-4565 (1993).
29. Wu, H., Kwong, P.D. and Hendrickson, W.A. Dimeric association and segmental variability in the structure of human CD4. Nature 387, 527-530 (1997).
30. Nicholls, A., Sharp, K.A. and Honig, B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11, 281-296 (1991).
Table 1. Conserved Epitopes for Neutralizing Antibodies Identified on the gp120 Core
Competition group: CD4-induced epitopes (CD4i)
Antibodies: 17b, 48d
gp120 amino acids: Asn 88, Lys 117, Lys 121, Lys 207, Ser 256, Thr 257, Asn 262, ΔV3, Glu 370, Glu 381, Phe 382, Arg 419, Ile 420, Lys 421, Gln 422, Ile 423, Trp 427, Tyr 435, Pro 438, Met 475
Interference with chemokine receptor binding. CD4 binding increases exposure of the epitopes as a result of movement of the V2 variable loop.
Reference: 18 and C. Rizzuto and J. Sodroski, submitted
Competition group: 2G12
Antibody: 2G12
gp120 amino acids: Asn 295, Thr 297, Ser 334, Asn 386, Asn 392, Asn 397
Unknown. Antibody binding is dependent upon proper N-linked glycosylation.
Reference: 19
The gp120 competition groups are defined as in Reference 5. The gp120 amino acids are numbered according to the sequence of the HXBc2 (IIIB) gp120 glycoprotein, where residue 1 is the methionine at the amino-terminus of the signal peptide. Changes in the amino acids listed resulted in significant reduction in antibody binding to the gp120 glycoprotein (Ref. 18-20). The numbers in parentheses indicate the percentage of the CD4BS antibodies examined whose binding is decreased by changes in the indicated residue.
Fifth Series of Experiments
The entry of primate immunodeficiency viruses into target cells depends upon a sequential interaction of the gp120 envelope glycoprotein with the cellular receptors, CD4 and members of the chemokine receptor family. The gp120 third variable (V3) loop has been implicated in chemokine receptor binding, but the use of the CCR5 chemokine receptor by diverse primate immunodeficiency viruses suggests the involvement of an additional, conserved gp120 element. Here we identify a highly conserved gp120 structure that is critical for CCR5 binding, is located adjacent to the V3 loop, and contains neutralization epitopes induced by CD4 binding. This conserved element may be a useful target for pharmacologic or prophylactic intervention in immunodeficiency virus infections.
The clinically abundant primate immunodeficiency viruses bind the β-chemokine receptor CCR5 as an obligate step in virus entry into target cells (1,2). The gp120 glycoproteins of primary, macrophage-tropic HIV-1 strains have been shown to bind specifically to cells expressing CCR5 (3,4). The affinity of gp120 binding was increased 2-3 logs (100- to 1,000-fold) by the presence of soluble CD4 (sCD4) (3). Efficient CCR5 binding was dependent upon the presence of the V3 variable loop of gp120, but the gp120 V1/V2 variable loops and N- and C-termini were dispensable for high-affinity binding to CCR5 (3). No significant CCR5 binding was observed for gp120 glycoproteins derived from laboratory-adapted HIV-1 isolates, which do not use CCR5 as a coreceptor (3,4).
Specific groups of HIV-1 neutralizing antibodies directed against the gp120 V3 loop or CD4-induced (CD4i) epitopes were able to block the binding of gp120-sCD4 complexes to CCR5-expressing cells (3,4). The CD4i epitopes are conserved, discontinuous gp120 structures that are exposed better after CD4 binding (5). Mutagenic analysis suggested that elements of the conserved stem of the V1/V2 stem-loop and of the fourth conserved region of gp120 comprise the CD4i epitopes (5). Here we test the hypothesis that conserved gp120 residues near or within the CD4i epitopes are critical for CCR5 binding.
An assay was established that could assess the CCR5-binding ability of a panel of HIV-1 gp120 glycoprotein mutants. The mutants were created by the introduction of single amino acid changes in gp120 residues near or within regions previously shown to be important for the integrity of the CD4i epitopes (5). During the course of this work, structural information on the gp120 epitope recognized by a CD4i-directed antibody, 17b, became available (6) (see below) and was used to guide the mutagenesis. The wtΔ glycoprotein, which lacks the V1/V2 variable loops and the N-terminus and is derived from the YU2 primary macrophage-tropic HIV-1 isolate (7), was the starting point for the studies (Fig. 37). This protein was chosen because it had been shown to bind CD4 and CCR5 with high affinity (3,8,9). Furthermore, the use of this protein minimized the opportunities for indirect effects of gp120 amino acid changes on CCR5 binding (e.g., by repositioning the V1/V2 loops, which can mask CD4i epitopes (9)). Metabolically labeled wtΔ and mutant derivatives were produced in 293T cells and incubated with mouse L1.2 cells stably expressing human CCR5 (3), in either the absence or presence of sCD4. The cells were washed and lysed, and bound gp120 protein was detected by precipitation with a mixture of sera from HIV-1-infected individuals (10).
The wtΔ protein efficiently bound to the L1.2-CCR5 cells in the presence of sCD4 (Fig. 38, A and B). Binding was dramatically reduced when sCD4 was not present in the assay. The wtΔ protein binding to the L1.2-CCR5 cells was inhibited by preincubation of the wtΔ protein with the 17b antibody. Binding was also inhibited by incubation of the L1.2-CCR5 cells with the 2D7 antibody against CCR5 (11) or with the CCR5 ligand, MIP-1β (12). The C11 antibody, which is directed against a gp120 region dispensable for CCR5 binding (3), did not block the binding of the wtΔ protein to the L1.2-CCR5 cells (data not shown). The wtΔ protein did not bind appreciably to the parental L1.2 cells not expressing CCR5, even in the presence of sCD4. These results suggest that the wtΔ protein binds CCR5 in a specific, CD4-dependent manner.
The binding of the panel of gp120 mutants to the L1.2-CCR5 cells in the absence and presence of sCD4 was measured. The recognition of the mutant proteins by sCD4 and by monoclonal antibodies that recognize discontinuous gp120 epitopes (5,13) was assessed in parallel (10). Changes in several gp120 amino acids resulted in dramatic reductions in the ability of the protein to bind to L1.2-CCR5 cells in the presence of sCD4 (Table 1 and Fig. 38C). In some cases (257 T/D, 370 E/Q and 383 F/S), the attenuated CD4-binding ability of the mutant proteins could account for the observed reduction in binding to the L1.2-CCR5 cells. In most cases, however, the mutant proteins that were deficient in CCR5 binding still bound sCD4 and at least one of the monoclonal antibodies recognizing discontinuous gp120 epitopes. As expected, some of the introduced amino acid changes decreased recognition by the 17b antibody. Interestingly, two of the gp120 amino acid changes (437 P/A, 442 Q/L) resulted in an increase in CCR5 binding compared with the wtΔ protein, even though CD4 binding was not significantly increased. In the absence of sCD4, the 437 P/A and 442 Q/L envelope glycoprotein mutants bound to the L1.2-CCR5 cells slightly better than the other mutants and the wtΔ protein, which exhibited very low levels of binding (Fig. 38A and data not shown).
Recently, the structure of an HIV-1 gp120 core crystallized in a ternary complex with two-domain CD4 and the 17b Fab has been solved (6). The gp120 core is composed of an inner domain, an outer domain, and a "bridging sheet" (Fig. 39A). The "bridging sheet" is a four-stranded, antiparallel β-sheet that includes the V1/V2 stem and strands (β20 and β21) derived from the fourth conserved gp120 region. CD4 contacts gp120 residues in the outer domain and the "bridging sheet" (6). The gp120 residues implicated by our study in CCR5 binding are located near or within the "bridging sheet" (Figure 39, A and B). The "bridging sheet" is predicted to face the target cell after the envelope glycoproteins bind CD4 (6). Even more than the CD4-binding site, the gp120 region implicated in CCR5 binding is highly conserved among primate immunodeficiency viruses; this is particularly apparent in comparison to the remainder of the gp120 surface thought to be exposed on the assembled envelope glycoprotein complex (Fig. 39C) (6). The CD4i epitope for the 17b antibody is located near or within the "bridging sheet" (6), consistent with the ability of the antibody to block CCR5 binding (3,4). All of the individual gp120 residues in which changes disrupted recognition by the 17b antibody (Fig. 39D) are located close to the gp120-17b interface in the crystallized complex (Table 1). The binding of another antibody, CG10, which disrupts gp120-CCR5 interaction (3) and competes with the 17b antibody for gp120 binding (14), is also affected by changes in amino acid residues within or near the "bridging sheet" (Fig. 39E). The position and orientation of the V3 base in the structure (6), in conjunction with a number of mutagenic and antibody competition studies (15), suggest that the gp120 V3 loop resides proximal to the region implicated in CCR5 binding (Fig. 39A). For example, the binding of both CG10 and CD4i antibodies to gp120 can be disrupted by some V3 changes (5,14). Furthermore, several V3-directed antibodies compete with CD4i antibodies for gp120 binding (15).
Our observations suggest that the CCR5-binding site is likely composed of conserved gp120 elements near or within the "bridging sheet" and V3 loop residues. The latter might include more conserved structures (e.g. the aromatic or hydrophobic residue at position 317, altered in this study) as well as more variable structures (16) that determine the specific chemokine receptor used.
Some of the gp120 residues identified in this and previous studies (16) as determinants of chemokine receptor utilization could modulate the interaction of the V3 loop and elements near the "bridging sheet". For example, studies of HIV-1 revertants (15) suggested a functional interaction of gp120 residue 440, shown here to influence CCR5 binding, with the V3 loop.
A subset of the gp120 residues in or near the "bridging sheet" likely contacts CCR5 directly. Most of the gp120 residues implicated in CCR5 binding exhibit reasonable solvent accessibility in the free gp120 core (Table 1), consistent with this possibility. The gp120 surface implicated in CCR5 binding is highly basic (6), potentially favoring interactions with the acidic CCR5 amino terminus, which has been shown to be important for gp120 binding (17,18). Additionally, hydrophobic interactions, similar to those seen for gp120-17b binding (6), may also contribute to the gp120-CCR5 interaction.
The exposure and/or formation of the CCR5-binding site of HIV-1 gp120 glycoproteins is dependent upon interaction with CD4 (3,4). CD4 binding has been shown to reposition the V1/V2 variable loops and thus expose the CD4i epitopes (9), which overlap the CCR5-binding region (3,4). However, since a gp120 glycoprotein lacking the V1 and V2 variable loops also exhibits CD4-dependent CCR5 binding (3), the interaction with CD4 must cause other conformational changes in gp120 related to the CCR5-binding site. Our results, which highlight the proximity of the two receptor-binding sites on gp120, provide likely explanations for the induction of such conformational changes. First, one of the components of the "bridging sheet", the V1/V2 stem, also contacts CD4 (6). Thus, CD4 binding, which appears to distort the V1/V2 stem, may reposition this structure and allow the formation of the β-sheet important for CCR5 binding. In this respect, we note that a substitution of aspartic acid for threonine 123, which is located in the V1/V2 stem and contacts CD4, significantly decreases CCR5 binding. This substitution may disrupt CD4-induced conformational changes in the V1/V2 stem required for CCR5 binding. Second, the CD4-bound conformation of gp120 exhibits a cavity (the "Phe 43" cavity) within the gp120 interior (6). This cavity contacts the gp120 inner and outer domains as well as the "bridging sheet" and likely forms as a result of interdomain conformational changes in gp120 induced by CD4 binding (6). Since the "bridging sheet" lacks its own hydrophobic core and is thus dependent upon residues contributed by both inner and outer domains (6), any shift in orientation between these domains would alter the conformation of the "bridging sheet". Furthermore, CD4 binding could also alter the precise orientation of the "bridging sheet" with respect to the inner and outer domains, thus aligning the V3 loop and conserved gp120 elements important for CCR5 binding.
To summarize, CD4 binding likely induces conformational changes within the "bridging sheet" as well as between this sheet and the inner and outer domains to form the high-affinity CCR5-binding site. For some primate immunodeficiency viruses, the CD4-bound conformation of gp120 must be energetically accessible in the absence of CD4 to explain the documented examples of CD4-independent chemokine receptor binding and entry (18,19).
It is likely that the CCR5-binding region defined in this study is also important for the binding of simian and human immunodeficiency viruses to other chemokine receptors. The identified region exhibits one of the most highly conserved surfaces on the HIV-1 gp120 glycoprotein (6), supporting its functional importance for all primate immunodeficiency viruses. The laboratory-adapted HXBc2 envelope glycoprotein, which uses CXCR4 and not CCR5 as a coreceptor (1,2,20), can be converted to an efficient CCR5-using protein simply by substituting the V3 loop of the YU2 virus (2). Thus, all of the CCR5-binding region outside of the V3 loop must be conserved, at least between the HXBc2 and YU2 viruses. Indeed, we have shown that alteration of lysine 117, lysine 207 and glycine 441 in the HXBc2-YU2V3 chimeric protein also disrupts CCR5 binding (21). Consistent with the use of this region for the binding of other chemokine receptors is the observation (19) that the gp120 changes associated with the conversion of HIV-2 to a CD4-independent, CXCR4-using virus affect the "bridging sheet" and the V3 loop. Alterations in "bridging sheet" residues have also been implicated in changes in the tropism of HIV-1 for immortalized cell lines that do not express CCR5 (22). Finally, the 17b antibody neutralizes HIV-1 strains that use different chemokine receptors (5,14), supporting the involvement of a common gp120 region in chemokine receptor interaction. Chemokine receptor binding may trigger additional conformational changes in the envelope glycoprotein complex that ultimately lead to the fusion of the viral and target cell membranes. It is believed that some of these changes include exposure of the ectodomain of the gp41 transmembrane envelope glycoprotein (23). It is interesting that the CCR5-binding region defined herein likely resides close to the trimer axis of the assembled envelope glycoprotein complex (6). Indeed, some of the gp120 residue changes that affect CCR5 binding also affect the non-covalent association of gp120 and gp41 subunits in the trimeric complex (21). These observations raise the possibility that chemokine receptor binding alters the relationship between gp120 and gp41, leading to the exposure of the gp41 ectodomain and interaction with the target cell membrane.
The definition of a highly conserved gp120 structure that is important for binding to CCR5, the major coreceptor used by clinically abundant primate immunologic inhibitors of virus-receptor interactions. An understanding of the CD4-induced conformational changes in this structure may allow the targeting of such inhibitors to native or CD4-bound states of gp120.
References for the Fifth Series of Experiments
1. G. Alkhatib et al., Science 272, 1955 (1996); H.K. Deng et al., Nature 381, 661 (1996); B.J. Doranz et al., Cell 85, 1149 (1996); T. Dragic et al., Nature 381, 667 (1996).
2. H. Choe et al., Cell 85, 1135 (1996).
3. L. Wu et al., Nature 384, 179 (1996).
4. A. Trkola et al., Nature 384, 184 (1996).
5. M. Thali et al., J. Virol. 67, 3978 (1993).
6. P. Kwong et al., submitted; R. Wyatt et al., submitted; P. Kwong et al., submitted.
7. Y. Li et al., J. Virol. 65, 3973 (1991).
8. R. Wyatt et al., J. Virol. 67, 4557 (1993); R. Wyatt et al., J. Virol. 71, 9722 (1997).
9. R. Wyatt et al., J. Virol. 69, 5723 (1995).
10. 293T cells were cotransfected with 20 μg of a plasmid expressing the wtΔ or mutant envelope glycoproteins and 2 μg of a plasmid expressing the HIV-1 Tat protein, using the calcium phosphate technique. Transfected cells were washed and metabolically labeled for 16 hours with 50 μCi/ml 35S-cysteine and 50 μCi/ml 35S-methionine. Labeled cell supernatants were harvested, cleared by low-speed centrifugation (200 x g for 10 minutes at 4°C) and stored at 4°C until used in the binding assays. For measurement of the binding of sCD4 and antibodies to the wtΔ and mutant envelope glycoproteins, different dilutions of the envelope glycoprotein-containing supernatants were precipitated to ensure that binding occurred in the linear range of the assay. For CD4 binding, the envelope glycoprotein-containing supernatants were incubated for 30 minutes at room temperature with a concentration of sCD4 (Smith Kline Beecham) empirically determined to precipitate the wtΔ protein optimally. The envelope glycoprotein-sCD4 complexes were then precipitated with the CD4-specific antibody, OKT4 (Ortho), and Protein A-Sepharose (Pharmacia). For binding of the 17b and F105 antibodies, the monoclonal antibodies were preincubated with Protein A-Sepharose prior to overnight incubation with envelope glycoprotein-containing supernatants at 4°C. For binding of the CG10 antibody, envelope glycoprotein-containing supernatants were incubated with 100 nM sCD4 at room temperature for 30 minutes prior to addition of a CG10-Protein G-Sepharose mixture and overnight incubation at 4°C. Immunoprecipitates were washed and run on 12.5% SDS-polyacrylamide gels, which were fixed, dried and analyzed by autoradiography. Binding was quantified by densitometry.
To measure CCR5 binding, envelope glycoprotein-containing supernatants were mixed with 100 nM sCD4 or phosphate-buffered saline (PBS) and incubated at room temperature for 30-60 minutes. L1.2-CCR5 cells (2 x 10^7 cells, LeukoSite, Inc. (3)) were pelleted, resuspended in 500 μl of envelope glycoprotein-containing supernatants, and rocked gently at 37°C for 1 hour. Cells were pelleted, washed twice in PBS and lysed by the addition of NP40 buffer (0.5 M NaCl, 10 mM Tris, pH 7.5, 0.5% NP40). Lysates were cleared (20,000 x g at 4°C for 15 minutes) in a microcentrifuge and the envelope glycoproteins were precipitated overnight at 4°C by a mixture of sera from HIV-1-infected individuals and Protein A-Sepharose. Sepharose pellets were washed in NP40 buffer, boiled in SDS-containing sample buffer and run on 12.5% SDS-polyacrylamide gels. Autoradiographed gels were quantitated using a densitometer.
11. L. Wu et al., J. Exp. Med. 185, 1681 (1997).
12. M. Samson, O. Labbe, C. Mollereau, G. Vassart, M. Parmentier, Biochemistry 35, 3362 (1996); C. Combadiere, S. Ahuja, H. Tiffany, P. Murphy, J. Leukocyte Biol. 60, 157 (1996); C. Raport, J. Gosling, V. Schweickart, P. Gray, I. Charo, J. Biol. Chem. 271, 17161 (1996).
13. M. Posner et al., J. Immunol. 146, 4325 (1991).
14. N. Sullivan et al., J. Virol., in the press.
15. R. Wyatt et al., J. Virol. 66, 6997 (1992); J.P. Moore et al., J. Virol. 67, 4785 (1993); A. Carillo and L. Ratner, J. Virol. 70, 1301 (1996); H.G. Morrison, F. Kirchhoff, R. Desrosiers, Virology 195, 167 (1993); F. Kirchhoff, H. Morrison, R. Desrosiers, Virology 213, 179 (1995); J.P. Moore and J. Sodroski, J. Virol. 70, 1863 (1996).
16. F. Cocchi et al., Nature Med. 2, 1244 (1996); P.D. Bieniasz et al., EMBO J. 16, 2599 (1997); Speck et al., J. Virol. 71, 7136 (1997).
17. M. Farzan et al., J. Virol. 72, 1160 (1998); T. Dragic et al., J. Virol. 72, 279 (1998); G. Rabut et al., J. Virol. 72, 3464 (1998).
18. K. Martin et al., Science 278, 1470 (1997).
19. M.J. Endres et al., Cell 87, 745 (1996); J.D. Reeves and T.F. Schulz, J. Virol. 71, 1453 (1997).
20. Y. Feng, C.C. Broder, P. Kennedy, E. Berger, Science 272, 872 (1996).
21. C. Rizzuto, N. Hernandez and J. Sodroski, unpublished observations.
22. A. Cordonnier, L. Montagnier, A. Cordonnier, J. Virol. 67, 6253 (1993); K. Fujita, J. Silver, K. Peden, J. Virol. 66, 4445 (1992).
23. C.M. Carr and P.S. Kim, Cell 73, 623 (1993); P. Bullough, F. Hughson, J. Skehel, D.C. Wiley, Nature 371, 37 (1994); W. Weissenhorn et al., EMBO J. 15, 1507 (1996); C.-H. Chen, T.J. Matthews, C.B. McDanal, D.P. Bolognesi, Greenberg, J. Virol. 69, 3771 (1995); C. Wild, T. Oas, C. McDanal, D. Bolognesi, T. Matthews, Proc. Natl. Acad. Sci. USA 89, 10537 (1992); S. Jiang, K. Lin, N. Strick, A.R. Neurath, Nature 365, 113 (1993); S. Jiang, K. Lin, N. Strick, A.R. Neurath, BBRC 195, 553 (1993); D.C. Chan, D. Fass, J.M. Berger, P.S. Kim, Cell 89, 263 (1997); W. Weissenhorn, A. Dessen, S.C. Harrison, J.J. Skehel, D.C. Wiley, Nature 387, 426 (1997).
24. B. Lee and F. Richards, J. Mol. Biol. 55, 379 (1971); S. Sheriff, W.A. Hendrickson, R.E. Stenkamp, L. Sieker, L.H. Jensen, Proc. Natl. Acad. Sci. USA 82, 1104 (1985).
25. Myers et al., Human Retroviruses and AIDS. A compilation of nucleic acid and amino acid sequences. Los Alamos National Laboratory, Los Alamos, NM, 1996.
26. A. Nicholls, K.A. Sharp, B. Honig, Proteins 11, 281 (1991).
Table 1. Phenotypes of HIV-1 gp120 mutants. The ability of the wtΔ and mutant glycoproteins to bind CCR5 expressed on L1.2 cells was determined (10). The recognition of the wtΔ and mutant glycoproteins by sCD4 and monoclonal antibodies was determined (10). All values reported are relative to those seen for the wtΔ protein. Values represent the average of at least two independent experiments and exhibited less than 30% variation from the values shown.
*The numbering of the mutant wtΔ glycoproteins is based on the sequence of the prototypic HXBc2 gp120 glycoprotein (24), with 1 representing the initiator methionine. The wild-type YU2 gp120 residue is listed, followed by the substituted residue. Amino acid abbreviations: A, alanine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; Y, tyrosine. The fractional solvent accessibilities associated with gp120 residues in which changes specifically disrupted CCR5 binding are shown in parentheses. Fractional solvent accessibility was calculated as the ratio of the solvent-accessible surface area for atoms of amino-acid residue X in the gp120 core (without carbohydrate moieties) to the area obtained after reducing the structure to a Gly-X-Gly tripeptide (24); values cited are for side-chain atoms, except for glycine 441, where the value for all atoms is given.
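The fractional solvent accessibility defined in this footnote is a simple ratio, which can be sketched as follows. This is a minimal illustration, assuming the two surface areas have already been computed by a separate surface-area program; the function name and the numeric areas are hypothetical and are not taken from the table.

```python
def fractional_accessibility(area_in_core: float, area_in_tripeptide: float) -> float:
    """Ratio of a residue's solvent-accessible area in the protein core
    to its area in an isolated Gly-X-Gly reference tripeptide."""
    if area_in_tripeptide <= 0:
        raise ValueError("reference tripeptide area must be positive")
    return area_in_core / area_in_tripeptide


# Hypothetical side-chain areas in square angstroms, for illustration only.
example = fractional_accessibility(area_in_core=45.0, area_in_tripeptide=150.0)
print(round(example, 2))  # 0.3
```

A value near 0 indicates a side chain that is largely buried in the core, and a value near 1 indicates one that is about as exposed as in the free tripeptide.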
†The binding of the wtΔ glycoprotein to L1.2-CCR5 cells was shown to be linearly related to the concentration of wtΔ protein in the transfected 293T cell supernatants, over the range of concentrations used in these experiments. The total amount of wtΔ and mutant glycoprotein present in the 293T cell supernatants was estimated by precipitation with an excess of a mixture of sera from HIV-1-infected individuals. The amount of wtΔ and mutant glycoprotein bound to the L1.2-CCR5 cells was determined as described (10). The value for CCR5 binding was calculated using the following formula:

\[
\text{CCR5 binding} = \frac{\text{Bound mutant protein}}{\text{Bound wt$\Delta$ protein}} \times \frac{\text{Total wt$\Delta$ protein}}{\text{Total mutant protein}}
\]
‡The recognition of the wtΔ and mutant glycoproteins by sCD4 and antibodies was determined by precipitation of radiolabeled envelope glycoproteins in transfected 293T cell supernatants as described (10). In parallel, the labeled envelope glycoproteins were precipitated with an excess of a mixture of sera from HIV-1-infected individuals. The value for ligand binding was calculated using the following formula:

\[
\text{Ligand binding} = \frac{\text{Mutant protein}_{\text{ligand}}}{\text{wt$\Delta$ protein}_{\text{ligand}}} \times \frac{\text{wt$\Delta$ protein}_{\text{serum mixture}}}{\text{Mutant protein}_{\text{serum mixture}}}
\]
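Both the CCR5-binding and ligand-binding values are the mutant-to-wtΔ signal ratio, corrected for how much of each protein was present (the total estimated by precipitation with the serum mixture). The sketch below illustrates that normalization and the averaging over independent experiments mentioned in the table caption. The function name, argument names, and densitometry numbers are hypothetical; they are not data from Table 1.

```python
from statistics import mean


def relative_binding(signal_mutant: float, signal_wt: float,
                     total_mutant: float, total_wt: float) -> float:
    """(mutant signal / wt signal) x (total wt / total mutant).

    'signal' is the amount bound to cells or precipitated by the ligand;
    'total' is the amount precipitated by the serum mixture.
    """
    if signal_wt <= 0 or total_mutant <= 0:
        raise ValueError("wt signal and total mutant protein must be positive")
    return (signal_mutant / signal_wt) * (total_wt / total_mutant)


# Two hypothetical independent experiments (arbitrary densitometer units).
experiments = [
    {"signal_mutant": 20.0, "signal_wt": 100.0, "total_mutant": 80.0, "total_wt": 100.0},
    {"signal_mutant": 30.0, "signal_wt": 100.0, "total_mutant": 100.0, "total_wt": 100.0},
]
values = [relative_binding(**e) for e in experiments]
print(round(mean(values), 3))  # 0.275
```

By construction the wtΔ protein scores 1.0, so a value of 0.275 would mean the mutant binds at roughly a quarter of the wild-type level once expression differences are taken into account.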
In the sCD4 and 17b columns, the values in bold indicate gp120 residues that exhibit decreased solvent accessibility in the presence of the two-domain sCD4 or 17b Fab, respectively, in the ternary complex (6). Changes in solvent accessibility were calculated using the MS program of Michael Connolly.
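Identifying residues whose solvent accessibility decreases on complex formation amounts to comparing per-residue areas computed with and without the binding partner. The following is a minimal sketch of that comparison, assuming the per-residue areas have already been produced by a surface program; it does not reproduce the MS program itself, and the residue labels and areas are hypothetical placeholders.

```python
def residues_with_decreased_accessibility(free_area, complexed_area, tolerance=0.0):
    """Return residues whose accessible area drops by more than `tolerance`
    when the binding partner is present."""
    buried = []
    for residue, area_free in free_area.items():
        area_bound = complexed_area.get(residue)
        if area_bound is None:
            continue  # residue missing from the complexed calculation
        if area_free - area_bound > tolerance:
            buried.append(residue)
    return sorted(buried)


# Hypothetical per-residue areas in square angstroms.
free = {"res_1": 60.0, "res_2": 35.0, "res_3": 10.0}
bound = {"res_1": 12.0, "res_2": 35.0, "res_3": 4.0}
print(residues_with_decreased_accessibility(free, bound))  # ['res_1', 'res_3']
```

A nonzero `tolerance` can be used to ignore changes that are smaller than the numerical noise of the surface calculation.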

Claims

What is claimed is:
1. A crystal suitable for X-ray diffraction comprising a polypeptide having an amino acid sequence of a portion of a Human Immunodeficiency Virus envelope glycoprotein gp120.
2. The crystal of claim 1, which effectively diffracts X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 4 angstroms or better than 4 angstroms.
3. The crystal of claim 1, which effectively diffracts X-rays for determination of the atomic coordinates of the polypeptide to a resolution of 2.5 angstroms or better than 2.5 angstroms.
4. The crystal of claim 1, wherein the portion of gp120 comprises a CD4 binding site.
5. The crystal of claim 4, further comprising a compound bound to the CD4 site.
6. The crystal of claim 1, wherein the portion of gp120 comprises a chemokine receptor binding site.
7. The crystal of claim 6, further comprising a compound bound to the chemokine receptor binding site.
8. The crystal of claim 1, wherein the portion of gp120 comprises a CD4 binding site and a chemokine receptor binding site.
9. The crystal of claim 8, further comprising a first compound bound to the CD4 binding site of the polypeptide and a second compound bound to the chemokine receptor binding site of the polypeptide.
10. The crystal of claim 9, wherein the first compound is the second compound.
11. The crystal of claim 9, wherein the crystal is arranged in a space group P222₁ so as to form a unit cell of dimensions a = 71.6 Å, b = 88.1 Å, c = 196.7 Å, and which effectively diffracts X-rays for determination of the atomic coordinates of the gp120 to a resolution of 2.5 Å or better.
12. The crystal of claim 1, wherein the polypeptide is a variant of gp120 lacking the V1, V2, V3, and C5 regions.
13. The crystal of claim 12, wherein the gp120 variant comprises a portion of the conserved stem of the V1/V2 stem-loop structure.
14. The crystal of claim 13, wherein the gp120 variant comprises a portion of the base of the V3 loop.
15. The crystal of claim 14, wherein the gp120 variant comprises a portion of the C5 region.
16. The crystal of claim 1, wherein the polypeptide is a variant of gp120 with 5% by weight of the carbohydrate residues linked to the gp120 in substantially the same manner as they are linked to gp120 in unmodified gp120.
17. The crystal of claim 1, wherein the polypeptide is a variant of gp120 with 15% by weight of the carbohydrate residues linked to the gp120 polypeptide in substantially the same manner as they are linked to gp120 in unmodified gp120.
18. The crystal of claim 12 or 16, further comprising a Fab, a CD4, a polypeptide having amino acid sequence of a portion of CD4, or a combination thereof, bound to the gp120.
19. The crystal of claim 18, wherein the Fab is produced from an antibody to a discontinuous epitope.
20. The crystal of claim 19, wherein the monoclonal antibody is designated 17b.
21. A method for producing a crystal suitable for X-ray diffraction comprising: a. deglycosylating a polypeptide having amino acid sequence of a portion of a gp120 wherein said portion is produced by deleting or replacing part of the gp120 to reduce the surface loop flexibility; b. contacting the polypeptide with a ligand so as to form a complex which exhibits restricted conformational mobility; and c. obtaining crystal from the complex so formed to produce a crystal suitable for X-ray diffraction.
22. The method of claim 21, wherein the V1, V2, or V3 loop of the gp120 contained in the polypeptide are partially truncated, deleted or replaced.
23. The method of claim 21, wherein the polypeptide lacks the V1, V2, V3 and C5 loop of the gp120.
24. The method of claim 21, wherein the ligand is a Fab, a CD4, or a polypeptide having amino acid sequence of a portion of CD4.
25. The method of claim 21, wherein the resulting polypeptide after the deglycosylation contains at least 5% of the carbohydrate.
26. The crystal produced by the method of claim 21.
27. A method for identifying a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: a. determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and b. determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the gp120.
28. A method for designing a compound capable of binding to a portion of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: a. determining a binding site on the portion of gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising the portion of gp120; and b. designing a compound to fit the binding site.
29. A method of claim 27 or 28, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
30. A method of claim 27 or 28, wherein the atomic coordinates are set forth in Figure 53.
31. A pharmaceutical composition comprising the compound identified by claim 27 and a pharmaceutically acceptable carrier.
32. The method of claim 27, wherein the compound is not previously known.
33. The compound identified by the method of claim 32.
34. The compound designed by the method of claim 28.
35. A composition comprising the compound of claim 34 and a suitable carrier.
36. A method of inhibiting the interaction of HIV-gp120 with CD4 which comprises administering to a mammal a compound, with the proviso that the compound is not CD4, capable of disrupting two or more of the contacts between gp120 and CD4 as set forth in Figure 54.
37. A method for identifying a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising:
a. determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having amino acid sequence of a portion of gp120 capable of binding to CD4; and
b. determining whether a compound would fit into the binding site, a positive fitting indicating that the compound is capable of binding to the CD4 binding site of the gp120.
38. A method for designing a compound capable of binding to the CD4 binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising:
a. determining the CD4 binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide having amino acid sequence of a portion of gp120 capable of binding to CD4; and
b. designing a compound to fit the CD4 binding site.
39. A method of claim 37 or 38, wherein the crystal further comprises a CD4, a second polypeptide having amino acid sequence of a portion of CD4, or a compound known to be able to bind to the CD4 site of the gp120, bound to the polypeptide.
40. A method of claim 37 or 38, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
41. A method of claim 37 or 38, wherein the atomic coordinates are set forth in Figure 53.
42. A pharmaceutical composition comprising the compound identified by claim 37 and a pharmaceutically acceptable carrier.
43. The method of claim 37, wherein the compound is not previously known.
44. The compound identified by the method of claim 43.
45. The compound designed by the method of claim 38.
46. A composition comprising the compound of claim 44 or 45 and a suitable carrier.
47. A method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the composition of claim 46 to the subject.
48. A method for identifying a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: a. determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide comprising the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and b. determining whether a compound would fit into the binding site, a positive fit indicating that the compound is capable of binding to the chemokine receptor binding site of the gp120.
49. A method for designing a compound capable of binding to the chemokine receptor binding site of Human Immunodeficiency Virus envelope glycoprotein gp120 comprising: a. determining the chemokine receptor binding site on the gp120 based on the atomic coordinates computed from X-ray diffraction data of a crystal comprising a polypeptide comprising the amino acid sequence of a portion of gp120 capable of binding to the chemokine receptor; and b. designing a compound to fit the chemokine receptor binding site.
50. The method of claim 48 or 49, wherein the crystal further comprises a chemokine receptor, a second polypeptide having amino acid sequence of a portion of chemokine receptor, an antibody or a Fab capable of binding to the chemokine receptor binding site or a compound known to be capable of binding to the chemokine receptor binding site, bound to the polypeptide.
51. The method of claim 48 or 49, wherein the fitting is determined by shape complementarity or by estimated interaction energy.
52. The method of claim 48 or 49, wherein the atomic coordinates are set forth in Figure 53.
53. The pharmaceutical composition comprising the compound identified by the method of claim 48 and a pharmaceutically acceptable carrier.
54. The method of claim 48, wherein the compound is not previously known.
55. The compound identified by the method of claim 54.
56. The compound designed by the method of claim 49.
57. A composition comprising the compound of claim 55 or 56 and a suitable carrier.
58. A method of inhibiting Human Immunodeficiency Virus infection in a subject comprising administering an effective amount of the composition of claim 57 to the subject, thereby inhibiting Human Immunodeficiency Virus infection.
59. A method of inhibiting the interaction of HIV-gp120 with chemokine receptor which comprises administering to a mammal a compound capable of disrupting two or more of the contacts between gp120 and chemokine receptor as set forth in Figure 55, thereby inhibiting the interaction of HIV-gp120 with chemokine receptor, with the proviso that the compound is not a chemokine receptor.
60. The method of claim 59, wherein the compound is nonpeptidyl.
61. A substance mimicking the human immunodeficiency virus envelope glycoprotein gp120-binding region of CD4 wherein the size of a residue or analog thereof, corresponding to the phenylalanine at position 43 in the native CD4, is larger than the size of phenylalanine so as to fill the pocket on gp120 which extends beyond position 43 in the gp120/CD4 complex and increase the affinity for gp120.
62. The substance of claim 61, wherein the substance is a peptidomimetic analog, a synthetic polypeptide, a standard polypeptide, or a polypeptide analog.
63. The substance of claim 61, wherein the size of residue or analog thereof is increased by directly or indirectly linking a hydrophobic compound to the residue or analog thereof.
64. The substance of claim 61, wherein the sidechain of the residue or analog thereof is larger than 7 Å across its longest dimension.
65. The substance of claim 61, wherein the sidechain of the residue or analog is larger than 10 Å across its longest dimension.
66. The substance of claim 61, wherein the sidechain of the residue or analog thereof is larger than 15 Å across its longest dimension.
67. The substance of claim 61, wherein the sidechain of the residue or analog thereof is longer than the phenylalanine sidechain's longest dimension.
68. The substance of claim 61, which enhances hydrophobic interactions to residues that line the pocket.
69. The substance of claim 61, which enhances hydrogen bonding to residues that line the pocket.
70. The substance of claim 61, which enhances electrostatic interactions with residues that line the pocket.
71. The substance of claim 61, which enhances surface fit with residues that line the pocket.
72. The substance of claim 61, wherein the residue or analog thereof contains a localization of charge so as to render the gp120-binding region of CD4 able to hydrogen bond more strongly with the hydroxyl-containing side chains lining gp120.
73. The substance of claim 61, wherein the residue or analog thereof contains at least one additional carbon group.
74. The substance of claim 61, wherein the modification involves replacement of the residue at position 43 with a cysteine and substitution of the sulfhydryl group.
75. The substance of claim 61, wherein the modification involves replacement of the residue at position 43 with a tyrosine and substitution of the tyrosine.
76. The substance of claim 61, wherein the residue or analog thereof is directly or indirectly linked to an adaptor.
77. The substance of claim 76, wherein the adaptor residue or analog thereof is directly or indirectly linked to a hydrophobic compound to form a complex.
78. The substance of claim 77, wherein the formed complex is larger than 7 Å across its longest dimension.
79. The substance of claim 77, wherein the complex is larger than 10 Å across its longest dimension.
80. A pharmaceutical composition capable of inhibiting cell entry by human immunodeficiency virus, comprising a. an effective amount of the substance of claim 61; and b. a pharmaceutically acceptable carrier.
81. A composition capable of inhibiting cell entry by human immunodeficiency virus, comprising a. an effective amount of a substance mimicking the human immunodeficiency virus envelope glycoprotein gp120-binding region of CD4 wherein the size of a residue or analog thereof, corresponding to the phenylalanine at position 43 in the native CD4, is larger than the size of phenylalanine so as to fill the pocket on gp120 which extends beyond position 43 in the gp120/CD4 complex and increase the affinity for gp120; and b. a suitable carrier.
82. A pharmaceutical composition for treating or preventing human immunodeficiency virus infection, comprising a. an effective amount of the substance of claim 61; and b. a pharmaceutically acceptable carrier.
83. A composition for treating or preventing human immunodeficiency virus infection, comprising a. an effective amount of the substance of claim 61; and b. a suitable carrier.
84. A method of inhibiting cell entry by human immunodeficiency virus, comprising contacting the cells with an effective amount of the substance of claim 61 to inhibit cell entry by human immunodeficiency virus.
85. A method of treating or preventing human immunodeficiency virus infection in a subject, comprising administering to the subject an effective amount of the substance of claim 61, thereby treating or preventing human immunodeficiency virus infection.
86. A variant of gp120 which presents a hidden, conserved, neutralization epitope.
87. The variant of claim 86, wherein position 375 is changed from a serine to a tryptophan.
88. The variant of claim 87 further comprising one of the following changes: 88N to P, 102E to L, 113D to R, 117K to W, 257T to A, 266A to E, 386N to Q, 395W to S, 421K to L, 470P to G, 475M to S, 485K to V or a combination thereof.
89. A composition comprising the variant of claim 86, 87 or 88 and a suitable carrier.
90. A vaccine comprising the variant of claim 86, 87 or 88.
91. A method for inducing antibody against HIV in a subject comprising administering an effective amount of the variant of claim 86 to the subject.
92. The method of claim 91, wherein the subject is a human.
93. The vaccine of claim 91, further comprising a suitable adjuvant.
94. An antibody against the variant of claim 86, 87 or 88.
95. The antibody of claim 94, wherein the antibody is neutralizing to HIV.
96. The antibody of claim 94, wherein the antibody is a monoclonal antibody.
EP98959406A 1997-11-10 1998-11-10 X-ray crystal comprising hiv-1 gp120 Withdrawn EP1037963A4 (en)

Applications Claiming Priority (24)

Application Number Priority Date Filing Date Title
US96714897A 1997-11-10 1997-11-10
US96698797A 1997-11-10 1997-11-10
US96740397A 1997-11-10 1997-11-10
US96693297A 1997-11-10 1997-11-10
US966987 1997-11-10
US967148 1997-11-10
US966932 1997-11-10
US967403 1997-11-10
US97674197A 1997-11-24 1997-11-24
US976741 1997-11-24
US8958098P 1998-06-17 1998-06-17
US8958198P 1998-06-17 1998-06-17
US89580 1998-06-17
US10052198A 1998-06-18 1998-06-18
US10063198A 1998-06-18 1998-06-18
US10052998A 1998-06-18 1998-06-18
US10076298A 1998-06-18 1998-06-18
US10076398A 1998-06-18 1998-06-18
US100529 1998-06-18
US100521 1998-06-18
US100762 1998-06-18
US100763 1998-06-18
US100631 1998-06-18
PCT/US1998/023905 WO1999024553A2 (en) 1997-11-10 1998-11-10 X-ray crystal comprising hiv-1 gp120

Publications (2)

Publication Number Publication Date
EP1037963A1 true EP1037963A1 (en) 2000-09-27
EP1037963A4 EP1037963A4 (en) 2004-09-22

Family

ID=27583731

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98959406A Withdrawn EP1037963A4 (en) 1997-11-10 1998-11-10 X-ray crystal comprising hiv-1 gp120

Country Status (2)

Country Link
EP (1) EP1037963A4 (en)
WO (1) WO1999024553A2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6743594B1 (en) 1995-06-06 2004-06-01 Human Genome Sciences, Inc. Methods of screening using human G-protein chemokine receptor HDGNR10 (CCR5)
US6025154A (en) 1995-06-06 2000-02-15 Human Genome Sciences, Inc. Polynucleotides encoding human G-protein chemokine receptor HDGNR10
US20030125518A1 (en) * 2001-12-01 2003-07-03 Crevecoeur Harry F. Surface simulation synthetic peptides useful in the treatment of hyper-variable viral pathogens
EP1940867A2 (en) 2005-09-06 2008-07-09 The Government of the United States of America as represented by the Secretary of the Department of Health and Human Services Hiv gp120 crystal structure and its use to identify immunogens
US20130101617A1 (en) * 2010-06-30 2013-04-25 Torrey Pines Institute For Molecular Studies Env trimer immunogens
CN114609392A (en) * 2022-03-08 2022-06-10 武汉科技大学 Screening method and application of HIV (human immunodeficiency virus) fully-humanized broad-spectrum neutralizing antibody

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
GHIARA J B ET AL: "Structure-based design of a constrained peptide mimic of the HIV-1 V3 loop neutralization site" JOURNAL OF MOLECULAR BIOLOGY, LONDON, GB, vol. 266, no. 1, 14 February 1997 (1997-02-14), pages 31-39, XP004462201 ISSN: 0022-2836 *
KWONG P D ET AL: "STRUCTURE OF AN HIV gp120 ENVELOPE GLYCOPROTEIN IN COMPLEX WITH THE CD4 RECEPTOR AND A NEUTRALIZING HUMAN ANTIBODY" NATURE, MACMILLAN JOURNALS LTD. LONDON, GB, vol. 393, 18 June 1998 (1998-06-18), pages 648-659, XP002093502 ISSN: 0028-0836 *
OXFORD J S ET AL: "New scientific developments towards an AIDS vaccine: report on a workshop organized by EU programme EVA entitled Novel approaches to AIDS vaccine development held at the Institut Pasteur, Paris" VACCINE, BUTTERWORTH SCIENTIFIC. GUILDFORD, GB, vol. 14, no. 17, 1 December 1996 (1996-12-01), pages 1712-1717, XP004016836 ISSN: 0264-410X *
RINI JAMES M ET AL: "Crystal structure of a human immunodeficiency virus type 1 neutralizing antibody, 50.1, in complex with its V3 loop peptide antigen" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES, vol. 90, no. 13, 1993, pages 6325-6329, XP002276067 1993 ISSN: 0027-8424 *
SANEJOUAND Y-H: "On the role of CD4 conformational change in the HIV-cell fusion process" COMPTES RENDUS DES SEANCES DE L'ACADEMIE DES SCIENCES. SERIE III: SCIENCES DE LA VIE, ELSEVIER, AMSTERDAM, NL, vol. 320, no. 2, February 1997 (1997-02), pages 163-170, XP004349275 ISSN: 0764-4469 *
See also references of WO9924553A2 *
STURA E A ET AL: "CRYSTALLIZATION, SEQUENCE, AND PRELIMINARY CRYSTALLOGRAPHIC DATA FOR AN ANTIPEPTIDE FAB 50.1 AND PEPTIDE COMPLEXES WITH THE PRINCIPAL NEUTRALIZING DETERMINANT OF HIV-1 GP120" PROTEINS: STRUCTURE, FUNCTION AND GENETICS, ALAN R. LISS, US, vol. 14, no. 4, December 1992 (1992-12), pages 499-508, XP001069692 ISSN: 0887-3585 *
WYATT R ET AL: "THE ANTIGENIC STRUCTURE OF THE HIV gp120 ENVELOPE GLYCOPROTEIN" NATURE, MACMILLAN JOURNALS LTD. LONDON, GB, vol. 393, 18 June 1998 (1998-06-18), pages 705-711, XP002093503 ISSN: 0028-0836 *

Also Published As

Publication number Publication date
WO1999024553A9 (en) 2001-05-31
WO1999024553A8 (en) 1999-07-22
WO1999024553A3 (en) 2003-12-11
WO1999024553A2 (en) 1999-05-20
EP1037963A4 (en) 2004-09-22

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000609

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

D17D Deferred search report published (deleted)
RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 01N 23/207 B

Ipc: 7G 01N 23/201 B

Ipc: 7G 01N 23/20 B

Ipc: 7C 07K 5/00 B

Ipc: 7A 61K 38/00 A

RTI1 Title (correction)

Free format text: X-RAY CRYSTAL COMPRISING HIV-1 GP120

RIC1 Information provided on ipc code assigned before grant

Ipc: 7A 61K 38/00 B

Ipc: 7C 07K 16/08 B

Ipc: 7C 07K 14/73 B

Ipc: 7C 07K 14/16 A

A4 Supplementary search report drawn up and despatched

Effective date: 20040805

17Q First examination report despatched

Effective date: 20050215

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070512