WO2005026379A2

WO2005026379A2 - The mapping and reconstitution of a conformational discontinuous binding surface

Info

Publication number: WO2005026379A2
Application number: PCT/US2004/029290
Authority: WO
Inventors: Dimitri Denisov; Galina Denisova; Jonathan M. Gershoni
Original assignee: Ramot At Tel Aviv University
Priority date: 2003-09-08
Filing date: 2004-09-08
Publication date: 2005-03-24
Also published as: WO2005026379A3; US20070005262A1; US7587281B2; USRE44747E1

Abstract

The structure of conformational, discontinuous binding surfaces that associate with a binding molecule, preferably the epitopes of monoclonal antibodies (mAbs), may be discovered. The binding molecule is used to select specific peptides from a peptide library that, in turn, are used as a binding surface (epitope) defining database that is applied via a novel computer algorithm to analyze the crystalline structure of the original binding surface (antigen). An antigenic epitope-mimetic that is recognized by its original mAb may be reconstituted based on the segments of the epitope identified in the prediction.

Description

THE MAPPING AND RECONSTITUTION OF A CONFORMATIONAL DISCONTINUOUS BINDING SURFACE
FIELD OF THE INVENTION
[0001] The present invention is directed to a method for the discovery of the structure of conformational, discontinuous binding surfaces that associate with a binding molecule, preferably the epitopes of monoclonal antibodies (mAbs). More particularly, the binding molecule, such as a mAb, is used to select specific peptides from a peptide library that, in turn, are used as a binding surface (epitope) defining database that is applied via a novel computer algorithm to analyze the crystalline structure of the original binding surface (antigen). The algorithm is based on the following: (1) most contacts between a mAb and an antigen are through side-chain atoms of the residues; (2) in the three-dimensional structure of a protein, amino acids remote in linear sequence can juxtapose to one another through folding; (3) tandem amino acids of the selected phage-displayed peptides can represent pairs of juxtaposed amino acids of the antigen; (4) contact residues of the epitope are accessible to the antigen surface; and (5) the most frequent tandem pairs of amino acids in the selected phage-displayed peptides can reflect pairs of juxtaposed amino acids of the epitope. Application of the algorithm enables prediction of epitopes. The present invention is further directed to the reconstitution of an antigenic epitope-mimetic that is recognized by its original mAb, based on the segments of the epitope identified in the prediction.
BACKGROUND OF THE INVENTION
[0002] The prediction of antigenic determinants is a difficult and uncertain task. Antibody:antigen interfaces have been generally assumed to be hydrophilic and transiently accessible to the surrounding milieu bathing the antigen and, as such, distinct from the subunit:subunit interface of multimeric protein complexes.
These assumptions were the basis for the original Hopp and Woods predictive algorithm that sought to identify hydrophilic stretches in the protein linear sequence (Hopp, 1993). This was accomplished by assigning a hydrophilicity value to each of the 20 amino acids and calculating an average score for hexapeptides along the sequence of the antigen. Since then, numerous improved predictive algorithms have been published (Hopp, 1993; Hofmann et al, 1987; Pauletti et al, 1985; Van Regenmortel et al, 1994). Empirical hydrophilicity values have been developed by measuring the partition of model synthetic peptides in HPLC analyses (Parker et al, 1986). Parameters for flexibility (Hopp, 1984), accessibility (Jones et al, 1997) and even antigenicity (Welling et al, 1985) have been introduced in an effort to increase the success rate for accurate prediction of binding surfaces (Van Regenmortel, 1999). Van Regenmortel, who has contributed much to this field, has published numerous detailed and comprehensive reviews and comparisons of predictive algorithms (see, for example, Van Regenmortel, 1999).
[0003] The essence of these studies is an attempt to learn the fundamental rules of biorecognition and to apply this knowledge to discover potential epitopes of a given antigen. The initial approach dealt with the linear aspect of protein antigens and is unable to address the more realistic situation of conformational epitopes. Ninety percent of all epitopes are predicted to be discontinuous and highly conformational (Van Regenmortel, 1996). Van Regenmortel has argued that even a three-dimensional analysis is still insufficient, as one must also consider the fourth dimension, time, which plays a role in the conformational induced fit of the epitope to better conform to its corresponding paratope of the antibody, and vice versa (Van Regenmortel, 1996).
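The hexapeptide averaging attributed above to the Hopp and Woods approach can be illustrated with a minimal sketch. It only shows the sliding-window idea and is not taken from the present specification: the function names are hypothetical, and the hydrophilicity values are those commonly tabulated for that scale and should be checked against the original publication before any real use.

```python
# Hydrophilicity values as commonly tabulated for the Hopp and Woods scale.
# Assumption: these numbers are not given in the text above.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def window_averages(sequence, scale=HOPP_WOODS, window=6):
    """Return (start_index, mean score) for every window along the sequence."""
    scores = [scale[aa] for aa in sequence]
    return [
        (i, sum(scores[i:i + window]) / window)
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic(sequence, scale=HOPP_WOODS, window=6):
    """Return the (start_index, mean score) of the highest-scoring window."""
    return max(window_averages(sequence, scale, window), key=lambda t: t[1])


# Toy sequence (hypothetical): a hydrophobic stretch followed by a charged one.
start, score = most_hydrophilic("WWWWWWRKDEDE")
```

Because every window is a contiguous run of residues, a scan of this kind can only flag linear stretches, which is the limitation discussed in paragraph [0003].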
[0004] A major step forward in understanding the nature of the epitope has been due to the co-crystallization of antibody:antigen complexes and solution of their structures at high resolution. Thus, as opposed to the original notion that antigen binding surfaces should comprise only 5-7 amino acid residues (Kabat, 1968), B cell epitopes are now considered to contain 15-20 residues derived from 2-5 peptide segments of the antigen, occupying a surface of 700-900 Å² (Lo Conte, 1999; Chakrabarti et al, 2002; Jones et al, 1997). Furthermore, epitopes have been found to incorporate hydrophobic and aromatic residues in addition to hydrophilic and charged amino acids (Glaser et al, 2001). The degree of conformational complementarity between the epitope and paratope is less complete than might have been expected, and water molecules play a significant role in bridging the binding surfaces and "filling in the gaps" (Xu et al, 1997).
[0005] An effective humoral response towards an infectious agent depends on the ability of antibodies to bind and inactivate the pathogen. Vaccines, designed to induce the production of such antibodies, are typically derivatives of the pathogen, i.e., killed whole cells, attenuated live pathogens, fragments of antigens or DNA corresponding to the latter (Ellis et al, 2001; Hansson et al, 2000). Whatever the modality, the purpose of the vaccine is to stimulate neutralizing immunity in the naive individual in preparation for future encounters with fully virulent field-isolates of the pathogen. Correspondence between the vaccine and the field-isolate of the pathogen must therefore be substantial to ensure its efficacy. In cases where the pathogen undergoes extensive genetic variation, the ability to formulate an effective vaccine may present what appears to be an insurmountable obstacle.
Such seems to be the case, for example, for HIV-1, the etiological agent of the AIDS epidemic, that is continuously selected for its ability to evade immune surveillance (Burton et al, 1998; Montefiori et al, 1999; Hoffman-Lehman et al, 2002).
[0006] HIV-1, as a result of each infectious cycle, accumulates numerous random mutations providing it with an endless source of variants (Wang et al, 2002; Moore et al, 2001). Nonetheless, over the years a few examples of highly cross-reactive and neutralizing monoclonal anti-HIV antibodies have been described, illustrating that protective immunity is possible (Mascola et al, 1999; Gauduin et al, 1997; Burton et al, 1994; Zwick et al, 2001b; Muster et al, 1993; Trkola et al, 1996; Conley et al, 1994; Van Regenmortel, 1996). This has been substantiated by experiments in which cocktails of these mAbs, administered as passive immunotherapy, have proven effective in preventing the infection of CD4+ lymphocytes both in vitro and in vivo (Mascola et al, 1999; Gauduin, 1997). Thus, a rational approach to the design of a cross-reactive antibody response against AIDS can be proposed as follows:
• First, one must accumulate a collection of genuine broadly cross-reactive and neutralizing mAbs (to date at least 4 exist (Burton et al, 1994; Zwick et al, 2001b; Muster et al, 1993; Trkola et al, 1996; Conley et al, 1994)).
• These, in turn, are used to discover their corresponding epitopes within the antigens of HIV-1.
• Once mapped, the epitopes are to be reconstituted as synthetic versions that must be both antigenic (i.e., recognized by the original mAbs) and immunogenic (i.e., able to elicit in the naive individual the production of antibodies that are as effective as the original mAbs themselves).
[0007] Unfortunately, such a protocol turns out to be a very difficult task, as the "interesting" mAbs against HIV-1 (and in fact against most pathogens) typically correspond to highly conformational epitopes that are comprised of discontinuous segments of the viral antigen (Van Regenmortel et al, 1996). Very often, even linear epitopes show conformational preferences and dependence on the context of a protein antigen (Ho et al, 2002). Thus, one is faced with a fundamental problem, namely: how can one discover the precise molecular design of conformational discontinuous epitopes of highly desirable mAbs of clinical importance?
SUMMARY OF THE INVENTION
[0008] Accordingly, it is an object of the present invention to solve the problems of the prior art.
[0009] It is a further object of the present invention to discover the molecular design of conformational discontinuous epitopes of highly desirable monoclonal antibodies of clinical importance.
[0010] It is another object of the present invention to predict the region on the surface of a proteinaceous material representing a basic element of a binding surface that associates with a predetermined binding molecule.
[0011] It is yet another object of the present invention to identify the basic elements of a binding surface on a proteinaceous material, which binding surface associates with a predetermined binding molecule.
[0012] It is still another object of the present invention to provide a method of producing a binding surface mimetic.
[0013] It is still a further object of the present invention to provide a pharmaceutical composition including one or more of the basic elements of the binding surface of gp120 that is recognized by a broadly neutralizing antibody.
[0014] It is still another object of the present invention to provide a molecule mimetic of the binding surface of gp120 that is recognized by a broadly neutralizing antibody.
[0015] More specifically, the present invention is directed to a method for improved prediction of the region on the surface of a proteinaceous material representing a basic element of a binding surface that associates with a predetermined binding molecule comprising:
(a) screening a peptide library with said predetermined binding molecule to identify a plurality of peptides that bind to said binding molecule;
(b) determining the amino acid sequence of each identified peptide;
(c) assigning a symbol to each class of amino acid residue represented in the library and presenting each said sequence as a string of said symbols;
(d) calculating the frequency of occurrences of each tandem pair of symbols that exist in the strings of symbols presented in step (c);
(e) identifying those tandem pairs of symbols, the number of occurrences of which is statistically significant;
(f) mapping on a three-dimensional model of the proteinaceous material those pairs of amino acids represented by the tandem pairs of symbols identified in step (e), wherein a pair of amino acids is two amino acids, each of which is accessible to the surface of the proteinaceous material and whose alpha carbons are separated by no more than a predetermined distance; and
(g) determining clusters of amino acid pairs mapped in (f), each amino acid pair in the cluster being topographically related to at least one other pair in the cluster.
[0016] Each cluster that is determined in step (g) is a predicted region on the surface of the proteinaceous material representing the binding surface.
[0017] The present invention is further directed toward identifying the basic elements of such a binding surface. First, a cluster is identified by the method for improved prediction discussed above. Next, the outermost amino acids of binding pairs in the cluster are identified so as to define the perimeter of the predicted binding surface.
All other amino acids on the surface of the proteinaceous material situated within the perimeter of the predicted binding surface, or that are within a predetermined distance therefrom, are then also identified. Finally, the basic elements of the binding surface are identified, in which each such element is a linear segment of the proteinaceous material whose first and last residues are amino acids previously identified, and none of the intermediate amino acids thereof are amino acids on the surface of the proteinaceous material not so identified. Any amino acid so identified that is not part of such a linear segment is considered to be a basic element of a single amino acid.
[0018] The basic elements so identified may be used to produce a binding surface mimetic by connecting the basic elements in such a manner as to maintain the relative spatial orientation of the amino acids of the basic elements, thereby identifying a molecule that is mimetic of said binding surface. That mimetic molecule may then be produced.
[0019] An important broadly neutralizing antibody that interacts with the CD4 binding site on HIV-1 surface glycoprotein gp120 and neutralizes many primary and TCLA viruses very efficiently is mAb b12 (Burton et al, 1994). Peptides identified by phage library screening have been reported in Zwick et al (2001a) and Boots et al (1997). The algorithm of the present invention can be applied to these previously disclosed peptides in order to predict a cluster of amino acid pairs that represents the binding surface recognized by this antibody and, from that, to identify the basic elements thereof and produce a binding surface mimetic molecule that is expected to be recognized by this antibody.
Thus, the present invention is further directed to a pharmaceutical composition including one or more of the basic elements of the binding surface of gp120 that is recognized by mAb b12, which composition comprises a pharmaceutically acceptable carrier and one or more peptides selected from the group consisting of amino acids 110-118 of SEQ ID NO:1, amino acids 391-396 of SEQ ID NO:1, and amino acids 457-468 of SEQ ID NO:1. The composition may further include a peptide of amino acids 360-361 of SEQ ID NO:1.
[0020] The invention further relates to a molecule mimetic of the binding surface of gp120 that is recognized by mAb b12 that is obtained by connecting the four peptides mentioned in the previous paragraph, each in forward or reverse sequence, in such a manner as to form a single molecule that maintains the spatial orientation that the amino acids thereof have when they are positioned at 110-118, 360-361, 391-396 and 457-468 of gp120.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] Figure 1 shows a scheme of the algorithm used in the present invention.
[0022] Figure 2A shows the sequences of inserts of eleven phages (SEQ ID NOs: 2-12, respectively) selected by screening phage display peptide libraries with mAb 17b. Figure 2B shows pair composition of 17b phage inserts. Figure 2C shows the amino acid pairs ranked according to their occurrences in the 17b-specific peptides.
[0023] Figure 3A shows random frequencies (%) of amino acid pairs in the phage display peptide library that were calculated by multiplying the frequencies of amino acids in each pair. Random frequencies of amino acids in phage display peptide library inserts are shown in parentheses. Figure 3B is a bar graph showing the identification of statistically significant pairs (SSPs): comparison of theoretical (f_r) (stippled columns) and experimentally observed (solid columns) occurrences of 17b amino acid pairs. Error bars present f as a measure of random error.
The theoretical occurrences are based on randomness and were calculated by multiplying the theoretical frequency of a pair (n_r) by the total number of pairs in a specific set of peptides (A) [f_r = n_r × A]. The theoretical probabilities of the BC and CB pairs were calculated taking into account the "semi-random" nature of these pairs.
[0024] Figure 4A shows clusters of 17b epitope pairs. In the column "Pair", amino acids (in single letter codes) are presented which were identified by the algorithm as located on a surface of the protein antigen at close distance (shown in the column D). In the column "aa", the numbers of these amino acids in the gp120 sequence (SEQ ID NO:1) and their 3-letter codes are presented. These amino acid pairs form "connected clusters". Horizontal lines divide the different clusters. Within the parentheses, the first number is the number of pair types, the second is the number of different amino acids in a cluster. D is the distance between backbone atoms in Å, and aa is amino acids. Figures 4B and 4D show RasMol representations of the 17b epitope. Figure 4C shows a model of gp120 backbone + genuine 17b epitope residues, and Figure 4E shows the predicted "cluster A", displayed as space fill residues. In Figures 4C and 4E, ribbon presentations of genuine and predicted epitope elements, respectively, are shown. Grey represents either non-predicted or non-contact residues, and black represents either predicted or contact residues.
[0025] Figures 5A-C show the analysis of the 13b5 mAb. Figure 5A shows the sequences of inserts of sixteen phages (SEQ ID NOs: 13-28, respectively) selected by screening phage display peptide libraries with mAb 13b5. Figure 5B shows the pair composition of 13b5 phage inserts. Figure 5C shows clusters of 13b5 epitope pairs.
In the column marked "Pair", amino acids (in single letter codes) are presented which were identified by the algorithm as located on a surface of the protein antigen at close distance (shown in the column D). In the column marked "aa", the numbers of these amino acids and their one-letter names are presented. These amino acid pairs form "connected clusters". Horizontal lines divide the different clusters. In parentheses: the first number is the number of pair types, the second the number of different amino acids in a cluster. "D" is the distance between backbone (alpha carbon) atoms in Å. Figure 5D is the RasMol representation of the 13b5 epitope. A model of the p24 backbone and the epitope is displayed as space fill residues. The spheres of lightest color are only predicted residues; the spheres of intermediate darkness are only experimental residues; and the black spheres are coinciding residues.
[0026] Figure 6 shows the comparison of CDR-loop sequences of 17b and CG10 mAbs (SEQ ID NOs: 29-40 as designated in the figure). Coinciding residues are shown in bold.
[0027] Figure 7A shows sequences (SEQ ID NOs: 41-68) of inserts of phages selected by screening phage display peptide libraries with the CG10 mAb. Figure 7B shows the pair composition of CG10 phage inserts. Figure 7C shows the amino acid pairs ranked according to their occurrences in the CG10-specific peptides. Figure 7D is a bar graph showing the identification of SSPs, comprising the theoretical and experimental occurrences of CG10 amino acid pairs.
[0028] Figure 8 is a cluster analysis of the CG10 epitope. The first number in the parentheses is the number of pair types, the second is the number of different amino acids in a cluster. "D" is the distance between backbone atoms in Å, and aa is amino acids.
[0029] Figures 9A-9C are RasMol depictions of the CG10 epitope. Figure 9A is a comparison of the 17b and predicted CG10 epitopes.
White - amino acid residues exclusive to the 17b epitope; black - amino acid residues predicted only for the CG10 epitope; grey - amino acid residues shared by both epitopes. Figure 9B is a three-dimensional model of gp120 where "cluster A" predicted by computer algorithm is displayed as space fill amino acids. Figure 9C is a mutational analysis of the CG10 epitope (adapted from Rizzuto et al, 1998). Note that five residues circled in Figure 9C were predicted, as is shown in Figure 9B.
[0030] Figures 10A-C are a reconstitution of the CG10 epitope. Figure 10A shows four peptide fragments (residues 119-123, 430-435, 421-423 and 205-207, respectively, of SEQ ID NO:1) of the CG10 epitope according to the prediction, connected by GPG linkers, with an additional 381-382 fragment added for conformational requirements. In Figure 10B, directions (from N-terminus to C-terminus) are shown. The sequence of the reconstituted epitope (SEQ ID NO: 69) is presented in Figure 10C. Figure 10D is a dot blot analysis of CG10 binding to phage G12. Figure 10E shows the G12 phage binding by CG10 vs CG1, 17b and CG25 mAbs. Figure 10F is a graph showing the competition between G12 phage and the CD4/gp120 complex for CG10 mAb binding.
[0031] Figure 11A shows sequences (SEQ ID NOs: 70-103) of inserts of phages selected by screening phage display libraries with the b12 mAb. Figure 11B shows the pair composition of b12 phage inserts. Figure 11C shows the amino acid pairs ranked according to their occurrence in the b12-specific peptides.
[0032] Figure 12 is a graph showing the identification of statistically significant pairs: comparisons of theoretical (dark columns) and experimental (light columns) occurrence of b12 amino acid pairs.
[0033] Figures 13A and 13C show clusters of b12 epitope pairs. Figures 13B and 13D show models of gp120 backbone with the predicted clusters displayed as space fill residues.
The circled cluster in Figure 13B represents the cluster identified in Figure 13A, and similarly for Figures 13D and 13C. Figure 13E is a ribbon presentation of the predicted epitope elements of b12. Gray represents non-predicted residues, and black represents predicted residues.
[0034] Figures 14A-14C are a reconstitution of the b12 epitope. Figure 14A shows three peptide fragments of the b12 epitope according to the prediction, the first being residues 391-396 of SEQ ID NO:1, the second being 358-362 of SEQ ID NO:1, and the third being 457-468 of SEQ ID NO:1. The first and second fragments are linked by a TPGS (residues 9-12 of SEQ ID NO: 104) linker, and the second and third fragments are linked by another TPGS (residues 18-21 of SEQ ID NO: 104) linker. The third fragment is linked to the first, substantially maintaining the spatial orientation, by a SLWDQSLC (residues 34-41 of SEQ ID NO: 104) linker. In Figure 14B, directions (from N-terminus to C-terminus) are shown. The sequence of the reconstituted epitope (SEQ ID NO: 104) is presented in Figure 14C.
[0035] Figures 15A-15C show a reconstitution of the b12 epitope that is a variation of that of Figures 14A-14C. Figure 15A shows the three peptide fragments (residues 391-396, 358-362 and 457-468 of SEQ ID NO:1), linked by TPGS (residues 9-12 and 19-22 of SEQ ID NO: 105) linkers, with double cysteine loops. In Figure 15B, directions (from N-terminus to C-terminus) are shown, as well as the location of the disulfide bonds. The sequence of the reconstituted epitope (SEQ ID NO: 105) is presented in Figure 15C.
[0036] Figure 16A shows the structure of the three binding elements of Figure 13E, including the distance between the terminal alpha-carbons thereof. Figures 16B and 16C show the structure of the peptide elements that are part of the CD4 protein (SEQ ID NO: 106) and the V_H domains of the murine IgG1 protein (SEQ ID NO: 107), respectively.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0037] As used throughout the present specification and claims, the following definitions apply:
[0038] A "binding surface" is that portion of a proteinaceous material that associates with a binding molecule.
[0039] A "proteinaceous material" is any protein, or fragment thereof, or complex containing one or more proteins formed by any means, such as covalent peptide bonds, disulfide bonds, chemical crosslinks, etc., or non-covalent associations, such as hydrogen bonding, van der Waals contacts, electrostatic salt bridges, etc.
[0040] The "binding molecule" is any molecule, whether or not proteinaceous, that associates with a binding surface, i.e., binds to the binding surface with specificity dependent on the structure of that surface and with an affinity greater than K_D = 10⁻⁵ M. While the binding molecule-binding surface association is mainly discussed herein from the standpoint of the antibody-antigen association, such associations may be a ligand-receptor association, a receptor-hormone association, an enzyme-substrate association, or any other protein-protein interaction.
[0041] While the binding molecule may itself be a proteinaceous material, such as an antibody, an enzyme, a ligand, a receptor, etc., it is not necessarily proteinaceous. Thus, for example, the binding molecule may be a polynucleotide sequence or a sugar molecule, gangliosides, lipids, etc. Among examples of applications for the process of the present invention using non-proteinaceous binding molecules are the following. One may screen a peptide library with a complex glyco moiety for the purpose of identifying the active binding surface of a lectin specific for that sugar. Another example is screening with a specific DNA sequence for the discovery of the binding surface of transcription factors such as repressors or inducers of gene expression that bind to that sequence.
The active binding surface of any DNA binding protein may be found by this method.
[0042] In general, when the binding surface and the binding molecule are both proteinaceous materials, they may be interchangeable, such that either may be denominated the binding surface or the binding molecule. However, while the binding molecule need not be proteinaceous, the binding surface must be on a proteinaceous material. Thus, where one of the binding pairs is a proteinaceous material and the other is non-proteinaceous, the proteinaceous material is always considered to be the binding surface, and the non-proteinaceous material will always be considered to be the binding molecule.
[0043] A "peptide library" is a collection of peptides, preferably ranging from 5 to 25 amino acid residues in length. The collection of peptides may be a random collection or it may be rationally designed based on the composition of the proteinaceous material of which the binding surface is a part. The greater the number of different peptides in the library, the better. Preferably, in the case of a random peptide library, it should contain more than 10⁷ different peptides. An example of a rationally defined peptide library may be that disclosed in WO 98/20169.
[0044] In the peptide library, the peptides may be displayed by any means, such as, for example, peptides displayed on phage, a combinatorial library of synthetic peptides on beads, etc. Phage display libraries of random peptides are well known in the art. See, for example, Enshell-Seijffers et al (2002).
[0045] The term "statistically significant", when determining which tandem pairs of symbols are to be used for further processing, means that the number of occurrences of that tandem pair is greater than would be predicted from randomness by an amount that is statistically significant.
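As an illustration of steps (d) and (e) and of the definition in paragraph [0045], the following minimal sketch counts tandem pairs in a set of symbol strings and compares each observed count with the count expected from randomness, f_r = n_r × A, where n_r is the product of the two symbol frequencies and A is the total number of pairs. Several points are assumptions that the text does not specify: the function names, treating a pair as ordered (AB distinct from BA), and the significance rule (observed count exceeding the expected count by more than k times its square root). The input frequencies are hypothetical.

```python
from collections import Counter
from math import sqrt


def tandem_pairs(peptides):
    """Count every adjacent (tandem) pair of symbols across all peptides."""
    counts = Counter()
    for pep in peptides:
        for a, b in zip(pep, pep[1:]):
            counts[a + b] += 1
    return counts


def expected_occurrences(pair, symbol_freq, total_pairs):
    """f_r = n_r * A, with n_r the product of the two symbol frequencies."""
    return symbol_freq[pair[0]] * symbol_freq[pair[1]] * total_pairs


def significant_pairs(peptides, symbol_freq, k=2.0):
    """Return {pair: (observed, expected)} for over-represented pairs.

    Assumption: a pair is kept when observed > expected + k * sqrt(expected).
    The specification does not prescribe this particular test.
    """
    counts = tandem_pairs(peptides)
    total = sum(counts.values())
    selected = {}
    for pair, observed in counts.items():
        expected = expected_occurrences(pair, symbol_freq, total)
        if observed > expected + k * sqrt(expected):
            selected[pair] = (observed, expected)
    return selected


# Toy input (hypothetical): three symbols with equal library frequencies.
toy_peptides = ["ABAB", "CAB"]
toy_freq = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
toy_result = significant_pairs(toy_peptides, toy_freq)
```

In practice the symbol frequencies would come from the composition of the library itself, and a different statistical criterion could be substituted in `significant_pairs` without changing the counting step.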
[0046] The term "three-dimensional model" means a calculated or predicted structure in which the XYZ coordinates of at least the alpha-carbon of each amino acid thereof are specified. An example of such a three-dimensional model would be a solved crystal structure or a structure determined by NMR spectroscopy. Alternatively, a predicted structure can be derived by superimposition of an unknown structure onto a known structure.
[0047] The term "mapping" means identifying the physical location on the three-dimensional model of the identified amino acids.
[0048] The "predetermined distance" used in the mapping step is a function of the degree of stringency designed into the analysis. The greater the distance used, the poorer will be the resolution of the results. A preferred range is 5-15 Å.
[0049] A "cluster of amino acid pairs" is an identified group of amino acid pairs, each amino acid pair of which is topographically related to at least one other pair in the group. Two amino acid pairs are "topographically related" if they either share an amino acid residue or each has an amino acid residue that falls within a predetermined distance of one another. The predetermined distance must be greater than the amino acid distance that defines a pair of amino acids and may be as great as three times that distance. The preferred relationship is that in which the two amino acid pairs share a common amino acid. A cluster has two or more pairs of amino acids. Any cluster so identified is a predicted region on the surface of the proteinaceous material representing a basic element of the binding surface being sought. The greater the number of pairs of amino acids in the cluster, the greater the likelihood this cluster represents the binding surface or a significant portion thereof.
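The mapping and clustering definitions of paragraphs [0047] to [0049] can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy coordinates are hypothetical, the input is assumed to contain surface-accessible residues only, a selected pair type is matched in either orientation on the structure, and only the preferred relationship (two pairs sharing a common amino acid) is implemented, not the alternative based on a second, larger distance.

```python
from itertools import combinations
from math import dist


def map_pairs(residues, pair_types, max_dist):
    """Find residue pairs matching a selected pair type within max_dist.

    residues: {residue_number: (symbol, (x, y, z) of the alpha carbon)},
    assumed to hold surface-accessible residues only.
    """
    mapped = []
    for (i, (si, ci)), (j, (sj, cj)) in combinations(sorted(residues.items()), 2):
        matches = si + sj in pair_types or sj + si in pair_types
        if matches and dist(ci, cj) <= max_dist:
            mapped.append((i, j))
    return mapped


def cluster_pairs(mapped):
    """Group mapped pairs that share a residue; keep clusters of two or more pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in mapped:
        parent[find(i)] = find(j)

    clusters = {}
    for pair in mapped:
        clusters.setdefault(find(pair[0]), []).append(pair)
    return [pairs for pairs in clusters.values() if len(pairs) >= 2]


# Toy structure (hypothetical coordinates in angstroms, residues on a line).
toy_residues = {
    1: ("A", (0.0, 0.0, 0.0)),
    2: ("B", (3.0, 0.0, 0.0)),
    3: ("A", (6.0, 0.0, 0.0)),
    4: ("B", (50.0, 0.0, 0.0)),
    5: ("A", (53.0, 0.0, 0.0)),
}
toy_mapped = map_pairs(toy_residues, {"AB"}, max_dist=5.0)
toy_clusters = cluster_pairs(toy_mapped)
```

In the toy input, residues 1, 2 and 3 yield two mapped pairs that share residue 2 and therefore form one cluster, whereas the isolated pair formed by residues 4 and 5 is discarded because a cluster requires at least two pairs.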
[0050] The present invention is a systematic approach designed to discover the precise molecular design of conformational, discontinuous binding surfaces, which are bound by highly desirable binding molecules. Preferably the binding surfaces are epitopes and the binding molecules are monoclonal antibodies (mAbs). It is based on using specific binding molecules, such as mAbs, to screen peptide libraries, such as combinatorial phage display peptide libraries. The binding molecule-specific phages are then used as a binding surface-defining database to which is applied a novel computer algorithm to analyze the crystalline structures of the proteinaceous material, of which the binding surface is a part. Thus, if the proteinaceous material is the gp120 envelope protein of HIV and the binding molecule is a mAb specific thereto, the mAb-specific phage may be used to analyze the crystalline structure of the viral antigen. In this manner, candidate binding surface areas are mapped to the surface of the proteinaceous material. Based on this mapping, bona fide segments of the proteinaceous material are used to reconstitute a binding surface mimetic that is recognized by the original binding molecules. Henceforth, the present specification will discuss the preferred embodiment in which the binding molecule is a mAb and the binding surface is an epitope. It should be understood, however, that other binding molecules may be substituted for mAbs and other binding surfaces for epitopes.
[0051] The development of the present invention was based on the assumption that affinity-selected peptides derived from a vast collection of random peptides, due to their specific binding to the mAb of interest, must reflect structural elements of the original epitope. Initially, this might appear to be a trivial assumption, as one could expect to obtain peptides that show linear homologies with the immunogenic antigen.
However, in the case of highly conformation-dependent mAbs, the peptides obtained are often diverse and lack obvious homology or relatedness with the original antigen (Felici et al, 1993; Folgori et al, 1994).
[0052] The first step, therefore, is to screen libraries so as to accumulate a diverse spectrum of peptides that bind to the binding molecule of the binding molecule-binding surface pair that is being studied. In the case of a mAb-epitope binding pair, these peptides will collectively provide insights into the molecular nature of the epitope recognized by the mAb.
[0053] Then, a systematic computer algorithm, which is described below, is used. It is designed to focus on the common denominators of the peptides and to use this information to map epitopes onto the surface of the solved crystalline structures of the antigens.
[0054] Once this is achieved, the discontinuous epitope can be reconstituted from the segments of the antigen that make up the original epitope.
[0055] As indicated above, the first step of the process of the present invention is to repeatedly screen combinatorial phage display peptide libraries (or other peptide libraries) using the monoclonal antibody (or other binding molecule) until a collection of peptides is obtained, each of which binds to the binding molecule. It is expected that 10-20 peptides will be identified in the course of such a screen, although fewer may suffice to make an accurate prediction, and the more that are found, the more accurate the prediction is expected to be. The amino acid sequence is then determined for each peptide identified in the screen as being capable of binding to the binding molecule, such as the mAb, used in the screen.
[0056] In order to derive the location of the binding surface on the proteinaceous material of interest, using the peptides identified in the screen, a novel computational algorithm has been developed. Figure 1 is a general scheme of the final algorithm.
The algorithm is based on the following considerations:

1. Most of the contacts made between a mAb and its antigen are through side-chain atoms of the residues, rather than through main-chain moieties. This is based on many studies, for example, the analysis of 19 different antibody-antigen complexes, in which 81% of the contacts are contributed by side-chain atoms (Lo Conte et al, 1999), and is specifically true for the 17b/gp120 complex, for which >70% of the contacts are through the side-chains.
2. In the three-dimensional structure of a protein, amino acids remote in linear sequence can be juxtaposed to one another through folding. A conformational epitope of an antibody can consist of such amino acids and form a cluster of discontinuous amino acids that create a contiguous binding surface. A cluster can therefore contain residues that are tandem in sequence or residues brought together through folding.
3. Tandem amino acids of the selected phage-displayed peptides can represent pairs of juxtaposed amino acids of the binding surface of the antigen. It is assumed that phage-displayed peptides, affinity selected via biopanning with the mAb of interest, represent partial structural elements of the epitope.
4. Contact residues of the epitope are accessible to the antigen surface.
5. The most frequent tandem pairs of amino acids in the selected phage-displayed peptides are most likely to reflect pairs of juxtaposed amino acids of the clustered residues of the bona fide epitope.

[0057] Step I in the computer algorithm is to calculate the number of occurrences of specific amino acid pairs present in the affinity-selected peptides, and to identify those pairs that are substantially over-represented as compared to the random frequencies of pairs statistically anticipated.

[0058] Calculation of random frequencies of amino acid pairs may be performed as follows.
In the random library, the frequency of amino acids should reflect the codon usage employed in the generation of the corresponding random DNA oligonucleotides. For example, in the libraries prepared for the experiments reported in the following examples, a restricted codon usage was applied in which any of the four phosphoramidites are possible for the first two positions of any codon; however, only G or T are provided for the third position. The reason for this was to avoid the stop codons UAA and UGA as well as to reduce the total redundancy of codons. Furthermore, in order to prevent abortive termination at stop codon UAG, the libraries were produced in bacteria containing a glutamic acid suppression mutation.

[0059] Each pair of residues may be analyzed for anticipated frequencies. However, as contacts between the mAb and the antigen are through functional moieties of the R-groups, a preferred embodiment is to consolidate conserved residues into functional subgroups of amino acids. An example of this is the consolidation into the following six functional subgroups, each of which is given a single letter code:

R,K = B
E,D = J
S,T = O
L,V,I = U
Q,N = X
W,F = Z

[0060] The anticipated frequencies (n_r) of the amino acid pairs for totally random peptides are then calculated. In the above-described library, using the symbols represented by the single letter codes given above for grouped residues and the standard code of the remaining individual residues, the results of such a calculation are given in Figure 3A. A similar analysis can readily be conducted for any other library that may be used.

[0061] Two types of pairs must be considered: completely random pairs, in which both partners of a pair are random, and "semi-random" pairs, in which only one partner is random, whereas the other is constant (as is the case for the first and last pair of each peptide containing the constant flanking cysteine residues).

[0062] The amino acid sequences of the peptides found in the screen are first entered into the algorithm. The algorithm then assigns the corresponding symbol to each class of amino acid residue represented in the library and then presents each of the sequences as a string of such symbols. As indicated above, the symbols may be the common single-letter code for each residue or a symbol representing each of the functional subgroups of amino acids. Preferably, each sequence is presented as tandem pairs of said symbols. Thus, for example, if the first peptide is CSGLRNETFLRC (SEQ ID NO:2), the same sequence represented as a string of symbols that take into account the functional subgroups would be COGUBXJOZUBC. This sequence may be represented as the following tandem pairs of amino acid symbols: CO OG GU UB BX XJ JO OZ ZU UB BC.

[0063] Once all of the peptides found in the initial screen are presented as tandem pairs, the frequency of occurrences of each tandem pair of symbols that exists in the strings of symbols is calculated.
This is simply done by counting the number of times each possible tandem pair appears in the list of tandem pairs that is generated from the sequences of the peptides found in the screen. Thus, for example, for the sequences found in Example 1, discussed below and shown in Figure 2A, the tandem pairs of symbols are shown in Figure 2B, and the numbers of occurrences of each amino acid pair in the eleven affinity-selected peptides were scored and listed according to their occurrences in Figure 2C. This corresponds to "Product I" in Figure 1.

[0064] The next step is to identify those tandem pairs of symbols whose number of occurrences is statistically significant. Thus, for example, if the number is no more than would be expected from a random appearance of such pairs, it is not considered to be statistically significant. The expected theoretical number of specific amino acid pairs (f_r) is calculated as the product of the theoretical frequency of a pair (n_r) and the total number of pairs in a specific set of peptides (A). Thus, the total number of tandem pairs (A) in Figure 2B is 141. The anticipated frequency (n_r) of the most frequently found tandem pair, UB, can be seen from Figure 3A as being 2.34%. Thus, the expected theoretical number (f_r) of UB pairs among the 141 tandem pairs of Figure 2B is 3.3 (0.0234 × 141), as can be seen in the stippled column above UB in Figure 3B. Even taking into account random error, it is apparent that the experimental number of occurrences (10) is substantially greater than the theoretical number and, thus, the number of occurrences for UB is statistically significant. On the other hand, the experimental number of occurrences of the reverse tandem pair, BU, is not significantly greater than its theoretical number.

[0065] The next step is to map, on a three-dimensional model of the proteinaceous material, those pairs of amino acids represented by the statistically significant tandem pairs of symbols.
For the purpose of this mapping, a "pair of amino acids" is two amino acids, each of which is accessible to the surface of the proteinaceous material and whose alpha-carbons are separated by no more than a predetermined distance. Preferably, this is done by first preparing a database containing the physical distances between each pair of alpha-carbon atoms in the entire proteinaceous material. As the three-dimensional structure of the proteinaceous material has already been solved as a prerequisite to its use in the process of the present invention, it is readily within the skill of one of ordinary skill in the art to calculate the distance between any two alpha-carbons, repeat this for every possible pair, and then rank each pair in order of the distance between them. This is done without regard to whether or not the members of the pair are adjacent to one another in the linear sequence that makes up the proteinaceous material. This may be done by hand or by computer.

[0066] As indicated above, those pairs that are considered to be relevant in determining the binding surface are those having a separation of alpha-carbons less than a predetermined distance.

[0067] Using the parameters defined above, amino acid pairs from the database of amino acid pairs of the proteinaceous material ranked by the distance separating them can be analyzed to determine which of those pairs are considered to be relevant. Each of the relevant pairs may then be mapped on the three-dimensional model of the proteinaceous material.

[0068] The next step is to determine clusters of those mapped relevant amino acid pairs, wherein each amino acid pair in the cluster is topographically related to at least one other pair in the cluster. Preferably, clustered pairs are determined by an appropriate rule of connectivity.
For example, a cluster of pairs may be determined when a member of one pair can be connected to a member of a second pair, which, in turn, is connected to a third, and so forth. Two pairs are considered to be connected if they are topographically related. Two amino acid pairs are topographically related if they either share an amino acid residue or each has an amino acid residue that falls within a predetermined distance of the other. In either case, a residue that is tandem (i.e., linearly consecutive) to a residue of a predicted pair is considered to be topographically related if it also participates in a predicted pair. Thus, one rule of connectivity is to require that each pair actually share an amino acid residue (or a tandem residue) in order to be considered topographically related. In another embodiment, however, it is only necessary that each have an amino acid residue that falls within a predetermined distance of the other. This predetermined distance must be greater than the distance that defines a pair of amino acids when determining what is a relevant amino acid pair in the step of mapping the statistically significant amino acid pairs onto the surface of the three-dimensional model of the proteinaceous material. Thus, the predetermined distance for determining whether two amino acid pairs are topographically related may be as great as three times the distance defined above with respect to the mapping step. Such a predetermined distance may preferably be 18-36 Å, although it could be as great as 45 Å.

[0069] A cluster has two or more pairs of amino acids as determined by the rule of connectivity that is used. Any cluster so identified is a predicted region on the surface of the proteinaceous material, representing at least a part of the binding surface being sought. The greater the number of pairs of amino acids in the cluster, the greater the likelihood that this cluster represents the binding surface or a significant portion thereof.
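The shared-residue rule of connectivity described in paragraph [0068] can be illustrated with a short sketch. It is not the inventors' implementation. It assumes that each mapped pair is given as a tuple of residue numbers from a single, contiguously numbered chain, that "tandem" means residue numbers differing by one, and that the residue numbers in the usage lines are invented for illustration. The distance-based embodiment is not covered.

```python
def topographically_related(pair_a, pair_b):
    """Shared-residue rule: two pairs are related if they share a residue
    or contain residues that are tandem (consecutive) in the sequence.
    Assumes contiguous residue numbering on one chain."""
    return any(abs(r - s) <= 1 for r in pair_a for s in pair_b)


def find_clusters(pairs):
    """Group mapped pairs into clusters with a small union-find structure.
    Only groups of two or more pairs are returned, largest first."""
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if topographically_related(pairs[i], pairs[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    clusters = [g for g in groups.values() if len(g) >= 2]
    return sorted(clusters, key=len, reverse=True)


# Hypothetical residue numbers, for illustration only.
mapped_pairs = [(10, 45), (45, 80), (81, 120), (200, 230)]
print(find_clusters(mapped_pairs))
```

In this toy input the first three pairs chain together (45 is shared, 80 and 81 are tandem), while the isolated pair (200, 230) does not form a cluster because a cluster requires at least two pairs.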
Each cluster identified by means of the process of the present invention is a predicted region on the surface of the proteinaceous material representing the binding surface.

[0070] In order to refine the precision of the prediction of the binding surface, a number of steps may optionally be taken. This is particularly useful when an analysis predicts a number of clusters within the antigen and one requires objective criteria to focus on the most physiologically relevant prediction.

[0071] As the first step, and as indicated above, the largest and most elaborate cluster, containing the largest number of different pairs and the largest number of different residues, is identified.

[0072] In order to further refine the prediction once one has screened the mAb with a given type of library (such as a linear p3 20mer library) and obtained peptides and a prediction, a second screen can be conducted against a different type of library (e.g., a constrained S-S 12mer library) and another analysis conducted so as to try to identify overlapping clusters or segments of clusters. It could be concluded that the overlaps are more likely to be relevant than isolated individual clusters.

[0073] Another refinement involves conducting mutagenesis of the residues in the predicted clusters to identify those mutations that most affect the binding.

[0074] A further refinement would be to create mutations of the CDR loops of the mAb. If one is able to introduce a mutation that affects the binding of some of the peptides but not binding to the antigen, one can exclude the affected peptides from the analysis and see if the prediction now focuses more on a specific cluster.

[0075] Another possible refinement is to screen the peptides recognized by the mAb against polyclonal serum of relevant patients. It would be expected that peptides that most resemble (structurally or functionally) the bona fide epitopes of the pathogenic antigens would be recognized by polyclonal serum.
The analysis of the present invention can then be conducted using the peptides that are recognized by more than the single original mAb.

[0076] Another refinement would be to try to obtain different mAbs that compete for the antigen against the original mAb. If any of the peptides from the original mAb are bound by the competing mAb, these peptides should be given more weight in the analysis.

[0077] Finally, the competing mAbs could be screened against peptide libraries in order to obtain new peptides. These can be used to conduct predictions and see if any new clusters coincide or overlap in part with the original clusters. Furthermore, if the competing antibodies produce peptides that are recognized by the original mAb, these should be given more weight in the analyses.

[0078] Once a cluster of amino acids is identified using the process of the present invention discussed above, this information may be used to identify a basic element of the binding surface on the proteinaceous material. The first step in accomplishing the identification of such a basic element is to analyze the amino acids of the cluster as mapped on the three-dimensional model of the proteinaceous material. The perimeter of the predicted binding surface is defined by identifying the outermost amino acids of the binding pairs in the cluster.

[0079] In the next step, one identifies all of the other amino acids on the surface of the proteinaceous material situated either within the perimeter of the predicted binding surface or within a perimeter that is extended from the positions of the outermost amino acids of the binding pairs in the cluster by a predetermined distance. That predetermined distance is preferably about the same as that described above in the context of the rules of connectivity, i.e., 18-36 Å, although it could be as great as 45 Å. Preferably, it is about 20 Å.
Thus, all amino acids within the perimeter of the outermost amino acids of the binding pairs in the cluster or otherwise within a predetermined distance of one of those outermost amino acids, which predetermined distance is the same as that for finding relevant amino acid pairs, are all identified in this step.

[0080] Finally, linear segments of the proteinaceous material are identified according to the following requirements. The first and last residues of each such linear segment must be amino acid residues identified in either of the preceding two steps. None of the intermediate amino acids of the segment may be an amino acid on the surface of the proteinaceous material that was not identified in either of the previous two steps. Each such linear segment is identified as being a basic element of the binding surface. In addition, any single amino acid identified in either of the previous two steps that is not a part of one of the linear segments identified herein is considered to be a basic element consisting of a single amino acid.

[0081] The "basic elements", which are determined using the analysis discussed above, may each be peptides having from one to twenty or more residues. The basic elements themselves have utility as follows: (1) They may be used as a vaccine with the expectation that antibodies raised against them will cross-react with the binding surface and, therefore, cause neutralization, a mechanism similar to that by which the neutralizing binding molecule that one started with causes neutralization. Besides raising antibodies that might be neutralizing, the basic element itself might prevent the interaction of the proteinaceous material with its target that is necessary for its activity. (2) If the basic element is part of an enzymatic region, then the basic element may serve as an inhibitor of the enzymatic reaction by sequestering the substrate.
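The segment rule of paragraph [0080] can be read as a single pass along the linear sequence. The sketch below is one possible interpretation, not the inventors' code. It assumes that residues are numbered contiguously, that the residues identified in the two perimeter steps and the surface-accessible residues are supplied as sets of residue numbers, and that buried residues (absent from the surface set) are permitted as intermediates. The function name and the example numbers are hypothetical.

```python
def basic_elements(identified, surface, first, last):
    """Return (start, end) residue ranges of basic elements.

    identified: residue numbers found in the two perimeter steps
    surface:    residue numbers accessible at the protein surface
    A segment starts and ends on an identified residue and is broken
    by any surface residue that was not identified. Buried residues
    between identified residues do not break the segment.
    """
    elements = []
    start = end = None
    for pos in range(first, last + 1):
        if pos in identified:
            if start is None:
                start = pos        # open a new segment
            end = pos              # extend to the latest identified residue
        elif pos in surface and start is not None:
            elements.append((start, end))  # unidentified surface residue: close
            start = end = None
    if start is not None:
        elements.append((start, end))
    return elements


# Hypothetical numbering: residues 7 and 9 are buried, 10 is an
# unidentified surface residue, 12 stands alone.
identified = {5, 6, 8, 12}
surface = {5, 6, 8, 10, 12}
print(basic_elements(identified, surface, 1, 15))  # [(5, 8), (12, 12)]
```

A single identified residue that cannot be joined to a neighbour comes out as a one-residue range, matching the statement that such residues are basic elements of a single amino acid each.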
[0082] With respect to the first utility, one can assume that, if a functional antibody response is elicited against a segment of the epitope, that might suffice to interfere with virus infection or physiology. Thus, one might not have to functionally reconstitute the original surface so that it is recognized by the original monoclonal antibody in order to justify its use as a vaccine component.

[0083] While it may not be reasonably predictable that each basic element will function as a vaccine or have one of the other utilities discussed above, each has a much increased probability of such activity over a random peptide. One of skill in the art will readily be able to test each such basic element for the specified activity without engaging in undue experimentation.

[0084] The basic elements previously identified may next be used in order to produce a molecule that is a binding surface mimetic.

[0085] When making a molecule mimetic of a binding surface, the peptides of the basic elements are connected in such a manner as to form a single molecule that substantially maintains the spatial orientation that the same amino acids have when part of the binding surface. The term "connected", in this sense, includes connecting the peptides by covalent bonds (peptide bond or other), directly to one another or indirectly by means of appropriate linkers selected so as to cause the molecule to maintain the desired spatial orientation. For example, a preferred linker used to impose a turn is GPG. A second linker that also imposes a turn is the tetrapeptide TPGS (residues 9-12 of SEQ ID NO:104). If one wants to force rigidity, one might introduce more proline residues. A more flexible and hydrophilic linker might contain glycines, serines and threonines.

[0086] When connecting the peptides, one or more of the peptides of the basic elements may need to be synthesized in reverse order so as to maintain the spatial orientation of the binding surface.
The peptides may also be connected by means of non-peptide bonds, such as disulfide bonds. Examples of mimetic molecules connected in such a manner are discussed below with respect to the G12 mimetic, which is mimetic of the binding surface of gp120 recognized by the CG10 antibody, and with respect to the molecules that are mimetics of the binding surface of gp120 recognized by the b12 antibody.

[0087] The term "connected" in this context also includes grafting of the peptides onto a scaffold, which is essentially the same as the use of extended linkers. The grafting of loops onto a scaffold is common in the field of antibody modifications, in which CDR-loop grafting is common practice. In CDR-loop grafting, CDR loops from one antibody are swapped onto a scaffold of another, a process required in the humanization of murine antibodies to be used in pharmaceutical compositions in people. To find an appropriate scaffold for such grafting, one would screen the database of protein crystals to find spatial orientation structures on other proteins that are similar to the spatial orientation of the basic elements being connected. If such a structure is found, then the sequences of the peptides of the basic elements can be substituted at the corresponding positions of the scaffold.

[0088] Another example of a scaffold is to actually use the same proteinaceous material of which the binding surface is a part, modified in such a manner as to accentuate the immunogenicity only of the binding surface. An example of this may be found in Pantophlet et al (2003). In this reference, hyperglycosylation is introduced into gp120 so as to obscure some areas of the molecule and reveal others to become more immunogenic.

[0089] Once the peptides of the basic elements are identified, one can vary the sequence of those peptides to the extent that the corresponding portion of other cognate proteins may include such variability.
Two proteins are "cognate" if they are produced in different species, but are sufficiently similar in structure and biological activity to be considered the equivalent proteins for those species. Two proteins may also be considered cognate if they have at least 50% amino acid sequence identity (when globally aligned with a PAM250 scoring matrix with a gap penalty of the form q+r(k-1), where k is the length of the gap, q = -12 and r = -4; percent identity = number of identities as a percentage of the length of the shorter sequence) and at least one biological activity in common. Similarly, two genes are cognate if they are expressed in different species and encode cognate proteins.

[0090] For example, the proteinaceous material used in the examples of the present application is gp120. One can vary the sequence of the peptides of the basic elements reported herein to the extent that the corresponding portions of other isolates of gp120 may include such variability. Thus, the gp120 sequence of SEQ ID NO:1, which is used throughout the present specification, is the sequence of the HXB2 isolate. Many other isolates of gp120 are known, as are their sequences, and these isolates represent the extensive number of sub-types of HIV. Table 1, for example, is a comparison of the sequences of the basic elements of the b12 binding surface of HXB2 (which is a clade B isolate) and the consensus sequence for each of clades A-G and O. It can be seen that many of the residues are highly conserved while others are quite variable.

Table 1

Comparison between the Cluster A Segments of the B12 Epitope of HXB2 with the Corresponding Positions in the Consensus Sequences of Various HIV-1 Clades

Clade   391  392  393  394  395  396   SEQ ID NO
Hxb2    F    N    S    T    W    F     1
A       F    N    S    T    W    N     108
A1      F    N    S    T    W    N     108
A2      F    N    S    T    W    ?     109
B       F    N    S    T    W    N     108
C       F    N    S    T    Y    N     110
D       F    N    S    T    W    N     108
F1      F    N    D    T    G    S     111
F2      F    N    N    T    E    V     112
G       F    N    N    S    I    L     113
O       F    N    Y    T    F    S     114
01      F    N    N    T    C    I     115
02      F    N    S    T    W    N     108
03      F    N    S    T    W    N     108
04      F    N    S    T    Y    M     116
06      F    N    ?    S    I    ?     117
08      F    N    G    T    Y    ?     118

Clade 457-468 SEQ ID NO

Hxb2 D G G N S - - - N N E S E I F 1

A D G G V N - - 9 ? N s ? N E T F 119

A1 D G G V N - - 9 ? N s ? N E T F 119

A2 D G G ? N - - 9 ? ? N E T F 120

B D G G N N ? ? 9 9 9 9 T N T T E I F 121

C D G G N N - ? 9 9 9 9 T N T T E T F 122

D D G G A N - ? ? - ? ? ? ? N S s N E T F 123

F1 D G G Q ? - - 9 9 9 S 9 T E T F 124

F2 D G G K N - - .. 9 N G S E T L 125

G D G G N N - ? - - - ? ? ? T S T N E T F 126

O D N P W N - ? - - - ? ? T s N ? N A T F 127

01 D G G A N - ? - - - 1 1 1 N T T N E T F 128

02 D G G N N - - - 9 ? N S T N E T F 129

03 D G G N Q - - - S N V T E I F 130

04 D G G ? ? - - _ _ 9 9 9 9 9 9 9 N E T F 131

06 n β U M ? •p S 9 S E T F 132

08 D G G R T - - _ 9 E S N D T E I F 133

Data obtained from the website found at hiv-web.lanl.gov. ?: variable positions; -: no residue at this position.

[0091] Table 2, hereinbelow, is an illustration of an example of anticipated variations in the sequence of the basic elements of the b12 binding surface that would be regarded as still part of the present invention. Any natural variation of the predicted segments may be substituted for the residue identified at a particular position with respect to the HXB2 sequence. Table 2 represents the results of BLAST analysis of the segment of HXB2 HIV-1 gp120 against all the other known HIV sequences. Each residue of the cluster A peptides was then analyzed for variations seen in the top 50 closest peptide sequences found. As is to be expected, there is just a limited degree of variation. The same BLAST analysis can be conducted taking the consensus sequence of each of the clades. HXB2 is a representative of clade B.

[0092] Table 2 below shows that variability may be found within the consensus of each clade. Residue variation at each position was obtained by multiple sequence alignment with HXB2 (50 homologs were analyzed):

Table 2

Residue in HXB2    Natural Variations

ILE 360            A,F,H,K,N,R,S,T,V,Y
PHE 361            L
LYS 362            A,D,I,N,Q,R,T,V

PHE 391            I,L
ASN 392            S,T
SER 393            D,G,N,W,Y
THR 394            A,I,L,N,V
TRP 395            E,H,I,N
PHE 396            D,L,N,Q,R,T,Y

ASP 457            N
GLY 458            A,E,S,T,V,W,Y
GLY 459            D,I,N,P,Q,R,T,V
ASN 460            A,I,K,Q,S,T,V
SER 461            D,E,G,N
ASN 462            D,E,G,I,K,Q,S,T,V
ASN 463            D,Q,S,T
GLU 464            D,G,K,N,Q,R,S,T
SER 465            E,H,I,N,T
GLU 466            L
ILE 467            N,T,V
PHE 468            G,I,L,N,S

[0093] Using a similar analysis, known variations that diverge from target sequences of each clade can be determined, all of which are considered to be encompassed by the present invention. Thus, for example, the tryptophan residue in position 395 in HXB2 has diverged to tyrosine in clade C, glycine in clade F1 and cysteine in clade 01. These three variations are not included in the BLAST analysis described in Table 2, which illustrates that such a BLAST analysis for each clade consensus sequence would certainly expand the range of anticipated variation permissible for each amino acid position claimed as part of this invention.

[0094] Any of the residues that appear at a given position in any clade may be substituted for the residue in the HXB2 isolate which has been specified when making pharmaceutical compositions comprising the basic elements or when making the molecules that are mimetic of the binding surface. Preferably, the highly conserved residues are maintained.

[0095] Once a molecule is identified that is mimetic of the binding surface, it is produced for ultimate use. If the mimetic is a peptide, it may either be synthesized or produced recombinantly by methods that are well known to those of ordinary skill in the art. Where necessary, disulfide bonds may be caused to be formed in such peptides by means that are well known to those of ordinary skill in the art.

[0096] The present invention may be used to produce a vaccine in any situation in which a desirable neutralizing antibody is known for any given pathogen and the antigen that is recognized by that antibody has a known three-dimensional structure. Oftentimes, it is undesirable to use the pathogen itself or even the antigen as a vaccine in view of problems of toxicity or lack of immunogenicity or immunodominance of unrelated epitopes.
By means of the present invention, the binding surface of the antigen may be identified and a molecule mimetic thereof produced which, when used as an immunogen, will in effect raise antibodies similar to the desirable neutralizing antibody used as the starting material. A useful non-toxic vaccine thereby becomes available for active immunization.

[0097] If the epitope of the mAb used coincides with a binding site on the proteinaceous material for another ligand, such as, for example, the CD4 or CCR5 binding site on gp120, then the predicted and reconstituted epitope might itself bind the ligand, such as CD4 or CCR5 in the above example. Thus, such a reconstituted epitope could be used directly as an antiviral therapeutic. It would not involve undue experimentation to test any such reconstituted epitope for this utility and then formulate appropriate therapeutic compositions, including appropriate pharmaceutically acceptable excipients. Furthermore, it is well within the skill of those of ordinary skill in the art to empirically determine appropriate dosages and means of administration for such an antiviral therapeutic.

[0098] If the mAb that one starts with binds specifically to the active site of an enzyme, then it would be expected that the reconstituted epitope could act as a catalytic peptide and could be used for catalysis in a bioreactor scenario. Again, it is within the skill of the art to determine by routine experimentation whether any given reconstituted epitope in such a situation does act as a catalytic peptide. If it does have such a property, those of skill in the art could use the catalytic peptide for catalysis in a bioreactor scenario without engaging in undue experimentation.

[0099] The approach of the present invention is based on the assumption that the collection of random peptides that bind specifically to the antibody being studied must reflect in some manner the paratopes of that antibody.
Furthermore, as the antibody was produced to correspond to the native antigen, the native epitope should in some fashion correlate with this same collection of affinity-selected peptides. In the following examples, the reference mAbs, 17b and 13b5, were useful for the formulation of the algorithm, providing known positive controls. Of course, other control antibodies that have been solved in a co-crystal could also have been selected. Once the system appeared to be working, the prediction of the CG10 and b12 epitopes was undertaken.

[0100] The following examples provide four lines of evidence that give credence to the approach of the present invention.

[0101] The epitope predicted for CG10 entails segments that overlap with the 17b epitope. Both these mAbs have been studied extensively and have been found to be distinct yet to compete for gp120 binding, suggesting that there should be common shared elements in their corresponding epitopes (Sullivan et al, 1998).

[0102] The epitopes do not coincide completely, and one can conclude this also by comparing the phage sequences found by screening the library with both antibodies (Figures 2A and 7A). The phages that were selected by screening with the 17b mAb do not interact with the CG10 mAb and vice versa (data not shown). Moreover, as previously discussed, the sequences of the CDR loops of both antibodies are also different, particularly the CDR3 loops of the heavy chains (Figure 6). It is obvious, therefore, that these antibodies should not necessarily have the same contact residues even when they form overlapping interfaces with gp120.

• Five of the predicted amino acids of the CG10 epitope have been shown by mutational analyses performed by others to be critical for mAb binding to gp120 (Rizzuto et al, 1998).

• The construction of the reconstituted epitope that binds CG10 and competes against the CD4/gp120 complex.
[0103] This last criterion is especially important, as it illustrates that not only is the procedure of the present invention able to identify a limited cluster of residues that appear to be critical for mAb binding, but it is also possible to "string them together" in a meaningful way. The concept of reconstituting discontinuous epitopes is not new. There have been previous successful studies in which segments of a protein have been produced as contiguous elements in a single synthetic peptide structure (Ottl et al, 1999; Villen et al, 2001). Functional reproduction of the discontinuous antigenic site D of foot-and-mouth disease virus (FMDV) has been achieved by means of synthetic peptide constructions that integrate into a single molecule each of the three protein loops that define the antigenic site. Antisera to the peptide were moderately neutralizing of FMDV in cell culture and partially protective of guinea pigs against challenge with the virus. These results demonstrate functional mimicry of the discontinuous epitope D by the peptide, which is therefore an obvious candidate for a peptide-based vaccine against FMDV (Villen et al, 2001).

[0104] The G12 construct produced in Example 3 reconfirms, therefore, two major aspects of the present invention: the predictability of epitopes by combining the phage analyses with the algorithm of the present invention, and the feasibility of reconstituting discontinuous epitopes.

[0105] The reconstructed epitope molecules corresponding to the predicted antigenic determinants can be used as a synthetic vaccine that will generate antibodies that have the same specificity as the original antibody used in the analysis. The reconstituted CG10 epitope that has been produced based on the predictions of the present invention can be used as a synthetic vaccine.

[0106] The present invention will be better understood by consideration of the following non-limiting examples.
Example 1: mAb 17b
[0107] The concept of the present invention will first be demonstrated using a control model system. The ternary complex of the core of HIV-1 gp120 with a truncated version of its receptor, soluble CD4 (sCD4), and a Fab fragment of the mAb 17b has been crystallized and solved to 2.5-2.2 Å resolution (Kwong et al, 2000; Rizzuto et al, 2000; Wyatt et al, 1998; Kwong et al, 1998; Rizzuto et al, 1998). As a result, the 17b epitope is known to comprise four discontinuous beta-strands. Thus, this information allows the known mAb 17b epitope to be used as a control model system.
[0108] MAb 17b was used to repeatedly screen combinatorial phage display peptide libraries until a collection of peptides was obtained. As is illustrated in Figure 2A, eleven random peptides were isolated through the comprehensive screening of three combinatorial phage display peptide libraries. Each library represented 1-5 × 10⁹ recombinant random 12mer peptides flanked by constant cysteine residues so as to constrain a looped structure at the NH2 terminus of the major coat protein, pVIII, of the filamentous bacteriophage fd (one of the selected peptides happens to contain only 8 residues). No obvious homology exists between the peptides and gp120.
[0109] In order to derive the epitope of mAb 17b from the 11 peptides described above, the novel computational algorithm discussed above was utilized (Figure 1 is a general scheme of the final algorithm).
Identification of Over-Represented Amino Acid Pairs:
[0110] Step I in the computer algorithm is to calculate the number of occurrences of specific amino acid pairs present in the affinity-selected phage-displayed peptides, and to identify those pairs that are substantially over-represented as compared to the random frequencies of pairs statistically anticipated.
[0111] Calculation of random frequencies of amino acid pairs was performed as discussed above.
As contacts between the mAb and the antigen are through functional moieties of the R-groups, conserved residues were consolidated into the six functional subgroups of amino acids, as discussed above, and given the following single letter codes: R,K = B; E,D = J; S,T = O; L,V,I = U; Q,N = X; W,F = Z.
[0112] The anticipated frequencies (n_r) of the amino acid pairs for totally random peptides were then calculated and are given in Figure 3A.
[0113] Two types of pairs must be considered: completely random pairs, in which both partners of a pair are random, and "semi-random" pairs, in which only one partner is random whereas the other is constant (as is the case for the first and last pair of each peptide, which contain the constant flanking cysteine residues).
[0114] The peptide sequences were converted into their amino acid pair equivalents (Figure 2B). The 11 peptides comprise a total of 141 amino acid pairs (119 are completely random and 22 are "semi-random"), representing 84 different pairs of the 169 possible pairs (considering the 13 amino acid categories, see Figure 3A). The number of occurrences of each amino acid pair in the 11 affinity-selected peptides was scored, and a list of amino acid pairs ranked according to their occurrences is given in Figure 2C (this corresponds to "Product I" in Figure 1). Then, the expected theoretical number of specific amino acid pairs (f_r = n_r × A, where A is the number of amino acid pairs in a set of peptides) was calculated for the 17b-selected peptides on the assumption of randomness. Figure 3B compares the theoretical occurrences (f_r) to the experimental occurrences in the isolated 17b-selected peptides. For example, ten UB pairs exist in the eleven 17b-peptides, which is at least twice as many as would be expected randomly.
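The counting step of paragraphs [0110]-[0114] can be illustrated with a minimal Python sketch. It is not the code used by the inventors. The group letters follow the list given above (with Z standing for the W,F group). The toy peptides, the uniform expected frequencies and the `fold` threshold are hypothetical placeholders; the actual n_r values are those of Figure 3A and the actual significance criterion is the one defined earlier in the specification. Random and "semi-random" pairs are not distinguished here; that distinction would enter through the n_r values supplied.

```python
from collections import Counter

# Group codes from the text; residues not listed keep their one-letter code.
GROUPS = {"R": "B", "K": "B", "E": "J", "D": "J", "S": "O", "T": "O",
          "L": "U", "V": "U", "I": "U", "Q": "X", "N": "X", "W": "Z", "F": "Z"}


def consolidate(peptide):
    """Rewrite a peptide in the 13-category alphabet."""
    return "".join(GROUPS.get(aa, aa) for aa in peptide)


def tandem_pairs(peptide, flank="C"):
    """List the tandem pairs of a peptide displayed between two constant residues."""
    seq = flank + consolidate(peptide) + flank
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def count_pairs(peptides):
    """Return (occurrences per pair, total number of pairs A)."""
    counts = Counter(pair for pep in peptides for pair in tandem_pairs(pep))
    return counts, sum(counts.values())


def over_represented(counts, total, n_r, fold=2.0):
    """Keep pairs observed at least `fold` times the expected f_r = n_r * A.

    `fold` is a hypothetical stand-in for the significance criterion.
    """
    hits = {}
    for pair, observed in counts.items():
        expected = n_r[pair] * total
        if observed >= fold * expected:
            hits[pair] = (observed, expected)
    return hits


if __name__ == "__main__":
    toy_peptides = ["LKLRSN", "VRPT"]             # invented, not from Figure 2A
    counts, total = count_pairs(toy_peptides)
    uniform = {pair: 1 / 169 for pair in counts}  # placeholder for Figure 3A
    print(counts.most_common(3), total)
    print(over_represented(counts, total, uniform, fold=30))
```

Keeping the expected frequencies as an input, rather than computing them inside the function, leaves room for library-specific codon bias and for the separate treatment of pairs that involve a constant flanking residue.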
Thus, of the 7 most prevalent pairs in the 17b-peptides, four (UB, BC, BP and OX) are statistically more abundant than randomness would predict and thus represent the set of the most statistically significant pairs (SSP, see algorithm in Figure 1). Note the propensity for positively charged residues (B = K,R), which is not surprising in view of the acidic nature of the CDR3 loop of the heavy chain of 17b (Kwong et al, 1998; see also Figure 6).
[0115] The next step is to identify the most prevalent pairs of amino acids described above within the three-dimensional structure of gp120. Then, based on these clustered pairs, candidate epitopes are predicted. The present novel computer algorithm is designed to accomplish these two tasks.
Identification of Epitope-Relevant Amino Acid Pairs:
[0116] In essence, pairs of amino acids in the antigen are sought that are functionally represented by the tandem residues in the 12mer peptides displayed on the phages. The dimensions of what is a pair have been defined hereinabove. Using these definitions, each epitope-relevant amino acid pair in the proteinaceous material can readily be identified from the database of distances between each possible pair of amino acids in the protein.
Prediction of Candidate Epitopes:
[0117] Once the relevant amino acid pairs are identified on the surface of the antigen, clustered pairs can then be defined. A cluster of pairs is determined when a member of one pair can be connected to a member of a second pair, which in turn is connected to a third and so forth. Thus, for example, Figure 4A provides a list of the SSP clusters relevant to the 17b-selected peptides (Product III, Figure 1). Seven clusters could be defined. The largest is Cluster A. Note that the UB pair L111, K117 (first pair in Cluster A) is connected to the BC pair K117, C205 as K117 is common to both. The OX pair T202, Q203 is included as T202 is in tandem to I201.
BU was included despite the fact that, while abundant, it is statistically within the error set for randomness. It was included to test whether the inclusion would improve the prediction. The difference without BU (data not shown) is only slight. The same binding elements are predicted, except that amino acids 419 and 208 are not included in the absence of BU. Thus, it should be understood that the definition of statistical error given above is not the only definition of statistical significance that can be used, and that making such judgment calls is within the skill of the art.
[0118] An alternative embodiment of the present invention is to consider forward and reverse pairs, e.g., BU and UB, as being equivalent when conducting the SSP analysis. Thus, the number of each would be added together, and the n_r for each would be added when determining which pairs are relevant.
[0119] Thus, one can define the boundaries of a given cluster through rules of connectivity. The cluster of highest relevance is considered to be the cluster that includes the maximal number of pairs, types of pairs and total amino acids associated via connectivity. Cluster A is a particularly attractive candidate epitope as it comprises 27 pairs of 5 different pair types (UB, BP, BC, BU and OX) encompassing a total of 24 different amino acid residues.
[0120] Figure 4E depicts Cluster A in the context of the gp120 crystalline structure. The predicted epitope comprises four segments of the gp120 molecule that together create an anti-parallel beta-sheet surface (Figure 4D). As can be seen, this predicted epitope (Product IV, Figure 1) corresponds very well with the actual contact residues of the 17b epitope as determined from the co-crystal (Figure 4B; Kwong et al, 1998). Thus, it appears that the computer algorithm correctly identified the elements of the bona fide 17b epitope based on the affinity-selected phage-displayed peptides.
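The pair search and the connectivity rules of paragraphs [0116]-[0119] can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of Figure 1 itself. Each residue is reduced to a single point, `cutoff` is a hypothetical distance standing in for the pair dimensions defined earlier in the specification, and surface accessibility is not modeled. Two pairs are treated as connected when they share a residue or contain residues that are adjacent in the sequence, which is one reading of the Cluster A example. The coordinates and the pair marked as invented in the demonstration data are illustrative only.

```python
import math
from collections import defaultdict
from itertools import combinations


def structural_pairs(residues, ssp, cutoff):
    """Find residue pairs within `cutoff` whose group codes form an SSP type.

    residues: {residue_number: (group_code, (x, y, z))}
    """
    found = []
    for (a, (code_a, xyz_a)), (b, (code_b, xyz_b)) in combinations(sorted(residues.items()), 2):
        if math.dist(xyz_a, xyz_b) > cutoff:
            continue
        if code_a + code_b in ssp:
            found.append((code_a + code_b, a, b))
        elif code_b + code_a in ssp:
            found.append((code_b + code_a, b, a))
    return found


def cluster_pairs(pairs):
    """Group (pair_type, res_a, res_b) tuples into clusters, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for _type, a, b in pairs:
        union(a, b)                        # the two members of a pair
    numbers = sorted(parent)
    for r, s in zip(numbers, numbers[1:]):
        if s - r == 1:
            union(r, s)                    # residues in tandem in the sequence

    clusters = defaultdict(lambda: {"pairs": [], "types": set(), "residues": set()})
    for pair_type, a, b in pairs:
        cluster = clusters[find(a)]
        cluster["pairs"].append((pair_type, a, b))
        cluster["types"].add(pair_type)
        cluster["residues"].update((a, b))
    rank = lambda c: (len(c["pairs"]), len(c["types"]), len(c["residues"]))
    return sorted(clusters.values(), key=rank, reverse=True)


if __name__ == "__main__":
    demo = [("UB", 111, 117), ("BC", 117, 205), ("OX", 202, 203),
            ("UB", 201, 117),   # invented pair that brings residue 201 into the cluster
            ("BP", 300, 301)]   # invented, unconnected pair
    for cluster in cluster_pairs(demo):
        print(len(cluster["pairs"]), sorted(cluster["types"]), sorted(cluster["residues"]))
```

Ranking by number of pairs, then pair types, then residues mirrors the relevance criteria of paragraph [0119]; how ties between those three criteria should be weighed is not stated in the text, so the ordering used here is a choice of the sketch.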
Example 2: mAb 13b5
[0121] The algorithm was further tested using a second control mAb. The mAb 13b5 binds HIV-1 p24 antigen (Monaco-Malbet et al, 2000). The epitope of 13b5 has been defined in the 13b5/p24 co-crystal (3 Å resolution) (Berthet-Colominas, 1999). Therefore, 13b5 was used to screen the 12mer constrained-loop peptide library described herein. Initially, 10 peptides were isolated and used for the computational analysis. A number of poorly defined clusters were predicted, including the region of the genuine epitope. In order to improve the statistics and sharpen the cluster analysis, additional libraries were screened until a total of 16 different 13b5-specific peptides were isolated (Figures 5A and 5B). These were then used for analysis and the algorithm predicted 4 clusters (Figure 5C). The largest and most comprehensive corresponded remarkably well to the genuine epitope, illustrating the predictive power of the method (Figure 5D).
Example 3: mAb CG10
[0122] In view of the above success in epitope prediction, the phage/algorithm analysis of the present invention was then applied to a mAb whose corresponding epitope is unknown. For this, the mAb CG10, which binds to a highly conformation-dependent epitope peculiar to the CD4/gp120 complex (Gershoni et al, 1993), was used. In view of the fact that 17b and CG10 compete for CD4/gp120 complex binding, it was expected that the CG10 epitope should overlap to some degree with the 17b epitope (Sullivan et al, 1998). On the other hand, comparing the molecular structure of the CDR loops of both mAbs shows marked differences between the two (Figure 6). The CDR3 of the 17b heavy chain is extremely acidic (containing 5 E or D residues out of 12 amino acids). In contrast to this, the CDR3 of the CG10 heavy chain is rather hydrophobic. One should expect, therefore, very distinct random peptides for each mAb.
[0123] CG10 was used to screen the phage-display peptide libraries produced in the laboratory of the present inventors until a total of 28 phages were affinity isolated (Figure 7A). These were then translated into their corresponding amino acid pair equivalents (Figure 7B), and the list of amino acid pairs ranked according to the number of occurrences (Product I, Figure 1) is given in Figure 7C.
[0124] Comparison of the usage of the amino acid pairs found in the CG10-specific peptides versus the expected random usage is given in Figure 7D. As is to be expected, in comparison with the same analysis of 17b, there are marked differences. The most prevalent 17b pair, UB, is very rare in the CG10-specific peptides. PB, the most abundant CG10 pair, is relatively rare for 17b (2 incidences). Clearly, the general profile of amino acid pairs used to produce CG10-binding peptides is markedly different from that found for 17b. These differences in composition explain the fact that, whereas 17b and CG10 compete for gp120, none of the 17b-selected phages are recognized by CG10 and vice versa.
[0125] The search for clusters on the gp120 surface (Step II, Figure 1) for CG10 produced seven clusters, of which only Cluster A seemed extensive and likely as an epitope candidate (Figure 8). Figure 9A shows the position of this cluster within the crystalline model of gp120. Indeed, as expected, the predicted CG10 epitope does overlap with the 17b epitope yet contains unique and distinct elements as well. The question is how confident one can be in this prediction. Evidence for the validity of the prediction can be found in the comparison of the predicted epitope and the mutation analysis performed by Rizzuto et al (1998) for CG10 binding. As is illustrated in Figure 9 (compare Figure 9B with Figure 9C), five predicted residues correspond with five mutations previously shown to affect CG10 binding to the CD4/gp120 complex.
Reconstitution of the CG10 Epitope:
[0126] The information as to the mapping of the CG10 epitope can best be used by physically creating a molecule that effectively reconstitutes the CG10 epitope based on the prediction. For this, four short peptide segments of gp120 that were predicted to comprise the CG10 epitope were used to generate an epitope mimetic (see Figures 10A-10C). To fill the gap between amino acids K207 and K421, a fifth segment comprising amino acid residues F381-E382 was introduced. The logic was based on the fact that in the original gp120 molecule these amino acids form non-covalent contacts with K421 and Y435, stabilizing the conformation (Kwong et al, 1998). As is shown in Figures 10A-10C, the five segments were either connected directly to one another or via two tri-peptide linkers (GPG) designed to impose a turn in the structure. Since the epitope consists of anti-parallel beta-strands, fragments #3, #4 and #5 (Figure 10C) within the reconstituted sequence were introduced in the opposite direction as compared to the original antigen sequence to satisfy spatial requirements. Thus, the 25 amino acid sequence derived from gp120 (Figure 10C), including the two predicted cysteine residues, cys119 and cys205, which therefore create a constrained loop, was introduced into the N-terminus of the pVIII protein of the fd bacteriophage. Phages were isolated and tested for their specific binding of CG10. As can be seen in Figure 10D, the phage expressing the reconstituted epitope (designated G12) readily bound CG10, in contrast to the lack of binding for the control fth1 phage. The specificity of the binding is shown in Figure 10E, where only CG10 bound G12 as compared to three other mAbs that bind other epitopes. Note that 17b did not bind G12, once again demonstrating the distinct nature of these two overlapping epitopes. Finally, G12 was found to efficiently compete for CG10 binding against the CD4/gp120 complex (Figure 10F). In view of these results, it was concluded that not only was the present invention able to predict the CG10 epitope based on phage analysis and application of the herein described computer algorithm, but it was also able to reconstitute the epitope based on this prediction.
Example 4: mAb b12
[0127] One of the most important mAbs that interacts with the CD4 binding site on the HIV-1 surface glycoprotein gp120 is mAb b12, which is known to neutralize many primary and TCLA viruses very efficiently (Burton et al, 1994). This antibody has been extensively studied, and the first step of the present invention has already been reported in the prior art in studies by others. Thus, Zwick et al (2001a) and Boots et al (1997) report the screening of peptide libraries using the b12 mAb, as well as the sequences of peptides found to bind thereto. These reported peptides can be used in the algorithm of the present invention so as to predict the binding elements of the binding surface of gp120 that is bound by mAb b12, without the necessity of using the antibody itself.
Thus, 32 peptides from Boots et al (1997) and two peptides from Zwick et al (2001a) (Figure 11A) were translated into their corresponding amino acid pair equivalents (Figure 11B), and the list of amino acid pairs ranked according to the number of occurrences is given in Figure 11C. Comparison of the usage of the amino acid pairs with the statistical frequencies, as shown in Figure 12, allows the identification of those pairs that are particularly relevant.
[0128] The algorithm was then applied to identify the pairs on the surface of gp120, and two clusters were identified as potential epitope elements (Figures 13A-B and 13C-D).
[0129] These analyses predict that three strands of gp120 comprise major elements of the b12 epitope: residues 360-362, 391-396 and 464-468 (Figure 13E). In addition, when the analysis is conducted on the two peptides isolated by Zwick et al (2001a) alone (the last two peptides in Figure 11A), residue Asp457 is predicted to be part of the epitope (the significance of Asp457 is lost when the two Zwick et al (2001a) peptides are taken together with those of Boots et al (1997)). The residues correspond to the numeration of gp120 of the HIV-1 HXB2 isolate (SEQ ID NO:1).
[0130] In order to reconstitute the b12 epitope, one first highlights the predicted segments of the epitope within the three-dimensional model of gp120. In doing so, one immediately appreciates the spatial relatedness of each segment and understands the sequential order that must be maintained in order to construct a single linear peptidomimetic of the epitope. Thus, for example, for construction and topographical considerations, residues 358-359 should be included to give a single continuous element 358-362, which would bring the beginning of the first segment closer to residue 396 of the second predicted segment (Figures 14A and 14B). Furthermore, in order to include predicted residue 457 in the construct, a single continuous element ranging from 457 through 468 is to be used.
Thus, starting with residue 391, one strings the elements together proceeding to residue 396, followed by a short bridging linker that imposes a turn (TPGS, residues 9-12 of SEQ ID NO:104), followed by the sequence 358 through 362. A second TPGS linker (residues 18-21 of SEQ ID NO:104) allows a turn in the construct, to be followed by the reversed-orientation segment 468 through 457. The distance between the ends of this construct is predicted to be 14-15 Å. In order to bridge this gap, one can introduce residues derived from the 110-118 segment, which is the sole peptide element that is predicted for Cluster B. Thus, the first seven residues (SLWDQSL, residues 110-116 of SEQ ID NO:1) were introduced to bridge the gap. This permits at least a portion of another predicted peptide element to be included in the peptidomimetic. Then, flanking cysteine residues are included to produce a constrained looped peptide that contains the sequential as well as spatial orientation of the residues predicted for the b12 epitope. The resultant peptide is shown in Figure 14C (SEQ ID NO:104).
[0131] An additional variation of this kind of construct is illustrated in Figure 15, in which the same elements are incorporated into a double cysteine looped construct. This illustrates that there may be more than one way to connect the basic elements in a manner so as to substantially maintain the relative spatial orientation of the amino acids in the basic elements. Those of ordinary skill in the art, following the instructions in the present disclosure, and being aware of prior art methods of doing so (see, for example, Villen et al, 2001), can readily perform such a step without engaging in undue experimentation.
[0132] In contrast to the preceding example in which the elements of an epitope, or part thereof, are strung together (with or without bridging linkers) into a linear peptidomimetic, the epitope elements may also be incorporated into existing scaffolds.
The concept is based on the fundamental premise that if an epitope element is of a given secondary conformation in the original antigen, it will continue to assume the same structure even when it replaces a different sequence with a similar structure in the context of a foreign polypeptide scaffold. In other words, the gp120-derived peptides, when grafted into the scaffold, will assume their native conformation there.
[0133] The b12 epitope elements, for example, form a beta-sheet comprised of three beta-strands (Figure 16A). The native orientation of these strands can be defined by measuring the distances between the ends of the strands, as is illustrated.
[0134] Once one knows the structure one is interested in (Figure 16A), one can survey the database of solved crystalline structures of proteins and search for compatible scaffolds. In the following case, beta-sheet structures are to be analyzed. It is well known in the art that members of the super gene family of immunoglobulins contain considerable beta-sheet structures. Thus, for example, one can examine the solved structure of CD4, a member of this gene family.
[0135] The model of the CD4 backbone can be visualized with any of the graphic programs such as RasMol, PDB viewer or Protein Explorer. One then can selectively highlight segments of CD4 that assume beta-strand secondary structure (a standard command in any of these programs). Seeking areas that form beta-sheets of the same general dimensions and shape as that of the b12 epitope, it becomes apparent that the three strands of CD4, residues 77-81, 93-95 and 2-7 (of SEQ ID NO:106), are excellent candidates for strand exchanges. Thus, a method for the expression of the b12 epitope is to incorporate the gp120 strands into the corresponding locations of the identified CD4 beta-sheet, as is illustrated in Figure 16B. This will result in a hybrid protein with the sequence of SEQ ID NO:134.
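The two construction routes described above, stringing predicted segments into a linear cysteine-flanked peptide ([0130]) and grafting strands into a scaffold ([0132]-[0135]), amount to simple sequence operations once the segments have been chosen. The following Python sketch illustrates only that bookkeeping. The function names are hypothetical, the sequences in the demonstration are placeholders rather than gp120 or CD4, and nothing here evaluates whether a construct would fold as intended.

```python
def segment(seq, start, end):
    """Residues start..end (1-based, inclusive); reversed when start > end."""
    lo, hi = min(start, end), max(start, end)
    piece = seq[lo - 1:hi]
    return piece[::-1] if start > end else piece


def assemble_linear(seq, elements, flank="C"):
    """Join segments and linkers in order and add flanking residues.

    elements: (start, end) ranges taken from `seq`, or literal linker
    strings such as "TPGS" or "GPG".
    """
    parts = [e if isinstance(e, str) else segment(seq, *e) for e in elements]
    return flank + "".join(parts) + flank


def graft(scaffold, replacements):
    """Replace 1-based inclusive ranges of `scaffold` with new strands.

    replacements: {(start, end): strand}. Ranges are processed from the
    C-terminal end so that earlier numbering stays valid even when a strand
    differs in length from the stretch it replaces.
    """
    out = scaffold
    for (start, end), strand in sorted(replacements.items(), reverse=True):
        out = out[:start - 1] + strand + out[end:]
    return out


if __name__ == "__main__":
    antigen = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # placeholder, position i = i-th letter
    # Forward segment, turn linker, then a segment in reversed orientation.
    print(assemble_linear(antigen, [(3, 5), "TPGS", (10, 8)]))
    # Swap two stretches of a placeholder scaffold for new strands.
    print(graft("abcdefghij", {(2, 3): "XYZ", (7, 8): "Q"}))
```

Expressing a reversed-orientation segment as a range whose start exceeds its end keeps the element list close to the wording of the text, for example "468 through 457".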
[0136] A second example of such a grafted construct is shown in Figure 16C, in which the gp120 segments replace compatible segments of the framework component of the V_H domain of murine IgG1 (SEQ ID NO:107). This will result in a hybrid protein with the sequence of SEQ ID NO:135.
Example 5: Materials and Methods
DNA Preparations
[0137] Single-stranded DNA of filamentous phages was isolated using the QIAprep Spin M13 Kit® (QIAGEN GmbH, Germany).
Monoclonal Antibodies
[0138] The CG10, CG1 and CG9 mAbs used in this study were produced at Tel Aviv University (Gershoni et al, 1993). Human mAb 17b was produced at Tulane University Medical Center (Thali et al, 1993). The mAb 13b5 is a product of bioMerieux, France (Monaco-Malbet et al, 2000).
Library Construction and Biopanning
[0139] Three 12-mer cysteine-constrained peptide libraries were constructed: one in ftac88 and the other two in fth1 (Enshell-Seijffers et al, 2001). Protein G (Sigma Chemical Co, St. Louis, MO) was used to coat the bottom of 35-mm tissue culture 6-well plates overnight at 4°C (50 μg of protein G in 0.7 ml Tris buffered saline (TBS)). After discarding the excess solution, the dish was blocked with TBS containing 0.25% (w/v) gelatin (TBSG) for 2 hours at room temperature. Next, the dish was rinsed rapidly five times with TBS, incubated with a total volume of 0.7 ml containing 10-50 μg of mAb, and rocked gently at room temperature for 4 hours. After washing with TBS, biopanning was accomplished by adding 10¹¹ phages from the library to the dish in 0.7 ml TBSG and incubating at 4°C overnight. Unbound phages were rinsed away and the dish was washed extensively ten times with TBS. Bound phages were eluted with 400 μl of elution buffer (0.1 M HCl adjusted to pH 2.2 with glycine, 1 mg/ml BSA) for 10 minutes at room temperature with gentle agitation. The eluate was transferred into a 1.5 ml microfuge tube and neutralized with 75 μl of neutralizing buffer (1 M Tris-HCl pH 9.1).
(For more details, see Enshell-Seijffers et al, 2002.)
Immunoscreening
[0140] DH5alphaF+ bacteria were infected with the affinity-selected phages, plated on LB plates containing 20 μg/ml tetracycline and grown at 37°C overnight. Single colonies were picked to inoculate 200 μl Terrific Broth in U-bottom 96-well plates. After overnight culture, the plates were centrifuged at 3000 rpm for 30 minutes at room temperature. 125 μl of the supernatant from each well were transferred to a flat-bottom 96-well plate already containing 50 μl/well of PEG/NaCl solution. The flat-bottom plates were incubated at 4°C for 2 hours and centrifuged. The precipitated phages were resuspended in a total of 100 μl TBS and applied via a vacuum manifold to nitrocellulose filters. After blocking with 5% milk in TBS for 1 hour, the membranes were washed briefly with TBS and incubated overnight with the selected mAb (1-2 μg/ml) in TBS/5% milk at 4°C with gentle rocking. After washing, the membranes were incubated with goat anti-mouse IgG/HRP conjugates diluted 1:5000 in TBS/5% milk for 1 hour at room temperature (Enshell-Seijffers et al, 2002). The positive signals were detected by ECL (Amersham) immunodetection.
Sequencing
[0141] For sequencing the inserts of the phages selected from the library, single-stranded DNA was isolated from phage particles and used as a template for sequencing using the primer 5'-GGTCAGACGATTGGCCTTG-3' (SEQ ID NO:136) (Enshell-Seijffers et al, 2002).
Reconstitution of CG10 Epitope
[0142] For the construction of the reconstituted CG10 epitope, two complementary oligonucleotides were used:
5'-GTGTGTAAAATTAACTGCACCAGGAGTAGGAAAAGCAATGTATGGCCCTGGGATTCAAAAATTCGAAAAGCCATGTGCTG-3' (SEQ ID NO:137) and
5'-CAGCACATGGCTTTTCGAATTTTTGAATCCCAGGGCCATACATTGCTTTTCCTACTCCTGGTGCAGTTAATTTTACACAC-3' (SEQ ID NO:138),
[0143] which were designed to create an insert containing 3'-overhang ends compatible with the Sfi I sites of the fth1 vector and to reconstruct the reading frame of the phage pVIII protein. The oligonucleotides were heated at 86°C for 10 minutes, allowed to cool slowly to 35°C, and then ligated with fth1 vector cut with Sfi I to produce clone G12.
Analysis of the Reconstituted CG10 Epitope
[0144] 100 μl of phages G12 or fth1 were applied onto nitrocellulose membrane filters, blocked with 5% milk and incubated with mAbs, CG10 (1-10 μg/ml) or CG1, 17b and CG25 (5 μg/ml), overnight at 4°C. Then the membranes were washed and probed with anti-mouse IgG conjugated with HRP. The ECL reaction was used for detection.
ELISA
[0145] CG10 (1 μg/ml) was pre-incubated overnight at 4°C with a two-fold dilution series of phages (G12 or fth1, starting from 10¹¹ phages), then applied onto ELISA plates coated with CD4/gp120 complex (375 ng), incubated for 1 hour at room temperature and then probed with anti-mouse alkaline phosphatase conjugate.
Structure Analyses
[0146] Definition of the 17b contact residues and surface-accessible residues was performed according to Sobolev et al (1999).
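As a consistency check on the two oligonucleotides listed in paragraph [0142], the short Python sketch below tests whether the printed sequences are reverse complements of one another and translates the first one with the standard genetic code. The choice of `frame=1` (reading from the second nucleotide) is an assumption made here because it yields a cysteine-flanked peptide; the actual reading frame is set by the fth1 vector context, which is not reproduced in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Sequences as printed for SEQ ID NO:137 and SEQ ID NO:138.
OLIGO_137 = ("GTGTGTAAAATTAACTGCACCAGGAGTAGGAAAAGCAATGTATGGCCC"
             "TGGGATTCAAAAATTCGAAAAGCCATGTGCTG")
OLIGO_138 = ("CAGCACATGGCTTTTCGAATTTTTGAATCCCAGGGCCATACATTGCTT"
             "TTCCTACTCCTGGTGCAGTTAATTTTACACAC")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    print(reverse_complement(OLIGO_137) == OLIGO_138)
    print(translate(OLIGO_137, frame=1))   # frame is an assumption, see above
```

The same two helpers can be reused to check any other insert design before synthesis, for example by translating each of the three frames and looking for the expected flanking cysteines and turn linkers.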
[0147] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without undue experimentation and without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. The means, materials, and steps for carrying out various disclosed functions may take a variety of alternative forms without departing from the invention. Thus the expressions "means to..." and "means for...", or any method step language, as may be found in the specification above and/or in the claims below, followed by a functional statement, are intended to define and cover whatever structural, physical, chemical or electrical element or structure, or whatever method step, which may now or in the future exist which carries out the recited function, whether or not precisely equivalent to the embodiment or embodiments disclosed in the specification above, i.e., other means or steps for carrying out the same functions can be used; and it is intended that such expressions be given their broadest interpretation.
References:

Berthet-Colominas et al, "Head-to-tail dimers and interdomain flexibility revealed by the crystal structure of HIV-1 capsid protein (p24) complexed with a monoclonal antibody Fab", EMBO J 18:1124-1136 (1999)

Boots et al, "Anti-human immunodeficiency virus type 1 human monoclonal antibodies that bind discontinuous epitopes in the viral glycoproteins can identify mimotopes from recombinant phage peptide display libraries", AIDS Res Hum Retroviruses 13(18):1549-1559 (1997)

Burton et al, "Efficient neutralization of primary isolates of HIV 1 by a recombinant human monoclonal antibody", Science 266:1024-1027 (1994)

Burton et al, "Why do we not have an HIV vaccine and how can we make one?" Nat Med 4:495-498 (1998)

Chakrabarti et al, "Dissecting protein-protein recognition sites", Proteins 47:334-343 (2002)

Conley et al, "Neutralization of divergent human immunodeficiency virus type 1 variants and primary isolates by IAM-41-2F5, an anti-gp41 human monoclonal antibody", Proc Natl Acad Sci USA 91:3348-52 (1994)

Ellis RW, "Technologies for the design, discovery, formulation and administration of vaccines", Vaccine 19:2681-2687 (2001)

Enshell-Seijffers et al, "The rational design of a 'type 88' genetically stable peptide display vector in the filamentous bacteriophage fd", Nucleic Acids Res 29:E50-0 (2001)

Enshell-Seijffers et al, "Phage-display selection and analysis of Ab-binding epitopes", in Current Protocols in Immunology (Coligan et al, Eds.), John Wiley & Sons, New York (2002)

Felici et al, "Mimicking of discontinuous epitopes by phage-displayed peptides, II. Selection of clones recognized by a protective monoclonal antibody against the Bordetella pertussis toxin from phage peptide libraries", Gene 128:21-27 (1993)

Folgori et al, "A general strategy to identify mimotopes of pathological antigens using only random peptide libraries and human sera", Embo J, 13:2236-2243 (1994)

Gauduin et al, "Passive immunization with a human monoclonal antibody protects hu-PBL-SCID mice against challenge by primary isolates of HIV-1", Nat Med 3:1389-1393 (1997)

Gershoni et al, "HIV binding to its receptor creates specific epitopes for the CD4/gp120 complex", FASEB J 7:1185-1187 (1993)

Gershoni et al, "Determination and Control of Bimolecular Interactions", WO 98/20169, published May 14, 1998

Glaser et al, "Residue frequencies and pairing preferences at protein-protein interfaces", Proteins 43:89-102 (2001)

Hansson et al, "Design and production of recombinant subunit vaccines", Biotechnol Appl Biochem 32, 95-107 (2000)

Ho et al, "Construction of recombinant targeting immunogens incorporating an HIV-1 neutralizing epitope into sites of differing conformational constraint", Vaccine 20:1169-1180 (2002)

Hofmann et al, "On the theoretical prediction of protein antigenic determinants from amino acid sequences", Biomed Biochim Acta 46:855-866 (1987)

Hoffmann-Lehmann et al, "Molecular evolution of human immunodeficiency virus env in humans and monkeys: similar patterns occur during natural disease progression or rapid virus passage", J Virol 76:5278-284 (2002)

Hopp TP, "Protein antigen conformation: folding patterns and predictive algorithms; selection of antigenic and immunogenic peptides", Ann Sclavo Collana Monogr 1:47-60 (1984)

Hopp TP, "Retrospective: 12 years of antigenic determinant predictions, and more", Pept Res 6:183-190 (1993)

Jones et al, "Analysis of protein-protein interaction sites using surface patches", J Mol Biol 272:121-132 (1997a)

Jones et al, "Prediction of protein-protein interaction sites using patch analysis", J Mol Biol 272:133-143 (1997b)

Kabat EA, Structural concepts in immunology and immunochemistry (Ebert et al, Eds.), Holt, Rinehart and Winston, Inc., New York (1968)

Kwong et al, "Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody", Nature 393:648-59 (1998)

Kwong et al, "Structures of HIV-1 gp120 envelope glycoproteins from laboratory-adapted and primary isolates", Structure Fold Des 8:1329-1339 (2000)

Lo Conte et al, "The atomic structure of protein-protein recognition sites", J Mol Biol 285:2177-2198 (1999)

Mascola et al, "Protection of macaques against pathogenic simian/human immunodeficiency virus 89.6PD by passive transfer of neutralizing antibodies", J Virol 73:4009-4018 (1999)

Monaco-Malbet et al, "Mutual Conformational Adaptations in Antigen and Antibody upon Complex Formation between an Fab and HIV-1 Capsid Protein p24", Structure 8:1069-1077 (2000)

Montefiori et al, "HIV vaccines, Magic of the occult?", Science 283:336-337 (1999)

Moore et al, "Genetic subtypes, humoral immunity, and human immunodeficiency virus type 1 vaccine development", J Virol 75:5721-9 (2001)

Muster et al, "A conserved neutralizing epitope on gp41 of human immunodeficiency virus type 1", J Virol 67:6642-6647 (1993)

Ottl et al, "Heterotrimeric collagen peptides containing functional epitopes. Synthesis of single-stranded collagen type I peptides related to the collagenase cleavage site", J Pept Sci 5:103-110 (1999)

Parker et al, "New hydrophilicity scale derived from high- performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites", Biochemistry 25:5425-5432 (1986)

Pantophlet et al, "Hyperglycosylated mutants of human immunodeficiency virus (HIV) type 1 monomeric gp120 as novel antigens for HIV vaccine design", J Virol 77(10):5889-5901 (2003)

Pauletti et al, "Application of a modified computer algorithm in determining potential antigenic determinants associated with the AIDS virus glycoprotein", Anal Biochem 151:540-546 (1985)

Rizzuto et al, "A conserved HIV gp120 glycoprotein structure involved in chemokine receptor binding", Science 280:1949-1953 (1998)

Rizzuto et al, "Fine definition of a conserved CCR5-binding region on the human immunodeficiency virus type 1 glycoprotein 120", AIDS Res Hum Retroviruses 16:741-749 (2000)

Sobolev et al, "Automated analysis of interatomic contacts in proteins", Bioinformatics 15:327-332 (1999)

Sullivan et al, "CD4-induced conformational changes in the human immunodeficiency virus type 1 gp120 glycoprotein: consequences for virus entry and neutralization", J Virol 72:4694-4703 (1998)

Thali et al, "Characterization of conserved human immunodeficiency virus type 1 gp120 neutralization epitopes exposed upon gp120-CD4 binding", J Virol 67:3978-3988 (1993)

Trkola et al, "Human monoclonal antibody 2G12 defines a distinctive neutralization epitope on the gp120 glycoprotein of human immunodeficiency virus type 1", J Virol 70:1100-1108 (1996)

Van Regenmortel et al, "Predicting antigenic determinants in proteins: looking for unidimensional solutions to a three-dimensional problem?", Pept Res 7:224-228 (1994)

Van Regenmortel MHV, "Mapping Epitope Structure and Activity: From One-Dimensional Prediction to Four-Dimensional Description of Antigenic Specificity", Methods 9:465-72 (1996)

Van Regenmortel MHV, "Molecular dissection of protein antigens and prediction of epitopes" in Synthetic Peptides as Antigens (van der Vliet, Ed.), Elsevier Science, Amsterdam (1999)

Villen et al, "Synthetic peptides as functional mimics of a viral discontinuous antigenic site", Biologicals 29:265-269 (2001)

Wang et al, "Emergence of autologous neutralization-resistant variants from preexisting human immunodeficiency virus (HIV) quasi species during virus rebound in HIV type 1-infected patients undergoing highly active antiretroviral therapy", J Infect Dis 185:608-617 (2002)

Welling et al, "Choice of peptide and peptide length for the generation of antibodies reactive with the intact protein", FEBS Lett 182:81-84 (1985)

Wyatt et al, "The antigenic structure of the HIV gp120 envelope glycoprotein", Nature 393:705-711 (1998)

Xu et al, "Hydrogen bonds and salt bridges across protein-protein interfaces", Protein Eng 10:999-1012 (1997)

Zwick et al, "Identification and characterization of a peptide that specifically binds the human, broadly neutralizing anti-human immunodeficiency virus type 1 antibody b12", J Virol 75(14):6692-6699 (2001a)

Zwick et al, "Broadly neutralizing antibodies targeted to the membrane-proximal external region of human immunodeficiency virus type 1 glycoprotein gp41", J Virol 75(22):10892-10905 (2001b)

Claims

WHAT IS CLAIMED IS:

1. A method for improved prediction of the region on the surface of a proteinaceous material representing a binding surface that associates with a predetermined binding molecule, comprising:
(a) screening a peptide library with said predetermined binding molecule to identify a plurality of peptides that bind to said binding molecule;
(b) determining the amino acid sequence of each identified peptide;
(c) assigning a symbol to each class of amino acid residue represented in the library and presenting each said sequence as a string of said symbols;
(d) calculating the frequency of occurrences of each tandem pair of symbols that exist in the strings of symbols presented in step (c);
(e) identifying those tandem pairs of symbols, the number of occurrences of which is statistically significant;
(f) mapping on a three-dimensional model of the proteinaceous material those pairs of amino acids represented by the tandem pairs of symbols identified in step (e), wherein a pair of amino acids is two amino acids, each of which is accessible to the surface of the proteinaceous material and whose alpha carbons are separated by no more than a predetermined distance; and
(g) determining clusters of amino acid pairs mapped in (f), each amino acid pair in the cluster being topographically related to at least one other pair in the cluster, whereby each said cluster is a predicted region on the surface of the proteinaceous material representing said binding surface.

2. A method of identifying a basic element of a binding surface on a proteinaceous material, which binding surface associates with a predetermined binding molecule, comprising:
(a) identifying a cluster of amino acids predicted to represent a binding surface by means of the process of claim 1;
(b) identifying the outermost amino acids of binding pairs in the cluster so as to define the perimeter of the predicted binding surface;
(c) identifying all other amino acids on the surface of the proteinaceous material situated within the perimeter of the predicted binding surface or within a predetermined distance therefrom; and
(d) identifying basic elements of the binding surface, each said element being a linear segment of the proteinaceous material whose first and last residues are amino acids identified in (b) or (c) and none of the intermediate amino acids thereof are amino acids on the surface of the proteinaceous material not identified in (b) or (c), any amino acids identified in (b) or (c) that are not part of a said linear segment being a basic element of a single amino acid.

3. A method of producing a binding surface mimetic, comprising:
(a) identifying the basic elements of the binding surface by means of the process of claim 2;
(b) connecting the basic elements in such a manner as to maintain the relative spatial orientation of the amino acids of (a), thereby identifying a molecule that is mimetic of said binding surface; and
(c) producing the mimetic molecule.

4. A pharmaceutical composition including one or more of the basic elements of the binding surface of gp120 that is recognized by a broadly neutralizing antibody, said composition comprising a pharmaceutically acceptable carrier and one or more of the peptides selected from the group consisting of: amino acids 360-362 of SEQ ID NO:1; amino acids 391-396 of SEQ ID NO:1; amino acids 464-468 of SEQ ID NO:1; and amino acids 110-118 of SEQ ID NO:1.

5. A molecule mimetic of the binding surface of gp120 that is recognized by a broadly neutralizing antibody, obtained by connecting peptides 360-362, 391-396, and 464-468 of SEQ ID NO:1, each in forward or reverse sequence, in such a manner as to form a single molecule that maintains the spatial orientation that the amino acids thereof have when they are positioned at 360-362, 391-396 and 464-468 of gp120 (SEQ ID NO:1).

6. A molecule in accordance with claim 5 having the sequence of SEQ ID NO: 104 or 105.

7. A molecule in accordance with claim 5 having the sequence of SEQ ID NO: 134 or 135.
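
For illustration only, the computational steps (d) through (g) of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the input tuple layout, the count threshold `min_count` (a stand-in for the statistical significance test, which the claim does not specify), the treatment of tandem pairs as unordered, and the rule that two mapped pairs are "topographically related" when they share a residue are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Python 3.8+


def significant_pairs(peptides, min_count):
    """Steps (d)-(e): count tandem (adjacent) symbol pairs across peptides.

    ASSUMPTION: pairs are unordered, and `min_count` stands in for the
    statistical significance test, which is not specified here.
    """
    counts = Counter()
    for pep in peptides:
        for a, b in zip(pep, pep[1:]):
            counts[frozenset((a, b))] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def map_pairs(residues, sig, max_dist):
    """Step (f): map significant pairs onto a 3-D model.

    `residues` is a hypothetical list of tuples
    (residue_id, symbol, alpha_carbon_xyz, is_surface_accessible).
    """
    surface = [r for r in residues if r[3]]
    mapped = []
    for r1, r2 in combinations(surface, 2):
        close = dist(r1[2], r2[2]) <= max_dist
        if close and frozenset((r1[1], r2[1])) in sig:
            mapped.append((r1[0], r2[0]))
    return mapped


def cluster_pairs(mapped):
    """Step (g): group mapped pairs into clusters.

    ASSUMPTION: two pairs are related when they share a residue, so
    clusters are connected components found with a union-find structure.
    Components holding a single pair are dropped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mapped:
        parent[find(a)] = find(b)

    groups = {}
    for pair in mapped:
        groups.setdefault(find(pair[0]), []).append(pair)
    return [g for g in groups.values() if len(g) >= 2]


# Toy data, invented for the example (not taken from the specification).
peptides = ["AKW", "KWD", "AKE"]
residues = [
    (1, "A", (0.0, 0.0, 0.0), True),
    (2, "K", (3.0, 0.0, 0.0), True),
    (3, "W", (6.0, 0.0, 0.0), True),
    (4, "K", (50.0, 0.0, 0.0), True),   # accessible but too far away
    (5, "W", (3.0, 1.0, 0.0), False),   # close but buried
]
sig = significant_pairs(peptides, min_count=2)
mapped = map_pairs(residues, sig, max_dist=5.0)
clusters = cluster_pairs(mapped)
```

In this toy run only the A/K and K/W pairs reach the threshold, and only residues 1, 2 and 3 satisfy both the accessibility and the alpha-carbon distance conditions, so a single cluster is returned. A different relatedness rule or significance test would change the clusters without changing the overall shape of the pipeline.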