WO2012154312A1 - Neutralizing antibodies to HIV-1 and their use


Publication number
WO2012154312A1
Authority
WO
WIPO (PCT)
Prior art keywords
antibody
nucleic acid
vrc
sequences
heavy chain
Prior art date
Application number
PCT/US2012/030465
Inventor
John R. Mascola
Gary J. Nabel
Peter D. Kwong
Lawrence S. Shapiro
Jiang Zhu
Xueling Wu
Zhenhai Zhang
Tongqing Zhou
Original Assignee
The United States Of America, As Represented By The Secretary, Department Of Health & Human Services
Priority date
Filing date
Publication date
Application filed by The United States Of America, As Represented By The Secretary, Department Of Health & Human Services
Publication of WO2012154312A1
Priority to PCT/US2013/032070 (WO2013142324A1)
Priority to EP13763664.3A (EP2828294A1)
Priority to US14/386,920 (US20150044137A1)

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K16/00 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
    • C07K16/08 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from viruses
    • C07K16/10 Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from viruses from RNA viruses
    • C07K16/1036 Retroviridae, e.g. leukemia viruses
    • C07K16/1045 Lentiviridae, e.g. HIV, FIV, SIV
    • C07K16/1063 Lentiviridae, e.g. HIV, FIV, SIV env, e.g. gp41, gp110/120, gp160, V3, PND, CD4 binding site
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56983 Viruses
    • G01N33/56988 HIV or HTLV
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/20 Immunoglobulins specific features characterized by taxonomic origin
    • C07K2317/21 Immunoglobulins specific features characterized by taxonomic origin from primates, e.g. man
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/30 Immunoglobulins specific features characterized by aspects of specificity or valency
    • C07K2317/35 Valency
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/50 Immunoglobulins specific features characterized by immunoglobulin fragments
    • C07K2317/52 Constant or Fc region; Isotype
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/50 Immunoglobulins specific features characterized by immunoglobulin fragments
    • C07K2317/56 Immunoglobulins specific features characterized by immunoglobulin fragments variable (Fv) region, i.e. VH and/or VL
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2317/00 Immunoglobulins specific features
    • C07K2317/70 Immunoglobulins specific features characterized by effect upon binding to a cell or to an antigen
    • C07K2317/76 Antagonist effect on antigen, e.g. neutralization or inhibition of binding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • This relates to broadly neutralizing monoclonal antibodies that bind to the CD4 binding site of human immunodeficiency virus (HIV)-1 gp120, their use, and methods of identifying these broadly neutralizing monoclonal antibodies.
  • HIV: human immunodeficiency virus
  • An HIV-1 vaccine will likely need to induce neutralizing antibodies (NAbs) that block HIV-1 entry into human cells.
  • NAbs: neutralizing antibodies
  • Vaccine-induced antibodies will have to be active against most circulating strains of HIV-1.
  • HIV-1 vaccines are unable to induce potent and broadly reactive NAbs.
  • One major obstacle to the design of better vaccines is the limited understanding of which regions of the HIV-1 envelope glycoproteins (such as gp120) are recognized by NAbs.
  • gp120: HIV-1 envelope glycoproteins
  • A few neutralizing monoclonal antibodies (mAbs) have been isolated from HIV-1-infected individuals, and these mAbs define specific regions (epitopes) on the virus that are vulnerable to NAbs.
  • An HIV-1 neutralizing mAb can bind to a site on gp120 that is required for viral attachment to its primary cellular receptor, CD4.
  • mAb b12 was derived from a phage display library, a process that makes it impossible to know whether the antibody was naturally present in an infected person or was the result of a laboratory combination of antibody heavy and light chains.
  • b12 can neutralize about 75% of clade B strains of HIV-1 (those most common in North America), but it is not broadly neutralizing (it neutralizes less than 50% of other strains of HIV-1 found worldwide). Therefore, there is a need to develop broadly neutralizing antibodies for HIV-1.
  • The isolated VRC01-like broadly neutralizing antibodies do not include the heavy or light chain from an established VRC01-like antibody.
  • Compositions including these antibodies that specifically bind gp120 and nucleic acids encoding these antibodies, expression vectors comprising the nucleic acids, and isolated host cells that express the nucleic acids.
  • Methods for identifying the class of VRC01-like heavy chain variable domains, broadly neutralizing antibodies that include these heavy chain variable domains, and methods of using these broadly neutralizing antibodies are also disclosed.
  • A VRC01-like heavy chain variable domain is identified by performing a cross-donor phylogenetic analysis on a population of heavy chain variable domain nucleic acid sequences from B cells from a subject infected with HIV.
  • The cross-donor analysis is performed on a population of nucleic acid sequences encoding heavy chain variable regions having an IGHV1-2 germline origin.
  • The cross-donor analysis is performed on a population of nucleic acid sequences encoding heavy chain variable regions having an IGHV1-2 germline or other germline origin.
  • Cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from one or more VRC01-like antibodies (reference antibodies), such as one or more of VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 and 8ANC134 heavy chain variable domains, to the population of test sequences.
  • VRC01-like antibodies: reference antibodies
  • The nucleotide sequence of one or more germline sequences (such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence) is added to the population of heavy chain variable domain nucleic acid sequences. This forms an analytic population of sequences.
  • A phylogenetic tree is then constructed from this analytic population, for example using neighbor joining analysis, rooted at the germline sequence (such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence).
  • Nucleic acid sequences of interest are selected that segregate within a distinct branch (such as the smallest subtree) in the phylogenetic tree with one or more (such as all) of the heavy chain variable domains from the one or more VRC01-like antibodies included in the analytic population.
  • The process is repeated in an iterative fashion, for example on subpopulations of heavy chain variable domain nucleic acid sequences from a subject infected with HIV, until the phylogenetic tree converges.
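The selection step of the cross-donor analysis described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method as implemented in the disclosure: it assumes the tree has already been built and rooted elsewhere (for example by neighbor joining), and it represents the tree as nested tuples with string leaves. The function names and the toy tree are hypothetical.

```python
def leaf_names(node):
    """Return the set of leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    names = set()
    for child in node:
        names |= leaf_names(child)
    return names


def smallest_subtree(tree, references):
    """Descend from the root to the deepest node whose leaves include every reference."""
    refs = set(references)
    if not refs <= leaf_names(tree):
        raise ValueError("tree does not contain every reference sequence")
    node = tree
    while not isinstance(node, str):
        next_node = None
        for child in node:
            if refs <= leaf_names(child):
                next_node = child
                break
        if next_node is None:
            break  # the references are split across children, so stop here
        node = next_node
    return node


def select_candidates(tree, references):
    """Test sequences that share the smallest subtree with all reference sequences."""
    return leaf_names(smallest_subtree(tree, references)) - set(references)


# Hypothetical toy tree rooted at the germline; all labels are illustrative only.
toy_tree = (
    "germline",
    (("read_1", "read_2"), (("ref_A", "read_3"), ("ref_B", "read_4"))),
)
candidates = select_candidates(toy_tree, ["ref_A", "ref_B"])
```

In an iterative run, the selected candidates would presumably be pooled and the tree rebuilt until the selected set stops changing; that loop is omitted here because the tree-building step is outside this sketch.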
  • A nucleic acid encoding a VRC01-like light chain in a population of test sequences is identified as a nucleic acid sequence of interest.
  • Each test sequence in the population is a nucleic acid sequence encoding a light chain variable domain from a subject infected with HIV, wherein the light chain variable domain comprises a complementarity determining region (CDR)1, a CDR2 and a CDR3 and has a corresponding germline origin light chain variable domain.
  • CDR: complementarity determining region
  • The CDR3 of the VRC01-like light chain variable domain comprises a hydrophobic residue followed by a glutamic acid residue or glutamine residue; if the germline origin of the VRC01-like light chain variable domain is an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises at least two glycine residues; and if the germline origin of the VRC01-like light chain variable domain is not an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises a deletion of two or more amino acids compared to the corresponding germline origin.
  • Several embodiments include selecting a test sequence that encodes the VRC01-like light chain variable domain as the nucleic acid sequence of interest and synthesizing an isolated nucleic acid molecule comprising the nucleic acid sequence of interest, thereby producing the isolated nucleic acid molecule encoding the VRC01-like light chain variable domain.
  • A polypeptide is produced from the nucleic acid sequence of interest, thereby producing the VRC01-like heavy or light chain domain.
  • The selected and expressed VRC01-like heavy or light chain domain is tested for neutralization activity by complementation with a corresponding heavy or light chain variable domain from an identified VRC01-like antibody, such as one or more of VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 light chain variable domains.
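The light chain criteria stated above (the CDR3 motif, plus a germline-dependent CDR1 rule) can be expressed as a simple filter. The sketch below rests on several assumptions that the text does not settle: the set of residues counted as hydrophobic, reading "followed by" as immediate adjacency, and approximating a CDR1 deletion by a length difference against the germline CDR1. All function names and example strings are hypothetical.

```python
# Assumption: the text does not list which residues count as hydrophobic.
HYDROPHOBIC = set("AVILMFWY")


def cdr3_has_motif(cdr3):
    """True if a hydrophobic residue is immediately followed by E (Glu) or Q (Gln)."""
    return any(a in HYDROPHOBIC and b in "EQ" for a, b in zip(cdr3, cdr3[1:]))


def is_vrc01_like_light_chain(cdr1, cdr3, germline_name, germline_cdr1):
    """Apply the CDR3 motif rule and the germline-dependent CDR1 rule.

    Assumption: a CDR1 deletion is approximated by a length difference
    between the germline CDR1 and the test CDR1.
    """
    if not cdr3_has_motif(cdr3):
        return False
    if germline_name.startswith("IGKV1-33"):
        return cdr1.count("G") >= 2  # at least two glycine residues
    return len(germline_cdr1) - len(cdr1) >= 2  # deletion of two or more residues
```

For example, with toy strings, `is_vrc01_like_light_chain("RASQSVYLA", "QQLEF", "IGKV3-11*01", "RASQSVSSYLA")` passes both rules, because the CDR3 contains `LE` and the CDR1 is two residues shorter than the supplied germline CDR1.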
  • The antibodies and compositions disclosed herein can be used for a variety of purposes, such as for detecting an HIV-1 infection or diagnosing acquired immune deficiency syndrome (AIDS) in a subject.
  • These methods can include contacting a sample from the subject diagnosed with HIV-1 or AIDS with a VRC01-like antibody that specifically binds gp120, and detecting binding of the antibody to the sample.
  • An increase in binding of the antibody to the sample relative to binding of the antibody to a control sample confirms that the subject has an HIV-1 infection and/or AIDS.
  • The methods further include contacting a second antibody that specifically binds gp120 with the sample, and detecting binding of the second antibody.
  • An increase in binding of the antibody to the sample relative to a control sample detects HIV-1 in the subject.
  • The antibody specifically binds soluble gp120 in the sample.
  • The methods further comprise contacting a second antibody that specifically recognizes a VRC01-like antibody with the sample and detecting binding of the second antibody.
  • A method is disclosed for treating a subject with an HIV infection, such as, but not limited to, a subject with AIDS.
  • The methods include administering a therapeutically effective amount of a VRC01-like antibody or an antigen-binding fragment thereof to the subject.
  • Also disclosed are methods for testing a potential vaccine, for example by contacting the potential vaccine with a VRC01-like antibody, or an antigen-binding fragment thereof, and detecting the binding of the antibody to an immunogen in the potential vaccine.
  • FIGS. 1A-1F show the identification and characterization of broadly neutralizing CD4-binding-site monoclonal antibodies (mAbs) from HIV-1-infected donors 74 and 0219.
  • The RSC3 probe was used to identify five broadly neutralizing mAbs, all of which were inferred to derive from the IGHV1-2*02 allele and displayed high levels of somatic mutation.
  • FIG. 1A is a plot showing RSC3 analysis of serum.
  • FIG. 1B is a set of dot plots showing the RSC3- and ΔRSC3-binding profile of IgG+ B cells from donors 74 and 0219. Gating and percentage of B cells of interest (RSC3+ΔRSC3-) are indicated, with 40 and 26 sorted single B cells from donors 74 and 0219, respectively.
  • FIG. 1C is a sequence alignment showing the protein sequences of heavy chain variable regions of mAbs: VRC-PG04 (SEQ ID NO: 1609) and VRC-PG04b (SEQ ID NO: 1610), isolated from donor 74; mAbs VRC-CH30 (SEQ ID NO: 1611), VRC-CH31 (SEQ ID NO: 1612) and VRC-CH32 (SEQ ID NO: 1613), isolated from donor 0219; and mAbs VRC01 (SEQ ID NO: 1614), VRC02 (SEQ ID NO: 1615), and VRC03 (SEQ ID NO: 1616) isolated from donor 45; and IGHV1-02*02 (SEQ ID NO: 2487) and light chain variable regions of mAbs VRC-PG04 (SEQ ID NO: 1619) and VRC-PG04b (SEQ ID NO: 1620), isolated from donor 74, and mAbs VRC-CH30 (SEQ ID NO: 1621), VRC-CH31
  • FIG. 1D is a plot showing competition ELISAs. The binding to YU2 gp120 by a single concentration of biotin-labeled VRC-PG04 or VRC-CH31 was assessed against increasing concentrations of competitive ligand.
  • CD4-Ig is a fusion protein of the N-terminal two domains of CD4 with IgG1 Fc.
  • FIG. 1F is a set of neutralization dendrograms.
  • VRC-PG04 and VRC-CH31 were tested against genetically diverse Env-pseudoviruses representing the major HIV-1 clades.
  • Neighbor-joining dendrograms display the protein distance of gp160 sequences from 179 HIV-1 isolates tested against VRC-PG04 and a subset (52 isolates) tested against VRC-CH31.
  • A scale bar denotes the distance corresponding to a 1% change in amino acid sequence.
  • Dendrogram branches are indicated by the neutralization potencies of VRC-PG04 and VRC-CH31 against each particular virus.
  • FIGS. 2A-2C are digital images of the atomic structures of antibodies VRC-
  • FIG. 2A is a digital image of the overall structures.
  • FIG. 2B and FIG. 2C are digital images showing interaction close-ups.
  • FIGS. 3A-3C are a set of plots and digital images of atomic structures showing the focused evolution of VRC01-like antibodies.
  • The maturational processes that facilitate the evolution of VRC01-like antibodies from low affinity unmutated antibodies to high affinity potent neutralizers involve divergence in antibody sequence and convergence in epitope recognition.
  • FIG. 3A shows antibody convergence.
  • FIG. 3B shows epitope convergence.
  • The HIV-1 gp120 surface involved with CD4 binding contains conformationally invariant regions (e.g.
  • FIG. 3C is a set of digital images showing the divergences in sequence and convergences in recognition.
  • The development of VRC01-like antibodies involves a heavy chain derived from the IGHV1-02*02 allele and selected light chain VK alleles.
  • The far left image depicts a ribbon representation model of a putative germline antibody.
  • Somatic hypermutation during the process of affinity maturation leads to a divergence in sequence, yet results in the convergent recognition of similar epitopes.
  • Intersection of the epitope surfaces recognized by VRC01, VRC03 and VRC-PG04 (far right image) reveals a remarkable similarity to the site of vulnerability.
  • The primary divergence of this intersection from the hypothesized site of vulnerability occurs in the region of HIV-1 gp120 recognized by the light chain of the VRC01-like antibodies. While the separate epitopes on gp120 do show differences in recognition surface, these primarily involve the bridging sheet region, which is likely to adopt a different conformation in the functional viral spike prior to engagement of CD4.
  • FIGS. 4A-4E are dendrograms and plots showing the results of deep sequencing of expressed heavy and light chains from donors 45 and 74. 454 pyrosequencing facilitates the determination of the repertoire of heavy and light chain sequences (the heavy and light chain antibodyomes). Heavy and light chain complementation, computational bioinformatics, and neutralization measurements on reconstituted chimeric antibodies provide functional assessment.
  • FIG. 4A is a dendrogram showing heavy and light chain complementation. The neutralization profiles of VRC01 and VRC03 (donor 45), VRC-PG04 (donor 74), and VRC-CH31 (donor 0219) and their heavy and light chain chimeric swaps are depicted with 20-isolate neutralization dendrograms. Explicit neutralization IC50s are provided in
  • FIG. 4B is a set of plots showing the repertoire of heavy chain sequences from donor 45 (2008 sample) and donor 74 (2008 sample). Heavy chain sequences are plotted as a function of sequence identity to the heavy chain of VRC01 (left), VRC03 (middle) and VRC-PG04 (right) and of sequence divergence from putative genomic VH-alleles: upper row plots show sequences of putative IGHV1-2*02 allelic origin; lower row plots show sequences from other allelic origins.
  • FIG. 4C is a set of plots showing the repertoire of expressed light chain sequences from donor 45 (2001 sample). Light chain sequences are plotted as a function of sequence identity to VRC01 (left) and VRC03 (right) light chains, and of sequence divergence from putative genomic V-gene alleles. Sequences with 2-residue deletions in the CDR L1 region (which are observed in VRC01 and VRC03) are shown as black dots. Two sequences, with 92.0% identity to VRC01 (sequence ID 181371) and with 90.3% identity to VRC03 (sequence ID 223454), are highlighted with triangles.
  • FIG. 4D is a set of dendrograms showing the functional assessment of light chain sequences identified by deep sequencing.
  • The neutralization profiles of sequence 181371 reconstituted with the VRC01 heavy chain (named gVRC-L1d45) and of sequence 223454 reconstituted with the VRC03 heavy chain (named gVRC-L2d45) are depicted with 20-isolate neutralization dendrograms; explicit neutralization IC50s are provided in Table S15.
  • FIG. 4E is a set of plots showing the functional assessment of heavy chain sequences identified by deep sequencing.
  • Heavy chain sequences from donors 45 and 74 were synthesized and expressed with either the light chain of VRC01 or VRC03 (for donor 45) or the light chain of VRC-PG04 (for donor 74) and evaluated for neutralization. Neutralizing antibodies are shown as stars and are labeled. Comprehensive expression and neutralization results are presented in Tables S14 and S15. gVRC-H(n) refers to the heavy chains with confirmed neutralization when reconstituted with the light chain of VRC-PG04 (Tables S14 and S15).
  • FIGS. 5A-5B are a set of phylogenetic trees and digital images of antibody structures showing the maturational similarities of VRC01-like antibodies in different donors revealed by phylogenetic analysis.
  • The structural convergence in maturation of VRC01-like antibodies suggested similarities of their maturation processes; phylogenetic analysis revealed such similarities and allowed maturation intermediates to be inferred.
  • FIG. 5A shows neighbor-joining phylogenetic trees of heavy chain sequences from donor 45 (left) and donor 74 (right).
  • The donor 45 tree is rooted by the putative reverted unmutated ancestor of the heavy chain of VRC01, and also includes specific neutralizing sequences from donors 74 and 0219.
  • The donor 74 tree is rooted in the putative reverted unmutated ancestor of the heavy chain of VRC-PG04, and sequences from donors 45 and 0219 are included in the phylogenetic analysis. Bars representing 0.1 changes per nucleotide sequence are shown. Insets show J chain assignments for all sequences within the neutralizing subtree identified by the exogenous donor sequences.
  • FIG. 5B shows
  • FIGS. 6A-6E are a set of plots and a phylogenetic tree showing the analysis of the heavy chain antibodyome of donor 74 and identification of heavy chains with HIV-1 neutralizing activity. Identity/diversity-grid analysis, cross-donor
  • FIG. 6A is a plot showing identity/diversity-grid analysis. The location of the 70 synthesized heavy chains from donor 74 is shown, including neutralizing (light stars) and nonneutralizing (black stars) sequences.
  • FIG. 6B shows a cross-donor phylogenetic analysis and CDR H3 lineage analysis. A maximum-likelihood phylogenetic tree of the 70 synthesized heavy chain sequences is rooted in the putative reverted unmutated ancestor of VRC-PG04. The probe-identified VRC-PG and VRC-CH antibodies are shown.
  • FIG. 6C is a plot showing the expression levels of selected heavy chains reconstituted with the light chain of VRC-PG04 versus breadth of neutralization.
  • FIG. 6D is a plot showing the neutralization potency of reconstituted phylogenetically predicted antibodies on seven HIV-1 isolates.
  • FIG. 6E is a plot showing the CDR H3 analysis of donor 74 heavy chain sequences.
  • The CDR H3 was determined and its percent identity to that of the VRC-PG04 heavy chain was graphed.
  • The sequences with high CDR H3 identity to VRC-PG04 reside in regions of high overall heavy chain sequence identity, even for sequences with a low divergence from IGHV1-2*02.
  • FIGS. 7A-7C are plots and digital images showing the maturation lineages of four unique VRC01-like heavy chains in donor 74.
  • The CDR H3 sequence, a product of V(D)J gene recombination and N nucleotide addition and removal, provides a signature to trace the lineage of a particular B cell.
  • FIG. 7A shows the lineage analysis of CDR H3 class 3 (SEQ ID NO: 2491). Grid positions are displayed for the 390 heavy chain sequences with a CDR H3 sequence identical to the identified CDR H3 class 3. These sequences cluster into an elongated family of sequences with moderate identity to VRC-PG04.
  • Sequences ranging from low to high IGHV1-2*02 sequence divergence are shown as structural models of the heavy chain variable domain, with maturation changes highlighted in surface mode indicated by chemistry as in FIG. 5B. Sequences of displayed structures are shown in FIG. 22. Overall neutralization breadth and potency for sequence ID 13826_2 was assessed on a 20-isolate HIV-1 panel, with individual neutralization results tabulated in Table S15.
  • FIG. 7B shows the lineage analysis of CDR H3 class 6 (SEQ ID NO: 2492) that was performed as described above. The sequence ID 10731_1 that was selected in the grid analysis and found to be neutralizing is shown as a member of this family.
  • CDR H3 classes 7 and 8 SEQ ID NO: 2594.
  • Analysis of the CDR H3 of classes 7 and 8 suggests that these might be clonally related (FIG. 21). Sequences from these related classes segregate in similar ways, suggestive of related maturational pathways.
  • FIG. 8 is a set of dot plots and a table showing single RSC3-specific B cell sorting. About 20 million PBMC from donors 74 and 0219 were incubated with APC- and PE-labeled RSC3 and ΔRSC3, respectively. Memory B cells were selected on the basis of the presented gating strategy. The percentages of B cells that reacted with RSC3 and not ΔRSC3 within IgG+ B cells are indicated. The actually sorted single B cells were 40 from donor 74 and 26 from donor 0219. The sorter configurations are indicated in the bottom table.
  • FIG. 9 is a set of graphs showing antigen binding profiles of five newly isolated mAbs, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31 and VRC-CH32, as measured by ELISA. Solid symbols show mAb binding to RSC3 (top) and YU2 gp120 (bottom). Open symbols indicate mAb binding to ΔRSC3 or to the CD4bs knockout mutant of gp120, D368R.
  • FIG. 10 is a set of graphs of competition ELISAs showing that mAbs VRC-PG04 and VRC-CH31 are directed to the CD4bs of HIV-1 gp120.
  • The competition ELISAs were performed with a single concentration of biotin-labeled VRC-PG04 or VRC-CH31. Unlabeled mAbs were titrated into the ELISA at increasing concentrations.
  • CD4-Ig is a fusion protein of the N-terminal two domains of CD4 fused with IgG1 Fc to serve as a CD4 surrogate.
  • FIGS. 11A-11C are a set of digital images and a table showing that the CDR H2 and CDR L3 regions of VRC01-like antibodies showed a high degree of similarity in recognition on gp120.
  • FIG. 11A shows a comparison of the orientations of the antibodies in the gp120:antibody complexes when gp120 was superimposed. CDR H2 and CDR L3 regions of VRC01-like antibodies showed high precision alignment.
  • FIG. 11B shows a ribbon representation of VRC01, VRC03 and VRC-PG04 in the same orientation as panel A.
  • FIG. 11C is a table showing the pairwise root-mean-square deviation (RMSD) of CDR loops between VRC01, VRC03 and VRC-PG04.
  • RMSD: pairwise root-mean-square deviation
  • FIG. 12 is a set of graphs showing the correlations between structural convergence and antigen-interacting surface areas of antibody. (Left) A significant correlation was found between antigen-interfacing surface on CDR and average RMSD for the six CDR regions in the three available structures (VRC01, VRC03, and VRC-PG04). The point for CDR L3 of VRC03 overlaps almost perfectly with the point for CDR L3 of VRC01 and is not visible. (Right) While no correlation was found between average antigen-interfacing surface and Cα deviation for each interface residue, residues with large interface surface were observed to have low Cα deviations.
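As an illustration of the pairwise RMSD comparison of CDR loops mentioned above, the following sketch computes the root-mean-square deviation between matched atoms of two loops. It assumes the structures have already been superimposed (the text describes superimposing on gp120) and that each loop is given as an equal-length list of (x, y, z) coordinates in the same atom order. The function names and the input layout are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, in matched order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(loops):
    """Map each unordered pair of names to the RMSD of their coordinates.

    `loops` is a dict such as {"antibody_1": [...], "antibody_2": [...]}.
    """
    return {
        (name_a, name_b): rmsd(loops[name_a], loops[name_b])
        for name_a, name_b in combinations(sorted(loops), 2)
    }
```

Averaging the values returned by `pairwise_rmsd` for one CDR across the available structures would give a per-CDR figure comparable in spirit to the average RMSD discussed for FIG. 12, under the stated assumptions.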
  • FIG. 13 is a plot of the 454 sequence distribution of donor 45 and donor 74 heavy-chain antibodyomes plotted as a function of sequence identity to VRC02 and VRC-PG04b and sequence divergence from respective germlines.
  • Row one plots sequences of IGHV1-2*02 origin and row two plots sequences of other origins.
  • FIG. 14 is a plot of neutralization of expressed phylogeny-segregated sequences and sequences selected by other criteria from donor 45 2008 heavy-chain antibodyome. Specifically, the two neutralizing sequences were selected from the phylogenetic subtree of IGHV1-2*02 sequences (see FIG.
  • FIG. 15 is a set of plots of the sequence distribution of 454-pyrosequencing-determined donor 74 heavy-chain antibodyome (obtained from Beckman Coulter Genomics) plotted as a function of sequence identity to VRC01, VRC03 and VRC-PG04 and sequence divergence from respective germlines. Row one plots sequences of IGHV1-2*02 origin and row two plots sequences of non-IGHV1-2*02 origin.
  • FIG. 16 is a plot of the identity/divergence-grid assessment of donor 74 heavy-chain 2008 antibodyome.
  • A 10 x 10 grid was placed over the quadrant defined by high divergence and high sequence identity to VRC-PG04.
  • The sequences within each square of the grid were subjected to a clustering procedure with a sequence identity cutoff of 90%.
  • A sequence was then randomly selected from the largest cluster as a candidate.
  • An initial set of 57 sequences was obtained using this approach. Sequences with an identity of 95% or greater to others or containing uncorrected sequencing errors were replaced by new ones selected from the grid. Note that every time a new sequence was selected, the possibility of overlapping with sequences of neighboring squares was examined using sequence clustering.
  • A total of 56 grid-selected sequences were synthesized to assess the function of 454-pyrosequencing-determined heavy-chain sequences.
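The grid-based selection described for FIG. 16 can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: sequences are assumed to be pre-aligned and of equal length, the clustering is a simple greedy scheme against the first member of each cluster (the text does not name a clustering algorithm), and the quadrant bounds are supplied by the caller. The replacement of near-duplicates and the check against neighboring squares are omitted. All names are hypothetical.

```python
import random
from collections import defaultdict


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_clusters(seqs, cutoff=90.0):
    """Assign each sequence to the first cluster whose representative it matches."""
    clusters = []
    for seq in seqs:
        for cluster in clusters:
            if percent_identity(seq, cluster[0]) >= cutoff:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def grid_select(records, x_range, y_range, n=10, cutoff=90.0, rng=None):
    """Pick one sequence per occupied grid square, from that square's largest cluster.

    `records` holds (identity_to_reference, germline_divergence, sequence) tuples.
    """
    rng = rng or random.Random(0)
    (x0, x1), (y0, y1) = x_range, y_range
    cells = defaultdict(list)
    for identity, divergence, seq in records:
        if not (x0 <= identity <= x1 and y0 <= divergence <= y1):
            continue  # outside the quadrant of interest
        col = min(int((identity - x0) / (x1 - x0) * n), n - 1)
        row = min(int((divergence - y0) / (y1 - y0) * n), n - 1)
        cells[(row, col)].append(seq)
    picks = {}
    for cell, seqs in cells.items():
        largest = max(greedy_clusters(seqs, cutoff), key=len)
        picks[cell] = rng.choice(largest)
    return picks
```

Passing a seeded `random.Random` instance keeps the random pick reproducible, which helps when the selected set has to be compared between runs.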
  • FIG. 17 is a plot of a phylogenetic tree in which additional sequences were selected to enhance the coverage of phylogeny-segregated sequences.
  • 5047 sequences were found to segregate with VRC01, VRC02, VRC03 and VRC-PG04 on a distinct branch.
  • FIG. 18 is a phylogenetic tree of 98 sequences from donor 45 light-chain 2001 antibodyome that have the same VRC01-like and VRC03-like deletions.
  • The maximum likelihood (ML) tree is rooted at IGKV3-11*01, the VRC01 light-chain V-gene germline, which is highlighted in green.
  • The known VRC01-like antibody light-chain sequences are colored in light grey and the two synthesized sequences that show functional complementation with VRC01-like heavy chains are highlighted in grey.
  • FIG. 19 is a sequence alignment of CDR H3 classification of 35 expressed and experimentally tested heavy-chain sequences (SEQ ID NOs: 2502-2536) in the neutralization tree shown in FIG.
  • FIG. 20 is a sequence alignment of CDR H3 analysis of expressed heavy chain sequences from donor 74.
  • FIG. 21 is a sequence alignment of CDR H3 analysis of expressed heavy chain sequences from donor 74.
  • FIG. 22 is a sequence alignment of maturation intermediates in CDR H3 classes 3, 6, 7 and 8 shown in FIG. 7.
  • The neutralizing heavy-chain sequences are highlighted in grey and the CDR H3 region is circled by a dotted line.
  • FIG. 23 shows the amino acid frequencies in the VH domains of VRC01-like neutralizing antibodies. Sequence alignment was generated for the VH domains of the twenty-two identified neutralizing sequences from donor 74, along with VRC01, VRC02, VRC03, VRC-PG04, and VRC-PG04b. The amino acid frequencies for each of the VH residue positions were plotted using WebLogo. The height of each letter is proportional to the frequency with which the respective amino acid type is observed for the given residue position. The IGHV1-2*02 germline sequence is shown for comparison; insertions with respect to IGHV1-2*02 were not included in this analysis.
  • FIG. 24 is Tables S1 and S2.
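The per-position amino acid frequencies that underlie a plot like FIG. 23 can be computed directly from an alignment. The sketch below assumes the sequences are already aligned to equal length with insertions relative to the germline removed, as the text describes. It does not reproduce WebLogo itself, and the function name and toy alignment are hypothetical.

```python
from collections import Counter


def position_frequencies(aligned_seqs):
    """Return, for each alignment column, a dict of residue -> observed frequency."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    frequencies = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        frequencies.append({aa: n / len(column) for aa, n in counts.items()})
    return frequencies


# Toy alignment for illustration only; these are not sequences from the disclosure.
toy_alignment = ["QVQ", "QVR", "QLR", "QVR"]
toy_frequencies = position_frequencies(toy_alignment)
```

Each dictionary in the returned list corresponds to one residue position, so a letter's height in a frequency-scaled logo would be read from the matching entry.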
  • FIG. 25 is Tables S3a and S3b.
  • FIG. 26 is Tables S3c and S3d.
  • FIG. 27 is Tables S3d, S3e and S3f.
  • FIG. 28 is Tables S3g and S3h.
  • FIG. 29 is Table S4.
  • FIG. 30 is Tables S5a and S5b.
  • FIG. 31 is Tables S5c and S5d.
  • FIG. 32 is Tables S6a and S6b.
  • FIG. 33 is Tables S6c and S6d.
  • FIG. 34 is Tables S7 and S8.
  • FIG. 35 is Tables S9 and S10.
  • FIG. 36 is Tables S11 and S12.
  • FIG. 37 is Tables S13 (SEQ ID NOs: 2537-2579) and S14 (SEQ ID NOs: 2580-2623).
  • FIG. 38 is Table S14 (SEQ ID NOs: 1707-1755) continued.
  • FIG. 39 is Tables S15 and S16.
  • FIG. 40 is a flow diagram of a build of a neighborhood joining tree.
  • FIG. 41 is a table showing the number of cross-donor positive sequence reads from an exemplary analysis.
  • FIG. 42 is a heat map plot showing identity to a VRC antibody and divergence from the respective germline.
  • FIGs. 43A and 43B are tables of results from all-origin and IGHV1-2 origin cross donor phylogenetic analyses.
  • FIG. 44 is a set of Venn diagrams of results from all-origin and IGHV1-2 origin cross donor phylogenetic analyses.
  • FIG. 45 is a set of Venn diagrams of rooted and non-rooted cross donor phylogenetic analyses.
  • FIG. 46 is a set of graphs of the percent of the segregated antibodies in a rooted analysis and an unrooted analysis.
  • FIG. 47 is a phylogenetic tree, wherein all native/reference heavy chain antibody nucleotide sequences segregate in a subtree that does not contain the germline sequence.
  • FIG. 48 is a phylogenetic tree. Rooting the tree on IGHV1-2*02 changes the number of sequences in the cross donor phylogenetic analysis.
  • FIG. 49A is a table of data from an exemplary analysis.
  • FIG. 49B is a graph showing the percent of segregation to the initial input and the iteration of cross-donor runs.
  • FIG. 50A is a table of data from an exemplary analysis.
  • FIG. 50B is a graph showing the percent of segregation to the initial input and the iteration of cross-donor runs.
  • FIG. 51A is a table showing the overlap of results from all-origin and
  • FIG. 51B is a Venn diagram showing the overlap of results from all-origin and IGHV1-2 origin cross donor phylogenetic analyses.
  • FIG. 52A is a table of data from an exemplary analysis, following 12 iterations.
  • FIG. 52B is a Venn diagram of the data from an exemplary analysis, following 12 iterations.
  • FIG. 53B is a set of plots of heat maps.
  • The all-origin cross donor analysis identified >99% of the VRC01-like antibodies from the input data set.
  • FIG. 54 is a graph showing the percent of antibodies with WGXG start positioning.
  • FIG. 55 is a set of bar graphs of V-gene family assignments.
  • FIG. 56 is a set of bar graphs of J-gene family assignments.
  • FIG. 57 is a table of data illustrating the number of correct germline assignments corresponding to the indicated germline assignment programs.
  • FIG. 58 is a set of bar graphs of V-gene family assignments.
  • FIG. 59 is a set of bar graphs of J-gene family assignments.
  • FIGs. 60A-60D are tables showing HIV-1 neutralization by the indicated complemented antibody heavy and light chains.
  • FIG. 61A is a graph showing results of a neutralization assay.
  • FIG. 61B is a heat map showing sequence identity and germline divergence of the indicated antibody heavy chains.
  • FIG. 61C is a phylogenic tree of a cross-donor phylogenetic analysis.
  • FIG. 62A is a bar graph showing germline origin of sequence reads.
  • FIG. 62B is a set of pie charts and graphical representations of relatedness.
  • FIG. 63 is a schematic diagram comparing methods for functional antibody identification.
  • FIG. 64 is a bar graph showing the read length distribution.
  • FIG. 65 is a phylogenic tree of a cross-donor phylogenetic analysis.
  • FIG. 66 is data from deep sequencing analysis of donor 200-384.
  • FIG. 67A is a table entitled "Structural Studies on CD4-Binding
  • FIG. 67B is a diagram of the conformation of gp120 when bound to VRC01-like antibodies.
  • FIG. 68 is the amino acid sequence of IGKV3-11*01 (SEQ ID NO: 2488) and VRC01 (SEQ ID NO: 1624) light chain.
  • FIG. 69 illustrates the light chain contact residues.
  • FIG. 70 is a diagram showing the position of the CDR L3 in bound form.
  • FIG. 71 is a schematic diagram showing a conformation of loop D and VRC01 CDR L1.
  • VK3-11*01 SEQ ID NO: 2488
  • VRC01 SEQ ID NO: 1624
  • VK3-20*01 SEQ ID NO: 2489
  • VRC03 SEQ ID NO: 1626
  • VRC-PG04 SEQ ID NO: 1619
  • VK1-33*01 SEQ ID NO: 2490
  • VRC-CH31 SEQ ID NO: 1622
  • 12A21 SEQ ID NO: 2460
  • LV2-14*01 SEQ ID NO: 2624
  • VRC-PG20 SEQ ID NO: 1652
  • FIG. 72 is a diagram of conserved residues in CDR L3 that stabilize loop D and the base of V5 on gp120. Partial VK3-11*01 (SEQ ID NO: 2488), VRC01 (SEQ ID NO: 1624), VK3-20*01 (SEQ ID NO: 2489), VRC03 (SEQ ID NO: 1626), VRC-PG04 (SEQ ID NO: 1619), VK1-33*01 (SEQ ID NO: 2490), and VRC-CH31 (SEQ ID NO: 1622) sequences are shown.
  • FIG. 73 is a table of data showing that the light chains of VRC01-like antibodies have diverse origins.
  • FIG. 74 is a set of sequences and a schematic showing the conserved hydrophobic-Glu motif.
  • FIG. 75 is a bar graph of the distribution of sequence reads for a donor sample (donor 45).
  • FIG. 76 is a bar graph of the germline family distribution for a donor sample (donor 45).
  • FIG. 77 is a heat map for donor 45 deep sequencing. VRC01-like light chains are labeled as black dots.
  • FIG. 78 is a bar graph of the distribution of sequence reads for a donor sample (donor 57).
  • FIG. 79 is a bar graph of the germline family distribution for a donor sample (donor 57).
  • FIG. 80 is a heat map for donor 57 deep sequencing. VRC01-like light chains are labeled as black dots.
  • The nucleic and amino acid sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and three-letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand.
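The single-strand convention can be illustrated with a minimal Python sketch that derives the complementary strand from a displayed strand. The function name and the example fragment are hypothetical and are not taken from the sequence listing; only the standard DNA letters A, C, G, T and the ambiguity letter N are handled.

```python
# Watson-Crick pairing for the standard DNA letter codes (A, C, G, T) plus N.
# Hypothetical helper for illustration; not part of the disclosure.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return strand.translate(_COMPLEMENT)[::-1]


displayed = "ATGGCC"  # made-up fragment, not from the sequence listing
print(reverse_complement(displayed))  # GGCCAT
```

Applying the function twice returns the displayed strand, which is why a reference to one strand can be read as covering both.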
  • SEQ ID NOs: 1-2 are the nucleic acid sequences of VRC01-like heavy chains isolated from donor 45 (and deposited as GENBANK® Accession Numbers JN159474 - JN159475).
  • SEQ ID NOs: 3-4 are the nucleic acid sequences of VRC01-like light chains isolated from donor 45 (and deposited as GENBANK® Accession Numbers JN159468 - JN159469).
  • SEQ ID NOs: 5-7 are the nucleic acid sequences of heavy chains from the VRC-CH30, VRC-CH31, and VRC-CH32 antibodies (and deposited as
  • SEQ ID NOs: 8-10 are the nucleic acid sequences of light chains from the VRC-CH30, VRC-CH31, and VRC-CH32 antibodies.
  • SEQ ID NOs: 11-34 are the nucleic acid sequences of VRC01-like heavy chains isolated from donor 74 (and deposited as GENBANK® Accession Numbers JN159440 - JN159463).
  • SEQ ID NOs: 35-36 are the nucleic acid sequences of light chains from the VRC-PG04 and VRC-PG04b antibodies (and deposited as GENBANK® Accession Numbers JN159466 - JN159467).
  • SEQ ID NOs: 37-38 are the nucleic acid sequences of heavy chains from the VRC-PG04 and VRC-PG04b antibodies (and deposited as GENBANK® Accession Numbers JN159464 - JN159465).
  • SEQ ID NOs: 39-40 are the nucleic acid sequences of heavy chains from the VRC-CH33 and VRC-CH34 antibodies (and deposited as GENBANK® Accession Numbers JN159470 - JN159471).
  • SEQ ID NOs: 41-42 are the nucleic acid sequences of light chains from the VRC-CH33 and VRC-CH34 antibodies (and deposited as GENBANK® Accession Numbers JN159472 - JN159473).
  • SEQ ID NOs: 43-1603 are the nucleic acid sequences of heavy chain sequences of exemplary VRC01-like antibodies (and deposited as GENBANK® Accession Numbers JN157873 - JN159433).
  • SEQ ID Nos: 1604-1608 are the nucleic acid sequences of the IGHV1-02,
  • SEQ ID NO: 1609 is the amino acid sequence of VRC-PG04 heavy chain.
  • SEQ ID NO: 1610 is the amino acid sequence of VRC-PG04b heavy chain.
  • SEQ ID NO: 1611 is the amino acid sequence of VRC-CH30 heavy chain.
  • SEQ ID NO: 1612 is the amino acid sequence of VRC-CH31 heavy chain.
  • SEQ ID NO: 1613 is the amino acid sequence of VRC-CH32 heavy chain.
  • SEQ ID NO: 1614 is the amino acid sequence of VRC01 heavy chain.
  • SEQ ID NO: 1615 is the amino acid sequence of VRC02 heavy chain.
  • SEQ ID NO: 1616 is the amino acid sequence of VRC03 heavy chain.
  • SEQ ID NO: 1617 is the nucleic acid sequence of a primer.
  • SEQ ID NO: 1618 is the nucleic acid sequence of a primer.
  • SEQ ID NO: 1619 is the amino acid sequence of VRC-PG04 kappa chain.
  • SEQ ID NO: 1620 is the amino acid sequence of VRC-PG04b kappa chain.
  • SEQ ID NO: 1621 is the amino acid sequence of VRC-CH30 kappa chain.
  • SEQ ID NO: 1622 is the amino acid sequence of VRC-CH31 kappa chain.
  • SEQ ID NO: 1623 is the amino acid sequence of VRC-CH32 kappa chain.
  • SEQ ID NO: 1624 is the amino acid sequence of VRC01 kappa chain.
  • SEQ ID NO: 1625 is the amino acid sequence of VRC02 kappa chain.
  • SEQ ID NO: 1626 is the amino acid sequence of VRC03 kappa chain.
  • SEQ ID NO: 1627 is the amino acid sequence of VRC01b heavy chain.
  • SEQ ID NO: 1628 is the amino acid sequence of VRC01b kappa chain.
  • SEQ ID NO: 1629 is the amino acid sequence of VRC03b heavy chain.
  • SEQ ID NO: 1630 is the amino acid sequence of VRC03b kappa chain.
  • SEQ ID NO: 1631 is the amino acid sequence of VRC03c heavy chain.
  • SEQ ID NO: 1632 is the amino acid sequence of VRC06 heavy chain.
  • SEQ ID NO: 1633 is the amino acid sequence of VRC06 kappa chain.
  • SEQ ID NO: 1634 is the amino acid sequence of VRC06b heavy chain.
  • SEQ ID NO: 1635 is the amino acid sequence of VRC06b kappa chain.
  • SEQ ID NO: 1636 is the amino acid sequence of VRC07 heavy chain.
  • SEQ ID NO: 1637 is the amino acid sequence of VRC07b heavy chain.
  • SEQ ID NO: 1638 is the amino acid sequence of VRC07b kappa chain.
  • SEQ ID NO: 1639 is the amino acid sequence of VRC07c heavy chain.
  • SEQ ID NO: 1640 is the amino acid sequence of VRC07c kappa chain.
  • SEQ ID NO: 1641 is the amino acid sequence of VRC08 heavy chain.
  • SEQ ID NO: 1642 is the amino acid sequence of VRC08b heavy chain.
  • SEQ ID NO: 1643 is the amino acid sequence of VRC17 heavy chain.
  • SEQ ID NO: 1644 is the amino acid sequence of VRC18 heavy chain.
  • SEQ ID NO: 1645 is the amino acid sequence of VRC18b heavy chain.
  • SEQ ID NO: 1646 is the amino acid sequence of VRC18b kappa chain.
  • SEQ ID NO: 1647 is the amino acid sequence of VRC-PG19 heavy chain.
  • SEQ ID NO: 1648 is the amino acid sequence of VRC-PG19 lambda chain.
  • SEQ ID NO: 1649 is the amino acid sequence of VRC-PG19b heavy chain.
  • SEQ ID NO: 1650 is the amino acid sequence of VRC-PG19b lambda chain.
  • SEQ ID NO: 1651 is the amino acid sequence of VRC-PG20 heavy chain.
  • SEQ ID NO: 1652 is the amino acid sequence of VRC-PG20 lambda chain.
  • SEQ ID NO: 1653 is the amino acid sequence of VRC-PG20b heavy chain.
  • SEQ ID NO: 1654 is the amino acid sequence of VRC-PG20b lambda chain.
  • SEQ ID NO: 1655 is the amino acid sequence of VRC23 heavy chain.
  • SEQ ID NO: 1656 is the amino acid sequence of VRC23 kappa chain.
  • SEQ ID NO: 1657 is the amino acid sequence of VRC23b heavy chain.
  • SEQ ID NO: 1658 is the amino acid sequence of VRC23b kappa chain.
  • SEQ ID NO: 1659 is the amino acid sequence of VRC-CH33 heavy chain.
  • SEQ ID NO: 1660 is the amino acid sequence of VRC-CH33 kappa chain.
  • SEQ ID NO: 1661 is the amino acid sequence of VRC-CH34 heavy chain.
  • SEQ ID NO: 1662 is the amino acid sequence of VRC-CH34 kappa chain.
  • SEQ ID NO: 1663 is an exemplary nucleic acid sequence encoding VRC-PG04 heavy chain.
  • SEQ ID NO: 1664 is an exemplary nucleic acid sequence encoding VRC-PG04b heavy chain.
  • SEQ ID NO: 1665 is an exemplary nucleic acid sequence encoding VRC-CH30 heavy chain.
  • SEQ ID NO: 1666 is an exemplary nucleic acid sequence encoding VRC-
  • SEQ ID NO: 1667 is an exemplary nucleic acid sequence encoding VRC-CH32 heavy chain.
  • SEQ ID NO: 1668 is an exemplary nucleic acid sequence encoding VRC01 heavy chain.
  • SEQ ID NO: 1669 is an exemplary nucleic acid sequence encoding VRC02 heavy chain.
  • SEQ ID NO: 1670 is an exemplary nucleic acid sequence encoding VRC03 heavy chain.
  • SEQ ID NO: 1671 is an exemplary nucleic acid sequence encoding VRC-
  • SEQ ID NO: 1672 is an exemplary nucleic acid sequence encoding VRC-PG04b kappa chain.
  • SEQ ID NO: 1673 is an exemplary nucleic acid sequence encoding VRC-CH30 kappa chain.
  • SEQ ID NO: 1674 is an exemplary nucleic acid sequence encoding VRC-CH31 kappa chain.
  • SEQ ID NO: 1675 is an exemplary nucleic acid sequence encoding VRC-CH32 kappa chain.
  • SEQ ID NO: 1676 is an exemplary nucleic acid sequence encoding VRC01 kappa chain.
  • SEQ ID NO: 1677 is an exemplary nucleic acid sequence encoding VRC02 kappa chain.
  • SEQ ID NO: 1678 is an exemplary nucleic acid sequence encoding VRC03 kappa chain.
  • SEQ ID NO: 1679 is an exemplary nucleic acid sequence encoding VRC01b heavy chain.
  • SEQ ID NO: 1680 is an exemplary nucleic acid sequence encoding VRC01b kappa chain.
  • SEQ ID NO: 1681 is an exemplary nucleic acid sequence encoding VRC03b heavy chain.
  • SEQ ID NO: 1682 is an exemplary nucleic acid sequence encoding VRC03b kappa chain.
  • SEQ ID NO: 1683 is an exemplary nucleic acid sequence encoding VRC03c heavy chain.
  • SEQ ID NO: 1684 is an exemplary nucleic acid sequence encoding VRC06 heavy chain.
  • SEQ ID NO: 1685 is an exemplary nucleic acid sequence encoding VRC06 kappa chain.
  • SEQ ID NO: 1686 is an exemplary nucleic acid sequence encoding VRC06b heavy chain.
  • SEQ ID NO: 1687 is an exemplary nucleic acid sequence encoding VRC06b kappa chain.
  • SEQ ID NO: 1688 is an exemplary nucleic acid sequence encoding VRC07 heavy chain.
  • SEQ ID NO: 1689 is an exemplary nucleic acid sequence encoding VRC07b heavy chain.
  • SEQ ID NO: 1690 is an exemplary nucleic acid sequence encoding VRC07b kappa chain.
  • SEQ ID NO: 1691 is an exemplary nucleic acid sequence encoding VRC07c heavy chain.
  • SEQ ID NO: 1692 is an exemplary nucleic acid sequence encoding VRC07c kappa chain.
  • SEQ ID NO: 1693 is an exemplary nucleic acid sequence encoding VRC08 heavy chain.
  • SEQ ID NO: 1694 is an exemplary nucleic acid sequence encoding VRC08b heavy chain.
  • SEQ ID NO: 1695 is an exemplary nucleic acid sequence encoding VRC17 heavy chain.
  • SEQ ID NO: 1696 is an exemplary nucleic acid sequence encoding VRC18 heavy chain.
  • SEQ ID NO: 1697 is an exemplary nucleic acid sequence encoding VRC18b heavy chain.
  • SEQ ID NO: 1698 is an exemplary nucleic acid sequence encoding VRC18b kappa chain.
  • SEQ ID NO: 1699 is an exemplary nucleic acid sequence encoding VRC-PG19 heavy chain.
  • SEQ ID NO: 1700 is an exemplary nucleic acid sequence encoding VRC-PG19 lambda chain.
  • SEQ ID NO: 1701 is an exemplary nucleic acid sequence encoding VRC-PG19b heavy chain.
  • SEQ ID NO: 1702 is an exemplary nucleic acid sequence encoding VRC-PG19b lambda chain.
  • SEQ ID NO: 1703 is an exemplary nucleic acid sequence encoding VRC-
  • SEQ ID NO: 1704 is an exemplary nucleic acid sequence encoding VRC-PG20 lambda chain.
  • SEQ ID NO: 1705 is an exemplary nucleic acid sequence encoding VRC-PG20b heavy chain.
  • SEQ ID NO: 1706 is an exemplary nucleic acid sequence encoding VRC-PG20b lambda chain.
  • SEQ ID NO: 1707 is an exemplary nucleic acid sequence encoding VRC23 heavy chain.
  • SEQ ID NO: 1708 is an exemplary nucleic acid sequence encoding VRC23 kappa chain.
  • SEQ ID NO: 1709 is an exemplary nucleic acid sequence encoding VRC23b heavy chain.
  • SEQ ID NO: 1710 is an exemplary nucleic acid sequence encoding VRC23b kappa chain.
  • SEQ ID NO: 1711 is an exemplary nucleic acid sequence encoding VRC-CH33 heavy chain.
  • SEQ ID NO: 1712 is an exemplary nucleic acid sequence encoding VRC-CH33 kappa chain.
  • SEQ ID NO: 1713 is an exemplary nucleic acid sequence encoding VRC-CH34 heavy chain.
  • SEQ ID NO: 1714 is an exemplary nucleic acid sequence encoding VRC-CH34 kappa chain.
  • SEQ ID NOs: 1715-2414 are amino acid sequences of the heavy chains of VRC01-like antibodies (and correspond to SEQ ID NOs: 760-1459 of PCT
  • SEQ ID NO: 2415 is the amino acid sequence of VRC13 heavy chain.
  • SEQ ID NO: 2416 is the amino acid sequence of VRC13 lambda chain.
  • SEQ ID NO: 2417 is the amino acid sequence of VRC14 heavy chain.
  • SEQ ID NO: 2418 is the amino acid sequence of VRC14 lambda chain.
  • SEQ ID NO: 2419 is the amino acid sequence of VRC14b heavy chain.
  • SEQ ID NO: 2420 is the amino acid sequence of VRC14b lambda chain.
  • SEQ ID NO: 2421 is the amino acid sequence of VRC14c heavy chain.
  • SEQ ID NO: 2422 is the amino acid sequence of VRC14c lambda chain.
  • SEQ ID NO: 2423 is the amino acid sequence of VRC15 heavy chain.
  • SEQ ID NO: 2424 is the amino acid sequence of VRC15 lambda chain.
  • SEQ ID NO: 2425 is the amino acid sequence of VRC16 heavy chain.
  • SEQ ID NO: 2426 is the amino acid sequence of VRC16 kappa chain.
  • SEQ ID NO: 2427 is the amino acid sequence of VRC16b heavy chain.
  • SEQ ID NO: 2428 is the amino acid sequence of VRC16b kappa chain.
  • SEQ ID NO: 2429 is the amino acid sequence of VRC16c heavy chain.
  • SEQ ID NO: 2430 is the amino acid sequence of VRC16c kappa chain.
  • SEQ ID NO: 2431 is the amino acid sequence of VRC16d heavy chain.
  • SEQ ID NO: 2432 is the amino acid sequence of VRC16d kappa chain.
  • SEQ ID NO: 2433 is an exemplary nucleic acid sequence encoding VRC13 heavy chain.
  • SEQ ID NO: 2434 is an exemplary nucleic acid sequence encoding VRC13 lambda chain.
  • SEQ ID NO: 2435 is an exemplary nucleic acid sequence encoding VRC14 heavy chain.
  • SEQ ID NO: 2436 is an exemplary nucleic acid sequence encoding VRC14 lambda chain.
  • SEQ ID NO: 2437 is an exemplary nucleic acid sequence encoding VRC14b heavy chain.
  • SEQ ID NO: 2438 is an exemplary nucleic acid sequence encoding VRC14b lambda chain.
  • SEQ ID NO: 2439 is an exemplary nucleic acid sequence encoding VRC14c heavy chain.
  • SEQ ID NO: 2440 is an exemplary nucleic acid sequence encoding VRC14c lambda chain.
  • SEQ ID NO: 2441 is an exemplary nucleic acid sequence encoding VRC15 heavy chain.
  • SEQ ID NO: 2442 is an exemplary nucleic acid sequence encoding VRC15 lambda chain.
  • SEQ ID NO: 2443 is an exemplary nucleic acid sequence encoding VRC16 heavy chain.
  • SEQ ID NO: 2444 is an exemplary nucleic acid sequence encoding VRC16 kappa chain.
  • SEQ ID NO: 2445 is an exemplary nucleic acid sequence encoding VRC16b heavy chain.
  • SEQ ID NO: 2446 is an exemplary nucleic acid sequence encoding VRC16b kappa chain.
  • SEQ ID NO: 2447 is an exemplary nucleic acid sequence encoding VRC16c heavy chain.
  • SEQ ID NO: 2448 is an exemplary nucleic acid sequence encoding VRC16c kappa chain.
  • SEQ ID NO: 2449 is an exemplary nucleic acid sequence encoding VRC16d heavy chain.
  • SEQ ID NO: 2450 is an exemplary nucleic acid sequence encoding VRC kappa chain.
  • SEQ ID NO: 2451 is the amino acid sequence of NIH45_46 heavy chain.
  • SEQ ID NO: 2452 is the amino acid sequence of NIH45_46 kappa chain.
  • SEQ ID NO: 2453 is the amino acid sequence of 3BNC60 heavy chain.
  • SEQ ID NO: 2454 is the amino acid sequence of 3BNC60 kappa chain.
  • SEQ ID NO: 2455 is the amino acid sequence of 3BNC117 heavy chain.
  • SEQ ID NO: 2456 is the amino acid sequence of 3BNC117 kappa chain.
  • SEQ ID NO: 2457 is the amino acid sequence of 12A12 heavy chain.
  • SEQ ID NO: 2458 is the amino acid sequence of 12A12 kappa chain.
  • SEQ ID NO: 2459 is the amino acid sequence of 12A21 heavy chain.
  • SEQ ID NO: 2460 is the amino acid sequence of 12A21 kappa chain.
  • SEQ ID NO: 2461 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2462 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2463 is an exemplary nucleic acid sequence encoding 3BNC60 heavy chain.
  • SEQ ID NO: 2464 is an exemplary nucleic acid sequence encoding 3BNC60 kappa chain.
  • SEQ ID NO: 2465 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2466 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2467 is an exemplary nucleic acid sequence encoding 12A12 heavy chain.
  • SEQ ID NO: 2468 is an exemplary nucleic acid sequence encoding 12A12 kappa chain.
  • SEQ ID NO: 2469 is an exemplary nucleic acid sequence encoding 12A21 heavy chain.
  • SEQ ID NO: 2470 is an exemplary nucleic acid sequence encoding 12A21 kappa chain.
  • SEQ ID NO: 2471 is the amino acid sequence of 1NC9 heavy chain.
  • SEQ ID NO: 2472 is the amino acid sequence of 1NC9 lambda chain.
  • SEQ ID NO: 2473 is the amino acid sequence of 1B2530 heavy chain.
  • SEQ ID NO: 2474 is the amino acid sequence of 1B2530 lambda chain.
  • SEQ ID NO: 2475 is the amino acid sequence of 8ANC131 heavy chain.
  • SEQ ID NO: 2476 is the amino acid sequence of 8ANC131 kappa chain.
  • SEQ ID NO: 2477 is the amino acid sequence of 8ANC134 heavy chain.
  • SEQ ID NO: 2478 is the amino acid sequence of 8ANC134 kappa chain.
  • SEQ ID NO: 2479 is an exemplary nucleic acid sequence encoding 1NC9 heavy chain.
  • SEQ ID NO: 2480 is an exemplary nucleic acid sequence encoding 1NC9 lambda chain.
  • SEQ ID NO: 2481 is an exemplary nucleic acid sequence encoding 1B2530 heavy chain.
  • SEQ ID NO: 2482 is an exemplary nucleic acid sequence encoding 1B2530 lambda chain.
  • SEQ ID NO: 2483 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2484 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2485 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2486 is an exemplary nucleic acid sequence encoding
  • SEQ ID NO: 2487 is the amino acid sequence of IGHV1-02*02 heavy chain.
  • SEQ ID NO: 2488 is the amino acid sequence of IGKV3-11*01 light chain.
  • SEQ ID NO: 2489 is the amino acid sequence of IGKV3-20*01 light chain.
  • SEQ ID NO: 2490 is the amino acid sequence of IGKV3-33*01 light chain.
  • SEQ ID NO: 2491 is the amino acid sequence of the lineage analysis of CDR H3 class 3.
  • SEQ ID NO: 2492 is the amino acid sequence of the lineage analysis of CDR H3 class 6.
  • SEQ ID NO: 2493 is the amino acid sequence of the lineage analysis of
  • SEQ ID NO: 2494 is the amino acid sequence of the lineage analysis of CDR H3 class 8.
  • SEQ ID Nos: 2495-2501 are the nucleic acid sequences of primers.
  • SEQ ID Nos: 2502-2536 are amino acid sequences of CDR H3 of 35 expressed and experimentally tested heavy-chain sequences.
  • SEQ ID NOs: 2537-2623 are amino acid sequences of the heavy chain variable regions of exemplary VRC01-like antibodies.
  • SEQ ID NO: 2624 is the amino acid sequence of LV2-14*01.
  • Sequence.txt The Sequence Listing is submitted as an ASCII text file named Sequence.txt, which was created on March 23, 2012, is ~1.9 MB, and is incorporated by reference herein.
  • Administration The introduction of a composition into a subject by a chosen route. Administration can be local or systemic. For example, if the chosen route is intravenous, the composition is administered by introducing the composition into a vein of the subject. In some examples a disclosed antibody specific for an HIV protein or polypeptide is administered to a subject.
  • Amino acid substitution The replacement of one amino acid in a peptide with a different amino acid.
  • Amplification A technique that increases the number of copies of a nucleic acid molecule (such as an RNA or DNA).
  • An example of amplification is the polymerase chain reaction, in which a biological sample is contacted with a pair of oligonucleotide primers, under conditions that allow for the hybridization of the primers to a nucleic acid template in the sample.
  • The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid.
  • The product of amplification can be characterized by electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing using standard techniques.
  • Other examples of amplification include strand displacement amplification, as disclosed in U.S. Patent No. 5,744,311; repair chain reaction amplification, as disclosed in WO 90/01069; ligase chain reaction amplification, as disclosed in EP-A-320 308; gap filling ligase chain reaction amplification, as disclosed in U.S. Patent No. 5,427,930; and NASBA™ RNA transcription-free amplification, as disclosed in U.S. Patent No. 6,025,134.
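The repeated extend, dissociate, and re-anneal cycle of the polymerase chain reaction described above can be approximated with a simple idealized model. The Python sketch below is illustrative only: the function name and the per-cycle `efficiency` parameter are assumptions introduced here and do not come from the disclosure, and real reactions plateau as reagents are consumed.

```python
def copies_after_cycles(initial_copies: float, cycles: int,
                        efficiency: float = 1.0) -> float:
    """Idealized template copy number after a number of amplification cycles.

    efficiency is the assumed fraction of templates copied in each cycle
    (1.0 means perfect doubling). This is a textbook approximation, not a
    model of any specific protocol.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the template count by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


# Ten starting templates, three cycles of perfect doubling: 10 -> 20 -> 40 -> 80.
print(copies_after_cycles(10, 3))  # 80.0
```

With an efficiency below 1.0, the same number of cycles yields fewer copies, which is one reason cycle number alone does not fix the final yield.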
  • Animal Living multi-cellular vertebrate organisms, a category that includes, for example, mammals and birds.
  • The term "mammal" includes both human and non-human mammals.
  • Similarly, the term "subject" includes both human and veterinary subjects.
  • Antibody A polypeptide substantially encoded by an immunoglobulin gene or immunoglobulin genes, or antigen binding fragments thereof, which specifically binds and recognizes an analyte (antigen) such as gp120, or an antigenic fragment of gp120.
  • Immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as the myriad immunoglobulin variable region genes.
  • Antibodies exist, for example, as intact immunoglobulins and as a number of well characterized fragments produced by digestion with various peptidases. For instance, Fabs, Fvs, and single-chain Fvs (scFvs) that specifically bind to gp120 or fragments of gp120 would be gp120-specific binding agents.
  • A scFv protein is a fusion protein in which a light chain variable region of an immunoglobulin and a heavy chain variable region of an immunoglobulin are bound by a linker, while in dsFvs, the chains have been mutated to introduce a disulfide bond to stabilize the association of the chains.
  • The term also includes genetically engineered forms such as chimeric antibodies (such as humanized murine antibodies) and heteroconjugate antibodies (such as bispecific antibodies). See also, Pierce Catalog and Handbook, 1994-1995 (Pierce Chemical Co., Rockford, IL); Kuby, J., Immunology, 3rd Ed., W.H. Freeman & Co., New York, 1997.
  • Antibody fragments are defined as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) (Fab')2, the fragment of the antibody obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; (4) F(ab')2, a dimer of two Fab' fragments held together by two disulfide bonds; (5) Fv, a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (6) single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain, the variable region of the heavy chain, linked by a suitable polypeptide link
  • The term "antibody" also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies.
  • A naturally occurring immunoglobulin has heavy (H) chains and light (L) chains interconnected by disulfide bonds.
  • There are two types of light chain, lambda and kappa, and several immunoglobulin heavy chain classes, including IgM.
  • Each heavy and light chain contains a constant region and a variable region (the regions are also known as "domains").
  • The heavy and the light chain variable regions specifically bind the antigen.
  • Light and heavy chain variable regions contain a "framework" region interrupted by three hypervariable regions, also called "complementarity-determining regions" or "CDRs." The extent of the framework region and CDRs have been defined (see, Kabat et al., Sequences of Proteins of Immunological Interest, U.S. Department of Health and Human
  • The CDRs are primarily responsible for binding to an epitope of an antigen.
  • The CDRs of each chain are typically referred to as CDR1, CDR2, and CDR3, numbered sequentially starting from the N-terminus, and are also typically identified by the chain in which the particular CDR is located.
  • Thus, a VH CDR3 is located in the variable domain of the heavy chain of the antibody in which it is found, whereas a VL CDR1 is the CDR1 from the variable domain of the light chain of the antibody in which it is found.
  • Light chain CDRs are sometimes referred to as CDR L1, CDR L2, and CDR L3.
  • Heavy chain CDRs are sometimes referred to as CDR H1, CDR H2, and CDR H3.
  • References to "V H" or "VH" refer to the variable region of an immunoglobulin heavy chain, including that of an antibody fragment, such as Fv, scFv, dsFv or Fab.
  • References to "V L" or "VL" refer to the variable region of an immunoglobulin light chain, including that of an Fv, scFv, dsFv or Fab.
  • A "monoclonal antibody" is an antibody produced by a single clone of B-lymphocytes or by a cell into which the light and heavy chain genes of a single antibody have been transfected.
  • Monoclonal antibodies are produced by methods known to those of skill in the art, for instance by making hybrid antibody-forming cells from a fusion of myeloma cells with immune spleen cells. These fused cells and their progeny are termed "hybridomas."
  • Monoclonal antibodies include humanized monoclonal antibodies. In some examples, monoclonal antibodies are isolated from a subject. The amino acid sequences of such isolated monoclonal antibodies can be determined.
  • A "humanized" immunoglobulin is an immunoglobulin including a human framework region and one or more CDRs from a non-human (such as a mouse, rat, or synthetic) immunoglobulin.
  • The non-human immunoglobulin providing the CDRs is termed a "donor," and the human immunoglobulin providing the framework is termed an "acceptor." In one embodiment, all the CDRs are from the donor immunoglobulin in a humanized immunoglobulin. Constant regions need not be present, but if they are, they must be substantially identical to human immunoglobulin constant regions, such as at least about 85-90%, such as about 95% or more identical.
  • All parts of a humanized immunoglobulin, except possibly the CDRs, are substantially identical to corresponding parts of natural human immunoglobulin sequences.
  • A "humanized antibody" is an antibody comprising a humanized light chain and a humanized heavy chain immunoglobulin.
  • A humanized antibody binds to the same antigen as the donor antibody that provides the CDRs.
  • The acceptor framework of a humanized immunoglobulin or antibody may have a limited number of substitutions by amino acids taken from the donor framework.
  • Humanized or other monoclonal antibodies can have additional conservative amino acid substitutions which have substantially no effect on antigen binding or other immunoglobulin functions.
  • Humanized immunoglobulins can be constructed by means of genetic engineering (for example, see U.S. Patent No. 5,585,089).
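The "substantially identical" thresholds given above for humanized immunoglobulin constant regions (at least about 85-90%, or about 95% or more) can be checked with a percent-identity calculation. The Python sketch below is illustrative only and rests on assumptions not stated in the text: the two sequences are already aligned to equal length with `-` as the gap character, identity is computed over all columns that are not gaps in both sequences, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions at which two sequences carry the same residue.

    Assumes the inputs are pre-aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def is_substantially_identical(aligned_a: str, aligned_b: str,
                               threshold: float = 85.0) -> bool:
    """Apply an identity threshold; 85.0 mirrors the lower bound quoted above."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy ten-residue sequences differing at one position (made-up data).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

Different alignment tools define the denominator differently (for example, excluding all gapped columns), so the treatment of gaps here is one choice among several.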
  • A "neutralizing antibody" is an antibody which reduces the infectious titer of an infectious agent by binding to a specific antigen on the infectious agent.
  • In some examples, the infectious agent is a virus.
  • In some examples, an antibody that is specific for gp120 neutralizes the infectious titer of HIV.
  • Antibodyome The entire repertoire of expressed antibody heavy and light chain sequences in an individual.
  • The individual can be an individual infected with a pathogen, for example HIV.
  • Antigen A compound, composition, or substance that can stimulate the production of antibodies or a T cell response in an animal, including compositions that are injected or absorbed into an animal.
  • An antigen reacts with the products of specific humoral or cellular immunity, including those induced by heterologous antigens, such as the disclosed antigens.
  • "Epitope” or “antigenic determinant” refers to the region of an antigen to which B and/or T cells respond.
  • T cells respond to the epitope, when the epitope is presented in conjunction with an MHC molecule.
  • Epitopes can be formed both from contiguous amino acids or noncontiguous amino acids juxtaposed by tertiary folding of a protein.
  • Epitopes formed from contiguous amino acids are typically retained on exposure to denaturing solvents whereas epitopes formed by tertiary folding are typically lost on treatment with denaturing solvents.
  • An epitope typically includes at least 3, and more usually, at least 5, about 9, or about 8-10 amino acids in a unique spatial conformation. Methods of determining spatial conformation of epitopes include, for example, x-ray crystallography and nuclear magnetic resonance.
  • antigens include, but are not limited to, peptides, lipids, polysaccharides, and nucleic acids containing antigenic determinants, such as those recognized by an immune cell.
  • antigens include peptides derived from a pathogen of interest. Exemplary pathogens include bacteria, fungi, viruses and parasites.
  • an antigen is derived from HIV, such as a gp120 polypeptide or antigenic fragment thereof, such as a gp120 outer domain or fragment thereof.
  • a “target epitope” is a specific epitope on an antigen that specifically binds an antibody of interest, such as a monoclonal antibody.
  • a target epitope includes the amino acid residues that contact the antibody of interest, such that the target epitope can be selected by the amino acid residues determined to be in contact with the antibody of interest.
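As a rough illustration of selecting a target epitope from the residues found to contact the antibody, the sketch below flags antigen residues that have any atom within a distance cutoff of an antibody atom. The 4.0 Å cutoff, the function name `contact_residues`, and the toy coordinates are assumptions made for illustration only; they are not taken from the disclosure.

```python
import math

def contact_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return sorted antigen residue numbers with an atom within `cutoff` of any antibody atom.

    antigen_atoms:  list of (residue_number, (x, y, z)) tuples
    antibody_atoms: list of (x, y, z) tuples
    cutoff:         distance threshold in angstroms (hypothetical default)
    """
    contacts = set()
    for res_id, a_xyz in antigen_atoms:
        for b_xyz in antibody_atoms:
            if math.dist(a_xyz, b_xyz) <= cutoff:
                contacts.add(res_id)
                break  # one close atom is enough to count the residue
    return sorted(contacts)

# Toy coordinates, invented for illustration (not real structure data).
antigen = [(279, (0.0, 0.0, 0.0)), (280, (10.0, 0.0, 0.0)), (365, (3.0, 0.0, 0.0))]
antibody = [(1.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_residues(antigen, antibody))  # [279, 365]
```

In practice the coordinates would come from a solved complex structure, and the cutoff would be chosen to suit the analysis.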
  • Antigenic surface A surface of a molecule, for example a protein such as a gp120 protein or polypeptide, capable of eliciting an immune response.
  • An antigenic surface includes the defining features of that surface, for example the three-dimensional shape and the surface charge.
  • An antigenic surface includes both surfaces that occur on gp120 polypeptides as well as surfaces of compounds that mimic the surface of a gp120 polypeptide (mimetics).
  • an antigenic surface includes all or part of the surface of gp120 that binds to the CD4 receptor.
  • Atomic Coordinates or Structure coordinates Mathematical coordinates derived from mathematical equations related to the patterns obtained on diffraction of a monochromatic beam of X-rays by the atoms (scattering centers) such as an antigen, or an antigen in complex with an antibody.
  • antigen can be gp120, a gp120:antibody complex, or combinations thereof in a crystal.
  • the diffraction data are used to calculate an electron density map of the repeating unit of the crystal.
  • the electron density maps are used to establish the positions of the individual atoms within the unit cell of the crystal.
  • structure coordinates refers to Cartesian coordinates derived from mathematical equations related to the patterns obtained on diffraction of a monochromatic beam of X-rays, such as by the atoms of a gp120 in crystal form.
  • Binding affinity Affinity of an antibody or antigen binding fragment thereof for an antigen.
  • affinity is calculated by a modification of the Scatchard method described by Frankel et al., Mol. Immunol., 16: 101-106, 1979.
  • binding affinity is measured by an antigen/antibody dissociation rate.
  • a high binding affinity is measured by a competition radioimmunoassay.
  • a high binding affinity is at least about 1 x 10⁻⁸ M.
  • a high binding affinity is at least about 1.5 x 10⁻⁸, at least about 2.0 x 10⁻⁸, at least about 2.5 x 10⁻⁸, at least about 3.0 x 10⁻⁸, at least about 3.5 x 10⁻⁸, at least about 4.0 x 10⁻⁸, at least about 4.5 x 10⁻⁸, or at least about 5.0 x 10⁻⁸ M.
  • an antibody that binds to and inhibits the function of related antigens such as antigens that share 85%, 90%, 95%, 96%, 97%, 98% or 99% identity antigenic surface of antigen.
  • an antigen from a pathogen such as a virus
  • the antibody can bind to and inhibit the function of an antigen from more than one class and/or subclass of the pathogen.
  • the antibody can bind to and inhibit the function of an antigen, such as gp120, from more than one clade.
  • broadly neutralizing antibodies to HIV are distinct from other antibodies to HIV in that they neutralize a high percentage of the many types of HIV in circulation.
  • CD4 Cluster of differentiation factor 4 polypeptide; a T-cell surface protein that mediates interaction with the MHC class II molecule. CD4 also serves as the primary receptor site for HIV on T-cells during HIV-1 infection. CD4 is known to bind to gp120 from HIV. The known sequence of the CD4 precursor has a hydrophobic signal peptide, an extracellular region of approximately 370 amino acids, a highly hydrophobic stretch with significant identity to the membrane-spanning domain of the class II MHC beta chain, and a highly charged intracellular sequence of 40 residues (Maddon, Cell 42:93, 1985).
  • CD4BS antibodies Antibodies that bind to or substantially overlap the CD4 binding surface of a gp120 polypeptide. The antibodies interfere with or prevent CD4 from binding to a gp120 polypeptide.
  • Chimeric antibody An antibody which includes sequences derived from two different antibodies, which typically are of different species.
  • a chimeric antibody includes one or more CDRs and/or framework regions from one human antibody and CDRs and/or framework regions from another human antibody.
  • Placement in direct physical association includes both in solid and liquid form, which can take place either in vivo or in vitro.
  • Contacting includes contact between one molecule and another molecule, for example the amino acid on the surface of one polypeptide, such as an antigen, that contacts another polypeptide, such as an antibody.
  • Contacting can also include contacting a cell for example by placing an antibody in direct physical association with a cell.
  • Computer readable media Any medium or media, which can be read and accessed directly by a computer, so that the media is suitable for use in a computer system.
  • Such media include, but are not limited to: magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
  • Computer system Hardware that can be used to analyze atomic coordinate data and/or design an antigen using atomic coordinate data or to analyze an amino acid or nucleic acid sequence, for example to compare two or more sequences and calculate sequence similarity and/or divergence.
  • the minimum hardware of a computer-based system typically comprises a central processing unit (CPU), an input device, for example a mouse, keyboard, and the like, an output device, and a data storage device. Desirably a monitor is provided to visualize structure data.
  • the data storage device may be RAM or other means for accessing computer readable media. Examples of such systems are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2 operating systems.
  • Cross donor complementation Formation of an antibody using a heavy chain variable domain of an antibody that specifically binds an epitope of an antigen of interest from first donor and a light chain variable domain of an antibody that specifically binds the same epitope from a second donor, wherein the antibody that is formed from the heavy chain variable domain and the light chain variable domain retains its ability to bind the epitope and wherein the first and the second donor are different antibodies.
  • the light chain variable domains and the heavy chain variable domains that form an antibody are from different sources, but the chimeric antibody that is formed still binds the epitope.
  • the different antibodies are from different subjects.
  • the antigen is gp120.
  • the epitope is RSC3.
  • the heavy chain variable domain or the light chain variable domain is the VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain or light chain variable domain.
  • the heavy chain variable domain or the light chain variable domain is the VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain or light chain variable domain and the light chain variable domain or the heavy chain variable domain, respectively, is not from VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
  • Dendrogram A diagrammatic representation of a phylogenetic tree.
  • DNA Maximum Likelihood A method for constructing phylogenetic trees of nucleic acid sequences under the constraint that the phylogenetic trees must be consistent with a molecular clock.
  • the molecular clock is the assumption that the tips of the tree are all equidistant, in branch length, from its root.
  • the computer program and several embodiments of the method are disclosed on the DNA MLK website (version 3.5c, copyright 1986-1993, incorporated herein by reference). The assumptions of the model are:
  • Each site undergoes substitution at an expected rate which is chosen from a series of rates (each with a probability of occurrence) which we specify.
  • a substitution consists of one of two sorts of events:
  • the first kind of event consists of the replacement of the existing base by a base drawn from a pool of purines or a pool of pyrimidines
  • the second kind of event consists of the replacement of the existing base by a base drawn at random from a pool of bases at known frequencies, independently of the identity of the base which is being replaced. This could lead either to a no change, to a transition or to a transversion.
  • the ratio of the two purines in the purine replacement pool is the same as their ratio in the overall pool, and similarly for the pyrimidines.
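The molecular clock constraint described above (all tips equidistant, in branch length, from the root) can be checked on a small tree. The following is a minimal sketch, not the DNA MLK program itself; the nested-list tree encoding and the names `tip_depths` and `is_clocklike` are assumptions chosen for illustration.

```python
def tip_depths(node, depth=0.0):
    """Yield (tip_name, root-to-tip distance) for every tip under `node`.

    A tip is a string; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        yield node, depth
    else:
        for child, length in node:
            yield from tip_depths(child, depth + length)

def is_clocklike(tree, tol=1e-9):
    """Return True if every tip lies at the same distance from the root (within `tol`)."""
    depths = [d for _, d in tip_depths(tree)]
    return max(depths) - min(depths) <= tol

# Hypothetical trees: in the first, A, B and C are all 3.0 units from the root.
clock_tree = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
non_clock_tree = [([("A", 1.0), ("B", 2.5)], 2.0), ("C", 3.0)]
print(is_clocklike(clock_tree), is_clocklike(non_clock_tree))  # True False
```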
  • DNA sequencing The process of determining the nucleotide order of a given DNA molecule.
  • the general characteristics of "deep sequencing” are that genetic material is amplified, such as by polymerase chain reaction, and then the amplified products are ligated to a solid surface.
  • the sequence of the amplified target genetic material is then performed in parallel and the sequence information is captured by a computer.
  • the sequencing can be performed using automated Sanger sequencing (ABI 3730xl genome analyzer), pyrosequencing on a solid support (454 sequencing, Roche), sequencing-by-synthesis with reversible terminations (ILLUMINA® Genome Analyzer), sequencing-by-ligation (ABI
  • DNA sequencing is performed using a chain termination method developed by Frederick Sanger, and thus termed “Sanger based sequencing” or "SBS.”
  • SBS Sanger based sequencing
  • This technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. Extension is initiated at a specific site on the template DNA by using a short oligonucleotide primer complementary to the template at that region.
  • the oligonucleotide primer is extended using DNA polymerase in the presence of the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is present.
  • the fragments are then size-separated by electrophoresis in a polyacrylamide gel or in a narrow glass tube (capillary) filled with a viscous polymer.
  • An alternative to using a labeled primer is to use labeled terminators instead; this method is commonly called “dye terminator sequencing.”
  • “Pyrosequencing” is an array based method, which has been commercialized by 454 Life Sciences.
  • single-stranded DNA is annealed to beads and amplified via EmPCR®. These DNA-bound beads are then placed into wells on a fiber-optic chip along with enzymes that produce light in the presence of ATP. When free nucleotides are washed over this chip, light is produced as the PCR amplification occurs and ATP is generated when nucleotides join with their complementary base pairs. Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded, such as by the charge coupled device (CCD) camera, within the instrument.
  • the signal strength is proportional to the number of nucleotides, for example, homopolymer stretches, incorporated in a single nucleotide flow.
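Because the light signal scales with the number of nucleotides incorporated in a flow, a read can be reconstructed by rounding each flow's signal to a whole number of bases. The sketch below illustrates that idea only; the cyclic flow order "TACG", the nearest-integer rounding, and the function name `decode_flowgram` are assumptions, not details taken from this disclosure or from any instrument's software.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light signals into a base sequence.

    signals:    normalized, non-negative signal per nucleotide flow
                (about 1.0 per incorporated base)
    flow_order: nucleotides washed over the chip, repeated cyclically (assumed order)
    """
    bases = []
    for i, signal in enumerate(signals):
        count = int(signal + 0.5)  # round to the nearest number of incorporations
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)

# Invented signals: a 2.10 in the C flow is read as "CC", a 3.04 in the A flow as "AAA".
print(decode_flowgram([1.02, 0.05, 2.10, 0.97, 0.0, 3.04]))  # TCCGAAA
```

Rounding becomes less reliable as homopolymer stretches lengthen, since the gap between adjacent signal levels shrinks relative to noise.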
  • Epitope An antigenic determinant. These are particular chemical groups or peptide sequences on a molecule that are antigenic, i.e. that elicit a specific immune response.
  • An antibody specifically binds a particular antigenic epitope on a polypeptide.
  • a disclosed antibody specifically binds to an epitope on the surface of gp120 from HIV, such as the CD4 binding site on the surface of gp120.
  • VRC01-like antibody or a heavy chain or light chain that can complement with a corresponding heavy chain or light chain from VRC01, as specifically defined herein.
  • "established VRC01-like" antibody, heavy chain or light chain refers to the following antibodies, heavy chains or light chains:
  • VRC01-like antibodies, heavy chains and light chains disclosed in Wu et al., "Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1," Science, 329(5993):856-861, 2010, which is incorporated by reference herein. These include heavy and light chain sequences of antibodies VRC01 (SEQ ID NO: 1614 and SEQ ID NO: 1624, respectively), VRC02 (SEQ ID NO: 1615 and SEQ ID NO: 1625, respectively) and VRC03 (SEQ ID NO: 1616 and SEQ ID NO: 1626, respectively).
  • These include heavy and light chains of the VRC01 (SEQ ID NO: 1614 and SEQ ID NO: 1624, respectively), VRC02 (SEQ ID NO: 1615 and SEQ ID NO: 1625, respectively) and VRC03 (SEQ ID NO: 1616 and SEQ ID NO: 1626, respectively) antibodies and 700 additional VRC01-like heavy chains (SEQ ID NO: 1715-2414).
  • These include the heavy and light chains of the 3BNC117, 3BNC60, 12A12, 12A21, NIH45-46, 8ANC131, 8ANC134, 1B2530, 1NC9 antibodies (corresponding SEQ ID NOs. and/or Accession Nos. shown in Table 1, below) and up to 567 other clonal related antibodies, including those listed in Figures S3, S13, S14 and Table S8 of Scheid et al., which are specifically incorporated by reference herein.
  • VRC01-like antibodies, heavy chains and light chains disclosed in Wu et al., "Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing," Science, 333(6049): 1593-1602, 2011, incorporated herein by reference.
  • These certain VRC01-like antibodies, heavy chains and light chains include the heavy and light chains of the VRC-PG04 and VRC-PG04b antibodies (GENBANK® Accession Nos. JN159464 to JN159467, respectively), VRC-CH30, VRC-CH31, and VRC-CH32 antibodies (GENBANK® Accession Nos. JN159434 to JN159439, respectively), and VRC-CH33 and VRC-CH34 antibodies (GENBANK® Accession Nos. JN159470 to JN159473, respectively).
  • VRC01-like antibodies, heavy chains and light chains also include 24 heavy chains from donor 74, 2008 (GENBANK® Accession Nos. JN159440 to JN159463), two heavy chains from donor 45, 2008
  • VRC01-like antibodies, heavy chains and light chains also include 1561 unique sequences associated with neutralizing CDR H3 distributions with at least one low divergent member shown in Fig. 6B and Fig. S16 of Wu et al., Science, 333(6049): 1593-1602, 2011 (GENBANK® Accession Nos. JN157873 to JN159433, respectively).
  • These include the heavy and light chains of the NIH45-46 antibody with a G54W amino acid substitution (Kabat numbering) in the heavy chain variable domain.
  • VRC-CH31 SEQ ID NO: 1612 SEQ ID NO: 1622
  • VRC-CH32 SEQ ID NO: 1613 SEQ ID NO: 1623
  • Framework Region Amino acid sequences interposed between CDRs. Includes variable light and variable heavy framework regions. The framework regions serve to hold the CDRs in an appropriate orientation for antigen binding.
  • Fc polypeptide The polypeptide comprising the constant region of an antibody excluding the first constant region immunoglobulin domain.
  • Fc region generally refers to the last two constant region immunoglobulin domains of IgA, IgD, and IgG, and the last three constant region immunoglobulin domains of IgE and IgM.
  • An Fc region may also include part or all of the flexible hinge N-terminal to these domains.
  • an Fc region may or may not comprise the tailpiece, and may or may not be bound by the J chain.
  • the Fc region comprises immunoglobulin domains Cgamma2 and Cgamma3 (Cγ2 and Cγ3) and the lower part of the hinge between Cgamma1 (Cγ1) and Cγ2.
  • the human IgG heavy chain Fc region is usually defined to comprise residues C226 or P230 to its carboxyl-terminus, wherein the numbering is according to the EU index as in Kabat.
  • the Fc region comprises immunoglobulin domains Calpha2 and Calpha3 (Cα2 and Cα3) and the lower part of the hinge between Calpha1 (Cα1) and Cα2. Encompassed within the definition of the Fc region are functionally equivalent analogs and variants of the Fc region.
  • a functionally equivalent analog of the Fc region may be a variant Fc region, comprising one or more amino acid modifications relative to the wild-type or naturally existing Fc region.
  • Variant Fc regions will possess at least 50% homology with a naturally existing Fc region, such as about 80%, and about 90%, or at least about 95% homology.
  • Functionally equivalent analogs of the Fc region may comprise one or more amino acid residues added to or deleted from the N- or C- termini of the protein, such as no more than 30 or no more than 10 additions and/or deletions.
  • Functionally equivalent analogs of the Fc region include Fc regions operably linked to a fusion partner.
  • Fc region must comprise the majority of all of the Ig domains that compose Fc region as defined above; for example IgG and IgA Fc regions as defined herein must comprise the majority of the sequence encoding CH2 and the majority of the sequence encoding CH3. Thus, the CH2 domain on its own, or the CH3 domain on its own, are not considered Fc region.
  • the Fc region may refer to this region in isolation, or this region in the context of an Fc fusion polypeptide.
  • Furin A calcium dependent serine endoprotease that cleaves precursor proteins at paired basic amino acid processing sites.
  • substrates of furin include proparathyroid hormone, proalbumin, and von Willebrand factor.
  • Furin can also cleave HIV envelope protein gp160 into gp120 and gp41.
  • Furin cleaves proteins just downstream of a basic amino acid target sequence (canonically, Arg-X-(Arg/Lys)-Arg). Thus, this amino acid sequence is a furin cleavage site.
  • gp120 An envelope protein from Human Immunodeficiency Virus (HIV). This envelope protein is initially synthesized as a longer precursor protein of 845-870 amino acids in size, designated gp160. gp160 is cleaved by a cellular protease into gp120 and gp41. gp120 contains most of the external, surface-exposed, domains of the HIV envelope glycoprotein complex, and it is gp120 which binds both to cellular CD4 receptors and to cellular chemokine receptors (such as CCR5).
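The canonical furin target sequence, Arg-X-(Arg/Lys)-Arg, maps directly onto a one-letter-code pattern, so candidate cleavage sites can be located with a regular expression. This is a minimal sketch: the function name `furin_sites` and the toy peptide are hypothetical, and a motif match only marks a candidate site, since actual cleavage depends on structural context.

```python
import re

# Arg, any residue, Arg or Lys, Arg. The lookahead lets overlapping motifs all be reported.
FURIN_MOTIF = re.compile(r"(?=(R[A-Z][KR]R))")

def furin_sites(sequence):
    """Return (motif, cut_index) pairs for a one-letter amino acid sequence.

    cut_index is the position immediately downstream of the final Arg, so
    sequence[:cut_index] and sequence[cut_index:] are the two predicted products.
    """
    return [(m.group(1), m.start() + 4) for m in FURIN_MOTIF.finditer(sequence.upper())]

toy = "AVGIGAVQREKRAVGIGA"  # hypothetical peptide containing one REKR motif
for motif, cut in furin_sites(toy):
    print(motif, toy[:cut], toy[cut:])  # REKR AVGIGAVQREKR AVGIGA
```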
  • HIV Human Immunodeficiency Virus
  • the mature gp120 wildtype polypeptides have about 500 amino acids in the primary sequence. gp120 is heavily N-glycosylated giving rise to an apparent molecular weight of 120 kD.
  • the polypeptide is comprised of five conserved regions (C1-C5) and five regions of high variability (V1-V5).
  • Exemplary sequences of wt gp120 polypeptides are shown on GENBANK®, for example accession numbers AAB05604 and AAD12142 (as available on August 10, 2011).
  • the gp120 core has a molecular structure, which includes two domains: an "inner” domain (which faces gp41) and an “outer” domain (which is mostly exposed on the surface of the oligomeric envelope glycoprotein complex).
  • the two gp120 domains are separated by a "bridging sheet” that is not part of either domain.
  • the gp120 core comprises 25 beta strands, 5 alpha helices, and 10 defined loop segments.
  • the third variable region referred to herein as the V3 loop is a loop of about 35 amino acids critical for the binding of the co-receptor and determination of which of the co-receptors will bind.
  • the V3 loop comprises residues 296-331.
  • the numbering used in gp120 polypeptides disclosed herein is relative to the HXB2 numbering scheme as set forth in Numbering Positions in HIV Relative to HXB2CG Bette Korber et al, Human Retroviruses and AIDS 1998: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Korber B, Kuiken CL, Foley B, Hahn B, McCutchan F, Mellors JW, and Sodroski J, Eds. Theoretical
  • Host cells Cells in which a vector can be propagated and its DNA expressed, for example a disclosed antibody can be expressed in a host cell.
  • the cell may be prokaryotic or eukaryotic.
  • the term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. However, such progeny are included when the term "host cell" is used.
  • Immunoadhesin A molecular fusion of a protein with the Fc region of an immunoglobulin, wherein the immunoglobulin retains specific properties, such as Fc receptor binding and increased half-life.
  • An Fc fusion combines the Fc region of an immunoglobulin with a fusion partner, which in general can be any protein, polypeptide, peptide, or small molecule.
  • immunoadhesin includes the hinge, CH2, and CH3 domains of the immunoglobulin gamma 1 heavy chain constant region.
  • the immunoadhesin includes the CH2 and CH3 domains of an IgG.
  • Immunologically reactive conditions Includes reference to conditions which allow an antibody raised against a particular epitope to bind to that epitope to a detectably greater degree than, and/or to the substantial exclusion of, binding to substantially all other epitopes. Immunologically reactive conditions are dependent upon the format of the antibody binding reaction and typically are those utilized in immunoassay protocols or those conditions encountered in vivo. See Harlow & Lane, supra, for a description of immunoassay formats and conditions. The immunologically reactive conditions employed in the methods are "physiological conditions" which include reference to conditions (e.g., temperature, osmolarity, pH) that are typical inside a living mammal or a mammalian cell.
  • the intra-organismal and intracellular environment normally lies around pH 7 (e.g., from pH 6.0 to pH 8.0, more typically pH 6.5 to 7.5), contains water as the predominant solvent, and exists at a temperature above 0°C and below 50°C. Osmolarity is within the range that is supportive of cell viability and proliferation.
  • IgA A polypeptide belonging to the class of antibodies that are substantially encoded by a recognized immunoglobulin alpha gene. In humans, this class or isotype comprises IgA1 and IgA2.
  • IgA antibodies can exist as monomers, polymers (referred to as pIgA) of predominantly dimeric form, and secretory IgA.
  • the constant chain of wild-type IgA contains an 18-amino-acid extension at its C-terminus called the tail piece (tp).
  • Polymeric IgA is secreted by plasma cells with a 15-kDa peptide called the J chain linking two monomers of IgA through the conserved cysteine residue in the tail piece.
  • IgG A polypeptide belonging to the class or isotype of antibodies that are substantially encoded by a recognized immunoglobulin gamma gene. In humans, this class comprises IgG1, IgG2, IgG3, and IgG4. In mice, this class comprises IgG1, IgG2a, IgG2b, and IgG3.
  • Inhibiting or treating a disease Inhibiting the full development of a disease or condition, for example, in a subject who is at risk for a disease such as acquired immunodeficiency syndrome (AIDS). "Treatment" refers to a therapeutic intervention that ameliorates a sign or symptom of a disease or pathological condition after it has begun to develop.
  • AIDS acquired immunodeficiency syndrome
  • ameliorating refers to any observable beneficial effect of the treatment.
  • the beneficial effect can be evidenced, for example, by a delayed onset of clinical symptoms of the disease in a susceptible subject, a reduction in severity of some or all clinical symptoms of the disease, a slower progression of the disease, a reduction in the viral load, an improvement in the overall health or well-being of the subject, or by other parameters well known in the art that are specific to the particular disease.
  • a "prophylactic" treatment is a treatment administered to a subject who does not exhibit signs of a disease or exhibits only early signs for the purpose of decreasing the risk of developing pathology.
  • Isolated An "isolated" biological component (such as a cell, for example a B-cell, a nucleic acid, peptide, protein, heavy chain domain or antibody)
  • Nucleic acids, peptides and proteins which have been "isolated” thus include nucleic acids and proteins purified by standard purification methods.
  • the term also embraces nucleic acids, peptides, and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.
  • an antibody such as an antibody specific for gp120 can be isolated, for example isolated from a subject infected with HIV.
  • In silico A process performed virtually within a computer.
  • a virtual compound can be screened for surface similarity or conversely surface complementarity to a virtual representation of the atomic positions of at least a portion of a gp120 polypeptide, or a gp120 polypeptide in complex with an antibody.
  • Kd The dissociation constant for a given interaction, such as a polypeptide ligand interaction or an antibody antigen interaction.
  • a polypeptide ligand interaction such as any of the antibodies disclosed herein
  • an antigen such as gp120
  • Label A detectable compound or composition that is conjugated directly or indirectly to another molecule, such as an antibody or a protein, to facilitate detection of that molecule.
  • molecule such as an antibody or a protein
  • labels include fluorescent tags, enzymatic linkages, and radioactive isotopes.
  • a disclosed antibody is labeled.
  • Neighborhood Joining A method of constructing phylogenetic trees that finds pairs of operational taxonomic units (OTUs, also called “neighbors”) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree.
  • OTUs operational taxonomic units
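One clustering stage of the neighbor-joining method can be sketched as follows. The text above says only that the chosen pair of OTUs minimizes total branch length; the Q criterion used here, Q(i, j) = (n - 2)·d(i, j) - Σ d(i, k) - Σ d(j, k), is the standard formulation of that selection and is supplied as an assumption. The function name `nj_pick_pair` and the distance matrix are illustrative.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair of OTUs with the lowest Q value.

    dist is a symmetric matrix (list of lists) of pairwise distances
    with zeros on the diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Illustrative distances among five OTUs (indices 0-4).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(distances))  # ((0, 1), -50)
```

A full implementation would join the selected pair into a new node, recompute distances to that node, and repeat until the tree is resolved.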
  • Nucleic acid A polymer composed of nucleotide units (ribonucleotides, deoxyribonucleotides, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof) linked via phosphodiester bonds, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof.
  • nucleotide polymers in which the nucleotides and the linkages between them include non-naturally occurring synthetic analogs, such as, for example and without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), and the like.
  • Such polynucleotides can be
  • oligonucleotide typically refers to short polynucleotides, generally no greater than about 50 nucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e., A, U, G, C) in which "U” replaces "T. "
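The convention that a DNA sequence also stands for the RNA sequence in which "U" replaces "T" amounts to a single substitution. A minimal sketch, with the helper name `dna_to_rna` chosen for illustration:

```python
def dna_to_rna(dna):
    """Return the RNA form of a DNA sequence by replacing every T with U."""
    return dna.upper().replace("T", "U")

print(dna_to_rna("ATGGCTTAA"))  # AUGGCUUAA
```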
  • nucleotide sequences the left-hand end of a single-stranded nucleotide sequence is the 5'-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5'-direction.
  • the direction of 5' to 3' addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction.
  • the DNA strand having the same sequence as an mRNA is referred to as the "coding strand;" sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5' to the 5'-end of the RNA transcript are referred to as "upstream
  • downstream sequences sequences on the DNA strand having the same sequence as the RNA and which are 3' to the 3' end of the coding RNA transcript are referred to as "downstream sequences.”
  • cDNA refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.
  • Encoding refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom.
  • a gene encodes a protein if transcription and translation of mRNA produced by that gene produces the protein in a cell or other biological system.
  • coding strand the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings
  • non-coding strand used as the template for transcription
  • a "nucleotide sequence encoding an amino acid sequence" includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. Nucleotide sequences that encode proteins and RNA may include introns.
  • Recombinant nucleic acid refers to a nucleic acid having nucleotide sequences that are not naturally joined together. This includes nucleic acid vectors comprising an amplified or assembled nucleic acid which can be used to transform a suitable host cell. A host cell that comprises the recombinant nucleic acid is referred to as a "recombinant host cell.” The gene is then expressed in the recombinant host cell to produce, e.g., a "recombinant polypeptide.”
  • a recombinant nucleic acid may serve a non-coding function (e.g., promoter, origin of replication, ribosome-binding site, etc.) as well.
  • a first sequence is an "antisense" with respect to a second sequence if a polynucleotide whose sequence is the first sequence specifically hybridizes with a polynucleotide whose sequence is the second sequence.
  • sequence relationships between two or more nucleotide sequences or amino acid sequences include “reference sequence,” “selected from,” “comparison window,” “identical,” “percentage of sequence identity,” “substantially identical,” “complementary,” and “substantially
  • sequence comparison For sequence comparison of nucleic acid sequences, typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters are used. Methods of alignment of sequences for comparison are well known in the art.
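Once a test sequence has been aligned to a reference sequence, percent sequence identity reduces to counting matching columns. The sketch below assumes the two sequences are already aligned (equal length, "-" for gaps) and uses one of several possible conventions: columns where both sequences have a gap are ignored, and every other column counts toward the denominator. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(ref_aln, test_aln):
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for r, t in zip(ref_aln, test_aln):
        if r == "-" and t == "-":
            continue  # column carries no information for this pair
        compared += 1
        if r == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# 7 of 9 columns match: one gap column and one G/C mismatch.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 1))  # 77.8
```

Alignment programs differ in how they treat gaps and terminal overhangs, so reported identities are comparable only when the same convention is used.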
  • Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci.
  • PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351-360, 1987. The method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153, 1989.
  • a reference sequence is compared to other test sequences to determine the percent sequence identity relationship using the following parameters: default gap weight (3.00), default gap length weight
  • PILEUP can be obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereaux et al., Nuc. Acids Res.
  • An oligonucleotide is a linear polynucleotide sequence of up to about 100 nucleotide bases in length.
  • ClustalW is a program that aligns three or more sequences in a
  • this program can classify sequences for phylogenetic analysis, which aims to model the substitutions that have occurred over evolution and derive the evolutionary relationships between sequences.
  • ClustalW multiple sequence alignment web form is available on the internet from EMBL-EBI (ebi.ac.uk/Tools/msa/clustalw2/), see also Larkin et al., Bioinformatics 2007 23(21): 2947-2948.
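The pairwise alignment methods cited above can be illustrated with a minimal sketch of global alignment by dynamic programming, in the style of Needleman & Wunsch. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration; they are not the parameters of PILEUP, ClustalW, or any program named in this disclosure.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not program defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

With these assumed scores, aligning `ACGT` against `AGT` places a single gap opposite the `C`. Production tools use substitution matrices and affine gap penalties rather than the flat scores shown here.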
  • a polynucleotide or nucleic acid sequence refers to a polymeric form of nucleotide at least 10 bases in length.
  • a recombinant polynucleotide includes a polynucleotide that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived.
  • the term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences.
  • the nucleotides can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide.
  • the term includes single- and double-stranded forms of DNA.
  • compositions of use are conventional. Remington's Pharmaceutical Sciences, by E. W. Martin, Mack Publishing Co., Easton, PA, 19th Edition, 1995, describes
  • compositions and formulations suitable for pharmaceutical delivery of the antibodies herein disclosed are provided.
  • parenteral formulations usually comprise injectable fluids that include pharmaceutically and physiologically acceptable fluids such as water, physiological saline, balanced salt solutions, aqueous dextrose, glycerol or the like as a vehicle.
  • solid compositions (e.g., powder, pill, tablet, or capsule forms)
  • conventional non-toxic solid carriers can include, for example, pharmaceutical grades of mannitol, lactose, starch, or magnesium stearate.
  • compositions to be administered can contain minor amounts of non-toxic auxiliary substances, such as wetting or emulsifying agents, preservatives, and pH buffering agents and the like, for example sodium acetate or sorbitan monolaurate.
  • Pharmaceutical agent: A chemical compound or composition capable of inducing a desired therapeutic or prophylactic effect when properly administered to a subject or a cell.
  • a pharmaceutical agent includes one or more of the disclosed antibodies.
  • Phylogenetic analysis: The assembly of a phylogenetic tree representing the evolutionary ancestry of a set of genes, such as genes encoding an antibody, or other taxa, using nucleotide sequences as the basis for classification.
  • Phylogenetic tree: A branching diagram or "tree" showing the inferred evolutionary relationships among nucleic acid or amino acid sequences based upon similarities and differences in their sequence.
  • the "taxa" or "leaves" joined together in the tree are implied to have descended from a common ancestor, such as an inferred common ancestor.
  • Each node can be referred to as a taxonomic unit.
  • Internal nodes are generally referred to as hypothetical evolutionary intermediates, as they cannot be directly observed.
  • Phylogenetic trees are constructed using computational phylogenetic methods and tools, such as distance-matrix methods, for example, neighbor-joining, maximum likelihood or UPGMA, which calculate genetic distance from multiple sequence alignments.
  • Many sequence alignment methods, such as ClustalW, also create trees by using the simpler algorithms (i.e., those based on distance) of tree construction.
  • More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation.
  • a rooted phylogenetic tree is a directed tree with a unique node
  • a root is the common ancestor of all of the sequences in the phylogenetic tree.
  • a phylogenetic tree is constructed as part of a cross-donor phylogenetic analysis, wherein nucleic acid sequences are leaves and the root of the phylogenetic tree.
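The distance-based tree construction mentioned above can be sketched with UPGMA, which repeatedly merges the two closest clusters and yields a rooted tree. This is a minimal sketch under stated assumptions: the function name, the nested-tuple tree representation, and the toy distance matrix are hypothetical, and branch lengths are omitted for brevity.

```python
def upgma(names, dist):
    """Build a rooted tree (nested tuples) from a symmetric distance matrix.

    names: leaf labels; dist[i][j]: pairwise distance between leaves i and j.
    """
    n = len(names)
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)          # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # UPGMA update: size-weighted average of the two old distances.
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Toy distances (hypothetical): A and B are close, C is distant from both.
toy_tree = upgma(["A", "B", "C"], [[0, 2, 6],
                                   [2, 0, 6],
                                   [6, 6, 0]])
```

The outermost tuple corresponds to the root, i.e., the inferred common ancestor of all leaves. UPGMA assumes a constant rate of change along all branches; neighbor-joining and maximum-likelihood methods relax that assumption.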
  • Polypeptide: Any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation).
  • the polypeptide is gp120 polypeptide.
  • the polypeptide is a disclosed antibody or a fragment thereof.
  • a "residue” refers to an amino acid or amino acid mimetic incorporated in a polypeptide by an amide bond or amide bond mimetic.
  • a polypeptide has an amino terminal (N-terminal) end and a carboxy terminal end.
  • purified does not require absolute purity; rather, it is intended as a relative term.
  • a purified peptide preparation is one in which the peptide or protein (such as an antibody) is more enriched than the peptide or protein is in its natural environment within a cell.
  • a preparation is purified such that the protein or peptide represents at least 50% of the total peptide or protein content of the preparation.
  • a recombinant nucleic acid is one that has a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
  • Sequence identity: The similarity between amino acid sequences is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are. Homologs or variants of a polypeptide will possess a relatively high degree of sequence identity when aligned using standard methods.
  • NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the internet (along with a description of how to determine sequence identity using this program).
  • Homologs and variants of a VL or a VH of an antibody that specifically binds a polypeptide are typically characterized by possession of at least about 75%, for example at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity counted over the full length alignment with the amino acid sequence of interest. Proteins with even greater similarity to the reference sequences will show increasing percentage identities when assessed by this method, such as at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity.
  • homologs and variants will typically possess at least 80% sequence identity over short windows of 10-20 amino acids, and may possess sequence identities of at least 85% or at least 90% or 95% depending on their similarity to the reference sequence.
  • sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided.
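The percentage identity figures above can be made concrete with a short sketch that counts identical columns over a full-length pairwise alignment and over short sliding windows. The function names and the convention used here (gap columns count toward the alignment length but never as matches) are assumptions for illustration; BLAST and other programs may normalize differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues.

    Inputs are two rows of one alignment, with '-' marking gaps.
    Columns that are gaps in both rows are ignored (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no residue columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def window_identities(aligned_a, aligned_b, size):
    """Percent identity in every window of `size` columns, left to right."""
    return [percent_identity(aligned_a[s:s + size], aligned_b[s:s + size])
            for s in range(len(aligned_a) - size + 1)]


full = percent_identity("ACGT", "A-GT")
```

A candidate could then be compared against a chosen threshold, for example `percent_identity(a, b) >= 75.0`, with the threshold taken from whichever range is of interest.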
  • bind When referring to an antibody, refers to a binding reaction which determines the presence of a target protein, peptide, or
  • an antibody binds preferentially to a particular target protein, peptide or polysaccharide (such as an antigen present on the surface of a pathogen, for example gp120) and does not bind in a significant amount to other proteins or polysaccharides present in the sample or subject.
  • Specific binding can be determined by methods known in the art. With reference to an antibody-antigen complex, specific binding of the antigen and antibody has a Kd of less than
  • Therapeutic agent: Used in a generic sense, it includes treating agents, prophylactic agents, and replacement agents.
  • Therapeutically effective amount or effective amount: A quantity of a specific substance, such as a disclosed antibody, sufficient to achieve a desired effect in a subject being treated. For instance, this can be the amount necessary to inhibit HIV replication or treat AIDS. In several embodiments, a therapeutically effective amount is the amount necessary to reduce a sign or symptom of AIDS, and/or to decrease viral titer in a subject. When administered to a subject, a dosage will generally be used that will achieve target tissue concentrations that have been shown to achieve a desired in vitro effect.
  • T Cell: A white blood cell critical to the immune response.
  • T cells include, but are not limited to, CD4 + T cells and CD8 + T cells.
  • a CD4 + T lymphocyte is an immune cell that carries a marker on its surface known as "cluster of differentiation
  • CD4 helper T cells
  • helper T cells help orchestrate the immune response, including antibody responses as well as killer T cell responses.
  • CD8 + T cells carry the "cluster of differentiation 8" (CD8) marker.
  • a CD8 T cell is a cytotoxic T lymphocyte.
  • a CD8 cell is a suppressor T cell.
  • a nucleic acid molecule as introduced into a host cell, thereby producing a transformed host cell.
  • a vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication.
  • a vector may also include one or more selectable marker genes and other genetic elements known in the art.
  • Virus: Microscopic infectious organism that reproduces inside living cells.
  • a virus consists essentially of a core of a single nucleic acid surrounded by a protein coat, and has the ability to replicate only inside a living cell.
  • "Viral replication" is the production of additional virus by the occurrence of at least one viral life cycle.
  • a virus may subvert the host cells' normal functions, causing the cell to behave in a manner determined by the virus. For example, a viral infection may result in a cell producing a cytokine, or responding to a cytokine, when the uninfected cell does not normally do so.
  • RNA viruses wherein the viral genome is RNA.
  • the genomic RNA is reverse transcribed into a DNA intermediate which is integrated very efficiently into the chromosomal DNA of infected cells.
  • the integrated DNA intermediate is referred to as a provirus.
  • the term "lentivirus" is used in its conventional sense to describe a genus of viruses containing reverse transcriptase.
  • the lentiviruses include the "immunodeficiency viruses" which include human immunodeficiency virus (HIV) type 1 and type 2 (HIV-I and HIV-II), simian immunodeficiency virus (SIV), and feline immunodeficiency virus (FIV).
  • HIV-I is a retrovirus that causes immunosuppression in humans (HIV disease), and leads to a disease complex known as the acquired immunodeficiency syndrome (AIDS).
  • HIV disease refers to a well-recognized constellation of signs and symptoms (including the development of opportunistic infections) in persons who are infected by an HIV virus, as determined by antibody or western blot studies.
  • VRC01-like antibody: VRC-01 antibodies, and methods for identifying and producing these antibodies, are disclosed herein. Generally, these antibodies bind to the CD4 binding surface of gp120 in substantially the same orientation as VRC01, and are broadly neutralizing. VRC01-like antibodies mimic the binding of CD4 to gp120, with several of the important contacts between CD4 and gp120 mimicked by the VRC01-like antibodies (see below).
  • the heavy and/or light chains of a VRC01-like antibody can be cross-complemented by the heavy and/or light chains of a known VRC01-like antibody, such as VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03, and maintain high binding affinity for gp120.
  • a class of isolated human monoclonal antibodies ("VRC01-like antibodies") that specifically bind gp120 and are broadly neutralizing are disclosed herein. Also disclosed herein are compositions including these human monoclonal antibodies and a pharmaceutically acceptable carrier. Nucleic acids encoding these antibodies, expression vectors comprising these nucleic acids, and isolated host cells that express the nucleic acids are also provided.
  • compositions comprising the human monoclonal antibodies specific for gp120 can be used for research, diagnostic and therapeutic purposes.
  • the human monoclonal antibodies disclosed herein can be used to diagnose or treat a subject having an HIV-1 infection and/or AIDS.
  • the antibodies can be used to determine HIV-1 titer in a subject.
  • the antibodies disclosed herein also can be used to study the biology of the human immunodeficiency virus.
  • VRC01-like antibodies bind to the particular CD4 binding site on gp120 in a specific orientation that mimics the binding of CD4 to gp120.
  • VRC01-like antibodies can be described by this novel mode of binding.
  • the crystal structures of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 antibodies in complex with gp120 provide insight into a novel binding mode for antibodies and gp120. Such a novel binding mode establishes a new class of antibody recognition for gp120.
  • the antibody specifically binds to an epitope on the surface of gp120 that includes residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120 or a subset or combination thereof (see the numbering of gp120 according to the HXBC2 convention).
  • VRC01-like antibodies bind to the epitope defined by residues N276.T278NNAKT283..S365GGD368..I371...T455 DGG459.N461.. 469..G472GN474 in gp120.
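The epitope residue list above (276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474) can be expanded into an explicit set of positions for checking which contacts fall inside it. This is a minimal sketch; the function names and the example contact list are hypothetical, and all positions are assumed to use the same gp120 numbering as the list.

```python
def expand_residues(spec):
    """Expand a spec such as '276, 278-283' into a set of integer positions."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            positions.update(range(int(low), int(high) + 1))  # inclusive range
        else:
            positions.add(int(part))
    return positions


# Residue list as recited in the text.
EPITOPE = expand_residues("276, 278-283, 365-368, 371, 455-459, 461, 469, 472-474")


def contacts_in_epitope(contact_positions):
    """Return the sorted contact positions that fall within the epitope."""
    return sorted(set(contact_positions) & EPITOPE)


# Hypothetical contact positions for illustration only.
shared = contacts_in_epitope([277, 279, 368, 400, 473])
```

Because the text allows "a subset or combination" of these residues, a set intersection is a natural way to express partial overlap.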
  • a VRC01-like antibody has a relative angle and orientation of binding of gp120 as shown in the crystal structure of the complex of the VRC03 antibody and gp120 (see FIG. 2d of Zhou et al., "Structural Basis for Broad and Potent Neutralization of HIV-1 by Antibody VRC01," Science 329, 811-817 (2010), which is incorporated herein by reference in its entirety).
  • the VRC01-like antibodies partially mimic the binding of the CD4 receptor, with an about 6 Å shift and an about 43 degree rotation from the CD4-defined position (see FIG. 2d of Zhou et al., "Structural Basis for Broad and Potent
  • a VRC01-like antibody is an antibody with heavy and light chain in an orientation of heavy chain relative to gp120 that differs by less than about 10 degrees, such as less than about 9 degrees, less than about 8 degrees, less than about 7 degrees, less than about 6 degrees, less than about 5 degrees, or less than about 4 degrees, such as about 10-8 degrees, about 10-7 degrees, about 9-6 degrees, or about 9-5 degrees, and/or less than about a 5 Å translation from the binding angle of VRC01 and/or VRC03 to gp120, such as less than about a 4 Å translation, less than about a 3 Å translation, or less than about a 2 Å translation, for example a 2-3 Å translation, a 5-3 Å translation, or a 4-2 Å translation.
  • a binding characteristic can readily be determined from the crystal structure of the VRC03 or VRC01 antibody complex.
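The angular and translational criteria described above can be sketched as a comparison of two rigid-body poses: the rotation angle is recovered from the trace of the relative rotation matrix, and the translation is the distance between two reference points. All names here are hypothetical, the poses are assumed to be already expressed in a common gp120-superimposed frame, and requiring both limits at once is an assumption (the text recites "and/or").

```python
import math


def rotation_about_z(degrees):
    """Build a 3x3 rotation matrix about the z axis (helper for the demo)."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def rotation_angle_deg(r_query, r_ref):
    """Angle in degrees of the rotation taking the reference pose to the query."""
    # Relative rotation R = R_query * R_ref^T; angle = acos((trace(R) - 1) / 2).
    rel = [[sum(r_query[i][k] * r_ref[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))


def translation_angstrom(center_query, center_ref):
    """Distance between two reference points, in the units of the coordinates."""
    return math.dist(center_query, center_ref)


def similar_pose(r_query, r_ref, center_query, center_ref,
                 max_angle=10.0, max_shift=5.0):
    """True if both the rotation and the translation are under the limits."""
    return (rotation_angle_deg(r_query, r_ref) < max_angle
            and translation_angstrom(center_query, center_ref) < max_shift)


identity = rotation_about_z(0.0)
angle = rotation_angle_deg(rotation_about_z(8.0), identity)
```

In practice the rotation matrices and centers would come from superimposing crystal-structure coordinates, which is outside the scope of this sketch.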
  • the CDR H2 region (the C" strand in particular) forms hydrogen-bonds to the b-15 loop of gp120.
  • Asp 368 of gp120 forms a salt-bridge with Arg 71 of the heavy chain. All VRC01-like antibodies need to mimic CD4 with similar heavy chain orientations.
  • a VRC01-like antibody is one that is identified by the methods set forth in Section D.
  • a VRC01-like antibody includes CDRs of the heavy chain variable domain sequences identified by the methods set forth in Section D and a light chain, such as a light chain from a known VRC01-like antibody, for example VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
  • a VRC01-like antibody includes an amino acid sequence of a heavy chain variable domain sequence identified by the methods set forth in Section D and a light chain, such as a light chain from a known VRC01-like antibody, for example VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
  • a VRC01-like antibody does not include a heavy chain with an amino acid sequence of an established VRC01-like antibody.
  • the heavy chain variable domain sequence identified by the methods set forth in Section D is not a heavy chain variable domain sequence from an established VRC01-like antibody.
  • a VRC01-like antibody does not include a light chain with an amino acid sequence of an established VRC01-like antibody.
  • the light chain variable domain sequence identified by the methods set forth in Section D is not a light chain variable domain sequence from an established VRC01-like antibody.
  • a nucleic acid encoding a VRC01-like antibody heavy chain variable domain is derived from the IGHV1-2 germline allelic origin, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline allelic origin.
  • a nucleic acid encoding a VRC01-like antibody light chain variable domain is derived from an IGKV3 allelic origin.
  • a nucleic acid sequence encoding a VRC01-like antibody heavy or light chain variable domain derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or IGKV3 germline origin is about 10%, 15%, 20%, 25%, 30%, 35% or 40% divergent, such as about 15% to 40% divergent, such as 25% divergent, from the heavy or light germline sequence of interest, such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence, or an IGKV3 germline sequence, respectively.
  • a nucleic acid sequence encoding a VRC01-like antibody heavy or light chain variable domain is derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or IGKV3 germline, respectively, and is about 55%, 60%, 65%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical, such as 60% to 99% identical, such as 85% identical, to a heavy (or light) chain variable domain of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody heavy (or light) chain variable domain.
  • a VRC01-like antibody heavy or light chain variable domain derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or IGKV3 germline has a nucleic acid sequence that is about 55%, 60%, 65%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical, such as 60% to 99% identical, such as 85% identical, to a heavy (or light) chain variable domain of an antibody that is known to be broadly neutralizing to HIV, such as a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody heavy (or light) chain variable domain and has a nucleic acid sequence that is about 10%, 15%, 20%
  • the heavy chain of a VRC01-like antibody can be complemented by the light chain of VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 antibody and still retain binding for gp120, for example retain specific binding for residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120.
  • the VRC01 light chain and VRC03 heavy chain form active antibodies able to specifically bind HIV-1.
  • the VRC01 heavy chain and VRC03 light chain form active antibodies that recognize HIV-1.
  • VRC01-like antibodies that can be identified by complementation of the heavy or light chains of VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
  • the heavy chain amino acid sequences of one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623 and/or encoded by one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710 are demonstrated to be VRC01-like antibodies that specifically bind gp120.
  • the VRC01-like antibody includes one of the heavy chain amino acid sequences set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623 and/or encoded by one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710, and a light chain from VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
  • binding to the antigen of interest such as, but not limited to, gp120
  • an epitope of interest such as, but not limited to, RSC3
  • variable domain of interest is a heavy chain variable domain
  • amino acid sequence of this heavy chain variable domain is produced.
  • the heavy chain variable domain is then paired with a reference sequence light chain variable domain, such as VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134, light chain variable domain, and it is determined if the antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹ or 10⁻¹⁰.
  • variable domain of interest is a light chain variable domain
  • this amino acid sequence is produced.
  • the light chain variable domain is then paired with a reference sequence heavy chain variable domain, such as VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain variable domain, and it is determined if the antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹ or 10⁻¹⁰.
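The pairing-and-screening step described above can be sketched as a filter over measured dissociation constants for candidate/reference chain pairings. This is a minimal sketch: the function name, the candidate labels, and every KD value are hypothetical placeholders, and molar units are an assumption because the text does not state units. A lower KD means tighter binding, so a pairing passes when its KD is at or below the threshold.

```python
def screen_pairings(measured_kd, threshold=1e-8):
    """Return the pairings whose KD is at or below the threshold.

    measured_kd maps (candidate_chain, reference_partner) to a KD value,
    assumed here to be in molar units.
    """
    return sorted(pair for pair, kd in measured_kd.items() if kd <= threshold)


# Hypothetical measurements for illustration only; not data from the disclosure.
example_kd = {
    ("candidate_heavy_1", "VRC01 light"): 3e-9,
    ("candidate_heavy_1", "VRC03 light"): 5e-7,
    ("candidate_heavy_2", "VRC01 light"): 8e-11,
}

passing_at_1e8 = screen_pairings(example_kd, threshold=1e-8)
passing_at_1e10 = screen_pairings(example_kd, threshold=1e-10)
```

Tightening the threshold from 10⁻⁸ to 10⁻¹⁰ narrows the set of pairings that qualify, which mirrors the alternative affinities recited in the text.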
  • a VRC01-like antibody includes one, two or all three
  • a VRC01-like antibody includes the heavy chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710, or the amino acid sequence set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623 and a light chain.
  • the light chain can be the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 light chain.
  • the isolated human monoclonal antibody specifically binds gp120 and the light chain of the antibody includes CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03).
  • the light chain of the antibody includes SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03).
  • a VRC01-like antibody includes the CDRs from a light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 and 4 and a heavy chain, such as a heavy chain from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody.
  • a VRC01-like antibody includes the light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 and 4 and a heavy chain, such as a heavy chain from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody.
  • the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03).
  • a VRC01-like antibody includes a light chain amino acid sequence comprising an L-CDR3 including three amino acids that are, in order, a hydrophobic, a negatively charged, and a hydrophobic amino acid.
  • the L-CDR3 includes tyrosine, glutamic acid, and phenylalanine.
  • the L-CDR3 includes CQQYEFFG.
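The L-CDR3 pattern described above (a hydrophobic residue, then a negatively charged residue, then a hydrophobic residue) can be expressed as a regular expression over one-letter amino acid codes. This is a minimal sketch: the residue groupings are an assumption, chosen so that tyrosine counts as hydrophobic in line with the tyrosine, glutamic acid, phenylalanine example, and the function name is hypothetical.

```python
import re

# Assumed residue classes (one-letter codes); classifications vary by scale.
HYDROPHOBIC = "AVLIMFWY"
NEGATIVE = "DE"

# hydrophobic, then negatively charged, then hydrophobic
MOTIF = re.compile(f"[{HYDROPHOBIC}][{NEGATIVE}][{HYDROPHOBIC}]")


def find_motif(l_cdr3):
    """Return the first three-residue match in the sequence, or None."""
    match = MOTIF.search(l_cdr3)
    return match.group(0) if match else None


hit = find_motif("CQQYEFFG")
```

Applied to the CQQYEFFG sequence recited above, the search finds the Tyr-Glu-Phe triplet.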
  • the heavy chain variable domain of the VRC01-like antibody can include the CDRs from any heavy chain variable domain identified using the methods disclosed herein.
  • the VRC01-like antibody includes a heavy chain variable domain comprising the heavy chain CDRs encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, and 1707-1710, or the CDRs from the amino acid sequence set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623.
  • the VRC01-like antibody includes a heavy chain with CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03).
  • the antibodies specifically bind gp120 and/or neutralize HIV.
  • the heavy chain variable domain sequence identified by the methods set forth in Section D is not a heavy chain variable domain sequence from an established VRC01-like antibody. In some embodiments, a VRC01-like antibody does not include a heavy chain domain from an established VRC01-like antibody. In some embodiments, a VRC01-like antibody is not an established VRC01-like antibody.
  • Fully human monoclonal antibodies include human framework regions.
  • any of the antibodies that specifically bind gp120 herein can include the human framework region.
  • framework sequences that can be used include the amino acid framework sequences of the heavy and light chains disclosed in PCT Publication No. WO 2006/074071 (see, for example, SEQ ID NOs: 1-16), which is herein incorporated by reference.
  • the monoclonal antibody can be of any isotype.
  • the monoclonal antibody can be, for example, an IgM or an IgG antibody, such as IgG1 or an IgG2.
  • the class of an antibody that specifically binds gp120 can be switched with another.
  • a nucleic acid molecule encoding VL or VH is isolated using methods well-known in the art, such that it does not include any nucleic acid sequences encoding the constant region of the light or heavy chain, respectively.
  • the nucleic acid molecule encoding VL or VH is then operatively linked to a nucleic acid sequence encoding a CL or CH from a different class of immunoglobulin molecule.
  • This can be achieved using a vector or nucleic acid molecule that comprises a CL or CH chain, as known in the art.
  • an antibody that specifically binds gp120, that was originally IgM, may be class switched to an IgG. Class switching can be used to convert one IgG subclass to another, such as from IgG1 to IgG2.
  • the disclosed antibodies are multimers of antibodies, such as dimers, trimers, tetramers, pentamers, hexamers, septamers, octamers and so on. In some examples, the antibodies are pentamers.
  • the CDRs of the light chain are bounded by the residues at positions 24 and 34 (L-CDR1), 50 and 56 (L-CDR2), 89 and 97 (L-CDR3); the CDRs of the heavy chain are bounded by the residues at positions 31 and 35b (H-CDR1), 50 and 65 (H-CDR2), 95 and 102 (H-CDR3), using the numbering convention delineated by Kabat et al., (1991) Sequences of Proteins of
  • CDR numbering schemes such as the Kabat, Chothia or IMGT numbering schemes
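The Kabat boundaries listed above can be applied to a chain that has already been assigned Kabat position labels. This is a minimal sketch under stated assumptions: the input is a mapping from position label (such as "35b") to residue, produced by some external numbering tool; the function names and the example residues are hypothetical. Insertion letters sort after the bare position, so 35 < 35a < 35b < 36.

```python
import re

# CDR boundaries (inclusive) as recited in the text, Kabat numbering.
KABAT_CDRS = {
    "L-CDR1": ("24", "34"), "L-CDR2": ("50", "56"), "L-CDR3": ("89", "97"),
    "H-CDR1": ("31", "35b"), "H-CDR2": ("50", "65"), "H-CDR3": ("95", "102"),
}


def kabat_key(label):
    """Turn a label like '35b' into a sortable (number, insertion letter) pair."""
    match = re.fullmatch(r"(\d+)([A-Za-z]?)", label)
    if not match:
        raise ValueError(f"not a Kabat position label: {label!r}")
    return int(match.group(1)), match.group(2).lower()


def extract_cdr(numbered_chain, cdr_name):
    """Return the residues of one CDR from a {position label: residue} mapping."""
    start, end = (kabat_key(bound) for bound in KABAT_CDRS[cdr_name])
    picked = sorted((kabat_key(pos), residue)
                    for pos, residue in numbered_chain.items()
                    if start <= kabat_key(pos) <= end)
    return "".join(residue for _, residue in picked)


# Hypothetical numbered fragment of a heavy chain, for illustration only.
fragment = {"30": "T", "31": "G", "32": "Y", "33": "Y", "34": "M",
            "35": "H", "35a": "W", "35b": "V", "36": "W"}
h_cdr1 = extract_cdr(fragment, "H-CDR1")
```

Other schemes such as Chothia or IMGT would use different boundary tables; only the `KABAT_CDRS` mapping would need to change.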
  • Antibody fragments are encompassed by the present disclosure, such as Fab, F(ab')2, and Fv, which include a heavy chain and light chain variable region and specifically bind an antigen, such as gp120.
  • Fab the fragment which contains a monovalent antigen-binding fragment of an antibody molecule
  • Fab' the fragment of an antibody molecule can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain
  • two Fab' fragments are obtained per antibody molecule
  • Fv a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains
  • Single chain antibody (such as scFv), defined as a genetically engineered molecule containing the variable region of the light chain, the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule.
  • a dimer of a single chain antibody (scFv2), defined as a dimer of a scFv. This has also been termed a "miniantibody."
  • variable region included in the antibody is the variable region of m912.
  • the antibodies are Fv antibodies, which are typically about 25 kDa and contain a complete antigen-binding site with three CDRs per each heavy chain and each light chain.
  • the VH and the VL can be expressed from two individual nucleic acid constructs in a host cell. If the VH and the VL are expressed non-contiguously, the chains of the Fv antibody are typically held together by noncovalent interactions. However, these chains tend to dissociate upon dilution, so methods have been developed to crosslink the chains through glutaraldehyde, intermolecular disulfides, or a peptide linker.
  • the Fv can be a disulfide stabilized Fv (dsFv), wherein the heavy chain variable region and the light chain variable region are chemically linked by disulfide bonds.
  • the Fv fragments comprise VH and VL chains connected by a peptide linker.
  • scFv single-chain antigen binding proteins
  • Antibody fragments can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli of DNA encoding the fragment.
  • Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods.
  • antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2.
  • This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments.
  • cleaving antibodies such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody.
  • a VRC01-like antibody can be produced.
  • Such conservative variants, employed in antibody fragments such as dsFv fragments or scFv fragments, will retain critical amino acid residues necessary for correct folding and stabilizing between the VH and the VL regions, and will retain the charge characteristics of the residues in order to preserve the low pI and low toxicity of the molecules.
  • additional recombinant neutralizing antibodies that specifically bind the same epitope of gp120 bound by the antibodies disclosed herein that specifically bind gp120, such as affinity-matured forms, can be isolated by screening a recombinant combinatorial antibody library, such as a Fab phage display library (see, for example, U.S. Patent Application Publication No. 2005/0123900).
  • the VL and VH segments can be randomly mutated, such as within the H-CDR3 region or the L-CDR3 region, in a process analogous to the in vivo somatic mutation process responsible for affinity maturation of antibodies during a natural immune response.
  • This in vitro affinity maturation can be accomplished by amplifying V H and V L regions using PCR primers complementary to the H-CDR3 or L-CDR3,
  • the primers have been "spiked" with a random mixture of the four nucleotide bases at certain positions such that the resultant PCR products encode VH and VL segments into which random mutations have been introduced into the VH and/or VL CDR3 regions. These randomly mutated VH and VL segments can be tested to determine the binding affinity for gp120.
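The spiking step can be illustrated with a minimal sketch. The template sequence, the spiked positions, and the function name `spike_primer` are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
import random

BASES = "ACGT"

def spike_primer(template, positions, rng):
    """Return a copy of template with each listed position replaced by a random base."""
    seq = list(template)
    for pos in positions:
        seq[pos] = rng.choice(BASES)
    return "".join(seq)

# Hypothetical 18-nt primer region; positions 6-11 are the spiked sites.
template = "GCGAGAGATTACTGGGGC"
spiked_positions = range(6, 12)
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A small library of randomized primers; flanking positions stay constant.
library = {spike_primer(template, spiked_positions, rng) for _ in range(200)}
```

Each member of `library` differs from the template only at the spiked positions, which is what confines the random mutations to the chosen CDR3 stretch.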
  • nucleic acid encoding the selected antibody can be recovered from the display package (for example, from the phage genome) and subcloned into other expression vectors by standard recombinant DNA techniques, as described herein. If desired, the nucleic acid can be further manipulated to create other antibody fragments, also as described herein.
  • the DNA encoding the antibody is cloned into a recombinant expression vector and introduced into a mammalian host cells, as described herein.
  • the antibodies or antibody fragments disclosed herein can be derivatized or linked to another molecule (such as another peptide or protein).
  • the antibody or portion thereof is derivatized such that the binding to gp120 is not affected adversely by the derivatization or labeling.
  • the antibody can be functionally linked (by chemical coupling, genetic fusion, noncovalent association or otherwise) to one or more other molecular entities, such as another antibody (for example, a bispecific antibody or a diabody), a detection agent, a pharmaceutical agent, and/or a protein or peptide that can mediate association of the antibody or antibody portion with another molecule (such as a streptavidin core region or a polyhistidine tag).
  • One type of derivatized antibody is produced by cross-linking two or more antibodies (of the same type or of different types, such as to create bispecific antibodies).
  • Suitable crosslinkers include those that are heterobifunctional, having two distinctly reactive groups separated by an appropriate spacer (such as m-maleimidobenzoyl-N-hydroxysuccinimide ester) or homobifunctional (such as disuccinimidyl suberate).
  • Such linkers are available from Pierce Chemical Company (Rockford, IL).
  • An antibody that specifically binds gp120 can be labeled with a detectable moiety.
  • Useful detection agents include fluorescent compounds, including fluorescein, fluorescein isothiocyanate, rhodamine, 5-dimethylamine-1-naphthalenesulfonyl chloride, phycoerythrin, lanthanide phosphors and the like.
  • Bioluminescent markers are also of use, such as luciferase, green fluorescent protein, and yellow fluorescent protein.
  • An antibody can also be labeled with enzymes that are useful for detection, such as horseradish peroxidase, β-galactosidase, luciferase, alkaline phosphatase, glucose oxidase and the like.
  • When an antibody is labeled with a detectable enzyme, it can be detected by adding additional reagents that the enzyme uses to produce a reaction product that can be discerned. For example, when the agent horseradish peroxidase is present, the addition of hydrogen peroxide and diaminobenzidine leads to a colored reaction product, which is visually detectable.
  • An antibody may also be labeled with biotin, and detected through indirect measurement of avidin or streptavidin binding. It should be noted that the avidin itself can be labeled with an enzyme or a fluorescent label.
  • An antibody may be labeled with a magnetic agent, such as gadolinium.
  • Antibodies can also be labeled with lanthanides (such as europium and dysprosium), and manganese.
  • Paramagnetic particles such as superparamagnetic iron oxide are also of use as labels.
  • An antibody may also be labeled with predetermined polypeptide epitopes recognized by a secondary reporter (such as leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags).
  • labels are attached by spacer arms of various lengths to reduce potential steric hindrance.
  • An antibody can also be labeled with a radiolabeled amino acid.
  • the radiolabel may be used for both diagnostic and therapeutic purposes.
  • Examples of labels for polypeptides include, but are not limited to, the following radioisotopes or radionuclides: ³H, ¹⁴C, ¹⁵N, ³⁵S, ⁹⁰Y, ⁹⁹Tc, ¹¹¹In, ¹²⁵I, ¹³¹I.
  • An antibody can also be derivatized with a chemical group such as polyethylene glycol (PEG), a methyl or ethyl group, or a carbohydrate group. These groups may be useful to improve the biological characteristics of the antibody, such as to increase serum half-life or to increase tissue binding.
  • radiolabels may be detected using photographic film or scintillation counters
  • fluorescent markers may be detected using a photodetector to detect emitted illumination
  • Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label.
  • the present disclosure also relates to the crystals obtained from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120, the crystal structures of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120, the three-dimensional coordinates of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120 and three-dimensional structures of models of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32
  • the three-dimensional coordinates of VRC01 in complex with gp120 are available at the Protein Data Bank, at accession number 3NGB.
  • a set of structure coordinates for the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120 or a portion thereof is a relative set of points that define a shape in three dimensions.
  • an entirely different set of coordinates could define a similar or identical shape.
  • slight variations in the individual coordinates will have little effect on overall shape.
  • the variations in coordinates discussed above may be generated because of mathematical manipulations of the structure coordinates.
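The point that a different set of coordinates can define an identical shape can be checked numerically. The sketch below uses made-up points (not coordinates from any deposited structure) and hypothetical helper names; it applies a rotation and translation and confirms that all pairwise distances, which define the shape, are unchanged.

```python
import math

def rigid_transform(points, angle, shift):
    """Rotate points about the z axis by angle (radians), then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]

def pairwise_distances(points):
    """All inter-point distances, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Made-up coordinates standing in for a few atoms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, 1.1)]
moved = rigid_transform(coords, math.radians(40), (10.0, -3.0, 5.0))

# The coordinate values differ, but the shape they describe is the same.
same_shape = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)
```

Here `same_shape` is true even though every coordinate in `moved` differs from `coords`, because only the relative positions of the points matter.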
  • This disclosure further provides systems, such as computer systems, intended to generate structures and/or perform rational drug or compound design for an antigenic compound capable of eliciting an immune response in a subject.
  • the system can contain one or more or all of: atomic co-ordinate data according to VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody complex or a subset thereof, and the figures derived therefrom by homology modeling, the data defining the three-dimensional structure of a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody complex or at least one sub-domain thereof, or structure factor data for gp120, the structure factor data being derivable from the atomic co-ordinate data of VRC-PG04, V
  • Nucleic acid molecules (also referred to as polynucleotides) encoding the antibody heavy and light chains provided herein (including, but not limited to, VRC01-like antibodies disclosed herein) can readily be produced by one of skill in the art.
  • these nucleic acids can be produced using the amino acid sequences provided herein (such as the CDR sequences, heavy chain and light chain sequences), sequences available in the art (such as framework sequences), and the genetic code.
  • the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2, and CDR3 encoded by any one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710.
  • the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2, and CDR3 of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623.
  • nucleic acids which differ in sequence but which encode the same antibody sequence, or encode a conjugate or fusion protein including the V L and/or V H nucleic acid sequence.
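Because the genetic code is degenerate, nucleic acids that differ in sequence can encode the same polypeptide. The sketch below illustrates this with a deliberately partial codon table and a short placeholder peptide (Gly-Gly-Gly-Gly-Ser); neither DNA sequence is taken from the sequence listing, and the names are hypothetical.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TABLE = {
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "TCT": "S", "AGC": "S",                          # serine
}

def translate(dna):
    """Translate an in-frame DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

# Two different illustrative DNA sequences.
seq_a = "GGTGGTGGTGGTTCT"
seq_b = "GGCGGAGGGGGCAGC"

# Both translate to the same placeholder peptide.
peptide_a = translate(seq_a)
peptide_b = translate(seq_b)
```

Both `peptide_a` and `peptide_b` are `GGGGS`, although the two DNA strings differ at a third-codon position or more in every codon.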
  • Nucleic acid sequences encoding the antibodies that specifically bind gp120 can be prepared by any suitable method including, for example, cloning of appropriate sequences or by direct chemical synthesis by methods such as the phosphotriester method of Narang et al., Meth. Enzymol. 68:90-99, 1979; the phosphodiester method of Brown et al., Meth. Enzymol. 68:109-151, 1979; the diethylphosphoramidite method of Beaucage et al., Tetra. Lett. 22:1859-1862, 1981; the solid phase phosphoramidite triester method described by Beaucage &
  • Exemplary nucleic acids can be prepared by cloning techniques. Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Sambrook et al., supra, Berger and Kimmel (eds.), supra, and Ausubel, supra. Product information from manufacturers of biological reagents and experimental equipment also provides useful information. Such manufacturers include the SIGMA Chemical Company
  • Nucleic acids can also be prepared by amplification methods.
  • Amplification methods include polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), and the self-sustained sequence replication system (3SR).
  • nucleic acids encoding any of the antibodies, VH and/or VL, disclosed herein (or fragment thereof) can be expressed in a recombinantly engineered cell such as bacteria, plant, yeast, insect and mammalian cells. These antibodies can be expressed as individual VH and/or VL chain, or can be expressed as a fusion protein. An immunoadhesin can also be expressed. Thus, in some examples, nucleic acids encoding a VH and VL, and immunoadhesin are provided. The nucleic acid sequences can optionally encode a leader sequence.
  • the VH- and VL-encoding DNA fragments are operatively linked to another fragment encoding a flexible linker, e.g., encoding the amino acid sequence (Gly4-Ser)3, such that the VH and VL sequences can be expressed as a contiguous single-chain protein, with the VL and VH domains joined by the flexible linker (see, e.g., Bird et al., Science 242:423-426, 1988;
  • cleavage site can be included in a linker, such as a furin cleavage site.
  • the nucleic acid encoding the VH and/or the VL optionally can encode an Fc domain (immunoadhesin).
  • the Fc domain can be an IgA, IgM or IgG Fc domain.
  • the Fc domain can be an optimized Fc domain, as described in U.S. Published Patent Application No. 20100/093979, incorporated herein by reference.
  • the immunoadhesin is an IgG1 Fc.
  • the single chain antibody may be monovalent, if only a single VH and VL are used, bivalent, if two VH and VL are used, or polyvalent, if more than two VH and VL are used. Bispecific or polyvalent antibodies may be generated that bind specifically to gp120 and to another molecule, such as gp41.
  • the encoded V H and V L optionally can include a furin cleavage site between the V H and V L domains.
  • the host cell can be a Gram-positive bacterium, including, but not limited to, Lactobacillus, Lactococcus, Clostridium, Geobacillus, and Oceanobacillus.
  • Gram-negative bacteria include, but are not limited to, E. coli, Pseudomonas, Salmonella, Campylobacter, Helicobacter, Flavobacterium, Fusobacterium, Ilyobacter, Neisseria, and Ureaplasma.
  • One or more DNA sequences encoding the antibody or fragment thereof can be expressed in vitro by DNA transfer into a suitable host cell.
  • the cell may be prokaryotic or eukaryotic.
  • the term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. Methods of stable transfer, meaning that the foreign DNA is continuously maintained in the host, are known in the art. Hybridomas expressing the antibodies of interest are also encompassed by this disclosure.
  • Expression of nucleic acids encoding the isolated proteins described herein can be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or inducible), followed by incorporation into an expression cassette.
  • the promoter can be any promoter of interest, including a cytomegalovirus promoter and a human T cell lymphotropic virus (HTLV)-1 promoter.
  • an enhancer such as a cytomegalovirus enhancer, is included in the construct.
  • the cassettes can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical expression cassettes contain specific sequences useful for regulation of the expression of the DNA encoding the protein.
  • the expression cassettes can include appropriate promoters, enhancers, transcription and translation terminators, initiation sequences, a start codon (i.e., ATG) in front of a protein-encoding gene, splicing signal for introns, sequences for the maintenance of the correct reading frame of that gene to permit proper translation of mRNA, and stop codons.
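Some of the cassette requirements listed above (a start codon, an intact reading frame, and a stop codon) lend themselves to a simple automated check on the coding sequence. The following is a minimal sketch under stated assumptions: the function name `check_orf` and the example sequences are hypothetical, and the check covers only the coding region, not promoters, enhancers, or splicing signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_orf(cds):
    """Return a list of problems found in a coding sequence (empty list if none)."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("missing ATG start codon")
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3 (reading frame broken)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature in-frame stop codon")
    return problems

# Illustrative inputs only.
ok_report = check_orf("ATGGGTTCTTAA")
bad_report = check_orf("ATGTAAGGTTAA")
```

An empty list means the three coding-sequence checks passed; any other result names the specific issue to fix before cloning into the cassette.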
  • the vector can encode a selectable marker, such as a marker encoding drug resistance (for example, ampicillin or tetracycline resistance).
  • expression cassettes which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation (internal ribosomal binding sequences), and a transcription/translation terminator.
  • For E. coli, this includes a promoter such as the T7, trp, lac, or lambda promoters, a ribosome binding site, and preferably a transcription termination signal.
  • control sequences can include a promoter and/or an enhancer derived from, for example, an immunoglobulin gene, HTLV, SV40 or cytomegalovirus, and a polyadenylation sequence, and can further include splice donor and/or acceptor sequences (for example, CMV and/or HTLV splice acceptor and donor sequences).
  • the cassettes can be transferred into the chosen host cell by well-known methods such as transformation or electroporation for E. coli and calcium phosphate treatment, electroporation or lipofection for mammalian cells. Cells transformed by the cassettes can be selected by resistance to antibiotics conferred by genes contained in the cassettes, such as the amp, gpt, neo and hyg genes.
  • Eukaryotic cells can also be cotransformed with
  • a eukaryotic viral vector such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein (see for example, Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982).
  • Modifications can be made to a nucleic acid encoding a polypeptide described herein without diminishing its biological activity. Some modifications can be made to facilitate the cloning, expression, or incorporation of the targeting molecule into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, termination codons, a methionine added at the amino terminus to provide an initiation site, additional amino acids placed on either terminus to create conveniently located restriction sites, or additional amino acids (such as poly His) to aid in purification steps.
  • the immunoconjugates, effector moieties, and antibodies of the present disclosure can also be constructed in whole or in part using standard peptide synthesis well known in the art.
  • the recombinant immunoconjugates, antibodies, and/or effector molecules can be purified according to standard procedures of the art, including ammonium sulfate precipitation, affinity columns, column
  • the antibodies, immunoconjugates and effector molecules need not be 100% pure. Once purified, partially or to homogeneity as desired, if to be used therapeutically, the polypeptides should be substantially free of endotoxin.
  • a reducing agent must be present to separate disulfide bonds.
  • An exemplary buffer with a reducing agent is: 0.1 M Tris pH 8, 6 M guanidine, 2 mM EDTA, 0.3 M DTE (dithioerythritol).
  • Reoxidation of the disulfide bonds can occur in the presence of low molecular weight thiol reagents in reduced and oxidized form, as described in Saxena et al., Biochemistry 9:5015-5021, 1970, and especially as described by Buchner et al., supra.
  • Renaturation is typically accomplished by dilution (for example, 100-fold) of the denatured and reduced protein into refolding buffer.
  • An exemplary buffer is 0.1 M Tris, pH 8.0, 0.5 M L-arginine, 8 mM oxidized glutathione (GSSG), and 2 mM EDTA.
  • the heavy and light chain regions are separately solubilized and reduced and then combined in the refolding solution.
  • An exemplary yield is obtained when these two proteins are mixed in a molar ratio such that a 5-fold molar excess of one protein over the other is not exceeded.
  • Excess oxidized glutathione or other oxidizing low molecular weight compounds can be added to the refolding solution after the redox-shuffling is completed.
  • the antibodies, labeled antibodies and functional fragments thereof that are disclosed herein can also be constructed in whole or in part using standard peptide synthesis.
  • Solid phase synthesis of the polypeptides of less than about 50 amino acids in length can be accomplished by attaching the C-terminal amino acid of the sequence to an insoluble support followed by sequential addition of the remaining amino acids in the sequence.
  • Proteins of greater length may be synthesized by condensation of the amino and carboxyl termini of shorter fragments. Methods of forming peptide bonds by activation of a carboxyl terminal end (such as by the use of the coupling reagent N,N'-dicyclohexylcarbodiimide) are well known in the art.
  • Epitope scaffolds have been used to isolate antibodies with particular binding specificity (see PCT Publication No. WO 2008/025015). Briefly, an epitope, such as an epitope of a pathogenic agent (for example, an epitope of an HIV-1 polypeptide) recognized by broadly neutralizing antibodies, is placed into an appropriate peptide scaffold that preserves its structure and antigenicity. Such epitope scaffolds can then be used as an immunogen to elicit an epitope-specific antibody response in a subject. In another example, such scaffolds can be used to identify specific serum reactivities against the target epitope of the scaffold. This scaffolding technology is applicable not only to HIV-1, but to any pathogen for which a broadly neutralizing antibody and its respective epitope have been characterized at the atomic level.
  • epitope-protein scaffolds which elicit selected neutralizing antibodies is disclosed in PCT Publication No. WO 2008/025015, which is incorporated herein by reference.
  • the protocols utilize searchable databases containing the three-dimensional structure of proteins, epitopes, and epitope-antibody complexes to identify proteins that are capable of structurally accommodating at least one selected epitope on their surface. Protein folding calculations are further utilized to make energetic predictions. The predicted energies may be used to optimize the structure of the epitope-scaffold and filter results on the basis of energy criteria in order to reduce the number of candidate proteins and identify energetically stable epitope-scaffolds.
  • a "superposition" epitope-scaffold can be designed and utilized.
  • Superposition epitope-scaffolds are based upon scaffold proteins having an exposed segment on their surface with a similar conformation as a selected target epitope.
  • the backbone atoms in this superposition region can be structurally superimposed onto the target epitope with less than a selected level of deviation from their native configuration.
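One common way to quantify the deviation between a scaffold segment and a target epitope is the root-mean-square deviation (RMSD) over matched backbone atoms. The sketch below assumes the two coordinate sets have already been optimally superimposed and are listed in matching atom order; the coordinates, the cutoff values, and the function names are hypothetical, and the disclosure does not specify that RMSD is the measure used.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized, pre-superimposed point lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def within_deviation(coords_a, coords_b, max_rmsd):
    """True if the two segments deviate by less than the selected level."""
    return rmsd(coords_a, coords_b) < max_rmsd

# Made-up backbone positions: the scaffold segment is offset by 1 unit along z.
epitope = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
scaffold = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

deviation = rmsd(epitope, scaffold)
accepted = within_deviation(epitope, scaffold, max_rmsd=1.5)
```

In a real pipeline the superposition itself (for example, a least-squares fit) would be performed first; only the deviation check is shown here.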
  • Candidate scaffolds are identified by
  • the candidate scaffolds are further designed by putting epitope residues in the superposition region of the scaffold protein and making additional mutations on the surrounding surface of the scaffold to prevent undesirable interactions between the scaffold and the epitope or the scaffold and the antibody.
  • Superposition is advantageous in that it is a conservative technique.
  • Epitope-scaffolds designed by superposition require only a limited number of mutations on the surface of known, stable proteins. Thus, the designs can be produced rapidly and a high fraction of the first round designs are likely to fold properly.
  • Grafting epitope scaffolds utilize scaffold proteins that can accommodate replacement of an exposed segment with the crystallized conformation of the target epitope. For each suitable scaffold identified by computationally searching through a database of known three-dimensional structures, an exposed segment is replaced by the target epitope. The surrounding protein side chains are further mutated to accommodate and stabilize the inserted epitope. Mutations are further made on the surface of the scaffold to avoid undesirable interactions between the scaffold and epitope or scaffold and antibody. Grafting epitope-scaffolds should substantially mimic the epitope-antibody interaction, as the epitope is presented in substantially its native conformation. As such, grafting may be utilized to treat complex epitopes which are more difficult to incorporate using superposition techniques.
  • ROSETTA™ is a software application, developed at least in part at the University of Washington, which provides protein structure predictions.
  • ROSETTA™ utilizes physical models of the macromolecular interactions and algorithms for finding the lowest energy structure for an amino acid sequence in order to predict the structure of a protein.
  • ROSETTA™ may use these models and algorithms to find the lowest energy amino acid sequence for a protein or protein-protein complex for protein design.
  • the ROSETTA™ energy function and several modules of the ROSETTA™ protein structure modeling and design platform are employed in the protein scaffold design discussed below.
  • an original (parental) antibody that specifically binds a scaffold epitope is identified and sequenced.
  • the antibody binding determinants of antibody reactivity are then identified by mutagenesis (for example, amino acid substitutions) of the antibody sequences, wherein variant antibodies are produced.
  • These amino acid substitutions can be made in one or more CDRs and/or in one or more framework regions of the original antibody.
  • the amino acid substitutions can be a replacement of the amino acid in the original antibody for a tryptophan.
  • the antibodies include at most one, at most two, at most three or at most four amino acid substitutions, such as in the CDRs.
  • These variant antibodies, such as the antibodies including one, two, three or four amino acid substitutions, are then evaluated for binding to the epitope scaffold.
  • Antibodies are selected that have altered binding affinity for the epitope scaffold as compared to the original (parental) antibody.
  • selection of residues for mutagenesis is aided by structural modeling of the scaffold-antibody interaction.
  • the amino acid(s) that have been identified as critical for antibody reactivity are then further substituted and the effects on antibody reactivity measured by further probing with the epitope scaffold.
  • the epitope scaffold probe is fused to a biotinylation peptide.
  • the amino acid residues in the antibody that are responsible for specific binding to the epitope are indicated by a decrease in antibody affinity of the variant antibody as compared to the parental antibody.
  • antibodies are selected wherein binding is decreased by at least 20%, at least 30%, at least 40%, at least 50%, at least 100% (2-fold), at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000% (10-fold) as compared to the original antibody.
  • a decrease in affinity of the variant antibody for the scaffold, as compared to the parental antibody, identifies the one or more amino acids as critical for antigen binding.
  • the complete loss of antibody binding affinity for the epitope scaffold identifies the one or more amino acid residues as critical for specific binding of antibody to the epitope.
  • variant antibodies are selected wherein binding is increased by at least 20%, at least 30%, at least 40%, at least 50%, at least 100% (2-fold), at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000% (10-fold) as compared to the parental antibody.
  • An increase of the binding of the variant antibody as compared to the parental antibody identifies the one or more amino acid residues as critical for specific binding of the antibody to the epitope.
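The selection of variants by change in binding can be sketched as a simple fold-change classification. This is an illustration under stated assumptions: binding is represented by a single positive signal per antibody (for example, an assay readout), a decrease is expressed as the parental signal divided by the variant signal, and the default 2-fold cutoff, the labels, and the function name are hypothetical rather than taken from the disclosure.

```python
def classify_variant(parental_signal, variant_signal, min_fold=2.0):
    """Label a variant by fold change in binding signal relative to the parental antibody."""
    if parental_signal <= 0 or variant_signal < 0:
        raise ValueError("parental signal must be positive and variant signal non-negative")
    if variant_signal == 0:
        return "binding lost"
    if parental_signal / variant_signal >= min_fold:
        return "decreased"
    if variant_signal / parental_signal >= min_fold:
        return "increased"
    return "unchanged"

# Illustrative readouts for a parental antibody and three variants.
labels = {
    "variant_1": classify_variant(1.0, 0.4),
    "variant_2": classify_variant(1.0, 3.0),
    "variant_3": classify_variant(1.0, 0.9),
}
```

Variants labeled "decreased", "increased", or "binding lost" point to substituted positions that would merit follow-up substitutions.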
  • an exemplary method is disclosed herein, wherein this technology is utilized for an antibody that specifically binds an antigenic glycoprotein of HIV.
  • this method is broadly applicable to antibodies that specifically bind any antigen of interest.
  • the antibody specifically binds a pathogen of interest. Pathogens include viruses, fungi, bacteria, and protozoa.
  • the antibody specifically binds a tumor antigen of interest.
  • a subject is selected that produces, or has, broadly neutralizing sera, such that the B-cells isolated from that subject are believed to express one or more broadly neutralizing antibodies to an antigen of interest, such as an antigen from a pathogenic organism or a tumor.
  • B-cells are isolated from the subject, and the isolated B-cells are contacted with a target antigen of interest, such as a resurfaced antigen, and the complex of the B-cells and the target antigen of interest is isolated.
  • Nucleic acids obtained from the B-cells are analyzed, antibodies encoded by the Ig gene are synthesized, and the antibodies are further characterized.
  • the antibody-antigen complexes are further characterized structurally, for example using X-ray diffraction methods, which allow the important antibody/antigen contacts to be mapped.
  • This information can be used to define classes of neutralizing antibodies specific for an antigen of interest, for example as is disclosed herein for the class of VRC01-like antibodies.
  • the structural information about the antigen/antibody contacts and conformation can be analyzed in conjunction with sequencing data, such as 454 sequencing data, to identify additional antibodies that have the same or similar binding properties, in that they are highly specific for a specific neutralizing epitope on the surface of the antigen of interest.
  • genomic analyses of B-cell cDNA libraries provide insight into sequence complexity and can be used to identify neutralizing antibodies of interest. These sequences of B cell variable domains specify the functional antibodyome, the repertoire of expressed antibody heavy and light chain sequences in each individual.
  • High-throughput sequencing methods provide heavy chain and light chain sequences of antibodies, which can be used in certain genomic analyses, such as cross-donor phylogenetic analyses, to identify an antibody that binds an antigen of interest, or an epitope of this antigen.
  • the antigen can be any antigen of interest, such as a viral antigen.
  • the antigen is gp120, or an epitope thereof, such as, but not limited to, resurfaced stabilized core 3 probe (RSC3).
  • the methods disclosed herein are of use for identifying a class of antibodies of interest, such as, but not limited to, VRC01-like antibodies.
  • the methods disclosed herein can be used to identify the heavy and/or light chain variable domains of antibodies that specifically bind gp120, and specific subsets of these antibodies. In some examples, the methods identify the heavy or light chain domains of VRC01-like antibodies. Antibodies can be produced that include a heavy or light chain variable domain (or the CDR sequences in a framework region) identified by these methods, and a light chain variable domain or heavy chain variable domain (or the CDR sequences in a framework region).
  • Both of the heavy chain and the light chain variable domain are from antibodies that bind the same antigen, such that the resulting antibody (including three heavy chain CDRs and three light chain CDRs) binds this antigen.
  • for a VRC01-like antibody, in some embodiments, functional characterization of selected sequences can be achieved through expression of an identified heavy or light chain variable domain followed by reconstitution with a corresponding known VRC01-like light or heavy chain variable domain (respectively) into an antibody.
  • the methods include isolating a sample such as a B cell sample from a subject (for example, see the methods described in the preceding section), such as a subject infected with HIV, and sequencing the heavy and/or light chain variable domains.
  • a data processing program can be used to assess sequence identity and divergence, such as divergence from a germline gene of interest and/or identity to a variable heavy and/or variable light chain gene of interest.
  • These programs are known to one of skill in the art, and include JOINSOLVR® (available on the internet at the NIAID website), IMGT/V-Quest (the international
  • binding to the antigen of interest such as, but not limited to, gpl20
  • an epitope of interest such as, but not limited to, RSC3
  • variable domain of interest is a heavy chain variable domain
  • amino acid sequence of this heavy chain variable domain is produced.
  • the heavy chain variable domain is then paired with a reference sequence light chain variable domain, such as the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 light chain variable domain, and it is determined whether the resulting antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹, or 10⁻¹⁰ mol/L.
  • Methods of sieving a population of antibody heavy chain sequences (such as an antibodyome of a subject) using cross-donor phylogenetic analysis to identify heavy chain variable domains of interest are provided.
  • additional functional characterization is also used.
  • a VRC01-like heavy chain variable domain is identified by performing a cross-donor phylogenetic analysis on a population of heavy chain variable domain nucleic acid sequences that were obtained by sequencing nucleic acids, specifically heavy chain variable domains, from a sample of B cells from a subject infected with a virus, such as HIV.
  • the nucleic acids from the sample of B cells are amplified prior to sequencing.
  • Variations of the cross-donor phylogenetic analysis are provided herein, including all-origin cross-donor phylogenetic analysis (which analyzes heavy chain sequences from IGHV1-2 germline origin and also other germline origins) and IGHV1-2 origin cross-donor phylogenetic analysis (which analyzes heavy chain sequences derived from IGHV1-2 germline origin), such as to identify and isolate VRC01-like heavy chain sequences.
  • heavy chain variable region sequences with an IGHV1-2 germline origin are used as the population of sequences for analysis.
  • the germline origins are assigned to each sequence using, for example, the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi, which is specifically incorporated herein by reference in its entirety), and those sequences that are not assigned to the IGHV1-2 germline sequence, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence, are not included in the population of test sequences used for the cross-donor phylogenetic analysis.
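The germline filtering step described above can be sketched as follows. This is a minimal illustration only, assuming that germline assignments have already been obtained (for example from IgBLAST) and parsed into a mapping from sequence identifier to assigned allele. The function name `filter_by_germline`, the mapping layout, and the sequence identifiers are hypothetical and are not taken from the disclosure.

```python
# Alleles of the IGHV1-2 germline named in the text.
IGHV1_2_ALLELES = frozenset(f"IGHV1-2*0{i}" for i in range(1, 6))


def filter_by_germline(assignments, allowed=IGHV1_2_ALLELES):
    """Keep only sequences whose assigned germline allele is in `allowed`.

    `assignments` maps a sequence identifier to its assigned germline allele
    (hypothetical layout; parsing of the assignment output is not shown).
    """
    return {
        seq_id: allele
        for seq_id, allele in assignments.items()
        if allele in allowed
    }


if __name__ == "__main__":
    # Hypothetical identifiers and assignments, for illustration only.
    example = {
        "read_001": "IGHV1-2*02",
        "read_002": "IGHV3-23*01",
        "read_003": "IGHV1-2*04",
    }
    print(sorted(filter_by_germline(example)))
```

Sequences dropped here simply never enter the test population; the all-origin variant of the analysis would skip this filter.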
  • the cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from at least one known VRC01-like antibody (reference antibody sequence or sequences), such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03, to the population of test sequences.
  • At least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 are added to the population of test sequences.
  • nucleotide sequences of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or of the V-gene reverted sequence for VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03, are added to the test population of heavy chain variable domain nucleic acid sequences. This forms a population of nucleic acid sequences for analysis, creating an analytic population.
  • the known VRC01-like sequences are included as a reference for the segregation of the VRC01-like sequences, and the germline sequence is included to root the population of sequences to a common ancestor.
  • a phylogenetic tree is constructed from this analytic population of heavy chain variable domain sequences for example by using neighbor joining analysis (for example as using the program ClustalW2
  • IGHV1-2 germline sequence such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence.
  • the nucleic acid sequences of interest that segregate in a distinct branch (such as a subtree) of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
  • the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
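Selection of the smallest subtree containing the reference sequences can be illustrated with a short sketch. It assumes, purely for illustration, that a rooted phylogenetic tree is already available as nested tuples whose leaves are sequence names; a tree written by a tree-building program would first have to be parsed into such a structure, which is not shown. The function names, the tuple representation, and the test sequence names are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names under `node` (a name or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_subtree(node, references):
    """Return the smallest subtree whose leaves include every reference name.

    Returns None when `node` does not contain all of the references.
    """
    if not set(references) <= leaves(node):
        return None
    if isinstance(node, str):
        return node
    for child in node:
        subtree = smallest_subtree(child, references)
        if subtree is not None:
            return subtree  # a child already holds every reference
    return node  # references are split across children, so this node is minimal


if __name__ == "__main__":
    # Hypothetical tree rooted at the germline sequence; s1..s4 are test sequences.
    tree = ("IGHV1-2*02", (("s1", "s2"), (("VRC01", "s3"), ("VRC03", "s4"))))
    refs = {"VRC01", "VRC03"}
    candidates = leaves(smallest_subtree(tree, refs)) - refs
    print(sorted(candidates))
```

In this toy tree the test sequences s3 and s4 share the smallest subtree with the two reference sequences, so they would be carried forward as candidate VRC01-like sequences, while s1 and s2 would not.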
  • the methods include selecting a nucleic acid sequence of interest that segregates in a subtree (such as the smallest subtree) with the at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) is interposed between the IGHV1-2*02 germline origin and the subtree in a distinct branch in the phylogenetic tree.
  • the analytic population of heavy chain variable domain nucleic acid sequences from a subject is divided into sub-populations, and cross donor phylogenetic analysis on each subpopulation is performed independently.
  • the nucleic acid sequences identified in each of the subpopulations can then be pooled and/or combined with other heavy chain variable domain nucleic acid sequences from the subject and the cross-donor phylogenetic analysis performed iteratively until convergence.
  • All of the sequences in a branch of the phylogenetic tree that segregate with the at least one known VRC01-like antibody are identified as nucleic acid sequences that encode VRC01-like heavy chain variable domains.
  • the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
  • the VRC01-like antibody heavy chain nucleic acid sequences are identified in a stepwise fashion. In the first step, an iterative screening of the antibody heavy chain nucleic acid sequences is performed based on neighbor-joining (NJ) phylogenetic analysis.
  • germline origins are assigned to each sequence using, for example, the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi).
  • IGHV1-2 germline such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline
  • an iterative procedure based on the NJ method is used to search for a small set of potentially VRC01-like sequences.
  • the full-length sequences of the IGHV1-2 germline origin, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline origin, are divided into subsets of sequences amenable to computational analysis, such as subsets of 2,500 sequences, 5,000 sequences, or 10,000 sequences. The size of these subsets is determined by the computing capabilities used in the analysis.
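Dividing a sequence population into subsets of a chosen size can be sketched as below. This is a minimal illustration under stated assumptions: the sequences are held in memory as a list of identifiers, and the function name `split_into_subsets` is hypothetical. The subset size is left as a parameter because, as noted above, it depends on the computing capabilities available.

```python
def split_into_subsets(sequence_ids, subset_size):
    """Split a list of sequence identifiers into consecutive subsets.

    Every subset holds `subset_size` identifiers except possibly the last,
    which holds the remainder.
    """
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    return [
        sequence_ids[start:start + subset_size]
        for start in range(0, len(sequence_ids), subset_size)
    ]


if __name__ == "__main__":
    # Hypothetical identifiers; a real run might use a size such as 5,000.
    ids = [f"read_{i:04d}" for i in range(12)]
    for subset in split_into_subsets(ids, 5):
        print(len(subset), subset[0], subset[-1])
```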
  • nucleotide sequences of heavy-chain variable domains of one or more known VRC01-like antibodies are added to each set, and an NJ phylogenetic tree is constructed, for example using the program ClustalW2 "Phylogenetic trees" option.
  • the nucleotide distance is calculated as percent divergence between all pairs of sequences in the multiple sequence alignment.
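The pairwise distance described above can be illustrated with a short sketch that computes percent divergence between already aligned sequences. This is an assumption-laden illustration rather than a reproduction of any particular program: alignment columns in which either sequence has a gap are skipped, which may differ from how a given tree-building program treats gaps, and the function names are hypothetical.

```python
from itertools import combinations


def percent_divergence(seq_a, seq_b, gap="-"):
    """Percent of compared alignment columns at which two sequences differ.

    Both sequences must come from the same multiple sequence alignment and so
    have equal length. Columns with a gap in either sequence are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def divergence_matrix(aligned):
    """Percent divergence for every pair of sequences in `aligned` (name -> sequence)."""
    return {
        (name_a, name_b): percent_divergence(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(aligned, 2)
    }
```

A distance matrix of this kind is the input that a neighbor-joining procedure uses to build the tree.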
  • the sequences clustered in a distinct branch containing the one or more known VRC01-like antibodies VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 are extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis.
  • those sequences that do not segregate in the branch of the phylogenetic tree containing the one or more of the known VRC01-like antibodies VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 are discarded for the next round of phylogenetic tree construction. New nucleotide sequences of heavy-chain variable domains are then added together and the process is repeated in a recursive loop until convergence.
  • Convergence occurs when all sequences in the phylogenetic tree reside within a subtree containing the known neutralizing mAbs rooted in germline and no other sequences reside between this subtree and the root, and further repetition of the analysis does not alter the constructed NJ tree.
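The repeat-until-convergence structure of the screening can be sketched as a fixed-point loop. This is a simplified illustration: the tree construction and subtree extraction of each round are abstracted into a caller-supplied function, here called `screen` (a hypothetical name), which takes the current set of sequence identifiers and returns those retained for the next round. Convergence is approximated as "a further round retains exactly the same set", which captures only the last part of the criterion stated above.

```python
def iterate_until_convergence(initial_ids, screen, max_rounds=100):
    """Apply `screen` repeatedly until the retained set stops changing.

    Returns the converged set and the number of rounds that were run.
    `screen` is a caller-supplied stand-in for one round of tree building
    and branch extraction; it is not defined by this sketch.
    """
    current = frozenset(initial_ids)
    for round_number in range(1, max_rounds + 1):
        retained = frozenset(screen(current))
        if retained == current:
            return current, round_number
        current = retained
    raise RuntimeError("screening did not converge within max_rounds")


if __name__ == "__main__":
    # Toy stand-in for a screening round: drop the largest identifier
    # until three remain. A real round would build and inspect an NJ tree.
    def toy_screen(ids):
        if len(ids) <= 3:
            return ids
        return {i for i in ids if i != max(ids)}

    converged, rounds = iterate_until_convergence(range(6), toy_screen)
    print(sorted(converged), rounds)
```

The `max_rounds` guard is a design choice of this sketch so that a screening function that never stabilizes cannot loop indefinitely.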
  • a second step is performed.
  • the second step involves maximum-likelihood (ML) phylogenetic analysis of the obtained sequences.
  • a multiple sequence alignment can be constructed in a similar manner to the first step and provided as input to construct phylogenetic trees using Maximum Likelihood analysis
  • (available at cmgm.stanford.edu/phylip/dnamlk.html) as part of the PHYLIP package v3.69 (as available on the world wide web at
  • the molecular clock is the assumption that the tips of the tree are all equidistant, in branch length, from its root. In some examples the calculation is done with default parameters (empirical base frequencies, the transitions to transversions ratio of 2.0, and the overall base substitution model as A 0.24, C 0.28, G 0.27, T 0.21).
  • the output unrooted tree can be visualized using Dendroscope, ordered to ladderize right, and rooted at the sequence of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline.
  • a nucleic acid sequence of interest that segregates in a subtree with the at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 is interposed between the IGHV1-2*02 germline origin and the subtree in a distinct branch in the phylogenetic tree is selected.
  • the identified VRC01-like antibody heavy chains are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
  • b. All-origin cross-donor phylogenetic analysis
  • heavy chain variable region sequences having an IGHV1-2 germline origin as well as heavy chain variable region sequences from other germline origins are used as the population of sequences for analysis.
  • the initial test population of nucleic acid heavy chain sequences includes IGHV1-2 origin sequences, and nucleic acid heavy chain sequences of up to all origins.
  • the cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from at least one known VRC01-like antibody (reference antibody sequence or sequences), such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, or 8ANC134, to the test population.
  • the nucleotide sequences of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, are also added to the population of heavy chain variable domain nucleic acid sequences. This forms a population of nucleic acid sequences for analysis, for example to create an analytic population.
  • the known VRC01-like sequences are included as a reference for the segregation of the VRC01-like sequences and the germline sequences are included to root the population of sequences to their ancestor.
  • a phylogenetic tree is constructed from this population of heavy chain variable domain sequences, for example by using neighbor-joining analysis (for example, using the program ClustalW2 "Phylogenetic trees" option). In the tree-building process, the nucleotide distance is calculated as percent divergence between all pairs of sequences in the multiple sequence alignment.
  • the phylogenetic tree is rooted at the IGHV1-2 germline sequence, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence.
  • nucleic acid sequences of interest in the analytic population that segregate in a distinct branch in the phylogenetic tree with the at least one known VRC01-like antibody sequence, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
  • the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
  • the population of heavy chain variable domain nucleic acid sequences from a subject is divided into sub-populations, and cross donor phylogenetic analysis on each subpopulation is performed independently.
  • the nucleic acid sequences identified in each of the subpopulations can then be pooled and/or combined with other heavy chain variable domain nucleic acid sequences from the subject and the cross donor phylogenetic analysis performed iteratively until convergence.
  • All of the sequences in a branch of the phylogenetic tree that segregate with at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are identified as nucleic acid sequences that encode VRC01-like heavy chain variable domains.
  • the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population), are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
  • an iterative procedure is followed.
  • the first round of cross-donor analysis is performed as described above and the sequences clustered in a distinct branch (such as the smallest subtree) of the phylogenetic tree containing one or more of the known VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 (or a)
  • the process is repeated in a recursive loop until convergence, that is, until all sequences in the phylogenetic tree reside within a subtree containing the known neutralizing mAbs rooted in germline and no other sequences reside between this subtree and the root, and further repetition of the analysis does not alter the constructed NJ tree.
  • the identified VRC01-like antibody heavy chains are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
  • a VRC01-like light chain variable domain is identified by performing a bioinformatic analysis on a population of light chain variable domain nucleic acid sequences that were obtained by sequencing nucleic acids, specifically light chain variable domains, from a sample of B cells from a subject.
  • the subject can be any subject of interest, such as, but not limited to a subject with a viral infection, such as HIV.
  • the nucleic acids from the sample of B cells are amplified prior to sequencing. These nucleic acid sequences form the test population.
  • the germline origins are assigned to each sequence using, for example the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site:
  • a VRC01-like light chain nucleic acid sequence is identified as encoding a light chain variable region including a CDR3 including a hydrophobic residue followed by a glutamic acid residue or glutamine residue; and, if the VRC01-like light chain nucleic acid sequence is derived from an IGKV1-33 germline, the CDR1 of the VRC01-like light chain variable domain includes at least two glycine residues.
  • a VRC01-like light chain nucleic acid sequence is identified as encoding a light chain variable region including a CDR3 including a hydrophobic residue followed by a glutamic acid residue or glutamine residue; and, if the germline origin of the VRC01-like light chain variable domain is not an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises a deletion of two or more amino acids compared to the corresponding germline origin.
  • VRC01-like light chain sequence is an alanine, valine, leucine, isoleucine, methionine, phenylalanine or tyrosine residue.
  • the hydrophobic residue is a tyrosine, leucine or phenylalanine residue.
  • the hydrophobic residue is followed by a glutamic acid residue. In other examples, the hydrophobic residue is followed by a glutamine residue.
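The light chain sequence criteria described above can be expressed as a short check on CDR amino acid sequences. The sketch below is illustrative only and rests on several assumptions: the CDR1 and CDR3 sequences have already been delineated and are supplied as one-letter amino acid strings, the germline assignment is supplied as a name string, and a CDR1 deletion relative to germline is approximated by a simple length difference. The function names and the example strings are hypothetical.

```python
import re

# Residues listed in the text as the hydrophobic position:
# Ala, Val, Leu, Ile, Met, Phe, Tyr.
HYDROPHOBIC = "AVLIMFY"
# Hydrophobic residue immediately followed by Glu (E) or Gln (Q).
CDR3_MOTIF = re.compile(f"[{HYDROPHOBIC}][EQ]")


def has_cdr3_motif(cdr3):
    """True if the CDR3 has a hydrophobic residue directly followed by E or Q."""
    return CDR3_MOTIF.search(cdr3.upper()) is not None


def is_vrc01_like_light_chain(cdr3, cdr1, germline, germline_cdr1_length):
    """Apply the CDR3 motif rule and the germline-dependent CDR1 rule.

    For an IGKV1-33 origin the CDR1 must contain at least two glycines;
    otherwise the CDR1 must be at least two residues shorter than the
    germline CDR1 (length difference used as a stand-in for a deletion).
    """
    if not has_cdr3_motif(cdr3):
        return False
    if germline.startswith("IGKV1-33"):
        return cdr1.upper().count("G") >= 2
    return germline_cdr1_length - len(cdr1) >= 2


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(is_vrc01_like_light_chain("QQYEF", "QSVS", "IGKV3-20*01", 7))
```

Sequences passing such a check would still require the functional validation described in this section.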
  • the nucleic acid sequence encoding the VRC01-like light chain is selected.
  • the nucleic acid sequence encoding the VRC01-like light chain is produced or synthesized according to methods described herein and/or methods familiar to the person of ordinary skill.
  • the nucleic acid sequence encoding the identified VRC01-like light chain is selected and synthesized for further analysis, such as functional complementation.
  • the identified VRC01-like antibodies are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
  • the sample used to produce the nucleic acid sequences (the test population) used in the analyses described above is a sample of peripheral blood mononuclear cells.
  • the sample can be a sample of isolated B cells, such as B cells isolated by fluorescence activated cell sorting.
  • B cells are isolated that express IgG.
  • B cells are isolated that express IgG, such as IgG1, such as by using fluorescence activated cell sorting.
  • the presently described methods do not require that B cells be isolated that are of a specific isotype.
  • B cells are purified that specifically bind an antigen of interest, such as gp120, or that bind a specific epitope of gp120 of interest, such as resurfaced stabilized core 3 probe (RSC3).
  • B cells are not selected by antigen binding prior to the isolation of nucleic acids and sequencing.
  • Sequence analysis is then performed on nucleic acids from the B cells, to identify immunoglobulin heavy chain sequences, immunoglobulin light chain sequences, or both.
  • the variable domain sequences are obtained from the sample.
  • ultra deep pyrosequencing, or "454 pyrosequencing", is utilized.
  • the nucleic acid molecules derived from the sample comprising B cells are prepared and processed into template molecules amenable for high throughput sequencing.
  • the processing methods may vary from application to application resulting in template molecules comprising various characteristics.
  • template molecules with a sequence or read length that is selected that is at least the length for which a particular sequencing method can accurately produce sequence data.
  • the length can be about 200-300 base pairs, about 350-500 base pairs, 500 base pairs, 500-1,000 base pairs, or other length amenable for a particular sequencing application.
  • nucleic acids from a sample are fragmented using a number of methods known to those of ordinary skill in the art.
  • Some processing methods may employ size selection methods known in the art to selectively isolate nucleic acid fragments of the desired length.
  • Functional elements can be employed with each template nucleic acid molecule. The elements may be employed for a variety of functions including, but not limited to, primer sequences for amplification (such as to amplify heavy chain variable domains) and/or sequencing methods, quality control elements, unique identifiers (also referred to as a multiplex identifier or "MID") that encode various associations such as with a sample of origin or patient, or other functional element.
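The use of a multiplex identifier to associate each read with its sample of origin can be sketched as a simple prefix lookup. This is a minimal illustration under stated assumptions: the MID is taken to sit at the very start of each read, matching is exact with no tolerance for sequencing errors, and no MID is a prefix of another. The function name, the MID sequences, and the sample names are hypothetical.

```python
def demultiplex(reads, mid_to_sample):
    """Assign reads to samples by their leading multiplex identifier (MID).

    Returns a mapping from sample name to reads with the MID trimmed off,
    plus a list of reads that matched no MID.
    """
    by_sample = {sample: [] for sample in mid_to_sample.values()}
    unassigned = []
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:
            # The inner loop finished without a match.
            unassigned.append(read)
    return by_sample, unassigned


if __name__ == "__main__":
    # Hypothetical MIDs and reads, for illustration only.
    mids = {"ACGT": "donor_1", "TGCA": "donor_2"}
    reads = ["ACGTTTGG", "TGCACCAA", "GGGGAAAA"]
    assigned, leftover = demultiplex(reads, mids)
    print(assigned, leftover)
```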
  • the primers are specific for a variable heavy or light chain domain.
  • Exemplary primers are provided in the Examples Section below.
  • Some or all of the described functional elements may be combined into adaptor elements that are coupled to nucleotide sequences in certain processing steps. For example, some embodiments may associate priming sequence elements or regions comprising complementary sequence composition to primer sequences employed for amplification and/or sequencing. Further, the same elements may be employed for what may be referred to as "strand selection" and immobilization of nucleic acid molecules to a solid phase substrate.
  • two sets of priming sequence regions (priming sequence A and priming sequence B)
  • design characteristics of the adaptor elements eliminate the need for strand selection.
  • the same priming sequence regions can be employed in methods for amplification and immobilization where, for instance priming sequence B may be immobilized upon a solid substrate and amplified products are extended therefrom. Additional examples of sample processing for fragmentation, strand selection, and addition of functional elements and adaptors are described in U.S. Patent Application Publication No.
  • Various examples of systems and methods for performing amplification of template nucleic acid molecules to generate populations of substantially identical copies can be utilized.
  • many copies of each nucleic acid element are produced by amplification to generate a stronger signal when one or more nucleotide species is incorporated into each nascent molecule associated with a copy of the template molecule.
  • There are many techniques known in the art for generating copies of nucleic acid molecules such as, for instance, amplification using what are referred to as bacterial vectors, "Rolling Circle" amplification (described in U.S. Pat. Nos.
  • emulsion PCR methods include creating a stable emulsion of two immiscible substances creating aqueous droplets within which reactions may occur.
  • aqueous droplets of an emulsion amenable for use in PCR methods may include a first fluid such as a water-based fluid suspended or dispersed as droplets (also referred to as a discontinuous phase) within another fluid such as a hydrophobic fluid (also referred to as a continuous phase)
  • the hydrophobic fluid typically includes some type of oil.
  • oils that may be employed include, but are not limited to, mineral oils, silicone based oils, or fluorinated oils.
  • Emulsions can employ surfactants that act to stabilize the emulsion that may be particularly useful for specific processing methods such as PCR.
  • surfactants include one or more of a silicone or fluorinated surfactant.
  • one or more non-ionic surfactants can be employed that include but are not limited to sorbitan monooleate (also referred to as SPAN™ 80), polyoxyethylenesorbitan monooleate (also referred to as TWEEN™ 80), or in some preferred embodiments dimethicone copolyol (also referred to as ABIL™ EM90), polysiloxane, polyalkyl polyether copolymer, polyglycerol esters, poloxamers, and PVP/hexadecane copolymers (also referred to as Unimer U-151), or a high molecular weight silicone polyether in cyclopentasiloxane (also referred to as DC 5225C available from Dow Corning).
  • the droplets of an emulsion may also be referred to as compartments, microcapsules, microreactors, microenvironments, or other name commonly used in the related art.
  • the aqueous droplets may range in size depending on the
  • the described emulsions create the microenvironments within which chemical reactions, such as PCR, may be performed.
  • template nucleic acids and all reagents necessary to perform a desired PCR reaction may be encapsulated and chemically isolated in the droplets of an emulsion.
  • Additional surfactants or other stabilizing agent may be employed in some embodiments to promote additional stability of the droplets as described above.
  • Thermocycling operations typical of PCR methods may be executed using the droplets to amplify an encapsulated nucleic acid template resulting in the generation of a population comprising many substantially identical copies of the template nucleic acid.
  • the population within the droplet may be referred to as a "clonally isolated", "compartmentalized", "sequestered", "encapsulated", or "localized" population.
  • some or all of the described droplets may further encapsulate a solid substrate such as a bead for attachment of template and amplified copies of the template, amplified copies complementary to the template, or combination thereof.
  • the solid substrate may be enabled for attachment of other type of nucleic acids, reagents, labels, or other molecules of interest.
  • target specific amplicons for sequencing are employed that include using sets of specific nucleic acid primers to amplify a selected target region or regions from a sample comprising the target nucleic acid, such as immunoglobulins.
  • the nucleic acid is first subjected to amplification by a pair of PCR primers designed to amplify a region surrounding the region of interest or segment common to the nucleic acid population.
  • Each of the products of the PCR reaction (first amplicons) is subsequently further amplified individually in separate reaction vessels such as an emulsion based vessel described above.
  • the resulting amplicons (referred to herein as second amplicons), each derived from one member of the first population of amplicons, are sequenced and the collection of sequences, from different emulsion PCR amplicons (i.e. second amplicons), are used to determine an allelic frequency.
  • PICOTITERPLATE® array (also sometimes referred to as a PTP™ plate or array)
  • the described methods can be employed to generate sequence composition for over 100,000, over 300,000, over 500,000, or over 1,000,000 nucleic acid regions per run or experiment. These methods can provide a sensitivity of detection of low abundance alleles which may represent 1% or less of the allelic variants.
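Determining an allelic frequency from a collection of sequenced amplicons, and flagging low abundance variants, can be sketched as a simple count. The example is illustrative only: each read is assumed to have been reduced already to the allele observed at the region of interest, the 1% threshold mirrors the figure mentioned above, and the function names are hypothetical. It does not model sequencing error, which in practice limits how confidently rare variants can be called.

```python
from collections import Counter


def allele_frequencies(observed_alleles):
    """Fraction of reads supporting each allele."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {allele: count / total for allele, count in counts.items()}


def low_abundance(frequencies, threshold=0.01):
    """Alleles whose frequency is at or below `threshold` (1% by default)."""
    return {allele for allele, freq in frequencies.items() if freq <= threshold}


if __name__ == "__main__":
    # Hypothetical data: 99 reads carry allele A and 1 read carries allele G.
    freqs = allele_frequencies(["A"] * 99 + ["G"])
    print(freqs, low_abundance(freqs))
```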
  • Another advantage of the methods includes generating data comprising the sequence of the analyzed region. Generally, it is not necessary to have prior knowledge of the sequence of the locus being analyzed.
  • Embodiments of sequencing include Sanger type techniques, techniques generally referred to as Sequencing by Hybridization (SBH), Sequencing by
  • sequencing techniques include colony sequencing techniques; nanopore, waveguide and other single molecule detection techniques; or reversible terminator techniques.
  • One technique of use is Sequencing by Synthesis (SBS). For example, some SBS embodiments sequence populations of substantially identical copies of a nucleic acid template and typically employ one or more oligonucleotide primers designed to anneal to a predetermined, complementary position of the sample template molecule or one or more adaptors attached to the template molecule. The primer/template complex is presented with a nucleotide species in the presence of a nucleic acid polymerase enzyme.
  • the polymerase will extend the primer with the nucleotide species.
  • the primer/template complex is presented with a plurality of nucleotide species of interest (typically A, G, C, and T) at once, and the nucleotide species that is complementary at the corresponding sequence position on the sample template molecule directly adjacent to the 3' end of the oligonucleotide primer is incorporated.
  • the nucleotide can be chemically blocked (such as at the 3'-O position) to prevent further extension, and needs to be deblocked prior to the next round of synthesis.
  • the process of adding a nucleotide species to the end of a nascent molecule is substantially the same as that described above for addition to the end of a primer.
  • Incorporation of the nucleotide species can be detected by a variety of methods known in the art, e.g. by detecting the release of pyrophosphate (PPi) (examples described in U.S. Pat. Nos. 6,210,891; 6,258,568; and 6,828,100, each of which is hereby incorporated by reference herein in its entirety for all purposes), or via detectable labels bound to the nucleotides.
  • Some examples of detectable labels include but are not limited to mass tags and fluorescent or chemiluminescent labels.
  • unincorporated nucleotides are removed, for example by washing.
  • the unincorporated nucleotides may be subjected to enzymatic degradation such as, for instance, degradation using the apyrase or pyrophosphatase enzymes.
  • if detectable labels are used, they will typically have to be inactivated (e.g. by chemical cleavage or photobleaching) prior to the following cycle of synthesis.
  • the next sequence position in the template/polymerase complex can then be queried with another nucleotide species, or a plurality of nucleotide species of interest, as described above. Repeated cycles of nucleotide addition, extension, signal acquisition, and washing result in a determination of the nucleotide sequence of the template strand.
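The cycle of presenting one nucleotide species at a time and recording how many bases are incorporated can be illustrated with a small simulation. This is an idealized sketch, not a model of any instrument: the template is given as the bases the polymerase encounters in order, every incorporation is assumed to be detected perfectly, and the cyclic flow order "TACG" is an assumption chosen for the example. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template, flow_order="TACG"):
    """Simulate single-species nucleotide flows over a template.

    Returns a list of (nucleotide, incorporated_count) pairs, one per flow.
    A count above 1 corresponds to a homopolymer run in the template.
    """
    flows = []
    position = 0
    while position < len(template):
        for nucleotide in flow_order:
            count = 0
            # Extend while the flowed nucleotide pairs with the next template base.
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                count += 1
                position += 1
            flows.append((nucleotide, count))
            if position == len(template):
                break
    return flows


def read_from_flows(flows):
    """Reconstruct the synthesized strand from the recorded flows."""
    return "".join(nucleotide * count for nucleotide, count in flows)


if __name__ == "__main__":
    flows = simulate_flows("AACG")
    print(flows)
    print(read_from_flows(flows))
```

Flows with a count of zero represent nucleotide species that were presented but not incorporated, which is why unincorporated nucleotides are removed before the next species is introduced.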
  • a large number or population of substantially identical template molecules are typically analyzed simultaneously in any one sequencing reaction, in order to achieve a signal which is strong enough for reliable detection.
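The cycle described above (present one nucleotide species, let the polymerase extend, record the signal, wash, repeat) can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not the instrument's actual base-calling logic: the function name `sbs_flowgram`, the flow order "TACG", and the idealized signal (an exact count of incorporations per flow) are all hypothetical choices made for the example.

```python
# Toy simulation of one-nucleotide-at-a-time sequencing by synthesis.
# Assumptions (not from the source): flow order "TACG", noise-free signal,
# and a template given in the direction the polymerase reads it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_flowgram(template, flow_order="TACG", max_cycles=50):
    """Return (flows, read) where flows is a list of (nucleotide, count)."""
    pos = 0      # next template position to be copied
    flows = []   # one (nucleotide, incorporations) entry per flow
    read = []    # nascent strand, built up flow by flow
    for _ in range(max_cycles):
        for nt in flow_order:
            count = 0
            # The polymerase extends while the presented nucleotide is
            # complementary to the next template position.
            while pos < len(template) and COMPLEMENT[template[pos]] == nt:
                count += 1
                pos += 1
            flows.append((nt, count))
            read.append(nt * count)
            # Unincorporated nucleotides would be washed away here.
            if pos == len(template):
                return flows, "".join(read)
    return flows, "".join(read)


flows, read = sbs_flowgram("AACGT")
print(read)
```

In this sketch a flow with a count of zero corresponds to a nucleotide species that was presented but not incorporated, and a count above one corresponds to a run of identical bases in the template.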
  • SBS apparatus can include one or more of a detection device such as a charge coupled device (i.e. CCD camera) or a confocal type architecture, a microfluidics chamber or flow cell, a reaction substrate, and/or a pump and flow valves.
  • the reaction substrate for sequencing may include what is referred to as a PTP™ array, as described above, formed from a fiber optics faceplate that is acid-etched to yield hundreds of thousands or more of very small wells each enabled to hold a population of substantially identical template molecules (in one example, this is about 3.3 million wells on a 70 × 75 mm PTP™ array at a 35 µm well to well pitch).
  • each population of substantially identical template molecules is disposed upon a solid substrate such as a bead, each of which may be disposed in one of said wells.
  • an apparatus can include a reagent delivery element for providing fluid reagents to the PTP plate holders, as well as a CCD type detection device enabled to collect photons of light emitted from each well on the PTP plate.
  • Systems can be employed that automate one or more sample preparation processes, such as the emPCR™ process described above.
  • automated systems can be employed to provide an efficient solution for generating an emulsion for emPCR processing, performing PCR Thermocycling operations, and enriching for successfully prepared populations of nucleic acid molecules for sequencing. Examples of automated sample preparation systems are described in U.S. Published Patent Application No. 2005/0227264.
  • the systems can include implementation of some design, analysis, or other operation using a computer readable medium stored for execution on a computer system.
  • these computer systems can analyze data generated using SBS systems and methods where the processing and analysis embodiments are implementable on computer systems.
  • Computers typically include known components such as a processor, an operating system, system memory, memory storage devices, input-output controllers, input-output devices, and display devices. They can also include cache memory, a data backup unit, and many other devices.
  • Display devices include devices that provide visual information; this information typically may be logically and/or physically organized as an array of pixels.
  • An interface controller may also be used that has software programs for providing input and output interfaces.
  • interfaces may include what are generally referred to as "Graphical User Interfaces" (often referred to as GUIs) that provide one or more graphical representations to a user. Interfaces are typically enabled to accept user inputs using means of selection or input.
  • the processor can include a commercially available processor such as a CORE™ or PENTIUM® processor made by Intel Corporation, a SPARC® processor made by Sun Microsystems, an ATHLON® or OPTERON® processor made by AMD corporation, or it may be one of other processors that are or will become available.
  • Some embodiments of a processor may include what is referred to as a multi-core processor and/or be enabled to employ parallel processing technology in a single or multi-core configuration.
  • a multi-core architecture typically comprises two or more processor "execution cores.”
  • each execution core may perform as an independent processor that enables parallel execution of multiple threads.
  • a processor may be configured in what is generally referred to as 32 or 64 bit architectures, or other architectural configurations now known or that may be developed in the future.
  • a processor typically executes an operating system, which may be, for example, a
  • System memory may include any of a variety of known memory storage devices. Examples include any commonly available random access memory (RAM), magnetic medium such as a resident hard disk or tape, an optical medium such as a read and write compact disc, or other memory storage device, including a compact disk drive, USB or flash drive, or a diskette drive.
  • An instrument control and/or a data processing application if implemented in software, may be loaded into and executed from system memory and/or a memory storage device. All or portions of the instrument control and/or data processing applications may also reside in a read-only memory or similar device of the memory storage device, such devices not requiring that the instrument control and/or data processing applications first be loaded through input-output controllers. It will be understood by those skilled in the relevant art that the instrument control and/or data processing applications, or portions of it, may be loaded by a processor in a known manner into system memory, or cache memory, or both, as advantageous for execution.
  • a computer can include one or more library files, experiment data files, and an internet client stored in system memory. For example, experiment data could include data related to one or more experiments or assays such as detected signal values, or other values associated with one or more SBS experiments or processes.
  • the methods disclosed herein have broad applications for identifying specific antibodies, heavy chains, light chains, classes or species of antibodies with defined specificity, for example as exemplified by the identification of VRC01-like antibodies disclosed herein. This combination of structural and genomic analysis of Ig may provide a generic way of identifying specific antibodies, as well as classes or species of antibodies with defined specificities. Such antibodies, like VRC01 and related antibodies, can potentially be used for prevention strategies, such as microbicides or passive protection against HIV infection, vaccine design, diagnostics, and therapy of infected individuals.
  • viral antigenic epitopes can be used with the methods disclosed herein to identify classes of antibodies specific for the antigen of interest.
  • Antigens of use in the methods disclosed herein include, but are not limited to, antigenic epitopes from dengue virus, human immunodeficiency virus, influenza virus, metapneumovirus, norovirus, papillomavirus, parvovirus, SARS virus, smallpox virus, picornaviruses, respiratory syncytial virus, parainfluenza virus, measles, hepatitis, varicella zoster, rabies and West Nile virus, among many others.
  • the antigenic epitope is from a virus that causes a respiratory disorder (for example, adeno, echo, rhino, coxsackie, influenza, parainfluenza, or respiratory syncytial virus), a digestive disorder (for example, rota, parvo, dane particle, or hepatitis A virus), an epidermal-epithelial disorder (for example, verruca, papilloma, molluscum, rubeola, rubella, small pox, cowpox), a herpes virus disease (for example, varicella-zoster, simplex I, or simplex II virus), an arbovirus disease (for example, dengue, yellow, or hemorrhagic fevers), a viral disease of the central nervous system (for example, polio or rabies), a viral heart disease, or acquired immune deficiency (AIDS).
  • the antigenic epitope can also be from a bacterium.
  • the bacterial antigenic epitope is a pyogenic cocci antigen from an organism that causes, for example, staphylococcal, streptococcal, pneumococcal, meningococcal, and gonococcal infections; a gram-negative rod antigen from an organism that causes, for example, E. coli, Klebsiella, enterobacter, pseudomonas, or legionella infections; an antigenic epitope for an organism that causes, for example, hemophilus influenza, bordetella pertussis, or diphtheria infections.
  • enteropathic bacteria for example, S.
  • Exemplary antigens are the CFP10 polypeptide or a domain of other polypeptides of Mycobacterium tuberculosis, or of a domain of the pilus polypeptide of Vibrio cholerae, the CjaA polypeptide of Campylobacter coli, the SfbI polypeptide of Streptococcus pyogenes, the UreB polypeptide of Helicobacter pylori, or of other pathogenic organisms such as the circumsporozoite polypeptide of Plasmodium falciparum.
  • Non-limiting examples of bacterial (including mycobacterial) epitopes can be found, for example, in Mei et al., Mol. Microbiol. 26:399-407, 1997; and U. S. Patent Nos. 6,790,950 (gram negative bacteria); 6,790,448 (gram positive bacteria); 6,776,993 and
  • the antigenic epitope is from a Chlamydia that causes ornithosis (C. psittaci), chlamydial urethritis and cervicitis (C. trachomatis), inclusion conjunctivitis (C. trachomatis), trachoma (C. trachomatis), or
  • the antigenic epitope is from a rickettsia that causes typhus fever (R. prowazekii), Rocky Mountain spotted fever (R. rickettsii), scrub fever (R. tsutsugamushi), or Q fever (Coxiella burnetii).
  • the antigenic epitope is from a fungus, such as Candida (for example, C. albicans) or Aspergillus (for example, A. fumigatus).
  • the protozoan antigen is from, for example, Giardia lamblia, Trichomoniasis, Pneumocystosis, Plasmodium, Leishmania, or Toxoplasma.
  • the helminth antigen is from, for example, Trichuris, Necator americanus (hookworm disease), Ancylostoma duodenale (hookworm disease), Trichinella spiralis, or S. mansoni.
  • Tumor antigens include, but are not limited to carcinoembryonic antigen ("CEA"; e.g., GENBANK® Accession No. AAA62835), ras proteins (see, e.g., Parada et al. Nature 297:474-478, 1982), p53 protein (e.g., GENBANK® Accession No. P07193), prostate-specific antigen (PSA; GENBANK® Accession Nos. NP001639, NP665863), Muc1 (e.g., GENBANK® Accession No. P15941), tyrosinase (see, e.g., Kwon et al., Proc Natl Acad Sci USA 84:7473-7477, 1987, erratum Proc Natl Acad Sci USA 85:6352, 1988), melanoma-associated antigen (MAGEs: for examples, see, U.S. Patent Nos:
  • the tumor antigen can be from a tumor of any organ or tissue, including but not limited to solid organ tumors.
  • the tumor can be melanoma, colon-, breast-, lung, cervical-, ovarian, endometrial-, prostate-, skin-, brain-, liver-, kidney, thyroid, pancreatic, esophageal-, or gastric cancer, leukemias, lymphomas, multiple myeloma, myelodysplasia syndrome, premalignant human papilloma virus (HPV)-related lesions, intestinal polyps and other chronic states associated with increased tumor risk.
  • Methods are disclosed herein for the prevention or treatment of an HIV infection, such as an HIV-1 infection.
  • Prevention can include inhibition of infection with HIV-1.
  • the methods include contacting a cell with an effective amount of the human monoclonal antibodies disclosed herein that specifically bind gp120, or a functional fragment thereof.
  • the method can also include administering to a subject a therapeutically effective amount of the human monoclonal antibodies.
  • Methods to assay for neutralization activity include, but are not limited to, a single-cycle infection assay as described in Martin et al. (2003) Nature Biotechnology 21:71-76. In this assay, the level of viral activity is measured via a selectable marker whose activity is reflective of the amount of viable virus in the sample, and the IC50 is determined. In other assays, acute infection can be monitored in the PM1 cell line or in primary cells (normal PBMC). In this assay, the level of viral activity can be monitored by determining the p24 concentrations using ELISA. See, for example, Martin et al. (2003) Nature Biotechnology 21:71-76.
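The text states that an IC50 is determined from the neutralization assay but does not say how. One common approach, shown here purely as an assumption, is to interpolate on a log-concentration scale between the two measured points that bracket 50% inhibition. The function name `ic50_by_interpolation` and the example titration values are hypothetical and are not taken from the source.

```python
import math


def ic50_by_interpolation(concentrations, percent_inhibition):
    """Estimate IC50 by log-linear interpolation between bracketing points.

    Assumption: inhibition rises with concentration; returns None when no
    adjacent pair of points brackets 50% inhibition.
    """
    pairs = sorted(zip(concentrations, percent_inhibition))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(pairs, pairs[1:]):
        if i_lo <= 50 <= i_hi and i_lo != i_hi:
            # Fraction of the way from the lower to the upper point.
            frac = (50 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    return None


# Hypothetical titration: antibody concentration (ug/ml) vs. % inhibition.
estimate = ic50_by_interpolation([0.01, 0.1, 1.0, 10.0], [10, 30, 70, 95])
print(estimate)
```

A fitted dose-response curve would normally be preferred when enough points are available; the interpolation above is only the simplest way to make the calculation concrete.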
  • a composition can decrease HIV infection by a desired amount, for example by at least 10%, at least 20%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or even at least
  • the cell is also contacted with an effective amount of an additional agent, such as an anti-viral agent.
  • the cell can be in vivo or in vitro.
  • the methods can include administration of one or more additional agents known in the art.
  • HIV replication can be reduced or inhibited by similar methods. HIV replication does not need to be completely eliminated for the composition to be effective.
  • a composition can decrease HIV replication by a desired amount, for example by at least 10%, at least 20%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or even at least 100% (elimination of detectable HIV), as compared to HIV replication in the absence of the composition.
  • the cell is also contacted with an effective amount of an additional agent, such as an antiviral agent.
  • the cell can be in vivo or in vitro.
  • compositions include one or more of the antibodies that specifically bind gp120, or functional fragments thereof, that are disclosed herein in a carrier.
  • the compositions can be prepared in unit dosage forms for administration to a subject. The amount and timing of administration are at the discretion of the treating physician to achieve the desired purposes.
  • the antibody can be formulated for systemic or local administration. In one example, the antibody that specifically binds gp120 is formulated for parenteral administration, such as intravenous administration.
  • compositions for administration can include a solution of the antibody that specifically binds gp120 dissolved in a pharmaceutically acceptable carrier, such as an aqueous carrier.
  • aqueous carriers can be used, for example, buffered saline and the like. These solutions are sterile and generally free of undesirable matter.
  • These compositions may be sterilized by conventional, well known sterilization techniques.
  • the compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like.
  • concentration of antibody in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight and the like in accordance with the particular mode of administration selected and the subject's needs.
  • a typical pharmaceutical composition for intravenous administration includes about 0.1 to 10 mg of antibody per subject per day. Dosages from 0.1 up to about 100 mg per subject per day may be used, particularly if the agent is administered to a secluded site and not into the circulatory or lymph system, such as into a body cavity or into a lumen of an organ. Actual methods for preparing administrable compositions will be known or apparent to those skilled in the art and are described in more detail in such publications as Remington's Pharmaceutical Science, 19th ed., Mack Publishing Company, Easton, PA (1995).
  • Antibodies may be provided in lyophilized form and rehydrated with sterile water before administration, although they are also provided in sterile solutions of known concentration. The antibody solution is then added to an infusion bag containing 0.9% sodium chloride, USP, and typically administered at a dosage of from 0.5 to 15 mg/kg of body weight.
  • an initial loading dose of 4 mg/kg may be infused over a period of some 90 minutes, followed by weekly maintenance doses for 4-8 weeks of 2 mg/kg infused over a 30 minute period if the previous dose was well tolerated.
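The weight-based regimen described above reduces to simple arithmetic, sketched below. The per-kilogram doses, infusion times, and the 4-8 week maintenance window come from the text; the function name `infusion_schedule`, the 70 kg example subject, and the assumption that the first maintenance dose falls one week after the loading dose are hypothetical choices for illustration.

```python
def infusion_schedule(weight_kg, maintenance_weeks=4,
                      loading_mg_per_kg=4.0, maintenance_mg_per_kg=2.0):
    """Return a list of planned infusions as dictionaries.

    Assumption: week 0 is the loading dose and maintenance doses follow
    at weeks 1..maintenance_weeks.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if not 4 <= maintenance_weeks <= 8:
        raise ValueError("the text describes 4-8 weeks of maintenance")
    # Loading dose infused over about 90 minutes.
    schedule = [{"week": 0,
                 "dose_mg": loading_mg_per_kg * weight_kg,
                 "infusion_min": 90}]
    # Weekly maintenance doses infused over 30 minutes.
    for week in range(1, maintenance_weeks + 1):
        schedule.append({"week": week,
                         "dose_mg": maintenance_mg_per_kg * weight_kg,
                         "infusion_min": 30})
    return schedule


for entry in infusion_schedule(70.0, maintenance_weeks=4):
    print(entry)
```

The sketch does not model the condition that maintenance dosing continues only if the previous dose was well tolerated; that decision stays with the clinician.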
  • a therapeutically effective amount of a human gp120-specific antibody will depend upon the severity of the disease and/or infection and the general state of the patient's health.
  • a therapeutically effective amount of the antibody is that which provides either subjective relief of a symptom(s) or an objectively identifiable improvement as noted by the clinician or other qualified observer.
  • compositions can be administered in conjunction with another therapeutic agent, either simultaneously or sequentially.
  • administration of the antibody results in a reduction in the establishment of HIV infection and/or a reduction in subsequent HIV disease progression in a subject.
  • a reduction in the establishment of HIV infection and/or a reduction in subsequent HIV disease progression encompass any statistically significant reduction in HIV activity.
  • methods are disclosed for treating a subject with an HIV-1 infection. These methods include administering to the subject a therapeutically effective amount of an antibody, or a nucleic acid encoding the antibody, thereby preventing or treating the HIV-1 infection. Studies have shown that the rate of HIV transmission from mother to infant is reduced significantly when zidovudine is administered to HIV-infected women during pregnancy and delivery and to the offspring after birth (Connor et al., 1994 Pediatr Infect Dis J 14: 536-541).
  • a therapeutically effective amount of a human gp120-specific antibody is administered in order to prevent transmission of HIV, or decrease the risk of transmission of HIV, from a mother to an infant.
  • a therapeutically effective amount of the antibody is administered to the mother and/or to the child at childbirth.
  • a therapeutically effective amount of the antibody is administered to the mother and/or infant prior to breast feeding in order to prevent viral transmission to the infant or decrease the risk of viral transmission to the infant.
  • both a therapeutically effective amount of the antibody and a therapeutically effective amount of another agent, such as zidovudine, are administered to the mother and/or infant.
  • the antibody can be combined with anti-retroviral therapy.
  • Antiretroviral drugs are broadly classified by the phase of the retrovirus life-cycle that the drug inhibits.
  • the disclosed antibodies can be administered in conjunction with nucleoside and nucleotide reverse transcriptase inhibitors (nRTI), non-nucleoside reverse transcriptase inhibitors (NNRTI), protease inhibitors, entry inhibitors (or fusion inhibitors), maturation inhibitors, or broad spectrum inhibitors, such as natural antivirals.
  • Exemplary agents include lopinavir, ritonavir, zidovudine, lamivudine, tenofovir, emtricitabine and efavirenz.
  • compositions including the antibodies disclosed herein are administered depending on the dosage and frequency as required and tolerated by the patient.
  • the composition should provide a sufficient quantity of at least one of the antibodies disclosed herein to effectively treat the patient.
  • the dosage can be administered once but may be applied periodically until either a therapeutic result is achieved or until side effects warrant discontinuation of therapy.
  • a dose of the antibody is infused for thirty minutes every other day.
  • about one to about ten doses can be administered, such as three or six doses administered every other day.
  • a continuous infusion is administered for about five to about ten days.
  • the subject can be treated at regular intervals, such as monthly, until a desired therapeutic result is achieved.
  • the dose is sufficient to treat or ameliorate symptoms or signs of disease without producing unacceptable toxicity to the patient.
  • Controlled-release parenteral formulations can be made as implants, oily injections, or as particulate systems.
  • Therapeutic Peptides and Proteins see, Banga, A.J., Therapeutic Peptides and Proteins: Formulation,
  • Particulate systems include microspheres, microparticles,
  • Microcapsules contain the therapeutic protein, such as a cytotoxin or a drug, as a central core. In microspheres the therapeutic is dispersed throughout the particle. Particles, microspheres, and microcapsules smaller than about 1 µm are generally referred to as nanoparticles, nanospheres, and nanocapsules, respectively.
  • Capillaries have a diameter of approximately 5 µm so that only nanoparticles are administered intravenously. Microparticles are typically around 100 µm in diameter and are administered subcutaneously or intramuscularly. See, for example, Kreuter, J., Colloidal Drug Delivery Systems, J.
  • Polymers can be used for ion-controlled release of the antibody compositions disclosed herein.
  • Various degradable and nondegradable polymeric matrices for use in controlled drug delivery are known in the art (Langer, Accounts Chem. Res.
  • the block copolymer, poloxamer 407, exists as a viscous yet mobile liquid at low temperatures but forms a semisolid gel at body temperature. It has been shown to be an effective vehicle for formulation and sustained delivery of recombinant interleukin-2 and urease (Johnston et al., Pharm. Res. 9:425-434, 1992; and Pec et al., J. Parent. Sci. Tech. 44(2):58-65, 1990).
  • hydroxyapatite has been used as a microcarrier for controlled release of proteins (Ijntema et al., Int. J. Pharm. 112:215-224, 1994).
  • liposomes are used for controlled release as well as drug targeting of the lipid-capsulated drug (Betageri et al., Liposome Drug Delivery Systems, Technomic Publishing Co., Inc., Lancaster, PA (1993)). Numerous additional systems for controlled delivery of therapeutic proteins are known (see U.S. Patent No.
  • a method is provided herein for the detection of the expression of gp120 in vitro or in vivo.
  • expression of gp120 is detected in a biological sample, and can be used to detect HIV-1 infection.
  • the sample can be any sample, including, but not limited to, tissue from biopsies, autopsies and pathology specimens.
  • Biological samples also include sections of tissues, for example, frozen sections taken for histological purposes.
  • Biological samples further include body fluids, such as blood, serum, plasma, sputum, spinal fluid or urine.
  • a method for detecting AIDS and/or an
  • the disclosure provides a method for detecting HIV-1 in a biological sample, wherein the method includes contacting a biological sample with the antibody under conditions conducive to the formation of an immune complex, and detecting the immune complex, to detect the gp120 in the biological sample.
  • the detection of gp120 in the sample indicates that the subject has an HIV infection.
  • the detection of gp120 in the sample indicates that the subject has AIDS.
  • detection of gp120 in the sample confirms a diagnosis of AIDS and/or an HIV-1 infection in a subject.
  • the disclosed antibodies are used to test vaccines, for example to test if a vaccine composition assumes the same conformation as a gp120 peptide.
  • a method for testing a vaccine wherein the method includes contacting a sample containing the vaccine, such as a gp120 immunogen, with the antibody under conditions conducive to the formation of an immune complex, and detecting the immune complex, to detect the vaccine in the sample.
  • the detection of the immune complex in the sample indicates that the vaccine component, such as a gp120 immunogen, assumes a conformation capable of binding the antibody.
  • the antibody is directly labeled with a detectable label.
  • the antibody that binds gp120 (the first antibody) is unlabeled and a second antibody or other molecule that can bind the antibody that binds gp120 is utilized.
  • a second antibody is chosen that is able to specifically bind the specific species and class of the first antibody.
  • the first antibody is a human IgG
  • the secondary antibody may be an anti-human-IgG.
  • Other molecules that can bind to antibodies include, without limitation, Protein A and Protein G, both of which are available commercially.
  • Suitable labels for the antibody or secondary antibody include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, magnetic agents and radioactive materials.
  • suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase.
  • suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin.
  • suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin.
  • a non-limiting exemplary luminescent material is luminol; a non-limiting exemplary magnetic agent is gadolinium, and non-limiting exemplary radioactive labels include ¹²⁵I, ¹³¹I, ³⁵S or ³H.
  • Kits for detecting a polypeptide will typically comprise an antibody that binds gp120, such as any of the antibodies disclosed herein.
  • an antibody fragment such as an Fv fragment or a Fab is included in the kit.
  • the antibody is labeled (for example, with a fluorescent, radioactive, or an enzymatic label).
  • the kits include instructional materials disclosing means of use.
  • the instructional materials may be written, in an electronic form (such as a computer diskette or compact disk) or may be visual (such as video files).
  • the kits may also include additional components to facilitate the particular application for which the kit is designed.
  • the kit may additionally contain means of detecting a label (such as enzyme substrates for enzymatic labels, filter sets to detect fluorescent labels, appropriate secondary labels such as a secondary antibody, or the like).
  • the kits may additionally include buffers and other reagents routinely used for the practice of a particular method. Such kits and appropriate contents are well known to those of skill in the art.
  • the diagnostic kit comprises an immunoassay.
  • the method of detecting gp120 in a biological sample generally includes the steps of contacting the biological sample with an antibody which specifically reacts, under immunologically reactive conditions, to gp120.
  • the antibody is allowed to specifically bind under immunologically reactive conditions to form an immune complex, and the presence of the immune complex (bound antibody) is detected directly or indirectly.
  • HIV-1 exhibits extraordinary genetic diversity and has evolved multiple mechanisms of resistance to evade the humoral immune response. Despite these obstacles, 10-25% of HIV-1-infected individuals develop cross-reactive neutralizing antibodies after several years of infection. Elicitation of such antibodies could form the basis for an effective HIV-1 vaccine, and intense effort has focused on identifying responsible antibodies and delineating their characteristics.
  • mAbs monoclonal antibodies
  • Some broadly neutralizing antibodies are directed against the membrane-proximal external region of gp41, but the majority recognize gp120.
  • Antibodies typically accumulate 5-10% changes in variable domain-amino acid sequence during the affinity maturation process, but for these gp120-reactive antibodies, the degree of somatic mutation is markedly increased, ranging from ~15-20% for the quaternary structure-preferring antibodies and antibody HJ16, to ~25% for antibody 2G12 and to ~30% for the CD4-binding-site antibodies, VRC01, VRC02, and VRC03.
  • the mature antibody accumulates almost 70 total changes in amino acid sequence during the maturation process.
  • the mature VRC01 can neutralize ~90% of HIV-1 isolates at a geometric mean IC50 of 0.3 µg/ml, and structural studies show that it achieves this neutralization by precisely recognizing the initial site of CD4 attachment on HIV-1 gp120.
  • the predicted unmutated germline ancestor of VRC01 has weak affinity for typical strains of gp120 (~mM).
  • donor 45 it has been unclear whether the VRC01 mode of recognition, genetic origin, and pathway of affinity maturation represent general features of the B-cell response to HIV-1.
  • VRC01-like antibodies were isolated from two additional HIV-1-infected donors, their liganded-crystal structures with gp120 were determined, cross-donor complementation of heavy and light chain function was examined, and deep sequencing was used to analyze the repertoire, lineage, and maturation pathways of related antibody sequences in each of two donors. The analysis presented here focuses primarily on the heavy chain, although some analysis of the light chain is also undertaken. Definition of the structural convergence and maturation pathways by which VRC01-like antibodies achieve broad neutralization of HIV-1 provides a framework for understanding the development of these antibodies and for efforts to guide their induction.
  • PBMCs Peripheral blood mononuclear cells
  • protocol G donor 74 infected with A/D recombinant HIV-1
  • CHAVI donor 0219 infected with clade A HIV-1
  • PBMCs were incubated with both RSC3 and ΔRSC3, each conjugated to a different fluorochrome, and flow cytometric analysis was used to identify and to sort individual IgG+ B cells reactive with RSC3 and not ΔRSC3.
  • from donors 74 and 0219, respectively, a total of 0.13% and 0.15% of IgG+ B cells were identified (FIGS. 1B and 8).
  • the heavy and light chain immunoglobulin genes from individual B-cells were amplified and cloned into IgG1 expression vectors that reconstituted the full IgG. From donor 74, two somatically related antibodies named VRC-PG04 and VRC-PG04b demonstrated strong binding to several versions of gp120 and to RSC3 but ⁇ 100-fold less binding to ΔRSC3 (FIG. 9 and FIG. 24).
  • From donor 0219, three somatically related antibodies named VRC-CH30, 31, and 32 displayed a similar pattern of RSC3/ΔRSC3 reactivity (FIG. 9 and FIG. 24).
  • Sequence analysis of these two sets of unique antibodies revealed that they originated from the same inferred immunoglobulin heavy chain variable (IGHV) precursor gene allele IGHV1-2*02. Despite this similarity in heavy chain V-gene origin, the two unique antibody clones originated from different heavy chain J segment genes and contained different light chains.
  • VRC-PG04 and 04b somatic variants originated from an IGKV3 allele while the VRC-CH30, 31 and 32 somatic variants derived from an IGKV1 allele.
  • all five antibodies contained unusually high mutation frequencies: VRC-PG04 and 04b displayed a VH gene mutation frequency of 30% relative to the germline IGHV1-2*02 allele, a level of affinity maturation similar to that previously observed with VRC01-03; the VRC-CH30, 31 and 32 antibodies were also highly affinity matured, with VH mutation frequency of 23-24%.
  • ELISAs were performed with a panel of well-characterized mAbs. Binding by each of the new antibodies was competed by VRC01-03, by other CD4-binding-site antibodies and by CD4-Ig, but not by antibodies known to bind gp120 at other sites (FIGs. 1D and 10). Despite similarities in gp120 reactivity and VH-genomic origin, sequence similarities of heavy and light chain gene regions did not readily account for their common mode of gp120 recognition (FIG. 1E). Finally, assessment of VRC-PG04 and VRC-CH31 neutralization on a panel of Env-pseudoviruses revealed their ability to potently neutralize a majority of diverse HIV-1 isolates (FIG. 1F and FIGs. 25-28).
  • VRC03 and VRC-PG04 share only 51% heavy chain- variable protein sequence identity, and the heavy chain of VRC03 contains an unusual insertion in the framework 3 region.
  • Diffraction data to 1.9 A resolution were collected from orthorhombic crystals, and the structure solved by molecular replacement and refined to a crystallographic R-value of 18.7% (FIG. 2, FIG. 29, FIG. 32 and FIG. 33).
  • VRC03 also showed recognition of gpl20 that was strikingly similar to that of VRC-PG04 and VRCOl, with pairwise rmsds in Ca-atoms of 2.4 A and 1.9 A.
  • CDR H2 and CDR L3 regions showed similar recognition (pairwise Ca- rmds ranged from 0.5 - 1.4 A) (FIG. 11).
  • the light chains for donors 45 and 74 antibodies arise from either IGVK3-11*01 or IGVK3-20*01, while the light chains of donor 0219 antibodies are derived from IGVK1-33*01. For these light chains, no maturational changes are identical. Despite this diversity in maturation, comparison of the
  • VRCOl, VRC03, and VRC-PG04 paratopes shows that many of these changes are of conserved chemical character (FIG. 3C); a hydrophobic patch in the CDR L3, for example, is preserved.
  • variable regions of heavy and light chains are roughly 400 nucleotides in length
  • 454 pyrosequencing methods, which allow read lengths of 500 nucleotides, were used for deep sequencing.
  • First, heavy chain sequences were assessed from a 2008 PBMC sample from donor 45, the same time point from which antibodies VRC01, VRC02, and VRC03 were isolated by RSC3-probing of the memory B-cell population.
  • mRNA from 5 million PBMC was used as the template for PCR to preferentially amplify the IgG and IgM genes from the IGHV1 family.
  • 454 pyrosequencing provided 221,104 sequences of which 33,386 encoded heavy chain variable domains that encompassed the entire V(D)J region.
  • characteristics particular to the heavy chains of VRC01 and VRC03 were chosen as filters: (i) sequence identity, (ii) IGHV gene allele origin, and (iii) sequence divergence from the germline IGHV-gene as a result of affinity maturation (FIG. 4B). Specifically, sequences were divided into IGHV1-2*02 allelic origin (4597 sequences) and non-IGHV1-2*02 origin (28,789 sequences), and then analyzed for divergence from the inferred germline genes and for sequence identity to the template antibodies VRC01 and VRC03 (FIG. 4B).
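The three filters named above (allele origin, germline divergence, identity to a template antibody) can be sketched as a simple sieve. This is a minimal illustration only: the function names, the toy sequences, and the thresholds `min_div` and `min_ident` are assumptions for the example and are not taken from the disclosure, and the reads are assumed to be already aligned to equal length.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


def sieve(reads, germline, template, allele="IGHV1-2*02",
          min_div=20.0, min_ident=75.0):
    """Keep reads of the given allele origin that are both divergent from
    germline and similar to the template. Thresholds are hypothetical."""
    hits = []
    for name, v_allele, seq in reads:
        if v_allele != allele:
            continue  # filter (ii): allele origin
        divergence = 100.0 - percent_identity(seq, germline)  # filter (iii)
        identity = percent_identity(seq, template)            # filter (i)
        if divergence >= min_div and identity >= min_ident:
            hits.append((name, round(divergence, 1), round(identity, 1)))
    return hits


# Toy data, invented for illustration only.
germline = "AAAAAAAAAA"
template = "AAAATTTTTT"
reads = [
    ("r1", "IGHV1-2*02", "AAAATTTTTA"),
    ("r2", "IGHV1-2*02", "AAAAAAAAAT"),
    ("r3", "IGHV1-3*01", "AAAATTTTTT"),
]
hits = sieve(reads, germline, template)
```

Plotting divergence against identity for every read, rather than thresholding, would give the identity/divergence grid referred to in FIG. 4B.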
  • a donor 45 2001 time point was chosen to maximize the likelihood of obtaining light chain sequences capable of functional complementation.
  • a total of 305,475 sequences were determined of which 87,658 sequences encompassed the V-J region of the light chain.
  • biologically specific characteristics were chosen: a distinctive 2-amino acid deletion in the first complementarity-determining region and high affinity maturation (17% and 19% for VRC01 and VRC-PG04, respectively).
  • Their biological function was assessed after synthesis in combination with the VRC01, VRC03, and VRC-PG04 heavy chains (FIG. 39).
  • both chimeric antibodies displayed neutralization similar to the wild type antibody (FIG. 4D and FIG. 39).
  • the donor 74-derived IGHV1-2*02 heavy chain sequences were also assessed by including probe-identified VRC01-like antibodies from donor 45 and donor 0219 in the phylogenetic analysis.
  • 5047 sequences segregated within the donor 45 and 0219-identified subtree (FIG. 5A, right). This subtree included the actual VRC-PG04 and 04b heavy chain sequences, 4693 sequences of >85% identity to VRC-PG04, and several hundred sequences with identities as low as 68% to VRC-PG04.
  • This cross-donor segregation method was also applied to the light chain antibodyome of donor 45.
  • the light chains from donor 74 and 0219 did not segregate with known VRC01-like light chains from donor 45 (FIG. 18), likely because these three light chains do not arise from the same inferred germline sequences.
  • This difference may also reflect the dissimilarities in focused maturation of the two chains (see FIG. 3A): in the heavy chain, focused maturation occurs in the CDR H2 region (encompassed solely within the 2*02 VH gene from which all VRC01-like heavy chains derive) and, in the light chain, selection pressures occur in the CDR L3 region (which is a product of different types of V-J recombination).
  • Three of these classes (CDR H3-1, 2, and 9) were represented only by non-neutralizing antibodies, three by a single neutralizing antibody (CDR H3-4, 5 and 6), and three by a mixture of neutralizing and non-neutralizing antibodies (CDR H3-3, 7 and 8).
  • Donor 74 IGHV1-2*02 heavy chain sequences were further analyzed to identify those with CDR H3 sequences identical to the CDR H3s in each of the neutralizing classes (FIG. 7).
  • This analysis identified four clonal lineages (CDR H3 classes 3, 6, 7 and 8), with sequences that extended to 15% or less affinity maturation.
  • CDR H3 class 7 included the probe-identified antibodies, VRC-PG04 and 04b. In each case, a steady accumulation of changes led to increased neutralization activity, and changes at positions 48, 52, 58, 69, 74, 82 and 94 in the V gene, among others, appeared to be selected in several lineages (FIG. 7).
  • Antibody genomics, HIV-1 immunity, and vaccine implications. Affinity maturation that focuses a developing antibody onto a conserved site of HIV-1 vulnerability provides a mechanism to achieve broad recognition of HIV-1 gp120.
  • Such focused evolution may be common to broadly neutralizing antibodies that succeed in overcoming the immune evasion that protects HIV-1 gp120 from humoral recognition; the multiple layers of evasion may constrain or focus the development of nascent antibodies to particular pathways during maturation.
  • the structure-based genomics approach described here provides tools for understanding antibody maturation. It is disclosed herein how deep sequencing can be utilized to determine the repertoire of sequences that compose the light chain and heavy chain antibodyomes in HIV-1 infected individuals. These antibodyomes can then be interrogated for unusual properties in sequence, or in maturation, to identify antibodies for functional characterization. Three means of sieving a large database of antibody sequences are disclosed herein: 1) by identity to a known mAb sequence and by divergence from putative germline (identity/divergence-grid analysis), 2) by cross-donor phylogenetic analysis of maturation pathway relationships, and 3) by CDR H3-lineage analysis.
  • The deep sequencing and structural bioinformatics methodologies presented here facilitate analysis of the human antibodyome (FIG. S16).
  • This genomics technology allows interrogation of the antibody responses from infected donors, uninfected individuals or even vaccine recipients and has several implications.
  • a genomic rooted phylogenetic analysis of the VRC01 antibodyome may reveal a general maturation pathway for the production of VRC01-like antibodies.
  • cross-donor phylogenetic analysis (FIG. 5B) suggests that common maturation intermediates with 20-30 affinity maturation changes from the IGHV1-2*02 genomic precursor are found in different individuals. These intermediates give rise to mature, broadly neutralizing VRC01-like antibodies, which have about 70-90 changes from the IGHV1-2*02 precursor (FIG. 5).
  • PBMCs peripheral blood mononuclear cells
  • CHVI HIV/AIDS vaccine immunology
  • Donor 45, from whom monoclonal antibodies (mAbs) VRC01, VRC02 and VRC03 were isolated, was infected with an HIV-1 clade B virus.
  • Donor 0219, from whom mAbs VRC-CH30, VRC-CH31 and VRC-CH32 were isolated, was infected with a clade A virus. These three donors were chronically infected and had not initiated antiretroviral treatment at the time of PBMC sampling.
  • Monomeric gp120s, gp120 with the CD4-binding site knockout mutation D368R, gp120 cores, RSC3 and ΔRSC3 were expressed by transient transfection of 293F cells. Briefly, genes encoding the proteins of interest were each synthesized with a C-terminal His tag (GeneArt, Regensburg, Germany), and cloned into a mammalian CMV/R expression vector. Proteins were produced by transient transfection using 293fectin (Invitrogen, Carlsbad, CA) in 293F cells (Invitrogen) maintained in serum-free free-style medium (Invitrogen).
  • Culture supernatants were harvested 5 - 6 days after transfection, filtered through a 0.45 µm filter, and concentrated with buffer exchange into 500 mM NaCl, 50 mM Tris (pH 8.0). Proteins were purified by Co-NTA (cobalt-nitrilotriacetic acid) chromatography using a HiTrap IMAC HP column (GE Healthcare, Piscataway, NJ). The peak fractions were collected, and further purified by gel-filtration using a HiLoad 16/60 Superdex 200 pg column (GE Healthcare). The fractions containing monomers of each protein were combined, concentrated and flash frozen at -80°C.
  • FACS fluorescence activated cell sorting
  • RSC proteins was confirmed by ELISA. The proteins were then conjugated with the streptavidin fluorochrome reagents, streptavidin-allophycocyanin (SA-APC) (Invitrogen) for RSC3 and streptavidin-phycoerythrin (SA-PE) (Sigma) for ΔRSC3.
  • the lysis buffer contained 0.5 µl of RNase Out (Invitrogen), 5 µl of 5x first strand buffer (Invitrogen), 1.25 µl of 0.1M DTT (Invitrogen) and 0.0625 µl of Igepal (Sigma).
  • the PCR plates with sorted cells were stored at -80°C.
  • the total content of the donor PBMC sample passing through the sorter was saved in FCS files for further analysis with FlowJo software (TreeStar, Cupertino, CA).
  • the cDNA plates were stored at -20°C, and the IgH, Igκ and Igλ variable region genes were amplified independently by nested PCR starting from 5 µl of cDNA as template. All PCRs were performed in 96-well PCR plates in a total volume of 50 µl containing water, 5 µl of 10x buffer, 1 µl of dNTP mix, each at 10 mM, 1 µl of MgCl2 at 25 mM (Qiagen, Valencia, CA) for 1st round PCR or 10 µl 5x Q-Solution (Qiagen) for 2nd round PCR, 1 µl of primer or primer mix for each direction at 25 µM, and 0.4 µl of HotStar Taq DNA polymerase (Qiagen).
  • PCR was initiated at 94°C for 5 min, followed by 50 cycles of 94°C for 30 sec, 58°C for IgH and Igκ or 60°C for Igλ for 30 sec, and 72°C for 1 min, followed by 72°C for 10 min.
  • the positive 2nd round PCR products were cherry-picked for direct sequencing with both forward and reverse PCR primers.
  • PCR products that gave a productive IgH, Igκ or Igλ rearranged sequence were reamplified from the 1st round PCR using custom primers containing unique restriction digest sites and subsequently cloned into the corresponding Igγ1, Igκ and Igλ expression vectors.
  • the full-length IgG1 was expressed by co-transfection of 293F cells with equal amounts of the paired heavy and light chain plasmids, and purified using a recombinant protein-A column (GE Healthcare).
  • IgG gene family analysis. The IgG heavy and light chain nucleotide sequences of the variable region were analyzed with JoinSolver® (which can be found on the world wide web at joinsolver.niaid.nih.gov) and IMGT/V-QUEST (which can be found on the world wide web at
  • VRC mAb VK gene use was determined by homology to germline genes in the major 2p11.2 IGK locus.
  • the VRC mAb D gene use was determined by homology to genes in the major 14q32.33 IGH locus.
  • a combination of consecutive matching length with a +1/-2.02 scoring algorithm in the context of the V to J distance was applied for determining IGHD alignments and VD and DJ junctions in mutated sequences. Immunoglobulin rearrangements were grouped into classes based upon the VDJ gene use, similarity of replacement and silent mutations and the CDR3 identity.
  • serial dilutions of the competitor antibodies or CD4-Ig were added to the captured gp120 or RSC3 in 50 µl of B3T buffer, followed by adding 50 µl of biotin-labeled antibody or CD4-Ig at fixed concentrations: 200 ng/ml of VRC-PG04 and 500 ng/ml of VRC-CH31 to bind to YU2 gp120 or RSC3, 150 ng/ml of CD4-Ig and 80 ng/ml of 17b to bind to YU2 gp120.
  • the plates were incubated at 37°C for 1 hour, followed by incubation with 250 ng/ml of streptavidin-HRP (Sigma) at room temperature for 30 min, and developed with TMB as described above.
  • HIV-1 neutralization and protein competition assays were measured using single-round-of-infection HIV-1 Env-pseudoviruses and TZM-bl target cells. Neutralization curves were fit by nonlinear regression using a 5-parameter Hill slope equation. The 50% and 80% inhibitory concentrations (IC50 and IC80) were reported as the antibody concentrations required to inhibit infection by 50% and 80% respectively. Competition of serum or mAb neutralization was assessed by adding a fixed concentration (25 µg/ml) of the RSC3 or ΔRSC3 glycoprotein to serial dilutions of antibody for 15 min prior to the addition of virus. The resulting IC50 values were compared to the control with mock PBS added. The neutralization blocking effect of the proteins was calculated as the percent reduction in the ID50 (50% inhibitory dilution) value of the serum in the presence of protein compared to PBS.
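The blocking effect described in the last sentence is a plain percent reduction relative to the PBS control. A minimal sketch follows; the function name and the example ID50 values are invented for illustration and are not data from the study.

```python
def percent_reduction(id50_pbs, id50_with_protein):
    """Percent reduction in serum ID50 when the competitor protein is present,
    relative to the mock (PBS) control."""
    if id50_pbs <= 0:
        raise ValueError("control ID50 must be positive")
    return 100.0 * (id50_pbs - id50_with_protein) / id50_pbs


# Hypothetical values: ID50 of 1000 with PBS, 250 with the competitor protein.
blocking = percent_reduction(1000.0, 250.0)
```

A value near 0 means the protein did not compete for the neutralizing activity; a value near 100 means nearly all of the measured neutralization was blocked.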
  • Construction of the HIV-1 envelope sequence phylogenetic trees. HIV-1 gp160 protein sequences of the 180 isolates used in the neutralization assays were aligned using MUSCLE, for multiple sequence comparison by log-expectation. The protein distance matrix was calculated by "protdist" and the dendrogram was constructed using the neighbor-joining method by "Neighbor". All analysis and the programs used were performed at the NIAID Biocluster (which can be found on the world wide web at niaid-biocluster.niaid.nih.gov/). The tree was displayed with Dendroscope.
  • the same HIV-1 clade A/E 93TH057 ΔV123 gp120 that crystallized with VRC01 was used to form complexes with antibodies VRC03 and VRC-PG04 for crystallization trials.
  • the gp120 was expressed, purified and deglycosylated.
  • the antigen-binding fragments (Fabs) of VRC-PG04 and VRC03 were generated by Lys-C (Roche) digestion of IgG1.
  • the gp120:VRC-PG04 or gp120:VRC03 complexes were formed by mixing deglycosylated 93TH057 gp120 and antibody Fabs (1:1.2 molar ratio) at room temperature and purified by size exclusion chromatography (HiLoad 26/60 Superdex S200 prep grade, GE Healthcare) with buffer containing 0.35 M NaCl, 2.5 mM Tris pH 7.0, 0.02% NaN3. Fractions with gp120:antibody complexes were concentrated to ~10 mg/ml, flash frozen with liquid nitrogen before storing at -80°C and used for crystallization screening experiments. Three commercially available screens, Hampton Crystal Screen (Hampton
  • gp120:antibody complexes.
  • Vapor-diffusion sitting drops were set up robotically by mixing 0.1 µl of protein with an equal volume of precipitant solutions (Honeybee, DigiLab). Droplets were allowed to equilibrate at 20°C and imaged at scheduled times with RockImager (Formulatrix). Robotic crystal hits were optimized manually using the hanging drop vapor-diffusion method. Crystals of diffraction quality for the gp120:VRC03 complex were obtained at 9% PEG 4000, 200 mM Li2SO4, 100 mM Tris/Cl-, pH 8.5. For the gp120:VRC-PG04 complex, best crystals were grown in 9.9% PEG 4000, 9.0% isopropanol, 100 mM Li2SO4, 100 mM HEPES, pH 7.5.
  • X-ray data collection, structure determination and refinement for the gp120:VRC-PG04 and gp120:VRC03 complexes. Diffraction data of the gp120:VRC03 and gp120:VRC-PG04 crystals were collected under cryogenic conditions. Best cryo-protectant conditions were obtained by screening several commonly used cryo-protectants.
  • X-ray diffraction data were collected at beam-line ID-22 (SER-CAT) at the Advanced Photon Source, Argonne National Laboratory, with 1.0000 Å radiation, processed and reduced with HKL2000.
  • for gp120:VRC-PG04 crystals, a 2.0 Å data set was collected using a cryoprotectant solution containing 18.0% PEG 4000, 10.0% isopropanol, 100 mM Li2SO4, 100 mM HEPES, pH 7.5, 12.5% glycerol and 7.5% 2R,3R-butanediol.
  • a 1.9 Å data set was collected using a cryoprotectant solution containing 15% PEG 4000, 200 mM Li2SO4, 100 mM Tris/Cl-, pH 8.5 and 30% ethylene glycol.
  • the crystal structures of gp120:VRC-PG04 and gp120:VRC03 complexes were solved by molecular replacement using Phaser in the CCP4
  • the structure of 93TH057 gp120 in the previously solved VRC01 complex (PDB ID 3NGB) was used as an initial model to place gp120 in the complexes.
  • Residues Table was also computed over the three structures. Residues with positive average PISA ΔiG were deemed to participate in hydrophobic interactions and were included in the correlation analysis against the respective per-residue Cα deviations.
  • the CD4-defined initial site of vulnerability included the following gpl20 residues: 257, 279, 280, 281, 282, 283, 365, 366, 367, 368, 370, 371, 455, 456, 457, 458, 459, 460, 469, 472, 473, 474, 475, 476, 477.
  • the interface surface areas on gp120 were determined using the PISA server. In each case, the interface surface area corresponding to the residues from the initial site of vulnerability was termed 'Inside', while the remaining interface surface area was termed 'Outside'. Targeting precision was defined as the function 'Inside - Outside'.
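The 'Inside - Outside' definition can be written out directly once per-residue interface areas are available. The sketch below assumes the areas have already been exported as a mapping from gp120 residue number to buried surface area; that input format, the function name, and the example areas are assumptions for illustration. The residue set is the one listed above for the CD4-defined initial site of vulnerability.

```python
# gp120 residues of the CD4-defined initial site of vulnerability (from the text).
SITE_OF_VULNERABILITY = frozenset([
    257, 279, 280, 281, 282, 283, 365, 366, 367, 368, 370, 371,
    455, 456, 457, 458, 459, 460, 469, 472, 473, 474, 475, 476, 477,
])


def targeting_precision(area_by_residue):
    """Return Inside - Outside for a {residue number: interface area} mapping."""
    inside = sum(area for res, area in area_by_residue.items()
                 if res in SITE_OF_VULNERABILITY)
    outside = sum(area for res, area in area_by_residue.items()
                  if res not in SITE_OF_VULNERABILITY)
    return inside - outside


# Made-up areas: two site residues and one residue outside the site.
example = {257: 30.0, 368: 50.0, 100: 20.0}
precision = targeting_precision(example)
```

A larger value indicates that more of the antibody footprint falls on the site of vulnerability than off it.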
  • the neutralization breadth of CD4-Ig and the different antibodies was determined using IC80 values for Tier 2 viruses, as obtained from: (VRC01, VRC03, b12, and CD4-Ig), (b13 and F105), and the present study (VRC-PG04).
  • the cDNAs from each sample were combined, cleaned up and eluted in 20 µl of elution buffer (NucleoSpin Extract II kit, Clontech). Therefore, 1 µl of the cDNA was equivalent to transcripts from 1 million PBMC.
  • the immunoglobulin gene-specific PCRs were set up using 5 µl of the cDNA as template (equivalent to transcripts from 5 million PBMC), using the Platinum Taq DNA Polymerase High Fidelity system (Invitrogen) in a total volume of 50 µl.
  • the reaction mix was composed of water, 5 µl of 10x buffer, 2 µl of dNTP mix, each at 10 mM, 2 µl of MgSO4, 1 µl of each primer at 25 µM, and 1 µl of Platinum Taq DNA polymerase high fidelity.
  • the forward primers for VH1 gene amplification were 5'L-VH1, 5'ACAGGTGCCCACTCCCAGGTGCAG3' (SEQ ID NO: 2495); 5'L-VH1#2, 5'GCAGCCACAGGTGCCCACTCC3' (SEQ ID NO: 2496); 5'L-VH1-24,
  • PCR products were quantified using Qubit (Life Technologies, Carlsbad, CA). Following end repair 454 adapters were added by ligation. Library concentrations were determined using the KAPA Biosystems qPCR system (Woburn, MA) with 454 standards provided in the KAPA system.
  • 454 pyrosequencing of the PCR products was performed on a GS FLX sequencing instrument (Roche-454 Life Sciences, Bradford, CT) using the manufacturer's suggested methods and reagents. Initial image collection was performed on the GS FLX instrument and subsequent signal processing, quality filtering, and generation of nucleotide sequence and quality scores were performed on an off-instrument Linux cluster using 454 application software (version 2.5.3). The amplicon quality filtering parameters were adjusted based on the manufacturer's recommendations (Roche-454 Life Sciences
  • Quality scores were assigned to each nucleotide using methodologies incorporated into the 454 application software to convert flowgram intensity values to Phred-based quality scores. The quality of each run was assessed by analysis of internal control sequences included in the 454 sequencing reagents. Reports were generated for each region of the PicoTiterPlate (PTP) for both the internal controls and the samples.
  • A general bioinformatics pipeline has been developed to process and analyze 454 pyrosequencing-determined antibodyomes. The information generated in each step of the process (see Appendices 1-4) was used to characterize the basic features of antibodyomes as well as to identify potential neutralizing antibody sequences for functional validation. Specifically, each sequence read was (1) reformatted and labeled with a unique index number; (2) assigned to variable (V) gene family and allele using an in-house implementation of IgBLAST
  • VRC01-like antibodies (6) filtered using characteristic sequence motifs in variable domain sequence such as QVQ (or other possible triplets) at the N-terminus, CAR (or other possible triplets) at the end of V region, WGXG at the end of CDR H3, and
  • VSS (or other possible triplets) at the C-terminus of variable domain.
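The motif filter in step (6) can be approximated with a single regular expression over the translated variable domain. This sketch checks only the four triplets named in the text; the "other possible triplets" are not enumerated in this section, so they are omitted here, and the function name and test sequences are invented for illustration.

```python
import re

# QVQ at the N-terminus, CAR later (end of V region), then WGXG (end of CDR H3),
# and VSS at the C-terminus. 'X' in WGXG is any residue, written as '.' below.
MOTIF_PATTERN = re.compile(r"^QVQ.*CAR.*WG.G.*VSS$")


def passes_motif_filter(protein):
    """True if the amino acid sequence carries all four motifs in order."""
    return MOTIF_PATTERN.match(protein) is not None


# Made-up sequences, not real antibody reads.
good = "QVQLVQSGAEVKCARDYWGQGTLVTVSS"
bad_start = "EVQLVQSGAEVKCARDYWGQGTLVTVSS"
no_wgxg = "QVQLVQSGAEVKCARDYTLVTVSS"
results = [passes_motif_filter(s) for s in (good, bad_start, no_wgxg)]
```

Allowing alternative triplets would amount to replacing each literal with a character-class or alternation, for example `(?:QVQ|EVQ)`, if such a variant were known to be acceptable.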
  • structural compatibility of a 454-pyrosequencing-derived heavy- or light-chain sequence with known VRC01-like antibody/gp120 complex structures can be evaluated by threading.
  • phylogenetic analysis was performed with maximum likelihood phylogenetic algorithms on a set of "representative" nucleotide sequences encompassing the heavy-chain variable domain from donors 45 and 74. Sequence selection involved dividing all of the IGHV1-2*02-originated sequences from each donor into 50 bins based on the sequence divergence from germline. The number 50 was arbitrary but sufficient to provide a set of sequences with different divergence suitable for building the phylogenetic trees shown in Fig. 5A. The first bin contained sequences with 0.0 to 0.7% divergence from the IGHV1-2*02 germline gene and each subsequent bin contained sequences with an increment of 0.7% germline divergence.
  • the 50 bins covered sequences with up to 35% germline divergence, which is approximately the highest germline divergence seen in these donors.
  • One sequence was randomly selected from each bin to represent sequences within that bin.
  • a total of 38 sequences were selected from donor 45, as 12 bins did not contain any sequences, and 50 from donor 74.
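The binning and one-per-bin sampling described above can be sketched as follows. The bin count and width come from the text; the function name, the fixed random seed, and the decision to skip reads that fall beyond the last bin are assumptions made for this illustration.

```python
import random


def pick_representatives(divergence_by_read, n_bins=50, bin_width=0.7, seed=0):
    """Group reads into divergence bins and draw one read at random per
    non-empty bin. Returns {bin index: read name}."""
    bins = {}
    for name, divergence in divergence_by_read.items():
        index = int(divergence // bin_width)
        if 0 <= index < n_bins:  # assumption: out-of-range reads are skipped
            bins.setdefault(index, []).append(name)
    rng = random.Random(seed)
    return {index: rng.choice(sorted(names))
            for index, names in sorted(bins.items())}


# Made-up percent divergences from germline.
toy = {"a": 0.1, "b": 0.5, "c": 1.0, "d": 34.9}
representatives = pick_representatives(toy)
```

Empty bins simply yield no representative, which is how a donor can contribute fewer than 50 sequences.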
  • nucleotide sequences of heavy-chain variable domains of known neutralizing mAbs VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32 and the inferred reverted unmutated ancestors of VRC01 (VRC01 H germline) and VRC-PG04 (VRC-PG04 H germline) were added to each of the two data sets.
  • the sequences within each set were aligned using ClustalW, and the multiple sequence alignment was provided as input to construct phylogenetic trees using DNAMLK (for DNA Maximum Likelihood program with Molecular Clock)
  • the IGHV1-2*02 germline sequence was used as template in the manual correction.
  • the DNAMLK program was also applied to 63 heavy-chain nucleotide sequences selected for expression from donor 74.
  • the maximum likelihood phylogenetic analysis was also carried out on the amino acid sequences using PROMLK (for Protein Maximum Likelihood program with Molecular Clock) (evolution.genetics.washington.edu/phylip/doc/promlk.html) with default parameters.
  • the topology of the maximum likelihood trees generated by DNAMLK and PROMLK appeared to be similar.
  • sequences clustered in a branch containing neutralizing mAbs VRC01, VRC02, VRC03 and VRC-PG04 were extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis. The procedure was repeated until convergence, where all the sequences resided within a subtree containing VRC01, VRC02, VRC03 and VRC-PG04 and no other sequences resided between this subtree and the root, and where further repeat of the analysis did not change the NJ tree.
  • 109 sequences from donor 45 and 5,047 sequences from donor 74 remained. Among them, 45 from donor 45 and 1,889 from donor 74 were unique sequences as identified using the "blastclust" module in the NCBI BLAST package. These numbers are reported in Fig. 5A, and the last NJ tree with 5,047 sequences remaining from donor 74.
  • Due to the sequence variation, we adopted a template-based approach to CDR H3 identification for 454-pyrosequencing-determined heavy chain sequences. Specifically, a 454-derived heavy chain sequence was aligned to the VRC01 heavy chain sequence using ClustalW2; then the nucleotide sequences of two motifs that define the CDR H3 region - CTR and WGQG - were used as "anchors" to locate the CDR H3 region in the 454-derived heavy chain sequence.
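The anchor idea can be illustrated at the amino acid level. This sketch skips the ClustalW2 alignment step entirely and searches the translated read for the two motifs directly, which is a simplification; the function name, the choice to return only the residues between the anchors, and the toy sequence are assumptions for the example.

```python
def find_cdr_h3(protein, start_anchor="CTR", end_anchor="WGQG"):
    """Return the residues between the two anchor motifs, or None if either
    anchor is missing. Anchors themselves are excluded (an assumption)."""
    start = protein.find(start_anchor)
    if start == -1:
        return None
    body_start = start + len(start_anchor)
    end = protein.find(end_anchor, body_start)
    if end == -1:
        return None
    return protein[body_start:end]


# Made-up sequence, for illustration only.
toy_heavy_chain = "AAACTRGGGYYYWGQGTTT"
cdr_h3 = find_cdr_h3(toy_heavy_chain)
```

In the template-based approach the alignment to VRC01 is what tolerates sequence variation in the anchors; an exact string search like this one would miss reads whose anchors are mutated.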
  • IGHJ1*01 or IGHJ2*01 for comparison.
  • This example illustrates the cross-donor phylogenetic analysis (termed “all origin cross-donor phylogenetic analysis”) and comparison of the all-origin cross donor phylogenetic analysis with the IGHV1-2 cross donor phylogenetic analysis, discussed in Example 1.
  • the IGHV1-2 cross-donor analysis uses an initial input population of test sequences containing only heavy chain nucleotides having an IGHV1-2 germline origin
  • the all-origin cross donor analysis uses heavy chain nucleotide sequences from the IGHV1-2 germline and other germlines (up to all other germlines) as the input test sequences (see FIG. 40).
  • all-origin cross-donor phylogenetic analysis ~99% of the nucleic acid sequences encoding VRC01-like heavy chains in an antibodyome can be identified.
  • test sequences cross-donor positive heavy chain sequences
  • each node can have a series of parent nodes (interim/artificial nodes) with different depth.
  • 2*02 is the shallowest parent node and the common ancestor for all the leaves of the
  • the leaf nodes (and corresponding leaves) are the deepest nodes (and leaves).
  • each node (interim or leaf) contains a bifurcating tree with a varying number of leaves.
  • IGHV1-2*02, the root, has a subtree of all input leaf nodes (antibody 454 sequences), including native/reference sequences and test sequence reads.
  • Any leaf node represents a subtree which only contains itself. The deeper the node, the smaller its subtree and the fewer leaf nodes it contains.
  • the deepest node, and hence the smallest subtree, that contains all selected native/reference antibody heavy chain sequences is determined and selected. Then all the test sequences corresponding to leaves of the leaf nodes within the reference/native subtree are extracted. This set of test sequence reads is the smallest subset of test sequence reads that segregates with the native/reference antibody sequences.
  • There are two extreme situations: if the smallest subtree contains all the reference/native sequences and none of the test sequence reads, then no test sequence reads will be selected for further analysis. On the other hand, if the smallest subtree is the whole tree (all native/reference antibody sequences and all the test sequence reads), then all the test sequence reads will be selected for the next iteration of the analysis.
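The subtree selection described above amounts to finding the deepest node whose leaves include every reference sequence and then collecting the non-reference leaves under it. A minimal sketch, assuming the rooted tree is represented as nested tuples of leaf names (a representation chosen for this example, not one named in the text); tree construction and rooting are not modeled.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def smallest_subtree(tree, references):
    """Deepest subtree whose leaves include every reference name."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if references <= leaves(child):
            return smallest_subtree(child, references)
    return tree


def segregating_reads(tree, references):
    """Test reads that fall inside the smallest reference-containing subtree."""
    references = set(references)
    if not references <= leaves(tree):
        raise ValueError("tree does not contain every reference sequence")
    return sorted(leaves(smallest_subtree(tree, references)) - references)


# Toy rooted tree with hypothetical names: refA/refB are references, t1..t4 are test reads.
toy_tree = (("germline", "t1"), (("refA", "t2"), ("refB", ("t3", "t4"))))
selected = segregating_reads(toy_tree, {"refA", "refB"})
none_selected = segregating_reads((("refA", "refB"), "t1"), {"refA", "refB"})
```

The second call shows the first extreme case from the text: the references form their own subtree, so no test reads are selected.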
  • test sequence reads selected from the previous iteration as segregating with the native/reference tree are pooled and their order randomized before they are divided into smaller files of no more than 3000 sequence reads each, so that each selected read has a lower chance of being analyzed alongside the same reads that shared its file in the previous iteration of the analysis.
  • This randomization procedure ensures almost every test sequence read is evaluated with the other test sequence reads.
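The pooling, shuffling, and splitting step can be sketched in a few lines. The 3000-read limit comes from the text; the function name and the optional seed parameter are assumptions added for reproducibility of the example.

```python
import random


def shuffle_and_split(reads, max_per_file=3000, seed=None):
    """Pool the reads, randomize their order, and cut them into chunks of at
    most max_per_file reads (one chunk per smaller file)."""
    pool = list(reads)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + max_per_file] for i in range(0, len(pool), max_per_file)]


# 7001 placeholder read identifiers.
chunks = shuffle_and_split(range(7001), seed=1)
chunk_sizes = [len(chunk) for chunk in chunks]
```

Re-running the shuffle with a different seed at each iteration is what changes which reads share a file.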
  • the procedure for IGHV1-2-origin-only cross-donor phylogenetic analysis runs substantially the same as the all-origin cross-donor phylogenetic analysis procedure.
  • the input sequences (test sequences) for the IGHV1-2-origin-only cross-donor analysis are only the antibody heavy chain variable domain encoding sequences assigned to IGHV1-2 germline V genes using IMGT/HighV-QUEST.
  • all-origin cross-donor phylogenetic analysis uses antibody heavy chain variable domain encoding sequences that were assigned to any origins by IgBLAST,
  • CXR/K peptide sequence is the signature end of the V genes of the VRC01-like antibodies; all possible nucleotide fragments encoding the "CXR/K" peptide were assayed. The sequence fragment from the beginning of the read to the end of the CXR/K motif is taken as the V gene of the antibody and extracted.
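One way to picture the CXR/K extraction is a codon-level scan of the read. The sketch below assumes the read starts in frame at its first nucleotide and, when several in-frame matches exist, keeps the last one; both choices, along with the function name and the toy read, are assumptions for this example and are not stated in the text.

```python
import re

# Cys (TGT/TGC), any codon, then Arg (CGN, AGA, AGG) or Lys (AAA, AAG).
CXRK_CODONS = re.compile(r"TG[TC][ACGT]{3}(?:CG[ACGT]|AG[AG]|AA[AG])")


def extract_v_gene(read):
    """Return the read from its start through the last in-frame CXR/K motif,
    or None if no such motif is found."""
    read = read.upper()
    end = None
    for i in range(0, len(read) - 8, 3):  # step through codons in frame 0
        if CXRK_CODONS.match(read, i):
            end = i + 9
    return read[:end] if end is not None else None


# Made-up read: CAG GTG CAG TGT GCG AGA GGG TGG  (Q V Q C A R G W)
toy_read = "CAGGTGCAG" + "TGTGCGAGA" + "GGGTGG"
v_gene = extract_v_gene(toy_read)
```

Scanning all three frames, or anchoring the search near the expected end of the V gene, would be natural refinements if frame or position were uncertain.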
  • VRC01-like antibodies (VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04B, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, 8ANC134) and termed known VRC01-like V genes.
  • All V genes extracted in step 1 were split into smaller files, each of which contains a maximum of 3000 V genes.
  • Germline V gene IGHV1-2*02 and known VRC01-like antibody V genes (VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04B, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, 8ANC134) were exogenously added to each of the small files obtained in step 3.
  • the phylogenetic trees obtained in step 5 were automatically rooted on germline IGHV1-2*02.
  • FIG. 41 illustrates the number of cross-donor positive sequence reads selected from each iteration of all origin cross donor analysis starting with 277,512 test sequences.
  • the final set of test sequence reads consists of 37,562 cross-donor positive sequences. These 37,562 contain 7,131 out of the original 7,135 VRC01-like Abs.
  • FIG. 42 shows a series of heatmaps illustrating the results of this analysis (showing sequence identity to VRC01, VRC03, VRC-PG04, VRC-CH31, and VRC06 heavy chains), including the presence of "islands" within the heatmap that represent high divergence/high sequence identity test sequences.
  • Results for the first iteration of the all-origin and IGHV1-2 cross-donor analyses, with and without rooting, are illustrated in FIG. 43A and B. Venn diagrams further illustrating these results are shown in FIG. 44 and FIG. 45.
  • FIG. 46 illustrates the percent of antibody heavy chain nucleic acids that segregated into the native/reference tree during the first iteration of the above analyses.
  • rooting does affect the final result of antibody selection; that the effect of rooting is greater for the all-origin cross-donor analysis; that when IGHV1-2 origin is not selected, the rooting procedure tends to select more test sequences than the non-rooting procedure; that when IGHV1-2 origin is selected, the rooting procedure tends to select fewer test sequences than the non-rooting procedure.
  • Whether rooting changes the result of antibody selection depends on whether it changes the topology of the subtree that contains all native/reference heavy chain sequences (discussed with reference to FIGs. 47 and 48, below).
  • native/reference subtree (VRC-CHxx in this case) and germline IGHV1-2*02 is one of the internal nodes, rooting the tree on IGHV1-2*02 changes the number of selected sequences in the cross-donor phylogenetic analysis.
  • ClustalW2 were rooted on the IGHV1-2*02 sequence, which was added to the population of test sequences in each iteration of the cross-donor analysis.
  • the percent (to the initial input, FIG. 49) of sequences segregated with native/reference sequences drops dramatically during the first few iterations of the cross-donor analyses, and eventually reaches a point that there are very few or no antibodies excluded from the native/reference tree during the cross-donor analyses.
  • the percent of segregated sequences drops greatly during the first few iterations of the cross-donor analysis. After the 8th iteration of the cross-donor analysis, the percent of segregated sequences does not drop as much and begins to flatten out. Similar results were observed concerning the percent (relative to the initial input) of sequences segregated with native/reference sequences. For the all-origin cross-donor analysis, the curves are a bit more zigzagged, which indicates that the distribution of antibodies in each single file is also important. However, as the iterations proceed and the input antibody number drops, the fluctuations reflect only a smaller number of antibodies.
  • FIGs. 51 and 52 illustrate the number of sequences identified by the indicated cross-donor analysis methods after 12 iterations of cross-donor analysis. Significant overlap between the results from the cross-donor and V-gene-only cross-donor and the all-origin and IGHV1-2 origin cross-donor analyses was observed.
  • FIG. 53 shows the percent of high identity sequences identified using the all-origin and IGHV1-2 origin cross donor analyses with or without V-gene only procedure.
  • the sequences in the initial dataset used for the indicated cross-donor analyses were filtered for high sequence identity to VRC01 and VRC03 and divergence from the germline sequence by 20% to 40%.
  • FIG. 53A indicates the number and percent of sequences that fit these high identity criteria (#Hi) for the indicated cross-donor analyses.
  • FIG. 53B shows the corresponding heat map results. Based on these results, the all-origin cross-donor analysis identified almost all (>99%) VRC01-like antibodies from the input dataset.
  • the J gene starts from an amino acid motif, the WGXG peptide sequence ("X" is any residue), on the antibody sequence.
  • FIG. 54 shows a graph indicating the percent of these 17,188 antibodies as a function of WGXG start positions.
  • the majority of the identified antibodies have WGXG starting from ~300 bp to ~400 bp. This is reasonable since the heavy chain V-gene length is ~300 bp. Those antibodies having WGXG before the 300 bp position are likely not real CDR H3 termination signals. Thus, if an antibody has WGXG and the start of WGXG is in the range of 300 to 400 bp of the encoding sequence, the sequence of the antibody was extracted from the WGXG start position to the end. For all other antibodies, the sequence was extracted from position 300 to the 3' end.
  • FIG. 55 illustrates the heavy chain V-gene family assignment for identified heavy chain sequences using various computer programs to assign V-gene status.
  • FIG. 56 illustrates the heavy chain J-gene family assignment for identified heavy chain sequences using various computer programs to assign J-gene status.
  • FIG. 57 shows a table illustrating the number and percent of assignments that were correct using the various computer programs.
  • FIG. 58 shows the percentage of high-identity heavy chain sequences having the indicated V-gene family status identified using the indicated computer programs to assign V-gene status.
  • FIG. 59 shows the percentage of high-identity heavy chain sequences having the indicated V-gene family status identified using the indicated computer programs to assign V-gene status.
  • Heavy and light chain chimeras including various heavy and light chains of VRC01-like and other antibodies were produced as described in Example 1.
  • the heavy and light chains included in the chimeras are indicated in FIGs. 60A-60D.
  • VRC01-like antibody heavy and light chains used in the complementation assays include the heavy and/or light chains of VRC01, VRC17, VRC18, VRC-PG19, VRC-PG20, 12A12, 12A21, 3BNC60, 3BNC117, NIH45-46.
  • Non-VRC01-like antibody heavy and light chains used in the complementation assays include the heavy and/or light chains of VRC13, VRC14, VRC15, VRC16, 1B2530, 1NC9, 8ANC131, 8ANC134. The heavy and light chain sequences of these antibodies are disclosed herein.
  • swapping heavy and light chains between different members of the VRC01-like class can be accomplished without loss of the VRC01-like functional characteristic, that is, HIV-1 neutralization. Conversely, non-VRC01-like heavy and light chains fail to functionally complement.
  • CD4-binding site of the HIV-1 surface glycoprotein gp120 defining a promising vaccine design target.
  • Cross-donor phylogenetic analysis (CDPA)
  • CDPA selects antibody sequences based on maturation similarities to a small number of known "native/reference" neutralizing antibody sequences, such as VRC01.
  • CDPA of sequences from an HIV-infected donor with previously uncharacterized antibody status revealed 13 VH sequences that segregated with the VRC01-like phylogenetic subtree.
  • CDPA can identify VRC01-like VH sequences in an infected individual based only on bioinformatic analysis of deep sequences.
  • the CDPA method was used to isolate antibody heavy chains related to VRC01, a bNAb that neutralizes around 90% of HIV-1 strains, from an infected donor with previously uncharacterized antibody status. HIV-1 has evolved a variety of mechanisms to escape antibody recognition, and it remains unclear how to elicit bNAbs by vaccination that can successfully overcome the genetic and
  • VRC01-like antibodies, which achieve broad neutralization by precisely targeting the binding site for the human CD4 receptor on the viral surface glycoprotein gp120, are responsible for the broad neutralizing activity of many such infected individuals.
  • VRC01-like antibodies originate from B cells in which the VH1-2 gene has recombined with various D and J segments. Effective VRC01-like antibodies are among the most highly matured of all antibodies yet characterized, with variable-region mutation rates of about 30%. Sequence identity-based bioinformatics techniques fail to identify most VRC01-related antibodies in these patients.
  • ELISAs immunosorbent assays
  • Serum binding was competed by VRC01-03, VRC-PG04 and 04b, VRC-CH30-32 and other CD4bs-reactive antibodies and by CD4-Ig, but not by antibodies known to bind gp120 at other sites.
  • assessment of neutralization rendered by donor 200-384 serum on a panel of Env-pseudoviruses revealed its ability to potently neutralize a majority of diverse HIV-1 isolates.
  • deep sequencing of cDNA from donor 200-384 PBMCs was performed using the Roche 454 pyrosequencing method, as described in Example 1.
  • mRNA from a population of 5 million B cells was used as template for polymerase chain reaction (PCR) to preferentially amplify the IgG and IgM genes from the IGHV1 family.
  • the 454 sequencing provided 574,027 sequence reads (FIG. 62). 498,234 or 86.8% of the 454 reads spanned 400-500 bp, sufficiently covering the VH region (FIG. 64).
  • the V(D)J gene components were determined for each sequence using IgBlast.
  • 168,365 sequences were assigned to IGHV1-2*02 allelic origin, which is used by all VRC01-like antibodies identified so far.
  • sequences originating from IGHV1-2*02 constitute the largest family, accounting for 29.3% of the sequences.
  • each sequence was subjected to an automatic error-correction scheme, which improved the protein sequence identity to inferred germline V gene by an average of 16.5%.
  • the corrected sequences were then compared to a set of template VRC01-like antibodies, including VRC01-03, VRC-PG04 and 04b, and VRC-CH30-32. No sequence was found to be more than 73% identical to any template (FIG. 61B), suggesting that VRC01-like antibodies, if they do exist in this antibodyome, cannot be recognized by sequence identity.
  • V H3 sequences of IGHV1-2*02 allelic origin that encompassed the entire V(D)J region were selected for the third complementarity determining region (CDR H3) analysis, where the CDR H3 of each sequence was determined and compared to the CDR H3s of the template VRC01-like antibodies. Some sequences were found to have CDR H3s of ~80% identity to that of VRC-PG04 and shared the same J gene allele, IGHJ2*01, suggesting that the same V(D)J recombination events occurred.
  • IGHV1-2 cross-donor phylogenetic analysis on the 163,108 full-length IGHV1-2*02-originated sequences was then performed.
  • the IGHV1-2*02 antibodyome was divided into 60 subsets, each with donor 45-derived VRC01-03, donor 74-derived VRC-PG04 and 04b, and donor 0219-derived VRC-CH30-32 added as reference.
  • IGHV1-2*02 sequence was also added to root the resultant cross-donor phylogenetic tree.
  • NJ neighbor-joining
  • sequences in the smallest subtree containing exogenous VRC01-like antibodies were extracted and merged into a new set for the next iteration of cross-donor analysis.
  • the total number of sequences was reduced to 2,030 after the first iteration and converged to 166 after the second iteration, accounting for ~0.01% of the antibodyome.
  • 81 unique sequences were used to construct the final "cross-donor phylogenetic" tree using a more accurate maximum-likelihood (ML) method.
  • the resultant donor 200-384 tree (FIG. 61C and FIG. 65) showed four branches, all below the least-divergent VRC-CH30-32 and interleaved with other VRC01-like antibodies. To assess their biological functions, 11 sequences were selected evenly from the tree, along with two that clustered closely with VRC01-03.
  • the 166 cross-donor-segregated sequences from donor 200-384 were analyzed and no apparent preference for J gene usage was found. Specifically, branches 1, 2 and 3 were occupied by sequences using the IGHJ5 gene and sequences of branch 4 appeared to use IGHJ6. The two sequences clustered with VRC01-03 share the same J allelic origin, IGHJ1*01, as VRC01-03, indicating that the cross-donor analysis was not restricted to a particular J gene or by the J genes of exogenous sequences.
  • the donor 200-384 antibodyome was examined using functionally tested sequences as template. Sequences of IGHV1-2 allelic origin were plotted in two dimensions, the divergence from inferred germline gene (one of the IGHV1-2 alleles) and sequence identity to a chosen template, for either full sequence or CDR H3 region only (FIG. 62B). For all four branches in the cross-donor phylogenetic tree, only small, dispersed clusters of related sequences with over 80% identity to the selected template on the plot of donor 200-384 antibodyome, as opposed to the abundant populations of VRC-PG04-like sequences obtained at the same sequencing depth for donor 74, were observed.
  • the bioinformatic method includes a first step to process 454 sequencing data sets using a computational procedure and a second step to search for sequences with VRC01-like signatures in the CDR-L1 and CDR-L3 regions of the light chains.
  • a general bioinformatics pipeline has been developed to process and analyze 454 pyrosequencing-determined antibodyomes.
  • each sequence read was (1) reformatted and labeled with a unique index number; (2) assigned to variable (V) and joining (J) gene families and alleles using an in-house implementation of IgBLAST, and sequences with E-value > 10 " were rejected; (3) subjected to a template-based error correction scheme where 454 homopolymer errors in V and J genes were detected and corrected based on the alignment to germline sequences; (4) compared with a set of user-provided "reference" light chain sequences to calculate respective identities, at both nucleotide and amino-acid levels; (5) subjected to a multiple sequence alignment (MSA) procedure to determine the CDR-L3 region, which was further compared with a set of user-provided "reference" CDR-L3 sequences.
  • the two VRC01-like light chain motifs are defined as the following:
  • the VRC01-like light chain signature included a 2-residue or more amino acid deletion in the region
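
The WGXG-based extraction rule stated earlier in this list (extract from the WGXG start when it falls within 300 to 400 bp of the encoding sequence, otherwise extract from position 300 to the 3' end) can be illustrated with a short Python sketch. This is not the implementation used in the disclosure. The function name, the 0-based coordinates, the assumption that the read is in reading frame 0, and the choice to take the first in-range match are all assumptions made for illustration.

```python
import re

# Codon-level pattern for the WGXG peptide motif under the standard
# genetic code: W = TGG, G = GGN, X = any codon, G = GGN.
WGXG_CODONS = re.compile(r"TGGGG[ACGT][ACGT]{3}GG[ACGT]")


def extract_from_wgxg(seq, lo=300, hi=400, frame=0):
    """Return the 3' portion of a heavy chain read per the WGXG rule.

    Positions are 0-based nucleotide offsets (an assumption; the text
    does not state the indexing convention).
    """
    seq = seq.upper()
    # Step through the read codon by codon so the motif is matched in frame.
    for start in range(frame, len(seq) - 11, 3):
        if lo <= start <= hi and WGXG_CODONS.match(seq, start):
            return seq[start:]
    # No in-range WGXG: fall back to position `lo` through the 3' end.
    return seq[lo:]
```

A WGXG found before position 300 is ignored by this sketch, in line with the statement that such matches are likely not real CDR H3 termination signals.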

Abstract

Monoclonal neutralizing antibodies are disclosed that specifically bind to the CD4 binding site of HIV-1 gp120. The identification and use of these antibodies are also disclosed. Methods are also provided for enhancing the binding and neutralizing activity of any antibody using epitope scaffold probes.

Description

NEUTRALIZING ANTIBODIES TO HIV-1 AND THEIR USE
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 61/484,184, filed May 9, 2011, U.S. Provisional Application No. 61/515,528, filed August 5, 2011 and U.S. Provisional Application No. 61/522,205, filed August 10, 2011. All of the provisional applications are specifically incorporated by reference in their entirety.
FIELD OF THE DISCLOSURE
This relates to broadly neutralizing monoclonal antibodies that bind to the CD4 binding site of human immunodeficiency virus (HIV)-1 gp120, their use, and methods of identifying these broadly neutralizing monoclonal antibodies.
BACKGROUND
An effective HIV-1 vaccine will likely need to induce neutralizing antibodies (NAbs) that block HIV-1 entry into human cells. To be effective, vaccine-induced antibodies will have to be active against most circulating strains of HIV-1.
Unfortunately, current HIV-1 vaccines are unable to induce potent and broadly reactive NAbs. One major obstacle to the design of better vaccines is the limited understanding of which regions of the HIV-1 envelope glycoprotein (gp120) are recognized by NAbs. A few neutralizing monoclonal antibodies (mAbs) have been isolated from HIV-1 infected individuals and these mAbs define specific regions (epitopes) on the virus that are vulnerable to NAbs.
One previously characterized HIV-1 neutralizing mAb, called b12, can bind to a site on gp120 that is required for viral attachment to its primary cellular receptor, CD4. mAb b12 was derived from a phage display library, a process which makes it impossible to know if the antibody was naturally present in an infected person, or was the result of a laboratory combination of antibody heavy and light chains. b12 can neutralize about 75% of clade B strains of HIV-1 (those most common in North America), but it is not broadly neutralizing (it neutralizes less than 50% of other strains of HIV-1 found worldwide). Therefore, there is a need to develop broadly neutralizing antibodies for HIV-1.
SUMMARY OF THE DISCLOSURE
Isolated VRC01-like broadly neutralizing antibodies that specifically bind HIV-1 gp120 are provided herein. In several examples, the isolated VRC01-like broadly neutralizing antibodies do not include the heavy or light chain from an established VRC01-like antibody. Also disclosed herein are compositions including these antibodies that specifically bind gp120 and nucleic acids encoding these antibodies, expression vectors comprising the nucleic acids, and isolated host cells that express the nucleic acids. Also disclosed are methods for identifying the class of VRC01-like heavy chain variable domains, broadly neutralizing antibodies that include these heavy chain variable domains, and methods of using these broadly neutralizing antibodies. Further disclosed are methods for identifying the class of VRC01-like light chain variable domains, broadly neutralizing antibodies that include these light chain variable domains, and methods of using these broadly neutralizing antibodies.
In some embodiments, a VRC01-like heavy chain variable domain is identified by performing a cross-donor phylogenetic analysis on a population of heavy chain variable domain nucleic acid sequences from B cells from a subject infected with HIV. In some examples, the cross-donor analysis is performed on a population of nucleic acid sequences encoding heavy chain variable regions having an IGHV1-2 germline origin. In other examples, the cross-donor analysis is performed on a population of nucleic acid sequences encoding heavy chain variable regions having an IGHV1-2 germline or other germline origin.
In some examples, cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from one or more VRC01-like antibodies (reference antibodies), such as one or more of VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 and 8ANC134 heavy chain variable domains, to the population of test sequences. Additionally, the nucleotide sequence of one or more germline sequences (such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence) is added to the population of heavy chain variable domain nucleic acid sequences. This forms an analytic population of sequences.
A phylogenetic tree is then constructed from this analytic population, for example using neighbor joining analysis, rooted at the germline sequence (such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence). Nucleic acid sequences of interest are selected that segregate within a distinct branch (such as the smallest subtree) in the phylogenetic tree with one or more (such as all) of the heavy chain variable domains from the one or more VRC01-like antibodies included in the analytic population. In additional embodiments, the process is repeated in an iterative fashion, for example on subpopulations of heavy chain variable domain nucleic acid sequences from a subject infected with HIV, until the phylogenetic tree converges.
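
As a rough illustration of the selection step just described, the following Python sketch finds the smallest subtree of an already-built rooted tree that contains every reference sequence and returns the test sequences that segregate within it. It is a sketch under stated assumptions, not the disclosed implementation: the tree is represented as nested tuples with string leaf names, tree construction itself (for example by neighbor joining) is not shown, and the function names and the toy leaf labels `t1` to `t4` are hypothetical.

```python
def leaves(node):
    """List the leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def smallest_subtree(node, references):
    """Return the smallest clade containing every reference leaf, or None."""
    refs = set(references)
    if not refs <= set(leaves(node)):
        return None
    if not isinstance(node, str):
        # If a single child already holds all references, descend into it.
        for child in node:
            sub = smallest_subtree(child, refs)
            if sub is not None:
                return sub
    return node


def select_segregating(tree, references):
    """Test sequences that share the smallest reference-containing subtree."""
    clade = smallest_subtree(tree, references)
    if clade is None:
        return []
    refs = set(references)
    return [name for name in leaves(clade) if name not in refs]


# Toy tree rooted on the germline leaf; t1..t4 are hypothetical test reads.
toy_tree = ("IGHV1-2*02", (("t1", "t2"), (("VRC01", "t3"), ("VRC-PG04", "t4"))))
selected = select_segregating(toy_tree, ["VRC01", "VRC-PG04"])
```

In an iterative setting, the selected names from each subpopulation would be pooled and the tree rebuilt, repeating until the selected set stops changing.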
In some embodiments, a nucleic acid encoding a VRC01-like light chain in a population of test sequences is identified as a nucleic acid sequence of interest. In several such embodiments, each test sequence in the population is a nucleic acid sequence encoding a light chain variable domain from a subject infected with HIV, wherein the light chain variable domain comprises a complementarity determining region (CDR)1, a CDR2 and a CDR3 and has a corresponding germline origin light chain variable domain. In some embodiments, the CDR3 of the VRC01-like light chain variable domain comprises a hydrophobic residue followed by a glutamic acid residue or glutamine residue, and if the germline origin of the VRC01-like light chain variable domain is an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises at least two glycine residues, and if the germline origin of the VRC01-like light chain variable domain is not an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises a deletion of two or more amino acids compared to the corresponding germline origin. Several embodiments include selecting a test sequence that encodes the VRC01-like light chain variable domain as the nucleic acid sequence of interest and synthesizing an isolated nucleic acid molecule comprising the nucleic acid sequence of interest, thereby producing the isolated nucleic acid molecule encoding the VRC01-like light chain variable domain.
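
The light chain criteria in the preceding paragraph amount to a small decision rule, sketched below in Python for illustration only. Several points are assumptions rather than part of the disclosure: the set of residues treated as hydrophobic, measuring the CDR1 deletion as a simple length difference from the germline CDR1, the function name, and the toy CDR strings used in the usage lines.

```python
import re

# Assumption: residues treated as hydrophobic for the CDR3 motif.
HYDROPHOBIC = "AVILMFWY"
# Hydrophobic residue immediately followed by Glu (E) or Gln (Q).
CDR3_MOTIF = re.compile(f"[{HYDROPHOBIC}][EQ]")


def is_vrc01_like_light_chain(cdr1, cdr3, germline, germline_cdr1):
    """Apply the CDR3 motif test, then the germline-dependent CDR1 test."""
    if not CDR3_MOTIF.search(cdr3):
        return False
    if germline.startswith("IGKV1-33"):
        # IGKV1-33 origin: CDR1 must contain at least two glycines.
        return cdr1.count("G") >= 2
    # Other origins: CDR1 must be at least two residues shorter than the
    # germline CDR1 (length difference used as a proxy for a deletion).
    return len(germline_cdr1) - len(cdr1) >= 2


# Toy inputs for illustration; these are not sequences from the disclosure.
hit_kv1_33 = is_vrc01_like_light_chain("RASQGGSW", "QQYEF", "IGKV1-33*01", "QASQDISNYLN")
hit_deletion = is_vrc01_like_light_chain("RTSQYGSLA", "QQYEF", "IGKV3-20*01", "RASQSVSSSYLA")
miss_no_deletion = is_vrc01_like_light_chain("RASQSVSSSYLA", "QQYEF", "IGKV3-20*01", "RASQSVSSSYLA")
```

A production check would more likely derive the deletion from an alignment to the germline sequence than from CDR lengths alone.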
In some examples, a polypeptide is produced from the nucleic acid sequence of interest, thereby producing the VRC01-like heavy or light chain domain. In some examples, the selected and expressed VRC01-like heavy or light chain domain is tested for neutralization activity by complementation with a corresponding heavy or light chain variable domain from an identified VRC01-like antibody, such as one or more of VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 light chain variable domains.
The antibodies and compositions disclosed herein can be used for a variety of purposes, such as for detecting an HIV-1 infection or diagnosing acquired immune deficiency syndrome (AIDS) in a subject. These methods can include contacting a sample from the subject diagnosed with HIV-1 or AIDS with a VRC01-like antibody that specifically binds gp120, and detecting binding of the antibody to the sample. An increase in binding of the antibody to the sample relative to binding of the antibody to a control sample confirms that the subject has an HIV-1 infection and/or AIDS. In some embodiments, the methods further include contacting a second antibody that specifically binds gp120 with the sample, and detecting binding of the second antibody. In some non-limiting examples, an increase in binding of the antibody to the sample relative to a control sample detects HIV-1 in the subject. In some non-limiting examples, the antibody specifically binds soluble gp120 in the sample. In some embodiments, the methods further comprise contacting a second antibody that specifically recognizes the VRC01-like antibody with the sample and detecting binding of the second antibody.
In additional embodiments, a method is disclosed for treating a subject with an HIV infection, such as, but not limited to, a subject with AIDS. The methods include administering a therapeutically effective amount of a VRC01-like antibody or an antigen-binding fragment thereof to the subject.
Also disclosed are methods for testing a potential vaccine, for example by contacting the potential vaccine with a VRC01-like antibody, or an antigen-binding fragment thereof, and detecting the binding of the antibody to an immunogen in the potential vaccine.
The foregoing and other features and advantages of this disclosure will become more apparent from the following detailed description of several embodiments, which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE FIGURES
FIGS. 1A-1F show the identification and characterization of broadly neutralizing CD4-binding-site monoclonal antibodies (mAbs) from HIV-1-infected donors 74 and 0219. The RSC3 probe was used to identify five broadly neutralizing mAbs, all of which were inferred to derive from the IGHV1-2*02 allele and displayed high levels of somatic mutation. FIG. 1A is a plot showing RSC3 analysis of serum. Twelve sera from the IAVI Protocol G cohort (donors 17-74) and one serum from the CHAVI 001 cohort (donor 0219) were analyzed for RSC3 reduction in serum neutralization on HIV-1 strains JR-FL, PVO.4, YU2 and ZA12.29. Lower bars show the mean serum reduction in neutralization IC50 resulting from RSC3 versus ΔRSC3 competition. Sera with greatest reduction were further analyzed on HIV-1 strains Q168.a2, RW020.2, Du156.12 and ZM109.4. Upper bars show the mean reduction on eight viruses. FIG. 1B is a set of dot plots showing the RSC3- and ΔRSC3-binding profile of IgG+ B cells from donors 74 and 0219. Gating and percentage of B cells of interest (RSC3+ΔRSC3-) are indicated, with 40 and 26 sorted single B cells from donors 74 and 0219, respectively. Additional sorting details are shown in FIG. 8. FIG. 1C is a sequence alignment showing the protein sequences of heavy chain variable regions of mAbs VRC-PG04 (SEQ ID NO: 1609) and VRC-PG04b (SEQ ID NO: 1610), isolated from donor 74; mAbs VRC-CH30 (SEQ ID NO: 1611), VRC-CH31 (SEQ ID NO: 1612) and VRC-CH32 (SEQ ID NO: 1613), isolated from donor 0219; mAbs VRC01 (SEQ ID NO: 1614), VRC02 (SEQ ID NO: 1615), and VRC03 (SEQ ID NO: 1616), isolated from donor 45; and IGHV1-2*02 (SEQ ID NO: 2487); and of light chain variable regions of mAbs VRC-PG04 (SEQ ID NO: 1619) and VRC-PG04b (SEQ ID NO: 1620), isolated from donor 74; mAbs VRC-CH30 (SEQ ID NO: 1621), VRC-CH31 (SEQ ID NO: 1622) and VRC-CH32 (SEQ ID NO: 1623), isolated from donor 0219; and VRC01 (SEQ ID NO: 1624), VRC02 (SEQ ID NO: 1625), VRC03 (SEQ ID NO: 1626), IGKV3-11*01 (SEQ ID NO: 2488), IGKV3-20*01 (SEQ ID NO: 2489), and IGKV1-33*01 (SEQ ID NO: 2490). Sequences are aligned to putative germline ancestral genes and to previously identified broadly neutralizing antibodies VRC01 and VRC03. Framework regions (FR) and complementarity-determining regions (CDRs) are based on Kabat nomenclature. FIG. 1D is a plot showing competition ELISAs. The binding to YU2 gp120 by a single concentration of biotin-labeled VRC-PG04 or VRC-CH31 was assessed against increasing concentrations of competitive ligand. CD4-Ig is a fusion protein of the N-terminal two domains of CD4 with IgG1 Fc. FIG. 1E is a table showing amino acid sequence identities between VRC-PG04 or VRC-CH31 and other antibodies reactive with the CD4-binding site on gp120 (CD4bs) or with the CD4-induced co-receptor-binding site (CD4i). FIG. 1F is a set of neutralization dendrograms. VRC-PG04 and VRC-CH31 were tested against genetically diverse Env-pseudoviruses representing the major HIV-1 clades. Neighbor-joining dendrograms display the protein distance of gp160 sequences from 179 HIV-1 isolates tested against VRC-PG04 and a subset (52 isolates) tested against VRC-CH31. A scale bar denotes the distance corresponding to a 1% change in amino acid sequence. Dendrogram branches are indicated by the neutralization potencies of VRC-PG04 and VRC-CH31 against each particular virus.
FIGS. 2A-2C are digital images of the atomic structures of antibodies VRC-PG04 and VRC03 in complex with HIV-1 gp120. Despite being elicited and maturing in different individuals, broadly neutralizing antibodies VRC-PG04 and VRC03 display remarkable similarities in recognition of HIV-1. FIG. 2A is a digital image of the overall structures. The liganded complex for the Fab of antibody VRC-PG04 from donor 74 and the HIV-1 gp120 envelope glycoprotein from isolate 93TH057 is depicted with polypeptide backbones in ribbon representation in the left image. The complex of Fab VRC03 from donor 45 is depicted in the right image, with surfaces of all variable domain residues that differ between VRC03 and VRC-PG04 highlighted according to their chemical characteristics. Although VRC-PG04 and VRC03 derive from the same inferred heavy chain V-gene, roughly 40% of their variable domain residues have been altered relative to each other during the maturation process. FIGS. 2B and 2C are digital images showing interaction close-ups. Critical interactions are shown between the CD4-binding loop of gp120 and the CDR H2 region of the broadly neutralizing mAbs VRC03, VRC-PG04 and VRC01, with hydrogen bonds depicted as dotted lines. The 1.9 and 2.1 Å resolution structures of VRC03 and VRC-PG04, respectively, were sufficient to define interfacial waters shown in FIG. 2C, which were unclear in the 2.9 Å structure of VRC01 (24). The orientation shown in FIG. 2C is rotated ~180° about the vertical axis from the orientation shown in FIG. 2B.
FIGS. 3A-3C are a set of plots and digital images of atomic structures showing the focused evolution of VRC01-like antibodies. The maturational processes that facilitate the evolution of VRC01-like antibodies from low affinity unmutated antibodies to high affinity potent neutralizers involve divergence in antibody sequence and convergence in epitope recognition. FIG. 3A shows antibody convergence. The gp120 portions of liganded complexes with VRC01, VRC03 and VRC-PG04 were superimposed to determine the average antibody per-residue Cα deviation, and the per-residue hydrophobic interaction (ΔiG) was calculated. These two quantities were found to correlate (P-value = 0.0427), with antibody residues containing strong hydrophobic interactions (e.g. at positions 53, 55, 91 and 97) displaying high structural conservation. This correlation is visualized on VRC-PG04 in the left image, where the ribbon thickness is proportional to the corresponding per-residue Cα deviation and the paratope surface is highlighted according to hydrophobicity, from white (low) to dark (high); notably, surface patches map to thin ribbons. FIG. 3B shows epitope convergence. The HIV-1 gp120 surface involved with CD4 binding contains conformationally invariant regions (e.g. associated with the outer domain) and conformationally variable regions (e.g. associated with the bridging sheet). It was previously hypothesized that the conformationally invariant outer domain-contact for CD4 represents a site of vulnerability. Thus the precision of CD4-binding-site ligand recognition (vertical axis) was analyzed versus the IC80 neutralization breadth (horizontal axis), and a significant correlation was observed (R2=0.6, P-value=0.040). FIG. 3C is a set of digital images showing the divergences in sequence and convergences in recognition. The development of VRC01-like antibodies involves a heavy chain derived from the IGHV1-2*02 allele and selected light chain VK alleles. The far left image depicts a ribbon representation model of a putative germline antibody. Somatic hypermutation during the process of affinity maturation leads to a divergence in sequence, yet results in the convergent recognition of similar epitopes. Intersection of the epitope surfaces recognized by VRC01, VRC03 and VRC-PG04 (far right image) reveals a remarkable similarity to the site of vulnerability. The primary divergence of this intersection from the hypothesized site of vulnerability occurs in the region of HIV-1 gp120 recognized by the light chain of the VRC01-like antibodies. While the separate epitopes on gp120 do show differences in recognition surface, these primarily involve the bridging sheet region, which is likely to adopt a different conformation in the functional viral spike prior to engagement of CD4.
FIGS. 4A-4E are dendrograms and plots showing the results of deep sequencing of expressed heavy and light chains from donors 45 and 74. 454 pyrosequencing facilitates the determination of the repertoire of heavy and light chain sequences (the heavy and light chain antibodyomes). Heavy and light chain complementation, computational bioinformatics, and neutralization measurements on reconstituted chimeric antibodies provide functional assessment. FIG. 4A is a dendrogram showing heavy and light chain complementation. The neutralization profiles of VRC01 and VRC03 (donor 45), VRC-PG04 (donor 74), and VRC-CH31 (donor 0219) and their heavy and light chain chimeric swaps are depicted with 20-isolate neutralization dendrograms. Explicit neutralization IC50s are provided in Table S10. FIG. 4B is a set of plots showing the repertoire of heavy chain sequences from donor 45 (2008 sample) and donor 74 (2008 sample). Heavy chain sequences are plotted as a function of sequence identity to the heavy chain of VRC01 (left), VRC03 (middle) and VRC-PG04 (right) and of sequence divergence from putative genomic VH-alleles: upper row plots show sequences of putative IGHV1-2*02 allelic origin; lower row plots show sequences from other allelic origins. Highlighting indicates the number of 26 sequences. FIG. 4C is a set of plots showing the repertoire of expressed light chain sequences from donor 45 (2001 sample). Light chain sequences are plotted as a function of sequence identity to VRC01 (left) and VRC03 (right) light chains, and of sequence divergence from putative genomic V-gene alleles. Sequences with 2-residue deletions in the CDR L1 region (which is observed in VRC01 and VRC03) are shown as black dots. Two sequences, with 92.0% identity to VRC01 (sequence ID 181371) and with 90.3% identity to VRC03 (sequence ID 223454), are highlighted with triangles. FIG. 4D is a set of dendrograms showing the functional assessment of light chain sequences identified by deep sequencing. The neutralization profiles of sequence 181371 reconstituted with the VRC01 heavy chain (named gVRC-L1d45) and of sequence 223454 reconstituted with the VRC03 heavy chain (named gVRC-L2d45) are depicted with 20-isolate neutralization dendrograms; explicit neutralization IC50s are provided in Table S15. FIG. 4E is a set of plots showing the functional assessment of heavy chain sequences identified by deep sequencing. Heavy chain sequences from donors 45 and 74 were synthesized and expressed with either the light chain of VRC01 or VRC03 (for donor 45) or the light chain of VRC-PG04 (for donor 74) and evaluated for neutralization. Neutralizing antibodies are shown as stars and are labeled. Comprehensive expression and neutralization results are presented in Tables S14 and S15. gVRC-H(n) refers to the heavy chains with confirmed neutralization when reconstituted with the light chain of VRC-PG04 (Tables S14 and S15).
FIGS. 5A-5B are a set of phylogenetic trees and digital images of antibody structures showing the maturational similarities of VRC01-like antibodies in different donors revealed by phylogenetic analysis. The structural convergence in maturation of VRC01-like antibodies suggested similarities of their maturation processes; phylogenetic analysis revealed such similarities and allowed maturation intermediates to be inferred. FIG. 5A shows neighbor-joining phylogenetic trees of heavy chain sequences from donor 45 (left) and donor 74 (right). The donor 45 tree is rooted by the putative reverted unmutated ancestor of the heavy chain of VRC01, and also includes specific neutralizing sequences from donors 74 and 0219. Similarly, the donor 74 tree is rooted in the putative reverted unmutated ancestor of the heavy chain of VRC-PG04, and sequences from donors 45 and 0219 are included in the phylogenetic analysis. Bars representing 0.1 changes per nucleotide sequence are shown. Insets show J chain assignments for all sequences within the neutralizing subtree identified by the exogenous donor sequences. FIG. 5B shows phylogenetically inferred maturation intermediates. Backbone ribbon representations are shown for HIV-1 gp120 and the heavy chain variable domains. Critical intermediates defined from the phylogenetic tree in FIG. 5A are labeled I45, II45, III45, I74 and II74. The number of VH-gene mutations is provided (e.g. I45: 23), and the location of these is highlighted in the surface representation and indicated according to their chemistry.
FIGS. 6A-6E are a set of plots and a phylogenetic tree showing the analysis of the heavy chain antibodyome of donor 74 and identification of heavy chains with HIV-1 neutralizing activity. Identity/divergence-grid analysis, cross-donor phylogenetic analysis, and CDR H3 analysis, when coupled to functional characterization of selected heavy chain sequences, provide a means for identification of novel heavy chains with HIV-1 neutralizing activity. FIG. 6A is a plot showing identity/divergence-grid analysis. The location of the 70 synthesized heavy chains from donor 74 is shown, including neutralizing (light stars) and non-neutralizing (black stars) sequences. FIG. 6B shows a cross-donor phylogenetic analysis and CDR H3 lineage analysis. A maximum-likelihood phylogenetic tree of the 70 synthesized heavy chain sequences is rooted in the putative reverted unmutated ancestor of VRC-PG04. The probe-identified VRC-PG and VRC-CH antibodies are shown. Grid location and CDR H3 class are specified for neutralizing and non-neutralizing sequences. Within each CDR H3 class, all sequences with identical CDR H3s are highlighted in orange in the far right grids (with the number of total sequences corresponding to each CDR H3 class shown). FIG. 6C is a plot showing the expression levels of selected heavy chains reconstituted with the light chain of VRC-PG04 versus breadth of neutralization. FIG. 6D is a plot showing the neutralization potency of reconstituted phylogenetically-predicted antibodies on seven HIV-1 isolates. FIG. 6E is a plot showing the CDR H3 analysis of donor 74 heavy chain sequences. For each of the 110,386 sequences derived from the IGHV1-2*02 allele, the CDR H3 was determined and its percent identity to that of the VRC-PG04 heavy chain was graphed. The sequences with high CDR H3 identity to VRC-PG04 reside in regions of high overall heavy chain sequence identity, even for sequences with a low divergence from IGHV1-2*02.
FIGS. 7A-7C are plots and digital images showing the maturation lineages of four unique VRC01-like heavy chains in donor 74. The CDR H3 sequence, a product of V(D)J gene recombination and N nucleotide addition and removal, provides a signature to trace the lineage of a particular B cell. FIG. 7A shows the lineage analysis of CDR H3 class 3 (SEQ ID NO: 2491). Grid positions are displayed for the 390 heavy chain sequences with a CDR H3 sequence identical to the identified CDR H3 class 3. These sequences cluster into an elongated family of sequences with moderate identity to VRC-PG04. Representative sequences ranging from low to high IGHV1-2*02 sequence divergence (representing low to high levels of affinity maturation) are shown as structural models of the heavy chain variable domain, with maturation changes highlighted in surface mode and indicated by chemistry as in FIG. 5B. Sequences of displayed structures are shown in FIG. 22. Overall neutralization breadth and potency for sequence ID 13826_2 was assessed on a 20-isolate HIV-1 panel, with individual neutralization results tabulated in Table S15. FIG. 7B shows the lineage analysis of CDR H3 class 6 (SEQ ID NO: 2492), which was performed as described above. The sequence ID 10731_1 that was selected in the grid analysis and found to be neutralizing is shown as a member of this family. FIG. 7C shows the analysis of CDR H3 classes 7 (SEQ ID NO: 2493) and 8 (SEQ ID NO: 2594). Analysis of the CDR H3 of classes 7 and 8 suggests that these might be clonally related (FIG. 21). Sequences from these related classes segregate in similar ways, suggestive of related maturational pathways.
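The lineage-tracing idea described for FIGS. 7A-7C, in which reads sharing an identical CDR H3 are grouped and then ordered by divergence from the germline V gene, can be illustrated with a minimal Python sketch. The record layout, the function name group_by_cdrh3, and the example identifiers and CDR H3 strings are hypothetical and are not taken from the disclosure; the CDR H3 of each read is assumed to have been determined already.

```python
from collections import defaultdict

def group_by_cdrh3(records):
    """Group reads by identical CDR H3 and sort each group by V-gene divergence.

    records: iterable of (sequence_id, cdrh3, divergence_percent) tuples
             (a hypothetical layout chosen for this illustration).
    Returns {cdrh3: [(sequence_id, divergence_percent), ...]} with each list
    ordered from low to high divergence.
    """
    groups = defaultdict(list)
    for seq_id, cdrh3, divergence in records:
        groups[cdrh3].append((seq_id, divergence))
    for members in groups.values():
        members.sort(key=lambda m: m[1])  # low to high divergence
    return dict(groups)

# Hypothetical example data (not real reads).
example = [
    ("s1", "ARDGY", 20.0),
    ("s2", "ARDGY", 5.0),
    ("s3", "ARWFDP", 12.5),
]
lineages = group_by_cdrh3(example)
```

Each resulting group corresponds to one candidate CDR H3 class, and walking a group from its first to its last member follows increasing levels of somatic mutation.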
FIG. 8 is a set of dot plots and a table showing single RSC3-specific B cell sorting. About 20 million PBMC from donors 74 and 0219 were incubated with APC and PE labeled RSC3 and ΔRSC3, respectively. Memory B cells were selected on the basis of the presented gating strategy. The percentages of B cells that reacted with RSC3 and not ΔRSC3 within IgG+ B cells are indicated. The numbers of single B cells actually sorted were 40 from donor 74 and 26 from donor 0219. The sorter configurations are indicated in the bottom table.
FIG. 9 is a set of graphs showing antigen binding profiles of five newly isolated mAbs, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31 and VRC-CH32, as measured by ELISA. Solid symbols show mAb binding to RSC3 (top) and YU2 gp120 (bottom). Open symbols indicate mAb binding to ΔRSC3 or to the CD4bs knockout mutant of gp120, D368R.
FIG. 10 is a set of graphs of competition ELISAs showing that mAbs VRC-PG04 and VRC-CH31 are directed to the CD4bs of HIV-1 gp120. The competition ELISAs were performed with a single concentration of biotin-labeled VRC-PG04 or VRC-CH31. Unlabeled mAbs were titrated into the ELISA at increasing concentrations to evaluate the effect on VRC-PG04 or VRC-CH31 binding to RSC3 (top). The competition ELISAs were also performed with a single concentration of biotin-labeled CD4-Ig or the co-receptor binding site mAb 17b. Unlabeled mAbs were titrated into the ELISA at increasing concentrations to evaluate the effect on CD4-Ig or 17b binding to YU2 gp120 (bottom). CD4-Ig is a fusion protein of the N-terminal two domains of CD4 fused with IgG1 Fc to serve as a CD4 surrogate.
FIGS. 11A-11C are a set of digital images and a table showing that the CDR H2 and CDR L3 regions of VRC01-like antibodies show a high degree of similarity in recognition of gp120. FIG. 11A shows a comparison of the orientations of the antibodies in the gp120:antibody complexes after superimposition of gp120. The CDR H2 and CDR L3 regions of VRC01-like antibodies showed high precision alignment. FIG. 11B shows ribbon representations of VRC01, VRC03 and VRC-PG04 in the same orientation as in FIG. 11A. FIG. 11C is a table showing the pairwise root-mean-square deviation (RMSD) of CDR loops between VRC01, VRC03 and VRC-PG04.
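For readers unfamiliar with the RMSD measure tabulated in FIG. 11C, the following Python sketch shows the calculation for two sets of matched atomic coordinates. It assumes the structures have already been superimposed (for example on gp120) and that atoms are paired one-to-one; the function name rmsd and the example coordinates are illustrative only and do not reproduce the software or data used for the figure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z).

    Assumes the two coordinate sets are already superimposed and that the
    i-th atom of coords_a corresponds to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical two-atom example: one atom coincides, the other differs by 2 along z.
loop_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
loop_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(loop_a, loop_b)
```

Applying such a function to each CDR loop for each pair of antibodies would yield a pairwise table of the kind described for FIG. 11C.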
FIG. 12 is a set of graphs showing the correlations between structural convergence and antigen-interacting surface areas of antibody. (Left) A significant correlation was found between antigen-interfacing surface on CDR and average RMSD for the six CDR regions in the three available structures (VRC01, VRC03, and VRC-PG04). The point for CDR L3 of VRC03 overlaps almost perfectly with the point for CDR L3 of VRC01 and is not visible. (Right) While no correlation was found between average antigen-interfacing surface and Cα deviation for each interface residue, residues with large interface surface were observed to have low Cα deviations.
FIG. 13 is a plot of the 454 sequence distribution of donor 45 and donor 74 heavy-chain antibodyomes plotted as a function of sequence identity to VRC02 and VRC-PG04b and sequence divergence from respective germlines. Row one plots sequences of IGHV1-2*02 origin and row two plots sequences of other origins.
FIG. 14 is a plot of neutralization of expressed phylogeny-segregated sequences and sequences selected by other criteria from the donor 45 2008 heavy-chain antibodyome. Specifically, the two neutralizing sequences were selected from the phylogenetic subtree of IGHV1-2*02 sequences (see FIG. 5) where they segregate with VRC01, VRC02, VRC03, VRC-PG04 and VRC-PG04b, whereas the 11 non-neutralizing sequences were selected either from different divergence bins of the IGHV1-2*02 family with high predicted structural compatibility with known VRC01-like antibody-gp120 structure complexes or from other germline families with high divergence and large family size (see Tables S11 and S12).
FIG. 15 is a set of plots of the sequence distribution of the 454-pyrosequencing-determined donor 74 heavy-chain antibodyome (obtained from Beckman Coulter Genomics) plotted as a function of sequence identity to VRC01, VRC03 and VRC-PG04 and sequence divergence from respective germlines. Row one plots sequences of IGHV1-2*02 origin and row two plots sequences of non-IGHV1-2*02 origin.
FIG. 16 is a plot of the identity/divergence-grid assessment of the donor 74 heavy-chain 2008 antibodyome. A 10x10 grid was placed over the quadrant defined by high divergence and high sequence identity to VRC-PG04. The sequences within each square of the grid were subjected to a clustering procedure with a sequence identity cutoff of 90%. A sequence was then randomly selected from the largest cluster as a candidate. An initial set of 57 sequences was obtained using this approach. Sequences with an identity of 95% or greater to others or containing uncorrected sequencing errors were replaced by new ones selected from the grid. Note that every time a new sequence was selected, the possibility of overlapping with sequences of neighboring squares was examined using sequence clustering. A total of 56 grid-selected sequences were synthesized to assess the function of 454-pyrosequencing-determined heavy-chain sequences.
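The grid-selection procedure described for FIG. 16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the position-wise identity measure, the greedy clustering, the function names, and the example sequences and axis ranges are all hypothetical, and the later replacement step (removing sequences with 95% or greater identity to others or with sequencing errors) is not implemented.

```python
import random
from collections import defaultdict

def percent_identity(a, b):
    """Position-wise identity (%) over the shorter sequence; assumes pre-aligned input."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n

def greedy_clusters(seqs, cutoff=90.0):
    """Put each sequence in the first cluster whose representative it matches at >= cutoff."""
    clusters = []  # element 0 of each cluster is its representative
    for s in seqs:
        for c in clusters:
            if percent_identity(s, c[0]) >= cutoff:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def grid_select(records, id_range, div_range, n=10, cutoff=90.0, seed=0):
    """records: iterable of (sequence, identity_percent, divergence_percent).

    Bins records into an n x n grid over the given identity and divergence
    ranges, clusters each occupied square, and randomly picks one sequence
    from the largest cluster. Returns {(row, col): chosen_sequence}.
    """
    rng = random.Random(seed)
    (id_lo, id_hi), (div_lo, div_hi) = id_range, div_range
    cells = defaultdict(list)
    for seq, ident, div in records:
        if not (id_lo <= ident <= id_hi and div_lo <= div <= div_hi):
            continue  # outside the quadrant of interest
        row = min(int((ident - id_lo) / (id_hi - id_lo) * n), n - 1)
        col = min(int((div - div_lo) / (div_hi - div_lo) * n), n - 1)
        cells[(row, col)].append(seq)
    picks = {}
    for cell, seqs in cells.items():
        largest = max(greedy_clusters(seqs, cutoff), key=len)
        picks[cell] = rng.choice(largest)
    return picks

# Hypothetical toy input (not real antibody sequences).
toy = [
    ("AAAAAAAAAA", 96.0, 25.0),
    ("AAAAAAAAAT", 96.5, 25.5),
    ("CCCCCCCCCC", 96.2, 25.2),
    ("GGGGGGGGGG", 62.0, 13.0),
]
picks = grid_select(toy, id_range=(50.0, 100.0), div_range=(10.0, 30.0))
```

Fixing the random seed makes the selection reproducible; a production workflow would more likely rely on an established clustering tool than on the simple greedy pass shown here.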
FIG. 17 is a plot of a phylogenetic tree in which additional sequences were selected to enhance the coverage of phylogeny-segregated sequences. In the iterative phylogenetic analysis of the IGHV1-2*02 family of the donor 74 2008 heavy-chain antibodyome, 5047 sequences were found to segregate with VRC01, VRC02, VRC03 and VRC-PG04 on a distinct branch. A neighbor-joining (NJ) tree of these 5047 sequences, rooted at the inferred VRC-PG04 germline, is shown in this figure. 38 out of the 57 identity/divergence-grid-derived sequences were found within these sequences and are labeled. 7 additional sequences were selected to represent unoccupied branches and are labeled by rectangles.
FIG. 18 is a phylogenetic tree of 98 sequences from the donor 45 light-chain 2001 antibodyome that have the same VRC01-like and VRC03-like deletions. The maximum likelihood (ML) tree is rooted at IGKV3-11*01, the VRC01 light-chain V-gene germline, which is highlighted in green. The known VRC01-like antibody light-chain sequences are colored in light grey and the two synthesized sequences that show functional complementation with VRC01-like heavy chains are highlighted in grey.
FIG. 19 is a sequence alignment of CDR H3 classification of 35 expressed and experimentally tested heavy-chain sequences (SEQ ID NOs: 2502-2536) in the neutralization tree shown in FIG. 6, with the J gene of each CDR H3 class listed in parentheses. The germline sequence, IGHV1-2*02, is used as reference in the sequence alignment and the VRC-PG04 heavy-chain sequence is included for comparison. Amino acids in the variable region that are different from IGHV1-2*02 are highlighted. Note that 22 of the 35 sequences showed neutralizing activity.
FIG. 20 is a sequence alignment of CDRH3 analysis of expressed heavy chain sequences from donor 74. CDRH3 and HJ alignments of nucleotide and amino acid for CDRH3 classes 1-6 sequences, aligned to the putative V, D and J germline genes. Putative nucleotide excisions are indicated with strikethrough lines. In grey are the putative TdT N additions in V-D and D-J junctions. In lighter grey are mutations from the putative germline genes and the TdT N additions.
FIG. 21 is a sequence alignment of CDRH3 analysis of expressed heavy chain sequences from donor 74. CDRH3 and HJ alignments of nucleotide and amino acid for VRC-PG04, 04b and their clonally related sequences, aligned to the putative V, D and J germline genes. Putative nucleotide excisions are indicated with strikethrough lines. In grey are the putative TdT N additions in V-D and D-J junctions. In lighter grey are mutations from the putative germline genes and the TdT N additions. The alignment analysis suggested that the CDRH3 classes 7 and 8 might be clonally related, as indicated by conserved V-D and D-J junctions, despite a deletion "." in the CDRH3 region. The non-neutralizing sequences are shown in italics.
FIG. 22 is a sequence alignment of maturation intermediates in CDR H3 classes 3, 6, 7 and 8 shown in FIG. 7. The neutralizing heavy-chain sequences are highlighted in grey and the CDR H3 region is circled by a dotted line.
FIG. 23 shows the amino acid frequencies in the VH domains of VRC01-like neutralizing antibodies. A sequence alignment was generated for the VH domains of the twenty-two identified neutralizing sequences from donor 74, along with VRC01, VRC02, VRC03, VRC-PG04, and VRC-PG04b. The amino acid frequencies for each of the VH residue positions were plotted using WebLogo. The height of each letter is proportional to the frequency with which the respective amino acid type is observed for the given residue position. The IGHV1-2*02 germline sequence is shown for comparison; insertions with respect to IGHV1-2*02 were not included in this analysis. For each residue position, the amino acid identity of IGHV1-2*02 is shown in maroon, while all other amino acid types are shown in black. Residue positions for which the IGHV1-2*02 identity is of low or zero frequency could indicate affinity maturation changes of functional significance.
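The per-position frequency calculation underlying a plot such as FIG. 23 can be illustrated with a short Python sketch. It assumes an alignment of equal-length sequences with insertions already removed; the function names, the 0.25 threshold, and the four-residue toy alignment are hypothetical and stand in for the actual VH alignment and the WebLogo rendering step.

```python
from collections import Counter

def position_frequencies(aligned):
    """Return, for each alignment column, a dict {residue: fraction observed}."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    freqs = []
    for column in zip(*aligned):
        counts = Counter(column)
        freqs.append({aa: n / len(column) for aa, n in counts.items()})
    return freqs

def low_germline_positions(aligned, germline, threshold=0.25):
    """0-based positions where the germline residue frequency is <= threshold."""
    freqs = position_frequencies(aligned)
    return [i for i, f in enumerate(freqs) if f.get(germline[i], 0.0) <= threshold]

# Hypothetical toy alignment (not real VH sequences).
toy_alignment = ["QVQL", "QVRL", "QARL", "QARL"]
toy_germline = "QVQL"
freqs = position_frequencies(toy_alignment)
flagged = low_germline_positions(toy_alignment, toy_germline)
```

Positions returned by low_germline_positions correspond to columns where the germline residue is rare or absent among the aligned sequences, which is the pattern the figure description associates with possible affinity maturation changes.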
FIG. 24 is Tables S1 and S2.
FIG. 25 is Tables S3a and S3b.
FIG. 26 is Tables S3c and S3d.
FIG. 27 is Tables S3d, S3e and S3f.
FIG. 28 is Tables S3g and S3h.
FIG. 29 is Table S4.
FIG. 30 is Tables S5a and S5b.
FIG. 31 is Tables S5c and S5d.
FIG. 32 is Tables S6a and S6b.
FIG. 33 is Tables S6c and S6d.
FIG. 34 is Tables S7 and S8.
FIG. 35 is Tables S9 and S10.
FIG. 36 is Tables S11 and S12.
FIG. 37 is Tables S13 (SEQ ID NOs: 2537-2579) and S14 (SEQ ID NOs: 2580-2623).
FIG. 38 is Table S14 (SEQ ID NOs: 1707-1755) continued.
FIG. 39 is Tables S15 and S16.
FIG. 40 is a flow diagram of the building of a neighbor-joining tree.
FIG. 41 is a table showing the number of cross-donor positive sequence reads from an exemplary analysis.
FIG. 42 is a heat map plot showing identity to a VRC antibody and divergence from the respective germline.
FIGS. 43A and 43B are tables of results from all-origin and IGHV1-2 origin cross-donor phylogenetic analyses.
FIG. 44 is a set of Venn diagrams of results from all-origin and IGHV1-2 origin cross-donor phylogenetic analyses.
FIG. 45 is a set of Venn diagrams of rooted and non-rooted cross-donor phylogenetic analyses.
FIG. 46 is a set of graphs of the percent of the segregated antibodies in a rooted analysis and an unrooted analysis.
FIG. 47 is a phylogenetic tree, wherein all native/reference heavy chain antibody nucleotide sequences segregate in a subtree that does not contain the germline sequence.
FIG. 48 is a phylogenetic tree. Rooting the tree on IGHV1-2*02 changes the number of sequences in the cross-donor phylogenetic analysis.
FIG. 49A is a table of data from an exemplary analysis.
FIG. 49B is a graph showing the percent of segregation to the initial input and the iteration of cross-donor runs.
FIG. 50A is a table of data from an exemplary analysis.
FIG. 50B is a graph showing the percent of segregation to the initial input and the iteration of cross-donor runs.
FIG. 51A is a table showing the overlap of results from all-origin and IGHV1-2 origin cross-donor phylogenetic analyses.
FIG. 51B is a Venn diagram showing the overlap of results from all-origin and IGHV1-2 origin cross donor phylogenetic analyses.
FIG. 52A is a table of data from an exemplary analysis, following 12 iterations.
FIG. 52B is a Venn diagram of the data from an exemplary analysis, following 12 iterations.
FIG. 53B is a set of plots of heat maps. The all-origin cross-donor analysis identified >99 of the VRC01-like antibodies from the input data set.
FIG. 54 is a graph showing the percent of antibodies with WGXG start positioning.
FIG. 55 is a set of bar graphs of V-gene family assignments.
FIG. 56 is a set of bar graphs of J-gene family assignments.
FIG. 57 is a table of data illustrating the number of correct germline assignments corresponding to the indicated germline assignment programs.
FIG. 58 is a set of bar graphs of V-gene family assignments.
FIG. 59 is a set of bar graphs of J-gene family assignments.
FIGS. 60A-60D are tables showing HIV-1 neutralization by the indicated complemented antibody heavy and light chains.
FIG. 61A is a graph showing results of a neutralization assay.
FIG. 61B is a heat map showing sequence identity and germline divergence of the indicated antibody heavy chains.
FIG. 61C is a phylogenic tree of a cross-donor phylogenetic analysis.
FIG. 62A is a bar graph showing germline origin of sequence reads.
FIG. 62B is a set of pie charts and graphical representations of relatedness.
FIG. 63 is a schematic diagram comparing methods for functional antibody identification.
FIG. 64 is a bar graph showing the read length distribution.
FIG. 65 is a phylogenic tree of a cross-donor phylogenetic analysis.
FIG. 66 is data from deep sequencing analysis of donor 200-384.
FIG. 67A is a table entitled "Structural Studies on CD4-Binding Antibodies."
FIG. 67B is a diagram of the conformation of gp120 when bound to VRC01-like antibodies.
FIG. 68 shows the amino acid sequences of the IGKV3-11*01 (SEQ ID NO: 2488) and VRC01 (SEQ ID NO: 1624) light chains.
FIG. 69 illustrates the light chain contact residues.
FIG. 70 is a diagram showing the position of the CDR L3 in bound form.
FIG. 71 is a schematic diagram showing a conformation of loop D and VRC01 CDR L1. VK3-11*01 (SEQ ID NO: 2488), VRC01 (SEQ ID NO: 1624), VK3-20*01 (SEQ ID NO: 2489), VRC03 (SEQ ID NO: 1626), VRC-PG04 (SEQ ID NO: 1619), VK1-33*01 (SEQ ID NO: 2490), VRC-CH31 (SEQ ID NO: 1622), 12A21 (SEQ ID NO: 2460), LV2-14*01 (SEQ ID NO: 2624), and VRC-PG20 (SEQ ID NO: 1652) sequences are shown.
FIG. 72 is a diagram of conserved residues in CDR L3 that stabilize loop D and the base of V5 on gp120. Partial VK3-11*01 (SEQ ID NO: 2488), VRC01 (SEQ ID NO: 1624), VK3-20*01 (SEQ ID NO: 2489), VRC03 (SEQ ID NO: 1626), VRC-PG04 (SEQ ID NO: 1619), VK1-33*01 (SEQ ID NO: 2490), and VRC-CH31 (SEQ ID NO: 1622) sequences are shown.
FIG. 73 is a table of data showing that the light chains of VRC01-like antibodies have diverse origins.
FIG. 74 is a set of sequences and a schematic showing the conserved hydrophobic-Glu motif. VK3-11*01 (SEQ ID NO: 2488), VRC01 (SEQ ID NO: 1624), VK3-20*01 (SEQ ID NO: 2489), VRC03 (SEQ ID NO: 1626), VRC-PG04 (SEQ ID NO: 1619), VK1-33*01 (SEQ ID NO: 2490), VRC-CH31 (SEQ ID NO: 1622), LV2-14*01 (SEQ ID NO: 2624), VRC-PG20 (SEQ ID NO: 1652), 12A21 (SEQ ID NO: 2460), and 12A12 (SEQ ID NO: 2458) sequences are shown.
FIG. 75 is a bar graph of the distribution of sequence reads for a donor sample (donor 45).
FIG. 76 is a bar graph of the germline family distribution for a donor sample (donor 45).
FIG. 77 is a heat map for donor 45 deep sequencing. VRC01-like light chains are labeled as black dots.
FIG. 78 is a bar graph of the distribution of sequence reads for a donor sample (donor 57).
FIG. 79 is a bar graph of the germline family distribution for a donor sample (donor 57).
FIG. 80 is a heat map for donor 57 deep sequencing. VRC01-like light chains are labeled as black dots.
SEQUENCE LISTING
The nucleic and amino acid sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and three letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand. In the accompanying sequence listing:
SEQ ID NOs: 1-2 are the nucleic acid sequences of VRC01-like heavy chains isolated from donor 45 (and deposited as GENBANK® Accession Numbers JN159474 - JN159475).
SEQ ID NOs: 3-4 are the nucleic acid sequences of VRC01-like light chains isolated from donor 45 (and deposited as GENBANK® Accession Numbers JN159468 - JN159469).
SEQ ID NOs: 5-7 are the nucleic acid sequences of heavy chains from the VRC-CH30, VRC-CH31, and VRC-CH32 antibodies (and deposited as GENBANK® Accession Numbers JN159468 - JN159469).
SEQ ID NOs: 8-10 are the nucleic acid sequences of light chains from the VRC-CH30, VRC-CH31, and VRC-CH32 antibodies.
SEQ ID NOs: 11-34 are the nucleic acid sequences of VRC01-like heavy chains isolated from donor 74 (and deposited as GENBANK® Accession Numbers JN159440 - JN159463).
SEQ ID NOs: 35-36 are the nucleic acid sequences of light chains from the VRC-PG04 and VRC-PG04b antibodies (and deposited as GENBANK® Accession Numbers JN159466 - JN159467).
SEQ ID NOs: 37-38 are the nucleic acid sequences of heavy chains from the VRC-PG04 and VRC-PG04b antibodies (and deposited as GENBANK® Accession Numbers JN159464 - JN159465).
SEQ ID NOs: 39-40 are the nucleic acid sequences of heavy chains from the VRC-CH33 and VRC-CH34 antibodies (and deposited as GENBANK® Accession Numbers JN159470 - JN159471).
SEQ ID NOs: 41-42 are the nucleic acid sequences of light chains from the VRC-CH33 and VRC-CH34 antibodies (and deposited as GENBANK® Accession Numbers JN159472 - JN159473).
SEQ ID NOs: 43-1603 are the nucleic acid sequences of heavy chain sequences of exemplary VRC01-like antibodies (and deposited as GENBANK® Accession Numbers JN157873-JN159433).
SEQ ID NOs: 1604-1608 are the nucleic acid sequences of the IGHV1-02*01-*05 germlines, respectively.
SEQ ID NO: 1609 is the amino acid sequence of VRC-PG04 heavy chain.
SEQ ID NO: 1610 is the amino acid sequence of VRC-PG04b heavy chain.
SEQ ID NO: 1611 is the amino acid sequence of VRC-CH30 heavy chain.
SEQ ID NO: 1612 is the amino acid sequence of VRC-CH31 heavy chain.
SEQ ID NO: 1613 is the amino acid sequence of VRC-CH32 heavy chain.
SEQ ID NO: 1614 is the amino acid sequence of VRC01 heavy chain.
SEQ ID NO: 1615 is the amino acid sequence of VRC02 heavy chain.
SEQ ID NO: 1616 is the amino acid sequence of VRC03 heavy chain.
SEQ ID NO: 1617 is the nucleic acid sequence of a primer.
SEQ ID NO: 1618 is the nucleic acid sequence of a primer.
SEQ ID NO: 1619 is the amino acid sequence of VRC-PG04 kappa chain.
SEQ ID NO: 1620 is the amino acid sequence of VRC-PG04b kappa chain.
SEQ ID NO: 1621 is the amino acid sequence of VRC-CH30 kappa chain.
SEQ ID NO: 1622 is the amino acid sequence of VRC-CH31 kappa chain.
SEQ ID NO: 1623 is the amino acid sequence of VRC-CH32 kappa chain.
SEQ ID NO: 1624 is the amino acid sequence of VRC01 kappa chain.
SEQ ID NO: 1625 is the amino acid sequence of VRC02 kappa chain.
SEQ ID NO: 1626 is the amino acid sequence of VRC03 kappa chain.
SEQ ID NO: 1627 is the amino acid sequence of VRC01b heavy chain.
SEQ ID NO: 1628 is the amino acid sequence of VRC01b kappa chain.
SEQ ID NO: 1629 is the amino acid sequence of VRC03b heavy chain.
SEQ ID NO: 1630 is the amino acid sequence of VRC03b kappa chain.
SEQ ID NO: 1631 is the amino acid sequence of VRC03c heavy chain.
SEQ ID NO: 1632 is the amino acid sequence of VRC06 heavy chain.
SEQ ID NO: 1633 is the amino acid sequence of VRC06 kappa chain.
SEQ ID NO: 1634 is the amino acid sequence of VRC06b heavy chain.
SEQ ID NO: 1635 is the amino acid sequence of VRC06b kappa chain.
SEQ ID NO: 1636 is the amino acid sequence of VRC07 heavy chain.
SEQ ID NO: 1637 is the amino acid sequence of VRC07b heavy chain.
SEQ ID NO: 1638 is the amino acid sequence of VRC07b kappa chain.
SEQ ID NO: 1639 is the amino acid sequence of VRC07c heavy chain.
SEQ ID NO: 1640 is the amino acid sequence of VRC07c kappa chain.
SEQ ID NO: 1641 is the amino acid sequence of VRC08 heavy chain.
SEQ ID NO: 1642 is the amino acid sequence of VRC08b heavy chain.
SEQ ID NO: 1643 is the amino acid sequence of VRC17 heavy chain.
SEQ ID NO: 1644 is the amino acid sequence of VRC18 heavy chain.
SEQ ID NO: 1645 is the amino acid sequence of VRC18b heavy chain.
SEQ ID NO: 1646 is the amino acid sequence of VRC18b kappa chain.
SEQ ID NO: 1647 is the amino acid sequence of VRC-PG19 heavy chain.
SEQ ID NO: 1648 is the amino acid sequence of VRC-PG19 lambda chain.
SEQ ID NO: 1649 is the amino acid sequence of VRC-PG19b heavy chain.
SEQ ID NO: 1650 is the amino acid sequence of VRC-PG19b lambda chain.
SEQ ID NO: 1651 is the amino acid sequence of VRC-PG20 heavy chain.
SEQ ID NO: 1652 is the amino acid sequence of VRC-PG20 lambda chain.
SEQ ID NO: 1653 is the amino acid sequence of VRC-PG20b heavy chain.
SEQ ID NO: 1654 is the amino acid sequence of VRC-PG20b lambda chain.
SEQ ID NO: 1655 is the amino acid sequence of VRC23 heavy chain.
SEQ ID NO: 1656 is the amino acid sequence of VRC23 kappa chain.
SEQ ID NO: 1657 is the amino acid sequence of VRC23b heavy chain.
SEQ ID NO: 1658 is the amino acid sequence of VRC23b kappa chain.
SEQ ID NO: 1659 is the amino acid sequence of VRC-CH33 heavy chain.
SEQ ID NO: 1660 is the amino acid sequence of VRC-CH33 kappa chain.
SEQ ID NO: 1661 is the amino acid sequence of VRC-CH34 heavy chain.
SEQ ID NO: 1662 is the amino acid sequence of VRC-CH34 kappa chain.
SEQ ID NO: 1663 is an exemplary nucleic acid sequence encoding VRC-PG04 heavy chain.
SEQ ID NO: 1664 is an exemplary nucleic acid sequence encoding VRC-PG04b heavy chain.
SEQ ID NO: 1665 is an exemplary nucleic acid sequence encoding VRC-CH30 heavy chain.
SEQ ID NO: 1666 is an exemplary nucleic acid sequence encoding VRC-CH31 heavy chain.
SEQ ID NO: 1667 is an exemplary nucleic acid sequence encoding VRC-CH32 heavy chain.
SEQ ID NO: 1668 is an exemplary nucleic acid sequence encoding VRC01 heavy chain.
SEQ ID NO: 1669 is an exemplary nucleic acid sequence encoding VRC02 heavy chain.
SEQ ID NO: 1670 is an exemplary nucleic acid sequence encoding VRC03 heavy chain.
SEQ ID NO: 1671 is an exemplary nucleic acid sequence encoding VRC-PG04 kappa chain.
SEQ ID NO: 1672 is an exemplary nucleic acid sequence encoding VRC-PG04b kappa chain.
SEQ ID NO: 1673 is an exemplary nucleic acid sequence encoding VRC-CH30 kappa chain.
SEQ ID NO: 1674 is an exemplary nucleic acid sequence encoding VRC-CH31 kappa chain.
SEQ ID NO: 1675 is an exemplary nucleic acid sequence encoding VRC-CH32 kappa chain.
SEQ ID NO: 1676 is an exemplary nucleic acid sequence encoding VRC01 kappa chain.
SEQ ID NO: 1677 is an exemplary nucleic acid sequence encoding VRC02 kappa chain.
SEQ ID NO: 1678 is an exemplary nucleic acid sequence encoding VRC03 kappa chain.
SEQ ID NO: 1679 is an exemplary nucleic acid sequence encoding VRC01b heavy chain.
SEQ ID NO: 1680 is an exemplary nucleic acid sequence encoding VRC01b kappa chain.
SEQ ID NO: 1681 is an exemplary nucleic acid sequence encoding VRC03b heavy chain.
SEQ ID NO: 1682 is an exemplary nucleic acid sequence encoding VRC03b kappa chain.
SEQ ID NO: 1683 is an exemplary nucleic acid sequence encoding VRC03c heavy chain.
SEQ ID NO: 1684 is an exemplary nucleic acid sequence encoding VRC06 heavy chain.
SEQ ID NO: 1685 is an exemplary nucleic acid sequence encoding VRC06 kappa chain.
SEQ ID NO: 1686 is an exemplary nucleic acid sequence encoding VRC06b heavy chain.
SEQ ID NO: 1687 is an exemplary nucleic acid sequence encoding VRC06b kappa chain.
SEQ ID NO: 1688 is an exemplary nucleic acid sequence encoding VRC07 heavy chain.
SEQ ID NO: 1689 is an exemplary nucleic acid sequence encoding VRC07b heavy chain.
SEQ ID NO: 1690 is an exemplary nucleic acid sequence encoding VRC07b kappa chain.
SEQ ID NO: 1691 is an exemplary nucleic acid sequence encoding VRC07c heavy chain.
SEQ ID NO: 1692 is an exemplary nucleic acid sequence encoding VRC07c kappa chain.
SEQ ID NO: 1693 is an exemplary nucleic acid sequence encoding VRC08 heavy chain.
SEQ ID NO: 1694 is an exemplary nucleic acid sequence encoding VRC08b heavy chain.
SEQ ID NO: 1695 is an exemplary nucleic acid sequence encoding VRC17 heavy chain.
SEQ ID NO: 1696 is an exemplary nucleic acid sequence encoding VRC18 heavy chain.
SEQ ID NO: 1697 is an exemplary nucleic acid sequence encoding VRC18b heavy chain.
SEQ ID NO: 1698 is an exemplary nucleic acid sequence encoding VRC18b kappa chain.
SEQ ID NO: 1699 is an exemplary nucleic acid sequence encoding VRC-PG19 heavy chain.
SEQ ID NO: 1700 is an exemplary nucleic acid sequence encoding VRC-PG19 lambda chain.
SEQ ID NO: 1701 is an exemplary nucleic acid sequence encoding VRC-PG19b heavy chain.
SEQ ID NO: 1702 is an exemplary nucleic acid sequence encoding VRC-PG19b lambda chain.
SEQ ID NO: 1703 is an exemplary nucleic acid sequence encoding VRC-PG20 heavy chain.
SEQ ID NO: 1704 is an exemplary nucleic acid sequence encoding VRC-PG20 lambda chain.
SEQ ID NO: 1705 is an exemplary nucleic acid sequence encoding VRC-PG20b heavy chain.
SEQ ID NO: 1706 is an exemplary nucleic acid sequence encoding VRC-PG20b lambda chain.
SEQ ID NO: 1707 is an exemplary nucleic acid sequence encoding VRC23 heavy chain.
SEQ ID NO: 1708 is an exemplary nucleic acid sequence encoding VRC23 kappa chain.
SEQ ID NO: 1709 is an exemplary nucleic acid sequence encoding VRC23b heavy chain.
SEQ ID NO: 1710 is an exemplary nucleic acid sequence encoding VRC23b kappa chain.
SEQ ID NO: 1711 is an exemplary nucleic acid sequence encoding VRC-CH33 heavy chain.
SEQ ID NO: 1712 is an exemplary nucleic acid sequence encoding VRC-CH33 kappa chain.
SEQ ID NO: 1713 is an exemplary nucleic acid sequence encoding VRC-CH34 heavy chain.
SEQ ID NO: 1714 is an exemplary nucleic acid sequence encoding VRC-CH34 kappa chain.
SEQ ID NOs: 1715-2414 are the amino acid sequences of the heavy chains of VRC01-like antibodies (and correspond to SEQ ID NOs: 760-1459 of PCT International Application No. PCT/US2010/050295, filed September 24, 2010).
SEQ ID NO: 2415 is the amino acid sequence of VRC13 heavy chain.
SEQ ID NO: 2416 is the amino acid sequence of VRC13 lambda chain.
SEQ ID NO: 2417 is the amino acid sequence of VRC14 heavy chain.
SEQ ID NO: 2418 is the amino acid sequence of VRC14 lambda chain.
SEQ ID NO: 2419 is the amino acid sequence of VRC14b heavy chain.
SEQ ID NO: 2420 is the amino acid sequence of VRC14b lambda chain.
SEQ ID NO: 2421 is the amino acid sequence of VRC14c heavy chain.
SEQ ID NO: 2422 is the amino acid sequence of VRC14c lambda chain.
SEQ ID NO: 2423 is the amino acid sequence of VRC15 heavy chain.
SEQ ID NO: 2424 is the amino acid sequence of VRC15 lambda chain.
SEQ ID NO: 2425 is the amino acid sequence of VRC16 heavy chain.
SEQ ID NO: 2426 is the amino acid sequence of VRC16 kappa chain.
SEQ ID NO: 2427 is the amino acid sequence of VRC16b heavy chain.
SEQ ID NO: 2428 is the amino acid sequence of VRC16b kappa chain.
SEQ ID NO: 2429 is the amino acid sequence of VRC16c heavy chain.
SEQ ID NO: 2430 is the amino acid sequence of VRC16c kappa chain.
SEQ ID NO: 2431 is the amino acid sequence of VRC16d heavy chain.
SEQ ID NO: 2432 is the amino acid sequence of VRC16d kappa chain.
SEQ ID NO: 2433 is an exemplary nucleic acid sequence encoding VRC13 heavy chain.
SEQ ID NO: 2434 is an exemplary nucleic acid sequence encoding VRC13 lambda chain.
SEQ ID NO: 2435 is an exemplary nucleic acid sequence encoding VRC14 heavy chain.
SEQ ID NO: 2436 is an exemplary nucleic acid sequence encoding VRC14 lambda chain.
SEQ ID NO: 2437 is an exemplary nucleic acid sequence encoding VRC14b heavy chain.
SEQ ID NO: 2438 is an exemplary nucleic acid sequence encoding VRC14b lambda chain.
SEQ ID NO: 2439 is an exemplary nucleic acid sequence encoding VRC14c heavy chain.
SEQ ID NO: 2440 is an exemplary nucleic acid sequence encoding VRC14c lambda chain.
SEQ ID NO: 2441 is an exemplary nucleic acid sequence encoding VRC15 heavy chain.
SEQ ID NO: 2442 is an exemplary nucleic acid sequence encoding VRC15 lambda chain.
SEQ ID NO: 2443 is an exemplary nucleic acid sequence encoding VRC16 heavy chain.
SEQ ID NO: 2444 is an exemplary nucleic acid sequence encoding VRC16 kappa chain.
SEQ ID NO: 2445 is an exemplary nucleic acid sequence encoding VRC16b heavy chain.
SEQ ID NO: 2446 is an exemplary nucleic acid sequence encoding VRC16b kappa chain.
SEQ ID NO: 2447 is an exemplary nucleic acid sequence encoding VRC16c heavy chain.
SEQ ID NO: 2448 is an exemplary nucleic acid sequence encoding VRC16c kappa chain.
SEQ ID NO: 2449 is an exemplary nucleic acid sequence encoding VRC16d heavy chain.
SEQ ID NO: 2450 is an exemplary nucleic acid sequence encoding VRC16d kappa chain.
SEQ ID NO: 2451 is the amino acid sequence of NIH45_46 heavy chain.
SEQ ID NO: 2452 is the amino acid sequence of NIH45_46 kappa chain.
SEQ ID NO: 2453 is the amino acid sequence of 3BNC60 heavy chain.
SEQ ID NO: 2454 is the amino acid sequence of 3BNC60 kappa chain.
SEQ ID NO: 2455 is the amino acid sequence of 3BNC117 heavy chain.
SEQ ID NO: 2456 is the amino acid sequence of 3BNC117 kappa chain.
SEQ ID NO: 2457 is the amino acid sequence of 12A12 heavy chain.
SEQ ID NO: 2458 is the amino acid sequence of 12A12 kappa chain.
SEQ ID NO: 2459 is the amino acid sequence of 12A21 heavy chain.
SEQ ID NO: 2460 is the amino acid sequence of 12A21 kappa chain.
SEQ ID NO: 2461 is an exemplary nucleic acid sequence encoding NIH45_46 heavy chain.
SEQ ID NO: 2462 is an exemplary nucleic acid sequence encoding NIH45_46 kappa chain.
SEQ ID NO: 2463 is an exemplary nucleic acid sequence encoding 3BNC60 heavy chain.
SEQ ID NO: 2464 is an exemplary nucleic acid sequence encoding 3BNC60 kappa chain.
SEQ ID NO: 2465 is an exemplary nucleic acid sequence encoding 3BNC117 heavy chain.
SEQ ID NO: 2466 is an exemplary nucleic acid sequence encoding 3BNC117 kappa chain.
SEQ ID NO: 2467 is an exemplary nucleic acid sequence encoding 12A12 heavy chain.
SEQ ID NO: 2468 is an exemplary nucleic acid sequence encoding 12A12 kappa chain.
SEQ ID NO: 2469 is an exemplary nucleic acid sequence encoding 12A21 heavy chain.
SEQ ID NO: 2470 is an exemplary nucleic acid sequence encoding 12A21 kappa chain.
SEQ ID NO: 2471 is the amino acid sequence of 1NC9 heavy chain.
SEQ ID NO: 2472 is the amino acid sequence of 1NC9 lambda chain.
SEQ ID NO: 2473 is the amino acid sequence of 1B2530 heavy chain.
SEQ ID NO: 2474 is the amino acid sequence of 1B2530 lambda chain.
SEQ ID NO: 2475 is the amino acid sequence of 8ANC131 heavy chain.
SEQ ID NO: 2476 is the amino acid sequence of 8ANC131 kappa chain.
SEQ ID NO: 2477 is the amino acid sequence of 8ANC134 heavy chain.
SEQ ID NO: 2478 is the amino acid sequence of 8ANC134 kappa chain.
SEQ ID NO: 2479 is an exemplary nucleic acid sequence encoding 1NC9 heavy chain.
SEQ ID NO: 2480 is an exemplary nucleic acid sequence encoding 1NC9 lambda chain.
SEQ ID NO: 2481 is an exemplary nucleic acid sequence encoding 1B2530 heavy chain.
SEQ ID NO: 2482 is an exemplary nucleic acid sequence encoding 1B2530 lambda chain.
SEQ ID NO: 2483 is an exemplary nucleic acid sequence encoding 8ANC131 heavy chain.
SEQ ID NO: 2484 is an exemplary nucleic acid sequence encoding 8ANC131 kappa chain.
SEQ ID NO: 2485 is an exemplary nucleic acid sequence encoding 8ANC134 heavy chain.
SEQ ID NO: 2486 is an exemplary nucleic acid sequence encoding 8ANC134 kappa chain.
SEQ ID NO: 2487 is the amino acid sequence of IGHV1-02*02 heavy chain.
SEQ ID NO: 2488 is the amino acid sequence of IGKV3-11*01 light chain.
SEQ ID NO: 2489 is the amino acid sequence of IGKV3-20*01 light chain.
SEQ ID NO: 2490 is the amino acid sequence of IGKV3-33*01 light chain.
SEQ ID NO: 2491 is the amino acid sequence of the lineage analysis of CDR H3 class 3.
SEQ ID NO: 2492 is the amino acid sequence of the lineage analysis of CDR H3 class 6.
SEQ ID NO: 2493 is the amino acid sequence of the lineage analysis of CDR H3 class 7.
SEQ ID NO: 2494 is the amino acid sequence of the lineage analysis of CDR H3 class 8.
SEQ ID NOs: 2495-2501 are the nucleic acid sequences of primers.
SEQ ID NOs: 2502-2536 are amino acid sequences of CDR H3 of 35 expressed and experimentally tested heavy-chain sequences.
SEQ ID NOs: 2537-2623 are amino acid sequences of the heavy chain variable regions of exemplary VRC01-like antibodies.
SEQ ID NO: 2624 is the amino acid sequence of LV2-14*01.
The Sequence Listing is submitted as an ASCII text file in the form of the file named Sequence.txt, which was created on March 23, 2012, and is ~1.9 MB, which is incorporated by reference herein.
DETAILED DESCRIPTION OF SEVERAL EMBODIMENTS
The isolation of broadly neutralizing antibodies (bNabs) against HIV-1 has generally relied on moderate-throughput screening procedures utilizing either micro-neutralization assays or antigen-specific B-cell sorting followed by cloning and sequencing of the expressed antibody transcripts. Despite marked success, these methods typically yield only a few antibodies per donor sample. Advances in next-generation sequencing (NGS) technology now make it possible to determine millions of antibody sequences from a sample of donor B cells, thus providing an overall view of the repertoire of antibodies expressed by the donor. Despite the availability of such data, using it to isolate specific antibodies presents significant challenges. A new method for analysis of antibody variable region deep sequencing data, cross-donor phylogenetic analysis (CDPA), is described; it enables the identification of up to thousands of new antibody sequences related to a small set of known sequences provided as a search template. Additional methods for VRC01-like antibody heavy and light chain identification are also provided, as well as novel VRC01-like antibodies, heavy chains and light chains.
I. Summary of Terms
Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710).
The singular terms "a," "an," and "the" include plural referents unless context clearly indicates otherwise. Similarly, the word "or" is intended to include "and" unless the context clearly indicates otherwise. The term "comprises" means "includes." In case of conflict, the present specification, including explanations of terms, will control.
To facilitate review of the various embodiments of this disclosure, the following explanations of terms are provided:
Administration: The introduction of a composition into a subject by a chosen route. Administration can be local or systemic. For example, if the chosen route is intravenous, the composition is administered by introducing the composition into a vein of the subject. In some examples a disclosed antibody specific for an HIV protein or polypeptide is administered to a subject.
Amino acid substitution: The replacement of one amino acid in a peptide with a different amino acid.
Amplification: A technique that increases the number of copies of a nucleic acid molecule (such as an RNA or DNA). An example of amplification is the polymerase chain reaction, in which a biological sample is contacted with a pair of oligonucleotide primers, under conditions that allow for the hybridization of the primers to a nucleic acid template in the sample. The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid. The product of amplification can be characterized by electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing using standard techniques. Other examples of amplification include strand displacement amplification, as disclosed in U.S. Patent No. 5,744,311; transcription-free isothermal amplification, as disclosed in U.S. Patent No. 6,033,881; repair chain reaction amplification, as disclosed in WO 90/01069; ligase chain reaction amplification, as disclosed in EP-A-320 308; gap filling ligase chain reaction amplification, as disclosed in U.S. Patent No. 5,427,930; and NASBA™ RNA transcription-free amplification, as disclosed in U.S. Patent No. 6,025,134.
Animal: Living multi-cellular vertebrate organisms, a category that includes, for example, mammals and birds. The term mammal includes both human and non-human mammals. Similarly, the term "subject" includes both human and veterinary subjects.
Antibody: A polypeptide substantially encoded by an immunoglobulin gene or immunoglobulin genes, or antigen binding fragments thereof, which specifically binds and recognizes an analyte (antigen) such as gp120, or an antigenic fragment of gp120. Immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as the myriad immunoglobulin variable region genes.
Antibodies exist, for example, as intact immunoglobulins and as a number of well characterized fragments produced by digestion with various peptidases. For instance, Fabs, Fvs, and single-chain Fvs (scFvs) that specifically bind to gp120 or fragments of gp120 would be gp120-specific binding agents. A scFv protein is a fusion protein in which a light chain variable region of an immunoglobulin and a heavy chain variable region of an immunoglobulin are bound by a linker, while in dsFvs, the chains have been mutated to introduce a disulfide bond to stabilize the association of the chains. The term also includes genetically engineered forms such as chimeric antibodies (such as humanized murine antibodies) and heteroconjugate antibodies (such as bispecific antibodies). See also, Pierce Catalog and Handbook, 1994-1995 (Pierce Chemical Co., Rockford, IL); Kuby, J., Immunology, 3rd Ed., W.H. Freeman & Co., New York, 1997. Antibody fragments are defined as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) (Fab')2, the fragment of the antibody obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; (4) F(ab')2, a dimer of two Fab' fragments held together by two disulfide bonds; (5) Fv, a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (6) single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies.
Typically, a naturally occurring immunoglobulin has heavy (H) chains and light (L) chains interconnected by disulfide bonds. There are two types of light chain, lambda (λ) and kappa (κ). There are five main heavy chain classes (or isotypes) which determine the functional activity of an antibody molecule: IgM, IgD, IgG, IgA and IgE.
Each heavy and light chain contains a constant region and a variable region (the regions are also known as "domains"). In combination, the heavy and the light chain variable regions specifically bind the antigen. Light and heavy chain variable regions contain a "framework" region interrupted by three hypervariable regions, also called "complementarity-determining regions" or "CDRs." The extent of the framework region and CDRs has been defined (see, Kabat et al., Sequences of Proteins of Immunological Interest, U.S. Department of Health and Human Services, 1991, which is hereby incorporated by reference in its entirety). Thus one of ordinary skill in the art will recognize the numbering of the residues in the disclosed antibodies is made with reference to the Kabat convention. The Kabat database is now maintained online. The sequences of the framework regions of different light or heavy chains are relatively conserved within a species. The framework region of an antibody, that is the combined framework regions of the constituent light and heavy chains, serves to position and align the CDRs in three-dimensional space.
The CDRs are primarily responsible for binding to an epitope of an antigen. The CDRs of each chain are typically referred to as CDR1, CDR2, and CDR3, numbered sequentially starting from the N-terminus, and are also typically identified by the chain in which the particular CDR is located. Thus, a VH CDR3 is located in the variable domain of the heavy chain of the antibody in which it is found, whereas a VL CDR1 is the CDR1 from the variable domain of the light chain of the antibody in which it is found. Light chain CDRs are sometimes referred to as CDR L1, CDR L2, and CDR L3. Heavy chain CDRs are sometimes referred to as CDR H1, CDR H2, and CDR H3.
References to "VH" or "V_H" refer to the variable region of an immunoglobulin heavy chain, including that of an antibody fragment, such as Fv, scFv, dsFv or Fab. References to "VL" or "V_L" refer to the variable region of an immunoglobulin light chain, including that of an Fv, scFv, dsFv or Fab.
A "monoclonal antibody" is an antibody produced by a single clone of B-lymphocytes or by a cell into which the light and heavy chain genes of a single antibody have been transfected. Monoclonal antibodies are produced by methods known to those of skill in the art, for instance by making hybrid antibody-forming cells from a fusion of myeloma cells with immune spleen cells. These fused cells and their progeny are termed "hybridomas." Monoclonal antibodies include humanized monoclonal antibodies. In some examples monoclonal antibodies are isolated from a subject. The amino acid sequences of such isolated monoclonal antibodies can be determined.
A "humanized" immunoglobulin is an immunoglobulin including a human framework region and one or more CDRs from a non-human (such as a mouse, rat, or synthetic) immunoglobulin. The non-human immunoglobulin providing the CDRs is termed a "donor," and the human immunoglobulin providing the framework is termed an "acceptor." In one embodiment, all the CDRs are from the donor immunoglobulin in a humanized immunoglobulin. Constant regions need not be present, but if they are, they must be substantially identical to human immunoglobulin constant regions, such as at least about 85-90%, such as about 95% or more identical. Hence, all parts of a humanized immunoglobulin, except possibly the CDRs, are substantially identical to corresponding parts of natural human immunoglobulin sequences. A "humanized antibody" is an antibody comprising a humanized light chain and a humanized heavy chain immunoglobulin. A humanized antibody binds to the same antigen as the donor antibody that provides the CDRs. The acceptor framework of a humanized immunoglobulin or antibody may have a limited number of substitutions by amino acids taken from the donor framework. Humanized or other monoclonal antibodies can have additional conservative amino acid substitutions which have substantially no effect on antigen binding or other immunoglobulin functions. Humanized immunoglobulins can be constructed by means of genetic engineering (for example, see U.S. Patent No. 5,585,089).
A "neutralizing antibody" is an antibody which reduces the infectious titer of an infectious agent by binding to a specific antigen on the infectious agent. In some examples the infectious agent is a virus. In some examples, an antibody that is specific for gp120 neutralizes the infectious titer of HIV.
Antibodyome: The entire repertoire of expressed antibody heavy and light chain sequences in an individual. The individual can be an individual infected with a pathogen, for example HIV.
Antigen: A compound, composition, or substance that can stimulate the production of antibodies or a T cell response in an animal, including compositions that are injected or absorbed into an animal. An antigen reacts with the products of specific humoral or cellular immunity, including those induced by heterologous antigens, such as the disclosed antigens. "Epitope" or "antigenic determinant" refers to the region of an antigen to which B and/or T cells respond. In one embodiment, T cells respond to the epitope, when the epitope is presented in conjunction with an MHC molecule. Epitopes can be formed both from contiguous amino acids or noncontiguous amino acids juxtaposed by tertiary folding of a protein. Epitopes formed from contiguous amino acids are typically retained on exposure to denaturing solvents whereas epitopes formed by tertiary folding are typically lost on treatment with denaturing solvents. An epitope typically includes at least 3, and more usually, at least 5, about 9, or about 8-10 amino acids in a unique spatial conformation. Methods of determining spatial conformation of epitopes include, for example, x-ray crystallography and nuclear magnetic resonance.
Examples of antigens include, but are not limited to, peptides, lipids, polysaccharides, and nucleic acids containing antigenic determinants, such as those recognized by an immune cell. In some examples, antigens include peptides derived from a pathogen of interest. Exemplary pathogens include bacteria, fungi, viruses and parasites. In specific examples, an antigen is derived from HIV, such as a gp120 polypeptide or antigenic fragment thereof, such as a gp120 outer domain or fragment thereof.
A "target epitope" is a specific epitope on an antigen that specifically binds an antibody of interest, such as a monoclonal antibody. In some examples, a target epitope includes the amino acid residues that contact the antibody of interest, such that the target epitope can be selected by the amino acid residues determined to be in contact with the antibody of interest.
Antigenic surface: A surface of a molecule, for example a protein such as a gp120 protein or polypeptide, capable of eliciting an immune response. An antigenic surface includes the defining features of that surface, for example the three-dimensional shape and the surface charge. An antigenic surface includes both surfaces that occur on gp120 polypeptides as well as surfaces of compounds that mimic the surface of a gp120 polypeptide (mimetics). In some examples, an antigenic surface includes all or part of the surface of gp120 that binds to the CD4 receptor.
Atomic Coordinates or Structure coordinates: Mathematical coordinates derived from mathematical equations related to the patterns obtained on diffraction of a monochromatic beam of X-rays by the atoms (scattering centers) such as an antigen, or an antigen in complex with an antibody. In some examples that antigen can be gp120, a gp120:antibody complex, or combinations thereof in a crystal. The diffraction data are used to calculate an electron density map of the repeating unit of the crystal. The electron density maps are used to establish the positions of the individual atoms within the unit cell of the crystal. In one example, the term "structure coordinates" refers to Cartesian coordinates derived from mathematical equations related to the patterns obtained on diffraction of a monochromatic beam of X-rays, such as by the atoms of a gp120 in crystal form.
Those of ordinary skill in the art understand that a set of structure coordinates determined by X-ray crystallography is not without standard error. For the purpose of this disclosure, any set of structure coordinates that have a root mean square deviation of protein backbone atoms (N, Cα, C and O) of less than about 1.0 Angstroms when superimposed, such as about 0.75, or about 0.5, or about 0.25 Angstroms, using backbone atoms, shall (in the absence of an explicit statement to the contrary) be considered identical.
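By way of illustration only, the root mean square deviation criterion above can be sketched in a few lines of Python. The function names and the default cutoff argument are hypothetical and are not part of the disclosure. The sketch assumes that the two coordinate sets list the same backbone atoms in the same order and have already been superimposed; it does not perform the superposition itself.

```python
import math

def backbone_rmsd(coords_a, coords_b):
    """RMSD (in Angstroms) between two already-superimposed lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between corresponding backbone atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def considered_identical(coords_a, coords_b, cutoff=1.0):
    """Apply the 'less than about 1.0 Angstroms' criterion (cutoff is an assumed parameter)."""
    return backbone_rmsd(coords_a, coords_b) < cutoff
```

A stricter comparison, such as the 0.5 Angstrom value mentioned above, would be expressed by passing cutoff=0.5.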
Binding affinity: Affinity of an antibody or antigen binding fragment thereof for an antigen. In one embodiment, affinity is calculated by a modification of the Scatchard method described by Frankel et al., Mol. Immunol., 16: 101-106, 1979. In another embodiment, binding affinity is measured by an antigen/antibody dissociation rate. In yet another embodiment, a high binding affinity is measured by a competition radioimmunoassay. In several examples, a high binding affinity is at least about 1 x 10⁻⁸ M. In other embodiments, a high binding affinity is at least about 1.5 x 10⁻⁸, at least about 2.0 x 10⁻⁸, at least about 2.5 x 10⁻⁸, at least about 3.0 x 10⁻⁸, at least about 3.5 x 10⁻⁸, at least about 4.0 x 10⁻⁸, at least about 4.5 x 10⁻⁸, or at least about 5.0 x 10⁻⁸ M.
Broadly Neutralizing Antibody: An antibody that binds to and inhibits the function of related antigens, such as antigens that share 85%, 90%, 95%, 96%, 97%, 98% or 99% identity with the antigenic surface of the antigen. With regard to an antigen from a pathogen, such as a virus, the antibody can bind to and inhibit the function of an antigen from more than one class and/or subclass of the pathogen. For example, with regard to a human immunodeficiency virus, the antibody can bind to and inhibit the function of an antigen, such as gp120, from more than one clade. In one embodiment, broadly neutralizing antibodies to HIV are distinct from other antibodies to HIV in that they neutralize a high percentage of the many types of HIV in circulation.
CD4: Cluster of differentiation factor 4 polypeptide; a T-cell surface protein that mediates interaction with the MHC class II molecule. CD4 also serves as the primary receptor site for HIV on T-cells during HIV-1 infection. CD4 is known to bind to gp120 from HIV. The known sequence of the CD4 precursor has a hydrophobic signal peptide, an extracellular region of approximately 370 amino acids, a highly hydrophobic stretch with significant identity to the membrane-spanning domain of the class II MHC beta chain, and a highly charged intracellular sequence of 40 residues (Maddon, Cell 42:93, 1985).
CD4BS antibodies: Antibodies that bind to or substantially overlap the CD4 binding surface of a gp120 polypeptide. The antibodies interfere with or prevent CD4 from binding to a gp120 polypeptide.
Chimeric antibody: An antibody which includes sequences derived from two different antibodies, which typically are of different species. In some examples, a chimeric antibody includes one or more CDRs and/or framework regions from one human antibody and CDRs and/or framework regions from another human antibody.
Contacting: Placement in direct physical association; includes both in solid and liquid form, which can take place either in vivo or in vitro. Contacting includes contact between one molecule and another molecule, for example the amino acid on the surface of one polypeptide, such as an antigen, that contacts another polypeptide, such as an antibody. Contacting can also include contacting a cell for example by placing an antibody in direct physical association with a cell.
Computer readable media: Any medium or media, which can be read and accessed directly by a computer, so that the media is suitable for use in a computer system. Such media include, but are not limited to: magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
Computer system: Hardware that can be used to analyze atomic coordinate data and/or design an antigen using atomic coordinate data, or to analyze an amino acid or nucleic acid sequence, for example to compare two or more sequences and calculate sequence similarity and/or divergence. The minimum hardware of a computer-based system typically comprises a central processing unit (CPU), an input device (for example a mouse, keyboard, and the like), an output device, and a data storage device. Desirably a monitor is provided to visualize structure data. The data storage device may be RAM or other means for accessing computer readable media. Examples of such systems are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2 operating systems.
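As a non-limiting illustration of the kind of sequence comparison such a computer system might perform, the following Python sketch computes percent identity and percent divergence for two sequences. The function names are hypothetical. The sketch assumes the two sequences have already been aligned to the same length, with "-" marking gaps, and it simply ignores any column that contains a gap; other conventions for handling gaps are equally possible.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns in which either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def percent_divergence(aligned_a, aligned_b, gap="-"):
    """Divergence expressed as the complement of percent identity."""
    return 100.0 - percent_identity(aligned_a, aligned_b, gap)
```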
Cross donor complementation: Formation of an antibody using a heavy chain variable domain of an antibody that specifically binds an epitope of an antigen of interest from a first donor and a light chain variable domain of an antibody that specifically binds the same epitope from a second donor, wherein the antibody that is formed from the heavy chain variable domain and the light chain variable domain retains its ability to bind the epitope and wherein the first and the second donor are different antibodies. Thus, in cross donor complementation, the light chain variable domains and the heavy chain variable domains that form an antibody are from different sources, but the chimeric antibody that is formed still binds the epitope. In some embodiments, the different antibodies are from different subjects. In one embodiment, the antigen is gp120. In another embodiment, the epitope is RSC3. In a further embodiment, the heavy chain variable domain or the light chain variable domain is the VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain or light chain variable domain. In another embodiment, the heavy chain variable domain or the light chain variable domain is the VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain or light chain variable domain and the light chain variable domain or the heavy chain variable domain, respectively, is not from VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
Dendrogram: A diagrammatic representation of a phylogenetic tree.
DNA Maximum Likelihood: A method for constructing phylogenetic trees of nucleic acid sequences under the constraint that the phylogenetic trees must be consistent with a molecular clock. The molecular clock is the assumption that the tips of the tree are all equidistant, in branch length, from its root. The computer program and several embodiments of the method are disclosed on the DNA MLK website (version 3.5c, copyright 1986-1993, incorporated herein by reference). The assumptions of the model are:
1. Each site in the sequence evolves independently.
2. Different lineages evolve independently.
3. There is a molecular clock.
4. Each site undergoes substitution at an expected rate which is chosen from a series of rates (each with a probability of occurrence) which we specify.
5. All relevant sites are included in the sequence, not just those that have changed or those that are "phylogenetically informative."
6. A substitution consists of one of two sorts of events:
o The first kind of event consists of the replacement of the existing base by a base drawn from a pool of purines or a pool of pyrimidines (depending on whether the base being replaced was a purine or a pyrimidine). It can lead either to no change or to a transition.
o The second kind of event consists of the replacement of the existing base by a base drawn at random from a pool of bases at known frequencies, independently of the identity of the base which is being replaced. This could lead either to no change, to a transition or to a transversion.
The ratio of the two purines in the purine replacement pool is the same as their ratio in the overall pool, and similarly for the pyrimidines.
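The two kinds of substitution event in assumption 6 can be illustrated with a short simulation. The following Python sketch is an illustration under stated assumptions and is not the DNA MLK program itself: the function names, the parameter p_first_kind (the probability that an event is of the first kind) and the base frequencies supplied by the caller are all hypothetical. Weighting each pool by the overall base frequencies keeps the ratio of the two purines (and of the two pyrimidines) in the restricted pool the same as in the overall pool, as stated above.

```python
import random

PURINES = "AG"
PYRIMIDINES = "CT"

def classify(old, new):
    """Label the outcome of a replacement as no change, transition or transversion."""
    if old == new:
        return "no change"
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"

def substitution_event(base, freqs, p_first_kind, rng):
    """Draw the replacement base for one substitution event.

    freqs maps each of A, C, G, T to its overall frequency (assumed input).
    """
    if rng.random() < p_first_kind:
        # First kind: draw only from the pool matching the class of the existing base.
        pool = PURINES if base in PURINES else PYRIMIDINES
    else:
        # Second kind: draw from all four bases, regardless of the existing base.
        pool = "ACGT"
    weights = [freqs[b] for b in pool]
    return rng.choices(list(pool), weights=weights, k=1)[0]
```

With p_first_kind set to 1.0, every event is of the first kind, so repeated draws can produce only "no change" or "transition" outcomes; transversions appear only when events of the second kind are allowed.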
DNA sequencing: The process of determining the nucleotide order of a given DNA molecule. The general characteristics of "deep sequencing" are that genetic material is amplified, such as by polymerase chain reaction, and then the amplified products are ligated to a solid surface. The sequencing of the amplified target genetic material is then performed in parallel and the sequence information is captured by a computer. Generally, the sequencing can be performed using automated Sanger sequencing (ABI 3730xl genome analyzer), pyrosequencing on a solid support (454 sequencing, Roche), sequencing-by-synthesis with reversible terminations (ILLUMINA® Genome Analyzer), sequencing-by-ligation (ABI SOLiD®) or sequencing-by-synthesis with virtual terminators (HELISCOPE®). In some embodiments, DNA sequencing is performed using a chain termination method developed by Frederick Sanger, and thus termed "Sanger based sequencing" or "SBS." This technique uses sequence-specific termination of a DNA synthesis reaction using modified nucleotide substrates. Extension is initiated at a specific site on the template DNA by using a short oligonucleotide primer complementary to the template at that region. The oligonucleotide primer is extended using DNA polymerase in the presence of the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is present. The fragments are then size-separated by electrophoresis in a polyacrylamide gel, or in a narrow glass tube (capillary) filled with a viscous polymer. An alternative to using a labeled primer is to use labeled terminators instead; this method is commonly called "dye terminator sequencing."
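The chain termination principle described above can be illustrated with a toy model. In the following Python sketch, which is an idealized illustration and not a description of any instrument's software, the function names are hypothetical, the input is the newly synthesized strand written 5' to 3', and every possible terminated fragment is assumed to be present and perfectly resolved by size.

```python
def terminated_fragment_lengths(synthesized, terminator):
    """Lengths of the fragments that end where the given terminating base was incorporated."""
    return [i + 1 for i, base in enumerate(synthesized) if base == terminator]

def read_sequence(synthesized):
    """Reconstruct the sequence by pooling the four reactions and sorting fragments by size."""
    ladder = {}
    for terminator in "ACGT":
        for length in terminated_fragment_lengths(synthesized, terminator):
            # Each fragment length is produced by exactly one terminating base.
            ladder[length] = terminator
    # Reading the terminating base of each fragment from shortest to longest gives the sequence.
    return "".join(ladder[length] for length in sorted(ladder))
```

In this model, the fragments terminated with A for the strand GATTACA have lengths 2, 5 and 7, and reading the size-sorted ladder across all four terminators recovers the original strand.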
"Pyrosequencing" is an array based method, which has been commercialized by 454 Life Sciences. In some embodiments of the array-based methods, single- stranded DNA is annealed to beads and amplified via EmPCR®. These DNA-bound beads are then placed into wells on a fiber-optic chip along with enzymes that produce light in the presence of ATP. When free nucleotides are washed over this chip, light is produced as the PCR amplification occurs and ATP is generated when nucleotides join with their complementary base pairs. Addition of one (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded, such as by the charge coupled device (CCD) camera, within the instrument. The signal strength is proportional to the number of nucleotides, for example, homopolymer stretches, incorporated in a single nucleotide flow.
Epitope: An antigenic determinant. These are particular chemical groups or peptide sequences on a molecule that are antigenic, i.e. that elicit a specific immune response. An antibody specifically binds a particular antigenic epitope on a polypeptide. In some examples a disclosed antibody specifically binds to an epitope on the surface of gp120 from HIV, such as the CD4 binding site on the surface of gp120.
Established VRC01-like antibody, heavy chain or light chain: A VRC01-like antibody or a heavy chain or light chain that can complement with a corresponding heavy chain or light chain from VRC01, as specifically defined herein. As used herein, "established VRC01-like" antibody, heavy chain or light chain refers to the following antibodies, heavy chains or light chains:
VRC01-like antibodies, heavy chains and light chains disclosed in Wu et al., "Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1," Science, 329(5993):856-861, 2010, which is incorporated by reference herein. These include heavy and light chain sequences of antibodies VRC01 (SEQ ID NO: 1614 and SEQ ID NO: 1624, respectively), VRC02 (SEQ ID NO: 1615 and SEQ ID NO: 1625, respectively) and VRC03 (SEQ ID NO: 1616 and SEQ ID NO: 1626, respectively).
VRC01-like antibodies, heavy chains and light chains disclosed in PCT International Application No. PCT/US2010/050295, filed September 24, 2010, which is incorporated by reference herein. These include heavy and light chains of the VRC01 (SEQ ID NO: 1614 and SEQ ID NO: 1624, respectively), VRC02 (SEQ ID NO: 1615 and SEQ ID NO: 1625, respectively) and VRC03 (SEQ ID NO: 1616 and SEQ ID NO: 1626, respectively) antibodies and 700 additional VRC01-like heavy chains (SEQ ID NO: 1715-2414).
VRC01-like antibodies, heavy chains and light chains disclosed in Scheid et al., "Sequence and structural convergence of broad and potent HIV antibodies that mimic CD4 binding," Science, 333(6049):1633-1637, 2011, incorporated by reference herein. These include the heavy and light chains of the 3BNC117, 3BNC60, 12A12, 12A21, NIH45-46, 8ANC131, 8ANC134, 1B2530, and 1NC9 antibodies (corresponding SEQ ID NOs. and/or Accession Nos. shown in Table 1, below) and up to 567 other clonally related antibodies, including those listed in Figures S3, S13, S14 and Table S8 of Scheid et al., which are specifically incorporated by reference herein.
Certain VRC01-like antibodies, heavy chains and light chains disclosed in Wu et al., "Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing," Science, 333(6049):1593-1602, 2011, incorporated herein by reference. These certain VRC01-like antibodies, heavy chains and light chains include the heavy and light chains of the VRC-PG04 and VRC-PG04b antibodies (GENBANK® Accession Nos. JN159464 to JN159467, respectively), VRC-CH30, VRC-CH31, and VRC-CH32 antibodies (GENBANK® Accession Nos. JN159434 to JN159439, respectively), and VRC-CH33 and VRC-CH34 antibodies (GENBANK® Accession Nos. JN159470 to JN159473, respectively) (corresponding SEQ ID NOs for the heavy and light chains of these antibodies are shown in Table 1). These certain VRC01-like antibodies, heavy chains and light chains also include 24 heavy chains from donor 74, 2008 (GENBANK® Accession Nos. JN159440 to JN159463), two heavy chains from donor 45, 2008 (GENBANK® Accession Nos. JN159474 and JN159475) and two light chains from donor 45, 2001 (GENBANK® Accession Nos. JN159468 and JN159469). These certain VRC01-like antibodies, heavy chains and light chains also include 1561 unique sequences associated with neutralizing CDR H3 distributions with at least one low divergent member shown in Fig. 6B and Fig. S16 of Wu et al., Science, 333(6049):1593-1602, 2011 (GENBANK® Accession Nos. JN157873 to JN159433, respectively).
VRC01-like antibodies, heavy chains and light chains disclosed in Diskin et al., "Increasing the potency and breadth of an HIV antibody by using structure-based rational design," Science, 334(6060):1289-93, 2011, incorporated by reference herein. These include the heavy and light chains of the NIH45-46 antibody with a G54W amino acid substitution (Kabat numbering) in the heavy chain variable domain.
All the Accession Nos. discussed in this definition of "established VRC01-like antibody, heavy chain or light chain," are incorporated by reference as available on March 21, 2012.
Table 1. Established VRC01-like antibodies (all Accession Nos. incorporated by reference as available on March 21, 2012).
Figure imgf000043_0001
VRC-CH30 SEQ ID NO: 1611 SEQ ID NO: 1621
VRC-CH31 SEQ ID NO: 1612 SEQ ID NO: 1622
VRC-CH32 SEQ ID NO: 1613 SEQ ID NO: 1623
3BNC117 EMBL Acc. No. HE584537 EMBL Acc. No. HE584538
3BNC60 EMBL Acc. No. HE584535 EMBL Acc. No. HE584536
12A12 EMBL Acc. No. HE584539 EMBL Acc. No. HE584540
12A21 EMBL Acc. No. HE584541 EMBL Acc. No. HE584542
NIH45-46 EMBL Acc. No. HE584543 EMBL Acc. No. HE584544
8ANC131 EMBL Acc. No. HE584540 EMBL Acc. No. HE584550
8ANC134 EMBL Acc. No. HE584551 EMBL Acc. No. HE584552
1B2530 EMBL Acc. No. HE584545 EMBL Acc. No. HE584546
1NC9 EMBL Acc. No. HE584547 EMBL Acc. No. HE584548
Framework Region: Amino acid sequences interposed between CDRs. Includes variable light and variable heavy framework regions. The framework regions serve to hold the CDRs in an appropriate orientation for antigen binding.
Fc polypeptide: The polypeptide comprising the constant region of an antibody excluding the first constant region immunoglobulin domain. Fc region generally refers to the last two constant region immunoglobulin domains of IgA, IgD, and IgG, and the last three constant region immunoglobulin domains of IgE and IgM. An Fc region may also include part or all of the flexible hinge N-terminal to these domains. For IgA and IgM, an Fc region may or may not comprise the tailpiece, and may or may not be bound by the J chain. For IgG, the Fc region comprises immunoglobulin domains Cgamma2 and Cgamma3 (Cγ2 and Cγ3) and the lower part of the hinge between Cgamma1 (Cγ1) and Cγ2. Although the boundaries of the Fc region may vary, the human IgG heavy chain Fc region is usually defined to comprise residues C226 or P230 to its carboxyl-terminus, wherein the numbering is according to the EU index as in Kabat. For IgA, the Fc region comprises immunoglobulin domains Calpha2 and Calpha3 (Cα2 and Cα3) and the lower part of the hinge between Calpha1 (Cα1) and Cα2. Encompassed within the definition of the Fc region are functionally equivalent analogs and variants of the Fc region. A functionally equivalent analog of the Fc region may be a variant Fc region, comprising one or more amino acid modifications relative to the wild-type or naturally existing Fc region. Variant Fc regions will possess at least 50% homology with a naturally existing Fc region, such as about 80%, about 90%, or at least about 95% homology. Functionally equivalent analogs of the Fc region may comprise one or more amino acid residues added to or deleted from the N- or C-termini of the protein, such as no more than 30 or no more than 10 additions and/or deletions. Functionally equivalent analogs of the Fc region include Fc regions operably linked to a fusion partner.
Functionally equivalent analogs of the Fc region must comprise the majority of all of the Ig domains that compose the Fc region as defined above; for example IgG and IgA Fc regions as defined herein must comprise the majority of the sequence encoding CH2 and the majority of the sequence encoding CH3. Thus, the CH2 domain on its own, or the CH3 domain on its own, is not considered an Fc region. The Fc region may refer to this region in isolation, or this region in the context of an Fc fusion polypeptide.
Furin: A calcium-dependent serine endoprotease that cleaves precursor proteins at paired basic amino acid processing sites. In vivo, substrates of furin include proparathyroid hormone, proalbumin, and von Willebrand factor. Furin can also cleave HIV envelope protein gp160 into gp120 and gp41. Furin cleaves proteins just downstream of a basic amino acid target sequence (canonically, Arg-X-(Arg/Lys)-Arg). Thus, this amino acid sequence is a furin cleavage site.
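The canonical target sequence given above can be located in a one-letter amino acid sequence with a simple pattern search. The following is a minimal sketch only; the function name and the example peptide are hypothetical illustrations and are not sequences from this disclosure. It assumes that "X" means any single residue and reports 0-based start positions.

```python
import re

# Arg-X-(Arg/Lys)-Arg in one-letter code: R, any residue, R or K, then R.
# The lookahead lets overlapping motifs be reported as separate hits.
FURIN_MOTIF = re.compile(r"(?=(R.[RK]R))")


def find_furin_sites(sequence):
    """Return (start, motif) pairs for each canonical furin motif.

    Positions are 0-based; cleavage would occur just downstream of the
    final Arg of each reported motif.
    """
    return [(m.start(), m.group(1)) for m in FURIN_MOTIF.finditer(sequence.upper())]


# Hypothetical peptide used only for illustration.
print(find_furin_sites("MKTREKRAVG"))  # [(3, 'REKR')]
```

A pattern match of this kind only flags candidate sites; whether a given site is actually cleaved depends on structural context that the sketch does not model.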
gp120: An envelope protein from Human Immunodeficiency Virus (HIV). This envelope protein is initially synthesized as a longer precursor protein of 845-870 amino acids in size, designated gp160. gp160 is cleaved by a cellular protease into gp120 and gp41. gp120 contains most of the external, surface-exposed, domains of the HIV envelope glycoprotein complex, and it is gp120 which binds both to cellular CD4 receptors and to cellular chemokine receptors (such as CCR5).
The mature gp120 wildtype polypeptides have about 500 amino acids in the primary sequence. gp120 is heavily N-glycosylated, giving rise to an apparent molecular weight of 120 kD. The polypeptide is comprised of five conserved regions (C1-C5) and five regions of high variability (V1-V5). Exemplary sequences of wt gp120 polypeptides are shown on GENBANK®, for example accession numbers AAB05604 and AAD12142 (as available on August 10, 2011), incorporated by reference herein. It is understood that there are numerous variations in the sequence of gp120 from what is given in GENBANK®, for example accession numbers AAB05604 and AAD12142, and that these variants are recognized by those of skill in the art as gp120.
The gp120 core has a molecular structure, which includes two domains: an "inner" domain (which faces gp41) and an "outer" domain (which is mostly exposed on the surface of the oligomeric envelope glycoprotein complex). The two gp120 domains are separated by a "bridging sheet" that is not part of either domain. The gp120 core comprises 25 beta strands, 5 alpha helices, and 10 defined loop segments.
The third variable region, referred to herein as the V3 loop, is a loop of about 35 amino acids critical for the binding of the co-receptor and for determining which of the co-receptors will bind. In certain examples the V3 loop comprises residues 296-331.
The numbering used in gp120 polypeptides disclosed herein is relative to the HXB2 numbering scheme as set forth in Numbering Positions in HIV Relative to HXB2CG, Bette Korber et al., Human Retroviruses and AIDS 1998: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences. Korber B, Kuiken CL, Foley B, Hahn B, McCutchan F, Mellors JW, and Sodroski J, Eds. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, which is incorporated by reference herein in its entirety.
Host cells: Cells in which a vector can be propagated and its DNA expressed, for example a disclosed antibody can be expressed in a host cell. The cell may be prokaryotic or eukaryotic. The term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. However, such progeny are included when the term "host cell" is used.
Immunoadhesin: A molecular fusion of a protein with the Fc region of an immunoglobulin, wherein the immunoglobulin retains specific properties, such as Fc receptor binding and increased half-life. An Fc fusion combines the Fc region of an immunoglobulin with a fusion partner, which in general can be any protein, polypeptide, peptide, or small molecule. In one example, an immunoadhesin includes the hinge, CH2, and CH3 domains of the immunoglobulin gamma 1 heavy chain constant region. In another example, the immunoadhesin includes the CH2 and CH3 domains of an IgG.
Immunologically reactive conditions: Includes reference to conditions which allow an antibody raised against a particular epitope to bind to that epitope to a detectably greater degree than, and/or to the substantial exclusion of, binding to substantially all other epitopes. Immunologically reactive conditions are dependent upon the format of the antibody binding reaction and typically are those utilized in immunoassay protocols or those conditions encountered in vivo. See Harlow & Lane, supra, for a description of immunoassay formats and conditions. The immunologically reactive conditions employed in the methods are "physiological conditions" which include reference to conditions (e.g., temperature, osmolarity, pH) that are typical inside a living mammal or a mammalian cell. While it is recognized that some organs are subject to extreme conditions, the intra-organismal and intracellular environment normally lies around pH 7 (e.g., from pH 6.0 to pH 8.0, more typically pH 6.5 to 7.5), contains water as the predominant solvent, and exists at a temperature above 0°C and below 50°C. Osmolarity is within the range that is supportive of cell viability and proliferation.
IgA: A polypeptide belonging to the class of antibodies that are substantially encoded by a recognized immunoglobulin alpha gene. In humans, this class or isotype comprises IgA1 and IgA2. IgA antibodies can exist as monomers, polymers (referred to as pIgA) of predominantly dimeric form, and secretory IgA. The constant chain of wild-type IgA contains an 18-amino-acid extension at its C-terminus called the tail piece (tp). Polymeric IgA is secreted by plasma cells with a 15-kDa peptide called the J chain linking two monomers of IgA through the conserved cysteine residue in the tail piece.
IgG: A polypeptide belonging to the class or isotype of antibodies that are substantially encoded by a recognized immunoglobulin gamma gene. In humans, this class comprises IgG1, IgG2, IgG3, and IgG4. In mice, this class comprises IgG1, IgG2a, IgG2b, and IgG3.
Inhibiting or treating a disease: Inhibiting the full development of a disease or condition, for example, in a subject who is at risk for a disease such as acquired immunodeficiency syndrome (AIDS). "Treatment" refers to a therapeutic intervention that ameliorates a sign or symptom of a disease or pathological condition after it has begun to develop. The term "ameliorating," with reference to a disease or pathological condition, refers to any observable beneficial effect of the treatment. The beneficial effect can be evidenced, for example, by a delayed onset of clinical symptoms of the disease in a susceptible subject, a reduction in severity of some or all clinical symptoms of the disease, a slower progression of the disease, a reduction in the viral load, an improvement in the overall health or well-being of the subject, or by other parameters well known in the art that are specific to the particular disease. A "prophylactic" treatment is a treatment administered to a subject who does not exhibit signs of a disease or exhibits only early signs for the purpose of decreasing the risk of developing pathology.
Isolated: An "isolated" biological component (such as a cell, for example a B-cell, a nucleic acid, peptide, protein, heavy chain domain or antibody) has been substantially separated, produced apart from, or purified away from other biological components in the cell of the organism in which the component naturally occurs, such as other chromosomal and extrachromosomal DNA and RNA, and proteins. Nucleic acids, peptides and proteins which have been "isolated" thus include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids, peptides, and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids. In some examples an antibody, such as an antibody specific for gp120, can be isolated, for example isolated from a subject infected with HIV.
In silico: A process performed virtually within a computer. For example, using a computer, a virtual compound can be screened for surface similarity or conversely surface complementarity to a virtual representation of the atomic positions of at least a portion of a gp120 polypeptide, or of a gp120 polypeptide in complex with an antibody.
Kd: The dissociation constant for a given interaction, such as a polypeptide-ligand interaction or an antibody-antigen interaction. For example, for the bimolecular interaction of an antibody (such as any of the antibodies disclosed herein) and an antigen (such as gp120) it is the product of the concentrations of the individual components of the bimolecular interaction divided by the concentration of the complex.
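As a worked illustration of the dissociation constant defined above, the sketch below computes it for a bimolecular antibody-antigen interaction at equilibrium as the product of the free component concentrations divided by the concentration of the complex. The function name and the numerical values are hypothetical and are chosen only to show the arithmetic; all concentrations are assumed to be in molar units.

```python
def dissociation_constant(free_antibody, free_antigen, complex_conc):
    """Equilibrium dissociation constant for a 1:1 interaction.

    All arguments are equilibrium concentrations in molar units; the
    result is also in molar units.
    """
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_antibody * free_antigen / complex_conc


# Hypothetical values: 1 nM free antibody, 1 nM free antigen, 0.1 nM complex.
kd = dissociation_constant(1e-9, 1e-9, 1e-10)
print(f"{kd:.1e} M")  # 1.0e-08 M
```

A smaller value corresponds to tighter binding, since less free antibody and antigen remain for a given amount of complex.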
Label: A detectable compound or composition that is conjugated directly or indirectly to another molecule, such as an antibody or a protein, to facilitate detection of that molecule. Specific, non-limiting examples of labels include fluorescent tags, enzymatic linkages, and radioactive isotopes. In some examples, a disclosed antibody is labeled.
Neighbor Joining: A method of constructing phylogenetic trees that finds pairs of operational taxonomic units (OTUs, also called "neighbors") that minimize the total branch length at each stage of clustering of OTUs, starting with a starlike tree.
Several embodiments of the neighbor joining method are disclosed in detail in Saitou and Nei, Mol Biol Evol. 1987 Jul;4(4):406-25, which is incorporated by reference herein in its entirety.
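The clustering described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in any embodiment: it uses the commonly taught Q-matrix selection criterion, returns only the tree topology as nested tuples (branch lengths are omitted), and breaks ties by taking the first minimal pair. The function names and the four-taxon distance values are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance table from {(a, b): distance}."""
    table = {}
    for (a, b), value in pairs.items():
        table.setdefault(a, {})[b] = value
        table.setdefault(b, {})[a] = value
    return table


def neighbor_joining(names, dist):
    """Return an unrooted tree topology as nested tuples (no branch lengths)."""
    nodes = list(names)
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        joined = (a, b)  # the new internal node replaces the joined pair
        d[joined] = {}
        for c in nodes:
            if c != a and c != b:
                d[joined][c] = d[c][joined] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [joined]
    return tuple(nodes)


# Hypothetical additive distances for four sequences.
table = symmetric({("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
                   ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7})
print(neighbor_joining(["A", "B", "C", "D"], table))  # (('A', 'B'), ('C', 'D'))
```

In practice the input distances would come from a multiple sequence alignment, and an established phylogenetics package would normally be used in place of a hand-written routine.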
Nucleic acid: A polymer composed of nucleotide units (ribonucleotides, deoxyribonucleotides, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof) linked via phosphodiester bonds, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof. Thus, the term includes nucleotide polymers in which the nucleotides and the linkages between them include non-naturally occurring synthetic analogs, such as, for example and without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), and the like. Such polynucleotides can be synthesized, for example, using an automated DNA synthesizer. The term "oligonucleotide" typically refers to short polynucleotides, generally no greater than about 50 nucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e., A, U, G, C) in which "U" replaces "T."
Conventional notation is used herein to describe nucleotide sequences: the left-hand end of a single-stranded nucleotide sequence is the 5'-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5'-direction. The direction of 5' to 3' addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction. The DNA strand having the same sequence as an mRNA is referred to as the "coding strand;" sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5' to the 5'-end of the RNA transcript are referred to as "upstream sequences;" sequences on the DNA strand having the same sequence as the RNA and which are 3' to the 3' end of the coding RNA transcript are referred to as "downstream sequences."
"cDNA" refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.
"Encoding" refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA produced by that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and non-coding strand, used as the template for transcription, of a gene or cDNA can be referred to as encoding the protein or other product of that gene or cDNA. Unless otherwise specified, a "nucleotide sequence encoding an amino acid sequence" includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. Nucleotide sequences that encode proteins and RNA may include introns.
"Recombinant nucleic acid" refers to a nucleic acid having nucleotide sequences that are not naturally joined together. This includes nucleic acid vectors comprising an amplified or assembled nucleic acid which can be used to transform a suitable host cell. A host cell that comprises the recombinant nucleic acid is referred to as a "recombinant host cell." The gene is then expressed in the recombinant host cell to produce, e.g., a "recombinant polypeptide." A recombinant nucleic acid may serve a non-coding function (e.g., promoter, origin of replication, ribosome-binding site, etc.) as well.
A first sequence is an "antisense" with respect to a second sequence if a polynucleotide whose sequence is the first sequence specifically hybridizes with a polynucleotide whose sequence is the second sequence.
Terms used to describe sequence relationships between two or more nucleotide sequences or amino acid sequences include "reference sequence," "selected from," "comparison window," "identical," "percentage of sequence identity," "substantially identical," "complementary," and "substantially complementary."
For sequence comparison of nucleic acid sequences, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters are used. Methods of alignment of sequences for comparison are well known in the art.
Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444, 1988, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds 1995 supplement)).
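To illustrate the kind of dynamic programming that underlies the homology alignment algorithm of Needleman & Wunsch cited above, the sketch below computes only the optimal global alignment score for two sequences. It is a simplified illustration, not any of the named software packages: the scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for clarity.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only one row of the scoring
    matrix in memory at a time.
    """
    # Row 0: aligning an empty prefix of a against each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,          # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 2 (three matches, one gap)
```

Recovering the alignment itself would additionally require storing the full matrix and tracing back from the final cell.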
One example of a useful algorithm is PILEUP. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 25:351-360, 1987. The method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153, 1989. Using PILEUP, a reference sequence is compared to other test sequences to determine the percent sequence identity relationship using the following parameters: default gap weight (3.00), default gap length weight (0.10), and weighted end gaps. PILEUP can be obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereux et al., Nuc. Acids Res. 12:387-395, 1984).
Other examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410, 1990 and Altschul et al., Nucleic Acids Res. 25:3389-3402, 1997. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (ncbi.nlm.nih.gov). The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands. The BLASTP program (for amino acid sequences) uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1992).
An oligonucleotide is a linear polynucleotide sequence of up to about 100 nucleotide bases in length.
ClustalW is a program that aligns three or more sequences in a computationally efficient manner. Aligning multiple sequences highlights areas of similarity which may be associated with specific features that have been more highly conserved than other regions. Thus, this program can classify sequences for phylogenetic analysis, which aims to model the substitutions that have occurred over evolution and derive the evolutionary relationships between sequences. The ClustalW multiple sequence alignment web form is available on the internet from EMBL-EBI (ebi.ac.uk/Tools/msa/clustalw2/); see also Larkin et al., Bioinformatics 2007 23(21):2947-2948.
A polynucleotide or nucleic acid sequence refers to a polymeric form of nucleotide at least 10 bases in length. A recombinant polynucleotide includes a polynucleotide that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences. The nucleotides can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide. The term includes single- and double- stranded forms of DNA.
Pharmaceutically acceptable carriers: The pharmaceutically acceptable carriers of use are conventional. Remington's Pharmaceutical Sciences, by E. W. Martin, Mack Publishing Co., Easton, PA, 19th Edition, 1995, describes compositions and formulations suitable for pharmaceutical delivery of the antibodies herein disclosed.
In general, the nature of the carrier will depend on the particular mode of administration being employed. For instance, parenteral formulations usually comprise injectable fluids that include pharmaceutically and physiologically acceptable fluids such as water, physiological saline, balanced salt solutions, aqueous dextrose, glycerol or the like as a vehicle. For solid compositions (e.g., powder, pill, tablet, or capsule forms), conventional non-toxic solid carriers can include, for example, pharmaceutical grades of mannitol, lactose, starch, or magnesium stearate. In addition to biologically neutral carriers, pharmaceutical compositions to be administered can contain minor amounts of non-toxic auxiliary substances, such as wetting or emulsifying agents, preservatives, and pH buffering agents and the like, for example sodium acetate or sorbitan monolaurate.
Pharmaceutical agent: A chemical compound or composition capable of inducing a desired therapeutic or prophylactic effect when properly administered to a subject or a cell. In some examples a pharmaceutical agent includes one or more of the disclosed antibodies.
Phylogenetic analysis: The assembly of a phylogenetic tree representing the evolutionary ancestry of a set of genes, such as genes encoding an antibody, or other taxa, using nucleotide sequences as the basis for classification.
Phylogenetic tree: A branching diagram or "tree" showing the inferred evolutionary relationships among nucleic acid or amino acid sequences based upon similarities and differences in their sequence. The "taxa" or "leaves" joined together in the tree are implied to have descended from a common ancestor, such as an inferred common ancestor. In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of the descendants. Each node can be referred to as a taxonomic unit. Internal nodes are generally referred to as hypothetical evolutionary intermediates as they cannot be directly observed.
Phylogenetic trees are constructed using computational phylogenetic methods and tools, such as distance-matrix methods for example, neighbor-joining, maximum likelihood or UPGMA, which calculate genetic distance from multiple sequence alignments. Many sequence alignment methods such as ClustalW also create trees by using the simpler algorithms (i.e., those based on distance) of tree construction. More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian Framework, and apply an explicit model of evolution to phylogenetic tree estimation.
A rooted phylogenetic tree is a directed tree with a unique node corresponding to the (usually imputed) most recent common ancestor of all the entities at the branches of the tree. With reference to a phylogenetic tree, a root is the common ancestor of all of the sequences in the phylogenetic tree.
In several embodiments, a phylogenetic tree is constructed as part of a cross-donor phylogenetic analysis, wherein nucleic acid sequences are leaves and the root of the phylogenetic tree.
Polypeptide: Any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). In one embodiment, the polypeptide is a gp120 polypeptide. In one embodiment, the polypeptide is a disclosed antibody or a fragment thereof. A "residue" refers to an amino acid or amino acid mimetic incorporated in a polypeptide by an amide bond or amide bond mimetic. A polypeptide has an amino terminal (N-terminal) end and a carboxy terminal end.
Purified: The term purified does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified peptide preparation is one in which the peptide or protein (such as an antibody) is more enriched than the peptide or protein is in its natural environment within a cell. In one embodiment, a preparation is purified such that the protein or peptide represents at least 50% of the total peptide or protein content of the preparation.
Recombinant: A recombinant nucleic acid is one that has a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
Sequence identity: The similarity between amino acid sequences is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are. Homologs or variants of a polypeptide will possess a relatively high degree of sequence identity when aligned using standard methods.
Methods of alignment of polypeptide sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith and Waterman, Adv. Appl. Math. 2:482, 1981; Needleman and Wunsch, J. Mol. Biol. 48:443, 1970; Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444, 1988; Higgins and Sharp, Gene 73:237, 1988; Higgins and Sharp, CABIOS 5:151, 1989; and Corpet et al., Nucleic Acids Research 16:10881, 1988. Altschul et al., Nature Genet. 6:119, 1994, presents a detailed consideration of sequence alignment methods and homology calculations. The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the internet (along with a description of how to determine sequence identity using this program).
Homologs and variants of a VL or a VH of an antibody that specifically binds a polypeptide are typically characterized by possession of at least about 75%, for example at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity counted over the full length alignment with the amino acid sequence of interest. Proteins with even greater similarity to the reference sequences will show increasing percentage identities when assessed by this method, such as at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity. When less than the entire sequence is being compared for sequence identity, homologs and variants will typically possess at least 80% sequence identity over short windows of 10-20 amino acids, and may possess sequence identities of at least 85% or at least 90% or 95% depending on their similarity to the reference sequence. One of skill in the art will appreciate that these sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided.
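The percentage identity figures above can be illustrated with a short calculation on two sequences that have already been aligned. This is a sketch under stated assumptions: gaps are written as "-", identity is counted over every alignment column in which at least one sequence has a residue, and other conventions for the denominator exist. The function name and the ten-residue fragments are hypothetical and are not sequences from this disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the full length of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps. A column
    counts as identical only when both positions hold the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical ten-residue window differing at one position.
print(percent_identity("QVQLVQSGAE", "QVQLVESGAE"))  # 90.0
```

Applying the same function to successive 10-20 residue slices of an alignment gives the short-window identities mentioned above.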
Specifically bind: When referring to an antibody, refers to a binding reaction which determines the presence of a target protein, peptide, or polysaccharide in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated conditions, an antibody binds preferentially to a particular target protein, peptide or polysaccharide (such as an antigen present on the surface of a pathogen, for example gp120) and does not bind in a significant amount to other proteins or polysaccharides present in the sample or subject. Specific binding can be determined by methods known in the art. With reference to an antibody antigen complex, specific binding of the antigen and antibody has a Kd of less than about 10⁻⁷ Molar, such as less than about 10⁻⁷ Molar, 10⁻⁸ Molar, 10⁻⁹ Molar, or even less than about 10⁻¹⁰ Molar.
Therapeutic agent: Used in a generic sense, it includes treating agents, prophylactic agents, and replacement agents.
Therapeutically effective amount or effective amount: A quantity of a specific substance, such as a disclosed antibody, sufficient to achieve a desired effect in a subject being treated. For instance, this can be the amount necessary to inhibit HIV replication or treat AIDS. In several embodiments, a therapeutically effective amount is the amount necessary to reduce a sign or symptom of AIDS, and/or to decrease viral titer in a subject. When administered to a subject, a dosage will generally be used that will achieve target tissue concentrations that have been shown to achieve a desired in vitro effect.
T Cell: A white blood cell critical to the immune response. T cells include, but are not limited to, CD4+ T cells and CD8+ T cells. A CD4+ T lymphocyte is an immune cell that carries a marker on its surface known as "cluster of differentiation 4" (CD4). These cells, also known as helper T cells, help orchestrate the immune response, including antibody responses as well as killer T cell responses. CD8+ T cells carry the "cluster of differentiation 8" (CD8) marker. In one embodiment, a CD8+ T cell is a cytotoxic T lymphocyte. In another embodiment, a CD8+ T cell is a suppressor T cell.
Vector: A nucleic acid molecule as introduced into a host cell, thereby producing a transformed host cell. A vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication. A vector may also include one or more selectable marker genes and other genetic elements known in the art.
Virus: Microscopic infectious organism that reproduces inside living cells. A virus consists essentially of a core of a single nucleic acid surrounded by a protein coat, and has the ability to replicate only inside a living cell. "Viral replication" is the production of additional virus by the occurrence of at least one viral life cycle. A virus may subvert the host cells' normal functions, causing the cell to behave in a manner determined by the virus. For example, a viral infection may result in a cell producing a cytokine, or responding to a cytokine, when the uninfected cell does not normally do so.
"Retroviruses" are RNA viruses wherein the viral genome is RNA. When a host cell is infected with a retrovirus, the genomic RNA is reverse transcribed into a DNA intermediate which is integrated very efficiently into the chromosomal DNA of infected cells. The integrated DNA intermediate is referred to as a provirus. The term "lentivirus" is used in its conventional sense to describe a genus of viruses containing reverse transcriptase. The lentiviruses include the "immunodeficiency viruses" which include human immunodeficiency virus (HIV) type 1 and type 2 (HIV-I and HIV-II), simian immunodeficiency virus (SIV), and feline immunodeficiency virus (FIV).
HIV-I is a retrovirus that causes immunosuppression in humans (HIV disease), and leads to a disease complex known as the acquired immunodeficiency syndrome (AIDS). "HIV disease" refers to a well-recognized constellation of signs and symptoms (including the development of opportunistic infections) in persons who are infected by an HIV virus, as determined by antibody or western blot studies.
Laboratory findings associated with this disease are a progressive decline in T cells.
VRC01-like antibody: VRC01-like antibodies, and methods for identifying and producing these antibodies, are disclosed herein. Generally, these antibodies bind to the CD4 binding surface of gp120 in substantially the same orientation as VRC01, and are broadly neutralizing. VRC01-like antibodies mimic the binding of CD4 to gp120, with several of the important contacts between CD4 and gp120 mimicked by the VRC01-like antibodies (see below). In some embodiments, the heavy and/or light chains of a VRC01-like antibody can be cross-complemented by the heavy and/or light chains of a known VRC01-like antibody, such as VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03, and maintain high binding affinity for gp120.
Suitable methods and materials for the practice or testing of this disclosure are described below. Such methods and materials are illustrative only and are not intended to be limiting. Other methods and materials similar or equivalent to those described herein can be used. For example, conventional methods well known in the art to which a disclosed invention pertains are described in various general and more specific references, including, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989; Sambrook et al., Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Press, 2001; Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates, 1992 (and Supplements to 2000); Ausubel et al., Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in
Molecular Biology, 4th ed., Wiley & Sons, 1999; Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1990; and Harlow and Lane, Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1999. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
II. Description of Several Embodiments
A. Neutralizing Monoclonal Antibodies
A class of isolated human monoclonal antibodies, "VRC01-like antibodies," that specifically bind gp120 and are broadly neutralizing is disclosed herein. Also disclosed herein are compositions including these human monoclonal antibodies and a pharmaceutically acceptable carrier. Nucleic acids encoding these antibodies, expression vectors comprising these nucleic acids, and isolated host cells that express the nucleic acids are also provided.
Compositions comprising the human monoclonal antibodies specific for gp120 can be used for research, diagnostic and therapeutic purposes. For example, the human monoclonal antibodies disclosed herein can be used to diagnose or treat a subject having an HIV-1 infection and/or AIDS. For example, the antibodies can be used to determine HIV-1 titer in a subject. The antibodies disclosed herein also can be used to study the biology of the human immunodeficiency virus.
VRC01-like antibodies bind to the particular CD4 binding site on gp120 in a specific orientation that mimics the binding of CD4 to gp120. Thus, VRC01-like antibodies can be described by this novel mode of binding. The crystal structures of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 antibodies in complex with gp120 provide insight into a novel binding mode for antibodies and gp120. Such a novel binding mode establishes a new class of antibody recognition for gp120. Thus, in some embodiments, the antibody specifically binds to an epitope on the surface of gp120 that includes residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120, or a subset or combination thereof (see the numbering of gp120 according to the HXBC2 convention). VRC01-like antibodies bind to the epitope defined by residues
N276.T278NNAKT283 ..S365GGD368..I371...T455 DGG459.N461.. 469 ..G472GN474 in gp120.
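For illustration, the residue list recited above can be expanded into an explicit set of positions and compared against the gp120 positions contacted by a candidate antibody. This is a minimal sketch: the helper names and the idea of scoring a candidate by the fraction of epitope positions it contacts are assumptions introduced for the example, not a method stated in the disclosure.

```python
EPITOPE_SPEC = "276, 278-283, 365-368, 371, 455-459, 461, 469, 472-474"


def expand_residue_ranges(spec):
    """Expand a string such as '276, 278-283' into a sorted list of positions."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return sorted(positions)


def epitope_coverage(contact_positions, spec=EPITOPE_SPEC):
    """Fraction of the recited epitope positions present in a contact list.

    `contact_positions` is a hypothetical input: the gp120 positions
    that a candidate antibody contacts in a structure or a model.
    """
    epitope = set(expand_residue_ranges(spec))
    return len(epitope & set(contact_positions)) / len(epitope)
```

Because the disclosure also covers "a subset or combination" of these residues, any cutoff applied to `epitope_coverage` would be a design choice of the user rather than a value taken from the text.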
In certain embodiments, a VRC01-like antibody has a relative angle and orientation of binding of gp120 as shown in the crystal structure of the complex of the VRC03 antibody and gp120 (see FIG. 2d of Zhou et al., "Structural Basis for Broad and Potent Neutralization of HIV-1 by Antibody VRC01," Science 329, 811-817 (2010), which is incorporated herein by reference in its entirety). In some examples, the VRC01-like antibodies partially mimic the binding of the CD4 receptor, with an about 6 Å shift and an about 43 degree rotation from the CD4-defined position (see FIG. 2d of Zhou et al., "Structural Basis for Broad and Potent Neutralization of HIV-1 by Antibody VRC01," Science 329, 811-817 (2010), which is incorporated herein by reference in its entirety), such as about a 45 degree rotation from the CD4-defined binding, for example about a 40 degree rotation, about a 50 degree rotation, about a 35 degree rotation or about a 55 degree rotation, for example about a 40-50 degree rotation, about a 35-50 degree rotation, about a 40-55 degree rotation, about a 45-55 degree rotation or about a 35-55 degree rotation. In some examples, a VRC01-like antibody is an antibody with heavy and light chains in an orientation of heavy chain relative to gp120 that differs by less than about 10 degrees, such as less than about 9 degrees, less than about 8 degrees, less than about 7 degrees, less than about 6 degrees, less than about 5 degrees, or less than about 4 degrees, such as about 10-8 degrees, about 10-7 degrees, about 9-6 degrees, or about 9-5 degrees, and/or less than about a 5 Å translation from the binding angle of VRC01 and/or VRC03 to gp120, such as less than about a 4 Å translation, less than about a 3 Å translation, or less than about a 2 Å translation, for example a 2-3 Å translation, a 5-3 Å translation, or a 4-2 Å translation. Such a binding characteristic can readily be determined from the crystal structure of the VRC03 or VRC01 antibody complex. In some embodiments, the CDR H2 region (the C" strand in particular) forms hydrogen bonds to the b-15 loop of gp120. In some embodiments, Asp 368 of gp120 forms a salt bridge with Arg 71 of the heavy chain. All VRC01-like antibodies need to mimic CD4 with similar heavy chain orientations.
As disclosed herein, deep sequencing results define the variation allowed for VRC01-like recognition and, by relating this variation to germline VH sequences, delineate maturation pathways to elicit additional neutralizing antibodies that bind to substantially similar epitopes on the surface of gp120 in substantially the same orientation that VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 bind (see Section D below). Thus, in some embodiments, a VRC01-like antibody is one that is identified by the methods set forth in Section D. In some embodiments, a VRC01-like antibody includes CDRs of the heavy chain variable domain sequences identified by the methods set forth in Section D and a light chain, such as a light chain from a known VRC01-like antibody, for example VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134. In some embodiments, a VRC01-like antibody includes an amino acid sequence of a heavy chain variable domain sequence identified by the methods set forth in Section D and a light chain, such as a light chain from a known VRC01-like antibody, for example VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
In several embodiments, a VRC01-like antibody does not include a heavy chain with an amino acid sequence of an established VRC01-like antibody. Thus, in some embodiments, the heavy chain variable domain sequence identified by the methods set forth in Section D is not a heavy chain variable domain sequence from an established VRC01-like antibody. In additional embodiments, a VRC01-like antibody does not include a light chain with an amino acid sequence of an established VRC01-like antibody. Thus, in some embodiments, the light chain variable domain sequence identified by the methods set forth in Section D is not a light chain variable domain sequence from an established VRC01-like antibody.
In some embodiments, a nucleic acid encoding a VRC01-like antibody heavy chain variable domain is derived from the IGHV1-2 germline allelic origin, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline allelic origin. In some embodiments, a nucleic acid encoding a VRC01-like antibody light chain variable domain is derived from an IGKV3 allelic origin. In some examples, a nucleic acid sequence encoding a VRC01-like antibody heavy or light chain variable domain derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or the IGKV3 germline origin is about 10%, 15%, 20%, 25%, 30%, 35% or 40%, such as about 15% to 40% divergent, such as 25% divergent, from the heavy or light germline sequence of interest, such as the IGHV1-2 germline sequence, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence, or an IGKV3 germline sequence, respectively. In some embodiments, a nucleic acid sequence encoding a VRC01-like antibody heavy or light chain variable domain is derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or the IGKV3 germline, respectively, and is about 55%, 60%, 65%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical, such as 60% to 99% identical, such as 85% identical, to a heavy (or light) chain variable domain of a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody.
In some embodiments, a VRC01-like antibody heavy or light chain variable domain derived from the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or the IGKV3 germline is encoded by a nucleic acid sequence that is about 55%, 60%, 65%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical, such as 60% to 99% identical, such as 85% identical, to a nucleic acid sequence encoding a heavy (or light) chain variable domain of an antibody that is known to be broadly neutralizing to HIV, such as a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody heavy (or light) chain variable domain, and that is about 10%, 15%, 20%, 25%, 30%, 35% or 40%, such as about 15% to 40% divergent, such as 25% divergent, from a heavy (or light) germline gene of interest, such as the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or the IGKV3 germline.
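The combined criterion described above (a stated divergence from the germline gene together with a stated identity to a known broadly neutralizing antibody) can be sketched as a simple two-part filter. This is an illustrative sketch under stated assumptions: the sequences are taken to be pre-aligned and of equal length, the function names are hypothetical, and the default ranges simply restate the outer bounds recited in the text (10% to 40% divergence, 55% to 99% identity).

```python
def percent_difference(seq_a, seq_b):
    """Percent of positions that differ between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * diffs / len(seq_a)


def passes_filters(candidate, germline, reference,
                   divergence_range=(10.0, 40.0),
                   identity_range=(55.0, 99.0)):
    """True if the candidate is divergent enough from the germline and
    similar enough to the reference antibody sequence."""
    divergence = percent_difference(candidate, germline)
    identity = 100.0 - percent_difference(candidate, reference)
    return (divergence_range[0] <= divergence <= divergence_range[1]
            and identity_range[0] <= identity <= identity_range[1])
```

Narrower ranges recited in the text, such as 15% to 40% divergence or 60% to 99% identity, can be supplied through the `divergence_range` and `identity_range` arguments.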
The heavy chain of a VRC01-like antibody can be complemented by the light chain of a VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 antibody and still retain binding for gp120, for example retain specific binding for residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120. For example, the VRC01 light chain and VRC03 heavy chain form active antibodies able to specifically bind HIV-1. In addition, the VRC01 heavy chain and VRC03 light chain form active antibodies that recognize HIV-1. Thus, disclosed herein are VRC01-like antibodies that can be identified by complementation of the heavy or light chains of VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134. For example, using complementation, the heavy chain amino acid sequences of one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623 and/or encoded by one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710 are demonstrated to be VRC01-like antibodies that specifically bind gp120. Thus, in some examples, the VRC01-like antibody includes one of the heavy chain amino acid sequences set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623 and/or encoded by one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710, and a light chain from VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134.
Once a heavy or light chain variable domain of interest is identified, binding to the antigen of interest (such as, but not limited to, gp120) or an epitope of interest (such as, but not limited to, RSC3) can be determined using a cross-complementation analysis. Briefly, if the variable domain of interest is a heavy chain variable domain, the amino acid sequence of this heavy chain variable domain is produced. The heavy chain variable domain is then paired with a reference sequence light chain variable domain, such as a VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 light chain variable domain, and it is determined if the antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹ or 10⁻¹⁰. Similarly, if the variable domain of interest is a light chain variable domain, this amino acid sequence is produced. The light chain variable domain is then paired with a reference sequence heavy chain variable domain, such as a VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain variable domain, and it is determined if the antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹ or 10⁻¹⁰.
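The bookkeeping side of the cross-complementation analysis described above can be sketched as a loop over reference partner chains. This is only an illustration: the binding measurement itself is a laboratory assay, so the sketch takes it as a caller-supplied function (`measure_kd`, a hypothetical name) that returns a KD value or `None` when no binding is detected. The default cutoff of 1e-8 corresponds to the first of the affinities recited above and is likewise an assumption about how a user might configure the screen.

```python
# Reference partner antibodies named in the text.
REFERENCE_LIGHT_CHAINS = [
    "VRC02", "VRC03", "NIH45-46", "VRC-PG04", "VRC-PG04b", "VRC-CH30",
    "VRC-CH31", "VRC-CH32", "VRC-CH33", "VRC-CH34", "3BNC60", "3BNC117",
    "12A12", "12A21", "1NC9", "1B2530", "8ANC131", "8ANC134",
]


def cross_complement(candidate, partners, measure_kd, kd_cutoff=1e-8):
    """Pair a candidate chain with each reference partner chain.

    `measure_kd(candidate, partner)` is supplied by the caller and stands
    in for the binding assay; it returns a KD or None if no binding is
    detected. Returns {partner: KD} for pairings at or below the cutoff.
    """
    hits = {}
    for partner in partners:
        kd = measure_kd(candidate, partner)
        if kd is not None and kd <= kd_cutoff:
            hits[partner] = kd
    return hits
```

The same function serves for a light chain of interest by passing a list of reference heavy chains as `partners`.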
In some embodiments, a VRC01-like antibody includes one, two or all three CDRs from a heavy chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710, or the amino acid sequence set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623, and a light chain. In some embodiments, a VRC01-like antibody includes the heavy chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710, or the amino acid sequence set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623, and a light chain. The light chain can be the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 light chain. In additional embodiments, the isolated human monoclonal antibody specifically binds gp120 and the light chain of the antibody includes CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03). In specific examples, the light chain of the antibody includes SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03).
In some embodiments, a VRC01-like antibody includes the CDRs from a light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 and 4 and a heavy chain, such as a heavy chain from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody. In some embodiments, a VRC01-like antibody includes the light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 and 4 and a heavy chain, such as a heavy chain from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody. In some embodiments, the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03). In additional embodiments, a VRC01-like antibody includes a light chain amino acid sequence comprising an L-CDR3 including three amino acids that are, in order, a hydrophobic, a negatively charged, and a hydrophobic amino acid. Thus, in one example, the L-CDR3 includes tyrosine, glutamic acid, and phenylalanine. In specific examples, the L-CDR3 includes CQQYEFFG. The heavy chain variable domain of the VRC01-like antibody can include the CDRs from any heavy chain variable domain identified using the methods disclosed herein. In some embodiments, the VRC01-like antibody includes a heavy chain variable domain comprising the heavy chain CDRs encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, and 1707-1710, or the CDRs from the amino acid sequence set forth as one of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623. In additional embodiments, the VRC01-like antibody includes a heavy chain with CDR1, CDR2 and/or CDR3 of SEQ ID NO: 1619 (VRC-PG04), SEQ ID NO: 1620 (VRC-PG04b), SEQ ID NO: 1621 (VRC-CH30), SEQ ID NO: 1622 (VRC-CH31), SEQ ID NO: 1623 (VRC-CH32), SEQ ID NO: 1624 (VRC01), SEQ ID NO: 1625 (VRC02), or SEQ ID NO: 1626 (VRC03). In further embodiments, the antibodies specifically bind gp120 and/or neutralize HIV.
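The L-CDR3 feature described above (three consecutive residues that are hydrophobic, negatively charged, and hydrophobic, as in the tyrosine, glutamic acid, phenylalanine example) can be checked with a short pattern match. This is a sketch only. The disclosure does not enumerate which amino acids count as hydrophobic, so the residue classes below are an assumption of this example; tyrosine is included because the text gives it as the first residue of the motif.

```python
import re

# Assumed residue classes (one-letter codes); not defined in the disclosure.
HYDROPHOBIC = "AVLIMFWY"
NEGATIVE = "DE"

MOTIF = re.compile(f"[{HYDROPHOBIC}][{NEGATIVE}][{HYDROPHOBIC}]")


def has_lcdr3_motif(lcdr3):
    """True if the sequence contains hydrophobic, negative, hydrophobic
    residues at three consecutive positions."""
    return MOTIF.search(lcdr3.upper()) is not None
```

Applied to the sequence given in the text, `has_lcdr3_motif("CQQYEFFG")` finds the Y-E-F triplet.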
In some embodiments, the heavy chain variable domain sequence identified by the methods set forth in Section D is not a heavy chain variable domain sequence from an established VRC01-like antibody. In some embodiments, a VRC01-like antibody does not include a heavy chain domain from an established VRC01-like antibody. In some embodiments, a VRC01-like antibody is not an established VRC01-like antibody.
Fully human monoclonal antibodies include human framework regions. Thus, any of the antibodies that specifically bind gp120 herein can include the human framework region. Examples of framework sequences that can be used include the amino acid framework sequences of the heavy and light chains disclosed in PCT Publication No. WO 2006/074071 (see, for example, SEQ ID NOs: 1-16), which is herein incorporated by reference.
The monoclonal antibody can be of any isotype. The monoclonal antibody can be, for example, an IgM or an IgG antibody, such as IgG1 or an IgG2. The class of an antibody that specifically binds gp120 can be switched with another. In one aspect, a nucleic acid molecule encoding VL or VH is isolated using methods well-known in the art, such that it does not include any nucleic acid sequences encoding the constant region of the light or heavy chain, respectively.
The nucleic acid molecule encoding VL or VH is then operatively linked to a nucleic acid sequence encoding a CL or CH from a different class of immunoglobulin molecule. This can be achieved using a vector or nucleic acid molecule that comprises a CL or CH chain, as known in the art. For example, an antibody that specifically binds gp120 and that was originally IgM may be class switched to an IgG. Class switching can be used to convert one IgG subclass to another, such as from IgG1 to IgG2.
In some examples, the disclosed antibodies are multimers of antibodies, such as dimers, trimers, tetramers, pentamers, hexamers, heptamers, octamers and so on. In some examples, the antibodies are pentamers.
By Kabat definition, the CDRs of the light chain are bounded by the residues at positions 24 and 34 (L-CDR1), 50 and 56 (L-CDR2), 89 and 97 (L-CDR3); the CDRs of the heavy chain are bounded by the residues at positions 31 and 35b (H-CDR1), 50 and 65 (H-CDR2), 95 and 102 (H-CDR3), using the numbering convention delineated by Kabat et al., (1991) Sequences of Proteins of Immunological Interest, 5th Edition, U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, Bethesda, MD (NIH Publication No. 91-3242, which is specifically incorporated herein by reference in its entirety). The person of ordinary skill in the art will understand that various CDR numbering schemes (such as the Kabat, Chothia or IMGT numbering schemes) can be used to determine CDR positions.
Antibody fragments are encompassed by the present disclosure, such as Fab, F(ab')2, and Fv, which include a heavy chain and light chain variable region and specifically bind an antigen, such as gp120. These antibody fragments retain the ability to selectively bind with the antigen. These fragments include:
(1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain;
(2) Fab', the fragment of an antibody molecule can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule;
(3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds;
(4) Fv, a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains;
(5) Single chain antibody (such as scFv), defined as a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule; and
(6) A dimer of a single chain antibody (scFV2), defined as a dimer of an scFv. This has also been termed a "miniantibody."
Methods of making these fragments are known in the art (see for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor
Laboratory, New York, 1988). In several examples, the variable region included in the antibody is the variable region of m912.
In a further group of embodiments, the antibodies are Fv antibodies, which are typically about 25 kDa and contain a complete antigen-binding site with three CDRs per each heavy chain and each light chain. To produce these antibodies, the VH and the VL can be expressed from two individual nucleic acid constructs in a host cell. If the VH and the VL are expressed non-contiguously, the chains of the Fv antibody are typically held together by noncovalent interactions. However, these chains tend to dissociate upon dilution, so methods have been developed to crosslink the chains through glutaraldehyde, intermolecular disulfides, or a peptide linker. Thus, in one example, the Fv can be a disulfide stabilized Fv (dsFv), wherein the heavy chain variable region and the light chain variable region are chemically linked by disulfide bonds. In an additional example, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (scFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. Methods for producing scFvs are known in the art (see Whitlow et al., Methods: a Companion to Methods in Enzymology, Vol. 2, page 97, 1991; Bird et al., Science 242:423, 1988; U.S. Patent No. 4,946,778; Pack et al., Bio/Technology 11:1271, 1993; and Sandhu, supra). Dimers of a single chain antibody (scFV2) are also contemplated.
Antibody fragments can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods. For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, an enzymatic cleavage using pepsin produces two monovalent Fab' fragments and an Fc fragment directly (see U.S. Patent No. 4,036,945 and U.S. Patent No. 4,331,647, and references contained therein; Nisonhoff et al., Arch. Biochem. Biophys. 89:230, 1960; Porter, Biochem. J. 73:119, 1959; Edelman et al., Methods in Enzymology, Vol. 1, page 422, Academic Press, 1967; and Coligan et al. at sections 2.8.1-2.8.10 and 2.10.1-2.10.4).
Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody.
One of skill will realize that conservative variants of a VRC01-like antibody can be produced. Such conservative variants employed in antibody fragments, such as dsFv fragments or scFv fragments, will retain critical amino acid residues necessary for correct folding and stabilizing between the VH and the VL regions, and will retain the charge characteristics of the residues in order to preserve the low pI and low toxicity of the molecules.
Once a VRC01-like antibody is identified, additional recombinant neutralizing antibodies that specifically bind the same epitope of gp120 as the antibodies disclosed herein, such as affinity-matured forms, can be isolated by screening of a recombinant combinatorial antibody library, such as a Fab phage display library (see, for example, U.S. Patent Application Publication No. 2005/0123900). To increase binding affinity of the antibody, the VL and VH segments can be randomly mutated, such as within the H-CDR3 region or the L-CDR3 region, in a process analogous to the in vivo somatic mutation process responsible for affinity maturation of antibodies during a natural immune response. This in vitro affinity maturation can be accomplished by amplifying VH and VL regions using PCR primers complementary to the H-CDR3 or L-CDR3, respectively. In this process, the primers have been "spiked" with a random mixture of the four nucleotide bases at certain positions such that the resultant PCR products encode VH and VL segments into which random mutations have been introduced into the VH and/or VL CDR3 regions. These randomly mutated VH and VL segments can be tested to determine the binding affinity for gp120.
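The effect of "spiked" primers can be simulated in silico to get a feel for the diversity of such a library. The sketch below is not a laboratory protocol: it simply replaces chosen positions of a template sequence with a base drawn uniformly from A, C, G and T, which is one reading of "a random mixture of the four nucleotide bases." The template, the spiked positions, and the function names are all hypothetical.

```python
import random

BASES = "ACGT"


def spike_sequence(template, positions, rng):
    """Return a copy of the template with the given positions randomized."""
    bases = list(template)
    for index in positions:
        bases[index] = rng.choice(BASES)
    return "".join(bases)


def make_library(template, positions, size, seed=0):
    """Generate `size` randomized variants; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [spike_sequence(template, positions, rng) for _ in range(size)]
```

With three spiked positions there are at most 4 × 4 × 4 = 64 distinct variants, so the number of unique members returned by `make_library` for a given `size` indicates how thoroughly that space is sampled.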
Following screening and isolation of an antibody that binds gp120 from a recombinant immunoglobulin display library, nucleic acid encoding the selected antibody can be recovered from the display package (for example, from the phage genome) and subcloned into other expression vectors by standard recombinant DNA techniques, as described herein. If desired, the nucleic acid can be further manipulated to create other antibody fragments, also as described herein. To express a recombinant antibody isolated by screening of a combinatorial library, the DNA encoding the antibody is cloned into a recombinant expression vector and introduced into mammalian host cells, as described herein.
The antibodies or antibody fragments disclosed herein can be derivatized or linked to another molecule (such as another peptide or protein). In general, the antibody or portion thereof is derivatized such that the binding to gp120 is not affected adversely by the derivatization or labeling. For example, the antibody can be functionally linked (by chemical coupling, genetic fusion, noncovalent association or otherwise) to one or more other molecular entities, such as another antibody (for example, a bispecific antibody or a diabody), a detection agent, a pharmaceutical agent, and/or a protein or peptide that can mediate association of the antibody or antibody portion with another molecule (such as a streptavidin core region or a polyhistidine tag).
One type of derivatized antibody is produced by cross-linking two or more antibodies (of the same type or of different types, such as to create bispecific antibodies). Suitable crosslinkers include those that are heterobifunctional, having two distinctly reactive groups separated by an appropriate spacer (such as m-maleimidobenzoyl-N-hydroxysuccinimide ester) or homobifunctional (such as disuccinimidyl suberate). Such linkers are available from Pierce Chemical Company (Rockford, IL).
An antibody that specifically binds gp120 can be labeled with a detectable moiety. Useful detection agents include fluorescent compounds, including fluorescein, fluorescein isothiocyanate, rhodamine, 5-dimethylamine-1-naphthalenesulfonyl chloride, phycoerythrin, lanthanide phosphors and the like. Bioluminescent markers are also of use, such as luciferase, green fluorescent protein, and yellow fluorescent protein. An antibody can also be labeled with enzymes that are useful for detection, such as horseradish peroxidase, β-galactosidase, luciferase, alkaline phosphatase, glucose oxidase and the like. When an antibody is labeled with a detectable enzyme, it can be detected by adding additional reagents that the enzyme uses to produce a reaction product that can be discerned. For example, when the agent horseradish peroxidase is present, the addition of hydrogen peroxide and diaminobenzidine leads to a colored reaction product, which is visually detectable. An antibody may also be labeled with biotin, and detected through indirect measurement of avidin or streptavidin binding. It should be noted that the avidin itself can be labeled with an enzyme or a fluorescent label.
An antibody may be labeled with a magnetic agent, such as gadolinium. Antibodies can also be labeled with lanthanides (such as europium and dysprosium), and manganese. Paramagnetic particles such as superparamagnetic iron oxide are also of use as labels. An antibody may also be labeled with predetermined polypeptide epitopes recognized by a secondary reporter (such as leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags). In some embodiments, labels are attached by spacer arms of various lengths to reduce potential steric hindrance.
An antibody can also be labeled with a radiolabeled amino acid. The radiolabel may be used for both diagnostic and therapeutic purposes. Examples of labels for polypeptides include, but are not limited to, the following radioisotopes or radionuclides: 3H, 14C, 15N, 35S, 90Y, 99Tc, 111In, 125I, 131I.
An antibody can also be derivatized with a chemical group such as polyethylene glycol (PEG), a methyl or ethyl group, or a carbohydrate group. These groups may be useful to improve the biological characteristics of the antibody, such as to increase serum half-life or to increase tissue binding.
Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted illumination. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label.
The present disclosure also relates to the crystals obtained from the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120, the crystal structures of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120, the three-dimensional coordinates of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120, and three-dimensional structures of models of the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120. The three-dimensional coordinates of VRC01 in complex with gp120 are available at the Protein Data Bank, at accession number 3NGB. Those of skill in the art will understand that a set of structure coordinates for the VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody or portions thereof in complex with gp120 or a portion thereof is a relative set of points that define a shape in three dimensions. Thus, it is possible that an entirely different set of coordinates could define a similar or identical shape. Moreover, slight variations in the individual coordinates will have little effect on overall shape. The variations in coordinates discussed above may be generated because of mathematical manipulations of the structure coordinates.
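The point that a different set of coordinates can define an identical shape can be checked numerically: a rotation followed by a translation changes every coordinate value but leaves all inter-point distances unchanged. The sketch below uses four toy points (hypothetical, not taken from 3NGB or any deposited structure) and an arbitrary rotation angle and shift.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3-D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3-D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))

def pairwise_distances(points):
    """All inter-point distances, in a fixed order."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]

# Toy coordinates (hypothetical, for illustration only).
original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (0.3, 0.9, 2.1)]
moved = [translate(rotate_z(p, math.radians(40)), (10.0, -4.0, 7.5)) for p in original]

# The coordinate values differ, but the shape (set of distances) is the same.
same_shape = all(
    math.isclose(d1, d2, abs_tol=1e-9)
    for d1, d2 in zip(pairwise_distances(original), pairwise_distances(moved))
)
print(same_shape)
```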
This disclosure further provides systems, such as computer systems, intended to generate structures and/or perform rational drug or compound design for an antigenic compound capable of eliciting an immune response in a subject. The system can contain one or more or all of: atomic co-ordinate data according to VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody complex or a subset thereof, and the figures derived therefrom by homology modeling, the data defining the three-dimensional structure of a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody complex or at least one sub-domain thereof, or structure factor data for gp120, the structure factor data being derivable from the atomic co-ordinate data of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 antibody complex or a subset thereof and the figures.
B. Polynucleotides and Expression
Nucleic acid molecules (also referred to as polynucleotides) encoding the antibody heavy and light chains provided herein (including, but not limited to, VRC01-like antibodies disclosed herein) can readily be produced by one of skill in the art. For example, these nucleic acids can be produced using the amino acid sequences provided herein (such as the CDR sequences, heavy chain and light chain sequences), sequences available in the art (such as framework sequences), and the genetic code. In some embodiments, the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2, and CDR3 encoded by any one of SEQ ID NOs: 1, 2, 11-34, 43-1603, 1679-1698, or 1707-1710. In some embodiments, the isolated human monoclonal antibody specifically binds gp120, and includes a heavy chain with CDR1, CDR2, and CDR3 of SEQ ID NOs: 1627-1646, 1655-1658 or 2537-2623.
One of skill in the art can readily use the genetic code to construct a variety of functionally equivalent nucleic acids, such as nucleic acids which differ in sequence but which encode the same antibody sequence, or encode a conjugate or fusion protein including the VL and/or VH nucleic acid sequence.
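The degeneracy of the genetic code referred to above can be made concrete with a short sketch that enumerates every DNA sequence encoding a given peptide. The standard genetic code table is used; the four-residue peptide is a hypothetical example, not a sequence from this disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(_BASES, repeat=3), _AMINO)}

# Reverse lookup: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def encodings(peptide):
    """Yield every DNA sequence that encodes `peptide`."""
    for combo in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(combo)

peptide = "MWGS"  # hypothetical peptide for illustration
variants = list(encodings(peptide))
print(len(variants))  # M: 1 codon, W: 1, G: 4, S: 6 -> 24 sequences
```

All 24 sequences differ at the nucleotide level yet translate to the same peptide, which is the sense in which they are functionally equivalent.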
Nucleic acid sequences encoding the antibodies that specifically bind gp120 can be prepared by any suitable method including, for example, cloning of appropriate sequences or by direct chemical synthesis by methods such as the phosphotriester method of Narang et al., Meth. Enzymol. 68:90-99, 1979; the phosphodiester method of Brown et al., Meth. Enzymol. 68:109-151, 1979; the diethylphosphoramidite method of Beaucage et al., Tetra. Lett. 22:1859-1862, 1981; the solid phase phosphoramidite triester method described by Beaucage & Caruthers, Tetra. Letts. 22(20):1859-1862, 1981, for example, using an automated synthesizer as described in, for example, Needham-VanDevanter et al., Nucl. Acids Res. 12:6159-6168, 1984; and the solid support method of U.S. Patent No. 4,458,066. Chemical synthesis produces a single stranded oligonucleotide. This can be converted into double stranded DNA by hybridization with a complementary sequence or by polymerization with a DNA polymerase using the single strand as a template. One of skill would recognize that while chemical synthesis of DNA is generally limited to sequences of about 100 bases, longer sequences may be obtained by the ligation of shorter sequences.
Exemplary nucleic acids can be prepared by cloning techniques. Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises are found in Sambrook et al., supra, Berger and Kimmel (eds.), supra, and Ausubel, supra. Product information from manufacturers of biological reagents and experimental equipment also provides useful information. Such manufacturers include the SIGMA Chemical Company (Saint Louis, MO), R&D Systems (Minneapolis, MN), Pharmacia Amersham (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), Invitrogen (Carlsbad, CA), and Applied Biosystems (Foster City, CA), as well as many other commercial sources known to one of skill.
Nucleic acids can also be prepared by amplification methods. Amplification methods include polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), and the self-sustained sequence replication system (3SR). A wide variety of cloning methods, host cells, and in vitro amplification methodologies are well known to persons of skill.
Any of the nucleic acids encoding any of the antibodies, VH and/or VL, disclosed herein (or fragment thereof) can be expressed in a recombinantly engineered cell such as a bacterial, plant, yeast, insect, or mammalian cell. These antibodies can be expressed as individual VH and/or VL chains, or can be expressed as a fusion protein. An immunoadhesin can also be expressed. Thus, in some examples, nucleic acids encoding a VH and VL, and immunoadhesin are provided. The nucleic acid sequences can optionally encode a leader sequence.
To create a single chain antibody (scFv), the VH- and VL-encoding DNA fragments are operatively linked to another fragment encoding a flexible linker, e.g., encoding the amino acid sequence (Gly4-Ser)3, such that the VH and VL sequences can be expressed as a contiguous single-chain protein, with the VL and VH domains joined by the flexible linker (see, e.g., Bird et al., Science 242:423-426, 1988; Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883, 1988; McCafferty et al., Nature 348:552-554, 1990). Optionally, a cleavage site can be included in a linker, such as a furin cleavage site.
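At the amino acid level, the single-chain arrangement described above amounts to concatenating the two variable domains around the (Gly4-Ser)3 linker. The following is a minimal sketch; the domain strings are obvious placeholders rather than disclosed sequences, and the `order` parameter is an assumption added for illustration, since the text does not fix the domain order.

```python
LINKER = "GGGGS" * 3  # (Gly4-Ser)3 in one-letter code

def build_scfv(vh, vl, linker=LINKER, order="VH-VL"):
    """Join a VH and a VL amino acid sequence through a flexible linker.

    `order` selects which domain is N-terminal; both arrangements are
    offered here because the source text does not specify one.
    """
    if order == "VH-VL":
        return vh + linker + vl
    if order == "VL-VH":
        return vl + linker + vh
    raise ValueError("order must be 'VH-VL' or 'VL-VH'")

# Placeholder strings only; substitute real variable-domain sequences.
vh_placeholder = "VHVHVHVH"
vl_placeholder = "VLVLVLVL"
scfv = build_scfv(vh_placeholder, vl_placeholder)
print(scfv)
```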
The nucleic acid encoding the VH and/or the VL optionally can encode an Fc domain (immunoadhesin). The Fc domain can be an IgA, IgM or IgG Fc domain. The Fc domain can be an optimized Fc domain, as described in U.S. Published Patent Application No. 2010/0093979, incorporated herein by reference. In one example, the immunoadhesin is an IgG1 Fc. The single chain antibody may be monovalent, if only a single VH and VL are used, bivalent, if two VH and VL are used, or polyvalent, if more than two VH and VL are used. Bispecific or polyvalent antibodies may be generated that bind specifically to gp120 and to another molecule, such as gp41. The encoded VH and VL optionally can include a furin cleavage site between the VH and VL domains.
It is expected that those of skill in the art are knowledgeable in the numerous expression systems available for expression of proteins including E. coli, other bacterial hosts, yeast, and various higher eukaryotic cells such as the COS, CHO, HeLa and myeloma cell lines.
The host cell can be a gram positive bacterium including, but not limited to, Bacillus, Streptococcus, Streptomyces, Staphylococcus, Enterococcus, Lactobacillus, Lactococcus, Clostridium, Geobacillus, and Oceanobacillus. Methods for expressing protein in gram positive bacteria, such as Lactobacillus, are well known in the art; see, for example, U.S. Published Patent Application No. 2010/0080774. Expression vectors for Lactobacillus are described, for example, in U.S. Pat. No. 6,100,388 and U.S. Patent No. 5,728,571. Leader sequences can be included for expression in Lactobacillus. Gram negative bacteria include, but are not limited to, E. coli, Pseudomonas, Salmonella, Campylobacter, Helicobacter, Flavobacterium, Fusobacterium, Ilyobacter, Neisseria, and Ureaplasma.
One or more DNA sequences encoding the antibody or fragment thereof can be expressed in vitro by DNA transfer into a suitable host cell. The cell may be prokaryotic or eukaryotic. The term also includes any progeny of the subject host cell. It is understood that all progeny may not be identical to the parental cell since there may be mutations that occur during replication. Methods of stable transfer, meaning that the foreign DNA is continuously maintained in the host, are known in the art. Hybridomas expressing the antibodies of interest are also encompassed by this disclosure.
The expression of nucleic acids encoding the isolated proteins described herein can be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or inducible), followed by incorporation into an expression cassette. The promoter can be any promoter of interest, including a cytomegalovirus promoter and a human T cell lymphotropic virus (HTLV)-1 promoter. Optionally, an enhancer, such as a cytomegalovirus enhancer, is included in the construct. The cassettes can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical expression cassettes contain specific sequences useful for regulation of the expression of the DNA encoding the protein. For example, the expression cassettes can include appropriate promoters, enhancers, transcription and translation terminators, initiation sequences, a start codon (i.e., ATG) in front of a protein-encoding gene, splicing signals for introns, sequences for the maintenance of the correct reading frame of that gene to permit proper translation of mRNA, and stop codons. The vector can encode a selectable marker, such as a marker encoding drug resistance (for example, ampicillin or tetracycline resistance).
To obtain high level expression of a cloned gene, it is desirable to construct expression cassettes which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation (internal ribosomal binding sequences), and a transcription/translation terminator. For E. coli, this includes a promoter such as the T7, trp, lac, or lambda promoters, a ribosome binding site, and preferably a transcription termination signal. For eukaryotic cells, the control sequences can include a promoter and/or an enhancer derived from, for example, an immunoglobulin gene, HTLV, SV40 or cytomegalovirus, and a polyadenylation sequence, and can further include splice donor and/or acceptor sequences (for example, CMV and/or HTLV splice acceptor and donor sequences). The cassettes can be transferred into the chosen host cell by well-known methods such as transformation or electroporation for E. coli and calcium phosphate treatment, electroporation or lipofection for mammalian cells. Cells transformed by the cassettes can be selected by resistance to antibiotics conferred by genes contained in the cassettes, such as the amp, gpt, neo and hyg genes.
When the host is a eukaryote, such methods of transfection of DNA as calcium phosphate coprecipitates, conventional mechanical procedures such as microinjection, electroporation, insertion of a plasmid encased in liposomes, or virus vectors may be used. Eukaryotic cells can also be cotransformed with polynucleotide sequences encoding the antibody, labeled antibody, or functional fragment thereof, and a second foreign DNA molecule encoding a selectable phenotype, such as the herpes simplex thymidine kinase gene. Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein (see for example, Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). One of skill in the art can readily use expression systems such as plasmids and vectors of use in producing proteins in cells including higher eukaryotic cells such as the COS, CHO, HeLa and myeloma cell lines.
Modifications can be made to a nucleic acid encoding a polypeptide described herein without diminishing its biological activity. Some modifications can be made to facilitate the cloning, expression, or incorporation of the targeting molecule into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, termination codons, a methionine added at the amino terminus to provide an initiation site, additional amino acids placed on either terminus to create conveniently located restriction sites, or additional amino acids (such as poly His) to aid in purification steps. In addition to recombinant methods, the immunoconjugates, effector moieties, and antibodies of the present disclosure can also be constructed in whole or in part using standard peptide synthesis well known in the art.
Once expressed, the recombinant immunoconjugates, antibodies, and/or effector molecules can be purified according to standard procedures of the art, including ammonium sulfate precipitation, affinity columns, column chromatography, and the like (see, generally, R. Scopes, PROTEIN PURIFICATION, Springer-Verlag, N.Y., 1982). The antibodies, immunoconjugates and effector molecules need not be 100% pure. Once purified, partially or to homogeneity as desired, if to be used therapeutically, the polypeptides should be substantially free of endotoxin.
Methods for expression of antibodies and/or refolding to an appropriate active form, including single chain antibodies, from bacteria such as E. coli have been described and are well-known and are applicable to the antibodies disclosed herein. See, Buchner et al., Anal. Biochem. 205:263-270, 1992; Pluckthun, Biotechnology 9:545, 1991; Huse et al., Science 246:1275, 1989 and Ward et al., Nature 341:544, 1989. Often, functional heterologous proteins from E. coli or other bacteria are isolated from inclusion bodies and require solubilization using strong denaturants, and subsequent refolding. During the solubilization step, as is well known in the art, a reducing agent must be present to separate disulfide bonds. An exemplary buffer with a reducing agent is: 0.1 M Tris pH 8, 6 M guanidine, 2 mM EDTA, 0.3 M DTE (dithioerythritol). Reoxidation of the disulfide bonds can occur in the presence of low molecular weight thiol reagents in reduced and oxidized form, as described in Saxena et al., Biochemistry 9:5015-5021, 1970, and especially as described by Buchner et al., supra.
Renaturation is typically accomplished by dilution (for example, 100-fold) of the denatured and reduced protein into refolding buffer. An exemplary buffer is 0.1 M Tris, pH 8.0, 0.5 M L-arginine, 8 mM oxidized glutathione (GSSG), and 2 mM EDTA.
As a modification to the two chain antibody purification protocol, the heavy and light chain regions are separately solubilized and reduced and then combined in the refolding solution. An exemplary yield is obtained when these two proteins are mixed in a molar ratio such that a 5-fold molar excess of one protein over the other is not exceeded. Excess oxidized glutathione or other oxidizing low molecular weight compounds can be added to the refolding solution after the redox-shuffling is completed.
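The arithmetic behind the refolding step can be sketched as follows. The buffer compositions are the exemplary ones given above; the interpretation of "100-fold dilution" as one volume of solubilized protein mixed into 99 volumes of refolding buffer, the assumption of ideal mixing, and the helper names are assumptions made for illustration.

```python
# Exemplary buffers from the text, concentrations in mol/L.
SOLUBILIZATION = {"Tris": 0.1, "guanidine": 6.0, "EDTA": 0.002, "DTE": 0.3}
REFOLDING = {"Tris": 0.1, "L-arginine": 0.5, "GSSG": 0.008, "EDTA": 0.002}

def dilute(solubilization, refolding, fold):
    """Final concentration of each component after mixing 1 volume of
    solubilization buffer with (fold - 1) volumes of refolding buffer
    (ideal mixing assumed)."""
    components = set(solubilization) | set(refolding)
    return {
        name: (solubilization.get(name, 0.0)
               + (fold - 1) * refolding.get(name, 0.0)) / fold
        for name in components
    }

def ratio_ok(mol_heavy, mol_light, max_excess=5.0):
    """True if neither chain exceeds a `max_excess`-fold molar excess."""
    high, low = max(mol_heavy, mol_light), min(mol_heavy, mol_light)
    return low > 0 and high / low <= max_excess

final = dilute(SOLUBILIZATION, REFOLDING, fold=100)
for name in sorted(final):
    print(f"{name}: {final[name] * 1000:.2f} mM")
print(ratio_ok(mol_heavy=1.0, mol_light=3.0))
```

Under these assumptions the denaturant and reducing agent fall to one hundredth of their starting concentrations, while the refolding-buffer components stay close to their nominal values.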
In addition to recombinant methods, the antibodies, labeled antibodies and functional fragments thereof that are disclosed herein can also be constructed in whole or in part using standard peptide synthesis. Solid phase synthesis of the polypeptides of less than about 50 amino acids in length can be accomplished by attaching the C-terminal amino acid of the sequence to an insoluble support followed by sequential addition of the remaining amino acids in the sequence. Techniques for solid phase synthesis are described by Barany & Merrifield, The Peptides: Analysis, Synthesis, Biology. Vol. 2: Special Methods in Peptide Synthesis, Part A. pp. 3-284; Merrifield et al., J. Am. Chem. Soc. 85:2149-2156, 1963, and Stewart et al., Solid Phase Peptide Synthesis, 2nd ed., Pierce Chem. Co., Rockford, Ill., 1984. Proteins of greater length may be synthesized by condensation of the amino and carboxyl termini of shorter fragments. Methods of forming peptide bonds by activation of a carboxyl terminal end (such as by the use of the coupling reagent N,N'-dicyclohexylcarbodiimide) are well known in the art.
C. Epitope Scaffolds and Their Use
Epitope scaffolds have been used to isolate antibodies with particular binding specificity (See PCT Publication No. WO 2008/025015). Briefly, an epitope, such as an epitope of a pathogenic agent (for example, an epitope of an HIV-1 polypeptide) recognized by broadly neutralizing antibodies, is placed into an appropriate peptide scaffold that preserves its structure and antigenicity. Such epitope scaffolds can then be used as an immunogen to elicit an epitope-specific antibody response in a subject. In another example, such scaffolds can be used to identify specific serum reactivities against the target epitope of the scaffold. This scaffolding technology is applicable not only to HIV-1, but to any pathogen for which a broadly neutralizing antibody and its respective epitope have been characterized at the atomic level.
The design of epitope-protein scaffolds which elicit selected neutralizing antibodies is disclosed in PCT Publication No. WO 2008/025015, which is incorporated herein by reference. In general, the protocols utilize searchable databases containing the three dimensional structure of proteins, epitopes, and epitope-antibody complexes to identify proteins that are capable of structurally accommodating at least one selected epitope on their surface. Protein folding calculations are further utilized to make energetic predictions. The predicted energies may be used to optimize the structure of the epitope-scaffold and filter results on the basis of energy criteria in order to reduce the number of candidate proteins and identify energetically stable epitope-scaffolds.
In one embodiment, a "superposition" epitope-scaffold can be designed and utilized. Superposition epitope-scaffolds are based upon scaffold proteins having an exposed segment on their surface with a similar conformation as a selected target epitope. The backbone atoms in this superposition region can be structurally superimposed onto the target epitope with less than a selected level of deviation from their native configuration. Candidate scaffolds are identified by computationally searching through a library of three-dimensional structures. The candidate scaffolds are further designed by putting epitope residues in the superposition region of the scaffold protein and making additional mutations on the surrounding surface of the scaffold to prevent undesirable interactions between the scaffold and the epitope or the scaffold and the antibody.
Superposition is advantageous in that it is a conservative technique. Epitope-scaffolds designed by superposition require only a limited number of mutations on the surface of known, stable proteins. Thus, the designs can be produced rapidly and a high fraction of the first round designs are likely to fold properly.
In another embodiment, "grafting" epitope scaffolds are utilized. Grafting epitope scaffolds utilize scaffold proteins that can accommodate replacement of an exposed segment with the crystallized conformation of the target epitope. For each suitable scaffold identified by computationally searching through a database of known three-dimensional structures, an exposed segment is replaced by the target epitope. The surrounding protein side chains are further mutated to accommodate and stabilize the inserted epitope. Mutations are further made on the surface of the scaffold to avoid undesirable interactions between the scaffold and epitope or scaffold and antibody. Grafting epitope-scaffolds should substantially mimic the epitope-antibody interaction, as the epitope is presented in substantially its native conformation. As such, grafting may be utilized to treat complex epitopes which are more difficult to incorporate using superposition techniques.
In certain embodiments, protein and design calculations are performed using the ROSETTA™ computer program to design the epitope scaffolds. ROSETTA™ is a software application, developed at least in part at the University of Washington which provides protein structure predictions. ROSETTA™ utilizes physical models of the macromolecular interactions and algorithms for finding the lowest energy structure for an amino acid sequence in order to predict the structure of a protein. Furthermore ROSETTA™ may use these models and algorithms to find the lowest energy amino acid sequence for a protein or protein-protein complex for protein design. The ROSETTA™ energy function and several modules of the ROSETTA™ protein structure modeling and design platform are employed in the protein scaffold design discussed below.
Described herein are methods of increasing an antibody's binding affinity and neutralizing capacity that utilize this epitope scaffolding technology. In the methods described herein, an original (parental) antibody that specifically binds a scaffold epitope is identified and sequenced. The antibody binding determinants of antibody reactivity are then identified by mutagenesis (for example, amino acid substitutions) of the antibody sequences, wherein variant antibodies are produced. These amino acid substitutions can be made in one or more CDRs and/or in one or more framework regions of the original antibody. The amino acid substitutions can be a replacement of the amino acid in the original antibody with a tryptophan. In some embodiments, the antibodies include at most one, at most two, at most three or at most four amino acid substitutions, such as in the CDRs. These variant antibodies, such as the antibodies including one, two, three or four amino acid substitutions, are then evaluated for binding to the epitope scaffold. Antibodies are selected that have altered binding affinity for the epitope scaffold as compared to the original (parental) antibody.
In particular examples, selection of residues for mutagenesis is aided by structural modeling of the scaffold-antibody interaction. To produce an antibody with enhanced binding affinity, the amino acid(s) that have been identified as critical for antibody reactivity are then further substituted and the effects on antibody reactivity measured by further probing with the epitope scaffold.
Any method known in the art can be used to determine antibody-scaffold affinity. In some examples, the epitope scaffold probe is fused to a biotinylation peptide. In some examples, the amino acid residues in the antibody that are responsible for specific binding to the epitope are indicated by a decrease in affinity of the variant antibody as compared to the parental antibody. In some embodiments, antibodies are selected wherein binding is decreased by at least 20%, at least 30%, at least 40%, at least 50%, at least 100% (2-fold), at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000% (10-fold) as compared to the original antibody. A decrease in the affinity of the variant antibody for the scaffold, as compared to the parental antibody, identifies the one or more amino acids as critical for antigen binding. In one example, the complete loss of antibody binding affinity for the epitope scaffold identifies the one or more amino acid residues as critical for specific binding of antibody to the epitope. In other embodiments, variant antibodies are selected wherein binding is increased by at least 20%, at least 30%, at least 40%, at least 50%, at least 100% (2-fold), at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000% (10-fold) as compared to the parental antibody. An increase of the binding of the variant antibody as compared to the parental antibody identifies the one or more amino acid residues as critical for specific binding of the antibody to the epitope.
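The comparison of variant and parental binding described above can be sketched as a simple classification by fold change. This is an illustrative sketch only: the percentage and fold figures quoted in the text are not strictly interchangeable under a single arithmetic convention, so the sketch works in fold change and leaves the threshold as a parameter. The variant names and binding signals are hypothetical.

```python
def classify_variant(parental, variant, threshold_fold=2.0):
    """Compare a variant's binding signal to the parental signal.

    Returns 'abolished' for complete loss of binding, 'decreased' or
    'increased' when the change is at least `threshold_fold`, and
    'unchanged' otherwise.
    """
    if parental <= 0:
        raise ValueError("parental binding must be positive")
    if variant <= 0:
        return "abolished"
    fold = variant / parental
    if fold <= 1.0 / threshold_fold:
        return "decreased"
    if fold >= threshold_fold:
        return "increased"
    return "unchanged"

# Hypothetical binding signals (arbitrary units), for illustration only.
parental_signal = 1.0
variants = {"variant_A": 0.0, "variant_B": 0.4, "variant_C": 1.1, "variant_D": 3.0}
calls = {name: classify_variant(parental_signal, s) for name, s in variants.items()}
print(calls)
```

Positions whose substitution gives an 'abolished', 'decreased', or 'increased' call would be the candidates flagged as critical for binding in the sense used above.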
An exemplary method is disclosed herein, wherein this technology is utilized for an antibody that specifically binds an antigenic glycoprotein of HIV. However, this method is broadly applicable to antibodies that specifically bind any antigen of interest. In some embodiments, the antibody specifically binds a pathogen of interest. Pathogens include viruses, fungi, bacteria, and protozoa. In other examples, the antibody specifically binds a tumor antigen of interest.
While this disclosure is written with specific reference to the identification of antibodies that are specific for HIV, such as antibodies specific for gp120 and gp140 from HIV, the methods disclosed herein are equally applicable to the identification of antibodies to other antigens, for example antigens from pathogenic sources as well as tumor antigens. In some examples, epitope scaffolds such as those described in PCT Publication No. WO 2008/025015 can be used to identify epitope-specific B-cells and isolate specific IgG clones that bind to a target epitope, such as the site of vulnerability on the surface of gp120, as disclosed herein. In some examples, a subject is selected that produces, or has, broadly neutralizing sera, such that the B-cells isolated from that subject are believed to express one or more broadly neutralizing antibodies to an antigen of interest, such as an antigen from a pathogenic organism or a tumor. B-cells are isolated from the subject, the isolated B-cells are contacted with a target antigen of interest, such as a resurfaced antigen, and the complex of the B-cells and the target antigen of interest is isolated. Nucleic acids obtained from the B-cells are analyzed, antibodies encoded by the Ig genes are synthesized, and the antibodies are further characterized. In some examples, the antibody-antigen complexes are further characterized structurally, for example using X-ray diffraction methods, which allows the important antibody/antigen contacts to be mapped. This information can be used to define classes of neutralizing antibodies specific for an antigen of interest, for example as is disclosed herein for the class of VRC01-like antibodies. The structural information about the antigen/antibody contacts and conformation can be analyzed in conjunction with sequencing data, such as 454 sequencing data, to identify additional antibodies that have the same or similar binding properties, in that they are highly specific for a specific neutralizing epitope on the surface of the antigen of interest. By combining sequence analysis, such as 454 sequencing, with structural characterization of antibody/antigen interactions at the atomic level, it is now possible to identify classes of neutralizing antibodies from a subject. As disclosed herein, this has now been demonstrated for HIV using designed gp120 antigens. In other words, the combination of sequencing, such as 454 sequencing, with identified binding motifs in antibodies allows the identification of additional antibodies. Importantly, however, it allows for a shortcut, as these antibodies are directly identified from B-cells without the requirement for isolating antigen-specific B cells. In doing so, it ties genomics technologies directly to sera characterization. This tie permits direct interrogation of the antibodyome, which is the family of antibodies specific for an antigen or even an organism or cancer of interest. In some examples, the methods described herein can be used to examine a time course of antibody maturation from seroconversion to production of broadly neutralizing antibodies.
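The idea of combining sequencing reads with an identified binding motif can be illustrated with a simple pattern screen over translated reads. This is a toy sketch under stated assumptions: the motif, the read names, and the read sequences are hypothetical and are not motifs or sequences from this disclosure, and a regular-expression match stands in for whatever motif definition a real analysis would use.

```python
import re

def screen_reads(reads, motif):
    """Return {read name: match start} for reads containing the motif.

    `reads` maps read names to translated (amino acid) sequences and
    `motif` is a regular expression over one-letter amino acid codes.
    """
    pattern = re.compile(motif)
    hits = {}
    for name, sequence in reads.items():
        match = pattern.search(sequence)
        if match:
            hits[name] = match.start()
    return hits

# Hypothetical motif and reads, for illustration only.
MOTIF = r"C[AV]R.{3,12}W"
reads = {
    "read_1": "AVYYCARGGYDYWGQG",
    "read_2": "AVYYCTTGGYDYFGQG",
    "read_3": "CVRAAAAAAAAAW",
}
hits = screen_reads(reads, MOTIF)
print(hits)
```

Reads that pass such a screen would still need to be expressed and tested, since a sequence match alone does not establish binding or neutralization.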
In some embodiments, the methods described herein are used to monitor the development of antibodies to vaccines in a subject, for example to allow feedback at the antibody sequence level and subsequent redesign of the vaccines during vaccine development.
D. Identification of Antibodies of Interest
Disclosed herein are methods that utilize a specific type of bioinformatic analysis to identify heavy or light chain variable domains of antibodies that bind an antigen of interest. Without being bound by theory, genomic analyses of B-cell cDNA libraries provide insight into sequence complexity and can be used to identify neutralizing antibodies of interest. These sequences of B cell variable domains specify the functional antibodyome, the repertoire of expressed antibody heavy and light chain sequences in each individual. High-throughput sequencing methods provide heavy chain and light chain sequences of antibodies, which can be used in certain genomic analyses, such as cross-donor phylogenetic analyses, to identify an antibody that binds an antigen of interest, or an epitope of this antigen. The antigen can be any antigen of interest, such as a viral antigen. In some embodiments, the antigen is gp120, or an epitope thereof, such as, but not limited to, resurfaced stabilized core 3 probe (RSC3). The methods disclosed herein are of use for identifying a class of antibodies of interest, such as, but not limited to, VRC01-like antibodies.
The methods disclosed herein can be used to identify the heavy and/or light chain variable domains of antibodies that specifically bind gp120, and specific subsets of these antibodies. In some examples, the methods identify the heavy or light chain domains of VRC01-like antibodies. Antibodies can be produced that include a heavy or light chain variable domain (or the CDR sequences in a framework region) identified by these methods, and a corresponding light chain or heavy chain variable domain (or the CDR sequences in a framework region), respectively.
Both of the heavy chain and the light chain variable domain are from antibodies that bind the same antigen, such that the resulting antibody (including three heavy chain CDRs and three light chain CDRs) binds this antigen.
Thus, with regard to a VRC01-like antibody, in some embodiments functional characterization of selected sequences can be achieved through expression of an identified heavy or light chain variable domain followed by reconstitution with a corresponding known VRC01-like light or heavy chain variable domain (respectively) into an antibody.
In some examples, the methods include isolating a sample such as a B cell sample from a subject (for example, see the methods described in the preceding section), such as a subject infected with HIV, and sequencing the heavy and/or light chain variable domains. A data processing program can be used to assess sequence identity and divergence, such as divergence from a germline gene of interest and/or identity to a variable heavy and/or variable light chain gene of interest. These programs are known to one of skill in the art, and include JOINSOLVER® (available on the internet through the NIAID website), IMGT/V-Quest (the international ImMunoGeneTics information system; imgt.org; available on the internet), ClustalW2 (for analysis of CDR3 and germline identity/divergence), the dnamlk program of the PHYLIP® package (available on the internet through the University of Washington website; evolution.genetics.washington.edu/phylip.html), and the blastclust module in the NCBI BLAST® package. Specific embodiments are disclosed below.
Once a heavy or light chain variable domain of interest is identified, binding to the antigen of interest (such as, but not limited to, gp120) or an epitope of interest (such as, but not limited to, RSC3) can be determined using a cross complementation analysis. Briefly, if the variable domain of interest is a heavy chain variable domain, the amino acid sequence of this heavy chain variable domain is produced. The heavy chain variable domain is then paired with a reference sequence light chain variable domain, such as a VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 light chain variable domain, and it is determined if the antibody specifically binds the antigen (or epitope) with a specified affinity, such as a KD of 10⁻⁸, 10⁻⁹ or 10⁻¹⁰ mol/L.
1. Identification of VRC01-like heavy chains by cross-donor phylogenetic analysis
Methods of sieving a population of antibody heavy chain sequences (such as an antibodyome of a subject) using cross-donor phylogenetic analysis to identify heavy chain variable domains of interest are provided. In some embodiments, additional functional characterization is also used. In some embodiments, a VRC01-like heavy chain variable domain is identified by performing a cross-donor phylogenetic analysis on a population of heavy chain variable domain nucleic acid sequences that were obtained by sequencing nucleic acids, specifically heavy chain variable domains, from a sample of B cells from a subject infected with a virus, such as HIV. In some embodiments, the nucleic acids from the sample of B cells are amplified prior to sequencing.
Variations of the cross-donor phylogenetic analysis are provided herein, including all-origin cross donor phylogenetic analysis (which analyzes heavy chain sequences from IGHV1-2 germline origin and also other germline origins) and IGHV1-2 origin cross donor phylogenetic analysis (which analyzes heavy chain sequences derived from IGHV1-2 germline origin), such as to identify and isolate VRC01-like heavy chain sequences.
a. IGHV1-2 cross donor phylogenetic analysis
In some embodiments, heavy chain variable region sequences with an IGHV1-2 germline origin are used as the population of sequences for analysis. Thus, in some examples, prior to cross-donor phylogenetic analysis, the germline origins are assigned to each sequence using, for example, the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi, which is specifically incorporated herein by reference in its entirety), and those sequences that are not assigned to the IGHV1-2 germline sequence, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence, are not included in the population of test sequences used for the cross-donor phylogenetic analysis.
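By way of non-limiting illustration only, the exclusion of sequences not assigned to the IGHV1-2 germline can be sketched as follows. The sketch assumes that germline assignment has already been performed (for example with IgBLAST) and that the results have been reduced to pairs of a read identifier and an assigned V-gene allele; the function name, the pair format, and the read identifiers are hypothetical and are not part of any program named herein.

```python
from typing import Iterable, List, Tuple


def filter_by_germline(
    assignments: Iterable[Tuple[str, str]], gene: str = "IGHV1-2"
) -> List[str]:
    """Return IDs of reads whose assigned V gene is `gene`, in any allele.

    `assignments` holds (read_id, allele) pairs such as ("r1", "IGHV1-2*02").
    The pair format is an assumption made for this sketch.
    """
    kept = []
    for read_id, allele in assignments:
        # "IGHV1-2*02" -> "IGHV1-2"; comparing the whole gene name avoids
        # accepting a different gene such as "IGHV1-24".
        if allele.split("*", 1)[0] == gene:
            kept.append(read_id)
    return kept


example = [
    ("r1", "IGHV1-2*02"),
    ("r2", "IGHV1-24*01"),
    ("r3", "IGHV3-23*01"),
    ("r4", "IGHV1-2*05"),
]
test_population = filter_by_germline(example)
```

In this toy input only the reads assigned to an IGHV1-2 allele remain in the test population.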
In some examples, the cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from at least one known VRC01-like antibody (reference antibody sequence or sequences), such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 to the population of test sequences. In additional embodiments, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 are added to the population of test sequences. In addition, the nucleotide sequences of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, or of the V-gene reverted sequence for VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 are added to the test population of heavy chain variable domain nucleic acid sequences. This forms a population of nucleic acid sequences for analysis, creating an analytic population. The known VRC01-like sequences are included as a reference for the segregation of the VRC01-like sequences, and the germline sequence is included to root the population of sequences to a common ancestor. A phylogenetic tree is constructed from this analytic population of heavy chain variable domain sequences, for example by using neighbor joining analysis (for example using the program ClustalW2 "Phylogenetic trees" option), maximum-likelihood phylogenetic analysis (for example as implemented in the computer program DNAMLK (for DNA Maximum Likelihood program with Molecular Clock) (as available on the world wide web at vailalcmgm.stanford.edu/phylip/dnamlk.html) as part of the PHYLIP package v3.69 (as available on the world wide web at evolution.genetics.washington.edu/phylip.html)), or both, and rooted at the IGHV1-2 germline sequence, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence.
The nucleic acid sequences of interest that segregate in a distinct branch (such as a subtree) of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies. In some examples, the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies. In some examples, the methods include selecting a nucleic acid sequence of interest that segregates in a subtree (such as the smallest subtree) with the at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) is interposed between the IGHV1-2*02 germline origin and the subtree in a distinct branch in the phylogenetic tree.
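By way of non-limiting illustration only, selection of the sequences that segregate into the smallest subtree containing the reference antibodies can be sketched as follows. The sketch assumes that a rooted tree has already been built by a separate program and is represented as nested tuples whose leaves are sequence names; this representation, the function names, and the read names are hypothetical.

```python
from typing import List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name, or a tuple of child subtrees


def leaves(node: Tree) -> Set[str]:
    """Collect the names of all leaves below `node`."""
    if isinstance(node, str):
        return {node}
    found: Set[str] = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_subtree(node: Tree, references: Set[str]) -> Optional[Tree]:
    """Return the smallest clade containing every reference, or None."""
    if not references <= leaves(node):
        return None
    if not isinstance(node, str):
        for child in node:
            deeper = smallest_subtree(child, references)
            if deeper is not None:
                return deeper  # a child clade already holds all references
    return node


def candidates(tree: Tree, references: Set[str]) -> List[str]:
    """Names of non-reference sequences that share the smallest reference clade."""
    clade = smallest_subtree(tree, references)
    return [] if clade is None else sorted(leaves(clade) - references)


# Toy tree rooted so that the germline sequence is the outgroup.
toy_tree = (
    "IGHV1-2*02",
    (("read_a", "read_b"), (("VRC01", "read_c"), ("VRC03", "read_d"))),
)
hits = candidates(toy_tree, {"VRC01", "VRC03"})
```

With a single reference the smallest clade is the reference leaf itself, so the sketch is informative only when two or more reference sequences are supplied.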
In some examples, the analytic population of heavy chain variable domain nucleic acid sequences from a subject is divided into sub-populations, and cross donor phylogenetic analysis on each subpopulation is performed independently. The nucleic acid sequences identified in each of the subpopulations can then be pooled and/or combined with other heavy chain variable domain nucleic acid sequences from the subject and the cross donor phylogenetic analysis performed iteratively until convergence. All of the sequences in a branch of the phylogenetic tree that segregate with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are identified as nucleic acid sequences that encode VRC01-like heavy chain variable domains. In some examples, the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
In some disclosed embodiments, the VRC01-like antibody heavy chain nucleic acid sequences are identified in a stepwise fashion. In the first step, an iterative screening of the antibody heavy chain nucleic acid sequences is performed based on neighbor-joining (NJ) phylogenetic analysis. Starting with a sequence (or sequences) which encompass the heavy-chain variable domain obtained from sequencing, such as 454 pyrosequencing, germline origins are assigned to each sequence using, for example, the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi). For heavy-chain sequences that share the same V-gene origin (IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline) as a VRC01-like antibody, an iterative procedure based on the NJ method is used to search for a small set of potentially VRC01-like sequences. In some examples, the full-length sequences of the IGHV1-2 germline origin, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline origin, are divided into subsets of sequences amenable to computational analysis, such as subsets of 2,500 sequences, 5,000 sequences, or 10,000 sequences. The size of these subsets is determined by the computing capabilities used in the analysis. The nucleotide sequences of heavy-chain variable domains of one or more known VRC01-like antibodies (the reference antibody sequence or sequences), such as VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof) and the sequence of the IGHV1-2 germline, such as an IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, are added to each set, and a NJ phylogenetic tree is constructed, for example using the program ClustalW2 "Phylogenetic trees" option.
In the tree-building process, the nucleotide distance is calculated as percent divergence between all pairs of sequences in the multiple sequence alignment. After the NJ tree is rooted at the germline gene (such as the IGHV1-2 germline, for example the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline), the sequences clustered in a distinct branch containing the one or more known VRC01-like antibodies VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis. In other words, those sequences that do not segregate in the branch of the phylogenetic tree containing the one or more of the known VRC01-like antibodies VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, or VRC03 (or any combination thereof, such as all of the VRC01-like antibody sequences added to the test population) are discarded for the next round of phylogenetic tree construction. New nucleotide sequences of heavy-chain variable domains are then added together and the process is repeated in a recursive loop until convergence. Convergence occurs when all sequences in the phylogenetic tree reside within a subtree containing the known neutralizing mAbs rooted in germline and no other sequences reside between this subtree and the root, and further repetition of the analysis does not alter the constructed NJ tree.
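By way of non-limiting illustration only, the recursive loop described above can be sketched as follows. The tree-building and branch-extraction step is passed in as a callable, here named select_clade, which is a hypothetical stand-in for constructing a rooted NJ tree and extracting the reads that segregate with the references. The stopping rule used here (a full round that discards no sequence) is a simplifying assumption that approximates, but is not identical to, the convergence criterion stated above.

```python
from typing import Callable, List, Sequence, Set

# (subset of reads, reference names, germline name) -> reads kept with the references
CladeSelector = Callable[[List[str], Sequence[str], str], Set[str]]


def iterative_sieve(
    reads: Sequence[str],
    references: Sequence[str],
    germline: str,
    select_clade: CladeSelector,
    subset_size: int = 5000,
    max_rounds: int = 50,
) -> List[str]:
    """Repeat subset-wise selection until a round discards nothing."""
    current = list(reads)
    for _ in range(max_rounds):
        survivors: List[str] = []
        # Divide the population into subsets sized for the available computing
        # resources; references and germline are supplied to every subset.
        for start in range(0, len(current), subset_size):
            subset = current[start:start + subset_size]
            kept = select_clade(subset, references, germline)
            survivors.extend(r for r in subset if r in kept)
        if len(survivors) == len(current):
            return survivors  # nothing was discarded in this round
        current = survivors  # pool the survivors for the next round
    raise RuntimeError("no convergence within max_rounds")


def toy_selector(subset: List[str], references: Sequence[str], germline: str) -> Set[str]:
    """Toy rule for demonstration only: keep reads whose name starts with 'V'."""
    return {r for r in subset if r.startswith("V")}


result = iterative_sieve(
    ["V1", "x1", "V2", "x2", "V3"], ["VRC01"], "IGHV1-2*02", toy_selector, subset_size=2
)
```

Because surviving reads are pooled between rounds, reads that were analyzed in different subsets in one round can fall into the same subset in the next.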
In some examples, a second step is performed. The second step involves maximum-likelihood (ML) phylogenetic analysis of the obtained sequences. With the known VRC01-like VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and/or VRC03 (or any combination thereof) incorporated into the data set, a multiple sequence alignment can be constructed in a similar manner to the first step and provided as input to construct phylogenetic trees using the Maximum Likelihood method under the constraint that the estimated tree must be consistent with a molecular clock, for example as implemented in the computer program DNAMLK (for DNA Maximum Likelihood program with Molecular Clock, as available on the world wide web at vailalcmgm.stanford.edu/phylip/dnamlk.html) as part of the PHYLIP package v3.69 (as available on the world wide web at evolution.genetics.washington.edu/phylip.html). The molecular clock is the assumption that the tips of the tree are all equidistant, in branch length, from its root. In some examples the calculation is done with default parameters (empirical base frequencies, the transitions to transversions ratio of 2.0, and the overall base substitution model as A 0.24, C 0.28, G 0.27, T 0.21). The output unrooted tree can be visualized using Dendroscope, ordered to ladderize right, and rooted at the sequence of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline. The sequences that still segregate with the known broadly neutralizing VRC01-like antibodies in a distinct subtree are considered "VRC01-like" antibodies. In some examples, a nucleic acid sequence of interest that segregates in a subtree with the at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 is interposed between the IGHV1-2*02 germline origin and the subtree in a distinct branch in the phylogenetic tree is selected.
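By way of non-limiting illustration only, the molecular clock assumption (all tips equidistant, in branch length, from the root) can be checked on a tree with branch lengths as sketched below. The node representation (name, branch length to parent, list of children), the function names, and the numerical tolerance are assumptions made for this sketch and are not taken from DNAMLK.

```python
import math
from typing import Dict, List, Tuple

# (name, branch length to parent, children); a tip has an empty child list
Node = Tuple[str, float, List["Node"]]


def tip_depths(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Map each tip name to its summed branch length from the root."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths: Dict[str, float] = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths


def is_clocklike(root: Node, tol: float = 1e-9) -> bool:
    """True when every tip lies at the same distance from the root, within tol."""
    values = list(tip_depths(root).values())
    return all(math.isclose(v, values[0], abs_tol=tol) for v in values)


clock_tree: Node = ("root", 0.0, [
    ("A", 0.3, []),
    ("n1", 0.1, [("B", 0.2, []), ("C", 0.2, [])]),
])
non_clock_tree: Node = ("root", 0.0, [
    ("A", 0.3, []),
    ("n1", 0.1, [("B", 0.2, []), ("C", 0.25, [])]),
])
```

In the first toy tree every tip lies 0.3 from the root, so the tree is consistent with a molecular clock; in the second, tip C lies 0.35 from the root and the check fails.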
In some embodiments the identified VRC01-like antibody heavy chains are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
b. All origin cross donor phylogenetic analysis
In some examples, heavy chain variable region sequences having an IGHV1-2 germline origin as well as heavy chain variable region sequences from other germline origins (e.g., all germline origins) are used as the population of sequences for analysis. Thus, the initial test population of nucleic acid heavy chain sequences includes IGHV1-2 origin sequences, and nucleic acid heavy chain sequences of up to all origins.
In some such embodiments, the cross-donor phylogenetic analysis includes adding the nucleotide sequence of a heavy chain variable domain from at least one known VRC01-like antibody (reference antibody sequence or sequences), such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 to the test population. In some examples, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or all 18 of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, and 8ANC134 are added. The nucleotide sequences of the IGHV1-2 germline, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline, are also added to the population of heavy chain variable domain nucleic acid sequences. This forms a population of nucleic acid sequences for analysis, for example to create an analytic population.
The known VRC01-like sequences are included as a reference for the segregation of the VRC01-like sequences and the germline sequences are included to root the population of sequences to their ancestor. A phylogenetic tree is constructed from this population of heavy chain variable domain sequences, for example by using neighbor joining analysis (for example using the program ClustalW2 "Phylogenetic trees" option). In the tree-building process, the nucleotide distance is calculated as percent divergence between all pairs of sequences in the multiple sequence alignment. The phylogenetic tree is rooted at the IGHV1-2 germline sequence, such as the IGHV1-2*01, IGHV1-2*02, IGHV1-2*03, IGHV1-2*04, or IGHV1-2*05 germline sequence.
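By way of non-limiting illustration only, a percent divergence between two aligned nucleotide sequences can be computed as sketched below. The sketch assumes that both sequences are rows of the same multiple sequence alignment (equal length, with "-" marking gaps) and that columns with a gap in either row are ignored; the treatment of gaps in ClustalW2 may differ, and the function name is hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == gap or base_b == gap:
            continue  # assumption of this sketch: gapped columns are skipped
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Two mismatches over eight compared columns.
d = percent_divergence("ACGTACGT", "ACGTTCGA")
```

Applying the function to every pair of rows in the alignment yields the distance matrix from which a neighbor joining tree can be built by a separate program.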
The nucleic acid sequences of interest in the analytic population that segregate in a distinct branch in the phylogenetic tree with the at least one known VRC01-like antibody sequence, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies. In some examples, the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
In some examples, the population of heavy chain variable domain nucleic acid sequences from a subject is divided into sub-populations, and cross donor phylogenetic analysis on each subpopulation is performed independently. The nucleic acid sequences identified in each of the subpopulations can then be pooled and/or combined with other heavy chain variable domain nucleic acid sequences from the subject and the cross donor phylogenetic analysis performed iteratively until convergence. All of the sequences in a branch of the phylogenetic tree that segregate with at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are identified as nucleic acid sequences that encode VRC01-like heavy chain variable domains.
In some examples, the nucleic acid sequences of interest that segregate into the smallest subtree of the phylogenetic tree with the at least one known VRC01-like antibody, such as at least one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 (or any combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are determined to be heavy chain variable domain nucleic acid sequences of VRC01-like antibodies.
In some embodiments, an iterative procedure is followed. The first round of cross-donor analysis is performed as described above and the sequences clustered in a distinct branch (such as the smallest subtree) of the phylogenetic tree containing one or more of the known VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 antibodies (or a combination thereof, such as all of the known VRC01-like antibody sequences added to the test population) are extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis. In other words, those sequences that do not segregate in the distinct branch (such as the smallest subtree) of the phylogenetic tree containing the one or more of the known VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, VRC03, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 antibodies (or any combination thereof) are discarded for the next round of phylogenetic tree construction. The process is repeated in a recursive loop until convergence, that is, until all sequences in the phylogenetic tree reside within a subtree containing the known neutralizing mAbs rooted in germline, no other sequences reside between this subtree and the root, and further repetition of the analysis does not alter the constructed NJ tree.
In some examples the identified VRC01-like antibody heavy chains are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
2. Identification of VRC01-like light chains by bioinformatic interrogation
Methods of sieving a population of antibody light chain sequences using a bioinformatic analysis to identify light chain variable domains of interest are provided. In some embodiments, functional characterization is also used. Thus, in some embodiments, a VRC01-like light chain variable domain is identified by performing a bioinformatic analysis on a population of light chain variable domain nucleic acid sequences that were obtained by sequencing nucleic acids, specifically light chain variable domains, from a sample of B cells from a subject. The subject can be any subject of interest, such as, but not limited to, a subject with a viral infection, such as HIV. In some embodiments, the nucleic acids from the sample of B cells are amplified prior to sequencing. These nucleic acid sequences form the test population.
In some embodiments, prior to bioinformatic analysis of light chain sequences, the germline origins are assigned to each sequence using, for example, the program IgBLAST and the database of Ig germline gene sequences (as available August 9, 2011, from the NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi, which is specifically incorporated herein by reference in its entirety).
In several embodiments, a VRC01-like light chain nucleic acid sequence is identified as encoding a light chain variable region including a CDR3 including a hydrophobic residue followed by a glutamic acid residue or glutamine residue; and, if the VRC01-like light chain nucleic acid sequence is derived from the IGKV1-33 germline, the CDR1 of the VRC01-like light chain variable domain includes at least two glycine residues.
In additional embodiments, a VRC01-like light chain nucleic acid sequence is identified as encoding a light chain variable region including a CDR3 including a hydrophobic residue followed by a glutamic acid residue or glutamine residue; and, if the germline origin of the VRC01-like light chain variable domain is not an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises a deletion of two or more amino acids compared to the corresponding germline origin.
In several embodiments, the hydrophobic residue in the CDR3 of the VRC01-like light chain sequence is an alanine, valine, leucine, isoleucine, methionine, phenylalanine or tyrosine residue. In some examples, the hydrophobic residue is a tyrosine, leucine or phenylalanine residue. In some examples, the hydrophobic residue is followed by a glutamic acid residue. In other examples, the hydrophobic residue is followed by a glutamine residue.
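By way of non-limiting illustration only, the light chain sequence criteria described above can be sketched as a single test on translated CDR sequences. The sketch makes several assumptions that are not stated in the disclosure: the two germline-dependent CDR1 criteria, which are described in separate embodiments, are combined into one rule; "followed by" is read as immediately adjacent; a CDR1 deletion is approximated by a length difference relative to the germline CDR1; and the CDR sequences are assumed to have been delineated beforehand. The function names and the toy sequences are hypothetical.

```python
HYDROPHOBIC = set("AVLIMFY")  # Ala, Val, Leu, Ile, Met, Phe, Tyr
ACIDIC_OR_AMIDE = set("EQ")   # Glu, Gln


def has_cdr3_motif(cdr3: str) -> bool:
    """True if a hydrophobic residue is immediately followed by Glu or Gln."""
    return any(
        first in HYDROPHOBIC and second in ACIDIC_OR_AMIDE
        for first, second in zip(cdr3, cdr3[1:])
    )


def is_candidate_light_chain(
    germline: str, cdr1: str, cdr3: str, germline_cdr1: str
) -> bool:
    """Apply the CDR3 motif test and the germline-dependent CDR1 test."""
    if not has_cdr3_motif(cdr3):
        return False
    if germline.split("*", 1)[0] == "IGKV1-33":
        return cdr1.count("G") >= 2  # at least two glycine residues in CDR1
    # Other germlines: CDR1 shorter than the germline CDR1 by two or more residues.
    return len(germline_cdr1) - len(cdr1) >= 2


# Toy inputs for demonstration; they are not asserted to be real antibody CDRs.
shortened = is_candidate_light_chain("IGKV3-20*01", "RASQYGS", "QQYEF", "RASQSVSSSYLA")
two_gly = is_candidate_light_chain("IGKV1-33*01", "QASQGIGNYLN", "QQYEF", "QASQDISNYLN")
no_gly = is_candidate_light_chain("IGKV1-33*01", "QASQDISNYLN", "QQYEF", "QASQDISNYLN")
```

Sequences passing such a computational screen would still be subject to the functional complementation described herein.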
In several embodiments, the nucleic acid sequence encoding the VRC01-like light chain is selected. In some examples, the nucleic acid sequence encoding the VRC01-like light chain is produced or synthesized according to methods described herein and/or methods familiar to the person of ordinary skill.
In some embodiments, the nucleic acid sequence encoding the identified VRC01-like light chain is selected and synthesized for further analysis, such as functional complementation.
In some examples the identified VRC01-like antibodies are subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
3. Production of Test Populations and Data Analysis
In some examples, the sample used to produce the nucleic acid sequences (the test population) used in the analyses described above is a sample of peripheral blood mononuclear cells. However, the sample can be a sample of isolated B cells, such as B cells isolated by fluorescence activated cell sorting. In some embodiments, B cells are isolated that express IgG. In specific, non-limiting examples, B cells are isolated that express IgG, such as IgG1, such as by using fluorescence activated cell sorting. However, the presently described methods do not require that B cells be isolated that are of a specific isotype. In some embodiments, B cells are purified that specifically bind an antigen of interest, such as gp120, or that bind a specific epitope of IgG of interest, such as resurfaced stabilized core 3 probe (RSC3).
However, in other embodiments, B cells are not selected by antigen binding prior to the isolation of nucleic acids and sequencing.
Sequence analysis is then performed on nucleic acids from the B cells, to identify immunoglobulin heavy chain sequences, immunoglobulin light chain sequences, or both. In one specific non-limiting example, the variable domain sequences are obtained from the sample. In some embodiments, ultra deep pyrosequencing, or "454 pyrosequencing", is utilized.
Some exemplary embodiments of systems and methods associated with sample preparation and processing, generation of sequence data, and analysis of sequence data, which are amenable for use with embodiments of the presently described methods, are generally described below; see also U.S. Patent Publication No. 2010/0003687. In this patent publication, exemplary embodiments of systems and methods for preparation of template nucleic acid molecules, amplification of template molecules, generating target specific amplicons and/or genomic libraries, sequencing methods and instrumentation, and computer systems are described.
In some embodiments, the nucleic acid molecules derived from the sample comprising B cells are prepared and processed into template molecules amenable for high throughput sequencing. The processing methods may vary from application to application resulting in template molecules comprising various characteristics. For example, in some embodiments of high throughput sequencing, template molecules are selected with a sequence or read length that is at least the length for which a particular sequencing method can accurately produce sequence data. For example, the length can be about 200-300 base pairs, about 350-500 base pairs, 500 base pairs, 500-1,000 base pairs, or other length amenable for a particular sequencing application. In some embodiments, nucleic acids from a sample are fragmented using a number of methods known to those of ordinary skill in the art. Methods of fragmentation such as digestion using restriction endonucleases may be employed for fragmentation purposes. Some processing methods may employ size selection methods known in the art to selectively isolate nucleic acid fragments of the desired length. Functional elements can be employed with each template nucleic acid molecule. The elements may be employed for a variety of functions including, but not limited to, primer sequences for amplification (such as to amplify heavy chain variable domains) and/or sequencing methods, quality control elements, unique identifiers (also referred to as a multiplex identifier or "MID") that encode various associations such as with a sample of origin or patient, or other functional element.
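By way of non-limiting illustration only, the use of multiplex identifiers to associate reads with a sample of origin can be sketched as follows. The sketch assumes that each read begins with its MID at the 5' end, that MIDs are matched exactly, and that no MID is a prefix of another; the MID sequences, sample names, and function name are hypothetical and are not taken from any sequencing kit.

```python
from typing import Dict, List, Sequence, Tuple


def demultiplex(
    reads: Sequence[str], mids: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group reads by the sample encoded in their leading MID.

    `mids` maps an MID sequence to a sample name. The MID is trimmed from
    each assigned read; reads with no recognized MID are returned separately.
    """
    by_sample: Dict[str, List[str]] = {sample: [] for sample in mids.values()}
    unassigned: List[str] = []
    for read in reads:
        for mid, sample in mids.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:
            unassigned.append(read)  # no MID matched this read
    return by_sample, unassigned


toy_mids = {"ACGT": "sample_1", "TGCA": "sample_2"}
toy_reads = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
grouped, leftover = demultiplex(toy_reads, toy_mids)
```

Reads grouped in this way can then be analyzed per sample, so that sequences from different subjects are not mixed in a single test population unless intended.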
In some examples, the primers are specific for a variable heavy or light chain domain. Exemplary primers are provided in the Examples Section below. Some or all of the described functional elements may be combined into adaptor elements that are coupled to nucleotide sequences in certain processing steps. For example, some embodiments may associate priming sequence elements or regions comprising complementary sequence composition to primer sequences employed for amplification and/or sequencing. Further, the same elements may be employed for what may be referred to as "strand selection" and immobilization of nucleic acid molecules to a solid phase substrate. In some embodiments two sets of priming sequence regions (referred to as priming sequence A, and priming sequence B) can be employed for strand selection where only single strands having one copy of priming sequence A and one copy of priming sequence B is selected and included as the prepared sample. In alternative embodiments, design characteristics of the adaptor elements eliminate the need for strand selection. The same priming sequence regions can be employed in methods for amplification and immobilization where, for instance priming sequence B may be immobilized upon a solid substrate and amplified products are extended therefrom. Additional examples of sample processing for fragmentation, strand selection, and addition of functional elements and adaptors are described in U.S. Patent Application Publication No.
2004/0185484; U.S. Patent Application Publication No. 2009/0105959; and U.S. Patent Application Publication No. 2011/0003701, each of which is hereby incorporated by reference herein in its entirety for all purposes.
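As a minimal sketch of how a multiplex identifier (MID) can associate reads with a sample of origin, the following Python example assigns reads to samples by their leading MID sequence and keeps only inserts within a chosen length window. The MID sequences, the sample names, the function name, and the 200-500 base window are hypothetical values chosen for illustration; they are not taken from this disclosure or from the publications cited above.

```python
from collections import defaultdict

# Hypothetical MID-to-sample table (illustrative sequences and names only).
MID_TO_SAMPLE = {
    "ACGAGTGCGT": "donor_A",
    "ACGCTCGACA": "donor_B",
}
MID_LENGTH = 10


def demultiplex(reads, min_len=200, max_len=500):
    """Group reads by MID prefix, trim the MID, and apply a length filter."""
    by_sample = defaultdict(list)
    for read in reads:
        mid, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        sample = MID_TO_SAMPLE.get(mid)
        if sample is None:
            continue  # unrecognized MID: discard the read
        if min_len <= len(insert) <= max_len:
            by_sample[sample].append(insert)
    return dict(by_sample)
```

In this sketch a read with an unrecognized MID, or with an insert outside the length window, is simply dropped; a production pipeline would likely tolerate MID mismatches and record why each read was rejected.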
Various examples of systems and methods for performing amplification of template nucleic acid molecules to generate populations of substantially identical copies can be utilized. In some embodiments of Sanger based sequencing, many copies of each nucleic acid element are produced by amplification to generate a stronger signal when one or more nucleotide species is incorporated into each nascent molecule associated with a copy of the template molecule. There are many techniques known in the art for generating copies of nucleic acid molecules such as, for instance, amplification using what are referred to as bacterial vectors, "Rolling Circle" amplification (described in U.S. Pat. Nos. 6,274,320 and 7,211,390, incorporated by reference above) and Polymerase Chain Reaction (PCR) methods, including emulsion PCR methods (also referred to as emPCR™ methods). Generally emulsion PCR methods include creating a stable emulsion of two immiscible substances creating aqueous droplets within which reactions may occur. In particular, the aqueous droplets of an emulsion amenable for use in PCR methods may include a first fluid such as water based fluid suspended or dispersed as droplets (also referred to as a discontinuous phase) within another fluid such as a
hydrophobic fluid (also referred to as a continuous phase) that typically includes some type of oil. Examples of oil that may be employed include, but are not limited to, mineral oils, silicone based oils, or fluorinated oils. Emulsions can employ surfactants that act to stabilize the emulsion that may be particularly useful for specific processing methods such as PCR. Some embodiments of surfactant include one or more of a silicone or fluorinated surfactant. In some examples, one or more non-ionic surfactants can be employed that include but are not limited to sorbitan monooleate (also referred to as SPAN™80), polyoxyethylenesorbitan monooleate (also referred to as TWEEN™80), or in some preferred embodiments dimethicone copolyol (also referred to as ABIL™ EM90), polysiloxane, polyalkyl polyether copolymer, polyglycerol esters, poloxamers, and PVP/hexadecane copolymers (also referred to as Unimer U-151), or a high molecular weight silicone polyether in cyclopentasiloxane (also referred to as DC 5225C available from Dow Corning).
The droplets of an emulsion may also be referred to as compartments, microcapsules, microreactors, microenvironments, or other name commonly used in the related art. The aqueous droplets may range in size depending on the
composition of the emulsion components or composition, contents contained therein, and formation technique employed. The described emulsions create the microenvironments within which chemical reactions, such as PCR, may be performed. For example, template nucleic acids and all reagents necessary to perform a desired PCR reaction may be encapsulated and chemically isolated in the droplets of an emulsion. Additional surfactants or other stabilizing agent may be employed in some embodiments to promote additional stability of the droplets as described above. Thermocycling operations typical of PCR methods may be executed using the droplets to amplify an encapsulated nucleic acid template resulting in the generation of a population comprising many substantially identical copies of the template nucleic acid. In some embodiments, the population within the droplet may be referred to as a "clonally isolated", "compartmentalized", "sequestered", "encapsulated", or "localized" population. Also in the present example, some or all of the described droplets may further encapsulate a solid substrate such as a bead for attachment of template and amplified copies of the template, amplified copies complementary to the template, or combination thereof. Further, the solid substrate may be enabled for attachment of other type of nucleic acids, reagents, labels, or other molecules of interest.
In some embodiments, target specific amplicons for sequencing are employed that include using sets of specific nucleic acid primers to amplify a selected target region or regions from a sample comprising the target nucleic acid, such as immunoglobulins. The nucleic acid is first subjected to amplification by a pair of PCR primers designed to amplify a region surrounding the region of interest or segment common to the nucleic acid population. Each of the products of the PCR reaction (first amplicons) is subsequently further amplified individually in separate reaction vessels such as an emulsion based vessel described above. The resulting amplicons (referred to herein as second amplicons), each derived from one member of the first population of amplicons, are sequenced and the collection of sequences, from different emulsion PCR amplicons (i.e. second amplicons), are used to determine an allelic frequency.
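The allelic frequency determination described above can be sketched as a simple count over the sequenced second amplicons. The example below is a minimal illustration that assumes the reads are already aligned to a common start so that a given index refers to the same position in every read; the function name and the 0-based position convention are assumptions, not part of the disclosure.

```python
from collections import Counter


def allele_frequencies(amplicon_reads, position):
    """Return the fraction of reads carrying each base at a 0-based position."""
    # Reads too short to cover the position are ignored.
    bases = [read[position] for read in amplicon_reads if len(read) > position]
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}
```

For example, three reads of `ACGT` and one read of `ATGT` give a frequency of 0.75 for C and 0.25 for T at position 1.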
The methods can employ high throughput sequencing instrumentation. For instance, in embodiments that employ what is referred to as a PICOTITREPLATE® array (also sometimes referred to as a PTP™ plate or array) of wells provided by 454 Life Sciences Corporation, the described methods can be employed to generate sequence composition for over 100,000, over 300,000, over 500,000, or over 1,000,000 nucleic acid regions per run or experiment. These methods can provide a sensitivity of detection of low abundance alleles which may represent 1% or less of the allelic variants. Another advantage of the methods includes generating data comprising the sequence of the analyzed region. Generally, it is not necessary to have prior knowledge of the sequence of the locus being analyzed.
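To illustrate why read depth matters for detecting an allele present at 1% or less, the following sketch computes the probability of observing at least a given number of variant reads under a simple binomial sampling model. The model is an assumption made for illustration: it treats reads as independent draws and ignores sequencing error and amplification bias, neither of which is quantified in this disclosure.

```python
from math import comb


def detection_probability(depth, frequency, min_reads=1):
    """P(at least min_reads variant reads) under a binomial sampling model."""
    # Sum the probabilities of seeing fewer than min_reads variant reads.
    p_fewer = sum(
        comb(depth, i) * frequency**i * (1 - frequency) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer
```

Under these assumptions, sampling 1,000 reads from a region in which a variant is present at 1% gives a probability of 1 - 0.99^1000 of seeing the variant at least once; requiring more supporting reads (a larger `min_reads`) lowers that probability at the same depth.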
Embodiments of sequencing include Sanger type techniques, techniques generally referred to as Sequencing by Hybridization (SBH), Sequencing by
Ligation (SBL), or Sequencing by Incorporation (SBI) techniques. Further, the sequencing techniques include colony sequencing techniques; nanopore, waveguide and other single molecule detection techniques; or reversible terminator techniques. One technique of use is Sequencing by Synthesis (SBS). For example, some SBS embodiments sequence populations of substantially identical copies of a nucleic acid template and typically employ one or more oligonucleotide primers designed to anneal to a predetermined, complementary position of the sample template molecule or one or more adaptors attached to the template molecule. The primer/template complex is presented with a nucleotide species in the presence of a nucleic acid polymerase enzyme. If the nucleotide species is complementary to the nucleic acid species corresponding to a sequence position on the sample template molecule that is directly adjacent to the 3' end of the oligonucleotide primer, then the polymerase will extend the primer with the nucleotide species. Alternatively, in some embodiments, the primer/template complex is presented with a plurality of nucleotide species of interest (typically A, G, C, and T) at once, and the nucleotide species that is complementary at the corresponding sequence position on the sample template molecule directly adjacent to the 3' end of the oligonucleotide primer is incorporated. In either of the described embodiments, the nucleotide can be chemically blocked (such as at the 3'-O position) to prevent further extension, and needs to be deblocked prior to the next round of synthesis. The process of adding a nucleotide species to the end of a nascent molecule is substantially the same as that described above for addition to the end of a primer.
Incorporation of the nucleotide species can be detected by a variety of methods known in the art, e.g. by detecting the release of pyrophosphate (PPi) (examples described in U.S. Pat. Nos. 6,210,891; 6,258,568; and 6,828,100, each of which is hereby incorporated by reference herein in its entirety for all purposes), or via detectable labels bound to the nucleotides. Some examples of detectable labels include but are not limited to mass tags and fluorescent or chemiluminescent labels. In typical embodiments, unincorporated nucleotides are removed, for example by washing. Further, in some embodiments the unincorporated nucleotides may be subjected to enzymatic degradation such as, for instance, degradation using the apyrase or pyrophosphatase enzymes. In the embodiments where detectable labels are used, they will typically have to be inactivated (e.g. by chemical cleavage or photobleaching) prior to the following cycle of synthesis. The next sequence position in the template/polymerase complex can then be queried with another nucleotide species, or a plurality of nucleotide species of interest, as described above. Repeated cycles of nucleotide addition, extension, signal acquisition, and washing result in a determination of the nucleotide sequence of the template strand. Continuing with the present example, a large number or population of substantially identical template molecules are typically analyzed simultaneously in any one sequencing reaction, in order to achieve a signal which is strong enough for reliable detection.
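The repeated cycle of nucleotide addition, extension, signal acquisition, and washing can be sketched as an idealized simulation for the case in which a single nucleotide species is presented per cycle. The sketch assumes unblocked nucleotides (so a run of identical bases is extended within one cycle), a noise-free signal proportional to the number of incorporations, and a hypothetical cyclic presentation order `TACG`; the function names are also illustrative.

```python
FLOW_ORDER = "TACG"  # hypothetical cyclic order in which nucleotides are presented


def simulate_flows(nascent_sequence, flow_order=FLOW_ORDER, max_flows=400):
    """Return (nucleotide, incorporations) per cycle for an idealized run."""
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(nascent_sequence) and flow_index < max_flows:
        base = flow_order[flow_index % len(flow_order)]
        count = 0
        # A run of identical bases is extended within a single cycle.
        while pos < len(nascent_sequence) and nascent_sequence[pos] == base:
            count += 1
            pos += 1
        flows.append((base, count))  # count 0 means no incorporation signal
        flow_index += 1
    return flows


def call_bases(flows):
    """Rebuild the nascent-strand sequence from per-cycle incorporation counts."""
    return "".join(base * count for base, count in flows)
```

Here `nascent_sequence` is the strand being synthesized, that is, the complement of the template; `call_bases(simulate_flows("TTAGC"))` reconstructs `"TTAGC"`.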
Some examples of SBS apparatus can include one or more of a detection device such as a charge coupled device (i.e. CCD camera) or a confocal type architecture, a microfluidics chamber or flow cell, a reaction substrate, and/or a pump and flow valves. Taking the example of pyrophosphate based sequencing, embodiments of an apparatus may employ a chemiluminescent detection strategy that produces an inherently low level of background noise. In some embodiments, the reaction substrate for sequencing may include what is referred to as a PTP™ array, as described above, formed from a fiber optics faceplate that is acid-etched to yield hundreds of thousands or more of very small wells, each enabled to hold a population of substantially identical template molecules (in one example, this is about 3.3 million wells on a 70 × 75 mm PTP™ array at a 35 μm well to well pitch). In some embodiments, each population of substantially identical template molecules is disposed upon a solid substrate such as a bead, each of which may be disposed in one of said wells. For example, an apparatus can include a reagent delivery element for providing fluid reagents to the PTP plate holders, as well as a CCD type detection device enabled to collect photons of light emitted from each well on the PTP plate. Further examples of apparatus and methods for performing SBS type sequencing and pyrophosphate sequencing are described in U.S. Pat. No. 7,323,305 which is incorporated by reference above.
Systems can be employed that automate one or more sample preparation processes, such as the emPCR™ process described above. For example, automated systems can be employed to provide an efficient solution for generating an emulsion for emPCR processing, performing PCR thermocycling operations, and enriching for successfully prepared populations of nucleic acid molecules for sequencing. Examples of automated sample preparation systems are described in U.S. Published Patent Application No. 2005/0227264. The systems can include implementation of some design, analysis, or other operation using a computer readable medium stored for execution on a computer system. For example, these computer systems can analyze data generated using SBS systems and methods where the processing and analysis embodiments are implementable on computer systems.
These computer systems include any type of computer platform such as a workstation, a personal computer, a server, or any other present or future computer. These can be configured to perform the specialized operations of the present methods. Computers typically include known components such as a processor, an operating system, system memory, memory storage devices, input-output controllers, input-output devices, and display devices. They can also include cache memory, a data backup unit, and many other devices. Display devices include devices that provide visual information, which typically may be logically and/or physically organized as an array of pixels. An interface controller may also be used that has software programs for providing input and output interfaces. For example, interfaces may include what are generally referred to as "Graphical User Interfaces" (often referred to as GUIs) that provide one or more graphical representations to a user. Interfaces are typically enabled to accept user inputs using means of selection or input. The processor can include a commercially available processor such as a CORE™ or PENTIUM® processor made by Intel Corporation, a SPARC® processor made by Sun Microsystems, an ATHLON® or OPTERON® processor made by AMD corporation, or it may be one of other processors that are or will become available. Some embodiments of a processor may include what is referred to as a multi-core processor and/or be enabled to employ parallel processing technology in a single or multi-core configuration. For example, a multi-core architecture typically comprises two or more processor "execution cores." In the present example, each execution core may perform as an independent processor that enables parallel execution of multiple threads.
In addition, those of ordinary skill in the related art will appreciate that a processor may be configured in what is generally referred to as 32 or 64 bit architectures, or other architectural configurations now known or that may be developed in the future. A processor typically executes an operating system, which may be, for example, a WINDOWS®-type operating system from Microsoft Corporation; the Mac OS X™ operating system from Apple Computer Corp.; or a UNIX® or Linux-type operating system. An operating system, typically in cooperation with a processor, coordinates and executes functions of the other components of a computer. System memory may include any of a variety of known memory storage devices. Examples include any commonly available random access memory (RAM), magnetic medium such as a resident hard disk or tape, an optical medium such as a read and write compact disc, or other memory storage device, including a compact disk drive, USB or flash drive, or a diskette drive.
An instrument control and/or a data processing application, if implemented in software, may be loaded into and executed from system memory and/or a memory storage device. All or portions of the instrument control and/or data processing applications may also reside in a read-only memory or similar device of the memory storage device, such devices not requiring that the instrument control and/or data processing applications first be loaded through input-output controllers. It will be understood by those skilled in the relevant art that the instrument control and/or data processing applications, or portions of it, may be loaded by a processor in a known manner into system memory, or cache memory, or both, as advantageous for execution. A computer can include one or more library files, experiment data files, and an internet client stored in system memory. For example, experiment data could include data related to one or more experiments or assays such as detected signal values, or other values associated with one or more SBS experiments or processes.
4. Other Applications
The methods disclosed herein have broad applications for identifying specific antibodies, heavy chains, light chains, and classes or species of antibodies with defined specificity, for example as exemplified by the identification of VRC01-like antibodies disclosed herein. This combination of structural and genomic analysis of Ig may provide a generic way of identifying specific antibodies, as well as classes or species of antibodies with defined specificities. Such antibodies, like VRC01 and related antibodies, can potentially be used for prevention strategies, such as microbicides or passive protection against HIV infection, vaccine design, diagnostics, and therapy of infected individuals.
Thus, viral antigenic epitopes can be used with the methods disclosed herein to identify classes of antibodies specific for the antigen of interest. Antigens of use in the methods disclosed herein include, but are not limited to, antigenic epitopes from dengue virus, human immunodeficiency virus, influenza virus,
metapneumovirus, norovirus, papillomavirus, parvovirus, SARS virus, smallpox virus, picornaviruses, respiratory syncytial virus, parainfluenza virus, measles, hepatitis, varicella zoster, rabies and West Nile virus, among many others. In some embodiments, the antigenic epitope is from a virus that causes a respiratory disorder (for example, adeno, echo, rhino, coxsackie, influenza, parainfluenza, or respiratory syncytial virus), a digestive disorder (for example, rota, parvo, dane particle, or hepatitis A virus), an epidermal-epithelial disorder (for example, verruca, papilloma, molluscum, rubeola, rubella, small pox, cowpox), a herpes virus disease (for example, varicella-zoster, simplex I, or simplex II virus), an arbovirus disease (for example, dengue, yellow, or hemorrhagic fevers), a viral disease of the central nervous system (for example, polio or rabies), a viral heart disease, or acquired immune deficiency (AIDS). The antigenic epitope can also be from a bacterium. In some examples, the bacterial antigenic epitope is a pyogenic cocci antigen from an organism that causes, for example, staphylococcal, streptococcal, pneumococcal, meningococcal, and gonococcal infections; a gram-negative rod antigen from an organism that causes, for example, E. coli, Klebsiella, enterobacter, pseudomonas, or legionella infections; or an antigenic epitope from an organism that causes, for example, hemophilus influenza, bordetella pertussis, or diphtheria infections. Also encompassed in this disclosure are bacterial antigens from enteropathic bacteria (for example, S. typhi), Clostridia (for example, C. tetani or C. botulinum), and mycobacteria (for example, M. tuberculosis or M. leprae).
Exemplary antigens are the CFP10 polypeptide or a domain of other polypeptides of Mycobacterium tuberculosis, a domain of the pilus polypeptide of Vibrio cholerae, the CjaA polypeptide of Campylobacter coli, the SfbI polypeptide of Streptococcus pyogenes, the UreB polypeptide of Helicobacter pylori, or polypeptides of other pathogenic organisms, such as the circumsporozoite polypeptide of Plasmodium falciparum. Non-limiting examples of bacterial (including mycobacterial) epitopes can be found, for example, in Mei et al., Mol. Microbiol. 26:399-407, 1997; and U.S. Patent Nos. 6,790,950 (gram negative bacteria); 6,790,448 (gram positive bacteria); 6,776,993 and 6,384,018 (Mycobacterium tuberculosis).
In additional examples, the antigenic epitope is from a Chlamydia that causes ornithosis (C. psittaci), chlamydial urethritis and cervicitis (C. trachomatis), inclusion conjunctivitis (C. trachomatis), trachoma (C. trachomatis), or lymphogranuloma venereum (C. trachomatis). In additional examples, the antigenic epitope is from a rickettsia that causes typhus fever (R. prowazekii), Rocky Mountain spotted fever (R. rickettsii), scrub fever (R. tsutsugamushi), or Q fever (Coxiella burnetii).
In particular embodiments, the antigenic epitope is from a fungus, such as Candida (for example, C. albicans) or Aspergillus (for example, A. fumigatus). In other embodiments, the protozoan antigen is from, for example, Giardia lamblia, Trichomoniasis, Pneumocystosis, Plasmodium, Leishmania, or Toxoplasma. In further embodiments, the helminth antigen is from, for example, Trichuris, Necator americanus (hookworm disease), Ancylostoma duodenale (hookworm disease), Trichinella spiralis, or S. mansoni.
In additional embodiments, the method is applied to identify antibodies that bind antigenic epitopes of tumor antigens. Tumor antigens include, but are not limited to carcinoembryonic antigen ("CEA:" e.g., GENBANK® Accession No. AAA62835), ras proteins (see, e.g., Parada et al. Nature 297:474-478, 1982), p53 protein (e.g., GENBANK® Accession No. P07193), prostate-specific antigen
("PSA:" e.g., GENBANK® Accession Nos. NP001639, NP665863), Muc1 (e.g., GENBANK® Accession No. P15941), tyrosinase (see, e.g., Kwon et al., Proc Natl Acad Sci USA 84:7473-7477, 1987, erratum Proc Natl Acad Sci USA 85:6352, 1988), and melanoma-associated antigens (MAGEs: for examples, see U.S. Patent Nos. 5,462,871; 5,554,724; 5,554,506; 5,541,104 and 5,558,995). The tumor antigen can be from a tumor of any organ or tissue, including but not limited to solid organ tumors. For example, the tumor can be melanoma; colon, breast, lung, cervical, ovarian, endometrial, prostate, skin, brain, liver, kidney, thyroid, pancreatic, esophageal, or gastric cancer; leukemias, lymphomas, multiple myeloma, myelodysplastic syndrome, premalignant human papilloma virus (HPV)-related lesions, intestinal polyps and other chronic states associated with increased tumor risk.
E. Compositions and Therapeutic Methods
Methods are disclosed herein for the prevention or treatment of an HIV infection, such as an HIV-1 infection. Prevention can include inhibition of infection with HIV-1. The methods include contacting a cell with an effective amount of the human monoclonal antibodies disclosed herein that specifically bind gp120, or a functional fragment thereof. The method can also include administering to a subject a therapeutically effective amount of the human monoclonal antibodies.
Methods to assay for neutralization activity include, but are not limited to, a single-cycle infection assay as described in Martin et al. (2003) Nature
Biotechnology 21:71-76. In this assay, the level of viral activity is measured via a selectable marker whose activity is reflective of the amount of viable virus in the sample, and the IC50 is determined. In other assays, acute infection can be monitored in the PM1 cell line or in primary cells (normal PBMC). In these assays, the level of viral activity can be monitored by determining the p24 concentrations using ELISA. See, for example, Martin et al. (2003) Nature Biotechnology 21:71-76.
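As a minimal sketch of how an IC50 can be read off a neutralization titration, the following example interpolates on a log-concentration scale between the two antibody concentrations that bracket 50% neutralization. The interpolation approach, the function name, and the input layout are assumptions made for illustration; the cited assay may instead fit a full dose-response curve.

```python
from math import log10


def ic50_by_interpolation(concentrations, percent_neutralization):
    """Estimate IC50 by log-linear interpolation between bracketing points.

    concentrations must be positive and ascending; returns None when no pair
    of adjacent points brackets 50% neutralization.
    """
    points = list(zip(concentrations, percent_neutralization))
    for (c_lo, n_lo), (c_hi, n_hi) in zip(points, points[1:]):
        if n_lo < 50.0 <= n_hi:
            fraction = (50.0 - n_lo) / (n_hi - n_lo)
            log_ic50 = log10(c_lo) + fraction * (log10(c_hi) - log10(c_lo))
            return 10 ** log_ic50
    return None
```

Percent neutralization at each concentration would typically be computed from the reporter or p24 signal relative to a virus-only control before calling this function.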
HIV infection does not need to be completely eliminated for the composition to be effective. For example, a composition can decrease HIV infection by a desired amount, for example by at least 10%, at least 20%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or even 100% (elimination of detectable HIV infected cells), as compared to HIV infection in the absence of the composition. In one example, the cell is also contacted with an effective amount of an additional agent, such as an anti-viral agent. The cell can be in vivo or in vitro. The methods can include administration of one or more additional agents known in the art. In additional examples, HIV replication can be reduced or inhibited by similar methods. HIV replication does not need to be completely eliminated for the composition to be effective. For example, a composition can decrease HIV replication by a desired amount, for example by at least 10%, at least 20%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or even 100% (elimination of detectable HIV), as compared to HIV replication in the absence of the composition. In one example, the cell is also contacted with an effective amount of an additional agent, such as an anti-viral agent. The cell can be in vivo or in vitro.
Compositions are provided that include one or more of the antibodies that specifically bind gp120, or functional fragments thereof, that are disclosed herein in a carrier. The compositions can be prepared in unit dosage forms for administration to a subject. The amount and timing of administration are at the discretion of the treating physician to achieve the desired purposes. The antibody can be formulated for systemic or local administration. In one example, the antibody that specifically binds gp120 is formulated for parenteral administration, such as intravenous administration.
The compositions for administration can include a solution of the antibody that specifically binds gp120 dissolved in a pharmaceutically acceptable carrier, such as an aqueous carrier. A variety of aqueous carriers can be used, for example, buffered saline and the like. These solutions are sterile and generally free of undesirable matter. These compositions may be sterilized by conventional, well known sterilization techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like. The concentration of antibody in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight and the like in accordance with the particular mode of administration selected and the subject's needs.
A typical pharmaceutical composition for intravenous administration includes about 0.1 to 10 mg of antibody per subject per day. Dosages from 0.1 up to about 100 mg per subject per day may be used, particularly if the agent is administered to a secluded site and not into the circulatory or lymph system, such as into a body cavity or into a lumen of an organ. Actual methods for preparing administrable compositions will be known or apparent to those skilled in the art and are described in more detail in such publications as Remington's Pharmaceutical Science, 19th ed., Mack Publishing Company, Easton, PA (1995).
Antibodies may be provided in lyophilized form and rehydrated with sterile water before administration, although they are also provided in sterile solutions of known concentration. The antibody solution is then added to an infusion bag containing 0.9% sodium chloride, USP, and typically administered at a dosage of from 0.5 to 15 mg/kg of body weight. Considerable experience is available in the art in the administration of antibody drugs, which have been marketed in the U.S. since the approval of RITUXAN® in 1997. Antibodies can be administered by slow infusion, rather than in an intravenous push or bolus. In one example, a higher loading dose is administered, with subsequent, maintenance doses being
administered at a lower level. For example, an initial loading dose of 4 mg/kg may be infused over a period of some 90 minutes, followed by weekly maintenance doses for 4-8 weeks of 2 mg/kg infused over a 30 minute period if the previous dose was well tolerated.
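The loading and maintenance arithmetic in the example above can be sketched as follows. This is purely illustrative arithmetic on the figures stated in the text (4 mg/kg over 90 minutes, then weekly 2 mg/kg over 30 minutes); the function name, the tuple layout, and the default of four maintenance weeks are assumptions, and the sketch does not model the condition that the previous dose was well tolerated.

```python
def infusion_schedule(weight_kg, loading_mg_per_kg=4.0, maintenance_mg_per_kg=2.0,
                      maintenance_weeks=4):
    """Return a list of (week, dose_mg, infusion_minutes) tuples."""
    # Week 0: loading dose infused over about 90 minutes.
    schedule = [(0, loading_mg_per_kg * weight_kg, 90)]
    # Subsequent weeks: lower maintenance dose infused over 30 minutes.
    for week in range(1, maintenance_weeks + 1):
        schedule.append((week, maintenance_mg_per_kg * weight_kg, 30))
    return schedule
```

For a hypothetical 70 kg subject, the first entry is a 280 mg loading dose and each maintenance entry is 140 mg.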
A therapeutically effective amount of a human gp120-specific antibody will depend upon the severity of the disease and/or infection and the general state of the patient's health. A therapeutically effective amount of the antibody is that which provides either subjective relief of a symptom(s) or an objectively identifiable improvement as noted by the clinician or other qualified observer. These
compositions can be administered in conjunction with another therapeutic agent, either simultaneously or sequentially.
In one embodiment, administration of the antibody results in a reduction in the establishment of HIV infection and/or a reduction in subsequent HIV disease progression in a subject. A reduction in the establishment of HIV infection and/or a reduction in subsequent HIV disease progression encompasses any statistically significant reduction in HIV activity. In some embodiments, methods are disclosed for treating a subject with an HIV-1 infection. These methods include administering to the subject a therapeutically effective amount of an antibody, or a nucleic acid encoding the antibody, thereby preventing or treating the HIV-1 infection. Studies have shown that the rate of HIV transmission from mother to infant is reduced significantly when zidovudine is administered to HIV-infected women during pregnancy and delivery and to the offspring after birth (Connor et al., 1994 Pediatr Infect Dis J 14: 536-541). Several studies of mother-to-infant transmission of HIV have demonstrated a correlation between the maternal virus load at delivery and risk of HIV transmission to the child. The present disclosure provides isolated human monoclonal antibodies that are of use in decreasing HIV transmission from mother to infant. Thus, in some examples a therapeutically effective amount of a human gp120-specific antibody is administered in order to prevent transmission of HIV, or decrease the risk of transmission of HIV, from a mother to an infant. In some examples, a therapeutically effective amount of the antibody is administered to the mother and/or to the child at childbirth. In other examples, a therapeutically effective amount of the antibody is administered to the mother and/or infant prior to breast feeding in order to prevent viral transmission to the infant or decrease the risk of viral transmission to the infant.
In some embodiments, both a therapeutically effective amount of the antibody and a therapeutically effective amount of another agent, such as zidovudine, are administered to the mother and/or infant.
For any application, the antibody can be combined with anti-retroviral therapy. Antiretroviral drugs are broadly classified by the phase of the retrovirus life-cycle that the drug inhibits. The disclosed antibodies can be administered in conjunction with nucleoside and nucleotide reverse transcriptase inhibitors (nRTI), non-nucleoside reverse transcriptase inhibitors (NNRTI), protease inhibitors, entry inhibitors (or fusion inhibitors), maturation inhibitors, or broad spectrum inhibitors, such as natural antivirals. Exemplary agents include lopinavir, ritonavir, zidovudine, lamivudine, tenofovir, emtricitabine and efavirenz.
Single or multiple administrations of the compositions including the antibodies disclosed herein are administered depending on the dosage and frequency as required and tolerated by the patient. In any event, the composition should provide a sufficient quantity of at least one of the antibodies disclosed herein to effectively treat the patient. The dosage can be administered once but may be applied periodically until either a therapeutic result is achieved or until side effects warrant discontinuation of therapy. In one example, a dose of the antibody is infused for thirty minutes every other day. In this example, about one to about ten doses can be administered, such as three or six doses administered every other day. In a further example, a continuous infusion is administered for about five to about ten days. The subject can be treated at regular intervals, such as monthly, until a desired therapeutic result is achieved. Generally, the dose is sufficient to treat or ameliorate symptoms or signs of disease without producing unacceptable toxicity to the patient.
Controlled-release parenteral formulations can be made as implants, oily injections, or as particulate systems. For a broad overview of protein delivery systems see, Banga, A.J., Therapeutic Peptides and Proteins: Formulation,
Processing, and Delivery Systems, Technomic Publishing Company, Inc., Lancaster, PA, (1995). Particulate systems include microspheres, microparticles,
microcapsules, nanocapsules, nanospheres, and nanoparticles. Microcapsules contain the therapeutic protein, such as a cytotoxin or a drug, as a central core. In microspheres the therapeutic is dispersed throughout the particle. Particles, microspheres, and microcapsules smaller than about 1 μm are generally referred to as nanoparticles, nanospheres, and nanocapsules, respectively. Capillaries have a diameter of approximately 5 μm so that only nanoparticles are administered intravenously. Microparticles are typically around 100 μm in diameter and are administered subcutaneously or intramuscularly. See, for example, Kreuter, J., Colloidal Drug Delivery Systems, J. Kreuter, ed., Marcel Dekker, Inc., New York, NY, pp. 219-342 (1994); and Tice & Tabibi, Treatise on Controlled Drug Delivery, A. Kydonieus, ed., Marcel Dekker, Inc. New York, NY, pp. 315-339, (1992).
Polymers can be used for ion-controlled release of the antibody compositions disclosed herein. Various degradable and nondegradable polymeric matrices for use in controlled drug delivery are known in the art (Langer, Accounts Chem. Res. 26:537-542, 1993). For example, the block copolymer poloxamer 407 exists as a viscous yet mobile liquid at low temperatures but forms a semisolid gel at body temperature. It has been shown to be an effective vehicle for formulation and sustained delivery of recombinant interleukin-2 and urease (Johnston et al., Pharm. Res. 9:425-434, 1992; and Pec et al., J. Parent. Sci. Tech. 44(2):58-65, 1990). Alternatively, hydroxyapatite has been used as a microcarrier for controlled release of proteins (Ijntema et al., Int. J. Pharm. 112:215-224, 1994). In yet another aspect, liposomes are used for controlled release as well as drug targeting of the lipid-capsulated drug (Betageri et al., Liposome Drug Delivery Systems, Technomic Publishing Co., Inc., Lancaster, PA (1993)). Numerous additional systems for controlled delivery of therapeutic proteins are known (see U.S. Patent No. 5,055,303; U.S. Patent No. 5,188,837; U.S. Patent No. 4,235,871; U.S. Patent No. 4,501,728; U.S. Patent No. 4,837,028; U.S. Patent No. 4,957,735; U.S. Patent No. 5,019,369; U.S. Patent No. 5,514,670; U.S. Patent No. 5,413,797; U.S. Patent No. 5,268,164; U.S. Patent No. 5,004,697; U.S. Patent No. 4,902,505; U.S. Patent No. 5,506,206; U.S. Patent No. 5,271,961; U.S. Patent No. 5,254,342 and U.S. Patent No. 5,534,496).
E. Diagnostic Methods and Kits
A method is provided herein for the detection of the expression of gp120 in vitro or in vivo. In one example, expression of gp120 is detected in a biological sample, and can be used to detect HIV-1 infection. The sample can be any sample, including, but not limited to, tissue from biopsies, autopsies and pathology specimens. Biological samples also include sections of tissues, for example, frozen sections taken for histological purposes. Biological samples further include body fluids, such as blood, serum, plasma, sputum, spinal fluid or urine.
In several embodiments, a method is provided for detecting AIDS and/or an HIV-1 infection in a subject. The disclosure provides a method for detecting HIV-1 in a biological sample, wherein the method includes contacting a biological sample with the antibody under conditions conducive to the formation of an immune complex, and detecting the immune complex, to detect the gp120 in the biological sample. In one example, the detection of gp120 in the sample indicates that the subject has an HIV infection. In another example, the detection of gp120 in the sample indicates that the subject has AIDS. In another example, detection of gp120 in the sample confirms a diagnosis of AIDS and/or an HIV-1 infection in a subject.
In some embodiments, the disclosed antibodies are used to test vaccines, for example to test whether a vaccine composition assumes the same conformation as a gp120 peptide. Thus provided herein is a method for testing a vaccine, wherein the method includes contacting a sample containing the vaccine, such as a gp120 immunogen, with the antibody under conditions conducive to the formation of an immune complex, and detecting the immune complex, to detect the vaccine in the sample. In one example, the detection of the immune complex in the sample indicates that the vaccine component, such as a gp120 immunogen, assumes a conformation capable of binding the antibody.
In one embodiment, the antibody is directly labeled with a detectable label. In another embodiment, the antibody that binds gp120 (the first antibody) is unlabeled and a second antibody or other molecule that can bind the antibody that binds gp120 is utilized. As is well known to one of skill in the art, a second antibody is chosen that is able to specifically bind the specific species and class of the first antibody. For example, if the first antibody is a human IgG, then the secondary antibody may be an anti-human-IgG. Other molecules that can bind to antibodies include, without limitation, Protein A and Protein G, both of which are available commercially.
Suitable labels for the antibody or secondary antibody are described above, and include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, magnetic agents and radioactive materials. Non-limiting examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase. Non-limiting examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin. Non-limiting examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin. A non-limiting exemplary luminescent material is luminol; a non-limiting exemplary magnetic agent is gadolinium; and non-limiting exemplary radioactive labels include ¹²⁵I, ¹³¹I, ³⁵S or ³H.
The immunoassays and methods disclosed herein can be used for a number of purposes. Kits for detecting a polypeptide will typically comprise an antibody that binds gp120, such as any of the antibodies disclosed herein. In some embodiments, an antibody fragment, such as an Fv fragment or a Fab, is included in the kit. In a further embodiment, the antibody is labeled (for example, with a fluorescent, radioactive, or enzymatic label).
In one embodiment, a kit includes instructional materials disclosing means of use. The instructional materials may be written, in an electronic form (such as a computer diskette or compact disk) or may be visual (such as video files). The kits may also include additional components to facilitate the particular application for which the kit is designed. Thus, for example, the kit may additionally contain means of detecting a label (such as enzyme substrates for enzymatic labels, filter sets to detect fluorescent labels, appropriate secondary labels such as a secondary antibody, or the like). The kits may additionally include buffers and other reagents routinely used for the practice of a particular method. Such kits and appropriate contents are well known to those of skill in the art.
In one embodiment, the diagnostic kit comprises an immunoassay. Although the details of the immunoassays may vary with the particular format employed, the method of detecting gp120 in a biological sample generally includes the steps of contacting the biological sample with an antibody which specifically reacts, under immunologically reactive conditions, to gp120. The antibody is allowed to specifically bind under immunologically reactive conditions to form an immune complex, and the presence of the immune complex (bound antibody) is detected directly or indirectly.
The following examples are provided to illustrate certain particular features and/or embodiments. These examples should not be construed to limit the disclosure to the particular features or embodiments described.
EXAMPLES
Example 1
Identification of VRC01-like Antibodies
HIV-1 exhibits extraordinary genetic diversity and has evolved multiple mechanisms of resistance to evade the humoral immune response. Despite these obstacles, 10-25% of HIV-1-infected individuals develop cross-reactive neutralizing antibodies after several years of infection. Elicitation of such antibodies could form the basis for an effective HIV-1 vaccine, and intense effort has focused on identifying responsible antibodies and delineating their characteristics. A variety of monoclonal antibodies (mAbs) have been isolated that recognize a range of epitopes on the functional HIV-1 viral spike, which is composed of three highly glycosylated gp120 exterior envelope glycoproteins and three transmembrane gp41 molecules. Some broadly neutralizing antibodies are directed against the membrane-proximal external region of gp41, but the majority recognize gp120. These include the quaternary structure-preferring antibodies PG9, PG16, and CH01-04, the glycan-reactive antibodies 2G12 and PGT121-144, and antibodies b12, HJ16 and VRC01-03, which are directed against the region of HIV-1 gp120 involved in initial contact with the CD4 receptor.
One unusual characteristic of all these gp120-reactive broadly neutralizing antibodies is a high level of somatic mutation. Antibodies typically accumulate 5-10% changes in variable domain amino acid sequence during the affinity maturation process, but for these gp120-reactive antibodies, the degree of somatic mutation is markedly increased, ranging from ~15-20% for the quaternary structure-preferring antibodies and antibody HJ16, to ~25% for antibody 2G12 and to ~30% for the CD4-binding-site antibodies VRC01, VRC02, and VRC03.
In the case of VRC01, the mature antibody accumulates almost 70 total changes in amino acid sequence during the maturation process. The mature VRC01 can neutralize ~90% of HIV-1 isolates at a geometric mean IC50 of 0.3 μg/ml, and structural studies show that it achieves this neutralization by precisely recognizing the initial site of CD4 attachment on HIV-1 gp120. By contrast, the predicted unmutated germline ancestor of VRC01 has weak affinity for typical strains of gp120 (~mM). Moreover, with only two unique VRC01-like antibodies identified in a single individual (donor 45), it has been unclear whether the VRC01 mode of recognition, genetic origin, and pathway of affinity maturation represent general features of the B-cell response to HIV-1. As described in this example, VRC01-like antibodies were isolated from two additional HIV-1-infected donors, their liganded crystal structures with gp120 were determined, cross-donor complementation of heavy and light chain function was examined, and deep sequencing was used to analyze the repertoire, lineage, and maturation pathways of related antibody sequences in each of two donors. The analysis presented here focuses primarily on the heavy chain, although some analysis of the light chain is also undertaken. Definition of the structural convergence and maturation pathways by which VRC01-like antibodies achieve broad neutralization of HIV-1 provides a framework for understanding the development of these antibodies and for efforts to guide their induction.
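As an illustration of how a geometric mean IC50 of the kind cited above can be computed, the following Python sketch averages the values in log space. It is a minimal sketch only: the function name and the example IC50 values are hypothetical and are not data from this disclosure.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of positive values (for example, IC50 in ug/ml)."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # Average the natural logs, then transform back to the original scale.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical IC50 values (ug/ml) for three neutralized isolates; illustrative only.
example_ic50 = [0.03, 0.3, 3.0]
print(round(geometric_mean(example_ic50), 3))
```

A geometric rather than arithmetic mean is the usual choice for IC50 values because they span several orders of magnitude, so a few weakly neutralized isolates would otherwise dominate the average.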
Isolation of neutralizing antibodies from donors 74 and 0219 with a CD4-binding-site probe. Structure-guided resurfacing was used to alter the antigenic surfaces on HIV-1 gp120 while preserving the initial site of attachment to the CD4 receptor (see Wu et al., Science 329, 856 (2010), which is specifically incorporated by reference in its entirety). Resurfaced stabilized core 3 probe (RSC3), as well as a non-binding mutant (ΔRSC3), were used to interrogate a panel of 12 broadly neutralizing sera derived from the IAVI protocol G cohort of HIV-1 infected individuals (FIG. 1A). A substantial fraction of neutralization of three sera was specifically blocked by RSC3 compared with ΔRSC3, indicating the presence of CD4-binding-site-directed neutralizing antibodies. RSC3-neutralization competition assays also confirmed the presence of CD4-binding-site antibodies in the previously characterized serum 0219 (FIG. 1A).
Peripheral blood mononuclear cells (PBMCs) from protocol G donor 74 (infected with A/D recombinant HIV-1) and from CHAVI donor 0219 (infected with clade A HIV-1) were used for antigen-specific B-cell sorting and antibody isolation. PBMCs were incubated with both RSC3 and ΔRSC3, each conjugated to a different fluorochrome, and flow cytometric analysis was used to identify and to sort individual IgG+ B cells reactive with RSC3 and not ΔRSC3. For donor 74 and 0219, respectively, a total of 0.13% and 0.15% of IgG+ B cells were identified (FIGS. 1B and 8). The heavy and light chain immunoglobulin genes from individual B cells were amplified and cloned into IgG1 expression vectors that reconstituted the full IgG. From donor 74, two somatically related antibodies named VRC-PG04 and VRC-PG04b demonstrated strong binding to several versions of gp120 and to RSC3 but ~100-fold less binding to ΔRSC3 (FIG. 9 and FIG. 24). From donor 0219, three somatically related antibodies named VRC-CH30, 31, and 32 displayed a similar pattern of RSC3/ΔRSC3 reactivity (FIG. 9 and FIG. 24). Sequence analysis of these two sets of unique antibodies (FIG. 1C and FIG. 24) revealed that they originated from the same inferred immunoglobulin heavy chain variable (IGHV) precursor gene allele IGHV1-2*02. Despite this similarity in heavy chain V-gene origin, the two unique antibody clones originated from different heavy chain J segment genes and contained different light chains. The light chains of the VRC-PG04 and 04b somatic variants originated from an IGKV3 allele while the VRC-CH30, 31 and 32 somatic variants derived from an IGKV1 allele. Of note, all five antibodies contained unusually high mutation frequencies: VRC-PG04 and 04b displayed a VH gene mutation frequency of 30% relative to the germline IGHV1-2*02 allele, a level of affinity maturation similar to that previously observed with VRC01-03; the VRC-CH30, 31 and 32 antibodies were also highly affinity matured, with VH mutation frequency of 23-24%.
To define the reactivities of these new antibodies on gp120, competition ELISAs were performed with a panel of well-characterized mAbs. Binding by each of the new antibodies was competed by VRC01-03, by other CD4-binding-site antibodies and by CD4-Ig, but not by antibodies known to bind gp120 at other sites (FIGs. 1D and 10). Despite similarities in gp120 reactivity and VH-genomic origin, sequence similarities of heavy and light chain gene regions did not readily account for their common mode of gp120 recognition (FIG. 1E). Finally, assessment of VRC-PG04 and VRC-CH31 neutralization on a panel of Env-pseudoviruses revealed their ability to potently neutralize a majority of diverse HIV-1 isolates (FIG. 1F and FIGs. 25-28).
Structural definition of gp120 recognition by RSC3-identified antibodies from different donors: A remarkable convergence. To define the mode of gp120 recognition employed by donor 74-derived VRC-PG04, its antigen-binding fragment (Fab) was crystallized in complex with a gp120 core from the clade A/E recombinant 93TH057 that was previously crystallized with VRC01. Diffraction data to 2.1 Å resolution were collected from orthorhombic crystals, and the structure was solved by molecular replacement and refined to a crystallographic R-value of 19.0% (FIG. 2A and FIGs. 29-31). The structure of VRC-PG04 in complex with HIV-1 gp120 showed striking similarity with the previously determined complex with VRC01, despite different donor origins and only 50% amino acid identity in the heavy chain variable region (FIG. 2). When gp120s were superimposed, the resultant heavy chain positions of VRC-PG04 and VRC01 differed by a root-mean-square deviation (rmsd) of 2.1 Å in Cα atoms, with even more precise alignment of the heavy chain second complementarity-determining region (CDR H2) (1.5 Å rmsd). Critical interactions such as the Asp368 (gp120) salt bridge to Arg71 (VRC01) were maintained in VRC-PG04 (FIG. 2B).
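The rmsd comparison described above can be sketched as follows. This is a minimal Python illustration that assumes the two complexes have already been superimposed on gp120 and that matched Cα coordinates are supplied in the same residue order; the function name and the toy coordinates are hypothetical and do not come from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    The points are assumed to be already in a common frame (for example, after
    superimposing the gp120 components of two complexes); no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms; the second set is the first shifted by 2 A along z.
heavy_chain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
heavy_chain_2 = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(rmsd(heavy_chain_1, heavy_chain_2))
```

Because the superposition is performed on gp120 rather than on the antibodies themselves, an rmsd computed this way reflects differences in how each heavy chain is positioned relative to the antigen, not only differences in heavy chain shape.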
The gp120-Fab complex of donor 45-derived VRC03 was also crystallized. VRC03 and VRC-PG04 share only 51% heavy chain variable protein sequence identity, and the heavy chain of VRC03 contains an unusual insertion in the framework 3 region. Diffraction data to 1.9 Å resolution were collected from orthorhombic crystals, and the structure was solved by molecular replacement and refined to a crystallographic R-value of 18.7% (FIG. 2, FIG. 29, FIG. 32 and FIG. 33). VRC03 also showed recognition of gp120 that was strikingly similar to that of VRC-PG04 and VRC01, with pairwise rmsds in Cα atoms of 2.4 Å and 1.9 Å. In particular, CDR H2 and CDR L3 regions showed similar recognition (pairwise Cα rmsds ranged from 0.5 to 1.4 Å) (FIG. 11).
In general, the repertoire of possible immunoglobulin products is very large and highly similar modes of antibody recognition are expected to occur infrequently. Other families of HIV-1-specific antibodies that share a common IGHV-gene origin were analyzed, including the CD4-induced antibodies, which often derive from a common VH1-69 allele. Analysis of the recognition of gp120 by these antibodies indicated substantial variation in their recognition, with angular differences in heavy chain recognition of over 90° (FIG. 34). Other CD4-binding-site antibodies that are also recognized well by the RSC3 probe, such as antibodies b12 and b13, were also analyzed; these other RSC3-reactive antibodies also showed dramatic differences in heavy chain orientation (FIG. 34).
The remarkable convergence in recognition observed with VRC01, VRC03, and VRC-PG04 suggested a common mode of HIV-1 gp120 recognition, conserved between donors infected with a clade B (donor 45) and clade A/D (donor 74) strain of HIV-1. The precision required for this mode of recognition likely arises as a consequence of the multiple mechanisms of immune evasion that protect the site of CD4 attachment on HIV-1 gp120. Analysis of the paratope surface properties revealed that the average energy of antibody hydrophobic interactions (ΔiG) correlated with the convergence in antibody recognition (P=0.0427) (FIG. 3A). Thus, while precise H-bonding is required for this mode of recognition (FIG. 2C), the convergence in structure optimizes regions with hydrophobic interactions. Another important feature of this mode of recognition is its ability to focus precisely on the initial site of CD4 receptor attachment. Indeed, the breadth of HIV-1 neutralization among CD4-binding-site ligands correlated with targeting onto this site (P=0.0405) (FIG. 3B).
This convergence in epitope recognition is accompanied by a divergence in antibody sequence identity (FIGs. 1C, 1E and 3C). All eight antibodies isolated by RSC3 binding utilize the germline IGHV1-2*02 and accrue 70-90 nucleotide changes. Despite the similarity in mature antibody recognition, only 2 residues from the germline IGHV1-2*02 allele change to the same amino acids (FIG. 1C). Both of these changes occur at a hydrophobic contact in the critical CDR H2 region (Gly56Thr→Ala56Val). The light chains for donors 45 and 74 antibodies arise from either IGKV3-11*01 or IGKV3-20*01, while the light chains of donor 0219 antibodies are derived from IGKV1-33*01. For these light chains, no maturational changes are identical. Despite this diversity in maturation, comparison of the VRC01, VRC03, and VRC-PG04 paratopes shows that many of these changes are of conserved chemical character (FIG. 3C); a hydrophobic patch in the CDR L3, for example, is preserved. These observations suggest that divergent amino acid changes among VRC01-like antibodies nevertheless afford convergent recognition when guided by affinity maturation.
Functional complementation of heavy and light chains among VRC01-like antibodies. While the identification and sorting of antigen-specific B cells with resurfaced probes has resulted in the isolation of several broadly neutralizing antibodies, genomic analysis of B-cell cDNA libraries provides substantially greater sequence complexity. These sequences specify the functional antibodyome, the repertoire of expressed antibody heavy and light chain sequences in each individual. High-throughput sequencing methods provide heavy chain and light chain sequences, but do not retain information about their pairings. For VRC01-like antibodies, the structural convergence revealed by the crystallographic analysis indicated a potential solution: different heavy and light chains might achieve functional complementation within this antibody family. Heavy and light chain chimeras of VRC01, VRC03, VRC-PG04 and VRC-CH31 were produced by transient transfection (FIG. 35) and tested for HIV-1 neutralization (FIG. 35). VRC01 (donor 45) and VRC-PG04 (donor 74) light chains were functionally compatible with VRC01, VRC03 and VRC-PG04 heavy chains, though the VRC03 light chain was compatible only with the VRC03 heavy chain (FIG. 4A and FIG. 35). Similarly, despite ~50% differences in sequence identity (FIG. 1E), the VRC-CH31 (donor 0219) heavy and light chains were able to functionally complement most of the other antibodies (FIG. 4A and FIG. 35).
Identification of VRC01-like antibodies by deep sequencing of donors 45 and 74. To study the antibody repertoire in these individuals, deep sequencing was performed on cDNA from donor 45 PBMC. The mRNA was extracted from 20 million PBMC, reverse transcribed with oligo(dT)12-18, and a quarter of the resultant cDNA (equivalent to the transcripts of 5 million PBMC) was used as the template for PCR to preferentially amplify the IGHV1 gene family from both the IgG and IgM expressing cells. PCR products were gel purified and analyzed by 454 pyrosequencing. Because the variable regions of heavy and light chains are roughly 400 nucleotides in length, 454 pyrosequencing methods, which allow read lengths of 500 nucleotides, were used for deep sequencing. First, heavy chain sequences were assessed from a 2008 PBMC sample from donor 45, the same time point from which antibodies VRC01, VRC02, and VRC03 were isolated by RSC3-probing of the memory B-cell population. mRNA from 5 million PBMC was used as the template for PCR to preferentially amplify the IgG and IgM genes from the IGHV1 family. 454 pyrosequencing provided 221,104 sequences, of which 33,386 encoded heavy chain variable domains that encompassed the entire V(D)J region.
To categorize the donor 45 heavy chain sequence information, characteristics particular to the heavy chains of VRC01 and VRC03 were chosen as filters: (i) sequence identity, (ii) IGHV gene allele origin, and (iii) sequence divergence from the germline IGHV gene as a result of affinity maturation (FIG. 4B). Specifically, sequences were divided into IGHV1-2*02 allelic origin (4597 sequences) and non-IGHV1-2*02 origin (28,789 sequences), and were analyzed for divergence from inferred germline genes and for sequence identity to the template antibodies VRC01 and VRC03 (FIG. 4B). Interestingly, no sequence of higher than 75% identity to the VRC01 or VRC02 heavy chain was found, although 109 sequences of greater than 90% sequence identity to VRC03 were found and all were of IGHV1-2*02 origin (FIGs. 4B and 13). These sequences formed a well-segregated cluster on a contour plot. To assess biological function, chimeric antibodies were made by pairing each of the two heavy chain sequences from the 454 sequence set with the VRC03 light chain. In both cases, potent neutralization was observed, with neutralization similar to the original VRC03 antibody (FIG. 4E and FIG. 39).
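The identity and divergence coordinates used in this kind of filtering can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sequences are taken to be pre-aligned to equal length with "-" as the gap character, divergence is taken to be 100% minus identity to the germline, and the function names and toy sequences are hypothetical. The source does not specify the alignment software or the exact scoring used.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def identity_divergence(read, template, germline):
    """Return (identity to template antibody, divergence from germline), in percent."""
    identity = percent_identity(read, template)
    divergence = 100.0 - percent_identity(read, germline)
    return identity, divergence


# Toy 10-nucleotide sequences; real variable regions are roughly 400 nucleotides.
germline = "AAAAAAAAAA"
template = "AAAATTTTAA"
read = "AAAATTTAAA"
print(identity_divergence(read, template, germline))
```

Plotting each read at its (divergence, identity) coordinates gives a grid of the kind described above, in which reads related to a template antibody appear as a cluster at high identity and high divergence.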
A similar heavy chain deep sequencing analysis was performed with donor 74 PBMC from the same 2008 time point from which VRC-PG04 and VRC-PG04b were isolated. In the initial analysis, despite obtaining 263,764 sequences of which 85,851 encompassed the full V(D)J regions of the heavy chain, no sequences of greater than 75% identity to VRC-PG04 were found (FIG. 15). Because the number of unique heavy chain mRNAs present in the PBMC sample was likely much larger than the number of unique sequences obtained in the initial analysis, the deep sequencing of this sample was repeated with an increased number of 454 pyrosequencing reads and with protocols that optimized read length. In this analysis, 110,386 sequences of IGHV1-2*02 origin and 606,047 sequences of non-IGHV1-2*02 origin were found to encompass the V(D)J region of the heavy chain, a 10-fold increase in sequencing depth. Among these sequences, 4920 displayed greater than 75% identity to VRC-PG04 (FIG. 4B and Appendix 2). Heavy chain sequences of the IGHV1-2*02 allelic origin segregated into several clusters, one at ~25% divergence and ~85% identity to the VRC-PG04 heavy chain, and several at 25-35% divergence and 65%, 85%, and 95% identity to VRC-PG04 (FIG. 4B).
To assess the biological function of these numerous 454-identified heavy chain sequences, 56 representative sequences were selected from the quadrant defined by high divergence (16-38%) and high sequence similarity (60-100%) to VRC-PG04 (FIG. 16). The 56 sequences were synthesized and expressed with the VRC-PG04 light chain (FIG. 37 and FIG. 38). Remarkably, many of these antibodies displayed potent HIV-1 neutralization, confirming that these were functional VRC-PG04-like heavy chains (FIG. 4E and FIG. 39). Next, a similar analysis was performed on the antibody light chain. Because VRC01-03 and VRC-PG04 derive from IGKV3 alleles, primers were used that were designed to amplify the IGKV3 gene family. A donor 45 2001 time point was chosen to maximize the likelihood of obtaining light chain sequences capable of functional complementation. A total of 305,475 sequences were determined, of which 87,658 sequences encompassed the V-J region of the light chain. To classify the donor 45 light chain sequences into useful subsets, biologically specific characteristics were chosen: a distinctive 2-amino acid deletion in the first complementarity-determining region and high affinity maturation (17% and 19% for VRC01 and VRC-PG04, respectively). Two such sequences, with ~90% sequence identity to the VRC01 and VRC03 light chains, respectively, were identified (FIG. 4C). Their biological function was assessed after synthesis in combination with the VRC01, VRC03, and VRC-PG04 heavy chains (FIG. 39). When paired with their respective matching wild type heavy chain to produce a full IgG, both chimeric antibodies displayed neutralization similar to the wild type antibody (FIG. 4D and FIG. 39).
Maturation similarities of VRC01-like antibodies in different donors revealed by phylogenetic analysis. The structural convergence in gp120 recognition and the functional complementation between VRC01-like antibodies from different donors suggested similarities in their maturation processes. Therefore phylogenetic tools were used to assess the evolutionary relationship among sequences derived from the same precursor germline gene. It was initially hypothesized that if known VRC01-like sequences from one donor were added to the analysis of sequences of another donor, a genomic-rooted phylogenetic tree would reveal similarities in antibody maturation pathways. Specifically, with such an analysis, the exogenous sequences would be expected to interpose between branches in the dendrogram containing VRC01-like antibodies and branches containing non-VRC01-like antibodies from the original donor's antibodyome. This analysis was performed with heavy chains, as all of the probe-identified VRC01-like antibodies derived from the same heavy chain IGHV1-2*02 allele. The donor 74-derived VRC-PG04 and 04b and donor 0219-derived VRC-CH30, 31 and 32 heavy chain sequences were added to the donor 45 antibodyome sequences of IGHV1-2*02 genomic origin and a phylogenetic tree was constructed rooted by the predicted VRC01 unmutated germline ancestor. This analysis revealed that sequences of high identity to VRC03 clustered as a subtree of a common node that was also the parent to donor 74 and 0219 VRC01-like heavy chain sequences (FIG. 5A, left). Two donor 45 sequences chosen at random from the subtree derived from this common node were shown to neutralize HIV-1, whereas 11 heavy chain sequences from outside this node did not neutralize (P < 0.0001) (FIG. 14).
The donor 74-derived IGHV1-2*02 heavy chain sequences were also assessed by including probe-identified VRC01-like antibodies from donor 45 and donor 0219 in the phylogenetic analysis. In the tree rooted by the predicted VRC-PG04 unmutated germline ancestor, 5047 sequences segregated within the donor 45 and 0219-identified subtree (FIG. 5A, right). This subtree included the actual VRC-PG04 and 04b heavy chain sequences, 4693 sequences of >85% identity to VRC-PG04, and several hundred sequences with identities as low as 68% to VRC-PG04. To test the functional activity of heavy chain sequences identified by this analysis, the location in the tree was assessed for the 56 heavy chain sequences that had been identified and expressed from the previously described identity/divergence grid (FIG. 16). To these 56 sequences, 7 additional sequences from the donor 74 phylogenetic tree and 7 non-IGHV1-2*02 sequences were added to enhance coverage of the cross-donor segregated sequences (FIG. 17). These sequences were also synthesized and expressed with the VRC-PG04 light chain (FIGs. 37 and 38). Among these 70 synthesized heavy chain sequences, 25 did not express. Of the remaining 45 reconstituted antibodies, 24 were able to neutralize HIV-1 (FIG. 39). Remarkably, all of the neutralizing sequences segregated into the subtree identified by the exogenously added donor 45 and 0219 VRC01-like antibodies (P-value=0.0085) (FIG. 6D).
This cross-donor segregation method was also applied to the light chain antibodyome of donor 45. The light chains from donor 74 and 0219 did not segregate with known VRC01-like light chains from donor 45 (FIG. 18), likely because these three light chains do not arise from the same inferred germline sequences. This difference may also reflect the dissimilarities in focused maturation of the two chains (see FIG. 3A): in the heavy chain, focused maturation occurs in the CDR H2 region (encompassed solely within the IGHV1-2*02 VH gene from which all VRC01-like heavy chains derive) and, in the light chain, selection pressures occur in the CDR L3 region (which is a product of different types of V-J recombination).
CDR H3-lineage analysis. The 35 heavy chain sequences that both segregated into the VRC01-neutralizing subtree and expressed when reconstituted with the VRC-PG04 light chain could be clustered into 9 CDR H3 classes (FIG. 6B), with sequences in each class containing no more than 5 nucleotide differences in CDR H3 from other sequences in the same class (FIG. 19). A detailed junction analysis of the V(D)J recombination origins of these classes suggested that 8 of the 9 classes arose by separate recombination events (FIG. 20); two of the classes (7 and 8) differed primarily by a single three-residue insertion/deletion, Arg-Tyr-Ser, and may have arisen from a single V(D)J recombination event (FIG. 21). Three of these classes (CDR H3-1, 2, and 9) were represented only by non-neutralizing antibodies, three by a single neutralizing antibody (CDR H3-4, 5 and 6), and three by a mixture of neutralizing and non-neutralizing antibodies (CDR H3-3, 7 and 8). Because it was not clear whether the non-neutralizing heavy chain sequences truly lacked neutralization function or whether this phenotype was due to incompatibilities in light chain pairing, only those CDR H3 classes in which neutralization had been confirmed were chosen for further analysis.
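The grouping criterion described above (no more than 5 nucleotide differences between any two CDR H3 sequences in a class) can be sketched in Python as follows. This is a hypothetical greedy procedure offered for illustration, not the method actually used in the analysis: the function names and toy sequences are assumptions, differences are counted position by position, and sequences of different lengths are simply placed in different classes, which is a simplification.

```python
def nucleotide_differences(seq_a, seq_b):
    """Count mismatched positions; return None when the lengths differ."""
    if len(seq_a) != len(seq_b):
        return None
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def group_cdrh3(sequences, max_diff=5):
    """Greedily assign each sequence to the first class in which it is within
    max_diff nucleotide differences of every existing member."""
    classes = []
    for seq in sequences:
        for members in classes:
            diffs = [nucleotide_differences(seq, m) for m in members]
            if all(d is not None and d <= max_diff for d in diffs):
                members.append(seq)
                break
        else:
            # No compatible class was found, so start a new one.
            classes.append([seq])
    return classes


# Toy CDR H3 nucleotide sequences; illustrative only.
toy = ["AAAAAAAAAA", "AAAAAAAATT", "CCCCCCCCCC", "AAAAAAA"]
for index, members in enumerate(group_cdrh3(toy), start=1):
    print(index, members)
```

Requiring a new sequence to be close to every existing member, rather than to any one member, guarantees that the stated within-class bound holds for all pairs, although the resulting classes can depend on the order in which sequences are processed.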
Donor 74 IGHV1-2*02 heavy chain sequences were further analyzed to identify those with CDR H3 sequences identical to the CDR H3s in each of the neutralizing classes (FIG. 7). This analysis identified four clonal lineages (CDR H3 classes 3, 6, 7 and 8), with sequences that extended to 15% or less affinity maturation. CDR H3 class 7 included the probe-identified antibodies VRC-PG04 and 04b. In each case, a steady accumulation of changes led to increased neutralization activity, and changes at positions 48, 52, 58, 69, 74, 82 and 94 in the V gene, among others, appeared to be selected in several lineages (FIG. 7). Overall, more than 1500 unique sequences could be classified into these four CDR H3 lineages (FIG. 7). Although these CDR H3 lineages were inferred from a single time point, they likely provide insight into the specific maturation pathway by which the heavy chain of a VRC01-like antibody evolves from an initial unmutated recombinant to a broadly neutralizing antibody.
J chain analysis and maturation complexities. In the heavy chains of VRC01-like sequences identified by phylogenetic analysis, a significant skewing of J chain usage was observed (FIG. 5A): in donor 45, over 87% of the phylogenetic-segregated sequences utilize the IGHJ1*01 allele, and in donor 74, 99% of the segregated sequences utilize the IGHJ2*01 allele. This preferential J chain usage does not appear to be a requirement for binding specificity; indeed, the use of the J1 allele in VRC01, the J2 allele in VRC-PG04, and the J4 allele in VRC-CH31 provides examples of the functional compatibility of at least three different IGHJ alleles in VRC01-like antibodies. In addition to preferential J chain usage, other complexities in the maturation process could be inferred from similarities in mature heavy chain genes and differences in CDR H3 sequence. In the absence of information on the natural pairing of heavy and light chains, the antibody maturation processes underlying these complexities are difficult to infer. Nevertheless, the deep sequencing data, with thousands of CDR H3-defined maturation intermediates (FIG. 7), provide sufficient information to suggest that the maturation may involve heavy chain revision or other mechanisms of B cell diversification.
Antibody genomics, HIV-1 immunity, and vaccine implications. Affinity maturation that focuses a developing antibody onto a conserved site of HIV-1 vulnerability provides a mechanism to achieve broad recognition of HIV-1 gp120. Such focused evolution may be common to broadly neutralizing antibodies that succeed in overcoming the immune evasion that protects HIV-1 gp120 from humoral recognition; the multiple layers of evasion may constrain or focus the development of nascent antibodies to particular pathways during maturation.
The structure-based genomics approach described here provides tools for understanding antibody maturation. It is disclosed herein how deep sequencing can be utilized to determine the repertoire of sequences that compose the light chain and heavy chain antibodyomes in HIV-1 infected individuals. These antibodyomes can then be interrogated for unusual properties in sequence, or in maturation, to identify antibodies for functional characterization. Three means of sieving a large database of antibody sequences are disclosed herein: 1) by identity to a known mAb sequence and by divergence from putative germline (identity/divergence-grid analysis), 2) by cross-donor phylogenetic analysis of maturation pathway relationships, and 3) by CDR H3-lineage analysis. An important aspect of this analysis was the functional characterization of selected sequences achieved through expression of and reconstitution with known VRC01-like heavy or light chains, although other means of pairing, such as by frequency analysis, are possible. While neutralization has been assessed on fewer than 100 of the antibodyomics-derived heavy-light reconstituted antibodies, the thousands of identified sequences provide a large dataset for analysis, which should enhance our understanding of the critical features of VRC01-like antibodies. For example, the correlation of sequence variation at particular positions with neutralization provides insight into the allowed diversity and required elements of neutralization by this family of antibodies (FIG. 23).
The deep sequencing and structural bioinformatics methodologies presented here facilitate analysis of the human antibodyome (FIG. S16). This genomics technology allows interrogation of the antibody responses from infected donors, uninfected individuals or even vaccine recipients and has several implications. For example, a genomic rooted phylogenetic analysis of the VRC01 antibodyome may reveal a general maturation pathway for the production of VRC01-like antibodies. Indeed, cross-donor phylogenetic analysis (FIG. 5B) suggests that common maturation intermediates with 20-30 affinity maturation changes from the IGHV1-2*02 genomic precursor are found in different individuals. These intermediates give rise to mature, broadly neutralizing VRC01-like antibodies, which have about 70-90 changes from the IGHV1-2*02 precursor (FIG. 5). If modified gp120s with affinity to the maturation intermediates represented by the nodes of the phylogenetic tree were to stimulate the elicitation of these intermediates, then the analysis presented here can help guide the vaccine-induced elicitation of VRC01-like antibodies. Deep sequencing not only provides a means to identify such intermediates, but also a means to facilitate their detection. Overall, the application of genomic technologies to analysis of antibodies facilitates both highly sensitive feedback and an unprecedented opportunity to understand the response of the antibodyome to infection and vaccination.
MATERIALS AND METHODS
Human specimens. The sera and peripheral blood mononuclear cells (PBMCs) of donor 45, donors from the International AIDS Vaccine Initiative (IAVI) protocol G, and donor 0219 from the Center for HIV/AIDS Vaccine Immunology (CHAVI) 001 cohort have been described previously. Donor 45, from whom monoclonal antibodies (mAbs) VRC01, VRC02 and VRC03 were isolated, was infected with an HIV-1 clade B virus. The IAVI protocol G donor 74, from whom mAbs VRC-PG04 and VRC-PG04b were isolated, was infected with an A/D recombinant virus. Donor 0219, from whom mAbs VRC-CH30, VRC-CH31 and VRC-CH32 were isolated, was infected with a clade A virus. These three donors were chronically infected and had not initiated antiretroviral treatment at the time of PBMC sampling.
Protein expression and purification. Monomeric gp120s, gp120 with the CD4-binding site knockout mutation D368R, gp120 cores, RSC3 and ΔRSC3 were expressed by transient transfection of 293F cells. Briefly, genes encoding the proteins of interest were each synthesized with a C-terminal His tag (GeneArt, Regensburg, Germany), and cloned into a mammalian CMV/R expression vector. Proteins were produced by transient transfection using 293fectin (Invitrogen, Carlsbad, CA) in 293F cells (Invitrogen) maintained in serum-free free-style medium (Invitrogen). Culture supernatants were harvested 5-6 days after transfection, filtered through a 0.45 μm filter, and concentrated with buffer exchange into 500 mM NaCl, 50 mM Tris (pH 8.0). Proteins were purified by Co-NTA (cobalt-nitrilotriacetic acid) chromatography using a HiTrap IMAC HP column (GE Healthcare, Piscataway, NJ). The peak fractions were collected, and further purified by gel filtration using a HiLoad 16/60 Superdex 200 pg column (GE Healthcare). The fractions containing monomers of each protein were combined, concentrated and flash frozen at -80°C.
Isolation of antigen-specific memory B cells by fluorescence activated cell sorting (FACS). The Avi-tagged RSC3 and ΔRSC3 were expressed, purified, and biotinylated using the biotin ligase BirA (Avidity, Denver, CO). Biotinylation of the RSC proteins was confirmed by ELISA. The proteins were then conjugated with the streptavidin-fluorochrome reagents, streptavidin-allophycocyanin (SA-APC) (Invitrogen) for RSC3 and streptavidin-phycoerythrin (SA-PE) (Sigma) for ΔRSC3. About 20 million donor PBMC were stained with RSC3-APC, ΔRSC3-PE, and an antibody cocktail consisting of anti-CD3-APC-Cy7 (BD Pharmingen), CD8-Qdot705 (VRC), CD19-Qdot585 (VRC), CD20-Pacific Blue (VRC), CD27-APC-AlexaFluor700 (Beckman Coulter), CD14-Qdot800 (VRC), IgG-FITC (BD Pharmingen), and IgM-PE-Cy5 (BD Pharmingen). In addition, aqua blue (Invitrogen) was used to exclude dead cells. The stained PBMC were washed with PBS, then analyzed and sorted using a modified 3-laser FACSAria cell sorter using the FACSDiva software (BD Biosciences). Single cells with the phenotype of CD3-, CD8-, aqua blue-, CD14-, CD19+, CD20+, IgG+, IgM-, RSC3+ and ΔRSC3- were sorted into 96-well PCR plates containing 20 μl of lysis buffer per well. The lysis buffer contained 0.5 μl of RNase Out (Invitrogen), 5 μl of 5x first strand buffer (Invitrogen), 1.25 μl of 0.1 M DTT (Invitrogen) and 0.0625 μl of Igepal (Sigma). The PCR plates with sorted cells were stored at -80°C. The total content of the donor PBMC sample passing through the sorter was saved in FCS files for further analysis with FlowJo software (TreeStar, Cupertino, CA).
Single B-cell immunoglobulin gene amplification and cloning. The frozen plates with single B-cell RNA were thawed at room temperature, and the reverse transcription was carried out by adding 3 μl of random hexamers (Gene Link, Hawthorne, NY) at 150 ng/μl, 2 μl of dNTP mix, each at 10 mM, and 1 μl of SuperScript III (Invitrogen) into each well. The thermocycle for reverse transcription was 42°C for 10 min, 25°C for 10 min, 50°C for 60 min and 94°C for 5 min. The cDNA plates were stored at -20°C, and the IgH, Igκ and Igλ variable region genes were amplified independently by nested PCR starting from 5 μl of cDNA as template. All PCRs were performed in 96-well PCR plates in a total volume of 50 μl containing water, 5 μl of 10x buffer, 1 μl of dNTP mix, each at 10 mM, 1 μl of MgCl2 at 25 mM (Qiagen, Valencia, CA) for 1st round PCR or 10 μl of 5x Q-Solution (Qiagen) for 2nd round PCR, 1 μl of primer or primer mix for each direction at 25 μM, and 0.4 μl of HotStar Taq DNA polymerase (Qiagen). Each round of PCR was initiated at 94°C for 5 min, followed by 50 cycles of 94°C for 30 sec, 58°C for IgH and Igκ or 60°C for Igλ for 30 sec, and 72°C for 1 min, followed by 72°C for 10 min. The positive 2nd round PCR products were cherry-picked for direct sequencing with both forward and reverse PCR primers. PCR products that gave a productive IgH, Igκ or Igλ rearranged sequence were reamplified from the 1st round PCR using custom primers containing unique restriction digest sites and subsequently cloned into the corresponding Igγ1, Igκ and Igλ expression vectors. The full-length IgG1 was expressed by co-transfection of 293F cells with equal amounts of the paired heavy and light chain plasmids, and purified using a recombinant protein-A column (GE Healthcare).
IgG gene family analysis. The IgG heavy and light chain nucleotide sequences of the variable region were analyzed with JoinSolver® (which can be found on the world wide web at joinsolver.niaid.nih.gov) and IMGT/V-Quest (which can be found on the world wide web at imgt.org/IMGT_vquest/share/textes/). The VRC mAb Vκ gene use was determined by homology to germline genes in the major 2p11.2 IGK locus. The VRC mAb D gene use was determined by homology to genes in the major 14q32.33 IGH locus. A combination of consecutive matching length with a +1/-2.02 scoring algorithm in the context of the V to J distance was applied for determining IGHD alignments and VD and DJ junctions in mutated sequences. Immunoglobulin rearrangements were grouped into classes based upon the VDJ gene use, similarity of replacement and silent mutations and the CDR3 identity.
ELISA analyses. Each antigen in PBS at 2 μg/ml was used to coat plates overnight at 4°C. Coated plates were blocked with B3T buffer (150 mM NaCl, 50 mM Tris-HCl, 1 mM EDTA, 3.3% fetal bovine serum, 2% bovine albumin, 0.07% Tween 20) for 1 hour at 37°C, followed by incubation with antibody serially diluted in B3T buffer for 1 hour at 37°C. Horseradish peroxidase (HRP)-conjugated goat anti-human IgG Fc antibody (Jackson ImmunoResearch Laboratories Inc., West Grove, PA) at 1:10,000 was added for 1 hour at 37°C. All volumes were 100 μl/well except that 200 μl/well was used for blocking. Plates were washed between each step with 0.1% Tween 20 in PBS. Plates were developed using 3,3',5,5'-tetramethylbenzidine (TMB) (Kirkegaard & Perry Laboratories) and read at 450 nm. For competitive ELISA analyses, plates were coated with 1 μg/ml of a sheep anti-gp120 C5 antibody, D7324 (Cliniqa Corp., Fallbrook, CA) or 10 μg/ml of Galanthus nivalis lectin (Sigma) to capture 2 μg/ml of purified YU2 gp120 or RSC3, respectively. After blocking, serial dilutions of the competitor antibodies or CD4-Ig were added to the captured gp120 or RSC3 in 50 μl of B3T buffer, followed by adding 50 μl of biotin-labeled antibody or CD4-Ig at fixed concentrations: 200 ng/ml of VRC-PG04 and 500 ng/ml of VRC-CH31 to bind to YU2 gp120 or RSC3, 150 ng/ml of CD4-Ig and 80 ng/ml of 17b to bind to YU2 gp120. The plates were incubated at 37°C for 1 hour, followed by incubation with 250 ng/ml of streptavidin-HRP (Sigma) at room temperature for 30 min, and developed with TMB as described above.
HIV-1 neutralization and protein competition assays. Neutralization was measured using single-round-of-infection HIV-1 Env-pseudoviruses and TZM-bl target cells. Neutralization curves were fit by nonlinear regression using a 5-parameter Hill slope equation. The 50% and 80% inhibitory concentrations (IC50 and IC80) were reported as the antibody concentrations required to inhibit infection by 50% and 80%, respectively. Competition of serum or mAb neutralization was assessed by adding a fixed concentration (25 μg/ml) of the RSC3 or ΔRSC3 glycoprotein to serial dilutions of antibody for 15 min prior to the addition of virus. The resulting IC50 values were compared to the control with mock PBS added. The neutralization blocking effect of the proteins was calculated as the percent reduction in the ID50 (50% inhibitory dilution) value of the serum in the presence of protein compared to PBS.
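The blocking-effect calculation described above can be sketched as follows. This is a minimal illustration only; the function name and the example ID50 values are hypothetical and are not taken from the disclosure.

```python
def percent_id50_reduction(id50_with_protein, id50_with_pbs):
    """Percent reduction in serum ID50 when competitor protein is present,
    relative to the mock (PBS) control. Names are illustrative assumptions."""
    if id50_with_pbs <= 0:
        raise ValueError("control ID50 must be positive")
    return 100.0 * (id50_with_pbs - id50_with_protein) / id50_with_pbs


# Hypothetical example: the ID50 falls from 1000 (PBS) to 250 (with protein).
reduction = percent_id50_reduction(250.0, 1000.0)
```

A value near 100 would indicate that the added protein blocks most of the serum neutralization; a value near 0 would indicate little competition.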
Construction of the HIV-1 envelope sequence phylogenetic trees. HIV-1 gp160 protein sequences of the 180 isolates used in the neutralization assays were aligned using MUSCLE (multiple sequence comparison by log-expectation). The protein distance matrix was calculated by "protdist" and the dendrogram was constructed using the neighbor-joining method by "Neighbor". All analyses were performed, and all programs were run, at the NIAID Biocluster (which can be found on the world wide web at niaid-biocluster.niaid.nih.gov/). The tree was displayed with Dendroscope.
Crystallization of the gp120:VRC-PG04 and gp120:VRC03 complexes. The same HIV-1 clade A/E 93TH057 ΔV123 gp120 that crystallized with VRC01 was used to form complexes with antibodies VRC03 and VRC-PG04 for crystallization trials. The gp120 was expressed, purified and deglycosylated. The antigen-binding fragments (Fabs) of VRC-PG04 and VRC03 were generated by Lys-C (Roche) digestion of IgG1. The gp120:VRC-PG04 or gp120:VRC03 complexes were formed by mixing deglycosylated 93TH057 gp120 and antibody Fabs (1:1.2 molar ratio) at room temperature and purified by size exclusion chromatography (HiLoad 26/60 Superdex S200 prep grade, GE Healthcare) with buffer containing 0.35 M NaCl, 2.5 mM Tris pH 7.0, 0.02% NaN3. Fractions with gp120:antibody complexes were concentrated to ~10 mg/ml, flash frozen with liquid nitrogen before storing at -80°C and used for crystallization screening experiments. Three commercially available screens, Hampton Crystal Screen (Hampton Research), Precipitant Synergy Screen (Emerald BioSystems), and Wizard Screen (Emerald BioSystems), were used for initial crystallization trials of the gp120:antibody complexes. Vapor-diffusion sitting drops were set up robotically by mixing 0.1 μl of protein with an equal volume of precipitant solutions (Honeybee, DigiLab). Droplets were allowed to equilibrate at 20°C and imaged at scheduled times with RockImager (Formulatrix). Robotic crystal hits were optimized manually using the hanging drop vapor-diffusion method. Crystals of diffraction quality for the gp120:VRC03 complex were obtained at 9% PEG 4000, 200 mM Li2SO4, 100 mM Tris/Cl-, pH 8.5. For the gp120:VRC-PG04 complex, the best crystals were grown in 9.9% PEG 4000, 9.0% isopropanol, 100 mM Li2SO4, 100 mM HEPES, pH 7.5.
X-ray data collection, structure determination and refinement for the gp120:VRC-PG04 and gp120:VRC03 complexes. Diffraction data of the gp120:VRC03 and gp120:VRC-PG04 crystals were collected under cryogenic conditions. Best cryo-protectant conditions were obtained by screening several commonly used cryo-protectants. X-ray diffraction data were collected at beam-line ID-22 (SER-CAT) at the Advanced Photon Source, Argonne National Laboratory, with 1.0000 Å radiation, processed and reduced with HKL2000. For the gp120:VRC-PG04 crystals, a 2.0 Å data set was collected using a cryoprotectant solution containing 18.0% PEG 4000, 10.0% isopropanol, 100 mM Li2SO4, 100 mM HEPES, pH 7.5, 12.5% glycerol and 7.5% 2R,3R-butanediol. For the gp120:VRC03 crystals, a 1.9 Å data set was collected using a cryoprotectant solution containing 15% PEG 4000, 200 mM Li2SO4, 100 mM Tris/Cl-, pH 8.5 and 30% ethylene glycol. The crystal structures of the gp120:VRC-PG04 and gp120:VRC03 complexes were solved by molecular replacement using Phaser in the CCP4 Program Suite. The gp120:VRC-PG04 crystal was in a P212121 space group with cell dimensions a=61.8, b=66.5, c=237.3, α=β=γ=90.0. The gp120:VRC03 crystal also belonged to space group P212121, with cell dimensions a=61.0, b=70.3, c=217.9, α=β=γ=90.0. Both crystals contained only one molecule per asymmetric unit (FIG. 29). The structure of 93TH057 gp120 in the previously solved VRC01 complex (PDB ID 3NGB) was used as an initial model to place gp120 in the complexes. With gp120 fixed in the search model, a variable domain of antibody Fab was then used to locate antibody VRC03 or VRC-PG04 in the complexes. Further refinements were carried out with PHENIX. Starting with torsion-angle simulated annealing with slow cooling, iterative manual model building was carried out in XtalView and COOT with maps generated from combinations of standard positional, individual B-factor and TLS refinement algorithms and non-crystallographic symmetry (NCS) restraints. Ordered solvents were added during each macro cycle. Throughout the refinement processes, a cross validation (Rfree) test set consisting of 5% of the data was used and hydrogens were included as a riding model. Structure validations were performed periodically during the model building/refinement process with MolProbity and pdb-care. X-ray crystallographic data and refinement statistics are summarized in FIG. 29.
Numbering of amino acid residues in antibody. The Kabat nomenclature was followed for amino acid sequences in antibodies.
Protein structure analysis and graphical representations. GRASP and APBS were used in calculations of molecular surfaces, volumes, and electrostatic potentials. PISA was used to perform protein-protein interface analysis. CCP4 was used for structural alignments. All graphical representations of protein crystal structures were made with PyMOL.
Analysis of structural convergence vs. binding interactions. To evaluate antibody structural convergence, the gp120 molecules from the three complex structures (with VRC01, VRC03, and VRC-PG04) were aligned. Residue correspondence in the three antibodies was determined based on the resulting structural alignment (rather than a sequence alignment). Residues in a given antibody that were not structurally aligned to residues in the other two antibodies were discarded from further analysis. For each of the three pairs of structures, Cα RMSD was computed for the six CDR regions, while Cα deviation was computed for each residue. Structural convergence for each CDR was then evaluated based on the average of the three pairwise Cα RMSDs for the given CDR. Structural convergence for the per-residue comparisons was evaluated based on the average of the three pairwise Cα deviation values for each residue. Residue numbering was based on the VRC-PG04 structure. Interface surface areas and hydrophobic interactions were computed using the PISA server. CDR interface surface areas for each antibody were computed as the sum of the interface surface areas of the corresponding residues. The average of the interface surface areas for each paratope residue was computed over the three structures. The average of the solvation energy values ΔiG for each paratope residue i (as obtained from the PISA Interface Residues Table) was also computed over the three structures. Residues with positive average PISA ΔiG were deemed to participate in hydrophobic interactions and were included in the correlation analysis against the respective per-residue Cα deviations.
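The per-CDR convergence measure described above (the average of the three pairwise Cα RMSDs) can be sketched as below. This is an illustrative sketch, not the code used in the study; it assumes that the Cα coordinates have already been placed in a common frame by the gp120 alignment and that residue correspondence has already been established, so that each structure contributes an equal-length list of (x, y, z) tuples. The function names are hypothetical.

```python
import math
from itertools import combinations


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.
    No superposition is performed here; inputs are assumed pre-aligned."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_pairwise_rmsd(structures):
    """Average C-alpha RMSD over all pairs of structures (three pairs for
    three antibodies). `structures` maps a label to a coordinate list."""
    pairs = list(combinations(structures.values(), 2))
    return sum(ca_rmsd(a, b) for a, b in pairs) / len(pairs)
```

With three antibodies, `mean_pairwise_rmsd` averages exactly three values, one per pair, matching the description; the same averaging applied to single-residue distances would give the per-residue measure.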
Analysis of neutralization breadth vs. targeting precision. The CD4-defined initial site of vulnerability included the following gp120 residues: 257, 279, 280, 281, 282, 283, 365, 366, 367, 368, 370, 371, 455, 456, 457, 458, 459, 460, 469, 472, 473, 474, 475, 476, 477. For each antibody, the interface surface areas on gp120 were determined using the PISA server. In each case, the interface surface area corresponding to the residues from the initial site of vulnerability was termed 'Inside', while the remaining interface surface area was termed 'Outside'. Targeting precision was defined as the function 'Inside - Outside'. The neutralization breadth of CD4-Ig and the different antibodies was determined using IC80 values for Tier 2 viruses, as obtained from: (VRC01, VRC03, b12, and CD4-Ig), (b13 and F105), and the present study (VRC-PG04).
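The 'Inside - Outside' definition can be illustrated with a short sketch. The residue list is the one given above; the function name, the input layout (a mapping from gp120 residue number to interface surface area) and the example areas are assumptions made for illustration.

```python
# gp120 residues of the CD4-defined initial site of vulnerability (from the text).
SITE_OF_VULNERABILITY = frozenset([
    257, 279, 280, 281, 282, 283, 365, 366, 367, 368, 370, 371,
    455, 456, 457, 458, 459, 460, 469, 472, 473, 474, 475, 476, 477,
])


def targeting_precision(interface_area_by_residue):
    """Return 'Inside - Outside' for one antibody.

    interface_area_by_residue: dict mapping gp120 residue number to its
    interface surface area (for example, as reported by PISA).
    """
    inside = sum(area for res, area in interface_area_by_residue.items()
                 if res in SITE_OF_VULNERABILITY)
    outside = sum(area for res, area in interface_area_by_residue.items()
                  if res not in SITE_OF_VULNERABILITY)
    return inside - outside


# Hypothetical areas: two residues inside the site, one outside.
example = targeting_precision({279: 50.0, 368: 30.0, 100: 20.0})
```

A larger positive value means that more of the antibody footprint falls within the site of vulnerability than outside it.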
Sample preparation for 454 pyrosequencing. Briefly, mRNA was extracted from 20 million PBMC into 200 μl of elution buffer (Oligotex kit, Qiagen), then concentrated to 10-30 μl by centrifuging the buffer through a 30 kD micron filter (Millipore). The reverse transcription was performed in one or multiple 35 μl reactions, each composed of 13 μl of mRNA, 3 μl of oligo(dT)12-18 at 0.5 μg/μl (Invitrogen), 7 μl of 5x first strand buffer (Invitrogen), 3 μl of RNase Out (Invitrogen), 3 μl of 0.1 M DTT (Invitrogen), 3 μl of dNTP mix, each at 10 mM, and 3 μl of SuperScript II (Invitrogen). The reactions were incubated at 42°C for 2 hours. The cDNAs from each sample were combined, cleaned up and eluted in 20 μl of elution buffer (NucleoSpin Extract II kit, Clontech). Therefore, 1 μl of the cDNA was equivalent to the transcripts from 1 million PBMC. The immunoglobulin gene-specific PCRs were set up using 5 μl of the cDNA as template (equivalent to the transcripts from 5 million PBMC), using the Platinum Taq DNA Polymerase High Fidelity system (Invitrogen) in a total volume of 50 μl. The reaction mix was composed of water, 5 μl of 10x buffer, 2 μl of dNTP mix, each at 10 mM, 2 μl of MgSO4, 1 μl of each primer at 25 μM, and 1 μl of Platinum Taq DNA polymerase high fidelity. The forward primers for VH1 gene amplification were 5'L-VH1, 5'ACAGGTGCCCACTCCCAGGTGCAG3' (SEQ ID NO: 2495); 5'L-VH1#2, 5'GCAGCCACAGGTGCCCACTCC3' (SEQ ID NO: 2496); 5'L-VH1-24, 5'CAGCAGCTACAGGCACCCACGC3' (SEQ ID NO: 2497); 5'L-VH1-69, 5'GGCAGCAGCTACAGGTGTCCAGTCC3' (SEQ ID NO: 2498); the reverse primers were 3'Cγ-CH1, 5'GGGGGAAGACCGATGGGCCCTTGGTGG3' (SEQ ID NO: 2499), and 3'Cμ-CH1, 5'GGGAATTCTCACAGGAGACGA3' (SEQ ID NO: 2500). The forward primer for VK3 amplification was 5'L-VK3, 5'CTCTTCCTCCTGCTACTCTGGCTCCCAG3' (SEQ ID NO: 2501); the reverse primer was 3'CK494, 5'GTGCTGTCCTTGCTGTCCTGCT3' (SEQ ID NO: 1617). The PCRs were initiated at 95°C for 2 min, followed by 25 cycles of 95°C for 30 sec, 58°C for 30 sec, and 72°C for 1 min, followed by 72°C for 10 min. The PCR products at the expected size (450-500 bp) were gel purified (Qiagen), followed by phenol/chloroform extraction.
454 library preparation. PCR products were quantified using Qubit (Life Technologies, Carlsbad, CA). Following end repair, 454 adapters were added by ligation. Library concentrations were determined using the KAPA Biosystems qPCR system (Woburn, MA) with 454 standards provided in the KAPA system.
454 pyrosequencing. 454 pyrosequencing of the PCR products was performed on a GS FLX sequencing instrument (Roche-454 Life Sciences, Branford, CT) using the manufacturer's suggested methods and reagents. Initial image collection was performed on the GS FLX instrument, and subsequent signal processing, quality filtering, and generation of nucleotide sequence and quality scores were performed on an off-instrument Linux cluster using 454 application software (version 2.5.3). The amplicon quality filtering parameters were adjusted based on the manufacturer's recommendations (Roche-454 Life Sciences
Application Brief No. 001-2010). Quality scores were assigned to each nucleotide using methodologies incorporated into the 454 application software to convert flowgram intensity values to Phred-based quality scores. The quality of each run was assessed by analysis of internal control sequences included in the 454 sequencing reagents. Reports were generated for each region of the PicoTiterPlate (PTP) for both the internal controls and the samples.
Bioinformatics analysis of 454-pyrosequencing-determined antibodyomes. A general bioinformatics pipeline has been developed to process and analyze 454-pyrosequencing-determined antibodyomes. The information generated in each step of the process (see Appendices 1-4) was used to characterize the basic features of antibodyomes as well as to identify potential neutralizing antibody sequences for functional validation. Specifically, each sequence read was (1) reformatted and labeled with a unique index number; (2) assigned to variable (V) gene family and allele using an in-house implementation of IgBLAST (ncbi.nlm.nih.gov/igblast/), and sequences with E-value > 10⁻³ were rejected; (3) compared with the germline V-gene and known VRC01-like antibodies using nucleotide sequences and a global alignment module implemented in ClustalW2, which provides the basis for identity/divergence-grid analysis; (4) subjected to a template-based error correction scheme where 454 homopolymer errors in the V gene were detected and corrected based on the alignment to germline sequence; (5) translated to amino acid sequence, which was further compared with known VRC01-like antibodies; (6) filtered using characteristic sequence motifs in the variable domain sequence such as QVQ (or other possible triplets) at the N-terminus, CAR (or other possible triplets) at the end of the V region, WGXG at the end of CDR H3, and VSS (or other possible triplets) at the C-terminus of the variable domain. As an optional step, the structural compatibility of a 454-pyrosequencing-derived heavy- or light-chain sequence with known VRC01-like antibody/gp120 complex structures can be evaluated by threading.
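Step (6) of the pipeline above, the motif filter, can be sketched as a single regular expression. This is a simplified illustration only: it checks just the four named motifs (QVQ, CAR, WGXG, VSS) and does not cover the "other possible triplets", which are not enumerated in the text. The function name and the toy sequence are hypothetical.

```python
import re

# QVQ at the N-terminus, CAR somewhere before a WGXG motif (X = any residue),
# and VSS at the C-terminus of the translated variable domain.
_MOTIF_PATTERN = re.compile(r"QVQ.+CAR.+WG.G.+VSS")


def passes_motif_filter(aa_sequence):
    """Return True if the amino acid sequence carries the four motifs in order."""
    return _MOTIF_PATTERN.fullmatch(aa_sequence) is not None


# Toy sequence assembled only to exercise the filter (not a real antibody).
toy = "QVQLVQSGAEVKKPGASVKVSCKAS" + "YYCAR" + "DGYSSGWYFDY" + "WGQGTLVTVSS"
keep = passes_motif_filter(toy)
```

In a fuller version, each fixed triplet would be replaced by an alternation of the accepted variants once those are defined.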
Cross-donor phylogenetic analysis of antibodyomes. Phylogenetic tools were used to interrogate donor antibodyomes. Specifically, "cross-donor phylogenetic" analysis was performed with maximum likelihood phylogenetic algorithms on a set of "representative" nucleotide sequences encompassing the heavy-chain variable domain from donors 45 and 74. Sequence selection involved dividing all of the IGHV1-2*02-originated sequences from each donor into 50 bins based on the sequence divergence from germline. The number 50 was arbitrary but sufficient to provide a set of sequences with different divergence suitable for building the phylogenetic trees shown in Fig. 5A. The first bin contained sequences with 0.0 to 0.7% divergence from the IGHV1-2*02 germline gene and each subsequent bin contained sequences with an increment of 0.7% germline divergence. The 50 bins covered sequences with up to 35% germline divergence, which is approximately the highest germline divergence seen in these donors. One sequence was randomly selected from each bin to represent sequences within that bin. As a result, a total of 38 sequences were selected from donor 45, as 12 bins did not contain any sequences, and 50 from donor 74. The nucleotide sequences of heavy-chain variable domains of known neutralizing mAbs VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32 and the inferred reverted unmutated ancestors of VRC01 (VRC01 H germline) and VRC-PG04 (VRC-PG04 H germline) were added to each of the two data sets. The sequences within each set were aligned using ClustalW, and the multiple sequence alignment was provided as input to construct phylogenetic trees using DNAMLK (for DNA Maximum Likelihood program with Molecular Clock) (cmgm.stanford.edu/phylip/dnamlk.html) in the PHYLIP package v3.69 (evolution.genetics.washington.edu/phylip.html). The calculations were done with default parameters (empirical base frequencies, the transitions to transversions ratio of 2.0, and the overall base substitution model as A 0.24, C 0.28, G 0.27, T 0.21). The output unrooted trees were visualized using Dendroscope, then ordered to ladderize right and rooted at VRC01 H germline and VRC-PG04 H germline for donors 45 and 74, respectively. The ordered and rooted trees are shown in Fig. 5A. We observed within each tree that the known neutralizing mAb sequences appeared in a branch or a subtree with a number of highly mutated (divergent from germline) IGHV1-2*02 sequences. To evaluate the branching significance of the known neutralizing mAbs, 1,000 bootstrapped data sets were generated using the PHYLIP SEQBOOT program. The majority-rule consensus tree was calculated using the PHYLIP CONSENSE program. Bootstrap values of key intermediate states shown in Fig. 5 (Id45, IId45, IIId45 for donor 45 and Id74, IId74 for donor 74) were extracted from the CONSENSE output. After sequence error correction using the VRC03 and VRC-PG04 sequences as templates, the VH gene sequences of the key intermediates were inferred using the option of "reconstruct hypothetical sequences" in the DNAMLK program, followed by manual correction of problematic nucleotides causing invalid or stop codons. The IGHV1-2*02 germline sequence was used as template in the manual correction. The DNAMLK program was also applied to 63 heavy-chain nucleotide sequences selected for expression from donor 74. The maximum likelihood phylogenetic analysis was also carried out on the amino acid sequences using PROMLK (for Protein Maximum Likelihood program with Molecular Clock) (evolution.genetics.washington.edu/phylip/doc/promlk.html) with default parameters. The topology of the maximum likelihood trees generated by DNAMLK and PROMLK appeared to be similar.
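The selection of representative sequences by divergence bin, as described above, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: bins are treated as half-open intervals of width 0.7%, any value at or above the upper limit is placed in the last bin, and the function names and input layout (sequence identifier mapped to percent germline divergence) are hypothetical.

```python
import random
from collections import defaultdict

BIN_WIDTH = 0.7   # percent germline divergence per bin (from the text)
N_BINS = 50       # covers divergence up to 35% (from the text)


def bin_index(divergence_pct):
    """Zero-based bin for a divergence value; bin 0 covers 0.0 to 0.7%."""
    if divergence_pct < 0:
        raise ValueError("divergence must be non-negative")
    # Assumption: values beyond the last boundary are clamped into the last bin.
    return min(int(divergence_pct // BIN_WIDTH), N_BINS - 1)


def pick_representatives(divergence_by_id, seed=0):
    """Randomly pick one sequence identifier from each non-empty bin."""
    bins = defaultdict(list)
    for seq_id, divergence in divergence_by_id.items():
        bins[bin_index(divergence)].append(seq_id)
    rng = random.Random(seed)  # seeded so a selection can be reproduced
    return {b: rng.choice(sorted(ids)) for b, ids in sorted(bins.items())}
```

Empty bins simply yield no representative, which is consistent with donor 45 contributing 38 rather than 50 sequences.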
To analyze the full dataset of IGHV1-2*02-originated sequences, an iterative procedure based on the neighbor-joining (NJ) method was used to infer their phylogenetic relatedness with known neutralizing mAb sequences. Briefly, the full-length sequences of IGHV1-2*02 origin were divided into subsets of no more than 5,000 sequences each, and an NJ tree was constructed for each subset using the "Phylogenetic trees" option in ClustalW2. The nucleotide distance was calculated as percent divergence between all pairs of sequences from the multiple alignment. The NJ tree was rooted at the inferred reverted unmutated ancestor of VRC01 or VRC-PG04 for donors 45 and 74, respectively. The sequences clustered in a branch containing neutralizing mAbs VRC01, VRC02, VRC03 and VRC-PG04 were extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis. The procedure was repeated until convergence, where all the sequences resided within a subtree containing VRC01, VRC02, VRC03 and VRC-PG04 and no other sequences resided between this subtree and the root, and where further repeat of the analysis did not change the NJ tree. After this analysis, 109 sequences from donor 45 and 5,047 sequences from donor 74 remained. Among them, 45 from donor 45 and 1,889 from donor 74 were unique sequences as identified using the "blastclust" module in the NCBI BLAST package. These numbers are reported in Fig. 5A, and the last NJ tree with 5,047 sequences remaining from donor 74.
Analysis of CDR H3 lineage. Due to the sequence variation, we adopted a template-based approach to CDR H3 identification for 454-pyrosequencing-determined heavy chain sequences. Specifically, a 454-derived heavy chain sequence was aligned to the VRC01 heavy chain sequence using ClustalW2; then the nucleotide sequences of two motifs that define the CDR H3 region - CTR and WGQG - were used as "anchors" to locate the CDR H3 region in the 454-derived heavy chain sequence. For sequences with long CDR H3s, gap insertion may occur in the two motif regions and cause ambiguities in CDR H3 identification, which were dealt with by allowing a gap of at most 10 nucleotides between two nucleotides that are adjacent in the template sequence. Using this template-based approach, the CDR H3 sequence and length were calculated for all full-length sequences in the IGHV1-2*02 family. In the CDR H3 lineage analysis, the 35 expressed and experimentally tested heavy-chain sequences shown in Fig. 6 were divided into 9 CDR H3 groups, allowing no more than a 5-nucleotide difference between members within the group. For each lineage, the characteristic CDR H3 sequences were used to search for other sequences with the same CDR H3s from the IGHV1-2*02 family. The number of sequences in each CDR H3 lineage group is listed in Fig. 6.
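The grouping of CDR H3 sequences with at most a 5-nucleotide difference between members can be sketched as a simple greedy procedure. The text does not state which clustering method was used, so this is an assumption: a sequence joins the first existing group in which it is within the threshold of every member, and CDR H3s of different lengths are never grouped together. Function names and the toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_cdrh3(sequences, max_diff=5):
    """Greedy grouping: every pair within a group differs by <= max_diff nt.

    Assumption: sequences of different length are placed in different groups.
    """
    groups = []
    for seq in sequences:
        for group in groups:
            if all(len(seq) == len(m) and hamming(seq, m) <= max_diff
                   for m in group):
                group.append(seq)
                break
        else:
            groups.append([seq])  # no compatible group found; start a new one
    return groups


# Toy nucleotide strings (not real CDR H3s): two cluster together, one does not.
toy_groups = group_cdrh3(["AAAAAAAAAA", "AAAAAAAATT", "TTTTTTTAAA"])
```

The result of a greedy pass depends on input order, so a production analysis would likely use a defined clustering method instead.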
Analysis of J chain. The 109 VRC03-like and 5,047 VRC-PG04-like heavy-chain sequences identified using iterative phylogenetic analysis were submitted to the SoDA2 server (which can be found on the world wide web at hippocrates.duhs.duke.edu/soda/GetInput.aspx) for assignment of variable (V), diversity (D), and joining (J) germline genes and junction analysis. For 14 VRC03-like sequences with non-IGHJ1*01 assignment and 66 VRC-PG04-like sequences with non-IGHJ2*01 assignment, the J segment was manually aligned to IGHJ1*01 or IGHJ2*01 for comparison.
Statistical analysis. Statistical analyses were performed using GraphPad Prism version 5.0 (GraphPad Software Inc.).
EXAMPLE 2
All-origin cross-donor analysis for heavy chains of VRC01-like antibodies
This example illustrates the cross-donor phylogenetic analysis termed "all-origin cross-donor phylogenetic analysis" and compares it with the IGHV1-2 cross-donor phylogenetic analysis discussed in Example 1. Whereas the IGHV1-2 cross-donor analysis uses an initial input population of test sequences containing only heavy chain nucleotide sequences having an IGHV1-2 germline origin, the all-origin cross-donor analysis uses heavy chain nucleotide sequences from the IGHV1-2 germline and other germlines (up to all other germlines) as the input test sequences (see FIG. 40). Using all-origin cross-donor phylogenetic analysis, ~99% of the nucleic acid sequences encoding VRC01-like heavy chains in an antibodyome can be identified.
All origin cross-donor analysis procedure outline
Samples were obtained and prepared for 454 pyrosequencing, and 454 sequencing conducted as described in Example 1 to generate a data set of sequences for the all origin cross donor analysis. From this data set, the following procedure was followed for the all-origin cross donor analysis (Fig. 40):
1. Extract nucleotide sequence reads from the 454 sequencing (test sequences) with length from 300 to 600 bp (both inclusive [300, 600]).
2. Split extracted test sequence reads with length from 300 to 600 bp into smaller files with a maximum of 3,000 test sequence reads per file. Add the germline gene IGHV1-2*02 sequence and selected nucleotide sequences of the heavy chains of reference antibodies (VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04B, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, 8ANC134) into each smaller file.
3. Build a non-rooted neighbor-joining tree for each smaller file via ClustalW2 on a high-performance computing (HPC) cluster. The distance metric for the neighbor-joining tree was calculated as the percent divergence between all pairs of sequences.
4. Automatically root the resultant neighbor-joining tree on the IGHV1-2*02 germline sequence.
5. Parse the rooted tree and get the smallest subtree that contains all the heavy chain sequences of the reference antibodies (this smallest subtree is termed the native tree or reference tree).
6. Extract all the test sequences (cross-donor positive heavy chain sequences) that segregate within the native/reference tree.
7. Run steps 2 to 6 iteratively until convergence, which occurs when the cross-donor positive sequences in the current iteration make up more than 95% of the input sequences of that iteration.
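The length filter of step 1 and the convergence rule of step 7 can be sketched in Python as follows. This is only an outline under stated assumptions: the `select` callable is a hypothetical stand-in for steps 2 to 6 (file splitting, ClustalW2 tree building, rooting and subtree extraction), and the `max_iter` safety cap is not part of the described procedure.

```python
from typing import Callable, List, Tuple


def filter_by_length(reads: List[str], min_len: int = 300,
                     max_len: int = 600) -> List[str]:
    """Step 1: keep reads whose length lies in [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def run_until_convergence(reads: List[str],
                          select: Callable[[List[str]], List[str]],
                          threshold: float = 0.95,
                          max_iter: int = 50) -> Tuple[List[str], List[int]]:
    """Step 7: repeat the selection until more than `threshold` of an
    iteration's input is returned as cross-donor positive.

    `select` is a hypothetical placeholder for steps 2 to 6.
    Returns the final reads and the read count after each iteration.
    """
    current = list(reads)
    history = [len(current)]
    for _ in range(max_iter):  # safety cap, an assumption of this sketch
        if not current:
            break
        positives = select(current)
        history.append(len(positives))
        converged = len(positives) / len(current) > threshold
        current = positives
        if converged:
            break
    return current, history
```

The returned history makes it easy to tabulate how many reads survive each iteration.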
Detailed method of native/reference tree selection and test sequence extraction:
After rooting the tree at IGHV1-2*02, each node can have a series of parent nodes (interim/artificial nodes) at different depths. Ultimately, IGHV1-2*02 is the shallowest parent node and the common ancestor for all the leaves of the phylogenetic tree (which include the reference/native heavy chain antibody sequences and the test sequence reads). The leaf nodes (and corresponding leaves) are the deepest nodes (and leaves).
Each node (interim or leaf) contains a bifurcating tree with a varying number of leaves. IGHV1-2*02, the root, has a subtree of all input leaf nodes (antibody 454 sequences), including native/reference sequences and test sequence reads. Any leaf node represents a subtree which contains only itself. The deeper the node is, the smaller its subtree is and the fewer leaf nodes it contains.
Therefore, to identify the reference/native tree, the deepest node (and hence the smallest subtree) whose subtree contains all selected native/reference antibody heavy chain sequences is determined and selected. Then all the test sequences corresponding to leaf nodes within the reference/native subtree are extracted. This set of test sequence reads is the smallest subset of test sequence reads that segregates with the native/reference antibody sequences.
There are two extreme situations. If the smallest subtree contains all the reference/native sequences and none of the test sequence reads, then no test sequence reads will be selected for further analysis. On the other hand, if the smallest subtree is the entire tree (all native/reference antibody sequences and all the test sequence reads), then all the test sequence reads will be selected for the next iteration of the analysis.
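The selection of the smallest subtree holding every reference sequence can be sketched in Python as below. The sketch assumes the rooted tree has already been parsed into nested tuples whose leaves are sequence names; that representation, the function names and the toy read names are assumptions for illustration, not the format produced by ClustalW2.

```python
def leaves(node):
    """Return the set of leaf names under a node (a str leaf or a tuple)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_reference_subtree(tree, refs):
    """Descend from the root while a single child still holds all refs."""
    refs = set(refs)
    if not refs <= leaves(tree):
        raise ValueError("every reference must be a leaf of the tree")
    node = tree
    while not isinstance(node, str):
        for child in node:
            if refs <= leaves(child):
                node = child  # a deeper node still contains all references
                break
        else:
            return node  # references are split across children: stop here
    return node


def cross_donor_positives(tree, refs, germline="IGHV1-2*02"):
    """Test reads that segregate within the native/reference subtree."""
    subtree = smallest_reference_subtree(tree, refs)
    return leaves(subtree) - set(refs) - {germline}
```

Both extreme situations fall out of the same function: an empty set when the reference subtree holds no test reads, and every test read when the reference subtree is the whole tree.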
For each iteration of the cross-donor analysis, the test sequence reads selected from the previous iteration as segregating with the native/reference tree are pooled and their order randomized before they are divided into smaller files of no more than 3,000 sequence reads each, so that the test sequence reads selected from the previous iteration have a lower chance of being analyzed with the same test sequence reads selected from the same smaller file in the previous iteration of the analysis. This randomization procedure ensures that almost every test sequence read is evaluated with the other test sequence reads.
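The pooling, randomization and re-splitting step can be sketched in a few lines of Python. The function name and the optional `seed` argument are assumptions added so the example is reproducible; they are not part of the described procedure.

```python
import random
from typing import List, Optional


def shuffle_and_split(reads: List[str], max_per_file: int = 3000,
                      seed: Optional[int] = None) -> List[List[str]]:
    """Pool the reads, randomize their order, and split them into groups
    of at most `max_per_file` reads (one group per smaller file)."""
    rng = random.Random(seed)
    pooled = list(reads)  # copy so the caller's list is left untouched
    rng.shuffle(pooled)
    return [pooled[i:i + max_per_file]
            for i in range(0, len(pooled), max_per_file)]
```

The reference antibody sequences and the germline sequence would then be added to each group before tree building.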
The procedure for IGHV1-2-origin-only cross-donor phylogenetic analysis runs substantially the same as the all-origin cross-donor phylogenetic analysis procedure. However, the input sequences (test sequences) for the IGHV1-2-origin-only cross-donor analysis are only the antibody heavy chain variable domain encoding sequences assigned to IGHV1-2 germline V genes using IMGT/HighV-QUEST. Conversely, all-origin cross-donor phylogenetic analysis uses antibody heavy chain variable domain encoding sequences that were assigned to any origin by IgBLAST.
Overview of All-origin V-gene-only cross-donor phylogenetic analysis
As the CXR/K peptide sequence is the signature end of the V genes of the VRC01-like antibodies, all possible nucleotide fragments encoding the "CXR/K" peptide were assayed. The sequence fragment from the beginning of the sequence to the end of the CXR/K motif, which corresponds to the V gene of the antibody, was extracted.
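A minimal Python sketch of the CXR/K search is given here. It assumes the search is done directly on the forward-strand nucleotide read in any reading frame, using the standard genetic code (Cys: TGT/TGC; Arg: CGN, AGA, AGG; Lys: AAA, AAG), and it returns one candidate V-gene fragment per motif occurrence. The function name and the any-frame search are assumptions for illustration.

```python
import re
from typing import List

# Lookahead so that overlapping occurrences in any frame are all reported.
# Cys codon, any codon (X), then an Arg or Lys codon.
CXRK = re.compile(r"(?=(TG[TC][ACGT]{3}(?:CG[ACGT]|AG[AG]|AA[AG])))")


def candidate_v_fragments(read: str) -> List[str]:
    """Return every fragment running from the start of the read to the
    end of a nucleotide stretch that encodes C-X-R or C-X-K."""
    read = read.upper()
    # The motif spans three codons, i.e. 9 nucleotides from its start.
    return [read[:m.start(1) + 9] for m in CXRK.finditer(read)]
```

Reads without any CXR/K-encoding stretch yield an empty list and would contribute no V-gene fragment.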
1. All V genes of known VRC01-like antibodies (VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04B, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, 8ANC134) were extracted and termed known VRC01-like V genes.
2. All V genes extracted from the test sequences as described above were split into smaller files, each of which contains a maximum of 3,000 V genes.
3. Germline V gene IGHV1-2*02 and known VRC01-like antibody V genes (VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04B, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131, 8ANC134) were exogenously added to each of the small files obtained in step 2.
4. Neighbor-joining based phylogenetic trees were built via ClustalW2 on a High Performance Computing (HPC) cluster as described above.
5. The phylogenetic trees obtained in step 4 were automatically rooted on germline IGHV1-2*02.
6. Parse the rooted tree and get the smallest subtree that contains the known VRC01-like antibody V genes (this smallest subtree is termed the native tree or reference tree).
7. Run steps 2 to 6 iteratively until convergence, which occurs when the cross-donor positive sequences in the current iteration make up more than 95% of the input sequences of that iteration.
FIG. 41 illustrates the number of cross-donor positive sequence reads selected in each iteration of the all-origin cross-donor analysis, starting with 277,512 test sequences. The final set of test sequence reads consists of 37,562 cross-donor positive sequences. These 37,562 contain 7,131 of the original 7,135 VRC01-like Abs.
Comparison of cross-donor analysis
Several different analyses were run to assess the cross-donor phylogenetic analysis, and also to compare the all-origin cross-donor analysis with the IGHV1-2 origin cross-donor analysis described in Example 1.
First, sequence identity to VRC01-like heavy chain sequences and divergence from the respective germline sequence were used to identify the high divergence/high sequence identity sequences within the 454 test sequence dataset. FIG. 42 shows a series of heatmaps illustrating the results of this analysis (showing sequence identity to VRC01, VRC03, VRC-PG04, VRC-CH31, and VRC06 heavy chains), including the presence of "islands" within the heatmap that represent high divergence/high sequence identity test sequences. Using cut-offs of 80% sequence identity and 25% divergence, a total of 5,394 antibody heavy chains were identified in the analysis.
Next, several variations of the cross-donor analysis were performed: (1) all antibody heavy chain sequences from the patient 45 2008 G1 dataset with a first run of cross-donor analysis (All); (2) all antibodies from the patient 45 2008 G1 dataset with a first run of V-gene-only cross-donor analysis (All Vonly); (3) IGHV1-2*02 origin antibodies (IMGT assignment based) from the patient 45 2008 G1 dataset with a first run of cross-donor analysis (1-2); (4) IGHV1-2*02 origin antibodies (IMGT/HighV-QUEST assignment based) from the patient 45 2008 G1 dataset with a first run of V-gene-only cross-donor analysis (1-2 Vonly). V-gene-only cross-donor analysis involves the same cross-donor procedure as the others except that only the V gene regions of the 454-determined heavy chain sequences, roughly 300 bp each, are used as input sequences for multiple alignment and subsequent phylogenetic analysis.
Results for the first iteration of the all-origin and IGHV1-2 cross-donor analyses, with and without rooting, are illustrated in FIG. 43A and B. Venn diagrams further illustrating these results are shown in FIG. 44 and FIG. 45. FIG. 46 illustrates the percent of antibody heavy chain nucleic acids that segregated into the native/reference tree during the first iteration of the above analyses.
These results indicate that rooting does affect the final result of antibody selection; that the effect of rooting is greater for the all-origin cross-donor analysis; that when IGHV1-2 origin is not selected, the rooting procedure tends to select more test sequences than the non-rooting procedure; and that when IGHV1-2 origin is selected, the rooting procedure tends to select fewer test sequences than the non-rooting procedure. Whether rooting changes the result of antibody selection depends on whether it changes the topology of the subtree that contains all native/reference heavy chain sequences (discussed with reference to FIGs. 47 and 48, below).
As shown in FIG. 47, when all native/reference heavy chain antibody nucleotide sequences segregate in a subtree that does not contain the germline sequence (IGHV1-2*02) and one of them (VRC-CH32) serves as the outgroup node for the rest of the native/reference sequences, rooting does not change the structure of the native tree. Therefore, the selected sequences from both the rooted and unrooted procedures are the same.
As shown in FIG. 48, when one or a group of native/reference heavy chain antibody nucleotide sequences can serve as the outgroup node in the native/reference subtree (VRC-CHxx in this case) and germline IGHV1-2*02 is one of the internal nodes, rooting the tree on IGHV1-2*02 changes the number of selected sequences in the cross-donor phylogenetic analysis.
Comparison after 12 iterations of cross donor analysis
From the sequences obtained for the patient 45 (2008) time point using the G1 primer 454 sequencing dataset, two datasets were produced: (1) all antibody heavy chain nucleotide sequences that were mapped to germline V genes with high coverage (for use with the all-origin cross-donor analysis), and (2) all antibody heavy chain nucleotide sequences having IGHV1-2 origin as defined by IMGT/HighV-QUEST (for use with the IGHV1-2 cross-donor analysis).
For each of the two data sets, two variations of the cross-donor analysis were run on a high performance computing cluster: (1) normal cross-donor analysis, and (2) V-gene-only cross-donor analysis. The resultant neighbor-joining trees from ClustalW2 were rooted on the IGHV1-2*02 sequence, which was added to the population of test sequences in each iteration of the cross-donor analysis.
During iterations of runs, numbers of test sequences that segregated with native/reference sequences were recorded and the percent of segregations were calculated in two different ways: (1) percent of segregation in regard to the initial number of input antibodies (FIG. 49 A and 49B); and (2) percent of segregation in regard to the current number of input antibodies (FIG 50A and 50B).
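The two ways of expressing the percent of segregation can be written out in a short Python sketch. The input is a list of read counts whose first entry is the initial input and whose later entries are the reads retained after each iteration; the function name and the counts used in any example are hypothetical and are not data from the figures.

```python
from typing import List, Tuple


def segregation_percentages(counts: List[int]) -> Tuple[List[float], List[float]]:
    """counts[0] is the initial input; counts[i] is the number of reads
    that segregated with the native/reference sequences in iteration i.

    Returns (percent relative to the initial input,
             percent relative to the input of the same iteration).
    """
    if len(counts) < 2 or min(counts[:-1]) <= 0:
        raise ValueError("need an initial count and positive inputs")
    initial = counts[0]
    to_initial = [100.0 * n / initial for n in counts[1:]]
    # The input of iteration i is the output of iteration i - 1.
    to_current = [100.0 * n / prev for prev, n in zip(counts, counts[1:])]
    return to_initial, to_current
```

With a hypothetical series such as 1000, 400, 300, 297, the first measure falls across iterations while the second rises toward 100%, which is the pattern described for FIGs. 49 and 50.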
As illustrated in FIGs. 49 and 50, the percent (to the initial input, FIG. 49) of sequences segregated with native/reference sequences drops dramatically during the first few iterations of the cross-donor analyses, and eventually reaches a point at which very few or no antibodies are excluded from the native/reference tree during the cross-donor analyses. As illustrated in FIG. 50, the percent (to the current input) of sequences that segregate with the native/reference sequences in the native/reference tree increases as more iterations of the cross-donor analysis are performed, and eventually reaches close to 100%. The final four result groups contained 27,310 unique sequences in total; of these, 7,837 (28.70%) were found in every single result.
Thus, the percent of segregated sequences drops greatly during the first few iterations of the cross-donor analysis. After the 8th iteration, the percent of segregated sequences does not drop as much and begins to flatten out. Similar results were observed concerning the percent (to the initial input) of sequences segregated with native/reference sequences. For the all-origin cross-donor analysis, the curves are a bit more zigzagged, which indicates that the distribution of antibodies in each single file is also important. However, as the iterations proceed and the input Ab number drops, the fluctuations reflect only a smaller number of antibodies.
The overlap of sequence results identified using the all-origin and IGHV1-2 origin cross-donor analyses, with or without the V-gene-only procedure, was compared. FIGs. 51 and 52 illustrate the number of sequences identified by the indicated cross-donor analysis methods after 12 iterations of cross-donor analysis. Significant overlap was observed between the results from the cross-donor and V-gene-only cross-donor analyses, and between the all-origin and IGHV1-2 origin cross-donor analyses.
Also, the percent of high identity sequences identified using the all-origin and IGHV1-2 origin cross-donor analyses, with or without the V-gene-only procedure, was compared (FIG. 53). The sequences in the initial dataset used for the indicated cross-donor analyses were filtered for high sequence identity to VRC01 and VRC03 and divergence from the germline sequence of 20% to 40%. FIG. 53A indicates the number and percent of sequences that fit these high identity criteria (#Hi) for the indicated cross-donor analyses. FIG. 53B shows the corresponding heat map results. Based on these results, the all-origin cross-donor analysis identified almost all (>99%) VRC01-like antibodies from the input dataset.
Germ line gene assignment
The J gene starts from an amino acid motif, the WGXG peptide sequence ("X" is any residue), on the antibody sequence. The last WGXG in each antibody and its start position were recorded. Of the total of 27,310 identified antibodies, 17,188 (62.94%) have WGXG in their sequences. FIG. 54 shows a graph indicating the percent of these 17,188 antibodies as a function of WGXG start position.
The majority of the identified antibodies have WGXG starting from ~300 bp to ~400 bp. This is reasonable since the heavy chain V-gene length is ~300 bp. WGXG motifs found before the 300 bp position are likely not real CDR H3 termination signals. Thus, if an antibody has WGXG and the start of WGXG is in the range of 300 to 400 bp of the encoding sequence, the sequence of the antibody was extracted from the WGXG start position to the end. For all other antibodies, the sequence was extracted from position 300 to the 3' end.
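The extraction rule for the J-gene region can be sketched in Python as follows. The sketch assumes a nucleotide-level search for a stretch encoding W-G-X-G (Trp: TGG; Gly: GGN) in any forward reading frame, and 0-based positions with both bounds inclusive; the indexing convention, the any-frame search and the function name are assumptions made for illustration.

```python
import re

# Trp codon, Gly codon, any codon (X), Gly codon; lookahead finds overlaps.
WGXG = re.compile(r"(?=(TGGGG[ACGT][ACGT]{3}GG[ACGT]))")


def extract_j_region(read: str, lo: int = 300, hi: int = 400) -> str:
    """Return the part of the read used for J-gene assignment.

    If the last WGXG-encoding stretch starts within [lo, hi], return the
    read from that start to the 3' end; otherwise return read[lo:].
    """
    read = read.upper()
    starts = [m.start(1) for m in WGXG.finditer(read)]
    if starts and lo <= starts[-1] <= hi:
        return read[starts[-1]:]
    return read[lo:]
```

Reads shorter than the lower bound simply give an empty string under the fallback rule.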
FIG. 55 illustrates the heavy chain V-gene family assignment for identified heavy chain sequences using various computer programs to assign V-gene status. FIG. 56 illustrates the heavy chain J-gene family assignment for identified heavy chain sequences using various computer programs to assign J-gene status. FIG. 57 shows a table illustrating the number and percent of assignments that were correct using the various computer programs. FIG. 58 shows the percentage of high-identity heavy chain sequences having the indicated V-gene family status identified using the indicated computer programs to assign V-gene status. FIG. 59 shows the percentage of high-identity heavy chain sequences having the indicated V-gene family status identified using the indicated computer programs to assign V-gene status.
EXAMPLE 3
Functional complementation of VRC01-like antibody heavy and light chains
This example illustrates functional complementation of VRC01-like antibody heavy and light chains, shown by pairing VRC01-like heavy chains with VRC01-like light chains to form monoclonal antibodies capable of HIV-1 neutralization.
Heavy and light chain chimeras including various heavy and light chains of VRC01-like and other antibodies were produced as described in Example 1. The heavy and light chains included in the chimeras are indicated in FIGs. 60A-60D.
"01L," "19L" and "20L" refer to VRC01, VRC19 and VRC20 light chains, respectively. "H" refers to heavy chain. VRC01-like antibody heavy and light chains used in the complementation assays include the heavy and/or light chains of VRC01, VRC17, VRC18, VRC-PG19, VRC-PG20, 12A12, 12A21, 3BNC60, 3BNC117, NIH45-46. Non-VRC01-like antibody heavy and light chains used in the complementation assays include the heavy and/or light chains of VRC13, VRC14, VRC15, VRC16, 1B2530, 1NC9, 8ANC131, 8ANC134. The heavy and light chain sequences of these antibodies are disclosed herein.
Thus, swapping heavy and light chains between different members of the VRC01-like class can be accomplished without loss of the VRC01-like functional characteristic, that is, HIV-1 neutralization. Conversely, non-VRC01-like heavy and light chains fail to functionally complement.
EXAMPLE 4
Deep sequencing and bioinformatics-based identification of broadly neutralizing HIV-1 antibodies
This example illustrates the use of the bioinformatic cross-donor phylogenetic analysis disclosed herein coupled with functional complementation to identify and confirm HIV-1 broadly neutralizing antibodies.
VRC01-like broadly neutralizing antibodies (bNAbs) precisely target the CD4-binding site of the HIV-1 surface glycoprotein gp120, defining a promising vaccine design target. Cross-donor phylogenetic analysis (CDPA), a method disclosed herein that permits identification of VRC01-like antibody heavy chain sequences through bioinformatic analysis of antibody deep sequencing data from infected donors, was used to identify VRC01-like heavy chains. CDPA selects antibody sequences based on maturation similarities to a small number of known "native/reference" neutralizing antibody sequences, such as VRC01. CDPA of sequences from an HIV-infected donor with previously uncharacterized antibody status revealed 13 VH sequences that segregated with the VRC01-like phylogenetic subtree. When paired with light chains from VRC01-like antibodies, 11 out of 13 reconstituted antibodies showed neutralization breadth and potency on par with VRC01. Thus, CDPA can identify VRC01-like VH sequences in an infected individual based only on bioinformatic analysis of deep sequences.
The isolation of broadly neutralizing antibodies (bNAbs) has generally relied on moderate-throughput screening procedures utilizing either micro-neutralization assays or antigen-specific B-cell sorting followed by cloning and sequencing of the expressed antibody transcripts. Despite marked success, these methods have typically yielded only a few antibodies per donor sample. Advances in next-generation sequencing (NGS) technology now make it possible to determine millions of antibody sequences from a sample of donor B cells, thus providing an overall view of the repertoire of antibodies expressed by the donor. Despite the availability of such data, using it to isolate specific antibodies presents significant challenges.
The CDPA method was used to isolate antibody heavy chains related to VRC01, a bNAb that neutralizes around 90% of HIV-1 strains, from an infected donor with previously uncharacterized antibody status. HIV-1 has evolved a variety of mechanisms to escape antibody recognition, and it remains unclear how to elicit bNAbs by vaccination that can successfully overcome the genetic and conformational diversity of the viral envelope (Env). Nevertheless, after persistent HIV-1 infection, 10-30% of individuals develop broadly neutralizing sera and protective neutralizing antibodies. VRC01-like antibodies, which achieve broad neutralization by precisely targeting the binding site for the human CD4 receptor on the viral surface glycoprotein gp120, are responsible for the broad neutralizing activity in many such infected individuals. VRC01-like antibodies originate from B cells in which the VH1-2 gene has recombined with various D and J segments. Effective VRC01-like antibodies are among the most highly matured of all antibodies yet characterized, with variable-region mutation rates of about 30%. Sequence identity-based bioinformatics techniques fail to identify most VRC01-related antibodies in these patients. In CDPA, known VRC01-like sequences were added "exogenously" to deep sequencing data sets which were then subjected to phylogenetic analysis. VRC01-like sequences segregated in VH1-2 germline-rooted phylogenetic trees with the previously known VRC01-like antibodies from the same patient, and also with exogenously added VRC01-like sequences from other patients. New VRC01-like sequences are identified by their position, interposed between the branches of the phylogenetic tree containing known VRC01-like antibody sequences. This example illustrates use of the CDPA method to identify VRC01-like antibody sequences in cross-donor phylogenetic trees in samples from donors with entirely uncharacterized antibody status.
A 2008 serum sample from donor 200-384 was assayed. The resurfaced stabilized core 3 (RSC3) probe and a mutant version of RSC3, which contains a single amino acid deletion in the CD4-binding loop (ΔRSC3), were used to interrogate the serum for gp120 CD4bs antibodies. A substantial fraction of neutralization was specifically blocked by RSC3 compared with ΔRSC3, indicating the presence of CD4bs-directed neutralizing antibodies (FIG. 61A). To define the reactivity of donor 200-384 serum on gp120, competition enzyme-linked immunosorbent assays (ELISAs) with a panel of well-characterized mAbs were performed. Serum binding was competed by VRC01-03, VRC-PG04 and 04b, VRC-CH30-32 and other CD4bs-reactive antibodies and by CD4-Ig, but not by antibodies known to bind gp120 at other sites. Finally, assessment of neutralization by donor 200-384 serum on a panel of Env-pseudoviruses revealed its ability to potently neutralize a majority of diverse HIV-1 isolates. Following the serum analysis, deep sequencing of cDNA from donor 200-384 PBMCs was performed using the Roche 454 pyrosequencing method, as described in Example 1. mRNA from a population of 5 million B cells was used as template for polymerase chain reaction (PCR) to preferentially amplify the IgG and IgM genes from the IGHV1 family.
The 454 sequencing provided 574,027 sequence reads (FIG. 62). 498,234 or 86.8% of the 454 reads spanned 400-500 bp, sufficiently covering the VH region (FIG. 64). The V(D)J gene components were determined for each sequence using IgBLAST. Of note, 168,365 sequences were assigned to IGHV1-2*02 allelic origin, which is used by all VRC01-like antibodies identified so far. As shown by the distribution of VH1 gene families (FIG. 62A), sequences originating from IGHV1-2*02 constitute the largest family, accounting for 29.3% of the sequences. Following V(D)J assignment, each sequence was subjected to an automatic error-correction scheme, which improved the protein sequence identity to the inferred germline V gene by an average of 16.5%. The corrected sequences were then compared to a set of template VRC01-like antibodies, including VRC01-03, VRC-PG04 and 04b, and VRC-CH30-32. No sequence was found to be more than 73% identical to any template (FIG. 61B), suggesting that VRC01-like antibodies, if they exist in this antibodyome, cannot be recognized by sequence identity. After further screening, 163,108 VH sequences of IGHV1-2*02 allelic origin that encompassed the entire V(D)J region were selected for the third complementarity determining region (CDR H3) analysis, where the CDR H3 of each sequence was determined and compared to the CDR H3s of the template VRC01-like antibodies. Some sequences were found to have CDR H3s of ~80% identity to that of VRC-PG04 and shared the same J gene allele, IGHJ2*01, suggesting that the same V(D)J recombination events occurred.
Iterative IGHV1-2 cross-donor phylogenetic analysis on the 163,108 full-length, IGHV1-2*02-originated sequences was then performed. The IGHV1-2*02 antibodyome was divided into 60 subsets, each with donor 45-derived VRC01-03, donor 74-derived VRC-PG04 and 04b, and donor 0219-derived VRC-CH30-32 added as reference. The IGHV1-2*02 sequence was also added to root the resultant cross-donor phylogenetic tree. For each subset, a neighbor-joining (NJ) tree was constructed and sequences in the smallest subtree containing exogenous VRC01-like antibodies were extracted and merged into a new set for the next iteration of cross-donor analysis. The total number of sequences was reduced to 2,030 after the first iteration and converged to 166 after the second iteration, accounting for ~0.01% of the antibodyome. After removal of redundancy, 81 unique sequences were used to construct the final "cross-donor phylogenetic" tree using a more accurate maximum-likelihood (ML) method. The resultant donor 200-384 tree (FIG. 61C and FIG. 65) showed four branches, all below the least-divergent VRC-CH30-32 and interleaved with other VRC01-like antibodies. Eleven sequences selected evenly from the tree, and two that clustered closely with VRC01-03, were chosen to assess their biological functions.
Remarkably, except for the branch interposed between VRC-CH30-32 and VRC01-03, all other branches appeared to be VRC01-like, as their sequences could pair with VRC01-like light chains and displayed potent HIV-1 neutralization.
Three aspects of the cross-donor phylogenetic analysis related to germline origin and affinity maturation, which are of critical importance to antibody analysis, were then examined. First, the germline divergence of the 166 cross-donor-segregated donor 200-384 sequences, which ranged from 19.9 to 35.7% and overlapped with the divergence range of exogenous VRC01-like sequences, was calculated. However, these 166 sequences only accounted for 0.4% of the IGHV1-2*02-originated sequences within the same range of divergence, suggesting that the segregation was not divergence-driven. Second, given the comparable VH1 germline gene populations, it seems possible that heavy chains of other genomic origin have evolved to recognize the CD4-binding site in the presence of rapidly evolving viruses. To assess this possibility, a similar cross-donor analysis on the antibodyomes originating from IGHV1-69 and IGHV1-18 germlines, with 118,638 and 90,097 full-length VH sequences, respectively, was performed. The analysis converged after a single iteration with no "VRC01-like" sequences identified from either antibodyome. The third question of interest was the J gene usage. 99% of the cross-donor-segregated donor 74 VH sequences used the IGHJ2*01 allele, suggesting that this particular J gene allele was involved in a V(D)J recombination event that led to the VRC-PG04-like antibodies. The 166 cross-donor-segregated sequences from donor 200-384 were analyzed and no apparent preference for J gene usage was found. Specifically, branches 1, 2 and 3 were occupied by sequences using the IGHJ5 gene and sequences of branch 4 appeared to use IGHJ6. The two sequences clustered with VRC01-03 share the same J allelic origin, IGHJ1*01, as VRC01-03, indicating that the cross-donor analysis was not restricted to a particular J gene or by the J genes of exogenous sequences.
The donor 200-384 antibodyome was examined using functionally tested sequences as template. Sequences of IGHV1-2 allelic origin were plotted in two dimensions, the divergence from the inferred germline gene (one of the IGHV1-2 alleles) and sequence identity to a chosen template, for either the full sequence or the CDR H3 region only (FIG. 62B). For all four branches in the cross-donor phylogenetic tree, only small, dispersed clusters of related sequences with over 80% identity to the selected template were observed on the plot of the donor 200-384 antibodyome, as opposed to the abundant populations of VRC-PG04-like sequences obtained at the same sequencing depth for donor 74. Using #255552, which exhibited comparable breadth and potency to VRC-PG04, as an example, only 36 VH sequences with an identity of > 80% to #255552, of which 5 shared the same CDR H3, were identified. However, in the analysis of the donor 74 antibodyome, 2,759 variants of the VRC-PG04 heavy chain in classes 7 and 8 were observed. In addition to the small population of matured VRC01-like sequences, no clonally related antibodies were identified in the lower divergence range (0-20%), as shown by the divergence-CDR H3 identity plots (FIG. 66). The most likely explanation is that the early and intermediate antibodies have been lost in the B cell selection due to their lower affinity.
Further sequence and structural analysis revealed unexpected features of VRC01-like antibodies identified from donor 200-384. Sequence alignment (FIG. 69) showed that the CDR H3 regions of #240171 and #534056 possessed two cysteines that were unique to VRC01 and VRC03. #240171, in particular, has the same CDR H3 length as VRC03, with two cysteines precisely aligned to those in the CDR H3 of VRC03. More intriguingly, #240171 also had an insertion in framework 3 (FR3), which is shorter than that of VRC03 by 4 amino acids but has 3 aspartic acids that are aligned to the same residues at the tip of the inserted FR3 loop in VRC03. The homology model of #240171, built based on the VRC03 structure (PDB id: 3SE8), showed a structurally similar CDR H3 loop in contact with the VRC03 light chain in the gp120-bound state and suggested that a shorter FR3 loop of similar charge distribution would better accommodate the interactions with the V1V2 stem. The rare features shared by these antibodies suggest that the same V(D)J recombination event, accompanied by similar somatic mutations, occurred in the development of VRC01-like antibodies in two different donors. Overall, sequences selected from the same branch shared a similar CDR H3, with less than 3 residues different from each other. All sequences except #240171 and #534056 have shorter CDR H3s (9-10 amino acids) compared with VRC01-03. Structural modeling indicates that such a CDR H3 difference can be tolerated with minor conformational change without affecting gp120 recognition.
Our approach harnessed the power of 454 sequencing and bioinformatics techniques to identify broadly neutralizing VRC01-like antibodies from the serum of an HIV-1 infected donor, without experimental screening. Of note, cross-donor phylogenetic analysis was attempted on the donor 45 light-chain antibodyome without success, as the light chains of VRC01-like antibodies did not segregate in a subtree.
EXAMPLE 5
Identification of light chains of VRC01-like antibodies
This example illustrates use of a bioinformatic method coupled with a functional complementation method to identify light chains of VRC01-like antibodies. The bioinformatic method includes a first step to process 454 sequencing data sets using a computational procedure and a second step to search for sequences with VRC01-like signatures in the CDR-L1 and CDR-L3 regions of the light chains.
Bioinformatic identification of VRC01-like light chain encoding nucleic acids
A general bioinformatics pipeline has been developed to process and analyze 454 pyrosequencing-determined antibodyomes. Given a light chain sequencing data set, each sequence read was (1) reformatted and labeled with a unique index number; (2) assigned to variable (V) and joining (J) gene families and alleles using an in-house implementation of IgBLAST, and sequences with E-value > 10" were rejected; (3) subjected to a template-based error correction scheme where 454 homopolymer errors in V and J genes were detected and corrected based on the alignment to germline sequences; (4) compared with a set of user-provided "reference" light chain sequences to calculate respective identities, at both nucleotide and amino-acid levels; and (5) subjected to a multiple sequence alignment (MSA) procedure to determine the CDR-L3 region, which was further compared with a set of user-provided "reference" CDR-L3 sequences. The processed sequences were screened for VRC01-like motifs using the following procedure.
First, the two VRC01-like light chain motifs are defined as follows:
(1) For light chains derived from non-IGKV1-33 origin, the VRC01-like light chain signature included a deletion of 2 or more amino acid residues in the region corresponding to the CDR L1, and a hydrophobic residue followed by a glutamic acid or glutamine residue in the CDR L3.
(2) For light chains derived from IGKV1-33, the VRC01-like light chain signature included affinity maturation from the germline sequence to at least two glycines in the CDR L1, and a hydrophobic residue followed by a glutamic acid or glutamine residue in the CDR L3.
To facilitate the determination of the CDR-L1 region, the CDR-L1 sequences of all germline genes were determined and compiled in a library. Each processed, 454-generated light chain sequence was aligned to its respective germline, and the region corresponding to the "germline" CDR-L1 was defined as the "matured" CDR-L1. A gap of 6 or more consecutive nucleotides in the CDR-L1 alignment was considered a 2-residue deletion for non-IGKV1-33 origin, while the amino-acid sequence of the CDR-L1 was screened for any "GXG" triplet in the case of IGKV1-33 origin. The previously determined CDR-L3 was translated into amino-acid sequence and screened for a motif of a hydrophobic residue (A, V, L, I, M, F or Y) followed by a glutamic acid (E) or glutamine (Q) residue.
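The motif screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house pipeline: the function names, the use of "-" as the alignment gap character, and the way the V gene is passed in as a string are all assumptions made for the example, and the sequences in the usage note are hypothetical.

```python
import re

# Hydrophobic residue as listed in the text (A, V, L, I, M, F or Y)
# followed by glutamic acid (E) or glutamine (Q).
CDR_L3_MOTIF = re.compile(r"[AVLIMFY][EQ]")
# "GXG" triplet: glycine, any residue, glycine.
GXG_MOTIF = re.compile(r"G.G")
# A run of 6 or more consecutive gap characters (assumed to be "-").
GAP_RUN = re.compile(r"-{6,}")


def has_cdr_l1_deletion(aligned_cdr_l1_nt: str) -> bool:
    """True if the aligned CDR-L1 nucleotides contain a gap of 6 or more."""
    return GAP_RUN.search(aligned_cdr_l1_nt) is not None


def is_vrc01_like(v_gene: str, aligned_cdr_l1_nt: str,
                  cdr_l1_aa: str, cdr_l3_aa: str) -> bool:
    """Apply the two light chain signatures to one processed read."""
    # Both signatures require the CDR-L3 hydrophobic-then-E/Q motif.
    if CDR_L3_MOTIF.search(cdr_l3_aa) is None:
        return False
    # IGKV1-33 origin: look for a GXG triplet in the CDR-L1 amino acids.
    if v_gene.startswith("IGKV1-33"):
        return GXG_MOTIF.search(cdr_l1_aa) is not None
    # Any other origin: look for a gap of 6 or more nucleotides in CDR-L1.
    return has_cdr_l1_deletion(aligned_cdr_l1_nt)
```

For example, `is_vrc01_like("IGKV3-20*01", "AGTCAG------AGC", "QSS", "QQYEF")` passes both checks for a non-IGKV1-33 read, while the same read with only a 3-nucleotide gap does not.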
Following identification of the VRC01-like light chain encoding nucleic acids using the bioinformatic procedure, the identified light chains are complemented with a VRC01-like heavy chain to form VRC01-like antibodies for assays of HIV-1 neutralization and gp120 binding.
Structural and genetic VRC01-like light chain signature
The light chain analysis procedure was identified by a structural analysis of VRC01-like light chains. FIG. 67A summarizes structural studies on CD4 binding site antibodies. FIG. 67B shows a ribbon diagram of gp120. FIG. 68 shows CDR L1 and CDR L3 residues that contact gp120. FIG. 69 illustrates affinity measurements for gp120 of the VRC01 antibody with the indicated light chain amino acid substitutions. The results show that alteration of light chain residues alters the affinity of VRC01 for gp120. FIG. 70 shows structural features of the CDR-L1 and CDR-L2 of VRC01-like light chain interaction with gp120. FIG. 71 summarizes the structural mechanism used by the CDR-L1 of VRC01-like antibodies to bind to gp120. FIG. 72 illustrates the structural mechanism used by the CDR-L3 of VRC01-like light chains to bind gp120. As indicated in FIGs. 73 and 74, although the VRC01-like light chains have diverse germline origins, they maintain a conserved hydrophobic motif in the CDR-L3 for binding to gp120.
The results show that both CDR L1 and CDR L3 are involved in gp120 recognition; to accommodate the conformationally fixed Loop D of gp120, VRC01-like CDR L1 avoids more than it engages gp120; this avoidance by CDR L1 is achieved through the intrinsic flexibility of CDR loops or by affinity-maturation-evolved deletion.
Additionally, the VRC01-like light chains use conserved CDR L3 interactions as a structural anchor to allow the light chain to pivot and avoid potential clashes. CDR L3 interacts with gp120 through main chain hydrogen bonds and non-specific hydrophobic interactions to avoid sequence variations on gp120. Even though the light chains use various V and J genes, a conserved glutamic acid was observed in CDR L3 at the V-J junction.
Identification of VRC01-like light chains in the antibodyome of an HIV-1 positive donor
Light chain nucleic acid sequence data sets were obtained and processed for 454 deep sequencing (as described above) for samples from Donor 45 and IAVI donor 57. FIGs. 75 and 78 show the distribution of sequence reads for each of the donor samples. FIGs. 76 and 79 show the corresponding light chain germline distribution for the sequence reads from each of the samples. Also, FIGs. 77 and 80 show heat maps indicating the sequence identity to VRC01 light chain or VRC03 light chain in relation to the germline divergence for each of the sample reads.
The sequences obtained from the 454 deep sequencing were then analyzed for the VRC01-like light chain signature identified above. Briefly, for light chains derived from non-IGKV1-33 origin, the VRC01-like light chain signature included a 2-residue or more amino acid deletion in the region corresponding to the CDR L1, and a hydrophobic residue followed by a glutamic acid or glutamine residue in the CDR L3. For light chains derived from IGKV1-33, the VRC01-like light chain signature included affinity maturation from the germline sequence to at least two glycine residues in the CDR L1, and a hydrophobic residue followed by a glutamic acid or glutamine residue in the CDR L3. FIGs. 77 and 80 illustrate the resulting VRC01-like light chain sequence distribution in the context of the above-described heat maps. Identified VRC01-like light chains are indicated by the black dots.
Example 6
HIV-1 monoclonal neutralizing antibodies specific to gp120 for detecting HIV-1 in a subject
This example describes the use of HIV-1 monoclonal neutralizing antibodies specific to gp120 for the detection of HIV-1 in a subject. This example further describes the use of these antibodies to confirm the diagnosis of HIV-1 in a subject.
A biological sample, such as a blood sample, is obtained from the patient diagnosed with, or suspected of having, an HIV-1 infection. A blood sample taken from a patient who is not infected is used as a control. An ELISA is performed to detect the presence of HIV-1 in the blood sample. Proteins present in the blood samples (the patient sample and control sample) are immobilized on a solid support, such as a 96-well plate, according to methods well known in the art (see, for example, Robinson et al., Lancet 362:1612-1616, 2003, incorporated herein by reference). Following immobilization, an HIV-1 monoclonal neutralizing antibody specific to gp120 that is directly labeled with a fluorescent marker is applied to the protein-immobilized plate. The plate is washed in an appropriate buffer, such as PBS, to remove any unbound antibody and to minimize non-specific binding of antibody. Fluorescence can be detected using a fluorometric plate reader according to standard methods. An increase in fluorescence intensity of the patient sample, relative to the control sample, indicates that the anti-gp120 antibody specifically bound proteins from the blood sample, thus detecting the presence of HIV-1 protein in the sample. Detection of HIV-1 protein in the patient sample indicates the patient has HIV-1, or confirms diagnosis of HIV-1 in the subject.
Example 7
HIV-1 monoclonal neutralizing antibodies specific to gp120 for the treatment of HIV-1
This example describes a particular method that can be used to treat HIV in a human subject by administration of one or more gp120 specific human neutralizing mAbs. Although particular methods, dosages, and modes of administration are provided, one skilled in the art will appreciate that variations can be made without substantially affecting the treatment. Based upon the teaching disclosed herein, HIV-1 can be treated by administering a therapeutically effective amount of one or more of the neutralizing mAbs described herein, thereby reducing or eliminating HIV infection.
Screening subjects
In particular examples, the subject is first screened to determine if they have HIV. Examples of methods that can be used to screen for HIV include a combination of measuring a subject's CD4+ T cell count and the level of HIV in serum. Additional methods using the gp120-specific mAbs described herein can also be used to screen for HIV.
In some examples, HIV testing consists of initial screening with an enzyme-linked immunosorbent assay (ELISA) to detect antibodies to HIV, such as to HIV-1. Specimens with a nonreactive result from the initial ELISA are considered HIV-negative unless new exposure to an infected partner or partner of unknown HIV status has occurred. Specimens with a reactive ELISA result are retested in duplicate. If the result of either duplicate test is reactive, the specimen is reported as repeatedly reactive and undergoes confirmatory testing with a more specific supplemental test (e.g., Western blot or an immunofluorescence assay (IFA)). Specimens that are repeatedly reactive by ELISA and positive by IFA or reactive by Western blot are considered HIV-positive and indicative of HIV infection. Specimens that are repeatedly ELISA-reactive occasionally provide an indeterminate Western blot result, which may reflect either an incomplete antibody response to HIV in an infected person or nonspecific reactions in an uninfected person. IFA can be used to confirm infection in these ambiguous cases. In some instances, a second specimen will be collected more than a month later and retested for subjects with indeterminate Western blot results. In additional examples, nucleic acid testing (e.g., viral RNA or proviral DNA amplification method) can also help diagnosis in certain situations.
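The screening sequence described above is a small decision procedure, and it can be written out as one. The following Python sketch only restates the steps in the text; the function name, argument names and returned labels are hypothetical, the new-exposure caveat is not modeled, and outcomes the text does not specify are labeled as such in the comments.

```python
from typing import Optional, Tuple


def classify_specimen(initial_reactive: bool,
                      duplicates_reactive: Tuple[bool, bool] = (False, False),
                      western_blot: Optional[str] = None,
                      ifa_positive: Optional[bool] = None) -> str:
    """Follow the screening steps described in the text for one specimen.

    western_blot is "reactive", "indeterminate", or None (not run or other).
    The returned strings are hypothetical labels chosen for this sketch.
    """
    # Nonreactive initial ELISA: considered HIV-negative.
    if not initial_reactive:
        return "HIV-negative"
    # Reactive specimens are retested in duplicate; either duplicate being
    # reactive makes the specimen "repeatedly reactive".
    if not any(duplicates_reactive):
        return "not repeatedly reactive"  # outcome not specified in the text
    # Confirmatory testing: IFA-positive or Western-blot-reactive is positive.
    if ifa_positive or western_blot == "reactive":
        return "HIV-positive"
    # Indeterminate Western blot without IFA confirmation: retest later.
    if western_blot == "indeterminate":
        return "indeterminate"
    return "repeatedly reactive, unconfirmed"  # not specified in the text
```

A specimen that is reactive on the initial ELISA and on one duplicate, with a reactive Western blot, would be classified as `"HIV-positive"` by this sketch.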
The detection of HIV in a subject's blood is indicative that the subject has HIV and is a candidate for receiving the therapeutic compositions disclosed herein.
Moreover, detection of a CD4+ T cell count below 350 per microliter, such as 200 cells per microliter, is also indicative that the subject is likely to have HIV. Pre-screening is not required prior to administration of the therapeutic compositions disclosed herein.
Pre-treatment of subjects
In particular examples, the subject is treated prior to administration of a therapeutic agent that includes one or more antiretroviral therapies known to those of skill in the art. However, such pre-treatment is not always required, and can be determined by a skilled clinician.
Administration of therapeutic compositions
Following subject selection, a therapeutically effective dose of a gp120 specific neutralizing mAb described herein is administered to the subject (such as an adult human or a newborn infant either at risk for contracting HIV or known to be infected with HIV). Additional agents, such as anti-viral agents, can also be administered to the subject simultaneously or prior to or following administration of the disclosed agents. Administration can be achieved by any method known in the art, such as oral administration, inhalation, intravenous, intramuscular, intraperitoneal, or subcutaneous.
The amount of the composition administered to prevent, reduce, inhibit, and/or treat HIV or a condition associated with it depends on the subject being treated, the severity of the disorder, and the manner of administration of the therapeutic composition. Ideally, a therapeutically effective amount of an agent is the amount sufficient to prevent, reduce, and/or inhibit, and/or treat the condition (e.g., HIV) in a subject without causing a substantial cytotoxic effect in the subject. An effective amount can be readily determined by one skilled in the art, for example using routine trials establishing dose response curves. As such, these compositions may be formulated with an inert diluent or with a pharmaceutically acceptable carrier.
In one specific example, antibodies are administered at 5 mg per kg every two weeks or 10 mg per kg every two weeks depending upon the particular stage of HIV. In an example, the antibodies are administered continuously. In another example, antibodies or antibody fragments are administered at 50 μg per kg given twice a week for 2 to 3 weeks. Administration of the therapeutic compositions can be taken long term (for example over a period of months or years).
Assessment
Following the administration of one or more therapies, subjects having HIV can be monitored for reductions in HIV levels, increases in a subject's CD4+ T cell count, or reductions in one or more clinical symptoms associated with HIV. In particular examples, subjects are analyzed one or more times, starting 7 days following treatment. Subjects can be monitored using any method known in the art. For example, biological samples from the subject, including blood, can be obtained and alterations in HIV or CD4+ T cell levels evaluated.
Additional treatments
In particular examples, if subjects are stable or have a minor, mixed or partial response to treatment, they can be re-treated after re-evaluation with the same schedule and preparation of agents that they previously received for the desired amount of time, including the duration of a subject's lifetime. A partial response is a reduction, such as at least a 10%, at least 20%, at least 30%, at least 40%, at least 50%, or at least 70% in HIV infection, HIV replication or combination thereof. A partial response may also be an increase in CD4+ T cell count such as at least 350 T cells per microliter.
Example 8
Creation of VRC01 and VRC01-like Multimeric Antibodies
As disclosed herein, VRC01, a broadly neutralizing human IgG1 monoclonal antibody against HIV, was cloned from human B cells obtained from an HIV infected donor. VRC01 IgG1 was shown to have very potent neutralization activity against more than 90% of HIV isolates from all clades. This makes it an attractive therapeutic candidate, and a subject for extensive studies aiming to understand the nature of the antigenic stimuli needed to generate VRC01-like antibodies by infection or immunization. This example describes the characterization and development of an IgM antibody carrying the VRC01 V region. Its neutralizing activity against HIV is compared with that of the originally isolated VRC01 IgG1. The VRC01 V region was cloned into an expression vector containing the constant region from the μ chain. The IgM was then produced in 293F cells transiently transfected with this plasmid along with two other plasmids that encoded the VRC01 κ light chain and the human J chain, respectively. The IgM was purified by FPLC using a HiTrap IgM column and a Superose-6 size exclusion column.
Secreted pentameric IgM antibodies carrying the VRC01 V region were purified to homogeneity by IgM affinity chromatography followed by size exclusion chromatography. Compared on a molar basis with the VRC01 IgG, the IgM antibody has increased in vitro neutralizing activity against a panel of VRC01-sensitive and VRC01-resistant HIV virus strains.
Example 9
Cross-donor phylogenetic analysis as a means to identify VRC01-like antibodies
Cross-donor phylogenetic analysis was applied to identify sequences of VRC01-like heavy chains, from sequences of antibody heavy chains using the tools of phylogenetic analysis to identify features of VRC01-like antibodies.
The first step in the analysis used an iterative screening of the antibody heavy chain sequences from an individual based on neighbor-joining (NJ) phylogenetic analysis. Starting with a sequence (or sequences) which encompass the heavy-chain variable domain obtained from 454 pyrosequencing, germline origins were assigned to each sequence using the program IgBLAST and the database of Ig germline gene sequences (see NCBI web site: ncbi.nlm.nih.gov/igblast/showGermline.cgi). For heavy-chain sequences that share the same V-gene origin (IGHV1-2*02) as VRC01, an iterative procedure based on the NJ method was used to search for a small set of potentially VRC01-like sequences. Briefly, the full-length sequences of the IGHV1-2*02 origin were divided into subsets of no more than 5,000 sequences, the nucleotide sequences of heavy-chain variable domains of known neutralizing mAbs VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32 and the sequence of germline gene IGHV1-2*02 were added to each set, and finally a NJ tree was constructed for each subset using the "Phylogenetic trees" option in ClustalW2. In the tree-building process, the nucleotide distance was calculated as percent divergence between all pairs of sequences in the multiple sequence alignment. After the NJ tree was rooted at the germline gene (IGHV1-2*02), the sequences that clustered in a distinct branch containing neutralizing mAbs VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31 and VRC-CH32 were extracted from the NJ tree and deposited into a new data set for the next round of NJ tree analysis.
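Two of the steps above are simple enough to illustrate directly: splitting the test sequences into subsets of bounded size with the references appended, and computing percent divergence between two aligned sequences. The Python sketch below is an illustration only and does not reproduce ClustalW2; the function names are hypothetical, and skipping alignment columns that contain a gap is an assumption made for this example.

```python
from typing import Iterator, List


def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns at which two sequences differ.

    Columns where either sequence has a gap ("-") are skipped; this gap
    handling is an assumption of the sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return 100.0 * mismatches / compared if compared else 0.0


def split_into_subsets(test_seqs: List[str], references: List[str],
                       max_size: int = 5000) -> Iterator[List[str]]:
    """Yield subsets of at most max_size test sequences, each with the
    reference sequences (known mAbs plus germline) appended."""
    for start in range(0, len(test_seqs), max_size):
        yield test_seqs[start:start + max_size] + references
```

With these definitions, `percent_divergence("ACGT", "ACGA")` gives 25.0, since one of four compared columns differs.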
The procedure was repeated until convergence, where all the sequences reside within a subtree containing the known neutralizing mAbs and no other sequences reside between this subtree and the root, and where a further repeat of the analysis did not change the NJ tree.
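One round of the selection step can be sketched on a toy rooted tree. In the Python example below, the tree is a nested tuple with string leaves, which is a representation chosen for this illustration; building the NJ tree itself is not shown, and the convergence flag covers only the first condition stated above (no test sequence lies outside the subtree containing the known mAbs). All names, including `screen_round`, are hypothetical.

```python
from typing import List, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(leaves(child))
    return out


def reference_subtree(tree: Tree, refs: Set[str]) -> Tree:
    """Smallest subtree containing every reference leaf.

    Assumes every name in refs occurs as a leaf of tree.
    """
    if not isinstance(tree, str):
        for child in tree:
            if refs <= set(leaves(child)):
                return reference_subtree(child, refs)
    return tree


def screen_round(tree: Tree, refs: Set[str],
                 germline: str) -> Tuple[List[str], bool]:
    """Select test sequences inside the reference subtree.

    Returns the selected test sequences and a flag that is True when no
    test sequence lies outside the reference subtree.
    """
    selected = [x for x in leaves(reference_subtree(tree, refs))
                if x not in refs and x != germline]
    all_tests = [x for x in leaves(tree)
                 if x not in refs and x != germline]
    return selected, len(selected) == len(all_tests)
```

In a full run, the selected sequences would be re-aligned and a new tree built for the next round, repeating until the flag is True and the tree no longer changes.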
The second step involved a more accurate maximum-likelihood (ML) phylogenetic analysis of the obtained sequences. With the neutralizing mAbs VRC01, VRC02, VRC03, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32 incorporated into the data set, the multiple sequence alignment was constructed in a similar manner to the first step and provided as input to construct phylogenetic trees using DNAMLK (for DNA Maximum Likelihood program with Molecular Clock) (cmgm.stanford.edu/phylip/dnamlk.html) in the PHYLIP package v3.69 (evolution.genetics.washington.edu/phylip.html). The calculations were done with default parameters (empirical base frequencies, the transition to transversion ratio of 2.0, and the overall base substitution model as A 0.24, C 0.28, G 0.27, T 0.21). The output unrooted tree was visualized using Dendroscope, then ordered to ladderize right and rooted at the sequence of germline IGHV1-2*02. The sequences that still segregate with the known broadly neutralizing VRC01-like antibodies in a distinct subtree were considered "VRC01-like" antibodies, and some of these sequences were produced and subjected to experimental validation involving light chain complementation and verification of HIV-1 neutralizing activity.
In view of the many possible embodiments to which the principles of our invention may be applied, it should be recognized that illustrated embodiments are only examples of the invention and should not be considered a limitation on the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

Claims

We claim:
1. An isolated nucleic acid molecule encoding a VRC01-like heavy chain variable domain, selected by the process of:
(a) performing a cross-donor phylogenetic analysis on a starting population of test sequences to select a nucleic acid sequence of interest, wherein each of the test sequences in the starting population is a nucleic acid sequence encoding a heavy chain variable domain from a subject infected with HIV, and wherein the cross-donor phylogenetic analysis comprises:
(i) forming an analytic population of sequences by adding to the starting population of test sequences:
reference nucleotide sequences, each reference nucleotide sequence encoding a heavy chain variable domain from one of the VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33 or VRC-CH34 heavy chain variable domains; and
the nucleotide sequence of IGHV1-2*02 germline;
(ii) constructing a phylogenetic tree of the analytic population of sequences using neighbor-joining analysis, wherein
the phylogenetic tree comprises leaves, at least one subtree and a root,
the phylogenetic tree is rooted at the nucleotide sequence of IGHV1-2*02 germline, and
the reference nucleotide sequences are leaves of a reference tree, which is the smallest subtree of the phylogenetic tree comprising all the reference nucleotide sequences;
(iii) selecting test sequences that are leaves of the reference tree to form a new population of test sequences;
wherein if a test sequence segregates between the reference tree and the root of the phylogenetic tree, repeating steps (i) - (iii) on the new population of test sequences, and if no test sequence segregates between the reference tree and the root of the phylogenetic tree, selecting a test sequence that segregates into a leaf of the reference tree as the nucleic acid sequence of interest; and
(b) producing a nucleic acid molecule comprising the nucleic acid sequence of interest, thereby producing the isolated nucleic acid molecule encoding the VRC01-like heavy chain variable domain.
2. The isolated nucleic acid molecule of claim 1, wherein the starting population of test sequences is a population of nucleic acid sequences encoding heavy chain variable domains of VH1-2 genomic origin and other genomic origins.
3. The isolated nucleic acid molecule of claim 1, wherein the starting population of test sequences is a population of nucleic acid sequences encoding heavy chain variable domains of VH1-2 genomic origin.
4. The isolated nucleic acid molecule of claim 1 or claim 2, wherein forming an analytic population of sequences further comprises adding additional reference nucleotide sequences to the population of test sequences, each additional reference nucleotide sequence encoding a heavy chain variable domain from one of the 3BNC60, 3BNC117, 12A12, 12A21, 1NC9, 1B2530, 8ANC131 or 8ANC134 heavy chain variable domains.
5. The isolated nucleic acid molecule of any one of claims 1-4, wherein (ii) constructing the phylogenetic tree comprises forming a non-rooted neighbor joining tree of the analytical population of sequences and rooting the non-rooted neighbor joining tree on the nucleotide sequence of IGHV1-2*02 germline.
6. The isolated nucleic acid molecule of any one of claims 1-5, wherein performing the cross-donor phylogenetic analysis further comprises dividing the starting population of test sequences into subpopulations of test sequences and performing steps (i) - (iii) of the cross-donor phylogenetic analysis on each subpopulation independently; wherein if a test sequence segregates between the reference antibody tree and the root of the phylogenetic tree from any subpopulation of test sequences, combining the selected new populations of test sequences from each independently analyzed subpopulation of test sequences to form a combined new population of test sequences, dividing the combined new populations of test sequences into new subpopulations of test sequences and repeating steps (i) - (iii) of the cross-donor phylogenetic analysis on the new subpopulation of test sequences independently, and if no test sequence segregates between the reference antibody tree and the root of the phylogenetic tree from any subpopulation of test sequences, selecting a test sequence that segregates into a leaf of the reference tree from one of the subpopulations of sequences as the nucleic acid sequence of interest.
7. The isolated nucleic acid molecule of claim 1, wherein the initial population of test sequences is a population of nucleic acid sequences from a B cell sample from the subject infected with a human immunodeficiency virus HIV.
8. The isolated nucleic acid molecule of claim 1, wherein the B cell sample comprises peripheral blood mononuclear cells.
9. The isolated nucleic acid molecule of claim 1, wherein the B cell sample is an isolated B cell that produces antibodies that specifically bind RSC3.
10. The isolated nucleic acid molecule encoding the VRC01-like heavy chain variable domain of any one of claims 1-9, wherein a heavy chain comprising the VRC01-like heavy chain variable domain complements with the light chain domain of one of VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33, VRC-CH34, VRC01, VRC02, and VRC03 to form a monoclonal antibody, and wherein the monoclonal antibody specifically binds gp120 and is neutralizing.
11. The isolated nucleic acid molecule of any one of claims 1-10, wherein the isolated nucleic acid molecule does not encode the heavy chain variable domain of an established VRC01-like antibody.
12. The isolated nucleic acid molecule of any one of claims 1-11, wherein the starting population of test sequences does not include a nucleic acid sequence encoding the heavy chain variable domain of an established VRC01-like antibody.
13. The isolated nucleic acid molecule of any one of claims 1-12, wherein the nucleotide sequences in the starting population of sequences are 300-600 nucleotides in length.
14. An isolated antibody comprising:
the VRC01-like heavy chain variable domain encoded by the nucleic acid molecule of claim 1; and
a human light chain variable domain, wherein the antibody specifically binds gp120 and is neutralizing.
15. The isolated antibody of claim 14, wherein the light chain variable domain is a VRC01, VRC02, or VRC03 light chain variable domain.
16. The isolated antibody of any one of claims 14 or 15 wherein the antibody binds resurfaced stabilized core (RSC) 3 with a KD of 10⁻⁸ or less and/or specifically binds to residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120 or a subset or combination thereof.
17. The isolated antibody of any one of claims 13-15, wherein the antibody contacts the epitope on the surface of gp120 substantially with the heavy chain of the antibody in a manner similar to that of CD4.
18. The isolated antibody of any one of claims 14-16, wherein the antibody comprises one or more CDRs from a heavy chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, and 43-1603, or the amino acid sequence set forth as one of 1669-1755 and a light chain.
19. The isolated antibody of any one of claims 14-18, wherein the antibody comprises the heavy chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, and 43-1603, or the amino acid sequence set forth as one of 1669-1755 and a light chain.
20. The isolated antibody of any of claims 14-18, wherein the antibody is an IgG, IgM or IgA.
21. An isolated antigen-binding fragment of the isolated human monoclonal antibody of any of claims 14-20.
22. The isolated antigen-binding fragment of claim 21, wherein the fragment is a Fab fragment, a Fab' fragment, a F(ab)'2 fragment, a single chain Fv protein (scFv), or a disulfide stabilized Fv protein (dsFv).
23. The isolated antigen-binding fragment of claim 22, wherein the fragment is a Fab fragment.
24. The isolated antibody of any of claims 14-23, or an antigen-binding fragment thereof, wherein the antibody is labeled.
25. The isolated antibody or antigen-binding fragment of claim 24, wherein the label is a fluorescent, enzymatic, or radioactive label.
26. A composition comprising the isolated antibody of claims 14-25, or an antigen-binding fragment thereof, and a pharmaceutically acceptable carrier.
27. An isolated nucleic acid molecule encoding the isolated antibody of any one of claims 14-25 or an antigen-binding fragment thereof.
28. The isolated nucleic acid molecule of claim 26, comprising the nucleic acid sequence set forth as one of SEQ ID NOs: 1, 2, 11-34, and 43-1603, or encoding the amino acid sequence set forth as one of 1669-1755.
29. The isolated nucleic acid molecule of claims 1-13, 27 or 28, operably linked to a promoter.
30. An expression vector comprising the isolated nucleic acid molecule of any one of claims 27-29.
31. The expression vector of claim 30, encoding an immunoadhesin.
32. The expression vector of claim 30 or 31, wherein the antibody is an IgA.
33. The expression vector of any one of claims 30-32, wherein the expression vector comprises a promoter and an enhancer, and wherein the promoter is a cytomegalovirus promoter and/or the enhancer is a cytomegalovirus enhancer.
34. The expression vector of any one of claims 30-33, comprising RNA splicing donor sites, RNA splicing acceptor sites and/or internal ribosomal binding sequences.
35. The expression vector of any one of claims 30-34, wherein the heavy chain of the antibody and the light chain of the antibody are expressed as a fusion polypeptide following the introduction of the expression vector in a host cell.
36. The expression vector of claim 35, comprising a nucleic acid sequence encoding a furin cleavage site between the heavy chain and the light chain of the antibody.
37. The expression vector of any one of claims 30-36, wherein the vector is expressed in Lactobacillus.
38. The expression vector of claim 37, comprising a leader sequence that is expressed in Lactobacillus.
39. The expression vector of claim 34, wherein the RNA splicing donor sites are HTLV-1 or CMV RNA splicing donor sites.
40. An isolated host cell transformed with the nucleic acid molecule or vector of any one of claims 1-13 or 27-39.
41. A composition comprising the nucleic acid molecule or vector of any one of claims 1-13, 26-39, and a pharmaceutically acceptable carrier.
42. A method of detecting a human immunodeficiency virus (HIV)-1 infection in a subject comprising:
contacting a biological sample from the subject with at least one isolated human monoclonal antibody of claims 14-25 or a functional fragment thereof; and detecting antibody bound to the sample,
wherein the presence of antibody bound to the sample indicates that the subject has an HIV-1 infection.
43. The method of claim 42, wherein the isolated human monoclonal antibody is directly labeled.
44. The method of claim 42 or 43, further comprising:
contacting the sample with a second antibody that specifically binds the isolated human monoclonal antibody; and
detecting the binding of the second antibody,
wherein an increase in binding of the second antibody to the sample as compared to binding of the second antibody to a control sample detects the presence of an HIV-1 infection in the subject.
45. A method for preventing or treating a human immunodeficiency virus (HIV)-1 infection in a subject, comprising administering to the subject a therapeutically effective amount of at least one antibody of any one of claims 14-25, or an antigen-binding fragment thereof, or the nucleic acid molecule or vector of any one of claims 1-13 or 27-39, thereby preventing or treating the HIV-1 infection.
46. The method of claim 45, wherein the method is a method for treating an HIV-1 infection, and wherein the subject has acquired immune deficiency syndrome (AIDS).
47. The method of claim 45 or 46, further comprising administering to the subject an anti-viral agent.
48. The method of any one of claims 45-47, further comprising measuring HIV-1 viral titer in the subject.
49. A method for testing a potential vaccine, comprising contacting the potential vaccine with an antibody of any one of claims 14-25, or an antigen-binding fragment thereof; and detecting the binding of the antibody to an immunogen in the potential vaccine.
50. A system for identifying a VRC01-like heavy chain variable domain, comprising
reference nucleotide sequences, each encoding the heavy chain variable domain from one of the VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33 or VRC-CH34 heavy chain variable domains,
the nucleotide sequence of IGHV1-2*02 germline;
a specialized computer program for performing cross-donor phylogenetic analysis on a starting population of test sequences to select a nucleic acid sequence of interest, wherein each of the test sequences is a nucleic acid sequence encoding a heavy chain variable domain from a subject infected with HIV, and wherein the cross-donor phylogenetic analysis comprises:
(i) forming an analytic population of sequences by adding the reference nucleotide sequences and the nucleotide sequence of IGHV 1-2*02 germline to the starting population of test sequences:
(ii) constructing a phylogenetic tree of the analytic population of sequences using neighbor-joining analysis, wherein
the phylogenetic tree comprises leaves, at least one subtree and a root,
the phylogenetic tree is rooted at the nucleotide sequence of IGHV1-2*02 germline, and
the reference nucleotide sequences are leaves of a reference tree, which is the smallest subtree of the phylogenetic tree comprising all the reference nucleotide sequences;
(iii) selecting the test sequences that segregate into leaves of the reference tree to form a new population of test sequences;
wherein if a test sequence segregates between the reference tree and the root of the phylogenetic tree, repeating steps (i) - (iii) on the new population of test sequences, and if no test sequence segregates between the reference tree and the root of the phylogenetic tree, selecting a test sequence that segregates into a leaf of the reference tree as the nucleic acid sequence of interest; and
wherein the nucleic acid sequence of interest encodes the VRC01-like heavy chain variable domain.
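The iterative selection recited in steps (i)-(iii) of claim 50 can be illustrated with a minimal Python sketch. It is not the program described in the application. It assumes a rooted tree is already available as nested tuples with string leaves, and it does not implement neighbor-joining: the `build_tree` callable, the leaf names, and the `toy_build` helper are all hypothetical stand-ins. The sketch also reads "segregates between the reference tree and the root" as "is a test leaf lying outside the reference tree", which is an interpretation rather than language from the claim.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, internal node = tuple)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def reference_subtree(node, refs):
    """Return the smallest subtree whose leaves include every reference name."""
    if not isinstance(node, str):
        for child in node:
            if refs <= leaves(child):
                return reference_subtree(child, refs)
    return node


def select_step(tree, refs, germline):
    """Step (iii): keep test leaves inside the reference tree.

    Returns (selected, converged); converged means no test leaf lies outside
    the reference tree (an assumed reading of the claim language).
    """
    inside = leaves(reference_subtree(tree, refs))
    tests = leaves(tree) - refs - {germline}
    selected = tests & inside
    return selected, selected == tests


def cross_donor_select(tests, refs, germline, build_tree):
    """Repeat steps (i)-(iii) until every remaining test leaf is in the reference tree."""
    current = set(tests)
    while True:
        tree = build_tree(current | refs | {germline})  # steps (i) and (ii)
        selected, converged = select_step(tree, refs, germline)
        if converged or not selected:
            return selected
        current = selected  # strictly smaller, so the loop terminates


def toy_build(names):
    """Hypothetical stand-in for a neighbor-joining tree builder (fixed toy trees)."""
    if "t1" in names:
        return ("GERMLINE", (("t1", "t2"), (("REF_A", "t3"), ("REF_B", "t4"))))
    return ("GERMLINE", (("REF_A", "t3"), ("REF_B", "t4")))


result = cross_donor_select({"t1", "t2", "t3", "t4"}, {"REF_A", "REF_B"},
                            "GERMLINE", toy_build)
print(sorted(result))  # ['t3', 't4']
```

In practice `build_tree` would wrap a real neighbor-joining implementation rooted at the germline sequence; the toy builder exists only so the sketch runs end to end.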
51. The system of claim 51, wherein the isolated nucleic acid molecule does not encode the heavy chain variable domain of an established VRC01-like antibody.
52. The isolated nucleic acid molecule of claim 51, wherein the starting population of test sequences does not include a nucleic acid sequence encoding the heavy chain variable domain of an established VRC01-like antibody.
53. The isolated nucleic acid molecule of claim 51, wherein the nucleotide sequences in the starting population of sequences are from 300-600 nucleotides in length.
54. An isolated nucleic acid molecule encoding a VRC01-like light chain variable domain, selected by the process of:
(a) identifying a nucleic acid encoding a VRC01-like light chain in a population of test sequences to select a nucleic acid sequence of interest, wherein each test sequence in the population is a nucleic acid sequence encoding a light chain variable domain from a subject infected with HIV, wherein the light chain variable domain comprises a complementarity determining region (CDR)1, a CDR2 and a CDR3 and has a corresponding germline origin light chain variable domain, wherein the CDR3 of the VRC01-like light chain variable domain comprises a hydrophobic residue followed by a glutamic acid residue or glutamine residue; and wherein if the germline origin of the VRC01-like light chain variable domain is an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises at least two glycine residues, and if the germline origin of the VRC01-like light chain variable domain is not an IGKV1-33 germline origin, the CDR1 of the VRC01-like light chain variable domain comprises a deletion of two or more amino acids compared to the corresponding germline origin;
(b) selecting a test sequence that encodes the VRC01-like light chain variable domain as the nucleic acid sequence of interest; and
(c) synthesizing an isolated nucleic acid molecule comprising the nucleic acid sequence of interest, thereby producing the isolated nucleic acid molecule encoding the VRC01-like light chain variable domain.
55. The isolated nucleic acid of claim 54, wherein the hydrophobic residue is a tyrosine, leucine or phenylalanine residue.
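The sequence criteria of claims 54(a) and 55 can be expressed as a short Python check. This is an illustrative sketch, not the applicants' screening method. It assumes the CDR1 and CDR3 are supplied as one-letter amino acid strings, it measures the "deletion of two or more amino acids" simply as a length difference against the germline CDR1, and the function name, the example strings, and the non-IGKV1-33 germline label used below are hypothetical.

```python
import re

# Claim 55: the hydrophobic residue is tyrosine (Y), leucine (L) or phenylalanine (F).
HYDROPHOBIC = "YLF"

# CDR3 motif from claim 54(a): hydrophobic residue followed by glutamic acid (E)
# or glutamine (Q).
CDR3_MOTIF = re.compile(f"[{HYDROPHOBIC}][EQ]")


def is_vrc01_like_light_chain(cdr1, cdr3, germline, germline_cdr1):
    """Return True if the light chain CDRs meet the criteria sketched from claim 54(a).

    Assumption: a CDR1 deletion is approximated by comparing sequence lengths,
    which ignores alignment details a real analysis would need.
    """
    if not CDR3_MOTIF.search(cdr3):
        return False
    if germline == "IGKV1-33":
        # IGKV1-33 origin: CDR1 must contain at least two glycine residues.
        return cdr1.count("G") >= 2
    # Any other origin: CDR1 must be at least two residues shorter than germline.
    return len(germline_cdr1) - len(cdr1) >= 2


# Toy inputs chosen only to exercise each branch; they are not sequences
# taken from the application.
print(is_vrc01_like_light_chain("QYGS", "QQYEF", "IGKV3-20", "QSVSSSY"))   # True
print(is_vrc01_like_light_chain("QGIGNY", "QQLQS", "IGKV1-33", "QDISNY"))  # True
print(is_vrc01_like_light_chain("QSVSSSY", "QQAAS", "IGKV3-20", "QSVSSSY"))  # False
```

A length comparison is the simplest proxy for a deletion; an alignment-based comparison against the germline CDR1 would be needed to distinguish deletions from substitutions plus truncation.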
56. The isolated nucleic acid molecule of claim 54, wherein the population of test sequences is a population of nucleic acid sequences from a B cell sample from the subject infected with a human immunodeficiency virus (HIV).
57. The isolated nucleic acid molecule of claim 56, wherein the B cell sample comprises peripheral blood mononuclear cells.
58. The isolated nucleic acid molecule of claim 56, wherein the B cell sample is an isolated B cell that produces antibodies that specifically bind RSC3.
59. The isolated nucleic acid molecule of any one of claims 54-58, wherein the isolated nucleic acid molecule does not encode the light chain variable domain of an established VRC01-like antibody.
60. The isolated nucleic acid molecule of any one of claims 54-59, wherein the population of test sequences does not include a nucleic acid sequence encoding the heavy chain variable domain of an established VRC01-like antibody.
61. The isolated nucleic acid molecule of any one of claims 54-60, wherein the nucleotide sequences in the population of test sequences are 300-600 nucleotides in length.
62. An isolated antibody comprising:
the VRC01-like light chain variable domain encoded by the nucleic acid molecule of claim 54; and
a human heavy chain variable domain, wherein the antibody specifically binds gp120 and is neutralizing.
63. The isolated antibody of claim 62, wherein the heavy chain variable domain is a VRC01, VRC02, or VRC03 heavy chain variable domain.
64. The isolated antibody of any one of claims 62 or 63, wherein the antibody binds resurfaced stabilized core (RSC) 3 with a KD of 10^-8 or less and/or specifically binds to residues 276, 278-283, 365-368, 371, 455-459, 461, 469, and 472-474 of gp120 or a subset or combination thereof.
65. The isolated antibody of any one of claims 62-64, wherein the antibody contacts the epitope on the surface of gp120 substantially with the heavy chain of the antibody in a manner similar to that of CD4.
66. The isolated antibody of any one of claims 62-65, wherein the antibody comprises one or more CDRs from a light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 or 4, and a heavy chain.
67. The isolated antibody of any one of claims 62-66, wherein the antibody comprises the light chain amino acid sequence encoded by the nucleic acid sequence set forth as one of SEQ ID NOs: 3 or 4 and a heavy chain.
68. The isolated antibody of any of claims 62-67, wherein the antibody is an IgG, IgM or IgA.
69. An isolated antigen-binding fragment of the isolated human monoclonal antibody of any of claims 62-68.
70. The isolated antigen-binding fragment of claim 69, wherein the fragment is a Fab fragment, a Fab' fragment, a F(ab)'2 fragment, a single chain Fv protein (scFv), or a disulfide stabilized Fv protein (dsFv).
71. The isolated antigen-binding fragment of claim 70, wherein the fragment is a Fab fragment.
72. The isolated antibody of any of claims 62-71, or an antigen-binding fragment thereof, wherein the antibody is labeled.
73. The isolated antibody or antigen-binding fragment of claim 72, wherein the label is a fluorescent, enzymatic, or radioactive label.
74. A composition comprising the isolated antibody of claims 62-73, or an antigen-binding fragment thereof, and a pharmaceutically acceptable carrier.
75. An isolated nucleic acid molecule encoding the isolated antibody of any one of claims 62-71 or an antigen-binding fragment thereof.
76. The isolated nucleic acid molecule of claims 54-61 or 76, operably linked to a promoter.
77. An expression vector comprising the isolated nucleic acid molecule of claim 76.
78. The expression vector of claim 77, encoding an immunoadhesin.
79. The expression vector of claim 78, wherein the antibody is an IgA.
80. The expression vector of any one of claims 78-79, wherein the expression vector comprises a promoter and an enhancer, and wherein the promoter is a cytomegalovirus promoter and/or the enhancer is a cytomegalovirus enhancer.
81. The expression vector of any one of claims 79-80, comprising RNA splicing donor sites, RNA splicing acceptor sites and/or internal ribosomal binding sequences.
82. The expression vector of any one of claims 78-81, wherein the heavy chain of the antibody and the light chain of the antibody are expressed as a fusion polypeptide following the introduction of the expression vector in a host cell.
83. The expression vector of claim 82, comprising a nucleic acid sequence encoding a furin cleavage site between the heavy chain and the light chain of the antibody.
84. The expression vector of any one of claims 78-83, wherein the vector is expressed in Lactobacillus.
85. The expression vector of claim 84, comprising a leader sequence that is expressed in Lactobacillus.
86. The expression vector of claim 81, wherein the RNA splicing donor sites are HTLV-1 or CMV RNA splicing donor sites.
87. An isolated host cell transformed with the nucleic acid molecule or vector of any one of claims 54-61 or 78-86.
88. A composition comprising the nucleic acid molecule or vector of any one of claims 54-61 or 78-86, and a pharmaceutically acceptable carrier.
89. A method of detecting a human immunodeficiency virus (HIV)-1 infection in a subject comprising:
contacting a biological sample from the subject with at least one isolated human monoclonal antibody of claims 62-73 or a functional fragment thereof; and detecting antibody bound to the sample,
wherein the presence of antibody bound to the sample indicates that the subject has an HIV-1 infection.
90. The method of claim 89, wherein the isolated human monoclonal antibody is directly labeled.
91. The method of claim 89 or 90, further comprising:
contacting the sample with a second antibody that specifically binds the isolated human monoclonal antibody; and
detecting the binding of the second antibody,
wherein an increase in binding of the second antibody to the sample as compared to binding of the second antibody to a control sample detects the presence of an HIV-1 infection in the subject.
92. A method for preventing or treating a human immunodeficiency virus (HIV)-1 infection in a subject, comprising administering to the subject a therapeutically effective amount of at least one antibody of any one of claims 62-73, or an antigen-binding fragment thereof, or the nucleic acid molecule or vector of any one of claims 54-61 or 78-86, thereby preventing or treating the HIV-1 infection.
93. The method of claim 92, wherein the method is a method for treating an HIV-1 infection, and wherein the subject has acquired immune deficiency syndrome (AIDS).
94. The method of claim 92 or 93, further comprising administering to the subject an anti-viral agent.
95. The method of any one of claims 93-94, further comprising measuring HIV-1 viral titer in the subject.
96. A method for testing a potential vaccine, comprising contacting the potential vaccine with an antibody of any one of claims 62-73, or an antigen-binding fragment thereof; and detecting the binding of the antibody to an immunogen in the potential vaccine.
97. The isolated antibody of claim 62, wherein the heavy chain variable domain is a heavy chain variable domain selected by the process of:
(a) performing a cross-donor phylogenetic analysis on a starting population of test sequences to select a nucleic acid sequence of interest, wherein each of the test sequences in the starting population is a nucleic acid sequence encoding a heavy chain variable domain from a subject infected with HIV, and wherein the cross-donor phylogenetic analysis comprises:
(i) forming an analytic population of sequences by adding to the starting population of test sequences:
reference nucleotide sequences, each reference nucleotide sequence encoding a heavy chain variable domain from one of the VRC01, VRC02, VRC03, NIH45-46, VRC-PG04, VRC-PG04b, VRC-CH30, VRC-CH31, VRC-CH32, VRC-CH33 or VRC-CH34 heavy chain variable domains; and
the nucleotide sequence of IGHV1-2*02 germline;
(ii) constructing a phylogenetic tree of the analytic population of sequences using neighbor-joining analysis, wherein
the phylogenetic tree comprises leaves, at least one subtree and a root,
the phylogenetic tree is rooted at the nucleotide sequence of IGHV1-2*02 germline, and
the reference nucleotide sequences are leaves of a reference tree, which is the smallest subtree of the phylogenetic tree comprising all the reference nucleotide sequences;
(iii) selecting test sequences that are leaves of the reference tree to form a new population of test sequences;
wherein if a test sequence segregates between the reference tree and the root of the phylogenetic tree, repeating steps (i) - (iii) on the new population of test sequences, and if no test sequence segregates between the reference tree and the root of the phylogenetic tree, selecting a test sequence that segregates into a leaf of the reference tree as the nucleic acid sequence of interest; and
(b) selecting the heavy chain variable domain encoded by the nucleic acid sequence of interest, thereby selecting the heavy chain variable domain.
PCT/US2012/030465 2011-05-09 2012-03-23 Neutralizing antibodies to hiv-1 and their use WO2012154312A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/US2013/032070 WO2013142324A1 (en) 2012-03-23 2013-03-15 Neutralizing antibodies to hiv-1 and their use
EP13763664.3A EP2828294A1 (en) 2012-03-23 2013-03-15 Neutralizing antibodies to hiv-1 and their use
US14/386,920 US20150044137A1 (en) 2012-03-23 2013-03-15 Neutralizing antibodies to hiv-1 and their use

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201161484184P 2011-05-09 2011-05-09
US61/484,184 2011-05-09
US201161515528P 2011-08-05 2011-08-05
US61/515,528 2011-08-05
US201161522205P 2011-08-10 2011-08-10
US61/522,205 2011-08-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/386,920 Continuation-In-Part US20150044137A1 (en) 2012-03-23 2013-03-15 Neutralizing antibodies to hiv-1 and their use

Publications (1)

Publication Number Publication Date
WO2012154312A1 (en) 2012-11-15

Family

ID=47139497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/030465 WO2012154312A1 (en) 2011-05-09 2012-03-23 Neutralizing antibodies to hiv-1 and their use

Country Status (1)

Country Link
WO (1) WO2012154312A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011038290A2 (en) * 2009-09-25 2011-03-31 The U. S. A., As Represented By The Secretary, Department Of Health And Human Services Neutralizing antibodies to hiv-1 and their use
WO2012040562A2 (en) * 2010-09-24 2012-03-29 International Aids Vaccine Initiative Novel hiv-1 broadly neutralizing antibodies

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DATABASE GENBANK 18 August 2011 (2011-08-18), WU, X. ET AL: "Focused Evolution of HIV-1 Neutralizing Antibodies Revealed by Structures and Deep Sequencing", accession no. N159474 *
SCIENCE, vol. 333, 2011, pages 1593 - 1602 *

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017204563B2 (en) * 2011-05-17 2018-11-22 California Institute Of Technology Human immunodeficiency virus neutralising antibodies and methods of use thereof
US11634478B2 (en) 2011-05-17 2023-04-25 The Rockefeller University Human immunodeficiency virus neutralizing antibodies and methods of use thereof
AU2019200972B2 (en) * 2011-05-17 2020-04-16 California Institute Of Technology Human immunodeficiency virus neutralising antibodies and methods of use thereof
EP3865507A1 (en) * 2011-05-17 2021-08-18 The Rockefeller University Human immunodeficiency virus neutralizing antibodies and methods of use thereof
US10889633B2 (en) 2011-05-17 2021-01-12 The Rockefeller University Human immunodeficiency virus neutralizing antibodies and methods of use thereof
US10815295B2 (en) 2011-12-08 2020-10-27 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Broadly neutralizing HIV-1 antibodies that bind to the CD4-binding site of the envelope protein
US9695230B2 (en) 2011-12-08 2017-07-04 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Broadly neutralizing HIV-1 VRC07 antibodies that bind to the CD4-binding site of the envelope protein
AU2012347453B2 (en) * 2011-12-08 2017-11-23 The United States Of America, As Represented By The Secretary Department Of Health And Human Services Neutralizing antibodies to HIV-1 and their use
WO2013086533A1 (en) 2011-12-08 2013-06-13 The United States Of America, As Represented By The Secretary Department Of Health & Human Services Neutralizing antibodies to hiv-1 and their use
US10035844B2 (en) 2011-12-08 2018-07-31 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Broadly neutralizing HIV-1 VRC07 antibodies that bind to the CD4-binding site of the envelope protein
WO2016037154A1 (en) 2014-09-04 2016-03-10 The United States Of America, As Represented By The Secretary, Department Of Health & Human Services Recombinant hiv-1 envelope proteins and their use
CN104573405A (en) * 2014-12-22 2015-04-29 中国科学院计算机网络信息中心 Phylogenetic tree rebuilding method for building sub trees on basis of big trees
CN108137676A (en) * 2015-03-20 2018-06-08 美国政府(由卫生和人类服务部的部长所代表) Gp120 neutralizing antibodies and application thereof
CN108137676B (en) * 2015-03-20 2022-03-15 美国政府(由卫生和人类服务部的部长所代表) gp120 neutralizing antibodies and uses thereof
US10562960B2 (en) 2015-03-20 2020-02-18 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Neutralizing antibodies to gp120 and their use
EP3683233A1 (en) 2015-03-20 2020-07-22 The U.S.A. as represented by the Secretary, Department of Health and Human Services Neutralizing antibodies to gp120 and their use
WO2016154003A1 (en) * 2015-03-20 2016-09-29 The United States Of America, As Represented By The Secretary, Department Of Health & Human Services Neutralizing antibodies to gp120 and their use
EP3819310A1 (en) 2015-10-25 2021-05-12 Sanofi Trispecific and/or trivalent binding proteins for prevention or treatment of hiv infection
US11129905B2 (en) 2015-10-25 2021-09-28 Sanofi Bivalent, bispecific binding proteins for prevention or treatment of HIV infection
US11779651B2 (en) 2015-10-25 2023-10-10 Sanofi Bivalent, bispecific binding proteins for prevention or treatment of HIV infection
WO2017074878A1 (en) 2015-10-25 2017-05-04 Sanofi Trispecific and/or trivalent binding proteins for prevention or treatment of hiv infection
EP4011911A1 (en) 2015-11-03 2022-06-15 The United States of America as represented by The Secretary Department of Health and Human Services Neutralizing antibodies to hiv-1 gp41 and their use
WO2017079479A1 (en) 2015-11-03 2017-05-11 The United States Of America, As Represented By The Secretary, Department Of Health And Human Neutralizing antibodies to hiv-1 gp41 and their use
US11932704B2 (en) 2016-04-13 2024-03-19 Sanofi Trispecific and/or trivalent binding proteins
US10882922B2 (en) 2016-04-13 2021-01-05 Sanofi Trispecific and/or trivalent binding proteins
US11192960B2 (en) 2016-04-13 2021-12-07 Sanofi Trispecific and/or trivalent binding proteins
WO2018005558A1 (en) 2016-06-27 2018-01-04 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Self-assembling insect ferritin nanoparticles for display of co-assembled trimeric antigens
WO2018075564A1 (en) 2016-10-17 2018-04-26 University Of Maryland, College Park Multispecific antibodies targeting human immunodeficiency virus and methods of using the same
WO2018176031A1 (en) 2017-03-24 2018-09-27 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Glycan-masked engineered outer domains of hiv-1 gp120 and their use
US11235056B2 (en) 2017-03-24 2022-02-01 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Glycan-masked engineered outer domains of HIV-1 gp120 and their use
US11365261B2 (en) 2017-10-10 2022-06-21 Sanofi Anti-CD38 antibodies and methods of use
US11186649B2 (en) 2017-10-10 2021-11-30 Sanofi Anti-CD38 antibodies and methods of use
WO2019079337A1 (en) 2017-10-16 2019-04-25 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Recombinant hiv-1 envelope proteins and their use
WO2019165122A1 (en) 2018-02-21 2019-08-29 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Neutralizing antibodies to hiv-1 env and their use
WO2020010107A1 (en) 2018-07-03 2020-01-09 Gilead Sciences, Inc. Antibodies that target hiv gp120 and methods of use
EP4257600A2 (en) 2018-07-03 2023-10-11 Gilead Sciences, Inc. Antibodies that target hiv gp120 and methods of use
US11530268B2 (en) 2018-10-09 2022-12-20 Sanofi Trispecific anti-CD38, anti-CD28, and anti-CD3 binding proteins and methods of use for treating viral infection
US11613576B2 (en) 2019-04-09 2023-03-28 Sanofi Trispecific binding proteins, methods, and uses thereof
WO2020210386A1 (en) 2019-04-09 2020-10-15 Sanofi Trispecific and/or trivalent binding proteins using the cross-over-dual-variable domain (codv) format for treatment of hiv infection
WO2020263830A1 (en) 2019-06-25 2020-12-30 Gilead Sciences, Inc. Flt3l-fc fusion proteins and methods of use
WO2021011544A1 (en) 2019-07-16 2021-01-21 Gilead Sciences, Inc. Hiv vaccines and methods of making and using
WO2021011891A1 (en) 2019-07-18 2021-01-21 Gilead Sciences, Inc. Long-acting formulations of tenofovir alafenamide
WO2021130638A1 (en) 2019-12-24 2021-07-01 Carna Biosciences, Inc. Diacylglycerol kinase modulating compounds
WO2021188959A1 (en) 2020-03-20 2021-09-23 Gilead Sciences, Inc. Prodrugs of 4'-c-substituted-2-halo-2'-deoxyadenosine nucleosides and methods of making and using the same
WO2021236944A1 (en) 2020-05-21 2021-11-25 Gilead Sciences, Inc. Pharmaceutical compositions comprising bictegravir
WO2022031894A1 (en) 2020-08-07 2022-02-10 Gilead Sciences, Inc. Prodrugs of phosphonamide nucleotide analogues and their pharmaceutical use
WO2022035860A2 (en) 2020-08-10 2022-02-17 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Replication-competent adenovirus type 4-hiv env vaccines and their use
WO2022046644A1 (en) 2020-08-25 2022-03-03 Gilead Sciences, Inc. Multi-specific antigen binding molecules targeting hiv and methods of use
WO2022103758A1 (en) 2020-11-11 2022-05-19 Gilead Sciences, Inc. METHODS OF IDENTIFYING HIV PATIENTS SENSITIVE TO THERAPY WITH gp120 CD4 BINDING SITE-DIRECTED ANTIBODIES
WO2022271659A1 (en) 2021-06-23 2022-12-29 Gilead Sciences, Inc. Diacylglyercol kinase modulating compounds
WO2022271684A1 (en) 2021-06-23 2022-12-29 Gilead Sciences, Inc. Diacylglyercol kinase modulating compounds
WO2022271677A1 (en) 2021-06-23 2022-12-29 Gilead Sciences, Inc. Diacylglyercol kinase modulating compounds
WO2022271650A1 (en) 2021-06-23 2022-12-29 Gilead Sciences, Inc. Diacylglyercol kinase modulating compounds
WO2023102239A1 (en) 2021-12-03 2023-06-08 Gilead Sciences, Inc. Therapeutic compounds for hiv virus infection
WO2023102529A1 (en) 2021-12-03 2023-06-08 Gilead Sciences, Inc. Therapeutic compounds for hiv virus infection
WO2023102523A1 (en) 2021-12-03 2023-06-08 Gilead Sciences, Inc. Therapeutic compounds for hiv virus infection
WO2023196875A1 (en) 2022-04-06 2023-10-12 Gilead Sciences, Inc. Bridged tricyclic carbamoylpyridone compounds and uses thereof
EP4310087A1 (en) 2022-04-06 2024-01-24 Gilead Sciences, Inc. Bridged tricyclic carbamoylpyridone compounds and uses thereof
WO2024006982A1 (en) 2022-07-01 2024-01-04 Gilead Sciences, Inc. Therapeutic compounds useful for the prophylactic or therapeutic treatment of an hiv virus infection
WO2024015741A1 (en) 2022-07-12 2024-01-18 Gilead Sciences, Inc. Hiv immunogenic polypeptides and vaccines and uses thereof
WO2024076915A1 (en) 2022-10-04 2024-04-11 Gilead Sciences, Inc. 4'-thionucleoside analogues and their pharmaceutical use

Similar Documents

Publication Publication Date Title
WO2012154312A1 (en) Neutralizing antibodies to hiv-1 and their use
US10035845B2 (en) Neutralizing antibodies to HIV-1 and their use
US10815295B2 (en) Broadly neutralizing HIV-1 antibodies that bind to the CD4-binding site of the envelope protein
US10047148B2 (en) Neutralizing GP41 antibodies and their use
EP2828294A1 (en) Neutralizing antibodies to hiv-1 and their use
US10273291B2 (en) Focused evolution of HIV-1 neutralizing antibodies revealed by crystal structures and deep sequencing
Mukhamedova et al. Molecular Dissection Of Human Antibody Responses Following Prefusion-Stabilized RSV F Vaccination

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12782799

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12782799

Country of ref document: EP

Kind code of ref document: A1