US20230399386A1 - Rationally designed, synthetic antibody libraries and uses therefor - Google Patents
- Publication number
- US20230399386A1 (application US17/856,673)
- Authority
- US (United States)
- Prior art keywords
- seq
- sequences
- amino acid
- library
- sequence
- Legal status (as assumed by Google Patents; not a legal conclusion)
- Pending
- 101150010084 CAB13 gene Proteins 0.000 description 2
- 101150061750 CAB5 gene Proteins 0.000 description 2
- 101150112060 CAB7 gene Proteins 0.000 description 2
- 101150055457 CAB8 gene Proteins 0.000 description 2
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 2
- 102100039498 Cytotoxic T-lymphocyte protein 4 Human genes 0.000 description 2
- 101150097493 D gene Proteins 0.000 description 2
- 230000006820 DNA synthesis Effects 0.000 description 2
- 102000016911 Deoxyribonucleases Human genes 0.000 description 2
- 108010053770 Deoxyribonucleases Proteins 0.000 description 2
- 102000002322 Egg Proteins Human genes 0.000 description 2
- 108010000912 Egg Proteins Proteins 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- XBGGUPMXALFZOT-VIFPVBQESA-N Gly-Tyr Chemical compound NCC(=O)N[C@H](C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-VIFPVBQESA-N 0.000 description 2
- 239000004471 Glycine Substances 0.000 description 2
- 101000889276 Homo sapiens Cytotoxic T-lymphocyte protein 4 Proteins 0.000 description 2
- 101001079285 Homo sapiens Immunoglobulin heavy joining 1 Proteins 0.000 description 2
- 101000662009 Homo sapiens UDP-N-acetylglucosamine pyrophosphorylase Proteins 0.000 description 2
- 102000013463 Immunoglobulin Light Chains Human genes 0.000 description 2
- 108010065825 Immunoglobulin Light Chains Proteins 0.000 description 2
- 102100028078 Immunoglobulin heavy joining 1 Human genes 0.000 description 2
- XUJNEKJLAYXESH-REOHCLBHSA-N L-Cysteine Chemical compound SC[C@H](N)C(O)=O XUJNEKJLAYXESH-REOHCLBHSA-N 0.000 description 2
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 2
- 125000000393 L-methionino group Chemical group [H]OC(=O)[C@@]([H])(N([H])[*])C([H])([H])C(SC([H])([H])[H])([H])[H] 0.000 description 2
- 108700018351 Major Histocompatibility Complex Proteins 0.000 description 2
- 102000016943 Muramidase Human genes 0.000 description 2
- 108010014251 Muramidase Proteins 0.000 description 2
- 108010062010 N-Acetylmuramoyl-L-alanine Amidase Proteins 0.000 description 2
- ISWSIDIOOBJBQZ-UHFFFAOYSA-N Phenol Chemical compound OC1=CC=CC=C1 ISWSIDIOOBJBQZ-UHFFFAOYSA-N 0.000 description 2
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 2
- 108010076504 Protein Sorting Signals Proteins 0.000 description 2
- 238000012300 Sequence Analysis Methods 0.000 description 2
- WOUIMBGNEUWXQG-VKHMYHEASA-N Ser-Gly Chemical compound OC[C@H](N)C(=O)NCC(O)=O WOUIMBGNEUWXQG-VKHMYHEASA-N 0.000 description 2
- 210000001744 T-lymphocyte Anatomy 0.000 description 2
- 108010006785 Taq Polymerase Proteins 0.000 description 2
- 108060008682 Tumor Necrosis Factor Proteins 0.000 description 2
- 102000000852 Tumor Necrosis Factor-alpha Human genes 0.000 description 2
- HPYDSVWYXXKHRD-VIFPVBQESA-N Tyr-Gly Chemical compound [O-]C(=O)CNC(=O)[C@@H]([NH3+])CC1=CC=C(O)C=C1 HPYDSVWYXXKHRD-VIFPVBQESA-N 0.000 description 2
- 102100037921 UDP-N-acetylglucosamine pyrophosphorylase Human genes 0.000 description 2
- 102100025568 Voltage-dependent L-type calcium channel subunit beta-1 Human genes 0.000 description 2
- 101710176690 Voltage-dependent L-type calcium channel subunit beta-1 Proteins 0.000 description 2
- 102100025807 Voltage-dependent L-type calcium channel subunit beta-2 Human genes 0.000 description 2
- 101710176691 Voltage-dependent L-type calcium channel subunit beta-2 Proteins 0.000 description 2
- 102100025838 Voltage-dependent L-type calcium channel subunit beta-3 Human genes 0.000 description 2
- 101710176707 Voltage-dependent L-type calcium channel subunit beta-3 Proteins 0.000 description 2
- 102100025836 Voltage-dependent L-type calcium channel subunit beta-4 Human genes 0.000 description 2
- 101710176693 Voltage-dependent L-type calcium channel subunit beta-4 Proteins 0.000 description 2
- 239000002253 acid Substances 0.000 description 2
- 239000008272 agar Substances 0.000 description 2
- 230000006229 amino acid addition Effects 0.000 description 2
- 239000003242 anti bacterial agent Substances 0.000 description 2
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 2
- 238000003556 assay Methods 0.000 description 2
- 230000001580 bacterial effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 239000011230 binding agent Substances 0.000 description 2
- 210000004369 blood Anatomy 0.000 description 2
- 239000008280 blood Substances 0.000 description 2
- 125000000837 carbohydrate group Chemical group 0.000 description 2
- 229910052799 carbon Inorganic materials 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 235000004879 dioscorea Nutrition 0.000 description 2
- 201000010099 disease Diseases 0.000 description 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 2
- 235000014103 egg white Nutrition 0.000 description 2
- 210000000969 egg white Anatomy 0.000 description 2
- 239000003623 enhancer Substances 0.000 description 2
- 210000003527 eukaryotic cell Anatomy 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 210000003754 fetus Anatomy 0.000 description 2
- 230000037433 frameshift Effects 0.000 description 2
- 239000000499 gel Substances 0.000 description 2
- XBGGUPMXALFZOT-UHFFFAOYSA-N glycyl-L-tyrosine hemihydrate Natural products NCC(=O)NC(C(O)=O)CC1=CC=C(O)C=C1 XBGGUPMXALFZOT-UHFFFAOYSA-N 0.000 description 2
- 108010087823 glycyltyrosine Proteins 0.000 description 2
- 230000036541 health Effects 0.000 description 2
- 210000004408 hybridoma Anatomy 0.000 description 2
- 229940072221 immunoglobulins Drugs 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 229960000274 lysozyme Drugs 0.000 description 2
- 239000004325 lysozyme Substances 0.000 description 2
- 235000010335 lysozyme Nutrition 0.000 description 2
- 239000002609 medium Substances 0.000 description 2
- 235000016709 nutrition Nutrition 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 2
- 230000004850 protein–protein interaction Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000012552 review Methods 0.000 description 2
- 210000003705 ribosome Anatomy 0.000 description 2
- 102200033974 rs1555427497 Human genes 0.000 description 2
- 230000037439 somatic mutation Effects 0.000 description 2
- 238000010561 standard procedure Methods 0.000 description 2
- 108020001568 subdomains Proteins 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 230000020382 suppression by virus of host antigen processing and presentation of peptide antigen via MHC class I Effects 0.000 description 2
- 230000009897 systematic effect Effects 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000010396 two-hybrid screening Methods 0.000 description 2
- 239000010856 very low level radioactive waste Substances 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- RQAFMLCWWGDNLI-UHFFFAOYSA-N 2-[4-[bis(2-chloroethyl)amino]phenyl]acetic acid Chemical compound OC(=O)CC1=CC=C(N(CCCl)CCCl)C=C1 RQAFMLCWWGDNLI-UHFFFAOYSA-N 0.000 description 1
- TVZRAEYQIKYCPH-UHFFFAOYSA-N 3-(trimethylsilyl)propane-1-sulfonic acid Chemical compound C[Si](C)(C)CCCS(O)(=O)=O TVZRAEYQIKYCPH-UHFFFAOYSA-N 0.000 description 1
- 101100295756 Acinetobacter baumannii (strain ATCC 19606 / DSM 30007 / JCM 6841 / CCUG 19606 / CIP 70.34 / NBRC 109757 / NCIMB 12457 / NCTC 12156 / 81) omp38 gene Proteins 0.000 description 1
- 102000054930 Agouti-Related Human genes 0.000 description 1
- 102000008102 Ankyrins Human genes 0.000 description 1
- 108010049777 Ankyrins Proteins 0.000 description 1
- 241001156002 Anthonomus pomorum Species 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 101100136076 Aspergillus oryzae (strain ATCC 42149 / RIB 40) pel1 gene Proteins 0.000 description 1
- 108091008875 B cell receptors Proteins 0.000 description 1
- 102000016605 B-Cell Activating Factor Human genes 0.000 description 1
- 108010028006 B-Cell Activating Factor Proteins 0.000 description 1
- 102100022005 B-lymphocyte antigen CD20 Human genes 0.000 description 1
- 101000984722 Bos taurus Pancreatic trypsin inhibitor Proteins 0.000 description 1
- 231100000023 Cell-mediated cytotoxicity Toxicity 0.000 description 1
- 206010057250 Cell-mediated cytotoxicity Diseases 0.000 description 1
- 208000035473 Communicable disease Diseases 0.000 description 1
- 241000699802 Cricetulus griseus Species 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 108700022150 Designed Ankyrin Repeat Proteins Proteins 0.000 description 1
- BWGNESOTFCXPMA-UHFFFAOYSA-N Dihydrogen disulfide Chemical compound SS BWGNESOTFCXPMA-UHFFFAOYSA-N 0.000 description 1
- 108010016626 Dipeptides Proteins 0.000 description 1
- 101000609473 Ecballium elaterium Trypsin inhibitor 2 Proteins 0.000 description 1
- 241001524679 Escherichia virus M13 Species 0.000 description 1
- 102100037362 Fibronectin Human genes 0.000 description 1
- 108010067306 Fibronectins Proteins 0.000 description 1
- 108010058643 Fungal Proteins Proteins 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 102100032518 Gamma-crystallin B Human genes 0.000 description 1
- 102000001554 Hemoglobins Human genes 0.000 description 1
- 108010054147 Hemoglobins Proteins 0.000 description 1
- 102100039856 Histone H1.1 Human genes 0.000 description 1
- 102100039855 Histone H1.2 Human genes 0.000 description 1
- 101000897405 Homo sapiens B-lymphocyte antigen CD20 Proteins 0.000 description 1
- 101001035402 Homo sapiens Histone H1.1 Proteins 0.000 description 1
- 101001035375 Homo sapiens Histone H1.2 Proteins 0.000 description 1
- 101001008317 Homo sapiens Immunoglobulin kappa variable 6D-21 Proteins 0.000 description 1
- 101000742373 Homo sapiens Vesicular inhibitory amino acid transporter Proteins 0.000 description 1
- 108010054477 Immunoglobulin Fab Fragments Proteins 0.000 description 1
- 102000001706 Immunoglobulin Fab Fragments Human genes 0.000 description 1
- 102000012745 Immunoglobulin Subunits Human genes 0.000 description 1
- 108010079585 Immunoglobulin Subunits Proteins 0.000 description 1
- 102100027402 Immunoglobulin kappa variable 6D-21 Human genes 0.000 description 1
- 101150008942 J gene Proteins 0.000 description 1
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 1
- 125000000998 L-alanino group Chemical group [H]N([*])[C@](C([H])([H])[H])([H])C(=O)O[H] 0.000 description 1
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 description 1
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 1
- 125000000174 L-prolyl group Chemical group [H]N1C([H])([H])C([H])([H])C([H])([H])[C@@]1([H])C(*)=O 0.000 description 1
- 125000000510 L-tryptophano group Chemical group [H]C1=C([H])C([H])=C2N([H])C([H])=C(C([H])([H])[C@@]([H])(C(O[H])=O)N([H])[*])C2=C1[H] 0.000 description 1
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 102000019298 Lipocalin Human genes 0.000 description 1
- 108050006654 Lipocalin Proteins 0.000 description 1
- 101000680845 Luffa aegyptiaca Ribosome-inactivating protein luffin P1 Proteins 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 108010021466 Mutant Proteins Proteins 0.000 description 1
- 102000008300 Mutant Proteins Human genes 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 108020004485 Nonsense Codon Proteins 0.000 description 1
- 108010079855 Peptide Aptamers Proteins 0.000 description 1
- 206010035226 Plasma cell myeloma Diseases 0.000 description 1
- 101100084022 Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) lapA gene Proteins 0.000 description 1
- 102000014128 RANK Ligand Human genes 0.000 description 1
- 108010025832 RANK Ligand Proteins 0.000 description 1
- 102220472894 Receptor-type tyrosine-protein phosphatase beta_R94K_mutation Human genes 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 241000235343 Saccharomycetales Species 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 102100024554 Tetranectin Human genes 0.000 description 1
- 102000002933 Thioredoxin Human genes 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- 239000007984 Tris EDTA buffer Substances 0.000 description 1
- 108090000848 Ubiquitin Proteins 0.000 description 1
- 102000044159 Ubiquitin Human genes 0.000 description 1
- 101150117115 V gene Proteins 0.000 description 1
- 241000700618 Vaccinia virus Species 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 102100038170 Vesicular inhibitory amino acid transporter Human genes 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 238000001042 affinity chromatography Methods 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 229910021529 ammonia Inorganic materials 0.000 description 1
- 238000004873 anchoring Methods 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 230000005875 antibody response Effects 0.000 description 1
- 230000007503 antigenic stimulation Effects 0.000 description 1
- 101150042295 arfA gene Proteins 0.000 description 1
- 125000000637 arginyl group Chemical group N[C@@H](CCCNC(N)=N)C(=O)* 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- 238000002819 bacterial display Methods 0.000 description 1
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 1
- 238000004166 bioassay Methods 0.000 description 1
- 229920001222 biopolymer Polymers 0.000 description 1
- 230000006287 biotinylation Effects 0.000 description 1
- 238000007413 biotinylation Methods 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 239000007853 buffer solution Substances 0.000 description 1
- 201000011510 cancer Diseases 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 101150014721 cdr gene Proteins 0.000 description 1
- 230000005890 cell-mediated cytotoxicity Effects 0.000 description 1
- 238000001311 chemical methods and process Methods 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 238000013377 clone selection method Methods 0.000 description 1
- 239000013599 cloning vector Substances 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 230000024203 complement activation Effects 0.000 description 1
- 238000005094 computer simulation Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 235000018417 cysteine Nutrition 0.000 description 1
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 description 1
- 210000000805 cytoplasm Anatomy 0.000 description 1
- SUYVUBYJARFZHO-RRKCRQDMSA-N dATP Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-N 0.000 description 1
- SUYVUBYJARFZHO-UHFFFAOYSA-N dATP Natural products C1=NC=2C(N)=NC=NC=2N1C1CC(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-UHFFFAOYSA-N 0.000 description 1
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 1
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000002050 diffraction method Methods 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 108010069898 fibrinogen fragment X Proteins 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 230000005714 functional activity Effects 0.000 description 1
- 238000011207 functional examination Methods 0.000 description 1
- 108010083914 gammaB crystallin Proteins 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 230000008303 genetic mechanism Effects 0.000 description 1
- 230000008826 genomic mutation Effects 0.000 description 1
- 239000011521 glass Substances 0.000 description 1
- 239000003292 glue Substances 0.000 description 1
- 235000013922 glutamic acid Nutrition 0.000 description 1
- 239000004220 glutamic acid Substances 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 239000012216 imaging agent Substances 0.000 description 1
- 230000036039 immunity Effects 0.000 description 1
- 238000003018 immunoassay Methods 0.000 description 1
- 230000005847 immunogenicity Effects 0.000 description 1
- 238000005462 in vivo assay Methods 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 1
- 229960000310 isoleucine Drugs 0.000 description 1
- 210000003734 kidney Anatomy 0.000 description 1
- 238000002898 library design Methods 0.000 description 1
- 230000029226 lipidation Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 238000011068 loading method Methods 0.000 description 1
- 230000007257 malfunction Effects 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000008018 melting Effects 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- MYWUZJCMWCOHBA-VIFPVBQESA-N methamphetamine Chemical compound CN[C@@H](C)CC1=CC=CC=C1 MYWUZJCMWCOHBA-VIFPVBQESA-N 0.000 description 1
- 229930182817 methionine Natural products 0.000 description 1
- 230000011987 methylation Effects 0.000 description 1
- 238000007069 methylation reaction Methods 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 201000000050 myeloid neoplasm Diseases 0.000 description 1
- 238000001668 nucleic acid synthesis Methods 0.000 description 1
- 230000005257 nucleotidylation Effects 0.000 description 1
- 101150087557 omcB gene Proteins 0.000 description 1
- 101150115693 ompA gene Proteins 0.000 description 1
- 210000001672 ovary Anatomy 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 101150040383 pel2 gene Proteins 0.000 description 1
- 101150050446 pelB gene Proteins 0.000 description 1
- 210000001322 periplasm Anatomy 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 101150009573 phoA gene Proteins 0.000 description 1
- 238000007747 plating Methods 0.000 description 1
- 231100000614 poison Toxicity 0.000 description 1
- 239000002574 poison Substances 0.000 description 1
- 230000001124 posttranscriptional effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 210000001236 prokaryotic cell Anatomy 0.000 description 1
- 230000035755 proliferation Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000006916 protein interaction Effects 0.000 description 1
- 230000002797 proteolytic effect Effects 0.000 description 1
- 230000005180 public health Effects 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 238000002708 random mutagenesis Methods 0.000 description 1
- 230000008707 rearrangement Effects 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 230000033458 reproduction Effects 0.000 description 1
- 238000003757 reverse transcription PCR Methods 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 238000007423 screening assay Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 239000001509 sodium citrate Substances 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 238000012289 standard assay Methods 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 238000002198 surface plasmon resonance spectroscopy Methods 0.000 description 1
- 108010013645 tetranectin Proteins 0.000 description 1
- 229940124597 therapeutic agent Drugs 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-K thiophosphate Chemical compound [O-]P([O-])([O-])=S RYYWUUFWQRZTIU-UHFFFAOYSA-K 0.000 description 1
- 108060008226 thioredoxin Proteins 0.000 description 1
- 229940094937 thioredoxin Drugs 0.000 description 1
- 210000001519 tissue Anatomy 0.000 description 1
- 231100000331 toxic Toxicity 0.000 description 1
- 230000002588 toxic effect Effects 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 231100000765 toxin Toxicity 0.000 description 1
- 108700012359 toxins Proteins 0.000 description 1
- 239000001226 triphosphate Substances 0.000 description 1
- 235000011178 triphosphate Nutrition 0.000 description 1
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 1
- HRXKRNGNAMMEHJ-UHFFFAOYSA-K trisodium citrate Chemical compound [Na+].[Na+].[Na+].[O-]C(=O)CC(O)(CC([O-])=O)C([O-])=O HRXKRNGNAMMEHJ-UHFFFAOYSA-K 0.000 description 1
- 229940038773 trisodium citrate Drugs 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
- 108700026220 vif Genes Proteins 0.000 description 1
- 239000011534 wash buffer Substances 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
Classifications
- C—CHEMISTRY; METALLURGY
  - C07—ORGANIC CHEMISTRY
    - C07K—PEPTIDES
      - C07K16/00—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies
        - C07K16/18—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against material from animals or humans
        - C07K16/005—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies constructed by phage libraries
        - C07K16/40—Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies against enzymes
      - C07K2317/00—Immunoglobulins specific features
        - C07K2317/20—Immunoglobulins specific features characterized by taxonomic origin
          - C07K2317/21—Immunoglobulins specific features characterized by taxonomic origin from primates, e.g. man
        - C07K2317/50—Immunoglobulins specific features characterized by immunoglobulin fragments
          - C07K2317/56—Immunoglobulins specific features characterized by immunoglobulin fragments variable (Fv) region, i.e. VH and/or VL
            - C07K2317/565—Complementarity determining region [CDR]
            - C07K2317/567—Framework region [FR]
  - C40—COMBINATORIAL TECHNOLOGY
    - C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
      - C40B40/00—Libraries per se, e.g. arrays, mixtures
        - C40B40/04—Libraries containing only organic compounds
          - C40B40/10—Libraries containing peptides or polypeptides, or derivatives thereof
      - C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
        - C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Definitions
- Antibodies have profound relevance as research tools and in diagnostic and therapeutic applications. However, the identification of useful antibodies is difficult and once identified, antibodies often require considerable redesign or ‘humanization’ before they are suitable for therapeutic applications.
- a physical realization of a library capable of screening 10^12 library members will only sample about 10% of the sequences contained in a library with 10^13 members.
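The 10% figure is screening capacity divided by library size. A minimal Python sketch, assuming every library member is equally abundant; the second function is our addition, not the source's, and estimates the fraction of *distinct* members actually seen if the 10^12 screened clones are drawn uniformly at random with replacement (a Poisson approximation).

```python
import math

def sampled_fraction(screened: float, library_size: float) -> float:
    """Upper bound on the fraction of library members a screen can cover."""
    return screened / library_size

def expected_unique_fraction(screened: float, library_size: float) -> float:
    """Expected fraction of distinct members observed when clones are drawn
    uniformly at random with replacement (Poisson approximation, an assumption)."""
    return 1.0 - math.exp(-screened / library_size)

print(sampled_fraction(1e12, 1e13))                     # 0.1
print(round(expected_unique_fraction(1e12, 1e13), 3))  # 0.095
```

Under random sampling the distinct coverage is slightly below the simple ratio, which is consistent with the "about 10%" wording.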
- a median CDRH3 length of about 12.7 amino acids (Rock et al., J. Exp. Med., 1994, 179:323-328)
- the number of theoretical sequence variants in CDRH3 alone is about 20^12.7, or about 3.3 × 10^16 variants. This number does not account for known variation in CDRH1 and CDRH2, in the heavy chain framework regions, or from pairing with different light chains, each of which also exhibits variation in its respective CDRL1, CDRL2, and CDRL3.
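The 3.3 × 10^16 figure is the 20-letter amino acid alphabet raised to the median CDRH3 length. A minimal Python sketch of that arithmetic; the fractional exponent is the shorthand used in the text (real CDRH3s have integer lengths), so treat the result as an order-of-magnitude estimate. The names `theoretical_variants` and `MEDIAN_CDRH3_LENGTH` are ours.

```python
import math

AMINO_ACIDS = 20              # size of the standard amino acid alphabet
MEDIAN_CDRH3_LENGTH = 12.7    # amino acids, as cited above

def theoretical_variants(alphabet_size: int, length: float) -> float:
    """Number of possible sequences of the given (possibly fractional) length."""
    return alphabet_size ** length

n = theoretical_variants(AMINO_ACIDS, MEDIAN_CDRH3_LENGTH)
print(f"{n:.1e}")                # 3.3e+16
print(round(math.log10(n), 2))   # 16.52
```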
- the antibodies isolated from these libraries are often not amenable to rational affinity maturation techniques to improve the binding of the candidate molecule.
- the present invention relates to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, and kits and computer-readable forms including the libraries of the invention.
- the libraries of the invention are designed to reflect the preimmune repertoire naturally created by the human immune system and are based on rational design informed by examination of publicly available databases of human antibody sequences. It will be appreciated that certain non-limiting embodiments of the invention are described below. As described throughout the specification, the invention encompasses many other embodiments as well.
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least about 10^6 unique antibody CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:
- the invention comprises a library, wherein said library consists essentially of a plurality of polynucleotides encoding CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:
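Whether a sequence is "at least about 80% identical" depends on how identity is computed, which this excerpt does not specify. A minimal sketch, assuming two equal-length sequences compared position by position with no gaps; real analyses usually align the sequences first, which this sketch does not do. The function names and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-wise percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def meets_threshold(a: str, b: str, threshold: float = 80.0) -> bool:
    """True if the ungapped identity is at or above the threshold (in percent)."""
    return percent_identity(a, b) >= threshold

# Hypothetical 10-residue sequences that differ at two positions.
print(percent_identity("ARGGYYFDYW", "ARGSYYFDAW"))  # 80.0
print(meets_threshold("ARGGYYFDYW", "ARGSYYFDAW"))   # True
```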
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode one or more full-length antibody heavy chain sequences, and wherein the CDRH3 amino acid sequences of the heavy chain comprise:
- one or more CDRH3 amino acid sequences further comprise an N-terminal tail residue.
- the N-terminal tail residue is selected from the group consisting of G, D, and E.
- the N1 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT,
- the N2 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT,
- the H3-JH amino acid sequence is selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), Y
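The H3-JH options listed above follow a visible pattern: each germline-derived segment appears together with its successive N-terminal truncations (e.g., AEYFQH, EYFQH, YFQH, FQH, QH, H). A minimal Python sketch of generating such a list is shown below. The full-length segments are taken from the list above (with YYYYYGMDV assumed to be the full-length IGHJ6-derived form), and the sketch does not claim that every truncation of every segment belongs to the claimed design.

```python
def n_terminal_truncations(segment: str) -> list[str]:
    """Return `segment` followed by each successively N-terminally trimmed form."""
    # Drop one residue from the N-terminus at a time, down to the last residue.
    return [segment[i:] for i in range(len(segment))]


# Full-length H3-JH segments taken from the list above (illustrative input).
FULL_LENGTH_H3_JH = ["AEYFQH", "YWYFDL", "AFDV", "YFDY", "NWFDS", "YYYYYGMDV"]

if __name__ == "__main__":
    for seg in FULL_LENGTH_H3_JH:
        print(seg, "->", ", ".join(n_terminal_truncations(seg)))
```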
- the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+1 pairs in the library is within the ranges specified below:
- the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+2 pairs in the library is within the ranges specified below:
- the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+3 pairs in the library is within the ranges specified below:
- At least 2, 3, 4, 5, 6, or 7 of the specified i-i+1 pairs in the library are within the specified ranges.
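The percent occurrence of residue pairs at positions i and i+k can be computed directly from a set of loop sequences. The sketch below assumes (hypothetically) that percent occurrence is normalized over all i/i+k pairs observed across the library and that the caller has already extracted the central-loop portion of each CDRH3; neither the normalization nor the central-loop boundaries are defined in this excerpt, and the range table is a placeholder rather than the specified ranges.

```python
from collections import Counter


def pair_occurrence(loops: list[str], k: int = 1) -> dict[tuple[str, str], float]:
    """Percent occurrence of (residue at i, residue at i+k) pairs across all loops."""
    counts: Counter = Counter()
    for loop in loops:
        for i in range(len(loop) - k):
            counts[(loop[i], loop[i + k])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in counts.items()}


def pairs_within_ranges(
    occurrence: dict[tuple[str, str], float],
    ranges: dict[tuple[str, str], tuple[float, float]],
) -> list[tuple[str, str]]:
    """Return the specified pairs whose percent occurrence lies in [low, high]."""
    return [
        pair
        for pair, (low, high) in ranges.items()
        if low <= occurrence.get(pair, 0.0) <= high
    ]


if __name__ == "__main__":
    loops = ["GSY", "GS"]  # hypothetical central-loop sequences
    occ = pair_occurrence(loops, k=1)
    hypothetical_ranges = {("G", "S"): (50.0, 80.0), ("S", "Y"): (0.0, 10.0)}
    hits = pairs_within_ranges(occ, hypothetical_ranges)
    # An "at least N pairs within range" criterion becomes len(hits) >= N.
    print(occ, hits, len(hits) >= 2)
```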
- the CDRH3 amino acid sequences are human.
- the polynucleotides encode at least about 10^6 unique CDRH3 amino acid sequences.
- the polynucleotides further encode one or more heavy chain chassis amino acid sequences that are N-terminal to the CDRH3 amino acid sequences, and the one or more heavy chain chassis sequences are selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 94 encoded by IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NOs: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NOs: 431, 432), IGHV3-7 (SEQ ID NO: 24),
- the polynucleotides further encode one or more FRM4 amino acid sequences that are C-terminal to the CDRH3 amino acid sequences, wherein the one or more FRM4 amino acid sequences are selected from the group consisting of a FRM4 amino acid sequence encoded by IGHJ1 (SEQ ID NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: 255), IGHJ4 (SEQ ID NO: 256), IGHJ5 (SEQ ID NO: 257), and IGHJ6 (SEQ ID NO: 258), or a sequence of at least about 80% identity to any of them.
- the polynucleotides further encode one or more immunoglobulin heavy chain constant region amino acid sequences that are C-terminal to the FRM4 sequence.
- the CDRH3 amino acid sequences are expressed as part of full-length heavy chains.
- the full-length heavy chains are selected from the group consisting of an IgG1, IgG2, IgG3, and IgG4, or combinations thereof.
- the CDRH3 amino acid sequences are from about 2 to about 30, from about 8 to about 19, or from about 10 to about 18 amino acid residues in length.
- the synthetic polynucleotides of the library encode from about 10^6 to about 10^14, from about 10^7 to about 10^13, from about 10^8 to about 10^12, from about 10^9 to about 10^12, or from about 10^10 to about 10^12 unique CDRH3 amino acid sequences.
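Because each CDRH3 is assembled from [Tail]-[N1]-[DH]-[N2]-[H3-JH] segments, the number of segment combinations is the product of the number of options for each segment. Different combinations can, however, spell the same amino acid sequence, so the number of unique CDRH3s may be smaller than that product. The toy sketch below illustrates this with hypothetical, very small segment sets (an empty string stands for an absent segment); at the library sizes recited above one would count combinations rather than materialize them.

```python
from itertools import product
from math import prod


def enumerate_cdrh3(*segment_sets: list[str]) -> set[str]:
    """Concatenate one option from each segment set, in order; return unique CDRH3s."""
    return {"".join(parts) for parts in product(*segment_sets)}


# Hypothetical toy segment sets; "" means the segment is absent.
tails = ["", "G"]
n1 = ["", "G"]
dh = ["YSS"]
n2 = ["", "P"]
h3_jh = ["FDY", "DY"]

if __name__ == "__main__":
    segments = [tails, n1, dh, n2, h3_jh]
    combinations = prod(len(s) for s in segments)
    unique = enumerate_cdrh3(*segments)
    # Tail "G" + empty N1 and empty tail + N1 "G" spell the same sequence,
    # so len(unique) is smaller than the raw number of combinations.
    print(combinations, len(unique))
```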
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody VKCDR3 amino acid sequences comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 amino acid sequences derived from a particular IGKV or IGKJ germline sequence.
- the synthetic polynucleotides encode one or more of the amino acid sequences listed in Table 33 or a sequence at least about 80% identical to any of them.
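The "at least about 80% identical" criterion used throughout can be illustrated with a simple ungapped identity calculation. This sketch assumes the sequences being compared have equal length and are compared position by position; the excerpt does not define how identity is computed (alignment method, gap handling), and the example sequences are hypothetical rather than entries from Table 33.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped identity requires equal-length sequences")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 80.0) -> bool:
    """True if the ungapped identity of a and b is at least `threshold` percent."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical VKCDR3-like sequences differing at one of nine positions.
    print(percent_identity("QQSYSTPLT", "QQSYSTPWT"))
    print(meets_threshold("QQSYSTPLT", "QQSYSTPWT"))
```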
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of unique antibody VKCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:
- X may be selected from the group consisting of F, L, I, R, W, Y, and P.
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of VλCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:
- the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody proteins comprising:
- the VKCDR3 amino acid sequence comprises one or more of the sequences listed in Table 33 or a sequence at least about 80% identical to any of them.
- the antibody proteins are expressed in a heterodimeric form.
- the human antibody proteins are expressed as antibody fragments.
- the antibody fragments are selected from the group consisting of Fab, Fab′, F(ab′)2, Fv fragments, diabodies, linear antibodies, and single-chain antibodies.
- the polynucleotides further comprise a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination.
- the polynucleotides further encode an alternative scaffold.
- the invention comprises a library of polypeptides encoded by any of the synthetic polynucleotide libraries described herein.
- the doubling time of the population of cells is from about 1 to about 3 hours, from about 3 to about 8 hours, from about 8 to about 16 hours, from about 16 to about 20 hours, or from about 20 to about 30 hours.
- the cells are yeast cells.
- the yeast is Saccharomyces cerevisiae.
- the invention comprises a library that has a theoretical total diversity of N unique CDRH3 sequences, wherein N is about 10^6 to about 10^15; and wherein the physical realization of the theoretical total CDRH3 diversity has a size of at least about 3N, thereby providing a probability of at least about 95% that any individual CDRH3 sequence contained within the theoretical total diversity of the library is present in the actual library.
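The 3N / 95% figure follows from a simple sampling argument: if each of N equally represented sequences is drawn uniformly at random, the chance that a given sequence is missed in M draws is (1 - 1/N)^M ≈ e^(-M/N), so M = 3N gives a presence probability of about 1 - e^(-3) ≈ 0.95. The sketch below assumes uniform representation of all N sequences, which a real library only approximates.

```python
import math


def presence_probability(n_theoretical: int, n_physical: int) -> float:
    """Probability that one specific sequence appears at least once when
    n_physical members are drawn uniformly from n_theoretical sequences."""
    if n_theoretical < 1 or n_physical < 0:
        raise ValueError("need n_theoretical >= 1 and n_physical >= 0")
    if n_theoretical == 1:
        return 1.0 if n_physical > 0 else 0.0
    # 1 - (1 - 1/N)**M, computed via log1p/expm1 to stay accurate for huge N.
    return -math.expm1(n_physical * math.log1p(-1.0 / n_theoretical))


def oversampling_for(probability: float) -> float:
    """Approximate M/N needed so a given sequence is present with `probability`."""
    # Solves 1 - exp(-M/N) = p for M/N.
    return -math.log1p(-probability)


if __name__ == "__main__":
    n = 10**6
    print(presence_probability(n, 3 * n))  # close to 1 - e^-3
    print(oversampling_for(0.95))          # close to 3
```

The same function applies to the presence probabilities discussed for CDRH3 and CDRL3 libraries under Library Sampling, subject to the same uniformity assumption.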
- the invention relates to a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the library has a theoretical total diversity of about 10^6 to about 10^15 unique CDRH3 sequences.
- the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody VK amino acid sequences, the method comprising:
- the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody light chain CDR3 sequences, the method comprising:
- the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody Vλ amino acid sequences, the method comprising:
- the invention comprises a method of preparing the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, the method comprising:
- one or more of the polynucleotide sequences are synthesized via split-pool synthesis.
- the method of the invention further comprises the step of providing a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination.
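Providing 5′ and 3′ sequences that facilitate homologous recombination amounts to flanking each insert with sequence shared with the recipient vector on either side of the insertion site (compare FIG. 1). The sketch below builds such a flanked insert from hypothetical vector sequences; the function name and the arm length are illustrative choices, and no specific arm length or vector sequence is taken from this document.

```python
def add_homology_arms(
    insert: str, vector_upstream: str, vector_downstream: str, arm_len: int
) -> str:
    """Flank `insert` with the last `arm_len` nt of the vector region upstream of
    the insertion site and the first `arm_len` nt of the region downstream."""
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if arm_len > min(len(vector_upstream), len(vector_downstream)):
        raise ValueError("arm_len exceeds the available vector sequence")
    return vector_upstream[-arm_len:] + insert + vector_downstream[:arm_len]


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    upstream = "ACGTACGTTTGGCC"
    downstream = "GGATCCAAGCTTAC"
    print(add_homology_arms("NNNNNN", upstream, downstream, arm_len=6))
```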
- the method of the invention further comprises the step of recombining the assembled synthetic polynucleotides with a vector comprising a heavy chain chassis and a heavy chain constant region, to form a full-length heavy chain.
- the step of recombining is performed in yeast.
- the yeast is S. cerevisiae.
- the invention comprises a method of isolating one or more host cells expressing one or more antibodies, the method comprising:
- the method of the invention further comprises the step of isolating one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens.
- the method of the invention further comprises the step of isolating one or more polynucleotide sequences encoding one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens.
- the invention comprises a kit comprising the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, or any of the other sequences disclosed herein.
- the CDRH3 amino acid sequences encoded by the libraries of synthetic polynucleotides described herein, or any of the other sequences disclosed herein, are in computer readable form.
- FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis and constant region) for the construction of a library.
- FIG. 3 depicts the length distribution of the CDRL3 regions of rearranged human kappa light chain sequences compiled from the NCBI database (Appendix A).
- FIG. 4 depicts the length distribution of the CDRL3 regions of rearranged human lambda light chain sequences compiled from the NCBI database (Appendix B).
- FIG. 5 depicts a schematic representation of the 424 cloning vectors used in the synthesis of the CDRH3 regions before and after ligation of the [DH]-[N2]-[JH] segment (DTAVYYCAR: SEQ ID NO: 579; DTAVYYCAK: SEQ ID NO: 578; SSASTK: SEQ ID NO: 580).
- FIG. 7 depicts a schematic diagram of a CDRH3 integrated into a heavy chain vector and the polynucleotide and polypeptide sequences of CDRH3 (amino acid: SEQ ID NO: 847; coding strand: SEQ ID NO: 581; complementary strand: SEQ ID NO: 845).
- FIG. 8 depicts a schematic structure of a kappa light chain vector, prior to recombination with a CDRL3.
- FIG. 9 depicts a schematic diagram of a CDRL3 integrated into a light chain vector and the polynucleotide and polypeptide sequences of CDRL3 (amino acid: SEQ ID NO: 848; coding strand: SEQ ID NO: 582; complementary strand: SEQ ID NO: 846).
- FIG. 10 depicts the length distribution of the CDRH3 domain (Kabat positions 95-102) from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 11 depicts the length distribution of the DH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 12 depicts the length distribution of the N2 segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 13 depicts the length distribution of the H3-JH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 14 depicts the length distribution of the CDRH3 domains from 291 sequences prepared from yeast cells transformed according to the method outlined in Example 10.4, namely the co-transformation of vectors containing heavy chain chassis and constant regions with a CDRH3 insert (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 15 depicts the length distribution of the [Tail]-[N1] region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the expected (i.e., designed) distribution.
- FIG. 16 depicts the length distribution of the DH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
- FIG. 17 depicts the length distribution of the N2 region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
- FIG. 18 depicts the length distribution of the H3-JH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
- FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences (observed), as compared to the theoretical (i.e., designed) familial origin.
- FIG. 20 depicts the representation of each of the 16 chassis of the library (observed), as compared to the theoretical (i.e., designed) chassis representation.
- VH3-23 is represented twice; once ending in CAR and once ending in CAK. These representations were combined, as were the ten variants of VH3-33 with one variant of VH3-30.
- FIG. 21 depicts a comparison of the CDRL3 length from 86 sequences selected from the VKCDR3 library of Example 6.2 (observed) to human sequences (human) and the designed sequences (designed).
- FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in an exemplary library of the invention, versus the preimmune repertoire of Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety).
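Several of the figures above (e.g., FIGS. 10-18 and 23) compare observed length distributions against designed or reference distributions. The sketch below shows one way such a comparison could be computed: normalize sequence lengths into a frequency table and summarize the mismatch with the total variation distance. Total variation distance is an illustrative choice made here, not a statistic taken from the figures, and the input sequences are hypothetical.

```python
from collections import Counter


def length_distribution(seqs: list[str]) -> dict[int, float]:
    """Fraction of sequences at each length, sorted by length."""
    counts = Counter(len(s) for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Half the L1 distance between two length distributions (0 = identical)."""
    lengths = set(p) | set(q)
    return 0.5 * sum(abs(p.get(n, 0.0) - q.get(n, 0.0)) for n in lengths)


if __name__ == "__main__":
    observed = length_distribution(["ARDYW", "ARGGYFDY", "ARDRGYFDY", "ARSSW"])
    designed = {5: 0.25, 8: 0.25, 9: 0.5}  # hypothetical designed distribution
    print(observed, total_variation(observed, designed))
```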
- the present invention is directed to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, kits and computer readable forms including the libraries of the invention.
- the libraries taught in this application are described, at least in part, in terms of the components from which they are assembled.
- the instant invention provides antibody libraries specifically designed based on the composition and CDR length distribution in the naturally occurring human antibody repertoire. It is estimated that, even in the absence of antigenic stimulation, a human makes at least about 10^7 different antibody molecules.
- the antigen-binding sites of many antibodies can cross-react with a variety of related but different epitopes.
- the human antibody repertoire is large enough to ensure that there is an antigen-binding site to fit almost any potential epitope, albeit with low affinity.
- the mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way, by combinatorially joining chromosomally separated gene segments prior to transcription.
- Each type of immunoglobulin (Ig) chain, i.e., κ light, λ light, and heavy
- the heavy chains and light chains each consist of a variable region and a constant (C) region.
- the variable regions of the heavy chains are encoded by DNA sequences assembled from three families of gene segments: variable (IGHV), joining (IGHJ) and diversity (IGHD).
- After a B cell recognizes an antigen, it is induced to proliferate. During proliferation, the B cell receptor locus undergoes an extremely high rate of somatic mutation that is far greater than the normal rate of genomic mutation. The mutations that occur are primarily localized to the Ig variable regions and comprise substitutions, insertions and deletions. This somatic hypermutation enables the production of B cells that express antibodies possessing enhanced affinity toward an antigen. Such antigen-driven somatic hypermutation fine-tunes antibody responses to a given antigen.
- the present invention provides, for the first time, a fully synthetic antibody library that is representative of the human preimmune antibody repertoire (e.g., in composition and length), and that can be readily screened (i.e., it is physically realizable and, in some cases, can be oversampled) using, for example, high-throughput methods, to obtain, for example, new therapeutics and/or diagnostics.
- the synthetic antibody libraries of the instant invention have the potential to recognize any antigen, including self-antigens of human origin.
- the ability to recognize self-antigens is usually lost in an expressed human library, because self-reactive antibodies are removed by the donor's immune system via negative selection.
- Another feature of the invention is that screening the antibody library using positive clone selection, for example, by FACS (fluorescence-activated cell sorter), bypasses the standard and tedious methodology of generating a hybridoma library and supernatant screening.
- the libraries, or sub-libraries thereof, can be screened multiple times to discover additional antibodies against other desired targets.
- antibody is used herein in the broadest sense and specifically encompasses at least monoclonal antibodies, polyclonal antibodies, multi-specific antibodies (e.g., bispecific antibodies), chimeric antibodies, humanized antibodies, human antibodies, and antibody fragments.
- An antibody is a protein comprising one or more polypeptides substantially or partially encoded by immunoglobulin genes or fragments of immunoglobulin genes.
- the recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes.
- Antibody fragments comprise a portion of an intact antibody, for example, one or more portions of the antigen-binding region thereof.
- antibody fragments include Fab, Fab′, F(ab′)2, and Fv fragments, diabodies, linear antibodies, single-chain antibodies, and multi-specific antibodies formed from intact antibodies and antibody fragments.
- variable refers to the portions of the immunoglobulin domains that exhibit variability in their sequence and that are involved in determining the specificity and binding affinity of a particular antibody (i.e., the “variable domain(s)”). Variability is not evenly distributed throughout the variable domains of antibodies; it is concentrated in sub-domains of each of the heavy and light chain variable regions. These sub-domains are called “hypervariable” regions or “complementarity determining regions” (CDRs). The more conserved (i.e., non-hypervariable) portions of the variable domains are called the “framework” regions (FRM).
- variable domains of naturally occurring heavy and light chains each comprise four FRM regions, largely adopting a β-sheet configuration, connected by three hypervariable regions, which form loops connecting, and in some cases forming part of, the β-sheet structure.
- the hypervariable regions in each chain are held together in close proximity by the FRM and, with the hypervariable regions from the other chain, contribute to the formation of the antigen-binding site (see Kabat et al. Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md., 1991, incorporated by reference in its entirety).
- the constant domains are not directly involved in antigen binding, but exhibit various effector functions, such as, for example, antibody-dependent, cell-mediated cytotoxicity and complement activation.
- the “chassis” of the invention represent a portion of the antibody heavy chain variable (IGHV) or light chain variable (IGLV) domains that are not part of CDRH3 or CDRL3, respectively.
- the chassis of the invention is defined as the portion of the variable region of an antibody beginning with the first amino acid of FRM1 and ending with the last amino acid of FRM3.
- the chassis includes amino acids from about Kabat position 1 to about Kabat position 94.
- the chassis are defined as including from about Kabat position 1 to about Kabat position 88.
- the chassis of the invention may contain certain modifications relative to the corresponding germline variable domain sequences presented herein or available in public databases.
- modifications may be engineered (e.g., to remove N-linked glycosylation sites) or naturally occurring (e.g., to account for allelic variation).
- immunoglobulin gene repertoire is polymorphic (Wang et al., Immunol. Cell. Biol., 2008, 86: 111; Collins et al., Immunogenetics, 2008, DOI 10.1007/s00251-008-0325-z, published online, each incorporated by reference in its entirety); chassis, CDRs (e.g., CDRH3) and constant regions representative of these allelic variants are also encompassed by the invention.
- one, two or three nucleotides may follow the heavy chain chassis, forming either a partial (if one or two) or a complete (if three) codon. When a full codon is present, these nucleotides encode an amino acid residue that is referred to as the “tail,” and occupies position 95.
- CDRH3 numbering system used herein defines the first amino acid of CDRH3 as being at Kabat position 95 (the “tail,” when present) and the last amino acid of CDRH3 as position 102.
- the amino acids following the “tail” are called “N1” and, when present, are assigned numbers 96, 96A, 96B, etc.
- the N1 segment is followed by the “DH” segment, which is assigned numbers 97, 97A, 97B, 97C, etc.
- the DH segment is followed by the “N2” segment, which, when present, is numbered 98, 98A, 98B, etc.
- the most C-terminal amino acid residue of the “H3-JH” segment is designated as number 102.
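The CDRH3 numbering convention described above (tail at 95, N1 at 96, 96A, ..., DH at 97, 97A, ..., N2 at 98, 98A, ..., and the last H3-JH residue at 102) can be expressed as a small labeling routine. This sketch covers only the positions stated here; the excerpt does not specify labels for the H3-JH residues preceding position 102, so the sketch leaves them unlabeled (None) rather than guess. The function names and example segments are hypothetical.

```python
from string import ascii_uppercase
from typing import Optional


def insertion_labels(base: int, n: int) -> list[str]:
    """Labels for n residues sharing one base position: '96', '96A', '96B', ..."""
    if n == 0:
        return []
    if n - 1 > len(ascii_uppercase):
        raise ValueError("too many residues for single-letter insertion codes")
    return [str(base)] + [f"{base}{ascii_uppercase[i]}" for i in range(n - 1)]


def number_cdrh3(
    tail: str, n1: str, dh: str, n2: str, h3_jh: str
) -> list[tuple[Optional[str], str]]:
    """Pair each CDRH3 residue with its position label (None where unspecified)."""
    if len(tail) > 1:
        raise ValueError("the tail is a single residue (position 95) or absent")
    if not h3_jh:
        raise ValueError("an H3-JH segment is required (its last residue is 102)")
    labels: list[Optional[str]] = ["95"] * len(tail)
    labels += insertion_labels(96, len(n1))
    labels += insertion_labels(97, len(dh))
    labels += insertion_labels(98, len(n2))
    # Only the most C-terminal H3-JH residue is numbered (102) in this excerpt.
    labels += [None] * (len(h3_jh) - 1) + ["102"]
    return list(zip(labels, tail + n1 + dh + n2 + h3_jh))


if __name__ == "__main__":
    print(number_cdrh3("G", "PR", "YSS", "", "FDY"))
```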
- sequence diversity refers to a variety of sequences which are collectively representative of several possibilities of sequences, for example, those found in natural human antibodies.
- heavy chain CDR3 (CDRH3) sequence diversity may refer to a variety of possibilities of combining the known human DH and H3-JH segments, including the N1 and N2 regions, to form heavy chain CDR3 sequences.
- the light chain CDR3 (CDRL3) sequence diversity may refer to a variety of possibilities of combining the naturally occurring light chain variable region contributing to CDRL3 (i.e., L3-VL) and joining (i.e., L3-JL) segments, to form light chain CDR3 sequences.
- H3-JH refers to the portion of the IGHJ gene contributing to CDRH3.
- L3-VL and L3-JL refer to the portions of the IGLV and IGLJ genes (kappa or lambda) contributing to CDRL3, respectively.
- a sequence designed with “directed diversity” has been specifically designed to contain both sequence diversity and length diversity. Directed diversity is not stochastic.
- stochastic describes a process of generating a randomly determined sequence of amino acids, which is considered as a sample of one element from a probability distribution.
- library of polynucleotides refers to two or more polynucleotides having a diversity as described herein, specifically designed according to the methods of the invention.
- library of polypeptides refers to two or more polypeptides having a diversity as described herein, specifically designed according to the methods of the invention.
- library of synthetic polynucleotides refers to a polynucleotide library that includes synthetic polynucleotides.
- library of vectors refers herein to a library of at least two different vectors.
- human antibody libraries at least includes a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies.
- library is used herein in its broadest sense, and also may include the sub-libraries that may or may not be combined to produce libraries of the invention.
- synthetic polynucleotide refers to a molecule formed through a chemical process, as opposed to molecules of natural origin, or molecules derived via template-based amplification of molecules of natural origin (e.g., immunoglobulin chains cloned from populations of B cells via PCR amplification are not “synthetic” as used herein).
- for libraries of the invention that comprise multiple components (e.g., N1, DH, N2, and/or H3-JH), the invention encompasses libraries in which at least one of the aforementioned components is synthetic.
- a library in which certain components are synthetic, while other components are of natural origin or derived via template-based amplification of molecules of natural origin would be encompassed by the invention.
- split-pool synthesis refers to a procedure in which the products of a plurality of first reactions are combined (pooled) and then separated (split) before participating in a plurality of second reactions.
- Example 9 describes the synthesis of 278 DH segments (products), each in a separate reaction. After synthesis, these 278 segments are combined (pooled) and then distributed (split) amongst 141 columns for the synthesis of the N2 segments. This enables the pairing of each of the 278 DH segments with each of the 141 N2 segments. As described elsewhere in the specification, these numbers are non-limiting.
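The split-pool step in Example 9 can be modeled as: pool every first-round product, split the pool across one column per second-round segment, and extend the pooled products in each column. The sketch below tracks only which combinations become possible; it uses hypothetical placeholder names for the 278 DH and 141 N2 segments and represents each addition as string concatenation (segment order only, not the synthesis chemistry).

```python
def split_pool(first_round: list[str], second_round: list[str]) -> dict[str, list[str]]:
    """Pool all first-round products, then split the pool into one column per
    second-round segment and extend every pooled product in that column."""
    pool = list(first_round)  # pooling: all first-round products mixed together
    columns: dict[str, list[str]] = {}
    for segment in second_round:  # splitting: one column per second-round segment
        columns[segment] = [product + segment for product in pool]
    return columns


if __name__ == "__main__":
    dh_segments = [f"DH{i}" for i in range(278)]   # placeholders for 278 DH products
    n2_segments = [f"_N2{j}" for j in range(141)]  # placeholders for 141 N2 columns
    columns = split_pool(dh_segments, n2_segments)
    total = sum(len(c) for c in columns.values())
    print(len(columns), total)  # 141 columns, 278 * 141 DH/N2 pairings
```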
- Preimmune antibody libraries have similar sequence diversities and length diversities to naturally occurring human antibody sequences before these sequences have undergone negative selection or somatic hypermutation.
- the set of sequences described in Lee et al. is believed to represent sequences from the preimmune repertoire.
- the sequences of the invention will be similar to these sequences (e.g., in terms of composition and length).
- such antibody libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode antibodies with the potential to recognize any antigen.
- an antibody library comprises about 10^7 to about 10^20 different antibodies and/or polynucleotide sequences encoding the antibodies of the library.
- the libraries of the instant invention are designed to include 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20 different antibodies and/or polynucleotide sequences encoding the antibodies.
- the libraries of the invention may comprise or encode about 10^3 to about 10^5, about 10^5 to about 10^7, about 10^7 to about 10^9, about 10^9 to about 10^11, about 10^11 to about 10^13, about 10^13 to about 10^15, about 10^15 to about 10^17, or about 10^17 to about 10^20 different antibodies.
- the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20 or less than about 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20.
- the antibodies of the present invention may not be present in expressed human libraries, for reasons including that self-reactive antibodies are removed by the donor's immune system via negative selection.
- novel heavy/light chain pairings may in some cases create self-reactive antibody specificity (Griffiths et al. U.S. Pat. No. 5,885,793, incorporated by reference in its entirety).
- the number of unique heavy chains in a library may be about 10, 50, 10^2, 150, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, 10^20, or more.
- the number of unique light chains in a library may be about 5, 10, 25, 50, 10^2, 150, 500, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, 10^20, or more.
- human antibody CDRH3 libraries at least includes a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies. “Preimmune” CDRH3 libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRH3 sequences before these sequences undergo negative selection and somatic hypermutation. Known human CDRH3 sequences are represented in various data sets, including Jackson et al., J. Immunol. Methods, 2007, 324: 26; Martin, Proteins, 1996, 25: 130; and Lee et al., Immunogenetics, 2006, 57: 917, each of which is incorporated by reference in its entirety.
- an antibody library includes about 10^6 to about 10^15 different CDRH3 sequences and/or polynucleotide sequences encoding said CDRH3 sequences.
- the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, or 10^16 or less than about 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, or 10^16.
- the probability of a CDRH3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library).
- the preimmune CDRH3 libraries of the invention may also include CDRH3s directed to, for example, self (i.e., human) antigens. Such CDRH3s may not be present in expressed human libraries, because self-reactive CDRH3s are removed by the donor's immune system via negative selection.
- “VKCDR3” sequences and “VλCDR3” sequences refer to the kappa and lambda sub-sets of the CDRL3 sequences, respectively. These libraries may be designed with directed diversity, to collectively represent the length and sequence diversity of the human antibody CDRL3 repertoire. “Preimmune” versions of these libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRL3 sequences before these sequences undergo negative selection.
- Known human CDRL3 sequences are represented in various data sets, including the NCBI database (see Appendix A and Appendix B for light chain sequence data sets) and Martin, Proteins, 1996, 25: 130 incorporated by reference in its entirety.
- such CDRL3 libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode CDRL3s with the potential to recognize any antigen.
- an antibody library comprises about 10^5 different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences.
- the libraries of the instant invention are designed to comprise about 10^1, 10^2, 10^3, 10^4, 10^6, 10^7, or 10^8 different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences.
- the libraries of the invention may comprise or encode about 10^1 to about 10^3, about 10^3 to about 10^5, or about 10^5 to about 10^8 different CDRL3 sequences.
- the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, or 10^8 or less than about 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, or 10^8.
- the probability of a CDRL3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5% or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library).
- the preimmune CDRL3 libraries of the invention may also include CDRL3s directed to, for example, self (i.e., human) antigens. Such CDRL3s may not be present in expressed human libraries, because self-reactive CDRL3s are removed by the donor's immune system via negative selection.
- known light chain CDR3 sequences refers to light chain CDR3 sequences (e.g., kappa or lambda) in the public domain that have been cloned from populations of human B cells. Examples of such sequences are those published or derived from public data sets, including, for example, the NCBI database (see Appendices A and B filed herewith).
- antibody binding regions refers to one or more portions of an immunoglobulin or antibody variable region capable of binding an antigen(s).
- the antibody binding region is, for example, an antibody light chain (or variable region or one or more CDRs thereof), an antibody heavy chain (or variable region or one or more CDRs thereof), a heavy chain Fd region, a combined antibody light and heavy chain (or variable regions thereof) such as a Fab, F(ab′)₂, single-domain, or single-chain antibodies (scFv), or any region of a full-length antibody that recognizes an antigen, for example, an IgG (e.g., an IgG1, IgG2, IgG3, or IgG4 subtype), IgA1, IgA2, IgD, IgE, or IgM antibody.
- framework region refers to the art-recognized portions of an antibody variable region that exist between the more divergent (i.e., hypervariable) CDRs.
- framework regions are typically referred to as frameworks 1 through 4 (FRM1, FRM2, FRM3, and FRM4) and provide a scaffold for the presentation of the six CDRs (three from the heavy chain and three from the light chain) in three-dimensional space, to form an antigen-binding surface.
- canonical structure refers to the main chain conformation that is adopted by the antigen binding (CDR) loops. Comparative structural studies have found that five of the six antigen binding loops have only a limited repertoire of available conformations. Each canonical structure can be characterized by the torsion angles of the polypeptide backbone. Corresponding loops between antibodies may, therefore, have very similar three-dimensional structures, despite high amino acid sequence variability in most parts of the loops (Chothia and Lesk, J. Mol. Biol., 1987, 196: 901; Chothia et al., Nature, 1989, 342: 877; Martin and Thornton, J. Mol.
- the conformation of a particular canonical class is determined by the length of the loop and the amino acid residues residing at key positions within the loop, as well as within the conserved framework (i.e., outside of the loop). Assignment to a particular canonical class can therefore be made based on the presence of these key amino acid residues.
- the term “canonical structure” may also include considerations as to the linear sequence of the antibody, for example, as catalogued by Kabat (Kabat et al., in “Sequences of Proteins of Immunological Interest,” 5th Edition, U.S.
- Kabat numbering scheme is a widely adopted standard for numbering the amino acid residues of an antibody variable domain in a consistent manner. Additional structural considerations can also be used to determine the canonical structure of an antibody. For example, those differences not fully reflected by Kabat numbering can be described by the numbering system of Chothia et al. and/or revealed by other techniques, for example, crystallography and two- or three-dimensional computational modeling. Accordingly, a given antibody sequence may be placed into a canonical class which allows for, among other things, identifying appropriate chassis sequences (e.g., based on a desire to include a variety of canonical structures in a library). Kabat numbering of antibody amino acid sequences and structural considerations as described by Chothia et al., and their implications for construing canonical aspects of antibody structure, are described in the literature.
- CDR refers to a complementarity determining region (CDR) of which three make up the binding character of a light chain variable region (CDRL1, CDRL2 and CDRL3) and three make up the binding character of a heavy chain variable region (CDRH1, CDRH2 and CDRH3).
- CDRs contribute to the functional activity of an antibody molecule and are separated by amino acid sequences that comprise scaffolding or framework regions.
- the exact definitional CDR boundaries and lengths are subject to different classification and numbering systems. CDRs may therefore be referred to by Kabat, Chothia, contact or any other boundary definitions, including the numbering system described herein.
- amino acid typically refers to an amino acid having its art recognized definition such as an amino acid selected from the group consisting of: alanine (Ala or A); arginine (Arg or R); asparagine (Asn or N); aspartic acid (Asp or D); cysteine (Cys or C); glutamine (Gln or Q); glutamic acid (Glu or E); glycine (Gly or G); histidine (His or H); isoleucine (Ile or I); leucine (Leu or L); lysine (Lys or K); methionine (Met or M); phenylalanine (Phe or F); proline (Pro or P); serine (Ser or S); threonine (Thr or T); tryptophan (Trp or W); tyrosine (Tyr or Y); and valine (Val or V), although modified, synthetic, or rare amino acids may be used as desired.
- polynucleotide(s) refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry).
- the polynucleotides may be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, be modified. Typical modifications include methylation, biotinylation, and other art-known modifications.
- the nucleic acid molecule can be single-stranded or double-stranded and, where desired, linked to a detectable moiety.
- the term “physical realization” refers to a portion of the theoretical diversity that can actually be physically sampled, for example, by any display methodology.
- Exemplary display methodologies include: phage display, ribosomal display, and yeast display.
- the size of the physical realization of a library depends on (1) the fraction of the theoretical diversity that can actually be synthesized, and (2) the limitations of the particular screening method.
- Exemplary limitations of screening methods include the number of variants that can be screened in a particular assay (e.g., ribosome display, phage display, yeast display) and the transformation efficiency of a host cell (e.g., yeast, mammalian cells, bacteria) which is used in a screening assay.
- the term “functionally expressed” refers to those immunoglobulin genes that are expressed by human B cells and that do not contain premature stop codons.
- percent occurrence of each amino acid residue at each position refers to the percentage of instances in a sample in which an amino acid is found at a defined position within a particular sequence. For example, given the following three sequences:
- the term “most frequently occurring amino acids” at a specified position of a sequence in a population of polypeptides refers to the amino acid residues that have the highest percent occurrence at the indicated position in the indicated polypeptide population.
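Because the three example sequences referred to above are not reproduced here, the following sketch uses hypothetical, pre-aligned sequences to illustrate how the percent occurrence of each amino acid, and the most frequently occurring amino acids, at each position might be computed. It assumes all sequences have the same length (for example, CDRs of a single length class); the function names are illustrative.

```python
from collections import Counter


def percent_occurrence(seqs):
    """For each position, map each amino acid to the percentage of sequences
    that carry it at that position. Sequences must be pre-aligned (equal length)."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        table.append({aa: 100.0 * n / len(seqs) for aa, n in counts.items()})
    return table


def most_frequent(table, pos, top=1):
    """Return the `top` amino acids with the highest percent occurrence at `pos`
    (ties broken alphabetically)."""
    ranked = sorted(table[pos].items(), key=lambda item: (-item[1], item[0]))
    return [aa for aa, _ in ranked[:top]]


# Hypothetical sequences, not taken from any data set
seqs = ["QQYNS", "QQYGS", "QHYNT"]
table = percent_occurrence(seqs)
print(table[1])                 # Q in 2 of 3 sequences, H in 1 of 3
print(most_frequent(table, 3))  # ['N']
```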
- the most frequently occurring amino acids in each of the three most N-terminal positions in N1 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 21, and the most frequently occurring amino acids in each of the three most N-terminal positions in N2 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 22.
- a “central loop” of CDRH3 is defined. If the C-terminal 5 amino acids from Kabat CDRH3 (95-102) are removed, then the remaining sequence is termed the “central loop”. Thus, considering the duplet occurrence calculations of Example 13, using a CDRH3 of size 6 or less would not contribute to the analysis of the occurrence of duplets.
- a CDRH3 of size 7 would contribute only to the (i, i+1) data set, a CDRH3 of size 8 would also contribute to the (i, i+2) data set, and a CDRH3 of size 9 or larger would also contribute to the (i, i+3) data set.
- a CDRH3 of size 9 may have amino acids at positions 95-96-97-98-99-100-100A-101-102, but only the first four residues (positions 95-98) would be part of the central loop and contribute to the pair-wise occurrence (duplet) statistics.
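The central-loop rule and the resulting duplet bookkeeping can be sketched as follows. This is a minimal illustration, assuming CDRH3s are given as Kabat CDRH3 (95-102) strings; the function names and the example CDRH3s are hypothetical, and this is not the calculation of Example 13 itself.

```python
from collections import Counter


def central_loop(cdrh3: str, trim: int = 5) -> str:
    """Kabat CDRH3 (positions 95-102) with its C-terminal `trim` residues removed."""
    return cdrh3[: len(cdrh3) - trim] if len(cdrh3) > trim else ""


def duplet_counts(cdrh3s, separation: int) -> Counter:
    """Count residue pairs (i, i + separation) within each central loop."""
    counts = Counter()
    for seq in cdrh3s:
        loop = central_loop(seq)
        for i in range(len(loop) - separation):
            counts[(loop[i], loop[i + separation])] += 1
    return counts


# A hypothetical CDRH3 of size 9: only the first four residues form the central loop
print(central_loop("ARDGYSFDY"))        # ARDG
print(duplet_counts(["ARDGYSFDY"], 3))  # Counter({('A', 'G'): 1})
```

Consistent with the rule above, a size-6 CDRH3 leaves a one-residue central loop and contributes no duplets, while a size-7 CDRH3 contributes only to the (i, i+1) set.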
- the chassis may be designed to encode portions of a polypeptide encoding an antibody fragment or subunit of an antibody fragment, so that a sequence encoding an antibody fragment, or subunit thereof, is produced when the oligonucleotide cassette containing the CDR is recombined with the acceptor vector.
- the invention provides a synthetic, preimmune human antibody repertoire comprising about 10⁷ to about 10²⁰ antibody members, wherein the repertoire comprises:
- the heavy chain chassis may be any sequence with homology to Kabat residues 1 to 94 of an immunoglobulin heavy chain variable domain.
- Non-limiting examples of heavy chain chassis are included in the Examples, and one of ordinary skill in the art will readily recognize that the principles presented therein, and throughout the specification, may be used to derive additional heavy chain chassis.
- the heavy chain chassis region is followed, optionally, by a “tail” region.
- the tail region comprises zero, one, or more amino acids that may or may not be selected on the basis of comparing naturally occurring heavy chain sequences. For example, in certain embodiments of the invention, heavy chain sequences available in the art may be compared, and the residues occurring most frequently in the tail position in the naturally occurring sequences may be included in the library (e.g., to produce sequences that most closely resemble human sequences). In other embodiments, amino acids that are used less frequently may be used. In still other embodiments, amino acids selected from any group of amino acids may be used. In certain embodiments of the invention, the length of the tail is zero (no residue) or one (e.g., G/D/E) amino acid.
- N1 and N2 are two peptide segments derived from nucleotides which are added by TdT in the naturally occurring human antibody repertoire. These segments are designated N1 and N2 (referred to herein as N1 and N2 segments, domains, regions or sequences).
- N1 and N2 are about 0, 1, 2, or 3 amino acids in length. Without being bound by theory, it is thought that these lengths most closely mimic the N1 and N2 lengths found in the human repertoire (see FIG. 2). In other embodiments of the invention, N1 and N2 may be about 4, 5, 6, 7, 8, 9, or 10 amino acids in length.
- the composition of the amino acid residues utilized to produce the N1 and N2 segments may also vary.
- the amino acids used to produce N1 and N2 segments may be selected from amongst the eight most frequently occurring amino acids in the N1 and N2 domains of the human repertoire (e.g., G, R, S, P, L, A, V, and T).
- the amino acids used to produce the N1 and N2 segments may be selected from the group consisting of fewer than about 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, or 3 of the amino acids preferentially encoded by the activity of TdT and functionally expressed by human B cells.
- N1 and N2 may comprise amino acids selected from any group of amino acids. It is not required that N1 and N2 be of a similar length or composition, and independent variation of the length and composition of N1 and N2 is one method by which additional diversity may be introduced into the library.
- DH segments of the libraries are based on the peptides encoded by the naturally occurring IGHD gene repertoire, with progressive deletion of residues at the N- and C-termini.
- IGHD genes may be read in multiple reading frames, and peptides representing these reading frames, and their N- and C-terminal deletions are also included in the libraries of the invention.
- DH segments as short as three amino acid residues may be included in the libraries.
- DH segments as short as about 1, 2, 4, 5, 6, 7, or 8 amino acids may be included in the libraries.
- the H3-JH segments of the libraries are based on the peptides encoded by the naturally occurring IGHJ gene repertoire, with progressive deletion of residues at the N-terminus.
- the N-terminal portion of the IGHJ segment that makes up part of the CDRH3 is referred to herein as H3-JH.
- the H3-JH segment may be represented by progressive N-terminal deletions of one or more H3-JH residues, down to two H3-JH residues.
- the H3-JH segments of the library may contain N-terminal deletions (or no deletions) down to about 6, 5, 4, 3, 2, 1, or 0 H3-JH residues.
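The deletion schemes for DH segments (both termini) and H3-JH segments (N-terminus only), and the way these pieces combine with a tail and N1/N2 segments into a CDRH3, can be sketched as follows. This is a minimal illustration under stated assumptions: the peptides YYGSG and YFDY and the short tail, N1, and N2 lists are hypothetical placeholders rather than sequences from the Examples, and the concatenation order (tail, N1, DH, N2, H3-JH) is taken from the segment descriptions above.

```python
from itertools import product


def progressive_deletions(peptide, min_len, n_term=True, c_term=True):
    """Peptides obtained by removing residues from the N- and/or C-terminus
    while keeping at least `min_len` residues (duplicates dropped, order kept)."""
    if len(peptide) < min_len:
        return []
    slack = len(peptide) - min_len  # residues that may be deleted in total
    variants = []
    for i in range(slack + 1 if n_term else 1):  # residues removed at the N-terminus
        for j in range(slack - i + 1 if c_term else 1):  # residues removed at the C-terminus
            variants.append(peptide[i:len(peptide) - j])
    return list(dict.fromkeys(variants))


def assemble_cdrh3(tails, n1s, dhs, n2s, h3jhs):
    """Every concatenation tail + N1 + DH + N2 + H3-JH (one entry per combination)."""
    return ["".join(parts) for parts in product(tails, n1s, dhs, n2s, h3jhs)]


# Hypothetical components for illustration only
dh_variants = progressive_deletions("YYGSG", min_len=3)
h3jh_variants = progressive_deletions("YFDY", min_len=2, c_term=False)
library = assemble_cdrh3(["", "G"], ["", "G", "R"], dh_variants, ["", "S"], h3jh_variants)
print(h3jh_variants)  # ['YFDY', 'FDY', 'DY']
print(len(library), len(set(library)))
```

Because different combinations can yield the same peptide (for example, a G tail with an empty N1 versus an empty tail with a G N1), the number of distinct CDRH3s can be smaller than the product of the component counts.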
- the light chain chassis of the libraries may be any sequence with homology to Kabat residues 1 to 88 of naturally occurring light chain (κ or λ) sequences.
- the light chain chassis of the invention are synthesized in combinatorial fashion, utilizing VL and JL segments, to produce one or more libraries of light chain sequences with diversity in the chassis and CDR3 sequences.
- the light chain CDR3 sequences are synthesized using degenerate oligonucleotides or trinucleotides and recombined with the light chain chassis and light chain constant region, to form full-length light chains.
- the instant invention also provides methods for producing and using such libraries, as well as libraries comprising one or more immunoglobulin domains or antibody fragments. Design and synthesis of each component of the claimed antibody libraries is provided in more detail below.
- chassis sequences which are based on naturally occurring variable domain sequences (e.g., IGHV and IGLV). This selection can be done arbitrarily, or by the selection of chassis that meet certain criteria.
- the Kabat database, an electronic database containing non-redundant rearranged antibody sequences, can be queried for those heavy and light chain germline sequences that are most frequently represented.
- the BLAST search algorithm or more specialized tools such as SoDA (Volpe et al., Bioinformatics, 2006, 22: 438-44, incorporated by reference in its entirety), can be used to compare rearranged antibody sequences with germline sequences, using the V BASE2 database (Retter et al., Nucleic Acids Res., 2005, 33: D671-D674), or similar collections of human V, D, and J genes, to identify the germline families that are most frequently used to generate functional antibodies.
- Chassis may also be chosen based on their representation in the peripheral blood of humans. In certain embodiments of the invention, it may be desirable to select chassis that correspond to germline sequences that are highly represented in the peripheral blood of humans. In other embodiments, it may be desirable to select chassis that correspond to germline sequences that are less frequently represented, for example, to increase the canonical diversity of the library. Therefore, chassis may be selected to produce libraries that represent the largest and most structurally diverse group of functional human antibodies.
- the antibody library comprises variable heavy domains and variable light domains, or portions thereof. Each of these domains is built from certain components, which will be more fully described in the examples provided herein.
- the libraries described herein may be used to isolate fully human antibodies that can be used as diagnostics and/or therapeutics. Without being bound by theory, antibodies with sequences most similar or identical to those most frequently found in peripheral blood (for example, in humans) may be less likely to be immunogenic when administered as therapeutic agents.
- the VH domains of the library may be considered to comprise three primary components: (1) a VH “chassis”, which includes amino acids 1 to 94 (using Kabat numbering), (2) the CDRH3, which is defined herein to include the Kabat CDRH3 proper (positions 95-102), and (3) the FRM4 region, including amino acids 103 to 113 (Kabat numbering).
- the overall VH structure may therefore be depicted schematically (not to scale) as:
- VH chassis sequences selected for use in the library may correspond to all functionally expressed human IGHV germline sequences.
- IGHV germline sequences may be selected for representation in a library according to one or more criteria.
- the selected IGHV germline sequences may be among those that are most highly represented among antibody molecules isolated from the peripheral blood of healthy adults, children, or fetuses.
- the specific germline sequences chosen for representation of a particular IGHV family in a library of the invention may therefore comprise at least about 100%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or 0% of the particular IGHV family member repertoire found in peripheral blood.
- VH chassis based on IGHV germline sequences that may maximize the probability of isolating an antibody with particular characteristics. For example, without being bound by theory, in some embodiments it may be advantageous to restrict the IGHV germline sequences to include only those germline sequences that are utilized in antibodies undergoing clinical development, or antibodies that have been approved as therapeutics. On the other hand, in some embodiments, it may be advantageous to produce libraries containing VH chassis that are not represented amongst clinically utilized antibodies. Such libraries may be capable of yielding antibodies with novel properties that are advantageous over those obtained with the use of “typical” IGHV germline sequences, or enabling studies of the structures and properties of “atypical” IGHV germline sequences or canonical structures.
- the VH chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 94 of one or more of the following IGHV germline sequences: IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NO: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV3-9 (SEQ ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV2-70 (SEQ
- While the selection of the VH chassis with sequences based on the IGHV germline sequences is expected to support a large diversity of CDRH3 sequences, further diversity in the VH chassis may be generated by altering the amino acid residues comprising the CDRH1 and/or CDRH2 regions of each chassis selected for inclusion in the library (see Example 2).
- the alterations or mutations in the amino acid residues comprising the CDRH1 and CDRH2 regions, or other regions, of the IGHV germline sequences are made after analyzing the sequence identity within data sets of rearranged human heavy chain sequences that have been classified according to the identity of the original IGHV germline sequence from which the rearranged sequences are derived. For example, from a set of rearranged antibody sequences, the IGHV germline sequence of each antibody is determined, and the rearranged sequences are classified according to the IGHV germline sequence. This determination is made on the basis of sequence identity.
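As a rough illustration of classifying rearranged sequences by their closest germline on the basis of sequence identity, the sketch below assigns each sequence to the germline with the highest ungapped, position-by-position identity and then ranks germlines by usage. This is a deliberate simplification of alignment-based tools such as BLAST or SoDA, which handle insertions, deletions, and full-length V genes; the germline names and sequence fragments are hypothetical.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity over the overlapping length (a simplification of alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def assign_germline(seq: str, germlines: dict) -> str:
    """Name of the germline with the highest identity to `seq` (first one wins ties)."""
    return max(germlines, key=lambda name: percent_identity(seq, germlines[name]))


def germline_usage(seqs, germlines) -> list:
    """Rank germlines by how many rearranged sequences are assigned to them."""
    return Counter(assign_germline(s, germlines) for s in seqs).most_common()


# Hypothetical 10-residue germline fragments and rearranged sequences
germlines = {"GL_A": "EVQLVESGGG", "GL_B": "QVQLVQSGAE"}
rearranged = ["EVQLVESGGA", "QVQLVQSGAK", "EVQLLESGGG"]
print(germline_usage(rearranged, germlines))  # [('GL_A', 2), ('GL_B', 1)]
```

Once sequences are grouped this way, per-position composition within each group can be tabulated to guide CDRH1 and CDRH2 variation.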
- Example 2 illustrates the application of this method to heavy chains derived from a particular IGHV germline.
- this method can be applied to any germline sequence, and can be used to generate at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 1000, 10⁴, 10⁵, 10⁶, or more variants of each heavy chain chassis.
- the light chain chassis of the invention may be based on kappa and/or lambda light chain sequences.
- the principles underlying the selection of light chain variable (IGLV) germline sequences for representation in the library are analogous to those employed for the selection of the heavy chain sequences (described above and in Examples 1 and 2).
- the methods used to introduce variability into the selected heavy chain chassis may also be used to introduce variability into the light chain chassis.
- a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- although DH segments may be selected from among those that are commonly expressed, it is also within the scope of the invention to select these gene segments based on the fact that they are less commonly expressed. This may be advantageous, for example, in obtaining antibodies toward self-antigens or in further expanding the diversity of the library.
- DH segments can be used to add compositional diversity in a manner that is strictly relative to their occurrence in actual human heavy chain sequences.
- the progressive deletion of IGHD genes containing disulfide loop encoding segments may be limited, so as to leave the loop intact and to avoid the presence of unpaired cysteine residues.
- the presence of the loop can be ignored and the progressive deletion of the IGHD gene segments can occur as for any other segments, regardless of the presence of unpaired cysteine residues.
- the cysteine residues can be mutated to any other amino acid.
- a library may contain one or more of these sequences, allelic variations thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- TdT is responsible for the addition of nucleotides to the V-D and D-J junctions of antibody heavy chains (Alt and Baltimore, PNAS, 1982, 79: 4118; Collins et al., J. Immunol., 2004, 172: 340, each incorporated by reference in its entirety). Specifically, TdT is responsible for creating the N1 and N2 (non-templated) segments that flank the D (diversity) region.
- one feature of the libraries of the current invention is the consideration of the composition of naturally occurring duplet and triplet amino acid sequences during the design of the library.
- Table 23 shows the top twenty-five naturally occurring duplets in the N1 and N2 regions. Many of these can be represented by the general formula (G/P)(G/R/S/P/L/A/V/T) or (R/S/L/A/V/T)(G/P).
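The two general formulas can be expanded into explicit duplets by taking the Cartesian product of the residue alternatives at each position. The helper name `expand` below is illustrative.

```python
from itertools import product


def expand(pattern):
    """Expand a list of per-position alternatives, e.g. ["GP", "RS"], into peptides."""
    return {"".join(residues) for residues in product(*pattern)}


form_1 = expand(["GP", "GRSPLAVT"])  # (G/P)(G/R/S/P/L/A/V/T)
form_2 = expand(["RSLAVT", "GP"])    # (R/S/L/A/V/T)(G/P)
duplets = form_1 | form_2
print(len(form_1), len(form_2), len(duplets))  # 16 12 28
```

The two sets do not overlap (every duplet in the first begins with G or P, while none in the second does), so together the formulas describe 28 distinct duplets. These are the duplets the formulas can represent, not necessarily the exact top twenty-five listed in Table 23.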
- the synthetic N1 and N2 regions may comprise all of these duplets.
- the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most common naturally occurring N1 and/or N2 duplets.
- the libraries may include duplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.
- the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most commonly occurring N1 and/or N2 triplets.
- the libraries may include triplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.
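Ranking duplets and triplets by frequency amounts to counting n-grams within N1 and N2 segments. The sketch below assumes the segments are available as plain amino acid strings and that overlapping n-grams are counted within each segment; both the counting convention and the example segments are assumptions for illustration.

```python
from collections import Counter


def top_ngrams(segments, n, k):
    """Return the k most common overlapping n-grams (n=2 duplets, n=3 triplets).
    Ties keep the order in which n-grams were first encountered."""
    counts = Counter()
    for seg in segments:
        for i in range(len(seg) - n + 1):
            counts[seg[i:i + n]] += 1
    return counts.most_common(k)


n_segments = ["GGR", "GR", "PG", "GRS", "S", ""]  # hypothetical N1/N2 segments
print(top_ngrams(n_segments, 2, 1))   # [('GR', 3)]
print(top_ngrams(n_segments, 3, 25))  # [('GGR', 1), ('GRS', 1)]
```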
- there are about 59 total N1 segments and about 59 total N2 segments used to create a library of CDRH3s.
- the number of N1 segments, N2 segments, or both is increased to about 141 (see, for example, Example 5).
- compositions and lengths of the N1 and N2 segments vary from those presented in the Examples herein.
- sub-stoichiometric synthesis of trinucleotides may be used for the synthesis of N1 and N2 segments.
- Sub-stoichiometric synthesis with trinucleotides is described in Knappik et al. (U.S. Pat. No. 6,300,064, incorporated by reference in its entirety). The use of sub-stoichiometric synthesis would enable synthesis with consideration of the length variation in the N1 and N2 sequences.
- a model of the activity of TdT may also be used to determine the composition of the N1 and N2 sequences in a library of the invention.
- the probability of incorporating a particular nucleotide base (A, C, G, T) on a polynucleotide, by the activity of TdT is dependent on the type of base and the base that occurs on the strand directly preceding the base to be added.
- Jackson et al. (J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety) have constructed a Markov model describing this process.
- this model may be used to determine the composition of the N1 and/or N2 segments used in libraries of the invention.
- the parameters presented in Jackson et al. could be further refined to produce sequences that more closely mimic human sequences.
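A first-order Markov chain of the kind described above can be sketched as follows: the probability of adding each base depends only on the base immediately preceding it. The transition probabilities below are placeholders chosen for illustration; they are not the parameters estimated by Jackson et al., and the function names are hypothetical.

```python
import random

BASES = "ACGT"

# Placeholder P(next base | preceding base); each row sums to 1.
# These values are illustrative only, NOT fitted TdT parameters.
TRANSITIONS = {
    "A": {"A": 0.20, "C": 0.30, "G": 0.40, "T": 0.10},
    "C": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
    "G": {"A": 0.20, "C": 0.30, "G": 0.35, "T": 0.15},
    "T": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
}


def simulate_addition(n_bases, preceding, transitions, rng):
    """Add n_bases nucleotides one at a time, each drawn conditional on the previous base."""
    added = []
    prev = preceding
    for _ in range(n_bases):
        weights = [transitions[prev][b] for b in BASES]
        prev = rng.choices(BASES, weights=weights, k=1)[0]
        added.append(prev)
    return "".join(added)


def addition_probability(added, preceding, transitions):
    """Probability of a specific added sequence under the chain."""
    p = 1.0
    prev = preceding
    for base in added:
        p *= transitions[prev][base]
        prev = base
    return p


rng = random.Random(0)
print(simulate_addition(9, "G", TRANSITIONS, rng))    # one random 9-nt addition
print(addition_probability("GGC", "G", TRANSITIONS))  # = 0.35 * 0.35 * 0.30
```

To obtain N1 or N2 peptides, the simulated nucleotides would still need to be translated in the appropriate reading frame, a step this sketch omits.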
- [X] is any amino acid residue or no residue.
- although the sequence WG(Q/R)G (SEQ ID NO: 23) is presented in this exemplary embodiment, additional amino acids C-terminal to this sequence in FRM4-JH may also be included in the polynucleotide encoding the C-terminal sequence.
- the purpose of the polynucleotides encoding the N-terminal and C-terminal sequences, in this case, is to facilitate homologous recombination, and one of ordinary skill in the art would recognize that these sequences may be longer or shorter than depicted below. Accordingly, in certain embodiments of the invention, the overall design of the CDRH3 repertoire, including the sequences required to facilitate homologous recombination with the selected chassis, can be represented by the following formula (regions homologous with vector underlined):
- the CDRH3 repertoire can be represented by the following formula, which excludes the T residue presented in the schematic above:
- in addition to accounting for the composition of naturally occurring CDRH3 segments, the instant invention also takes into account the length distribution of naturally occurring CDRH3 segments.
- Surveys by Zemlin et al. (JMB, 2003, 334: 733, incorporated by reference in its entirety) and Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety) provide analyses of the naturally occurring CDRH3 lengths. These data show that about 95% of naturally occurring CDRH3 sequences have a length from about 7 to about 23 amino acids.
- the instant invention provides rationally designed antibody libraries with CDRH3 segments which directly mimic the size distribution of naturally occurring CDRH3 sequences.
- the length of the CDRH3s may be about 2 to about 30, about 3 to about 35, about 7 to about 23, about 3 to about 28, about 5 to about 28, about 5 to about 26, about 5 to about 24, about 7 to about 24, about 7 to about 22, about 8 to about 19, about 9 to about 22, about 9 to about 20, about 10 to about 18, about 11 to about 20, about 11 to about 18, about 13 to about 18, or about 13 to about 16 residues in length.
- the length distribution of a CDRH3 library of the invention may be defined based on the percentage of sequences within a certain length range.
- CDRH3s with a length of about 10 to about 18 amino acid residues comprise about 84% to about 94% of the sequences of the library.
- sequences within this length range comprise about 89% of the sequences of a library.
- CDRH3s with a length of about 11 to about 17 amino acid residues comprise about 74% to about 84% of the sequences of a library. In some embodiments, sequences within this length range comprise about 79% of the sequences of a library.
- CDRH3s with a length of about 12 to about 16 residues comprise about 57% to about 67% of the sequences of a library. In some embodiments, sequences within this length range comprise about 62% of the sequences of a library.
- CDRH3s with a length of about 13 to about 15 residues comprise about 35% to about 45% of the sequences of a library. In some embodiments, sequences within this length range comprise about 40% of the sequences of a library.
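Whether a library's CDRH3 length distribution falls within the windows described above can be checked with a short sketch. The window bounds and center values below are read from the embodiments above, with a tolerance of 5 percentage points matching each stated range; the histogram is hypothetical and was constructed for illustration.

```python
def fraction_in_range(lengths, lo, hi):
    """Fraction of CDRH3 lengths that fall within [lo, hi] (inclusive)."""
    if not lengths:
        raise ValueError("no lengths given")
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


# (lo, hi, target %, tolerance in percentage points), read from the embodiments above
TARGETS = [(10, 18, 89, 5), (11, 17, 79, 5), (12, 16, 62, 5), (13, 15, 40, 5)]


def check_length_distribution(lengths):
    """Report the observed percentage in each window and whether it is within tolerance."""
    report = []
    for lo, hi, target, tol in TARGETS:
        pct = 100.0 * fraction_in_range(lengths, lo, hi)
        report.append((lo, hi, round(pct, 1), abs(pct - target) <= tol))
    return report


# Hypothetical histogram (length -> number of CDRH3s), constructed for illustration
histogram = {8: 3, 9: 3, 10: 5, 11: 8, 12: 11, 13: 13, 14: 14, 15: 13,
             16: 11, 17: 9, 18: 5, 19: 3, 20: 2}
lengths = [length for length, count in histogram.items() for _ in range(count)]
for row in check_length_distribution(lengths):
    print(row)
```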
- the CDRL3 libraries of the invention can be generated by one of several approaches.
- the actual version of the CDRL3 library made and used in a particular embodiment of the invention will depend on objectives for the use of the library.
- More than one CDRL3 library may be used in a particular embodiment; for example, a library containing CDRH3 diversity, with kappa and lambda light chains is within the scope of the invention.
- a CDRL3 library is a VKCDR3 (kappa) library and/or a VλCDR3 (lambda) library.
- the CDRL3 libraries described herein differ significantly from CDRL3 libraries in the art. First, they consider length variation that is consistent with what is observed in actual human sequences. Second, they take into consideration the fact that a significant portion of the CDRL3 is encoded by the IGLV gene. Third, the patterns of amino acid variation within the IGLV gene-encoded CDRL3 portions are not stochastic and are selected depending on the identity of the IGLV gene.
- CDRL3 libraries that faithfully mimic observed patterns in human sequences cannot use a generic design that is independent of the chassis sequences in FRM1 to FRM3.
- the contribution of JL to CDRL3 is also considered explicitly, and the enumeration of each amino acid residue at the relevant positions is based on the compositions and natural variations of the JL genes themselves.
- a unique aspect of the design of the libraries of the invention is the germline or “chassis-based” aspect, which is meant to preserve more of the integrity and variability of actual human sequences. This is in contrast to other codon-based synthesis or degenerate oligonucleotide synthesis approaches that have been described in the literature and that aim to produce “one-size-fits-all” (e.g., consensus) libraries (e.g., Knappik, et al., J Mol Biol, 2000, 296: 57; Akamatsu et al., J Immunol, 1993, 151: 4651, each incorporated by reference in its entirety).
- patterns of occurrence of particular amino acids at defined positions within VL sequences are determined by analyzing data available in public or other databases, for example, the NCBI database (see, for example, GI numbers in Appendices A and B filed herewith). In certain embodiments of the invention, these sequences are compared on the basis of identity and assigned to families on the basis of the germline genes from which they are derived. The amino acid composition at each position of the sequence, in each germline family, may then be determined. This process is illustrated in the Examples provided herein.
- the light chain CDR3 library is a VKCDR3 library.
- Certain embodiments of the invention may use only the most common VKCDR3 length, nine residues; this length occurs in a dominant proportion (greater than about 70%) of human VKCDR3 sequences.
- positions 89-95 are encoded by the IGKV gene and positions 96-97 are encoded by the IGKJ gene.
- Analysis of human kappa light chain sequences indicates that there are not strong biases in the usage of the IGKJ genes.
- each of the five IGKJ genes can be represented in equal proportions to create a combinatorial library of (M VK chassis) × (5 JK genes), i.e., a library of size M × 5.
- it may be desirable to bias IGKJ gene representation for example to restrict the size of the library or to weight the library toward IGKJ genes known to have particular properties.
- amino acid residue at position 96 may be one of these seven residues. In other embodiments of the invention, the amino acid at this position may be chosen from amongst any of the other 13 amino acid residues.
- the amino acid residue at position 96 may be chosen from amongst the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at position 96, or even residues that never occur at position 96.
- the occurrence of the amino acids selected to occupy position 96 may be equivalent or weighted.
- arginine occurs at position 96 most frequently when the IGKJ1 (SEQ ID NO: 552) germline sequence is used. Therefore, in certain embodiments of the invention, it may be desirable to bias amino acid usage at position 96 according to the origin of the IGKJ germline sequence(s) and/or the IGKV germline sequence(s) selected for representation in a library.
- F, L, I, R, W, Y, and P are the seven most commonly occurring amino acids at position 96 of VKCDR3s with length nine, X is any amino acid, and JK* is an IGKJ amino acid sequence without the N-terminal residue (i.e., the N-terminal residue is substituted with F, L, I, R, W, Y, P, or X).
- a library of 70 members could be produced by utilizing 10 VK chassis, each paired with its respective L3-VK, 7 amino acids at position 96 (i.e., X), and one JK* sequence.
- Another embodiment of the library may have 350 members, produced by combining 10 VK chassis, each paired with its respective L3-VK, with 7 amino acids at position 96 and all 5 JK* genes. Still another embodiment of the library may have 1,125 members, produced by combining 15 VK chassis, each paired with its respective L3-VK, with 15 amino acids at position 96 and all 5 JK* genes, and so on.
- a person of ordinary skill in the art will readily recognize that many other combinations are possible.
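- The library sizes quoted above come from multiplying the number of choices at each independent component. The sketch below assumes that each VK chassis contributes exactly one L3-VK and that every chassis/residue/JK* combination is kept. The chassis and JK* labels are hypothetical placeholders. Only the seven position-96 residues come from the text.

```python
from itertools import product


def library_size(n_chassis, n_pos96_residues, n_jk_star):
    """Members of a (VK chassis with its L3-VK) x (residue at 96) x (JK*) design."""
    return n_chassis * n_pos96_residues * n_jk_star


def enumerate_members(chassis, pos96_residues, jk_star):
    """Every design member as a (chassis, residue at position 96, JK*) tuple."""
    return list(product(chassis, pos96_residues, jk_star))


print(library_size(10, 7, 1))   # 70
print(library_size(10, 7, 5))   # 350
print(library_size(15, 15, 5))  # 1125

# Hypothetical chassis and JK* labels; "FLIRWYP" are the seven residues named in the text.
members = enumerate_members([f"VK_chassis_{i}" for i in range(1, 11)], "FLIRWYP", ["JK1*"])
print(len(members))  # 70
```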
- the L3-VK regions may also be combinatorially varied with different VK chassis regions, to create additional diversity.
- VKCDR3 of lengths 8 and 10 represent about 8.5% and about 16%, respectively, of VKCDR3 lengths in representative samples (Example 6.2; FIG. 3 ).
- more complex VKCDR3 libraries may include CDR lengths of 8, 10, and 11 amino acids. Such libraries could account for a greater percentage of the length distribution observed in collections of human VKCDR3 sequences, or even introduce VKCDR3 lengths that do not occur frequently in human VKCDR3 sequences (e.g., less than eight residues or greater than 11 residues).
- the inclusion of a diversity of kappa light chain length variations in a library of the invention also enables one to include sequence variability that occurs outside of the amino acid at the VK-JK junction (i.e., position 96, described above).
- the patterns of sequence variation within the VK and/or JK segments can be determined by aligning collections of sequences derived from particular germline sequences.
- the frequency of occurrence of amino acid residues within VKCDR3 can be determined by sequence alignments (e.g., see Example 6.2 and Table 30).
- this frequency of occurrence may be used to introduce variability into the VK_Chassis, L3-VK and/or JK segments that are used to synthesize the VKCDR3 libraries.
- the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at any particular position in a naturally occurring repertoire may be included at that position in a VKCDR3 library of the invention.
- the percent occurrence of any amino acid at any particular position within the VKCDR3 or a VK light chain may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%.
- the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.
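- The following sketch covers two of the operations described above. It chooses the top residues at a position, with either equal or frequency-weighted representation. It also checks a designed frequency against the natural one. The observed residues are toy data. Reading "within about N% of" as a relative tolerance on the natural frequency is an assumption about how such a range would be applied. The function names are hypothetical.

```python
from collections import Counter


def top_residues(observed, n, weighted=True):
    """Top-n residues at one position with their target fractions.

    weighted=True keeps the residues' relative natural frequencies;
    weighted=False gives each chosen residue an equal share.
    """
    top = Counter(observed).most_common(n)
    if weighted:
        total = sum(count for _, count in top)
        return {aa: count / total for aa, count in top}
    return {aa: 1 / len(top) for aa, _ in top}


def within_tolerance(designed, natural, tolerance):
    """True if |designed - natural| <= tolerance * natural (tolerance as a fraction)."""
    return abs(designed - natural) <= tolerance * natural


# Toy residues observed at one position (not real repertoire data).
observed = "LLLLWWWFFYR"
mix = top_residues(observed, 3)                   # L: 4/9, W: 3/9, F: 2/9
equal = top_residues(observed, 3, weighted=False)  # 1/3 each
natural_L = observed.count("L") / len(observed)    # 4/11
print(within_tolerance(mix["L"], natural_L, 0.25))
```

Renormalizing over the chosen residues raises each residue's share relative to nature. That shift is why a tolerance check against the natural frequency is useful.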
- a VKCDR3 library may be synthesized using degenerate oligonucleotides (see Table 31 for IUPAC base symbol definitions).
- the limits of oligonucleotide synthesis and the genetic code may require the inclusion of more or fewer amino acids at a particular position in the VKCDR3 sequences. An illustrative embodiment of this approach is provided in Example 6.2.
- oligonucleotide synthesis may, in some cases, require the inclusion of more or fewer amino acids at a particular position within VKCDR3 (e.g., Example 6.2, Table 32), in comparison to those amino acids found at that position in nature.
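- The constraint imposed by degenerate oligonucleotides becomes visible when a degenerate codon is expanded into the residues it encodes. This sketch uses the standard IUPAC nucleotide codes and the standard genetic code. It does not reproduce the specific codons of Table 31 or Table 32. For example, NNK covers all 20 amino acids plus one stop codon, while HTT encodes exactly I, L, and F.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_residues(degenerate_codon):
    """Count how many concrete codons of a degenerate codon encode each residue."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*choices))


nnk = encoded_residues("NNK")
print(len(nnk), nnk["*"], sum(nnk.values()))  # 21 1 32
print(sorted(encoded_residues("HTT")))        # ['F', 'I', 'L']
```

NNK also encodes L, R, and S with three codons each, but most other residues with only one. That unevenness is the kind of imbalance that codon-based synthesis avoids.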
- This limitation can be overcome through the use of a codon-based synthesis approach (Virnekäs et al., Nucleic Acids Res., 1994, 22: 5600, incorporated by reference in its entirety), which enables precise synthesis of oligonucleotides encoding particular amino acids and a finer degree of control over the proportion of any particular amino acid incorporated at any position.
- Example 6.3 describes this approach in greater detail.
- a codon-based synthesis approach may be used to vary the percent occurrence of any amino acid at any particular position within the VKCDR3 or kappa light chain.
- the percent occurrence of any amino acid at any position in a VKCDR3 or kappa light chain sequence of the library may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%.
- the percent occurrence of any amino acid at any position may be about 1%, 2%, 3%, or 4%. In certain embodiments of the invention, the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.
- the VKCDR3 (and any other sequence used in the library, regardless of whether or not it is part of VKCDR3) may be altered to remove undesirable amino acid motifs.
- peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, will undergo post-translational modification (N-linked glycosylation) in a number of expression systems, including yeast and mammalian cells.
- the introduction of N residues at certain positions may be avoided, so as to avoid the introduction of N-linked glycosylation sites. In some embodiments of the invention, these modifications may not be necessary, depending on the organism used to express the library and the culture conditions.
- even when the library is expressed in an organism that does not glycosylate N-linked glycosylation sites (e.g., bacteria), it may be desirable to avoid N-X-(S/T) sequences, because the antibodies isolated from such libraries may later be expressed in different systems (e.g., yeast, mammalian cells), for example toward clinical development, and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity.
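- Candidate sequences can be screened for the N-X-(S/T)-Z motif described above with a regular expression. This sketch encodes only the stated rule (X and Z are not proline). It does not predict whether a site is actually glycosylated in a given host, and the example sequences are made up. A lookahead keeps each match zero-width, so overlapping motifs are all reported.

```python
import re

# N, then any residue except P, then S or T, then any residue except P.
# The lookahead makes the match zero-width so overlapping motifs are all found.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_sequons(protein):
    """0-based start positions of N-X-(S/T)-Z motifs (X, Z != P)."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


print(find_sequons("QQNYSTPLT"))  # [2]
print(find_sequons("ANGTP"))      # [] because Z is proline
print(find_sequons("NNSTA"))      # [0, 1] overlapping motifs
```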
- it may be preferable to create the individual sub-libraries of different lengths (e.g., one or more of lengths 5, 6, 7, 8, 9, 10, 11, or more) and to combine the sub-libraries in proportions that reflect the length distribution of VKCDR3 in human sequences; for example, in ratios approximating the 1:9:2 distribution that occurs in natural VKCDR3 sequences of lengths 8, 9, and 10 (see FIG. 3).
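- Splitting a fixed synthesis or transformation budget across length sub-libraries in a 1:9:2 ratio (lengths 8, 9, and 10) is simple integer arithmetic. The sketch below uses a largest-remainder rule to keep the counts whole. The rounding rule, the function name `allocate`, and the budget of 1,000 members are illustrative assumptions.

```python
def allocate(total, ratios):
    """Split `total` members across sub-libraries in proportion to integer `ratios`.

    Floors each share, then hands leftover members to the largest remainders,
    so the shares always sum exactly to `total`.
    """
    weight_sum = sum(ratios.values())
    shares = {key: (total * w) // weight_sum for key, w in ratios.items()}
    remainders = {key: (total * w) % weight_sum for key, w in ratios.items()}
    leftover = total - sum(shares.values())
    for key in sorted(ratios, key=remainders.get, reverse=True)[:leftover]:
        shares[key] += 1
    return shares


print(allocate(1000, {8: 1, 9: 9, 10: 2}))  # {8: 83, 9: 750, 10: 167}
```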
- VλCDR3 libraries of the invention: the principles used to design the minimalist VλCDR3 libraries of the invention are similar to those enumerated above for the VKCDR3 libraries, and are explained in more detail in the Examples.
- One difference between the VλCDR3 libraries of the invention and the VKCDR3 libraries of the invention is that, unlike the IGKV genes, the contribution of the IGVλ genes to CDRL3 (i.e., L3-Vλ) is not constrained to a fixed number of amino acid residues.
- while the combination of VK (including L3-VK) and JK segments, with inclusion of position 96, yields CDRL3 with a length of only 9 residues, length variation may be obtained within a VλCDR3 library even when only the Vλ (including L3-Vλ) and Jλ segments are considered.
- both the heavy and light chain chassis sequences and the heavy and light chain CDR3 sequences are synthetic.
- the polynucleotide sequences of the instant invention can be synthesized by various methods. For example, sequences can be synthesized by split pool DNA synthesis as described in Feldhaus et al., Nucleic Acids Research, 2000, 28: 534; Ornstein et al., Biopolymers, 1978, 17: 2341; and Brenner and Lerner, PNAS, 1992, 87: 6378 (each of which is incorporated by reference in its entirety).
- cassettes representing the possible V, D, and J diversity found in the human repertoire, as well as junctional diversity are synthesized de novo either as double-stranded DNA oligonucleotides, single-stranded DNA oligonucleotides representative of the coding strand, or single-stranded DNA oligonucleotides representative of the non-coding strand.
- These sequences can then be introduced into a host cell along with an acceptor vector containing a chassis sequence and, in some cases, a portion of FRM4 and a constant region. No primer-based PCR amplification from mammalian cDNA or mRNA, and no template-directed cloning steps from mammalian cDNA or mRNA, need be employed.
- a vector carrying a mutant gene can contain two sequence segments that are homologous to the 5′ and 3′ open reading frame (ORF) sequences of a gene that is intended to be interrupted or mutated.
- the vector may also encode a positive selection marker, such as a nutritional enzyme allele (e.g., URA3) and/or an antibiotic resistance marker (e.g., Geneticin/G418), flanked by the two homologous DNA segments.
- the plasmid vector may include a positive selection marker, such as a nutritional enzyme allele (e.g., URA3), or an antibiotic resistance marker (e.g., Geneticin/G418).
- the plasmid vector is then linearized by a unique restriction cut located in-between the regions of sequence homology shared with the target gene fragment, thereby creating an artificial gap at the cleavage site.
- the linearized plasmid vector and the target gene fragment flanked by sequences homologous to the plasmid vector are co-transformed into a yeast host strain.
- the yeast is then able to recognize the two stretches of sequence homology between the vector and target gene fragment and facilitate a reciprocal exchange of DNA content through homologous recombination at the gap.
- the target gene fragment is inserted into the vector without ligation.
- the method described above has also been demonstrated to work when the target gene fragments are in the form of single stranded DNA, for example, as a circular M13 phage derived form, or as single stranded oligonucleotides (Simon and Moore, Mol. Cell Biol., 1987, 7: 2329; Ivanov et al., Genetics, 1996, 142: 693; and DeMarini et al., 2001, 30: 520, each incorporated by reference in its entirety).
- the form of the target that can be recombined into the gapped vector can be double stranded or single stranded, and derived from chemical synthesis, PCR, restriction digestion, or other methods.
- the efficiency of the gap repair is correlated with the length of the homologous sequences flanking both the linearized vector and the target gene.
- about 20 or more base pairs may be used for the length of the homologous sequence, and about 80 base pairs may give a near-optimized result (Hua et al., Plasmid, 1997, 38: 91; Raymond et al., Genome Res., 2002, 12: 190, each incorporated by reference in its entirety).
- At least about 5, 10, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 187, 190, or 200 homologous base pairs may be used to facilitate recombination. In other embodiments, between about 20 and about 40 base pairs are utilized.
- the reciprocal exchange between the vector and gene fragment is strictly sequence-dependent, i.e., it does not cause a frame shift. Therefore, gap-repair cloning assures the insertion of gene fragments with both high efficiency and precision.
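- The role of the homologous arms in gap repair can be illustrated with a string-level model. An insert carrying arms copied from either side of the vector's cut site is spliced in exactly, while an insert without matching arms yields no product. This is a toy model with hypothetical function names, not a simulation of yeast recombination. It ignores restriction overhangs and treats the vector as a linear string. The 40 bp default arm length is an assumed value within the range given above.

```python
def design_insert(vector, cut, payload, arm_len=40):
    """Flank `payload` with arm_len bp copied from each side of the cut site."""
    if cut < arm_len or cut + arm_len > len(vector):
        raise ValueError("cut site too close to an end of the vector string")
    return vector[cut - arm_len:cut] + payload + vector[cut:cut + arm_len]


def gap_repair(vector, cut, insert, arm_len=40):
    """Return the recombinant sequence, or None if the arms do not match."""
    left_arm = vector[cut - arm_len:cut]
    right_arm = vector[cut:cut + arm_len]
    if len(insert) < 2 * arm_len or not (
        insert.startswith(left_arm) and insert.endswith(right_arm)
    ):
        return None
    payload = insert[arm_len:len(insert) - arm_len]
    # The arms are absorbed into the vector; only the payload is new sequence.
    return vector[:cut] + payload + vector[cut:]


vector = "GATTACACCGGTTAAGCTAG" * 10  # 200 bp toy "vector"
cdr3 = "TGTCAGCAGTATTGG"             # toy CDR3 payload
insert = design_insert(vector, 100, cdr3, arm_len=30)
product_seq = gap_repair(vector, 100, insert, arm_len=30)
print(len(insert), len(product_seq))  # 75 215
```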
- libraries of gene fragments have also been constructed in yeast using homologous recombination.
- a human brain cDNA library was constructed as a two-hybrid fusion library in vector pJG4-5 (Guidotti and Zervos, Yeast, 1999, 15: 715, incorporated by reference in its entirety). It has also been reported that a total of 6,000 pairs of PCR primers were used for amplification of 6,000 known yeast ORFs for a study of yeast genomic protein interactions (Hudson et al., Genome Res., 1997, 7: 1169, incorporated by reference in its entirety). In 2000, Uetz et al.
- a synthetic CDR3 (heavy or light chain) may be joined by homologous recombination with a vector encoding a heavy or light chain chassis, a portion of FRM4, and a constant region, to form a full-length heavy or light chain.
- the homologous recombination is performed directly in yeast cells.
- the method comprises co-transforming a host cell (for example, a yeast cell) with a linearized vector and CDR3 inserts, wherein the CDR3 inserts have a 5′ flanking sequence and a 3′ flanking sequence that are homologous to the termini of the linearized vector.
- the host cell closes the “gap” (the linearization site) by homologous recombination between the homologous sequences at the 5′ and 3′ termini of these two linear double-stranded DNAs (i.e., the vector and the insert), so that a library of circular vectors encoding full-length heavy or light chains comprising variable CDR3 inserts is generated. Particular instances of these methods are presented in the Examples.
- Subsequent analysis may be carried out to determine the efficiency of homologous recombination that results in correct insertion of the CDR3 sequences into the vectors. For example, PCR amplification of the CDR3 inserts directly from selected yeast clones may reveal how many clones are recombinant. In certain embodiments, libraries with a minimum of about 90% recombinant clones are utilized.
- libraries with a minimum of about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% recombinant clones are utilized.
- the same PCR amplification of selected clones may also reveal the insert size.
- a PCR amplification product with the correct size of insert may be “fingerprinted” with restriction enzymes known to cut or not cut within the amplified region. From the gel electrophoresis pattern, it may be determined whether the clones analyzed are identical or distinct (diversified). The PCR products may also be sequenced directly to reveal the identity of inserts and the fidelity of the cloning procedure, and to confirm the independence and diversity of the clones.
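- Predicting the band pattern of a PCR product before fingerprinting amounts to locating the enzyme's recognition sites and measuring the fragments between cuts. This sketch uses the EcoRI site (G^AATTC) as an example. It scans only the top strand and ignores sticky-end overhangs. The amplicon sequences and the function name `digest` are made up for illustration.

```python
import re


def digest(amplicon, site, cut_offset):
    """Fragment lengths after cutting `amplicon` cut_offset bases into each `site`."""
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, amplicon)]
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


clone_a = "AAAAGAATTCTTTTTTTT"  # one EcoRI site
clone_b = "AAAAGGATCCTTTTTTTT"  # same length, no EcoRI site
print(digest(clone_a, "GAATTC", 1))  # [5, 13]
print(digest(clone_b, "GAATTC", 1))  # [18]
```

Two clones with same-sized inserts but different patterns, as here, would be scored as distinct on the gel.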
- FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis, portion of FRM4, and constant region) for the construction of a library.
- Libraries of polynucleotides generated by any of the techniques described herein, or other suitable techniques, can be expressed and screened to identify antibodies having desired structure and/or activity.
- Expression of the antibodies can be carried out, for example, using cell-free extracts (e.g., ribosome display), phage display, prokaryotic cells (e.g., bacterial display), or eukaryotic cells (e.g., yeast display).
- the antibody libraries are expressed in yeast.
- the polynucleotides are engineered to serve as templates that can be expressed in a cell-free extract.
- Vectors and extracts as described, for example, in U.S. Pat. Nos. 5,324,637; 5,492,817; and 5,665,563 (each incorporated by reference in its entirety) can be used, and many are commercially available.
- Ribosome display and other cell-free techniques for linking a polynucleotide (i.e., a genotype) to a polypeptide (i.e., a phenotype) can be used, e.g., Profusion™ (see, e.g., U.S. Pat. Nos. 6,348,315; 6,261,804; 6,258,558; and 6,214,553, each incorporated by reference in its entirety).
- the polynucleotides of the invention can be expressed in an E. coli expression system, such as that described by Plückthun and Skerra (Meth. Enzymol., 1989, 178: 476; Biotechnology, 1991, 9: 273, each incorporated by reference in its entirety).
- the mutant proteins can be expressed for secretion in the medium and/or in the cytoplasm of the bacteria, as described by Better and Horwitz, Meth. Enzymol., 1989, 178: 476, incorporated by reference in its entirety.
- the sequences encoding the VH and VL single domains are each attached to the 3′ end of a sequence encoding a signal sequence, such as the ompA, phoA, or pelB signal sequence (Lei et al., J. Bacteriol., 1987, 169: 4379, incorporated by reference in its entirety).
- These gene fusions are assembled in a dicistronic construct, so that they can be expressed from a single vector and secreted into the periplasmic space of E. coli, where they will refold and can be recovered in active form (Skerra et al., Biotechnology, 1991, 9: 273, incorporated by reference in its entirety).
- antibody heavy chain genes can be concurrently expressed with antibody light chain genes to produce antibodies or antibody fragments.
- the antibody sequences are expressed on the membrane surface of a prokaryote, e.g., E. coli, using a secretion signal and lipidation moiety as described, e.g., in US20040072740; US20030100023; and US20030036092 (each incorporated by reference in its entirety).
- Higher eukaryotic cells such as mammalian cells, for example myeloma cells (e.g., NS/0 cells), hybridoma cells, Chinese hamster ovary (CHO), and human embryonic kidney (HEK) cells, can also be used for expression of the antibodies of the invention.
- Typically, antibodies expressed in mammalian cells are designed to be secreted into the culture medium, or expressed on the surface of the cell.
- the antibody or antibody fragments can be produced, for example, as intact antibody molecules or as individual VH and VL fragments, Fab
- antibodies can be expressed and screened by anchored periplasmic expression (APEx 2-hybrid surface display), as described, for example, in Jeong et al., PNAS, 2007, 104: 8247 (incorporated by reference in its entirety) or by other anchoring methods as described, for example, in Mazor et al., Nature Biotechnology, 2007, 25: 563 (incorporated by reference in its entirety).
- antibodies can be selected using mammalian cell display (Ho et al., PNAS, 2006, 103: 9637, incorporated by reference in its entirety).
- binding activity can be evaluated by standard immunoassay and/or affinity chromatography.
- Screening of the antibodies of the invention for catalytic function, e.g., proteolytic function, can be accomplished using standard assays, e.g., the hemoglobin plaque assay as described in U.S. Pat. No. 5,798,208 (incorporated by reference in its entirety).
- The ability of candidate antibodies to bind therapeutic targets can be assayed in vitro using, e.g., a BIACORE™ instrument, which measures the binding rates of an antibody to a given target or antigen based on surface plasmon resonance.
- In vivo assays can be conducted using any of a number of animal models and then subsequently tested, as appropriate, in humans. Cell-based biological assays are also contemplated.
- the antibody library can be expressed in yeast, which have a doubling time of less than about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hours.
- the doubling times are about 1 to about 3 hours, about 2 to about 4, about 3 to about 8 hours, about 3 to about 24, about 5 to about 24, about 4 to about 6, about 5 to about 22, about 6 to about 8, about 7 to about 22, about 8 to about 10 hours, about 7 to about 20, about 9 to about 20, about 9 to about 18, about 11 to about 18, about 11 to about 16, about 13 to about 16, about 16 to about 20, or about 20 to about 30 hours.
- the antibody library is expressed in yeast with a doubling time of about 16 to about 20 hours, about 8 to about 16 hours, or about 4 to about 8 hours.
- the antibody library of the instant invention can be expressed and screened in a matter of hours, as compared to previously known techniques which take several days to express and screen antibody libraries.
- a limiting step in the throughput of such screening processes in mammalian cells is simply the time required to iteratively regrow populations of isolated cells, which, in some cases, have doubling times greater than the doubling times of the yeast used in the current invention.
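- The effect of doubling time on regrowth is exponential-growth arithmetic: expanding a population by a factor F takes log2(F) doublings. The sketch assumes unrestricted exponential growth with no lag phase or saturation, so the computed times are lower bounds. The cell numbers, the doubling times looped over, and the function name are illustrative assumptions.

```python
import math


def hours_to_expand(start_cells, target_cells, doubling_time_h):
    """Hours of exponential growth needed to go from start_cells to target_cells."""
    return doubling_time_h * math.log2(target_cells / start_cells)


# A 16-fold expansion is 4 doublings, so the time scales directly with doubling time.
for doubling_time in (4, 8, 16):
    print(doubling_time, hours_to_expand(1e6, 1.6e7, doubling_time))
```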
- the composition of a library may be defined after one or more enrichment steps (for example by screening for antigen binding, or other properties).
- a library with a composition comprising about x% sequences or libraries of the invention may be enriched to contain about 2x%, 3x%, 4x%, 5x%, 6x%, 7x%, 8x%, 9x%, 10x%, 20x%, 25x%, 40x%, 50x%, 60x%, 75x%, 80x%, 90x%, 95x%, or 99x% sequences or libraries of the invention, after one or more screening steps.
- sequences or libraries of the invention may be enriched about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1,000-fold, or more, relative to their occurrence prior to the one or more enrichment steps.
- a library may contain at least a certain number of a particular type of sequence(s), such as CDRH3s, CDRL3s, heavy chains, light chains, or whole antibodies (e.g., at least about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰).
- these sequences may be enriched during one or more enrichment steps, to provide libraries comprising at least about 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, or 10¹⁹ of the respective sequence(s).
- antibody leads can be identified through a selection process that involves screening the antibodies of a library of the invention for binding to one or more antigens, or for a biological activity.
- the coding sequences of these antibody leads may be further mutagenized in vitro or in vivo to generate secondary libraries with diversity introduced in the context of the initial antibody leads.
- the mutagenized antibody leads can then be further screened for binding to target antigens or biological activity, in vitro or in vivo, following procedures similar to those used for the selection of the initial antibody lead from the primary library.
- Such mutagenesis and selection of primary antibody leads effectively mimics the affinity maturation process naturally occurring in a mammal that produces antibodies with progressive increases in the affinity to an antigen.
- light chain shuffling may be used as part of the affinity maturation protocol. In certain embodiments, this may involve pairing one or more heavy chains with a number of light chains, to select light chains that enhance the affinity and/or biological activity of an antibody.
- the number of light chains to which the one or more heavy chains can be paired is at least about 2, 5, 10, 100, 1000, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, or 10¹⁰.
- these light chains are encoded by plasmids. In other embodiments of the invention, the light chains may be integrated into the genome of the host cell.
- the coding sequences of the antibody leads may be mutagenized by a wide variety of methods.
- methods of mutagenesis include, but are not limited to, site-directed mutagenesis, error-prone PCR mutagenesis, cassette mutagenesis, and random PCR mutagenesis.
- oligonucleotides encoding regions with the desired mutations can be synthesized and introduced into the sequence to be mutagenized, for example, via recombination or ligation.
- Site-directed mutagenesis or point mutagenesis may be used to gradually change the CDR sequences in specific regions. This may be accomplished by using oligonucleotide-directed mutagenesis or PCR. For example, a short sequence of an antibody lead may be replaced with a synthetically mutagenized oligonucleotide in either the heavy chain or light chain region, or both. The method may not be efficient for mutagenizing large numbers of CDR sequences, but may be used for fine tuning of a particular lead to achieve higher affinity toward a specific target protein.
- Cassette mutagenesis may also be used to mutagenize the CDR sequences in specific regions.
- a sequence block, or a region, of a single template is replaced by a completely or partially randomized sequence.
- the maximum information content that can be obtained may be statistically limited by the number of random sequences of the oligonucleotides. Similar to point mutagenesis, this method may also be used for fine tuning of a particular lead to achieve higher affinity towards a specific target protein.
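- The statistical limit on cassette mutagenesis can be made concrete. Count the sequences a randomized cassette encodes and compare that count with the number of clones that can be screened. The sketch assumes that each randomized position uses an NNK codon (32 codons, 20 amino acids) and uses a hypothetical count of 10⁷ transformants. Under uniform sampling, the expected fraction of distinct DNA variants seen at least once is 1 - (1 - 1/D)^n, which is approximately 1 - e^(-n/D).

```python
import math


def cassette_diversity(n_positions, codons_per_position=32, residues_per_position=20):
    """(DNA variants, protein variants) for a cassette of randomized codons."""
    return codons_per_position ** n_positions, residues_per_position ** n_positions


def expected_dna_coverage(n_clones, dna_diversity):
    """Expected fraction of equiprobable DNA variants seen at least once."""
    # 1 - (1 - 1/D)**n, computed stably for large D.
    return -math.expm1(n_clones * math.log1p(-1 / dna_diversity))


for k in (3, 5, 7):
    dna, protein = cassette_diversity(k)
    print(k, dna, protein, round(expected_dna_coverage(1e7, dna), 3))
```

Coverage collapses once the randomized block exceeds about five NNK codons at this library size. This is why the method suits fine tuning of a lead rather than broad exploration.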
- Error-prone PCR may be used to mutagenize the CDR sequences by following protocols described in Caldwell and Joyce, PCR Methods and Applications, 1992, 2: 28; Leung et al., Technique, 1989, 1: 11; Shafikhani et al., Biotechniques, 1997, 23: 304; and Stemmer et al., PNAS, 1994, 91: 10747 (each of which is incorporated by reference in its entirety).
- Conditions for error-prone PCR may include (a) high concentrations of Mn²⁺ (e.g., about 0.4 to about 0.6 mM) that efficiently induce malfunction of Taq DNA polymerase; and (b) a disproportionately high concentration of one nucleotide substrate (e.g., dGTP) in the PCR reaction, which causes this substrate to be incorrectly incorporated opposite the template and thereby produces mutations. Additionally, other factors, such as the number of PCR cycles, the species of DNA polymerase used, and the length of the template, may affect the rate of misincorporation of “wrong” nucleotides into the PCR product.
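- The outcome of error-prone PCR can be illustrated with a toy simulation that introduces random substitutions at a fixed per-base rate in each copying cycle. This models only the end result, random point mutations accumulating over cycles. It does not model the Mn²⁺ or dGTP chemistry or the base bias of any real polymerase. The error rate, the cycle count, the template, and the function names are assumed values for illustration.

```python
import random


def copy_with_errors(template, error_rate, rng):
    """One copying step: each base becomes a different base with probability error_rate."""
    copied = []
    for base in template:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def mutate_lineage(template, error_rate, cycles, seed=0):
    """Follow a single lineage through `cycles` rounds of error-prone copying."""
    rng = random.Random(seed)
    seq = template
    for _ in range(cycles):
        seq = copy_with_errors(seq, error_rate, rng)
    return seq


template = "TGTCAGCAGTATTGGACGTTCGGCCAAGGGACC"
mutant = mutate_lineage(template, error_rate=0.005, cycles=25, seed=1)
differences = sum(a != b for a, b in zip(template, mutant))
print(differences)
```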
- kits may be utilized for the mutagenesis of the selected antibody library, such as the “Diversity PCR random mutagenesis kit” (CLONTECH™).
- the primer pairs used in PCR-based mutagenesis may, in certain embodiments, include regions matched with the homologous recombination sites in the expression vectors. This design allows facile re-introduction of the PCR products back into the heavy or light chain chassis vectors, after mutagenesis, via homologous recombination.
- PCR-based mutagenesis methods can also be used, alone or in conjunction with the error prone PCR described above.
- the PCR amplified CDR segments may be digested with DNase to create nicks in the double stranded DNA. These nicks can be expanded into gaps by other exonucleases such as Bal 31. The gaps may then be filled by random sequences by using DNA Klenow polymerase at a low concentration of regular substrates dGTP, dATP, dTTP, and dCTP with one substrate (e.g., dGTP) at a disproportionately high concentration. This fill-in reaction should produce high frequency mutations in the filled gap regions.
- This method of DNase digestion may be used in conjunction with error-prone PCR to create a high frequency of mutations in the desired CDR segments.
- the CDR or antibody segments amplified from the primary antibody leads may also be mutagenized in vivo by exploiting the inherent capacity for mutation in pre-B cells.
- the Ig genes in pre-B cells are specifically susceptible to a high-rate of mutation.
- the Ig promoter and enhancer facilitate such high rate mutations in a pre-B cell environment while the pre-B cells proliferate.
- CDR gene segments may be cloned into a mammalian expression vector that contains a human Ig enhancer and promoter. This construct may be introduced into a pre-B cell line, such as 38B9, which allows the mutation of the VH and VL gene segments naturally in the pre-B cells (Liu and Van Ness, Mol.
- the mutagenized CDR segments can be amplified from the cultured pre-B cell line and re-introduced back into the chassis-containing vector(s) via, for example, homologous recombination.
- a CDR “hit” isolated from screening the library can be re-synthesized, using degenerate codons or trinucleotides, and re-cloned into the heavy or light chain vector using gap repair.
- a library of the invention comprises a designed, non-random repertoire wherein the theoretical diversity of particular components of the library (for example, CDRH3), but not necessarily all components or the entire library, can be over-sampled in a physical realization of the library, at a level where there is a certain degree of statistical confidence (e.g., 95%) that any given member of the theoretical library is present in the physical realization of the library at least at a certain frequency (e.g., at least once, twice, three times, four times, five times, or more) in the library.
- M is the maximum number of theoretical library members that can be feasibly physically realized
- M/3 is the maximum theoretical repertoire size for which one can be 95% confident that any given member of the theoretical library will be sampled. It is important to note that there is a difference between a 95% chance that a given member is represented and a 95% chance that every possible member is represented.
- the instant invention provides a rationally designed library with diversity so that any given member is 95% likely to be represented in a physical realization of the library.
- the library is designed so that any given member is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% likely to be represented in a physical realization of the library.
- a library may have a theoretical total diversity of X unique members and the physical realization of the theoretical total diversity may contain at least about 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, or more times X members.
- the physical realization of the theoretical total diversity may contain about 1× to about 2×, about 2× to about 3×, about 3× to about 4×, about 4× to about 5×, about 5× to about 6× members.
- the physical realization of the theoretical total diversity may contain about 1× to about 3×, or about 3× to about 5× total members.
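The text also asks that a member be present at least twice, three times, and so on. Under the same Poisson assumption (uniform, independent sampling), we can estimate the oversampling factor such a target requires. The sketch below uses hypothetical helper names. It computes the probability of at least k copies at a given fold coverage, then finds by bisection the coverage needed to reach a target confidence.

```python
import math

def p_at_least(k: int, fold: float) -> float:
    """P(a given member is present at least k times) when its copy number is
    Poisson with mean `fold` (the oversampling factor)."""
    below_k = sum(math.exp(-fold) * fold**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

def fold_needed(k: int, confidence: float = 0.95) -> float:
    """Smallest oversampling factor giving `confidence` of at least k copies.
    p_at_least increases with fold, so bisection on [0, 100] suffices."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if p_at_least(k, mid) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

for k in (1, 2, 3):
    print(k, round(fold_needed(k), 2))
```

For k = 1 this recovers the roughly 3-fold coverage behind the 95% figure. Higher k requires proportionally larger physical libraries.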
- the invention relates to a polynucleotide that hybridizes with a polynucleotide taught herein, or that hybridizes with the complement of a polynucleotide taught herein.
- a polynucleotide that remains hybridized after hybridization and washing under low, medium, or high stringency conditions to a polynucleotide taught herein or the complement of a polynucleotide taught herein is encompassed by the present invention.
- Exemplary moderate stringency conditions include hybridization in about 40% to about 45% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.5× to about 1× SSC at about 55° C. to about 60° C.
- Exemplary high stringency conditions include hybridization in about 50% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.1× SSC at about 60° C. to about 65° C.
- wash buffers may comprise about 0.1% to about 1% SDS.
- the duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
- the libraries of the current invention are distinguished, in certain embodiments, by their human-like sequence composition and length, and the ability to generate a physical realization of the library which contains all members of (or, in some cases, even oversamples) a particular component of the library.
- Libraries comprising combinations of the libraries described herein (e.g., CDRH3 and CDRL3 libraries) are encompassed by the invention.
- Sub-libraries comprising portions of the libraries described herein are also encompassed by the invention (e.g., a CDRH3 library in a particular heavy chain chassis or a sub-set of the CDRH3 libraries).
- each of the libraries described herein has several components (e.g., CDRH3, VH, CDRL3, VL, etc.), and the diversity of these components can be varied to produce sub-libraries that fall within the scope of the invention.
- libraries containing one of the libraries or sub-libraries of the invention also fall within the scope of the invention.
- one or more libraries or sub-libraries of the invention may be contained within a larger library, which may include sequences derived by other means, for example, non-human or human sequence derived by stochastic or semi-stochastic synthesis.
- at least about 1% of the sequences in a polynucleotide library may be those of the invention (e.g., CDRH3 sequences, CDRL3 sequences, VH sequences, VL sequences), regardless of the composition of the other 99% of sequences.
- At least about 0.001%, 0.01%, 0.1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% of the sequences in any polynucleotide library may be those of the invention, regardless of the composition of the other sequences.
- the sequences of the invention may comprise about 0.001% to about 1%, about 1% to about 2%, about 2% to about 5%, about 5% to about 10%, about 10% to about 15%, about 15% to about 20%, about 20% to about 25%, about 25% to about 30%, about 30% to about 35%, about 35% to about 40%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 60%, about 60% to about 65%, about 65% to about 70%, about 70% to about 75%, about 75% to about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 99% of the sequences in any polynucleotide library, regardless of the composition of the other sequences.
- libraries more diverse than one or more libraries or sub-libraries of the invention but yet still comprising one or more libraries or sub-libraries of the invention, in an amount in which the one or more libraries or sub-libraries of the invention can be effectively screened and from which sequences encoded by the one or more libraries or sub-libraries of the invention can be isolated, also fall within the scope of the invention.
- the amino acid products of a library of the invention may be displayed on an alternative scaffold.
- a CDRH3 or CDRL3 may be displayed on an alternative scaffold.
- Exemplary alternative scaffolds include those derived from fibronectin (e.g., AdNectin), the β-sandwich (e.g., iMab), lipocalin (e.g., Anticalin), EETI-II/AGRP, BPTI/LACI-D1/ITI-D2 (e.g., Kunitz domain), thioredoxin (e.g., peptide aptamer), protein A (e.g., Affibody), ankyrin repeats (e.g., DARPin), γB-crystallin/ubiquitin (e.g., Affilin), CTLD₃ (e.g., Tetranectin), and (LDLR-A module)₃ (e.g., Avimers).
- the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10⁸ polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.
- the invention comprises a synthetic preimmune human antibody CDRH3 library comprising polynucleotide sequences encoding CDRH3 represented by the following formula: [G/D/E/-][N1][DH][N2][H3-JH], wherein:
- [G/D/E/-] is zero to one amino acids in length
- [N1] is zero to three amino acids in length
- [DH] is three to ten amino acids in length
- [N2] is zero to three amino acids in length
- [H3-JH] is two to nine amino acids in length.
- [G/D/E/-] is represented by an amino acid sequence selected from the group consisting of: G, D, E, and nothing.
- [N1] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.
- [N2] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.
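The degenerate notation used for [N1] and [N2] can be expanded mechanically. The sketch below is a Python illustration with hypothetical helper names. It treats each parenthesized group such as (G/P) as a choice of one residue, treats the empty string as the "nothing" option, and collects the distinct peptides. The listed patterns expand to 1 empty, 8 single-residue, 28 two-residue and 22 three-residue additions. That makes 59 in total, which agrees with the 59 N1 sequences reported for Table 24.

```python
from itertools import product

def expand(pattern: str) -> set:
    """Expand a pattern such as "(G/P)(R/S)" or "GG(G/R)" into concrete peptides."""
    slots = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            j = pattern.index(")", i)
            slots.append(pattern[i + 1:j].split("/"))  # one residue from the group
            i = j + 1
        else:
            slots.append([pattern[i]])                 # fixed residue
            i += 1
    return {"".join(choice) for choice in product(*slots)}

# Patterns copied from the [N1]/[N2] definitions; "" stands for "nothing".
N_PATTERNS = [
    "G", "R", "S", "P", "L", "A", "V", "T",
    "(G/P)(G/R/S/P/L/A/V/T)",
    "(R/S/L/A/V/T)(G/P)",
    "GG(G/R/S/P/L/A/V/T)",
    "G(R/S/P/L/A/V/T)G",
    "(R/S/P/L/A/V/T)GG",
    "",
]

def n_segment_set(patterns=N_PATTERNS) -> set:
    """Union of all expansions, so a peptide reachable by two patterns counts once."""
    result = set()
    for pattern in patterns:
        result |= expand(pattern)
    return result

segments = n_segment_set()
print(len(segments))  # 59
```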
- [DH] comprises a sequence selected from the group consisting of: IGHD3-10 reading frame 1 (SEQ ID NO: 1), IGHD3-10 reading frame 2 (SEQ ID NO: 2), IGHD3-10 reading frame 3 (SEQ ID NO: 3), IGHD3-22 reading frame 2 (SEQ ID NO: 4), IGHD6-19 reading frame 1 (SEQ ID NO: 5), IGHD6-19 reading frame 2 (SEQ ID NO: 6), IGHD6-13 reading frame 1 (SEQ ID NO: 7), IGHD6-13 reading frame 2 (SEQ ID NO: 8), IGHD3-03 reading frame 3 (SEQ ID NO: 9), IGHD2-02 reading frame 2 (SEQ ID NO: 10), IGHD2-02 reading frame 3 (SEQ ID NO: 11), IGHD4-17 reading frame 2 (SEQ ID NO: 12), IGHD1-26 reading frame 1 (SEQ ID NO: 13), IGHD1-26 reading frame 3 (SEQ ID NO: 14), IGHD5-5/5-18 reading
- [H3-JH] comprises a sequence selected from the group consisting of: AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, AFDV (SEQ ID NO: 19), FDV, DV, YFDY (SEQ ID NO: 20), FDY, DY, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 17), EYFQ
- sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 3 to about 26 amino acids in length.
- sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 7 to about 23 amino acids in length.
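The formula can be read as a combinatorial product: every choice of tail, N1, DH, N2 and H3-JH yields one CDRH3. The sketch below uses small, made-up component sets, not the library's actual sets, to illustrate two points. First, the theoretical count is the product of the set sizes. Second, different combinations can spell the same peptide, so the number of distinct sequences can be smaller. For example, tail G followed by an empty N1 gives the same result as no tail followed by N1 = G. Adding the maximum component lengths given above (1 + 3 + 10 + 3 + 9) gives 26, the upper end of the stated length range.

```python
from itertools import product

def assemble_cdrh3(tails, n1s, dhs, n2s, jhs) -> list:
    """Return every [tail][N1][DH][N2][H3-JH] concatenation, one per combination."""
    return ["".join(parts) for parts in product(tails, n1s, dhs, n2s, jhs)]

# Toy component sets (hypothetical, illustrative only; not the full library).
tails = ["G", "D", "E", ""]   # [G/D/E/-]; "" = no tail residue
n1s = ["", "G", "GP"]         # small subset of the N1 set
dhs = ["YYY", "DYGDY"]        # two DH segments
n2s = ["", "R"]               # small subset of the N2 set
jhs = ["DY", "FDY"]           # two H3-JH segments

members = assemble_cdrh3(tails, n1s, dhs, n2s, jhs)
print(len(members), len(set(members)))  # combinations vs. distinct peptides
print(min(map(len, members)), max(map(len, members)))
```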
- the library comprises about 10⁷ to about 10¹⁰ sequences.
- the library comprises about 10⁷ sequences.
- the polynucleotide sequences of the libraries further comprise a 5′ polynucleotide sequence encoding a framework 3 (FRM3) region on the corresponding N-terminal end of the library sequence, wherein the FRM3 region comprises a sequence of about 1 to about 9 amino acid residues.
- the FRM3 region comprises a sequence selected from the group consisting of CAR, CAK, and CAT.
- the polynucleotide sequences further comprise a 3′ polynucleotide sequence encoding a framework 4 (FRM4) region on the corresponding C-terminal end of the library sequence, wherein the FRM4 region comprises a sequence of about 1 to about 9 amino acid residues.
- the library comprises a FRM4 region comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).
- the polynucleotide sequences further comprise an FRM3 region coding for a corresponding polypeptide sequence comprising a sequence selected from the group consisting of CAR, CAK, and CAT; and an FRM4 region coding for a corresponding polypeptide sequence comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).
- the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a heavy chain chassis.
- the invention comprises a synthetic preimmune human antibody light chain library comprising polynucleotide sequences encoding human antibody kappa light chains represented by the formula: [IGKV (1-95)][F/L/I/R/W/Y][JK], wherein:
- [IGKV (1-95)] is selected from the group consisting of IGKV3-20 (SEQ ID NO: 237) (1-95), IGKV1-39 (SEQ ID NO: 233) (1-95), IGKV3-11 (SEQ ID NO: 235) (1-95), IGKV3-15 (SEQ ID NO: 236) (1-95), IGKV1-05 (SEQ ID NO: 229) (1-95), IGKV4-01 (1-95), IGKV2-28 (SEQ ID NO: 234) (1-95), IGKV1-33 (1-95), IGKV1-09 (SEQ ID NO: 454) (1-95), IGKV1-12 (SEQ ID NO: 230) (1-95), IGKV2-30 (SEQ ID NO: 467) (1-95), IGKV1-27 (SEQ ID NO: 231) (1-95), IGKV1-16 (SEQ ID NO: 456) (1-95), and truncations of said group up
- [F/L/I/R/W/Y] is an amino acid selected from the group consisting of F, L, I, R, W, and Y.
- [JK] comprises a sequence selected from the group consisting of TFGQGTKVEIK (SEQ ID NO: 528) and TFGGGT (SEQ ID NO: 529).
- the light chain library comprises a kappa light chain library.
- the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a light chain chassis.
- the invention comprises a method for producing a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10⁸ polynucleotide sequences, said method comprising:
- the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10¹⁰ polynucleotide sequences representative of known human IGHD and IGHJ germline sequences encoding CDRH3, represented by the following formula:
- the invention comprises a synthetic preimmune human antibody heavy chain variable domain library comprising 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:
- the synthetic preimmune human antibody heavy chain variable domain library is expressed as a full length chain selected from the group consisting of an IgG1 full length chain, an IgG2 full length chain, an IgG3 full length chain, and an IgG4 full length chain.
- the human antibody heavy chain chassis is selected from the group consisting of IGHV4-34 (SEQ ID NO: 35), IGHV3-23 (SEQ ID NO: 30), IGHV5-51 (SEQ ID NO: 40), IGHV1-69 (SEQ ID NO: 27), IGHV3-30 (SEQ ID NO: 31), IGHV4-39 (SEQ ID NO: 36), IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV2-5 (SEQ ID NO: 429), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV6-1 (SEQ ID NO: 449), IGHV1-46 (SEQ ID NO: 26), IGHV3-33 (SEQ ID NO: 32), IGHV4-31 (SEQ ID NO: 34), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-61 (SEQ ID NO: 38
- the synthetic preimmune human antibody heavy chain variable domain library comprises 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:
- the polynucleotide sequences are single-stranded coding polynucleotide sequences.
- the polynucleotide sequences are single-stranded non-coding polynucleotide sequences.
- the polynucleotide sequences are double-stranded polynucleotide sequences.
- the invention comprises a population of replicable cells with a doubling time of four hours or less, in which a synthetic preimmune human antibody repertoire is expressed.
- the replicable cells are yeast cells.
- the invention comprises a method of generating a full-length antibody library comprising transforming a cell with a preimmune human antibody heavy chain variable domain library and a synthetic preimmune human antibody light chain library.
- the invention comprises a method of generating an antibody library comprising synthesizing polynucleotide sequences by split-pool DNA synthesis.
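Split-pool (split-and-pool) synthesis builds a combinatorial library in repeated rounds. In each round the synthesis supports are divided into pools, a different building block is coupled in each pool, and the pools are recombined. The toy simulation below illustrates that general principle; it is not the protocol used for these libraries. It assumes codon-sized (trinucleotide) building blocks, in line with the codon-level design described elsewhere in this document. The function name and the codons are hypothetical choices.

```python
import random

def split_pool(n_supports: int, rounds: list, seed: int = 0) -> list:
    """Simulate split-and-pool synthesis.

    rounds: one list per coupling round; each entry is the building block
    (here a codon) coupled in one pool during that round.
    Returns the sequence carried by each synthesis support."""
    rng = random.Random(seed)
    supports = [""] * n_supports
    for blocks in rounds:
        # Split: every support lands in exactly one pool, chosen at random.
        pool_of = [rng.randrange(len(blocks)) for _ in range(n_supports)]
        # Couple: each support gains the block assigned to its pool.
        supports = [seq + blocks[p] for seq, p in zip(supports, pool_of)]
        # Pool: supports are recombined before the next round (implicit here).
    return supports

# Two rounds: 2 codons, then 3 codons (hypothetical), giving 2 x 3 = 6 products.
library = split_pool(1000, [["GGT", "CCT"], ["GGT", "CGT", "AGC"]])
print(len(set(library)), sorted(set(library)))
```

Each support carries exactly one sequence. With enough supports, every combination of the per-round building blocks is represented.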
- the polynucleotide sequences are selected from the group consisting of single-stranded coding polynucleotide sequences, single-stranded non-coding polynucleotide sequences, and double-stranded polynucleotide sequences.
- the invention comprises a synthetic full-length preimmune human antibody library comprising about 10⁷ to about 10¹⁰ polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.
- the invention comprises a method of selecting an antibody of interest from a human antibody library, comprising providing a synthetic preimmune human antibody CDRH3 library comprising a theoretical diversity of (N) polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences, wherein the physical realization of that diversity is an actual library of a size at least 3(N), thereby providing a 95% probability that a single antibody of interest is present in the library, and selecting an antibody of interest.
- the theoretical diversity is about 10⁷ to about 10⁸ polynucleotide sequences.
- the practice of the present invention employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology (especially, e.g., antibody technology), expression systems (e.g., yeast expression, cell-free expression, phage display, ribosome display, and PROFUSION™), and any necessary cell culture that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D. N. Glover, Ed. 1985); Oligonucleotide Synthesis (M. J. Gait, Ed.
- VH chassis sequences were selected by examining collections of human IGHV germline sequences (Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Tomlinson et al., J. Mol. Biol., 1992, 227: 799; Matsuda et al., J. Exp. Med., 1998, 188: 2151, each incorporated by reference in its entirety). As discussed in the Detailed Description, as well as below, a variety of criteria can be used to select VH chassis sequences, from these data sources or others, for inclusion in the library.
- 17 germline sequences were chosen for representation in the VH chassis of the library (Table 4). As described in more detail below, these sequences were selected based on their relatively high representation in the peripheral blood of adults, with consideration given to the structural diversity of the chassis and the representation of particular germline sequences in antibodies used in the clinic. These 17 sequences account for about 76% of the total sample of heavy chain sequences used to derive the results of Table 4. As outlined in the Detailed Description, these criteria are non-limiting, and one of ordinary skill in the art will readily recognize that a variety of other criteria can be used to select the VH chassis sequences, and that the invention is not limited to a library comprising the 17 VH chassis genes presented in Table 4.
- the four chosen VH1 chassis represent about 80% of the VH1 repertoire.
- the six chosen VH3 chassis account for about 70% of the VH3 repertoire.

| Chassis | Relative occurrence | CDR-H1 length | CDR-H2 length | Comments |
|---|---|---|---|---|
| VH4-31 | 25 | 7 | 16 | Among highest usage in VH4 family |
| VH4-34 | 125 | 5 | 16 | Highest usage in VH4 family |
| VH4-39 | 63 | 7 | 16 | Among highest usage in VH4 family |
| VH4-59 | 51 | 5 | 16 | Among highest usage in VH4 family |
| VH4-61 | 23 | 7 | 16 | Among highest usage in VH4 family |
| VH4-B | 7 | 6 | 16 | Not among highest usage in VH4 family, but has unique structure (H1 of length 6) |
| VH5-51 | 52 | 5 | 17 | High usage |

- the six chosen VH4 chassis account for close to 90% of the VH4 family repertoire.
- VH chassis derived from sequences in the IGHV2, IGHV6 and IGHV7 germline families were not included. As described in the Detailed Description, this exemplification is not meant to be limiting, as, in some embodiments, it may be desirable to include one or more of these families, particularly as clinical information on antibodies with similar sequences becomes available, to produce libraries with additional diversity that is potentially unexplored, or to study the properties and potential of these IGHV families in greater detail.
- the modular design of the library of the present invention readily permits the introduction of these, and other, VH chassis sequences.
- the amino acid sequences of the VH chassis utilized in this particular embodiment of the library, which are derived from the IGHV germline sequences, are presented in Table 5. The details of the derivation procedures are presented below.
- the modification to RA was made so that no unique sequence stretches of up to about 20 amino acids are created. Without being bound by theory, this modification is expected to reduce the odds of introducing novel T-cell epitopes in the VH3-15 derived chassis sequence.
- the avoidance of T cell epitopes is an additional criterion that can be considered in the design of certain libraries of the invention.
- the original NHS motif in VH4-34 was mutated to DHS, in order to remove a possible N-linked glycosylation site in CDR-H2. In certain embodiments of the invention, for example, if the library is transformed into yeast, this may prevent unwanted N-linked glycosylation.
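In eukaryotes, including yeast, N-linked glycosylation targets the consensus sequon N-X-S/T, where X is any residue except proline. That is why changing NHS to DHS removes the site. The sketch below scans a protein sequence for sequons; the function name is hypothetical. The CDR-H2 string in the example is assumed to be the IGHV4-34 germline CDR-H2. Check it against the chassis sequence (SEQ ID NO: 35) before relying on it.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(protein: str) -> list:
    """0-based positions of the Asn in every N-X-S/T (X != P) sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]

# Assumed IGHV4-34 germline CDR-H2; verify against the chassis sequence.
cdr_h2 = "EINHSGSTNYNPSLKS"
print(sequon_positions(cdr_h2))                           # [2] (the NHS motif)
print(sequon_positions(cdr_h2.replace("NHS", "DHS", 1)))  # []
```

The NPS stretch in the same CDR is not reported because proline occupies the X position.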
- Table 5 provides the amino acid sequences of the seventeen chassis.
- most of the corresponding germline nucleotide sequences include two additional nucleotides on the 3′ end (i.e., two-thirds of a codon). In most cases, those two nucleotides are GA.
- nucleotides are added to the 3′ end of the IGHV-derived gene segment in vivo, prior to recombination with the IGHD gene segment. Any additional nucleotide would make the resulting codon encode one of the following two amino acids: Asp (if the codon is GAC or GAT) or Glu (if the codon is GAA or GAG).
- One, or both, of the two 3′-terminal nucleotides may also be deleted in the final rearranged heavy chain sequence. If only the A is deleted, the resulting amino acid is very frequently a G. If both nucleotides are deleted, this position is “empty,” but followed by a general V-D addition or an amino acid encoded by the IGHD gene. Further details are presented in Example 5.
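Which residues can occupy the tail position follows directly from the genetic code. The sketch below builds the standard codon table and completes the germline 3′ "GA" with each possible third nucleotide. It also lists what a codon beginning with G can encode, which is the situation when only the A is deleted. Helper names are hypothetical. The code says nothing about how often each outcome occurs in vivo.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def tail_options(prefix: str = "GA") -> dict:
    """Residue encoded when the germline 3' prefix is completed by each base."""
    return {base: CODON_TABLE[prefix + base] for base in "ACGT"}

def residues_from_first_base(first: str) -> set:
    """Every residue a codon starting with `first` can encode (stops excluded)."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[0] == first and aa != "*"}

print(tail_options())                         # {'A': 'E', 'C': 'D', 'G': 'E', 'T': 'D'}
print(sorted(residues_from_first_base("G")))  # ['A', 'D', 'E', 'G', 'V']
```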
- This first position, after the CAR or CAK motif at the C-terminus of FRM3 (Table 5), is designated the “tail.” In the currently exemplified embodiment of the library, this residue may be G, D, E, or nothing.
- adding the tail to any chassis enumerated above (Table 5) can produce one of the following four schematic sequences, wherein the residue following the VH chassis is the tail:
- VH3-66, with canonical structure 1-1, may be included in the library.
- the inclusion of VH3-66 may compensate for the removal of other chassis from the library, which may not express well in yeast under some conditions (e.g., VH4-34 and VH4-59).
- This example demonstrates the introduction of further diversity into the VH chassis by creating mutations in the CDRH1 and CDRH2 regions of each chassis shown in Example 1.
- the following approach was used to select the positions and nature of the amino acid variation for each chassis:
- about 200 sequences in the data set exhibited greatest identity to the IGHV1-69 germline, indicating that they were likely to have been derived from IGHV1-69.
- the original germline sequence is provided in the second row of the tables, in bold font, beneath the residue number (Kabat system).
- the entries in the table indicate the number of times a given amino acid residue (first column) is observed at the indicated CDRH1 (Table 6) or CDRH2 (Table 7) position.
- CDRH1 Table 6
- CDRH2 Table 7
- the amino acid type G (glycine)
- variants were constructed with N at position 31, L at position 32 (H can be charged, under some conditions), G and T at position 33, no variants at position 34 and N at position 35, resulting in the following VH1-69 chassis CDRH1 single-amino acid variant sequences:
- VK chassis library. This example describes the design of an exemplary VK chassis library.
- One of ordinary skill in the art will recognize that similar principles may be used to design a Vλ library, or a library containing both VK and Vλ chassis.
- Design of a Vλ chassis library is presented in Example 4.
- IGKV germline genes (bolded in column 6 of Table 9) account for just over 90% of the usage of the entire repertoire in peripheral blood. From the analysis of Table 9, ten IGKV germline genes were selected for representation as chassis in the currently exemplified library (Table 10). All but V1-12 and V1-27 are among the top 10 most commonly occurring. The IGKV germline gene V2-30, which was tenth in terms of occurrence in peripheral blood, was not included in the currently exemplified embodiment of the library, in order to maintain the proportion of chassis with short (i.e., 11 or 12 residues in length) CDRL1 sequences at about 80% in the final set of 10 chassis. V1-12 was included in its place.
- V1-17 was more similar to other members of the V1 family that were already selected; therefore, V1-27 was included, instead of V1-17.
- the library could include 12 chassis (e.g., the ten of Table 10 plus V1-17 and V2-30), or a different set of any “N” chassis, chosen strictly by occurrence (Table 9) or any other criteria.
- the ten chosen VK chassis account for about 80% of the usage in the data set believed to be representative of the entire kappa light chain repertoire.
- VK Chassis Selected for Use in the Exemplary Library (Table 10):

| Chassis | CDR-L1 length | CDR-L2 length | Canonical structures | Estimated relative occurrence in peripheral blood |
|---|---|---|---|---|
| VK1-5 | 11 | 7 | 2-1-(U) | 69 |
| VK1-12 | 11 | 7 | 2-1-(1) | 32 |
| VK1-27 | 11 | 7 | 2-1-(1) | 27 |
| VK1-33 | 11 | 7 | 2-1-(1) | 43 |
| VK1-39 | 11 | 7 | 2-1-(1) | 147 |
| VK2-28 | 16 | 7 | 4-1-(1) | 62 |
| VK3-11 | 11 | 7 | 2-1-(1) | 87 |
| VK3-15 | 11 | 7 | 2-1-(1) | 53 |
| VK3-20 | 12 | 7 | 6-1-(1) | 195 |
| VK4-1 | 17 | 7 | 3-1-(1) | 83 |
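The selection constraint described above requires about 80% of chassis to have an 11- or 12-residue CDR-L1. We can check it directly against the table of selected VK chassis. The sketch below hard-codes the ten rows as tuples; the field order and the function name are choices made for this illustration.

```python
# (chassis, CDR-L1 length, CDR-L2 length, canonical structures,
#  estimated relative occurrence in peripheral blood), as in the table above.
VK_CHASSIS = [
    ("VK1-5", 11, 7, "2-1-(U)", 69),
    ("VK1-12", 11, 7, "2-1-(1)", 32),
    ("VK1-27", 11, 7, "2-1-(1)", 27),
    ("VK1-33", 11, 7, "2-1-(1)", 43),
    ("VK1-39", 11, 7, "2-1-(1)", 147),
    ("VK2-28", 16, 7, "4-1-(1)", 62),
    ("VK3-11", 11, 7, "2-1-(1)", 87),
    ("VK3-15", 11, 7, "2-1-(1)", 53),
    ("VK3-20", 12, 7, "6-1-(1)", 195),
    ("VK4-1", 17, 7, "3-1-(1)", 83),
]

def short_cdrl1_fraction(rows, short_lengths=(11, 12)) -> float:
    """Fraction of chassis whose CDR-L1 length is in short_lengths."""
    return sum(1 for row in rows if row[1] in short_lengths) / len(rows)

print(short_cdrl1_fraction(VK_CHASSIS))  # 0.8
```

Swapping a short-CDR-L1 chassis for a long one would drop this fraction below the target. That is the trade-off behind including V1-12 in place of V2-30.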
- the VK chassis is defined as Kabat residues 1 to 88 of the IGKV-encoded sequence, or from the start of FRM1 to the end of FRM3.
- the portion of the VKCDR3 sequence contributed by the IGKV gene is referred to herein as the L3-KV region.
- Vλ chassis library. This example describes the design of an exemplary Vλ chassis library.
- As for the VH and VK chassis sequences, the sequence characteristics and occurrence of human IgλV germline-derived sequences in peripheral blood were analyzed.
- assignment of Vλ sequences to a germline family was performed via SoDA and VBASE2 (Volpe and Kepler, Bioinformatics, 2006, 22: 438; Mollova et al., BMC Systems Biology, 2007, 1S: P30, each incorporated by reference in its entirety). The data are presented in Table 12.
- As with the VK chassis, the portion of the IGλV gene contributing to VλCDR3 is not considered part of the chassis as described herein.
- the Vλ chassis is defined as Kabat residues 1 to 88 of the IGλV-encoded sequence, or from the start of FRM1 to the end of FRM3.
- the portion of the VλCDR3 sequence contributed by the IGλV gene is referred to herein as the L3-Vλ region.
- the CDRH3 sequence is derived from a complex process involving recombination of three different genes, termed IGHV, IGHD and IGHJ.
- these genes may also undergo progressive nucleotide deletions: from the 3′ end of the IGHV gene, either end of the IGHD gene, and/or the 5′ end of the IGHJ gene.
- Non-templated nucleotide additions may also occur at the junctions between the V, D and J sequences.
- Non-templated additions at the V-D junction are referred to as “N1”, and those at the D-J junction are referred to as “N2”.
- the D gene segments may be read in three forward and, in some cases, three reverse reading frames.
- the codon (nucleotide triplet) or single amino acid was designated as a fundamental unit, to maintain all sequences in the desired reading frame.
- all deletions or additions to the gene segments are carried out via the addition or deletion of amino acids or codons, and not single nucleotides.
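Working in codons matters because a D segment can be translated in several frames. Removing a single nucleotide, rather than a whole codon, silently switches the frame. The sketch below translates a nucleotide sequence in three forward and three reverse frames. The nucleotide string is assumed to be the IGHD3-10*01 germline sequence and should be verified against the IMGT reference before use. Its reading frame 2 translates to YYYGSGSYYN, the IGHD3-10 reading frame 2 sequence used in this library. Function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(nt: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    nt = nt.upper()
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))

def reverse_complement(nt: str) -> str:
    return nt.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))

def reading_frames(nt: str) -> dict:
    """Three forward (+1..+3) and three reverse (-1..-3) reading frames."""
    rc = reverse_complement(nt)
    frames = {f"+{f + 1}": translate(nt[f:]) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc[f:]) for f in range(3)})
    return frames

# Assumed IGHD3-10*01 germline nucleotide sequence (verify against IMGT).
IGHD3_10_NT = "GTATTACTATGGTTCGGGGAGTTATTATAAC"
frames = reading_frames(IGHD3_10_NT)
print(frames["+2"])               # YYYGSGSYYN
print(translate(IGHD3_10_NT[4:])) # one whole codon removed: YYGSGSYYN
print(translate(IGHD3_10_NT[2:])) # one nucleotide removed: frame shifts to ITMVRGVII
```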
- CDRH3 extends from amino acid number 95 (when present; see Example 1) to amino acid 102.
- selection of DH gene segments for use in the library was performed according to principles similar to those used for the selection of the chassis sequences.
- an analysis of IGHD gene usage was performed, using data from Lee et al., Immunogenetics, 2006, 57: 917; Corbett et al., PNAS, 1982, 79: 4118; and Souto-Carneiro et al., J. Immunol., 2004, 172: 6790 (each incorporated by reference in its entirety), with preference for representation in the library given to those IGHD genes most frequently observed in human sequences.
- the degree of deletion on either end of the IGHD gene segments was estimated by comparison with known heavy chain sequences, using the SoDA algorithm (Volpe et al., Bioinformatics, 2006, 22: 438, incorporated by reference in its entirety) and sequence alignments.
- For the presently exemplified library, progressively deleted DH segments, as short as three amino acids, were included.
- other embodiments of the invention comprise DH segments with deletions to a different length, for example, about 1, 2, 4, 5, 6, 7, 8, 9, or 10 amino acids.
- Table 15 shows the relative occurrence of IGHD gene usage in human antibody heavy chain sequences isolated mainly from peripheral blood B cells (list adapted from Lee et al., Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety).

| IGHD gene | Relative occurrence³ |
|---|---|
| IGHD3-10 | 117 |
| IGHD3-22 | 111 |
| IGHD6-19 | 95 |
| IGHD6-13 | 93 |
| IGHD3-3 | 82 |
| IGHD2-2 | 63 |
| IGHD4-17 | 61 |
| IGHD1-26 | 51 |
| IGHD5-5/5-18¹ | 49 |
| IGHD2-15 | 47 |
| IGHD6-6 | 38 |
| IGHD3-9 | 32 |
| IGHD5-12 | 29 |
| IGHD5-24 | 29 |
| IGHD2-21 | 28 |
| IGHD3-16 | 18 |
| IGHD4-23 | 13 |
| IGHD1-1 | 9 |
| IGHD1-7 | 9 |
| IGHD4-4/4-11² | 7 |
| IGHD1-20 | 6 |
| IGHD7-27 | 6 |
| IGHD2-8 | 4 |
| IGHD6-25 | 3 |

- ¹ Although distinct genes in the genome, the nucleotide sequences of IGHD5-5 and IGHD5-18 are 100% identical and thus indistinguishable in rearranged VH sequences.
- ² IGHD4-4 and IGHD4-11 are also 100% identical.
- ³ Adapted from Lee et al., Immunogenetics, 2006, 57: 917, by merging the information for distinct alleles of the same IGHD gene.
- *IGHD1-14 may also be included in the libraries of the invention.
- the top 10 IGHD genes most frequently used in heavy chain sequences occurring in peripheral blood were chosen for representation in the library.
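Choosing the most frequently used IGHD genes amounts to a simple ranking of Table 15. The sketch below uses hypothetical function names. It sorts the tabulated values and reports what fraction of the tabulated usage the top ten cover. The values in the table happen to sum to 1,000, so the top ten account for 769 of them, about 77%.

```python
# Relative IGHD usage as tabulated in Table 15 (gene: relative occurrence).
TABLE_15 = {
    "IGHD3-10": 117, "IGHD3-22": 111, "IGHD6-19": 95, "IGHD6-13": 93,
    "IGHD3-3": 82, "IGHD2-2": 63, "IGHD4-17": 61, "IGHD1-26": 51,
    "IGHD5-5/5-18": 49, "IGHD2-15": 47, "IGHD6-6": 38, "IGHD3-9": 32,
    "IGHD5-12": 29, "IGHD5-24": 29, "IGHD2-21": 28, "IGHD3-16": 18,
    "IGHD4-23": 13, "IGHD1-1": 9, "IGHD1-7": 9, "IGHD4-4/4-11": 7,
    "IGHD1-20": 6, "IGHD7-27": 6, "IGHD2-8": 4, "IGHD6-25": 3,
}

def top_n(usage: dict, n: int) -> list:
    """The n most frequently used genes, highest first (ties keep table order)."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:n]

def coverage(selected: list, usage: dict) -> float:
    """Fraction of all tabulated usage accounted for by the selected genes."""
    return sum(count for _, count in selected) / sum(usage.values())

top10 = top_n(TABLE_15, 10)
print([gene for gene, _ in top10])
print(round(coverage(top10, TABLE_15), 3))  # 0.769
```

Changing `n` shows how much usage would be gained or lost by the "more or fewer D genes" alternatives mentioned in the text.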
- Other embodiments of the library could readily utilize more or fewer D genes.
- the amino acid sequences of the selected IGHD genes, including the most commonly used reading frames and the total number of variants after progressive N- and C-terminal deletion to a minimum of three residues, are listed in Table 17. As depicted in Table 17, only the most commonly occurring alleles of certain IGHD genes were included in the illustrative library. This is, however, not required, and other embodiments of the invention may utilize IGHD reading frames that occur less frequently in the peripheral blood.
- variants were generated by systematic deletion from the N- and/or C-termini, until there were three amino acids remaining.
- the full sequence DYGDY (SEQ ID NO: 12) may be used to generate the progressive deletion variants: DYGD (SEQ ID NO: 613), YGDY (SEQ ID NO: 614), DYG, GDY and YGD.
- the progressive deletions were limited, so as to leave the loop intact; i.e., only amino acids N-terminal to the first Cys, or C-terminal to the second Cys, were deleted in the respective DH segment variants.
- the foregoing strategy was used to avoid the presence of unpaired cysteine residues in the exemplified version of the library.
- other embodiments of the library may include unpaired cysteine residues, or the substitution of these cysteine residues with other amino acids.
- the variants would be: GYCSSTSCYT (SEQ ID NO: 10), GYCSSTSCY (SEQ ID NO: 615), YCSSTSCYT (SEQ ID NO: 616), CSSTSCYT (SEQ ID NO: 617), GYCSSTSC (SEQ ID NO: 618), YCSSTSCY (SEQ ID NO: 619), CSSTSCY (SEQ ID NO: 620), YCSSTSC (SEQ ID NO: 621) and CSSTSC (SEQ ID NO: 622).
- 293 DH sequences were obtained from the selected IGHD gene segments, including the original IGHD gene segments. Certain sequences are redundant. For example, it is possible to obtain the YYY variant from either IGHD3-10 reading frame 2 (full sequence YYYGSGSYYN (SEQ ID NO: 2)), or in two different ways from IGHD3-22 reading frame 2 (SEQ ID NO: 4) (YYYDSSGYYY). When redundant sequences are removed, the number of unique DH segment sequences in this illustrative embodiment of the library is 278. These sequences are enumerated in Table 18.
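The progressive-deletion rule and the removal of redundant sequences can be sketched in a few lines. The function below (hypothetical name) keeps every contiguous stretch of at least three residues that can be obtained by trimming either terminus. When asked, it will not trim past the first or last Cys, so the disulfide loop stays intact. Pooling the variants in a set removes duplicates such as YYY. YYY arises once from IGHD3-10 reading frame 2 and twice from IGHD3-22 reading frame 2. The sketch reproduces the examples in the text; it is not a reconstruction of Table 18.

```python
def progressive_deletions(segment: str, min_len: int = 3, keep_cys_loop: bool = True) -> set:
    """All variants of `segment` made by trimming residues from the N- and/or
    C-terminus down to `min_len` residues (the full segment is included).
    With keep_cys_loop and two or more Cys, trimming may not pass the first Cys
    on the N-side or the last Cys on the C-side."""
    n = len(segment)
    max_start, min_end = n, 0
    if keep_cys_loop and segment.count("C") >= 2:
        max_start = segment.index("C")     # only residues before the first Cys may go
        min_end = segment.rindex("C") + 1  # only residues after the last Cys may go
    variants = set()
    for start in range(max_start + 1):
        for end in range(max(min_end, start + min_len), n + 1):
            variants.add(segment[start:end])
    return variants

print(sorted(progressive_deletions("DYGDY"), key=len, reverse=True))
print(len(progressive_deletions("GYCSSTSCYT")))  # 9, matching the text

# Redundancy across genes: a set keeps only one copy of shared variants.
rf_3_10 = progressive_deletions("YYYGSGSYYN")  # IGHD3-10, reading frame 2
rf_3_22 = progressive_deletions("YYYDSSGYYY")  # IGHD3-22, reading frame 2
pooled = rf_3_10 | rf_3_22
print(len(rf_3_10), len(rf_3_22), len(pooled))  # 36 35 70
```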
- Table 19 shows the length distribution of the 278 DH segments selected according to the methods described above.
- DH segments are numbered beginning with position 97, followed by positions 97A, 97B, etc.
- the shortest DH segment has three amino acids: 97, 97A and 97B, while the longest DH segment has 10 amino acids: 97, 97A, 97B, 97C, 97D, 97E, 97F, 97G, 97H and 97I.
- IGHJ genes. There are six human germline IGHJ genes. During in vivo assembly of antibody genes, these segments are progressively deleted at their 5′ end. In this exemplary embodiment of the library, IGHJ gene segments with no deletions, or with 1, 2, 3, 4, 5, 6, or 7 deletions (at the amino acid level), yielding JH segments as short as 13 amino acids, were included (Table 20). Other embodiments of the invention, in which the IGHJ gene segments are progressively deleted (at their 5′/N-terminal end) to yield 15, 14, 12, or 11 amino acids, are also contemplated.
- the contribution of, for example, JH6_1 to CDRH3 would be designated by positions 99F, 99E, 99D, 99C, 99B, 99A, 100, 101 and 102 (Y, Y, Y, Y, Y, G, M, D and V, respectively).
- the JH4_3 sequence would contribute amino acid positions 101 and 102 (D and Y, respectively) to CDRH3.
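The position labels described for the DH, N1 and JH contributions can be generated programmatically. The sketch below follows the JH6_1 example and assumes that an H3-JH segment is anchored at position 102. Its last three residues occupy 100, 101 and 102, and any earlier residues are numbered backward as 99A, 99B, and so on. N2 numbering is not described in this passage and is left out. Function names are hypothetical.

```python
def insertion_labels(base: int, n: int) -> list:
    """Labels base, baseA, baseB, ... for n residues read N- to C-terminal
    (DH segments start at 97, N1 additions at 96)."""
    if n <= 0:
        return []
    return [str(base)] + [f"{base}{chr(ord('A') + i)}" for i in range(n - 1)]

def jh_labels(n: int) -> list:
    """Labels for an H3-JH segment of n residues, anchored at position 102:
    the last residues take 100, 101, 102 and earlier ones count back as
    99A (next to 100), 99B, ..."""
    if n <= 0:
        return []
    anchored = ["100", "101", "102"][-min(n, 3):]
    extra = max(n - 3, 0)
    before = [f"99{chr(ord('A') + i - 1)}" for i in range(extra, 0, -1)]
    return before + anchored

print(list(zip(jh_labels(9), "YYYYYGMDV")))  # JH6_1 contribution
print(list(zip(jh_labels(2), "DY")))         # JH4_3 contribution: 101 D, 102 Y
print(insertion_labels(97, 10))              # longest DH segment: 97 ... 97I
```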
- the JH segment will contribute amino acids 103 to 113 to the FRM4 region, in accordance with the standard Kabat numbering system for antibody variable regions (Kabat, op. cit. 1991). This may not be the case in other embodiments of the library.
- N1 and N2 segments located at the V-D and D-J junctions, respectively, were identified in a sample containing about 2,700 antibody sequences (Jackson et al., J. Immunol. Methods, 2007, 324: 26), also analyzed by the SoDA method of Volpe et al., Bioinformatics, 2006, 22: 438-44 (both Jackson et al. and Volpe et al. are incorporated by reference in their entireties). Examination of these sequences revealed patterns in the length and composition of N1 and N2.
- certain embodiments of the invention include N1 and N2 segments with rationally designed length and composition, informed by statistical biases in these parameters that are found by comparing naturally occurring N1 and N2 segments in human antibodies.
- N1 and N2 were fixed to a length of 0, 1, 2, or 3 amino acids. The naturally occurring composition of these sequences in human antibodies was used as a guide for the inclusion of different amino acid residues.
- The naturally occurring composition of one-, two-, and three-amino-acid N1 additions is defined in Table 21, and the naturally occurring composition of the corresponding N2 additions is defined in Table 22.
- The most frequently occurring duplets in the N1 and N2 sets are compiled in Table 23.
- Analysis of N1 segments, located at the junction between V and D, revealed that the eight most frequently occurring amino acid residues were G, R, S, P, L, A, T and V (Table 21).
- The number of amino acid additions in the N1 segment was frequently none, one, two, or three (FIG. 2).
- The addition of four or more amino acids was relatively rare. Therefore, in the currently exemplified embodiment of the library, the N1 segments were designed to include zero, one, two or three amino acids. However, in other embodiments, N1 segments of four, five, or more amino acids may also be utilized. G and P were always among the most commonly occurring amino acid residues in the N1 regions.
- In this embodiment, the N1 segments that are dipeptides are of the form GX, XG, PX, or XP, where X is any of the eight most commonly occurring amino acids listed above. Because G residues were observed more frequently than P residues, the tripeptide members of the exemplary N1 library have the form GXG, GGX, or XGG, where X is, again, one of the eight most frequently occurring amino acid residues listed above.
- The resulting set of N1 sequences used in the present exemplary embodiment of the library, including the “zero” addition, amounts to 59 sequences, which are listed in Table 24.
- Table 24. N1 segments of the exemplary library (59 sequences in total):
  - Zero addition (V segment joins directly to D segment): 1
  - Monomers (8): G, P, R, A, S, L, T, V
  - Dimers (28): GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP
  - Trimers (22): GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV
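The 59-member N1 set follows mechanically from the design rules stated above: the eight residues of Table 21 as monomers, dipeptides of the form GX, XG, PX or XP, and tripeptides of the form GXG, GGX or XGG. The Python sketch below (function and variable names are illustrative) enumerates the set; using Python sets removes duplicates such as GG, which arises from both GX and XG.

```python
# Eight most frequent N1 residues (Table 21).
TOP8 = "GRSPLATV"


def build_n1_library(top: str = TOP8) -> dict[str, set[str]]:
    """Enumerate N1 segments from the design rules: X, GX/XG/PX/XP, and GXG/GGX/XGG."""
    monomers = set(top)
    dimers = ({"G" + x for x in top} | {x + "G" for x in top}
              | {"P" + x for x in top} | {x + "P" for x in top})
    trimers = {"G" + x + "G" for x in top} | {"GG" + x for x in top} | {x + "GG" for x in top}
    return {"zero": {""}, "monomers": monomers, "dimers": dimers, "trimers": trimers}


library = build_n1_library()
for name, members in library.items():
    print(name, len(members))
print("total", sum(len(m) for m in library.values()))
```

The resulting counts, 1 + 8 + 28 + 22 = 59, agree with Table 24.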
- The sequences enumerated in Table 24 contribute the following positions to CDRH3: the monomers contribute position 96, the dimers positions 96 and 96A, and the trimers positions 96, 96A and 96B.
- In embodiments with longer N1 segments, the corresponding numbers would go on to include 96C, and so on.
- Analysis of N2 segments, located at the junction between D and J, revealed that the eight most frequently occurring amino acid residues were also G, R, S, P, L, A, T and V (Table 22). The number of amino acid additions in the N2 segment was also frequently none, one, two, or three (FIG. 2).
- An expanded set of sequences was utilized for the design of the N2 segments in the exemplary library. Specifically, the sequences in Table 25 were used in addition to the 59 sequences enumerated in Table 24 for N1.
- The presently exemplified embodiment of the library therefore contains 141 N2 sequences in total, including the “zero” state.
- These 141 sequences may also be used in the N1 region, and such embodiments are within the scope of the invention.
- The length and compositional diversity of the N1 and N2 sequences can be further increased by utilizing amino acids that occur less frequently than G, R, S, P, L, A, T and V in the N1 and N2 regions of naturally occurring antibodies, and by including N1 and N2 segments of four, five, or more amino acids in the library.
- Tables 21 to 23 and FIG. 2 provide information about the composition and length of the N1 and N2 sequences in naturally occurring antibodies that is useful for the design of additional N1 and N2 regions which mimic the natural composition and length.
- N2 sequences will begin at position 98 (when present) and extend to 98A (dimers) and 98B (trimers). Alternative embodiments may occupy positions 98C, 98D, and so on.
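The insertion-style numbering for the N1 and N2 regions (96, 96A, 96B, ... and 98, 98A, 98B, ...) can be expressed as a small helper. This Python sketch assumes that letters are assigned in N- to C-terminal order, as the examples in the text indicate, and that the “zero” state occupies no position; the function names are hypothetical.

```python
from string import ascii_uppercase


def insertion_positions(base: int, length: int) -> list[str]:
    """Label a region of `length` residues as base, baseA, baseB, ... (96 for N1, 98 for N2)."""
    return [str(base) + ("" if i == 0 else ascii_uppercase[i - 1]) for i in range(length)]


def label_segment(segment: str, base: int) -> list[tuple[str, str]]:
    """Pair each residue of an N1 or N2 segment with its CDRH3 position label."""
    return list(zip(insertion_positions(base, len(segment)), segment))


print(label_segment("GPG", 96))  # an N1 trimer
print(label_segment("VGG", 98))  # an N2 trimer
print(label_segment("", 98))     # "zero" state: no positions occupied
```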
- The CDRH3 in the exemplified library may be represented by the general formula: [G/D/E/-]-[N1]-[DH]-[N2]-[H3-JH]
- Here, [G/D/E/-] represents each of the four possible terminal amino acid “tails”; N1 can be any of the 59 sequences in Table 24; DH can be any of the 278 sequences in Table 18; N2 can be any of the 141 sequences in Tables 24 and 25; and H3-JH can be any of the 28 H3-JH sequences in Table 20.
- The tail and N1 segments were combined, and redundancies were removed from the library.
- For example, the sequence [VH_Chassis]-[G] may be obtained in two different ways: [VH_Chassis]+[G]+[nothing] or [VH_Chassis]+[nothing]+[G]. Removal of redundant sequences resulted in a total of 212 unique [G/D/E/-]-[N1] segments out of the 236 possible combinations (i.e., 4 tails × 59 N1).
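The reduction from 236 to 212 tail-N1 combinations can be checked by concatenating every tail with every N1 segment and counting distinct amino acid strings, since two combinations that give the same string after the chassis are redundant. The Python sketch below rebuilds the N1 set from the design rules and also prints a naive product of the segment counts quoted above. That product is only an upper bound on the CDRH3 library size, because it ignores any further coincidences across the DH, N2 and H3-JH junctions, which the text does not quantify.

```python
from math import prod

TOP8 = "GRSPLATV"
TAILS = ["G", "D", "E", ""]  # "" stands for the empty tail "-"


def n1_segments(top: str = TOP8) -> set[str]:
    """The 59 N1 segments of Table 24, rebuilt from the stated design rules."""
    dimers = {a + b for x in top for a, b in (("G", x), (x, "G"), ("P", x), (x, "P"))}
    trimers = {t for x in top for t in ("G" + x + "G", "GG" + x, x + "GG")}
    return {""} | set(top) | dimers | trimers


n1 = n1_segments()
combinations = [tail + seg for tail in TAILS for seg in n1]  # amino acids following the chassis
unique_tail_n1 = set(combinations)
print(len(combinations), len(unique_tail_n1))

# Naive upper bound for [tail-N1]-[DH]-[N2]-[H3-JH] using the counts quoted in the text.
print(prod([len(unique_tail_n1), 278, 141, 28]))
```

Only the G tail produces collisions (24 of them; for example, G followed by GP equals the N1 trimer GGP), because no N1 segment contains D or E. Two ΔFRM3 contexts times 212 tail-N1 segments also account for the two sets of 212 vectors used in the cloning example later in this section.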
- FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in this library, versus the preimmune repertoire of Lee et al.
- Table 26 further illustrates specific exemplary sequences from the CDRH3 library described above, using the CDRH3 numbering system of the present application. In instances where a position is not used, the hyphen symbol (-) is included in the table instead.
- VKCDR3 libraries: This example describes the design of a number of exemplary VKCDR3 libraries. As specified in the Detailed Description, the actual version(s) of the VKCDR3 library made or used in particular embodiments of the invention will depend on the objectives for the use of the library. In this example, the Kabat numbering system for light chain variable regions was used.
- Human kappa light chain sequences were obtained from the publicly available NCBI database (Appendix A).
- As with the heavy chain sequences (Example 2), each of the sequences obtained from the publicly available database was assigned to its closest germline gene on the basis of sequence identity. The amino acid compositions at each position were then determined within each kappa light chain subset.
- This example describes the design of a “minimalist” VKCDR3 library, wherein the VKCDR3 repertoire is restricted to a length of nine residues.
- Examination of the VKCDR3 lengths of human sequences shows that a dominant proportion (over 70%) has nine amino acids within the Kabat definition of CDRL3: positions 89 through 97.
- Therefore, the currently exemplified minimalist design considers only VKCDR3 of length nine.
- Examination of human kappa light chain sequences shows that there are no strong biases in the usage of IGKJ genes; there are five such IGKJ genes in humans.
- Table 27 depicts IGKJ gene usage amongst three data sets, namely Juul et al. (Clin. Exp.
- Five of the seven most commonly occurring amino acids found at position 96 of rearranged human sequences appear to originate from the first amino acid encoded by each of the five human IGKJ genes, namely W, Y, F, L, and I.
- Yielding GGG results in a codon encoding Gly (G).
- R occurs about ten times more often than G at position 96 in human sequences (when the IGKJ gene is IGKJ1 (SEQ ID NO: 552)), and it is encoded by CGG more often than AGG. Therefore, without being bound by theory, the C of CGG may originate from one of the aforementioned two Cs at the end of the IGKV gene.
- R and P are among the most frequently observed amino acid types at position 96 when the length of VKCDR3 is 9. Therefore, a minimalist VKCDR3 library may be represented by the following amino acid sequence: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGGGTKVEIK]
- VK_Chassis represents any selected VK chassis (for non-limiting examples, see Table 11), specifically Kabat residues 1 to 88 encoded by the IGKV gene.
- L3-VK represents the portion of the VKCDR3 encoded by the chosen IGKV gene (in this embodiment, residues 89-95).
- F/L/I/R/W/Y/P represents any one of the amino acid residues F, L, I, R, W, Y, or P.
- In this example, IGKJ4 (minus its first residue) has been depicted.
- The GGG amino acid sequence of IGKJ4 is expected to lead to larger conformational flexibility than any of the alternative IGKJ genes, which contain a GXG amino acid sequence, where X is an amino acid other than G.
- One implementation of the minimalist VKCDR3 library would have 70 members, resulting from the combination of 10 VK chassis × 7 junction (position 96) options × one IGKJ-derived sequence (e.g., IGKJ4 (SEQ ID NO: 555)).
- Although this embodiment of the library has been depicted using IGKJ4 (SEQ ID NO: 555), another embodiment of the library may have 350 members (10 VK chassis × 7 junctions × 5 IGKJ genes).
- Minimalist VKCDR3 libraries may be constructed using any of the IGKJ genes. Using the notation above, these minimalist VKCDR3 libraries may have sequences represented by, for example:
- JK1: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTKVEIK (SEQ ID NO: 528)]
- JK2: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTKLEIK (SEQ ID NO: 842)]
- JK3: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGPGTKVDIK (SEQ ID NO: 843)]
- JK5: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTRLEIK (SEQ ID NO: 844)]
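The minimalist VKCDR3 members can be assembled directly from the formulas above. In this Python sketch the chassis framework (Kabat 1-88) is omitted, the L3-VK stretch QQSYSTP is a hypothetical placeholder rather than a sequence taken from Table 11, and the JK4 remainder TFGGGTKVEIK is the germline IGKJ4 sequence (LTFGGGTKVEIK) minus its first residue, which is assumed here because the section does not spell it out.

```python
# IGKJ-derived remainders (first residue removed); JK4 is assumed from germline IGKJ4.
JK_REMAINDERS = {
    "JK1": "TFGQGTKVEIK",
    "JK2": "TFGQGTKLEIK",
    "JK3": "TFGPGTKVDIK",
    "JK4": "TFGGGTKVEIK",
    "JK5": "TFGQGTRLEIK",
}
POSITION_96 = "FLIRWYP"


def minimalist_members(l3_vk: str, jk_names) -> list[str]:
    """[L3-VK]-[position 96]-[IGKJ remainder] for one chassis (framework 1-88 omitted)."""
    return [l3_vk + aa96 + JK_REMAINDERS[jk] for jk in jk_names for aa96 in POSITION_96]


def vkcdr3(member: str, l3_len: int = 7) -> str:
    """Kabat 89-97: the L3-VK residues, position 96, and the first IGKJ-derived residue (T)."""
    return member[: l3_len + 2]


L3_EXAMPLE = "QQSYSTP"  # hypothetical L3-VK (Kabat 89-95), for illustration only
one_jk = minimalist_members(L3_EXAMPLE, ["JK4"])
all_jk = minimalist_members(L3_EXAMPLE, JK_REMAINDERS)
print(len(one_jk) * 10, len(all_jk) * 10)  # scaled by the 10 VK chassis of Table 11
print(sorted({vkcdr3(m) for m in one_jk}))
```

Seven position-96 options times one IGKJ segment, over 10 chassis, gives the 70-member library; using all five IGKJ segments gives 350. Each VKCDR3 (Kabat 89-97) has length nine: seven L3-VK residues, position 96, and the first IGKJ-derived residue.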
- Example 6.2: A VKCDR3 Library of about 10^5 Complexity
- In this example, the nine-residue VKCDR3 repertoire described in Example 6.1 is expanded to include VKCDR3 lengths of eight and ten residues.
- Whereas the previously enumerated VKCDR3 library included the VK chassis and the portions of the IGKJ gene not contributing to VKCDR3, the presently exemplified version focuses only on residues comprising a portion of VKCDR3. This embodiment may be favored, for example, when recombination with a vector that already contains VK chassis sequences and constant region sequences is desired.
- VKCDR3s of lengths 8 and 10 represent about 8.5% and about 16%, respectively, of sequences in representative samples (FIG. 3).
- A more complex VKCDR3 library therefore includes CDR lengths of 8 to 10 amino acids; this library accounts for over 95% of the length distribution observed in typical collections of human VKCDR3 sequences.
- This library also enables the inclusion of additional variation outside of the junction between the VK and JK genes.
- The present example describes such a library.
- The library comprises 10 sub-libraries, each designed around one of the 10 exemplary VK chassis depicted in Table 11.
- In other embodiments, the number of sub-libraries (M) may be less than or more than 10.
- The library employs a practical and facile synthesis approach using standard oligonucleotide synthesis instrumentation and degenerate oligonucleotides.
- The IUPAC code for degenerate nucleotides, as given in Table 31, will be used.
- For example, the VK1-39 sub-library of length nine may be represented by the following four oligonucleotides (left column of Table 32), with the corresponding amino acids encoded at each position of CDRL3 (Kabat numbering) provided in the columns on the right.
- The first codon (CWG) of the first oligonucleotide of Table 32, corresponding to Kabat position 89, represents 50% CTG and 50% CAG, which encode Leu (L) and Gln (Q), respectively.
- The expressed polypeptide would therefore be expected to have L and Q each about 50% of the time.
- Similarly, the codon CBT represents 1/3 each of CCT, CGT and CTT, corresponding in turn to 1/3 each of Pro (P), Arg (R) and Leu (L) upon translation.
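The CWG and CBT examples generalize to any degenerate codon: expand each IUPAC symbol into its bases, translate every resulting codon, and tally the amino acids. The Python sketch below does this with the standard genetic code. It assumes that each degenerate position is an equimolar mixture of its bases, which real syntheses only approximate; that imperfection is one motivation for the codon-based schemes described later.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard IUPAC nucleotide codes (the alphabet referred to via Table 31).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expected_aa_fractions(degenerate_codon: str) -> dict[str, Fraction]:
    """Amino acid fractions encoded by a degenerate codon, assuming equimolar base mixtures."""
    codons = ["".join(p) for p in product(*(IUPAC[n] for n in degenerate_codon.upper()))]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: Fraction(n, len(codons)) for aa, n in counts.items()}


for codon in ("CWG", "CBT"):
    print(codon, expected_aa_fractions(codon))
```

For CWG the result is one half each of L and Q, and for CBT one third each of P, R and L, as described above.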
- The numbers of amino acid sequences encoded are 864 for each of the first three oligonucleotides and 1,296 for the fourth oligonucleotide.
- Therefore, the oligonucleotides encoding VK1-39 CDR3s of length nine contribute 3,888 (3 × 864 + 1,296) members to the library.
- However, sequences with L or R at position 95A (when position 96 is empty) are identical to those with L or R at position 96 (and 95A empty). Therefore, the figure of 3,888 overestimates the LR contribution, and the actual number of unique members is slightly lower, at 3,024.
- The overall complexity is about 1.3 × 10^5, or about 1.2 × 10^5 unique sequences after correcting for over-counting of the LR contribution for the size-9 VKCDR3.
- This example demonstrates how a more faithful representation of amino acid variation at each position may be obtained by using a codon-based synthesis approach (Virnekas et al. Nucleic Acids Res., 1994, 22: 5600).
- This synthetic scheme also allows for finer control of the proportions of particular amino acids included at a position. For example, as described above for the VK1-39 sequences, position 89 was designed as 50% Q and 50% L; however, as Table 30 shows, Q is used much more frequently than L.
- The more complex VKCDR3 libraries of the present example account for the different relative occurrence of Q and L, for example, by using 90% Q and 10% L. Such control is better exercised within codon-based synthetic schemes, especially when multiple amino acid types are considered.
- This example also describes an implementation of a codon-based synthetic scheme, using the ten VK chassis described in Table 11. Similar approaches, of course, can be implemented with more or fewer such chassis. As indicated in the Detailed Description, a unique aspect of the design of the present libraries, as well as those of the preceding examples, is the germline or chassis-based aspect, which is meant to preserve more of the integrity and variation of actual human kappa light chain sequences.
- The library of Table 34 would have 1.37 × 10^6 unique polypeptide sequences, calculated by multiplying together the numbers in the bottom row of the table.
- The underlined 0 entries for Asn (N) at certain positions represent regions where the possibility of having N-linked glycosylation sites in the VKCDR3 has been minimized or eliminated.
- Peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, may undergo post-translational modification in a number of expression systems, including yeast and mammalian cells. Moreover, the nature of such modification depends on the specific cell type and, even for a given cell type, on culture conditions. N-linked glycosylation may be disadvantageous when it occurs in a region of the antibody molecule likely to be involved in antigen binding (e.g., a CDR), as the function of the antibody may then be influenced by factors that may be difficult to control.
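A simple way to screen designed VKCDR3 sequences for the N-X-(S/T)-Z motif is a regular expression with a zero-width lookahead, so that motifs sharing residues are not skipped. This is a minimal Python sketch of the pattern as stated in the text; because the pattern requires a Z residue, an N-X-S/T ending exactly at the end of the scanned string is not reported, so in practice one might scan the CDR together with a few flanking framework residues (an assumption, not a step described here).

```python
import re

# N, any residue except P, S or T, any residue except P; the lookahead keeps overlapping hits.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def sequon_positions(peptide: str) -> list[int]:
    """0-based indices of each N that starts an N-X-(S/T)-Z motif with X, Z != P."""
    return [m.start() for m in SEQUON.finditer(peptide.upper())]


for cdr in ("QQSYSTPLT", "QQNGSYPLT", "QQNPSYPLT", "QQSYNGTPT", "NGTNGSA"):
    print(cdr, sequon_positions(cdr))
```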
- Even when a library is expressed in a system that does not carry out N-linked glycosylation (e.g., bacteria), it may be desirable to avoid N-X-(S/T) sequences, because the antibodies isolated from such libraries may later be expressed in different systems (e.g., yeast, mammalian cells), for example toward clinical development, and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity.
- The total number of unique members in the VK1-39 library of length 8 can thus be obtained as before, and is 3.73 × 10^5 (i.e., 3 × 3 × 4 × 6 × 8 × 8 × 9 × 3).
- The complexity of the VK1-39 library of length 10 would be 10.9 × 10^6 (8 times that of the library of length 9, as there is additional 8-fold variation at the insertion position 95A).
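For a design in which each position varies independently, the number of unique members is the product of the number of amino acid options at each position. The Python sketch below applies this to the length-8 VK1-39 counts quoted above and scales the quoted length-9 total by the 8-fold variation at position 95A. It assumes every combination yields a distinct sequence; the L/R over-counting discussed for the degenerate-oligonucleotide design is an example of where that assumption fails.

```python
from math import prod

# Amino acid options per VKCDR3 position for the VK1-39 length-8 design (from the text).
LENGTH_8_OPTIONS = [3, 3, 4, 6, 8, 8, 9, 3]


def complexity(options_per_position) -> int:
    """Unique members of a position-independent design: the product of per-position choices."""
    return prod(options_per_position)


len8 = complexity(LENGTH_8_OPTIONS)
len9 = 1.37e6        # Table 34 total as quoted; its per-position row is not reproduced here
len10 = 8 * len9     # additional 8-fold variation at insertion position 95A
print(len8, len10)
```

The length-8 product is 373,248 (about 3.73 × 10^5), and 8 × 1.37 × 10^6 is about 1.1 × 10^7, in line with the 10.9 × 10^6 figure above.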
- It may be preferable to create the individual sub-libraries of lengths 8, 9 and 10 separately, and then mix the sub-libraries in proportions that reflect the length distribution of VKCDR3 in human sequences; for example, in ratios approximating the 1:9:2 distribution that occurs in natural VKCDR3 sequences (see FIG. 3).
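Mixing the separately synthesized sub-libraries in a 1:9:2 ratio amounts to assigning each length a fixed fraction of the pooled material. The Python sketch below computes those fractions and splits a hypothetical total (here 12 arbitrary units, for example micrograms of DNA) accordingly; the quantity being split is an assumption, since the text does not specify whether DNA mass, cell number or transformants are balanced.

```python
from fractions import Fraction

# VKCDR3 length -> relative parts, approximating the natural 1:9:2 distribution (FIG. 3).
RATIO = {8: 1, 9: 9, 10: 2}


def mixing_fractions(ratio: dict[int, int]) -> dict[int, Fraction]:
    """Exact fraction of the pool assigned to each sub-library length."""
    total = sum(ratio.values())
    return {length: Fraction(parts, total) for length, parts in ratio.items()}


def allocate(total_units: float, ratio: dict[int, int]) -> dict[int, float]:
    """Split a total quantity (hypothetical units) across sub-libraries by the ratio."""
    total = sum(ratio.values())
    return {length: total_units * parts / total for length, parts in ratio.items()}


print(mixing_fractions(RATIO))
print(allocate(12.0, RATIO))
```

The fractions 1/12 (about 8.3%) and 2/12 (about 16.7%) for lengths 8 and 10 are close to the roughly 8.5% and 16% frequencies quoted earlier for natural VKCDR3 sequences.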
- The present invention thus provides compositions and methods enabling one of ordinary skill to synthesize VKCDR3 libraries corresponding to other VK chassis.
- This example describes the design of a minimalist VλCDR3 library.
- The principles used in designing this library are similar to those used to design the VKCDR3 libraries.
- However, the contribution of the IgλV segment to CDRL3 is not constrained to a fixed number of amino acids. Therefore, length variation may be obtained in a minimalist VλCDR3 library even when only considering combinations between Vλ chassis and Jλ sequences.
- Examination of VλCDR3 lengths of human sequences shows that lengths of 9 to 12 account for about 95% of sequences, and lengths of 8 to 12 account for about 97% of sequences (FIG. 4).
- Table 36 shows the usage (percent occurrence) of the six known IGλJ genes in the rearranged human lambda light chain sequences compiled from the NCBI database (see Appendix B), and Table 37 shows the sequences encoded by the genes.
- IGλJ3-01 and IGλJ7-02 are not represented among the sequences that were analyzed; therefore, they were not included in Table 36.
- IGλJ1-01, IGλJ2-01, and IGλJ3-02 are over-represented in their usage, and have thus been bolded in Table 37. In some embodiments of the invention, for example, only these three over-represented sequences may be utilized. In other embodiments, all six segments, or any 1, 2, 3, 4, or 5 of the six segments in any combination, may be utilized.
- As shown in Table 14, the portion of CDRL3 contributed by the IGλV gene segment is 7, 8, or 9 amino acids.
- The remainder of CDRL3 and FRM4 are derived from the IGλJ sequences (Table 37).
- The IGλJ sequences contribute either one or two amino acids to CDRL3. If two amino acids are contributed by IGλJ, they are the N-terminal two residues of the IGλJ segment: YV (IGλJ1-01), VV (IGλJ2-01), WV (IGλJ3-01), VV (IGλJ3-02), or AV (IGλJ7-01 and IGλJ7-02). If one amino acid is contributed by IGλJ, it is a V residue, which results from deletion of the N-terminal residue of an IGλJ segment.
- The FRM4 segment was fixed as FGGGTKLTVL, corresponding to IGλJ2-01 and IGλJ3-02 (i.e., portions of SEQ ID NOs: 558 and 560).
- The final set of chassis in the currently exemplified embodiment of the invention comprises 15: eleven contributed by the chassis in Table 14 and an additional four contributed by the chassis of Table 38.
- The corresponding L3-Vλ domains of the 15 chassis contribute from 7 to 10 amino acids to CDRL3.
- The 15 chassis are Vλ1-40 (SEQ ID NO: 531), Vλ1-44 (SEQ ID NO: 532), Vλ1-51 (SEQ ID NO: 533), Vλ2-14 (SEQ ID NO: 534), Vλ3-1* (SEQ ID NO: 535), Vλ3-19 (SEQ ID NO: 536), Vλ3-21 (SEQ ID NO: 537), Vλ4-69 (SEQ ID NO: 538), Vλ6-57 (SEQ ID NO: 539), Vλ5-45 (SEQ ID NO: 540), Vλ7-43 (SEQ ID NO: 541), Vλ1-40+ (SEQ ID NO: 564), Vλ3-19+ (SEQ ID NO: 565), Vλ3-21+ (SEQ ID NO: 566), and Vλ6-57+ (SEQ ID NO: 567).
- The 5 IGλJ-derived segments are YVFGGGTKLTVL (IGλJ1; SEQ ID NO: 568), VVFGGGTKLTVL (IGλJ2; SEQ ID NO: 558), WVFGGGTKLTVL (IGλJ3; SEQ ID NO: 559), AVFGGGTKLTVL (IGλJ7; SEQ ID NO: 569), and -VFGGGTKLTVL (from any of the preceding sequences).
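Because the FRM4 is fixed, each VλCDR3 is the L3-Vλ stretch of a chassis followed by whatever precedes FGGGTKLTVL in the chosen IGλJ-derived segment. The Python sketch below shows this assembly; the L3-Vλ strings are hypothetical placeholders of 7 to 10 residues, since the actual chassis sequences are in Tables 14 and 38 and are not reproduced here.

```python
from itertools import product

FRM4 = "FGGGTKLTVL"
# The five IGλJ-derived segments listed above (the last is the single-V form).
J_SEGMENTS = {
    "IGλJ1": "YVFGGGTKLTVL",
    "IGλJ2": "VVFGGGTKLTVL",
    "IGλJ3": "WVFGGGTKLTVL",
    "IGλJ7": "AVFGGGTKLTVL",
    "deleted": "VFGGGTKLTVL",
}


def cdrl3(l3_vlambda: str, j_segment: str) -> str:
    """CDRL3 = L3-Vλ residues + the IGλJ residues that precede the fixed FRM4."""
    if not j_segment.endswith(FRM4):
        raise ValueError("J segment must end with the fixed FRM4")
    return l3_vlambda + j_segment[: -len(FRM4)]


# Hypothetical placeholder L3-Vλ stretches of 7 to 10 residues.
placeholders = ["X" * n for n in range(7, 11)]
lengths = sorted({len(cdrl3(l3, j)) for l3, j in product(placeholders, J_SEGMENTS.values())})
print(lengths)
print(15 * len(J_SEGMENTS))  # chassis x J-segment pairings
```

With L3-Vλ contributions of 7 to 10 residues and J contributions of one or two, CDRL3 lengths span 8 to 12, the range that covers about 97% of human VλCDR3 sequences (FIG. 4). Fifteen chassis combined with five J-derived segments give 75 chassis/J pairings.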
- CDRH3 sequences of human antibodies of interest that are known in the art (e.g., antibodies that have been used in the clinic) have close counterparts in the designed library of the invention.
- A set of fifteen CDRH3 sequences from clinically relevant antibodies is presented in Table 39.
- Synthetic libraries produced by other (e.g., random or semi-random/biased) methods tend to have very large numbers of unique members.
- Although such libraries may contain theoretical matches to a given input sequence (for example, at 80% identity or greater), the probability of synthesizing and then producing a physical realization of the theoretical library that contains such a sequence, and then selecting an antibody corresponding to such a match, may be remotely small.
- For example, a CDRH3 of length 19 in the Knappik library may have over 10^19 distinct sequences.
- Since only a tenth or so of the sequences may have length 19, and the largest total library may have on the order of 10^10 to 10^12 transformants, the probability of a given pre-defined member being present is, in practice, effectively zero (less than one in ten million).
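The “effectively zero” estimate can be reproduced with a Poisson model: the expected number of copies of a given sequence is the number of transformants times the fraction of the library at that length, divided by the number of possible sequences at that length. The Python sketch below assumes uniform sampling of the theoretical space, which is an idealization; real synthetic libraries are biased, so individual sequences may be more or less likely than this average.

```python
import math


def prob_member_present(transformants: float, fraction_at_length: float,
                        diversity_at_length: float) -> float:
    """P(a given sequence appears at least once), modeling sampling as Poisson with uniform coverage."""
    expected_copies = transformants * fraction_at_length / diversity_at_length
    return -math.expm1(-expected_copies)  # 1 - exp(-lambda), accurate for tiny lambda


for n in (1e10, 1e12):
    print(f"{n:.0e} transformants: P = {prob_member_present(n, 0.1, 1e19):.1e}")
```

With 10^12 transformants, one tenth at length 19 and 10^19 possible sequences, the probability is about 10^-8, and at 10^10 transformants about 10^-10, both below the one-in-ten-million bound stated above.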
- Other libraries (e.g., Enzelberger et al. WO2008053275 and Ladner US20060257937, each incorporated by reference in its entirety) suffer from at least one of the limitations described throughout this application.
- Custom Primer Support™ 200 dT40S resin (GE Healthcare) was used to synthesize the oligonucleotides, using a loading of about 39 μmol/g of resin.
- A column bed volume of 30 μL was used in the synthesis, with 120 nmol of resin loaded in each column.
- Oligonucleotides were synthesized using a Dr. Oligo® 192 oligonucleotide synthesizer and standard phosphorothioate chemistry.
- Oligonucleotide leader sequences containing a randomly chosen 10-nucleotide sequence (ATGCACAGTT; SEQ ID NO: 395), a BsrDI recognition site (GCAATG), and a two-base “overlap sequence” (TG, AC, AG, CT, or GA) were synthesized. The purpose of each of these segments is explained below.
- For synthesis of the DH segments, approximately 1 g of resin (with the 18-nucleotide leader segment still conjugated) was suspended in 20 mL of DCM/MeOH.
- The pooled resin (about 1.36 g) containing the 278 DH segments was subsequently suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of two sets of 141 columns.
- The 141 N2 segments enumerated in Tables 24 and 25 were then synthesized, in duplicate (282 total columns), 3′ to the 278 DH segments synthesized in the first step.
- The resin from the 282 columns was then pooled, washed, and dried, as described above.
- The pooled resin obtained from the N2 synthesis (about 1.35 g) was suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of 280 columns, representing 28 H3-JH segments synthesized ten times each. A portion (described more fully below) of each of the 28 IGHJ segments, including the H3-JH sequences of Table 20, was then synthesized 3′ to the N2 segments, each in ten of the columns. Final oligonucleotides were cleaved and deprotected by exposure to gaseous ammonia (85° C., 2 h, 60 psi).
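The slurry volumes and the number of distinct products in the split pool synthesis follow from simple arithmetic. The Python sketch below assumes that the resin stays uniformly suspended, so that each 60 μL aliquot carries the same mass, and that every aliquot receives beads from every upstream column, which is what allows the pooled product to cover all DH × N2 × H3-JH combinations.

```python
from math import prod

# Number of distinct segments added in each split-pool round (from the text).
SEGMENTS_PER_ROUND = {"DH": 278, "N2": 141, "H3-JH": 28}


def aliquot_plan(slurry_ml: float, aliquot_ul: float, n_columns: int,
                 resin_g: float) -> tuple[float, float]:
    """Return (slurry volume used in mL, resin mass per column in mg), assuming a uniform suspension."""
    used_ml = aliquot_ul * n_columns / 1000
    resin_mg_per_column = resin_g * 1000 * (aliquot_ul / 1000) / slurry_ml
    return used_ml, resin_mg_per_column


n2_round = aliquot_plan(slurry_ml=17, aliquot_ul=60, n_columns=2 * 141, resin_g=1.36)  # N2 in duplicate
jh_round = aliquot_plan(slurry_ml=17, aliquot_ul=60, n_columns=28 * 10, resin_g=1.35)  # 10 columns per H3-JH
print(n2_round, jh_round)

# Every bead carries one segment from each round, so the pool spans all combinations.
print(prod(SEGMENTS_PER_ROUND.values()))
```

Each round uses 16.92 mL (282 × 60 μL) or 16.8 mL (280 × 60 μL) of the roughly 17 mL of slurry, with about 4.8 mg of resin per column in the N2 round, and there are 278 × 141 × 28 = 1,097,544 possible [DH]-[N2]-[H3-JH] combinations.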
- Split pool synthesis was thus used to synthesize the exemplary CDRH3 library.
- The split pool synthesis described herein is one possible means of obtaining the oligonucleotides of the library, but is not limiting.
- Another possible means of synthesizing the oligonucleotides described in this application is the use of trinucleotides. This may be expected to increase the fidelity of the synthesis, since frameshift mutants would be reduced or eliminated.
- This example outlines the procedures used to create exemplary CDRH3 and heavy chain libraries of the invention.
- A two-step process was used to create the CDRH3 library. The first step involved the assembly of a set of vectors encoding the tail and N1 segments, and the second step involved utilizing the split pool nucleic acid synthesis procedures outlined in Example 9 to create oligonucleotides encoding the DH, N2, and H3-JH segments. The chemically synthesized oligonucleotides were then ligated into the vectors to yield CDRH3 residues 95-102, based on the numbering system described herein.
- This CDRH3 library was subsequently amplified by PCR and recombined into a plurality of vectors containing the heavy chain chassis variants described in Examples 1 and 2.
- CDRH1 and CDRH2 variants were produced by QuikChange® Mutagenesis (Stratagene™), using the oligonucleotides encoding the ten heavy chain chassis of Example 1 as a template.
- The plurality of vectors contained the heavy chain constant regions (i.e., CH1, CH2, and CH3) from IgG1, so that a full-length heavy chain was formed upon recombination of the CDRH3 with the vector containing the heavy chain chassis and constant regions.
- The recombination to produce the full-length heavy chains and the expression of the full-length heavy chains were both performed in S. cerevisiae.
- A light chain protein was also expressed in the yeast cell.
- The light chain library used in this embodiment was the kappa light chain library, in which the VKCDR3s were synthesized using degenerate oligonucleotides (see Example 6.2). Because the oligonucleotides encoding the light chain library are shorter than those encoding the heavy chain library, the light chain CDR3 oligonucleotides could be synthesized de novo, using standard procedures for oligonucleotide synthesis, without the need for assembly from sub-components (as in the heavy chain CDR3 synthesis).
- One or more light chains can be expressed in each yeast cell which expresses a particular heavy chain clone from a library of the invention.
- One or more light chains have been successfully expressed from both episomal (e.g., plasmid) vectors and from integrated sites in the yeast genome.
- The steps involved in the process may be generally characterized as (i) synthesis of 424 vectors encoding the tail and N1 regions; (ii) ligation of oligonucleotides encoding the [DH]-[N2]-[H3-JH] segments into these 424 vectors; (iii) PCR amplification of the CDRH3 sequences from the vectors produced in these ligations; and (iv) homologous recombination of these PCR-amplified CDRH3 domains into the yeast expression vectors containing the chassis and constant regions.
- This example demonstrates the synthesis of 424 vectors encoding the tail and N1 regions of CDRH3.
- In this example, the tail was restricted to G, D, E, or nothing, and the N1 region was restricted to one of the 59 sequences shown in Table 24.
- As described throughout the specification, many other embodiments are possible.
- A single “base vector” (pJM204, a pUC-derived cloning vector) was constructed, which contained (i) a nucleic acid sequence encoding two amino acids that are common to the C-terminal portion of all 28 IGHJ segments (SS), and (ii) a nucleic acid sequence encoding a portion of the CH1 constant region from IgG1.
- The base vector contains an insert encoding a sequence that can be depicted as: [SS]-[CH1Δ]
- SS is a common portion of the C-terminus of the 28 IGHJ segments and CH1Δ is a portion of the CH1 constant region from IgG1, namely:
- 424 different oligonucleotides were cloned into the base vector, upstream (i.e., 5′) of the region encoding the [SS]-[CH1Δ]. These 424 oligonucleotides were synthesized by standard methods, and each encoded a C-terminal portion of one of the 17 heavy chain chassis enumerated in Table 5, plus one of four exemplary tail segments (G/D/E/-) and one of 59 exemplary N1 segments (Table 24). These 424 oligonucleotides therefore encode a plurality of sequences that may be represented by: [ΔFRM3]-[G/D/E/-]-[N1]
- ΔFRM3 represents a C-terminal portion of a FRM3 region from one of the 17 heavy chain chassis of Table 5
- G/D/E/- represents G, D, E, or nothing
- N1 represents one of the 59 N1 sequences enumerated in Table 24.
- The invention is not limited to the chassis exemplified in Table 5, their CDRH1 and CDRH2 variants (Table 8), the four exemplary tail options used in this example, or the 59 N1 segments presented in Table 24.
- The oligonucleotides represented by the sequences above were synthesized in two groups: one group containing a ΔFRM3 region identical to the corresponding region on 16 of the 17 heavy chain chassis enumerated in Table 5, and another group containing a ΔFRM3 region that is identical to the corresponding region on VH3-23.
- For the first group, an oligonucleotide encoding DTAVYYCAR (SEQ ID NO: 397) was used for ΔFRM3.
- In the case of VH5-51, the V residue was altered to an M, to correspond to the VH5-51 germline sequence.
- For the second group, an oligonucleotide encoding the sequence AISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK (SEQ ID NO: 398) was used for ΔFRM3.
- Each of the two oligonucleotides encoding the ΔFRM3 regions was paired with oligonucleotides encoding one of the four tail regions (G/D/E/-) and one of the 59 N1 segments, yielding a total of 236 possible combinations for each ΔFRM3 (i.e., 1 × 4 × 59), or a total of 472 possible combinations when both ΔFRM3 sequences are considered.
- After oligonucleotides encoding the [ΔFRM3]-[G/D/E/-]-[N1] and [SS]-[CH1Δ] segments were cloned into the vector, as described above, additional sequences were added to the vector to facilitate the subsequent insertion of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] fragments synthesized during the split pool synthesis.
- These additional sequences comprise a polynucleotide encoding a selectable marker protein, flanked on each side by a recognition site for a type II restriction enzyme.
- In one example, the selectable marker protein is ccdB and the type II restriction enzyme recognition sites are specific for BsrDI and BbsI.
- The ccdB protein is toxic to susceptible bacteria, thereby preventing their growth when the gene is present, so that vectors in which the ccdB cassette has not been replaced by an insert are selected against.
- This example describes the cloning of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] segments (made via split pool synthesis; Example 9) into the 424 vectors produced in Example 10.1.
- The [DH]-[N2]-[H3-JH] oligonucleotides produced via split pool synthesis were amplified by PCR to produce double-stranded oligonucleotides, to introduce restriction sites that would create overhangs complementary to those on the vectors (i.e., BsrDI and BbsI), and to complete the 3′ portion of the IGHJ segments that was not synthesized in the split pool synthesis.
- The amplified oligonucleotides were then digested with the restriction enzymes BsrDI (cleaves adjacent to the DH segment) and BbsI (cleaves near the end of the JH segment).
- The cleaved oligonucleotides were then purified and ligated into the 424 vectors, which had previously been digested with BsrDI and BbsI. After ligation, the reactions were purified, ethanol precipitated, and resolubilized.
- In an exemplary split pool oligonucleotide, the first 10 nucleotides represent a portion of a random sequence that is increased to 20 base pairs in the PCR amplification step described below. This portion of the sequence increases the efficiency of BsrDI digestion and facilitates the downstream purification of the oligonucleotides.
- Nucleotides 11-16 represent the BsrDI recognition site.
- The two-base overlap sequence that follows this site was synthesized to be complementary to the two-base overhang created by digesting certain of the 424 vectors with BsrDI (i.e., depending on the composition of the tail/N1 region of the particular vector).
- Other oligonucleotides contain different two-base overhangs, as described below.
- The two-base overlap is followed by the DH gene segment (nucleotides 19-48), in this example a 30 bp sequence (TATTACTATGGATCTGGTTCTTACTATAAT, SEQ ID NO: 400) that encodes the ten-residue DH segment YYYGSGSYYN (i.e., IGHD3-10_2 of Table 17; SEQ ID NO: 2).
- The region of the oligonucleotide encoding the DH segment is followed, in this example, by a nine-base region (GTGGGCGGA; bold; nucleotides 49-57) encoding the N2 segment (in this case VGG; Table 24).
- Finally, this exemplary oligonucleotide contains the portion of the JH segment that is synthesized during the split pool synthesis (TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC; SEQ ID NO: 401; nucleotides 58-99; underlined), encoding the sequence YYYYYGMDVWGQGT (Table 20; residues 1-14 of SEQ ID NO: 258).
- The balance of the IGHJ segment is added during the subsequent PCR amplification described below.
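The nucleotide layout of the exemplary oligonucleotide can be checked by concatenating its segments and translating the coding portion. In the Python sketch below, the TG overlap is assumed (it is the overlap matching the TG primer used in the amplification example), and the helper names are illustrative.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate complete codons from the first nucleotide onward."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


# (name, nucleotides, expected translation or None); 1-based ranges as in the text.
SEGMENTS = [
    ("random leader", "ATGCACAGTT", None),                                             # 1-10
    ("BsrDI site", "GCAATG", None),                                                    # 11-16
    ("overlap (TG assumed)", "TG", None),                                              # 17-18
    ("DH, IGHD3-10_2", "TATTACTATGGATCTGGTTCTTACTATAAT", "YYYGSGSYYN"),                # 19-48
    ("N2", "GTGGGCGGA", "VGG"),                                                        # 49-57
    ("partial JH6", "TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC", "YYYYYGMDVWGQGT"),   # 58-99
]

oligo = "".join(seq for _, seq, _ in SEGMENTS)
start = 1
for name, seq, expected in SEGMENTS:
    end = start + len(seq) - 1
    if expected is not None and translate(seq) != expected:
        raise ValueError(f"{name} does not translate to {expected}")
    print(f"{name}: nucleotides {start}-{end}")
    start = end + 1
print(len(oligo), translate(oligo[18:]))
```

The assembled oligonucleotide is 99 nucleotides long, the BsrDI site occupies nucleotides 11-16, and nucleotides 19-99 translate to YYYGSGSYYN, VGG and YYYYYGMDVWGQGT, matching the description above.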
- After the split pool-synthesized oligonucleotides were cleaved from the resin and deprotected, they served as a template for a PCR reaction that added an additional randomly chosen 10 nucleotides (e.g., GACGAGCTTC; SEQ ID NO: 402) to the 5′ end, and the rest of the IGHJ segment plus the BbsI restriction site to the 3′ end. These additions facilitate the cloning of the [DH]-[N2]-[JH] oligonucleotides into the 424 vectors.
- The last round of the split pool synthesis involves 280 columns: 10 columns for each of the oligonucleotides encoding one of the 28 H3-JH segments.
- The oligonucleotide products obtained from these 280 columns are pooled according to the identity of their H3-JH segments, for a total of 28 pools.
- Each of these 28 pools is then amplified in five separate PCR reactions, using five forward primers that each encode a different two base overlap (preceding the DH segment; see above) and one reverse primer that has a sequence corresponding to the familial origin of the H3-JH segment being amplified.
- the sequences of these 11 primers are provided below:
- Amplifications were performed using Taq polymerase, under standard conditions.
- the oligonucleotides were amplified for eight cycles, to maintain the representation of sequences of different lengths. Melting of the strands was performed at 95° C. for 30 seconds, with annealing at 58° C. and a 15 second extension time at 72° C.
- the PCR amplification was performed using the TG primer and the JH6 primer, where the annealing portion of the primers has been underlined:
- TG (SEQ ID NO: 407) GACGAGCTTCA ATGCACAGTTGCAATGTG JH6 (SEQ ID NO: 413) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGT GGTCC CTTGCCCCCA
- the portion of the TG primer that is 5′ to the annealing portion includes the random 10 base pairs described above.
- the portion of the JH6 primer that is 5′ to the annealing portion includes the balance of the JH6 segment and the BbsI restriction site.
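- The primer features described above can be checked in silico. This minimal sketch (helper names hypothetical) removes the display spaces from the JH6 primer, checks that it carries a BbsI recognition site (GAAGAC), checks that its 3′-most 15 nucleotides are the reverse complement of the 3′ end of SEQ ID NO: 401 (an assumption about which bases anneal, since the underlining was lost in this copy), and translates the coding strand it contributes.

```python
# Standard genetic code, indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BBSI_SITE = "GAAGAC"  # BbsI recognition sequence


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (returned in upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


# JH6 primer (SEQ ID NO: 413) as printed, with the display spaces removed.
jh6_primer = "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGT GGTCC CTTGCCCCCA".replace(" ", "")
# Split-pool product (SEQ ID NO: 401); its 3' end is the assumed annealing target.
template = "TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC"

has_bbsi = BBSI_SITE in jh6_primer
anneals = reverse_complement(jh6_primer[-15:]) == template[-15:]
# WGQGT overlaps the template; TVTVSS is the balance of JH6 added by the primer.
added_peptide = translate(reverse_complement(jh6_primer)[:33])
print(has_bbsi, anneals, added_peptide)  # True True WGQGTTVTVSS
```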
- the following PCR product (SEQ ID NO: 414) is formed in the reaction (added sequences underlined):
- the PCR products from each reaction were then combined into five pools, based on the forward primer that was used in the reaction, creating sets of sequences yielding the same two-base overhang after BsrDI digestion.
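- The column-to-pool bookkeeping in the preceding steps (280 columns pooled into 28 H3-JH pools, each pool amplified with five forward primers, and the products re-pooled by forward primer) can be sketched with plain Python collections. The segment and primer indices are hypothetical placeholders, not the actual oligonucleotide identities.

```python
from collections import defaultdict
from itertools import product

N_H3_JH_SEGMENTS = 28
COLUMNS_PER_SEGMENT = 10
N_FORWARD_PRIMERS = 5  # one per two-base overlap preceding the DH segment

# Last split-pool round: one column per (H3-JH segment, replicate column).
columns = [(seg, rep) for seg in range(N_H3_JH_SEGMENTS) for rep in range(COLUMNS_PER_SEGMENT)]

# Pool the column products by H3-JH identity.
segment_pools = defaultdict(list)
for seg, rep in columns:
    segment_pools[seg].append(rep)

# Amplify every segment pool with every forward primer.
pcr_reactions = list(product(segment_pools, range(N_FORWARD_PRIMERS)))

# Re-pool PCR products by forward primer, i.e. by the BsrDI overhang they will yield.
overhang_pools = defaultdict(list)
for seg, fwd in pcr_reactions:
    overhang_pools[fwd].append(seg)

print(len(columns), len(segment_pools), len(pcr_reactions), len(overhang_pools))  # 280 28 140 5
```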
- The five pools of PCR products were then digested with BsrDI and BbsI (100 µg of PCR product; 1 mL reaction volume; 200 U BbsI; 100 U BsrDI; 2 h; 37° C.; NEB Buffer 2).
- The digested oligonucleotides were extracted twice with phenol/chloroform, ethanol precipitated, air dried briefly, and resolubilized in 300 µL of TE buffer by sitting overnight at 4° C.
- Each of the 424 vectors described in the preceding sections was then digested with BsrDI and BbsI, each vector yielding a two-base overhang that was complementary to one of those contained in one of the five pools of PCR products.
- One of the five pools of restriction-digested PCR products is ligated into each of the 424 vectors, depending on their compatible ends, for a total of 424 ligations.
- This example describes the PCR amplification of the CDRH3 regions from the 424 vectors described above.
- the 424 vectors represent two sets: one for the VH3-23 family, with FRM3 ending in CAK (212 vectors) and one for the other 16 chassis, with FRM3 ending in CAR (212 vectors).
- the CDRH3s in the VH3-23-based vectors were amplified using a reverse primer (EK137; see Table 41) recognizing a portion of the CH1 region of the plasmid and the VH3-23-specific primer EK135 (see Table 41).
- Amplification of the CDRH3s from the 212 vectors with FRM3 ending in CAR was performed using the same reverse primer (EK137) and each of five FRM3-specific primers shown in Table 41 (EK139, EK140, EK141, EK143, and EK144). Therefore, 212 VH3-23 amplifications and 212 × 5 FRM3 PCR reactions were performed, for a total of 1,272 reactions.
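- The reaction count follows directly from the two vector sets and the number of FRM3-specific forward primers; the short check below uses the Table 41 primer names as labels only.

```python
frm3_forward_primers = ["EK139", "EK140", "EK141", "EK143", "EK144"]
vh3_23_vectors = 212  # FRM3 ending in CAK; one reaction each (EK135 + EK137)
car_vectors = 212     # FRM3 ending in CAR; one reaction per forward primer (+ EK137)

total_reactions = vh3_23_vectors + car_vectors * len(frm3_forward_primers)
print(total_reactions)  # 1272
```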
- An additional PCR reaction amplified the CDRH3 from the 212 VH3-23-based vectors, using the EK133 forward primer, to allow the amplicons to be cloned into the other five VH3 family member chassis while making the last three amino acids of these chassis CAK instead of the original CAR (VH3-23*).
- the primers used in each reaction are shown in Table 41.
- reaction products were pooled according to the respective VH chassis that they would ultimately be cloned into.
- Table 42 enumerates these pools, with the PCR primers used to obtain the CDRH3 sequences in each pool provided in the last two columns.
- FIG. 6 shows a schematic structure of a heavy chain vector, prior to recombination with a CDRH3.
- These 152 vectors represent 17 individual variable heavy chain gene families (Table 5; Examples 1 and 2).
- VH3-30 differs from VH3-33 by a single amino acid; thus, VH3-30 was included in the VH3-33 pool of variants.
- the 4-34 VH family member was kept separate from all others and, in this exemplary embodiment, no variants of it were included in the library. Thus, a total of 16 pools, representing 17 heavy chain chassis, were generated from the 152 vectors.
- the vector pools were digested with the restriction enzyme SfiI, which cuts at two sites in the vector that are located between the end of the FRM3 of the variable domain and the start of the CH1 (SEQ ID NO: 573).
- the gapped vector pools were then mixed with the appropriate (i.e., compatible) pool of CDRH3 amplicons, generated as described above, at a 50:1 insert to vector ratio.
- the mixture was then transformed into electrocompetent yeast ( S. cerevisiae ), which already contained plasmids or integrated genes comprising a VK light chain library (described below).
- the degree of library diversity was determined by plating a dilution of the electroporated cells on a selectable agar plate. In this exemplified embodiment of the invention, the agar plate lacked tryptophan and the yeast lacked the ability to endogenously synthesize tryptophan.
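- Library diversity from a dilution plate is usually estimated by scaling the colony count back to the full transformation volume. The sketch below assumes a single dilution step and uses made-up plating numbers; the function name and parameters are illustrative, not taken from the original protocol.

```python
def estimate_transformants(colonies: int, dilution_factor: float,
                           plated_volume_ul: float, total_volume_ul: float) -> float:
    """Scale a colony count on a selective plate back to the whole electroporation."""
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("invalid plating parameters")
    return colonies * dilution_factor * (total_volume_ul / plated_volume_ul)


# Hypothetical numbers: 150 colonies from 100 uL of a 1:10,000 dilution
# of a 10 mL recovery culture plated on medium lacking tryptophan.
print(f"{estimate_transformants(150, 1e4, 100, 10_000):.2e}")  # 1.50e+08
```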
- a heavy chain library pool was then produced, based on the approximate representation of the heavy chain family members as depicted in Table 43.
- This example describes the mutation of position 94 in VH3-23, VH3-33, VH3-30, VH3-7, and VH3-48.
- In VH3-23, the amino acid at this position was mutated from K to R.
- In VH3-33, VH3-30, VH3-7, and VH3-48, this amino acid was mutated from R to K.
- In VH3-32, this position was mutated from K to R.
- The purpose of making these mutations was to enhance the diversity of CDRH3 presentation in the library. For example, in naturally occurring VH3-23 sequences, about 90% have K at position 94, while about 10% have R at this position. By making these changes, the diversity of CDRH3 presentation is increased, as is the overall diversity of the library.
- Amplification was performed using the 424 vectors as a template.
- The vectors containing the sequence DTAVYYCAK (VH3-23; SEQ ID NO: 578) were amplified with a PCR primer that changed the K to an R and added a 5′ tail for homologous recombination with the VH3-48, VH3-33, VH3-30, and VH3-7 chassis.
- the “T” base in 3-48 does not change the amino acid encoded and thus the same primer with a T::C mismatch still allows homologous recombination into the 3-48 chassis.
- the amplification products from the 424 vectors (produced as described above) containing the DTAVYYCAR (SEQ ID NO: 579) sequence can be homologously recombined into the VH3-23 (CAR) vector, changing R to K in this framework and thus further increasing the diversity of CDRH3 presentation in this chassis.
- VK library of the invention.
- The exemplary VK library described herein corresponds to the VKCDR3 library of about 10^5 complexity, described in Example 6.2.
- Other VK libraries are within the scope of the invention, as are Vλ libraries.
- FIG. 8 shows a schematic structure of a light chain vector, prior to recombination with a CDRL3.
- VKCDR oligonucleotide libraries were then synthesized, as described in Example 6.2, using degenerate oligonucleotides (Table 33). The oligonucleotides were then PCR amplified, as separate pools, to make them double stranded and to add additional nucleotides required for efficient homologous recombination with the gapped (by SfiI) vector containing the VK chassis and constant region sequences.
- The VKCDR3 pools in this embodiment of the invention represented lengths of 8, 9, and 10 amino acids, which were mixed post-PCR at a ratio of 1:8:1.
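- Assuming the 1:8:1 ratio refers to relative numbers of sequences (rather than, for example, mass), the expected length composition of the mixed VKCDR3 pool can be computed as follows; exact fractions avoid rounding error.

```python
from fractions import Fraction

# CDRL3 length (amino acids) -> relative parts in the 1:8:1 mix (assumed molar).
mix_parts = {8: 1, 9: 8, 10: 1}
total_parts = sum(mix_parts.values())
length_fraction = {n: Fraction(p, total_parts) for n, p in mix_parts.items()}
mean_length = sum(n * f for n, f in length_fraction.items())

for n, f in length_fraction.items():
    print(f"{n} aa: {float(f):.0%}")  # 10%, 80%, and 10% for lengths 8, 9, and 10
print("mean length:", float(mean_length))  # mean length: 9.0
```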
- the pools were then cloned into the respective SfiI gapped VK chassis via homologous recombination, as described for the CDRH3 regions, set forth above.
- a schematic diagram of a CDRL3 integrated into a light chain vector and the accompanying sequence are provided in FIG. 9 .
- a kappa light chain library pool was then produced, based on the approximate representation of the VK family members found in the circulating pool of B cells.
- the 10 kappa variable regions used and the relative frequency in the final library pool are shown in Table 44.
- This example shows the characteristics of exemplary libraries of the invention, constructed according to the methods described herein.
- The length distributions of the individual DH, N2, and H3-JH segments obtained from the ten vectors are shown in FIGS. 11-13.
- As shown in FIGS. 15-18, the mean theoretical length was 14.4 ± 4 amino acids, while the average observed length was 14.3 ± 3 amino acids.
- FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences.
- FIG. 20 shows the representation of 16 of the chassis of the library.
- the VH3-15 chassis was not represented amongst these sequences. This was corrected later by introducing yeast transformants containing the VH3-15 chassis, with CDRH3 diversity, into the library at the desired composition.
- FIG. 22 shows the representation of the light chain chassis from amongst the 86 sequences selected from the library. About 91% of the CDRL3 sequences were exact matches to the design, and about 9% differed by a single amino acid.
- This example presents data on the composition of the CDRH3 domains of exemplary libraries, and a comparison to other libraries of the art. More specifically, this example presents an analysis of the occurrence of the 400 possible amino acid pairs (20 amino acids × 20 amino acids) occurring in the CDRH3 domains of the libraries. The prevalence of these pairs is computed by examination of the nearest neighbor (i to i+1; designated IP1), next-nearest neighbor (i to i+2; designated IP2), and next-next-nearest neighbor (i to i+3; designated IP3) of residue i in CDRH3.
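- The IP1, IP2, and IP3 tallies amount to counting ordered residue pairs at a fixed offset k (1, 2, or 3) within each sequence and normalizing to percentages. A minimal sketch, assuming the input sequences have already been trimmed to the region of interest; the toy sequences and function name are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def pair_percentages(seqs: Iterable[str], offset: int) -> dict[tuple[str, str], float]:
    """Percent occurrence of ordered pairs (s[i], s[i + offset]) pooled over seqs."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - offset):
            counts[(s[i], s[i + offset])] += 1
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()} if total else {}


# Toy "central loop" sequences (hypothetical, for illustration only).
loops = ["GYSSGW", "RGYSS"]
for name, k in (("IP1", 1), ("IP2", 2), ("IP3", 3)):
    pct = pair_percentages(loops, k)
    print(name, round(pct.get(("Y", "S"), 0.0), 1))
```

Because pairs are ordered, YS and SY are tallied separately, giving 400 possible pairs per offset.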
- the present invention represents the first recognition that, surprisingly, a position-specific bias does exist within the central portion of the CDRH3 loop, when the occurrences of amino acid pairs recited above are considered.
- This example shows that the libraries described herein more faithfully reproduce the occurrence of these pairs as found in human sequences, in comparison to other libraries of the art.
- the composition of the libraries described herein may thus be considered more “human” than other libraries of the art.
- the pair-wise composition of Knappik et al. was determined based on the percent occurrences presented in FIG. 7 a of Knappik et al. (p. 71). The relevant data are reproduced below, in Table 45.
- the pair-wise composition of Lee et al. was determined based on the libraries depicted in Table 5 of Lee et al., where the positions corresponding to those CDRH3 regions analyzed from the current invention and from Knappik et al. are composed of an “XYZ” codon in Lee et al.
- the XYZ codon of Lee et al. is a degenerate codon with the following base compositions:
- each of the 400 amino acid pairs, in each of the IP1, IP2, and IP3 configurations can be computed for Knappik et al. and Lee et al. by multiplying together the individual amino acid compositions.
- For example, the occurrence of YS pairs in the library is calculated by multiplying 15% by 4.1%, to yield about 0.615%; note that the occurrence of SY pairs would be the same.
- Similarly, the occurrence of YS pairs would be 6.86% (Y) multiplied by 9.35% (S), to give about 0.64%; the same, again, for SY.
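- Under the independence assumption used for Knappik et al. and Lee et al., the expected percentage of an ordered pair is simply the product of the two single-residue percentages. The sketch below (function names hypothetical) applies this to the two worked examples above (15% × 4.1% and 6.86% × 9.35%) and builds the full 400-pair table for a given composition.

```python
from __future__ import annotations


def independent_pair_percent(p_first: float, p_second: float) -> float:
    """Expected % of an ordered pair when positions are independent (inputs in %)."""
    return p_first * p_second / 100.0


def pair_table(composition: dict[str, float]) -> dict[tuple[str, str], float]:
    """All ordered pairs (400 for 20 amino acids) from single-residue percentages."""
    return {
        (a, b): independent_pair_percent(pa, pb)
        for a, pa in composition.items()
        for b, pb in composition.items()
    }


print(round(independent_pair_percent(15.0, 4.1), 3))   # 0.615 (Y 15%, S 4.1%)
print(round(independent_pair_percent(6.86, 9.35), 3))  # 0.641 (Y 6.86%, S 9.35%)

# Sanity check with a uniform composition: 400 pairs summing to 100%.
uniform = {aa: 5.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
table = pair_table(uniform)
print(len(table), round(sum(table.values()), 6))  # 400 100.0
```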
- the calculation is performed by ignoring the last five amino acids in the Kabat definition. By ignoring the C-terminal 5 amino acids of the human CDRH3, these sequences may be compared to those of Lee et al., based on the XYZ codons. While Lee et al. also present libraries with “NNK” and “NNS” codons, the pair-wise compositions of these libraries are even further away from human CDRH3 pair-wise composition. The XYZ codon was designed by Lee et al. to replicate, to some extent, the individual amino acid type biases observed in CDRH3.
- LUA-59 includes 59 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments (see Examples, above).
- LUA-141 includes 141 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments (see Examples, above). Redundancies created by combination of the N1 and tail sequences were removed from the dataset in each respective library.
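- Multiplying the segment counts gives an upper bound on the number of distinct [N1]-[DH]-[N2]-[H3-JH] combinations in each design; the actual counts are lower because redundant combinations were removed, as noted above. The dictionary layout is illustrative.

```python
import math

segment_counts = {
    "LUA-59": {"N1": 59, "DH": 278, "N2": 141, "H3-JH": 28},
    "LUA-141": {"N1": 141, "DH": 278, "N2": 141, "H3-JH": 28},
}
upper_bounds = {name: math.prod(c.values()) for name, c in segment_counts.items()}
for name, n in upper_bounds.items():
    print(f"{name}: {n:,} combinations before redundancy removal")
# LUA-59: 64,755,096; LUA-141: 154,753,704
```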
- the invention may be defined based on the percent occurrence of any of the 400 amino acid pairs, particularly those in Tables 47-49. In certain embodiments, the invention may be defined based on at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more of these pairs.
- the percent occurrence of certain pairs of amino acids may fall within ranges indicated by “LUA-” (lower boundary) and “LUA+” (higher boundary), in the following tables.
- The lower boundary for the percent occurrence of any amino acid pair may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, or 5.
- The higher boundary for the percent occurrence of any amino acid pair may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 7.25, 7.5, 7.75, or 8.
- any of the lower boundaries recited may be combined with any of the higher boundaries recited, to establish ranges, and vice-versa.
- composition of the libraries of the present invention more closely mimics the composition of human sequences than other libraries known in the art.
- Synthetic libraries of the art do not intrinsically reproduce the composition of the “central loop” portion of actual human CDRH3 sequences at the level of pair percentages.
- the libraries of the invention have a more complex pair-wise composition that closely reproduces that observed in actual human CDRH3 sequences.
- the exact degree of this reproduction versus a target set of actual human CDRH3 sequences may be optimized, for example, by varying the compositions of the segments used to design the CDRH3 libraries.
- I is defined as $I = \sum_{i=1}^{N} f_i \log_2 f_i$, where f_i is the normalized frequency of occurrence of category i, which may be an amino acid type (in which case N would be equal to 20).
- N is the number of possible categories i.
- If only a single category occurs (f_i = 1 for that category and 0 for all others), the value of I is zero.
- If more categories occur, the value of I is smaller, i.e., negative, and the lowest value is achieved when all f_i values are the same and equal to 1/N.
- For amino acid types, N is 20, and the resulting lowest value of I would be −4.322 (that is, log2(1/20)). Because I is defined with base 2 logarithms, the units of I are bits.
- The I values for the HuCAL and XYZ libraries at the single-position level may be derived from Tables 45 and 46, respectively, and are equal to −4.08 and −4.06.
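- The quantity I can be computed directly from a list of counts or frequencies. This sketch assumes I is the sum of f_i log2 f_i over the N categories (consistent with the limiting values of 0 and −4.322 stated above) and treats empty categories as contributing zero; the function name is hypothetical.

```python
import math
from typing import Sequence


def information_I(counts: Sequence[float]) -> float:
    """I = sum_i f_i * log2(f_i) in bits, with f_i = counts normalized to sum to 1."""
    total = float(sum(counts))
    return sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(information_I([1] + [0] * 19))      # 0.0: a single amino acid type
print(round(information_I([1] * 20), 3))  # -4.322: all 20 types equally frequent
```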
- the corresponding single residue frequency occurrences in the non-limiting exemplary libraries of the invention and the sets of human sequences previously introduced, taken within the “central loop” as defined above, are provided in Table 50.
- MI mutual information
- MI values decrease as the pairs being considered sit further and further apart, and this is the case for both sets of human sequences, and exemplary libraries of the invention.
- the odds of their straddling an actual segment V, D, J plus V-D or D-J insertions
- their pair frequencies become closer to a simple product of singleton frequencies.
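- The MI formula is not reproduced in this section. The sketch below uses the standard definition of mutual information between residues at positions i and i+k, MI = Σ p(a,b) log2[p(a,b) / (p(a)p(b))], with the marginals taken from the same pair counts; treat it as an assumption about how MI was computed, not the original code. It returns 0 when pair frequencies equal the product of singleton frequencies, which is the limit described above.

```python
import math
from collections import Counter


def mutual_information(seqs, offset):
    """MI in bits between residues at positions i and i + offset, pooled over seqs."""
    pairs = Counter()
    for s in seqs:
        for i in range(len(s) - offset):
            pairs[(s[i], s[i + offset])] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    # Marginal counts for the first and second member of each pair.
    first, second = Counter(), Counter()
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    mi = 0.0
    for (a, b), n in pairs.items():
        p_ab = n / total
        mi += p_ab * math.log2(p_ab / ((first[a] / total) * (second[b] / total)))
    return mi


print(mutual_information(["AB", "AA", "BA", "BB"], 1))  # 0.0 (independent positions)
print(mutual_information(["AA", "BB"], 1))              # 1.0 (fully dependent positions)
```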
- Table 52 contains sequence information on certain immunoglobulin gene segments cited in the application. These sequences are non-limiting, and it is recognized that allelic variants exist and are encompassed by the present invention. Accordingly, the methods presented herein can be utilized with mutants of these sequences.
- XTS where X is not N
- NTZ where Z is not S or T are also options.
- NPS is yet another option that is much less likely to be N-linked glycosylated.
- 426 IGHV1-24 QVQLVQSGAEVKKPGASVKVSCKVSGYTLTELSMHWVRQAPGKGLEWMGGFDPEDGETIYAQKFQGRVTMTEDTSTDTAYMELSSLRSEDTAVYYCAT
- 427 IGHV1-45 QMQLVQSGAEVKKTGSSVKVSCKASGYTFTYRYLHWVRQAPGQALEWMGWITPFNGNTNYAQKFQDRVTITRDRSMSTAYMELSSLRSEDTAMYYCAR
- 428 IGHV1-58 QMQLVQSGPEVKKPGTSVKVSCKASGFTFTSSAVQWVRQARGQRLEWIGWIVVGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAA
- 429 I
- G was chosen by analogy to other germline sequences, but other amino acid types (R, S, and T, as non-limiting examples) are possible.
- 433 IGHV3-9 EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSGISWNSGSIGYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTALYYCAKD
- 434 IGHV3-11 QVQLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR
- 435 IGHV3-13 EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYDMHWVRQATGKGLEWVSAIGTAGDTYYPGSVKGRFTISRENAKNSLYLQMNSLRAGDTAVYYCAR
- 436 IGHV3-20 EVQLVESGGGVVRPGGSLRLSCA
- XIS where X is not N
- NIZ where Z is not S or T
- NPS is yet another option that is much less likely to be N-linked glycosylated.
- IGHD2-8 AGGATATTGTACTAATGGTGTATGCTATACC
- 508 IGHD3-16 GTATTATGATTACGTTTGGGGGAGTTATGCTTATACC
- 509 IGHD3-9 GTATTACGATATTTTGACTGGTTATTATAAC
- 510 IGHD4-23 TGACTACGGTGGTAACTCC
- 511 IGHD4-4/4-11 TGACTACAGTAACTAC
- 512 IGHD5-12 GTGGATATAGTGGCTACGATTAC
- 513 IGHD5-24 GTAGAGATGGCTACAATTAC
- 514 IGHD6-25 GGGTATAGCAGCGGCTAC
- 515 IGHD6-6 GAGTATAGCAGCTCGTCC
- 516 IGHD7-27 CTAACTGGGGA (1)
- Each of the IGHD nucleotide sequences can be read in three (3) forward reading frames, and, possibly, in 3 reverse reading frames.
- the nucleotide sequence given for IGHD1-1 may encode the full peptide sequences: GTTGT (SEQ ID NO: 517), VQLER (SEQ ID NO: 518) and YNWND (SEQ ID NO: 519) in the forward direction, and VVPVV (SEQ ID NO: 520), SFQLY (SEQ ID NO: 521) and RSSCT (SEQ ID NO: 522) in the reverse direction.
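- The six candidate reading frames can be generated mechanically. This sketch assumes the germline IGHD1-1 nucleotide sequence is GGTACAACTGGAACGAC (IMGT IGHD1-1*01; the SEQ ID NO: 501 listing itself is not reproduced in this section) and translates the reverse complement for the reverse frames. With that input it yields the six peptides listed above; frames containing a stop codon would be excluded from the DH set. Helper names are hypothetical.

```python
from __future__ import annotations

# Standard genetic code, indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


def six_frame_peptides(dna: str) -> tuple[list[str], list[str]]:
    """Peptides from the three forward and three reverse reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    forward = [translate(dna[f:]) for f in range(3)]
    reverse = [translate(rc[f:]) for f in range(3)]
    return forward, reverse


IGHD1_1 = "GGTACAACTGGAACGAC"  # assumed IMGT IGHD1-1*01 germline sequence
forward, reverse = six_frame_peptides(IGHD1_1)
print(forward)  # ['GTTGT', 'VQLER', 'YNWND']
print(reverse)  # ['VVPVV', 'SFQLY', 'RSSCT']

# Only frames without a stop codon ('*') would be kept as DH candidates.
dh_candidates = [p for p in forward + reverse if "*" not in p]
```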
- FIG. 24 shows binding curves for six clones specifically binding Antigen X, and their Kd values. This selection was performed using yeast with the heavy chain on a plasmid vector and the kappa light chain library integrated into the genome of the yeast.
- FIG. 25 shows the binding curves for 10 clones specifically binding HEL; each gave a Kd>500 nM.
- This selection was performed using yeast with the heavy chain on a plasmid vector and the kappa light chain library on a plasmid vector. The sequences of the heavy and light chains were determined for clones isolated from the library and it was demonstrated that multiple clones were present. A portion of the FRM3s (underlined) and the entire CDRH3s from four clones are shown below (Table 53 and Table 54, the latter using the numbering system of the invention).
- CDRL3 YYC QESFHIPYT FGGG .
- For the sequence above (SEQ ID NO: 527), the CDRL3 matched the design of a degenerate VK1-39 oligonucleotide sequence in row 49 of Table 33. The relevant portion of this table is reproduced below, with the amino acids occupying each position of the isolated CDRL3 bolded and underlined:
Abstract
The present invention overcomes the inadequacies inherent in the known methods for generating libraries of antibody-encoding polynucleotides by specifically designing the libraries with directed sequence and length diversity. The libraries are designed to reflect the preimmune repertoire naturally created by the human immune system and are based on rational design informed by examination of publicly available databases of human antibody sequences.
Description
- This application is a continuation of application Ser. No. 17/532,287, filed Nov. 22, 2021, which is a continuation of application Ser. No. 17/229,484, filed Apr. 13, 2021, which is a divisional of application Ser. No. 16/215,523, filed Dec. 10, 2018 and issued as U.S. Pat. No. 11,008,383 on May 18, 2021, which is a divisional of application Ser. No. 14/150,129, filed Jan. 8, 2014 and issued as U.S. Pat. No. 10,189,894 on Jan. 29, 2019, which is a divisional of application Ser. No. 12/210,072, filed Sep. 12, 2008 and issued as U.S. Pat. No. 8,691,730 on Apr. 8, 2014, which claims the benefit of U.S. Provisional Application No. 60/993,785, filed on Sep. 14, 2007; the contents of these applications are hereby incorporated by reference in their entirety.
- In accordance with 37 CFR § 1.52(e)(5), the present specification makes reference to a Sequence Listing (submitted electronically as a .xml file named “2009186-0361_SL.xml”). The .xml file was generated on Apr. 25, 2023 and is 746,177 bytes in size. The entire contents of the Sequence Listing are herein incorporated by reference.
- Antibodies have profound relevance as research tools and in diagnostic and therapeutic applications. However, the identification of useful antibodies is difficult and once identified, antibodies often require considerable redesign or ‘humanization’ before they are suitable for therapeutic applications.
- Previous methods for identifying desirable antibodies have typically involved phage display of representative antibodies, for example human libraries derived by amplification of nucleic acids from B cells or tissues, or, alternatively, synthetic libraries. However, these approaches have limitations. For example, most human libraries known in the art contain only the antibody sequence diversity that can be experimentally captured or cloned from the source (e.g., B cells). Accordingly, the human library may completely lack or under-represent certain useful antibody sequences. Synthetic or consensus libraries known in the art have other limitations, such as the potential to encode non-naturally occurring (e.g., non-human) sequences that have the potential to be immunogenic. Moreover, certain synthetic libraries of the art suffer from at least one of two limitations: (1) the number of members that the library can theoretically contain (i.e., theoretical diversity) may be greater than the number of members that can actually be synthesized, and (2) the number of members actually synthesized may be so great as to preclude screening of each member in a physical realization of the library, thereby decreasing the probability that a library member with a particular property may be isolated.
- For example, a physical realization of a library (e.g., yeast display, phage display, ribosomal display, etc.) capable of screening 10^12 library members will only sample about 10% of the sequences contained in a library with 10^13 members. Given a median CDRH3 length of about 12.7 amino acids (Rock et al., J. Exp. Med., 1994, 179:323-328), the number of theoretical sequence variants in CDRH3 alone is about 20^12.7, or about 3.3×10^16 variants. This number does not account for known variation that occurs in CDRH1 and CDRH2, heavy chain framework regions, and pairing with different light chains, which also exhibit variation in their respective CDRL1, CDRL2, and CDRL3. Finally, the antibodies isolated from these libraries are often not amenable to rational affinity maturation techniques to improve the binding of the candidate molecule.
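- The two figures in this paragraph can be checked with a few lines of arithmetic, treating every screened member as a distinct sequence (an idealization) and assuming all 20 amino acids are possible at each of the ~12.7 positions.

```python
theoretical_cdrh3 = 20 ** 12.7  # 20 amino acids at ~12.7 CDRH3 positions
print(f"{theoretical_cdrh3:.1e}")  # 3.3e+16

screening_capacity = 1e12
library_size = 1e13
print(f"{screening_capacity / library_size:.0%}")  # 10%
```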
- Accordingly, a need exists for smaller (i.e., able to be synthesized and physically realizable) antibody libraries with directed diversity that systematically represent candidate antibodies that are non-immunogenic (i.e., more human) and have desired properties (e.g., the ability to recognize a broad variety of antigens). However, obtaining such libraries requires balancing the competing objectives of restricting the sequence diversity represented in the library (to enable synthesis and physical realization, potentially with oversampling, while limiting the introduction of non-human sequences) while maintaining a level of diversity sufficient to recognize a broad variety of antigens. Prior to the instant invention, it was known in the art that “[al]though libraries containing heavy chain CDR3 length diversity have been reported, it is impossible to synthesize DNA encoding both the sequence and the length diversity found in natural heavy chain CDR3 repertoires” (Hoet et al., Nat. Biotechnol., 2005, 23: 344, incorporated by reference in its entirety).
- Therefore, it would be desirable to have antibody libraries which (a) can be readily synthesized, (b) can be physically realized and, in certain cases, oversampled, (c) contain sufficient diversity to recognize all antigens recognized by the preimmune human repertoire (i.e., before negative selection), (d) are non-immunogenic in humans (i.e., comprise sequences of human origin), and (e) contain CDR length and sequence diversity, and framework diversity, representative of naturally-occurring human antibodies. Embodiments of the instant invention at least provide, for the first time, antibody libraries that have these desirable features.
- The present invention relates to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, and kits and computer readable forms including the libraries of the invention. In some embodiments, the libraries of the invention are designed to reflect the preimmune repertoire naturally created by the human immune system and are based on rational design informed by examination of publicly available databases of human antibody sequences. It will be appreciated that certain non-limiting embodiments of the invention are described below. As described throughout the specification, the invention encompasses many other embodiments as well.
- In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least 10^6 unique antibody CDRH3 amino acid sequences comprising:
- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.
- In other embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least about 10^6 unique antibody CDRH3 amino acid sequences comprising:
- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein:
- (a) the most N-terminal N1 amino acid, if present, is selected from a group consisting of R, G, P, L, S, A, V, K, I, Q, T and D;
- (b) the second most N-terminal N1 amino acid, if present, is selected from a group consisting of G, P, R, S, L, V, E, A, D, I, T and K; and
- (c) the third most N-terminal N1 amino acid, if present, is selected from the group consisting of G, R, P, S, L, A, V, T, E, D, K and F;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein:
- (a) the most N-terminal N2 amino acid, if present, is selected from a group consisting of G, P, R, L, S, A, T, V, E, D, F and H;
- (b) the second most N-terminal N2 amino acid, if present, is selected from a group consisting of G, P, R, S, T, L, A, V, E, Y, D and K; and
- (c) the third most N-terminal N2 amino acid, if present, is selected from the group consisting of G, P, S, R, L, A, T, V, D, E, W and Q; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.
- In still other embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least about 10^6 unique antibody CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:
[X]-[N1]-[DH]-[N2]-[H3-JH], wherein:
- (i) X is any amino acid residue or no amino acid residue;
- (ii) N1 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof;
- (iii) DH is an amino acid sequence selected from the group consisting of all possible reading frames that do not include a stop codon encoded by IGHD1-1 (SEQ ID NO: 501), IGHD1-20 (SEQ ID NO: 503), IGHD1-26 (polynucleotides encoding SEQ ID NOs: 13 and 14), IGHD1-7 (SEQ ID NO: 504), IGHD2-15 (polynucleotides encoding SEQ ID NO: 16), IGHD2-2 (polynucleotides encoding SEQ ID NOs: 10 and 11), IGHD2-21 (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: 507), IGHD3-10 (polynucleotides encoding SEQ ID NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 (SEQ ID NO: 509), IGHD4-17 (polynucleotides encoding SEQ ID NO: 12), IGHD4-23 (SEQ ID NO: 510), IGHD4-4 (SEQ ID NO: 511), IGHD4-11 (SEQ ID NO: 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ ID NO: 15), IGHD5-18 (polynucleotides encoding SEQ ID NO: 15), IGHD6-13 (polynucleotides encoding SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID NO: 514), IGHD6-6 (SEQ ID NO: 515), and IGHD7-27 (SEQ ID NO: 516), and N- and C-terminal truncations thereof,
- (iv) N2 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof, and
- (v) H3-JH is an amino acid sequence selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV, or a sequence of at least 80% identity to any of them.
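- As a toy illustration of the formula, the sketch below concatenates segments drawn from tiny, hand-picked subsets of the lists above (X is omitted, and the DH choices are the IGHD1-1 forward reading-frame peptides given elsewhere in this document). It works at the amino acid level only and collapses duplicate products with a set, in the spirit of the redundancy removal mentioned for the LUA libraries. The truncation helper with `min_len=3` reproduces the JH6-derived entries, which stop at MDV. Helper names are hypothetical.

```python
from __future__ import annotations

from itertools import product


def n_terminal_truncations(seq: str, min_len: int = 1) -> list[str]:
    """seq plus every N-terminal truncation down to min_len residues."""
    return [seq[i:] for i in range(len(seq) - min_len + 1)]


# Tiny, hand-picked subsets of the segment lists (illustration only).
n1_options = ["", "G", "GG"]               # "" means no N1 residues
dh_options = ["GTTGT", "VQLER", "YNWND"]   # IGHD1-1 forward reading frames
n2_options = ["", "P", "SG"]
h3_jh_options = n_terminal_truncations("YFDY")  # YFDY, FDY, DY, Y

# [N1]-[DH]-[N2]-[H3-JH]; the set drops any duplicate concatenations.
cdrh3_library = {
    n1 + dh + n2 + jh
    for n1, dh, n2, jh in product(n1_options, dh_options, n2_options, h3_jh_options)
}
print(len(cdrh3_library), "GGVQLERPFDY" in cdrh3_library)
print(n_terminal_truncations("YYYYYGMDV", min_len=3))
```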
- In still another embodiment, the invention comprises a library, wherein said library consists essentially of a plurality of polynucleotides encoding CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:
[X]-[N1]-[DH]-[N2]-[H3-JH], wherein:
- (i) X is any amino acid residue or no amino acid residue;
- (ii) N1 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof,
- (iii) DH is an amino acid sequence selected from the group consisting of the sequences encoded, in every possible reading frame that does not include a stop codon, by IGHD1-1 (SEQ ID NO: 501), IGHD1-20 (SEQ ID NO: 503), IGHD1-26 (polynucleotides encoding SEQ ID NOs: 13 and 14), IGHD1-7 (SEQ ID NO: 504), IGHD2-15 (polynucleotides encoding SEQ ID NO: 16), IGHD2-2 (polynucleotides encoding SEQ ID NOs: 10 and 11), IGHD2-21 (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: 507), IGHD3-10 (polynucleotides encoding SEQ ID NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 (SEQ ID NO: 509), IGHD4-17 (polynucleotides encoding SEQ ID NO: 12), IGHD4-23 (SEQ ID NO: 510), IGHD4-4 (SEQ ID NO: 511), IGHD4-11 (SEQ ID NO: 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ ID NO: 15), IGHD5-18 (polynucleotides encoding SEQ ID NO: 15), IGHD6-13 (polynucleotides encoding SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID NO: 514), IGHD6-6 (SEQ ID NO: 515), and IGHD7-27 (SEQ ID NO: 516), and N- and C-terminal truncations thereof,
- (iv) N2 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof, and
- (v) H3-JH is an amino acid sequence selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV, or a sequence of at least 80% identity to any of them.
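The [X]-[N1]-[DH]-[N2]-[H3-JH] formula describes a combinatorial concatenation of segment pools. The following is a minimal sketch, assuming heavily reduced, hypothetical component sets (the DH strings are placeholders rather than the DH reading frames listed above, and the "80% identity" variants are not modeled). It illustrates one practical point: the number of distinct CDRH3 sequences can be smaller than the product of the pool sizes, because, for example, a G tail with an empty N1 yields the same string as no tail with N1 = G.

```python
from itertools import product


def assemble_cdrh3(tails, n1s, dhs, n2s, h3jhs):
    """Return the set of distinct CDRH3 strings of the form [X]-[N1]-[DH]-[N2]-[H3-JH].

    An empty string stands for "no residue" (X absent, or N1/N2 of length zero).
    """
    return {"".join(parts) for parts in product(tails, n1s, dhs, n2s, h3jhs)}


# Hypothetical, heavily reduced component sets for illustration only.
tails = ["", "G", "D", "E"]   # no tail, or one of the tail residues G, D, E
n1s = ["", "G", "GG"]         # drawn from the N1 list
dhs = ["YYGS", "YGS"]         # placeholder DH fragment and one N-terminal truncation
n2s = ["", "G"]               # drawn from the N2 list
h3jhs = ["YFDY", "FDY"]       # drawn from the H3-JH list

library = assemble_cdrh3(tails, n1s, dhs, n2s, h3jhs)
combinations = len(tails) * len(n1s) * len(dhs) * len(n2s) * len(h3jhs)
print(f"{combinations} component combinations -> {len(library)} distinct CDRH3s")
```

Counting distinct strings, rather than combinations, is what a theoretical-diversity estimate of the kind discussed elsewhere in this document would need.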
- In another embodiment, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode one or more full length antibody heavy chain sequences, and wherein the CDRH3 amino acid sequences of the heavy chain comprise:
- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.
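The "12 most frequently occurring amino acids at the corresponding position" criterion for N1 and N2 can be sketched as a per-position frequency table. This is a minimal illustration, assuming positions are counted from the N-terminus of each N1 (or N2) segment and that ties at the cutoff may be broken arbitrarily; the function names and the toy observations are hypothetical, and a toy cutoff of k=2 is used instead of 12 so the result is easy to inspect.

```python
from collections import Counter


def top_k_by_position(observed, k=12):
    """Map each position to the set of its k most frequent residues in observed.

    Ties at the cutoff follow Counter.most_common, i.e. first-encountered order.
    """
    counts = {}
    for seq in observed:
        for pos, aa in enumerate(seq):
            counts.setdefault(pos, Counter())[aa] += 1
    return {pos: {aa for aa, _ in c.most_common(k)} for pos, c in counts.items()}


def is_allowed(candidate, allowed):
    """True if every residue of candidate is among the top-k residues at its position."""
    return all(pos in allowed and aa in allowed[pos] for pos, aa in enumerate(candidate))


# Hypothetical N1 observations; a real analysis would use N1 regions parsed from
# CDRH3s functionally expressed by human B cells, with k=12 as in the text.
observed_n1 = ["G", "GG", "PG", "PS", "R", "GS"]
allowed = top_k_by_position(observed_n1, k=2)
print(allowed[0], is_allowed("PG", allowed), is_allowed("RG", allowed))
```

An empty candidate (an N1 of length zero) passes trivially, consistent with the 0-residue lower bound in the text.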
- The following embodiments may be applied throughout the embodiments of the instant invention. In one aspect, one or more CDRH3 amino acid sequences further comprise an N-terminal tail residue. In still another aspect, the N-terminal tail residue is selected from the group consisting of G, D, and E.
- In yet another aspect, the N1 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof. In certain other aspects, the N1 amino acid sequence may be of about 0 to about 5 amino acids.
- In yet another aspect, the N2 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof. In certain other aspects, the N2 sequence may be of about 0 to about 5 amino acids.
- In yet another aspect, the H3-JH amino acid sequence is selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV.
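The H3-JH list above is made up of six parent sequences (SEQ ID NOs: 17-22) together with their successive N-terminal truncations. A short check along the following lines reproduces the list; the minimum truncation length per parent (one residue for five parents, three for YYYYYGMDV) is read off the list itself and is an observation, not a rule stated in the text.

```python
def n_terminal_truncations(parent, min_len=1):
    """Return parent followed by each successive N-terminal truncation, down to min_len residues."""
    return [parent[i:] for i in range(len(parent) - min_len + 1)]


# Parent H3-JH sequences and the shortest truncation that appears in the list above.
PARENTS = {"AEYFQH": 1, "YWYFDL": 1, "AFDV": 1, "YFDY": 1, "NWFDS": 1, "YYYYYGMDV": 3}

LISTED_H3_JH = """AEYFQH EYFQH YFQH FQH QH H YWYFDL WYFDL YFDL FDL DL L AFDV FDV DV V
YFDY FDY DY Y NWFDS WFDS FDS DS S YYYYYGMDV YYYYGMDV YYYGMDV YYGMDV YGMDV GMDV MDV""".split()

generated = [t for parent, m in PARENTS.items() for t in n_terminal_truncations(parent, m)]
print(sorted(generated) == sorted(LISTED_H3_JH))  # prints True
```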
- In other embodiments, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i−i+1 pairs in the library is within the ranges specified below:
- Tyr-Tyr in an amount from about 2.5% to about 6.5%;
- Ser-Gly in an amount from about 2.5% to about 4.5%;
- Ser-Ser in an amount from about 2% to about 4%;
- Gly-Ser in an amount from about 1.5% to about 4%;
- Tyr-Ser in an amount from about 0.75% to about 2%;
- Tyr-Gly in an amount from about 0.75% to about 2%; and
- Ser-Tyr in an amount from about 0.75% to about 2%.
- In still other embodiments, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i−i+2 pairs in the library is within the ranges specified below:
- Tyr-Tyr in an amount from about 2.5% to about 4.5%;
- Gly-Tyr in an amount from about 2.5% to about 5.5%;
- Ser-Tyr in an amount from about 2% to about 4%;
- Tyr-Ser in an amount from about 1.75% to about 3.75%;
- Ser-Gly in an amount from about 2% to about 3.5%;
- Ser-Ser in an amount from about 1.5% to about 3%;
- Gly-Ser in an amount from about 1.5% to about 3%; and
- Tyr-Gly in an amount from about 1% to about 2%.
- In another embodiment, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i−i+3 pairs in the library is within the ranges specified below:
- Gly-Tyr in an amount from about 2.5% to about 6.5%;
- Ser-Tyr in an amount from about 1% to about 5%;
- Tyr-Ser in an amount from about 2% to about 4%;
- Ser-Ser in an amount from about 1% to about 3%;
- Gly-Ser in an amount from about 2% to about 5%; and
- Tyr-Tyr in an amount from about 0.75% to about 2%.
- In one aspect of the invention, at least 2, 3, 4, 5, 6, or 7 of the specified i−i+1 pairs in the library are within the specified ranges. In another aspect, the CDRH3 amino acid sequences are human. In yet another aspect, the polynucleotides encode at least about 10⁶ unique CDRH3 amino acid sequences.
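The i−i+1, i−i+2, and i−i+3 pair frequencies can be computed by counting residue pairs separated by a fixed offset. The sketch below is illustrative only and rests on stated assumptions: the inputs are already-trimmed central-loop strings (the central loop is not defined in this section), percentages are taken over all pairs at that offset pooled across the library, and "about" is treated as an exact bound. The function names and example loops are hypothetical.

```python
from collections import Counter


def pair_percentages(loops, offset):
    """Percent occurrence of each (loop[i], loop[i + offset]) residue pair.

    Percentages are taken over all i-i+offset pairs pooled across the loops.
    """
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - offset):
            counts[(loop[i], loop[i + offset])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in counts.items()}


def pairs_in_range(percentages, ranges):
    """List the specified pairs whose percent occurrence falls inside [low, high]."""
    return [pair for pair, (low, high) in ranges.items()
            if low <= percentages.get(pair, 0.0) <= high]


# i-i+1 ranges (percent), transcribed from the list above.
I_I1_RANGES = {
    ("Y", "Y"): (2.5, 6.5), ("S", "G"): (2.5, 4.5), ("S", "S"): (2.0, 4.0),
    ("G", "S"): (1.5, 4.0), ("Y", "S"): (0.75, 2.0), ("Y", "G"): (0.75, 2.0),
    ("S", "Y"): (0.75, 2.0),
}

central_loops = ["GYYSGSY", "SSGYYD"]  # hypothetical central-loop strings
hits = pairs_in_range(pair_percentages(central_loops, offset=1), I_I1_RANGES)
print(len(hits), "of", len(I_I1_RANGES), "specified i-i+1 pairs are in range")
```

The same two functions cover the i−i+2 and i−i+3 lists by changing `offset` and the range table.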
- In other aspects of the invention, the polynucleotides further encode one or more heavy chain chassis amino acid sequences that are N-terminal to the CDRH3 amino acid sequences, and the one or more heavy chain chassis sequences are selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 94 encoded by IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NOs: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NOs: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV3-9 (SEQ ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV3-13 (SEQ ID NO: 435), IGHV3-15 (SEQ ID NO: 29), IGHV3-20 (SEQ ID NO: 436), IGHV3-21 (SEQ ID NO: 437), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-43 (SEQ ID NO: 438), IGHV3-48 (SEQ ID NO: 33), IGHV3-49 (SEQ ID NO: 439), IGHV3-53 (SEQ ID NO: 440), IGHV3-64 (SEQ ID NO: 441), IGHV3-66 (SEQ ID NO: 442), IGHV3-72 (SEQ ID NO: 443), IGHV3-73 (SEQ ID NO: 444), IGHV3-74 (SEQ ID NO: 445), IGHV4-4 (SEQ ID NOs: 446, 447), IGHV4-28 (SEQ ID NO: 448), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), IGHV5-51 (SEQ ID NO: 40), IGHV6-1 (SEQ ID NO: 449), and IGHV7-4-1 (SEQ ID NO: 450), or a sequence of at least about 80% identity to any of them.
- In another aspect, the polynucleotides further encode one or more FRM4 amino acid sequences that are C-terminal to the CDRH3 amino acid sequences, wherein the one or more FRM4 amino acid sequences are selected from the group consisting of a FRM4 amino acid sequence encoded by IGHJ1 (SEQ ID NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: 255), IGHJ4 (SEQ ID NO: 256), IGHJ5 (SEQ ID NO: 257), and IGHJ6 (SEQ ID NO: 257), or a sequence of at least about 80% identity to any of them. In still another aspect, the polynucleotides further encode one or more immunoglobulin heavy chain constant region amino acid sequences that are C-terminal to the FRM4 sequence.
- In yet another aspect, the CDRH3 amino acid sequences are expressed as part of full-length heavy chains. In other aspects, the full-length heavy chains are selected from the group consisting of an IgG1, IgG2, IgG3, and IgG4, or combinations thereof. In one embodiment, the CDRH3 amino acid sequences are from about 2 to about 30, from about 8 to about 19, or from about 10 to about 18 amino acid residues in length. In other aspects, the synthetic polynucleotides of the library encode from about 10⁶ to about 10¹⁴, from about 10⁷ to about 10¹³, from about 10⁸ to about 10¹², from about 10⁹ to about 10¹², or from about 10¹⁰ to about 10¹² unique CDRH3 amino acid sequences.
- In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody VKCDR3 amino acid sequences comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 amino acid sequences derived from a particular IGKV or IGKJ germline sequence.
- In one aspect, the synthetic polynucleotides encode one or more of the amino acid sequences listed in Table 33 or a sequence at least about 80% identical to any of them.
- In some embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of unique antibody VKCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:
[VK_Chassis]-[L3-VK]-[X]-[JK*], wherein:
- (i) VK_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NOs: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41, or a sequence of at least about 80% identity to any of them;
- (ii) L3-VK is the portion of the VKCDR3 encoded by the IGKV gene segment;
- (iii) X is any amino acid residue; and
- (iv) JK* is an amino acid sequence selected from the group consisting of sequences encoded by IGKJ1, IGKJ2, IGKJ3, IGKJ4, and IGKJ5, wherein the first residue of each IGKJ sequence is not present.
- In still other aspects, X may be selected from the group consisting of F, L, I, R, W, Y, and P.
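Because JK* lacks the first J-encoded residue, the X residue in effect occupies the junction position that residue would otherwise fill. A minimal sketch of the [VK_Chassis]-[L3-VK]-[X]-[JK*] assembly follows; the function name and all sequence strings are hypothetical placeholders, not the sequences behind the cited SEQ ID NOs.

```python
X_RESIDUES = ["F", "L", "I", "R", "W", "Y", "P"]  # the X options named above


def build_vk(chassis, l3_vk, x, jk):
    """Concatenate [VK_Chassis]-[L3-VK]-[X]-[JK*], where JK* is jk minus its first residue."""
    if len(x) != 1:
        raise ValueError("X must be exactly one amino acid residue")
    return chassis + l3_vk + x + jk[1:]


# Placeholder strings only; real chassis, L3-VK, and IGKJ sequences would come
# from the SEQ ID NOs cited in the text.
variants = [build_vk("CHASSIS", "QQSYS", x, "WTFGQG") for x in X_RESIDUES]
print(variants)
```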
- In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of VλCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:
[Vλ_Chassis]-[L3-Vλ]-[Jλ], wherein:
- (i) Vλ_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGλV1-36 (SEQ ID NO: 480), IGλV1-40 (SEQ ID NO: 531), IGλV1-44 (SEQ ID NO: 532), IGλV1-47 (SEQ ID NO: 481), IGλV1-51 (SEQ ID NO: 533), IGλV10-54 (SEQ ID NO: 482), IGλV2-11 (SEQ ID NOs: 483, 484), IGλV2-14 (SEQ ID NO: 534), IGλV2-18 (SEQ ID NO: 485), IGλV2-23 (SEQ ID NOs: 486, 487), IGλV2-8 (SEQ ID NO: 488), IGλV3-1 (SEQ ID NO: 535), IGλV3-10 (SEQ ID NO: 489), IGλV3-12 (SEQ ID NO: 490), IGλV3-16 (SEQ ID NO: 491), IGλV3-19 (SEQ ID NO: 536), IGλV3-21 (SEQ ID NO: 537), IGλV3-25 (SEQ ID NO: 492), IGλV3-27 (SEQ ID NO: 493), IGλV3-9 (SEQ ID NO: 494), IGλV4-3 (SEQ ID NO: 495), IGλV4-60 (SEQ ID NO: 496), IGλV4-69 (SEQ ID NO: 538), IGλV5-39 (SEQ ID NO: 497), IGλV5-45 (SEQ ID NO: 540), IGλV6-57 (SEQ ID NO: 539), IGλV7-43 (SEQ ID NO: 541), IGλV7-46 (SEQ ID NO: 498), IGλV8-61 (SEQ ID NO: 499), IGλV9-49 (SEQ ID NO: 500), and IGλV10-54 (SEQ ID NO: 482), or a sequence of at least about 80% identity to any of them;
- (ii) L3-Vλ is the portion of the VλCDR3 encoded by the IGλV segment; and
- (iii) Jλ is an amino acid sequence selected from the group consisting of sequences encoded by IGλJ1-01, IGλJ2-01, IGλJ3-01, IGλJ3-02, IGλJ6-01, IGλJ7-01, and IGλJ7-02, wherein the first residue of each IGλJ sequence may or may not be deleted.
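Unlike JK* in the kappa formula, the first Jλ residue is optional here, so each Vλ_Chassis, L3-Vλ, and Jλ combination can contribute two assembled sequences. A minimal sketch, with a hypothetical function name and placeholder strings standing in for the cited sequences:

```python
def build_vlambda_variants(chassis, l3_vl, jl):
    """Return [Vλ_Chassis]-[L3-Vλ]-[Jλ] with Jλ intact and with its first residue deleted."""
    return [chassis + l3_vl + jl, chassis + l3_vl + jl[1:]]


# Placeholder strings only, not the sequences behind the cited SEQ ID NOs.
print(build_vlambda_variants("CHASSIS", "QSYDS", "VVFGGG"))
```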
- In further aspects, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody proteins comprising:
- (i) a CDRH3 amino acid sequence as described herein; and
- (ii) a VKCDR3 amino acid sequence comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 sequences derived from a particular IGKV or IGKJ germline sequence.
- In still further aspects, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody proteins comprising:
- (i) a CDRH3 amino acid sequence as described herein; and
- (ii) a VKCDR3 amino acid sequence of at least about 80% identity to an amino acid sequence represented by the following formula:
[VK_Chassis]-[L3-VK]-[X]-[JK*], wherein:
- (a) VK_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NOs: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41, or a sequence of at least about 80% identity to any of them;
- (b) L3-VK is the portion of the VKCDR3 encoded by the IGKV gene segment;
- (c) X is any amino acid residue; and
- (d) JK* is an amino acid sequence selected from the group consisting of sequences encoded by IGKJ1, IGKJ2, IGKJ3, IGKJ4, and IGKJ5, wherein the first residue of each IGKJ sequence is not present.
- In some aspects, the VKCDR3 amino acid sequence comprises one or more of the sequences listed in Table 33 or a sequence at least about 80% identical to any of them. In other aspects, the antibody proteins are expressed in a heterodimeric form. In yet another aspect, the human antibody proteins are expressed as antibody fragments. In still other aspects of the invention, the antibody fragments are selected from the group consisting of Fab, Fab′, F(ab′)2, Fv fragments, diabodies, linear antibodies, and single-chain antibodies.
- In certain embodiments, the invention comprises an antibody isolated from the polypeptide expression products of any library described herein.
- In still other aspects, the polynucleotides further comprise a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination.
- In one embodiment, the polynucleotides further encode an alternative scaffold.
- In another embodiment, the invention comprises a library of polypeptides encoded by any of the synthetic polynucleotide libraries described herein.
- In yet another embodiment, the invention comprises a library of vectors comprising any of the polynucleotide libraries described herein. In certain other aspects, the invention comprises a population of cells comprising the vectors of the instant invention.
- In one aspect, the doubling time of the population of cells is from about 1 to about 3 hours, from about 3 to about 8 hours, from about 8 to about 16 hours, from about 16 to about 20 hours, or from 20 to about 30 hours. In yet another aspect, the cells are yeast cells. In still another aspect, the yeast is Saccharomyces cerevisiae.
- In other embodiments, the invention comprises a library that has a theoretical total diversity of N unique CDRH3 sequences, wherein N is about 10⁶ to about 10¹⁵; and wherein the physical realization of the theoretical total CDRH3 diversity has a size of at least about 3N, thereby providing a probability of at least about 95% that any individual CDRH3 sequence contained within the theoretical total diversity of the library is present in the actual library.
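The roughly 95% figure for a physical library of about 3N members is consistent with a simple sampling model: if 3N clones are drawn uniformly at random from N equally represented sequences, the chance that a given sequence is never drawn is (1 − 1/N)^(3N), which approaches e^(−3) ≈ 0.0498 for large N. The sketch below assumes uniform representation of all N sequences; skewed representation would lower the coverage actually achieved. Function names are illustrative.

```python
import math


def coverage_poisson(oversampling):
    """Approximate P(a given sequence is present) when oversampling * N clones
    are drawn uniformly from N equally represented sequences (large-N limit)."""
    return 1.0 - math.exp(-oversampling)


def coverage_exact(n_unique, library_size):
    """Same probability without the Poisson approximation: 1 - (1 - 1/N)**size."""
    return 1.0 - (1.0 - 1.0 / n_unique) ** library_size


def oversampling_needed(probability):
    """Oversampling factor n at which 1 - exp(-n) reaches the target probability."""
    return -math.log(1.0 - probability)


print(round(coverage_poisson(3), 4))         # 0.9502
print(round(oversampling_needed(0.95), 3))   # 2.996
```

For N in the 10⁶ to 10¹⁵ range the exact and Poisson expressions agree to well within the precision that matters here.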
- In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody VλCDR3 amino acid sequences comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 95B, 95C, 96, and 97, in selected VλCDR3 sequences encoded by a single germline sequence.
- In some embodiments, the invention relates to a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the library has a theoretical total diversity of about 10⁶ to about 10¹⁵ unique CDRH3 sequences.
- In still other embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody VK amino acid sequences, the method comprising:
- (i) providing polynucleotide sequences encoding:
- (a) one or more VK_Chassis amino acid sequences selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NOs: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41, or a sequence at least about 80% identical to any of them;
- (b) one or more L3-VK amino acid sequences, wherein L3-VK is the portion of the VKCDR3 amino acid sequence encoded by the IGKV gene segment;
- (c) one or more X residues, wherein X is any amino acid residue; and
- (d) one or more JK* amino acid sequences, wherein JK* is an amino acid sequence selected from the group consisting of amino acid sequences encoded by IGKJ1 (SEQ ID NO: 552), IGKJ2 (SEQ ID NO: 553), IGKJ3 (SEQ ID NO: 554), IGKJ4 (SEQ ID NO: 555), and IGKJ5 (SEQ ID NO: 556), wherein the first amino acid residue of each sequence is not present; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human VK sequences represented by the following formula:
[VK_Chassis]-[L3-VK]-[X]-[JK*].
- In some embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody light chain CDR3 sequences, the method comprising:
- (i) determining the percent occurrence of each amino acid residue at each position in selected light chain CDR3 amino acid sequences derived from a single germline polynucleotide sequence;
- (ii) designing synthetic polynucleotides encoding a plurality of human antibody light chain CDR3 amino acid sequences, wherein the percent occurrence of any amino acid at any position within the designed light chain CDR3 amino acid sequences is within at least about 30% of the percent occurrence in the selected light chain CDR3 amino acid sequences derived from a single germline polynucleotide sequence, as determined in (i); and
- (iii) synthesizing one or more polynucleotides that were designed in (ii).
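Steps (i) and (ii) amount to building a position-by-residue frequency table for a germline-specific set of light chain CDR3s and checking a design against it. The sketch below is illustrative and rests on two labeled assumptions: the CDR3s in each set share one length, and "within at least about 30%" is read as a relative tolerance (|designed − observed| ≤ 0.30 × observed); an absolute reading of ±30 percentage points would only change the comparison line. Function names and the example sequences are hypothetical.

```python
from collections import Counter


def positional_percentages(seqs):
    """Percent occurrence of each residue at each position of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        table.append({aa: 100.0 * n / len(seqs) for aa, n in counts.items()})
    return table


def within_tolerance(designed, observed, rel_tol=0.30):
    """True if each designed percentage is within rel_tol (relative) of the observed one."""
    if len(designed) != len(observed):
        return False
    for d_pos, o_pos in zip(designed, observed):
        for aa in set(d_pos) | set(o_pos):
            o, d = o_pos.get(aa, 0.0), d_pos.get(aa, 0.0)
            if abs(d - o) > rel_tol * o:
                return False
    return True


# Hypothetical CDR3s assumed to derive from a single germline sequence.
observed = positional_percentages(["QQYN", "QQSN", "QQYS", "QQYN"])
designed = positional_percentages(["QQYN", "QQSN"])
print(within_tolerance(designed, observed))  # False: Y is 50% by design vs 75% observed
```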
- In other embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody Vλ amino acid sequences, the method comprising:
- (i) providing polynucleotide sequences encoding:
- (a) one or more Vλ_Chassis amino acid sequences selected from the group consisting of about Kabat residue 1 to about Kabat residue 88 encoded by IGλV1-36 (SEQ ID NO: 480), IGλV1-40 (SEQ ID NO: 531), IGλV1-44 (SEQ ID NO: 532), IGλV1-47 (SEQ ID NO: 481), IGλV1-51 (SEQ ID NO: 533), IGλV10-54 (SEQ ID NO: 482), IGλV2-11 (SEQ ID NOs: 483, 484), IGλV2-14 (SEQ ID NO: 534), IGλV2-18 (SEQ ID NO: 485), IGλV2-23 (SEQ ID NOs: 486, 487), IGλV2-8 (SEQ ID NO: 488), IGλV3-1 (SEQ ID NO: 535), IGλV3-10 (SEQ ID NO: 489), IGλV3-12 (SEQ ID NO: 490), IGλV3-16 (SEQ ID NO: 491), IGλV3-19 (SEQ ID NO: 536), IGλV3-21 (SEQ ID NO: 537), IGλV3-25 (SEQ ID NO: 492), IGλV3-27 (SEQ ID NO: 493), IGλV3-9 (SEQ ID NO: 494), IGλV4-3 (SEQ ID NO: 495), IGλV4-60 (SEQ ID NO: 496), IGλV4-69 (SEQ ID NO: 538), IGλV5-39 (SEQ ID NO: 497), IGλV5-45 (SEQ ID NO: 540), IGλV6-57 (SEQ ID NO: 539), IGλV7-43 (SEQ ID NO: 541), IGλV7-46 (SEQ ID NO: 498), IGλV8-61 (SEQ ID NO: 499), IGλV9-49 (SEQ ID NO: 500), and IGλV10-54 (SEQ ID NO: 482), or a sequence at least about 80% identical to any of them;
- (b) one or more L3-Vλ sequences, wherein L3-Vλ is the portion of the VλCDR3 amino acid sequence encoded by the IGλV gene segment;
- (c) one or more Jλ sequences, wherein Jλ is an amino acid sequence selected from the group consisting of amino acid sequences encoded by IGλJ1-01 (SEQ ID NO: 557), IGλJ2-01 (SEQ ID NO: 558), IGλJ3-01 (SEQ ID NO: 559), IGλJ3-02 (SEQ ID NO: 560), IGλJ6-01 (SEQ ID NO: 561), IGλJ7-01 (SEQ ID NO: 562), and IGλJ7-02 (SEQ ID NO: 563), wherein the first amino acid residue of each sequence may or may not be present; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human Vλ amino acid sequences represented by the following formula:
[Vλ_Chassis]-[L3-Vλ]-[Jλ].
- In certain embodiments, the amino acid sequences encoded by the polynucleotides of the libraries of the invention are human.
- The present invention is also directed to methods of preparing a synthetic polynucleotide library comprising providing and assembling the polynucleotide sequences of the instant invention.
- In another aspect, the invention comprises a method of preparing the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, the method comprising:
- (i) providing polynucleotide sequences encoding:
- (a) one or more N1 amino acid sequences of about 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
- (b) one or more human CDRH3 DH amino acid sequences, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (c) one or more N2 amino acid sequences of about 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
- (d) one or more human CDRH3 H3-JH amino acid sequences, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human antibody CDRH3 amino acid sequences represented by the following formula:
[N1]-[DH]-[N2]-[H3-JH].
- In one aspect, one or more of the polynucleotide sequences are synthesized via split-pool synthesis.
- In another aspect, the method of the invention further comprises the step of recombining the assembled synthetic polynucleotides with a vector comprising a heavy chain chassis and a heavy chain constant region, to form a full-length heavy chain.
- In another aspect, the method of the invention further comprises the step of providing a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination.
- In some embodiments, the step of recombining is performed in yeast. In certain embodiments, the yeast is S. cerevisiae.
- In certain other embodiments, the invention comprises a method of isolating one or more host cells expressing one or more antibodies, the method comprising:
- (i) expressing the human antibodies as described herein in one or more host cells;
- (ii) contacting the host cells with one or more antigens; and
- (iii) isolating one or more host cells having antibodies that bind to the one or more antigens.
- In another aspect, the method of the invention further comprises the step of isolating one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens. In yet another aspect, the method of the invention further comprises the step of isolating one or more polynucleotide sequences encoding one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens.
- In certain other embodiments, the invention comprises a kit comprising the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, or any of the other sequences disclosed herein.
- In still other aspects, the CDRH3 amino acid sequences encoded by the libraries of synthetic polynucleotides described herein, or any of the other sequences disclosed herein, are in computer readable form.
FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis and constant region) for the construction of a library.
FIG. 2 depicts the length distribution of the N1 and N2 regions of rearranged human antibody sequences compiled from Jackson et al. (J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety).
FIG. 3 depicts the length distribution of the CDRL3 regions of rearranged human kappa light chain sequences compiled from the NCBI database (Appendix A).
FIG. 4 depicts the length distribution of the CDRL3 regions of rearranged human lambda light chain sequences compiled from the NCBI database (Appendix B).
FIG. 5 depicts a schematic representation of the 424 cloning vectors used in the synthesis of the CDRH3 regions before and after ligation of the [DH]-[N2]-[JH] segment (DTAVYYCAR: SEQ ID NO: 579; DTAVYYCAK: SEQ ID NO: 578; SSASTK: SEQ ID NO: 580).
FIG. 6 depicts a schematic structure of a heavy chain vector, prior to recombination with a CDRH3.
FIG. 7 depicts a schematic diagram of a CDRH3 integrated into a heavy chain vector and the polynucleotide and polypeptide sequences of CDRH3 (amino acid: SEQ ID NO: 847; coding strand: SEQ ID NO: 581; complementary strand: SEQ ID NO: 845).
FIG. 8 depicts a schematic structure of a kappa light chain vector, prior to recombination with a CDRL3.
FIG. 9 depicts a schematic diagram of a CDRL3 integrated into a light chain vector and the polynucleotide and polypeptide sequences of CDRL3 (amino acid: SEQ ID NO: 848; coding strand: SEQ ID NO: 582; complementary strand: SEQ ID NO: 846).
FIG. 10 depicts the length distribution of the CDRH3 domain (Kabat positions 95-102) from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
FIG. 11 depicts the length distribution of the DH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
FIG. 12 depicts the length distribution of the N2 segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
FIG. 13 depicts the length distribution of the H3-JH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.
FIG. 14 depicts the length distribution of the CDRH3 domains from 291 sequences prepared from yeast cells transformed according to the method outlined in Example 10.4, namely the co-transformation of vectors containing heavy chain chassis and constant regions with a CDRH3 insert (observed), as compared to the expected (i.e., designed) distribution.
FIG. 15 depicts the length distribution of the [Tail]-[N1] region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the expected (i.e., designed) distribution.
FIG. 16 depicts the length distribution of the DH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
FIG. 17 depicts the length distribution of the N2 region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
FIG. 18 depicts the length distribution of the H3-JH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.
FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences (observed), as compared to the theoretical (i.e., designed) familial origin.
FIG. 20 depicts the representation of each of the 16 chassis of the library (observed), as compared to the theoretical (i.e., designed) chassis representation. VH3-23 is represented twice: once ending in CAR and once ending in CAK. These representations were combined, as were the ten variants of VH3-33 with one variant of VH3-30.
FIG. 21 depicts a comparison of the CDRL3 length from 86 sequences selected from the VKCDR3 library of Example 6.2 (observed) to human sequences (human) and the designed sequences (designed). -
FIG. 22 depicts the representation of the light chain chassis amongst the 86 sequences selected from the library (observed), as compared to the theoretical (i.e., designed) chassis representation. -
FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in an exemplary library of the invention, versus the preimmune repertoire of Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety). -
FIG. 24 depicts binding curves for 6 antibodies selected from a library of the invention. -
FIG. 25 depicts binding curves for 10 antibodies selected from a library of the invention binding to hen egg white lysozyme. - The present invention is directed to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, kits and computer readable forms including the libraries of the invention. The libraries taught in this application are described, at least in part, in terms of the components from which they are assembled.
- In certain embodiments, the instant invention provides antibody libraries specifically designed based on the composition and CDR length distribution in the naturally occurring human antibody repertoire. It is estimated that, even in the absence of antigenic stimulation, a human makes at least about 10⁷ different antibody molecules. The antigen-binding sites of many antibodies can cross-react with a variety of related but different epitopes. In addition, the human antibody repertoire is large enough to ensure that there is an antigen-binding site to fit almost any potential epitope, albeit with low affinity.
- The mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way, by combinatorially joining chromosomally separated gene segments prior to transcription. Each type of immunoglobulin (Ig) chain (i.e., κ light, λ light, and heavy) is synthesized by combinatorial assembly of DNA sequences selected from two or more families of gene segments, to produce a single polypeptide chain. Specifically, the heavy chains and light chains each consist of a variable region and a constant (C) region. The variable regions of the heavy chains are encoded by DNA sequences assembled from three families of gene segments: variable (IGHV), joining (IGHJ) and diversity (IGHD). The variable regions of light chains are encoded by DNA sequences assembled from two families of gene segments for each of the kappa and lambda light chains: variable (IGLV) and joining (IGLJ). Each variable region (heavy and light) is also recombined with a constant region, to produce a full-length immunoglobulin chain.
- While combinatorial assembly of the V, D and J gene segments makes a substantial contribution to antibody variable region diversity, further diversity is introduced in vivo, at the pre-B cell stage, via imprecise joining of these gene segments and the introduction of non-templated nucleotides at the junctions between the gene segments.
- After a B cell recognizes an antigen, it is induced to proliferate. During proliferation, the B cell receptor locus undergoes an extremely high rate of somatic mutation that is far greater than the normal rate of genomic mutation. The mutations that occur are primarily localized to the Ig variable regions and comprise substitutions, insertions and deletions. This somatic hypermutation enables the production of B cells that express antibodies possessing enhanced affinity toward an antigen. Such antigen-driven somatic hypermutation fine-tunes antibody responses to a given antigen.
- Significant efforts have been made to create antibody libraries with extensive diversity, and to mimic the natural process of affinity maturation of antibodies against various antigens, especially antigens associated with diseases such as autoimmune diseases, cancer, and infectious disease. Antibody libraries comprising candidate binding molecules that can be readily screened against targets are desirable. However, the full promise of an antibody library, which is representative of the preimmune human antibody repertoire, has remained elusive. In addition to the shortcomings enumerated above, and throughout the application, synthetic libraries that are known in the art often suffer from noise (i.e., very large libraries increase the presence of many sequences which do not express well, and/or which misfold), while entirely human libraries that are known in the art may be biased against certain antigen classes (e.g., self-antigens). Moreover, the limitations of synthesis and physical realization techniques restrict the functional diversity of antibody libraries of the art. The present invention provides, for the first time, a fully synthetic antibody library that is representative of the human preimmune antibody repertoire (e.g., in composition and length), and that can be readily screened (i.e., it is physically realizable and, in some cases, can be oversampled) using, for example, high throughput methods, to obtain, for example, new therapeutics and/or diagnostics.
- In particular, the synthetic antibody libraries of the instant invention have the potential to recognize any antigen, including self-antigens of human origin. The ability to recognize self-antigens is usually lost in an expressed human library, because self-reactive antibodies are removed by the donor's immune system via negative selection. Another feature of the invention is that screening the antibody library using positive clone selection, for example, by FACS (fluorescence-activated cell sorting), bypasses the standard and tedious methodology of generating a hybridoma library and supernatant screening. Still further, the libraries, or sub-libraries thereof, can be screened multiple times, to discover additional antibodies against other desired targets.
- Before further description of the invention, certain terms are defined.
- Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art relevant to the invention. The definitions below supplement those in the art and are directed to the embodiments described in the current application.
- The term “antibody” is used herein in the broadest sense and specifically encompasses at least monoclonal antibodies, polyclonal antibodies, multi-specific antibodies (e.g., bispecific antibodies), chimeric antibodies, humanized antibodies, human antibodies, and antibody fragments. An antibody is a protein comprising one or more polypeptides substantially or partially encoded by immunoglobulin genes or fragments of immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes.
- “Antibody fragments” comprise a portion of an intact antibody, for example, one or more portions of the antigen-binding region thereof. Examples of antibody fragments include Fab, Fab′, F(ab′)2, and Fv fragments, diabodies, linear antibodies, single-chain antibodies, and multi-specific antibodies formed from intact antibodies and antibody fragments.
- An “intact antibody” is one comprising full-length heavy- and light-chains and an Fc region. An intact antibody is also referred to as a “full-length, heterodimeric” antibody or immunoglobulin.
- The term “variable” refers to the portions of the immunoglobulin domains that exhibit variability in their sequence and that are involved in determining the specificity and binding affinity of a particular antibody (i.e., the “variable domain(s)”). Variability is not evenly distributed throughout the variable domains of antibodies; it is concentrated in sub-domains of each of the heavy and light chain variable regions. These sub-domains are called “hypervariable” regions or “complementarity determining regions” (CDRs). The more conserved (i.e., non-hypervariable) portions of the variable domains are called the “framework” regions (FRM). The variable domains of naturally occurring heavy and light chains each comprise four FRM regions, largely adopting a β-sheet configuration, connected by three hypervariable regions, which form loops connecting, and in some cases forming part of, the β-sheet structure. The hypervariable regions in each chain are held together in close proximity by the FRM and, with the hypervariable regions from the other chain, contribute to the formation of the antigen-binding site (see Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed., Public Health Service, National Institutes of Health, Bethesda, Md., 1991, incorporated by reference in its entirety). The constant domains are not directly involved in antigen binding, but exhibit various effector functions, such as, for example, antibody-dependent cell-mediated cytotoxicity and complement activation.
- The “chassis” of the invention represent the portions of the antibody heavy chain variable (IGHV) or light chain variable (IGLV) domains that are not part of CDRH3 or CDRL3, respectively. The chassis of the invention is defined as the portion of the variable region of an antibody beginning with the first amino acid of FRM1 and ending with the last amino acid of FRM3. In the case of the heavy chain, the chassis includes the amino acids from about Kabat position 1 to about Kabat position 94. In the case of the light chains (kappa and lambda), the chassis are defined as including from about Kabat position 1 to about Kabat position 88. The chassis of the invention may contain certain modifications relative to the corresponding germline variable domain sequences presented herein or available in public databases. These modifications may be engineered (e.g., to remove N-linked glycosylation sites) or naturally occurring (e.g., to account for allelic variation). For example, it is known in the art that the immunoglobulin gene repertoire is polymorphic (Wang et al., Immunol. Cell. Biol., 2008, 86: 111; Collins et al., Immunogenetics, 2008, DOI 10.1007/s00251-008-0325-z, published online, each incorporated by reference in its entirety); chassis, CDRs (e.g., CDRH3) and constant regions representative of these allelic variants are also encompassed by the invention. In some embodiments, the allelic variant(s) used in a particular embodiment of the invention may be selected based on the allelic variation present in different patient populations, for example, to identify antibodies that are non-immunogenic in these patient populations. In certain embodiments, the immunogenicity of an antibody of the invention may depend on allelic variation in the major histocompatibility complex (MHC) genes of a patient population. Such allelic variation may also be considered in the design of libraries of the invention. In certain embodiments of the invention, the chassis and constant regions are contained on a vector, and a CDR3 region is introduced between them via homologous recombination.
- In some embodiments, one, two or three nucleotides may follow the heavy chain chassis, forming either a partial (if one or two) or a complete (if three) codon. When a full codon is present, these nucleotides encode an amino acid residue that is referred to as the “tail,” which occupies position 95.
- The “CDRH3 numbering system” used herein defines the first amino acid of CDRH3 as being at Kabat position 95 (the “tail,” when present) and the last amino acid of CDRH3 as position 102. The amino acids following the “tail” are called “N1” and, when present, are assigned numbers 96, 96A, 96B, etc. The N1 segment is followed by the “DH” segment, which is assigned numbers 97, 97A, 97B, 97C, etc. The DH segment is followed by the “N2” segment, which, when present, is numbered 98, 98A, 98B, etc. Finally, the most C-terminal amino acid residue of the “H3-JH” segment is designated as number 102. The residue directly before (N-terminal to) it, when present, is 101, and the one before that (if present) is 100. For reasons of convenience that will become apparent elsewhere, the rest of the H3-JH amino acids are numbered in reverse order, beginning with 99 for the amino acid just N-terminal to 100, 99A for the residue N-terminal to 99, and so forth for 99B, 99C, etc. Examples of certain CDRH3 sequence residue numbers may therefore include the following:
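The numbering rules above are mechanical enough to express as a short function. The following is a minimal Python sketch, not the authors' implementation: `number_cdrh3` and its helpers are hypothetical names, the input is assumed to be already split into tail, N1, DH, N2 and H3-JH segments, and insertion codes are assumed to be single capital letters (so each segment is limited to 27 residues).

```python
from string import ascii_uppercase


def _labels(base: int, count: int) -> list[str]:
    """Labels for N-terminally anchored segments: base, baseA, baseB, ..."""
    if count > 1 + len(ascii_uppercase):
        raise ValueError("segment too long for single-letter insertion codes")
    return [str(base) if i == 0 else f"{base}{ascii_uppercase[i - 1]}" for i in range(count)]


def _h3jh_labels(count: int) -> list[str]:
    """H3-JH labels anchored at the C-terminus: ..., 99B, 99A, 99, 100, 101, 102."""
    # Build labels from the C-terminal end backwards, then flip them into N-to-C order.
    from_c_term = ["102", "101", "100"] + _labels(99, max(count - 3, 0))
    return list(reversed(from_c_term[:count]))


def number_cdrh3(tail: str, n1: str, dh: str, n2: str, h3jh: str) -> list[tuple[str, str]]:
    """Pair each CDRH3 residue with its position label in the CDRH3 numbering system."""
    if len(tail) > 1:
        raise ValueError("the tail is at most one residue (position 95)")
    labels = (
        (["95"] if tail else [])   # tail, when a full codon is present
        + _labels(96, len(n1))     # N1: 96, 96A, 96B, ...
        + _labels(97, len(dh))     # DH: 97, 97A, 97B, ...
        + _labels(98, len(n2))     # N2: 98, 98A, 98B, ...
        + _h3jh_labels(len(h3jh))  # H3-JH: numbered back from 102
    )
    return list(zip(labels, tail + n1 + dh + n2 + h3jh))


print(number_cdrh3("R", "G", "YSS", "", "FDY"))
```

For the made-up CDRH3 above (tail R, N1 G, DH YSS, no N2, H3-JH FDY) the labels come out as 95, 96, 97, 97A, 97B, 100, 101, 102. One effect of numbering H3-JH from its C-terminus is that the last three JH-derived residues always carry the labels 100 to 102, however long the upstream segments are.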
- As used herein, the term “diversity” refers to a variety or a noticeable heterogeneity. The term “sequence diversity” refers to a variety of sequences which are collectively representative of several possibilities of sequences, for example, those found in natural human antibodies. For example, heavy chain CDR3 (CDRH3) sequence diversity may refer to a variety of possibilities of combining the known human DH and H3-JH segments, including the N1 and N2 regions, to form heavy chain CDR3 sequences. The light chain CDR3 (CDRL3) sequence diversity may refer to a variety of possibilities of combining the naturally occurring light chain variable region contributing to CDRL3 (i.e., L3-VL) and joining (i.e., L3-JL) segments, to form light chain CDR3 sequences. As used herein, H3-JH refers to the portion of the IGHJ gene contributing to CDRH3. As used herein, L3-VL and L3-JL refer to the portions of the IGLV and IGLJ genes (kappa or lambda) contributing to CDRL3, respectively.
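Because the heavy chain CDR3 diversity described above comes from combining segment choices, a designed (non-stochastic) set of CDRH3 sequences can be enumerated as a Cartesian product of the segment sets. The sketch below assumes toy, made-up segment sets (they are not the segments of the invention), omits the tail, and uses the empty string to model an absent N1 or N2 region; `assemble_cdrh3` is a hypothetical name.

```python
from itertools import product


def assemble_cdrh3(n1_set, dh_set, n2_set, h3jh_set):
    """Return every N1 + DH + N2 + H3-JH concatenation, one per combination of choices."""
    return ["".join(parts) for parts in product(n1_set, dh_set, n2_set, h3jh_set)]


# Hypothetical toy segment sets; "" stands for an absent N1 or N2 region.
n1 = ["", "G", "GR"]
dh = ["YSS", "GYSSGW"]
n2 = ["", "P"]
h3jh = ["FDY", "AEYFQH"]

library = assemble_cdrh3(n1, dh, n2, h3jh)
print(len(library))                       # 3 * 2 * 2 * 2 = 24 combinations
print(sorted({len(s) for s in library}))  # lengths 6 through 15
```

Sequence diversity here comes from the choice of segments and length diversity from their differing lengths, so both are fixed by the design rather than sampled.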
- As used herein, the term “expression” includes any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
- As used herein, the term “host cell” is intended to refer to a cell into which a polynucleotide of the invention has been introduced. It should be understood that such a term refers not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
- The term “length diversity” refers to a variety in the length of a particular nucleotide or amino acid sequence. For example, in naturally occurring human antibodies, the heavy chain CDR3 sequence varies in length, for example, from about 3 amino acids to over about 35 amino acids, and the light chain CDR3 sequence varies in length, for example, from about 5 to about 16 amino acids. Prior to the instant invention, it was known in the art that it is possible to design antibody libraries containing sequence diversity or length diversity (see, e.g., Hoet et al., Nat. Biotechnol., 2005, 23: 344; Kretzschmar and von Ruden, Curr. Opin. Biotechnol., 2002, 13: 598; and Rauchenberger et al., J. Biol. Chem., 2003, 278: 38194, each of which is incorporated by reference in its entirety); however, the instant invention is directed to, at least, the design of synthetic antibody libraries containing the sequence diversity and length diversity of naturally occurring human sequences. In some cases, synthetic libraries containing sequence and length diversity have been synthesized; however, these libraries contain too much theoretical diversity to synthesize the entire designed repertoire and/or too many theoretical members to physically realize or oversample the entire library.
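Length diversity is usually summarized as a length distribution, as in FIGS. 2 to 4, and an observed library can then be compared with the designed distribution, as in FIGS. 10 to 18. A minimal sketch follows; the sequences and designed frequencies are invented, and total variation distance is simply one convenient summary metric chosen here, not one the specification prescribes.

```python
from collections import Counter


def length_distribution(seqs):
    """Fraction of sequences at each length, keyed by length."""
    counts = Counter(len(s) for s in seqs)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def total_variation(p, q):
    """Total variation distance between two length distributions (0 = identical, 1 = disjoint)."""
    lengths = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in lengths)


observed = ["ARDY", "ARGDY", "ARGYDY", "ARDFDY"]  # made-up "observed" CDR3s
designed = {4: 0.25, 5: 0.5, 6: 0.25}             # made-up "designed" frequencies

print(length_distribution(observed))              # {4: 0.25, 5: 0.25, 6: 0.5}
print(total_variation(length_distribution(observed), designed))  # 0.25
```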
- As used herein, a sequence designed with “directed diversity” has been specifically designed to contain both sequence diversity and length diversity. Directed diversity is not stochastic.
- As used herein, “stochastic” describes a process of generating a randomly determined sequence of amino acids, which can be regarded as a single sample drawn from a probability distribution.
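In contrast to directed diversity, where each member is chosen deliberately, a stochastic process draws each sequence from a probability distribution. The sketch below assumes, for illustration only, that positions are independent and share one amino acid distribution (uniform unless `weights` is supplied); `stochastic_sequence` is a hypothetical helper, not part of the specification.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def stochastic_sequence(length, weights=None, rng=None):
    """Draw one sequence; each residue is an independent sample from `weights` (uniform if None)."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


# A seeded generator makes the draw reproducible for testing.
print(stochastic_sequence(12, rng=random.Random(42)))
```

Two calls with different seeds will generally return different sequences, and nothing guarantees that any particular designed sequence is ever produced, which is the practical distinction from the enumerated, directed libraries described above.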
- The term “library of polynucleotides” refers to two or more polynucleotides having a diversity as described herein, specifically designed according to the methods of the invention. The term “library of polypeptides” refers to two or more polypeptides having a diversity as described herein, specifically designed according to the methods of the invention. The term “library of synthetic polynucleotides” refers to a polynucleotide library that includes synthetic polynucleotides. The term “library of vectors” refers herein to a library of at least two different vectors. As used herein, the term “human antibody libraries” includes at least a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies.
- As described throughout the specification, the term “library” is used herein in its broadest sense, and also may include the sub-libraries that may or may not be combined to produce libraries of the invention.
- As used herein, the term “synthetic polynucleotide” refers to a molecule formed through a chemical process, as opposed to molecules of natural origin, or molecules derived via template-based amplification of molecules of natural origin (e.g., immunoglobulin chains cloned from populations of B cells via PCR amplification are not “synthetic” as used herein). In some instances, for example, when referring to libraries of the invention that comprise multiple components (e.g., N1, DH, N2, and/or H3-JH), the invention encompasses libraries in which at least one of the aforementioned components is synthetic. By way of illustration, a library in which certain components are synthetic, while other components are of natural origin or derived via template-based amplification of molecules of natural origin, would be encompassed by the invention.
- The term “split-pool synthesis” refers to a procedure in which the products of a plurality of first reactions are combined (pooled) and then separated (split) before participating in a plurality of second reactions. Example 9 describes the synthesis of 278 DH segments (products), each in a separate reaction. After synthesis, these 278 segments are combined (pooled) and then distributed (split) amongst 141 columns for the synthesis of the N2 segments. This enables the pairing of each of the 278 DH segments with each of the 141 N2 segments. As described elsewhere in the specification, these numbers are non-limiting.
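The pool-and-split pairing described above can be modeled in a few lines: every product of the first round ends up in every column of the second round, so the number of distinct pairings is the product of the two counts (278 × 141 = 39,198 DH-N2 combinations for the numbers in Example 9). The sketch below is a toy simulation with made-up strings; `split_pool` is a hypothetical name, residues are concatenated in N- to C-terminal order, and synthesis direction, yield, coupling efficiency and aliquot sampling are all ignored.

```python
def split_pool(first_products, second_blocks):
    """Pair every first-round product with every second-round block via pool-and-split."""
    pool = list(first_products)  # pool: combine the products of all first-round reactions
    # split: each column receives the whole pool and extends it with that column's block
    return [prod + block for block in second_blocks for prod in pool]


dh_segments = ["YSS", "GYSSGW"]  # made-up DH products
n2_segments = ["", "P", "RG"]    # made-up N2 blocks; "" means no N2 residues
pairs = split_pool(dh_segments, n2_segments)

print(len(pairs))   # 2 * 3 = 6 pairings
print(278 * 141)    # 39198 pairings for the counts in Example 9
```

The design benefit is that only 278 + 141 separate syntheses are needed to obtain all 39,198 pairings, rather than one synthesis per pairing.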
- “Preimmune” antibody libraries have similar sequence diversities and length diversities to naturally occurring human antibody sequences before these sequences have undergone negative selection or somatic hypermutation. For example, the set of sequences described in Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety) is believed to represent sequences from the preimmune repertoire. In certain embodiments of the invention, the sequences of the invention will be similar to these sequences (e.g., in terms of composition and length). In certain embodiments of the invention, such antibody libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode antibodies with the potential to recognize any antigen. In one embodiment of the invention, an antibody library comprises about 10⁷ to about 10²⁰ different antibodies and/or polynucleotide sequences encoding the antibodies of the library. In some embodiments, the libraries of the instant invention are designed to include 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰ different antibodies and/or polynucleotide sequences encoding the antibodies. In certain embodiments, the libraries of the invention may comprise or encode about 10³ to about 10⁵, about 10⁵ to about 10⁷, about 10⁷ to about 10⁹, about 10⁹ to about 10¹¹, about 10¹¹ to about 10¹³, about 10¹³ to about 10¹⁵, about 10¹⁵ to about 10¹⁷, or about 10¹⁷ to about 10²⁰ different antibodies.
In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰ or less than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰. In certain other embodiments of the invention, the probability of an antibody of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The antibody libraries of the invention may also include antibodies directed to, for example, self (i.e., human) antigens. The antibodies of the present invention may not be present in expressed human libraries because, among other reasons, self-reactive antibodies are removed by the donor's immune system via negative selection. However, novel heavy/light chain pairings may in some cases create self-reactive antibody specificity (Griffiths et al. U.S. Pat. No. 5,885,793, incorporated by reference in its entirety). In certain embodiments of the invention, the number of unique heavy chains in a library may be about 10, 50, 10², 150, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, 10²⁰, or more. In certain embodiments of the invention, the number of unique light chains in a library may be about 5, 10, 25, 50, 10², 150, 500, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, 10²⁰, or more.
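The probability of a particular member being present in a physical realization can be estimated with a simple sampling model. The specification's own Library Sampling discussion is not reproduced in this excerpt, so the sketch below assumes that every library member is equally represented and that clones are drawn independently. Under those assumptions, a specific member appears at least once among M clones drawn from N members with probability 1 - (1 - 1/N)^M, which is approximately 1 - e^(-M/N) for large N. The function names are hypothetical.

```python
import math


def presence_probability(library_size: float, clones_sampled: float) -> float:
    """P(a given member appears at least once) = 1 - (1 - 1/N)**M, assuming uniform representation."""
    if library_size <= 1:
        raise ValueError("library_size must exceed 1")
    # log1p/expm1 keep precision when N is huge (e.g. 1e15) and 1/N is tiny relative to 1.
    return -math.expm1(clones_sampled * math.log1p(-1.0 / library_size))


def clones_needed(library_size: float, target_probability: float) -> float:
    """Number of clones M at which presence_probability(N, M) equals target_probability."""
    if library_size <= 1 or not 0 <= target_probability < 1:
        raise ValueError("need library_size > 1 and 0 <= target_probability < 1")
    return math.log1p(-target_probability) / math.log1p(-1.0 / library_size)


print(presence_probability(1e7, 1e7))    # about 0.632 (sample as many clones as members)
print(presence_probability(1e7, 3e7))    # about 0.950 (threefold oversampling)
print(clones_needed(1e7, 0.95) / 1e7)    # about 3.0 library equivalents
```

Under this model, covering any given member with about 95% probability requires roughly threefold oversampling, which is why a design small enough to be oversampled matters.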
- As used herein, the term “human antibody CDRH3 libraries” includes at least a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies. “Preimmune” CDRH3 libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRH3 sequences before these sequences undergo negative selection and somatic hypermutation. Known human CDRH3 sequences are represented in various data sets, including Jackson et al., J. Immunol Methods, 2007, 324: 26; Martin, Proteins, 1996, 25: 130; and Lee et al., Immunogenetics, 2006, 57: 917, each of which is incorporated by reference in its entirety. In certain embodiments of the invention, such CDRH3 libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode CDRH3s with the potential to recognize any antigen. In one embodiment of the invention, an antibody library includes about 10⁶ to about 10¹⁵ different CDRH3 sequences and/or polynucleotide sequences encoding said CDRH3 sequences. In some embodiments, the libraries of the instant invention are designed to include about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶ different CDRH3 sequences and/or polynucleotide sequences encoding said CDRH3 sequences. In some embodiments, the libraries of the invention may include or encode about 10³ to about 10⁶, about 10⁶ to about 10⁸, about 10⁸ to about 10¹⁰, about 10¹⁰ to about 10¹², about 10¹² to about 10¹⁴, or about 10¹⁴ to about 10¹⁶ different CDRH3 sequences.
In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶ or less than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶. In certain embodiments of the invention, the probability of a CDRH3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The preimmune CDRH3 libraries of the invention may also include CDRH3s directed to, for example, self (i.e., human) antigens. Such CDRH3s may not be present in expressed human libraries, because self-reactive CDRH3s are removed by the donor's immune system via negative selection.
- Libraries of the invention containing “VKCDR3” sequences and “VλCDR3” sequences refer to the kappa and lambda sub-sets of the CDRL3 sequences, respectively. These libraries may be designed with directed diversity, to collectively represent the length and sequence diversity of the human antibody CDRL3 repertoire. “Preimmune” versions of these libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRL3 sequences before these sequences undergo negative selection. Known human CDRL3 sequences are represented in various data sets, including the NCBI database (see Appendix A and Appendix B for light chain sequence data sets) and Martin, Proteins, 1996, 25: 130, incorporated by reference in its entirety. In certain embodiments of the invention, such CDRL3 libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode CDRL3s with the potential to recognize any antigen.
- In one embodiment of the invention, an antibody library comprises about 10⁵ different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences. In some embodiments, the libraries of the instant invention are designed to comprise about 10¹, 10², 10³, 10⁴, 10⁶, 10⁷, or 10⁸ different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences. In some embodiments, the libraries of the invention may comprise or encode about 10¹ to about 10³, about 10³ to about 10⁵, or about 10⁵ to about 10⁸ different CDRL3 sequences. In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10¹, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸ or less than about 10¹, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸. In certain embodiments of the invention, the probability of a CDRL3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5% or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The preimmune CDRL3 libraries of the invention may also include CDRL3s directed to, for example, self (i.e., human) antigens. Such CDRL3s may not be present in expressed human libraries, because self-reactive CDRL3s are removed by the donor's immune system via negative selection.
- As used herein, the term “known heavy chain CDR3 sequences” refers to heavy chain CDR3 sequences in the public domain that have been cloned from populations of human B cells. Examples of such sequences are those published or derived from public data sets, including, for example, Zemlin et al., JMB, 2003, 334: 733; Lee et al., Immunogenetics, 2006, 57: 917; and Jackson et al. J. Immunol Methods, 2007, 324: 26, each of which is incorporated by reference in its entirety.
- As used herein, the term “known light chain CDR3 sequences” refers to light chain CDR3 sequences (e.g., kappa or lambda) in the public domain that have been cloned from populations of human B cells. Examples of such sequences are those published or derived from public data sets, including, for example, the NCBI database (see Appendices A and B filed herewith).
- As used herein the term “antibody binding regions” refers to one or more portions of an immunoglobulin or antibody variable region capable of binding an antigen(s). Typically, the antibody binding region is, for example, an antibody light chain (or variable region or one or more CDRs thereof), an antibody heavy chain (or variable region or one or more CDRs thereof), a heavy chain Fd region, a combined antibody light and heavy chain (or variable regions thereof) such as a Fab, F(ab′)2, single domain, or single chain antibodies (scFv), or any region of a full length antibody that recognizes an antigen, for example, an IgG (e.g., an IgG1, IgG2, IgG3, or IgG4 subtype), IgA1, IgA2, IgD, IgE, or IgM antibody.
- The term “framework region” refers to the art-recognized portions of an antibody variable region that exist between the more divergent (i.e., hypervariable) CDRs. Such framework regions are typically referred to as frameworks 1 through 4 (FRM1, FRM2, FRM3, and FRM4) and provide a scaffold for the presentation of the six CDRs (three from the heavy chain and three from the light chain) in three-dimensional space, to form an antigen-binding surface.
- The term “canonical structure” refers to the main chain conformation that is adopted by the antigen binding (CDR) loops. From comparative structural studies, it has been found that five of the six antigen binding loops have only a limited repertoire of available conformations. Each canonical structure can be characterized by the torsion angles of the polypeptide backbone. Corresponding loops between antibodies may, therefore, have very similar three-dimensional structures, despite high amino acid sequence variability in most parts of the loops (Chothia and Lesk, J. Mol. Biol., 1987, 196: 901; Chothia et al., Nature, 1989, 342: 877; Martin and Thornton, J. Mol. Biol., 1996, 263: 800, each of which is incorporated by reference in its entirety). Furthermore, there is a relationship between the adopted loop structure and the amino acid sequences surrounding it. The conformation of a particular canonical class is determined by the length of the loop and the amino acid residues residing at key positions within the loop, as well as within the conserved framework (i.e., outside of the loop). Assignment to a particular canonical class can therefore be made based on the presence of these key amino acid residues. The term “canonical structure” may also include considerations as to the linear sequence of the antibody, for example, as catalogued by Kabat (Kabat et al., in “Sequences of Proteins of Immunological Interest,” 5th Edition, U.S. Department of Health and Human Services, 1992). The Kabat numbering scheme is a widely adopted standard for numbering the amino acid residues of an antibody variable domain in a consistent manner. Additional structural considerations can also be used to determine the canonical structure of an antibody.
For example, those differences not fully reflected by Kabat numbering can be described by the numbering system of Chothia et al. and/or revealed by other techniques, for example, crystallography and two- or three-dimensional computational modeling. Accordingly, a given antibody sequence may be placed into a canonical class, which allows for, among other things, identifying appropriate chassis sequences (e.g., based on a desire to include a variety of canonical structures in a library). Kabat numbering of antibody amino acid sequences and structural considerations as described by Chothia et al., and their implications for construing canonical aspects of antibody structure, are described in the literature.
- The terms “CDR” and its plural “CDRs” refer to the complementarity determining regions of an antibody variable domain: three make up the binding character of a light chain variable region (CDRL1, CDRL2, and CDRL3) and three make up the binding character of a heavy chain variable region (CDRH1, CDRH2, and CDRH3). CDRs contribute to the functional activity of an antibody molecule and are separated by amino acid sequences that comprise scaffolding or framework regions. The exact definitional CDR boundaries and lengths are subject to different classification and numbering systems. CDRs may therefore be referred to by Kabat, Chothia, contact, or any other boundary definitions, including the numbering system described herein. Despite differing boundaries, each of these systems has some degree of overlap in what constitutes the so-called “hypervariable regions” within the variable sequences. CDR definitions according to these systems may therefore differ in length and boundary areas with respect to the adjacent framework region. See, for example, Kabat, Chothia, and/or MacCallum et al. (Kabat et al., in “Sequences of Proteins of Immunological Interest,” 5th Edition, U.S. Department of Health and Human Services, 1992; Chothia et al., J. Mol. Biol., 1987, 196: 901; and MacCallum et al., J. Mol. Biol., 1996, 262: 732, each of which is incorporated by reference in its entirety).
- The term “amino acid” or “amino acid residue” typically refers to an amino acid having its art-recognized definition, such as an amino acid selected from the group consisting of: alanine (Ala or A); arginine (Arg or R); asparagine (Asn or N); aspartic acid (Asp or D); cysteine (Cys or C); glutamine (Gln or Q); glutamic acid (Glu or E); glycine (Gly or G); histidine (His or H); isoleucine (Ile or I); leucine (Leu or L); lysine (Lys or K); methionine (Met or M); phenylalanine (Phe or F); proline (Pro or P); serine (Ser or S); threonine (Thr or T); tryptophan (Trp or W); tyrosine (Tyr or Y); and valine (Val or V), although modified, synthetic, or rare amino acids may be used as desired. Generally, amino acids can be grouped as having a nonpolar side chain (e.g., Ala, Cys, Ile, Leu, Met, Phe, Pro, Val); a negatively charged side chain (e.g., Asp, Glu); a positively charged side chain (e.g., Arg, His, Lys); or an uncharged polar side chain (e.g., Asn, Cys, Gln, Gly, His, Met, Phe, Ser, Thr, Trp, and Tyr).
- The term “polynucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry). As desired, the polynucleotides may be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, be modified. Typical modifications include methylation, biotinylation, and other art-known modifications. In addition, the nucleic acid molecule can be single-stranded or double-stranded and, where desired, linked to a detectable moiety.
- The terms “theoretical diversity”, “theoretical total diversity”, or “theoretical repertoire” refer to the maximum number of variants in a library design. For example, given an amino acid sequence of three residues, where residues one and three may each be any one of five amino acid types and residue two may be any one of 20 amino acid types, the theoretical diversity is 5×20×5=500 possible sequences. Similarly, if sequence X is constructed by combination of 4 amino acid segments, where segment 1 has 100 possible sequences, segment 2 has 75 possible sequences, segment 3 has 250 possible sequences, and segment 4 has 30 possible sequences, the theoretical total diversity of sequence X would be 100×75×250×30, or about 5.6×10^7 possible sequences.
- The term “physical realization” refers to a portion of the theoretical diversity that can actually be physically sampled, for example, by any display methodology. Exemplary display methodologies include phage display, ribosome display, and yeast display. For synthetic sequences, the size of the physical realization of a library depends on (1) the fraction of the theoretical diversity that can actually be synthesized, and (2) the limitations of the particular screening method. Exemplary limitations of screening methods include the number of variants that can be screened in a particular assay (e.g., ribosome display, phage display, yeast display) and the transformation efficiency of a host cell (e.g., yeast, mammalian cells, bacteria) that is used in a screening assay. For the purposes of illustration, given a library with a theoretical diversity of 10^12 members, an exemplary physical realization of the library (e.g., in yeast, bacterial cells, ribosome display, etc.; details provided below) that can maximally include 10^11 members will, therefore, sample about 10% of the theoretical diversity of the library. However, if fewer than 10^11 members of the library with a theoretical diversity of 10^12 are synthesized, and the physical realization of the library can maximally include 10^11 members, less than 10% of the theoretical diversity of the library is sampled in the physical realization of the library. Similarly, a physical realization of the library that can maximally include more than 10^12 members would “oversample” the theoretical diversity, meaning that each member may be present more than once (assuming that the entire 10^12 theoretical diversity is synthesized).
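The arithmetic behind theoretical diversity and the fraction sampled by a physical realization can be checked with a short Python sketch. The function names are hypothetical. `fraction_sampled` gives an upper bound: it assumes every member of the realization is a distinct sequence, which real libraries containing duplicates will not reach.

```python
from math import prod


def theoretical_diversity(choices_per_segment):
    """Maximum number of distinct variants: the product of the number of
    options at each independently varied position or segment."""
    return prod(choices_per_segment)


def fraction_sampled(theoretical, synthesized, max_realization):
    """Upper bound on the fraction of the theoretical diversity present in a
    physical realization, limited both by how many designs were synthesized
    and by the capacity of the display/transformation system (assumes no
    duplicate members)."""
    present = min(synthesized, max_realization, theoretical)
    return present / theoretical


print(theoretical_diversity([5, 20, 5]))          # 500
print(theoretical_diversity([100, 75, 250, 30]))  # 56250000, i.e. ~5.6 x 10^7
# 10^12 designs, all synthesized, display capacity 10^11 -> about 10% sampled
print(fraction_sampled(10**12, 10**12, 10**11))   # 0.1
```

Taking the minimum of the three quantities reflects the two independent limits named above: what was synthesized and what the screening system can hold.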
- The term “all possible reading frames” encompasses at least the three forward reading frames and, in some embodiments, the three reverse reading frames.
- The term “antibody of interest” refers to any antibody that has a property of interest that is isolated from a library of the invention. The property of interest may include, but is not limited to, binding to a particular antigen or epitope, blocking a binding interaction between two molecules, or eliciting a certain biological effect.
- The term “functionally expressed” refers to those immunoglobulin genes that are expressed by human B cells and that do not contain premature stop codons.
- The term “full-length heavy chain” refers to an immunoglobulin heavy chain that contains each of the canonical structural domains of an immunoglobulin heavy chain, including the four framework regions, the three CDRs, and the constant region. The term “full-length light chain” refers to an immunoglobulin light chain that contains each of the canonical structural domains of an immunoglobulin light chain, including the four framework regions, the three CDRs, and the constant region.
- The term “unique,” as used herein, refers to a sequence that is different (e.g., has a different chemical structure) from every other sequence within the designed theoretical diversity. It should be understood that a particular physical realization is likely to contain more than one copy of many unique sequences from the theoretical diversity. For example, a library comprising three unique sequences may comprise nine total members if each sequence occurs three times in the library. However, in certain embodiments, each unique sequence may occur only once.
- The term “heterologous moiety” is used herein to indicate the addition of a composition to an antibody wherein the composition is not normally part of the antibody. Exemplary heterologous moieties include drugs, toxins, imaging agents, and any other compositions which might provide an activity that is not inherent in the antibody itself.
- As used herein, the term “percent occurrence of each amino acid residue at each position” refers to the percentage of instances in a sample in which an amino acid is found at a defined position within a particular sequence. For example, given the following three sequences:
- K V R
- K Y P
- K R P,
K occurs in position one in 100% of the instances and P occurs in position three in about 67% of the instances. In certain embodiments of the invention, the sequences selected for comparison are human immunoglobulin sequences.
- As used herein, the term “most frequently occurring amino acids” at a specified position of a sequence in a population of polypeptides refers to the amino acid residues that have the highest percent occurrence at the indicated position in the indicated polypeptide population. For example, the most frequently occurring amino acids in each of the three most N-terminal positions in N1 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 21, and the most frequently occurring amino acids in each of the three most N-terminal positions in N2 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 22.
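The two definitions above ("percent occurrence" and "most frequently occurring amino acids") can be illustrated with a minimal Python sketch using the three-sequence example. The sketch assumes the input sequences are already aligned to a common numbering. Breaking ties alphabetically is an arbitrary choice made here for determinism and does not come from the source.

```python
from collections import Counter


def percent_occurrence(sequences, position):
    """Percent of aligned sequences carrying each amino acid at a 1-based position."""
    residues = [seq[position - 1] for seq in sequences]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def most_frequent(sequences, position, top=1):
    """Amino acids with the highest percent occurrence at a position
    (ties broken alphabetically, an assumption of this sketch)."""
    occ = percent_occurrence(sequences, position)
    return sorted(occ, key=lambda aa: (-occ[aa], aa))[:top]


seqs = ["KVR", "KYP", "KRP"]
print(percent_occurrence(seqs, 1))              # {'K': 100.0}
print(round(percent_occurrence(seqs, 3)["P"]))  # 67
print(most_frequent(seqs, 3))                   # ['P']
```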
- For the purposes of analyzing the occurrence of certain duplets (Example 13) and the information content (Example 14) of the libraries of the invention, and other libraries, a “central loop” of CDRH3 is defined. If the C-terminal 5 amino acids are removed from the Kabat CDRH3 (positions 95-102), the remaining sequence is termed the “central loop”. Thus, considering the duplet occurrence calculations of Example 13, a CDRH3 of size 6 or less would not contribute to the analysis of the occurrence of duplets. A CDRH3 of size 7 would contribute only to the i−i+1 data set, a CDRH3 of size 8 would also contribute to the i−i+2 data set, and a CDRH3 of size 9 and larger would also contribute to the i−i+3 data set. For example, a CDRH3 of size 9 may have amino acids at positions 95-96-97-98-99-100-100A-101-102, but only the first four residues (95-96-97-98) would be part of the central loop and contribute to the pair-wise occurrence (duplet) statistics. As a further example, a CDRH3 of size 14 may have the sequence: 95-96-97-98-99-100-100A-100B-100C-100D-100E-100F-101-102. Here, only the first nine residues (95 through 100C) contribute to the central loop.
- Library screening requires a genotype-phenotype linkage. The term “genotype-phenotype linkage” is used in a manner consistent with its art-recognized meaning and refers to the fact that the nucleic acid (genotype) encoding a protein with a particular phenotype (e.g., binding an antigen) can be isolated from a library. For the purposes of illustration, an antibody fragment expressed on the surface of a phage can be isolated based on its binding to an antigen (e.g., Ladner et al.). The binding of the antibody to the antigen simultaneously enables the isolation of the phage containing the nucleic acid encoding the antibody fragment. Thus, the phenotype (antigen-binding characteristics of the antibody fragment) has been “linked” to the genotype (nucleic acid encoding the antibody fragment). Other methods of maintaining a genotype-phenotype linkage include those of Wittrup et al. (U.S. Pat. Nos. 6,300,065, 6,331,391, 6,423,538, 6,696,251, 6,699,658, and US Pub. No. 20040146976, each of which is incorporated by reference in its entirety), Miltenyi (U.S. Pat. No. 7,166,423, incorporated by reference in its entirety), Fandl (U.S. Pat. No. 6,919,183, US Pub No. 20060234311, each incorporated by reference in its entirety), Clausell-Tormos et al. (Chem. Biol., 2008, 15: 427, incorporated by reference in its entirety), Love et al. (Nat. Biotechnol., 2006, 24: 703, incorporated by reference in its entirety), and Kelly et al. (Chem. Commun., 2007, 14: 1773, incorporated by reference in its entirety). Any method that localizes the antibody protein with the gene encoding the antibody, in a way in which both can be recovered while the linkage between them is maintained, is suitable.
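The central-loop definition and the duplet counting described for Example 13 can be sketched in Python as follows. The sketch assumes each CDRH3 is given as a plain string spanning Kabat positions 95-102, including any insertions. The example sequence `ARDYGSFDY` and the function names are hypothetical.

```python
from collections import Counter


def central_loop(cdrh3):
    """Kabat CDRH3 (positions 95-102) with its five C-terminal residues removed.
    Sequences of five residues or fewer yield an empty loop."""
    return cdrh3[:-5]


def duplets(cdrh3, separation):
    """Residue pairs (i, i+separation) within the central loop."""
    loop = central_loop(cdrh3)
    return [(loop[i], loop[i + separation]) for i in range(len(loop) - separation)]


def duplet_counts(cdrh3s, separation):
    """Tally duplets at a given separation over a collection of CDRH3s."""
    counts = Counter()
    for seq in cdrh3s:
        counts.update(duplets(seq, separation))
    return counts


example = "ARDYGSFDY"            # hypothetical 9-residue CDRH3
print(central_loop(example))     # ARDY
print(duplets(example, 1))       # [('A', 'R'), ('R', 'D'), ('D', 'Y')]
print(duplets(example, 3))       # [('A', 'Y')]
print(duplets("ARDYFDY", 2))     # [] (size 7 contributes only to i, i+1)
```

This reproduces the size thresholds in the text. A central loop of length L yields L − k pairs at separation k, so size-7, size-8, and size-9 CDRH3s first contribute at separations 1, 2, and 3 respectively.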
- The antibody libraries of the invention are designed to reflect certain aspects of the preimmune repertoire as naturally created by the human immune system. Certain libraries of the invention are based on rational design informed by the collection of human V, D, and J genes, and other large databases of human heavy and light chain sequences (e.g., publicly known germline sequences; sequences from Jackson et al., J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety; sequences from Lee et al., Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety; and sequences compiled for rearranged Vκ and Vλ; see Appendices A and B filed herewith). Additional information may be found, for example, in Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Tomlinson et al., J. Mol. Biol., 1992, 227: 799; and Matsuda et al., J. Exp. Med., 1998, 188: 2151, each incorporated by reference in its entirety. In certain embodiments of the invention, cassettes representing the possible V, D, and J diversity found in the human repertoire, as well as junctional diversity (i.e., N1 and N2), are synthesized de novo as single- or double-stranded DNA oligonucleotides. In certain embodiments of the invention, oligonucleotide cassettes encoding CDR sequences are introduced into yeast along with one or more acceptor vectors containing heavy or light chain chassis sequences. No primer-based PCR amplification or template-directed cloning steps from mammalian cDNA or mRNA are employed. Through standard homologous recombination, the recipient yeast recombines the cassettes (e.g., CDR3s) with the acceptor vector(s) containing the chassis sequence(s) and constant regions to create a properly ordered, synthetic, full-length human heavy chain and/or light chain immunoglobulin library that can be genetically propagated, expressed, displayed, and screened.
One of ordinary skill in the art will readily recognize that the chassis contained in the acceptor vector can be designed so as to produce constructs other than full-length human heavy chains and/or light chains. For example, in certain embodiments of the invention, the chassis may be designed to encode a portion of an antibody fragment, or of a subunit of an antibody fragment, so that a sequence encoding the antibody fragment, or subunit thereof, is produced when the oligonucleotide cassette containing the CDR is recombined with the acceptor vector.
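To make the cassette/acceptor-vector idea concrete, here is a deliberately simplified in-silico model of gap repair by homologous recombination. It is an illustration only and does not model yeast biology. It assumes the acceptor vector and the cassette share identical left and right homology arms, and that the acceptor region between the arms is replaced by the cassette region between them. All sequences and the function name are hypothetical.

```python
def recombine(acceptor, cassette, left_arm, right_arm):
    """Toy string model of homologous recombination: whatever lies between
    the homology arms in the acceptor (e.g., a stuffer or gap) is replaced by
    whatever lies between the same arms in the cassette."""
    a_left = acceptor.find(left_arm)
    a_right = acceptor.find(right_arm, a_left + len(left_arm))
    c_left = cassette.find(left_arm)
    c_right = cassette.find(right_arm, c_left + len(left_arm))
    if -1 in (a_left, a_right, c_left, c_right):
        raise ValueError("homology arms not found in both acceptor and cassette")
    insert = cassette[c_left + len(left_arm):c_right]
    return acceptor[:a_left + len(left_arm)] + insert + acceptor[a_right:]


# Hypothetical sequences: chassis end "AAACCC", stuffer "NNNN", constant start "TTTGGG".
acceptor = "GGGG" + "AAACCC" + "NNNN" + "TTTGGG" + "CCCC"
cassette = "AAACCC" + "GCTAGC" + "TTTGGG"   # CDR3-encoding insert flanked by arms
print(recombine(acceptor, cassette, "AAACCC", "TTTGGG"))
# GGGGAAACCCGCTAGCTTTGGGCCCC
```

Because the arms fix both the position and the orientation of the insert, every recombinant comes out "properly ordered" (chassis, then CDR3, then constant region), which is the property the text relies on.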
- In certain embodiments, the invention provides a synthetic, preimmune human antibody repertoire comprising about 10^7 to about 10^20 antibody members, wherein the repertoire comprises:
- (a) selected human antibody heavy chain chassis (i.e., amino acids 1 to 94 of the heavy chain variable region, using Kabat's definition);
- (b) a CDRH3 repertoire, designed based on the human IGHD and IGHJ germline sequences, the CDRH3 repertoire comprising the following:
- (i) optionally, one or more tail regions;
- (ii) one or more N1 regions, comprising about 0 to about 10 amino acids selected from the group consisting of fewer than 20 of the amino acid types preferentially encoded by the action of terminal deoxynucleotidyl transferase (TdT) and functionally expressed by human B cells;
- (iii) one or more DH segments, based on one or more selected IGHD segments, and one or more N- or C-terminal truncations thereof;
- (iv) one or more N2 regions, comprising about 0 to about 10 amino acids selected from the group consisting of fewer than 20 of the amino acids preferentially encoded by the activity of TdT and functionally expressed by human B cells; and
- (v) one or more H3-JH segments, based on one or more IGHJ segments, and one or more N-terminal truncations thereof (e.g., down to XXWG);
- (c) one or more selected human antibody kappa and/or lambda light chain chassis; and
- (d) a CDRL3 repertoire designed based on the human IGLV and IGLJ germline sequences, wherein “L” may be a kappa or lambda light chain.
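The combinatorial structure of the CDRH3 repertoire in (b), tail + N1 + DH + N2 + H3-JH, can be sketched with `itertools.product`. All segment sequences below are hypothetical placeholders and are not taken from the library design. An empty string stands for a zero-length segment. The example also shows why the number of unique sequences can be smaller than the theoretical count: different segment combinations can concatenate to the same peptide.

```python
from itertools import product


def assemble_cdrh3(tails, n1s, dhs, n2s, h3_jhs):
    """Concatenate one choice from each segment set, in the order
    tail-N1-DH-N2-H3-JH, for every combination."""
    return ["".join(parts) for parts in product(tails, n1s, dhs, n2s, h3_jhs)]


# Hypothetical segment sets ("" = zero-length segment).
tails = ["", "D"]
n1s = ["", "G", "R"]
dhs = ["YSS", "YSSG"]
n2s = ["", "G"]
h3_jhs = ["FDY", "DY"]

cdrh3s = assemble_cdrh3(tails, n1s, dhs, n2s, h3_jhs)
print(len(cdrh3s))       # 48 combinations (2 x 3 x 2 x 2 x 2)
print(len(set(cdrh3s)))  # 36 unique: "YSSG" + "" equals "YSS" + "G"
```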
- The heavy chain chassis may be any sequence with homology to Kabat residues 1 to 94 of an immunoglobulin heavy chain variable domain. Non-limiting examples of heavy chain chassis are included in the Examples, and one of ordinary skill in the art will readily recognize that the principles presented therein, and throughout the specification, may be used to derive additional heavy chain chassis.
- As described above, the heavy chain chassis region is followed, optionally, by a “tail” region. The tail region comprises zero, one, or more amino acids that may or may not be selected on the basis of comparing naturally occurring heavy chain sequences. For example, in certain embodiments of the invention, heavy chain sequences available in the art may be compared, and the residues occurring most frequently in the tail position in the naturally occurring sequences may be included in the library (e.g., to produce sequences that most closely resemble human sequences). In other embodiments, less frequently used amino acids may be used. In still other embodiments, amino acids selected from any group of amino acids may be used. In certain embodiments of the invention, the tail is zero residues or one residue (e.g., G, D, or E) in length. For the purposes of clarity, and without being bound by theory, in the naturally occurring human repertoire, the first ⅔ of the codon encoding the tail residue is provided by the FRM3 region of the VH gene. The amino acid at this position in naturally occurring heavy chain sequences may thus be considered to be partially encoded by the IGHV gene (⅔) and partially encoded by the CDRH3 (⅓). However, for the purposes of clearly illustrating certain aspects of the invention, the entire codon encoding the tail residue (and, therefore, the amino acid derived from it) is described herein as being part of the CDRH3 sequence.
- As described above, there are two peptide segments derived from nucleotides that are added by TdT in the naturally occurring human antibody repertoire. These segments are designated N1 and N2 (referred to herein as N1 and N2 segments, domains, regions, or sequences). In certain embodiments of the invention, N1 and N2 are about 0, 1, 2, or 3 amino acids in length. Without being bound by theory, it is thought that these lengths most closely mimic the N1 and N2 lengths found in the human repertoire (see FIG. 2). In other embodiments of the invention, N1 and N2 may be about 4, 5, 6, 7, 8, 9, or 10 amino acids in length. Similarly, the composition of the amino acid residues utilized to produce the N1 and N2 segments may also vary. In certain embodiments of the invention, the amino acids used to produce N1 and N2 segments may be selected from amongst the eight most frequently occurring amino acids in the N1 and N2 domains of the human repertoire (e.g., G, R, S, P, L, A, V, and T). In other embodiments of the invention, the amino acids used to produce the N1 and N2 segments may be selected from the group consisting of fewer than about 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, or 3 of the amino acids preferentially encoded by the activity of TdT and functionally expressed by human B cells. Alternatively, N1 and N2 may comprise amino acids selected from any group of amino acids. N1 and N2 need not be of similar length or composition, and independent variation of the length and composition of N1 and N2 is one method by which additional diversity may be introduced into the library.
- The DH segments of the libraries are based on the peptides encoded by the naturally occurring IGHD gene repertoire, with progressive deletion of residues at the N- and C-termini. IGHD genes may be read in multiple reading frames, and peptides representing these reading frames, and their N- and C-terminal deletions, are also included in the libraries of the invention. In certain embodiments of the invention, DH segments as short as three amino acid residues may be included in the libraries. In other embodiments of the invention, DH segments as short as about 1, 2, 4, 5, 6, 7, or 8 amino acids may be included in the libraries.
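Two quantities from the paragraphs above lend themselves to a quick sketch. The first is the set of DH peptides produced by progressive N- and C-terminal deletion down to a minimum length. The second is how many N1 or N2 peptides of length 0 to 3 can be built from the eight amino acids G, R, S, P, L, A, V, and T. The DH peptide `YSSGWY` is a hypothetical example input. Translating IGHD genes in multiple reading frames is not modeled here.

```python
def dh_truncations(peptide, min_len=3):
    """All peptides obtained by deleting i residues from the N-terminus and
    any number from the C-terminus, keeping at least min_len residues
    (equivalently, every contiguous sub-peptide of length >= min_len)."""
    n = len(peptide)
    return {peptide[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def n_region_count(alphabet_size, max_len):
    """Number of N-region peptides of length 0..max_len over a fixed alphabet."""
    return sum(alphabet_size ** k for k in range(max_len + 1))


print(len(dh_truncations("YSSGWY")))  # 10
print(n_region_count(8, 3))           # 585 (1 + 8 + 64 + 512)
```

Returning a set rather than a list keeps only unique peptides, in line with the document's definition of "unique" sequences.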
- The H3-JH segments of the libraries are based on the peptides encoded by the naturally occurring IGHJ gene repertoire, with progressive deletion of residues at the N-terminus. The N-terminal portion of the IGHJ segment that makes up part of the CDRH3 is referred to herein as H3-JH. In certain embodiments of the invention, the H3-JH segment may be represented by progressive N-terminal deletions of one or more H3-JH residues, down to two H3-JH residues. In other embodiments of the invention, the H3-JH segments of the library may contain N-terminal deletions (or no deletions) down to about 6, 5, 4, 3, 2, 1, or 0 H3-JH residues.
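The progressive N-terminal deletion of an H3-JH segment can be sketched in a few lines. The C-terminus is kept intact and residues are removed one at a time from the N-terminus until `min_len` residues remain. The input `YFDY` is an illustrative example and the function name is hypothetical.

```python
def h3_jh_truncations(h3_jh, min_len=2):
    """Progressive N-terminal deletions of an H3-JH peptide, from no deletion
    down to min_len residues; the C-terminus is never trimmed."""
    return [h3_jh[i:] for i in range(len(h3_jh) - min_len + 1)]


print(h3_jh_truncations("YFDY"))             # ['YFDY', 'FDY', 'DY']
print(h3_jh_truncations("YFDY", min_len=0))  # ['YFDY', 'FDY', 'DY', 'Y', '']
```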
- The light chain chassis of the libraries may be any sequence with homology to Kabat residues 1 to 88 of naturally occurring light chain (κ or λ) sequences. In certain embodiments of the invention, the light chain chassis of the invention are synthesized in combinatorial fashion, utilizing VL and JL segments, to produce one or more libraries of light chain sequences with diversity in the chassis and CDR3 sequences. In other embodiments of the invention, the light chain CDR3 sequences are synthesized using degenerate oligonucleotides or trinucleotides and recombined with the light chain chassis and light chain constant region to form full-length light chains.
- The instant invention also provides methods for producing and using such libraries, as well as libraries comprising one or more immunoglobulin domains or antibody fragments. Design and synthesis of each component of the claimed antibody libraries are provided in more detail below.
- One step in building certain libraries of the invention is the selection of chassis sequences, which are based on naturally occurring variable domain sequences (e.g., IGHV and IGLV). This selection can be done arbitrarily, or by the selection of chassis that meet certain criteria. For example, the Kabat database, an electronic database containing non-redundant rearranged antibody sequences, can be queried for those heavy and light chain germline sequences that are most frequently represented. The BLAST search algorithm, or more specialized tools such as SoDA (Volpe et al., Bioinformatics, 2006, 22: 438-44, incorporated by reference in its entirety), can be used to compare rearranged antibody sequences with germline sequences, using the V BASE2 database (Retter et al., Nucleic Acids Res., 2005, 33: D671-D674), or similar collections of human V, D, and J genes, to identify the germline families that are most frequently used to generate functional antibodies.
- Several criteria can be utilized for the selection of chassis for inclusion in the libraries of the invention. For example, sequences that are known (or have been determined) to express poorly in yeast, or other organisms used in the invention (e.g., bacteria, mammalian cells, fungi, or plants) can be excluded from the libraries. Chassis may also be chosen based on their representation in the peripheral blood of humans. In certain embodiments of the invention, it may be desirable to select chassis that correspond to germline sequences that are highly represented in the peripheral blood of humans. In other embodiments, it may be desirable to select chassis that correspond to germline sequences that are less frequently represented, for example, to increase the canonical diversity of the library. Therefore, chassis may be selected to produce libraries that represent the largest and most structurally diverse group of functional human antibodies. In other embodiments of the invention, less diverse chassis may be utilized, for example, if it is desirable to produce a smaller, more focused library with less chassis variability and greater CDR variability. In some embodiments of the invention, chassis may be selected based on both their expression in a cell of the invention (e.g., a yeast cell) and the diversity of canonical structures represented by the selected sequences. One may therefore produce a library with a diversity of canonical structures that express well in a cell of the invention.
- In certain embodiments of the invention, the antibody library comprises variable heavy domains and variable light domains, or portions thereof. Each of these domains is built from certain components, which will be more fully described in the examples provided herein. In certain embodiments, the libraries described herein may be used to isolate fully human antibodies that can be used as diagnostics and/or therapeutics. Without being bound by theory, antibodies with sequences most similar or identical to those most frequently found in peripheral blood (for example, in humans) may be less likely to be immunogenic when administered as therapeutic agents.
- Without being bound by theory, and for the purposes of illustrating certain embodiments of the invention, the VH domains of the library may be considered to comprise three primary components: (1) a VH “chassis”, which includes amino acids 1 to 94 (using Kabat numbering), (2) the CDRH3, which is defined herein to include the Kabat CDRH3 proper (positions 95-102), and (3) the FRM4 region, including amino acids 103 to 113 (Kabat numbering). The overall VH structure may therefore be depicted schematically (not to scale) as:
[VH chassis (Kabat 1-94)]-[CDRH3 (Kabat 95-102)]-[FRM4 (Kabat 103-113)]
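The three-part division of a VH domain by Kabat position can be expressed as a small Python sketch. It assumes the domain has already been numbered, for example by an external numbering tool not shown here, and is supplied as (Kabat label, residue) pairs. Insertion labels such as `100A` are grouped by their integer part. The numbered fragment in the example is hypothetical.

```python
import re


def kabat_int(label):
    """Integer part of a Kabat position label, e.g. '100A' -> 100."""
    m = re.match(r"(\d+)", label)
    if m is None:
        raise ValueError(f"bad Kabat label: {label!r}")
    return int(m.group(1))


def split_vh(numbered):
    """Split (kabat_label, residue) pairs into chassis (1-94),
    CDRH3 (95-102), and FRM4 (103-113); other positions are ignored."""
    parts = {"chassis": [], "cdrh3": [], "frm4": []}
    for label, aa in numbered:
        pos = kabat_int(label)
        if 1 <= pos <= 94:
            parts["chassis"].append(aa)
        elif 95 <= pos <= 102:
            parts["cdrh3"].append(aa)
        elif 103 <= pos <= 113:
            parts["frm4"].append(aa)
    return {name: "".join(res) for name, res in parts.items()}


# Hypothetical numbered fragment spanning the chassis/CDRH3/FRM4 junctions.
numbered = [("93", "A"), ("94", "R"), ("95", "D"), ("96", "Y"), ("100", "G"),
            ("100A", "S"), ("101", "D"), ("102", "Y"), ("103", "W"), ("104", "G")]
print(split_vh(numbered))
# {'chassis': 'AR', 'cdrh3': 'DYGSDY', 'frm4': 'WG'}
```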
- The selection and design of VH chassis sequences based on the human IGHV germline repertoire will become more apparent upon review of the examples provided herein. In certain embodiments of the invention, the VH chassis sequences selected for use in the library may correspond to all functionally expressed human IGHV germline sequences. Alternatively, IGHV germline sequences may be selected for representation in a library according to one or more criteria. For example, in certain embodiments of the invention, the selected IGHV germline sequences may be among those that are most highly represented among antibody molecules isolated from the peripheral blood of healthy adults, children, or fetuses.
- In certain embodiments, it may be desirable to base the design of the VH chassis on the utilization of IGHV germline sequences in adults, children, or fetuses with a disease, for example, an autoimmune disease. Without being bound by theory, it is possible that analysis of germline sequence usage in the antibody molecules isolated from the peripheral blood of individuals with autoimmune disease may provide information useful for the design of antibodies recognizing human antigens.
- In some embodiments, the selection of IGHV germline sequences for representation in a library of the invention may be based on their frequency of occurrence in the peripheral blood. For the purposes of illustration, four IGHV1 germline sequences (IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV1-46 (SEQ ID NO: 26), and IGHV1-69 (SEQ ID NO: 27)) comprise about 80% of the IGHV1 family repertoire in peripheral blood. Thus, the specific IGHV1 germline sequences selected for representation in the library may include those that are most frequently occurring and that cumulatively comprise at least about 80% of the IGHV1 family repertoire found in peripheral blood. An analogous approach can be used to select specific IGHV germline sequences from any IGHV family (i.e., IGHV1, IGHV2, IGHV3, IGHV4, IGHV5, IGHV6, and IGHV7). The specific germline sequences chosen for representation of a particular IGHV family in a library of the invention may therefore comprise at least about 100%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or 0% of the particular IGHV family member repertoire found in peripheral blood.
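The frequency-based selection rule above is simple: take the most frequently used germlines, in order, until their cumulative share of the family repertoire reaches a coverage threshold such as 80%. A minimal Python sketch follows. The gene names and usage counts are hypothetical placeholders, not measured peripheral-blood frequencies. Ties in usage are broken by name, an arbitrary choice of this sketch.

```python
def select_germlines(usage, coverage=0.80):
    """Pick germlines in order of decreasing usage until their cumulative
    share of the family repertoire reaches the requested coverage."""
    total = sum(usage.values())
    selected, cumulative = [], 0
    for gene, count in sorted(usage.items(), key=lambda kv: (-kv[1], kv[0])):
        if cumulative >= coverage * total:
            break
        selected.append(gene)
        cumulative += count
    return selected


# Hypothetical usage counts within one IGHV family.
usage = {"gene_A": 40, "gene_B": 25, "gene_C": 20, "gene_D": 10, "gene_E": 5}
print(select_germlines(usage))        # ['gene_A', 'gene_B', 'gene_C'] (85% cumulative)
print(select_germlines(usage, 0.50))  # ['gene_A', 'gene_B'] (65% cumulative)
```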
- In some embodiments, the selected IGHV germline sequences may be chosen to maximize the structural diversity of the VH chassis library. Structural diversity may be evaluated by, for example, comparing the lengths, compositions, and canonical structures of CDRH1 and CDRH2 in the IGHV germline sequences. In human IGHV sequences, the CDRH1 (Kabat definition) may have a length of 5, 6, or 7 amino acids, while CDRH2 (Kabat definition) may have a length of 16, 17, 18, or 19 amino acids. The amino acid compositions of the IGHV germline sequences and, in particular, the CDR domains, may be evaluated by sequence alignments, as presented in the Examples. Canonical structure may be assigned, for example, according to the methods described by Chothia et al., J. Mol. Biol., 1992, 227: 799, incorporated by reference in its entirety.
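One simple way to act on the length-based part of this criterion is to group candidate germlines by their (CDRH1 length, CDRH2 length) combination and keep one representative per group. The sketch below keeps the most frequently used germline in each group. It uses lengths only; composition and canonical-structure assignment are not modeled. The germline names, placeholder CDR strings, and usage values are hypothetical.

```python
from collections import defaultdict


def pick_structurally_diverse(germlines, usage):
    """germlines: name -> (cdrh1, cdrh2) Kabat CDR sequences.
    usage: name -> observed frequency (missing names count as 0).
    Returns the most-used germline for each distinct
    (CDRH1 length, CDRH2 length) combination."""
    classes = defaultdict(list)
    for name, (h1, h2) in germlines.items():
        classes[(len(h1), len(h2))].append(name)
    return {key: max(names, key=lambda n: usage.get(n, 0))
            for key, names in sorted(classes.items())}


# Hypothetical germlines; "X" strings stand in for real CDR sequences.
germlines = {"gA": ("X" * 5, "X" * 17), "gB": ("X" * 5, "X" * 17),
             "gC": ("X" * 7, "X" * 16), "gD": ("X" * 5, "X" * 19)}
usage = {"gA": 30, "gB": 50, "gC": 5, "gD": 15}
print(pick_structurally_diverse(germlines, usage))
# {(5, 17): 'gB', (5, 19): 'gD', (7, 16): 'gC'}
```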
- In certain embodiments of the invention, it may be advantageous to design VH chassis based on IGHV germline sequences that may maximize the probability of isolating an antibody with particular characteristics. For example, without being bound by theory, in some embodiments it may be advantageous to restrict the IGHV germline sequences to include only those germline sequences that are utilized in antibodies undergoing clinical development, or antibodies that have been approved as therapeutics. On the other hand, in some embodiments, it may be advantageous to produce libraries containing VH chassis that are not represented amongst clinically utilized antibodies. Such libraries may be capable of yielding antibodies with novel properties that are advantageous over those obtained with the use of “typical” IGHV germline sequences, or enabling studies of the structures and properties of “atypical” IGHV germline sequences or canonical structures.
- One of ordinary skill in the art will readily recognize that a variety of other criteria can be used to select IGHV germline sequences for representation in a library of the invention. Any of the criteria described herein may also be combined with any other criteria. Further exemplary criteria include the ability to be expressed at sufficient levels in certain cell culture systems, solubility in particular antibody formats (e.g., whole immunoglobulins and antibody fragments), and the thermodynamic stability of the individual domains, whole immunoglobulins, or antibody fragments. The methods of the invention may be applied to select any IGHV germline sequence that has utility in an antibody library of the instant invention.
- In certain embodiments of the invention, the VH chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 94 of one or more of the following IGHV germline sequences: IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NO: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV3-9 (SEQ ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV3-13 (SEQ ID NO: 435), IGHV3-15 (SEQ ID NO: 29), IGHV3-20 (SEQ ID NO: 436), IGHV3-21 (SEQ ID NO: 437), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-43 (SEQ ID NO: 438), IGHV3-48 (SEQ ID NO: 33), IGHV3-49 (SEQ ID NO: 439), IGHV3-53 (SEQ ID NO: 440), IGHV3-64 (SEQ ID NO: 441), IGHV3-66 (SEQ ID NO: 442), IGHV3-72 (SEQ ID NO: 443), IGHV3-73 (SEQ ID NO: 444), IGHV3-74 (SEQ ID NO: 445), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-28 (SEQ ID NO: 448), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), IGHV5-51 (SEQ ID NO: 40), IGHV6-1 (SEQ ID NO: 449), and IGHV7-4-1 (SEQ ID NO: 450). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- In other embodiments, the VH chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 94 of the following IGHV germline sequences: IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV1-46 (SEQ ID NO: 26), IGHV1-69 (SEQ ID NO: 27), IGHV3-7 (SEQ ID NO: 28), IGHV3-15 (SEQ ID NO: 29), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-48 (SEQ ID NO: 33), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), and IGHV5-51 (SEQ ID NO: 40). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 5.
- While the selection of the VH chassis with sequences based on the IGHV germline sequences is expected to support a large diversity of CDRH3 sequences, further diversity in the VH chassis may be generated by altering the amino acid residues comprising the CDRH1 and/or CDRH2 regions of each chassis selected for inclusion in the library (see Example 2).
- In certain embodiments of the invention, the alterations or mutations in the amino acid residues comprising the CDRH1 and CDRH2 regions, or other regions, of the IGHV germline sequences are made after analyzing the sequence identity within data sets of rearranged human heavy chain sequences that have been classified according to the identity of the original IGHV germline sequence from which the rearranged sequences are derived. For example, from a set of rearranged antibody sequences, the IGHV germline sequence of each antibody is determined, and the rearranged sequences are classified according to the IGHV germline sequence. This determination is made on the basis of sequence identity.
- Next, the occurrence of any of the 20 amino acid residues at each position in these sequences is determined. In certain embodiments of the invention, one may be particularly interested in the occurrence of different amino acid residues at the positions within CDRH1 and CDRH2, for example if increasing the diversity of the antigen-binding portion of the VH chassis is desired. In other embodiments of the invention, it may be desirable to evaluate the occurrence of different amino acid residues in the framework regions. Without being bound by theory, alterations in the framework regions may impact antigen binding by altering the spatial orientation of the CDRs.
- After the occurrence of amino acids at each position of interest has been identified, alterations may be made in the VH chassis sequence, according to certain criteria. In some embodiments, the objective may be to produce additional VH chassis with sequence variability that mimics the variability observed in the heavy chain domains of rearranged human antibody sequences (derived from respective IGHV germline sequences) as closely as possible, thereby potentially obtaining sequences that are most human in nature (i.e., sequences that most closely mimic the composition and length of human sequences). In this case, one may synthesize additional VH chassis sequences that include mutations naturally found at a particular position and include one or more of these VH chassis sequences in a library of the invention, for example, at a frequency that mimics the frequency found in nature. In another embodiment of the invention, one may wish to include VH chassis that represent only mutations that most frequently occur at a given position in rearranged human antibody sequences. For example, rather than mimicking the human variability precisely, as described above, and with reference to exemplary Tables 6 and 7, one may choose to include only the top 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acid residues that most frequently occur at each position. For the purposes of illustration, and with reference to Table 6, if one wished to include the top four most frequently occurring amino acid residues at position 31 of the VH1-69 sequence, then position 31 in the VH1-69 sequence would be varied to include S, N, T, and R. Without being bound by theory, it is thought that the introduction of diversity by mimicking the naturally occurring composition of the rearranged heavy chain sequences is likely to produce antibodies that are most human in composition.
However, the libraries of the invention are not limited to heavy chain sequences that are diversified by this method, and any criteria can be used to introduce diversity into the heavy chain chassis, including random or rational mutagenesis. For example, in certain embodiments of the invention, it may be preferable to substitute neutral and/or smaller amino acid residues for those residues that occur in the IGHV germline sequence. Without being bound by theory, neutral and/or smaller amino acid residues may provide a more flexible and less sterically hindered context for the display of a diversity of CDR sequences.
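The per-position frequency analysis and top-k selection described above can be sketched as follows. This sketch assumes that the rearranged sequences have already been grouped by IGHV germline and aligned to a common numbering, with "-" marking gaps. The toy column (4 S, 3 N, 2 T, 1 R) is invented so that the top-four result mirrors the S, N, T, R illustration. It is not the Table 6 data. Ties are broken alphabetically, which is an arbitrary choice made here so that the output is deterministic.

```python
from collections import Counter
from typing import Dict, List


def _column(aligned: List[str], position: int) -> List[str]:
    """Residues at a 0-based aligned position, skipping gaps and short sequences."""
    return [s[position] for s in aligned if position < len(s) and s[position] != "-"]


def residue_frequencies(aligned: List[str], position: int) -> Dict[str, float]:
    """Fraction of sequences carrying each residue at `position`."""
    column = _column(aligned, position)
    total = len(column)
    return {aa: n / total for aa, n in Counter(column).items()}


def top_residues(aligned: List[str], position: int, k: int) -> List[str]:
    """Up to k most frequent residues at `position` (ties broken alphabetically)."""
    counts = Counter(_column(aligned, position))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [aa for aa, _ in ranked[:k]]


# Invented toy alignment; only column 1 varies.
toy = ["AS"] * 4 + ["AN"] * 3 + ["AT"] * 2 + ["AR"]
print(top_residues(toy, 1, 4))  # ['S', 'N', 'T', 'R']
```

`residue_frequencies` corresponds to the "mimic the natural frequency" option, and `top_residues` corresponds to the "top N residues only" option.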
- Example 2 illustrates the application of this method to heavy chains derived from a particular IGHV germline. One of ordinary skill in the art will readily recognize that this method can be applied to any germline sequence, and can be used to generate at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 1000, 10⁴, 10⁵, 10⁶, or more variants of each heavy chain chassis.
- The light chain chassis of the invention may be based on kappa and/or lambda light chain sequences. The principles underlying the selection of light chain variable (IGLV) germline sequences for representation in the library are analogous to those employed for the selection of the heavy chain sequences (described above and in Examples 1 and 2). Similarly, the methods used to introduce variability into the selected heavy chain chassis may also be used to introduce variability into the light chain chassis.
- Without being bound by theory, and for the purposes of illustrating certain embodiments of the invention, the VL domains of the library may be considered to comprise three primary components: (1) a VL “chassis”, which includes amino acids 1 to 88 (using Kabat numbering), (2) the VLCDR3, which is defined herein to include the Kabat CDRL3 proper (positions 89-97), and (3) the FRM4 region, including amino acids 98 to 107 (Kabat numbering). The overall VL structure may therefore be depicted schematically (not to scale) as:
- In certain embodiments of the invention, the VL chassis of the libraries include one or more chassis based on IGKV germline sequences. In certain embodiments of the invention, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of one or more of the following IGKV germline sequences: IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NOs: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41. In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- In other embodiments, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of the following IGKV germline sequences: IGKV1-05 (SEQ ID NO: 229), IGKV1-12 (SEQ ID NO: 230), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-39 (SEQ ID NO: 233), IGKV2-28 (SEQ ID NO: 234), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), and IGKV4-1 (SEQ ID NO: 238). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 11.
- In certain embodiments of the invention, the VL chassis of the libraries include one or more chassis based on IGλV germline sequences. In certain embodiments of the invention, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of one or more of the following IGλV germline sequences: IGλV3-1 (SEQ ID NO: 535), IGλV3-21 (SEQ ID NO: 537), IGλV2-14 (SEQ ID NO: 534), IGλV1-40 (SEQ ID NO: 531), IGλV3-19 (SEQ ID NO: 536), IGλV1-51 (SEQ ID NO: 533), IGλV1-44 (SEQ ID NO: 532), IGλV6-57 (SEQ ID NO: 539), IGλV2-8, IGλV3-25, IGλV2-23, IGλV3-10, IGλV4-69 (SEQ ID NO: 538), IGλV1-47, IGλV2-11, IGλV7-43 (SEQ ID NO: 541), IGλV7-46, IGλV5-45 (SEQ ID NO: 540), IGλV4-60, IGλV10-54 (SEQ ID NO: 482), IGλV8-61 (SEQ ID NO: 499), IGλV3-9 (SEQ ID NO: 494), IGλV1-36 (SEQ ID NO: 480), IGλV2-18 (SEQ ID NO: 485), IGλV3-16 (SEQ ID NO: 491), IGλV3-27 (SEQ ID NO: 493), IGλV4-3 (SEQ ID NO: 495), IGλV5-39 (SEQ ID NO: 497), IGλV9-49 (SEQ ID NO: 500), and IGλV3-12 (SEQ ID NO: 490).
In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- In other embodiments, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of the following IGλV germline sequences: IGλV3-1 (SEQ ID NO: 535), IGλV3-21 (SEQ ID NO: 537), IGλV2-14 (SEQ ID NO: 534), IGλV1-40 (SEQ ID NO: 531), IGλV3-19 (SEQ ID NO: 536), IGλV1-51 (SEQ ID NO: 533), IGλV1-44 (SEQ ID NO: 532), IGλV6-57 (SEQ ID NO: 539), IGλV4-69 (SEQ ID NO: 538), IGλV7-43 (SEQ ID NO: 541), and IGλV5-45 (SEQ ID NO: 540). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 14.
- It is known in the art that diversity in the CDR3 region of the heavy chain is sufficient for most antibody specificities (Xu and Davis, Immunity, 2000, 13: 27-45, incorporated by reference in its entirety) and that existing successful libraries have been created using CDRH3 as the major source of diversification (Hoogenboom et al., J. Mol. Biol., 1992, 227: 381; Lee et al., J. Mol. Biol., 2004, 340: 1073, each of which is incorporated by reference in its entirety). It is also known that both the DH region and the N1/N2 regions contribute to CDRH3 functional diversity (Schroeder et al., J. Immunol., 2005, 174: 7773 and Mathis et al., Eur. J. Immunol., 1995, 25: 3115, each of which is incorporated by reference in its entirety). For the purposes of the present invention, the CDRH3 region of naturally occurring human antibodies can be divided into five segments: (1) the tail segment, (2) the N1 segment, (3) the DH segment, (4) the N2 segment, and (5) the JH segment. As exemplified below, the tail, N1, and N2 segments may or may not be present.
- In certain embodiments of the invention, the method for selecting amino acid sequences for the synthetic CDRH3 libraries includes a frequency analysis and the generation of the corresponding variability profiles of existing rearranged antibody sequences. In this process, which is described in more detail in the Examples section, the frequency of occurrence of a particular amino acid residue at a particular position within rearranged CDRH3s (or any other heavy or light chain region) is determined. Amino acids that are used more frequently in nature may then be chosen for inclusion in a library of the invention.
- In certain embodiments of the invention, the libraries contain CDRH3 regions comprising one or more segments designed based on the IGHD gene germline repertoire. In some embodiments of the invention, DH segments selected for inclusion in the library are selected and designed based on the most frequent usage of human IGHD genes, and progressive N-terminal and C-terminal deletions thereof, to mimic the in vivo processing of the IGHD gene segments. In some embodiments of the invention, the DH segments of the library are about 3 to about 10 amino acids in length. In some embodiments of the invention, the DH segments of the library are about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids in length, or a combination thereof. In certain embodiments, the libraries of the invention may contain DH segments with a wide distribution of lengths (e.g., about 0 to about 10 amino acids). In other embodiments, the length distribution of the DH may be restricted (e.g., about 1 to about 5 amino acids, about 3 amino acids, about 3 and about 5 amino acids, and so on). In certain embodiments of the library, the shortest DH segments may be about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids.
- In certain embodiments of the invention, libraries may contain DH segments representative of any reading frame of any IGHD germline sequence. In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, or their derivatives (i.e., any reading frame and any degree of N-terminal and C-terminal truncation): IGHD3-10 (SEQ ID NOs: 1-3), IGHD3-22 (SEQ ID NOs: 239, 4, 240), IGHD6-19 (SEQ ID NOs: 5, 6, 241), IGHD6-13 (SEQ ID NOs: 7, 8, 242), IGHD3-3 (SEQ ID NOs: 243, 244, 9), IGHD2-2 (SEQ ID NOs: 245, 10, 11), IGHD4-17 (SEQ ID NOs: 246, 12, 247), IGHD1-26 (SEQ ID NOs: 13, 248 and 14), IGHD5-5/5-18 (SEQ ID NOs: 249, 250, 15), IGHD2-15 (SEQ ID NOs: 251, 16, 252), IGHD6-6 (encoded by SEQ ID NO: 515), IGHD3-9 (encoded by SEQ ID NO: 509), IGHD5-12 (encoded by SEQ ID NO: 512), IGHD5-24 (encoded by SEQ ID NO: 513), IGHD2-21 (encoded by SEQ ID NOs: 505 and 506), IGHD3-16 (encoded by SEQ ID NO: 508), IGHD4-23 (encoded by SEQ ID NO: 510), IGHD1-1 (encoded by SEQ ID NO: 501), IGHD1-7 (encoded by SEQ ID NO: 504), IGHD4-4/4-11 (encoded by SEQ ID NO: 511), IGHD1-20 (encoded by SEQ ID NO: 503), IGHD7-27, IGHD2-8, and IGHD6-25. In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- For the purposes of illustration, progressive N-terminal and C-terminal deletions of IGHD3-10, reading frame 1, are enumerated in Table 1. N-terminal and C-terminal deletions of other IGHD sequences and reading frames are also encompassed by the invention, and one of ordinary skill in the art can readily determine these sequences using, for example, the non-limiting exemplary data presented in Table 16 and/or the methods outlined above. Table 18 (Example 5) enumerates certain DH segments used in certain embodiments of the invention.
TABLE 1
Example of Progressive N- and C-terminal Deletions of Reading Frame 1 for Gene IGHD3-10, Yielding DH Segments

| DH | SEQ ID NO: | DH | SEQ ID NO: |
|---|---|---|---|
| VLLWFGELL | 1 | LWFGEL | 604 |
| VLLWFGEL | 593 | LWFGE | 605 |
| VLLWFGE | 594 | LWFG | 606 |
| VLLWFG | 595 | LWF | |
| VLLWF | 596 | WFGELL | 607 |
| VLLW | 597 | WFGEL | 608 |
| VLL | | WFGE | 609 |
| LLWFGELL | 598 | WFG | |
| LLWFGEL | 599 | FGELL | 610 |
| LLWFGE | 600 | FGEL | 611 |
| LLWFG | 601 | FGE | |
| LLWF | 602 | GELL | 612 |
| LLW | | GEL | |
| LWFGELL | 603 | ELL | |

- In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, or their derivatives (i.e., any reading frame and any degree of N-terminal and C-terminal truncation): IGHD3-10 (SEQ ID NOs: 1-3), IGHD3-22 (SEQ ID NOs: 239, 4, 240), IGHD6-19 (SEQ ID NOs: 5, 6, 241), IGHD6-13 (SEQ ID NOs: 7, 8, 242), IGHD3-3 (SEQ ID NOs: 243, 244, 9), IGHD2-2 (SEQ ID NOs: 245, 10, 11), IGHD4-17 (SEQ ID NOs: 246, 12, 247), IGHD1-26 (SEQ ID NOs: 13, 248 and 14), IGHD5-5/5-18 (SEQ ID NOs: 249, 250, 15), and IGHD2-15 (SEQ ID NOs: 251, 16, 252). In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
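The deletion series in Table 1 can be regenerated mechanically. The sketch below enumerates every contiguous segment that results from trimming residues off either terminus. The minimum length of 3 is an assumption taken from Table 1, whose shortest entries are tripeptides, and it can be adjusted. Parents that contain repeats (for example, YYYDSSGYYY) can produce the same segment twice, so duplicates are removed to keep the output non-redundant. The function name is hypothetical.

```python
from typing import List


def progressive_deletions(parent: str, min_len: int = 3) -> List[str]:
    """Return all N- and/or C-terminally truncated segments of `parent`.

    Segments shorter than `min_len` are discarded. Output is ordered by the
    number of N-terminal residues removed, then by decreasing length (the
    column order of Table 1). Repeated segments are kept only once.
    """
    n = len(parent)
    segments = []
    for start in range(n):                  # residues trimmed from the N-terminus
        for end in range(n, start, -1):     # slice end after C-terminal trimming
            if end - start >= min_len:
                segments.append(parent[start:end])
    return list(dict.fromkeys(segments))    # de-duplicate, preserving order


# IGHD3-10, reading frame 1 (SEQ ID NO: 1)
table1 = progressive_deletions("VLLWFGELL")
print(len(table1))   # 28
print(table1[:4])    # ['VLLWFGELL', 'VLLWFGEL', 'VLLWFGE', 'VLLWFG']
```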
- In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, wherein the notation “_x” denotes the reading frame of the gene, or their derivatives (i.e., any degree of N-terminal or C-terminal truncation): IGHD1-26_1 (SEQ ID NO: 13), IGHD1-26_3 (SEQ ID NO: 14), IGHD2-2_2 (SEQ ID NO: 10), IGHD2-2_3 (SEQ ID NO: 11), IGHD2-15_2 (SEQ ID NO: 16), IGHD3-3_3 (SEQ ID NO: 9), IGHD3-10_1 (SEQ ID NO: 1), IGHD3-10_2 (SEQ ID NO: 2), IGHD3-10_3 (SEQ ID NO: 3), IGHD3-22_2 (SEQ ID NO: 4), IGHD4-17_2 (SEQ ID NO: 12), IGHD5-5_3 (SEQ ID NO: 15), IGHD6-13_1 (SEQ ID NO: 7), IGHD6-13_2 (SEQ ID NO: 8), IGHD6-19_1 (SEQ ID NO: 5), and IGHD6-19_2 (SEQ ID NO: 6). In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
- In certain embodiments of the invention, the libraries are designed to reflect a pre-determined length distribution of N- and C-terminal deleted IGHD segments. For example, in certain embodiments, the DH segments of the library may be designed to mimic the natural length distribution of DH segments found in the human repertoire. Likewise, the library may be designed to reflect the relative occurrence of different IGHD segments in rearranged human antibody heavy chain domains, as reported by Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety). Table 2 shows the relative occurrence of the top 68% of IGHD segments from Lee et al.
TABLE 2
Relative Occurrence of Top 68% of IGHD Gene Usage from Lee et al.

| IGHD Reading Frame | Sequence (Parent) | SEQ ID NO: | Relative Occurrence |
|---|---|---|---|
| IGHD3-10_1 | VLLWFGELL | 1 | 4.3% |
| IGHD3-10_2 | YYYGSGSYYN | 2 | 8.4% |
| IGHD3-10_3 | ITMVRGVII | 3 | 4.0% |
| IGHD3-22_2 | YYYDSSGYYY | 4 | 15.6% |
| IGHD6-19_1 | GYSSGWY | 5 | 7.4% |
| IGHD6-19_2 | GIAVAG | 6 | 6.0% |
| IGHD6-13_1 | GYSSSWY | 7 | 8.4% |
| IGHD6-13_2 | GIAAAG | 8 | 5.3% |
| IGHD3-3_3 | ITIFGVVII | 9 | 7.4% |
| IGHD2-2_2 | GYCSSTSCYT | 10 | 5.2% |
| IGHD2-2_3 | DIVVVPAAM | 11 | 4.1% |
| IGHD4-17_2 | DYGDY | 12 | 6.8% |
| IGHD1-26_1 | GIVGATT | 13 | 2.9% |
| IGHD1-26_3 | YSGSYY | 14 | 4.3% |
| IGHD5-5_3 | GYSYGY | 15 | 4.3% |
| IGHD2-15_2 | GYCSGGSCYS | 16 | 5.6% |

- In certain embodiments, these relative occurrences may be used to design a library with DH prevalence that is similar to the IGHD usage found in peripheral blood. In other embodiments of the invention, it may be preferable to bias the library toward longer or shorter DH segments, or DH segments of a particular composition. In other embodiments, it may be desirable to use all DH segments selected for the library in equal proportion.
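One possible way to turn the Table 2 relative occurrences into a library composition is to divide a fixed number of library members among the parent reading frames in proportion to those values. The sketch below uses largest-remainder rounding so that the integer counts always add up to the requested total. The function name, the rounding rule, and the library sizes are illustrative assumptions. The sketch does not address how members would be further divided among deletion derivatives. Calling the same function with equal weights produces the equal-proportion alternative.

```python
from typing import Dict

# Relative occurrences from Table 2 (percent), keyed by IGHD gene_reading frame.
TABLE2: Dict[str, float] = {
    "IGHD3-10_1": 4.3, "IGHD3-10_2": 8.4, "IGHD3-10_3": 4.0, "IGHD3-22_2": 15.6,
    "IGHD6-19_1": 7.4, "IGHD6-19_2": 6.0, "IGHD6-13_1": 8.4, "IGHD6-13_2": 5.3,
    "IGHD3-3_3": 7.4, "IGHD2-2_2": 5.2, "IGHD2-2_3": 4.1, "IGHD4-17_2": 6.8,
    "IGHD1-26_1": 2.9, "IGHD1-26_3": 4.3, "IGHD5-5_3": 4.3, "IGHD2-15_2": 5.6,
}


def allocate(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` library members across keys in proportion to `weights`.

    Uses largest-remainder rounding so the integer counts sum exactly to `total`.
    """
    norm = sum(weights.values())
    exact = {k: total * w / norm for k, w in weights.items()}
    counts = {k: int(v) for k, v in exact.items()}           # round down first
    shortfall = total - sum(counts.values())
    # Hand the leftover members to the keys with the largest fractional parts.
    leftovers = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in leftovers[:shortfall]:
        counts[k] += 1
    return counts


natural = allocate(TABLE2, 1000)                   # mimic peripheral-blood usage
uniform = allocate({k: 1.0 for k in TABLE2}, 160)  # equal-proportion alternative
print(natural["IGHD3-22_2"], uniform["IGHD3-22_2"])  # 156 10
```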
- In certain embodiments of the invention, the most commonly used reading frames of the ten most frequently occurring IGHD sequences are utilized, and progressive N-terminal and C-terminal deletions of these sequences are made, thus providing a total of 278 non-redundant DH segments that are used to create a CDRH3 repertoire of the instant invention (Table 18). In some embodiments of the invention, the methods described above can be applied to produce libraries comprising the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 expressed IGHD sequences, and progressive N-terminal and C-terminal deletions thereof. As with all other components of the library, while the DH segments may be selected from among those that are commonly expressed, it is also within the scope of the invention to select these gene segments based on the fact that they are less commonly expressed. This may be advantageous, for example, in obtaining antibodies toward self-antigens or in further expanding the diversity of the library. Alternatively, DH segments can be used to add compositional diversity in a manner that is strictly proportional to their occurrence in actual human heavy chain sequences.
- In certain embodiments of the invention, the progressive deletion of IGHD genes containing disulfide loop encoding segments may be limited, so as to leave the loop intact and to avoid the presence of unpaired cysteine residues. In other embodiments of the invention, the presence of the loop can be ignored and the progressive deletion of the IGHD gene segments can occur as for any other segments, regardless of the presence of unpaired cysteine residues. In still other embodiments of the invention, the cysteine residues can be mutated to any other amino acid.
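The first and third options above can be sketched using IGHD2-2, reading frame 2 (GYCSSTSCYT, SEQ ID NO: 10), which contains a single cysteine pair. This sketch interprets "leave the loop intact" strictly. It keeps only segments that contain the entire stretch from the first Cys to the last Cys of the parent. A looser rule that rejects only segments containing a single cysteine would also allow cysteine-free fragments from inside the loop. Using serine as the replacement residue is an arbitrary illustrative choice, because the text permits any other amino acid. All function names are hypothetical.

```python
from typing import List


def progressive_deletions(parent: str, min_len: int = 3) -> List[str]:
    """All N- and/or C-terminally truncated segments of `parent` (non-redundant)."""
    n = len(parent)
    segments = [parent[i:j] for i in range(n) for j in range(n, i, -1)
                if j - i >= min_len]
    return list(dict.fromkeys(segments))


def keep_loop_intact(parent: str, segments: List[str]) -> List[str]:
    """Keep only segments that still contain the whole Cys-to-Cys loop.

    Assumes `parent` has exactly one disulfide-forming Cys pair.
    """
    first, last = parent.index("C"), parent.rindex("C")
    loop = parent[first:last + 1]          # "CSSTSC" for GYCSSTSCYT
    return [s for s in segments if loop in s]


def mutate_cysteines(segments: List[str], replacement: str = "S") -> List[str]:
    """Alternative: substitute every Cys and collapse segments that become identical."""
    return list(dict.fromkeys(s.replace("C", replacement) for s in segments))


parent = "GYCSSTSCYT"                      # IGHD2-2, reading frame 2
segments = progressive_deletions(parent)
kept = keep_loop_intact(parent, segments)
print(len(segments), len(kept))            # 36 9
```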
- There are six IGHJ (joining) segments, IGHJ1 (SEQ ID NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: 255), IGHJ4 (SEQ ID NO: 256), IGHJ5 (SEQ ID NO: 257), and IGHJ6 (SEQ ID NO: 258). The amino acid sequences of the parent segments and the progressive N-terminal deletions are presented in Table 20 (Example 5). Similar to the N- and C-terminal deletions that the IGHD genes undergo, natural variation is introduced into the IGHJ genes by N-terminal “nibbling”, or progressive deletion, of one or more codons by exonuclease activity.
- The H3-JH segment refers to the portion of the IGHJ segment that is part of CDRH3. In certain embodiments of the invention, the H3-JH segment of a library comprises one or more of the following sequences: AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), MDV, and DV. In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
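N-terminal nibbling of the IGHJ-derived portion of CDRH3 can be sketched as shown below. The parents are the six full-length H3-JH sequences listed above. If the nibbled forms are pooled and de-duplicated, the result should match the distinct sequences in that list, since DV and V come from both IGHJ3 and IGHJ6. The 1-residue minimum is an assumption, because the text also allows a 0-residue H3-JH. The helper names are hypothetical.

```python
from typing import List

# Full-length H3-JH parents listed above (IGHJ1 to IGHJ6).
H3_JH_PARENTS = ["AEYFQH", "YWYFDL", "AFDV", "YFDY", "NWFDS", "YYYYYGMDV"]


def n_terminal_nibbles(h3_jh: str, min_len: int = 1) -> List[str]:
    """Progressively remove residues from the N-terminus only."""
    return [h3_jh[i:] for i in range(len(h3_jh)) if len(h3_jh) - i >= min_len]


def h3_jh_pool(parents: List[str], min_len: int = 1) -> List[str]:
    """Non-redundant union of nibbled segments over all parents."""
    pool: List[str] = []
    for parent in parents:
        pool.extend(n_terminal_nibbles(parent, min_len))
    return list(dict.fromkeys(pool))


print(n_terminal_nibbles("AEYFQH"))    # ['AEYFQH', 'EYFQH', 'YFQH', 'FQH', 'QH', 'H']
print(len(h3_jh_pool(H3_JH_PARENTS)))  # 32
```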
- In other embodiments of the invention, the H3-JH segment may comprise about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or more amino acids. For example, the H3-JH segment of JH1_4 (Table 20) has a length of three residues, while non-deleted JH6 has an H3-JH segment length of nine residues. The FRM4-JH region of the IGHJ segment begins with the sequence WG(Q/R)G (SEQ ID NO: 23) and corresponds to the portion of the IGHJ segment that makes up part of framework 4. In certain embodiments of the invention, as enumerated in Table 20, there are 28 H3-JH segments that are included in a library. In certain other embodiments, libraries may be produced by utilizing about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 of the IGHJ segments enumerated above or in Table 20.
- Terminal deoxynucleotidyl transferase (TdT) is a highly conserved vertebrate enzyme that catalyzes the addition of deoxynucleotides, supplied as deoxynucleoside 5′-triphosphates, to the 3′-hydroxyl terminus of single- or double-stranded DNA. Hence, the enzyme acts as a template-independent polymerase (Koiwai et al., Nucleic Acids Res., 1986, 14: 5777; Basu et al., Biochem. Biophys. Res. Comm., 1983, 111: 1105, each incorporated by reference in its entirety). In vivo, TdT is responsible for the addition of nucleotides to the V-D and D-J junctions of antibody heavy chains (Alt and Baltimore, PNAS, 1982, 79: 4118; Collins et al., J. Immunol., 2004, 172: 340, each incorporated by reference in its entirety). Specifically, TdT is responsible for creating the N1 and N2 (non-templated) segments that flank the D (diversity) region.
- In certain embodiments of the invention, the length and composition of the N1 and N2 segments are designed rationally, according to statistical biases in amino acid usage found in naturally occurring N1 and N2 segments in human antibodies. One embodiment of a library produced via this method is described in Example 5. According to data compiled from human databases (Jackson et al., J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety), there are on average 3.02 amino acid insertions for N1 and 2.4 amino acid insertions for N2, not taking into account insertions of two nucleotides or less (FIG. 2). In certain embodiments of the invention, N1 and N2 segments are restricted to lengths of zero to three amino acids. In other embodiments of the invention, N1 and N2 may be restricted to lengths of less than about 4, 5, 6, 7, 8, 9, or 10 amino acids.
- In some embodiments of the invention, the composition of these sequences may be chosen according to the frequency of occurrence of particular amino acids in the N1 and N2 sequences of natural human antibodies (for examples of this analysis, see Tables 21 to 23 in Example 5). In certain embodiments of the invention, the eight most commonly occurring amino acids in these regions (i.e., G, R, S, P, L, A, T, and V) are used to design the synthetic N1 and N2 segments. In other embodiments of the invention, about the 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 most commonly occurring amino acids may be used in the design of the synthetic N1 and N2 segments. In still other embodiments, all 20 amino acids may be used in these segments. Finally, while it is possible to base the designed composition of the N1 and N2 segments of the invention on the composition of naturally occurring N1 and N2 segments, this is not a requirement. The N1 and N2 segments may comprise amino acids selected from any group of amino acids, or designed according to other criteria considered for the design of a library of the invention. A person of ordinary skill in the art would readily recognize that the criteria used to design any portion of a library of the invention may vary depending on the application of the particular library. It is also contemplated that a functional library may be produced using N1 and N2 segments selected from any group of amino acids, using no N1 or N2 segments, or using N1 and N2 segments with compositions other than those described herein.
- One important difference between the libraries of the current invention and other libraries known in the art is the consideration of the composition of naturally occurring duplet and triplet amino acid sequences during the design of the library. Table 23 shows the top twenty-five naturally occurring duplets in the N1 and N2 regions. Many of these can be represented by the general formula (G/P)(G/R/S/P/L/A/V/T) or (R/S/L/A/V/T)(G/P). In certain embodiments of the invention, the synthetic N1 and N2 regions may comprise all of these duplets. In other embodiments, the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most common naturally occurring N1 and/or N2 duplets. In other embodiments of the invention, the libraries may include duplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.
- Finally, the data from the naturally occurring triplet N1 and N2 regions demonstrate that the naturally occurring N1 and N2 triplet sequences can often be represented by the formulas (G)(G)(G/R/S/P/L/A/V/T), (G)(R/S/P/L/A/V/T)(G), or (R/S/P/L/A/V/T)(G)(G). In certain embodiments of the invention, the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most commonly occurring N1 and/or N2 triplets. In other embodiments of the invention, the libraries may include triplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.
- In certain embodiments of the invention, there are about 59 total N1 segments and about 59 total N2 segments used to create a library of CDRH3s. In other embodiments of the invention, the number of N1 segments, N2 segments, or both is increased to about 141 (see, for example, Example 5). In other embodiments of the invention, one may select a total of about 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 1000, 10⁴, or more N1 and/or N2 segments for inclusion in a library of the invention.
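The duplet and triplet formulas above can be expanded into explicit sequences. The following is a minimal sketch. It rests on an assumption that does not come from the text: an N pool is taken to consist of the empty segment, each of the eight residues on its own, every duplet matching the two duplet formulas, and every triplet matching the three triplet formulas. With that assumption, the pool holds 1 + 8 + 28 + 22 = 59 sequences, which matches the figure of about 59 N1 or N2 segments given above. The text does not say whether the 59 segments of the described embodiments were selected in this way. `expand` and `n_segment_pool` are hypothetical names.

```python
from itertools import product
from typing import List

N_RESIDUES = "GRSPLATV"   # the eight residues named in the text
NON_G = "RSPLATV"         # the same eight without glycine


def expand(*positions: str) -> List[str]:
    """Expand a position-wise degenerate pattern, e.g. expand("GP", "GRSPLATV")."""
    return ["".join(combo) for combo in product(*positions)]


def n_segment_pool() -> List[str]:
    pool = [""]                              # no N insertion (length 0)
    pool += expand(N_RESIDUES)               # single residues
    pool += expand("GP", N_RESIDUES)         # (G/P)(G/R/S/P/L/A/V/T)
    pool += expand("RSLAVT", "GP")           # (R/S/L/A/V/T)(G/P)
    pool += expand("G", "G", N_RESIDUES)     # (G)(G)(G/R/S/P/L/A/V/T)
    pool += expand("G", NON_G, "G")          # (G)(R/S/P/L/A/V/T)(G)
    pool += expand(NON_G, "G", "G")          # (R/S/P/L/A/V/T)(G)(G)
    return list(dict.fromkeys(pool))         # remove any duplicates, keep order


pool = n_segment_pool()
print(len(pool))                                            # 59
print([sum(len(s) == n for s in pool) for n in range(4)])   # [1, 8, 28, 22]
```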
- One of ordinary skill in the art will readily recognize that, given the teachings of the instant specification, it is well within the realm of normal experimentation to extend the analysis detailed herein, for example, to generate additional rankings of naturally occurring duplet and triplet (or higher order) N regions that extend beyond those presented herein (e.g., using sequence alignment, the SoDA algorithm (Volpe et al., Bioinformatics, 2006, 22: 438-44, incorporated by reference in its entirety), and any database of human sequences). An ordinarily skilled artisan would also recognize that, based on the information taught herein, it is now possible to produce libraries that are more diverse or less diverse (i.e., more focused) by varying the number of distinct amino acid sequences used in the N1 pool and/or N2 pool.
- As described above, many alternative embodiments are envisioned, in which the compositions and lengths of the N1 and N2 segments vary from those presented in the Examples herein. In some embodiments, sub-stoichiometric synthesis of trinucleotides may be used for the synthesis of N1 and N2 segments. Sub-stoichiometric synthesis with trinucleotides is described in Knappik et al. (U.S. Pat. No. 6,300,064, incorporated by reference in its entirety). The use of sub-stoichiometric synthesis would enable synthesis with consideration of the length variation in the N1 and N2 sequences.
- In addition to the embodiments described above, a model of the activity of TdT may also be used to determine the composition of the N1 and N2 sequences in a library of the invention. For example, it has been proposed that the probability that TdT adds a particular nucleotide (A, C, G, or T) to a polynucleotide depends both on the identity of that nucleotide and on the base directly preceding it on the strand. Jackson et al. (J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety) have constructed a Markov model describing this process. In certain embodiments of the invention, this model may be used to determine the composition of the N1 and/or N2 segments used in libraries of the invention. Alternatively, the parameters presented in Jackson et al. could be further refined to produce sequences that more closely mimic human sequences.
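The Markov description above can be sketched as a first-order chain in which each nucleotide added by TdT is drawn with a probability that depends only on the preceding base. This is a minimal illustration, assuming a 4×4 transition table; the numbers in `TRANSITIONS` are hypothetical placeholders, not the parameters fitted by Jackson et al., and the function names are illustrative.

```python
import random

BASES = "ACGT"

# HYPOTHETICAL transition table P(next base | previous base).
# Placeholder values only; NOT the parameters published by Jackson et al. (2007).
TRANSITIONS = {
    "A": {"A": 0.25, "C": 0.25, "G": 0.35, "T": 0.15},
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "G": {"A": 0.20, "C": 0.30, "G": 0.35, "T": 0.15},
    "T": {"A": 0.20, "C": 0.30, "G": 0.35, "T": 0.15},
}
# Hypothetical distribution for the first added base.
UNIFORM_START = {b: 0.25 for b in BASES}


def sample_n_region(length, start=UNIFORM_START, transitions=TRANSITIONS, rng=None):
    """Draw an N region of `length` nucleotides; every base after the first
    depends only on the base immediately preceding it (first-order Markov)."""
    rng = rng or random.Random()
    seq = []
    for _ in range(length):
        probs = start if not seq else transitions[seq[-1]]
        seq.append(rng.choices(BASES, weights=[probs[b] for b in BASES])[0])
    return "".join(seq)


def sequence_probability(seq, start=UNIFORM_START, transitions=TRANSITIONS):
    """Probability of a specific non-empty N region under the same chain."""
    p = start[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= transitions[prev][nxt]
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_n_region(6, rng=rng) for _ in range(3)])
    print(sequence_probability("GGC"))  # 0.25 * 0.35 * 0.30
```

Translating sampled N regions into amino acids would give candidate N1 or N2 compositions; refining the placeholder table against human sequences corresponds to the refinement step mentioned in the text.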
- The CDRH3 libraries of the invention comprise an initial amino acid (in certain exemplary embodiments, G, D, or E), or lack thereof (designated herein as position 95), followed by the N1, DH, N2, and H3-JH segments. Thus, in certain embodiments of the invention, the overall design of the CDRH3 libraries can be represented by the following formula:
[G/D/E/-]-[N1]-[DH]-[N2]-[H3-JH].
- While the compositions of each portion of a CDRH3 of a library of the invention are more fully described above, the composition of the tail presented above (G/D/E/-) is non-limiting, and any amino acid (or no amino acid) can be used in this position. Thus, certain embodiments of the invention may be represented by the following formula:
[X]-[N1]-[DH]-[N2]-[H3-JH],
wherein [X] is any amino acid residue or no residue.
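Because each position in the formula is filled independently, the theoretical library size is the product of the pool sizes, and the CDRH3 length distribution follows from the segment lengths. A minimal sketch, assuming tiny hypothetical segment pools (the sequences below are placeholders chosen for illustration, not the pools described in the Examples):

```python
from collections import Counter
from itertools import product

# HYPOTHETICAL miniature segment pools, for illustration only.
# "" stands for "no residue" (position 95 absent, or an empty N region).
POS95 = ["G", "D", "E", ""]
N1 = ["", "R", "GS"]
DH = ["YYDSSGYY", "GYSSGW"]
N2 = ["", "P", "LG"]
H3_JH = ["AEYFQH", "YFDY"]


def enumerate_cdrh3(pos95, n1, dh, n2, h3_jh):
    """Yield every CDRH3 of the form [X]-[N1]-[DH]-[N2]-[H3-JH]."""
    for parts in product(pos95, n1, dh, n2, h3_jh):
        yield "".join(parts)


library = list(enumerate_cdrh3(POS95, N1, DH, N2, H3_JH))
theoretical_size = len(POS95) * len(N1) * len(DH) * len(N2) * len(H3_JH)
length_counts = Counter(len(s) for s in library)  # CDRH3 length -> member count

if __name__ == "__main__":
    print(theoretical_size, len(set(library)))
    print(sorted(length_counts.items()))
```

The number of distinct peptides can be smaller than the product when different segment combinations happen to spell the same sequence, which is why the sketch reports both figures.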
- In certain embodiments of the invention, a synthetic CDRH3 repertoire is combined with selected VH chassis sequences and heavy chain constant regions via homologous recombination. Therefore, in certain embodiments of the invention, it may be necessary to include DNA sequences flanking the 5′ and 3′ ends of the synthetic CDRH3 libraries to facilitate homologous recombination between the synthetic CDRH3 libraries and vectors containing the selected chassis and constant regions. In certain embodiments, the vectors also contain a sequence encoding at least a portion of the non-nibbled region of the IGHJ gene (i.e., FRM4-JH). Thus, a polynucleotide encoding an N-terminal sequence (e.g., CA(K/R/T)) may be added to the synthetic CDRH3 sequences, wherein the N-terminal polynucleotide is homologous with FRM3 of the chassis, while a polynucleotide encoding a C-terminal sequence (e.g., WG(Q/R)G; SEQ ID NO: 23) may be added to the synthetic CDRH3, wherein the C-terminal polynucleotide is homologous with FRM4-JH. Although the sequence WG(Q/R)G (SEQ ID NO: 23) is presented in this exemplary embodiment, additional amino acids C-terminal to this sequence in FRM4-JH may also be included in the polynucleotide encoding the C-terminal sequence. The purpose of the polynucleotides encoding the N-terminal and C-terminal sequences, in this case, is to facilitate homologous recombination, and one of ordinary skill in the art would recognize that these sequences may be longer or shorter than depicted below. Accordingly, in certain embodiments of the invention, the overall design of the CDRH3 repertoire, including the sequences required to facilitate homologous recombination with the selected chassis, can be represented by the following formula, in which the CA[R/K/T] and [WG(Q/R)G] regions are homologous with the vector:
CA[R/K/T]-[X]-[N1]-[DH]-[N2]-[H3-JH]-[WG(Q/R)G].
- In other embodiments of the invention, the CDRH3 repertoire can be represented by the following formula, which excludes the T residue presented in the schematic above:
CA[R/K]-[X]-[N1]-[DH]-[N2]-[H3-JH]-[WG(Q/R)G].
- References describing collections of V, D, and J genes include Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 243 and Ruiz et al., Exp. Clin. Immunogenet., 1999, 16: 173, each incorporated by reference in its entirety.
- As described throughout this application, in addition to accounting for the composition of naturally occurring CDRH3 segments, the instant invention also takes into account the length distribution of naturally occurring CDRH3 segments. Surveys by Zemlin et al. (JMB, 2003, 334: 733, incorporated by reference in its entirety) and Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety) provide analyses of the naturally occurring CDRH3 lengths. These data show that about 95% of naturally occurring CDRH3 sequences have a length from about 7 to about 23 amino acids. In certain embodiments, the instant invention provides rationally designed antibody libraries with CDRH3 segments which directly mimic the size distribution of naturally occurring CDRH3 sequences. In certain embodiments of the invention, the length of the CDRH3s may be about 2 to about 30, about 3 to about 35, about 7 to about 23, about 3 to about 28, about 5 to about 28, about 5 to about 26, about 5 to about 24, about 7 to about 24, about 7 to about 22, about 8 to about 19, about 9 to about 22, about 9 to about 20, about 10 to about 18, about 11 to about 20, about 11 to about 18, about 13 to about 18, or about 13 to about 16 residues in length.
- In certain embodiments of the invention, the length distribution of a CDRH3 library of the invention may be defined based on the percentage of sequences within a certain length range. For example, in certain embodiments of the invention, CDRH3s with a length of about 10 to about 18 amino acid residues comprise about 84% to about 94% of the sequences of the library. In some embodiments, sequences within this length range comprise about 89% of the sequences of a library.
- In other embodiments of the invention, CDRH3s with a length of about 11 to about 17 amino acid residues comprise about 74% to about 84% of the sequences of a library. In some embodiments, sequences within this length range comprise about 79% of the sequences of a library.
- In still other embodiments of the invention, CDRH3s with a length of about 12 to about 16 residues comprise about 57% to about 67% of the sequences of a library. In some embodiments, sequences within this length range comprise about 62% of the sequences of a library.
- In certain embodiments of the invention, CDRH3s with a length of about 13 to about 15 residues comprise about 35% to about 45% of the sequences of a library. In some embodiments, sequences within this length range comprise about 40% of the sequences of a library.
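The length-window criteria above can be checked against any sample of CDRH3 lengths by computing the fraction that falls inside each window. A minimal sketch, assuming the "about" bounds are treated as exact inclusive limits (a simplification); the function names are illustrative:

```python
# Target windows from the embodiments above:
# (min length, max length, lower fraction, upper fraction), bounds inclusive.
# Treating "about" as exact bounds is a simplifying assumption.
WINDOWS = [
    (10, 18, 0.84, 0.94),
    (11, 17, 0.74, 0.84),
    (12, 16, 0.57, 0.67),
    (13, 15, 0.35, 0.45),
]


def fraction_in_range(lengths, low, high):
    """Fraction of CDRH3 lengths falling in [low, high]."""
    if not lengths:
        raise ValueError("no CDRH3 lengths supplied")
    return sum(low <= n <= high for n in lengths) / len(lengths)


def check_length_windows(lengths, windows=WINDOWS):
    """Map each (low, high) window to (observed fraction, within target range?)."""
    report = {}
    for low, high, f_min, f_max in windows:
        f = fraction_in_range(lengths, low, high)
        report[(low, high)] = (f, f_min <= f <= f_max)
    return report
```

Applied to the CDRH3 lengths from a sequenced sample of a library, `check_length_windows` indicates which of the four length-distribution embodiments the sample satisfies.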
- The CDRL3 libraries of the invention can be generated by one of several approaches. The actual version of the CDRL3 library made and used in a particular embodiment of the invention will depend on the objectives for the use of the library. More than one CDRL3 library may be used in a particular embodiment; for example, a library containing CDRH3 diversity combined with both kappa and lambda light chains is within the scope of the invention.
- In certain embodiments of the invention, a CDRL3 library is a VKCDR3 (kappa) library and/or a VλCDR3 (lambda) library. The CDRL3 libraries described herein differ significantly from CDRL3 libraries in the art. First, they consider length variation that is consistent with what is observed in actual human sequences. Second, they take into consideration the fact that a significant portion of the CDRL3 is encoded by the IGLV gene. Third, the patterns of amino acid variation within the IGLV gene-encoded CDRL3 portions are not stochastic and are selected depending on the identity of the IGLV gene. Taken together, the second and third distinctions mean that CDRL3 libraries that faithfully mimic observed patterns in human sequences cannot use a generic design that is independent of the chassis sequences in FRM1 to FRM3. Fourth, the contribution of JL to CDRL3 is also considered explicitly, and enumeration of each amino acid residue at the relevant positions is based on the compositions and natural variations of the JL genes themselves.
- As indicated above, and throughout the application, a unique aspect of the design of the libraries of the invention is the germline or "chassis-based" aspect, which is meant to preserve more of the integrity and variability of actual human sequences. This is in contrast to other codon-based synthesis or degenerate oligonucleotide synthesis approaches that have been described in the literature and that aim to produce "one-size-fits-all" (e.g., consensus) libraries (e.g., Knappik et al., J Mol Biol, 2000, 296: 57; Akamatsu et al., J Immunol, 1993, 151: 4651, each incorporated by reference in its entirety).
- In certain embodiments of the invention, patterns of occurrence of particular amino acids at defined positions within VL sequences are determined by analyzing data available in public or other databases, for example, the NCBI database (see, for example, GI numbers in Appendices A and B filed herewith). In certain embodiments of the invention, these sequences are compared on the basis of identity and assigned to families on the basis of the germline genes from which they are derived. The amino acid composition at each position of the sequence, in each germline family, may then be determined. This process is illustrated in the Examples provided herein.
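The per-family tally described above amounts to counting residues column by column in alignments grouped by germline origin. A minimal sketch, assuming the sequences have already been assigned to germline families and aligned to a common length; the family label and sequences are hypothetical examples, not entries from Appendices A and B:

```python
from collections import Counter


def positional_composition(aligned_by_family):
    """Per-family, per-position amino acid fractions.

    aligned_by_family: {family_name: [aligned sequences of equal length]}.
    '-' is treated as an alignment gap and excluded from the counts.
    Returns {family_name: [{aa: fraction}, ... one dict per position]}.
    """
    profiles = {}
    for family, seqs in aligned_by_family.items():
        width = len(seqs[0])
        if any(len(s) != width for s in seqs):
            raise ValueError(f"{family}: sequences are not aligned to equal length")
        profile = []
        for i in range(width):
            counts = Counter(s[i] for s in seqs if s[i] != "-")
            total = sum(counts.values())
            profile.append({aa: c / total for aa, c in counts.items()} if total else {})
        profiles[family] = profile
    return profiles


# HYPOTHETICAL aligned VKCDR3 sequences assigned to one germline family.
example = {"IGKV1-39": ["QQSYSTPLT", "QQSYSTPRT", "QQSYSIPYT"]}
profile = positional_composition(example)["IGKV1-39"]
```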
- In certain embodiments of the invention, the light chain CDR3 library is a VKCDR3 library. Certain embodiments of the invention may use only the most common VKCDR3 length, nine residues; this length occurs in a dominant proportion (greater than about 70%) of human VKCDR3 sequences. In human VKCDR3 sequences of length nine, positions 89-95 are encoded by the IGKV gene and positions 96-97 are encoded by the IGKJ gene. Analysis of human kappa light chain sequences indicates that there are no strong biases in the usage of the IGKJ genes. Therefore, in certain embodiments of the invention, each of the five IGKJ genes can be represented in equal proportions to create a combinatorial library of (M VK chassis)×(5 JK genes), or a library of size M×5. However, in other embodiments of the invention, it may be desirable to bias IGKJ gene representation, for example to restrict the size of the library or to weight the library toward IGKJ genes known to have particular properties.
- As described in Example 6.1, examination of the first amino acid encoded by the IGKJ gene (position 96) indicated that the seven most common residues found at this position are L, Y, R, W, F, P, and I. These residues cumulatively account for about 85% of the residues found in position 96 in naturally occurring kappa light chain sequences. In certain embodiments of the invention, the amino acid residue at position 96 may be one of these seven residues. In other embodiments of the invention, the amino acid at this position may be chosen from amongst any of the other 13 amino acid residues. In still other embodiments of the invention, the amino acid residue at position 96 may be chosen from amongst the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at position 96, or even residues that never occur at position 96. Similarly, the occurrence of the amino acids selected to occupy position 96 may be equivalent or weighted. In certain embodiments of the invention, it may be desirable to include each of the amino acids selected for inclusion in position 96 at equivalent amounts. In other embodiments of the invention, it may be desirable to bias the composition of position 96 to include particular residues more or less frequently than others. For example, as presented in Example 6.1, arginine occurs at position 96 most frequently when the IGKJ1 (SEQ ID NO: 552) germline sequence is used. Therefore, in certain embodiments of the invention, it may be desirable to bias amino acid usage at position 96 according to the origin of the IGKJ germline sequence(s) and/or the IGKV germline sequence(s) selected for representation in a library.
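Biasing position 96 by germline origin can be expressed as a separate weight table per IGKJ gene. A minimal sketch, assuming hypothetical weights: only the residue set comes from the text, while the numbers (including the higher arginine weight for IGKJ1) are placeholders, not values from Example 6.1.

```python
import random

# HYPOTHETICAL position-96 weights keyed by IGKJ gene. Residues are the seven
# listed above; the numbers are placeholders, not data from Example 6.1.
POS96_WEIGHTS = {
    "IGKJ1": {"R": 0.30, "L": 0.15, "Y": 0.15, "W": 0.15, "F": 0.10, "P": 0.10, "I": 0.05},
    "IGKJ2": {aa: 1.0 for aa in "LYRWFPI"},  # equimolar alternative
}


def normalized(weights):
    """Rescale weights so they sum to 1."""
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def sample_pos96(jk_gene, n, rng=None):
    """Draw n position-96 residues for clones built on `jk_gene`."""
    rng = rng or random.Random()
    probs = normalized(POS96_WEIGHTS[jk_gene])
    residues = list(probs)
    return rng.choices(residues, weights=[probs[r] for r in residues], k=n)
```

The normalized table for a given IGKJ gene could equally serve as the target mixing ratio for synthesis rather than for sampling.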
- Therefore, in certain embodiments of the invention, a minimalist VKCDR3 library may be represented by one or more of the following amino acid sequences:
[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[JK*]
[VK_Chassis]-[L3-VK]-[X]-[JK*]
- In these schematic exemplary sequences, VK_Chassis represents any VK chassis selected for inclusion in a library of the invention (e.g., see Table 11). Specifically, VK_Chassis comprises about Kabat residues 1 to 88 of a selected IGKV sequence. L3-VK represents the portion of the VKCDR3 encoded by the chosen IGKV gene (in this embodiment, Kabat residues 89-95). F, L, I, R, W, Y, and P are the seven most commonly occurring amino acids at position 96 of VKCDR3s with length nine, X is any amino acid, and JK* is an IGKJ amino acid sequence without the N-terminal residue (i.e., the N-terminal residue is substituted with F, L, I, R, W, Y, P, or X). Thus, in one possible embodiment of the minimalist VKCDR3 library, 70 members could be produced by utilizing 10 VK chassis, each paired with its respective L3-VK, 7 amino acids at position 96, and one JK* sequence. Another embodiment of the library may have 350 members, produced by combining 10 VK chassis, each paired with its respective L3-VK, with 7 amino acids at position 96 and all 5 JK* genes. Still another embodiment of the library may have 1,125 members, produced by combining 15 VK chassis, each paired with its respective L3-VK, with 15 amino acids at position 96 and all 5 JK* genes, and so on. A person of ordinary skill in the art will readily recognize that many other combinations are possible. Moreover, while it is believed that maintaining the pairing between the VK chassis and the L3-VK results in libraries that are more similar to human kappa light chain sequences in composition, the L3-VK regions may also be combinatorially varied with different VK chassis regions to create additional diversity.
- While the dominant length of VKCDR3 sequences in humans is about nine amino acids, other lengths appear at measurable frequencies that cumulatively account for almost 30% of VKCDR3 sequences. In particular, VKCDR3s of lengths 8, 10, and 11 occur at appreciable frequencies (FIG. 3). Thus, more complex VKCDR3 libraries may include CDR lengths of 8, 10, and 11 amino acids. Such libraries could account for a greater percentage of the length distribution observed in collections of human VKCDR3 sequences, or even introduce VKCDR3 lengths that do not occur frequently in human VKCDR3 sequences (e.g., less than eight residues or greater than 11 residues).
- The inclusion of a diversity of kappa light chain length variations in a library of the invention also enables one to include sequence variability that occurs outside of the amino acid at the VK-JK junction (i.e., position 96, described above). In certain embodiments of the invention, the patterns of sequence variation within the VK and/or JK segments can be determined by aligning collections of sequences derived from particular germline sequences. In certain embodiments of the invention, the frequency of occurrence of amino acid residues within VKCDR3 can be determined by sequence alignments (e.g., see Example 6.2 and Table 30). In some embodiments of the invention, this frequency of occurrence may be used to introduce variability into the VK_Chassis, L3-VK, and/or JK segments that are used to synthesize the VKCDR3 libraries. In certain embodiments of the invention, the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at any particular position in a naturally occurring repertoire may be included at that position in a VKCDR3 library of the invention. In certain embodiments of the invention, the percent occurrence of any amino acid at any particular position within the VKCDR3 or a VK light chain may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%.
In certain embodiments of the invention, the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.
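The minimalist VKCDR3 library sizes given above (70, 350, and 1,125 members) follow from multiplying the number of chassis/L3-VK pairs, position-96 residues, and JK* segments. A minimal sketch, assuming JK* is represented only by its CDR3 portion (Kabat position 97); the chassis name and sequences are hypothetical placeholders:

```python
from itertools import product


def minimalist_vkcdr3_size(n_chassis, n_pos96, n_jk):
    """Members of [VK_Chassis]-[L3-VK]-[pos 96]-[JK*] when each chassis keeps
    its own L3-VK, so the chassis/L3-VK pair contributes a single factor."""
    return n_chassis * n_pos96 * n_jk


def enumerate_vkcdr3(chassis_l3, pos96, jk_star):
    """chassis_l3: [(chassis_name, L3-VK)], kept paired.
    Yields (chassis_name, VKCDR3), where VKCDR3 = L3-VK + residue 96 + JK* CDR3 part."""
    for (name, l3_vk), aa, jk in product(chassis_l3, pos96, jk_star):
        yield name, l3_vk + aa + jk


# HYPOTHETICAL inputs: one chassis with a 7-residue L3-VK (Kabat 89-95) and a
# one-residue JK* CDR3 remnant (Kabat 97), giving length-9 VKCDR3s.
members = list(enumerate_vkcdr3([("VK-A", "QQSYSTP")], list("FLIRWYP"), ["T"]))
```

Varying the L3-VK combinatorially across chassis, as also contemplated above, would replace the paired list with a separate product over chassis and L3-VK sequences.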
- In some embodiments of the invention, a VKCDR3 library may be synthesized using degenerate oligonucleotides (see Table 31 for IUPAC base symbol definitions). In some embodiments of the invention, the limits of oligonucleotide synthesis and the genetic code may require the inclusion of more or fewer amino acids at a particular position in the VKCDR3 sequences. An illustrative embodiment of this approach is provided in Example 6.2.
- The limitations inherent in using the genetic code and degenerate oligonucleotide synthesis may, in some cases, require the inclusion of more or fewer amino acids at a particular position within VKCDR3 (e.g., Example 6.2, Table 32), in comparison to those amino acids found at that position in nature. This limitation can be overcome through the use of a codon-based synthesis approach (Virnekas et al., Nucleic Acids Res., 1994, 22: 5600, incorporated by reference in its entirety), which enables precise synthesis of oligonucleotides encoding particular amino acids and a finer degree of control over the proportion of any particular amino acid incorporated at any position. Example 6.3 describes this approach in greater detail.
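The constraint described above can be seen by expanding a degenerate codon into its concrete codons and translating each one. A minimal sketch using the standard genetic code and the standard IUPAC base symbols (assumed to match Table 31), and assuming equimolar incorporation of bases at each degenerate position:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AA_TABLE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AA_TABLE[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Standard IUPAC nucleotide symbols.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def encoded_amino_acids(degenerate_codon):
    """Return {amino_acid: fraction of codons} for a degenerate codon,
    assuming equimolar bases at each degenerate position ('*' = stop)."""
    codons = ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon.upper()))]
    counts = {}
    for codon in codons:
        aa = CODE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: n / len(codons) for aa, n in counts.items()}
```

For example, `encoded_amino_acids("CAS")` yields only H and Q, whereas NNK yields all 20 amino acids plus the TAG stop codon; a position whose desired residues share no compact degenerate codon therefore picks up extra residues, which is the limitation that trinucleotide (codon-based) synthesis avoids.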
- In some embodiments of the invention, a codon-based synthesis approach may be used to vary the percent occurrence of any amino acid at any particular position within the VKCDR3 or kappa light chain. In certain embodiments, the percent occurrence of any amino acid at any position in a VKCDR3 or kappa light chain sequence of the library may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In some embodiments of the invention, the percent occurrence of any amino acid at any position may be about 1%, 2%, 3%, or 4%. In certain embodiments of the invention, the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.
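One way to read the "within about X% of the percent occurrence" language is as a relative deviation, |designed − natural| / natural. The sketch below adopts that interpretation as an assumption (an absolute difference in percentage points would be an equally plausible reading) and checks a designed position profile against a natural one; all numbers are hypothetical.

```python
def relative_deviation_pct(designed, natural):
    """|designed - natural| as a percentage of natural (inputs in percent)."""
    if natural == 0:
        return 0.0 if designed == 0 else float("inf")
    return abs(designed - natural) * 100 / natural


def within_tolerance(designed_profile, natural_profile, tolerance_pct):
    """Profiles map position -> {amino_acid: percent occurrence}.
    A position passes if every residue present in either profile deviates
    from its natural value by no more than tolerance_pct (relative)."""
    result = {}
    for pos, natural in natural_profile.items():
        designed = designed_profile.get(pos, {})
        residues = set(natural) | set(designed)
        result[pos] = all(
            relative_deviation_pct(designed.get(aa, 0.0), natural.get(aa, 0.0)) <= tolerance_pct
            for aa in residues
        )
    return result


# HYPOTHETICAL percent occurrences at one VKCDR3 position.
natural = {96: {"L": 40.0, "R": 60.0}}
designed = {96: {"L": 38.0, "R": 62.0}}
```

With these numbers the L residue deviates by 5% of its natural value, so the position passes at a 5% tolerance but fails at 4%.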
- In certain embodiments of the invention, the VKCDR3 (and any other sequence used in the library, regardless of whether or not it is part of VKCDR3) may be altered to remove undesirable amino acid motifs. For example, peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, can undergo post-translational modification (N-linked glycosylation) in a number of expression systems, including yeast and mammalian cells. In certain embodiments of the invention, the introduction of N residues at certain positions may be avoided, so as to avoid the introduction of N-linked glycosylation sites. In some embodiments of the invention, these modifications may not be necessary, depending on the organism used to express the library and the culture conditions. However, even in the event that the organism used to express libraries with potential N-linked glycosylation sites is incapable of N-linked glycosylation (e.g., bacteria), it may still be desirable to avoid N-X-(S/T) sequences, as the antibodies isolated from such libraries may later be expressed in different systems (e.g., yeast, mammalian cells), for example during clinical development, and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity.
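Screening candidate sequences for the N-X-(S/T)-Z pattern described above is a simple pattern search. A minimal sketch using Python's `re` module and following the motif as stated here (X and Z not proline); the function name is illustrative:

```python
import re

# N-X-(S/T)-Z with X and Z not proline, as described above. The zero-width
# lookahead lets overlapping motifs (e.g., "NNST...") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_sequons(protein):
    """Return 0-based indices of N residues that start an N-X-(S/T)-Z motif."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]
```

Because the pattern requires a fourth residue Z, an N-X-(S/T) triplet at the very C-terminus of a fragment is not reported; scanning the full-length chain rather than an isolated CDR avoids missing motifs that span a segment junction.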
- In certain embodiments of the invention, it may be preferable to create individual sub-libraries of different lengths (e.g., one or more of the VKCDR3 lengths shown in FIG. 3) and to mix them in proportions that mimic the length distribution of natural VKCDR3 sequences (FIG. 3). In other embodiments, it may be desirable to mix these sub-libraries at ratios that are different from the distribution of lengths in natural VKCDR3 sequences, for example, to produce more focused libraries or libraries with particular properties.
- The principles used to design the minimalist VλCDR3 libraries of the invention are similar to those enumerated above for the VKCDR3 libraries, and are explained in more detail in the Examples. One difference between the VλCDR3 libraries of the invention and the VKCDR3 libraries of the invention is that, unlike the IGKV genes, the contribution of the IGVλ genes to CDRL3 (i.e., L3-Vλ) is not constrained to a fixed number of amino acid residues. Therefore, while the combination of the VK (including L3-VK) and JK segments, with inclusion of position 96, yields CDRL3 with a length of only 9 residues, length variation may be obtained within a VλCDR3 library even when only the Vλ (including L3-Vλ) and Jλ segments are considered.
- As for the VKCDR3 sequences, additional variability may be introduced into the VλCDR3 sequences via the same methods outlined above, namely determining the frequency of occurrence of particular residues within VλCDR3 sequences and synthesizing the oligonucleotides encoding the desired compositions via degenerate oligonucleotide synthesis or trinucleotide-based synthesis.
- In certain embodiments of the invention, both the heavy and light chain chassis sequences and the heavy and light chain CDR3 sequences are synthetic. The polynucleotide sequences of the instant invention can be synthesized by various methods. For example, sequences can be synthesized by split-pool DNA synthesis as described in Feldhaus et al., Nucleic Acids Research, 2000, 28: 534; Ornstein et al., Biopolymers, 1978, 17: 2341; and Brenner and Lerner, PNAS, 1992, 87: 6378 (each of which is incorporated by reference in its entirety).
- In some embodiments of the invention, cassettes representing the possible V, D, and J diversity found in the human repertoire, as well as junctional diversity, are synthesized de novo either as double-stranded DNA oligonucleotides, single-stranded DNA oligonucleotides representative of the coding strand, or single-stranded DNA oligonucleotides representative of the non-coding strand. These sequences can then be introduced into a host cell along with an acceptor vector containing a chassis sequence and, in some cases, a portion of FRM4 and a constant region. No primer-based PCR amplification from mammalian cDNA or mRNA, and no template-directed cloning steps from mammalian cDNA or mRNA, need be employed.
- In certain embodiments, the present invention exploits the inherent ability of yeast cells to facilitate homologous recombination at high efficiency. The mechanism of homologous recombination in yeast and its applications are briefly described below.
- As an illustrative embodiment, homologous recombination can be carried out in, for example, Saccharomyces cerevisiae, whose genetic machinery carries out homologous recombination with high efficiency. Exemplary S. cerevisiae strains include EM93, CEN.PK2, RM11-1a, YJM789, and BJ5465. This mechanism is believed to have evolved for the purpose of chromosomal repair, and is also called "gap repair" or "gap filling". By exploiting this mechanism, mutations can be introduced into specific loci of the yeast genome. For example, a vector carrying a mutant gene can contain two sequence segments that are homologous to the 5′ and 3′ open reading frame (ORF) sequences of a gene that is intended to be interrupted or mutated. The vector may also encode a positive selection marker, such as a nutritional enzyme allele (e.g., URA3) and/or an antibiotic resistance marker (e.g., Geneticin/G418), flanked by the two homologous DNA segments. Other selection markers and antibiotic resistance markers are known to one of ordinary skill in the art. In some embodiments of the invention, this vector (e.g., a plasmid) is linearized and transformed into the yeast cells. Through homologous recombination between the plasmid and the yeast genome at the two homologous recombination sites, a reciprocal exchange of DNA content occurs between the wild-type gene in the yeast genome and the mutant gene (including the selection marker gene(s)) that is flanked by the two homologous sequence segments. By selecting for the one or more selection markers, the surviving yeast cells will be those cells in which the wild-type gene has been replaced by the mutant gene (Pearson et al., Yeast, 1998, 14: 391, incorporated by reference in its entirety). This mechanism has been used to make systematic mutations in all of the approximately 6,000 yeast genes, or open reading frames (ORFs), for functional genomics studies.
Because the exchange is reciprocal, a similar approach has also been used successfully to clone yeast genomic DNA fragments into a plasmid vector (Iwasaki et al., Gene, 1991, 109: 81, incorporated by reference in its entirety).
- By utilizing the endogenous homologous recombination machinery present in yeast, gene fragments or synthetic oligonucleotides can also be cloned into a plasmid vector without a ligation step. In this application of homologous recombination, a target gene fragment (i.e., the fragment to be inserted into a plasmid vector, e.g., a CDR3) is obtained (e.g., by oligonucleotide synthesis, PCR amplification, restriction digestion out of another vector, etc.). DNA sequences that are homologous to selected regions of the plasmid vector are added to the 5′ and 3′ ends of the target gene fragment. These homologous regions may be fully synthetic, or added via PCR amplification of a target gene fragment with primers that incorporate the homologous sequences. The plasmid vector may include a positive selection marker, such as a nutritional enzyme allele (e.g., URA3), or an antibiotic resistance marker (e.g., Geneticin/G418). The plasmid vector is then linearized by a unique restriction cut located between the regions of sequence homology shared with the target gene fragment, thereby creating an artificial gap at the cleavage site. The linearized plasmid vector and the target gene fragment flanked by sequences homologous to the plasmid vector are co-transformed into a yeast host strain. The yeast is then able to recognize the two stretches of sequence homology between the vector and target gene fragment and facilitate a reciprocal exchange of DNA content through homologous recombination at the gap. As a consequence, the target gene fragment is inserted into the vector without ligation.
- The method described above has also been demonstrated to work when the target gene fragments are in the form of single-stranded DNA, for example, as a circular M13 phage-derived form, or as single-stranded oligonucleotides (Simon and Moore, Mol. Cell Biol., 1987, 7: 2329; Ivanov et al., Genetics, 1996, 142: 693; and DeMarini et al., 2001, 30: 520, each incorporated by reference in its entirety). Thus, the target that is recombined into the gapped vector can be double-stranded or single-stranded, and can be derived from chemical synthesis, PCR, restriction digestion, or other methods.
- Several factors may influence the efficiency of homologous recombination in yeast. For example, the efficiency of gap repair is correlated with the length of the homologous sequences flanking both the linearized vector and the target gene. In certain embodiments, about 20 or more base pairs may be used for the length of the homologous sequence, and about 80 base pairs may give a near-optimized result (Hua et al., Plasmid, 1997, 38: 91; Raymond et al., Genome Res., 2002, 12: 190, each incorporated by reference in its entirety). In certain embodiments of the invention, at least about 5, 10, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 187, 190, or 200 homologous base pairs may be used to facilitate recombination. In other embodiments, between about 20 and about 40 base pairs are utilized. In addition, the reciprocal exchange between the vector and gene fragment is strictly sequence-dependent (i.e., it does not cause a frame shift). Therefore, gap-repair cloning assures the insertion of gene fragments with both high efficiency and precision. The high efficiency makes it possible to clone two, three, or more targeted gene fragments simultaneously into the same vector in one transformation (Raymond et al., Biotechniques, 1999, 26: 134, incorporated by reference in its entirety). Moreover, because homologous recombination conserves sequence precisely, selected genes or gene fragments can be cloned into expression or fusion vectors for direct functional examination (El-Deiry et al., Nature Genetics, 1992, 1: 45-49; Ishioka et al., PNAS, 1997, 94: 2449, each incorporated by reference in its entirety).
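The role of the homology arms can be illustrated by building an insert whose ends copy the vector sequence on either side of the gap, then computing the expected recombinant. A minimal sketch, assuming the vector is given as a linear string, the gap is defined by two hypothetical indices, and 40-bp arms (within the 20-40 bp range above) are the default; it models only the sequence outcome, not recombination efficiency:

```python
MIN_ARM = 20  # about 20 bp or more is cited above as usable homology


def arms(vector, gap_start, gap_end, arm_len):
    """Vector sequence immediately 5' and 3' of the gap [gap_start, gap_end)."""
    if arm_len < MIN_ARM:
        raise ValueError(f"homology arms shorter than {MIN_ARM} bp")
    if gap_start - arm_len < 0 or gap_end + arm_len > len(vector):
        raise ValueError("arms extend past the ends of the vector string")
    return vector[gap_start - arm_len:gap_start], vector[gap_end:gap_end + arm_len]


def design_insert(vector, gap_start, gap_end, cdr3_dna, arm_len=40):
    """CDR3 coding sequence flanked by arms homologous to the gapped vector."""
    five, three = arms(vector, gap_start, gap_end, arm_len)
    return five + cdr3_dna + three


def gap_repair_product(vector, gap_start, gap_end, insert, arm_len=40):
    """Expected recombinant: the gap is filled by the insert's central segment."""
    five, three = arms(vector, gap_start, gap_end, arm_len)
    if len(insert) < 2 * arm_len or not (insert.startswith(five) and insert.endswith(three)):
        raise ValueError("insert ends are not homologous to the gapped vector")
    return vector[:gap_start] + insert[arm_len:len(insert) - arm_len] + vector[gap_end:]
```

Because the product is determined entirely by the arm sequences, the recombinant is exact at both junctions, mirroring the sequence-dependence noted above.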
- Libraries of gene fragments have also been constructed in yeast using homologous recombination. For example, a human brain cDNA library was constructed as a two-hybrid fusion library in vector pJG4-5 (Guidotti and Zervos, Yeast, 1999, 15: 715, incorporated by reference in its entirety). It has also been reported that a total of 6,000 pairs of PCR primers were used for amplification of 6,000 known yeast ORFs for a study of yeast genomic protein interactions (Hudson et al., Genome Res., 1997, 7: 1169, incorporated by reference in its entirety). In 2000, Uetz et al. conducted a comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae (Uetz et al., Nature, 2000, 403: 623, incorporated by reference in its entirety). The protein-protein interaction map of budding yeast was studied using a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins (Ito et al., PNAS, 2000, 97: 1143, incorporated by reference in its entirety), and the genomic protein linkage map of Vaccinia virus was studied using this system (McCraith et al., PNAS, 2000, 97: 4879, incorporated by reference in its entirety).
- In certain embodiments of the invention, a synthetic CDR3 (heavy or light chain) may be joined by homologous recombination with a vector encoding a heavy or light chain chassis, a portion of FRM4, and a constant region, to form a full-length heavy or light chain. In certain embodiments of the invention, the homologous recombination is performed directly in yeast cells. In some embodiments, the method comprises:
- (a) transforming into yeast cells:
- (i) a linearized vector encoding a heavy or light chain chassis, a portion of FRM4, and a constant region, wherein the site of linearization is between the end of FRM3 of the chassis and the beginning of the constant region; and
- (ii) a library of CDR3 insert nucleotide sequences that are linear and double stranded, wherein each of the CDR3 insert sequences comprises a nucleotide sequence encoding CDR3 and 5′- and 3′-flanking sequences that are sufficiently homologous to the termini of the vector of (i) at the site of linearization to enable homologous recombination to occur between the vector and the library of CDR3 insert sequences; and
- (b) allowing homologous recombination to occur between the vector and the CDR3 insert sequences in the transformed yeast cells, such that the CDR3 insert sequences are incorporated into the vector, to produce a vector encoding full-length heavy chain or light chain.
- As specified above, the CDR3 inserts may have a 5′ flanking sequence and a 3′ flanking sequence that are homologous to the termini of the linearized vector. When the CDR3 inserts and the linearized vectors are introduced into a host cell, for example, a yeast cell, the "gap" (the linearization site) created by linearization of the vector is filled by the CDR3 fragment insert through recombination of the homologous sequences at the 5′ and 3′ termini of these two linear double-stranded DNAs (i.e., the vector and the insert). Through this homologous recombination event, libraries of circular vectors encoding full-length heavy or light chains comprising variable CDR3 inserts are generated. Particular instances of these methods are presented in the Examples.
- Subsequent analysis may be carried out to determine the efficiency of homologous recombination that results in correct insertion of the CDR3 sequences into the vectors. For example, PCR amplification of the CDR3 inserts directly from selected yeast clones may reveal how many clones are recombinant. In certain embodiments, libraries with a minimum of about 90% recombinant clones are utilized. In certain other embodiments, libraries with a minimum of about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% recombinant clones are utilized. The same PCR amplification of selected clones may also reveal the insert size.
- To verify the sequence diversity of the inserts in the selected clones, a PCR amplification product with the correct insert size may be "fingerprinted" with restriction enzymes known to cut, or not to cut, within the amplified region. From the gel electrophoresis pattern, it may be determined whether the analyzed clones are identical or distinct. The PCR products may also be sequenced directly to reveal the identity of the inserts and the fidelity of the cloning procedure, and to confirm the independence and diversity of the clones.
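The fingerprinting logic can be sketched as an in silico digest. Clones whose amplified inserts give different fragment-length patterns for the same enzyme cannot be identical, although matching patterns do not prove identity. The Python sketch below assumes a palindromic recognition site (so scanning the top strand finds every site) and takes the cut position within the site as a parameter, e.g., 1 for EcoRI (G^AATTC). The function names and example sequences are illustrative only.

```python
def digest_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a linear DNA at every occurrence of `site`.

    Assumes a palindromic site, so scanning the top strand finds every site.
    cut_offset is the top-strand cut position within the site (1 for EcoRI, G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def same_fingerprint(a: str, b: str, site: str, cut_offset: int) -> bool:
    """True if two amplicons give the same fragment-size pattern (not proof of identity)."""
    return sorted(digest_fragments(a, site, cut_offset)) == sorted(
        digest_fragments(b, site, cut_offset)
    )


print(digest_fragments("AAGAATTCTT", "GAATTC", 1))  # [3, 7]
```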
FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis, portion of FRM4, and constant region) for the construction of a library.
- Libraries of polynucleotides generated by any of the techniques described herein, or other suitable techniques, can be expressed and screened to identify antibodies having desired structure and/or activity. Expression of the antibodies can be carried out, for example, using cell-free extracts (e.g., ribosome display), phage display, prokaryotic cells (e.g., bacterial display), or eukaryotic cells (e.g., yeast display). In certain embodiments of the invention, the antibody libraries are expressed in yeast.
- In other embodiments, the polynucleotides are engineered to serve as templates that can be expressed in a cell-free extract. Vectors and extracts as described, for example, in U.S. Pat. Nos. 5,324,637; 5,492,817; and 5,665,563 (each incorporated by reference in its entirety) can be used, and many are commercially available. Ribosome display and other cell-free techniques for linking a polynucleotide (i.e., a genotype) to a polypeptide (i.e., a phenotype) can be used, e.g., Profusion™ (see, e.g., U.S. Pat. Nos. 6,348,315; 6,261,804; 6,258,558; and 6,214,553, each incorporated by reference in its entirety).
- Alternatively, the polynucleotides of the invention can be expressed in an E. coli expression system, such as that described by Pluckthun and Skerra (Meth. Enzymol., 1989, 178: 476; Biotechnology, 1991, 9: 273, each incorporated by reference in its entirety). The mutant proteins can be expressed for secretion into the medium and/or in the cytoplasm of the bacteria, as described by Better and Horwitz, Meth. Enzymol., 1989, 178: 476, incorporated by reference in its entirety. In some embodiments, the single domains encoding VH and VL are each attached to the 3′ end of a sequence encoding a signal sequence, such as the ompA, phoA, or pelB signal sequence (Lei et al., J. Bacteriol., 1987, 169: 4379, incorporated by reference in its entirety). These gene fusions are assembled in a dicistronic construct, so that they can be expressed from a single vector and secreted into the periplasmic space of E. coli, where they refold and can be recovered in active form (Skerra et al., Biotechnology, 1991, 9: 273, incorporated by reference in its entirety). For example, antibody heavy chain genes can be concurrently expressed with antibody light chain genes to produce antibodies or antibody fragments.
- In other embodiments of the invention, the antibody sequences are expressed on the membrane surface of a prokaryote, e.g., E. coli, using a secretion signal and lipidation moiety as described, e.g., in US20040072740; US20030100023; and US20030036092 (each incorporated by reference in its entirety).
- Higher eukaryotic cells, such as mammalian cells, for example myeloma cells (e.g., NS/0 cells), hybridoma cells, Chinese hamster ovary (CHO), and human embryonic kidney (HEK) cells, can also be used for expression of the antibodies of the invention. Typically, antibodies expressed in mammalian cells are designed to be secreted into the culture medium, or expressed on the surface of the cell. The antibody or antibody fragments can be produced, for example, as intact antibody molecules or as individual VH and VL fragments, Fab fragments, single domains, or as single chains (scFv) (Huston et al., PNAS, 1988, 85: 5879, incorporated by reference in its entirety).
- Alternatively, antibodies can be expressed and screened by anchored periplasmic expression (APEx 2-hybrid surface display), as described, for example, in Jeong et al., PNAS, 2007, 104: 8247 (incorporated by reference in its entirety) or by other anchoring methods as described, for example, in Mazor et al., Nature Biotechnology, 2007, 25: 563 (incorporated by reference in its entirety).
- In other embodiments of the invention, antibodies can be selected using mammalian cell display (Ho et al., PNAS, 2006, 103: 9637, incorporated by reference in its entirety).
- The screening of the antibodies derived from the libraries of the invention can be carried out by any appropriate means. For example, binding activity can be evaluated by standard immunoassay and/or affinity chromatography. Screening of the antibodies of the invention for catalytic function, e.g., proteolytic function, can be accomplished using standard assays, e.g., the hemoglobin plaque assay as described in U.S. Pat. No. 5,798,208 (incorporated by reference in its entirety). The ability of candidate antibodies to bind therapeutic targets can be assayed in vitro using, e.g., a BIACORE™ instrument, which measures binding rates of an antibody to a given target or antigen based on surface plasmon resonance. In vivo assays can be conducted using any of a number of animal models, and antibodies can subsequently be tested, as appropriate, in humans. Cell-based biological assays are also contemplated.
- One aspect of the instant invention is the speed at which the antibodies of the library can be expressed and screened. In certain embodiments of the invention, the antibody library can be expressed in yeast, which have a doubling time of less than about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hours. In some embodiments, the doubling times are about 1 to about 3, about 2 to about 4, about 3 to about 8, about 3 to about 24, about 5 to about 24, about 4 to about 6, about 5 to about 22, about 6 to about 8, about 7 to about 22, about 8 to about 10, about 7 to about 20, about 9 to about 20, about 9 to about 18, about 11 to about 18, about 11 to about 16, about 13 to about 16, about 16 to about 20, or about 20 to about 30 hours. In certain embodiments of the invention, the antibody library is expressed in yeast with a doubling time of about 16 to about 20 hours, about 8 to about 16 hours, or about 4 to about 8 hours. Thus, the antibody library of the instant invention can be expressed and screened in a matter of hours, as compared to previously known techniques, which take several days to express and screen antibody libraries. A limiting step in the throughput of such screening processes in mammalian cells is the time required to iteratively regrow populations of isolated cells, which, in some cases, have doubling times longer than those of the yeast used in the current invention.
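The effect of doubling time on turnaround can be made concrete with the standard exponential-growth relation N(t) = N0 · 2^(t/td). The Python sketch below estimates how long it takes to regrow a sorted population to a target size. The function name and example numbers are illustrative assumptions, and lag phase and nutrient limits are ignored.

```python
import math


def hours_to_expand(n_start: float, n_target: float, doubling_time_h: float) -> float:
    """Hours of unrestricted exponential growth to go from n_start to n_target cells.

    Uses N(t) = N0 * 2**(t / td), so t = td * log2(N / N0); lag phase is ignored.
    """
    if n_start <= 0 or n_target < n_start or doubling_time_h <= 0:
        raise ValueError("need 0 < n_start <= n_target and a positive doubling time")
    return doubling_time_h * math.log2(n_target / n_start)


# Regrowing 1e4 sorted cells to 1e8 cells (about 13.3 doublings):
for td in (2, 8, 20):
    print(f"doubling time {td:>2} h -> {hours_to_expand(1e4, 1e8, td):.0f} h")
```

With these inputs, a 2 h doubling time needs roughly a day of growth per round, whereas a 20 h doubling time needs more than ten days, and the difference compounds over iterative rounds of selection.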
- In certain embodiments of the invention, the composition of a library may be defined after one or more enrichment steps (for example, by screening for antigen binding or other properties). For example, a library with a composition comprising about x% sequences or libraries of the invention may be enriched to contain about 2x%, 3x%, 4x%, 5x%, 6x%, 7x%, 8x%, 9x%, 10x%, 20x%, 25x%, 40x%, 50x%, 60x%, 75x%, 80x%, 90x%, 95x%, or 99x% sequences or libraries of the invention after one or more screening steps. In other embodiments of the invention, the sequences or libraries of the invention may be enriched about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1,000-fold, or more, relative to their occurrence prior to the one or more enrichment steps. In certain embodiments of the invention, a library may contain at least a certain number of a particular type of sequence(s), such as CDRH3s, CDRL3s, heavy chains, light chains, or whole antibodies (e.g., at least about 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, 10^19, or 10^20). In certain embodiments, these sequences may be enriched during one or more enrichment steps, to provide libraries comprising at least about 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, 10^11, 10^12, 10^13, 10^14, 10^15, 10^16, 10^17, 10^18, or 10^19 of the respective sequence(s).
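The fold-enrichment language above reduces to a ratio of a sequence class's frequency after a selection step to its frequency before it. A minimal Python sketch, with a hypothetical function name and hypothetical clone counts:

```python
def fold_enrichment(hits_before: int, total_before: int,
                    hits_after: int, total_after: int) -> float:
    """Ratio of a sequence class's frequency after vs. before an enrichment step."""
    freq_before = hits_before / total_before
    freq_after = hits_after / total_after
    return freq_after / freq_before


# Hypothetical counts: 1% of 10,000 clones before sorting, 10% of 5,000 clones after.
print(fold_enrichment(100, 10_000, 500, 5_000))  # about 10-fold, i.e., x% -> 10x%
```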
- As described above, antibody leads can be identified through a selection process that involves screening the antibodies of a library of the invention for binding to one or more antigens, or for a biological activity. The coding sequences of these antibody leads may be further mutagenized in vitro or in vivo to generate secondary libraries with diversity introduced in the context of the initial antibody leads. The mutagenized antibody leads can then be further screened for binding to target antigens or biological activity, in vitro or in vivo, following procedures similar to those used for the selection of the initial antibody lead from the primary library. Such mutagenesis and selection of primary antibody leads effectively mimics the affinity maturation process naturally occurring in a mammal that produces antibodies with progressive increases in affinity for an antigen. In one embodiment of the invention, only the CDRH3 region is mutagenized. In another embodiment of the invention, the whole variable region is mutagenized. In other embodiments of the invention, one or more of CDRH1, CDRH2, CDRH3, CDRL1, CDRL2, and/or CDRL3 may be mutagenized. In some embodiments of the invention, "light chain shuffling" may be used as part of the affinity maturation protocol. In certain embodiments, this may involve pairing one or more heavy chains with a number of light chains, to select light chains that enhance the affinity and/or biological activity of an antibody. In certain embodiments of the invention, the number of light chains to which the one or more heavy chains can be paired is at least about 2, 5, 10, 100, 1000, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, or 10^10. In certain embodiments of the invention, these light chains are encoded by plasmids. In other embodiments of the invention, the light chains may be integrated into the genome of the host cell.
- The coding sequences of the antibody leads may be mutagenized by a wide variety of methods. Examples of methods of mutagenesis include, but are not limited to, site-directed mutagenesis, error-prone PCR mutagenesis, cassette mutagenesis, and random PCR mutagenesis. Alternatively, oligonucleotides encoding regions with the desired mutations can be synthesized and introduced into the sequence to be mutagenized, for example, via recombination or ligation.
- Site-directed mutagenesis or point mutagenesis may be used to gradually change the CDR sequences in specific regions. This may be accomplished by using oligonucleotide-directed mutagenesis or PCR. For example, a short sequence of an antibody lead may be replaced with a synthetically mutagenized oligonucleotide in either the heavy chain or light chain region, or both. The method may not be efficient for mutagenizing large numbers of CDR sequences, but may be used for fine tuning of a particular lead to achieve higher affinity toward a specific target protein.
- Cassette mutagenesis may also be used to mutagenize the CDR sequences in specific regions. In a typical cassette mutagenesis, a sequence block, or a region, of a single template is replaced by a completely or partially randomized sequence. However, the maximum information content that can be obtained may be statistically limited by the number of random sequences of the oligonucleotides. Similar to point mutagenesis, this method may also be used for fine tuning of a particular lead to achieve higher affinity towards a specific target protein.
- Error-prone PCR, or “poison” PCR, may be used to mutagenize the CDR sequences by following protocols described in Caldwell and Joyce, PCR Methods and Applications, 1992, 2: 28; Leung et al., Technique, 1989, 1: 11; Shafikhani et al., Biotechniques, 1997, 23: 304; and Stemmer et al., PNAS, 1994, 91: 10747 (each of which is incorporated by reference in its entirety).
- Conditions for error-prone PCR may include (a) high concentrations of Mn2+ (e.g., about 0.4 to about 0.6 mM), which efficiently induce malfunction of Taq DNA polymerase; and (b) a disproportionately high concentration of one nucleotide substrate (e.g., dGTP) in the PCR reaction, which causes misincorporation of this over-represented nucleotide into the newly synthesized strand and thereby produces mutations. Additionally, other factors, such as the number of PCR cycles, the species of DNA polymerase used, and the length of the template, may affect the rate of misincorporation of "wrong" nucleotides into the PCR product. Commercially available kits may be utilized for the mutagenesis of the selected antibody library, such as the "Diversity PCR random mutagenesis kit" (CLONTECH™).
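For planning purposes, error-prone PCR is often approximated as independent random substitutions at a fixed per-base rate. The Python sketch below uses that simplification, which is an assumption: real misincorporation under Mn2+ or unbalanced dNTPs is biased toward particular substitutions and accumulates over cycles. The function names, the 300-bp placeholder amplicon, and the rate are hypothetical.

```python
import random

BASES = "ACGT"


def mutagenize(seq: str, rate_per_base: float, rng: random.Random) -> str:
    """Return a copy of seq in which each base is independently replaced, with
    probability rate_per_base, by one of the other three bases (chosen uniformly)."""
    out = []
    for base in seq:
        if rng.random() < rate_per_base:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)       # seeded for reproducibility
parent = "GCTAGC" * 50       # hypothetical 300-bp CDR amplicon
variants = [mutagenize(parent, 0.005, rng) for _ in range(1000)]
mean_mutations = sum(hamming(parent, v) for v in variants) / len(variants)
print(f"mean substitutions per clone: {mean_mutations:.2f}")  # expected value 300 * 0.005 = 1.5
```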
- The primer pairs used in PCR-based mutagenesis may, in certain embodiments, include regions matched with the homologous recombination sites in the expression vectors. This design allows facile re-introduction of the PCR products back into the heavy or light chain chassis vectors, after mutagenesis, via homologous recombination.
- Other PCR-based mutagenesis methods can also be used, alone or in conjunction with the error-prone PCR described above. For example, the PCR-amplified CDR segments may be digested with DNase to create nicks in the double-stranded DNA. These nicks can be expanded into gaps by exonucleases such as Bal 31. The gaps may then be filled with random sequences using Klenow DNA polymerase, with the regular substrates dGTP, dATP, dTTP, and dCTP at low concentration except for one substrate (e.g., dGTP) supplied at a disproportionately high concentration. This fill-in reaction should produce high-frequency mutations in the filled gap regions. This DNase digestion method may be used in conjunction with error-prone PCR to create a high frequency of mutations in the desired CDR segments.
- The CDR or antibody segments amplified from the primary antibody leads may also be mutagenized in vivo by exploiting the inherent mutational capacity of pre-B cells. The Ig genes in pre-B cells are particularly susceptible to a high rate of mutation. The Ig promoter and enhancer facilitate such high-rate mutation in a pre-B cell environment while the pre-B cells proliferate. Accordingly, CDR gene segments may be cloned into a mammalian expression vector that contains a human Ig enhancer and promoter. This construct may be introduced into a pre-B cell line, such as 38B9, which allows the VH and VL gene segments to mutate naturally in the pre-B cells (Liu and Van Ness, Mol. Immunol., 1999, 36: 461, incorporated by reference in its entirety). The mutagenized CDR segments can be amplified from the cultured pre-B cell line and re-introduced into the chassis-containing vector(s) via, for example, homologous recombination.
- In some embodiments, a CDR “hit” isolated from screening the library can be re-synthesized, using degenerate codons or trinucleotides, and re-cloned into the heavy or light chain vector using gap repair.
- In certain embodiments of the invention, a library of the invention comprises a designed, non-random repertoire wherein the theoretical diversity of particular components of the library (for example, CDRH3), but not necessarily all components or the entire library, can be over-sampled in a physical realization of the library, at a level where there is a certain degree of statistical confidence (e.g., 95%) that any given member of the theoretical library is present in the physical realization of the library at least at a certain frequency (e.g., at least once, twice, three times, four times, five times, or more) in the library.
- In a library, it is generally assumed that the number of copies of a given clone obeys a Poisson probability distribution (see Feller, W. An Introduction to Probability Theory and Its Applications, 1968, Wiley New York, incorporated by reference in its entirety). The probability of a Poisson random number being zero, corresponding to the probability of missing a given component member in an instance of a library (see below), is e^(−N), where N is the average of the random number. For example, if there are 10^6 possible theoretical members of a library and a physical realization of the library has 10^7 members, with an equal probability of each member of the theoretical library being sampled, then the average number of times that each member occurs in the physical realization of the library is 10^7/10^6 = 10, and the probability that the number of copies of a given member is zero is e^(−N) = e^(−10) ≈ 0.000045; that is, there is a 99.9955% chance that any given one of the 10^6 theoretical members is present at least once in this 10× oversampled library. For a 2.3× oversampled library, one is 90% confident that a given component is present. For a 3× oversampled library, one is 95% confident that a given component is present. For a 4.6× oversampled library, one is 99% confident that a given clone is present, and so on.
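The oversampling figures above follow from the Poisson zero term: P(missing) = e^(−N), so the oversampling needed for confidence c that a given member is present is N = −ln(1 − c). A short Python check, with illustrative function names:

```python
import math


def p_missing(oversampling: float) -> float:
    """Poisson probability that a given theoretical member has zero copies."""
    return math.exp(-oversampling)


def oversampling_for_confidence(confidence: float) -> float:
    """Mean copies per member needed to be `confidence`-sure a given member is present."""
    return -math.log(1.0 - confidence)


print(f"{p_missing(10):.6f}")  # 0.000045 for a 10x oversampled library
for c in (0.90, 0.95, 0.99):
    print(c, round(oversampling_for_confidence(c), 1))  # 2.3, 3.0, 4.6
```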
- Therefore, if M is the maximum number of theoretical library members that can be feasibly physically realized, then M/3 is the maximum theoretical repertoire size for which one can be 95% confident that any given member of the theoretical library will be sampled. It is important to note that there is a difference between a 95% chance that a given member is represented and a 95% chance that every possible member is represented. In certain embodiments, the instant invention provides a rationally designed library with diversity such that any given member is 95% likely to be represented in a physical realization of the library. In other embodiments of the invention, the library is designed so that any given member is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% likely to be represented in a physical realization of the library. For a review, see, e.g., Firth and Patrick, Biomol. Eng., 2005, 22: 105, and Patrick et al., Protein Engineering, 2003, 16: 451, each of which is incorporated by reference in its entirety.
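The distinction between a given member and every member being represented can be quantified under the same Poisson assumption, additionally treating members as independent: the probability that all T members appear is (1 − e^(−N))^T. The Python sketch below (function names hypothetical) also shows that the M/3 rule is the rounded form of M/(−ln 0.05) ≈ M/2.996.

```python
import math


def p_all_present(n_theoretical: float, oversampling: float) -> float:
    """Probability that every one of n_theoretical members appears at least once,
    assuming independent Poisson copy numbers with mean `oversampling`."""
    # (1 - e^-N)^T evaluated in log space to avoid underflow and rounding error
    return math.exp(n_theoretical * math.log1p(-math.exp(-oversampling)))


def oversampling_for_all_present(n_theoretical: float, confidence: float) -> float:
    """Mean copies per member so that all members are present with `confidence`."""
    # Solve (1 - e^-N)^T = c for N:  N = -ln(1 - c**(1/T))
    return -math.log(-math.expm1(math.log(confidence) / n_theoretical))


def max_repertoire(m_realizable: float, confidence: float = 0.95) -> float:
    """Largest theoretical size for which a *given* member is present with `confidence`."""
    return m_realizable / -math.log(1.0 - confidence)


print(p_all_present(1e6, 10))                   # ~2e-20: 10x is far from enough for *every* member
print(oversampling_for_all_present(1e6, 0.95))  # ~16.8x needed for all 1e6 members
print(max_repertoire(1e9))                      # ~3.3e8, i.e., about M/3
```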
- In certain embodiments of the invention, a library may have a theoretical total diversity of X unique members, and the physical realization of the theoretical total diversity may contain at least about 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, or more members. In some embodiments, the physical realization of the theoretical total diversity may contain about 1× to about 2×, about 2× to about 3×, about 3× to about 4×, about 4× to about 5×, or about 5× to about 6× members. In other embodiments, the physical realization of the theoretical total diversity may contain about 1× to about 3×, or about 3× to about 5× total members.
- An assumption underlying all directed evolution experiments is that the amount of molecular diversity theoretically possible is enormous compared with the ability to synthesize it, physically realize it, and screen it. The likelihood of finding a variant with improved properties in a given library is maximized when that library is maximally diverse. Patrick et al. used simple statistics to derive a series of equations and computer algorithms for estimating the number of unique sequence variants in libraries constructed by randomized oligonucleotide mutagenesis, error-prone PCR and in vitro recombination. They have written a suite of programs for calculating library statistics, such as GLUE, GLUE-IT, PEDEL, PEDEL-AA, and DRIVeR. These programs are described, with instructions on how to access them, in Patrick et al., Protein Engineering, 2003, 16: 451 and Firth et al., Nucleic Acids Res., 2008, 36: W281 (each of which is incorporated by reference in its entirety).
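For the simplest case these statistics address, L clones drawn uniformly from V equiprobable variants, the expected number of distinct variants is approximately V(1 − e^(−L/V)). The Python sketch below implements only this uniform case. It is an illustration, not a reimplementation of GLUE or the other programs cited, which address more detailed library-construction models; the function name is hypothetical.

```python
import math


def expected_unique(library_size: float, n_variants: float) -> float:
    """Approximate expected number of distinct variants among library_size clones
    sampled uniformly with replacement from n_variants equiprobable variants."""
    # V * (1 - exp(-L/V)); expm1 keeps precision when L/V is small
    return n_variants * -math.expm1(-library_size / n_variants)


print(round(expected_unique(1e7, 1e6)))  # 10x oversampled: nearly all 1e6 variants present
print(round(expected_unique(1e6, 1e6)))  # 1x: only about 63% of variants present
```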
- It is possible to construct a physical realization of a library in which some components of the theoretical diversity (such as CDRH3) are oversampled, while other aspects (VH/VL pairings) are not. For example, consider a library in which 10^8 CDRH3 segments are designed to be present in a single VH chassis and then paired with 10^5 VL genes to produce 10^13 (=10^8 × 10^5) possible full heterodimeric antibodies. If a physical realization of this library is constructed with a diversity of 10^9 transformant clones, then the CDRH3 diversity is oversampled ten-fold (=10^9/10^8); however, only a fraction 10^−4 (=10^9/10^13) of the possible VH/VL pairings is sampled. In this example, on average, each CDRH3 is paired with only 10 of the 10^5 possible VL partners. In certain embodiments of the invention, it is the CDRH3 diversity that is preferably oversampled.
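The arithmetic of this example can be packaged as a small Python helper. The function and key names are illustrative, and uniform sampling of both components is assumed.

```python
def sampling_report(n_cdrh3: int, n_vl: int, n_transformants: int) -> dict:
    """Per-component sampling of a library built from n_cdrh3 CDRH3s (in one VH
    chassis) paired combinatorially with n_vl light chains."""
    full_diversity = n_cdrh3 * n_vl
    copies_per_cdrh3 = n_transformants / n_cdrh3
    return {
        "full_diversity": full_diversity,
        "cdrh3_oversampling": copies_per_cdrh3,
        "pairing_fraction_sampled": n_transformants / full_diversity,
        # each transformant carries one VL, so on average each CDRH3 meets this many VLs
        "vl_partners_per_cdrh3": copies_per_cdrh3,
    }


print(sampling_report(10**8, 10**5, 10**9))
```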
- In certain embodiments, the invention relates to a polynucleotide that hybridizes with a polynucleotide taught herein, or that hybridizes with the complement of a polynucleotide taught herein. For example, an isolated polynucleotide that remains hybridized after hybridization and washing under low, medium, or high stringency conditions to a polynucleotide taught herein or the complement of a polynucleotide taught herein is encompassed by the present invention.
- Exemplary low stringency conditions include hybridization with a buffer solution of about 30% to about 35% formamide, about 1 M NaCl, about 1% SDS (sodium dodecyl sulphate) at about 37° C., and a wash in about 1× to about 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at about 50° C. to about 55° C.
- Exemplary moderate stringency conditions include hybridization in about 40% to about 45% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.5× to about 1×SSC at about 55° C. to about 60° C.
- Exemplary high stringency conditions include hybridization in about 50% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.1×SSC at about 60° C. to about 65° C.
- Optionally, wash buffers may comprise about 0.1% to about 1% SDS.
- The duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
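Because the SSC strengths above are all defined relative to 20×SSC (3.0 M NaCl/0.3 M trisodium citrate), the salt concentrations of each wash, and the volume of 20× stock needed to prepare it, follow by simple scaling. A Python sketch with illustrative function names:

```python
STOCK_X = 20.0
NACL_M_IN_STOCK = 3.0      # 20x SSC: 3.0 M NaCl
CITRATE_M_IN_STOCK = 0.3   # 20x SSC: 0.3 M trisodium citrate


def ssc_molarity(strength_x: float) -> tuple[float, float]:
    """(NaCl, trisodium citrate) molarity of SSC at the given strength, e.g. 0.1 for 0.1x."""
    scale = strength_x / STOCK_X
    return NACL_M_IN_STOCK * scale, CITRATE_M_IN_STOCK * scale


def stock_volume_ml(strength_x: float, final_volume_ml: float) -> float:
    """mL of 20x SSC stock to dilute to final_volume_ml at the given strength."""
    return final_volume_ml * strength_x / STOCK_X


for x in (2.0, 1.0, 0.5, 0.1):
    nacl, citrate = ssc_molarity(x)
    print(f"{x}x SSC: {nacl * 1000:.0f} mM NaCl, {citrate * 1000:.1f} mM citrate, "
          f"{stock_volume_ml(x, 1000):.0f} mL 20x stock per L")
```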
- As described throughout the application, the libraries of the current invention are distinguished, in certain embodiments, by their human-like sequence composition and length, and the ability to generate a physical realization of the library which contains all members of (or, in some cases, even oversamples) a particular component of the library. Libraries comprising combinations of the libraries described herein (e.g., CDRH3 and CDRL3 libraries) are encompassed by the invention. Sub-libraries comprising portions of the libraries described herein are also encompassed by the invention (e.g., a CDRH3 library in a particular heavy chain chassis or a sub-set of the CDRH3 libraries). One of ordinary skill in the art will readily recognize that each of the libraries described herein has several components (e.g., CDRH3, VH, CDRL3, VL, etc.), and that the diversity of these components can be varied to produce sub-libraries that fall within the scope of the invention.
- Moreover, libraries containing one of the libraries or sub-libraries of the invention also fall within the scope of the invention. For example, in certain embodiments of the invention, one or more libraries or sub-libraries of the invention may be contained within a larger library, which may include sequences derived by other means, for example, non-human or human sequence derived by stochastic or semi-stochastic synthesis. In certain embodiments of the invention, at least about 1% of the sequences in a polynucleotide library may be those of the invention (e.g., CDRH3 sequences, CDRL3 sequences, VH sequences, VL sequences), regardless of the composition of the other 99% of sequences. In other embodiments of the invention, at least about 0.001%, 0.01%, 0.1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% of the sequences in any polynucleotide library may be those of the invention, regardless of the composition of the other sequences. In some embodiments, the sequences of the invention may comprise about 0.001% to about 1%, about 1% to about 2%, about 2% to about 5%, about 5% to about 10%, about 10% to about 15%, about 15% to about 20%, about 20% to about 25%, about 25% to about 30%, about 30% to about 35%, about 35% to about 40%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 60%, about 60% to about 65%, about 65% to about 70%, about 70% to about 75%, about 75% to about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 99% of the sequences in any polynucleotide library, regardless of the composition of the other sequences. 
Thus, libraries more diverse than one or more libraries or sub-libraries of the invention, but yet still comprising one or more libraries or sub-libraries of the invention, in an amount in which the one or more libraries or sub-libraries of the invention can be effectively screened and from which sequences encoded by the one or more libraries or sub-libraries of the invention can be isolated, also fall within the scope of the invention.
- In certain embodiments of the invention, the amino acid products of a library of the invention (e.g., a CDRH3 or CDRL3) may be displayed on an alternative scaffold. Several of these scaffolds have been shown to yield molecules with specificities and affinities that rival those of antibodies. Exemplary alternative scaffolds include those derived from fibronectin (e.g., AdNectin), the β-sandwich (e.g., iMab), lipocalin (e.g., Anticalin), EETI-II/AGRP, BPTI/LACI-D1/ITI-D2 (e.g., Kunitz domain), thioredoxin (e.g., peptide aptamer), protein A (e.g., Affibody), ankyrin repeats (e.g., DARPin), γB-crystallin/ubiquitin (e.g., Affilin), CTLD3 (e.g., Tetranectin), and (LDLR-A module)3 (e.g., Avimers). Additional information on alternative scaffolds is provided in Binz et al., Nat. Biotechnol., 2005, 23: 1257 and Skerra, Current Opin. in Biotech., 2007, 18: 295-304, each of which is incorporated by reference in its entirety.
- In certain embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10^7 to 10^8 polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.
- In other embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising polynucleotide sequences encoding CDRH3 represented by the following formula:
-
[G/D/E/-][N1][DH][N2][H3-JH],
- wherein [G/D/E/-] is zero to one amino acids in length, [N1] is zero to three amino acids, [DH] is three to ten amino acids in length, [N2] is zero to three amino acids in length, and [H3-JH] is two to nine amino acids in length.
- In certain embodiments of the invention, [G/D/E/-] is represented by an amino acid sequence selected from the group consisting of: G, D, E, and nothing.
- In some embodiments of the invention, [N1] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.
- In certain embodiments of the invention, [N2] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.
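The N1 and N2 definitions use a compact degenerate notation. The Python sketch below expands them into explicit peptides and counts the distinct options. The parsing convention (a parenthesized "(A/B)" is one position that may be A or B; a bare letter is a fixed residue) is an assumption about how to read the notation, and the pattern list is transcribed from the text above.

```python
import re
from itertools import product

# N1/N2 options from the text, in its degenerate notation.
N_PATTERNS = [
    "G", "R", "S", "P", "L", "A", "V", "T",
    "(G/P)(G/R/S/P/L/A/V/T)",
    "(R/S/L/A/V/T)(G/P)",
    "GG(G/R/S/P/L/A/V/T)",
    "G(R/S/P/L/A/V/T)G",
    "(R/S/P/L/A/V/T)GG",
    "",  # "nothing"
]


def expand(pattern: str) -> list[str]:
    """Expand one degenerate pattern into every concrete peptide it denotes."""
    # Each match is either ("alt/alt/...", "") for a parenthesized set or ("", "X") for a fixed residue.
    tokens = re.findall(r"\(([^)]*)\)|([A-Z])", pattern)
    choices = [alts.split("/") if alts else [fixed] for alts, fixed in tokens]
    return ["".join(combo) for combo in product(*choices)]


n_segments = {seq for pattern in N_PATTERNS for seq in expand(pattern)}
print(len(n_segments))  # 59 distinct N1 (or N2) segments, including the empty one
```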
- In some embodiments of the invention, [DH] comprises a sequence selected from the group consisting of: IGHD3-10 reading frame 1 (SEQ ID NO: 1), IGHD3-10 reading frame 2 (SEQ ID NO: 2), IGHD3-10 reading frame 3 (SEQ ID NO: 3), IGHD3-22 reading frame 2 (SEQ ID NO: 4), IGHD6-19 reading frame 1 (SEQ ID NO: 5), IGHD6-19 reading frame 2 (SEQ ID NO: 6), IGHD6-13 reading frame 1 (SEQ ID NO: 7), IGHD6-13 reading frame 2 (SEQ ID NO: 8), IGHD3-03 reading frame 3 (SEQ ID NO: 9), IGHD2-02 reading frame 2 (SEQ ID NO: 10), IGHD2-02 reading frame 3 (SEQ ID NO: 11), IGHD4-17 reading frame 2 (SEQ ID NO: 12), IGHD1-26 reading frame 1 (SEQ ID NO: 13), IGHD1-26 reading frame 3 (SEQ ID NO: 14), IGHD5-5/5-18 reading frame 3 (SEQ ID NO: 15), IGHD2-15 reading frame 2 (SEQ ID NO: 16), and all possible N-terminal and C-terminal truncations of the above-identified IGHDs down to three amino acids.
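Taking all N-terminal, C-terminal, and combined truncations of a sequence down to three residues is the same as taking every contiguous substring of length three or more. The Python sketch below enumerates them. The input is a hypothetical placeholder, not any of the SEQ ID NO sequences above; a real DH segment with repeated residues can yield fewer distinct fragments than the count shown.

```python
def all_truncations(seq: str, min_len: int = 3) -> set[str]:
    """Every N-terminal, C-terminal, or combined N- and C-terminal truncation of seq
    (plus seq itself) that keeps at least min_len residues, i.e., every contiguous
    substring of length >= min_len."""
    n = len(seq)
    return {seq[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


placeholder_dh = "ACDEFGHI"  # hypothetical 8-residue stand-in, not a SEQ ID NO sequence
fragments = all_truncations(placeholder_dh)
print(len(fragments))  # 21 = (8 - 2) * (8 - 1) / 2 for a sequence with no repeats
```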
- In certain embodiments of the invention, [H3-JH] comprises a sequence selected from the group consisting of: AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, AFDV (SEQ ID NO: 19), FDV, DV, YFDY (SEQ ID NO: 20), FDY, DY, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), MDV, and DV.
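The listed [H3-JH] sequences can be generated mechanically: they are the six full-length sequences AEYFQH, YWYFDL, AFDV, YFDY, NWFDS, and YYYYYGMDV, together with their N-terminal truncations down to two residues. The Python sketch below (function name illustrative) checks this against the list transcribed from the text.

```python
def n_terminal_truncations(seq: str, min_len: int = 2) -> list[str]:
    """seq followed by its successive N-terminal truncations, down to min_len residues."""
    return [seq[i:] for i in range(len(seq) - min_len + 1)]


# Untruncated [H3-JH] sequences from the list above
JH_PARENTS = ["AEYFQH", "YWYFDL", "AFDV", "YFDY", "NWFDS", "YYYYYGMDV"]

# [H3-JH] sequences as listed in the text (DV appears twice there)
LISTED = {
    "AEYFQH", "EYFQH", "YFQH", "FQH", "QH",
    "YWYFDL", "WYFDL", "YFDL", "FDL", "DL",
    "AFDV", "FDV", "DV",
    "YFDY", "FDY", "DY",
    "NWFDS", "WFDS", "FDS", "DS",
    "YYYYYGMDV", "YYYYGMDV", "YYYGMDV", "YYGMDV", "YGMDV", "GMDV", "MDV",
}

generated = {t for parent in JH_PARENTS for t in n_terminal_truncations(parent)}
print(generated == LISTED, len(generated))  # True 27
```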
- In some embodiments of the invention, the sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 3 to about 26 amino acids in length.
- In certain embodiments of the invention, the sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 7 to about 23 amino acids in length.
- In some embodiments of the invention, the library comprises about 10^7 to about 10^10 sequences.
- In certain embodiments of the invention, the library comprises about 10^7 sequences.
- In some embodiments of the invention, the polynucleotide sequences of the libraries further comprise a 5′ polynucleotide sequence encoding a framework 3 (FRM3) region on the corresponding N-terminal end of the library sequence, wherein the FRM3 region comprises a sequence of about 1 to about 9 amino acid residues.
- In certain embodiments of the invention, the FRM3 region comprises a sequence selected from the group consisting of CAR, CAK, and CAT.
- In some embodiments of the invention, the polynucleotide sequences further comprise a 3′ polynucleotide sequence encoding a framework 4 (FRM4) region on the corresponding C-terminal end of the library sequence, wherein the FRM4 region comprises a sequence of about 1 to about 9 amino acid residues.
- In certain embodiments of the invention, the library comprises a FRM4 region comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).
- In some embodiments of the invention, the polynucleotide sequences further comprise an FRM3 region coding for a corresponding polypeptide sequence comprising a sequence selected from the group consisting of CAR, CAK, and CAT; and an FRM4 region coding for a corresponding polypeptide sequence comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).
- In certain embodiments of the invention, the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a heavy chain chassis.
- In some embodiments, the invention comprises a synthetic preimmune human antibody light chain library comprising polynucleotide sequences encoding human antibody kappa light chains represented by the formula:
-
[IGKV(1-95)][F/L/I/R/W/Y][JK].
- In certain embodiments of the invention, [IGKV(1-95)] is selected from the group consisting of IGKV3-20 (SEQ ID NO: 237) (1-95), IGKV1-39 (SEQ ID NO: 233) (1-95), IGKV3-11 (SEQ ID NO: 235) (1-95), IGKV3-15 (SEQ ID NO: 236) (1-95), IGKV1-05 (SEQ ID NO: 229) (1-95), IGKV4-01 (1-95), IGKV2-28 (SEQ ID NO: 234) (1-95), IGKV1-33 (1-95), IGKV1-09 (SEQ ID NO: 454) (1-95), IGKV1-12 (SEQ ID NO: 230) (1-95), IGKV2-30 (SEQ ID NO: 467) (1-95), IGKV1-27 (SEQ ID NO: 231) (1-95), IGKV1-16 (SEQ ID NO: 456) (1-95), and truncations of said group up to and including position 95 according to Kabat.
- In some embodiments of the invention, [F/L/I/R/W/Y] is an amino acid selected from the group consisting of F, L, I, R, W, and Y.
- In certain embodiments of the invention, [JK] comprises a sequence selected from the group consisting of TFGQGTKVEIK (SEQ ID NO: 528) and TFGGGT (SEQ ID NO: 529).
- In some embodiments of the invention, the light chain library comprises a kappa light chain library.
- In certain embodiments of the invention, the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a light chain chassis.
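The kappa formula is a Cartesian product of its three components. The sketch below is a minimal illustration, assuming that only germline names (not sequences) are tracked and that the optional V-region truncations are ignored; the function name `kappa_designs` is hypothetical. It counts the designs obtained when every listed IGKV gene, junction residue and JK segment is combined.

```python
from itertools import product

# [IGKV(1-95)] options named in the text; only names are used here because
# the V-region sequences themselves are not reproduced in this section.
IGKV_GENES = [
    "IGKV3-20", "IGKV1-39", "IGKV3-11", "IGKV3-15", "IGKV1-05",
    "IGKV4-01", "IGKV2-28", "IGKV1-33", "IGKV1-09", "IGKV1-12",
    "IGKV2-30", "IGKV1-27", "IGKV1-16",
]
JUNCTION_RESIDUES = ["F", "L", "I", "R", "W", "Y"]  # the [F/L/I/R/W/Y] position
JK_SEGMENTS = ["TFGQGTKVEIK", "TFGGGT"]  # SEQ ID NOs: 528 and 529


def kappa_designs(v_genes, residues, jk_segments):
    """Enumerate every [IGKV(1-95)][F/L/I/R/W/Y][JK] combination.

    Truncations of the V region are ignored: each V gene counts once.
    """
    return list(product(v_genes, residues, jk_segments))


designs = kappa_designs(IGKV_GENES, JUNCTION_RESIDUES, JK_SEGMENTS)
print(len(designs))  # 156 = 13 V genes x 6 residues x 2 JK segments
```

Each tuple names one light chain design; substituting real V-region sequences truncated at Kabat position 95 would turn the tuples into full sequences.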
- In some embodiments, the invention comprises a method for producing a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10⁸ polynucleotide sequences, said method comprising:
- a) selecting the CDRH3 polynucleotide sequences encoded by the CDRH3 sequences, as follows:
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by
- {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
- {all possible N-terminal truncations of IGHJ, down to DXWG, wherein X is S, V, L, or Y}; and
- b) synthesizing the CDRH3 library described in a) by chemical synthesis, wherein a synthetic preimmune human antibody CDRH3 library is produced.
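The CDRH3 design is a concatenation of four blocks: N-additions, a truncated IGHD peptide, more N-additions, and an N-terminally truncated IGHJ peptide ending in DXWG. The sketch below enumerates such a library under stated assumptions: the D and J peptides and the N-addition alphabet are hypothetical toy inputs, D-segment pieces are required to be non-empty, and the helper names are illustrative only. The text allows N-additions of up to 5 residues; the example uses 1 to keep the output small.

```python
import re
from itertools import product


def n_additions(alphabet, max_len):
    """All N-addition peptides of length 0..max_len built from `alphabet`."""
    peptides = []
    for length in range(max_len + 1):
        peptides.extend("".join(p) for p in product(alphabet, repeat=length))
    return peptides


def d_truncations(d_segment):
    """Every non-empty piece of an IGHD-derived peptide left after trimming
    residues from its N-terminus, its C-terminus, both, or neither."""
    n = len(d_segment)
    return sorted({d_segment[i:j] for i in range(n) for j in range(i + 1, n + 1)})


def j_truncations(j_segment):
    """N-terminal truncations of an IGHJ-derived peptide, down to DXWG
    with X in {S, V, L, Y}."""
    match = re.search(r"D[SVLY]WG", j_segment)
    if match is None:
        raise ValueError("IGHJ peptide lacks a DXWG motif")
    return [j_segment[k:] for k in range(match.start() + 1)]


def cdrh3_library(d_segments, j_segments, n_alphabet, max_n_len):
    """Distinct N1-D-N2-J sequences; identical products arising from
    different combinations are counted once."""
    n_peptides = n_additions(n_alphabet, max_n_len)
    d_pieces = {p for d in d_segments for p in d_truncations(d)}
    j_pieces = {p for j in j_segments for p in j_truncations(j)}
    return {n1 + d + n2 + j
            for n1, d, n2, j in product(n_peptides, d_pieces, n_peptides, j_pieces)}


# Toy inputs: hypothetical D and J peptides and a 3-letter N alphabet.
library = cdrh3_library(["GSG"], ["YFDYWG"], ["G", "S", "R"], max_n_len=1)
print(j_truncations("YFDYWG"))  # ['YFDYWG', 'FDYWG', 'DYWG']
print(len(library))
```

Collecting results in a set matters: different N/D/N/J combinations can spell the same peptide, so the number of distinct sequences is smaller than the product of the block counts.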
- In certain embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10¹⁰ polynucleotide sequences representative of known human IGHD and IGHJ germline sequences encoding CDRH3, represented by the following formula:
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
- {all possible N-terminal truncations of IGHJ, down to DXWG (SEQ ID NO: 530), wherein X is S, V, L, or Y}.
- In certain embodiments, the invention comprises a synthetic preimmune human antibody heavy chain variable domain library comprising 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:
- a) an antibody heavy chain chassis, and
- b) a CDRH3 repertoire designed based on the human IGHD and IGHJ germline sequences, as follows:
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by
- {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
- {all possible N-terminal truncations of IGHJ, down to DXWG (SEQ ID NO: 530), wherein X is S, V, L, or Y}.
- In some embodiments of the invention, the synthetic preimmune human antibody heavy chain variable domain library is expressed as a full length chain selected from the group consisting of an IgG1 full length chain, an IgG2 full length chain, an IgG3 full length chain, and an IgG4 full length chain.
- In certain embodiments of the invention, the human antibody heavy chain chassis is selected from the group consisting of IGHV4-34 (SEQ ID NO: 35), IGHV3-23 (SEQ ID NO: 30), IGHV5-51 (SEQ ID NO: 40), IGHV1-69 (SEQ ID NO: 27), IGHV3-30 (SEQ ID NO: 31), IGHV4-39 (SEQ ID NO: 36), IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV2-5 (SEQ ID NO: 429), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV6-1 (SEQ ID NO: 449), IGHV1-46 (SEQ ID NO: 26), IGHV3-33 (SEQ ID NO: 32), IGHV4-31 (SEQ ID NO: 34), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-61 (SEQ ID NO: 38), and IGHV3-15 (SEQ ID NO: 29).
- In some embodiments of the invention, the synthetic preimmune human antibody heavy chain variable domain library comprises 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:
- a) an antibody heavy chain chassis, and
- b) a synthetic preimmune human antibody CDRH3 library.
- In some embodiments of the invention, the polynucleotide sequences are single-stranded coding polynucleotide sequences.
- In certain embodiments of the invention, the polynucleotide sequences are single-stranded non-coding polynucleotide sequences.
- In some embodiments of the invention, the polynucleotide sequences are double-stranded polynucleotide sequences.
- In certain embodiments, the invention comprises a population of replicable cells with a doubling time of four hours or less, in which a synthetic preimmune human antibody repertoire is expressed.
- In some embodiments of the invention, the population of replicable cells are yeast cells.
- In certain embodiments, the invention comprises a method of generating a full-length antibody library comprising transforming a cell with a preimmune human antibody heavy chain variable domain library and a synthetic preimmune human antibody light chain library.
- In certain embodiments, the invention comprises a method of generating an antibody library comprising synthesizing polynucleotide sequences by split-pool DNA synthesis.
- In some embodiments of the invention, the polynucleotide sequences are selected from the group consisting of single-stranded coding polynucleotide sequences, single-stranded non-coding polynucleotide sequences, and double-stranded polynucleotide sequences.
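Split-pool (split-and-mix) synthesis builds a combinatorial library by repeatedly dividing a pool of supports, coupling a different building block in each portion, and recombining. The sketch below is a minimal simulation of that general technique, not the protocol used here: it assumes codon-sized (trinucleotide) building blocks, models the split as a random assignment rather than equal aliquots, and uses hypothetical names (`split_pool`, `STEPS`).

```python
import random
from itertools import product


def split_pool(blocks_per_step, n_beads, seed=0):
    """Simulate split-pool synthesis on `n_beads` solid supports.

    At each coupling step the pooled beads are divided among one aliquot per
    building block (modelled here as a random assignment), every bead in an
    aliquot is extended with that block, and the aliquots are pooled again.
    Each bead therefore carries exactly one full-length product.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for blocks in blocks_per_step:
        beads = [bead + rng.choice(blocks) for bead in beads]
    return beads


# Hypothetical codon (trinucleotide) building blocks for three positions.
STEPS = [["GGT", "TCT"], ["GAT", "TAC", "CGT"], ["TGG"]]
beads = split_pool(STEPS, n_beads=1000)
possible = {"".join(p) for p in product(*STEPS)}
print(len(possible))    # 6 = 2 x 3 x 1 possible products
print(len(set(beads)))  # distinct products actually made
```

The number of possible products is the product of the block counts per step, while the number actually realized is limited by the number of supports; this is the same sampling question addressed by the 3N rule for physical library size.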
- In certain embodiments, the invention comprises a synthetic full-length preimmune human antibody library comprising about 10⁷ to about 10¹⁰ polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.
- In certain embodiments, the invention comprises a method of selecting an antibody of interest from a human antibody library, comprising providing a synthetic preimmune human antibody CDRH3 library comprising a theoretical diversity of N polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences, wherein the physical realization of that diversity is an actual library of a size of at least 3N, thereby providing a 95% probability that a single antibody of interest is present in the library, and selecting an antibody of interest.
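The 3N rule follows from sampling statistics: if a library of L clones is drawn independently and uniformly from N equally likely designs, a given design is absent with probability (1 - 1/N)^L, approximately e^(-L/N), so L = 3N gives about 1 - e^(-3), or 0.95. The sketch below assumes equiprobable, independently sampled designs, which real libraries only approximate; the function names are illustrative.

```python
import math


def coverage_probability(n_designs, library_size):
    """P(a given design is present at least once) when `library_size` clones
    are drawn independently and uniformly from `n_designs` designs."""
    # 1 - (1 - 1/N)**L, written with log1p/expm1 to stay accurate for large N.
    return -math.expm1(library_size * math.log1p(-1.0 / n_designs))


def oversampling_factor(target_probability):
    """Multiple of N needed for the target probability, using the Poisson
    approximation 1 - exp(-L/N)."""
    return -math.log1p(-target_probability)


N = 10**7
print(round(coverage_probability(N, 3 * N), 4))  # 0.9502
print(round(oversampling_factor(0.95), 3))       # 2.996
```

The same functions show the cost of higher confidence: about 4.6N clones are needed for a 99% probability.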
- In some embodiments of the invention, the theoretical diversity is about 10⁷ to about 10⁸ polynucleotide sequences.
- This invention is further illustrated by the following examples which should not be construed as limiting. The contents of all references, patents and published patent applications cited throughout this application are hereby incorporated by reference.
- In general, the practice of the present invention employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology (especially, e.g., antibody technology), expression systems (e.g., yeast expression, cell-free expression, phage display, ribosome display, and PROFUSION™), and any necessary cell culture that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D. N. Glover, Ed. 1985); Oligonucleotide Synthesis (M. J. Gait, Ed. 1984); PCR Handbook Current Protocols in Nucleic Acid Chemistry, Beaucage, Ed. John Wiley & Sons (1999) (Editor); Oxford Handbook of Nucleic Acid Structure, Neidle, Ed., Oxford Univ Press (1999); PCR Protocols: A Guide to Methods and Applications, Innis et al., Academic Press (1990); PCR Essential Techniques: Essential Techniques, Burke, Ed., John Wiley & Son Ltd (1996); The PCR Technique: RT-PCR, Siebert, Ed., Eaton Pub. Co. (1998); Antibody Engineering Protocols (Methods in Molecular Biology), 510, Paul, S., Humana Pr (1996); Antibody Engineering: A Practical Approach (Practical Approach Series, 169), McCafferty, Ed., Irl Pr (1996); Antibodies: A Laboratory Manual, Harlow et al., C.S.H.L. Press, Pub. (1999); Current Protocols in Molecular Biology, eds. Ausubel et al., John Wiley & Sons (1992); Large-Scale Mammalian Cell Culture Technology, Lubiniecki, A., Ed., Marcel Dekker, Pub., (1990); Phage Display: A Laboratory Manual, C. Barbas (Ed.), CSHL Press, (2001); Antibody Phage Display, P O'Brien (Ed.), Humana Press (2001); Boder et al., Nature Biotechnology, 1997, 15: 553; Boder et al., Methods Enzymol., 2000, 328: 430; ribosome display as described by Pluckthun et al. in U.S. Pat. No. 6,348,315; Profusion™ as described by Szostak et al. in U.S. Pat. Nos. 6,258,558, 6,261,804, and 6,214,553; and bacterial periplasmic expression as described in US20040058403A1. Each of the references cited in this paragraph is incorporated by reference in its entirety.
- Further details regarding antibody sequence analysis using Kabat conventions and programs to screen aligned nucleotide and amino acid sequences may be found, e.g., in Johnson et al., Methods Mol. Biol., 2004, 248: 11; Johnson et al., Int. Immunol., 1998, 10: 1801; Johnson et al., Methods Mol. Biol., 1995, 51: 1; Wu et al., Proteins, 1993, 16: 1; and Martin, Proteins, 1996, 25: 130. Each of the references cited in this paragraph is incorporated by reference in its entirety.
- Further details regarding antibody sequence analysis using Chothia conventions may be found, e.g., in Chothia et al., J. Mol. Biol., 1998, 278: 457; Morea et al., Biophys. Chem., 1997, 68: 9; Morea et al., J. Mol. Biol., 1998, 275: 269; Al-Lazikani et al., J. Mol. Biol., 1997, 273: 927; Barre et al., Nat. Struct. Biol., 1994, 1: 915; Chothia et al., J. Mol. Biol., 1992, 227: 799; Chothia et al., Nature, 1989, 342: 877; and Chothia et al., J. Mol. Biol., 1987, 196: 901. Further analysis of CDRH3 conformation may be found in Shirai et al., FEBS Lett., 1999, 455: 188 and Shirai et al., FEBS Lett., 1996, 399: 1. Further details regarding Chothia analysis are described, for example, in Chothia et al., Cold Spring Harb. Symp. Quant. Biol., 1987, 52: 399. Each of the references cited in this paragraph is incorporated by reference in its entirety.
- Further details regarding CDR contact considerations are described, for example, in MacCallum et al., J. Mol. Biol., 1996, 262: 732, incorporated by reference in its entirety.
- Further details regarding the antibody sequences and databases referred to herein are found, e.g., in Tomlinson et al., J. Mol. Biol., 1992, 227: 776; VBASE2 (Retter et al., Nucleic Acids Res., 2005, 33: D671); BLAST (www.ncbi.nlm.nih.gov/BLAST/); CD-HIT (bioinformatics.ljcrf.edu/cd-hi/); EMBOSS (www.hgmp.mrc.ac.uk/Software/EMBOSS/); PHYLIP (evolution.genetics.washington.edu/phylip.html); and FASTA (fasta.bioch.virginia.edu). Each of the references cited in this paragraph is incorporated by reference in its entirety.
- This example demonstrates the selection and design of exemplary, non-limiting VH chassis sequences of the invention. VH chassis sequences were selected by examining collections of human IGHV germline sequences (Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Tomlinson et al., J. Mol. Biol., 1992, 227: 776; Matsuda et al., J. Exp. Med., 1998, 188: 2151, each incorporated by reference in its entirety). As discussed in the Detailed Description, as well as below, a variety of criteria can be used to select VH chassis sequences, from these data sources or others, for inclusion in the library.
- Table 3 (adapted from information provided in Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Matsuda et al., J. Exp. Med., 1998, 188: 2151; and Wang et al., Immunol. Cell. Biol., 2008, 86: 111, each incorporated by reference in its entirety) lists the CDRH1 and CDRH2 lengths, the canonical structures, and the estimated relative occurrence in peripheral blood for the proteins encoded by each of the human IGHV germline sequences.
TABLE 3 IGHV Characteristics and Occurrence in Antibodies from Peripheral Blood

| IGHV Germline | Length of CDRH1 | Length of CDRH2 | Canonical Structures¹ | Estimated Relative Occurrence in Peripheral Blood² |
|---|---|---|---|---|
| IGHV1-2 | 5 | 17 | 1-3 | 37 |
| IGHV1-3 | 5 | 17 | 1-3 | 15 |
| IGHV1-8 | 5 | 17 | 1-3 | 13 |
| IGHV1-18 | 5 | 17 | 1-2 | 25 |
| IGHV1-24 | 5 | 17 | 1-U | 5 |
| IGHV1-45 | 5 | 17 | 1-3 | 0 |
| IGHV1-46 | 5 | 17 | 1-3 | 25 |
| IGHV1-58 | 5 | 17 | 1-3 | 2 |
| IGHV1-69 | 5 | 17 | 1-2 | 58 |
| IGHV2-5 | 7 | 16 | 3-1 | 10 |
| IGHV2-26 | 7 | 16 | 3-1 | 9 |
| IGHV2-70 | 7 | 16 | 3-1 | 13 |
| IGHV3-7 | 5 | 17 | 1-3 | 26 |
| IGHV3-9 | 5 | 17 | 1-3 | 15 |
| IGHV3-11 | 5 | 17 | 1-3 | 13 |
| IGHV3-13 | 5 | 16 | 1-1 | 3 |
| IGHV3-15 | 5 | 19 | 1-4 | 14 |
| IGHV3-20 | 5 | 17 | 1-3 | 3 |
| IGHV3-21 | 5 | 17 | 1-3 | 19 |
| IGHV3-23 | 5 | 17 | 1-3 | 80 |
| IGHV3-30 | 5 | 17 | 1-3 | 67 |
| IGHV3-33 | 5 | 17 | 1-3 | 28 |
| IGHV3-43 | 5 | 17 | 1-3 | 2 |
| IGHV3-48 | 5 | 17 | 1-3 | 21 |
| IGHV3-49 | 5 | 19 | 1-U | 8 |
| IGHV3-53 | 5 | 16 | 1-1 | 7 |
| IGHV3-64 | 5 | 17 | 1-3 | 2 |
| IGHV3-66 | 5 | 17 | 1-3 | 3 |
| IGHV3-72 | 5 | 19 | 1-4 | 2 |
| IGHV3-73 | 5 | 19 | 1-4 | 3 |
| IGHV3-74 | 5 | 17 | 1-3 | 14 |
| IGHV4-4 | 5 | 16 | 1-1 | 33 |
| IGHV4-28 | 6 | 16 | 2-1 | 1 |
| IGHV4-31 | 7 | 16 | 3-1 | 25 |
| IGHV4-34 | 5 | 16 | 1-1 | 125 |
| IGHV4-39 | 7 | 16 | 3-1 | 63 |
| IGHV4-59 | 5 | 16 | 1-1 | 51 |
| IGHV4-61 | 7 | 16 | 3-1 | 23 |
| IGHV4-B | 6 | 16 | 2-1 | 7 |
| IGHV5-51 | 5 | 17 | 1-2 | 52 |
| IGHV6-1 | 7 | 18 | 3-5 | 26 |
| IGHV7-4-1 | 5 | 17 | 1-2 | 8 |

¹Adapted from Chothia et al., J. Mol. Biol., 1992, 227: 799.
²Adapted from Table S1 of Wang et al., Immunol. Cell. Biol., 2008, 86: 111.

- In the currently exemplified library, 17 germline sequences were chosen for representation in the VH chassis of the library (Table 4). As described in more detail below, these sequences were selected based on their relatively high representation in the peripheral blood of adults, with consideration given to the structural diversity of the chassis and the representation of particular germline sequences in antibodies used in the clinic. These 17 sequences account for about 76% of the total sample of heavy chain sequences used to derive the results of Table 4.
As outlined in the Detailed Description, these criteria are non-limiting, and one of ordinary skill in the art will readily recognize that a variety of other criteria can be used to select the VH chassis sequences, and that the invention is not limited to a library comprising the 17 VH chassis genes presented in Table 4.
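The coverage figures quoted for the chassis selection can be recomputed from the Table 3 occurrences. The sketch below, with hypothetical helper names, assumes that "coverage" means the share of the summed relative occurrences captured by the selected germlines, overall or within one IGHV family; under that assumption it reproduces the 76% figure above and the family-level percentages given in the comments of Table 4.

```python
# Estimated relative occurrence in peripheral blood, transcribed from Table 3.
OCCURRENCE = {
    "IGHV1-2": 37, "IGHV1-3": 15, "IGHV1-8": 13, "IGHV1-18": 25,
    "IGHV1-24": 5, "IGHV1-45": 0, "IGHV1-46": 25, "IGHV1-58": 2,
    "IGHV1-69": 58, "IGHV2-5": 10, "IGHV2-26": 9, "IGHV2-70": 13,
    "IGHV3-7": 26, "IGHV3-9": 15, "IGHV3-11": 13, "IGHV3-13": 3,
    "IGHV3-15": 14, "IGHV3-20": 3, "IGHV3-21": 19, "IGHV3-23": 80,
    "IGHV3-30": 67, "IGHV3-33": 28, "IGHV3-43": 2, "IGHV3-48": 21,
    "IGHV3-49": 8, "IGHV3-53": 7, "IGHV3-64": 2, "IGHV3-66": 3,
    "IGHV3-72": 2, "IGHV3-73": 3, "IGHV3-74": 14, "IGHV4-4": 33,
    "IGHV4-28": 1, "IGHV4-31": 25, "IGHV4-34": 125, "IGHV4-39": 63,
    "IGHV4-59": 51, "IGHV4-61": 23, "IGHV4-B": 7, "IGHV5-51": 52,
    "IGHV6-1": 26, "IGHV7-4-1": 8,
}

# The 17 germlines represented in the VH chassis (Table 4).
SELECTED = [
    "IGHV1-2", "IGHV1-18", "IGHV1-46", "IGHV1-69",
    "IGHV3-7", "IGHV3-15", "IGHV3-23", "IGHV3-30", "IGHV3-33", "IGHV3-48",
    "IGHV4-31", "IGHV4-34", "IGHV4-39", "IGHV4-59", "IGHV4-61", "IGHV4-B",
    "IGHV5-51",
]


def family(gene):
    """IGHV family prefix, e.g. 'IGHV4' for 'IGHV4-B' or 'IGHV7' for 'IGHV7-4-1'."""
    return gene.split("-")[0]


def coverage(selected, occurrence, fam=None):
    """Share of the sampled occurrence captured by `selected`, optionally
    restricted to a single IGHV family."""
    in_scope = [g for g in occurrence if fam is None or family(g) == fam]
    total = sum(occurrence[g] for g in in_scope)
    covered = sum(occurrence[g] for g in in_scope if g in selected)
    return covered / total


print(f"all families: {coverage(SELECTED, OCCURRENCE):.2f}")  # 0.76
for fam in ("IGHV1", "IGHV3", "IGHV4"):
    print(f"{fam}: {coverage(SELECTED, OCCURRENCE, fam):.2f}")  # 0.81, 0.72, 0.90
```

The selected germlines contribute 727 of the 956 summed occurrences; within families the values match "about 80%" (VH1), "about 70%" (VH3) and "close to 90%" (VH4).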
TABLE 4 VH Chassis Selected for Use in the Exemplary Library

| VH Chassis | Relative Occurrence | Length of CDRH1 | Length of CDRH2 | Comment |
|---|---|---|---|---|
| VH1-2 | 37 | 5 | 17 | Among highest usage for VH1 family |
| VH1-18 | 25 | 5 | 17 | Among highest usage for VH1 family |
| VH1-46 | 25 | 5 | 17 | Among highest usage for VH1 family |
| VH1-69 | 58 | 5 | 17 | Highest usage for VH1 family. The four chosen VH1 chassis represent about 80% of the VH1 repertoire. |
| VH3-7 | 26 | 5 | 17 | Among highest usage in VH3 family |
| VH3-15 | 14 | 5 | 19 | Not among highest usage, but it has a unique structure (H2 of length 19). Highest occurrence among those with such structure. |
| VH3-23 | 80 | 5 | 17 | Highest usage in VH3 family |
| VH3-30 | 67 | 5 | 17 | Among highest usage in VH3 family |
| VH3-33 | 28 | 5 | 17 | Among highest usage in VH3 family |
| VH3-48 | 21 | 5 | 17 | Among highest usage in VH3 family. The six chosen VH3 chassis account for about 70% of the VH3 repertoire. |
| VH4-31 | 25 | 7 | 16 | Among highest usage in VH4 family |
| VH4-34 | 125 | 5 | 16 | Highest usage in VH4 family |
| VH4-39 | 63 | 7 | 16 | Among highest usage in VH4 family |
| VH4-59 | 51 | 5 | 16 | Among highest usage in VH4 family |
| VH4-61 | 23 | 7 | 16 | Among highest usage in VH4 family |
| VH4-B | 7 | 6 | 16 | Not among highest usage in VH4 family, but has a unique structure (H1 of length 6). The six chosen VH4 chassis account for close to 90% of the VH4 family repertoire. |
| VH5-51 | 52 | 5 | 17 | High usage |

- In this particular embodiment of the library, VH chassis derived from sequences in the IGHV2, IGHV6 and IGHV7 germline families were not included. As described in the Detailed Description, this exemplification is not meant to be limiting, as, in some embodiments, it may be desirable to include one or more of these families, particularly as clinical information on antibodies with similar sequences becomes available, to produce libraries with additional diversity that is potentially unexplored, or to study the properties and potential of these IGHV families in greater detail.
The modular design of the library of the present invention readily permits the introduction of these, and other, VH chassis sequences. The amino acid sequences of the VH chassis utilized in this particular embodiment of the library, which are derived from the IGHV germline sequences, are presented in Table 5. The details of the derivation procedures are presented below.
TABLE 5 Amino Acid Sequences for VH Chassis Selected for Inclusion in the Exemplary Library

| Chassis | SEQ ID NO: | FRM1 | CDRH1 | FRM2 | CDRH2 | FRM3 |
|---|---|---|---|---|---|---|
| VH1-2 | 24 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | GYYMH | WVRQAPGQGLEWMG | WINPNSGGTNYAQKFQG | RVTMTRDTSISTAYMELSRLRSDDTAVYYCAR |
| VH1-18 | 25 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | SYGIS | WVRQAPGQGLEWMG | WISAYNGNTNYAQKLQG | RVTMTTDTSTSTAYMELRSLRSDDTAVYYCAR |
| VH1-46 | 26 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | SYYMH | WVRQAPGQGLEWMG | IINPSGGSTSYAQKFQG | RVTMTRDTSTSTVYMELSSLRSEDTAVYYCAR |
| VH1-69 | 27 | QVQLVQSGAEVKKPGSSVKVSCKASGGTFS | SYAIS | WVRQAPGQGLEWMG | GIIPIFGTANYAQKFQG | RVTITADKSTSTAYMELSSLRSEDTAVYYCAR |
| VH3-7 | 28 | EVQLVESGGGLVQPGGSLRLSCAASGFTFS | SYWMS | WVRQAPGKGLEWVA | NIKQDGSEKYYVDSVKG | RFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR |
| VH3-15¹ | 29 | EVQLVESGGGLVKPGGSLRLSCAASGFTFS | NAWMS | WVRQAPGKGLEWVG | RIKSKTDGGTTDYAAPVKG | RFTISRDDSKNTLYLQMNSLRAEDTAVYYCAR |
| VH3-23 | 30 | EVQLLESGGGLVQPGGSLRLSCAASGFTFS | SYAMS | WVRQAPGKGLEWVS | AISGSGGSTYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK |
| VH3-30 | 31 | QVQLVESGGGVVQPGRSLRLSCAASGFTFS | SYGMH | WVRQAPGKGLEWVA | VISYDGSNKYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR |
| VH3-33 | 32 | QVQLVESGGGVVQPGRSLRLSCAASGFTFS | SYGMH | WVRQAPGKGLEWVA | VIWYDGSNKYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR |
| VH3-48 | 33 | EVQLVESGGGLVQPGGSLRLSCAASGFTFS | SYSMN | WVRQAPGKGLEWVS | YISSSSSTIYYADSVKG | RFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR |
| VH4-31 | 34 | QVQLQESGPGLVKPSQTLSLTCTVSGGSIS | SGGYYWS | WIRQHPGKGLEWIG | YIYYSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-34² | 35 | QVQLQQWGAGLLKPSETLSLTCAVYGGSFS | GYYWS | WIRQPPGKGLEWIG | EIDHSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-39 | 36 | QLQLQESGPGLVKPSETLSLTCTVSGGSIS | SSSYYWG | WIRQPPGKGLEWIG | SIYYSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-59 | 37 | QVQLQESGPGLVKPSETLSLTCTVSGGSIS | SYYWS | WIRQPPGKGLEWIG | YIYYSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-61 | 38 | QVQLQESGPGLVKPSETLSLTCTVSGGSVS | SGSYYWS | WIRQPPGKGLEWIG | YIYYSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-B | 39 | QVQLQESGPGLVKPSETLSLTCAVSGYSIS | SGYYWG | WIRQPPGKGLEWIG | SIYHSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH5-51 | 40 | EVQLVQSGAEVKKPGESLKISCKGSGYSFT | SYWIG | WVRQMPGKGLEWMG | IIYPGDSDTRYSPSFQG | QVTISADKSISTAYLQWSSLKASDTAVYYCAR |

¹In VH3-15, the original KT sequence immediately preceding EDTAVYYC in FRM3 was mutated to RA, and the original C-terminal TT of FRM3 was mutated to AR, in order to match other VH3 family members selected for inclusion in the library. The modification to RA was made so that no unique sequence stretches of up to about 20 amino acids are created. Without being bound by theory, this modification is expected to reduce the odds of introducing novel T-cell epitopes in the VH3-15-derived chassis sequence. The avoidance of T-cell epitopes is an additional criterion that can be considered in the design of certain libraries of the invention.
²The original NHS motif in VH4-34 was mutated to DHS, in order to remove a possible N-linked glycosylation site in CDR-H2. In certain embodiments of the invention, for example, if the library is transformed into yeast, this may prevent unwanted N-linked glycosylation.

- Table 5 provides the amino acid sequences of the seventeen chassis. In nucleotide space, most of the corresponding germline nucleotide sequences include two additional nucleotides on the 3′ end (i.e., two-thirds of a codon). In most cases, those two nucleotides are GA. In many cases, nucleotides are added to the 3′ end of the IGHV-derived gene segment in vivo, prior to recombination with the IGHD gene segment. Any single additional nucleotide would make the resulting codon encode one of two amino acids: Asp (if the codon is GAC or GAT) or Glu (if the codon is GAA or GAG). One or both of the two 3′-terminal nucleotides may also be deleted in the final rearranged heavy chain sequence. If only the A is deleted, the resulting amino acid is very frequently a G.
If both nucleotides are deleted, this position is “empty,” but followed by a general V-D addition or an amino acid encoded by the IGHD gene. Further details are presented in Example 5. This first position, after the CAR or CAK motif at the C-terminus of FRM3 (Table 5), is designated the “tail.” In the currently exemplified embodiment of the library, this residue may be G, D, E, or nothing. Thus, adding the tail to any chassis enumerated above (Table 5) can produce one of the following four schematic sequences, wherein the residue following the VH chassis is the tail:
[VH_Chassis]-[G] (1)
[VH_Chassis]-[D] (2)
[VH_Chassis]-[E] (3)
[VH_Chassis] (4)
- These structures can also be represented in the format:
[VH_Chassis]-[G/D/E/-],
wherein the hyphen symbol (-) indicates an empty or null position.
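The tail logic combines a small piece of the genetic code with a four-way choice. The sketch below, with hypothetical helper names, checks that appending any single nucleotide to the 3′ GA of an IGHV segment yields only Asp or Glu, and builds the four [VH_Chassis]-[G/D/E/-] variants for a sequence ending in the CAR motif. Only the GAN codons of the standard genetic code are included.

```python
# Standard genetic code, restricted to the GAN codons relevant to the tail.
CODONS = {"GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E"}
TAILS = ["G", "D", "E", ""]  # "" stands for the empty (null) tail


def tail_after_one_added_nt(v_end="GA"):
    """Residues encoded when one nucleotide is appended to the 3' GA of IGHV."""
    return {CODONS[v_end + nt] for nt in "ACGT"}


def add_tails(chassis):
    """The four [VH_Chassis]-[G/D/E/-] sequences for one chassis."""
    return [chassis + tail for tail in TAILS]


print(sorted(tail_after_one_added_nt()))  # ['D', 'E']
print(add_tails("CAR"))  # ['CARG', 'CARD', 'CARE', 'CAR']
```

In a full library each of the 17 chassis would be passed through `add_tails`, quadrupling the number of chassis-plus-tail combinations.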
- Using the CDRH3 numbering system defined in the Definitions section, sequences (1), (2), and (3) would have amino acid 95 as G, D, or E, respectively, while sequence (4) would have no position 95, and CDRH3 proper would begin at position 96 or 97.
- In some embodiments of the invention, VH3-66, with canonical structure 1-1 (five residues in CDRH1 and 16 in CDRH2), may be included in the library. The inclusion of VH3-66 may compensate for the removal of other chassis from the library that may not express well in yeast under some conditions (e.g., VH4-34 and VH4-59).
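Each chassis in Table 5 is the concatenation of its FRM1, CDRH1, FRM2, CDRH2 and FRM3 segments, and footnote 2 of Table 5 concerns removing an N-linked glycosylation site, which matters for yeast expression. The sketch below, with hypothetical helper names, assembles two chassis and scans them for the N-X-S/T sequon (X not P), the standard consensus motif for N-linked glycosylation. It is an illustrative screen, not the procedure used to design the library.

```python
import re

# FRM1, CDRH1, FRM2, CDRH2, FRM3 for two chassis, transcribed from Table 5.
CHASSIS_SEGMENTS = {
    "VH1-69": ("QVQLVQSGAEVKKPGSSVKVSCKASGGTFS", "SYAIS", "WVRQAPGQGLEWMG",
               "GIIPIFGTANYAQKFQG", "RVTITADKSTSTAYMELSSLRSEDTAVYYCAR"),
    "VH4-34": ("QVQLQQWGAGLLKPSETLSLTCAVYGGSFS", "GYYWS", "WIRQPPGKGLEWIG",
               "EIDHSGSTNYNPSLKS", "RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR"),
}

# N-X-S/T with X != P; the lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def assemble(segments):
    """Concatenate the Table 5 segments into one chassis sequence."""
    return "".join(segments)


def sequon_positions(seq):
    """0-based start positions of N-linked glycosylation sequons in `seq`."""
    return [m.start() for m in SEQUON.finditer(seq)]


for name, segments in CHASSIS_SEGMENTS.items():
    chassis = assemble(segments)
    print(name, len(chassis), chassis.endswith(("CAR", "CAK")), sequon_positions(chassis))

# Germline VH4-34 CDRH2 before the N -> D change described in footnote 2.
print(sequon_positions("EINHSGSTNYNPSLKS"))  # [2]
```

The same scan flags the CDRH1 variant NYSMN (3-48.1 in Table 8), which carries the footnoted glycosylation site.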
- This example demonstrates the introduction of further diversity into the VH chassis by creating mutations in the CDRH1 and CDRH2 regions of each chassis shown in Example 1. The following approach was used to select the positions and nature of the amino acid variation for each chassis. First, rearranged human heavy chain antibody sequences (Lee et al., Immunogenetics, 2006, 57: 917; Jackson et al., J. Immunol. Methods, 2007, 324: 26) were compared by sequence identity and classified according to the IGHV germline sequence from which each was most likely derived. As an illustrative example, about 200 sequences in the data set exhibited greatest identity to the IGHV1-69 germline, indicating that they were likely to have been derived from IGHV1-69. Next, the occurrence of amino acid residues at each position within the CDRH1 and CDRH2 segments was determined for each germline selected in Example 1. For VH1-69, these occurrences are illustrated in Tables 6 and 7. Finally, neutral and/or smaller amino acid residues were favored, where possible, as replacements. Without being bound by theory, the rationale for the choice of these amino acid residues is the desire to provide a more flexible and less sterically hindered context for the display of a diversity of CDR sequences.
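TABLE 6 Occurrence of Amino Acid Residues at Each Position Within IGHV1-69-derived CDRH1 Sequences

| Amino acid | 31 | 32 | 33 | 34 | 35 |
|---|---|---|---|---|---|
| Germline | S | Y | A | I | S |
| A | 1 | 0 | 129 | 0 | 0 |
| C | 0 | 1 | 0 | 0 | 2 |
| D | 0 | 5 | 1 | 0 | 0 |
| E | 0 | 0 | 0 | 0 | 0 |
| F | 0 | 9 | 1 | 8 | 0 |
| G | 0 | 0 | 24 | 0 | 3 |
| H | 2 | 11 | 0 | 0 | 4 |
| I | 2 | 0 | 0 | 159 | 1 |
| K | 3 | 0 | 0 | 0 | 0 |
| L | 0 | 10 | 2 | 5 | 0 |
| M | 1 | 0 | 0 | 0 | 0 |
| N | 21 | 2 | 2 | 0 | 27 |
| P | 0 | 0 | 1 | 0 | 0 |
| Q | 1 | 1 | 0 | 0 | 5 |
| R | 9 | 0 | 0 | 0 | 1 |
| S | 133 | 3 | 7 | 0 | 129 |
| T | 12 | 1 | 10 | 0 | 12 |
| V | 0 | 0 | 7 | 13 | 0 |
| W | 0 | 0 | 0 | 0 | 0 |
| Y | 0 | 142 | 1 | 0 | 1 |

TABLE 7 Occurrence of Amino Acid Residues at Each Position Within IGHV1-69-derived CDRH2 Sequences

| Amino acid | 50 | 51 | 52 | 52A | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Germline | G | I | I | P | I | F | G | T | A | N | Y | A | Q | K | F | Q | G |
| A | 0 | 0 | 7 | 0 | 2 | 0 | 4 | 3 | 132 | 0 | 0 | 178 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D | 1 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 1 | 21 | 0 | 0 | 0 | 2 | 0 | 0 | 12 |
| E | 2 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 0 | 1 | 1 | 4 | 0 | 2 | 0 |
| F | 0 | 1 | 0 | 1 | 7 | 119 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 180 | 0 | 0 |
| G | 135 | 0 | 1 | 0 | 0 | 0 | 155 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 173 |
| H | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 4 | 0 | 3 | 0 | 0 | 4 | 0 |
| I | 0 | 166 | 159 | 0 | 132 | 2 | 0 | 34 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| K | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 5 | 0 | 0 | 2 | 156 | 0 | 3 | 0 |
| L | 0 | 1 | 2 | 0 | 16 | 37 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 0 |
| M | 0 | 6 | 2 | 0 | 9 | 1 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N | 0 | 0 | 1 | 0 | 2 | 0 | 5 | 0 | 0 | 132 | 1 | 0 | 0 | 8 | 0 | 0 | 0 |
| P | 0 | 2 | 0 | 181 | 1 | 3 | 0 | 0 | 15 | 0 | 0 | 3 | 6 | 0 | 0 | 0 | 0 |
| Q | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 173 | 2 | 0 | 164 | 0 |
| R | 44 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 3 | 0 | 0 | 0 | 13 | 0 | 9 | 0 |
| S | 1 | 0 | 1 | 1 | 2 | 6 | 3 | 5 | 8 | 7 | 0 | 2 | 0 | 0 | 1 | 0 | 0 |
| T | 1 | 1 | 7 | 2 | 2 | 1 | 0 | 127 | 15 | 8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 |
| V | 0 | 8 | 5 | 0 | 11 | 4 | 0 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| W | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 0 | 0 | 0 | 176 | 0 | 0 | 0 | 1 | 1 | 0 |

- The original germline sequence is given in the row directly beneath the residue numbers (Kabat system). The entries in the tables indicate the number of times a given amino acid residue (first column) is observed at the indicated CDRH1 (Table 6) or CDRH2 (Table 7) position. For example, at position 33, glycine (G) is observed 24 times in the set of IGHV1-69-based sequences that were examined. Applying the criteria above, variants were constructed with N at position 31, L at position 32 (rather than the slightly more frequent H, which can be charged under some conditions), G and T at position 33, no variant at position 34, and N at position 35, resulting in the following VH1-69 chassis CDRH1 single-amino-acid variant sequences: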
NYAIS (SEQ ID NO: 41)
SLAIS (SEQ ID NO: 42)
SYGIS (SEQ ID NO: 43)
SYTIS (SEQ ID NO: 44)
SYAIN (SEQ ID NO: 45)
- Similarly, the analysis that produced Table 7 provided a basis for choosing the following single-amino-acid variant sequences for VH1-69 chassis CDRH2s:
SIIPIFGTANYAQKFQG (SEQ ID NO: 46)
GIAPIFGTANYAQKFQG (SEQ ID NO: 47)
GIIPILGTANYAQKFQG (SEQ ID NO: 48)
GIIPIFGTASYAQKFQG (SEQ ID NO: 49)
- A similar approach was used to design and construct variants of the other selected chassis; the resulting CDRH1 and CDRH2 variants for each of the exemplary chassis are provided in Table 8. One of ordinary skill in the art will readily recognize that the methods described herein can be applied to create variants of other VH chassis and VL chassis.
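The counts in Table 6 can be turned into a ranked list of candidate substitutions. The sketch below is a hypothetical aid, not the procedure used here: it drops the germline residue, excludes residues that are charged or ionizable under some conditions (D, E, K, R, H) together with Cys (an assumed filter), and ranks the remainder by observed frequency. It reproduces the choices made at positions 31, 32, 33 and 35, but it proposes V at position 34, where the text introduced no variant, so the final selection evidently involved further judgment.

```python
# Table 6 counts (IGHV1-69-derived CDRH1); residues not listed were never observed.
GERMLINE = {31: "S", 32: "Y", 33: "A", 34: "I", 35: "S"}
COUNTS = {
    31: {"A": 1, "H": 2, "I": 2, "K": 3, "M": 1, "N": 21, "Q": 1, "R": 9,
         "S": 133, "T": 12},
    32: {"C": 1, "D": 5, "F": 9, "H": 11, "L": 10, "N": 2, "Q": 1, "S": 3,
         "T": 1, "Y": 142},
    33: {"A": 129, "D": 1, "F": 1, "G": 24, "L": 2, "N": 2, "P": 1, "S": 7,
         "T": 10, "V": 7, "Y": 1},
    34: {"F": 8, "I": 159, "L": 5, "V": 13},
    35: {"C": 2, "G": 3, "H": 4, "I": 1, "N": 27, "Q": 5, "R": 1, "S": 129,
         "T": 12, "Y": 1},
}
EXCLUDED = set("DEKRHC")  # assumed filter: (potentially) charged residues and Cys


def candidates(position, top=2):
    """Most frequent non-germline, non-excluded residues at a Kabat position."""
    germline = GERMLINE[position]
    allowed = [(aa, n) for aa, n in COUNTS[position].items()
               if aa != germline and aa not in EXCLUDED]
    allowed.sort(key=lambda item: (-item[1], item[0]))  # by count, then letter
    return [aa for aa, _ in allowed[:top]]


for pos in sorted(GERMLINE):
    print(pos, GERMLINE[pos], candidates(pos))
```

Each column of the counts sums to 185 sequences, consistent with the "about 200" IGHV1-69-derived sequences described above.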
TABLE 8 VH Chassis Variants

| Chassis | CDRH1 | SEQ ID NO: | CDRH2 | SEQ ID NO: |
|---|---|---|---|---|
| 1-2.0 | GYYMH | 61 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.1 | DYYMH | 62 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.2 | RYYMH | 63 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.3 | GSYMH | 64 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.4 | GYSMH | 65 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.5 | GYYMQ | 66 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.6 | GYYMH | 61 | SINPNSGGTNYAQKFQG | 68 |
| 1-2.7 | GYYMH | 61 | WINPSSGGTNYAQKFQG | 69 |
| 1-2.8 | GYYMH | 61 | WINPNSGGTKYAQKFQG | 70 |
| 1-2.9 | GYYMH | 61 | WINPNSGGTSYAQKFQG | 71 |
| 1-18.0 | SYGIS | 50 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.1 | NYGIS | 51 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.2 | SNGIS | 52 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.3 | SYAIS | 53 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.4 | SYGIT | 54 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.5 | SYGIH | 55 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.6 | SYGIS | 50 | SISAYNGNTNYAQKLQG | 57 |
| 1-18.7 | SYGIS | 50 | WISTYNGNTNYAQKLQG | 58 |
| 1-18.8 | SYGIS | 50 | WISPYNGNTNYAQKLQG | 59 |
| 1-18.9 | SYGIS | 50 | WISAYNGNTYYAQKLQG | 60 |
| 1-46.0 | SYYMH | 72 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.1 | NYYMH | 73 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.2 | SSYMH | 74 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.3 | SYSMH | 75 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.4 | SYYIH | 76 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.5 | SYYMV | 77 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.6 | SYYMS | 78 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.7 | SYYMH | 72 | VINPSGGSTSYAQKFQG | 80 |
| 1-46.8 | SYYMH | 72 | IINPGGGSTSYAQKFQG | 81 |
| 1-46.9 | SYYMH | 72 | IINPSGGSTTYAQKFQG | 82 |
| 1-69.0 | SYAIS | 83 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.1 | NYAIS | 41 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.2 | SLAIS | 42 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.3 | SYGIS | 43 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.4 | SYTIS | 44 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.5 | SYAIN | 45 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.6 | SYAIS | 83 | SIIPIFGTANYAQKFQG | 46 |
| 1-69.7 | SYAIS | 83 | GIAPIFGTANYAQKFQG | 47 |
| 1-69.8 | SYAIS | 83 | GIIPILGTANYAQKFQG | 48 |
| 1-69.9 | SYAIS | 83 | GIIPIFGTASYAQKFQG | 49 |
| 3-7.0 | SYWMS | 140 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.1 | TYWMS | 141 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.2 | NYWMS | 142 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.3 | SSWMS | 143 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.4 | SYGMS | 144 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.5 | SYWMT | 145 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.6 | SYWMS | 140 | SIKQDGSEKYYVDSVKG | 153 |
| 3-7.7 | SYWMS | 140 | NINQDGSEKYYVDSVKG | 154 |
| 3-7.8 | SYWMS | 140 | NIKSDGSEKYYVDSVKG | 155 |
| 3-7.9 | SYWMS | 140 | NIKQDGSEKQYVDSVKG | 156 |
| 3-15.0 | NAWMS | 85 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.1 | KAWMS | 86 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.2 | DAWMS | 87 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.3 | NALMS | 88 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.4 | NAAMS | 89 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.5 | NAWMN | 90 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.6 | NAWMS | 85 | SIKSKTDGGTTDYAAPVKG | 92 |
| 3-15.7 | NAWMS | 85 | RIKSTTDGGTTDYAAPVKG | 93 |
| 3-15.8 | NAWMS | 85 | RIKSKADGGTTDYAAPVKG | 94 |
| 3-15.9 | NAWMS | 85 | RIKSKTDGGTTGYAAPVKG | 95 |
| 3-23.0 | SYAMS | 96 | AISGSGGSTYYADSVKG | 100 |
| 3-23.1 | NYAMS | 97 | AISGSGGSTYYADSVKG | 100 |
| 3-23.2 | TYAMS | 98 | AISGSGGSTYYADSVKG | 100 |
| 3-23.3 | SSAMS | 99 | AISGSGGSTYYADSVKG | 100 |
| 3-23.4 | SYAMS | 96 | GISGSGGSTYYADSVKG | 101 |
| 3-23.5 | SYAMS | 96 | SISGSGGSTYYADSVKG | 102 |
| 3-23.6 | SYAMS | 96 | TISGSGGSTYYADSVKG | 103 |
| 3-23.7 | SYAMS | 96 | VISGSGGSTYYADSVKG | 104 |
| 3-23.8 | SYAMS | 96 | AISASGGSTYYADSVKG | 105 |
| 3-23.9 | SYAMS | 96 | AISGSGGSTSYADSVKG | 106 |
| 3-30.0 | SYGMH | 107 | VISYDGSNKYYADSVKG | 111 |
| 3-30.1 | NYGMH | 108 | VISYDGSNKYYADSVKG | 111 |
| 3-30.2 | SYAMH | 109 | VISYDGSNKYYADSVKG | 111 |
| 3-30.3 | SYGFH | 110 | VISYDGSNKYYADSVKG | 111 |
| 3-30.4 | SYGMH | 107 | FISYDGSNKYYADSVKG | 112 |
| 3-30.5 | SYGMH | 107 | LISYDGSNKYYADSVKG | 113 |
| 3-30.6 | SYGMH | 107 | VISSDGSNKYYADSVKG | 114 |
| 3-30.7 | SYGMH | 107 | VISYDGNNKYYADSVKG | 115 |
| 3-30.8 | SYGMH | 107 | VISYDGSIKYYADSVKG | 116 |
| 3-30.9 | SYGMH | 107 | VISYDGSNQYYADSVKG | 117 |
| 3-33.0 | SYGMH | 118 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.1 | TYGMH | 119 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.2 | NYGMH | 120 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.3 | SSGMH | 121 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.4 | SYAMH | 122 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.5 | SYGMN | 123 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.6 | SYGMH | 118 | LIWYDGSNKYYADSVKG | 125 |
| 3-33.7 | SYGMH | 118 | FIWYDGSNKYYADSVKG | 126 |
| 3-33.8 | SYGMH | 118 | VIWYDGSNKSYADSVKG | 127 |
| 3-33.9 | SYGMH | 118 | VIWYDGSNKGYADSVKG | 128 |
| 3-48.0 | SYSMN | 129 | YISSSSSTIYYADSVKG | 136 |
| 3-48.1¹ | NYSMN | 130 | YISSSSSTIYYADSVKG | 136 |
| 3-48.2 | IYSMN | 131 | YISSSSSTIYYADSVKG | 136 |
| 3-48.3 | SNSMN | 132 | YISSSSSTIYYADSVKG | 136 |
| 3-48.4 | SYEMN | 133 | YISSSSSTIYYADSVKG | 136 |
| 3-48.5 | SYNMN | 134 | YISSSSSTIYYADSVKG | 136 |
| 3-48.6 | SYSMT | 135 | YISSSSSTIYYADSVKG | 136 |
| 3-48.7 | SYSMN | 129 | TISSSSSTIYYADSVKG | 137 |
| 3-48.8 | SYSMN | 129 | YISGSSSTIYYADSVKG | 138 |
| 3-48.9 | SYSMN | 129 | YISSSSSTILYADSVKG | 139 |
| 4-31.0 | SGGYYWS | 147 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.1 | SGSYYWS | 148 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.2 | SGTYYWS | 149 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.3 | SGGTYWS | 150 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.4 | SGGYSWS | 151 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.5 | SGGYYWS | 147 | SIYYSGSTYYNPSLKS | 158 |
| 4-31.6 | SGGYYWS | 147 | NIYYSGSTYYNPSLKS | 159 |
| 4-31.7 | SGGYYWS | 147 | YIYYSGNTYYNPSLKS | 160 |
| 4-31.8 | SGGYYWS | 147 | YIYYSGSTSYNPSLKS | 161 |
| 4-31.9 | SGGYYWS | 147 | YIYYSGSTVYNPSLKS | 162 |
| 4-34.0 | GYYWS | 163 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.1 | DYYWS | 164 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.2 | GYYWT | 165 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.3 | GYYWS | 163 | DIDHSGSTNYNPSLKS | 167 |
| 4-34.4 | GYYWS | 163 | EISHSGSTNYNPSLKS | 168 |
| 4-34.5 | GYYWS | 163 | EIDQSGSTNYNPSLKS | 169 |
| 4-34.6 | GYYWS | 163 | EIDHGGSTNYNPSLKS | 170 |
| 4-34.7 | GYYWS | 163 | EIDHSGNTNYNPSLKS | 171 |
| 4-34.8 | GYYWS | 163 | EIDHSGSTSYNPSLKS | 172 |
| 4-34.9 | GYYWS | 163 | EIDHSGSTDYNPSLKS | 173 |
| 4-39.0 | SSSYYWG | 174 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.1 | TSSYYWG | 175 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.2 | SNSYYWG | 176 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.3 | SSDYYWG | 177 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.4 | SSNYYWG | 178 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.5 | SSRYYWG | 179 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.6 | SSSYAWG | 180 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.7 | SSSYYWG | 174 | NIYYSGSTYYNPSLKS | 182 |
| 4-39.8 | SSSYYWG | 174 | SISYSGSTYYNPSLKS | 183 |
| 4-39.9 | SSSYYWG | 174 | SIYYSGSTSYNPSLKS | 184 |
| 4-59.0 | SYYWS | 185 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.1 | TYYWS | 186 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.2 | SSYWS | 187 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.3 | SYSWS | 188 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.4 | SYYWS | 185 | FIYYSGSTNYNPSLKS | 190 |
| 4-59.5 | SYYWS | 185 | HIYYSGSTNYNPSLKS | 191 |
| 4-59.6 | SYYWS | 185 | SIYYSGSTNYNPSLKS | 192 |
| 4-59.7 | SYYWS | 185 | YIYSSGSTNYNPSLKS | 193 |
| 4-59.8 | SYYWS | 185 | YIYYSGSTDYNPSLKS | 194 |
| 4-59.9 | SYYWS | 185 | YIYYSGSTTYNPSLKS | 195 |
| 4-61.0 | SGSYYWS | 196 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.1 | SGGYYWS | 197 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.2 | SGNYYWS | 198 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.3 | SGSSYWS | 199 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.4 | SGSYSWS | 200 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.5 | SGSYYWT | 201 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.6 | SGSYYWS | 196 | RIYYSGSTNYNPSLKS | 203 |
| 4-61.7 | SGSYYWS | 196 | SIYYSGSTNYNPSLKS | 204 |
| 4-61.8 | SGSYYWS | 196 | YIYTSGSTNYNPSLKS | 205 |
| 4-61.9 | SGSYYWS | 196 | YIYYSGSTSYNPSLKS | 206 |
| 4-B.0 | SGYYWG | 207 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.1 | SAYYWG | 208 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.2 | SGSYWG | 209 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.3 | SGYNWG | 210 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.4 | SGYYWA | 211 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.5 | SGYYWG | 207 | TIYHSGSTYYNPSLKS | 213 |
| 4-B.6 | SGYYWG | 207 | SSYHSGSTYYNPSLKS | 214 |
| 4-B.7 | SGYYWG | 207 | SIYHSGNTYYNPSLKS | 215 |
| 4-B.8 | SGYYWG | 207 | SIYHSGSTNYNPSLKS | 216 |
| 4-B.9 | SGYYWG | 207 | SIYHSGSTGYNPSLKS | 217 |
| 5-51.0 | SYWIG | 218 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.1 | TYWIG | 219 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.2 | NYWIG | 220 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.3 | SNWIG | 221 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.4 | SYYIG | 222 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.5 | SYWIS | 223 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.6 | SYWIG | 218 | SIYPGDSDTRYSPSFQG | 225 |
| 5-51.7 | SYWIG | 218 | IIYPADSDTRYSPSFQG | 226 |
| 5-51.8 | SYWIG | 218 | IIYPGDSSTRYSPSFQG | 227 |
| 5-51.9 | SYWIG | 218 | IIYPGDSDTTYSPSFQG | 228 |

¹Contains an N-linked glycosylation site, which can be removed, if desired, as described herein.

- As specified in the Detailed Description, other criteria can be used to select which amino acids are to be altered and the identity of the resulting altered sequence. This is true for any heavy chain chassis sequence, or any other sequence of the invention. The approach outlined above is meant for illustrative purposes and is non-limiting.
- This example describes the design of an exemplary VK chassis library. One of ordinary skill in the art will recognize that similar principles may be used to design a Vλ library, or a library containing both VK and Vλ chassis. Design of a Vλ chassis library is presented in Example 4.
- As was done in Example 1 for IGHV germline sequences, the sequence characteristics and occurrence of human IGKV germline sequences in antibodies from peripheral blood were analyzed. The data are presented in Table 9.
-
TABLE 9 IGKV Gene Characteristics and Occurrence in Antibodies from Peripheral Blood
IGKV Gene | Alternative Names | CDRL1 Length | CDRL2 Length | Canonical Structures¹ | Estimated Relative Occurrence in Peripheral Blood²
IGKV1-05 | L12 | 11 | 7 | 2-1-(U) | 69
IGKV1-06 | L11 | 11 | 7 | 2-1-(1) | 14
IGKV1-08 | L9 | 11 | 7 | 2-1-(1) | 9
IGKV1-09 | L8 | 11 | 7 | 2-1-(1) | 24
IGKV1-12 | L5, L19 | 11 | 7 | 2-1-(1) | 32
IGKV1-13 | L4, L18 | 11 | 7 | 2-1-(1) | 13
IGKV1-16 | L1 | 11 | 7 | 2-1-(1) | 15
IGKV1-17 | A30 | 11 | 7 | 2-1-(1) | 34
IGKV1-27 | A20 | 11 | 7 | 2-1-(1) | 27
IGKV1-33 | O8, O18 | 11 | 7 | 2-1-(1) | 43
IGKV1-37 | O14, O4 | 11 | 7 | 2-1-(1) | 3
IGKV1-39 | O2, O12 | 11 | 7 | 2-1-(1) | 147
IGKV1D-16 | L15 | 11 | 7 | 2-1-(1) | 6
IGKV1D-17 | L14 | 11 | 7 | 2-1-(1) | 1
IGKV1D-43 | L23 | 11 | 7 | 2-1-(1) | 1
IGKV1D-8 | L24 | 11 | 7 | 2-1-(1) | 1
IGKV2-24 | A23 | 16 | 7 | 4-1-(1) | 8
IGKV2-28 | A19, A3 | 16 | 7 | 4-1-(1) | 62
IGKV2-29 | A18 | 16 | 7 | 4-1-(1) | 6
IGKV2-30 | A17 | 16 | 7 | 4-1-(1) | 30
IGKV2-40 | O1, O11 | 17 | 7 | 3-1-(1) | 3
IGKV2D-26 | A8 | 16 | 7 | 4-1-(1) | 0
IGKV2D-29 | A2 | 16 | 7 | 4-1-(1) | 20
IGKV2D-30 | A1 | 16 | 7 | 4-1-(1) | 4
IGKV3-11 | L6 | 11 | 7 | 2-1-(1) | 87
IGKV3-15 | L2 | 11 | 7 | 2-1-(1) | 53
IGKV3-20 | A27 | 12 | 7 | 6-1-(1) | 195
IGKV3D-07 | L25 | 12 | 7 | 6-1-(1) | 0
IGKV3D-11 | L20 | 11 | 7 | 2-1-(U) | 0
IGKV3D-20 | A11 | 12 | 7 | 6-1-(1) | 2
IGKV4-1 | B3 | 17 | 7 | 3-1-(1) | 83
IGKV5-2 | B2 | 11 | 7 | 2-1-(1) | 1
IGKV6-21 | A10, A26 | 11 | 7 | 2-1-(1) | 6
IGKV6D-41 | A14 | 11 | 7 | 2-1-(1) | 0
¹ Adapted from Tomlinson et al., EMBO J., 1995, 14: 4628, incorporated by reference in its entirety. The number in parentheses refers to the canonical structure of CDRL3, if one exists, assuming the most common CDRL3 length (see Example 5 for further detail about CDRL3).
² Estimated from sets of human VK sequences compiled from the NCBI database; the full set of GI numbers is provided in Appendix A.
- The 14 most commonly occurring IGKV germline genes (those with the highest values in column 6 of Table 9) account for just over 90% of the usage of the entire repertoire in peripheral blood. Based on the analysis in Table 9, ten IGKV germline genes were selected for representation as chassis in the currently exemplified library (Table 10). All but V1-12 and V1-27 are among the top 10 most commonly occurring. The IGKV germline gene V2-30, which was tenth in terms of occurrence in peripheral blood, was not included in the currently exemplified embodiment of the library. This kept the proportion of chassis with short (i.e., 11 or 12 residues in length) CDRL1 sequences at about 80% in the final set of 10 chassis, and V1-12 was included in its place. Because V1-17 was more similar to other members of the V1 family that had already been selected, V1-27 was included instead of V1-17. In other embodiments, the library could include 12 chassis (e.g., the ten of Table 10 plus V1-17 and V2-30), or a different set of any "N" chassis, chosen strictly by occurrence (Table 9) or by any other criteria. The ten chosen VK chassis account for about 80% of the usage in the data set believed to be representative of the entire kappa light chain repertoire.
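The coverage figures quoted above can be checked against Table 9. The following minimal Python sketch is an illustration only and is not part of the described method. It transcribes the occurrence column of Table 9 and the CDRL1 lengths of the ten Table 10 chassis, then recomputes three values:

- the share of usage held by the 14 most common genes;
- the share held by the ten chosen chassis;
- the fraction of chassis with an 11- or 12-residue CDRL1.

The names `OCCURRENCE`, `CHASSIS_CDRL1` and `coverage` are hypothetical.

```python
# Estimated relative occurrence in peripheral blood (Table 9, last column).
OCCURRENCE = {
    "IGKV1-05": 69, "IGKV1-06": 14, "IGKV1-08": 9, "IGKV1-09": 24,
    "IGKV1-12": 32, "IGKV1-13": 13, "IGKV1-16": 15, "IGKV1-17": 34,
    "IGKV1-27": 27, "IGKV1-33": 43, "IGKV1-37": 3, "IGKV1-39": 147,
    "IGKV1D-16": 6, "IGKV1D-17": 1, "IGKV1D-43": 1, "IGKV1D-8": 1,
    "IGKV2-24": 8, "IGKV2-28": 62, "IGKV2-29": 6, "IGKV2-30": 30,
    "IGKV2-40": 3, "IGKV2D-26": 0, "IGKV2D-29": 20, "IGKV2D-30": 4,
    "IGKV3-11": 87, "IGKV3-15": 53, "IGKV3-20": 195, "IGKV3D-07": 0,
    "IGKV3D-11": 0, "IGKV3D-20": 2, "IGKV4-1": 83, "IGKV5-2": 1,
    "IGKV6-21": 6, "IGKV6D-41": 0,
}

# The ten chassis of Table 10, mapped to their CDRL1 lengths.
CHASSIS_CDRL1 = {
    "IGKV1-05": 11, "IGKV1-12": 11, "IGKV1-27": 11, "IGKV1-33": 11,
    "IGKV1-39": 11, "IGKV2-28": 16, "IGKV3-11": 11, "IGKV3-15": 11,
    "IGKV3-20": 12, "IGKV4-1": 17,
}


def coverage(genes):
    """Fraction of all Table 9 usage contributed by the given genes."""
    return sum(OCCURRENCE[g] for g in genes) / sum(OCCURRENCE.values())


top14 = sorted(OCCURRENCE, key=OCCURRENCE.get, reverse=True)[:14]
# Count chassis whose CDRL1 is "short" (11 or 12 residues).
short_cdrl1 = sum(n in (11, 12) for n in CHASSIS_CDRL1.values()) / len(CHASSIS_CDRL1)

print(f"14 most common genes: {coverage(top14):.1%}")          # 90.7%
print(f"ten chosen chassis:   {coverage(CHASSIS_CDRL1):.1%}")  # 79.9%
print(f"short CDRL1 chassis:  {short_cdrl1:.0%}")              # 80%
```

The tabulated occurrences sum to 999, so the printed fractions correspond to 906/999 and 798/999. These match the "just over 90%" and "about 80%" statements.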
TABLE 10 VK Chassis Selected for Use in the Exemplary Library
Chassis | CDR-L1 Length | CDR-L2 Length | Canonical Structures | Estimated Relative Occurrence in Peripheral Blood
VK1-5 | 11 | 7 | 2-1-(U) | 69
VK1-12 | 11 | 7 | 2-1-(1) | 32
VK1-27 | 11 | 7 | 2-1-(1) | 27
VK1-33 | 11 | 7 | 2-1-(1) | 43
VK1-39 | 11 | 7 | 2-1-(1) | 147
VK2-28 | 16 | 7 | 4-1-(1) | 62
VK3-11 | 11 | 7 | 2-1-(1) | 87
VK3-15 | 11 | 7 | 2-1-(1) | 53
VK3-20 | 12 | 7 | 6-1-(1) | 195
VK4-1 | 17 | 7 | 3-1-(1) | 83
- The amino acid sequences of the selected VK chassis enumerated in Table 10 are provided in Table 11.
-
TABLE 11 Amino Acid Sequences for VK Chassis Selected for Inclusion in the Exemplary Library Chassis FRM1 CDRL1 FRM2 CDRL2 FRM3 CDRL31 SEQ ID NO: VK1-5 DIQMTQS RASQSI WYQQKP DASSLE GVPSRFSGSGSGT QQYNS 229 PSTLSAS SSWLA GKAPKL S EFTLTISSLQPDD YS VGDRVTI LIY LIY TC VK1-12 DIQMTQS RASQGI WYQQKP AASSLQ GVPSRFSGSGSGT QQANS 230 PSSVSAS SSQLA GKAPKL S DFTLTISSLQPED FP VGDRVTI LIY FATYYC TC VK1-27 DIQMTQS RASQGI WYQQKP AASTLQ GVPSRFSGSGSGT QKYNS 231 PSSLSAS SNYLA GVKPKL S DFTLTISSLQPED AP VGDRVTI LIY VATYYC TC VK1-33 DIQMTQS QASQDI WYQQKP DASNLE GVPSRFSGSGSGT QQYDN 232 PSSLSAS SNYLN GKAPKL T DFTLTISSLQPED LP VGDRVTI LIY IATYYC TC VK1-39 DIQMTQS RASQSI WYQQKP AASSLQ GVPSRFSGSGSGT QQSYS 233 PSSLSAS SSYLN GKAPKL S DFTLTISSLQPED TP VGDRVTI LIY FATYYC TC VK2-28 DIVMTQS RSSQSL WYLQKP LGSNRA GVPDRFSGSGSGT MQALQ 234 PLSLPVT LHSNGY GQSPQL S DFTLKISRVEAED TP PGEPASI NYLD LIY VGVYYC SC VK3-11 EIVLTQS RASQSV WYQQKP DASNRA GIPARFSGSGSGT QQRSN 235 PATLSLS SSYLA GQAPRL R DFTLTISSLEPED WP PGERATL LIY FAVYYC SC VK3-15 EIVMTQS EASQSV WYQQKP GASTRA GIPARFSGSGSGT QQYNN 236 PATLSVS SSNLA GQAPRL T EFTLTISSLQSED WP PGERATL LIY FAVYYC SC VK3-20 EIVLTQS RASQSV WYQQKP GASSRA GIPDRFSGSGSGT QQYGS 237 PGTLSLS SSSYLA GQPARL T DFTLTISRLEPED SP PGERATL LIY FAVYYC SC VK4-1 DIVMTQS KSSQSV WYQQKP WASTRE GVPDRFSGSGSGT QQYYS 238 PDSLAVS LYSSNN GQPPKL S DFTLTISSLQAED TP LGERATI KNYLA LIY VAVYYC NC 1Note that the portion of the IGKV gene contributing to VKCDR3 is not considered part of the chassis as described herein. The VK chassis is defined as Kabat residues 1 to 88 of the IGKV-encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the VKCDR3 sequence contributed by the IGKV gene is referred to herein as the L3-KV region. - This example describes the design of an exemplary Vλ chassis library.
As was done in Examples 1-3 for the VH and VK chassis sequences, the sequence characteristics and occurrence of human IGλV germline-derived sequences in peripheral blood were analyzed. As with the assignment of other sequences set forth herein to germline families, assignment of Vλ sequences to a germline family was performed via SoDA and VBASE2 (Volpe and Kepler, Bioinformatics, 2006, 22: 438; Mollova et al., BMC Systems Biology, 2007, 1S: P30, each incorporated by reference in its entirety). The data are presented in Table 12.
-
TABLE 12 IGλV Gene Characteristics and Occurrence in Peripheral Blood
IGλV Gene Name | Alternative Gene Name | Canonical Structures¹ | Estimated Contribution of IGλV Gene to CDRL3 | Relative Occurrence in Peripheral Blood²
IGλV3-1 | 3R | 11-7(*) | 8 | 11.5
IGλV3-21 | 3H | 11-7(*) | 9 | 10.5
IGλV2-14 | 2A2 | 14-7(A) | 9 | 10.1
IGλV1-40 | 1E | 14-7(A) | 9 | 7.7
IGλV3-19 | 3L | 11-7(*) | 9 | 7.6
IGλV1-51 | 1B | 13-7(A) | 9 | 7.4
IGλV1-44 | 1C | 13-7(A) | 9 | 7.0
IGλV6-57 | 6A | 13-7(B) | 7 | 6.1
IGλV2-8 | 2C | 14-7(A) | 9 | 4.7
IGλV3-25 | 3M | 11-7(*) | 9 | 4.6
IGλV2-23 | 2B2 | 14-7(A) | 9 | 4.3
IGλV3-10 | 3P | 11-7(*) | 9 | 3.4
IGλV4-69 | 4B | 12-11(*) | 7 | 3.0
IGλV1-47 | 1G | 13-7(A) | 9 | 2.9
IGλV2-11 | 2E | 14-7(A) | 9 | 1.3
IGλV7-43 | 7A | 14-7(B) | 8 | 1.3
IGλV7-46 | 7B | 14-7(B) | 8 | 1.1
IGλV5-45 | 5C | 14-11(*) | 8 | 1.0
IGλV4-60 | 4A | 12-11(*) | 7 | 0.7
IGλV10-54 | 8A | 14-7(B) | 8 | 0.7
IGλV8-61 | 10A | 13-7(C) | 9 | 0.7
IGλV3-9 | 3J | 11-7(*) | 8 | 0.6
IGλV1-36 | 1A | 13-7(A) | 9 | 0.4
IGλV2-18 | 2D | 14-7(A) | 9 | 0.3
IGλV3-16 | 3A | 11-7(*) | 9 | 0.2
IGλV3-27 | | 11-7(*) | 7 | 0.2
IGλV4-3 | 5A | 14-11(*) | 8 | 0.2
IGλV5-39 | 4C | 12-11(*) | 12 | 0.2
IGλV9-49 | 9A | 12-12(*) | 12 | 0.2
IGλV3-12 | 3I | 11-7(*) | 9 | 0.1
¹ Adapted from Williams et al., J. Mol. Biol., 1996, 264: 220-32. The (*) indicates that the canonical structure is entirely defined by the lengths of CDRs L1 and L2. When distinct structures are possible for identical L1 and L2 length combinations, the structure present in a given gene is set forth as A, B, or C.
² Estimated from a set of human Vλ sequences compiled from the NCBI database; the full set of GI codes is set forth in Appendix B.
- To choose a subset of the sequences from Table 12 to serve as chassis, those represented at less than 1% in peripheral blood (as extrapolated from analysis of published sequences corresponding to the GI codes provided in Appendix B) were first discarded. From the remaining 18 germline sequences, the most frequently occurring gene for each unique combination of canonical structure and CDRL3 contribution, as well as any germline gene represented at more than 5%, was chosen to constitute the exemplary Vλ chassis. The list of 11 such sequences is given in Table 13, below.
These 11 sequences represent approximately 73% of the repertoire in the examined data set (Appendix B).
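The selection rule described above can be written as a short filter. In the following Python sketch, which is an illustration with hypothetical names (`TABLE_12`, `select_chassis`), rows are transcribed from Table 12 and the rule is applied in three steps:

1. Genes below 1% are dropped.
2. For each (canonical structure, CDRL3 contribution) pair, the most frequent remaining gene is kept.
3. Any remaining gene above 5% is added.

On the tabulated data this reproduces the 11 chassis of Table 13 and their combined occurrence of about 73%.

```python
# (gene, canonical structure, CDRL3 contribution, % occurrence), from Table 12.
TABLE_12 = [
    ("IGλV3-1", "11-7(*)", 8, 11.5), ("IGλV3-21", "11-7(*)", 9, 10.5),
    ("IGλV2-14", "14-7(A)", 9, 10.1), ("IGλV1-40", "14-7(A)", 9, 7.7),
    ("IGλV3-19", "11-7(*)", 9, 7.6), ("IGλV1-51", "13-7(A)", 9, 7.4),
    ("IGλV1-44", "13-7(A)", 9, 7.0), ("IGλV6-57", "13-7(B)", 7, 6.1),
    ("IGλV2-8", "14-7(A)", 9, 4.7), ("IGλV3-25", "11-7(*)", 9, 4.6),
    ("IGλV2-23", "14-7(A)", 9, 4.3), ("IGλV3-10", "11-7(*)", 9, 3.4),
    ("IGλV4-69", "12-11(*)", 7, 3.0), ("IGλV1-47", "13-7(A)", 9, 2.9),
    ("IGλV2-11", "14-7(A)", 9, 1.3), ("IGλV7-43", "14-7(B)", 8, 1.3),
    ("IGλV7-46", "14-7(B)", 8, 1.1), ("IGλV5-45", "14-11(*)", 8, 1.0),
    ("IGλV4-60", "12-11(*)", 7, 0.7), ("IGλV10-54", "14-7(B)", 8, 0.7),
    ("IGλV8-61", "13-7(C)", 9, 0.7), ("IGλV3-9", "11-7(*)", 8, 0.6),
    ("IGλV1-36", "13-7(A)", 9, 0.4), ("IGλV2-18", "14-7(A)", 9, 0.3),
    ("IGλV3-16", "11-7(*)", 9, 0.2), ("IGλV3-27", "11-7(*)", 7, 0.2),
    ("IGλV4-3", "14-11(*)", 8, 0.2), ("IGλV5-39", "12-11(*)", 12, 0.2),
    ("IGλV9-49", "12-12(*)", 12, 0.2), ("IGλV3-12", "11-7(*)", 9, 0.1),
]


def select_chassis(rows, floor=1.0, always_keep=5.0):
    """Drop genes under `floor`%, keep the most frequent gene per
    (canonical structure, CDRL3 contribution) class, plus any gene above `always_keep`%."""
    kept = [r for r in rows if r[3] >= floor]
    best = {}
    for row in kept:
        key = (row[1], row[2])
        if key not in best or row[3] > best[key][3]:
            best[key] = row
    chosen = {r[0] for r in best.values()} | {r[0] for r in kept if r[3] > always_keep}
    return [r for r in kept if r[0] in chosen]  # preserves the Table 12 order


chassis = select_chassis(TABLE_12)
print(len(chassis), round(sum(r[3] for r in chassis), 1))  # 11 73.2
```

Genes such as IGλV3-19, IGλV1-40 and IGλV1-44 are not the top member of their structural class. They enter only through the more-than-5% clause, which is why that clause is needed to reach 11 chassis.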
-
TABLE 13 Vλ Chassis Selected for Use in the Exemplary Library
Chassis | CDRL1 Length | CDRL2 Length | Canonical Structure | Relative Occurrence
Vλ3-1 | 11 | 7 | 11-7(*) | 11.5
Vλ3-21 | 11 | 7 | 11-7(*) | 10.5
Vλ2-14 | 14 | 7 | 14-7(A) | 10.1
Vλ1-40 | 14 | 7 | 14-7(A) | 7.7
Vλ3-19 | 11 | 7 | 11-7(*) | 7.6
Vλ1-51 | 13 | 7 | 13-7(A) | 7.4
Vλ1-44 | 13 | 7 | 13-7(A) | 7.0
Vλ6-57 | 13 | 7 | 13-7(B) | 6.1
Vλ4-69 | 12 | 11 | 12-11(*) | 3.0
Vλ7-43 | 14 | 7 | 14-7(B) | 1.3
Vλ5-45 | 14 | 11 | 14-11(*) | 1.0
- The amino acid sequences of the selected Vλ chassis enumerated in Table 13 are provided in Table 14, below.
-
TABLE 14 Amino Acid Sequences for Vλ Chassis Selected for Inclusion in the Exemplary Library
Chassis | FRM1 | CDRL1 | FRM2 | CDRL2 | FRM3 | CDRL3² | SEQ ID NO
Vλ1-40 | QSVLTQPPSVSGAPGQRVTISC | TGSSSNIGAGYD---VH | WYQQLPGTAPKLLIY | GN----SNRPS | GVPDRFSGSKSG--TSASLAITGLQAEDEADYYC | QSYDSSLSG | 531
Vλ1-44 | QSVLTQPPSASGTPGQRVTISC | SGSSSNIGSNT----VN | WYQQKPGTAPKLLIY | SN----NQRPS | GVPDRFSGSKSG--TSASLAISGLQSEDEADYYC | QQWDDSLNG | 532
Vλ1-51 | QSVLTQPPSVSAAPGQKVTISC | SGSSSNIGNNY----VS | WYQQLPGTAPKLLIY | DN----NKRPS | GIPDRFSGSKSG--TSATLGITGLQTGDEADYYC | GTWDSSLSA | 533
Vλ2-14 | QSALTQPASVSGSPGQSITISC | TGTSSDVGGYNY---VS | WYQQHPGKAPKLMIY | EV----SNRPS | GVSNRFSGSKSG--NTASLTISGLQAEDEADYYC | SSYTSSSTL | 534
Vλ3-1¹ | SYELTQPPSVSVSPGQTASITC | SGDKLGDKY------AS | WYQQKPGQSPVLVIY | QD----SKRPS | GIPERFSGSNSG--NTATLTISGTQAMDEADYYC | QAWDSSTA- | 535
Vλ3-19 | SSELTQDPAVSVALGQTVRITC | QGDSLRSYY------AS | WYQQKPGQAPVLVIY | GK----NNRPS | GIPDRFSGSSSG--NTASLTITGAQAEDEADYYC | NSRDSSGNH | 536
Vλ3-21 | SYVLTQPPSVSVAPGKTARITC | GGNNIGSKS------VH | WYQQKPGQAPVLVIY | YD----SDRPS | GIPERFSGSNSG--NTATLTISRVEAGDEADYYC | QVWDSSSDH | 537
Vλ4-69 | QLVLTQSPSASASLGASVKLTC | TLSSGHSSYA-----IA | WHQQQPEKGPRYLMK | LNSDGSHSKGD | GIPDRFSGSSSG--ARTYLTISSLQSEDEADYYC | QTWGTGI-- | 538
Vλ6-57 | NFMLTQPHSVSESPGKTVTISC | TRSSGSIASNY----VQ | WYQQRPGSSPTTVIY | ED----NQRPS | GVPDRFSGSIDSSSNSASLTISGLKTEDEADYYC | QSYDSSN-- | 539
Vλ5-45 | QAVLTQPASLSASPGASASLTC | TLRSGINVGTYR---IY | WYQQKPGSPPQYLLR | YYSDSDKQQGS | GVPSRFSGSKDASANAGILLISGLQSEDEADYYC | MIWHSSAS- | 540
Vλ7-43 | QTVVTQEPSLTVSPGGTVTLTC | ASSTGAVTSGYY---PN | WFQQKPGQAPRALIY | ST----SNKHS | WTPARFSGSLLG--GKAALTLSGVQPEDEAEYYC | LLYYGGAQ- | 541
¹ The last amino acid in CDRL1 of the Vλ3-1 chassis, S, differs from the corresponding one in the IGλV3-1 germline gene, C. This was done to avoid having a potentially unpaired CYS (C) amino acid in the resulting synthetic light chain.
² Note that, as for the VK chassis, the portion of the IGλV gene contributing to VλCDR3 is not considered part of the chassis as described herein. The Vλ chassis is defined as Kabat residues 1 to 88 of the IGλV-encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the VλCDR3 sequence contributed by the IGλV gene is referred to herein as the L3-Vλ region.
- This example describes the design of a CDRH3 library from its individual components. In nature, the CDRH3 sequence is derived from a complex process involving recombination of three different genes, termed IGHV, IGHD and IGHJ. In addition to recombination, these genes may also undergo progressive nucleotide deletions: from the 3′ end of the IGHV gene, from either end of the IGHD gene, and/or from the 5′ end of the IGHJ gene. Non-templated nucleotide additions may also occur at the junctions between the V, D and J sequences. Non-templated additions at the V-D junction are referred to as "N1", and those at the D-J junction are referred to as "N2". The D gene segments may be read in three forward and, in some cases, three reverse reading frames.
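Reading a D gene segment in its three forward frames amounts to translating the nucleotide sequence from offsets 0, 1 and 2. The following Python sketch uses the standard genetic code and writes stop codons as "#", as in Table 16. The IGHD4-17 nucleotide sequence used here (`TGACTACGGTGACTAC`) is an assumption supplied for illustration and is not given in this text. It was chosen because its three forward translations match the IGHD4-17 row of Table 16. Reverse reading frames could be obtained the same way from the reverse complement, which is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(nt, frame):
    """Translate forward reading frame 1, 2 or 3, ignoring a trailing partial codon.
    Stop codons are written as '#', following the convention of Table 16."""
    start = frame - 1
    codons = (nt[i:i + 3] for i in range(start, len(nt) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons).replace("*", "#")


# Assumed germline nucleotide sequence of IGHD4-17 (illustrative, not from the text).
IGHD4_17 = "TGACTACGGTGACTAC"
for rf in (1, 2, 3):
    print(rf, translate_frame(IGHD4_17, rf))
# 1 #LR#L
# 2 DYGDY
# 3 TTVT
```

Because the library design described below treats the codon as its fundamental unit, only one of these frames (here RF 2, DYGDY) is carried forward for a given segment.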
- In the design of the present exemplary library, the codon (nucleotide triplet) or single amino acid was designated as a fundamental unit, to maintain all sequences in the desired reading frame. Thus, all deletions or additions to the gene segments are carried out via the addition or deletion of amino acids or codons, and not single nucleotides. According to the CDRH3 numbering system of this application, CDRH3 extends from amino acid number 95 (when present; see Example 1) to
amino acid 102. - In this illustrative example, selection of DH gene segments for use in the library was performed according to principles similar to those used for the selection of the chassis sequences. First, an analysis of IGHD gene usage was performed, using data from Lee et al., Immunogenetics, 2006, 57: 917; Corbett et al., PNAS, 1982, 79: 4118; and Souto-Carneiro et al., J. Immunol., 2004, 172: 6790 (each incorporated by reference in its entirety), with preference for representation in the library given to those IGHD genes most frequently observed in human sequences. Second, the degree of deletion on either end of the IGHD gene segments was estimated by comparison with known heavy chain sequences, using the SoDA algorithm (Volpe et al., Bioinformatics, 2006, 22: 438, incorporated by reference in its entirety) and sequence alignments. For the presently exemplified library, progressively deleted DH segments as short as three amino acids were included. As enumerated in the Detailed Description, other embodiments of the invention comprise DH segments with deletions to a different length, for example, about 1, 2, 4, 5, 6, 7, 8, 9, or 10 amino acids. Table 15 shows the relative occurrence of IGHD gene usage in human antibody heavy chain sequences isolated mainly from peripheral blood B cells (list adapted from Lee et al., Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety).
-
TABLE 15 Usage of IGHD Genes Based on Relative Occurrence in Peripheral Blood*
IGHD Gene | Estimated Relative Occurrence in Peripheral Blood³
IGHD3-10 | 117
IGHD3-22 | 111
IGHD6-19 | 95
IGHD6-13 | 93
IGHD3-3 | 82
IGHD2-2 | 63
IGHD4-17 | 61
IGHD1-26 | 51
IGHD5-5/5-18¹ | 49
IGHD2-15 | 47
IGHD6-6 | 38
IGHD3-9 | 32
IGHD5-12 | 29
IGHD5-24 | 29
IGHD2-21 | 28
IGHD3-16 | 18
IGHD4-23 | 13
IGHD1-1 | 9
IGHD1-7 | 9
IGHD4-4/4-11² | 7
IGHD1-20 | 6
IGHD7-27 | 6
IGHD2-8 | 4
IGHD6-25 | 3
¹ Although distinct genes in the genome, the nucleotide sequences of IGHD5-5 and IGHD5-18 are 100% identical and thus indistinguishable in rearranged VH sequences.
² IGHD4-4 and IGHD4-11 are also 100% identical.
³ Adapted from Lee et al., Immunogenetics, 2006, 57: 917, by merging the information for distinct alleles of the same IGHD gene.
* IGHD1-14 may also be included in the libraries of the invention.
- The translations of the ten most commonly expressed IGHD gene sequences found in naturally occurring human antibodies, in three reading frames, are shown in Table 16. Those reading frames which occur most commonly in peripheral blood have been highlighted in gray. As in Table 15, data regarding IGHD sequence usage and reading frame statistics were derived from Lee et al., 2006. Data regarding IGHD reading frame usage were further complemented by data from Corbett et al., PNAS, 1982, 79: 4118 and Souto-Carneiro et al., J. Immunol., 2004, 172: 6790, each of which is incorporated by reference in its entirety.
-
TABLE 16 Translations of the Ten Most Common Naturally Occurring IGHD Sequences, in Three Reading Frames (RF)
IGHD | RF 1 | SEQ ID NO | RF 2 | SEQ ID NO | RF 3 | SEQ ID NO
IGHD3-10 | VLLWFGELL | 1 | YYYGSGSYYN | 2 | ITMVRGVII | 3
IGHD3-22 | VLLL###WLLL | 239 | YYYDSSGYYY | 4 | ITMIVVVIT | 240
IGHD6-19 | GYSSGWY | 5 | GIAVAG | 6 | V#QWLV | 241
IGHD6-13 | GYSSSWY | 7 | GIAAAG | 8 | V#QQLV | 242
IGHD3-03 | VLRFLEWLLY | 243 | YYDFWSGYYT | 244 | ITIFGVVII | 9
IGHD2-02 | WIL##YQLLC | 245 | GYCSSTSCYT | 10 | DIVVVPAAM | 11
IGHD4-17 | #LR#L | 246 | DYGDY | 12 | TTVT | 247
IGHD1-26 | GIVGATT | 13 | V#WELL | 248 | YSGSYY | 14
IGHD5-5/5-18 | VDTAMVT | 249 | WIQLWL | 250 | GYSYGY | 15
IGHD2-15 | RIL#WW#LLL | 251 | GYCSGGSCYS | 16 | DIVVVVAAT | 252
# represents a stop codon. Reading frames in bold text correspond to the most commonly used reading frames.
- In the presently exemplified library, the 10 IGHD genes most frequently used in heavy chain sequences occurring in peripheral blood were chosen for representation in the library. Other embodiments of the library could readily utilize more or fewer D genes. The amino acid sequences of the selected IGHD genes, including the most commonly used reading frames and the total number of variants after progressive N- and C-terminal deletion to a minimum of three residues, are listed in Table 17. As depicted in Table 17, only the most commonly occurring reading frames of certain IGHD genes were included in the illustrative library. This is, however, not required, and other embodiments of the invention may utilize IGHD reading frames that occur less frequently in the peripheral blood.
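Choosing "the top 10 IGHD genes" amounts to ranking Table 15 by occurrence. The following Python sketch (hypothetical names, for illustration only) transcribes Table 15, picks the ten highest-ranked genes, and reports their share of the tabulated usage. The ten genes it returns are the ten translated in Table 16.

```python
# Estimated relative occurrence of IGHD genes in peripheral blood (Table 15).
TABLE_15 = {
    "IGHD3-10": 117, "IGHD3-22": 111, "IGHD6-19": 95, "IGHD6-13": 93,
    "IGHD3-3": 82, "IGHD2-2": 63, "IGHD4-17": 61, "IGHD1-26": 51,
    "IGHD5-5/5-18": 49, "IGHD2-15": 47, "IGHD6-6": 38, "IGHD3-9": 32,
    "IGHD5-12": 29, "IGHD5-24": 29, "IGHD2-21": 28, "IGHD3-16": 18,
    "IGHD4-23": 13, "IGHD1-1": 9, "IGHD1-7": 9, "IGHD4-4/4-11": 7,
    "IGHD1-20": 6, "IGHD7-27": 6, "IGHD2-8": 4, "IGHD6-25": 3,
}

# Rank genes by occurrence and keep the ten most frequent.
top10 = sorted(TABLE_15, key=TABLE_15.get, reverse=True)[:10]
share = sum(TABLE_15[g] for g in top10) / sum(TABLE_15.values())
print(top10)
print(f"{share:.1%} of the tabulated IGHD usage")  # 76.9% of the tabulated IGHD usage
```

The occurrence values in Table 15 happen to sum to 1000, so the selected genes cover 769 of 1000 tabulated observations.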
-
TABLE 17 D Genes Selected for Use in the Exemplary Library
IGHD Gene¹ | Amino Acid Sequence | SEQ ID NO | Total Number of Variants²
IGHD1-26_1 | GIVGATT | 13 | 15
IGHD1-26_3 | YSGSYY | 14 | 10
IGHD2-2_2 | GYCSSTSCYT | 10 | 9³
IGHD2-2_3 | DIVVVPAAM | 11 | 28
IGHD2-15_2 | GYCSGGSCYS | 16 | 9
IGHD3-3_3 | ITIFGVVII | 9 | 28
IGHD3-10_1 | VLLWFGELL | 1 | 28
IGHD3-10_2 | YYYGSGSYYN | 2 | 36
IGHD3-10_3 | ITMVRGVII | 3 | 28
IGHD3-22_2 | YYYDSSGYYY | 4 | 36
IGHD4-17_2 | DYGDY | 12 | 6
IGHD5-5_3 | GYSYGY | 15 | 10
IGHD6-13_1 | GYSSSWY | 7 | 15
IGHD6-13_2 | GIAAAG | 8 | 10
IGHD6-19_1 | GYSSGWY | 5 | 15
IGHD6-19_2 | GIAVAG | 6 | 10
¹ The reading frame (RF) is specified as _RF after the name of the gene.
² In most cases the total number of variants is given by (N−1) times (N−2) divided by two, where N is the total length in amino acids of the intact D segment.
³ As detailed herein, the number of variants for segments containing a putative disulfide bond (two C or Cys residues) is limited in this illustrative embodiment.
- For each of the selected sequences of Table 17, variants were generated by systematic deletion from the N- and/or C-termini until three amino acids remained. For example, for IGHD4-17_2 above, the full sequence DYGDY (SEQ ID NO: 12) may be used to generate the progressive deletion variants DYGD (SEQ ID NO: 613), YGDY (SEQ ID NO: 614), DYG, GDY and YGD. In general, a full-length sequence of N amino acids yields a total of (N−1)(N−2)/2 variants, including the original full sequence. For the disulfide-loop-encoding segments, as exemplified by reading frame 2 of both IGHD2-2 and IGHD2-15 (i.e., IGHD2-2_2 and IGHD2-15_2), the progressive deletions were limited so as to leave the loop intact. That is, only amino acids N-terminal to the first Cys, or C-terminal to the second Cys, were deleted in the respective DH segment variants. The foregoing strategy was used to avoid the presence of unpaired cysteine residues in the exemplified version of the library. However, as discussed in the Detailed Description, other embodiments of the library may include unpaired cysteine residues, or the substitution of these cysteine residues with other amino acids. In the cases where the truncation of the IGHD gene is limited by the presence of the Cys residues, only 9 variants (including the original full sequence) were generated. For example, for IGHD2-2_2 the variants would be: GYCSSTSCYT (SEQ ID NO: 10), GYCSSTSCY (SEQ ID NO: 615), YCSSTSCYT (SEQ ID NO: 616), CSSTSCYT (SEQ ID NO: 617), GYCSSTSC (SEQ ID NO: 618), YCSSTSCY (SEQ ID NO: 619), CSSTSCY (SEQ ID NO: 620), YCSSTSC (SEQ ID NO: 621) and CSSTSC (SEQ ID NO: 622).
- According to the criteria outlined above, 293 DH sequences were obtained from the selected IGHD gene segments, including the original IGHD gene segments. Certain sequences are redundant. For example, the YYY variant can be obtained from IGHD3-10_2 (full sequence YYYGSGSYYN (SEQ ID NO: 2)), or in two different ways from IGHD3-22_2 (YYYDSSGYYY (SEQ ID NO: 4)). When redundant sequences are removed, the number of unique DH segment sequences in this illustrative embodiment of the library is 278. These sequences are enumerated in Table 18.
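The deletion scheme above keeps every contiguous stretch of at least three residues. For a segment of N residues this gives the following count:

$$\sum_{L=3}^{N}(N-L+1) = \frac{(N-1)(N-2)}{2}$$

If a segment contains a Cys pair, only stretches spanning both Cys residues are kept.

The following Python sketch applies that rule to the Table 17 sequences. It is an illustration with hypothetical names, not the synthesis scheme itself. It yields the 293 variants and 278 unique sequences stated above. Tallying lengths before redundant sequences are removed reproduces the counts listed in Table 19, which also sum to 293.

```python
from collections import Counter

# Selected IGHD reading frames (Table 17).
SELECTED_DH = {
    "IGHD1-26_1": "GIVGATT", "IGHD1-26_3": "YSGSYY",
    "IGHD2-2_2": "GYCSSTSCYT", "IGHD2-2_3": "DIVVVPAAM",
    "IGHD2-15_2": "GYCSGGSCYS", "IGHD3-3_3": "ITIFGVVII",
    "IGHD3-10_1": "VLLWFGELL", "IGHD3-10_2": "YYYGSGSYYN",
    "IGHD3-10_3": "ITMVRGVII", "IGHD3-22_2": "YYYDSSGYYY",
    "IGHD4-17_2": "DYGDY", "IGHD5-5_3": "GYSYGY",
    "IGHD6-13_1": "GYSSSWY", "IGHD6-13_2": "GIAAAG",
    "IGHD6-19_1": "GYSSGWY", "IGHD6-19_2": "GIAVAG",
}


def deletion_variants(seq, min_len=3):
    """Every N- and/or C-terminal truncation of seq with at least min_len residues.
    If seq contains two or more Cys, only truncations that keep the first and last
    Cys (i.e., leave the putative disulfide loop intact) are returned."""
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    variants = []
    for start in range(len(seq)):
        for end in range(start + min_len, len(seq) + 1):
            if len(cys) >= 2 and not (start <= cys[0] and end > cys[-1]):
                continue  # this truncation would break the Cys-Cys loop
            variants.append(seq[start:end])
    return variants


all_variants = [v for s in SELECTED_DH.values() for v in deletion_variants(s)]
print(len(all_variants), len(set(all_variants)))  # 293 278
print(sorted(Counter(len(v) for v in all_variants).items()))
# [(3, 78), (4, 64), (5, 50), (6, 38), (7, 27), (8, 20), (9, 12), (10, 4)]
```

Running `deletion_variants("DYGDY")` returns the six IGHD4-17_2 variants listed above. For GYCSSTSCYT, three possible start positions times three possible end positions give the nine loop-preserving variants.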
-
TABLE 18 DH Gene Segments Used in the Presently Exemplified Library* DH Segment SEQ DH Segment SEQ ID Designation1 Peptide ID NO: Designation Peptide NO: IGHD1-26_1-1 ATT IGHD3-10_2- YYGSG 713 20 IGHD1-26_1-2 GAT IGHD3-10_2- YYYGS 714 21 IGHD1-26_1-3 GIV IGHD3-10_2- GSGSYY 715 22 IGHD1-26_1-4 IVG IGHD3-10_2- SGSYYN 716 23 IGHD1-26_1-5 VGA IGHD3-10_2- YGSGSY 717 24 IGHD1-26_1-6 GATT 623 IGHD3-10_2- YYGSGS 718 25 IGHD1-26_1-7 GIVG 624 IGHD3-10_2- YYYGSG 719 26 IGHD1-26_1-8 IVGA 625 IGHD3-10_2- GSGSYYN 720 27 IGHD1-26_1-9 VGAT 626 IGHD3-10_2- YGSGSYY 721 28 IGHD1-26_1-10 GIVGA 627 IGHD3-10_2- YYGSGSY 722 29 IGHD1-26_1-11 IVGAT 628 IGHD3-10_2- YYYGSGS 723 30 IGHD1-26_1-12 VGATT 629 IGHD3-10_2- YGSGSYYN 724 31 IGHD1-26_1-13 GIVGAT 630 IGHD3-10_2- YYGSGSYY 725 32 IGHD1-26_1-14 IVGATT 631 IGHD3-10_2- YYYGSGSY 726 33 IGHD1-26_1-15 GIVGATT 13 IGHD3-10_2- YYGSGSYYN 727 34 IGHD1-26_3-1 YSG IGHD3-10_2- YYYGSGSYY 728 35 IGHD1-26_3-2 YSGS 632 IGHD3-10_2- YYYGSGSYYN 2 36 IGHD1-26_3-3 YSGSY 633 IGHD3-10_3-1 GVI IGHD1-26_3-4 YSGSYY 14 IGHD3-10_3-2 ITM IGHD2-02_2-1 CSSTSC 622 IGHD3-10_3-3 MVR IGHD2-02_2-2 CSSTSCY 620 IGHD3-10_3-4 RGV IGHD2-02_2-3 YCSSTSC 621 IGHD3-10_3-5 TMV IGHD2-02_2-4 CSSTSCYT 617 IGHD3-10_3-6 VII IGHD2-02_2-5 GYCSSTSC 618 IGHD3-10_3-7 VRG IGHD2-02_2-6 YCSSTSCY 619 IGHD3-10_3-8 GVII 729 IGHD2-02_2-7 GYCSSTSCY 615 IGHD3-10_3-9 ITMV 730 IGHD2-02_2-8 YCSSTSCYT 616 IGHD3-10_3- MVRG 731 10 IGHD2-02_2-9 GYCSSTSCYT 10 IGHD3-10_3- RGVI 732 11 IGHD2-02_3-1 AAM IGHD3-10_3- TMVR 733 12 IGHD2-02_3-2 DIV IGHD3-10_3- VRGV 734 13 IGHD2-02_3-3 IVV IGHD3-10_3- ITMVR 735 14 IGHD2-02_3-4 PAA IGHD3-10_3- MVRGV 736 15 IGHD2-02_3-5 VPA IGHD3-10_3- RGVII 737 16 IGHD2-02_3-6 VVP IGHD3-10_3- TMVRG 738 17 IGHD2-02_3-7 VVV IGHD3-10_3- VRGVI 739 18 IGHD2-02_3-8 DIVV 634 IGHD3-10_3- ITMVRG 740 19 IGHD2-02_3-9 IVVV 635 IGHD3-10_3- MVRGVI 741 20 IGHD2-02_3-10 PAAM 636 IGHD3-10_3- TMVRGV 742 21 IGHD2-02_3-11 VPAA 637 IGHD3-10_3- VRGVII 743 22 IGHD2-02_3-12 VVPA 638 IGHD3-10_3- ITMVRGV 744 23 
IGHD2-02_3-13 VVVP 639 IGHD3-10_3- MVRGVII 745 24 IGHD2-02_3-14 DIVVV 640 IGHD3-10_3- TMVRGVI 746 25 IGHD2-02_3-15 IVVVP 641 IGHD3-10_3- ITMVRGVI 747 26 IGHD2-02_3-16 VPAAM 642 IGHD3-10_3- TMVRGVII 748 27 IGHD2-02_3-17 VVPAA 643 IGHD3-10_3- ITMVRGVII 3 28 IGHD2-02_3-18 VVVPA 644 IGHD3-22_2-1 DSS IGHD2-02_3-19 DIVVVP 645 IGHD3-22_2-2 GYY IGHD2-02_3-20 IVVVPA 646 IGHD3-22_2-3 SGY IGHD2-02_3-21 VVPAAM 647 IGHD3-22_2-4 SSG IGHD2-02_3-22 VVVPAA 648 IGHD3-22_2-5 YDS IGHD2-02_3-23 DIVVVPA 649 IGHD3-22_2-6 YYD IGHD2-02_3-24 IVVVPAA 650 IGHD3-22_2-7 DSSG 749 IGHD2-02_3-25 VVVPAAM 651 IGHD3-22_2-8 GYYY 750 IGHD2-02_3-26 DIVVVPAA 652 IGHD3-22_2-9 SGYY 751 IGHD2-02_3-27 IVVVPAAM 653 IGHD3-22_2- SSGY 752 10 IGHD2-02_3-28 DIVVVPAAM 11 IGHD3-22_2- YDSS 753 11 IGHD2-15_2-1 CSGGSC 654 IGHD3-22_2- YYDS 754 12 IGHD2-15_2-2 CSGGSCY 655 IGHD3-22_2- YYYD 755 13 IGHD2-15_2-3 YCSGGSC 656 IGHD3-22_2- DSSGY 756 14 IGHD2-15_2-4 CSGGSCYS 657 IGHD3-22_2- SGYYY 757 15 IGHD2-15_2-5 GYCSGGSC 658 IGHD3-22_2- SSGYY 758 16 IGHD2-15_2-6 YCSGGSCY 659 IGHD3-22_2- YDSSG 759 17 IGHD2-15_2-7 GYCSGGSCY 660 IGHD3-22_2- YYDSS 760 18 IGHD2-15_2-8 YCSGGSCYS 661 IGHD3-22_2- YYYDS 761 19 IGHD2-15_2-9 GYCSGGSCYS 16 IGHD3-22_2- DSSGYY 762 20 IGHD3-03_3-1 FGV IGHD3-22_2- SSGYYY 763 21 IGHD3-03_3-2 GVV IGHD3-22_2- YDSSGY 764 22 IGHD3-03_3-3 IFG IGHD3-22_2- YYDSSG 765 23 IGHD3-03_3-4 ITI IGHD3-22_2- YYYDSS 766 24 IGHD3-03_3-5 TIF IGHD3-22_2- DSSGYYY 767 25 IGHD3-03_3-6 VVI IGHD3-22_2- YDSSGYY 768 26 IGHD3-03_3-7 FGVV 662 IGHD3-22_2- YYDSSGY 769 27 IGHD3-03_3-8 GVVI 663 IGHD3-22_2- YYYDSSG 770 28 IGHD3-03_3-9 IFGV 664 IGHD3-22_2- YDSSGYYY 771 29 IGHD3-03_3-10 ITIF 665 IGHD3-22_2- YYDSSGYY 772 30 IGHD3-03_3-11 TIFG 666 IGHD3-22_2- YYYDSSGY 773 31 IGHD3-03_3-12 VVII 667 IGHD3-22_2- YYDSSGYYY 774 32 IGHD3-03_3-13 FGVVI 668 IGHD3-22_2- YYYDSSGYY 775 33 IGHD3-03_3-14 GVVII 669 IGHD3-22_2- YYYDSSGYYY 4 34 IGHD3-03_3-15 IFGVV 670 IGHD4-17_2-1 DYG IGHD3-03_3-16 ITIFG 671 IGHD4-17_2-2 GDY IGHD3-03_3-17 TIFGV 672 IGHD4-17_2-3 
YGD IGHD3-03_3-18 FGVVII 673 IGHD4-17_2-4 DYGD 613 IGHD3-03_3-19 IFGVVI 674 IGHD4-17_2-5 YGDY 614 IGHD3-03_3-20 ITIFGV 675 IGHD4-17_2-6 DYGDY 12 IGHD3-03_3-21 TIFGVV 676 IGHD5-5_3-1 SYG IGHD3-03_3-22 IFGVVII 677 IGHD5-5_3-2 YGY IGHD3-03_3-23 ITIFGVV 678 IGHD5-5_3-3 YSY IGHD3-03_3-24 TIFGVVI 679 IGHD5-5_3-4 GYSY 776 IGHD3-03_3-25 ITIFGVVI 680 IGHD5-5_3-5 SYGY 777 IGHD3-03_3-26 TIFGVVII 681 IGHD5-5_3-6 YSYG 778 IGHD3-03_3-27 ITIFGVVII 9 IGHD5-5_3-7 GYSYG 779 IGHD3-10_1-1 ELL IGHD5-5_3-8 YSYGY 780 IGHD3-10_1-2 FGE IGHD5-5_3-9 GYSYGY 15 IGHD3-10_1-3 GEL IGHD6-13_1-1 SSS IGHD3-10_1-4 LLW IGHD6-13_1-2 SSW IGHD3-10_1-5 LWF IGHD6-13_1-3 SWY IGHD3-10_1-6 VLL IGHD6-13_1-4 SSSW 781 IGHD3-10_1-7 WFG IGHD6-13_1-5 SSWY 782 IGHD3-10_1-8 FGEL 682 IGHD6-13_1-6 YSSS 783 IGHD3-10_1-9 GELL 683 IGHD6-13_1-7 GYSSS 784 IGHD3-10_1-10 LLWF 684 IGHD6-13_1-8 SSSWY 785 IGHD3-10_1-11 LWFG 685 IGHD6-13_1-9 YSSSW 786 IGHD3-10_1-12 VLLW 686 IGHD6-13_1- GYSSSW 787 10 IGHD3-10_1-13 WFGE 687 IGHD6-13_1- YSSSWY 788 11 IGHD3-10_1-14 FGELL 688 IGHD6-13_1- GYSSSWY 7 12 IGHD3-10_1-15 LLWFG 689 IGHD6-19_1-1 GWY IGHD3-10_1-16 LWFGE 690 IGHD6-19_1-2 GYS IGHD3-10_1-17 VLLWF 691 IGHD6-19_1-3 SGW IGHD3-10_1-18 WFGEL 692 IGHD6-19_1-4 YSS IGHD3-10_1-19 LLWFGE 693 IGHD6-19_1-5 GYSS 789 IGHD3-10_1-20 LWFGEL 694 IGHD6-19_1-6 SGWY 790 IGHD3-10_1-21 VLLWFG 695 IGHD6-19_1-7 SSGW 791 IGHD3-10_1-22 WFGELL 696 IGHD6-19_1-8 YSSG 792 IGHD3-10_1-23 LLWFGEL 697 IGHD6-19_1-9 GYSSG 793 IGHD3-10_1-24 LWFGELL 698 IGHD6-19_1- SSGWY 794 10 IGHD3-10_1-25 VLLWFGE 699 IGHD6-19_1- YSSGW 795 11 IGHD3-10_1-26 LLWFGELL 700 IGHD6-19_1- GYSSGW 796 12 IGHD3-10_1-27 VLLWFGEL 701 IGHD6-19_1- YSSGWY 797 13 IGHD3-10_1-28 VLLWFGELL 1 IGHD6-19_1- GYSSGWY 5 14 IGHD3-10_2-1 GSG IGHD6-19_2-1 AVA IGHD3-10_2-2 GSY IGHD6-19_2-2 GIA IGHD3-10_2-3 SGS IGHD6-19_2-3 IAV IGHD3-10_2-4 SYY IGHD6-19_2-4 VAG IGHD3-10_2-5 YGS IGHD6-19_2-5 AVAG 798 IGHD3-10_2-6 YYG IGHD6-19_2-6 GIAV 799 IGHD3-10_2-7 YYN IGHD6-19_2-7 IAVA 800 IGHD3-10_2-8 YYY IGHD6-19_2-8 GIAVA 801
IGHD3-10_2-9 GSGS 702 IGHD6-19_2-9 IAVAG 802 IGHD3-10_2-10 GSYY 703 IGHD6-19_2- GIAVAG 6 10 IGHD3-10_2-11 SGSY 704 IGHD6-13_2-1 AAA IGHD3-10_2-12 SYYN 705 IGHD6-13_2-2 AAG IGHD3-10_2-13 YGSG 706 IGHD6-13_2-3 IAA IGHD3-10_2-14 YYGS 707 IGHD6-13_2-4 AAAG 803 IGHD3-10_2-15 YYYG 708 IGHD6-13_2-5 GIAA 804 IGHD3-10_2-16 GSGSY 709 IGHD6-13_2-6 IAAA 805 IGHD3-10_2-17 GSYYN 710 IGHD6-13_2-7 GIAAA 806 IGHD3-10_2-18 SGSYY 711 IGHD6-13_2-8 IAAAG 807 IGHD3-10_2-19 YGSGS 712 IGHD6-13_2-9 GIAAAG 8 1The sequence designation is formatted as follows: (IGHD Gene Name)_(Reading Frame)-(Variant Number) *Note that the origin of certain variants is rendered somewhat arbitrary when redundant segments are deleted from the library (i.e., certain segments may have their origins with more than one parent, including the one specified in the table). - Table 19 shows the length distribution of the 278 DH segments selected according to the methods described above.
-
TABLE 19 Length Distributions of DH Segments Selected for Inclusion in the Exemplary Library
DH Size | Number of Occurrences
3 | 78
4 | 64
5 | 50
6 | 38
7 | 27
8 | 20
9 | 12
10 | 4
- As specified above, based on the CDRH3 numbering system defined in this application, IGHD-derived amino acids (i.e., DH segments) are numbered beginning with position 97, followed by positions 97A, 97B, etc. In the currently exemplified embodiment of the library, the shortest DH segment has three amino acids (positions 97, 97A and 97B), while the longest DH segment has 10 amino acids (positions 97, 97A, 97B, 97C, 97D, 97E, 97F, 97G, 97H and 97I).
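The DH numbering convention just described can be generated mechanically. In the following minimal Python sketch, `dh_positions` is a hypothetical helper written for illustration. It labels the first DH residue 97 and each following residue 97A, 97B, and so on.

```python
from string import ascii_uppercase


def dh_positions(length):
    """CDRH3 position labels for a DH segment of the given length: 97, 97A, 97B, ..."""
    return ["97"] + [f"97{letter}" for letter in ascii_uppercase[:length - 1]]


print(dh_positions(3))   # ['97', '97A', '97B']
print(dh_positions(10))  # ['97', '97A', '97B', '97C', '97D', '97E', '97F', '97G', '97H', '97I']
```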
- There are six human germline IGHJ genes. During in vivo assembly of antibody genes, these segments are progressively deleted at their 5′ end. In this exemplary embodiment of the library, IGHJ gene segments with no deletions, or with 1, 2, 3, 4, 5, 6, or 7 deletions (at the amino acid level), yielding JH segments as short as 13 amino acids, were included (Table 20). Other embodiments of the invention, in which the IGHJ gene segments are progressively deleted (at their 5′/N-terminal end) to yield 15, 14, 12, or 11 amino acids are also contemplated.
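The JH variants of Table 20 follow from deleting one residue at a time from the N-terminus of each parent until 13 residues remain. In the Kabat framework used here, the final 11 residues form FRM4 and the rest is H3-JH. The following Python sketch is an illustration with hypothetical names. It regenerates these variants for the six parents listed first in Table 20. Table 20 also lists a second JH3 (AFDI) and a second JH5 (NWFDP) sequence, which would be handled identically.

```python
# Parent IGHJ segments, [H3-JH]-[FRM4], as listed in Table 20.
JH_PARENTS = {
    "JH1": "AEYFQHWGQGTLVTVSS",
    "JH2": "YWYFDLWGRGTLVTVSS",
    "JH3": "AFDVWGQGTMVTVSS",
    "JH4": "YFDYWGQGTLVTVSS",
    "JH5": "NWFDSWGQGTLVTVSS",
    "JH6": "YYYYYGMDVWGQGTTVTVSS",
}
FRM4_LEN = 11  # C-terminal FRM4 residues (Kabat positions 103-113)


def jh_variants(name, parent, min_len=13):
    """Progressive N-terminal deletions, one residue at a time, down to min_len.
    Returns {variant name: (full segment, H3-JH portion)}."""
    out = {}
    for i in range(len(parent) - min_len + 1):
        seg = parent[i:]
        out[f"{name}_{i + 1}"] = (seg, seg[:-FRM4_LEN])
    return out


table = {k: v for name, seq in JH_PARENTS.items() for k, v in jh_variants(name, seq).items()}
print(len(table))       # 28
print(table["JH4_3"])   # ('DYWGQGTLVTVSS', 'DY')
```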
-
TABLE 20 IGHJ Gene Segments Selected for Use in the Exemplary Library
IGHJ Segment | [H3-JH]-[FRM4]¹ | SEQ ID NO | H3-JH | SEQ ID NO
JH1 parent or JH1_1 | AEYFQHWGQGTLVTVSS | 253 | AEYFQH | 17
JH1_2 | EYFQHWGQGTLVTVSS | 808 | EYFQH | 830
JH1_3 | YFQHWGQGTLVTVSS | 809 | YFQH | 831
JH1_4 | FQHWGQGTLVTVSS | 810 | FQH |
JH1_5 | QHWGQGTLVTVSS | 811 | QH |
JH2 parent or JH2_1 | YWYFDLWGRGTLVTVSS | 254 | YWYFDL | 18
JH2_2 | WYFDLWGRGTLVTVSS | 812 | WYFDL | 832
JH2_3 | YFDLWGRGTLVTVSS | 813 | YFDL | 833
JH2_4 | FDLWGRGTLVTVSS | 814 | FDL |
JH2_5 | DLWGRGTLVTVSS | 815 | DL |
JH3 parent or JH3_1 | AFDVWGQGTMVTVSS | 255 | AFDV | 19
JH3_2 | FDVWGQGTMVTVSS | 816 | FDV |
JH3_3 | DVWGQGTMVTVSS | 817 | DV |
JH3 parent or JH3_1 | AFDIWGQGTMVTVSS | 849 | AFDI | 852
JH3_2 | FDIWGQGTMVTVSS | 850 | FDI |
JH3_3 | DIWGQGTMVTVSS | 851 | DI |
JH4 parent or JH4_1 | YFDYWGQGTLVTVSS | 256 | YFDY | 20
JH4_2 | FDYWGQGTLVTVSS | 818 | FDY |
JH4_3 | DYWGQGTLVTVSS | 819 | DY |
JH5 parent or JH5_1 | NWFDSWGQGTLVTVSS | 257 | NWFDS | 21
JH5_2 | WFDSWGQGTLVTVSS | 820 | WFDS | 834
JH5_3 | FDSWGQGTLVTVSS | 821 | FDS |
JH5_4 | DSWGQGTLVTVSS | 822 | DS |
JH5 parent or JH5_1 | NWFDPWGQGTLVTVSS | 853 | NWFDP | 857
JH5_2 | WFDPWGQGTLVTVSS | 854 | WFDP | 858
JH5_3 | FDPWGQGTLVTVSS | 855 | FDP |
JH5_4 | DPWGQGTLVTVSS | 856 | DP |
JH6 parent or JH6_1 | YYYYYGMDVWGQGTTVTVSS | 258 | YYYYYGMDV | 22
JH6_2 | YYYYGMDVWGQGTTVTVSS | 823 | YYYYGMDV | 835
JH6_3 | YYYGMDVWGQGTTVTVSS | 824 | YYYGMDV | 836
JH6_4 | YYGMDVWGQGTTVTVSS | 825 | YYGMDV | 837
JH6_5 | YGMDVWGQGTTVTVSS | 826 | YGMDV | 838
JH6_6 | GMDVWGQGTTVTVSS | 827 | GMDV | 839
JH6_7 | MDVWGQGTTVTVSS | 828 | MDV |
JH6_8 | DVWGQGTTVTVSS | 829 | DV |
¹ H3-JH is defined as the portion of the IGHJ segment included within the Kabat definition of CDRH3; FRM4 is defined as the portion of the IGHJ segment encoding framework region four.
- According to the CDRH3 numbering system of this application, the contribution of JH6_1 to CDRH3, for example, would be designated by
positions 99F, 99E, 99D, 99C, 99B, 99A, 100, 101 and 102 (Y, Y, Y, Y, Y, G, M, D and V, respectively). Similarly, the JH4_3 sequence would contribute amino acid positions 101 and 102 (D and Y, respectively) to CDRH3. However, in all cases of the exemplified library, the JH segment will contribute amino acids 103 to 113 to the FRM4 region, in accordance with the standard Kabat numbering system for antibody variable regions (Kabat, op. cit. 1991). This may not be the case in other embodiments of the library. - While V-D-J recombination combined with the naturally occurring process of progressive deletion (as mimicked above) can generate enormous diversity, the diversity of CDRH3 sequences in vivo is further amplified by the non-templated addition of a varying number of nucleotides at the V-D and D-J junctions.
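The two JH numbering examples given above (JH6_1 and JH4_3) follow a right-aligned pattern. The last H3-JH residue is always 102, preceded by 101 and 100. Any further residues are labelled 99A, 99B, and so on, moving toward the N-terminus. The following Python sketch, with a hypothetical helper name, encodes that pattern. Only the 9-residue and 2-residue cases are stated explicitly in the text, so labels for other lengths are an extrapolation of the same rule.

```python
from string import ascii_uppercase


def jh_positions(h3_jh_len):
    """CDRH3 labels for the H3-JH residues, right-aligned so the last residue is 102.
    Residues beyond 100-102 are labelled 99A, 99B, ... moving toward the N-terminus."""
    fixed = ["100", "101", "102"]
    if h3_jh_len <= 3:
        return fixed[3 - h3_jh_len:]
    extra = [f"99{letter}" for letter in ascii_uppercase[:h3_jh_len - 3]]
    return extra[::-1] + fixed


print(list(zip(jh_positions(9), "YYYYYGMDV")))  # JH6_1
print(list(zip(jh_positions(2), "DY")))         # JH4_3: [('101', 'D'), ('102', 'Y')]
```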
- N1 and N2 segments, located at the V-D and D-J junctions, respectively, were identified in a sample of about 2,700 antibody sequences (Jackson et al., J. Immunol. Methods, 2007, 324: 26) that had also been analyzed by the SoDA method of Volpe et al. (Bioinformatics, 2006, 22: 438-44); both Jackson et al. and Volpe et al. are incorporated by reference in their entireties. Examination of these sequences revealed patterns in the length and composition of N1 and N2. For the construction of the currently exemplified CDRH3 library, specific short amino acid sequences were derived from this analysis and used to generate a set of N1 and N2 segments that were incorporated into the CDRH3 design, using the synthetic scheme described herein.
- As described in the Detailed Description, certain embodiments of the invention include N1 and N2 segments whose length and composition are rationally designed, informed by statistical biases in these parameters observed among naturally occurring N1 and N2 segments in human antibodies. According to data compiled from human databases (see, e.g., Jackson et al., J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety), N1 contains on average about 3.02 inserted amino acids and N2 about 2.4, not counting insertions of two nucleotides or fewer.
FIG. 2 shows the length distributions of the N1 and N2 regions in human antibodies. In this exemplary embodiment of the invention, N1 and N2 were restricted to lengths of 0, 1, 2, or 3 amino acids, and the naturally occurring composition of these regions in human antibodies was used as a guide for choosing which amino acid residues to include. - The naturally occurring composition of one-, two-, and three-amino-acid N1 additions is given in Table 21, and that of the corresponding N2 additions in Table 22. The most frequently occurring duplets in the N1 and N2 sets are compiled in Table 23.
-
TABLE 21 Composition of Naturally Occurring 1, 2, and 3 Amino Acid N1 Additions*
Position 1 | Number of Occurrences | Position 2 | Number of Occurrences | Position 3 | Number of Occurrences
R | 251 | G | 97 | G | 101
G | 249 | P | 67 | R | 66
P | 173 | R | 67 | P | 47
L | 130 | S | 42 | S | 47
S | 117 | L | 39 | L | 38
A | 84 | V | 33 | A | 33
V | 62 | E | 24 | V | 28
K | 61 | A | 21 | T | 27
I | 55 | D | 18 | E | 24
Q | 51 | I | 18 | D | 22
T | 51 | T | 18 | K | 18
D | 50 | K | 16 | F | 14
E | 49 | Y | 16 | I | 13
F | 33 | H | 13 | W | 13
H | 32 | F | 12 | N | 10
N | 30 | Q | 11 | Y | 10
W | 28 | N | 5 | H | 8
Y | 21 | W | 5 | Q | 5
M | 16 | C | 4 | C | 3
C | 3 | M | 4 | M | 3
Total | 1546 | | 530 | | 530
*Defined as the sequence C-terminal to “CARX” (SEQ ID NO: 840), or equivalent, of VH, wherein “X” is the “tail” (e.g., D, E, G, or no amino acid residue). -
TABLE 22 Composition of Naturally Occurring 1, 2, and 3 Amino Acid N2 Additions*
Position 1 | Number of Occurrences | Position 2 | Number of Occurrences | Position 3 | Number of Occurrences
G | 242 | G | 244 | G | 156
P | 219 | P | 138 | P | 79
R | 180 | R | 86 | S | 54
L | 132 | S | 85 | R | 51
S | 123 | T | 77 | L | 49
A | 97 | L | 74 | A | 41
T | 78 | A | 69 | T | 31
V | 75 | V | 46 | V | 29
E | 57 | E | 41 | D | 23
D | 56 | Y | 38 | E | 23
F | 54 | D | 36 | W | 23
H | 54 | K | 30 | Q | 19
Q | 53 | F | 29 | F | 17
I | 49 | W | 27 | Y | 17
N | 45 | H | 24 | H | 16
Y | 40 | I | 23 | I | 11
K | 35 | Q | 23 | K | 11
W | 29 | N | 21 | N | 8
M | 20 | M | 8 | C | 6
C | 6 | C | 5 | M | 6
Total | 1644 | | 1124 | | 670
*Defined as the sequence C-terminal to the D segment but not encoded by IGHJ genes. -
TABLE 23 Top Twenty-Five Naturally Occurring N1 and N2 Duplets
Sequence | Number of Occurrences | Cumulative Frequency | Individual Frequency
GG | 17 | 0.037 | 0.037
PG | 15 | 0.070 | 0.033
RG | 15 | 0.103 | 0.033
PP | 13 | 0.132 | 0.029
GP | 12 | 0.158 | 0.026
GL | 11 | 0.182 | 0.024
PT | 10 | 0.204 | 0.022
TG | 10 | 0.226 | 0.022
GV | 9 | 0.246 | 0.020
RR | 9 | 0.266 | 0.020
SG | 8 | 0.284 | 0.018
RP | 7 | 0.299 | 0.015
IG | 6 | 0.312 | 0.013
GS | 6 | 0.325 | 0.013
SR | 6 | 0.338 | 0.013
PA | 6 | 0.352 | 0.013
LP | 6 | 0.365 | 0.013
VG | 6 | 0.378 | 0.013
KG | 6 | 0.389 | 0.011
GW | 5 | 0.400 | 0.011
FP | 5 | 0.411 | 0.011
LG | 5 | 0.422 | 0.011
RS | 5 | 0.433 | 0.011
TP | 5 | 0.444 | 0.011
EG | 5 | 0.455 | 0.011
- Analysis of the identified N1 segments, located at the junction between V and D, revealed that the eight most frequently occurring amino acid residues were G, R, S, P, L, A, T and V (Table 21). N1 segments most often contained zero, one, two, or three added amino acids (FIG. 2); the addition of four or more amino acids was relatively rare. Therefore, in the currently exemplified embodiment of the library, the N1 segments were designed to contain zero, one, two or three amino acids. However, in other embodiments, N1 segments of four, five, or more amino acids may also be used. G and P were consistently among the most commonly occurring amino acid residues in the N1 regions. Thus, in the present exemplary embodiment of the library, the dipeptide N1 segments have the form GX, XG, PX, or XP, where X is any of the eight most commonly occurring amino acids listed above. Because G residues were observed more frequently than P residues, the tripeptide members of the exemplary N1 library have the form GXG, GGX, or XGG, where X is, again, one of the eight most frequently occurring amino acid residues listed above. Including the “zero” addition, the resulting set of N1 sequences used in the present exemplary embodiment of the library amounts to 59 sequences, which are listed in Table 24. -
TABLE 24 N1 Sequences Selected for Inclusion in the Exemplary Library
Segment Type | Sequences | Number
“Zero” (no addition) | V segment joins directly to D segment | 1
Monomers | G, P, R, A, S, L, T, V | 8
Dimers | GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP | 28
Trimers | GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV | 22
- In accordance with the CDRH3 numbering system of the application, the sequences enumerated in Table 24 contribute the following positions to CDRH3: the monomers contribute position 96, the dimers positions 96 and 96A, and the trimers positions 96, 96A and 96B. In alternative embodiments in which tetramers and longer segments are included among the N1 sequences, the numbering would continue with 96C, and so on.
- Similarly, analysis of the identified N2 segments, located at the junction between D and J, revealed that the eight most frequently occurring amino acid residues were also G, R, S, P, L, A, T and V (Table 22). N2 segments also most often contained zero, one, two, or three added amino acids (FIG. 2). For the design of the N2 segments in the exemplary library, an expanded set of sequences was used: the sequences in Table 25 were added to the 59 N1 sequences enumerated in Table 24. -
TABLE 25 Extra Sequences in N2 Additions
Segment Type | Sequences | Number New | Number Total
Monomers | D, E, F, H, I, K, M, Q, W, Y | 10 | 18
Dimers | AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS | 54 | 82
Trimers | AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT | 18 | 40
- The presently exemplified embodiment of the library therefore contains 141 total N2 sequences, including the “zero” state. One of ordinary skill in the art will readily recognize that these 141 sequences may also be used in the N1 region, and that such embodiments are within the scope of the invention. In addition, the length and compositional diversity of the N1 and N2 sequences can be further increased by using amino acids that occur less frequently than G, R, S, P, L, A, T and V in the N1 and N2 regions of naturally occurring antibodies, and by including N1 and N2 segments of four, five, or more amino acids in the library. Tables 21 to 23 and FIG. 2 provide information about the composition and length of the N1 and N2 sequences in naturally occurring antibodies that is useful for designing additional N1 and N2 regions that mimic the natural composition and length. - In accordance with the CDRH3 numbering system of the application, N2 sequences begin at position 98 (when present) and extend to 98A (dimers) and 98B (trimers). Alternative embodiments may occupy positions 98C, 98D, and so on.
- When the “tail” (i.e., G/D/E/-) is considered, the CDRH3 in the exemplified library may be represented by the general formula:
-
[G/D/E/-]-[N1]-[DH]-[N2]-[H3-JH]
- In the currently exemplified, non-limiting embodiment of the library, [G/D/E/-] represents each of the four possible terminal amino acid “tails”; N1 can be any of the 59 sequences in Table 24; DH can be any of the 278 sequences in Table 18; N2 can be any of the 141 sequences in Tables 24 and 25; and H3-JH can be any of the 28 H3-JH sequences in Table 20. The total theoretical diversity, or repertoire size, of this CDRH3 library is obtained by multiplying the number of variants of each component, i.e., 4×59×278×141×28 = 2.59×10^8.
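The general formula amounts to a concatenation of five segments, one of which may be empty. The Python sketch below (the `CDRH3` type is a hypothetical illustration, not part of the library design) assembles the ten example sequences of Table 26 segment by segment, recomputes their lengths, and evaluates the theoretical repertoire size from the component counts stated above.

```python
from typing import NamedTuple


class CDRH3(NamedTuple):
    """One CDRH3 built as [tail]-[N1]-[DH]-[N2]-[H3-JH]; an empty string means 'absent'."""
    tail: str   # position 95: "G", "D", "E" or "" (no tail)
    n1: str     # positions 96, 96A, 96B
    dh: str     # positions 97 to 97I
    n2: str     # positions 98, 98A, 98B
    h3_jh: str  # positions 99E ... 99A, 99, 100, 101, 102

    def sequence(self) -> str:
        return self.tail + self.n1 + self.dh + self.n2 + self.h3_jh


# Segments read off Table 26 (examples No. 1 to No. 10).
EXAMPLES = [
    CDRH3("G", "", "YYY", "", "DV"),
    CDRH3("D", "G", "GYCSGGSCYS", "Y", "FQH"),
    CDRH3("E", "R", "ITIFGV", "GG", "YFDY"),
    CDRH3("", "PP", "VLLWFGELL", "D", "DL"),
    CDRH3("G", "GSG", "YYYGSGSYYN", "P", "AEYFQH"),
    CDRH3("D", "", "RGVII", "M", "YYYYYGMDV"),
    CDRH3("E", "SG", "YYYDSSGYYY", "TGL", "WYFDL"),
    CDRH3("", "S", "DYGDY", "SI", "FDI"),
    CDRH3("", "PG", "WFG", "PS", "YYGMDV"),
    CDRH3("", "", "CSGGSC", "AY", "NWFDP"),
]

lengths = [len(c.sequence()) for c in EXAMPLES]
print(lengths)  # [6, 16, 14, 14, 21, 16, 21, 11, 13, 13]

# Theoretical size: 4 tails x 59 N1 x 278 DH x 141 N2 x 28 H3-JH.
theoretical = 4 * 59 * 278 * 141 * 28
print(f"{theoretical:,} ~ {theoretical:.3g}")  # 259,020,384 ~ 2.59e+08
```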
- However, as described in the previous examples, redundancies may be eliminated from the library. In the presently exemplified embodiment, the tail and N1 segments were combined and redundancies were removed. For example, considering the VH chassis, tail, and N1 regions, the sequence [VH_Chassis]-[G] may be obtained in two different ways: [VH_Chassis]+[G]+[nothing] or [VH_Chassis]+[nothing]+[G]. Removal of redundant sequences resulted in a total of 212 unique [G/D/E/-]-[N1] segments out of the 236 possible combinations (i.e., 4 tails×59 N1). Therefore, the actual diversity of the presently exemplified CDRH3 library is 212×278×141×28 = 2.33×10^8.
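The tail/N1 redundancy count can be checked by brute force. This Python sketch regenerates the 59 N1 segments from the design templates (using a hypothetical `build_n1` helper and the eight residues G, R, S, P, L, A, T and V), prefixes each with one of the four tails, and counts distinct amino acid strings. The DH, N2 and H3-JH counts (278, 141 and 28) are taken from the text rather than enumerated.

```python
TAILS = ["G", "D", "E", ""]  # "" = no tail
COMMON = ["G", "R", "S", "P", "L", "A", "T", "V"]


def build_n1(common: list[str]) -> set[str]:
    """N1 segments of length 0-3 built from the GX/XG/PX/XP and GXG/GGX/XGG templates."""
    segments = {""}
    segments.update(common)
    for x in common:
        segments.update({"G" + x, x + "G", "P" + x, x + "P"})
        segments.update({"G" + x + "G", "GG" + x, x + "GG"})
    return segments


n1 = build_n1(COMMON)
combinations = [tail + seg for tail in TAILS for seg in n1]
unique = set(combinations)  # e.g. "G" + "" and "" + "G" collapse to the same string
print(len(n1), len(combinations), len(unique))  # 59 236 212

actual = len(unique) * 278 * 141 * 28
print(f"{actual:,} ~ {actual:.3g}")  # 232,679,328 ~ 2.33e+08
```

All 24 collisions come from the G tail: "G" followed by the empty segment, by any of the 8 monomers, or by one of the 15 GX/XG dimers reproduces an existing N1 segment.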
FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in this library, compared with the pre-immune repertoire of Lee et al. - Table 26 further illustrates specific exemplary sequences from the CDRH3 library described above, using the CDRH3 numbering system of the present application. Positions that are not used are indicated with a hyphen (-).
-
TABLE 26 Examples of Designed CDRH3 Sequences According to the Library Exemplified in Examples 1 to 5
Part A: [Tail] (position 95), [N1] (96 to 96B), [DH] (97 to 97I), [N2] (98 to 98B)
No. | 95 | 96 | 96A | 96B | 97 | 97A | 97B | 97C | 97D | 97E | 97F | 97G | 97H | 97I | 98 | 98A | 98B
No. 1 | G | - | - | - | Y | Y | Y | - | - | - | - | - | - | - | - | - | -
No. 2 | D | G | - | - | G | Y | C | S | G | G | S | C | Y | S | Y | - | -
No. 3 | E | R | - | - | I | T | I | F | G | V | - | - | - | - | G | G | -
No. 4 | - | P | P | - | V | L | L | W | F | G | E | L | L | - | D | - | -
No. 5 | G | G | S | G | Y | Y | Y | G | S | G | S | Y | Y | N | P | - | -
No. 6 | D | - | - | - | R | G | V | I | I | - | - | - | - | - | M | - | -
No. 7 | E | S | G | - | Y | Y | Y | D | S | S | G | Y | Y | Y | T | G | L
No. 8 | - | S | - | - | D | Y | G | D | Y | - | - | - | - | - | S | I | -
No. 9 | - | P | G | - | W | F | G | - | - | - | - | - | - | - | P | S | -
No. 10 | - | - | - | - | C | S | G | G | S | C | - | - | - | - | A | Y | -
Part B: [H3-JH] (positions 99E to 102) and total CDRH3 length
No. | 99E | 99D | 99C | 99B | 99A | 99 | 100 | 101 | 102 | CDRH3 Length
No. 1 | - | - | - | - | - | - | - | D | V | 6
No. 2 | - | - | - | - | - | - | F | Q | H | 16
No. 3 | - | - | - | - | - | Y | F | D | Y | 14
No. 4 | - | - | - | - | - | - | - | D | L | 14
No. 5 | - | - | - | A | E | Y | F | Q | H | 21
No. 6 | Y | Y | Y | Y | Y | G | M | D | V | 16
No. 7 | - | - | - | - | W | Y | F | D | L | 21
No. 8 | - | - | - | - | - | - | F | D | I | 11
No. 9 | - | - | - | Y | Y | G | M | D | V | 13
No. 10 | - | - | - | - | N | W | F | D | P | 13
Sequence Identifiers: No. 1 (SEQ ID NO: 542); No. 2 (SEQ ID NO: 543); No. 3 (SEQ ID NO: 544); No. 4 (SEQ ID NO: 545); No. 5 (SEQ ID NO: 546); No. 6 (SEQ ID NO: 547); No. 7 (SEQ ID NO: 548); No. 8 (SEQ ID NO: 549); No. 9 (SEQ ID NO: 550); No. 10 (SEQ ID NO: 551).
- This example describes the design of a number of exemplary VKCDR3 libraries. As specified in the Detailed Description, the actual version(s) of the VKCDR3 library made or used in particular embodiments of the invention will depend on the objectives for the use of the library. In this example, the Kabat numbering system for light chain variable regions was used.
- To facilitate examination of patterns of occurrence, human kappa light chain sequences were obtained from the publicly available NCBI database (Appendix A). As with the heavy chain sequences (Example 2), each sequence obtained from the database was assigned to its closest germline gene on the basis of sequence identity. The amino acid composition at each position was then determined within each kappa light chain subset.
- This example describes the design of a “minimalist” VKCDR3 library, in which the VKCDR3 repertoire is restricted to a length of nine residues. Examination of the VKCDR3 lengths of human sequences shows that a dominant proportion (over 70%) have nine amino acids within the Kabat definition of CDRL3, positions 89 through 97. Thus, the currently exemplified minimalist design considers only VKCDR3 of length nine. Examination of human kappa light chain sequences shows no strong bias in the usage of the five human IGKJ genes. Table 27 depicts IGKJ gene usage in three data sets, namely Juul et al. (Clin. Exp. Immunol., 1997, 109: 194, incorporated by reference in its entirety), Klein and Zachau (Eur. J. Immunol., 1993, 23: 3248, incorporated by reference in its entirety), and the kappa light chain data set provided in Appendix A (labeled LUA).
-
TABLE 27 IGKJ Gene Usage in Various Data Sets
Gene | Klein | Juul | LUA
IGKJ1 | 35.0% | 29.0% | 29.3%
IGKJ2 | 25.0% | 23.0% | 24.1%
IGKJ3 | 7.0% | 8.0% | 12.1%
IGKJ4 | 26.0% | 24.0% | 26.5%
IGKJ5 | 6.0% | 18.0% | 8.0%
- Thus, a simple combinatorial library of M VK chassis and the five IGKJ genes would contain M×5 members. In the Kabat numbering system, for VKCDR3 of length nine, amino acid 96 is the first encoded by the IGKJ gene. Examination of the amino acid occupying this position in human sequences showed that the seven most common residues are L, Y, R, W, F, I, and P, which together account for about 86% of the residues found at position 96; the remaining 13 amino acids account for the other 14%. The occurrence of all 20 amino acids at position 96 is presented in Table 28.
-
TABLE 28 Occurrence of 20 Amino Acid Residues at Position 96 in Human VK Data Set
Type | Number | Percent | Cumulative
L | 333 | 22.3 | 22.3
Y | 235 | 15.8 | 38.1
R | 222 | 14.9 | 52.9
W | 157 | 10.5 | 63.5
F | 148 | 9.9 | 73.4
I | 96 | 6.4 | 79.8
P | 90 | 6.0 | 85.9
Q | 53 | 3.6 | 89.4
N | 39 | 2.6 | 92.0
H | 31 | 2.1 | 94.1
V | 21 | 1.4 | 95.5
G | 20 | 1.3 | 96.8
C | 14 | 0.9 | 97.8
K | 7 | 0.5 | 98.3
S | 6 | 0.4 | 98.7
A | 5 | 0.3 | 99.0
D | 5 | 0.3 | 99.3
E | 5 | 0.3 | 99.7
T | 5 | 0.3 | 100.0
M | 0 | 0.0 | 100.0
- To determine the origins of the seven residues most commonly found at position 96, known human IGKJ amino acid sequences were examined (Table 29).
-
TABLE 29 Known Human IGKJ Amino Acid Sequences
Gene | Sequence | SEQ ID NO
IGKJ1 | WTFGQGTKVEIK | 552
IGKJ2 | YTFGQGTKLEIK | 553
IGKJ3 | FTFGPGTKVDIK | 554
IGKJ4 | LTFGGGTKVEIK | 555
IGKJ5 | ITFGQGTRLEIK | 556
- Without being bound by theory, five of the seven most commonly occurring amino acids found at position 96 of rearranged human sequences appear to originate from the first amino acid encoded by each of the five human IGKJ genes, namely W, Y, F, L, and I.
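The comparison between Tables 28 and 29 can be made explicit. This Python sketch simply transcribes the two tables, ranks the position-96 residues, computes the share held by the top seven, and checks which of them match the first residue of a germline IGKJ gene; the two that do not are the P and R residues discussed next.

```python
# Counts of each residue at position 96 (Table 28).
COUNTS_96 = {
    "L": 333, "Y": 235, "R": 222, "W": 157, "F": 148, "I": 96, "P": 90,
    "Q": 53, "N": 39, "H": 31, "V": 21, "G": 20, "C": 14, "K": 7,
    "S": 6, "A": 5, "D": 5, "E": 5, "T": 5, "M": 0,
}
# Germline IGKJ amino acid sequences (Table 29).
IGKJ = {
    "IGKJ1": "WTFGQGTKVEIK",
    "IGKJ2": "YTFGQGTKLEIK",
    "IGKJ3": "FTFGPGTKVDIK",
    "IGKJ4": "LTFGGGTKVEIK",
    "IGKJ5": "ITFGQGTRLEIK",
}

total = sum(COUNTS_96.values())
ranked = sorted(COUNTS_96, key=COUNTS_96.get, reverse=True)
top7 = ranked[:7]
top7_share = 100 * sum(COUNTS_96[aa] for aa in top7) / total

# Residues explained by the first germline-encoded residue of some IGKJ gene.
first_residues = {seq[0] for seq in IGKJ.values()}
explained = [aa for aa in top7 if aa in first_residues]
unexplained = [aa for aa in top7 if aa not in first_residues]

print(total, top7, round(top7_share, 1))  # 1492 ['L', 'Y', 'R', 'W', 'F', 'I', 'P'] 85.9
print(explained, unexplained)  # ['L', 'Y', 'W', 'F', 'I'] ['R', 'P']
```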
- The origins of the P and R residues were less evident. Without being bound by theory, most human IGKV gene nucleotide sequences end with the sequence CC, which occurs after (i.e., 3′ to) the end of the last full codon (i.e., the codon encoding the C-terminal residue shown in Table 11). Therefore, regardless of which nucleotide follows this sequence (i.e., CCX, where X may be any nucleotide), the resulting codon encodes a proline (P) residue. Thus, when the IGKJ gene undergoes progressive deletion (just as the IGHJ gene of the heavy chain does; see Example 5), the first amino acid encoded by the IGKJ gene is lost and, if no deletions have occurred in the IGKV gene, a P residue results.
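The codon-level arguments in this paragraph and the next can be checked against the standard genetic code. In this Python sketch the codon table is generated from the conventional TCAG ordering of the standard code, and `translate` is a hypothetical helper. The sketch only illustrates the genetic-code arithmetic; it says nothing about the recombination mechanism itself.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codon: str) -> str:
    """Translate one unambiguous codon ('*' marks a stop codon)."""
    return CODON_TABLE[codon]


# 1) An IGKV gene ending in "CC", followed by any nucleotide, always yields proline.
cc_codons = {"CC" + base: translate("CC" + base) for base in "ACGT"}
print(cc_codons)  # {'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P'}

# 2) Single substitutions at the first base of the IGKJ1 germline TGG (Trp) codon.
tgg_variants = {base + "GG": translate(base + "GG") for base in "ACG"}
print(tgg_variants)  # {'AGG': 'R', 'CGG': 'R', 'GGG': 'G'}
```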
- To determine the origin of the arginine residue at position 96, the IGKJ genes of rearranged kappa light chain sequences containing R at position 96 were analyzed. The analysis indicated that R occurred most frequently at position 96 when the IGKJ gene was IGKJ1 (SEQ ID NO: 552). The germline W (position 1; Table 29) of IGKJ1 (SEQ ID NO: 552) is encoded by TGG. Without being bound by theory, a single nucleotide change of T to C (yielding CGG) or to A (yielding AGG) will therefore result in a codon encoding Arg (R), whereas a change to G (yielding GGG) results in a codon encoding Gly (G). R occurs about ten times more often than G at position 96 in human sequences (when the IGKJ gene is IGKJ1 (SEQ ID NO: 552)), and it is encoded by CGG more often than by AGG. Therefore, without being bound by theory, the C of CGG may originate from one of the two Cs mentioned above at the end of the IGKV gene. However, regardless of the mechanism(s) of occurrence, R and P are among the most frequently observed amino acid types at position 96 when the length of VKCDR3 is 9. Therefore, a minimalist VKCDR3 library may be represented by the following amino acid sequence: -
[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGGGTKVEIK (SEQ ID NO: 841)]
In this sequence, VK_Chassis represents any selected VK chassis (for non-limiting examples, see Table 11), specifically Kabat residues 1 to 88 encoded by the IGKV gene. L3-VK represents the portion of the VKCDR3 encoded by the chosen IGKV gene (in this embodiment, residues 89-95). F/L/I/R/W/Y/P represents any one of the amino acid residues F, L, I, R, W, Y, or P. In this exemplary representation, IGKJ4 (minus its first residue) is shown. Without being bound by theory, apart from IGKJ4 (SEQ ID NO: 555) being among the most commonly used IGKJ genes in humans, its GGG amino acid sequence is expected to confer greater conformational flexibility than the alternative IGKJ genes, which contain a GXG amino acid sequence, where X is an amino acid other than G. In some embodiments, it may be advantageous to produce a minimalist pre-immune repertoire with a higher degree of conformational flexibility. Considering the ten VK chassis depicted in Table 11, one implementation of the minimalist VKCDR3 library would have 70 members, resulting from the combination of 10 VK chassis, 7 junction (position 96) options, and one IGKJ-derived sequence (e.g., IGKJ4 (SEQ ID NO: 555)). Although this embodiment of the library has been depicted using IGKJ4 (SEQ ID NO: 555), a minimalist VKCDR3 library can also be designed using one of the other four IGKJ sequences. For example, another embodiment of the library may have 350 members (10 VK chassis × 7 junctions × 5 IGKJ genes). - One of ordinary skill in the art will readily recognize that one or more minimalist VKCDR3 libraries may be constructed using any of the IGKJ genes. Using the notation above, these minimalist VKCDR3 libraries may have sequences represented by, for example:
-
JK1: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTKVEIK (SEQ ID NO: 528)];
JK2: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTKLEIK (SEQ ID NO: 842)];
JK3: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGPGTKVDIK (SEQ ID NO: 843)]; and
JK5: [VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[TFGQGTRLEIK (SEQ ID NO: 844)].
- In this example, the nine-residue VKCDR3 repertoire described in Example 6.1 is expanded to include VKCDR3 lengths of eight and ten residues. Moreover, whereas the previously enumerated VKCDR3 library included the VK chassis and the portions of the IGKJ gene not contributing to VKCDR3, the presently exemplified version focuses only on residues within VKCDR3. This embodiment may be favored, for example, when recombination with a vector that already contains VK chassis sequences and constant region sequences is desired.
- While the dominant length of VKCDR3 sequences in humans is nine amino acids, other lengths appear at measurable rates that together account for almost 30% of kappa light chain sequences. In particular, VKCDR3 of lengths 8 and 10 are commonly observed (FIG. 3). Thus, a more complex VKCDR3 library includes CDR lengths of 8 to 10 amino acids; this library accounts for over 95% of the length distribution observed in typical collections of human VKCDR3 sequences. This library also enables the inclusion of additional variation outside of the junction between the VK and JK genes. The present example describes such a library. The library comprises 10 sub-libraries, each designed around one of the 10 exemplary VK chassis depicted in Table 11. The approach exemplified here can be generalized to M different chassis, where M may be less than or more than 10. - To characterize the variability within the polypeptide segment occupying Kabat positions 89 to 95, human kappa light chain sequence collections derived from each of the ten germline sequences of Example 3 were aligned and compared separately (i.e., within each germline group). This analysis revealed the patterns of sequence variation at each position of each kappa light chain sequence, grouped by germline. Table 30 shows the results for sequences derived from IGKV1-39 (SEQ ID NO: 233).
-
TABLE 30 Percent Occurrence of Amino Acid Types in IGKV1-39-Derived Sequences
Amino Acid | P89 | P90 | P91 | P92 | P93 | P94 | P95
A | 0 | 0 | 1 | 0 | 0 | 4 | 1
C | 0 | 0 | 0 | 0 | 0 | 0 | 0
D | 0 | 0 | 1 | 1 | 3 | 0 | 0
E | 0 | 1 | 0 | 0 | 0 | 0 | 0
F | 0 | 0 | 0 | 5 | 0 | 2 | 0
G | 0 | 0 | 2 | 1 | 2 | 0 | 0
H | 1 | 1 | 0 | 4 | 0 | 0 | 0
I | 0 | 0 | 1 | 0 | 4 | 5 | 1
K | 0 | 0 | 0 | 1 | 2 | 0 | 0
L | 3 | 0 | 0 | 1 | 1 | 3 | 7
M | 0 | 0 | 0 | 0 | 0 | 1 | 0
N | 0 | 0 | 3 | 2 | 6 | 2 | 0
P | 0 | 0 | 0 | 0 | 0 | 4 | 85
Q | 96 | 97 | 0 | 0 | 0 | 0 | 0
R | 0 | 0 | 0 | 0 | 5 | 0 | 2
S | 0 | 0 | 80 | 4 | 65 | 6 | 3
T | 0 | 0 | 9 | 0 | 10 | 65 | 1
V | 0 | 0 | 0 | 0 | 0 | 1 | 1
W | 0 | 0 | 0 | 0 | 0 | 0 | 0
Y | 0 | 0 | 2 | 80 | 0 | 3 | 0
- For example, at position 89, two amino acids, Q and L, account for about 99% of the observed variability; thus, in the currently exemplified library (see below), only Q and L were included at position 89. In larger libraries, additional, less frequently occurring amino acid types (e.g., H) may of course also be included.
- Position 93 shows more variation, with S, T, N, R and I among the most frequently occurring amino acid types. The currently exemplified library therefore aimed to include these five amino acids at position 93, although others could clearly be included in more diverse libraries. However, because this library was constructed by standard chemical oligonucleotide synthesis with degenerate codons, the choice of amino acids is constrained by the genetic code, so that the amino acid set actually represented at position 93 of the exemplified library is S, T, N, R, P and H, with P and H replacing I (see the exemplary 9-residue VKCDR3 in Table 32, below). This limitation may be overcome by codon-based synthesis of oligonucleotides, as described in Example 6.3, below. A similar approach was followed at the other positions and for the other sequences: analysis of amino acid occurrence per position, choice from among the most frequently occurring subset, and adjustment as dictated by the genetic code.
- As indicated above, the library employs a practical and facile synthesis approach using standard oligonucleotide synthesis instrumentation and degenerate oligonucleotides. To facilitate description of the library, the IUPAC code for degenerate nucleotides, as given in Table 31, will be used.
-
TABLE 31 Degenerate Base Symbol Definition
IUPAC Symbol | Base Composition
A | A (100%)
C | C (100%)
G | G (100%)
T | T (100%)
R | A (50%), G (50%)
Y | C (50%), T (50%)
W | A (50%), T (50%)
S | C (50%), G (50%)
M | A (50%), C (50%)
K | G (50%), T (50%)
B | C (33%), G (33%), T (33%)*
D | A (33%), G (33%), T (33%)
H | A (33%), C (33%), T (33%)
V | A (33%), C (33%), G (33%)
N | A (25%), C (25%), G (25%), T (25%)
*33% is shorthand here for 1/3 (i.e., 33.3333...%).
- Using the VK1-39 chassis with VKCDR3 of length nine as an example, the VKCDR3 library may be represented by the four oligonucleotides in the left column of Table 32, with the corresponding amino acids encoded at each position of CDRL3 (Kabat numbering) provided in the columns on the right.
-
TABLE 32 Exemplary Oligonucleotides Encoding a VK1-39 CDR3 Library
Oligonucleotide Sequence | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 95A | 96 | 97
CWGSAAWCATHCMVTABTCCTTWCACT (SEQ ID NO: 307) | LQ | EQ | ST | FSY | HNPRST | IST | P | - | FY | T
CWGSAAWCATHCMVTABTCCTMTCACT (SEQ ID NO: 308) | LQ | EQ | ST | FSY | HNPRST | IST | P | - | IL | T
CWGSAAWCATHCMVTABTCCTWGGACT (SEQ ID NO: 309) | LQ | EQ | ST | FSY | HNPRST | IST | P | - | WR | T
CWGSAAWCATHCMVTABTCCTCBTACT (SEQ ID NO: 310) | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | - | T
- For example, the first codon (CWG) of the first oligonucleotide in Table 32, corresponding to Kabat position 89, represents 50% CTG and 50% CAG, which encode Leu (L) and Gln (Q), respectively. Thus, the expressed polypeptide would be expected to have L and Q each about 50% of the time. Similarly, for Kabat position 95A of the fourth oligonucleotide, the codon CBT represents ⅓ each of CCT, CGT and CTT, corresponding in turn to ⅓ each of Pro (P), Arg (R) and Leu (L) upon translation. By multiplying the number of options available at each position of the peptide sequence, one can obtain the complexity, in peptide space, contributed by each oligonucleotide. For the VK1-39 example above, the numbers are 864 for each of the first three oligonucleotides and 1,296 for the fourth. Thus, the oligonucleotides encoding VK1-39 CDR3s of length nine contribute 3,888 members to the library. However, as shown in Table 32, sequences with L or R at position 95A (when position 96 is empty) are identical to those with L or R at position 96 (when 95A is empty). Therefore, the figure of 3,888 counts the L and R variants twice, and the actual number of unique members is lower, at 3,024. As depicted in Table 33, for the complete list of oligonucleotides that represent VKCDR3 of
sizes size 9 VKCDR3. -
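The peptide-space complexity figures for the VK1-39 example (864, 1,296, 3,888 and 3,024) can be reproduced by expanding each degenerate codon with the IUPAC definitions of Table 31 and translating with the standard genetic code. The following Python sketch is one way to do this; the helper names are hypothetical, and it assumes each oligonucleotide is read in frame from its first base, three bases per Kabat position.

```python
from itertools import product
from math import prod

# IUPAC degenerate base symbols (Table 31).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_options(codon: str) -> set[str]:
    """Amino acids encoded by one (possibly degenerate) codon."""
    return {CODON_TABLE["".join(b)] for b in product(*(IUPAC[s] for s in codon))}


def position_options(oligo: str) -> list[set[str]]:
    """Split an in-frame oligonucleotide into codons and decode each one."""
    return [codon_options(oligo[i:i + 3]) for i in range(0, len(oligo), 3)]


def complexity(oligo: str) -> int:
    """Number of peptide variants: the product of the options at each position."""
    return prod(len(options) for options in position_options(oligo))


def peptides(oligo: str) -> set[str]:
    """All distinct peptides encoded by one oligonucleotide."""
    return {"".join(p) for p in product(*(sorted(o) for o in position_options(oligo)))}


# VK1-39, VKCDR3 length nine (Table 32, SEQ ID NOs: 307-310).
VK1_39_LEN9 = [
    "CWGSAAWCATHCMVTABTCCTTWCACT",
    "CWGSAAWCATHCMVTABTCCTMTCACT",
    "CWGSAAWCATHCMVTABTCCTWGGACT",
    "CWGSAAWCATHCMVTABTCCTCBTACT",
]

sizes = [complexity(oligo) for oligo in VK1_39_LEN9]
unique = set().union(*(peptides(oligo) for oligo in VK1_39_LEN9))
print(sizes, sum(sizes), len(unique))  # [864, 864, 864, 1296] 3888 3024
```

The same functions can decode other rows of Table 33; for example, `position_options("CASCASTWCGRTRVKKCATWCACT")` (VK3-20, SEQ ID NO: 336) returns the sets HQ, HQ, FY, DG, ADEGKNRST, AS, FY and T.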
TABLE 33 Degenerate Oligonucleotides Encoding an Exemplary VKCDR3 Library Junc- tion SEQ CDRL3 Type Degenerate ID Chassis Length (1) Oligonucleotide NO: 89 90 91 92 93 94 95 95A 96 97 VK1-5 8 1 CASCASTMCVRTRSTT 259 HQ HQ SY DGHNRS AGST FY — — FY T WCTWCACT VK1-5 8 2 CASCASTMCVRTRSTT 260 HQ HQ SY DGHNRS AGST FY — — IL T WCMTCACT VK1-5 8 3 CASCASTMCVRTRSTT 261 HQ HQ SY DGHNRS AGST FY — — WR T WCWGGACT VK1-5 8 4 CASCASTMCVRTRSTT 262 HQ HQ SY DGHNRS AGST FY PS — — T WCYCTACT VK1-5 9 1 CASCASTMCVRTRSTT 263 HQ HQ SY DGHNRS AGST FY PS — FY T WCYCTTWCACT VK1-5 9 2 CASCASTMCVRTRSTT 264 HQ HQ SY DGHNRS AGST FY PS — IL T WCYCTMTCACT VK1-5 9 3 CASCASTMCVRTRSTT 265 HQ HQ SY DGHNRS AGST FY PS — WR T WCYCTWGGACT VK1-5 9 4 CASCASTMCVRTRSTT 266 HQ HQ SY DGHNRS AGST FY PS PS — T WCYCTYCTACT VK1-5 10 1 CASCASTMCVRTRSTT 267 HQ HQ SY DGHNRS AGST FY PS PLR FY T WCYCTCBTTWCACT VK1-5 10 2 CASCASTMCVRTRSTT 268 HQ HQ SY DGHNRS AGST FY PS PLR IL T WCYCTCBTMTCACT VK1-5 0 3 CASCASTMCVRTRSTT 269 HQ HQ SY DGHNRS AGST FY PS PLR WR T WCYCTCBTWGGACT VK1-12 8 1 CASCASDCTRVCARTT 270 HQ HQ AST ADGNST NS FL — — FY T TSTWCACT VK1-12 8 2 CASCASDCTRVCARTT 271 HQ HQ AST ADGNST NS FL — — IL T TSMTCACT VK1-12 8 3 CASCASDCTRVCARTT 272 HQ HQ AST ADGNST NS FL — — WR T TSWGGACT VK1-12 8 4 CASCASDCTRVCARTT 273 HQ HQ AST ADGNST NS FL P — — T TSCCTACT VK1-12 9 1 CASCASDCTRVCARTT 274 HQ HQ AST ADGNST NS FL P — FY T TSCCTTWCACT VK1-12 9 2 CASCASDCTRVCARTT 275 HQ HQ AST ADGNST NS FL P — IL T TSCCTMTCACT VK1-12 9 3 CASCASDCTRVCARTT 276 HQ HQ AST ADGNST NS FL P — WR T TSCCTWGGACT VK1-12 9 4 CASCASDCTRVCARTT 277 HQ HQ AST ADGNST NS FL P PLR — T TSCCTCBTACT VK1-12 10 1 CASCASDCTRVCARTT 278 HQ HQ AST ADGNST NS FL P PLR FY T TSCCTCBTTWCACT VK1-12 10 2 CASCASDCTRVCARTT 279 HQ HQ AST ADGNST NS FL P PLR IL T TSCCTCBTMTCACT VK1-12 0 3 CASCASDCTRVCARTT 280 HQ HQ AST ADGNST NS FL P PLR WR T TSCCTCBTWGGACT VK1-27 8 1 CASMAGTWCRRTASKG 281 HQ KQ FY DGNS RST AGV — — FY T BATWCACT VK1-27 8 2 CASMAGTWCRRTASKG 282 HQ KQ FY DGNS RST AGV 
— — IL T BAMTCACT VK1-27 8 3 CASMAGTWCRRTASKG 283 HQ KQ FY DGNS RST AGV — — WR T BAWGGACT VK1-27 8 4 CASMAGTWCRRTASKG 284 HQ KQ FY DGNS RST AGV P — — T BACCTACT VK1-27 9 1 CASMAGTWCRRTASKG 285 HQ KQ FY DGNS RST AGV P — FY T BACCTTWCACT VK1-27 9 2 CASMAGTWCRRTASKG 286 HQ KQ FY DGNS RST AGV P — IL T BACCTMTCACT VK1-27 9 3 CASMAGTWCRRTASKG 287 HQ KQ FY DGNS RST AGV P — WR T BACCTWGGACT VK1-27 9 4 CASMAGTWCRRTASKG 288 HQ KQ FY DGNS RST AGV P PLR — T BACCTCBTACT VK1-27 10 1 CASMAGTWCRRTASKG 289 HQ KQ FY DGNS RST AGV P PLR FY T BACCTCBTTWCACT VK1-27 10 2 CASMAGTWCRRTASKG 290 HQ KQ FY DGNS RST AGV P PLR IL T BACCTCBTMTCACT VK1-27 10 3 CASMAGTWCRRTASKG 291 HQ KQ FY DGNS RST AGV P PLR WR T BACCTCBTWGGACT VK1-33 8 1 CASCWTTMCRATRVCB 292 HQ HL SY DN ADGNS DFH — — FY T WTTWCACT T LVY VK1-33 8 2 CASCWTTMCRATRVCB 293 HQ HL SY DN ADGNS DFH — — IL T WTMTCACT T LVY VK1-33 8 3 CASCWTTMCRATRVCB 294 HQ HL SY DN ADGNS DFH — — WR T WTWGGACT T LVY VK1-33 8 4 CASCWTTMCRATRVCB 295 HQ HL SY DN ADGNS DFH P — — T WTCCTACT T LVY VK1-33 9 1 CASCWTTMCRATRVCB 296 HQ HL SY DN ADGNS DFH P — FY T WTCCTTWCACT T LVY VK1-33 9 2 CASCWTTMCRATRVCB 297 HQ HL SY DN ADGNS DFH P — IL T WTCCTMTCACT T LVY VK1-33 9 3 CASCWTTMCRATRVCB 298 HQ HL SY DN ADGNS DFH P — WR T WTCCTWGGACT T LVY VK1-33 9 4 CASCWTTMCRATRVCB 299 HQ HL SY DN ADGNS DFH P PLR — T WTCCTCBTACT T LVY VK1-33 10 1 CASCWTTMCRATRVCB 300 HQ HL SY DN ADGNS DFH P PLR FY T WTCCTCBTTWCACT T LVY VK1-33 10 2 CASCWTTMCRATRVCB 301 HQ HL SY DN ADGNS DFH P PLR IL T WTCCTCBTMTCACT T LVY VK1-33 10 3 CASCWTTMCRATRVCB 302 HQ HL SY DN ADGNS DFH P PLR WR T WTCCTCBTWGGACT T LVY VK1-39 8 1 CWGSAAWCATHCMVTA 303 LQ EQ ST FSY HNPRS IST — — FY T BTTWCACT T VK1-39 8 2 CWGSAAWCATHCMVTA 304 LQ EQ ST FSY HNPRS IST — — IL T BTMTCACT T VK1-39 8 3 CWGSAAWCATHCMVTA 305 LQ EQ ST FSY HNPRS IST — — WR T BTWGGACT T VK1-39 8 4 CWGSAAWCATHCMVTA 306 LQ EQ ST FSY HNPRS IST P — — T BTCCTACT T VK1-39 9 1 CWGSAAWCATHCMVTA 307 LQ EQ ST FSY HNPRS IST P — FY T BTCCTTWCACT T VK1-39 9 2 
CWGSAAWCATHCMVTA 308 LQ EQ ST FSY HNPRS IST P — IL T BTCCTMTCACT T VK1-39 9 3 CWGSAAWCATHCMVTA 309 LQ EQ ST FSY HNPRS IST P — WR T BTCCTWGGACT T VK1-39 9 4 CWGSAAWCATHCMVTA 310 LQ EQ ST FSY HNPRS IST P PLR — T BTCCTCBTACT T VK1-39 10 1 CWGSAAWCATHCMVTA 311 LQ EQ ST FSY HNPRS IST P PLR FY T BTCCTCBTTWCACT T VK1-39 10 2 CWGSAAWCATHCMVTA 312 LQ EQ ST FSY HNPRS IST P PLR IL T BTCCTCBTMTCACT T VK1-39 10 3 CWGSAAWCATHCMVTA 313 LQ EQ ST FSY HNPRS IST P PLR WR T BTCCTCBTWGGACT T VK3-11 8 1 CASCASAGWRGKRVCT 314 HQ HQ RS GRS ADGNS SW — — FY T SGTWCACT T VK3-11 8 2 CASCASAGWRGKRVCT 315 HQ HQ RS GRS ADGNS SW — — IL T SGMTCACT T VK3-11 8 3 CASCASAGWRGKRVCT 316 HQ HQ RS GRS ADGNS SW — — WR T SGWGGACT T VK3-11 8 4 CASCASAGWRGKRVCT 317 HQ HQ RS GRS ADGNS SW P — — T SGCCTACT T VK3-11 9 1 CASCASAGWRGKRVCT 318 HQ HQ RS GRS ADGNS SW P — FY T SGCCTTWCACT T VK3-11 9 2 CASCASAGWRGKRVCT 319 HQ HQ RS GRS ADGNS SW P — IL T SGCCTMTCACT T VK3-11 9 3 CASCASAGWRGKRVCT 320 HQ HQ RS GRS ADGNS SW P — WR T SGCCTWGGACT T VK3-11 9 4 CASCASAGWRGKRVCT 321 HQ HQ RS GRS ADGNS SW P PLR — T SGCCTCBTACT T VK3-11 10 1 CASCASAGWRGKRVCT 322 HQ HQ RS GRS ADGNS SW P PLR FY T SGCCTCBTTWCACT T VK3-11 10 2 CASCASAGWRGKRVCT 323 HQ HQ RS GRS ADGNS SW P PLR IL T SGCCTCBTMTCACT T VK3-11 10 3 CASCASAGWRGKRVCT 324 HQ HQ RS GRS ADGNS SW P PLR WR T SGCCTCBTWGGACT T VK3-15 8 1 CASCASTMCVRTRRKT 325 HQ HQ SY DGHNRS DEGKN W — — FY T GGTWCACT RS VK3-15 8 2 CASCASTMCVRTRRKT 326 HQ HQ SY DGHNRS DEGKN W — — IL T GGMTCACT RS VK3-15 8 3 CASCASTMCVRTRRKT 327 HQ HQ SY DGHNRS DEGKN W — — WR T GGWGGACT RS VK3-15 8 4 CASCASTMCVRTRRKT 328 HQ HQ SY DGHNRS DEGKN W P — — T GGCCTACT RS VK3-15 9 1 CASCASTMCVRTRRKT 329 HQ HQ SY DGHNRS DEGKN W P — FY T GGCCTTWCACT RS VK3-15 9 2 CASCASTMCVRTRRKT 330 HQ HQ SY DGHNRS DEGKN W P — IL T GGCCTMTCACT RS VK3-15 9 3 CASCASTMCVRTRRKT 331 HQ HQ SY DGHNRS DEGKN W P — WR T GGCCTWGGACT RS VK3-15 9 4 CASCASTMCVRTRRKT 332 HQ HQ SY DGHNRS DEGKN W P PLR — T GGCCTCBTACT RS VK3-15 10 1 CASCASTMCVRTRRKT 333 HQ HQ SY 
DGHNRS DEGKN W P PLR FY T GGCCTCBTTWCACT RS VK3-15 10 2 CASCASTMCVRTRRKT 334 HQ HQ SY DGHNRS DEGKN W P PLR IL T GGCCTCBTMTCACT RS VK3-15 10 3 CASCASTMCVRTRRKT 335 HQ HQ SY DGHNRS DEGKN W P PLR WR T GGCCTCBTWGGACT RS VK3-20 8 1 CASCASTWCGRTRVKK 336 HQ HQ FY DG ADEGK AS — — FY T CATWCACT NRST VK3-20 8 2 CASCASTWCGRTRVKK 337 HQ HQ FY DG ADEGK AS — — IL T CAMTCACT NRST VK3-20 8 3 CASCASTWCGRTRVKK 338 HQ HQ FY DG ADEGK AS — — WR T CAWGGACT NRST VK3-20 8 4 CASCASTWCGRTRVKK 339 HQ HQ FY DG ADEGK AS P — — T CACCTACT NRST VK3-20 9 1 CASCASTWCGRTRVKK 340 HQ HQ FY DG ADEGK AS P — FY T CACCTTWCACT NRST VK3-20 9 2 CASCASTWCGRTRVKK 341 HQ HQ FY DG ADEGK AS P — IL T CACCTMTCACT NRST VK3-20 9 3 CASCASTWCGRTRVKK 342 HQ HQ FY DG ADEGK AS P — WR T CACCTWGGACT NRST VK3-20 9 4 CASCASTWCGRTRVKK 343 HQ HQ FY DG ADEGK AS P PLR — T CACCTCBTACT NRST VK3-20 10 1 CASCASTWCGRTRVKK 344 HQ HQ FY DG ADEGK AS P PLR FY T CACCTCBTTWCACT NRST VK3-20 10 2 CASCASTWCGRTRVKK 345 HQ HQ FY DG ADEGK AS P PLR IL T CACCTCBTMTCACT NRST VK3-20 10 3 CASCASTWCGRTRVKK 346 HQ HQ FY DG ADEGK AS P PLR WR T CACCTCBTWGGACT NRST VK2-28 8 1 ATGCASRBTCKTSASA 347 M HQ AGI LR DEHQ IST — — FY T BTTWCACT STV VK2-28 8 2 ATGCASRBTCKTSASA 348 M HQ AGI LR DEHQ IST — — IL T BTMTCACT STV VK2-28 8 3 ATGCASRBTCKTSASA 349 M HQ AGI LR DEHQ IST — — WR T BTWGGACT STV VK2-28 8 4 ATGCASRBTCKTSASA 350 M HQ AGI LR DEHQ IST P — — T BTCCTACT STV VK2-28 9 1 ATGCASRBTCKTSASA 351 M HQ AGI LR DEHQ IST P — FY T BTCCTTWCACT STV VK2-28 9 2 ATGCASRBTCKTSASA 352 M HQ AGI LR DEHQ IST P — IL T BTCCTMTCACT STV VK2-28 9 3 ATGCASRBTCKTSASA 353 M HQ AGI LR DEHQ IST P — WR T BTCCTWGGACT STV VK2-28 9 4 ATGCASRBTCKTSASA 354 M HQ AGI LR DEHQ IST P PLR — T BTCCTCBTACT STV VK2-28 10 1 ATGCASRBTCKTSASA 355 M HQ AGI LR DEHQ IST P PLR FY T BTCCTCBTTWCACT STV VK2-28 10 2 ATGCASRBTCKTSASA 356 M HQ AGI LR DEHQ IST P PLR IL T BTCCTCBTMTCACT STV VK2-28 10 3 ATGCASRBTCKTSASA 357 M HQ AGI LR DEHQ IST P PLR WR T BTCCTCBTWGGACT STV VK4-1 8 1 CASCASTWCTWCRVCA 358 HQ HQ FY FY 
ADGNS IST — — FY T BTTWCACT T VK4-1 8 2 CASCASTWCTWCRVCA 359 HQ HQ FY FY ADGNS IST — — IL T BTMTCACT T VK4-1 8 3 CASCASTWCTWCRVCA 360 HQ HQ FY FY ADGNS IST — — WR T BTWGGACT T VK4-1 8 4 CASCASTWCTWCRVCA 361 HQ HQ FY FY ADGNS IST P — — T BTCCTACT T VK4-1 9 1 CASCASTWCTWCRVCA 362 HQ HQ FY FY ADGNS IST P — FY T BTCCTTWCACT T VK4-1 9 2 CASCASTWCTWCRVCA 363 HQ HQ FY FY ADGNS IST P — IL T BTCCTMTCACT T VK4-1 9 3 CASCASTWCTWCRVCA 364 HQ HQ FY FY ADGNS IST P — WR T BTCCTWGGACT T VK4-1 9 4 CASCASTWCTWCRVCA 365 HQ HQ FY FY ADGNS IST P PLR — T BTCCTCBTACT T VK4-1 10 1 CASCASTWCTWCRVCA 366 HQ HQ FY FY ADGNS IST P PLR FY T BTCCTCBTTWCACT T VK4-1 10 2 CASCASTWCTWCRVCA 367 HQ HQ FY FY ADGNS IST P PLR IL T BTCCTCBTMTCACT T VK4-1 10 3 CASCASTWCTWCRVCA 368 HQ HQ FY FY ADGNS IST P PLR WR T BTCCTCBTWGGACT T [Alternative for VK1-33] (2) VK1-33 8 1 CASCWATMCRATRVCB 369 HQ QL SY DN ADGNS DFH — — FY T WTTWCACT T LVY VK1-33 8 2 CASCWATMCRATRVCB 370 HQ QL SY DN ADGNS DFH — — IL T WTMTCACT T LVY VK1-33 8 3 CASCWATMCRATRVCB 371 HQ QL SY DN ADGNS DFH — — WR T WTWGGACT T LVY VK1-33 8 4 CASCWATMCRATRVCB 372 HQ QL SY DN ADGNS DFH P — — T WTCCTACT T LVY VK1-33 9 1 CASCWATMCRATRVCB 373 HQ QL SY DN ADGNS DFH P — FY T WTCCTTWCACT T LVY VK1-33 9 2 CASCWATMCRATRVCB 374 HQ QL SY DN ADGNS DFH P — IL T WTCCTMTCACT T LVY VK1-33 9 3 CASCWATMCRATRVCB 375 HQ QL SY DN ADGNS DFH P — WR T WTCCTWGGACT T LVY VK1-33 9 4 CASCWATMCRATRVCB 376 HQ QL SY DN ADGNS DFH P PLR — T WTCCTCBTACT T LVY VK1-33 10 1 CASCWATMCRATRVCB 377 HQ QL SY DN ADGNS DFH P PLR FY T WTCCTCBTTWCACT T LVY VK1-33 10 2 CASCWATMCRATRVCB 378 HQ QL SY DN ADGNS DFH P PLR IL T WTCCTCBTMTCACT T LVY VK1-33 10 3 CASCWATMCRATRVCB 379 HQ QL SY DN ADGNS DFH P PLR WR T WTCCTCBTWGGACT T LVY (1) Junction type 1 has position 96 as FY, type 2 as IL, type 3 as RW, and type 4 has a deletion. (2) Two embodiments are shown for the VK1-33 library. In one embodiment, the second codon was CWT. In another embodiment, it was CWA or CWG.
- This example demonstrates how a more faithful representation of amino acid variation at each position may be obtained by using a codon-based synthesis approach (Virnekas et al. Nucleic Acids Res., 1994, 22: 5600). This synthetic scheme also allows for finer control of the proportions of particular amino acids included at a position. For example, as described above for the VK1-39 sequences, position 89 was designed as 50% Q and 50% L; however, as Table 30 shows, Q is used much more frequently than L. The more complex VKCDR3 libraries of the present example account for the different relative occurrence of Q and L, for example, 90% Q and 10% L. Such control is better exercised within codon-based synthetic schemes, especially when multiple amino acid types are considered.
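To make the contrast between degenerate-codon and codon-based synthesis concrete, the Python sketch below expands an IUPAC degenerate codon into the amino acids it encodes, assuming equimolar base mixtures at each degenerate position, and separately tallies the amino acid proportions produced by a defined mixture of codons, as in a trinucleotide-based scheme. The function names and the 90%/10% Q/L mixture are illustrative assumptions; codons such as TWC, MTC and WGG are taken from the junction oligonucleotides in the table above.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degenerate_codon_distribution(codon: str) -> dict[str, float]:
    """Amino acid fractions from a degenerate codon, assuming equimolar bases."""
    choices = [IUPAC[base] for base in codon.upper()]
    counts = Counter(CODON_TABLE["".join(c)] for c in product(*choices))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def trimer_mix_distribution(mix: dict[str, float]) -> dict[str, float]:
    """Amino acid fractions from a defined mixture of codons (codon-based synthesis)."""
    out: Counter = Counter()
    for codon, fraction in mix.items():
        out[CODON_TABLE[codon]] += fraction
    return dict(out)


print(degenerate_codon_distribution("CWG"))               # {'Q': 0.5, 'L': 0.5}
print(trimer_mix_distribution({"CAG": 0.9, "CTG": 0.1}))  # {'Q': 0.9, 'L': 0.1}
```

A degenerate codon fixes the ratio at whatever the base mixture dictates (here 50:50), whereas a codon mixture can be weighted freely, which is the finer control described above.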
- This example also describes an implementation of a codon-based synthetic scheme, using the ten VK chassis described in Table 11. Similar approaches, of course, can be implemented with more or fewer such chassis. As indicated in the Detailed Description, a unique aspect of the design of the present libraries, as well as those of the preceding examples, is the germline or chassis-based aspect, which is meant to preserve more of the integrity and variation of actual human kappa light chain sequences. This is in contrast to other codon-based synthesis or degenerate oligonucleotide synthesis approaches that have been described in the literature and that aim to produce “one-size-fits-all” (e.g., consensus) kappa light chain libraries (e.g., Knappik, et al., J Mol Biol, 2000, 296: 57; Akamatsu et al., J Immunol, 1993, 151: 4651).
- With reference to Table 30, obtained for VK1-39, one can thus design the length nine VKCDR3 library of Table 34. Here, for practical reasons, the proportions at each position are denoted in multiples of five percentage points. As better synthetic schemes are developed, finer resolution may be obtained—for example to resolutions of one, two, three, or four percent.
-
TABLE 34 Amino Acid Composition (%) at Each VKCDR3 Position for VK1-39 Library With CDR Length of Nine Residues (columns: positions 89, 90, 91, 92, 93, 94, 95, 96 (*), 97 (*)) Amino Acid A 5 5 D 5 5 E 5 5 F 5 10 G 5 5 5 5 H 5 5 5 5 I 5 5 K 5 L 10 5 10 20 M N 0 0 5 0 5 P 5 85 5 Q 85 90 5 R 5 5 10 S 80 5 60 5 5 T 10 10 65 90 V 5 W 15 Y 5 75 5 15 Number Different 3 3 4 6 8 8 3 11 3 (*) The composition of positions 96 and 97, determined largely by junction and IGKJ diversity, could be the same for length 9 VKCDR3 of all chassis.
- The library of Table 34 would have 1.37×10⁶ unique polypeptide sequences, calculated by multiplying together the numbers in the bottom row of the table.
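The complexity figure can be checked with a few lines of Python, assuming, as the text does, that each position varies independently, so that the number of unique sequences is the product of the per-position counts in the "Number Different" row.

```python
from math import prod

# "Number Different" row of Table 34: distinct amino acids allowed at
# positions 89, 90, 91, 92, 93, 94, 95, 96 and 97 (length-9 VK1-39 library).
TABLE_34_COUNTS = [3, 3, 4, 6, 8, 8, 3, 11, 3]


def library_size(counts_per_position):
    """Unique polypeptides when every position varies independently."""
    return prod(counts_per_position)


size = library_size(TABLE_34_COUNTS)
print(f"{size:,} unique sequences (~{size:.2e})")
```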
- The underlined 0 entries for Asn (N) at certain positions represent regions where the possibility of having N-linked glycosylation sites in the VKCDR3 has been minimized or eliminated. Peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, may undergo post-translational modification in a number of expression systems, including yeast and mammalian cells. Moreover, the nature of such modification depends on the specific cell type and, even for a given cell type, on culture conditions. N-linked glycosylation may be disadvantageous when it occurs in a region of the antibody molecule likely to be involved in antigen binding (e.g., a CDR), as the function of the antibody may then be influenced by factors that may be difficult to control. For example, considering position 91 above, one can observe that position 92 is never P. Position 94 is not P in 95% of the cases. However, position 93 is S or T in 75% (65+10) of the cases. Thus, allowing N at position 91 would generate the undesirable motif N-X-(T/S)-Z (with both X and Z distinct from P), and a zero occurrence has therefore been implemented, even though N is observed with some frequency in actual human sequences (see Table 30). A similar argument applies for N at positions 92 and 94. It should be appreciated, however, that if the antibody library were to be expressed in a system incapable of N-linked glycosylation, such as bacteria, or under culture conditions in which N-linked glycosylation did not occur, this consideration may not apply. 
However, even in the event that the organism used to express libraries with potential N-linked glycosylation sites is incapable of N-linked glycosylation (e.g., bacteria), it may still be desirable to avoid N-X-(S/T) sequences, as the antibodies isolated from such libraries may be expressed in different systems (e.g., yeast, mammalian cells) later (e.g., toward clinical development), and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity. These embodiments are also included within the scope of the invention. To our knowledge, VKCDR3 libraries known in the art have not considered this effect, and thus a proportion of their members may have the undesirable qualities mentioned above.
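As a minimal sketch of how candidate sequences could be screened for the N-X-(S/T)-Z motif described above (X and Z not P), the Python below uses a regular expression with a lookahead so that overlapping motifs are all reported. The peptides in the usage lines are hypothetical variants of a VK1-39-like CDR3, not members of the designed library. Because Z may fall outside the CDR, in practice one would scan the CDR together with its flanking framework residues.

```python
import re

# N, then any residue except P, then S or T, then any residue except P.
# The zero-width lookahead lets overlapping motifs each be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(peptide: str) -> list[int]:
    """Return 0-based start positions of N-X-(S/T)-Z motifs with X, Z != P."""
    return [m.start() for m in SEQUON.finditer(peptide.upper())]


# Hypothetical examples: only the first carries a complete motif.
for candidate in ("QQNYSTPLT", "QQNPSTPLT", "QQNYSPPLT"):
    print(candidate, find_sequons(candidate))
```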
- We also designed additional sub-libraries, related to the library outlined in Table 34, for VKCDR3 of lengths 8 and 10. The compositions at positions 95 and 95A, the latter being defined for VKCDR3 of length 10 only, are illustrated in Table 35. -
TABLE 35 Amino Acid Composition (%) for VK1-39 Libraries of Lengths 8 and 10 (columns: Position 95, Length 8 (*); Position 95, Length 10 (**); Position 95A, Length 10) Amino Acid A D E F 5 G 5 H I 10 5 K L 20 10 10 M N P 25 85 60 Q R 10 5 10 S 5 5 T 5 V 5 W 10 Y 10 Number Different 9 3 8 (*) Position 96 is deleted in VKCDR3 of size 8. (**) This is the same composition as in VKCDR3 of size 9. - The total number of unique members in the VK1-39 library of
length 8, thus, can be obtained as before, and is 3.73×10⁵ (that is, 3×3×4×6×8×8×9×3). Similarly, the complexity of the VK1-39 library of length 10 would be 10.9×10⁶ (8 times that of the library of length 9, as there is an additional 8-fold variation at the insertion position 95A). Thus, there would be a total of 12.7×10⁶ unique members in the overall VK1-39 library, obtained by summing the number of unique members for each of the specified lengths. In certain embodiments of the invention, it may be preferable to create the individual sub-libraries of lengths … (FIG. 3). The present invention provides compositions and methods that enable one of ordinary skill to synthesize VKCDR3 libraries corresponding to other VK chassis. - This example describes the design of a minimalist VλCDR3 library. The principles used in designing this library (or more complex Vλ libraries) are similar to those used to design the VKCDR3 libraries. However, unlike the VK genes, the contribution of the IgλV segment to CDRL3 is not constrained to a fixed number of amino acids. Therefore, length variation may be obtained in a minimalist VλCDR3 library even when only considering combinations between Vλ chassis and Jλ sequences.
- Examination of the VλCDR3 lengths of human sequences shows that lengths of 9 to 12 account for about 95% of sequences, and lengths of 8 to 12 account for about 97% of sequences (FIG. 4). Table 36 shows the usage (percent occurrence) of the six known IGλJ genes in the rearranged human lambda light chain sequences compiled from the NCBI database (see Appendix B), and Table 37 shows the sequences encoded by the genes. -
TABLE 36 IGλJ Gene Usage in the Lambda Light Chain Sequences Compiled from the NCBI Database (see Appendix B)
Gene_Allele | LUA
Jλ1_01 | 20.2%
Jλ2_01 | 42.2%
Jλ3_02 | 36.2%
Jλ6_01 | 0.6%
Jλ7_01 | 0.9%
-
TABLE 37 Observed Human IGλJ Amino Acid Sequences
Gene | Sequence | SEQ ID NO:
IGλJ1-01 | YVFGTGTKVTVL | 557
IGλJ2-01 | VVFGGGTKLTVL | 558
IGλJ3-01 | WVFGGGTKLTVL | 559
IGλJ3-02 | VVFGGGTKLTVL | 560
IGλJ6-01 | NVFGSGTKVTVL | 561
IGλJ7-01 | AVFGGGTQLTVL | 562
IGλJ7-02 | AVFGGGTQLTAL | 563
- IGλJ3-01 and IGλJ7-02 are not represented among the sequences that were analyzed; therefore, they were not included in Table 36. As illustrated in Table 36, IGλJ1-01, IGλJ2-01, and IGλJ3-02 are over-represented in their usage and have thus been bolded in Table 37. In some embodiments of the invention, for example, only these three over-represented sequences may be utilized. In other embodiments of the invention, all six segments, any 1, 2, 3, 4, or 5 of the six segments, or any combination thereof may be utilized.
- As shown in Table 14, the portion of CDRL3 contributed by the IGλV gene segment is 7, 8, or 9 amino acids. The remainder of CDRL3 and FRM4 are derived from the IGλJ sequences (Table 37). The IGλJ sequences contribute either one or two amino acids to CDRL3. If two amino acids are contributed by IGλJ, the contribution is from the N-terminal two residues of the IGλJ segment: YV (IGλJ1-01), VV (IGλJ2-01), WV (IGλJ3-01), VV (IGλJ3-02), or AV (IGλJ7-01 and IGλJ7-02). If one amino acid is contributed from IGλJ, it is a V residue, which remains after deletion of the N-terminal residue of an IGλJ segment.
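The Python sketch below illustrates the IGλJ contribution rule just described, using the Table 37 sequences. It assumes that FRM4 begins at the first "FG" of the conserved FGXG motif, so the residues before it form the two-residue CDRL3 contribution, and the one-residue form is what remains after the N-terminal residue is removed. It then combines the 7 to 9 residue IGλV contributions of Table 14 with one or two IGλJ residues to list the resulting CDRL3 lengths for those chassis alone.

```python
# IGλJ amino acid sequences from Table 37.
IGLJ = {
    "IGλJ1-01": "YVFGTGTKVTVL",
    "IGλJ2-01": "VVFGGGTKLTVL",
    "IGλJ3-01": "WVFGGGTKLTVL",
    "IGλJ3-02": "VVFGGGTKLTVL",
    "IGλJ6-01": "NVFGSGTKVTVL",
    "IGλJ7-01": "AVFGGGTQLTVL",
    "IGλJ7-02": "AVFGGGTQLTAL",
}


def cdrl3_contributions(j_seq: str) -> tuple[str, str]:
    """Two-residue and one-residue CDRL3 contributions of an IGλJ segment.

    Assumption: FRM4 starts at the first 'FG', so everything before it is
    contributed to CDRL3; dropping the N-terminal residue leaves the V.
    """
    two = j_seq[: j_seq.index("FG")]
    return two, two[1:]


# IGλV contributes 7, 8 or 9 residues (Table 14); IGλJ contributes 1 or 2.
V_LENGTHS = (7, 8, 9)
J_LENGTHS = (1, 2)
CDRL3_LENGTHS = sorted({v + j for v in V_LENGTHS for j in J_LENGTHS})

for gene, seq in IGLJ.items():
    two, one = cdrl3_contributions(seq)
    print(f"{gene}: two-residue {two}, one-residue {one}")
print("CDRL3 lengths:", CDRL3_LENGTHS)
```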
- In this non-limiting exemplary embodiment of the invention, the FRM4 segment was fixed as FGGGTKLTVL, corresponding to IGλJ2-01 and IGλJ3-02 (i.e., portions of SEQ ID NOs: 558 and 560).
- Seven of the 11 selected chassis (Vλ1-40 (SEQ ID NO: 531), Vλ3-19 (SEQ ID NO: 536), Vλ3-21 (SEQ ID NO: 537), Vλ6-57 (SEQ ID NO: 539), Vλ1-44 (SEQ ID NO: 532), Vλ1-51 (SEQ ID NO: 533), and Vλ4-69 (SEQ ID NO: 538)) have an additional two nucleotides following the last full codon. In four of those seven cases, analysis of the data set provided in Appendix B showed that the addition of a single nucleotide (i.e., without being limited by theory, via the activity of TdT) led to a further increase in CDRL3 length. This effect can be accounted for by introducing variants of the L3-Vλ sequences contributed by these four IGλV sequences (Table 38).
-
TABLE 38 Variants with an additional residue in CDRL3
Name | Locus | FRM1 | CDR1 | FRM2 | CDR2 | FRM3 | CDR3/L3-Vλ | SEQ ID NO:
1E+ | IGVλ1-40+ | QSVLTQPPSVSGAPGQRVTISC | TGSSSNIGAGYD---VH | WYQQLPGTAPKLLI | YGN----SNRPS | GVPDRFSGSKSG--TSASLAITGLQAEDEADYYC | QSYDSSLSG[S] | 564
3L+ | IGVλ3-19+ | SSELTQDPAVSVALGQTVRITC | QGDSLRSYY------AS | WYQQKPGQAPVLVI | YGK----NNRPS | GIPDRFSGSSSG--NTASLTITGAQAEDEADYYC | NSRDSSGNH[H/Q] | 565
3H+ | IGVλ3-21+ | SYVLTQPPSVSVAPGKTARITC | GGNNIGSKS------VH | WYQQKPGQAPVLVI | YYD----SDRPS | GIPERFSGSNSG--NTATLTISRVEAGDEADYYC | QVWDSSSDH[P] | 566
6A+ | IGVλ6-57+ | NFMLTQPHSVSESPGKTVTISC | TRSSGSIASNY----VQ | WYQQRPGSSPTTVI | YED----NQRPS | GVPDRFSGSIDSSSNSASLTISGLKTEDEADYYC | QSYDSSN[H/Q] | 567
(+) Sequences are derived from their parents by the addition of an amino acid at the end of the respective CDR3 (bold underlined; shown here in brackets). H/Q can be introduced in a single sequence by use of the degenerate codon CAW or similar.
Thus, the final set of chassis in the currently exemplified embodiment of the invention is 15: eleven contributed by the chassis in Table 14 and an additional four contributed by the chassis of Table 38. The corresponding L3-Vλ domains of the 15 chassis contribute from 7 to 10 amino acids to CDRL3. When the amino acids contributed by the IGλJ sequences are included, the total variation in the length of CDRL3 is 8 to 12 amino acids, approximating the distribution in FIG. 4. Thus, in this exemplary embodiment of the invention, the minimalist Vλ library may be represented by the following: 15 chassis × 5 IGλJ-derived segments = 75 sequences. Here, the 15 chassis are Vλ1-40 (SEQ ID NO: 531), Vλ1-44 (SEQ ID NO: 532), Vλ1-51 (SEQ ID NO: 533), Vλ2-14 (SEQ ID NO: 534), Vλ3-1* (SEQ ID NO: 535), Vλ3-19 (SEQ ID NO: 536), Vλ3-21 (SEQ ID NO: 537), Vλ4-69 (SEQ ID NO: 538), Vλ6-57 (SEQ ID NO: 539), Vλ5-45 (SEQ ID NO: 540), Vλ7-43 (SEQ ID NO: 541), Vλ1-40+ (SEQ ID NO: 564), Vλ3-19+ (SEQ ID NO: 565), Vλ3-21+ (SEQ ID NO: 566), and Vλ6-57+ (SEQ ID NO: 567). The 5 IGλJ-derived segments are YVFGGGTKLTVL (IGλJ1; SEQ ID NO: 568), VVFGGGTKLTVL (IGλJ2; SEQ ID NO: 558), WVFGGGTKLTVL (IGλJ3; SEQ ID NO: 559), AVFGGGTKLTVL (IGλJ7; SEQ ID NO: 569), and -VFGGGTKLTVL (from any of the preceding sequences). - CDRH3 sequences of human antibodies of interest that are known in the art (e.g., antibodies that have been used in the clinic) have close counterparts in the designed library of the invention. A set of fifteen CDRH3 sequences from clinically relevant antibodies is presented in Table 39.
-
TABLE 39 CDRH3 Sequences of Reference Antibodies
Antibody Name | Target | Origin | Status | CDRH3 sequence | SEQ ID NO:
CAB1 | TNF-α | Phage display-human library | FDA Approved | AKVSYLSTASSLDY | 380
CAB2 | EGFR | Transgenic mouse | FDA Approved | VRDRVTGAFDI | 381
CAB3 | IL-12/IL-23 | Phage display-human library | Phase III | KTHGSHDN | 382
CAB4 | Interleukin-1-β | Transgenic mouse | Phase III | ARDLRTGPFDY | 383
CAB5 | RANKL | Transgenic mouse | Phase III | AKDPGTTVIMSWFDP | 384
CAB6 | IL-12/IL-23 | Transgenic mouse | Phase III | ARRRPGQGYFDF | 385
CAB7 | TNF-α | Transgenic mouse | Phase III | ARDRGASAGGNYYYYGMDV | 386
CAB8 | CTLA4 | Transgenic mouse | Phase III | ARDPRGATLYYYYYGMDV | 387
CAB9 | CD20 | Transgenic mouse | Phase III | AKDIQYGNYYYGMDV | 388
CAB10 | CD4 | Transgenic mouse | Phase III | ARVINWFDP | 389
CAB11 | CTLA4 | Transgenic mouse | Phase III | ARTGWLGPFDY | 390
CAB12 | IGF1-R | Transgenic mouse | Phase II | AKDLGWSDSYYYYYGMDV | 391
CAB13 | EGFR | Transgenic mouse | Phase II | ARDGITMVRGVMKDYFDY | 392
CAB14 | EGFR | Phage display-human library | Phase II | ARVSIFGVGTFDY | 393
CAB15 | BLyS | Phage display-human library | Phase II | ARSRDLLLFPHHALSP | 394
- Each of the above sequences was compared to each of the members of the library of Example 5, and the member, or members, with the same length and the fewest amino acid mismatches was, or were, recorded. The results are summarized in Table 40, below. In most cases, matches with 80% identity or better were found in the exemplified CDRH3 library. To the extent that the specificity and binding affinity of each of these antibodies is influenced by their CDRH3 sequence, without being bound by theory, one or more of these library members could have measurable affinity to the relevant targets.
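A minimal Python sketch of the comparison just described: for a query CDRH3, keep only library members of the same length, count position-by-position mismatches (Hamming distance), and report the member(s) with the fewest mismatches together with the percent identity. The toy library is hypothetical and merely stands in for the Example 5 library. Percent identity is taken here as (length - mismatches) / length, which reproduces the rounded percentages in Table 40 (e.g., 12/13, or about 92%, for CAB14).

```python
def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_matches(query: str, library: list[str]):
    """Return (fewest mismatches, % identity, best members), or None if no same-length member."""
    same_length = [s for s in library if len(s) == len(query)]
    if not same_length:
        return None
    scored = [(mismatches(query, s), s) for s in same_length]
    fewest = min(m for m, _ in scored)
    hits = [s for m, s in scored if m == fewest]
    identity = 100.0 * (len(query) - fewest) / len(query)
    return fewest, identity, hits


# Hypothetical library members; only the first three have length 13.
TOY_LIBRARY = ["ARVSIFGVGAFDY", "ARGSIFGVGTFDY", "ARDRGYSSGWFDP", "ARDYW"]

fewest, identity, hits = best_matches("ARVSIFGVGTFDY", TOY_LIBRARY)  # CAB14 CDRH3
print(f"{fewest} mismatch(es), {identity:.0f}% identity: {hits}")
```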
-
TABLE 40 Match of Reference Antibody CDRH3 to Designed Library
Antibody Name | Number of Mismatches (*) | Length | % Identity of Best Match
CAB1 | 5 | 14 | 64%
CAB2 | 2 | 11 | 82%
CAB3 | 4 | 8 | 50%
CAB4 | 2 | 11 | 82%
CAB5 | 3 | 15 | 80%
CAB6 | 3 | 12 | 75%
CAB7 | 2 | 20 | 90%
CAB8 | 0 | 19 | 100%
CAB9 | 3 | 15 | 80%
CAB10 | 1 | 9 | 89%
CAB11 | 1 | 11 | 91%
CAB12 | 2 | 18 | 89%
CAB13 | 2 | 18 | 89%
CAB14 | 1 | 13 | 92%
CAB15 | 7 | 16 | 56%
(*) For the best-matching sequence(s) in the library.
- Because a physical realization of a library with about 10⁸ distinct members could, in practice, contain every single member, sequences with high percent identity to antibodies of interest would be present in the physical realization of the library. This example also highlights one of many distinctions of the libraries of the current invention over those of the art; namely, that the members of the libraries of the invention may be precisely enumerated. In contrast, CDRH3 libraries known in the art cannot be explicitly enumerated in the manner described herein. For example, many libraries known in the art (e.g., Hoet et al., Nat. Biotechnol., 2005, 23: 344; Griffiths et al., EMBO J., 1994, 13: 3245; Griffiths et al., EMBO J., 1993, 12: 725; Marks et al., J. Mol. Biol., 1991, 222: 581, each incorporated by reference in its entirety) are derived by cloning natural human CDRH3 sequences, and their exact composition is not characterized, which precludes enumeration.
- Synthetic libraries produced by other (e.g., random or semi-random/biased) methods (Knappik, et al., J Mol Biol, 2000, 296: 57, incorporated by reference in its entirety) tend to have very large numbers of unique members. Thus, while matches to a given input sequence (for example, at 80% identity or greater) may exist in a theoretical representation of such libraries, the probability that a physical realization of the library actually contains such a sequence, and that an antibody corresponding to such a match is then selected, may in practice be vanishingly small. For example, a CDRH3 of
length 19 in the Knappik library may have over 10¹⁹ distinct sequences. In a practical realization of such a library, a tenth or so of the sequences may have length 19, and the largest total library may have on the order of 10¹⁰ to 10¹² transformants; thus, the probability of a given pre-defined member being present, in practice, is effectively zero (less than one in ten million). Other libraries (e.g., Enzelberger et al. WO2008053275 and Ladner US20060257937, each incorporated by reference in its entirety) suffer from at least one of the limitations described throughout this application. - For example, considering antibody CAB14, there are seven members of the designed library of Example 5 that differ at just one amino acid position from the sequence of the CDRH3 of CAB14 (given in Table 39). Since the total length of this CDRH3 sequence is 13, the percent of identical amino acids is 12/13, or about 92%, for each of these 7 sequences of the library of the invention. It can be estimated that the probability of obtaining such a match (or better) in the library of Knappik et al. is about 1.4×10⁻⁹; it would be lower still, about 5.5×10⁻¹⁰, in a library with equal amino acid proportions (i.e., completely random). Therefore, in a physical realization of the library with about 10¹⁰ transformants, of which about a tenth may have
length 13, there may be one or two instances of these best matches. However, with longer sequences such as CAB12, the probability of having members in the Knappik library with about 89% or better matching is under about 10⁻¹⁵, so that the expected number of instances in a physical realization of the library is essentially zero. To the extent that sequences of interest resemble actual human CDRH3 sequences, there will be close matches in the library of Example 5, which was designed to mimic human sequences. Thus, one of the many relative advantages of the present library, versus those in the art, becomes more apparent as the length of the CDRH3 increases. - This example outlines the procedures used to synthesize the oligonucleotides used to construct the exemplary libraries of the invention. Custom
Primer Support™ 200 dT40S resin (GE Healthcare) was used to synthesize the oligonucleotides, using a loading of about 39 μmol/g of resin. Columns (diameter=30 μm) and frits were purchased from Biosearch Technologies, Inc. A column bed volume of 30 μL was used in the synthesis, with 120 nmol of resin loaded in each column. A mixture of dichloromethane (DCM) and methanol (MeOH), at a ratio of 400/122 (v/v), was used to load the resin. Oligonucleotides were synthesized using a Dr. Oligo® 192 oligonucleotide synthesizer and standard phosphoramidite chemistry. - The split pool procedure for the synthesis of the [DH]-[N2]-[H3-JH] oligonucleotides was performed as follows. First, oligonucleotide leader sequences, containing a randomly chosen 10-nucleotide sequence (ATGCACAGTT; SEQ ID NO: 395), a BsrDI recognition site (GCAATG), and a two-base “overlap sequence” (TG, AC, AG, CT, or GA), were synthesized. The purpose of each of these segments is explained below. After synthesis of this 18-nucleotide sequence, the DH segments were synthesized: approximately 1 g of resin (with the 18-nucleotide segment still conjugated) was suspended in 20 mL of DCM/MeOH. About 60 μL of the resulting slurry (120 nmol) was distributed inside each of 278 oligonucleotide synthesis columns. These 278 columns were used to synthesize the 278 DH segments of Table 18, 3′ to the 18-nucleotide segment described above. After synthesis, the 278 DH segments were pooled as follows: the resin and frits were pushed out of the columns and collected inside a 20 mL syringe barrel (without plunger). Each column was then washed with 0.5 mL MeOH to remove any residual resin adsorbed to the walls of the column. The resin in the syringe barrel was washed three times with MeOH, using a low-porosity glass filter to retain the resin. The resin was then dried and weighed.
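As a rough consistency check on the quantities above, the Python sketch below estimates how much support-bound nucleoside one 60 μL aliquot delivers. It assumes a homogeneous suspension and treats the 20 mL of DCM/MeOH as the slurry volume, ignoring the volume of the resin itself. With about 1 g of resin at about 39 μmol/g, each aliquot carries about 3 mg of resin, or about 117 nmol, close to the 120 nmol stated, and the 278 DH columns together use roughly 0.83 g of the resin.

```python
def resin_mg_per_aliquot(resin_g: float, slurry_ml: float, aliquot_ul: float) -> float:
    """Resin mass (mg) in one aliquot of a homogeneous slurry."""
    return resin_g * 1000.0 * (aliquot_ul / 1000.0) / slurry_ml


def nmol_per_aliquot(resin_g: float, slurry_ml: float, aliquot_ul: float,
                     loading_umol_per_g: float) -> float:
    # mg of resin x (umol per g) loading = nmol of support-bound nucleoside
    return resin_mg_per_aliquot(resin_g, slurry_ml, aliquot_ul) * loading_umol_per_g


# Approximate values stated for the DH step.
per_column = nmol_per_aliquot(resin_g=1.0, slurry_ml=20.0, aliquot_ul=60.0,
                              loading_umol_per_g=39.0)
resin_for_dh_columns_g = 278 * resin_mg_per_aliquot(1.0, 20.0, 60.0) / 1000.0
print(f"~{per_column:.0f} nmol per column; ~{resin_for_dh_columns_g:.2f} g across 278 columns")
```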
- The pooled resin (about 1.36 g) containing the 278 DH segments was subsequently suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of two sets of 141 columns. The 141 N2 segments enumerated in Tables 24 and 25 were then synthesized, in duplicate (282 total columns), 3′ to the 278 DH segments synthesized in the first step. The resin from the 282 columns was then pooled, washed, and dried, as described above.
- The pooled resin obtained from the N2 synthesis (about 1.35 g) was suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of 280 columns, representing 28 H3-JH segments synthesized ten times each. A portion (described more fully below) of each of the 28 IGHJ segments, including the H3-JH segments of Table 20, was then synthesized 3′ to the N2 segments, each in ten of the columns. Final oligonucleotides were cleaved and deprotected by exposure to gaseous ammonia (85° C., 2 h, 60 psi).
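The combinatorial logic of the split pool procedure can be sketched in Python on an idealized, perfectly mixed pool: at each stage the pooled material is divided among one column per segment, each column appends its segment, and the columns are recombined. The toy segment names are hypothetical. Replicate columns (the N2 duplicates and the ten H3-JH repeats) add material, not diversity, so, assuming all segment sequences are distinct, the number of possible [DH]-[N2]-[H3-JH] combinations is bounded by 278 × 141 × 28. In a real pool, each resin bead carries a single combination, so representation of individual combinations is stochastic.

```python
from itertools import product
from math import prod


def split_pool(*stages):
    """Idealized split pool synthesis: split, extend each column, then re-pool."""
    pool = [""]
    for segments in stages:
        # Every column receives a share of the whole pool and appends its segment.
        pool = [prefix + segment for segment in segments for prefix in pool]
    return pool


# Toy example (hypothetical segments): the final pool equals the Cartesian product.
toy_stages = (["D1-", "D2-"], ["n1-", "n2-", "n3-"], ["J1"])
toy_pool = split_pool(*toy_stages)
print(sorted(toy_pool) == sorted("".join(p) for p in product(*toy_stages)))

# Distinct segments per stage in the synthesis described above.
DESIGN = {"DH": 278, "N2": 141, "H3-JH": 28}
print(f"{prod(DESIGN.values()):,} possible [DH]-[N2]-[H3-JH] combinations")
```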
- Split pool synthesis was used to synthesize the exemplary CDRH3 library. However, recent advances in oligonucleotide synthesis, which enable longer oligonucleotides to be synthesized at higher fidelity and allow the oligonucleotides of the library to be produced by procedures that involve splitting but not pooling, may be used in alternative embodiments of the invention. The split pool synthesis described herein is, therefore, one possible means of obtaining the oligonucleotides of the library, but is not limiting. Another possible means of synthesizing the oligonucleotides described in this application is the use of trinucleotides. This may be expected to increase the fidelity of the synthesis, since frame-shift mutants would be reduced or eliminated.
- This example outlines the procedures used to create exemplary CDRH3 and heavy chain libraries of the invention. A two-step process was used to create the CDRH3 library. The first step involved the assembly of a set of vectors encoding the tail and N1 segments, and the second step involved utilizing the split pool nucleic acid synthesis procedures outlined in Example 9 to create oligonucleotides encoding the DH, N2, and H3-JH segments. The chemically synthesized oligonucleotides were then ligated into the vectors to yield CDRH3 residues 95-102, based on the numbering system described herein. This CDRH3 library was subsequently amplified by PCR and recombined into a plurality of vectors containing the heavy chain chassis variants described in Examples 1 and 2. CDRH1 and CDRH2 variants were produced by QuikChange® Mutagenesis (Stratagene™), using the oligonucleotides encoding the ten heavy chain chassis of Example 1 as a template. In addition to the heavy chain chassis, the plurality of vectors contained the heavy chain constant regions (i.e., CH1, CH2, and CH3) from IgG1, so that a full-length heavy chain was formed upon recombination of the CDRH3 with the vector containing the heavy chain chassis and constant regions. In this exemplary embodiment, the recombination to produce the full-length heavy chains and the expression of the full-length heavy chains were both performed in S. cerevisiae.
- To generate full-length, heterodimeric IgGs, comprising a heavy chain and a light chain, a light chain protein was also expressed in the yeast cell. The light chain library used in this embodiment was the kappa light chain library, wherein the VKCDR3s were synthesized using degenerate oligonucleotides (see Example 6.2). Due to the shorter length of the oligonucleotides encoding the light chain library (in comparison to those encoding the heavy chain library), the light chain CDR3 oligonucleotides could be synthesized de novo, using standard procedures for oligonucleotide synthesis, without the need for assembly from sub-components (as in the heavy chain CDR3 synthesis). One or more light chains can be expressed in each yeast cell which expresses a particular heavy chain clone from a library of the invention. One or more light chains have been successfully expressed both from episomal (e.g., plasmid) vectors and from integrated sites in the yeast genome.
- Below are provided further details on the assembly of the individual components for the synthesis of a CDRH3 library of the invention, and the subsequent combination of the exemplary CDRH3 library with the vectors containing the chassis and constant regions. In this particular exemplary embodiment of the invention, the steps involved in the process may be generally characterized as (i) synthesis of 424 vectors encoding the tail and N1 regions; (ii) ligation of oligonucleotides encoding the [DH]-[N2]-[H3-JH]segments into these 424 vectors; (iii) PCR amplification of the CDRH3 sequences from the vectors produced in these ligations; and (iv) homologous recombination of these PCR-amplified CDRH3 domains into the yeast expression vectors containing the chassis and constant regions.
- This example demonstrates the synthesis of 424 vectors encoding the tail and N1 regions of CDRH3. In this exemplary embodiment of the invention, the tail was restricted to G, D, E, or nothing, and the N1 region was restricted to one of the 59 sequences shown in Table 24. As described throughout the specification, many other embodiments are possible.
- In the first step of the process, a single “base vector” (pJM204, a pUC-derived cloning vector) was constructed, which contained (i) a nucleic acid sequence encoding two amino acids that are common to the C-terminal portion of all 28 IGHJ segments (SS), and (ii) a nucleic acid sequence encoding a portion of the CH1 constant region from IgG1. Thus, the base vector contains an insert encoding a sequence that can be depicted as:
-
[SS]-[CH1˜], - wherein SS is a common portion of the C-terminus of the 28 IGHJ segments and CH1˜ is a portion of the CH1 constant region from IgG1, namely:
-
(SEQ ID NO: 396) ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLG. - Next, 424 different oligonucleotides were cloned into the base vector, upstream (i.e., 5′) of the region encoding the [SS]-[CH1˜]. These 424 oligonucleotides were synthesized by standard methods, and each encoded a C-terminal portion of one of the 17 heavy chain chassis enumerated in Table 5, plus one of four exemplary tail segments (G/D/E/-) and one of 59 exemplary N1 segments (Table 24). These 424 oligonucleotides, therefore, encode a plurality of sequences that may be represented by:
-
[˜FRM3]-[G/D/E/-]-[N1], - wherein ˜FRM3 represents a C-terminal portion of a FRM3 region from one of the 17 heavy chain chassis of Table 5, G/D/E/- represents G, D, E, or nothing, and N1 represents one of the 59 N1 sequences enumerated in Table 24. As described throughout the specification, the invention is not limited to the chassis exemplified in Table 5, their CDRH1 and CDRH2 variants (Table 8), the four exemplary tail options used in this example, or the 59 N1 segments presented in Table 24.
- The oligonucleotide sequences represented above were synthesized in two groups: one group containing a ˜FRM3 region identical to the corresponding region on 16 of the 17 heavy chain chassis enumerated in Table 5, and another group containing a ˜FRM3 region identical to the corresponding region on VH3-23. In the former group, an oligonucleotide encoding DTAVYYCAR (SEQ ID NO: 397) was used for ˜FRM3. During subsequent PCR amplification, the V residue of VH5-51 was altered to an M, to correspond to the VH5-51 germline sequence. In the latter group (that with a sequence common to VH3-23), a larger oligonucleotide, encoding the sequence AISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK (SEQ ID NO: 398), was used for ˜FRM3. Each of the two oligonucleotides encoding the ˜FRM3 regions was paired with oligonucleotides encoding one of the four tail regions (G/D/E/-) and one of the 59 N1 segments, yielding a total of 236 possible combinations for each ˜FRM3 (i.e., 1×4×59), or a total of 472 possible combinations when both ˜FRM3 sequences are considered. However, 48 of these combinations are redundant, and only a single representation of these sequences was used in the currently exemplified CDRH3 library, yielding 424 unique oligonucleotides encoding [˜FRM3]-[G/D/E/-]-[N1] sequences.
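The text does not spell out which 48 combinations are redundant. One plausible mechanism, offered here only as an assumption, is that a tail residue followed by one N1 segment can encode the same peptide as no tail followed by a different N1 segment (for example, tail G with an empty N1 versus no tail with N1 = G). The Python sketch below enumerates [˜FRM3]-[tail]-[N1] combinations and keeps one recipe per distinct encoded peptide. The two ˜FRM3 sequences are those given above; the short N1 list is a hypothetical stand-in for the 59 segments of Table 24, so the counts it produces are illustrative only.

```python
from itertools import product


def unique_inserts(frm3_variants, tails, n1_segments):
    """Map each distinct encoded [~FRM3]-[tail]-[N1] peptide to the first recipe producing it."""
    inserts = {}
    for frm3, tail, n1 in product(frm3_variants, tails, n1_segments):
        inserts.setdefault(frm3 + tail + n1, (frm3, tail, n1))
    return inserts


FRM3 = ["DTAVYYCAR", "AISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK"]
TAILS = ["G", "D", "E", ""]     # "" is the no-tail option
TOY_N1 = ["", "G", "R", "GR"]   # hypothetical N1 segments, not Table 24

inserts = unique_inserts(FRM3, TAILS, TOY_N1)
total = len(FRM3) * len(TAILS) * len(TOY_N1)
print(f"{total} combinations -> {len(inserts)} unique encoded peptides")
```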
- After the oligonucleotides encoding the [˜FRM3]-[G/D/E/-]-[N1] and [SS]-[CH1˜] segments were cloned into the vector, as described above, additional sequences were added to the vector to facilitate the subsequent insertion of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] fragments synthesized during the split pool synthesis. These additional sequences comprise a polynucleotide encoding a selectable marker protein, flanked on each side by a recognition site for a type II restriction enzyme, for example:
-
[Type II RS 1]-[selectable marker protein]-[Type II RS 2]. - In this exemplary embodiment, the selectable marker protein is ccdB and the type II restriction enzyme recognition sites are specific for BsrDI and BbsI. In certain strains of E. coli, the ccdB protein is toxic, thereby preventing the growth of these bacteria when the gene is present.
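BsrDI (GCAATG) and BbsI (GAAGAC) are type IIS enzymes with non-palindromic recognition sequences, so a site can appear on either strand. The Python sketch below scans a DNA string for both motifs and their reverse complements. The function names are illustrative, and the two usage fragments are short excerpts copied from the example vector sequences shown in this example; in the first, the BsrDI site reads CATTGC on the top strand, meaning it lies on the bottom strand.

```python
# Recognition sequences (5'->3') of the two enzymes named above.
SITES = {"BsrDI": "GCAATG", "BbsI": "GAAGAC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(dna: str, sites: dict = SITES) -> list:
    """Return (enzyme, strand, 0-based top-strand start) for every recognition site."""
    dna = dna.upper()
    hits = []
    for enzyme, motif in sites.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = dna.find(pattern)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = dna.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Excerpts from the example vectors: upstream of ccdB, and between ccdB and CH1.
print(find_sites("TGCGCCAAGGACCATTGCGCTTAGCC"))
print(find_sites("GTCATCAGAAGACAACTCAGCTAGCACC"))
```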
- An example of the 5′ end of one of the 212 vectors with a ˜FRM3 region based on the VH3-23 chassis, D tail residue and an N1 segment of length zero is presented below (SEQ ID NO: 570):
-
VH3-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A I S G S G G S T Y • 961 GCTATTAG TGGTAGTGGT GGTAGCACAT CGATAATC ACCATCACCA CCATCGTGTA VH3-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ • Y A D S V K G R F T I S R D N S K N T L Y L Q M N S 1041 ACTACGCAGA CTCCGTGAAG GGCCGGTTCA CCATCTCCAG AGACAATTCC AAGAACACGC TGTATCTGCA AATGAACAGC TGATGCGTCT GAGGCACTTC CCGGCCAAGT GGTAGAGGTC TCTGTTAAGG TTCTTGTGCG ACATAGACGT TTACTTGTCG VH3-23 ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BsrDI ~~~~~~ L R A E D T A V Y Y C A K 1121 CTGAGAGCCG AGGACACGGC GGTGTACTAC TGCGCCAAGG ACCATTGCGC TTAGCCTAGG TTATATTCCC CAGAACATCA GACTCTCGGC TCCTGTGCCG CCACATGATG ACGCGGTTCC TGGTAACGCG AATCGGATCC AATATAAGGG GTCTTGTAGT - An example of one of the 212 vectors with a ˜FRM3 region based on one of the other 16 chassis, with a D residue as the tail and an N1 segment of length zero is presented below (SEQ ID NO: 571):
Framework 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ D T A V Y Y C A R 961 GACACGGCG GTGTACTACT GCGCCAGAGA CTGTGCCGC CACATGATGA CGCGGTCTCT ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BsrDI ~~~~~~ 1041 CCATTGCGCT TAGCCTAGGT TATATTCCCC AGAACATCAG GTTAATGGCG TTTTTGATGT CATTTTCGCG GTGGCTGAGA GGTAACGCGA ATCGGATCCA ATATAAGGGG TCTTGTAGTC CAATTACCGC AAAAACTACA GTAAAAGCGC CACCGACTCT ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1121 TCAGCCACTT CTTCCCCGAT AACGGAAACC GGCACACTGG CCATATCGGT GGTCATCATG CGCCAGCTTT CATCCCCGAT AGTCGGTGAA GAAGGGGCTA TTGCCTTTGG CCGTGTGACC GGTATAGCCA CCAGTAGTAC GCGGTCGAAA GTAGGGGCTA ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1201 ATGCACCACC GGGTAAAGTT CACGGGAGAC TTTATCTGAC AGCAGACGTG CACTGGCCAG GGGGATCACC ATCCGTCGCC TACGTGGTGG CCCATTTCAA GTGCCCTCTG AAATAGACTG TCGTCTGCAC GTGACCGGTC CCCCTAGTGG TAGGCAGCGG ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1281 CGGGCGTGTC AATAATATCA CTCTGTACAT CCACAAACAG ACGATAACGG CTCTCTCTTT TATAGGTGTA AACCTTAAAC GCCCGCACAG TTATTATAGT GAGACATGTA GGTGTTTGTC TGCTATTGCC GAGAGAGAAA ATATCCACAT TTGGAATTTG ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1361 TGCATTTCAC CAGCCCCTGT TCTCGTCAGC AAAAGAGCCG TTCATTTCAA TAAACCGGGC GACCTCAGCC ATCCCTTCCT ACGTAAAGTG GTCGGGGACA AGAGCAGTCG TTTTCTCGGC AAGTAAAGTT ATTTGGCCCG CTGGAGTCGG TAGGGAAGGA ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1441 GATTTTCCGC TTTCCAGCGT TCGGCACGCA GACGACGGGC TTCATTCTGC ATGGTTGTGC TTACCAGACC GGAGATATTG CTAAAAGGCG AAAGGTCGCA AGCCGTGCGT CTGCTGCCCG AAGTAAGACG TACCAACACG AATGGTCTGG CCTCTATAAC ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1521 ACATCATATA TGCCTTGAGC AACTGATAGC TGTCGCTGTC AACTGTCACT GTAATACGCT GCTTCATAGC ATACCTCTTT TGTAGTATAT ACGGAACTCG 
TTGACTATCG ACAGCGACAG TTGACAGTGA CATTATGCGA CGAAGTATCG TATGGAGAAA ccdB ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1601 TTGACATACT TCGGGTATAC ATATCAGTAT ATATTCTTAT ACCGCAAAAA TCAGCGCGCA AATATGCATA CTGTTATCTG AACTGTATGA AGCCCATATG TATAGTCATA TATAAGAATA TGGCGTTTTT AGTCGCGCGT TTATACGTAT GACAATAGAC ccdB CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BbsI ~~~~~~~ A S T K G P S V F P L A P S • 1681 GCTTTTAGTA AGCCGCCTAG GTCATCAGAA GACAACTCAG CTAGCACCAA GGGCCCATCG GTCTTTCCCC TGGCACCCTC CGAAAATCAT TCGGCGGATC CAGTAGTCTT CTGTTGAGTC GATCGTGGTT CCCGGGTAGC CAGAAAGGGG ACCGTGGGAG CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ • S K S T S G G T A A L G C L V K D Y F P E P V T V S W • 1761 CTCCAAGAGC ACCTCTGGGG GCACAGCGGC CCTGGGCTGC CTGGTCAAGG ACTACTTCCC CGAACCGGTG ACGGTGTCGT GAGGTTCTCG TGGAGACCCC CGTGTCGCCG GGACCCGACG GACCAGTTCC TGATGAAGGG GCTTGGCCAC TGCCACAGCA CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ • N S G A L T S G V H T F P A V L Q S S G L 1841 GGAACTCAGG CGCCCTGACC AGCGGCGTGC ACACCTTCCC GGCTGTCCTA CAGTCCTCAG GACTC CCTTGAGTCC GCGGGACTGG TCGCCGCACG TGTGGAAGGG CCGACAGGAT GTCAGGAGTC CTGAG - All 424 vectors were sequence verified. A schematic diagram of the content of the 424 vectors, before and after cloning of the [DH]-[N2]-[H3-JH] fragment is presented in
FIG. 5. Below is an exemplary sequence from one of the 424 vectors containing a FRM3 region from VH3-23 (SEQ ID NO: 572):
primer EMK135 ~~~~~~~~~~~~~~~~~~~~~~~~ VH3-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A I S G S G G S T Y Y A D S V K G R F 561 GCTATTA GTGGTAGTGG TGGTAGCACA TACTACGCAG ACTCCGTGAA GGGCCGGTTC CGATAAT CACCATCACC ACCATCGTGT ATGATGCGTC TGAGGCACTT CCCGGCCAAG VH3-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ T I S R D N S K N T L Y L Q M N S L R A E D T A V Y Y • 641 ACCATCTCCA GAGACAATTC CAAGAACACG CTGTATCTGC AAATGAACAG CCTGAGAGCC GAGGACACGG CGGTGTACTA TGGTAGAGGT CTCTGTTAAG GTTCTTGTGC GACATAGACG TTTACTTGTC GGACTCTCGG CTCCTGTGCC GCCACATGAT VH3-23 D J1 ~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ JH6 ~~~~~~~~~~~~~~~~~~~~ N1_9 N2 ~~~~~~~~~~~~~ ~~~~~~~~~~ • C A K D A G G Y Y Y G S G S Y Y N A A A Y Y Y Y Y G M • 721 CTGCGCCAAG GACGCCGGAG GATATTATTA TGGGTCAGGA AGCTATTACA ACGCTGCGGC TTACTACTAC TATTATGGCA GACGCGGTTC CTGCGGCCTC CTATAATAAT ACCCAGTCCT TCGATAATGT TGCGACGCCG AATGATGATG ATAATACCGT JH6 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ J1 CH1 ~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NheI ~~~~~~ • D V W G Q G T T V T V S S A S T K G P S V F P L A P 801 TGGACGTGTG GGGACAAGGT ACAACAGTCA CCGTCTCCTC AGCTAGCACC AAGGGCCCAT CGGTCTTTCC CCTGGCACCC ACCTGCACAC CCCTGTTCCA TGTTGTCAGT GGCAGAGGAG TCGATCGTGG TTCCCGGGTA GCCAGAAAGG GGACCGTGGG CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ S S K S T S G G T A A L G C L V K D Y F P E P V T V S • 881 TCCTCCAAGA GCACCTCTGG GGGCACAGCG GCCCTGGGCT GCCTGGTCAA GGACTACTTC CCCGAACCGG TGACGGTGTC AGGAGGTTCT CGTGGAGACC CCCGTGTCGC CGGGACCCGA CGGACCAGTT CCTGATGAAG GGGCTTGGCC ACTGCCACAG EK137 CH1 Primer ~~~~~~~~~~~~~~~~~~~~ CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ • W N S G A L T S G V H T F P A V L Q S S G L Y S L S S • 961 GTGGAACTCA GGCGCCCTGA CCAGCGGCGT GCACACCTTC CCGGCTGTCC TACAGTCCTC AGGACTCTAC TCCCTCAGCA CACCTTGAGT CCGCGGGACT GGTCGCCGCA CGTGTGGAAG 
GGCCGACAGG ATGTCAGGAG TCCTGAGATG AGGGAGTCGT CH1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ • V V T V P S S S L G 1041 GCGTGGTGAC CGTGCCCTCC AGCAGCTTGG GC CGCACCACTG GCACGGGAGG TCGTCGAACC CG
- This example describes the cloning of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] segments (made via split pool synthesis; Example 9) into the 424 vectors produced in Example 10.1. In summary, the [DH]-[N2]-[H3-JH] oligonucleotides produced via split pool synthesis were amplified by PCR for three purposes: to make them double-stranded, to introduce restriction sites (BsrDI and BbsI) that create overhangs complementary to those on the vectors, and to complete the 3′ portion of the IGHJ segments that was not synthesized in the split pool synthesis. The amplified oligonucleotides were then digested with the restriction enzymes BsrDI (which cleaves adjacent to the DH segment) and BbsI (which cleaves near the end of the JH segment). The cleaved oligonucleotides were purified and ligated into the 424 vectors, which had previously been digested with BsrDI and BbsI. After ligation, the reactions were purified, ethanol precipitated, and resolubilized.
- This process is illustrated below for one of the [DH]-[N2]-[H3-JH] oligonucleotides synthesized in the split pool synthesis. The following oligonucleotide (SEQ ID NO: 399) is one of the oligonucleotides synthesized during the split pool synthesis:
1 ATGCACAGTTGCAATG TG TATTACTATGGATCTGGTTCTTACTATAAT GT 50
51 GGGCGGA TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC 99
- The first 10 nucleotides (ATGCACAGTT; SEQ ID NO: 395) are part of a random sequence that is extended to 20 nucleotides in the PCR amplification step described below. This random sequence increases the efficiency of BsrDI digestion and facilitates downstream purification of the oligonucleotides.
- Nucleotides 11-16 (underlined) represent the BsrDI recognition site. The two base overlap sequence that follows this site (in this example TG; bold) was synthesized to be complementary to the two base overhang created by digesting certain of the 424 vectors with BsrDI (i.e., depending on the composition of the tail/N1 region of the particular vector). Other oligonucleotides contain different two-base overhangs, as described below.
- The two-base overlap is followed by the DH gene segment (nucleotides 19-48). In this example, the DH segment is a 30 bp sequence (TATTACTATGGATCTGGTTCTTACTATAAT, SEQ ID NO: 400) that encodes the ten-residue DH segment YYYGSGSYYN (i.e., IGHD3-10_2 of Table 17; SEQ ID NO: 2).
- The region of the oligonucleotide encoding the DH segment is followed, in this example, by a nine base region (GTGGGCGGA; bold; nucleotides 49-57), encoding the N2 segment (in this case VGG; Table 24).
- The remainder of this exemplary oligonucleotide represents the portion of the JH segment that is synthesized during the split pool synthesis (TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC; SEQ ID NO: 401; nucleotides 58-99; underlined), encoding the sequence YYYYYGMDVWGQGT (Table 20; residues 1-14 of SEQ ID NO: 258). The balance of the IGHJ segment is added during the subsequent PCR amplification described below.
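The nucleotide layout just described can be checked mechanically. The sketch below is an illustrative aid, not part of the described protocol. It slices SEQ ID NO: 399 at the stated 1-based positions and translates the coding segments with the standard genetic code. The names LAYOUT and split_oligo are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide string; '*' marks a stop codon."""
    if len(nt) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


# SEQ ID NO: 399 as printed above (spaces removed).
OLIGO_399 = ("ATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTTACTATAATGT"
             "GGGCGGATATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC")

# 1-based, inclusive nucleotide ranges taken from the description.
LAYOUT = {
    "random_5prime": (1, 10),
    "BsrDI_site": (11, 16),
    "overlap": (17, 18),
    "DH": (19, 48),
    "N2": (49, 57),
    "H3_JH_partial": (58, 99),
}


def split_oligo(oligo: str, layout: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Slice the oligonucleotide into its named segments."""
    return {name: oligo[start - 1:end] for name, (start, end) in layout.items()}


parts = split_oligo(OLIGO_399, LAYOUT)
for name in ("DH", "N2", "H3_JH_partial"):
    print(name, translate(parts[name]))
# DH YYYGSGSYYN
# N2 VGG
# H3_JH_partial YYYYYGMDVWGQGT
```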
- After the split pool-synthesized oligonucleotides were cleaved from the resin and deprotected, they served as templates for a PCR reaction. This reaction added 10 additional, randomly chosen nucleotides (e.g., GACGAGCTTC; SEQ ID NO: 402) to the 5′ end, and the rest of the IGHJ segment plus the BbsI restriction site to the 3′ end. These additions facilitate the cloning of the [DH]-[N2]-[H3-JH] oligonucleotides into the 424 vectors. As described above (Example 9), the last round of the split pool synthesis involves 280 columns: 10 columns for each of the oligonucleotides encoding one of 28 H3-JH segments. The oligonucleotide products obtained from these 280 columns are pooled according to the identity of their H3-JH segments, for a total of 28 pools. Each of these 28 pools is then amplified in five separate PCR reactions. These use five forward primers, each encoding a different two-base overlap (preceding the DH segment; see above), and one reverse primer whose sequence corresponds to the familial origin of the H3-JH segment being amplified. The sequences of these 11 primers (five forward and six reverse) are provided below:
Forward primers
AC (SEQ ID NO: 403) GACGAGCTTCAATGCACAGTTGCAATGAC
AG (SEQ ID NO: 404) GACGAGCTTCAATGCACAGTTGCAATGAG
CT (SEQ ID NO: 405) GACGAGCTTCAATGCACAGTTGCAATGCT
GA (SEQ ID NO: 406) GACGAGCTTCAATGCACAGTTGCAATGGA
TG (SEQ ID NO: 407) GACGAGCTTCAATGCACAGTTGCAATGTG
Reverse Primers
JH1 (SEQ ID NO: 408) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTGCCCTGGCCCCA
JH2 (SEQ ID NO: 409) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACAGTGACCAAGGTGCCACGGCCCCA
JH3 (SEQ ID NO: 410) TGCATCAGTGCGACTAACGGAAGACTCTGAAGAGACGGTGACCATTGTCCCTTGGCCCCA
JH4 (SEQ ID NO: 411) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCTTGGCCCCA
JH5 (SEQ ID NO: 412) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCCTGGCCCCA
JH6 (SEQ ID NO: 413) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA
- Amplifications were performed using Taq polymerase under standard conditions. The oligonucleotides were amplified for eight cycles, to maintain the representation of sequences of different lengths. Strands were melted at 95° C. for 30 seconds, annealed at 58° C., and extended at 72° C. for 15 seconds.
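The 11 primers listed above follow a simple pattern, which the sketch below makes explicit. This is an illustration, not part of the original protocol.

- Each forward primer is a shared 27-nucleotide stem ending in the BsrDI site GCAATG, followed by one of the five two-base overlaps.
- The six reverse primers share a 5′ tail that carries the BbsI site GAAGAC.

The names STEM, FORWARD_LISTED, and REVERSE are introduced here for illustration. The final reaction count simply restates the scheme described above: 28 pools, each amplified with all five forward primers.

```python
import os

# Shared stem of the forward primers (SEQ ID NOs: 403-407); it ends in the BsrDI site GCAATG.
STEM = "GACGAGCTTCAATGCACAGTTGCAATG"
OVERLAPS = ("AC", "AG", "CT", "GA", "TG")

FORWARD_LISTED = {
    "AC": "GACGAGCTTCAATGCACAGTTGCAATGAC",
    "AG": "GACGAGCTTCAATGCACAGTTGCAATGAG",
    "CT": "GACGAGCTTCAATGCACAGTTGCAATGCT",
    "GA": "GACGAGCTTCAATGCACAGTTGCAATGGA",
    "TG": "GACGAGCTTCAATGCACAGTTGCAATGTG",
}

# Reverse primers (SEQ ID NOs: 408-413), one per JH family.
REVERSE = {
    "JH1": "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTGCCCTGGCCCCA",
    "JH2": "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACAGTGACCAAGGTGCCACGGCCCCA",
    "JH3": "TGCATCAGTGCGACTAACGGAAGACTCTGAAGAGACGGTGACCATTGTCCCTTGGCCCCA",
    "JH4": "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCTTGGCCCCA",
    "JH5": "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCCTGGCCCCA",
    "JH6": "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA",
}


def forward_primer(overlap: str) -> str:
    """Stem plus two-base overlap; the overlap sits immediately 3' of GCAATG."""
    return STEM + overlap


for ov in OVERLAPS:
    primer = forward_primer(ov)
    assert primer == FORWARD_LISTED[ov], ov
    assert primer[-8:-2] == "GCAATG"  # BsrDI site directly precedes the overlap

# Longest 5' sequence shared by all six reverse primers.
reverse_tail = os.path.commonprefix(list(REVERSE.values()))
print(reverse_tail, "GAAGAC" in reverse_tail)

# 28 H3-JH pools, each amplified once with every forward primer.
n_reactions = 28 * len(OVERLAPS)
print(n_reactions)  # 140
```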
- Using the exemplary split-pool derived oligonucleotide enumerated above as an example, the PCR amplification was performed using the TG primer and the JH6 primer, where the annealing portion of the primers has been underlined:
TG (SEQ ID NO: 407) GACGAGCTTCAATGCACAGTTGCAATGTG
JH6 (SEQ ID NO: 413) TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA
The portion of the TG primer that is 5′ to the annealing portion includes the 10 random nucleotides described above. The portion of the JH6 primer that is 5′ to the annealing portion includes the balance of the JH6 segment and the BbsI restriction site. The following PCR product (SEQ ID NO: 414) is formed in the reaction (added sequences underlined):
GACGAGCTTCATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTT ACTATAATGTGGGCGGATATTATTACTACTATGGTATGGACGTATGGGGG CAAGGGACCACGGTCACCGTCTCCTCAGAGTCTTCCGTTAGTCGCACTGA TGCAG
- The PCR products from each reaction were then combined into five pools according to the forward primer used. Each pool therefore contained sequences yielding the same two-base overhang after BsrDI digestion. The five pools of PCR products were then digested with BsrDI and BbsI (100 μg of PCR product; 1 mL reaction volume; 200 U BbsI; 100 U BsrDI; 2 h; 37° C.; NEB Buffer 2). The digested oligonucleotides were extracted twice with phenol/chloroform, ethanol precipitated, air-dried briefly, and resolubilized in 300 μL of TE buffer overnight at 4° C.
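The PCR product (SEQ ID NO: 414) shown above can be related back to its template and reverse primer with a few string operations. The sketch below is illustrative only and makes three checks:

- It locates the split-pool template (SEQ ID NO: 399) inside the amplicon by search rather than assuming a fixed offset.
- It translates the region from the first DH base through the end of IGHJ6.
- It confirms that the 3′ end of the product is the reverse complement of the JH6 reverse primer.

The 18-nucleotide IGHJ6 completion length is derived here from the printed sequences and is not stated in the text.

```python
from itertools import product

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt: str) -> str:
    """Translate from the first base, ignoring any incomplete trailing codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


TEMPLATE_399 = ("ATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTTACTATAATGT"
                "GGGCGGATATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC")
PRODUCT_414 = ("GACGAGCTTCATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTT"
               "ACTATAATGTGGGCGGATATTATTACTACTATGGTATGGACGTATGGGGG"
               "CAAGGGACCACGGTCACCGTCTCCTCAGAGTCTTCCGTTAGTCGCACTGA"
               "TGCAG")
JH6_REVERSE = "TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA"
JH6_COMPLETION_NT = 18  # coding bases (TVTVSS) added 3' of the template; derived here

offset = PRODUCT_414.find(TEMPLATE_399)           # template position in the amplicon
dh_start = offset + 18                             # template nucleotide 19 = first DH base
jh_end = offset + len(TEMPLATE_399) + JH6_COMPLETION_NT
coding = PRODUCT_414[dh_start:jh_end]

print(offset)               # 10
print(translate(coding))    # YYYGSGSYYNVGGYYYYYGMDVWGQGTTVTVSS  (DH + N2 + full JH6)
print(revcomp(JH6_REVERSE) in PRODUCT_414)   # 3' end matches the JH6 primer
print(PRODUCT_414.find("GTCTTC"))            # BbsI site (GAAGAC) on the bottom strand
```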
- Each of the 424 vectors described in the preceding sections was then digested with BsrDI and BbsI. Each vector yielded a two-base overhang complementary to the overhang carried by one of the five pools of PCR products. Thus, one of the five pools of restriction-digested PCR products was ligated into each of the 424 vectors, according to their compatible ends, for a total of 424 ligations.
- This example describes the PCR amplification of the CDRH3 regions from the 424 vectors described above. As set forth above, the 424 vectors represent two sets: one for the VH3-23 family, with FRM3 ending in CAK (212 vectors), and one for the other 16 chassis, with FRM3 ending in CAR (212 vectors). The CDRH3s in the VH3-23-based vectors were amplified using a reverse primer (EK137; see Table 41) recognizing a portion of the CH1 region of the plasmid and the VH3-23-specific primer EK135 (see Table 41). Amplification of the CDRH3s from the 212 vectors with FRM3 ending in CAR was performed using the same reverse primer (EK137) and each of five FRM3-specific primers shown in Table 41 (EK139, EK140, EK141, EK143, and EK144). Therefore, 212 VH3-23 amplifications and 212×5 FRM3 PCR reactions were performed, for a total of 1,272 reactions. An additional PCR reaction amplified the CDRH3s from the 212 VH3-23-based vectors using the EK133 forward primer. This allowed the amplicons to be cloned into the other five VH3 family member chassis while making the last three amino acids of these chassis CAK instead of the original CAR (VH3-23*). The primers used in each reaction are shown in Table 41.
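The FRM3-specific forward primers in Table 41 can be checked for consistency. In the table each primer is printed across two lines; the sketch joins them. Translating each primer from its first base gives framework 3 residues ending in DTAVYY (DTAMYY for the VH5-51 primer EK143). The 3′ end of each primer therefore reaches the first two bases of the cysteine codon of the CAR/CAK motif. The sketch also restates the reaction count given above. The dictionary and function names are illustrative.

```python
from itertools import product

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate_frame0(nt: str) -> str:
    """Translate from the first base, dropping any incomplete trailing codon."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


# FRM3-specific forward primers from Table 41 (the two printed halves joined).
FRM3_PRIMERS = {
    "EK133": "CAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTG",
    "EK139": "AAGCTGAGTTCTGTGACCGCCGCAGACACGGCGGTGTACTACTG",
    "EK140": "GAGCTGAGCAGCCTGAGATCTGAGGACACGGCGGTGTACTACTG",
    "EK141": "GAGCTGAGCAGGCTGAGATCTGACGACACGGCGGTGTACTACTG",
    "EK143": "CAGTGGAGCAGCCTGAAGGCCTCGGACACGGCGATGTACTACTG",
    "EK144": "GAGCTGAGGAGCCTGAGATCTGACGACACGGCGGTGTACTACTG",
}

encoded = {name: translate_frame0(seq) for name, seq in FRM3_PRIMERS.items()}
for name, peptide in encoded.items():
    print(name, peptide)

# Reaction count stated in the text: one EK135 reaction per VH3-23-based vector,
# plus five FRM3-primer reactions per CAR-ending vector.
vh3_23_vectors = car_vectors = 212
total_reactions = vh3_23_vectors + 5 * car_vectors
print(total_reactions)  # 1272
```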
TABLE 41 Primers Used for Amplification of CDRH3 Sequences
Primer No. | Compatible Chassis | Primer Sequence | SEQ ID NO
EK135 | VH3-23 | CACATACTACGCAGACTCCGTG | 415
EK133 | VH3-48; VH3-7; VH3-15; VH3-30; VH3-33; VH3-23* | CAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTG | 416
EK139 | VH4-B; VH4-31; VH4-34; VH4-39; VH4-59; VH4-61 | AAGCTGAGTTCTGTGACCGCCGCAGACACGGCGGTGTACTACTG | 417
EK140 | VH1-46; VH1-69 | GAGCTGAGCAGCCTGAGATCTGAGGACACGGCGGTGTACTACTG | 418
EK141 | VH1-2 | GAGCTGAGCAGGCTGAGATCTGACGACACGGCGGTGTACTACTG | 419
EK143 | VH5-51 | CAGTGGAGCAGCCTGAAGGCCTCGGACACGGCGATGTACTACTG | 420
EK144 | VH1-18 | GAGCTGAGGAGCCTGAGATCTGACGACACGGCGGTGTACTACTG | 421
EK137 | CH1 Rev. Primer | GTAGGACAGCCGGGAAGG | 422
- After amplification, reaction products were pooled according to the VH chassis into which they would ultimately be cloned. Table 42 enumerates these pools; the last two columns give the PCR primers used to obtain the CDRH3 sequences in each pool.
TABLE 42 PCR Primers Used to Amplify CDRH3 Regions from 424 Vectors
Pool # (Arbitrary) | HC Chassis Target | 5′ Primer | 3′ Primer
1 | 1-46 | EK140 | EK137
1 | 1-69 | EK140 | EK137
2 | 1-2 | EK141 | EK137
3 | 1-18 | EK144 | EK137
4 | 4-B | EK139 | EK137
4 | 4-31 | EK139 | EK137
4 | 4-34² | EK139 | EK137
4 | 4-39 | EK139 | EK137
4 | 4-59 | EK139 | EK137
4 | 4-61 | EK139 | EK137
5 | 5-51 | EK143 | EK137
6 | 3-15¹ | EK133 | EK137
6 | 3-7 | EK133 | EK137
6 | 3-33 | EK133 | EK137
6 | 3-33 | EK133 | EK137
6 | 3-48 | EK133 | EK137
7 | 3-23 | EMK135 | EK137
8 | 3-23* | EK133 | EK137
* Allowed the amplicons to be cloned into the other 5 VH3 family member chassis (i.e., other than VH3-23), while making the last three amino acids of these chassis CAK instead of the original CAR.
¹ As described in Table 5, the original KT sequence in VH3-15 was mutated to RA, and the original TT to AR.
² As described in Table 5, the potential site for N-linked glycosylation was removed from CDRH2 of this chassis.
- After the amplified CDRH3 regions were pooled as outlined above, the heavy chain chassis expression vectors were pooled according to their origin and cut, to create a “gap” for homologous recombination with the amplified CDRH3s.
FIG. 6 shows a schematic structure of a heavy chain vector prior to recombination with a CDRH3. In this exemplary embodiment of the invention, there were a total of 152 vectors encoding heavy chain chassis and IgG1 constant regions, but no CDRH3. These 152 vectors represent 17 individual variable heavy chain gene families (Table 5; Examples 1 and 2). Fifteen of the families were represented by the heavy chain chassis sequences described in Table 5 and the CDRH1/H2 variants described in Table 8 (i.e., 150 vectors). VH3-30 differs from VH3-33 by a single amino acid; thus, VH3-30 was included in the VH3-33 pool of variants. The VH4-34 family member was kept separate from all others and, in this exemplary embodiment, no variants of it were included in the library. Thus, a total of 16 pools, representing 17 heavy chain chassis, were generated from the 152 vectors.
- The vector pools were digested with the restriction enzyme SfiI, which cuts at two sites in the vector located between the end of FRM3 of the variable domain and the start of CH1 (SEQ ID NO: 573).
VH3-48 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ S V K G R F T I S R D N A K N S L Y L Q M N S L R A E • 2801 CTCTGTGAAG GGCCGATTCA CCATCTCCAG AGACAATGCC AAGAACTCAC TGTATCTGCA AATGAACAGC CTGAGAGCTG GAGACACTTC CCGGCTAAGT GGTAGAGGTC TCTGTTACGG TTCTTGAGTG ACATAGACGT TTACTTGTCG GACTCTCGAC Constant DTAVYYCAR ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ VHS-48 VTVSS common to all J ~~ ~~~~~ SfiI SfiI ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ • D T A V Y Y C A R V T • 2881 AGGACACGGC GGTGTACTAC TGCGCCAGAG GCCAATAGGG CCAACTATAA CAGGGGTACC CCGGCCAATA AGGCCGTCAC TCCTGTGCCG CCACATGATG ACGCGGTCTC CGGTTATCCC GGTTGATATT GTCCCCATGG GGCCGGTTAT TCCGGCAGTG VTVSS common to all J ~~~~~~~~~~~ hIgGlm17,1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NheI ~~~~~~ • V S S A S T K G P S V F P L A P S S K S T S G G T A 2961 CGTCTCCTCA GCTAGCACCA AGGGCCCATC GGTCTTCCCC CTGGCACCCT CCTCCAAGAG CACCTCTGGG GGCACAGCGG GCAGAGGAGT CGATCGTGGT TCCCGGGTAG CCAGAAGGGG GACCGTGGGA GGAGGTTCTC GTGGAGACCC CCGTGTCGCC - The gapped vector pools were then mixed with the appropriate (i.e., compatible) pool of CDRH3 amplicons, generated as described above, at a 50:1 insert to vector ratio. The mixture was then transformed into electrocompetent yeast (S. cerevisiae), which already contained plasmids or integrated genes comprising a VK light chain library (described below). The degree of library diversity was determined by plating a dilution of the electroporated cells on a selectable agar plate. In this exemplified embodiment of the invention, the agar plate lacked tryptophan and the yeast lacked the ability to endogenously synthesize tryptophan. This deficiency was remedied by the inclusion of the TRP marker on the heavy chain chassis plasmid, so that any yeast receiving the plasmid and recombining it with a CDRH3 insert would grow. The electroporated cells were then outgrown approximately 100-fold, in liquid media lacking tryptophan. 
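The dilution plating used above to gauge library size is a standard back-calculation. The sketch below shows the arithmetic using hypothetical colony counts, dilutions, and volumes, since none are given in the text. The function name is likewise illustrative.

```python
def estimate_transformants(colonies: int, dilution_factor: float,
                           plated_volume_ul: float, total_volume_ul: float) -> float:
    """Scale colonies on the selective (-Trp) plate back to the whole electroporation.

    transformants = colonies x dilution factor x (total volume / plated volume)
    """
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * (total_volume_ul / plated_volume_ul)


# Hypothetical numbers, for illustration only.
size = estimate_transformants(colonies=120, dilution_factor=1e4,
                              plated_volume_ul=100, total_volume_ul=50_000)
print(f"{size:.1e}")  # 6.0e+08
```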
Aliquots of the library were frozen in 50% glycerol and stored at −80° C. Each transformant obtained at this stage represents a clone that can express a full IgG molecule. A schematic diagram of a CDRH3 integrated into a heavy chain vector, and the accompanying sequences, are provided in FIG. 7.
- A heavy chain library pool was then produced, based on the approximate representation of the heavy chain family members shown in Table 43.
TABLE 43 Occurrence of Heavy Chain Chassis in Data Sets Used to Design Library, Expected (Designed) Library, and Actual (Observed) Library
Chassis | Relative Occurrence in Data Sets (1) | Expected (2) | Observed (3)
VH1-2 | 5.1 | 6.0 | 6.4
VH1-18 | 3.4 | 3.7 | 3.8
VH1-46 | 3.4 | 5.2 | 4.7
VH1-69 | 8.0 | 8.0 | 10.7
VH3-7 | 3.6 | 6.1 | 4.5
VH3-15 | 1.9 | 6.9 | 3.6
VH3-23 | 11.0 | 13.2 | 17.1
VH3-33/30 | 13.1 | 12.5 | 6.6
VH3-48 | 2.9 | 6.3 | 7.5
VH4-31 | 3.4 | 2.5 | 4.3
VH4-34 | 17.2 | 7.0 | 4.7
VH4-39 | 8.7 | 3.9 | 3.0
VH4-59 | 7.0 | 7.8 | 9.2
VH4-61 | 3.2 | 1.9 | 2.4
VH4-B | 1.0 | 1.4 | 0.8
VH5-51 | 7.2 | 7.7 | 10.5
(1) As detailed in Example 1, these 17 sequences account for about 76% of the entire sample of human VH sequences used to represent the human repertoire.
(2) Based on pooling of sub-libraries of each chassis type.
(3) Usage in 531 sequences from library; cf. FIG. 20.
- This example describes the mutation of position 94 in VH3-23, VH3-33, VH3-30, VH3-7, and VH3-48. In VH3-23, the amino acid at this position was mutated from K to R. In VH3-33, VH3-30, VH3-7, and VH3-48, this amino acid was mutated from R to K. In VH3-32, this position was mutated from K to R. The purpose of these mutations was to enhance the diversity of CDRH3 presentation in the library. For example, about 90% of naturally occurring VH3-23 sequences have K at position 94, while about 10% have R. These changes increase the diversity of CDRH3 presentation and, with it, the overall diversity of the library.
- Amplification was performed using the 424 vectors as a template. For the K94R mutation, the vectors containing the sequence DTAVYYCAK (VH3-23; SEQ ID NO: 578) were amplified with a PCR primer that changed the K to an R. The primer also added a 5′ tail for homologous recombination with the VH3-48, VH3-33, VH3-30, and VH3-7 chassis. VH3-48 has a “T” base where the other chassis have C (codon GCT instead of GCC). This change is silent: both codons encode alanine. The same primer therefore still allows homologous recombination into the VH3-48 chassis despite the resulting T:C mismatch.
- Furthermore, the amplification products from the 424 vectors (produced as described above) containing the DTAVYYCAR (SEQ ID NO: 579) sequence can be homologously recombined into the VH3-23 (CAR) vector, changing R to K in this framework and thus further increasing the diversity of CDRH3 presentation in this chassis.
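The FRM3 alignment below covers positions 240 to 294 (SEQ ID NOs: 574-577). The sketch translates each sequence in the frame that starts at position 241 and lists nucleotide differences relative to VH3-33/30. It shows two things:

- VH3-48 differs only by a silent T/C change in an alanine codon.
- VH3-23 differs only in the final codon, AAG (K) versus AGA (R).

This supports the statements above. The dictionary and helper names are illustrative.

```python
from itertools import product

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(nt: str) -> str:
    if len(nt) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


START = 240  # alignment coordinate of the first base shown
FRAME = 1    # the first complete codon (CTG, Leu) begins at position 241
FRM3 = {
    "VH3-48":    "TCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCGGTGTACTACTGCGCCAGA",
    "VH3-33/30": "TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA",
    "VH3-7":     "TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA",
    "VH3-23":    "TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAAG",
}


def mismatches(a: str, b: str, start: int = START) -> list[tuple[int, str, str]]:
    """(alignment position, base in a, base in b) wherever the sequences differ."""
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for name, nt in FRM3.items():
    print(f"{name:10s} {translate(nt[FRAME:])}")
print(mismatches(FRM3["VH3-48"], FRM3["VH3-33/30"]))  # [(264, 'T', 'C')]  silent (GCT/GCC)
print(mismatches(FRM3["VH3-23"], FRM3["VH3-33/30"]))  # [(293, 'A', 'G'), (294, 'G', 'A')]
```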
VH3-48 (240) TCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCGGTGTACTACTGCGCCAGA (294) SEQ ID NO: 574
VH3-33/30 (240) TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA (294) SEQ ID NO: 575
VH3-7 (240) TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA (294) SEQ ID NO: 576
VH3-23 (240) TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAAG (294) SEQ ID NO: 577
- This example describes the construction of a VK library of the invention. The exemplary VK library described herein corresponds to the VKCDR3 library of about 10⁵ complexity described in Example 6.2. As described in Example 6, and throughout the application, other VK libraries are within the scope of the invention, as are Vλ libraries.
- Ten VK chassis (Table 11) were synthesized without VKCDR3. In its place, as in the heavy chain vectors, each carried two SfiI restriction sites. The kappa constant region followed the SfiI restriction sites.
FIG. 8 shows a schematic structure of a light chain vector prior to recombination with a CDRL3.
- Ten VKCDR3 oligonucleotide libraries were then synthesized, as described in Example 6.2, using degenerate oligonucleotides (Table 33). The oligonucleotides were then PCR amplified as separate pools, to make them double-stranded and to add the nucleotides required for efficient homologous recombination with the SfiI-gapped vector containing the VK chassis and constant region sequences. The VKCDR3 pools in this embodiment of the invention represented lengths … (FIG. 9).
- A kappa light chain library pool was then produced, based on the approximate representation of the VK family members found in the circulating pool of B cells. The 10 kappa variable regions used and their relative frequencies in the final library pool are shown in Table 44.
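The Expected (Designed) column of Table 44 can be reproduced from the rounded ratios in its footnote (2). The ratios sum to 28, and each chassis's expected share is its ratio divided by 28. The sketch takes the ratio-2 group to be VK1-5, VK1-33, VK2-28, and VK3-15, which is the assignment consistent with the Expected values of 7.1. The function name is illustrative.

```python
# Rounded relative ratios from footnote (2) of Table 44.
VK_RATIOS = {
    "VK1-5": 2, "VK1-12": 1, "VK1-27": 1, "VK1-33": 2, "VK1-39": 6,
    "VK2-28": 2, "VK3-11": 3, "VK3-15": 2, "VK3-20": 6, "VK4-1": 3,
}


def expected_percent(ratios: dict[str, int], ndigits: int = 1) -> dict[str, float]:
    """Convert integer pooling ratios into rounded percentages of the final pool."""
    total = sum(ratios.values())
    return {chassis: round(100 * r / total, ndigits) for chassis, r in ratios.items()}


expected = expected_percent(VK_RATIOS)
for chassis, pct in expected.items():
    print(chassis, pct)
```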
TABLE 44 Occurrence of VK Chassis in Data Sets Used to Design Library, Expected (Designed) Library, and Actual (Observed) Library
Chassis | Relative Occurrence in Data Sets (1) | Expected (2) | Observed (3)
VK1-5 | 8.6 | 7.1 | 5.8
VK1-12 | 4.0 | 3.6 | 3.5
VK1-27 | 3.3 | 3.6 | 8.1
VK1-33 | 5.3 | 7.1 | 3.5
VK1-39 | 18.5 | 21.4 | 17.4
VK2-28 | 7.7 | 7.1 | 5.8
VK3-11 | 10.9 | 10.7 | 20.9
VK3-15 | 6.6 | 7.1 | 4.7
VK3-20 | 24.5 | 21.4 | 18.6
VK4-1 | 10.4 | 10.7 | 11.6
(1) As indicated in Example 3, these 10 chassis account for about 80% of the occurrences in the entire data set of VK sequences examined.
(2) Rounded off ratios from the data in column 2, then normalized for actual experimental set up. The relative rounded ratios are 6 for VK1-39 and VK3-20, 3 for VK3-11 and VK4-1, 2 for VK1-5, VK1-33, VK2-28 and VK3-15, and 1 for VK1-12 and VK1-27.
(3) Chassis usage in set of 86 sequences obtained from library; see also FIG. 22.
- This example shows the characteristics of exemplary libraries of the invention, constructed according to the methods described herein.
- To characterize the product of the split pool synthesis, ten of the 424 vectors containing the [Tail]-[N1]-[DH]-[N2]-[H3-JH] product were selected at random and transformed into E. coli. The split pool product had a theoretical diversity of about 1.1×10⁶ (i.e., 278×141×28). Ninety-six colonies were selected from the transformation, and forward and reverse sequences were generated for each clone. Of the 96 sequencing reactions, 90 yielded sequences from which the CDRH3 region could be identified, and about 70% of these sequences matched a designed sequence in the library. The length distribution of the sequenced CDRH3 segments from the ten vectors, compared with the theoretical distribution (based on design), is provided in FIG. 10. The length distributions of the individual DH, N2, and H3-JH segments obtained from the ten vectors are shown in FIGS. 11-13.
- The length distribution of the CDRH3 components contained in the vectors was first verified to match the design. The CDRH3 domains and heavy chain family representation were then characterized in yeast transformed according to the process described in Example 10.4. Over 500 single-pass sequences were obtained. Of these, 531 yielded enough sequence information to identify the heavy chain chassis, and 291 yielded enough sequence information to characterize the CDRH3. These CDRH3 domains had been integrated with the heavy chain chassis and constant region by the homologous recombination processes described herein. The length distribution of the CDRH3 domains from the 291 sequences, compared with the theoretical length distribution, is shown in FIG. 14. The mean theoretical length was 14.4±4 amino acids, while the average observed length was 14.3±3 amino acids. The observed length of each portion of the CDRH3, compared with the theoretical length, is presented in FIGS. 15-18. FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences, and FIG. 20 shows the representation of 16 of the chassis of the library. The VH3-15 chassis was not represented among these sequences. This was corrected later by introducing yeast transformants containing the VH3-15 chassis, with CDRH3 diversity, into the library at the desired composition.
- The length distribution of the CDRL3 components from the VKCDR3 library described in Example 6.2 was determined after yeast transformation via the methods described in Example 10.4. A comparison of the CDRL3 lengths of 86 sequences from the library with the human sequences and the designed sequences is provided in FIG. 21. FIG. 22 shows the representation of the light chain chassis among the 86 sequences selected from the library. About 91% of the CDRL3 sequences were exact matches to the design, and about 9% differed by a single amino acid.
- This example presents data on the composition of the CDRH3 domains of exemplary libraries and compares them with other libraries of the art. More specifically, this example analyzes the occurrence of the 400 possible amino acid pairs (20 amino acids×20 amino acids) in the CDRH3 domains of the libraries. The prevalence of these pairs is computed by examining the nearest neighbor (i−i+1; designated IP1), next nearest neighbor (i−i+2; designated IP2), and next-next nearest neighbor (i−i+3; designated IP3) of residue i in CDRH3. Libraries previously known in the art (e.g., Knappik et al., J. Mol. Biol., 2000, 296: 57; Sidhu et al., J. Mol. Biol., 2004, 338: 299; and Lee et al., J. Mol. Biol. 2004, 340: 1073, each of which is incorporated by reference in its entirety) have considered only the occurrence of the 20 amino acids at individual positions within CDRH3 (maintaining the same composition across the center of CDRH3), not the pair-wise occurrences considered herein. In fact, according to Sidhu et al. (J. Mol. Biol., 2004, 338: 299, incorporated by reference in its entirety), “[i]n CDR-H3, there was some bias towards certain residue types, but all 20 natural amino acid residues occurred to a significant extent, and there was very little position-specific bias within the central portion of the loop”. Thus, the present invention represents the first recognition that, surprisingly, a position-specific bias does exist within the central portion of the CDRH3 loop when the occurrences of the amino acid pairs recited above are considered.
This example shows that the libraries described herein reproduce the occurrence of these pairs in human sequences more faithfully than other libraries of the art. The composition of the libraries described herein may thus be considered more “human” than that of other libraries of the art.
- To examine the pair-wise composition of CDRH3 domains, a portion of CDRH3 beginning at position 95 was chosen. For comparison with the data presented in Knappik et al. and Lee et al., the last five residues of each analyzed CDRH3 were ignored. Thus, for the purposes of this analysis, both members of the pair i−i+X (X=1 to 3) must fall within the region starting at position 95 and ending at (and including) the sixth residue from the C-terminus of the CDRH3. The analyzed portion is termed the “central loop” (see Definitions).
- To estimate pair distributions in representative libraries of the invention, a sampling approach was used. Each sequence was generated by randomly choosing, in turn, one of the 424 tail plus N1 segments, one of the 278 DH segments, one of the 141 N2 segments, and one of the 28 JH segments (the latter truncated to include only Kabat CDRH3 positions 95 to 102). The process was repeated 10,000 times to generate a sample of 10,000 sequences. An independent sample of another 10,000 sequences, generated with a different random seed, gave nearly the same pair distributions. For the calculations presented herein, a third and much larger sample of 50,000 sequences was used. A similar approach was used for the alternative library embodiment (N1-141). Here the first segment was selected from 1,068 tail+N1 segments, obtained by eliminating redundant sequences from the 2×4×141, or 1,128, possible combinations.
- The pair-wise composition of Knappik et al. was determined from the percent occurrences presented in FIG. 7a of Knappik et al. (p. 71). The relevant data are reproduced below in Table 45.
TABLE 45 Composition of CDRH3 positions 95-100s (corresponding to positions 95-99B of the libraries of the current invention) of CDRH3 of Knappik et al. (from FIG. 7a of Knappik et al.)
Amino Acid | Planned (%) | Found (%)
A | 4.1 | 3.0
C | 1.0 | 1.0
D | 4.1 | 4.2
E | 4.1 | 2.3
F | 4.1 | 4.9
G | 15.0 | 10.8
H | 4.1 | 4.6
I | 4.1 | 4.5
K | 4.1 | 2.9
L | 4.1 | 6.6
M | 4.1 | 3.3
N | 4.1 | 4.5
P | 4.1 | 4.8
Q | 4.1 | 2.9
R | 4.1 | 4.1
S | 4.1 | 5.6
T | 4.1 | 4.5
V | 4.1 | 3.7
W | 4.1 | 2.0
Y | 15.0 | 19.8
- The pair-wise composition of Lee et al. was determined from the libraries depicted in Table 5 of Lee et al. In those libraries, the positions corresponding to the CDRH3 regions analyzed here (for the current invention and for Knappik et al.) are composed of an “XYZ” codon. The XYZ codon of Lee et al. is a degenerate codon with the following base compositions:
- position 1 (X): 19% A, 17% C, 38% G, and 26% T;
- position 2 (Y): 34% A, 18% C, 31% G, and 17% T; and
- position 3 (Z): 24% G and 76% T.
When the approximately 2% of codons that are stop codons are excluded (stop codons do not occur in functionally expressed human CDRH3 sequences) and the percentages are re-normalized to 100%, the amino acid representation in Table 46 can be deduced from the composition of the XYZ codon of Lee et al.
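The deduction behind Table 46 can be written out explicitly. Each of the 4×4×2 possible XYZ codons is weighted by the product of its base fractions, and the weights are summed per amino acid under the standard genetic code. The stop-codon share is then removed and the rest re-normalized. With Z limited to G or T, TAG is the only reachable stop codon (0.26 × 0.34 × 0.24 ≈ 2.1%). The values printed by this sketch agree with Table 46 to the two decimals shown. The function name is illustrative.

```python
from itertools import product

_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Base compositions of the degenerate XYZ codon listed above, as fractions.
X = {"A": 0.19, "C": 0.17, "G": 0.38, "T": 0.26}
Y = {"A": 0.34, "C": 0.18, "G": 0.31, "T": 0.17}
Z = {"G": 0.24, "T": 0.76}


def codon_mix_to_aa(x, y, z):
    """Return (stop-codon fraction, amino acid percentages re-normalized without stops)."""
    freq = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(x.items(), y.items(), z.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        freq[aa] = freq.get(aa, 0.0) + p1 * p2 * p3
    stop = freq.pop("*", 0.0)
    kept = sum(freq.values())
    return stop, {aa: 100.0 * f / kept for aa, f in sorted(freq.items())}


stop_fraction, composition = codon_mix_to_aa(X, Y, Z)
print(f"stop codons: {100 * stop_fraction:.1f}%")  # stop codons: 2.1%
for aa, pct in composition.items():
    print(f"{aa} {pct:.2f}%")
```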
-
TABLE 46
Composition of CDRH3 of Lee et al., Based on the Composition of the Degenerate XYZ Codon

| Type | Percent |
|---|---|
| A | 6.99% |
| C | 6.26% |
| D | 10.03% |
| E | 3.17% |
| F | 3.43% |
| G | 12.04% |
| H | 4.49% |
| I | 2.51% |
| K | 1.58% |
| L | 4.04% |
| M | 0.79% |
| N | 5.02% |
| P | 3.13% |
| Q | 1.42% |
| R | 6.83% |
| S | 9.35% |
| T | 3.49% |
| V | 6.60% |
| W | 1.98% |
| Y | 6.86% |

The occurrences of each of the 400 amino acid pairs, in each of the IP1, IP2, and IP3 configurations, can be computed for Knappik et al. and Lee et al. by multiplying together the individual amino acid compositions. For example, for Knappik et al., the occurrence of YS pairs in the library is calculated by multiplying 15% by 4.1%, to yield about 0.61%; the occurrence of SY pairs is the same. Similarly, for the XYZ codon-based libraries of Lee et al., the occurrence of YS pairs is 6.86% (Y) multiplied by 9.35% (S), or about 0.64%; again, the same holds for SY.
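As a sketch, the Table 46 percentages and the independent-pair products described above can be reproduced in a few lines of Python. The code assumes the standard genetic code and treats the three XYZ positions as independently randomized with the base compositions listed above; the function names are illustrative and are not taken from Lee et al.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}

# Base composition (fractions) of the degenerate XYZ codon, as listed above.
XYZ = (
    {"A": 0.19, "C": 0.17, "G": 0.38, "T": 0.26},  # position 1 (X)
    {"A": 0.34, "C": 0.18, "G": 0.31, "T": 0.17},  # position 2 (Y)
    {"G": 0.24, "T": 0.76},                        # position 3 (Z)
)


def amino_acid_percentages(codon_composition):
    """Return (% per amino acid after removing stops and renormalizing, % stop codons)."""
    x, y, z = codon_composition
    totals = {}
    for b1, b2, b3 in product(x, y, z):
        aa = CODON_TABLE[b1 + b2 + b3]
        totals[aa] = totals.get(aa, 0.0) + x[b1] * y[b2] * z[b3]
    stop = totals.pop("*", 0.0)
    return {aa: 100.0 * p / (1.0 - stop) for aa, p in totals.items()}, 100.0 * stop


def independent_pair_percent(pct, first, second):
    """Pair occurrence (%) when the two positions are independent: product of singles."""
    return pct[first] * pct[second] / 100.0


xyz_pct, stop_pct = amino_acid_percentages(XYZ)
print(round(stop_pct, 2))                                     # 2.12 (TAG only)
print(round(xyz_pct["Y"], 2), round(xyz_pct["S"], 2))         # 6.86 9.35
print(round(independent_pair_percent(xyz_pct, "Y", "S"), 2))  # 0.64
```

Because the product is symmetric, the same function gives the SY value, which is why each pair and its reverse have identical occurrences in the HuCAL and XYZ columns of Tables 47-49.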
For the human CDRH3 sequences, the calculation is performed after discarding the last five amino acids of the Kabat-defined CDRH3. With the C-terminal five amino acids removed, these sequences can be compared to those of Lee et al. based on the XYZ codon. Although Lee et al. also present libraries built with “NNK” and “NNS” codons, the pair-wise compositions of those libraries are even further from the human CDRH3 pair-wise composition. The XYZ codon was designed by Lee et al. to replicate, to some extent, the individual amino acid type biases observed in CDRH3.

An identical approach was used for the libraries of the invention, after using the methods described above to produce sample sequences. While it is possible to perform these calculations with all sequences in the library, independent random samples of 10,000 to 20,000 members gave indistinguishable results. The numbers reported herein were therefore generated from samples of 50,000 members.
Three tables were generated, for IP1, IP2 and IP3, respectively (Tables 47, 48, and 49). Of the 400 possible pairs, a selection from among the 20 most frequently occurring is included in the tables. The sample of about 1,000 human sequences (Lee et al., 2006) is denoted “Preimmune,” a sample of about 2,500 sequences (Jackson et al., 2007) is denoted “Humabs,” and the more affinity-matured subset of the latter, which excludes all of the Preimmune set, is denoted “Matured.” Synthetic libraries in the art are denoted HuCAL (Knappik et al., 2000) and XYZ (Lee et al., 2004). Two representative libraries of the invention are included: LUA-59 includes 59 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments, and LUA-141 includes 141 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments (see Examples, above). Redundancies created by combining the N1 and tail sequences were removed from the dataset of each library.

In certain embodiments, the invention may be defined based on the percent occurrence of any of the 400 amino acid pairs, particularly those in Tables 47-49. In certain embodiments, the invention may be defined based on at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more of these pairs. In certain embodiments of the invention, the percent occurrence of certain pairs of amino acids may fall within the ranges indicated by “LUA−” (lower boundary) and “LUA+” (higher boundary) in the following tables. In some embodiments of the invention, the lower boundary for the percent occurrence of any amino acid pair may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, or 5. In some embodiments of the invention, the higher boundary for the percent occurrence of any amino acid pair may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 7.25, 7.5, 7.75, or 8. According to the present invention, any of the lower boundaries recited may be combined with any of the higher boundaries recited to establish ranges, and vice versa.
TABLE 47
Percent Occurrence of i − i + 1 (IP1) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pair | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range (LUA+ − LUA−) | HuCAL (0/1) | XYZ (0/1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YY | 5.87 | 4.44 | 3.27 | 5.83 | 5.93 | 2.25 | 0.47 | 2.50 | 6.50 | 4.00 | 0 | 0 |
| SG | 3.54 | 3.41 | 3.26 | 3.90 | 3.72 | 0.61 | 1.13 | 2.50 | 4.50 | 2.00 | 0 | 0 |
| SS | 3.35 | 2.65 | 2.26 | 2.82 | 3.08 | 0.16 | 0.88 | 2.00 | 4.00 | 2.00 | 0 | 0 |
| GS | 2.59 | 2.37 | 2.20 | 3.82 | 3.52 | 0.61 | 1.13 | 1.50 | 4.00 | 2.50 | 0 | 0 |
| GY | 2.55 | 2.34 | 2.12 | 3.15 | 2.56 | 2.25 | 0.83 | 2.00 | 3.50 | 1.50 | 1 | 0 |
| GG | 2.19 | 2.28 | 2.41 | 6.78 | 3.51 | 2.25 | 1.45 | 2.00 | 7.00 | 5.00 | 1 | 0 |
| YS | 1.45 | 1.30 | 1.23 | 1.40 | 1.52 | 0.61 | 0.64 | 0.75 | 2.00 | 1.25 | 0 | 0 |
| YG | 1.35 | 1.21 | 1.10 | 1.64 | 1.69 | 2.25 | 0.83 | 0.75 | 2.00 | 1.25 | 0 | 1 |
| SY | 1.31 | 1.07 | 0.90 | 1.65 | 1.77 | 0.61 | 0.64 | 0.75 | 2.00 | 1.25 | 0 | 0 |
| YD | 1.67 | 1.40 | 1.17 | 0.88 | 0.90 | 0.61 | 0.69 | 0.75 | 2.25 | 1.50 | 0 | 0 |
| DS | 1.53 | 1.31 | 1.16 | 1.20 | 1.46 | 0.16 | 0.94 | 0.75 | 2.00 | 1.25 | 0 | 1 |
| DY | 1.40 | 1.23 | 1.11 | 0.34 | 0.48 | 0.61 | 0.69 | 0.25 | 2.00 | 1.75 | 1 | 1 |
| VV | 1.37 | 0.94 | 0.64 | 2.30 | 2.30 | 0.16 | 0.44 | 0.50 | 2.50 | 2.00 | 0 | 0 |
| GD | 1.20 | 1.21 | 1.25 | 0.49 | 0.44 | 0.61 | 1.21 | 0.25 | 1.75 | 1.50 | 1 | 1 |
| AA | 1.16 | 0.93 | 0.75 | 1.27 | 1.46 | 0.16 | 0.49 | 0.60 | 1.50 | 0.90 | 0 | 0 |
| RG | 1.08 | 1.26 | 1.38 | 1.69 | 1.38 | 0.61 | 0.82 | 1.00 | 2.00 | 1.00 | 0 | 0 |
| VA | 0.91 | 0.66 | 0.46 | 0.36 | 0.35 | 0.16 | 0.46 | 0.25 | 1.00 | 0.75 | 0 | 1 |
| GV | 0.84 | 0.89 | 0.95 | 2.87 | 2.16 | 0.61 | 0.79 | 0.80 | 3.00 | 2.20 | 0 | 0 |
| CS | 0.82 | 0.55 | 0.38 | 0.79 | 0.80 | 0.04 | 0.59 | 0.50 | 1.00 | 0.50 | 0 | 1 |
| GR | 0.74 | 0.90 | 1.00 | 1.01 | 0.79 | 0.61 | 0.82 | 0.70 | 1.25 | 0.55 | 0 | 1 |

The pairs in bold comprise about 19% to about 24% of occurrences (among the possible 400 pairs) for the Preimmune (Lee et al., 2006), Humabs (Jackson et al., 2007) and Matured (Jackson minus Lee) sets. They account for about 27% to about 31% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. This reflects the fact that pair-wise biases exist in the human sequences and the LUA libraries, but not in the others. The last two columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.
TABLE 48
Percent Occurrence of i − i + 2 (IP2) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pair | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range (LUA+ − LUA−) | HuCAL (0/1) | XYZ (0/1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YY | 3.57 | 2.59 | 1.78 | 2.99 | 3.11 | 2.25 | 0.47 | 2.5 | 4.5 | 2 | 0 | 0 |
| GY | 3.34 | 2.91 | 2.56 | 4.96 | 3.78 | 2.25 | 0.83 | 2.5 | 5.5 | 3 | 0 | 0 |
| SY | 2.94 | 2.41 | 2.01 | 3.03 | 3.42 | 0.61 | 0.64 | 2 | 4 | 2 | 0 | 0 |
| YS | 2.88 | 2.34 | 1.95 | 3.24 | 3.32 | 0.61 | 0.64 | 1.75 | 3.75 | 2 | 0 | 0 |
| SG | 2.60 | 2.29 | 2.05 | 2.84 | 2.96 | 0.61 | 1.13 | 2 | 3.5 | 1.5 | 0 | 0 |
| SS | 2.27 | 2.01 | 1.84 | 2.30 | 2.50 | 0.16 | 0.88 | 1.5 | 3 | 1.5 | 0 | 0 |
| GS | 2.16 | 2.12 | 2.10 | 2.96 | 2.32 | 0.61 | 1.13 | 1.5 | 3 | 1.5 | 0 | 0 |
| GG | 1.92 | 2.25 | 2.44 | 6.23 | 3.68 | 2.25 | 1.45 | 1.5 | 7 | 5.5 | 1 | 0 |
| YG | 1.17 | 1.14 | 1.15 | 1.39 | 1.47 | 2.25 | 0.83 | 1 | 2 | 1 | 0 | 0 |
| DS | 2.03 | 1.67 | 1.40 | 1.21 | 1.48 | 0.16 | 0.94 | 1 | 2.5 | 1.5 | 0 | 0 |
| YD | 1.71 | 1.39 | 1.11 | 0.89 | 0.92 | 0.61 | 0.69 | 0.75 | 1.75 | 1 | 0 | 0 |
| VG | 1.35 | 1.17 | 1.01 | 1.75 | 1.54 | 0.61 | 0.79 | 1 | 2 | 1 | 0 | 0 |
| DY | 1.06 | 1.02 | 0.99 | 0.23 | 0.40 | 0.61 | 0.69 | 0.2 | 1.2 | 1 | 1 | 1 |
| WG | 1.06 | 0.76 | 0.53 | 0.85 | 0.91 | 0.61 | 0.24 | 0.75 | 1.25 | 0.5 | 0 | 0 |
| RY | 0.98 | 1.00 | 0.96 | 0.70 | 0.91 | 0.61 | 0.47 | 0.6 | 1 | 0.4 | 1 | 0 |
| GC | 0.97 | 0.75 | 0.64 | 0.94 | 0.81 | 0.15 | 0.75 | 0.5 | 1 | 0.5 | 0 | 1 |
| DG | 0.95 | 1.05 | 1.08 | 1.78 | 1.05 | 0.61 | 1.21 | 0.75 | 2 | 1.25 | 0 | 1 |
| GD | 0.94 | 0.88 | 0.86 | 0.47 | 0.36 | 0.61 | 1.21 | 0.25 | 1 | 0.75 | 1 | 0 |
| VV | 0.94 | 0.59 | 0.35 | 0.95 | 0.90 | 0.16 | 0.44 | 0.5 | 1 | 0.5 | 0 | 0 |
| AA | 0.90 | 0.73 | 0.59 | 0.72 | 0.74 | 0.16 | 0.49 | 0.5 | 1 | 0.5 | 0 | 0 |

The pairs in bold comprise about 18% to about 23% of occurrences (among the possible 400 pairs) for the Preimmune (Lee et al., 2006), Humabs (Jackson et al., 2007) and Matured (Jackson minus Lee) sets. They account for about 27% to about 30% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. Because of the way the central loops of the HuCAL and XYZ libraries are constructed, these numbers are the same for the IP1, IP2, and IP3 pairs. The last two columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.
TABLE 49
Percent Occurrence of i − i + 3 (IP3) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pair | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range (LUA+ − LUA−) | HuCAL (0/1) | XYZ (0/1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GY | 3.55 | 2.85 | 2.32 | 5.80 | 4.42 | 2.25 | 0.83 | 2.5 | 6.5 | 4 | 0 | 0 |
| SY | 3.38 | 3.01 | 2.67 | 3.78 | 4.21 | 0.61 | 0.64 | 1 | 5 | 4 | 0 | 0 |
| YS | 3.18 | 2.56 | 2.05 | 3.20 | 3.33 | 0.61 | 0.64 | 2 | 4 | 2 | 0 | 0 |
| SS | 2.26 | 1.74 | 1.37 | 1.81 | 2.18 | 0.16 | 0.88 | 1 | 3 | 2 | 0 | 0 |
| GS | 2.23 | 2.13 | 2.00 | 4.60 | 3.33 | 0.61 | 1.13 | 2 | 5 | 3 | 0 | 0 |
| YG | 2.14 | 1.65 | 1.35 | 2.69 | 2.79 | 2.25 | 0.83 | 1.5 | 3 | 1.5 | 1 | 0 |
| YY | 1.86 | 1.48 | 1.12 | 1.18 | 1.27 | 2.25 | 0.47 | 0.75 | 2 | 1.25 | 0 | 0 |
| GG | 1.60 | 1.87 | 2.11 | 4.73 | 2.84 | 2.25 | 1.45 | 1.5 | 5 | 3.5 | 1 | 0 |
| SG | 0.90 | 1.04 | 1.12 | 0.93 | 1.25 | 0.61 | 1.13 | 0.75 | 1.5 | 0.75 | 0 | 1 |
| DG | 2.01 | 1.94 | 1.84 | 2.51 | 2.03 | 0.61 | 1.21 | 1.5 | 3 | 1.5 | 0 | 0 |
| DS | 1.48 | 1.31 | 1.22 | 0.41 | 0.55 | 0.16 | 0.94 | 0.25 | 1.5 | 1.25 | 0 | 1 |
| VA | 1.18 | 0.83 | 0.55 | 1.48 | 1.46 | 0.16 | 0.46 | 0.5 | 2 | 1.5 | 0 | 0 |
| AG | 1.13 | 1.09 | 1.03 | 0.97 | 1.04 | 0.61 | 0.84 | 0.9 | 2 | 1.1 | 0 | 0 |
| TY | 1.05 | 0.90 | 0.76 | 1.01 | 1.16 | 0.61 | 0.24 | 0.75 | 1.75 | 1 | 0 | 0 |
| PY | 1.02 | 0.88 | 0.79 | 1.23 | 0.86 | 0.61 | 0.21 | 0.75 | 1.75 | 1 | 0 | 0 |
| RS | 1.02 | 0.88 | 0.77 | 0.38 | 0.55 | 0.16 | 0.64 | 0.25 | 1.25 | 1 | 0 | 1 |
| RY | 1.02 | 1.12 | 1.14 | 0.68 | 0.88 | 0.61 | 0.47 | 0.65 | 1.25 | 0.6 | 0 | 0 |
| LY | 1.01 | 0.88 | 0.75 | 0.69 | 0.76 | 0.61 | 0.28 | 0.65 | 1.25 | 0.6 | 0 | 0 |
| DY | 0.93 | 0.84 | 0.77 | 0.72 | 0.95 | 0.61 | 0.69 | 0.7 | 1.3 | 0.6 | 0 | 0 |
| GC | 0.90 | 0.62 | 0.48 | 0.86 | 0.68 | 0.15 | 0.75 | 0.5 | 1 | 0.5 | 0 | 1 |

The pairs in bold make up about 16% to 21% of the occurrences (among the possible 400 pairs) for the Preimmune (Lee et al., 2006), Humabs (Jackson et al., 2007) and Matured (Jackson minus Lee) sets. They account for about 26% to 29% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. Because of the way the central loops of the HuCAL and XYZ libraries are constructed, these numbers are the same for the IP1, IP2, and IP3 pairs. The last two columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.
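The last two columns of Tables 47-49 can be recomputed by testing whether each HuCAL or XYZ value lies between the LUA− and LUA+ boundaries. The sketch below assumes the boundaries are inclusive and uses a handful of IP1 rows transcribed from Table 47; the resulting flags agree with the table.

```python
# IP1 rows transcribed from Table 47: pair -> (HuCAL, XYZ, LUA-, LUA+), all in percent.
IP1_ROWS = {
    "YY": (2.25, 0.47, 2.50, 6.50),
    "GY": (2.25, 0.83, 2.00, 3.50),
    "YG": (2.25, 0.83, 0.75, 2.00),
    "DY": (0.61, 0.69, 0.25, 2.00),
    "GR": (0.61, 0.82, 0.70, 1.25),
}


def within(value, lower, upper):
    """Return 1 if lower <= value <= upper (boundaries assumed inclusive), else 0."""
    return int(lower <= value <= upper)


def range_flags(rows):
    """Map each pair to its (HuCAL flag, XYZ flag), as in the last two table columns."""
    return {
        pair: (within(hucal, lo, hi), within(xyz, lo, hi))
        for pair, (hucal, xyz, lo, hi) in rows.items()
    }


for pair, (hucal_flag, xyz_flag) in range_flags(IP1_ROWS).items():
    print(pair, hucal_flag, xyz_flag)
```

The same check, applied to a library's own pair percentages, is one way to test whether that library satisfies a chosen set of LUA− to LUA+ ranges.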
The analysis provided in this example demonstrates that the composition of the libraries of the present invention more closely mimics the composition of human sequences than other libraries known in the art. Synthetic libraries of the art do not intrinsically reproduce the composition of the “central loop” portion of actual human CDRH3 sequences at the level of pair percentages. The libraries of the invention have a more complex pair-wise composition that closely reproduces that observed in actual human CDRH3 sequences. The exact degree of this reproduction relative to a target set of actual human CDRH3 sequences may be optimized, for example, by varying the compositions of the segments used to design the CDRH3 libraries. Moreover, it is also possible to use these metrics to computationally design libraries that exactly mimic the pair-wise compositional prevalence found in human sequences.
One way to quantify the observation that certain libraries, or collections of sequences, may be intrinsically more complex or “less random” than others is to apply information theory (Shannon, Bell Sys. Tech. J., 1948, 27: 379; Martin et al., Bioinformatics, 2005, 21: 4116; Weiss et al., J. Theor. Biol., 2000, 206: 379, each incorporated by reference in its entirety). For example, a metric can be devised to capture the fact that a position with a fixed amino acid represents less “randomness” than a position where all 20 amino acids may occur with equal probability. Intermediate situations should, in turn, lead to intermediate values of such a metric. According to information theory, this metric can be represented by the formula:

$$I = \sum_{i=1}^{N} f_i \log_2 f_i$$

Here, $f_i$ is the normalized frequency of occurrence of type $i$, which may be an amino acid type (in which case $N = 20$). When all $f_i$ are zero except one (which then equals 1), $I$ is zero, taking $0 \cdot \log_2 0 = 0$. In every other case $I$ is smaller, i.e., negative, and its lowest value, $-\log_2 N$, is reached when all $f_i$ are equal to $1/N$. For amino acids, $N = 20$ and this lowest value is $-\log_2 20 \approx -4.322$. Because $I$ is defined with base-2 logarithms, its units are bits.

The I values for the HuCAL and XYZ libraries at the single-position level may be derived from Tables 45 and 46, respectively, and are equal to −4.08 and −4.06. The corresponding single-residue frequencies in the non-limiting exemplary libraries of the invention and in the sets of human sequences introduced above, taken within the “central loop” as defined above, are provided in Table 50.
TABLE 50
Amino Acid Type Frequencies (%) in Central Loop

| Type | Preimmune | Humabs | Matured | LUA-59 | LUA-141 |
|---|---|---|---|---|---|
| A | 5.46 | 5.51 | 5.39 | 5.71 | 6.06 |
| C | 1.88 | 1.46 | 1.22 | 1.33 | 1.34 |
| D | 7.70 | 7.51 | 7.38 | 4.76 | 5.23 |
| E | 2.40 | 2.90 | 3.28 | 3.99 | 4.68 |
| F | 2.29 | 2.60 | 2.81 | 1.76 | 2.17 |
| G | 14.86 | 15.42 | 15.82 | 24.90 | 18.85 |
| H | 1.46 | 1.79 | 2.01 | 0.20 | 0.67 |
| I | 3.71 | 3.26 | 2.99 | 3.99 | 4.34 |
| K | 1.06 | 1.27 | 1.44 | 0.21 | 0.67 |
| L | 4.48 | 4.84 | 5.16 | 4.12 | 4.54 |
| M | 1.18 | 1.03 | 0.93 | 0.94 | 1.03 |
| N | 1.81 | 2.43 | 2.84 | 0.41 | 0.65 |
| P | 4.12 | 4.10 | 4.13 | 5.68 | 3.96 |
| Q | 1.60 | 1.77 | 1.95 | 0.21 | 0.68 |
| R | 5.05 | 5.90 | 6.41 | 3.35 | 4.11 |
| S | 12.61 | 11.83 | 11.37 | 11.18 | 12.77 |
| T | 4.59 | 5.11 | 5.47 | 4.36 | 4.95 |
| V | 6.21 | 5.55 | 5.12 | 8.13 | 7.67 |
| W | 2.79 | 2.91 | 3.07 | 1.57 | 1.98 |
| Y | 14.74 | 12.81 | 11.24 | 13.20 | 13.63 |
The information content of these sets, computed by the formula given above, is then −3.88, −3.93, −3.96, −3.56, and −3.75 for the Preimmune, Humabs, Matured, LUA-59, and LUA-141 sets, respectively. The more the frequencies deviate from a completely uniform distribution (5% for each of the 20 types), the larger, or less negative, these numbers tend to be.

The identical approach can be used to analyze pair compositions, or frequencies, by calculating the sum in the formula above over the 20 × 20 = 400 pair frequencies. It can be shown that, for any set of pair frequencies formed as the simple product of two singleton frequency sets, I equals the sum of the two singleton I values. If the two singleton frequency sets are the same, or approximately so, this means that I(independent pairs) = 2·I(singles). It is thus possible to define a special case of the mutual information, MI, for a general set of pair frequencies as MI(pair) = I(pair) − 2·I(singles), which measures the amount of information gained from the structure of the pair frequencies themselves (compare the standard definitions in Martin et al., 2005, noting that I(X) = −H(X) in their notation). When there is no such structure, MI is simply zero.
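The formula for I and the product property described above can be checked numerically. In this sketch the input frequencies are normalized first and zero frequencies are skipped (the convention 0 · log2 0 = 0). The HuCAL “planned” composition is assumed here to be 15% G, 15% Y, 1% C, with the remaining 69% split evenly over the other 17 types (Table 45 rounds each of these to 4.1%).

```python
import math


def information_content(freqs):
    """I = sum_i f_i * log2(f_i) over normalized frequencies, in bits (always <= 0)."""
    total = float(sum(freqs))
    return sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


uniform = [1.0] * 20                                  # all 20 amino acids equally likely
fixed = [1.0] + [0.0] * 19                            # a single invariant residue
hucal_planned = [15.0, 15.0, 1.0] + [69.0 / 17] * 17  # G, Y, C, then 17 other types

print(round(information_content(uniform), 3))        # -4.322
print(information_content(fixed))                     # 0.0
print(round(information_content(hucal_planned), 2))  # -4.08

# Additivity: a 400-value pair distribution built as a product of two singleton sets.
pairs = [p * q for p in hucal_planned for q in hucal_planned]
print(math.isclose(information_content(pairs), 2 * information_content(hucal_planned)))  # True
```

The last check is the reason MI, defined as I(pair) − 2·I(singles), vanishes for libraries whose positions are filled independently.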
Values of MI computed from the pair distributions discussed above (over the entire set of 400 values) are given in Table 51.
TABLE 51
Mutual Information Within Central Loop of CDRH3

| Library or Set | i − i + 1 | i − i + 2 | i − i + 3 |
|---|---|---|---|
| Preimmune | 0.226 | 0.192 | 0.163 |
| Humabs | 0.153 | 0.128 | 0.111 |
| Matured | 0.124 | 0.107 | 0.100 |
| LUA-59 | 0.422 | 0.327 | 0.278 |
| LUA-141 | 0.376 | 0.305 | 0.277 |
| HuCAL | 0.000 | 0.000 | 0.000 |
| XYZ | 0.000 | 0.000 | 0.000 |
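The special-case MI can be sketched in the same way. The input is assumed to be a list of central-loop sequences, and singleton frequencies are pooled over all residues, which matches the text's assumption that both members of a pair share approximately the same singleton distribution. For an ideal position-independent design such as HuCAL or XYZ, pair frequencies are products of singleton frequencies and MI is zero (Table 51); a finite random sample would give values close to, but not exactly, zero.

```python
import math
from collections import Counter


def information(counts):
    """I = sum f * log2(f) over the normalized counts, in bits."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def pair_counts(seqs, x):
    """Count ordered residue pairs (s[i], s[i + x]) within each sequence."""
    return Counter((s[i], s[i + x]) for s in seqs for i in range(len(s) - x))


def mutual_information(seqs, x):
    """Special-case MI from the text: I(pair) - 2 * I(singles), singles pooled."""
    singles = Counter(aa for s in seqs for aa in s)
    return information(pair_counts(seqs, x)) - 2 * information(singles)


print(mutual_information(["AA", "AB", "BA", "BB"], 1))  # 0.0: pairs are independent
print(mutual_information(["AA", "BB"], 1))              # 1.0: second residue determined by first
```

Calling `mutual_information(loops, x)` for x = 1, 2 and 3 on a set of central loops gives the three columns of a row in Table 51.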
It is notable that the MI values decrease within sets of human sequences as those sequences undergo further somatic mutation, a process that, over many independent sequences, is essentially random. MI values also decrease as the two members of a pair sit further apart, both in the sets of human sequences and in the exemplary libraries of the invention. In both cases, as the two amino acids in a pair become further separated, the likelihood that they fall in different segments (V, D, J, or the V-D and D-J insertions) increases, and their pair frequencies approach a simple product of singleton frequencies.

Table 52 contains sequence information on certain immunoglobulin gene segments cited in the application. These sequences are non-limiting; it is recognized that allelic variants exist and are encompassed by the present invention. Accordingly, the methods presented herein can be used with mutants of these sequences.
TABLE 52 Sequence Information for Certain Immunoglobulin Gene Segments Cited Herein SEQ ID NO: Sequence Peptide or Nucleotide Sequence Observations 423 IGHV1-3 QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYAMHWVRQ APGQRLEWMGWINAGNGNTKYSQKFQGRVTITRDTSAST AYMELSSLRSEDTAVYYCAR 424 IGHV1- QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQ 8_v1 ATGQGLEWMGWMNPNSGNTGYAQKFQGRVTMTR NTS IS TAYMELSSLRSEDTAVYYCAR 425 IGHV1- QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQ N to D mutation avoids 8_v2 ATGQGLEWMGWMNPNSGNTGYAQKFQGRVTMTR DTS IS NTS potential glyco- TAYMELSSLRSEDTAVYYCAR sylation site in the original germline se- quence (v1 above). XTS, where X is not N, and NTZ, where Z is not S or T are also options. NPS is yet another option that is much less likely to be N- linked glycosylated. 426 IGHV1-24 QVQLVQSGAEVKKPGASVKVSCKVSGYTLTELSMHWVRQ APGKGLEWMGGFDPEDGETIYAQKFQGRVTMTEDTSTDT AYMELSSLRSEDTAVYYCAT 427 IGHV1-45 QMQLVQSGAEVKKTGSSVKVSCKASGYTFTYRYLHWVRQ APGQALEWMGWITPFNGNTNYAQKFQDRVTITRDRSMST AYMELSSLRSEDTAMYYCAR 428 IGHV1-58 QMQLVQSGPEVKKPGTSVKVSCKASGFTFTSSAVQWVRQ ARGQRLEWIGWIVVGSGNTNYAQKFQERVTITRDMSTSTA YMELSSLRSEDTAVYYCAA 429 IGHV2-5 QITLKESGPTLVKPTQTLTLTCTFSGFSLSTSGVGVGWIRQ PPGKALEWLALIYWDDDKRYSPSLKSRLTITKDTSKNQVVL TMTNMDPVDTATYYCAHR 430 IGHV2-26 QVTLKESGPVLVKPTETLTLTCTVSGFSLSNARMGVSWIRQ PPGKALEWLAHIFSNDEKSYSTSLKSRLTISKDTSKSQVVLT MTNMDPVDTATYYCARI 431 IGHV2- RVTLRESGPALVKPTQTLTLTCTFSGFSLSTSGM C VSWIRQ 70_v1 PPGKALEWLARIDWDDDKYYSTSLKTRLTISKDTSKNQVVL TMTNMDPVDTATYYCARI 432 IGHV2- RVTLRESGPALVKPTQTLTLTCTFSGFSLSTSGM G VSWIRQ C to G mutation avoids 70_v2 PPGKALEWLARIDWDDDKYYSTSLKTRLTISKDTSKNQVVL unpaired Cys in v1 TMTNMDPVDTATYYCARI above. G was chosen by analogy to other germ- line sequences, but other amino acid types, R, S, T, as non-limiting examples, are possible. 
433 IGHV3-9 EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQ APGKGLEWVSGISWNSGSIGYADSVKGRFTISRDNAKNSL YLQMNSLRAEDTALYYCAKD 434 IGHV3-11 QVQLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQ APGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLY LQMNSLRAEDTAVYYCAR 435 IGHV3-13 EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYDMHWVRQ ATGKGLEWVSAIGTAGDTYYPGSVKGRFTISRENAKNSLYL QMNSLRAGDTAVYYCAR 436 IGHV3-20 EVQLVESGGGVVRPGGSLRLSCAASGFTFDDYGMSWVR QAPGKGLEWVSGINWNGGSTGYADSVKGRFTISRDNAKN SLYLQMNSLRAEDTALYHCAR 437 IGHV3-21 EVQLVESGGGLVKPGGSLRLSCAASGFTFSSYSMNWVRQ APGKGLEWVSSISSSSSYIYYADSVKGRFTISRDNAKNSLY LQMNSLRAEDTAVYYCAR 438 IGHV3-43 EVQLVESGGVVVQPGGSLRLSCAASGFTFDDYTMHWVRQ APGKGLEWVSLISWDGGSTYYADSVKGRFTISRDNSKNSL YLQMNSLRTEDTALYYCAKD 439 IGHV3-49 EVQLVESGGGLVQPGRSLRLSCTASGFTFGDYAMSWVRQ APGKGLEWVGFIRSKAYGGTTEYAASVKGRFTISRDDSKSI AYLQMNSLKTEDTAVYYCTR 440 IGHV3-53 EVQLVESGGGLIQPGGSLRLSCAASGFTVSSNYMSWVRQ APGKGLEWVSVIYSGGSTYYADSVKGRFTISRDNSKNTLYL QMNSLRAEDTAVYYCAR 441 IGHV3-64 EVQLVESGGGLVQPGGSLRLSCSASGFTFSSYAMHWVRQ APGKGLEYVSAISSNGGSTYYADSVKGRFTISRDNSKNTLY LQMSSLRAEDTAVYYCVK 442 IGHV3-66 EVQLVESGGGLVQPGGSLRLSCAASGFTVSSNYMSWVRQ APGKGLEWVSVIYSGGSTYYADSVKGRFTISRDNSKNTLYL QMNSLRAEDTAVYYCAR 443 IGHV3-72 EVQLVESGGGLVQPGGSLRLSCAASGFTFSDHYMDWVRQ APGKGLEWVGRTRNKANSYTTEYAASVKGRFTISRDDSKN SLYLQMNSLKTEDTAVYYCAR 444 IGHV3-73 EVQLVESGGGLVQPGGSLKLSCAASGFTFSGSAMHWVRQ ASGKGLEWVGRIRSKANSYATAYAASVKGRFTISRDDSKN TAYLQMNSLKTEDTAVYYCTR 445 IGHV3-74 EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYWMHWVR QAPGKGLVWVSRINSDGSSTSYADSVKGRFTISRDNAKNT LYLQMNSLRAEDTAVYYCAR 446 IGHV4-4v1 QVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR Contains CDRH1 with QPPGKGLEWIGEIYHSGSTNYNPSLKSRVTISVDKSKNQFS size 6 (Kabat defi- LKLSSVTAADTAVYYCAR nition); canonical structure H1-2. Sequence corresponds to allele *02 of IGHV4-4. 447 IGHV4-4v2 QVQLQESGPGLVKPSETLSLTCTVSGGSISSYYWSWIRQP Contains CDRH1 with AGKGLEWIGRIYTSGSTNYNPSLKSRVTMSVDTSKNQFSL size 5 (Kabat defi- KLSSVTAADTAVYYCAR nition); canonical structure H1-1. 
Sequence corresponds to allele *07 of IGHV4-4 448 IGHV4-28 QVQLQESGPGLVKPSDTLSLTCAVSGYSISSSNWWGWIR QPPGKGLEWIGYIYYSGSTYYNPSLKSRVTMSVDTSKNQF SLKLSSVTAVDTAVYYCAR 449 IGHV6-1 QVQLQQSGPGLVKPSQTLSLTCAISGDSVSSNSAAWNWIR QSPSRGLEWLGRTYYRSKWYNDYAVSVKSRITINPDTSKN QFSLQLNSVTPEDTAVYYCAR 450 IGHV7-4-1 QVQLVQSGSELKKPGASVKVSCKASGYTFTSYAMNWVRQ APGQGLEWMGWINTNTGNPTYAQGFTGRFVFSLDTSVST AYLQISSLKAEDTAVYYCAR 451 IGKV1-06 AIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKP GKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPE DFATYYCLQDYNYP 452 IGKV1- AIRMTQSPSSFSASTGDRVTITCRASQGISSYLAWYQQKP 08_v1 GKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTIS C LQSE DFATYYCQQYYSYP 453 IGKV1- AIRMTQSPSSFSASTGDRVTITCRASQGISSYLAWYQQKP C to S mutation avoids 08_v2 GKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTIS S LQSE unpaired Cys. in v1 DFATYYCQQYYSYP above. S was chosen by analogy to other germline sequences, but amino acid types, N, R, S, as non-limiting examples, are also possible 454 IGKV1-09 DIQLTQSPSFLSASVGDRVTITCRASQGISSYLAWYQQKPG KAPKLLIYAASTLQSGVPSRFSGSGSGTEFTLTISSLQPEDF ATYYCQQLNSYP 455 IGKV1-13 AIQLTQSPSSLSASVGDRVTITCRASQGISSALAWYQQKPG KAPKLLIYDASSLESGVPSRFSGSGSGTDFTLTISSLQPEDF ATYYCQQFNSYP 456 IGKV1-16 DIQMTQSPSSLSASVGDRVTITCRASQGISNYLAWFQQKP GKAPKSLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPE DFATYYCQQYNSYP 457 IGKV1-17 DIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKP GKAPKRLIYAASSLQSGVPSRFSGSGSGTEFTLTISSLQPE DFATYYCLQHNSYP 458 IGKV1- DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPG 37_v1 KVPKLLIYSASNLQSGVPSRFSGSGSGTDFTLTISSLQPED VATYY G QRTYNAP 459 IGKV1- DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPG Restores conserved Cys, 37_v2 KVPKLLIYSASNLQSGVPSRFSGSGSGTDFTLTISSLQPED missing in v1 above, VATYY C QRTYNAP just prior to CDRL3. 
460 IGKV1D-16 DIQMTQSPSSLSASVGDRVTITCRASQGISSWLAWYQQKP EKAPKSLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPE DFATYYCQQYNSYP 461 IGKV1D-17 NIQMTQSPSAMSASVGDRVTITCRARQGISNYLAWFQQKP GKVPKHLIYAASSLQSGVPSRFSGSGSGTEFTLTISSLQPE DFATYYCLQHNSYP 462 IGKV1D-43 AIRMTQSPFSLSASVGDRVTITCWASQGISSYLAWYQQKP AKAPKLFIYYASSLQSGVPSRFSGSGSGTDYTLTISSLQPE DFATYYCQQYYSTP 463 IGKV1D- VIWMTQSPSLLSASTGDRVTISCRMSQGISSYLAWYQQKP 8_v1 GKAPELLIYAASTLQSGVPSRFSGSGSGTDFTLTIS C LQSE DFATYYCQQYYSFP 464 IGKV1D- VIWMTQSPSLLSASTGDRVTISCRMSQGISSYLAWYQQKP C to S mutation avoids 8_v2 GKAPELLIYAASTLQSGVPSRFSGSGSGTDFTLTIS S LQSE unpaired Cys. in v1 DFATYYCQQYYSFP above. S was chosen by analogy to other germline sequences, but amino acid types, N, R, S, as non-limiting examples, are also possible 465 IGKV2-24 DIVMTQTPLSSPVTLGQPASISCRSSQSLVHSDGNTYLSWL QQRPGQPPRLLIYKISNRFSGVPDRFSGSGAGTDFTLKISR VEAEDVGVYYCMQATQFP 466 IGKV2-29 DIVMTQTFLSLSVTRQQPASISCKSSQSLLHSDGVTYLYWY LQRPQQSPQLLTYEVSSRFSGVPDRFSGSGSGTDFTLKIS RVEAEDVGVYYCMQGTHLP 467 IGKV2-30 DVVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNW FQQRPGQSPRRLIYKVSNRDSGVPDRFSGSGSGTDFTLKI SRVEAEDVGVYYCMQGTHWP 468 IGKV2-40 DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSDDGNTYLDW YLQKPGQSPQLLIYTLSYRASGVPDRFSGSGSGTDFTLKIS RVEAEDVGVYYCMQRIEFP 469 IGKV2D-26 EIVMTQTPLSLSITPGEQASMSCRSSQSLLHSDGYTYLYWF LQKARPVSTLLIYEVSNRFSGVPDRFSGSGSGTDFTLKISR VEAEDFGVYYCMQDAQD 470 IGKV2D-29 DIVMTQTPLSLSVTPGQPASISCKSSQSLLHSDGKTYLYWY LQKPGQPPQLLIYEVSNRFSGVPDRFSGSGSGTDFTLKISR VEAEDVGVYYCMQSIQLP 471 IGKV2D-30 DVVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNW FQQRPGQSPRRLIYKVSNWDSGVPDRFSGSGSGTDFTLKI SRVEAEDVGVYYCMQGTHWP 472 IGKV3D-07 EIVMTQSPATLSLSPGERATLSCRASQSVSSSYLSWYQQK PGQAPRLLIYGASTRATGIPARFSGSGSGTDFTLTISSLQPE DFAVYYCQQDYNLP 473 IGKV3D-11 EIVLTQSPATLSLSPGERATLSCRASQGVSSYLAWYQQKP GQAPRLLIYDASNRATGIPARFSGSGPGTDFTLTISSLEPED FAVYYCQQRSNWH 474 IGKV3D-20 EIVLTQSPATLSLSPGERATLSCGASQSVSSSYLAWYQQK PGLAPRLLIYDASSRATGIPDRFSGSGSGTDFTLTISRLEPE DFAVYYCQQYGSSP 475 IGKV5-2_v1 ETTLTQSPAFMSATPGDKV NIS CKASQDIDDDMNWYQQKP 
GEAAIFIIQEATTLVPGIPPRFSGSGYGTDFTLTINNIESEDA AYYFCLQHDNFP 476 IGKV5-2_v2 ETTLTQSPAFMSATPGDKV TIS CKASQDIDDDMNWYQQKP N to D mutation avoids GEAAIFIIQEATTLVPGIPPRFSGSGYGTDFTLTINNIESEDA NIS potential glyco- AYYFCLQHDNFP sylation site in v1 above. XIS, where X is not N, and NIZ, where Z is not S or T are also options. NPS is yet another option that is much less likely to be N-linked glycosylated. 477 IGKV6-21 EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPD QSPKLLIKYASQSFSGVPSRFSGSGSGTDFTLTINSLEAED AATYYCHQSSSLP 478 IGKV6D-21 EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPD QSPKLLIKYASQSFSGVPSRFSGSGSGTDFTLTINSLEAED AATYYCHQSSSLP 479 IGKV7-3 DIVLTQSPASLAVSPGQRATITCRASESVSFLGINLIHWYQQ KPGQPPKLLIYQASNKDTGVPARFSGSGSGTDFTLTINPVE ANDTANYYCLQSKNFP 480 IGλV1-36 QSVLTQPPSVSEAPRQRVTISCSGSSSNIGNNAVNWYQQL PGKAPKLLIYYDDLLPSGVSDRFSGSKSGTSASLAISGLQS EDEADYYCAAWDDSLNG 481 IGλV1-47 QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNYVYWYQQL PGTAPKLLIYRNNQRPSGVPDRFSGSKSGTSASLAISGLRS EDEADYYCAAWDDSLSG 482 IGλV10-54 QAGLTQPPSVSKGLRQTATLTCTGNSNNVGNQGAAWLQQ HQGHPPKLLSYRNNNRPSGISERLSASRSGNTASLTITGLQ PEDEADYYCSAWDSSLSA 483 IGλV2- QSALTQPRSVSGSPGQSVTISCTGTSSDVGGYNYVSWYQ 11_v1 QHPGKAPKLMIYDVSKRPSGVPDRFSGSKSGNTASLTISGL QAEDEADYYC C SYAGSYTF 484 IGλV2- QSALTQPRSVSGSPGQSVTISCTGTSSDVGGYNYVSWYQ C to S mutation avoids 11_v2 QHPGKAPKLMIYDVSKRPSGVPDRFSGSKSGNTASLTISGL unpaired Cys in v1 QAEDEADYYC S SYAGSYTF above. S was chosen by analogy to other germ- line sequences, but other amino acid types, such as Q, G, A, L, as non-limiting examples, are also possible 485 IGλV2-18 QSALTQPPSVSGSPGQSVTISCTGTSSDVGSYNRVSWYQ QPPGTAPKLMIYEVSNRPSGVPDRFSGSKSGNTASLTISGL QAEDEADYYCSLYTSSSTF 486 IGλV2- QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQ 23_v1 HPGKAPKLMIYEGSKRPSGVSNRFSGSKSGNTASLTISGL QAEDEADYYC C SYAGSSTL 487 IGλV2- QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQ C to S mutation avoids 23_v2 HPGKAPKLMIYEGSKRPSGVSNRFSGSKSGNTASLTISGL unpaired Cys in v1 QAEDEADYYC S SYAGSSTL above. 
S was chosen by analogy to other germ- line sequences, but other amino acid types, such as Q, G, A, L, as non-limiting examples, are also possible 488 IGλV2-8 QSALTQPPSASGSPGQSVTISCTGTSSDVGGYNYVSWYQ QHPGKAPKLMIYEVSKRPSGVPDRFSGSKSGNTASLTVSG LQAEDEADYYCSSYAGSNNF 489 IGλV3-10 SYELTQPPSVSVSPGQTARITCSGDALPKKYAYWYQQKSG QAPVLVIYEDSKRPSGIPERFSGSSSGTMATLTISGAQVED EADYYCYSTDSSGNH 490 IGλV3-12 SYELTQPHSVSVATAQMARITCGGNNIGSKAVHWYQQKP GQDPVLVIYSDSNRPSGIPERFSGSNPGNTTTLTISRIEAGD EADYYCQVWDSSSDH 491 IGλV3-16 SYELTQPPSVSVSLGQMARITCSGEALPKKYAYWYQQKPG QFPVLVIYKDSERPSGIPERFSGSSSGTIVTLTISGVQAEDE ADYYCLSADSSGTY 492 IGλV3-25 SYELMQPPSVSVSPGQTARITCSGDALPKQYAYWYQQKP GQAPVLVIYKDSERPSGIPERFSGSSSGTTVTLTISGVQAE DEADYYCQSADSSGTY 493 IGλV3-27 SYELTQPSSVSVSPGQTARITCSGDVLAKKYARWFQQKPG QAPVLVIYKDSERPSGIPERFSGSSSGTTVTLTISGAQVEDE ADYYCYSAADNN 494 IGλV3-9 SYELTQPLSVSVALGQTARITCGGNNIGSKNVHWYQQKPG QAPVLVIYRDSNRPSGIPERFSGSNSGNTATLTISRAQAGD EADYYCQVWDSSTA 495 IGλV4-3 LPVLTQPPSASALLGASIKLTCTLSSEHSTYTIEWYQQRPG RSPQYIMKVKSDGSHSKGDGIPDRFMGSSSGADRYLTFSN LQSDDEAEYHCGESHTIDGQVG 496 IGλV4-60 QPVLTQSSSASASLGSSVKLTCTLSSGHSSYIIAWHQQQP GKAPRYLMKLEGSGSYNKGSGVPDRFSGSSSGADRYLTIS NLQLEDEADYYCETWDSNT 497 IGλV5-39 QPVLTQPTSLSASPGASARFTCTLRSGINVGTYRIYWYQQK PGSLPRYLLRYKSDSDKQQGSGVPSRFSGSKDASTNAGLL LISGLQSEDEADYYCAIWYSSTS 498 IGλV7-46 QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTSGHYPYWFQ QKPGQAPRTLIYDTSNKHSWTPARFSGSLLGGKAALTLSG AQPEDEAEYYCLLSYSGAR 499 IGλV8-61 QTVVTQEPSFSVSPGGTVTLTCGLSSGSVSTSYYPSWYQ QTPGQAPRTLIYSTNTRSSGVPDRFSGSILGNKAALTITGA QADDESDYYCVLYMGSGI 500 IGλV9-49 QPVLTQPPSASASLGASVTLTCTLSSGYSNYKVDWYQQRP GKGPRFVMRVGTGGIVGSKGDGIPDRFSVLGSGLNRYLTI KNIQEEDESDYHCGADHGSGSNFV 501 IGHD1-1 GGTACAACTGGAACGAC See (1) below. 502 IGHD1-14 GGTATAACCGGAACCAC 503 IGHD1-20 GGTATAACTGGAACGAC 504 IGHD1-7 GGTATAACTGGAACTAC 505 IGHD2- AGCATATTGTGGTGGTGA T TGCTATTCC 21_v1 506 IGHD2- AGCATATTGTGGTGGTGA C TGCTATTCC Common allelic variant 21_v2 encoding a different amino acid sequence, compared to v1, in 2 of 3 forward reading frames. 
507 IGHD2-8 AGGATATTGTACTAATGGTGTATGCTATACC 508 IGHD3-16 GTATTATGATTACGTTTGGGGGAGTTATGCTTATACC 509 IGHD3-9 GTATTACGATATTTTGACTGGTTATTATAAC 510 IGHD4-23 TGACTACGGTGGTAACTCC 511 IGHD4- TGACTACAGTAACTAC 4/4-11 512 IGHD5-12 GTGGATATAGTGGCTACGATTAC 513 IGHD5-24 GTAGAGATGGCTACAATTAC 514 IGHD6-25 GGGTATAGCAGCGGCTAC 515 IGHD6-6 GAGTATAGCAGCTCGTCC 516 IGHD7-27 CTAACTGGGGA (1) Each of the IGHD nucleotide sequences can be read in three (3) forward reading frames, and, possibly, in 3 reverse reading frames. For example, the nucleotide sequence given for IGHD1-1, depending on how it inserts in full V-DJ rearrangement, may encode the full peptide sequences: GTTGT (SEQ ID NO: 517), VQLER (SEQ ID NO: 518) and YNWND (SEQ ID NO: 519) in the forward direction, and VVPVV (SEQ ID NO: 520), SFQLY (SEQ ID NO: 521) and RSSCT (SEQ ID NO: 522) in the reverse direction. Each of these sequences, in turn, could generate progressively deleted segments as explained in the Examples to produce suitable components for libraries of the invention. - In this example, the selection of antibodies from a library of the invention (described in Examples 9-11 and other Examples) is demonstrated. These selections demonstrate that the libraries of the invention encode antibody proteins capable of binding to antigens. In one selection, antibodies specific for “Antigen X”, a protein antigen, were isolated from the library using the methods described herein.
FIG. 24 shows binding curves for six clones specifically binding Antigen X, together with their Kd values. This selection was performed using yeast carrying the heavy chain on a plasmid vector and the kappa light chain library integrated into the yeast genome.

In a separate selection, antibodies specific for a model antigen, hen egg white lysozyme (HEL), were isolated.
FIG. 25 shows the binding curves for 10 clones specifically binding HEL; each gave a Kd>500 nM. This selection was performed using yeast carrying both the heavy chain and the kappa light chain library on plasmid vectors. The heavy and light chain sequences of clones isolated from the library were determined, demonstrating that multiple distinct clones were present. The last two FRM3 residues (Kabat positions 93 and 94) and the entire CDRH3 of four clones are shown below (Table 53, and Table 54 using the numbering system of the invention).
TABLE 53
Sequences of CDRH3, and a Portion of FRM3, from Four HEL Binders

| Name | SEQ ID NO | FRM3 and CDRH3 | Tail | N1 | DH | N2 | H3-JH |
|---|---|---|---|---|---|---|---|
| CR080362 | 523 | AKGPSVPAARAEYFQH | G | PS | VPA | AR | AEYFQH |
| CR080363 | 524 | AREGGLGYYYREWYFDL | E | GGL | GYYY | RE | WYFDL |
| CR080372 | 525 | AKPDYGAEYFQH | — | P | DYG | — | AEYFQH |
| EK080902 | 526 | AKEIVVPSAEYFQH | E | — | IVV | PS | AEYFQH |
TABLE 54
Sequences of CDRH3 from Four HEL Binders, According to the Numbering System of the Invention

Segment positions: Tail = 95; N1 = 96 to 96B; DH = 97 to 97D; N2 = 98 to 98B; H3-JH = 99E to 102.

| Clone | 95 | 96 | 96A | 96B | 97 | 97A | 97B | 97C | 97D | 98 | 98A | 98B | 99E | 99D | 99C | 99B | 99A | 99 | 100 | 101 | 102 | CDRH3 length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CR080362 | G | P | S | — | V | P | A | — | — | A | R | — | — | — | — | A | E | Y | F | Q | H | 14 |
| CR080363 | E | G | G | L | G | Y | Y | Y | — | R | E | — | — | — | — | — | W | Y | F | D | L | 15 |
| CR080372 | — | P | — | — | D | Y | G | — | — | — | — | — | — | — | — | A | E | Y | F | Q | H | 10 |
| EK080902 | E | — | — | — | I | V | V | — | — | P | S | — | — | — | — | A | E | Y | F | Q | H | 12 |

Sequence Identifiers: CR080362 (SEQ ID NO: 523); CR080363 (SEQ ID NO: 524); CR080372 (SEQ ID NO: 525); EK080902 (SEQ ID NO: 526)
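The segment decomposition in Table 53 can be checked by concatenating the five segments of each clone in design order and comparing the result with the listed sequence and with the CDRH3 lengths in Table 54. This is an illustrative sketch: the segment strings are transcribed from Table 53, an empty string stands for an absent segment, and the first two residues of each “FRM3 and CDRH3” entry are taken to be the FRM3 portion.

```python
# Segments of the four HEL binders, transcribed from Table 53 ("" marks an absent segment).
CLONES = {
    # name: (tail, N1, DH, N2, H3-JH)
    "CR080362": ("G", "PS", "VPA", "AR", "AEYFQH"),
    "CR080363": ("E", "GGL", "GYYY", "RE", "WYFDL"),
    "CR080372": ("", "P", "DYG", "", "AEYFQH"),
    "EK080902": ("E", "", "IVV", "PS", "AEYFQH"),
}

# "FRM3 and CDRH3" column of Table 53 (two FRM3 residues, then the CDRH3).
FRM3_AND_CDRH3 = {
    "CR080362": "AKGPSVPAARAEYFQH",
    "CR080363": "AREGGLGYYYREWYFDL",
    "CR080372": "AKPDYGAEYFQH",
    "EK080902": "AKEIVVPSAEYFQH",
}


def assemble_cdrh3(tail, n1, dh, n2, h3_jh):
    """Concatenate the five CDRH3 segments in the library's design order."""
    return tail + n1 + dh + n2 + h3_jh


for name, segments in CLONES.items():
    cdrh3 = assemble_cdrh3(*segments)
    assert FRM3_AND_CDRH3[name][2:] == cdrh3  # segments reproduce the listed CDRH3
    print(name, cdrh3, len(cdrh3))           # e.g. CR080362 GPSVPAARAEYFQH 14
```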
The heavy chain chassis isolated were VH3-23.0 (for EK080902 and CR080363), VH3-23.6 (for CR080362), and VH3-23.4 (for CR080372). These variants are defined in Table 8 of Example 2. Each of the four heavy chain CDRH3 sequences matched a designed sequence from the exemplified library. The CDRL3 sequence of one of the clones (EK080902) was also determined and is shown below, together with the flanking FRM residues (YYC and FGGG):
CDRL3: YYCQESFHIPYTFGGG. (SEQ ID NO: 527)
In this case, the CDRL3 matched the design of a degenerate VK1-39 oligonucleotide sequence in row 49 of Table 33. The relevant portion of that table is reproduced below; at each of positions 89 to 97, the isolated CDRL3 (QESFHIPYT) carries one of the amino acids listed for that position:
| Chassis | CDR length | Junction type | Degenerate oligonucleotide | SEQ ID NO | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VK1-39 | 9 | 1 | CWGSAAWCATHCMVTABTCCTTWCACT | 307 | LQ | EQ | ST | FSY | HNPRST | IST | P | FY | T |

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments and methods described herein. Such equivalents are intended to be encompassed by the scope of the following claims.
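The match between the isolated CDRL3 and the VK1-39 degenerate oligonucleotide can be checked by expanding each IUPAC-coded codon into the amino acids it can encode. This sketch assumes the standard genetic code and standard IUPAC ambiguity codes, reads the oligonucleotide as one continuous 27-nucleotide strand, and takes the CDRL3 to be the nine residues between the flanking YYC and FGGG of SEQ ID NO: 527.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_residues(codon):
    """Set of amino acids encoded by one degenerate codon."""
    return {CODON_TABLE["".join(b)] for b in product(*(IUPAC[c] for c in codon))}


def allowed_per_position(oligo):
    """Allowed amino acids at each codon position of a degenerate oligonucleotide."""
    return [encoded_residues(oligo[i:i + 3]) for i in range(0, len(oligo), 3)]


OLIGO = "CWGSAAWCATHCMVTABTCCTTWCACT"  # VK1-39 design, SEQ ID NO: 307
ISOLATED_CDRL3 = "QESFHIPYT"           # Kabat positions 89-97 of the isolated clone

allowed = allowed_per_position(OLIGO)
for position, residues in zip(range(89, 98), allowed):
    print(position, "".join(sorted(residues)))           # 89 LQ, 90 EQ, ..., 97 T
print(all(aa in r for aa, r in zip(ISOLATED_CDRL3, allowed)))  # True
```

The expanded sets reproduce the per-position amino acids listed in the table above, and no stop codons arise from this design.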
- The present invention overcomes the inadequacies inherent in the known methods for generating libraries of antibody-encoding polynucleotides by specifically designing the libraries with directed sequence and length diversity. The libraries are designed to reflect the preimmune repertoire naturally created by the human immune system and are based on rational design informed by examination of publicly available databases of human antibody sequences.
Claims (7)
1-72. (canceled)
73. A method of isolating one or more host cells expressing one or more antibodies, the method comprising:
(i) expressing the one or more antibodies in one or more host cells, wherein the one or more antibodies comprise:
(a) a CDRH3 amino acid sequence comprising:
(A) an N1 amino acid sequence of 0 to about 3 amino acids,
wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by B cells,
(B) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them,
(C) an N2 amino acid sequence of 0 to about 3 amino acids,
wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by B cells; and
(D) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them; and
(b) a VKCDR3 amino acid sequence comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 amino acid sequences derived from a particular IGKV or IGKJ germline sequence; or
(c) a VKCDR3 amino acid sequence of at least about 80% identity to an amino acid sequence represented by the following formula:
[VK_Chassis]-[L3-VK]-[X]-[JK*], wherein:
(1) VK Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05, IGKV1-06, IGKV1-08, IGKV1-09, IGKV1-12, IGKV1-13, IGKV1-16, IGKV1-17, IGKV1-27, IGKV1-33, IGKV1-37, IGKV1-39, IGKV1D-16, IGKV1D-17, IGKV1D-43, IGKV1D-8, IGKV2-24, IGKV2-28, IGKV2-29, IGKV2-30, IGKV2-40, IGKV2D-26, IGKV2D-29, IGKV2D-30, IGKV3-11, IGKV3-15, IGKV3-20, IGKV3D-07, IGKV3D-11, IGKV3D-20, IGKV4-1, IGKV5-2, IGKV6-21, and IGKV6D-41, or a sequence of at least about 80% identity to any of them;
(2) L3-VK is the portion of the VKCDR3 encoded by the IGKV gene segment;
(3) X is any amino acid residue; and
(4) JK* is an amino acid sequence selected from the group consisting of amino acid sequences encoded by IGJK1, IGJK2, IGJK3, IGJK4, and IGJK5, wherein the first residue of each IGJK amino acid sequence is not present; and
(ii) contacting the host cells with one or more antigens; and
(iii) isolating one or more host cells having antibodies that bind to the one or more antigens.
74. The method of claim 73, further comprising isolating the one or more antibodies from the one or more host cells.
75. The method of claim 73, further comprising the step of isolating one or more polynucleotide sequences encoding the one or more antibodies from the one or more host cells.
76. The method of claim 73, wherein the one or more host cells are yeast cells.
77. The method of claim 76, wherein the yeast cells are S. cerevisiae cells.
78. An antibody isolated from the one or more host cells isolated according to the method of claim 73.
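The light chain formula in claim 73(c), [VK_Chassis]-[L3-VK]-[X]-[JK*], is a plain concatenation in which JK* is a J-segment sequence with its first residue removed.

The Python sketch below composes it with illustrative inputs chosen to reproduce the EK080902 fragment YYCQESFHIPYTFGGG (SEQ ID NO: 527). It rests on several assumptions:
- The chassis is a placeholder consisting only of its last three residues (YYC).
- L3-VK is QESFHIP and X is Y.
- The IGKJ4 germline sequence is taken to be LTFGGGTKVEIK. That sequence is not given in this section and should be checked against a germline database.
- X is taken to sit at Kabat 96, and the first JK* residue at Kabat 97.

```python
# Sketch of the claim 73(c) formula [VK_Chassis]-[L3-VK]-[X]-[JK*].
# Assumption: IGKJ4 reads LTFGGGTKVEIK; verify before relying on it.
IGKJ4 = "LTFGGGTKVEIK"


def jk_star(jk):
    """JK*: the IGKJ-encoded sequence with its first residue removed."""
    return jk[1:]


def compose_vk(chassis, l3_vk, x, jk):
    """Concatenate chassis (about Kabat 1-88), L3-VK, one residue X, and JK*."""
    if len(x) != 1:
        raise ValueError("X must be exactly one amino acid residue")
    return chassis + l3_vk + x + jk_star(jk)


def vkcdr3(l3_vk, x, jk):
    """Kabat CDR-L3 (89-97), assuming X is 96 and the first JK* residue is 97."""
    return l3_vk + x + jk_star(jk)[0]


if __name__ == "__main__":
    # Placeholder chassis: only its last three residues (YYC, Kabat 86-88)
    # stand in for the full Kabat 1-88 sequence.
    full = compose_vk("YYC", "QESFHIP", "Y", IGKJ4)
    print(full)
    print(vkcdr3("QESFHIP", "Y", IGKJ4))
    print(full.startswith("YYCQESFHIPYTFGGG"))
```

Under the assumed IGKJ4 sequence, the composed chain begins with the same 16 residues as SEQ ID NO: 527, and its CDR-L3 slice is QESFHIPYT.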
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/856,673 US20230399386A1 | 2007-09-14 | 2023-08-30 | Rationally designed, synthetic antibody libraries and uses therefor |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US99378507P | 2007-09-14 | 2007-09-14 | |
US12/210,072 US8691730B2 | 2007-09-14 | 2008-09-12 | Rationally designed, synthetic antibody libraries and uses therefor |
US14/150,129 US10189894B2 | 2007-09-14 | 2014-01-08 | Rationally designed, synthetic antibody libraries and uses therefor |
US16/215,523 US11008383B2 | 2007-09-14 | 2018-12-10 | Rationally designed, synthetic antibody libraries and uses therefor |
US202117229484A | 2021-04-13 | 2021-04-13 | |
US202117532287A | 2021-11-22 | 2021-11-22 | |
US17/856,673 US20230399386A1 | 2007-09-14 | 2023-08-30 | Rationally designed, synthetic antibody libraries and uses therefor |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US202117532287A (Continuation) | | 2007-09-14 | 2021-11-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230399386A1 | 2023-12-14 |
Family ID: 40350153
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/210,072 Active 2029-12-20 US8691730B2 | 2007-09-14 | 2008-09-12 | Rationally designed, synthetic antibody libraries and uses therefor |
US14/150,129 Active 2030-08-11 US10189894B2 | 2007-09-14 | 2014-01-08 | Rationally designed, synthetic antibody libraries and uses therefor |
US16/215,523 Active 2028-11-11 US11008383B2 | 2007-09-14 | 2018-12-10 | Rationally designed, synthetic antibody libraries and uses therefor |
US17/856,673 Pending US20230399386A1 | 2007-09-14 | 2023-08-30 | Rationally designed, synthetic antibody libraries and uses therefor |
Country Status (11)
Country | Link |
---|---|
US (4) | US8691730B2 |
EP (3) | EP3753947A1 |
JP (6) | JP5933894B2 |
CN (1) | CN101855242B |
AU (1) | AU2008298603B2 |
BR (1) | BRPI0816785A2 |
CA (3) | CA2964398C |
DK (1) | DK3124497T3 |
HK (1) | HK1147271A1 |
MX (3) | MX2010002661A |
WO (1) | WO2009036379A2 |
Families Citing this family (200)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8288322B2 (en) | 2000-04-17 | 2012-10-16 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
ES2430857T3 (en) | 2000-12-18 | 2013-11-22 | Dyax Corp. | Targeted libraries of genetic packages |
USRE47770E1 (en) | 2002-07-18 | 2019-12-17 | Merus N.V. | Recombinant production of mixtures of antibodies |
ES2442615T5 (en) | 2002-07-18 | 2023-03-16 | Merus Nv | Recombinant production of antibody mixtures |
US20040067532A1 (en) | 2002-08-12 | 2004-04-08 | Genetastix Corporation | High throughput generation and affinity maturation of humanized antibody |
WO2004106375A1 (en) | 2003-05-30 | 2004-12-09 | Merus Biopharmaceuticals B.V. I.O. | Fab library for the preparation of anti vegf and anti rabies virus fabs |
US20100069614A1 (en) | 2008-06-27 | 2010-03-18 | Merus B.V. | Antibody producing non-human mammals |
US8877688B2 (en) | 2007-09-14 | 2014-11-04 | Adimab, Llc | Rationally designed, synthetic antibody libraries and uses therefor |
BRPI0816785A2 (en) | 2007-09-14 | 2017-05-02 | Adimab Inc | rationally designed synthetic antibody libraries, and uses thereof |
EP2098536A1 (en) | 2008-03-05 | 2009-09-09 | 4-Antibody AG | Isolation and identification of antigen- or ligand-specific binding proteins |
US9873957B2 (en) | 2008-03-13 | 2018-01-23 | Dyax Corp. | Libraries of genetic packages comprising novel HC CDR3 designs |
JP5780951B2 (en) | 2008-04-24 | 2015-09-16 | ダイアックス コーポレーション | A library of genetic packages containing new HCCR1, CDR2 and CDR3 designs, and new LCCR1, CDR2 and CDR3 designs |
US8067339B2 (en) | 2008-07-09 | 2011-11-29 | Merck Sharp & Dohme Corp. | Surface display of whole antibodies in eukaryotes |
RU2569187C2 (en) * | 2009-05-29 | 2015-11-20 | МорфоСис АГ | Collection and methods of its application |
CA2773564A1 (en) * | 2009-09-14 | 2011-03-17 | Dyax Corp. | Libraries of genetic packages comprising novel hc cdr3 designs |
US20130045492A1 (en) | 2010-02-08 | 2013-02-21 | Regeneron Pharmaceuticals, Inc. | Methods For Making Fully Human Bispecific Antibodies Using A Common Light Chain |
US9796788B2 (en) | 2010-02-08 | 2017-10-24 | Regeneron Pharmaceuticals, Inc. | Mice expressing a limited immunoglobulin light chain repertoire |
ME02646B (en) | 2010-02-08 | 2017-06-20 | Regeneron Pharma | Common light chain mouse |
US20160186188A1 (en) * | 2010-02-09 | 2016-06-30 | Rutgers, The State University Of New Jersey | Methods for altering polypeptide expression and solubility |
JP2013533211A (en) | 2010-05-04 | 2013-08-22 | メリマック ファーマシューティカルズ インコーポレーティッド | Antibodies against epidermal growth factor receptor (EGFR) and uses thereof |
KR102159272B1 (en) | 2010-07-16 | 2020-09-24 | 아디맵 엘엘씨 | Abtibody libraries |
MX353795B (en) * | 2010-11-19 | 2018-01-30 | Morphosys Ag | A collection and methods for its use. |
DE102010056289A1 (en) * | 2010-12-24 | 2012-06-28 | Geneart Ag | Process for the preparation of reading frame correct fragment libraries |
US9260753B2 (en) | 2011-03-24 | 2016-02-16 | President And Fellows Of Harvard College | Single cell nucleic acid detection and analysis |
US8691231B2 (en) | 2011-06-03 | 2014-04-08 | Merrimack Pharmaceuticals, Inc. | Methods of treatment of tumors expressing predominantly high affinity EGFR ligands or tumors expressing predominantly low affinity EGFR ligands with monoclonal and oligoclonal anti-EGFR antibodies |
EP3865581A1 (en) | 2011-08-05 | 2021-08-18 | Regeneron Pharmaceuticals, Inc. | Humanized universal light chain mice |
CA2791109C (en) | 2011-09-26 | 2021-02-16 | Merus B.V. | Generation of binding molecules |
US9801362B2 (en) | 2012-03-16 | 2017-10-31 | Regeneron Pharmaceuticals, Inc. | Non-human animals expressing pH-sensitive immunoglobulin sequences |
MY172730A (en) | 2012-03-16 | 2019-12-11 | Regeneron Pharma | Histidine engineered light chain antibodies and genetically modified non-human animals for generating the same |
US20140013456A1 (en) | 2012-03-16 | 2014-01-09 | Regeneron Pharmaceuticals, Inc. | Histidine Engineered Light Chain Antibodies and Genetically Modified Non-Human Animals for Generating the Same |
CN104302170B (en) | 2012-03-16 | 2016-09-28 | 瑞泽恩制药公司 | Produce the mice of the antigen-binding proteins with PH dependency binding characteristic |
ES2546105T3 (en) | 2012-04-17 | 2015-09-18 | Arsanis Biosciences Gmbh | Cross-reactive antibody against Staphylococcus aureus |
SG10201913376XA (en) | 2012-04-20 | 2020-02-27 | Merus Nv | Methods and means for the production of ig-like molecules |
CA2895327C (en) | 2013-01-17 | 2021-06-01 | Arsanis Biosciences Gmbh | Mdr e. coli specific antibody |
NZ710299A (en) * | 2013-01-31 | 2020-01-31 | Codexis Inc | Methods, systems, and software for identifying bio-molecules with interacting components |
WO2014169076A1 (en) | 2013-04-09 | 2014-10-16 | Annexon,,Inc. | Methods of treatment for neuromyelitis optica |
AU2014270598B2 (en) | 2013-05-21 | 2018-09-20 | Arsanis Biosciences Gmbh | Generation of highly potent antibodies neutralizing the LukGH (LukAB) toxin of Staphylococcus aureus |
EP3019523A4 (en) | 2013-07-09 | 2016-12-28 | Annexon Inc | Methods of treatment for alzheimer's disease and huntington's disease |
US9764039B2 (en) | 2013-07-10 | 2017-09-19 | Sutro Biopharma, Inc. | Antibodies comprising multiple site-specific non-natural amino acid residues, methods of their preparation and methods of their use |
US20160244511A1 (en) | 2013-10-17 | 2016-08-25 | Arsanis Biosciences Gmbh | Cross-reactive staphylococcus aureus antibody sequences |
MX2016012274A (en) | 2014-03-21 | 2017-05-23 | Regeneron Pharma | Non-human animals that make single domain binding proteins. |
EP3825326A1 (en) | 2014-04-01 | 2021-05-26 | Adimab, LLC | Multispecific antibody analogs comprising a common light chain, and methods of their preparation and use |
MA39599A (en) | 2014-05-14 | 2016-10-05 | Merrimack Pharmaceuticals Inc | Dosage and administration anti-egfr therapeutics |
TWI695011B (en) | 2014-06-18 | 2020-06-01 | 美商梅爾莎納醫療公司 | Monoclonal antibodies against her2 epitope and methods of use thereof |
CA3225495A1 (en) | 2014-07-14 | 2016-01-21 | The Penn State Research Foundation | Compositions and methods for targeting of the surfactant protein a receptor |
CA2955984A1 (en) | 2014-07-22 | 2016-01-28 | The University Of Notre Dame Du Lac | Molecular constructs and uses thereof |
DK3172233T3 (en) | 2014-07-22 | 2019-11-11 | Sutro Biopharma Inc | ANTI-CD74 ANTIBODIES, COMPOSITIONS CONTAINING ANTI-CD74 ANTIBODIES AND PROCEDURES FOR USING ANTI-CD74 ANTIBODIES |
CA2955086A1 (en) | 2014-08-08 | 2016-02-11 | Alector Llc | Anti-trem2 antibodies and methods of use thereof |
CN107873054B (en) * | 2014-09-09 | 2022-07-12 | 博德研究所 | Droplet-based methods and apparatus for multiplexed single-cell nucleic acid analysis |
MA40835A (en) | 2014-10-23 | 2017-08-29 | Biogen Ma Inc | ANTI-GPIIB / IIIA ANTIBODIES AND THEIR USES |
MA40861A (en) | 2014-10-31 | 2017-09-05 | Biogen Ma Inc | ANTI-GLYCOPROTEIN IIB / IIIA ANTIBODIES |
JP6859259B2 (en) | 2014-11-19 | 2021-04-14 | ジェネンテック, インコーポレイテッド | Antibodies to BACEl and its use for neurological disease immunotherapy |
US11008403B2 (en) | 2014-11-19 | 2021-05-18 | Genentech, Inc. | Anti-transferrin receptor / anti-BACE1 multispecific antibodies and methods of use |
KR101694832B1 (en) * | 2014-12-31 | 2017-01-12 | 앱클론(주) | Antibody Libraries and Methods for Preparation of Them |
MX2017009955A (en) | 2015-02-17 | 2017-10-19 | Arsanis Biosciences Gmbh | Antibodies targeting a galactan-based o-antigen of k. pneumoniae. |
WO2016145409A1 (en) | 2015-03-11 | 2016-09-15 | The Broad Institute, Inc. | Genotype and phenotype coupling |
CA2979702A1 (en) | 2015-03-19 | 2016-09-22 | Regeneron Pharmaceuticals, Inc. | Non-human animals that select for light chain variable regions that bind antigen |
EP3991748A3 (en) | 2015-04-07 | 2022-08-24 | Alector LLC | Anti-sortilin antibodies and methods of use thereof |
JP6962819B2 (en) | 2015-04-10 | 2021-11-05 | アディマブ, エルエルシー | Method for Purifying Heterodimer Multispecific Antibody from Parent Homodimer Antibody Species |
EP3283514A1 (en) | 2015-04-17 | 2018-02-21 | ARSANIS Biosciences GmbH | Anti-staphylococcus aureus antibody combination preparation |
US10829540B2 (en) * | 2015-05-01 | 2020-11-10 | Medimmune Limited | Phage display library, members thereof and uses of the same |
EP3307771A2 (en) | 2015-06-12 | 2018-04-18 | Alector LLC | Anti-cd33 antibodies and methods of use thereof |
WO2016201389A2 (en) | 2015-06-12 | 2016-12-15 | Alector Llc | Anti-cd33 antibodies and methods of use thereof |
EP3341411A1 (en) | 2015-08-28 | 2018-07-04 | Alector LLC | Anti-siglec-7 antibodies and methods of use thereof |
UA125062C2 (en) | 2015-10-01 | 2022-01-05 | Потенза Терапеутікс, Інк. | ANTI-TIGIT ANTIGEN-BINDING PROTEINS AND METHODS OF THEIR APPLICATION |
EP3359569A2 (en) | 2015-10-06 | 2018-08-15 | Alector LLC | Anti-trem2 antibodies and methods of use thereof |
CA3000901A1 (en) | 2015-10-16 | 2017-04-20 | Arsanis Biosciences Gmbh | Bactericidal monoclonal antibody targeting klebsiella pneumoniae |
WO2017075297A1 (en) | 2015-10-28 | 2017-05-04 | The Broad Institute Inc. | High-throughput dynamic reagent delivery system |
WO2017075265A1 (en) | 2015-10-28 | 2017-05-04 | The Broad Institute, Inc. | Multiplex analysis of single cell constituents |
AU2016343987B2 (en) | 2015-10-29 | 2023-11-23 | Alector Llc | Anti-Siglec-9 antibodies and methods of use thereof |
EP3377533A2 (en) | 2015-11-19 | 2018-09-26 | Sutro Biopharma, Inc. | Anti-lag3 antibodies, compositions comprising anti-lag3 antibodies and methods of making and using anti-lag3 antibodies |
KR102216032B1 (en) * | 2016-01-19 | 2021-02-16 | 주무토르 바이오로직스 인코포레이티드 | Synthetic antibody library generation method, the library and its application(s) |
US10053515B2 (en) | 2016-01-22 | 2018-08-21 | Merck Sharp & Dohme Corp. | Anti-coagulation factor XI antibodies |
CA3011455A1 (en) | 2016-01-27 | 2017-08-03 | Sutro Biopharma, Inc. | Anti-cd74 antibody conjugates, compositions comprising anti-cd74 antibody conjugates and methods of using anti-cd74 antibody conjugates |
WO2017152102A2 (en) | 2016-03-04 | 2017-09-08 | Alector Llc | Anti-trem1 antibodies and methods of use thereof |
CA3015277A1 (en) | 2016-03-10 | 2017-09-14 | Acceleron Pharma Inc. | Activin type 2 receptor binding proteins and uses thereof |
CA3016474A1 (en) | 2016-03-15 | 2017-09-21 | Mersana Therapeutics, Inc. | Napi2b-targeted antibody-drug conjugates and methods of use thereof |
SG11201810763TA (en) | 2016-06-14 | 2018-12-28 | Merck Sharp & Dohme | Anti-coagulation factor xi antibodies |
WO2017218698A1 (en) | 2016-06-15 | 2017-12-21 | Sutro Biopharma, Inc. | Antibodies with engineered ch2 domains, compositions thereof and methods of using the same |
WO2018029346A1 (en) | 2016-08-12 | 2018-02-15 | Arsanis Biosciences Gmbh | Klebsiella pneumoniae o3 specific antibodies |
JP2019531060A (en) | 2016-08-12 | 2019-10-31 | エックスフォー・ファーマシューティカルズ(オーストリア)ゲーエムべーハー | Anti-galactan-II monoclonal antibody targeting Klebsiella pneumoniae |
EP3293293A1 (en) * | 2016-09-08 | 2018-03-14 | Italfarmaco SpA | Hc-cdr3-only libraries with reduced combinatorial redundancy and optimized loop length distribution |
US20190233512A1 (en) | 2016-10-12 | 2019-08-01 | Sutro Biopharma, Inc. | Anti-folate receptor antibodies, compositions comprising anti-folate receptor antibodies and methods of making and using anti-folate receptor antibodies |
AR109621A1 (en) | 2016-10-24 | 2018-12-26 | Janssen Pharmaceuticals Inc | FORMULATIONS OF VACCINES AGAINST GLUCOCONJUGADOS OF EXPEC |
MA46708B1 (en) | 2016-11-02 | 2021-10-29 | Jounce Therapeutics Inc | Anti-pd1 antibodies and their uses |
JOP20190100A1 (en) | 2016-11-19 | 2019-05-01 | Potenza Therapeutics Inc | Anti-gitr antigen-binding proteins and methods of use thereof |
MA46893A (en) | 2016-11-23 | 2019-10-02 | Bioverativ Therapeutics Inc | BISPECIFIC ANTIBODIES BINDING TO COAGULATION FACTOR IX AND COAGULATION FACTOR X |
MA47111A (en) | 2016-12-22 | 2019-10-30 | Univ Wake Forest Health Sciences | SIRPGAMMA TARGETING AGENTS FOR USE IN THE TREATMENT OF CANCER |
JOP20190134A1 (en) | 2016-12-23 | 2019-06-02 | Potenza Therapeutics Inc | Anti-neuropilin antigen-binding proteins and methods of use thereof |
RU2019123063A (en) | 2016-12-23 | 2021-01-26 | Вистерра, Инк. | BINDING POLYPEPTIDES AND METHODS FOR THEIR PREPARATION |
CN108239150A (en) | 2016-12-24 | 2018-07-03 | 信达生物制药(苏州)有限公司 | Anti- PCSK9 antibody and application thereof |
CN110944651A (en) | 2017-02-08 | 2020-03-31 | 蜻蜓疗法股份有限公司 | Multispecific binding proteins for natural killer cell activation and therapeutic uses thereof for treating cancer |
EP3579866A4 (en) * | 2017-02-08 | 2020-12-09 | Dragonfly Therapeutics, Inc. | Antibody heavy chain variable domains targeting the nkg2d receptor |
FI3582806T3 (en) | 2017-02-20 | 2023-09-07 | Dragonfly Therapeutics Inc | Proteins binding her2, nkg2d and cd16 |
WO2018156777A1 (en) | 2017-02-22 | 2018-08-30 | Sutro Biopharma, Inc. | Pd-1/tim-3 bi-specific antibodies, compositions thereof, and methods of making and using the same |
CN108623686A (en) | 2017-03-25 | 2018-10-09 | 信达生物制药(苏州)有限公司 | Anti- OX40 antibody and application thereof |
JOP20190203A1 (en) | 2017-03-30 | 2019-09-03 | Potenza Therapeutics Inc | Anti-tigit antigen-binding proteins and methods of use thereof |
US11072816B2 (en) | 2017-05-03 | 2021-07-27 | The Broad Institute, Inc. | Single-cell proteomic assay using aptamers |
EP3625258A1 (en) | 2017-05-16 | 2020-03-25 | Alector LLC | Anti-siglec-5 antibodies and methods of use thereof |
US20200079850A1 (en) | 2017-05-24 | 2020-03-12 | Sutro Biopharma, Inc. | Pd-1/lag3 bi-specific antibodies, compositions thereof, and methods of making and using the same |
US11780930B2 (en) | 2017-06-08 | 2023-10-10 | Black Belt Therapeutics Limited | CD38 modulating antibody |
WO2018224683A1 (en) | 2017-06-08 | 2018-12-13 | Tusk Therapeutics Ltd | Cd38 modulating antibody |
PE20200717A1 (en) | 2017-06-22 | 2020-07-21 | Novartis Ag | ANTIBODY MOLECULES THAT BIND AND USES CD73 |
TWI820031B (en) | 2017-07-11 | 2023-11-01 | 美商坎伯斯治療有限責任公司 | Agonist antibodies that bind human cd137 and uses thereof |
US10961318B2 (en) | 2017-07-26 | 2021-03-30 | Forty Seven, Inc. | Anti-SIRP-α antibodies and related methods |
US20200207859A1 (en) | 2017-07-26 | 2020-07-02 | Sutro Biopharma, Inc. | Methods of using anti-cd74 antibodies and antibody conjugates in treatment of t-cell lymphoma |
WO2019023504A1 (en) | 2017-07-27 | 2019-01-31 | Iteos Therapeutics Sa | Anti-tigit antibodies |
MX2020000960A (en) | 2017-07-27 | 2020-07-22 | iTeos Belgium SA | Anti-tigit antibodies. |
HUE062436T2 (en) | 2017-08-03 | 2023-11-28 | Alector Llc | Anti-trem2 antibodies and methods of use thereof |
AU2018310985A1 (en) | 2017-08-03 | 2019-11-07 | Alector Llc | Anti-CD33 antibodies and methods of use thereof |
EP3668896A1 (en) | 2017-08-16 | 2020-06-24 | Black Belt Therapeutics Limited | Cd38 modulating antibody |
US11236173B2 (en) | 2017-08-16 | 2022-02-01 | Black Belt Therapeutics Limited | CD38 antibody |
MX2020002076A (en) | 2017-08-25 | 2020-03-24 | Five Prime Therapeutics Inc | B7-h4 antibodies and methods of use thereof. |
CN109422811A (en) | 2017-08-29 | 2019-03-05 | 信达生物制药(苏州)有限公司 | Anti-cd 47 antibody and application thereof |
CN111566125A (en) * | 2017-09-11 | 2020-08-21 | 特韦斯特生物科学公司 | GPCR binding proteins and synthesis thereof |
EP3684814A1 (en) | 2017-09-18 | 2020-07-29 | Sutro Biopharma, Inc. | Anti-folate receptor alpha antibody conjugates and their uses |
EP3694545A4 (en) | 2017-10-11 | 2021-12-01 | Board Of Regents, The University Of Texas System | Human pd-l1 antibodies and methods of use therefor |
US11718679B2 (en) | 2017-10-31 | 2023-08-08 | Compass Therapeutics Llc | CD137 antibodies and PD-1 antagonists and uses thereof |
WO2019099454A2 (en) * | 2017-11-15 | 2019-05-23 | Philippe Valadon | Highly functional antibody libraries |
AU2018367524B2 (en) | 2017-11-17 | 2022-09-15 | Merck Sharp & Dohme Llc | Antibodies specific for immunoglobulin-like transcript 3 (ILT3) and uses thereof |
US11851497B2 (en) | 2017-11-20 | 2023-12-26 | Compass Therapeutics Llc | CD137 antibodies and tumor antigen-targeting antibodies and uses thereof |
MX2020005473A (en) | 2017-11-27 | 2020-08-27 | Purdue Pharma Lp | Humanized antibodies targeting human tissue factor. |
WO2019129137A1 (en) | 2017-12-27 | 2019-07-04 | 信达生物制药(苏州)有限公司 | Anti-lag-3 antibody and uses thereof |
CN109970856B (en) | 2017-12-27 | 2022-08-23 | 信达生物制药(苏州)有限公司 | anti-LAG-3 antibodies and uses thereof |
AU2019205330A1 (en) | 2018-01-04 | 2020-08-27 | Iconic Therapeutics Llc | Anti-tissue factor antibodies, antibody-drug conjugates, and related methods |
AU2019214183B2 (en) | 2018-02-01 | 2022-04-07 | Innovent Biologics (Suzhou) Co., Ltd. | Fully human anti-B cell maturation antigen (BCMA) single chain variable fragment, and application thereof |
SG11202007482WA (en) | 2018-02-08 | 2020-09-29 | Dragonfly Therapeutics Inc | Antibody variable domains targeting the nkg2d receptor |
EP3759142A1 (en) | 2018-03-02 | 2021-01-06 | Five Prime Therapeutics, Inc. | B7-h4 antibodies and methods of use thereof |
MX2021015518A (en) | 2018-03-14 | 2022-07-21 | Surface Oncology Inc | Antibodies that bind cd39 and uses thereof. |
US11332524B2 (en) | 2018-03-22 | 2022-05-17 | Surface Oncology, Inc. | Anti-IL-27 antibodies and uses thereof |
WO2019182896A1 (en) | 2018-03-23 | 2019-09-26 | Board Of Regents, The University Of Texas System | Dual specificity antibodies to pd-l1 and pd-l2 and methods of use therefor |
US20210130483A1 (en) | 2018-03-26 | 2021-05-06 | Sutro Biopharma, Inc. | Anti-bcma receptor antibodies, compositions comprising anti bcma receptor antibodies and methods of making and using anti-bcma antibodies |
CN110464842B (en) | 2018-05-11 | 2022-10-14 | 信达生物制药(苏州)有限公司 | Formulations comprising anti-PCSK 9 antibodies and uses thereof |
AU2019269383A1 (en) * | 2018-05-18 | 2020-12-10 | Duke University | Optimized GP41-binding molecules and uses thereof |
AR115418A1 (en) | 2018-05-25 | 2021-01-13 | Alector Llc | ANTI-SIRPA ANTIBODIES (SIGNAL REGULATING PROTEIN a) AND METHODS OF USE OF THE SAME |
WO2019232244A2 (en) | 2018-05-31 | 2019-12-05 | Novartis Ag | Antibody molecules to cd73 and uses thereof |
KR102115300B1 (en) * | 2018-06-01 | 2020-05-26 | 재단법인 목암생명과학연구소 | Antibody library and Screening Method of Antibody by Using the Same |
SG11202010990TA (en) | 2018-06-29 | 2020-12-30 | Alector Llc | Anti-sirp-beta1 antibodies and methods of use thereof |
EP3833685A4 (en) | 2018-07-08 | 2022-08-17 | Specifica Inc. | Antibody libraries with maximized antibody developability characteristics |
PE20210186A1 (en) | 2018-07-13 | 2021-02-02 | Alector Llc | ANTI-SORTILINE ANTIBODIES AND METHODS FOR ITS USE |
KR20210032488A (en) | 2018-07-20 | 2021-03-24 | 서피스 온콜로지, 인크. | Anti-CD112R compositions and methods |
AU2019310803B2 (en) | 2018-07-25 | 2022-11-03 | Innovent Biologics (Suzhou) Co., Ltd. | Anti-TIGIT antibody and uses thereof |
BR112021001451A2 (en) | 2018-07-27 | 2021-04-27 | Alector Llc | monoclonal anti-siglec-5 antibodies isolated, nucleic acid, vector, host cell, antibody production and prevention methods, pharmaceutical composition and methods to induce or promote survival, to decrease activity, to decrease cell levels, to induce the production of reactive species, to induce the formation of extracellular trap (net) neutrophil, to induce neutrophil activation, to attenuate one or more immunosuppressed neutrophils and to increase phagocytosis activity |
EP3833443A1 (en) | 2018-08-09 | 2021-06-16 | Compass Therapeutics LLC | Antigen binding agents that bind cd277 and uses thereof |
US20210309746A1 (en) | 2018-08-09 | 2021-10-07 | Compass Therapeutics Llc | Antibodies that bind cd277 and uses thereof |
WO2020033925A2 (en) | 2018-08-09 | 2020-02-13 | Compass Therapeutics Llc | Antibodies that bind cd277 and uses thereof |
AU2019324170A1 (en) | 2018-08-23 | 2021-02-18 | Seagen, Inc. | Anti-TIGIT antibodies |
WO2020060944A1 (en) | 2018-09-17 | 2020-03-26 | Sutro Biopharma, Inc. | Combination therapies with anti-folate receptor antibody conjugates |
US11208487B2 (en) | 2018-09-27 | 2021-12-28 | Tizona Therapeutics | Anti-HLA-G antibodies, compositions comprising anti-HLA-G antibodies and methods of using anti-HLA-G antibodies |
CA3119161A1 (en) | 2018-11-13 | 2020-05-22 | Compass Therapeutics Llc | Multispecific binding constructs against checkpoint molecules and uses thereof |
EP3908609A1 (en) | 2019-01-07 | 2021-11-17 | iTeos Belgium SA | Anti-tigit antibodies |
AU2020208397A1 (en) | 2019-01-16 | 2021-08-12 | Compass Therapeutics Llc | Formulations of antibodies that bind human CD137 and uses thereof |
EP3927748A1 (en) | 2019-02-21 | 2021-12-29 | Trishula Therapeutics, Inc. | Combination therapy involving anti-cd39 antibodies and anti-pd-1 or anti-pd-l1 antibodies |
SG11202109283UA (en) * | 2019-02-26 | 2021-09-29 | Twist Bioscience Corp | Variant nucleic acid libraries for antibody optimization |
UY38616A (en) | 2019-03-18 | 2020-09-30 | Janssen Pharmaceuticals Inc | BIOCONJUGATES OF POLYSACCHARIDES OF E. COLI ANTIGEN-O AND METHODS OF PRODUCTION AND USE OF THE SAME. |
MA55613A (en) | 2019-04-08 | 2022-02-16 | Biogen Ma Inc | ANTI-INTEGRIN ANTIBODIES AND THEIR USES |
US20220362394A1 (en) | 2019-05-03 | 2022-11-17 | Sutro Biopharma, Inc. | Anti-bcma antibody conjugates |
EA202193040A1 (en) | 2019-05-03 | 2022-03-25 | Селджен Корпорэйшн | ANTIBODY ANTI-VSMA CONJUGATE, COMPOSITIONS CONTAINING THIS CONJUGATE, AND METHODS FOR ITS PRODUCTION AND APPLICATION |
EP3980423A1 (en) | 2019-06-10 | 2022-04-13 | Sutro Biopharma, Inc. | 5h-pyrrolo[3,2-d]pyrimidine-2,4-diamino compounds and antibody conjugates thereof |
JP2022536800A (en) | 2019-06-17 | 2022-08-18 | ストロ バイオファーマ インコーポレーテッド | 1-(4-(aminomethyl)benzyl)-2-butyl-2H-pyrazolo[3,4-C]quinolin-4-amine derivatives and related compounds as TOLL-like receptor (TLR) 7/8 agonists, and antibody drug conjugates thereof for use in cancer therapy and diagnosis |
WO2021055329A1 (en) | 2019-09-16 | 2021-03-25 | Surface Oncology, Inc. | Anti-cd39 antibody compositions and methods |
TW202128752A (en) | 2019-09-25 | 2021-08-01 | 美商表面腫瘤學公司 | Anti-il-27 antibodies and uses thereof |
WO2021080682A1 (en) | 2019-10-24 | 2021-04-29 | Massachusetts Institute Of Technology | Monoclonal antibodies that bind human cd161 and uses thereof |
MX2022006073A (en) | 2019-12-05 | 2022-08-04 | Alector Llc | Methods of use of anti-trem2 antibodies. |
JP2023506014A (en) | 2019-12-12 | 2023-02-14 | アレクトル エルエルシー | Methods of using anti-CD33 antibodies |
WO2021115468A1 (en) * | 2019-12-13 | 2021-06-17 | 南京金斯瑞生物科技有限公司 | Method for constructing antibody complementarity determining region library |
JP2023509083A (en) | 2020-01-09 | 2023-03-06 | イノベント バイオロジクス(スーチョウ)カンパニー,リミティド | Use of a combination of anti-CD47 and anti-CD20 antibodies in the preparation of a medicament for preventing or treating tumors |
CN111139264B (en) * | 2020-01-20 | 2021-07-06 | 天津达济科技有限公司 | Method for constructing single-domain antibody library in mammalian cell line based on linear double-stranded DNA molecules |
EP4106788A1 (en) | 2020-02-18 | 2022-12-28 | Alector LLC | Pilra antibodies and methods of use thereof |
US20230159637A1 (en) | 2020-02-24 | 2023-05-25 | Alector Llc | Methods of use of anti-trem2 antibodies |
EP4110389A1 (en) * | 2020-02-28 | 2023-01-04 | The Brigham And Women's Hospital, Inc. | Selective modulation of transforming growth factor beta superfamily signaling via multi-specific antibodies |
JP2023519584A (en) | 2020-03-23 | 2023-05-11 | バイオ - テラ ソリューションズ、リミテッド | Development and application of immune cell activators |
KR20230005848A (en) | 2020-04-03 | 2023-01-10 | 알렉터 엘엘씨 | Methods of Using Anti-TREM2 Antibodies |
BR112022021450A2 (en) | 2020-04-24 | 2022-12-27 | Millennium Pharm Inc | CD19 OR LEADING FRAGMENT, METHOD OF TREATMENT OF A CANCER, PHARMACEUTICAL COMPOSITION, NUCLEIC ACID, VECTOR AND, ISOLATED CELL |
AU2021275361A1 (en) | 2020-05-17 | 2023-01-19 | Astrazeneca Uk Limited | SARS-CoV-2 antibodies and methods of selecting and using the same |
WO2021259199A1 (en) | 2020-06-22 | 2021-12-30 | 信达生物制药(苏州)有限公司 | Anti-cd73 antibody and use thereof |
WO2021262765A1 (en) | 2020-06-22 | 2021-12-30 | The Board Of Trustees Of The Leland Stanford Junior University | Tsp-1 inhibitors for the treatment of aged, atrophied or dystrophied muscle |
AU2021313348A1 (en) | 2020-07-20 | 2023-03-09 | Astrazeneca Uk Limited | SARS-CoV-2 proteins, anti-SARS-CoV-2 antibodies, and methods of using the same |
AU2021328375A1 (en) | 2020-08-18 | 2023-04-13 | Cephalon Llc | Anti-PAR-2 antibodies and methods of use thereof |
TW202233248A (en) | 2020-11-08 | 2022-09-01 | 美商西健公司 | Combination therapy |
KR20230128290A (en) * | 2020-12-03 | 2023-09-04 | 타보텍 바이오테라퓨틱스 (홍콩) 리미티드 | Library dedicated to variable heavy chains, method for producing the same, and uses thereof |
WO2022125927A1 (en) | 2020-12-11 | 2022-06-16 | The University Of North Carolina At Chapel Hill | Compositions and methods comprising sfrp2 antagonists |
WO2022162569A1 (en) | 2021-01-29 | 2022-08-04 | Novartis Ag | Dosage regimes for anti-cd73 and anti-entpd2 antibodies and uses thereof |
TW202304999A (en) | 2021-04-09 | 2023-02-01 | 美商思進公司 | Methods of treating cancer with anti-tigit antibodies |
KR20240004659A (en) | 2021-04-30 | 2024-01-11 | 셀진 코포레이션 | Combination therapy using an anti-BCMA antibody-drug conjugate (ADC) in combination with a gamma secretase inhibitor (GSI) |
WO2022245978A1 (en) | 2021-05-19 | 2022-11-24 | Sutro Biopharma, Inc. | Anti-folate receptor conjugate combination therapy with bevacizumab |
CN113668068A (en) * | 2021-07-20 | 2021-11-19 | Guangzhou Dina Biotechnology Co., Ltd. | Genome methylation library and preparation method and application thereof |
AU2022327511A1 (en) | 2021-08-09 | 2024-03-07 | Biotheus Inc. | Anti-tigit antibody and use thereof |
IL310535A (en) | 2021-08-10 | 2024-03-01 | Byomass Inc | Anti-gdf15 antibodies, compositions and uses thereof |
IL310662A (en) | 2021-08-23 | 2024-04-01 | Immunitas Therapeutics Inc | Anti-cd161 antibodies and uses thereof |
WO2023069421A1 (en) | 2021-10-18 | 2023-04-27 | Byomass Inc. | Anti-activin a antibodies, compositions and uses thereof |
WO2023081898A1 (en) | 2021-11-08 | 2023-05-11 | Alector Llc | Soluble cd33 as a biomarker for anti-cd33 efficacy |
WO2023102077A1 (en) | 2021-12-01 | 2023-06-08 | Sutro Biopharma, Inc. | Anti-folate receptor conjugate cancer therapy |
WO2023104904A2 (en) | 2021-12-08 | 2023-06-15 | Genclis | The sars-cov-2 and variants use two independent cell receptors to replicate |
WO2023122213A1 (en) | 2021-12-22 | 2023-06-29 | Byomass Inc. | Targeting gdf15-gfral pathway cross-reference to related applications |
WO2023147107A1 (en) | 2022-01-31 | 2023-08-03 | Byomass Inc. | Myeloproliferative conditions |
WO2023164516A1 (en) | 2022-02-23 | 2023-08-31 | Alector Llc | Methods of use of anti-trem2 antibodies |
WO2023178192A1 (en) | 2022-03-15 | 2023-09-21 | Compugen Ltd. | Il-18bp antagonist antibodies and their use in monotherapy and combination therapy in the treatment of cancer |
US20230365708A1 (en) | 2022-04-01 | 2023-11-16 | Board Of Regents, The University Of Texas System | Dual specificity antibodies to human pd-l1 and pd-l2 and methods of use therefor |
WO2023209177A1 (en) | 2022-04-29 | 2023-11-02 | Astrazeneca Uk Limited | Sars-cov-2 antibodies and methods of using the same |
Family Cites Families (220)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4230685A (en) | 1979-02-28 | 1980-10-28 | Northwestern University | Method of magnetic separation of cells and the like, and microspheres for use therein |
US4452773A (en) | 1982-04-05 | 1984-06-05 | Canadian Patents And Development Limited | Magnetic iron-dextran microspheres |
US4661454A (en) | 1983-02-28 | 1987-04-28 | Collaborative Research, Inc. | GAL1 yeast promoter linked to non galactokinase gene |
US4816567A (en) | 1983-04-08 | 1989-03-28 | Genentech, Inc. | Recombinant immunoglobin preparations |
GB8408127D0 (en) | 1984-03-29 | 1984-05-10 | Nyegaard & Co As | Contrast agents |
US5118605A (en) * | 1984-10-16 | 1992-06-02 | Chiron Corporation | Polynucleotide determination with selectable cleavage sites |
DE3590766C2 (en) * | 1985-03-30 | 1991-01-10 | Marc Genf/Geneve Ch Ballivet | |
CA1293460C (en) | 1985-10-07 | 1991-12-24 | Brian Lee Sauer | Site-specific recombination of dna in yeast |
US5618920A (en) * | 1985-11-01 | 1997-04-08 | Xoma Corporation | Modular assembly of antibody genes, antibodies prepared thereby and use |
US4770183A (en) | 1986-07-03 | 1988-09-13 | Advanced Magnetics Incorporated | Biologically degradable superparamagnetic particles for use as nuclear magnetic resonance imaging agents |
DE3785186T2 (en) | 1986-09-02 | 1993-07-15 | Enzon Lab Inc | BINDING MOLECULE WITH SINGLE POLYPEPTIDE CHAIN. |
US4946778A (en) | 1987-09-21 | 1990-08-07 | Genex Corporation | Single polypeptide chain binding molecules |
US5763192A (en) * | 1986-11-20 | 1998-06-09 | Ixsys, Incorporated | Process for obtaining DNA, RNA, peptides, polypeptides, or protein, by recombinant DNA technique |
WO1988006630A1 (en) | 1987-03-02 | 1988-09-07 | Genex Corporation | Method for the preparation of binding molecules |
US5223409A (en) | 1988-09-02 | 1993-06-29 | Protein Engineering Corp. | Directed evolution of novel binding proteins |
US5688666A (en) * | 1988-10-28 | 1997-11-18 | Genentech, Inc. | Growth hormone variants with altered binding properties |
AU634186B2 (en) * | 1988-11-11 | 1993-02-18 | Medical Research Council | Single domain ligands, receptors comprising said ligands, methods for their production, and use of said ligands and receptors |
DE68919715T2 (en) | 1988-12-28 | 1995-04-06 | Stefan Miltenyi | METHOD AND MATERIALS FOR HIGHLY GRADUATED MAGNETIC SPLITTING OF BIOLOGICAL MATERIALS. |
US5530101A (en) | 1988-12-28 | 1996-06-25 | Protein Design Labs, Inc. | Humanized immunoglobulins |
US6291161B1 (en) * | 1989-05-16 | 2001-09-18 | Scripps Research Institute | Method for tapping the immunological repertiore |
US6291159B1 (en) * | 1989-05-16 | 2001-09-18 | Scripps Research Institute | Method for producing polymers having a preselected activity |
US6291158B1 (en) * | 1989-05-16 | 2001-09-18 | Scripps Research Institute | Method for tapping the immunological repertoire |
US6969586B1 (en) * | 1989-05-16 | 2005-11-29 | Scripps Research Institute | Method for tapping the immunological repertoire |
US6680192B1 (en) * | 1989-05-16 | 2004-01-20 | Scripps Research Institute | Method for producing polymers having a preselected activity |
US6291160B1 (en) * | 1989-05-16 | 2001-09-18 | Scripps Research Institute | Method for producing polymers having a preselected activity |
DE3920358A1 (en) | 1989-06-22 | 1991-01-17 | Behringwerke Ag | BISPECIFIC AND OLIGO-SPECIFIC, MONO- AND OLIGOVALENT ANTI-BODY CONSTRUCTS, THEIR PRODUCTION AND USE |
US5283173A (en) | 1990-01-24 | 1994-02-01 | The Research Foundation Of State University Of New York | System to detect protein-protein interactions |
EP0440147B2 (en) * | 1990-02-01 | 2014-05-14 | Siemens Healthcare Diagnostics Products GmbH | Preparation and use of a human antibody gene bank (human antibody libraries) |
DE4002897A1 (en) * | 1990-02-01 | 1991-08-08 | Behringwerke Ag | Synthetic human antibody library |
ATE126535T1 (en) * | 1990-04-05 | 1995-09-15 | Roberto Crea | "WALK-THROUGH" MUTAGENESIS. |
ES2119756T3 (en) * | 1990-04-18 | 1998-10-16 | Gist Brocades Nv | MUTATED GENES OF BETA LACTAMA ACILASA. |
US6916605B1 (en) * | 1990-07-10 | 2005-07-12 | Medical Research Council | Methods for producing members of specific binding pairs |
GB9206318D0 (en) * | 1992-03-24 | 1992-05-06 | Cambridge Antibody Tech | Binding substances |
GB9015198D0 (en) * | 1990-07-10 | 1990-08-29 | Brien Caroline J O | Binding substance |
US6172197B1 (en) * | 1991-07-10 | 2001-01-09 | Medical Research Council | Methods for producing members of specific binding pairs |
US7063943B1 (en) * | 1990-07-10 | 2006-06-20 | Cambridge Antibody Technology | Methods for producing members of specific binding pairs |
DE69129154T2 (en) * | 1990-12-03 | 1998-08-20 | Genentech Inc | METHOD FOR ENRICHING PROTEIN VARIANTS WITH CHANGED BINDING PROPERTIES |
US5780279A (en) * | 1990-12-03 | 1998-07-14 | Genentech, Inc. | Method of selection of proteolytic cleavage sites by directed evolution and phagemid display |
AU662148B2 (en) * | 1991-04-10 | 1995-08-24 | Scripps Research Institute, The | Heterodimeric receptor libraries using phagemids |
US6072039A (en) | 1991-04-19 | 2000-06-06 | Rohm And Haas Company | Hybrid polypeptide comparing a biotinylated avidin binding polypeptide fused to a polypeptide of interest |
DE69230142T2 (en) * | 1991-05-15 | 2000-03-09 | Cambridge Antibody Tech | METHOD FOR PRODUCING SPECIFIC BINDING PAIRS |
US5962255A (en) * | 1992-03-24 | 1999-10-05 | Cambridge Antibody Technology Limited | Methods for producing recombinant vectors |
US5858657A (en) * | 1992-05-15 | 1999-01-12 | Medical Research Council | Methods for producing members of specific binding pairs |
US6492160B1 (en) * | 1991-05-15 | 2002-12-10 | Cambridge Antibody Technology Limited | Methods for producing members of specific binding pairs |
US6225447B1 (en) * | 1991-05-15 | 2001-05-01 | Cambridge Antibody Technology Ltd. | Methods for producing members of specific binding pairs |
CA2114950A1 (en) * | 1991-08-10 | 1993-02-11 | Michael J. Embleton | Treatment of cell populations |
ES2136092T3 (en) * | 1991-09-23 | 1999-11-16 | Medical Res Council | PROCEDURES FOR THE PRODUCTION OF HUMANIZED ANTIBODIES. |
US5492817A (en) | 1993-11-09 | 1996-02-20 | Promega Corporation | Coupled transcription and translation in eukaryotic cell-free extract |
US5665563A (en) | 1991-10-11 | 1997-09-09 | Promega Corporation | Coupled transcription and translation in eukaryotic cell-free extract |
DE69216385T2 (en) | 1991-10-11 | 1997-06-12 | Promega Corp | COUPLED TRANSCRIPTION AND TRANSLATION IN CELL-FREE EUKARYOT EXTRACTS |
US5866344A (en) | 1991-11-15 | 1999-02-02 | Board Of Regents, The University Of Texas System | Antibody selection methods using cell surface expressed libraries |
US20030036092A1 (en) | 1991-11-15 | 2003-02-20 | Board Of Regents, The University Of Texas System | Directed evolution of enzymes and antibodies |
AU3178993A (en) | 1991-11-25 | 1993-06-28 | Enzon, Inc. | Multivalent antigen-binding proteins |
WO1993011236A1 (en) * | 1991-12-02 | 1993-06-10 | Medical Research Council | Production of anti-self antibodies from antibody segment repertoires and displayed on phage |
US5872215A (en) * | 1991-12-02 | 1999-02-16 | Medical Research Council | Specific binding members, materials and methods |
US5667988A (en) * | 1992-01-27 | 1997-09-16 | The Scripps Research Institute | Methods for producing antibody libraries using universal or randomized immunoglobulin light chains |
US5733743A (en) * | 1992-03-24 | 1998-03-31 | Cambridge Antibody Technology Limited | Methods for producing members of specific binding pairs |
ATE236257T1 (en) | 1992-07-08 | 2003-04-15 | Unilever Nv | ENZYMATIC PROCESS THAT USED ENZYMES IMMOBILIZED ON THE CELL WALL OF A EUKARYONTIC MICROBIAL CELL BY CREATING A FUSION PROTEIN. |
AU687010B2 (en) | 1992-07-17 | 1998-02-19 | Dana-Farber Cancer Institute | Method of intracellular binding of target molecules |
EP0675904B1 (en) | 1992-09-30 | 2003-03-05 | The Scripps Research Institute | Human neutralizing monoclonal antibodies to human immunodeficiency virus |
US7166423B1 (en) | 1992-10-21 | 2007-01-23 | Miltenyi Biotec Gmbh | Direct selection of cells by secretion product |
DE4237113B4 (en) | 1992-11-03 | 2006-10-12 | "Iba Gmbh" | Peptides and their fusion proteins, expression vector and method of producing a fusion protein |
EP0672142B1 (en) * | 1992-12-04 | 2001-02-28 | Medical Research Council | Multivalent and multispecific binding proteins, their manufacture and use |
GB9225453D0 (en) * | 1992-12-04 | 1993-01-27 | Medical Res Council | Binding proteins |
ES2199959T3 (en) * | 1993-02-04 | 2004-03-01 | Borean Pharma A/S | IMPROVED PROCEDURE FOR THE REPLEGATION OF PROTEINS. |
WO1994018330A1 (en) | 1993-02-10 | 1994-08-18 | Unilever N.V. | Immobilized proteins with specific binding capacities and their use in processes and products |
GB9313509D0 (en) * | 1993-06-30 | 1993-08-11 | Medical Res Council | Chemisynthetic libraries |
WO1995004069A1 (en) | 1993-07-30 | 1995-02-09 | Affymax Technologies N.V. | Biotinylation of proteins |
CA2169620A1 (en) * | 1993-09-22 | 1995-03-30 | Gregory Paul Winter | Retargeting antibodies |
US5922545A (en) | 1993-10-29 | 1999-07-13 | Affymax Technologies N.V. | In vitro peptide and antibody display libraries |
US5605793A (en) | 1994-02-17 | 1997-02-25 | Affymax Technologies N.V. | Methods for in vitro recombination |
US5525490A (en) | 1994-03-29 | 1996-06-11 | Onyx Pharmaceuticals, Inc. | Reverse two-hybrid method |
US5695941A (en) | 1994-06-22 | 1997-12-09 | The General Hospital Corporation | Interaction trap systems for analysis of protein networks |
CU22615A1 (en) | 1994-06-30 | 2000-02-10 | Centro Inmunologia Molecular | PROCEDURE FOR OBTAINING LESS IMMUNOGENIC MONOCLONAL ANTIBODIES. MONOCLONAL ANTIBODIES OBTAINED |
GB9414506D0 (en) * | 1994-07-18 | 1994-09-07 | Zeneca Ltd | Improved production of polyhydroxyalkanoate |
US5888773A (en) | 1994-08-17 | 1999-03-30 | The United States Of America As Represented By The Department Of Health And Human Services | Method of producing single-chain Fv molecules |
US5763733A (en) | 1994-10-13 | 1998-06-09 | Enzon, Inc. | Antigen-binding fusion proteins |
GB9504344D0 (en) | 1995-03-03 | 1995-04-19 | Unilever Plc | Antibody fragment production |
JPH11502717A (en) | 1995-04-11 | 1999-03-09 | The General Hospital Corporation | Reverse two hybrid system |
DE69621940T2 (en) * | 1995-08-18 | 2003-01-16 | Morphosys Ag | PROTEIN - / (POLY) PEPTIDE LIBRARIES |
US7264963B1 (en) * | 1995-08-18 | 2007-09-04 | Morphosys Ag | Protein(poly)peptide libraries |
FR2741892B1 (en) | 1995-12-04 | 1998-02-13 | Pasteur Merieux Serums Vacc | METHOD FOR PREPARING A MULTI-COMBINED BANK OF ANTIBODY GENE EXPRESSION VECTORS, BANK AND COLICLONAL ANTIBODY EXPRESSION SYSTEMS |
DE19637718A1 (en) | 1996-04-01 | 1997-10-02 | Boehringer Mannheim Gmbh | Recombinant inactive core streptavidin mutants |
US5928868A (en) | 1996-04-26 | 1999-07-27 | Massachusetts Institute Of Technology | Three hybrid screening assay |
US6699658B1 (en) | 1996-05-31 | 2004-03-02 | Board Of Trustees Of The University Of Illinois | Yeast cell surface display of proteins and uses thereof |
US6696251B1 (en) | 1996-05-31 | 2004-02-24 | Board Of Trustees Of The University Of Illinois | Yeast cell surface display of proteins and uses thereof |
US6300065B1 (en) | 1996-05-31 | 2001-10-09 | Board Of Trustees Of The University Of Illinois | Yeast cell surface display of proteins and uses thereof |
US6083693A (en) | 1996-06-14 | 2000-07-04 | Curagen Corporation | Identification and comparison of protein-protein interactions that occur in populations |
DE19624562A1 (en) | 1996-06-20 | 1998-01-02 | Thomas Dr Koehler | Determination of the concentration ratio of two different nucleic acids |
JP3841360B2 (en) | 1996-06-24 | 2006-11-01 | ZLB Behring AG | Polypeptide capable of forming an antigen-binding structure having specificity for Rhesus D antigen, DNA encoding them, preparation method and use thereof |
US5994515A (en) | 1996-06-25 | 1999-11-30 | Trustees Of The University Of Pennsylvania | Antibodies directed against cellular coreceptors for human immunodeficiency virus and methods of using the same |
GB9712818D0 (en) * | 1996-07-08 | 1997-08-20 | Cambridge Antibody Tech | Labelling and selection of specific binding molecules |
US5948620A (en) | 1996-08-05 | 1999-09-07 | Amersham International Plc | Reverse two-hybrid system employing post-translation signal modulation |
US5955275A (en) | 1997-02-14 | 1999-09-21 | Arcaris, Inc. | Methods for identifying nucleic acid sequences encoding agents that affect cellular phenotypes |
US5925523A (en) | 1996-08-23 | 1999-07-20 | President & Fellows Of Harvard College | Intraction trap assay, reagents and uses thereof |
ATE284972T1 (en) | 1996-09-24 | 2005-01-15 | Cadus Pharmaceutical Corp | METHODS AND COMPOSITIONS FOR IDENTIFYING RECEPTOR EFFECTORS |
DE19641876B4 (en) | 1996-10-10 | 2011-09-29 | Iba Gmbh | streptavidin muteins |
US6255455B1 (en) | 1996-10-11 | 2001-07-03 | The Trustees Of The University Of Pennsylvania | Rh(D)-binding proteins and magnetically activated cell sorting method for production thereof |
US5858671A (en) * | 1996-11-01 | 1999-01-12 | The University Of Iowa Research Foundation | Iterative and regenerative DNA sequencing method |
US5869250A (en) | 1996-12-02 | 1999-02-09 | The University Of North Carolina At Chapel Hill | Method for the identification of peptides that recognize specific DNA sequences |
DE69835143T2 (en) | 1997-01-21 | 2007-06-06 | The General Hospital Corp., Boston | SELECTION OF PROTEINS BY THE RNA PROTEIN FUSIONS |
US6261804B1 (en) | 1997-01-21 | 2001-07-17 | The General Hospital Corporation | Selection of proteins using RNA-protein fusions |
CA2283716A1 (en) | 1997-03-14 | 1998-09-17 | Gabriel O. Reznik | Multiflavor streptavidin |
US6057098A (en) * | 1997-04-04 | 2000-05-02 | Biosite Diagnostics, Inc. | Polyvalent display libraries |
EP0985033A4 (en) * | 1997-04-04 | 2005-07-13 | Biosite Inc | Polyvalent and polyclonal libraries |
DE69834032T2 (en) | 1997-04-23 | 2006-12-07 | Universität Zürich | METHOD OF DETECTING NUCLEIC ACID MOLECULES COPYING FOR (POLY) PEPTIDES THAT INTERACT WITH TARGET MOLECULES |
CA2288994C (en) | 1997-04-30 | 2011-07-05 | Enzon, Inc. | Polyalkylene oxide-modified single chain polypeptides |
ATE319745T1 (en) | 1997-05-21 | 2006-03-15 | Biovation Ltd | METHOD FOR PRODUCING NON-IMMUNOGENIC PROTEINS |
AU8691398A (en) | 1997-08-04 | 1999-02-22 | Ixsys, Incorporated | Methods for identifying ligand specific binding molecules |
US6391311B1 (en) | 1998-03-17 | 2002-05-21 | Genentech, Inc. | Polypeptides having homology to vascular endothelial cell growth factor and bone morphogenetic protein 1 |
GB9722131D0 (en) * | 1997-10-20 | 1997-12-17 | Medical Res Council | Method |
IL136392A0 (en) | 1997-11-27 | 2001-06-14 | Max Planck Gesellschaft | Improved method for the identification and characterization of interacting molecules using automation |
EP1040201A4 (en) | 1997-11-28 | 2001-11-21 | Invitrogen Corp | Single chain monoclonal antibody fusion reagents that regulate transcription in vivo |
US6759243B2 (en) | 1998-01-20 | 2004-07-06 | Board Of Trustees Of The University Of Illinois | High affinity TCR proteins and methods |
WO1999039744A1 (en) | 1998-02-10 | 1999-08-12 | The Ohio State University Research Foundation | Compositions and methods for polynucleotide delivery |
US6187535B1 (en) | 1998-02-18 | 2001-02-13 | Institut Pasteur | Fast and exhaustive method for selecting a prey polypeptide interacting with a bait polypeptide of interest: application to the construction of maps of interactors polypeptides |
DK1068357T3 (en) | 1998-03-30 | 2011-12-12 | Northwest Biotherapeutics Inc | Therapeutic and diagnostic applications based on the role of the CXCR-4 gene in tumor genesis |
US20020029391A1 (en) | 1998-04-15 | 2002-03-07 | Claude Geoffrey Davis | Epitope-driven human antibody production and gene expression profiling |
US7244826B1 (en) | 1998-04-24 | 2007-07-17 | The Regents Of The University Of California | Internalizing ERB2 antibodies |
WO2000018905A1 (en) | 1998-09-25 | 2000-04-06 | G.D. Searle & Co. | Method of producing permuteins by scanning permutagenesis |
GB2344886B (en) | 1999-03-10 | 2000-11-01 | Medical Res Council | Selection of intracellular immunoglobulins |
US6531580B1 (en) * | 1999-06-24 | 2003-03-11 | Ixsys, Inc. | Anti-αvβ3 recombinant human antibodies and nucleic acids encoding same |
JP4312403B2 (en) * | 1999-07-20 | 2009-08-12 | MorphoSys AG | Novel method for displaying (poly) peptide / protein on bacteriophage particles via disulfide bonds |
US6171795B1 (en) | 1999-07-29 | 2001-01-09 | Nexstar Pharmaceuticals, Inc. | Nucleic acid ligands to CD40ligand |
AU7839700A (en) | 1999-09-29 | 2001-04-30 | Ronald W. Barrett | Compounds displayed on replicable genetic packages and methods of using same |
US7037706B1 (en) | 1999-09-29 | 2006-05-02 | Xenoport, Inc. | Compounds displayed on replicable genetic packages and methods of using same |
WO2001079229A2 (en) | 2000-04-13 | 2001-10-25 | Genaissance Pharmaceuticals, Inc. | Haplotypes of the cxcr4 gene |
ES2393535T3 (en) | 2000-04-17 | 2012-12-26 | Dyax Corp. | New methods to build libraries of genetic packages that collectively present members of a diverse family of peptides, polypeptides or proteins |
US8288322B2 (en) * | 2000-04-17 | 2012-10-16 | Dyax Corp. | Methods of constructing libraries comprising displayed and/or expressed members of a diverse family of peptides, polypeptides or proteins and the novel libraries |
US6358733B1 (en) | 2000-05-19 | 2002-03-19 | Apolife, Inc. | Expression of heterologous multi-domain proteins in yeast |
US20050158838A1 (en) * | 2000-06-19 | 2005-07-21 | Dyax Corp., A Delaware Corporation | Novel enterokinase cleavage sequences |
US6410246B1 (en) | 2000-06-23 | 2002-06-25 | Genetastix Corporation | Highly diverse library of yeast expression vectors |
US6410271B1 (en) | 2000-06-23 | 2002-06-25 | Genetastix Corporation | Generation of highly diverse library of expression vectors via homologous recombination in yeast |
US6406863B1 (en) | 2000-06-23 | 2002-06-18 | Genetastix Corporation | High throughput generation and screening of fully human antibody repertoire in yeast |
WO2002006464A2 (en) | 2000-07-13 | 2002-01-24 | The Curators Of The University Of Missouri | Large scale expression and purification of recombinant proteins |
US6512263B1 (en) | 2000-09-22 | 2003-01-28 | Sandisk Corporation | Non-volatile memory cell array having discontinuous source and drain diffusions contacted by continuous bit line conductors and methods of forming |
CA2424295A1 (en) | 2000-10-12 | 2002-04-18 | University Of Georgia Research Foundation, Inc. | Metal binding proteins, recombinant host cells and methods |
US7083945B1 (en) | 2000-10-27 | 2006-08-01 | The Board Of Regents Of The University Of Texas System | Isolation of binding proteins with high affinity to ligands |
US7094571B2 (en) | 2000-10-27 | 2006-08-22 | The Board Of Regents Of The University Of Texas System | Combinatorial protein library screening by periplasmic expression |
US6841359B2 (en) | 2000-10-31 | 2005-01-11 | The General Hospital Corporation | Streptavidin-binding peptides and uses thereof |
US7138496B2 (en) | 2002-02-08 | 2006-11-21 | Genetastix Corporation | Human monoclonal antibodies against human CXCR4 |
US20030165988A1 (en) | 2002-02-08 | 2003-09-04 | Shaobing Hua | High throughput generation of human monoclonal antibody against peptide fragments derived from membrane proteins |
US7005503B2 (en) | 2002-02-08 | 2006-02-28 | Genetastix Corporation | Human monoclonal antibody against coreceptors for human immunodeficiency virus |
US6610472B1 (en) | 2000-10-31 | 2003-08-26 | Genetastix Corporation | Assembly and screening of highly complex and fully human antibody repertoire in yeast |
ES2430857T3 (en) * | 2000-12-18 | 2013-11-22 | Dyax Corp. | Targeted libraries of genetic packages |
EP1227321A1 (en) | 2000-12-28 | 2002-07-31 | Institut für Bioanalytik GmbH | Reversible MHC multimer staining for functional purification of antigen-specific T cells |
US20020197691A1 (en) | 2001-04-30 | 2002-12-26 | Myriad Genetics, Incorporated | FLT4-interacting proteins and use thereof |
WO2002057423A2 (en) | 2001-01-16 | 2002-07-25 | Regeneron Pharmaceuticals, Inc. | Isolating cells expressing secreted proteins |
US7229757B2 (en) | 2001-03-21 | 2007-06-12 | Xenoport, Inc. | Compounds displayed on icosahedral phage and methods of using same |
DE10113776B4 (en) | 2001-03-21 | 2012-08-09 | "Iba Gmbh" | Isolated streptavidin-binding, competitively elutable peptide, this comprehensive fusion peptide, nucleic acid coding therefor, expression vector, methods for producing a recombinant fusion protein and methods for detecting and / or obtaining the fusion protein |
CA2443067A1 (en) | 2001-04-05 | 2002-10-17 | Nextgen Sciences Ltd. | Protein analysis by means of immobilized arrays of antigens or antibodies |
AU2002307229B2 (en) | 2001-04-06 | 2007-05-24 | Thomas Jefferson University | Multimerization of HIV-1 Vif protein as a therapeutic target |
US20050048512A1 (en) | 2001-04-26 | 2005-03-03 | Avidia Research Institute | Combinatorial libraries of monomer domains |
US20050089932A1 (en) | 2001-04-26 | 2005-04-28 | Avidia Research Institute | Novel proteins with targeted binding |
WO2002088171A2 (en) | 2001-04-26 | 2002-11-07 | Avidia Research Institute | Combinatorial libraries of monomer domains |
US20050053973A1 (en) | 2001-04-26 | 2005-03-10 | Avidia Research Institute | Novel proteins with targeted binding |
US20040175756A1 (en) | 2001-04-26 | 2004-09-09 | Avidia Research Institute | Methods for using combinatorial libraries of monomer domains |
DE10129815A1 (en) | 2001-06-24 | 2003-01-09 | Profos Ag | Process for the purification of bacterial cells and cell components |
GB0118337D0 (en) | 2001-07-27 | 2001-09-19 | Lonza Biologics Plc | Method for selecting antibody expressing cells |
US6833441B2 (en) | 2001-08-01 | 2004-12-21 | Abmaxis, Inc. | Compositions and methods for generating chimeric heteromultimers |
US7371849B2 (en) | 2001-09-13 | 2008-05-13 | Institute For Antibodies Co., Ltd. | Methods of constructing camel antibody libraries |
ES2372029T3 (en) | 2001-09-25 | 2012-01-13 | F. Hoffmann-La Roche Ag | METHOD FOR A SPECIFIC BIOTINILATION OF IN VITRO POLYPEPTIDE SEQUENCE. |
WO2003029462A1 (en) | 2001-09-27 | 2003-04-10 | Pieris Proteolab Ag | Muteins of human neutrophil gelatinase-associated lipocalin and related proteins |
US7118915B2 (en) | 2001-09-27 | 2006-10-10 | Pieris Proteolab Ag | Muteins of apolipoprotein D |
DE60232672D1 (en) | 2001-10-01 | 2009-07-30 | Dyax Corp | MULTILACKED EUKARYONTIC DISPLAY VECTORS AND THEIR USES |
DE10230147A1 (en) | 2001-10-09 | 2004-01-15 | Profos Ag | Process for non-specific enrichment of bacterial cells |
DE10230997A1 (en) * | 2001-10-26 | 2003-07-17 | Ribopharma Ag | Drug to increase the effectiveness of a receptor-mediates apoptosis in drug that triggers tumor cells |
WO2003038049A2 (en) | 2001-10-29 | 2003-05-08 | Renovis, Inc. | Method for isolating cell-type specific mrnas |
AU2003222717A1 (en) | 2002-03-01 | 2003-09-16 | Erdmann, Volker, A. | Streptavidin-binding peptide |
WO2003078457A1 (en) | 2002-03-19 | 2003-09-25 | Cincinnati Children's Hospital Medical Center | MUTEINS OF THE C5a ANAPHYLATOXIN, NUCLEIC ACID MOLECULES ENCODING SUCH MUTEINS, AND PHARMACEUTICAL USES OF MUTEINS OF THE C5a ANAPHYLATOXIN |
US20030228302A1 (en) * | 2002-04-17 | 2003-12-11 | Roberto Crea | Universal libraries for immunoglobulins |
AU2003240005B2 (en) | 2002-06-14 | 2010-08-12 | Takeda Pharmaceutical Company Limited | Recombination of nucleic acid library members |
CA2490467C (en) | 2002-06-24 | 2011-06-07 | Profos Ag | Method for detecting and for removing endotoxin |
AU2003255250A1 (en) | 2002-08-08 | 2004-02-25 | Insert Therapeutics, Inc. | Compositions and uses of motor protein-binding moieties |
US20040067532A1 (en) | 2002-08-12 | 2004-04-08 | Genetastix Corporation | High throughput generation and affinity maturation of humanized antibody |
JP2006500035A (en) | 2002-09-23 | 2006-01-05 | MacroGenics, Inc. | Vaccine identification method and vaccination composition comprising herpesviridae nucleic acid sequence and / or polypeptide sequence |
US20050058661A1 (en) | 2002-10-18 | 2005-03-17 | Sykes Kathryn F. | Methods and compositions for vaccination comprising nucleic acid and/or polypeptide sequences of the genus Borrelia |
US7172877B2 (en) | 2003-01-09 | 2007-02-06 | Massachusetts Institute Of Technology | Methods and compositions for peptide and protein labeling |
US20050233389A1 (en) | 2003-01-09 | 2005-10-20 | Massachusetts Institute Of Technology | Methods and compositions for peptide and protein labeling |
AU2004204462B2 (en) | 2003-01-09 | 2012-03-08 | Macrogenics, Inc. | Dual expression vector system for antibody expression in bacterial and mammalian cells |
EP2368578A1 (en) | 2003-01-09 | 2011-09-28 | Macrogenics, Inc. | Identification and engineering of antibodies with variant Fc regions and methods of using same |
EP1452601A1 (en) | 2003-02-28 | 2004-09-01 | Roche Diagnostics GmbH | Enhanced expression of fusion polypeptides with a biotinylation tag |
EP1454981A1 (en) | 2003-03-03 | 2004-09-08 | Institut National De La Sante Et De La Recherche Medicale (Inserm) | Infectious pestivirus pseudo-particles containing functional erns, E1, E2 envelope proteins |
US20060024676A1 (en) | 2003-04-14 | 2006-02-02 | Karen Uhlmann | Method of detecting epigenetic biomarkers by quantitative methyISNP analysis |
KR20060036901A (en) | 2003-05-02 | 2006-05-02 | Sigma-Aldrich Company | Solid phase cell lysis and capture platform |
WO2004101790A1 (en) * | 2003-05-14 | 2004-11-25 | Domantis Limited | A process for recovering polypeptides that unfold reversibly from a polypeptide repertoire |
JP2007530913A (en) | 2003-07-10 | 2007-11-01 | Blind Pig Proteomics, Inc. | Universal detection of binding |
US7569215B2 (en) | 2003-07-18 | 2009-08-04 | Massachusetts Institute Of Technology | Mutant interleukin-2 (IL-2) polypeptides |
EP1503161A3 (en) | 2003-08-01 | 2006-08-09 | Asahi Glass Company Ltd. | Firing container for silicon nitride ceramics |
EP2824190A1 (en) | 2003-09-09 | 2015-01-14 | Integrigen, Inc. | Methods and compositions for generation of germline human antibody genes |
EP1675878A2 (en) | 2003-10-24 | 2006-07-05 | Avidia, Inc. | Ldl receptor class a and egf domain monomers and multimers |
CA2548817A1 (en) | 2003-12-04 | 2005-06-23 | Xencor, Inc. | Methods of generating variant proteins with increased host string content and compositions thereof |
US20050191710A1 (en) | 2004-03-01 | 2005-09-01 | Hanrahan John W. | Method for labeling a membrane-localized protein |
WO2006026248A1 (en) | 2004-08-25 | 2006-03-09 | Sigma-Aldrich Co. | Compositions and methods employing zwitterionic detergent combinations |
EP1790202A4 (en) | 2004-09-17 | 2013-02-20 | Pacific Biosciences California | Apparatus and method for analysis of molecules |
WO2006069331A2 (en) | 2004-12-22 | 2006-06-29 | The Salk Institute For Biological Studies | Compositions and methods for producing recombinant proteins |
AU2006210660B2 (en) * | 2005-02-01 | 2011-12-01 | Morphosys Ag | Libraries and methods for isolating antibodies |
CN101115771B (en) | 2005-02-03 | 2013-06-05 | Antitope Ltd. | Human antibodies and proteins |
US20070141548A1 (en) | 2005-03-11 | 2007-06-21 | Jorg Kohl | Organ transplant solutions and method for transplanting organs |
JP2008546392A (en) | 2005-06-17 | 2008-12-25 | BioRexis Pharmaceutical Corporation | Immobilized transferrin fusion protein library |
EP1957531B1 (en) * | 2005-11-07 | 2016-04-13 | Genentech, Inc. | Binding polypeptides with diversified and consensus vh/vl hypervariable sequences |
RU2470941C2 (en) * | 2005-12-02 | 2012-12-27 | Genentech, Inc. | Binding polypeptides and use thereof |
ES2394722T3 (en) | 2005-12-20 | 2013-02-05 | Morphosys Ag | New set of HCDR3 regions and their uses |
US20070275416A1 (en) | 2006-05-16 | 2007-11-29 | Gsf-Forschungszentrum Fuer Umwelt Und Gesundheit Gmbh | Affinity marker for purification of proteins |
WO2008019366A2 (en) | 2006-08-07 | 2008-02-14 | Ludwig Institute For Cancer Research | Methods and compositions for increased priming of t-cells through cross-presentation of exogenous antigens |
US8807164B2 (en) | 2006-08-30 | 2014-08-19 | Semba Biosciences, Inc. | Valve module and methods for simulated moving bed chromatography |
US7790040B2 (en) | 2006-08-30 | 2010-09-07 | Semba Biosciences, Inc. | Continuous isocratic affinity chromatography |
US7806137B2 (en) | 2006-08-30 | 2010-10-05 | Semba Biosciences, Inc. | Control system for simulated moving bed chromatography |
CN103541018A (en) * | 2006-10-02 | 2014-01-29 | Sea Lane Biotechnologies, LLC | Design and construction of diverse synthetic peptide and polypeptide librarie |
WO2008067547A2 (en) | 2006-11-30 | 2008-06-05 | Research Development Foundation | Improved immunoglobulin libraries |
JP5564266B2 (en) | 2007-02-16 | 2014-07-30 | Merrimack Pharmaceuticals, Inc. | Antibodies against ERBB3 and uses thereof |
US8877688B2 (en) | 2007-09-14 | 2014-11-04 | Adimab, Llc | Rationally designed, synthetic antibody libraries and uses therefor |
BRPI0816785A2 (en) | 2007-09-14 | 2017-05-02 | Adimab Inc | rationally designed synthetic antibody libraries, and uses thereof |
US8067339B2 (en) | 2008-07-09 | 2011-11-29 | Merck Sharp & Dohme Corp. | Surface display of whole antibodies in eukaryotes |
KR20110112301A (en) | 2008-11-18 | 2011-10-12 | Merrimack Pharmaceuticals, Inc. | Human serum albumin linkers and conjugates thereof |
CA2773564A1 (en) * | 2009-09-14 | 2011-03-17 | Dyax Corp. | Libraries of genetic packages comprising novel hc cdr3 designs |
WO2011035205A2 (en) | 2009-09-18 | 2011-03-24 | Calmune Corporation | Antibodies against candida, collections thereof and methods of use |
CN103403020A (en) | 2010-12-21 | 2013-11-20 | JSR Corporation | Novel alkali-resistant variants of protein a and their use in affinity chromatography |
WO2012094653A2 (en) | 2011-01-07 | 2012-07-12 | Massachusetts Institute Of Technology | Compositions and methods for macromolecular drug delivery |
SG10201604554WA (en) | 2011-06-08 | 2016-07-28 | Emd Millipore Corp | Chromatography matrices including novel staphylococcus aureus protein a based ligands |
- 2008
  - 2008-09-12 BR BRPI0816785A patent/BRPI0816785A2/en not_active Application Discontinuation
  - 2008-09-12 EP EP20161796.6A patent/EP3753947A1/en active Pending
  - 2008-09-12 US US12/210,072 patent/US8691730B2/en active Active
  - 2008-09-12 EP EP16169483.1A patent/EP3124497B1/en active Active
  - 2008-09-12 WO PCT/US2008/076300 patent/WO2009036379A2/en active Application Filing
  - 2008-09-12 AU AU2008298603A patent/AU2008298603B2/en active Active
  - 2008-09-12 CA CA2964398A patent/CA2964398C/en active Active
  - 2008-09-12 CN CN200880116593.0A patent/CN101855242B/en active Active
  - 2008-09-12 DK DK16169483.1T patent/DK3124497T3/en active
  - 2008-09-12 MX MX2010002661A patent/MX2010002661A/en active IP Right Grant
  - 2008-09-12 JP JP2010525049A patent/JP5933894B2/en active Active
  - 2008-09-12 CA CA2697193A patent/CA2697193C/en active Active
  - 2008-09-12 MX MX2013010158A patent/MX344415B/en unknown
  - 2008-09-12 CA CA3187687A patent/CA3187687A1/en active Pending
  - 2008-09-12 EP EP08830652.7A patent/EP2193146B1/en active Active
- 2010
  - 2010-03-09 MX MX2021006269A patent/MX2021006269A/en unknown
- 2011
  - 2011-02-15 HK HK11101445.6A patent/HK1147271A1/en unknown
- 2014
  - 2014-01-08 US US14/150,129 patent/US10189894B2/en active Active
  - 2014-10-30 JP JP2014220988A patent/JP6383640B2/en active Active
- 2016
  - 2016-05-06 JP JP2016092989A patent/JP6434447B2/en active Active
- 2017
  - 2017-11-02 JP JP2017212474A patent/JP6685267B2/en active Active
- 2018
  - 2018-12-10 US US16/215,523 patent/US11008383B2/en active Active
- 2020
  - 2020-03-31 JP JP2020062671A patent/JP6997246B2/en active Active
- 2021
  - 2021-12-16 JP JP2021204097A patent/JP2022040137A/en not_active Withdrawn
- 2023
  - 2023-08-30 US US17/856,673 patent/US20230399386A1/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230399386A1 (en) | | Rationally designed, synthetic antibody libraries and uses therefor |
US11008568B2 (en) | | Rationally designed, synthetic antibody libraries and uses therefor |
US11390964B2 (en) | | Polyclonal mixtures of antibodies, and methods of making and using them |
US20230348901A1 (en) | | Rationally designed, synthetic antibody libraries and uses therefor |
AU2019204933B2 (en) | | Rationally designed, synthetic antibody libraries and uses therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |