WO2024030344A1 - Genetic algorithm and imodulon based optimization of media formulation for quality, titer, strain, and process improvement biologics - Google Patents
- Publication number
- WO2024030344A1 (PCT/US2023/029001)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- antibody
- gene
- media
- cell
- cells
Links
- 239000000203 mixture Substances 0.000 title claims abstract description 184
- 238000000034 method Methods 0.000 title claims abstract description 179
- 238000004422 calculation algorithm Methods 0.000 title claims abstract description 106
- 238000009472 formulation Methods 0.000 title claims abstract description 92
- 230000002068 genetic effect Effects 0.000 title claims description 71
- 238000005457 optimization Methods 0.000 title claims description 34
- 230000008569 process Effects 0.000 title description 30
- 230000006872 improvement Effects 0.000 title description 21
- 229960000074 biopharmaceutical Drugs 0.000 title description 2
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 259
- 230000014509 gene expression Effects 0.000 claims abstract description 78
- 238000002156 mixing Methods 0.000 claims abstract description 30
- 238000012258 culturing Methods 0.000 claims abstract description 28
- 230000001737 promoting effect Effects 0.000 claims abstract description 6
- 210000004027 cell Anatomy 0.000 claims description 212
- 102000004169 proteins and genes Human genes 0.000 claims description 63
- 239000011159 matrix material Substances 0.000 claims description 51
- 230000001976 improved effect Effects 0.000 claims description 43
- 230000006870 function Effects 0.000 claims description 41
- 238000003556 assay Methods 0.000 claims description 40
- 241000588724 Escherichia coli Species 0.000 claims description 35
- 239000000758 substrate Substances 0.000 claims description 34
- 239000012634 fragment Substances 0.000 claims description 31
- 230000001939 inductive effect Effects 0.000 claims description 28
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 28
- 238000003559 RNA-seq method Methods 0.000 claims description 27
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 claims description 26
- 210000000805 cytoplasm Anatomy 0.000 claims description 20
- 239000000411 inducer Substances 0.000 claims description 16
- 230000004075 alteration Effects 0.000 claims description 15
- 238000013459 approach Methods 0.000 claims description 15
- 238000012512 characterization method Methods 0.000 claims description 15
- 230000001965 increasing effect Effects 0.000 claims description 15
- 150000003839 salts Chemical class 0.000 claims description 15
- 238000012880 independent component analysis Methods 0.000 claims description 14
- 238000005259 measurement Methods 0.000 claims description 14
- 230000002829 reductive effect Effects 0.000 claims description 14
- 239000012092 media component Substances 0.000 claims description 13
- 229910052751 metal Inorganic materials 0.000 claims description 13
- 239000002184 metal Substances 0.000 claims description 13
- 229910052757 nitrogen Inorganic materials 0.000 claims description 13
- 102000039446 nucleic acids Human genes 0.000 claims description 13
- 108020004707 nucleic acids Proteins 0.000 claims description 13
- 150000007523 nucleic acids Chemical class 0.000 claims description 13
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 claims description 12
- 239000000427 antigen Substances 0.000 claims description 12
- 108091007433 antigens Proteins 0.000 claims description 12
- 102000036639 antigens Human genes 0.000 claims description 12
- 229910052799 carbon Inorganic materials 0.000 claims description 12
- 102000040430 polynucleotide Human genes 0.000 claims description 12
- 108091033319 polynucleotide Proteins 0.000 claims description 12
- 239000002157 polynucleotide Substances 0.000 claims description 12
- 238000000746 purification Methods 0.000 claims description 12
- 230000001580 bacterial effect Effects 0.000 claims description 11
- 230000015572 biosynthetic process Effects 0.000 claims description 11
- 238000012163 sequencing technique Methods 0.000 claims description 11
- 102000004190 Enzymes Human genes 0.000 claims description 10
- 108090000790 Enzymes Proteins 0.000 claims description 10
- 230000004927 fusion Effects 0.000 claims description 9
- 230000006798 recombination Effects 0.000 claims description 9
- 238000005215 recombination Methods 0.000 claims description 9
- 108010076504 Protein Sorting Signals Proteins 0.000 claims description 8
- 239000003446 ligand Substances 0.000 claims description 8
- 230000015654 memory Effects 0.000 claims description 8
- 150000002739 metals Chemical class 0.000 claims description 8
- 239000003607 modifier Substances 0.000 claims description 8
- 230000004044 response Effects 0.000 claims description 8
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 claims description 7
- 239000000872 buffer Substances 0.000 claims description 7
- 230000010261 cell growth Effects 0.000 claims description 7
- 238000002474 experimental method Methods 0.000 claims description 7
- 239000001301 oxygen Substances 0.000 claims description 7
- 229910052760 oxygen Inorganic materials 0.000 claims description 7
- 108020003175 receptors Proteins 0.000 claims description 7
- 102000005962 receptors Human genes 0.000 claims description 7
- 230000001225 therapeutic effect Effects 0.000 claims description 7
- 108010078791 Carrier Proteins Proteins 0.000 claims description 6
- 108010054477 Immunoglobulin Fab Fragments Proteins 0.000 claims description 6
- 102000001706 Immunoglobulin Fab Fragments Human genes 0.000 claims description 6
- 239000012491 analyte Substances 0.000 claims description 6
- 238000004113 cell culture Methods 0.000 claims description 6
- 239000003102 growth factor Substances 0.000 claims description 6
- 238000003306 harvesting Methods 0.000 claims description 6
- 239000000126 substance Substances 0.000 claims description 6
- 235000013343 vitamin Nutrition 0.000 claims description 6
- 239000011782 vitamin Substances 0.000 claims description 6
- 229930003231 vitamin Natural products 0.000 claims description 6
- 229940088594 vitamin Drugs 0.000 claims description 6
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 claims description 5
- 241000282414 Homo sapiens Species 0.000 claims description 5
- 239000002253 acid Substances 0.000 claims description 5
- 229940079593 drug Drugs 0.000 claims description 5
- 239000003814 drug Substances 0.000 claims description 5
- 210000004962 mammalian cell Anatomy 0.000 claims description 5
- 238000007254 oxidation reaction Methods 0.000 claims description 5
- 239000002243 precursor Substances 0.000 claims description 5
- 230000009467 reduction Effects 0.000 claims description 5
- 150000005846 sugar alcohols Chemical class 0.000 claims description 5
- 241000238631 Hexapoda Species 0.000 claims description 4
- 229920001222 biopolymer Polymers 0.000 claims description 4
- 238000013270 controlled release Methods 0.000 claims description 4
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 4
- 239000000796 flavoring agent Substances 0.000 claims description 4
- 235000019634 flavors Nutrition 0.000 claims description 4
- 230000000855 fungicidal effect Effects 0.000 claims description 4
- 239000000417 fungicide Substances 0.000 claims description 4
- 230000002363 herbicidal effect Effects 0.000 claims description 4
- 239000004009 herbicide Substances 0.000 claims description 4
- 238000000126 in silico method Methods 0.000 claims description 4
- 238000002493 microarray Methods 0.000 claims description 4
- 108091005601 modified peptides Proteins 0.000 claims description 4
- 102000044158 nucleic acid binding protein Human genes 0.000 claims description 4
- 108700020942 nucleic acid binding protein Proteins 0.000 claims description 4
- 239000000575 pesticide Substances 0.000 claims description 4
- 210000001236 prokaryotic cell Anatomy 0.000 claims description 4
- 229930000044 secondary metabolite Natural products 0.000 claims description 4
- 150000003384 small molecules Chemical class 0.000 claims description 4
- 239000007790 solid phase Substances 0.000 claims description 4
- 239000011573 trace mineral Substances 0.000 claims description 4
- 235000013619 trace mineral Nutrition 0.000 claims description 4
- BWGNESOTFCXPMA-UHFFFAOYSA-N Dihydrogen disulfide Chemical compound SS BWGNESOTFCXPMA-UHFFFAOYSA-N 0.000 claims description 3
- 102000004195 Isomerases Human genes 0.000 claims description 3
- 108090000769 Isomerases Proteins 0.000 claims description 3
- 108090000854 Oxidoreductases Proteins 0.000 claims description 3
- 102000004316 Oxidoreductases Human genes 0.000 claims description 3
- 238000000338 in vitro Methods 0.000 claims description 3
- 238000001727 in vivo Methods 0.000 claims description 3
- 230000003647 oxidation Effects 0.000 claims description 3
- 239000002245 particle Substances 0.000 claims description 3
- 239000013610 patient sample Substances 0.000 claims description 3
- 239000000843 powder Substances 0.000 claims description 3
- 238000011160 research Methods 0.000 claims description 3
- 108010027322 single cell proteins Proteins 0.000 claims description 3
- 150000003722 vitamin derivatives Chemical class 0.000 claims description 3
- 239000000047 product Substances 0.000 description 69
- 235000018102 proteins Nutrition 0.000 description 44
- 230000027455 binding Effects 0.000 description 25
- 102000004196 processed proteins & peptides Human genes 0.000 description 22
- 229920001184 polypeptide Polymers 0.000 description 21
- 238000004458 analytical method Methods 0.000 description 19
- 150000001413 amino acids Chemical class 0.000 description 18
- 230000035772 mutation Effects 0.000 description 18
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 16
- 230000000694 effects Effects 0.000 description 16
- 238000010586 diagram Methods 0.000 description 14
- 239000000463 material Substances 0.000 description 14
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 12
- XEEYBQQBJWHFJM-UHFFFAOYSA-N Iron Chemical compound [Fe] XEEYBQQBJWHFJM-UHFFFAOYSA-N 0.000 description 12
- 230000001590 oxidative effect Effects 0.000 description 12
- 238000013341 scale-up Methods 0.000 description 12
- 102000006010 Protein Disulfide-Isomerase Human genes 0.000 description 11
- 238000000855 fermentation Methods 0.000 description 11
- 108020003519 protein disulfide isomerase Proteins 0.000 description 11
- 229940024606 amino acid Drugs 0.000 description 10
- 235000001014 amino acid Nutrition 0.000 description 10
- 230000004151 fermentation Effects 0.000 description 10
- 238000004519 manufacturing process Methods 0.000 description 10
- 239000000523 sample Substances 0.000 description 10
- 230000012010 growth Effects 0.000 description 9
- -1 or serum Substances 0.000 description 9
- 238000000513 principal component analysis Methods 0.000 description 9
- 108091026890 Coding region Proteins 0.000 description 8
- 108010006519 Molecular Chaperones Proteins 0.000 description 8
- 229960002685 biotin Drugs 0.000 description 8
- 235000020958 biotin Nutrition 0.000 description 8
- 239000011616 biotin Substances 0.000 description 8
- 210000000349 chromosome Anatomy 0.000 description 8
- 230000002950 deficient Effects 0.000 description 8
- 239000000243 solution Substances 0.000 description 8
- 238000002198 surface plasmon resonance spectroscopy Methods 0.000 description 8
- 230000004083 survival effect Effects 0.000 description 8
- 241000894006 Bacteria Species 0.000 description 7
- 108010021625 Immunoglobulin Fragments Proteins 0.000 description 7
- 102000008394 Immunoglobulin Fragments Human genes 0.000 description 7
- 102000005431 Molecular Chaperones Human genes 0.000 description 7
- 238000006664 bond formation reaction Methods 0.000 description 7
- 108010047041 Complementarity Determining Regions Proteins 0.000 description 6
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 6
- 238000001514 detection method Methods 0.000 description 6
- VHJLVAABSRFDPM-QWWZWVQMSA-N dithiothreitol Chemical compound SC[C@@H](O)[C@H](O)CS VHJLVAABSRFDPM-QWWZWVQMSA-N 0.000 description 6
- 229910052742 iron Inorganic materials 0.000 description 6
- 239000002773 nucleotide Substances 0.000 description 6
- 125000003729 nucleotide group Chemical group 0.000 description 6
- 238000006467 substitution reaction Methods 0.000 description 6
- 101150057627 trxB gene Proteins 0.000 description 6
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Natural products OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 5
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Natural products OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 5
- 150000001875 compounds Chemical class 0.000 description 5
- 235000003642 hunger Nutrition 0.000 description 5
- 239000007788 liquid Substances 0.000 description 5
- 230000003938 response to stress Effects 0.000 description 5
- 230000037351 starvation Effects 0.000 description 5
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 4
- 239000002028 Biomass Substances 0.000 description 4
- SRBFZHDQGSBBOR-IOVATXLUSA-N D-xylopyranose Chemical compound O[C@@H]1COC(O)[C@H](O)[C@H]1O SRBFZHDQGSBBOR-IOVATXLUSA-N 0.000 description 4
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 4
- 108091005804 Peptidases Proteins 0.000 description 4
- 239000004365 Protease Substances 0.000 description 4
- 230000004071 biological effect Effects 0.000 description 4
- 230000001086 cytosolic effect Effects 0.000 description 4
- 239000000975 dye Substances 0.000 description 4
- 239000007789 gas Substances 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 239000013028 medium composition Substances 0.000 description 4
- 230000004060 metabolic process Effects 0.000 description 4
- 235000015097 nutrients Nutrition 0.000 description 4
- 230000001105 regulatory effect Effects 0.000 description 4
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 description 3
- 102000035195 Peptidases Human genes 0.000 description 3
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 3
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 3
- NINIDFKCEFEMDL-UHFFFAOYSA-N Sulfur Chemical compound [S] NINIDFKCEFEMDL-UHFFFAOYSA-N 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- PYMYPHUHKUWMLA-WDCZJNDASA-N arabinose Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)C=O PYMYPHUHKUWMLA-WDCZJNDASA-N 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 238000012575 bio-layer interferometry Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 235000018417 cysteine Nutrition 0.000 description 3
- 230000003247 decreasing effect Effects 0.000 description 3
- 238000010790 dilution Methods 0.000 description 3
- 239000012895 dilution Substances 0.000 description 3
- 238000009826 distribution Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000001943 fluorescence-activated cell sorting Methods 0.000 description 3
- RWSXRVCMGQZWBV-WDSKDSINSA-N glutathione Chemical compound OC(=O)[C@@H](N)CCC(=O)N[C@@H](CS)C(=O)NCC(O)=O RWSXRVCMGQZWBV-WDSKDSINSA-N 0.000 description 3
- 230000013595 glycosylation Effects 0.000 description 3
- 238000006206 glycosylation reaction Methods 0.000 description 3
- 101150096947 gshB gene Proteins 0.000 description 3
- 230000006698 induction Effects 0.000 description 3
- 238000010801 machine learning Methods 0.000 description 3
- 230000002503 metabolic effect Effects 0.000 description 3
- 238000002705 metabolomic analysis Methods 0.000 description 3
- 230000001431 metabolomic effect Effects 0.000 description 3
- 239000008188 pellet Substances 0.000 description 3
- 210000001322 periplasm Anatomy 0.000 description 3
- 239000013014 purified material Substances 0.000 description 3
- 229910052717 sulfur Inorganic materials 0.000 description 3
- 239000011593 sulfur Substances 0.000 description 3
- 230000002103 transcriptional effect Effects 0.000 description 3
- 102100024341 10 kDa heat shock protein, mitochondrial Human genes 0.000 description 2
- 102100038222 60 kDa heat shock protein, mitochondrial Human genes 0.000 description 2
- 241000203069 Archaea Species 0.000 description 2
- VTYYLEPIZMXCLO-UHFFFAOYSA-L Calcium carbonate Chemical compound [Ca+2].[O-]C([O-])=O VTYYLEPIZMXCLO-UHFFFAOYSA-L 0.000 description 2
- 108030002440 Catalase peroxidases Proteins 0.000 description 2
- 101710097430 Catalase-peroxidase Proteins 0.000 description 2
- 108010059013 Chaperonin 10 Proteins 0.000 description 2
- 108010058432 Chaperonin 60 Proteins 0.000 description 2
- 108020004705 Codon Proteins 0.000 description 2
- 108091035707 Consensus sequence Proteins 0.000 description 2
- FBPFZTCFMRRESA-FSIIMWSLSA-N D-Glucitol Natural products OC[C@H](O)[C@H](O)[C@@H](O)[C@H](O)CO FBPFZTCFMRRESA-FSIIMWSLSA-N 0.000 description 2
- FBPFZTCFMRRESA-JGWLITMVSA-N D-glucitol Chemical compound OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-JGWLITMVSA-N 0.000 description 2
- 101100015982 Dictyostelium discoideum gcsA gene Proteins 0.000 description 2
- 241000588722 Escherichia Species 0.000 description 2
- 101100532764 Escherichia coli (strain K12) scpC gene Proteins 0.000 description 2
- 241001522878 Escherichia coli B Species 0.000 description 2
- 241000192128 Gammaproteobacteria Species 0.000 description 2
- 108010010803 Gelatin Proteins 0.000 description 2
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 2
- 108010024636 Glutathione Proteins 0.000 description 2
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 2
- 102000003886 Glycoproteins Human genes 0.000 description 2
- 108090000288 Glycoproteins Proteins 0.000 description 2
- 102100023737 GrpE protein homolog 1, mitochondrial Human genes 0.000 description 2
- 102000004447 HSP40 Heat-Shock Proteins Human genes 0.000 description 2
- 108010042283 HSP40 Heat-Shock Proteins Proteins 0.000 description 2
- 101000829489 Homo sapiens GrpE protein homolog 1, mitochondrial Proteins 0.000 description 2
- 108010067060 Immunoglobulin Variable Region Proteins 0.000 description 2
- 102000017727 Immunoglobulin Variable Region Human genes 0.000 description 2
- XUJNEKJLAYXESH-REOHCLBHSA-N L-Cysteine Chemical compound SC[C@H](N)C(O)=O XUJNEKJLAYXESH-REOHCLBHSA-N 0.000 description 2
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 2
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 2
- GUBGYTABKSRVRQ-QKKXKWKRSA-N Lactose Natural products OC[C@H]1O[C@@H](O[C@H]2[C@H](O)[C@@H](O)C(O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@H]1O GUBGYTABKSRVRQ-QKKXKWKRSA-N 0.000 description 2
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 2
- 101100172084 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) egtA gene Proteins 0.000 description 2
- PXHVJJICTQNCMI-UHFFFAOYSA-N Nickel Chemical compound [Ni] PXHVJJICTQNCMI-UHFFFAOYSA-N 0.000 description 2
- 229910019142 PO4 Inorganic materials 0.000 description 2
- 239000001888 Peptone Substances 0.000 description 2
- 108010080698 Peptones Proteins 0.000 description 2
- 108010009736 Protein Hydrolysates Proteins 0.000 description 2
- 102000002278 Ribosomal Proteins Human genes 0.000 description 2
- 108010000605 Ribosomal Proteins Proteins 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 description 2
- 238000012300 Sequence Analysis Methods 0.000 description 2
- 241000607720 Serratia Species 0.000 description 2
- 108091023040 Transcription factor Proteins 0.000 description 2
- 102000040945 Transcription factor Human genes 0.000 description 2
- 101150024831 ahpC gene Proteins 0.000 description 2
- 239000003242 anti bacterial agent Substances 0.000 description 2
- 229940088710 antibiotic agent Drugs 0.000 description 2
- 101150076178 araE gene Proteins 0.000 description 2
- WQZGKKKJIJFFOK-VFUOTHLCSA-N beta-D-glucose Chemical compound OC[C@H]1O[C@@H](O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-VFUOTHLCSA-N 0.000 description 2
- 238000010378 bimolecular fluorescence complementation Methods 0.000 description 2
- 239000011230 binding agent Substances 0.000 description 2
- 230000003139 buffering effect Effects 0.000 description 2
- 210000004899 c-terminal region Anatomy 0.000 description 2
- 239000003153 chemical reaction reagent Substances 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 239000002299 complementary DNA Substances 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 description 2
- 229960002433 cysteine Drugs 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000013400 design of experiment Methods 0.000 description 2
- 230000000368 destabilizing effect Effects 0.000 description 2
- 239000013024 dilution buffer Substances 0.000 description 2
- 238000010494 dissociation reaction Methods 0.000 description 2
- 230000005593 dissociations Effects 0.000 description 2
- 238000010387 dual polarisation interferometry Methods 0.000 description 2
- 238000002296 dynamic light scattering Methods 0.000 description 2
- VWWQXMAJTJZDQX-UYBVJOGSSA-N flavin adenine dinucleotide Chemical compound C1=NC2=C(N)N=CN=C2N1[C@@H]([C@H](O)[C@@H]1O)O[C@@H]1CO[P@](O)(=O)O[P@@](O)(=O)OC[C@@H](O)[C@@H](O)[C@@H](O)CN1C2=NC(=O)NC(=O)C2=NC2=C1C=C(C)C(C)=C2 VWWQXMAJTJZDQX-UYBVJOGSSA-N 0.000 description 2
- 235000019162 flavin adenine dinucleotide Nutrition 0.000 description 2
- 239000011714 flavin adenine dinucleotide Substances 0.000 description 2
- 229940093632 flavin-adenine dinucleotide Drugs 0.000 description 2
- 238000000684 flow cytometry Methods 0.000 description 2
- 238000010388 flow-induced dispersion analysis Methods 0.000 description 2
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 2
- 239000011888 foil Substances 0.000 description 2
- 239000008273 gelatin Substances 0.000 description 2
- 229920000159 gelatin Polymers 0.000 description 2
- 235000019322 gelatine Nutrition 0.000 description 2
- 235000011852 gelatine desserts Nutrition 0.000 description 2
- 238000002921 genetic algorithm search Methods 0.000 description 2
- 230000004077 genetic alteration Effects 0.000 description 2
- 231100000118 genetic alteration Toxicity 0.000 description 2
- 239000001963 growth medium Substances 0.000 description 2
- 101150019860 gshA gene Proteins 0.000 description 2
- 230000008676 import Effects 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 230000010438 iron metabolism Effects 0.000 description 2
- 238000000111 isothermal titration calorimetry Methods 0.000 description 2
- 101150111658 katE gene Proteins 0.000 description 2
- 101150013110 katG gene Proteins 0.000 description 2
- 239000008101 lactose Substances 0.000 description 2
- 229960003136 leucine Drugs 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- HEBKCHPVOIAQTA-UHFFFAOYSA-N meso ribitol Natural products OCC(O)C(O)C(O)CO HEBKCHPVOIAQTA-UHFFFAOYSA-N 0.000 description 2
- 239000002207 metabolite Substances 0.000 description 2
- 229930182817 methionine Natural products 0.000 description 2
- 229960004452 methionine Drugs 0.000 description 2
- 230000000813 microbial effect Effects 0.000 description 2
- 238000001768 microscale thermophoresis Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 239000000178 monomer Substances 0.000 description 2
- 238000007481 next generation sequencing Methods 0.000 description 2
- 230000036542 oxidative stress Effects 0.000 description 2
- 101150093025 pepA gene Proteins 0.000 description 2
- 235000019319 peptone Nutrition 0.000 description 2
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 2
- 239000010452 phosphate Substances 0.000 description 2
- 230000012846 protein folding Effects 0.000 description 2
- 230000004850 protein–protein interaction Effects 0.000 description 2
- 238000010384 proximity ligation assay Methods 0.000 description 2
- 238000010383 quantitative immunoprecipitation combined with knock-down Methods 0.000 description 2
- 210000003705 ribosome Anatomy 0.000 description 2
- 238000010390 single colour reflectometry Methods 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 239000000600 sorbitol Substances 0.000 description 2
- 235000010356 sorbitol Nutrition 0.000 description 2
- 229960002920 sorbitol Drugs 0.000 description 2
- 241000894007 species Species 0.000 description 2
- 230000000087 stabilizing effect Effects 0.000 description 2
- 238000001370 static light scattering Methods 0.000 description 2
- 239000006228 supernatant Substances 0.000 description 2
- 230000002194 synthesizing effect Effects 0.000 description 2
- 238000010381 tandem affinity purification Methods 0.000 description 2
- 239000013077 target material Substances 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 108060008226 thioredoxin Proteins 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 230000032258 transport Effects 0.000 description 2
- 230000001960 triggered effect Effects 0.000 description 2
- 239000013598 vector Substances 0.000 description 2
- 238000012800 visualization Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- 239000012138 yeast extract Substances 0.000 description 2
- HDTRYLNUVZCQOY-UHFFFAOYSA-N α-D-glucopyranosyl-α-D-glucopyranoside Natural products OC1C(O)C(O)C(CO)OC1OC1C(O)C(O)C(O)C(CO)O1 HDTRYLNUVZCQOY-UHFFFAOYSA-N 0.000 description 1
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropanoic acid Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- AOFUBOWZWQFQJU-SNOJBQEQSA-N (2r,3s,4s,5r)-2,5-bis(hydroxymethyl)oxolane-2,3,4-triol;(2s,3r,4s,5s,6r)-6-(hydroxymethyl)oxane-2,3,4,5-tetrol Chemical compound OC[C@H]1O[C@](O)(CO)[C@@H](O)[C@@H]1O.OC[C@H]1O[C@H](O)[C@H](O)[C@@H](O)[C@@H]1O AOFUBOWZWQFQJU-SNOJBQEQSA-N 0.000 description 1
- OWEGMIWEEQEYGQ-UHFFFAOYSA-N 100676-05-9 Natural products OC1C(O)C(O)C(CO)OC1OCC1C(O)C(O)C(O)C(OC2C(OC(O)C(O)C2O)CO)O1 OWEGMIWEEQEYGQ-UHFFFAOYSA-N 0.000 description 1
- PIZHFBODNLEQBL-UHFFFAOYSA-N 2,2-diethoxy-1-phenylethanone Chemical compound CCOC(OCC)C(=O)C1=CC=CC=C1 PIZHFBODNLEQBL-UHFFFAOYSA-N 0.000 description 1
- FHVDTGUDJYJELY-UHFFFAOYSA-N 6-{[2-carboxy-4,5-dihydroxy-6-(phosphanyloxy)oxan-3-yl]oxy}-4,5-dihydroxy-3-phosphanyloxane-2-carboxylic acid Chemical compound O1C(C(O)=O)C(P)C(O)C(O)C1OC1C(C(O)=O)OC(OP)C(O)C1O FHVDTGUDJYJELY-UHFFFAOYSA-N 0.000 description 1
- NYHBQMYGNKIUIF-FJFJXFQQSA-N 9-beta-D-arabinofuranosylguanine Chemical compound C12=NC(N)=NC(O)=C2N=CN1[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O NYHBQMYGNKIUIF-FJFJXFQQSA-N 0.000 description 1
- 241000588624 Acinetobacter calcoaceticus Species 0.000 description 1
- ZKHQWZAMYRWXGA-KQYNXXCUSA-N Adenosine triphosphate Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)[C@H]1O ZKHQWZAMYRWXGA-KQYNXXCUSA-N 0.000 description 1
- 229920001817 Agar Polymers 0.000 description 1
- 241000589155 Agrobacterium tumefaciens Species 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 102000009027 Albumins Human genes 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- GUBGYTABKSRVRQ-XLOQQCSPSA-N Alpha-Lactose Chemical compound O[C@@H]1[C@@H](O)[C@@H](O)[C@@H](CO)O[C@H]1O[C@@H]1[C@@H](CO)O[C@H](O)[C@H](O)[C@H]1O GUBGYTABKSRVRQ-XLOQQCSPSA-N 0.000 description 1
- 241001135756 Alphaproteobacteria Species 0.000 description 1
- VHUUQVKOLVNVRT-UHFFFAOYSA-N Ammonium hydroxide Chemical compound [NH4+].[OH-] VHUUQVKOLVNVRT-UHFFFAOYSA-N 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 241000228245 Aspergillus niger Species 0.000 description 1
- 101100425074 Aspergillus oryzae (strain ATCC 42149 / RIB 40) thiA gene Proteins 0.000 description 1
- 241000589149 Azotobacter vinelandii Species 0.000 description 1
- 241000194108 Bacillus licheniformis Species 0.000 description 1
- 244000063299 Bacillus subtilis Species 0.000 description 1
- 235000014469 Bacillus subtilis Nutrition 0.000 description 1
- 101100345719 Bacillus subtilis (strain 168) mmgE gene Proteins 0.000 description 1
- 101100345721 Bacillus subtilis (strain 168) mmgF gene Proteins 0.000 description 1
- 241001135755 Betaproteobacteria Species 0.000 description 1
- 241000534630 Brevibacillus choshinensis Species 0.000 description 1
- 241000244203 Caenorhabditis elegans Species 0.000 description 1
- 101100454807 Caenorhabditis elegans lgg-1 gene Proteins 0.000 description 1
- 101100454808 Caenorhabditis elegans lgg-2 gene Proteins 0.000 description 1
- 101100217502 Caenorhabditis elegans lgg-3 gene Proteins 0.000 description 1
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 1
- 101100351592 Caldanaerobacter subterraneus subsp. tengcongensis (strain DSM 15242 / JCM 11007 / NBRC 100824 / MB4) pepT gene Proteins 0.000 description 1
- 241000010804 Caulobacter vibrioides Species 0.000 description 1
- RGJOEKWQDUBAIZ-IBOSZNHHSA-N CoASH Chemical compound O[C@@H]1[C@H](OP(O)(O)=O)[C@@H](COP(O)(=O)OP(O)(=O)OCC(C)(C)[C@@H](O)C(=O)NCCC(=O)NCCS)O[C@H]1N1C2=NC=NC(N)=C2N=C1 RGJOEKWQDUBAIZ-IBOSZNHHSA-N 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 241000252867 Cupriavidus metallidurans Species 0.000 description 1
- 102000015295 Cysteine Dioxygenase Human genes 0.000 description 1
- 108010039724 Cysteine dioxygenase Proteins 0.000 description 1
- FBPFZTCFMRRESA-KVTDHHQDSA-N D-Mannitol Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO FBPFZTCFMRRESA-KVTDHHQDSA-N 0.000 description 1
- HEBKCHPVOIAQTA-QWWZWVQMSA-N D-arabinitol Chemical compound OC[C@@H](O)C(O)[C@H](O)CO HEBKCHPVOIAQTA-QWWZWVQMSA-N 0.000 description 1
- SHZGCJCMOBCMKK-UHFFFAOYSA-N D-mannomethylose Natural products CC1OC(O)C(O)C(O)C1O SHZGCJCMOBCMKK-UHFFFAOYSA-N 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- HMFHBZSHGGEWLO-IOVATXLUSA-N D-xylofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@H]1O HMFHBZSHGGEWLO-IOVATXLUSA-N 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 101100317179 Dictyostelium discoideum vps26 gene Proteins 0.000 description 1
- 108010089072 Dolichyl-diphosphooligosaccharide-protein glycotransferase Proteins 0.000 description 1
- 238000002965 ELISA Methods 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 101100407639 Emericella nidulans (strain FGSC A4 / ATCC 38163 / CBS 112.46 / NRRL 194 / M139) prtB gene Proteins 0.000 description 1
- 241000588914 Enterobacter Species 0.000 description 1
- 241000588921 Enterobacteriaceae Species 0.000 description 1
- 241000588698 Erwinia Species 0.000 description 1
- 239000004386 Erythritol Substances 0.000 description 1
- UNXHWFMMPAWVPI-UHFFFAOYSA-N Erythritol Natural products OCC(O)C(O)CO UNXHWFMMPAWVPI-UHFFFAOYSA-N 0.000 description 1
- 101100108035 Escherichia coli (strain K12) acrE gene Proteins 0.000 description 1
- 101100002294 Escherichia coli (strain K12) argK gene Proteins 0.000 description 1
- 101100172462 Escherichia coli (strain K12) envC gene Proteins 0.000 description 1
- 101100396088 Escherichia coli (strain K12) hybD gene Proteins 0.000 description 1
- 101100071946 Escherichia coli (strain K12) iadA gene Proteins 0.000 description 1
- 101100399519 Escherichia coli (strain K12) loiP gene Proteins 0.000 description 1
- 101100416564 Escherichia coli (strain K12) rfbA gene Proteins 0.000 description 1
- 101100095178 Escherichia coli (strain K12) scpB gene Proteins 0.000 description 1
- 101100482183 Escherichia coli (strain K12) trhP gene Proteins 0.000 description 1
- 101100101536 Escherichia coli (strain K12) ubiU gene Proteins 0.000 description 1
- 101100052373 Escherichia coli (strain K12) ycaL gene Proteins 0.000 description 1
- 101100075131 Escherichia coli (strain K12) ycbZ gene Proteins 0.000 description 1
- 101100544787 Escherichia coli (strain K12) ypdF gene Proteins 0.000 description 1
- 102000009109 Fc receptors Human genes 0.000 description 1
- 108010087819 Fc receptors Proteins 0.000 description 1
- 241000192125 Firmicutes Species 0.000 description 1
- 238000000729 Fisher's exact test Methods 0.000 description 1
- 229930091371 Fructose Natural products 0.000 description 1
- 239000005715 Fructose Substances 0.000 description 1
- RFSUNEUAIZKAJO-ARQDHWQXSA-N Fructose Chemical compound OC[C@H]1O[C@](O)(CO)[C@@H](O)[C@@H]1O RFSUNEUAIZKAJO-ARQDHWQXSA-N 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 108010081687 Glutamate-cysteine ligase Proteins 0.000 description 1
- 102100039696 Glutamate-cysteine ligase catalytic subunit Human genes 0.000 description 1
- WHUUTDBJXJRKMK-UHFFFAOYSA-N Glutamic acid Natural products OC(=O)C(N)CCC(O)=O WHUUTDBJXJRKMK-UHFFFAOYSA-N 0.000 description 1
- 102000017278 Glutaredoxin Human genes 0.000 description 1
- 108050005205 Glutaredoxin Proteins 0.000 description 1
- 108010063907 Glutathione Reductase Proteins 0.000 description 1
- 102100036442 Glutathione reductase, mitochondrial Human genes 0.000 description 1
- 108010036164 Glutathione synthase Proteins 0.000 description 1
- 102100034294 Glutathione synthetase Human genes 0.000 description 1
- 239000004471 Glycine Substances 0.000 description 1
- 108700023372 Glycosyltransferases Proteins 0.000 description 1
- 102000051366 Glycosyltransferases Human genes 0.000 description 1
- 241000204933 Haloferax volcanii Species 0.000 description 1
- 101000599573 Homo sapiens InaD-like protein Proteins 0.000 description 1
- 241001480714 Humicola insolens Species 0.000 description 1
- 101001117161 Humicola insolens Protein disulfide-isomerase Proteins 0.000 description 1
- OAKJQQAXSVQMHS-UHFFFAOYSA-N Hydrazine Chemical compound NN OAKJQQAXSVQMHS-UHFFFAOYSA-N 0.000 description 1
- 101100519490 Idiomarina loihiensis (strain ATCC BAA-735 / DSM 15497 / L2-TR) pepQ1 gene Proteins 0.000 description 1
- 108060003951 Immunoglobulin Proteins 0.000 description 1
- 102100037978 InaD-like protein Human genes 0.000 description 1
- 241000588748 Klebsiella Species 0.000 description 1
- ONIBWKKTOPOVIA-BYPYZUCNSA-N L-Proline Chemical compound OC(=O)[C@@H]1CCCN1 ONIBWKKTOPOVIA-BYPYZUCNSA-N 0.000 description 1
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 1
- ODKSFYDXXFIFQN-BYPYZUCNSA-P L-argininium(2+) Chemical compound NC(=[NH2+])NCCC[C@H]([NH3+])C(O)=O ODKSFYDXXFIFQN-BYPYZUCNSA-P 0.000 description 1
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 description 1
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 1
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 1
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 description 1
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 description 1
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- SHZGCJCMOBCMKK-JFNONXLTSA-N L-rhamnopyranose Chemical compound C[C@@H]1OC(O)[C@H](O)[C@H](O)[C@H]1O SHZGCJCMOBCMKK-JFNONXLTSA-N 0.000 description 1
- PNNNRSAQSRJVSB-UHFFFAOYSA-N L-rhamnose Natural products CC(O)C(O)C(O)C(O)C=O PNNNRSAQSRJVSB-UHFFFAOYSA-N 0.000 description 1
- AYFVYJQAPQTCCC-GBXIJSLDSA-N L-threonine Chemical compound C[C@@H](O)[C@H](N)C(O)=O AYFVYJQAPQTCCC-GBXIJSLDSA-N 0.000 description 1
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 1
- 240000001929 Lactobacillus brevis Species 0.000 description 1
- 235000013957 Lactobacillus brevis Nutrition 0.000 description 1
- 241000186679 Lactobacillus buchneri Species 0.000 description 1
- 101100256118 Latilactobacillus sakei sakP gene Proteins 0.000 description 1
- FYYHWMGAXLPEAU-UHFFFAOYSA-N Magnesium Chemical compound [Mg] FYYHWMGAXLPEAU-UHFFFAOYSA-N 0.000 description 1
- 239000005913 Maltodextrin Substances 0.000 description 1
- 229920002774 Maltodextrin Polymers 0.000 description 1
- GUBGYTABKSRVRQ-PICCSMPSSA-N Maltose Natural products O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@@H]1O[C@@H]1[C@@H](CO)OC(O)[C@H](O)[C@H]1O GUBGYTABKSRVRQ-PICCSMPSSA-N 0.000 description 1
- 229930195725 Mannitol Natural products 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- ZOKXTWBITQBERF-UHFFFAOYSA-N Molybdenum Chemical compound [Mo] ZOKXTWBITQBERF-UHFFFAOYSA-N 0.000 description 1
- 241000699660 Mus musculus Species 0.000 description 1
- 101100178822 Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) htrA1 gene Proteins 0.000 description 1
- 101100509674 Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) katG3 gene Proteins 0.000 description 1
- 108091028043 Nucleic acid sequence Proteins 0.000 description 1
- 108010090127 Periplasmic Proteins Proteins 0.000 description 1
- 102000007456 Peroxiredoxin Human genes 0.000 description 1
- ONIBWKKTOPOVIA-UHFFFAOYSA-N Proline Natural products OC(=O)C1CCCN1 ONIBWKKTOPOVIA-UHFFFAOYSA-N 0.000 description 1
- XBDQKXXYIPTUBI-UHFFFAOYSA-M Propionate Chemical compound CCC([O-])=O XBDQKXXYIPTUBI-UHFFFAOYSA-M 0.000 description 1
- 241000192142 Proteobacteria Species 0.000 description 1
- 108010026552 Proteome Proteins 0.000 description 1
- 241000588769 Proteus <enterobacteria> Species 0.000 description 1
- 241000589517 Pseudomonas aeruginosa Species 0.000 description 1
- 241000589776 Pseudomonas putida Species 0.000 description 1
- 238000002123 RNA extraction Methods 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 108700008625 Reporter Genes Proteins 0.000 description 1
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 1
- 101100277437 Rhizobium meliloti (strain 1021) degP1 gene Proteins 0.000 description 1
- 241000191043 Rhodobacter sphaeroides Species 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 108010041388 Ribonucleotide Reductases Proteins 0.000 description 1
- 102000000505 Ribonucleotide Reductases Human genes 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 101150088541 SCPA gene Proteins 0.000 description 1
- 241000607142 Salmonella Species 0.000 description 1
- 241000293869 Salmonella enterica subsp. enterica serovar Typhimurium Species 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 241000607768 Shigella Species 0.000 description 1
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 1
- 241000589196 Sinorhizobium meliloti Species 0.000 description 1
- 102000039471 Small Nuclear RNA Human genes 0.000 description 1
- 108020004688 Small Nuclear RNA Proteins 0.000 description 1
- 108020003224 Small Nucleolar RNA Proteins 0.000 description 1
- 102000042773 Small Nucleolar RNA Human genes 0.000 description 1
- 244000061456 Solanum tuberosum Species 0.000 description 1
- 235000002595 Solanum tuberosum Nutrition 0.000 description 1
- HIWPGCMGAMJNRG-ACCAVRKYSA-N Sophorose Natural products O([C@H]1[C@@H](O)[C@@H](O)[C@@H](CO)O[C@H]1O)[C@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@@H](CO)O1 HIWPGCMGAMJNRG-ACCAVRKYSA-N 0.000 description 1
- 229920002472 Starch Polymers 0.000 description 1
- 244000057717 Streptococcus lactis Species 0.000 description 1
- 235000014897 Streptococcus lactis Nutrition 0.000 description 1
- 241000187398 Streptomyces lividans Species 0.000 description 1
- 229930006000 Sucrose Natural products 0.000 description 1
- CZMRCDWAGMRECN-UGDNZRGBSA-N Sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 1
- QAOWNCQODCNURD-UHFFFAOYSA-L Sulfate Chemical compound [O-]S([O-])(=O)=O QAOWNCQODCNURD-UHFFFAOYSA-L 0.000 description 1
- 241000205091 Sulfolobus solfataricus Species 0.000 description 1
- 101100157012 Thermoanaerobacterium saccharolyticum (strain DSM 8691 / JW/SL-YS485) xynB gene Proteins 0.000 description 1
- 101710167005 Thiol:disulfide interchange protein DsbD Proteins 0.000 description 1
- 102100036407 Thioredoxin Human genes 0.000 description 1
- 102000002933 Thioredoxin Human genes 0.000 description 1
- 102000013090 Thioredoxin-Disulfide Reductase Human genes 0.000 description 1
- 108010079911 Thioredoxin-disulfide reductase Proteins 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- HDTRYLNUVZCQOY-WSWWMNSNSA-N Trehalose Natural products O[C@@H]1[C@@H](O)[C@@H](O)[C@@H](CO)O[C@@H]1O[C@@H]1[C@H](O)[C@@H](O)[C@@H](O)[C@@H](CO)O1 HDTRYLNUVZCQOY-WSWWMNSNSA-N 0.000 description 1
- 101710154918 Trigger factor Proteins 0.000 description 1
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 1
- 108020004417 Untranslated RNA Proteins 0.000 description 1
- 102000039634 Untranslated RNA Human genes 0.000 description 1
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 1
- OIRDTQYFTABQOQ-UHTZMRCNSA-N Vidarabine Chemical group C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O OIRDTQYFTABQOQ-UHTZMRCNSA-N 0.000 description 1
- 239000005862 Whey Substances 0.000 description 1
- 108010046377 Whey Proteins Proteins 0.000 description 1
- 102000007544 Whey Proteins Human genes 0.000 description 1
- 101100340151 Wolinella succinogenes (strain ATCC 29543 / DSM 1740 / LMG 7466 / NCTC 11488 / FDC 602W) hydD gene Proteins 0.000 description 1
- TVXBFESIOXBWNM-UHFFFAOYSA-N Xylitol Natural products OCCC(O)C(O)C(O)CCO TVXBFESIOXBWNM-UHFFFAOYSA-N 0.000 description 1
- 101150098235 abgA gene Proteins 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 238000002299 affinity electrophoresis Methods 0.000 description 1
- 239000008272 agar Substances 0.000 description 1
- 235000010419 agar Nutrition 0.000 description 1
- 229960003767 alanine Drugs 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 229940072056 alginate Drugs 0.000 description 1
- 235000010443 alginic acid Nutrition 0.000 description 1
- 229920000615 alginic acid Polymers 0.000 description 1
- HDTRYLNUVZCQOY-LIZSDCNHSA-N alpha,alpha-trehalose Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@@H]1O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 HDTRYLNUVZCQOY-LIZSDCNHSA-N 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- WQZGKKKJIJFFOK-PHYPRBDBSA-N alpha-D-galactose Chemical compound OC[C@H]1O[C@H](O)[C@H](O)[C@@H](O)[C@H]1O WQZGKKKJIJFFOK-PHYPRBDBSA-N 0.000 description 1
- 229910052782 aluminium Inorganic materials 0.000 description 1
- 239000004411 aluminium Substances 0.000 description 1
- XAGFODPZIPBFFR-UHFFFAOYSA-N aluminium Chemical compound [Al] XAGFODPZIPBFFR-UHFFFAOYSA-N 0.000 description 1
- 239000000908 ammonium hydroxide Substances 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 230000009830 antibody antigen interaction Effects 0.000 description 1
- 239000004599 antimicrobial Substances 0.000 description 1
- 239000003963 antioxidant agent Substances 0.000 description 1
- OIRDTQYFTABQOQ-UHFFFAOYSA-N ara-adenosine Natural products Nc1ncnc2n(cnc12)C1OC(CO)C(O)C1O OIRDTQYFTABQOQ-UHFFFAOYSA-N 0.000 description 1
- 101150035354 araA gene Proteins 0.000 description 1
- 101150097746 araB gene Proteins 0.000 description 1
- 101150017736 araD gene Proteins 0.000 description 1
- 101150084021 araG gene Proteins 0.000 description 1
- 101150023525 araH gene Proteins 0.000 description 1
- 101150092394 argK gene Proteins 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 229960003121 arginine Drugs 0.000 description 1
- 235000009697 arginine Nutrition 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 229960005261 aspartic acid Drugs 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 1
- GUBGYTABKSRVRQ-QUYVBRFLSA-N beta-maltose Chemical compound OC[C@H]1O[C@H](O[C@H]2[C@H](O)[C@@H](O)[C@H](O)O[C@@H]2CO)[C@H](O)[C@@H](O)[C@@H]1O GUBGYTABKSRVRQ-QUYVBRFLSA-N 0.000 description 1
- HIWPGCMGAMJNRG-UHFFFAOYSA-N beta-sophorose Natural products OC1C(O)C(CO)OC(O)C1OC1C(O)C(O)C(O)C(CO)O1 HIWPGCMGAMJNRG-UHFFFAOYSA-N 0.000 description 1
- 239000003833 bile salt Substances 0.000 description 1
- 229940093761 bile salts Drugs 0.000 description 1
- 238000010364 biochemical engineering Methods 0.000 description 1
- 230000008827 biological function Effects 0.000 description 1
- 239000012472 biological sample Substances 0.000 description 1
- 239000008280 blood Substances 0.000 description 1
- 210000004369 blood Anatomy 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 238000010804 cDNA synthesis Methods 0.000 description 1
- 239000011575 calcium Substances 0.000 description 1
- 229910052791 calcium Inorganic materials 0.000 description 1
- 229910000019 calcium carbonate Inorganic materials 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 229940041514 candida albicans extract Drugs 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 235000014633 carbohydrates Nutrition 0.000 description 1
- 108010079058 casein hydrolysate Proteins 0.000 description 1
- 239000006143 cell culture medium Substances 0.000 description 1
- 230000004663 cell proliferation Effects 0.000 description 1
- 230000004656 cell transport Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000033077 cellular process Effects 0.000 description 1
- 230000005465 channeling Effects 0.000 description 1
- 239000003610 charcoal Substances 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003638 chemical reducing agent Substances 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- RGJOEKWQDUBAIZ-UHFFFAOYSA-N coenzime A Natural products OC1C(OP(O)(O)=O)C(COP(O)(=O)OP(O)(=O)OCC(C)(C)C(O)C(=O)NCCC(=O)NCCS)OC1N1C2=NC=NC(N)=C2N=C1 RGJOEKWQDUBAIZ-UHFFFAOYSA-N 0.000 description 1
- 239000005516 coenzyme A Substances 0.000 description 1
- 229940093530 coenzyme a Drugs 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000004154 complement system Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 229910052802 copper Inorganic materials 0.000 description 1
- 239000010949 copper Substances 0.000 description 1
- 238000004132 cross linking Methods 0.000 description 1
- 150000001945 cysteines Chemical class 0.000 description 1
- 101150084095 ddpX gene Proteins 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 101150018266 degP gene Proteins 0.000 description 1
- 101150085919 degQ gene Proteins 0.000 description 1
- 101150083941 degS gene Proteins 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- KDTSHFARGAKYJN-UHFFFAOYSA-N dephosphocoenzyme A Natural products OC1C(O)C(COP(O)(=O)OP(O)(=O)OCC(C)(C)C(O)C(=O)NCCC(=O)NCCS)OC1N1C2=NC=NC(N)=C2N=C1 KDTSHFARGAKYJN-UHFFFAOYSA-N 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 239000008121 dextrose Substances 0.000 description 1
- AIUDWMLXCFRVDR-UHFFFAOYSA-N dimethyl 2-(3-ethyl-3-methylpentyl)propanedioate Chemical class CCC(C)(CC)CCC(C(=O)OC)C(=O)OC AIUDWMLXCFRVDR-UHFFFAOYSA-N 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 238000006911 enzymatic reaction Methods 0.000 description 1
- UNXHWFMMPAWVPI-ZXZARUISSA-N erythritol Chemical compound OC[C@H](O)[C@H](O)CO UNXHWFMMPAWVPI-ZXZARUISSA-N 0.000 description 1
- 235000019414 erythritol Nutrition 0.000 description 1
- 229940009714 erythritol Drugs 0.000 description 1
- 235000020774 essential nutrients Nutrition 0.000 description 1
- 230000010429 evolutionary process Effects 0.000 description 1
- 238000013401 experimental design Methods 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000000198 fluorescence anisotropy Methods 0.000 description 1
- 238000002875 fluorescence polarization Methods 0.000 description 1
- 102000034287 fluorescent proteins Human genes 0.000 description 1
- 108091006047 fluorescent proteins Proteins 0.000 description 1
- 101150052059 frvX gene Proteins 0.000 description 1
- 108020001507 fusion proteins Proteins 0.000 description 1
- 102000037865 fusion proteins Human genes 0.000 description 1
- 229930182830 galactose Natural products 0.000 description 1
- 239000003349 gelling agent Substances 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 230000005017 genetic modification Effects 0.000 description 1
- 235000013617 genetically modified food Nutrition 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 229960002989 glutamic acid Drugs 0.000 description 1
- 235000013922 glutamic acid Nutrition 0.000 description 1
- 239000004220 glutamic acid Substances 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 235000004554 glutamine Nutrition 0.000 description 1
- 229960002743 glutamine Drugs 0.000 description 1
- 229960003180 glutathione Drugs 0.000 description 1
- 229960002449 glycine Drugs 0.000 description 1
- 101150077538 gor gene Proteins 0.000 description 1
- 150000003278 haem Chemical class 0.000 description 1
- 229940025294 hemin Drugs 0.000 description 1
- BTIJJDXEELBZFS-QDUVMHSLSA-K hemin Chemical compound CC1=C(CCC(O)=O)C(C=C2C(CCC(O)=O)=C(C)\C(N2[Fe](Cl)N23)=C\4)=N\C1=C/C2=C(C)C(C=C)=C3\C=C/1C(C)=C(C=C)C/4=N\1 BTIJJDXEELBZFS-QDUVMHSLSA-K 0.000 description 1
- 235000019534 high fructose corn syrup Nutrition 0.000 description 1
- 238000013537 high throughput screening Methods 0.000 description 1
- 238000012165 high-throughput sequencing Methods 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 229960002885 histidine Drugs 0.000 description 1
- 101150112675 htpX gene Proteins 0.000 description 1
- 101150090028 hyaD gene Proteins 0.000 description 1
- 101150031015 hycH gene Proteins 0.000 description 1
- 108010070336 hydroperoxidase II Proteins 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 102000018358 immunoglobulin Human genes 0.000 description 1
- 239000012535 impurity Substances 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 238000009776 industrial production Methods 0.000 description 1
- 238000001802 infusion Methods 0.000 description 1
- 238000011081 inoculation Methods 0.000 description 1
- 229910052500 inorganic mineral Inorganic materials 0.000 description 1
- 229910052816 inorganic phosphate Inorganic materials 0.000 description 1
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 1
- 229960000310 isoleucine Drugs 0.000 description 1
- 238000006317 isomerization reaction Methods 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000012933 kinetic analysis Methods 0.000 description 1
- 238000010380 label transfer Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 239000000832 lactitol Substances 0.000 description 1
- 235000010448 lactitol Nutrition 0.000 description 1
- VQHSOMBJVWLPSR-JVCRWLNRSA-N lactitol Chemical compound OC[C@H](O)[C@@H](O)[C@@H]([C@H](O)CO)O[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O VQHSOMBJVWLPSR-JVCRWLNRSA-N 0.000 description 1
- 229960003451 lactitol Drugs 0.000 description 1
- 231100000518 lethal Toxicity 0.000 description 1
- 230000001665 lethal effect Effects 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 239000011777 magnesium Substances 0.000 description 1
- 229910052749 magnesium Inorganic materials 0.000 description 1
- 229940035034 maltodextrin Drugs 0.000 description 1
- WPBNNNQJVZRUHP-UHFFFAOYSA-L manganese(2+);methyl n-[[2-(methoxycarbonylcarbamothioylamino)phenyl]carbamothioyl]carbamate;n-[2-(sulfidocarbothioylamino)ethyl]carbamodithioate Chemical compound [Mn+2].[S-]C(=S)NCCNC([S-])=S.COC(=O)NC(=S)NC1=CC=CC=C1NC(=S)NC(=O)OC WPBNNNQJVZRUHP-UHFFFAOYSA-L 0.000 description 1
- 239000000594 mannitol Substances 0.000 description 1
- 235000010355 mannitol Nutrition 0.000 description 1
- 229960001855 mannitol Drugs 0.000 description 1
- 238000007620 mathematical function Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 235000012054 meals Nutrition 0.000 description 1
- 235000013372 meat Nutrition 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 108091070501 miRNA Proteins 0.000 description 1
- 239000002679 microRNA Substances 0.000 description 1
- 238000009629 microbiological culture Methods 0.000 description 1
- 244000005700 microbiome Species 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 239000011707 mineral Substances 0.000 description 1
- 235000010755 mineral Nutrition 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 229910052750 molybdenum Inorganic materials 0.000 description 1
- 239000011733 molybdenum Substances 0.000 description 1
- 229950006238 nadide Drugs 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 229910052759 nickel Inorganic materials 0.000 description 1
- 229930027945 nicotinamide-adenine dinucleotide Natural products 0.000 description 1
- BOPGDPNILDQYTO-NNYOXOHSSA-N nicotinamide-adenine dinucleotide Chemical compound C1=CCC(C(=O)N)=CN1[C@H]1[C@H](O)[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OC[C@@H]2[C@H]([C@@H](O)[C@@H](O2)N2C3=NC=NC(N)=C3N=C2)O)O1 BOPGDPNILDQYTO-NNYOXOHSSA-N 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 235000016709 nutrition Nutrition 0.000 description 1
- 238000013386 optimize process Methods 0.000 description 1
- 101150035909 pepB gene Proteins 0.000 description 1
- 101150052109 pepDA gene Proteins 0.000 description 1
- 101150104069 pepE gene Proteins 0.000 description 1
- 101150064613 pepN gene Proteins 0.000 description 1
- 101150074180 pepP gene Proteins 0.000 description 1
- 101150095786 pepPI gene Proteins 0.000 description 1
- 101150017363 pepQ gene Proteins 0.000 description 1
- 101150038087 pepd gene Proteins 0.000 description 1
- 229940066779 peptones Drugs 0.000 description 1
- 150000002978 peroxides Chemical class 0.000 description 1
- 108030002458 peroxiredoxin Proteins 0.000 description 1
- 238000002823 phage display Methods 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 229960005190 phenylalanine Drugs 0.000 description 1
- 230000035790 physiological processes and functions Effects 0.000 description 1
- 239000013612 plasmid Substances 0.000 description 1
- 239000000244 polyoxyethylene sorbitan monooleate Substances 0.000 description 1
- 235000010482 polyoxyethylene sorbitan monooleate Nutrition 0.000 description 1
- 229920001296 polysiloxane Polymers 0.000 description 1
- 229920000053 polysorbate 80 Polymers 0.000 description 1
- 229940068968 polysorbate 80 Drugs 0.000 description 1
- 101150079081 pphB gene Proteins 0.000 description 1
- 230000002028 premature Effects 0.000 description 1
- 239000006041 probiotic Substances 0.000 description 1
- 230000000529 probiotic effect Effects 0.000 description 1
- 235000018291 probiotics Nutrition 0.000 description 1
- 229960002429 proline Drugs 0.000 description 1
- 239000003223 protective agent Substances 0.000 description 1
- 230000004952 protein activity Effects 0.000 description 1
- 230000004845 protein aggregation Effects 0.000 description 1
- 238000000159 protein binding assay Methods 0.000 description 1
- 239000003531 protein hydrolysate Substances 0.000 description 1
- 101150085802 prpB gene Proteins 0.000 description 1
- 101150077061 prpD gene Proteins 0.000 description 1
- 101150060364 ptrA gene Proteins 0.000 description 1
- 101150014928 ptrB gene Proteins 0.000 description 1
- 101150069887 ptsP gene Proteins 0.000 description 1
- 238000013442 quality metrics Methods 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000010791 quenching Methods 0.000 description 1
- 230000000171 quenching effect Effects 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 101150055510 rfbB gene Proteins 0.000 description 1
- 101150091078 rhaA gene Proteins 0.000 description 1
- 101150032455 rhaB gene Proteins 0.000 description 1
- 101150052254 rhaD gene Proteins 0.000 description 1
- 101150097837 rhaT gene Proteins 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 101150099889 rmlA gene Proteins 0.000 description 1
- 101150043440 rmlB gene Proteins 0.000 description 1
- 102220195881 rs1057517859 Human genes 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 101150024466 scpB gene Proteins 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 229960001153 serine Drugs 0.000 description 1
- 210000002966 serum Anatomy 0.000 description 1
- 101150015861 sgcX gene Proteins 0.000 description 1
- 239000000741 silica gel Substances 0.000 description 1
- 229910002027 silica gel Inorganic materials 0.000 description 1
- 101150011963 sohB gene Proteins 0.000 description 1
- 239000002689 soil Substances 0.000 description 1
- PZDOWFGHCNHPQD-VNNZMYODSA-N sophorose Chemical compound OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](C=O)O[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O PZDOWFGHCNHPQD-VNNZMYODSA-N 0.000 description 1
- 101150105933 sppA gene Proteins 0.000 description 1
- 101150003162 sprT gene Proteins 0.000 description 1
- 101150033155 sspP gene Proteins 0.000 description 1
- 239000012128 staining reagent Substances 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 239000008107 starch Substances 0.000 description 1
- 235000019698 starch Nutrition 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 230000035882 stress Effects 0.000 description 1
- 239000005720 sucrose Substances 0.000 description 1
- 108010001535 sulfhydryl oxidase Proteins 0.000 description 1
- 230000001502 supplementing effect Effects 0.000 description 1
- 239000004094 surface-active agent Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 108010057210 telomerase RNA Proteins 0.000 description 1
- 238000010257 thawing Methods 0.000 description 1
- 229940126622 therapeutic monoclonal antibody Drugs 0.000 description 1
- 229940094937 thioredoxin Drugs 0.000 description 1
- 229960002898 threonine Drugs 0.000 description 1
- 210000001519 tissue Anatomy 0.000 description 1
- 101150102294 tldD gene Proteins 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 230000001296 transplacental effect Effects 0.000 description 1
- 229960004799 tryptophan Drugs 0.000 description 1
- 229960004441 tyrosine Drugs 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 229960004295 valine Drugs 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
- 101150011516 xlnD gene Proteins 0.000 description 1
- 101150052264 xylA gene Proteins 0.000 description 1
- 101150110790 xylB gene Proteins 0.000 description 1
- 101150010108 xylF gene Proteins 0.000 description 1
- 101150068654 xylG gene Proteins 0.000 description 1
- 101150004248 xylH gene Proteins 0.000 description 1
- 239000000811 xylitol Substances 0.000 description 1
- 235000010447 xylitol Nutrition 0.000 description 1
- 229960002675 xylitol Drugs 0.000 description 1
- HEBKCHPVOIAQTA-SCDXWVJYSA-N xylitol Chemical compound OC[C@H](O)[C@@H](O)[C@H](O)CO HEBKCHPVOIAQTA-SCDXWVJYSA-N 0.000 description 1
- 238000003158 yeast two-hybrid assay Methods 0.000 description 1
- 101150087407 ygeY gene Proteins 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/5005—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
- G01N33/5008—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
- G01N33/502—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics for testing non-proliferative effects
- G01N33/5023—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics for testing non-proliferative effects on expression patterns
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q3/00—Condition responsive control processes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
Definitions
- the subject matter provided herein relates generally to the field of cell media optimization, and more specifically, to methods and systems for optimizing cell media formulations using optimization and search techniques inspired by the principles of natural selection and genetics that mimic evolutionary processes (e.g., genetic algorithms).
- Bioreactor media is comprised of many components.
- the choice of components and component concentrations can have a profound impact on product quality and titer.
- Current methods for media optimization, even when automated and using advanced mixing algorithms, are time consuming and often do not elucidate the underlying physiological reasons for the improvements in product quality and titer conferred by the optimized mixture.
- Independent component analysis (ICA) and RNAseq have been combined to identify co-expressed, functionally related gene sets (i.e., determine transcriptional regulatory network activation), but to date a comprehensive method and predictive model combining advanced mixing algorithms with advanced transcriptional, sequencing and gene network analyses has not been described.
- a method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.
- the present disclosure provides a method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.
- the present disclosure provides a method of increasing the yield of biomolecule expression in a cell culture system, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.
- the present disclosure also provides an aforementioned method optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.
- an aforementioned method wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm.
- the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet microarray system, a powder mixing system, and a microfluidic mixing system.
- the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.
- the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.
- an aforementioned method wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.
- the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.
- the present disclosure also provides an aforementioned method wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.
- an aforementioned method wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.
- the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a non-commercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.
- an aforementioned method wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest. In still another embodiment, an aforementioned method is provided further comprising measuring cell growth.
- the present disclosure provides an aforementioned method wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA.
- an aforementioned method is provided wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.
- the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.
- the cells are bacterial cells.
- the bacterial cells are E. coli cells.
- the E. coli cells comprise one or more or all of: (a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter; (b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter; (c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter; (d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm; (e) a reduced level of gene function of a gene that encodes a reductase; (f) at least one expression construct encoding at least one disulfide bond isomerase protein; (g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or (h) at least one polynucleo
- a computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the
- a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is provided.
- FIG. 1A shows one embodiment of a genetic algorithm media optimization workflow environment, according to some aspects
- FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects
- FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects
- FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects;
- FIG. 2A shows exemplary cell media optimization using Darwinian fitness, according to some aspects;
- FIG. 2B depicts an exemplary selection for reproduction, according to some aspects
- FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects
- FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects
- FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects
- FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects
- FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects
- FIG. 2H depicts a chart showing a population evolving to a higher HiPrBind™ signal, according to some aspects
- FIG. 2I depicts respective qualities of monoclonal antibodies (MABs) before and after application of the present genetic algorithm techniques, according to some aspects;
- FIG. 3 shows Fab expression in SoluPro E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of unevolved mixtures after two rounds of the GA;
- FIG. 4 shows Fab expression before and after GA. A means comparison shows a significant improvement of mixtures after two rounds of evolution;
- FIG. 5 shows principal component analysis (PCA) of RNAseq data and titer.
- Principal component analysis of RNAseq data reduces ~4000-dimensional gene expression data to two dimensions. Generations are overlaid, showing a convergence on a single location, which matches the area of highest Fab expression (right);
- FIG. 6A shows an iModulon activity analysis wherein RNAseq data was processed into groups of functionally co-regulated genes known as iModulons, such that RpoS iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;
- FIG. 6B shows an iModulon activity analysis wherein RNAseq data was processed into groups of functionally co-regulated genes known as iModulons, such that RpoH iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;
- FIG. 6C shows an iModulon activity analysis in which mixtures with a lower stress response correlate with higher HPB activity, according to some aspects; specifically depicted is an activity analysis of DksA, a regulator of ribosomal protein subunits and ribosomal synthesis, in which higher ribosome levels trend towards higher HPB, in some aspects;
- FIG. 6D depicts iModulon activity analysis wherein starvation of leucine is associated with high HPB, according to some aspects
- FIG. 6E depicts the iModulon IscR, according to some aspects
- FIG. 6F depicts Fur is associated with iron metabolism, wherein conditions with lower iron concentration are associated with higher HPB signal, according to some aspects
- FIG. 6G depicts that Cbl relates to sulfur metabolism and that high HPB signals indicate a state of sulfur starvation, according to some aspects
- FIG. 7A depicts scaling up, purification and characterization of purified material, according to some aspects
- FIG. 7B depicts integrated OUR and CER, according to some aspects
- FIG. 7C depicts that, upon scaling up, quality increases by a high percentage (e.g., 111% with doubling of carbon), according to some aspects
- FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits);
- FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects
- FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects
- FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects;
- FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects;
- FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects
- FIG. 7J depicts genetic algorithm-based media optimization results in Mab that are structurally similar to CHO produced Mab, according to some aspects
- FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects
- FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects
- FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects
- FIG. 9A depicts an exemplary structural characterization environment, according to some aspects
- FIG. 9B depicts an exemplary mab study of second derivatives for structural fingerprints, according to some aspects
- FIG. 9C depicts an exemplary mab study of a delta plot for clear visualization of differences, according to some aspects
- FIG. 10A depicts iModulon gene sets, according to some aspects
- FIG. 10B depicts iModulon analysis of genetic-algorithm scale-up provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects;
- FIG. 11A depicts iModulon activity, according to some aspects
- FIG. 11B depicts iModulon activity, according to some aspects
- FIG. 11C depicts iModulon activity, according to some aspects
- FIG. 11D depicts iModulon activity, according to some aspects; and
- FIG. 12 depicts iModulon analysis of the genetic algorithm scale up provided insights for strain and process improvement, to lower COGs, retain CQAs and boost titers, according to some aspects.
- the present disclosure addresses the aforementioned unmet need and provides methods that combine algorithms that create mixtures of media components with machine learning, RNAseq and ICA, enabling rapid identification of optimized mixtures (e.g., optimized cell media formulations) and thus unmasking how an organism responds to diverse stresses and unfamiliar environments.
- FIG. 1A shows an exemplary workflow of one embodiment of the methods described herein.
- Mixing optimization algorithms such as, for example, a Genetic Algorithm (GA), Bayesian algorithms, and the like are used to search an experimental space for the optimal mixture of components and concentration of such components based on scoring formulas that can include robustness of growth, product quality and titer.
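- As a minimal sketch of such a scoring formula, the following Python snippet combines growth, product quality and titer into a single weighted score. The `WellReadout` type, the normalization of each readout to the range 0 to 1, and the weights are illustrative assumptions and are not taken from the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class WellReadout:
    """Hypothetical per-well measurements, each assumed normalized to 0..1."""
    growth: float   # e.g., normalized biomass
    quality: float  # e.g., fraction of correctly assembled product
    titer: float    # e.g., normalized product titer


def score(r: WellReadout, w_growth: float = 0.2,
          w_quality: float = 0.4, w_titer: float = 0.4) -> float:
    """Weighted sum of the three readouts; the weights are assumed values."""
    return w_growth * r.growth + w_quality * r.quality + w_titer * r.titer
```

Changing the weights shifts the search toward robustness of growth, quality or titer, depending on the objective of a given campaign.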
- the mixing algorithms are, in some embodiments, used in connection with high-throughput screening methodologies such as liquid handling robots and specialized equipment for evaluating growth rates of inoculated microorganisms or cell lines, measuring product quality and titer. These are typically executed as stand-alone experiments, where improved mixtures are identified empirically.
- RNAseq is used in one embodiment to understand what genes are being transcribed at the time of sampling from a culture of microbial cells or mammalian cell lines.
- Independent component analysis is used, in some embodiments, to identify co-expressed functionally related gene sets (iModulons). This approach is used to identify limitations of media components, metabolic state(s) and the like. By combining these two approaches with machine learning, the number of individual members (wells, flasks, reactors) is significantly decreased.
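- One common way to write such a decomposition, given here as a sketch with assumed notation (the symbols X, M, A, g, s and k are not defined in the present disclosure), is:

```latex
X \approx M A, \qquad
X \in \mathbb{R}^{g \times s},\quad
M \in \mathbb{R}^{g \times k},\quad
A \in \mathbb{R}^{k \times s}
```

Under these assumptions, X holds expression values for g genes across s samples, each column of M holds the gene weights of one of k independent components (the genes with large weights forming a candidate iModulon), and each row of A holds the activity of that component across the samples.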
- a computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement (e.g., a gene) that is transcribed
- the present techniques further provide the ability to achieve improved disulfide bond formation and protein folding in full-length monoclonal antibodies in Escherichia coli by improving media composition using genetic algorithm-based media optimization.
- a computing system such as described herein and above can be used in combination with other systems/wet lab assays which can optionally measure, for example, (b) at least one arrangement (e.g., a gene) that is transcribed, and (c) at least one independently modulated arrangement set (e.g., a gene set) by identifying the improved substrate.
- a fermentation device such as a BioLector (Beckman).
- the BioLector measures, on a per-well basis, pH, O2 concentration, and biomass. Such measurements may or may not be considered when assessing the fitness function to select winners for breeding into subsequent rounds as described in more detail below.
- Biomass would be considered where, for example, higher cell mass was an objective (e.g., probiotic cultures). Fermentation devices such as those described herein have a fluorescent channel, so one could add a fluorescent reporter gene, or add a fluorescent substrate, as an input for fitness. One example would be to create a genetic construct wherein the quantity of the protein of interest is directly proportional to the amount of signal from the co-expressed fluorescent protein.
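- The proportionality between reporter signal and protein quantity described above can be sketched as a one-parameter calibration. The function names and the calibration values below are hypothetical and serve only to illustrate how a fluorescence reading might be converted into a fitness input.

```python
def fit_proportionality(fluorescence, protein):
    """Least-squares slope k for protein = k * fluorescence (line through the origin)."""
    num = sum(f * p for f, p in zip(fluorescence, protein))
    den = sum(f * f for f in fluorescence)
    if den == 0:
        raise ValueError("fluorescence readings are all zero")
    return num / den


def estimate_protein(k, fluorescence_reading):
    """Convert a per-well fluorescence reading into an estimated protein quantity."""
    return k * fluorescence_reading


# Hypothetical calibration data: fluorescence (arbitrary units) vs. protein (mg/L)
calib_fluor = [100.0, 200.0, 400.0]
calib_protein = [5.0, 10.0, 20.0]
k = fit_proportionality(calib_fluor, calib_protein)
```

The estimated protein quantity for each well could then be passed to the fitness function in place of, or alongside, an offline titer assay.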
- an AI engine is capable of learning which mixtures are optimal per strain and per target protein, so that the development timeline will be decreased and entirely in silico predictions of optimal mixtures may be achieved.
- a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is contemplated in one embodiment.
- a genetic algorithm is contemplated.
- the genetic algorithm is or is adapted from the algorithm described in H., Weuster-Botz, et al., Bioprocess Biosyst Eng 29, 385-390 (2006) (https://doi.org/10.1007/s00449-006-0087-7).
- GAs are inspired by Darwinian principles of natural selection, or “survival of the fittest.”
- GAs are based on evolutionary principles, encoding several sets of design variables (e.g., volumes of media components) on strings. These strings may be processed by GA operators as discussed herein (e.g., crossover, mutation, etc.) throughout one or more successive generations.
- each “gene” may be a pipetting volume of a media component
- each “chromosome” may be a well in a microplate comprised of discrete volumes of each “gene.”
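- The gene and chromosome encoding described above can be sketched as follows. The component names, the 200 µL working volume and the 40 µL per-gene limit are assumptions chosen for illustration and do not come from the present disclosure.

```python
import random

# Hypothetical media components; one "gene" (pipetting volume) per component.
COMPONENTS = ["glucose", "yeast_extract", "MgSO4", "phosphate_buffer"]
WELL_VOLUME_UL = 200.0  # assumed working volume of one well


def random_chromosome(rng, max_gene_ul=40.0):
    """One well: a list of pipetting volumes (uL), one per component."""
    return [round(rng.uniform(0.0, max_gene_ul), 1) for _ in COMPONENTS]


def water_to_add(chromosome):
    """Diluent volume needed to bring the well up to the working volume."""
    return WELL_VOLUME_UL - sum(chromosome)
```

A population is then simply a list of such chromosomes, for example one per well of a microplate.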
- the principle of “survival of the fittest” assures a convergence toward optimal values with iterative generations.
- “fitness” refers to an ability of a particular growth medium to promote or enhance growth/survival of an inoculated biological organism (e.g., a cell placed into the well of a bioreactor).
- the GA may optimize for survival of the most welcoming, with respect, for example, to the respective abilities of cell media contained in respective wells to promote the growth/survival of bacteria, sometimes in the presence of less than ideal conditions.
- the one or more media most welcoming to a given bacterium, given one or more conditions, may be the media considered to have “survived,” for purposes of a single round of evolution (e.g., during a tournament selection process, as discussed herein).
- the present techniques may use different/additional criteria to moderate fitness, such as minimizing material necessary to achieve a successful culture (survival of the most materially efficient); minimizing the number of components respective media consist of while still achieving a successful culture (survival of the simplest); minimizing economic input cost (survival of the least expensive); etc.
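- As a hedged illustration of how such additional criteria might moderate fitness, the sketch below subtracts penalties for total material, number of components and input cost from a base score. The function name, the linear penalty form and the weights are assumptions, not part of the present disclosure.

```python
def penalized_fitness(base_score, volumes_ul, unit_cost_per_ul,
                      w_material=0.0, w_components=0.0, w_cost=0.0):
    """Base score minus linear penalties; all weights are assumed values."""
    material = sum(volumes_ul)                          # most materially efficient
    n_components = sum(1 for v in volumes_ul if v > 0)  # simplest
    cost = sum(v * c for v, c in zip(volumes_ul, unit_cost_per_ul))  # least expensive
    return (base_score
            - w_material * material
            - w_components * n_components
            - w_cost * cost)
```

Setting all three weights to zero recovers the unpenalized score, so the same function can serve each of the criteria above.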
- fitness refers to an ability of a biomolecule to survive under various conditions.
- fitness of media and biomolecules may be determined separately, and/or concomitantly, in aspects.
- genetic operators are mathematical functions used to simulate biological concepts, such as mutation, crossover (i.e., recombination or mixing of chromosomes) and selection. These operators are the practical hooks by which the fitness concepts discussed above are enforced in an evolutionary computation solution. For example, a particular crossover operator may be implemented to avoid recombining media materials known to be incompatible, and a mutation operator may be used to introduce diversity, and to avoid premature convergence. Similarly, as will be appreciated by those of ordinary skill in the art, a given selection operator (e.g., tournament selection, roulette wheel selection, etc.) may be chosen to determine respective fitness of individuals.
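- The three operators named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the operators used in the present techniques: the uniform crossover, the rejection of incompatible children, the mutation step size and the 40 µL upper bound are all hypothetical choices.

```python
import random


def tournament_select(population, fitnesses, rng, k=3):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]


def is_compatible(chromosome, incompatible_pairs):
    """True if no listed pair of components is present together (volume > 0)."""
    return not any(chromosome[i] > 0 and chromosome[j] > 0
                   for i, j in incompatible_pairs)


def crossover(parent_a, parent_b, rng, incompatible_pairs=(), max_tries=20):
    """Uniform crossover that rejects children containing an incompatible pair."""
    for _ in range(max_tries):
        child = [a if rng.random() < 0.5 else b
                 for a, b in zip(parent_a, parent_b)]
        if is_compatible(child, incompatible_pairs):
            return child
    return list(parent_a)  # fallback; assumes parent_a is itself compatible


def mutate(chromosome, rng, rate=0.1, step_ul=5.0, max_ul=40.0):
    """Perturb each gene with probability `rate`, clamped to [0, max_ul]."""
    out = []
    for gene in chromosome:
        if rng.random() < rate:
            gene = min(max_ul, max(0.0, gene + rng.uniform(-step_ul, step_ul)))
        out.append(gene)
    return out
```

A larger tournament size k increases selection pressure, while a higher mutation rate preserves diversity and helps avoid premature convergence.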
- the present techniques may include other/additional algorithmic approaches.
- the GA may include a Bayesian estimation of distribution (EDA) algorithm, or probabilistic model-building genetic algorithm (PMBGA).
- a naive Bayes algorithm (e.g., https://onlinelibrary.wiley.com/doi/epdf/10.1002/bit.28132) is contemplated.
- a differential evolution algorithm (Storn, R. M. & Price, K. V. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341-359. https://doi.org/10.1023/A:100820282 (1997)) is contemplated. In general, any direct, stochastic, and/or population algorithm is contemplated herein (https://nature.com/articles/s41598-020-74228-0).
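- For orientation, one step of a commonly used differential evolution variant (often called DE/rand/1/bin) can be sketched as below. The parameter values F and CR, the volume bounds and the clamping of out-of-range values are assumptions for illustration; the present disclosure does not specify them.

```python
import random


def de_trial_vector(population, target_idx, rng, F=0.8, CR=0.9,
                    lower=0.0, upper=40.0):
    """Build one trial vector from three distinct other population members."""
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(candidates, 3)
    target = population[target_idx]
    j_rand = rng.randrange(len(target))  # guarantees at least one mutated gene
    trial = []
    for j in range(len(target)):
        if rng.random() < CR or j == j_rand:
            # Differential mutation: base vector plus scaled difference vector
            v = population[r1][j] + F * (population[r2][j] - population[r3][j])
            v = min(upper, max(lower, v))  # clamp to bounds (assumed handling)
        else:
            v = target[j]
        trial.append(v)
    return trial
```

In a full loop, the trial mixture would replace the target mixture only if its measured fitness is at least as good.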
- Exemplary, non-limiting cell media formulation components include nutrients as well as buffering compounds, pH and temperature values.
- Components may include one or more carbon sources, one or more nitrogen sources, one or more analytes, one or more salts, one or more buffering compounds, a pH or pH range, a temperature or temperature range, one or more metal salts, one or more trace minerals, one or more biostimulants, one or more co-factors, one or more peptides, one or more modified peptides, one or more nucleic acids, one or more nucleic acid precursors, one or more small molecules, and/or one or more vitamins.
- amino-nitrogen sources such as peptone, protein hydrolysates, infusions and extracts, both plant based such as soytone, potato hydrolysate, grains and grain meals, or animal based peptones, such as meat digests and casein hydrolysate, whey and gelatin are contemplated.
- yeast extracts and/or combinations of all or some of the available amino acids such as alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine and valine are also contemplated.
- Growth factors such as blood, or serum, vitamins, NAD, and yeast extract are also contemplated in some embodiments.
- energy sources such as any kind of sugar, alcohol, or carbohydrate are contemplated, including, for example, glucose, glycerol, sorbitol, fructose, sucrose, maltose, sophorose, lactose, dextrose, galactose, arabinose, high fructose corn syrup, maltodextrin, mannose, ribose, trehalose, xylose and the like.
- Sugar alcohols are additionally contemplated and include, for example, arabitol, erythritol, glycerol, maltitol, mannitol, lactitol, sorbitol and xylitol.
- Inorganic phosphate and sulfur, trace metals, water, and vitamins are often included in microbial culture media, and these components are contemplated herein.
- Mineral salts such as phosphate, sulfate, magnesium, calcium, iron are also contemplated herein.
- Selective agents such as bile salts, desoxycholate, chemicals, antimicrobials, antibiotics and dyes can be included according to some embodiments of the present disclosure.
- Gelling agents are, in some embodiments, used in solid and semi-solid media and may include agar, gelatin, alginate, albumin and silica gel.
- Protective agents such as calcium carbonate, soluble starch and charcoal are, in some embodiments, used to absorb toxic metabolites or neutralize media.
- Certain growth factors including NAD and hemin, or surfactants such as polysorbate 80 are, in some embodiments, used to alter growth rates.
- Trace metals are, in still other embodiments, included and include, but are not limited to, aluminum, manganese, molybdenum, iron, copper and nickel.
- FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects.
- FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects.
- the present techniques may include improved folding techniques.
- cell media components contain different nutrients and cofactors for cell proliferation and product formation. These components can also be optimized to affect the quality of the product formed.
- the present techniques are able to identify an optimized media composition that results in higher quality (more than 2-fold increase in quality metric) full-length antibody with improved folding and disulfide bond formation.
- FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects. As shown in FIG. 1D, the steps to implement this improvement are as follows:
- Top candidates with the highest quality scores were selected as parents for a subsequent round of mixtures, and an evolutionary approach to recombination was used to produce another 60 mixtures. This was repeated twice, yielding populations with an improved MAb quality of 40% in comparison to 15% with the control media, i.e., about 2.67 times (roughly 267% of) the control value.
- the present techniques include genetic algorithm and iModulon-based optimization of media formulation for quality, titer, strain, and/or process improvement of biologics, in some aspects.
- the present techniques include methods, compositions and/or systems for optimizing cell media formulations for the amount of biomolecules of interest expressed. These may be single parameter optimizations focused around the amount of biomolecules of interest expressed. Further, these techniques may provide methods/systems/compositions for identifying one or more genes and/or independently modulated gene sets to optimize the media formulation towards the amount of biomolecules expressed at the small scale or high throughput condition.
- the present techniques may include methods, systems and/or compositions for optimizing cell media formulations for the quality of the biomolecules of interest expressed.
- the present techniques may include a multi-parametric optimization using cell media formulation around biomass, amount of biomolecules of interest and the quality of biomolecules of interest.
- biomolecules of interest may be produced at higher quality but at the same time a minimum threshold amount that is sufficient for downstream assays.
- the present techniques may include optimizing media to allow for sufficient growth of the cells that are the cell factories making or manufacturing certain biomolecules.
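- The multi-parametric idea above (maximize quality while keeping titer and growth above a minimum) can be sketched as a simple fitness function. This is a minimal illustration only: the `WellResult` fields, the threshold parameters, and the choice to score failing wells as zero are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class WellResult:
    """Hypothetical measurements for one media formulation (one well)."""
    biomass: float   # e.g., optical density at harvest
    titer: float     # amount of biomolecule of interest produced
    quality: float   # quality score, e.g., fraction of active product


def fitness(result: WellResult, min_titer: float, min_biomass: float) -> float:
    """Return the quality score, or 0.0 if titer or biomass is below threshold.

    The hard cut-off is an assumption; a weighted sum or a soft penalty
    would be an equally plausible design.
    """
    if result.titer < min_titer or result.biomass < min_biomass:
        return 0.0
    return result.quality
```

- With this shape of scoring, a formulation that yields very high quality but too little material for downstream assays is never selected as a parent.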
- the present techniques may include identifying one or more independently modulated gene sets that improve media formulation for bioreactor scale-up.
- the present techniques may include techniques for identifying individual genes and independently modulated gene sets to gain insights for strain engineering to improve.
- the optimized cell media formulations can be genetically encoded into the strain and therefore (1) reduce cost of goods sold (COGS), (2) retain quality during scale-up, and (3) boost the amount of product/biomolecule produced.
- the present techniques may include techniques for identifying gaps at the bioreactor scale process in terms of nutrient limitations over the course of time.
- the present techniques may provide insights to optimize and develop a process using nutritional requirements based on the individual genes or the independently modulated gene sets.
- FIG. 2A depicts an exemplary Darwinian “fitness” approach where the pipetting volume represents a “gene” and where a well (e.g., in a multi-well plate as described herein) represents a chromosome.
- FIG. 2B depicts how chromosomes (wells) with the highest fitness scores are, in one embodiment, bred by combining genes (component volumes) from two fit parents.
- a tournament selection process is used to determine the parent individual for the subsequent generation. In this process, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size.
- a mutation step/process is used.
- simple mutations are applied to 5, 10, 15, 20, 25% or more of individuals, and crossovers are applied to the remaining individuals (e.g., the remaining 75% when mutations are applied to 25%).
- concentration of one component in the formulation is changed to a random value within the variable bounds.
- crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring.
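- The selection, mutation, and crossover steps described above can be sketched in Python as follows. This is a minimal sketch and not the script of FIG. 2E: the function names, the list-of-floats representation of a formulation, and the per-individual random draw used to approximate the mutation fraction are all assumptions made for illustration.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Sample k individuals at random and return the one with highest fitness."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


def mutate(individual, bounds, rng):
    """Set one randomly chosen component to a random value within its bounds."""
    child = list(individual)
    i = rng.randrange(len(child))
    low, high = bounds[i]
    child[i] = rng.uniform(low, high)
    return child


def two_point_crossover(a, b, rng):
    """Swap the values lying between two crossover points of two parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def next_generation(population, fitnesses, bounds, k=5, mutation_rate=0.25, seed=None):
    """Build one new generation of the same size as the current population."""
    rng = random.Random(seed)
    size = len(population)
    # Repeat tournaments until there are as many parents as individuals.
    parents = [tournament_select(population, fitnesses, k, rng) for _ in range(size)]
    offspring, idx = [], 0
    while len(offspring) < size:
        parent = list(parents[idx % size])
        idx += 1
        if rng.random() < mutation_rate:
            # Assumption: mutation is chosen per individual with this probability.
            offspring.append(mutate(parent, bounds, rng))
        else:
            mate = list(parents[idx % size])
            idx += 1
            offspring.extend(two_point_crossover(parent, mate, rng))
    return offspring[:size]
```

- Each individual here is one well's list of component amounts, so `bounds` holds one `(low, high)` pair per media component; `k` must not exceed the population size.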
- the mixing and/or culturing steps of the methods described herein may be done, in some embodiments, within a microfluidic mixing system, a droplet microarray system, a multi-well plate (including, for example, controlled-release multi-well plates) or other high-throughput-compatible devices as are known in the art.
- Multi-well plates or microplates can comprise 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 or more wells, and multiple plates may be used in the methods described herein.
- culturing (e.g., of the aforementioned multi-well plates and other devices) is performed under a variety of conditions, including stable conditions and conditions that change over time.
- the culturing can be performed under conditions that promote cell growth, where the conditions include or exclude constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.
- the present disclosure also contemplates measuring and/or maintaining cell culturing conditions that allow production of the biomolecules.
- Conditions such as constant or intermittent shaking (e.g., 25-1000 rpm), constant or intermittent oxygen levels (e.g., 0-40%), constant or intermittent humidity (e.g., 20-90%), pH (e.g., 3-8) and/or constant or intermittent temperature (e.g., 10-60 °C) are contemplated.
- FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects, wherein earlier generations are of a lower titer and quality, and later media have a higher titer and quality, wherein the improved titer and quality are the result of evolutionary pressures.
- FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects.
- the experimental space searched by the genetic algorithm may be thought of as a mountain range (e.g., gradient) wherein the genetic algorithm seeks to find the highest peak, without reference to a map.
- FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects.
- the present techniques may be implemented using a programming language such as Python. Optimization can be slow and unpredictable using OFAT (“One Factor at a Time”) or even Design of Experiments (DoE) optimization techniques. Even high throughput wet lab media optimization using GA samples a very small fraction of possible mixtures, in some aspects. The solution space may be too large to cover even with automation and micro fermentation.
- the present techniques may solve such problems by combining iModulons and machine learning with GAs. Specifically, wet lab data and AI advantageously enable the present techniques to explore more of the best media conditions.
- an experimental design may include a particular strain (e.g., SoluPro™ E. coli) expressing a MAb.
- a GA script may produce a number (e.g., 60) of unique conditions (e.g., 30 per plate)
- a number of controls may be included (e.g., 4 controls (2 per plate)).
- Process conditions may include dynamic feeding, a pH feed trigger at the pH setpoint, and a temperature shift after 12 hours of elapsed fermentation time (EFT).
- the GA script of FIG. 2E depicts a tournament selection process to determine the parent individual for the subsequent generation.
- five individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size.
- simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%.
- concentration of one component in the formulation is changed to a random value within the variable bounds.
- in a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring.
- the script of FIG. 2E may output a config file and liquid handling robot volume output file.
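- A volume output file of the kind mentioned above could be produced along the following lines. This is a hedged sketch only: the CSV layout, the column names, the 0.1 µL rounding, and the scaling of relative amounts to a fixed well volume are assumptions for illustration, and real liquid-handling robots expect their own vendor-specific worklist formats.

```python
import csv
import io


def fractions_to_volumes(fractions, total_volume_ul):
    """Scale relative component amounts so that they sum to the well volume."""
    total = sum(fractions)
    return [round(total_volume_ul * f / total, 1) for f in fractions]


def write_volume_file(handle, components, population, wells_per_plate, total_volume_ul):
    """Write one row per formulation: plate number, well number, volumes in uL."""
    writer = csv.writer(handle)
    writer.writerow(["plate", "well", *components])
    for n, individual in enumerate(population):
        plate = n // wells_per_plate + 1   # e.g., 30 conditions per plate
        well = n % wells_per_plate + 1
        writer.writerow([plate, well, *fractions_to_volumes(individual, total_volume_ul)])


# Usage: three hypothetical formulations of two components, two wells per plate.
buffer = io.StringIO()
write_volume_file(buffer, ["glucose", "yeast_extract"],
                  [[1, 3], [2, 2], [3, 1]], wells_per_plate=2, total_volume_ul=100.0)
print(buffer.getvalue())
```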
- FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects, for example as described with respect to FIG. 2E.
- FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects.
- the initial random mixtures show diverse growth trends.
- FIG. 2H depicts a chart showing a population evolving to a higher HiPrBind™ signal, according to some aspects.
- FIG. 2H depicts MAb quality fitness score in SoluPro E. coli. Random mixtures (red color) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (blue and green) are shown versus the control condition.
- FIG. 2I depicts respective qualities of monoclonal antibodies (MAbs) before and after application of the present genetic algorithm techniques, according to some aspects. Mean comparisons show a significant improvement of mixtures after two rounds of evolution.
- a biomolecule of interest may be expressed and produced from cells according to various embodiments. Exemplary biomolecules are described further herein.
- the methods provided herein include one or more steps that include measuring or detecting or analyzing the biomolecules that are produced.
- the biomolecule of interest includes a fluorescent tag or fusion. Fluorescence activated cell sorting (FACS) and flow cytometry, as are known in the art, are therefore contemplated herein.
- a liquid-handling robot that is capable of inoculating and culturing cells and equipped to take a variety of measurements is contemplated for use in various embodiments of the present disclosure.
- a BioLector® microbioreactor can be used.
- Binding assays, for example assays that measure protein-protein interactions, including antibody-antigen interactions and including measuring binding affinity, are well known in the art.
- surface plasmon resonance (SPR)
- dual polarisation interferometry (DPI)
- static light scattering (SLS)
- dynamic light scattering (DLS)
- flow-induced dispersion analysis (FIDA)
- fluorescence polarization/anisotropy (FP)
- fluorescence resonance energy transfer (FRET)
- bio-layer interferometry (BLI)
- isothermal titration calorimetry (ITC)
- microscale thermophoresis (MST)
- single colour reflectometry (SCORE)
- bimolecular fluorescence complementation
- affinity electrophoresis
- label transfer
- phage display
- tandem affinity purification (TAP)
- cross-linking
- quantitative immunoprecipitation combined with knock-down (QUICK)
- proximity ligation assay (PLA)
- the binding affinities of the antibodies described herein are measured by array surface plasmon resonance (SPR), according to standard techniques (Abdiche, et al. (2016) MAbs 8:264-277). Briefly, antibodies are immobilized on a HC 30M chip at four different densities / antibody concentrations. Varying concentrations (0-500 nM) of antibody target are then bound to the captured antibodies. Kinetic analysis is performed using Carterra software to extract association and dissociation rate constants (ka and kd, respectively) for each antibody. Apparent affinity constants (KD) are calculated from the ratio kd/ka. In some embodiments, the Carterra LSA Platform is used to determine kinetics and affinity.
- binding affinity can be measured, e.g., by surface plasmon resonance (e.g., BIAcore™) using, for example, the IBIS MX96 SPR system from IBIS Technologies or the Carterra LSA SPR platform, or by Bio-Layer Interferometry, for example using the Octet™ system from ForteBio.
- a biosensor instrument such as Octet RED384, ProteOn XPR36, IBIS MX96 and Biacore T100 is used (Yang, D., et al., J. Vis. Exp., 2017, 122:55659).
- KD is the equilibrium dissociation constant between the antibody and its antigen, i.e., the ratio koff/kon. KD and affinity are inversely related: the KD value corresponds to a concentration of antibody, so the lower the KD value (lower concentration), the higher the affinity of the antibody.
- Antibody KD, including reference antibody and variant antibody KD, according to various embodiments of the present disclosure can be, for example, in the micromolar range (10⁻⁴ to 10⁻⁶), the nanomolar range (10⁻⁷ to 10⁻⁹), the picomolar range (10⁻¹⁰ to 10⁻¹²) or the femtomolar range (10⁻¹³ to 10⁻¹⁵).
- antibody affinity of a variant antibody is improved, relative to a reference antibody, by approximately 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50% or more.
- the improvement may also be expressed relative to a fold change (e.g., 2x, 4x, 6x, or 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-fold or more improvement in binding activity, etc.) and/or an order of magnitude (e.g., 10⁷, 10⁸, 10⁹, etc.).
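- The relationship between rate constants, KD, and affinity improvement can be made concrete with a short calculation. This is an illustrative sketch: the function names and the example rate constants are hypothetical, and defining fold improvement as the ratio of reference KD to variant KD is an assumption (it follows from lower KD meaning higher affinity), not a definition given in the disclosure.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """KD in M, from k_on in 1/(M*s) and k_off in 1/s: KD = k_off / k_on."""
    return k_off / k_on


def fold_improvement(kd_reference: float, kd_variant: float) -> float:
    """Fold improvement in affinity, assumed here to be KD(ref) / KD(variant)."""
    return kd_reference / kd_variant


def percent_improvement(kd_reference: float, kd_variant: float) -> float:
    """Percent improvement implied by the fold change (2-fold -> 100%)."""
    return (fold_improvement(kd_reference, kd_variant) - 1.0) * 100.0


# Hypothetical rate constants for a reference and a variant antibody.
kd_ref = dissociation_constant(k_on=1e5, k_off=1e-3)   # about 1e-8 M (10 nM)
kd_var = dissociation_constant(k_on=1e5, k_off=1e-4)   # about 1e-9 M (1 nM)
print(f"reference KD = {kd_ref:.1e} M, variant KD = {kd_var:.1e} M")
print(f"fold improvement = {fold_improvement(kd_ref, kd_var):.1f}")
print(f"orders of magnitude = {math.log10(fold_improvement(kd_ref, kd_var)):.1f}")
```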
- the amount of “biologically active” biomolecules produced is also measured or detected or analyzed.
- Biologically active includes, but is not limited to, a properly folded biomolecule such as a therapeutic protein or enzyme or antibody or fragment of any of the above, an enzymatically active biomolecule, an antibody or fragment thereof that is capable of binding to an antigen, and a protein or polypeptide that is capable of binding to a ligand.
- an activity-specific cell-enrichment (ACE) assay is used with the methods described herein.
- the activity-specific cell-enrichment (ACE) assay identifies host cells that express active gene product of interest (e.g., biomolecules, as used herein) rather than inactive material, as described in WO 2021/146626, incorporated herein in relevant part.
- Active gene product can be distinguished from inactive material by the ability of active gene product to specifically bind a binding partner molecule, or by the ability of gene product to participate in a chemical or enzymatic reaction, as examples.
- the presence of properly formed disulfide bonds in a polypeptide gene product is an indication that it is correctly folded and presumptively active.
- active gene product of interest is detected by utilizing an appropriate labeling complex that specifically binds to active gene product of interest, such as a labeled antigen if the gene product of interest is an antibody or Fab; or a labeled ligand if the gene product of interest is a receptor or a receptor fragment, where the ligand specifically binds to an active conformation of the receptor; or a labeled substrate or a labeled substrate analog if the gene product of interest is an enzyme, as examples.
- a HiPrBind assay is used with the present methods.
- the HiPrBind assay provides an efficient method for multiple interrogations of an active gene product, such as by providing at least two distinct interrogations of a characteristic property of the active gene product or simultaneously interrogating at least two characteristic properties of that active gene product.
- HiPrBind assays are described in WO 2021/163349, incorporated herein in relevant part.
- the assay is an advance on the principle underlying the yeast two hybrid assay in that a multi-component detection mechanism is brought into proximity, and thereby brought into an environment where the detection mechanism can be active in producing a signal capable of detection.
- One component of the multi-component (e.g., two-component) detection system is stably associated with a first analyte-associating moiety (i.e., active gene product-associating moiety) and a second component of the detection system is stably associated with a distinct second analyte-associating moiety.
- a detectable signal is generated when the two components of the detection system are brought into proximity by the analyte-associating moieties binding to the analyte.
- because each of the analyte-associating moieties is specific for an active gene product as analyte, a signal is only generated when a characteristic property of an active gene product is detected using two distinct mechanisms, or when two distinct characteristic properties of an active gene product are simultaneously detected.
- the HiPrBind assay is versatile in detecting a variety of characteristic properties, but a simple example involves a gene product that is active in homodimeric form wherein each monomer requires disulfide bonds to properly fold.
- One active gene product-associating moiety can be a binding agent that specifically binds to the properly folded and therefore active monomer, and a second active gene product-associating moiety can be a distinct second binding agent that specifically binds to the dimeric form of the gene product.
- the HiPrBind assay in this version simultaneously detects a gene product that is properly folded and in dimeric form.
- the genes (i.e., other than the gene encoding the biomolecule) that are induced inside cells during the culturing steps described herein are, in some embodiments, also or alternatively measured/determined.
- RNAseq or RNA-Seq is a sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome (Wang Z., et al., Nature Reviews. Genetics., 10(1): 57-63 (2009)).
- the present disclosure provides, in some embodiments, using RNA sequencing and other well-known high throughput sequencing techniques and assays to identify the gene expression or transcript number or sequence of individual genes, gene sets, or an entire genome/transcriptome.
- metabolomic data can be used to identify the quantity of key metabolites and metabolic states.
- Metabolomics is a discipline widely used in systems biology and has been applied to explain observed phenotypes. (Lopez-Malo M, et al., PLoS ONE. 2013;8:e60135.)
- Sastry et al. have recently described that the E. coli transcriptome mostly consists of independently regulated modules (Sastry, A.V., et al., Nature Comm., 2019, 10:5536). Additionally, Tan et al. recently reported that independent component analysis (ICA) of E. coli’s transcriptome revealed the cellular processes that respond to heterologous gene expression (Tan J., et al., Metabol. Eng., 2020, 61, 360-368). Additionally, improvements or modifications to the independent component analysis can be used, such as OptICA, for finding the optimal dimensionality that controls for both over- and under-decomposition.
- “Host cells” herein are, in some embodiments, cells used in bioprocessing to manufacture heterologous protein products.
- host cells can be, for example, eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.
- Prokaryotic host cells are provided that comprise expression constructs designed for the expression of coding regions.
- Prokaryotic host cells can include archaea (such as Haloferax volcanii, Sulfolobus solfataricus), Gram-positive bacteria (such as Bacillus subtilis, Bacillus licheniformis, Brevibacillus choshinensis, Lactobacillus brevis, Lactobacillus buchneri, Lactococcus lactis, and Streptomyces lividans), or Gram-negative bacteria, i.e., proteobacteria, including Alphaproteobacteria (Agrobacterium tumefaciens, Caulobacter crescentus, Rhodobacter sphaeroides, and Sinorhizobium meliloti), Betaproteobacteria (Alcaligenes eutrophus), and Gammaproteobacteria (Acinetobacter calcoaceticus, Azotobacter vinel
- Host cells include Gammaproteobacteria of the family Enterobacteriaceae, such as Enterobacter, Erwinia, Escherichia (including E. coli), Klebsiella, Proteus, Salmonella (including Salmonella typhimurium), Serratia (including Serratia marcescens), and Shigella.
- As described in WO/2017/106583, incorporated by reference herein in its entirety, producing gene products such as therapeutic proteins at commercial scale and in soluble form is addressed by providing suitable host cells capable of growth at high cell density in fermentation culture, and which can produce soluble gene products in the oxidizing host cell cytoplasm through highly controlled inducible gene expression.
- Host cells of the present disclosure with these qualities are produced by combining some or all of the following characteristics.
- the host cells are genetically modified to have an oxidizing cytoplasm by increasing the expression or function of oxidizing polypeptides in the cytoplasm, and/or by decreasing the expression or function of reducing polypeptides in the cytoplasm. Specific examples of such genetic alterations are provided herein and in WO 2017/106583.
- host cells can also be genetically modified to express accessory proteins (which can be chaperones) and/or cofactors that assist in the production of the desired gene product(s), and/or to glycosylate polypeptide gene products.
- the host cells comprise one or more expression constructs designed for the expression of one or more active gene products of interest. At least one expression construct can comprise an inducible promoter and a polynucleotide encoding a gene product to be expressed in active form from the inducible promoter.
- the host cells contain additional genetic modifications designed to improve certain aspects of gene product expression from the expression construct(s).
- the host cells can (A) have an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter, and as another example, wherein the gene encoding the transporter protein is araE, araF, araG, araH, rhaT, xylF, xylG, or xylH, or particularly the transporter protein is araE, or wherein the alteration of gene function more particularly is expression of unaltered araE from a constitutive promoter; and/or (B) have a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter, and as further examples, wherein the gene encoding a protein that metabolizes an inducer of at least one said inducible promoter is selected from the group consisting of araA, araB, araD, prpB, prpD, rhaA, rhaB, rha
- Host Cells with Oxidizing Cytoplasm are designed to express active gene products. Examples of host cells are provided that allow for the efficient and cost-effective expression of active gene products, including components of multimeric products.
- the host cells can be microbial cells such as Gram-negative bacteria, e.g., E. coli.
- Exemplary E. coli host cells having oxidizing cytoplasm include the E. coli B strains SHuffle® Express (NEB Catalog No. C3028H) and SHuffle® T7 Express (NEB Catalog No. C3029H) and the E. coli K strain SHuffle® T7 (NEB Catalog No. C3026H).
- E. coli B strains with oxidizing cytoplasm are able to grow to much higher cell densities than the most closely corresponding E. coli K strain (WO/2017/106583).
- Certain alterations can be made to the gene functions of host cells comprising inducible expression constructs, to promote efficient and homogeneous induction of the host cell population by an inducer.
- the combination of expression constructs, host cell genotype, and induction conditions results in at least 75% (more preferably at least 85%, and most preferably, at least 95%) of the cells in the culture expressing active gene product from each induced promoter, as measured by the method of Khlebnikov et al. described in Example 9 of WO/2017/106583.
- these alterations can involve the function of genes that are structurally similar to an E. coli gene, or genes that carry out a function within the host cell similar to that of the E. coli gene.
- Alterations to host cell gene functions include eliminating or reducing gene function by deleting the coding region of the gene in its entirety, or by deleting a large enough portion of the gene, inserting sequence into the gene, or otherwise altering the gene sequence so that a reduced level of functional gene product is made from that gene, as is described herein with greater particularity for the ptsP gene or coding region.
- Alterations to host cell gene functions also include increasing gene function by, for example, altering the native promoter to create a stronger promoter that directs a higher level of transcription of the gene, or introducing a missense mutation into the protein-coding sequence that results in a more highly active gene product. Alterations to host cell gene functions include altering gene function in any way, including for example, altering a native inducible promoter to create a promoter that is constitutively activated. In addition to alterations in gene functions for the transport and metabolism of inducers, as described herein with relation to inducible promoters, and/or an altered expression of chaperone proteins, alterations of the reduction-oxidation environment of the host cell are also contemplated.
- Cytoplasmic Dsb proteins include a cytoplasmic version of DsbA (cDsbA) and/or of DsbC (cDsbC) that lacks a signal peptide and therefore is not transported into the periplasm.
- Cytoplasmic Dsb proteins such as cDsbA and/or cDsbC are useful for making the cytoplasm of the host cell more oxidizing and thus more conducive to the formation of disulfide bonds in proteins, including heterologous proteins, produced in the cytoplasm.
- the host cell cytoplasm can also be made less reducing and thus more oxidizing by altering the thioredoxin and the glutaredoxin/glutathione enzyme systems directly: mutant strains defective in glutathione reductase (gor) or glutathione synthetase (gshB), together with a defective thioredoxin reductase (trxB), render the cytoplasm oxidizing. These strains are unable to reduce ribonucleotides and therefore cannot grow in the absence of exogenous reductant, such as dithiothreitol (DTT).
- Suppressor mutations (such as ahpC* and ahpCΔ, Lobstein et al., Microb Cell Fact 11:56 (2012)) in the gene ahpC, which encodes the peroxiredoxin AhpC, convert it to a disulfide reductase that generates reduced glutathione, allowing the channeling of electrons onto the enzyme ribonucleotide reductase and enabling the cells defective in gor and trxB, or defective in gshB and trxB, to grow in the absence of DTT.
- AhpC can allow strains, defective in the activity of gamma-glutamylcysteine synthetase (gshA) and defective in trxB, to grow in the absence of DTT; these include AhpC V164G, AhpC S71F, AhpC E173/S71F, AhpC E171Ter, and AhpC dup162-169 (Faulkner et al., Proc Natl Acad Sci USA 105(18):6735-6740 (2008), Epub 2008 May 2).
- the disclosure also contemplates the expression of the sulfhydryl oxidase Erv1p, derived from the inner membrane space of yeast mitochondria, in the host cell cytoplasm, which has been shown to increase the production of a variety of complex, disulfide-bonded proteins of eukaryotic origin in the cytoplasm of E. coli, even in the absence of mutations in gor or trxB (Nguyen et al., Microb Cell Fact 10:1 (2011)).
- Host cells comprising expression constructs preferably also express cDsbA and/or cDsbC and/or Erv1p, are deficient in trxB gene function, and are also deficient in the gene function of either gor, gshB, or gshA.
- the host cells have increased levels of katG and/or katE gene function, and express an appropriate mutant form of AhpC so that the host cells can be grown in the absence of dithiothreitol (i.e., DTT).
- cofactors include ATP, coenzyme A, flavin adenine dinucleotide (FAD), NAD/NADH, and heme.
- Polynucleotides encoding cofactor transport polypeptides and/or cofactor synthesizing polypeptides can be introduced into host cells, and such polypeptides can be constitutively expressed, or inducibly co-expressed with the active gene products to be produced by methods of the disclosure.
- proteases include, but are not limited to, Clp, ClpP, OmpT, Lon, FtsH, ClpX, ClpY, ClpA, ClpQ, ClpAP, ClpXP, ClpAXP, ClpYQ, ClpY, and the proteases encoded by yaeL, sppA, tldD, sprT, yhbU, ptrA, frvX, hyaD, hybD, hycH, envC, ddpX, degP, degQ, degS, hslV, hslU, pepB, pepP, sohB, yggG, pepE, pepN, pepQ, abgA, pepT, iadA, pepA, pepD, ptrB, ycaL, ycbZ, yegQ, ygeY, ypdF, hycI, sgcX, and htpX.
- Host cells can have alterations in their ability to glycosylate polypeptides.
- eukaryotic host cells can have eliminated or reduced gene function of the glycosyltransferase and/or oligosaccharyltransferase genes, impairing the normal eukaryotic glycosylation of polypeptides to form glycoproteins.
- Prokaryotic host cells such as E. coli, which do not normally glycosylate polypeptides, can be altered to express a set of eukaryotic and prokaryotic genes that provide a glycosylation function (DeLisa et al., WO 2009/089154A2).
- Expression constructs are polynucleotides designed for the expression of one or more recombinant gene products of interest, and thus are not naturally occurring molecules. Any expression construct known in the art is contemplated for use in the cells and methods of the disclosure, including expression constructs that can be integrated into a host cell chromosome or maintained within the host cell as extra-chromosomal, independently replicating polynucleotide molecules, i.e., episomes having origins of replication independent of the host cell chromosome, such as plasmids or artificial chromosomes. Expression constructs according to the disclosure also can have one or more selectable markers to enable selection of those cells harboring the expression construct.
- Exemplary selectable markers confer resistance to antibiotics lethal to the host cell lacking that selectable marker or encode enzymes required to produce essential nutrients. Any selectable marker known in the art is contemplated for use in the expression constructs of the disclosure.
- Expression markers may also contain an inducible promoter to provide the ability to induce the expression of a coding region operably linked to that inducible promoter.
- Exemplary inducible promoters contemplated by the disclosure include the arabinose promoter (ParaBAD), ParaC, ParaE, the propionate promoter (PprpBCDE), the rhamnose promoter (PrhaSR), the xylose promoter (PxylA), the lactose promoter, and the alkaline phosphatase promoter.
- the disclosure comprehends expression constructs comprising constitutive promoters.
- the construct may also include a ribosome binding site (RBS).
- the RBS consensus sequence is GGAGG or GGAGGU, and in bacteria such as E. coli, the RBS consensus sequence is further defined as AGGAGG or AGGAGGU.
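As a small illustration of the consensus sequences above, the following sketch scans a nucleotide sequence for an RBS motif. The function name `find_rbs_sites` and the example sequence are hypothetical and are not taken from the disclosure; the sketch only performs exact string matching and does not score near-matches.

```python
def find_rbs_sites(seq, motif="AGGAGG"):
    """Return 0-based start positions of every exact occurrence of motif in seq.

    RNA input is accepted: U is treated as T so that AGGAGGU and AGGAGGT
    are searched identically.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        # Continue one position further so overlapping matches are reported.
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical example sequence (not a sequence from the disclosure).
example = "GCTAGCAGGAGGTTTAAAATATG"
sites = find_rbs_sites(example)
```

Positions are 0-based, so a hit at index 6 corresponds to nucleotides 7 through 12 in 1-based numbering.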
- the expression construct may include a multiple cloning site in which a variety of restriction endonuclease cleavage sites are clustered to provide flexibility in incorporating exogenous polynucleotides, as is known in the art.
- Some expression constructs of the disclosure further include a coding region for a signal peptide or leader peptide, wherein the coding region is oriented to result in expression of fusion protein comprising the signal peptide and the active gene product of interest.
- inducible promoters are contemplated for use with the expression constructs to be introduced into the host cells according to the disclosure in order to achieve elevated expression of desired active gene products.
- Exemplary promoters are described herein and are also described in WO/2017/205570, incorporated herein by reference in relevant part.
- the cells comprising one or more expression constructs may optionally include one or more inducible promoters to express a gene product of interest.
- Chaperones are accessory proteins that assist the non-covalent folding or unfolding, and/or the assembly or disassembly, of other gene products, but do not occur in the resulting monomeric or multimeric gene product structures when the structures are performing their normal biological functions (having completed the processes of folding and/or assembly). Chaperones can be expressed from an inducible promoter or a constitutive promoter within an expression construct, or can be expressed from the host cell chromosome. Exemplary chaperones present in E. coli host cells are the folding factors DnaK/DnaJ/GrpE, DsbC/DsbG, GroEL/GroES, IbpA/IbpB, Skp, Tig (trigger factor), and FkpA, which have been used to prevent protein aggregation of cytoplasmic or periplasmic proteins.
- DnaK/DnaJ/GrpE, GroEL/GroES, and ClpB can function synergistically in assisting protein folding, and expression of these chaperones in various combinations has been shown to facilitate expression of properly folded gene product.
- a eukaryotic chaperone protein such as protein disulfide isomerase (PDI) from the same or a related eukaryotic species, can be co-expressed, e.g., inducibly co-expressed, with the gene product of interest.
- a chaperone that can be expressed in host cells is a protein disulfide isomerase from Humicola insolens, a soil hyphomycete (soft-rot fungus).
- An amino acid sequence of Humicola insolens PDI is shown as SEQ ID NO: 1 of WO2017/106583; it lacks the signal peptide of the native protein so that it remains in the host cell cytoplasm.
- the nucleotide sequence encoding PDI was optimized for expression in E. coli; the expression construct for PDI is shown as SEQ ID NO: 2 of WO2017/106583.
- SEQ ID NO: 2 of WO2017/106583 contains a GCTAGC NheI restriction site at its 5' end, an AGGAGG ribosome binding site at nucleotides 7 through 12, the PDI coding sequence at nucleotides 21 through 1478, and a GTCGAC SalI restriction site at its 3' end.
- the nucleotide sequence of SEQ ID NO: 2 of WO2017/106583 was designed to be inserted immediately downstream of a promoter, such as an inducible promoter.
- NheI and SalI restriction sites in SEQ ID NO: 2 of WO2017/106583 can be used to insert it into a vector multiple cloning site, such as that of the pSOL expression vector (SEQ ID NO: 3 of WO2017/106583), described in published US patent application US2015353940A1, which is incorporated by reference in its entirety herein.
- PDI polypeptides can also be expressed in host cells, including PDI polypeptides from a variety of species (Saccharomyces cerevisiae (UniProtKB P17967), Homo sapiens (UniProtKB P07237), Mus musculus (UniProtKB P09103), Caenorhabditis elegans (UniProtKB Q17770 and Q17967), Arabidopsis thaliana (UniProtKB O48773, Q9XI01, Q9S G3, Q9LJU2, Q9MAU6, Q94F09, and Q9T042), Aspergillus niger (UniProtKB Q12730)), and also modified forms of such PDI polypeptides.
- a PDI polypeptide expressed in host cells of the disclosure can share at least 70%, or 80%, or 90%, or 95% amino acid sequence identity across at least 50% (or at least 60%, or at least 70%, or at least 80%, or at least 90%) of the length of SEQ ID NO: 1 of WO2017/106583, where amino acid sequence identity is determined according to Example 10 of WO2017/106583.
- Assays to measure accessory protein activity include PhyTip-based column target heterologous protein expression level quantification (phynexus.com/products/proteins/antibody-binding-phytip-columns/), the flow cytometry-based ACE ASSAY™, which measures probe bound to properly folded target protein material (WO2021/146626), and/or the ELISA-based HiPrBind assay (WO2021/163349), which measures fluorescence signal in a plate-based format of probes binding to properly folded target protein. These methods measure the increase in target protein production in the presence of the accessory protein compared to the production level in its absence.
- the increase can be at least 1.5-fold, at least two-fold, at least three-fold, at least four-fold, at least five-fold, at least six-fold, at least seven-fold, at least eight-fold, at least nine-fold, at least ten-fold, at least twenty-fold, at least fifty-fold, at least one hundred-fold, or greater.
- the present disclosure provides methods for expressing and/or producing and/or purifying biomolecules of interest, wherein the biomolecule of interest can be a protein, an RNA, or an RNA-DNA hybrid.
- the biomolecule is a RNA selected from the group consisting of ncRNA, tRNA, rRNA, snRNA, snoRNA, miRNA, mRNA, and TERC.
- the biomolecule is a protein selected from the group consisting of a therapeutic protein, an antibody, an enzyme, a ligand, an antigen, a growth factor, a receptor, a nucleic acid-binding protein, as well as fragments, analogs and fusions of any of the aforementioned biomolecules and as described herein.
- the biomolecule of interest is a biopolymer, a chemical, a drug, a flavor modifier, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, or a sugar alcohol.
- antibody refers to whole antibodies that interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) an epitope on a target antigen.
- a naturally occurring "antibody” is a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds.
- Each heavy chain is comprised of a heavy chain variable region (abbreviated herein as VH) and a heavy chain constant region.
- the heavy chain constant region is comprised of three domains, CH1, CH2 and CH3.
- Each light chain is comprised of a light chain variable region (abbreviated herein as VL) and a light chain constant region.
- the light chain constant region is comprised of one domain, CL.
- The VH and VL regions can be further subdivided into regions of hypervariability, termed complementarity determining regions (CDR), interspersed with regions that are more conserved, termed framework regions (FR).
- Each VH and VL is composed of three CDRs and four FRs arranged from amino-terminus to carboxy-terminus in the following order: FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4.
- the variable regions of the heavy and light chains contain a binding domain that interacts with an antigen.
- the constant regions of the antibodies may mediate the binding of the immunoglobulin to host tissues or factors, including various cells of the immune system (e.g., effector cells) and the first component (C1q) of the classical complement system.
- the term “antibody” includes, for example, monoclonal antibodies, human antibodies, humanized antibodies, camelised antibodies, chimeric antibodies, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, and anti-idiotypic (anti-Id) antibodies (including, e.g., anti-Id antibodies to antibodies of the invention), and epitope-binding fragments of any of the above.
- the antibodies can be of any isotype (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass.
- the antibody or epitope-binding fragments may be, or be a component of, a multi-specific molecule.
- variable domains of both the light (VL) and heavy (VH) chain portions determine antigen recognition and specificity.
- the constant domains of the light chain (CL) and the heavy chain (CH1, CH2 or CH3) confer important biological properties such as secretion, transplacental mobility, Fc receptor binding, complement binding, and the like.
- the N-terminus is a variable region and at the C-terminus is a constant region; the CH3 and CL domains actually comprise the carboxy-terminus of the heavy and light chain, respectively.
- antibody fragment refers to one or more portions of an antibody that retain the ability to specifically interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) a target epitope.
- binding fragments include, but are not limited to, a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; a F(ab)2 fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; a Fd fragment consisting of the VH and CH1 domains; a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and an isolated complementarity determining region (CDR).
- although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules (known as single chain Fv (scFv); see, e.g., Bird et al., (1988) Science 242:423-426; and Huston et al., (1988) Proc. Natl. Acad. Sci. 85:5879-5883).
- Such single chain antibodies are also intended to be encompassed within the term “antibody fragment”.
- antibody fragments are obtained using conventional techniques known to those of skill in the art, and the fragments are screened for utility in the same manner as are intact antibodies.
- antibodies may include biologically active derivatives or variants or fragments.
- biologically active derivative or “biologically active variant” includes any derivative or variant of an antibody having substantially the same functional and/or biological properties of said antibody (e.g., a WT antibody), such as binding properties, and/or the same structural basis, such as a peptidic backbone or a basic polymeric unit, including framework regions.
- biologically active biomolecules additionally includes, in some embodiments, proteins that are enzymatically active and/or properly folded, and/or that exhibit a desired affinity to an antigen or ligand and/or exhibit stability under specific conditions such as temperature (e.g., temperature sensitivity/stability).
- an “analog,” such as a “variant” or a “derivative,” is an antibody substantially similar in structure and having the same biological activity, albeit in certain instances to a differing degree, to a naturally-occurring antibody or a WT antibody or another reference antibody as will be understood by those of skill in the art.
- an antibody variant refers to an antibody sharing substantially similar structure and having the same biological activity as a reference antibody.
- Variants or analogs differ in the composition of their amino acid sequences compared to the reference antibody from which the analog is derived, based on one or more mutations involving (i) deletion of one or more amino acid residues at one or more termini of the antibody and/or one or more internal regions of the antibody sequence (e.g., fragments), (ii) insertion or addition of one or more amino acids at one or more termini (typically an “addition” or “fusion”) of the antibody and/or one or more internal regions (typically an “insertion”) of the antibody sequence or (iii) substitution of one or more amino acids for other amino acids in the antibody sequence.
- a “derivative” is a type of analog and refers to an antibody sharing the same or substantially similar structure as a reference antibody that has been modified, e.g., chemically.
- the variants or sequence variants are mutants wherein 1, 2, 3, 4, 5, 6 or more amino acids within one or more CDR are mutated relative to a reference antibody.
- CDRs on the light chain, heavy chain, or both heavy and light chain are mutated.
- one or more framework amino acid residues are mutated relative to a reference antibody.
- in substitution variants, one or more amino acid residues, e.g., in a CDR region, of an antibody are removed and replaced with alternative residues.
- the substitutions are conservative in nature and conservative substitutions of this type are well known in the art.
- the disclosure embraces substitutions that are also non-conservative. Exemplary conservative substitutions are described in Lehninger [Biochemistry, 2nd Edition; Worth Publishers, Inc., New York (1975), pp. 71-77].
- Antibodies contemplated herein include full-length antibodies, biologically active subunits or fragments of full length antibodies, as well as biologically active derivatives and variants of any of these forms of therapeutic proteins.
- antibodies include those that (1 ) have an amino acid sequence that has greater than about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98% or about 99% or greater amino acid sequence identity, over a region of at least about 25, about 50, about 100, about 200, about 300, about 400, or more amino acids, to a reference antibody (e.g., encoded by a referenced nucleic acid or an amino acid sequence described herein).
- the term "recombinant protein” or “recombinant antibody” includes any protein obtained via recombinant DNA technology. In certain embodiments, the term encompasses antibodies as described herein.
- the antibodies or antibody variants described herein are expressed from one or more expression construct and/or in a cell or strains as described herein.
- Exemplary wild-type or reference antibodies include commercially available or other known antibodies, including therapeutic monoclonal antibodies.
- Reference antibodies according to the present disclosure may include any antibodies now known or later developed, including those that are not clinically and/or commercially available.
- identity refers to a relationship between the sequences of two or more polypeptide molecules or two or more nucleic acid molecules, as determined by aligning and comparing the sequences. "Percent identity" means the percent of identical residues between the amino acids or nucleotides in the compared molecules and is calculated based on the size of the smallest of the molecules being compared. For these calculations, gaps in alignments (if any) must be addressed by a particular mathematical model or computer program (i.e., an "algorithm"). Methods that can be used to calculate the identity of the aligned nucleic acids or polypeptides are standard in the art.
- the sequences being compared are aligned in a way that gives the largest match between the sequences.
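The percent identity definition above can be sketched for two sequences that have already been aligned. This is a minimal illustration, not the GAP program: it assumes the alignment is supplied as two equal-length strings with `-` as the gap character, and the function name `percent_identity` is hypothetical. The denominator is the ungapped length of the shorter molecule, following the statement that identity is calculated based on the size of the smallest of the molecules being compared.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues, relative to the shorter ungapped sequence.

    Both inputs must come from the same pairwise alignment, so they must
    have equal length; gap positions never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    len_a = len(aligned_a) - aligned_a.count(gap)
    len_b = len(aligned_b) - aligned_b.count(gap)
    shortest = min(len_a, len_b)
    if shortest == 0:
        raise ValueError("at least one sequence contains only gaps")
    return 100.0 * matches / shortest


# Illustrative toy alignments (not sequences from the disclosure).
full_match_with_gap = percent_identity("ACDEFG", "ACD-FG")
one_mismatch = percent_identity("ACDE", "ACDF")
```

Because the denominator is the shorter sequence, a short sequence that is fully contained in a longer one scores 100%, which is why a minimum aligned span is also worth checking.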
- An exemplary computer program used to determine percent identity is the GCG program package, which includes GAP (Devereux et al., Nucl Acid Res, 12:387 (1984); Genetics Computer Group, University of Wisconsin, Madison, Wis.).
- GAP is used to align the two polypeptides or polynucleotides for which the percent sequence identity is to be determined.
- the sequences are aligned for optimal matching of their respective amino acid or nucleotide (the "matched span", as determined by the algorithm).
- a gap opening penalty (which is calculated as 3.
- Certain alignment schemes for aligning two amino acid sequences can result in matching of only a short region of the two sequences, and this small aligned region can have very high sequence identity even though there is no significant relationship between the two full- length sequences. Accordingly, the selected alignment method (GAP program) can be adjusted if so desired to result in an alignment that spans at least 50 contiguous amino acids of the target polypeptide.
- exemplary programs that compare and align pairs of sequences include, but are not limited to, ALIGN (Myers and Miller, Comput Appl Biosci, 4(1): 11-17 (1988)), FASTA (Pearson and Lipman, Proc Natl Acad Sci USA, 85(8): 2444-2448 (1988); Pearson, Methods Enzymol, 183: 63-98 (1990)), gapped BLAST (Altschul et al., Nucleic Acids Res, 25(17): 3389-3402 (1997)), BLASTP, BLASTN, or GCG (Devereux et al., Nucleic Acids Res, 12(1 Pt 1): 387-95 (1984)).
- the present Example describes the use of a genetic algorithm (GA) to identify improved media formulations that resulted in several-fold higher titers of an antibody fragment produced in SoluPro™ E. coli (see, e.g., WO/2014/025663 and WO/2017/106583).
- Python software was created to execute the GA, and to output random pipetting volumes to a .csv file for 60 mixtures of 14 individual media components (sugar, nitrogen, salt, and trace metals).
- the .csv file was uploaded to a liquid handling robot to execute pipetting of each of the 14 media components into 60 wells of a small-scale plate-based bioreactor system with pH control and feeding ability.
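The generation of random pipetting volumes for 60 mixtures of 14 components can be sketched as follows. This is only an illustration under stated assumptions: the column layout, the generic component names, the 0 to 50 µL volume range, and the function names are all hypothetical, since the disclosure does not describe the actual worklist format expected by the liquid handling robot.

```python
import csv
import io
import random

N_MIXTURES = 60     # wells, per the Example
N_COMPONENTS = 14   # individual media components, per the Example


def random_volume_table(n_mixtures=N_MIXTURES, n_components=N_COMPONENTS,
                        min_ul=0.0, max_ul=50.0, seed=0):
    """Build a header row plus one row of random volumes per well.

    min_ul and max_ul are assumed bounds, not values from the source.
    """
    rng = random.Random(seed)
    header = ["well"] + [f"component_{i + 1}" for i in range(n_components)]
    rows = [header]
    for well in range(1, n_mixtures + 1):
        volumes = [round(rng.uniform(min_ul, max_ul), 1)
                   for _ in range(n_components)]
        rows.append([well] + volumes)
    return rows


def write_csv(rows, handle):
    """Write the table to any text file-like object in CSV format."""
    csv.writer(handle).writerows(rows)


table = random_volume_table()
buffer = io.StringIO()          # stands in for an open .csv file
write_csv(table, buffer)
```

To write a real file, `buffer` can be replaced with `open("worklist.csv", "w", newline="")`, where the file name is again only a placeholder.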
- RNAseq was collected to characterize the host’s transcriptional response.
- Independent Component Analysis (ICA) of the RNAseq data set revealed independently modulated gene sets (iModulons) that characterized the host response to different media formulations.
- a genetic algorithm was developed using the DEAP library in Python to identify the media formulations that resulted in highest product titers.
- each individual media composition is represented by a vector of media component concentrations. Concentrations can either be discrete or continuous, and each component is designated a minimum and maximum concentration.
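One way to picture this representation is a list of concentrations, one per component, with each component carrying its own bounds and an optional set of discrete levels. The sketch below is an assumption-laden illustration: the `Component` class, the component names, and the numeric bounds are hypothetical and do not come from the disclosure.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Component:
    """A media component with concentration bounds (arbitrary units)."""
    name: str
    minimum: float
    maximum: float
    levels: Optional[Sequence[float]] = None  # discrete allowed values, if any

    def sample(self, rng):
        """Draw one concentration: a discrete level, or a continuous value."""
        if self.levels is not None:
            return rng.choice(list(self.levels))
        return rng.uniform(self.minimum, self.maximum)


def random_individual(components, rng):
    """An individual is a vector of concentrations, one entry per component."""
    return [c.sample(rng) for c in components]


# Hypothetical two-component example; the Example itself uses 14 components.
components = [
    Component("carbon_source", 0.0, 20.0),
    Component("trace_metal_mix", 0.0, 2.0, levels=(0.0, 0.5, 1.0, 2.0)),
]
individual = random_individual(components, random.Random(1))
```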
- a random pool was generated of 60 individual media with ~900 pipetting events performed on a Hamilton liquid handling robot. These media compositions were tested in the BioLector Pro (Beckman Coulter) to measure product formation.
- a cyclical select-reproduce-mutate-cull process was followed as is common with a (mu, lambda) evolutionary strategy. This means that children replace the parents.
- For selection, a tournament selection process was used to determine the parent individuals for the subsequent generation. In this process, five individuals were randomly sampled from the population and the individual with the highest fitness was selected as a parent. This process was repeated until the number of selected parents equaled the desired population size.
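The tournament selection step described above can be sketched in plain Python. This is not the DEAP code used in the Example; it is a standalone illustration, and the function name and arguments are hypothetical. One labeled assumption: contenders within a tournament are drawn without replacement, which the source does not specify.

```python
import random


def tournament_select(population, fitnesses, n_parents,
                      tournament_size=5, rng=None):
    """Select n_parents individuals by repeated tournaments.

    Each tournament samples tournament_size distinct individuals
    (assumption: without replacement) and keeps the fittest one.
    The same individual may win several tournaments.
    """
    rng = rng or random.Random()
    indices = range(len(population))
    parents = []
    while len(parents) < n_parents:
        contenders = rng.sample(indices, tournament_size)
        winner = max(contenders, key=lambda i: fitnesses[i])
        parents.append(population[winner])
    return parents


# Toy population of 60 single-value individuals; fitness equals the value.
population = [[i] for i in range(60)]
fitnesses = [float(i) for i in range(60)]
parents = tournament_select(population, fitnesses, n_parents=60,
                            rng=random.Random(0))
```

A larger tournament size increases selection pressure; with a tournament as large as the population, every selected parent is the single best individual.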
- For mutation, simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%. In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents, and the values between the points are switched to form the offspring.
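The two variation operators can be sketched as follows, again in plain Python rather than DEAP. The function names are hypothetical. How the 25%/75% split is applied is an assumption here: parents are shuffled, the first quarter is mutated, the rest are paired for crossover, and an unpaired leftover (which occurs with 60 individuals, since 45 is odd) is copied unchanged.

```python
import random


def simple_mutation(individual, bounds, rng):
    """Reset one randomly chosen component to a random value within its bounds."""
    child = list(individual)
    i = rng.randrange(len(child))
    low, high = bounds[i]
    child[i] = rng.uniform(low, high)
    return child


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the values between two crossover points (needs length >= 3)."""
    size = min(len(parent_a), len(parent_b))
    p1, p2 = sorted(rng.sample(range(1, size), 2))
    child_a = parent_a[:p1] + parent_b[p1:p2] + parent_a[p2:]
    child_b = parent_b[:p1] + parent_a[p1:p2] + parent_b[p2:]
    return child_a, child_b


def vary(parents, bounds, rng, mutation_fraction=0.25):
    """Mutate a fraction of the parents and cross over the remainder."""
    pool = [list(p) for p in parents]
    rng.shuffle(pool)
    n_mut = round(mutation_fraction * len(pool))
    offspring = [simple_mutation(p, bounds, rng) for p in pool[:n_mut]]
    rest = pool[n_mut:]
    for a, b in zip(rest[0::2], rest[1::2]):
        offspring.extend(two_point_crossover(a, b, rng))
    if len(rest) % 2:               # assumption: unpaired parent is kept as-is
        offspring.append(rest[-1])
    return offspring


# Toy example with 14 components bounded on [0, 1].
rng = random.Random(0)
bounds = [(0.0, 1.0)] * 14
child_a, child_b = two_point_crossover([0.0] * 14, [1.0] * 14, rng)
offspring = vary([[0.5] * 14 for _ in range(60)], bounds, rng)
```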
- Any instrument-logged signal can be chosen (e.g., biomass, pH, DO) to trigger microfluidic pumps to add a specified volume.
- a pH trigger was used; pH was maintained at 6.6 with a trigger at 6.65.
- a value above 6.65 triggers addition of 5.5 µL of substrate consisting of a carbon source and inducers for both the protein of interest and accessory molecules to promote proper folding and solubility of the Fab.
- the control program includes the options to have a block and a pause.
- Block time is defined as the time a trigger condition must be continuously met before the trigger is activated.
- Pause time is defined as the time that must elapse after a trigger was activated before it can be activated again. For the study, 15 minutes was chosen for both.
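The interaction of the trigger threshold, block time, and pause time can be illustrated with a small simulation. This is a sketch of the definitions above, not the instrument's control software: the function name, the (time, pH) input format, and the choice to require a fresh block period after each feed are assumptions.

```python
def simulate_feed_trigger(readings, threshold=6.65, block_min=15.0,
                          pause_min=15.0, dose_ul=5.5):
    """Return a list of (time_min, dose_ul) feed events.

    readings: iterable of (time_min, pH) pairs sorted by time.
    A feed fires only when pH has stayed above threshold for block_min
    minutes AND at least pause_min minutes have passed since the last feed.
    """
    events = []
    condition_since = None   # time at which pH first exceeded the threshold
    last_fired = None        # time of the most recent feed
    for t, ph in readings:
        if ph <= threshold:
            condition_since = None      # condition broken: restart block timer
            continue
        if condition_since is None:
            condition_since = t
        held_long_enough = (t - condition_since) >= block_min
        pause_over = last_fired is None or (t - last_fired) >= pause_min
        if held_long_enough and pause_over:
            events.append((t, dose_ul))
            last_fired = t
            condition_since = None      # assumption: block timer restarts
    return events


# pH stays at 6.7, sampled every 5 minutes for one hour.
steady = simulate_feed_trigger([(t, 6.7) for t in range(0, 61, 5)])

# pH dips to 6.6 at 10 minutes, which resets the block timer.
dipped = simulate_feed_trigger(
    [(0, 6.7), (5, 6.7), (10, 6.6), (15, 6.7), (20, 6.7), (25, 6.7), (30, 6.7)]
)
```

With both times set to 15 minutes, a brief pH excursion does not cause a feed, and consecutive feeds are spaced far enough apart for the culture to respond.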
- Temperature was set at 32 °C for the first twelve hours after inoculation, then reduced to 26 °C for the remainder of the fermentation.
- HiPrBind™ assay was used to assess performance of experimental samples.
- HiPrBind™ is an ELISA-type assay that specifically binds properly folded, functional target protein and produces a signal corresponding to the amount of target protein present (WO2021/163349 and described herein).
- To measure properly folded, active target protein two analyte associating moieties were used in combination with a signal donor and an activatable compound.
- one moiety forms a complex with the signal donor, and the other moiety forms a complex with an activatable compound. If properly folded, active target material is present, the signal donor complex and the activatable compound complex will both bind and provide detectable output. If properly folded, active target material is not present, the complexes do not associate, and no signal is produced.
- RNAlater (Thermo Fisher). The culture and RNAlater mixture were spun down, supernatant was removed, followed by resuspension of the pellet in 3x volume of RNAlater for complete quenching. A volume of 10-50 µL of the treated culture was mixed with >300 µL of Trizol. RNA was then isolated using Direct-zol-96 MagBead RNA (Zymo Research, PN R2100), according to the manufacturer’s protocol and including the optional DNase I treatment.
- Sequencing libraries were prepared from extracted RNA with Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Illumina, PN 20040529), according to the manufacturer’s protocol. This prep kit depletes ribosomal RNA, reverse transcribes remaining RNA into cDNA (complementary DNA), then ligation and subsequent amplification adds adapters and dual-indexes for multiplex sequencing on an Illumina instrument. Libraries were normalized, pooled, and then diluted to 750 pM for 2x75bp sequencing on an Illumina Nextseq 1000 using P2 Reagents (200 cycle, v3) (Illumina, PN 20046812).
- iModulon/PCA
- ρx,y is the Pearson correlation between components x and y.
- the final robust ICs were defined as the centroids of the cluster. Again, to account for identical components with opposite signs, we choose one component as the canonical direction, and flip all other components in the cluster to ensure that the Pearson correlation is positive between all members of the cluster before computing the centroid.
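The sign-alignment and centroid step can be sketched as follows. This is a minimal pure-Python illustration rather than the pipeline used in the Example: the function names are hypothetical, the first member of the cluster is arbitrarily taken as the canonical direction, and components with zero variance are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def sign_aligned_centroid(cluster):
    """Flip components that anti-correlate with the canonical one, then average.

    cluster: list of components, each a list of gene weights of equal length.
    The first component is used as the canonical direction (assumption).
    """
    canonical = cluster[0]
    aligned = []
    for component in cluster:
        sign = 1.0 if pearson(canonical, component) >= 0 else -1.0
        aligned.append([sign * value for value in component])
    n = len(aligned)
    return [sum(column) / n for column in zip(*aligned)]


# Two copies of the same toy component with opposite signs.
centroid = sign_aligned_centroid([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])
```

Without the flip, the two toy components would average to zero and the shared signal would be lost.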
- Precision is the proportion of genes in the component that are present in the associated regulon and recall is the proportion of genes in the regulon that are present in the associated component.
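These two definitions translate directly into set operations. The sketch below is illustrative only: the function name is hypothetical and the gene identifiers are placeholders rather than genes from the Example.

```python
def precision_recall(component_genes, regulon_genes):
    """Compare the genes in a component with the genes in a known regulon.

    precision = |component AND regulon| / |component|
    recall    = |component AND regulon| / |regulon|
    Empty inputs yield 0.0 rather than raising a division error.
    """
    component = set(component_genes)
    regulon = set(regulon_genes)
    overlap = component & regulon
    precision = len(overlap) / len(component) if component else 0.0
    recall = len(overlap) / len(regulon) if regulon else 0.0
    return precision, recall


# Placeholder gene names: 3 of 4 component genes are in a 5-gene regulon.
component = ["gene1", "gene2", "gene3", "gene9"]
regulon = ["gene1", "gene2", "gene3", "gene4", "gene5"]
precision, recall = precision_recall(component, regulon)
```

High precision with low recall suggests the component captures only part of a regulon, while the reverse suggests the component mixes in genes controlled by other regulators.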
- Prior information about regulator binding sites was borrowed from previous studies (Sastry AV, et al., Nat Commun. 2019;10:5536; Rychel K, et al., Nat Commun. 2020;11:6338; and Poudel S, et al., Proc Natl Acad Sci USA. 2020;117:17228-39).
- FIG. 3 shows Fab expression in SoluPro E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of unevolved mixtures after 2 rounds of the GA.
- the HiPrBind™ (HPB) signal significantly increased between before (Round 0) and after (Round 2) the genetic algorithm.
- the mean signal increased 1.9-fold between the initial and final populations.
- In FIG. 6C, starvation of leucine is shown to trend towards higher HiPrBind™ signal, suggesting addition of leucine may improve protein expression.
- In FIG. 6D, DksA, a ribosomal protein subunit regulator, suggests that higher levels of ribosomes trend to higher HiPrBind™ signal.
- In FIG. 6E, Cbl is an iModulon associated with sulfur metabolism. High HiPrBind™ signals here may indicate a state of sulfur starvation.
- FIG. 6F and 6G show iModulon signal related to iron metabolism. Fur is upregulated during iron starvation, and the iron-sulfur cluster regulator IscR also trends to a state of upregulation when HiPrBind™ signals are higher.
- This media optimization approach can be applied, in various embodiments, to biological drugs such as monoclonal antibodies and antibody fragments, enzymes, edible proteins, and metabolites produced by microbes or other production hosts.
- This mixture was scaled into a stirred bioreactor, demonstrating that mixtures identified by GA in a microplate with pH control and feeding can be quickly scaled to industrial production.
- FIG. 7A depicts scale-up, purification and characterization of the purified material, according to some aspects, including scaling-up, transcriptomic analysis and analytical characterization of purified material.
- FIG. 7B depicts integrated OUR and CER, according to some aspects.
- FIG. 7C depicts that scaling up from microplates to bioreactors increases quality by a high percentage (e.g., 111% with doubling of carbon), according to some aspects.
- FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits).
- FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects.
- FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects.
- FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects.
- FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects.
- FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects.
- FIG. 7J depicts genetic algorithm-based media optimization results in Mab that are structurally similar to CHO-produced Mab, according to some aspects.
- FIG. 7J shows that the genetic algorithm-based media optimization results in Mab (e.g., produced in SoluPro E. coli) that is structurally similar to CHO-produced Mab.
- FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects.
- FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects.
- FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects.
- FIG. 9A depicts an exemplary structural characterization environment, according to some aspects.
- FIG. 9B depicts an exemplary mab study of second derivatives for structural fingerprints, according to some aspects.
- FIG. 9C depicts an exemplary mab study of a delta plot for clear visualization of differences, according to some aspects.
- FIG. 10A depicts iModulon gene sets, according to some aspects.
- FIG. 10A shows the principal component analysis (PCA) of the RNAseq data and quality. Principal component analysis of RNAseq data (left) reduces ~4000-dimensional gene expression.
- FIG. 10B depicts iModulon analysis of genetic-algorithm scale-up provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects.
- iModulon analysis of the GA scale-up provided insights for strain and process improvement to lower COGs, retain CQAs and boost titers.
- FIGs. 11A-11D depict iModulon activity, according to some aspects.
- RNAseq data was processed into groups of functionally co-regulated gene sets known as iModulons.
- the iModulon analysis was collected from the RNAseq data of multiple strains (FIG.
- OxyR is a regulator of antioxidant genes and is an indicator.
- In FIG. 11A, OxyR expression was used to assess the different strains engineered to improve their oxidative stress.
- PhoB is an iModulon of phosphate metabolism.
- PhoP is an iModulon of metal homeostasis, and Fur (FIG. 11D) is an iModulon of iron uptake. These iModulons may be used to improve the process conditions by supplementing the respective nutrient to make process improvements over the course of cultivation in bioreactors.
- FIG. 12 depicts iModulon analysis of the genetic algorithm scale up provided insights for strain and process improvement, to lower COGs, retain CQAs and boost titers, according to some aspects.
- iModulon analysis of the GA scale-up provided insights for strain and process improvement to lower COGs, retain CQAs and boost titer.
- the fitness score of the strain G with strain improvement insights obtained from iModulon analysis results in a strain with an improved quality fitness score.
- GA media was used as a benchmark to incorporate the benefits of the GA media into the strain.
- aspects of the present techniques may be implemented using source code in suitable programming languages, such as Python, Java, C++, etc.
- source code listings that may be implemented, in some aspects:
- a method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.
- a method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.
- a method of increasing the yield of biomolecule expression in a cell culture system comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.
- any one of aspects 1-3 optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.
- the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.
- the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.
- cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.
- cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.
- biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.
- biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a non-commercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.
- a method of producing a biomolecule of interest comprising culturing a host cell comprising an expression construct encoding the biomolecule of interest in an optimized cell media formulation as determined by the method of aspect 1.
- a computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.
- a computer-implemented method for improving quality of monoclonal antibodies comprising: (i) performing a media optimization using a genetic algorithm workflow; (ii) harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) selecting a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) scaling one or more candidates to a bioreactor scale; (v) collecting RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) performing a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
- a computing system for improving quality of monoclonal antibodies comprising: a bioreactor, one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the system to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
- a non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed, cause a computer to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Immunology (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Organic Chemistry (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Biochemistry (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- General Engineering & Computer Science (AREA)
- Hematology (AREA)
- Microbiology (AREA)
- Urology & Nephrology (AREA)
- Databases & Information Systems (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Epidemiology (AREA)
- Analytical Chemistry (AREA)
- Tropical Medicine & Parasitology (AREA)
- Toxicology (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Methods, systems and computer-readable media for identifying optimized cell media formulations capable of promoting the expression of a biomolecule of interest are presented, which may include applying mixing algorithms, culturing cells, and measuring component conditions. The techniques may include identifying genes and/or one or more independently modulated gene sets. The present techniques include methods, systems and computer-readable media for improving quality of monoclonal antibodies.
Description
GENETIC ALGORITHM AND IMODULON BASED OPTIMIZATION OF MEDIA FORMULATION FOR QUALITY, TITER, STRAIN, AND PROCESS IMPROVEMENT OF BIOLOGICS
Field
[0001] The subject matter provided herein relates generally to the field of cell media optimization, and more specifically, to methods and systems for optimizing cell media formulations using optimization and search techniques that are inspired by the principles of natural selection and genetics and that mimic evolutionary processes (e.g., genetic algorithms).
Background
[0002] Bioreactor media is comprised of many components. The choice of components and component concentrations can have a profound impact on product quality and titer. Current methods for media optimization, even when automated and using advanced mixing algorithms, are time consuming and often do not elucidate the underlying physiological reasons for the improvements in product quality and titer conferred by the optimized mixture. Independent component analysis (ICA) and RNAseq have been combined to identify co-expressed, functionally related gene sets (i.e., determine transcriptional regulatory network activation), but to date a comprehensive method and predictive model combining advanced mixing algorithms with advanced transcriptional, sequencing and gene network analyses has not been described.
[0003] Further, efficient production and proper folding of many recombinant proteins expressed in Escherichia coli are often challenging. The challenge of expressing properly folded proteins increases with the complexity of the protein, such as a full-length antibody requiring disulfide bond formation. Although SoluPro™ has an oxidative cytoplasm, which is required for the formation of disulfide bonds in recombinant proteins, missing disulfide bonds have been observed, along with scrambled disulfide bonds, improperly folded individual chains (light or heavy chain) of the antibody, and multiple product-related impurities aggravated by misfolding and inefficient disulfide bond formation.
[0004] Thus, comprehensive techniques for determining optimized cell media formulations are needed, to improve product quality and titer.
Summary
[0005] The disclosure provides methods and compositions for optimizing cell media formulations. In one embodiment of the present disclosure, a method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest is provided, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.
[0006] In another embodiment, the present disclosure provides a method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.
[0007] In still another embodiment, the present disclosure provides a method of increasing the yield of biomolecule expression in a cell culture system, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.
[0008] The present disclosure also provides an aforementioned method optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.
[0009] In still another embodiment, an aforementioned method is provided wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm. In another embodiment, the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.
[0010] In yet another embodiment, an aforementioned method is provided wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems. In some embodiments, the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.
[0011] In another embodiment, an aforementioned method is provided wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin. In one embodiment, the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.
[0012] The present disclosure also provides an aforementioned method wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.
[0013] In still another embodiment, an aforementioned method is provided wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules. In some embodiments, the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a non-commercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.
[0014] In yet another embodiment, an aforementioned method is provided wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest. In still another embodiment, an aforementioned method is provided further comprising measuring cell growth.
[0015] In some embodiments, the present disclosure provides an aforementioned method wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA. In still another embodiment, an aforementioned method is provided wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.
[0016] In another embodiment of the present disclosure, an aforementioned method is provided wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells. In one embodiment, the cells are bacterial cells. In another embodiment, the bacterial cells are E. coli cells. In still another embodiment, the E. coli cells comprise one or more or all of: (a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter; (b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter; (c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter; (d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm; (e) a reduced level of gene function of a gene that encodes a reductase; (f) at least one expression construct encoding at least one disulfide bond isomerase protein; (g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or (h) at least one polynucleotide encoding Erv1p.
[0017] The present disclosure also provides computing systems and computer-implemented methods that are optionally performed or used in combination with other, separate, computer systems or wet lab assays as described herein. In one embodiment, a computing system for identifying an improved bioform substrate is provided, comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.
[0018] In another embodiment of the present disclosure, a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is provided.
Brief Description of Figures
[0019] FIG. 1A shows one embodiment of a genetic algorithm media optimization workflow environment, according to some aspects;
[0020] FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects;
[0021] FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects;
[0022] FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects;
[0023] FIG. 2A shows exemplary cell media optimization using Darwinian fitness, according to some aspects;
[0024] FIG. 2B depicts an exemplary selection for reproduction, according to some aspects;
[0025] FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects;
[0026] FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects;
[0027] FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects;
[0028] FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects;
[0029] FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects;
[0030] FIG. 2H depicts a chart showing population evolving to a higher HiPrBind™ signal, according to some aspects;
[0031] FIG. 2I depicts respective qualities of monoclonal antibodies (MABs) before and after application of the present genetic algorithm techniques, according to some aspects;
[0032] FIG. 3 shows Fab expression in SoluPro E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of unevolved mixtures after 2 rounds of the GA;
[0033] FIG. 4 shows Fab expression before and after GA. A means comparison shows a significant improvement of mixtures after two rounds of evolution;
[0034] FIG. 5 shows principal component analysis (PCA) of RNAseq data and titer. Principal component analysis of RNAseq data (left) reduces ~4000-dimensional gene expression data to two dimensions. Generations are overlaid, showing a convergence on a single location, which matches the area of highest Fab expression (right);
[0035] FIG. 6A shows an iModulon activity analysis wherein RNAseq data was processed into functionally co-regulated gene groups known as iModulons, such that RpoS iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;
[0036] FIG. 6B shows an iModulon activity analysis wherein RNAseq data was processed into functionally co-regulated gene groups known as iModulons, such that RpoH iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;
[0037] FIG. 6C shows an iModulon activity analysis in which mixtures with a lower stress response correlate with higher HPB activity, according to some aspects; specifically depicted is an activity analysis of DksA, a ribosomal protein subunit regulator that regulates ribosomal synthesis, in which higher ribosome levels trend towards higher HPB, in some aspects;
[0038] FIG. 6D depicts iModulon activity analysis wherein starvation of leucine is associated with high HPB, according to some aspects;
[0039] FIG. 6E depicts the iModulon IscR, according to some aspects;
[0040] FIG. 6F depicts Fur is associated with iron metabolism, wherein conditions with lower iron concentration are associated with higher HPB signal, according to some aspects;
[0041] FIG. 6G depicts that Cbl relates to sulfur metabolism and that high HPB signals indicate a state of sulfur starvation, according to some aspects;
[0042] FIG. 7A depicts scaling up, purification and characterization of purified material, according to some aspects;
[0043] FIG. 7B depicts integrated OUR and CER, according to some aspects;
[0044] FIG. 7C depicts that scaling up quality increases by a high percentage (e.g., 111% with doubling of carbon), according to some aspects;
[0045] FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits);
[0046] FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects;
[0047] FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects;
[0048] FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects;
[0049] FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects;
[0050] FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects;
[0051] FIG. 7J depicts that genetic algorithm-based media optimization results in mAbs that are structurally similar to CHO-produced mAbs, according to some aspects;
[0052] FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects;
[0053] FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects;
[0054] FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects;
[0055] FIG. 9A depicts an exemplary structural characterization environment, according to some aspects;
[0056] FIG. 9B depicts an exemplary mab study of second derivatives for structural fingerprints, according to some aspects;
[0057] FIG. 9C depicts an exemplary mab study of a delta plot for clear visualization of differences, according to some aspects;
[0058] FIG. 10A depicts iModulon gene sets, according to some aspects;
[0059] FIG. 10B depicts iModulon analysis of genetic-algorithm scale-up provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects;
[0060] FIG. 11A depicts iModulon activity, according to some aspects;
[0061] FIG. 11B depicts iModulon activity, according to some aspects;
[0062] FIG. 11C depicts iModulon activity, according to some aspects;
[0063] FIG. 11D depicts iModulon activity, according to some aspects; and
[0064] FIG. 12 depicts iModulon analysis of the genetic algorithm scale up provided insights for strain and process improvement, to lower COGs, retain CQAs and boost titers, according to some aspects.
Detailed Description
[0065] The present disclosure addresses the aforementioned unmet need and provides methods that combine algorithms that create mixtures of media components with machine learning, RNAseq and ICA, enabling rapid identification of optimized mixtures (e.g., optimized cell media formulations) and thus unmasking how an organism responds to diverse stresses and unfamiliar environments.
[0066] FIG. 1A shows an exemplary workflow of one embodiment of the methods described herein. Mixing optimization algorithms such as, for example, a Genetic Algorithm (GA), Bayesian algorithms, and the like are used to search an experimental space for the optimal mixture of components and concentration of such components based on scoring formulas that can include robustness of growth, product quality and titer. The mixing algorithms are, in some embodiments, used in connection with high-throughput screening methodologies such as liquid handling robots and specialized equipment for evaluating growth rates of inoculated microorganisms or cell lines, measuring product quality and titer. These are typically executed as stand-alone experiments, where improved mixtures are identified empirically. As described herein, RNAseq is used in one embodiment to understand what genes are being transcribed at the time of sampling from a culture of microbial cells or mammalian cell lines. Independent component analysis is used, in some embodiments, to identify co-expressed functionally related gene sets (iModulons). This approach is used to identify limitations of media components, metabolic state(s) and the like. By combining these two approaches with machine learning, the number of individual members (wells, flasks, reactors) is significantly decreased.
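As a non-limiting illustration of a scoring formula of the kind described above, the following Python sketch combines robustness of growth, product quality and titer into a single score and ranks mixtures by it. The weighted-sum form, the weight values, the normalization of each measurement to a 0-1 range, and the names WellResult, score and rank are assumptions made for illustration only; they are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class WellResult:
    """Measurements for one mixture (hypothetical structure)."""
    well: str
    growth: float   # robustness of growth, assumed normalized to 0..1
    quality: float  # product quality, assumed normalized to 0..1
    titer: float    # product titer, assumed normalized to 0..1


def score(result, w_growth=0.2, w_quality=0.4, w_titer=0.4):
    """Weighted-sum score; the weights are illustrative assumptions."""
    return (w_growth * result.growth
            + w_quality * result.quality
            + w_titer * result.titer)


def rank(results):
    """Return the results ordered from highest to lowest score."""
    return sorted(results, key=score, reverse=True)


results = [
    WellResult("A1", growth=0.9, quality=0.2, titer=0.3),
    WellResult("A2", growth=0.5, quality=0.8, titer=0.7),
]
best = rank(results)[0]
print(best.well, round(score(best), 2))
```

Other scoring forms (e.g., a product of terms, or a hard threshold on growth) could be substituted without changing the surrounding workflow.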
[0067] The present disclosure also provides computing systems and computer-implemented methods that are optionally performed or used in combination with other, separate, computer systems or wet lab assays as described herein. In one embodiment, a computing system for identifying an improved bioform substrate (e.g., a cell culture media formulation) is provided, comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate
component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement (e.g., a gene)that is transcribed, and (c) at least one independently modulated arrangement set (e.g., a gene set as described herein) by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.
[0068] The present techniques further provide the ability to achieve improved disulfide bond formation and protein folding in full-length monoclonal antibodies in Escherichia coli by improving media composition using genetic algorithm based media optimization.
[0069] It will be appreciated by those of skill in the art that a computing system such as described herein and above can be used in combination with other systems/wet lab assays which can optionally measure, for example, (b) at least one arrangement (e.g., a gene) that is transcribed, and (c) at least one independently modulated arrangement set (e.g., a gene set) by identifying the improved substrate. In this way, some measurements are taken by a fermentation device such as a BioLector (Beckman). The BioLector measures on a per-well basis pH, O2 concentration, and biomass. Such measurements may or may not be considered when assessing the fitness function to select winners for breeding into subsequent rounds as described in more detail below. Biomass would be considered where, for example, higher cell mass was an objective (e.g., probiotic cultures). Fermentation devices such as those described herein have a fluorescent channel, so one could add a fluorescent reporter gene, or add a fluorescent substrate, as an input for fitness. One example would be to create a genetic construct wherein the quantity of the protein of interest is directly proportional to the amount of signal from the co-expressed fluorescent protein.
[0070] Using the methods and method steps disclosed herein, an AI engine is capable of learning what mixtures are optimal per strain and per target protein, so that the development timeline will be decreased and entirely in-silico predictions of optimal mixtures may be achieved. In this way, a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is contemplated in one embodiment.
[0071] Mixing Algorithms
[0072] In some embodiments, a genetic algorithm (GA) is contemplated. In one embodiment, the genetic algorithm is or is adapted from the algorithm described in H., Weuster-Botz, et al., Bioprocess Biosyst Eng 29, 385-390 (2006) (https://doi.org/10.1007/s00449-006-0087-7). GAs are inspired by Darwinian principles of natural selection, or “survival of the fittest.” GAs are based on evolutionary principles, encoding several sets of design variables (e.g., volumes of media components) on strings. These strings may be processed by GA operators as discussed herein (e.g., crossover, mutation, etc.) throughout one or more successive generations. In this scheme, each “gene” may be a pipetting volume of a media component, and each “chromosome” may be a well in a microplate comprised of discrete volumes of each “gene.” The principle of “survival of the fittest” assures a convergence toward optimal values with iterative generations.
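The encoding described above can be illustrated with a minimal sketch. The component names and volume bounds below are hypothetical placeholders chosen for illustration; they are not values from this disclosure.

```python
import random

# Hypothetical media components with pipetting-volume bounds in microliters.
# Each bounded volume is one "gene"; names and ranges are assumptions.
COMPONENT_BOUNDS = {
    "glucose": (0.0, 50.0),
    "yeast_extract": (0.0, 40.0),
    "MgSO4": (0.0, 10.0),
    "trace_metals": (0.0, 5.0),
}

def random_chromosome(rng):
    """Return one well ("chromosome") as a mapping of component -> volume."""
    return {
        name: round(rng.uniform(lo, hi), 1)
        for name, (lo, hi) in COMPONENT_BOUNDS.items()
    }

def random_population(size, seed):
    """Return `size` randomly initialized wells, reproducible for a given seed."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]

population = random_population(size=60, seed=400)
```

Seeding the generator makes the initial random generation reproducible, which may be useful when the same plate layout has to be regenerated for a liquid-handling run.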
[0073] In some aspects of the present invention, “fitness” refers to an ability of a particular growth medium to promote or enhance growth/survival of an inoculated biological organism (e.g., a cell placed into the well of a bioreactor). Thus, the GA may optimize for survival of the most welcoming, with respect, for example, to the respective abilities of cell media contained in respective wells to promote the growth/survival of bacteria, sometimes in the presence of less than ideal conditions. In this case, the one or more media most welcoming to a given bacteria, given one or more conditions, may be the media considered to have “survived,” for purposes of a single round of evolution (e.g., during a tournament selection process, as discussed herein).
[0074] In some aspects, the present techniques may use different/additional criteria to moderate fitness, such as minimizing material necessary to achieve a successful culture (survival of the most materially efficient); minimizing the number of components respective media consist of while still achieving a successful culture (survival of the simplest); minimizing economic input cost (survival of the least expensive); etc.
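One way such additional criteria could moderate fitness is a weighted score with penalty terms. The sketch below is illustrative only: the function name, the weights, and the linear form are assumptions, not the scoring formula used by the present techniques.

```python
def moderated_fitness(quality, titer, cost, n_components,
                      w_quality=1.0, w_titer=0.5,
                      w_cost=0.1, w_complexity=0.05):
    """Hypothetical fitness: reward quality and titer, penalize cost and complexity.

    quality, titer  -- measured performance of the medium (arbitrary units)
    cost            -- economic input cost of the medium (arbitrary units)
    n_components    -- number of components the medium consists of
    All weights are illustrative assumptions.
    """
    reward = w_quality * quality + w_titer * titer
    penalty = w_cost * cost + w_complexity * n_components
    return reward - penalty

# Two media with equal performance: the simpler, cheaper one scores higher.
simple = moderated_fitness(quality=0.4, titer=1.0, cost=2.0, n_components=10)
complex_ = moderated_fitness(quality=0.4, titer=1.0, cost=3.0, n_components=24)
```

Setting a penalty weight to zero recovers a fitness based on performance alone, so the same function can cover both the plain and the moderated cases.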
[0075] In some aspects of the present invention, “fitness” refers to an ability of a biomolecule to survive under various conditions. Thus, fitness of media and biomolecules may be determined separately, and/or concomitantly, in aspects.
[0076] In GAs, genetic operators are mathematical functions used to simulate biological concepts, such as mutation, crossover (i.e., recombination or mixing of chromosomes) and selection. These operators are the practical hooks by which the fitness concepts discussed above are enforced in an evolutionary computation solution. For example, a particular crossover operator may be implemented to avoid recombining media materials known to be incompatible, and a mutation operator may be used to introduce diversity, and to avoid
premature convergence. Similarly, as will be appreciated by those of ordinary skill in the art, a given selection operator (e.g., tournament selection, roulette wheel selection, etc.) may be chosen to determine respective fitness of individuals.
[0077] As appreciated by those of ordinary skill in the art, the present techniques may include other/additional algorithmic approaches. For example, the GA may include a Bayesian estimation of distribution (EDA) algorithm, or probabilistic model-building genetic algorithm (PMBGA). In one embodiment, a naive Bayes algorithm (e.g., https://onlinelibrary.wiley.com/doi/epdf/10.1002/bit.28132) is contemplated.
[0078] In still other embodiments, a differential evolution algorithm (Storn, R. M. & Price, K. V. Differential evolution — a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11 (4), 341-359. https://doi.Org/10.1023/A:100820282 (1997)) is contemplated. In general, any direct, stochastic, and/or population algorithm is contemplated herein (https://nature.com/articles/s41598-020-74228-0).
[0079] Exemplary, non-limiting cell media formulation components include nutrients as well as buffering compounds, pH and temperature values. Components, therefore, may include one or more carbon sources, one or more nitrogen sources, one or more analytes, one or more salts, one or more buffering compounds, a pH or pH range, a temperature or temperature range, one or more metal salts, one or more trace minerals, one or more biostimulants, one or more co-factors, one or more peptides, one or more modified peptides, one or more nucleic acids, one or more nucleic acid precursors, one or more small molecules, and/or one or more vitamins.
[0080] In some embodiments, amino-nitrogen sources such as peptone, protein hydrolysates, infusions and extracts, both plant-based, such as soytone, potato hydrolysate, grains and grain meals, and animal-based, such as meat digests, casein hydrolysate, whey and gelatin, are contemplated. Various yeast extracts and/or combinations of all or some of the available amino acids such as alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine and valine are also contemplated. Growth factors such as blood or serum, vitamins, NAD, and yeast extract are also contemplated in some embodiments. In still other embodiments, energy sources such as any kind of sugar, alcohol, or carbohydrate are contemplated, including, for example, glucose, glycerol, sorbitol, fructose, sucrose, maltose, sophorose, lactose, dextrose, galactose, arabinose, high fructose corn syrup, maltodextrin, mannose, ribose, trehalose, xylose and the like. Sugar alcohols are additionally contemplated and include, for example, arabitol, erythritol, glycerol, maltitol, mannitol, lactitol, sorbitol and xylitol. Inorganic phosphate and sulfur, trace metals, water, and vitamins are often included in microbial culture media, and these components are contemplated herein. Mineral salts such as phosphate, sulfate, magnesium, calcium, and iron are also contemplated herein. Selective agents such as bile salts, desoxycholate, chemicals, antimicrobials, antibiotics and dyes can be included according to some embodiments of the present disclosure. Gelling agents are, in some embodiments, used in solid and semi-solid media and may include agar, gelatin, alginate, albumin and silica gel. Protective agents such as calcium carbonate, soluble starch and charcoal are, in some embodiments, used to absorb toxic metabolites or neutralize media. Certain growth factors including NAD and hemin, or surfactants such as polysorbate 80, are, in some embodiments, used to alter growth rates. Trace metals are, in still other embodiments, included and include, but are not limited to, aluminum, manganese, molybdenum, iron, copper and nickel.
[0081] FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects. FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects.
[0082] Genetic Algorithms for Improved Product Quality
[0083] As discussed herein, the present techniques may include improved folding techniques. Specifically, cell media components contain different nutrients and cofactors for cell proliferation and product formation. These components can also be optimized to affect the quality of the product formed. Using a genetic algorithm approach, the present techniques are able to identify an optimized media composition that results in higher quality (more than 2-fold increase in quality metric) full-length antibody with improved folding and disulfide bond formation.
[0084] FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects. As shown in FIG. 1D, the steps to implement this improvement are as follows:
1. Media optimization using a genetic algorithm: a genetic algorithm workflow consisting of 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals) was set up, and the experiments were executed with a SoluPro E. coli strain expressing full-length Mab.
2. Broth harvested at 68 h was purified and measured by iCIEF, an assay that measures the quality of the product in terms of folding and disulfide bond formation, and by NR-CGE, an assay that measures intermolecular disulfide bond formation.
3. Top candidates with the highest quality scores were selected as parents for the subsequent round of mixtures, and an evolutionary approach to recombination was used to produce another 60 mixtures. This was repeated twice, yielding populations with improved MAb quality of 40% in comparison to 15% with the control media, i.e., about 2.67 times the control value (roughly 266% of control).
4. The top 2 candidates were scaled up to the bioreactor scale and still exhibited a 200% increase in quality at the end of fermentation.
5. A comprehensive RNAseq data set was collected to compare the control and new improved media in the bioreactor.
6. A complete downstream purification and analytical characterization was performed on the product obtained from the use of the new improved media.
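The quality figures in step 3 (40% with evolved media versus 15% with control media) can be expressed under several conventions. The short sketch below is illustrative arithmetic only; the function name is hypothetical.

```python
def relative_change(new, control):
    """Express `new` relative to `control` under three common conventions."""
    fold = new / control
    return {
        "fold": fold,                           # new / control
        "percent_of_control": 100.0 * fold,     # new as a percentage of control
        "percent_increase": 100.0 * (fold - 1), # gain over control
    }

# Step 3 values: 40% quality with evolved media vs. 15% with control media.
stats = relative_change(new=40.0, control=15.0)
```

Under these conventions the step 3 values correspond to a fold value of about 2.67, which is about 267% of control or an increase of about 167% over control.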
[0085] The foregoing steps are described herein in further detail.
[0086] It should be appreciated that the present techniques include genetic algorithm and iModulon-based optimization of media formulation for quality, titer, strain, and/or process improvement of biologics, in some aspects. For example, in some aspects, the present techniques include methods, compositions and/or systems for optimizing cell media formulations for the amount of biomolecules of interest expressed. These may be single parameter optimizations focused around the amount of biomolecules of interest expressed. Further, these techniques may provide methods/systems/compositions for identifying one or more genes and/or independently modulated gene sets to optimize the media formulation towards the amount of biomolecules expressed at the small scale or high throughput condition.
[0087] In other aspects, the present techniques may include methods, systems and/or compositions for optimizing cell media formulations for the quality of the biomolecules of interest expressed. For example, the present techniques may include a multi-parametric optimization using cell media formulation around biomass, amount of biomolecules of interest and the quality of biomolecules of interest. In some aspects, biomolecules of interest may be produced at higher quality while still reaching a minimum threshold amount that is sufficient for downstream assays. The present techniques may include optimizing media to allow for
sufficient growth of the cells that are the cell factories making or manufacturing certain biomolecules.
[0088] Further, the present techniques may include identifying one or more independently modulated gene sets that improve media formulation for bioreactor scale-up. In addition, the present techniques may include techniques for identifying individual genes and independently modulated gene sets to gain insights for strain engineering and improvement. Specifically, the optimized cell media formulations can be genetically encoded into the strain and therefore (1) reduce cost of goods sold (COGS), (2) retain quality during scale-up, and (3) boost the amount of product/biomolecule produced.
[0089] Furthermore, the present techniques may include techniques for identifying gaps at the bioreactor scale process in terms of nutrient limitations over the course of time. The present techniques may provide insights to optimize and develop a process using nutritional requirements based on the individual genes or the independently modulated gene sets.
[0090] FIG. 2A depicts an exemplary Darwinian “fitness” approach where the pipetting volume represents a “gene” and where a well (e.g., in a multi-well plate as described herein) represents a “chromosome.” FIG. 2B depicts how chromosomes (wells) with the highest fitness scores are, in one embodiment, bred by combining genes (component volumes) from two fit parents. In some embodiments, a tournament selection process is used to determine the parent individuals for the subsequent generation. In this process, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size. In another embodiment, a mutation step/process is used. In the mutation step, simple mutations are applied to 5, 10, 15, 20, 25% or more of individuals, and crossovers are applied to the remaining individuals (e.g., the remaining 75% when 25% are mutated). In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring.
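The selection, mutation and crossover steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script of the present disclosure: individuals are plain lists of component values, the function names are hypothetical, and the defaults (tournament size of five, 25% mutation) are picks from the ranges given above.

```python
import random

def tournament_select(population, fitnesses, k, rng):
    """Sample k individuals at random and return a copy of the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return list(population[best])

def simple_mutation(individual, bounds, rng):
    """Set one randomly chosen component to a random value within its bounds."""
    child = list(individual)
    i = rng.randrange(len(child))
    lo, hi = bounds[i]
    child[i] = rng.uniform(lo, hi)
    return child

def two_point_crossover(a, b, rng):
    """Switch the values between two crossover points of a pair of parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def next_generation(population, fitnesses, bounds,
                    k=5, mutation_rate=0.25, seed=0):
    """Build one new generation of the same size as `population`."""
    rng = random.Random(seed)
    size = len(population)
    # Repeat tournament selection until there are as many parents as individuals.
    parents = [tournament_select(population, fitnesses, k, rng)
               for _ in range(size)]
    n_mut = round(mutation_rate * size)
    offspring = [simple_mutation(p, bounds, rng) for p in parents[:n_mut]]
    rest = parents[n_mut:]
    for a, b in zip(rest[0::2], rest[1::2]):
        offspring.extend(two_point_crossover(a, b, rng))
    if len(rest) % 2:  # odd parent left over: carry it forward unchanged
        offspring.append(rest[-1])
    return offspring
```

Because crossover only exchanges values at the same positions, offspring stay within the variable bounds as long as the parents do. Carrying an unpaired parent forward unchanged is a design choice of this sketch.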
[0091] The mixing and/or culturing steps of the methods described herein may be done, in some embodiments, within a microfluidic mixing system, a droplet microarray system, a multi-well plate (including, for example, controlled-release multi-well plates) or other high-throughput-compatible devices as are known in the art. Multi-well plates or microplates can comprise 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 or more wells, and multiple plates may be used in the methods described herein.
[0092] As described herein, culturing (e.g., of the aforementioned multi-well plates and other devices) is performed under a variety of conditions, including stable conditions and conditions that change over time. By way of example, the culturing can be performed under conditions that promote cell growth, where the conditions include or exclude constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing. In this way, as will be appreciated by those in the art, in addition to measuring various aspects of the biomolecules produced from the cells as described herein, the present disclosure also contemplates measuring and/or maintaining cell culturing conditions that allow production of the biomolecules. Conditions such as constant or intermittent shaking (e.g., 25-1000 rpm), constant or intermittent oxygen levels (e.g., 0-40 %), constant or intermittent humidity (e.g., 20-90 %), pH (e.g., 3-8) and/or constant or intermittent temperature (e.g., 10-60 °C) are contemplated.
[0093] FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects, wherein earlier generations are of a lower titer and quality, and later media have a higher titer and quality, wherein the improved titer and quality are the result of evolutionary pressures.
[0094] FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects. The experimental space searched by the genetic algorithm may be thought of as a mountain range (e.g., gradient) wherein the genetic algorithm seeks to find the highest peak, without reference to a map.
[0095] FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects. As discussed herein, the present techniques may be implemented using a programming language such as Python. Optimization can be slow and unpredictable using OFAT (“One Factor at a Time”) or even Design of Experiments (DoE) optimization techniques. Even high throughput wet lab media optimization using GA samples a very small fraction of possible mixtures, in some aspects. The solution space may be too large to cover even with automation and micro fermentation. The present techniques may solve such problems by combining iModulons and machine learning with GAs. Specifically, wet lab data and AI advantageously enable the present techniques to explore more of the best media conditions.
[0096] For example, an experimental design may include a particular strain (e.g., SoluPro™ E. coli) expressing a Mab. A GA script may produce a number (e.g., 60) of unique conditions (e.g., 30 per plate). A number of controls may be included (e.g., 4 controls (2 per plate)).
Process conditions may include dynamic feeding, pH feed trigger at pH setpoint and a temperature shift after 12hrs EFT.
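The example design above (60 unique conditions at 30 per plate, plus 2 controls per plate) can be laid out with a short sketch. The function and the well labels are hypothetical and serve only to show the bookkeeping.

```python
def assign_plates(n_conditions=60, per_plate=30, controls_per_plate=2):
    """Split GA conditions across plates and add control wells to each plate.

    Labels such as "mix_00" and "control_0" are illustrative placeholders.
    """
    plates = []
    for start in range(0, n_conditions, per_plate):
        stop = min(start + per_plate, n_conditions)
        wells = [f"mix_{i:02d}" for i in range(start, stop)]
        wells += [f"control_{c}" for c in range(controls_per_plate)]
        plates.append(wells)
    return plates

plates = assign_plates()
```

With the default values this yields two plates of 32 occupied wells each, i.e., 64 wells in total.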
[0097] The GA script of FIG. 2E depicts a tournament selection process to determine the parent individual for the subsequent generation. In this exemplary process, five individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size. In the mutation step, simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%. In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring. The script of FIG. 2E may output a config file and a liquid handling robot volume output file. The script may be parameterized and run using the following command: python optimize_media_ga.py "Define-GA Genes edit.xlsx" --output_file=ga-media-config-gen0.csv --population_size=64 --random_seed=400
[0098] FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects, for example as described with respect to FIG. 2E.
[0099] FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects. The initial random mixtures show diverse growth trends.
[0100] FIG. 2H depicts a chart showing a population evolving to a higher HiPrBind™ signal, according to some aspects. FIG. 2H depicts the Mab quality fitness score in SoluPro E. coli. Random mixtures (red color) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (blue and green) are shown versus the control condition.
[0101] FIG. 2I depicts respective qualities of monoclonal antibodies (MABs) before and after application of the present genetic algorithm techniques, according to some aspects. Mean comparisons show a significant improvement of mixtures after two rounds of evolution.
Measuring Biomolecules
[0102] As described herein, a biomolecule of interest may be expressed and produced from cells according to various embodiments. Exemplary biomolecules are described further herein. In some embodiments, the methods provided herein include one or more steps that include measuring or detecting or analyzing the biomolecules that are produced.
[0103] In one embodiment, the biomolecule of interest includes a fluorescent tag or fusion. Fluorescence activated cell sorting (FACS) and flow cytometry, as are known in the art, are therefore contemplated herein. As described herein, a liquid-handling robot that is capable of inoculating and culturing cells and equipped to take a variety of measurements is contemplated for use in various embodiments of the present disclosure. As one example, a BioLector® microbioreactor (Beckman) can be used. Other microfermentation systems are contemplated (Ambr 15 and 250 systems by Sartorius (https://sartorius-stedim-tap.com/a/micro-bioreactor-gp.htm); DASbox and DASGIP systems from Eppendorf (https://online-shop.eppendorf.us/US-en/Bioprocess-44559/Bioprocess-Systems-60767/DASbox-Mini-Bioreactor-System-PF-133566.html); and Feed Plate technologies from Kuhner Shaker company (https://feedingtechnology.com/portfolio-item/feedplates/?lang=en)).
[0104] Numerous additional assays are also contemplated. Binding assays, for example assays that measure protein-protein interactions, including antibody-antigen interactions and including measuring binding affinity, are well known in the art. By way of example, Surface plasmon resonance (SPR), Dual polarisation interferometry (DPI), Static light scattering (SLS), Dynamic light scattering (DLS), Flow-induced dispersion analysis (FIDA), Fluorescence polarization/anisotropy, Fluorescence resonance energy transfer (FRET), Bio-layer interferometry (BLI), Isothermal titration calorimetry (ITC), Microscale thermophoresis (MST), and Single colour reflectometry (SCORE) are contemplated. Additionally, Bimolecular fluorescence complementation (BiFC), affinity electrophoresis, label transfer, phage display, Tandem affinity purification (TAP), cross-linking, Quantitative immunoprecipitation combined with knock-down (QUICK) and Proximity ligation assay (PLA) are other well-known assays that provide protein-protein interaction information.
[0105] In some embodiments, the binding affinities of the antibodies described herein are measured by array surface plasmon resonance (SPR), according to standard techniques (Abdiche, et al. (2016) MAbs 8:264-277). Briefly, antibodies were immobilized on a HC 30M chip at four different densities / antibody concentrations. Varying concentrations (0-500 nM) of antibody target are then bound to the captured antibodies. Kinetic analysis is performed using Carterra software to extract association and dissociation rate constants (ka and kd, respectively) for each antibody. Apparent affinity constants (KD) are calculated from the ratio of kd/ka. In some embodiments, the Carterra LSA Platform is used to determine kinetics and affinity. In other embodiments, binding affinity can be measured, e.g., by surface plasmon resonance (e.g., BIAcore™) using, for example, the IBIS MX96 SPR system from IBIS Technologies or the
Carterra LSA SPR platform, or by Bio-Layer Interferometry, for example using the Octet™ system from ForteBio. In some embodiments, a biosensor instrument such as Octet RED384, ProteOn XPR36, IBIS MX96 and Biacore T100 is used (Yang, D., et al., J. Vis. Exp., 2017, 122:55659).
[0106] KD is the equilibrium dissociation constant, the ratio of koff/kon, between the antibody and its antigen. KD and affinity are inversely related. The KD value relates to the concentration of antibody, so the lower the KD value (lower concentration), the higher the affinity of the antibody. Antibody KD, including reference antibody and variant antibody KD, according to various embodiments of the present disclosure can be, for example, in the micromolar range (10⁻⁴ to 10⁻⁶), the nanomolar range (10⁻⁷ to 10⁻⁹), the picomolar range (10⁻¹⁰ to 10⁻¹²) or the femtomolar range (10⁻¹³ to 10⁻¹⁵). In some embodiments, antibody affinity of a variant antibody is improved, relative to a reference antibody, by approximately 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50% or more. The improvement may also be expressed relative to a fold change (e.g., 2x, 4x, 6x, or 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-fold or more improvement in binding activity, etc.) and/or an order of magnitude (e.g., 10⁷, 10⁸, 10⁹, etc.).
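The relationship between the rate constants and KD can be sketched as follows. The function names and the example rate constants are hypothetical. The range labels follow the orders of magnitude listed above, with boundary handling chosen for illustration.

```python
def dissociation_constant(k_off, k_on):
    """Return K_D in molar units from k_off (1/s) and k_on (1/(M*s))."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on

# Lower bound (in molar) of each range, checked from weakest to strongest.
_RANGES = [
    (1e-6, "micromolar"),
    (1e-9, "nanomolar"),
    (1e-12, "picomolar"),
    (1e-15, "femtomolar"),
]

def affinity_range(kd_molar):
    """Label a K_D value by range; a lower K_D means a higher affinity."""
    for lower_bound, name in _RANGES:
        if kd_molar >= lower_bound:
            return name
    return "sub-femtomolar"

# Hypothetical rate constants for illustration only.
kd = dissociation_constant(k_off=2e-4, k_on=1e5)
label = affinity_range(kd)
```

For these illustrative inputs, KD is 2 x 10⁻⁹ M, which falls in the nanomolar range.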
[0107] In still other embodiments, the amount of “biologically active” biomolecules produced are also measured or detected or analyzed. Biologically active includes, but is not limited to, a properly folded biomolecule such as a therapeutic protein or enzyme or antibody or fragment of any of the above, an enzymatically active biomolecule, an antibody or fragment thereof that is capable of binding to an antigen, and a protein or polypeptide that is capable of binding to a ligand.
[0108] In one embodiment, an activity-specific cell-enrichment (ACE) assay is used with the methods described herein. The activity-specific cell-enrichment (ACE) assay identifies host cells that express active gene product of interest (e.g., biomolecules, as used herein) rather than inactive material, as described in WO 2021/146626, incorporated herein in relevant part. Active gene product can be distinguished from inactive material by the ability of active gene product to specifically bind a binding partner molecule, or by the ability of gene product to participate in a chemical or enzymatic reaction, as examples. The presence of properly formed disulfide bonds in a polypeptide gene product is an indication that it is correctly folded and presumptively active. In the cell-enrichment methods, active gene product of interest is detected by utilizing an appropriate labeling complex that specifically binds to active gene product of interest, such as a labeled antigen if the gene product of interest is an antibody or Fab; or a labeled ligand if the gene product of interest is a receptor or a receptor fragment,
where the ligand specifically binds to an active conformation of the receptor; or a labeled substrate or a labeled substrate analog if the gene product of interest is an enzyme, as examples. For any gene product of interest, if there is an available antibody or antibody fragment that specifically binds to the active gene product and not to inactive gene product, that antibody or antibody fragment can be used to label the active gene product of interest when attached to a detectable moiety.
[0109] Exemplary ACE assay protocol (based on assays in WO2021/146626)
1. Fix sample cells with a formaldehyde-based solution.
2. Prepare a permeabilization buffer and treat cells.
3. Add biotin to the 1x PE + 1 mM EDTA to a final concentration of 0.1 mg/ml biotin.
4. Add biotin (stored at -80°C or 4°C) at a 100x dilution (e.g., 5000 µL (5 mL) of PE buffer would require 50 µL biotin).
5. Combine the fluorescently labeled primary probe (and secondary probe, if dual probe) in 1x PE + 1 mM EDTA in a 15 mL centrifuge tube (or 50 mL, if the staining reagent volume exceeds 15 mL).
6. Incubate and rotate for at least 1 hour at 4°C with foil wrapped around the tube.
7. After 1 hr, add biotin to the stain solution to a final concentration of 0.1 mg/ml biotin.
8. Add biotin (stored at -80°C or 4°C) at a 100x dilution (e.g., 5000 µL (5 mL) of PE buffer would require 50 µL biotin). Incubate again with rotation for at least 30 min with foil wrapped around the tube.
9. Spin samples at 3300g at 4°C for 5 minutes.
10. Aspirate the supernatant with the vacuum pump and the matrix tube attachment. Avoid touching the attachment to the sides of the tube.
11. Slowly cascade 500 µL of 1X PBS + 1 mM EDTA onto the side of the sample tubes without disturbing the pellet.
12. Add 250 µL of E2 Fixation Buffer to each tube.
13. After the 18 hr incubation, remove samples from the rotator and spin in a centrifuge at 3300g at 4°C for 3 minutes.
14. Carry out FACS on the stained cell samples, binning by fluorescence signal.
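The biotin dilution in the protocol above is proportional arithmetic and can be checked with a short sketch. The function names are hypothetical. The implied stock concentration is an inference that holds only if the 100x dilution is what yields the stated 0.1 mg/ml final concentration; the protocol does not state a stock concentration.

```python
def stock_volume_ul(buffer_volume_ul, dilution_factor):
    """Volume of stock to add for a given dilution factor (e.g., 100 for 100x)."""
    return buffer_volume_ul / dilution_factor

def implied_stock_concentration(final_conc_mg_per_ml, dilution_factor):
    """Stock concentration implied by a final concentration and a dilution factor."""
    return final_conc_mg_per_ml * dilution_factor

biotin_ul = stock_volume_ul(buffer_volume_ul=5000, dilution_factor=100)
stock_mg_per_ml = implied_stock_concentration(0.1, 100)  # assumption-based
```

For 5000 µL of buffer at a 100x dilution this gives 50 µL of biotin, matching the example in the protocol.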
[0110] In still another embodiment, a HiPrBind assay is used with the present methods. The HiPrBind assay provides an efficient method for multiple interrogations of an active gene product, such as by providing at least two distinct interrogations of a characteristic property of
the active gene product or simultaneously interrogating at least two characteristic properties of that active gene product. HiPrBind assays are described in WO 2021/163349, incorporated herein in relevant part. The assay is an advance on the principle underlying the yeast two hybrid assay in that a multi-component detection mechanism is brought into proximity, and thereby brought into an environment where the detection mechanism can be active in producing a signal capable of detection. One component of the multi-component (e.g., two-component) detection system is stably associated with a first analyte-associating moiety (i.e., active gene product-associating moiety) and a second component of the detection system is stably associated with a distinct second analyte-associating moiety. A detectable signal is generated when the two components of the detection system are brought into proximity by the analyte-associating moieties binding to the analyte. Because each of the analyte-associating moieties is specific for an active gene product as analyte, a signal is only generated when a characteristic property of an active gene product is detected using two distinct mechanisms, or when two distinct characteristic properties of an active gene product are simultaneously detected. The HiPrBind assay is versatile in detecting a variety of characteristic properties, but a simple example involves a gene product that is active in homodimeric form wherein each monomer requires disulfide bonds to properly fold. One active gene product-associating moiety can be a binding agent that specifically binds to the properly folded and therefore active monomer, and a second active gene product-associating moiety can be a distinct second binding agent that specifically binds to the dimeric form of the gene product. Thus, the HiPrBind assay in this version simultaneously detects a gene product that is properly folded and in dimeric form.
[0111] Exemplary HiPR Bind assay (based on assays in WO2021/163349)
1. Culture sample cells under proper induction conditions to facilitate target protein and accessory protein expression using arabinose and/or propionate media. Grow sample cells in a 96-well plate.
2. Keep sample plates on ice to thaw. While those are thawing, gather and label the plates needed and prep assay solutions.
3. Dilute standard into Dilution Buffer 1 solution (0.1x Perkin Elmer Buffer, 1 mM EDTA, 1x PBS).
4. Prepare Assay Solution I (ASI) and Assay Solution II (ASII) in dark amber-colored 50 mL conicals.
5. Predispense Dilution Buffer 2 into 384-well V-bottom Greiner Bio-One dilution plates; resuspend cell pellets.
6. Predispense ASI into 384-well Proxiplates. Visually inspect that the silicone nozzles are fitted properly and that nothing looks loose or off kilter.
7. Once each plate is finished, seal it with a Perkin Elmer plate sealer.
8. Spin down plates at 500g for 1 min.
9. Incubate the plates overnight at 4°C.
10. The next day, take the plates out of 4°C storage. Allow to equilibrate to room temp for at least 1 hour.
11. Feed plates into the plate feeder on the Enspire.
12. Scan on the Enspire using the "Alpha, Fl - DNA_Ex480 Em 520-Alpha 384-SW".
13. Record values for further analysis of alpha max slope.
Measuring Gene Transcription and Identifying Gene Sets
[0112] As described herein, in addition to measuring biomolecules, the genes (i.e., other than the gene encoding the biomolecule) that are induced inside cells during the culturing steps described herein are, in some embodiments, also or alternatively measured/determined.
[0113] RNAseq or RNA-Seq is a sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome (Wang Z., et al., Nature Reviews. Genetics., 10(1): 57-63 (2009)). The present disclosure provides, in some embodiments, using RNA sequencing and other well-known high throughput sequencing techniques and assays to identify the gene expression or transcript number or sequence of individual genes, gene sets, or an entire genome/transcriptome. In yet another embodiment, metabolomic data can be used to identify the quantity of key metabolites and metabolic states. Metabolomics is a discipline widely used in systems biology and has been applied to explain observed phenotypes (Lopez-Malo M, et al., PLoS ONE. 2013;8:e60135).
[0114] Independent component analysis (ICA) is a signal deconvolution algorithm and can be used according to some embodiments of the disclosure. Sastry et al. have recently described that the E. coli transcriptome mostly consists of independently regulated modules (Sastry, A.V., et al., Nature Comm., 2019, 10:5536). Additionally, Tan et al. recently reported that ICA of the E. coli transcriptome revealed the cellular processes that respond to heterologous gene expression (Tan J., et al., Metabol. Eng., 2020, 61, 360-368). Additionally, improvements or modifications to the independent component analysis can be used, such as OptICA, for finding the optimal dimensionality that controls for both over- and under-decomposition (McConn, J.L., et al., BMC Bioinformatics 22, 584 (2021), https://doi.org/10.1186/s12859-021-04497-7). Other dimensionality reduction techniques may be employed, such as PCA, ZIFA, GrandPrix, t-SNE, UMAP, DCA, scvis, VAE and SIMLR (Front. Genet., 23 March 2021, Sec. Computational Genomics, https://doi.org/10.3389/fgene.2021.646936). While the Example provided below is based on RNAseq (transcriptomics), dimensionality reduction can, in another embodiment, be applied to metabolomics data that may be used to optimize media towards a particular desired metabolic state. (Proteome Res. 2012, 11, 8, 4120-4131, June 20, 2012, https://doi.org/10.1021/pr300231n) (Lin J, et al., RSC Adv. 2019 Aug 30;9(47):27369-27377. doi: 10.1039/c9ra05128g. PMID: 35529190; PMCID: PMC9070647.)
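By way of a non-limiting illustration, the signal deconvolution idea behind ICA can be sketched in a few lines of Python. The sketch below is not OptICA and is not the pipeline of the cited references; it is a minimal two-signal, kurtosis-based ICA written with the standard library only. The "switch" and "graded" regulatory signals, the two "gene" expression profiles, the mixing weights, and all function names are hypothetical and exist only for this example.

```python
import math
import random

def standardize(v):
    """Center a signal and scale it to unit variance."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / sd for a in v]

def rotate(x, y, angle):
    """Rotate the paired signals (x, y) by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(x, y)],
            [-s * a + c * b for a, b in zip(x, y)])

def whiten(x, y):
    """Rotate two centered signals onto their principal axes, then rescale each to unit variance."""
    n = len(x)
    sxx = sum(a * a for a in x) / n
    syy = sum(b * b for b in y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle that zeroes the covariance
    u, v = rotate(x, y, theta)
    return standardize(u), standardize(v)

def excess_kurtosis(z):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return sum(a ** 4 for a in z) / len(z) - 3.0

def ica_2d(x, y, steps=180):
    """Two-signal ICA: whiten, then pick the rotation whose outputs are least Gaussian."""
    u, v = whiten(standardize(x), standardize(y))
    angles = (math.pi / 2 * k / steps for k in range(steps))
    best = max(angles, key=lambda a: sum(abs(excess_kurtosis(z)) for z in rotate(u, v, a)))
    return rotate(u, v, best)

def corr(p, q):
    """Pearson correlation of two equal-length signals."""
    p, q = standardize(p), standardize(q)
    return sum(a * b for a, b in zip(p, q)) / len(p)

# Synthetic data (hypothetical): two hidden, independent regulatory signals ...
rng = random.Random(0)
n_samples = 1000
switch = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]  # on/off regulator
graded = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]   # graded regulator
# ... observed only as mixtures in the expression of two genes.
gene_a = [1.0 * s + 0.6 * g for s, g in zip(switch, graded)]
gene_b = [0.4 * s + 1.0 * g for s, g in zip(switch, graded)]

comp1, comp2 = ica_2d(gene_a, gene_b)
```

Each recovered component should track one of the hidden signals up to sign and order, which can be checked with `corr(comp1, switch)` and `corr(comp1, graded)`. A real transcriptome has thousands of genes and many components, for which a library implementation and a dimensionality-selection step such as the one cited above would be used instead of this grid search.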
Host Cells
[0115] “Host cells” herein are, in some embodiments, cells used in bioprocessing to manufacture heterologous protein products. Such host cells can be, for example, eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.
[0116] Prokaryotic host cells are provided that comprise expression constructs designed for the expression of coding regions. Prokaryotic host cells can include archaea (such as Haloferax volcanii, Sulfolobus solfataricus), Gram-positive bacteria (such as Bacillus subtilis, Bacillus licheniformis, Brevibacillus choshinensis, Lactobacillus brevis, Lactobacillus buchneri, Lactococcus lactis, and Streptomyces lividans), or Gram-negative bacteria, i.e., proteobacteria, including Alphaproteobacteria (Agrobacterium tumefaciens, Caulobacter crescentus, Rhodobacter sphaeroides, and Sinorhizobium meliloti), Betaproteobacteria (Alcaligenes eutrophus), and Gammaproteobacteria (Acinetobacter calcoaceticus, Azotobacter vinelandii, Escherichia coli, Pseudomonas aeruginosa, and Pseudomonas putida). Host cells include Gammaproteobacteria of the family Enterobacteriaceae, such as Enterobacter, Erwinia, Escherichia (including E. coli), Klebsiella, Proteus, Salmonella (including Salmonella typhimurium), Serratia (including Serratia marcescens), and Shigella.
[0117] As described in WO/2017/106583, incorporated by reference herein in its entirety, producing gene products such as therapeutic proteins at commercial scale and in soluble form is addressed by providing suitable host cells capable of growth at high cell density in fermentation culture, and which can produce soluble gene products in the oxidizing host cell cytoplasm through highly controlled inducible gene expression. Host cells of the present disclosure with these qualities are produced by combining some or all of the following characteristics. (1) The host cells are genetically modified to have an oxidizing cytoplasm by increasing the expression or function of oxidizing polypeptides in the cytoplasm, and/or by decreasing the expression or function of reducing polypeptides in the cytoplasm. Specific examples of such genetic alterations are provided herein and in WO 2017/106583. Optionally, host cells can also be genetically modified to express accessory proteins (which can be chaperones) and/or cofactors that assist in the production of the desired gene product(s), and/or to glycosylate polypeptide gene products. (2) The host cells comprise one or more expression constructs designed for the expression of one or more active gene products of interest. At least one expression construct can comprise an inducible promoter and a polynucleotide encoding a gene product to be expressed in active form from the inducible promoter. (3) The host cells contain additional genetic modifications designed to improve certain aspects of gene product expression from the expression construct(s). The host cells can (A) have an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter, and as another example, wherein the gene encoding the transporter protein is araE, araF, araG, araH, rhaT, xylF, xylG, or xylH, or particularly the transporter protein is araE, or wherein the alteration of gene function more particularly is expression of unaltered araE from a constitutive promoter; and/or (B) have a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter, and as further examples, wherein the gene encoding a protein that metabolizes an inducer of at least one said inducible promoter is selected from the group consisting of araA, araB, araD, prpB, prpD, rhaA, rhaB, rhaD, xylA, and xylB; and/or (C) have a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter, which gene can be scpA/sbm, argK/ygfD, scpB/ygfG, scpC/ygfH, rmlA, rmlB, rmlC, or rmlD.
[0118] Host Cells with Oxidizing Cytoplasm. The expression systems of the present disclosure are designed to express active gene products. Examples of host cells are provided that allow for the efficient and cost-effective expression of active gene products, including components of multimeric products. The host cells can be microbial cells such as Gram-negative bacteria, e.g., E. coli. Exemplary E. coli host cells having oxidizing cytoplasm include the E. coli B strains SHuffle® Express (NEB Catalog No. C3028H) and SHuffle® T7 Express (NEB Catalog No. C3029H) and the E. coli K strain SHuffle® T7 (NEB Catalog No. C3026H). The E. coli B strains with oxidizing cytoplasm are able to grow to much higher cell densities than the most closely corresponding E. coli K strain (WO/2017/106583).
[0119] Alterations to host cell gene functions. Certain alterations can be made to the gene functions of host cells comprising inducible expression constructs, to promote efficient and homogeneous induction of the host cell population by an inducer. Preferably, the combination of expression constructs, host cell genotype, and induction conditions results in at least 75% (more preferably at least 85%, and most preferably, at least 95%) of the cells in the culture expressing active gene product from each induced promoter, as measured by the method of Khlebnikov et al. described in Example 9 of WO/2017/106583. For host cells other than E. coli, these alterations can involve the function of genes that are structurally similar to an E. coli gene, or genes that carry out a function within the host cell similar to that of the E. coli gene. Alterations to host cell gene functions include eliminating or reducing gene function by deleting the coding region of the gene in its entirety, or by deleting a large enough portion of the gene, inserting sequence into the gene, or otherwise altering the gene sequence so that a reduced level of functional gene product is made from that gene, as is described herein with greater particularity for the ptsP gene or coding region. Alterations to host cell gene functions also include increasing gene function by, for example, altering the native promoter to create a stronger promoter that directs a higher level of transcription of the gene, or introducing a missense mutation into the protein-coding sequence that results in a more highly active gene product. Alterations to host cell gene functions include altering gene function in any way, including for example, altering a native inducible promoter to create a promoter that is constitutively activated. 
In addition to alterations in gene functions for the transport and metabolism of inducers, as described herein with relation to inducible promoters, and/or an altered expression of chaperone proteins, alterations of the reduction-oxidation environment of the host cell are also contemplated.
[0120] Host cell reduction-oxidation environment. In bacterial cells such as E. coli, proteins that need disulfide bonds to be active are typically exported into the periplasm where disulfide bond formation and isomerization is catalyzed by the Dsb system, comprising DsbABCD and DsbG. Increased expression of the cysteine oxidase DsbA, the disulfide isomerase DsbC, or combinations of the Dsb proteins, which are all normally transported into the periplasm, has been utilized in the expression of heterologous proteins that require disulfide bonds (Makino et al., Microb Cell Fact 10:32 (2011)). It is also possible to express cytoplasmic forms of these Dsb proteins, such as a cytoplasmic version of DsbA (cDsbA) and/or of DsbC (cDsbC), that lacks a signal peptide and therefore is not transported into the periplasm. Cytoplasmic Dsb proteins such as cDsbA and/or cDsbC are useful for making the cytoplasm of the host cell more oxidizing and thus more conducive to the formation of disulfide bonds in proteins, including
heterologous proteins, produced in the cytoplasm. The host cell cytoplasm can also be made less reducing and thus more oxidizing by altering the thioredoxin and the glutaredoxin/glutathione enzyme systems directly: mutant strains defective in glutathione reductase (gor) or glutathione synthetase (gshB), together with a defective thioredoxin reductase (trxB), render the cytoplasm oxidizing. These strains are unable to reduce ribonucleotides and therefore cannot grow in the absence of exogenous reductant, such as dithiothreitol (DTT). Suppressor mutations (such as ahpC* and ahpCΔ, Lobstein et al., Microb Cell Fact 11:56 (2012)) in the gene ahpC, which encodes the peroxiredoxin AhpC, convert it to a disulfide reductase that generates reduced glutathione, allowing the channeling of electrons onto the enzyme ribonucleotide reductase and enabling the cells defective in gor and trxB, or defective in gshB and trxB, to grow in the absence of DTT. A different class of mutated forms of AhpC can allow strains, defective in the activity of gamma-glutamylcysteine synthetase (gshA) and defective in trxB, to grow in the absence of DTT; these include AhpC V164G, AhpC S71F, AhpC E173/S71F, AhpC E171Ter, and AhpC dup162-169 (Faulkner et al., Proc Natl Acad Sci USA 105(18):6735-6740 (2008), Epub 2008 May 2). In such strains with oxidizing cytoplasm, exposed protein cysteines become readily oxidized in a process that is catalyzed by thioredoxins, in a reversal of their physiological function, resulting in the formation of disulfide bonds. Other proteins that may be helpful to reduce the oxidative stress effects in host cells of an oxidizing cytoplasm are HPI (hydroperoxidase I) catalase-peroxidase encoded by E. coli katG and HPII (hydroperoxidase II) catalase-peroxidase encoded by E. coli katE, which disproportionate peroxide into water and O2 (Farr and Kogoma, Microbiol Rev. 55(4):561-585 (1991)). 
Increasing levels of KatG and/or KatE protein in host cells can also be induced by coexpression or through elevated levels of constitutive expression.
[0121] The disclosure also contemplates the expression of the sulfhydryl oxidase Erv1p, derived from the intermembrane space of yeast mitochondria, in the host cell cytoplasm, which has been shown to increase the production of a variety of complex, disulfide-bonded proteins of eukaryotic origin in the cytoplasm of E. coli, even in the absence of mutations in gor or trxB (Nguyen et al., Microb Cell Fact 10:1 (2011)).
[0122] Host cells comprising expression constructs preferably also express cDsbA and/or cDsbC and/or Erv1p, are deficient in trxB gene function, and are also deficient in the gene function of either gor, gshB, or gshA. Optionally, the host cells have increased levels of katG and/or katE gene function, and express an appropriate mutant form of AhpC so that the host cells can be grown in the absence of dithiothreitol (i.e., DTT).
[0123] Cellular transport of cofactors. When using the expression systems of the disclosure to produce enzymes that require cofactors for function, it is helpful to use a host cell either capable of synthesizing the cofactor from available precursors, or capable of taking it up from the environment. Common cofactors include ATP, coenzyme A, flavin adenine dinucleotide (FAD), NAD+/NADH, and heme. Polynucleotides encoding cofactor transport polypeptides and/or cofactor synthesizing polypeptides can be introduced into host cells, and such polypeptides can be constitutively expressed, or inducibly co-expressed with the active gene products to be produced by methods of the disclosure.
[0124] Proteases. Host cells can have alterations in their ability to degrade expressed protein products because of the lack of or lowering of the activity of one or more proteases. Exemplary proteases include, but are not limited to, Clp, ClpP, OmpT, Lon, FtsH, ClpX, ClpY, ClpA, ClpQ, ClpAP, ClpXP, ClpAXP, ClpYQ, ClpY, and the proteases encoded by yaeL, sppA, tldD, sprT, yhbU, ptrA, frvX, hyaD, hybD, hycH, envC, ddpX, degP, degQ, degS, hslV, hslU, pepB, pepP, sohB, yggG, pepE, pepN, pepQ, abgA, pepT, iadA, pepA, pepD, ptrB, ycaL, ycbZ, yegQ, ygeY, ypdF, hycI, sgcX, and htpX.
[0125] Glycosylation of polypeptide gene products. Host cells can have alterations in their ability to glycosylate polypeptides. For example, eukaryotic host cells can have eliminated or reduced gene function of the glycosyltransferase and/or oligo-saccharyltransferase genes, impairing the normal eukaryotic glycosylation of polypeptides to form glycoproteins. Prokaryotic host cells such as E. coli, which do not normally glycosylate polypeptides, can be altered to express a set of eukaryotic and prokaryotic genes that provide a glycosylation function (DeLisa et al., WO 2009/089154A2).
[0126] Available host cell strains with altered gene functions. To create preferred strains of host cells to be used in the expression systems and methods of the disclosure, it is useful to start with a strain that already comprises desired genetic alterations (See Table A of WO2017/106583, reproduced below).
Expression Constructs
[0127] Expression constructs are polynucleotides designed for the expression of one or more recombinant gene products of interest, and thus are not naturally occurring molecules. Any expression construct known in the art is contemplated for use in the cells and methods of the disclosure, including expression constructs that can be integrated into a host cell chromosome or maintained within the host cell as extra-chromosomal, independently replicating polynucleotide molecules, i.e., episomes having origins of replication independent of the host cell chromosome, such as plasmids or artificial chromosomes. Expression constructs according to the disclosure also can have one or more selectable markers to enable selection of those cells harboring the expression construct. Exemplary selectable markers confer resistance to antibiotics lethal to the host cell lacking that selectable marker or encode enzymes required to produce essential nutrients. Any selectable marker known in the art is contemplated for use in the expression constructs of the disclosure. Expression constructs may also contain an inducible promoter to provide the ability to induce the expression of a coding region operably linked to that inducible promoter. Exemplary inducible promoters contemplated by the disclosure include the arabinose promoter (ParaBAD), ParaC, ParaE, the propionate promoter (PprpBCDE), the rhamnose promoter (PrhaSR), the xylose promoter (PxylA), the lactose promoter, and the alkaline phosphatase promoter. Additional information on contemplated inducible promoters, including the sequences thereof, is provided in WO 2016/205570, incorporated herein by reference in relevant part. In addition to inducible promoters, the disclosure comprehends expression constructs comprising constitutive promoters. To ensure that RNA transcribed from the expression construct is efficiently translated, the construct may also include a ribosome binding site (RBS). 
In prokaryotes in general (archaea and bacteria), the RBS consensus sequence is GGAGG or GGAGGU, and in bacteria such as E. coli, the RBS consensus sequence is further defined as AGGAGG or AGGAGGU. To facilitate incorporation of a coding region or gene of interest, the expression construct may include a multiple cloning site in which a variety of restriction endonuclease cleavage sites are clustered to provide flexibility in incorporating exogenous polynucleotides, as is known in the art. Some expression constructs of the disclosure further include a coding region for a signal peptide or leader peptide, wherein the coding region is oriented to result in expression of a fusion protein comprising the signal peptide and the active gene product of interest.
[0128] As mentioned above, inducible promoters are contemplated for use with the expression constructs to be introduced into the host cells according to the disclosure in order to achieve elevated expression of desired active gene products. Exemplary promoters are described herein and are also described in WO/2016/205570, incorporated herein by reference in relevant part. As described herein, the cells comprising one or more expression constructs may optionally include one or more inducible promoters to express a gene product of interest.
[0129] Chaperones are accessory proteins that assist the non-covalent folding or unfolding, and/or the assembly or disassembly, of other gene products, but do not occur in the resulting monomeric or multimeric gene product structures when the structures are performing their normal biological functions (having completed the processes of folding and/or assembly). Chaperones can be expressed from an inducible promoter or a constitutive promoter within an expression construct, or can be expressed from the host cell chromosome. Exemplary chaperones present in E. coli host cells are the folding factors DnaK/DnaJ/GrpE, DsbC/DsbG, GroEL/GroES, IbpA/IbpB, Skp, Tig (trigger factor), and FkpA, which have been used to prevent protein aggregation of cytoplasmic or periplasmic proteins. DnaK/DnaJ/GrpE, GroEL/GroES, and ClpB can function synergistically in assisting protein folding, and expression of these chaperones in various combinations has been shown to facilitate expression of properly folded gene product. When expressing eukaryotic proteins in prokaryotic host cells, a eukaryotic chaperone protein, such as protein disulfide isomerase (PDI) from the same or a related eukaryotic species, can be co-expressed, e.g., inducibly co-expressed, with the gene product of interest.
[0130] A chaperone that can be expressed in host cells is a protein disulfide isomerase from Humicola insolens, a soil hyphomycete (soft-rot fungus). An amino acid sequence of Humicola insolens PDI is shown as SEQ ID NO: 1 of WO2017/106583; it lacks the signal peptide of the native protein so that it remains in the host cell cytoplasm. The nucleotide sequence encoding PDI was optimized for expression in E. coli; the expression construct for PDI is shown as SEQ ID NO: 2 of WO2017/106583. SEQ ID NO: 2 of WO2017/106583 contains a GCTAGC NheI restriction site at its 5' end, an AGGAGG ribosome binding site at nucleotides 7 through 12, the PDI coding sequence at nucleotides 21 through 1478, and a GTCGAC SalI restriction site at its 3' end. The nucleotide sequence of SEQ ID NO: 2 of WO2017/106583 was designed to be inserted immediately downstream of a promoter, such as an inducible promoter. The NheI and SalI restriction sites in SEQ ID NO: 2 of WO2017/106583 can be used to insert it into a vector multiple cloning site, such as that of the pSOL expression vector (SEQ ID NO: 3 of WO2017/106583), described in published US patent application US2015353940A1, which is incorporated by reference in its entirety herein. Other PDI polypeptides can also be expressed in host cells, including PDI polypeptides from a variety of species (Saccharomyces cerevisiae (UniProtKB P17967), Homo sapiens (UniProtKB P07237), Mus musculus (UniProtKB P09103), Caenorhabditis elegans (UniProtKB Q17770 and Q17967), Arabidopsis thaliana (UniProtKB O48773, Q9XI01, Q9S G3, Q9LJU2, Q9MAU6, Q94F09, and Q9T042), Aspergillus niger (UniProtKB Q12730)), and also modified forms of such PDI polypeptides. A PDI polypeptide expressed in host cells of the disclosure can share at least 70%, or 80%, or 90%, or 95% amino acid sequence identity across at least 50% (or at least 60%, or at least 70%, or at least 80%, or at least 90%) of the length of SEQ ID NO: 1 of WO2017/106583, where amino acid sequence identity is determined according to Example 10 of WO2017/106583.
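As a non-limiting illustration of the construct layout described above (restriction site at the 5' end, ribosome binding site at nucleotides 7 through 12, coding sequence beginning at nucleotide 21, restriction site at the 3' end), the following Python sketch checks a DNA string against that layout. The toy sequence is invented for this example and is not SEQ ID NO: 2 of WO2017/106583; the function name is hypothetical, and the checks that the coding sequence starts with ATG and has a length divisible by three are added assumptions rather than statements from the text.

```python
NHEI_SITE = "GCTAGC"  # NheI recognition sequence, expected at the 5' end
SALI_SITE = "GTCGAC"  # SalI recognition sequence, expected at the 3' end
RBS = "AGGAGG"        # ribosome binding site, expected at nucleotides 7-12

def check_layout(seq, cds_start=21):
    """Check a DNA string against the 1-based layout described in the text."""
    seq = seq.upper()
    # Coding sequence: from cds_start (1-based) up to the 3' restriction site.
    cds = seq[cds_start - 1:len(seq) - len(SALI_SITE)]
    return {
        "nhei_at_5_prime": seq.startswith(NHEI_SITE),
        "rbs_at_7_to_12": seq[6:12] == RBS,
        "sali_at_3_prime": seq.endswith(SALI_SITE),
        "cds_starts_with_atg": cds.startswith("ATG"),     # assumption, not from the text
        "cds_length_multiple_of_3": len(cds) % 3 == 0,    # assumption, not from the text
    }

# Toy construct (hypothetical): site + RBS + 8-nt spacer + 3-codon ORF + site.
toy = NHEI_SITE + RBS + "TAAAAAAT" + "ATGGCTTAA" + SALI_SITE
report = check_layout(toy)
```

Here `report` maps each check to a boolean, so `all(report.values())` indicates whether the toy construct matches the described layout.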
[0131] Assays to measure accessory protein activity include PhyTip-based column target heterologous protein expression level quantification (phynexus.com/products/proteins/antibody-binding-phytip-columns/), flow cytometry-based ACE ASSAY™ measuring bound probe to properly folded target protein material (WO2021/146626), and/or an ELISA-based method HiPrBind assay (WO2021/163349), which measures fluorescence signal in a plate-based format of probes binding to properly folded target protein. These methods measure the increase in target protein production in the presence of the accessory protein compared to the production level in its absence. The increase can be at least 1.5-fold, at least two-fold, at least three-fold, at least four-fold, at least five-fold, at least six-fold, at least seven-fold, at least eight-fold, at least nine-fold, at least ten-fold, at least twenty-fold, at least fifty-fold, at least one hundred-fold, or greater.
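The fold-increase comparison described above reduces to a ratio of two production measurements. The following Python sketch, offered only as an illustration, computes that ratio and reports the highest of the recited thresholds that it meets. The function names and the example readings are hypothetical; any consistent unit of production (for example, an assay signal or a titer) is assumed for both measurements.

```python
# Fold-increase thresholds recited in the text.
FOLD_THRESHOLDS = [1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100]

def fold_increase(with_accessory, without_accessory):
    """Ratio of production with the accessory protein to production without it."""
    if without_accessory <= 0:
        raise ValueError("baseline production must be positive")
    return with_accessory / without_accessory

def highest_threshold_met(fold):
    """Largest recited threshold that the fold increase reaches, or None if below 1.5."""
    met = [t for t in FOLD_THRESHOLDS if fold >= t]
    return max(met) if met else None

# Hypothetical readings: 42.0 units with the accessory protein, 12.0 without.
fold = fold_increase(42.0, 12.0)
level = highest_threshold_met(fold)
```

With these illustrative readings the ratio is 3.5, so the increase is "at least three-fold" but not "at least four-fold".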
Biomolecules
[0132] As described herein, the present disclosure provides methods for expressing and/or producing and/or purifying biomolecules of interest, wherein the biomolecule of interest can be a protein, an RNA, or an RNA-DNA hybrid. In some embodiments, the biomolecule is an RNA selected from the group consisting of ncRNA, tRNA, rRNA, snRNA, snoRNA, miRNA, mRNA, and TERC. In still other embodiments, the biomolecule is a protein selected from the group consisting of a therapeutic protein, an antibody, an enzyme, a ligand, an antigen, a growth factor, a receptor, a nucleic acid-binding protein, as well as fragments, analogs and fusions of any of the aforementioned biomolecules and as described herein. In still other embodiments, the biomolecule of interest is a biopolymer, a chemical, a drug, a flavor modifier, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, or a sugar alcohol.
[0133] Antibodies
[0134] The term “antibody” as used herein refers to whole antibodies that interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) an epitope on a target antigen. A naturally occurring "antibody" is a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds. Each heavy chain is comprised of a heavy chain variable region (abbreviated herein as VH) and a heavy chain constant region. The heavy chain constant region is comprised of three domains, CH1, CH2 and CH3. Each light chain is comprised of a light chain variable region (abbreviated herein as VL) and a light chain constant region. The light chain constant region is comprised of one domain, CL. The VH and VL regions can be further subdivided into regions of hypervariability, termed complementarity determining regions (CDR), interspersed with regions that are more conserved, termed framework regions (FR). Each VH and VL is composed of three CDRs and four FRs arranged from amino-terminus to carboxy-terminus in the following order: FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4. The variable regions of the heavy and light chains contain a binding domain that interacts with an antigen. The constant regions of the antibodies may mediate the binding of the immunoglobulin to host tissues or factors, including various cells of the immune system (e.g., effector cells) and the first component (C1q) of the classical complement system. The term “antibody” includes, for example, monoclonal antibodies, human antibodies, humanized antibodies, camelised antibodies, chimeric antibodies, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, and anti-idiotypic (anti-Id) antibodies (including, e.g., anti-Id antibodies to antibodies of the invention), and epitope-binding fragments of any of the above. 
The antibodies can be of any isotype (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass. The antibody or epitope-binding fragments may be, or be a component of, a multi-specific molecule.
[0135] Both the light and heavy chains are divided into regions of structural and functional homology. The terms “constant” and “variable” are used functionally. In this regard, it will be appreciated that the variable domains of both the light (VL) and heavy (VH) chain portions determine antigen recognition and specificity. Conversely, the constant domains of the light chain (CL) and the heavy chain (CH1, CH2 or CH3) confer important biological properties such as secretion, transplacental mobility, Fc receptor binding, complement binding, and the like. By convention the numbering of the constant region domains increases as they become more distal from the antigen binding site or amino-terminus of the antibody. The N-terminus is a variable region and at the C-terminus is a constant region; the CH3 and CL domains actually comprise the carboxy-terminus of the heavy and light chain, respectively.
[0136] The phrase “antibody fragment”, as used herein, refers to one or more portions of an antibody that retain the ability to specifically interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) a target epitope. Examples of binding fragments include, but are not limited to, a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; a F(ab')2 fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; a Fd fragment consisting of the VH and CH1 domains; a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and an isolated complementarity determining region (CDR). Furthermore, although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules (known as single chain Fv (scFv); see e.g., Bird et al., (1988) Science 242:423-426; and Huston et al., (1988) Proc. Natl. Acad. Sci. 85:5879-5883). Such single chain antibodies are also intended to be encompassed within the term “antibody fragment”. These antibody fragments are obtained using conventional techniques known to those of skill in the art, and the fragments are screened for utility in the same manner as are intact antibodies.
[0137] As described herein, antibodies may include biologically active derivatives or variants or fragments. As used herein "biologically active derivative" or "biologically active variant" includes any derivative or variant of an antibody having substantially the same functional and/or biological properties of said antibody (e.g., a WT antibody), such as binding properties, and/or the same structural basis, such as a peptidic backbone or a basic polymeric unit, including framework regions. As described herein, “biologically active biomolecules” additionally includes, in some embodiments, proteins that are enzymatically active and/or properly folded, and/or that exhibit a desired affinity to an antigen or ligand and/or exhibit stability under specific conditions such as temperature (e.g., temperature sensitivity/stability). As described herein, one or more of the aforementioned properties can be determined/measured from techniques known in the art including, for example, HiPrBind assays.
[0138] An “analog,” such as a “variant” or a “derivative,” is an antibody substantially similar in structure and having the same biological activity, albeit in certain instances to a differing degree, to a naturally-occurring antibody or a WT antibody or another reference antibody as will be understood by those of skill in the art. For example, an antibody variant refers to an antibody sharing substantially similar structure and having the same biological activity as a reference antibody. Variants or analogs differ in the composition of their amino acid sequences compared to the reference antibody from which the analog is derived, based on one or more mutations involving (i) deletion of one or more amino acid residues at one or more termini of the antibody and/or one or more internal regions of the antibody sequence (e.g., fragments), (ii) insertion or addition of one or more amino acids at one or more termini (typically an “addition” or “fusion”) of the antibody and/or one or more internal regions (typically an “insertion”) of the antibody sequence or (iii) substitution of one or more amino acids for other amino acids in the antibody sequence. By way of example, a “derivative” is a type of analog and refers to an antibody sharing the same or substantially similar structure as a reference antibody that has been modified, e.g., chemically.
[0139] In some embodiments, the variants or sequence variants are mutants wherein 1, 2, 3, 4, 5, 6 or more amino acids within one or more CDR are mutated relative to a reference antibody. In some embodiments, CDRs on the light chain, heavy chain, or both heavy and light chain, are mutated. In some embodiments, one or more framework amino acid residues are mutated relative to a reference antibody.
[0140] In substitution variants, one or more amino acid residues, e.g., in a CDR region, of an antibody are removed and replaced with alternative residues. In one aspect, the substitutions are conservative in nature and conservative substitutions of this type are well known in the art. Alternatively, the disclosure embraces substitutions that are also non-conservative. Exemplary conservative substitutions are described in Lehninger, [Biochemistry, 2nd Edition; Worth Publishers, Inc., New York (1975), pp. 71-77].
[0141] Antibodies contemplated herein include full-length antibodies, biologically active subunits or fragments of full-length antibodies, as well as biologically active derivatives and variants of any of these forms of therapeutic proteins. Thus, antibodies include those that (1) have an amino acid sequence that has greater than about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98% or about 99% or greater amino acid sequence identity, over a region of at least about 25, about 50, about 100, about 200, about 300, about 400, or more amino acids, to a reference antibody (e.g., encoded by a referenced nucleic acid or an amino acid sequence described herein). According to the present disclosure, the term “recombinant protein” or “recombinant antibody” includes any protein obtained via recombinant DNA technology. In certain embodiments, the term encompasses antibodies as described herein.
[0142] In some embodiments, the antibodies or antibody variants described herein are expressed from one or more expression constructs and/or in a cell or strain as described herein. Exemplary wild-type or reference antibodies include commercially available or other known antibodies, including therapeutic monoclonal antibodies. Reference antibodies according to the present disclosure may include any antibodies now known or later developed, including those that are not clinically and/or commercially available.
[0143] As used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any element, e.g., any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
[0144] When a range of values is provided herein, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
[0145] The terms “identical” and “identity” as used herein refer to a relationship between the sequences of two or more polypeptide molecules or two or more nucleic acid molecules, as determined by aligning and comparing the sequences. “Percent identity” means the percent of identical residues between the amino acids or nucleotides in the compared molecules and is calculated based on the size of the smallest of the molecules being compared. For these calculations, gaps in alignments (if any) must be addressed by a particular mathematical model or computer program (i.e., an “algorithm”). Methods that can be used to calculate the identity of the aligned nucleic acids or polypeptides are standard in the art. Methods can include those described in Computational Molecular Biology, (Lesk, Ed.), 1988, New York: Oxford University Press; Biocomputing Informatics and Genome Projects, (Smith, Ed.), 1993, New York: Academic Press; Computer Analysis of Sequence Data, Part I, (Griffin and Griffin, Eds.), 1994, New Jersey: Humana Press; Sequence Analysis in Molecular Biology, (von Heijne), 1987, New York: Academic Press; Sequence Analysis Primer, (Gribskov and Devereux, Eds.), 1991, New York: M. Stockton Press; and Carillo et al., SIAM J. Applied Math., 48:1073 (1988).
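By way of non-limiting illustration, the percent identity definition above can be sketched in Python for two sequences that have already been aligned. The function name `percent_identity` and the use of `-` as the gap symbol are assumptions made for this sketch and do not come from any of the cited programs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Identical residues are counted column by column, and the denominator is
    the ungapped length of the shorter sequence, as in the definition above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as a match only when both positions hold the same residue.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    shortest = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shortest == 0:
        raise ValueError("each sequence must contain at least one residue")
    return 100.0 * matches / shortest
```

The sketch only scores an existing alignment; producing the alignment itself is left to a program such as those discussed in the following paragraphs.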
[0146] In calculating percent identity, the sequences being compared are aligned in a way that gives the largest match between the sequences. An exemplary computer program used to determine percent identity is the GCG program package, which includes GAP (Devereux et al., Nucl Acid Res, 12:387 (1984); Genetics Computer Group, University of Wisconsin, Madison, Wis.). The computer algorithm GAP is used to align the two polypeptides or polynucleotides for which the percent sequence identity is to be determined. The sequences are aligned for optimal matching of their respective amino acids or nucleotides (the “matched span,” as determined by the algorithm). A gap opening penalty (which is calculated as 3 times the average diagonal, wherein the “average diagonal” is the average of the diagonal of the comparison matrix being used; the “diagonal” is the score or number assigned to each perfect amino acid match by the particular comparison matrix) and a gap extension penalty (which is usually 1/10 times the gap opening penalty), as well as a comparison matrix such as PAM 250 or BLOSUM 62, are used in conjunction with the algorithm. A standard comparison matrix [e.g., Dayhoff et al., Atlas of Protein Sequence and Structure, 5:345-352 (1978) for the PAM 250 comparison matrix; Henikoff et al., Proc. Natl. Acad. Sci. USA, 89:10915-10919 (1992) for the BLOSUM 62 comparison matrix] can also be used by the algorithm.
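As a worked illustration of the penalty arithmetic described above, the following Python sketch derives a gap opening penalty (3 times the average diagonal) and a gap extension penalty (1/10 of the opening penalty) from a comparison matrix. The three-residue `toy_matrix` and the function name `gap_penalties` are hypothetical and stand in for a full comparison matrix such as PAM 250 or BLOSUM 62.

```python
from typing import Dict, Tuple


def gap_penalties(matrix: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """Return (gap opening penalty, gap extension penalty) for a matrix."""
    # The "diagonal" holds the scores assigned to perfect residue matches.
    diagonal = [score for (a, b), score in matrix.items() if a == b]
    average_diagonal = sum(diagonal) / len(diagonal)
    gap_open = 3 * average_diagonal
    gap_extend = gap_open / 10
    return gap_open, gap_extend


# Toy three-residue matrix with illustrative scores (an assumption of this sketch).
toy_matrix = {
    ("A", "A"): 4, ("C", "C"): 9, ("D", "D"): 6,
    ("A", "C"): 0, ("A", "D"): -2, ("C", "D"): -3,
}
gap_open, gap_extend = gap_penalties(toy_matrix)
```

For the toy matrix, the average diagonal is (4 + 9 + 6) / 3, so the opening penalty is 19 and the extension penalty is 1.9.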
[0147] Recommended parameters for determining percent identity for polypeptides or nucleotide sequences using the GAP program are the following: Algorithm: Needleman et al., J. Mol. Biol., 48:443-453 (1970); Comparison matrix: BLOSUM 62 from Henikoff et al., 1992, supra; Gap Penalty: 12 (but with no penalty for end gaps); Gap Length Penalty: 4; Threshold of Similarity: 0.
[0148] Certain alignment schemes for aligning two amino acid sequences can result in matching of only a short region of the two sequences, and this small aligned region can have very high sequence identity even though there is no significant relationship between the two full-length sequences. Accordingly, the selected alignment method (GAP program) can be adjusted if so desired to result in an alignment that spans at least 50 contiguous amino acids of the target polypeptide.
[0149] Other exemplary programs that compare and align pairs of sequences include, but are not limited to, ALIGN (Myers and Miller, Comput Appl Biosci, 4(1):11-17 (1988)), FASTA (Pearson and Lipman, Proc Natl Acad Sci USA, 85(8):2444-2448 (1988); Pearson, Methods Enzymol, 183:63-98 (1990)), gapped BLAST (Altschul et al., Nucleic Acids Res, 25(17):3389-3402 (1997)), BLASTP, BLASTN, and GCG (Devereux et al., Nucleic Acids Res, 12(1 Pt 1):387-95 (1984)).
[0150] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure.
[0151] All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials for the purpose for which the publications are cited.
[0152] As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. This disclosure is intended to provide support for all such combinations.
[0153] As used herein, “can comprise” or “can be” indicates something envisaged by the inventors that is functional and available as part of the subject matter provided.
[0154] While the following examples describe specific embodiments, variations and modifications will occur to those skilled in the art. Accordingly, only such limitations as appear in the claims should be placed on the invention.
Examples
Media optimization using a genetic algorithm
[0155] The present Example describes the use of a genetic algorithm (GA) to identify improved media formulations that resulted in several-fold higher titers of an antibody fragment produced in SoluPro™ E. coli (see, e.g., WO/2014/025663 and WO/2017/106583). Python software was created to execute the GA and to output random pipetting volumes to a .csv file for 60 mixtures of 14 individual media components (sugar, nitrogen, salt, and trace metals). The .csv file was uploaded to a liquid handling robot, which pipetted each of the 14 media components into 60 wells of a small-scale plate-based bioreactor system with pH control and feeding ability. These 60 unique media were inoculated with a SoluPro™ E. coli strain engineered to produce a monoclonal Fab. Broth was harvested after 48 h and measured by HiPrBind™, an ELISA-type assay that specifically binds properly folded, functional target protein and yields signal corresponding to the amount of target protein present. Individuals with the highest HiPrBind™ signal from the initial round were selected as parents for a subsequent round of mixtures, and an evolutionary approach to recombination of their pipetting volumes was used to produce another 60 mixtures. This was repeated twice, yielding populations with improved antibody Fab titers on average 25% higher than the mean of the initial mixtures. One condition was scaled to a stirred bioreactor with >25% improvement over a conventionally optimized process using the same strain. A comprehensive RNAseq data set was collected to characterize the host’s transcriptional response. Independent Component Analysis (ICA) of the RNAseq data set revealed independently modulated gene sets (iModulons) that characterized the host response to different media formulations.
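By way of non-limiting illustration, the export of random pipetting volumes to a .csv file could be sketched as follows. The column names, the per-component upper bound `MAX_VOLUME` and the function names are hypothetical, and the actual file layout expected by the liquid handling robot is not described in this Example.

```python
import csv
import io
import random

N_MIXTURES = 60     # mixtures per round, from the Example
N_COMPONENTS = 14   # individual media components, from the Example
MAX_VOLUME = 50.0   # hypothetical per-component upper bound (not from the Example)


def random_pipetting_table(rng):
    """Return one row of random component volumes per mixture."""
    return [
        [round(rng.uniform(0.0, MAX_VOLUME), 1) for _ in range(N_COMPONENTS)]
        for _ in range(N_MIXTURES)
    ]


def write_pipetting_csv(table, handle):
    """Write a header row plus one row per mixture to an open text handle."""
    writer = csv.writer(handle)
    header = ["mixture"] + [f"component_{i + 1}" for i in range(N_COMPONENTS)]
    writer.writerow(header)
    for index, volumes in enumerate(table, start=1):
        writer.writerow([index] + volumes)


# An in-memory buffer is used here; a real run would open a .csv file instead.
buffer = io.StringIO()
write_pipetting_csv(random_pipetting_table(random.Random(0)), buffer)
csv_text = buffer.getvalue()
```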
[0156] Software
[0157] A genetic algorithm (GA) was developed using the DEAP library in Python to identify the media formulations that resulted in highest product titers. In this GA, each individual media composition is represented by a vector of media component concentrations. Concentrations can either be discrete or continuous, and each component is designated a minimum and maximum concentration. At the start of the GA run, a random pool was generated of 60 individual media with ~900 pipetting events performed on a Hamilton liquid handling robot. These media compositions were tested in the BioLector Pro (Beckman Coulter) to measure product formation. In each subsequent round of the GA, a cyclical select-reproduce-mutate-cull process was followed as is common with a (mu, lambda) evolutionary strategy. This means that children replace the parents.
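The representation described above, in which each individual is a vector of component concentrations with per-component bounds, can be sketched in plain Python as follows. This is a minimal sketch that does not use the DEAP library; the `Component` class, the component names and the bounds are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Component:
    name: str
    minimum: float
    maximum: float
    discrete_levels: Tuple[float, ...] = ()  # empty tuple means continuous


# Hypothetical components and bounds, for illustration only.
COMPONENTS = [
    Component("carbon_source_1", 0.0, 40.0),
    Component("nitrogen_source_1", 0.0, 40.0),
    Component("trace_metals", 0.0, 10.0, (0.0, 5.0, 10.0)),
]


def random_individual(components, rng):
    """Draw one media composition as a vector of concentrations."""
    individual = []
    for component in components:
        if component.discrete_levels:
            individual.append(rng.choice(component.discrete_levels))
        else:
            individual.append(rng.uniform(component.minimum, component.maximum))
    return individual


def random_population(components, size, rng):
    """Generate the random starting pool of individuals."""
    return [random_individual(components, rng) for _ in range(size)]


pool = random_population(COMPONENTS, 60, random.Random(0))
```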
[0158] For selection: A tournament selection process was used to determine the parent individuals for the subsequent generation. In this process, five individuals were randomly sampled from the population and the individual with the highest fitness was selected as a parent. This process was repeated until the number of selected parents was equal to the desired population size.
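The tournament selection step described above can be sketched as follows. This is a plain-Python illustration rather than the DEAP implementation; the function name `tournament_select` is hypothetical, and sampling the five contenders without replacement within each tournament is an assumption of this sketch.

```python
import random


def tournament_select(population, fitnesses, n_parents, tournament_size=5, rng=None):
    """Select parents by repeated tournaments.

    Each tournament samples `tournament_size` distinct individuals and keeps
    the one with the highest fitness. The same individual may win several
    tournaments and therefore appear more than once among the parents.
    """
    rng = rng or random.Random()
    parents = []
    for _ in range(n_parents):
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda index: fitnesses[index])
        parents.append(list(population[winner]))  # copy so parents can be modified
    return parents
```

In the Example, the fitness would correspond to the HiPrBind™ signal measured for each mixture, and `n_parents` would equal the population size of 60.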
[0159] For mutation: In the mutation step, simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%. In a simple mutation, we change the concentration of one component in the formulation to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring.
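The simple mutation and two-point crossover described above can be sketched in plain Python as follows. The function names are hypothetical, and the handling of an unpaired parent (carried over unchanged) is an assumption, since the Example does not state how an odd number of crossover candidates was treated.

```python
import random


def simple_mutation(individual, bounds, rng):
    """Set one randomly chosen component to a random value within its bounds."""
    child = list(individual)
    position = rng.randrange(len(child))
    low, high = bounds[position]
    child[position] = rng.uniform(low, high)
    return child


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the values lying between two crossover points of a parent pair."""
    parent_a, parent_b = list(parent_a), list(parent_b)
    first, second = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


def vary(parents, bounds, rng, mutation_fraction=0.25):
    """Mutate a fraction of the parents and cross over the rest in pairs."""
    shuffled = [list(parent) for parent in parents]
    rng.shuffle(shuffled)
    n_mutated = round(mutation_fraction * len(shuffled))
    offspring = [simple_mutation(p, bounds, rng) for p in shuffled[:n_mutated]]
    rest = shuffled[n_mutated:]
    for i in range(0, len(rest) - 1, 2):
        offspring.extend(two_point_crossover(rest[i], rest[i + 1], rng))
    if len(rest) % 2:  # assumption: an unpaired parent is carried over unchanged
        offspring.append(rest[-1])
    return offspring
```

With 60 parents and a mutation fraction of 0.25, the sketch mutates 15 individuals and applies crossover to 22 pairs drawn from the remaining 45.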
[0160] In addition, physical constraints were added to the final populations before measurements. First, the total carbon and nitrogen volumes were each limited to 80 µL. If an individual had a total carbon or nitrogen volume above this amount, all carbon (or nitrogen) components were rescaled such that the total was 80 µL. Similarly, if the total volume of all reagents in the media was above 800 µL, all components were rescaled so that the media volume was 800 µL.
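The rescaling constraints described above can be sketched as follows. The function names and the index lists identifying carbon and nitrogen components are hypothetical, and the limits are expressed in the same volume unit as the inputs.

```python
def rescale_group(volumes, indices, limit):
    """Scale the selected volumes in place so that their sum does not exceed limit."""
    indices = list(indices)
    total = sum(volumes[i] for i in indices)
    if total > limit:
        factor = limit / total
        for i in indices:
            volumes[i] *= factor
    return volumes


def apply_constraints(volumes, carbon_idx, nitrogen_idx,
                      group_limit=80.0, total_limit=800.0):
    """Apply the carbon, nitrogen and total-volume limits in that order."""
    adjusted = list(volumes)
    rescale_group(adjusted, carbon_idx, group_limit)
    rescale_group(adjusted, nitrogen_idx, group_limit)
    rescale_group(adjusted, range(len(adjusted)), total_limit)
    return adjusted
```

For example, carbon volumes of 60 and 40 exceed the limit of 80 and are rescaled to 48 and 32, which preserves their ratio.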
[0161] Cultivation
[0162] The fermentations took place simultaneously in two BioLector Pro instruments, each using a microfluidic 32-well FlowerPlate that measures and logs pH and O2 concentration using optodes, oxygenates by shaking, and controls temperature at programmed setpoints. Two wells were reserved on each plate for control medium. Sixty wells were used during each round to test mixture outputs from the Python GA program. For these GA fermentations, a one-sided pH control process was applied using 5% ammonium hydroxide. At the start of each fermentation, the pH was adjusted to 6.6 for each chromosome (media condition) in each well if necessary. The BioLector Pro has several substrate addition options, including constant, linear, exponential, and signal-triggered. For this Example, signal-triggered feeding was used. Any instrument-logged signal (e.g., biomass, pH, DO) can be chosen to trigger microfluidic pumps to add a specified volume. For these experiments, a pH trigger was used; pH was maintained at 6.6 with a trigger at 6.65. A value above 6.65 triggers addition of 5.5 µL of substrate consisting of a carbon source and inducers for both the protein of interest and accessory molecules to promote proper folding and solubility of the Fab. The control program includes options for a block time and a pause time. Block time is defined as the time a trigger condition must be continuously met before the trigger is activated. Pause time is defined as the time that must elapse after a trigger was activated before it can be activated again. For the study, 15 minutes was chosen for both. An initial 4-hour pause was used before enabling feed triggering. To initiate a bolus of feed, the pH must remain above 6.65 for fifteen minutes. After the bolus, the program monitors the pH for 15 minutes before a subsequent bolus may be added. These values were chosen for three reasons: 1) to prevent noisy signals from triggering a bolus, 2) to prevent overfeeding, especially with media mixtures that might raise the pH of the well early in the run, and 3) to set a growth rate ceiling so that the cells do not go into oxygen limitation too quickly. Lastly, the shaking frequency, oxygen concentration, and humidity were kept constant throughout the 48-hour fermentation at 1100 rpm, 35%, and 85%, respectively. Temperature was set at 32°C for the first twelve hours after inoculation and then reduced to 26°C for the remainder of the fermentation.
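The signal-triggered feeding logic described above, with its block time, pause time and initial pause, can be sketched as a simple simulation over pH readings taken once per minute. This is an illustrative model only and is not the instrument's control software; the function name `feed_events`, the one-minute sampling interval, and the choice to restart the block timer after each pause are assumptions.

```python
def feed_events(ph_by_minute, trigger=6.65, block_min=15, pause_min=15,
                initial_pause_min=240):
    """Return the minutes at which a feed bolus would be triggered."""
    events = []
    above_since = None                 # minute at which pH first exceeded the trigger
    next_allowed = initial_pause_min   # no triggering before the initial pause ends
    for minute, ph in enumerate(ph_by_minute):
        if minute < next_allowed or ph <= trigger:
            above_since = None         # condition interrupted: restart block timer
            continue
        if above_since is None:
            above_since = minute
        if minute - above_since >= block_min:
            events.append(minute)      # block time satisfied: add one bolus
            next_allowed = minute + pause_min
            above_since = None
    return events
```

Under these assumptions, a pH held at 6.7 for 300 minutes would produce boluses at minutes 255 and 285, whereas a 10-minute excursion above 6.65 would produce none, which reflects the stated aim of preventing noisy signals from triggering a bolus.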
[0163] Titer measurement
[0164] The HiPrBind™ assay was used to assess performance of experimental samples. HiPrBind™ is an ELISA-type assay that specifically binds properly folded, functional target protein and produces a signal corresponding to the amount of target protein present (WO2021/163349 and described herein). To measure properly folded, active target protein, two analyte associating moieties were used in combination with a signal donor and an activatable compound.
[0165] In this binding cascade, one moiety forms a complex with the signal donor, and the other moiety forms a complex with an activatable compound. If properly folded, active target material is present, the signal donor complex and the activatable compound complex will both bind and provide detectable output. If properly folded, active target material is not present, the complexes do not associate, and no signal is produced.
[0166] RNAseq
[0167] To stabilize RNA between sample collection and RNA isolation, culture was sampled directly into 3x volume of RNAlater (Thermo Fisher). The culture and RNAlater mixture was spun down and the supernatant was removed, followed by resuspension of the pellet in 3x volume of RNAlater for complete quenching. A volume of 10-50 µL of the treated culture was mixed with >300 µL of Trizol. RNA was then isolated using Direct-zol-96 Magbead RNA (Zymo Research, PN R2100), according to the manufacturer’s protocol and including the optional DNase I treatment.
[0168] Sequencing libraries were prepared from extracted RNA with Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Illumina, PN 20040529), according to the manufacturer’s protocol. This prep kit depletes ribosomal RNA and reverse transcribes the remaining RNA into cDNA (complementary DNA); ligation and subsequent amplification then add adapters and dual indexes for multiplex sequencing on an Illumina instrument. Libraries were normalized, pooled, and then diluted to 750 pM for 2x75bp sequencing on an Illumina NextSeq 1000 using P2 Reagents (200 cycle, v3) (Illumina, PN 20046812).
[0169] iModulon/PCA
[0170] Conducting independent component analysis on gene expression data
[0171] The Scikit-learn (v0.23.2) (Pedregosa F, et al., J Mach Learn Res. 2011;12:2825-30) implementation of FastICA (Hyvärinen A., IEEE Trans Neural Netw. 1999;10:626-34) was executed 100 times with random seeds and a convergence tolerance of 10⁻⁷. The resulting independent components (ICs) were clustered using DBSCAN (Ester M, et al., In: KDD 1996; p. 226-31) to identify robust ICs, using an epsilon of 0.1 and minimum cluster seed size of 50. To account for identical components with opposite signs, the following distance metric was used for computing the distance matrix:
[0172] $d_{x,y} = 1 - |\rho_{x,y}|$
[0173] where $\rho_{x,y}$ is the Pearson correlation between components x and y. The final robust ICs were defined as the centroids of the cluster. Again, to account for identical components with opposite signs, we choose one component as the canonical direction, and flip all other components in the cluster to ensure that the Pearson correlation is positive between all members of the cluster before computing the centroid.
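The sign-insensitive distance (one minus the absolute Pearson correlation) and the sign-aligned centroid described above can be sketched in plain Python as follows. The function names are hypothetical, the first member of a cluster is taken as the canonical direction as an assumption, and the FastICA and DBSCAN steps themselves are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def distance(x, y):
    """One minus the absolute correlation, so opposite-sign copies are close."""
    return 1.0 - abs(pearson(x, y))


def sign_aligned_centroid(cluster):
    """Flip members that anti-correlate with the first member, then average."""
    reference = cluster[0]
    aligned = [
        list(member) if pearson(reference, member) >= 0 else [-v for v in member]
        for member in cluster
    ]
    return [sum(column) / len(aligned) for column in zip(*aligned)]
```

Two components that differ only in sign therefore have a distance of approximately zero and contribute identically to the centroid.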
[0174] Identifying significant genes in an independent component
[0175] To perform regulator enrichments on components, genes with significantly high weightings must be identified. To keep this method agnostic to the prior regulatory structure, the Scikit-learn (Pedregosa F, et al., J Mach Learn Res. 2011;12:2825-30) implementation of K-means clustering was applied to the absolute values of the gene weights in each independent component. All genes in the top two clusters were deemed significant, and the set of significant genes in each independent component was called the iModulon.
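The thresholding step described above can be illustrated with a small one-dimensional K-means written in plain Python. This is a stand-in for the Scikit-learn implementation named above, not a reproduction of it; the use of three clusters, the evenly spaced initial centroids, and the gene names in the usage note are assumptions of this sketch.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster scalar values with Lloyd's algorithm; return (labels, centroids)."""
    low, high = min(values), max(values)
    # Deterministic start: centroids spread evenly between the extremes.
    centroids = [low + (high - low) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: abs(value - centroids[j]))
            for value in values
        ]
        updated = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def significant_genes(weights, k=3, top=2):
    """Return genes whose absolute weight falls in the `top` highest clusters."""
    genes = list(weights)
    magnitudes = [abs(weights[gene]) for gene in genes]
    labels, centroids = kmeans_1d(magnitudes, k)
    ranked = sorted(range(k), key=lambda j: centroids[j], reverse=True)
    keep = set(ranked[:top])
    return {gene for gene, label in zip(genes, labels) if label in keep}
```

For hypothetical weights in which most genes lie near zero and a few have magnitudes of about 0.3 and 0.9, only the latter genes would be returned as the significant set.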
[0176] Associating iModulons to regulators
[0177] The set of significant genes in each component, or the iModulon, was compared to each regulon by the two-sided Fisher’s exact test (FDR < 10⁻⁵) to determine regulator enrichment. F1-scores were calculated to evaluate these associations. The F1-score is the harmonic mean of precision and recall between a component and its linked regulon. Precision is the proportion of genes in the component that are present in the associated regulon and recall is the proportion of genes in the regulon that are present in the associated component. Prior information about regulator binding sites was borrowed from previous studies (Sastry AV, et al., Nat Commun. 2019;10:5536; Rychel K, et al., Nat Commun. 2020;11:6338; and Poudel S, et al., Proc Natl Acad Sci USA. 2020;117:17228-39).
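The precision, recall and F1-score definitions above can be sketched as follows. The function name `precision_recall_f1` is hypothetical, and the Fisher's exact test and FDR correction are not included in this sketch.

```python
def precision_recall_f1(imodulon, regulon):
    """Return (precision, recall, F1) for an iModulon and a regulon gene set."""
    imodulon, regulon = set(imodulon), set(regulon)
    overlap = len(imodulon & regulon)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(imodulon)   # share of iModulon genes in the regulon
    recall = overlap / len(regulon)       # share of regulon genes in the iModulon
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, an iModulon of four genes that shares two genes with a six-gene regulon has a precision of 0.5, a recall of 1/3, and an F1-score of 0.4.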
[0178] FIG. 3 shows Fab expression in SoluPro™ E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of the unevolved mixtures after 2 rounds of the GA.
[0179] As shown in FIG. 4, the HiPrBind™ (HPB) signal significantly increased between before (Round 0) and after (Round 2) the genetic algorithm. The mean signal increased 1.9-fold between the initial and final populations.
[0180] As shown in FIG. 5, gene expression after dimensionality reduction (left plot) by principal component analysis of the RNAseq data shows convergence on an area that corresponds to the highest HiPrBind™ signal (right plot).
[0181] As shown in FIG. 6, scatter plots of iModulon data have been converted to binned plots, with one axis representing HiPrBind™ (HPB) signal and the other axis the frequency per bin of the particular iModulon being expressed. In the plots of FIGs. 6A and 6B, the iModulons represent the stress response sigma factors RpoS and RpoH, respectively. The general trend shows that lower stress response (RpoS) and less protein misfolding (RpoH) are associated with higher HiPrBind™ signal. In FIG. 6C, starvation of leucine is shown to trend towards higher HiPrBind™ signal, suggesting addition of leucine may improve protein expression. In FIG. 6D, DksA, a ribosomal protein subunit regulator, suggests that higher levels of ribosomes trend to higher HiPrBind™ signal. FIG. 6E, Cbl, is an iModulon associated with sulfur metabolism. High HiPrBind™ signals here may indicate a state of sulfur starvation. FIGs. 6F and 6G show iModulon signal related to iron metabolism. Fur is upregulated during iron starvation, and the iron-sulfur cluster regulator IscR also trends to a state of upregulation when HiPrBind™ signals are higher.
[0182] For the first time, a genetic algorithm approach to optimize media was coupled with pipetting executed by a liquid handling robot into a micro-fermentation plate system with pH control and feeding, followed by RNAseq and iModulon analysis. This discovery shows the power of combining GA with iModulon analysis for production of protein, yielding both improved media composition and an indication of aspects of the underlying mechanisms for the improvement. The iModulon analysis may also inform further improvements, such as in this case supplementing leucine, methionine, or cysteine, reducing iron concentrations, etc. This media optimization approach can be applied, in various embodiments, to biological drugs such as monoclonal antibodies and antibody fragments, enzymes, edible proteins, and metabolites produced by microbes or other production hosts. This mixture was scaled into a stirred bioreactor, demonstrating that mixtures identified by GA in a microplate with pH control and feeding can be quickly scaled to industrial production.
[0183] Scale-Up of Genetic Algorithms
[0184] FIG. 7A depicts scale-up, purification and characterization of the purified material, according to some aspects, including scaling-up, transcriptomic analysis and analytical characterization of purified material.
[0185] FIG. 7B depicts integrated OUR and CER, according to some aspects.
[0186] FIG. 7C depicts that scaling up from microplates to bioreactors increases quality by a high percentage (e.g., 111% with doubling of carbon), according to some aspects.
[0187] FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits).
[0188] FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects.
[0189] FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects.
[0190] FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects.
[0191] FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects.
[0192] FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects.
[0193] FIG. 7J depicts that genetic algorithm-based media optimization results in mAb (e.g., produced in SoluPro™ E. coli) that is structurally similar to CHO-produced mAb, according to some aspects.
[0194] Downstream Purification and Characterization of GA-Generated Material
[0195] FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects.
[0196] FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects.
[0197] FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects.
[0198] Structural Assay of Antibodies
[0199] FIG. 9A depicts an exemplary structural characterization environment, according to some aspects.
[0200] FIG. 9B depicts an exemplary mAb study of second derivatives for structural fingerprints, according to some aspects.
[0201] FIG. 9C depicts an exemplary mAb study of a delta plot for clear visualization of differences, according to some aspects.
[0202] iModulons
[0203] FIG. 10A depicts iModulon gene sets, according to some aspects. FIG. 10A shows the principal component analysis (PCA) of the RNAseq data and quality. Principal component analysis of RNAseq data (left) reduces ~4000-dimensional gene expression.
[0204] FIG. 10B depicts iModulon analysis of the genetic algorithm (GA) scale-up, which provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects.
[0205] FIGs. 11A-11D depict iModulon activity, according to some aspects. In FIGs. 11A-11D, RNAseq data was processed into groups of functionally co-regulated gene sets known as iModulons. The iModulon analysis was performed on RNAseq data collected from multiple strains, media, or process conditions (FIGs. 11A-11D). OxyR is a regulator of antioxidant genes and is an indicator of oxidative stress.
[0206] In FIG. 11A, OxyR expression was used to assess the different strains engineered to improve their oxidative stress response. In FIG. 11B, PhoB is an iModulon of phosphate metabolism. In FIG. 11C, PhoP is an iModulon of metal homeostasis, and Fur, in FIG. 11D, is an iModulon of iron uptake. These iModulons may be used to improve the process conditions by supplementing the respective nutrients to make process improvements over the course of cultivation in bioreactors.
[0207] FIG. 12 depicts iModulon analysis of the genetic algorithm (GA) scale-up, which provided insights for strain and process improvement to lower COGs, retain CQAs and boost titers, according to some aspects. Applying the strain improvement insights obtained from iModulon analysis to strain G results in a strain with an improved quality fitness score. Thus, the GA media was used as a benchmark, and the benefits of the GA media were incorporated into the strain.
[0208] Exemplary Source Code
[0209] As discussed herein, aspects of the present techniques may be implemented using source code in suitable programming languages, such as Python, Java, C++, etc. For example, the following are source code listings that may be implemented, in some aspects:
# Codon optimization
# `library` is assumed to be an existing antibody design library object
from absci_library import codon_optimizer
library = codon_optimizer.reverse_translate(library)
library.to_csv('covid_antibody_designs.csv')
library.to_wet_lab(assay='ACE')

# Lead optimization model
from absci import lead_opt_model
lead_optimizer = lead_opt_model.load_latest()
library.naturalneses = lead_optimizer.naturalneses(library)
lead_optimizer.optimize(library).to_wet_lab(assay='SPR')
[0001] The various embodiments described herein can be combined to provide further embodiments. All U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified if necessary to employ concepts of the various patents, applications, and publications to provide yet further embodiments.
[0002] These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
[0003] Aspects of the techniques described in the present disclosure may include any of the following aspects, either alone or in combination:
[0004] 1. A method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.
[0005] 2. A method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.
[0006] 3. A method of increasing the yield of biomolecule expression in a cell culture system, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.
[0007] 4. The method of any one of aspects 1-3, optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.
[0008] 5. The method of any one of aspects 1-4, wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm.
[0009] 6. The method of aspect 5, wherein the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.
[0010] 7. The method of any one of aspects 1-6, wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.
[0011] 8. The method of aspect 7, wherein the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.
[0012] 9. The method of any one of aspects 1-8, wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.
[0013] 10. The method of aspect 9, wherein the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.
[0014] 11. The method of any one of aspects 1-10, wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.
[0015] 12. The method of any one of aspects 1-11 , wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.
[0016] 13. The method of aspect 12, wherein the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a noncommercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.
[0017] 14. The method of any one of aspects 1-13, wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest.
[0018] 15. The method of any one of aspects 1-14, further comprising measuring cell growth.
[0019] 16. The method of any one of aspects 1-15, wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA.
[0020] 17. The method of any one of aspects 1-16, wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.
[0021] 18. The method of any one of aspects 1-17, wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.
[0022] 19. The method of aspect 18, wherein the cells are bacterial cells.
[0023] 20. The method of aspect 19, wherein the bacterial cells are E. coli cells.
[0024] 21. The method of aspect 20, wherein the E. coli cells comprise one or more or all of: (a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter; (b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter; (c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter; (d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm; (e) a reduced level of gene function of a gene that encodes a reductase; (f) at least one expression construct encoding at least one disulfide bond isomerase protein; (g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or (h) at least one polynucleotide encoding Erv1p.
[0025] 22. A method of producing a biomolecule of interest comprising culturing a host cell comprising an expression construct encoding the biomolecule of interest in an optimized cell media formulation as determined by the method of aspect 1 .
[0026] 23. A computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.
[0027] 24. A computer-implemented method for improving quality of monoclonal antibodies, comprising: (i) performing a media optimization using a genetic algorithm workflow; (ii) harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) selecting a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) scaling one or more candidates to a bioreactor scale; (v) collecting RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) performing a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
[0028] 25. The computer-implemented method of aspect 24, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.
[0029] 26. The computer-implemented method of any of aspects 24-25, further comprising: repeating step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.
[0030] 27. A computing system for improving quality of monoclonal antibodies, comprising: a bioreactor, one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the system to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
[0031] 28. The computing system of aspect 27, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.
[0032] 29. The computing system of any of aspects 27-28, the one or more memories including further instructions that, when executed, cause the system to: repeat step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.
[0033] 30. A non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed, cause a computer to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
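The selection and recombination steps recited in aspects 24-25 can be illustrated with a minimal sketch of one genetic-algorithm round over 60 mixtures of 24 components. The encoding of a mixture as relative levels in [0, 1], the uniform crossover, the mutation rate, the number of parents, and the `placeholder_quality_score` function are all assumptions made for illustration; the source does not specify these details, and in practice the score would come from a measured assay rather than a formula.

```python
import random

N_COMPONENTS = 24   # media components per mixture (aspect 25)
N_MIXTURES = 60     # mixtures per round (aspect 25)


def random_mixture(rng):
    """One candidate medium: an assumed relative level in [0, 1) per component."""
    return [rng.random() for _ in range(N_COMPONENTS)]


def select_parents(population, scores, n_parents):
    """Keep the n_parents mixtures with the highest quality scores."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    return [mixture for _, mixture in ranked[:n_parents]]


def recombine(parent_a, parent_b, rng, mutation_rate=0.05):
    """Uniform crossover, then occasional random reset of a component level."""
    child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
    return [rng.random() if rng.random() < mutation_rate else level for level in child]


def next_round(population, scores, rng, n_parents=10):
    """Build the next set of mixtures from the top-scoring parents."""
    parents = select_parents(population, scores, n_parents)
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(recombine(a, b, rng))
    return children


def placeholder_quality_score(mixture):
    """Hypothetical stand-in for a measured quality score; not an assay model."""
    return -sum((level - 0.5) ** 2 for level in mixture)


rng = random.Random(0)
population = [random_mixture(rng) for _ in range(N_MIXTURES)]
scores = [placeholder_quality_score(m) for m in population]
new_population = next_round(population, scores, rng)
```

Calling `next_round` repeatedly, with scores replaced by measurements from each cultured round, would correspond to the iterative repetition described in aspect 26.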
Claims
1 . A method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest, said method comprising:
(1 ) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;
(2) culturing cells in the mixture matrix of (1); and
(3) measuring two or more or all of the following:
(a) the amount of biomolecules of interest expressed;
(b) at least one gene that is transcribed; and
(c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.
2. A method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising:
(1 ) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;
(2) culturing cells in the mixture matrix of (1); and
(3) measuring two or more or all of the following:
(a) the amount of biomolecules of interest expressed;
(b) at least one gene that is transcribed; and
(c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.
3. A method of increasing the yield of biomolecule expression in a cell culture system, said method comprising:
(1 ) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;
(2) culturing cells in the mixture matrix of (1); and
(3) measuring two or more or all of the following:
(a) the amount of biomolecules of interest expressed;
(b) at least one gene that is transcribed; and
(c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.
4. The method of any one of claims 1-3, optionally further comprising the steps of:
(1) identifying multiple optimized cell media formulations;
(2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation;
(3) culturing cells in the mixture of (2); and
(4) measuring two or more or all of the following:
(a) the amount and/or quality of biomolecules of interest expressed;
(b) at least one gene that is transcribed; and
(c) at least one independently modulated gene set.
5. The method of any one of claims 1 -4, wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm.
6. The method of claim 5, wherein the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.
7. The method of any one of claims 1 -6, wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.
8. The method of claim 7, wherein the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.
9. The method of any one of claims 1-8, wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.
10. The method of claim 9 wherein the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.
11 . The method of any one of claims 1 -10, wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.
12. The method of any one of claims 1 -11 , wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.
13. The method of claim 12, wherein the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a noncommercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.
14. The method of any one of claims 1 -13, wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest.
15. The method of any one of claims 1 -14, further comprising measuring cell growth.
16. The method of any one of claims 1 -15, wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA.
17. The method of any one of claims 1 -16, wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.
18. The method of any one of claims 1 -17, wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.
19. The method of claim 18, wherein the cells are bacterial cells.
20. The method of claim 19, wherein the bacterial cells are E. coli cells.
21. The method of claim 20, wherein the E. coli cells comprise one or more or all of:
(a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter;
(b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter;
(c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter;
(d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm;
(e) a reduced level of gene function of a gene that encodes a reductase;
(f) at least one expression construct encoding at least one disulfide bond isomerase protein;
(g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or
(h) at least one polynucleotide encoding Erv1p.
22. A method of producing a biomolecule of interest comprising culturing a host cell comprising an expression construct encoding the biomolecule of interest in an optimized cell media formulation as determined by the method of claim 1 .
23. A computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following:
(a) an amount of a biomolecule produced in the inoculated matrix,
(b) at least one arrangement that is transcribed, and
(c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.
24. A computer-implemented method for improving quality of monoclonal antibodies, comprising:
(i) performing a media optimization using a genetic algorithm workflow;
(ii) harvesting broth at 68h, by purifying and measuring by an iCIEF assay;
(iii) selecting a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;
(iv) scaling one or more candidates to a bioreactor scale;
(v) collecting RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and
(vi) performing a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
25. The computer-implemented method of claim 24, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.
26. The computer-implemented method of claim 24, further comprising: repeating step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.
27. A computing system for improving quality of monoclonal antibodies, comprising: a bioreactor, one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the system to:
(i) perform a media optimization using a genetic algorithm workflow;
(ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay;
(iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;
(iv) receive data corresponding to scaling one or more candidates to a bioreactor scale;
(v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and
(vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
28. The computing system of claim 27, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.
29. The computing system of claim 27, the one or more memories including instructions that when executed, cause the system to: repeat step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.
30. A non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed, cause a computer to:
(i) perform a media optimization using a genetic algorithm workflow;
(ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay;
(iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;
(iv) receive data corresponding to scaling one or more candidates to a bioreactor scale;
(v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and
(vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
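Claims 1, 7 and 8 describe a mixture matrix provided on multi-well plates. As a rough illustration only, the sketch below assigns mixture indices to well positions on a single plate in row-major order. The plate geometries listed, the row-major ordering, and the function names `well_names` and `assign_mixtures` are assumptions for this example and are not taken from the claims; only three of the claimed well counts are covered.

```python
import string

# Assumed (rows, columns) geometries for a few of the claimed well counts.
PLATE_FORMATS = {24: (4, 6), 96: (8, 12), 384: (16, 24)}


def well_names(n_wells):
    """Return well labels such as A1..H12 in row-major order."""
    rows, cols = PLATE_FORMATS[n_wells]
    return [
        f"{string.ascii_uppercase[r]}{c + 1}"
        for r in range(rows)
        for c in range(cols)
    ]


def assign_mixtures(n_mixtures, n_wells=96):
    """Map each well label to the index of the mixture dispensed into it."""
    wells = well_names(n_wells)
    if n_mixtures > len(wells):
        raise ValueError("more mixtures than wells on one plate")
    return {wells[i]: i for i in range(n_mixtures)}


# Example: the 60 mixtures of claim 25 on one 96-well plate.
layout = assign_mixtures(60)
```

A layout of this kind could be handed to a liquid-handling device as a dispensing map; replicates, controls, and edge-well exclusions are left out of this sketch.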
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263395208P | 2022-08-04 | 2022-08-04 | |
US63/395,208 | 2022-08-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024030344A1 (en) | 2024-02-08 |
Family
ID=89849784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/029001 WO2024030344A1 (en) | 2022-08-04 | 2023-07-28 | Genetic algorithm and imodulon based optimization of media formulation for quality, titer, strain, and process improvement biologics |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024030344A1 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160326482A1 (en) * | 2004-11-18 | 2016-11-10 | The Regents Of The University Of California | Apparatus and methods for manipulation and optimization of biological systems |
US8652817B2 (en) * | 2005-07-01 | 2014-02-18 | University Of Florida Research Foundation, Inc. | Recombinant host cells and media for ethanol production |
Non-Patent Citations (1)
Title |
---|
MUNROE, S. ET AL.: "Genetic algorithm as an optimization tool for the development of sponge cell culture media", IN VITRO CELLULAR & DEVELOPMENTAL BIOLOGY - ANIMAL, vol. 55, 2019, pages 149 - 158, XP036721860, DOI: 10.1007/s11626-018-00317-0 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | Current status and applications of adaptive laboratory evolution in industrial microorganisms | |
von Borzyskowski et al. | An engineered Calvin-Benson-Bassham cycle for carbon dioxide fixation in Methylobacterium extorquens AM1 | |
Irla et al. | Genome-based genetic tool development for Bacillus methanolicus: theta-and rolling circle-replicating plasmids for inducible gene expression and application to methanol-based cadaverine production | |
Korneli et al. | Getting the big beast to work—systems biotechnology of Bacillus megaterium for novel high-value proteins | |
Aldor et al. | Proteomic profiling of recombinant Escherichia coli in high-cell-density fermentations for improved production of an antibody fragment biopharmaceutical | |
US20230062579A1 (en) | Activity-specific cell enrichment | |
US20090023180A1 (en) | Reduction of Shading in Microalgae | |
Hirasawa et al. | Adaptive laboratory evolution of microorganisms: Methodology and application for bioproduction | |
Han et al. | Design of growth‐dependent biosensors based on destabilized GFP for the detection of physiological behavior of Escherichia coli in heterogeneous bioreactors | |
Stella et al. | Biosensor-based growth-coupling and spatial separation as an evolution strategy to improve small molecule production of Corynebacterium glutamicum | |
Schito et al. | Communities of Niche-optimized Strains (CoNoS)–Design and creation of stable, genome-reduced co-cultures | |
Peng et al. | Triple deletion of clpC, porB, and mepA enhances production of small ubiquitin-like modifier-N-terminal pro-brain natriuretic peptide in Corynebacterium glutamicum | |
Han et al. | Biotechnological applications of microbial proteomes | |
Koksharova et al. | β-N-methylamino-L-alanine (BMAA) causes severe stress in Nostoc sp. PCC 7120 cells under diazotrophic conditions: a proteomic study | |
CN112375771B (en) | Homoserine biosensor and construction method and application thereof | |
EP3108007B1 (en) | A method of detecting a microorganism in a sample by a fluorescence based detection method using somamers | |
CN110615832A (en) | Bmor mutant for efficiently screening isobutanol high-yield strains | |
Madhu et al. | Global transcriptome-guided identification of neutral sites for engineering Synechococcus elongatus PCC 11801 | |
WO2024030344A1 (en) | Genetic algorithm and imodulon based optimization of media formulation for quality, titer, strain, and process improvement biologics | |
García-Franco et al. | Insights into the susceptibility of Pseudomonas putida to industrially relevant aromatic hydrocarbons that it can synthesize from sugars | |
Ignatova et al. | Two-dimensional fluorescence difference gel electrophoresis analysis of Listeria monocytogenes submitted to a redox shock | |
Finley et al. | Structural genomics for Caenorhabditis elegans: high throughput protein expression analysis | |
Xiao et al. | Analysis of key genes for the survival of Pantoea agglomerans under nutritional stress | |
Wang et al. | Spatial Proteome Reorganization of a Photosynthetic Model Cyanobacterium in Response to Abiotic Stresses | |
Jansen et al. | Parallelized disruption of prokaryotic and eukaryotic cells via miniaturized and automated bead mill |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23850629 Country of ref document: EP Kind code of ref document: A1 |