WO2024030344A1

WO2024030344A1 - Genetic algorithm and iModulon based optimization of media formulation for quality, titer, strain, and process improvement of biologics

Info

Publication number: WO2024030344A1
Application number: PCT/US2023/029001
Authority: WO
Inventors: William Throndset; Anand SASTRY; Miles Gander
Original assignee: Absci Corporation
Priority date: 2022-08-04
Filing date: 2023-07-28
Publication date: 2024-02-08

Abstract

Methods, systems and computer-readable media for identifying optimized cell media formulations capable of promoting the expression of a biomolecule of interest are presented, which may include applying mixing algorithms, culturing cells, and measuring component conditions. The techniques may include identifying one or more genes and/or one or more independently modulated gene sets. The present techniques include methods, systems and computer-readable media for improving quality of monoclonal antibodies.

Description

GENETIC ALGORITHM AND IMODULON BASED OPTIMIZATION OF MEDIA FORMULATION FOR QUALITY, TITER, STRAIN, AND PROCESS IMPROVEMENT OF BIOLOGICS

Field

[0001] The subject matter provided herein relates generally to the field of cell media optimization, and more specifically, to methods and systems for optimizing cell media formulations using optimization and search techniques that are inspired by the principles of natural selection and genetics and that mimic evolutionary processes (e.g., genetic algorithms).

Background

[0002] Bioreactor media comprise many components. The choice of components and component concentrations can have a profound impact on product quality and titer. Current methods for media optimization, even when automated and using advanced mixing algorithms, are time consuming and often do not elucidate the underlying physiological reasons for the improvements in product quality and titer conferred by the optimized mixture. Independent component analysis (ICA) and RNAseq have been combined to identify co-expressed, functionally related gene sets (i.e., determine transcriptional regulatory network activation), but to date a comprehensive method and predictive model combining advanced mixing algorithms with advanced transcriptional, sequencing and gene network analyses has not been described.

[0003] Further, efficient production and proper folding of many recombinant proteins expressed in Escherichia coli are often challenging. The challenge of expressing properly folded proteins increases with the complexity of the protein, such as a full-length antibody that requires disulfide bond formation. Although SoluPro™ has an oxidative cytoplasm, which is required for the formation of disulfide bonds in recombinant proteins, missing disulfide bonds have been observed, as have scrambled disulfide bonds with improperly folded individual chains (light or heavy chain) of the antibody, and multiple product-related impurities aggravated by misfolding and inefficient disulfide bond formation.

[0004] Thus, comprehensive techniques for determining optimized cell media formulations are needed, to improve product quality and titer.

Summary

[0005] The disclosure provides methods and compositions for optimizing cell media formulations. In one embodiment of the present disclosure, a method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest is provided, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.

[0006] In another embodiment, the present disclosure provides a method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.

[0007] In still another embodiment, the present disclosure provides a method of increasing the yield of biomolecule expression in a cell culture system, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.

[0008] The present disclosure also provides an aforementioned method optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.

[0009] In still another embodiment, an aforementioned method is provided wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm. In another embodiment, the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet microarray system, a powder mixing system, and a microfluidic mixing system.

[0010] In yet another embodiment, an aforementioned method is provided wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems. In some embodiments, the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.

[0011] In another embodiment, an aforementioned method is provided wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin. In one embodiment, the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, and a temperature value.

[0012] The present disclosure also provides an aforementioned method wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.

[0013] In still another embodiment, an aforementioned method is provided wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules. In some embodiments, the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a non-commercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.

[0014] In yet another embodiment, an aforementioned method is provided wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest. In still another embodiment, an aforementioned method is provided further comprising measuring cell growth.

[0015] In some embodiments, the present disclosure provides an aforementioned method wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA. In still another embodiment, an aforementioned method is provided wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.

[0016] In another embodiment of the present disclosure, an aforementioned method is provided wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells. In one embodiment, the cells are bacterial cells. In another embodiment, the bacterial cells are E. coli cells. In still another embodiment, the E. coli cells comprise one or more or all of: (a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter; (b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter; (c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter; (d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm; (e) a reduced level of gene function of a gene that encodes a reductase; (f) at least one expression construct encoding at least one disulfide bond isomerase protein; (g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or (h) at least one polynucleotide encoding Erv1p.

[0017] The present disclosure also provides computing systems and computer-implemented methods that are optionally performed or used in combination with other, separate, computer systems or wet lab assays as described herein. In one embodiment, a computing system for identifying an improved bioform substrate is provided, comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.

[0018] In another embodiment of the present disclosure, a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is provided.

Brief Description of Figures

[0019] FIG. 1A shows one embodiment of a genetic algorithm media optimization workflow environment, according to some aspects;

[0020] FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects;

[0021] FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects;

[0022] FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects;

[0023] FIG. 2A shows exemplary cell media optimization using Darwinian fitness, according to some aspects;

[0024] FIG. 2B depicts an exemplary selection for reproduction, according to some aspects;

[0025] FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects;

[0026] FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects;

[0027] FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects;

[0028] FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects;

[0029] FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects;

[0030] FIG. 2H depicts a chart showing population evolving to a higher HiPrBind™ signal, according to some aspects;

[0031] FIG. 2I depicts respective qualities of monoclonal antibodies (MABs) before and after application of the present genetic algorithm techniques, according to some aspects;

[0032] FIG. 3 shows Fab expression in SoluPro E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of unevolved mixtures after two rounds of the GA;

[0033] FIG. 4 shows Fab expression before and after GA. A means comparison shows a significant improvement of mixtures after two rounds of evolution;

[0034] FIG. 5 shows principal component analysis (PCA) of RNAseq data and titer. Principal component analysis of RNAseq data (left) reduces ~4000-dimensional gene expression data to two dimensions. Generations are overlaid, showing a convergence on a single location, which matches the area of highest Fab expression (right);

[0035] FIG. 6A shows an iModulon activity analysis wherein RNAseq data was processed into groups of functionally co-regulated genes known as iModulons, such that RpoS iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;

[0036] FIG. 6B shows an iModulon activity analysis wherein RNAseq data was processed into groups of functionally co-regulated genes known as iModulons, such that RpoH iModulons consist of genes which are controlled by stress response sigma factors, according to some aspects;

[0037] FIG. 6C shows an iModulon activity analysis in which mixtures with a lower stress response correlate with higher HPB activity, according to some aspects; specifically, it shows an activity analysis of DksA, a ribosomal protein subunit regulator that regulates ribosomal synthesis, in which higher ribosome levels trend towards higher HPB, in some aspects;

[0038] FIG. 6D depicts iModulon activity analysis wherein starvation of leucine is associated with high HPB, according to some aspects;

[0039] FIG. 6E depicts the iModulon IscR, according to some aspects;

[0040] FIG. 6F depicts Fur is associated with iron metabolism, wherein conditions with lower iron concentration are associated with higher HPB signal, according to some aspects;

[0041] FIG. 6G depicts that Cbl relates to sulfur metabolism and that high HPB signals indicate a state of sulfur starvation, according to some aspects;

[0042] FIG. 7A depicts scaling up, purification and characterization of purified material, according to some aspects;

[0043] FIG. 7B depicts integrated OUR and CER, according to some aspects;

[0044] FIG. 7C depicts that scaling up quality increases by a high percentage (e.g., 111% with doubling of carbon), according to some aspects;

[0045] FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits);

[0046] FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects;

[0047] FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects;

[0048] FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects;

[0049] FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects;

[0050] FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects;

[0051] FIG. 7J depicts genetic algorithm-based media optimization results in Mab that are structurally similar to CHO produced Mab, according to some aspects;

[0052] FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects;

[0053] FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects;

[0054] FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects;

[0055] FIG. 9A depicts an exemplary structural characterization environment, according to some aspects;

[0056] FIG. 9B depicts an exemplary mab study of second derivatives for structural fingerprints, according to some aspects;

[0057] FIG. 9C depicts an exemplary mab study of a delta plot for clear visualization of differences, according to some aspects;

[0058] FIG. 10A depicts iModulon gene sets, according to some aspects;

[0059] FIG. 10B depicts iModulon analysis of genetic-algorithm scale-up provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects;

[0060] FIG. 11A depicts iModulon activity, according to some aspects;

[0061] FIG. 11B depicts iModulon activity, according to some aspects;

[0062] FIG. 11C depicts iModulon activity, according to some aspects;

[0063] FIG. 11D depicts iModulon activity, according to some aspects; and

[0064] FIG. 12 depicts iModulon analysis of the genetic algorithm scale up provided insights for strain and process improvement, to lower COGs, retain CQAs and boost titers, according to some aspects.

Detailed Description

[0065] The present disclosure addresses the aforementioned unmet need and provides methods that combine algorithms that create mixtures of media components with machine learning, RNAseq and ICA for rapid identification of optimized mixtures (e.g., optimized cell media formulations), thus unmasking how an organism responds to diverse stresses and unfamiliar environments.

[0066] FIG. 1A shows an exemplary workflow of one embodiment of the methods described herein. Mixing optimization algorithms such as, for example, a Genetic Algorithm (GA), Bayesian algorithms, and the like are used to search an experimental space for the optimal mixture of components and concentration of such components based on scoring formulas that can include robustness of growth, product quality and titer. The mixing algorithms are, in some embodiments, used in connection with high-throughput screening methodologies such as liquid handling robots and specialized equipment for evaluating growth rates of inoculated microorganisms or cell lines, measuring product quality and titer. These are typically executed as stand-alone experiments, where improved mixtures are identified empirically. As described herein, RNAseq is used in one embodiment to understand what genes are being transcribed at the time of sampling from a culture of microbial cells or mammalian cell lines. Independent component analysis is used, in some embodiments, to identify co-expressed functionally related gene sets (iModulons). This approach is used to identify limitations of media components, metabolic state(s) and the like. By combining these two approaches with machine learning, the number of individual members (wells, flasks, reactors) is significantly decreased.

[0067] The present disclosure also provides computing systems and computer-implemented methods that are optionally performed or used in combination with other, separate, computer systems or wet lab assays as described herein. In one embodiment, a computing system for identifying an improved bioform substrate (e.g., a cell culture media formulation) is provided, comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement (e.g., a gene) that is transcribed, and (c) at least one independently modulated arrangement set (e.g., a gene set as described herein) by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.

[0068] The present techniques further provide the ability to improve disulfide bond formation and protein folding in full-length monoclonal antibodies in Escherichia coli by improving media composition using genetic algorithm-based media optimization.

[0069] It will be appreciated by those of skill in the art that a computing system such as described herein and above can be used in combination with other systems/wet lab assays which can optionally measure, for example, (b) at least one arrangement (e.g., a gene) that is transcribed, and (c) at least one independently modulated arrangement set (e.g., a gene set) by identifying the improved substrate. In this way, some measurements are taken by a fermentation device such as a BioLector (Beckman). The BioLector measures on a per-well basis pH, O₂ concentration, and biomass. Such measurements may or may not be considered when assessing the fitness function to select winners for breeding into subsequent rounds as described in more detail below. Biomass would be considered where, for example, higher cell mass was an objective (e.g., probiotic cultures). Fermentation devices such as those described herein have a fluorescent channel, so one could add a fluorescent reporter gene, or add a fluorescent substrate, as an input for fitness. One example would be to create a genetic construct wherein the quantity of the protein of interest is directly proportional to the amount of signal from the co-expressed fluorescent protein.

[0070] Using the methods and method steps disclosed herein, an AI engine is capable of learning which mixtures are optimal per strain and per target protein, such that the development timeline is decreased and entirely in silico predictions of optimal mixtures may be achieved. In this way, a computer-implemented method for predicting a cell media formulation capable of promoting the production of a biomolecule of interest is contemplated in one embodiment.

[0071] Mixing Algorithms

[0072] In some embodiments, a genetic algorithm (GA) is contemplated. In one embodiment, the genetic algorithm is or is adapted from the algorithm described in H., Weuster-Botz, et al., Bioprocess Biosyst Eng 29, 385-390 (2006) (https://doi.org/10.1007/s00449-006-0087-7). GAs are inspired by Darwinian principles of natural selection, or “survival of the fittest.” GAs are based on evolutionary principles, encoding several sets of design variables (e.g., volumes of media components) on strings. These strings may be processed by GA operators as discussed herein (e.g., crossover, mutation, etc.) throughout one or more successive generations. In this scheme, each “gene” may be a pipetting volume of a media component, and each “chromosome” may be a well in a microplate composed of discrete volumes of each “gene.” The principle of “survival of the fittest” assures a convergence toward optimal values with iterative generations.
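By way of non-limiting illustration, the gene and chromosome encoding described above may be sketched as follows. This is a minimal sketch only: the component names, the 50 µL volume limit, the 5 µL pipetting increment, the mutation rate, and the function names random_well and next_generation are all hypothetical and are not taken from the cited reference or from any embodiment. Scores are assumed to come from a measured assay (e.g., titer) for each well.

```python
import random

# Hypothetical component names and volume limits (microliters); assumptions only.
COMPONENTS = ["glucose", "yeast_extract", "MgSO4", "trace_metals"]
MAX_VOLUME_UL = 50
STEP_UL = 5  # discrete pipetting increment


def random_well(rng):
    """One 'chromosome': a discrete pipetting volume ('gene') per component."""
    return [rng.randrange(0, MAX_VOLUME_UL + 1, STEP_UL) for _ in COMPONENTS]


def next_generation(population, scores, rng, mutation_rate=0.1):
    """Breed a new plate of wells from the better-scoring half of the current one."""
    ranked = [well for _, well in
              sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)]
    parents = ranked[: max(2, len(ranked) // 2)]
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(COMPONENTS))  # single-point crossover
        child = a[:cut] + b[cut:]
        for i in range(len(child)):
            if rng.random() < mutation_rate:  # occasional random re-draw of a gene
                child[i] = rng.randrange(0, MAX_VOLUME_UL + 1, STEP_UL)
        children.append(child)
    return children
```

In use, a plate of wells would be created with random_well, cultured and scored, and then passed to next_generation once per round of evolution; the population must contain at least two wells.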

[0073] In some aspects of the present invention, “fitness” refers to an ability of a particular growth medium to promote or enhance growth/survival of an inoculated biological organism (e.g., a cell placed into the well of a bioreactor). Thus, the GA may optimize for survival of the most welcoming, with respect, for example, to the respective abilities of cell media contained in respective wells to promote the growth/survival of bacteria, sometimes in the presence of less than ideal conditions. In this case, the one or more media most welcoming to a given bacteria, given one or more conditions, may be the media considered to have “survived,” for purposes of a single round of evolution (e.g., during a tournament selection process, as discussed herein).

[0074] In some aspects, the present techniques may use different/additional criteria to moderate fitness, such as minimizing material necessary to achieve a successful culture (survival of the most materially efficient); minimizing the number of components respective media consist of while still achieving a successful culture (survival of the simplest); minimizing economic input cost (survival of the least expensive); etc.
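As a non-limiting illustration of how such additional criteria might moderate fitness, the following minimal sketch subtracts weighted penalties for total material, number of components, and input cost from a measured growth score. The linear penalty form, the weights, and the function name composite_fitness are assumptions made for illustration only and are not specified herein.

```python
def composite_fitness(growth, volumes_ul, unit_costs,
                      w_material=0.0, w_simplicity=0.0, w_cost=0.0):
    """Growth score minus hypothetical weighted penalties.

    growth      : measured growth or titer score for the well
    volumes_ul  : pipetted volume of each component in the well
    unit_costs  : assumed cost per microliter of each component
    """
    total_volume = sum(volumes_ul)                       # material used
    n_components = sum(1 for v in volumes_ul if v > 0)   # components present
    cost = sum(v * c for v, c in zip(volumes_ul, unit_costs))
    return (growth
            - w_material * total_volume
            - w_simplicity * n_components
            - w_cost * cost)
```

With all weights left at zero the score reduces to the growth measurement alone; raising a single weight biases selection toward the most materially efficient, the simplest, or the least expensive media, respectively.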

[0075] In some aspects of the present invention, “fitness” refers to an ability of a biomolecule to survive under various conditions. Thus, fitness of media and biomolecules may be determined separately, and/or concomitantly, in aspects.

[0076] In GAs, genetic operators are mathematical functions used to simulate biological concepts, such as mutation, crossover (i.e., recombination or mixing of chromosomes) and selection. These operators are the practical hooks by which the fitness concepts discussed above are enforced in an evolutionary computation solution. For example, a particular crossover operator may be implemented to avoid recombining media materials known to be incompatible, and a mutation operator may be used to introduce diversity, and to avoid premature convergence. Similarly, as will be appreciated by those of ordinary skill in the art, a given selection operator (e.g., tournament selection, roulette wheel selection, etc.) may be chosen to determine respective fitness of individuals.
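For illustration only, a tournament selection operator and a crossover operator that avoids recombining incompatible components may be sketched as follows. The incompatibility table, the retry limit, the choice of uniform crossover, and the function names are hypothetical; the sketch also assumes that both parents already satisfy the compatibility constraint.

```python
import random

# Hypothetical pairs of component indices that should not both be present in a well.
INCOMPATIBLE = {(0, 2)}


def is_compatible(well):
    """True if no incompatible pair of components is present together."""
    return not any(well[i] > 0 and well[j] > 0 for i, j in INCOMPATIBLE)


def tournament_select(population, scores, rng, k=3):
    """Draw k wells at random and return the best-scoring one."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def safe_crossover(a, b, rng, max_tries=10):
    """Uniform crossover, retried until the child avoids incompatible pairs."""
    for _ in range(max_tries):
        child = [rng.choice(pair) for pair in zip(a, b)]
        if is_compatible(child):
            return child
    return list(a)  # fall back to a copy of the first (assumed compatible) parent
```

A larger tournament size k increases selection pressure, whereas a smaller k preserves diversity in the population.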

[0077] As appreciated by those of ordinary skill in the art, the present techniques may include other/additional algorithmic approaches. For example, the GA may include a Bayesian estimation of distribution (EDA) algorithm, or probabilistic model-building genetic algorithm (PMBGA). In one embodiment, a naive Bayes algorithm (e.g., https://onlinelibrary.wiley.com/doi/epdf/10.1002/bit.28132) is contemplated.

[0078] In still other embodiments, a differential evolution algorithm (Storn, R. M. & Price, K. V. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11 (4), 341-359. https://doi.org/10.1023/A:100820282 (1997)) is contemplated. In general, any direct, stochastic, and/or population algorithm is contemplated herein (https://nature.com/articles/s41598-020-74228-0).
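As a non-limiting sketch, one generation of the widely used DE/rand/1/bin variant of differential evolution may be written as follows for continuous component concentrations scaled to a common range. The scale factor f, the crossover rate cr, the bounds, and the function name are assumptions for illustration. In a wet-lab setting the objective would correspond to a measured (and stored) assay value for each mixture rather than a function call, and the population must contain at least four mixtures.

```python
import random


def differential_evolution_step(population, objective, rng,
                                f=0.8, cr=0.9, bounds=(0.0, 1.0)):
    """One DE/rand/1/bin generation that maximizes `objective`."""
    lo, hi = bounds
    dim = len(population[0])
    new_population = []
    for i, target in enumerate(population):
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = rng.sample(others, 3)          # three distinct other members
        forced = rng.randrange(dim)              # guarantees one mutant coordinate
        trial = []
        for d in range(dim):
            if d == forced or rng.random() < cr:
                value = a[d] + f * (b[d] - c[d])  # differential mutation
                trial.append(min(hi, max(lo, value)))
            else:
                trial.append(target[d])
        # Greedy selection: keep the trial only if it is at least as good.
        new_population.append(trial if objective(trial) >= objective(target) else target)
    return new_population
```

Because each member is replaced only by a trial that scores at least as well, the best score in the population cannot decrease from one generation to the next.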

[0079] Exemplary, non-limiting cell media formulation components include nutrients as well as buffering compounds, pH and temperature values. Components, therefore, may include one or more carbon sources, one or more nitrogen sources, one or more analytes, one or more salts, one or more buffering compounds, a pH or pH range, a temperature or temperature range, one or more metal salts, one or more trace minerals, one or more biostimulants, one or more co-factors, one or more peptides, one or more modified peptides, one or more nucleic acids, one or more nucleic acid precursors, one or more small molecules, and/or one or more vitamins.

[0080] In some embodiments, amino-nitrogen sources such as peptone, protein hydrolysates, infusions and extracts, both plant based such as soytone, potato hydrolysate, grains and grain meals, or animal based peptones, such as meat digests and casein hydrolysate, whey and gelatin are contemplated. Various yeast extracts and/or combinations of all or some of the available amino acids such as alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine and valine are also contemplated. Growth factors such as blood or serum, vitamins, NAD, and yeast extract are also contemplated in some embodiments. In still other embodiments, energy sources such as any kind of sugar, alcohol, or carbohydrate are contemplated, including, for example, glucose, glycerol, sorbitol, fructose, sucrose, maltose, sophorose, lactose, dextrose, galactose, arabinose, high fructose corn syrup, maltodextrin, mannose, ribose, trehalose, xylose and the like. Sugar alcohols are additionally contemplated and include, for example, arabitol, erythritol, glycerol, maltitol, mannitol, lactitol, sorbitol and xylitol. Inorganic phosphate and sulfur, trace metals, water, and vitamins are often included in microbial culture media, and these components are contemplated herein. Mineral salts such as phosphate, sulfate, magnesium, calcium and iron are also contemplated herein. Selective agents such as bile salts, desoxycholate, chemicals, antimicrobials, antibiotics and dyes can be included according to some embodiments of the present disclosure. Gelling agents are, in some embodiments, used in solid and semi-solid media and may include agar, gelatin, alginate, albumin and silica gel. Protective agents such as calcium carbonate, soluble starch and charcoal are, in some embodiments, used to absorb toxic metabolites or neutralize media. Certain growth factors including NAD and hemin, or surfactants such as polysorbate 80 are, in some embodiments, used to alter growth rates. Trace metals are, in still other embodiments, included and include, but are not limited to, aluminum, manganese, molybdenum, iron, copper and nickel.

[0081] FIG. 1B depicts an exemplary logical diagram of a genetic algorithm media optimization workflow, according to some aspects. FIG. 1C depicts an exemplary flow diagram of a method for genetic algorithm media optimization, according to some aspects.

[0082] Genetic Algorithms for Improved Product Quality

[0083] As discussed herein, the present techniques may include improved folding techniques. Specifically, cell media components contain different nutrients and cofactors for cell proliferation and product formation. These components can also be optimized to affect the quality of the product formed. Using a genetic algorithm approach, the present techniques are able to identify an optimized media composition that results in higher quality (more than 2-fold increase in quality metric) full-length antibody with improved folding and disulfide bond formation.

[0084] FIG. 1D depicts a first exemplary flow diagram for using a genetic algorithm to improve fragment antibody (Fab) titer, and a second exemplary flow diagram for using a genetic algorithm to improve product quality of full-length monoclonal antibodies, according to some aspects. As shown in FIG. 1D, the steps to implement this improvement are as follows:

1. Media optimization using a genetic algorithm: a genetic algorithm workflow consisting of 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals) was used. The experiments were executed with a SoluPro™ E. coli strain expressing a full-length MAb.

2. Broth harvested at 68 h was purified and measured by iCIEF, an assay that measures the quality of the product in terms of folding and disulfide bond formation, and by NR-CGE, an assay that measures intermolecular disulfide bond formation.

3. Top candidates with the highest quality scores were selected as parents for a subsequent round of mixtures, and an evolutionary approach to recombination was used to produce another 60 mixtures. This was repeated twice, yielding populations with an improved MAb quality of 40% in comparison to 15% with the control media, i.e., approximately 2.67 times (about 267% of) the control value.

4. The top 2 candidates were scaled up to the bioreactor scale, where they still exhibited a 200% increase in quality at the end of fermentation.

5. A comprehensive RNAseq was collected to compare the control and new improved media in the bioreactor.

6. A complete downstream purification and analytical characterization was performed on the product obtained from the use of the new improved media.

[0085] The foregoing steps are described herein in further detail.

[0086] It should be appreciated that the present techniques include genetic algorithm and iModulon-based optimization of media formulation for quality, titer, strain, and/or process improvement of biologics, in some aspects. For example, in some aspects, the present techniques include methods, compositions and/or systems for optimizing cell media formulations for the amount of biomolecules of interest expressed. These may be single-parameter optimizations focused on the amount of biomolecules of interest expressed. Further, these techniques may provide methods/systems/compositions for identifying one or more genes and/or independently modulated gene sets to optimize the media formulation towards the amount of biomolecules expressed at the small scale or high throughput condition.

[0087] In other aspects, the present techniques may include methods, systems and/or compositions for optimizing cell media formulations for the quality of the biomolecules of interest expressed. For example, the present techniques may include a multi-parametric optimization of the cell media formulation around biomass, amount of biomolecules of interest and quality of biomolecules of interest. In some aspects, biomolecules of interest may be produced at higher quality while still reaching a minimum threshold amount that is sufficient for downstream assays. The present techniques may include optimizing media to allow for sufficient growth of the cells that are the cell factories making or manufacturing certain biomolecules.

[0088] Further, the present techniques may include identifying one or more independently modulated gene sets that improve media formulation for bioreactor scale-up. In addition, the present techniques may include techniques for identifying individual genes and independently modulated gene sets to gain insights for improved strain engineering. Specifically, the optimized cell media formulations can be genetically encoded into the strain and therefore (1) reduce cost of goods (COGS), (2) retain quality during scale-up and (3) boost the amount of product/biomolecule produced.

[0089] Furthermore, the present techniques may include techniques for identifying gaps at the bioreactor scale process in terms of nutrient limitations over the course of time. The present techniques may provide insights to optimize and develop a process using nutritional requirements based on the individual genes or the independently modulated gene sets.

[0090] FIG. 2A depicts an exemplary Darwinian “fitness” approach where the pipetting volume represents a “gene” and where a well (e.g., in a multi-well plate as described herein) represents a chromosome. FIG. 2B depicts how chromosomes (wells) with the highest fitness scores are, in one embodiment, bred by combining genes (component volumes) from two fit parents. In some embodiments, a tournament selection process is used to determine the parent individuals for the subsequent generation. In this process, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size. In another embodiment, a mutation step/process is used. In the mutation step, simple mutations are applied to 5, 10, 15, 20, 25% or more of individuals, and crossovers are applied to the remaining individuals (e.g., the remaining 75% when 25% are mutated). In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring.
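The selection, mutation and crossover operators described above can be sketched in Python as follows. This is a minimal illustration only, not the script of FIG. 2E: the function names, the list-of-concentrations encoding, the handling of an odd leftover parent, and the placeholder fitness are assumptions made here, and the tournament size (5) and mutation fraction (25%) are simply values chosen from the ranges listed above.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Sample k individuals at random and return a copy of the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return list(population[best])


def simple_mutation(individual, bounds, rng):
    """Set one randomly chosen component to a random value within its bounds."""
    child = list(individual)
    i = rng.randrange(len(child))
    low, high = bounds[i]
    child[i] = rng.uniform(low, high)
    return child


def two_point_crossover(parent_a, parent_b, rng):
    """Switch the values between two crossover points of a pair of parents."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def next_generation(population, fitness, bounds, rng, k=5, mutation_fraction=0.25):
    """Build one new generation of the same size as the current population."""
    size = len(population)
    # Repeat tournament selection until there are as many parents as individuals.
    parents = [tournament_select(population, fitness, k, rng) for _ in range(size)]
    n_mut = round(mutation_fraction * size)
    offspring = [simple_mutation(p, bounds, rng) for p in parents[:n_mut]]
    rest = parents[n_mut:]
    for a, b in zip(rest[0::2], rest[1::2]):
        offspring.extend(two_point_crossover(a, b, rng))
    if len(rest) % 2:  # assumption: an unpaired parent is carried over unchanged
        offspring.append(rest[-1])
    return offspring


# Hypothetical example: 8 wells, 6 components, each bounded between 0 and 10.
rng = random.Random(0)
bounds = [(0.0, 10.0)] * 6
population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(8)]
fitness = [sum(ind) for ind in population]  # placeholder for a measured score
children = next_generation(population, fitness, bounds, rng)
```

In a wet-lab loop, the placeholder fitness would be replaced by the measured titer or quality score of each well before the next generation is generated.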

[0091] The mixing and/or culturing steps of the methods described herein may be done, in some embodiments, within a microfluidic mixing system, a droplet microarray system, a multi-well plate (including, for example, controlled-release multi-well plates) or other high-throughput-compatible devices as are known in the art. Multi-well plates or microplates can comprise 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 or more wells, and multiple plates may be used in the methods described herein.

[0092] As described herein, culturing (e.g., of the aforementioned multi-well plates and other devices) is performed under a variety of conditions, including stable conditions and conditions that change over time. By way of example, the culturing can be performed under conditions that promote cell growth, where the conditions include or exclude constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing. In this way, as will be appreciated by those in the art, in addition to measuring various aspects of the biomolecules produced from the cells as described herein, the present disclosure also contemplates measuring and/or maintaining cell culturing conditions that allow production of the biomolecules. Conditions such as constant or intermittent shaking (e.g., 25-1000 rpm), constant or intermittent oxygen levels (e.g., 0-40%), constant or intermittent humidity (e.g., 20-90%), pH (e.g., 3-8) and/or constant or intermittent temperature (e.g., 10-60 °C) are contemplated.

[0093] FIG. 2C depicts an exemplary genetic algorithm explanatory diagram, according to some aspects, wherein earlier generations are of a lower titer and quality, and later media have a higher titer and quality, wherein the improved titer and quality are the result of evolutionary pressures.

[0094] FIG. 2D depicts an exemplary gradient diagram, depicting the genetic algorithm search space, according to some aspects. The experimental space searched by the genetic algorithm may be thought of as a mountain range (e.g., gradient) wherein the genetic algorithm seeks to find the highest peak, without reference to a map.

[0095] FIG. 2E depicts an exemplary genetic algorithm script, according to some aspects. As discussed herein, the present techniques may be implemented using a programming language such as Python. Optimization can be slow and unpredictable using OFAT (“One Factor at a Time”) or even Design of Experiments (DoE) optimization techniques. Even high throughput wet lab media optimization using GA samples a very small fraction of possible mixtures, in some aspects. The solution space may be too large to cover even with automation and micro fermentation. The present techniques may solve such problems by combining iModulons and machine learning with GAs. Specifically, wet lab data and AI advantageously enable the present techniques to explore more of the best media conditions.
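To give a sense of scale for the statement that a GA samples only a very small fraction of possible mixtures, the following back-of-the-envelope sketch counts the combinations for 24 media components. The discretization into five concentration levels per component is purely an assumption for illustration (the disclosure does not specify levels), and the count of 180 tested mixtures assumes an initial round of 60 mixtures followed by two further rounds of 60.

```python
n_components = 24          # number of media components in the exemplary workflow
levels_per_component = 5   # ASSUMPTION: illustrative discretization only
mixtures_tested = 60 * 3   # ASSUMPTION: initial round plus two evolved rounds of 60

# Every component can independently take any of its levels.
possible_mixtures = levels_per_component ** n_components
fraction_sampled = mixtures_tested / possible_mixtures

print(f"possible mixtures: {possible_mixtures:.2e}")
print(f"fraction sampled:  {fraction_sampled:.2e}")
```

Under these assumptions the wet-lab experiments cover far less than one part in a trillion of the discretized space, which is the motivation for guiding the search with additional data and models.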

[0096] For example, an experimental design may include a particular strain (e.g., SoluPro™ E. coli) expressing a MAb. A GA script may produce a number (e.g., 60) of unique conditions (e.g., 30 per plate). A number of controls may be included (e.g., 4 controls (2 per plate)). Process conditions may include dynamic feeding, a pH feed trigger at the pH setpoint and a temperature shift after 12 hrs EFT.

[0097] The GA script of FIG. 2E depicts a tournament selection process to determine the parent individual for the subsequent generation. In this exemplary process, five individuals are randomly sampled from the population and the individual with the highest fitness is selected as a parent. This process is repeated until the number of selected parents is equal to the desired population size. In the mutation step, simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%. In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents. The values in between the points are switched to form the offspring. The script of FIG. 2E may output a config file and a liquid handling robot volume output file. The script may be parameterized and run using the following command:

python optimize_media_ga.py Define-GA_Genes_edit.xlsx --output_file=ga-media-config-gen0.csv --population_size=64 --random_seed=400
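As a rough illustration of the two outputs mentioned above (a configuration file and a liquid-handling volume file), the sketch below generates a seeded random starting population, converts each target concentration into a stock volume, and serializes the population as CSV text. It is not the script of FIG. 2E: the function names, the component names, the bounds, the stock concentrations and the 1000 µL well volume are all hypothetical, and only the population size of 64 and the random seed of 400 are taken from the command shown above.

```python
import csv
import io
import random


def random_population(bounds, population_size, random_seed):
    """Draw each component uniformly within its bounds, reproducibly from a seed."""
    rng = random.Random(random_seed)
    return [
        {name: round(rng.uniform(low, high), 3) for name, (low, high) in bounds.items()}
        for _ in range(population_size)
    ]


def stock_volumes_ul(individual, stock_conc, well_volume_ul):
    """C1*V1 = C2*V2: stock volume needed to reach each target concentration."""
    return {
        name: conc * well_volume_ul / stock_conc[name]
        for name, conc in individual.items()
    }


def to_csv(rows):
    """Serialize one row per well, with a leading well index column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["well"] + list(rows[0]))
    writer.writeheader()
    for index, row in enumerate(rows):
        writer.writerow({"well": index, **row})
    return buffer.getvalue()


# Hypothetical components, bounds (g/L) and stock concentrations (g/L).
bounds = {"glucose_g_per_l": (0.0, 20.0), "mgso4_g_per_l": (0.0, 2.0)}
stock_conc = {"glucose_g_per_l": 200.0, "mgso4_g_per_l": 20.0}

gen0 = random_population(bounds, population_size=64, random_seed=400)
volumes = [stock_volumes_ul(ind, stock_conc, well_volume_ul=1000.0) for ind in gen0]
config_csv = to_csv(gen0)
```

Fixing the seed makes the generation-zero design reproducible, so the same configuration can be regenerated later when the measured fitness values are matched back to wells.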

[0098] FIG. 2F depicts an exemplary logical diagram for adding genes to one or more culture plates using a liquid-handling robot, according to some aspects, for example as described with respect to FIG. 2E.

[0099] FIG. 2G depicts trends of mixtures initialized to random initial conditions, according to some aspects. The initial random mixtures show diverse growth trends.

[0100] FIG. 2H depicts a chart showing a population evolving to a higher HiPrBind™ signal, according to some aspects. FIG. 2H depicts the MAb quality fitness score in SoluPro™ E. coli. Random mixtures (red color) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (blue and green) are shown versus the control condition.

[0101] FIG. 2I depicts respective qualities of monoclonal antibodies (MAbs) before and after application of the present genetic algorithm techniques, according to some aspects. A comparison of means shows a significant improvement of mixtures after two rounds of evolution.

Measuring Biomolecules

[0102] As described herein, a biomolecule of interest may be expressed and produced from cells according to various embodiments. Exemplary biomolecules are described further herein. In some embodiments, the methods provided herein include one or more steps that include measuring or detecting or analyzing the biomolecules that are produced.

[0103] In one embodiment, the biomolecule of interest includes a fluorescent tag or fusion. Fluorescence activated cell sorting (FACS) and flow cytometry, as are known in the art, are therefore contemplated herein. As described herein, a liquid-handling robot that is capable of inoculating and culturing cells and equipped to take a variety of measurements is contemplated for use in various embodiments of the present disclosure. As one example, a BioLector® microbioreactor (Beckman) can be used. Other microfermentation systems are contemplated (Ambr 15 and 250 systems by Sartorius (https://sartorius-stedim-tap.com/a/micro-bioreactor-gp.htm); DASbox and DASGIP systems from Eppendorf (https://online-shop.eppendorf.us/US-en/Bioprocess-44559/Bioprocess-Systems-60767/DASbox-Mini-Bioreactor-System-PF-133566.html); and Feed Plate technologies from Kuhner Shaker company (https://feedingtechnology.com/portfolio-item/feedplates/?lang=en)).

[0104] Numerous additional assays are also contemplated. Binding assays, for example assays that measure protein-protein interactions, including antibody-antigen interactions and including measuring binding affinity, are well known in the art. By way of example, Surface plasmon resonance (SPR), Dual polarisation interferometry (DPI), Static light scattering (SLS), Dynamic light scattering (DLS), Flow-induced dispersion analysis (FIDA), Fluorescence polarization/anisotropy, Fluorescence resonance energy transfer (FRET), Bio-layer interferometry (BLI), Isothermal titration calorimetry (ITC), Microscale thermophoresis (MST), and Single colour reflectometry (SCORE) are contemplated. Additionally, Bimolecular fluorescence complementation (BiFC), affinity electrophoresis, label transfer, phage display, Tandem affinity purification (TAP), cross-linking, Quantitative immunoprecipitation combined with knock-down (QUICK) and Proximity ligation assay (PLA) are other well-known assays that provide protein-protein interaction information.

[0105] In some embodiments, the binding affinities of the antibodies described herein are measured by array surface plasmon resonance (SPR), according to standard techniques (Abdiche, et al. (2016) MAbs 8:264-277). Briefly, antibodies are immobilized on a HC 30M chip at four different densities / antibody concentrations. Varying concentrations (0-500 nM) of antibody target are then bound to the captured antibodies. Kinetic analysis is performed using Carterra software to extract association and dissociation rate constants (k_a and k_d, respectively) for each antibody. Apparent affinity constants (K_D) are calculated from the ratio of k_d/k_a. In some embodiments, the Carterra LSA Platform is used to determine kinetics and affinity. In other embodiments, binding affinity can be measured, e.g., by surface plasmon resonance (e.g., BIAcore™) using, for example, the IBIS MX96 SPR system from IBIS Technologies or the Carterra LSA SPR platform, or by Bio-Layer Interferometry, for example using the Octet™ system from ForteBio. In some embodiments, a biosensor instrument such as Octet RED384, ProteOn XPR36, IBIS MX96 and Biacore T100 is used (Yang, D., et al., J. Vis. Exp., 2017, 122:55659).

[0106] K_D is the equilibrium dissociation constant, the ratio k_off/k_on, between the antibody and its antigen. K_D and affinity are inversely related. The K_D value relates to the concentration of antibody, so the lower the K_D value (lower concentration), the higher the affinity of the antibody. Antibody K_D, including reference antibody and variant antibody K_D, according to various embodiments of the present disclosure can be, for example, in the micromolar range (10⁻⁴ to 10⁻⁶), the nanomolar range (10⁻⁷ to 10⁻⁹), the picomolar range (10⁻¹⁰ to 10⁻¹²) or the femtomolar range (10⁻¹³ to 10⁻¹⁵). In some embodiments, antibody affinity of a variant antibody is improved, relative to a reference antibody, by approximately 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50% or more. The improvement may also be expressed relative to a fold change (e.g., 2x, 4x, 6x, or 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-fold or more improvement in binding activity, etc.) and/or an order of magnitude (e.g., 10⁷, 10⁸, 10⁹, etc.).
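The relationship between the rate constants, K_D and fold improvement can be illustrated with a short calculation. The sketch below is illustrative only: the function names and the example rate constants are hypothetical, and the assumed units (association rate in 1/(M·s), dissociation rate in 1/s, K_D in molar) follow common convention rather than anything stated above.

```python
import math


def dissociation_constant(k_on, k_off):
    """K_D = k_off / k_on (assumed units: k_on in 1/(M*s), k_off in 1/s, K_D in M)."""
    return k_off / k_on


def affinity_range(kd_molar):
    """Name the concentration range a K_D value falls in, using standard SI prefixes."""
    if kd_molar >= 1e-6:
        return "micromolar"
    if kd_molar >= 1e-9:
        return "nanomolar"
    if kd_molar >= 1e-12:
        return "picomolar"
    return "femtomolar"


def fold_improvement(kd_reference, kd_variant):
    """A lower K_D means higher affinity, so improvement is reference / variant."""
    return kd_reference / kd_variant


# Hypothetical rate constants for a reference antibody and an improved variant.
kd_ref = dissociation_constant(k_on=2e5, k_off=1e-3)      # about 5 nM
kd_var = dissociation_constant(k_on=2e5, k_off=2.5e-4)    # about 1.25 nM

print(affinity_range(kd_ref), affinity_range(kd_var))
print(f"fold improvement: {fold_improvement(kd_ref, kd_var):.1f}")
```

In this example the variant dissociates four times more slowly at an unchanged association rate, which corresponds to a roughly 4-fold improvement in affinity.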

[0107] In still other embodiments, the amount of “biologically active” biomolecules produced are also measured or detected or analyzed. Biologically active includes, but is not limited to, a properly folded biomolecule such as a therapeutic protein or enzyme or antibody or fragment of any of the above, an enzymatically active biomolecule, an antibody or fragment thereof that is capable of binding to an antigen, and a protein or polypeptide that is capable of binding to a ligand.

[0108] In one embodiment, an activity-specific cell-enrichment (ACE) assay is used with the methods described herein. The activity-specific cell-enrichment (ACE) assay identifies host cells that express active gene product of interest (e.g., biomolecules, as used herein) rather than inactive material, as described in WO 2021/146626, incorporated herein in relevant part. Active gene product can be distinguished from inactive material by the ability of active gene product to specifically bind a binding partner molecule, or by the ability of gene product to participate in a chemical or enzymatic reaction, as examples. The presence of properly formed disulfide bonds in a polypeptide gene product is an indication that it is correctly folded and presumptively active. In the cell-enrichment methods, active gene product of interest is detected by utilizing an appropriate labeling complex that specifically binds to active gene product of interest, such as a labeled antigen if the gene product of interest is an antibody or Fab; or a labeled ligand if the gene product of interest is a receptor or a receptor fragment, where the ligand specifically binds to an active conformation of the receptor; or a labeled substrate or a labeled substrate analog if the gene product of interest is an enzyme, as examples. For any gene product of interest, if there is an available antibody or antibody fragment that specifically binds to the active gene product and not to inactive gene product, that antibody or antibody fragment can be used to label the active gene product of interest when attached to a detectable moiety.

[0109] Exemplary ACE assay protocol (based on assays in WO2021/146626)

1. Fix sample cells with a formaldehyde-based solution.

2. Prepare a permeabilization buffer and treat cells.

3. Add biotin to the 1x PE + 1 mM EDTA to a final concentration of 0.1 mg/mL biotin.

4. Add biotin (stored at -80 °C or 4 °C) at a 100x dilution (e.g., 5000 µL or 5 mL of PE buffer would require 50 µL biotin).

5. Combine the fluorescently labeled primary probe (and secondary probe, if dual probe) in 1x PE + 1 mM EDTA in a 15 mL centrifuge tube (or 50 mL, if staining reagent volume exceeds 15 mL).

6. Incubate and rotate for at least 1 hour at 4 °C with foil wrapped around the tube.

7. After 1 hr, add biotin to the stain solution to a final concentration of 0.1 mg/mL biotin.

8. Add biotin (stored at -80 °C or 4 °C) at a 100x dilution (e.g., 5000 µL or 5 mL of PE buffer would require 50 µL biotin). Incubate again with rotation for at least 30 min with foil wrapped around the tube.

9. Spin samples at 3300g at 4 °C for 5 minutes.

10. Aspirate the supernatant with the vacuum pump and the matrix tube attachment. Avoid touching the attachment to the sides of the tube.

11. Slowly cascade 500 µL of 1X PBS + 1 mM EDTA onto the side of the sample tubes without disturbing the pellet.

12. Add 250 µL of E2 Fixation Buffer to each tube.

13. After the 18 hr incubation, remove samples from the rotator and spin in a centrifuge at 3300g at 4 °C for 3 minutes.

14. Carry out FACS on the stained cell samples, binning by fluorescence signal.
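The biotin dilution arithmetic used in the protocol above can be checked with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the added stock volume is treated as negligible relative to the buffer volume (as in the protocol's own example), and the stock concentration is inferred from the stated final concentration and dilution factor rather than given in the protocol.

```python
import math


def stock_volume_ul(buffer_volume_ul, dilution_factor):
    """Volume of stock to add for an (approximate) N-fold dilution into the buffer."""
    return buffer_volume_ul / dilution_factor


def implied_stock_conc_mg_per_ml(final_conc_mg_per_ml, dilution_factor):
    """Stock concentration implied by the final concentration and dilution factor."""
    return final_conc_mg_per_ml * dilution_factor


# 5 mL (5000 uL) of PE buffer at a 100x dilution, as in the protocol's example.
biotin_ul = stock_volume_ul(buffer_volume_ul=5000.0, dilution_factor=100.0)
# ASSUMPTION: the stock concentration is inferred, not stated in the protocol.
stock_conc = implied_stock_conc_mg_per_ml(final_conc_mg_per_ml=0.1, dilution_factor=100.0)

print(f"biotin stock to add: {biotin_ul:.0f} uL")
print(f"implied stock concentration: {stock_conc:.0f} mg/mL")
```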

[0110] In still another embodiment, a HiPrBind assay is used with the present methods. The HiPrBind assay provides an efficient method for multiple interrogations of an active gene product, such as by providing at least two distinct interrogations of a characteristic property of the active gene product or simultaneously interrogating at least two characteristic properties of that active gene product. HiPrBind assays are described in WO 2021/163349, incorporated herein in relevant part. The assay is an advance on the principle underlying the yeast two hybrid assay in that a multi-component detection mechanism is brought into proximity, and thereby brought into an environment where the detection mechanism can be active in producing a signal capable of detection. One component of the multi-component (e.g., two-component) detection system is stably associated with a first analyte-associating moiety (i.e., active gene product-associating moiety) and a second component of the detection system is stably associated with a distinct second analyte-associating moiety. A detectable signal is generated when the two components of the detection system are brought into proximity by the analyte-associating moieties binding to the analyte. Because each of the analyte-associating moieties is specific for an active gene product as analyte, a signal is only generated when a characteristic property of an active gene product is detected using two distinct mechanisms, or when two distinct characteristic properties of an active gene product are simultaneously detected. The HiPrBind assay is versatile in detecting a variety of characteristic properties, but a simple example involves a gene product that is active in homodimeric form wherein each monomer requires disulfide bonds to properly fold. One active gene product-associating moiety can be a binding agent that specifically binds to the properly folded and therefore active monomer, and a second active gene product-associating moiety can be a distinct second binding agent that specifically binds to the dimeric form of the gene product. Thus, the HiPrBind assay in this version simultaneously detects a gene product that is properly folded and in dimeric form.

[0111] Exemplary HiPrBind assay (based on assays in WO2021/163349)

1. Culture sample cells in proper induction conditions to facilitate target protein and accessory protein expression using arabinose and/or propionate media. Grow sample cells in a 96-well plate.

2. Keep sample plates on ice to thaw. While those are thawing, gather and label the plates needed and prep assay solutions.

3. Dilute standard into Dilution Buffer 1 solution (0.1x Perkin Elmer Buffer, 1 mM EDTA, 1 x PBS).

4. Prepare Assay solution I (ASI) and Assay solution II (ASII) in dark amber colored 50 mL conicals.

5. Predispense Dilution Buffer 2 into 384-well V-bottom Greiner Bio-One dilution plates, resuspend cell pellets.

6. Predispense ASI into 384-well Proxiplates. Visually inspect that the silicone nozzles are fitted properly and that nothing looks loose or off kilter.

7. Once each plate is finished, seal it with a Perkin Elmer plate sealer.

8. Spin down plates at 500g for 1 min.

9. Incubate the plates overnight at 4 °C.

10. The next day, take the plates out of 4 °C storage. Allow to equilibrate to room temp for at least 1 hour.

11. Feed plates into plate feeder on the Enspire.

12. Scan on the Enspire using the "Alpha, Fl - DNA_Ex480 Em 520-Alpha 384-SW".

13. Record values for further analysis of alpha max slope.

Measuring Gene Transcription and Identifying Gene Sets

[0112] As described herein, in addition to measuring biomolecules, the genes (i.e., other than the gene encoding the biomolecule) that are induced inside cells during the culturing steps described herein are, in some embodiments, also or alternatively measured/determined.

[0113] RNAseq or RNA-Seq is a sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome (Wang Z., et al., Nature Reviews. Genetics., 10(1): 57-63 (2009)). The present disclosure provides, in some embodiments, using RNA sequencing and other well-known high throughput sequencing techniques and assays to identify the gene expression or transcript number or sequence of individual genes, gene sets, or an entire genome/transcriptome. In yet another embodiment, metabolomic data can be used to identify the quantity of key metabolites and metabolic states. Metabolomics is a discipline widely used in systems biology and has been applied to explain observed phenotypes (Lopez-Malo M, et al., PLoS ONE. 2013;8:e60135).

[0114] Independent component analysis (ICA) is a signal deconvolution algorithm and can be used according to some embodiments of the disclosure. Sastry et al. have recently described that the E. coli transcriptome mostly consists of independently regulated modules (Sastry, A.V., et al., Nature Comm., 2019, 10:5536). Additionally, Tan et al. recently reported that ICA of E. coli's transcriptome revealed the cellular processes that respond to heterologous gene expression (Tan J., et al., Metabol. Eng., 2020, 61, 360-368). Additionally, improvements or modifications to the independent component analysis can be used, such as OptICA, for finding the optimal dimensionality that controls for both over- and under-decomposition (McConn, J.L., et al., BMC Bioinformatics 22, 584 (2021), https://doi.org/10.1186/s12859-021-04497-7). Other dimensionality reduction techniques may be employed such as PCA, ZIFA, GrandPrix, t-SNE, UMAP, DCA, scvis, VAE and SIMLR (Front. Genet., 23 March 2021, Sec. Computational Genomics, https://doi.org/10.3389/fgene.2021.646936). While the Example provided below is based on RNAseq (transcriptomics), dimensionality reduction can, in another embodiment, be applied to metabolomics data that may be used to optimize media towards a particular desired metabolic state (Proteome Res. 2012, 11, 8, 4120-4131, June 20, 2012, https://doi.org/10.1021/pr300231n) (Lin J, et al., RSC Adv. 2019 Aug 30;9(47):27369-27377. doi: 10.1039/c9ra05128g. PMID: 35529190; PMCID: PMC9070647).
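For orientation, ICA as commonly applied to a compendium of expression profiles can be written as a matrix factorization. The symbols below (X, M, A and the dimensions g, s, c) are notation assumed here for illustration and are not taken from the present disclosure.

```latex
\mathbf{X} \approx \mathbf{M}\,\mathbf{A}, \qquad
\mathbf{X} \in \mathbb{R}^{g \times s},\quad
\mathbf{M} \in \mathbb{R}^{g \times c},\quad
\mathbf{A} \in \mathbb{R}^{c \times s}
```

Here X holds the expression of g genes across s samples (e.g., media conditions), each of the c columns of M holds the gene weights of one statistically independent component, and each row of A holds the activity of that component across the samples. Under this reading, the genes carrying large weights in a column of M form one independently modulated gene set, and the corresponding row of A indicates in which conditions that gene set is active.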

Host Cells

[0115] “Host cells” herein are, in some embodiments, cells used in bioprocessing to manufacture heterologous protein products. Such host cells can be, for example, eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.

[0116] Prokaryotic host cells are provided that comprise expression constructs designed for the expression of coding regions. Prokaryotic host cells can include archaea (such as Haloferax volcanii, Sulfolobus solfataricus), Gram-positive bacteria (such as Bacillus subtilis, Bacillus licheniformis, Brevibacillus choshinensis, Lactobacillus brevis, Lactobacillus buchneri, Lactococcus lactis, and Streptomyces lividans), or Gram-negative bacteria, i.e., proteobacteria, including Alphaproteobacteria (Agrobacterium tumefaciens, Caulobacter crescentus, Rhodobacter sphaeroides, and Sinorhizobium meliloti), Betaproteobacteria (Alcaligenes eutrophus), and Gammaproteobacteria (Acinetobacter calcoaceticus, Azotobacter vinelandii, Escherichia coli, Pseudomonas aeruginosa, and Pseudomonas putida). Host cells include Gammaproteobacteria of the family Enterobacteriaceae, such as Enterobacter, Erwinia, Escherichia (including E. coli), Klebsiella, Proteus, Salmonella (including Salmonella typhimurium), Serratia (including Serratia marcescens), and Shigella.

[0117] As described in WO/2017/106583, incorporated by reference herein in its entirety, producing gene products such as therapeutic proteins at commercial scale and in soluble form is addressed by providing suitable host cells capable of growth at high cell density in fermentation culture, and which can produce soluble gene products in the oxidizing host cell cytoplasm through highly controlled inducible gene expression. Host cells of the present disclosure with these qualities are produced by combining some or all of the following characteristics. (1 ) The host cells are genetically modified to have an oxidizing cytoplasm by increasing the expression or function of oxidizing polypeptides in the cytoplasm, and/or by decreasing the expression or function of reducing polypeptides in the cytoplasm. Specific examples of such genetic alterations are provided herein and in WO 2017/106583. Optionally, host cells can also be genetically modified to express accessory proteins (which can be chaperones) and/or cofactors that assist in the production of the desired gene product(s), and/or to glycosylate polypeptide gene products. (2) The host cells comprise one or more expression constructs designed for the expression of one or more active gene products of interest. At least one expression construct can comprise an inducible promoter and a polynucleotide encoding a gene product to be expressed in active form from the inducible promoter. (3) The host cells contain additional genetic modifications designed to improve certain aspects of gene product expression from the expression construct(s). 
The host cells can (A) have an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter, and as another example, wherein the gene encoding the transporter protein is araE, araF, araG, araH, rhaT, xylF, xylG, or xylH, or particularly the transporter protein is araE, or wherein the alteration of gene function more particularly is expression of unaltered araE from a constitutive promoter; and/or (B) have a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter, and as further examples, wherein the gene encoding a protein that metabolizes an inducer of at least one said inducible promoter is selected from the group consisting of araA, araB, araD, prpB, prpD, rhaA, rhaB, rhaD, xylA, and xylB; and/or (C) have a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter, which gene can be scpA/sbm, argK/ygfD, scpB/ygfG, scpC/ygfH, rmlA, rmlB, rmlC, or rmlD.

[0118] Host Cells with Oxidizing Cytoplasm. The expression systems of the present disclosure are designed to express active gene products. Examples of host cells are provided that allow for the efficient and cost-effective expression of active gene products, including components of multimeric products. The host cells can be microbial cells such as Gram-negative bacteria, e.g., E. coli. Exemplary E. coli host cells having oxidizing cytoplasm include the E. coli B strains SHuffle® Express (NEB Catalog No. C3028H) and SHuffle® T7 Express (NEB Catalog No. C3029H) and the E. coli K strain SHuffle® T7 (NEB Catalog No. C3026H). The E. coli B strains with oxidizing cytoplasm are able to grow to much higher cell densities than the most closely corresponding E. coli K strain (WO/2017/106583).

[0119] Alterations to host cell gene functions. Certain alterations can be made to the gene functions of host cells comprising inducible expression constructs, to promote efficient and homogeneous induction of the host cell population by an inducer. Preferably, the combination of expression constructs, host cell genotype, and induction conditions results in at least 75% (more preferably at least 85%, and most preferably, at least 95%) of the cells in the culture expressing active gene product from each induced promoter, as measured by the method of Khlebnikov et al. described in Example 9 of WO/2017/106583. For host cells other than E. coli, these alterations can involve the function of genes that are structurally similar to an E. coli gene, or genes that carry out a function within the host cell similar to that of the E. coli gene. 
Alterations to host cell gene functions include eliminating or reducing gene function by deleting the coding region of the gene in its entirety, or by deleting a large enough portion of the gene, inserting sequence into the gene, or otherwise altering the gene sequence so that a reduced level of functional gene product is made from that gene, as is described herein with greater particularity for the ptsP gene or coding region. Alterations to host cell gene functions also include increasing gene function by, for example, altering the native promoter to create a stronger promoter that directs a higher level of transcription of the gene, or introducing a missense mutation into the protein-coding sequence that results in a more highly active gene product. Alterations to host cell gene functions include altering gene function in any way, including for example, altering a native inducible promoter to create a promoter that is constitutively activated. In addition to alterations in gene functions for the transport and metabolism of inducers, as described herein with relation to inducible promoters, and/or an altered expression of chaperone proteins, alterations of the reduction-oxidation environment of the host cell are also contemplated.

[0120] Host cell reduction-oxidation environment. In bacterial cells such as E. coli, proteins that need disulfide bonds to be active are typically exported into the periplasm where disulfide bond formation and isomerization is catalyzed by the Dsb system, comprising DsbABCD and DsbG. Increased expression of the cysteine oxidase DsbA, the disulfide isomerase DsbC, or combinations of the Dsb proteins, which are all normally transported into the periplasm, has been utilized in the expression of heterologous proteins that require disulfide bonds (Makino et al., Microb Cell Fact 10:32 (2011)). It is also possible to express cytoplasmic forms of these Dsb proteins, such as a cytoplasmic version of DsbA (cDsbA) and/or of DsbC (cDsbC), that lacks a signal peptide and therefore is not transported into the periplasm. Cytoplasmic Dsb proteins such as cDsbA and/or cDsbC are useful for making the cytoplasm of the host cell more oxidizing and thus more conducive to the formation of disulfide bonds in proteins, including heterologous proteins, produced in the cytoplasm. The host cell cytoplasm can also be made less reducing and thus more oxidizing by altering the thioredoxin and the glutaredoxin/glutathione enzyme systems directly: mutant strains defective in glutathione reductase (gor) or glutathione synthetase (gshB), together with a defective thioredoxin reductase (trxB), render the cytoplasm oxidizing. These strains are unable to reduce ribonucleotides and therefore cannot grow in the absence of exogenous reductant, such as dithiothreitol (DTT). Suppressor mutations (such as ahpC* and ahpCΔ, Lobstein et al., Microb Cell Fact 11:56 (2012)) in the gene ahpC, which encodes the peroxiredoxin AhpC, convert it to a disulfide reductase that generates reduced glutathione, allowing the channeling of electrons onto the enzyme ribonucleotide reductase and enabling the cells defective in gor and trxB, or defective in gshB and trxB, to grow in the absence of DTT. 
A different class of mutated forms of AhpC can allow strains, defective in the activity of gamma-glutamylcysteine synthetase (gshA) and defective in trxB, to grow in the absence of DTT; these include AhpC V164G, AhpC S71F, AhpC E173/S71F, AhpC E171Ter, and AhpC dup162-169 (Faulkner et al., Proc Natl Acad Sci USA 105(18):6735-6740 (2008), Epub 2008 May 2). In such strains with oxidizing cytoplasm, exposed protein cysteines become readily oxidized in a process that is catalyzed by thioredoxins, in a reversal of their physiological function, resulting in the formation of disulfide bonds. Other proteins that may be helpful to reduce the oxidative stress effects in host cells of an oxidizing cytoplasm are HPI (hydroperoxidase I) catalase-peroxidase encoded by E. coli katG and HPII (hydroperoxidase II) catalase-peroxidase encoded by E. coli katE, which disproportionate peroxide into water and O₂ (Farr and Kogoma, Microbiol Rev. 55(4):561-585 (1991)). Increasing levels of KatG and/or KatE protein in host cells can also be induced by coexpression or through elevated levels of constitutive expression.

[0121] The disclosure also contemplates the expression of the sulfhydryl oxidase Erv1p, derived from the inner membrane space of yeast mitochondria, in the host cell cytoplasm, which has been shown to increase the production of a variety of complex, disulfide-bonded proteins of eukaryotic origin in the cytoplasm of E. coli, even in the absence of mutations in gor or trxB (Nguyen et al., Microb Cell Fact 10:1 (2011)).

[0122] Host cells comprising expression constructs preferably also express cDsbA and/or cDsbC and/or Erv1p, are deficient in trxB gene function, and are also deficient in the gene function of either gor, gshB, or gshA. Optionally, the host cells have increased levels of katG and/or katE gene function, and express an appropriate mutant form of AhpC so that the host cells can be grown in the absence of dithiothreitol (i.e., DTT).

[0123] Cellular transport of cofactors. When using the expression systems of the disclosure to produce enzymes that require cofactors for function, it is helpful to use a host cell either capable of synthesizing the cofactor from available precursors, or capable of taking it up from the environment. Common cofactors include ATP, coenzyme A, flavin adenine dinucleotide (FAD), NAD+/NADH, and heme. Polynucleotides encoding cofactor transport polypeptides and/or cofactor synthesizing polypeptides can be introduced into host cells, and such polypeptides can be constitutively expressed, or inducibly co-expressed with the active gene products to be produced by methods of the disclosure.

[0124] Proteases. Host cells can have alterations in their ability to degrade expressed protein products because of the lack of or lowering of the activity of one or more proteases. Exemplary proteases include, but are not limited to, Clp, ClpP, OmpT, Lon, FtsH, ClpX, ClpY, ClpA, ClpQ, ClpAP, ClpXP, ClpAXP, ClpYQ, ClpY, and the proteases encoded by yaeL, sppA, tldD, sprT, yhbU, ptrA, frvX, hyaD, hybD, hycH, envC, ddpX, degP, degQ, degS, hslV, hslU, pepB, pepP, sohB, yggG, pepE, pepN, pepQ, abgA, pepT, iadA, pepA, pepD, ptrB, ycaL, ycbZ, yegQ, ygeY, ypdF, hycI, sgcX, and htpX.

[0125] Glycosylation of polypeptide gene products. Host cells can have alterations in their ability to glycosylate polypeptides. For example, eukaryotic host cells can have eliminated or reduced gene function of the glycosyltransferase and/or oligo-saccharyltransferase genes, impairing the normal eukaryotic glycosylation of polypeptides to form glycoproteins. Prokaryotic host cells such as E. coli, which do not normally glycosylate polypeptides, can be altered to express a set of eukaryotic and prokaryotic genes that provide a glycosylation function (DeLisa et al., WO 2009/089154A2).

[0126] Available host cell strains with altered gene functions. To create preferred strains of host cells to be used in the expression systems and methods of the disclosure, it is useful to start with a strain that already comprises desired genetic alterations (See Table A of WO2017/106583, reproduced below).

Exemplary Host Cell Strains

Expression Constructs

[0127] Expression constructs are polynucleotides designed for the expression of one or more recombinant gene products of interest, and thus are not naturally occurring molecules. Any expression construct known in the art is contemplated for use in the cells and methods of the disclosure, including expression constructs that can be integrated into a host cell chromosome or maintained within the host cell as extra-chromosomal, independently replicating polynucleotide molecules, i.e., episomes having origins of replication independent of the host cell chromosome, such as plasmids or artificial chromosomes. Expression constructs according to the disclosure also can have one or more selectable markers to enable selection of those cells harboring the expression construct. Exemplary selectable markers confer resistance to antibiotics lethal to the host cell lacking that selectable marker or encode enzymes required to produce essential nutrients. Any selectable marker known in the art is contemplated for use in the expression constructs of the disclosure. Expression constructs may also contain an inducible promoter to provide the ability to induce the expression of a coding region operably linked to that inducible promoter. Exemplary inducible promoters contemplated by the disclosure include the arabinose promoter (ParaBAD), ParaC, ParaE, the propionate promoter (PprpBCDE), the rhamnose promoter (PrhaSR), the xylose promoter (PxylA), the lactose promoter, and the alkaline phosphatase promoter. Additional information on contemplated inducible promoters, including the sequences thereof, is provided in WO 2016/205570, incorporated herein by reference in relevant part. In addition to inducible promoters, the disclosure comprehends expression constructs comprising constitutive promoters. To ensure that RNA transcribed from the expression construct is efficiently translated, the construct may also include a ribosome binding site (RBS). 
In prokaryotes in general (archaea and bacteria), the RBS consensus sequence is GGAGG or GGAGGU, and in bacteria such as E. coli, the RBS consensus sequence is further defined as AGGAGG or AGGAGGU. To facilitate incorporation of a coding region or gene of interest, the expression construct may include a multiple cloning site in which a variety of restriction endonuclease cleavage sites are clustered to provide flexibility in incorporating exogenous polynucleotides, as is known in the art. Some expression constructs of the disclosure further include a coding region for a signal peptide or leader peptide, wherein the coding region is oriented to result in expression of a fusion protein comprising the signal peptide and the active gene product of interest.

[0128] As mentioned above, inducible promoters are contemplated for use with the expression constructs to be introduced into the host cells according to the disclosure in order to achieve elevated expression of desired active gene products. Exemplary promoters are described herein and are also described in WO/2016/205570, incorporated herein by reference in relevant part. As described herein, the cells comprising one or more expression constructs may optionally include one or more inducible promoters to express a gene product of interest.

[0129] Chaperones are accessory proteins that assist the non-covalent folding or unfolding, and/or the assembly or disassembly, of other gene products, but do not occur in the resulting monomeric or multimeric gene product structures when the structures are performing their normal biological functions (having completed the processes of folding and/or assembly). Chaperones can be expressed from an inducible promoter or a constitutive promoter within an expression construct, or can be expressed from the host cell chromosome. Exemplary chaperones present in E. coli host cells are the folding factors DnaK/DnaJ/GrpE, DsbC/DsbG, GroEL/GroES, IbpA/IbpB, Skp, Tig (trigger factor), and FkpA, which have been used to prevent protein aggregation of cytoplasmic or periplasmic proteins. DnaK/DnaJ/GrpE, GroEL/GroES, and ClpB can function synergistically in assisting protein folding, and expression of these chaperones in various combinations has been shown to facilitate expression of properly folded gene product. When expressing eukaryotic proteins in prokaryotic host cells, a eukaryotic chaperone protein, such as protein disulfide isomerase (PDI) from the same or a related eukaryotic species, can be co-expressed, e.g., inducibly co-expressed, with the gene product of interest.

[0130] A chaperone that can be expressed in host cells is a protein disulfide isomerase from Humicola insolens, a soil hyphomycete (soft-rot fungus). An amino acid sequence of Humicola insolens PDI is shown as SEQ ID NO: 1 of WO2017/106583; it lacks the signal peptide of the native protein so that it remains in the host cell cytoplasm. The nucleotide sequence encoding PDI was optimized for expression in E. coli; the expression construct for PDI is shown as SEQ ID NO: 2 of WO2017/106583. SEQ ID NO: 2 of WO2017/106583 contains a GCTAGC NheI restriction site at its 5' end, an AGGAGG ribosome binding site at nucleotides 7 through 12, the PDI coding sequence at nucleotides 21 through 1478, and a GTCGAC SalI restriction site at its 3' end. The nucleotide sequence of SEQ ID NO: 2 of WO2017/106583 was designed to be inserted immediately downstream of a promoter, such as an inducible promoter. The NheI and SalI restriction sites in SEQ ID NO: 2 of WO2017/106583 can be used to insert it into a vector multiple cloning site, such as that of the pSOL expression vector (SEQ ID NO: 3 of WO2017/106583), described in published US patent application US2015353940A1, which is incorporated by reference in its entirety herein. Other PDI polypeptides can also be expressed in host cells, including PDI polypeptides from a variety of species (Saccharomyces cerevisiae (UniProtKB P17967), Homo sapiens (UniProtKB P07237), Mus musculus (UniProtKB P09103), Caenorhabditis elegans (UniProtKB Q17770 and Q17967), Arabidopsis thaliana (UniProtKB O48773, Q9XI01, Q9S G3, Q9LJU2, Q9MAU6, Q94F09, and Q9T042), Aspergillus niger (UniProtKB Q12730)) and also modified forms of such PDI polypeptides. 
A PDI polypeptide expressed in host cells of the disclosure can share at least 70%, or 80%, or 90%, or 95% amino acid sequence identity across at least 50% (or at least 60%, or at least 70%, or at least 80%, or at least 90%) of the length of SEQ ID NO: 1 of WO2017/106583, where amino acid sequence identity is determined according to Example 10 of WO2017/106583.

[0131] Assays to measure accessory protein activity include PhyTip-based column target heterologous protein expression level quantification (phynexus.com/products/proteins/antibody-binding-phytip-columns/), the flow cytometry-based ACE ASSAY™, which measures probe bound to properly folded target protein material (WO2021/146626), and/or the ELISA-based HiPrBind assay (WO2021/163349), which measures the fluorescence signal, in a plate-based format, of probes binding to properly folded target protein. These methods measure the increase in target protein production in the presence of the accessory protein compared to the production level in its absence. The increase can be at least 1.5-fold, at least two-fold, at least three-fold, at least four-fold, at least five-fold, at least six-fold, at least seven-fold, at least eight-fold, at least nine-fold, at least ten-fold, at least twenty-fold, at least fifty-fold, at least one hundred-fold, or greater.

Biomolecules

[0132] As described herein, the present disclosure provides methods for expressing and/or producing and/or purifying biomolecules of interest, wherein the biomolecule of interest can be a protein, an RNA, or an RNA-DNA hybrid. In some embodiments, the biomolecule is an RNA selected from the group consisting of ncRNA, tRNA, rRNA, snRNA, snoRNA, miRNA, mRNA, and TERC. In still other embodiments, the biomolecule is a protein selected from the group consisting of a therapeutic protein, an antibody, an enzyme, a ligand, an antigen, a growth factor, a receptor, a nucleic acid-binding protein, as well as fragments, analogs and fusions of any of the aforementioned biomolecules and as described herein. In still other embodiments, the biomolecule of interest is a biopolymer, a chemical, a drug, a flavor modifier, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, or a sugar alcohol.

[0133] Antibodies

[0134] The term “antibody” as used herein refers to whole antibodies that interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) an epitope on a target antigen. A naturally occurring "antibody" is a glycoprotein comprising at least two heavy (H) chains and two light (L) chains inter-connected by disulfide bonds. Each heavy chain is comprised of a heavy chain variable region (abbreviated herein as VH) and a heavy chain constant region. The heavy chain constant region is comprised of three domains, CH1, CH2 and CH3. Each light chain is comprised of a light chain variable region (abbreviated herein as VL) and a light chain constant region. The light chain constant region is comprised of one domain, CL. The VH and VL regions can be further subdivided into regions of hypervariability, termed complementarity determining regions (CDR), interspersed with regions that are more conserved, termed framework regions (FR). Each VH and VL is composed of three CDRs and four FRs arranged from amino-terminus to carboxy-terminus in the following order: FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4. The variable regions of the heavy and light chains contain a binding domain that interacts with an antigen. The constant regions of the antibodies may mediate the binding of the immunoglobulin to host tissues or factors, including various cells of the immune system (e.g., effector cells) and the first component (C1q) of the classical complement system. The term “antibody” includes for example, monoclonal antibodies, human antibodies, humanized antibodies, camelised antibodies, chimeric antibodies, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, and anti-idiotypic (anti-Id) antibodies (including, e.g., anti-Id antibodies to antibodies of the invention), and epitope-binding fragments of any of the above. 
The antibodies can be of any isotype (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass. The antibody or epitope-binding fragments may be, or be a component of, a multi-specific molecule.

[0135] Both the light and heavy chains are divided into regions of structural and functional homology. The terms “constant” and “variable” are used functionally. In this regard, it will be appreciated that the variable domains of both the light (VL) and heavy (VH) chain portions determine antigen recognition and specificity. Conversely, the constant domains of the light chain (CL) and the heavy chain (CH1, CH2 or CH3) confer important biological properties such as secretion, transplacental mobility, Fc receptor binding, complement binding, and the like. By convention the numbering of the constant region domains increases as they become more distal from the antigen binding site or amino-terminus of the antibody. The N-terminus is a variable region and at the C-terminus is a constant region; the CH3 and CL domains actually comprise the carboxy-terminus of the heavy and light chain, respectively.

[0136] The phrase “antibody fragment”, as used herein, refers to one or more portions of an antibody that retain the ability to specifically interact with (e.g., by binding, steric hindrance, stabilizing/destabilizing, spatial distribution) a target epitope. Examples of binding fragments include, but are not limited to, a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; a F(ab)2 fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; a Fd fragment consisting of the VH and CH1 domains; a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and an isolated complementarity determining region (CDR). Furthermore, although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules (known as single chain Fv (scFv); see e.g., Bird et al., (1988) Science 242:423-426; and Huston et al., (1988) Proc. Natl. Acad. Sci. 85:5879-5883). Such single chain antibodies are also intended to be encompassed within the term “antibody fragment”. These antibody fragments are obtained using conventional techniques known to those of skill in the art, and the fragments are screened for utility in the same manner as are intact antibodies.

[0137] As described herein, antibodies may include biologically active derivatives or variants or fragments. As used herein "biologically active derivative" or "biologically active variant" includes any derivative or variant of an antibody having substantially the same functional and/or biological properties of said antibody (e.g., a WT antibody), such as binding properties, and/or the same structural basis, such as a peptidic backbone or a basic polymeric unit, including framework regions. As described herein, “biologically active biomolecules” additionally includes, in some embodiments, proteins that are enzymatically active and/or properly folded, and/or that exhibit a desired affinity to an antigen or ligand and/or exhibit stability under specific conditions such as temperature (e.g., temperature sensitivity/stability). As described herein, one or more of the aforementioned properties can be determined/measured from techniques known in the art including, for example, HiPrBind assays.

[0138] An “analog,” such as a “variant” or a “derivative,” is an antibody substantially similar in structure and having the same biological activity, albeit in certain instances to a differing degree, to a naturally-occurring antibody or a WT antibody or another reference antibody as will be understood by those of skill in the art. For example, an antibody variant refers to an antibody sharing substantially similar structure and having the same biological activity as a reference antibody.
Variants or analogs differ in the composition of their amino acid sequences compared to the reference antibody from which the analog is derived, based on one or more mutations involving (i) deletion of one or more amino acid residues at one or more termini of the antibody and/or one or more internal regions of the antibody sequence (e.g., fragments), (ii) insertion or addition of one or more amino acids at one or more termini (typically an “addition” or “fusion”) of the antibody and/or one or more internal regions (typically an “insertion”) of the antibody sequence or (iii) substitution of one or more amino acids for other amino acids in the antibody sequence. By way of example, a “derivative” is a type of analog and refers to an antibody sharing the same or substantially similar structure as a reference antibody that has been modified, e.g., chemically.

[0139] In some embodiments, the variants or sequence variants are mutants wherein 1, 2, 3, 4, 5, 6 or more amino acids within one or more CDR are mutated relative to a reference antibody. In some embodiments, CDRs on the light chain, heavy chain, or both heavy and light chain, are mutated. In some embodiments, one or more framework amino acid residues are mutated relative to a reference antibody.

[0140] In substitution variants, one or more amino acid residues, e.g., in a CDR region, of an antibody are removed and replaced with alternative residues. In one aspect, the substitutions are conservative in nature and conservative substitutions of this type are well known in the art. Alternatively, the disclosure embraces substitutions that are also non-conservative. Exemplary conservative substitutions are described in Lehninger, [Biochemistry, 2nd Edition; Worth Publishers, Inc., New York (1975), pp. 71-77].

[0141] Antibodies contemplated herein include full-length antibodies, biologically active subunits or fragments of full-length antibodies, as well as biologically active derivatives and variants of any of these forms of therapeutic proteins. Thus, antibodies include those that (1) have an amino acid sequence that has greater than about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98% or about 99% or greater amino acid sequence identity, over a region of at least about 25, about 50, about 100, about 200, about 300, about 400, or more amino acids, to a reference antibody (e.g., encoded by a referenced nucleic acid or an amino acid sequence described herein). According to the present disclosure, the term "recombinant protein" or “recombinant antibody” includes any protein obtained via recombinant DNA technology. In certain embodiments, the term encompasses antibodies as described herein.

[0142] In some embodiments, the antibodies or antibody variants described herein are expressed from one or more expression constructs and/or in a cell or strain as described herein.

Exemplary wild-type or reference antibodies include commercially available or other known antibodies, including therapeutic monoclonal antibodies. Reference antibodies according to the present disclosure may include any antibodies now known or later developed, including those that are not clinically and/or commercially available.

[0143] As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any element, e.g., any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only" and the like in connection with the recitation of claim elements, or use of a "negative" limitation.

[0144] When a range of values is provided herein, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.

[0145] The terms “identical” and "identity" as used herein refer to a relationship between the sequences of two or more polypeptide molecules or two or more nucleic acid molecules, as determined by aligning and comparing the sequences. "Percent identity" means the percent of identical residues between the amino acids or nucleotides in the compared molecules and is calculated based on the size of the smallest of the molecules being compared. For these calculations, gaps in alignments (if any) must be addressed by a particular mathematical model or computer program (i.e., an "algorithm"). Methods that can be used to calculate the identity of the aligned nucleic acids or polypeptides are standard in the art. Methods can include those described in Computational Molecular Biology, (Lesk, Ed.), 1988, New York: Oxford University Press; Biocomputing Informatics and Genome Projects, (Smith, Ed.), 1993, New York: Academic Press; Computer Analysis of Sequence Data, Part I, (Griffin and Griffin, Eds.), 1994, New Jersey: Humana Press; Sequence Analysis in Molecular Biology, (von Heijne), 1987, New York: Academic Press; Sequence Analysis Primer, (Gribskov and Devereux, Eds.), 1991, New York: M. Stockton Press; and Carillo et al., SIAM J. Applied Math., 48:1073 (1988).
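
As a minimal illustration of the percent identity definition above (identical residues divided by the size of the smallest molecule compared), the following Python sketch assumes that the two sequences have already been aligned and that gaps are written as "-". The function name and the example sequences are hypothetical, and no particular alignment program is implied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal aligned length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned columns in which both sequences carry the same residue.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Denominator: ungapped length of the smaller of the two molecules.
    shortest = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shortest == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shortest


# Hypothetical aligned pair: 6 identical columns, 8 residues in each molecule.
example = percent_identity("MKT-AYIAK", "MKTWAYLA-")
```

How gap columns are scored is a modeling choice; here they simply do not count as matches.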

[0146] In calculating percent identity, the sequences being compared are aligned in a way that gives the largest match between the sequences. An exemplary computer program used to determine percent identity is the GCG program package, which includes GAP (Devereux et al., Nucl Acid Res, 12:387 (1984); Genetics Computer Group, University of Wisconsin, Madison, Wis.). The computer algorithm GAP is used to align the two polypeptides or polynucleotides for which the percent sequence identity is to be determined. The sequences are aligned for optimal matching of their respective amino acids or nucleotides (the "matched span", as determined by the algorithm). A gap opening penalty (which is calculated as 3 times the average diagonal, wherein the "average diagonal" is the average of the diagonal of the comparison matrix being used; the "diagonal" is the score or number assigned to each perfect amino acid match by the particular comparison matrix) and a gap extension penalty (which is usually 1/10 times the gap opening penalty), as well as a comparison matrix such as PAM 250 or BLOSUM 62 are used in conjunction with the algorithm. A standard comparison matrix [e.g., Dayhoff et al., Atlas of Protein Sequence and Structure, 5:345-352 (1978) for the PAM 250 comparison matrix; Henikoff et al., Proc. Natl. Acad. Sci. USA, 89:10915-10919 (1992) for the BLOSUM 62 comparison matrix] can also be used by the algorithm.
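
The gap penalty arithmetic described above can be sketched in a few lines of Python. This is an illustration only: the three-residue matrix below is a hypothetical toy, not PAM 250 or BLOSUM 62, and the function name is an assumption introduced for the example.

```python
def gap_penalties(matrix):
    """Return (gap opening penalty, gap extension penalty) for a matrix.

    `matrix` maps residue -> residue -> score. The opening penalty is
    3 times the average diagonal, and the extension penalty is 1/10 of
    the opening penalty, following the description in the text.
    """
    # The "diagonal" holds the score assigned to each perfect match.
    diagonal = [matrix[residue][residue] for residue in matrix]
    average_diagonal = sum(diagonal) / len(diagonal)
    gap_open = 3.0 * average_diagonal
    gap_extend = gap_open / 10.0
    return gap_open, gap_extend


# Hypothetical toy comparison matrix (values chosen for illustration only).
toy_matrix = {
    "A": {"A": 4, "C": 0, "W": -3},
    "C": {"A": 0, "C": 9, "W": -2},
    "W": {"A": -3, "C": -2, "W": 11},
}
gap_open, gap_extend = gap_penalties(toy_matrix)
```

For this toy matrix the average diagonal is (4 + 9 + 11) / 3 = 8, so the opening penalty is 24 and the extension penalty is 2.4.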

[0147] Recommended parameters for determining percent identity for polypeptides or nucleotide sequences using the GAP program are the following: Algorithm: Needleman et al., J. Mol. Biol., 48:443-453 (1970); Comparison matrix: BLOSUM 62 from Henikoff et al., 1992, supra; Gap Penalty: 12 (but with no penalty for end gaps); Gap Length Penalty: 4; Threshold of Similarity: 0.

[0148] Certain alignment schemes for aligning two amino acid sequences can result in matching of only a short region of the two sequences, and this small aligned region can have very high sequence identity even though there is no significant relationship between the two full-length sequences. Accordingly, the selected alignment method (GAP program) can be adjusted if so desired to result in an alignment that spans at least 50 contiguous amino acids of the target polypeptide.

[0149] Other exemplary programs that compare and align pairs of sequences include, but are not limited to, ALIGN (Myers and Miller, Comput Appl Biosci, 4(1):11-17 (1988)), FASTA (Pearson and Lipman, Proc Natl Acad Sci USA, 85(8):2444-2448 (1988); Pearson, Methods Enzymol, 183:63-98 (1990)) and gapped BLAST (Altschul et al., Nucleic Acids Res, 25(17):3389-3402 (1997)), BLASTP, BLASTN, or GCG (Devereux et al., Nucleic Acids Res, 12(1 Pt 1):387-95 (1984)).

[0150] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure.

[0151] All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials for the purpose for which the publications are cited.

[0152] As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. This disclosure is intended to provide support for all such combinations.

[0153] As used herein, “can comprise” or “can be” indicates something envisaged by the inventors that is functional and available as part of the subject matter provided.

[0154] While the following examples describe specific embodiments, variations and modifications will occur to those skilled in the art. Accordingly, only such limitations as appear in the claims should be placed on the invention.

Examples

Media optimization using a genetic algorithm

[0155] The present Example describes the use of a genetic algorithm (GA) to identify improved media formulations that resulted in several-fold higher titers of an antibody fragment produced in SoluPro™ E. coli (see, e.g., WO/2014/025663 and WO/2017/106583). Python software was created to execute the GA and to output random pipetting volumes to a .csv file for 60 mixtures of 14 individual media components (sugar, nitrogen, salt, and trace metals). The .csv file was uploaded to a liquid handling robot to execute pipetting of each of the 14 media components into 60 wells of a small-scale plate-based bioreactor system with pH control and feeding ability. These 60 unique media were inoculated with a SoluPro™ E. coli strain engineered to produce a monoclonal Fab. Broth was harvested after 48 h and measured by HiPrBind™, an ELISA-type assay that specifically binds properly folded, functional target protein and yields a signal corresponding to the amount of target protein present. Individuals with the highest HiPrBind™ signal from the initial round were selected as parents for a subsequent round of mixtures, and an evolutionary approach to recombination of their pipetting volumes was used to produce another 60 mixtures. This was repeated twice, yielding populations with antibody Fab titers on average 25% higher than the mean of the initial mixtures. One condition was scaled to a stirred bioreactor with >25% improvement over a conventionally optimized process using the same strain. A comprehensive RNAseq data set was collected to characterize the host’s transcriptional response. Independent Component Analysis (ICA) of the RNAseq data set revealed independently modulated gene sets (iModulons) that characterized the host response to different media formulations.

[0156] Software

[0157] A genetic algorithm (GA) was developed using the DEAP library in Python to identify the media formulations that resulted in the highest product titers. In this GA, each individual media composition is represented by a vector of media component concentrations. Concentrations can be either discrete or continuous, and each component is assigned a minimum and maximum concentration. At the start of the GA run, a random pool of 60 individual media was generated, with ~900 pipetting events performed on a Hamilton liquid handling robot. These media compositions were tested in the BioLector Pro (Beckman Coulter) to measure product formation. In each subsequent round of the GA, a cyclical select-reproduce-mutate-cull process was followed, as is common with a (mu, lambda) evolutionary strategy. This means that children replace the parents.
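As an illustration of the representation described above, the following minimal Python sketch builds a random starting pool of 60 media, each a vector of component values drawn within per-component bounds, and serializes the pool to CSV text. It uses only the standard library rather than DEAP, and the component names, bounds, and function names are hypothetical placeholders that do not come from the Example (six components are shown for brevity rather than 14).

```python
import csv
import io
import random

# Hypothetical components with (minimum, maximum) bounds; not taken from the Example.
BOUNDS = {
    "glucose": (0.0, 40.0),
    "glycerol": (0.0, 40.0),
    "ammonium_sulfate": (0.0, 40.0),
    "yeast_extract": (0.0, 40.0),
    "magnesium_sulfate": (0.0, 20.0),
    "trace_metals": (0.0, 10.0),
}


def random_individual(rng, bounds=BOUNDS):
    """One media composition: a vector with one bounded value per component."""
    return [round(rng.uniform(low, high), 1) for low, high in bounds.values()]


def random_population(size, seed=0, bounds=BOUNDS):
    """Random starting pool; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [random_individual(rng, bounds) for _ in range(size)]


def population_to_csv(population, bounds=BOUNDS):
    """Write one row per well, one column per component."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["well", *bounds.keys()])
    for well, individual in enumerate(population, start=1):
        writer.writerow([well, *individual])
    return buffer.getvalue()


population = random_population(60)
csv_text = population_to_csv(population)
```

In practice the CSV layout would have to match whatever worklist format the liquid handling robot expects, which the Example does not specify.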

[0158] For selection: A tournament selection process was used to determine the parent individuals for the subsequent generation. In this process, five individuals were randomly sampled from the population, and the individual with the highest fitness was selected as a parent. This process was repeated until the number of selected parents was equal to the desired population size.

[0159] For mutation: In the mutation step, simple mutations were applied to 25% of individuals, and crossovers were applied to the remaining 75%. In a simple mutation, the concentration of one component in the formulation is changed to a random value within the variable bounds. In a crossover, two crossover points are selected in a pair of parents, and the values between the points are switched to form the offspring.
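The selection, mutation, and crossover steps described above can be sketched in plain Python as follows. This is a hedged illustration using the standard library, not the DEAP-based software of the Example; the function names are hypothetical, and the handling of an odd leftover parent (carried forward unchanged) is an assumption made so that the population size stays constant.

```python
import random


def tournament_select(population, fitnesses, n_parents, rng, tournament_size=5):
    """Repeatedly sample `tournament_size` individuals and keep the fittest."""
    parents = []
    for _ in range(n_parents):
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitnesses[i])
        parents.append(list(population[winner]))
    return parents


def simple_mutation(individual, bounds, rng):
    """Replace one component with a random value inside its bounds."""
    child = list(individual)
    position = rng.randrange(len(child))
    low, high = bounds[position]
    child[position] = rng.uniform(low, high)
    return child


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two cut points (needs at least 3 components)."""
    first, second = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


def next_generation(population, fitnesses, bounds, rng, mutation_fraction=0.25):
    """Children replace the parents: 25% mutated, the remainder crossed over."""
    parents = tournament_select(population, fitnesses, len(population), rng)
    n_mutated = round(mutation_fraction * len(parents))
    children = [simple_mutation(p, bounds, rng) for p in parents[:n_mutated]]
    remaining = parents[n_mutated:]
    for a, b in zip(remaining[0::2], remaining[1::2]):
        children.extend(two_point_crossover(a, b, rng))
    if len(remaining) % 2:
        # Assumption: an unpaired parent is carried forward unchanged.
        children.append(remaining[-1])
    return children
```

Here `fitnesses` would hold the measured assay signal for each individual of the previous round, in the same order as `population`.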

[0160] In addition, physical constraints were added to the final populations before measurements. First, the total carbon and nitrogen volumes were each limited to 80 µL. If an individual had a total carbon or nitrogen volume above this amount, all carbon (or nitrogen) components were rescaled such that the total was 80 µL. Similarly, if the total volume of all reagents in the media was above 800 µL, all components were rescaled so that the media volume was 800 µL.
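The rescaling rule above amounts to proportional scaling of a group of components whenever the group total exceeds its limit. A minimal sketch follows, assuming that the carbon and nitrogen caps are applied before the total-volume cap; the function names and the index-based grouping of components are hypothetical.

```python
def rescale_group(volumes, indices, limit):
    """Scale the listed components in place so their sum does not exceed `limit`."""
    indices = list(indices)
    total = sum(volumes[i] for i in indices)
    if total > limit:
        factor = limit / total
        for i in indices:
            volumes[i] *= factor
    return volumes


def apply_constraints(volumes, carbon_idx, nitrogen_idx,
                      group_limit=80.0, total_limit=800.0):
    """Apply the carbon cap, the nitrogen cap, then the total-volume cap."""
    scaled = list(volumes)
    rescale_group(scaled, carbon_idx, group_limit)
    rescale_group(scaled, nitrogen_idx, group_limit)
    rescale_group(scaled, range(len(scaled)), total_limit)
    return scaled
```

For example, an individual whose two carbon components sum to 120 would have both multiplied by 80/120, preserving their ratio.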

[0161] Cultivation

[0162] The fermentations took place simultaneously in two BioLector Pro instruments, each using a microfluidic 32-well FlowerPlate that measures and logs pH and O2 concentration using optodes, oxygenates by shaking, and controls temperature at programmed setpoints. Two wells were reserved on each plate for control medium. Sixty wells were used during each round to test mixture outputs from the Python GA program. For these GA fermentations, a one-sided pH control process was applied using 5% ammonium hydroxide. At the start of each fermentation, the pH was adjusted to 6.6 for each chromosome (media condition) in each well if necessary. The BioLector Pro has several substrate addition options, including constant, linear, exponential, and signal-triggered. For this Example, signal-triggered feeding was used. Any instrument-logged signal (e.g., biomass, pH, DO) can be chosen to trigger microfluidic pumps to add a specified volume. For these experiments, a pH trigger was used; pH was maintained at 6.6 with a trigger at 6.65. A value above 6.65 triggers addition of 5.5 µL of substrate consisting of a carbon source and inducers for both the protein of interest and accessory molecules to promote proper folding and solubility of the Fab. The control program includes options for a block time and a pause time. Block time is defined as the time a trigger condition must be continuously met before the trigger is activated. Pause time is defined as the time that must elapse after a trigger was activated before it can be activated again. For the study, 15 minutes was chosen for both. An initial 4-hour pause was used before enabling feed triggering. To initiate a bolus of feed, the pH must remain above 6.65 for fifteen minutes before feed is added as a bolus. After the bolus, the program monitors the pH for 15 minutes before a subsequent bolus may be added. These values were chosen for three reasons: 1) to prevent noisy signals from triggering a bolus; 2) to prevent overfeeding, especially with media mixtures that might raise the pH of the well early in the run; and 3) to set a growth rate ceiling so that the cells do not go into oxygen limitation too quickly. Lastly, the shaking frequency, oxygen concentration, and humidity were kept constant throughout the 48-hour fermentation at 1100 rpm, 35%, and 85%, respectively. Temperature was set at 32 °C for the first twelve hours after inoculation, then reduced to 26 °C for the remainder of the fermentation.
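The block-time and pause-time logic of the pH-triggered feed can be illustrated with a small simulation over a logged pH trace. This is a sketch under stated assumptions, not the instrument's control software: the function name and the (minute, pH) input format are hypothetical, and the block timer is assumed to start counting only after the pause (or the initial 4-hour delay) has elapsed.

```python
def feed_times(ph_trace, trigger_ph=6.65, block_min=15, pause_min=15,
               initial_pause_min=240):
    """Return the minutes at which a feed bolus would be triggered.

    ph_trace is a sequence of (minute, pH) pairs sorted by time.
    """
    boluses = []
    above_since = None                # start of the current run above the trigger
    enabled_from = initial_pause_min  # triggering is disabled before this time
    for minute, ph in ph_trace:
        if minute < enabled_from or ph <= trigger_ph:
            # Paused, or condition not met: the block timer restarts.
            above_since = None
            continue
        if above_since is None:
            above_since = minute
        if minute - above_since >= block_min:
            boluses.append(minute)
            enabled_from = minute + pause_min
            above_since = None
    return boluses
```

With these assumptions, a pH that stays above the trigger produces at most one bolus every 30 minutes, while a brief excursion shorter than the block time produces none, which matches the stated aims of filtering noisy signals and preventing overfeeding.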

[0163] Titer measurement

[0164] The HiPrBind™ assay was used to assess performance of experimental samples. HiPrBind™ is an ELISA-type assay that specifically binds properly folded, functional target protein and produces a signal corresponding to the amount of target protein present (WO2021/163349 and described herein). To measure properly folded, active target protein, two analyte associating moieties were used in combination with a signal donor and an activatable compound.

[0165] In this binding cascade, one moiety forms a complex with the signal donor, and the other moiety forms a complex with an activatable compound. If properly folded, active target material is present, the signal donor complex and the activatable compound complex will both bind and provide detectable output. If properly folded, active target material is not present, the complexes do not associate, and no signal is produced.

[0166] RNAseq

[0167] To stabilize RNA between sample collection and RNA isolation, culture was sampled directly into a 3x volume of RNAlater (Thermo Fisher). The culture and RNAlater mixture was spun down and the supernatant was removed, followed by resuspension of the pellet in a 3x volume of RNAlater for complete quenching. A volume of 10-50 µL of the treated culture was mixed with >300 µL of Trizol. RNA was then isolated using Direct-zol-96 Magbead RNA (Zymo Research, PN R2100), according to the manufacturer’s protocol and including the optional DNase I treatment.

[0168] Sequencing libraries were prepared from extracted RNA with Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Illumina, PN 20040529), according to the manufacturer’s protocol. This prep kit depletes ribosomal RNA and reverse transcribes the remaining RNA into cDNA (complementary DNA); ligation and subsequent amplification then add adapters and dual indexes for multiplex sequencing on an Illumina instrument. Libraries were normalized, pooled, and then diluted to 750 pM for 2x75 bp sequencing on an Illumina NextSeq 1000 using P2 Reagents (200 cycle, v3) (Illumina, PN 20046812).

[0169] iModulon/PCA

[0170] Conducting independent component analysis on gene expression data

[0171] The Scikit-learn (v0.23.2) (Pedregosa F, et al., J Mach Learn Res. 2011;12:2825-30) implementation of FastICA (Hyvärinen A., IEEE Trans Neural Netw. 1999;10:626-34) was executed 100 times with random seeds and a convergence tolerance of 10⁻⁷. The resulting independent components (ICs) were clustered using DBSCAN (Ester M, et al., In: KDD 1996; p. 226-31) to identify robust ICs, using an epsilon of 0.1 and a minimum cluster seed size of 50. To account for identical components with opposite signs, the following distance metric was used for computing the distance matrix:

[0172] $d_{x,y} = 1 - |\rho_{x,y}|$

[0173] where $\rho_{x,y}$ is the Pearson correlation between components x and y. The final robust ICs were defined as the centroids of the clusters. Again, to account for identical components with opposite signs, one component was chosen as the canonical direction, and all other components in the cluster were flipped to ensure that the Pearson correlation was positive between all members of the cluster before computing the centroid.
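The sign-insensitive distance and the sign-aligned centroid can be sketched in plain Python as below. The function names are hypothetical, and taking the first member of a cluster as the canonical direction is an assumption; the text only says that one component is chosen. A matrix of these distances could then be supplied to a clustering routine that accepts precomputed distances (scikit-learn's DBSCAN does so via `metric="precomputed"`).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def component_distance(x, y):
    """d = 1 - |rho|, so a component and its negation have distance 0."""
    return 1.0 - abs(pearson(x, y))


def aligned_centroid(cluster):
    """Flip members anti-correlated with the first member, then average."""
    reference = cluster[0]  # assumption: first member sets the canonical sign
    aligned = [
        member if pearson(reference, member) >= 0 else [-v for v in member]
        for member in cluster
    ]
    return [sum(values) / len(aligned) for values in zip(*aligned)]
```

Without the flip, a cluster containing a component and its mirror image would average toward zero and lose the signal.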

[0174] Identifying significant genes in an independent component

[0175] To perform regulator enrichments on components, genes with significantly high weightings must be identified. To keep this method agnostic to the prior regulatory structure, the Scikit-learn (Pedregosa F, et al., J Mach Learn Res. 2011;12:2825-30) implementation of K-means clustering was applied to the absolute values of the gene weights in each independent component. All genes in the top two clusters were deemed significant, and the set of significant genes in each independent component was called the iModulon.
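A minimal sketch of this thresholding step is shown below, using a small one-dimensional k-means written with the standard library in place of the Scikit-learn implementation. The number of clusters is not stated in the text, so k = 3 is an assumption, as are the function names and the evenly spaced initial centers.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalars, with evenly spaced initial centers."""
    low, high = min(values), max(values)
    centers = [low + (high - low) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def significant_genes(weights, k=3):
    """Genes whose |weight| falls in the two clusters with the highest centers."""
    genes = list(weights)
    labels, centers = kmeans_1d([abs(weights[g]) for g in genes], k)
    top_two = sorted(range(k), key=lambda c: centers[c], reverse=True)[:2]
    return {g for g, label in zip(genes, labels) if label in top_two}
```

Here `weights` maps a gene identifier to its weight in one independent component; the returned set corresponds to that component's iModulon.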

[0176] Associating iModulons to regulators

[0177] The set of significant genes in each component, or the iModulon, was compared to each regulon by the two-sided Fisher’s exact test (FDR < 10⁻⁵) to determine regulator enrichment. F1-scores were calculated to evaluate these associations. The F1-score is the harmonic mean of precision and recall between a component and its linked regulon. Precision is the proportion of genes in the component that are present in the associated regulon, and recall is the proportion of genes in the regulon that are present in the associated component. Prior information about regulator binding sites was borrowed from previous studies (Sastry AV, et al., Nat Commun. 2019;10:5536; Rychel K, et al., Nat Commun. 2020;11:6338; and Poudel S, et al., Proc Natl Acad Sci USA. 2020;117:17228-39).
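The precision, recall, and F1 definitions above translate directly into set operations. The sketch below also builds the 2x2 contingency table on which a Fisher's exact test would operate; the function names and the placeholder gene identifiers in the usage note are hypothetical, and the test itself and the FDR correction are left to a statistics library.

```python
def enrichment_scores(imodulon, regulon):
    """Precision, recall, and F1 between an iModulon and a regulon (gene sets)."""
    overlap = len(imodulon & regulon)
    precision = overlap / len(imodulon) if imodulon else 0.0
    recall = overlap / len(regulon) if regulon else 0.0
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def contingency_table(imodulon, regulon, n_genes):
    """2x2 table: rows = in/not in iModulon, columns = in/not in regulon."""
    both = len(imodulon & regulon)
    only_imodulon = len(imodulon - regulon)
    only_regulon = len(regulon - imodulon)
    neither = n_genes - both - only_imodulon - only_regulon
    return [[both, only_imodulon], [only_regulon, neither]]
```

For instance, an iModulon {"a", "b", "c", "d"} and a regulon {"c", "d", "e"} share two genes, giving a precision of 0.5, a recall of 2/3, and an F1-score of 4/7.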

[0178] FIG. 3 shows Fab expression in SoluPro E. coli. Random mixtures (light grey) versus mixtures evolved through two rounds of evolutionary selection by the genetic algorithm (dark grey) are shown versus control conditions (diagonal hatch). The mean relative expression of the evolved mixtures averages 25% higher than that of unevolved mixtures after 2 rounds of the GA.

[0179] As shown in FIG. 4, the HiPrBind™ (HPB) signal increased significantly from before (Round 0) to after (Round 2) the genetic algorithm. The mean signal increased 1.9-fold between the initial and final populations.

[0180] As shown in FIG. 5, gene expression after dimensionality reduction (left plot) by principal component analysis of the RNAseq data shows convergence on an area that corresponds to the highest HiPrBind™ signal (right plot).

[0181] As shown in FIG. 6, scatter plots of iModulon data have been converted to binned plots, with one axis representing HiPrBind™ (HPB) signal and the other the frequency per bin of the particular iModulon being expressed. In the plots of FIGs. 6A and 6B, the iModulons represent the stress response sigma factors RpoS and RpoH, respectively. The general trend shows that lower stress response (RpoS) and less protein misfolding (RpoH) are associated with higher HiPrBind™ signal. In FIG. 6C, starvation of leucine is shown to trend towards higher HiPrBind™ signal, suggesting that addition of leucine may improve protein expression. In FIG. 6D, DksA, a ribosomal protein subunit regulator, suggests that higher levels of ribosomes trend to higher HiPrBind™ signal. FIG. 6E, Cbl, is an iModulon associated with sulfur metabolism. High HiPrBind™ signals here may indicate a state of sulfur starvation. FIGs. 6F and 6G show iModulon signal related to iron metabolism. Fur is upregulated during iron starvation, and the iron-sulfur cluster regulator IscR also trends to a state of upregulation when HiPrBind™ signals are higher.

[0182] For the first time, a genetic algorithm approach to optimize media was coupled with pipetting executed by a liquid handling robot into a micro-fermentation plate system with pH control and feeding, followed by RNAseq and iModulon analysis. This discovery shows the power of combining GA with iModulon analysis for production of protein, yielding both improved media composition and an indication of aspects of the underlying mechanisms for the improvement. The iModulon analysis may also inform further improvements, such as in this case supplementing leucine, methionine, or cysteine, reducing iron concentrations, etc. This media optimization approach can be applied, in various embodiments, to biological drugs such as monoclonal antibodies and antibody fragments, enzymes, edible proteins, and metabolites produced by microbes or other production hosts. This mixture was scaled into a stirred bioreactor, demonstrating that mixtures identified by GA in a microplate with pH control and feeding can be quickly scaled to industrial production.

[0183] Scale-Up of Genetic Algorithms

[0184] FIG. 7A depicts scale-up, purification and characterization of the purified material, according to some aspects, including scaling-up, transcriptomic analysis and analytical characterization of purified material.

[0185] FIG. 7B depicts integrated OUR and CER, according to some aspects.

[0186] FIG. 7C depicts that scaling up from microplates to bioreactors increases quality by a high percentage (e.g., 111% with doubling of carbon), according to some aspects.

[0187] FIG. 7D depicts consistent quality trends across media during scale-up for a top number of hits (e.g., two hits).

[0188] FIG. 7E depicts media with an increase in quality compared to other strains, according to some aspects.

[0189] FIG. 7F depicts media with increased quality for two strains across multiple timepoints, according to some aspects.

[0190] FIG. 7G depicts media having an increase in quality compared to other strains, according to some aspects.

[0191] FIG. 7H depicts genetic algorithm media that resulted in the identification of a top strain that incorporates genetic algorithm media’s advantages in the genetics, according to some aspects.

[0192] FIG. 7I depicts measures of quality increase with double (e.g., 2x) carbon in genetic algorithm media, according to some aspects.

[0193] FIG. 7J depicts that genetic algorithm-based media optimization results in Mab that is structurally similar to CHO-produced Mab, according to some aspects. FIG. 7J shows that the genetic algorithm-based media optimization results in Mab (e.g., produced in SoluPro E. coli) that is structurally similar to CHO-produced Mab.

[0194] Downstream Purification and Characterization of GA-Generated Material

[0195] FIG. 8A depicts downstream purification and characterization of genetic algorithm-generated material, according to some aspects.

[0196] FIG. 8B depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a first strain, according to some aspects.

[0197] FIG. 8C depicts a table showing genetic algorithm-generated material resulting in superior quality of a final purified antibody with respect to a second strain, according to some aspects.

[0198] Structural Assay of Antibodies

[0199] FIG. 9A depicts an exemplary structural characterization environment, according to some aspects.

[0200] FIG. 9B depicts an exemplary mab study of second derivatives for structural fingerprints, according to some aspects.

[0201] FIG. 9C depicts an exemplary mab study of a delta plot for clear visualization of differences, according to some aspects.

[0202] iModulons

[0203] FIG. 10A depicts iModulon gene sets, according to some aspects. FIG. 10A shows the principal component analysis (PCA) of the RNAseq data and quality. Principal component analysis of RNAseq data (left) reduces ~4000-dimensional gene expression.

[0204] FIG. 10B depicts how iModulon analysis of the genetic algorithm scale-up provided insights for strain and process improvements to lower COGs, retain CQAs and boost titers, according to some aspects.

[0205] FIGs. 11A-11D depict iModulon activity, according to some aspects. In FIGs. 11A-11D, RNAseq data was processed into groups of functionally co-regulated gene sets known as iModulons. The iModulon analysis was collected from the RNAseq data of multiple strains (FIGs. 11A-11D), media or process conditions. OxyR is a regulator of antioxidant genes and serves as an indicator of oxidative stress.

[0206] In FIG. 11A, OxyR expression was used to assess the different strains engineered to improve their oxidative stress. In FIG. 11B, PhoB is an iModulon of phosphate metabolism. In FIG. 11C, PhoP is an iModulon of metal homeostasis, and Fur, in FIG. 11D, is an iModulon of iron uptake. These iModulons may be used to improve the process conditions by supplementing the respective nutrient to make process improvements over the course of cultivation in bioreactors.

[0207] FIG. 12 depicts how iModulon analysis of the genetic algorithm scale-up provided insights for strain and process improvement, to lower COGs, retain CQAs and boost titers, according to some aspects. Applying the strain improvement insights obtained from iModulon analysis to strain G results in a strain with an improved quality fitness score. Thus, the GA media was used as a benchmark, and the benefits of the GA media were incorporated into the strain.

[0208] Exemplary Source Code

[0209] As discussed herein, aspects of the present techniques may be implemented using source code in suitable programming languages, such as Python, Java, C++, etc. For example, the following are source code listings that may be implemented, in some aspects:

# Codon optimization
from absci_library import codon_optimizer
library = codon_optimizer.reverse_translate(library)
library.to_csv('covid_antibody_designs.csv')
library.to_wet_lab(assay='ACE')

# Lead optimization model
from absci import lead_opt_model
lead_optimizer = lead_opt_model.load_latest()
library.naturalneses = lead_optimizer.naturalneses(library)
lead_optimizer.optimize(library).to_wet_lab(assay='SPR')

[0001] The various embodiments described herein can be combined to provide further embodiments. All U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified if necessary to employ concepts of the various patents, applications, and publications to provide yet further embodiments.

[0002] These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

[0003] Aspects of the techniques described in the present disclosure may include any of the following aspects, either alone or in combination:

[0004] 1. A method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.

[0005] 2. A method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.

[0006] 3. A method of increasing the yield of biomolecule expression in a cell culture system, said method comprising: (1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions; (2) culturing cells in the mixture matrix of (1); and (3) measuring two or more or all of the following: (a) the amount of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.

[0007] 4. The method of any one of aspects 1-3, optionally further comprising the steps of: (1) identifying multiple optimized cell media formulations; (2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation; (3) culturing cells in the mixture of (2); and (4) measuring two or more or all of the following: (a) the amount and/or quality of biomolecules of interest expressed; (b) at least one gene that is transcribed; and (c) at least one independently modulated gene set.

[0008] 5. The method of any one of aspects 1-4, wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm.

[0009] 6. The method of aspect 5, wherein the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.

[0010] 7. The method of any one of aspects 1-6, wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.

[0011] 8. The method of aspect 7, wherein the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.

[0012] 9. The method of any one of aspects 1-8, wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.

[0013] 10. The method of aspect 9, wherein the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.

[0014] 11. The method of any one of aspects 1-10, wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.

[0015] 12. The method of any one of aspects 1-11, wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.

[0016] 13. The method of aspect 12, wherein the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a non-commercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.

[0017] 14. The method of any one of aspects 1-13, wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest.

[0018] 15. The method of any one of aspects 1-14, further comprising measuring cell growth.

[0019] 16. The method of any one of aspects 1-15, wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA.

[0020] 17. The method of any one of aspects 1-16, wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.

[0021] 18. The method of any one of aspects 1-17, wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.

[0022] 19. The method of aspect 18, wherein the cells are bacterial cells.

[0023] 20. The method of aspect 19, wherein the bacterial cells are E. coli cells.

[0024] 21. The method of aspect 20, wherein the E. coli cells comprise one or more or all of:

(a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter; (b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter; (c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter; (d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm; (e) a reduced level of gene function of a gene that encodes a reductase; (f) at least one expression construct encoding at least one disulfide bond isomerase protein; (g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or (h) at least one polynucleotide encoding Erv1p.

[0025] 22. A method of producing a biomolecule of interest comprising culturing a host cell comprising an expression construct encoding the biomolecule of interest in an optimized cell media formulation as determined by the method of aspect 1.

[0026] 23. A computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following: (a) an amount of a biomolecule produced in the inoculated matrix, (b) at least one arrangement that is transcribed, and (c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.

[0027] 24. A computer-implemented method for improving quality of monoclonal antibodies, comprising: (i) performing a media optimization using a genetic algorithm workflow; (ii) harvesting broth at 68 h, by purifying and measuring by an iCIEF assay; (iii) selecting a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) scaling one or more candidates to a bioreactor scale; (v) collecting RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) performing a downstream purification and analytical characterization on the product obtained from the use of the new improved media.

[0028] 25. The computer-implemented method of aspect 24, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length Mab.

[0029] 26. The computer-implemented method of any of aspects 24-25, further comprising: repeating step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.

[0030] 27. A computing system for improving quality of monoclonal antibodies, comprising: a bioreactor, one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the system to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68 h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.

[0031] 28. The computing system of aspect 27, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.

[0032] 29. The computing system of any of aspects 27-28, the one or more memories including further instructions that, when executed, cause the system to: repeat step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.

[0033] 30. A non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed, cause a computer to: (i) perform a media optimization using a genetic algorithm workflow; (ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay; (iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures; (iv) receive data corresponding to scaling one or more candidates to a bioreactor scale; (v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and (vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
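
By way of non-limiting illustration only, the selection and recombination of step (iii) of aspects 24, 27 and 30 may be sketched as follows. The sketch is not the implementation of the present techniques. The encoding of each mixture as 24 relative component levels between 0 and 1, the uniform crossover, the mutation rate, the number of parents, and all function names are assumptions introduced for the example; the placeholder scores merely stand in for quality scores that would be obtained from the iCIEF assay.

```python
import random

N_COMPONENTS = 24  # media components per mixture (aspect 25)
N_MIXTURES = 60    # mixtures per round (aspect 25)


def random_mixture(rng):
    # Assumed encoding: one relative level in [0, 1] per media component.
    return [rng.random() for _ in range(N_COMPONENTS)]


def select_parents(mixtures, scores, n_parents):
    # Keep the mixtures with the highest quality scores as parents.
    ranked = sorted(zip(scores, mixtures), key=lambda pair: pair[0], reverse=True)
    return [mixture for _, mixture in ranked[:n_parents]]


def recombine(parent_a, parent_b, rng, mutation_rate=0.05):
    # Uniform crossover: each component level is taken from one parent at random.
    child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
    # Mutation: occasionally replace a level with a fresh random level.
    return [rng.random() if rng.random() < mutation_rate else level for level in child]


def next_round(mixtures, scores, rng, n_parents=10):
    # Produce a new round of mixtures, the same size as the current round.
    parents = select_parents(mixtures, scores, n_parents)
    children = []
    while len(children) < len(mixtures):
        parent_a, parent_b = rng.sample(parents, 2)
        children.append(recombine(parent_a, parent_b, rng))
    return children


rng = random.Random(0)
population = [random_mixture(rng) for _ in range(N_MIXTURES)]
# Placeholder scores only; measured quality scores would be used in practice.
scores = [sum(mixture) for mixture in population]
population = next_round(population, scores, rng)
```

In such a sketch, each call to `next_round` corresponds to one subsequent round of mixtures, and the call may be repeated with newly measured scores for as many rounds as desired.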

Claims

WHAT IS CLAIMED:

1. A method of identifying an optimized cell media formulation capable of promoting the expression of a biomolecule of interest, said method comprising:

(1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;

(2) culturing cells in the mixture matrix of (1); and

(3) measuring two or more or all of the following:

(a) the amount of biomolecules of interest expressed;

(b) at least one gene that is transcribed; and

(c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise the optimized cell media formulation.

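2. A method of identifying one or more genes and/or one or more independently modulated gene sets that are transcribed in response to a cell media formulation, said method comprising:

(1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;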

(2) culturing cells in the mixture matrix of (1); and

(3) measuring two or more or all of the following:

(a) the amount of biomolecules of interest expressed;

(b) at least one gene that is transcribed; and

(c) at least one independently modulated gene set; wherein measurements of (3) identify one or more genes and/or one or more independently modulated gene sets.

3. A method of increasing the yield of biomolecule expression in a cell culture system, said method comprising:

(1) applying a mixing algorithm to a high-throughput device capable of providing a mixture matrix, wherein said mixture matrix comprises a multitude of (a) cell media formulation components, and (b) cell media formulation component conditions;

(2) culturing cells in the mixture matrix of (1); and

(3) measuring two or more or all of the following:

(a) the amount of biomolecules of interest expressed;

(b) at least one gene that is transcribed; and

(c) at least one independently modulated gene set; wherein measurements of (3) identify the cell media formulation components and the cell media formulation component conditions that comprise an optimized cell media formulation for increasing the yield of biomolecule expression in a cell culture system.

4. The method of any one of claims 1-3, optionally further comprising the steps of:

(1) identifying multiple optimized cell media formulations;

(2) mixing at least one cell media formulation component and condition from one identified optimized cell media formulation with at least one cell media formulation component and condition from a second identified optimized cell media formulation;

(3) culturing cells in the mixture of (2); and

(4) measuring two or more or all of the following:

(a) the amount and/or quality of biomolecules of interest expressed;

(b) at least one gene that is transcribed; and

(c) at least one independently modulated gene set.

5. The method of any one of claims 1-4, wherein said mixing algorithm is selected from the group consisting of a genetic algorithm, a naive Bayes algorithm, a differential evolution algorithm, and a particle swarm algorithm.

6. The method of claim 5, wherein the high-throughput device is selected from the group consisting of a liquid-handling robot, a droplet micro array system, a powder mixing system, and a microfluidic mixing system.

7. The method of any one of claims 1-6, wherein the mixture matrix comprises one or more multi-well plates, one or more controlled-release multi-well plates, and one or more multi-well or multi-vessel bioreactor systems.

8. The method of claim 7, wherein the multi-well plate comprises 6, 12, 24, 32, 48, 64, 96, 384, or 1,536 wells.

9. The method of any one of claims 1-8, wherein the cell media formulation components are selected from the group consisting of an analyte, a salt, a carbon source, a buffer, a nitrogen source, a pH, a temperature, a metal salt, a trace mineral, a biostimulant, a co-factor, a peptide, a modified peptide, a nucleic acid, a nucleic acid precursor, a small molecule, and a vitamin.

10. The method of claim 9, wherein the cell media formulation component conditions are selected from the group consisting of a concentration, a pH value, a temperature value, cell media formulation component conditions.

11. The method of any one of claims 1-10, wherein the culturing is performed under conditions that promote cell growth, said conditions comprising constant or intermittent shaking, constant or intermittent oxygen, constant or intermittent humidity, constant or intermittent temperature, pH control, feeding, aerobic cultivation, anaerobic cultivation, and solid-phase culturing.

12. The method of any one of claims 1-11, wherein the biomolecule of interest is a therapeutic protein, a growth factor, an enzyme, an antibody, a receptor, a nucleic acid-binding protein, an antigen, a ligand, a peptide, a biopolymer, a chemical, a drug, a flavor modifier, a single cell protein, an edible product, a texture modifier, a dye, a pesticide, a fungicide, a herbicide, a secondary metabolite, an acid, an oil, an alcohol, and a sugar alcohol, or fragments, analogs and fusions of any of the aforementioned biomolecules.

13. The method of claim 12, wherein the biomolecule is an antibody or fragment, analog or fusion thereof selected from the group consisting of a commercial antibody, a noncommercial antibody, a clinical antibody, a non-clinical antibody, a research-grade antibody, a diagnostic-grade antibody, a publicly-available antibody, an antibody derived from patient samples, a de novo antibody discovered in vivo, a de novo antibody discovered in vitro, or a de novo antibody discovered in silico, a monoclonal antibody, a human antibody, a humanized antibody, a camelised antibody, a chimeric antibody, single-chain Fvs (scFv), disulfide-linked Fvs (sdFv), Fab fragments, F(ab') fragments, anti-idiotypic (anti-Id) antibody, and epitope-binding fragments of any of the above.

14. The method of any one of claims 1-13, wherein the amount of biomolecules of interest expressed is measured, and wherein said measuring further comprises measuring the amount of biologically active biomolecules of interest and/or the stability of the biomolecules of interest.

15. The method of any one of claims 1-14, further comprising measuring cell growth.

16. The method of any one of claims 1-15, wherein at least one gene that is transcribed is measured, wherein said measuring comprises measuring the quantity and sequences of RNA.

17. The method of any one of claims 1-16, wherein at least one independently modulated gene set is measured, wherein said measuring comprises independent component analysis.

18. The method of any one of claims 1-17, wherein the cells are selected from the group consisting of eukaryotic cells, prokaryotic cells, bacterial cells, mammalian cells and insect cells.

19. The method of claim 18, wherein the cells are bacterial cells.

20. The method of claim 19, wherein the bacterial cells are E. coli cells.

21. The method of claim 20, wherein the E. coli cells comprise one or more or all of:

(a) an alteration of gene function of at least one gene encoding a transporter protein for an inducer of at least one inducible promoter;

(b) a reduced level of gene function of at least one gene encoding a protein that metabolizes an inducer of at least one inducible promoter;

(c) a reduced level of gene function of at least one gene encoding a protein involved in biosynthesis of an inducer of at least one inducible promoter;

(d) an altered gene function of a gene that affects the reduction/oxidation environment of the host cell cytoplasm;

(e) a reduced level of gene function of a gene that encodes a reductase;

(f) at least one expression construct encoding at least one disulfide bond isomerase protein;

(g) at least one polynucleotide encoding a form of DsbC lacking a signal peptide; and/or

(h) at least one polynucleotide encoding Erv1p.

22. A method of producing a biomolecule of interest comprising culturing a host cell comprising an expression construct encoding the biomolecule of interest in an optimized cell media formulation as determined by the method of claim 1.

23. A computing system for identifying an improved bioform substrate comprising: a high-throughput device controller; one or more processors; and one or more memories including computer-executable instructions that, when executed, cause the computing system to: cause, via the one or more processors, the high-throughput device to generate a matrix including a multitude of (a) substrate components, and (b) substrate component conditions; cause, via the one or more processors, the high-throughput device to inoculate the matrix; and cause, via the one or more processors, the high-throughput device to measure respective values of one or more or all of the following:

(a) an amount of a biomolecule produced in the inoculated matrix,

(b) at least one arrangement that is transcribed, and

(c) at least one independently modulated arrangement set by identifying the improved substrate; wherein the two or more measured respective values identify the substrate components and the substrate component conditions of the improved bioform substrate.

24. A computer-implemented method for improving quality of monoclonal antibodies, comprising:

(i) performing a media optimization using a genetic algorithm workflow;

(ii) harvesting broth at 68h, by purifying and measuring by an iCIEF assay;

(iii) selecting a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;

(iv) scaling one or more candidates to a bioreactor scale;

(v) collecting RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and

(vi) performing a downstream purification and analytical characterization on the product obtained from the use of the new improved media.

25. The computer-implemented method of claim 24, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.

26. The computer-implemented method of claim 24, further comprising: repeating step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.

27. A computing system for improving quality of monoclonal antibodies, comprising: a bioreactor, one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the system to:

(i) perform a media optimization using a genetic algorithm workflow;

(ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay;

(iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;

(iv) receive data corresponding to scaling one or more candidates to a bioreactor scale;

(v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and

(vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.

28. The computing system of claim 27, wherein the genetic algorithm workflow includes 60 mixtures (media compositions) and 24 media components (sugar, nitrogen, salt and trace metals), and wherein experiments are executed with a SoluPro E. coli strain expressing full-length MAb.

29. The computing system of claim 27, the one or more memories including instructions that when executed, cause the system to: repeat step (ii) one or more times, to yield populations with improved MAb quality in comparison to control media.

30. A non-transitory computer-readable media having stored thereon computer-executable instructions that, when executed, cause a computer to:

(i) perform a media optimization using a genetic algorithm workflow;

(ii) receive data corresponding to harvesting broth at 68h, by purifying and measuring by an iCIEF assay;

(iii) select a plurality of top candidates with the highest quality scores as parents for one or more subsequent rounds of mixtures and an evolutionary approach to recombination to produce a plurality of additional mixtures;

(iv) receive data corresponding to scaling one or more candidates to a bioreactor scale;

(v) receive RNAseq data or other sequencing data to compare the control and new improved media in the bioreactor; and

(vi) receive data corresponding to a downstream purification and analytical characterization on the product obtained from the use of the new improved media.
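
By way of non-limiting illustration only, and forming no part of the claims, the arrangement of a round of mixtures on a multi-well plate of the kind recited in claims 7 and 8 may be sketched as follows. The plate geometries, the row-major well naming (A1, A2, and so on), and the function names are assumptions introduced for the example; only a subset of the recited well counts is covered.

```python
import string

# Assumed (rows, columns) geometries for some of the recited well counts.
PLATE_SHAPES = {6: (2, 3), 12: (3, 4), 24: (4, 6), 48: (6, 8), 96: (8, 12), 384: (16, 24)}


def well_ids(n_wells):
    # Name wells row by row: a letter for the row, a 1-based number for the column.
    rows, cols = PLATE_SHAPES[n_wells]
    return [f"{string.ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]


def assign_wells(n_mixtures, n_wells=96):
    # Map each mixture index to one well; refuse to overfill a single plate.
    ids = well_ids(n_wells)
    if n_mixtures > len(ids):
        raise ValueError("more mixtures than wells on one plate")
    return {mixture: ids[mixture] for mixture in range(n_mixtures)}


# Example: 60 mixtures placed on one 96-well plate.
layout = assign_wells(60)
```

In such a sketch, the resulting mapping could be passed to a controller of the high-throughput device, and the remaining wells could be reserved for control media.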