CN117497038A - Method for rapidly optimizing culture medium formula based on kernel method
- Publication number
- CN117497038A (application number CN202311604082.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention provides a method for rapidly optimizing a culture medium formula based on a kernel method. Specifically, the invention uses a point exchange algorithm to expand the culture medium formula into a set of virtual formulas, uses a pseudo-determination coefficient to identify the important components, uses a kernel-based regression algorithm to build a yield prediction model, and uses an uncertainty algorithm to select candidate culture medium formulas. The method can rapidly optimize the culture medium components and obtain an optimal culture medium under limited conditions.
Description
Technical Field
The invention belongs to the field of biological culture media, and particularly relates to a method for rapidly optimizing a culture medium formula based on a kernel method.
Background
With the development of biotechnology, culture media have become very complex, with the number of components approaching or exceeding 100. Optimizing the medium composition is a particularly important part of the medium development process.
Because animal cell culture media are extremely complex in composition and variety, experimental designs with numerous factors and levels often need to be considered when carrying out different projects and selecting cell types. In this process, traditional development methods depend heavily on the expertise and experience of the researchers. In the traditional method, candidate components and ranges are first determined empirically, and the components are then screened using a factorial experimental design. The key components identified are then carried into a response surface design, and the component concentrations of the culture medium are optimized from the response surface fitting results.
This process depends heavily on the experience and knowledge of the researcher and proceeds by trial and error. Whether the candidate components and ranges are chosen effectively directly affects the experimental workload and duration of subsequent medium development. In recent years, the literature has reported using multivariate analysis software to process small-sample, multivariable experimental data from animal cell culture medium development. However, most current commercial multivariate analysis software is based on linear models; the complex nonlinear relationship between medium composition and yield reduces the effectiveness of linear analysis, so the yield stability required in actual production cannot be ensured.
In view of this, there is a need in the art to develop a general algorithm for cell culture medium development that can convert nonlinear models into linear models.
Disclosure of Invention
The invention aims to provide a method for rapidly optimizing culture medium components using a regression algorithm based on a kernel method, which can convert a nonlinear model into a linear model for developing a cell culture medium.
In a first aspect of the invention, a method for rapidly optimizing culture medium components based on kernel method fitting is provided, comprising the following steps:
(a) Providing a culture medium formula to be optimized, wherein the formula comprises p components, the concentration value of each component is expressed as $x_i$, and the corresponding data set is denoted $\{x_i\}$; i is a positive integer from 1 to p, and p is 10-150;
(b) Inputting the component concentration data set $\{x_i\}$ of the culture medium formula to be optimized into a point exchange algorithm, and using the point exchange algorithm to subject the concentration $x_i$ of each component to a first round of N random point exchange treatments, thereby obtaining N virtual optimized culture medium formulas, expressed as an N×p matrix, wherein N is the number of random point exchange treatments and is a positive integer greater than or equal to 30;
(c) Calculating the trace of the covariance matrix of the components for this round based on the N×p matrix;
(d) Repeating steps (b) and (c) for a further M-1 rounds, thereby obtaining the trace of the component covariance matrix for each round, wherein M is 200-500;
(e) Comparing the M traces obtained from the M rounds to determine the trace with the largest value and the N×p matrix corresponding to it, namely the trace-maximum matrix (or $T_{max}$ matrix); the concentration values of the components of each virtual optimized formula in the $T_{max}$ matrix are expressed as $x_i'$, the corresponding data set is $\{x_i'^{(n)}\}$, and the matrix is denoted N×p; wherein i is a positive integer from 1 to p, and n is a positive integer from 1 to N;
(f) Preparing N groups of culture media according to the N groups of virtual optimized culture medium formulas in the $T_{max}$ matrix, and performing a culture experiment to obtain N groups of experimental yield data, recorded as $y^{(n)}$;
(g) Fitting the component variables and the yield of the N groups of virtual optimized culture medium formulas in the $T_{max}$ matrix using a locally weighted regression model, thereby obtaining N yield prediction data, recorded as $\hat{y}^{(n)}$;
based on the yield prediction data $\hat{y}^{(n)}$ in step (g) and the experimental yield data $y^{(n)}$ in step (f), calculating the pseudo-determination coefficient $R_p^2$ of each component across the virtual optimized culture medium formulas using formula Q1, thereby obtaining the A components with the largest $R_p^2$, denoted the most important components (TopA components), where A is an integer from 3 to 10; deleting the other components from the $T_{max}$ matrix, the matrix of the remaining data set being denoted the N×TopA matrix;
$$R_p^2 = 1 - \frac{\sum_{n=1}^{N}\left(y^{(n)} - \hat{y}^{(n)}\right)^2}{\sum_{n=1}^{N}\left(y^{(n)} - \bar{y}\right)^2} \quad \text{(Q1)}$$
wherein:
$R_p^2$ is the pseudo-determination coefficient;
$y^{(n)}$ is the experimental yield data corresponding to the n-th virtual optimized culture medium formula in the culture experiment of step (f);
$\hat{y}^{(n)}$ is the yield prediction data obtained by fitting the n-th virtual optimized culture medium formula with the locally weighted regression model in step (g);
$\bar{y}$ is the average of all experimental yield data in step (f);
n is an integer from 1 to N;
(h) Modeling by using a kernel method, and analyzing the resulting nonlinear model with an uncertainty analysis algorithm to obtain a recommended culture medium, wherein the recommended culture medium has recommended concentrations of one or more components;
(i) Preparing the recommended culture medium obtained in step (h), and performing a cell culture experiment to obtain the yield of the target product;
(j) When the yield of the target product reaches the expected value, the optimization is finished; when the yield is lower than the expected value, repeating steps (b)-(i) until the yield of the target product reaches the expected value.
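Steps (b) to (e) can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not specify how a round of random point exchange is carried out, so `random_design` below is a hypothetical stand-in that draws each concentration uniformly within ±`spread` of the base value, and `base_formula`, `spread` and the function names are assumptions made for illustration only.

```python
import random

def covariance_trace(design):
    """Trace of the covariance matrix, i.e. the sum of per-component sample variances."""
    n = len(design)
    total = 0.0
    for j in range(len(design[0])):
        col = [row[j] for row in design]
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col) / (n - 1)
    return total

def random_design(base, n_formulas, spread, rng):
    """Assumed stand-in for one round of N random point exchanges:
    every concentration is drawn uniformly within +/- spread of its base value."""
    return [[x * (1 + rng.uniform(-spread, spread)) for x in base]
            for _ in range(n_formulas)]

def select_t_max(base, n_formulas=30, n_rounds=200, spread=0.5, seed=0):
    """Run M rounds and keep the N x p matrix whose covariance trace is largest."""
    rng = random.Random(seed)
    best_trace, best_design = float("-inf"), None
    for _ in range(n_rounds):
        design = random_design(base, n_formulas, spread, rng)
        t = covariance_trace(design)
        if t > best_trace:
            best_trace, best_design = t, design
    return best_trace, best_design

base_formula = [4.0, 200.0, 75.0, 10.0]  # hypothetical concentrations, p = 4
trace, t_max = select_t_max(base_formula)
```

Because the trace sums raw variances, components with large concentrations dominate the criterion in this sketch; scaling the columns first would be one way to balance them, although the source does not say whether that is done.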
In another preferred embodiment, the step (h) specifically includes the following steps:
(h1) Based on the experimental yield data $y^{(n)}$ obtained in step (f), determining the concentration value of each component in the culture medium with the highest yield, and retaining the TopA components and their corresponding concentration values according to the pseudo-determination coefficient results in step (g); the result is denoted the $CM_{max}$ culture medium;
(h2) Taking the concentration value of each component in the $CM_{max}$ culture medium as a reference value, randomly generating candidate concentration values within a range of 20% above and below the reference value, thereby obtaining a plurality of groups of candidate culture medium formulas;
(h3) Deleting one group of data from the $T_{max}$ matrix, without repeating a previous deletion, so that N-1 groups of data remain, the matrix being expressed as (N-1)×p;
(h4) Converting the (N-1)×p matrix of step (h3) using the kernel function shown in formula Q2, thereby extending the linear model to a nonlinear model;
wherein K(·) is the matrix obtained by applying the kernel function; D is the matrix of formulas by components, i.e., the (N-1)×p matrix of step (h3); and $D^T$ is the transpose of D;
(h5) Establishing a yield prediction model Q4 based on the nonlinear model obtained in step (h4), and fitting the yields of the candidate culture medium formulas obtained in step (h2) with the yield prediction model Q4 to obtain predicted yields;
$$f(D) = wK(D, D^T) + b \quad \text{(Q4)}$$
wherein w is a weight coefficient and b is a bias coefficient;
(h6) Repeating steps (h3) to (h5) up to N-1 times, wherein each time a group of data different from those previously deleted is deleted in step (h3), and the matrix of the remaining data is each time expressed as (N-1)×p;
(h7) Evaluating the predicted yields of the candidate culture medium formulas obtained in steps (h5)-(h6), and using the culture medium formulas having the following characteristics as recommended culture medium formulas for a cell culture experiment:
1) the candidate culture medium formula corresponding to the highest predicted yield in the predicted yield data;
2) the candidate culture medium formula corresponding to the maximum variance, where the variance of the predicted yields is calculated for each group of candidate culture media;
3) the candidate culture medium formula corresponding to the maximum sum Σ, where Σ is the sum of the predicted yield and its variance, calculated for each group of candidate culture media.
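The three selection rules of step (h7) can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: `predictions[m][c]` is taken to be the yield predicted for candidate c by the m-th leave-one-out model, the "highest predicted yield" is taken as the highest mean prediction across models, and the function name `recommend` and the example numbers are hypothetical.

```python
from statistics import mean, pvariance

def recommend(predictions):
    """Apply the three rules to a models x candidates table of predicted yields.

    Returns the index of the candidate chosen by each rule.
    """
    n_candidates = len(predictions[0])
    means, variances = [], []
    for c in range(n_candidates):
        column = [model[c] for model in predictions]
        means.append(mean(column))           # assumed: average over the models
        variances.append(pvariance(column))  # spread of the models = uncertainty
    sums = [m + v for m, v in zip(means, variances)]

    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)

    return {
        "highest_yield": argmax(means),         # rule 1: exploit
        "highest_variance": argmax(variances),  # rule 2: explore
        "highest_sum": argmax(sums),            # rule 3: trade-off of both
    }

# Hypothetical predictions from 3 models for 3 candidate formulas.
predictions = [
    [3.0, 4.0, 3.5],
    [3.2, 4.1, 2.5],
    [2.8, 3.9, 4.5],
]
picks = recommend(predictions)
```

In this toy table the second candidate has the best average prediction, while the third is the one the models disagree on most, so rules 2 and 3 steer the next experiment toward it.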
In another preferred embodiment, the $CM_{max}$ culture medium consists of the TopA components, at their corresponding concentrations, of the culture medium that gave the highest yield in step (f).
In another preferred embodiment, in step (f) and step (i), the cells used in the culture experiment are selected from the group consisting of: CHO cells, MDCK cells, BHK cells, sf9 cells, highFive cells, 293 cells, MDBK cells, F81 cells, DF-1 cells, LMH cells, vero cells, PK15 cells, ST cells, marc145 cells, hybridoma cells, diploid cells, immune effector cells; preferably CHO cells, MDCK cells, sf9 cells, vero cells; more preferably CHO cells.
In another preferred embodiment, in step (h2), the candidate concentration is randomly generated within a range of 15% above and below the reference value; preferably within a range of 12% above and below the reference value, more preferably 10% or 8%.
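Candidate generation around the reference medium can be sketched in a few lines of Python. The uniform distribution, the function name `generate_candidates` and the reference values in `cm_max` are assumptions for illustration; the source only states that candidate concentrations are generated randomly within the given percentage range.

```python
import random

def generate_candidates(reference, n_candidates, rel_range=0.15, seed=0):
    """Draw each candidate concentration uniformly within +/- rel_range
    of the corresponding reference concentration (assumed distribution)."""
    rng = random.Random(seed)
    return [[c * (1 + rng.uniform(-rel_range, rel_range)) for c in reference]
            for _ in range(n_candidates)]

cm_max = [4.2, 208.0, 75.0]                # hypothetical TopA reference concentrations
candidates = generate_candidates(cm_max, 300)
```

Narrowing `rel_range` to 0.12, 0.10 or 0.08 corresponds to the tighter ranges named as preferred.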
In another preferred embodiment, in step (h5), the weight coefficient w and the bias coefficient b are obtained by minimizing the loss function shown in formula Q3:
$$\min_{w,b} \left(y - wK(D, D^T) - b\right)^2 \quad \text{(Q3)}$$
wherein the parameters are as defined in claim 2.
In another preferred embodiment, N is 50-150, preferably 60-100.
In another preferred embodiment, p is 20 to 100; preferably p is 20, 30, 50, or 80.
In another preferred embodiment, in step (g), the locally weighted regression model is a LOWESS model.
In another preferred embodiment, in step (g), the components with $R_p^2$ > 0.5 are designated as the TopA components.
In another preferred embodiment, in step (g), A is an integer from 3 to 5.
In another preferred embodiment, p is 100 or 200.
It is understood that within the scope of the present invention, the above-described technical features of the present invention and the technical features specifically described below (e.g., in the examples) may be combined with each other to constitute new or preferred technical solutions. Owing to space limitations, these are not described one by one here.
Drawings
FIG. 1 is a flow chart of the culture medium optimization algorithm of the present invention.
FIG. 2 shows the component importance scores in Example 1.
FIG. 3 shows the response values obtained by the uncertainty algorithm in Example 1.
Detailed Description
Through extensive and intensive research, the inventors have for the first time provided a method for rapidly optimizing culture medium components using a regression algorithm based on a kernel method. The method does not depend on the experience of the developer; relying only on existing experimental data, it converts the nonlinear relationship between the multivariable culture medium components and the yield into a linear relationship, derives the culture medium formula with the optimal yield, and greatly reduces development time and cost. The present invention was completed on this basis.
Terminology
It should be understood that in this application a matrix is expressed as (number of rows) × (number of columns). As used herein, the rows correspond to the point exchanges, i.e., to the virtual optimized formulas; the columns correspond to the components constituting the culture medium, excluding cells and yields. In general, the matrix is denoted N×p, where N is the number of virtual optimized culture medium formulas, N-1 is the number of formulas remaining after one formula is removed, and p is the number of individual components.
TopA in this application denotes the A most important culture medium components, irrespective of concentration: either the components whose pseudo-determination coefficient is greater than 0.5 (or 0.55 or 0.6), or the 3-10 components with the largest pseudo-determination coefficients.
In step (g), the component variables and the yield of the N groups (for example, 50 groups) of virtual optimized culture media in the $T_{max}$ matrix are fitted with a locally weighted regression model, the fitted yields and the actual culture results are substituted into the pseudo-determination coefficient formula Q1, and the 3-10 most important components are determined from the resulting pseudo-determination coefficients;
the pseudo-determination coefficient relates to the components as follows: a model of the yield is first built using one component at a time, until models of all components against the yield are obtained; the closer a model's predictions are to the measured yield, the larger its pseudo-determination coefficient and the stronger the association between that component and the yield.
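The per-component ranking can be sketched in Python as below. This is a simplified illustration, not the patent's LOWESS implementation: `lowess_fit` is a hand-written locally weighted linear fit with tricube weights and no robustness iterations, the pseudo-determination coefficient is computed in the usual form 1 - SS_res/SS_tot (assumed to match formula Q1), and `frac`, `rank_components` and the toy data are assumptions.

```python
def lowess_fit(x, y, frac=0.6):
    """Locally weighted linear fit, evaluated at every observed x (tricube weights)."""
    n = len(x)
    k = max(2, int(round(frac * n)))          # number of neighbours in each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12             # window half-width
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def pseudo_r2(y, y_hat):
    """1 - SS_res / SS_tot; undefined when all measured yields are identical."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rank_components(design, yields, top_a=3):
    """Fit yield against one component at a time and rank components by pseudo R^2."""
    p = len(design[0])
    scores = [pseudo_r2(yields, lowess_fit([row[j] for row in design], yields))
              for j in range(p)]
    order = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return order[:top_a], scores

# Toy data: yield depends only on component 0; component 1 is unrelated.
design = [[float(i), float((i * 7) % 5)] for i in range(10)]
yields = [2.0 * row[0] + 1.0 for row in design]
top, scores = rank_components(design, yields, top_a=1)
```

Here the component that drives the yield obtains a coefficient close to 1 and is ranked first, which is the behaviour the text describes.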
Point exchange algorithm
The point exchange algorithm used in the invention is a point exchange algorithm known in the art, commonly used to search for optimal experimental points within a limited design space. Specifically, the invention provides an algorithm that expands the target experimental data to generate multiple groups of initial experimental data.
It generally comprises the following steps:
simplex data points are generated, and random point exchanges are performed on them so that the correlation ρ between variables is minimized and the variance is maximized. The basic idea of the candidate point exchange algorithm is to exchange data points iteratively and to use a statistical model to predict the effect of each exchange.
Specifically, in the present invention, the point exchange algorithm includes the following steps:
1. Generating experimental design points with a simplex.
2. Calculating the criteria that measure the information gain obtained by each exchange, here the correlation and the variance.
3. Selecting the exchange that maximizes the criteria, and updating the design.
4. Repeating steps 2-3 until convergence.
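The iterative exchange loop described above can be sketched as a greedy search. Several details are assumptions, since the source does not give them: the candidate points are a hypothetical grid in coded units rather than a simplex design, and `criterion` combines the two named quantities as "total variance minus summed absolute pairwise correlation", which is only one possible scoring rule.

```python
import itertools
import random

def criterion(design):
    """Assumed score: total variance minus the summed absolute pairwise correlation."""
    n = len(design)
    centered = []
    for col in zip(*design):
        m = sum(col) / n
        centered.append([v - m for v in col])
    var = [sum(v * v for v in c) / (n - 1) for c in centered]
    score = sum(var)
    for a, b in itertools.combinations(range(len(centered)), 2):
        if var[a] > 0 and var[b] > 0:
            cov = sum(u * v for u, v in zip(centered[a], centered[b])) / (n - 1)
            score -= abs(cov) / (var[a] * var[b]) ** 0.5
    return score

def point_exchange(candidates, n_points, seed=0, max_iter=100):
    """Greedy point exchange: apply the single best swap per pass until none helps."""
    rng = random.Random(seed)
    design = rng.sample(candidates, n_points)         # step 1: initial design
    best = criterion(design)
    for _ in range(max_iter):
        best_swap = None
        for i, cand in itertools.product(range(n_points), candidates):
            trial = design[:i] + [cand] + design[i + 1:]
            score = criterion(trial)                  # step 2: score each exchange
            if score > best + 1e-12:
                best, best_swap = score, (i, cand)
        if best_swap is None:                         # step 4: converged
            break
        design[best_swap[0]] = best_swap[1]           # step 3: update the design
    return design, best

# Hypothetical candidate set: a 5 x 5 grid in coded units for two components.
candidates = [(a / 4, b / 4) for a in range(5) for b in range(5)]
design, score = point_exchange(candidates, n_points=6)
```

Each accepted swap strictly increases the score and the number of possible designs is finite, so the loop terminates; the result is a local optimum that depends on the random starting design.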
Kernel method
The kernel method is a nonlinear method based on a linear algorithm, which converts the original input space into another high-dimensional feature space so that the original data is linearly separable in the feature space, and the conversion is performed by a kernel function, which is a function that can calculate an inner product. The basic idea of this algorithm is to map data from an original space to a high-dimensional feature space such that the original nonlinear relationship exhibits a linear relationship in the feature space.
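The statement that a kernel function computes an inner product in a higher-dimensional feature space can be checked numerically. The sketch below uses a degree-2 polynomial kernel on two-dimensional inputs purely as an illustration; the source does not state which kernel the invention uses, and the function names are hypothetical.

```python
def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def feature_map(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly_kernel(x, z)
inner_product = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))
```

Both quantities agree up to floating-point rounding, so a model that is linear in the three mapped features is quadratic in the original two inputs, without the mapped features ever having to be formed when the kernel is used.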
The kernel method is widely applied to the fields of machine learning and data mining. It can be used for classification, regression, clustering, dimension reduction and other tasks. The method has wide application in the fields of image recognition, natural language processing, bioinformatics, financial prediction and the like. The kernel approach can significantly improve the performance of machine learning algorithms, especially when processing nonlinear data. By providing higher classification and regression accuracy, the kernel method may improve the efficiency of the prediction or classification task while reducing classification or regression errors.
The invention provides an algorithm for modeling experimental data and deriving the optimal point based on a kernel method, which comprises the following steps: fitting the data by kernel regression, generating data points in the experimental design space with a candidate point generation algorithm, and finally predicting and screening with the model.
The specific steps of the regression algorithm based on the kernel method are as follows:
1. Mapping the original data to a high-dimensional feature space by using a kernel function;
2. Fitting the data in the high-dimensional feature space by using a linear regression model to obtain a prediction function model:
$$f(D) = wK(D, D^T) + b \quad \text{(Q4)}$$
3. Evaluating and adjusting the model by cross-validation or similar methods (comparing predicted values with actual experimental values).
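A model of the form Q4 can be fitted with a short, dependency-free Python sketch. The choices below are assumptions rather than the patent's method: a Gaussian (RBF) kernel with parameter `gamma`, a small `ridge` term added for numerical stability, and the simplification of fixing b to the mean yield before solving for w.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian kernel (assumed; the source does not name the kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(D, y, gamma=1.0, ridge=1e-3):
    """Return (w, b) for f(x) = sum_i w_i * k(x, d_i) + b."""
    b = sum(y) / len(y)                      # assumed: bias fixed to the mean yield
    K = [[rbf(di, dj, gamma) + (ridge if i == j else 0.0)
          for j, dj in enumerate(D)] for i, di in enumerate(D)]
    w = solve(K, [yi - b for yi in y])
    return w, b

def predict(x, D, w, b, gamma=1.0):
    return sum(wi * rbf(x, di, gamma) for wi, di in zip(w, D)) + b

# Hypothetical one-component data with a nonlinear (quadratic) yield response.
D = [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,)]
y = [d[0] ** 2 for d in D]
w, b = fit(D, y, gamma=2.0)
new_yield = predict((1.25,), D, w, b, gamma=2.0)
```

The model is linear in the kernel columns, which is why an ordinary linear solve is enough even though the fitted response is nonlinear in the concentrations. Step 3 would then compare such predictions against held-out experimental values.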
Method for rapidly optimizing culture medium components based on kernel method fitting
The invention provides a regression algorithm based on a kernel method, which maps data from an original space to a high-dimensional characteristic space, so that a nonlinear relation between a culture medium component and yield is converted into a linear relation, and a culture medium component formula with optimal yield is obtained.
Specifically, the invention provides a method for rapidly optimizing the formula components of a culture medium, which comprises the following steps:
step 1: providing a culture medium formula to be optimized, and expanding the formula by using a point exchange algorithm to obtain a formula matrix;
step 2: screening the components with the highest importance by using the pseudo-determination coefficients, and removing the other components from the formula matrix to obtain a formula matrix to be fitted;
step 3: predicting the yield of each formula in the formula matrix to be fitted by using a kernel function;
step 4: generating a recommended culture medium formula by using an uncertainty analysis method;
step 5: performing a cell culture experiment to verify a recommended culture medium formula; ending the optimization when the yield of the target product meets the expected value; and if the expected value is not met, repeating the steps 2-4 until the yield of the target product in the recommended culture medium formula reaches the expected value.
Compared with the prior art, the invention has the main advantages that:
(a) The nonlinear relation between the components of the culture medium formula and the yield can be converted into a linear relation by only relying on past data without depending on experience of professionals, and the culture medium formula with the optimal yield is developed.
(b) The time cost, reagent cost and labor cost incurred by cell culture during the development of culture medium components are greatly reduced, and the method has broad application prospects.
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. The experimental methods, in which specific conditions are not noted in the following examples, are generally conducted under conventional conditions or under conditions recommended by the manufacturer. Percentages and parts are by weight unless otherwise indicated.
Example 1
The basic culture medium is optimized by the method of the invention, and the product content is tested.
1. Providing an initial culture medium formula 0 to be optimized; 55 components are included in the formulation.
Table 1: initial culture medium formula and concentration of each component
2. Inputting the culture medium formula to be optimized and the concentration data of each component into the point exchange algorithm, and carrying out a first round of N random point exchange treatments, wherein N is 50; a 50×55 matrix as shown in Table 2 was obtained.
Table 2: N×p matrix obtained by the first round of random point exchange treatments
3. Calculating the trace of the covariance matrix of the components X for the first round based on the matrix of Table 2.
4. Repeating step 2 1000 times (i.e., carrying out 1000 rounds of N random point exchange treatments) to obtain 1000 N×p matrices, and calculating the trace of the X covariance matrix of each;
5. Determining the largest trace and its corresponding $T_{max}$ matrix based on the results of step 4, as shown in Table 3;
Table 3: $T_{max}$ matrix corresponding to the largest trace among those of step 4
6. Cell culture experiments were performed by preparing a medium according to the 50 sets of medium composition data obtained in step 5, and obtaining a response amount, which is the yield of a product (e.g., protein) expressed by the cells.
TABLE 4 relation between the component concentration (X) of the medium and the yield of the product
7. Using a locally weighted regression model, fitting the component variables and the yield of the 50 groups of culture media in the $T_{max}$ matrix of Table 3 to obtain 50 yield predictions; based on these 50 yield predictions and the 50 measured yields from actual cell culture in Table 4, calculating the pseudo-determination coefficients using formula Q1; determining from their values the 3 most important components, namely X1, X3 and X55; and deleting the other components to obtain an N×Top3 matrix, as shown in Table 5;
TABLE 5
8. Introducing the kernel method, establishing a nonlinear model, and analyzing the nonlinear model with an uncertainty analysis algorithm to obtain the recommended concentrations of one or more components for the next batch of culture medium formulas.
8.1. Selecting the formula corresponding to the highest yield in Table 5 of step 7, namely formula 4;
8.2. Based on formula 4 (listed as formula 0 in Table 7 below), adding a random perturbation of ±15% to the concentration of each component in formula 4 to generate candidate concentrations, giving a total of 300 candidate culture medium formulas;
TABLE 7 candidate Medium formulations
8.3. Randomly deleting one row of data from the $T_{max}$ matrix of step 5 to obtain an (N-1)×p matrix, as shown in Table 6;
TABLE 6
8.4. Applying the mapping conversion of formula Q2 to the (N-1)×p matrix in Table 6 to obtain a nonlinear model,
wherein D is the (N-1)×p matrix;
minimizing the loss function shown in formula Q3 and taking the values of w and b at the minimum, where w is a weight coefficient and b is a bias coefficient:
$$\min_{w,b} \left(y - wK(D, D^T) - b\right)^2 \quad \text{(Q3)}$$
establishing the yield prediction model of formula Q4 based on the nonlinear model and the two coefficients (w and b):
$$f(D) = wK(D, D^T) + b \quad \text{(Q4)}$$
fitting to obtain the nonlinear yield prediction model f1;
8.5. Steps 8.3-8.4 were repeated 10 times, yielding 10 yield prediction models (denoted f1, f2, …, f10, respectively).
8.6. Yield data for the 300 sets of candidate medium formulations obtained in step 8.2 were predicted based on the 10 sets of yield prediction models obtained in step 8.5.
TABLE 8
8.7. Based on all the yield data obtained in step 8.6, recommended medium formulations were determined using each of the following evaluation criteria:
1) the candidate medium formula corresponding to the highest predicted yield in the yield prediction data;
2) the candidate medium formula corresponding to the maximum variance, the variance being calculated for each group of candidate media;
3) the candidate medium formula corresponding to the maximum sum Σ, where Σ is the sum of the predicted yield and the variance for each group of candidate media.
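The three criteria of step 8.7 can be sketched as below. The text does not say how the predictions of the individual models are combined, so the following are assumptions: the predicted yield of a candidate is taken as the mean over the models, the variance is the population variance over the models, and one candidate is returned per criterion. The function name and the example numbers are illustrative only.

```python
from statistics import mean, pvariance

def recommend(predictions):
    """Pick one candidate index per criterion.

    predictions[m][c] is the yield predicted by model m for candidate c.
    """
    n_cand = len(predictions[0])
    per_candidate = [[model[c] for model in predictions] for c in range(n_cand)]
    means = [mean(p) for p in per_candidate]
    variances = [pvariance(p) for p in per_candidate]
    sums = [m + v for m, v in zip(means, variances)]

    def argmax(values):
        return max(range(n_cand), key=lambda c: values[c])

    return {
        "highest_yield": argmax(means),         # criterion 1
        "highest_variance": argmax(variances),  # criterion 2
        "highest_sum": argmax(sums),            # criterion 3
    }

# Illustrative predictions from 2 models for 3 candidates.
picks = recommend([[1.0, 2.0, 3.5], [1.0, 4.0, 3.5]])
```

Criterion 1 favors candidates the models agree are good, criterion 2 favors candidates where the models disagree most, and criterion 3 balances the two.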
8.8. The following recommended candidate medium formulations were obtained and used for cell culture, respectively: 20, 25, 30, 200, 201 and 252.
TABLE 9 Recommended candidate medium formulations
9. The recommended candidate culture media were prepared and used for cell culture and expression of the target product.
Cell culture method: the candidate media were prepared separately according to the formulations in Table 9 using preparation methods well known to those skilled in the art, and sterilized for use. CHO cells expressing the target antibody were thawed from liquid nitrogen, then passaged and expanded to obtain a sufficient number of cells. After centrifugation, the cells were inoculated at a density of $1\times10^{6}$ cells/ml into TPP cell culture shake tubes, and the recommended candidate medium was added to each experimental group at a loading volume of 20 ml/tube. The cells were cultured in a two-stage (basal + fed-batch) procedure, with the same feed medium and feeding strategy used for the different experimental groups, and antibody production was measured at the end of 14 days. The formulation of each recommended candidate medium and its corresponding antibody production are shown in Table 10.
TABLE 10
Medium number | X1 | X3 | X55 | Y (g/L) |
---|---|---|---|---|
20 | 4.5 | 209.0 | 74.0 | 4.2 |
25 | 4.0 | 208.5 | 75.0 | 4.1 |
30 | 4.2 | 208.0 | 75.1 | 3.6 |
200 | 4.2 | 208.0 | 75.0 | 3.6 |
201 | 3.9 | 208.0 | 75.2 | 4.0 |
252 | 4.5 | 208.3 | 75.2 | 3.4 |
The results indicated that a yield of up to 4.2 g/L was achieved after the first round of optimization. Compared with traditional culture medium development methods for optimizing at least four components, this method achieves the aim of optimizing the components with only two rounds of culture experiments, which greatly reduces the time, reagent and labor costs incurred by cell culture during the development of culture medium components, and it therefore has broad application prospects.
All documents mentioned in this application are incorporated by reference as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the claims appended hereto.
Claims (10)
1. A method for rapidly optimizing culture medium components based on kernel method fitting, characterized in that the method comprises:
(a) Providing a culture medium formula to be optimized, wherein the formula comprises p components, each component has a concentration value with a corresponding data set, i is a positive integer from 1 to p indexing the components, and p is 10 to 150;
(b) Inputting the component concentration data set of the culture medium formula to be optimized into a point exchange algorithm, and using the point exchange algorithm to perform a first round of N random point exchange treatments on the concentration of each component, thereby obtaining N virtual optimized culture medium formulas, expressed as an N×p matrix, wherein N is the number of random point exchange treatments and is a positive integer greater than or equal to 30;
(c) Calculating the trace of the covariance matrix of the components for that round based on the N×p matrix;
(d) Repeating steps (b) and (c) for a further M-1 rounds, thereby obtaining the trace of the covariance matrix of the components for each round, wherein M is 200 to 500;
(e) Comparing the M traces obtained from the M rounds to determine the trace with the largest value and the N×p matrix corresponding to it, namely the trace-maximum matrix (or $T_{max}$ matrix); in the $T_{max}$ matrix, the concentration value of each component of each virtual optimized formula is expressed as $x'_i$ with a corresponding data set, and the matrix is denoted N×P, wherein i is a positive integer from 1 to p, and n is a positive integer from 1 to N;
(f) Preparing N groups of culture media according to the N groups of virtual optimized culture medium formulas in the $T_{max}$ matrix, and performing a culture experiment to obtain N groups of experimental yield data;
(g) Fitting the component variables and the yields of the N groups of virtual optimized culture medium formulas in the $T_{max}$ matrix using a locally weighted regression model to obtain N yield prediction data;
Based on the yield prediction data in step (g) and the experimental yield data in step (f), calculating the pseudo-determination coefficient $R^2_p$ of the virtual optimized culture medium formulas using formula Q1, thereby obtaining the A components with the largest $R^2_p$, denoted the most important components (TopA components), wherein A is an integer from 3 to 10; deleting the other components from the $T_{max}$ matrix, the matrix of the remaining data set being denoted an N×TopA matrix;
wherein:
$R^2_p$ is the pseudo-determination coefficient;
$y^{(n)}$ is the experimental yield data corresponding to the n-th virtual optimized culture medium formula in the culture experiment of step (f);
the predicted yield is the yield prediction data obtained by fitting the n-th virtual optimized culture medium formula with the locally weighted regression model in step (g);
the mean yield is the average of all experimental yield data in step (f);
n is an integer from 1 to N;
(h) Modeling with a kernel method, and analyzing the nonlinear model obtained by the kernel method modeling with an uncertainty analysis algorithm to obtain a recommended culture medium, wherein the recommended culture medium has recommended concentrations of one or more components;
(i) Preparing the recommended culture medium obtained in the step (h), and performing a cell culture experiment to obtain the yield of the target product;
(j) Ending the optimization when the yield of the target product reaches an expected value; and, when the yield is lower than the expected value, repeating steps (b) to (i) until the yield of the target product reaches the expected value.
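Steps (b) to (e) of claim 1 can be sketched as below. The point exchange algorithm itself is not detailed in this part of the text, so `random_design` is a hypothetical stand-in that scales each base concentration by a randomly chosen level. The levels, the base values and the function names are assumptions. The sketch relies on one fact: the trace of a covariance matrix equals the sum of the per-column variances.

```python
import random

def covariance_trace(matrix):
    """Trace of the sample covariance matrix, i.e. the sum of column variances."""
    n = len(matrix)
    total = 0.0
    for col in zip(*matrix):
        m = sum(col) / n
        total += sum((v - m) ** 2 for v in col) / (n - 1)
    return total

def random_design(base, n_rows, levels, rng):
    """Hypothetical stand-in for one round of N random point exchanges."""
    return [[c * rng.choice(levels) for c in base] for _ in range(n_rows)]

def best_design(base, n_rows=50, n_rounds=200, levels=(0.5, 1.0, 1.5), seed=None):
    """Run n_rounds rounds and keep the N x p matrix with the largest trace."""
    rng = random.Random(seed)
    designs = (random_design(base, n_rows, levels, rng) for _ in range(n_rounds))
    return max(designs, key=covariance_trace)

# Hypothetical two-component base formula; 30 formulas per round, 20 rounds.
t_max = best_design([1.0, 2.0], n_rows=30, n_rounds=20, seed=1)
```

Maximizing the trace favors the design whose component concentrations are most spread out, so the later culture experiments cover the concentration space as widely as possible.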
2. The method according to claim 1, wherein in step (h), the method specifically comprises the steps of:
(h1) Based on the experimental yield data obtained in step (f), determining the concentration value of each component in the culture medium with the highest yield, and retaining the TopA components and their corresponding concentration values according to the pseudo-determination coefficient result in step (g), denoted the $CM_{max}$ culture medium;
(h2) Taking the concentration value of each component in the $CM_{max}$ culture medium as a reference value, randomly generating candidate concentration values within a range of 20% above and below the reference value, thereby obtaining a plurality of groups of candidate culture medium formulas;
(h3) Deleting one group of data from the $T_{max}$ matrix without repetition, the remaining N-1 groups of data being expressed as an (N-1)×p matrix;
(h4) Converting the (N-1)×p matrix of step (h3) using the kernel function shown in formula Q2, thereby extending the linear model to a nonlinear model;
wherein,
$K(\cdot)$ is the matrix obtained using the kernel function; D is the matrix of formulas by components, i.e., the (N-1)×p matrix of step (h3); and $D^{T}$ represents the transpose of D;
(h5) Establishing a yield prediction model Q4 based on the nonlinear model obtained in step (h4), and fitting the yields of the candidate culture medium formulas obtained in step (h2) with the yield prediction model Q4 to obtain predicted yields;
$$f(D) = wK(D, D^{T}) + b \qquad \text{(Q4)}$$
wherein w is a weight coefficient and b is a bias coefficient;
(h6) Repeating steps (h3) to (h5) up to N-1 times, wherein in each repetition of step (h3) a group of data different from those previously deleted is deleted, and the matrix of the remaining data is each time expressed as (N-1)×p;
(h7) Evaluating the predicted yields of the candidate culture medium formulas obtained in steps (h5) to (h6), and using culture medium formulas having the following characteristics as recommended culture medium formulas for the cell culture experiment:
1) the candidate culture medium formula corresponding to the highest predicted yield in the predicted yield data;
2) the candidate culture medium formula corresponding to the maximum variance, the variance being calculated for each group of candidate culture media;
3) the candidate culture medium formula corresponding to the maximum sum Σ, where Σ is the sum of the predicted yield and the variance for each group of candidate culture media.
3. The method of claim 1, wherein in step (f) and step (i), the cells used in the culture experiment are selected from the group consisting of: CHO cells, MDCK cells, BHK cells, sf9 cells, highFive cells, 293 cells, MDBK cells, F81 cells, DF-1 cells, LMH cells, vero cells, PK15 cells, ST cells, marc145 cells, hybridoma cells, diploid cells, immune effector cells; preferably CHO cells, MDCK cells, sf9 cells, vero cells; more preferably CHO cells.
4. The method of claim 2, wherein in step (h2), the candidate concentration is randomly generated within a range of 15% above and below the reference value; preferably within a range of 12% above and below the reference value, more preferably 10% or 8%.
5. The method of claim 2, wherein in step (h5), the weight coefficient w and the bias coefficient b are obtained by minimizing the loss function represented by formula Q3;
$$\min_{w,b}\,\bigl(y - wK(D, D^{T}) - b\bigr)^{2} \qquad \text{(Q3)}$$
wherein the parameters are as defined in claim 2.
6. A method according to claim 1, characterized in that N is 50-150, preferably 60-100.
7. The method of claim 1, wherein p is 20 to 100; preferably p is 20, 30, 50, or 80.
8. The method of claim 1, wherein in step (g), the locally weighted regression model is a LOWESS model.
9. The method of claim 1, wherein in step (g), components with $R^2_p$ > 0.5 are designated as TopA components.
10. The method of claim 1, wherein in step (g), A is an integer from 3 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311604082.6A CN117497038B (en) | 2023-11-28 | 2023-11-28 | Method for rapidly optimizing culture medium formula based on nuclear method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117497038A true CN117497038A (en) | 2024-02-02 |
CN117497038B CN117497038B (en) | 2024-06-25 |
Family
ID=89682893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311604082.6A Active CN117497038B (en) | 2023-11-28 | 2023-11-28 | Method for rapidly optimizing culture medium formula based on nuclear method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117497038B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN117497038B (en) | 2024-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||