CN117497038A - Method for rapidly optimizing culture medium formula based on kernel method

Info

Publication number: CN117497038A
Application number: CN202311604082.6A
Authority: CN
Inventors: 刘旭平; 范里
Original assignee: Shanghai Bioengine Biotechnology Co ltd
Current assignee: Shanghai Bioengine Biotechnology Co ltd
Priority date: 2023-11-28
Filing date: 2023-11-28
Publication date: 2024-02-02
Anticipated expiration: 2043-11-28
Also published as: CN117497038B

Abstract

The invention provides a method for rapidly optimizing a culture medium formula based on a kernel method. Specifically, the invention uses a point exchange algorithm to expand the culture medium formula into a set of virtual formulas, uses a pseudo-determination coefficient to identify the important components, uses a kernel-method regression algorithm to build a model that predicts yield, and uses an uncertainty algorithm to select candidate culture medium formulas. The method can rapidly optimize the components of the culture medium and obtain an optimal culture medium under limited conditions.

Description

Method for rapidly optimizing culture medium formula based on kernel method

Technical Field

The invention belongs to the field of biological culture media, and particularly relates to a method for rapidly optimizing a culture medium formula based on a kernel method.

Background

With the development of biotechnology, culture media have become very complex, and the number of components approaches or exceeds 100. Optimizing the medium composition is a particularly important part of the medium development process.

Because animal cell culture media are extremely complex in composition and variety, experimental designs with many factors and levels often have to be considered when studying different projects and selecting cell types. Traditional development methods depend heavily on the expertise and experience of the researchers. In the traditional method, candidate components and ranges are first determined empirically, and the components are then screened using a factorial experimental design. A response surface design is then applied to the key components identified, and the concentrations of the medium components are optimized from the response surface fitting results.

This process is highly dependent on the experience and knowledge of the researcher and is a trial-and-error process. Whether the candidate components and ranges are chosen effectively directly affects the experimental workload and duration of subsequent medium development. In recent years, the literature has reported using multivariate analysis software to process small-sample, multivariable experimental data from animal cell culture medium development. However, most current commercial multivariate analysis software is based on linear models; the complex nonlinear relationship between medium composition and yield reduces the effectiveness of linear analysis, so the requirement for stable, controlled output in actual production cannot be met.

In view of this, there is a need in the art to develop a general set of cell culture medium development algorithms that can convert nonlinear models into linear models.

Disclosure of Invention

The invention aims to provide a method for rapidly optimizing culture medium components using a regression algorithm based on a kernel method, which can convert a nonlinear model into a linear model for developing a cell culture medium.

In a first aspect of the invention, a method for rapidly optimizing culture medium components based on kernel method fitting is provided, comprising the following steps:

(a) Providing a culture medium formula to be optimized, wherein the formula comprises p components, the concentration value of each component is expressed as $x_i$, and the corresponding data set is denoted $X$; i is a positive integer from 1 to p, and p is 10-150;

(b) Inputting the component concentration data set $X$ of the culture medium formula to be optimized into a point exchange algorithm, and using the point exchange algorithm to perform a first round of N random point exchange treatments on the concentration $x_i$ of each component, thereby obtaining N virtual optimized culture medium formulas, expressed as an N × p matrix, wherein N is the number of random point exchange treatments and is a positive integer greater than or equal to 30;

(c) Calculating the trace of the covariance matrix of the components for that round based on the N × p matrix;

(d) Repeating steps (b) and (c) for M-1 further rounds, thereby obtaining the trace of the component covariance matrix for each round, wherein M is 200-500;

(e) Comparing the M traces obtained from the M rounds to determine the trace with the largest value and the N × p matrix corresponding to it, namely the trace-maximum matrix (or $T_{max}$ matrix); in the $T_{max}$ matrix, the concentration value of each component of each virtual optimized formula is expressed as $x'_i$, and the corresponding data set forms a matrix denoted N × p, wherein i is a positive integer from 1 to p, and n is a positive integer from 1 to N;

(f) Preparing N groups of culture media according to the N groups of virtual optimized medium formulas in the $T_{max}$ matrix, and performing a culture experiment to obtain N groups of experimental yield data, recorded as $y^{(n)}$;

(g) Fitting the component variables and yields of the N groups of virtual optimized medium formulas in the $T_{max}$ matrix using a locally weighted regression model, thereby obtaining N yield predictions, recorded as $\hat{y}^{(n)}$;

Based on the yield prediction data $\hat{y}^{(n)}$ in step (g) and the experimental yield data $y^{(n)}$ in step (f), calculating the pseudo-determination coefficient $R_p^2$ of each component of the virtual optimized medium formulas using formula Q1, thereby obtaining the A components with the largest $R_p^2$, denoted the most important components (TopA components), wherein A is an integer from 3 to 10; deleting the other components from the $T_{max}$ matrix, the matrix of the remaining data set being denoted an N × TopA matrix;

wherein:

$R_p^2$ is the pseudo-determination coefficient;

$y^{(n)}$ is the experimental yield data corresponding to the n-th virtual optimized medium formula in the culture experiment in step (f);

$\hat{y}^{(n)}$ is the yield prediction obtained by fitting the n-th virtual optimized medium formula with the locally weighted regression model in step (g);

$\bar{y}$ is the average of all experimental yield data in step (f);

n is an integer from 1 to N;

(h) Modeling using a kernel method, and analyzing the resulting nonlinear model using an uncertainty analysis algorithm to obtain a recommended culture medium, wherein the recommended culture medium has recommended concentrations of one or more components;

(i) Preparing the recommended culture medium obtained in the step (h), and performing a cell culture experiment to obtain the yield of the target product;

(j) When the yield of the target product reaches the expected value, ending the optimization; when the yield is lower than the expected value, repeating steps (b)-(i) until the yield of the target product reaches the expected value.

In another preferred embodiment, the step (h) specifically includes the following steps:

(h1) Based on the experimental yield data $y^{(n)}$ obtained in step (f), determining the concentration value of each component in the culture medium with the highest yield, and retaining the TopA components and their corresponding concentration values according to the pseudo-determination coefficient result in step (g), the result being denoted the $CM_{max}$ medium;

(h2) Taking the concentration value of each component in the $CM_{max}$ medium as a reference value, and randomly generating candidate concentration values within a range of 20% above and below the reference value, thereby obtaining a plurality of groups of candidate medium formulas;

(h3) Deleting one group of data from the $T_{max}$ matrix without repetition, leaving N-1 groups of data, the matrix being expressed as (N-1) × p;

(h4) Converting the (N-1) × p matrix in step (h3) using the kernel function shown in formula Q2, so as to extend the linear model to a nonlinear model;

wherein,

$K(\cdot)$ is the matrix obtained with the kernel function; D is the matrix of formulas by components, i.e., the (N-1) × p matrix in step (h3); $D^{T}$ is the transpose of D;

(h5) Establishing a yield prediction model Q4 based on the nonlinear model obtained in step (h4), and fitting the yield of the candidate medium formulas obtained in step (h2) using the yield prediction model Q4 to obtain predicted yields;

$$f(D) = w\,K(D, D^{T}) + b \tag{Q4}$$

wherein w is a weight coefficient, and b is a bias coefficient;

(h6) Repeating steps (h3) to (h5) up to N-1 times, wherein in each repetition of step (h3) a group of data different from those deleted previously is deleted, and the matrix of the remaining data is each time expressed as (N-1) × p;

(h7) Evaluating the predicted yields of the candidate medium formulas obtained in steps (h5)-(h6), and using the medium formulas having the following characteristics as recommended medium formulas for a cell culture experiment:

1) The candidate medium formula corresponding to the highest predicted yield in the predicted yield data;

2) The variance of the predicted yields is calculated for each group of candidate media; the candidate medium formula corresponding to the maximum variance;

3) The sum (Σ) of the predicted yield and its variance is calculated for each group of candidate media; the candidate medium formula corresponding to the maximum sum Σ.
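The three selection criteria of step (h7) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the "predicted yield" of a candidate is the mean of its predictions over the leave-one-out models, and that the variance is the population variance over those models. The function name `recommend` and the layout of `predictions` are hypothetical.

```python
from statistics import mean, pvariance

def recommend(predictions):
    """predictions[m][c] is the yield predicted by leave-one-out model m
    for candidate formula c. Returns the candidate index chosen by each
    of the three criteria (assumed reading of step (h7))."""
    n_cand = len(predictions[0])
    # Regroup so that each entry holds all model predictions for one candidate.
    per_cand = [[model[c] for model in predictions] for c in range(n_cand)]
    means = [mean(p) for p in per_cand]
    variances = [pvariance(p) for p in per_cand]
    sums = [m + v for m, v in zip(means, variances)]

    def argmax(values):
        return max(range(n_cand), key=values.__getitem__)

    return {
        "highest_yield": argmax(means),         # criterion 1
        "highest_variance": argmax(variances),  # criterion 2
        "highest_sum": argmax(sums),            # criterion 3
    }

# Two models, three candidates (made-up numbers for illustration only).
example = recommend([[3.0, 4.0, 2.0],
                     [3.0, 4.2, 6.0]])
```

Criterion 1 favors exploitation, criterion 2 favors candidates on which the models disagree most, and criterion 3 trades off the two.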

In another preferred embodiment, the $CM_{max}$ medium consists of the TopA components, at their corresponding concentrations, of the group of media with the highest yield in step (f).

In another preferred embodiment, in step (f) and step (i), the cells used in the culture experiment are selected from the group consisting of: CHO cells, MDCK cells, BHK cells, Sf9 cells, High Five cells, 293 cells, MDBK cells, F81 cells, DF-1 cells, LMH cells, Vero cells, PK15 cells, ST cells, Marc-145 cells, hybridoma cells, diploid cells, and immune effector cells; preferably CHO cells, MDCK cells, Sf9 cells, or Vero cells; more preferably CHO cells.

In another preferred embodiment, in step (h2), the candidate concentration is randomly generated within a range of 15% above and below the reference value; preferably within a range of 12% above and below the reference value; more preferably within 10% or 8%.
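Candidate generation in step (h2) can be sketched as below. The text only says that candidate concentrations are generated "randomly" within the stated range; drawing them uniformly is an assumption, and the function name, parameter names, and reference concentrations are hypothetical values chosen for illustration.

```python
import random

def generate_candidates(reference, n_candidates=300, spread=0.20, seed=0):
    """Perturb each reference concentration independently and uniformly
    within +/- spread (e.g. 0.20 for the 20% range of step (h2))."""
    rng = random.Random(seed)  # seeded so the candidate set is reproducible
    return [[c * rng.uniform(1 - spread, 1 + spread) for c in reference]
            for _ in range(n_candidates)]

# Illustrative reference concentrations for three TopA components.
cm_max = [4.0, 208.0, 75.0]
candidates = generate_candidates(cm_max, n_candidates=300, spread=0.15)
```

Narrowing `spread` to 0.12, 0.10, or 0.08 corresponds to the preferred ranges listed above.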

In another preferred embodiment, in step (h5), the weight coefficient w and the bias coefficient b are obtained by minimizing the loss function shown in formula Q3:

$$\min_{w,b}\,\bigl(y - w\,K(D, D^{T}) - b\bigr)^{2} \tag{Q3}$$

wherein the parameters are as defined above.
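Written out element by element, the loss in Q3 can be read as a sum of squared residuals over the formulas retained in D. The expansion below is a sketch under that reading; the row index n, the column index j, and the symbol $d_n$ for the n-th row of D are notation introduced here and do not appear in the original text.

```latex
\min_{w,\,b}\ \sum_{n=1}^{N-1}
  \Bigl( y^{(n)} - \sum_{j=1}^{N-1} w_j\, K(d_n, d_j) - b \Bigr)^{2}
```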

In another preferred embodiment, N is 50-150, preferably 60-100.

In another preferred embodiment, p is 20 to 100; preferably p is 20, 30, 50, or 80.

In another preferred embodiment, in step (g), the locally weighted regression model is a LOWESS model.
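For orientation, a locally weighted linear fit of yield against a single component can be sketched as follows. This is a simplified illustration rather than a full LOWESS implementation: it uses tricube weights and omits the robustness iterations, and the function name and the `frac` parameter are assumptions made for the example.

```python
def lowess_fit(x, y, frac=0.6):
    """Locally weighted linear regression of y on one variable x.
    Returns the fitted value at each x[i]."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of points in each local window
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (guard against zero).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Made-up data: one component's concentration and a yield that is linear in it.
conc = [float(i) for i in range(10)]
yields = [2.0 * c + 1.0 for c in conc]
fitted = lowess_fit(conc, yields)
```

Fitting each component separately in this way gives one set of predictions per component, which is what a per-component importance score needs.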

In another preferred embodiment, in step (g), the components with $R_p^2$ > 0.5 are taken and designated the TopA components.

In another preferred embodiment, in step (g), A is an integer from 3 to 5.

In another preferred embodiment, p is 100 or 200.

It is understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (e.g., in the examples) may be combined with each other to constitute new or preferred technical solutions. For reasons of space, these are not described one by one here.

Drawings

FIG. 1 is a flow chart of the algorithm of the present invention optimizing the culture medium.

FIG. 2 is a component importance score in example 1.

Fig. 3 shows the response values obtained by the uncertainty algorithm in example 1.

Detailed Description

Through extensive and intensive research, the inventors provide, for the first time, a method for rapidly optimizing culture medium components using a regression algorithm based on a kernel method. The method does not depend on the experience of the developer; relying only on existing experimental data, it converts the nonlinear relationship between the components of a multivariable culture medium and the yield into a linear relationship, derives the medium formula with the optimal yield, and greatly shortens development time. Based on this, the inventors completed the present invention.

Terminology

It should be understood that in this application a matrix is expressed as number of rows × number of columns. As used herein, the rows represent the number of point exchanges, i.e., the number of virtual optimized formulas; the columns represent the number of components constituting the medium, excluding cells and yields. In general, the matrix is denoted N × p, where N is the number of virtual optimized medium formulas and p is the number of individual components; N-1 is the number of medium formulas remaining after one group of formulas is removed.

TopA in this application denotes the A most important medium components, irrespective of concentration: either the components whose pseudo-determination coefficient is greater than 0.5 (or 0.55 or 0.6), or the 3-10 components with the largest pseudo-determination coefficients.

In step (g), the component variables and yields of the 50 groups of virtual optimized media in the $T_{max}$ matrix are fitted using a locally weighted regression model, the fitted yields and the actual culture results are substituted into the pseudo-determination coefficient formula Q1, and the 3-10 most important components are determined from the resulting pseudo-determination coefficients.

The pseudo-determination coefficient relates to the components as follows: a model of the yield is first built using one component at a time, so that a model of yield is finally obtained for every component; the closer a model's predictions are to the measured yield, the larger its pseudo-determination coefficient and the stronger the association between that component and the yield.
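The scoring and ranking described above can be sketched as follows. Formula Q1 itself is not reproduced in this text, so the sketch assumes it takes the usual coefficient-of-determination form, 1 minus the residual sum of squares divided by the total sum of squares, which is consistent with the quantities defined for Q1 (measured yield, predicted yield, and mean measured yield). The function names are hypothetical.

```python
def pseudo_r2(y_true, y_pred):
    """Assumed form of Q1: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def top_components(scores, a=3):
    """Indices of the `a` components with the largest score (TopA)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:a]

# Made-up per-component scores, one pseudo-R^2 per single-component model.
scores = [0.1, 0.7, 0.4, 0.9]
top2 = top_components(scores, a=2)
```

A threshold rule such as "score greater than 0.5" can replace the fixed count `a` where the text calls for it.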

Point exchange algorithm

The point exchange algorithm used in the invention is a point exchange algorithm known in the art, commonly used to search for optimal experimental data within a limited design space. Specifically, the invention uses the algorithm to expand the target experimental data into multiple groups of initial experimental data.

The algorithm generally comprises the following steps:

Simplex data points are generated, and random point exchanges are performed on them so that the correlation ρ between variables is minimized and the variance is maximized. The basic idea of the candidate point exchange algorithm is to exchange data points iteratively and to use a statistical model to predict the effect of each exchange.

Specifically, in the present invention, the point exchange algorithm includes the steps of:

1. Generate experimental design points with a simplex.

2. Calculate the criteria that measure the information gain obtained by each exchange, here the correlation and the variance.

3. Select the exchange that maximizes the criteria and update the design.

4. Repeat steps 2-3 until convergence.
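The round-and-select logic built on top of this algorithm (generate an N × p design per round, compute the trace of the component covariance matrix, keep the round with the largest trace) can be sketched as below. The sketch deliberately replaces the exchange step with plain uniform random sampling around the base formula, so it illustrates only the trace criterion, not a true point exchange. The ±50% sampling range, the function names, and the base concentrations are assumptions.

```python
import random

def random_design(base, n_rows, spread, rng):
    """One round: n_rows virtual formulas, each component drawn uniformly
    within +/- spread of its base concentration (assumed range)."""
    return [[x * rng.uniform(1 - spread, 1 + spread) for x in base]
            for _ in range(n_rows)]

def covariance_trace(design):
    """Trace of the component covariance matrix, i.e. the sum of the
    sample variances of the columns."""
    n = len(design)
    total = 0.0
    for col in zip(*design):
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col) / (n - 1)
    return total

def select_t_max(base, n_rows=50, n_rounds=200, spread=0.5, seed=0):
    """Run n_rounds rounds and return (largest trace, its design matrix)."""
    rng = random.Random(seed)
    best_trace, best_design = float("-inf"), None
    for _ in range(n_rounds):
        design = random_design(base, n_rows, spread, rng)
        trace = covariance_trace(design)
        if trace > best_trace:
            best_trace, best_design = trace, design
    return best_trace, best_design

base_formula = [4.0, 208.0, 75.0]  # illustrative concentrations
trace, t_max = select_t_max(base_formula, n_rows=50, n_rounds=200)
```

Because the trace sums raw variances, components with large concentrations dominate it; scaling the columns first would be one way to balance them, though the text does not say whether this is done.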

Kernel method

The kernel method is a nonlinear method based on a linear algorithm, which converts the original input space into another high-dimensional feature space so that the original data is linearly separable in the feature space, and the conversion is performed by a kernel function, which is a function that can calculate an inner product. The basic idea of this algorithm is to map data from an original space to a high-dimensional feature space such that the original nonlinear relationship exhibits a linear relationship in the feature space.

The kernel method is widely applied to the fields of machine learning and data mining. It can be used for classification, regression, clustering, dimension reduction and other tasks. The method has wide application in the fields of image recognition, natural language processing, bioinformatics, financial prediction and the like. The kernel approach can significantly improve the performance of machine learning algorithms, especially when processing nonlinear data. By providing higher classification and regression accuracy, the kernel method may improve the efficiency of the prediction or classification task while reducing classification or regression errors.

The invention provides an algorithm that models experimental data and infers the optimal point based on a kernel method: the data are fitted by kernel regression, data points are generated within the experimental design space by a candidate point generation algorithm, and finally the model is used for prediction and screening.

The specific steps of the regression algorithm based on the kernel method are as follows:

1. Map the original data to a high-dimensional feature space using a kernel function;

2. Fit the data in the high-dimensional feature space using a linear regression model to obtain a prediction function model:

$$f(D) = w\,K(D, D^{T}) + b \tag{Q4}$$

3. Evaluate and adjust the model by cross-validation or similar methods (comparing predicted values with actual experimental values).
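The steps above can be sketched as a small kernel regression. The specific kernel of formula Q2 is not reproduced in this text, so the sketch assumes a Gaussian (RBF) kernel; it also adds a small ridge term for numerical stability and takes the bias b as the mean yield. These choices, along with all function and parameter names, are assumptions for illustration.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel between two formulas (assumed kernel choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_kernel_regression(D, y, gamma=1.0, ridge=1e-3):
    """Return a predictor f(x) = sum_j w_j K(x, D_j) + b, cf. Q4."""
    b = sum(y) / len(y)  # bias taken as the mean yield (assumption)
    K = [[rbf_kernel(di, dj, gamma) + (ridge if i == j else 0.0)
          for j, dj in enumerate(D)] for i, di in enumerate(D)]
    w = solve(K, [yi - b for yi in y])
    return lambda x: sum(wj * rbf_kernel(x, dj, gamma) for wj, dj in zip(w, D)) + b

# Made-up one-component data with a nonlinear (quadratic) yield response.
D = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
f = fit_kernel_regression(D, y, gamma=1.0, ridge=1e-6)
```

In practice the concentrations would normally be standardized before computing the kernel, since components on very different scales would otherwise dominate the distance.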

Method for rapidly optimizing culture medium components based on kernel method fitting

The invention provides a regression algorithm based on a kernel method, which maps data from an original space to a high-dimensional characteristic space, so that a nonlinear relation between a culture medium component and yield is converted into a linear relation, and a culture medium component formula with optimal yield is obtained.

Specifically, the invention provides a method for rapidly optimizing the formula components of a culture medium, which comprises the following steps:

Step 1: providing a culture medium formula to be optimized, and expanding the formula using a point exchange algorithm to obtain a formula matrix;

Step 2: screening the most important components using the pseudo-determination coefficient, and removing the other components from the formula matrix to obtain a formula matrix to be fitted;

Step 3: predicting the yield of each formula in the formula matrix to be fitted using a kernel function;

Step 4: generating recommended culture medium formulas using an uncertainty analysis method;

Step 5: performing a cell culture experiment to verify the recommended culture medium formulas; the optimization ends when the yield of the target product meets the expected value; if the expected value is not met, steps 2-4 are repeated until the yield of the target product in a recommended culture medium formula reaches the expected value.

Compared with the prior art, the invention has the main advantages that:

(a) Without depending on the experience of professionals and relying only on past data, the nonlinear relationship between the components of the culture medium formula and the yield can be converted into a linear relationship, and the culture medium formula with the optimal yield can be developed.

(b) The time, reagent, and labor costs of cell culture during the development of culture medium components are greatly reduced, and the method has broad application prospects.

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. The experimental methods, in which specific conditions are not noted in the following examples, are generally conducted under conventional conditions or under conditions recommended by the manufacturer. Percentages and parts are by weight unless otherwise indicated.

Example 1

The basic culture medium is optimized by the method of the invention, and the product content is tested.

1. Providing an initial culture medium formula 0 to be optimized; 55 components are included in the formulation.

Table 1: initial culture medium formula and concentration of each component

2. The culture medium formula to be optimized and the concentration data of each component were input into the point exchange algorithm, and a first round of N random point exchange treatments was carried out, where N is 50; a 50 × 55 matrix was obtained, as shown in Table 2.

Table 2: n x p matrix obtained by first round random point exchange process

3. The trace of the first round component X covariance matrix is calculated based on the matrix of table 2.

4. Step 2 was repeated 1000 times (namely, 1000 rounds of N random point exchange treatments were carried out) to obtain 1000 groups of N × p matrices, and the trace of the X covariance matrix of each group was calculated;

5. The largest trace and its corresponding $T_{max}$ matrix were determined based on the results of step 4, as shown in Table 3;

Table 3: $T_{max}$ matrix corresponding to the largest trace determined in step 4

6. Cell culture experiments were performed by preparing a medium according to the 50 sets of medium composition data obtained in step 5, and obtaining a response amount, which is the yield of a product (e.g., protein) expressed by the cells.

TABLE 4 relation between the component concentration (X) of the medium and the yield of the product

7. The component variables and yields of the 50 groups of media in the $T_{max}$ matrix of Table 3 were fitted using a locally weighted regression model to obtain 50 groups of yield prediction data. Based on these 50 groups of yield prediction data and the 50 groups of yield data from actual cell culture in Table 4, pseudo-determination coefficients were calculated using formula Q1, and the 3 most important components, namely X1, X3 and X55, were determined from the values of the pseudo-determination coefficients; the other components were deleted to obtain an N × Top3 matrix, as shown in Table 5;

TABLE 5

8. A kernel method was introduced to establish a nonlinear model, and the nonlinear model was analyzed using an uncertainty analysis algorithm to obtain the recommended concentrations of one or more components in the next batch of culture medium formulas.

8.1. Selecting a formula corresponding to the highest yield in the table 5 in the step 7, namely a formula 4:

8.2. based on formula 4 (set as formula 0 in table 7 below), a random perturbation of ±15% was added to the concentrations of the components in formula 4 to generate candidate concentrations, resulting in a total of 300 sets of candidate medium formulas;

TABLE 7 candidate Medium formulations

8.3. One row of data was randomly deleted from the $T_{max}$ matrix of step 5 to obtain an (N-1) × p matrix, as shown in Table 6;

TABLE 6

8.4. The (N-1) × p matrix in Table 6 was mapped through formula Q2 to obtain a nonlinear model:

wherein D is the (N-1) × p matrix;

The loss function shown in formula Q3 was minimized, and the values of w and b at the minimum were taken, where w is a weight coefficient and b is a bias coefficient:

$$\min_{w,b}\,\bigl(y - w\,K(D, D^{T}) - b\bigr)^{2} \tag{Q3}$$

A yield prediction model of the form Q4 was established based on the nonlinear model and the two coefficients (w and b):

$$f(D) = w\,K(D, D^{T}) + b \tag{Q4}$$

Fitting gave the nonlinear yield prediction model f1;

8.5. Steps 8.3-8.4 were repeated 10 times, yielding 10 yield prediction models (denoted f1, f2, …, f10, respectively).

8.6. Yield data for the 300 sets of candidate medium formulations obtained in step 8.2 were predicted based on the 10 sets of yield prediction models obtained in step 8.5.

TABLE 8

8.7. Based on all the yield data obtained in step 8.6, the recommended medium formulation was determined using the following evaluation system, respectively:

1) Candidate medium formulas corresponding to the highest predicted yield in the yield prediction data;

3) The sum (Σ) of the predicted yield and the variance was calculated for each group of candidate media; the candidate medium formula corresponding to the maximum sum Σ.

8.8. The following recommended candidate medium formulas were obtained: 20, 25, 30, 200, 201, 252; each was used for cell culture.

Table 9 recommends candidate Medium formulations

9. Preparing recommended candidate culture medium for cell culture and expressing target product.

Cell culture method: the candidate media were prepared according to the formulations in Table 9 using preparation methods commonly known to those skilled in the art, and sterilized for use. CHO cells expressing the target antibody were thawed from liquid nitrogen, passaged and expanded to obtain a sufficient number of cells, then centrifuged and inoculated into TPP cell culture shake tubes at a density of 1 × 10⁶ cells/ml. The recommended candidate media were added to the experimental groups at a loading volume of 20 ml/tube. The cells were cultured in a two-stage (basal + fed-batch) procedure, with the same feed medium and feeding strategy used for all experimental groups, and antibody production was measured at the end of 14 days. The formulation of each recommended candidate medium and its corresponding antibody production are shown in Table 10.

Table 10

Medium number	X1	X3	X55	Y (g/L)
20	4.5	209.0	74.0	4.2
25	4.0	208.5	75.0	4.1
30	4.2	208.0	75.1	3.6
200	4.2	208.0	75.0	3.6
201	3.9	208.0	75.2	4.0
252	4.5	208.3	75.2	3.4

The results indicate that a yield of up to 4.2 g/L was achieved after the first round of optimization. Compared with the traditional medium development approach to optimizing at least four components, the present method reached the optimization goal with only two rounds of culture experiments, which greatly reduces the time, reagent, and labor costs of cell culture during medium development, and has broad application prospects.

All documents mentioned in this application are incorporated by reference as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the claims appended hereto.

Claims

1. A method for rapidly optimizing culture medium components based on kernel method fitting, characterized in that,

(d) Repeating steps (b) and (c) for M-1 further rounds, thereby obtaining the trace of the component covariance matrix for each round; wherein M is 200-500;

(e) Comparing the M traces obtained from the M rounds to determine the trace with the largest value and the N × p matrix corresponding to it, namely the trace-maximum matrix (or $T_{max}$ matrix); in the $T_{max}$ matrix, the concentration value of each component of each virtual optimized formula is expressed as $x'_i$, and the corresponding data set forms a matrix denoted N × p; wherein i is a positive integer from 1 to p, and n is a positive integer from 1 to N;

(g) Fitting the component variables and yields of the N groups of virtual optimized medium formulas in the $T_{max}$ matrix using a locally weighted regression model to obtain N yield predictions, recorded as $\hat{y}^{(n)}$;

Based on the yield prediction data $\hat{y}^{(n)}$ in step (g) and the experimental yield data $y^{(n)}$ in step (f), calculating the pseudo-determination coefficient $R_p^2$ of each component of the virtual optimized medium formulas using formula Q1, thereby obtaining the A components with the largest $R_p^2$, denoted the most important components (TopA components), wherein A is an integer from 3 to 10; deleting the other components from the $T_{max}$ matrix, the matrix of the remaining data set being denoted an N × TopA matrix;

wherein:

$R_p^2$ is the pseudo-determination coefficient;

$\bar{y}$ is the average of all experimental yield data in step (f);

n is an integer from 1 to N;

2. The method according to claim 1, wherein in step (h), the method specifically comprises the steps of:

(h3) Deleting one group of data from the $T_{max}$ matrix without repetition, leaving N-1 groups of data, the matrix being expressed as (N-1) × p;

wherein,

$$f(D) = w\,K(D, D^{T}) + b \tag{Q4}$$

wherein w is a weight coefficient, and b is a bias coefficient;

3. The method of claim 1, wherein in step (f) and step (i), the cells used in the culture experiment are selected from the group consisting of: CHO cells, MDCK cells, BHK cells, Sf9 cells, High Five cells, 293 cells, MDBK cells, F81 cells, DF-1 cells, LMH cells, Vero cells, PK15 cells, ST cells, Marc-145 cells, hybridoma cells, diploid cells, and immune effector cells; preferably CHO cells, MDCK cells, Sf9 cells, or Vero cells; more preferably CHO cells.

4. The method of claim 2, wherein in step (h2), the candidate concentration is randomly generated within a range of 15% above and below the reference value; preferably within a range of 12% above and below the reference value; more preferably within 10% or 8%.

5. The method of claim 2, wherein in step (h5), the weight coefficient w and the bias coefficient b are obtained by minimizing the loss function shown in formula Q3:

$$\min_{w,b}\,\bigl(y - w\,K(D, D^{T}) - b\bigr)^{2} \tag{Q3}$$

wherein the parameters are as defined in claim 2.

6. A method according to claim 1, characterized in that N is 50-150, preferably 60-100.

7. The method of claim 1, wherein p is 20 to 100; preferably p is 20, 30, 50, or 80.

8. The method of claim 1, wherein in step (g), the locally weighted regression model is a LOWESS model.

9. The method of claim 1, wherein in step (g), the components with $R_p^2$ > 0.5 are taken and designated the TopA components.

10. The method of claim 1, wherein in step (g), a is an integer from 3 to 5.