CN113450882A - Artificial intelligence-based basic culture medium formula development method and system - Google Patents


Publication number
CN113450882A
Authority
CN
China
Prior art keywords
formula
culture medium
basic
basic culture
sample
Legal status
Granted
Application number
CN202011033081.7A
Other languages
Chinese (zh)
Other versions
CN113450882B (en)
Inventor
陈亮
张祥涛
梁楚亨
梁国龙
Current Assignee
Shenzhen Taili Biotechnology Co.,Ltd.
Original Assignee
Dongguan Taili Biological Engineering Co ltd
Application filed by Dongguan Taili Biological Engineering Co ltd
Priority to CN202011033081.7A (CN113450882B)
Publication of CN113450882A
Priority to EP21871710.6A (EP4220646A1)
Priority to US18/028,555 (US20240321404A1)
Priority to PCT/CN2021/131105 (WO2022063341A1)
Application granted
Publication of CN113450882B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses an artificial intelligence-based method and system for developing basic culture medium formulas. The method comprises the following steps: (1) establishing a sample formula database; (2) obtaining a sample formula culture database; (3) using the sample formula culture database obtained in step (2) to train a machine learning model for the optimization target, yielding a basic culture medium formula culture effect prediction model; and (4) using the prediction model obtained in step (3) to perform culture effect regression prediction for the optimization target within the search space of the addition proportion of each component of the basic culture medium formula to be optimized, and recommending basic culture medium formulas. The system comprises a sample formula generation module, a sample formula culture database, a regression model training module and a formula recommendation module. The invention analyzes all components of the formula simultaneously and avoids the multiple rounds of DOE tests required by traditional methods, thereby accelerating research and development and shortening the development time of basic culture medium formulas.

Description

Artificial intelligence-based basic culture medium formula development method and system
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a basic culture medium formula development method and system based on artificial intelligence.
Background
Although different biological samples, such as various microorganisms and cells, have different nutritional requirements, most of them require the same basic nutrients. A culture medium prepared from the basic nutrients required for the growth and reproduction of common biological samples is called a basic culture medium. A chemically defined, serum-free, animal-origin-free basic culture medium consists of a carbon source, amino acids, vitamins, trace metal ions, lipids, a buffering agent and other additives.
Traditional culture medium formula development starts from one or more classical culture media (such as DMEM/F12), adds a number of different components, identifies the key components through single-factor tests or DOE screening tests, and then optimizes the concentration of each component through multiple rounds of DOE designs, such as response surface methodology, to obtain an optimal formula. Alternatively, the formula is optimized according to how each component changes during cell growth and how it affects the yield and quality of the target product, as revealed by cell metabolism analysis, genomics analysis and proteomics analysis.
Existing basic culture medium optimization methods cannot optimize all components globally at the same time, so the resulting formula is not optimal. In addition, formula designers must design tests for each specific optimization target using their own specialized knowledge of chemistry, biochemistry, molecular biology, cell biology and related fields; no single test can cover all components, the process takes a long time, the design threshold is high, and the labor and time costs are considerable.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides an artificial intelligence-based method and system for developing basic culture medium formulas. It aims to apply machine learning to the complex process of formula optimization: by constructing a sample formula database of good quality and sufficient size and selecting suitable machine learning and optimization algorithms, it recommends, in a short time, the basic culture medium formulas most likely to give a good culture effect and lowers the threshold for formula development, thereby solving the technical problems of slow development and high development cost caused by the complex composition of existing basic culture media.
In order to achieve the above object, according to one aspect of the present invention, there is provided a basic medium formulation development method based on artificial intelligence, comprising the steps of:
(1) establishing a sample formula database: obtaining the candidate components of the basic culture medium formula, determining the search space of the addition proportion of each component, sampling within the search space of each component to form basic culture medium sample formulas, and collecting these sample formulas to establish a sample formula database;
(2) obtaining a sample formula culture database: experimentally verifying, according to the development purpose, the basic culture medium sample formulas stored in the sample formula database obtained in step (1) to obtain the culture effect of each sample formula, and collecting the sample formula data together with the associated culture effects as a sample formula culture database;
(3) training a machine learning model for the development target using the sample formula culture database obtained in step (2) to obtain a basic culture medium formula culture effect prediction model;
(4) using the culture effect prediction model obtained in step (3) to perform culture effect regression prediction for the development target within the search space of the addition proportion of each component of the basic culture medium formula to be optimized, and recommending basic culture medium formulas, with priority given to those with the best predicted culture effect.
Preferably, in the artificial intelligence-based basic culture medium formula development method, the sample formulas are formed by sampling within each search space using methods that include, but are not limited to, the following four: randomly generated formulas, DOE experimental design formulas, formulas formed by mixing, and historical AI-recommended formulas.
Preferably, in the artificial intelligence-based basic culture medium formula development method, a randomly generated formula is formed by giving each component of the basic culture medium formula a random value within its search space;
a DOE experimental design formula is formed by the following steps:
S1, clustering the lowest addition proportions of the components of the basic culture medium to obtain several addition magnitudes, and classifying the components of the basic culture medium by function into functional categories, including amino acids, trace metal ions, vitamins, lipids, buffers and the like;
S2, combining the different addition magnitudes and functional categories obtained in step S1 to form DOE experimental factors, and forming basic culture medium sample formulas with a space-filling DOE design; the space-filling design may be a sphere-packing, Latin hypercube, uniform or minimum-potential design, and the Latin hypercube design is preferred;
a formula formed by mixing is an updated basic culture medium sample formula obtained by screening and combining existing basic culture medium sample formulas; preferably, the screening and combination are performed as follows:
the culture effect of the existing basic culture medium sample formulas is verified, formulas with higher cell viability, higher cell density or higher protein expression are selected, and two or more of them are mixed in random or preset proportions to prepare a new formula;
the historical AI-recommended formulas include basic culture medium formulas previously developed with the artificial intelligence-based formula development method described herein.
Preferably, in the artificial intelligence-based basic culture medium formula development method, the ratio of the number of randomly generated formulas to the number of DOE experimental design formulas in the sample formula database is 1-4:10.
Preferably, the sample formula database contains more than 1000 samples in total, including 100 to 200 randomly generated formulas, 50 to 200 DOE experimental design formulas and the historical AI-recommended formulas, with the balance being mixed formulas.
Preferably, in the artificial intelligence-based basic culture medium formula development method, in step (1) the addition proportion of a component is the ratio of its addition value to its maximum addition value; the search space runs from the lowest addition proportion to 100%, where the lowest addition proportion is the ratio of the component's minimum addition value to its maximum addition value.
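The addition-proportion definition above can be sketched in Python as follows. This is a minimal illustration, not code from the patent: the `Component` class and its method names are hypothetical, concentrations are assumed to share one unit (e.g. mg/L), and the glucose limits in the usage lines are made-up values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Component:
    """A basal-medium component with its allowed concentration range (hypothetical helper)."""

    name: str
    min_addition: float  # minimum addition value, e.g. in mg/L
    max_addition: float  # maximum addition value, same unit

    @property
    def lowest_ratio(self) -> float:
        # Lowest addition proportion = minimum addition value / maximum addition value.
        return self.min_addition / self.max_addition

    def search_space(self) -> Tuple[float, float]:
        # Search space of the addition proportion: from the lowest proportion to 100 %.
        return (self.lowest_ratio, 1.0)

    def ratio_of(self, addition: float) -> float:
        # Addition proportion = addition value / maximum addition value.
        return addition / self.max_addition

    def addition_of(self, ratio: float) -> float:
        # Inverse mapping: turn a (recommended) proportion back into a concentration.
        lo, hi = self.search_space()
        if not lo <= ratio <= hi:
            raise ValueError(f"{self.name}: proportion {ratio} outside [{lo}, {hi}]")
        return ratio * self.max_addition


glucose = Component("glucose", min_addition=2000.0, max_addition=8000.0)
print(glucose.search_space())    # (0.25, 1.0)
print(glucose.addition_of(0.5))  # 4000.0
```

Working in proportions rather than raw concentrations puts every component on a comparable [lowest, 1] scale, which is what the sampling and optimization steps operate on.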
Preferably, in the artificial intelligence-based basic culture medium formula development method, the experimental verification in step (2), performed according to the optimization purpose to obtain the culture effect of each basic culture medium sample formula, specifically includes:
culturing the target cells with the basic culture medium sample formula and measuring the cell state in samples taken at set time points during culture, the cell state including cell viability, cell density and/or biochemical indicators, where the biochemical indicators are protein expression level and glucose, lactate, ammonia and/or glutamine content; the cell viability data can be fitted to obtain a cell viability curve of the sample formula versus culture time, and the cell density data can be fitted to obtain a cell growth curve versus culture time; the culture effect of the basic culture medium sample formula is one or more of the following targets, or a combination of them: the cell growth curve and cell viability curve versus culture time, and the cell density, cell viability and biochemical indicators at a specific time point.
Preferably, in the artificial intelligence-based basic culture medium formula development method, the machine learning model in step (3) includes, but is not limited to: a support vector machine regression model, a K-nearest neighbor model, XGBoost, ridge regression, LightGBM, random forest, GBDT, or a deep learning model; the deep learning model includes, but is not limited to, a fully connected neural network, a convolutional neural network or a recurrent neural network; a support vector machine regression model is preferred.
Preferably, in the artificial intelligence-based basic culture medium formula development method, a global optimization algorithm or a heuristic algorithm is used to search the search space for basic culture medium formulas, on which culture effect regression prediction is performed; the heuristic algorithms include, but are not limited to: genetic algorithm, greedy algorithm, simulated annealing, ant colony algorithm, particle swarm algorithm, artificial bee colony algorithm, artificial fish swarm algorithm, shuffled frog-leaping algorithm, fireworks algorithm, bacterial foraging optimization algorithm and firefly algorithm; the global optimization algorithms include, but are not limited to: Newton's method, quasi-Newton methods, the conjugate gradient method, and the gradient descent methods commonly used in deep learning; the gradient descent method is preferably SGD, Momentum, Adagrad, RMSprop, Adam or Nadam.
According to another aspect of the present invention, there is provided an artificial intelligence-based basic culture medium formula development system, comprising a sample formula generation module, a sample formula culture database, a regression model training module and a formula recommendation module;
the sample formula generation module is used for searching in an addition proportion search space of each component in the basic culture medium formula to be optimized to form a basic culture medium sample formula and establishing a sample formula database;
the sample formula culture database stores each basic culture medium sample formula in the sample formula database and the associated culture effect data thereof;
the regression model training module is used for selecting a regression model, performing regression model training by adopting the basic culture medium sample formula stored in the sample formula culture database and the associated culture effect data thereof, obtaining a basic culture medium formula culture effect prediction model and storing the model;
and the formula recommending module is used for applying the basic culture medium formula culture effect predicting model stored by the regression model training module to the prediction of the basic culture medium formula culture effect in the search space and preferentially recommending the basic culture medium formula.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
in the artificial intelligence-based basic culture medium formula development method, a regression model built with computer AI techniques analyzes all components of the formula simultaneously, avoiding the multiple rounds of DOE tests of traditional methods; this improves analysis efficiency and accuracy, speeds up research and development, and saves development time. The development time for formulas for different cell lines can be greatly shortened: a basic culture medium development cycle that originally took more than 9 months is reduced to about 5 months, and, where an existing database is available, to as little as half a month to one month.
In addition, using artificial intelligence reduces the cost of training personnel, avoids poor formulas caused by insufficient expertise, improves test accuracy and avoids wasted experiments. The method can also handle formula development for several different cell lines at the same time, for several optimization targets separately or jointly, providing one or more optimal formulas for each cell line.
Drawings
FIG. 1 is a schematic flow chart of a basic culture medium formulation development method based on artificial intelligence provided by the invention;
FIG. 2 shows the accuracy of various machine learning models in predicting the maximum cell density within seven days on the sample formula database, evaluated by 15-fold cross-validation;
FIG. 3 shows the accuracy of various machine learning models in predicting the cell density on day five on the sample formula database, evaluated by 15-fold cross-validation;
FIG. 4 is a schematic flow chart of a basic culture medium formula development method according to an embodiment of the present invention;
FIG. 5 compares the predicted and actual culture results of the basic culture medium formula recommended for maximum cell density within seven days according to an embodiment of the invention;
FIG. 6 is the cell growth curve of the basic culture medium formula recommended for maximum cell density within seven days according to an embodiment of the invention;
FIG. 7 compares the predicted and actual culture results of the basic culture medium formula recommended for cell density on day five according to an embodiment of the invention;
FIG. 8 is the cell growth curve of the basic culture medium formula recommended for cell density on day five according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an artificial intelligence-based basic culture medium formula development method which, as shown in FIG. 1, comprises the following steps:
(1) establishing a sample formula database: determining the search space of the addition proportion of each component of the basic culture medium formula to be developed, sampling within the search space of each component to form basic culture medium sample formulas, and collecting these sample formulas to establish a sample formula database; the addition proportion of a component is the ratio of its addition value to its maximum addition value, the search space runs from the lowest addition proportion to 100%, and the lowest addition proportion is the ratio of the component's minimum addition value to its maximum addition value;
the sample formulas are formed by sampling within the search space of each component using methods that include, but are not limited to, the following four: randomly generated formulas, DOE experimental design formulas, formulas formed by mixing, and historical AI-recommended formulas;
a randomly generated formula is formed by giving each component of the basic culture medium formula a random value within its search space;
a DOE experimental design formula is formed by the following steps:
S1, clustering the lowest addition proportions of the components of the basic culture medium to obtain several addition magnitudes, and classifying the components of the basic culture medium by function into functional categories, including amino acids, trace metal ions, vitamins, lipids, buffers and the like;
S2, combining the different addition magnitudes and functional categories obtained in step S1 to form DOE experimental factors, and forming basic culture medium sample formulas with a space-filling DOE design; the space-filling design may be a sphere-packing, Latin hypercube, uniform or minimum-potential design, and the Latin hypercube design is preferred.
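The randomly generated formulas and the DOE steps S1 and S2 can be sketched together as below. This is a minimal sketch under stated assumptions, not the patent's implementation: uniform sampling is assumed for "random values"; S1's clustering is replaced by a simple stand-in that groups components by the decimal order of magnitude of their lowest addition proportion; a hand-rolled Latin hypercube is used for S2; and each (magnitude, category) factor level is assumed to be applied to every component in that group. All function names and component data are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_formulas(search_spaces, n, seed=None):
    """Randomly generated formulas: each proportion drawn uniformly from its (lo, hi) range."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in search_spaces] for _ in range(n)]


def latin_hypercube(n_points, n_dims, rng):
    """Plain Latin hypercube on [0, 1)^d: every axis is cut into n_points strata
    and each stratum is used exactly once per axis."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [list(row) for row in zip(*columns)]


def doe_formulas(components, n_points, seed=None):
    """components: (name, functional_category, lowest_ratio) tuples.

    S1 (stand-in): group by the decimal order of magnitude of the lowest addition
    proportion instead of running a clustering algorithm.
    S2: each (magnitude, category) group is one DOE factor; its Latin hypercube
    level u in [0, 1) is mapped into every member's own [lowest, 1] range.
    """
    rng = random.Random(seed)
    factors = defaultdict(list)
    for name, category, lowest in components:
        magnitude = math.floor(math.log10(lowest))
        factors[(magnitude, category)].append((name, lowest))
    keys = sorted(factors)
    formulas = []
    for row in latin_hypercube(n_points, len(keys), rng):
        formula = {}
        for u, key in zip(row, keys):
            for name, lowest in factors[key]:
                formula[name] = lowest + u * (1.0 - lowest)
        formulas.append(formula)
    return keys, formulas


# Hypothetical components and lowest proportions, for illustration only.
components = [
    ("glucose", "carbon source", 0.25),
    ("L-glutamine", "amino acid", 0.2),
    ("L-leucine", "amino acid", 0.3),
    ("ZnSO4", "trace metal", 0.002),
    ("CuSO4", "trace metal", 0.005),
    ("folic acid", "vitamin", 0.05),
]
keys, design = doe_formulas(components, n_points=20, seed=1)
print(len(keys), len(design))  # prints: 4 20
```

Grouping components into factors keeps the number of DOE dimensions small even when the formula has dozens of components; SciPy's `scipy.stats.qmc.LatinHypercube` is an alternative to the hand-rolled sampler.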
A formula formed by mixing is an updated basic culture medium sample formula obtained by screening and combining existing basic culture medium sample formulas; preferably, the screening and combination are performed as follows: the culture effect of the existing basic culture medium sample formulas is verified, formulas with higher cell viability, higher cell density or higher protein expression are selected, and two or more of them are mixed in random or preset proportions to prepare a new formula.
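The mixing procedure can be sketched as follows. This is an illustrative sketch, not the patent's code: `mix_formulas` and its parameters are hypothetical, a single scalar culture effect per formula is assumed (higher is better), and random proportions are assumed to be drawn uniformly. The one physical fact it relies on is that mixing prepared media by volume gives each component the volume-weighted mean concentration, so a mixed formula is a convex combination of its parents.

```python
import random


def mix_formulas(formulas, effects, top_k, n_new, parents=2, weights=None, seed=None):
    """Formulas formed by mixing: blend the best existing sample formulas.

    formulas: equal-length lists of addition proportions; effects: measured culture
    effect for each formula (higher is better, e.g. peak viable cell density).
    Mixing prepared media by volume gives every component the volume-weighted mean
    of the parents' concentrations, so each new formula is a convex combination.
    """
    if weights is not None and len(weights) != parents:
        raise ValueError("need one preset weight per parent formula")
    rng = random.Random(seed)
    ranked = sorted(range(len(formulas)), key=lambda i: effects[i], reverse=True)
    elite = [formulas[i] for i in ranked[:top_k]]
    new = []
    for _ in range(n_new):
        chosen = rng.sample(elite, parents)
        # Preset proportions if given, otherwise random ones in (0, 1].
        w = list(weights) if weights is not None else [1.0 - rng.random() for _ in range(parents)]
        total = sum(w)
        new.append([
            sum(wj / total * f[c] for wj, f in zip(w, chosen))
            for c in range(len(chosen[0]))
        ])
    return new


existing = [[0.2, 0.5], [0.4, 0.9], [1.0, 1.0]]
measured = [3.0, 5.0, 1.0]  # hypothetical culture effects
print(mix_formulas(existing, measured, top_k=2, n_new=2, seed=0))
```

Because the mixtures are convex combinations, every new formula stays inside the search space whenever its parents do, and no new medium has to be prepared from raw components.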
The historical AI-recommended formulas include basic culture medium formulas previously developed with the artificial intelligence-based formula development method described herein.
The quality and quantity of the basic culture medium sample formulas are the key prerequisite for AI-based optimization of a basic culture medium. The number of sample formulas must reach the hundreds for the machine learning model to be trained accurately; at the same time, the sample formulas should cover different known and unknown dimensions and different effects, with more of them located in regions of the high-dimensional space where the effect is better or changes markedly. Therefore, to increase the number of sample formulas and to place them more densely in these regions, the invention favors DOE experimental design formulas; to improve coverage of different dimensions and effects, these are supplemented by randomly generated formulas and formulas formed by mixing. Experiments show that a sample formula database built by combining these three methods gives a better-trained machine learning model.
Preferably, the ratio of randomly generated formulas to DOE experimental design formulas in the sample formula database is 1-4:10; the randomly generated formulas ensure the generalization ability of the model, while the DOE experimental design formulas improve its prediction accuracy.
A sample formula database with more than 100 samples in total is sufficient to implement the artificial intelligence-based basic culture medium formula development method provided by the invention. Preferably, the database contains more than 1000 samples, including 100 to 200 randomly generated formulas and 50 to 200 DOE experimental design formulas, which keeps the time spent preparing media under control. The historical AI-recommended formulas have to be prepared and verified anyway in the course of ongoing experiments, so they add no extra cost for media preparation or culture effect verification. The remainder are mixed formulas; because these are made by mixing media that have already been prepared, they greatly reduce the time cost of media preparation.
(2) Obtaining a sample formula culture database: experimentally verifying, according to the optimization purpose, the basic culture medium sample formulas stored in the sample formula database obtained in step (1) to obtain the culture effect of each sample formula, and collecting the sample formula data together with the associated culture effects as a sample formula culture database; this experimental verification specifically comprises:
culturing the target cells with the basic culture medium sample formula and measuring the cell state in samples taken at set time points during culture, the cell state including cell viability, cell density and/or biochemical indicators, where the biochemical indicators are protein expression level and glucose, lactate, ammonia and/or glutamine content; the cell viability data can be fitted to obtain a cell viability curve of the sample formula versus culture time, and the cell density data can be fitted to obtain a cell growth curve versus culture time; the culture effect of the basic culture medium sample formula may be the cell growth curve or cell viability curve, or the cell density, cell viability or a biochemical indicator at a specific time point; it may also be several of these indicators at once (corresponding to a machine learning model with multi-objective optimization), or a composite index of several indicators (for example, a weighted sum of specific indicators).
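Turning sampled culture data into culture-effect targets can be sketched as below. Assumptions, stated plainly: piecewise-linear interpolation stands in for the curve fitting the text mentions; the three targets (maximum density within seven days, density and viability on day five) echo the embodiment's figures; and the sampling data, units and composite weights are hypothetical.

```python
from bisect import bisect_left


def interpolate(days, values, day):
    """Piecewise-linear value of a sampled curve at `day` (days sorted ascending)."""
    if not days[0] <= day <= days[-1]:
        raise ValueError("day outside the sampled culture period")
    i = bisect_left(days, day)
    if days[i] == day:
        return values[i]
    t = (day - days[i - 1]) / (days[i] - days[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])


def culture_effects(days, density, viability, weights=None):
    """Single culture-effect targets plus an optional weighted-sum composite index."""
    effects = {
        "max_density_7d": max(v for d, v in zip(days, density) if d <= 7),
        "density_day5": interpolate(days, density, 5),
        "viability_day5": interpolate(days, viability, 5),
    }
    if weights:
        effects["composite"] = sum(w * effects[k] for k, w in weights.items())
    return effects


# Hypothetical sampling data: day, viable cell density (1e6 cells/mL), viability (%).
days = [0, 1, 2, 3, 4, 6, 7]
density = [0.3, 0.6, 1.2, 2.4, 4.0, 7.0, 6.5]
viability = [99, 98, 98, 97, 96, 92, 90]
print(culture_effects(days, density, viability,
                      weights={"max_density_7d": 0.5, "density_day5": 0.5}))
# {'max_density_7d': 7.0, 'density_day5': 5.5, 'viability_day5': 94.0, 'composite': 6.25}
```

A single target (or the composite) gives one regression label per formula; keeping several targets side by side corresponds to the multi-objective case.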
(3) Training a machine learning model for the optimization target using the sample formula culture database obtained in step (2) to obtain a basic culture medium formula culture effect prediction model;
the machine learning models include, but are not limited to: a support vector machine regression model, a K-nearest neighbor model, XGBoost, ridge regression, LightGBM, random forest, GBDT, or a deep learning model; the deep learning model includes, but is not limited to, a fully connected neural network, a convolutional neural network or a recurrent neural network;
Among these, the support vector machine regression model performs best. As shown in FIG. 2 and FIG. 3, under 15-fold cross-validation the support vector machine regression model gives better performance, and because it is a continuously differentiable model it is well suited to the subsequent recommendation of optimal formulas. Among the other machine learning models, the K-nearest neighbor model has lower accuracy on the test set, and tree-based models such as XGBoost, LightGBM, random forest and GBDT are not continuously differentiable. Among deep learning models such as fully connected, convolutional and recurrent neural networks, convolutional networks suit data with translation invariance, such as images, and recurrent networks suit sequential data, such as speech and text; neither offers a particular advantage for predicting the optimal formula. In addition, deep learning models require large amounts of data and are more costly. In summary, the support vector machine regression model, which has the lowest root mean square error, is the preferred model.
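A 15-fold cross-validation comparison like the one behind FIG. 2 and FIG. 3 can be sketched with scikit-learn. This is an assumption-laden illustration, not the patent's experiment: the data are synthetic placeholders (300 formulas, 6 component proportions, a made-up smooth response), only three of the listed model families are included, and the hyperparameters are arbitrary. It produces no results about real media.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def cv_rmse(models, X, y, n_splits=15, seed=0):
    """Mean root-mean-square error of each model under shuffled k-fold cross-validation."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
        results[name] = float(-scores.mean())  # scikit-learn reports negated RMSE
    return results


models = {
    "SVR (RBF kernel)": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05)),
    "K nearest neighbors": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Synthetic stand-in for a sample formula culture database:
# 300 formulas x 6 component proportions -> one culture-effect value each.
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 1.0, size=(300, 6))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] - (X[:, 3] - 0.6) ** 2 + rng.normal(0, 0.05, 300)

for name, rmse in cv_rmse(models, X, y).items():
    print(f"{name}: mean 15-fold RMSE = {rmse:.3f}")
```

With real data, X would hold the addition proportions from the sample formula culture database and y the chosen culture-effect target.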
(4) Using the culture effect prediction model obtained in step (3) to perform culture effect regression prediction for the optimization target within the search space of the addition proportion of each component of the basic culture medium formula to be optimized, and recommending basic culture medium formulas, with priority given to those with the best predicted culture effect. Specifically:
preferably, a global optimization algorithm or a heuristic algorithm is used to search the search space for basic culture medium formulas, on which culture effect regression prediction is performed; the heuristic algorithms include, but are not limited to: genetic algorithm, greedy algorithm, simulated annealing, ant colony algorithm, particle swarm algorithm, artificial bee colony algorithm, artificial fish swarm algorithm, shuffled frog-leaping algorithm, fireworks algorithm, bacterial foraging optimization algorithm and firefly algorithm; the global optimization algorithms include, but are not limited to: Newton's method, quasi-Newton methods, the conjugate gradient method, and the gradient descent methods commonly used in deep learning, whose common variants are SGD, Momentum, Adagrad, RMSprop, Adam, Nadam, etc.
Most heuristic algorithms, in particular swarm optimization algorithms such as the genetic algorithm, ant colony, particle swarm, artificial bee colony, artificial fish swarm, shuffled frog-leaping, fireworks, bacterial foraging optimization and firefly algorithms, require a large amount of memory and computation. The greedy algorithm rarely reaches the global optimum on a problem like this one, where the contents of many components interact. Simulated annealing judges candidate moves only by the change in the objective value, which makes it less efficient than gradient descent, which uses the gradient as its guide. Newton's method, quasi-Newton methods and the conjugate gradient method use second-order gradient approximations and are in theory superior to the first-order approximation of gradient descent, but the complexity of the formula makes their computational load larger. Variants of gradient descent are therefore preferred.
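Because an RBF-kernel SVR prediction is continuously differentiable, a gradient-descent variant can search the search space directly. The sketch below uses projected Adam ascent (maximizing the predicted culture effect, with clipping to keep each proportion in its [lowest, 1] range) plus random restarts. The multi-start and clipping choices are assumptions, not details from the patent. If the model were fitted with scikit-learn's `SVR(kernel="rbf", gamma=g)` directly on unscaled proportions with a numeric `g`, then `support_vectors_`, `dual_coef_.ravel()` and `intercept_[0]` would supply `sv`, `alpha` and `b`; the toy model here uses one hypothetical support vector so the optimum is known.

```python
import numpy as np


def svr_predict(x, sv, alpha, b, gamma):
    """RBF-kernel SVR prediction: sum_i alpha_i * exp(-gamma * ||x - sv_i||^2) + b."""
    k = np.exp(-gamma * np.sum((x - sv) ** 2, axis=1))
    return float(alpha @ k + b)


def svr_gradient(x, sv, alpha, gamma):
    """Analytic gradient of svr_predict with respect to the formula x."""
    diff = x - sv                                   # shape (n_sv, n_components)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))  # shape (n_sv,)
    return -2.0 * gamma * (alpha * k) @ diff        # shape (n_components,)


def recommend(sv, alpha, b, gamma, lo, hi, starts=20, steps=500, lr=0.01, seed=0):
    """Multi-start projected Adam ascent that maximizes the predicted culture effect
    while keeping every addition proportion inside its search space [lo, hi]."""
    rng = np.random.default_rng(seed)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    best_x, best_y = None, -np.inf
    for _ in range(starts):
        x = rng.uniform(lo, hi)
        m = np.zeros_like(x)
        v = np.zeros_like(x)
        for t in range(1, steps + 1):
            g = svr_gradient(x, sv, alpha, gamma)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            # Ascent step (+) because a larger culture effect is better, then project.
            x = np.clip(x + lr * m_hat / (np.sqrt(v_hat) + eps), lo, hi)
        y = svr_predict(x, sv, alpha, b, gamma)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


# Toy model with one hypothetical support vector, so the optimum is known: x = (0.6, 0.4).
sv = np.array([[0.6, 0.4]])
alpha = np.array([1.0])
lo, hi = np.array([0.2, 0.2]), np.array([1.0, 1.0])
x_best, y_best = recommend(sv, alpha, b=0.0, gamma=5.0, lo=lo, hi=hi)
print(np.round(x_best, 2), round(y_best, 3))
```

Restarts reduce the chance of stopping at a local optimum of the non-convex prediction surface; the best few end points, rather than only the single best, could be passed on for experimental verification.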
The invention also provides an artificial intelligence-based basic culture medium formula development system, comprising a sample formula generation module, a sample formula culture database, a regression model training module and a formula recommendation module, wherein:
the sample formula generation module is used for searching in an addition proportion search space of each component in the basic culture medium formula to be optimized to form a basic culture medium sample formula and establishing a sample formula database;
the sample formula culture database stores each basic culture medium sample formula in the sample formula database and the associated culture effect data thereof;
the regression model training module is used for selecting a regression model, performing regression model training by adopting the basic culture medium sample formula stored in the sample formula culture database and the associated culture effect data thereof, obtaining a basic culture medium formula culture effect prediction model and storing the model;
and the formula recommending module is used for applying the basic culture medium formula culture effect predicting model stored by the regression model training module to the prediction of the basic culture medium formula culture effect in the search space and preferentially recommending the basic culture medium formula.
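The four modules can be pictured as a small pipeline. The skeleton below is a hypothetical sketch: the class and method names, the idea of passing sampling strategies and a `fit` function in as callables, and ranking a candidate list in the recommender are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Formula = List[float]                  # addition proportion of each component
Model = Callable[[Formula], float]     # formula -> predicted culture effect


@dataclass
class SampleFormulaGenerationModule:
    # each strategy yields one sample formula (random, DOE, mixed, historical AI, ...)
    strategies: List[Callable[[], Formula]]

    def build_sample_database(self, n_per_strategy: int) -> List[Formula]:
        return [make() for make in self.strategies for _ in range(n_per_strategy)]


@dataclass
class SampleFormulaCultureDatabase:
    records: List[Tuple[Formula, float]] = field(default_factory=list)

    def add(self, formula: Formula, culture_effect: float) -> None:
        # culture_effect comes from the experimental verification of the formula
        self.records.append((list(formula), culture_effect))


@dataclass
class RegressionModelTrainingModule:
    fit: Callable[[List[Formula], List[float]], Model]
    model: Optional[Model] = None

    def train(self, db: SampleFormulaCultureDatabase) -> Model:
        X = [formula for formula, _ in db.records]
        y = [effect for _, effect in db.records]
        self.model = self.fit(X, y)   # stored for the recommendation module
        return self.model


class FormulaRecommendationModule:
    def recommend(self, model: Model, candidates: List[Formula], top_k: int = 1) -> List[Formula]:
        # rank candidate formulas from the search space by predicted effect
        return sorted(candidates, key=model, reverse=True)[:top_k]
```

The candidates passed to `recommend` could come from any of the search algorithms listed earlier; the skeleton only fixes how data flows between the modules.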
The following is an embodiment.
Taking a formula with a small number of components as an example, the artificial intelligence-based basic culture medium formula development method comprises the following steps:
(1) Establishing a sample formula database: determine the search space of the addition proportion of each component, search within each component's search space to form basic culture medium sample formulas, collect these sample formulas, and establish a sample formula database. The addition proportion of a component is the ratio of its addition value to its maximum addition value; the search space runs from the minimum addition proportion to 100%, where the minimum addition proportion is the ratio of the component's minimum addition value to its maximum addition value.
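The definitions above reduce to a small calculation. A minimal sketch follows; the function names, component names and amounts (in arbitrary but consistent units such as mg/L) are hypothetical.

```python
def addition_search_space(min_amounts, max_amounts):
    """Return {component: (lower, upper)} bounds on the addition proportion.

    Addition proportion = addition value / maximum addition value, so every
    component's search space runs from (minimum / maximum) up to 1.0 (100 %).
    """
    bounds = {}
    for name, max_value in max_amounts.items():
        min_value = min_amounts[name]
        if max_value <= 0 or not 0 <= min_value <= max_value:
            raise ValueError(f"invalid addition range for {name}")
        bounds[name] = (min_value / max_value, 1.0)
    return bounds


def proportion_to_amount(proportion, max_value):
    """Convert an addition proportion back into an actual addition value."""
    return proportion * max_value
```

For example, a component added between 100 and 400 units has the search space [0.25, 1.0], and a proportion of 0.5 corresponds to adding 200 units.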
The training formulas are formed by searching each component's search space using the following four methods: randomly generated formulas, DOE experimental design formulas, mixed formulas, and historical AI-recommended formulas.
Randomly generated formula: for each component in the basic culture medium formula, a value is drawn at random from its search space, forming a basic culture medium sample formula.
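Random generation can be sketched as independent draws per component. The uniform distribution, the fixed seed and the function names are assumptions; the source says only that values are taken at random within each search space.

```python
import random


def random_formula(bounds, rng):
    """Draw one formula: a uniform value in [lower, upper] for every component."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


def random_formulas(bounds, n, seed=0):
    """Draw n formulas reproducibly (same seed -> same formulas)."""
    rng = random.Random(seed)
    return [random_formula(bounds, rng) for _ in range(n)]
```

With the embodiment's numbers this would be called with `n=200`; seeding the generator makes the sample database reproducible if it has to be rebuilt.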
In this embodiment, the DOE experimental design formulas are constructed as follows:
Apart from a few components held constant (such as glucose), all other components are divided into 5 classes: amino acids, trace metal ions, vitamins, lipids, and buffer reagents and other substances. Within each class, the maximum addition value of each component is taken as 100%, and its minimum value divided by its maximum value gives its minimum addition percentage in the formula. Components with similar minimum addition percentages are grouped into new classes, so that 9 classes, i.e. nine factors, are formed on the basis of the 5 classes, and 90 formulas are designed using the Latin hypercube method of space-filling DOE experimental design.
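The source does not say exactly how "similar" minimum addition percentages are grouped. The sketch below assumes grouping by order of magnitude (floor of log10), crossed with the functional class, then draws a classic Latin hypercube and maps it to formulas. The mapping rule in `design_to_formulas` (one design value per factor, rescaled into each member component's own search space) and all names are hypothetical.

```python
import math
from collections import defaultdict


def build_factors(min_proportions, categories):
    """Group components into DOE factors keyed by
    (functional class, order of magnitude of the minimum addition proportion)."""
    factors = defaultdict(list)
    for name, p in min_proportions.items():
        if not 0 < p <= 1:
            raise ValueError(f"minimum proportion of {name} must be in (0, 1]")
        magnitude = math.floor(math.log10(p))   # e.g. 0.3 -> -1, 0.05 -> -2
        factors[(categories[name], magnitude)].append(name)
    return dict(factors)


def latin_hypercube(n_samples, n_factors, rng):
    """Latin hypercube on [0, 1)^d: each factor's range is cut into n_samples
    equal strata and every stratum is used exactly once per factor."""
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[col[i] for col in columns] for i in range(n_samples)]


def design_to_formulas(design, factors, lower_bounds):
    """Hypothetical mapping: the design value of a factor sets the addition
    proportion of each of its components within [lower bound, 1.0]."""
    keys = list(factors)
    formulas = []
    for row in design:
        formula = {}
        for u, key in zip(row, keys):
            for name in factors[key]:
                lo = lower_bounds[name]
                formula[name] = lo + u * (1.0 - lo)
        formulas.append(formula)
    return formulas
```

For the embodiment, `latin_hypercube(90, 9, random.Random(0))` would give 90 design points over nine factors. A library routine such as SciPy's `scipy.stats.qmc.LatinHypercube` could replace the hand-written sampler.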
The mixed formulas in this embodiment are formed as follows: the culture effect of existing basal medium sample formulas is verified, formulas with higher cell viability, higher cell density or higher protein expression are selected, and two or more of them are mixed in random proportions to prepare a new formula.
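Mixing can be sketched as selecting the best-performing formulas and blending them. The sketch relies on the fact that mixing media by volume gives each component a concentration equal to the volume-weighted average of the parents; because the addition proportion is a linear rescaling of concentration, the same average applies to proportions. Function names and the selection-by-single-score rule are assumptions.

```python
import random


def top_formulas(records, k):
    """records: list of (formula_dict, measured_effect); keep the k best formulas."""
    return [f for f, _ in sorted(records, key=lambda r: r[1], reverse=True)[:k]]


def mix_formulas(parents, rng, weights=None):
    """Blend two or more parent formulas component-wise.

    weights=None draws random mixing proportions; preset weights (allowed by
    claim 3) must be non-negative and sum to 1.
    """
    if len(parents) < 2:
        raise ValueError("need at least two parent formulas")
    if weights is None:
        raw = [rng.random() + 1e-12 for _ in parents]   # avoid an all-zero draw
        total = sum(raw)
        weights = [w / total for w in raw]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    names = parents[0].keys()
    return {n: sum(w * p[n] for w, p in zip(weights, parents)) for n in names}
```

Because every child is a convex combination of its parents, it automatically stays inside the search space whenever the parents do.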
The historical AI-recommended formulas are basic culture medium formulas previously recommended by the artificial intelligence-based formula optimization method described here.
The sample formula database finally established in this embodiment contains 1000-1500 basic medium formulas, including 90 DOE experimental design formulas, 200 random formulas and 100-200 historical AI-recommended formulas, with the balance (about 700) being mixed formulas.
(2) Obtaining a sample formula culture database: experimentally verify the basic culture medium sample formulas stored in the sample formula database obtained in step (1) according to the optimization purpose, obtain the culture effect of each sample formula, and collect the sample formula data associated with their culture effects as the sample formula culture database. In this embodiment, the experimental verification is carried out as follows:
Culture is carried out in batch mode: cells are seeded at 0.5 × 10^6 cells/mL in a culture volume of 10 mL in a 50 mL mini bioreactor, with a shaker speed of 180 rpm, for seven days. During culture, samples are taken on days 3, 5 and 7 to count the cell density and to measure biochemical parameters such as glucose, lactate, ammonia, glutamine and protein expression level, and glucose is replenished to 4-5 g/L according to its consumption. To obtain complete experimental data, samples were also taken every day.
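Replenishing glucose to a target concentration is a mass-balance calculation. A minimal sketch follows; the stock concentration (400 g/L in the example) and the function name are hypothetical, and the volume removed by sampling is ignored.

```python
def glucose_feed_volume(culture_volume, current_conc, target_conc, stock_conc):
    """Volume of glucose stock (same unit as culture_volume) needed to raise
    the culture from current_conc to target_conc; concentrations in g/L.

    Mass balance: (c*V + S*v) / (V + v) = t   =>   v = V * (t - c) / (S - t)
    """
    if current_conc >= target_conc:
        return 0.0                       # no feed needed
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return culture_volume * (target_conc - current_conc) / (stock_conc - target_conc)
```

For a 10 mL culture measured at 1.5 g/L and fed to 4.5 g/L from a 400 g/L stock, the required volume is 10 × 3 / 395.5, i.e. about 0.076 mL.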
When building the AI model, single-target or multi-target modeling can use only the cell viability, cell density or biochemical parameters at a given sampling point; alternatively, regression models can be built on cell viability curves or cell density curves.
This embodiment obtained the following data: the cell density at each sampling point and the maximum cell density over seven days; cell growth curves were plotted from the cell density data.
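The two modeling targets used later (maximum cell density within seven days and cell density on day 5) can be derived from daily counts as sketched below. The density values in the example are made up, and the function name is hypothetical.

```python
def culture_targets(daily_density):
    """daily_density: {day: viable cell density}, e.g. in 10^6 cells/mL."""
    days = sorted(d for d in daily_density if d <= 7)
    growth_curve = [(d, daily_density[d]) for d in days]     # points to plot
    peak_day, peak_density = max(growth_curve, key=lambda p: p[1])
    return {
        "max_density_7d": peak_density,      # target of prediction model 1
        "peak_day": peak_day,
        "density_day5": daily_density[5],    # target of prediction model 2
        "growth_curve": growth_curve,
    }
```

For hypothetical counts `{0: 0.5, 1: 0.9, 2: 1.8, 3: 3.2, 4: 5.0, 5: 6.8, 6: 7.4, 7: 7.1}`, the maximum density is 7.4 (on day 6) and the day-5 density is 6.8.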
The contents of all components in each formula, data from the medium preparation process, the culture effect and other related information are recorded and stored in the culture database.
(3) Training a regression model on the sample formula culture database obtained in step (2) to obtain a basic culture medium formula culture effect prediction model. Specifically:
A support vector machine regression model is loaded in Python and evaluated by 15-fold cross-validation with an RBF (Gaussian) kernel. Formula culture effect prediction model 1, targeting the maximum cell density within seven days, achieves an average root mean square error of about 0.39 (to two decimal places); formula culture effect prediction model 2, targeting the cell density on day 5, achieves an average root mean square error of about 0.41. The aim is for the model to predict, as accurately as possible, the yield and quality of cultured cells for different contents of each component of the medium formula.
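The 15-fold cross-validated RMSE reported above can be computed as sketched below. The embodiment uses an RBF-kernel support vector regressor; to keep the sketch dependency-free, a K-nearest-neighbour regressor (another model listed in claim 8) is used as a placeholder. A `fit` function wrapping scikit-learn's `SVR(kernel="rbf")` could be passed in instead. Function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_fit(X, y, k=5):
    """Placeholder regressor: mean target of the k nearest training formulas."""
    def predict(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in zip(X, y)
        )
        nearest = dists[:k]
        return sum(yi for _, yi in nearest) / len(nearest)
    return predict


def cross_validated_rmse(X, y, fit=knn_fit, n_folds=15, seed=0):
    """Average of the per-fold root mean square errors over n_folds folds."""
    if len(X) < n_folds:
        raise ValueError("need at least one sample per fold")
    folds = k_fold_indices(len(X), n_folds, random.Random(seed))
    fold_rmse = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        sq_err = [(model(X[i]) - y[i]) ** 2 for i in test_idx]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / len(fold_rmse)
```

Averaging per-fold RMSE (rather than pooling all squared errors) matches the "average root mean square error" wording; if the value does not meet the standard, more sample formulas are added, as the next step describes.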
If the training result does not meet the standard, steps (1) to (2) are repeated to increase the amount of training sample data.
(4) Using the basic culture medium formula culture effect prediction model obtained in step (3), carry out culture effect regression prediction for the optimization target within the search space of the addition proportion of each component in the basic culture medium formula to be optimized, and preferentially recommend basic culture medium formulas according to the predicted culture effect. In this embodiment, this specifically includes:
Based on the models for the maximum cell density within seven days and the cell density on day 5, the model information is mined in depth, and the most likely cell culture effect under different contents of each component is numerically simulated by computer. The simulation uses the gradient method (gradient ascent on the predicted culture effect, i.e. gradient descent on its negative): the gradient is the change in the cell culture effect caused by a one-unit increase of each component at its current content. A positive gradient indicates that increasing the component's content benefits the cell culture effect; a negative gradient indicates that increasing it harms the culture effect. Based on the simulation results, the component contents are corrected step by step (a positive gradient increases the component's content, a negative gradient decreases it, and the size of the change is proportional to the gradient value); the simulation is then repeated and the components are adjusted again according to the new gradient. This process is repeated until the numerical simulation finds that the gradient approaches 0 and correcting the component contents can no longer improve the simulated culture effect; the resulting medium formula is the optimal formula under the model simulation. Algorithmic variants of gradient descent include SGD, Momentum, Adagrad, RMSprop, Adam and others.
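The correction loop described above can be sketched as projected gradient ascent. Assumptions: `predict` stands for the trained culture effect model, gradients are estimated by central finite differences (an SVR's gradient could also be derived analytically), contents are clipped to their search space, and the loop stops when a correction no longer moves the formula. This covers both the "gradient approaches 0" case and a gradient that only pushes against a bound. Names are hypothetical.

```python
def numerical_gradient(predict, x, h=1e-4):
    """Central-difference estimate of d(predicted effect) / d(component i)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((predict(up) - predict(down)) / (2 * h))
    return grad


def _clip(value, lo, hi):
    return min(hi, max(lo, value))


def optimize_formula(predict, x0, bounds, lr=0.05, tol=1e-6, max_iter=10_000):
    """Projected gradient ascent on the model's predicted culture effect."""
    x = [_clip(v, lo, hi) for v, (lo, hi) in zip(x0, bounds)]
    for _ in range(max_iter):
        g = numerical_gradient(predict, x)
        # positive gradient -> raise content, negative -> lower it; step is
        # proportional to the gradient, then kept inside the search space
        new_x = [_clip(xi + lr * gi, lo, hi) for xi, gi, (lo, hi) in zip(x, g, bounds)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x                 # corrections no longer change the formula
        x = new_x
    return x
```

On a toy surrogate such as `lambda x: -(x[0] - 0.7) ** 2 - (x[1] - 0.2) ** 2` with bounds `[(0.1, 1.0), (0.3, 1.0)]`, the first component converges to about 0.7, while the second stops at its lower bound 0.3 because the model would prefer less than the search space allows.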
A successfully built machine learning model can quantify the influence of each component on the cell culture effect. If, during training, the machine learning model fails to converge, has low accuracy, generalizes poorly or shows similar problems, the data are judged to be insufficient, and steps (1) to (3) are repeated, continuing random generation, DOE formula design or formula mixing to form more basic culture medium sample formulas, thereby expanding the data and optimizing the machine learning model.
The contents of all components in the recommended optimal formula obtained through numerical simulation after machine learning are determined in detail, a medium is prepared according to the formula, and batch culture experiments are performed; if the cell culture effect does not meet the experimental requirements, step (4) is repeated. After the model was built through machine learning, the recommended formulas with adjusted components were tested. A basic culture medium formula was generated targeting the maximum cell density within seven days, and cells were inoculated into the medium for batch culture. Cell viability remained stable, with 0% mortality over seven days, the cell density was very high, and the model's predicted value was close to the actual value, indicating that the predicted formula is reliable. The specific formula data are shown in the following table; the comparison between predicted and actual values for the cultured cells is shown in Fig. 5, and the cell growth curve is shown in Fig. 6.
[Table BDA0002704365650000141: formula data for the maximum-cell-density-within-seven-days target (image not reproduced)]
A basic culture medium formula was also generated targeting the cell density on day 5, and cells were inoculated into the medium for batch culture. Cell viability remained stable, with 0% mortality over seven days, the cell density was very high, and the model's predicted value was close to the actual value, indicating that the predicted formula is reliable. The specific formula data are shown in the following table; the comparison between predicted and actual values for the cultured cells is shown in Fig. 7, and the cell growth curve is shown in Fig. 8.
[Table BDA0002704365650000142: formula data for the day-5 cell density target (image not reproduced)]
The formula optimization cycle of this embodiment is as follows: if it includes building the sample formula culture database (1000+ formulas) and training the machine learning model, the cycle takes about 5 months; once the database exists, model training, formula recommendation and effect verification allow a formula to be developed within half a month for any culture effect contained in the database. This greatly shortens the development cycle of basic culture media and lowers the barrier to their development.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A basic culture medium formula development method based on artificial intelligence is characterized by comprising the following steps:
(1) establishing a sample formula database: obtaining alternative basic culture medium formula components, determining a search space of the addition proportion of each component, searching in the search space of each component to form a basic culture medium sample formula, collecting the basic culture medium sample formula and establishing a sample formula database;
(2) obtaining a sample formula culture database: carrying out experimental verification on the basic culture medium sample formulas stored in the sample formula database obtained in the step (1) according to development purposes to obtain the culture effect of each basic culture medium sample formula, and collecting basic culture medium sample formula data associated with the culture effect as a sample formula culture database;
(3) training a machine learning model aiming at a development target by adopting the sample formula culture database obtained in the step (2) to obtain a basic culture medium formula culture effect prediction model;
(4) using the basic culture medium formula culture effect prediction model obtained in step (3), carrying out culture effect regression prediction for the development target within the search space of the addition proportion of each component in the basic culture medium formula to be optimized, and preferentially recommending basic culture medium formulas according to the predicted culture effect.
2. The artificial intelligence-based basic culture medium formula development method according to claim 1, wherein the training formulas formed by searching in the search space of each component are obtained by, but not limited to, the following four methods: randomly generated formulas, DOE experimental design formulas, mixed formulas, and historical AI-recommended formulas.
3. The artificial intelligence-based basic culture medium formula development method according to claim 2, wherein, for a randomly generated formula, a value is taken at random within the search space of each component in the basic culture medium formula to form a basic culture medium sample formula;
the DOE experimental design formulas are obtained by the following steps:
S1, clustering the minimum addition proportions of the components in the basic culture medium to obtain a plurality of addition magnitudes; the components in the basic culture medium are also classified by function into functional categories, including amino acids, trace metal ions, vitamins, lipids, buffers and the like;
S2, combining the addition magnitudes and functional categories obtained in step S1 to form DOE experimental factors, and forming basic sample formulas using a space-filling DOE experimental design; the space-filling DOE experimental design is the sphere-packing method, the Latin hypercube method, the uniform design method or the minimum potential method, preferably the Latin hypercube method;
the mixed formulas are obtained by screening and combining existing basic culture medium sample formulas to obtain updated basic culture medium sample formulas; preferably, the existing basic culture medium sample formulas are screened and combined as follows:
the culture effect of the existing basal medium sample formulas is verified, formulas with higher cell viability, higher cell density or higher protein expression are selected, and two or more formulas are mixed in random or preset proportions to prepare a new formula;
the historical AI recommended formula comprises a basic culture medium formula which is recommended based on an artificial intelligence method according to the formula development method.
4. The artificial intelligence-based basic culture medium formula development method according to claim 2, wherein the ratio of the number of randomly generated formulas to the number of DOE experimental design formulas in the sample formula database is (1-4):10.
5. The artificial intelligence-based basic culture medium formula development method according to claim 2, wherein the sample formula database contains 1000 or more samples in total, including 100 to 200 randomly generated formulas, 50 to 200 DOE experimental design formulas, and historical AI-recommended formulas, with the balance being mixed formulas.
6. The artificial intelligence based basic media formulation development method of claim 1, wherein the addition ratio of the components in step (1) is the ratio of the addition value of the components to the addition maximum value, the search space is the minimum addition ratio to 100%, and the minimum addition ratio is the ratio of the addition minimum value to the addition maximum value of the components.
7. The artificial intelligence-based basic medium formula development method according to claim 1, wherein the experimental verification according to the development purpose in the step (2) to obtain the culture effect of each basic medium sample formula specifically comprises:
the basic culture medium sample formula is used to culture target cells, and the cell state is detected by sampling at set time points during culture; the cell state includes cell viability, cell density and/or biochemical indexes, the biochemical indexes being the protein expression level and the glucose, lactate, ammonia and/or glutamine content; the cell viability can be fitted to obtain a cell viability curve of the basic culture medium sample formula over culture time, and the cell density fitted to obtain a cell growth curve of the sample formula over culture time; the culture effect of the basic culture medium sample formula is one or a combination of the cell growth curve, the cell viability curve, the cell density at a specific time point, the cell viability and the biochemical indexes over culture time.
8. The artificial intelligence-based basic culture medium formula development method according to claim 1, wherein the machine learning model of step (3) includes, but is not limited to: a support vector machine regression model, a K-nearest neighbor model, XGBoost, ridge regression, LightGBM, random forest, GBDT or a deep learning model; the deep learning model includes, but is not limited to: a fully connected neural network, a convolutional neural network or a recurrent neural network; preferably a support vector machine regression model.
9. The artificial intelligence-based basic culture medium formula development method according to claim 1, wherein step (4) searches the basic culture medium formulas in the search space using a global optimization algorithm or a heuristic algorithm to perform culture effect regression prediction; the heuristic algorithms include, but are not limited to: the genetic algorithm, greedy algorithm, simulated annealing algorithm, ant colony algorithm, particle swarm algorithm, artificial bee colony algorithm, artificial fish swarm algorithm, shuffled frog leaping algorithm, fireworks algorithm, bacterial foraging optimization algorithm and firefly algorithm; the global optimization algorithms include, but are not limited to: Newton's method, quasi-Newton methods, the conjugate gradient method and the gradient descent method commonly used in deep learning; preferably, the gradient descent method is SGD, Momentum, Adagrad, RMSprop, Adam or Nadam.
10. An artificial intelligence-based basic culture medium formula development system, characterized by comprising a sample formula generation module, a sample formula culture database, a regression model training module and a formula recommendation module;
the sample formula generation module is used for searching in an addition proportion search space of each component in the basic culture medium formula to be optimized to form a basic culture medium sample formula and establishing a sample formula database;
the sample formula culture database stores each basic culture medium sample formula in the sample formula database and the associated culture effect data thereof;
the regression model training module is used for selecting a regression model, performing regression model training by adopting the basic culture medium sample formula stored in the sample formula culture database and the associated culture effect data thereof, obtaining a basic culture medium formula culture effect prediction model and storing the model;
and the formula recommending module is used for applying the basic culture medium formula culture effect predicting model stored by the regression model training module to the prediction of the basic culture medium formula culture effect in the search space and preferentially recommending the basic culture medium formula.
CN202011033081.7A 2020-09-27 2020-09-27 Artificial intelligence-based basic culture medium formula development method and system Active CN113450882B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202011033081.7A CN113450882B (en) 2020-09-27 2020-09-27 Artificial intelligence-based basic culture medium formula development method and system
EP21871710.6A EP4220646A1 (en) 2020-09-27 2021-11-17 Basal culture medium development method, basal culture medium formulation and development, and system thereof
US18/028,555 US20240321404A1 (en) 2020-09-27 2021-11-17 Basal culture medium development method, basal culture medium formulation and development, and system thereof
PCT/CN2021/131105 WO2022063341A1 (en) 2020-09-27 2021-11-17 Basal culture medium development method, basal culture medium formulation and development, and system thereof

Publications (2)

Publication Number Publication Date
CN113450882A true CN113450882A (en) 2021-09-28
CN113450882B CN113450882B (en) 2022-03-01

Family

ID=77808569

Country Status (1)

Country Link
CN (1) CN113450882B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450868A (en) * 2020-11-26 2021-09-28 东莞太力生物工程有限公司 Basic culture medium development method based on culture index evaluation
CN114121163A (en) * 2021-11-30 2022-03-01 深圳太力生物技术有限责任公司 Culture medium prediction system based on ensemble learning, training and culture medium prediction method
WO2022063341A1 (en) * 2020-09-27 2022-03-31 深圳太力生物技术有限责任公司 Basal culture medium development method, basal culture medium formulation and development, and system thereof
CN114325777A (en) * 2021-11-11 2022-04-12 中航机载系统共性技术有限公司 Cycle slip detection and restoration method, device and equipment
CN115982178A (en) * 2023-03-21 2023-04-18 佛山市恒益环保建材有限公司 Intelligent formula batching method and system for autoclaved aerated concrete product
CN117121949A (en) * 2023-09-22 2023-11-28 广州麦乐生物科技有限公司 Adult formula milk powder for rebuilding and strengthening human immunity
US11837333B1 (en) 2022-12-20 2023-12-05 Dow Global Technologies Llc Simulation guided inverse design for material formulations
CN117497038A (en) * 2023-11-28 2024-02-02 上海倍谙基生物科技有限公司 Method for rapidly optimizing culture medium formula based on nuclear method
CN118098338A (en) * 2024-04-29 2024-05-28 福瑞莱环保科技(深圳)股份有限公司 Microorganism culture condition prediction method and system based on deep learning

Citations (5)

Publication number Priority date Publication date Assignee Title
US20080108553A1 (en) * 2006-11-08 2008-05-08 Yen-Tung Luan Rationally designed media for cell culture
CN102994441A (en) * 2012-09-19 2013-03-27 上海瀚康生物医药科技有限公司 Cell culture medium, and preparation method and use thereof
CN105018409A (en) * 2014-04-22 2015-11-04 中国科学院沈阳应用生态研究所 Methyl nutritional type bacillus spore-production fermentation medium and culture method thereof
CN106096077A (en) * 2016-05-26 2016-11-09 新乡医学院 Medium optimization method
CN111063391A (en) * 2019-12-20 2020-04-24 海南大学 Non-culturable microorganism screening system based on generation type confrontation network principle

Non-Patent Citations (2)

Title
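FU-GANG HU et al.: "Modeling of multilayered media using effective medium theory", IEEE *
Du Lin (杜琳): "Optimization of the culture medium for Cordyceps militaris and study of its active components", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's), Agricultural Science and Technology *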

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220125

Address after: 518048 No. 323-m, third floor, comprehensive Xinxing phase I, No. 1, Haihong Road, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Taili Biotechnology Co.,Ltd.

Address before: 523570 buildings 3 and 4, gaobao green technology city, Tutang Industrial Zone, Changping Town, Dongguan City, Guangdong Province

Applicant before: DONGGUAN TAILI BIOLOGICAL ENGINEERING CO.,LTD.

GR01 Patent grant