CN112017730A - Cell screening method and device based on expression quantity prediction model

Info

Publication number: CN112017730A (application CN202010870681.2A)
Granted publication: CN112017730B
Authority: CN (China)
Prior art keywords: cell, cells, protein expression, texture, expression quantity
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 陈亮, 买买提依明·哈斯木, 韩晓健, 梁楚亨, 梁国龙
Current assignee: Shenzhen Taili Biotechnology Co., Ltd.
Original assignee (applicant): Dongguan Taili Biological Engineering Co., Ltd.
Related application: PCT/CN2021/114168 (WO2022042509A1)

Classifications

    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G06T7/00 Image analysis
    • G06T7/49 Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G06T2207/10056 Microscopic image
    • G06T2207/10061 Microscopic image from scanning electron microscope
    • G06T2207/10064 Fluorescence image
    • G06T2207/20081 Training; Learning
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The application relates to a cell screening method and device, computer equipment, and a storage medium based on an expression quantity prediction model. The cell screening method comprises: obtaining gray-scale images of a plurality of cells to be tested in a cell culture pool, and obtaining the target cell texture features corresponding to each gray-scale image, where the target cell texture features are the optimal cell texture features determined in advance from various cell texture features; inputting the target cell texture features of the cells to be tested into a pre-trained expression quantity prediction model, and obtaining the predicted protein expression quantity of each cell to be tested from the model output; and, according to the predicted protein expression quantities, determining from the cells to be tested the target cells whose predicted protein expression quantity meets a set condition. Cells with high protein expression are thus identified rapidly, screening no longer requires repeated rounds of culture and selection, and the screening period is greatly shortened.

Description

Cell screening method and device based on expression quantity prediction model
Technical Field
The present application relates to the field of biotechnology, and in particular, to a cell screening method and apparatus based on an expression prediction model, a computer device, and a storage medium.
Background
With the continuous development of genetic engineering technology, the isolation of monoclonal cell lines capable of expressing specific products from cell pools has become a common need in the biological field.
In the prior art, to obtain cells for culturing monoclonal cell strains, the cells in a cell pool are first transfected, the pool is then treated by limiting dilution to obtain single cells, each single cell is cultured into a homogeneous cell population, i.e., a cell strain, and the cell strains with high expression of the target protein are screened out.
However, obtaining single cells by limiting dilution is laborious and requires repeated culture and screening. In addition, because of limited cell transfection efficiency, the proportion of cells with high target-protein expression is low. Screening efficiency is therefore low, the screening period is long, and it is difficult to obtain cells with high target-protein expression rapidly and accurately.
Disclosure of Invention
In view of the above, it is desirable to provide a cell screening method, device, computer device, and storage medium based on an expression level prediction model, in order to solve the above-described problems.
A cell screening method based on an expression level prediction model, the method comprising:
obtaining a gray-scale image of cells to be detected corresponding to a plurality of cells to be detected in a cell culture pool respectively, and obtaining texture characteristics of target cells corresponding to the gray-scale image of the cells to be detected respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtaining predicted protein expression quantities corresponding to the plurality of cells to be detected respectively according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and according to the predicted protein expression quantity, determining a target cell with the predicted protein expression quantity meeting set conditions from the plurality of cells to be detected.
Optionally, the method further comprises:
obtaining a sample cell gray-scale image and a corresponding fluorescence image thereof;
acquiring a plurality of sample cell texture characteristics of the sample cell gray-scale map and a real protein expression amount corresponding to the fluorescence map;
obtaining expression quantity labels respectively corresponding to the cell texture characteristics of a plurality of samples according to the real protein expression quantity;
training a plurality of regression models of different types by adopting a plurality of sample cell texture characteristics and expression quantity labels corresponding to the sample cell texture characteristics;
and according to the training results of the multiple regression models, screening the optimal cell texture features with the highest contribution degree to the predicted protein expression amount from the multiple sample cell texture features, and determining the regression model with the smallest prediction error from the multiple regression models to serve as the expression amount prediction model.
Optionally, the training of multiple regression models of different types by using the multiple sample cell texture features and their corresponding expression quantity labels includes:
respectively inputting the texture characteristics of a plurality of sample cells into the plurality of regression models, and obtaining the current prediction error of each regression model according to the predicted protein expression quantity output by the plurality of regression models and the expression quantity label of the texture characteristics of the corresponding sample cells;
and aiming at each regression model, adjusting the model parameters of the regression model according to the current prediction error, and inputting the cell texture characteristics of the sample again to carry out model training until the training end condition is met, thereby obtaining the optimized regression model of the regression model.
Optionally, the inputting the texture features of the plurality of sample cells into the plurality of regression models respectively comprises:
determining a plurality of texture feature combinations according to a plurality of sample cell texture features corresponding to the same sample cell gray-scale map; each texture feature combination comprises one or more sample cell texture features, and the expression quantity labels corresponding to the texture feature combinations are the same;
and inputting a plurality of texture feature combinations into the plurality of regression models respectively.
Optionally, the screening of the optimal cell texture features with the highest contribution degree to the prediction of the protein expression amount from the plurality of sample cell texture features comprises:
respectively inputting each texture feature combination into a plurality of optimized regression models of different types, and obtaining the prediction error magnitude corresponding to each texture feature combination according to the predicted protein expression quantity output by the optimized regression models;
aiming at each texture feature combination, determining the contribution degree of the texture feature combination to the predicted protein expression quantity according to the prediction error;
and determining the optimal cell texture characteristics with the highest contribution degree to the predicted protein expression amount according to the contribution degree of each texture characteristic combination to the predicted protein expression amount.
Optionally, the determining, as the expression quantity prediction model, a regression model with a smallest prediction error from the multiple regression models includes:
obtaining optimal cell texture characteristics corresponding to the cell gray level images of the samples respectively to obtain multiple optimal cell texture characteristics;
respectively inputting the optimal cell texture features into the optimized regression models, and obtaining the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression quantities output by the optimized regression models and the expression quantity labels corresponding to the optimal cell texture features;
and determining the optimized regression model with the minimum prediction error aiming at the optimal cell texture characteristics from the plurality of optimized regression models to be used as the expression quantity prediction model.
Optionally, the determining, according to the predicted protein expression amount, a target cell whose predicted protein expression amount satisfies a set condition from the plurality of test cells includes:
sequencing the plurality of predicted protein expression quantities, and determining a preset number of predicted protein expression quantities sequenced most at the front as target expression quantities from the sequenced plurality of predicted protein expression quantities;
and determining a gray-scale map of the cell to be detected corresponding to the target expression amount, and determining the cell to be detected corresponding to the gray-scale map of the cell to be detected as a target cell.
A cell screening apparatus based on an expression level prediction model, the apparatus comprising:
the target cell texture feature acquisition module is used for acquiring to-be-detected cell gray-scale maps corresponding to a plurality of to-be-detected cells in the cell culture pool respectively and acquiring target cell texture features corresponding to the to-be-detected cell gray-scale maps respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
the cell expression prediction module is used for inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression prediction model and obtaining the predicted protein expression quantities corresponding to the plurality of cells to be detected according to the output of the expression prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and the target cell determining module is used for determining the target cells of which the predicted protein expression quantity meets the set conditions from the multiple cells to be detected according to the predicted protein expression quantity.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method for cell screening based on an expression level prediction model as described above when the computer program is executed.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for cell screening based on an expression level prediction model as set forth above.
According to the cell screening method, device, computer equipment, and storage medium based on the expression quantity prediction model, gray-scale images of a plurality of cells to be tested in the cell culture pool are acquired, the target cell texture features of each gray-scale image are extracted, the target cell texture features are input into the pre-trained expression quantity prediction model, the predicted protein expression quantity of each cell to be tested is obtained from the model output, and the target cells whose predicted protein expression quantity meets the set condition are determined from the cells to be tested. Cells with high protein expression can therefore be identified rapidly, repeated rounds of culture and screening are avoided, and the screening period is greatly shortened. Because the method can rapidly process millions of single cells, the screening scope is enlarged, the workload of laboratory staff is reduced, and cell screening efficiency is effectively improved.
Drawings
FIG. 1 is a schematic view showing a flow of a cell screening method based on an expression level prediction model in one embodiment;
FIG. 2 is a flowchart illustrating steps of generating an expression level prediction model in one embodiment;
FIG. 3a is a gray scale view of a sample cell in one embodiment;
FIG. 3b is a fluorescent image of a cell according to one embodiment;
FIG. 4 is a schematic flow chart diagram illustrating the steps of a regression model optimization method in one embodiment;
FIG. 5 is a schematic flow chart of the steps for determining optimal cell texture features in one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating the step of screening an optimized regression model in one embodiment;
FIG. 7 is a schematic diagram of a cell screening process according to one embodiment;
FIG. 8 is a block diagram showing the construction of a cell screening apparatus based on an expression level prediction model in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To facilitate an understanding of embodiments of the present invention, a description of the prior art will be given.
In the prior art, to obtain cells for culturing monoclonal cell strains, the cells in a cell pool are first transfected, the pool is then treated by limiting dilution to obtain single cells, each single cell is cultured into a homogeneous cell population, i.e., a cell strain, and the cell strains with high expression of the target protein are screened out.
However, obtaining single cells by limiting dilution is laborious and requires repeated culture and screening. Because of limited cell transfection efficiency, the proportion of cells with high target-protein expression is low, so screening efficiency is low and the screening period is long; the traditional approach usually takes six months or more, consumes a great deal of manpower and material resources, and struggles to meet the requirements of large-scale, industrial production.
In one embodiment, as shown in fig. 1, a cell screening method based on an expression quantity prediction model is provided, and this embodiment is illustrated by applying the method to a terminal, and it is to be understood that the method may also be applied to a server, and may also be applied to a system including a terminal and a server, and is implemented by interaction between the terminal and the server. In this embodiment, the method includes the steps of:
step 101, obtaining gray level maps of cells to be detected corresponding to a plurality of cells to be detected in a cell culture pool respectively, and obtaining texture characteristics of target cells corresponding to the gray level maps of the cells to be detected respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
as an example, the test cell can be a cell treated by transfection technology, the test cell can be a cell which can not obtain the exogenous DNA fragment after treatment, or a cell which has obtained the exogenous DNA fragment but does not integrate into the chromosome, or a cell in which the exogenous DNA fragment has integrated into the chromosome, and the gray-scale map of the test cell is the gray-scale map of the test cell; the texture features of the target cells are information reflecting the image features of the gray-scale map of the cells to be detected.
In practical applications, a plurality of cells in the cell culture pool can be transfected, so that some or all of the cells in the cell culture pool can obtain the exogenous DNA fragments. After transfection treatment, a gray-scale image of the cell to be detected corresponding to each of a plurality of cells to be detected in the cell culture pool can be obtained through a photomicrograph device, and the texture characteristics of the target cell corresponding to each gray-scale image of the cell to be detected are determined, wherein the texture characteristics of the target cell are the optimal cell texture characteristics determined from a plurality of cell texture characteristics in advance.
102, inputting texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtaining predicted protein expression quantities corresponding to the plurality of cells to be detected respectively according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
the predicted protein expression quantity is a protein expression quantity predicted by an expression quantity prediction model based on the texture characteristics of target cells, the expression quantity prediction model is obtained by training a plurality of sample cell texture characteristics with expression quantity labels, and the model can be used for predicting the protein expression quantity corresponding to the texture characteristics of the target cells. The expression quantity label is used for representing the real protein quantity corresponding to the sample cell texture characteristics, and the real protein quantity corresponding to the sample cell texture characteristics refers to the real protein expression quantity of cells in the sample cell gray level image.
In the specific implementation, after the texture features of the target cells of the multiple cells to be detected are obtained, the texture features of the target cells can be input into a pre-trained expression quantity prediction model, and the predicted protein expression quantities corresponding to the multiple cells to be detected can be obtained according to the output of the expression quantity prediction model.
In practical application, the protein expression level of a cell can be determined from its fluorescence image; however, capturing a fluorescence image inactivates the cell, which can then no longer proliferate. In this embodiment, the predicted protein expression quantity is obtained from the cell texture features of the gray-scale image of the cell to be tested, so the expression quantity can be predicted without inactivating the cell.
And 103, determining target cells with the predicted protein expression quantity meeting set conditions from the multiple cells to be detected according to the predicted protein expression quantity.
After the predicted protein expression level is determined, a cell to be tested whose predicted protein expression level satisfies a predetermined condition may be determined as a target cell.
In this embodiment, gray-scale images of a plurality of cells to be tested in the cell culture pool are acquired, the target cell texture features of each gray-scale image are extracted, the target cell texture features are input into the pre-trained expression quantity prediction model, the predicted protein expression quantity of each cell to be tested is obtained from the model output, and the target cells whose predicted protein expression quantity meets the set condition are determined from the cells to be tested. Cells with high protein expression are thus identified rapidly, repeated rounds of culture and screening are avoided, and the screening period is greatly shortened. Because the method can rapidly process millions of single cells, the screening scope is enlarged, the workload of laboratory staff is reduced, and cell screening efficiency is effectively improved.
In one embodiment, as shown in fig. 2, the method may further include the steps of:
step 201, obtaining a sample cell gray scale image and a corresponding fluorescence image;
the grayscale map and the fluorescence map corresponding to the grayscale map are obtained by imaging the same cell under the same imaging conditions.
In a specific implementation, a cell as a training set may be set, and the cell is photographed by a microscopic photographing apparatus to obtain a gray-scale image of the cell of the sample and a corresponding fluorescence image.
Specifically, the cells used as the training set are cells treated by transfection technique in a cell culture pool, and the cells may be cells from which the exogenous DNA fragment has not been obtained after treatment, cells from which the exogenous DNA fragment has been obtained but not integrated into the chromosome, or cells from which the exogenous DNA fragment has been integrated into the chromosome.
The gray-scale image and the fluorescence image can be captured simultaneously with a microscope, under the same imaging conditions, for the same batch of transfected cells in the cell culture pool. Each captured gray-scale image and fluorescence image may contain one or more cells, and the coordinates of each cell in the gray-scale image correspond to its coordinates in the fluorescence image. Because a single gray-scale image and fluorescence image can contain several cells, the images can be pre-processed after acquisition to obtain a sample cell gray-scale image and fluorescence image for each single cell, as shown in FIGS. 3a and 3b; the pre-processing may include cell segmentation and the filtering of adhered cells.
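As a rough illustration of this pre-processing step, the sketch below segments a multi-cell gray-scale image and keeps only well-separated single cells. It assumes scikit-image is available; the Otsu threshold, area limits, and solidity criterion used to filter adhered cells are illustrative assumptions rather than values from the patent.

```python
# Minimal pre-processing sketch (assumes scikit-image; thresholds are illustrative).
import numpy as np
from skimage import filters, measure, morphology

def extract_single_cells(gray_img, min_area=50, max_area=2000, min_solidity=0.9):
    """Segment a multi-cell gray-scale image and keep well-separated single cells."""
    # Threshold and clean the binary mask.
    mask = gray_img > filters.threshold_otsu(gray_img)
    mask = morphology.remove_small_objects(mask, min_size=min_area)

    crops = []
    for region in measure.regionprops(measure.label(mask)):
        # Heuristic adhered-cell filter: clumps tend to be large and non-convex.
        if region.area > max_area or region.solidity < min_solidity:
            continue
        r0, c0, r1, c1 = region.bbox
        crops.append(gray_img[r0:r1, c0:c1])   # one crop per candidate single cell
    return crops
```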
202, obtaining a plurality of sample cell texture characteristics of the sample cell gray-scale image and a real protein expression amount corresponding to the fluorescence image;
after the sample cell gray-scale image is obtained, a plurality of sample cell texture features can be extracted from the sample cell gray-scale image, and the real protein expression amount corresponding to the fluorescence image can be obtained.
Specifically, cells with different protein expression levels give rise to different fluorescence images and different gray-scale images; in other words, the protein expression level of a cell corresponds to the sample cell texture features observed in its images.
In practical application, a texture feature extraction algorithm may be used to extract multiple sample cell texture features from each sample cell gray-scale image, and the features of multiple sample cell gray-scale images can form a texture feature matrix; for example, extracting m texture features from each of n sample cell gray-scale images yields an n × m texture feature matrix. The sample cell texture features may include any one or more of: Gray-Level Co-occurrence Matrix features, histogram features, Laws energy texture features, Local Binary Pattern features, and Discrete Wavelet Transform features. Of course, those skilled in the art may select other texture features as needed, which is not limited in this application.
After the fluorescence map is obtained, the real protein expression quantity of the cells in the corresponding sample cell gray scale map can be determined according to the fluorescence map. In practical applications, the protein produced by the target gene (e.g., the foreign DNA fragment) can emit fluorescence at a specific wavelength. After obtaining the fluorescence map, a G value (also referred to as a fluorescence value) corresponding to the green channel of the fluorescence map can be determined, where the G value can be a total G value or an average G value of the fluorescence map, and after determining the G value, the real protein expression amount corresponding to the fluorescence map is determined according to the G value. Wherein, the G value and the protein expression quantity have positive correlation, namely the higher the G value is, the higher the protein expression quantity of the corresponding cell is, and based on the G value of the fluorescence map, the real protein expression quantity can be determined.
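The following sketch illustrates one way such features and labels could be computed, assuming scikit-image (version 0.19+, where the GLCM helpers are named graycomatrix/graycoprops) and NumPy. Only GLCM, histogram, and LBP features are shown, Laws energy and wavelet features being omitted for brevity; the mean green-channel value of an RGB fluorescence image stands in for the G-value label, and all parameter choices are illustrative.

```python
# Sketch of feature extraction and label computation; parameters are illustrative.
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_features(cell_gray):
    """Return a feature dictionary for one single-cell gray-scale crop."""
    img = cell_gray.astype(np.uint8)
    feats = {}

    # Gray-level co-occurrence matrix statistics.
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        feats[f"glcm_{prop}"] = graycoprops(glcm, prop)[0, 0]

    # First-order histogram statistics.
    feats["hist_mean"] = img.mean()
    feats["hist_std"] = img.std()

    # Local binary pattern histogram (uniform patterns, values 0..9).
    lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    feats.update({f"lbp_{i}": v for i, v in enumerate(hist)})
    return feats

def expression_label(fluo_rgb):
    """Mean G value of an RGB fluorescence image, used as the expression-quantity label."""
    return float(fluo_rgb[..., 1].mean())   # assumes channel 1 is the green channel
```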
Step 203, obtaining expression quantity labels respectively corresponding to the cell texture characteristics of the plurality of samples according to the expression quantity of the real protein;
after the real protein expression amount corresponding to each fluorescence image is determined, expression amount labels respectively corresponding to the cell texture characteristics of a plurality of samples can be determined according to the real protein expression amount.
Specifically, the real protein expression quantity may serve as the expression quantity label of the corresponding sample cell gray-scale image, and that label is also the expression quantity label of the sample cell texture features extracted from it. Texture features extracted from the same sample cell gray-scale image therefore share the same expression quantity label, while texture features extracted from different sample cell gray-scale images carry labels corresponding to the real protein expression quantities of the cells in their respective images.
Step 204, training a plurality of regression models of different types by adopting a plurality of sample cell texture characteristics and expression quantity labels corresponding to the sample cell texture characteristics;
Specifically, regression models of various types may be preset, such as an SVR (Support Vector Regression) model, an Elastic Net model, an XGBoost model, a Gradient Boosting Regression model, and a logistic regression model. Because different regression models are implemented differently, i.e., rest on different underlying mathematics, analysing the extracted sample cell texture features and predicting the protein expression quantity with different regression models yields different prediction results.
Based on this, after obtaining a plurality of sample cell texture features and expression quantity labels corresponding to the sample cell texture features, a plurality of regression models of different types can be trained.
Step 205, according to the training results of the multiple regression models, selecting the optimal cell texture feature with the highest contribution degree to the predicted protein expression amount from the multiple sample cell texture features, and determining the regression model with the smallest prediction error from the multiple regression models as the expression amount prediction model.
After training, multiple training results can be obtained. According to these results, the optimal cell texture features with the highest contribution to predicting the protein expression quantity are screened out from the sample cell texture features, and, based on the optimal cell texture features, the regression model with the smallest prediction error is determined from the multiple regression models and used as the expression quantity prediction model. The optimal cell texture features may comprise one or more sample cell texture features.
In this embodiment, multiple regression models of different types are trained with the sample cell texture features and their corresponding expression quantity labels; according to the training results, the optimal cell texture features contributing most to the predicted protein expression quantity are selected from the sample cell texture features, and the regression model with the smallest prediction error is determined from the multiple regression models and used as the expression quantity prediction model. On the basis of the sample cell texture features, the cell images can thus be examined from multiple dimensions, a relationship between a cell's gray-scale image and its protein expression quantity is established, and a basis is provided for rapidly predicting the protein expression quantity.
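A minimal sketch of this training step follows, assuming scikit-learn and the xgboost package; the model families mirror the examples listed above, while the hyperparameters and the mean-absolute-error metric are illustrative assumptions.

```python
# Hedged sketch of step 204: fit several regressor families on the labelled texture features.
from sklearn.svm import SVR
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor  # assumes the xgboost package is installed

models = {
    "svr": SVR(),
    "elastic_net": ElasticNet(alpha=0.1),
    "xgboost": XGBRegressor(n_estimators=200),
    "gbr": GradientBoostingRegressor(),
}

def train_and_score(models, X_train, y_train, X_val, y_val):
    """Fit every candidate regressor and report its prediction error."""
    errors = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        errors[name] = mean_absolute_error(y_val, model.predict(X_val))
    return errors  # the model with the smallest error becomes the expression predictor
```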
In an embodiment, as shown in fig. 4, the training of multiple regression models of different types by using the sample cell texture features and their corresponding expression quantity tags may include the following steps:
step 401, respectively inputting texture characteristics of a plurality of sample cells into the plurality of regression models, and obtaining a current prediction error of each regression model according to predicted protein expression quantity output by the plurality of regression models and expression quantity labels corresponding to the texture characteristics of the sample cells;
in a specific implementation, the cell texture features of the multiple samples can be respectively input into the multiple regression models to obtain predicted protein expression quantities output by the multiple regression models, and then the current prediction errors of the multiple regression models can be determined according to the predicted protein expression quantities and expression quantity labels corresponding to the cell texture features of the samples.
Step 402, aiming at each regression model, adjusting model parameters of the regression model according to the current prediction error, and inputting the cell texture characteristics of the sample again for model training until the training end condition is met, so as to obtain an optimized regression model of the regression model.
For each regression model, after the current prediction error is obtained, the model parameters are adjusted and optimized according to that error, the sample cell texture features are input into the regression model again for training, and a new current prediction error is obtained from the model with the adjusted parameters. Whether the current prediction error satisfies the training end condition is then judged: if it does, the current regression model is taken as the optimized regression model; if it does not, the above steps are repeated and the model parameters continue to be optimized.
In this embodiment, model parameters of the regression model are continuously adjusted according to the current prediction error corresponding to the expression quantity label and the predicted protein expression quantity until an optimized regression model is obtained, and the regression model can be continuously optimized by a machine supervised learning method, so that the prediction accuracy of the regression model is improved.
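One hedged way to realise this optimisation loop is sketched below for a single SVR model: candidate parameter settings are tried in turn, and training stops once the validation error no longer improves by more than a tolerance. The parameter grid, tolerance, and error metric are illustrative assumptions.

```python
# Sketch of the loop in steps 401-402: adjust parameters and refit until the error stops improving.
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

def optimise_regressor(X_train, y_train, X_val, y_val, tol=1e-3):
    best_model, best_err = None, float("inf")
    for c in (0.1, 1.0, 10.0, 100.0):           # candidate parameter settings
        model = SVR(C=c).fit(X_train, y_train)
        err = mean_absolute_error(y_val, model.predict(X_val))
        if best_model is not None and best_err - err <= tol:
            break                               # training end condition: no further improvement
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err                 # the "optimized regression model" and its error
```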
In one embodiment, the inputting the texture features of the sample cells into the regression models respectively may include the following steps:
determining a plurality of texture feature combinations according to a plurality of sample cell texture features corresponding to the same sample cell gray-scale map; each texture feature combination comprises one or more sample cell texture features, and the expression quantity labels corresponding to the texture feature combinations are the same; and inputting a plurality of texture feature combinations into the plurality of regression models respectively.
In practical application, when the texture features of the sample cells are input into multiple regression models, the texture features of the sample cells can be combined for multiple sample cell texture features corresponding to the same sample cell gray scale map to obtain multiple texture feature combinations, each texture feature combination includes one or more sample cell texture features, and the sample cell texture features in the multiple texture feature combinations are all from the same sample cell gray scale map, and the multiple texture feature combinations have the same expression quantity label. After obtaining the plurality of combinations of texture features, the plurality of combinations of texture features may be input into a plurality of regression models, respectively.
Specifically, when texture feature combinations are determined, methods such as correlation analysis can be used to analyse the sample cell texture features and obtain a relevance score for each. The relevance score characterizes how important the corresponding sample cell texture feature is for predicting the protein expression quantity, and the two are positively correlated: the higher a feature's relevance score, the more important it is for predicting the protein expression quantity.
After the relevance scores are obtained, the sample cell texture features can be ranked from high to low score, and texture feature combinations can be generated by selecting features in that order. For example, if three sample cell texture features are ranked by relevance score as feature A, feature B, and feature C, then feature A with the highest score forms the first combination, feature B with the next highest score is added to A to form the second combination, and finally feature C with the lowest score is added to A and B to form the third combination.
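A compact sketch of this combination-building step follows, assuming pandas; the absolute correlation with the label stands in for whatever relevance analysis is used, and the incremental growth mirrors the A, A+B, A+B+C example above.

```python
# Sketch of ranking features and growing combinations incrementally (assumes pandas).
import pandas as pd

def feature_combinations(X: pd.DataFrame, y: pd.Series):
    """Rank features by |correlation with the label| and grow combinations incrementally."""
    ranked = X.corrwith(y).abs().sort_values(ascending=False).index.tolist()
    # Combination 1 = {A}, combination 2 = {A, B}, combination 3 = {A, B, C}, ...
    return [ranked[:k] for k in range(1, len(ranked) + 1)]
```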
In this embodiment, a plurality of sample cell texture features are combined, a plurality of texture feature combinations are determined, the plurality of texture feature combinations are respectively input into a plurality of regression models, the influence of different sample cell features on the prediction effect of the regression models can be comprehensively evaluated, and the prediction accuracy of the regression models is improved.
In one embodiment, as shown in fig. 5, the screening of the optimal cell texture features from the plurality of sample cell texture features that have the highest contribution degree to the predicted protein expression amount includes:
step 501, inputting each texture feature combination into a plurality of optimized regression models of different types, and obtaining a prediction error magnitude corresponding to each texture feature combination according to predicted protein expression quantities output by the optimized regression models;
After the texture feature combinations are obtained, each combination can be input into the multiple optimized regression models of different types to obtain their predicted protein expression quantities; the prediction error of each texture feature combination is then determined from the output predictions and the expression quantity label corresponding to that combination. Obtaining the prediction error makes it possible to verify the prediction performance of the optimized regression models.
Step 502, determining the contribution degree of each texture feature combination to the predicted protein expression amount according to the prediction error;
step 503, determining the optimal cell texture feature with the highest contribution degree to the predicted protein expression amount according to the contribution degree of each texture feature combination to the predicted protein expression amount.
And determining the contribution degree of the texture feature combination to the predicted protein expression quantity according to the prediction error of each texture feature combination, and determining the cell texture feature with the highest contribution degree to the predicted protein expression quantity, namely the optimal cell texture feature by comparing the contribution degrees of the texture feature combinations to the predicted protein expression quantity.
Specifically, for each texture feature combination, the number of optimized regression models whose prediction error is smaller than a preset threshold can be counted; this number is positively correlated with the contribution of the combination. For example, if inputting the same texture feature combination into different types of optimized regression models yields prediction errors smaller than the preset value, the combination predicts the protein expression quantity accurately and has a high contribution.
When determining the contribution of the texture feature combinations to the predicted protein expression quantity, the combination that simultaneously yields an optimal prediction result for every type of optimized regression model is identified, and the sample cell texture features in that combination are the optimal cell texture features. Specifically, each texture feature combination is input into the different types of optimized regression models; for each type of optimized regression model, the prediction errors of the combinations are sorted from small to large, and a preset number of the smallest errors are taken as optimal prediction results. Across the different types of optimized regression models, the texture feature combination that produces an optimal prediction result for all of them at the same time is then determined, and its sample cell texture features are the optimal cell texture features.
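As an illustration of this contribution measure, the sketch below counts, for each texture feature combination, how many optimized regression models predict it with an error below a preset threshold; the table layout (combination keyed as a tuple of feature names) and the threshold value are assumptions for the example.

```python
# Sketch of scoring each feature combination's contribution (step 502).
# error_table[combo][model_name] holds that model's validation error on the combination.
def contribution_scores(error_table, threshold=0.1):
    return {
        combo: sum(1 for err in model_errors.values() if err < threshold)
        for combo, model_errors in error_table.items()
    }

# The combination with the highest count (ties broken by mean error, say) supplies
# the "optimal cell texture features".
```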
In this embodiment, according to the respective contribution degrees of the texture feature combinations to the predicted protein expression amount, the optimal sample cell texture feature with the highest contribution degree to the predicted protein expression amount can be determined, so that the sample cell texture feature can be extracted more specifically in the subsequent process of predicting the protein expression amount, and the phenomenon that the interference is generated on the prediction result or the calculation resource is wasted due to excessive extraction of other cell texture features with lower contribution degrees is avoided.
In one embodiment, as shown in fig. 6, the determining a regression model with the smallest prediction error from the plurality of regression models as the expression quantity prediction model may include the following steps:
601, obtaining optimal cell texture characteristics corresponding to a plurality of sample cell gray level maps respectively to obtain a plurality of optimal cell texture characteristics;
In practical application, the optimal cell texture features corresponding to each of the multiple sample cell gray-scale maps can be obtained, giving multiple sets of optimal cell texture features.
Step 602, inputting the optimal cell texture features into the optimized regression models respectively, and obtaining the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression quantities it outputs and the expression quantity labels corresponding to the optimal cell texture features;
Specifically, the expression quantity labels corresponding to the optimal cell texture features can be determined, the optimal cell texture features are input into each optimized regression model, and the predicted protein expression quantities output by the optimized regression models are obtained; the prediction error of each optimized regression model when predicting with the optimal cell texture features can then be determined from the output predictions and the corresponding expression quantity labels.
Step 603, determining the optimized regression model with the minimum prediction error for the optimal cell texture characteristics from the plurality of optimized regression models as an expression quantity prediction model.
After the prediction errors of the multiple optimized regression models for the optimal cell texture features are obtained, the optimized regression model with the minimum prediction error can be determined as the expression quantity prediction model.
In another example, after the optimal cell texture features corresponding to the sample cell gray-scale images are obtained, the features may be divided into ten equal parts for ten experiments; in each experiment, nine parts are used for training and validating the regression model and one part is used for testing the trained model. The average prediction error over the ten experiments can then be obtained, an expression quantity prediction model can be determined from the multiple types of regression models based on the average prediction error, and the performance of the expression quantity prediction model can be evaluated.
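The ten-part split described above amounts to 10-fold cross-validation; a hedged sketch with scikit-learn follows, in which `models` maps names to the candidate optimized regressors and the mean-absolute-error scoring is an illustrative choice.

```python
# Sketch of selecting the expression quantity prediction model by 10-fold cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score

def select_expression_model(models, X_best, y):
    mean_errors = {
        name: -np.mean(cross_val_score(model, X_best, y, cv=10,
                                       scoring="neg_mean_absolute_error"))
        for name, model in models.items()
    }
    best_name = min(mean_errors, key=mean_errors.get)   # smallest average prediction error
    return best_name, mean_errors
```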
In this embodiment, the optimal cell texture features are input into the optimized regression models, and the expression quantity prediction model is determined from the optimized regression models according to the predicted protein expression quantities output by the optimized regression models and the expression quantity labels corresponding to the optimal cell texture features, so that the regression model with accurate prediction effect can be determined from the regression models of multiple types based on the optimal texture features that contribute most to the protein expression quantities.
In one embodiment, the determining, according to the predicted protein expression amount, a target cell whose predicted protein expression amount satisfies a set condition from the plurality of test cells may include:
sequencing the plurality of predicted protein expression quantities, and determining a preset number of predicted protein expression quantities sequenced most at the front as target expression quantities from the sequenced plurality of predicted protein expression quantities; and determining a gray-scale map of the cell to be detected corresponding to the target expression amount, and determining the cell to be detected corresponding to the gray-scale map of the cell to be detected as a target cell.
In a specific implementation, after obtaining the predicted protein expression amounts corresponding to the multiple cells to be tested, the multiple predicted protein expression amounts may be ranked, and a preset number of the predicted protein expression amounts ranked at the top are determined as the target expression amount from the ranked multiple predicted protein expression amounts.
Specifically, the predicted protein expression quantities may be sorted in descending order, i.e., from large to small, and after sorting, the top N predicted protein expression quantities may be determined as the target expression quantities. Of course, in practical applications, any predicted protein expression quantity exceeding a preset expression quantity threshold may also be determined as a target expression quantity.
After the target expression level is determined, a gray-scale map of the cell to be detected corresponding to the target expression level can be determined, and the cell to be detected corresponding to the gray-scale map of the cell to be detected is determined as the target cell. The target cell can be used for culturing cell strains.
In this embodiment, the plurality of predicted protein expression levels are ranked, and a preset number of cells with the highest ranked predicted protein expression level are determined as target cells from the plurality of cells to be tested according to the ranked plurality of predicted protein expression levels, so that cells with high protein expression levels can be rapidly screened, and the screening workload is greatly reduced.
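A small sketch of this selection step with NumPy; `cell_ids` and the value of N (`top_n`) are illustrative names for the example.

```python
# Sketch of step 103: rank predicted expression quantities and keep the top-N cells.
import numpy as np

def select_target_cells(cell_ids, predicted_expression, top_n=100):
    order = np.argsort(predicted_expression)[::-1]      # descending sort
    return [cell_ids[i] for i in order[:top_n]]         # the target cells
```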
In order to enable those skilled in the art to better understand the above steps, the following is an example to illustrate the embodiments of the present application, but it should be understood that the embodiments of the present application are not limited thereto.
In a specific implementation, the aflibercept expression plasmid can be used to transfect a plurality of CHO-K1 host cells to obtain a plurality of transfected CHO-K1 host cells, i.e., the cells to be tested in the present application. After obtaining the plurality of cells to be measured, as shown in fig. 7, a gray-scale image of the cells to be measured corresponding to each of the plurality of cells to be measured can be obtained by microscopic photographing.
Aiming at each cell gray-scale image to be detected, texture feature analysis can be carried out on the cell gray-scale image to be detected to obtain cell texture features, the cell texture features are input into an expression quantity prediction model, the protein expression quantity of the cells is predicted, and the predicted protein expression quantity output by the model is obtained.
After the processing of the cell gray-scale map to be detected is finished, whether the processing of all the cell gray-scale maps to be detected is finished can be judged, if not, the steps of performing texture feature analysis on the cell gray-scale map to be detected and obtaining cell texture features can be returned; if yes, obtaining the expression quantity sequence of the multiple cells to be detected according to the predicted protein expression quantity of the multiple cells to be detected, determining the cells to be detected with high protein expression quantity from the sequence result, generating a screening report and submitting the screening report.
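Putting the pieces together, the sketch below mirrors the screening loop of FIG. 7, reusing the illustrative helpers from the earlier snippets (`extract_single_cells`, `texture_features`, `select_target_cells`); `model` stands for the trained expression quantity prediction model, and all names are assumptions for the example rather than the patent's own code.

```python
# End-to-end sketch of the screening loop, under the assumptions stated above.
import numpy as np

def screen_cells(gray_images, model, top_n=100):
    features, cell_refs = [], []
    for img_idx, img in enumerate(gray_images):            # one micrograph per field of view
        for crop_idx, crop in enumerate(extract_single_cells(img)):
            features.append(list(texture_features(crop).values()))
            cell_refs.append((img_idx, crop_idx))
    predictions = model.predict(np.asarray(features))      # predicted protein expression
    return select_target_cells(cell_refs, predictions, top_n=top_n)
```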
It should be understood that although the various steps in the flow charts of fig. 1-7 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not necessarily performed in a strict order, and may be performed in other orders, unless otherwise indicated herein. Moreover, at least some of the steps in fig. 1-7 may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or at least some of the other steps.
In one embodiment, as shown in fig. 8, there is provided a cell screening apparatus based on an expression level prediction model, the apparatus including:
a target cell texture feature obtaining module 801, configured to obtain the gray-scale maps respectively corresponding to a plurality of cells to be detected in the cell culture pool, and obtain the target cell texture features respectively corresponding to the gray-scale maps of the plurality of cells to be detected; the target cell texture features are optimal cell texture features determined in advance from a plurality of cell texture features;
a cell expression quantity prediction module 802, configured to input the target cell texture features of the plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtain the predicted protein expression quantities respectively corresponding to the plurality of cells to be detected according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training on a plurality of sample cell texture features carrying expression quantity labels; the expression quantity labels represent the real protein expression quantities corresponding to the respective sample cell texture features; and the expression quantity prediction model is used for predicting the protein expression quantity corresponding to the target cell texture features;
and a target cell determining module 803, configured to determine, according to the predicted protein expression quantities, target cells whose predicted protein expression quantities satisfy a set condition from the plurality of cells to be detected.
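The cooperation of the three modules above can be illustrated with a short Python sketch; the function signature, the feature extractor and the top-k "set condition" are assumptions made for the example, not part of the apparatus as claimed.

```python
# Illustrative sketch only: the three modules of the apparatus as plain callables.
from typing import Callable, Sequence
import numpy as np

def screen_cells(gray_images: Sequence[np.ndarray],
                 extract_features: Callable[[np.ndarray], np.ndarray],
                 model,
                 top_k: int = 5):
    # Module 801: obtain the target cell texture features of each gray-scale map.
    features = np.vstack([extract_features(img) for img in gray_images])
    # Module 802: predict a protein expression quantity for each cell to be detected.
    predicted = model.predict(features)
    # Module 803: keep the cells whose prediction satisfies the set condition
    # (here assumed to be: the top_k highest predicted expression quantities).
    order = np.argsort(predicted)[::-1][:top_k]
    return order, predicted[order]
```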
In one embodiment, the apparatus further comprises:
the image acquisition module is used for acquiring a sample cell gray-scale image and a corresponding fluorescence image;
the sample cell texture feature acquisition module is used for acquiring a plurality of sample cell texture features of the sample cell gray-scale image and the real protein expression amount corresponding to the fluorescence image;
the expression quantity label determining module is used for obtaining expression quantity labels respectively corresponding to the cell texture characteristics of the plurality of samples according to the real protein expression quantity;
the training module is used for training a plurality of regression models of different types by adopting a plurality of sample cell texture characteristics and expression quantity labels corresponding to the sample cell texture characteristics;
and the expression quantity prediction model determining module is used for screening the optimal cell texture characteristics with the highest contribution degree to the predicted protein expression quantity from the cell texture characteristics of the multiple samples according to the training results of the multiple regression models, and determining the regression model with the smallest prediction error from the multiple regression models to serve as the expression quantity prediction model.
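For illustration, the training flow handled by these modules can be sketched as follows: several regressor families are fitted on the labelled sample cell texture features and the one with the smallest prediction error is retained. The particular candidate models, the data split and the error metric are assumptions of this sketch.

```python
# Illustrative sketch only: fit several regression model types and keep the best one.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def train_candidate_models(X, y):
    """X: sample cell texture features; y: expression quantity labels."""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
    candidates = {
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "gbdt": GradientBoostingRegressor(random_state=0),
    }
    errors = {}
    for name, reg in candidates.items():
        reg.fit(X_tr, y_tr)                                   # train each regression model
        errors[name] = mean_squared_error(y_va, reg.predict(X_va))
    best = min(errors, key=errors.get)                        # smallest prediction error
    return candidates, errors, best
```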
In one embodiment, the training module comprises:
the sample cell texture feature input submodule is used for respectively inputting the sample cell texture features into the regression models, and obtaining the current prediction error of each regression model according to the predicted protein expression quantity output by the regression models and the expression quantity labels of the corresponding sample cell texture features;
and the regression model optimization submodule is used for, for each regression model, adjusting the model parameters of the regression model according to the current prediction error and then inputting the sample cell texture features again for model training until the training end condition is met, so as to obtain the optimized regression model corresponding to that regression model.
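The iterative "adjust parameters, re-input the features, repeat until the training end condition is met" behaviour can be sketched with an SGD regressor, whose per-pass parameter update makes the loop explicit; the convergence tolerance and pass limit are assumptions of the sketch.

```python
# Illustrative sketch only: repeat parameter updates until the error stops improving.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

def optimise_regressor(X, y, max_passes=200, tol=1e-4):
    reg = SGDRegressor(random_state=0)
    previous_error = np.inf
    for _ in range(max_passes):
        reg.partial_fit(X, y)                             # adjust the model parameters
        error = mean_squared_error(y, reg.predict(X))     # current prediction error
        if previous_error - error < tol:                  # training end condition
            break
        previous_error = error
    return reg                                            # the optimised regression model
```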
In one embodiment, the sample cell texture feature input sub-module comprises:
the texture feature combination determining unit is used for determining a plurality of texture feature combinations according to a plurality of sample cell texture features corresponding to the same sample cell gray level image; each texture feature combination comprises one or more sample cell texture features, and the expression quantity labels corresponding to the texture feature combinations are the same;
and the texture feature combination input unit is used for respectively inputting a plurality of texture feature combinations into the plurality of regression models.
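Building the texture feature combinations can be sketched as below; the maximum combination size is an assumption of the example, and every combination inherits the expression quantity label of its source image.

```python
# Illustrative sketch only: enumerate feature combinations for one sample image.
from itertools import combinations

def feature_combinations(features, label, max_size=2):
    """features: mapping of texture-feature name -> value for one sample cell image."""
    names = list(features)
    combos = []
    for size in range(1, max_size + 1):
        for subset in combinations(names, size):
            values = [features[name] for name in subset]
            combos.append((subset, values, label))        # same label for every combination
    return combos
```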
In one embodiment, the expression quantity prediction model determination module includes:
the prediction error magnitude determining submodule is used for respectively inputting each texture feature combination into a plurality of optimized regression models of different types, and obtaining the prediction error magnitude corresponding to each texture feature combination according to the predicted protein expression quantities output by the optimized regression models;
the contribution degree determining submodule is used for determining, for each texture feature combination, the contribution degree of the texture feature combination to the predicted protein expression quantity according to its prediction error;
and the optimal cell texture feature determining submodule is used for determining the optimal cell texture feature with the highest contribution degree to the predicted protein expression amount according to the contribution degree of each texture feature combination to the predicted protein expression amount.
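One simple way to turn the prediction error of each combination into a contribution degree is an inverse-error weighting, as in the sketch below; the weighting itself is an assumption of the example, and other monotonically decreasing mappings would serve equally well.

```python
# Illustrative sketch only: smaller prediction error -> higher contribution degree.
def rank_feature_combinations(error_by_combination):
    """error_by_combination: mapping of combination -> prediction error averaged
    over the optimised regression models."""
    contribution = {c: 1.0 / (e + 1e-12) for c, e in error_by_combination.items()}
    best_combination = max(contribution, key=contribution.get)
    return best_combination, contribution
```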
In one embodiment, the expression quantity prediction model determination module includes:
the optimal cell texture feature acquisition sub-module is used for acquiring optimal cell texture features corresponding to the sample cell gray level images to obtain multiple optimal cell texture features;
the prediction error determination submodule is used for respectively inputting the optimal cell texture features into the optimized regression models, and obtaining the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression quantities output by the optimized regression models and the expression quantity labels corresponding to the optimal cell texture features;
and the optimized regression model screening submodule is used for determining, from the plurality of optimized regression models, the optimized regression model with the minimum prediction error for the optimal cell texture features, to serve as the expression quantity prediction model.
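Choosing the expression quantity prediction model can then be sketched as an evaluation of every optimized regression model on the optimal cell texture features; the mean-squared-error metric is an assumption of the sketch.

```python
# Illustrative sketch only: keep the optimised model with the smallest prediction error.
from sklearn.metrics import mean_squared_error

def select_expression_model(optimised_models, X_optimal, y_true):
    errors = {name: mean_squared_error(y_true, model.predict(X_optimal))
              for name, model in optimised_models.items()}
    best = min(errors, key=errors.get)
    return optimised_models[best], errors[best]
```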
In one embodiment, the target cell determination module comprises:
the target expression quantity determining submodule is used for ranking the plurality of predicted protein expression quantities and determining, from the ranked predicted protein expression quantities, a preset number of the highest-ranked predicted protein expression quantities as target expression quantities;
and the to-be-detected cell screening submodule is used for determining the to-be-detected cell gray-scale map corresponding to each target expression quantity and determining the cell to be detected corresponding to that gray-scale map as a target cell.
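The target cell determination amounts to a sort followed by a top-k selection, as in the sketch below; the preset number is an assumption of the example.

```python
# Illustrative sketch only: map the highest predicted expression quantities back to cells.
import numpy as np

def pick_target_cells(image_paths, predicted_expression, preset_number=5):
    order = np.argsort(predicted_expression)[::-1]        # descending expression
    target = order[:preset_number]                        # the preset number of top cells
    return [image_paths[i] for i in target]               # cells via their gray-scale maps
```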
For the specific definition of the cell screening apparatus based on the expression quantity prediction model, reference may be made to the above definition of the cell screening method based on the expression quantity prediction model, which is not repeated here. Each module in the above cell screening apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement a cell screening method based on an expression quantity prediction model. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
obtaining a gray-scale image of cells to be detected corresponding to a plurality of cells to be detected in a cell culture pool respectively, and obtaining texture characteristics of target cells corresponding to the gray-scale image of the cells to be detected respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtaining predicted protein expression quantities corresponding to the plurality of cells to be detected respectively according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and according to the predicted protein expression quantity, determining a target cell with the predicted protein expression quantity meeting set conditions from the plurality of cells to be detected.
In one embodiment, the steps in the other embodiments described above are also implemented when the computer program is executed by a processor.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining a gray-scale image of cells to be detected corresponding to a plurality of cells to be detected in a cell culture pool respectively, and obtaining texture characteristics of target cells corresponding to the gray-scale image of the cells to be detected respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtaining predicted protein expression quantities corresponding to the plurality of cells to be detected respectively according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and according to the predicted protein expression quantity, determining a target cell with the predicted protein expression quantity meeting set conditions from the plurality of cells to be detected.
In one embodiment, the computer program when executed by the processor also performs the steps in the other embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A cell screening method based on an expression level prediction model, comprising:
obtaining a gray-scale image of cells to be detected corresponding to a plurality of cells to be detected in a cell culture pool respectively, and obtaining texture characteristics of target cells corresponding to the gray-scale image of the cells to be detected respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression quantity prediction model, and obtaining predicted protein expression quantities corresponding to the plurality of cells to be detected respectively according to the output of the expression quantity prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and according to the predicted protein expression quantity, determining a target cell with the predicted protein expression quantity meeting set conditions from the plurality of cells to be detected.
2. The method of claim 1, further comprising:
obtaining a sample cell gray-scale image and a corresponding fluorescence image thereof;
acquiring a plurality of sample cell texture characteristics of the sample cell gray-scale map and a real protein expression amount corresponding to the fluorescence map;
obtaining expression quantity labels respectively corresponding to the cell texture characteristics of a plurality of samples according to the real protein expression quantity;
training a plurality of regression models of different types by adopting a plurality of sample cell texture characteristics and expression quantity labels corresponding to the sample cell texture characteristics;
and according to the training results of the multiple regression models, screening the optimal cell texture features with the highest contribution degree to the predicted protein expression amount from the multiple sample cell texture features, and determining the regression model with the smallest prediction error from the multiple regression models to serve as the expression amount prediction model.
3. The method of claim 2, wherein the training of a plurality of regression models of different types using the sample cell texture features and their corresponding expression quantity labels comprises:
respectively inputting the texture characteristics of a plurality of sample cells into the plurality of regression models, and obtaining the current prediction error of each regression model according to the predicted protein expression quantity output by the plurality of regression models and the expression quantity label of the texture characteristics of the corresponding sample cells;
and aiming at each regression model, adjusting the model parameters of the regression model according to the current prediction error, and inputting the cell texture characteristics of the sample again to carry out model training until the training end condition is met, thereby obtaining the optimized regression model of the regression model.
4. The method of claim 3, wherein the inputting the sample cell texture features into the regression models comprises:
determining a plurality of texture feature combinations according to a plurality of sample cell texture features corresponding to the same sample cell gray-scale map; each texture feature combination comprises one or more sample cell texture features, and the expression quantity labels corresponding to the texture feature combinations are the same;
and inputting a plurality of texture feature combinations into the plurality of regression models respectively.
5. The method of claim 4, wherein the step of screening the optimal cell texture features from the plurality of sample cell texture features that have the highest contribution to the predicted protein expression level comprises:
respectively inputting each texture feature combination into a plurality of optimized regression models of different types, and obtaining the prediction error magnitude corresponding to each texture feature combination according to the predicted protein expression quantity output by the optimized regression models;
aiming at each texture feature combination, determining the contribution degree of the texture feature combination to the predicted protein expression quantity according to the prediction error;
and determining the optimal cell texture characteristics with the highest contribution degree to the predicted protein expression amount according to the contribution degree of each texture characteristic combination to the predicted protein expression amount.
6. The method according to claim 3, wherein the determining, as the expression quantity prediction model, the regression model with the smallest prediction error from the plurality of regression models comprises:
obtaining optimal cell texture characteristics corresponding to the cell gray level images of the samples respectively to obtain multiple optimal cell texture characteristics;
respectively inputting the optimal cell texture features into the optimized regression models, and obtaining a prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression quantities output by the optimized regression models and the expression quantity labels corresponding to the optimal cell texture features;
and determining, from the plurality of optimized regression models, the optimized regression model with the minimum prediction error for the optimal cell texture features, to be used as the expression quantity prediction model.
7. The method according to any one of claims 1 to 6, wherein the determining, from the plurality of test cells, a target cell whose predicted protein expression level satisfies a predetermined condition based on the predicted protein expression level comprises:
ranking the plurality of predicted protein expression quantities, and determining, from the ranked predicted protein expression quantities, a preset number of the highest-ranked predicted protein expression quantities as target expression quantities;
and determining a gray-scale map of the cell to be detected corresponding to the target expression amount, and determining the cell to be detected corresponding to the gray-scale map of the cell to be detected as a target cell.
8. A cell screening apparatus based on an expression level prediction model, comprising:
the target cell texture feature acquisition module is used for acquiring to-be-detected cell gray-scale maps corresponding to a plurality of to-be-detected cells in the cell culture pool respectively and acquiring target cell texture features corresponding to the to-be-detected cell gray-scale maps respectively; the target cell texture features are optimal cell texture features determined in advance from various cell texture features;
the cell expression prediction module is used for inputting the texture characteristics of target cells of a plurality of cells to be detected into a pre-trained expression prediction model and obtaining the predicted protein expression quantities corresponding to the plurality of cells to be detected according to the output of the expression prediction model; the expression quantity prediction model is obtained by training according to a plurality of sample cell texture characteristics with expression quantity labels, the expression quantity labels are used for representing real protein expression quantities corresponding to the cell texture characteristics of all samples, and the expression quantity prediction model is used for predicting protein expression quantities corresponding to target cell texture characteristics;
and the target cell determining module is used for determining the target cells of which the predicted protein expression quantity meets the set conditions from the multiple cells to be detected according to the predicted protein expression quantity.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method for cell screening based on an expression level prediction model according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of the method for cell screening based on an expression level prediction model according to any one of claims 1 to 7.
CN202010870681.2A 2020-08-26 2020-08-26 Cell screening method and device based on expression quantity prediction model Active CN112017730B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010870681.2A CN112017730B (en) 2020-08-26 2020-08-26 Cell screening method and device based on expression quantity prediction model
PCT/CN2021/114168 WO2022042509A1 (en) 2020-08-26 2021-08-24 Cell screening method and apparatus based on expression level prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010870681.2A CN112017730B (en) 2020-08-26 2020-08-26 Cell screening method and device based on expression quantity prediction model

Publications (2)

Publication Number Publication Date
CN112017730A true CN112017730A (en) 2020-12-01
CN112017730B CN112017730B (en) 2022-08-09

Family

ID=73502282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010870681.2A Active CN112017730B (en) 2020-08-26 2020-08-26 Cell screening method and device based on expression quantity prediction model

Country Status (2)

Country Link
CN (1) CN112017730B (en)
WO (1) WO2022042509A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104404082A (en) * 2014-11-19 2015-03-11 上海美百瑞生物医药技术有限公司 Efficient screening method of exogenous protein expression cell strain
CN104850860A (en) * 2015-05-25 2015-08-19 广西师范大学 Cell image recognition method and cell image recognition device
WO2017027380A1 (en) * 2015-08-12 2017-02-16 Molecular Devices, Llc System and method for automatically analyzing phenotypical responses of cells
CN109740560B (en) * 2019-01-11 2023-04-18 山东浪潮科学研究院有限公司 Automatic human body cell protein identification method and system based on convolutional neural network
CN109815870B (en) * 2019-01-17 2021-02-05 华中科技大学 High-throughput functional gene screening method and system for quantitative analysis of cell phenotype image
CN112001329B (en) * 2020-08-26 2021-11-30 深圳太力生物技术有限责任公司 Method and device for predicting protein expression amount, computer device and storage medium
CN112037862B (en) * 2020-08-26 2021-11-30 深圳太力生物技术有限责任公司 Cell screening method and device based on convolutional neural network
CN112017730B (en) * 2020-08-26 2022-08-09 深圳太力生物技术有限责任公司 Cell screening method and device based on expression quantity prediction model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897984A (en) * 2018-05-07 2018-11-27 上海理工大学 Based on correlation analysis between CT images group feature and lung cancer gene expression
CN109948429A (en) * 2019-01-28 2019-06-28 上海依智医疗技术有限公司 Image analysis method, device, electronic equipment and computer-readable medium
CN110119710A (en) * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 Cell sorting method, device, computer equipment and storage medium
CN110838126A (en) * 2019-10-30 2020-02-25 东莞太力生物工程有限公司 Cell image segmentation method, cell image segmentation device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EDWIN等: ""Computed Tomography Texture Analysis Is Associated with Histopathologic Features and Protein Expression in Small Renal Cell Carcinomas"", 《UROLOGICAL SURGERY》 *
NACEI: ""一种基于灰度共生矩阵的细胞表达量初步筛选方法"", 《HTTP://WWW.360DOC.COM/DOCUMENT/20/0417/09/14292954_906582070.SHTML》 *
张均田等: "《现代药理试验方法 下》", 31 July 2012, 中国协和医科大学出版社 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022042509A1 (en) * 2020-08-26 2022-03-03 深圳太力生物技术有限责任公司 Cell screening method and apparatus based on expression level prediction model
CN117153240A (en) * 2023-08-18 2023-12-01 国家超级计算天津中心 Oxygen free radical based relationship determination method, device, equipment and medium

Also Published As

Publication number Publication date
CN112017730B (en) 2022-08-09
WO2022042509A1 (en) 2022-03-03

Similar Documents

Publication Publication Date Title
CN112037862B (en) Cell screening method and device based on convolutional neural network
WO2022042510A1 (en) Protein expression quantity prediction method and apparatus, computer device, and storage medium
CN112735535B (en) Prediction model training method, prediction model training device, data prediction method, data prediction device and storage medium
CN111666993A (en) Medical image sample screening method and device, computer equipment and storage medium
Clarke Land use change modeling with sleuth: Improving calibration with a genetic algorithm
CN112017730B (en) Cell screening method and device based on expression quantity prediction model
Rueda et al. A hill-climbing approach for automatic gridding of cDNA microarray images
US11756677B2 (en) System and method for interactively and iteratively developing algorithms for detection of biological structures in biological samples
CN113299346A (en) Classification model training and classifying method and device, computer equipment and storage medium
Schlegel et al. An empirical study of explainable AI techniques on deep learning models for time series tasks
US20150242676A1 (en) Method for the Supervised Classification of Cells Included in Microscopy Images
CN113408802B (en) Energy consumption prediction network training method and device, energy consumption prediction method and device, and computer equipment
CN114169460A (en) Sample screening method, sample screening device, computer equipment and storage medium
KR101913952B1 (en) Automatic Recognition Method of iPSC Colony through V-CNN Approach
CN113127342B (en) Defect prediction method and device based on power grid information system feature selection
CN111949530B (en) Test result prediction method and device, computer equipment and storage medium
US11775822B2 (en) Classification model training using diverse training source and inference engine using same
CN113095589A (en) Population attribute determination method, device, equipment and storage medium
Itano et al. An automated image analysis and cell identification system using machine learning methods
Johnson et al. Recombination rate inference via deep learning is limited by sequence diversity
Dhivya et al. Weighted particle swarm optimization algorithm for randomized unit testing
CN117077016B (en) Supermatrix rock identification method of support vector machine based on aviation magnetic release data
CN111951888B (en) Beef fatty acid composition prediction method, system and storage medium
US20230062003A1 (en) System and method for interactively and iteratively developing algorithms for detection of biological structures in biological samples
CN111370068B (en) Protein isomer pair interaction prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220602
Address after: 518048 No. 323-m, third floor, comprehensive Xinxing phase I, No. 1, Haihong Road, Fubao community, Fubao street, Futian District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Taili Biotechnology Co.,Ltd.
Address before: 523560 building 3 and 4, gaobao green technology city, Tutang village, Changping Town, Dongguan City, Guangdong Province
Applicant before: DONGGUAN TAILI BIOLOGICAL ENGINEERING CO.,LTD.
GR01 Patent grant