WO2022042509A1 - Cell screening method and apparatus based on expression level prediction model - Google Patents

Cell screening method and apparatus based on expression level prediction model

Info

Publication number
WO2022042509A1
WO2022042509A1 PCT/CN2021/114168 CN2021114168W WO2022042509A1 WO 2022042509 A1 WO2022042509 A1 WO 2022042509A1 CN 2021114168 W CN2021114168 W CN 2021114168W WO 2022042509 A1 WO2022042509 A1 WO 2022042509A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
cells
protein expression
texture features
expression
Prior art date
Application number
PCT/CN2021/114168
Other languages
French (fr)
Chinese (zh)
Inventor
陈亮
哈斯木买买提依明
韩晓健
梁楚亨
梁国龙
Original Assignee
深圳太力生物技术有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳太力生物技术有限责任公司
Publication of WO2022042509A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G06T7/49 Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G06T2207/10061 Microscopic image from scanning electron microscope
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10064 Fluorescence image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro

Definitions

  • the present application relates to the field of biotechnology, in particular to a cell screening method, device, computer equipment and storage medium based on an expression level prediction model.
  • the cells in the cell pool can be transfected first, the cell pool can then be processed by a limiting dilution method to obtain single cells, and each single cell can be cultured into a homogeneous cell population, namely a cell line; the cell lines with high target protein expression are then screened.
  • a cell screening method based on an expression level prediction model comprising:
  • obtaining the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank, and obtaining the target cell texture features corresponding to each grayscale image of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining the predicted protein expression corresponding to the plurality of cells to be tested according to the output of the expression level prediction model; the expression level prediction model is trained on the texture features of multiple sample cells with expression level labels, the expression level labels represent the real protein expression corresponding to the texture features of each sample cell, and the expression level prediction model is used to predict the protein expression corresponding to the target cell texture features;
  • a target cell whose predicted protein expression level satisfies the set condition is determined from the plurality of cells to be tested.
  • the optimal cell texture feature with the highest contribution to the predicted protein expression is selected from the multiple sample cell texture features, and the regression model with the smallest prediction error is determined from the multiple regression models as the expression level prediction model.
  • using the multiple sample cell texture features and their corresponding expression labels to train multiple regression models of different types includes:
  • the model parameters of the regression model are adjusted, and then the texture features of the sample cells are re-input to perform model training until the training end condition is met, and an optimized regression model of the regression model is obtained.
  • the inputting the texture features of multiple sample cells into the multiple regression models includes:
  • each texture feature combination includes one or more sample cell texture features, and the multiple texture feature combinations have the same expression level label;
  • a plurality of texture feature combinations are respectively input into the plurality of regression models.
  • the optimal cell texture feature with the highest contribution to the predicted protein expression is selected from the cell texture features of a plurality of samples, including:
  • the optimal cell texture feature with the highest contribution to the predicted protein expression was determined.
  • the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model, including:
  • the multiple optimal cell texture features are respectively input into the multiple optimized regression models, and the prediction error of each optimized regression model is obtained according to the predicted protein expression output by the multiple optimized regression models and the expression labels corresponding to the optimal cell texture features.
  • the optimal regression model with the smallest prediction error for the optimal cell texture feature is determined as the expression prediction model.
  • the target cells whose predicted protein expression meets the set condition are determined from the plurality of cells to be tested, including:
  • the grayscale image of the cell to be tested corresponding to the target expression level is determined, and the cell to be tested corresponding to the grayscale image of the cell to be tested is determined as the target cell.
  • a cell screening device based on an expression level prediction model comprising:
  • the target cell texture feature acquisition module is used to obtain the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank, and to obtain the target cell texture features corresponding to the grayscale images of the cells to be tested; the target cell texture feature is the optimal cell texture feature determined in advance from a variety of cell texture features;
  • the cell expression level prediction module is used for inputting the target cell texture features of multiple cells to be tested into the pre-trained expression level prediction model and, according to the output of the expression level prediction model, obtaining the predicted protein expression levels corresponding to the multiple cells to be tested; the expression level prediction model is obtained by training based on the texture features of multiple sample cells with expression level labels, the expression level labels are used to characterize the real protein expression level corresponding to the texture features of each sample cell, and the expression level prediction model is used to predict the protein expression corresponding to the texture features of the target cells;
  • the target cell determination module is configured to determine, according to the predicted protein expression amount, target cells whose predicted protein expression amount satisfies a set condition from the plurality of cells to be tested.
  • a computer device includes a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the processor implements the steps of the above-described cell screening method based on an expression level prediction model.
  • In the above-mentioned cell screening method, device, computer equipment and storage medium based on an expression level prediction model, grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in a cell culture tank are acquired, and the target cell texture features corresponding to each grayscale image are obtained; the target cell texture features of the multiple cells to be tested are input into the pre-trained expression prediction model, and the predicted protein expression corresponding to the multiple cells to be tested is obtained according to the output of the expression prediction model.
  • According to the predicted protein expression, the target cells whose predicted protein expression meets the set condition are determined from the multiple cells to be tested, which allows cells with high protein expression to be determined rapidly and avoids repeated rounds of culture and screening.
  • The screening period can be greatly shortened, and the application can quickly process millions of single cells, increasing the range of cell screening while reducing the workload of staff and effectively improving the efficiency of cell screening.
  • FIG. 1 is a schematic flowchart of a cell screening method based on an expression level prediction model in one embodiment
  • FIG. 2 is a schematic flowchart of steps of generating an expression level prediction model in one embodiment
  • Figure 3a is a grayscale image of a sample cell in one embodiment
  • Fig. 3b is a kind of cell fluorescence map in one embodiment
  • FIG. 4 is a schematic flowchart of steps of a regression model optimization method in one embodiment
  • FIG. 5 is a schematic flowchart of steps of determining optimal cell texture features in one embodiment
  • FIG. 6 is a schematic flowchart of steps of screening and optimizing regression models in one embodiment
  • FIG. 7 is a schematic flow chart of cell screening in one embodiment
  • FIG. 8 is a structural block diagram of a cell screening device based on an expression level prediction model in one embodiment
  • Figure 9 is a diagram of the internal structure of a computer device in one embodiment.
  • the cells in the cell pool can be transfected first, the cell pool can then be processed by a limiting dilution method to obtain single cells, and each single cell can be cultured into a homogeneous cell population, namely a cell line; the cell lines with high target protein expression are then screened.
  • a cell screening method based on an expression level prediction model is provided.
  • the method is described as applied to a terminal for illustration. It is understood that this method can also be applied to a server, or to a system including a terminal and a server, in which case the method is implemented through the interaction between the terminal and the server.
  • the method includes the following steps:
  • Step 101: Obtain the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank, and obtain the target cell texture features corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features.
  • the cells to be tested may be cells that have been processed by transfection technology. Specifically, they may be cells that failed to obtain exogenous DNA fragments after the transfection treatment, cells that obtained exogenous DNA fragments that have not been integrated into chromosomes, or cells whose exogenous DNA fragments have been integrated into chromosomes. The grayscale image of the cell to be tested is the grayscale image captured of that cell, and the target cell texture feature is the image feature information that reflects the grayscale image of the cell to be tested.
  • multiple cells in the cell culture pool can be transfected, so that some or all of the cells in the cell culture pool can obtain exogenous DNA fragments.
  • the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture tank can be obtained through a photomicrography device, and the target cell texture features corresponding to each grayscale image of the cells to be tested can be determined.
  • the target cell texture feature is the optimal cell texture feature determined in advance from a variety of cell texture features.
  • Step 102 Input the target cell texture features of a plurality of cells to be tested into a pre-trained expression prediction model, and obtain the predicted protein expression corresponding to the plurality of cells to be tested according to the output of the expression prediction model;
  • the expression level prediction model is obtained by training based on the texture features of a plurality of sample cells with expression level labels, the expression level labels are used to represent the real protein expression levels corresponding to the texture features of each sample cell, and the expression level prediction model is used to predict the protein expression corresponding to the texture features of the target cells.
  • the predicted protein expression is the protein expression predicted by the expression prediction model based on the texture features of the target cells, and the expression prediction model is obtained by training the texture features of multiple sample cells with expression labels.
  • the expression prediction model can be used to predict the protein expression corresponding to the texture features of the target cells.
  • the expression label is used to characterize the real protein amount corresponding to the texture features of the sample cells, and the real protein amount corresponding to the texture features of the sample cells refers to the real protein expression amount of the cells in the grayscale image of the sample cells.
  • the texture features of the target cells can be input into a pre-trained expression prediction model, and according to the output of the expression prediction model, the predicted protein expression levels corresponding to the multiple cells to be tested can be obtained.
  • the protein expression level of the cells can be determined by the fluorescence image of the cells. However, taking the fluorescence image of the cells will cause the cells to lose activity and the cells cannot continue to proliferate.
  • the predicted protein expression level is obtained through the cell texture feature of the grayscale image of the cell to be tested, which can avoid cell inactivation while estimating the protein expression level.
  • Step 103 according to the predicted protein expression, determine, from the plurality of cells to be tested, target cells whose predicted protein expression meets the set condition.
  • the cells to be tested whose predicted protein expression level meets the set conditions can be determined as target cells.
  • In this embodiment, the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank are obtained, together with the target cell texture features corresponding to each grayscale image; the target cell texture features of the plurality of cells to be tested are input into the pre-trained expression prediction model, and the predicted protein expression corresponding to the multiple cells to be tested is obtained according to the output of the expression prediction model.
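  • Steps 101 to 103 can be illustrated with a minimal sketch. The text does not specify the set condition, so the sketch assumes it is "among the `top_k` highest predicted expression levels"; `screen_cells`, `toy_model`, the cell IDs and the feature values are all hypothetical names and numbers used only for illustration, and `toy_model` merely stands in for a trained expression level prediction model.

```python
from typing import Callable, Dict, List, Sequence


def screen_cells(
    features_by_cell: Dict[str, Sequence[float]],
    predict: Callable[[Sequence[float]], float],
    top_k: int,
) -> List[str]:
    """Return the IDs of the top_k cells ranked by predicted expression."""
    # Step 102: predict a protein expression level for every cell to be tested.
    predicted = {cid: predict(f) for cid, f in features_by_cell.items()}
    # Step 103: assumed set condition, keep the top_k highest predictions.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:top_k]


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model (fixed linear combination)."""
    return 2.0 * features[0] + 0.5 * features[1]


# Step 101 (assumed already done): texture features per cell to be tested.
cells = {"cell_a": [0.1, 0.4], "cell_b": [0.9, 0.2], "cell_c": [0.5, 0.5]}
selected = screen_cells(cells, toy_model, top_k=2)
```

  • A threshold on the predicted expression level would be an equally plausible set condition; only the ranking step would change.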
  • the cell screening method based on the expression level prediction model may further include the following steps:
  • Step 201 Obtain a grayscale image of the sample cell and its corresponding fluorescence image.
  • the grayscale image of the sample cell and its corresponding fluorescence image are the grayscale image and the fluorescence image obtained by photographing the same cell under the same shooting conditions.
  • cells as a training set can be set, and the cells are photographed by a microscopic photographing device to obtain a grayscale image and a corresponding fluorescence image of the sample cells.
  • the cells used as the training set are the cells in the cell culture pool that have been processed by transfection technology.
  • the cells can be cells that have not been able to obtain exogenous DNA fragments after the transfection technology treatment, or can be cells that have obtained foreign DNA fragments.
  • the grayscale image and the fluorescence image can be captured simultaneously with a microscope under the same shooting conditions, and the obtained grayscale image and fluorescence image can include one or more cells; the coordinates of each cell in the grayscale image correspond to the coordinates of that cell in the fluorescence image. Since the same grayscale image and fluorescence image can contain multiple cells at the same time, after the grayscale image and the fluorescence image are obtained, image preprocessing can be performed on both to obtain the sample cell grayscale image and fluorescence image corresponding to a single cell, as shown in Figure 3a and Figure 3b. The image preprocessing can include cell segmentation and filtering of adhered cells.
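  • Because each cell has the same coordinates in both images, one set of bounding boxes can be used to cut matching single-cell crops from the grayscale image and the fluorescence image. The following is a minimal sketch under the assumption that the boxes come from an upstream segmentation step that is not shown; `crop`, `split_into_single_cells` and the tiny example images are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by box out of image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def split_into_single_cells(
    gray: Image, fluo: Image, boxes: List[Box]
) -> List[Tuple[Image, Image]]:
    """Crop the same region from both images so each pair shows one cell."""
    return [(crop(gray, b), crop(fluo, b)) for b in boxes]


gray = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
fluo = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
boxes = [(0, 0, 2, 2), (1, 2, 3, 4)]  # assumed output of cell segmentation
pairs = split_into_single_cells(gray, fluo, boxes)
```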
  • Step 202 Obtain a plurality of sample cell texture features of the sample cell grayscale image and the real protein expression corresponding to the fluorescence image.
  • multiple texture features of the sample cells can be extracted from the grayscale image of the sample cells, and the real protein expression corresponding to the fluorescence image can be obtained.
  • cells with different protein expression levels have different fluorescence maps and grayscale maps; that is, the protein expression level of a cell corresponds both to its fluorescence map and to the sample cell texture features in its grayscale map.
  • the texture feature extraction algorithm can be used to extract multiple sample cell texture features from the sample cell grayscale image, and the sample cell texture features corresponding to the multiple sample cell grayscale images can form a texture feature matrix.
  • the texture features of m sample cells are extracted from n grayscale images of sample cells, which can form an n*m texture feature matrix.
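  • The n*m texture feature matrix can be sketched as follows. For brevity the sketch assumes m = 3 simple histogram features (mean, variance and entropy of the gray levels); `histogram_features` and `feature_matrix` are hypothetical helper names, and a real pipeline would likely add further texture features.

```python
import math
from typing import List

Image = List[List[int]]  # grayscale image with integer levels 0..levels-1


def histogram_features(image: Image, levels: int = 256) -> List[float]:
    """Mean, variance and entropy of the gray-level histogram."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    # Shannon entropy in bits; empty histogram bins contribute nothing.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return [mean, variance, entropy]


def feature_matrix(images: List[Image]) -> List[List[float]]:
    """One row per sample cell image, one column per texture feature."""
    return [histogram_features(img) for img in images]


images = [[[0, 0], [255, 255]], [[10, 10], [10, 10]]]
matrix = feature_matrix(images)  # n = 2 rows, m = 3 columns
```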
  • the sample cell texture features may include any one or more of the following: gray level co-occurrence matrix (GLCM) features, histogram features, Laws energy texture features, local binary pattern (LBP) features, and discrete wavelet transform features. Of course, those skilled in the art can select other texture features as needed, which is not limited in this application.
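  • As an illustration of one of the listed feature families, the sketch below computes a gray level co-occurrence matrix for a single pixel offset and derives three commonly used statistics from it. It is a plain-Python sketch, not the application's implementation: the matrix is non-symmetric, the image is assumed to hold integer levels below `levels` and to contain at least one pixel pair for the chosen offset, and the function names are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]


def glcm(image: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, angular second moment (ASM) and homogeneity of a GLCM."""
    idx = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx),
        "asm": sum(p[i][j] ** 2 for i in idx for j in idx),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx),
    }


image = [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
features = glcm_features(glcm(image, levels=3))
```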
  • the real protein expression level of the cells in the corresponding grayscale image of the sample cells can be determined according to the fluorescence image.
  • proteins produced by genes of interest, such as exogenous DNA fragments, can fluoresce at specific wavelengths.
  • after the fluorescence image is obtained, the G value corresponding to the green channel of the fluorescence image, also called the fluorescence value, can be obtained; the G value can be the total G value or the average G value of the fluorescence image.
  • when determining the real protein expression corresponding to the fluorescence image, the G value is positively correlated with the protein expression, that is, the higher the G value, the higher the protein expression of the corresponding cell. Based on this, the real protein expression can be determined from the G value of the fluorescence image.
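  • The G value can be computed directly from the green channel, as in the minimal sketch below. It assumes the single-cell fluorescence image is stored as rows of (R, G, B) tuples; `g_value` and the pixel values are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def g_value(fluo: List[List[Pixel]], mode: str = "mean") -> float:
    """Total or average green-channel value of a fluorescence image."""
    greens = [px[1] for row in fluo for px in row]
    total = float(sum(greens))
    if mode == "total":
        return total
    if mode == "mean":
        return total / len(greens)
    raise ValueError("mode must be 'total' or 'mean'")


fluo = [[(0, 10, 0), (0, 30, 0)], [(5, 20, 5), (0, 40, 0)]]
label_total = g_value(fluo, mode="total")
label_mean = g_value(fluo, mode="mean")
```

  • Either value can then serve as the expression label of the matching grayscale crop, as long as the same mode is used for every sample.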
  • Step 203 according to the real protein expression, obtain expression labels corresponding to the texture features of a plurality of sample cells respectively.
  • the expression labels corresponding to the texture features of multiple sample cells can be determined according to the actual protein expression.
  • the real protein expression level may be used as the expression level label of the grayscale image of the corresponding sample cell, and the expression level label of the grayscale image of the sample cell is the expression level label corresponding to the texture feature of the sample cell.
  • the multiple sample cell texture features extracted from the same sample cell grayscale image have the same expression label; for sample cell texture features extracted from different sample cell grayscale images, each expression label corresponds to the real protein expression of the cell in the respective sample cell grayscale image.
  • Step 204 using multiple sample cell texture features and their corresponding expression labels to train multiple regression models of different types.
  • multiple regression models can be preset, such as an SVR (Support Vector Regression) model, an ElasticNet (elastic network) model, an XGBoost model, a Gradient Boosting Regression model, and a Logistic Regression model. Since different regression models have different implementation forms, that is, different underlying mathematical principles, analyzing the extracted sample cell texture features and predicting protein expression with different regression models can yield different prediction results.
  • Step 205: according to the training results of the multiple regression models, select the optimal cell texture feature with the highest contribution to the predicted protein expression from the multiple sample cell texture features, and determine the regression model with the smallest prediction error from the multiple regression models as the expression level prediction model.
  • the optimal cell texture features with the highest contribution to the predicted protein expression are selected from the multiple sample cell texture features, and based on the optimal cell texture features, the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model.
  • the optimal cell texture features may include one or more sample cell texture features.
  • In this embodiment, multiple sample cell texture features and their corresponding expression labels are used to train multiple regression models of different types; according to the training results of the multiple regression models, the optimal sample cell texture feature that contributes the most to the predicted protein expression is screened out from the multiple sample cell texture features, and the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model.
  • In this way, the cell image can be examined in three dimensions based on multiple sample cell texture features, and the relationship between the cell grayscale image and the protein expression of the cell is established, which provides the basis for the rapid prediction of protein expression.
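  • Selecting the regression model with the smallest prediction error can be sketched as follows. To stay self-contained, the sketch swaps in two tiny single-feature models (a mean baseline and a least-squares line) in place of SVR, ElasticNet and the other models named in the text, and it assumes the prediction error is the mean squared error on held-out samples; all names and numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]
Trainer = Callable[[List[float], List[float]], Predictor]


def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Baseline model: always predict the mean training label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs: List[float], ys: List[float]) -> Predictor:
    """Ordinary least squares on a single texture feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model: Predictor, xs: List[float], ys: List[float]) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(
    trainers: Dict[str, Trainer],
    train: Tuple[List[float], List[float]],
    test: Tuple[List[float], List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Train every model type and return the one with the smallest error."""
    errors = {name: mse(fit(*train), *test) for name, fit in trainers.items()}
    return min(errors, key=errors.get), errors


train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])  # feature, label
test = ([5.0, 6.0], [10.1, 11.9])
best_name, errors = select_best({"mean": fit_mean, "line": fit_line}, train, test)
```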
  • using the multiple sample cell texture features and their corresponding expression labels to train multiple regression models of different types may include the following steps:
  • Step 401: Input the texture features of multiple sample cells into the multiple regression models respectively, and obtain the current prediction error of each regression model according to the predicted protein expression levels output by the multiple regression models and the expression level labels corresponding to the texture features of the sample cells.
  • the texture features of multiple sample cells can be input into multiple regression models respectively, the predicted protein expression levels output by the multiple regression models can be obtained, and these can be compared with the corresponding expression level labels to determine the current prediction error of each regression model.
  • Step 402: For each regression model, adjust the model parameters of the regression model according to the current prediction error, and then re-input the sample cell texture features for model training until the training end condition is met, obtaining an optimized regression model of the regression model.
  • the model parameters of the regression model can be adjusted and optimized according to the current prediction error, and the sample cell texture features are input into the regression model again for model training.
  • the model can again obtain the current prediction error based on the currently input cell texture features, and it is judged whether the current prediction error meets the training end condition. If the current prediction error meets the training end condition, the current regression model can be determined as the optimized regression model; if the current prediction error does not meet the training end condition, the above steps can be repeated to continue optimizing the model parameters of the regression model.
  • In this embodiment, the model parameters of the regression model are continuously adjusted until an optimized regression model is obtained; the regression model can thus be continuously optimized through supervised machine learning to improve its prediction accuracy.
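  • The loop of steps 401 and 402 (measure the error, adjust the parameters, re-input the features, stop once the end condition is met) can be sketched with gradient descent on a one-feature linear model. Gradient descent, the error threshold `tol` and the round limit `max_rounds` are assumptions made for illustration; the text does not say how parameters are adjusted or what the training end condition is.

```python
from typing import List, Tuple


def train_until_converged(
    xs: List[float],
    ys: List[float],
    lr: float = 0.05,
    tol: float = 1e-4,
    max_rounds: int = 10_000,
) -> Tuple[float, float, float]:
    """Fit y = w * x + b; return (w, b, last measured mean squared error)."""
    w, b = 0.0, 0.0
    n = len(xs)
    error = float("inf")
    for _ in range(max_rounds):
        # Step 401: current prediction error against the expression labels.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        error = sum(r * r for r in residuals) / n
        if error < tol:  # assumed training end condition
            break
        # Step 402: adjust the parameters using the current error gradient.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, error


# Toy data generated from y = 2x + 1 (hypothetical feature and label values).
w, b, final_error = train_until_converged([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```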
  • the inputting the multiple sample cell texture features into the multiple regression models may include the following steps:
  • each texture feature combination includes one or more sample cell texture features, and the multiple texture feature combinations have the same expression level label; the multiple texture feature combinations are input into the multiple regression models respectively.
  • the texture features of multiple sample cells can be combined to obtain multiple texture feature combinations, each of which includes one or more sample cell texture features. Since the sample cell texture features in the multiple texture feature combinations all come from the same sample cell grayscale image, the multiple texture feature combinations have the same expression label. After the multiple texture feature combinations are obtained, they can be input into the multiple regression models respectively.
  • methods such as correlation analysis can be used to analyze the texture features of multiple sample cells to obtain a correlation score corresponding to each sample cell texture feature. The correlation score characterizes the importance of the corresponding sample cell texture feature in predicting protein expression, and the two are positively correlated: the higher the correlation score of a sample cell texture feature, the more important its role in predicting protein expression.
  • the cell texture features of multiple samples can be sorted according to the correlation score from high to low.
  • the sample cell texture features can be selected in turn to generate a texture feature combination.
  • the sample cell texture feature A with the highest score can be selected as one cell texture feature combination; then the sample cell texture feature B with the second highest score can be selected and combined with the original sample cell texture feature A to generate another cell texture feature combination; then the sample cell texture feature C with the lowest score can be selected and combined with the original sample cell texture feature A and sample cell texture feature B to generate a further cell texture feature combination.
  • the texture features of multiple sample cells are combined to determine multiple texture feature combinations, and the multiple texture feature combinations are respectively input into multiple regression models, so that the influence of different sample cell features on the prediction effect of the regression model can be comprehensively evaluated, improving the prediction accuracy of the regression model.
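  • The ranking and the A, A+B, A+B+C construction can be sketched as below. The text only says "methods such as correlation analysis", so the sketch assumes the correlation score is the absolute Pearson correlation between a feature column and the expression labels; `pearson`, `nested_combinations` and the data are hypothetical.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def nested_combinations(
    features: Dict[str, List[float]], labels: List[float]
) -> List[List[str]]:
    """Rank features by score, then build [A], [A, B], [A, B, C], ..."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


labels = [1.0, 2.0, 3.0, 4.0]
features = {
    "C": [2.0, 1.0, 2.0, 1.0],  # weakly related to the labels
    "A": [1.0, 2.0, 3.0, 4.0],  # perfectly correlated with the labels
    "B": [1.0, 3.0, 2.0, 4.0],  # strongly correlated with the labels
}
combos = nested_combinations(features, labels)
```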
  • the optimal cell texture feature with the highest contribution to the predicted protein expression is selected from the cell texture features of multiple samples, including:
  • Step 501 Input each texture feature combination into multiple optimized regression models of different types, and obtain the prediction error size corresponding to each texture feature combination according to the predicted protein expression output from the multiple optimized regression models.
  • each texture feature combination can be input into multiple optimized regression models of different types to obtain the predicted protein expression output by the multiple optimized regression models; the expression label corresponding to each texture feature combination can then be obtained, and the size of the prediction error corresponding to each texture feature combination is determined according to the output predicted protein expression and the expression label. By obtaining the prediction error size, the prediction effect of the optimized regression model can be verified.
  • Step 502 For each texture feature combination, determine the contribution degree of the texture feature combination to the predicted protein expression according to the prediction error.
  • Step 503 Determine the optimal cell texture feature with the highest contribution to the predicted protein expression according to the respective contribution degrees of the multiple texture feature combinations to the predicted protein expression.
  • the contribution of the texture feature combination to the predicted protein expression can be determined according to the size of the prediction error. By comparing the contribution of each texture feature combination to the predicted protein expression, the optimal cell texture feature with the highest contribution to the predicted protein expression can be determined.
  • For each texture feature combination, the number of optimized regression models whose prediction error is smaller than a preset threshold can be determined; this number is positively correlated with the contribution degree of the texture feature combination. The more types of optimized regression model that yield a prediction error smaller than the preset threshold, the higher the contribution of the texture feature combination to the accurate prediction of protein expression.
  • The texture feature combination that simultaneously produces the best prediction results for every type of optimized regression model can be determined, and the sample cell texture features in that combination are the optimal cell texture features.
  • The prediction error sizes corresponding to the multiple texture feature combinations can be sorted from small to large, and a preset number of the top-ranked prediction error sizes are determined as the best prediction results.
  • A texture feature combination that simultaneously makes each type of optimized regression model produce the best prediction result can then be determined, and the sample cell texture features in that combination are the optimal sample cell texture features.
  • By determining the optimal sample cell texture feature with the highest contribution to the predicted protein expression, cell texture features can be extracted in a more targeted way during the subsequent prediction of protein expression, which avoids extracting other cell texture features with lower contributions that may interfere with the prediction results or waste computing resources.
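The contribution scoring described above can be sketched as follows, assuming the prediction errors for each texture feature combination and model type are already available. The model names, error values, threshold, and the tie-breaking rule (smaller mean error) are all hypothetical choices made for illustration, not details given in the text.

```python
def contribution_degree(errors_by_model, threshold):
    """Count the models whose prediction error falls below the threshold."""
    return sum(1 for err in errors_by_model.values() if err < threshold)


def select_optimal_combination(errors, threshold):
    """Pick the combination with the highest contribution degree.

    Ties are broken by the smaller mean error, which is an assumption
    of this sketch.
    """
    def sort_key(combo):
        errs = errors[combo]
        mean_err = sum(errs.values()) / len(errs)
        return (-contribution_degree(errs, threshold), mean_err)

    return min(errors, key=sort_key)


# Hypothetical prediction errors: combination -> {model type: error}.
errors = {
    ("contrast",): {"linear": 0.30, "svr": 0.28, "forest": 0.35},
    ("contrast", "entropy"): {"linear": 0.18, "svr": 0.15, "forest": 0.22},
    ("entropy",): {"linear": 0.19, "svr": 0.27, "forest": 0.31},
}
best = select_optimal_combination(errors, threshold=0.25)
print(best)
```

Here the contribution degree is simply the number of model types that predict well from a combination, so a combination that works for only one model type ranks below one that works for all of them.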
  • the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model, which may include the following steps:
  • Step 601 Obtain the optimal cell texture features corresponding to each of the multiple sample cell grayscale images, and obtain a plurality of optimal cell texture features.
  • the optimal cell texture features corresponding to each of the grayscale images of multiple sample cells can be obtained, and multiple optimal cell texture features can be obtained.
  • Step 602 Input the plurality of optimal cell texture features into the plurality of optimized regression models respectively, and obtain the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression output by the plurality of optimized regression models and the expression labels corresponding to the optimal cell texture features.
  • The expression labels corresponding to each of the multiple optimal cell texture features can be determined, and the multiple optimal cell texture features can be input into the multiple optimized regression models respectively to obtain the predicted protein expression output by those models. The output predicted protein expression levels and the corresponding expression labels can then be used to determine the prediction error of each optimized regression model when it predicts from the optimal sample cell texture features.
  • Step 603 From the plurality of optimized regression models, determine the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression prediction model.
  • The optimized regression model with the smallest prediction error can be determined as the expression prediction model.
  • The multiple optimal cell texture features can be divided into ten equal parts and ten experiments performed, in each of which nine parts are used for training and validating the regression model and one part is used for testing the trained regression model. The average prediction error over the ten experiments can then be obtained, the expression prediction model can be determined from the multiple types of regression models according to the average prediction error, and the performance of the expression prediction model can be evaluated.
  • Multiple optimal cell texture features are respectively input into the multiple optimized regression models, and the expression prediction model is determined from the multiple optimized regression models according to the predicted protein expression levels they output and the expression labels corresponding to the optimal cell texture features. In this way, a regression model with accurate prediction performance can be determined from multiple types of regression models on the basis of the optimal texture features that contribute most to the predicted protein expression.
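The ten-part experiment and model selection described above can be sketched with a plain ten-fold cross-validation. Everything below is an illustrative assumption: the two candidate models (a mean predictor and a one-feature least-squares line), the mean absolute error metric, and the synthetic data. For brevity the nine training parts are used only for fitting, without a separate validation split.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the mean training label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validated_error(fit, xs, ys, k=10):
    """Average the mean absolute test error over k train/test splits."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        abs_errs = [abs(model(xs[i]) - ys[i]) for i in test_idx]
        fold_errors.append(sum(abs_errs) / len(abs_errs))
    return sum(fold_errors) / len(fold_errors)


# Synthetic feature values and expression labels, for illustration only.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: cross_validated_error(fit, xs, ys)
          for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

The model with the smallest average error across the ten experiments is kept, which mirrors choosing the expression prediction model by average prediction error.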
  • determining the target cells whose predicted protein expression meets the set condition from the plurality of cells to be tested may include the following steps:
  • sorting the multiple predicted protein expression levels and determining a preset number of the top-ranked predicted protein expression levels as the target expression levels; and determining the grayscale image of the cell to be tested corresponding to each target expression level, and determining the cell to be tested corresponding to that grayscale image as the target cell.
  • The multiple predicted protein expression levels can be sorted, and from the sorted levels, a preset number of the top-ranked predicted protein expression levels are determined as the target expression levels.
  • The predicted protein expression levels may be sorted in descending order, that is, from largest to smallest, and the predicted protein expression levels ranked in the top N after sorting may be determined as the target expression levels.
  • the predicted protein expression that exceeds the preset expression threshold can also be determined as the target expression.
  • the grayscale image of the cell to be tested corresponding to the target expression level can be determined, and the cell to be tested corresponding to the grayscale image of the cell to be tested is determined as the target cell.
  • the target cells can be used to culture cell lines.
  • The multiple predicted protein expression levels are sorted and, according to the sorted levels, the cells to be tested whose predicted protein expression levels rank within the top preset number are determined as the target cells. Cells with high protein expression can thus be screened quickly, which greatly reduces the screening workload.
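The two selection rules mentioned above (top N after a descending sort, or a preset expression threshold) can be sketched as a single helper. The cell identifiers and predicted values are made up for illustration, and combining both rules in one function is a convenience of this sketch rather than something the text prescribes.

```python
def select_target_cells(predictions, top_n=None, threshold=None):
    """Return cell IDs ordered by predicted expression, highest first.

    If `threshold` is given, only cells whose prediction exceeds it are kept.
    If `top_n` is given, only the first `top_n` cells of the ranking are kept.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(cell, level) for cell, level in ranked if level > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [cell for cell, _ in ranked]


# Hypothetical predicted protein expression levels per cell to be tested.
predictions = {"cell_01": 0.42, "cell_02": 0.91, "cell_03": 0.77, "cell_04": 0.15}

top_two = select_target_cells(predictions, top_n=2)
above = select_target_cells(predictions, threshold=0.4)
print(top_two)
print(above)
```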
  • An aflibercept expression plasmid can be used to transfect multiple CHO-K1 host cells, and the transfected CHO-K1 host cells are the cells to be tested in this application. After a plurality of cells to be tested are obtained, as shown in FIG. 7, a grayscale image of each cell to be tested can be acquired by photomicrography.
  • Texture feature analysis can be performed on the grayscale image of the cell to be tested to obtain the cell texture features, and the cell texture features can be input into the expression prediction model to predict the protein expression of the cell and obtain the predicted protein expression output by the model.
  • After a grayscale image of a cell to be tested is processed, it can be determined whether all grayscale images of the cells to be tested have been processed. If not, the process returns to the step of performing texture feature analysis on a grayscale image of a cell to be tested to obtain cell texture features. If so, the cells to be tested can be sorted according to their predicted protein expression levels, the cells to be tested with high protein expression levels can be determined from the sorting result, and a screening report can be generated and submitted.
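The image-by-image workflow above can be sketched as a simple loop. The texture feature used here (mean squared difference between horizontally adjacent pixels) and the toy model are stand-ins chosen only to keep the example self-contained; they are assumptions and not the features or the trained model of this application.

```python
def horizontal_contrast(image):
    """Mean squared difference between horizontally adjacent pixels.

    A simple stand-in for a texture feature, used only for illustration.
    """
    diffs = [
        (row[j] - row[j + 1]) ** 2
        for row in image
        for j in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)


def screen_cells(images, extract_features, model, top_n):
    """Predict expression for every image, rank the cells, and build a report."""
    predictions = {}
    for cell_id, image in images.items():       # loop until all images are processed
        features = extract_features(image)      # texture feature analysis
        predictions[cell_id] = model(features)  # predicted protein expression
    ranking = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return {"ranking": ranking, "targets": [cell for cell, _ in ranking[:top_n]]}


# Tiny hypothetical grayscale images (lists of pixel rows).
images = {
    "A1": [[10, 10], [10, 10]],
    "A2": [[0, 20], [20, 0]],
    "A3": [[5, 10], [10, 5]],
}


def toy_model(contrast):
    """Arbitrary toy mapping from feature to expression, not a trained model."""
    return 0.01 * contrast


report = screen_cells(images, horizontal_contrast, toy_model, top_n=1)
print(report)
```

In practice the feature extractor and the model would be replaced by the optimal texture features and the trained expression prediction model; the loop structure is the part this sketch is meant to show.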
  • a cell screening device based on an expression level prediction model comprising:
  • the target cell texture feature acquisition module 801 is used to acquire the grayscale images of the cells to be tested respectively corresponding to a plurality of cells to be tested in the cell culture tank, and to acquire the target cell texture features respectively corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • the cell expression level prediction module 802 is used to input the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model and to obtain, according to the output of the expression level prediction model, the predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression level prediction model is trained on multiple sample cell texture features with expression level labels, the expression level labels represent the real protein expression levels corresponding to the sample cell texture features, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features;
  • the target cell determination module 803 is configured to, according to the predicted protein expression, determine, from the plurality of cells to be tested, target cells whose predicted protein expression meets a set condition.
  • it also includes:
  • the image acquisition module is used to acquire the grayscale image of the sample cell and its corresponding fluorescence image;
  • the sample cell texture feature acquisition module is used to acquire multiple sample cell texture features of the sample cell grayscale image and the real protein expression corresponding to the fluorescence image;
  • the expression label determination module is used to obtain the expression labels corresponding to the multiple sample cell texture features according to the real protein expression;
  • the training module is used to train multiple regression models of different types by using multiple sample cell texture features and their corresponding expression labels;
  • the expression level prediction model determination module is used to select, according to the training results of the multiple regression models, the optimal cell texture feature with the highest contribution to the predicted protein expression level from the multiple sample cell texture features, and to determine, from the multiple regression models, the regression model with the smallest prediction error as the expression prediction model.
  • the training module includes:
  • the sample cell texture feature input sub-module is used to input the multiple sample cell texture features into the multiple regression models respectively, and to obtain the current prediction error of each regression model according to the predicted protein expression output by the multiple regression models and the expression labels corresponding to the sample cell texture features;
  • the regression model optimization sub-module is used to adjust, for each regression model, the model parameters of the regression model according to the current prediction error, and then to re-input the sample cell texture features for model training until the training end condition is met, obtaining the optimized regression model of the regression model.
  • the sample cell texture feature input sub-module includes:
  • the texture feature combination determination unit is used to determine multiple texture feature combinations according to multiple sample cell texture features corresponding to the same sample cell grayscale image; each texture feature combination includes one or more sample cell texture features, and the expression labels corresponding to the multiple texture feature combinations are the same;
  • the texture feature combination input unit is used for inputting multiple texture feature combinations into the multiple regression models respectively.
  • the expression level prediction model determination module includes:
  • the prediction error size determination sub-module is used to input each texture feature combination into multiple optimized regression models of different types, and to obtain the prediction error size corresponding to each texture feature combination according to the predicted protein expression output by the multiple optimized regression models;
  • the contribution degree determination sub-module is used to determine, for each texture feature combination, the contribution degree of the texture feature combination to the predicted protein expression level according to the prediction error size;
  • the optimal cell texture feature determination sub-module is used to determine the optimal cell texture feature with the highest contribution to the predicted protein expression according to the respective contribution degrees of multiple texture feature combinations to the predicted protein expression.
  • the expression level prediction model determination module includes:
  • the optimal cell texture feature acquisition sub-module is used to obtain the respective optimal cell texture features corresponding to the grayscale images of multiple sample cells, and obtain multiple optimal cell texture features;
  • the prediction error determination sub-module is used to input the plurality of optimal cell texture features into the plurality of optimized regression models respectively, and to obtain the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression output by the plurality of optimized regression models and the expression labels corresponding to the optimal cell texture features;
  • the optimized regression model screening sub-module is used to determine, from the plurality of optimized regression models, the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression prediction model.
  • the target cell determination module includes:
  • the target expression level determination submodule is used to sort the multiple predicted protein expression levels, and from the multiple predicted protein expression levels after sorting, determine the predicted protein expression level of the first preset number as the target expression level;
  • the cell to be tested screening sub-module is used to determine the grayscale image of the cell to be tested corresponding to the target expression level, and to determine the cell to be tested corresponding to the grayscale image of the cell to be tested as the target cell.
  • Each module in the above-mentioned cell screening device based on the expression level prediction model can be realized in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 9 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer program when executed by the processor, implements a cell screening method based on an expression level prediction model.
  • the display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment may be a touch layer covering the display screen, a button, trackball or touchpad provided on the housing of the computer equipment, or an external keyboard, trackpad, or mouse.
  • FIG. 9 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, a computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
  • acquiring the grayscale images of the cells to be tested respectively corresponding to the plurality of cells to be tested in the cell culture tank, and acquiring the target cell texture features respectively corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression prediction model, and obtaining, according to the output of the expression prediction model, the predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression prediction model is trained on multiple sample cell texture features with expression labels, the expression labels represent the real protein expression corresponding to the sample cell texture features, and the expression prediction model is used to predict the protein expression corresponding to the target cell texture features;
  • determining, according to the predicted protein expression levels, a target cell whose predicted protein expression level satisfies the set condition from the plurality of cells to be tested.
  • When executing the computer program, the processor also implements the steps in the other embodiments described above.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • acquiring the grayscale images of the cells to be tested respectively corresponding to the plurality of cells to be tested in the cell culture tank, and acquiring the target cell texture features respectively corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression prediction model, and obtaining, according to the output of the expression prediction model, the predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression prediction model is trained on multiple sample cell texture features with expression labels, the expression labels represent the real protein expression corresponding to the sample cell texture features, and the expression prediction model is used to predict the protein expression corresponding to the target cell texture features;
  • determining, according to the predicted protein expression levels, a target cell whose predicted protein expression level satisfies the set condition from the plurality of cells to be tested.
  • When executed by the processor, the computer program also implements the steps in the other embodiments described above.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

Abstract

A cell screening method and apparatus based on an expression level prediction model, a computer device, and a storage medium. The method comprises: acquiring gray scale images of cells to be predicted respectively corresponding to multiple cells to be predicted in a cell culture pool, and acquiring target cell texture features respectively corresponding to multiple gray scale images of cells to be predicted, the target cell texture features being optimal cell texture features determined from multiple cell texture features in advance (101); inputting the target cell texture features of the multiple cells to be predicted into a pre-trained expression level prediction model, and obtaining, according to an output of the expression level prediction model, predicted protein expression levels respectively corresponding to the multiple cells to be predicted (102); and determining, from the multiple cells to be predicted, according to the predicted protein expression levels, a target cell having the predicted protein expression level meeting a set condition (103). A cell having a high protein expression level is determined rapidly, cell screening can be performed without performing repeated culture and screening, and thus the screening period is greatly shortened.

Description

Cell screening method and apparatus based on expression level prediction model
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202010870681.2, entitled "Cell Screening Method and Apparatus Based on Expression Level Prediction Model", filed with the China National Intellectual Property Administration on August 26, 2020, the entire content of which is incorporated into this application by reference.
Technical Field
The present application relates to the field of biotechnology, and in particular to a cell screening method, apparatus, computer device and storage medium based on an expression level prediction model.
Background
With the continuous development of genetic engineering technology, isolating monoclonal cell lines capable of expressing specific products from cell pools has become a common requirement in the biological field.
In the prior art, to obtain cells for culturing a monoclonal cell line, the cells in a cell pool are first transfected and the cell pool is processed by the limiting dilution method to obtain single cells; a homogeneous cell population, that is, a cell line, is then cultured from each single cell, and the cell lines with high expression of the target protein are screened out.
However, obtaining single cells by the limiting dilution method is cumbersome and requires repeated culturing and screening. Moreover, because of limited transfection efficiency, the proportion of cells with a high expression level of the target protein is low. As a result, cell screening is inefficient, the screening cycle is long, and it is difficult to obtain cells with high target protein expression quickly and accurately.
Summary of the Invention
In view of this, it is necessary to provide, in response to the above technical problems, a cell screening method, apparatus, computer device and storage medium based on an expression level prediction model.
A cell screening method based on an expression level prediction model, the method comprising:
acquiring grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank, and acquiring target cell texture features respectively corresponding to the grayscale images of the cells to be tested; the target cell texture features are optimal cell texture features determined in advance from a variety of cell texture features;
inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining, according to the output of the expression level prediction model, predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression level prediction model is trained on a plurality of sample cell texture features having expression level labels, the expression level labels represent the real protein expression levels corresponding to the sample cell texture features, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
determining, according to the predicted protein expression levels, a target cell whose predicted protein expression level satisfies a set condition from the plurality of cells to be tested.
Optionally, the method further comprises:
acquiring a sample cell grayscale image and its corresponding fluorescence image;
acquiring a plurality of sample cell texture features of the sample cell grayscale image, and the real protein expression level corresponding to the fluorescence image;
obtaining, according to the real protein expression level, the expression level labels respectively corresponding to the plurality of sample cell texture features;
training a plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels;
according to the training results of the plurality of regression models, selecting, from the plurality of sample cell texture features, the optimal cell texture feature with the highest contribution to the predicted protein expression level, and determining, from the plurality of regression models, the regression model with the smallest prediction error as the expression level prediction model.
Optionally, training the plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels comprises:
inputting the plurality of sample cell texture features into the plurality of regression models respectively, and obtaining the current prediction error of each regression model according to the predicted protein expression levels output by the plurality of regression models and the expression level labels corresponding to the sample cell texture features;
for each regression model, adjusting the model parameters of the regression model according to the current prediction error, and then re-inputting the sample cell texture features for model training until a training end condition is met, to obtain an optimized regression model of the regression model.
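The iterative training described above (predict, measure the current prediction error, adjust the model parameters, repeat until an end condition is met) can be sketched for a single regression model. The choice of a one-feature linear model, mean squared error, gradient descent, the learning rate, and the stopping tolerance are all assumptions made for illustration; the text does not specify the model types or the optimizer.

```python
def train_linear_regressor(features, labels, lr=0.05, max_epochs=5000, tol=1e-6):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(features)
    error = float("inf")
    for _ in range(max_epochs):
        preds = [w * x + b for x in features]
        residuals = [p - y for p, y in zip(preds, labels)]
        error = sum(r * r for r in residuals) / n  # current prediction error
        if error < tol:                            # training end condition
            break
        # Adjust the model parameters according to the current error.
        grad_w = 2 * sum(r * x for r, x in zip(residuals, features)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, error


# Synthetic texture feature values and expression labels (y = 2x + 1).
features = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, final_error = train_linear_regressor(features, labels)
print(w, b, final_error)
```

The same loop shape applies to each regression model in turn; only the prediction function and the parameter update change with the model type.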
Optionally, inputting the plurality of sample cell texture features into the plurality of regression models respectively comprises:
determining a plurality of texture feature combinations according to the plurality of sample cell texture features corresponding to the same sample cell grayscale image; each texture feature combination contains one or more sample cell texture features, and the expression level labels corresponding to the plurality of texture feature combinations are the same;
inputting the plurality of texture feature combinations into the plurality of regression models respectively.
Optionally, selecting, from the plurality of sample cell texture features, the optimal cell texture feature with the highest contribution to the predicted protein expression level comprises:
inputting each texture feature combination into a plurality of optimized regression models of different types, and obtaining the prediction error size corresponding to each texture feature combination according to the predicted protein expression levels output by the plurality of optimized regression models;
for each texture feature combination, determining the contribution degree of the texture feature combination to the predicted protein expression level according to the prediction error size;
determining the optimal cell texture feature with the highest contribution to the predicted protein expression level according to the respective contribution degrees of the plurality of texture feature combinations to the predicted protein expression level.
Optionally, determining, from the plurality of regression models, the regression model with the smallest prediction error as the expression level prediction model comprises:
acquiring the optimal cell texture features corresponding to each of a plurality of sample cell grayscale images, to obtain a plurality of optimal cell texture features;
inputting the plurality of optimal cell texture features into each of the plurality of optimized regression models, and obtaining the prediction error of each optimized regression model for the optimal cell texture features from the predicted protein expression levels output by the models and the expression level labels of the corresponding optimal cell texture features;
determining, from the plurality of optimized regression models, the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression level prediction model.
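A minimal sketch of this model selection is given below. It assumes mean squared error as the prediction error, which the source does not name, and it uses plain Python callables as hypothetical stand-ins for already optimized regression models.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference between predictions and expression labels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def select_expression_model(models, features, labels):
    """Return the name of the model with the smallest error, plus all errors."""
    errors = {}
    for name, predict in models.items():
        predictions = [predict(x) for x in features]
        errors[name] = mean_squared_error(predictions, labels)
    best_name = min(errors, key=errors.get)
    return best_name, errors

# Hypothetical stand-ins for optimized regression models
models = {
    "model_a": lambda x: 2.0 * x[0] + 1.0,
    "model_b": lambda x: 1.5 * x[0] + x[1],
}
features = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]  # optimal texture features per image
labels = [3.0, 5.0, 7.0]                         # expression level labels
best_name, errors = select_expression_model(models, features, labels)
print(best_name, errors)
```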
Optionally, determining, from the plurality of cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies the set condition comprises:
sorting the plurality of predicted protein expression levels, and determining a preset number of the highest-ranked predicted protein expression levels as target expression levels;
determining the grayscale images of the cells to be tested that correspond to the target expression levels, and determining the cells to be tested that correspond to those grayscale images as the target cells.
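The ranking step can be illustrated with a short sketch. The cell identifiers, the predicted values and the function name are hypothetical, and the sort is assumed to be in descending order so that "highest-ranked" means highest predicted expression.

```python
def select_target_cells(predicted_expression, top_n):
    """Return the IDs of the top_n cells ranked by predicted protein expression.

    predicted_expression maps a cell (or its grayscale image) ID to the
    predicted protein expression level produced by the prediction model.
    """
    ranked = sorted(predicted_expression.items(),
                    key=lambda item: item[1], reverse=True)
    return [cell_id for cell_id, _ in ranked[:top_n]]

predictions = {"cell_001": 0.42, "cell_002": 0.91, "cell_003": 0.15, "cell_004": 0.77}
targets = select_target_cells(predictions, top_n=2)
print(targets)
```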
A cell screening apparatus based on an expression level prediction model, the apparatus comprising:
a target cell texture feature acquisition module, configured to acquire grayscale images respectively corresponding to a plurality of cells to be tested in a cell culture pool, and to acquire the target cell texture features respectively corresponding to those grayscale images; the target cell texture feature is the optimal cell texture feature determined in advance from multiple kinds of cell texture features;
a cell expression level prediction module, configured to input the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and to obtain, from the output of the expression level prediction model, the predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression level prediction model is trained on a plurality of sample cell texture features having expression level labels, the expression level labels characterize the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
a target cell determination module, configured to determine, from the plurality of cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies a set condition.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the cell screening method based on an expression level prediction model described above.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the cell screening method based on an expression level prediction model described above.
With the above cell screening method, apparatus, computer device and storage medium based on an expression level prediction model, grayscale images respectively corresponding to a plurality of cells to be tested in a cell culture pool are acquired, and the target cell texture features respectively corresponding to those grayscale images are acquired. The target cell texture features of the plurality of cells to be tested are input into a pre-trained expression level prediction model, and the predicted protein expression levels respectively corresponding to the cells to be tested are obtained from the output of the model. The target cells whose predicted protein expression level satisfies the set condition are then determined from the plurality of cells to be tested. Cells with high protein expression can thus be identified quickly, without first going through repeated rounds of culture and screening, which greatly shortens the screening cycle. In addition, the present application can quickly process millions of single cells, which widens the range of cells screened while reducing the workload of staff and effectively improving screening efficiency.
Description of the Drawings
FIG. 1 is a schematic flowchart of a cell screening method based on an expression level prediction model in one embodiment;
FIG. 2 is a schematic flowchart of the steps of generating an expression level prediction model in one embodiment;
FIG. 3a is a grayscale image of a sample cell in one embodiment;
FIG. 3b is a fluorescence image of a cell in one embodiment;
FIG. 4 is a schematic flowchart of the steps of a regression model optimization method in one embodiment;
FIG. 5 is a schematic flowchart of the steps of determining the optimal cell texture feature in one embodiment;
FIG. 6 is a schematic flowchart of the steps of selecting among optimized regression models in one embodiment;
FIG. 7 is a schematic flowchart of cell screening in one embodiment;
FIG. 8 is a structural block diagram of a cell screening apparatus based on an expression level prediction model in one embodiment;
FIG. 9 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
To facilitate understanding of the embodiments of the present invention, the prior art is described first.
In the prior art, to obtain cells for culturing a monoclonal cell line, the cells in a cell pool are first transfected, and the cell pool is then processed by the limiting dilution method to obtain single cells. Each single cell can then be cultured into a homogeneous cell population, that is, a cell line, and the cell lines with high expression of the target protein are selected.
However, obtaining single cells by limiting dilution is cumbersome and requires repeated culture and screening. Moreover, because of limited transfection efficiency, the proportion of cells with a high expression level of the target protein is low. As a result, cell screening is inefficient and the screening cycle is long: traditional methods often take six months or more, consume a great deal of labor and material resources, and struggle to meet the needs of large-scale, industrial production.
In one embodiment, as shown in FIG. 1, a cell screening method based on an expression level prediction model is provided. This embodiment is described with the method applied to a terminal as an example. It can be understood that the method can also be applied to a server, or to a system comprising a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 101: acquire grayscale images respectively corresponding to a plurality of cells to be tested in a cell culture pool, and acquire the target cell texture features respectively corresponding to those grayscale images; the target cell texture feature is the optimal cell texture feature determined in advance from multiple kinds of cell texture features.
As an example, the cells to be tested may be cells that have undergone transfection. A cell to be tested may be a cell that failed to acquire the exogenous DNA fragment after transfection, a cell that acquired the exogenous DNA fragment without integrating it into its chromosomes, or a cell in which the exogenous DNA fragment has been integrated into the chromosomes. The grayscale image of a cell to be tested is a grayscale image of that cell, and the target cell texture feature is information reflecting the image characteristics of that grayscale image.
In practical applications, a plurality of cells in the cell culture pool can be transfected so that some or all of them acquire the exogenous DNA fragment. After transfection, grayscale images respectively corresponding to the plurality of cells to be tested in the cell culture pool can be acquired with a photomicrography device, and the target cell texture feature corresponding to each grayscale image can be determined. The target cell texture feature is the optimal cell texture feature determined in advance from multiple kinds of cell texture features.
Step 102: input the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtain, from the output of the expression level prediction model, the predicted protein expression levels respectively corresponding to the plurality of cells to be tested; the expression level prediction model is trained on a plurality of sample cell texture features having expression level labels, the expression level labels characterize the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature.
Here, the predicted protein expression level is the protein expression level that the expression level prediction model predicts from the target cell texture feature. The expression level prediction model is obtained by training on a plurality of sample cell texture features having expression level labels, and can be used to predict the protein expression level corresponding to a target cell texture feature. An expression level label characterizes the real protein expression level corresponding to a sample cell texture feature, that is, the real protein expression level of the cell shown in the sample cell grayscale image.
In a specific implementation, once the target cell texture features of the plurality of cells to be tested have been acquired, they can be input into the pre-trained expression level prediction model, and the predicted protein expression levels respectively corresponding to the cells to be tested can be obtained from the output of the model.
In practical applications, the protein expression level of a cell can be determined from its fluorescence image. However, capturing a fluorescence image inactivates the cell, so that it can no longer proliferate. In this embodiment, the predicted protein expression level is obtained from the cell texture features of the grayscale image of the cell to be tested, so the protein expression level can be estimated without inactivating the cell.
Step 103: determine, from the plurality of cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies the set condition.
After the predicted protein expression levels have been determined, the cells to be tested whose predicted protein expression level satisfies the set condition can be determined as the target cells.
In this embodiment, grayscale images respectively corresponding to a plurality of cells to be tested in a cell culture pool are acquired, and the target cell texture features respectively corresponding to those grayscale images are acquired. The target cell texture features are input into a pre-trained expression level prediction model, the predicted protein expression levels respectively corresponding to the cells to be tested are obtained from the output of the model, and the target cells whose predicted protein expression level satisfies the set condition are determined from the plurality of cells to be tested. Cells with high protein expression can thus be identified quickly, without first going through repeated rounds of culture and screening, which greatly shortens the screening cycle. In addition, the present application can quickly process millions of single cells, which widens the range of cells screened while reducing the workload of staff and effectively improving screening efficiency.
In one embodiment, as shown in FIG. 2, the cell screening method based on an expression level prediction model may further include the following steps:
Step 201: acquire a sample cell grayscale image and its corresponding fluorescence image.
Here, the sample cell grayscale image and its corresponding fluorescence image are a grayscale image and a fluorescence image obtained by imaging the same cells under the same imaging conditions.
In a specific implementation, cells can be designated as a training set and imaged with a microscopic imaging device to obtain the sample cell grayscale images and the corresponding fluorescence images.
Specifically, the cells used as the training set are cells in the cell culture pool that have undergone transfection. Such a cell may be a cell that failed to acquire the exogenous DNA fragment after transfection, a cell that acquired the exogenous DNA fragment without integrating it into its chromosomes, or a cell in which the exogenous DNA fragment has been integrated into the chromosomes.
For the same batch of transfected cells in the cell culture pool, a grayscale image and a fluorescence image can be captured simultaneously with a microscope under the same imaging conditions. The resulting grayscale image and fluorescence image may each contain one or more cells, and the coordinates of each cell in the grayscale image correspond to the coordinates of that cell in the fluorescence image. Because a single grayscale or fluorescence image may contain several cells, the two images can be preprocessed after capture to obtain a sample cell grayscale image and a fluorescence image for each single cell, as shown in FIG. 3a and FIG. 3b. The image preprocessing may include cell segmentation and filtering out of adherent (touching) cells.
Step 202: acquire a plurality of sample cell texture features of the sample cell grayscale image, and the real protein expression level corresponding to the fluorescence image.
After the sample cell grayscale image has been acquired, a plurality of sample cell texture features can be extracted from it, and the real protein expression level corresponding to the fluorescence image can be obtained.
Specifically, cells with different protein expression levels produce different fluorescence images and different grayscale images, so the protein expression level of a cell can be related both to its fluorescence image and to the sample cell texture features in its grayscale image.
In practical applications, a texture feature extraction algorithm can be used to extract a plurality of sample cell texture features from each sample cell grayscale image, and the sample cell texture features of multiple sample cell grayscale images can be assembled into a texture feature matrix. For example, extracting m sample cell texture features from each of n sample cell grayscale images yields an n*m texture feature matrix. The sample cell texture features may include any one or more of the following: gray-level co-occurrence matrix (GLCM) features, histogram features, Laws texture energy features, local binary pattern (LBP) features, and discrete wavelet transform (DWT) features. Of course, those skilled in the art may select other texture features as needed, and the present application places no limitation on this.
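As a rough illustration of how an n*m texture feature matrix might be assembled, the sketch below computes three simple features per image in plain Python: two histogram statistics and the contrast of a horizontal gray-level co-occurrence matrix. The choice of features, the tiny example images and all function names are assumptions made for illustration; a real pipeline would likely use an image-processing library and many more features.

```python
from collections import Counter

def glcm_contrast(image):
    """Contrast of the horizontal (offset 0,1) gray-level co-occurrence matrix."""
    pairs = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            pairs[(left, right)] += 1          # count co-occurring gray levels
    total = sum(pairs.values())
    return sum(count * (i - j) ** 2 for (i, j), count in pairs.items()) / total

def histogram_features(image):
    """Mean and variance of the pixel intensities."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, variance

def feature_vector(image):
    mean, variance = histogram_features(image)
    return [mean, variance, glcm_contrast(image)]

# Two hypothetical 3x3 single-cell grayscale images (nested lists of gray levels)
images = [
    [[0, 0, 1], [0, 1, 1], [1, 1, 2]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
]
feature_matrix = [feature_vector(img) for img in images]  # n rows, m columns
```

Here n = 2 and m = 3; each row describes one sample cell grayscale image and each column one texture feature.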
After the fluorescence image has been acquired, the real protein expression level of the cell in the corresponding sample cell grayscale image can be determined from it. In practical applications, the protein produced from the gene of interest (for example, the exogenous DNA fragment) fluoresces at a specific wavelength. Once the fluorescence image has been obtained, the G value of its green channel (also called the fluorescence value) can be determined; this G value may be the total G value or the average G value of the fluorescence image. The real protein expression level corresponding to the fluorescence image is then determined from the G value. The G value is positively correlated with the protein expression level: the higher the G value, the higher the protein expression level of the corresponding cell. On this basis, the real protein expression level can be determined from the G value of the fluorescence image.
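The G value computation can be sketched as follows, assuming the fluorescence image is available as rows of (R, G, B) tuples. The function name, the `mode` parameter and the pixel values are hypothetical.

```python
def green_value(fluorescence_image, mode="mean"):
    """Total or average green-channel value of an RGB fluorescence image.

    fluorescence_image is a list of rows, each row a list of (R, G, B) tuples.
    """
    greens = [pixel[1] for row in fluorescence_image for pixel in row]
    total = sum(greens)
    if mode == "total":
        return total
    return total / len(greens)

# Hypothetical 2x2 single-cell fluorescence image
image = [[(0, 10, 0), (0, 30, 0)],
         [(5, 50, 2), (0, 10, 0)]]
mean_g = green_value(image)            # average G value
total_g = green_value(image, "total")  # total G value
```

Either value can then serve as the expression level label, since the text only requires that the label increase with the G value.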
Step 203: obtain, according to the real protein expression levels, the expression level labels respectively corresponding to the plurality of sample cell texture features.
After the real protein expression level corresponding to each fluorescence image has been determined, the expression level labels respectively corresponding to the plurality of sample cell texture features can be determined from the real protein expression levels.
Specifically, the real protein expression level can be used as the expression level label of the corresponding sample cell grayscale image, and the expression level label of a sample cell grayscale image is also the expression level label of the sample cell texture features extracted from it. Multiple sample cell texture features extracted from the same sample cell grayscale image share the same expression level label; for sample cell texture features extracted from different sample cell grayscale images, each label corresponds to the real protein expression level of the cell in the respective grayscale image.
Step 204: train a plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels.
Specifically, several types of regression model can be set up in advance, for example an SVR (Support Vector Regression) model, an ElasticNet (elastic net) model, an XGBoost model, a Gradient Boosting Regression model, and a Logistic Regression model. Because different regression models are implemented differently, that is, they rest on different underlying mathematical principles, analyzing the extracted sample cell texture features and predicting protein expression with different regression models yields different prediction results.
On this basis, once the plurality of sample cell texture features and their corresponding expression level labels have been obtained, a plurality of regression models of different types can be trained.
Step 205: according to the training results of the plurality of regression models, select from the plurality of sample cell texture features the optimal cell texture feature that contributes most to predicting protein expression, and determine from the plurality of regression models the regression model with the smallest prediction error as the expression level prediction model.
After training, the training results of the plurality of regression models can be obtained. According to these results, the optimal cell texture feature that contributes most to predicting protein expression is selected from the plurality of sample cell texture features, and, based on the optimal cell texture feature, the regression model with the smallest prediction error is determined from the plurality of regression models as the expression level prediction model. The optimal cell texture feature may include one or more sample cell texture features.
In this embodiment, a plurality of regression models of different types are trained using the plurality of sample cell texture features and their corresponding expression level labels. According to the training results, the optimal sample cell texture feature that contributes most to predicting protein expression is selected from the plurality of sample cell texture features, and the regression model with the smallest prediction error is determined as the expression level prediction model. The cell image can thus be examined along multiple dimensions on the basis of multiple sample cell texture features, establishing a link between the cell grayscale image and the protein expression level of the cell and providing a basis for rapid prediction of protein expression.
In one embodiment, as shown in FIG. 4, training a plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels may include the following steps:
Step 401: input the plurality of sample cell texture features into each of the plurality of regression models, and obtain the current prediction error of each regression model from the predicted protein expression levels output by the models and the expression level labels of the corresponding sample cell texture features.
In a specific implementation, the plurality of sample cell texture features can be input into each of the regression models to obtain the predicted protein expression levels output by the models, and the current prediction error of each regression model can then be determined from the predicted protein expression levels and the expression level labels of the corresponding sample cell texture features.
Step 402: for each regression model, adjust the model parameters of the regression model according to the current prediction error, and input the sample cell texture features again for model training, until a training end condition is satisfied, to obtain the optimized regression model of that regression model.
For each regression model, after the current prediction error has been obtained, the model parameters can be adjusted and optimized according to that error, and the sample cell texture features are input into the regression model again for training. The regression model with adjusted parameters produces a new current prediction error from the input cell texture features, and it is then judged whether this error satisfies the training end condition. If it does, the current regression model is determined as the optimized regression model; if it does not, the above steps are repeated and the model parameters continue to be optimized.
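The predict, measure error, adjust, repeat loop of steps 401 and 402 can be illustrated with a small linear model fitted by gradient descent. This is only a sketch: the source does not say which optimizer or end condition is used, so the mean squared error, the learning rate, the tolerance and the round limit below are all assumptions, and a linear model stands in for the regression model types named earlier.

```python
def train_linear_model(features, labels, learning_rate=0.05,
                       tolerance=1e-6, max_rounds=10000):
    """Fit y = w.x + b by repeatedly adjusting parameters from the current error."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    error = float("inf")
    for _ in range(max_rounds):
        # Step 401: predict and compute the current prediction error (MSE)
        predictions = [sum(w * x for w, x in zip(weights, row)) + bias
                       for row in features]
        residuals = [p - y for p, y in zip(predictions, labels)]
        error = sum(r * r for r in residuals) / len(labels)
        if error < tolerance:          # assumed training end condition
            break
        # Step 402: adjust the model parameters along the negative MSE gradient
        for j in range(n_features):
            grad = 2 * sum(r * row[j] for r, row in zip(residuals, features)) / len(labels)
            weights[j] -= learning_rate * grad
        bias -= learning_rate * 2 * sum(residuals) / len(labels)
    return weights, bias, error

# Hypothetical data: one texture feature, labels following y = 2x + 1
features = [[0.0], [1.0], [2.0], [3.0]]
labels = [1.0, 3.0, 5.0, 7.0]
weights, bias, error = train_linear_model(features, labels)
```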
In this embodiment, the model parameters of each regression model are adjusted repeatedly according to the current prediction error between the expression level label and the predicted protein expression level, until an optimized regression model is obtained. The regression model is thus continuously optimized by supervised machine learning, which improves its prediction accuracy.
In one embodiment, inputting the plurality of sample cell texture features into each of the plurality of regression models may include the following steps:
determining a plurality of texture feature combinations from the plurality of sample cell texture features corresponding to the same sample cell grayscale image, where each texture feature combination contains one or more sample cell texture features and the texture feature combinations share the same expression level label; and inputting the plurality of texture feature combinations into each of the plurality of regression models.
In practical applications, when the sample cell texture features are input into the regression models, the sample cell texture features corresponding to the same sample cell grayscale image can be combined to obtain a plurality of texture feature combinations, each containing one or more sample cell texture features. Because the sample cell texture features in these combinations all come from the same sample cell grayscale image, the combinations share the same expression level label. Once the texture feature combinations have been obtained, they can be input into each of the regression models.
Specifically, when determining the texture feature combinations, methods such as correlation analysis can be used to analyze the sample cell texture features and obtain a correlation score for each one. The correlation score characterizes how important the corresponding sample cell texture feature is for predicting protein expression, and the two are positively related: the higher the correlation score of a sample cell texture feature, the more important its role in predicting protein expression.
After the correlation scores have been obtained, the sample cell texture features can be sorted by correlation score from high to low, and texture feature combinations can be generated by selecting features from the sorted list in order. For example, suppose three sample cell texture features, sorted by correlation score from high to low, are sample cell texture feature A, sample cell texture feature B and sample cell texture feature C. When generating the combinations, the highest-scoring feature A is first taken alone as one cell texture feature combination; the next-highest-scoring feature B is then added to A to form a second combination; finally, the lowest-scoring feature C is added to A and B to form a third combination.
在本实施例中,对多个样本细胞纹理特征进行组合,确定多个纹理特征组合,将多个纹理特征组合分别输入多个回归模型,可以综合评估不同样本细胞特征对回归模型预测效果的影响,提高回归模型的预测准确度。In this embodiment, the texture features of multiple sample cells are combined to determine multiple texture feature combinations, and the multiple texture feature combinations are respectively input into multiple regression models, so that the influence of different sample cell features on the prediction effect of the regression model can be comprehensively evaluated , to improve the prediction accuracy of the regression model.
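The ranking-and-combining procedure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the absolute Pearson correlation with the expression level label is used as the correlation score (the description only says "correlation analysis and the like"), and the feature names and values are hypothetical toy data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_features(features, labels):
    """Sort feature names by correlation score, highest first.

    Assumption: the score is |Pearson r| against the expression level label.
    """
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

def nested_combinations(ranked):
    """Turn a ranking [A, B, C] into combinations (A,), (A, B), (A, B, C)."""
    return [tuple(ranked[:k]) for k in range(1, len(ranked) + 1)]

# Hypothetical texture features, one value per sample cell grayscale image.
features = {
    "contrast":    [0.9, 2.1, 2.9, 4.2, 5.0],
    "homogeneity": [0.5, 0.1, 0.9, 0.4, 0.8],
    "entropy":     [1.0, 1.5, 3.5, 3.0, 5.5],
}
labels = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical expression level labels

ranked = rank_features(features, labels)
combos = nested_combinations(ranked)
```

Each combination in `combos` keeps the labels of the images it was derived from, so every combination can be passed to each regression model with the same expression level labels.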
In one embodiment, as shown in FIG. 5, selecting, from the multiple sample cell texture features, the optimal cell texture feature that contributes most to predicting the protein expression level includes:
Step 501: input each texture feature combination into multiple optimized regression models of different types, and obtain the prediction error magnitude corresponding to each texture feature combination from the predicted protein expression levels output by the multiple optimized regression models.
After the multiple texture feature combinations are obtained, each of them can be input into the multiple optimized regression models of different types to obtain the predicted protein expression levels output by those models. The expression level label corresponding to each texture feature combination can then be retrieved, and the prediction error magnitude for each combination determined from the output predicted protein expression level and the expression level label. Obtaining the prediction error magnitude allows the prediction performance of the optimized regression models to be verified.
Step 502: for each texture feature combination, determine, from the prediction error magnitude, the degree to which the texture feature combination contributes to predicting the protein expression level.
Step 503: from the respective contributions of the multiple texture feature combinations to predicting the protein expression level, determine the optimal cell texture feature that contributes most to predicting the protein expression level.
For each texture feature combination, its contribution to predicting the protein expression level can be determined from the prediction error magnitude. By comparing the contributions of the individual texture feature combinations, the cell texture feature that contributes most to predicting the protein expression level, that is, the optimal cell texture feature, can be determined.
Specifically, for each texture feature combination, the number of optimized regression models whose prediction error magnitude is below a preset threshold can be determined; this number is positively correlated with the contribution of the combination. For example, if the same texture feature combination yields prediction results with error magnitudes below the preset threshold when input into optimized regression models of different types, that combination contributes strongly to accurate prediction of the protein expression level.
When determining the contributions of the multiple texture feature combinations, the texture feature combination that produces a best prediction result in every type of optimized regression model at the same time can also be identified; the sample cell texture features in that combination are the optimal cell texture features. Specifically, because each texture feature combination is input into optimized regression models of different types, for each type of optimized regression model the prediction error magnitudes of the multiple texture feature combinations can be sorted from small to large, and the preset number of top-ranked (smallest) prediction error magnitudes taken as the best prediction results. The combination that appears among the best prediction results of all types of optimized regression models is then selected.
In this embodiment, determining the optimal cell texture feature from the respective contributions of the multiple texture feature combinations allows sample cell texture features to be extracted in a more targeted manner when the protein expression level is subsequently predicted. This avoids extracting many other cell texture features with lower contributions, which could interfere with the prediction results or waste computing resources.
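The two ways of judging contribution described above (counting the models whose error falls below a preset threshold, and finding the combination that ranks among the best results in every model type) can be sketched as follows. The model names, error values, threshold, and `top_k` are hypothetical; the description does not fix any of them.

```python
def contribution_by_threshold(errors, threshold):
    """Count, per combination, the models whose prediction error is below threshold.

    errors maps model name -> {combination: prediction error magnitude}.
    A higher count is read as a higher contribution.
    """
    combos = next(iter(errors.values())).keys()
    return {
        combo: sum(1 for model in errors if errors[model][combo] < threshold)
        for combo in combos
    }

def best_in_every_model(errors, top_k):
    """Return the combinations ranked within the top_k smallest errors of every model."""
    common = None
    for per_combo in errors.values():
        best = set(sorted(per_combo, key=per_combo.get)[:top_k])
        common = best if common is None else common & best
    return common or set()

# Hypothetical prediction errors for three model types and three combinations.
errors = {
    "model_1": {("A",): 0.30, ("A", "B"): 0.12, ("A", "B", "C"): 0.20},
    "model_2": {("A",): 0.25, ("A", "B"): 0.10, ("A", "B", "C"): 0.11},
    "model_3": {("A",): 0.40, ("A", "B"): 0.14, ("A", "B", "C"): 0.35},
}

counts = contribution_by_threshold(errors, threshold=0.15)
optimal = best_in_every_model(errors, top_k=1)
```

With this toy data both criteria point to the combination `("A", "B")`; on real data the two criteria need not agree, and how to reconcile them is left open here.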
In one embodiment, as shown in FIG. 6, determining, from the multiple regression models, the regression model with the smallest prediction error as the expression level prediction model may include the following steps:
Step 601: obtain the optimal cell texture feature corresponding to each of the multiple sample cell grayscale images, yielding multiple optimal cell texture features.
In practice, the optimal cell texture feature corresponding to each of the multiple sample cell grayscale images can be obtained, yielding multiple optimal cell texture features.
Step 602: input the multiple optimal cell texture features into the multiple optimized regression models respectively, and obtain the prediction error of each optimized regression model for the optimal cell texture features from the predicted protein expression levels output by the multiple optimized regression models and the expression level labels of the corresponding optimal cell texture features.
Specifically, the expression level label corresponding to each of the multiple optimal cell texture features can be determined, and the multiple optimal cell texture features input into the multiple optimized regression models respectively to obtain the predicted protein expression levels output by those models. The output predicted protein expression levels and the corresponding expression level labels can then be used to determine the prediction error of each optimized regression model when it predicts from the optimal cell texture features.
Step 603: from the multiple optimized regression models, determine the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression level prediction model.
After the prediction errors of the multiple optimized regression models for the optimal cell texture features are obtained, the optimized regression model with the smallest prediction error can be determined as the expression level prediction model.
In another example, after the optimal cell texture features corresponding to the multiple sample cell grayscale images are obtained, they can be divided into ten equal parts and ten experiments performed, in each of which nine parts are used to train and validate the regression models and one part is used to test the trained regression models. The average prediction error over the ten experiments can then be obtained and used to determine the expression level prediction model from among the multiple types of regression models and to evaluate the performance of the expression level prediction model.
In this embodiment, the multiple optimal cell texture features are input into the multiple optimized regression models respectively, and the expression level prediction model is determined from the multiple optimized regression models according to the predicted protein expression levels they output and the expression level labels of the corresponding optimal cell texture features. In this way, a regression model with accurate prediction performance can be determined from among multiple types of regression models on the basis of the optimal texture features that contribute most to predicting the protein expression level.
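The ten-part experiment and the selection of the model with the smallest average error can be sketched as a plain k-fold cross-validation. This is a sketch under assumptions: the two candidate "regression models" (`fit_mean`, `fit_line`) are toy stand-ins on a single feature, mean absolute error is assumed as the prediction error, and the nine training parts are used for fitting only.

```python
from statistics import mean

def k_fold_indices(n, k=10):
    """Split the indices 0..n-1 into k folds of near-equal size."""
    return [list(range(n))[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Toy model: always predict the mean training label."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy model: one-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_error(fit, xs, ys, k=10):
    """Average test error over k experiments (k-1 parts train, 1 part test)."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        fold_errors.append(mean(abs(model(xs[i]) - ys[i]) for i in test_idx))
    return mean(fold_errors)

def select_model(candidates, xs, ys, k=10):
    """Return the candidate name with the smallest average error, plus all errors."""
    errors = {name: cv_error(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(errors, key=errors.get), errors

# Hypothetical data: one optimal texture feature value and one label per image.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

best_name, errors = select_model({"mean": fit_mean, "line": fit_line}, xs, ys)
```

The averaged error of the selected model also serves as the performance estimate mentioned above.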
In one embodiment, determining, from the multiple cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies a set condition may include the following steps:
Sorting the multiple predicted protein expression levels and, from the sorted predicted protein expression levels, determining a preset number of top-ranked predicted protein expression levels as target expression levels; and determining the grayscale images of the cells to be tested that correspond to the target expression levels, and determining the cells to be tested corresponding to those grayscale images as the target cells.
In a specific implementation, after the predicted protein expression levels corresponding to the multiple cells to be tested are obtained, they can be sorted, and a preset number of top-ranked predicted protein expression levels determined as the target expression levels.
Specifically, the predicted protein expression levels can be sorted in descending order, that is, from largest to smallest, and the predicted protein expression levels ranked in the top N then determined as the target expression levels. In practice, the predicted protein expression levels that exceed a preset expression level threshold can also be determined as the target expression levels.
After the target expression levels are determined, the grayscale images of the cells to be tested that correspond to the target expression levels can be identified, and the cells to be tested corresponding to those grayscale images determined as the target cells. The target cells can be used to culture cell lines.
In this embodiment, the multiple predicted protein expression levels are sorted and, from the multiple cells to be tested, the preset number of cells with the top-ranked predicted protein expression levels are determined as the target cells. Cells with high protein expression can thus be screened quickly, which greatly reduces the screening workload.
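The two selection rules described above (top N after descending sort, or exceeding a preset expression level threshold) can be sketched as follows. The cell identifiers and predicted values are hypothetical, and combining both rules in one function is a design choice of this sketch rather than something the description requires.

```python
def select_target_cells(predictions, top_n=None, threshold=None):
    """Return cell ids ordered by predicted expression level, highest first.

    predictions maps cell id -> predicted protein expression level.
    threshold keeps only cells whose prediction exceeds it;
    top_n keeps only the first top_n cells of the descending ranking.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(cell, value) for cell, value in ranked if value > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [cell for cell, _ in ranked]

# Hypothetical predicted protein expression levels for four cells to be tested.
predictions = {"cell_01": 3.2, "cell_02": 7.8, "cell_03": 5.1, "cell_04": 6.4}

top_two = select_target_cells(predictions, top_n=2)
above_five = select_target_cells(predictions, threshold=5.0)
```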
To help those skilled in the art better understand the above steps, an embodiment of the present application is illustrated below by way of an example; it should be understood, however, that the embodiments of the present application are not limited to this example.
In a specific implementation, multiple CHO-K1 host cells can be transfected with an aflibercept expression plasmid to obtain multiple transfected CHO-K1 host cells, which are the cells to be tested in the present application. After the multiple cells to be tested are obtained, as shown in FIG. 7, a grayscale image of each cell to be tested can be acquired by photomicrography.
For each grayscale image of a cell to be tested, texture feature analysis can be performed on the image to obtain the cell texture features, and the cell texture features can be input into the expression level prediction model to predict the protein expression level of the cell and obtain the predicted protein expression level output by the model.
After a grayscale image has been processed, it can be determined whether all grayscale images of the cells to be tested have been processed. If not, the process returns to the step of performing texture feature analysis on a grayscale image and obtaining the cell texture features. If so, the cells to be tested can be ranked by their predicted protein expression levels, the cells with high protein expression identified from the ranking, and a screening report generated and submitted.
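The loop in this example (analyze each grayscale image, predict, then rank once all images are processed) can be sketched as follows. The feature extractor and predictor below are deliberately trivial stand-ins, not the texture analysis or the trained expression level prediction model of the application; the image data and the report layout are likewise hypothetical.

```python
def screen_cells(images, extract_features, predict, top_n):
    """Process every grayscale image, then rank cells by predicted expression level."""
    predictions = {}
    for cell_id, image in images.items():
        features = extract_features(image)        # texture feature analysis step
        predictions[cell_id] = predict(features)  # expression level prediction step
    ranking = sorted(predictions, key=predictions.get, reverse=True)
    return {"predictions": predictions, "ranking": ranking, "selected": ranking[:top_n]}

def toy_features(image):
    """Stand-in feature: mean gray level of a 2-D list of pixel values."""
    pixels = [p for row in image for p in row]
    return (sum(pixels) / len(pixels),)

def toy_predict(features):
    """Stand-in model: a fixed linear map of the single feature."""
    return 0.1 * features[0]

# Hypothetical 2x2 grayscale images keyed by cell id.
images = {
    "a": [[10, 20], [30, 40]],
    "b": [[100, 100], [100, 100]],
    "c": [[50, 60], [70, 80]],
}

report = screen_cells(images, toy_features, toy_predict, top_n=2)
```

Passing the extractor and predictor as arguments keeps the loop independent of whichever texture features and regression model were selected during training.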
It should be understood that, although the steps in the flowcharts of FIGS. 1, 2 and 4-7 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 1, 2 and 4-7 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 8, a cell screening apparatus based on an expression level prediction model is provided, the apparatus including:
a target cell texture feature acquisition module 801, configured to acquire the grayscale images respectively corresponding to multiple cells to be tested in a cell culture tank, and to acquire the target cell texture features respectively corresponding to the multiple grayscale images, the target cell texture feature being an optimal cell texture feature determined in advance from multiple kinds of cell texture features;
a cell expression level prediction module 802, configured to input the target cell texture features of the multiple cells to be tested into a pre-trained expression level prediction model and to obtain, from the output of the expression level prediction model, the predicted protein expression levels respectively corresponding to the multiple cells to be tested, where the expression level prediction model is trained on multiple sample cell texture features having expression level labels, each expression level label characterizes the real protein expression level corresponding to a sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
a target cell determination module 803, configured to determine, from the multiple cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies a set condition.
In one embodiment, the apparatus further includes:
an image acquisition module, configured to acquire a sample cell grayscale image and its corresponding fluorescence image;
a sample cell texture feature acquisition module, configured to acquire multiple sample cell texture features of the sample cell grayscale image and the real protein expression level corresponding to the fluorescence image;
an expression level label determination module, configured to obtain, from the real protein expression level, the expression level labels respectively corresponding to the multiple sample cell texture features;
a training module, configured to train multiple regression models of different types using the multiple sample cell texture features and their corresponding expression level labels;
an expression level prediction model determination module, configured to select, according to the training results of the multiple regression models, the optimal cell texture feature that contributes most to predicting the protein expression level from the multiple sample cell texture features, and to determine the regression model with the smallest prediction error from the multiple regression models as the expression level prediction model.
In one embodiment, the training module includes:
a sample cell texture feature input sub-module, configured to input the multiple sample cell texture features into the multiple regression models respectively, and to obtain the current prediction error of each regression model from the predicted protein expression levels output by the multiple regression models and the expression level labels of the corresponding sample cell texture features;
a regression model optimization sub-module, configured to, for each regression model, adjust the model parameters of the regression model according to the current prediction error and then input the sample cell texture features again for model training until a training end condition is satisfied, yielding the optimized regression model of that regression model.
In one embodiment, the sample cell texture feature input sub-module includes:
a texture feature combination determination unit, configured to determine multiple texture feature combinations from the multiple sample cell texture features corresponding to the same sample cell grayscale image, where each texture feature combination contains one or more sample cell texture features and the multiple texture feature combinations share the same expression level label;
a texture feature combination input unit, configured to input the multiple texture feature combinations into the multiple regression models respectively.
In one embodiment, the expression level prediction model determination module includes:
a prediction error magnitude determination sub-module, configured to input each texture feature combination into multiple optimized regression models of different types, and to obtain the prediction error magnitude corresponding to each texture feature combination from the predicted protein expression levels output by the multiple optimized regression models;
a contribution determination sub-module, configured to determine, for each texture feature combination and according to the prediction error magnitude, the degree to which the texture feature combination contributes to predicting the protein expression level;
an optimal cell texture feature determination sub-module, configured to determine, from the respective contributions of the multiple texture feature combinations to predicting the protein expression level, the optimal cell texture feature that contributes most to predicting the protein expression level.
In one embodiment, the expression level prediction model determination module includes:
an optimal cell texture feature acquisition sub-module, configured to obtain the optimal cell texture feature corresponding to each of the multiple sample cell grayscale images, yielding multiple optimal cell texture features;
a prediction error determination sub-module, configured to input the multiple optimal cell texture features into the multiple optimized regression models respectively, and to obtain the prediction error of each optimized regression model for the optimal cell texture features from the predicted protein expression levels output by the multiple optimized regression models and the expression level labels of the corresponding optimal cell texture features;
an optimized regression model selection sub-module, configured to determine, from the multiple optimized regression models, the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression level prediction model.
In one embodiment, the target cell determination module includes:
a target expression level determination sub-module, configured to sort the multiple predicted protein expression levels and, from the sorted predicted protein expression levels, determine a preset number of top-ranked predicted protein expression levels as target expression levels;
a cell screening sub-module, configured to determine the grayscale images of the cells to be tested that correspond to the target expression levels, and to determine the cells to be tested corresponding to those grayscale images as the target cells.
For the specific limitations of the cell screening apparatus based on an expression level prediction model, reference may be made to the limitations of the cell screening method based on an expression level prediction model described above, which are not repeated here. Each module of the above cell screening apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through Wi-Fi, a carrier network, NFC (near field communication), or other technologies. When executed by the processor, the computer program implements a cell screening method based on an expression level prediction model. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in FIG. 9 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
acquiring the grayscale images respectively corresponding to multiple cells to be tested in a cell culture tank, and acquiring the target cell texture features respectively corresponding to the multiple grayscale images, the target cell texture feature being an optimal cell texture feature determined in advance from multiple kinds of cell texture features;
inputting the target cell texture features of the multiple cells to be tested into a pre-trained expression level prediction model and obtaining, from the output of the expression level prediction model, the predicted protein expression levels respectively corresponding to the multiple cells to be tested, where the expression level prediction model is trained on multiple sample cell texture features having expression level labels, each expression level label characterizes the real protein expression level corresponding to a sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
determining, from the multiple cells to be tested and according to the predicted protein expression levels, the target cells whose predicted protein expression level satisfies a set condition.
In one embodiment, the processor also implements the steps of the other embodiments described above when executing the computer program.
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:In one embodiment, a computer-readable storage medium is provided on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
获取细胞培养池中多个待测细胞分别对应的待测细胞灰度图,并获取多张待测细胞灰度图分别对应的目标细胞纹理特征;所述目标细胞纹理特征为预先从多种细胞纹理特征中确定出的最优细胞纹理特征;Obtain the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank, and obtain the texture features of the target cells corresponding to the grayscale images of the cells to be tested; the texture features of the target cells are obtained from a variety of cells in advance. The optimal cell texture feature determined from the texture feature;
将多个待测细胞的目标细胞纹理特征输入预先训练的表达量预测模型,并根据所述表达量预测模型的输出,得到所述多个待测细胞分别对应的预测蛋白表达量;所述表达量预测模型根据具有表达量标签的多个样本细胞纹理特征训练得到,所述表达量标签用于表征各样本细胞纹理特征对应的真实蛋白表达量,所述表达量预测模型用于预测目标细胞纹理特征对应的蛋白表达量;Inputting the target cell texture features of a plurality of cells to be tested into a pre-trained expression prediction model, and according to the output of the expression prediction model, the predicted protein expression corresponding to the plurality of cells to be tested is obtained respectively; the expression The quantity prediction model is trained according to the texture features of multiple sample cells with expression quantity labels, the expression quantity labels are used to represent the real protein expression corresponding to the texture features of each sample cell, and the expression quantity prediction model is used to predict the target cell texture. The protein expression corresponding to the feature;
根据所述预测蛋白表达量,从所述多个待测细胞中确定出预测蛋白表达量满足设定条件的目标细胞。According to the predicted protein expression level, a target cell whose predicted protein expression level satisfies the set condition is determined from the plurality of cells to be tested.
在一个实施例中,计算机程序被处理器执行时还实现上述其他实施例中的步骤。In one embodiment, the computer program, when executed by the processor, also implements the steps in the other embodiments described above.
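For illustration only, the three steps above (texture feature extraction, expression level prediction, target cell selection) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the texture feature is assumed to be a contrast value computed from horizontally adjacent pixel pairs, the "set condition" is assumed to be "the top N predicted values", and `glcm_contrast`, `screen_cells` and `toy_model` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # grayscale image as rows of integer gray levels


def glcm_contrast(image: Image) -> float:
    """Mean squared gray-level difference of horizontally adjacent pixels.

    This equals the contrast of a normalised co-occurrence matrix for the
    offset (0, 1). It is an assumed example of a cell texture feature.
    """
    diffs = [
        (row[i] - row[i + 1]) ** 2
        for row in image
        for i in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def screen_cells(
    images: Sequence[Image],
    feature_fn: Callable[[Image], float],
    model: Callable[[float], float],
    top_n: int,
) -> List[int]:
    """Return the indices of the top_n cells ranked by predicted expression."""
    predicted = [model(feature_fn(img)) for img in images]
    ranked = sorted(range(len(images)), key=lambda i: predicted[i], reverse=True)
    return ranked[:top_n]


def toy_model(contrast: float) -> float:
    """Hypothetical stand-in for the pre-trained expression level prediction model."""
    return 0.5 * contrast + 1.0


cells = [
    [[10, 10], [10, 10]],  # uniform texture, contrast 0
    [[0, 20], [20, 0]],    # strong texture, contrast 400
    [[5, 15], [15, 5]],    # moderate texture, contrast 100
]
selected = screen_cells(cells, glcm_contrast, toy_model, top_n=2)
print(selected)
```

In practice the feature function and the model would be replaced by the optimal texture feature and the trained regression model; only the order of the three steps is taken from the text.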
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The above embodiments represent only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make a number of modifications and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. The protection scope of this patent application is therefore defined by the appended claims.

Claims (10)

  1. A cell screening method based on an expression level prediction model, characterized in that the method comprises:
    acquiring grayscale images of cells to be tested, each corresponding to one of a plurality of cells to be tested in a cell culture tank, and acquiring the target cell texture feature corresponding to each of the grayscale images; the target cell texture feature is the optimal cell texture feature determined in advance from among multiple kinds of cell texture features;
    inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining, from the output of the expression level prediction model, the predicted protein expression level corresponding to each of the plurality of cells to be tested; the expression level prediction model is trained on a plurality of sample cell texture features having expression level labels, each expression level label represents the real protein expression level corresponding to a sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
    determining, according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition from among the plurality of cells to be tested.
  2. The method according to claim 1, characterized in that it further comprises:
    acquiring a sample cell grayscale image and its corresponding fluorescence image;
    acquiring a plurality of sample cell texture features of the sample cell grayscale image, and the real protein expression level corresponding to the fluorescence image;
    obtaining, according to the real protein expression level, the expression level label corresponding to each of the plurality of sample cell texture features;
    training a plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels;
    selecting, according to the training results of the plurality of regression models, the optimal cell texture feature that contributes most to predicting the protein expression level from among the plurality of sample cell texture features, and determining, from among the plurality of regression models, the regression model with the smallest prediction error as the expression level prediction model.
  3. The method according to claim 2, characterized in that training the plurality of regression models of different types using the plurality of sample cell texture features and their corresponding expression level labels comprises:
    inputting the plurality of sample cell texture features into each of the plurality of regression models, and obtaining the current prediction error of each regression model from the predicted protein expression levels output by the plurality of regression models and the expression level labels of the corresponding sample cell texture features;
    for each regression model, adjusting the model parameters of the regression model according to the current prediction error, and inputting the sample cell texture features again for model training until a training end condition is satisfied, to obtain an optimized regression model of the regression model.
  4. The method according to claim 3, characterized in that inputting the plurality of sample cell texture features into each of the plurality of regression models comprises:
    determining a plurality of texture feature combinations from the plurality of sample cell texture features corresponding to the same sample cell grayscale image; each texture feature combination contains one or more sample cell texture features, and the plurality of texture feature combinations share the same expression level label;
    inputting the plurality of texture feature combinations into each of the plurality of regression models.
  5. The method according to claim 4, characterized in that selecting the optimal cell texture feature that contributes most to predicting the protein expression level from among the plurality of sample cell texture features comprises:
    inputting each texture feature combination into each of the plurality of optimized regression models of different types, and obtaining the prediction error corresponding to each texture feature combination from the predicted protein expression levels output by the plurality of optimized regression models;
    for each texture feature combination, determining, according to the prediction error, the degree to which the texture feature combination contributes to predicting the protein expression level;
    determining, according to the respective contribution degrees of the plurality of texture feature combinations, the optimal cell texture feature that contributes most to predicting the protein expression level.
  6. The method according to claim 3, characterized in that determining, from among the plurality of regression models, the regression model with the smallest prediction error as the expression level prediction model comprises:
    acquiring the optimal cell texture feature corresponding to each of a plurality of sample cell grayscale images, to obtain a plurality of optimal cell texture features;
    inputting the plurality of optimal cell texture features into each of the plurality of optimized regression models, and obtaining the prediction error of each optimized regression model for the optimal cell texture features from the predicted protein expression levels output by the plurality of optimized regression models and the expression level labels of the corresponding optimal cell texture features;
    determining, from among the plurality of optimized regression models, the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression level prediction model.
  7. The method according to any one of claims 1 to 6, characterized in that determining, according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition from among the plurality of cells to be tested comprises:
    sorting the plurality of predicted protein expression levels, and determining a preset number of top-ranked predicted protein expression levels among the sorted predicted protein expression levels as target expression levels;
    determining the grayscale images of cells to be tested that correspond to the target expression levels, and determining the cells to be tested that correspond to those grayscale images as the target cells.
  8. A cell screening apparatus based on an expression level prediction model, characterized in that the apparatus comprises:
    a target cell texture feature acquisition module, configured to acquire grayscale images of cells to be tested, each corresponding to one of a plurality of cells to be tested in a cell culture tank, and to acquire the target cell texture feature corresponding to each of the grayscale images; the target cell texture feature is the optimal cell texture feature determined in advance from among multiple kinds of cell texture features;
    a cell expression level prediction module, configured to input the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and to obtain, from the output of the expression level prediction model, the predicted protein expression level corresponding to each of the plurality of cells to be tested; the expression level prediction model is trained on a plurality of sample cell texture features having expression level labels, each expression level label represents the real protein expression level corresponding to a sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to a target cell texture feature;
    a target cell determination module, configured to determine, according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition from among the plurality of cells to be tested.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the cell screening method based on an expression level prediction model according to any one of claims 1 to 7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the cell screening method based on an expression level prediction model according to any one of claims 1 to 7.
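By way of non-limiting illustration, the training and selection procedure recited in claims 2 to 6 (train several regression model types on texture feature combinations, rank the combinations by prediction error, then keep the model with the smallest error on the best combination) might be sketched as below. Everything specific here is an assumption made for the example: the two model types (gradient-descent linear regression and 1-nearest-neighbour regression), the feature names `contrast` and `energy`, the synthetic labels, the use of a held-out validation split, and the use of the mean error across model types as an inverse measure of contribution.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Rows = List[List[float]]
Predictor = Callable[[List[float]], float]


def fit_linear(x: Rows, y: List[float], lr: float = 0.1, epochs: int = 3000) -> Predictor:
    """Linear regression; parameters are adjusted from the current prediction error."""
    w, b, n = [0.0] * len(x[0]), 0.0, len(x)
    for _ in range(epochs):
        errs = [sum(wi * xi for wi, xi in zip(w, row)) + b - t for row, t in zip(x, y)]
        w = [wi - lr * 2 / n * sum(e * row[j] for e, row in zip(errs, x))
             for j, wi in enumerate(w)]
        b -= lr * 2 / n * sum(errs)

    def predict(row: List[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, row)) + b
    return predict


def fit_nearest(x: Rows, y: List[float]) -> Predictor:
    """1-nearest-neighbour regression: return the label of the closest sample."""
    def predict(row: List[float]) -> float:
        dists = [sum((a - c) ** 2 for a, c in zip(row, s)) for s in x]
        return y[dists.index(min(dists))]
    return predict


def mse(model: Predictor, x: Rows, y: List[float]) -> float:
    return sum((model(r) - t) ** 2 for r, t in zip(x, y)) / len(y)


def select(features: Dict[str, List[float]], labels: List[float],
           trainers: Dict[str, Callable[[Rows, List[float]], Predictor]],
           n_train: int):
    """Return (best feature combination, best model type, all validation errors)."""
    names, combos = list(features), []
    errors: Dict[Tuple[Tuple[str, ...], str], float] = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            combos.append(combo)
            rows = [[features[f][i] for f in combo] for i in range(len(labels))]
            for name, trainer in trainers.items():
                model = trainer(rows[:n_train], labels[:n_train])
                errors[(combo, name)] = mse(model, rows[n_train:], labels[n_train:])

    def mean_error(combo: Tuple[str, ...]) -> float:
        # Lower mean error across model types is read as higher contribution.
        return sum(errors[(combo, m)] for m in trainers) / len(trainers)

    best_combo = min(combos, key=mean_error)
    best_model = min(trainers, key=lambda m: errors[(best_combo, m)])
    return best_combo, best_model, errors


# Synthetic data: the first six samples are used for training, the last three for validation.
features = {
    "contrast": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.15, 0.45, 0.95],
    "energy": [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.70, 0.45, 0.85],
}
labels = [2 * c + 1 for c in features["contrast"]]  # stand-in expression level labels
best_combo, best_model, errors = select(
    features, labels, {"linear": fit_linear, "nearest": fit_nearest}, n_train=6
)
print(best_combo, best_model)
```

Because the synthetic labels depend only on `contrast`, the sketch selects the single-feature combination and the linear model; with real fluorescence-derived labels the outcome would depend on the data.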
PCT/CN2021/114168 2020-08-26 2021-08-24 Cell screening method and apparatus based on expression level prediction model WO2022042509A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010870681.2A CN112017730B (en) 2020-08-26 2020-08-26 Cell screening method and device based on expression quantity prediction model
CN202010870681.2 2020-08-26

Publications (1)

Publication Number Publication Date
WO2022042509A1 (en) 2022-03-03

Family

ID=73502282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114168 WO2022042509A1 (en) 2020-08-26 2021-08-24 Cell screening method and apparatus based on expression level prediction model

Country Status (2)

Country Link
CN (1) CN112017730B (en)
WO (1) WO2022042509A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153240A (en) * 2023-08-18 2023-12-01 国家超级计算天津中心 Oxygen free radical based relationship determination method, device, equipment and medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017730B (en) * 2020-08-26 2022-08-09 深圳太力生物技术有限责任公司 Cell screening method and device based on expression quantity prediction model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104404082A (en) * 2014-11-19 2015-03-11 上海美百瑞生物医药技术有限公司 Efficient screening method of exogenous protein expression cell strain
CN104850860A (en) * 2015-05-25 2015-08-19 广西师范大学 Cell image recognition method and cell image recognition device
US20190012521A1 (en) * 2015-08-12 2019-01-10 Molecular Devices, Llc System and Method for Automatically Analyzing Phenotypical Responses of Cells
CN109740560A (en) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 Human cellular protein automatic identifying method and system based on convolutional neural networks
CN109815870A (en) * 2019-01-17 2019-05-28 华中科技大学 The high-throughput functional gene screening technique and system of cell phenotype image quantitative analysis
CN112001329A (en) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 Method and device for predicting protein expression amount, computer device and storage medium
CN112017730A (en) * 2020-08-26 2020-12-01 东莞太力生物工程有限公司 Cell screening method and device based on expression quantity prediction model
CN112037862A (en) * 2020-08-26 2020-12-04 东莞太力生物工程有限公司 Cell screening method and device based on convolutional neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897984A (en) * 2018-05-07 2018-11-27 上海理工大学 Based on correlation analysis between CT images group feature and lung cancer gene expression
CN109948429A (en) * 2019-01-28 2019-06-28 上海依智医疗技术有限公司 Image analysis method, device, electronic equipment and computer-readable medium
CN110119710A (en) * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 Cell sorting method, device, computer equipment and storage medium
CN110838126B (en) * 2019-10-30 2020-11-17 东莞太力生物工程有限公司 Cell image segmentation method, cell image segmentation device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112017730A (en) 2020-12-01
CN112017730B (en) 2022-08-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860344

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/07/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21860344

Country of ref document: EP

Kind code of ref document: A1