WO2022042509A1 - Cell screening method and apparatus based on an expression level prediction model - Google Patents

Cell screening method and apparatus based on an expression level prediction model

Info

Publication number
WO2022042509A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
cells
protein expression
texture features
expression
Prior art date
Application number
PCT/CN2021/114168
Other languages
English (en)
Chinese (zh)
Inventor
陈亮
哈斯木买买提依明
韩晓健
梁楚亨
梁国龙
Original Assignee
深圳太力生物技术有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳太力生物技术有限责任公司 filed Critical 深圳太力生物技术有限责任公司
Publication of WO2022042509A1 publication Critical patent/WO2022042509A1/fr

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G06T7/49 Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G06T2207/10061 Microscopic image from scanning electron microscope
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10064 Fluorescence image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro

Definitions

  • the present application relates to the field of biotechnology, in particular to a cell screening method, device, computer equipment and storage medium based on an expression level prediction model.
  • in a conventional workflow, the cells in the cell pool are first transfected and the cell pool is processed by limiting dilution to obtain single cells; each single cell is then cultured into a homogeneous cell population, i.e. a cell line, and the cell lines with high target protein expression are screened out.
  • a cell screening method based on an expression level prediction model comprising:
  • obtaining the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture pool, and obtaining the target cell texture features corresponding to the grayscale images of the cells to be tested, the target cell texture features being the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining, according to the output of the expression level prediction model, the predicted protein expression levels corresponding to the plurality of cells to be tested; the expression level prediction model is trained on the texture features of multiple sample cells carrying expression level labels, the expression level labels represent the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features; and
  • determining, from the plurality of cells to be tested and according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition.
  • according to the training results of the multiple regression models, the optimal cell texture features with the highest contribution to the predicted protein expression level are selected from the multiple sample cell texture features, and the regression model with the smallest prediction error is determined from the multiple regression models as the expression level prediction model.
  • using multiple sample cell texture features and their corresponding expression level labels to train multiple regression models of different types includes:
  • the model parameters of the regression model are adjusted, and then the texture features of the sample cells are re-input to perform model training until the training end condition is met, and an optimized regression model of the regression model is obtained.
  • the inputting the texture features of multiple sample cells into the multiple regression models includes:
  • determining multiple texture feature combinations according to multiple sample cell texture features corresponding to the same sample cell grayscale image, where each texture feature combination includes one or more sample cell texture features and the expression level labels corresponding to the multiple texture feature combinations are the same; and
  • a plurality of texture feature combinations are respectively input into the plurality of regression models.
  • the optimal cell texture feature with the highest contribution to the predicted protein expression is selected from the cell texture features of a plurality of samples, including:
  • determining, according to the respective contributions of the multiple texture feature combinations to the predicted protein expression level, the optimal cell texture features with the highest contribution to the predicted protein expression level.
  • the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model, including:
  • the multiple optimal cell texture features are respectively input into the multiple optimized regression models, the prediction error of each optimized regression model for the optimal cell texture features is obtained according to the predicted protein expression levels output by the multiple optimized regression models and the expression level labels corresponding to the optimal cell texture features, and the optimized regression model with the smallest prediction error for the optimal cell texture features is determined from the multiple optimized regression models as the expression level prediction model.
  • the target cells whose predicted protein expression meets the set condition are determined from the plurality of cells to be tested, including:
  • the grayscale image of the cell to be tested corresponding to the target expression level is determined, and the cell to be tested corresponding to the grayscale image of the cell to be tested is determined as the target cell.
  • a cell screening device based on an expression level prediction model comprising:
  • the target cell texture feature acquisition module is used to obtain the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture pool, and to obtain the target cell texture features corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • the cell expression level prediction module is used to input the target cell texture features of the multiple cells to be tested into the pre-trained expression level prediction model and to obtain, according to the output of the expression level prediction model, the predicted protein expression levels corresponding to the multiple cells to be tested; the expression level prediction model is trained on the texture features of multiple sample cells carrying expression level labels, the expression level labels represent the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features;
  • the target cell determination module is configured to determine, according to the predicted protein expression amount, target cells whose predicted protein expression amount satisfies a set condition from the plurality of cells to be tested.
  • a computer device includes a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the processor implements the steps of the above-described cell screening method based on an expression level prediction model.
  • In the above cell screening method, device, computer equipment and storage medium based on an expression level prediction model, the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture pool are acquired, the target cell texture features corresponding to each grayscale image are obtained, the target cell texture features of the multiple cells to be tested are input into the pre-trained expression level prediction model, the predicted protein expression levels corresponding to the multiple cells to be tested are obtained according to the output of the model, and the target cells whose predicted protein expression level meets the set condition are determined from the multiple cells to be tested. This realizes rapid identification of cells with high protein expression without repeated culture and screening, which greatly shortens the screening period; the approach can quickly process millions of single cells, widens the range of cell screening, reduces the workload of staff and effectively improves the efficiency of cell screening.
  • FIG. 1 is a schematic flowchart of a cell screening method based on an expression level prediction model in one embodiment
  • FIG. 2 is a schematic flowchart of steps of generating an expression level prediction model in one embodiment
  • FIG. 3a is a grayscale image of a sample cell in one embodiment
  • FIG. 3b is a fluorescence image of a cell in one embodiment
  • FIG. 4 is a schematic flowchart of steps of a regression model optimization method in one embodiment
  • FIG. 5 is a schematic flowchart of steps of determining optimal cell texture features in one embodiment
  • FIG. 6 is a schematic flowchart of steps of screening and optimizing regression models in one embodiment
  • FIG. 7 is a schematic flow chart of cell screening in one embodiment
  • FIG. 8 is a structural block diagram of a cell screening device based on an expression level prediction model in one embodiment
  • Figure 9 is a diagram of the internal structure of a computer device in one embodiment.
  • in a conventional workflow, the cells in the cell pool are first transfected and the cell pool is processed by limiting dilution to obtain single cells; each single cell is then cultured into a homogeneous cell population, i.e. a cell line, and the cell lines with high target protein expression are screened out.
  • a cell screening method based on an expression level prediction model is provided.
  • the method is applied to a terminal for illustration; it is understood that the method can also be applied to a server, or to a system including a terminal and a server, in which case the method is implemented through the interaction between the terminal and the server.
  • the method includes the following steps:
  • Step 101 Obtain the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture pool, and obtain the target cell texture features corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features.
  • The cells to be tested may be cells that have been treated by transfection; they may be cells that failed to obtain the exogenous DNA fragment after transfection, cells that obtained the exogenous DNA fragment but have not integrated it into a chromosome, or cells in which the exogenous DNA fragment has been integrated into a chromosome. The grayscale image of the cell to be tested is the grayscale image obtained by photographing the cell to be tested, and the target cell texture features are image features that reflect the information in the grayscale image of the cell to be tested.
  • multiple cells in the cell culture pool can be transfected, so that some or all of the cells in the cell culture pool can obtain exogenous DNA fragments.
  • the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture pool can be obtained through a microscopic photographing device, and the target cell texture features corresponding to each grayscale image of the cells to be tested can be determined.
  • the target cell texture feature is the optimal cell texture feature determined in advance from a variety of cell texture features.
  • Step 102 Input the target cell texture features of a plurality of cells to be tested into a pre-trained expression prediction model, and obtain the predicted protein expression corresponding to the plurality of cells to be tested according to the output of the expression prediction model;
  • the expression level prediction model is obtained by training based on the texture features of a plurality of sample cells with expression level labels, and the expression level labels are used to represent the real protein expression levels corresponding to the texture features of each sample cell, and the expression level prediction model is used to predict The protein expression corresponding to the texture features of the target cells.
  • The predicted protein expression level is the protein expression level predicted by the expression level prediction model from the target cell texture features. The expression level prediction model is obtained by training on the texture features of multiple sample cells carrying expression level labels, and can be used to predict the protein expression level corresponding to the target cell texture features. The expression level label characterizes the real protein expression level corresponding to the sample cell texture features, i.e. the real protein expression level of the cell in the corresponding sample cell grayscale image.
  • In a specific implementation, the target cell texture features can be input into the pre-trained expression level prediction model, and the predicted protein expression levels corresponding to the multiple cells to be tested can be obtained according to the output of the expression level prediction model.
  • the protein expression level of the cells can be determined by the fluorescence image of the cells. However, taking the fluorescence image of the cells will cause the cells to lose activity and the cells cannot continue to proliferate.
  • the predicted protein expression level is obtained through the cell texture feature of the grayscale image of the cell to be tested, which can avoid cell inactivation while estimating the protein expression level.
  • Step 103 according to the predicted protein expression, determine, from the plurality of cells to be tested, target cells whose predicted protein expression meets the set condition.
  • the cells to be tested whose predicted protein expression level meets the set conditions can be determined as target cells.
  • In this embodiment, the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture pool are obtained, the target cell texture features corresponding to each grayscale image are obtained, the target cell texture features of the multiple cells to be tested are input into the pre-trained expression level prediction model, and the predicted protein expression levels corresponding to the multiple cells to be tested are obtained according to the output of the expression level prediction model.
  • the cell screening method based on the expression level prediction model may further include the following steps:
  • Step 201 Obtain a grayscale image of the sample cell and its corresponding fluorescence image.
  • the grayscale image of the sample cell and its corresponding fluorescence image are the grayscale image and the fluorescence image obtained by photographing the same cell under the same shooting conditions.
  • cells serving as a training set can be selected, and these cells are photographed by a microscopic photographing device to obtain the sample cell grayscale images and the corresponding fluorescence images.
  • the cells used as the training set are the cells in the cell culture pool that have been processed by transfection technology.
  • the cells can be cells that have not been able to obtain exogenous DNA fragments after the transfection technology treatment, or can be cells that have obtained foreign DNA fragments.
  • Specifically, the grayscale image and the fluorescence image can be captured simultaneously with a microscope under the same shooting conditions; the captured grayscale image and fluorescence image can each contain one or more cells, and the coordinates of each cell in the grayscale image correspond to the coordinates of that cell in the fluorescence image. Since the same grayscale image and fluorescence image can contain multiple cells, image preprocessing can be performed on them after acquisition to obtain the sample cell grayscale image and fluorescence image corresponding to a single cell, as shown in FIG. 3a and FIG. 3b; the image preprocessing can include cell segmentation processing and adhesion (touching) cell filtering processing, as sketched below.
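  • As an illustration of this preprocessing stage, the following is a minimal sketch (not the implementation fixed by this application) of how single-cell crops could be obtained from a grayscale field image with scikit-image; the Otsu thresholding, the assumption that cells appear brighter than the background, and the area/solidity limits used to discard adhered (touching) cells are all illustrative assumptions.

        import numpy as np
        from skimage import io, filters, measure

        def crop_single_cells(gray_path, min_area=200, max_area=5000, min_solidity=0.9):
            """Segment a grayscale field image and keep well-separated single cells."""
            gray = io.imread(gray_path, as_gray=True)
            # Otsu threshold; assumes cells are brighter than the background (invert if not).
            mask = gray > filters.threshold_otsu(gray)
            labels = measure.label(mask)                      # connected components, one per cell blob
            crops = []
            for region in measure.regionprops(labels, intensity_image=gray):
                # Discard debris (too small), clusters (too large) and adhered cells
                # (low solidity, i.e. a strongly non-convex outline).
                if region.area < min_area or region.area > max_area:
                    continue
                if region.solidity < min_solidity:
                    continue
                minr, minc, maxr, maxc = region.bbox
                crops.append(gray[minr:maxr, minc:maxc])      # one grayscale crop per single cell
            return crops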
  • Step 202 Obtain a plurality of sample cell texture features of the sample cell grayscale image and the real protein expression corresponding to the fluorescence image.
  • multiple texture features of the sample cells can be extracted from the grayscale image of the sample cells, and the real protein expression corresponding to the fluorescence image can be obtained.
  • For cells with different protein expression levels, the corresponding fluorescence images and grayscale images differ; that is, the protein expression level of a cell can be related to the sample cell texture features that appear in the fluorescence image and the grayscale image.
  • the texture feature extraction algorithm can be used to extract multiple sample cell texture features from the sample cell grayscale image, and the sample cell texture features corresponding to the multiple sample cell grayscale images can form a texture feature matrix.
  • the texture features of m sample cells are extracted from n grayscale images of sample cells, which can form an n*m texture feature matrix.
  • The sample cell texture features may include any one or more of the following: gray level co-occurrence matrix (GLCM) features, histogram features, Laws energy texture features, local binary pattern (LBP) features, and discrete wavelet transform (DWT) features; of course, those skilled in the art can select other texture features as needed, which is not limited in this application. An illustrative extraction sketch follows.
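  • For instance, GLCM, histogram and LBP features could be extracted from a single-cell grayscale crop roughly as follows. This is a sketch using scikit-image (the graycomatrix/graycoprops names assume scikit-image 0.19 or later); the particular properties, bin counts and LBP parameters are illustrative assumptions rather than the feature set fixed by the application.

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

        def texture_features(cell_gray):
            """Return one texture feature vector for a single-cell grayscale crop."""
            img = cell_gray if cell_gray.dtype == np.uint8 else (cell_gray * 255).astype(np.uint8)

            # GLCM features: contrast, homogeneity, energy, correlation, averaged over offsets.
            glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2], levels=256,
                                symmetric=True, normed=True)
            glcm_feats = [graycoprops(glcm, p).mean()
                          for p in ("contrast", "homogeneity", "energy", "correlation")]

            # Histogram feature: normalized 32-bin intensity histogram.
            hist, _ = np.histogram(img, bins=32, range=(0, 255), density=True)

            # LBP feature: histogram of uniform local binary patterns (radius 1, 8 neighbours).
            lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
            lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

            return np.concatenate([glcm_feats, hist, lbp_hist])

        # Stacking the vectors of n cell crops gives the n*m texture feature matrix described above:
        # X = np.vstack([texture_features(c) for c in crops])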
  • the real protein expression level of the cells in the corresponding grayscale image of the sample cells can be determined according to the fluorescence image.
  • proteins produced by genes of interest such as exogenous DNA fragments, can fluoresce at specific wavelengths.
  • The G value corresponding to the green channel of the fluorescence image (also called the fluorescence value) can be obtained; the G value can be the total G value or the average G value of the fluorescence image. There is a positive correlation between the G value and the protein expression level, that is, the higher the G value, the higher the protein expression level of the corresponding cell. Based on this, the real protein expression level corresponding to the fluorescence image can be determined from its G value, for example as in the sketch below.
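  • A minimal sketch of how the label could be derived from the green channel of the fluorescence image is shown below; whether the total or the mean G value is used, and whether any background correction is applied, are choices the text leaves open and are treated here as assumptions.

        import numpy as np
        from skimage import io

        def fluorescence_label(fluor_path, use_mean=True):
            """Derive the real protein expression label from the green channel (G value)."""
            rgb = io.imread(fluor_path)              # expected shape (H, W, 3)
            green = rgb[..., 1].astype(np.float64)   # G channel is positively correlated with expression
            return green.mean() if use_mean else green.sum()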
  • Step 203 according to the real protein expression, obtain expression labels corresponding to the texture features of a plurality of sample cells respectively.
  • the expression labels corresponding to the texture features of multiple sample cells can be determined according to the actual protein expression.
  • the real protein expression level may be used as the expression level label of the grayscale image of the corresponding sample cell, and the expression level label of the grayscale image of the sample cell is the expression level label corresponding to the texture feature of the sample cell.
  • The multiple sample cell texture features extracted from the same sample cell grayscale image have the same expression level label; for sample cell texture features extracted from different sample cell grayscale images, each expression level label corresponds to the real protein expression level of the cell in that sample cell grayscale image.
  • Step 204 using multiple sample cell texture features and their corresponding expression labels to train multiple regression models of different types.
  • Multiple types of regression models can be preset, such as a support vector regression (SVR) model, an ElasticNet (elastic network) model, an XGBoost model, a gradient boosting regression model, and a logistic regression model. Since different regression models have different implementation forms, that is, different underlying mathematical principles, using different regression models to analyze the extracted sample cell texture features and predict the protein expression level can produce different prediction results; an illustrative training sketch is given below.
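  • The sketch below shows one way the model types named above could be instantiated and fitted on the texture feature matrix using scikit-learn and xgboost; the hyperparameters are defaults or assumptions, and the logistic regression variant is omitted because scikit-learn's LogisticRegression is a classifier and a regression counterpart would need to be defined separately.

        from sklearn.svm import SVR
        from sklearn.linear_model import ElasticNet
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.metrics import mean_squared_error
        from xgboost import XGBRegressor   # assumes the xgboost package is installed

        def train_candidate_models(X_train, y_train, X_val, y_val):
            """Fit several regression model types and report each one's validation error."""
            models = {
                "svr": SVR(kernel="rbf"),
                "elastic_net": ElasticNet(alpha=0.1),
                "xgboost": XGBRegressor(n_estimators=200, max_depth=4),
                "gbr": GradientBoostingRegressor(n_estimators=200),
            }
            errors = {}
            for name, model in models.items():
                model.fit(X_train, y_train)                     # train on sample cell texture features
                pred = model.predict(X_val)                     # predicted protein expression levels
                errors[name] = mean_squared_error(y_val, pred)  # current prediction error
            return models, errors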
  • Step 205 According to the training results of the multiple regression models, select from the multiple sample cell texture features the optimal cell texture features with the highest contribution to the predicted protein expression level, and determine from the multiple regression models the regression model with the smallest prediction error as the expression level prediction model.
  • In a specific implementation, the optimal cell texture features with the highest contribution to the predicted protein expression level are selected from the multiple sample cell texture features, and based on these optimal cell texture features, the regression model with the smallest prediction error is determined from the multiple regression models as the expression level prediction model.
  • the optimal cell texture features may include one or more sample cell texture features.
  • In this embodiment, multiple sample cell texture features and their corresponding expression level labels are used to train multiple regression models of different types; according to the training results of the multiple regression models, the optimal sample cell texture features that contribute most to the predicted protein expression level are screened out from the multiple sample cell texture features, and the regression model with the smallest prediction error is determined from the multiple regression models as the expression level prediction model. In this way the cell image can be examined from multiple dimensions based on the multiple sample cell texture features, and the relationship between the cell grayscale image and the protein expression level of the cell is established, providing the basis for rapid prediction of protein expression.
  • using the multiple sample cell texture features and their corresponding expression labels to train multiple regression models of different types may include the following steps:
  • Step 401 Input the texture features of multiple sample cells into the multiple regression models respectively, and obtain the current prediction error of each regression model according to the predicted protein expression levels output by the multiple regression models and the expression level labels corresponding to the sample cell texture features.
  • In a specific implementation, the texture features of multiple sample cells can be input into the multiple regression models respectively, and the current prediction error of each regression model can be determined from the predicted protein expression levels output by the multiple regression models and the corresponding expression level labels.
  • Step 402 For each regression model, adjust the model parameters of the regression model according to the current prediction error, and then re-input the sample cell texture features for model training until the training end condition is met, obtaining an optimized regression model of that regression model.
  • the model parameters of the regression model can be adjusted and optimized according to the current prediction error, and the sample cell texture features are input into the regression model again for model training.
  • The model can then obtain a new current prediction error based on the re-input cell texture features, and judge whether the current prediction error meets the training end condition. If it does, the current regression model can be determined as the optimized regression model; if it does not, the above steps can be repeated to continue optimizing the model parameters of the regression model.
  • In this embodiment, the model parameters of the regression model are continuously adjusted until an optimized regression model is obtained; through supervised machine learning the regression model can be continuously optimized to improve its prediction accuracy. A sketch of such an error-driven tuning loop is given below.
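  • One possible reading of this error-driven tuning loop is sketched below; the candidate parameter grid, the mean-squared-error metric and the stopping threshold are assumptions rather than values given in the application.

        from itertools import product
        from sklearn.base import clone
        from sklearn.metrics import mean_squared_error

        def optimize_model(model, param_grid, X_train, y_train, X_val, y_val, target_error=None):
            """Re-fit a regression model with adjusted parameters until the end condition is met."""
            best_model, best_err = None, float("inf")
            keys = list(param_grid)
            for values in product(*(param_grid[k] for k in keys)):
                candidate = clone(model).set_params(**dict(zip(keys, values)))
                candidate.fit(X_train, y_train)                            # re-input the texture features
                err = mean_squared_error(y_val, candidate.predict(X_val))  # current prediction error
                if err < best_err:
                    best_model, best_err = candidate, err
                if target_error is not None and err <= target_error:       # training end condition
                    break
            return best_model, best_err

        # e.g. optimize_model(SVR(), {"C": [0.1, 1, 10], "epsilon": [0.01, 0.1]}, X_tr, y_tr, X_va, y_va)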
  • the inputting the multiple sample cell texture features into the multiple regression models may include the following steps:
  • determining multiple texture feature combinations according to multiple sample cell texture features corresponding to the same sample cell grayscale image; each texture feature combination includes one or more sample cell texture features, and the expression level labels corresponding to the multiple texture feature combinations are the same; and inputting the multiple texture feature combinations into the multiple regression models respectively.
  • In a specific implementation, the texture features of multiple sample cells can be combined to obtain multiple texture feature combinations, each of which includes one or more sample cell texture features. Since the sample cell texture features in the multiple texture feature combinations all come from the same sample cell grayscale image, the multiple texture feature combinations have the same expression level label. After the multiple texture feature combinations are obtained, they can be input into the multiple regression models respectively.
  • Specifically, methods such as correlation analysis can be used to analyze the multiple sample cell texture features to obtain a correlation score for each sample cell texture feature; the correlation score is positively correlated with the importance of the corresponding sample cell texture feature in predicting the protein expression level, that is, the higher the correlation score of a sample cell texture feature, the more important its role in predicting the protein expression level.
  • the cell texture features of multiple samples can be sorted according to the correlation score from high to low.
  • the sample cell texture features can be selected in turn to generate a texture feature combination.
  • For example, the sample cell texture feature A with the highest score can be selected as one cell texture feature combination; then the sample cell texture feature B with the second-highest score can be combined with the original sample cell texture feature A to generate another cell texture feature combination; next, the sample cell texture feature C with the next-highest score can be combined with the original sample cell texture features A and B to generate a further cell texture feature combination, and so on.
  • In this embodiment, the texture features of multiple sample cells are combined to determine multiple texture feature combinations, and the multiple texture feature combinations are respectively input into the multiple regression models, so that the influence of different sample cell features on the prediction effect of the regression models can be comprehensively evaluated and the prediction accuracy of the regression models improved. A sketch of this ranking-and-combination step follows.
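  • One plausible reading of this ranking-and-combination scheme is sketched below: the features are ranked by the absolute Pearson correlation between each feature column and the expression level label, and combinations are grown by adding one feature at a time in rank order. The use of Pearson correlation specifically is an assumption; the application only requires some correlation-style score.

        import numpy as np

        def ranked_feature_combinations(X, y):
            """Rank feature columns by |correlation| with the label and grow nested combinations."""
            n_features = X.shape[1]
            scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
            order = np.argsort(scores)[::-1]                           # highest correlation score first
            combos = [order[:k] for k in range(1, n_features + 1)]     # {A}, {A, B}, {A, B, C}, ...
            return combos                                              # each entry is an array of column indices

        # X[:, combo] gives the feature sub-matrix for one texture feature combination;
        # every combination inherits the same expression level labels y.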
  • the optimal cell texture feature with the highest contribution to the predicted protein expression is selected from the cell texture features of multiple samples, including:
  • Step 501 Input each texture feature combination into multiple optimized regression models of different types, and obtain the prediction error size corresponding to each texture feature combination according to the predicted protein expression output from the multiple optimized regression models.
  • In a specific implementation, each texture feature combination can be input into multiple optimized regression models of different types to obtain the predicted protein expression levels output by the multiple optimized regression models; the expression level label corresponding to each texture feature combination can then be obtained, and the prediction error size corresponding to each texture feature combination can be determined from the output predicted protein expression levels and the expression level labels. By obtaining the prediction error size, the prediction effect of the optimized regression models can be verified.
  • Step 502 For each texture feature combination, determine the contribution degree of the texture feature combination to the predicted protein expression according to the prediction error.
  • Step 503 Determine the optimal cell texture feature with the highest contribution to the predicted protein expression according to the respective contribution degrees of the multiple texture feature combinations to the predicted protein expression.
  • In a specific implementation, the contribution of a texture feature combination to the predicted protein expression level can be determined according to the size of its prediction error; by comparing the contributions of the texture feature combinations to the predicted protein expression level, the texture feature combination with the highest contribution, and hence the optimal cell texture features, can be determined.
  • the number of optimized regression models whose prediction error is smaller than a preset threshold can be determined, and the number is positively correlated with the contribution degree of the texture feature combination.
  • In other words, for each type of optimized regression model it can be checked whether the prediction error size is smaller than the preset threshold; a texture feature combination that keeps the prediction error small across model types has a high degree of contribution to the accurate prediction of protein expression.
  • The texture feature combination that simultaneously produces the best prediction results for each type of optimized regression model can be determined, and the sample cell texture features in that texture feature combination are the optimal cell texture features.
  • In a specific implementation, the prediction error sizes corresponding to the multiple texture feature combinations can be sorted from small to large, and the top preset number of (smallest) prediction error sizes can be determined as the best prediction results. A texture feature combination that simultaneously makes each type of optimized regression model produce the best prediction result can then be determined, and the sample cell texture features in that combination are the optimal sample cell texture features.
  • In this embodiment, the optimal sample cell texture features with the highest contribution to the predicted protein expression level can be determined and used in subsequent prediction of the protein expression level; in this way the sample cell texture features can be extracted in a more targeted manner, avoiding excessive extraction of other, lower-contribution cell texture features that could interfere with the prediction results or waste computing resources. A scoring sketch based on this counting rule is given below.
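  • As an illustration of the counting rule above, each combination could be scored by the number of optimized models whose error stays below a preset threshold, as in the sketch below; the error metric, the threshold and the train/validation split are assumptions.

        import numpy as np
        from sklearn.metrics import mean_squared_error

        def combination_contributions(combos, optimized_models,
                                      X_train, y_train, X_val, y_val, error_threshold):
            """Score each texture feature combination by how many optimized models it satisfies."""
            contributions = []
            for combo in combos:
                count = 0
                for model in optimized_models.values():
                    model.fit(X_train[:, combo], y_train)       # re-fit on this feature subset
                    err = mean_squared_error(y_val, model.predict(X_val[:, combo]))
                    if err < error_threshold:                   # this model is "satisfied" by the combo
                        count += 1
                contributions.append(count)
            best = combos[int(np.argmax(contributions))]        # optimal cell texture features
            return best, contributions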
  • the regression model with the smallest prediction error is determined from the multiple regression models as the expression prediction model, which may include the following steps:
  • Step 601 Obtain the optimal cell texture features corresponding to each of the multiple sample cell grayscale images, and obtain a plurality of optimal cell texture features.
  • the optimal cell texture features corresponding to each of the grayscale images of multiple sample cells can be obtained, and multiple optimal cell texture features can be obtained.
  • Step 602 Input the multiple optimal cell texture features into the multiple optimized regression models respectively, and obtain the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression levels output by the multiple optimized regression models and the expression level labels corresponding to the optimal cell texture features.
  • In a specific implementation, the expression level labels corresponding to the multiple optimal cell texture features can be determined, the multiple optimal cell texture features can be input into the multiple optimized regression models respectively to obtain the predicted protein expression levels output by the models, and the output predicted protein expression levels together with the corresponding expression level labels can then be used to determine the prediction error of each optimized regression model when predicting with the optimal sample cell texture features.
  • Step 603 From the multiple optimized regression models, determine the optimized regression model with the smallest prediction error for the optimal cell texture features as the expression level prediction model.
  • the optimal regression model with the smallest prediction error can be determined as the expression amount prediction model.
  • Specifically, the multiple optimal cell texture features can be divided into ten equal parts and ten experiments performed, in each of which nine parts are used to train and validate the regression model and one part is used to test the trained regression model. The average prediction error over the ten experiments can then be obtained, the expression level prediction model can be determined from the multiple types of regression models according to the average prediction error, and the performance of the expression level prediction model can be evaluated.
  • In this embodiment, the multiple optimal cell texture features are respectively input into the multiple optimized regression models, and the expression level prediction model is determined from the multiple optimized regression models according to the predicted protein expression levels they output and the expression level labels corresponding to the optimal cell texture features; in this way the regression model with accurate prediction performance can be determined from multiple model types based on the optimal texture features that contribute most to the protein expression level. A cross-validation sketch is given below.
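  • The ten-part split and averaging described above corresponds closely to 10-fold cross-validation; a sketch with scikit-learn follows, where the mean-squared-error scoring metric is an assumption.

        from sklearn.model_selection import cross_val_score

        def select_expression_model(optimized_models, X_opt, y):
            """Pick the optimized model with the smallest mean 10-fold prediction error."""
            mean_errors = {}
            for name, model in optimized_models.items():
                # neg_mean_squared_error: higher is better, so negate it to get an error.
                scores = cross_val_score(model, X_opt, y, cv=10, scoring="neg_mean_squared_error")
                mean_errors[name] = -scores.mean()
            best_name = min(mean_errors, key=mean_errors.get)   # smallest average prediction error
            return best_name, optimized_models[best_name], mean_errors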
  • determining the target cells whose predicted protein expression meets the set condition from the plurality of cells to be tested may include the following steps:
  • sorting the multiple predicted protein expression levels, determining, from the sorted predicted protein expression levels, the top preset number of predicted protein expression levels as target expression levels, determining the grayscale images of the cells to be tested corresponding to the target expression levels, and determining the cells to be tested corresponding to those grayscale images as the target cells.
  • In a specific implementation, the multiple predicted protein expression levels can be sorted, and from the sorted predicted protein expression levels, the top preset number of predicted protein expression levels can be determined as the target expression levels.
  • For example, the predicted protein expression levels may be sorted in descending order, that is, from large to small, and after sorting, the predicted protein expression levels ranked in the top N may be determined as the target expression levels.
  • the predicted protein expression that exceeds the preset expression threshold can also be determined as the target expression.
  • the grayscale image of the cell to be tested corresponding to the target expression level can be determined, and the cell to be tested corresponding to the grayscale image of the cell to be tested is determined as the target cell.
  • the target cells can be used to culture cell lines.
  • In this embodiment, the multiple predicted protein expression levels are sorted, and according to the sorted predicted protein expression levels, the cells to be tested whose predicted protein expression levels rank in the top preset number are determined as target cells from the multiple cells to be tested; in this way cells with high protein expression can be quickly screened, which greatly reduces the screening workload. A top-N selection sketch is given below.
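  • A minimal sketch of the descending sort and top-N selection follows; the value of N and the optional absolute expression threshold are assumptions.

        import numpy as np

        def select_target_cells(predicted_expression, top_n=10, min_expression=None):
            """Return indices of cells whose predicted expression level ranks in the top N."""
            order = np.argsort(predicted_expression)[::-1]      # descending sort
            targets = list(order[:top_n])
            if min_expression is not None:                      # optional preset expression threshold
                targets = [i for i in targets if predicted_expression[i] >= min_expression]
            return targets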
  • In one example, an aflibercept expression plasmid can be used to transfect multiple CHO-K1 host cells to obtain multiple transfected CHO-K1 host cells, that is, the cells to be tested in this application. After the plurality of cells to be tested is obtained, as shown in FIG. 7, a grayscale image of each of the cells to be tested can be obtained by microphotography.
  • Texture feature analysis of the grayscale image of the cell to be tested can then be performed to obtain the cell texture features, and the cell texture features can be input into the expression level prediction model to predict the protein expression level of the cell and obtain the predicted protein expression level output by the model.
  • After a grayscale image of a cell to be tested has been processed, it can be determined whether all grayscale images of the cells to be tested have been processed. If not, the method returns to the step of performing texture feature analysis on the grayscale image of the cell to be tested to obtain the cell texture features; if so, the multiple cells to be tested can be sorted according to their predicted protein expression levels, the cells to be tested with high protein expression levels can be determined from the sorting results, and a screening report can be generated and submitted. An end-to-end sketch of this screening loop follows.
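  • Put together, the screening loop of FIG. 7 could look roughly like the sketch below, reusing the helper functions sketched earlier (crop_single_cells, texture_features, select_target_cells); all of these names are illustrative and not part of the application.

        import numpy as np

        def screen_cells(gray_image_paths, expression_model, top_n=10):
            """Predict expression for every cell grayscale image and keep the top-N cells."""
            kept_paths, feats = [], []
            for path in gray_image_paths:                # one grayscale image per cell to be tested
                crops = crop_single_cells(path)
                if crops:                                # skip images where no single cell was found
                    kept_paths.append(path)
                    feats.append(texture_features(crops[0]))
            X = np.vstack(feats)
            predicted = expression_model.predict(X)      # predicted protein expression levels
            top = select_target_cells(predicted, top_n=top_n)
            # Screening report: image path and predicted expression for each selected cell.
            return [(kept_paths[i], float(predicted[i])) for i in top]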
  • a cell screening device based on an expression level prediction model comprising:
  • the target cell texture feature acquisition module 801 is used to acquire the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture pool, and to acquire the target cell texture features corresponding to the grayscale images of the cells to be tested; the target cell texture features are the optimal cell texture features determined in advance from a variety of cell texture features;
  • the cell expression level prediction module 802 is used to input the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model and to obtain, according to the output of the expression level prediction model, the predicted protein expression levels corresponding to the plurality of cells to be tested; the expression level prediction model is trained on the texture features of multiple sample cells carrying expression level labels, the expression level labels represent the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features;
  • the target cell determination module 803 is configured to, according to the predicted protein expression, determine, from the plurality of cells to be tested, target cells whose predicted protein expression meets a set condition.
  • it also includes:
  • the image acquisition module is used to acquire the grayscale image of the sample cell and its corresponding fluorescence image
  • a sample cell texture feature acquisition module used for acquiring multiple sample cell texture features of the sample cell grayscale image and the real protein expression corresponding to the fluorescence image
  • an expression label determination module configured to obtain expression labels corresponding to texture features of a plurality of sample cells according to the real protein expression
  • the training module is used to train multiple regression models of different types by using multiple sample cell texture features and their corresponding expression labels;
  • the expression level prediction model determination module is used to select, according to the training results of the multiple regression models, the optimal cell texture features with the highest contribution to the predicted protein expression level from the multiple sample cell texture features, and to determine, from the multiple regression models, the regression model with the smallest prediction error as the expression level prediction model.
  • the training module includes:
  • the sample cell texture feature input sub-module is used to input the multiple sample cell texture features into the multiple regression models respectively, and to obtain the current prediction error of each regression model according to the predicted protein expression levels output by the multiple regression models and the expression level labels corresponding to the sample cell texture features;
  • the regression model optimization sub-module is used to adjust, for each regression model, the model parameters of the regression model according to the current prediction error and then re-input the sample cell texture features for model training until the training end condition is met, obtaining an optimized regression model of the regression model.
  • the sample cell texture feature input sub-module includes:
  • the texture feature combination determination unit is used to determine multiple texture feature combinations according to multiple sample cell texture features corresponding to the same sample cell grayscale image; each texture feature combination includes one or more sample cell texture features, and the The expression labels corresponding to multiple texture feature combinations are the same;
  • the texture feature combination input unit is used for inputting multiple texture feature combinations into the multiple regression models respectively.
  • the expression level prediction model determination module includes:
  • the prediction error size determination sub-module is used to input each texture feature combination into multiple optimized regression models of different types, and obtain the prediction error size corresponding to each texture feature combination according to the predicted protein expression output by the multiple optimized regression models ;
  • the contribution degree determination sub-module is used to determine, for each texture feature combination and according to the prediction error size, the contribution degree of the texture feature combination to the predicted protein expression level;
  • the optimal cell texture feature determination sub-module is used to determine the optimal cell texture feature with the highest contribution to the predicted protein expression according to the respective contribution degrees of multiple texture feature combinations to the predicted protein expression.
  • the expression level prediction model determination module includes:
  • the optimal cell texture feature acquisition sub-module is used to obtain the respective optimal cell texture features corresponding to the grayscale images of multiple sample cells, and obtain multiple optimal cell texture features;
  • the prediction error determination sub-module is used to input the multiple optimal cell texture features into the multiple optimized regression models respectively, and to obtain the prediction error of each optimized regression model for the optimal cell texture features according to the predicted protein expression levels output by the multiple optimized regression models and the expression level labels corresponding to the optimal cell texture features;
  • the optimal regression model screening sub-module is used to determine the optimal regression model with the smallest prediction error for the optimal cell texture feature from a plurality of optimal regression models, as the expression amount prediction model.
  • the target cell determination module includes:
  • the target expression level determination submodule is used to sort the multiple predicted protein expression levels, and from the multiple predicted protein expression levels after sorting, determine the predicted protein expression level of the first preset number as the target expression level;
  • the cell to be tested screening sub-module is used to determine the grayscale image of the cell to be tested corresponding to the target expression level, and to determine the cell to be tested corresponding to the grayscale image of the cell to be tested as the target cell.
  • Each module in the above-mentioned cell screening device based on the expression level prediction model can be realized in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 9 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer program when executed by the processor, implements a cell screening method based on an expression level prediction model.
  • the display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment may be a touch layer covered on the display screen, or a button, a trackball or a touchpad set on the shell of the computer equipment , or an external keyboard, trackpad, or mouse.
  • FIG. 9 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, a computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
  • obtaining the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture pool, and obtaining the target cell texture features corresponding to the grayscale images of the cells to be tested, the target cell texture features being the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining, according to the output of the expression level prediction model, the predicted protein expression levels corresponding to the plurality of cells to be tested; the expression level prediction model is trained on the texture features of multiple sample cells carrying expression level labels, the expression level labels represent the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features; and
  • determining, from the plurality of cells to be tested and according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition.
  • When the processor executes the computer program, it also implements the steps in the other embodiments described above.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • obtaining the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture pool, and obtaining the target cell texture features corresponding to the grayscale images of the cells to be tested, the target cell texture features being the optimal cell texture features determined in advance from a variety of cell texture features;
  • inputting the target cell texture features of the plurality of cells to be tested into a pre-trained expression level prediction model, and obtaining, according to the output of the expression level prediction model, the predicted protein expression levels corresponding to the plurality of cells to be tested; the expression level prediction model is trained on the texture features of multiple sample cells carrying expression level labels, the expression level labels represent the real protein expression level corresponding to each sample cell texture feature, and the expression level prediction model is used to predict the protein expression level corresponding to the target cell texture features; and
  • determining, from the plurality of cells to be tested and according to the predicted protein expression levels, target cells whose predicted protein expression level satisfies a set condition.
  • When the computer program is executed by the processor, it also implements the steps in the other embodiments described above.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Genetics & Genomics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

A cell screening method and apparatus based on an expression level prediction model, a computer device and a storage medium. The method comprises: acquiring grayscale images of cells to be tested corresponding to multiple cells to be tested in a cell culture pool, and acquiring target cell texture features corresponding to the multiple grayscale images of the cells to be tested, the target cell texture features being optimal cell texture features determined in advance from multiple cell texture features (101); inputting the target cell texture features of the multiple cells to be tested into a pre-trained expression level prediction model, and obtaining, according to the output of the expression level prediction model, predicted protein expression levels corresponding to the multiple cells to be tested (102); and determining, from the multiple cells to be tested according to the predicted protein expression levels, a target cell whose predicted protein expression level satisfies a set condition (103). A cell having a high protein expression level is determined quickly, cell screening can be performed without repeated culture and screening, and the screening period is thus considerably shortened.
PCT/CN2021/114168 2020-08-26 2021-08-24 Cell screening method and apparatus based on an expression level prediction model WO2022042509A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010870681.2A CN112017730B (zh) 2020-08-26 2020-08-26 基于表达量预测模型的细胞筛选方法和装置
CN202010870681.2 2020-08-26

Publications (1)

Publication Number Publication Date
WO2022042509A1 true WO2022042509A1 (fr) 2022-03-03

Family

ID=73502282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114168 WO2022042509A1 (fr) 2020-08-26 2021-08-24 Cell screening method and apparatus based on an expression level prediction model

Country Status (2)

Country Link
CN (1) CN112017730B (fr)
WO (1) WO2022042509A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153240A (zh) * 2023-08-18 2023-12-01 国家超级计算天津中心 基于氧自由基的关系确定方法、装置、设备及介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017730B (zh) * 2020-08-26 2022-08-09 深圳太力生物技术有限责任公司 基于表达量预测模型的细胞筛选方法和装置

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104404082A (zh) * 2014-11-19 2015-03-11 上海美百瑞生物医药技术有限公司 一种高效的外源蛋白表达细胞株的筛选方法
CN104850860A (zh) * 2015-05-25 2015-08-19 广西师范大学 细胞图像识别方法及细胞图像识别装置
US20190012521A1 (en) * 2015-08-12 2019-01-10 Molecular Devices, Llc System and Method for Automatically Analyzing Phenotypical Responses of Cells
CN109740560A (zh) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 基于卷积神经网络的人体细胞蛋白质自动识别方法及系统
CN109815870A (zh) * 2019-01-17 2019-05-28 华中科技大学 细胞表型图像定量分析的高通量功能基因筛选方法及系统
CN112001329A (zh) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 蛋白表达量的预测方法、装置、计算机设备和存储介质
CN112017730A (zh) * 2020-08-26 2020-12-01 东莞太力生物工程有限公司 基于表达量预测模型的细胞筛选方法和装置
CN112037862A (zh) * 2020-08-26 2020-12-04 东莞太力生物工程有限公司 基于卷积神经网络的细胞筛选方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897984A (zh) * 2018-05-07 2018-11-27 上海理工大学 基于ct影像组学特征与肺癌基因表达间相关性分析方法
CN109948429A (zh) * 2019-01-28 2019-06-28 上海依智医疗技术有限公司 图像分析方法、装置、电子设备及计算机可读介质
CN110119710A (zh) * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 细胞分类方法、装置、计算机设备和存储介质
CN110838126B (zh) * 2019-10-30 2020-11-17 东莞太力生物工程有限公司 细胞图像分割方法、装置、计算机设备和存储介质

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104404082A (zh) * 2014-11-19 2015-03-11 上海美百瑞生物医药技术有限公司 一种高效的外源蛋白表达细胞株的筛选方法
CN104850860A (zh) * 2015-05-25 2015-08-19 广西师范大学 细胞图像识别方法及细胞图像识别装置
US20190012521A1 (en) * 2015-08-12 2019-01-10 Molecular Devices, Llc System and Method for Automatically Analyzing Phenotypical Responses of Cells
CN109740560A (zh) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 基于卷积神经网络的人体细胞蛋白质自动识别方法及系统
CN109815870A (zh) * 2019-01-17 2019-05-28 华中科技大学 细胞表型图像定量分析的高通量功能基因筛选方法及系统
CN112001329A (zh) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 蛋白表达量的预测方法、装置、计算机设备和存储介质
CN112017730A (zh) * 2020-08-26 2020-12-01 东莞太力生物工程有限公司 基于表达量预测模型的细胞筛选方法和装置
CN112037862A (zh) * 2020-08-26 2020-12-04 东莞太力生物工程有限公司 基于卷积神经网络的细胞筛选方法和装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153240A (zh) * 2023-08-18 2023-12-01 国家超级计算天津中心 基于氧自由基的关系确定方法、装置、设备及介质

Also Published As

Publication number Publication date
CN112017730A (zh) 2020-12-01
CN112017730B (zh) 2022-08-09

Similar Documents

Publication Publication Date Title
  • WO2022042510A1 Protein expression level prediction method and apparatus, computer device and storage medium
  • WO2022042506A1 Cell screening method and device based on a convolutional neural network
  • WO2022042509A1 Cell screening method and apparatus based on an expression level prediction model
Schulz et al. Exploiting citation networks for large-scale author name disambiguation
Frise et al. Systematic image‐driven analysis of the spatial Drosophila embryonic expression landscape
CN113454733A (zh) 用于预后组织模式识别的多实例学习器
JP2021503666A (ja) 単一チャネル全細胞セグメンテーションのためのシステム及び方法
CN110378206B (zh) 一种智能审图系统及方法
WO2020232874A1 (fr) Procédé et appareil de modélisation basés sur l'apprentissage par transfert, et dispositif d'ordinateur et support d'informations
CN110890137A (zh) 一种化合物毒性预测模型建模方法、装置及其应用
US20220366710A1 (en) System and method for interactively and iteratively developing algorithms for detection of biological structures in biological samples
CN111047563A (zh) 一种应用于医学超声图像的神经网络构建方法
CN110969600A (zh) 一种产品缺陷检测方法、装置、电子设备及存储介质
Momeni et al. Deep recurrent attention models for histopathological image analysis
CN114494168A (zh) 模型确定、图像识别与工业质检方法、设备及存储介质
Chen et al. Evaluation of cell segmentation methods without reference segmentations
CN113408802B (zh) 能耗预测网络的训练、能耗预测方法、装置和计算机设备
Xu et al. TrichomeYOLO: A Neural Network for Automatic Maize Trichome Counting
Ridhovan et al. Disease detection in banana leaf plants using densenet and inception method
KR101913952B1 (ko) V-CNN 접근을 통한 iPSC 집락 자동 인식 방법
CN114664382B (zh) 多组学联合分析方法、装置及计算设备
KR20220142905A (ko) 공간 유전자발현정보에 기반하여 조직 이미지의 세포 구성을 예측하는 장치 및 방법
Vanea et al. HAPPY: A deep learning pipeline for mapping cell-to-tissue graphs across placenta histology whole slide images
CN113095589A (zh) 一种人口属性确定方法、装置、设备及存储介质
Johnson et al. Recombination rate inference via deep learning is limited by sequence diversity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860344

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/07/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21860344

Country of ref document: EP

Kind code of ref document: A1