WO2022042510A1 - Method, device, computer equipment and storage medium for predicting protein expression level - Google Patents


Info

Publication number
WO2022042510A1
Authority
WO
WIPO (PCT)
Prior art keywords
cells
network model
protein expression
tested
cell
Prior art date
Application number
PCT/CN2021/114173
Other languages
English (en)
French (fr)
Inventor
陈亮
韩晓健
李争尔
梁国龙
Original Assignee
深圳太力生物技术有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳太力生物技术有限责任公司
Publication of WO2022042510A1 publication Critical patent/WO2022042510A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/698 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the present application relates to the field of biotechnology, and in particular, to a method, device, computer equipment and storage medium for predicting protein expression.
  • the cells in the cell pool can be transfected, and the cell pool can be processed by the limiting dilution method to obtain single cells; each single cell can then be cultured into a homogeneous cell population, that is, a cell line. The target protein expression level of the cells in each cell line is tested, and cell lines with high target protein expression are screened out.
  • the limiting dilution method is a cumbersome process, and the cells need to be cultured and screened repeatedly before the expression level of the target protein can be calculated, resulting in low prediction efficiency for protein expression and a very long turnaround time.
  • a method for predicting protein expression comprising:
  • the target generation network model is obtained by training a generative adversarial network model with multiple grayscale images of training cells; the multiple grayscale images of training cells each have a corresponding fluorescence map label; the fluorescence map label is the real fluorescence map corresponding to the grayscale image of the training cells;
  • the predicted fluorescence maps corresponding to the plurality of cells to be tested are obtained respectively;
  • the corresponding protein expression levels of the plurality of cells to be tested are determined.
  • the grayscale images of the cells to be tested corresponding to the multiple target cells are input into the target generation network model.
  • determine the protein expression ability level in each of the cells to be tested including:
  • the grayscale images of the cells to be tested are input into the cell classification network model;
  • the cell classification network model is obtained by training the initial convolutional neural network with multiple grayscale images of training cells with protein expression labels;
  • the protein expression label is used to characterize the protein expression ability level of the cells in each training cell grayscale image;
  • the cell classification network model is used to detect the protein expression ability level of the cells in the grayscale image of the input model;
  • the protein expression ability level in each test cell is determined.
  • the real protein expression level of the corresponding cells in the training cell grayscale image and the protein expression ability level corresponding to that real protein amount are determined, and the protein expression label of the grayscale image of the training cells is obtained based on the protein expression ability level;
  • the initial convolutional neural network is trained by using the protein expression label and the grayscale image of the training cells to obtain a cell classification network model.
  • the generative adversarial network model includes a generative network model and a discriminant network model to be trained;
  • when the discriminant network model has been trained for a set number of times, training is switched to the generation network model; the generation network model and the discriminant network model are trained alternately until the target generation network model is obtained.
  • adjusting the network parameters of the generation network model according to the discrimination result until the target generation network model is obtained including:
  • a target grayscale map corresponding to the target expression level is obtained, and the cells to be tested corresponding to the target grayscale map are determined as the target cells used for culturing the cell line.
  • a device for predicting protein expression comprising:
  • the grayscale image acquisition module of the cells to be tested is used to acquire the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture tank;
  • the first input module is used to input multiple grayscale images of cells to be tested into the target generation network model;
  • the target generation network model is obtained by training a generative adversarial network model with multiple grayscale images of training cells;
  • the cell grayscale images respectively have corresponding fluorescent map labels;
  • the fluorescent map labels are the real fluorescent maps corresponding to the training cell grayscale images;
  • a predicted fluorescence map acquisition module, configured to obtain, according to the output of the target generation network model, the predicted fluorescence maps corresponding to the plurality of cells to be tested;
  • the protein expression level determination module is used for determining the protein expression levels corresponding to the plurality of cells to be tested according to the predicted fluorescence map.
  • a computer device includes a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the steps of the above-described method for predicting protein expression level are implemented.
  • the above-mentioned method, device, computer equipment and storage medium for predicting protein expression acquire the grayscale images of the cells to be tested corresponding to a plurality of cells to be tested in the cell culture tank, and input the grayscale images of the plurality of cells to be tested into a target generation network model; the target generation network model can be obtained by training a generative adversarial network model with multiple grayscale images of training cells, and the multiple grayscale images of training cells can each have a corresponding fluorescence map label;
  • then, according to the output of the target generation network model, the predicted fluorescence maps corresponding to the multiple cells to be tested are obtained, and the protein expression levels corresponding to the multiple cells to be tested are determined according to the predicted fluorescence maps, which realizes rapid determination of the protein expression level of each cell in the culture pool from the grayscale image of the cells.
  • this avoids evaluating protein expression only after repeated culture and screening, effectively improves the prediction efficiency for the protein expression level of the cells during the culture process, and shortens the evaluation period of the protein expression level.
  • FIG. 1 is a schematic flowchart of a method for predicting protein expression in one embodiment
  • FIG. 2 is a schematic flowchart of a training step of a target generation network model in one embodiment
  • Figure 3a is a grayscale image of a training cell in one embodiment
  • Figure 3b is a fluorescence image of a training cell in one embodiment
  • FIG. 4 is a schematic flow chart of a cell screening method in one embodiment
  • FIG. 5 is a schematic flowchart of a training step of a cell classification network model in one embodiment
  • FIG. 6 is a schematic flow chart of another cell screening method in one embodiment
  • FIG. 7 is a structural block diagram of a device for predicting protein expression in one embodiment
  • FIG. 8 is a diagram of the internal structure of a computer device in one embodiment.
  • the present application provides a method, device, computer equipment and storage medium for predicting protein expression level.
  • a method for predicting protein expression is provided.
  • the method is applied to a terminal for illustration. It can be understood that the method can also be applied to a server, or to a system including a terminal and a server, in which case the method is realized through interaction between the terminal and the server.
  • the method includes the following steps:
  • Step 101 Obtain grayscale images of cells to be tested corresponding to a plurality of cells to be tested in the cell culture tank.
  • the cells to be tested may be cells treated with transfection technology;
  • the cells to be tested may be cells that failed to obtain exogenous DNA fragments after transfection, cells that obtained exogenous DNA fragments that have not been integrated into a chromosome, or cells in which a foreign DNA fragment has been integrated into a chromosome.
  • the grayscale image of a cell to be tested may be a grayscale photomicrograph captured of that cell.
  • multiple cells in the cell culture pool can be transfected, so that some or all of the cells in the cell culture pool can obtain exogenous DNA fragments.
  • the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank can be obtained through a photomicrography device.
  • Step 102 Input multiple grayscale images of cells to be tested into the target generation network model; the target generation network model is obtained by training a generative adversarial network model with multiple grayscale images of training cells; the multiple grayscale images of training cells each have a corresponding fluorescence map label; the fluorescence map label is the real fluorescence map corresponding to the grayscale image of the training cells;
  • the grayscale image of the training cells may be a grayscale image of cells used for training a generative adversarial network model, and the cells in the grayscale image of the training cells may be cells after transfection.
  • the multiple grayscale images of the cells to be tested can be input into the target generation network model.
  • the target generation network model can be obtained by training the generative adversarial network model by using multiple grayscale images of training cells with fluorescent map labels, wherein the fluorescent map labels can be the real fluorescent maps corresponding to the grayscale images of the training cells.
  • Step 103 according to the output of the target generation network model, obtain the predicted fluorescence images corresponding to the plurality of cells to be tested respectively.
  • the predicted fluorescence images corresponding to the plurality of cells to be tested can be obtained according to the output of the target generation network model.
  • the target generation network model can generate the corresponding predicted fluorescence image according to the input grayscale image of the cell to be tested.
  • the cells can be stained and photographed to obtain a fluorescence image.
  • cells become inactive after staining, making it difficult for them to continue to proliferate.
  • the target generation network model is used to predict the fluorescence image of the cell, and the fluorescence image of the cell can be obtained while the cell viability is maintained.
  • Step 104 determine the respective protein expression levels corresponding to the plurality of cells to be tested.
  • the protein expression levels corresponding to the multiple cells to be tested can be determined according to their respective predicted fluorescence maps.
  • the grayscale images of the cells to be tested corresponding to the plurality of cells to be tested in the cell culture tank are obtained, and the grayscale images of the cells to be tested are input into the target generation network model.
  • the target generation network model can be obtained by training a generative adversarial network model with multiple grayscale images of training cells, and the grayscale images of training cells can each have a corresponding fluorescence map label; then, according to the output of the target generation network model, the predicted fluorescence maps corresponding to the multiple cells to be tested are obtained,
  • and the protein expression corresponding to the multiple cells to be tested is determined according to the predicted fluorescence maps, which realizes rapid determination of the protein expression of each cell in the culture pool from the grayscale image of the cells, avoiding repeated cultivation and screening
  • before the protein expression can be evaluated, which effectively improves the prediction efficiency for the protein expression of the cells during the culture process and shortens the evaluation period of the protein expression.
  • the method may further include the following steps:
  • Step 201 Obtain a generative adversarial network model, a grayscale image of training cells and their corresponding real fluorescence images; the generative adversarial network model includes a generative network model and a discriminant network model to be trained.
  • cells serving as the training set can be selected and photographed to obtain the grayscale image of each training cell and its corresponding real fluorescence image.
  • the cells used as the training set can be cells in the cell culture pool that have been processed by transfection technology; they may be cells that failed to obtain exogenous DNA fragments after transfection, cells that obtained exogenous DNA fragments that have not been integrated into a chromosome, or cells in which foreign DNA fragments have been integrated into a chromosome.
  • the grayscale image of the training cell and its corresponding real fluorescence image can be the grayscale image and the fluorescence image obtained by shooting the same cell under the same shooting conditions, such as Figure 3a and Figure 3b.
  • the generative adversarial network model can be obtained, the grayscale image of the training cells, and the real fluorescence image corresponding to the grayscale image of the training cells; the generative adversarial network model can include the generative network model and the discriminant network model to be trained.
  • the discriminative network model can be composed of convolutional layers, maximum pooling layers and fully connected layers, for example, it can be composed of 3 convolutional layers, 2 maximum pooling layers and 1 fully connected layer.
  • the generative network model can be composed of a convolutional layer, a maximum pooling layer and a deconvolutional layer.
  • the grayscale image of the training cells undergoes convolution operations through the convolutional layers and pooling operations through the maximum pooling layers;
  • together these constitute downsampling operations, and the cell feature map is extracted from the training cell grayscale image through several downsampling operations.
  • deconvolution operations are then performed through the deconvolution layer, constituting upsampling operations, so that the extracted cell feature map is used to generate the fluorescence map to be discriminated.
  • Step 202 Input the training cell grayscale image into the generation network model, and obtain the fluorescence image to be discriminated output by the generation network model.
  • the training cell grayscale image can be input into the generation network model to obtain the fluorescence image to be discriminated that the generation network model outputs.
  • a first preset number of downsampling operations can be performed, wherein the feature map can be compressed while retaining the cell features through the maximum pooling layer.
  • after acquiring the cell feature vector, the generation network model performs upsampling operations on it a second preset number of times to generate the fluorescence image to be discriminated, wherein the first preset number and the second preset number may both be five.
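  • As an illustrative sketch only (not the patented implementation), the shape effect of the five downsampling and five upsampling operations can be reproduced with toy `maxpool2x` and `upsample2x` helpers; a real generator would use learned convolution and deconvolution (transposed convolution) kernels rather than these hypothetical stand-ins:

```python
def maxpool2x(img):
    # 2x2 max pooling: halves each spatial dimension (one downsampling step)
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample2x(img):
    # nearest-neighbour upsampling: doubles each spatial dimension,
    # standing in for the deconvolution layer of the generation network
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

img = [[(i * 32 + j) % 256 for j in range(32)] for i in range(32)]  # toy 32x32 grayscale image
x = img
for _ in range(5):              # first preset number: 5 downsampling operations
    x = maxpool2x(x)
assert len(x) == 1 and len(x[0]) == 1   # 32 -> 16 -> 8 -> 4 -> 2 -> 1
for _ in range(5):              # second preset number: 5 upsampling operations
    x = upsample2x(x)
print(len(x), len(x[0]))        # spatial size restored to 32 x 32
```

With five of each operation, the output fluorescence map regains the spatial size of the input grayscale image, which is why the two preset numbers match.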
  • Step 203 Input the fluorescence image to be discriminated and the real fluorescence image into the discrimination network model, and obtain the discrimination result corresponding to the fluorescence image to be discriminated.
  • the fluorescence image to be discriminated and the real fluorescence image can be input into the discriminant network model, and the discrimination result output by the discriminant network model is obtained.
  • the discriminant network model can judge the authenticity of the fluorescence image to be discriminated according to the known real fluorescence image, determine whether the image is a real fluorescence image, and output the determination result. Wherein, when the output result of the discriminant network model is 1, it can be represented that the fluorescence image to be discriminated is true, and when the output result is 0, it can be represented that the fluorescence image to be discriminated is false.
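  • The 1-for-real / 0-for-fake judgment above is typically scored with a binary cross-entropy loss; the patent does not name a specific loss function, so the following is only a minimal illustrative sketch of how such a discrimination result would be penalized:

```python
import math

def bce_loss(prediction, label):
    # binary cross-entropy: label 1 for a real fluorescence image,
    # label 0 for a generated (fake) fluorescence image to be discriminated
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(prediction + eps)
             + (1 - label) * math.log(1 - prediction + eps))

# confident, correct judgments give a small loss ...
print(bce_loss(0.99, 1))   # real image judged almost certainly real
print(bce_loss(0.01, 0))   # fake image judged almost certainly fake
# ... while a fake image mistaken for real is penalized heavily
print(bce_loss(0.99, 0))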
  • Step 204 Adjust the network parameters of the discrimination network model according to the discrimination result.
  • the network parameters of the generation network model can also be adjusted, until the target generation network model whose network parameters satisfy the preset conditions is obtained.
  • the generative network model and the discriminant network model are alternately trained.
  • the discriminant network model can be trained first:
  • the network parameters of the generative network model are fixed, and the real fluorescence map label together with the fluorescence map generated by the generative network model
  • are input into the discriminant network model, and the network parameters of the discriminant network model are adjusted according to the discrimination results.
  • Step 205 when the discrimination network model has been trained for a set number of times, switch to training the generation network model; alternately train the generation network model and the discrimination network model until the target generation network model is obtained.
  • the current network parameters of the discriminant network model can be fixed, and training can be switched to the generative network model; the generative network model and the discriminant network model are trained alternately until the target generation network model is obtained.
  • in this way, the generation network model is supervised by the continuously optimized discriminant network model, and a mapping relationship between the training cell grayscale image and the fluorescence map to be discriminated can be established.
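  • The alternation schedule described above can be sketched as follows; `alternate_training` is a hypothetical helper that only records the order of updates, with the actual gradient steps on both models abstracted away:

```python
def alternate_training(total_rounds, d_steps_per_round):
    # records the alternation schedule only: "D" = one training step of the
    # discriminant network (generator frozen), "G" = switch to training the
    # generation network (discriminator frozen)
    log = []
    for _ in range(total_rounds):
        for _ in range(d_steps_per_round):
            log.append("D")   # discriminant network trained a set number of times
        log.append("G")       # then training switches to the generation network
    return log

schedule = alternate_training(total_rounds=3, d_steps_per_round=2)
print(schedule)   # ['D', 'D', 'G', 'D', 'D', 'G', 'D', 'D', 'G']
```

In practice the loop would run until a convergence criterion is met (the "target generation network model" is obtained) rather than for a fixed number of rounds.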
  • the training of the generation network model may include the following steps:
  • when the discrimination result indicates that the fluorescence image to be discriminated is false,
  • the fluorescence image to be discriminated, the real fluorescence image and the discriminant network model can be combined to calculate the current loss error of the generation network model,
  • and the network parameters of the generation network model are adjusted according to the loss error to update the generation network model.
  • the back-propagation algorithm can be used to adjust the parameters of the generative network model.
  • the current number of adjustments of the network parameters can be determined, and whether it is less than a preset threshold can be judged; if so, the current network parameters need to be adjusted further, and the grayscale image of the training cells is input into the generation network model again to obtain an updated generation network model.
  • the target generation network model can be generated based on the network parameters of the current generation network model.
  • the parameters of the discriminator can be set to be non-adjustable.
  • when the grayscale image of the training cells is first input into the generative network model to be trained, the model has not converged and its network parameters are randomly initialized, so the first generated fluorescence image to be discriminated is judged by the discriminant network model to be a fake image.
  • the generated network model can obtain the loss error through the cost function, and adjust the network parameters through the back-propagation algorithm to continuously reduce the loss error.
  • when the discriminant network model determines that the fluorescence image to be discriminated is a real image, the discriminant network model and the generation network model have reached a balance.
  • the network parameters of the generation network model can be adjusted according to the loss error, and the generation network model is continuously optimized, so that the optimization of the generation network model can generate a more realistic fluorescence image.
  • the inputting multiple grayscale images of cells to be tested into the target generation network model may include:
  • Step 401 Determine the level of protein expression ability in each cell to be tested according to the grayscale image of the cell to be tested.
  • the protein expression ability grade is used to characterize the ability of a cell to produce a target protein, and can be divided into multiple grades, for example, it can be divided into four grades: high-level expression, medium-level expression, low-level expression, and no expression.
  • the protein expression ability level of the corresponding cells to be tested can be determined based on the grayscale images of the cells to be tested.
  • Step 402 from a plurality of cells to be tested, determine a plurality of target cells whose protein expression ability level satisfies a preset condition.
  • for example, cells to be tested whose protein expression ability level is the high-level expression grade among the four levels can be identified as target cells.
  • Step 403 Input the grayscale images of the cells to be tested corresponding to the multiple target cells into the target generation network model.
  • the grayscale images of the cells to be tested corresponding to the multiple target cells can be input into the target generation network model.
  • the level of protein expression ability in each cell to be tested is determined from its grayscale image, and a plurality of target cells whose protein expression ability level meets preset conditions are determined from the plurality of cells to be tested;
  • the grayscale images of the cells to be tested corresponding to the multiple target cells are then input into the target generation network model. The cells to be tested can thus be preliminarily screened, so that cells with poor or no protein expression are filtered out, avoiding prediction interference from cells with low or no protein expression ability in the process of screening for cells with high protein expression ability.
  • determining the level of protein expression ability in each cell to be tested according to the grayscale image of the cell to be tested of each cell to be tested may include the following steps:
  • the grayscale images of the cells to be tested are input into the cell classification network model; the cell classification network model is obtained by training the initial convolutional neural network with multiple grayscale images of training cells bearing protein expression labels; the protein expression label is used to characterize the protein expression ability level of the cells in each training cell grayscale image; the cell classification network model is used to detect the protein expression ability level of the cells in the cell grayscale image input into it; according to the output of the cell classification network model, the level of protein expression ability in each tested cell is determined.
  • a cell classification network model can be preset, and the cell classification network model can be obtained by training the initial convolutional neural network using multiple grayscale images of training cells with protein expression labels.
  • the protein expression ability level of cells in the grayscale image of each training cell can be characterized.
  • the protein expression ability level of the cells in the grayscale image of the cells input to the cell classification network model can be predicted.
  • the grayscale images of the cells to be tested can be input into the cell classification network model to determine the protein expression ability level of the cells to be tested.
  • the cell grayscale image of each cell to be tested is input into the cell classification network model, and the protein expression ability level in each cell to be tested is determined according to the output of the cell classification network model,
  • so that a preliminary assessment of the protein expression ability level of the cells to be tested can be conducted quickly and efficiently from the grayscale image.
  • Step 501 Obtain the grayscale image of the training cells and the corresponding real fluorescence image.
  • grayscale images of training cells and their corresponding real fluorescence images can be obtained.
  • Step 502 Determine the value corresponding to the green channel in the real fluorescence image.
  • proteins produced by genes of interest can fluoresce at specific wavelengths.
  • the value corresponding to the green channel in the real fluorescence image can be determined, that is, the G value of the real fluorescence image, wherein the G value can also be called the fluorescence value.
  • Step 503 according to the value corresponding to the green channel in the real fluorescence image, determine the actual protein expression level of the corresponding cells in the grayscale image of the training cells and the protein expression ability level corresponding to that real protein amount, and obtain the protein expression label of the grayscale image of the training cells based on the protein expression ability level.
  • the actual protein expression level of the cells in the corresponding grayscale image of the training cells can be determined based on this value, as well as the protein expression ability level corresponding to that actual expression level; the protein expression ability level can then be used to determine the protein expression label of the grayscale image of the training cells.
  • the fluorescence value can be used to determine the real protein expression amount.
  • a preset grading list can be obtained, and through the grading list, when the real protein expression falls within a numerical range, its corresponding protein expression ability level can be determined. According to the real protein expression amount and the grade division list, the protein expression ability level corresponding to the real protein expression amount can be determined.
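  • A minimal sketch of such a grading list follows; the threshold values and grade names are hypothetical illustrations, since the patent does not specify the numerical ranges of the grading list:

```python
# Hypothetical grading list: (lower bound of fluorescence value, grade),
# ordered from the highest bound down. Real thresholds would be calibrated.
GRADING_LIST = [
    (180.0, "high-level expression"),
    (90.0, "medium-level expression"),
    (1.0, "low-level expression"),
    (0.0, "no expression"),
]

def expression_grade(real_expression):
    # walk the grading list and return the first grade whose numerical
    # range contains the real protein expression amount
    for lower_bound, grade in GRADING_LIST:
        if real_expression >= lower_bound:
            return grade
    return "no expression"

print(expression_grade(200.0))   # falls in the top range
print(expression_grade(0.5))     # below every expressing range
```

The returned grade becomes the protein expression label attached to the corresponding training cell grayscale image.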
  • step 504 the initial convolutional neural network is trained by using the protein expression label and the grayscale image of the training cells to obtain a cell classification network model.
  • the initial convolutional neural network can be trained by using the training cell grayscale image with the protein expression amount label to obtain a cell classification network model.
  • the initial convolutional neural network may include a first network structure, a second network structure, a third network structure, and a fourth network structure.
  • the first network structure may be a feature extraction network composed of 10 convolutional layers; it may be used to extract training cell features from the training cell grayscale image, and the training cell features may be input into the second network structure, the third network structure and the fourth network structure.
  • the second network structure may include a convolutional layer and a global average pooling layer; the second network structure may be connected with the first network structure, the training cell features output by the first network structure are input into the convolutional layer, and after processing by the convolutional layer and the global average pooling layer, the first cell feature vector is obtained.
  • the third network structure may include 2 convolutional layers and a global maximum pooling layer; the third network structure may be connected with the first network structure, the training cell features output by the first network structure are input into the convolutional layers, and after processing by the 2 convolutional layers and the global max pooling layer, the second cell feature vector is obtained.
  • the fourth network structure may include 2 convolutional layers and a global average pooling layer; the fourth network structure may be connected with the first network structure, the training cell features output by the first network structure are input into the convolutional layers, and after processing by the 2 convolutional layers and the global average pooling layer, the third cell feature vector is obtained.
  • the corresponding weight of each network structure can be determined, and a weighted summation of the first cell feature vector, the second cell feature vector and the third cell feature vector is performed;
  • the result of the weighted summation is the classification result for the cells in the grayscale image of the training cells, that is, the protein expression ability level corresponding to the cells.
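  • The weighted summation of the three branch outputs can be sketched as follows; the branch weights and per-grade score vectors are made-up illustrative values (the patent does not disclose them), with four entries per vector for the four expression ability grades:

```python
# Hypothetical branch weights and per-grade score vectors from the
# second, third and fourth network structures (4 expression ability grades).
weights = [0.5, 0.25, 0.25]
branch_vectors = [
    [0.1, 0.2, 0.6, 0.1],   # first cell feature vector  (conv + global avg pool)
    [0.2, 0.1, 0.5, 0.2],   # second cell feature vector (2 conv + global max pool)
    [0.1, 0.1, 0.7, 0.1],   # third cell feature vector  (2 conv + global avg pool)
]

# weighted summation across the three branches, per grade
fused = [sum(w * vec[k] for w, vec in zip(weights, branch_vectors))
         for k in range(4)]
# the grade with the highest fused score is the classification result
predicted_grade = max(range(4), key=lambda k: fused[k])
print(predicted_grade)   # index of the predicted protein expression ability level
```

Combining an average-pooled and a max-pooled branch lets the fused score reflect both the overall and the strongest local activations before classification.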
  • in this embodiment, the training cell grayscale images and their corresponding real fluorescence maps are acquired, the value of the green channel in each real fluorescence map is determined, and the real protein expression level of the cells in the corresponding training cell grayscale image is determined from that value; the initial convolutional neural network is then trained with the protein expression labels and the training cell grayscale images. Using the fluorescence value of the real fluorescence map as an intermediate variable, the protein expression level can be quantified before the protein expression ability level is determined, yielding accurate protein expression labels and thus accurate training data for training the cell classification network model.
  • determining the protein expression levels respectively corresponding to the plurality of cells to be tested according to the plurality of predicted fluorescence maps may include the following steps:
  • the values of the green channels in the multiple predicted fluorescence maps, i.e. the G values of the predicted fluorescence maps, can be determined, and based on these green-channel values, the protein expression levels respectively corresponding to the multiple cells to be tested can be determined.
  • in practical applications, the G value of a fluorescence map can be positively correlated with protein expression; by obtaining the quantitative mapping between the G value and protein expression, the G value of a fluorescence map can be used to determine the protein expression level of the corresponding cell to be tested.
  • in this embodiment, the protein expression levels of the cells to be tested are determined from the green-channel values of the predicted fluorescence maps, so that these values serve as intermediate variables through which the protein expression of the cells in the grayscale images is quantified.
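The green-channel reading described above might be implemented as in the following sketch. The (R, G, B) nested-list image format and the linear calibration factor are assumptions for illustration; the patent only states that the G value and the expression level are positively correlated.

```python
# Hypothetical sketch: read the G (green) channel from a predicted
# fluorescence image and map it to a protein expression level.
# The image is a nested list of (R, G, B) pixels; the linear calibration
# factor is an assumption, since the source only claims a positive
# correlation between G value and expression.

def mean_green(image):
    """Average G value over all (R, G, B) pixels of the image."""
    pixels = [px for row in image for px in row]
    return sum(px[1] for px in pixels) / len(pixels)

def expression_from_green(image, calibration=1.0):
    """Map the mean G value to an expression level via a linear calibration."""
    return calibration * mean_green(image)

fluorescence = [[(0, 200, 0), (0, 100, 0)],
                [(0, 150, 0), (0, 50, 0)]]
print(expression_from_green(fluorescence))  # 125.0
```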
  • in one embodiment, the method further includes:
  • the protein expression levels respectively corresponding to the multiple cells to be tested can be sorted, and from the sorted protein expression levels, the top preset number of protein expression levels can be determined as the target expression levels.
  • specifically, the protein expression levels corresponding to the multiple cells to be tested can be sorted in descending order, i.e. from largest to smallest; after sorting, the protein expression levels of the top N cells can be determined as the target expression levels. Of course, in practical applications, protein expression levels exceeding a preset threshold can also be determined as target expression levels.
  • after the target expression levels are determined, the grayscale images of the cells to be tested corresponding to the target expression levels can be determined, and the cells to be tested corresponding to those grayscale images are determined as the target cells for culturing cell lines.
  • in this embodiment, the multiple protein expression levels are sorted and, according to the sorted levels, the preset number of cells with the highest protein expression are determined as the target cells from among the multiple cells to be tested; this quickly screens out cells with high protein expression, which both reduces the screening workload and effectively shortens the cell screening cycle.
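The sorting-and-selection step above can be sketched as follows. The cell identifiers and expression values are made up for the example; both the top-N and the threshold variants mentioned in the text are shown.

```python
# Minimal sketch of the screening step: sort cells by predicted expression
# in descending order and keep either the top N or those exceeding a
# threshold as target cells. Cell IDs and values are illustrative.

def select_targets(expressions, top_n=None, threshold=None):
    """expressions: {cell_id: predicted expression}. Returns target cell ids."""
    ranked = sorted(expressions, key=expressions.get, reverse=True)
    if top_n is not None:
        return ranked[:top_n]
    return [c for c in ranked if expressions[c] > threshold]

cells = {"cell_a": 310.0, "cell_b": 95.5, "cell_c": 280.2, "cell_d": 120.7}
print(select_targets(cells, top_n=2))        # ['cell_a', 'cell_c']
print(select_targets(cells, threshold=100))  # ['cell_a', 'cell_c', 'cell_d']
```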
  • acquiring the training cell grayscale images may include the following steps:
  • acquire the original cell grayscale images used for model training and normalize them; perform data enhancement processing on the processed original cell grayscale images to obtain the training cell grayscale images; the data enhancement processing includes any one or more of the following: rotation, flipping, contrast enhancement, and random cropping.
  • in a specific implementation, the original cell grayscale images used for model training can be acquired and normalized, where an original cell grayscale image may be a grayscale image of the training-set cells photographed with a microscope.
  • after normalization, data enhancement processing can be performed on the processed original grayscale images, for example rotating, flipping, or randomly cropping the images, or enhancing their contrast.
  • in this embodiment, performing data enhancement on the processed original cell grayscale images yields the training cell grayscale images, increasing the number of grayscale images available for training the initial convolutional neural network model; when training samples are insufficient, this
  • quickly expands the training set and provides data support for training the initial convolutional neural network model.
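The normalization and augmentation operations listed above can be sketched with plain nested lists standing in for grayscale images. The specific augmentation parameters (crop size, rotation angle) are assumptions; the patent only names the operation types.

```python
import random

# Illustrative sketch of the normalization and data-augmentation steps,
# operating on a grayscale image stored as a nested list of 0-255 values.
# Crop size and rotation choice are assumptions for demonstration.

def normalize(img):
    """Scale pixel values to the [0, 1] range."""
    return [[px / 255.0 for px in row] for row in img]

def flip_horizontal(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate clockwise: columns of the input become rows of the output."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, size, rng=random):
    """Cut out a random size-by-size patch of the image."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

img = [[0, 64], [128, 255]]
print(flip_horizontal(img))  # [[64, 0], [255, 128]]
print(rotate_90(img))        # [[128, 0], [255, 64]]
```

In practice these operations would be applied to real microscope images with an image library rather than nested lists; the sketch only shows the transformations themselves.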
  • as shown in FIG. 6, a grayscale image of a cell (that is, the grayscale image of the cell to be tested in this application) can be photographed through a microscope and input into the trained cell classification network model; from the output of the cell classification network model, the
  • highly fluorescent cells, that is, the cells with a high protein expression ability level, are obtained.
  • after the highly fluorescent cells are determined, the target generation network model can be used to predict their fluorescence maps, and the predicted cell fluorescence pictures (that is, the predicted fluorescence maps in this application) are obtained.
  • the grayscale image corresponding to each cell can be input into the cell classification network model to obtain a classification result.
  • for the cells classified as having high protein expression, after the corresponding cell fluorescence pictures are generated, it can be checked whether all grayscale images have been processed. If not, the process can return to the step of analyzing the grayscale images and obtaining cells with high protein expression through the cell classification model; if so, the protein expression levels of the cells can be calculated from the predicted cell fluorescence pictures.
  • after the protein expression levels of the multiple cells are obtained, the multiple protein expression levels can be sorted, the target cells can be screened from the multiple cells according to preset screening criteria, and a screening report can be generated and submitted.
  • it should be understood that although the steps in the flowcharts of FIGS. 1, 2, and 4-6 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 1, 2, and 4-6 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • a device for predicting protein expression may include:
  • a grayscale image acquisition module 701 of cells to be tested configured to acquire grayscale images of cells to be tested corresponding to a plurality of cells to be tested in the cell culture tank;
  • the first input module 702 is used to input multiple grayscale images of cells to be tested into the target generation network model;
  • the target generation network model is obtained by training a generative adversarial network model with multiple training cell grayscale images;
  • the grayscale images of the training cells respectively have corresponding fluorescent map labels;
  • the fluorescent map labels are the real fluorescent maps corresponding to the grayscale images of the training cells;
  • a predicted fluorescence map acquisition module 703, configured to obtain, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested;
  • the protein expression level determination module 704 is configured to determine the respective protein expression levels corresponding to the plurality of cells to be tested according to the predicted fluorescence map.
  • the first input module 702 includes:
  • the first expression ability level determination submodule is used to determine the protein expression ability level of each cell to be tested according to the grayscale image of each cell to be tested;
  • the screening sub-module is used to determine a plurality of target cells whose protein expression ability level meets a preset condition from a plurality of cells to be tested;
  • the second input sub-module is configured to input the grayscale images of the cells to be tested corresponding to the multiple target cells into the target generation network model.
  • the expression ability level determination submodule includes:
  • the third input unit is used to input the grayscale images of the cells to be tested of each cell to be tested into the cell classification network model;
  • the cell classification network model is obtained by training the initial convolutional neural network model with a plurality of training cell grayscale images having protein expression labels;
  • the protein expression label is used to represent the protein expression ability level of the cells in the grayscale image of each training cell;
  • the cell classification network model is used to detect the protein expression ability level of the cells in the grayscale images input into the cell classification network model;
  • the output unit is configured to determine the protein expression ability level in each cell to be tested according to the output of the cell classification network model.
  • the apparatus may further include:
  • the real fluorescence map acquisition module is used to obtain the training cell grayscale images and their corresponding real fluorescence maps;
  • the first fluorescence value determination module is used to determine the value corresponding to the green channel in the real fluorescence map;
  • the second expression ability level determination module is configured to determine, according to the value corresponding to the green channel, the real protein expression level of the cells in the corresponding training cell grayscale image and the protein expression ability level corresponding to that real protein expression level, and to obtain the protein expression label of the training cell grayscale image based on the protein expression ability level;
  • the cell classification network model generation module is used to train the initial convolutional neural network with the protein expression labels and the training cell grayscale images to obtain a cell classification network model.
  • the apparatus may further include:
  • the first model acquisition module is used to acquire a generative adversarial network model, training cell grayscale images and their corresponding real fluorescence maps;
  • the generative adversarial network model includes a generation network model to be trained and a discrimination network model;
  • a first to-be-discriminated fluorescence image generation module configured to input the training cell grayscale image into the generation network model, and obtain the to-be-discriminated fluorescence image output by the generation network model;
  • a discrimination result acquisition module configured to input the fluorescence image to be discriminated and the real fluorescence image into the discrimination network model, and obtain the discrimination result corresponding to the fluorescence image to be discriminated;
  • a discrimination network parameter adjustment module configured to adjust the network parameters of the discrimination network model according to the discrimination result;
  • a model training switching module configured to switch to training the generation network model when the discrimination network model has been trained a set number of times, and to alternately train the generation network model and the discrimination network model until the target generation network model is obtained.
  • the model training switching module includes:
  • the second model acquisition sub-module is used to input the grayscale image of the training cells into the generation network model, and obtain the to-be-discriminated fluorescence image output by the generation network model;
  • the second to-be-discriminated fluorescence image generation sub-module is configured to input the to-be-discriminated fluorescence image and the real fluorescence image into the discrimination network model, and obtain the discrimination result corresponding to the to-be-discriminated fluorescence image;
  • the loss error determination sub-module is used to calculate the loss error of the generation network model according to the fluorescence map to be discriminated, the real fluorescence map and the discrimination network model when the discrimination result indicates that the fluorescence map to be discriminated is fake;
  • a generation network parameter adjustment sub-module is configured to adjust the network parameters of the generation network model according to the loss error.
  • the protein expression level determination module includes:
  • the second fluorescence value determination submodule is used to determine the respective values corresponding to the green channels in the multiple predicted fluorescence maps
  • the mapping sub-module is used to determine the protein expression levels corresponding to the multiple cells to be tested according to the values corresponding to the green channels in the multiple predicted fluorescence maps.
  • the apparatus may further comprise:
  • the sorting module, used to sort the protein expression levels respectively corresponding to the plurality of cells to be tested and, from the sorted protein expression levels, determine the top preset number of protein expression levels as the target expression levels;
  • the target cell determination module is configured to obtain the target grayscale image corresponding to the target expression level, and determine the cells to be tested corresponding to the target grayscale image as the target cells used for culturing the cell line.
  • each module in the above-mentioned apparatus for predicting protein expression level can be implemented by software, hardware, or a combination thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 8 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer program when executed by the processor, implements a method for predicting the amount of protein expression.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball or touchpad provided on the housing of the computer device, or an external keyboard, trackpad or mouse.
  • those skilled in the art can understand that FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, a computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
  • acquiring grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank; inputting the plurality of grayscale images of cells to be tested into a target generation network model; the target generation network model is obtained by training a generative adversarial network model with multiple training cell grayscale images; the multiple training cell grayscale images respectively have corresponding fluorescence map labels; a fluorescence map label is the real fluorescence map corresponding to the training cell grayscale image;
  • obtaining, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested;
  • determining, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the plurality of cells to be tested.
  • the processor when the processor executes the computer program, it also implements the steps in the other embodiments described above.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • acquiring grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank; inputting the plurality of grayscale images of cells to be tested into a target generation network model; the target generation network model is obtained by training a generative adversarial network model with multiple training cell grayscale images; the multiple training cell grayscale images respectively have corresponding fluorescence map labels; a fluorescence map label is the real fluorescence map corresponding to the training cell grayscale image;
  • obtaining, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested;
  • determining, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the plurality of cells to be tested.
  • the computer program when executed by the processor, also implements the steps in the other embodiments described above.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).


Abstract

A method, apparatus, computer device and storage medium for predicting protein expression levels. The method includes: acquiring grayscale images of cells to be tested respectively corresponding to multiple cells to be tested in a cell culture tank (101); inputting the multiple grayscale images of cells to be tested into a target generation network model, the target generation network model being obtained by training a generative adversarial network model with multiple training cell grayscale images, the multiple training cell grayscale images respectively having corresponding fluorescence map labels, and a fluorescence map label being the real fluorescence map corresponding to the training cell grayscale image (102); obtaining, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the multiple cells to be tested (103); and determining, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the multiple cells to be tested (104). The method quickly determines the protein expression level of each cell in the culture tank from cell grayscale images, avoids repeated rounds of culturing and screening, and effectively improves the efficiency of predicting the protein expression of cells during culture.

Description

Method, apparatus, computer device and storage medium for predicting protein expression levels
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application No. 202010870702.0, filed with the China National Intellectual Property Administration on August 26, 2020 and entitled "Method, apparatus, computer device and storage medium for predicting protein expression levels", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of biotechnology, and in particular to a method, apparatus, computer device and storage medium for predicting protein expression levels.
BACKGROUND
With the continuous development of genetic engineering, isolating monoclonal cell lines capable of expressing a specific product from a cell pool has become a common need in the biological field.
In the prior art, the cells in a cell pool can be transfected and processed by limiting dilution to obtain single cells; each single cell can then be cultured into a homogeneous cell population, i.e. a cell line, the target protein expression of the cells in each line is measured, and the lines with high target protein expression are screened out.
However, limiting dilution is a cumbersome process: the target protein expression level can only be calculated after the cells have been repeatedly cultured and screened, making the prediction of protein expression inefficient and extremely time-consuming.
SUMMARY
In view of this, it is necessary to provide a method, apparatus, computer device and storage medium for predicting protein expression levels that address the above technical problems.
A method for predicting protein expression levels, the method comprising:
acquiring grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank;
inputting the plurality of grayscale images of cells to be tested into a target generation network model; the target generation network model is obtained by training a generative adversarial network model with a plurality of training cell grayscale images; the plurality of training cell grayscale images respectively have corresponding fluorescence map labels; a fluorescence map label is the real fluorescence map corresponding to the training cell grayscale image;
obtaining, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested;
determining, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the plurality of cells to be tested.
Optionally, inputting the plurality of grayscale images of cells to be tested into the target generation network model comprises:
determining the protein expression ability level of each cell to be tested according to the grayscale image of each cell to be tested;
determining, from the plurality of cells to be tested, a plurality of target cells whose protein expression ability level meets a preset condition;
inputting the grayscale images respectively corresponding to the plurality of target cells into the target generation network model.
Optionally, determining the protein expression ability level of each cell to be tested according to the grayscale image of each cell to be tested comprises:
inputting the grayscale image of each cell to be tested into a cell classification network model; the cell classification network model is obtained by training an initial convolutional neural network with a plurality of training cell grayscale images having protein expression labels; the protein expression labels are used to represent the protein expression ability level of the cells in each training cell grayscale image; the cell classification network model is used to detect the protein expression ability level of the cells in the grayscale images input into the model;
determining the protein expression ability level of each cell to be tested according to the output of the cell classification network model.
Optionally, the method further comprises:
acquiring training cell grayscale images and their corresponding real fluorescence maps;
determining the value corresponding to the green channel in each real fluorescence map;
determining, according to the value corresponding to the green channel in the real fluorescence map, the real protein expression level of the cells in the corresponding training cell grayscale image and the protein expression ability level corresponding to that real protein expression level, and obtaining the protein expression label of the training cell grayscale image based on the protein expression ability level;
training the initial convolutional neural network with the protein expression labels and the training cell grayscale images to obtain a cell classification network model.
Optionally, the method further comprises:
acquiring a generative adversarial network model, training cell grayscale images and their corresponding real fluorescence maps; the generative adversarial network model comprises a generation network model to be trained and a discrimination network model;
inputting a training cell grayscale image into the generation network model and obtaining the fluorescence map to be discriminated output by the generation network model;
inputting the fluorescence map to be discriminated and the real fluorescence map into the discrimination network model and obtaining the discrimination result corresponding to the fluorescence map to be discriminated;
adjusting the network parameters of the discrimination network model according to the discrimination result;
when the discrimination network model has been trained a set number of times, switching to training the generation network model; alternately training the generation network model and the discrimination network model until the target generation network model is obtained.
Optionally, adjusting the network parameters of the generation network model according to the discrimination result until the target generation network model is obtained comprises:
inputting the training cell grayscale image into the generation network model and obtaining the fluorescence map to be discriminated output by the generation network model;
inputting the fluorescence map to be discriminated and the real fluorescence map into the discrimination network model and obtaining the discrimination result corresponding to the fluorescence map to be discriminated;
when the discrimination result indicates that the fluorescence map to be discriminated is fake, calculating the loss error of the generation network model according to the fluorescence map to be discriminated, the real fluorescence map and the discrimination network model;
adjusting the network parameters of the generation network model according to the loss error.
Optionally, the method further comprises:
sorting the protein expression levels respectively corresponding to the plurality of cells to be tested, and determining, from the sorted protein expression levels, the top preset number of protein expression levels as target expression levels;
acquiring the target grayscale images corresponding to the target expression levels, and determining the cells to be tested corresponding to the target grayscale images as the target cells for culturing cell lines.
An apparatus for predicting protein expression levels, the apparatus comprising:
a grayscale image acquisition module, configured to acquire grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank;
a first input module, configured to input the plurality of grayscale images of cells to be tested into a target generation network model; the target generation network model is obtained by training a generative adversarial network model with a plurality of training cell grayscale images; the plurality of training cell grayscale images respectively have corresponding fluorescence map labels; a fluorescence map label is the real fluorescence map corresponding to the training cell grayscale image;
a predicted fluorescence map acquisition module, configured to obtain, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested;
a protein expression level determination module, configured to determine, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the plurality of cells to be tested.
A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the method for predicting protein expression levels described above.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method for predicting protein expression levels described above.
In the above method, apparatus, computer device and storage medium, grayscale images of a plurality of cells to be tested in a cell culture tank are acquired and input into a target generation network model, which can be obtained by training a generative adversarial network model with a plurality of training cell grayscale images that respectively have corresponding fluorescence map labels; the predicted fluorescence maps respectively corresponding to the cells to be tested are then obtained from the output of the target generation network model, and the protein expression level of each cell to be tested is determined from its predicted fluorescence map. This makes it possible to quickly determine the protein expression level of each cell in the culture tank from cell grayscale images, avoids having to repeat culturing and screening before protein expression can be evaluated, effectively improves the efficiency of predicting protein expression during culture, and shortens the evaluation cycle.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a method for predicting protein expression levels in one embodiment;
FIG. 2 is a schematic flowchart of the training steps of a target generation network model in one embodiment;
FIG. 3a is a training cell grayscale image in one embodiment;
FIG. 3b is a training cell fluorescence map in one embodiment;
FIG. 4 is a schematic flowchart of a cell screening method in one embodiment;
FIG. 5 is a schematic flowchart of the training steps of a cell classification network model in one embodiment;
FIG. 6 is a schematic flowchart of another cell screening method in one embodiment;
FIG. 7 is a structural block diagram of an apparatus for predicting protein expression levels in one embodiment;
FIG. 8 is an internal structure diagram of a computer device in one embodiment.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain this application and are not intended to limit it.
To facilitate understanding of the embodiments of the present invention, the prior art is described first.
In the prior art, when screening cell lines with high target protein expression, the cells in the culture tank must be processed by limiting dilution, and the target protein expression level is calculated only after repeated culturing and screening. The whole process takes more than six months and, while consuming a great deal of manpower and material resources, can hardly meet the needs of industrialized, large-scale production. To address these shortcomings of the prior art, this application provides a method, apparatus, computer device and storage medium for predicting protein expression levels.
In one embodiment, as shown in FIG. 1, a method for predicting protein expression levels is provided. This embodiment is illustrated by applying the method to a terminal; it can be understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through their interaction. In this embodiment, the method includes the following steps:
Step 101: acquire grayscale images of cells to be tested respectively corresponding to a plurality of cells to be tested in a cell culture tank.
As an example, a cell to be tested may be a cell that has been processed by transfection: a cell that failed to obtain the exogenous DNA fragment after transfection, a cell that obtained the exogenous DNA fragment but did not integrate it into a chromosome, or a cell in which the exogenous DNA fragment has been integrated into a chromosome. The grayscale image of a cell to be tested may be a grayscale image of that cell.
In practical applications, the plurality of cells in the cell culture tank can be transfected so that some or all of them obtain the exogenous DNA fragment. After transfection, the grayscale images respectively corresponding to the plurality of cells to be tested can be captured with microphotography equipment.
Step 102: input the plurality of grayscale images of cells to be tested into a target generation network model; the target generation network model is obtained by training a generative adversarial network model with a plurality of training cell grayscale images; the plurality of training cell grayscale images respectively have corresponding fluorescence map labels; a fluorescence map label is the real fluorescence map corresponding to the training cell grayscale image.
As an example, a training cell grayscale image may be a cell grayscale image used to train the generative adversarial network model, and the cells in it may be cells that have been processed by transfection.
After the plurality of grayscale images of cells to be tested are obtained, they can be input into the target generation network model. Specifically, the target generation network model can be obtained by training a generative adversarial network model with a plurality of training cell grayscale images having fluorescence map labels, where a fluorescence map label may be the real fluorescence map corresponding to the training cell grayscale image.
Step 103: obtain, according to the output of the target generation network model, the predicted fluorescence maps respectively corresponding to the plurality of cells to be tested.
After the plurality of grayscale images are input into the target generation network model, the predicted fluorescence maps respectively corresponding to the cells to be tested can be obtained from the output of the model. In practical applications, the target generation network model can generate a corresponding predicted fluorescence map from each input grayscale image.
In practical applications, a fluorescence map can also be obtained by staining the cells and then photographing them. However, cells lose viability after staining and can hardly continue to proliferate. In this application, the fluorescence map of a cell is predicted with the target generation network model, so that a fluorescence image of the cell can be obtained while the cell remains viable.
Step 104: determine, according to the predicted fluorescence maps, the protein expression levels respectively corresponding to the plurality of cells to be tested.
After the predicted fluorescence maps are obtained, the protein expression level of each cell to be tested can be determined from its predicted fluorescence map.
In this embodiment, grayscale images of a plurality of cells to be tested in the cell culture tank are acquired and input into a target generation network model trained from a generative adversarial network model with a plurality of labeled training cell grayscale images; the predicted fluorescence maps of the cells to be tested are obtained from the output of the model, and the protein expression level of each cell is determined from its predicted fluorescence map. This makes it possible to quickly determine the protein expression level of each cell in the culture tank from cell grayscale images, avoids having to repeat culturing and screening before protein expression can be evaluated, effectively improves the efficiency of predicting protein expression during culture, and shortens the evaluation cycle.
In one embodiment, as shown in FIG. 2, the method may further include the following steps:
Step 201: acquire a generative adversarial network model, training cell grayscale images and their corresponding real fluorescence maps; the generative adversarial network model includes a generation network model to be trained and a discrimination network model.
Specifically, cells can be set aside as a training set and photographed to obtain the training cell grayscale images and the corresponding real fluorescence maps.
The cells used as the training set may be cells from the cell culture tank that have been processed by transfection: cells that failed to obtain the exogenous DNA fragment, cells that obtained the exogenous DNA fragment but did not integrate it into a chromosome, or cells in which the exogenous DNA fragment has been integrated into a chromosome. A training cell grayscale image and its corresponding real fluorescence map may be a grayscale image and a fluorescence map of the same cells taken under the same shooting conditions, for example FIG. 3a and FIG. 3b.
In practical applications, the generative adversarial network model, the training cell grayscale images and the real fluorescence maps corresponding to the training cell grayscale images can be acquired; the generative adversarial network model may include a generation network model to be trained and a discrimination network model.
The discrimination network model may consist of convolutional layers, max pooling layers and fully connected layers, for example 3 convolutional layers, 2 max pooling layers and 1 fully connected layer.
The generation network model may consist of convolutional layers, max pooling layers and deconvolutional layers. In the generation network model, one convolution operation through a convolutional layer followed by one pooling operation through a max pooling layer constitutes one downsampling operation; through several downsampling operations, a cell feature map is extracted from the training cell grayscale image.
For the deconvolutional layers of the generation network model, one deconvolution operation constitutes one upsampling operation, whereby the extracted cell feature map is used to generate the fluorescence map to be discriminated.
Step 202: input the training cell grayscale image into the generation network model and obtain the fluorescence map to be discriminated output by the generation network model.
In practical applications, the training cell grayscale image can be input into the generation network model to obtain the fluorescence map to be discriminated that it outputs; the generation network model can generate, from the input training cell grayscale image, the fluorescence map to be discriminated predicted by the model.
Specifically, after the training cell grayscale image is input into the generation network model, a first preset number of downsampling operations can be performed, with the max pooling layers compressing the feature map while preserving the cell features.
After the cell feature vector is obtained, the generation network model performs a second preset number of upsampling operations on it to generate the fluorescence map to be discriminated; the first preset number and the second preset number may both be five.
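The five-downsampling / five-upsampling scheme described above can be checked with a little size arithmetic. This is only an illustrative sketch: it assumes each downsampling step (convolution plus 2x2 max pooling) halves the spatial size and each deconvolution doubles it, and the 256-pixel input size is an assumption, since the patent does not give concrete image dimensions.

```python
# Back-of-the-envelope check of the sampling scheme: five halvings of the
# spatial size followed by five doublings bring the feature map back to the
# input resolution. The starting size of 256 is an illustrative assumption.

def downsample_sizes(size, steps):
    """Spatial sizes after each halving downsample step."""
    sizes = [size]
    for _ in range(steps):
        size //= 2
        sizes.append(size)
    return sizes

def upsample_sizes(size, steps):
    """Spatial sizes after each doubling upsample step."""
    sizes = [size]
    for _ in range(steps):
        size *= 2
        sizes.append(size)
    return sizes

down = downsample_sizes(256, 5)   # [256, 128, 64, 32, 16, 8]
up = upsample_sizes(down[-1], 5)  # [8, 16, 32, 64, 128, 256]
print(down, up)
```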
Step 203: input the fluorescence map to be discriminated and the real fluorescence map into the discrimination network model and obtain the discrimination result corresponding to the fluorescence map to be discriminated.
After the fluorescence map to be discriminated is obtained, it can be input into the discrimination network model together with the real fluorescence map, and the discrimination result output by the discrimination network model is obtained. The discrimination network model judges the authenticity of the fluorescence map to be discriminated against the known-real fluorescence map, determines whether the image is a real fluorescence map, and outputs the discrimination result: an output of 1 indicates that the fluorescence map to be discriminated is judged real, and an output of 0 indicates that it is judged fake.
Step 204: adjust the network parameters of the discrimination network model according to the discrimination result.
After the discrimination result is obtained, the network parameters can be adjusted until a target generation network model whose network parameters meet a preset condition is obtained.
In practical applications, the generation network model and the discrimination network model are trained alternately. Specifically, the discrimination network model can be trained first: during its training, the network parameters of the generation network model are fixed, the real fluorescence maps serving as labels and the predicted fluorescence maps generated by the generation network model are input into the discrimination network model, and the network parameters of the discrimination network model are adjusted according to the discrimination results.
Step 205: when the discrimination network model has been trained a set number of times, switch to training the generation network model; alternately train the generation network model and the discrimination network model until the target generation network model is obtained.
When the number of training iterations of the discrimination network model reaches the set number, the current network parameters of the discrimination network model can be fixed and training switches to the generation network model; the generation network model and the discrimination network model are trained alternately in this way until the target generation network model is obtained.
In this embodiment, by alternately training the generation network model and the discrimination network model, the continuously improving discrimination network model can supervise the generation network model while the network parameters are being adjusted, establishing the relationship between training cell grayscale images and the fluorescence maps to be discriminated.
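The alternating schedule of steps 204-205 can be sketched schematically as follows. The `train_discriminator` and `train_generator` callables are stand-ins for the real parameter updates, and the step counts are illustrative assumptions; the patent only specifies that the discriminator is trained a set number of times before training switches to the generator.

```python
# Schematic sketch of the alternating GAN training schedule: the
# discriminator is trained for a set number of steps with the generator's
# parameters fixed, then training switches to the generator, repeating
# until the round budget is exhausted. The train_* callables are stand-ins
# for the real parameter updates.

def alternate_training(train_discriminator, train_generator,
                       d_steps_per_round=5, rounds=3):
    """Run the alternating schedule and return the sequence of updates."""
    schedule = []
    for _ in range(rounds):
        for _ in range(d_steps_per_round):   # generator parameters fixed
            schedule.append(train_discriminator())
        schedule.append(train_generator())   # discriminator parameters fixed
    return schedule

log = alternate_training(lambda: "D", lambda: "G",
                         d_steps_per_round=2, rounds=2)
print(log)  # ['D', 'D', 'G', 'D', 'D', 'G']
```

In a real implementation the loop would also track the generator's loss error and stop once it has converged, as described in the following embodiment.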
In one embodiment, training the generation network model may include the following steps:
inputting the training cell grayscale image into the generation network model and obtaining the fluorescence map to be discriminated output by the generation network model; inputting the fluorescence map to be discriminated and the real fluorescence map into the discrimination network model and obtaining the discrimination result corresponding to the fluorescence map to be discriminated; when the discrimination result indicates that the fluorescence map to be discriminated is fake, calculating the loss error of the generation network model according to the fluorescence map to be discriminated, the real fluorescence map and the discrimination network model; adjusting the network parameters of the generation network model according to the loss error.
In a specific implementation, when the discrimination result indicates that the fluorescence map to be discriminated is fake, the current loss error of the generation network model can be calculated from the fluorescence map to be discriminated, the real fluorescence map and the discrimination network model, and the network parameters of the generation network model can be adjusted according to the loss error to update the model; when adjusting the parameters of the generation network model, the backpropagation algorithm can be used.
After the update, the current number of parameter adjustments can be determined and checked against a preset threshold. If it is below the threshold, the current network parameters still need adjusting, so the process returns to the step of inputting the training cell grayscale image into the generation network model and obtaining the fluorescence map to be discriminated, inputs the newly generated fluorescence map to be discriminated and the real fluorescence map into the discrimination network model to obtain a discrimination result, and continues training the generation network model; otherwise, training can switch back to the discrimination network model.
After a preset number of alternating training rounds, once the loss error corresponding to the current generation network model has converged and no longer changes, the target generation network model can be generated from the network parameters of the current generation network model.
For example, when training the generation network model, the parameters of the discriminator can be set as non-adjustable. When a training cell grayscale image is first input into the generation network model to be trained, the network parameters may be randomly generated because the model has not converged, so the first generated fluorescence map to be discriminated will be judged fake by the discrimination network model; the generation network model can then obtain the loss error through a cost function and adjust its network parameters through the backpropagation algorithm, continually reducing the loss error. When the discrimination network model judges the fluorescence map to be discriminated to be real, the discrimination network model and the generation network model reach equilibrium.
In this embodiment, the network parameters of the generation network model are adjusted according to the loss error, continually optimizing the generation network model so that it can produce increasingly realistic fluorescence maps.
In one embodiment, as shown in FIG. 4, inputting the plurality of grayscale images of cells to be tested into the target generation network model may include:
Step 401: determine the protein expression ability level of each cell to be tested according to the grayscale image of each cell to be tested.
As an example, the protein expression ability level is used to represent a cell's ability to produce the target protein and can be divided into multiple different levels, for example four levels: high expression, medium expression, low expression and no expression.
In practical applications, after the grayscale image of each cell to be tested is acquired, the protein expression ability level of each corresponding cell to be tested can be determined based on the plurality of grayscale images.
Step 402: determine, from the plurality of cells to be tested, a plurality of target cells whose protein expression ability level meets a preset condition.
After the protein expression ability level corresponding to each cell to be tested is determined, a plurality of target cells whose protein expression ability level meets a preset condition can be determined from the plurality of cells to be tested, where the preset condition may be having a specified protein expression ability level.
For example, after the protein expression ability level of each cell to be tested is determined, e.g. after the cells are divided into the four levels of high, medium, low and no expression, the high-expression cells can be determined as the target cells.
Step 403: input the grayscale images respectively corresponding to the plurality of target cells into the target generation network model.
After the plurality of target cells are determined, their corresponding grayscale images can be input into the target generation network model.
In this embodiment, the protein expression ability level of each cell to be tested is determined from its grayscale image, target cells whose level meets a preset condition are selected from the plurality of cells to be tested, and their grayscale images are input into the target generation network model. This enables a preliminary screening of the cells to be tested, filtering out cells with poor or no protein expression and avoiding the prediction interference that low-expressing or non-expressing cells would cause when screening for cells with high protein expression ability.
In one embodiment, determining the protein expression ability level of each cell to be tested according to the grayscale image of each cell to be tested may include the following steps:
inputting the grayscale image of each cell to be tested into a cell classification network model; the cell classification network model is obtained by training an initial convolutional neural network with a plurality of training cell grayscale images having protein expression labels; the protein expression labels are used to represent the protein expression ability level of the cells in each training cell grayscale image; the cell classification network model is used to detect the protein expression ability level of the cells in the grayscale images input into the cell classification network model; determining the protein expression ability level of each cell to be tested according to the output of the cell classification network model.
In practical applications, a cell classification network model can be provided in advance; it can be obtained by training an initial convolutional neural network with a plurality of training cell grayscale images having protein expression labels, where a protein expression label represents the protein expression ability level of the cells in each training cell grayscale image. Through the cell classification network model, the protein expression ability level of the cells in the grayscale images input into it can be predicted.
On this basis, after the grayscale image of each cell to be tested is obtained, the plurality of grayscale images can be input into the cell classification network model to determine the protein expression ability level of each cell to be tested.
In this embodiment, the grayscale image of each cell to be tested is input into the cell classification network model, and the protein expression ability level of each cell is determined according to the output of the model, so that a quick and efficient preliminary assessment of the protein expression ability level of the cells to be tested can be made from cell grayscale images.
In one embodiment, as shown in FIG. 5, the method may further include the following steps:
Step 501: acquire training cell grayscale images and their corresponding real fluorescence maps.
In practical applications, the training cell grayscale images and their corresponding real fluorescence maps can be acquired.
Step 502: determine the value corresponding to the green channel in each real fluorescence map.
In practical applications, the protein produced from the target gene (for example, the exogenous DNA fragment) can fluoresce at a specific wavelength. After the real fluorescence map is obtained, the value corresponding to its green channel, i.e. the G value of the real fluorescence image, can be determined; the G value may also be called the fluorescence value.
Step 503: determine, according to the value corresponding to the green channel in the real fluorescence map, the real protein expression level of the cells in the corresponding training cell grayscale image and the protein expression ability level corresponding to that real protein expression level, and obtain the protein expression label of the training cell grayscale image based on the protein expression ability level.
After the value corresponding to the green channel is determined, the real protein expression level of the cells in the corresponding training cell grayscale image, and the protein expression ability level corresponding to that real expression level, can be determined based on that value; the protein expression ability level can then be used to determine the protein expression label of the training cell grayscale image.
Specifically, the fluorescence value can be positively correlated with the protein expression level; by obtaining the quantitative mapping between the fluorescence value and the protein expression level, the fluorescence value can be used to determine the real protein expression level.
After the real protein expression level of the cells in the training cell grayscale image is obtained, a preset level division table can be obtained; the table specifies, for each value range of the real protein expression level, the corresponding protein expression ability level, so the level corresponding to a real expression level can be determined from the expression level and the table.
Step 504: train the initial convolutional neural network with the protein expression labels and the training cell grayscale images to obtain the cell classification network model.
After the protein expression labels are obtained, the training cell grayscale images with protein expression labels can be used to train the initial convolutional neural network to obtain the cell classification network model.
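The labeling procedure of steps 502-503 can be sketched as follows. The tier boundaries and the calibration factor are illustrative assumptions, since the patent describes the level division table without giving concrete values.

```python
# Hypothetical sketch of label construction: the G (fluorescence) value of
# a real fluorescence map is converted to a real expression level via a
# positive-correlation mapping, which a preset tier table then converts to
# a protein expression ability level. Tier boundaries are assumptions.

TIERS = [(200.0, "high"), (100.0, "medium"), (1.0, "low"), (0.0, "none")]

def expression_label(g_value, calibration=1.0):
    """Map a G value to an ability-level label via the tier table."""
    expression = calibration * g_value          # positive correlation
    for lower_bound, level in TIERS:
        if expression >= lower_bound:
            return level
    return "none"

print(expression_label(230.0))  # high
print(expression_label(40.0))   # low
print(expression_label(0.0))    # none
```

The resulting labels would then be paired with the corresponding training cell grayscale images to supervise the classification network.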
In practical applications, the initial convolutional neural network may include a first network structure, a second network structure, a third network structure and a fourth network structure.
In one embodiment, as shown in Figure 7, an apparatus for predicting protein expression level is provided, which may include:
a cell grayscale image acquisition module 701, configured to obtain grayscale images of multiple cells under test in a cell culture pool;
a first input module 702, configured to input the multiple grayscale images of cells under test into a target generative network model, where the target generative network model is obtained by training a generative adversarial network model with multiple training cell grayscale images, each training cell grayscale image having a corresponding fluorescence-image label, the label being the real fluorescence image corresponding to that training image;
a predicted fluorescence image acquisition module 703, configured to obtain, from the output of the target generative network model, the predicted fluorescence images corresponding to the multiple cells under test;
a protein expression determination module 704, configured to determine, from the predicted fluorescence images, the protein expression levels of the multiple cells under test.
In one embodiment, the first input module 702 includes:
a first expression-capability grading submodule, configured to determine the protein expression capability grade of each cell under test from its grayscale image;
a screening submodule, configured to identify, among the multiple cells under test, multiple target cells whose protein expression capability grade satisfies a preset condition;
a second input submodule, configured to input the grayscale images corresponding to the multiple target cells into the target generative network model.
In one embodiment, the expression-capability grading submodule includes:
a third input unit, configured to input the grayscale image of each cell under test into a cell classification network model, where the cell classification network model is obtained by training an initial convolutional neural network with multiple training cell grayscale images carrying protein-expression labels, the protein-expression labels characterize the protein expression capability grade of the cells in each training cell grayscale image, and the cell classification network model detects the protein expression capability grade of the cells in grayscale images input to it;
an output unit, configured to determine the protein expression capability grade of each cell under test from the output of the cell classification network model.
In one embodiment, the apparatus may further include:
a real fluorescence image acquisition module, configured to obtain training cell grayscale images and their corresponding real fluorescence images;
a first fluorescence value determination module, configured to determine the green-channel values of the real fluorescence images;
a second expression-capability grading module, configured to determine, from the green-channel values, the real protein expression levels of the cells in the corresponding training cell grayscale images and the protein expression capability grades corresponding to those real expression levels, and to derive the protein-expression labels of the training cell grayscale images from those grades;
a cell classification network model generation module, configured to train the initial convolutional neural network with the protein-expression labels and the training cell grayscale images to obtain the cell classification network model.
In one embodiment, the apparatus may further include:
a first model acquisition module, configured to obtain a generative adversarial network model, training cell grayscale images, and their corresponding real fluorescence images, the generative adversarial network model including a generative network model and a discriminative network model to be trained;
a first candidate fluorescence image generation module, configured to input the training cell grayscale images into the generative network model and obtain the candidate fluorescence images it outputs;
a discrimination result acquisition module, configured to input the candidate fluorescence images and the real fluorescence images into the discriminative network model and obtain the discrimination results for the candidate fluorescence images;
a discriminative network parameter adjustment module, configured to adjust the network parameters of the discriminative network model according to the discrimination results;
a model training switching module, configured to switch to training the generative network model once the discriminative network model has been trained a set number of times, and to train the generative network model and the discriminative network model alternately until the target generative network model is obtained.
In one embodiment, the model training switching module includes:
a second candidate fluorescence image generation submodule, configured to input the training cell grayscale images into the generative network model and obtain the candidate fluorescence images it outputs;
a second discrimination result acquisition submodule, configured to input the candidate fluorescence images and the real fluorescence images into the discriminative network model and obtain the discrimination results for the candidate fluorescence images;
a loss determination submodule, configured to compute, when a discrimination result indicates that a candidate fluorescence image is fake, the loss of the generative network model from the candidate fluorescence image, the real fluorescence image, and the discriminative network model;
a generative network parameter adjustment submodule, configured to adjust the network parameters of the generative network model according to the loss.
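The alternating schedule these modules implement (update the discriminator for a set number of steps, then switch to the generator, and repeat until the target generative network is obtained) can be sketched abstractly; the update steps are injected as callables, and the step counts are illustrative:

```python
def alternate_training(train_d_step, train_g_step,
                       d_steps_per_switch=5, switches=3):
    """Alternate GAN training: run the discriminator update a set
    number of times, then switch to one generator update, and repeat
    for a fixed number of switch cycles."""
    schedule = []
    for _ in range(switches):
        for _ in range(d_steps_per_switch):
            train_d_step()               # adjust discriminator parameters
            schedule.append("D")
        train_g_step()                   # adjust generator parameters
        schedule.append("G")
    return schedule
```

A real implementation would stop on a convergence criterion rather than a fixed number of cycles; the fixed loop keeps the sketch deterministic.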
In one embodiment, the protein expression determination module includes:
a second fluorescence value determination submodule, configured to determine the green-channel values of the multiple predicted fluorescence images;
a mapping submodule, configured to determine the protein expression levels of the multiple cells under test from the green-channel values of the multiple predicted fluorescence images.
In one embodiment, the apparatus may further include:
a sorting module, configured to sort the protein expression levels of the multiple cells under test and determine, from the sorted expression levels, a preset number of top-ranked levels as target expression levels;
a target cell determination module, configured to obtain the target grayscale images corresponding to the target expression levels and determine the cells under test corresponding to those images as target cells for culturing cell lines.
For specific limitations of the apparatus for predicting protein expression level, reference may be made to the limitations of the method for predicting protein expression level above, which are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided; the device may be a terminal, and its internal structure may be as shown in Figure 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input apparatus connected via a system bus. The processor provides computing and control capability. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The communication interface is used for wired or wireless communication with external terminals, where wireless communication may be implemented via Wi-Fi, a carrier network, NFC (near-field communication), or other technologies. When executed by the processor, the computer program implements a method for predicting protein expression level. The display screen may be a liquid-crystal or electronic-ink display; the input apparatus may be a touch layer covering the display, a key, trackball, or touchpad on the device housing, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of part of the structure relevant to the solution of the present application and does not limit the computer devices to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the following steps:
obtaining grayscale images of multiple cells under test in a cell culture pool;
inputting the multiple grayscale images of cells under test into a target generative network model, where the target generative network model is obtained by training a generative adversarial network model with multiple training cell grayscale images, each training cell grayscale image having a corresponding fluorescence-image label, the label being the real fluorescence image corresponding to that training image;
obtaining, from the output of the target generative network model, the predicted fluorescence images corresponding to the multiple cells under test;
determining, from the predicted fluorescence images, the protein expression levels of the multiple cells under test.
In one embodiment, when executing the computer program, the processor also implements the steps of the other embodiments above.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the following steps:
obtaining grayscale images of multiple cells under test in a cell culture pool;
inputting the multiple grayscale images of cells under test into a target generative network model, where the target generative network model is obtained by training a generative adversarial network model with multiple training cell grayscale images, each training cell grayscale image having a corresponding fluorescence-image label, the label being the real fluorescence image corresponding to that training image;
obtaining, from the output of the target generative network model, the predicted fluorescence images corresponding to the multiple cells under test;
determining, from the predicted fluorescence images, the protein expression levels of the multiple cells under test.
In one embodiment, when executed by a processor, the computer program also implements the steps of the other embodiments above.
Those of ordinary skill in the art will understand that all or part of the procedures of the above method embodiments may be accomplished by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the procedures of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments of the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these features are described; however, any combination of these technical features that involves no contradiction shall be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they shall not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims (10)

  1. A method for predicting protein expression level, characterized in that the method comprises:
    obtaining grayscale images of multiple cells under test in a cell culture pool;
    inputting the multiple grayscale images of cells under test into a target generative network model, wherein the target generative network model is obtained by training a generative adversarial network model with multiple training cell grayscale images, the multiple training cell grayscale images each have a corresponding fluorescence-image label, and the fluorescence-image label is the real fluorescence image corresponding to the training cell grayscale image;
    obtaining, from the output of the target generative network model, the predicted fluorescence images corresponding to the multiple cells under test; and
    determining, from the predicted fluorescence images, the protein expression levels of the multiple cells under test.
  2. The method of claim 1, characterized in that inputting the multiple grayscale images of cells under test into the target generative network model comprises:
    determining the protein expression capability grade of each cell under test from its grayscale image;
    identifying, among the multiple cells under test, multiple target cells whose protein expression capability grade satisfies a preset condition; and
    inputting the grayscale images corresponding to the multiple target cells into the target generative network model.
  3. The method of claim 2, characterized in that determining the protein expression capability grade of each cell under test from its grayscale image comprises:
    inputting the grayscale image of each cell under test into a cell classification network model, wherein the cell classification network model is obtained by training an initial convolutional neural network with multiple training cell grayscale images carrying protein-expression labels, the protein-expression labels characterize the protein expression capability grade of the cells in each training cell grayscale image, and the cell classification network model detects the protein expression capability grade of the cells in grayscale images input to the model; and
    determining the protein expression capability grade of each cell under test from the output of the cell classification network model.
  4. The method of claim 3, characterized by further comprising:
    obtaining training cell grayscale images and their corresponding real fluorescence images;
    determining the green-channel values of the real fluorescence images;
    determining, from the green-channel values of the real fluorescence images, the real protein expression levels of the cells in the corresponding training cell grayscale images and the protein expression capability grades corresponding to those real expression levels, and deriving the protein-expression labels of the training cell grayscale images from those grades; and
    training the initial convolutional neural network with the protein-expression labels and the training cell grayscale images to obtain the cell classification network model.
  5. The method of claim 1, characterized by further comprising:
    obtaining a generative adversarial network model, training cell grayscale images, and their corresponding real fluorescence images, wherein the generative adversarial network model comprises a generative network model and a discriminative network model to be trained;
    inputting the training cell grayscale images into the generative network model and obtaining the candidate fluorescence images it outputs;
    inputting the candidate fluorescence images and the real fluorescence images into the discriminative network model and obtaining the discrimination results for the candidate fluorescence images;
    adjusting the network parameters of the discriminative network model according to the discrimination results; and
    once the discriminative network model has been trained a set number of times, switching to training the generative network model, and training the generative network model and the discriminative network model alternately until the target generative network model is obtained.
  6. The method of claim 5, characterized in that training the generative network model comprises:
    inputting the training cell grayscale images into the generative network model and obtaining the candidate fluorescence images it outputs;
    inputting the candidate fluorescence images and the real fluorescence images into the discriminative network model and obtaining the discrimination results for the candidate fluorescence images;
    when a discrimination result indicates that a candidate fluorescence image is fake, computing the loss of the generative network model from the candidate fluorescence image, the real fluorescence image, and the discriminative network model; and
    adjusting the network parameters of the generative network model according to the loss.
  7. The method of claim 1, characterized by further comprising:
    sorting the protein expression levels corresponding to the multiple cells under test, and determining, from the sorted protein expression levels, a preset number of top-ranked protein expression levels as target expression levels; and
    obtaining the target grayscale images corresponding to the target expression levels, and determining the cells under test corresponding to the target grayscale images as target cells for culturing cell lines.
  8. An apparatus for predicting protein expression level, characterized in that the apparatus comprises:
    a cell grayscale image acquisition module, configured to obtain grayscale images of multiple cells under test in a cell culture pool;
    a first input module, configured to input the multiple grayscale images of cells under test into a target generative network model, wherein the target generative network model is obtained by training a generative adversarial network model with multiple training cell grayscale images, each training cell grayscale image having a corresponding fluorescence-image label, the label being the real fluorescence image corresponding to that training image;
    a predicted fluorescence image acquisition module, configured to obtain, from the output of the target generative network model, the predicted fluorescence images corresponding to the multiple cells under test; and
    a protein expression determination module, configured to determine, from the predicted fluorescence images, the protein expression levels of the multiple cells under test.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method for predicting protein expression level of any one of claims 1 to 7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for predicting protein expression level of any one of claims 1 to 7.
PCT/CN2021/114173 2020-08-26 2021-08-24 蛋白表达量的预测方法、装置、计算机设备和存储介质 WO2022042510A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010870702.0A CN112001329B (zh) 2020-08-26 2020-08-26 蛋白表达量的预测方法、装置、计算机设备和存储介质
CN202010870702.0 2020-08-26

Publications (1)

Publication Number Publication Date
WO2022042510A1 true WO2022042510A1 (zh) 2022-03-03

Family

ID=73471063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114173 WO2022042510A1 (zh) 2020-08-26 2021-08-24 蛋白表达量的预测方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN112001329B (zh)
WO (1) WO2022042510A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017730B (zh) * 2020-08-26 2022-08-09 深圳太力生物技术有限责任公司 基于表达量预测模型的细胞筛选方法和装置
CN112001329B (zh) * 2020-08-26 2021-11-30 深圳太力生物技术有限责任公司 蛋白表达量的预测方法、装置、计算机设备和存储介质
CN112861986B (zh) * 2021-03-02 2022-04-22 广东工业大学 一种基于卷积神经网络的血脂亚组分含量检测方法
CN113539374A (zh) * 2021-06-29 2021-10-22 深圳先进技术研究院 高热稳定性酶的蛋白序列生成方法、装置、介质和设备
CN113782093B (zh) * 2021-09-16 2024-03-05 平安科技(深圳)有限公司 一种基因表达填充数据的获取方法及装置、存储介质

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109815870A (zh) * 2019-01-17 2019-05-28 华中科技大学 细胞表型图像定量分析的高通量功能基因筛选方法及系统
CN109903284A (zh) * 2019-03-04 2019-06-18 武汉大学 一种her2免疫组化图像自动判别方法及系统
WO2019172901A1 (en) * 2018-03-07 2019-09-12 Google Llc Virtual staining for tissue slide images
CN112001329A (zh) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 蛋白表达量的预测方法、装置、计算机设备和存储介质

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
NZ555575A (en) * 2004-11-05 2010-11-26 Us Gov Sec Navy Diagnosis and prognosis of infectious disease clinical phenotypes and other physiologic states using host gene expression biomarkers in blood
SG10201507721SA (en) * 2012-01-20 2015-10-29 Agency Science Tech & Res CHO-GMT Recombinant Protein Expression
CN108376565B (zh) * 2018-02-13 2022-07-19 北京市神经外科研究所 一种脑胶质瘤Ki-67表达水平的影像组学预测方法
CN109061131A (zh) * 2018-06-29 2018-12-21 志诺维思(北京)基因科技有限公司 染色图片处理方法及装置
US10885631B2 (en) * 2019-02-01 2021-01-05 Essen Instruments, Inc. Label-free cell segmentation using phase contrast and brightfield imaging
CN109903282B (zh) * 2019-02-28 2023-06-09 安徽省农业科学院畜牧兽医研究所 一种细胞计数方法、系统、装置和存储介质
CN110136103B (zh) * 2019-04-24 2024-05-28 平安科技(深圳)有限公司 医学影像解释方法、装置、计算机设备及存储介质
CN110853703A (zh) * 2019-10-16 2020-02-28 天津大学 一种对蛋白质二级结构进行半监督学习预测方法

Non-Patent Citations (1)

Title
NACEI: "A Preliminary Screening Method for Cell Expression Levels based on Gray-level Co-occurrence Matrix", BAIDU SNAPSHOTS, pages 1 - 2, XP009534819, [retrieved on 20211018] *

Also Published As

Publication number Publication date
CN112001329A (zh) 2020-11-27
CN112001329B (zh) 2021-11-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860345

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21860345

Country of ref document: EP

Kind code of ref document: A1