CN114970862A - PDL1 expression level prediction method based on multi-instance knowledge distillation model


Info

Publication number: CN114970862A
Authority: CN (China)
Prior art keywords: network, image, PDL1, distribution, training
Legal status: Granted
Application number: CN202210460006.1A
Other languages: Chinese (zh)
Other versions: CN114970862B (en)
Inventors: Bai Xiangzhi (白相志), Jin Darui (晋达睿)
Current Assignee: Beihang University
Original Assignee: Beihang University
Priority/filing date: 2022-04-28
Application filed by Beihang University; priority to CN202210460006.1A
Publication of CN114970862A; application granted, publication of CN114970862B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
          • G06N 5/00 Computing arrangements using knowledge-based models
            • G06N 5/02 Knowledge representation; Symbolic representation
              • G06N 5/022 Knowledge engineering; Knowledge acquisition
                • G06N 5/025 Extracting rules from data
      • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16H HEALTHCARE INFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
          • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
            • G16H 10/60 ICT specially adapted for patient-specific data, e.g. for electronic patient records


Abstract

The invention relates to a PDL1 expression level prediction method based on a multi-instance knowledge distillation model, comprising the following steps. Step one: image-block-level PDL1 expression level prediction is performed on digital pathology images using a teacher-student convolutional neural network model based on multi-instance learning. Step two: a weighted distillation loss function is constructed to constrain the optimization of the student network S and reasonably configure the PDL1 weak-label supervision information. Step three: global aggregated features are classified with a deep random forest model to realize PDL1 expression level prediction at the digital pathology image level. The method constructs the association between histopathology image morphological features and the PDL1 molecular phenotype, and designs a feature extraction framework combining multi-instance learning and knowledge distillation for the weak-label problem, thereby ensuring the reliability and stability of the PDL1 expression level prediction results. The invention can be closely combined with medical systems and applied in fields related to pathological analysis, and has broad market prospects and application value.

Description

PDL1 expression level prediction method based on multi-instance knowledge distillation model
Technical Field
The invention relates to a PDL1 expression level prediction method based on a multi-instance knowledge distillation model, and belongs to the fields of computational pathology, pattern recognition and computer vision. It mainly involves multi-instance learning, self-supervised learning and semi-supervised learning, and has broad application prospects in medical systems and fields related to pathological analysis.
Background
Computational pathology refers to the analysis and interpretation of digitized pathology information using machine learning and deep learning algorithms. In this process, pathological sections sampled from a patient are digitized by a fully automatic scanner into high-resolution images in a specific format and seamlessly stitched from multiple high-precision views on a computer; the resulting images are called whole-slide images (WSI). The maximum physical resolution of a typical whole-slide image is 0.25 micron/pixel, and the resolution of a single slide image can reach 15000×15000 to 100000×100000 pixels, so whole-slide images contain rich visual pathological information. Common analyses in digital pathology currently include determining tissue/cell malignancy, quantifying the expression of cell-level biomarkers, and detecting and classifying specific types of tissue/cell/subcellular structures; the related algorithms and models overlap heavily with classification, segmentation and detection tasks in computer vision. In digital pathology analysis, prognosis, biomarker analysis and similar tasks, computer vision models exhibit performance comparable or even superior to that of pathologists. Especially in applications involving subcellular structures, such as tumor immune microenvironment analysis, pathologists generally must rely on whole-genome or exome sequencing, which carries large time and capital costs, to obtain definite conclusions, whereas machine learning models can mine latent features and patterns in pathological image information to realize effective quantitative analysis and computation. Considering the large number of cancer patients, the shortage of pathologists, and the fluctuation in the confidence of pathological conclusions caused by variation in physician skill that currently exist in China and worldwide, the development of digital pathology image analysis technology is of great significance for reducing the cost of pathological research, improving interpretation accuracy and filling gaps in medical resources.
In recent years, deep learning has developed rapidly and attracted attention in many fields. Thanks to the open-sourcing of several large pathological image datasets, data-driven approaches began to be applied to pathological image analysis. CAMELYON is the first large sentinel lymph node metastasis dataset with pixel-level annotation published in computational pathology, comprising 1399 whole-slide images and their corresponding tumor region annotations (see: G. Litjens, P. Bandi, B. E. Bejnordi, et al., "1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset," GigaScience, vol. 7, no. 6, giy065, 2018). With the release of such datasets, deep learning network models began to be designed for tumor region detection and subtype classification, cell detection and counting, mitosis detection, and tumor staging. Based on the CAMELYON16 dataset, D. Wang et al. proposed a GoogLeNet-based two-stage breast cancer sentinel lymph node metastasis detection model (image-block-level classification followed by slide-level classification): pathological images are first cut into blocks and an image-block classifier is trained under pixel-level label supervision; then, based on the block classification results, several geometric and morphological features are manually constructed to classify the slides (see: D. Wang, A. Khosla, R. Gargeya, H. Irshad and A. H. Beck, "Deep learning for identifying metastatic breast cancer," preprint, https://arxiv.org/abs/1606.05718, 2016). X. Wang et al. further introduced coarse annotations of pathological sections, constructing a loss function that trains the image-block classifier under the joint supervision of partial coarse labels and slide-level labels, and built a slide-level classifier integrating local, category and global feature descriptors to discriminate lung cancer slides (see: X. Wang, et al., "Weakly supervised deep learning for whole slide lung cancer image analysis," IEEE Transactions on Cybernetics, vol. 50, no. 9, pp. 3950-3962, 2020). Syrykh et al. discriminated follicular hyperplasia from follicular lymphoma using the mean of the output probabilities obtained from an image-block classifier (see: C. Syrykh, A. Abreu, N. Amara, et al., "Accurate diagnosis of lymphoma on whole-slide histopathology images using deep learning," npj Digital Medicine, vol. 3, 63, 2020). Tabibu et al. used a similar deep learning framework for subtype classification and prediction of the corresponding survival (see: S. Tabibu, P. K. Vinod and C. V. Jawahar, "Pan-Renal Cell Carcinoma classification and survival prediction from histopathology images using deep learning," Scientific Reports, vol. 9, 10509, 2019).
Rawat et al. proposed the concept of a "tissue fingerprint", holding that histopathology images contain key individual-variability information; they mapped image blocks to individual identities using neural networks and discriminated the ER, PR and HER2 status of breast cancer based on the features extracted by the networks (see: R. R. Rawat, I. Ortega, P. Roy, et al., "Deep learned tissue 'fingerprints' classify breast cancers by ER/PR/Her2 status from H&E images," Scientific Reports, vol. 10, 7275, 2020). Iizuka et al. extracted features of gastric and colonic cancer histopathology images based on the Inception-V3 network and judged the image-block features comprehensively using a recurrent neural network (see: O. Iizuka, et al., "Deep learning models for histopathological classification of gastric and colonic epithelial tumours," Scientific Reports, vol. 10, 1504, 2020). Xu et al. proposed a histopathology image segmentation and classification framework based on multi-instance learning, which constructs max-max and max-min multi-instance learning logics, uses them to reassign weak labels, and converts the weakly supervised problem into a supervised one (see: G. Xu, et al., "CAMEL: A weakly supervised learning framework for histopathology image segmentation," in Proc. IEEE/CVF International Conference on Computer Vision, pp. 10681-10690, 2019).
Regarding the relationship between histomorphological features and pathological information in sections, the above methods are mainly limited to problems for which a pathologist can reach an exact conclusion through assisted observation with equipment such as a microscope, for example distinguishing benign from malignant tissue, distinguishing different tumor types, and judging tumor subtypes, where the corresponding morphological relationships are relatively clear. Regarding supervision labels, most methods either rely on slide-level weak labels for global supervision, which severely limits task accuracy, or depend on the few datasets with pixel-level annotation, which places high requirements on the specific application scenario. Some weakly supervised learning algorithms explicitly constrain the data distribution and are inapplicable to data that do not satisfy the assumption, so they cannot be effectively extended in the field of pathological image analysis. The expression level of PDL1 is generally determined by mRNA sequencing or immunohistochemistry and belongs to the biomacromolecular level; it cannot be determined by a pathologist observing an ordinary hematoxylin-eosin stained section, and its morphological correlates are unknown. The PDL1 expression level prediction method based on the multi-instance knowledge distillation model can construct the association between histopathology image morphological features and the PDL1 molecular phenotype, and designs a feature extraction framework combining multi-instance learning and knowledge distillation for the weak-label problem, ensuring the reliability and stability of the PDL1 expression level prediction results.
Disclosure of Invention
In view of the above problems, the invention aims to provide a PDL1 expression level prediction method based on a multi-instance knowledge distillation model, which constructs the association between histopathology image morphological features and the PDL1 molecular phenotype, and designs a feature extraction framework combining multi-instance learning and knowledge distillation for the weak-label problem, thereby ensuring the reliability and stability of PDL1 expression level prediction results.
To achieve this purpose, the overall technical scheme is based on a multi-instance learning framework: a teacher-student convolutional neural network model extracts features of digital pathological image blocks and predicts PDL1 expression at the image block level; multi-scale global features of the corresponding digital pathological images are constructed from the multi-scale block-level prediction results; and a deep random forest discriminates the global features to realize PDL1 expression level prediction for the corresponding case. The technical idea of the algorithm is mainly embodied in the following four aspects:
1) designing a maximization-minimization training framework based on multi-instance learning to supervise and optimize PDL1 high-expression and low-expression instances separately;
2) constructing an averaged teacher-student convolutional neural network pair, in which the teacher network and the classification loss on typical image-block instances constrain the data distribution of the student network, and the student network in turn updates the teacher network;
3) constructing a weighted distillation loss function to specifically address class imbalance within and between samples and label generalization;
4) classifying pathological-image global features, fused from image-block-level features and prediction results, with a deep random forest to predict the case-level PDL1 expression.
The invention relates to a PDL1 expression level prediction method based on a multi-instance knowledge distillation model, which comprises the following specific steps:
the method comprises the following steps: performing digital pathological image block level PDL1 expression level prediction by using a teacher-student convolutional neural network model based on multi-instance learning;
constructing a teacher network T and a student network S by using a backbone network based on ResNet34 pre-trained on ImageNet, and training the student network under a maximization-minimization multi-sample learning framework by using the number of Fragments (FPKM) matched in every thousand basic groups read by sequencing the case to which the pathological image corresponding to the image block belongs as an initial label of expression level. Optimizing student network S parameters based on teacher network T distribution constraint and typical image block example classification loss; and updating the teacher network T parameter in an exponential moving average mode based on the updated student network S parameter, updating each sample label according to the data distribution output by the teacher network after each training, and reserving the original labels of the typical image blocks in a certain proportion for subsequent training. The final output image block corresponds to the PDL1 expression level prediction result. The specific process is as follows:
s11, considering weak label attributes and specific distribution conditions of digital pathological images, presuming that high-expression image blocks and low-expression image blocks exist in the PDL1 high-expression pathological section and the low-expression pathological section in a certain proportion, and constructing a multi-instance learning training framework. And (3) giving the initial image block labels the same as the corresponding pathological images, and performing typicality judgment on each image block of the training set based on the intermediate state network of each stage in the training process. And constructing a maximization-minimization sample screening mode based on the corresponding hypothesis, and respectively selecting high-expression typical image blocks and low-expression typical image blocks in a certain proportion, wherein the labels of the high-expression typical image blocks and the low-expression typical image blocks are the same as those of the images, and the rest image blocks are set to be label-free. And continuing training the network based on the updated data set distribution, and continuously and iteratively updating the data set distribution and the network parameters.
S12. On the basis of this training framework, a teacher-student (T-S) network pair with a twin structure is constructed. The student network S parameter update is constrained by the maximization-minimization multi-instance learning framework: labeled image blocks under the current data set distribution constrain the student update through a cross-entropy classification loss, while unlabeled data participate in the student network S update by measuring the consistency between the teacher network output distribution and the student network output distribution. After the student parameters are updated, the teacher network T parameters are updated by exponential moving average. T and S supervise each other and are updated alternately during training until convergence.
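For illustration only (not part of the patent's own disclosure), the alternating update of steps S11-S12 could be sketched as below in PyTorch; the network, argument and hyperparameter names are assumptions, and the two loss terms are simplified stand-ins for the weighted distillation loss detailed in step two.

```python
# Minimal sketch of the alternating teacher-student update (steps S11-S12),
# assuming PyTorch; all names and hyperparameters here are illustrative.
import copy
import torch
import torch.nn.functional as F
from torchvision.models import resnet34

student = resnet34(num_classes=2)              # student network S
teacher = copy.deepcopy(student)               # teacher network T (twin structure)
for p in teacher.parameters():
    p.requires_grad_(False)                    # T is updated only by moving average

optimizer = torch.optim.Adam(student.parameters(), lr=1e-2)

@torch.no_grad()
def ema_update(alpha):
    # theta_T <- alpha * theta_T + (1 - alpha) * theta_S  (exponential moving average)
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)

def train_iteration(x_labeled, y, x_unlabeled_t, x_unlabeled_s, eta, alpha):
    """One student update: CE on typical labeled blocks + T/S output consistency."""
    ce = F.cross_entropy(student(x_labeled), y)
    with torch.no_grad():
        t_prob = F.softmax(teacher(x_unlabeled_t), dim=1)  # teacher reference
    s_prob = F.softmax(student(x_unlabeled_s), dim=1)
    consistency = torch.linalg.vector_norm(s_prob - t_prob, dim=1).mean()
    loss = ce + eta * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(alpha)                          # teacher follows the updated student
    return loss.item()
```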
Step two: constructing a weighted distillation loss function to constrain student network optimization, and reasonably configuring PDL1 weak label supervision information
Considering the number of fragments matched in each thousand bases obtained by sequencing of cases as the basis of the PDL1 expression level, one third of the samples with the highest reading is taken as a high expression sample, and the rest are taken as low expression samples. The number of low expression samples is about twice that of high expression samples. A large number of low-expression-level image blocks may exist in the high-expression samples at the same time, and the data has the characteristic of obvious sample distribution imbalance, so the method introduces the weighting cross entropy classification loss function into the loss function. Meanwhile, the problem of sample weak labels is considered, a maximization-minimization multi-instance learning loss function is constructed, the classification loss function acts on strong-representation positive instances of the positive samples and strong-representation negative instances of the negative samples, and fuzzy instances which are inconsistent with the weak label attributes are prevented from being introduced into a training process as noise. A distribution consistency function is further introduced on the basis, the consistency of the student network output and the teacher network output is restrained, and the overfitting problem on a few samples is avoided while the non-strong representation examples are effectively utilized.
The weighted distillation loss function is composed as follows. In the k-th training round, for each PDL1 high-expression case, the 50% of instances with the highest positive probability under the teacher network T are selected from all image blocks of the case to form the set $B_k^+$; for each low-expression case, the 50% with the lowest positive probability form the set $B_k^-$. The instances in $B_k^+$ and $B_k^-$ are labeled positive and negative respectively, and the remaining instances are unlabeled samples. For labeled instances, a weighted cross-entropy classification loss is computed:

$$L_{weighted\text{-}CE} = -\alpha_1\,\mathbb{E}_{x_i \in B_k^+}\left[\log P_+(x_i)\right] - \alpha_0\,\mathbb{E}_{x_j \in B_k^-}\left[\log P_-(x_j)\right]$$

where $x_i$ is an image-block sample in $B_k^+$, $x_j$ is an image-block sample in $B_k^-$, $P_+(x_i)$ is the probability of high expression output by the student network for $x_i$, $P_-(x_j)$ is the probability of low expression for $x_j$, $\alpha_0$ and $\alpha_1$ are the weights of the low- and high-expression classification losses respectively, and $\mathbb{E}[\cdot]$ is the mathematical expectation over the indicated set. For the unlabeled sample set $U_k$, the consistency of its output distributions under the student network S and the teacher network T is computed: input data pairs are formed by applying random data augmentations of different strengths to the same input, and the $\ell_2$ norm measures the distance between student and teacher outputs:

$$L_{consistency} = \mathbb{E}_{x \in U_k}\left[\left\| f_{\theta'}\bigl(t'(x)\bigr) - f_{\theta}\bigl(t(x)\bigr) \right\|_2\right]$$

where $t(\cdot)$ and $t'(\cdot)$ are different forms of random data augmentation, and $f_{\theta}(\cdot)$ and $f_{\theta'}(\cdot)$ are the mapping functions of the teacher network T and the student network S respectively. The weighted distillation loss is the weighted combination of the two, $L = L_{weighted\text{-}CE} + \eta L_{consistency}$, where $\eta$ is the weight of the distribution consistency loss. The adaptive momentum estimation optimizer (Adam) is used, the learning rates of the teacher network T and the student network S are both set to $10^{-2}$, and the loss function is optimized by adjusting the network weights through gradient backpropagation.
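As a non-authoritative sketch of this loss, the weighted cross-entropy and consistency terms above might be written as follows in PyTorch; the 50% split follows the text, while the epsilon guard and the concrete $\alpha_0$, $\alpha_1$, $\eta$ values are illustrative assumptions.

```python
# Sketch of the weighted distillation loss L = L_weighted-CE + eta * L_consistency,
# assuming PyTorch; alpha0/alpha1 values and the epsilon guard are illustrative.
import torch

def split_instances(pos_probs, high_expression_case, keep=0.5):
    """Label the most typical 50% of a case's blocks; return (labeled, unlabeled) indices."""
    k = max(1, int(keep * pos_probs.numel()))
    order = torch.argsort(pos_probs, descending=high_expression_case)
    return order[:k], order[k:]        # top half for B+, bottom half for B-

def weighted_ce(p_pos_on_Bplus, p_neg_on_Bminus, alpha0=1.0, alpha1=2.0):
    # alpha1 > alpha0 here upweights the rarer high-expression class (values assumed)
    eps = 1e-8
    return (-alpha1 * (p_pos_on_Bplus + eps).log().mean()
            - alpha0 * (p_neg_on_Bminus + eps).log().mean())

def consistency(student_prob, teacher_prob):
    # l2 distance between differently-augmented student/teacher output distributions
    return torch.linalg.vector_norm(student_prob - teacher_prob, dim=1).mean()

def distillation_loss(p_pos_Bplus, p_neg_Bminus, s_prob_U, t_prob_U, eta=1.0):
    return weighted_ce(p_pos_Bplus, p_neg_Bminus) + eta * consistency(s_prob_U, t_prob_U)
```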
Step three: global aggregation characteristics are classified by using a depth stochastic model, and digital pathological image level PDL1 expression level prediction is realized
On the basis of the block-level PDL1 prediction results, a series of global features is constructed to describe the digital pathological image. The manually constructed global features comprise: the percentage of image blocks classified as positive (high expression) in a single digital pathological image, the histogram of block positive probabilities, the median block positive probability, the mean block positive probability, and the 512-dimensional average feature of a specified proportion of image blocks extracted by the average pooling layer of the ResNet34 network model. The block-level network covers 3 scales: 20x magnification (objective magnification), 10x and 5x. The features output by the block-level prediction network at the three scales are concatenated in order to obtain the multi-scale global features of the corresponding case; the cases consistent with the block-level network training set are used as training samples for the digital-pathological-image PDL1 expression prediction model, and a deep random forest is constructed to realize image-level PDL1 expression level prediction.
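A sketch of this slide-level stage, assuming NumPy and scikit-learn: RandomForestClassifier is used here as a stand-in for the deep random forest named in the text, the 525-dimensional per-scale feature width (1 + 10 + 1 + 1 + 512) follows the feature list above, and the array names and case counts are illustrative.

```python
# Sketch of multi-scale feature fusion and slide-level prediction (step three);
# RandomForestClassifier stands in for the deep random forest of the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse_scales(feat_20x, feat_10x, feat_5x):
    # concatenate the per-case global features from the three magnifications
    return np.concatenate([feat_20x, feat_10x, feat_5x], axis=1)

# hypothetical data: 576 training cases, one 525-d global feature per scale
rng = np.random.default_rng(0)
V_train = fuse_scales(rng.random((576, 525)), rng.random((576, 525)),
                      rng.random((576, 525)))
y_train = rng.integers(0, 2, size=576)         # 1 = high PDL1 expression

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(V_train, y_train)
V_test = fuse_scales(rng.random((1, 525)), rng.random((1, 525)),
                     rng.random((1, 525)))
print(forest.predict(V_test))                  # slide-level PDL1 expression label
```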
The overall flow of the PDL1 expression level prediction method based on the multi-instance knowledge distillation model is shown in fig. 1: the teacher network T and the student network S are trained iteratively on weak-label-supervised image-block training data, and the trained teacher network T outputs the predictions and corresponding deep features of all image blocks of a tested digital pathological image. Global features of the corresponding image are then obtained statistically from the block predictions, and the case's PDL1 expression is predicted by a deep random forest on the multi-scale global features.
The advantages and effects of the invention are as follows. Aiming at the common weak-label supervision problem in digital pathological image analysis, the invention provides a feature extraction framework combining multi-instance learning and a knowledge distillation model: dynamic labels are given to training samples based on the multi-instance learning idea, and instance typicality is judged automatically; with the knowledge distillation framework, unlabeled data are reintroduced to constrain the parameter distributions of the averaged teacher network and the student network, which effectively expands the training data while preventing the model from overfitting a specific pattern and lets the network benefit from parameter averaging over a long time span. Multi-scale global features of the digital pathological image are obtained by constructing multi-scale block-level classification models, and combining block-level prediction statistics with deep features enriches the receptive field of the global features, realizing accurate prediction of the PDL1 expression level of the corresponding digital pathological section. Compared with traditional means such as whole-genome and whole-exome sequencing, the proposed algorithm markedly reduces capital and time costs while maintaining high prediction accuracy; it can serve as an effective supplement to sequencing and has broad market prospects and application value.
Drawings
FIG. 1 is a general flow diagram of a PDL1 expression level prediction method based on a multi-instance knowledge distillation model.
FIG. 2 is a block level multi-instance knowledge distillation model framework.
Fig. 3 is a schematic diagram of a method for predicting the expression level of a multi-scale digital pathology image PDL 1.
FIGS. 4a-4d show the distributions of PDL1 expression predicted by the method of the invention.
Detailed Description
For better understanding of the technical solutions of the present invention, the following further describes embodiments of the present invention with reference to the accompanying drawings.
The invention relates to a PDL1 expression level prediction method based on a multi-instance knowledge distillation model, the overall flow of which is shown in figure 1, and the detailed implementation steps of each part are as follows:
the first step is as follows: performing digital pathological image block level PDL1 expression level prediction by using a teacher-student convolutional neural network pair based on multi-instance learning;
s11, for each image block, its initial label is given to be the same as the pathological image it is in, i.e. for slice X i Image block x thereof i,j Label y of i,j =S i In which S is i Is X i Tag (high expression/low expression) of (1). At this time, the training data set is distributed as D 0 ={(x i,j ,y i,j ) 1, N, j, M. Based on D 0 To the backbone network
Figure BDA0003621524450000071
Training is carried out, when all training samples are traversed once, the typical judgment is carried out on the training samples based on the maximization-minimization principle, then the distribution state of the data set is updated, and the network parameter updating is continued. For the k-th round of training, let the round of training obtain the corresponding mapping of the network as
Figure BDA0003621524450000072
Setting a threshold p for S i The image block included in the slice image of 1 is taken
Figure BDA0003621524450000073
The maximum p samples are updated with the label y i,j =S i Let the number of target samples be p > 1, i.e. for the conventional multiple sample learning mode
Figure BDA0003621524450000074
Relaxation of the condition; likewise, S i The image block included in the slice image of 0 is taken
Figure BDA0003621524450000075
Minimum p samples y i,j =S i . Except the updated image blocks, the rest image blocks are set to be label-free, and the data set after updating is distributed to be D k . Based on D k Continuing to optimize the network and obtaining updated parameters
Figure BDA0003621524450000076
Reuse of
Figure BDA0003621524450000077
Updating the training set distribution to D k+1 Iterating alternately in this way until receivingAnd (7) converging.
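A minimal sketch of this maximization-minimization relabeling, assuming PyTorch; the unlabeled marker -1 and the function name are assumptions.

```python
# Sketch of the per-slide max-min relabeling of S11, assuming PyTorch;
# slide_probs holds the current network's positive probability per block.
import torch

def update_labels(slide_probs, slide_label, p):
    """Keep the slide label on the p most typical blocks; mark the rest unlabeled (-1)."""
    labels = torch.full_like(slide_probs, -1.0)
    # maximization branch for S_i = 1, minimization branch for S_i = 0
    idx = torch.topk(slide_probs if slide_label == 1 else -slide_probs,
                     k=min(p, slide_probs.numel())).indices
    labels[idx] = float(slide_label)
    return labels

# example: a positive slide with 6 blocks, keeping p = 2 typical blocks
print(update_labels(torch.tensor([0.9, 0.2, 0.8, 0.1, 0.5, 0.3]), 1, 2))
# -> tensor([ 1., -1.,  1., -1., -1., -1.])
```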
S12. On the basis of this training framework, a teacher-student (T-S) network pair with a twin structure is constructed; both networks adopt the ResNet34 structure. The teacher network T is an exponential moving average of the student network S parameters:

$$\theta_{k,l} = (1-\alpha)\,\theta'_{k,l} + \alpha\,\theta_{k,l-1} \qquad (1)$$

where $\theta_{k,l}$ and $\theta_{k,l-1}$ are the parameters of the teacher network T at the l-th and (l-1)-th iterations of the k-th round, $\theta'_{k,l}$ is the parameter of the student network S at the l-th iteration of the k-th round, and $\alpha$ is the weight of the historical teacher parameters in the update. Since the student network S converges quickly early in training, $\alpha$ can take a small value at the beginning, increasing the update speed of the teacher network T, and can be set to a fixed value slightly smaller than 1 once training has progressed to a certain stage. Specifically, the invention takes $\alpha = \min\left(1 - \frac{1}{l+1},\ \alpha_{max}\right)$, where l is the iteration number and $\alpha_{max}$ is a fixed value slightly smaller than 1.
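A one-line sketch of this ramped coefficient, assuming the mean-teacher-style schedule implied above; the value of $\alpha_{max}$ is illustrative.

```python
# Ramped EMA coefficient: small early so T tracks the fast-converging S,
# approaching a fixed alpha_max slightly below 1 later (value assumed).
def ema_alpha(l, alpha_max=0.99):
    return min(1.0 - 1.0 / (l + 1), alpha_max)
```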
Meanwhile, when its parameters are updated, the student network S is subject to the current training set distribution $D_k$, i.e. the distribution updated in the previous step by the maximization-minimization multi-instance learning framework. On the labeled training data $D_{k,labeled}$ of the k-th round, a cross-entropy classification loss constrains the known distribution; for the unlabeled data $D_{k,unlabeled}$, the current teacher network T provides a reference distribution, and the Euclidean distance between the student network S output and this reference forms a distribution consistency loss. In particular, to avoid overfitting while improving the representation capability of the network so that effective characterization features are extracted, random data augmentation, including random rotation, horizontal flipping, vertical flipping and color jittering, is applied to the input data of T and S to form contrastive data pairs with the same source but different forms. During training, T and S supervise each other and are updated alternately until convergence. The specific framework is shown in fig. 2.
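The contrastive input pairs t(x) and t'(x) could be produced as below, assuming torchvision; the specific transform parameters are illustrative rather than those fixed by the invention.

```python
# Sketch of the random augmentation pairs fed to T and S, assuming torchvision;
# two independent draws of the same pipeline yield differently-augmented views.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),     # random rotation
    transforms.RandomHorizontalFlip(),          # horizontal flip
    transforms.RandomVerticalFlip(),            # vertical flip
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])

# given a PIL image block `patch`, the pair (augment(patch), augment(patch))
# provides the same source in different forms for the teacher and the student
```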
Step two: constructing a weighted distillation loss function to constrain student network optimization, and reasonably configuring PDL1 weak label supervision information
Considering the number of fragments matched in each thousand bases obtained by sequencing of cases as the basis of the PDL1 expression level, one third of the samples with the highest reading is taken as a high expression sample, and the rest are taken as low expression samples. The number of low expression samples is about twice that of high expression samples. A large number of low-expression-level image blocks may exist in the high-expression samples at the same time, and the data has the characteristic of obvious sample distribution imbalance, so the method introduces the weighting cross entropy classification loss function into the loss function. Specifically, in the k-th round of training, for each PDL1 high-expression case, the first 50% instance set with the highest positive probability of output over the teacher network T was selected from all image blocks of the case
Figure BDA0003621524450000081
For low expression cases, a 50% set with the lowest positive probability is selected
Figure BDA0003621524450000082
Correspondence giving
Figure BDA0003621524450000083
And
Figure BDA0003621524450000084
the examples in (1) are labeled positive and negative, the remaining examples are unlabeled samples. For the labeled example, calculating the weighted cross entropy classification loss function of the labeled example, and expressing the weighted cross entropy classification loss function as
Figure BDA0003621524450000085
Wherein x i Is a set
Figure BDA0003621524450000086
Middle image block sample, x j Is a set
Figure BDA0003621524450000087
Middle image block sample, P + (x i ) X output for student network i To a high expressed probability value, P - (x j ) Is x j Probability value for underexpression, α 0 And alpha 1 The corresponding weights for low and high expression sample classification losses respectively,
Figure BDA0003621524450000088
is a mathematical expectation within the range.
Meanwhile, the problem of sample weak labels is considered, a maximization-minimization multi-instance learning loss function is constructed, the classification loss function acts on strong-representation positive instances of the positive samples and strong-representation negative instances of the negative samples, and fuzzy instances which are inconsistent with the weak label attributes are prevented from being introduced into a training process as noise. A distribution consistency function is further introduced on the basis, the consistency of the student network output and the teacher network output is restrained, and the overfitting problem on a few samples is avoided while the non-strong representation examples are effectively utilized. For unlabeled sample set U k Calculates the distribution consistency of its output under the student network S and the teacher network T. Input data pairs are formed given the same input and with different degrees of data enhancement, and l is used 2 The norm measures the distance between the student network and the teacher network, and the expression is
Figure BDA0003621524450000091
Wherein T (-) and T' (-) are different forms of random data enhancement,
Figure BDA0003621524450000092
and
Figure BDA0003621524450000093
respectively mapping functions corresponding to the teacher network T and the student network S. The weighted distillation loss function is a weighted combination of the two loss functions, i.e. L ═ L weighted-CE +ηL consistency Eta is distribution uniformity lossThe weight of the function. The learning rate settings of the teacher network T and the student network S are both 10 by adopting ADAM optimization of the adaptive momentum estimation optimizer -2 And optimizing the loss function by adjusting the network weight value through gradient back propagation.
Step three: global aggregation characteristics are classified by using a depth stochastic model, and digital pathological image level PDL1 expression level prediction is realized
Generally, after training is completed, the teacher network T discriminates the image-block PDL1 expression level with slightly higher accuracy than the student network S. Therefore, on the basis of the trained teacher network T, a set of global features $V = \{v_1, v_2, v_3, v_4, v_5\}$ is constructed to describe the complete digital pathological image. Specifically,

$$v_1 = \frac{n_{positive}}{n_{slide}}$$

where $n_{positive}$ is the number of positive image blocks contained in the corresponding image and $n_{slide}$ is the total number of image blocks; $v_1$ is thus the percentage of blocks classified as positive (high expression) in a single digital pathological image. $v_2 = [p_{0\text{-}0.1}, p_{0.1\text{-}0.2}, \dots, p_{0.9\text{-}1}]$, where $p_{a\text{-}b}$ is the frequency of image blocks whose positive probability falls within the interval $[a, b]$; $v_2$ is thus the histogram of image-block positive probabilities in a single digital pathological image. $v_3 = \mathrm{median}[f_{\theta}(x_{positive})]$, where $x_{positive}$ is the set of positive image blocks in the image; $v_3$ is thus the median positive probability. $v_4 = \mathrm{mean}[f_{\theta}(x_{positive})]$ is the mean positive probability. Finally,

$$v_5 = \frac{1}{n_{high}} \sum_{x \in D_{high}} \Phi_{avgpool}(x)$$

where $\Phi_{avgpool}(\cdot)$ is the mapping of the last average pooling layer of the trained teacher network T, outputting a 512-dimensional feature vector, $D_{high}$ is the set of image blocks with the highest positive probability at a given proportion in a single image, and $n_{high}$ is the number of elements in that set; $v_5$ is thus the 512-dimensional average feature of the strongly characterized blocks extracted by the network. V contains global feature information at 3 scales (20x, 10x and 5x magnification). A deep random forest F is constructed using the global features V of the cases contained in the block-level classifier training data set; then $out_i = F(V_i)$ is the PDL1 expression level prediction result of the corresponding digital pathological image. The specific flow is shown in fig. 3.
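For illustration, the per-scale global feature vector V could be assembled as follows, assuming NumPy; the 0.5 positivity threshold and the proportion of strongly characterized blocks are assumptions not fixed by the text.

```python
# Sketch of the per-scale global feature V = [v1, v2, v3, v4, v5] (525-d),
# assuming NumPy; probs are teacher positive probabilities per block and
# feats the 512-d average-pool features; threshold/ratio values are assumed.
import numpy as np

def global_features(probs, feats, ratio=0.1):
    positive = probs > 0.5
    v1 = positive.mean()                                   # fraction of positive blocks
    v2, _ = np.histogram(probs, bins=10, range=(0.0, 1.0))
    v2 = v2 / probs.size                                   # positive-probability histogram
    pos = probs[positive]
    v3 = np.median(pos) if pos.size else 0.0               # median positive probability
    v4 = pos.mean() if pos.size else 0.0                   # mean positive probability
    n_high = max(1, int(ratio * probs.size))
    top = np.argsort(probs)[-n_high:]                      # strongly characterized blocks
    v5 = feats[top].mean(axis=0)                           # 512-d average deep feature
    return np.concatenate([[v1], v2, [v3], [v4], v5])
```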
To visually demonstrate the effect of the invention, figs. 4a-4d show visualizations of the PDL1 expression levels obtained on TCGA colon cancer digital pathology images: fig. 4a is a colon cancer digital pathology slide image, and figs. 4b, 4c and 4d are the PDL1 expression distribution heat maps generated by the invention at 20x, 10x and 5x magnification respectively. High-grayscale regions in the heat maps correspond to regions of high PDL1 expression. The PDL1 expression distributions obtained at the different scales are highly consistent and agree closely with the distribution of tumor tissue in the section and with the real immunohistochemical data. On TCGA colon cancer data, with 576 randomly selected digital pathological sections as the training set and 126 as the test set, the method achieves 86.508% accuracy on the PDL1 expression level discrimination task. The method designs a multi-instance knowledge distillation training strategy for the weak-label problem of digital pathological images, exploits the feature capture capability of the network and the overfitting suppression of the averaged model, and integrates multi-scale global information when predicting the overall expression level of a slide, so the PDL1 expression level of the tissue corresponding to a digital pathological image can be judged accurately. The invention can be widely applied in medical systems and fields related to pathological analysis, with broad market prospects and application value.

Claims (9)

1. A PDL1 expression level prediction method based on a multi-instance knowledge distillation model is characterized by comprising the following steps:
the method comprises the following steps: performing digital pathological image block level PDL1 expression level prediction by using a teacher-student convolutional neural network model based on multi-instance learning;
step two: constructing a weighted distillation loss function to constrain student network optimization, and reasonably configuring PDL1 weak label supervision information;
step three: and (4) classifying the global aggregation characteristics by using a depth stochastic model, and realizing the PDL1 expression level prediction of the digital pathological image level.
2. The method for predicting the expression level of PDL1 based on a multi-instance knowledge distillation model according to claim 1, characterized in that: in step one, a teacher network T and a student network S are constructed with ResNet34 pre-trained on ImageNet as the backbone network, the fragments per kilobase of transcript per million mapped reads (FPKM) obtained by sequencing the case to which the pathological image containing each image block belongs is taken as the initial expression-level label, and the student network is trained under the maximization-minimization multi-instance learning framework; the student network S parameters are optimized under the teacher network T distribution constraint and the classification loss on typical image-block instances; the teacher network T parameters are updated by exponential moving average of the updated student network S parameters, each sample label is updated according to the data distribution output by the teacher network after each training round, and a certain proportion of typical image blocks retain their original labels for subsequent training; the final output is the block-level PDL1 expression prediction result.
3. The PDL1 expression level prediction method based on the multi-instance knowledge distillation model as claimed in claim 2, wherein: in the step one, the specific process is as follows:
s11, considering weak label attributes and specific distribution conditions of digital pathological images, presuming that high-expression image blocks and low-expression image blocks exist in the PDL1 high-expression pathological section and the low-expression pathological section in a certain proportion, and constructing a multi-instance learning training framework; the given initial image block label is the same as the corresponding pathological image, and the typicality of each image block of the training set is judged based on the intermediate state network at each stage in the training process; constructing a maximization-minimization sample screening mode based on corresponding hypothesis, and respectively selecting high-expression typical image blocks and low-expression typical image blocks in a certain proportion, wherein labels of the high-expression typical image blocks and the low-expression typical image blocks are the same as those of the images, and the rest image blocks are set to be label-free; continuing to train the network based on the updated data set distribution, and continuously and iteratively updating the data set distribution and the network parameters;
s12, constructing a teacher-student T-S network pair with a twin structure on the basis of the training frame; the student network S parameter updating is constrained by a maximization-minimization multi-instance learning framework, and specifically comprises the following steps: under the distribution of the current data set, the image blocks with labels restrict the updating of student network parameters in a cross entropy classification loss form, and the unlabeled data participates in the updating of student network S parameters by measuring the consistency of the distribution of teacher network output data and the distribution of student network output; after the student network parameters are updated, updating the teacher network T parameters in an exponential moving average mode; and in the training process, the T and the S are supervised and alternately updated until convergence.
4. The method for predicting the PDL1 expression level based on the multi-instance knowledge distillation model according to claim 2 or 3, characterized in that: in step S11, each image block is given an initial label identical to that of its pathological image, i.e. for slice $X_i$ and its image block $x_{i,j}$, the label is $y_{i,j} = S_i$, where $S_i$ is the label of $X_i$; the training data set distribution is then $D_0 = \{(x_{i,j}, y_{i,j}) \mid i = 1,\dots,N;\ j = 1,\dots,M\}$; the backbone network $f_{\theta_0}$ is trained on $D_0$, and once all training samples have been traversed, typicality discrimination is performed on them according to the maximization-minimization principle, the data set distribution is updated, and network parameter updating continues; for the k-th training round, let the mapping obtained by the network be $f_{\theta_k}$ and set a threshold p; for image blocks contained in slices with $S_i = 1$, the p samples with the largest $f_{\theta_k}(x_{i,j})$ keep the updated label $y_{i,j} = S_i$, and letting the number of selected samples be p > 1 relaxes the condition of the conventional multi-instance learning mode in which only the maximal instance $\max_j f_{\theta_k}(x_{i,j})$ carries the bag label; likewise, for image blocks contained in slices with $S_i = 0$, the p samples with the smallest $f_{\theta_k}(x_{i,j})$ are given $y_{i,j} = S_i$; apart from the updated blocks, the remaining blocks are set unlabeled and the updated data set distribution is $D_k$; the network is further optimized on $D_k$ to obtain updated parameters $\theta_{k+1}$, which are in turn used to update the training set distribution to $D_{k+1}$, iterating alternately in this way until convergence.
5. The method for predicting the PDL1 expression level based on the multi-instance knowledge distillation model according to claim 2 or 3, characterized in that: in step S12, the teacher-student model network structures adopt ResNet34; the teacher network T is an exponential moving average of the student network S parameters, i.e.

$$\theta_{k,l} = (1-\alpha)\,\theta'_{k,l} + \alpha\,\theta_{k,l-1} \qquad (1)$$

where $\theta_{k,l}$ and $\theta_{k,l-1}$ are the parameters of the teacher network T at the l-th and (l-1)-th iterations of the k-th round, $\theta'_{k,l}$ is the parameter of the student network S at the l-th iteration of the k-th round, and $\alpha$ is the weight of the historical teacher parameters in the update, taken as $\alpha = \min\left(1 - \frac{1}{l+1},\ \alpha_{max}\right)$ with l the iteration number and $\alpha_{max}$ a fixed value slightly smaller than 1;
when its parameters are updated, the student network S is subject to the specific training set distribution $D_k$ obtained from the maximization-minimization multi-instance learning framework update; on the labeled training data $D_{k,labeled}$ of the k-th round, a cross-entropy classification loss is constructed to form a constraint on the known distribution; for unlabeled data $D_{k,unlabeled}$, the current teacher network T provides a reference distribution, and the student network S measures its distance to this reference by Euclidean distance, forming a distribution consistency loss; random data augmentation, including random rotation, horizontal flipping, vertical flipping and color jittering, is applied to the input data of T and S to form contrastive data pairs with the same source but different forms.
6. The PDL1 expression level prediction method based on the multi-instance knowledge distillation model as claimed in claim 1, wherein: in the second step, the specific process is as follows:
taking the fragments per kilobase of transcript per million mapped reads (FPKM) obtained by sequencing each case as the basis of PDL1 expression level, the third of samples with the highest readings are taken as high-expression samples and the rest as low-expression samples; the number of low-expression samples is about twice that of high-expression samples; many low-expression image blocks also exist within high-expression samples, and the data exhibit obvious sample distribution imbalance, so a weighted cross-entropy classification loss is introduced into the loss function; meanwhile, considering the weak-label problem, a maximization-minimization multi-instance learning loss is constructed in which the classification loss acts only on strongly characterized positive instances of positive samples and strongly characterized negative instances of negative samples, preventing ambiguous instances inconsistent with the weak-label attribute from being introduced into training as noise; and a distribution consistency function is introduced to constrain the consistency of the distributions of the student network output and the teacher network output.
7. The PDL1 expression level prediction method based on the multi-instance knowledge distillation model according to claim 6, characterized in that: the weighted distillation loss function is composed as follows: in the k-th training round, for each PDL1 high-expression case, the 50% of instances with the highest positive probability under the teacher network T are selected from all image blocks of the case to form $B_k^+$; for each low-expression case, the 50% with the lowest positive probability form $B_k^-$; instances in $B_k^+$ and $B_k^-$ are labeled positive and negative respectively, and the remaining instances are unlabeled samples; for labeled instances, the weighted cross-entropy classification loss is computed as

$$L_{weighted\text{-}CE} = -\alpha_1\,\mathbb{E}_{x_i \in B_k^+}\left[\log P_+(x_i)\right] - \alpha_0\,\mathbb{E}_{x_j \in B_k^-}\left[\log P_-(x_j)\right]$$

where $x_i$ is an image-block sample in $B_k^+$, $x_j$ is an image-block sample in $B_k^-$, $P_+(x_i)$ is the probability of high expression output by the student network for $x_i$, $P_-(x_j)$ is the probability of low expression for $x_j$, $\alpha_0$ and $\alpha_1$ are the weights of the low- and high-expression classification losses respectively, and $\mathbb{E}[\cdot]$ is the mathematical expectation over the indicated set; for the unlabeled sample set $U_k$, the consistency of its output distributions under the student network S and the teacher network T is computed; input data pairs are formed by applying random data augmentations of different strengths to the same input, and the $\ell_2$ norm measures the distance between student and teacher outputs:

$$L_{consistency} = \mathbb{E}_{x \in U_k}\left[\left\| f_{\theta'}\bigl(t'(x)\bigr) - f_{\theta}\bigl(t(x)\bigr) \right\|_2\right]$$

where $t(\cdot)$ and $t'(\cdot)$ are different forms of random data augmentation, and $f_{\theta}(\cdot)$ and $f_{\theta'}(\cdot)$ are the mapping functions of the teacher network T and the student network S respectively; the weighted distillation loss function is the weighted combination $L = L_{weighted\text{-}CE} + \eta L_{consistency}$, where $\eta$ is the weight of the distribution consistency loss; the adaptive momentum estimation optimizer (Adam) is used, the learning rates of the teacher network T and the student network S are both set to $10^{-2}$, and the loss function is optimized by adjusting the network weights through gradient backpropagation.
8. The method for predicting the expression level of PDL1 based on a multi-instance knowledge distillation model according to claim 1, characterized in that: in step three, the specific process is as follows: on the basis of the block-level PDL1 prediction results, a series of global features is constructed to describe the digital pathological image; the manually constructed global features comprise: the percentage of image blocks classified as positive in a single digital pathological image, the histogram of block positive probabilities, the median block positive probability, the mean block positive probability, and the 512-dimensional average feature of a specified proportion of image blocks extracted by the average pooling layer of the ResNet34 network model; the block-level network covers 3 scales: 20x, 10x and 5x magnification; the features output by the block-level prediction network at the three scales are concatenated in order to obtain the multi-scale global features of the corresponding case, the cases consistent with the block-level network training set are used as training samples for the digital-pathological-image PDL1 expression prediction model, and a deep random forest is constructed to realize image-level PDL1 expression level prediction.
9. The method for predicting the PDL1 expression level based on the multi-instance knowledge distillation model according to claim 8, wherein: when judging the PDL1 expression level of image blocks, the accuracy of the teacher network T is slightly higher than that of the student network S; therefore, on the basis of the trained teacher network T, a series of global features $V = \{v_1, v_2, v_3, v_4, v_5\}$ is constructed to describe the complete digital pathology image. Specifically,

$$v_1 = \frac{n_{positive}}{n_{slide}}$$

where $n_{positive}$ is the number of positive image blocks contained in the corresponding image and $n_{slide}$ is the total number of image blocks in the image; $v_1$ is thus the percentage of image blocks classified as positive in a single digital pathology image. $v_2 = [p_{0\text{--}0.1}, p_{0.1\text{--}0.2}, \ldots, p_{0.9\text{--}1}]$, where $p_{a\text{--}b}$ denotes the frequency of image blocks whose positive probability falls in the interval $[a, b]$; $v_2$ is thus the histogram of the image-block positive-probability distribution in a single digital pathology image. $v_3 = \mathrm{median}[f_\theta(x_{positive})]$, where $x_{positive}$ is the set of positive image blocks in a single digital pathology image; $v_3$ is thus the median of the image-block positive probabilities in a single digital pathology image. $v_4 = \mathrm{mean}[f_\theta(x_{positive})]$ is the mean of the image-block positive probabilities in a single digital pathology image. Finally,

$$v_5 = \frac{1}{n_{high}} \sum_{x \in D_{high}} \Phi_{avgpool}(x)$$

where $\Phi_{avgpool}(\cdot)$ is the mapping performed by the last average pooling layer of the trained teacher network T, whose output is a 512-dimensional feature vector, $D_{high}$ is the set of image blocks with the highest positive probabilities at a given proportion within a single image, and $n_{high}$ is the number of elements in that set; $v_5$ is thus the 512-dimensional average feature of the strongly characteristic image blocks extracted by the network. $V$ contains global feature information at the 3 scales; using the global features $V$ of the cases contained in the image-block-level classifier training data set, a deep random forest $F$ is constructed, and $out_i = F(V_i)$ is then the PDL1 expression level prediction result for the corresponding digital pathology image.
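The five components of $V$ follow directly from the formulas above. A minimal NumPy sketch, assuming `probs` holds the teacher network's per-block positive probabilities for one slide and `feats` the matching 512-dimensional average-pooling features; the 0.5 positivity threshold and the `high_fraction` default are illustrative assumptions:

```python
import numpy as np

def global_features(probs, feats, threshold=0.5, high_fraction=0.1):
    """Global feature vector V = (v1, v2, v3, v4, v5) for one slide."""
    positive = probs > threshold
    v1 = positive.mean()                                    # fraction of positive blocks
    v2, _ = np.histogram(probs, bins=10, range=(0.0, 1.0))  # positive-probability histogram
    v2 = v2 / len(probs)                                    # frequencies p_{a-b}
    pos_probs = probs[positive]
    v3 = np.median(pos_probs) if pos_probs.size else 0.0    # median positive probability
    v4 = pos_probs.mean() if pos_probs.size else 0.0        # mean positive probability
    n_high = max(1, int(high_fraction * len(probs)))        # size of the set D_high
    top = np.argsort(probs)[-n_high:]                       # highest-probability blocks
    v5 = feats[top].mean(axis=0)                            # 512-d average feature
    return np.concatenate([[v1], v2, [v3, v4], v5])
```

Concatenating this vector across the three magnifications, as in claim 8, yields the multi-scale input for the random forest.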
CN202210460006.1A 2022-04-28 2022-04-28 PDL1 expression level prediction method based on multi-instance knowledge distillation model Active CN114970862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210460006.1A CN114970862B (en) 2022-04-28 2022-04-28 PDL1 expression level prediction method based on multi-instance knowledge distillation model

Publications (2)

Publication Number Publication Date
CN114970862A (en) 2022-08-30
CN114970862B CN114970862B (en) 2024-05-28

Family

ID=82979090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210460006.1A Active CN114970862B (en) 2022-04-28 2022-04-28 PDL1 expression level prediction method based on multi-instance knowledge distillation model

Country Status (1)

Country Link
CN (1) CN114970862B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034205A (en) * 2018-06-29 2018-12-18 西安交通大学 Image classification method based on the semi-supervised deep learning of direct-push
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
US20210334994A1 (en) * 2020-04-21 2021-10-28 Daegu Gyeongbuk Institute Of Science And Technology Multiple instance learning method
CN112418343A (en) * 2020-12-08 2021-02-26 中山大学 Multi-teacher self-adaptive joint knowledge distillation
CN113255915A (en) * 2021-05-20 2021-08-13 深圳思谋信息科技有限公司 Knowledge distillation method, device, equipment and medium based on structured instance graph
CN113344206A (en) * 2021-06-25 2021-09-03 江苏大学 Knowledge distillation method, device and equipment integrating channel and relation feature learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612120A (en) * 2023-07-20 2023-08-18 山东高速工程检测有限公司 Two-stage road defect detection method for data unbalance
CN116612120B (en) * 2023-07-20 2023-10-10 山东高速工程检测有限公司 Two-stage road defect detection method for data unbalance

Also Published As

Publication number Publication date
CN114970862B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
Li et al. Deep learning based gastric cancer identification
CN108364288B (en) Segmentation method and device for breast cancer pathological image
CN111488921B (en) Intelligent analysis system and method for panoramic digital pathological image
CN109871875B (en) Building change detection method based on deep learning
Silva-Rodriguez et al. Self-learning for weakly supervised Gleason grading of local patterns
Yan et al. Segmentation for high-throughput image analysis: watershed masked clustering
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
Wang et al. A novel neural network approach to cDNA microarray image segmentation
Babu et al. Colon cancer prediction using 2DReCA segmentation and hybrid features on histopathology images
CN112926403A (en) Unsupervised pedestrian re-identification method based on hierarchical clustering and difficult sample triples
CN112183237A (en) Automatic white blood cell classification method based on color space adaptive threshold segmentation
CN114970862A (en) PDL1 expression level prediction method based on multi-instance knowledge distillation model
CN116912240B (en) Mutation TP53 immunology detection method based on semi-supervised learning
CN114580501A (en) Bone marrow cell classification method, system, computer device and storage medium
US20240020958A1 (en) Systems and methods for determining regions of interest in histology images
Athinarayanan et al. Multi class cervical cancer classification by using ERSTCM, EMSD & CFE methods based texture features and fuzzy logic based hybrid kernel support vector machine classifier
CN115310491A (en) Class-imbalance magnetic resonance whole brain data classification method based on deep learning
CN115131628A (en) Mammary gland image classification method and equipment based on typing auxiliary information
CN114187288A (en) COVID-19 detection method based on cooperative deep learning and lung CT image
Yancey Deep Feature Fusion for Mitosis Counting
CN108304546B (en) Medical image retrieval method based on content similarity and Softmax classifier
Yildiz et al. Nuclei segmentation in colon histology images by using the deep CNNs: a U-net based multi-class segmentation analysis
Brindha et al. Tumor grading model employing geometric analysis of histopathological images with characteristic nuclei dictionary
Aqhaei Employing a hybrid technique to detect tumor in medical images
Kaoungku et al. Colorectal Cancer Histology Image Classification Using Stacked Ensembles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant