CN117036288A - Tumor subtype diagnosis method for full-slice pathological image - Google Patents

Tumor subtype diagnosis method for full-slice pathological image

Info

Publication number
CN117036288A
Authority
CN
China
Prior art keywords
wsi
layer
feature
similarity
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311006126.5A
Other languages
Chinese (zh)
Inventor
杨蓓
李晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202311006126.5A priority Critical patent/CN117036288A/en
Publication of CN117036288A publication Critical patent/CN117036288A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/60ICT specially adapted for the handling or processing of medical references relating to pathologies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Pathology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of tumor subtype diagnosis, and particularly relates to a tumor subtype diagnosis method for full-slice pathological images. First, the WSI to be classified is preprocessed, and subtype diagnosis is performed on the preprocessed WSI using a trained tumor subtype diagnosis model. The diagnosis process comprises the following steps: extracting the embedded feature representation of the WSI to be classified, measuring the similarity between the WSI to be classified and each class cluster, and obtaining the subtype diagnosis result of the WSI to be classified according to the similarity. In addition, during training the similarity between the embedded feature representations of WSIs of the same category is increased and the similarity between the embedded feature representations of WSIs of different categories is reduced, so that WSI features of the same category become more alike and WSI features of different categories more distinct, facilitating more accurate prediction. Moreover, the model can be trained using only image-level label information, which greatly reduces the huge workload of pathology specialists annotating images.

Description

Tumor subtype diagnosis method for full-slice pathological image
Technical Field
The invention belongs to the technical field of tumor subtype diagnosis, and particularly relates to a tumor subtype diagnosis method oriented to a full-slice pathological image.
Background
Cancer is one of the most common diseases worldwide and severely endangers people's health. Pathological analysis of suspicious tissue is the gold standard for the clinical diagnosis of many major diseases, especially malignant tumors (cancers), and is also an important basis for early detection, diagnosis, treatment and prognosis. In traditional histopathological analysis, a pathologist observes hematoxylin-and-eosin-stained tissue sections through a high-magnification microscope and analyzes the structure and morphology of tissue cells under different magnifications to determine tumor areas, the degree of cell proliferation, etc., thereby making a pathological diagnosis. However, there is today a serious shortage of Chinese pathologists. In some specialized oncology hospitals, pathologists observe and analyze thousands of pathological sections each day, and this heavy, time-consuming work can lead to diagnostic errors. In recent years, with the rapid development of deep learning technology, high-magnification scanners can save the tissue on a slide as a high-definition digital pathology image, the WSI (Whole Slide Image, full-slice digital image), making it possible to use computer artificial intelligence to assist clinical pathological diagnosis.
With the rapid development of computer hardware and advances in computational theory, particularly in deep learning algorithms, machine learning methods have to some extent been used to assist medical image diagnosis. However, research and development in the field of computational pathology remains relatively lagging. Early preprocessing of pathology images by manual labeling demands considerable labor and time from the pathologist, and analysis of tissue sections is time-consuming and laborious for the physician. The main reasons are as follows: a physician's judgment of the unique cytological and histological characteristics of pathological sections is influenced by subjectivity and personal experience, so different doctors may judge the same pathological section differently, and the average diagnostic consistency is only about 75%; studying and labeling histopathological images with microscopes and other instruments is time-consuming and laborious; and the experience a pathologist accumulates over time is difficult to transfer to others, while a doctor's diagnosis and treatment are extremely important to a patient's life and health. In the initial stage of computer assistance, a medical expert usually had to segment nuclei, chromosomes or cells in an image, extract features for the pathological condition, and train a machine learning model on the extracted features. Although computer-aided algorithms of the hand-crafted-feature stage solved the analysis of medical images to some extent, the pathological diagnosis methods of these algorithms generally rely on pixel-level labeling by specialist doctors, which is costly and time-consuming for the doctor.
On the other hand, the quality of the labeling depends on the subjectivity and experience of the doctor, and especially for some difficult and complicated cases, different doctors may label the same focus area differently. Moreover, with the development of deep learning, hand-crafted features have gradually been replaced by automatically learned deep features. Although some algorithms based on traditional convolutional neural networks (Convolutional Neural Network, CNN) are now widely applied to medical image diagnosis, they are rarely applied to the field of pathological image diagnosis: limited by computer hardware memory (particularly graphics card memory such as GPU memory), models such as CNNs cannot directly process WSI images of enormous size (e.g., 10^5 × 10^5 pixels), and, limited by the size of WSI datasets, conventional CNN-style models cannot be trained effectively. Therefore, artificial-intelligence-assisted pathology diagnosis places higher demands on the acquisition and labeling of pathology image data, the efficient representation of image information, and the design and implementation of accurate diagnosis methods.
A pathology image differs from an ordinary image: because of its extreme size, it cannot be processed directly by conventional deep learning models. In addition, public pathology datasets contain few pathological images, and annotating them requires pathologists to spend a great deal of time and effort, so pixel-level annotations are difficult to obtain. Faced with data that carry only whole-image (WSI-level) labels and lack pixel-level annotation, existing mature deep learning networks have difficulty producing effective diagnostic results. Automatic diagnosis oriented to full-slice pathology images therefore poses a significant challenge for computer vision tasks. In summary, no high-precision, feasible solution yet exists for analyzing histopathology images with deep learning.
Disclosure of Invention
The invention aims to provide a tumor subtype diagnosis method oriented to full-slice pathological images, so as to solve the problem that existing deep-learning approaches to histopathological image analysis lack a high-precision, feasible solution.
In order to solve the technical problems, the invention provides a tumor subtype diagnosis method for a full-slice pathological image, which comprises the following steps:
1) Constructing a tumor subtype diagnostic model, wherein the tumor subtype diagnostic model comprises overall feature extraction, and the overall feature extraction is used for extracting embedded feature representations of input WSI;
2) Training the constructed tumor subtype diagnosis model by using the tumor WSI data set subjected to subtype classification as a training sample, improving the similarity between embedded feature representations of WSIs of the same category in the training process, and reducing the similarity between embedded feature representations of WSIs of different categories; obtaining a plurality of class clusters after training, wherein the number of the class clusters is the subtype classification number, and one class cluster comprises embedded feature representations of all training samples belonging to the class cluster;
3) Preprocessing the WSI to be classified, and performing subtype diagnosis on the preprocessed WSI to be classified by using a trained tumor subtype diagnosis model, wherein the diagnosis process comprises the following steps of: and obtaining the embedded feature representation of the preprocessed WSI to be classified by utilizing a feature extraction module, measuring the similarity between the WSI to be classified and each class cluster according to the embedded feature representation of the WSI to be classified and the embedded feature representation of each training sample in each class cluster, and obtaining the subtype diagnosis result of the WSI to be classified according to the similarity.
The beneficial effects of the technical scheme are as follows: with the tumor subtype diagnosis model designed above, the training process continuously increases the similarity between WSI features of the same category while reducing the similarity between WSI features of different categories, so that same-category WSI features become more alike and different-category WSI features more distinct. This benefits the subsequent similarity measurement between the WSI to be classified and each class cluster, and facilitates more accurate prediction for the WSI to be classified. Moreover, the model can be trained using only WSI image-level label information, without fine pixel-level annotation, achieving performance superior to training schemes that require large numbers of pixel-level annotated images and greatly reducing the huge workload of pathology specialists annotating images.
Further, the similarity between the embedded feature representation of the WSI to be classified and each class cluster is measured in any one of the following ways:
Mode one: measure the similarity between the embedded feature representation of the WSI to be classified and the embedded feature representations of all training samples in each class cluster, and take the maximum similarity within each class cluster as the similarity between the WSI to be classified and that class cluster;
Mode two: compute the average embedded feature representation of all training samples in each class cluster as the cluster center of that class cluster, and take the similarity between the embedded feature representation of the WSI to be classified and each cluster center as the similarity between the WSI to be classified and that class cluster;
Mode three: take the mean of the results of mode one and mode two as the similarity between the WSI to be classified and each class cluster.
The beneficial effects of the technical scheme are as follows: different measurement strategies are designed for flexible selection.
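The three measurement modes can be sketched as follows (an illustrative sketch only, not the patent's implementation: cosine similarity is assumed as the similarity measure, and all function names are hypothetical):

```python
import numpy as np

def cosine_sim(query, feats):
    # cosine similarity between one embedding and each row of a matrix
    q = query / np.linalg.norm(query)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ q

def cluster_similarity(query, clusters, mode="mean"):
    """Score a query WSI embedding against each class cluster.
    clusters: list of (n_k, d) arrays of training-sample embeddings."""
    scores = []
    for feats in clusters:
        max_s = float(cosine_sim(query, feats).max())         # mode one (MaxS)
        center = feats.mean(axis=0)                           # cluster center
        avg_s = float(cosine_sim(query, center[None, :])[0])  # mode two (AvgS)
        scores.append({"max": max_s, "avg": avg_s,
                       "mean": (max_s + avg_s) / 2}[mode])    # mode three
    # predicted subtype = cluster with the highest similarity
    return int(np.argmax(scores)), scores
```

Under this sketch, mode one rewards a close match to any single training sample, while mode two measures closeness to the cluster as a whole; mode three averages the two views.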
Further, in training the tumor subtype diagnostic model, the loss function used is:
Loss = (1/B) · Σ_{i=1}^{B} log[1 + Σ_{k=1}^{L_i} exp(γ · α_neg^k · (s_neg^k − Δ_neg)) · Σ_{j=1}^{K_i} exp(−γ · α_pos^j · (s_pos^j − Δ_pos))]
where Loss represents the loss value; B represents the batch size in one training iteration; K_i and L_i represent the numbers of same-class and different-class samples of bag X_i, whose similarities to X_i are s_pos^j and s_neg^k, respectively; γ represents a scaling factor; α_pos^j and α_neg^k represent weight factors; and Δ_pos and Δ_neg are the intra-class margin and inter-class margin, respectively.
The beneficial effects of the technical scheme are as follows: the above loss function optimizes the distribution of bag features in the feature space by forcing the vector similarity between bag features of different classes to be smaller than the vector similarity between bag features of the same class, so that the bag feature vectors of each class gather around their respective class centers, forming distinct class clusters that keep a certain margin from one another.
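The quantities described above (scaling factor γ, per-pair weight factors, intra-/inter-class margins Δ_pos and Δ_neg) match the structure of a Circle-loss-style objective. The following is a hedged sketch under that assumption — the margin and optimum settings (m = 0.25, etc.) are illustrative choices, not values from the patent:

```python
import numpy as np

def circle_style_loss(emb, labels, gamma=32.0, m=0.25):
    """Circle-loss-style sketch: push intra-class similarities above
    Delta_pos = 1 - m and inter-class similarities below Delta_neg = m,
    with self-paced weight factors alpha_pos / alpha_neg."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T                  # pairwise cosine similarities
    d_pos, d_neg = 1.0 - m, m           # intra-/inter-class margins
    o_pos, o_neg = 1.0 + m, -m          # optima used for the weight factors
    losses = []
    for i in range(len(labels)):
        s_pos = np.array([sims[i, j] for j in range(len(labels))
                          if j != i and labels[j] == labels[i]])
        s_neg = np.array([sims[i, j] for j in range(len(labels))
                          if labels[j] != labels[i]])
        if s_pos.size == 0 or s_neg.size == 0:
            continue
        a_pos = np.clip(o_pos - s_pos, 0.0, None)   # weight factors alpha_pos
        a_neg = np.clip(s_neg - o_neg, 0.0, None)   # weight factors alpha_neg
        neg_term = np.sum(np.exp(gamma * a_neg * (s_neg - d_neg)))
        pos_term = np.sum(np.exp(-gamma * a_pos * (s_pos - d_pos)))
        losses.append(np.log1p(neg_term * pos_term))
    return float(np.mean(losses))
```

A batch whose same-class embeddings are close and different-class embeddings are far apart yields a much smaller loss than a batch where the classes overlap, which is exactly the clustering behavior the scheme aims for.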
Further, the preprocessing includes segmenting the tissue region in the WSI into a plurality of image blocks.
The beneficial effects of the technical scheme are as follows: the tissue region is divided into a plurality of image blocks, and subsequent processing operates on these blocks. Training of the model can thus be realized using only WSI image-level label information, without fine pixel-level annotation, achieving performance superior to models that require large numbers of pixel-level annotated images, avoiding the dependence of deep learning methods on pixel-level annotation of full-slice pathological images, and greatly reducing the huge workload of pathology specialists annotating images.
Further, the overall feature extraction comprises a feature extraction module, a projection module and a gating attention module; the feature extraction module is used for extracting the features of each image block; the projection module is used for projecting the features extracted by the feature extraction module into a low-dimensional unit feature space to obtain a plurality of unit vectors, so that different feature vectors only have differences in directions in the unit feature space; the gated attention module is used for fusing the outputs of the projection modules to generate an embedded feature representation.
Further, the feature extraction module adopts ResNet101 as the backbone network: the features output by Stage3 and Stage4 of ResNet101 are each processed by an adaptive average pooling layer, and the pooled outputs are then concatenated to form the feature corresponding to each image block.
The beneficial effects of the technical scheme are as follows: the extracted features are multi-scale features, and completeness expression of the example features is realized by fusing the multi-scale features.
Further, the projection module comprises a trainable BatchNorm1d layer, a fine-tuning layer Proj-Fc parameterized by a weight matrix, and an L2 Norm layer. The BatchNorm1d layer normalizes the input feature matrix along the feature dimension, and the fine-tuning layer Proj-Fc is computed as H_i = ReLU(R̃_i · W_proj), where W_proj denotes the weight matrix, R̃_i the normalized feature matrix, H_i the output of the fine-tuning layer Proj-Fc, and ReLU the activation function. The L2 Norm layer normalizes each feature vector in the output of the fine-tuning layer Proj-Fc to a unit vector.
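The projection module's three layers can be sketched as follows (an illustrative sketch only: the BatchNorm here is inference-style without the learned scale/shift parameters, and `W_proj` is a hypothetical weight matrix rather than the patent's trained one):

```python
import numpy as np

def projection_module(R, W_proj, eps=1e-5):
    """BatchNorm1d along the feature dimension, a ReLU fine-tuning layer
    Proj-Fc, then L2 normalization onto the unit hypersphere."""
    mu = R.mean(axis=0, keepdims=True)
    var = R.var(axis=0, keepdims=True)
    R_tilde = (R - mu) / np.sqrt(var + eps)   # BatchNorm1d over features
    H = np.maximum(R_tilde @ W_proj, 0.0)     # Proj-Fc: ReLU(R~ W_proj)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return H / np.clip(norms, 1e-12, None)    # L2 Norm layer -> unit vectors
```

After the L2 Norm layer, each instance vector has unit length, so — as the scheme intends — different instance vectors differ only in direction.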
Further, the gated attention module includes an Attn-Fc1 layer, an Attn-Fc2 layer, an Attn-Fc3 layer, and a Dropout layer. The Attn-Fc1 layer outputs a compressed feature for the input feature and maps the values of the compressed feature into the interval (−1, 1) through a Tanh activation function. The Attn-Fc2 layer plays a gating role, regulating the output of the Attn-Fc1 layer by mapping the network's output values between 0 and 1 through a Sigmoid activation function. The outputs of the Attn-Fc1 and Attn-Fc2 layers are multiplied element-wise and passed into the Attn-Fc3 layer. The Attn-Fc3 layer generates an attention score for each feature, which is processed by the Dropout layer; the unit vectors are then weighted and summed using the Dropout-processed attention scores to obtain the embedded feature representation.
The beneficial effects of the technical scheme are as follows: the purpose of applying the gated attention mechanism GAM is to assign learnable weight information to each example vector and to generate the embedded feature representation of the bag by fusing all example feature vectors in the bag.
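The gated attention pooling described above can be sketched as follows (an illustrative sketch only: the Dropout layer is omitted, softmax normalization of the scores is assumed, and the weight matrices `V`, `U`, `w` are hypothetical stand-ins for the trained Attn-Fc1/Fc2/Fc3 parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_attention_pool(H, V, U, w):
    """Fuse one bag of n instance vectors H (n, d) into a single
    embedded feature representation via gated attention."""
    t = np.tanh(H @ V)            # Attn-Fc1: compress, values in (-1, 1)
    g = sigmoid(H @ U)            # Attn-Fc2: gate, values in (0, 1)
    scores = (t * g) @ w          # element-wise product fed to Attn-Fc3
    a = np.exp(scores - scores.max())
    a = a / a.sum()               # normalized attention over instances
    return a @ H                  # attention-weighted sum of instance vectors
```

Because the attention weights are non-negative and sum to one, the bag representation is a convex combination of the instance vectors, with the gate deciding how much each instance contributes.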
Further, the preprocessing includes: first converting the WSI from the RGB color space to the HSV color space; then, for the saturation channel of the HSV color space, calculating a segmentation threshold between the background area and the tissue area using Otsu's method, and binarizing the saturation channel based on this threshold to extract a tissue mask and thereby the tissue region; the tissue region is then segmented into a series of image blocks of equal size using a sliding window.
The beneficial effects of the technical scheme are as follows: Otsu's method can effectively remove the background area in the WSI.
Further, after the tissue mask is extracted, mean filtering and morphological closing operations are also required to extract the tissue region.
The beneficial effects of the technical scheme are as follows: mean filtering and morphological closing further refine the denoising, removing the large number of tiny holes and tiny tissue regions present in the tissue mask.
Drawings
FIG. 1 is an overall workflow diagram of the present invention;
FIG. 2 (a) is an original WSI diagram;
FIG. 2 (b) is a WSI tissue profile;
FIG. 2 (c) is a block diagram of a segmented image;
FIG. 3 is a general feature extraction flow chart of the WSI of the present invention;
FIG. 4 (a) is a schematic diagram of the MaxS metric strategy employed by the present invention;
FIG. 4 (b) is a schematic diagram of the AvgS metric strategy employed by the present invention;
FIG. 5 (a) is a schematic diagram of the effect of the L2 normalization layer on TSMIL on the test set;
FIG. 5 (b) is a schematic diagram of the impact of the L2 normalization layer on TSMIL on the training set;
FIG. 6 (a) is a first annotated WSI diagram;
FIG. 6 (b) is the attention heatmap of FIG. 6 (a);
FIG. 6 (c) is a second annotated WSI diagram;
FIG. 6 (d) is the attention heatmap of FIG. 6 (c);
FIG. 6 (e) is the heat index map (color scale) for FIGS. 6 (b) and 6 (d).
Detailed Description
The invention designs a framework integrating weak supervision multi-example learning and metric learning, and trains the framework, thereby realizing a tumor subtype diagnosis method oriented to full-slice pathological images. The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
An embodiment of a tumor subtype diagnosis method facing to a full-slice pathological image:
The specific implementation process of the tumor subtype diagnosis method (hereinafter referred to as TSMIL) for the full-slice pathological image is shown in fig. 1, and is specifically as follows:
step one, acquiring WSI to be classified, and preprocessing the WSI.
Since a WSI contains large background areas, such as the white areas shown in fig. 2 (a), which are of no significance for classification, the WSI is first preprocessed to remove these meaningless background areas and to ensure the efficiency of subsequent classification diagnosis. The preprocessing mainly comprises two steps: (1) effectively removing the background areas in the WSI; (2) dividing the extracted tissue region into image blocks of suitable size for subsequent model analysis. The process of extracting tissue regions in the WSI and segmenting them into image blocks is as follows:
First, when extracting tissue regions, each WSI is converted from the RGB color space to the HSV color space; then, for the saturation channel of the HSV color space, a segmentation threshold between the background area and the tissue area is calculated using Otsu's method, and based on this threshold the saturation channel is binarized to extract a tissue mask. Since the tissue mask contains a large number of tiny holes and tiny tissue regions, the invention then applies mean filtering to smooth this noise, and finally applies a morphological closing operation to further refine the denoising. The extracted tissue region is shown in fig. 2 (b), in which the region surrounded by the green curve is the final extracted tissue region.
After the tissue region is extracted, it is divided into a series of image blocks of equal size using a sliding window, as shown in fig. 2 (c). Only image blocks whose tissue area exceeds 80% of the whole block area are retained. Because WSI resolutions and tissue region sizes differ, the number of image blocks per WSI varies from hundreds to thousands. Subsequent model training in the invention is carried out on these image blocks.
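The thresholding and tiling steps can be sketched as follows (an illustrative sketch only: a textbook implementation of Otsu's between-class-variance threshold on an 8-bit saturation channel, plus a non-overlapping sliding window with the 80% tissue-fraction rule; the mean filtering and morphological closing are omitted, and all function names are hypothetical):

```python
import numpy as np

def otsu_threshold(channel):
    """Otsu's method: pick the threshold that maximizes the between-class
    variance of background vs. tissue on an 8-bit channel."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def tile_mask(mask, size, min_tissue=0.8):
    """Slide a window over a binary tissue mask and keep only tiles whose
    tissue fraction exceeds min_tissue (the 80% rule above)."""
    keep = []
    h, w = mask.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            if mask[y:y + size, x:x + size].mean() >= min_tissue:
                keep.append((y, x))
    return keep
```

On a strongly bimodal saturation channel (pale background vs. stained tissue), the chosen threshold falls between the two modes, which is what makes binarization separate tissue from background.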
Step two, constructing a tumor subtype diagnosis model, wherein the tumor subtype diagnosis model comprises overall feature extraction, and the overall feature extraction is used for extracting embedded feature representation of input WSI; and training the constructed tumor subtype diagnosis model by using the subtype classified tumor WSI data set as a training sample. It should be noted that, different tumors have different subtype classifications, for example, the subtype classifications of renal cell carcinoma include chromophobe renal cell carcinoma, clear cell renal cell carcinoma, and renal papillary cell carcinoma.
1. Overall feature extraction.
The classification diagnosis of pathological images is based on the overall characteristics of WSI, and the acquisition process of the characteristics comprises three modules, namely a Multi-scale characteristic extraction Module (Multi-scale Feature Extraction Module, MFEM), a projection Module (Projection Module, PM) and a Gated-Attention Module (GAM), as shown in FIG. 3. After the tissue area in the WSI is divided into a plurality of image blocks, firstly, multi-scale features of each image block in the WSI are extracted through the MFEM, then, each image block feature is projected into a unit vector in a low-dimensional feature space by the PM module, and finally, all image block feature vectors in one WSI are fused into a global overall feature of the WSI by the GAM module.
1) The multi-scale feature extraction module MFEM.
In WSI weak-supervision classification, a dataset containing N WSIs is represented as D = {(X_1, Y_1), …, (X_N, Y_N)}, where each WSI X_i has a unique class label Y_i. After preprocessing the pathology image, each WSI X_i comprises n_i image blocks, i.e., X_i = {x_{i,1}, x_{i,2}, …, x_{i,n_i}}. In terms of the multiple-instance learning formulation, each WSI is regarded as a bag, and the image blocks in the WSI as instances in the bag.
When performing a pathological diagnosis, a pathologist usually observes the slide at multiple scales to fully inspect and accurately judge suspicious lesions and to avoid missed diagnosis and misdiagnosis of cancerous lesions. Based on this idea, the invention designs a multi-scale feature extraction module MFEM, which extracts features of each example at multiple scales and achieves a complete expression of example features by fusing the multi-scale features. The invention uses ResNet101 as the backbone network, with pre-trained parameters transferred from ImageNet. The multi-scale features of each example are then extracted based on this network, as shown in fig. 3.
Specifically, the feature of each image block is obtained by fusing the feature $r_{i,j}^{s3}$ output by Stage3 of ResNet101 and the feature $r_{i,j}^{s4}$ output by Stage4, yielding the multi-scale feature shown in formula (1). In addition, the invention applies an adaptive average pooling layer after Stage3 and Stage4 to ensure that the output of each stage of the backbone network is a feature vector:

$$r_{i,j}^{s3} = P\big(f_{S3}(x_{i,j})\big),\qquad r_{i,j}^{s4} = P\big(f_{S4}(x_{i,j})\big) \qquad (1)$$
where $f_{S3}(\cdot)$ and $f_{S4}(\cdot)$ denote the different-scale features extracted by Stage3 and Stage4 of ResNet101, respectively; $P(\cdot)$ denotes the average pooling operation; and the dimensions of $r_{i,j}^{s3}$ and $r_{i,j}^{s4}$ are $d_{s3}$ and $d_{s4}$, respectively.
Further, a multi-scale joint strategy is designed to fuse the features of each example $x_{i,j}$ at different depths; the specific process of the multi-scale joint strategy is shown in formula (2):

$$r_{i,j} = \mathrm{concat}\big(r_{i,j}^{s3},\, r_{i,j}^{s4}\big) \qquad (2)$$
where concat denotes the splicing of different-scale features. Each example $x_{i,j}$ is represented by the multi-scale feature $r_{i,j}\in\mathbb{R}^{d_r}$, where $d_r=d_{s3}+d_{s4}$ is the feature dimension after fusion; the package $X_i$ is represented by the feature matrix $R_i\in\mathbb{R}^{n_i\times d_r}$.
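The MFEM computation described above can be sketched as follows; this is an illustrative NumPy mock in which random arrays stand in for the real ResNet101 Stage3/Stage4 outputs (the spatial sizes shown are assumptions for a 224×224 image block):

```python
import numpy as np

def adaptive_avg_pool(feature_map):
    """Global average pooling over the spatial dimensions (H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))

def mfem_features(stage3_out, stage4_out):
    """Fuse Stage3/Stage4 features of one image block into a multi-scale vector
    (formulas (1)-(2)): pool each stage, then concatenate."""
    r_s3 = adaptive_avg_pool(stage3_out)   # d_s3 = 1024
    r_s4 = adaptive_avg_pool(stage4_out)   # d_s4 = 2048
    return np.concatenate([r_s3, r_s4])    # d_r = 1024 + 2048 = 3072

# Mock backbone outputs for one image block (channel counts follow ResNet101):
stage3 = np.random.rand(1024, 14, 14)   # Stage3 output: 1024 x 14 x 14
stage4 = np.random.rand(2048, 7, 7)     # Stage4 output: 2048 x 7 x 7
r = mfem_features(stage3, stage4)
print(r.shape)   # (3072,)
```

Stacking the vectors of all $n_i$ image blocks of one WSI row by row then yields the package feature matrix $R_i$.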
2) The projection module PM.
Unlike existing classical multi-instance learning (MIL) methods, the projection module PM proposed by the invention projects the example features into a low-dimensional unit feature space, so that different example feature vectors differ only in direction in that space. The invention applies two different regularization operations in the PM to stabilize the training of the coding layer. Specifically, as shown in FIG. 3, the PM comprises three main network layers: a trainable BatchNorm1d layer, a weight-parameterized fine-tuning layer Proj-Fc, and an L2 Norm layer. The BatchNorm1d layer performs batch normalization, whose main function is to normalize the feature matrix $R_i$ along the feature dimension into the feature matrix $R'_i$; the specific process is shown in formula (3).
$$R'_i = \mathrm{BatchNorm1d}(R_i) \qquad (3)$$
where BatchNorm1d denotes batch normalization, and the output of the BatchNorm1d layer is $R'_i\in\mathbb{R}^{n_i\times d_r}$.
Because the parameters of the backbone network are transferred and then fixed during training, the multi-scale features extracted by the backbone network cannot be optimized in the training process. To optimize the example features during training, the invention applies a fine-tuning layer Proj-Fc in the projection module. Proj-Fc comprises two components: a weight matrix $W_p\in\mathbb{R}^{d_r\times d_p}$ and a ReLU activation function, where the main function of the weight matrix $W_p$ is to fine-tune the example features during training, and the ReLU activation function performs a nonlinear transformation of the features in the feature matrix. After the feature matrix $R'_i$ is processed by the fine-tuning layer Proj-Fc, the output feature matrix is $H_i\in\mathbb{R}^{n_i\times d_p}$, where $d_p$ is the feature dimension after fine-tuning. The specific calculation of the fine-tuning layer Proj-Fc is shown in formula (4):

$$H_i = \mathrm{ReLU}(R'_i W_p) \qquad (4)$$
where $W_p\in\mathbb{R}^{d_r\times d_p}$ represents the weight matrix; ReLU represents the activation function; and the output of the fine-tuning layer Proj-Fc is $H_i\in\mathbb{R}^{n_i\times d_p}$.
In addition, the normalization shown in formula (5) plays an important role in TSMIL, determining whether TSMIL can be trained effectively with metric-based feature learning. Specifically, the L2 Norm layer performs L2 normalization; its main function is to normalize the example feature vectors and eliminate the influence of the feature modulus (vector length) on the model. In metric learning, the feature of each example is regarded as a feature vector in feature space, and each vector has two basic attributes: direction and modulus. Since direction is the only vector attribute of interest in the metric-based feature learning of the TSMIL of the invention, the modulus of an example vector can be an interference factor. To solve this problem, the invention uses an L2 normalization operation to normalize each example feature vector $h_{i,j}$ in the feature matrix $H_i$ to a unit vector $h'_{i,j}$, effectively eliminating the modulus information of the example feature vector and avoiding its influence on the expression of the package features. After L2 normalization, each example $x_{i,j}$ is represented by a unit vector $h'_{i,j}$, and the differences between examples are expressed only by the directions of the vectors; the following gated attention module GAM obtains the feature vector of the package by aggregating example vectors in different directions. Notably, GAM assigns a weight to each example feature vector, and the direction of the package feature vector stays closer to the example feature vectors with larger weights.
$$H'_i = \mathrm{L2Norm}(H_i) \qquad (5)$$
where L2Norm denotes L2 normalization, and the output of the L2 Norm layer is $H'_i\in\mathbb{R}^{n_i\times d_p}$.
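The three PM operations (formulas (3)–(5)) can be sketched as follows; this is a simplified NumPy illustration that omits the learnable affine parameters of BatchNorm1d and uses a hypothetical randomly initialized weight matrix:

```python
import numpy as np

def batch_norm_1d(R, eps=1e-5):
    """Formula (3): normalize the feature matrix along the feature dimension.
    Training-mode batch statistics; the learnable scale/shift are omitted."""
    return (R - R.mean(axis=0)) / np.sqrt(R.var(axis=0) + eps)

def projection_module(R_i, W_p):
    """PM: BatchNorm1d -> Proj-Fc (linear + ReLU) -> row-wise L2 normalization."""
    R_prime = batch_norm_1d(R_i)                      # formula (3)
    H = np.maximum(R_prime @ W_p, 0.0)                # formula (4): ReLU(R' W_p)
    norms = np.linalg.norm(H, axis=1, keepdims=True)  # formula (5)
    return H / np.maximum(norms, 1e-12)

n_i, d_r, d_p = 5, 3072, 768            # dimensions from the experimental setup
rng = np.random.default_rng(0)
R_i = rng.standard_normal((n_i, d_r))
W_p = rng.standard_normal((d_r, d_p)) * 0.01
H_prime = projection_module(R_i, W_p)
print(np.allclose(np.linalg.norm(H_prime, axis=1), 1.0))  # every row is a unit vector
```

After this step each example vector carries direction information only, which is what the metric-based loss below operates on.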
3) The gated attention module GAM.
The invention applies a gated attention mechanism GAM in order to assign a learnable weight to each example vector and to generate the embedded feature representation of a package by fusing all example feature vectors in the package. As shown in FIG. 3, GAM is a multi-layer attention structure consisting of four parts: an Attn-Fc1 layer, an Attn-Fc2 layer, an Attn-Fc3 layer and a Dropout layer, where the Attn-Fc1, Attn-Fc2 and Attn-Fc3 layers are parameterized by weight matrices $V_a$, $U_a$ and $W_a$, respectively, which are continuously adjusted during training and fixed after training ends.
In GAM, the Attn-Fc1 layer consists of a weight matrix $V_a\in\mathbb{R}^{d_p\times d_a}$ and a Tanh activation function, where $d_p$ and $d_a$ denote the input and output dimensions of the Attn-Fc1 layer, respectively. The weight matrix $V_a$ learns each example feature and outputs a compressed feature for each example, and the Tanh activation function finally maps the values of the compressed feature into the interval (−1, 1). In the multi-layer attention structure, the network layer that plays the gating role is the Attn-Fc2 layer, which consists of a weight matrix $U_a\in\mathbb{R}^{d_p\times d_a}$ and a Sigmoid activation function. The Sigmoid activation function regulates the output of the Attn-Fc1 layer by mapping the network output between 0 and 1, so that values close to 1 are released and values close to 0 are suppressed. The outputs of the Attn-Fc1 layer and the Attn-Fc2 layer are multiplied element-wise and the result is passed to the Attn-Fc3 layer.
Finally, the Attn-Fc3 layer generates an attention score for each example feature; example features that are critical for classification are given larger attention scores, and vice versa. The Attn-Fc3 layer consists of a weight matrix $W_a\in\mathbb{R}^{d_a\times 1}$ and a Softmax function, where the weight matrix $W_a$ outputs the attention score of each example, and the Softmax function normalizes the attention scores of all example features in a package so that they sum to 1. For example $x_{i,j}$, the specific attention-score calculation is shown in formula (6):

$$a_{i,j} = \frac{\exp\!\big(W_a^{\top}\big(\tanh(V_a^{\top}h'_{i,j})\odot \mathrm{sigm}(U_a^{\top}h'_{i,j})\big)\big)}{\sum_{k=1}^{n_i}\exp\!\big(W_a^{\top}\big(\tanh(V_a^{\top}h'_{i,k})\odot \mathrm{sigm}(U_a^{\top}h'_{i,k})\big)\big)} \qquad (6)$$
where $\odot$ denotes the element-wise product between vectors; sigm denotes the Sigmoid activation function; Tanh denotes the Tanh activation function; and $a_{i,j}$ is the attention score corresponding to example $x_{i,j}$.
In addition, the invention enhances the generalization of the model by applying a dropout technique to all attention scores in a package. This technique randomly selects $p_d\%$ of the attention scores in a package and sets the attention values of the selected examples to 0, thereby randomly masking $p_d\%$ of the image blocks in the WSI and enhancing robustness during model training. The dropout operation on the attention scores is shown in formula (7).
The feature vector $G_i$ of a package is finally derived by the fusion operator $g$: the feature representation of the package is obtained by fusing all example vectors $h'_{i,j}$ of package $X_i$ with their attention scores $a_{i,j}$. The fusion process of the package is shown in formula (8):

$$G_i = g(X_i) = \sum_{j=1}^{n_i} a_{i,j}\, h'_{i,j} \qquad (8)$$
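The gated attention computation (formulas (6)–(8)) can be sketched as follows; a minimal NumPy illustration in which randomly initialized weight matrices stand in for the trained $V_a$, $U_a$, $W_a$:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_attention(H_prime, V_a, U_a, W_a, p_d=0.0, rng=None):
    """GAM: gated attention scores (formula (6)), optional attention dropout
    (formula (7)), and attention-weighted fusion of the package (formula (8))."""
    gate = np.tanh(H_prime @ V_a) * (1.0 / (1.0 + np.exp(-(H_prime @ U_a))))
    a = softmax(gate @ W_a)                  # one score per example, summing to 1
    if p_d > 0.0 and rng is not None:        # randomly zero out p_d of the scores
        a = a * (rng.random(a.shape) >= p_d)
    return a, a @ H_prime                    # attention scores, package vector G_i

n_i, d_p, d_a = 6, 768, 128
rng = np.random.default_rng(1)
H_prime = rng.standard_normal((n_i, d_p))
H_prime /= np.linalg.norm(H_prime, axis=1, keepdims=True)   # unit example vectors
V_a = rng.standard_normal((d_p, d_a)) * 0.05
U_a = rng.standard_normal((d_p, d_a)) * 0.05
W_a = rng.standard_normal(d_a) * 0.05
a, G_i = gated_attention(H_prime, V_a, U_a, W_a)
print(round(a.sum(), 6), G_i.shape)   # 1.0 (768,)
```

Passing `p_d=0.35` and an `rng` reproduces the attention-dropout behavior described above (masked scores are simply set to 0, without renormalization).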
2. Loss function.
How to optimize the package features is a key consideration for TSMIL when choosing the loss function. Metric learning is a powerful method for obtaining, during training, compactness of sample features of the same category and separability of sample features of different categories. The invention uses the loss function shown in formula (9) (instead of the Cross-Entropy loss function adopted by most deep neural networks) and optimizes the distribution of package features in the feature space by forcing the vector similarity between intra-class package features to be larger than the vector similarity between inter-class package features, so that the package feature vectors of each category gather around their respective class centers and develop into distinct class clusters that keep a certain margin from each other. During training, the batch size involved in one iteration is denoted as $B$, and the packages involved in training are denoted as $\{X_1,\ldots,X_B\}$. Suppose that, within a training batch, for package $X_i$ there exist $K_i$ samples (packages) of the same class and $L_i$ samples of different classes; in the feature space, the similarity scores between them are denoted as $s^p_{i,k}$, $k\in\{1,\ldots,K_i\}$, and $s^n_{i,l}$, $l\in\{1,\ldots,L_i\}$, respectively. The loss function of a training batch is shown in formula (9):

$$\mathrm{Loss} = \frac{1}{B}\sum_{i=1}^{B}\log\!\Big[1+\sum_{l=1}^{L_i}\exp\big(\gamma\,\alpha^n_{i,l}(s^n_{i,l}-\Delta_{neg})\big)\sum_{k=1}^{K_i}\exp\big(-\gamma\,\alpha^p_{i,k}(s^p_{i,k}-\Delta_{pos})\big)\Big] \qquad (9)$$
For two feature vectors $G_i$ and $G_b$ from packages $X_i$ and $X_b$ ($i\in\{1,\ldots,B\}$, $b\in\{1,\ldots,B\}$, $i\neq b$), the similarity score of the two feature vectors is denoted $s$, and its definition is shown in formula (10):

$$s = \frac{\langle G_i, G_b\rangle}{\lVert G_i\rVert\,\lVert G_b\rVert} \qquad (10)$$
where $\langle\cdot,\cdot\rangle$ is the inner product of two vectors; $\lVert\cdot\rVert$ denotes the Euclidean norm of a vector; $\alpha^p$ and $\alpha^n$ are weight factors; and $\Delta_{pos}$ and $\Delta_{neg}$ are the intra-class and inter-class margins, respectively. Their specific parameter settings are shown in formula (11):

$$\alpha^p = [1+m-s^p]_+,\quad \alpha^n = [s^n+m]_+,\quad \Delta_{pos} = 1-m,\quad \Delta_{neg} = m \qquad (11)$$
where $[\cdot]_+$ denotes the cut-off-at-zero operation, i.e. $[x]_+=\max(x,0)$; the scale factor $\gamma$ and the margin factor $m$ are two hyper-parameters.
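The loss of formulas (9)–(11) matches the general form of a Circle-loss-style pairwise objective; the following NumPy sketch reconstructs it under that assumption (the exact definitions of the weight factors $\alpha^p$, $\alpha^n$ are inferred rather than quoted from the original, and a small $\gamma$ is used in the toy demo to avoid numerical overflow):

```python
import numpy as np

def cosine_sim(u, v):
    """Formula (10): cosine similarity between two package feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def batch_loss(G, labels, gamma=256.0, m=0.25):
    """Circle-loss-style batch loss over package features G (B x d), per
    formula (9); alpha^p/alpha^n and the margins follow the usual Circle-loss
    settings (an assumption here), with [x]_+ = max(x, 0)."""
    B = len(G)
    delta_pos, delta_neg = 1.0 - m, m
    total = 0.0
    for i in range(B):
        sp = [cosine_sim(G[i], G[b]) for b in range(B) if b != i and labels[b] == labels[i]]
        sn = [cosine_sim(G[i], G[b]) for b in range(B) if labels[b] != labels[i]]
        ap = [max(1.0 + m - s, 0.0) for s in sp]   # inferred weight factors
        an = [max(s + m, 0.0) for s in sn]
        pos = sum(np.exp(-gamma * a * (s - delta_pos)) for a, s in zip(ap, sp))
        neg = sum(np.exp(gamma * a * (s - delta_neg)) for a, s in zip(an, sn))
        total += np.log1p(pos * neg)
    return total / B

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 8))        # 4 toy package vectors
labels = [0, 0, 1, 1]
loss = batch_loss(G, labels, gamma=8.0)  # small gamma only for this toy demo
print(loss >= 0.0)   # True: a non-negative scalar loss
```

Minimizing this quantity pulls intra-class similarities $s^p$ toward $\Delta_{pos}$ and pushes inter-class similarities $s^n$ below $\Delta_{neg}$, which is exactly the clustering behavior described above.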
After the overall features of the WSI are extracted, the classification training process of the pathological image can continuously improve the similarity between the features of the WSI of the same category, and simultaneously reduce the similarity between the features of the WSI of different categories, so that the features of the WSI of the same category are more similar, and the features of the WSIs of different categories are more different. (FIG. 3 shows a WSI overall feature optimization acquisition flow.)
Step three, performing subtype diagnosis on the pretreated WSI to be classified by using a trained tumor subtype diagnosis model, wherein the diagnosis process comprises the following steps: and obtaining the embedded feature representation of the preprocessed WSI to be classified by utilizing a feature extraction module, measuring the similarity between the WSI to be classified and each class cluster according to the embedded feature representation of the WSI to be classified and the embedded feature representation of each training sample in each class cluster, and obtaining the subtype diagnosis result of the WSI to be classified according to the similarity.
Based on the feature distribution of the training samples introduced in step two, TSMIL proposes three metric-based package classification strategies: MaxS, AveS, and HybS. All three classification strategies apply the similarity measure shown in formula (10), which yields larger values for more similar vectors. During testing, the distance between the test sample and each class cluster is measured in different ways, and the class of the closest cluster is taken as the prediction for the test sample. Suppose that, for a particular dataset, the distribution of the training-set WSI features in the feature space is as shown in FIG. 4(a) and FIG. 4(b), where each color represents a class. Clearly, training samples of the same class are more compact in the feature space, and a large distance margin (white blank area) exists between training samples of different classes. The three classification strategies designed in TSMIL are as follows:
1) MaxS (Maximum Similarity): as shown in fig. 4 (a), after all training samples and test samples X are embedded into the feature space, the similarity between the test samples and all training samples in each class cluster is measured, and the maximum similarity of each class cluster is regarded as the prediction score of the class, three different class clusters generate three prediction scores in total, and finally all the prediction scores are normalized based on the Softmax function.
2) AveS (Average Similarity): as shown in fig. 4 (b), after the coding layer embeds all training samples into the feature space, the average feature of all training samples in each cluster is extracted as the "cluster center" of the cluster. During testing, only the similarity between the feature vector of the test sample X and all the cluster centers is needed to be calculated, and finally, the Softmax operation is carried out on all the similarities to obtain the final prediction score.
3) HybS (Hybrid Similarity): hybS is a combination of the two classification strategies described above. Specifically, the prediction scores of the two strategies are averaged as the final prediction result.
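The three strategies can be sketched as follows; an illustrative NumPy example on toy clusters (the cluster data and dimensions are invented for demonstration):

```python
import numpy as np

def softmax(x):
    e = np.exp(np.asarray(x) - np.max(x))
    return e / e.sum()

def cos_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def classify(test_vec, clusters):
    """Return (MaxS, AveS, HybS) prediction scores over the class clusters.
    `clusters` maps each class to the matrix of its training package features."""
    # MaxS: maximum similarity to any training sample in each cluster
    max_scores = [max(cos_sim(test_vec, g) for g in G) for G in clusters]
    # AveS: similarity to each "cluster center" (mean training feature)
    centers = [G.mean(axis=0) for G in clusters]
    ave_scores = [cos_sim(test_vec, c) for c in centers]
    max_s, ave_s = softmax(max_scores), softmax(ave_scores)
    return max_s, ave_s, (max_s + ave_s) / 2.0   # HybS averages the two

rng = np.random.default_rng(3)
# Three toy class clusters around three different directions:
clusters = [rng.standard_normal((10, 16)) + 5 * np.eye(16)[c] for c in range(3)]
test_vec = 5 * np.eye(16)[1] + rng.standard_normal(16) * 0.1   # near cluster 1
max_s, ave_s, hyb_s = classify(test_vec, clusters)
print(int(np.argmax(hyb_s)))   # 1 — the test sample is assigned to cluster 1
```

All three strategies thus differ only in how the similarity to a cluster is summarized; the final prediction scores are normalized by Softmax in each case.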
Thus, the whole implementation process of the tumor subtype diagnosis method facing the full-slice pathological image is introduced.
Experiments were performed on two published pathological data sets, TCGA-NSCLC and TCGA-RCC, below, to verify the feasibility of the method of the invention as a generic framework for application on a variety of cancers.
1) Experimental data set.
(1) TCGA-RCC. The Cancer Genome Atlas Renal Cell Carcinoma (TCGA-RCC) is a published pathological dataset containing three categories: chromophobe renal cell carcinoma (Kidney Chromophobe Renal Cell Carcinoma, KICH), clear cell renal cell carcinoma (Kidney Renal Clear Cell Carcinoma, KIRC) and papillary renal cell carcinoma (Kidney Renal Papillary Cell Carcinoma, KIRP). The TCGA-RCC data contains 937 WSIs in total: 121 WSIs belonging to the KICH category, 519 WSIs belonging to the KIRC category and 297 WSIs belonging to the KIRP category. Clearly, the dataset is multi-category and category-imbalanced. After preprocessing, the dataset generated approximately 3.9 million image blocks at 20× magnification, with an average of approximately 4100 image blocks per WSI. The only information available for learning in the dataset is the class label of the WSI.
(2) TCGA-NSCLC. The TCGA non-small cell lung cancer (TCGA-NSCLC) dataset contains two categories: lung squamous cell carcinoma (Lung Squamous Cell Carcinoma, LUSC) and lung adenocarcinoma (Lung Adenocarcinoma, LUAD). The TCGA-NSCLC dataset contains 1053 pathological WSIs: 541 WSIs belonging to the LUAD class and 512 WSIs belonging to the LUSC class. After preprocessing, the dataset produced a total of approximately 3.9 million image blocks at 20× magnification, with an average of about 3700 image blocks per WSI.
2) And (5) experimental evaluation indexes.
Model evaluation measures the specific performance of a model by certain technical means, and an evaluation index is a quantitative standard for evaluating model performance, playing an important role in the comparison of different methods. Different indexes reflect different aspects of model performance, so evaluating several indexes yields a more comprehensive and objective result. The classification evaluation indexes used in the invention are Accuracy and AUC (area under the ROC curve). These indexes are all derived from the Confusion Matrix, as shown in Table 1.
TABLE 1 confusion matrix
Accuracy refers to the percentage of correctly predicted samples among all predictions; the specific calculation is shown in formula (12). AUC (Area Under Curve) is the area under the ROC curve, whose abscissa is the false positive rate (False Positive Rate, FPR) and whose ordinate is the true positive rate (True Positive Rate, TPR). The false positive rate is the percentage of samples predicted positive among all truly negative samples, as shown in formula (13); the true positive rate is the percentage of samples predicted positive among all truly positive samples, as shown in formula (14). AUC ranges from 0 to 1, and an AUC close to 1 indicates better model performance.

$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \qquad (12)$$
$$\mathrm{FPR} = \frac{FP}{FP+TN} \qquad (13)$$
$$\mathrm{TPR} = \frac{TP}{TP+FN} \qquad (14)$$
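These three indexes can be computed directly from the confusion-matrix counts; a small sketch with a hypothetical confusion matrix:

```python
def accuracy(tp, fp, fn, tn):
    """Formula (12): correct predictions over all samples."""
    return (tp + tn) / (tp + fp + fn + tn)

def fpr(fp, tn):
    """Formula (13): false positives over all truly negative samples."""
    return fp / (fp + tn)

def tpr(tp, fn):
    """Formula (14): true positives over all truly positive samples."""
    return tp / (tp + fn)

# A hypothetical confusion matrix: 80 TP, 10 FP, 20 FN, 90 TN.
print(accuracy(80, 10, 20, 90))  # 0.85
print(fpr(10, 90))               # 0.1
print(tpr(80, 20))               # 0.8
```

AUC is then obtained by sweeping the decision threshold, plotting TPR against FPR, and integrating the resulting ROC curve.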
3) Experimental setup.
During training, TSMIL optimizes the network using a Ranger optimizer with a fixed weight-decay factor of 1e-5. The learning rate is initialized to 0.1 and decays following a cosine schedule during training. The number of training epochs is 300 and the batch size $B$ is 16. Furthermore, TSMIL adopts an early-stopping strategy during training, with a patience value of 50. In the proposed TSMIL, $d_{s3}$, $d_{s4}$ and $d_r$ in the MFEM are 1024, 2048 and 3072, respectively. In the Proj-Fc layer, $d_p$ is set to 768; in GAM, $p_d$ is set to 0.35 and $d_a$ to 128. In the loss function, the scale factor $\gamma$ and the margin factor $m$ are set to 256 and 0.25, respectively. The performance index of every experiment is the average of five runs, and 95% confidence intervals (Confidence Intervals, CI) were estimated from 1000 bootstrap iterations.
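For reference, the stated hyper-parameters can be gathered into one configuration; the dictionary below is illustrative (the key names are assumptions, not taken from the original implementation):

```python
# Hyper-parameters as stated in the experimental setup (key names illustrative).
TSMIL_CONFIG = {
    "optimizer": "Ranger",
    "weight_decay": 1e-5,
    "lr_init": 0.1,              # cosine decay during training
    "epochs": 300,
    "batch_size": 16,
    "early_stop_patience": 50,
    "d_s3": 1024, "d_s4": 2048, "d_r": 3072,   # MFEM dimensions
    "d_p": 768,                  # Proj-Fc output dimension
    "p_d": 0.35, "d_a": 128,     # GAM dropout rate and hidden dimension
    "gamma": 256, "m": 0.25,     # loss scale and margin factors
}

# Sanity check: the fused dimension equals the concatenated stage dimensions.
assert TSMIL_CONFIG["d_r"] == TSMIL_CONFIG["d_s3"] + TSMIL_CONFIG["d_s4"]
print(TSMIL_CONFIG["batch_size"])  # 16
```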
4) Experimental results.
(1) Experimental results and analysis on TCGA-NSCLC and TCGA-RCC.
To verify the feasibility and superiority of the proposed TSMIL method, this section selects superior weakly-supervised methods from recent years as comparison methods and compares against the experimental results provided in those works. These methods include ABMIL, DSMIL, CLAM-SB, CLAM-MB, TransMIL, DTFD-MIL, SRCL and NAGCN. The comparative experimental results are shown in Table 2.
TABLE 2 Experimental results on TCGA-NSCLC and TCGA-RCC
As can be seen from Table 2, the three classification strategies of the proposed TSMIL method have similar performance, with the AUC differences on the two datasets being less than 0.0003. Clearly, every classification strategy in TSMIL outperforms the other existing methods in terms of AUC and accuracy. For example, on the TCGA-NSCLC dataset, TSMIL (HybS) and the other two strategies achieved the same classification performance, which was superior to both SRCL and TransMIL, specifically higher by 0.0264 and 0.0391 in AUC, respectively. On the TCGA-RCC dataset, the AUCs of TSMIL (HybS) and TSMIL (AveS) performed best, and TSMIL (MaxS) achieved the highest accuracy of 98.76% (95% CI, 0.9836–0.9908). Compared with the other methods, the accuracy and AUC of TSMIL (HybS) are 3.34% and 0.0072 higher than NAGCN, respectively. TSMIL (MaxS) was also 0.0079 and 0.0107 higher in AUC than SRCL and TransMIL, respectively. These results demonstrate that the proposed method achieves superior performance with only weak-supervision labels, verifying the feasibility of the proposed method in WSI classification.
(2) Necessity of L2 standardization.
This section shows the performance impact of the L2 Norm layer on TSMIL. In WSI feature optimization, the feature representation of a package is optimized by reducing the angles between intra-class package feature vectors while increasing the angles between inter-class package feature vectors. The feature vector of a package is obtained by GAM aggregating the feature vectors of all examples, so the example feature vectors affect the expression of the package feature to some extent. Since only the angles of the vectors are optimized, direction is the only vector information of interest in TSMIL, and the modulus of a vector may be redundant information that can even harm model performance. To verify this idea, the invention designed two groups of comparative experiments on the two datasets (TCGA-NSCLC and TCGA-RCC): in one group the TSMIL method leaves out the L2 Norm layer, and the other group still adopts the L2 Norm layer; both groups use TSMIL (MaxS) for classification. The experimental comparison results on the two datasets are shown in FIG. 5(a), where "w/o L2 Norm" uses no L2 Norm layer and "with L2 Norm" uses the L2 Norm layer. The models are compared using accuracy as the evaluation index.
The experimental results show that the L2 Norm layer provides beneficial help to the classification performance of TSMIL, and is even the key network layer that allows TSMIL to perform the classification function effectively. As can be seen from FIG. 5(a), the TSMIL using the L2 Norm layer achieves 5.49% and 12.04% higher accuracy than the TSMIL without the L2 Norm layer on the TCGA-RCC and TCGA-NSCLC datasets, respectively.
To explore why the L2 Norm layer improves TSMIL classification performance so significantly, the invention examines the change of the loss during training on the TCGA-NSCLC dataset, where the performance improvement is large, as shown in FIG. 5(b); the solid line uses the L2 Norm layer and the dotted line does not. FIG. 5(b) shows that performing the L2 normalization operation on example features effectively reduces the loss of the model compared with the model without the L2 Norm layer, with loss differences between the two models even greater than 200. This indicates that erasing the example-feature length information makes the classification of the model effective, and also demonstrates the necessity and effectiveness of adding the L2 Norm layer in the invention.
(3) Interpretability.
This section demonstrates the interpretability of TSMIL. In practical applications, TSMIL is expected to provide an intuitive visual display for professional pathologists and to reduce their workload by highlighting highly suspicious regions in the WSI to assist slide reading. In TSMIL, each WSI is cut into a series of image blocks; after MFEM extracts the image-block features, GAM assigns a weight score to each image-block feature. Here, the invention shows the regions of interest of TSMIL in the form of an attention heat map by mapping the weights of the image-block features back onto the original WSI, where a high attention value is shown in red, indicating that the image block contributes more to the classification prediction of the WSI, while blue indicates that the image block does not help the classification prediction. As shown in FIG. 6(a) to FIG. 6(d), two WSIs were selected from TCGA-RCC and annotated by pathologists with more than five years of working experience; FIG. 6(e) is the index (color scale) for FIG. 6(b) and FIG. 6(d). In FIG. 6(a) and FIG. 6(c), the region within the blue curve is an approximate pixel-level annotation by the expert pathologist, indicating the presence of a large number of positive lesions in that region, and FIG. 6(b) and FIG. 6(d) are the probability heat maps output by TSMIL.
From the probability heat maps in FIG. 6(b) and FIG. 6(d), it can be seen that TSMIL has the ability to identify cancer-related suspicious regions in positive WSIs. From the comparison between the two groups of images, it is clear that TSMIL is significantly positively activated in the expert-annotated regions, indicating that TSMIL considers a large number of cancer-related suspicious regions to be present there.
In summary, the invention has the following characteristics:
1) The invention designs a framework combining weakly-supervised multi-instance learning with metric-learning-based training, realizing a tumor-assisted intelligent diagnosis method oriented to full-slice pathological images. The method can train the model using only WSI image-level label information, without fine pixel-level annotation, and achieves model performance superior to training with a large number of pixel-level annotated images. It thereby avoids the dependence of deep learning methods on pixel-level annotation of full-slice pathological images and greatly reduces the huge workload of pathological experts annotating images.
2) The classification diagnosis of the pathological image designed by the invention is based on the global overall features of the WSI, and the acquisition of these features comprises three modules: the Multi-scale Feature Extraction Module (MFEM), the Projection Module (PM) and the Gated Attention Module (GAM). First, when pathologists perform pathological diagnosis, they usually observe the slide at multiple scales to achieve complete detection and accurate judgment of suspicious lesions and to avoid missed diagnosis and misdiagnosis of cancerous lesions; based on this idea, the invention designs the multi-scale feature extraction module MFEM, which extracts the features of each example at multiple scales and achieves a complete expression of the example features by fusing the multi-scale features. Second, unlike existing classical multi-instance learning MIL methods, which perform no operation on example features, the projection module PM proposed by the invention projects the example features into a low-dimensional unit-hypersphere feature space, so that different example feature vectors differ only in direction in the unit feature space; meanwhile, the invention applies two different regularization operations in the PM to stabilize the training of the coding layer. Furthermore, the invention applies the gated attention mechanism GAM in order to assign a learnable weight to each example vector and to generate the embedded feature representation of a package by fusing all example feature vectors in the package.
3) After the overall features of the WSI are extracted, the classification training process of the pathological image continuously increases the similarity between features of WSIs of the same category while reducing the similarity between features of WSIs of different categories, so that features of same-category WSIs become more similar and features of different-category WSIs become more distinct. These characteristics benefit the subsequent metric-based classification diagnosis of WSIs.
4) Based on the training sample distribution, TSMILs propose three metrics-based WSI packet classification strategies: maxS, aveS, and HybS. When the test sample is subjected to classification diagnosis, the three different classification strategies measure the distance between the test sample and each class cluster in different measurement modes, and the class cluster class closest to the distance is used as the prediction of the test sample.
5) The invention utilizes the weak supervision multi-example learning framework to be more easily transferred and applied to the analysis of a plurality of different tissue pathology images, has strong generalization capability and low learning cost, and is easy to realize. In addition, the invention is a novel deep learning framework, and can realize the auxiliary diagnosis of the full-slice pathological image in a shorter time only through forward propagation of a network during testing, and has high calculation time efficiency and low hardware requirement cost.

Claims (10)

1. The tumor subtype diagnosis method for the full-slice pathological image is characterized by comprising the following steps of:
1) Constructing a tumor subtype diagnostic model, wherein the tumor subtype diagnostic model comprises overall feature extraction, and the overall feature extraction is used for extracting embedded feature representations of input WSI;
2) Training the constructed tumor subtype diagnosis model by using the tumor WSI data set subjected to subtype classification as a training sample, improving the similarity between embedded feature representations of WSIs of the same category in the training process, and reducing the similarity between embedded feature representations of WSIs of different categories; obtaining a plurality of class clusters after training, wherein the number of the class clusters is the subtype classification number, and one class cluster comprises embedded feature representations of all training samples belonging to the class cluster;
3) Preprocessing the WSI to be classified, and performing subtype diagnosis on the preprocessed WSI to be classified by using a trained tumor subtype diagnosis model, wherein the diagnosis process comprises the following steps of: and obtaining the embedded feature representation of the preprocessed WSI to be classified by utilizing a feature extraction module, measuring the similarity between the WSI to be classified and each class cluster according to the embedded feature representation of the WSI to be classified and the embedded feature representation of each training sample in each class cluster, and obtaining the subtype diagnosis result of the WSI to be classified according to the similarity.
2. The tumor subtype diagnosis method facing to the full-slice pathological image according to claim 1, wherein the similarity between the embedded feature representation of the WSI to be classified and each cluster is measured by adopting any one of the following modes:
mode one: measuring the similarity between the embedded feature representation of the WSI to be classified and the embedded feature representation of all training samples in each class of clusters, and taking the maximum similarity of each class of clusters as the similarity between the WSI to be classified and the class of clusters;
mode two: calculating average embedded feature representation of all training samples in each class cluster and taking the average embedded feature representation as a cluster center of the class cluster, and measuring similarity between the embedded feature representation of the WSI to be classified and the cluster center of each class cluster and taking the similarity between the WSI to be classified and each class cluster;
mode three: and (3) obtaining the average value of the result obtained in the first mode and the result obtained in the second mode as the similarity between the WSI to be classified and each cluster.
3. The full-slice pathology image oriented tumor subtype diagnosis method according to claim 1, wherein the loss function used in the training of the tumor subtype diagnosis model is:
$$\mathrm{Loss} = \frac{1}{B}\sum_{i=1}^{B}\log\!\Big[1+\sum_{l=1}^{L_i}\exp\big(\gamma\,\alpha^n_{i,l}(s^n_{i,l}-\Delta_{neg})\big)\sum_{k=1}^{K_i}\exp\big(-\gamma\,\alpha^p_{i,k}(s^p_{i,k}-\Delta_{pos})\big)\Big]$$

wherein Loss represents the loss value; $B$ represents the batch size involved in one iteration of training; $K_i$ and $L_i$ represent the number of same-class samples and the number of different-class samples for package $X_i$, with similarity scores $s^p_{i,k}$, $k\in\{1,\ldots,K_i\}$, and $s^n_{i,l}$, $l\in\{1,\ldots,L_i\}$, respectively; $\gamma$ represents the scale factor; $\alpha^p$ and $\alpha^n$ represent the weight factors; and $\Delta_{pos}$ and $\Delta_{neg}$ are the intra-class and inter-class margins, respectively.
4. The tumor subtype diagnosis method for full-slice pathological images according to claim 1, wherein the preprocessing comprises segmenting the tissue region in the WSI into a plurality of image blocks.
5. The tumor subtype diagnosis method for full-slice pathological images according to claim 4, wherein the overall feature extraction comprises a feature extraction module, a projection module and a gated attention module; the feature extraction module is used for extracting the features of each image block; the projection module is used for projecting the features extracted by the feature extraction module into a low-dimensional unit feature space to obtain a plurality of unit vectors, so that different feature vectors differ only in direction within the unit feature space; and the gated attention module is used for fusing the outputs of the projection module to generate an embedded feature representation.
6. The tumor subtype diagnosis method for full-slice pathological images according to claim 5, wherein the feature extraction module adopts ResNet101 as its backbone network; the features output by Stage3 and Stage4 of ResNet101 are each processed by an adaptive average pooling layer, and the pooled outputs are then concatenated to obtain the feature corresponding to each image block.
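The pool-then-concatenate step of claim 6 can be illustrated with NumPy stand-ins for the ResNet101 activations (Stage3 emits 1024 channels and Stage4 emits 2048 channels in the standard architecture, so the concatenated feature would be 3072-dimensional). A real implementation would instead hook the `layer3`/`layer4` outputs of `torchvision.models.resnet101`; the helper names below are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(feat):
    """Adaptive average pooling of a (C, H, W) feature map to output size 1x1,
    i.e. a per-channel spatial mean, returned as a (C,) vector."""
    return feat.mean(axis=(1, 2))

def patch_feature(stage3, stage4):
    """Concatenate the pooled Stage3 (1024, H, W) and Stage4 (2048, H', W')
    maps into one 3072-dim feature vector for an image block."""
    return np.concatenate([adaptive_avg_pool(stage3), adaptive_avg_pool(stage4)])
```

Fusing a mid-level stage with the final stage this way keeps some texture detail that global features alone would lose, which is a common choice for histopathology patches.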
7. The tumor subtype diagnosis method for full-slice pathological images according to claim 5, wherein the projection module comprises a trainable BatchNorm1d layer, a weight-parameterized fine-tuning layer Proj-Fc, and an L2 Norm layer; the BatchNorm1d layer is used for normalizing the input feature matrix along the feature dimension; the calculation process of the fine-tuning layer Proj-Fc is H' = ReLU(H · W), wherein W represents a weight matrix, H represents the normalized input, H' represents the output of the fine-tuning layer Proj-Fc, and ReLU represents the activation function; the L2 Norm layer is used for normalizing each feature vector in the output of the fine-tuning layer Proj-Fc to a unit vector.
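A minimal NumPy sketch of the projection module of claim 7 (inference-mode batch statistics, affine BatchNorm parameters optional, and a tiny epsilon to guard rows that ReLU zeroes out entirely; the function name and these numerical details are assumptions, not from the patent):

```python
import numpy as np

def projection_module(X, W, gamma=None, beta=None, eps=1e-5):
    """BatchNorm1d -> Proj-Fc (H' = ReLU(H W)) -> L2 normalisation.

    X -- (N, d) patch features; W -- (d, d') weight matrix of Proj-Fc.
    Returns (N, d') unit vectors differing only in direction.
    """
    mu, var = X.mean(axis=0), X.var(axis=0)          # normalise each feature over the batch
    Xn = (X - mu) / np.sqrt(var + eps)
    if gamma is not None:
        Xn = gamma * Xn + beta                        # trainable affine part of BatchNorm1d
    H = np.maximum(Xn @ W, 0.0)                       # fine-tuning layer Proj-Fc with ReLU
    H = H + 1e-12                                     # avoid dividing an all-zero row by 0
    return H / np.linalg.norm(H, axis=1, keepdims=True)  # L2 Norm layer -> unit vectors
```

Projecting onto the unit sphere makes the downstream similarity measure depend only on vector direction, which is what the cosine-style cluster comparison of claim 2 requires.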
8. The tumor subtype diagnosis method for full-slice pathological images according to claim 5, wherein the gated attention module comprises an Attn-Fc1 layer, an Attn-Fc2 layer, an Attn-Fc3 layer and a Dropout layer; the Attn-Fc1 layer compresses the input features and maps the resulting feature values into the interval (-1, 1) through a Tanh activation function; the Attn-Fc2 layer acts as a gate, regulating the output of the Attn-Fc1 layer by mapping the network output into the interval (0, 1) through a Sigmoid activation function; the outputs of the Attn-Fc1 layer and the Attn-Fc2 layer are multiplied element-wise and passed into the Attn-Fc3 layer; the Attn-Fc3 layer generates an attention score for each feature, the attention scores are processed by the Dropout layer, and the unit vectors are then weighted and summed with the Dropout-processed attention scores to obtain the embedded feature representation.
9. The tumor subtype diagnosis method for full-slice pathological images according to claim 4, wherein the preprocessing comprises: first converting the WSI from the RGB color space to the HSV color space; then computing a segmentation threshold between the background region and the tissue region on the saturation channel of the HSV color space using Otsu's method, and binarizing the saturation channel with this threshold to extract a tissue mask and thereby the tissue region; the tissue region is then segmented into a series of equally sized image blocks using a sliding window.
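The thresholding and tiling steps of claim 9 can be sketched in NumPy. Otsu's method picks the threshold that maximizes between-class variance on the channel histogram; the non-overlapping window stride and the 50% tissue-ratio rule for keeping a tile are assumptions, as are the function names (a production pipeline would more likely use OpenCV or scikit-image).

```python
import numpy as np

def otsu_threshold(channel, bins=256):
    """Otsu's method on a uint8 channel: maximise between-class variance."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    omega = np.cumsum(p)                         # class-0 (background) probability
    mu = np.cumsum(p * np.arange(bins))          # class-0 cumulative mean
    mu_t = mu[-1]                                # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))            # threshold maximising sigma_b

def tile_tissue(mask, size):
    """Slide a size x size window over a binary tissue mask; keep windows
    whose tissue ratio exceeds 0.5 (assumed rule). Returns (y, x) corners."""
    coords = []
    H, W = mask.shape
    for y in range(0, H - size + 1, size):
        for x in range(0, W - size + 1, size):
            if mask[y:y + size, x:x + size].mean() > 0.5:
                coords.append((y, x))
    return coords
```

Binarizing with `mask = saturation > otsu_threshold(saturation)` then yields the tissue mask; saturation works well for H&E slides because stained tissue is far more saturated than the white glass background.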
10. The method of claim 9, wherein, after the tissue mask is extracted, mean filtering and a morphological closing operation are applied to the mask before the tissue region is extracted.
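The morphological closing of claim 10 (dilation followed by erosion, which fills small holes and smooths ragged mask edges) can be written in pure NumPy for a square structuring element; the kernel size and the border-padding convention below are assumptions.

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    P = np.pad(mask, pad)                      # pad with False outside the image
    H, W = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= P[dy:dy + H, dx:dx + W]     # OR over the neighbourhood
    return out

def erode(mask, k=3):
    """Binary erosion with a k x k square structuring element."""
    pad = k // 2
    P = np.pad(mask, pad, constant_values=True)  # pad True so borders are kept
    H, W = mask.shape
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= P[dy:dy + H, dx:dx + W]     # AND over the neighbourhood
    return out

def morphological_close(mask, k=3):
    """Closing = dilation then erosion; fills holes smaller than the kernel."""
    return erode(dilate(mask, k), k)
```

In practice `scipy.ndimage.binary_closing` or OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` would replace these loops, but the behaviour is the same: small background holes inside the tissue mask disappear.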
CN202311006126.5A 2023-08-10 2023-08-10 Tumor subtype diagnosis method for full-slice pathological image Pending CN117036288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311006126.5A CN117036288A (en) 2023-08-10 2023-08-10 Tumor subtype diagnosis method for full-slice pathological image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311006126.5A CN117036288A (en) 2023-08-10 2023-08-10 Tumor subtype diagnosis method for full-slice pathological image

Publications (1)

Publication Number Publication Date
CN117036288A true CN117036288A (en) 2023-11-10

Family

ID=88634834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311006126.5A Pending CN117036288A (en) 2023-08-10 2023-08-10 Tumor subtype diagnosis method for full-slice pathological image

Country Status (1)

Country Link
CN (1) CN117036288A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831612A (en) * 2024-03-05 2024-04-05 安徽省立医院(中国科学技术大学附属第一医院) GIST targeting drug type selection prediction method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
Wan et al. Accurate segmentation of overlapping cells in cervical cytology with deep convolutional neural networks
CN109544518B (en) Method and system applied to bone maturity assessment
CN111462116A (en) Multimodal parameter model optimization fusion method based on imagery omics characteristics
Pan et al. Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
CN111767952B (en) Interpretable lung nodule benign and malignant classification method
CN113706434B (en) Post-processing method for chest enhancement CT image based on deep learning
CN113706487A (en) Multi-organ segmentation method based on self-supervision characteristic small sample learning
Li et al. Natural tongue physique identification using hybrid deep learning methods
CN111986189A (en) Multi-category pneumonia screening deep learning device based on CT images
CN113782184A (en) Cerebral apoplexy auxiliary evaluation system based on facial key point and feature pre-learning
CN115496720A (en) Gastrointestinal cancer pathological image segmentation method based on ViT mechanism model and related equipment
CN112233085A (en) Cervical cell image segmentation method based on pixel prediction enhancement
Sadeghibakhi et al. Multiple sclerosis lesions segmentation using attention-based CNNs in FLAIR images
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
He et al. Segmentation ability map: Interpret deep features for medical image segmentation
CN114332572B (en) Method for extracting breast lesion ultrasonic image multi-scale fusion characteristic parameters based on saliency map-guided hierarchical dense characteristic fusion network
CN117152433A (en) Medical image segmentation method based on multi-scale cross-layer attention fusion network
Panda et al. Glauconet: patch-based residual deep learning network for optic disc and cup segmentation towards glaucoma assessment
Tian et al. Radiomics and Its Clinical Application: Artificial Intelligence and Medical Big Data
Lv et al. An improved residual U-Net with morphological-based loss function for automatic liver segmentation in computed tomography
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
CN116228759B (en) Computer-aided diagnosis system and apparatus for renal cell carcinoma type
Tenali et al. Oral Cancer Detection using Deep Learning Techniques
CN117541844A (en) Weak supervision histopathology full-section image analysis method based on hypergraph learning
CN116468923A (en) Image strengthening method and device based on weighted resampling clustering instability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination