CN114119585A - Method for identifying key feature enhanced gastric cancer image based on Transformer - Google Patents

Method for identifying key feature enhanced gastric cancer image based on Transformer

Info

Publication number
CN114119585A
CN114119585A
Authority
CN
China
Prior art keywords
gastric cancer
network
feature
image
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111457189.3A
Other languages
Chinese (zh)
Other versions
CN114119585B (en)
Inventor
李华锋
柴毅
唐凌峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202111457189.3A priority Critical patent/CN114119585B/en
Publication of CN114119585A publication Critical patent/CN114119585A/en
Application granted granted Critical
Publication of CN114119585B publication Critical patent/CN114119585B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10056 Microscopic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30092 Stomach; Gastric
    • G06T 2207/30096 Tumor; Lesion

Abstract

The invention relates to a Transformer-based key feature enhanced gastric cancer image identification method. In this method, a pre-trained YoloV5 network is used to screen out local lesion areas, and a cross-information Transformer network is then used to further enhance the key features of the region to be identified in the image to be classified. In the cross-information Transformer network, the characteristics of the lesion region in the image to be classified are enhanced by multi-head self-attention. The entire network is trained with a classification loss and a triplet loss. After training is finished, the test set images are input into the trained network model and the performance indexes of the network are evaluated. Compared with existing gastric cancer image identification methods, the lesion area detection mechanism can effectively screen key feature information and weaken the interference of invalid background information, while the cross-information Transformer network can fully enhance the feature representation of the lesion area information and improve gastric cancer image identification precision.

Description

Method for identifying key feature enhanced gastric cancer image based on Transformer
Technical Field
The invention relates to a Transformer-based key feature enhanced gastric cancer image identification method, and belongs to the field of image identification in computer vision.
Background
Gastric cancer is one of the most common cancers, and the number of deaths it causes worldwide each year is second only to lung cancer. In order to improve the accuracy and efficiency of gastric cancer detection, computer-aided methods for pathological image analysis have attracted increasing attention over the past decades. Identifying gastric cancer images is difficult because of the slight color differences of cells and the overlapping and uneven distribution of cells across different gastric cancer pathological images. At present, deep learning techniques are widely used in various computer vision fields and achieve the best performance in many applications such as image recognition. Some recent work applies deep learning to pathological image analysis; CNNs have been applied to the segmentation and classification of epithelial and stromal regions in histopathology images, cancer region detection, cancer image identification, and related tasks. The invention mainly focuses on the problem of clinical gastric cancer image identification. Manual pathological examination of stomach section pictures is time consuming, and inconsistent judgment criteria caused by observer variability often affect the accuracy of the determination. Most current methods are based on convolutional neural networks and achieve a certain level of performance. Recently, following the great success of Transformers on language tasks, researchers have been exploring how to apply them to computer vision tasks. The invention mainly studies the problem of applying the Transformer to the identification of clinical gastric cancer lesions.
Because of the slight color differences, overlapping, and uneven distribution of cells across different gastric cancer pathological images, how to effectively enhance the information of lesion region features and improve the network's ability to attend to salient, discriminative information is currently a key problem for improving network identification performance. To solve these problems, the invention provides a Transformer-based key feature-enhanced gastric cancer image identification method. Although convolution-based networks have translation invariance, a Transformer-based network design is better at integrating global information and is more robust to disturbance.
Disclosure of Invention
The invention provides a Transformer-based key feature enhanced gastric cancer image identification method, which is used to solve the problem of poor network identification robustness caused by large differences in appearance and distribution among different gastric cancer pathological images.
The technical scheme of the invention is as follows: a method for identifying a key feature enhanced gastric cancer image based on a Transformer comprises the following specific steps:
step1, collecting data sets of gastric cancer pictures and normal stomach pictures which are disclosed currently to form a data set;
step2, further annotating the gastric cancer pictures that already have category labels, wherein the annotated information comprises whether the picture contains a gastric cancer tumor cell lesion and the position of the lesion;
step3, performing data enhancement on the existing gastric cancer pathology picture to expand a data sample;
step4, loading the pre-trained weights of the YoloV5 network, and then fine-tuning the YoloV5 network with the gastric cancer image recognition data set;
step5, respectively extracting global features of the complete image and local features of the cropped image, inputting the global features and the local features into a Transformer network, and enhancing the characteristics of lesion areas in the image to be classified through multi-head self-attention; finally, adding a fully connected layer as a classifier for classification;
step6, training the whole network through cross entropy loss and triple loss on a training set;
step7, verifying whether the trained model meets the requirements by using the test set; in order to evaluate the model effect, the average classification accuracy ACA and the average accuracy AP over all test images are used as evaluation indexes.
As a further scheme of the invention, the data set adopted in Step1 comprises a BOT gastric slice data set and a seed cancer risk intelligent diagnosis data set, 80% of pictures are divided into a training set, and 20% of pictures are divided into a testing set.
As a further aspect of the present invention, the data enhancement methods used in Step3 include mirroring and rotation: 30% of the training set pictures are randomly selected for mirroring, 30% of the remaining pictures are randomly rotated clockwise by 90, 180 or 270 degrees, and the remaining pictures are left unchanged.
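A minimal sketch of this augmentation policy, assuming Pillow is used for the image operations; the file paths and output directory are hypothetical placeholders, not taken from the patent.

```python
# Hedged sketch of the Step3 augmentation: 30% of training images are mirrored,
# 30% of the remainder are rotated clockwise by a random multiple of 90 degrees,
# and the rest are kept unchanged. Paths and output directory are illustrative.
import os
import random
from PIL import Image

def augment_training_set(image_paths, out_dir="augmented"):
    os.makedirs(out_dir, exist_ok=True)
    random.shuffle(image_paths)
    n = len(image_paths)
    n_mirror = int(0.3 * n)                         # 30% mirrored
    n_rotate = int(0.3 * (n - n_mirror))            # 30% of the remainder rotated

    for i, path in enumerate(image_paths):
        img = Image.open(path)
        if i < n_mirror:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
        elif i < n_mirror + n_rotate:
            angle = random.choice([90, 180, 270])
            img = img.rotate(-angle, expand=True)   # negative angle = clockwise in PIL
        img.save(os.path.join(out_dir, f"{i:05d}.png"))
```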
In a further embodiment of the present invention, Step4 fine-tunes the YoloV5 network weights pre-trained on ImageNet to improve the network's detection of gastric cancer tumor lesions, and crops local pictures containing the lesion area from the original data set using the coordinates of the detection results.
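A sketch of this detection-and-cropping step, assuming the public ultralytics/yolov5 torch.hub interface; the fine-tuned weight file name is a hypothetical placeholder rather than the patent's actual artifact.

```python
# Hedged sketch of Step4: run a fine-tuned YoloV5 detector on a gastric image
# and crop the detected lesion regions. The weight path is an assumption.
import torch
from PIL import Image

detector = torch.hub.load("ultralytics/yolov5", "custom",
                          path="yolov5_gastric_lesion.pt")  # fine-tuned weights

def crop_lesion_regions(image_path):
    """Return local image patches cropped around detected lesion boxes."""
    image = Image.open(image_path)
    results = detector(image_path)
    crops = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        crops.append(image.crop((int(x1), int(y1), int(x2), int(y2))))
    return crops
```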
As a further scheme of the invention, the specific steps of Step5 are as follows:
step5.1, respectively extracting global features of the complete image and local features of the cropped image, and inputting the global features and the local features into a Transformer network;
step5.2, the Transformer network establishes a cross-information-flow relationship between the global feature and the local feature of the cropped image; this cross information flow identifies the cross-scale relationship between the local lesion feature and the global feature tokens, and through this relationship the features of the two scales are highly aligned and mutually coupled;
step5.3, the Transformer network processes the local lesion feature f_l and the global feature f_g separately, so that local and global features are extracted to the maximum extent;
step5.4, upsampling the local lesion feature f_l, concatenating it with the global feature f_g, and performing point-wise (1×1) convolution for channel-wise dual-scale information fusion to obtain the network output feature f_O;
step5.5, the output feature f_O is input into a classifier for classification, where the classifier is composed of two fully connected layers.
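A minimal PyTorch sketch of Steps 5.4 and 5.5 under assumed channel sizes and a binary class count; the module and parameter names are illustrative, not taken from the patent.

```python
# Hedged sketch of Steps 5.4-5.5: upsample the local lesion feature f_l,
# concatenate it with the global feature f_g, fuse the two scales with a
# point-wise (1x1) convolution, and classify with two fully connected layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualScaleFusionClassifier(nn.Module):
    def __init__(self, channels=256, num_classes=2):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)  # 1x1 channel fusion
        self.classifier = nn.Sequential(                               # two FC layers
            nn.Linear(channels, channels // 2),
            nn.ReLU(),
            nn.Linear(channels // 2, num_classes),
        )

    def forward(self, f_l, f_g):
        f_l = F.interpolate(f_l, size=f_g.shape[-2:], mode="bilinear",
                            align_corners=False)              # upsample local feature
        f_o = self.fuse(torch.cat([f_l, f_g], dim=1))          # dual-scale fusion -> f_O
        f_o = f_o.mean(dim=(-2, -1))                           # spatial average pooling
        return self.classifier(f_o)
```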
As a further aspect of the present invention, Step5 includes:
in the Transformer network, an image reshape with the size of H × W × C is formed into a 2-dimensional image block with the size of N × P2X is C; wherein, P2Is the size of the image in the spatial dimension, N ═ H × W/P2N is the number of image blocks, affecting the length of the input sequence; location embedding is added to the patch embedding to preserve location information; the Transformer encoder consists of a multi-head self-attention and multi-layer perceptron of a plurality of interaction layers, wherein the multi-layer perceptron comprises two GELU nonlinear layers; LayerNorm is applied before each block, while residual concatenation is applied after each block;
for a global feature f having a size of W H CgA 1 is to fgInto a sequence L of length Lg(ii) a For local lesion feature f with size W × H × ClA 1 is to flFlattened into a sequence L of length Ll(ii) a Through the operation, each vector in the sequence is regarded as a visual mark without space information, the convolution result is completely different, the dependency relationship between different mark pairs is independent of the space positions of the mark pairs in the feature map, and in order to mine the correlation relationship of local feature information in the global feature, the L is divided into L by adopting a full connection layerlMapping to a sequence L of length Lg_l
The global information is integrated, and the focus area characteristic coupling relation is modeled through an attention mechanism:
f_Q = W_Q × L_g, f_K = W_K × L_g_l, f_V = W_V × L_g
where f_Q, f_K, f_V are the inputs to the multi-head self-attention in the Transformer, and W_Q, W_K, W_V denote the matrices that generate the queries, keys, and values, respectively; by computing the similarity between f_Q and f_K, the attention weights of f_K over the different positions of f_Q are obtained; finally, the attention weights are applied to f_V, thereby obtaining the composite feature:
softmax(f_Q × f_K^T / √d) × f_V

where √d is used to normalize the features. Using the Transformer structure, the feature representation of the key lesion region within the global features is effectively enhanced; multi-head self-attention enhances the characteristics of the lesion region in the image to be classified and improves the network's ability to discriminate the lesion region.
As a further aspect of the present invention, the cross-entropy loss in Step6 is expressed as follows:
L_cls = - (1 / n_b) × Σ_{i=1}^{n_b} y_i · log( softmax( W_cls · f_O^i ) )

where W_cls represents the class classifier, n_b indicates the batch size, f_O^i is the output feature of the i-th image in the batch, and y_i is a one-hot vector in which only the element corresponding to the true class is 1;
In addition to optimizing the network with the cross-entropy loss, the triplet loss constrains the features of gastric cancer images of the same category to have high similarity and those of different categories to have low similarity; the specific triplet loss formula is as follows:

L_tri = (1 / n_2b) × Σ_{i=1}^{n_2b} max( ||f_i - f_i^p||_2 - ||f_i - f_i^n||_2 + m, 0 )

Since L_tri constrains intra-class and inter-class samples simultaneously, n_2b = 2 × n_b, i.e. n_b gastric cancer image samples and n_b non-gastric cancer image samples are all involved in computing the loss, where f_i represents one of the n_2b samples, f_i^p denotes the hard positive sample corresponding to f_i, f_i^n denotes the hard negative sample corresponding to f_i, and m is set to 0.3.
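A hedged sketch of this combined objective, assuming a batch-hard mining strategy for selecting the hard positive and negative samples; the function names are illustrative.

```python
# Sketch of the Step6 objective: cross-entropy on the classifier logits plus a
# batch-hard triplet loss with margin m = 0.3. The hard-example mining shown
# here is one common implementation choice, assumed rather than specified.
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(features, labels, margin=0.3):
    dist = torch.cdist(features, features)                        # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same.float()).max(dim=1).values         # farthest same-class sample
    hardest_neg = (dist + same.float() * 1e9).min(dim=1).values   # closest other-class sample
    return F.relu(hardest_pos - hardest_neg + margin).mean()

def total_loss(logits, features, labels, margin=0.3):
    return F.cross_entropy(logits, labels) + batch_hard_triplet_loss(features, labels, margin)
```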
The invention has the beneficial effects that:
(1) the cross-scale cross-information Transformer network can effectively enhance the information of the lesion area in a gastric cancer image, improves the identification precision of gastric cancer images, and helps to accurately identify gastric cancer tumor regions;
(2) the Transformer-based network design has the capability of integrating global information and is robust to disturbance.
Drawings
FIG. 1 is a general flow chart of the present invention;
Detailed Description
Example 1: as shown in fig. 1, a method for identifying a key feature-enhanced gastric cancer image based on a Transformer specifically comprises the following steps:
step1, collecting currently public data sets of gastric cancer pictures and normal stomach pictures to form a data set comprising the BOT gastric section data set and a seed cancer risk intelligent diagnosis data set, and dividing 80% of the pictures into a training set and 20% into a testing set.
Step2, manually annotating, with LabelImg software, the gastric cancer pictures that already have category labels in order to improve detection precision; the annotated information includes whether a picture contains a gastric cancer tumor cell lesion and the position of the lesion;
step3, the data set contains 4560 pictures; data enhancement is carried out on the existing gastric cancer pathology pictures to expand the data samples. The data enhancement methods used include mirroring and rotation: 30% of the training set pictures are randomly selected for mirroring, 30% of the remaining pictures are randomly rotated clockwise by 90, 180 or 270 degrees, and the remaining pictures are left unchanged.
Step4, loading the pre-trained weights of the YoloV5 network and then fine-tuning the YoloV5 network with the gastric cancer image recognition data set. Since the YoloV5 weights are pre-trained on ImageNet, the detection accuracy for lesion areas in gastric cancer images needs to be improved; therefore, the YoloV5 network is trained with a portion of the gastric cancer pictures whose lesion positions have been annotated, improving the network's ability to detect gastric cancer lesion positions.
In a further embodiment of the present invention, Step4 fine-tunes the YoloV5 network weights pre-trained on ImageNet to improve the network's detection of gastric cancer tumor lesions, and crops local pictures containing the lesion area from the original data set using the coordinates of the detection results.
Step5, cropping out a lesion area image according to the detected coordinates; respectively extracting global features of the complete image and local features of the cropped image, inputting the global features and the local features into a Transformer network, and enhancing the characteristics of lesion areas in the image to be classified through multi-head self-attention; finally, adding a fully connected layer as a classifier for classification;
in the Transformer network, an image of size H × W × C is reshaped into N two-dimensional image patches, each flattened to size P² · C, where P² is the spatial size of a patch and N = H × W / P² is the number of patches, which determines the length of the input sequence; a position embedding is added to the patch embedding to preserve location information; the Transformer encoder consists of alternating layers of multi-head self-attention and a multi-layer perceptron, where the multi-layer perceptron contains two layers with GELU non-linearities; LayerNorm is applied before each block, and a residual connection is applied after each block;
for the global feature f_g of size W × H × C, f_g is flattened into a sequence L_g of length L; for the local lesion feature f_l of size W × H × C, f_l is flattened into a sequence L_l. Through this operation, each vector in the sequence is regarded as a visual token without spatial information; unlike the result of a convolution, the dependency between different token pairs is independent of their spatial positions in the feature map. In order to mine the correlation of the local feature information within the global feature, a fully connected layer is adopted to map L_l to a sequence L_g_l of length L;
The global information is integrated, and the focus area characteristic coupling relation is modeled through an attention mechanism:
f_Q = W_Q × L_g, f_K = W_K × L_g_l, f_V = W_V × L_g
where f_Q, f_K, f_V are the inputs to the multi-head self-attention in the Transformer, and W_Q, W_K, W_V denote the matrices that generate the queries, keys, and values, respectively; by computing the similarity between f_Q and f_K, the attention weights of f_K over the different positions of f_Q are obtained; finally, the attention weights are applied to f_V, thereby obtaining the composite feature:
softmax(f_Q × f_K^T / √d) × f_V

where √d is used to normalize the features. Using the Transformer structure, the feature representation of the key lesion area within the global features is effectively enhanced; multi-head self-attention enhances the characteristics of the lesion area in the image to be classified and improves the network's ability to discriminate the lesion area, effectively relieving the reduction in discrimination capability caused by color differences, overlapping, and uneven distribution among different gastric cancer pathological images. In the Transformer network, by establishing a cross-information-flow relationship between the global feature and the local feature of the cropped image, the cross information flow can identify the cross-scale relationship between the local lesion feature and the global feature tokens; through this relationship, the features of the two scales are highly aligned and mutually coupled. In addition, the Transformer processes the feature maps of the local lesion features and the global features separately, so that local and global features are extracted to the maximum extent. After this, the local lesion feature f_l is upsampled, concatenated with the global feature f_g, and fused across the two scales channel-wise by point-wise (1×1) convolution to obtain the network output feature f_O. Finally, a fully connected layer is added as a classifier for classification.
As a further scheme of the invention, the specific steps of Step5 are as follows:
step5.1, respectively extracting global features of the complete image and local features of the cropped image, and inputting the global features and the local features into a Transformer network;
step5.2, the Transformer network establishes a cross-information-flow relationship between the global feature and the local feature of the cropped image; this cross information flow identifies the cross-scale relationship between the local lesion feature and the global feature tokens, and through this relationship the features of the two scales are highly aligned and mutually coupled;
step5.3, the Transformer network processes the local lesion feature f_l and the global feature f_g separately, so that local and global features are extracted to the maximum extent;
step5.4, upsampling the local lesion feature f_l, concatenating it with the global feature f_g, and performing point-wise (1×1) convolution for channel-wise dual-scale information fusion to obtain the network output feature f_O;
step5.5, the output feature f_O is input into a classifier for classification, where the classifier is composed of two fully connected layers.
Step6, training the whole network with the cross-entropy loss and the triplet loss on the training set. Specifically, the BOT gastric section dataset is used, which contains 560 gastric cancer sections and 140 normal sections. Sections were stained with hematoxylin-eosin at 20-fold magnification, and the resolution of the stomach slices is 2048 × 2048. The tumor area is annotated by the data provider. In order to expand the data set samples, a seed cancer risk intelligent diagnosis data set is added; this data set comprises 4000 samples, including positive and negative samples, where parts of the regions in positive samples contain gastric cancer lesions while negative samples contain no gastric cancer lesions. By integrating the samples of the two data sets, the data set used in the method contains 4560 pictures. In the experiments, 80% of the stomach sections (normal and cancer) were randomly selected for network training, while the remaining 20% were used for testing.
In order to extract robust, class-discriminative features, the network constrains its output feature f_O with the cross-entropy loss and the triplet loss.
As a further aspect of the present invention, the cross-entropy loss in Step6 is expressed as follows:
L_cls = - (1 / n_b) × Σ_{i=1}^{n_b} y_i · log( softmax( W_cls · f_O^i ) )

where W_cls represents the class classifier, n_b indicates the batch size, f_O^i is the output feature of the i-th image in the batch, and y_i is a one-hot vector in which only the element corresponding to the true class is 1;
In addition to optimizing the network with the cross-entropy loss, the triplet loss constrains the features of gastric cancer images of the same category to have high similarity and those of different categories to have low similarity; the specific triplet loss formula is as follows:

L_tri = (1 / n_2b) × Σ_{i=1}^{n_2b} max( ||f_i - f_i^p||_2 - ||f_i - f_i^n||_2 + m, 0 )

Since L_tri constrains intra-class and inter-class samples simultaneously, n_2b = 2 × n_b, i.e. n_b gastric cancer image samples and n_b non-gastric cancer image samples are all involved in computing the loss, where f_i represents one of the n_2b samples, f_i^p denotes the hard positive sample corresponding to f_i, f_i^n denotes the hard negative sample corresponding to f_i, and m is set to 0.3.
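A minimal training-step sketch under assumed interfaces (a model that takes the whole image and the cropped lesion patch and returns both the classifier logits and the output feature f_O, together with the combined loss described above); all names are illustrative.

```python
# Hedged sketch of one Step6 optimisation step over a mini-batch; the model
# and loss interfaces are assumptions, not the patent's exact implementation.
import torch

def train_step(model, optimizer, global_imgs, local_imgs, labels, loss_fn):
    model.train()
    logits, f_o = model(global_imgs, local_imgs)   # forward pass through the network
    loss = loss_fn(logits, f_o, labels)            # cross-entropy + triplet loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```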
Step7, verifying whether the trained model meets the requirements by using the test set. To evaluate the model, the Average Classification Accuracy (ACA) and average Accuracy (AP) over all test images are used as evaluation indexes. The average classification accuracy represents the overall rate of correctly classified test images. The average accuracy is computed as the number of truly positive samples among the predicted positive samples divided by the total number of predicted positive samples.
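A short sketch of these two evaluation indexes, assuming binary labels where 1 marks gastric cancer; the function name is illustrative.

```python
# Hedged sketch of the Step7 metrics: average classification accuracy (ACA)
# over all test images and the precision-style average accuracy (AP) defined
# above as true positives divided by predicted positives.
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    aca = float((y_true == y_pred).mean())          # overall correct-classification rate
    predicted_pos = np.sum(y_pred == 1)
    true_pos = np.sum((y_pred == 1) & (y_true == 1))
    ap = float(true_pos / predicted_pos) if predicted_pos else 0.0
    return aca, ap
```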
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (7)

1. A method for identifying key feature-enhanced gastric cancer images based on a Transformer, characterized by comprising the following specific steps:
step1, collecting data sets of gastric cancer pictures and normal stomach pictures which are disclosed currently to form a data set;
step2, further annotating the gastric cancer pictures that already have category labels, wherein the annotated information comprises whether the picture contains a gastric cancer tumor cell lesion and the position of the lesion;
step3, performing data enhancement on the existing gastric cancer pathology picture to expand a data sample;
step4, loading the pre-trained weights of the YoloV5 network, and then fine-tuning the YoloV5 network with the gastric cancer image recognition data set;
step5, respectively extracting global features of the complete image and local features of the cropped image, inputting the global features and the local features into a Transformer network, and enhancing the characteristics of lesion areas in the image to be classified through multi-head self-attention; finally, adding a fully connected layer as a classifier for classification;
step6, training the whole network through cross entropy loss and triple loss on a training set;
step7, verifying whether the trained model meets the requirements by using the test set; in order to evaluate the model effect, the average classification accuracy ACA and the average accuracy AP over all test images are used as evaluation indexes.
2. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: the data set adopted in Step1 comprises a BOT gastric slice data set and a seed cancer risk intelligent diagnosis data set, 80% of the pictures are divided into a training set, and 20% of the pictures are divided into a testing set.
3. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: the data enhancement methods used in Step3 include mirroring and rotation; 30% of the training set pictures are randomly selected for mirroring, 30% of the remaining pictures are randomly rotated clockwise by 90, 180 or 270 degrees, and the remaining pictures are left unchanged.
4. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: in Step4, the YoloV5 network weights pre-trained on ImageNet are fine-tuned to improve the network's detection of gastric cancer tumor lesions, and local pictures containing the lesion area are cropped from the original data set using the coordinates of the detection results.
5. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: the specific steps of Step5 are as follows:
step5.1, respectively extracting global features of the complete image and local features of the cropped image, and inputting the global features and the local features into a Transformer network;
step5.2, the Transformer network establishes a cross-information-flow relationship between the global feature and the local feature of the cropped image; this cross information flow identifies the cross-scale relationship between the local lesion feature and the global feature tokens, and through this relationship the features of the two scales are highly aligned and mutually coupled;
step5.3, the Transformer network processes the local lesion feature f_l and the global feature f_g separately, so that local and global features are extracted to the maximum extent;
step5.4, upsampling the local lesion feature f_l, concatenating it with the global feature f_g, and performing point-wise (1×1) convolution for channel-wise dual-scale information fusion to obtain the network output feature f_O;
step5.5, the output feature f_O is input into a classifier for classification, where the classifier is composed of two fully connected layers.
6. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: Step5 comprises the following steps:
in the Transformer network, an image of size H × W × C is reshaped into N two-dimensional image patches, each flattened to size P² · C, where P² is the spatial size of a patch and N = H × W / P² is the number of patches, which determines the length of the input sequence; a position embedding is added to the patch embedding to preserve location information; the Transformer encoder consists of alternating layers of multi-head self-attention and a multi-layer perceptron, where the multi-layer perceptron contains two layers with GELU non-linearities; LayerNorm is applied before each block, and a residual connection is applied after each block;
for the global feature f_g of size W × H × C, f_g is flattened into a sequence L_g of length L; for the local lesion feature f_l of size W × H × C, f_l is flattened into a sequence L_l. Through this operation, each vector in the sequence is regarded as a visual token without spatial information; unlike the result of a convolution, the dependency between different token pairs is independent of their spatial positions in the feature map. In order to mine the correlation of the local feature information within the global feature, a fully connected layer is adopted to map L_l to a sequence L_g_l of length L;
The global information is integrated, and the focus area characteristic coupling relation is modeled through an attention mechanism:
f_Q = W_Q × L_g, f_K = W_K × L_g_l, f_V = W_V × L_g
where f_Q, f_K, f_V are the inputs to the multi-head self-attention in the Transformer, and W_Q, W_K, W_V denote the matrices that generate the queries, keys, and values, respectively; by computing the similarity between f_Q and f_K, the attention weights of f_K over the different positions of f_Q are obtained; finally, the attention weights are applied to f_V, thereby obtaining the composite feature:
softmax(f_Q × f_K^T / √d) × f_V

where √d is used to normalize the features. Using the Transformer structure, the feature representation of the key lesion region within the global features is effectively enhanced; multi-head self-attention enhances the characteristics of the lesion region in the image to be classified and improves the network's ability to discriminate the lesion region.
7. The method for identifying Transformer-based key feature-enhanced gastric cancer images according to claim 1, wherein: the cross-entropy loss in Step6 is expressed as follows:
L_cls = - (1 / n_b) × Σ_{i=1}^{n_b} y_i · log( softmax( W_cls · f_O^i ) )

where W_cls represents the class classifier, n_b indicates the batch size, f_O^i is the output feature of the i-th image in the batch, and y_i is a one-hot vector in which only the element corresponding to the true class is 1;
In addition to optimizing the network with the cross-entropy loss, the triplet loss constrains the features of gastric cancer images of the same category to have high similarity and those of different categories to have low similarity; the specific triplet loss formula is as follows:

L_tri = (1 / n_2b) × Σ_{i=1}^{n_2b} max( ||f_i - f_i^p||_2 - ||f_i - f_i^n||_2 + m, 0 )

Since L_tri constrains intra-class and inter-class samples simultaneously, n_2b = 2 × n_b, i.e. n_b gastric cancer image samples and n_b non-gastric cancer image samples are all involved in computing the loss, where f_i represents one of the n_2b samples, f_i^p denotes the hard positive sample corresponding to f_i, f_i^n denotes the hard negative sample corresponding to f_i, and m is set to 0.3.
CN202111457189.3A 2021-12-01 2021-12-01 Method for identifying key feature enhanced gastric cancer image based on Transformer Active CN114119585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111457189.3A CN114119585B (en) 2021-12-01 2021-12-01 Method for identifying key feature enhanced gastric cancer image based on Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111457189.3A CN114119585B (en) 2021-12-01 2021-12-01 Method for identifying key feature enhanced gastric cancer image based on Transformer

Publications (2)

Publication Number Publication Date
CN114119585A true CN114119585A (en) 2022-03-01
CN114119585B CN114119585B (en) 2022-11-29

Family

ID=80369461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111457189.3A Active CN114119585B (en) 2021-12-01 2021-12-01 Method for identifying key feature enhanced gastric cancer image based on Transformer

Country Status (1)

Country Link
CN (1) CN114119585B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152232A (en) * 2023-04-17 2023-05-23 智慧眼科技股份有限公司 Pathological image detection method, pathological image detection device, computer equipment and storage medium
WO2023173599A1 (en) * 2022-03-14 2023-09-21 之江实验室 Method and apparatus for classifying fine-granularity images based on image block scoring

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network
WO2021120752A1 (en) * 2020-07-28 2021-06-24 平安科技(深圳)有限公司 Region-based self-adaptive model training method and device, image detection method and device, and apparatus and medium
CN113034500A (en) * 2021-05-25 2021-06-25 紫东信息科技(苏州)有限公司 Digestive tract endoscope picture focus identification system based on multi-channel structure
CN113269724A (en) * 2021-04-28 2021-08-17 西安交通大学 Fine-grained cancer subtype classification method
CN113378792A (en) * 2021-07-09 2021-09-10 合肥工业大学 Weak supervision cervical cell image analysis method fusing global and local information
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN113674253A (en) * 2021-08-25 2021-11-19 浙江财经大学 Rectal cancer CT image automatic segmentation method based on U-transducer

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network
WO2021120752A1 (en) * 2020-07-28 2021-06-24 平安科技(深圳)有限公司 Region-based self-adaptive model training method and device, image detection method and device, and apparatus and medium
CN113269724A (en) * 2021-04-28 2021-08-17 西安交通大学 Fine-grained cancer subtype classification method
CN113034500A (en) * 2021-05-25 2021-06-25 紫东信息科技(苏州)有限公司 Digestive tract endoscope picture focus identification system based on multi-channel structure
CN113378792A (en) * 2021-07-09 2021-09-10 合肥工业大学 Weak supervision cervical cell image analysis method fusing global and local information
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN113674253A (en) * 2021-08-25 2021-11-19 浙江财经大学 Rectal cancer CT image automatic segmentation method based on U-transducer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
路千惠: "Biomedical Named Entity Recognition Based on Local Feature Enhancement", China Master's Theses Full-text Database, Medicine and Health Sciences *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023173599A1 (en) * 2022-03-14 2023-09-21 之江实验室 Method and apparatus for classifying fine-granularity images based on image block scoring
CN116152232A (en) * 2023-04-17 2023-05-23 智慧眼科技股份有限公司 Pathological image detection method, pathological image detection device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114119585B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN107918780B (en) Garment type and attribute classification method based on key point detection
Shi et al. Optimal feature transport for cross-view image geo-localization
US11682192B2 (en) Deep-learning systems and methods for joint cell and region classification in biological images
CN110363122B (en) Cross-domain target detection method based on multi-layer feature alignment
CN110088804A (en) It is scored based on the computer of primary colors and immunohistochemistry image
WO2018023734A1 (en) Significance testing method for 3d image
CN103971123B (en) Hyperspectral image classification method based on linear regression Fisher discrimination dictionary learning (LRFDDL)
CN114119585B (en) Method for identifying key feature enhanced gastric cancer image based on Transformer
Su et al. Rock classification in petrographic thin section images based on concatenated convolutional neural networks
CN106203483B (en) A kind of zero sample image classification method based on semantic related multi-modal mapping method
CN110619352A (en) Typical infrared target classification method based on deep convolutional neural network
CN104820841B (en) Hyperspectral classification method based on low order mutual information and spectrum context waveband selection
CN110261329A (en) A kind of Minerals identification method based on full spectral coverage high-spectrum remote sensing data
CN111401426A (en) Small sample hyperspectral image classification method based on pseudo label learning
CN108776777A (en) The recognition methods of spatial relationship between a kind of remote sensing image object based on Faster RCNN
CN108229551A (en) A kind of Classification of hyperspectral remote sensing image method based on compact dictionary rarefaction representation
CN108985145A (en) The Opposite direction connection deep neural network model method of small size road traffic sign detection identification
CN104809471B (en) A kind of high spectrum image residual error integrated classification method based on spatial spectral information
CN115761757A (en) Multi-mode text page classification method based on decoupling feature guidance
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN115546553A (en) Zero sample classification method based on dynamic feature extraction and attribute correction
Tang et al. An improved mineral image recognition method based on deep learning
Wang et al. Classification and extent determination of rock slope using deep learning
CN109241315A (en) A kind of fast face search method based on deep learning
CN116664932A (en) Colorectal cancer pathological tissue image classification method based on active learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant