CN114067159A - EUS-based fine-granularity classification method for submucosal tumors - Google Patents

EUS-based fine-granularity classification method for submucosal tumors

Info

Publication number
CN114067159A
Authority
CN
China
Prior art keywords
eus
image
encoder
training
attention network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111375187.XA
Other languages
Chinese (zh)
Inventor
郑杭彬 (Zheng Hangbin)
鲍劲松 (Bao Jinsong)
刘天元 (Liu Tianyuan)
汪俊亮 (Wang Junliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN202111375187.XA
Publication of CN114067159A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention relates to an EUS-based fine-grained classification method for submucosal tumors, comprising the following steps: establishing a multi-scale relation attention network; training the relation attention network in two stages; and inputting EUS images acquired in real time into the trained relation attention network, which outputs the corresponding class labels. Compared with the prior art, the method effectively improves the accuracy of EUS-based submucosal tumor recognition, reduces the dependence on object-level data labels during EUS recognition, and effectively identifies the submucosal tumor region in an EUS image.

Description

EUS-based fine-granularity classification method for submucosal tumors
Technical Field
The invention relates to a method for the online detection of submucosal tumors, and belongs to the technical field of fine-grained image classification.
Background
Gastrointestinal submucosal tumors (SMTs) are lesions arising from any layer beneath the mucosa of the gastrointestinal tract. The five common types of gastrointestinal SMTs are gastrointestinal stromal tumor, ectopic pancreas, lipoma, leiomyoma, and neuroendocrine tumor. Since all of them appear under endoscopy as elevated lesions covered by normal mucosa, their layer of origin and their nature cannot be determined by conventional endoscopy. Therefore, after an elevated lesion with normal overlying mucosa is found under white-light endoscopy, other imaging means are required to further determine the tumor type.
Among these, EUS is one of the most accurate imaging methods for assessing gastrointestinal SMTs. During EUS imaging, the transducer forms an ultrasound image of the tissue by transmitting short pulses of ultrasound energy into the tissue and receiving the reflected signals, helping the physician further assess the nature of a submucosal tumor. However, EUS image content is complex and variable: the gastrointestinal wall usually appears as a ring structure of one to five layers, with the layering disrupted in the tumor region. Clinicians typically make a diagnosis by observing the layer of origin of the lesion in the region of interest and its appearance under the ultrasound endoscope (e.g., shape and echo patterns such as hyperechoic or hypoechoic).
With continuing advances in science and technology, computer technology and medicine have become ever more closely connected, and computer vision is increasingly applied in the medical field; cloud-based image reading and contactless drug delivery by robots have shown that such technologies provide real assistance. In clinical practice, however, manual diagnosis of SMTs from EUS has a steep learning curve: manual interpretation requires years of practice before it accumulates into empirical knowledge. Computer-aided diagnosis can reduce the cost of learning EUS and improve clinicians' working efficiency, and therefore has significant research value.
However, the recognition of SMTs from EUS faces several challenges. First, EUS images are usually accompanied by heavy speckle noise and, in some cases, artifacts caused by the difficulty and instability of the ultrasound imaging procedure; this high noise and strong interference make it hard to extract useful information. Second, EUS images reflect the layered structure of the gastrointestinal wall, which is used to determine the layer of origin of a lesion and thereby further characterize it. However, the layers in EUS imaging are often dense, their boundaries unclear, and their shapes irregular and inconspicuous; these highly variable layer characteristics further increase the difficulty of extracting layer-of-origin information. Finally, EUS images carry quite complex semantics, including the tumor shape of the lesion region, the echo pattern, and the tumor's layer of origin; these semantics and the coupling relations among them all contribute, to varying degrees, to determining the tumor type, but they further increase the difficulty of vision-based SMT recognition.
Disclosure of Invention
The purpose of the invention is to classify SMTs using only image-level labels.
In order to achieve the above aim, the technical scheme of the invention provides an EUS-based fine-grained classification method for submucosal tumors, characterized by comprising the following steps:
step 1, establishing a multi-scale relation attention network, wherein the relation attention network comprises an encoder and a relation attention module; the encoder extracts features of an input EUS image at multiple scales, the relation attention module obtains an attention map of those features, the attention map is multiplied element-wise with the features extracted by the encoder to enhance the region of interest in the spatial domain and suppress irrelevant regions, and a category label is output based on the region of interest, the categories being different predefined tumor types;
step 2, training the model in two stages: first training a feature extractor for tumor classification with a self-supervised pre-training method based on image restoration, then attaching the multi-scale relation attention network downstream and training the model further until convergence, specifically comprising the following steps:
step 201, obtaining EUS image samples of different categories;
step 202, forming the EUS image samples of different categories obtained in step 201 into an EUS image training sample set;
step 203, processing each EUS image sample in the EUS image training sample set with an EUS adaptive occlusion algorithm to obtain an occluded image sample and the corresponding occluded region, wherein the EUS adaptive occlusion algorithm exploits the imaging characteristics of EUS to adaptively occlude part of the information on the layered structure of the EUS image, providing the task for self-supervised learning;
step 204, taking the occluded image sample as input and the corresponding occluded region as label, and training a model structure adopting a Context Encoder, thereby completing the first training stage, wherein the model structure comprises an encoder, a decoder and a discriminator implemented with convolutional neural network structures, completes the occluded region of the input occluded image sample, and outputs a complete EUS image sample;
step 205, assigning the weights of the trained model structure's encoder to the encoder of the relation attention network, this encoder being the feature extractor;
step 206, taking the EUS image samples obtained in step 201 as input and the corresponding categories as labels, and training the relation attention network whose encoder weights have been updated, thereby completing the second training stage;
step 3, inputting EUS images acquired in real time into the trained relation attention network, which outputs the corresponding class labels.
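For illustration only, the following is a minimal sketch of how step 3 might look in code. It assumes a PyTorch implementation; the network object net, the 224 × 224 input size, and the preprocessing pipeline are assumptions, not details fixed by the invention.

```python
import torch
from torchvision import transforms

# Hypothetical preprocessing; resolution and pipeline are assumptions.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
])

@torch.no_grad()
def classify_frame(net, frame_rgb):
    """Return the predicted tumor-class index for one real-time EUS frame."""
    net.eval()
    logits = net(preprocess(frame_rgb).unsqueeze(0))  # shape (1, 5): five classes
    return logits.argmax(dim=1).item()
```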
Preferably, the tumor categories include gastrointestinal stromal tumors, ectopic pancreas, neuroendocrine tumors, lipomas, and leiomyomas.
Preferably, the EUS adaptive occlusion algorithm comprises the following steps:
step 2031, applying histogram equalization to the EUS image sample and then a thresholding operation, so that all pixels of the equalized image with values in [0.7, 0.8] are set to 1, all pixels with values in [0.85, 0.99] are set to 2, and the remaining pixels are set to 0, yielding a thresholded image;
step 2032, applying an erosion operation to the thresholded image so that the contours of regions close to each other become connected while the contours of regions far from each other remain separate;
step 2033, searching the image obtained in the previous step for closed curves, computing the area enclosed by each closed curve, sorting these areas, and selecting the three closed curves with the largest enclosed areas, which are taken as the outer contour curves of the image's layered structure;
step 2034, sampling points inside the outer contour curves, using the sampled points as center coordinates for generating masks, generating masks from these centers, and occluding the EUS image samples with the masks, thereby obtaining the occluded image samples and the corresponding occluded regions.
Preferably, in step 2034, when sampling points inside an outer contour curve, any point inside the curve may be selected as a center coordinate point.
Preferably, in step 2034, the mask is a square mask.
Compared with the prior art, the method effectively improves the accuracy of EUS-based submucosal tumor recognition, reduces the dependence on object-level data labels during EUS recognition, and effectively identifies the submucosal tumor region in an EUS image.
Drawings
FIG. 1 is an overall framework diagram of the fine-grained classification method for submucosal tumors under EUS according to the present invention;
FIG. 2 is a schematic diagram of the self-supervised pre-training process for fine-grained classification of submucosal tumors under EUS according to the present invention;
FIG. 3 is a schematic diagram of the multi-scale relation attention network suitable for fine-grained classification of submucosal tumors under EUS according to the present invention;
FIG. 4 is a schematic diagram of the relation attention mechanism for fine-grained classification of submucosal tumors under EUS according to the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the appended claims.
The invention provides an EUS-based fine-grained classification method for submucosal tumors, comprising the following steps:
Step 1, establishing a hierarchical-structure perception module based on self-supervised learning, which learns the layered-structure information of an image with only image-level labels via the auxiliary task of restoring layered image regions, comprising the following steps:
Step 101, collecting EUS image samples of five categories and establishing an EUS image training sample set. The five categories are gastrointestinal stromal tumor, ectopic pancreas, neuroendocrine tumor, lipoma, and leiomyoma.
Step 102, for each EUS image sample in the training sample set, occluding the layered region with the EUS adaptive occlusion algorithm (LPM) to obtain occluded image samples and the corresponding occluded regions, which serve as the inputs and labels for training a CNN model structure adopting a Context Encoder.
The EUS adaptive occlusion algorithm exploits the imaging characteristics of EUS to adaptively occlude part of the information on the layered structure, providing the auxiliary task for self-supervised learning. It specifically comprises the following steps:
Step 1021, processing the EUS image sample with the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm together with a thresholding operation, so that all pixels of the equalized image with values in [0.7, 0.8] are set to 1, all pixels with values in [0.85, 0.99] are set to 2, and the remaining pixels are set to 0, yielding an image Y.
Step 1022, applying an erosion operation to image Y to fuse regions of image Y that are close to each other, and searching the eroded image for closed curves, each closed curve corresponding to one contour, thereby forming a contour set {C};
Step 1023, computing the area enclosed by each contour in the contour set {C} and selecting the three contours with the largest areas as the candidate regions for applying masks.
Step 1024, sampling any point in a candidate region as the center coordinate of a square mask and generating a mask of size 56 × 56. In the mask image, the pixels of the masked region have value 0 and all remaining pixels have value 1. The mask is applied to the original image by element-wise multiplication to obtain the occluded image and the corresponding mask region. The occluded image is the occluded image sample, and the mask region is the corresponding occluded region. A code sketch of steps 1021 to 1024 follows.
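For concreteness, here is a minimal sketch of steps 1021 to 1024 using OpenCV and NumPy. The CLAHE parameters, the erosion kernel size, and the interior-point sampling strategy are assumptions not fixed by the embodiment; the thresholds are interpreted on the [0, 1]-normalized equalized image as stated above.

```python
import cv2
import numpy as np

def eus_adaptive_occlusion(img_gray, mask_size=56, seed=None):
    """Sketch of the LPM occlusion (steps 1021-1024). img_gray is a
    single-channel uint8 EUS image; parameter values are assumptions."""
    rng = np.random.default_rng(seed)

    # Step 1021: CLAHE, then threshold the [0, 1]-normalized result.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    eq = clahe.apply(img_gray).astype(np.float32) / 255.0
    t = np.zeros_like(img_gray, dtype=np.uint8)
    t[(eq >= 0.70) & (eq <= 0.80)] = 1
    t[(eq >= 0.85) & (eq <= 0.99)] = 2

    # Step 1022: erosion fuses nearby regions, then find closed contours.
    eroded = cv2.erode(t, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours((eroded > 0).astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Step 1023: keep the three largest contours as candidate regions.
    candidates = sorted(contours, key=cv2.contourArea, reverse=True)[:3]

    # Step 1024: pick an interior point as the center of a 56x56 square mask.
    contour = candidates[rng.integers(len(candidates))]
    bx, by, bw, bh = cv2.boundingRect(contour)
    cx, cy = bx + bw // 2, by + bh // 2           # fallback: bounding-box center
    for _ in range(100):                          # rejection-sample an interior point
        px, py = rng.integers(bx, bx + bw), rng.integers(by, by + bh)
        if cv2.pointPolygonTest(contour, (float(px), float(py)), False) > 0:
            cx, cy = px, py
            break
    mask = np.ones_like(img_gray, dtype=np.float32)
    half = mask_size // 2
    mask[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half] = 0.0

    # Element-wise multiplication occludes the image; 1 - mask marks the region.
    occluded = (img_gray.astype(np.float32) * mask).astype(np.uint8)
    return occluded, (1.0 - mask)
```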
Step 103, taking the occluded image samples as input and the corresponding occluded regions as labels, a model structure adopting a Context Encoder is trained. The model structure comprises an encoder, a decoder and a discriminator implemented with convolutional neural network structures, and it restores the occluded region of an occluded image sample, so that the encoder learns the layered-structure information of the image during restoration. A sketch of this pre-training stage is given below.
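The following is a compact, hedged sketch of this first training stage in PyTorch, in the spirit of Pathak et al.'s Context Encoders. The layer configuration, optimizer setup, and the 0.999 loss weighting are illustrative assumptions; the embodiment only fixes the encoder/decoder/discriminator structure and the restoration objective.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Assumed small CNN encoder; the patent does not fix layer sizes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Sigmoid())
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x)

def pretrain_step(enc, dec, disc, opt_g, opt_d, occluded, target, lam=0.999):
    """One stage-1 step: restore the occluded input. `target` is simplified
    here to the full ground-truth image; `lam` balances the two losses."""
    bce = nn.BCEWithLogitsLoss()
    restored = dec(enc(occluded))

    # Discriminator: real images vs. restored ones.
    opt_d.zero_grad()
    d_loss = bce(disc(target), torch.ones(target.size(0), 1)) + \
             bce(disc(restored.detach()), torch.zeros(target.size(0), 1))
    d_loss.backward()
    opt_d.step()

    # Generator: L2 reconstruction plus adversarial term.
    opt_g.zero_grad()
    g_loss = lam * nn.functional.mse_loss(restored, target) + \
             (1 - lam) * bce(disc(restored), torch.ones(target.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return g_loss.item()
```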
Step 2, establishing a multi-scale relation attention network, which comprises a relation attention module and an encoder sharing weights with the encoder in the hierarchical-structure perception module. After the model structure adopting the Context Encoder completes training in step 1, the weights of the encoder in the hierarchical-structure perception module are shared with the encoder of the multi-scale relation attention network, so that the latter's weights are updated before training, as in the snippet below.
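Continuing the sketch above, the weight hand-off reduces to copying the pre-trained encoder state; relation_net.encoder is an assumed attribute name.

```python
# Stage 2 starts from the stage-1 encoder weights (step 205 / step 2).
relation_net.encoder.load_state_dict(enc.state_dict())
```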
By modeling the relations among spatial semantics at different scales, the multi-scale relation attention network lets the model learn the different importance of these relations for tumor recognition, thereby fully mining the complex semantic information in EUS images.
A multi-scale relation attention network comprises an encoder and a relation attention module. The encoder is implemented on a given convolutional neural network architecture, and the network extracts the features of different scales output by the encoder. Taking VGG-16 as an example, the CNN features after each max-pooling layer are extracted as inputs to the relation attention module (see the sketch below). Let a batch contain B samples; a feature can then be expressed as (B, C, W, H), where C is the number of channels, W the width of the feature map, and H its height.
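As an illustration of how the per-scale features could be tapped, here is a sketch using torchvision's VGG-16; placing a hook after every max-pooling layer follows the description above, while everything else (weights, input size) is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(weights=None).features   # VGG-16 backbone as the example encoder
scale_features = []

def grab(module, inputs, output):
    scale_features.append(output)      # one (B, C, W, H) tensor per scale

# Register a forward hook on every max-pooling layer.
for layer in model:
    if isinstance(layer, nn.MaxPool2d):
        layer.register_forward_hook(grab)

# A batch of B = 4 images; grayscale EUS frames would be replicated to 3 channels.
x = torch.randn(4, 3, 224, 224)
_ = model(x)
print([f.shape for f in scale_features])  # five scales for VGG-16
```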
The relation attention module processes the input features (B, C, W, H) with the following steps:
Step 201, for an input CNN feature (B, C, W, H) at a given scale, first obtaining a (B, 1, W, H) feature map along the channel dimension with an average pooling operation;
Step 202, sampling pixel pairs within 3 × 3 spatial neighborhoods of the feature map obtained in step 201, and aggregating the two pixels of each pair with a convolution kernel to obtain the pair's relation representation. Such pixel-pair relations express spatial semantic relations of different granularity at different scales: at a low-level scale, the relation is between points; at a high-level scale, it is a relation between abstract semantics.
Step 203, for the relation representations of pixel pairs at different positions within the same spatial neighborhood, learning the weights of the different relation representations. In this embodiment, relation representations are generated for n pixel pairs (here n = 36, corresponding to all distinct pairs that the nine pixels of a 3 × 3 neighborhood can form), and the attention weights corresponding to the different relation representations are learned. This process can be expressed as the following function:
p' = \sum_{k=1}^{n} w_k \, g_k(p_k^{1}, p_k^{2})

where p' is the aggregation result over the 3 × 3 spatial neighborhood, w_k is the learnable weight of the k-th relation representation, p_k^{1} and p_k^{2} are the two pixel points of the k-th pixel pair, and g_k is the aggregation convolution kernel for the k-th pixel pair.
Step 204, repeating steps 201 to 203 C times over the different channel dimensions, i.e., performing relation attention learning once per channel, to finally obtain a spatial attention map containing the semantic-relation knowledge of the C channels.
Step 205, refining the spatial attention maps of the C channels with channel attention, and superimposing all refined maps to obtain the attention map S of the current input features (B, C, W, H).
Step 206, multiplying the attention map S element-wise with the input features (B, C, W, H) to enhance the region of interest in the spatial domain and suppress irrelevant regions.
Step 207, performing classification based on the enhanced region of interest and finally outputting the class label. A sketch of the whole module is given below.
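To make steps 201 to 207 concrete, the following is a simplified sketch of the relation attention module. The per-channel treatment, the two-tap pair kernels, the softmax over pair weights, and the sigmoid channel gate are assumptions where the embodiment leaves the details open (e.g., exactly how the C per-channel maps are refined and superimposed).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from itertools import combinations

class RelationAttention(nn.Module):
    """Sketch of the pixel-pair relation attention of steps 201-207."""
    def __init__(self, channels, n_pairs=36):
        super().__init__()
        # One learnable 2-tap aggregation kernel g_k per pixel pair (step 202).
        self.pair_kernels = nn.Parameter(torch.randn(n_pairs, 2) * 0.1)
        # One attention weight w_k per relation representation (step 203).
        self.pair_weights = nn.Parameter(torch.zeros(n_pairs))
        # Simple channel attention to refine the C spatial maps (step 205).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # Indices of the 36 unordered pixel pairs inside a 3x3 window.
        self.register_buffer(
            "pair_idx", torch.tensor(list(combinations(range(9), 2))))  # (36, 2)

    def forward(self, x):                         # x: (B, C, W, H)
        B, C, W, H = x.shape
        # Treat each channel as its own single-channel map (steps 201/204).
        maps = x.reshape(B * C, 1, W, H)
        # Extract every 3x3 neighbourhood: (B*C, 9, W*H).
        patches = F.unfold(maps, kernel_size=3, padding=1)
        p1 = patches[:, self.pair_idx[:, 0], :]   # (B*C, 36, W*H)
        p2 = patches[:, self.pair_idx[:, 1], :]
        # g_k aggregates the two pixels of the k-th pair (step 202).
        rel = (self.pair_kernels[:, 0].view(1, -1, 1) * p1 +
               self.pair_kernels[:, 1].view(1, -1, 1) * p2)
        # Attention-weighted sum over the 36 relation representations (step 203).
        w = torch.softmax(self.pair_weights, dim=0).view(1, -1, 1)
        p_prime = (w * rel).sum(dim=1)             # (B*C, W*H)
        attn = p_prime.view(B, C, W, H)
        # Channel attention refines and recombines the C maps (step 205).
        attn = torch.sigmoid(attn) * self.channel_gate(x)
        # Element-wise enhancement of the input features (step 206).
        return x * attn
```

In a full model, one such module would be applied to the features of each scale, with the enhanced features pooled and fed to a classification head to output the class label (step 207).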

Claims (5)

1. An EUS-based fine-grained classification method for submucosal tumors, characterized by comprising the following steps:
step 1, establishing a multi-scale relation attention network, wherein the relation attention network comprises an encoder and a relation attention module; the encoder extracts features of an input EUS image at multiple scales, the relation attention module obtains an attention map of those features, the attention map is multiplied element-wise with the features extracted by the encoder to enhance the region of interest in the spatial domain and suppress irrelevant regions, and a category label is output based on the region of interest, the categories being different predefined tumor types;
step 2, training the model in two stages: first training a feature extractor for tumor classification with a self-supervised pre-training method based on image restoration, then attaching the multi-scale relation attention network downstream and training the model further until convergence, specifically comprising the following steps:
step 201, obtaining EUS image samples of different categories;
step 202, forming the EUS image samples of different categories obtained in step 201 into an EUS image training sample set;
step 203, processing each EUS image sample in the EUS image training sample set with an EUS adaptive occlusion algorithm to obtain an occluded image sample and the corresponding occluded region, wherein the EUS adaptive occlusion algorithm exploits the imaging characteristics of EUS to adaptively occlude part of the information on the layered structure of the EUS image, providing the task for self-supervised learning;
step 204, taking the occluded image sample as input and the corresponding occluded region as label, and training a model structure adopting a Context Encoder, thereby completing the first training stage, wherein the model structure comprises an encoder, a decoder and a discriminator implemented with convolutional neural network structures, completes the occluded region of the input occluded image sample, and outputs a complete EUS image sample;
step 205, assigning the weights of the trained model structure's encoder to the encoder of the relation attention network, this encoder being the feature extractor;
step 206, taking the EUS image samples obtained in step 201 as input and the corresponding categories as labels, and training the relation attention network whose encoder weights have been updated, thereby completing the second training stage;
step 3, inputting EUS images acquired in real time into the trained relation attention network, which outputs the corresponding class labels.
2. The method of claim 1, wherein the tumor categories comprise gastrointestinal stromal tumors, ectopic pancreas, neuroendocrine tumors, lipomas, and leiomyomas.
3. The EUS-based fine-grained classification method for submucosal tumors according to claim 1, wherein the EUS adaptive occlusion algorithm comprises the following steps:
step 2031, applying histogram equalization to the EUS image sample and then a thresholding operation, so that all pixels of the equalized image with values in [0.7, 0.8] are set to 1, all pixels with values in [0.85, 0.99] are set to 2, and the remaining pixels are set to 0, yielding a thresholded image;
step 2032, applying an erosion operation to the thresholded image so that the contours of regions close to each other become connected while the contours of regions far from each other remain separate;
step 2033, searching the image obtained in the previous step for closed curves, computing the area enclosed by each closed curve, sorting these areas, and selecting the three closed curves with the largest enclosed areas, which are taken as the outer contour curves of the image's layered structure;
step 2034, sampling points inside the outer contour curves, using the sampled points as center coordinates for generating masks, generating masks from these centers, and occluding the EUS image samples with the masks, thereby obtaining the occluded image samples and the corresponding occluded regions.
4. The method of claim 3, wherein in step 2034, when sampling points inside an outer contour curve, any point inside the curve is selected as a center coordinate point.
5. The method of claim 3, wherein in step 2034, the mask is a square mask.
CN202111375187.XA 2021-11-19 2021-11-19 EUS-based fine-granularity classification method for submucosal tumors Pending CN114067159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111375187.XA CN114067159A (en) 2021-11-19 2021-11-19 EUS-based fine-granularity classification method for submucosal tumors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111375187.XA CN114067159A (en) 2021-11-19 2021-11-19 EUS-based fine-granularity classification method for submucosal tumors

Publications (1)

Publication Number Publication Date
CN114067159A 2022-02-18

Family

ID=80278552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111375187.XA Pending CN114067159A (en) 2021-11-19 2021-11-19 EUS-based fine-granularity classification method for submucosal tumors

Country Status (1)

Country Link
CN (1) CN114067159A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114587416A (en) * 2022-03-10 2022-06-07 山东大学齐鲁医院 Gastrointestinal tract submucosal tumor diagnosis system based on deep learning multi-target detection


Similar Documents

Publication Publication Date Title
CN111476292B (en) Small sample element learning training method for medical image classification processing artificial intelligence
CN108492272B (en) Cardiovascular vulnerable plaque identification method and system based on attention model and multitask neural network
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN106529447B (en) Method for identifying face of thumbnail
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
Pei et al. Does haze removal help cnn-based image classification?
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN107194418B (en) Rice aphid detection method based on antagonistic characteristic learning
CN109902584A (en) A kind of recognition methods, device, equipment and the storage medium of mask defect
CN110619352A (en) Typical infrared target classification method based on deep convolutional neural network
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN111968124B (en) Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN108229434A (en) A kind of vehicle identification and the method for careful reconstruct
Hang Thyroid nodule classification in ultrasound images by fusion of conventional features and res-GAN deep features
Zhao et al. Research on detection method for the leakage of underwater pipeline by YOLOv3
CN113256572B (en) Gastroscope image analysis system, method and equipment based on restoration and selective enhancement
CN114067159A (en) EUS-based fine-granularity classification method for submucosal tumors
CN113538342A (en) Convolutional neural network-based quality detection method for coating of aluminum aerosol can
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image
CN112417961A (en) Sea surface target detection method based on scene prior knowledge
Ruiz et al. Weakly supervised polyp segmentation from an attention receptive field mechanism
CN110705570A (en) Image feature identification method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination