CN117876767A

CN117876767A - Metric-based medical endoscopic image classification algorithm combining subspace attention

Info

Publication number: CN117876767A
Application number: CN202410010192.8A
Authority: CN
Inventors: 金军; 罗以宁; 蒲蔚; 胡大裟
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2024-01-04
Filing date: 2024-01-04
Publication date: 2024-04-12

Abstract

The invention discloses a metric-based medical endoscopic image classification algorithm combining subspace attention. It addresses the following problem: various complex factors in the endoscopic imaging process introduce a large amount of noise into endoscopic images; the noise carried by the image features that a convolutional neural network extracts disturbs the spatial distribution of effective semantic features and weakens the model's ability to represent key semantic features, which reduces classification accuracy. The algorithm comprises the following steps: 1) training the model with a metric-based few-shot learning method; 2) dividing the features extracted by the higher layers of the network structure into a plurality of feature subspaces; 3) computing an attention map and the global correlation separately for each feature subspace, to suppress noise in the subspace and highlight the effective semantic regions it expresses. The proposed image classification method enables the model to attend to different key semantic parts of an endoscopic image, suppresses interfering features, and strengthens the expression of effective semantic features, thereby improving the classification accuracy of the model.

Description

Metric-based medical endoscopic image classification algorithm combining subspace attention

Technical Field

The invention relates to the field of computer-aided diagnosis based on medical images, and in particular to a metric-based medical endoscopic image classification algorithm incorporating subspace attention.

Background

Early detection and early diagnosis are important means of controlling digestive tract diseases and improving survival rates. Medical endoscopic screening is an effective way to detect early lesions of the digestive tract and can accurately locate lesion sites. In digestive endoscopy, a machine-learning-based computer-aided diagnosis system must accurately extract, analyze, and identify endoscopic image information in order to provide strong diagnostic support for doctors.

In the endoscopic imaging process, switching the light source causes uneven brightness and abnormal color changes in the image; low imaging resolution and movement of the endoscope blur the image to varying degrees; and digestive tract mucus, residues, and strong light reflections add further noise to the captured endoscopic image. Machine learning methods based on deep neural networks generally use a convolutional neural network to extract image features for classification, so the extracted endoscopic image features also contain a large amount of noise. These interfering signals inevitably disturb the spatial distribution of semantic features and make it difficult for the model to capture the key semantic parts of the image accurately, which weakens the semantic representation capability of the model and degrades classification performance.

Disclosure of Invention

Object of the invention: in view of the above problems, the invention proposes a metric-based medical endoscopic image classification algorithm that incorporates a subspace attention mechanism.

The technical scheme is as follows: a subspace attention mechanism is introduced into the last stage of the ResNet network structure to highlight the key semantic feature regions expressed by each subspace, so that the model focuses on effective feature information, interfering features are suppressed, and semantic information with stronger expressive power is obtained. The implementation steps are as follows:

Step 1: A large number of C-way K-shot few-shot classification tasks are created by sampling from a set of medical endoscopic images. Specifically, each time, C classes are randomly selected from the endoscopic image set, and K images are randomly sampled from each selected class to form a support set $I_s$. A number of images are then randomly sampled from the remaining images of the C classes to form a query set $I_q$. The support set and query set images together make up one few-shot classification task $T_i = \{(I_s, I_q), (y_s, y_q)\}$.
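The sampling procedure in step 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `sample_episode`, the toy data set of image identifiers, and the fixed number of query images per class (`n_query`) are assumptions, since the text only says that query images are drawn from the images left over in the C classes.

```python
import random

def sample_episode(images_by_class, c_way, k_shot, n_query, rng=random):
    """Sample one C-way K-shot task; returns (support, query) lists of (image, label)."""
    classes = rng.sample(sorted(images_by_class), c_way)  # randomly pick C classes
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw K support images plus n_query query images without overlap.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query

# Toy data set (hypothetical): 4 lesion classes with 10 image ids each.
dataset = {f"class_{c}": [f"img_{c}_{i}" for i in range(10)] for c in range(4)}
support, query = sample_episode(dataset, c_way=3, k_shot=2, n_query=4,
                                rng=random.Random(0))
```

Labels are re-indexed from 0 to C-1 inside each task, which is a common convention in episodic training and keeps the later softmax over class prototypes independent of the global class ids.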

Step 2: A ResNet network structure with the final fully connected layer removed is used as the feature extraction network $f_\theta$, and the C-way K-shot task $T_i$ established in step 1 is input into it. The lower-layer features near the front of the neural network contain more noise, whereas the higher-layer features near the back carry stronger semantic information and positional information. The subspace attention mechanism is therefore applied in the last (highest) stage of the feature extraction network structure to strengthen the model's ability to learn semantic features.

Specifically, the feature $F$ output by the last BatchNorm layer in each basic block or bottleneck of the last stage of the feature extraction network structure is divided along the channel dimension into g subspaces $[F^1, \dots, F^n, \dots, F^g]$. For each feature subspace $F^n$, two attention matrices are computed:

① $\mathrm{attn}(F^n) = \sigma(\mathrm{bn}(\mathrm{gap}(F^n) \odot F^n))$,

where σ denotes the sigmoid function, bn denotes a batch normalization layer, gap denotes the global average pooling function, and ⊙ denotes the dot product operation. $\mathrm{attn}(F^n)$ is the attention matrix generated from the similarity between the global feature descriptor and the local feature descriptors of the subspace. In the new feature subspace obtained by applying this attention, the spatial distribution of the features is improved: interfering features are suppressed, key semantic feature regions in the subspace are highlighted, and the semantic feature learning capacity of the feature subspace is increased.
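A NumPy sketch of attention ① on a single image may help make the formula concrete. Several points here are assumptions and not taken from the text: the dot product is read as a channel-wise inner product between the pooled global descriptor and each spatial position (giving one H×W map per subspace), bn is replaced by a plain standardization over spatial positions without learnable scale or shift, and the attention is applied by element-wise multiplication, because the formula for the new feature subspace is not reproduced in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_attention(sub, eps=1e-5):
    """sub: one feature subspace F^n of shape (C_sub, H, W)."""
    g = sub.mean(axis=(1, 2), keepdims=True)     # gap: global descriptor, (C_sub, 1, 1)
    sim = (g * sub).sum(axis=0)                  # dot product with each local descriptor, (H, W)
    sim = (sim - sim.mean()) / np.sqrt(sim.var() + eps)  # stand-in for bn (assumption)
    attn = sigmoid(sim)                          # attention matrix with values in (0, 1)
    return attn, sub * attn                      # assumed application: attn ⊙ F^n

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 7, 7))        # toy feature F with 8 channels
subspaces = np.split(feature, 4, axis=0)         # g = 4 subspaces along the channel axis
attn, new_sub = similarity_attention(subspaces[0])
```

Positions whose local descriptor aligns with the global descriptor receive weights close to 1, while dissimilar positions are scaled down, which matches the stated goal of suppressing interfering features.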

② $\mathrm{attn}(F^n) = \sigma(\mathrm{pconv}(\mathrm{mp}(\mathrm{dconv}(F^n))))$,

where σ denotes the sigmoid function, dconv denotes a depthwise convolution with a 1×1 kernel, mp denotes max pooling with a 3×3 window and a stride of 1, and pconv denotes a pointwise convolution with a 1×1 kernel; ⊙ denotes the dot product operation, and a pixel-wise addition is also used when the attention is applied. The attention matrix $\mathrm{attn}(F^n)$ represents the global correlation, captured across channels, of the fused semantic and spatial information. In the new feature subspace obtained by applying this attention, the distribution of the features is improved and local semantic regions are highlighted.
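Attention ② can be sketched in NumPy along the same lines. The following is an illustration under stated assumptions, not the patented implementation: a 1×1 depthwise convolution is written as a per-channel scale, the max pooling pads with negative infinity so that H and W are preserved, pconv maps the subspace to a single channel, and the attention is applied as attn ⊙ F^n plus a pixel-wise residual addition of F^n. The weights `dw_weight` and `pw_weight` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def max_pool_3x3(x):
    """3x3 max pooling with stride 1; -inf padding keeps H and W unchanged."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    windows = [padded[:, i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.max(windows, axis=0)

def cross_channel_attention(sub, dw_weight, pw_weight):
    """sub: (C_sub, H, W); dw_weight and pw_weight: (C_sub,)."""
    x = sub * dw_weight[:, None, None]       # dconv: 1x1 depthwise conv = per-channel scale
    x = max_pool_3x3(x)                      # mp: 3x3 window, stride 1
    x = np.tensordot(pw_weight, x, axes=1)   # pconv: 1x1 pointwise conv to one channel, (H, W)
    attn = sigmoid(x)
    return attn, sub * attn + sub            # assumed application: attn ⊙ F^n ⊕ F^n

rng = np.random.default_rng(0)
sub = rng.standard_normal((2, 7, 7))        # one toy subspace F^n
attn, new_sub = cross_channel_attention(
    sub, dw_weight=rng.standard_normal(2), pw_weight=rng.standard_normal(2))
```

The pointwise convolution is the only step that mixes channels, so it is where the cross-channel correlation mentioned in the text would be learned.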

Step 3: The 2g new feature subspaces obtained in step 2 are fused:

$F_{new} = \mathrm{conv1{\times}1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$

where concat denotes concatenation along the channel dimension and conv1×1 denotes a 1×1 convolution, which fuses the data of different channels and reduces the dimensionality. Each basic block or bottleneck finally outputs a feature $F_{new}$ enhanced by the subspace attention mechanism.
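The fusion in step 3 amounts to a channel concatenation followed by a linear map over channels at every spatial position. The sketch below assumes a single image, omits the convolution bias, and assumes that the 1×1 convolution reduces the 2C concatenated channels back to the C channels of the original feature F; the function name `fuse` and the random weights are illustrative only.

```python
import numpy as np

def fuse(new_1, new_2, weight):
    """new_1, new_2: lists of g arrays (C_sub, H, W); weight: (C_out, 2 * g * C_sub)."""
    stacked = np.concatenate(new_1 + new_2, axis=0)    # concat along channels
    return np.einsum("oc,chw->ohw", weight, stacked)   # 1x1 convolution without bias

rng = np.random.default_rng(0)
g, c_sub, h, w = 2, 2, 5, 5
new_1 = [rng.standard_normal((c_sub, h, w)) for _ in range(g)]  # subspaces after attention ①
new_2 = [rng.standard_normal((c_sub, h, w)) for _ in range(g)]  # subspaces after attention ②
weight = rng.standard_normal((g * c_sub, 2 * g * c_sub))        # 8 channels reduced to 4
f_new = fuse(new_1, new_2, weight)
```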

Step 4: From the image feature $Z$ finally output by the feature extraction network $f_\theta$ incorporating the subspace attention mechanism, the support set feature $Z_s$ and the query set feature $Z_q$ are separated, and the class prototypes of the support set features are computed:
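The prototype formula itself is not reproduced in this text. A common choice in metric-based few-shot learning, used in prototypical networks, is to take the mean of the support features of each class. The sketch below assumes that choice and assumes the features have already been flattened to vectors; `class_prototypes` is an illustrative name.

```python
import numpy as np

def class_prototypes(z_s, y_s, num_classes):
    """z_s: (N, D) support features; y_s: (N,) integer labels in [0, num_classes)."""
    # Assumption: each prototype is the mean support feature of its class.
    return np.stack([z_s[y_s == c].mean(axis=0) for c in range(num_classes)])

z_s = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]])
y_s = np.array([0, 0, 1, 1])
prototypes = class_prototypes(z_s, y_s, num_classes=2)  # [[1, 0], [5, 4]]
```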

Step 5: The distance between each query set feature and each class prototype is measured, and the probability of each lesion class for the query set image is predicted from this distance metric:

where $d(\cdot)$ denotes the Euclidean distance.
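The probability formula is likewise missing from this text. One standard way to turn Euclidean distances into class probabilities is a softmax over negative distances, so that a closer prototype yields a higher probability. The sketch below assumes that form; the toy prototypes and query features are made up for illustration.

```python
import numpy as np

def predict_proba(z_q, prototypes):
    """z_q: (M, D) query features; prototypes: (C, D). Returns (M, C) probabilities."""
    diff = z_q[:, None, :] - prototypes[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))            # Euclidean distance d
    logits = -dist                                      # closer prototype, larger score
    logits = logits - logits.max(axis=1, keepdims=True) # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)         # softmax over classes (assumption)

prototypes = np.array([[1.0, 0.0], [5.0, 4.0]])
z_q = np.array([[1.0, 0.0], [5.0, 4.0], [3.0, 2.0]])
probs = predict_proba(z_q, prototypes)
```

A query feature that lies exactly halfway between two prototypes, such as the third row above, receives equal probability for both classes.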

Step 6: The cross-entropy loss between the predicted classes of the query set and their actual classes is computed:

The network parameters of the model are then optimized by back-propagation.
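The loss formula is not reproduced in this text. Assuming the usual mean cross-entropy over the query images of one task, it can be sketched as below. The small constant `eps` is added only to avoid taking the logarithm of zero, and the gradient step is left out because in practice it would be handled by an automatic differentiation framework.

```python
import numpy as np

def cross_entropy(probs, y_q, eps=1e-12):
    """probs: (M, C) predicted class probabilities; y_q: (M,) true labels."""
    picked = probs[np.arange(len(y_q)), y_q]   # probability assigned to the true class
    return float(-np.log(picked + eps).mean())

probs = np.array([[0.5, 0.5], [0.25, 0.75]])
y_q = np.array([0, 1])
loss = cross_entropy(probs, y_q)
```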

The beneficial effects of the invention are as follows: (1) the high-level convolutional features of the network structure are divided into a plurality of feature subspaces, and an attention matrix is generated independently for each subspace, so that the model can attend to several different key semantic parts of the image; (2) two kinds of attention matrices are generated for each feature subspace, by learning cross-channel information and the similarity between global and local features, and are used to improve the spatial distribution of the features, which suppresses interfering features and highlights key semantic feature regions; (3) the proposed attention mechanism strengthens the semantic expression capability of the model without adding extra parameters and at low computational cost, and it can be embedded in other feature extraction networks to improve feature learning and classification accuracy.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention, together with the subspace attention mechanism.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention provides a metric-based medical endoscopic image classification algorithm incorporating a subspace attention mechanism, in which a subspace attention module is embedded in the feature extraction network. Specific embodiments are described in detail below:

The invention uses a ResNet network structure with the final fully connected layer removed as the feature extraction network, which projects an image into an embedding space to obtain the image features. ResNet is a deep residual network divided into 5 stages from the lowest layer to the highest. Apart from stage 0, the remaining 4 stages are composed of basic blocks or bottlenecks and have similar structures.

FIG. 1 shows the flow chart and the subspace attention module of an embodiment of the present invention, taking the ResNet-50 network structure as an example. Stage 1 to stage 4 of ResNet-50 contain 3, 4, 6, and 3 bottlenecks, respectively.

Step 1: A C-way K-shot task $T_i$ is established and input into the feature extraction network;

Step 2: The feature $F$ output by the last BatchNorm layer in each bottleneck of the highest stage, stage 4, is divided along the channel dimension into g subspaces $[F^1, \dots, F^n, \dots, F^g]$. For each feature subspace $F^n$, two attention matrices are computed:

① $\mathrm{attn}(F^n) = \sigma(\mathrm{bn}(\mathrm{gap}(F^n) \odot F^n))$,

where σ denotes the sigmoid function, bn denotes a batch normalization layer, gap denotes the global average pooling function, and ⊙ denotes the dot product operation. $\mathrm{attn}(F^n)$ is the first attention matrix generated, and applying it yields a new feature subspace.

② $\mathrm{attn}(F^n) = \sigma(\mathrm{pconv}(\mathrm{mp}(\mathrm{dconv}(F^n))))$,

where σ denotes the sigmoid function, dconv denotes a depthwise convolution with a 1×1 kernel, mp denotes max pooling with a 3×3 window and a stride of 1, and pconv denotes a pointwise convolution with a 1×1 kernel; ⊙ denotes the dot product operation, and a pixel-wise addition is also used when the attention is applied. $\mathrm{attn}(F^n)$ is the second attention matrix generated, and applying it yields a second new feature subspace.

Step 3: The 2g new feature subspaces obtained in step 2 are fused:

$F_{new} = \mathrm{conv1{\times}1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$

where $d(\cdot)$ denotes the Euclidean distance.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A metric-based medical endoscopic image classification algorithm incorporating subspace attention, comprising the steps of:

Step 1: sampling from a medical endoscopic image set to establish a large number of C-way K-shot few-shot classification tasks;

Step 2: inputting one C-way K-shot task from step 1 into a feature extraction network $f_\theta$, and applying a subspace attention mechanism in the last (highest) stage of the feature extraction network structure to strengthen the model's ability to learn semantic features;

Step 3: fusing the 2g new feature subspaces obtained in step 2 to obtain semantically enhanced features;

where $d(\cdot)$ denotes the Euclidean distance;

the network parameters of the model are optimized by back-propagation.

2. The method of claim 1, wherein the few-shot classification task established in step 1 is: $T_i = \{(I_s, I_q), (y_s, y_q)\}$.

3. The method of claim 1, wherein generating the subspace attention in step 2 comprises dividing the feature $F$ output by the last BatchNorm layer in each basic block or bottleneck of the last (highest) stage of the feature extraction network structure along the channel dimension into g subspaces $[F^1, \dots, F^n, \dots, F^g]$, and computing two attention matrices for each feature subspace $F^n$:

① $\mathrm{attn}(F^n) = \sigma(\mathrm{bn}(\mathrm{gap}(F^n) \odot F^n))$,

where σ denotes the sigmoid function, bn denotes a batch normalization layer, gap denotes the global average pooling function, and ⊙ denotes the dot product operation; $\mathrm{attn}(F^n)$ is the first attention matrix generated, and applying it yields a new feature subspace;

② $\mathrm{attn}(F^n) = \sigma(\mathrm{pconv}(\mathrm{mp}(\mathrm{dconv}(F^n))))$,

where σ denotes the sigmoid function, dconv denotes a depthwise convolution with a 1×1 kernel, mp denotes max pooling with a 3×3 window and a stride of 1, and pconv denotes a pointwise convolution with a 1×1 kernel; ⊙ denotes the dot product operation, and a pixel-wise addition is also used when the attention is applied; $\mathrm{attn}(F^n)$ is the second attention matrix generated, and applying it yields a second new feature subspace.

4. The method of claim 1, wherein fusing the new feature subspaces in step 3 comprises concatenating, along the channel dimension, all new feature subspaces to which the two attention mechanisms have respectively been applied, and fusing the cross-channel features and reducing their dimensionality with a 1×1 convolution:

$F_{new} = \mathrm{conv1{\times}1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$

where conv1×1 denotes a 1×1 convolution, concat denotes concatenation along the channel dimension, and $F_{new}$ denotes the resulting semantically enhanced feature.