CN117876767A - Metric-based medical endoscopic image classification algorithm combining subspace attention - Google Patents
- Publication number
- CN117876767A (application CN202410010192.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- subspace
- attention
- new
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a metric-based medical endoscopic image classification algorithm combining subspace attention. It addresses the following problem: complex factors in the endoscopic imaging process introduce a large amount of noise into endoscopic images, the noise carried by image features extracted with a convolutional neural network disturbs the spatial distribution of the effective semantic features, and the model's ability to represent key semantic features is weakened, which lowers classification accuracy. The algorithm comprises: 1) training the model with a metric-based few-shot learning method; 2) dividing the features extracted by the higher layers of the network structure into several feature subspaces; 3) computing an attention distribution and a global correlation separately for each feature subspace, to suppress noise in the subspace and highlight the effective semantic regions it expresses. The proposed image classification method lets the model attend to different key semantic parts of an endoscopic image, suppresses interfering features, and strengthens the expression of effective semantic features, thereby improving the classification accuracy of the model.
Description
Technical Field
The invention relates to the field of computer-aided diagnosis based on medical images, and in particular to a metric-based medical endoscopic image classification algorithm incorporating subspace attention.
Background
Early detection and early diagnosis are important means of controlling digestive tract diseases and improving survival rates. Medical endoscopic screening is an effective way to detect early lesions of the digestive tract and can accurately locate lesion sites. In digestive endoscopy, a machine-learning-based auxiliary diagnosis system must accurately extract endoscopic image information, analyze and identify it, and provide strong diagnostic support for doctors.
During endoscopic imaging, switching of the light source causes uneven brightness and abnormal color changes in the image; low imaging resolution and movement of the endoscope blur the image to varying degrees; and digestive tract mucus, residues, and strong light reflections introduce considerable noise into the resulting endoscopic image. Machine learning methods based on deep neural networks generally use a convolutional neural network to extract image features for classification, so the extracted endoscopic image features also contain a lot of noise. These interference signals inevitably disturb the spatial distribution of the semantic features and make it hard for the model to capture the key semantic parts of the image accurately, which weakens the model's semantic representation capability and degrades classification performance.
Disclosure of Invention
Object of the invention: in view of the above problems, the present invention proposes a metric-based medical endoscopic image classification algorithm that incorporates a subspace attention mechanism.
Technical solution: a subspace attention mechanism is introduced into the last stage of the ResNet network structure to highlight the key semantic feature regions expressed by each subspace, so that the model focuses on effective feature information, suppresses interfering features, and obtains semantic information with stronger expressive power. The implementation steps are as follows:
Step 1: Sample from the medical endoscopic image set to build a large number of C-way K-shot few-shot classification tasks. Specifically, each time, randomly select C classes from the endoscopic image set and randomly sample K images from each selected class to form the support set $I_s$; then randomly sample some of the remaining images of these C classes to form the query set $I_q$. The support set and query set images together make up one few-shot classification task $T_i = \{(I_s, I_q), (y_s, y_q)\}$.
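The sampling in Step 1 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the function name `sample_episode`, the toy data set, and the parameter `n_query` (query images per class) are assumptions, since the text only states that the query set is drawn from the images left over in the selected classes.

```python
import random

def sample_episode(images_by_class, c_way, k_shot, n_query, rng=random):
    """Sample one C-way K-shot task T_i = {(I_s, I_q), (y_s, y_q)}.

    images_by_class maps a class name to a list of image identifiers.
    n_query is an assumed parameter (query images drawn per class).
    """
    classes = rng.sample(sorted(images_by_class), c_way)
    support, query, y_support, y_query = [], [], [], []
    for episode_label, cls in enumerate(classes):
        # Draw support and query images together so they never overlap.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += picked[:k_shot]
        query += picked[k_shot:]
        y_support += [episode_label] * k_shot
        y_query += [episode_label] * n_query
    return (support, query), (y_support, y_query)


# Toy data set: 5 hypothetical lesion classes with 20 image ids each.
toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(5)}
(I_s, I_q), (y_s, y_q) = sample_episode(
    toy, c_way=3, k_shot=5, n_query=4, rng=random.Random(0)
)
```

Labels are re-indexed from 0 to C-1 inside each task, which is a common convention in episodic training because only the distinction between the C sampled classes matters within one task.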
Step 2: Take the ResNet network structure with the last fully connected layer removed as the feature extraction network $f_\theta$, and input a C-way K-shot task $T_i$ built in Step 1 into it. Low-level features from the front of the neural network contain more noise, whereas high-level features from the back carry stronger semantic and positional information. The subspace attention mechanism is therefore applied in the last, highest-level stage of the feature extraction network structure to strengthen the model's ability to learn semantic features.
Specifically, the feature $F$ output by the last BatchNorm layer in each basic block or bottleneck of the last stage of the feature extraction network structure is divided along the channel dimension into $g$ subspaces $[F_1, \dots, F_n, \dots, F_g]$, and two attention matrices are computed for each feature subspace $F_n$:
① $\mathrm{attn}(F_n) = \sigma(\mathrm{bn}(\mathrm{gap}(F_n) \odot F_n))$,
where $\sigma$ denotes the sigmoid function, bn denotes a batch normalization layer, gap denotes the global average pooling function, and $\odot$ denotes the dot product operation. $\mathrm{attn}(F_n)$ is the attention matrix generated from the similarity between the global feature descriptor and the local feature descriptors of the subspace. In the new feature subspace with this attention mechanism applied [symbol missing from the source text], the spatial distribution of the features is enhanced, interfering features are suppressed, the key semantic feature regions in the subspace are highlighted, and the semantic feature learning capability of the feature subspace is improved.
② $\mathrm{attn}(F_n) = \sigma(\mathrm{pconv}(\mathrm{mp}(\mathrm{dconv}(F_n))))$,
where $\sigma$ denotes the sigmoid function, dconv denotes a depthwise convolution with a 1×1 kernel, mp denotes max pooling with a 3×3 window and a stride of 1, pconv denotes a pointwise convolution with a 1×1 kernel, $\odot$ denotes the dot product operation, and a further operator, whose symbol is missing from the source text, denotes pixel-wise addition. The attention matrix $\mathrm{attn}(F_n)$ represents the global correlation captured across channels, fusing semantic and spatial information. In the new feature subspace with this attention mechanism applied [symbol missing from the source text], the distribution of the features is improved and local semantic regions are highlighted.
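The two attention matrices of Step 2 can be sketched in NumPy for a single image. This is an illustrative reading under stated assumptions, not the patent's implementation: the dot product in ① is taken across channels between the pooled descriptor and each spatial position; bn is replaced by a plain standardisation over positions with no learnable parameters; the max pooling pads by 1 to preserve the spatial size; pconv maps to a single output channel; and `g = 4`, the tensor sizes, and the random weights are hypothetical. The formula that applies each map to $F_n$ is missing from the text, so the sketch stops at the attention maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_attention(F_n, eps=1e-5):
    """Attention 1: sigmoid(bn(gap(F_n) . F_n)) for a subspace of shape (c, H, W)."""
    g_desc = F_n.mean(axis=(1, 2))                 # gap: global descriptor, shape (c,)
    sim = np.einsum("c,chw->hw", g_desc, F_n)      # dot product with every local descriptor
    # Stand-in for bn: standardise over positions, no learnable scale or shift.
    sim = (sim - sim.mean()) / np.sqrt(sim.var() + eps)
    return sigmoid(sim)                            # shape (H, W)

def max_pool_3x3(x):
    """3x3 max pooling with stride 1; padding of 1 (assumed) keeps H and W."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    windows = [padded[:, i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.max(windows, axis=0)

def cross_channel_attention(F_n, dw_weight, pw_weight):
    """Attention 2: sigmoid(pconv(mp(dconv(F_n)))) for a subspace of shape (c, H, W)."""
    x = dw_weight[:, None, None] * F_n             # dconv: 1x1 depthwise = one scale per channel
    x = max_pool_3x3(x)                            # mp
    x = np.einsum("c,chw->hw", pw_weight, x)       # pconv: 1x1 pointwise, one output channel (assumed)
    return sigmoid(x)                              # shape (H, W)

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 6, 6))                     # toy feature map: 8 channels, 6x6 positions
g = 4                                              # number of subspaces (assumed value)
subspaces = np.split(F, g, axis=0)                 # [F_1, ..., F_g], each of shape (2, 6, 6)
dw = rng.normal(size=(g, 2))                       # hypothetical learned weights per subspace
pw = rng.normal(size=(g, 2))

attn_1 = [similarity_attention(F_n) for F_n in subspaces]
attn_2 = [cross_channel_attention(F_n, dw[n], pw[n]) for n, F_n in enumerate(subspaces)]
```

Because each subspace gets its own maps, different channel groups are free to emphasise different spatial regions, which is the property the text relies on for attending to several key semantic parts at once.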
Step 3: Fuse the 2g new feature subspaces obtained in Step 2:
$F_{new} = \mathrm{conv}_{1\times1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$
where concat denotes concatenation along the channel dimension and $\mathrm{conv}_{1\times1}$ denotes a 1×1 convolution, which fuses the data of the different channels and reduces the dimensionality. Each basic block or bottleneck finally outputs the feature $F_{new}$ enhanced by the subspace attention mechanism.
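The fusion step can be illustrated with a small NumPy sketch. A 1×1 convolution is a linear map over channels applied independently at every pixel, so it is written here as a single `einsum`. The shapes, the random inputs, and the choice to reduce 2C channels back to C are assumptions; the text only says that the convolution fuses channels and reduces the dimension.

```python
import numpy as np

def fuse(F_new_1, F_new_2, weight):
    """F_new = conv1x1(concat(F_new_1, F_new_2)); weight has shape (C_out, 2 * C)."""
    stacked = np.concatenate([F_new_1, F_new_2], axis=0)   # concat along channels: (2C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked)       # 1x1 conv: mix channels per pixel

rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
F_new_1 = rng.normal(size=(C, H, W))     # toy stand-in for the g subspaces from attention 1
F_new_2 = rng.normal(size=(C, H, W))     # toy stand-in for the g subspaces from attention 2
weight = rng.normal(size=(C, 2 * C))     # hypothetical learned kernel, 2C -> C channels
F_new = fuse(F_new_1, F_new_2, weight)   # shape (C, H, W)
```

Under this assumed sizing the output has the same shape as the original feature $F$, so the enhanced feature could replace $F$ without changing the layers that follow.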
Step 4: From the image features $Z$ finally output by the feature extraction network $f_\theta$ with the subspace attention mechanism, separate the support set features $Z_s$ and the query set features $Z_q$, and compute the class prototypes of the support set features: [formula missing from the source text]
Step 5: Measure the distance between the query set features and each class prototype, and predict from this distance the probability of each lesion class for the query set image: [formula missing from the source text]
where $d(\cdot)$ denotes the Euclidean distance.
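The formulas for Steps 4 and 5 are not reproduced in the text. The sketch below assumes the standard prototypical-network formulation: each class prototype is the mean of that class's support features, and the class probabilities are a softmax over negative Euclidean distances to the prototypes. The function names and toy 2-D features are illustrative only.

```python
import math

def class_prototypes(z_support, y_support):
    """Mean support feature per class (assumed prototype definition)."""
    sums, counts = {}, {}
    for z, y in zip(z_support, y_support):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_probabilities(z_query, prototypes):
    """Softmax over negative Euclidean distances to each prototype (assumed form)."""
    logits = {y: -euclidean(z_query, c) for y, c in prototypes.items()}
    top = max(logits.values())                       # subtract the max for numerical stability
    exp = {y: math.exp(v - top) for y, v in logits.items()}
    total = sum(exp.values())
    return {y: e / total for y, e in exp.items()}

z_s = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]   # toy 2-D support features
y_s = [0, 0, 1, 1]
protos = class_prototypes(z_s, y_s)                  # class 0 -> [0.0, 1.0], class 1 -> [5.0, 4.0]
probs = class_probabilities([1.0, 1.0], protos)      # query is 1 away from class 0, 5 from class 1
```

In the toy example the query lies much closer to the class 0 prototype, so class 0 receives most of the probability mass.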
Step 6: Compute the cross-entropy loss between the predicted classes of the query set and their actual classes: [formula missing from the source text]
The network parameters of the model are then optimized by backpropagation.
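As a hedged illustration of Step 6, the cross-entropy loss over one task can be written as the mean negative log-probability that the model assigns to the true class of each query image. Averaging over the query set is an assumption (the loss formula is missing from the text), and the probabilities below are toy values.

```python
import math

def episode_cross_entropy(predicted_probs, y_query):
    """Mean negative log-probability of the true class over the query set."""
    losses = [-math.log(p[y]) for p, y in zip(predicted_probs, y_query)]
    return sum(losses) / len(losses)

# Toy predictions for two query images over classes {0, 1}.
predicted = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
loss = episode_cross_entropy(predicted, [0, 1])
```

The loss approaches zero as the probability of the correct class approaches 1, so minimizing it pulls query features toward their own class prototype and away from the others.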
The beneficial effects of the invention are as follows: (1) the high-level convolutional features of the network structure are divided into several feature subspaces and an attention matrix is generated independently for each subspace, so the model can attend to several different key semantic parts of the image; (2) two kinds of attention matrices are generated for each feature subspace by learning cross-channel information and the similarity between global and local features, and they are used to improve the spatial distribution of the features, which suppresses interfering features and highlights the key semantic feature regions; (3) the proposed attention mechanism strengthens the semantic expression capability of the model without adding extra parameters and at low computational cost, and it can be embedded into other feature extraction networks to improve feature learning and classification accuracy.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention, together with the subspace attention mechanism.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a metric-based medical endoscopic image classification algorithm combined with a subspace attention mechanism, in which a subspace attention module is embedded in the feature extraction network. Specific embodiments are described in detail below:
The invention uses the ResNet network structure with the last fully connected layer removed as the feature extraction network, which projects an image into an embedding space to obtain its image features. ResNet is a deep residual network divided into 5 stages from the low level to the high level. Apart from stage 0, the remaining 4 stages are composed of basic blocks or bottlenecks and have similar structures.
FIG. 1 shows the flow chart of an embodiment of the present invention and the subspace attention module, using the ResNet-50 network structure as an example. Stages 1 to 4 of ResNet-50 contain 3, 4, 6, and 3 bottlenecks, respectively.
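To make the placement concrete, the short sketch below tabulates the ResNet-50 stages and works out how many attention modules the embodiment would need and how wide each subspace is. The bottleneck counts come from the text; the output widths (256, 512, 1024, 2048) are the standard ResNet-50 values; `g = 4` and the helper name `subspace_plan` are assumptions, since the text does not fix the number of subspaces.

```python
# stage -> (number of bottlenecks, output channels of each bottleneck)
# Bottleneck counts are from the text; widths are the standard ResNet-50 values.
RESNET50_STAGES = {1: (3, 256), 2: (4, 512), 3: (6, 1024), 4: (3, 2048)}

def subspace_plan(stage, g):
    """Return (attention modules in the stage, channels per subspace) for g subspaces."""
    n_blocks, channels = RESNET50_STAGES[stage]
    if channels % g:
        raise ValueError("g must divide the channel count evenly")
    return n_blocks, channels // g

# g = 4 is an assumed value chosen only for illustration.
n_modules, channels_per_subspace = subspace_plan(stage=4, g=4)
```

With these assumed values, stage 4 would hold one attention module per bottleneck (three in total), each splitting a 2048-channel feature into four 512-channel subspaces.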
Step 1: a C-way K-shot task T is established i Inputting a feature extraction network;
step 2: dividing the feature F output by the last BatchNorm layer in each bottleneck of the highest layer stage 4 into g subspaces [ F ] along the channel dimension 1 ,…,F n ,…,F g ]For each featureSubspace F n Two attention matrices are calculated:
①attn(F n )=σ(bn(gap(F n )⊙F n )),
wherein σ represents a sigmoid function, bn represents a batch normalization layer, gap represents a global average pooling function, and radix represents a dot product operation, attn (F n ) Representing the first attention matrix generated, resulting in a new feature subspace with applied attention mechanisms
②attn(F n )=σ(pconv(mp(dconv(F n )))),
Wherein σ represents a sigmoid function, dconv represents a convolution kernel of 1×1 depth convolution operation, mp represents a maximum pooling function with a window size of 3×3 and stride of 1, pconv represents a point convolution operation with a convolution kernel of 1×1, as indicated by the dot product operation,represents a pixel-wise addition operation, attn (F n ) Representing the generated second attention matrix, a new feature subspace is obtained, to which attention mechanisms are applied>
Step 3: Fuse the 2g new feature subspaces obtained in Step 2:
$F_{new} = \mathrm{conv}_{1\times1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$
where concat denotes concatenation along the channel dimension and $\mathrm{conv}_{1\times1}$ denotes a 1×1 convolution, which fuses the data of the different channels and reduces the dimensionality. Each basic block or bottleneck finally outputs the feature $F_{new}$ enhanced by the subspace attention mechanism.
Step 4: From the image features $Z$ finally output by the feature extraction network $f_\theta$ with the subspace attention mechanism, separate the support set features $Z_s$ and the query set features $Z_q$, and compute the class prototypes of the support set features: [formula missing from the source text]
Step 5: Measure the distance between the query set features and each class prototype, and predict from this distance the probability of each lesion class for the query set image: [formula missing from the source text]
where $d(\cdot)$ denotes the Euclidean distance.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. A metric-based medical endoscopic image classification algorithm incorporating subspace attention, comprising the following steps:
Step 1: sampling from a medical endoscopic image set to build a large number of C-way K-shot few-shot classification tasks;
Step 2: inputting one C-way K-shot task from step 1 into a feature extraction network $f_\theta$, and applying a subspace attention mechanism in the last, highest-level stage of the feature extraction network structure to strengthen the model's ability to learn semantic features;
Step 3: fusing the 2g new feature subspaces obtained in step 2 to obtain semantically enhanced features;
Step 4: from the image features $Z$ finally output by the feature extraction network $f_\theta$ with the subspace attention mechanism, separating the support set features $Z_s$ and the query set features $Z_q$, and computing the class prototypes of the support set features: [formula missing from the source text]
Step 5: measuring the distance between the query set features and each class prototype, and predicting from this distance the probability of each lesion class for the query set image: [formula missing from the source text]
where $d(\cdot)$ denotes the Euclidean distance;
Step 6: computing the cross-entropy loss between the predicted classes of the query set and their actual classes: [formula missing from the source text]
and optimizing the network parameters of the model by backpropagation.
2. The method of claim 1, wherein the few-shot classification task built in step 1 comprises: $T_i = \{(I_s, I_q), (y_s, y_q)\}$.
3. The generation of subspace attention in step 2 of claim 1, wherein the feature $F$ output by the last BatchNorm layer in each basic block or bottleneck of the last, highest-level stage of the feature extraction network structure is divided along the channel dimension into $g$ subspaces $[F_1, \dots, F_n, \dots, F_g]$, and two attention matrices are computed for each feature subspace $F_n$:
① $\mathrm{attn}(F_n) = \sigma(\mathrm{bn}(\mathrm{gap}(F_n) \odot F_n))$,
where $\sigma$ denotes the sigmoid function, bn denotes a batch normalization layer, gap denotes the global average pooling function, and $\odot$ denotes the dot product operation; $\mathrm{attn}(F_n)$ is the first attention matrix generated, and [symbol missing from the source text] denotes the new feature subspace with the attention mechanism applied;
② $\mathrm{attn}(F_n) = \sigma(\mathrm{pconv}(\mathrm{mp}(\mathrm{dconv}(F_n))))$,
where $\sigma$ denotes the sigmoid function, dconv denotes a depthwise convolution with a 1×1 kernel, mp denotes max pooling with a 3×3 window and a stride of 1, pconv denotes a pointwise convolution with a 1×1 kernel, $\odot$ denotes the dot product operation, and a further operator, whose symbol is missing from the source text, denotes pixel-wise addition; $\mathrm{attn}(F_n)$ is the second attention matrix generated, and [symbol missing from the source text] denotes the new feature subspace with the attention mechanism applied.
4. The method of claim 1, wherein the fusing of new feature subspaces in step 3 is characterized in that all new feature subspaces to which the two attention mechanisms have respectively been applied are concatenated along the channel dimension, and the cross-channel features are fused and reduced in dimension by a 1×1 convolution:
$F_{new} = \mathrm{conv}_{1\times1}(\mathrm{concat}(F_{new\_1}, F_{new\_2}))$
where $\mathrm{conv}_{1\times1}$ denotes a 1×1 convolution, concat denotes concatenation along the channel dimension, and $F_{new}$ denotes the resulting semantically enhanced feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410010192.8A CN117876767A (en) | 2024-01-04 | 2024-01-04 | Metric-based medical endoscopic image classification algorithm combining subspace attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410010192.8A CN117876767A (en) | 2024-01-04 | 2024-01-04 | Metric-based medical endoscopic image classification algorithm combining subspace attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117876767A true CN117876767A (en) | 2024-04-12 |
Family
ID=90594190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410010192.8A Pending CN117876767A (en) | 2024-01-04 | 2024-01-04 | Metric-based medical endoscopic image classification algorithm combining subspace attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117876767A (en) |
-
2024
- 2024-01-04 CN CN202410010192.8A patent/CN117876767A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |