CN116386034A - Cervical cell classification method based on multiscale attention feature enhancement - Google Patents
- Publication number
- CN116386034A (application number CN202310131751.6A)
- Authority
- CN
- China
- Prior art keywords
- attention
- pyramid
- scale
- features
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/698—Matching; Classification (microscopic objects, e.g. biological cells or cellular parts)
- G06N3/084—Backpropagation, e.g. using gradient descent (neural network learning methods)
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/40—Extraction of image or video features
- G06V10/764—Recognition using pattern recognition or machine learning; Classification, e.g. of video objects
- G06V10/806—Fusion, i.e. combining data from various sources, of extracted features
- G06V10/82—Recognition using neural networks
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a cervical cell classification method based on multi-scale attention feature enhancement, comprising the following steps: S1: extract multi-scale features with a deep convolutional neural network; S2: construct a multi-scale feature pyramid using the feature fusion method of the FPN; S3: compute spatial attention and channel attention for each layer of the pyramid features, generating a spatial attention pyramid and channel attention vectors; S4: enhance the spatial attention with masks obtained by threshold segmentation of the spatial attention pyramid; S5: apply the enhanced spatial attention pyramid and the per-layer channel attention vectors to weight the multi-scale pyramid features of S2, yielding a multi-scale attention feature pyramid; S6: build a classifier for each layer of the multi-scale attention feature pyramid; S7: train the whole network end to end with gradient descent optimization and perform classification prediction on cervical cells. The invention provides a cervical cell classification model with higher classification accuracy.
Description
Technical Field
The invention relates to the intersection of deep learning and medicine, in particular to a cervical cell classification method based on multi-scale attention feature enhancement, and belongs to applications of deep learning in the medical field.
Background
Cervical cancer is one of the most common gynaecological malignancies, with extremely high morbidity and mortality. The peak age of onset is 30-35 years for carcinoma in situ and 45-55 years for invasive carcinoma, and in recent years onset has tended toward younger ages. Clinical practice shows that effective cervical cell screening allows cervical cancer and precancerous lesions to be found and treated earlier, markedly reducing the incidence and mortality of cervical cancer.
Cervical cells can be classified into the following 5 classes according to the stage of cervical precancerous lesions: NORMAL, ASC-US (atypical squamous cells of undetermined significance), ASC-H (atypical squamous cells, cannot exclude a high-grade squamous intraepithelial lesion), LSIL (low-grade squamous intraepithelial lesion), and HSIL (high-grade squamous intraepithelial lesion). These 5 cell types are broadly similar in appearance, but as the disease progresses their pathological characteristics change to varying degrees; for example, abnormal cells often divide uncontrollably, producing heteromorphic nuclei. In clinical diagnosis, cervical cell screening is therefore typically performed by a cytologist who observes the pathological features of the nuclei in cervical cell smears under a microscope to judge the degree of cytopathy. However, manual cervical cytology screening demands highly specialized and experienced cytologists and is time- and labor-consuming. With the rapid development of machine learning methods, particularly deep learning, more and more researchers are using computer-aided diagnosis (CAD) technology to compensate for the shortcomings of manual cytology screening.
Many automatic cervical cell classification methods have been proposed, but they do not directly address two major difficulties of the task. First, the differences between cervical cell classes appear mainly in the nucleus, so the classification network needs to focus on the nuclear region. Second, cervical cell classification is a fine-grained image classification task: similarity between different classes is high, while inconsistent staining conditions or differences in the personal preferences of smear producers can cause large variation within the same class, which makes the task more challenging.
Disclosure of Invention
Aiming at these technical problems, the invention discloses a cervical cell classification method based on multi-scale attention feature enhancement. A multi-scale feature pyramid with bidirectional feature transfer is built to extract features containing both low-level details and high-level semantics; several joint classifiers built on these richer features can more effectively mine the fine-grained discriminative features among cervical cell classes; the multi-scale attention is then enhanced with attention masks generated by threshold segmentation, finally achieving more accurate cervical cell classification.
The technical scheme provided by the invention is as follows:
a cervical cell classification method based on multi-scale attention feature enhancement comprises the following steps:
s1: extracting multi-scale features from the cervical cell image using a deep convolutional neural network comprising a plurality of blocks;
s2: constructing a multi-scale feature pyramid according to a feature fusion method in the FPN;
s3: calculating spatial attention and channel attention for each layer of features in the multi-scale pyramid features by using APN, and generating a spatial attention pyramid and channel attention vectors of each layer;
s4: masks obtained by threshold segmentation of the spatial attention pyramid are used to enhance the spatial attention;
s5: the enhanced spatial attention pyramid and the attention vectors of the channels of each layer are used for carrying out attention weighting on the multi-scale pyramid features in the S2, so that a multi-scale attention feature pyramid for classification is obtained;
s6: respectively constructing a classifier for each layer of features in the multi-scale attention feature pyramid;
s7: and performing end-to-end gradient descent optimization training on the whole network by using three loss functions, and performing classification prediction on cervical cells by using a trained classification network.
Further, the deep convolutional neural network in step S1 is the backbone of any general classification network, divided into n blocks, each block outputting a feature map of a different scale. Preferred classification networks include ResNet, DenseNet, VGG, etc.
Further, in step S2, the feature map output by a deeper block is upsampled to the same size as the feature map output by the shallower block and then added to it, realizing top-down fusion of features at different scales; a multi-scale feature pyramid is constructed from the features obtained by fusing each block's output with the deeper features.
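As an illustrative sketch only (not part of the claimed method), the top-down fusion of step S2 can be expressed in NumPy; the nearest-neighbour upsampling, the equal channel counts, and the function names are assumptions made for brevity (a real FPN also applies 1x1 lateral convolutions):

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def top_down_fusion(block_outputs):
    """Fuse block outputs top-down, FPN-style.

    block_outputs: list of (C, H, W) arrays, shallow to deep, where each
    deeper map is half the spatial size of the previous one and all maps
    share the same channel count. Returns the pyramid levels, shallow to deep.
    """
    pyramid = [block_outputs[-1]]  # the deepest level passes through unchanged
    for shallow in reversed(block_outputs[:-1]):
        # add the upsampled deeper level to the shallower block output
        fused = shallow + upsample_nearest(pyramid[0])
        pyramid.insert(0, fused)
    return pyramid

b1 = np.ones((8, 16, 16))  # shallow block output
b2 = np.ones((8, 8, 8))
b3 = np.ones((8, 4, 4))    # deep block output
p1, p2, p3 = top_down_fusion([b1, b2, b3])
print(p1.shape, p2.shape, p3.shape)  # (8, 16, 16) (8, 8, 8) (8, 4, 4)
```

Each pyramid level keeps its block's resolution while accumulating deeper semantics through the additions.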
Further, step S3 comprises the sub-steps of:
S3.1: For each layer feature P_i of the multi-scale pyramid, perform a 3×3 deconvolution and sigmoid normalization to generate the corresponding spatial attention map A_i^s;
S3.2: For each layer feature F_i of the multi-scale pyramid, perform global average pooling followed by two fully connected layers in turn to generate the initial channel attention Ã_i^c;
S3.3: Add the initial channel attention Ã_i^c of the current layer to the channel attention A_(i-1)^c of the shallower layer to obtain the final channel attention A_i^c of the current layer, realizing bottom-up fusion of attention features across scales.
Further, step S4 comprises the sub-steps of:
S4.2: using theta i Spatial attention as a thresholdDividing to obtain->Corresponding segmentation mask M i ;
S4.3: by M i Space attention force diagramAnd weighting is carried out, and attention scores at points with smaller attention weights are removed, so that the enhanced multi-scale space attention characteristics are obtained.
Further, in step S4.2, θ is used i Spatial attention as a thresholdThe method for dividing comprises the following steps: />Is greater than or equal to theta i The value of (2) is assigned 1, (-)>Values less than are assigned as0。
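A minimal sketch of this hard threshold segmentation (illustrative only; the function name and sample values are invented for the example):

```python
import numpy as np

def threshold_mask(attn, theta):
    """S4.2: binarise a spatial attention map. Points at or above the
    threshold theta keep weight 1; the rest are zeroed."""
    return (attn >= theta).astype(attn.dtype)

attn = np.array([[0.1, 0.6],
                 [0.5, 0.9]])
mask = threshold_mask(attn, theta=0.5)
enhanced = mask * attn  # S4.3: suppress the low-attention points
print(mask)      # [[0. 1.] [1. 1.]]
print(enhanced)  # [[0.  0.6] [0.5 0.9]]
```

The hard step function is not differentiable, which is why the embodiment later replaces it with a steep sigmoid for training.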
Further, in step S5, each value in the channel attention vector corresponds to one channel of the current layer feature map, and each attention value is multiplied into all feature values on its corresponding channel.
Further, the classifier in step S6 is composed of one global average pooling layer and several fully connected layers.
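The classifier structure can be sketched as follows; a single fully connected layer stands in for the "several" the text allows, and all weights are random placeholders rather than trained parameters:

```python
import numpy as np

def classifier_head(feature, w_fc, b_fc):
    """S6 (single-FC sketch): global average pooling over a (C, H, W)
    feature followed by a fully connected layer yields class logits."""
    pooled = feature.mean(axis=(1, 2))  # GAP -> (C,)
    return w_fc @ pooled + b_fc         # FC  -> (num_classes,)

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 4, 4))
logits = classifier_head(feat, rng.normal(size=(5, 8)), np.zeros(5))
print(logits.shape)  # (5,) -- one logit per cervical-cell class
```

One such head is attached to every level of the attention feature pyramid, and their predictions are averaged at inference time.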
Further, in step S7, the three-part loss function is the weighted sum of the joint cross-entropy loss L_CE of all classifiers, the smoothing loss L_SM of the masks, and the channel feature diversity loss L_CDis, i.e., L = L_CE + λ·L_CDis + μ·L_SM, where λ and μ are constants used to balance the losses.
Further, in step S7, the smoothing loss L_SM constrains the boundary smoothness of the attention masks and is the sum of the first-order gradients of the attention masks M_i, with formula:
L_SM = (1/N) Σ_{i=1}^{N} (1/(W_i·H_i)) Σ_{m,n} |∇M_i(m, n)|
where N is the total number of attention masks, W_i and H_i are the width and height of the i-th attention mask, (m, n) is the coordinate index of a point in the mask, and ∇M_i(m, n) denotes the gradient at point (m, n) of the i-th attention mask, computed with a gradient operator.
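A sketch of this smoothing loss, assuming forward finite differences as the unspecified gradient operator and per-mask normalization by area; both choices are assumptions made for the illustration:

```python
import numpy as np

def smoothing_loss(masks):
    """Mean first-order gradient magnitude over a list of 2-D attention
    masks, penalising jagged mask boundaries. Forward differences stand
    in for the patent's unspecified gradient operator."""
    total = 0.0
    for m in masks:
        gy = np.abs(np.diff(m, axis=0)).sum()  # vertical differences
        gx = np.abs(np.diff(m, axis=1)).sum()  # horizontal differences
        total += (gx + gy) / m.size            # normalise by mask area
    return total / len(masks)

smooth = np.ones((4, 4))                   # constant mask: zero gradient
rough = np.zeros((4, 4)); rough[::2] = 1   # alternating rows: many edges
print(smoothing_loss([smooth]))            # 0.0
print(smoothing_loss([rough]) > 0)         # True
```

A constant mask incurs no penalty, while a mask that flips between 0 and 1 at many points is penalised, pushing the network toward compact attention regions.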
Further, the channel feature diversity loss L_CDis in step S7 forces the feature channels output by the last convolutional layer before each classifier to be discriminative for specific classes, thereby increasing the diversity of the per-channel features. The specific calculation steps comprise:
(1) Divide each layer's classification features into C groups by channel, each group containing n = c/C channel features, where c is the channel count of the current layer's classification features and C is the number of cervical cell classes;
(2) Aggregate the n channel features of each group into a single-channel feature with cross-channel average pooling, compute a class response value for each group with global average pooling, and normalize all C response values with a softmax function;
(3) Compute the cross-entropy between the normalized vector of C response values and the one-hot encoding of the current image's class as the channel diversity loss of the current layer's features, then take the average of the channel diversity losses of all layers as the channel diversity loss of the network.
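Steps (1) to (3) can be sketched for a single layer as follows (illustrative NumPy; the shapes and the max-shifted softmax are implementation choices, not claimed specifics):

```python
import numpy as np

def channel_diversity_loss(feature, label, num_classes):
    """Group a (C, H, W) classification feature into num_classes channel
    groups, pool each group to one response value, softmax the responses,
    and take cross entropy against the true class label."""
    c = feature.shape[0]
    # (1) split channels into num_classes groups of c // num_classes each
    groups = feature.reshape(num_classes, c // num_classes, *feature.shape[1:])
    grouped = groups.mean(axis=1)          # (2) cross-channel average pooling
    responses = grouped.mean(axis=(1, 2))  # (2) global average pooling per group
    exp = np.exp(responses - responses.max())
    probs = exp / exp.sum()                # (2) softmax over class responses
    return -np.log(probs[label])           # (3) cross entropy with one-hot label

rng = np.random.default_rng(2)
feat = rng.normal(size=(20, 4, 4))  # 20 channels, 5 classes -> 4 channels/group
loss = channel_diversity_loss(feat, label=3, num_classes=5)
print(loss > 0)  # True
```

Minimising this loss makes the channel group assigned to the true class respond most strongly, so different channel groups specialise in different classes.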
Further, in step S7, the result of the classification prediction is the average value of all the classifier prediction results.
Further, in step S7, cervical cells are classified into 5 classes, including NORMAL, ASC-US, ASC-H, LSIL, and HSIL.
The beneficial effects of the invention are as follows:
(1) The invention constructs a multi-scale feature pyramid with bidirectional feature transfer to extract features containing both low-level details and high-level semantics; building several joint classifiers on these richer features mines the fine-grained discriminative features among cervical cell classes more effectively;
(2) The attention-mask-based feature enhancement method corrects the attention weights, further promoting attention to key regions and adding attention to the nucleus region. In the implementation flow, multi-scale features and multi-scale attention of the cervical cell image are first extracted with an FPN (Feature Pyramid Network) and an APN (Attention Pyramid Network); the multi-scale attention is then enhanced with attention masks generated by threshold segmentation; finally, classifiers are built on the features obtained by weighting the multi-scale features with the enhanced attention, improving the accuracy of cervical cell classification while also improving interpretability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary view of various cervical cell images used in accordance with an embodiment of the invention;
FIG. 2 is a general flow chart of an embodiment of the present invention;
FIG. 3 is a detailed view of the implementation of the modules in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
The embodiment of the invention provides a cervical cell classification method based on multi-scale attention feature enhancement (built on ResNet50); referring to the general flow structure of FIG. 2, the method comprises the following steps:
S1: Use ResNet50 as the backbone of the multi-scale feature extractor, taking the output feature maps {B_1, B_2, B_3, B_4} of its 4 blocks as the multi-scale features;
S2: Fuse the multi-scale features {B_1, B_2, B_3, B_4} top-down according to the feature fusion method of the FPN to obtain the multi-scale feature pyramid {P_2, P_3, P_4}; specific implementation details are shown in the FF_i module of FIG. 3. Note that P_4 is obtained directly from B_4 by a 1×1 convolution, and P_1 is not constructed because its large feature map size would require significant computational resources; a feature pyramid can thus be built selectively from features at a subset of scales;
S3: Use the APN to compute spatial attention and channel attention for each layer of the multi-scale pyramid features, generating the spatial attention pyramid and the per-layer channel attention vectors; specific implementation details are shown in the CA_i and SA_i modules of FIG. 3. This comprises the following steps:
S3.1: For each layer feature P_i of the multi-scale pyramid, perform a 3×3 deconvolution and sigmoid normalization to generate the corresponding spatial attention map A_i^s;
S3.2: For each layer feature F_i of the multi-scale pyramid, perform global average pooling followed by two fully connected layers in turn to generate the initial channel attention Ã_i^c;
S3.3: Add the initial channel attention Ã_i^c of the current layer to the channel attention A_(i-1)^c of the shallower layer to obtain the final channel attention A_i^c of the current layer, realizing bottom-up fusion of attention features across scales; the final channel attention of the shallowest layer is its initial channel attention, i.e., A_2^c = Ã_2^c;
S4: using masks obtained by threshold segmentation of the spatial attention pyramid to enhance spatial attention, see Mask in FIG. 3 for specific implementation details i A module comprising the steps of:
S4.2: using theta i Spatial attention as a thresholdDividing, i.e.)>Is greater than or equal to theta i The value of (2) is assigned 1, i.e. +.>A value less than or equal to 0, giving +.>Corresponding segmentation mask M i In the implementation process, M can be conveniently updated in order to facilitate back propagation i M can be made by a sigmoid function i The specific calculation formula is as follows:
wherein ω is set to a very large constant to ensure M i The elements in (2) are close to 1 or 0; normally ω=10 can be set 12 。
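The sigmoid relaxation of the mask can be sketched as follows; a moderate ω is used here instead of the suggested 10^12 purely to avoid floating-point overflow warnings in the illustration:

```python
import numpy as np

def soft_mask(attn, theta, omega=50.0):
    """Differentiable approximation of the hard threshold mask:
    sigmoid(omega * (attn - theta)) approaches a 0/1 step function as
    omega grows, so gradients can flow through the mask during training."""
    z = np.clip(omega * (attn - theta), -50, 50)  # clip for numerical safety
    return 1.0 / (1.0 + np.exp(-z))

attn = np.array([[0.2, 0.8],
                 [0.5, 0.51]])
m = soft_mask(attn, theta=0.5)
print(np.round(m, 3))  # values above theta pushed toward 1, below toward 0
```

Values far below the threshold map to nearly 0 and values far above it to nearly 1, mimicking the hard mask while remaining differentiable near the boundary.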
S4.3: spatial attention seeking Using MiWeighting, i.e. by->Eliminating attention scores at points with smaller attention weights to obtain enhanced multiscale spatial attention features +.>
S5: using the enhanced spatial attention pyramid and the attention vectors of each layer of channels to perform multi-scale pyramid characteristics in S2Attention weighting, resulting in a multi-scale attention feature pyramid { P 'for classification' 2 ,P′ 3 ,P′ 4 Specific implementation details are shown in FIG. 3 FW i A module, expressed in terms of a formula, may be:wherein->Is a broadcast addition, ->Is bit-wise multiplication;
S6: Build a classifier for each layer of the multi-scale attention feature pyramid, each classifier composed of one global average pooling layer and two fully connected layers;
S7: Divide the cervical cell image dataset containing the five classes shown in FIG. 1 into training, validation and test sets at a ratio of 7:1:2, perform gradient descent optimization training on the whole network with the three-part loss function for 200 epochs, and use the network with the highest classification accuracy on the validation set to perform classification prediction of cervical cells on the test set; the final prediction is the average of all classifier predictions. The three-part loss function is the weighted sum of the joint cross-entropy loss L_CE of all classifiers, the smoothing loss L_SM of the masks, and the channel feature diversity loss L_CDis, i.e., L = L_CE + λ·L_CDis + μ·L_SM, where λ and μ are constants used to balance the losses.
In the specific implementation, the smoothing loss L_SM constrains the boundary smoothness of the attention masks and is the sum of the first-order gradients of the attention masks M_i, with formula:
L_SM = (1/N) Σ_{i=1}^{N} (1/(W_i·H_i)) Σ_{m,n} |∇M_i(m, n)|
where N is the total number of attention masks, W_i and H_i are the width and height of the i-th attention mask, (m, n) is the coordinate index of a point in the mask, and ∇M_i(m, n) denotes the gradient at point (m, n), computed with a gradient operator. The channel feature diversity loss L_CDis forces the feature channels output by the last convolutional layer before each classifier to be discriminative for specific classes, thereby increasing the diversity of the per-channel features; the calculation process comprises:
First, each layer's classification features are divided into 5 groups (the number of cervical cell classes) by channel, each group containing n = C/5 channel features, where C is the channel count of the current layer's classification features;
Then, the n channel features of each group are aggregated into a single-channel feature with cross-channel average pooling, a class response value is computed for each group with global average pooling, and all 5 response values are normalized with a softmax function;
Finally, the cross-entropy between the vector of 5 normalized response values and the one-hot encoding of the current image's class is computed as the channel diversity loss of the current layer's features; the average over all layers is then taken as the channel diversity loss of the network.
Today, when cervical cancer patients tend to be younger, efficient and automated cervical cell classification techniques are critical to reducing the incidence and mortality of cervical cancer. Addressing the two difficulties currently faced by automatic cervical cell classification, the invention provides a cervical cell classification method based on multi-scale attention feature enhancement: attention features enhanced by attention masks are used for classifying cervical cells, so that the high-level semantic features and the more detailed low-level features of a general classification network are exploited simultaneously, and the attention masks prompt the network to pay more attention to the nucleus region critical to cervical cell classification, further improving classification accuracy while also improving interpretability.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A cervical cell classification method based on multi-scale attention feature enhancement is characterized by comprising the following steps:
s1: extracting multi-scale features from the cervical cell image using a deep convolutional neural network comprising a plurality of blocks;
s2: constructing a multi-scale feature pyramid according to a feature fusion method in the FPN;
s3: calculating spatial attention and channel attention for each layer of features in the multi-scale pyramid features by using APN, and generating a spatial attention pyramid and channel attention vectors of each layer;
s4: the masks obtained by segmentation of the spatial attention pyramid threshold are used for enhancing the spatial attention;
s5: the enhanced spatial attention pyramid and the attention vectors of the channels of each layer are used for carrying out attention weighting on the multi-scale pyramid features in the S2, so that a multi-scale attention feature pyramid for classification is obtained;
s6: respectively constructing a classifier for each layer of features in the multi-scale attention feature pyramid;
s7: and performing end-to-end gradient descent optimization training on the whole network by using three loss functions, and performing classification prediction on cervical cells by using a trained classification network.
2. The method according to claim 1, characterized in that: the deep convolutional neural network in step S1 is a backbone of any general classification network, which is divided into n blocks, the output feature map of each block having a different scale.
3. The method according to claim 1, characterized in that: in step S2, the feature map output by a deeper block is upsampled to the same size as the feature map output by the shallower block and then added to it, realizing top-down fusion of features at different scales, and a multi-scale feature pyramid is constructed from the features obtained by fusing each block's output with the deeper features.
4. The method according to claim 1, characterized in that: step S3 comprises the following sub-steps:
s3.1: for each layer of features P in a multi-scale pyramid i Performing 3×3 deconvolution and sigmoid normalization to generate a spatial attention map corresponding to each layer of features
S3.2: for each layer of features F in a multi-scale pyramid i Generating initial channel attention after global average pooling and two full connection layers in turn
5. The method according to claim 1, characterized in that: step S4 comprises the following sub-steps:
S4.2: using theta i Spatial attention as a thresholdDividing to obtain->Corresponding split mask M i ;
7. The method according to claim 1, characterized in that: in step S5, each value in the channel attention vector corresponds to a channel in the current layer's feature map, and each attention value is applied by multiplication to all feature values on its corresponding channel.
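This per-channel multiplication maps directly onto NumPy broadcasting; a sketch:

```python
import numpy as np

def apply_channel_attention(feat, att):
    # each attention value multiplies all feature values on its channel:
    # the (C,) vector is broadcast over the H and W axes of (C, H, W)
    return feat * att[:, None, None]

f = np.ones((3, 2, 2))
w = np.array([0.5, 1.0, 2.0])
out = apply_channel_attention(f, w)
```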
8. The method according to claim 1, characterized in that: the classifier in step S6 is composed of a global average pooling layer and a number of fully connected layers.
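A minimal sketch of such a classifier head with a single fully connected layer (the claim allows several):

```python
import numpy as np

def classifier(feat, w, b):
    # global average pooling followed by a fully connected layer -> logits
    pooled = feat.mean(axis=(1, 2))   # (C,)
    return w @ pooled + b             # (num_classes,)

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 4, 4))
logits = classifier(f, rng.standard_normal((5, 8)), np.zeros(5))
```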
9. The method according to claim 1, characterized in that: in step S7, the three-term loss function is the weighted sum of the joint cross-entropy loss L_CE of all classifiers, the smoothing loss L_SM of the masks, and the channel feature diversity loss L_CDis, i.e. L = L_CE + λL_CDis + μL_SM, where λ and μ are constants used to balance the losses.
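A sketch of the three-term objective; the exact forms of L_SM and L_CDis are not given in this claim, so the smoothing term below (mean absolute difference of adjacent mask values) and the weights λ = μ = 0.5 are assumptions for illustration, and L_CDis is passed in as a precomputed scalar:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    # negative log-likelihood of the true class
    return -np.log(softmax(logits)[label])

def smoothness(mask):
    # assumed form of L_SM: mean absolute difference between adjacent values
    return np.abs(np.diff(mask, axis=0)).mean() + np.abs(np.diff(mask, axis=1)).mean()

def total_loss(logits_per_level, label, masks, l_cdis, lam=0.5, mu=0.5):
    # L = L_CE + lambda * L_CDis + mu * L_SM; lam and mu are hypothetical values
    l_ce = sum(cross_entropy(z, label) for z in logits_per_level)
    l_sm = sum(smoothness(m) for m in masks)
    return l_ce + lam * l_cdis + mu * l_sm

demo = total_loss([np.array([2.0, 0.0])], 0, [np.ones((2, 2))], l_cdis=0.0)
```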
10. The method according to claim 1, characterized in that: in step S7, the result of the classification prediction is the average value of all the classifier prediction results.
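Averaging the per-level classifier outputs can be sketched as:

```python
import numpy as np

def predict(per_level_probs):
    # the final prediction is the mean of all classifier outputs
    avg = np.mean(per_level_probs, axis=0)
    return int(np.argmax(avg))

probs = [np.array([0.6, 0.4]), np.array([0.3, 0.7]), np.array([0.7, 0.3])]
pred = predict(probs)   # class with the highest averaged probability
```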
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310131751.6A CN116386034A (en) | 2023-02-16 | 2023-02-16 | Cervical cell classification method based on multiscale attention feature enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116386034A true CN116386034A (en) | 2023-07-04 |
Family
ID=86975812
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116386034A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117671357A * | 2023-12-01 | 2024-03-08 | 广东技术师范大学 | Pyramid algorithm-based prostate cancer ultrasonic video classification method and system
CN117671357B * | 2023-12-01 | 2024-07-05 | 广东技术师范大学 | Pyramid algorithm-based prostate cancer ultrasonic video classification method and system
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||