CN115810191A - Pathological cell classification method based on multi-attention fusion and high-precision segmentation network - Google Patents

Pathological cell classification method based on multi-attention fusion and high-precision segmentation network

Info

Publication number
CN115810191A
Authority
CN
China
Prior art keywords
pathological
pathological cell
medical image
cells
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211710618.8A
Other languages
Chinese (zh)
Inventor
胡鹤轩
丁秋阳
黄倩
杨天金
胡强
巫义锐
张晔
狄峰
胡震云
周晓军
沈勤
吕京澴
Current Assignee
Jiuyisanluling Medical Technology Nanjing Co ltd
Hohai University HHU
Original Assignee
Jiuyisanluling Medical Technology Nanjing Co ltd
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Jiuyisanluling Medical Technology Nanjing Co ltd, Hohai University HHU filed Critical Jiuyisanluling Medical Technology Nanjing Co ltd
Priority to CN202211710618.8A priority Critical patent/CN115810191A/en
Publication of CN115810191A publication Critical patent/CN115810191A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a pathological cell classification method based on a multi-attention fusion mechanism and a high-precision segmentation network, which comprises the following steps: collecting and preprocessing pathological cell images; constructing a multi-attention fusion module, comprising a compression excitation module and a spatial attention module; constructing a high-precision segmentation network and generating regions of interest on the pathological cell images through it; building a deep network model based on the multi-attention fusion module and the high-precision segmentation network, and training the model with the preprocessed pathological cell images to obtain the network model with the highest pathological cell classification accuracy; and feeding the pathological cell image to be examined into the optimized network model to obtain the classification result of the pathological cell sample. The invention enables the model to learn both the channel weights and the spatial weights of the cell image, and uses the high-precision segmentation network to extract the features of only the pathological cell regions in the image for classification, effectively reducing the amount of computation and the influence of noise.

Description

Pathological cell classification method based on multi-attention fusion and high-precision segmentation network
Technical Field
The invention belongs to the technical field at the intersection of digital image processing and medicine, and particularly relates to a pathological cell classification method based on multi-attention fusion and a high-precision segmentation network.
Background
With the development of science and technology, cancer is no longer an incurable disease: the cure rate of early-stage cancer exceeds 90%, while that of late-stage cancer is only about 10%. Early screening is therefore very important for the prevention and treatment of cancer. The Pap smear test is a common and effective way to screen for cancer: a professional physician observes the cell morphology in the Pap smear under a microscope, classifies each cell, and determines whether the cell sample is cancerous. Manually classifying pathological cells is costly, and the results often carry subjective bias that harms screening accuracy, so developing technology for automatically classifying pathological cell medical images is of great significance.
Existing pathological cell medical image classification methods include methods based on context modeling, methods based on graph neural networks, and the like. Chinese patent application CN112200253A, "Cervical cell image classification method based on SENet", uses SENet to classify cervical pathological cell medical images. That method has two main shortcomings: (1) SENet uses a channel attention mechanism that computes only channel weights, not spatial weights, so the network model pays insufficient attention to the key regions of the pathological cell medical image, cannot fully extract cell features, and achieves poor cell classification accuracy; (2) the method sends the global features extracted by the network model from the whole cell image to the classifier, which requires a large amount of computation, and the global features often contain noise that does not belong to the cells, which harms classification accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a pathological cell classification method based on multi-attention fusion and a high-precision segmentation network.
In order to solve the technical problems, the invention adopts the following technical scheme.
A pathological cell classification method based on a multi-attention fusion mechanism and a high-precision segmentation network comprises the following steps:
step 1, collecting and preprocessing a pathological cell medical image;
step 2, constructing a multi-attention fusion module, which comprises constructing a compression excitation module so that a network model can learn the relationship among different channels; constructing a spatial attention module to help the network model to identify key feature regions in the cell image;
step 3, constructing a high-precision segmentation network, and generating a region of interest on the pathological cell medical image through the high-precision segmentation network;
step 4, building a depth network model based on a multi-attention fusion and high-precision segmentation network, wherein the depth network model comprises a multi-attention fusion module and a high-precision segmentation network module, and training the depth network model by utilizing a preprocessed pathological cell medical image to obtain a depth network model with the highest pathological cell classification accuracy;
and 5, sending the pathological cell medical image to be detected into the depth network model optimized in the step 4 to obtain a classification result of the pathological cell sample.
Specifically, the step 1 process comprises:
step 1.1, collecting images of Pap smears made from pathological cells under a microscope, and having a specialist classify the pathological cells in each pathological cell medical image into the following categories: cells with a high nucleus-to-cytoplasm ratio whose morphology meets the HSIL standard, abnormal keratinocytes in HSIL or SCC, abnormal naked nuclei in HSIL or SCC, koilocytes (hollowed cells), non-hollowed cells whose morphology meets the LSIL standard, abnormal glandular cells, abnormal metaplastic cells, normal naked nuclei, normal intermediate-layer cells, pathogenic microbial cells, normal parabasal-layer cells, typical squamous metaplasia, and typical parakeratosis;
step 1.2, performing image enhancement on the collected pathological cell medical images, including random-angle rotation, flipping, cropping, displacement, and scaling, so that the resulting pathological cell medical image data set is richer;
and 1.3, labeling each pathological cell medical image by using a labelme labeling tool, wherein the labeling comprises a classification label and a cell outline coordinate of pathological cells in the pathological cell medical image.
Specifically, the step 2 includes:
step 2.1. Construct the compression excitation module, including:
compression: perform global average pooling on the pathological cell medical image, compressing it from H × W × C to 1 × 1 × C and aggregating features across space; wherein H represents the height of the pathological cell medical image, W represents its width, and C represents its number of channels; the calculation formula for the compression operation is:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i,j)  (1)

wherein u_c(i,j) represents the pixel value of channel c of the pathological cell medical image at position (i,j), and z_c is the pooled value of channel c;
excitation: learn a weight for each channel of the pathological cell medical image; the excitation operation comprises two fully-connected layers and finally outputs a 1 × 1 × C vector; its calculation formula is:

S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))  (2)

wherein σ represents the sigmoid activation function and δ represents the ReLU function; W_1 is the weight matrix of the first fully-connected layer and W_2 is the weight matrix of the second fully-connected layer;
scale: multiply the weight of each channel learned in the excitation operation by the original features; the Scale operation is calculated by the formula:

u′_c(i,j) = S_c × u_c(i,j)  (3)

wherein u_c is the pathological cell medical image input to the Scale formula and u′_c is the pathological cell medical image output by the Scale formula;
step 2.2, constructing the spatial attention module: first, aggregate the channel features of the pathological cell medical image using global average pooling; the global average pooling along the channel direction is given by:

x̄(i,j) = (1/C) Σ_{c=1..C} x_c(i,j)  (4)

wherein x_c(i,j) represents the feature of channel c at point (i,j) on the pathological cell medical image and x̄(i,j) is the channel-pooled feature;
the weight G of the attention map is then learned with a 1 × 1 convolution, whose calculation formula is:

G = F_Conv = M x(i,j)  (5)

where M is the weight matrix learned by the 1 × 1 convolution;
and finally, carrying out the Scale operation, whose calculation formula is:

u′_c(i,j) = G(i,j) × u_c(i,j)  (6).
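The serial channel-then-spatial weighting described in step 2 can be sketched as follows. This is a minimal NumPy illustration with toy layer sizes and a scalar stand-in for the 1 × 1 convolution weight, not the patent's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Compression excitation: per-channel global average pool, two FC layers, sigmoid."""
    z = x.mean(axis=(1, 2))                  # compression (squeeze): (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation: ReLU then sigmoid, (C,)
    return x * s[:, None, None]              # Scale: reweight each channel

def spatial_attention(x, m):
    """Average along channels, then a per-pixel weight (scalar stand-in for a 1x1 conv)."""
    z = x.mean(axis=0)                       # pooled map, (H, W)
    g = sigmoid(m * z)                       # attention map G, (H, W)
    return x * g[None, :, :]                 # reweight each spatial position

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))           # toy feature map, C=4, H=W=8
w1 = 0.1 * rng.standard_normal((2, 4))       # reduction FC (4 -> 2), illustrative size
w2 = 0.1 * rng.standard_normal((4, 2))       # expansion FC (2 -> 4), illustrative size
y = spatial_attention(channel_attention(x, w1, w2), m=0.5)
```

Applying the two modules in series lets every output value carry both a channel weight and a spatial weight.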
specifically, the step 3 includes:
generating a plurality of rectangular anchor boxes at each pixel point of the pathological cell medical image, the anchor boxes taking 3 sizes and 3 different aspect ratios, for 9 anchor boxes in total; performing a classification task and an offset regression task on each anchor box; wherein the anchor box classification task uses the binary cross-entropy loss function:
L_cls(p, y) = −y log(p) − (1 − y) log(1 − p)  (7)
where y is the true value and p is the predicted value;
the loss function used for the anchor box offset regression is:

L_reg(y′, y) = 0.5 (y′ − y)² / δ, if |y′ − y| < δ;  |y′ − y| − 0.5 δ, otherwise  (8)

wherein y′ is the predicted value, y is the true value, and δ is a constant whose value can be determined according to the situation;
taking the weighted sum of the loss function used by the anchor box classification task and the loss function used by the anchor box offset regression as the overall loss function:

L({p_i}, {t_i}) = (α/N_cls) Σ_i L_cls(p_i, p_i*) + (1/N_cls) Σ_i p_i* L_reg(t_i, t_i*)  (9)

wherein i is the index of each anchor box; α is a constant determined according to the situation, used to control the proportion of the anchor box classification loss in the total loss; p_i* and t_i* represent true values, and p_i and t_i represent predicted values; N_cls represents the number of anchor boxes;
calculating the position and size of the proposal box using the anchor box offsets obtained from the anchor box regression, and then calculating the intersection-over-union of the proposal box and the ground-truth box, with the formula:

F_IoU = (A ∩ B) / (A ∪ B)  (10)

wherein F_IoU represents the intersection-over-union of the proposal box and the ground-truth box, A represents the area of the ground-truth box, and B represents the area of the proposal box; the classification score obtained by the anchor box classification task is then combined with the intersection-over-union by weighted summation, with the calculation formula:
Ts = F_cls + λ F_IoU  (11)
wherein Ts represents the fusion score and F_cls represents the classification score; λ is a coefficient, obtainable by model learning, that weights the intersection-over-union within the fusion score;
and processing the fusion scores with a non-maximum suppression algorithm, screening out the most suitable proposal box.
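The anchor layout of step 3 (3 sizes × 3 aspect ratios = 9 boxes per pixel) can be sketched as follows; the concrete sizes and ratios are illustrative assumptions, since the text does not fix them:

```python
def make_anchors(cx, cy, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return the 9 anchor boxes (x1, y1, x2, y2) centred on one pixel.

    Width and height are chosen so each box keeps area ~ size^2 while
    width/height equals the requested aspect ratio.
    """
    boxes = []
    for s in sizes:
        for r in ratios:
            w = s * r ** 0.5
            h = s / r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = make_anchors(100, 100)
```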
Specifically, the step 4 includes:
step 4.1, adding the multi-attention fusion module to the ResNet18 network; the network model is based on the ResNet18 network, which has 5 stages: the first stage is a 7 × 7 convolution, and the remaining four stages consist of residual blocks; the compression excitation module and the spatial attention module are embedded serially into the second to fourth stages of the ResNet18 network, and the ResNet18 network finally outputs the feature map extracted from the pathological cell medical image;
step 4.2, feeding the feature map into the high-precision segmentation network to obtain the region of interest, i.e., the anchor box containing pathological cells; obtaining the features contained in the region of interest from the feature map of step 4.1 and the anchor box obtained in the previous step, and, after pooling, sending them to the classifier for the classification task;
and 4.3, dividing the preprocessed pathological cell medical images into a training set and a test set at a ratio of 8:2; the training set is fed into the network model built above for training, and the loss function used in the training process is the focal loss function, with the formula:

FL = −(1 − p̂)^γ log(p̂)  (12)

where FL represents the focal loss value, p̂ represents the predicted probability assigned to the true class p, and γ is a weight whose optimal value is confirmed through experiments;
and after the training is finished, testing the network model by using the test set, and selecting the model with the highest average accuracy as the optimal pathological cell medical image classification model.
Specifically, the step 5 includes:
step 5.1, the medical image of the pathological cell to be detected is sent to the network model optimized in the step 4, and the classification result of the network model to the inspection sample is obtained;
and 5.2, judging the classification result: if the classification result belongs to one of cells with a high nucleus-to-cytoplasm ratio whose morphology meets the HSIL standard, abnormal keratinocytes in HSIL or SCC, abnormal naked nuclei in HSIL or SCC, koilocytes (hollowed cells), non-hollowed cells whose morphology meets the LSIL standard, abnormal glandular cells, and abnormal metaplastic cells, the sample is judged cancer-positive; if the classification result belongs to one of normal naked nuclei, normal intermediate-layer cells, pathogenic microbial cells, normal parabasal-layer cells, normal endocervical cells, and typical parakeratosis, the sample is judged cancer-negative.
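The decision rule of step 5.2 amounts to a lookup from the cell categories to a binary verdict. A sketch, with shorthand label strings standing in for the category names (the strings themselves are illustrative assumptions):

```python
# Categories judged cancer-positive (shorthand for the seven abnormal classes).
POSITIVE = {
    "high N/C ratio (HSIL)", "abnormal keratinocyte (HSIL/SCC)",
    "abnormal naked nucleus (HSIL/SCC)", "koilocyte",
    "non-koilocytic LSIL cell", "abnormal glandular cell",
    "abnormal metaplastic cell",
}
# Categories judged cancer-negative (shorthand for the six normal/benign classes).
NEGATIVE = {
    "normal naked nucleus", "normal intermediate cell",
    "pathogenic microorganism", "normal parabasal cell",
    "normal endocervical cell", "typical parakeratosis",
}

def judge(label):
    """Map a single-cell classification result to the sample-level verdict."""
    if label in POSITIVE:
        return "positive"
    if label in NEGATIVE:
        return "negative"
    raise ValueError(f"unknown category: {label}")
```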
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method introduces a multi-attention mechanism: a channel attention mechanism lets the network model learn the weights between different channels, and a spatial attention mechanism lets the network model learn the weights of different regions of the cell image. By fusing the two attention mechanisms, the network model extracts the key features of cell images better, which benefits the classification task and improves classification accuracy.
2. The invention uses a high-precision segmentation network that adds, on top of the traditional region proposal network, a module combining classification scores with intersection-over-union, and processes the fusion scores with a non-maximum suppression algorithm to obtain the most suitable proposal boxes. Whereas the traditional approach applies non-maximum suppression to the classification score alone, a single variable, the high-precision segmentation network applies it to the fusion score, so the algorithm must not only classify the proposal box correctly but also achieve as high an intersection-over-union as possible. This overcomes the poor intersection-over-union of proposal boxes produced by the traditional region proposal network, lets the model locate the region of interest more accurately, extracts the region-of-interest features precisely for classification, effectively reduces the amount of computation, eliminates noise interference from outside the region of interest, and improves classification accuracy.
3. The invention introduces the focal loss function, which adds a modulating factor in front of the cross-entropy term so that the minority samples in the data set carry greater weight in the overall loss function. Compared with the cross-entropy function, the focal loss effectively alleviates the problem of unbalanced training-sample classes during model training by raising the weight of minority samples in the loss.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a compression excitation module according to the prior art.
Fig. 3 is a block diagram of a spatial attention module according to the prior art.
Fig. 4 is a diagram of a high-precision segmentation network architecture according to an embodiment of the present invention.
Detailed Description
The invention discloses a pathological cell classification method based on multi-attention fusion and a high-precision segmentation network, which comprises: collecting a medical image data set of Pap smears made from pathological cells; constructing a multi-attention fusion module to learn the weights between different channels of the medical image and the weights of its spatial regions; constructing a high-precision segmentation network module to extract the features of the pathological cell regions in the medical image; building a deep network model based on the multi-attention fusion module and the high-precision segmentation network; and classifying pathological cells with the trained, optimally parameterized deep network model to judge whether the cell sample in the medical image is cancerous. The invention not only requires less computation but also greatly improves the accuracy of pathological cell classification.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of a pathological cell classification method based on a multi-attention fusion mechanism and a high-precision segmentation network provided by the invention. As shown in fig. 1, the method of this embodiment includes the following steps:
step 1, collecting and preprocessing a pathological cell medical image. The process comprises the following steps:
Step 1.1, collecting images of Pap smears made from pathological cells under the microscope field of view, and having a professional physician classify the pathological cells in each image into 13 categories: cells with a high nucleus-to-cytoplasm ratio whose morphology meets the HSIL standard, abnormal keratinocytes in HSIL or SCC, abnormal naked nuclei in HSIL or SCC, koilocytes (hollowed cells), non-hollowed cells whose morphology meets the LSIL standard, abnormal glandular cells, abnormal metaplastic cells, normal naked nuclei, normal intermediate-layer cells, pathogenic microbial cells, normal parabasal-layer cells, typical squamous metaplasia, and typical parakeratosis;
step 1.2, performing image enhancement on the collected pathological cell medical images, including random-angle rotation, flipping, cropping, displacement, and scaling, so that the resulting pathological cell medical image data set is richer;
and 1.3, labeling each pathological cell medical image by using a labelme labeling tool, wherein the labeling comprises a classification label and a cell outline coordinate of pathological cells in the pathological cell medical image.
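The augmentations of step 1.2 can be sketched in NumPy as below; for simplicity this sketch restricts rotation to multiples of 90° (the method allows arbitrary random angles) and implements displacement as a circular shift:

```python
import numpy as np

def augment(img, rng):
    """Apply one random pass of flip, rotation, and small displacement."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                              # horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))      # rotate by 0/90/180/270 degrees
    dy, dx = rng.integers(-2, 3, size=2)
    img = np.roll(img, (int(dy), int(dx)), axis=(0, 1)) # small circular shift
    return img

rng = np.random.default_rng(42)
patch = rng.standard_normal((16, 16))                   # toy grayscale cell patch
out = augment(patch, rng)
```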
Step 2, constructing a multi-attention fusion module, which comprises constructing a compression excitation module, so that a network model can learn the relationship between different channels of the pathological cell medical image; and constructing a spatial attention module to help the network model to identify key characteristic regions in the pathological cell medical image. The method specifically comprises the following steps:
and 2.1, constructing a compression excitation module. The compression excitation module can learn the relation among different channels of the pathological cell medical image, and the function of the compression excitation module is realized through three steps of compression, excitation and Scale.
The compression operation performs global average pooling on the pathological cell medical image; after it, the image is compressed from H × W × C to 1 × 1 × C, aggregating features across space. Here H denotes the height of the pathological cell medical image, W its width, and C its number of channels. The calculation formula for the compression operation is:

z_c = F_sq(u_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i,j)  (1)

where u_c(i,j) represents the pixel value of channel c of the pathological cell medical image at position (i,j), and z_c is the pooled value of channel c.
The excitation operation learns the weights of each channel of the pathological cell medical image, which contains two fully-connected layers, ultimately outputting a 1 × 1 × C vector. The calculation formula of the excitation operation is:
S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))  (2)

where σ represents the sigmoid activation function and δ represents the ReLU function; W_1 is the weight matrix of the first fully-connected layer and W_2 is the weight matrix of the second fully-connected layer.
The Scale operation is to multiply the weight of each channel learned in the excitation operation by the original feature. The formula for Scale operation is:
u′_c(i,j) = S_c × u_c(i,j)  (3)

where u_c is the pathological cell medical image input to the Scale formula and u′_c is the pathological cell medical image output by the Scale formula;
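Numerically, the compression operation is just a per-channel mean, and Scale broadcasts the learned channel weights back over H × W. A small check (the sigmoid argument here is a stand-in for the two fully-connected layers, purely for illustration):

```python
import numpy as np

u = np.arange(24, dtype=float).reshape(2, 3, 4)   # toy input, C=2, H=3, W=4

# Compression: z_c = (1/(H*W)) * sum over i,j of u_c(i,j), a per-channel mean.
z = u.sum(axis=(1, 2)) / (u.shape[1] * u.shape[2])

# Excitation squashes the fully-connected output into (0, 1) channel weights;
# z/10 stands in for W2 * relu(W1 * z) here.
s = 1.0 / (1.0 + np.exp(-z / 10.0))

# Scale: multiply each channel of u by its learned weight.
u_scaled = s[:, None, None] * u
```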
And 2.2, constructing the spatial attention module. Firstly, the channel features of the pathological cell medical image are aggregated using a global average pooling method, where the global average pooling formula along the channel direction is:
x̄(i,j) = (1/C) Σ_{c=1..C} x_c(i,j)  (4)

where x_c(i,j) represents the feature of channel c at point (i,j) on the pathological cell medical image and x̄(i,j) is the channel-pooled feature.
The weight G of the attention map is then learned by a 1 × 1 convolution, which is calculated by the formula:

G = F_Conv = M x(i,j)  (5)

where M is the weight matrix learned by the 1 × 1 convolution.
And finally, the Scale operation is carried out, with the calculation formula:

u′_c(i,j) = G(i,j) × u_c(i,j)  (6)
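The spatial attention path can be checked numerically in the same way: pooling runs along the channel axis, a 1 × 1 convolution with a single output channel reduces to a per-position scalar multiply, and the Scale step broadcasts the map G over all channels. A small sketch (M is an illustrative scalar weight):

```python
import numpy as np

# Toy feature map with constant channels 1, 2, 3 so the pooled value is known.
x = np.array([1.0, 2.0, 3.0])[:, None, None] * np.ones((3, 2, 2))

# Channel-direction global average pooling: one value per spatial position.
z = x.mean(axis=0)            # every entry equals (1 + 2 + 3) / 3 = 2

# A 1x1 convolution with one input and one output channel is a scalar multiply.
M = 0.25
G = M * z                     # attention map, shape (H, W)

# Scale: reweight every channel of x by the spatial map G.
y = G[None, :, :] * x
```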
and 3, constructing a high-precision segmentation network, and generating an interested region on the pathological cell medical image through the high-precision segmentation network, so that the next classification task can be conveniently carried out. The method specifically comprises the following steps:
A plurality of rectangular anchor boxes are generated at each pixel point of the pathological cell medical image; the anchor boxes take 3 sizes and 3 different aspect ratios, for 9 anchor boxes in total. Then, a classification task and an offset regression task are performed on each anchor box. The anchor box classification task uses the binary cross-entropy loss function:
L_cls(p, y) = −y log(p) − (1 − y) log(1 − p)  (7)
where y is the true value and p is the predicted value.
The loss function used for the anchor box offset regression is:

L_reg(y′, y) = 0.5 (y′ − y)² / δ, if |y′ − y| < δ;  |y′ − y| − 0.5 δ, otherwise  (8)

where y′ is the predicted value, y is the true value, and δ is a constant whose value may be determined as appropriate.
Taking the weighted sum of the loss function used by the anchor box classification task and the loss function used by the anchor box offset regression as the overall loss function:

L({p_i}, {t_i}) = (α/N_cls) Σ_i L_cls(p_i, p_i*) + (1/N_cls) Σ_i p_i* L_reg(t_i, t_i*)  (9)

where i is the index of each anchor box; α is a constant determined as appropriate, used to control the proportion of the classification-task loss in the total loss; p_i* and t_i* represent true values, and p_i and t_i represent predicted values; N_cls denotes the number of anchor boxes.
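The three loss terms of the segmentation network (binary cross-entropy for anchor classification, a smooth offset-regression loss, and their weighted sum over anchors) can be sketched as plain functions. The Huber form of the regression loss and the placement of α on the classification term are reasonable readings of the text, not a verbatim reproduction of its formulas:

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one anchor: -y*log(p) - (1-y)*log(1-p)."""
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def huber(pred, target, delta=1.0):
    """Smooth offset-regression loss: quadratic near zero, linear beyond delta."""
    d = abs(pred - target)
    return 0.5 * d * d / delta if d < delta else d - 0.5 * delta

def total_loss(samples, alpha=1.0):
    """Weighted sum over anchors; regression only counts for positive anchors (p* = 1).

    Each sample is (p, p_star, t, t_star): predicted/true class, predicted/true offset.
    """
    n = len(samples)
    l_cls = sum(bce(p, ps) for p, ps, _, _ in samples) / n
    l_reg = sum(ps * huber(t, ts) for _, ps, t, ts in samples) / n
    return alpha * l_cls + l_reg

loss = total_loss([(0.9, 1, 0.2, 0.0), (0.1, 0, 0.0, 0.0)])
```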
The position and size of the proposal box can be calculated using the anchor box offsets obtained from the anchor box regression, after which the intersection-over-union of the proposal box and the ground-truth box can be calculated, with the formula:

F_IoU = (A ∩ B) / (A ∪ B)  (10)

where F_IoU represents the intersection-over-union of the proposal box and the ground-truth box, A represents the area of the ground-truth box, and B represents the area of the proposal box.
The classification score obtained by the classification task is combined with the intersection-over-union by weighted summation, with the calculation formula:

Ts = F_cls + λ F_IoU  (11)

where Ts represents the fusion score and F_cls represents the classification score; λ is a coefficient, learnable by the model, that weights the intersection-over-union within the fusion score.
The fusion scores are processed with a non-maximum suppression algorithm, screening out the most suitable proposal box.
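Putting the pieces together (intersection-over-union, the fusion score Ts = F_cls + λ F_IoU, and non-maximum suppression over the fused scores) gives the following sketch. Here the IoU is computed against a known ground-truth box, as during training, and λ is fixed rather than learned, purely for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms_on_fused(boxes, cls_scores, gt, lam=0.5, thr=0.5):
    """Fuse classification score with IoU, then greedy non-maximum suppression."""
    fused = [s + lam * iou(b, gt) for b, s in zip(boxes, cls_scores)]
    order = sorted(range(len(boxes)), key=lambda i: -fused[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
keep = nms_on_fused(boxes, [0.9, 0.8, 0.7], gt=(0, 0, 10, 10))
```

The second box heavily overlaps the first and is suppressed; the distant third box survives.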
The high-precision segmentation network used by the invention adds, on top of the traditional region proposal network, a module that combines classification scores with intersection-over-union, and processes the fusion scores with a non-maximum suppression algorithm to obtain the most suitable proposal boxes. Whereas the traditional approach applies non-maximum suppression to the classification score alone, a single variable, the high-precision segmentation network applies it to the fusion score, so the algorithm must not only classify the proposal box correctly but also achieve as high an intersection-over-union as possible. This overcomes the poor intersection-over-union of proposal boxes produced by the traditional region proposal network, lets the model locate the region of interest more accurately, extracts the region-of-interest features precisely for classification, effectively reduces the amount of computation, eliminates noise interference from outside the region of interest, and improves classification accuracy.
And 4, building a depth network model based on multi-attention fusion and a high-precision segmentation network, and training the depth network model by utilizing the preprocessed pathological cell medical image to obtain the depth network model with the highest pathological cell classification accuracy. The method specifically comprises the following steps:
and 4.1, adding a multi-attention fusion module in the ResNet18 network. The network model is based on a ResNet18 network, the ResNet18 network has 5 stages, the first stage is convolution of 7 multiplied by 7, and the last 4 stages are composed of residual blocks. And the compressed excitation module and the spatial attention module are serially embedded into the second to fourth stages of the ResNet18 network, and the feature extraction part of the network model finally outputs a feature map obtained by extracting a pathological cell medical image.
And 4.2, feeding the feature map into the high-precision segmentation network to obtain the region of interest, i.e., the anchor box containing pathological cells. The features contained in the region of interest are obtained from the feature map of step 4.1 and the anchor box obtained in the previous step, and after pooling they are sent to the classifier for classification.
And 4.3, dividing the data set obtained in step 1 into a training set and a test set at a ratio of 8:2. The training set is fed into the network model built above for training; the loss function used in the training process is the focal loss function, with the calculation formula:
FL = −(1 − p̂)^γ log(p̂)  (12)

where FL represents the focal loss value, p̂ represents the predicted probability assigned to the true class p, and γ is a weight whose optimal value needs to be confirmed through experiments.
The focal loss function places a modulating factor in front of the conventional cross-entropy term, which increases the weight that the minority samples of the data set carry in the overall loss function. Compared with the cross-entropy function, the focal loss can effectively alleviate the class imbalance of training samples during model training by raising the weight of minority samples in the loss.
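The modulating factor (1 − p̂)^γ in front of the cross-entropy term is what shrinks the contribution of easy, well-classified samples; with γ = 0 it reduces to ordinary cross-entropy. A minimal sketch for a single sample, writing p̂ for the predicted probability of the true class:

```python
import math

def focal_loss(p_hat, gamma=2.0, eps=1e-7):
    """FL = -(1 - p_hat)^gamma * log(p_hat); gamma down-weights easy examples."""
    p_hat = min(max(p_hat, eps), 1.0 - eps)
    return -((1.0 - p_hat) ** gamma) * math.log(p_hat)
```

A confidently correct prediction (p̂ = 0.9) contributes far less loss than a badly wrong one (p̂ = 0.1), which is what lifts the relative weight of rare, hard classes.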
And after the training is finished, testing the network model by using the test set, and selecting the model with the highest average accuracy as the optimal network model.
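The data handling of step 4.3 is a plain shuffled 8:2 split followed by picking the checkpoint with the best average test accuracy; a sketch with purely illustrative accuracy numbers:

```python
import random

def split_8_2(samples, seed=0):
    """Shuffle a dataset and split it 8:2 into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

train, test = split_8_2(range(100))

# Model selection: keep the checkpoint with the highest average accuracy
# on the test set (the numbers below are illustrative, not measured).
accuracies = {"epoch3": 0.91, "epoch7": 0.95, "epoch9": 0.93}
best = max(accuracies, key=accuracies.get)
```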
And 5, sending the pathological cell medical image to be detected into the network model optimized in the step 4 to obtain a classification result of the pathological cell sample. The method specifically comprises the following steps:
and 5.1, sending the pathological cell medical image to be detected into the network model optimized in the step 4 to obtain a classification result of the network model on the pathological cell medical image.
And 5.2, judging the classification result. If the classification result belongs to one of cells with a high nucleus-to-cytoplasm ratio whose morphology meets the HSIL standard, abnormal keratinocytes in HSIL or SCC, abnormal naked nuclei in HSIL or SCC, koilocytes (hollowed cells), non-hollowed cells whose morphology meets the LSIL standard, abnormal glandular cells, and abnormal metaplastic cells, the sample is judged cancer-positive; if the classification result belongs to one of normal naked nuclei, normal intermediate-layer cells, pathogenic microbial cells, normal parabasal-layer cells, normal endocervical cells, and typical parakeratosis, the sample is judged cancer-negative.

Claims (6)

1. A pathological cell classification method based on a multi-attention fusion mechanism and a high-precision segmentation network is characterized by comprising the following steps:
step 1, collecting and preprocessing a pathological cell medical image;
step 2, constructing a multi-attention fusion module, which comprises constructing a compression excitation module so that a network model can learn the relationship among different channels; constructing a spatial attention module to help the network model identify key feature regions in the cell image;
step 3, constructing a high-precision segmentation network, and generating an interested area on the pathological cell medical image through the high-precision segmentation network;
step 4, building a deep network model based on the multi-attention fusion and high-precision segmentation network, the deep network model comprising the multi-attention fusion module and the high-precision segmentation network module, and training the deep network model with the preprocessed pathological cell medical images to obtain the deep network model with the highest pathological cell classification accuracy;
step 5, sending the pathological cell medical image to be detected into the deep network model optimized in step 4 to obtain a classification result of the pathological cell sample.
2. The method for classifying pathological cells based on the multi-attention fusion mechanism and the high-precision segmentation network as claimed in claim 1, wherein the step 1 process comprises:
step 1.1, collecting microscope images of Pap smears prepared from pathological cells, the pathological cells in each pathological cell medical image being classified by a specialist into the following categories: cells with a high nuclear-to-cytoplasmic ratio whose morphology meets the HSIL standard; abnormal keratinized cells in HSIL or SCC; abnormal naked nuclei in HSIL or SCC; koilocytes; non-koilocytic cells whose morphology meets the LSIL standard; abnormal glandular cells; abnormal metaplastic cells; normal naked nuclei; normal intermediate cells; pathogenic microorganism cells; normal parabasal cells; typical squamous metaplasia; typical keratinization;
step 1.2, performing image augmentation on the collected pathological cell medical images, including random-angle rotation, flipping, cropping, translation, scaling, and the like, to enrich the pathological cell medical image dataset;
step 1.3, annotating each pathological cell medical image with the labelme annotation tool, the annotation comprising the classification label and the cell contour coordinates of the pathological cells in the pathological cell medical image.
3. The method for classifying pathological cells based on the multi-attention fusion mechanism and the high-precision segmentation network as claimed in claim 1, wherein the step 2 is specifically as follows:
step 2.1, constructing the compression excitation module, comprising:
compression: performing global average pooling on the pathological cell medical image, compressing it from H × W × C to 1 × 1 × C and aggregating cross-spatial features, wherein H denotes the height of the pathological cell medical image, W its width, and C its number of channels; the compression operation is computed as:
z_c = F_sq(u_c) = (1/(H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j) (1)
wherein u_c(i, j) denotes the pixel value of channel c of the pathological cell medical image at position (i, j);
excitation: learning the weight of each channel of the pathological cell medical image, wherein the weight comprises two fully-connected layers and finally outputs a 1 × 1 × C vector; the calculation formula of the excitation operation is:
S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z)) (2)
wherein σ denotes the sigmoid activation function and δ denotes the ReLU function; W_1 is the weight parameter of the first fully-connected layer and W_2 is the weight parameter of the second fully-connected layer;
scale: multiplying the weight of each channel learned in the excitation operation by the original features, and the calculation formula of Scale operation is:
u′_c(i, j) = S × u_c(i, j) (3)
wherein u_c denotes the pathological cell medical image input to the Scale operation and u′_c denotes the pathological cell medical image output by the Scale operation;
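The compression, excitation, and Scale operations above can be sketched with NumPy as follows; the fully-connected weight shapes and the reduction ratio are illustrative assumptions, and in the patented network the weights are learned during training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(u, w1, w2):
    """Compression-excitation over a feature map u of shape (H, W, C).
    w1: (C, C//r) and w2: (C//r, C) are the two fully-connected weights."""
    # Compression: global average pooling, H x W x C -> C
    z = u.mean(axis=(0, 1))
    # Excitation: two fully-connected layers, ReLU then sigmoid -> weights S
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)
    # Scale: reweight each channel of the original features
    return u * s

rng = np.random.default_rng(0)
u = rng.random((4, 4, 8))
out = squeeze_excite(u, rng.random((8, 2)), rng.random((2, 8)))
```

Because the learned weight S is per-channel, every spatial position of a given channel is scaled by the same factor.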
step 2.2, constructing the spatial attention module: first, the channel features of the pathological cell medical image are aggregated by global average pooling; the global average pooling along the channel direction is given by:
x̄(i, j) = (1/C) Σ_{c=1}^{C} x_c(i, j) (4)
wherein x_c(i, j) denotes the feature of channel c at point (i, j) on the pathological cell medical image;
the weight G of the attention map is then learned with a 1 × 1 convolution, which computes the formula:
G = F_Conv = M x(i, j) (5)
where M is a weight matrix learned by 1 × 1 convolution;
and finally, carrying out Scale operation, wherein the calculation formula is as follows:
u′_c(i, j) = G × u_c(i, j) (6).
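The pooling, 1 × 1 convolution, and Scale steps of the spatial attention module admit a similar NumPy sketch; treating the 1 × 1 convolution on the single pooled channel as multiplication by a scalar weight `m` is an illustrative assumption (in the patented network M is learned):

```python
import numpy as np

def spatial_attention(x, m):
    """Spatial attention over a feature map x of shape (H, W, C).
    m stands in for the learned 1x1-convolution weight M."""
    # Aggregate channels: global average pooling along the channel axis
    pooled = x.mean(axis=2)            # shape (H, W)
    # 1x1 convolution on the pooled map gives the attention weights G
    g = m * pooled
    # Scale: reweight every spatial position of every channel
    return x * g[:, :, None]

rng = np.random.default_rng(1)
x = rng.random((4, 4, 8))
out = spatial_attention(x, 0.5)
```

In contrast to the compression excitation module, the weight here varies per spatial position and is shared across channels.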
4. the method for classifying pathological cells based on the multi-attention fusion mechanism and the high-precision segmentation network as claimed in claim 1, wherein the step 3 is specifically as follows:
generating a plurality of rectangular anchor boxes at each pixel of the pathological cell medical image, the anchor boxes using 3 scales and 3 different aspect ratios for a total of 9 anchor boxes per pixel; performing a classification task and an offset regression task for each anchor box, wherein the anchor-box classification task uses the binary cross-entropy loss function:
L_cls(p, y) = -y log(p) - (1 - y) log(1 - p) (7)
where y is the true value and p is the predicted value;
the loss function used for anchor-box offset regression is:
L_reg(y, y′) = 0.5 (y - y′)² / δ, if |y - y′| < δ; |y - y′| - 0.5 δ, otherwise (8)
wherein y' is a predicted value and δ is a constant, the value of which can be determined as appropriate;
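The anchor-offset regression loss above can be sketched as follows, assuming the standard smooth-L1 form (quadratic below the threshold δ, linear above it):

```python
def smooth_l1(y, y_pred, delta=1.0):
    """Smooth-L1 anchor-offset regression loss: quadratic for small errors
    (stable gradients near zero), linear for large errors (robust to
    outlier anchor boxes). `delta` is the constant described above."""
    diff = abs(y - y_pred)
    if diff < delta:
        return 0.5 * diff * diff / delta
    return diff - 0.5 * delta
```

The two branches meet at `diff == delta`, so the loss and its gradient stay continuous across the threshold.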
taking the weighted sum of the loss function used by the anchor-box classification task and the loss function used for anchor-box offset regression as the overall loss function:
L({p_i}, {t_i}) = α (1/N_cls) Σ_i L_cls(p_i, p_i*) + (1/N_cls) Σ_i p_i* L_reg(t_i, t_i*) (9)
wherein i is the index of each anchor box; α is a constant, determined as appropriate, that controls the proportion of the anchor-box classification loss in the total loss; p_i* and t_i* denote true values, while p_i and t_i denote predicted values; N_cls denotes the number of anchor boxes;
calculating the position and size of the proposal box from the anchor-box offsets obtained by the regression, and then calculating the intersection-over-union between the proposal box and the ground-truth box, given by:
F_IoU = (A ∩ B) / (A ∪ B) (10)
wherein F_IoU denotes the intersection-over-union of the proposal box and the ground-truth box, A denotes the area of the ground-truth box, and B denotes the area of the proposal box;
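Equation (10) computed for two axis-aligned boxes; a minimal sketch using (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```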
taking the weighted sum of the classification score obtained by the anchor-box classification task and the intersection-over-union, computed as:
Ts = F_cls + λ F_IoU (11)
wherein Ts denotes the fusion score, F_cls denotes the classification score, and λ is a coefficient, obtained by model learning, that weighs the contribution of the intersection-over-union to the fusion score;
processing the fusion scores with a non-maximum suppression algorithm and screening out the most suitable proposal boxes.
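The fusion-score ranking of equation (11) followed by non-maximum suppression can be sketched as follows; the fixed `lam` and the suppression threshold are illustrative assumptions (in the patent λ is learned by the model):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fused_nms(boxes, cls_scores, iou_scores, lam=0.5, thresh=0.5):
    """Rank proposals by the fusion score Ts = F_cls + lam * F_IoU, then
    greedily keep the best box and suppress remaining boxes that overlap
    it by more than `thresh`."""
    ts = [c + lam * q for c, q in zip(cls_scores, iou_scores)]
    order = sorted(range(len(boxes)), key=lambda k: ts[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[best], boxes[k]) <= thresh]
    return keep

boxes = [(0, 0, 2, 2), (0, 0, 2, 2.2), (5, 5, 7, 7)]
keep = fused_nms(boxes, [0.9, 0.8, 0.7], [0.9, 0.8, 0.6])
```

Here the second box overlaps the first almost completely and is suppressed, while the distant third box survives.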
5. The method for classifying pathological cells according to claim 1, wherein the step 4 comprises:
step 4.1, adding the multi-attention fusion module to the ResNet18 network; the network model is based on ResNet18, which has 5 stages: the first stage is a 7 × 7 convolution, followed by four stages composed of residual blocks; the compression excitation module and the spatial attention module are embedded in series into the second to fourth stages of the ResNet18 network, which finally outputs the feature map extracted from the pathological cell medical image;
step 4.2, feeding the feature map into the high-precision segmentation network to obtain regions of interest, i.e., anchor boxes containing pathological cells; obtaining the features contained in each region of interest from the feature map obtained in step 4.1 and the anchor boxes obtained in the previous step, and, after pooling, feeding them to a classifier for the classification task;
step 4.3, dividing the preprocessed pathological cell medical images into a training set and a test set at a ratio of 8:2; the training set is fed into the network model built above for training, and the loss function used during training is the focal loss function, given by:
FL = -(1 - p̂)^γ log(p̂), if p = 1; -p̂^γ log(1 - p̂), if p = 0 (12)
wherein FL denotes the focal loss value, p̂ denotes the predicted value, p denotes the true value, and γ is a weighting factor whose optimal value is determined through experiments;
after training is complete, the network model is tested on the test set, and the model with the highest average accuracy is selected as the optimal pathological cell medical image classification model.
6. The method for classifying pathological cells based on a multi-attention fusion mechanism and a high-precision segmentation network as claimed in claim 1, wherein the step 5 specifically comprises:
step 5.1, feeding the pathological cell medical image to be detected into the network model optimized in step 4 to obtain the classification result of the network model for the test sample;
step 5.2, judging the classification result: if the classification result belongs to one of cells with a high nuclear-to-cytoplasmic ratio whose morphology meets the HSIL standard, abnormal keratinized cells in HSIL or SCC, abnormal naked nuclei in HSIL or SCC, koilocytes, non-koilocytic cells whose morphology meets the LSIL standard, abnormal glandular cells, and abnormal metaplastic cells, the sample is judged cancer-positive; if the classification result belongs to one of normal naked nuclei, normal intermediate cells, pathogenic microorganism cells, normal parabasal cells, normal endocervical cells, and typical parakeratosis, the sample is judged cancer-negative.
CN202211710618.8A 2022-12-29 2022-12-29 Pathological cell classification method based on multi-attention fusion and high-precision segmentation network Pending CN115810191A (en)

Publications (1)

Publication Number Publication Date
CN115810191A true CN115810191A (en) 2023-03-17




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination