CN115965602A - Abnormal cell detection method based on improved YOLOv7 and Swin-Unet - Google Patents
- Publication number
- CN115965602A (application number CN202211726362.XA)
- Authority
- CN
- China
- Prior art keywords
- cell
- swin
- yolov7
- model
- improved
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses an abnormal cell detection method based on improved YOLOv7 and Swin-Unet, comprising the following steps: collecting pathological cell smear images and making an abnormal cell detection data set and a segmentation data set; constructing and training an improved YOLOv7 model; constructing a detection result screening module to classify the cells in the images output by the detection network; building and training a Swin-Unet model for segmenting overlapped cell cluster images, in which a Swin-Transformer module is introduced into the Unet model to sample the local and global relations of the cell image at multiple scales; and performing abnormal cell detection with the improved YOLOv7 and Swin-Unet models. The invention makes full use of context information during cell detection, effectively processes hard-to-detect cell clusters, and can greatly improve accuracy and recall while maintaining detection speed.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an abnormal cell detection method based on improved YOLOv7 and Swin-Unet.
Background
Pathological cytology examination takes cytology specimens, such as exfoliated cells in sputum or liquid-based cells, prepares pathological slides through smearing and sectioning techniques, and then diagnoses disease by observing cell types and morphology under a microscope. Screening for diseases such as breast cancer and cervical cancer mostly relies on pathological cytology examination; cervical cancer in particular is the most common gynecological malignant tumor worldwide and seriously threatens women's lives. A 2016 World Health Organization report noted that more than 500,000 new cervical cancer cases occur globally every year, with China accounting for about 28 percent of the cases in developing countries. Early treatment of the cancer is effective, inexpensive, and straightforward, but the disease shows no obvious early symptoms and is not easy to find. Cytology (including the traditional Pap smear) is the main screening means for common female cancers such as cervical cancer in China, yet the overall screening level is not high, mainly because experienced domestic cytopathologists and auxiliary personnel are scarce. Computer-aided detection of pathological cells is therefore highly necessary and valuable.
Detection methods in the prior art are mainly based on deep learning and include target-detection-based methods and instance-segmentation-based methods. Chinese patent application CN202111048528.2, "An abnormal cell detection method based on an attention-guiding mechanism", uses the advanced target detection network RetinaNet to screen suspicious cells and then classifies them with a Mean-Teacher network equipped with an attention-guiding mechanism. That method effectively suppresses false positives in the detection process and improves detection precision, but it performs poorly as sample noise and the number of overlapped cell clusters increase. The main shortcomings are: (1) overlapping abnormal cells are difficult to detect; (2) when the sample contains non-cell units such as tissue fluid, detection precision drops significantly; (3) the introduced attention mechanism does not sufficiently combine multi-scale information, so detection performance still needs improvement.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an abnormal cell detection method based on improved YOLOv7 and Swin-Unet. It detects hard-to-find overlapped abnormal cells in stages, exploiting the good large-target performance of target detection and the high precision of instance segmentation to handle difficult cell samples effectively and prevent missed and false detections. The detection head adopts the state-of-the-art dynamic head attention mechanism, which fully fuses attention across the scale, spatial, and task dimensions and can greatly improve detection precision.
In order to solve the technical problems, the invention adopts the following technical scheme.
An abnormal cell detection method based on improved YOLOv7 and Swin-Unet, comprising the following steps:
step 1, collecting a cell smear image in pathological cytology examination, and making an abnormal cell detection data set and an abnormal cell segmentation data set;
step 2, constructing an improved YOLOv7 model and training it, for detecting abnormal cells and overlapped cell clusters;
step 3, building a detection result screening module, classifying the cells in the images output by the detection network, outputting abnormal cell images directly, and passing overlapped cell cluster images on as input to the segmentation model;
step 4, building a Swin-Unet model and training, wherein the Swin-Unet model is used for segmenting the overlapped cell mass images: based on a most commonly used Unet model in the medical field, a Swin-Transformer module is introduced to sample the local relation and the global relation of the cell image under multiple scales; the Swin-Unet model structure comprises an Encoder Encoder, a Neck network Neck and a Decoder Decoder;
and step 5, carrying out abnormal cell detection by using the improved YOLOv7 and Swin-Unet models.
The step 1 process is as follows:
1-1, collecting cell smear images in pathological cytology examination, including cervical cell images and mammary gland cell images, performing sliding window cutting on original cell smear images, wherein the cutting size is 640 multiplied by 640, the overlapping range of the sliding window is 50%, obtaining cell images of small areas, performing rectangular frame labeling on independent abnormal cells and overlapping cell groups by using a LabelImg tool, storing labels as XML files, and making an abnormal cell detection data set for training an improved YOLOv7 model;
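The sliding-window cutting in step 1-1 (640 × 640 tiles with 50% overlap) can be sketched as follows; this is a minimal NumPy sketch, and the function name is illustrative rather than taken from the patent:

```python
import numpy as np

def sliding_window_crops(image, size=640, overlap=0.5):
    """Cut an H x W x C smear image into size x size tiles with the given overlap."""
    stride = int(size * (1 - overlap))  # 50% overlap -> stride of 320 pixels
    h, w = image.shape[:2]
    crops = []
    for y in range(0, max(h - size, 0) + 1, stride):
        for x in range(0, max(w - size, 0) + 1, stride):
            crops.append(image[y:y + size, x:x + size])
    return crops

# A 1280 x 1280 image yields a 3 x 3 grid of 640 x 640 tiles at stride 320.
tiles = sliding_window_crops(np.zeros((1280, 1280, 3), dtype=np.uint8))
```

In practice the border tiles of an image whose size is not a multiple of the stride would also need padding, which this sketch omits.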
and 1-2, screening cell images with labels of overlapped cell groups in the abnormal cell detection data set, subdividing a segmentation area by utilizing a polygonal labeling function of a LabelImg tool, labeling the area, labeling the abnormal cells, storing the labels, and preparing the abnormal cell segmentation data set for training a Swin-Unet model.
Specifically, in step 2, the building of the improved YOLOv7 model includes:
2-1, constructing an abnormal cell detection data preprocessing module, comprising: performing flipping and translation data enhancement on the cell image; and carrying out noise reduction on the cell image with Gaussian filtering, wherein the Gaussian kernel function is:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²)) (1)
wherein G(x, y) gives the weighting used to compute the pixel values of the denoised cell image, x and y are the coordinates of the pixel point, and σ is the standard deviation of the Gaussian, which determines the degree of smoothing of the cell image;
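The Gaussian denoising of step 2-1 can be sketched directly from equation (1); this is a plain NumPy sketch (real pipelines would typically call an optimized routine such as OpenCV's GaussianBlur), and the function names are illustrative:

```python
import numpy as np

def gaussian_kernel(ksize=5, sigma=1.0):
    """Build a ksize x ksize kernel from G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    ax = np.arange(ksize) - ksize // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return g / g.sum()  # renormalize so the smoothed image keeps its overall brightness

def gaussian_filter(image, ksize=5, sigma=1.0):
    """Denoise a single-channel image by convolving it with the Gaussian kernel (zero padding)."""
    k = gaussian_kernel(ksize, sigma)
    pad = ksize // 2
    padded = np.pad(image.astype(float), pad)
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + ksize, x:x + ksize] * k)
    return out
```

A larger σ flattens the kernel and smooths the image more strongly, matching the role of σ described above.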
2-2, constructing the backbone network of the improved YOLOv7 model: the feature map of the input cell image is first convolved by 4 CBS modules, where a CBS module consists of a Conv layer, a BN layer, and a SiLU layer; stacked ELAN and MP modules then output three feature maps. The ELAN module contains several CBS modules; its input and output feature sizes stay unchanged, the number of channels changes in the first two CBS modules, the subsequent input channels all stay consistent with the output channels, and the last CBS module produces the required number of output channels. The MP module splices the output vectors of a Maxpool branch and a CBS branch;
2-3, building the neck network of the improved YOLOv7 model: fusing the three feature maps output by the backbone network with a PAFPN structure;
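The feature fusion in the neck can be illustrated with the top-down half of an FPN-style pass; this is a minimal NumPy sketch (a real PAFPN adds a further bottom-up pass and learned convolutions), and the shapes and function names are illustrative assumptions:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a C x H x W feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(p3, p4, p5):
    """Top-down pass only: upsample the coarser map and concatenate its channels
    with the finer one, so each scale carries context from above."""
    m4 = np.concatenate([p4, upsample2x(p5)], axis=0)
    m3 = np.concatenate([p3, upsample2x(m4)], axis=0)
    return m3, m4

# Three backbone outputs at strides 8 / 16 / 32 for a 640 x 640 input:
p3 = np.zeros((64, 80, 80)); p4 = np.zeros((128, 40, 40)); p5 = np.zeros((256, 20, 20))
m3, m4 = top_down_fuse(p3, p4, p5)  # m4: (384, 40, 40), m3: (448, 80, 80)
```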
2-4, building the head network of the improved YOLOv7 model: a dynamic head (Dyhead) module is introduced to perform attention fusion on the feature maps. The dynamic head module comprises scale-aware attention, spatial-aware attention, and task-aware attention, fitted together by stacking attention functions. The formula for applying self-attention is:
W(F) = π_C(π_S(π_L(F) · F) · F) · F (2)
wherein F ∈ R^(L×S×C) is the input feature tensor, L is the number of feature levels (scales), S = H × W is the reshaping of the height H and width W dimensions of the feature map, and C is the number of channels; π_L(·), π_S(·), and π_C(·) are the attention functions applied independently on the scale, spatial, and task dimensions, corresponding to formulas (3), (4), and (5):
π_L(F) · F = σ(f((1 / (S·C)) Σ_{S,C} F)) · F (3)
π_S(F) · F = (1 / L) Σ_{l=1}^{L} Σ_{k=1}^{K} w_{l,k} · F(l; p_k + Δp_k; c) · Δm_k (4)
π_C(F) · F = max(α¹(F) · F_c + β¹(F), α²(F) · F_c + β²(F)) (5)
In formula (3), f(·) is a linear function approximated by a 1 × 1 convolution and σ(·) is the Hard-Sigmoid activation function.
In formula (4), K is the number of sparse sampling locations, w_{l,k} is the weighting factor for level l and location k, p_k + Δp_k is a spatially offset position obtained by self-learning, and Δm_k is a self-learned scalar at position p_k.
In formula (5), [α¹, α², β¹, β²]^T = θ(·) is a hyper-function that learns to control the activation threshold, and F_c is the feature slice at the c-th channel.
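The task-aware attention of formula (5) reduces to a per-channel max of two learned affine responses. A minimal NumPy sketch follows; in the actual Dyhead module the α and β coefficients come from a small hyper-network over pooled features, whereas here they are passed in as fixed vectors for illustration:

```python
import numpy as np

def task_aware_attention(F, alpha1, beta1, alpha2, beta2):
    """Formula (5): max of two affine channel responses, applied elementwise.
    F has shape (L, S, C); each alpha/beta is a per-channel vector of length C."""
    return np.maximum(alpha1 * F + beta1, alpha2 * F + beta2)

F = np.array([[[1.0, -1.0]]])  # L = S = 1, C = 2
out = task_aware_attention(F,
                           np.array([1.0, 1.0]), np.array([0.0, 0.0]),
                           np.array([0.0, 0.0]), np.array([0.0, 0.0]))
# With alpha1 = 1, beta1 = 0, alpha2 = beta2 = 0 this reduces to ReLU: [[1., 0.]]
```

This shows why formula (5) generalizes common activations: different learned coefficients recover ReLU, linear, or channel-gating behaviour.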
Specifically, in step 2, the process of training the improved YOLOv7 model is as follows:
Cell images from the abnormal cell detection data set, of size 640 × 640, are input with the batch size set to 16; 180 epochs are trained to obtain the best-performing improved YOLOv7 model, overlapped cell cluster images in the detection results are extracted, and different IoU (intersection-over-union) thresholds are set. Here @x denotes the performance with the IoU threshold set to x; mAP is the mean of the AP computed for each category, with higher values indicating better detection; Recall, also called the recall ratio, is higher when fewer labeled cells are missed.
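The IoU threshold used throughout the evaluation is the standard intersection-over-union of a predicted box and a ground-truth box; a minimal sketch:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10 x 10 boxes offset by 5 in x: intersection 50, union 150 -> IoU = 1/3
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A detection counts as a true positive at threshold x only when its IoU with a ground-truth box is at least x, which is what the @x notation above refers to.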
Specifically, the process of building the Swin-Unet model in step 4 includes:
4-1, constructing the encoder part of the Swin-Unet model: the minimum unit of the cell image is first converted from a pixel into a 4 × 4 patch by the Patch Partition layer, and a linear embedding module maintains the structure of the high-dimensional space; a down-sampling process follows, whose initial part still uses the convolution layers of Unet, while the two subsequent down-sampling stages are replaced by paired Swin-Transformer blocks, whose basic formula is:
Attention(Q, K, V) = SoftMax(Q·K^T / √d + B)·V (6)
wherein Q, K, and V are the query, key, and value matrices in self-attention respectively, d is the dimensionality, B is the learnable relative position bias, and Attention(Q, K, V) is the attention function within each patch window;
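The windowed attention of formula (6) can be sketched for a single window; this is a minimal NumPy sketch with a zero bias B for illustration (in Swin-Transformer, B is looked up from a learned relative-position table):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(Q, K, V, B):
    """Formula (6): Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d) + B) V,
    computed among the tokens of one local window; B is the relative position bias."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B
    return softmax(scores, axis=-1) @ V

n, d = 4, 8  # 4 tokens (a 2 x 2 window), 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = window_attention(Q, K, V, B=np.zeros((n, n)))  # shape (4, 8)
```

Restricting attention to windows, and shifting the windows between successive blocks, is what lets Swin-Transformer mix the local and global relations mentioned above at linear cost.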
4-2, building a neck layer of a Swin-Unet model: filtering the high-dimensional characteristic information after down-sampling by using a group of paired Swin-transformers;
4-3, constructing the decoder part of the Swin-Unet model: a network structure mirroring the encoder network. It first applies paired Swin-Transformer + patch expanding twice and applies plain up-sampling + 2× convolution to the last layer of features; in each up-sampling module the feature map is spliced with that of the corresponding encoder stage to form a residual block, the output features undergo one linear projection, and they are sent into a convolution network for classification to obtain the final output result.
Specifically, the training process of the Swin-Unet model in the step 4 is as follows:
The overlapped cell cluster images are taken as input, the batch size is set to 64, 80 epochs are trained, and different IoU (intersection-over-union, here the overlap threshold between the predicted Mask and the Ground-Truth Mask in the segmentation task) thresholds are set.
In step 5, the process of abnormal cell detection using the improved YOLOv7 and Swin-Unet models comprises the following steps:
5-1, obtaining a cell smear image in the pathological cytology examination, performing sliding window cutting on the cell smear image, and inputting the cut cell image into a detection network in sequence;
5-2, extracting the features of the cell image by using the improved backbone network of the YOLOv7 network, sending feature maps of different scales into a neck network, fusing the features, sending the feature maps into a detection head network, and outputting a detection result;
5-3, screening the detection results with the detection result screening module: a result judged to be an abnormal cell is output directly, while one judged to be an overlapped cell cluster is sent into the segmentation network;
and 5-4, the encoder of the Swin-Unet network down-samples the overlapped cell image, the neck network filters it, and it enters the decoder network for up-sampling with the residual structure; the output feature map enters a convolution layer to classify the image segmentation regions, the regions judged to be abnormal cells are output, and together with the abnormal cells from step 3 they form the final output result.
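The staged inference of step 5 can be summarized as a routing loop; this is an illustrative control-flow sketch only, where `detector`, `screener`, and `segmenter` are stand-ins for the trained improved YOLOv7, screening, and Swin-Unet models described above:

```python
def detect_abnormal_cells(tiles, detector, screener, segmenter):
    """Run detection on each tile, keep single abnormal cells directly,
    and route overlapped clusters to the segmentation model (steps 5-1 to 5-4)."""
    results = []
    for tile in tiles:
        for box in detector(tile):
            label = screener(tile, box)
            if label == "abnormal":
                results.append(("cell", box))
            elif label == "cluster":
                results.extend(("cell", region) for region in segmenter(tile, box))
    return results

# Toy stand-ins showing only the control flow:
tiles = ["t1"]
detector = lambda t: ["b1", "b2"]
screener = lambda t, b: "abnormal" if b == "b1" else "cluster"
segmenter = lambda t, b: ["r1", "r2"]
out = detect_abnormal_cells(tiles, detector, screener, segmenter)
# -> [("cell", "b1"), ("cell", "r1"), ("cell", "r2")]
```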
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts a staged method of target detection plus instance segmentation: the high precision of the segmentation model is fully used to process hard-to-detect overlapped cell clusters, while the high performance of the original detection model and the easy annotation of its data are retained for the easily detected single abnormal cells. This effectively solves the problem of cell clusters that are hard to detect and prone to false detection without losing real-time detection performance, improves the recall and accuracy of detection, achieves a good balance between hardware cost and precision, and better meets practical requirements.
2. The invention introduces the state-of-the-art dynamic head module (Dyhead), which fits scale-aware, spatial-aware, and task-aware attention simultaneously through stacked attention functions, so that cell context correlations across multiple dimensions are fully considered in the detection process of the target detection network, matching the real diagnostic process of a pathologist; the model becomes more robust and detection accuracy is improved.
3. The method introduces the Swin-Transformer attention module, which combines the local and global attention of the image through a shifted-window mechanism, effectively enhancing the segmentation performance of the segmentation network, improving the model's recognition accuracy on complex cell clusters, and raising the overall detection precision.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of an improved YOLOv7 model according to an embodiment of the present invention.
FIG. 3 is a diagram of a dynamic head module according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of the Swin-Unet model according to an embodiment of the present invention.
Figure 5 is a Swin-Transformer block diagram according to one embodiment of the present invention.
FIG. 6 is a diagram of an overall algorithm implementation process according to an embodiment of the present invention.
Detailed Description
The invention relates to an abnormal cell detection method based on improved YOLOv7 and Swin-Unet, which comprises the following steps: collecting cell smear images from cytopathology examination and making an abnormal cell detection data set and an abnormal cell segmentation data set; building an improved YOLOv7 model, whose robustness is enhanced by adding a dynamic attention head, to detect independent abnormal cells and overlapped cell clusters; performing abnormal cell detection with the best-trained model and screening the detection results; and inputting the overlapped cell clusters from the detection results into a Swin-Unet model for segmentation detection. The invention fully considers context information during cell detection, effectively processes hard-to-detect cell clusters, and effectively improves accuracy and recall while maintaining detection speed.
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method according to an embodiment of the present invention. As shown in fig. 1, the method of this embodiment includes the following steps:
step 1, collecting a cell smear image in pathological cytology examination, and making an abnormal cell detection data set and an abnormal cell segmentation data set;
1-1, collecting cell smear images in pathological cytology examination, including cervical cell images and mammary gland cell images, performing sliding window cutting on original cell smear images, wherein the cutting size is 640 multiplied by 640, the overlapping range of the sliding window is 50%, obtaining cell images of small areas, performing rectangular frame labeling on independent abnormal cells and overlapping cell groups by using a LabelImg tool, storing labels as XML files, and making an abnormal cell detection data set for training an improved YOLOv7 model;
1-2, screening cell images with labels of overlapped cell groups in the abnormal cell detection data set, subdividing a segmentation area by utilizing a polygon labeling function of a LabelImg tool, labeling the area, labeling abnormal cells, storing the labels, and preparing an abnormal cell segmentation data set for training a Swin-Unet model;
And step 2, constructing an improved YOLOv7 model and training it for detecting abnormal cells and overlapped cell clusters. The invention builds on the latest YOLOv7 model and improves the extraction of multi-dimensional feature information; the improved YOLOv7 model structure is shown in FIG. 2, and its overall structure comprises a data preprocessing module (Process), a backbone network (Backbone), a neck network (Neck), and a detection head network (Head). Each network is constructed as follows:
2-1, constructing an abnormal cell detection data preprocessing module, comprising: performing flipping and translation data enhancement on the cell image; and carrying out noise reduction on the cell image with Gaussian filtering, wherein the Gaussian kernel function is:
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²)) (1)
wherein G(x, y) gives the weighting used to compute the pixel values of the denoised cell image, x and y are the coordinates of the pixel point, and σ is the standard deviation of the Gaussian, which determines the degree of smoothing of the cell image;
2-2, constructing the backbone network of the improved YOLOv7 model: the feature map of the input cell image is first convolved by 4 CBS modules, where a CBS module consists of a Conv layer, a BN layer, and a SiLU layer; stacked ELAN and MP modules then output three feature maps. The ELAN module contains several CBS modules; its input and output feature sizes stay unchanged, the number of channels changes in the first two CBS modules, the subsequent input channels all stay consistent with the output channels, and the last CBS module produces the required number of output channels. The MP module splices the output vectors of a Maxpool branch and a CBS branch;
2-3, building the neck network of the improved YOLOv7 model: fusing the three feature maps output by the backbone network with a PAFPN (Path Aggregation Network with Feature Pyramid Network) structure;
2-4, building the head network of the improved YOLOv7 model: a dynamic head (Dyhead) module is introduced to perform attention fusion on the feature maps. The dynamic head module comprises scale-aware attention, spatial-aware attention, and task-aware attention, fitted together by stacking attention functions. The formula for applying self-attention is:
W(F) = π_C(π_S(π_L(F) · F) · F) · F (2)
wherein F ∈ R^(L×S×C) is the input feature tensor, L is the number of feature levels (scales), S = H × W is the reshaping of the height (H) and width (W) dimensions of the feature map, and C is the number of channels; π_L(·), π_S(·), and π_C(·) are the attention functions applied independently on the scale, spatial, and task dimensions, corresponding to formulas (3), (4), and (5):
π_L(F) · F = σ(f((1 / (S·C)) Σ_{S,C} F)) · F (3)
π_S(F) · F = (1 / L) Σ_{l=1}^{L} Σ_{k=1}^{K} w_{l,k} · F(l; p_k + Δp_k; c) · Δm_k (4)
π_C(F) · F = max(α¹(F) · F_c + β¹(F), α²(F) · F_c + β²(F)) (5)
In formula (3), f(·) is a linear function approximated by a 1 × 1 convolution and σ(·) is the Hard-Sigmoid activation function.
In formula (4), K is the number of sparse sampling locations, w_{l,k} is the weighting factor for level l and location k, p_k + Δp_k is a spatially offset position obtained by self-learning, and Δm_k is a self-learned scalar at position p_k.
In formula (5), [α¹, α², β¹, β²]^T = θ(·) is a hyper-function that learns to control the activation threshold, and F_c is the feature slice at the c-th channel;
2-5, training the improved YOLOv7 model: the abnormal cell detection data set is input, images are cut to a size of 640 × 640, the batch size is set to 16, and 180 epochs are trained to obtain the best-performing improved YOLOv7 model; overlapped cell cluster samples in the detection results are extracted, and different IoU (intersection-over-union, here the overlap between the predicted box and the Ground-Truth box in the target detection task) thresholds are set. The training results are shown in Table 1.
TABLE 1
Here @x denotes the result with the IoU threshold set to x; mAP is the average of the AP (Average Precision) computed for each category, with higher values indicating better detection; Recall, also called the recall rate, is higher when fewer labeled cells are missed.
Step 3, building a detection result screening module, classifying cells in the detection network output image, outputting the abnormal cell image, and inputting the overlapped cell cluster image as a segmentation model;
and 4, building a Swin-Unet model and training the Swin-Unet model for segmenting the overlapped cell mass images. The invention is based on the most commonly used Unet model in the medical field, aiming at the local relation and the global relation of the cell image under the multi-scale, a Swin-Transformer module is introduced for sampling, the structure of the Swin-Unet model is shown as figure 4, the overall structure of the Swin-Unet model comprises an Encoder (Encoder), a Neck network (Neck) and a Decoder (Decoder), and the construction of each network is as follows:
4-1, constructing the encoder part of the Swin-Unet model: the minimum unit of the cell image is first converted from a pixel into a 4 × 4 patch by the Patch Partition layer, and a linear embedding module maintains the structure of the high-dimensional space; a down-sampling process follows, whose initial part still uses the convolution layers of Unet, while the two subsequent down-sampling stages are replaced by paired Swin-Transformer blocks, whose structure is shown in FIG. 5 and whose basic formula is given in formula (6):
Attention(Q, K, V) = SoftMax(Q·K^T / √d + B)·V (6)
wherein Q, K, and V are the query, key, and value matrices in self-attention respectively, d is the dimensionality, B is the learnable relative position bias, and Attention(Q, K, V) is the attention function within each patch window;
4-2, building a neck layer of a Swin-Unet model: filtering the high-dimensional characteristic information after down-sampling by using a group of paired Swin-transformers;
4-3, constructing the decoder part of the Swin-Unet model: a network structure mirroring the encoder network. It first applies paired Swin-Transformer + patch expanding twice and applies plain up-sampling + 2× convolution to the last layer of features; in each up-sampling module the feature map is spliced with that of the corresponding encoder stage to form a residual block, the output features undergo one linear projection, and they are sent into a convolution network for classification to obtain the final output result.
4-4, training the Swin-Unet model: the overlapped cell clusters are input as the segmentation data set, with the batch size set to 64; 80 epochs are trained and different IoU (intersection-over-union, here the overlap between the predicted Mask and the Ground-Truth Mask in the instance segmentation task) thresholds are set. The training results are shown in Table 2.
TABLE 2
The indices are the same as in Table 1.
Step 5, detecting abnormal cells using the improved YOLOv7 and Swin-Unet models; the algorithm flow is shown in FIG. 6, and the detailed process is as follows:
5-1, obtaining a cell smear image in the pathological cytology examination, performing sliding window cutting on the cell smear image, and inputting the cut cell image into a detection network in sequence;
5-2, extracting the features of the cell image by using the improved backbone network of the YOLOv7 network, sending feature maps of different scales into a neck network, fusing the features, sending the feature maps into a detection head network, and outputting a detection result;
5-3, screening the detection result by using a detection result screening module, judging that abnormal cells are directly used as output, and sending the abnormal cells into a segmentation network if the abnormal cells are judged to be overlapped cell clusters;
and 5-4, the encoder of the Swin-Unet network down-samples the overlapped cell image, the neck network filters it, and it enters the decoder network for up-sampling with the residual structure; the output feature map enters a convolution layer to classify the image segmentation regions, the regions judged to be abnormal cells are output, and together with the abnormal cells from step 3 they form the final output result.
Claims (6)
1. An abnormal cell detection method based on improved YOLOv7 and Swin-Unet, which is characterized by comprising the following steps:
step 1, collecting a cell smear image in pathological cytology examination, and making an abnormal cell detection data set and an abnormal cell segmentation data set;
step 2, constructing an improved YOLOv7 model and training it for detecting abnormal cells and overlapped cell clusters; on the basis of the latest YOLOv7 model, the extraction of multi-dimensional feature information is improved to obtain the improved YOLOv7 model structure, which comprises an abnormal cell detection data preprocessing module Process, a backbone network Backbone, a neck network Neck and a detection head network Head;
step 3, building a detection result screening module, classifying the cells in the detection network output images, outputting the abnormal cell images, and feeding the overlapped cell cluster images as input to the segmentation model;
step 4, constructing and training a Swin-Unet model for segmenting the overlapped cell cluster images: based on the Unet model most commonly used in the medical field, a Swin-Transformer module is introduced for sampling, so as to capture both the local and global relations of the cell image at multiple scales; the Swin-Unet model structure comprises an Encoder, a neck network Neck and a Decoder;
step 5, carrying out abnormal cell detection by using improved YOLOv7 and Swin-Unet model;
the step 1 process is as follows:
1-1, collecting cell smear images in pathological cytology examination, including cervical cell images and mammary gland cell images; performing sliding-window cutting on the original cell smear images, with a cutting size of 640 × 640 and a sliding-window overlap of 50%, to obtain small-area cell images; labeling independent abnormal cells and overlapped cell clusters with rectangular boxes using the LabelImg tool; saving the labels as XML files; and making the abnormal cell detection data set for training the improved YOLOv7 model;
1-2, screening out the cell images labeled as overlapped cell clusters in the abnormal cell detection data set, subdividing the segmentation regions using the polygon labeling function of the LabelImg tool, labeling the regions and the abnormal cells, saving the labels, and making the abnormal cell segmentation data set for training the Swin-Unet model.
2. The method for detecting abnormal cells based on improved YOLOv7 and Swin-Unet as claimed in claim 1, wherein in step 2, the construction of the improved YOLOv7 model comprises:
2-1, constructing an abnormal cell detection data preprocessing module, which comprises: performing flip and translation data enhancement on the cell images; and performing noise reduction on the cell images with Gaussian filtering, wherein the Gaussian kernel function is:

G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))   (1)

wherein G(x, y) is the pixel value of the denoised cell image, x and y denote the coordinates of a pixel point, and σ is the standard deviation of the Gaussian, which determines the degree of smoothing of the cell image;
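As a sketch of the Gaussian kernel used for noise reduction in step 2-1, the following builds a discrete kernel from the formula above and normalizes it so the weights sum to 1 (the normalization and the 5 × 5 default size are standard smoothing conventions assumed here, not stated in the patent).

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Discrete Gaussian kernel from
    G(x, y) = 1/(2*pi*sigma^2) * exp(-(x^2 + y^2) / (2*sigma^2)),
    normalized so the weights sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()
```

Convolving the cell image with this kernel smooths pixel noise; a larger σ spreads the weights and smooths more strongly.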
2-2, constructing the backbone network of the improved YOLOv7 model: the input cell image is first convolved by 4 CBS modules, where each CBS module consists of a Conv layer, a BN layer and a SiLU activation; then ELAN and MP modules are stacked to output three feature maps; the ELAN module comprises several CBS modules, its input and output feature sizes remain unchanged, the number of channels is changed in the first two CBS modules, the input channels of the subsequent modules are kept consistent with their output channels, and the last CBS module produces the required number of output channels; the MP module concatenates the output vectors of a Maxpool branch and a CBS branch;
2-3, constructing the neck network of the improved YOLOv7 model: the three feature maps output by the backbone network are fused using a PAFPN structure;
2-4, constructing the head network of the improved YOLOv7 model: a dynamic head (DyHead) module is introduced to perform attention fusion on the feature maps; the dynamic head module stacks three attention functions: scale-aware attention, spatial-aware attention and task-aware attention; the formula for applying self-attention is:
W(F) = π_C(π_S(π_L(F) · F) · F) · F   (2)
wherein F ∈ R^{L×S×C} is the input feature tensor, L denotes the number of feature scales, S = H × W is the reshaping of the height H and width W dimensions of the feature map, and C denotes the number of channels of the feature map; π_L(·), π_S(·) and π_C(·) are the attention functions applied independently on the scale, space and task dimensions respectively, corresponding to formulas (3), (4) and (5):
π_L(F) · F = σ(f((1/(S·C)) Σ_{S,C} F)) · F   (3)

π_S(F) · F = (1/L) Σ_{l=1}^{L} Σ_{k=1}^{K} ω_{l,k} · F(l; p_k + Δp_k; c) · Δm_k   (4)

π_C(F) · F = max(α¹(F) · F_C + β¹(F), α²(F) · F_C + β²(F))   (5)
wherein in formula (3), f(·) is a linear function approximated by a 1 × 1 convolution, and σ(·) is the Hard-Sigmoid activation function;
in formula (4), K is the number of sparse sampling locations, ω_{l,k} is the weighting factor for level l and location k, p_k + Δp_k is the sampling position shifted by the self-learned spatial offset Δp_k, and Δm_k is a self-learned scalar at position p_k;
in formula (5), [α¹, α², β¹, β²]^T = θ(·) is a hyper-function that learns to control the activation thresholds, and F_C denotes the feature slice at the C-th channel.
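The task-aware attention of formula (5) can be sketched in a few lines of numpy; treating θ as fixed per-channel vectors (α¹, α², β¹, β²) is an assumption for the sketch only, since in the dynamic head these parameters are produced by a small hyper-network over F.

```python
import numpy as np

def task_aware_attention(F: np.ndarray, theta) -> np.ndarray:
    """Formula (5): pi_C(F)·F = max(a1·F_C + b1, a2·F_C + b2),
    a learnable piecewise-linear activation applied per channel.
    F: (L, S, C) feature tensor; theta = (a1, a2, b1, b2), here
    fixed per-channel vectors of length C (an assumption for this
    sketch; DyHead derives them from F via a hyper-function)."""
    a1, a2, b1, b2 = theta
    return np.maximum(a1 * F + b1, a2 * F + b2)  # broadcast over (L, S)
```

With a1 = 1, b1 = 0 and a2 = b2 = 0 this reduces to a ReLU per channel, showing how the hyper-function can interpolate between activation shapes.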
3. The method for detecting abnormal cells based on improved YOLOv7 and Swin-Unet as claimed in claim 1, wherein in step 2, the process of training the improved YOLOv7 model is:
inputting the 640 × 640 cell images of the abnormal cell detection data set, setting the batch-size to 16, and training 180 epochs to obtain the best-performing improved YOLOv7 model; extracting the overlapped cell cluster images from the detection results and setting different IoU (Intersection over Union) thresholds; wherein @x denotes the performance when the IoU threshold is set to x; mAP is the mean of the AP computed for each category, a higher value indicating a better detection effect; and Recall is the recall ratio, a higher value indicating fewer missed labeled cells.
4. The improved YOLOv7 and Swin-Unet based abnormal cell detection method as claimed in claim 1, wherein the process of constructing the Swin-Unet model in step 4 comprises:
4-1, constructing the encoder part of the Swin-Unet model: the minimum unit of the cell image is first converted from a pixel into a 4 × 4 Patch by the Patch Partition layer, and a linear embedding module maintains the structure of the high-dimensional space; then the down-sampling process is carried out: the initial stage still uses the convolution layers of Unet, and the following two down-sampling stages are replaced by paired Swin-Transformer blocks, whose basic formula is:

Attention(Q, K, V) = SoftMax(Q·K^T / √d + B) · V
wherein Q, K and V denote the query, key and value matrices in self-attention respectively, d denotes the dimensionality, B is the learnable relative position bias, and Attention(Q, K, V) denotes the attention function within each patch window;
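The windowed attention just defined can be sketched directly in numpy; passing B in as a fixed array (rather than a learnable parameter indexed by relative positions) is an assumption made to keep the sketch self-contained.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray,
                     B: np.ndarray) -> np.ndarray:
    """Attention(Q, K, V) = SoftMax(Q·K^T / sqrt(d) + B)·V computed
    inside one window; B is the relative position bias, here passed
    in as a fixed (n, n) array for the sketch."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B
    return softmax(scores, axis=-1) @ V
```

With all-zero Q, K and B the attention weights become uniform and each output row is simply the mean of the value rows, which is a quick sanity check on the implementation.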
4-2, constructing the neck layer of the Swin-Unet model: the down-sampled high-dimensional feature information is filtered using a group of paired Swin-Transformer blocks;
4-3, constructing the decoder part of the Swin-Unet model: the decoder mirrors the encoder network; it first applies a paired Swin-Transformer block followed by a Patch Expanding (up-sampling) layer, twice; the last-layer features are up-sampled and passed through two convolutions; each up-sampling module concatenates the feature map of the corresponding encoder stage to form a residual block; the output features then undergo one linear projection and are sent into a convolutional network for classification to obtain the final output result.
5. The improved YOLOv7 and Swin-Unet based abnormal cell detection method as claimed in claim 1, wherein the Swin-Unet model training process in step 4 is:
with the overlapped cell cluster images as input, setting the batch-size to 64, training 80 epochs, and setting different IoU (Intersection over Union, the overlap ratio between the predicted Mask and the Ground Truth Mask in the segmentation task) thresholds.
6. The method for detecting abnormal cells based on improved YOLOv7 and Swin-Unet as claimed in claim 1, wherein in step 5, the process of abnormal cell detection using the improved YOLOv7 and Swin-Unet models comprises:
5-1, obtaining a cell smear image in the pathological cytology examination, performing sliding window cutting on the cell smear image, and inputting the cut cell image into a detection network in sequence;
5-2, extracting the features of the cell image by using a backbone network of the improved YOLOv7 network, sending feature maps with different scales into a neck network, sending the feature maps into a detection head network after feature fusion, and outputting a detection result;
5-3, screening the detection results with the detection result screening module: cells judged to be abnormal are output directly, while regions judged to be overlapped cell clusters are sent into the segmentation network;
5-4, the encoder of the Swin-Unet network down-samples the overlapped cell images, the neck network filters them, and the decoder network up-samples them using the residual structure; the output feature map enters a convolution layer to classify the segmented image regions, the regions judged to be abnormal cells are output, and these, together with the abnormal cells output in step 5-3, form the final output result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211726362.XA CN115965602A (en) | 2022-12-29 | 2022-12-29 | Abnormal cell detection method based on improved YOLOv7 and Swin-Unet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211726362.XA CN115965602A (en) | 2022-12-29 | 2022-12-29 | Abnormal cell detection method based on improved YOLOv7 and Swin-Unet |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115965602A true CN115965602A (en) | 2023-04-14 |
Family
ID=87363239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211726362.XA Pending CN115965602A (en) | 2022-12-29 | 2022-12-29 | Abnormal cell detection method based on improved YOLOv7 and Swin-Unet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115965602A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116452574A (en) * | 2023-04-28 | 2023-07-18 | 合肥工业大学 | Gap detection method, system and storage medium based on improved YOLOv7 |
CN116630294A (en) * | 2023-06-08 | 2023-08-22 | 南方医科大学南方医院 | Whole blood sample detection method and device based on deep learning and storage medium |
CN116630294B (en) * | 2023-06-08 | 2023-12-05 | 南方医科大学南方医院 | Whole blood sample detection method and device based on deep learning and storage medium |
CN116844161A (en) * | 2023-09-04 | 2023-10-03 | 深圳市大数据研究院 | Cell detection classification method and system based on grouping prompt learning |
CN116844161B (en) * | 2023-09-04 | 2024-03-05 | 深圳市大数据研究院 | Cell detection classification method and system based on grouping prompt learning |
CN117314898A (en) * | 2023-11-28 | 2023-12-29 | 中南大学 | Multistage train rail edge part detection method |
CN117314898B (en) * | 2023-11-28 | 2024-03-01 | 中南大学 | Multistage train rail edge part detection method |
CN117935236A (en) * | 2024-01-23 | 2024-04-26 | 山东大学 | Dark and weak celestial body searching method based on convolutional neural network |
CN117935236B (en) * | 2024-01-23 | 2024-07-30 | 山东大学 | Dark and weak celestial body searching method based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||