CN116363060A - Mixed attention retinal vessel segmentation method based on residual U-shaped network - Google Patents
- Publication number: CN116363060A (application CN202310106849.6A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T 7/0012: Image analysis; inspection of images; biomedical image inspection
- G06T 7/10, G06T 7/187: Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06N 3/02, G06N 3/08: Neural networks; learning methods
- G06V 10/7715: Feature extraction, e.g. by transforming the feature space
- G06V 10/774: Generating sets of training patterns
- G06V 10/82: Image or video recognition using neural networks
- G06T 2207/20021: Dividing image into blocks, subimages or windows
- G06T 2207/20081: Training; learning
- G06T 2207/20084: Artificial neural networks [ANN]
- G06T 2207/30004, G06T 2207/30101: Biomedical image processing; blood vessel, artery, vein
- Y02T 10/40: Engine management systems
Abstract
The invention belongs to the technical field of medical image processing, and in particular relates to a mixed attention retinal vessel segmentation method based on a residual U-shaped network, comprising the following steps. Step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module. The invention adopts a residual U-shaped network in which each encoding and decoding layer of the original U-Net is replaced by an attention encoding or attention decoding module built from a residual module, an attention module, and an upsampling or downsampling layer, redefining a U-shaped network suited to retinal vessel segmentation; a mixed attention mechanism is used to extract more useful image information, with the aim of extracting deep image features and improving the binary segmentation accuracy of retinal images.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a mixed attention retinal vessel segmentation method based on a residual U-shaped network.
Background
A common color fundus image contains structures such as the retinal vessels, the optic cup, the optic disc and the macula. Abnormal retinal vessel morphology reflects early symptoms of various diseases of the human body, and analyzing structural features such as vessel length, width and curvature helps doctors make rapid clinical and pathological diagnoses, accurately grasp a patient's condition, and provides a powerful basis for the prevention and treatment of some diseases. The complexity of the retinal vasculature itself, particularly the extremely large number of capillary branches at the periphery of the fundus image, increases the difficulty of implementing retinal vessel segmentation. Moreover, the capillaries at the periphery of the fundus retinal image are easily affected by illumination and noise, so the collected medical images often suffer from low quality, sparse detail information, blurring and other problems, which is unfavorable for clinical medical diagnosis.
Chinese patent publication No. CN113487615A, entitled "Retina segmentation method and terminal based on residual network feature extraction", first passes the original retinal vessel image through a pretrained VGG encoding layer to obtain five feature maps; the five feature maps are then connected, decoded and attended to obtain a first output image; the original retinal vessel image is multiplied and convolved with the first output image to obtain a first intermediate image; the remaining four intermediate images are obtained through four residual encoding layers; the five intermediate images and five feature maps are then passed through an image connection and decoding layer to obtain a second output image; and the first and second output images pass through a connection layer to obtain the feature-extracted retinal vessel image. This retinal vessel segmentation method suffers from low accuracy, low segmentation speed, poor segmentation of capillary tips, and loss of image detail information.
Disclosure of Invention
(I) Technical Problems to Be Solved
To address the shortcomings of the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which solves the problems of low segmentation accuracy, poor capillary-tip segmentation, low segmentation speed and loss of image detail information in existing fundus-image retinal vessel segmentation methods.
(II) Technical Solution
To achieve the above purpose, the invention adopts the following technical solution:
a mixed attention retinal vessel segmentation method based on a residual U-shaped network comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and the two datasets are separately preprocessed and expanded;
step 3, training the network model: the fundus retina image segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method: a suitable loss function is selected to minimize the loss between the output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set threshold range, at which point the model parameters are considered pre-trained and are saved;
step 5, determining the segmentation model: the network model parameters are frozen to determine the final segmentation model; for a retinal image segmentation task, fundus retina color images can be directly input into the trained end-to-end network model to obtain the final retinal binary segmentation image.
Further, the encoder path in step 1 consists of five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature-map information is obtained after the image passes through the attention module.
Further, the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, ordinary convolution, Dropout and a P-type nonlinear activation function, with the convolution kernel size unified as n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
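A minimal PyTorch sketch of a residual module matching this description follows. The P-type activation is not defined in this text, so PReLU stands in for it; the kernel size n is taken as 3, and the 1×1 skip projection is an implementation assumption used to match channel counts.

```python
import torch.nn as nn

class ConvLayer(nn.Module):
    """One convolution layer: batch normalization -> n x n convolution -> Dropout -> activation."""
    def __init__(self, in_ch, out_ch, n=3, drop=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, out_ch, kernel_size=n, padding=n // 2),
            nn.Dropout2d(drop),
            nn.PReLU(),  # stand-in for the undefined "P-type" nonlinear activation
        )

    def forward(self, x):
        return self.block(x)

class ResidualModule(nn.Module):
    """Two convolution layers whose output is fused with a skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = ConvLayer(in_ch, out_ch)
        self.conv2 = ConvLayer(out_ch, out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # channel-matching skip path

    def forward(self, x):
        return self.conv2(self.conv1(x)) + self.skip(x)
```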
Further, the decoder path in step 1 consists of five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the convolution kernels of the residual modules and the transposed convolution modules are unified as n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image; the input image generates rich multi-channel feature information after passing through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
Further, in step 4 the loss function for the whole U-shaped network training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, and minimizing the loss function dynamically adjusts the binary segmentation accuracy of the network output.
Further, in step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are used throughout the U-shaped network training process as indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and dynamically guides the optimization training of the network.
(III) Beneficial Effects
Compared with the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which has the following beneficial effects:
the invention adopts a residual U-shaped network, replaces each layer of coding and decoding layer of the original U-net network by an attention coding module and an attention decoding module which are composed of a residual module, an attention module, an up-sampling layer and a down-sampling layer, redefines a U-shaped network suitable for retina blood vessel segmentation, utilizes a mixed attention mechanism to extract more useful image information, and aims to extract deep layer characteristics of images and improve binary segmentation precision of retina images.
The invention introduces a new attention module between each pair of same-level layers of the attention encoding path and the attention decoding path, which extracts useful information from the same-level encoding layer and splices it into the corresponding decoding layer; compared with the existing approach of adding an attention module only at the junction of the encoding and decoding paths, this improves the accuracy value by 1.1013, further strengthens the network's extraction of image feature information, and avoids the loss of detail information.
The invention introduces a new attention encoding module into the network encoding path, placed after the residual layer of each layer, and a new attention decoding module into the network decoding path, placed at the transposed convolution layer of each layer, forming a network symmetric with the encoding stage; compared with existing methods, the number of vessel segments segmented per image increases by approximately 20%, further improving the network's ability to extract the semantics of the peripheral capillaries of the retinal vessels.
Drawings
FIG. 1 is a flow chart of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 2 is a network structure diagram of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 3 is a structural diagram of the residual module of the present invention;
FIG. 4 is a schematic diagram of the specific composition of each layer in the residual module of the present invention;
FIG. 5 is an overall structural diagram of the attention module of the present invention;
FIG. 6 is a schematic diagram of the specific composition of the multi-head attention module of the encoding path of the present invention;
FIG. 7 is a schematic diagram of the specific composition of the multi-head attention module of the decoding path of the present invention;
FIG. 8 is a schematic diagram of the specific composition of the multi-head attention module of the splicing path of the present invention;
FIG. 9 is a comparison chart of evaluation indexes of the prior art and the method proposed by the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings; the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Example 1
The embodiment of the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, whose flow is shown in FIG. 1; the method specifically comprises the following steps:
Step 1, constructing the network model. The whole residual U-shaped network consists of an encoder and a decoder, and a new attention module is introduced between the splicing paths of the encoder and decoder. The encoder path comprises five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules. The first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature information is obtained after the image passes through the attention module. The attention encoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function. After the attention encoding path, the feature maps are rich in number, and they pass through the attention decoding path to obtain the final binary segmentation image. The attention decoding path comprises five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block. The residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function. The convolution kernel sizes of the residual modules and transposed convolution modules are unified as n×n. The last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image. The input image generates rich multi-channel feature information after passing through the attention encoding path, and the final high-precision segmentation image is obtained after the attention decoding path.
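A sketch of the attention module composition just described (batch normalization, multi-head attention fused with the input, a second batch normalization, a multi-layer perceptron, and a T-type activation fused with the intermediate features) follows. torch.nn.MultiheadAttention and Tanh are stand-ins for the patent's convolutional multi-head attention and undefined T-type function, and the MLP expansion ratio of 2 is an assumption; the channel count must be divisible by the number of heads.

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """Batch norm -> multi-head attention (+input fusion) -> batch norm -> MLP -> T-type activation (+fusion)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.bn2 = nn.BatchNorm2d(channels)
        self.mlp = nn.Sequential(nn.Linear(channels, channels * 2),
                                 nn.GELU(),
                                 nn.Linear(channels * 2, channels))
        self.act = nn.Tanh()  # stand-in for the undefined "T-type" function

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.bn1(x).flatten(2).transpose(1, 2)              # (B, H*W, C) token sequence
        attn_out, _ = self.attn(seq, seq, seq)                    # global information extraction
        fused = attn_out.transpose(1, 2).reshape(b, c, h, w) + x  # fuse with the input
        seq2 = self.bn2(fused).flatten(2).transpose(1, 2)
        out = self.act(self.mlp(seq2)).transpose(1, 2).reshape(b, c, h, w)
        return out + fused                                        # identity-style fusion
```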
Step 2, preparing the dataset. The pre-training dataset uses the 40 images of the fundus retina DRIVE color dataset and the 28 images of the fundus retina CHASE_DB1 color dataset; image enhancement is performed on the images of both datasets to improve contrast, cropping, scaling and rotation are applied to each dataset, and the preprocessed images are input into the training network.
Step 3, training the network model. The fundus retina segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights, with which retinal images are then segmented to obtain segmentation results.
Step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method. A suitable loss function is selected to minimize the loss between the binary segmentation image output by the network and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set range, at which point the model parameters are considered pre-trained and are saved. Evaluation indexes suited to retinal image segmentation are selected to measure the segmentation accuracy and performance of the model. The loss function for the whole U-shaped network training constructs a binary cross-entropy loss from the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network; binary cross entropy is selected as the most effective loss function specifically for fundus retinal vessel segmentation, since it can accurately estimate and adjust the difference between the output binary image and the manually annotated binary image, yielding higher model output accuracy. The training process uses Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) as the indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network.
Step 5, determining the segmentation model. The network model parameters are frozen to determine the final segmentation model; for retinal image segmentation, fundus color images can be directly input into the trained end-to-end network model to obtain the final binary retinal segmentation image.
Example 2:
The residual U-shaped network model structure in step 1 is shown in FIG. 2. The whole residual U-shaped network adopts an encoder-decoder structure: the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module.
The encoder path consists of five encoders; the numbers of feature-map channels of encoders one to five are 16, 32, 64, 128 and 256, respectively. The first four encoders each comprise a residual module, an attention encoding module and a pooling downsampling module; the fifth encoder comprises only a residual module and an attention encoding module, with no further pooling downsampling, because experiments show that after four downsampling operations the image is already very small and further downsampling would lose too much image information. The specific composition of the residual module is shown in FIG. 3 and the composition of each convolution layer in FIG. 4; the mini-batch size is 8, the convolution kernel size is 3×3, the stride is 1 and the dropout rate is 0.5. The overall composition of the attention module is shown in FIG. 5: the input data are first normalized using batch normalization; multi-head attention then extracts global image information and fuses it with the input data, improving the information extraction rate; after a second batch normalization, a multi-layer perceptron integrates the multi-channel information; finally, the output is activated by a T-type activation function and fused with the intermediate fusion features, ensuring an identity-like mapping between input and output and yielding a feature image with richer information. The multi-head attention structure of the attention encoding module is shown in FIG. 6: the output image of the residual module is first reduced in dimension by a 1×1 separable convolution to generate the $Q_o$, $K_o$, $V_o$ feature matrices; the three feature matrices are downsampled to reduce their parameters; probability values are computed by softmax regression and multiplied with the $V$ feature matrix; the feature image is then restored in dimension by a 1×1 convolution and finally point-multiplied with the original input image to obtain the output image after the attention mechanism.
The attention formula of the encoding path can be designed as:

$$\mathrm{Attention}_e = I_o \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_o K_o^{T}}{\sqrt{d}}\right) V_o\right)$$

wherein Conv(·) represents a convolution operation, $I_o$ represents the input original image, ⊙ represents point multiplication, and d represents the number of channels corresponding to each head.
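A sketch of this encoder attention: 1×1 convolutions generate $Q_o$, $K_o$, $V_o$, the three matrices are downsampled to reduce parameters, softmax probabilities weight V, a 1×1 convolution restores the channel dimension, and the result is point-multiplied with the input. Plain 1×1 convolutions, average pooling and the channel-reduction ratio are assumptions standing in for the separable convolutions and unspecified downsampling of the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class EncoderAttention(nn.Module):
    """Conv-based attention of the encoding module (FIG. 6)."""
    def __init__(self, channels, reduced=None, pool=2):
        super().__init__()
        reduced = reduced or channels // 2           # 1x1 dimension-reduction ratio (assumed)
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, reduced, 1)
        self.pool = nn.AvgPool2d(pool)               # downsample Q, K, V to cut parameters
        self.proj = nn.Conv2d(reduced, channels, 1)  # 1x1 dimension recovery
        self.scale = reduced ** -0.5

    def forward(self, x):
        b, _, h, w = x.shape
        qp = self.pool(self.q(x))
        hp, wp = qp.shape[-2:]
        q = qp.flatten(2).transpose(1, 2)                     # (B, N, C')
        k = self.pool(self.k(x)).flatten(2)                   # (B, C', N)
        v = self.pool(self.v(x)).flatten(2).transpose(1, 2)   # (B, N, C')
        attn = F.softmax(q @ k * self.scale, dim=-1)          # probability values via softmax
        out = (attn @ v).transpose(1, 2).reshape(b, -1, hp, wp)
        out = F.interpolate(self.proj(out), size=(h, w), mode='bilinear', align_corners=False)
        return out * x                                        # point-multiply with the original input
```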
The decoder path consists of five decoders; the numbers of feature-map channels of decoders one to five are 16, 32, 64, 128 and 256, respectively. The first four decoders each comprise a residual module, an attention decoding module, a transposed convolution module and a 1×1 classification convolution layer with n channels, used to reduce the dimension of the output feature map and output an n-class feature map; in the present invention n is set to 2, indicating that the outputs are the retinal vessel foreground and the background. The fifth decoder comprises a residual module and an attention decoding module. The residual module is the same as that in the encoding path. The multi-head attention structure of the attention decoding module is shown in FIG. 7: the multi-head attention input of the decoder comes from the front-layer decoded image and the same-level encoded spliced image, so the front-layer decoded image first passes through a transposed convolution block to ensure a consistent image size, producing the $K_d$ and $V_d$ feature matrices; for the same-level encoded spliced image, a separable 1×1 convolution block reduces its dimension to generate the $Q_e$ matrix; the subsequent feature processing is identical to that of the attention encoding module.
The attention formula of the decoding path can be designed as:

$$\mathrm{Attention}_d = I_d \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_e K_d^{T}}{\sqrt{d}}\right) V_d\right)$$

wherein Conv(·) represents a convolution operation, $I_d$ represents the front-layer decoded input image, $I_e$ represents the same-level encoded spliced image (from which $Q_e$ is generated), and d represents the number of channels corresponding to each head.
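Analogously, a sketch of the decoder attention: $K_d$ and $V_d$ come from the transposed-convolution output of the previous decoder layer, $Q_e$ from a 1×1 convolution of the same-level encoder splice, and the remaining processing mirrors the encoder module. The 2×2 stride-2 transposed convolution and the channel sizes are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderAttention(nn.Module):
    """Cross attention of the decoding module (FIG. 7)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        # bring the previous (deeper) decoder output up to the skip's resolution
        self.up = nn.ConvTranspose2d(channels * 2, channels, kernel_size=2, stride=2)
        self.q = nn.Conv2d(channels, reduced, 1)  # Q_e from the same-level encoded splice
        self.k = nn.Conv2d(channels, reduced, 1)  # K_d from the upsampled decoder features
        self.v = nn.Conv2d(channels, reduced, 1)  # V_d from the upsampled decoder features
        self.proj = nn.Conv2d(reduced, channels, 1)
        self.scale = reduced ** -0.5

    def forward(self, decoded, skip):
        d = self.up(decoded)                             # ensure a consistent image size
        b, _, h, w = d.shape
        q = self.q(skip).flatten(2).transpose(1, 2)      # (B, N, C')
        k = self.k(d).flatten(2)                         # (B, C', N)
        v = self.v(d).flatten(2).transpose(1, 2)         # (B, N, C')
        attn = F.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.proj(out) * d                        # remaining steps mirror the encoder module
```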
As shown in FIG. 8, the multi-head attention module of the splicing path must match the image resolution and channel number in advance for the splicing operation. Because the attention mechanism constructs a global feature-information matrix, spatial position information is ignored to a certain extent, and introducing a spatial information matrix allows the splicing operation to be completed better. The attention module of the splicing path adds a spatial information matrix to the structure of the encoding-path attention module; the other structures are the same. The spatial information matrix consists of a channel position matrix and a pixel position matrix, solving the problem of loss of image spatial position information during splicing.
The attention formula of the splicing path can be designed as:

$$\mathrm{Attention}_s = I_o \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_o K_o^{T} + S_T}{\sqrt{d}}\right) V_o\right)$$

wherein Conv(·) represents a convolution operation, $I_o$ represents the input original image, $S_T$ represents the spatial feature matrix, and d represents the number of channels corresponding to each head.
The spatial feature matrix may be defined as:

$$S_T = P_T + L_T$$

wherein $P_T$ represents the channel position matrix and $L_T$ represents the pixel position matrix.
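One way to realize $S_T$ is as learnable bias matrices added to the attention logits before the softmax, consistent with the splicing-path formula above; the parameterization of $P_T$ and $L_T$ below is an assumption, since the text does not define them further.

```python
import torch
import torch.nn as nn

class SpatialMatrix(nn.Module):
    """Learnable spatial information matrix S_T = P_T + L_T for the splicing-path attention."""
    def __init__(self, n_tokens):
        super().__init__()
        self.p_t = nn.Parameter(torch.zeros(1, n_tokens, n_tokens))  # channel position matrix P_T
        self.l_t = nn.Parameter(torch.zeros(1, n_tokens, n_tokens))  # pixel position matrix L_T

    def forward(self, logits):
        # logits: (B, N, N) attention scores before the softmax
        return logits + self.p_t + self.l_t
```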
Considering the robustness of the final model, so that the segmentation model can be successfully applied to multiple datasets and its generalization ability improved, a P-type nonlinear activation function is used in each convolution layer of the residual module and a T-type nonlinear activation function is used in the overall attention module; the P-type and T-type functions are defined as follows:
the fundus retina dataset in step 2 uses a fundus retina DRIVE color dataset and a fundus retina chase_db1 color dataset, DRIVE being a diabetic retinopathy screening program from the netherlands, comprising 40 images obtained using a CR5 non-mydriatic 3CCD camera, a depression angle of 45 degrees, each image resolution of 584 x 565 pixels; 20 of the 40 images are used for network training and 20 images are used for network testing; chaSE_DB1 contained 28 color retinal images of 999 x 960 pixels in size taken from the left and right eyes of 14 children; and (3) performing image enhancement operation on the images in the two sets of data sets to improve the image contrast, performing data random clipping, scaling and rotation operation augmentation treatment on the two sets of data sets respectively, finally, using the images with the size of 256 multiplied by 256 resolution of each original image as input, expanding each original image into 242 images through clipping and rotation operation, wherein the total of DRIVE data sets is 9680 images, and the total of CHASE_DB1 is 6776 images.
The loss function in step 4 is designed to measure the similarity between the network's predictions and the labels; the better the chosen loss function, the better the network's performance. The loss function in the training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network.
Binary cross entropy is an effective means specifically for fundus retinal vessel segmentation: it can accurately estimate and adjust the difference between the output binary image and the manually annotated binary image, so that the model output accuracy is higher. The binary cross-entropy loss is defined as:

$$L = \sum_{i} w_{out}^{(i)}\, l_{out}^{(i)}$$

wherein $w_{out}^{(i)}$ represents the weight of the i-th output loss term and $l_{out}^{(i)}$ represents the corresponding output loss.

For each term i, the standard binary cross entropy is used to calculate the loss:

$$l_{out} = -\sum_{(r,c)}^{(H,W)} \left[ P_{G(r,c)} \log P_{S(r,c)} + \left(1 - P_{G(r,c)}\right) \log\left(1 - P_{S(r,c)}\right) \right]$$

where (r, c) represents the pixel coordinates and (H, W) the height and width of the image; $P_{G(r,c)}$ represents the probability that a pixel, mapped through the Sigmoid function, is output as a vessel pixel, and $P_{S(r,c)}$ the probability that it is output as a non-vessel pixel. Training seeks to minimize the binary cross-entropy loss.
The Sigmoid function is defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
the binary cross entropy loss is used in a segmentation network of retinal vascular images by checking each pixel one by one and comparing the class prediction vector with a target vector coded by a hot spot, thereby being beneficial to segmenting the binary images with higher precision.
In step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are selected as the indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network. Sensitivity, specificity and accuracy are defined as follows:

$$SE = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + FP + TN + FN}$$

wherein TP represents the number of correctly segmented foreground pixels, FP the number of background pixels incorrectly segmented as foreground, TN the number of correctly segmented background pixels, and FN the number of foreground pixels incorrectly segmented as background; TP+FN+TN+FP is the total number of pixels in the image, TP+FN is the actual number of foreground pixels, and TN+FP is the actual number of background pixels.
The area under the ROC curve is expressed by AUC, i.e. the proportion of the area under the curve to the total number of pixels; because the ROC curves of different segmentation algorithms sometimes cross, the AUC value is often used as the criterion for judging algorithm quality, and the larger the area, the better the classification performance.
The area under the ROC curve can be defined as:

$$AUC = \frac{S_p}{S_t}$$

wherein $S_p$ represents the number of pixels in the area under the curve and $S_t$ represents the total number of pixels of the whole image area.
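The four evaluation indexes could be computed as follows; scikit-learn's ROC-AUC is used here as a stand-in for the pixel-area-ratio definition of AUC given in the text.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(pred, prob, label):
    """SE, SP, ACC from the confusion counts defined above, plus AUC."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)    # correctly segmented foreground pixels
    tn = np.sum(~pred & ~label)  # correctly segmented background pixels
    fp = np.sum(pred & ~label)   # background wrongly segmented as foreground
    fn = np.sum(~pred & label)   # foreground wrongly segmented as background
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    auc = roc_auc_score(label.ravel(), prob.ravel())
    return se, sp, acc, auc
```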
For network model optimization, the number of training epochs is set to 200, the Adam optimizer is used with a learning rate of 0.001, the learning rate is multiplied by a decay factor of 0.1 every 10 training epochs, and the loss threshold is set to 0.0002; training iterates continuously, and when the training loss approaches the loss threshold the network is considered essentially trained.
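A sketch of this optimization schedule, assuming the binary cross-entropy criterion above and an arbitrary checkpoint name: Adam at 0.001, learning rate multiplied by 0.1 every 10 epochs, at most 200 epochs, and early stopping at the 0.0002 loss threshold.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

def train(model, loader, epochs=200, lr=0.001, loss_threshold=0.0002):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    optimizer = Adam(model.parameters(), lr=lr)
    scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs
    criterion = torch.nn.BCEWithLogitsLoss()
    for epoch in range(epochs):
        total = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels.float())
            loss.backward()
            optimizer.step()
            total += loss.item()
        scheduler.step()
        if total / len(loader) <= loss_threshold:  # training loss reached the threshold
            break
    torch.save(model.state_dict(), 'segmentation_model.pth')
```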
After network training is completed in step 5, all trained parameters in the network are frozen and the final segmentation model is determined; for a retinal vessel image segmentation task, fundus color images can be directly input into the trained end-to-end network model to obtain the final binary segmentation image.
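A sketch of inference with the frozen model; the 0.5 binarization threshold and the weight-file name are assumptions.

```python
import torch

def segment(model, fundus_image, weights='segmentation_model.pth'):
    """Load trained weights, freeze the model, and output a binary vessel map."""
    model.load_state_dict(torch.load(weights))
    model.eval()                       # solidify the trained parameters
    with torch.no_grad():
        prob = torch.sigmoid(model(fundus_image.unsqueeze(0)))
    return (prob > 0.5).squeeze(0)     # binary retinal segmentation image
```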
The implementations of convolution, activation functions, splicing operations, batch normalization, the multi-layer perceptron and the like are algorithms well known to those skilled in the art; the specific procedures and methods can be found in corresponding textbooks or technical literature.
The mixed attention retinal vessel segmentation method based on a residual U-shaped network designed in the present invention is applied to fundus retinal vessel image segmentation tasks and realizes end-to-end network input and output; it well addresses the long-standing reliance on complex manual segmentation in clinical image segmentation tasks, making retinal image segmentation simple and its realization more efficient. Under the same conditions, the feasibility and superiority of the method are further verified by computing the relevant indexes of the binary images obtained by existing methods.
A comparison of the evaluation indexes of the prior art and the proposed method is shown in FIG. 9. As can be seen from the figure, the proposed method achieves higher sensitivity, specificity and accuracy and a larger area under the ROC curve than the prior art, and in the test stage the average segmentation time per image is only 1.03 seconds; these indexes further illustrate that the proposed method has better segmentation quality and achieves the desired effect.
Finally, it should be noted that the above description is only a preferred embodiment of the present invention and does not limit it; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (6)
1. A mixed attention retinal vessel segmentation method based on a residual U-shaped network, characterized in that the method comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and the two datasets are separately preprocessed and expanded;
step 3, training the network model: the fundus retina image segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method: a suitable loss function is selected to minimize the loss between the output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set threshold range, at which point the model parameters are considered pre-trained and are saved;
step 5, determining the segmentation model: the network model parameters are frozen to determine the final segmentation model; for a retinal image segmentation task, fundus retina color images can be directly input into the trained end-to-end network model to obtain the final retinal binary segmentation image.
2. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the encoder path in step 1 consists of five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature-map information is obtained after the image passes through the attention module.
3. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, ordinary convolution, Dropout and a P-type nonlinear activation function, with the convolution kernel size unified as n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
4. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the decoder path in step 1 consists of five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the convolution kernels of the residual modules and the transposed convolution modules are unified as n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image; the input image generates rich multi-channel feature information after passing through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
5. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: in step 4, the loss function for the whole U-shaped network training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, and minimizing the loss function dynamically adjusts the segmentation accuracy of the binary image output by the network.
6. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: in step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are used in the whole U-shaped network training process as indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and dynamically guides the optimization training of the network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310106849.6A CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310106849.6A CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116363060A true CN116363060A (en) | 2023-06-30 |
CN116363060B CN116363060B (en) | 2024-08-16 |
Family
ID=86905899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310106849.6A Active CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116363060B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843685A (en) * | 2023-08-31 | 2023-10-03 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN117274256A (en) * | 2023-11-21 | 2023-12-22 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117409100A (en) * | 2023-12-15 | 2024-01-16 | 山东师范大学 | CBCT image artifact correction system and method based on convolutional neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114240949A (en) * | 2021-11-18 | 2022-03-25 | 上海浦东发展银行股份有限公司 | Cervical cell segmentation network training method, cervical cell segmentation method and cervical cell segmentation device |
CN114283158A (en) * | 2021-12-08 | 2022-04-05 | 重庆邮电大学 | Retinal blood vessel image segmentation method and device and computer equipment |
CN114359292A (en) * | 2021-12-10 | 2022-04-15 | 南昌大学 | Medical image segmentation method based on multi-scale and attention |
CN114881962A (en) * | 2022-04-28 | 2022-08-09 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
WO2022199143A1 (en) * | 2021-03-26 | 2022-09-29 | 南京邮电大学 | Medical image segmentation method based on u-shaped network |
- 2023-02-14: CN application CN202310106849.6A filed; granted as CN116363060B (status: active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022199143A1 (en) * | 2021-03-26 | 2022-09-29 | 南京邮电大学 | Medical image segmentation method based on u-shaped network |
CN114240949A (en) * | 2021-11-18 | 2022-03-25 | 上海浦东发展银行股份有限公司 | Cervical cell segmentation network training method, cervical cell segmentation method and cervical cell segmentation device |
CN114283158A (en) * | 2021-12-08 | 2022-04-05 | 重庆邮电大学 | Retinal blood vessel image segmentation method and device and computer equipment |
CN114359292A (en) * | 2021-12-10 | 2022-04-15 | 南昌大学 | Medical image segmentation method based on multi-scale and attention |
CN114881962A (en) * | 2022-04-28 | 2022-08-09 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
Non-Patent Citations (3)
Title |
---|
- XU Hongwei et al.: "Automatic segmentation of cystic kidney in CT images based on a residual dual-attention U-Net model", Application Research of Computers (计算机应用研究), vol. 37, no. 07, 31 July 2020, pages 2237-2240 *
- LIANG Liming et al.: "Dual U-shaped retinal segmentation algorithm with multi-scale feature fusion", Journal of Optoelectronics·Laser (光电子·激光), vol. 33, no. 3, 31 March 2022, pages 272-282 *
- HU Yangtao et al.: "Dilated residual U-shaped network for retinal vessel segmentation", Computer Engineering and Applications (计算机工程与应用), vol. 57, no. 7, 1 April 2021, pages 185-191 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843685A (en) * | 2023-08-31 | 2023-10-03 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN116843685B (en) * | 2023-08-31 | 2023-12-12 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN117274256A (en) * | 2023-11-21 | 2023-12-22 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117274256B (en) * | 2023-11-21 | 2024-02-06 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117409100A (en) * | 2023-12-15 | 2024-01-16 | 山东师范大学 | CBCT image artifact correction system and method based on convolutional neural network |
Also Published As
Publication number | Publication date |
---|---|
CN116363060B (en) | 2024-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116363060B (en) | Mixed attention retinal vessel segmentation method based on residual U-shaped network | |
CN111784671B (en) | Pathological image focus region detection method based on multi-scale deep learning | |
CN111754520B (en) | Deep learning-based cerebral hematoma segmentation method and system | |
CN109685813A (en) | A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information | |
CN109448006A (en) | A kind of U-shaped intensive connection Segmentation Method of Retinal Blood Vessels of attention mechanism | |
CN113205538A (en) | Blood vessel image segmentation method and device based on CRDNet | |
CN112258488A (en) | Medical image focus segmentation method | |
CN113689954B (en) | Hypertension risk prediction method, device, equipment and medium | |
CN113205524B (en) | Blood vessel image segmentation method, device and equipment based on U-Net | |
CN112001928A (en) | Retinal vessel segmentation method and system | |
CN113012163A (en) | Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network | |
CN112884788B (en) | Cup optic disk segmentation method and imaging method based on rich context network | |
CN116071292A (en) | Ophthalmoscope retina image blood vessel identification method based on contrast generation learning | |
CN113160226A (en) | Two-way guide network-based classification segmentation method and system for AMD lesion OCT image | |
CN115294075A (en) | OCTA image retinal vessel segmentation method based on attention mechanism | |
CN115908241A (en) | Retinal vessel segmentation method based on fusion of UNet and Transformer | |
CN115471470A (en) | Esophageal cancer CT image segmentation method | |
CN117876242B (en) | Fundus image enhancement method, fundus image enhancement device, fundus image enhancement apparatus, and fundus image enhancement program | |
CN117934824A (en) | Target region segmentation method and system for ultrasonic image and electronic equipment | |
CN116228785A (en) | Pneumonia CT image segmentation method based on improved Unet network | |
CN117611824A (en) | Digital retina image segmentation method based on improved UNET | |
CN114972365A (en) | OCT image choroid segmentation model construction method combined with prior mask and application thereof | |
CN114820632A (en) | Retinal vessel image segmentation method based on two-channel U-shaped improved Transformer network | |
Yang et al. | AMF-NET: Attention-aware multi-scale fusion network for retinal vessel segmentation | |
CN117522893A (en) | Fundus blood vessel segmentation method based on level set segmentation region prototype correction |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |