CN116363060A - Mixed attention retinal vessel segmentation method based on residual U-shaped network - Google Patents

Mixed attention retinal vessel segmentation method based on residual U-shaped network

Info

Publication number
CN116363060A
Authority
CN
China
Prior art keywords
module
attention
residual
image
segmentation
Prior art date
Legal status
Pending
Application number
CN202310106849.6A
Other languages
Chinese (zh)
Inventor
詹伟达
郭金鑫
于永吉
李鑫
李国宁
韩登
Current Assignee
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changchun University of Science and Technology
Priority to CN202310106849.6A
Publication of CN116363060A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30101Blood vessel; Artery; Vein; Vascular
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of medical image processing, and in particular relates to a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which comprises the following steps: step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module. The invention adopts a residual U-shaped network in which each encoding and decoding layer of the original U-Net is replaced by an attention encoding module or attention decoding module built from a residual module, an attention module and an upsampling or downsampling layer, redefining a U-shaped network suited to retinal vessel segmentation; the mixed attention mechanism extracts more useful image information, with the aim of capturing deep image features and improving the binary segmentation accuracy of retinal images.

Description

Mixed attention retinal vessel segmentation method based on residual U-shaped network
Technical Field
The invention relates to the technical field of medical image processing, in particular to a mixed attention retinal vessel segmentation method based on a residual U-shaped network.
Background
A typical color fundus image contains structures such as retinal blood vessels, the optic cup, the optic disc and the macula. Abnormal retinal vessel morphology reflects early symptoms of various diseases of the human body, and analyzing structural features such as vessel length, width and curvature helps doctors make rapid clinical pathological diagnoses, accurately grasp a patient's condition, and provides a powerful basis for the prevention and treatment of some diseases. The complexity of the retinal vasculature itself, particularly the extremely large number of capillary branches at the periphery of the fundus image, makes retinal vessel segmentation difficult to implement. Moreover, the capillaries at the periphery of the fundus retinal image are easily affected by illumination and noise during acquisition, so the collected medical images suffer from low quality, sparse detail, blurring and other defects, which hinders clinical medical diagnosis.
Chinese patent publication No. CN113487615A, entitled "Retina segmentation method and terminal based on residual network feature extraction", first passes the original retinal vessel image through a pretrained VGG encoding layer to obtain five feature images; the five feature images are then connected, decoded and attention-weighted to obtain a first output image; the original retinal vessel image is multiplied with the first output image and convolved to obtain a first intermediate image; the remaining four intermediate images are obtained through four residual encoding layers; the five intermediate images and five feature images are passed through an image connection and decoding layer to obtain a second output image; and the first and second output images pass through a connection layer to yield the feature-extracted retinal vessel image. This retinal vessel segmentation method suffers from low accuracy, slow segmentation speed, poor segmentation of capillary tips, and loss of image detail information.
Disclosure of Invention
(I) Technical Problems to Be Solved
Aiming at the defects of the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which solves the problems of low segmentation accuracy, poor capillary-tip segmentation, slow segmentation speed and loss of image detail information in existing fundus retinal vessel segmentation methods.
(II) Technical Scheme
To achieve the above purpose, the invention adopts the following technical scheme:
a mixed attention retinal vessel segmentation method based on a residual U-shaped network comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules; the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and each dataset is preprocessed and augmented to expand the data;
step 3, training the network model: the preprocessed dataset from step 2 is input into the network model constructed in step 1 to train the fundus retinal image segmentation network and obtain the training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes of the segmentation method: a suitable loss function is selected to minimize the loss between the network output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss function falls within the threshold range, at which point the model is considered pretrained and its parameters are saved;
step 5, determining the segmentation model: the network model parameters are solidified to determine the final segmentation model; for a retinal image segmentation task, a color fundus image can be input directly into the trained end-to-end network model to obtain the final binary retinal segmentation image.
Further, the encoder path in step 1 comprises five encoders consisting of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that richer feature-map information is obtained after the attention modules.
Further, the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, an ordinary convolution, dropout and a P-type nonlinear activation function, with a unified kernel size of n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
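For illustration, a minimal PyTorch sketch of this residual module follows. The 3×3 kernel and 0.5 dropout rate are taken from Example 2 below; the exact form of the P-type activation is not disclosed, so PReLU stands in for it, and the 1×1 shortcut projection is an added assumption that keeps the skip connection dimensionally consistent.

```python
import torch.nn as nn

class ConvLayer(nn.Module):
    """One convolution layer of the residual module: batch normalization,
    ordinary convolution, dropout, then the nonlinear activation."""
    def __init__(self, in_ch, out_ch, k=3, p_drop=0.5):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
            nn.Dropout2d(p_drop),
            nn.PReLU(out_ch),   # stand-in for the undisclosed P-type activation
        )

    def forward(self, x):
        return self.body(x)

class ResidualModule(nn.Module):
    """First and second convolution layers with an identity shortcut."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = ConvLayer(in_ch, out_ch)
        self.conv2 = ConvLayer(out_ch, out_ch)
        # 1x1 projection so the shortcut matches the output channel count
        self.skip = (nn.Conv2d(in_ch, out_ch, kernel_size=1)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return self.conv2(self.conv1(x)) + self.skip(x)
```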
Further, the decoder path in step 1 comprises five decoders consisting of five residual modules, five attention modules, four transposed-convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the kernel sizes of the residual modules and the transposed-convolution modules are unified to n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels for outputting the classified image; the input image produces rich multi-channel feature information through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
Further, the loss function of the whole U-shaped network training process in step 4 constructs a binary cross-entropy loss between the network output image and the manually annotated label image, and minimizing this loss function dynamically adjusts the binary segmentation accuracy of the network output.
Further, in step 4, sensitivity (SE), specificity (SP), accuracy (ACC) and the area under the segmentation operating characteristic curve (Area Under ROC, AUC) are used as indexes for evaluating the segmentation model throughout the U-shaped network training process; the area under the segmentation operating characteristic curve effectively evaluates the proportion of the segmented retinal vessel binary image within the original whole image and dynamically guides the network optimization training.
(III) Beneficial Effects
Compared with the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which has the following beneficial effects:
the invention adopts a residual U-shaped network, replaces each layer of coding and decoding layer of the original U-net network by an attention coding module and an attention decoding module which are composed of a residual module, an attention module, an up-sampling layer and a down-sampling layer, redefines a U-shaped network suitable for retina blood vessel segmentation, utilizes a mixed attention mechanism to extract more useful image information, and aims to extract deep layer characteristics of images and improve binary segmentation precision of retina images.
The invention introduces a new attention module between each pair of same-level layers of the attention encoding path and the attention decoding path, extracting useful information from each encoding layer and splicing it into the corresponding decoding layer. Compared with the existing practice of adding an attention module only at the junction of the encoding and decoding paths, this improves the accuracy value by 1.1013, further strengthens the network's extraction of image feature information, and avoids the loss of detail information.
The invention introduces a new attention encoding module into the network encoding path, placed after the residual layer of each stage, and a new attention decoding module into the network decoding path, placed in the transposed-convolution layer of each stage, forming a network symmetric with the encoding stage. Compared with existing methods, this increases the number of vessel segments recovered per image by approximately 20% and further improves the semantic extraction of the peripheral capillaries of retinal vessels.
Drawings
FIG. 1 is a flow chart of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 2 is a network structure diagram of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 3 is a block diagram of a residual module according to the present invention;
FIG. 4 is a schematic diagram of the specific composition of each layer in the residual module according to the present invention;
FIG. 5 is an overall block diagram of an attention module according to the present invention;
FIG. 6 is a schematic diagram showing the specific components of a multi-head attention module of the encoding path according to the present invention;
FIG. 7 is a schematic diagram showing the specific components of a multi-head attention module of the decoding path according to the present invention;
FIG. 8 is a schematic diagram showing the specific components of a multi-head attention module of the splice path according to the present invention;
FIG. 9 is a comparative diagram of evaluation indexes of the prior art and the proposed method of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Example 1
An embodiment of the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, the flow of which is shown in FIG. 1; it specifically comprises the following steps:
Step 1, constructing the network model. The whole residual U-shaped network consists of an encoder and a decoder, with a new attention module introduced between the splicing paths of the encoder and decoder. The encoder path comprises five encoders, consisting of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that richer feature information is obtained after the attention modules; the attention encoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function. After the attention encoding path, the feature maps are rich in information and pass through the attention decoding path to obtain the final binary segmentation image. The attention decoding path comprises five decoders, consisting of five residual modules, five attention modules, four transposed-convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function; the kernel sizes of the residual modules and transposed-convolution modules are unified to n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels for outputting the classified image. The input image produces rich multi-channel feature information through the attention encoding path, and the final high-precision segmentation image is obtained through the attention decoding path.
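To make the overall topology concrete, the following sketch assembles the encoder-decoder skeleton under stated assumptions: the channel widths 16 to 256 are taken from Example 2 below, the attention modules are left as placeholders (their internals are sketched later), ResidualModule is the block sketched earlier, and the fifth decoder stage is folded into the bottleneck for simplicity. This is a minimal illustration, not the patent's exact implementation.

```python
import torch
import torch.nn as nn

class AttentionEncode(nn.Module):
    """Placeholder for the attention encoding module (batch normalization,
    multi-head attention, multi-layer perceptron, T-type activation)."""
    def forward(self, x):
        return x

class AttentionDecode(nn.Module):
    """Placeholder for the attention decoding module."""
    def forward(self, x):
        return x

class ResUAttentionNet(nn.Module):
    """Skeleton of the residual U-shaped network: five encoder stages with
    max-pool downsampling after the first four, a mirrored decoder with
    transposed-convolution upsampling, skip connections spliced at each
    level, and a final 1x1 two-class classification convolution."""
    def __init__(self, in_ch=3, widths=(16, 32, 64, 128, 256)):
        super().__init__()
        chs = [in_ch] + list(widths)
        # ResidualModule is the two-conv-layer block sketched earlier.
        self.enc = nn.ModuleList(
            nn.Sequential(ResidualModule(chs[i], chs[i + 1]), AttentionEncode())
            for i in range(5))
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(widths[i], widths[i - 1], 2, stride=2)
            for i in range(4, 0, -1))
        self.dec = nn.ModuleList(
            nn.Sequential(ResidualModule(2 * widths[i], widths[i]), AttentionDecode())
            for i in range(3, -1, -1))
        self.classify = nn.Conv2d(widths[0], 2, 1)  # vessel foreground / background

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.enc):
            x = enc(x)
            if i < 4:               # the fifth encoder is not followed by pooling
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # splice path
        return self.classify(x)
```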
Step 2, preparing the dataset. The pre-training data use 40 images from the fundus retina DRIVE color dataset and 28 images from the fundus retina CHASE_DB1 color dataset; image enhancement is performed on the images of both datasets to improve contrast, data cropping, scaling and rotation are applied to each dataset, and the preprocessed images are input to the training network.
Step 3, training the network model. The fundus retinal segmentation network model is trained by inputting the preprocessed dataset from step 2 into the network model constructed in step 1, obtaining the training weights; the retinal image is then segmented to obtain the segmentation result.
Step 4, selecting a suitable loss function and determining the evaluation indexes of the segmentation method. A suitable loss function is selected to minimize the loss between the binary segmentation image output by the network and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss function falls within the set range, at which point the model parameters are considered pretrained and are saved. The evaluation indexes best suited to retinal image segmentation are selected to measure the segmentation accuracy and performance of the model. The loss function for the whole U-shaped network training constructs a binary cross-entropy loss between the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network; binary cross-entropy is chosen as the most effective loss function specifically for fundus retinal vessel segmentation, as it accurately estimates and adjusts the difference between the output binary image and the manually annotated binary image, giving higher model output accuracy. The training process uses sensitivity (SE), specificity (SP), accuracy (ACC) and the area under the segmentation operating characteristic curve (Area Under ROC, AUC) as indexes for evaluating the segmentation model; the area under the segmentation operating characteristic curve effectively evaluates the proportion of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network.
Step 5, determining the segmentation model. The network model parameters are solidified to determine the final segmentation model; for retinal image segmentation, a color fundus image can be input directly into the trained end-to-end network model to obtain the final binary retinal segmentation image.
Example 2:
The residual U-shaped network model structure in step 1 is shown in FIG. 2. The whole residual U-shaped network adopts an encoder-decoder structure: the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules; the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module.
The encoder path consists of five encoders; the numbers of feature-map channels of the first to fifth encoders are 16, 32, 64, 128 and 256 respectively. The first four encoders each comprise a residual module, an attention encoding module and a pooling downsampling module; the fifth encoder comprises only a residual module and an attention encoding module, without a further pooling downsampling module, because experiments show that the image is already very small after four downsampling operations and another downsampling would lose too much image information. The specific composition of the residual module is shown in FIG. 3 and the composition of each convolution layer in FIG. 4; the mini-batch size is 8, the convolution kernel size is 3×3, the stride is 1 and the dropout rate is 0.5. The overall composition of the attention module is shown in FIG. 5: the input data are first normalized by batch normalization; multi-head attention then extracts global image information, which is fused with the input data to improve the information extraction rate; after a second batch normalization, a multi-layer perceptron integrates the multi-channel information; finally, a T-type activation function activates the output data, which are fused with the intermediate fusion features so that input and output maintain an identity mapping relation, yielding a feature map richer in information. The multi-head attention structure of the attention encoding module is shown in FIG. 6: the output image of the residual module is first reduced in dimension by separable 1×1 convolutions to generate the Q_o, K_o, V_o feature matrices; the three feature matrices are downsampled to reduce their parameters; probability values are computed by softmax regression and multiplied with the V feature matrix; a 1×1 convolution then lifts the dimension of the feature map; and finally a pointwise multiplication with the original input image gives the output image after the attention mechanism.
The attention formula of the encoding path can be designed as:
$$\mathrm{Attention}_{e} = \mathrm{Conv}\left(\mathrm{Softmax}\left(\frac{Q_{o}K_{o}^{T}}{\sqrt{d}}\right)V_{o}\right)\odot I_{o},\qquad Q_{o},K_{o},V_{o}=\mathrm{Conv}(I_{o})$$
wherein Conv(·) denotes a convolution operation, I_o denotes the input original image, and d denotes the number of channels corresponding to each head.
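A hedged PyTorch sketch of this encoding-path multi-head attention follows. The head count, channel-reduction ratio, average-pool downsampling and bilinear upsampling are assumptions not fixed by the patent, and a plain 1×1 convolution stands in for the separable 1×1 convolution.

```python
import torch.nn as nn
import torch.nn.functional as F

class EncodeMHA(nn.Module):
    """Multi-head attention of the encoding path: 1x1 convolution reduces the
    input to Q_o, K_o, V_o; downsampling shrinks their spatial size to cut
    parameters; softmax attention weights V_o; a 1x1 convolution lifts the
    channel dimension back; the result gates the original input pointwise."""
    def __init__(self, ch, heads=4, reduce=2, down=2):
        super().__init__()
        self.heads = heads
        inner = ch // reduce                                    # assumes divisibility
        self.to_qkv = nn.Conv2d(ch, 3 * inner, kernel_size=1)   # dimension reduction
        self.down = nn.AvgPool2d(down)                          # downsample Q, K, V
        self.proj = nn.Conv2d(inner, ch, kernel_size=1)         # dimension lifting

    def forward(self, x):
        b, _, h, w = x.shape
        qkv = self.down(self.to_qkv(x))
        h2, w2 = qkv.shape[-2:]
        q, k, v = qkv.chunk(3, dim=1)
        # (B, C', H2, W2) -> (B, heads, H2*W2, C'/heads)
        split = lambda t: t.reshape(b, self.heads, -1, h2 * w2).transpose(-1, -2)
        q, k, v = split(q), split(k), split(v)
        d = q.shape[-1]                                  # channels per head
        attn = F.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
        out = (attn @ v).transpose(-1, -2).reshape(b, -1, h2, w2)
        out = F.interpolate(self.proj(out), size=(h, w), mode="bilinear",
                            align_corners=False)
        return out * x                                   # pointwise gate
```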
The decoder path consists of five decoders; the numbers of feature-map channels of the first to fifth decoders are 16, 32, 64, 128 and 256 respectively. The first four decoders each comprise a residual module, an attention decoding module, a transposed-convolution module and a 1×1 classification convolution layer with n channels, which reduces the dimension of the output feature map and outputs an n-class feature map; n is set to 2 in the invention, indicating that the output comprises retinal vessel foreground and background. The fifth decoder comprises a residual module and an attention decoding module; the residual module is the same as that in the encoding path. The multi-head attention structure of the attention decoding module is shown in FIG. 7. The multi-head attention input of the decoder comes from the front-layer decoded image and the same-level encoded spliced image: the front-layer decoded image first passes through a transposed-convolution block to ensure a consistent image size and yield the K_d, V_d feature matrices, while the same-level encoded spliced image is reduced in dimension by a separable 1×1 convolution block to generate the Q_e matrix; the subsequent feature processing is identical to the attention encoding module.
The attention formula of the decoding path can be designed as:
$$\mathrm{Attention}_{d} = \mathrm{Conv}\left(\mathrm{Softmax}\left(\frac{Q_{e}K_{d}^{T}}{\sqrt{d}}\right)V_{d}\right)\odot I_{d},\qquad K_{d},V_{d}=\mathrm{Conv}(I_{d}),\quad Q_{e}=\mathrm{Conv}(I_{e})$$
wherein Conv(·) denotes a convolution operation, I_d denotes the front-layer decoded input image, I_e denotes the same-level encoded spliced image, and d denotes the number of channels corresponding to each head.
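A corresponding sketch of the decoding-path attention is given below. It assumes the front-layer decoded map has the same channel count as, and half the resolution of, the encoder skip; which tensor the attention output gates, and the omission of the Q/K/V downsampling step, are also assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class DecodeMHA(nn.Module):
    """Multi-head attention of the decoding path: K_d and V_d come from the
    front-layer decoded feature map after a transposed convolution that
    doubles its spatial size, while Q_e comes from the same-level encoded
    spliced feature map via a 1x1 convolution. Assumes ch divisible by heads."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.heads = heads
        self.to_kv = nn.ConvTranspose2d(ch, 2 * ch, kernel_size=2, stride=2)
        self.to_q = nn.Conv2d(ch, ch, kernel_size=1)
        self.proj = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, dec, enc):
        # dec: front-layer decoded map at half resolution; enc: encoder skip.
        b, c, h, w = enc.shape
        k, v = self.to_kv(dec).chunk(2, dim=1)           # now at (h, w)
        q = self.to_q(enc)
        split = lambda t: t.reshape(b, self.heads, -1, h * w).transpose(-1, -2)
        q, k, v = split(q), split(k), split(v)
        d = q.shape[-1]                                  # channels per head
        attn = F.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
        out = (attn @ v).transpose(-1, -2).reshape(b, c, h, w)
        return self.proj(out) * enc                      # fuse with the splice
```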
As shown in FIG. 8, the multi-head attention module of the splicing path requires the image resolution and channel number to be matched in advance for the splicing operation. The attention mechanism constructs a global feature-information matrix and therefore ignores spatial position information to a certain extent; introducing a spatial information matrix allows the splicing operation to be completed better. The attention module of the splicing path adds a spatial information matrix to the structure of the encoding-path attention module; the other structures are identical. The spatial information matrix is composed of a channel position matrix and a pixel position matrix, which solves the problem of losing spatial position information during splicing.
The attention formula of the splice path can be designed as:
$$\mathrm{Attention}_{s} = \mathrm{Conv}\left(\mathrm{Softmax}\left(\frac{Q_{o}K_{o}^{T}+S_{T}}{\sqrt{d}}\right)V_{o}\right)\odot I_{o}$$
wherein Conv(·) denotes a convolution operation, I_o denotes the input original image, S_T denotes the spatial feature matrix, and d denotes the number of channels corresponding to each head.
The spatial feature matrix may be defined as:
$$S_{T}=P_{T}+L_{T}$$
wherein P_T denotes the channel position matrix and L_T denotes the pixel position matrix.
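The patent does not disclose the shapes or construction of P_T and L_T, so the following sketch only illustrates one plausible reading: both terms as learnable biases added to the attention logits before the softmax.

```python
import torch
import torch.nn as nn

class SpatialMatrix(nn.Module):
    """Hypothetical spatial information matrix S_T = P_T + L_T: a learnable
    per-head channel-position bias plus a learnable pixel-position bias,
    added to the (B, heads, N, N) attention logits before softmax."""
    def __init__(self, heads, n_tokens):
        super().__init__()
        self.channel_pos = nn.Parameter(torch.zeros(heads, 1, n_tokens))   # P_T
        self.pixel_pos = nn.Parameter(torch.zeros(1, n_tokens, n_tokens))  # L_T

    def forward(self, logits):
        return logits + self.channel_pos + self.pixel_pos  # broadcast over batch
```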
Considering the robustness of the final model, so that the segmentation model can be applied successfully to multiple datasets and its generalization ability is improved, a P-type nonlinear activation function is used in each convolution layer of the residual module and a T-type nonlinear activation function is used in the overall attention module. The P-type and T-type functions are defined as follows:
(The P-type and T-type nonlinear activation functions are defined by equation images in the original publication; their explicit forms are not reproduced in this text.)
the fundus retina dataset in step 2 uses a fundus retina DRIVE color dataset and a fundus retina chase_db1 color dataset, DRIVE being a diabetic retinopathy screening program from the netherlands, comprising 40 images obtained using a CR5 non-mydriatic 3CCD camera, a depression angle of 45 degrees, each image resolution of 584 x 565 pixels; 20 of the 40 images are used for network training and 20 images are used for network testing; chaSE_DB1 contained 28 color retinal images of 999 x 960 pixels in size taken from the left and right eyes of 14 children; and (3) performing image enhancement operation on the images in the two sets of data sets to improve the image contrast, performing data random clipping, scaling and rotation operation augmentation treatment on the two sets of data sets respectively, finally, using the images with the size of 256 multiplied by 256 resolution of each original image as input, expanding each original image into 242 images through clipping and rotation operation, wherein the total of DRIVE data sets is 9680 images, and the total of CHASE_DB1 is 6776 images.
The loss function in step 4 is designed to measure the similarity between the network prediction and the label; the better the chosen loss function, the better the network performance. The loss function in the training process constructs a binary cross-entropy loss between the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network.
The binary cross-entropy loss is an effective means specifically for fundus retinal vessel segmentation; it accurately estimates and adjusts the difference between the output binary image and the manually annotated binary image, giving higher model output accuracy. The binary cross-entropy loss is defined as:
$$\mathcal{L} = w_{\mathrm{out}}\, l_{\mathrm{out}}$$
wherein w_out denotes the weight of the output loss term and l_out denotes the output loss.
For each term i we use standard binary cross entropy to calculate the loss:
$$l = -\sum_{(r,c)}^{(H,W)}\left[P_{G(r,c)}\log P_{S(r,c)} + \left(1-P_{G(r,c)}\right)\log\left(1-P_{S(r,c)}\right)\right]$$
where (r, c) denotes the pixel coordinates and (H, W) the height and width of the image; P_G(r,c) denotes the probability that a pixel, mapped through the Sigmoid function, is output as a vessel pixel; P_S(r,c) denotes the probability that a pixel, mapped through the Sigmoid function, is output as a non-vessel pixel. Training attempts to minimize this binary cross-entropy loss.
The Sigmoid function is defined as follows:
$$\mathrm{Sigmoid}(x)=\frac{1}{1+e^{-x}}$$
the binary cross entropy loss is used in a segmentation network of retinal vascular images by checking each pixel one by one and comparing the class prediction vector with a target vector coded by a hot spot, thereby being beneficial to segmenting the binary images with higher precision.
In step 4, sensitivity (SE), specificity (SP), accuracy (ACC) and the area under the segmentation operating characteristic curve (Area Under ROC, AUC) are selected as indexes for evaluating the segmentation model; the area under the segmentation operating characteristic curve effectively evaluates the proportion of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network. Sensitivity, specificity and accuracy are defined as follows:
$$SE=\frac{TP}{TP+FN}$$
$$SP=\frac{TN}{TN+FP}$$
$$ACC=\frac{TP+TN}{TP+FP+TN+FN}$$
wherein TP denotes the number of correctly segmented foreground pixels, FP the number of background pixels incorrectly segmented as foreground, TN the number of correctly segmented background pixels, and FN the number of foreground pixels incorrectly segmented as background; TP+FN+TN+FP is the total number of pixels in the image, TP+FN is the actual number of foreground pixels, and TN+FP is the actual number of background pixels.
The area under the segmentation operating characteristic curve is denoted AUC, i.e. the proportion of the area under the curve to the total number of pixels. The operating characteristic curves of different segmentation algorithms sometimes cross, so the AUC value is often used as the criterion for judging algorithm quality; the larger the area, the better the classification performance.
The area under the segmentation operating characteristic curve can be defined as:
$$AUC=\frac{S_{p}}{S_{t}}$$
wherein S_p denotes the number of pixels under the curve and S_t denotes the total number of pixels of the whole image area.
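These indexes can be computed from pixelwise counts as sketched below; the conventional ROC-based AUC from scikit-learn (assumed available) is used here alongside the patent's S_p/S_t pixel-ratio definition.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(pred, prob, gt):
    """pred: binary segmentation (0/1); prob: vessel probabilities in [0, 1];
    gt: binary ground truth. Returns SE, SP, ACC and AUC."""
    tp = np.sum((pred == 1) & (gt == 1))   # correctly segmented foreground
    tn = np.sum((pred == 0) & (gt == 0))   # correctly segmented background
    fp = np.sum((pred == 1) & (gt == 0))   # background taken as foreground
    fn = np.sum((pred == 0) & (gt == 1))   # foreground taken as background
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    auc = roc_auc_score(gt.ravel(), prob.ravel())
    return se, sp, acc, auc
```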
To optimize the network model, the number of training epochs is set to 200 with an Adam optimizer; the learning rate is set to 0.001 and is decayed by a factor of 0.1 every 10 training epochs; the loss threshold is set to 0.0002. Training iterates continuously, and when the training loss approaches the loss threshold the network is considered essentially trained.
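This training configuration maps directly onto standard PyTorch components; a sketch follows, assuming the model sketched earlier and a train_loader of preprocessed patches.

```python
import torch
import torch.nn as nn

model = ResUAttentionNet()                 # skeleton sketched earlier
criterion = nn.CrossEntropyLoss()          # two-class form of the BCE above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay the learning rate by a factor of 0.1 every 10 training epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

LOSS_THRESHOLD = 2e-4
for epoch in range(200):
    for images, labels in train_loader:    # assumed DataLoader of 256x256 patches
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # labels: (B, H, W) class indices
        loss.backward()
        optimizer.step()
    scheduler.step()
    if loss.item() <= LOSS_THRESHOLD:      # training loss reached the threshold
        break

# Solidify (freeze) the trained parameters as the final segmentation model.
torch.save(model.state_dict(), "retina_seg_model.pth")
```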
After the network training in step 5 is completed, all trained parameters in the network are solidified to determine the final segmentation model; for a retinal vessel image segmentation task, a color fundus image can be input directly into the trained end-to-end network model to obtain the final binary segmentation image.
The implementations of convolution, activation functions, splicing operations, batch normalization, multi-layer perceptrons and the like are algorithms well known to those skilled in the art; specific procedures and methods can be found in corresponding textbooks or technical literature.
The mixed attention retinal vessel segmentation method based on the residual U-shaped network is designed and applied to fundus retinal vessel image segmentation tasks, realizing end-to-end network input and output. It addresses the long-standing reliance on complex manual segmentation in clinical image segmentation tasks, making retinal image segmentation simple and more efficient to carry out. Under the same conditions, computing the relevant indexes of the binary images obtained by existing methods further verifies the feasibility and superiority of the proposed method.
The comparison of evaluation indexes between the prior art and the proposed method is shown in FIG. 9. As the figure shows, the proposed method achieves higher sensitivity, specificity and accuracy, and a larger area under the segmentation operating characteristic curve, than the prior art; in the test stage, the average segmentation time per image is only 1.03 seconds. These indexes further illustrate that the proposed method has better segmentation quality and achieves the desired effect.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. A mixed attention retinal vessel segmentation method based on a residual U-shaped network, characterized in that the method comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules; the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and each dataset is preprocessed and augmented to expand the data;
step 3, training the network model: the preprocessed dataset from step 2 is input into the network model constructed in step 1 to train the fundus retinal image segmentation network and obtain the training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes of the segmentation method: a suitable loss function is selected to minimize the loss between the network output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss function falls within the threshold range, at which point the model is considered pretrained and its parameters are saved;
step 5, determining the segmentation model: the network model parameters are solidified to determine the final segmentation model; for a retinal image segmentation task, a color fundus image can be input directly into the trained end-to-end network model to obtain the final binary retinal segmentation image.
2. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, wherein: the encoder path in step 1 comprises five encoders consisting of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that richer feature-map information is obtained after the attention modules.
3. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, wherein: the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, an ordinary convolution, dropout and a P-type nonlinear activation function, with a unified kernel size of n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
4. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, wherein: the decoder path in step 1 comprises five decoders consisting of five residual modules, five attention modules, four transposed-convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the kernel sizes of the residual modules and the transposed-convolution modules are unified to n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels for outputting the classified image; the input image produces rich multi-channel feature information through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
5. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, wherein: the loss function of the whole U-shaped network training process in step 4 constructs a binary cross-entropy loss between the network output image and the manually annotated label image, and minimizing this loss function dynamically adjusts the binary segmentation accuracy of the network output.
6. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, wherein: in step 4, sensitivity (SE), specificity (SP), accuracy (ACC) and the area under the segmentation operating characteristic curve (Area Under ROC, AUC) are used as indexes for evaluating the segmentation model throughout the U-shaped network training process; the area under the segmentation operating characteristic curve effectively evaluates the proportion of the segmented retinal vessel binary image within the original whole image and dynamically guides the network optimization training.
CN202310106849.6A 2023-02-14 2023-02-14 Mixed attention retinal vessel segmentation method based on residual U-shaped network Pending CN116363060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106849.6A CN116363060A (en) 2023-02-14 2023-02-14 Mixed attention retinal vessel segmentation method based on residual U-shaped network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106849.6A CN116363060A (en) 2023-02-14 2023-02-14 Mixed attention retinal vessel segmentation method based on residual U-shaped network

Publications (1)

Publication Number Publication Date
CN116363060A true CN116363060A (en) 2023-06-30

Family

ID=86905899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106849.6A Pending CN116363060A (en) 2023-02-14 2023-02-14 Mixed attention retinal vessel segmentation method based on residual U-shaped network

Country Status (1)

Country Link
CN (1) CN116363060A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843685A (en) * 2023-08-31 2023-10-03 山东大学 3D printing workpiece defect identification method and system based on image detection
CN116843685B (en) * 2023-08-31 2023-12-12 山东大学 3D printing workpiece defect identification method and system based on image detection
CN117274256A (en) * 2023-11-21 2023-12-22 首都医科大学附属北京安定医院 Pain assessment method, system and equipment based on pupil change
CN117274256B (en) * 2023-11-21 2024-02-06 首都医科大学附属北京安定医院 Pain assessment method, system and equipment based on pupil change
CN117409100A (en) * 2023-12-15 2024-01-16 山东师范大学 CBCT image artifact correction system and method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination