CN116363060A - Mixed attention retinal vessel segmentation method based on residual U-shaped network - Google Patents
- Publication number: CN116363060A (application CN202310106849.6A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T 7/0012: Image analysis; inspection of images; biomedical image inspection
- G06T 7/10, G06T 7/187: Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06N 3/02, G06N 3/08: Neural networks; learning methods
- G06V 10/7715: Feature extraction, e.g. by transforming the feature space
- G06V 10/774: Generating sets of training patterns
- G06V 10/82: Image or video recognition using neural networks
- G06T 2207/20021: Dividing image into blocks, subimages or windows
- G06T 2207/20081: Training; learning
- G06T 2207/20084: Artificial neural networks [ANN]
- G06T 2207/30004, G06T 2207/30101: Biomedical image processing; blood vessel, artery, vein
- Y02T 10/40: Engine management systems
Abstract
The invention belongs to the technical field of medical image processing, and in particular relates to a mixed attention retinal vessel segmentation method based on a residual U-shaped network, comprising the following steps. Step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module. The invention adopts a residual U-shaped network in which each encoding and decoding layer of the original U-Net is replaced by an attention encoding or attention decoding module built from a residual module, an attention module, and an upsampling or downsampling layer, redefining a U-shaped network suited to retinal vessel segmentation; a mixed attention mechanism is used to extract more useful image information, with the aim of extracting deep image features and improving the binary segmentation accuracy of retinal images.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a mixed attention retinal vessel segmentation method based on a residual U-shaped network.
Background
A common color fundus image contains structures such as the retinal vessels, the optic cup, the optic disc and the macula. Abnormal retinal vessel morphology reflects early symptoms of various diseases of the human body, and analyzing structural features such as vessel length, width and curvature helps doctors make rapid clinical and pathological diagnoses, accurately grasp a patient's condition, and provides a powerful basis for the prevention and treatment of some diseases. The complexity of the retinal vasculature itself, particularly the extremely large number of capillary branches at the periphery of the fundus image, increases the difficulty of implementing retinal vessel segmentation. Moreover, the capillaries at the periphery of the fundus retinal image are easily affected by illumination and noise, so the collected medical images often suffer from low quality, sparse detail information, blurring and other problems, which is unfavorable for clinical medical diagnosis.
Chinese patent publication No. CN113487615A, entitled "Retina segmentation method and terminal based on residual network feature extraction", first passes the original retinal vessel image through a pretrained VGG encoding layer to obtain five feature maps; the five feature maps are then connected, decoded and attended to obtain a first output image; the original retinal vessel image is multiplied and convolved with the first output image to obtain a first intermediate image; the remaining four intermediate images are obtained through four residual encoding layers; the five intermediate images and five feature maps are then passed through an image connection and decoding layer to obtain a second output image; and the first and second output images pass through a connection layer to obtain the feature-extracted retinal vessel image. This retinal vessel segmentation method suffers from low accuracy, low segmentation speed, poor segmentation of capillary tips, and loss of image detail information.
Disclosure of Invention
(I) Technical Problems to Be Solved
To address the shortcomings of the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which solves the problems of low segmentation accuracy, poor capillary-tip segmentation, low segmentation speed and loss of image detail information in existing fundus-image retinal vessel segmentation methods.
(II) Technical Solution
To achieve the above purpose, the invention adopts the following technical solution:
a mixed attention retinal vessel segmentation method based on a residual U-shaped network comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and the two datasets are separately preprocessed and expanded;
step 3, training the network model: the fundus retina image segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method: a suitable loss function is selected to minimize the loss between the output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set threshold range, at which point the model parameters are considered pre-trained and are saved;
step 5, determining the segmentation model: the network model parameters are frozen to determine the final segmentation model; for a retinal image segmentation task, fundus retina color images can be directly input into the trained end-to-end network model to obtain the final retinal binary segmentation image.
Further, the encoder path in step 1 consists of five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature-map information is obtained after the image passes through the attention module.
Further, the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, ordinary convolution, Dropout and a P-type nonlinear activation function, with the convolution kernel size unified as n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
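A minimal PyTorch sketch of a residual module matching this description follows. The P-type activation is not defined in this text, so PReLU stands in for it; the kernel size n is taken as 3, and the 1×1 skip projection is an implementation assumption used to match channel counts.

```python
import torch.nn as nn

class ConvLayer(nn.Module):
    """One convolution layer: batch normalization -> n x n convolution -> Dropout -> activation."""
    def __init__(self, in_ch, out_ch, n=3, drop=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, out_ch, kernel_size=n, padding=n // 2),
            nn.Dropout2d(drop),
            nn.PReLU(),  # stand-in for the undefined "P-type" nonlinear activation
        )

    def forward(self, x):
        return self.block(x)

class ResidualModule(nn.Module):
    """Two convolution layers whose output is fused with a skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = ConvLayer(in_ch, out_ch)
        self.conv2 = ConvLayer(out_ch, out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # channel-matching skip path

    def forward(self, x):
        return self.conv2(self.conv1(x)) + self.skip(x)
```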
Further, the decoder path in step 1 consists of five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the convolution kernels of the residual modules and the transposed convolution modules are unified as n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image; the input image generates rich multi-channel feature information after passing through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
Further, in step 4 the loss function for the whole U-shaped network training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, and minimizing the loss function dynamically adjusts the binary segmentation accuracy of the network output.
Further, in step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are used throughout the U-shaped network training process as indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and dynamically guides the optimization training of the network.
(III) Beneficial Effects
Compared with the prior art, the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, which has the following beneficial effects:
the invention adopts a residual U-shaped network, replaces each layer of coding and decoding layer of the original U-net network by an attention coding module and an attention decoding module which are composed of a residual module, an attention module, an up-sampling layer and a down-sampling layer, redefines a U-shaped network suitable for retina blood vessel segmentation, utilizes a mixed attention mechanism to extract more useful image information, and aims to extract deep layer characteristics of images and improve binary segmentation precision of retina images.
The invention introduces a new attention module between each pair of same-level layers of the attention encoding path and the attention decoding path, which extracts useful information from the same-level encoding layer and splices it into the corresponding decoding layer; compared with the existing approach of adding an attention module only at the junction of the encoding and decoding paths, this improves the accuracy value by 1.1013, further strengthens the network's extraction of image feature information, and avoids the loss of detail information.
The invention introduces a new attention encoding module into the network encoding path, placed after the residual layer of each layer, and a new attention decoding module into the network decoding path, placed at the transposed convolution layer of each layer, forming a network symmetric with the encoding stage; compared with existing methods, the number of vessel segments segmented per image increases by approximately 20%, further improving the network's ability to extract the semantics of the peripheral capillaries of the retinal vessels.
Drawings
FIG. 1 is a flow chart of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 2 is a network structure diagram of the mixed attention retinal vessel segmentation method based on a residual U-shaped network;
FIG. 3 is a structural diagram of the residual module of the present invention;
FIG. 4 is a schematic diagram of the specific composition of each layer in the residual module of the present invention;
FIG. 5 is an overall structural diagram of the attention module of the present invention;
FIG. 6 is a schematic diagram of the specific composition of the multi-head attention module of the encoding path of the present invention;
FIG. 7 is a schematic diagram of the specific composition of the multi-head attention module of the decoding path of the present invention;
FIG. 8 is a schematic diagram of the specific composition of the multi-head attention module of the splicing path of the present invention;
FIG. 9 is a comparison chart of evaluation indexes of the prior art and the method proposed by the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings; the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Example 1
The embodiment of the invention provides a mixed attention retinal vessel segmentation method based on a residual U-shaped network, whose flow is shown in FIG. 1; the method specifically comprises the following steps:
Step 1, constructing the network model. The whole residual U-shaped network consists of an encoder and a decoder, and a new attention module is introduced between the splicing paths of the encoder and decoder. The encoder path comprises five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules. The first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature information is obtained after the image passes through the attention module. The attention encoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function. After the attention encoding path, the feature maps are rich in number, and they pass through the attention decoding path to obtain the final binary segmentation image. The attention decoding path comprises five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block. The residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention mechanism, a multi-layer perceptron and a T-type function. The convolution kernel sizes of the residual modules and transposed convolution modules are unified as n×n. The last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image. The input image generates rich multi-channel feature information after passing through the attention encoding path, and the final high-precision segmentation image is obtained after the attention decoding path.
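A sketch of the attention module composition just described (batch normalization, multi-head attention fused with the input, a second batch normalization, a multi-layer perceptron, and a T-type activation fused with the intermediate features) follows. torch.nn.MultiheadAttention and Tanh are stand-ins for the patent's convolutional multi-head attention and undefined T-type function, and the MLP expansion ratio of 2 is an assumption; the channel count must be divisible by the number of heads.

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """Batch norm -> multi-head attention (+input fusion) -> batch norm -> MLP -> T-type activation (+fusion)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.bn2 = nn.BatchNorm2d(channels)
        self.mlp = nn.Sequential(nn.Linear(channels, channels * 2),
                                 nn.GELU(),
                                 nn.Linear(channels * 2, channels))
        self.act = nn.Tanh()  # stand-in for the undefined "T-type" function

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.bn1(x).flatten(2).transpose(1, 2)              # (B, H*W, C) token sequence
        attn_out, _ = self.attn(seq, seq, seq)                    # global information extraction
        fused = attn_out.transpose(1, 2).reshape(b, c, h, w) + x  # fuse with the input
        seq2 = self.bn2(fused).flatten(2).transpose(1, 2)
        out = self.act(self.mlp(seq2)).transpose(1, 2).reshape(b, c, h, w)
        return out + fused                                        # identity-style fusion
```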
Step 2, preparing the dataset. The pre-training dataset uses the 40 images of the fundus retina DRIVE color dataset and the 28 images of the fundus retina CHASE_DB1 color dataset; image enhancement is performed on the images of both datasets to improve contrast, cropping, scaling and rotation are applied to each dataset, and the preprocessed images are input into the training network.
Step 3, training the network model. The fundus retina segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights, with which retinal images are then segmented to obtain segmentation results.
Step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method. A suitable loss function is selected to minimize the loss between the binary segmentation image output by the network and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set range, at which point the model parameters are considered pre-trained and are saved. Evaluation indexes suited to retinal image segmentation are selected to measure the segmentation accuracy and performance of the model. The loss function for the whole U-shaped network training constructs a binary cross-entropy loss from the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network; binary cross entropy is selected as the most effective loss function specifically for fundus retinal vessel segmentation, since it can accurately estimate and adjust the difference between the output binary image and the manually annotated binary image, yielding higher model output accuracy. The training process uses Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) as the indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network.
Step 5, determining the segmentation model. The network model parameters are frozen to determine the final segmentation model; for retinal image segmentation, fundus color images can be directly input into the trained end-to-end network model to obtain the final binary retinal segmentation image.
Example 2:
The residual U-shaped network model structure in step 1 is shown in FIG. 2. The whole residual U-shaped network adopts an encoder-decoder structure: the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module.
The encoder path consists of five encoders; the numbers of feature-map channels of encoders one to five are 16, 32, 64, 128 and 256, respectively. The first four encoders each comprise a residual module, an attention encoding module and a pooling downsampling module; the fifth encoder comprises only a residual module and an attention encoding module, with no further pooling downsampling, because experiments show that after four downsampling operations the image is already very small and further downsampling would lose too much image information. The specific composition of the residual module is shown in FIG. 3 and the composition of each convolution layer in FIG. 4; the mini-batch size is 8, the convolution kernel size is 3×3, the stride is 1 and the dropout rate is 0.5. The overall composition of the attention module is shown in FIG. 5: the input data are first normalized using batch normalization; multi-head attention then extracts global image information and fuses it with the input data, improving the information extraction rate; after a second batch normalization, a multi-layer perceptron integrates the multi-channel information; finally, the output is activated by a T-type activation function and fused with the intermediate fusion features, ensuring an identity-like mapping between input and output and yielding a feature image with richer information. The multi-head attention structure of the attention encoding module is shown in FIG. 6: the output image of the residual module is first reduced in dimension by a 1×1 separable convolution to generate the $Q_o$, $K_o$, $V_o$ feature matrices; the three feature matrices are downsampled to reduce their parameters; probability values are computed by softmax regression and multiplied with the $V$ feature matrix; the feature image is then restored in dimension by a 1×1 convolution and finally point-multiplied with the original input image to obtain the output image after the attention mechanism.
The attention formula of the encoding path can be designed as:

$$\mathrm{Attention}_e = I_o \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_o K_o^{T}}{\sqrt{d}}\right) V_o\right)$$

wherein Conv(·) represents a convolution operation, $I_o$ represents the input original image, ⊙ represents point multiplication, and d represents the number of channels corresponding to each head.
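A sketch of this encoder attention: 1×1 convolutions generate $Q_o$, $K_o$, $V_o$, the three matrices are downsampled to reduce parameters, softmax probabilities weight V, a 1×1 convolution restores the channel dimension, and the result is point-multiplied with the input. Plain 1×1 convolutions, average pooling and the channel-reduction ratio are assumptions standing in for the separable convolutions and unspecified downsampling of the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class EncoderAttention(nn.Module):
    """Conv-based attention of the encoding module (FIG. 6)."""
    def __init__(self, channels, reduced=None, pool=2):
        super().__init__()
        reduced = reduced or channels // 2           # 1x1 dimension-reduction ratio (assumed)
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, reduced, 1)
        self.pool = nn.AvgPool2d(pool)               # downsample Q, K, V to cut parameters
        self.proj = nn.Conv2d(reduced, channels, 1)  # 1x1 dimension recovery
        self.scale = reduced ** -0.5

    def forward(self, x):
        b, _, h, w = x.shape
        qp = self.pool(self.q(x))
        hp, wp = qp.shape[-2:]
        q = qp.flatten(2).transpose(1, 2)                     # (B, N, C')
        k = self.pool(self.k(x)).flatten(2)                   # (B, C', N)
        v = self.pool(self.v(x)).flatten(2).transpose(1, 2)   # (B, N, C')
        attn = F.softmax(q @ k * self.scale, dim=-1)          # probability values via softmax
        out = (attn @ v).transpose(1, 2).reshape(b, -1, hp, wp)
        out = F.interpolate(self.proj(out), size=(h, w), mode='bilinear', align_corners=False)
        return out * x                                        # point-multiply with the original input
```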
The decoder path consists of five decoders; the numbers of feature-map channels of decoders one to five are 16, 32, 64, 128 and 256, respectively. The first four decoders each comprise a residual module, an attention decoding module, a transposed convolution module and a 1×1 classification convolution layer with n channels, used to reduce the dimension of the output feature map and output an n-class feature map; in the present invention n is set to 2, indicating that the outputs are the retinal vessel foreground and the background. The fifth decoder comprises a residual module and an attention decoding module. The residual module is the same as that in the encoding path. The multi-head attention structure of the attention decoding module is shown in FIG. 7: the multi-head attention input of the decoder comes from the front-layer decoded image and the same-level encoded spliced image, so the front-layer decoded image first passes through a transposed convolution block to ensure a consistent image size, producing the $K_d$ and $V_d$ feature matrices; for the same-level encoded spliced image, a separable 1×1 convolution block reduces its dimension to generate the $Q_e$ matrix; the subsequent feature processing is identical to that of the attention encoding module.
The attention formula of the decoding path can be designed as:

$$\mathrm{Attention}_d = I_d \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_e K_d^{T}}{\sqrt{d}}\right) V_d\right)$$

wherein Conv(·) represents a convolution operation, $I_d$ represents the front-layer decoded input image, $I_e$ represents the same-level encoded spliced image (from which $Q_e$ is generated), and d represents the number of channels corresponding to each head.
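Analogously, a sketch of the decoder attention: $K_d$ and $V_d$ come from the transposed-convolution output of the previous decoder layer, $Q_e$ from a 1×1 convolution of the same-level encoder splice, and the remaining processing mirrors the encoder module. The 2×2 stride-2 transposed convolution and the channel sizes are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderAttention(nn.Module):
    """Cross attention of the decoding module (FIG. 7)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        # bring the previous (deeper) decoder output up to the skip's resolution
        self.up = nn.ConvTranspose2d(channels * 2, channels, kernel_size=2, stride=2)
        self.q = nn.Conv2d(channels, reduced, 1)  # Q_e from the same-level encoded splice
        self.k = nn.Conv2d(channels, reduced, 1)  # K_d from the upsampled decoder features
        self.v = nn.Conv2d(channels, reduced, 1)  # V_d from the upsampled decoder features
        self.proj = nn.Conv2d(reduced, channels, 1)
        self.scale = reduced ** -0.5

    def forward(self, decoded, skip):
        d = self.up(decoded)                             # ensure a consistent image size
        b, _, h, w = d.shape
        q = self.q(skip).flatten(2).transpose(1, 2)      # (B, N, C')
        k = self.k(d).flatten(2)                         # (B, C', N)
        v = self.v(d).flatten(2).transpose(1, 2)         # (B, N, C')
        attn = F.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.proj(out) * d                        # remaining steps mirror the encoder module
```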
As shown in FIG. 8, the multi-head attention module of the splicing path must match the image resolution and channel number in advance for the splicing operation. Because the attention mechanism constructs a global feature-information matrix, spatial position information is ignored to a certain extent, and introducing a spatial information matrix allows the splicing operation to be completed better. The attention module of the splicing path adds a spatial information matrix to the structure of the encoding-path attention module; the other structures are the same. The spatial information matrix consists of a channel position matrix and a pixel position matrix, solving the problem of loss of image spatial position information during splicing.
The attention formula of the splicing path can be designed as:

$$\mathrm{Attention}_s = I_o \odot \mathrm{Conv}\!\left(\mathrm{softmax}\!\left(\frac{Q_o K_o^{T} + S_T}{\sqrt{d}}\right) V_o\right)$$

wherein Conv(·) represents a convolution operation, $I_o$ represents the input original image, $S_T$ represents the spatial feature matrix, and d represents the number of channels corresponding to each head.
The spatial feature matrix may be defined as:

$$S_T = P_T + L_T$$

wherein $P_T$ represents the channel position matrix and $L_T$ represents the pixel position matrix.
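One way to realize $S_T$ is as learnable bias matrices added to the attention logits before the softmax, consistent with the splicing-path formula above; the parameterization of $P_T$ and $L_T$ below is an assumption, since the text does not define them further.

```python
import torch
import torch.nn as nn

class SpatialMatrix(nn.Module):
    """Learnable spatial information matrix S_T = P_T + L_T for the splicing-path attention."""
    def __init__(self, n_tokens):
        super().__init__()
        self.p_t = nn.Parameter(torch.zeros(1, n_tokens, n_tokens))  # channel position matrix P_T
        self.l_t = nn.Parameter(torch.zeros(1, n_tokens, n_tokens))  # pixel position matrix L_T

    def forward(self, logits):
        # logits: (B, N, N) attention scores before the softmax
        return logits + self.p_t + self.l_t
```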
Considering the robustness of the final model, so that the segmentation model can be successfully applied to multiple datasets and its generalization ability improved, a P-type nonlinear activation function is used in each convolution layer of the residual module and a T-type nonlinear activation function is used in the overall attention module; the P-type and T-type functions are defined as follows:
the fundus retina dataset in step 2 uses a fundus retina DRIVE color dataset and a fundus retina chase_db1 color dataset, DRIVE being a diabetic retinopathy screening program from the netherlands, comprising 40 images obtained using a CR5 non-mydriatic 3CCD camera, a depression angle of 45 degrees, each image resolution of 584 x 565 pixels; 20 of the 40 images are used for network training and 20 images are used for network testing; chaSE_DB1 contained 28 color retinal images of 999 x 960 pixels in size taken from the left and right eyes of 14 children; and (3) performing image enhancement operation on the images in the two sets of data sets to improve the image contrast, performing data random clipping, scaling and rotation operation augmentation treatment on the two sets of data sets respectively, finally, using the images with the size of 256 multiplied by 256 resolution of each original image as input, expanding each original image into 242 images through clipping and rotation operation, wherein the total of DRIVE data sets is 9680 images, and the total of CHASE_DB1 is 6776 images.
The loss function in step 4 is designed to measure the similarity between the network's predictions and the labels; the better the chosen loss function, the better the network's performance. The loss function in the training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, dynamically adjusting the segmentation accuracy of the binary image output by the network.
Binary cross entropy is an effective means specifically for fundus retinal vessel segmentation: it can accurately estimate and adjust the difference between the output binary image and the manually annotated binary image, so that the model output accuracy is higher. The binary cross-entropy loss is defined as:

$$L = \sum_{i} w_{out}^{(i)}\, l_{out}^{(i)}$$

wherein $w_{out}^{(i)}$ represents the weight of the i-th output loss term and $l_{out}^{(i)}$ represents the corresponding output loss.

For each term i, the standard binary cross entropy is used to calculate the loss:

$$l_{out} = -\sum_{(r,c)}^{(H,W)} \left[ P_{G(r,c)} \log P_{S(r,c)} + \left(1 - P_{G(r,c)}\right) \log\left(1 - P_{S(r,c)}\right) \right]$$

where (r, c) represents the pixel coordinates and (H, W) the height and width of the image; $P_{G(r,c)}$ represents the probability that a pixel, mapped through the Sigmoid function, is output as a vessel pixel, and $P_{S(r,c)}$ the probability that it is output as a non-vessel pixel. Training seeks to minimize the binary cross-entropy loss.
The Sigmoid function is defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
the binary cross entropy loss is used in a segmentation network of retinal vascular images by checking each pixel one by one and comparing the class prediction vector with a target vector coded by a hot spot, thereby being beneficial to segmenting the binary images with higher precision.
In step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are selected as the indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and improves the segmentation efficiency of the network. Sensitivity, specificity and accuracy are defined as follows:

$$SE = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + FP + TN + FN}$$

wherein TP represents the number of correctly segmented foreground pixels, FP the number of background pixels incorrectly segmented as foreground, TN the number of correctly segmented background pixels, and FN the number of foreground pixels incorrectly segmented as background; TP+FN+TN+FP is the total number of pixels in the image, TP+FN is the actual number of foreground pixels, and TN+FP is the actual number of background pixels.
The area under the ROC curve is expressed by AUC, i.e. the proportion of the area under the curve to the total number of pixels; because the ROC curves of different segmentation algorithms sometimes cross, the AUC value is often used as the criterion for judging algorithm quality, and the larger the area, the better the classification performance.
The area under the ROC curve can be defined as:

$$AUC = \frac{S_p}{S_t}$$

wherein $S_p$ represents the number of pixels in the area under the curve and $S_t$ represents the total number of pixels of the whole image area.
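The four evaluation indexes could be computed as follows; scikit-learn's ROC-AUC is used here as a stand-in for the pixel-area-ratio definition of AUC given in the text.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(pred, prob, label):
    """SE, SP, ACC from the confusion counts defined above, plus AUC."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.sum(pred & label)    # correctly segmented foreground pixels
    tn = np.sum(~pred & ~label)  # correctly segmented background pixels
    fp = np.sum(pred & ~label)   # background wrongly segmented as foreground
    fn = np.sum(~pred & label)   # foreground wrongly segmented as background
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    auc = roc_auc_score(label.ravel(), prob.ravel())
    return se, sp, acc, auc
```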
For network model optimization, the number of training epochs is set to 200, the Adam optimizer is used with a learning rate of 0.001, the learning rate is multiplied by a decay factor of 0.1 every 10 training epochs, and the loss threshold is set to 0.0002; training iterates continuously, and when the training loss approaches the loss threshold the network is considered essentially trained.
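A sketch of this optimization schedule, assuming the binary cross-entropy criterion above and an arbitrary checkpoint name: Adam at 0.001, learning rate multiplied by 0.1 every 10 epochs, at most 200 epochs, and early stopping at the 0.0002 loss threshold.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

def train(model, loader, epochs=200, lr=0.001, loss_threshold=0.0002):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    optimizer = Adam(model.parameters(), lr=lr)
    scheduler = StepLR(optimizer, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs
    criterion = torch.nn.BCEWithLogitsLoss()
    for epoch in range(epochs):
        total = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels.float())
            loss.backward()
            optimizer.step()
            total += loss.item()
        scheduler.step()
        if total / len(loader) <= loss_threshold:  # training loss reached the threshold
            break
    torch.save(model.state_dict(), 'segmentation_model.pth')
```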
After network training is completed in step 5, all trained parameters in the network are frozen and the final segmentation model is determined; for a retinal vessel image segmentation task, fundus color images can be directly input into the trained end-to-end network model to obtain the final binary segmentation image.
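A sketch of inference with the frozen model; the 0.5 binarization threshold and the weight-file name are assumptions.

```python
import torch

def segment(model, fundus_image, weights='segmentation_model.pth'):
    """Load trained weights, freeze the model, and output a binary vessel map."""
    model.load_state_dict(torch.load(weights))
    model.eval()                       # solidify the trained parameters
    with torch.no_grad():
        prob = torch.sigmoid(model(fundus_image.unsqueeze(0)))
    return (prob > 0.5).squeeze(0)     # binary retinal segmentation image
```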
The implementations of convolution, activation functions, splicing operations, batch normalization, the multi-layer perceptron and the like are algorithms well known to those skilled in the art; the specific procedures and methods can be found in corresponding textbooks or technical literature.
The mixed attention retinal vessel segmentation method based on a residual U-shaped network designed in the present invention is applied to fundus retinal vessel image segmentation tasks and realizes end-to-end network input and output; it well addresses the long-standing reliance on complex manual segmentation in clinical image segmentation tasks, making retinal image segmentation simple and its realization more efficient. Under the same conditions, the feasibility and superiority of the method are further verified by computing the relevant indexes of the binary images obtained by existing methods.
A comparison of the evaluation indexes of the prior art and the proposed method is shown in FIG. 9. As can be seen from the figure, the proposed method achieves higher sensitivity, specificity and accuracy and a larger area under the ROC curve than the prior art, and in the test stage the average segmentation time per image is only 1.03 seconds; these indexes further illustrate that the proposed method has better segmentation quality and achieves the desired effect.
Finally, it should be noted that the above description is only a preferred embodiment of the present invention and does not limit it; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (6)
1. A mixed attention retinal vessel segmentation method based on a residual U-shaped network, characterized in that the method comprises the following steps:
step 1, constructing a network model: the whole residual U-shaped network consists of an encoder and a decoder; the encoder part comprises residual modules, attention encoding modules and pooling downsampling modules, and the decoder part comprises residual modules, attention decoding modules, transposed-convolution upsampling modules and a classification convolution module;
step 2, preparing a dataset: the method uses the fundus retina DRIVE color dataset and the fundus retina CHASE_DB1 color dataset; image enhancement is performed on both datasets to improve image contrast, and the two datasets are separately preprocessed and expanded;
step 3, training the network model: the fundus retina image segmentation network model is trained by inputting the dataset preprocessed in step 2 into the network model constructed in step 1 to obtain training weights;
step 4, selecting a suitable loss function and determining the evaluation indexes for the segmentation method: a suitable loss function is selected to minimize the loss between the output image and the manually segmented ground-truth labels; a training loss threshold is set, and the model is iteratively optimized until the number of training iterations reaches the set limit or the loss value falls within the set threshold range, at which point the model parameters are considered pre-trained and are saved;
step 5, determining the segmentation model: the network model parameters are frozen to determine the final segmentation model; for a retinal image segmentation task, fundus retina color images can be directly input into the trained end-to-end network model to obtain the final retinal binary segmentation image.
2. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the encoder path in step 1 consists of five encoders, composed of five residual modules, five attention encoding modules and four M-pooling downsampling modules; the first to fifth residual modules extract shallow image feature information and fuse the basic information of each layer within the residual module; the attention encoding modules make the network focus on the more useful feature information extracted by the residual modules and suppress unimportant feature information; the four M-pooling downsampling modules increase the number of image channels, so that more useful feature-map information is obtained after the image passes through the attention module.
3. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the residual module in step 1 consists of a first convolution layer and a second convolution layer; each convolution layer consists of batch normalization, ordinary convolution, Dropout and a P-type nonlinear activation function, with the convolution kernel size unified as n×n; the attention encoding module consists of batch normalization, a multi-head attention encoding layer, a multi-layer perceptron and a T-type function; the pooling downsampling module is uniformly a max-pooling layer.
4. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: the decoder path in step 1 consists of five decoders, composed of five residual modules, five attention modules, four transposed convolution layers and one classification convolution block; the residual modules have the same composition as in the encoding path, and the attention decoding module consists of batch normalization, a multi-head attention decoding layer, a multi-layer perceptron and a T-type function; the convolution kernels of the residual modules and the transposed convolution modules are unified as n×n; the last layer of the fifth decoder is a 1×1 classification convolution layer with 2 channels, used to output the classified image; the input image generates rich multi-channel feature information after passing through the attention encoding path, is then decoded and segmented through the attention decoding path, and the final binary segmentation image is output.
5. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: in step 4, the loss function for the whole U-shaped network training process constructs a binary cross-entropy loss from the network output image and the manually annotated label image, and minimizing the loss function dynamically adjusts the segmentation accuracy of the binary image output by the network.
6. The mixed attention retinal vessel segmentation method based on a residual U-shaped network of claim 1, characterized in that: in step 4, Sensitivity (SE), Specificity (SP), Accuracy (ACC) and the area under the receiver operating characteristic (ROC) curve (AUC) are used in the whole U-shaped network training process as indexes for evaluating the quality of the segmentation model, where the AUC effectively evaluates the area ratio of the segmented retinal vessel binary image within the original whole image and dynamically guides the optimization training of the network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310106849.6A CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310106849.6A CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116363060A true CN116363060A (en) | 2023-06-30 |
CN116363060B CN116363060B (en) | 2024-08-16 |
Family
ID=86905899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310106849.6A Active CN116363060B (en) | 2023-02-14 | 2023-02-14 | Mixed attention retinal vessel segmentation method based on residual U-shaped network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116363060B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843685A (en) * | 2023-08-31 | 2023-10-03 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN117274256A (en) * | 2023-11-21 | 2023-12-22 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117409100A (en) * | 2023-12-15 | 2024-01-16 | 山东师范大学 | CBCT image artifact correction system and method based on convolutional neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114240949A (en) * | 2021-11-18 | 2022-03-25 | 上海浦东发展银行股份有限公司 | Cervical cell segmentation network training method, cervical cell segmentation method and cervical cell segmentation device |
CN114283158A (en) * | 2021-12-08 | 2022-04-05 | 重庆邮电大学 | Retinal blood vessel image segmentation method and device and computer equipment |
CN114359292A (en) * | 2021-12-10 | 2022-04-15 | 南昌大学 | Medical image segmentation method based on multi-scale and attention |
CN114881962A (en) * | 2022-04-28 | 2022-08-09 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
WO2022199143A1 (en) * | 2021-03-26 | 2022-09-29 | 南京邮电大学 | Medical image segmentation method based on u-shaped network |
- 2023-02-14: CN application CN202310106849.6A filed; granted as CN116363060B (status: active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022199143A1 (en) * | 2021-03-26 | 2022-09-29 | 南京邮电大学 | Medical image segmentation method based on u-shaped network |
CN114240949A (en) * | 2021-11-18 | 2022-03-25 | 上海浦东发展银行股份有限公司 | Cervical cell segmentation network training method, cervical cell segmentation method and cervical cell segmentation device |
CN114283158A (en) * | 2021-12-08 | 2022-04-05 | 重庆邮电大学 | Retinal blood vessel image segmentation method and device and computer equipment |
CN114359292A (en) * | 2021-12-10 | 2022-04-15 | 南昌大学 | Medical image segmentation method based on multi-scale and attention |
CN114881962A (en) * | 2022-04-28 | 2022-08-09 | 桂林理工大学 | Retina image blood vessel segmentation method based on improved U-Net network |
Non-Patent Citations (3)
Title |
---|
- XU Hongwei et al.: "Automatic segmentation of cystic kidney in CT images based on a residual dual-attention U-Net model", Application Research of Computers (计算机应用研究), vol. 37, no. 07, 31 July 2020, pages 2237-2240 *
- LIANG Liming et al.: "Dual U-shaped retinal segmentation algorithm with multi-scale feature fusion", Journal of Optoelectronics·Laser (光电子·激光), vol. 33, no. 3, 31 March 2022, pages 272-282 *
- HU Yangtao et al.: "Dilated residual U-shaped network for retinal vessel segmentation", Computer Engineering and Applications (计算机工程与应用), vol. 57, no. 7, 1 April 2021, pages 185-191 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843685A (en) * | 2023-08-31 | 2023-10-03 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN116843685B (en) * | 2023-08-31 | 2023-12-12 | 山东大学 | 3D printing workpiece defect identification method and system based on image detection |
CN117274256A (en) * | 2023-11-21 | 2023-12-22 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117274256B (en) * | 2023-11-21 | 2024-02-06 | 首都医科大学附属北京安定医院 | Pain assessment method, system and equipment based on pupil change |
CN117409100A (en) * | 2023-12-15 | 2024-01-16 | 山东师范大学 | CBCT image artifact correction system and method based on convolutional neural network |
Also Published As
Publication number | Publication date |
---|---|
CN116363060B (en) | 2024-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116363060B (en) | Mixed attention retinal vessel segmentation method based on residual U-shaped network | |
CN111784671B (en) | Pathological image focus region detection method based on multi-scale deep learning | |
CN111754520B (en) | Deep learning-based cerebral hematoma segmentation method and system | |
CN109685813A (en) | A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information | |
CN109448006A (en) | A kind of U-shaped intensive connection Segmentation Method of Retinal Blood Vessels of attention mechanism | |
CN113205538A (en) | Blood vessel image segmentation method and device based on CRDNet | |
CN112258488A (en) | Medical image focus segmentation method | |
CN113689954B (en) | Hypertension risk prediction method, device, equipment and medium | |
CN113205524B (en) | Blood vessel image segmentation method, device and equipment based on U-Net | |
CN112001928A (en) | Retinal vessel segmentation method and system | |
CN113012163A (en) | Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network | |
CN112884788B (en) | Cup optic disk segmentation method and imaging method based on rich context network | |
CN116071292A (en) | Ophthalmoscope retina image blood vessel identification method based on contrast generation learning | |
CN113160226A (en) | Two-way guide network-based classification segmentation method and system for AMD lesion OCT image | |
CN115294075A (en) | OCTA image retinal vessel segmentation method based on attention mechanism | |
CN115908241A (en) | Retinal vessel segmentation method based on fusion of UNet and Transformer | |
CN115471470A (en) | Esophageal cancer CT image segmentation method | |
CN117876242B (en) | Fundus image enhancement method, fundus image enhancement device, fundus image enhancement apparatus, and fundus image enhancement program | |
CN117934824A (en) | Target region segmentation method and system for ultrasonic image and electronic equipment | |
CN116228785A (en) | Pneumonia CT image segmentation method based on improved Unet network | |
CN117611824A (en) | Digital retina image segmentation method based on improved UNET | |
CN114972365A (en) | OCT image choroid segmentation model construction method combined with prior mask and application thereof | |
CN114820632A (en) | Retinal vessel image segmentation method based on two-channel U-shaped improved Transformer network | |
Yang et al. | AMF-NET: Attention-aware multi-scale fusion network for retinal vessel segmentation | |
CN117522893A (en) | Fundus blood vessel segmentation method based on level set segmentation region prototype correction |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |