CN113205534A - Retinal vessel segmentation method and device based on U-Net++ - Google Patents

Retinal vessel segmentation method and device based on U-Net++ Download PDF

Info

Publication number
CN113205534A
Authority
CN
China
Prior art keywords
segmentation
vessel segmentation
image
network
blood vessel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110534275.3A
Other languages
Chinese (zh)
Other versions
CN113205534B (en)
Inventor
彭绍湖
李动员
张一梵
肖鸿鑫
彭凌西
董志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202110534275.3A priority Critical patent/CN113205534B/en
Publication of CN113205534A publication Critical patent/CN113205534A/en
Application granted granted Critical
Publication of CN113205534B publication Critical patent/CN113205534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/92 Dynamic range modification of images or parts thereof based on global image properties
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30041 Eye; Retina; Ophthalmic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30101 Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a retinal vessel segmentation method and device based on U-Net++, wherein the method comprises the following steps: acquiring a vessel segmentation data set; preprocessing the vessel segmentation data set to enhance the contrast of the vessel images; cropping the vessel images to obtain sample data; constructing a vessel segmentation network through the PyTorch deep learning framework according to the sample data; and performing retinal vessel image segmentation through the vessel segmentation network and evaluating the segmentation results. The method replaces the traditional convolution blocks with selective kernel blocks, and within each selective kernel block replaces the convolution with a receptive field of 5 by two 3×3 convolutions in series, which both deepens the network and reduces the computation and parameter counts. The selective kernel blocks adjust the receptive field automatically, so the feature information extracted at different scales is exploited more effectively. The method can be widely applied in the technical field of artificial intelligence.

Description

Retinal vessel segmentation method and device based on U-Net++
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a retinal vessel segmentation method and device based on U-Net++.
Background
Retinal vessel segmentation plays an important role in the analysis and diagnosis of many pathologies. Many diseases, such as glaucoma, hypertension and age-related macular degeneration, are associated with the morphology and structure of the blood vessels in fundus images. However, the complexity of the retinal vascular network, together with small and ambiguous vascular structures, makes the segmentation task extremely complex and challenging. Manual labeling of retinal vessels is not only time-consuming and repetitive but also requires the experience and skill of a trained expert.
Generally, methods for automatic retinal vessel segmentation can be divided into two categories: unsupervised and supervised. A variety of unsupervised techniques have been used to extract retinal blood vessels from fundus images, including matched filtering, morphological processing and vessel tracking. One advantage of unsupervised segmentation methods is that no ground-truth labels are required. However, these methods do not perform well, especially on small vessels. In supervised methods, retinal vessel features are first extracted from the fundus image, and a trainable classifier then separates the pixels into background and vessels. In most cases, supervised methods outperform unsupervised ones. Still, there is room for improvement.
In recent years, with the rise of deep learning, researchers in medical imaging have introduced deep neural networks into the task of fundus vessel segmentation to improve its accuracy. Among these, the U-Net algorithm has been widely used for biomedical image segmentation, such as retinal vessel segmentation, lung CT images and coronary artery images, and has achieved good results. The algorithm is built on a fully convolutional network (FCN) consisting of an encoder and a decoder; the shape of the network resembles a "U", hence the name "U-Net". However, the standard U-Net uses convolutions with a fixed receptive field, which limits the segmentation performance to some extent.
Disclosure of Invention
In view of this, embodiments of the present invention provide a retinal vessel segmentation method and device based on U-Net++ to improve the segmentation effect.
A first aspect of the embodiments of the present invention provides a retinal vessel segmentation method based on U-Net++, which comprises the following steps:
acquiring a vessel segmentation data set;
preprocessing the blood vessel segmentation data set to enhance the contrast of a blood vessel image;
cutting the blood vessel image to obtain sample data;
constructing a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
performing retinal vessel image segmentation through the vessel segmentation network, and evaluating a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
Optionally, the preprocessing the vessel segmentation data set comprises:
extracting a green channel of an original image in the blood vessel segmentation data set;
carrying out whitening processing on the green channel;
performing adaptive histogram equalization processing on the vessel segmentation data set;
the vessel segmentation data set is subjected to gamma transformation processing.
Optionally, the performing a cropping operation on the vessel image to obtain sample data includes:
generating random coordinates for cropping the preprocessed vessel image into image blocks;
cropping image blocks centred on the random coordinates, each image block having a size of 48×48;
and obtaining the sample data from the image blocks.
Optionally, in the step of constructing the vessel segmentation network through a PyTorch deep learning framework according to the sample data,
two 3×3 convolutions are connected in series within each selective kernel block to generate a target convolution with a receptive field of 5;
the encoder comprises five layers, each layer comprising two SK modules and a ReLU activation function; max pooling with a stride of 2 is performed after the feature extraction of each layer is finished, so that each layer is downsampled in turn.
Optionally, the vessel segmentation network comprises attention gates;
the first input of each attention gate is an upsampled feature from the decoder;
the second input of each attention gate is the feature of the same depth from the encoder;
the attention gate uses the upsampled feature as a gating signal to select the useful features from the same-depth encoder features as the target features.
Optionally, the context information extracted by the encoder is propagated to the decoder of the corresponding layer through dense skip connections;
the input of each convolution block in the decoder consists of two kinds of equal-scale feature maps.
Optionally, the evaluating the segmentation result includes:
acquiring first pixels of the target object in the annotated sample image, and acquiring second pixels of the target object in the predicted segmentation image;
calculating an overlap (IoU) evaluation index from the first pixels and the second pixels;
acquiring third pixels of the target object where positive samples are predicted true in the predicted segmentation image, acquiring fourth pixels where negative samples are predicted true, and acquiring fifth pixels where positive samples are predicted false;
calculating a precision evaluation index from the third pixels and the fourth pixels;
calculating a recall evaluation index from the third pixels and the fifth pixels;
and calculating a harmonic-mean evaluation index from the precision evaluation index and the recall evaluation index.
A second aspect of the embodiments of the present invention provides a retinal vessel segmentation device based on U-Net++, comprising:
a first module for obtaining a vessel segmentation dataset;
the second module is used for preprocessing the blood vessel segmentation data set and enhancing the blood vessel image contrast;
the third module is used for performing cutting operation on the blood vessel image to obtain sample data;
a fourth module, configured to build a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
a fifth module, configured to perform retinal blood vessel image segmentation through the blood vessel segmentation network, and evaluate a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
A third aspect of embodiments of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a program for execution by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
According to the embodiments of the invention, selective kernel blocks replace the traditional convolution blocks, and the convolution with a receptive field of 5 within each selective kernel block is replaced by two 3×3 convolutions in series, which both deepens the network and reduces the computation and parameter counts. The selective kernel blocks adjust the receptive field automatically, so the feature information extracted at different scales is used more effectively.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a vessel segmentation method based on deep learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the structure of the retinal vessel image segmentation convolution network of the present invention;
Fig. 3 is a schematic view of the structure of the attention gate of the present invention;
Fig. 4 is a schematic diagram of the structure of the improved SK module of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention also provides a retinal vessel segmentation method based on U-Net++, which comprises the following steps:
acquiring a vessel segmentation data set;
preprocessing the blood vessel segmentation data set to enhance the contrast of a blood vessel image;
cutting the blood vessel image to obtain sample data;
constructing a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
performing retinal vessel image segmentation through the vessel segmentation network, and evaluating a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
Optionally, the preprocessing the vessel segmentation data set comprises:
extracting a green channel of an original image in the blood vessel segmentation data set;
carrying out whitening processing on the green channel;
performing adaptive histogram equalization processing on the vessel segmentation data set;
the vessel segmentation data set is subjected to gamma transformation processing.
Optionally, the performing a cropping operation on the vessel image to obtain sample data includes:
generating random coordinates for cropping the preprocessed vessel image into image blocks;
cropping image blocks centred on the random coordinates, each image block having a size of 48×48;
and obtaining the sample data from the image blocks.
Optionally, in the step of constructing the vessel segmentation network through a PyTorch deep learning framework according to the sample data,
two 3×3 convolutions are connected in series within each selective kernel block to generate a target convolution with a receptive field of 5;
the encoder comprises five layers, each layer comprising two SK modules and a ReLU activation function; max pooling with a stride of 2 is performed after the feature extraction of each layer is finished, so that each layer is downsampled in turn.
Optionally, the vessel segmentation network comprises attention gates;
the first input of each attention gate is an upsampled feature from the decoder;
the second input of each attention gate is the feature of the same depth from the encoder;
the attention gate uses the upsampled feature as a gating signal to select the useful features from the same-depth encoder features as the target features.
Optionally, the context information extracted by the encoder is propagated to the decoder of the corresponding layer through dense skip connections;
the input of each convolution block in the decoder consists of two kinds of equal-scale feature maps.
Optionally, the evaluating the segmentation result includes:
acquiring first pixels of the target object in the annotated sample image, and acquiring second pixels of the target object in the predicted segmentation image;
calculating an overlap (IoU) evaluation index from the first pixels and the second pixels;
acquiring third pixels of the target object where positive samples are predicted true in the predicted segmentation image, acquiring fourth pixels where negative samples are predicted true, and acquiring fifth pixels where positive samples are predicted false;
calculating a precision evaluation index from the third pixels and the fourth pixels;
calculating a recall evaluation index from the third pixels and the fifth pixels;
and calculating a harmonic-mean evaluation index from the precision evaluation index and the recall evaluation index.
The embodiment of the invention also provides a retinal vessel segmentation device based on U-Net++, which comprises:
a first module for obtaining a vessel segmentation dataset;
the second module is used for preprocessing the blood vessel segmentation data set and enhancing the blood vessel image contrast;
the third module is used for performing cutting operation on the blood vessel image to obtain sample data;
a fourth module, configured to build a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
a fifth module, configured to perform retinal blood vessel image segmentation through the blood vessel segmentation network, and evaluate a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
The embodiment of the invention also provides the electronic equipment, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
An embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
The following detailed description of the embodiments of the present invention is made with reference to the accompanying drawings:
referring to fig. 1, the method may include the steps of:
step S1: obtaining a vessel segmentation dataset;
step S2: the obtained image is preprocessed, so that the overall contrast of the blood vessel image is enhanced, and the model can better fit the blood vessel image data in the later experiment, thereby better realizing the segmentation result.
Step S2.1: the green channel of the original RGB image is extracted, because the blood vessels exhibit better contrast in the green channel than in the red and blue channels.
Step S2.2: a whitening operation is applied to the green channel; whitening removes the influence of factors such as ambient brightness and object reflections on the image. The mean μ and standard deviation δ of an M×N image are computed as

$$\mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}, \qquad \delta = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(p_{ij}-\mu\right)^{2}}$$

Each pixel p_{ij} of the original image is then transformed to a new pixel value

$$\hat{p}_{ij} = \frac{p_{ij}-\mu}{\delta}$$

which finally yields the whitened image.
step S2.3: the method is characterized in that adaptive histogram equalization processing (also called CLAHE operation) is carried out, the CLAHE operation stretches the gray level histogram of the original image to a certain extent, the contrast is improved, meanwhile, a local bright or dark area is limited, and the situation that information is lost in the histogram equalization operation is prevented.
Step S2.4: a gamma transform is applied so that the gray values of the processed vessel image follow a nonlinear power-law relationship with the gray values of the image before processing, achieving gray-level stretching.
With the input gray levels of 0 to 255 and both input and output gray levels normalized to [0, 1], the gamma transform is

$$s = r^{\gamma}$$

where r is the normalized input gray level and s is the output. When γ is small (γ < 1), the overall brightness of the image is raised nonlinearly; when γ is large (γ > 1), the overall brightness is lowered nonlinearly.
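Taken together, steps S2.1-S2.4 form a short preprocessing chain. The following is a minimal sketch of that chain, assuming OpenCV and NumPy; the CLAHE clip limit, tile size, gamma value and the rescaling back to 0-255 between steps are illustrative assumptions, not values fixed by this description.

```python
import cv2
import numpy as np

def preprocess(rgb_image: np.ndarray, gamma: float = 1.2) -> np.ndarray:
    # S2.1: keep only the green channel, where vessels show the best contrast
    green = rgb_image[:, :, 1].astype(np.float64)

    # S2.2: whitening -- subtract the image mean and divide by the standard
    # deviation; rescale to 0-255 afterwards because CLAHE expects uint8
    mu, delta = green.mean(), green.std()
    whitened = (green - mu) / (delta + 1e-8)
    whitened = cv2.normalize(whitened, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # S2.3: contrast-limited adaptive histogram equalization (CLAHE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(whitened)

    # S2.4: gamma transform on gray levels normalized to [0, 1]
    normalized = equalized / 255.0
    stretched = np.power(normalized, gamma)
    return (stretched * 255).astype(np.uint8)
```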
Step S3: after the image preprocessing is finished, an image-block cropping operation is performed: a set of random coordinates is generated, and image blocks of size 48×48 are cropped with those coordinates as center points. The large amount of sample data obtained this way is used to train the segmentation model.
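A sketch of this random 48×48 patch extraction follows; the patch count, the paired ground-truth mask, and the border handling (centers sampled so patches stay inside the image) are assumptions for illustration.

```python
import numpy as np

def extract_patches(image: np.ndarray, mask: np.ndarray,
                    n_patches: int, size: int = 48):
    h, w = image.shape[:2]
    half = size // 2
    patches, labels = [], []
    rng = np.random.default_rng()
    for _ in range(n_patches):
        # generate a random coordinate and crop the block centred on it
        y = rng.integers(half, h - half)
        x = rng.integers(half, w - half)
        patches.append(image[y - half:y + half, x - half:x + half])
        labels.append(mask[y - half:y + half, x - half:x + half])
    return np.stack(patches), np.stack(labels)
```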
Step S4: the structure of the convolutional neural network provided by the invention is shown in Fig. 2. Drawing on the UNet++ design, it uses a nested U-Net architecture to integrate U-Nets of different depths.
The network designed by the invention nests four layers of U-Net as the basic framework, with the encoder and the decoder symmetrically distributed on the two sides of the network. All U-Net layers share one feature extractor, so the invention only needs to train a single encoder. The traditional convolution blocks are replaced in this network by improved selective kernel blocks (SK modules), in which a convolution with a receptive field of 5 is generated by two 3×3 convolutions in series; this deepens the network while reducing the computation and parameter counts. The specific structure is shown in Fig. 4. The selective kernel blocks adjust the receptive field automatically, so the feature information extracted at different scales is used more effectively. The encoder has five layers in total, each consisting of two improved SK modules followed by ReLU. After the feature extraction of each layer is finished, 2×2 max pooling with stride 2 is performed, so each layer is downsampled in turn.
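The improved SK module might be sketched in PyTorch as follows: one 3×3 branch (receptive field 3) and one branch of two stacked 3×3 convolutions (receptive field 5), fused by channel-wise soft attention. The reduction ratio r, the BatchNorm placement and the equal input/output channel count are assumptions rather than details fixed by the description.

```python
import torch
import torch.nn as nn

class ImprovedSKBlock(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        # branch 1: a single 3x3 convolution (receptive field 3)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # branch 2: two 3x3 convolutions in series (receptive field 5)
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        d = max(channels // r, 8)
        self.fc = nn.Sequential(nn.Linear(channels, d), nn.ReLU(inplace=True))
        self.fc3 = nn.Linear(d, channels)   # attention logits for branch 1
        self.fc5 = nn.Linear(d, channels)   # attention logits for branch 2

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        # fuse: global average pooling of the summed branch outputs
        s = (u3 + u5).mean(dim=(2, 3))
        z = self.fc(s)
        # per-channel softmax over the two branches selects the receptive field
        a = torch.softmax(torch.stack([self.fc3(z), self.fc5(z)], dim=1), dim=1)
        a3 = a[:, 0].unsqueeze(-1).unsqueeze(-1)
        a5 = a[:, 1].unsqueeze(-1).unsqueeze(-1)
        return a3 * u3 + a5 * u5
```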
To focus on the features relevant to the target object(s), the invention adds a simple but effective attention gate to the nested architecture; its structure is shown in Fig. 3. The attention gate has two inputs: an upsampled feature Fg from the decoder and the feature Fx of equal depth from the encoder. The gating signal Fg in the attention gate selects the more useful features from the encoder features Fx and sends them to the decoder of the upper layer.
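A sketch of such a gate, assuming the additive-attention formulation commonly used with U-Net variants; the intermediate channel count f_int is an assumption.

```python
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, f_g: int, f_x: int, f_int: int):
        super().__init__()
        self.w_g = nn.Conv2d(f_g, f_int, kernel_size=1)   # gating signal Fg (decoder)
        self.w_x = nn.Conv2d(f_x, f_int, kernel_size=1)   # encoder feature Fx
        self.psi = nn.Sequential(nn.Conv2d(f_int, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # additive attention: the weights highlight encoder features that agree
        # with the gating signal and suppress background responses
        alpha = self.psi(self.relu(self.w_g(g) + self.w_x(x)))
        return x * alpha
```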
The context information extracted by the encoder is propagated to the decoder of the corresponding layer through dense skip connections, so that more effective hierarchical features can be extracted. Under dense skip connections, the input of each convolution block in the decoder consists of two kinds of equal-scale feature maps: (1) the intermediate feature maps come from the outputs of the preceding attention gates along the skip connections at the same depth; (2) the final feature map comes from the output of the deconvolution (upsampling) of the deeper block. After receiving and concatenating all these feature maps, the decoder restores the image in a bottom-up manner, as sketched below.
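One decoder node might assemble its input like this, reusing the AttentionGate and ImprovedSKBlock sketches above; the channel bookkeeping (the deeper map carrying twice the channels, no channel reduction after concatenation) is simplified for illustration.

```python
import torch
import torch.nn as nn

class DecoderNode(nn.Module):
    def __init__(self, ch: int, n_skips: int):
        super().__init__()
        # deconvolution of the deeper feature map (assumed to carry 2*ch channels)
        self.up = nn.ConvTranspose2d(ch * 2, ch, kernel_size=2, stride=2)
        # one attention gate per same-depth skip connection
        self.gates = nn.ModuleList(
            [AttentionGate(ch, ch, ch // 2) for _ in range(n_skips)])
        self.conv = ImprovedSKBlock(ch * (n_skips + 1))  # after concatenation

    def forward(self, skips, deeper):
        g = self.up(deeper)                          # upsampled deeper feature map
        gated = [gate(g, s) for gate, s in zip(self.gates, skips)]
        return self.conv(torch.cat(gated + [g], dim=1))
```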
The present invention defines the feature mapping as follows. Let X^{i,j} denote the output of a convolution block, where i indexes the feature depth in the network and j indexes the convolution blocks at the i-th depth along the skip connection:

$$X^{i,j} = \begin{cases} \mathcal{H}\left(X^{i-1,j}\right), & j = 0 \\ \mathcal{H}\left(\left[\left[\mathrm{Ag}\left(X^{i,k}\right)\right]_{k=0}^{j-1},\ \mathrm{UP}\left(X^{i+1,j-1}\right)\right]\right), & j > 0 \end{cases}$$

where H(·) denotes a convolution block, [·] denotes the concatenated merge of feature maps, and UP(·) and Ag(·) denote upsampling and attention-gate selection, respectively. Deep supervision is introduced into the network structure: a 1×1 convolution with C kernels followed by a sigmoid activation is appended to the outputs of nodes X^{0,1}, X^{0,2}, X^{0,3} and X^{0,4}, where C is the number of classes of the given dataset. A hybrid segmentation loss is then defined for each semantic scale, combining pixel-level cross-entropy and the Dice coefficient loss. The hybrid loss enjoys the advantages of both loss functions: smooth gradients and handling of class imbalance. It is defined as

$$\mathcal{L}(Y, P) = -\frac{1}{N}\sum_{c=1}^{C}\sum_{n=1}^{N}\left( y_{n,c}\log p_{n,c} + \frac{2\, y_{n,c}\, p_{n,c}}{y_{n,c}^{2} + p_{n,c}^{2}} \right)$$

where y_{n,c} ∈ Y and p_{n,c} ∈ P denote the target label and the prediction probability of class c at pixel n within a batch, and N is the number of pixels in a batch.
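A sketch of this hybrid loss for the binary (C = 1, sigmoid) case follows; the clamping epsilon and the equal averaging over the four supervised outputs are assumptions made for numerical stability and simplicity.

```python
import torch

def hybrid_loss(pred: torch.Tensor, target: torch.Tensor,
                eps: float = 1e-7) -> torch.Tensor:
    """pred: sigmoid probabilities; target: {0,1} labels of the same shape."""
    pred = pred.clamp(eps, 1.0 - eps)
    ce = target * torch.log(pred)                            # pixel-wise cross entropy
    dice = (2.0 * target * pred) / (target ** 2 + pred ** 2 + eps)
    return -(ce + dice).mean()

def deep_supervision_loss(outputs, target):
    # average the hybrid loss over the outputs of nodes X^{0,1}..X^{0,4}
    return sum(hybrid_loss(o, target) for o in outputs) / len(outputs)
```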
Since, under deep supervision, the output of each sub-network is already a segmentation of the image, the invention can prune the superfluous parts of the network whenever the output of a smaller sub-network is good enough.
Step S5: after the training of the vessel segmentation model is finished, the segmentation results are evaluated according to the evaluation indexes: overlap (IoU), precision, recall, and the weighted harmonic mean (F-measure). The formulas are as follows:

$$\mathrm{IoU} = \frac{\left|\mathrm{Target} \cap \mathrm{Prediction}\right|}{\left|\mathrm{Target} \cup \mathrm{Prediction}\right|}$$

where Target is the set of pixels of the target object in the annotated sample image, and Prediction is the set of pixels of the target object in the predicted segmentation image.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where TP is the number of target-object pixels whose positive samples are predicted true, and FP is the number of pixels whose negative samples are predicted true.

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP is as above, and FN is the number of pixels whose positive samples are predicted false.

$$F_{\beta} = \frac{\left(1+\beta^{2}\right) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}$$

where β is the weight that balances precision and recall.
In summary, compared with the prior art, the invention has the following advantages:
1. The method replaces the traditional convolution blocks with selective kernel blocks, and within each selective kernel block replaces the convolution with a receptive field of 5 by two 3×3 convolutions in series, which both deepens the network and reduces the computation and parameter counts. The selective kernel blocks adjust the receptive field automatically, so the feature information extracted at different scales is used more effectively.
2. The present invention can increase the weight of the target region by adding attention gates between nested convolutional blocks while suppressing background regions that are not relevant to the segmentation task.
3. By introducing deep supervision, the model can be pruned at test time; pruning removes a large number of model parameters and thereby accelerates the segmentation speed of the model.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A retinal vessel segmentation method based on U-Net++, characterized by comprising the following steps:
acquiring a vessel segmentation data set;
preprocessing the blood vessel segmentation data set to enhance the contrast of a blood vessel image;
cutting the blood vessel image to obtain sample data;
constructing a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
performing retinal vessel image segmentation through the vessel segmentation network, and evaluating a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
2. The U-Net++ based retinal vessel segmentation method according to claim 1, wherein the preprocessing of the vessel segmentation data set comprises:
extracting a green channel of an original image in the blood vessel segmentation data set;
carrying out whitening processing on the green channel;
performing adaptive histogram equalization processing on the vessel segmentation data set;
the vessel segmentation data set is subjected to gamma transformation processing.
3. The retinal vessel segmentation method based on U-Net++ according to claim 1, wherein the performing a cropping operation on the vessel image to obtain sample data comprises:
generating random coordinates for cropping the preprocessed vessel image into image blocks;
cropping image blocks centred on the random coordinates, each image block having a size of 48×48;
and obtaining the sample data from the image blocks.
4. The method according to claim 1, wherein in the step of constructing a vessel segmentation network through a PyTorch deep learning framework according to the sample data,
two 3×3 convolutions are connected in series within each selective kernel block to generate a target convolution with a receptive field of 5;
the encoder comprises five layers, each layer comprising two SK modules and a ReLU activation function; max pooling with a stride of 2 is performed after the feature extraction of each layer is finished, so that each layer is downsampled in turn.
5. The U-Net++ based retinal vessel segmentation method according to claim 4, wherein the vessel segmentation network comprises attention gates;
the first input of each attention gate is an upsampled feature from the decoder;
the second input of each attention gate is the feature of the same depth from the encoder;
the attention gate uses the upsampled feature as a gating signal to select the useful features from the same-depth encoder features as the target features.
6. The U-Net++ based retinal vessel segmentation method according to claim 5, wherein the context information extracted by the encoder is propagated to the decoder of the corresponding layer through dense skip connections;
the input of each convolution block in the decoder consists of two kinds of equal-scale feature maps.
7. The retinal vessel segmentation method based on U-Net++ according to claim 1, wherein the evaluating of the segmentation result comprises:
acquiring first pixels of the target object in the annotated sample image, and acquiring second pixels of the target object in the predicted segmentation image;
calculating an overlap (IoU) evaluation index from the first pixels and the second pixels;
acquiring third pixels of the target object where positive samples are predicted true in the predicted segmentation image, acquiring fourth pixels where negative samples are predicted true, and acquiring fifth pixels where positive samples are predicted false;
calculating a precision evaluation index from the third pixels and the fourth pixels;
calculating a recall evaluation index from the third pixels and the fifth pixels;
and calculating a harmonic-mean evaluation index from the precision evaluation index and the recall evaluation index.
8. A retinal vessel segmentation device based on U-Net++, characterized by comprising:
a first module for obtaining a vessel segmentation dataset;
the second module is used for preprocessing the blood vessel segmentation data set and enhancing the blood vessel image contrast;
the third module is used for performing cutting operation on the blood vessel image to obtain sample data;
a fourth module, configured to build a vessel segmentation network through a PyTorch deep learning framework according to the sample data;
a fifth module, configured to perform retinal blood vessel image segmentation through the blood vessel segmentation network, and evaluate a segmentation result;
wherein the vessel segmentation network comprises four nested layers of U-Net; the encoder and the decoder of the vessel segmentation network are symmetrically distributed on the two sides of the network; the four U-Net layers share one feature extractor; and the conventional convolution blocks in the vessel segmentation network are replaced by selective kernel blocks.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-7.
CN202110534275.3A 2021-05-17 2021-05-17 Retinal vessel segmentation method and device based on U-Net++ Active CN113205534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110534275.3A CN113205534B (en) 2021-05-17 2021-05-17 Retinal vessel segmentation method and device based on U-Net++

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110534275.3A CN113205534B (en) 2021-05-17 2021-05-17 Retinal vessel segmentation method and device based on U-Net++

Publications (2)

Publication Number Publication Date
CN113205534A true CN113205534A (en) 2021-08-03
CN113205534B CN113205534B (en) 2023-02-03

Family

ID=77031548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110534275.3A Active CN113205534B (en) 2021-05-17 2021-05-17 Retinal vessel segmentation method and device based on U-Net++

Country Status (1)

Country Link
CN (1) CN113205534B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448006A (en) * 2018-11-01 2019-03-08 江西理工大学 A kind of U-shaped intensive connection Segmentation Method of Retinal Blood Vessels of attention mechanism
CN111862056A (en) * 2020-07-23 2020-10-30 东莞理工学院 Retinal vessel image segmentation method based on deep learning
CN112233135A (en) * 2020-11-11 2021-01-15 清华大学深圳国际研究生院 Retinal vessel segmentation method in fundus image and computer-readable storage medium
CN112508864A (en) * 2020-11-20 2021-03-16 昆明理工大学 Retinal vessel image segmentation method based on improved UNet +
CN112651468A (en) * 2021-01-18 2021-04-13 佛山职业技术学院 Multi-scale lightweight image classification method and storage medium thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANG LI ET AL: "Selective Kernel Networks", 《ARXIV》 *
李大湘 等 (LI Daxiang et al.): "Retinal vessel image segmentation algorithm based on improved U-Net" (基于改进U-Net视网膜血管图像分割算法), 《光学学报》 (Acta Optica Sinica) *

Also Published As

Publication number Publication date
CN113205534B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN109345538B (en) Retinal vessel segmentation method based on convolutional neural network
CN110097554B (en) Retina blood vessel segmentation method based on dense convolution and depth separable convolution
CN113205538A (en) Blood vessel image segmentation method and device based on CRDNet
CN110298844B (en) X-ray radiography image blood vessel segmentation and identification method and device
CN113205537B (en) Vascular image segmentation method, device, equipment and medium based on deep learning
CN113205524B (en) Blood vessel image segmentation method, device and equipment based on U-Net
CN108615236A (en) A kind of image processing method and electronic equipment
CN113450328B (en) Medical image key point detection method and system based on improved neural network
CN111899244B (en) Image segmentation method, network model training method, device and electronic equipment
CN112991346B (en) Training method and training system for learning network for medical image analysis
CN110728666B (en) Typing method and system for chronic nasosinusitis based on digital pathological slide
CN112884788B (en) Cup optic disk segmentation method and imaging method based on rich context network
Ninh et al. Skin lesion segmentation based on modification of SegNet neural networks
CN114565620B (en) Fundus image blood vessel segmentation method based on skeleton prior and contrast loss
Shan et al. SCA-Net: A spatial and channel attention network for medical image segmentation
CN112634308B (en) Nasopharyngeal carcinoma target area and organ-at-risk delineating method based on different receptive fields
CN111986216B (en) RSG liver CT image interactive segmentation algorithm based on neural network improvement
CN116309651B (en) Endoscopic image segmentation method based on single-image deep learning
CN113205534B (en) Retinal vessel segmentation method and device based on U-Net++
CN113192089B (en) Bidirectional cross-connection convolutional neural network for image segmentation
CN111275720B (en) Full end-to-end small organ image identification method based on deep learning
Iqbal et al. LDMRes-Net: Enabling real-time disease monitoring through efficient image segmentation
CN113744166A (en) Method for enhancing fish image under turbid water
CN112862089A (en) Medical image deep learning method with interpretability
CN117078697B (en) Fundus disease seed detection method based on cascade model fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant