CN114419054A

CN114419054A - Retinal blood vessel image segmentation method and device and related equipment

Info

Publication number: CN114419054A
Application number: CN202210059594.8A
Authority: CN
Inventors: 陈丹妮; 杨文忠; 谭思翔; 林荐壮; 王梦婷
Original assignee: Xinjiang University
Current assignee: Xinjiang University
Priority date: 2022-01-19
Filing date: 2022-01-19
Publication date: 2022-04-29

Abstract

The invention relates to the field of computer vision image processing and discloses a retinal blood vessel image segmentation method and device. The method comprises the following steps. In the data processing stage, a retinal blood vessel image to be processed is first obtained, and data augmentation and preprocessing are then performed on the retinal fundus images. In the training stage, a Transformer-optimized network model is first constructed and then trained with the processed training images. In the testing stage, the retinal blood vessel image is input into the trained network model for image segmentation. Finally, the multiple prediction results output by the network model are combined by weighted averaging to obtain the classification probability of each pixel, from which the final segmentation result image is obtained.

Description

Retinal blood vessel image segmentation method and device and related equipment

Technical Field

The invention relates to the field of image processing, in particular to a retinal blood vessel image segmentation method, a retinal blood vessel image segmentation device and related equipment.

Background

Retinal examinations can provide important clinical information for the diagnosis of many retinal diseases, such as diabetic retinopathy. However, manual retinal examination requires specialized clinicians or specialists to screen large numbers of retinal images, which is time-consuming, tedious, difficult to perform in batches, and prone to misdiagnosis. To alleviate the shortage of medical resources and reduce the burden on experts, an automatic, high-performance retinal blood vessel image segmentation device is needed for pre-screening and other examinations. Such a device can rapidly and accurately segment the structural features of the retinal blood vessels: branch points and bends can assist the diagnosis and analysis of cardiovascular diseases and diabetic retinopathy, and changes in vessel width can be used to detect and analyze hypertension. Research on automatic retinal vessel image segmentation is therefore of great significance for the study of related retinal diseases.

With the rapid development of automatic retinal blood vessel segmentation technology, researchers have proposed a large number of retinal blood vessel segmentation methods. Some traditional methods segment retinal vessel images automatically and obtain good results, but they cannot fully characterize the image features. As a result, the structural features of the retina are insufficiently detected, the segmentation precision is not ideal, and the results cannot meet ophthalmologists' requirements for assisting clinical diagnosis.

Compared with conventional methods, CNN-based methods combine the advantages of medical image segmentation and semantic segmentation methods and have achieved remarkable performance. Many excellent works on retinal vessel segmentation show strong segmentation performance, demonstrating that CNNs have powerful feature representation learning and recognition capabilities. However, owing to the inherent locality of the convolution operation, these methods struggle to learn explicit global and long-range semantic interactions, even as the training data grow and the network deepens. Consequently, their segmentation results are discontinuous at bifurcations of fine blood vessels, vessels with complex curvature are missed, and the features of retinal vessel edges are poorly distinguished from the background region.

The Transformer is an efficient network structure that relies on self-attention to capture long-range global information, and it has achieved significant success in natural language processing. Because computer vision tasks also need global information, and a suitable Transformer design can help overcome the limitations of CNNs, researchers have made extensive efforts to adapt Transformers to vision tasks. For example, Hu uses a CNN to extract deep features, which are then fed into a Transformer for processing and regression. Dosovitskiy and Cao propose pure Transformer networks for image classification and image segmentation, respectively, with great success. They split the image into patches and treat each vectorized patch as a word (token) in NLP, so that a standard Transformer can be applied. Building on the success of ViT, many better Transformer-based architectures have since been reported that outperform CNNs. However, vision Transformers still suffer from a large computational cost and from limited precision caused by insufficient extraction of local information.
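As an illustration of the tokenization idea described above, the following minimal Python sketch splits an image into non-overlapping tiles and flattens each tile into one token vector. It is written for this description only and is not code from the cited works; the function name `image_to_tokens` and the nested-list image layout are assumptions, and the learned linear projection that ViT applies to each token is omitted.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C image stored as nested lists


def image_to_tokens(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles and
    flatten each tile (row by row, channels last) into one token vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    tokens: List[List[float]] = []
    for top in range(0, h, patch):              # tiles are visited in raster order
        for left in range(0, w, patch):
            vec: List[float] = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])     # all channels of pixel (r, c)
            tokens.append(vec)
    return tokens


# Usage: a 4 x 4 single-channel image whose pixel value is its raster index.
img = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
tokens = image_to_tokens(img, patch=2)  # 4 tokens, each of length 2 * 2 * 1 = 4
```

Here the first token is `[0.0, 1.0, 4.0, 5.0]`, i.e. the top-left 2 × 2 tile read row by row.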

Therefore, existing methods suffer from low segmentation precision when segmenting retinal blood vessel images.

Disclosure of Invention

Embodiments of the invention provide a retinal blood vessel image segmentation method and device for improving the segmentation precision of retinal blood vessel images.

In order to solve the above technical problem, an embodiment of the present application provides a retinal blood vessel image segmentation method, including:

obtaining a retinal blood vessel image, and inputting the retinal blood vessel image into a segmentation network model;

based on the encoder-decoder module of the segmentation network model, performing feature extraction and fusion on the retinal blood vessel image, and classifying each pixel in a feature map obtained after feature extraction and fusion to obtain at least one segmentation processing result;

and performing weighted average processing on all the segmentation processing results to obtain a final segmentation result of the retinal blood vessel image.

In order to solve the above technical problem, an embodiment of the present invention further provides a Transformer-optimized retinal vessel image segmentation apparatus, including:

the retinal blood vessel image acquisition module is used for acquiring a retinal blood vessel image and inputting the retinal blood vessel image into a segmentation network model;

the segmentation module is used for performing feature extraction and fusion on the retinal blood vessel image based on an encoder-decoder module of the segmentation network model, and classifying each pixel in a feature map obtained after the feature extraction and fusion to obtain at least one segmentation processing result;

and the segmentation result acquisition module is used for carrying out weighted average processing on all the segmentation processing results to obtain a final segmentation result of the retinal blood vessel image.

In order to solve the technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the retinal blood vessel image segmentation method when executing the computer program.

In order to solve the technical problem, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the steps of the above retinal blood vessel image segmentation method.

According to the retinal blood vessel image segmentation method provided by the embodiment of the invention, a retinal blood vessel image is obtained and input into a segmentation network model; based on the encoder-decoder module of the segmentation network model, feature extraction and fusion are performed on the retinal blood vessel image, and each pixel in the resulting feature map is classified to obtain at least one segmentation processing result; and all the segmentation processing results are combined by weighted averaging to obtain the final segmentation result of the retinal blood vessel image. The Transformer-optimized network proposed for retinal vessel image segmentation is a U-shaped symmetric network: it has a symmetric encoder-decoder structure, and skip connections are added between the encoder and the decoder. The method achieves end-to-end segmentation of retinal blood vessel images, obtains accurate segmentation results on a limited dataset, and thereby improves the segmentation precision of retinal blood vessel images.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of one embodiment of a retinal blood vessel image segmentation method proposed by the present invention;

FIG. 2 is a schematic diagram of a network model structure based on Transformer optimization according to the present invention;

FIG. 3 is a detailed diagram of the optimized Transformer module of the present invention;

FIG. 4 is a segmentation result for a vessel image in the DRIVE dataset;

FIG. 5 is a segmentation result for a vessel image in the DRIVE dataset;

FIG. 6 is a segmentation result for a vessel image in the STARE dataset;

FIG. 7 is a schematic structural diagram of one embodiment of a retinal blood vessel image segmentation apparatus according to the present application;

FIG. 8 is a schematic block diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 shows a retinal blood vessel image segmentation method according to an embodiment of the present invention, which is detailed as follows:

S201, obtaining a retinal blood vessel image, and inputting the retinal blood vessel image into a segmentation network model.

In step S201, the retinal blood vessel image may be acquired by, but is not limited to, medical imaging diagnostic techniques such as OCT, or from public datasets such as DRIVE, STARE and CHASE_DB1.

The segmentation network model refers to a model trained in advance for segmenting the retinal blood vessel image. The model is a retinal vessel image segmentation network model based on Transformer optimization.

With the Transformer-optimized segmentation network model, end-to-end network training can be achieved without complex post-processing. In the segmentation result, large blood vessels do not appear hollow or broken, and low-contrast microvessels and edge vessels are well preserved. Both the accuracy of vessel segmentation and the detection sensitivity for small vessels are improved, thereby improving the segmentation precision of retinal blood vessel images.

In some optional implementations of the present embodiment, before step S201, the retinal blood vessel image segmentation method further includes:

S101, acquiring a training image.

S102, performing data augmentation on the training image to obtain an augmented image.

S103, taking the training image and the augmented image as training samples.

S104, preprocessing the training samples to obtain preprocessed training samples.

S105, inputting the preprocessed training samples into the initial segmentation network model for training to obtain the segmentation network model.

For step S101, the training images include, but are not limited to, images from public retinal vessel datasets such as DRIVE, STARE and CHASE_DB1. For example, 20 images are randomly selected as training images from the DRIVE dataset, 16 images are randomly selected from the STARE dataset, and the first 20 images are selected from the CHASE_DB1 dataset.
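The example split can be expressed as a small helper. This is a sketch only: the function names are hypothetical, the total image counts of the three datasets (40, 20 and 28) are assumptions not stated in this description, and the random seed is arbitrary.

```python
import random
from typing import Dict, List

# Assumed total image counts of the public datasets (not stated in the text).
DATASET_SIZES: Dict[str, int] = {"DRIVE": 40, "STARE": 20, "CHASE_DB1": 28}
# Number of training images per dataset, taken from the example above.
TRAIN_COUNTS: Dict[str, int] = {"DRIVE": 20, "STARE": 16, "CHASE_DB1": 20}


def select_training_indices(name: str, seed: int = 0) -> List[int]:
    """Return the indices of the training images for one dataset:
    a random subset for DRIVE and STARE, the first images for CHASE_DB1."""
    total, count = DATASET_SIZES[name], TRAIN_COUNTS[name]
    if name == "CHASE_DB1":
        return list(range(count))              # the first 20 images
    rng = random.Random(seed)                  # fixed seed for a reproducible split
    return sorted(rng.sample(range(total), count))


def remaining_indices(name: str, train: List[int]) -> List[int]:
    """Indices of the images that were not selected for training."""
    chosen = set(train)
    return [i for i in range(DATASET_SIZES[name]) if i not in chosen]
```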

For step S102, the data augmentation includes, but is not limited to, histogram stretching, image flipping and image rotation.

The training set is expanded by applying data augmentation to the retinal vessel images, and the training images together with the augmented images obtained by data augmentation serve as the training samples of the segmentation network model.

Preferably, data augmentation is performed on the training image by histogram stretching to obtain an augmented image.

Applying histogram stretching to each retinal vessel image generates an augmented image, which doubles the number of training samples. The training images and the augmented images together are used as training samples of the segmentation network model. Adding training data in this way brings the data to a sufficient volume to avoid overfitting and improves the generalization capability of the segmentation network model.
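The description does not give the exact histogram stretching formula. The sketch below assumes a linear min-max stretch of a single-channel image to the full [0, 255] range, and shows how appending one stretched copy per image doubles the training set; the function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # 2-D single-channel image with values in [0, 255]


def histogram_stretch(img: Gray, low: int = 0, high: int = 255) -> Gray:
    """Linearly map the image's [min, max] intensity range onto [low, high]."""
    flat = [v for row in img for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:                       # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (high - low) / (vmax - vmin)
    return [[round(low + (v - vmin) * scale) for v in row] for row in img]


def augment_training_set(images: List[Gray]) -> List[Gray]:
    """Double the training set: keep every original and add its stretched copy."""
    return images + [histogram_stretch(im) for im in images]
```

For example, an image with intensities 50, 100, 150 and 200 is mapped to 0, 85, 170 and 255.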

For step S104, the preprocessing includes, but is not limited to, normalization, Gaussian transformation, random rotation, random horizontal flipping and color correction.

Preferably, the training image is sequentially subjected to Gaussian transformation, random rotation, random horizontal flipping and color correction to obtain a preprocessed training sample.

It should be understood that the preprocessing herein can be varied according to the actual application scenario.

Step S104 is explained below by way of example. A Gaussian transformation with Sigma in (0, 0.5) is applied to each color retinal blood vessel image with a probability of 0.5; the image is then randomly rotated within [0°, 20°] and randomly flipped horizontally with a probability of 0.5; finally, Gamma correction with a value in [0.5, 2] is applied to adjust the image contrast.
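The example preprocessing of step S104 can be sketched as per-image parameter sampling. This is a minimal illustration under stated assumptions: rotation is assumed to be applied to every image with an angle drawn uniformly from [0°, 20°] (the text can also be read as applying it with probability 0.5), intensities are assumed normalized to [0, 1] before gamma correction, and the Gaussian transformation and rotation are only sampled here, not applied. All names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

Gray = List[List[float]]  # 2-D image with intensities normalized to [0, 1]


@dataclass
class AugParams:
    gaussian_sigma: Optional[float]  # None when the Gaussian step is skipped
    rotation_deg: float
    hflip: bool
    gamma: float


def sample_params(rng: random.Random) -> AugParams:
    """Draw one set of preprocessing parameters following the S104 example."""
    sigma = rng.uniform(0.0, 0.5) if rng.random() < 0.5 else None  # p = 0.5
    angle = rng.uniform(0.0, 20.0)        # assumed: rotation always applied
    hflip = rng.random() < 0.5            # horizontal flip with p = 0.5
    gamma = rng.uniform(0.5, 2.0)         # gamma value in [0.5, 2]
    return AugParams(sigma, angle, hflip, gamma)


def apply_flip_and_gamma(img: Gray, p: AugParams) -> Gray:
    """Apply only the flip and gamma steps (Gaussian and rotation omitted)."""
    out = [row[::-1] for row in img] if p.hflip else [row[:] for row in img]
    return [[v ** p.gamma for v in row] for row in out]
```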

Preprocessing the training images enhances their contrast, which improves the accuracy of subsequent processing such as feature extraction and thus the segmentation precision of the segmentation network model on retinal vessel images.

Through the above steps, the segmentation network model is obtained, and end-to-end network training can be realized without complex post-processing. In the segmentation result, large blood vessels do not appear hollow or broken, and low-contrast microvessels and edge vessels are well preserved. Both the accuracy of vessel segmentation and the detection sensitivity for small vessels are improved, thereby improving the segmentation precision of the segmentation network model on retinal blood vessel images.

S202, based on the encoding module and the decoding module in the segmentation network model, performing feature extraction and fusion on the retinal blood vessel image, and classifying each pixel in the feature map obtained after feature extraction and fusion to obtain at least one segmentation processing result.

In step S202, the segmentation network model refers to a network model for segmenting the retinal blood vessel image. It includes, but is not limited to, a Transformer-optimized segmentation network model and a machine-learning-based segmentation network model.

Preferably, a Transformer-optimized segmentation network model is used here. As shown in FIG. 2, its structure is divided into an encoder, a decoder, skip connections and side outputs.

Both the encoder and the decoder contain Transformer modules, namely the optimized Transformer modules proposed by the invention.

It should be noted here that each optimized Transformer module consists of a LayerNorm normalization layer, a multi-head attention module, residual connections and a multilayer perceptron (MLP). The multi-head attention modules used by two consecutive Transformer modules in the invention are, respectively, a Cross Patch Convolution Self-Attention (CPCA) module and an Inner Patch Convolution Self-Attention (IPCA) module. The CPCA module extracts attention across the image blocks (patches) of a feature map, and the IPCA module extracts and integrates global feature information among the pixels within a single image block.
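The description states only what CPCA and IPCA attend over. The sketch below is a simplified illustration of the two token groupings, modeled (as an assumption) on the inner-patch and cross-patch self-attention of the CAT vision Transformer: IPCA-style attention runs over the pixels of one patch, and CPCA-style attention runs, per channel, over flattened patches. Learned Q/K/V projections, the multi-head split and the convolution layers of the actual modules are omitted, and all names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
FMap = List[List[Vec]]  # H x W x C feature map stored as nested lists


def attention(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens
    (learned projections, multiple heads and convolutions omitted)."""
    d = len(tokens[0])
    out: List[Vec] = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]   # numerically stable softmax
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) / z for j in range(d)])
    return out


def inner_patch_groups(fmap: FMap, p: int) -> List[List[Vec]]:
    """IPCA-style grouping: one sequence per p x p patch, one C-dim token per
    pixel, so attention mixes information among the pixels of a single patch."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[r][c] for r in range(top, top + p) for c in range(left, left + p)]
            for top in range(0, h, p) for left in range(0, w, p)]


def cross_patch_groups(fmap: FMap, p: int) -> List[List[Vec]]:
    """CPCA-style grouping: one sequence per channel; each token is one p x p
    patch of that channel flattened to p * p values, so attention mixes patches."""
    h, w, ch = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[fmap[r][c][k] for r in range(top, top + p) for c in range(left, left + p)]
             for top in range(0, h, p) for left in range(0, w, p)]
            for k in range(ch)]
```

With the 8 × 8 patch size given for both modules, a 16 × 16 × C feature map yields four IPCA sequences of 64 tokens each and C CPCA sequences of 4 tokens each.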

For two consecutive optimized Transformer modules, the attention can be computed according to equations (1) to (4), where Y_{n-1} and Y_{n+1} are the input and output of the pair of Transformer modules of the invention, respectively, and the two intermediate terms are the outputs of the CPCA module and the IPCA module, respectively. As shown in FIG. 3, LN is the LayerNorm normalization layer and MLP is the multilayer perceptron; the image block (patch) size in both the CPCA module and the IPCA module is 8 × 8, and the numbers of attention heads are 1 and 8, respectively.
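Assuming the two modules follow the usual pre-norm residual layout of Swin Transformer blocks, with the CPCA module first and the IPCA module second, one plausible form of the computation is given below. The symbols \hat{Y}_n and \hat{Y}_{n+1} for the CPCA and IPCA outputs, and Y_n for the output of the first module, are chosen here for illustration and may differ from the original equations.

```latex
\begin{aligned}
\hat{Y}_n     &= \mathrm{CPCA}\big(\mathrm{LN}(Y_{n-1})\big) + Y_{n-1} \\
Y_n           &= \mathrm{MLP}\big(\mathrm{LN}(\hat{Y}_n)\big) + \hat{Y}_n \\
\hat{Y}_{n+1} &= \mathrm{IPCA}\big(\mathrm{LN}(Y_n)\big) + Y_n \\
Y_{n+1}       &= \mathrm{MLP}\big(\mathrm{LN}(\hat{Y}_{n+1})\big) + \hat{Y}_{n+1}
\end{aligned}
```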

The encoder comprises four encoding modules connected in sequence: a first, a second, a third and a fourth encoding module. The first encoding module comprises an embedding module and two consecutive Transformer modules; the second and third encoding modules each comprise a merging module and two consecutive Transformer modules; and the fourth encoding module comprises only a merging module.

It should be noted here that the embedding module in the first encoding module consists of an upsampling layer and a linear projection layer. It upsamples the 512 × 512 input image by a factor of 2 to a 1024 × 1024 image, then divides it into non-overlapping 256 × 256 image blocks and increases the number of channels to 96. The merging module performs ×2 downsampling on the input feature map and increases its number of channels, and the restoration module performs ×2 upsampling on the feature map and reduces its number of channels.
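The shape bookkeeping of the encoder can be traced with a short sketch. Two assumptions are made: `patch` is the side length of each non-overlapping block (the default of 4 reads the "256 × 256" above as the resulting block grid, since 1024 / 4 = 256; pass `patch=256` for the other reading), and each merging module halves the spatial size and doubles the channels, as in Swin Transformer patch merging and consistent with the channel halving of the restoration module in step S2023. The function name is hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def encoder_shapes(input_size: int = 512, up_factor: int = 2, patch: int = 4,
                   embed_dim: int = 96, merges: int = 3) -> List[Shape]:
    """Return the output shape of each encoding module under the assumptions
    stated above (x2 spatial reduction and x2 channels per merging module)."""
    side = input_size * up_factor            # x2 upsampling: 512 -> 1024
    if side % patch:
        raise ValueError("upsampled size must be divisible by the patch size")
    h = w = side // patch                    # block grid after patch embedding
    shapes = [(h, w, embed_dim)]             # first encoding module
    c = embed_dim
    for _ in range(merges):                  # encoding modules 2, 3 and 4
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes
```

Under these assumptions, the four encoding modules output 256 × 256 × 96, 128 × 128 × 192, 64 × 64 × 384 and 32 × 32 × 768 feature maps.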

The decoder, symmetric to the encoder, also comprises four decoding modules connected in sequence: a first, a second, a third and a fourth decoding module. The first decoding module comprises only a restoration module, while each of the other decoding modules consists of a restoration module and two consecutive Transformer modules.

At the bottom of the segmentation network model, a Bottom Block consisting of four Transformer modules connects the encoder and the decoder.

In addition, the output of the first encoding module is fused in turn with the input and the output of the Transformer modules in the second decoding module, the output of the second encoding module with those of the third decoding module, and the output of the third encoding module with those of the fourth decoding module, thereby forming the skip connections.

The side output comprises four convolution modules, and the weighted average of the outputs of the convolution modules at all levels is taken as the final segmentation result.

It should be noted here that each convolution module consists of a 3 × 3 convolution, batch normalization, a ReLU activation function, an upsampling layer and a 1 × 1 convolution connected in sequence. It recovers the spatial information lost during downsampling at each level and restores the output to the size of the network input image, thereby achieving end-to-end segmentation.
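The side-output fusion can be sketched as follows. Nearest-neighbour upsampling stands in for the learned upsampling and 1 × 1 convolution of each convolution module, the 3 × 3 convolution, batch normalization and ReLU are omitted, and because the text does not give the averaging weights, equal weights are used by default. Function names are hypothetical.

```python
from typing import List, Optional

Map = List[List[float]]  # square 2-D vessel-probability map


def upsample_nearest(m: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in m for _ in range(factor)]


def fuse_side_outputs(side_maps: List[Map], out_size: int,
                      weights: Optional[List[float]] = None) -> Map:
    """Bring each side output to out_size x out_size and return the weighted
    average of the resized maps (equal weights when none are given)."""
    if weights is None:
        weights = [1.0] * len(side_maps)
    total = sum(weights)
    fused = [[0.0] * out_size for _ in range(out_size)]
    for m, wgt in zip(side_maps, weights):
        up = upsample_nearest(m, out_size // len(m))   # e.g. x8, x4, x2, x1
        for r in range(out_size):
            for c in range(out_size):
                fused[r][c] += wgt * up[r][c] / total
    return fused
```

The fused map holds a per-pixel vessel probability, which can then be thresholded to obtain the binary segmentation.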

The segmentation network model is described in detail below by way of an embodiment; the parameters of the proposed Transformer-optimized network structure are shown in Table one:

Table one: Network structure parameters based on Transformer optimization

In this example, the network is trained for 250 epochs with an initial learning rate of 0.001, which decays by a factor of 10 every 50 epochs, and a batch size of 4; the Transformer-optimized network is trained with the Adam optimizer and a binary cross-entropy loss function. The entire network is trained from scratch without additional training data. We trained the network on the DRIVE, STARE and CHASE_DB1 training sets and evaluated it on the corresponding validation set of each dataset. The experiments of this example were run on an NVIDIA Tesla V100 32 GB GPU, with the PyTorch framework as the backend implementation.
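A minimal sketch of the training schedule and loss, reading "decays by a factor of 10 every 50 epochs" as multiplying the learning rate by 0.1 every 50 epochs. Function names are hypothetical.

```python
import math
from typing import Sequence


def step_lr(epoch: int, base_lr: float = 1e-3, step: int = 50,
            factor: float = 0.1) -> float:
    """Learning rate for a 0-based epoch: multiplied by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted vessel probabilities and
    0/1 labels; probabilities are clamped away from 0 and 1 to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Rates used over the 250 epochs: approximately 1e-3, 1e-4, 1e-5, 1e-6, 1e-7.
rates = [step_lr(e) for e in range(0, 250, 50)]
```

In PyTorch, which this example uses, the same configuration would typically be written with `torch.optim.Adam(model.parameters(), lr=1e-3)`, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)` and `torch.nn.BCELoss()` applied to sigmoid probabilities; this is a suggestion, not the authors' code.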

To evaluate the performance of the proposed model, the segmentation result is compared with the expert segmentation map (the expert segmentation is used as the reference, i.e., the label of the training data), and the comparison result of each pixel is classified as a true positive (TP), false positive (FP), false negative (FN) or true negative (TN). Sensitivity (Se), specificity (Sp), accuracy (Acc), F1-score (F1) and the area under the receiver operating characteristic (ROC) curve (AUC) are then used as metrics to evaluate the model, as follows:
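Using the standard definitions of these metrics, which are assumed here to match the formulas intended:

```latex
\begin{aligned}
\mathrm{Se} &= \frac{TP}{TP + FN}, \qquad
\mathrm{Sp} = \frac{TN}{TN + FP}, \qquad
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{aligned}
```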

Here, TP denotes a vessel pixel in the expert segmentation map that is correctly classified as a vessel pixel in the predicted image; TN denotes a non-vessel pixel in the expert segmentation map that is correctly classified as a non-vessel pixel; FN denotes a vessel pixel in the expert segmentation map that is misclassified as a non-vessel pixel in the predicted image; and FP denotes a non-vessel pixel in the expert segmentation map that is mislabeled as a vessel pixel in the predicted image. Precision and Recall denote precision and recall, respectively. The ROC curve plots the proportion of vessel pixels correctly classified as vessel (true positive rate) against the proportion of non-vessel pixels misclassified as vessel (false positive rate) as the classification threshold varies. The AUC is the area under the ROC curve and can be used to measure segmentation performance; the closer the AUC is to 1, the closer the system is to perfect performance.
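The metrics can be computed from flattened binary predictions, probability maps and expert labels as in the sketch below. The function names are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is exact but O(P·N) in the numbers of vessel and non-vessel pixels; for full-size images a library routine such as scikit-learn's `roc_auc_score` is more practical.

```python
from typing import Dict, Sequence


def pixel_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Se, Sp, Acc and F1 from binarized predictions and expert labels (1 = vessel)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    se = tp / (tp + fn) if tp + fn else 0.0          # sensitivity (= recall)
    sp = tn / (tn + fp) if tn + fp else 0.0          # specificity
    acc = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * se / (precision + se) if precision + se else 0.0
    return {"Se": se, "Sp": sp, "Acc": acc, "F1": f1}


def auc(scores: Sequence[float], truth: Sequence[int]) -> float:
    """ROC AUC as the probability that a random vessel pixel scores higher than
    a random non-vessel pixel (ties count one half)."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```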

To benchmark the performance of the model against different network structures, comparisons are carried out on the DRIVE, STARE and CHASE_DB1 retinal vessel segmentation datasets. In a specific example, shown in Tables two, three and four, the proposed method is tested on the three public datasets and compared with the U-Net, R2UNet, DFUNet, DenseBlock-UNet and IterNet networks. It outperforms the other methods in AUC, Acc and Sp, reaching 0.9869/0.9627/0.9902, 0.9945/0.9772/0.9903 and 0.9917/0.9805/0.9963 on DRIVE, STARE and CHASE_DB1, respectively, and its sensitivity and F1 values are also clearly improved. The comparison shows that the method of the invention achieves the best performance.

Table two: Performance comparison on the DRIVE dataset

Table three: Performance comparison on the STARE dataset

Table four: Performance comparison on the CHASE_DB1 dataset

In FIG. 4, FIG. 5 and FIG. 6, (a) is the original retinal blood vessel image, (b) is the mask, (c) is the gold standard and (d) is the segmentation result of the present invention. These figures show that the retinal vessel segmentation results of the invention are very close to the gold standard, indicating that the proposed Transformer-optimized retinal vessel segmentation method has good vessel segmentation performance: it improves the detection of low-contrast microvasculature, and the segmented vessel trunks have better connectivity with fewer breaks.

To address the scarcity of retinal blood vessel image samples and the poor detection of low-contrast microvessels, the invention provides a retinal blood vessel image segmentation method. By constructing a Transformer-based encoder-decoder symmetric structure and adding a convolution layer to the multi-head attention of the Transformer module, the method better captures long-range dependencies between pixels while supplementing local vessel details. Low-contrast microvessels are better preserved in the segmentation result, and both the accuracy of vessel segmentation and the detection sensitivity for small vessels are improved.

The Transformer-optimized segmentation network model adopts a symmetric encoder-decoder structure. Both the encoder and the decoder are built on the Transformer and are connected by skip connections, so that during upsampling the network can recover information lost as the image size shrinks during downsampling, while also obtaining sufficient contextual and semantic information, which yields a better segmentation effect. In addition, a convolutional layer is added to the multi-head attention of the Transformer module: the Transformer has a global receptive field and captures long-range dependencies well but lacks local detail information, and the added 3 × 3 convolutional layer supplements local detail features. Together, these designs improve the segmentation precision of the network model on retinal vessel images.
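The description does not specify where inside the multi-head attention the 3 × 3 convolutional layer is placed. The sketch below only illustrates the local receptive field such a layer contributes: with zero padding, each output value depends solely on its 3 × 3 neighbourhood, whereas in self-attention every output can depend on every token. The function name is hypothetical and a single channel is assumed.

```python
from typing import List

Map = List[List[float]]


def conv3x3(x: Map, kernel: Map) -> Map:
    """Single-channel 3 x 3 convolution (cross-correlation, as in deep-learning
    frameworks) with zero padding, so the output has the input's size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:    # zero padding outside
                        acc += kernel[dr + 1][dc + 1] * x[rr][cc]
            out[r][c] = acc
    return out
```

Applied to a single bright pixel, an all-ones kernel spreads it only over that pixel's 3 × 3 neighbourhood.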

In some optional implementations of this embodiment, step S202 specifically includes the following steps S2021 to S2025:

S2021, based on the multi-level encoding blocks of the encoder of the segmentation network model, performing downsampling on the retinal blood vessel image to obtain the downsampling feature corresponding to each level of encoding block.

S2022, taking the downsampling feature of the bottom encoding block of the segmentation network model as the current feature map, and taking the decoding block in the decoder of the segmentation network model that corresponds to the current feature map as the current decoding block.

S2023, inputting the current feature map into the current decoding block, and performing feature restoration processing on the current feature map based on the current decoding block to obtain restored features.

S2024, fusing the downsampling feature of the encoding block one level above the one corresponding to the current feature map with the restored feature to obtain an upsampling feature, and fusing the upsampling feature with the same downsampling feature to obtain a fused feature.

S2025, taking the decoding block one level above the current decoding block as the new current decoding block and the fused feature as the new current feature map, and returning to the step of inputting the current feature map into the current decoding block and performing feature restoration on it to obtain a restored feature; these steps are repeated until the current decoding block is the topmost decoding block of the segmentation network model.
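The control flow of steps S2022 to S2025 can be traced on feature-map shapes alone. In this sketch `restore` and `fuse` are hypothetical stand-ins: restoration is assumed to double the spatial size and halve the channels (per step S2023), and fusion is assumed to be shape-preserving (for example element-wise addition), since the fusion operator is not specified.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def restore(s: Shape) -> Shape:
    """Stand-in for a restoration module (S2023): x2 upsampling, channels halved."""
    h, w, c = s
    return (h * 2, w * 2, c // 2)


def fuse(a: Shape, b: Shape) -> Shape:
    """Hypothetical shape-preserving fusion (e.g. element-wise addition);
    the text does not specify the fusion operator."""
    if a != b:
        raise ValueError(f"cannot fuse {a} with {b}")
    return a


def decode(enc: List[Shape]) -> List[Shape]:
    """Run the S2022 to S2025 loop on shapes. enc[0] is the topmost encoding
    block and enc[-1] the bottommost; returns the fused map of each level."""
    current = enc[-1]                          # S2022: bottom feature is current
    trace: List[Shape] = []
    for level in range(len(enc) - 2, -1, -1):  # walk one level up per pass
        restored = restore(current)            # S2023: feature restoration
        up = fuse(enc[level], restored)        # S2024: skip feature + restored
        current = fuse(up, enc[level])         # S2024: detail fusion
        trace.append(current)                  # S2025: fused map becomes current
    return trace
```

For illustrative encoding-block shapes 128 × 128 × 96, 64 × 64 × 192, 32 × 32 × 384 and 16 × 16 × 768, `decode` produces 32 × 32 × 384, 64 × 64 × 192 and 128 × 128 × 96 in turn.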

For step S2021, downsampling of the image is implemented by an interlaced alternate-sampling method (copying alternate rows and columns), which reduces the resolution of the image and increases the number of channels.
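One reading of this interlaced alternate downsampling, assumed here, is the patch-merging operation of Swin Transformer: the four interleaved sub-grids of the feature map are concatenated along the channel axis, halving the resolution and quadrupling the channels. Swin then applies LayerNorm and a linear layer mapping 4C to 2C channels, which this sketch omits. The function name is hypothetical.

```python
from typing import List

FMap = List[List[List[float]]]  # H x W x C feature map as nested lists


def interlaced_downsample(x: FMap) -> FMap:
    """Gather the four interleaved sub-grids (even/odd rows x even/odd columns)
    and concatenate them on the channel axis: H x W x C -> H/2 x W/2 x 4C."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out: FMap = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # Order follows Swin Transformer patch merging: (even, even),
            # (odd, even), (even, odd), (odd, odd) in (row, column) terms.
            row.append(x[r][c] + x[r + 1][c] + x[r][c + 1] + x[r + 1][c + 1])
        out.append(row)
    return out
```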

For step S2023, the restoration module of the decoder of the segmentation network model performs upsampling: the image size is doubled and the number of channels is halved. The upsampling is implemented by deconvolution to recover details in pixel space and reconstruct the target.
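A minimal sketch of ×2 upsampling by deconvolution, assuming a transposed convolution with kernel size 2, stride 2 and no padding (the text specifies only the factor of 2 and the use of deconvolution). The weight layout (C_in, C_out, 2, 2) mirrors that of `torch.nn.ConvTranspose2d`; the function name is hypothetical. Choosing C_out = C_in / 2 gives the channel halving described above.

```python
from typing import List

Tensor3 = List[List[List[float]]]        # C x H x W (channels first)
Kernel = List[List[List[List[float]]]]   # C_in x C_out x 2 x 2


def deconv2x(x: Tensor3, weight: Kernel) -> Tensor3:
    """Transposed convolution, kernel 2, stride 2, no padding:
    C_in x H x W -> C_out x 2H x 2W. Each input pixel writes a weighted 2 x 2
    block, and because stride equals kernel size the blocks never overlap."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = len(weight[0])
    out = [[[0.0] * (2 * w) for _ in range(2 * h)] for _ in range(c_out)]
    for ci in range(c_in):
        for i in range(h):
            for j in range(w):
                v = x[ci][i][j]
                for co in range(c_out):
                    for a in range(2):
                        for b in range(2):
                            out[co][2 * i + a][2 * j + b] += v * weight[ci][co][a][b]
    return out
```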

For step S2024, take the structure in FIG. 2 as an example, which includes a bottommost encoding block, a third-layer encoding block, a second-layer encoding block and a topmost encoding block. After the current feature map is restored to obtain the restored feature, the downsampling feature of the encoding block one level above the bottommost encoding block (i.e., the third-layer encoding module) and the restored feature (produced by the bottommost decoding block) are input together into the Transformer of the current decoding block and fused to obtain an upsampling feature. The upsampling feature and the downsampling feature of the third encoding module then undergo further detail-feature fusion to obtain a fused feature.

Next, the decoding block one level above the current one (the third-layer decoding block) becomes the current decoding block, and the fused feature becomes the current feature map. The current feature map is input into the current decoding block and restored to obtain a restored feature. The downsampling feature of the second-layer encoding module and this restored feature are input together into the Transformer of the current decoding block and fused to obtain an upsampling feature, and detail-feature fusion of the upsampling feature with the downsampling feature of the second encoding module yields a fused feature.

Then the decoding block one level above (the topmost decoding block) becomes the current decoding block, and the fused feature becomes the current feature map. The current feature map is input into the current decoding block and restored to obtain a restored feature. The downsampling feature of the first-layer encoding module and the restored feature are input together into the Transformer of the current decoding block and fused to obtain an upsampling feature, and detail-feature fusion of the upsampling feature with the downsampling feature of the first encoding module yields the fused feature.

It should be noted that, following the schematic diagram in fig. 2, the bottommost decoding block is called the lowest decoding block and the decoding block one layer above it is called the third-layer decoding block. The naming continues in the same way up to the topmost decoding block, which is called the highest-layer decoding block. The number of decoding blocks can be chosen according to actual needs and is not limited herein. Preferably, this embodiment uses 4 decoding blocks.
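The top-down loop of step S2024 can be sketched as follows. The sketch assumes four coding blocks, as in fig. 2, with their downsampling features ordered from the topmost block to the bottommost block. The operators `restore`, `transformer_fuse` and `detail_fuse` are hypothetical parameters that stand in for the feature restoration, the Transformer fusion and the detail-feature fusion. The toy stand-ins at the bottom are used only to make the loop runnable. They are not the patented operators.

```python
from typing import Callable, List

import numpy as np

Array = np.ndarray


def decode(enc_feats: List[Array],
           restore: Callable[[Array], Array],
           transformer_fuse: Callable[[Array, Array], Array],
           detail_fuse: Callable[[Array, Array], Array]) -> List[Array]:
    """Top-down decoding loop (hypothetical helper names).

    enc_feats: downsampling features of the coding blocks, ordered from the
    topmost block to the bottommost block.
    Returns the fusion features produced at each decoding step.
    """
    current = enc_feats[-1]                     # bottommost features = current feature map
    fused_per_layer = []
    for skip in reversed(enc_feats[:-1]):       # third, second, then first layer
        restored = restore(current)             # feature restoration (2x size, 1/2 channels)
        up = transformer_fuse(skip, restored)   # upsampling features
        current = detail_fuse(up, skip)         # fusion features -> next current feature map
        fused_per_layer.append(current)
    return fused_per_layer


# Toy stand-ins used only to make the loop runnable.
def toy_restore(x: Array) -> Array:
    c, h, w = x.shape
    halved = x.reshape(c // 2, 2, h, w).mean(axis=1)       # halve the channels
    return halved.repeat(2, axis=1).repeat(2, axis=2)      # double H and W


def toy_fuse(a: Array, b: Array) -> Array:
    return a + b


rng = np.random.default_rng(0)
feats = [rng.standard_normal((32 * 2 ** k, 64 // 2 ** k, 64 // 2 ** k)) for k in range(4)]
outs = decode(feats, toy_restore, toy_fuse, toy_fuse)
print([o.shape for o in outs])  # [(128, 16, 16), (64, 32, 32), (32, 64, 64)]
```

Passing the operators in as parameters keeps the control flow of the loop separate from the design of each block. This matches the description, which fixes the order of the steps but not their internals.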

In step S2025, the above fusion processing is realized through skip connections.

Skip connections are added between the encoder and the decoder, so the input of each decoding module is the fusion of the output of the previous decoder layer and the output of the corresponding encoding module. As a result, the upsampling process obtains sufficient context information and semantic information at the same time.
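The document does not state which operator performs this fusion. The sketch below therefore shows one common choice under that assumption: channel concatenation followed by a 1x1 convolution that projects back to the original channel count. The function name `skip_fuse` and the projection weights `proj` are hypothetical. In practice the weights would be learned.

```python
import numpy as np


def skip_fuse(decoder_out: np.ndarray, encoder_out: np.ndarray,
              proj: np.ndarray) -> np.ndarray:
    """Fuse a decoder output with the matching encoder output.

    decoder_out, encoder_out: (C, H, W) feature maps of equal shape
    proj: (C, 2C) weights of a 1x1 convolution (hypothetical, learned in practice)
    """
    if decoder_out.shape != encoder_out.shape:
        raise ValueError("skip-connected maps must have the same shape")
    stacked = np.concatenate([decoder_out, encoder_out], axis=0)   # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", proj, stacked)                 # (C, H, W)


rng = np.random.default_rng(1)
dec = rng.standard_normal((32, 64, 64))
enc = rng.standard_normal((32, 64, 64))
w = rng.standard_normal((32, 64))
fused = skip_fuse(dec, enc, w)
print(fused.shape)  # (32, 64, 64)
```

Concatenation keeps the encoder's fine-grained detail and the decoder's semantic features in separate channels. The projection can then learn how to weight them. Element-wise addition is a cheaper alternative.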

The encoder and the decoder are both built on the Transformer and are linked by skip connections. During upsampling, the network can therefore recover information that was lost when the image size was reduced during downsampling, while still obtaining sufficient context and semantic information. This improves the segmentation accuracy of the segmentation network model on retinal blood vessel images.

In step S203, weighted averaging is performed on all the segmentation processing results to obtain the final segmentation result of the retinal blood vessel image.

In step S204, the retinal blood vessel image is input into the trained network model. The output of each layer's decoding module passes through a convolution module, which outputs that layer's predicted segmentation result. The invention takes a weighted average of the predictions of all layers to obtain the classification probability of each pixel, and from these probabilities it obtains the final segmentation result of the retinal blood vessel image.
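A minimal sketch of this weighted averaging is given below. It rests on several assumptions. Each layer's prediction is a square probability map whose side divides the largest side, and the maps are brought to a common size by nearest-neighbour upsampling. The layer weights are hypothetical values supplied by the caller, and a threshold of 0.5 turns the averaged probabilities into a vessel mask. The document does not specify the weights, the resizing method or the threshold.

```python
import numpy as np


def upsample_nearest(p: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a square (h, h) map to (size, size)."""
    factor = size // p.shape[0]
    return p.repeat(factor, axis=0).repeat(factor, axis=1)


def fuse_predictions(probs, weights, threshold=0.5):
    """Weighted average of per-layer vessel probability maps.

    probs:   list of square maps with values in [0, 1], one per decoding layer
    weights: one non-negative weight per layer (hypothetical values)
    Returns (per-pixel probability map, binary vessel mask).
    """
    size = max(p.shape[0] for p in probs)
    stacked = np.stack([upsample_nearest(p, size) for p in probs])   # (L, S, S)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise weights to sum to 1
    prob = np.tensordot(w, stacked, axes=1)           # per-pixel weighted average
    return prob, (prob >= threshold).astype(np.uint8)


coarse = np.full((2, 2), 0.9)   # hypothetical low-resolution layer prediction
fine = np.full((4, 4), 0.2)     # hypothetical full-resolution layer prediction
prob, mask = fuse_predictions([coarse, fine], weights=[1.0, 3.0])
print(prob.shape, int(mask.sum()))  # (4, 4) 0
```

Normalising the weights inside the function keeps the result a valid probability for any positive weights.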

When the trained segmentation network model is used to segment blood vessels in a retinal blood vessel image, the prediction results output by each layer's decoding module are averaged with weights to obtain the classification probability of each pixel. These probabilities serve as the final blood vessel segmentation result, which further improves the accuracy of retinal blood vessel image segmentation.

The retinal blood vessel image segmentation method provided by the embodiment of the invention comprises three steps. First, a retinal blood vessel image is obtained and input into a segmentation network model. Second, the encoding and decoding modules of the segmentation network model extract and fuse features of the retinal blood vessel image, and each pixel of the resulting feature map is classified to obtain at least one segmentation processing result. Third, all the segmentation processing results are averaged with weights to obtain the final segmentation result of the retinal blood vessel image. For this task, a Transformer-optimized network structure is provided. It is a U-shaped symmetric network built on the Transformer, which has a symmetric encoder-decoder structure and adds skip connections between the encoder and the decoder. The method achieves end-to-end segmentation of retinal blood vessel images, obtains accurate segmentation results on limited data sets, and improves the segmentation precision of retinal blood vessel images.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of each process should be determined by its function and inherent logic, and it does not limit the implementation of the embodiments of the present invention.

Fig. 7 shows a schematic block diagram of a retinal blood vessel image segmentation apparatus in one-to-one correspondence with the retinal blood vessel image segmentation method of the above-described embodiment. As shown in fig. 7, the retinal blood vessel image segmentation apparatus includes a retinal blood vessel image acquisition module 31, a segmentation module 32, and a segmentation result acquisition module 33. The functional modules are explained in detail as follows:

A retinal blood vessel image acquisition module 31, configured to obtain a retinal blood vessel image and input the retinal blood vessel image into the segmentation network model.

A segmentation module 32, configured to extract and fuse features of the retinal blood vessel image using the encoding module and the decoding module of the segmentation network model, and to classify each pixel of the feature map obtained after the feature extraction and fusion to obtain at least one segmentation processing result.

A segmentation result acquisition module 33, configured to perform weighted averaging on all the segmentation processing results to obtain the final segmentation result of the retinal blood vessel image.

In one embodiment, the retinal blood vessel image segmentation apparatus further includes the following modules, which operate before the retinal blood vessel image acquisition module 31:

A training image acquisition module, configured to acquire a training image.

A data enhancement module, configured to perform data enhancement on the training image to obtain an enhanced image.

A training sample acquisition module, configured to take the training image and the enhanced image as training samples.

A preprocessing module, configured to preprocess the training samples to obtain preprocessed training samples.

A model training module, configured to input the preprocessed training samples into an initial segmentation network model for training to obtain the segmentation network model.

In one embodiment, the data enhancement module further comprises:

A histogram stretching unit, configured to perform data enhancement on the training image by histogram stretching to obtain an enhanced image.
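The document does not define the stretching function, so the following sketch assumes plain linear contrast stretching. Intensities between a low and a high percentile are mapped linearly onto 0..255 and values outside that range are clipped. Each colour channel is stretched independently. The default 1st and 99th percentiles and the function name `histogram_stretch` are illustrative assumptions.

```python
import numpy as np


def histogram_stretch(img: np.ndarray, low_pct: float = 1.0,
                      high_pct: float = 99.0) -> np.ndarray:
    """Linearly stretch intensities so the [low_pct, high_pct] percentiles span 0..255.

    img: uint8 image, grayscale (H, W) or colour (H, W, 3); colour channels
    are stretched independently. The percentile limits are assumptions.
    """
    x = img.astype(np.float64)
    if x.ndim == 2:
        x = x[..., None]
    out = np.empty_like(x)
    for c in range(x.shape[-1]):
        lo, hi = np.percentile(x[..., c], [low_pct, high_pct])
        if hi <= lo:                       # flat channel: nothing to stretch
            out[..., c] = x[..., c]
            continue
        out[..., c] = (x[..., c] - lo) / (hi - lo) * 255.0
    out = np.clip(out, 0, 255).round().astype(np.uint8)
    return out[..., 0] if img.ndim == 2 else out


sample = np.array([[50, 100], [150, 200]], dtype=np.uint8)
print(histogram_stretch(sample, 0.0, 100.0).tolist())  # [[0, 85], [170, 255]]
```

Percentile limits keep a few very bright or very dark pixels, such as the optic disc or the image border, from compressing the contrast of the vessels.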

In one embodiment, the preprocessing module further comprises:

A preprocessing unit, configured to apply, in sequence, Gaussian transformation, random rotation, random horizontal flipping and color correction to the training image to obtain a preprocessed training sample.
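The sketch below applies the four steps in the stated order. It is one possible reading and rests on several assumptions. "Gaussian transformation" is read as additive Gaussian noise, although a Gaussian blur would be an equally plausible reading. The rotation is restricted to multiples of 90 degrees so that no interpolation is needed. "Color correction" is modelled as gamma correction. The function name and the parameter values are hypothetical. The geometric steps are applied to the vessel label as well, so that image and label stay aligned.

```python
import numpy as np


def preprocess(img, mask, rng, noise_std=0.01, gamma=1.2):
    """Apply the four preprocessing steps in order to an image and its label.

    img:  float image with values in [0, 1], shape (H, W, 3)
    mask: vessel label, shape (H, W)
    Parameter values and the concrete meaning of each step are assumptions.
    """
    # 1) "Gaussian transformation", read here as additive Gaussian noise.
    img = np.clip(img + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)
    # 2) Random rotation by a multiple of 90 degrees (no interpolation needed).
    k = int(rng.integers(4))
    img, mask = np.rot90(img, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))
    # 3) Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img, mask = img[:, ::-1], mask[:, ::-1]
    # 4) Colour correction, modelled here as a simple gamma correction.
    img = img ** (1.0 / gamma)
    return np.ascontiguousarray(img), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
label = (rng.random((64, 64)) > 0.9).astype(np.uint8)
out_img, out_mask = preprocess(image, label, rng)
print(out_img.shape, out_mask.shape)  # (64, 64, 3) (64, 64)
```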

In one embodiment, the segmentation module 32 further comprises:

A downsampling unit, configured to downsample the retinal blood vessel image using the multi-layer coding blocks of the encoder of the segmentation network model, obtaining the downsampling features corresponding to each layer of coding blocks.

A current coding block acquisition unit, configured to take the downsampling features of the bottommost coding block of the segmentation network model as the current feature map, and to take the decoding block in the decoder of the segmentation network model that corresponds to the current feature map as the current decoding block.

A feature restoration unit, configured to input the current feature map into the current decoding block and perform feature restoration on the current feature map based on the current decoding block to obtain restoration features.

A fusion unit, configured to fuse the restoration features with the downsampling features of the coding block one layer above the one corresponding to the current feature map to obtain upsampling features, and to fuse the upsampling features with those same downsampling features to obtain fusion features.

A loop unit, configured to take the decoding block one layer up from the current decoding block as the new current decoding block and the fusion features as the new current feature map, and then to return to the step of inputting the current feature map into the current decoding block and performing feature restoration on it to obtain restoration features. The loop repeats until the current decoding block is the highest-layer decoding block of the segmentation network model.

In one embodiment, the down-sampling unit further comprises:

A downsampling feature acquisition unit, configured to downsample the retinal blood vessel image using the merging module and the Transformer module in each of the multi-layer coding blocks of the encoder of the segmentation network model, obtaining the downsampling features corresponding to each layer of coding blocks.
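The document does not define the merging module. One widely used option in Transformer-based encoders is the patch merging of the Swin Transformer, which concatenates each 2x2 neighbourhood of tokens (C to 4C channels) and applies a linear layer from 4C to 2C. This halves the spatial size and doubles the channels. The sketch below assumes that design. It omits the layer normalization that Swin applies before the linear layer and leaves out the Transformer module itself. The names `patch_merge` and `proj` are hypothetical.

```python
import numpy as np


def patch_merge(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Swin-style patch merging: (H, W, C) -> (H/2, W/2, 2C).

    proj: (4C, 2C) linear weights (hypothetical; learned in practice).
    Layer normalization before the projection is omitted for brevity.
    """
    h, w, _ = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    x0 = x[0::2, 0::2]    # top-left pixel of every 2x2 window
    x1 = x[1::2, 0::2]    # bottom-left
    x2 = x[0::2, 1::2]    # top-right
    x3 = x[1::2, 1::2]    # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)   # (H/2, W/2, 4C)
    return merged @ proj                                  # (H/2, W/2, 2C)


rng = np.random.default_rng(2)
x = rng.standard_normal((32, 32, 48))     # hypothetical token grid
proj = rng.standard_normal((192, 96))     # 4C -> 2C
y = patch_merge(x, proj)
print(y.shape)  # (16, 16, 96)
```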

For specific limitations of the retinal blood vessel image segmentation device, reference may be made to the above limitations of the retinal blood vessel image segmentation method, which are not repeated here. The modules in the device can be implemented wholly or partially by software, hardware or a combination thereof. The modules can be embedded in, or independent of, a processor of the computer device in hardware form. They can also be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 8, fig. 8 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43, which are communicatively connected to each other via a system bus. It should be noted that only the computer device 4 with components 41 to 43 is shown. It should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device and the like.

The present application further provides another embodiment, which is to provide a computer-readable storage medium storing an interface display program, which is executable by at least one processor to cause the at least one processor to execute the steps of the retinal blood vessel image segmentation method as described above.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application. The appended drawings illustrate preferred embodiments of the application and do not limit its scope. This application can be embodied in many different forms, and these embodiments are provided so that the disclosure of the application will be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their features with equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims

1. A retinal blood vessel image segmentation method is characterized by comprising the following steps:

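obtaining a retinal blood vessel image, and inputting the retinal blood vessel image into a segmentation network model;

based on the coding module and the decoding module of the segmentation network model, performing feature extraction and fusion on the retinal blood vessel image, and classifying each pixel in a feature map obtained after the feature extraction and fusion to obtain at least one segmentation processing result;

and performing weighted average processing on all the segmentation processing results to obtain a final segmentation result of the retinal blood vessel image.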

2. The retinal blood vessel image segmentation method according to claim 1, wherein before the obtaining of the retinal blood vessel image and the inputting of the retinal blood vessel image into the segmentation network model, the method comprises:

acquiring a training image;

performing data enhancement processing on the training image to obtain an enhanced image;

taking the training image and the enhanced image as training samples;

preprocessing the training sample to obtain a preprocessed training sample;

and inputting the preprocessed training sample into an initial segmentation network model for training to obtain a segmentation network model.

3. The method of claim 2, wherein the step of performing data enhancement processing on the training image to obtain an enhanced image comprises:

performing data enhancement processing on the training image by histogram stretching to obtain an enhanced image.

4. The method of claim 2, wherein the preprocessing the training samples to obtain preprocessed training samples comprises:

sequentially performing Gaussian transformation, random rotation, random horizontal flipping and color correction on the training image to obtain a preprocessed training sample.

5. The method according to any one of claims 1 to, wherein the specific steps of feature extraction and fusion of the retinal vessel images based on the coding module and the decoding module of the segmentation network model comprise:

based on a plurality of layers of coding blocks corresponding to an encoder of a segmentation network model, performing down-sampling processing on the retinal blood vessel image to obtain down-sampling features corresponding to each layer of the coding blocks;

taking down-sampling features corresponding to the bottommost coding block of the segmentation network model as a current feature map, and taking a decoding block in a decoder of the segmentation network model corresponding to the current feature map as a current decoding block;

inputting the current feature map into the current decoding block;

performing feature restoration processing on the current feature map based on the current decoding block to obtain restoration features;

fusing the downsampling features in the coding blocks at the upper layer corresponding to the current feature map with the restoration features to obtain upsampling features, and fusing the upsampling features with the downsampling features in the coding blocks at the upper layer corresponding to the current feature map to obtain fusion features;

and taking the previous layer decoding block corresponding to the current decoding block as the current decoding block, taking the fusion feature as a current feature map, returning to the step of inputting the current feature map into the current decoding block, and continuing to execute until the current decoding block is the highest layer decoding block of the segmentation network model.

6. The method of claim 5, wherein the step of down-sampling the retinal blood vessel image based on the multiple layers of coding blocks corresponding to the encoder of the segmentation network model to obtain the down-sampling features corresponding to each layer of the coding blocks comprises:

based on a merging module and a Transformer module in the multi-layer coding blocks corresponding to the encoder of the segmentation network model, performing down-sampling processing on the retinal blood vessel image to obtain down-sampling features corresponding to each layer of the coding blocks.

7. A retinal blood vessel image segmentation device based on Transformer optimization is characterized by comprising:

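the retinal blood vessel image acquisition module is used for acquiring a retinal blood vessel image and inputting the retinal blood vessel image into a segmentation network model;

the segmentation module is used for extracting and fusing the features of the retinal blood vessel image based on the coding module and the decoding module of the segmentation network model, and classifying each pixel in the feature map obtained after the features are extracted and fused to obtain at least one segmentation processing result;

the segmentation result acquisition module is used for performing weighted average processing on all the segmentation processing results to obtain a final segmentation result of the retinal blood vessel image.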

8. The retinal blood vessel image segmentation apparatus according to claim 7, wherein before the segmentation result acquisition module, the apparatus further includes:

the training image acquisition module is used for acquiring a training image;

the data enhancement module is used for carrying out data enhancement processing on the training image to obtain an enhanced image;

a training sample obtaining module, configured to use the training image and the enhanced image as training samples;

the preprocessing module is used for preprocessing the training sample to obtain a preprocessed training sample;

and the model training module is used for inputting the preprocessed training sample into an initial segmentation network model for training to obtain the segmentation network model.

9. A computer device comprising a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor implements the retinal blood vessel image segmentation method according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the retinal blood vessel image segmentation method according to any one of claims 1 to 6.