CN115294075A - OCTA image retinal vessel segmentation method based on attention mechanism - Google Patents


Info

Publication number
CN115294075A
Authority
CN
China
Prior art keywords
convolution
attention
layer
segmentation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210960639.9A
Other languages
Chinese (zh)
Inventor
崔少国
文浩
张宇楠
柳耘豪
杨泽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Normal University
Original Assignee
Chongqing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Normal University filed Critical Chongqing Normal University
Priority to CN202210960639.9A priority Critical patent/CN115294075A/en
Publication of CN115294075A publication Critical patent/CN115294075A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an OCTA image retinal vessel segmentation method based on an attention mechanism, comprising the following steps: building a convolutional neural network segmentation model with an attention mechanism, training the model and optimizing its parameters, and rapidly locating and accurately segmenting the OCTA-based retinal vascular structure, wherein the convolutional neural network segmentation model is composed of trunk feature extractors of different scales, a structural feature extractor, an enhanced feature extractor and a classifier. The method builds a deep learning hybrid model: during segmentation, features are extracted with depthwise separable convolution layers using large convolution kernels, a new down-sampling scheme reduces the loss of important vessel information at the image border, and a spatial-channel attention module (STAM) is introduced to better learn the spatial and channel information of feature maps of different scales and to capture the complete vessel structure, realizing rapid and accurate segmentation of OCTA retinal vessel images.

Description

OCTA image retinal vessel segmentation method based on attention mechanism
Technical Field
The invention relates to the technical field of fully automatic semantic segmentation of medical images, in particular to an OCTA image retinal vessel segmentation method based on an attention mechanism.
Background
The eye is an important organ through which humans perceive the world, and retinal diseases are a significant threat to human health. Medical image segmentation is an important processing step in intelligent computer-aided diagnosis and can help doctors perform image-guided intervention or more effective radiological diagnosis. Many fundus lesions occur around blood vessels, and retinal fundus images are rich in vascular features. Analyzing structural characteristics of retinal vessels such as length, width, curvature and bifurcation pattern yields clinical case features of fundus disease and is of great significance for preventing and treating related diseases. In clinical medicine, retinal vessel extraction can assist doctors in diagnosing whether a patient has an eye-related disease through a series of analyses of the retinal vessels. The conventional retinal vessel image is the common color fundus image; optical coherence tomography angiography (OCTA) is a new imaging modality built on optical coherence tomography (OCT) and is an emerging non-invasive technique that can visualize vessel information in different retinal layers. OCTA is therefore becoming one of the important tools for observing fundus-related diseases.
In recent years, convolutional neural networks, which combine low-level features into abstract deep features, have performed various visual tasks ever better. Deep learning solves such problems by learning from data: early retinal vessel segmentation algorithms took the fully convolutional network as their core and aimed at better recovering the information lost during convolutional down-sampling. Three lines of design developed later: first, U-shaped encoder-decoder symmetric structures represented by U-net; second, designs introducing dilated (atrous) convolution, represented by DeepLab; finally, self-attention structures based on the self-attention of the Transformer encoder. Retinal vessel data sets generally contain few samples, and both segmentation networks based on self-attention computation and the DeepLab series need large amounts of training data to achieve good segmentation results. Therefore, only networks improved on the basis of U-net can achieve a good segmentation effect when the number of data set samples is small.
Although methods improved from U-net have been applied to vessel segmentation of ordinary color fundus images with good results, the inventors of the present application found through research that work on extracting blood vessels from OCTA fundus images is still relatively scarce, and that current deep-learning-based OCTA retinal vessel segmentation has the following shortcomings: (1) the scale of retinal vessels varies greatly; tiny capillaries can be as narrow as 1-2 pixels, and vessel ends are easily confused with the background, so that vessel ends are misclassified as background; (2) retinal vessels have complex tree-like structures with bifurcations and crossings, irregular shapes and uneven distribution, causing vessel discontinuity and breakage during segmentation; (3) lesions such as microaneurysms and exudates at the edges of some retinal vessels also affect the segmentation result; (4) while the network extracts semantic information at different scales, the changing feature-map size causes vessels at the image border to be misclassified and missed; (5) the depth and width of the networks are relatively large, resulting in relatively slow segmentation.
Disclosure of Invention
Aiming at the technical problems of existing deep-learning-based OCTA retinal vessel segmentation, namely loss of important vessel information, slow segmentation speed, inaccurate segmentation of vessel-end regions, incompletely captured vessel structures, and discontinuous, broken vessels in the segmentation results, the invention provides an OCTA image retinal vessel segmentation method based on an attention mechanism.
In order to solve the technical problems, the invention adopts the following technical scheme:
an OCTA image retinal vessel segmentation method based on an attention mechanism comprises the following steps:
s1, building a convolutional neural network segmentation model with an attention mechanism:
S11, the convolutional neural network segmentation model with an attention mechanism consists of trunk feature extractors of different scales, a structural feature extractor, an enhanced feature extractor and a classifier, wherein the trunk feature extractors of different scales down-sample the retinal vessel feature map of the OCTA fundus image four times to obtain four vessel feature maps whose sizes are 1/4, 1/16, 1/64 and 1/256 of the input feature map, extract vessel detail features from the four feature maps of different scales, and connect them serially in order of feature-map size from large to small; the structural feature extractor accurately extracts vessel structure features from the 1/4 and 1/16 vessel feature maps produced by the trunk feature extractor before these are channel-concatenated with the corresponding-scale feature maps in the enhanced feature extractor; the enhanced feature extractor up-samples the 1/256 high-order feature map produced by the trunk feature extractor, gradually restoring the feature-map size through the ratios 1/64, 1/16, 1/4 and 1/1 while extracting vessel detail features at each scale; the classifier assigns a label to each pixel according to the vessel features extracted from the feature maps of different scales; the input of the segmentation network has three channels and the output has two channels, the input and output images are both 512 × 512 in size, and end-to-end semantic segmentation is realized;
S12, the trunk feature extractors of different scales comprise ten convolution layer groups, two convolution layers and four adjacent-block merging layers, with one adjacent-block merging layer after every two convolution layer groups and the two convolution layers located between the last two convolution layer groups; each convolution layer group consists of a channel-by-channel convolution layer with a 7 × 7 convolution kernel and stride 1 and a point-by-point convolution layer with a 1 × 1 convolution kernel and stride 1; the structural feature extractor comprises two attention layers, each containing a channel attention submodule and a spatial attention submodule; the enhanced feature extractor comprises eight convolution layer groups and four up-sampling layers, with two convolution layer groups after each up-sampling layer, each convolution layer group again consisting of a 7 × 7, stride-1 channel-by-channel convolution layer and a 1 × 1, stride-1 point-by-point convolution layer; the classifier consists of a class prediction layer and a softmax regression layer, the softmax regression layer converting class prediction scores into a probability distribution;
s2, model training and parameter optimization:
s21, initializing the convolution neural network segmentation model parameters with the attention mechanism built in the step S1 by adopting an Xavier method;
S22, preprocessing and online-enhancing the retinal vessel data set with retinal vessel segmentation labels, dividing the data set into a training set, a validation set and a test set in a 7:2:1 ratio, and pre-training the network segmentation model with 10-fold cross-validation;
s23, inputting the OCTA images of the same retinal vessel section into a network, and generating a retinal vessel segmentation result through network forward calculation, wherein the network forward calculation comprises convolution operation, nonlinear excitation, probability value conversion and multi-head attention calculation;
S24, adopting the categorical cross-entropy loss function as the optimization target of the segmentation network, the objective function being defined as follows:
$$J(\theta') = -\frac{1}{S}\sum_{s=1}^{S}\sum_{c=1}^{C} Y'_{s,c}\log Y_{s,c} + \frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$$
where θ' is the classification network parameter, Y' is the segmentation label, Y is the predicted probability, S is the number of image pixels, C is the number of pixel classes, $\frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$ is the regularization term, λ is the regularization factor, and Q is the number of model parameters;
s25, optimizing a function:
optimizing an objective function by adopting a stochastic gradient descent algorithm, and updating parameters of the retinal vessel segmentation network model by using error back propagation, wherein the optimization process comprises the following steps:
$$g_t = \nabla_{\theta} L(\theta_{t-1})$$
$$m_t = \mu\, m_{t-1} + \eta_t\, g_t$$
$$\theta_t = \theta_{t-1} - m_t$$
where t denotes the iteration number, $\nabla_{\theta}$ denotes the gradient operator, θ corresponds to θ' in the objective function of step S24, L(θ_{t-1}) is the loss function evaluated with the network parameters θ_{t-1}, g_t, m_t and μ are the gradient, momentum and momentum coefficient respectively, and η_t is the learning rate;
s3, fast positioning and accurate segmentation of the retinal vascular structure based on OCTA:
s31, sequentially carrying out data preprocessing and online enhancement processing on the fundus images based on the OCTA to obtain processed images;
S32, inputting the online-enhanced image as a three-channel input into the feature extractor consisting of the trunk feature extractor, the structural feature extractor and the enhanced feature extractor for feature extraction, automatically locating and outputting the reconstructed retinal vessel image feature map;
s33, inputting the reconstructed retinal blood vessel image feature map into a classifier, and predicting feature image pixels one by one in a sliding window mode to generate two pixel label prediction value maps with the same size as the original image;
s34, converting the prediction score into probability distribution by using a softmax function;
and S35, taking the subscript component where the maximum probability of each pixel is located as a pixel class label, and obtaining a retina blood vessel segmentation result binary image while realizing rapid positioning of a blood vessel structure.
Further, in the trunk feature extractors of different scales in step S12, each of the two convolution layers has a 1 × 1 convolution kernel and a stride of 1, with 1024 and 512 convolution kernels respectively, and each adjacent-block merging layer consists of a convolution layer with a 2 × 2 convolution kernel and a stride of 2 followed by a convolution layer with a 1 × 1 convolution kernel and a stride of 1.
Further, in the structural feature extractor of step S12, the channel attention submodule comprises, connected in sequence, a global max pooling layer, a global average pooling layer, a depthwise separable convolution layer with a 1 × 2 convolution kernel and two convolution layers with 1 × 1 convolution kernels; the spatial attention submodule comprises, connected in sequence, a convolution layer with a 1 × 1 convolution kernel, a multi-head self-attention calculation layer, a fully connected layer and a convolution layer with a 1 × 1 convolution kernel.
Further, each up-sampling layer in the enhanced feature extractor of step S12 is a deconvolution layer with a convolution kernel size of 3 × 3 and a stride of 2.
Further, the size of the convolution kernel of the class prediction layer in the classifier of step S12 is 1 × 1, the number of convolution kernels is 2, and the step size is 1.
Further, the data preprocessing in step S22 uniformly resizes the images to 512 × 512 pixels before they are fed to the network, and the online enhancement applies horizontal flipping, vertical flipping, cropping, and rotation by 45°, 90°, 135°, 180°, 225°, 270° and 315°, increasing the training samples to ten times the original.
Further, the convolution operation in step S23 is: the output feature map Z_i corresponding to any convolution kernel in the network is calculated with the following formula:
$$Z_i = f\left(\sum_{r=1}^{k} W_{ir} * X_r + b_i\right)$$
where f denotes the nonlinear excitation function, r denotes the input channel index, k denotes the number of input channels, W_{ir} denotes the r-th channel weight matrix of the i-th convolution kernel, b_i is the convolution bias, and X_r denotes the r-th input channel image.
Further, the nonlinear excitation in step S23 is: the rectified linear unit ReLU is used as the nonlinear excitation function to nonlinearly transform the output feature map Z_i generated by the convolution kernel; the rectified linear unit ReLU is defined as follows:
f(x) = max(0, x)
where f(x) is the rectified linear unit function, max takes the maximum value, and x is the input value.
Further, the probability value conversion in step S23 is: the prediction scores output by the network are converted into a probability distribution using the softmax function, defined as follows:
$$Y_j = \frac{e^{O_j}}{\sum_{k=1}^{K} e^{O_k}}$$
where Y_j is the probability that a pixel belongs to class j, O_j is the prediction score of the pixel on the j-th class, and K is the number of classes.
Further, in the multi-head attention calculation of step S23, the multi-head self-attention mechanism performs its attention computation only once, in parallel across heads, and finally combines the results; it is implemented with the scaled dot-product attention operation, defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Attention is the attention function, Q, K and V are the input matrices of the scaled dot-product attention, and d_k is the dimensionality of the K matrix; Q is multiplied by the transpose of K, scaled, passed through the softmax function, and then multiplied by V to obtain the final self-attention output.
Compared with the prior art, the OCTA image retinal vessel segmentation method based on the attention mechanism has the following beneficial effects:
1. Each convolution layer group in the trunk feature extractor and the enhanced feature extractor consists of a channel-by-channel convolution layer and a point-by-point convolution layer, so the retinal vessel feature extraction stage uses depthwise separable convolution based on large convolution kernels; this extracts vessel detail features better, reduces the number of vessel-end pixels misclassified as background, lowers the computational cost of the convolution operations, and increases segmentation speed;
2. In the down-sampling stage, the trunk feature extractor adopts adjacent-block merging, capturing detail information around the feature-map border while shrinking the feature map, reducing the loss of important information such as vessels at the border of feature maps of different scales, and improving segmentation accuracy;
3. A convolutional self-attention module (STAM) is introduced into the structural feature extractor to better learn the spatial and channel information of feature maps of different scales and to capture the complete vessel structure, avoiding vessel discontinuity and breakage in the retinal vessel segmentation result and improving segmentation precision.
Drawings
Fig. 1 is a schematic structural diagram of the OCTA image retinal vessel segmentation network based on an attention mechanism provided by the invention.
FIG. 2 is a schematic diagram of the depthwise separable convolution operation based on a large convolution kernel according to the present invention.
Fig. 3 is a schematic diagram of a neighboring block merging operation provided by the present invention.
Fig. 4a is a schematic diagram of the overall structure of the STAM module provided by the present invention.
FIG. 4b is a schematic diagram of a channel attention submodule structure in the STAM module provided by the present invention.
Fig. 4c is a schematic structural diagram of a spatial attention submodule in the STAM module provided in the present invention.
Fig. 5 is a schematic diagram of an embedded jump connection structure of a STAM module according to the present invention.
Fig. 6 is a schematic diagram of an original image and its manually segmented ground-truth label provided by the present invention.
Detailed Description
In order to make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to the specific drawings.
Referring to fig. 1, the present invention provides an attention-based method for segmenting retinal blood vessels in an OCTA image, including the following steps:
s1, building a convolutional neural network segmentation model with an attention mechanism:
S11, the convolutional neural network segmentation model with an attention mechanism processes the input OCTA fundus image and consists of trunk feature extractors of different scales (numbers 1-16 in Table 1), a structural feature extractor (numbers 29-30 in Table 1), an enhanced feature extractor (numbers 17-27 in Table 1) and a classifier (number 31 in Table 1). The trunk feature extractors of different scales extract different contextual features of the retinal vessels from feature maps of different scales: they down-sample the retinal vessel feature map of the OCTA fundus image four times to obtain four vessel feature maps whose sizes are 1/4, 1/16, 1/64 and 1/256 of the input feature map, extract vessel detail features from these four feature maps, and connect them serially in order of feature-map size from large to small. The structural feature extractor accurately extracts retinal vessel structure features: it operates on the 1/4 and 1/16 vessel feature maps produced by the trunk feature extractor before these are channel-concatenated with the corresponding-scale feature maps in the enhanced feature extractor. The enhanced feature extractor performs further vessel feature extraction at different scales while restoring the retinal vessel feature map from the 1/256 ratio to the 1/1 ratio: it up-samples the 1/256 high-order feature map produced by the trunk feature extractor, gradually restoring the feature-map size through the ratios 1/64, 1/16, 1/4 and 1/1 while extracting vessel detail features at each scale, and after each up-sampling it is skip-connected along the channel dimension with the feature maps produced by the trunk feature extractor and the structural feature extractor, so that the numbers of input channels of the layers numbered 18, 21, 24 and 27 in Table 1 are 1024, 512, 256 and 128 respectively; once the feature map is restored to the 1/1 ratio, pixel classes can be predicted accurately. The classifier assigns a label to each pixel according to the vessel features extracted from the feature maps of different scales. The input of the segmentation network has three channels, representing the RGB color channels of the OCTA fundus image; the output has two channels, representing the probabilities that a pixel belongs to the vessel region and to the non-vessel region (background); the input and output images have the same size, 512 × 512, so end-to-end semantic segmentation is realized;
S12, the trunk feature extractors of different scales comprise ten convolution layer groups, two convolution layers and four adjacent-block merging layers, with one adjacent-block merging layer after every two convolution layer groups and the two convolution layers located between the last two convolution layer groups; each convolution layer group consists of a channel-by-channel convolution layer with a 7 × 7 kernel and stride 1 and a point-by-point convolution layer with a 1 × 1 kernel and stride 1; each of the two convolution layers has a 1 × 1 kernel and stride 1, with 1024 and 512 kernels respectively. The depthwise separable convolution operation based on a large kernel is shown in fig. 2: with n channels before the operation, each channel is first convolved channel-by-channel with one of n kernels of size 7 × 7, the results are concatenated along the channel dimension, and a point-by-point convolution with m kernels of size 1 × 1 then produces the final m output channels; the specific values of m and n are the numbers of convolution kernels of each convolution layer group in Table 1. Each adjacent-block merging layer consists of a convolution layer with a 2 × 2 kernel and stride 2 and a convolution layer with a 1 × 1 kernel and stride 1; the merging operation is shown in fig. 3, where red, yellow, blue and green represent four adjacent blocks and the numbers 1, 2, 3, 4 inside the blocks are the index values of corresponding positions within the adjacent blocks. The structural feature extractor comprises two attention layers (STAM modules), each containing a channel attention submodule and a spatial attention submodule, as shown in fig. 4a; the channel attention submodule comprises, connected in sequence, a global max pooling layer, a global average pooling layer, a depthwise separable convolution layer with a 1 × 2 kernel and two convolution layers with 1 × 1 kernels, as shown in fig. 4b; the spatial attention submodule comprises, connected in sequence, a convolution layer with a 1 × 1 kernel, a multi-head self-attention calculation layer, a fully connected layer and a convolution layer with a 1 × 1 kernel, as shown in fig. 4c; a schematic of how the structural feature extractor is embedded is shown in fig. 5. The enhanced feature extractor comprises eight convolution layer groups and four up-sampling layers, with two convolution layer groups after each up-sampling layer; each convolution layer group consists of a 7 × 7, stride-1 channel-by-channel convolution layer and a 1 × 1, stride-1 point-by-point convolution layer, and each up-sampling layer is a deconvolution layer with a 3 × 3 kernel and stride 2. The classifier consists of a class prediction layer and a softmax regression layer; the softmax regression layer converts class prediction scores into a probability distribution, and the class prediction layer has 1 × 1 kernels, 2 kernels in number, and stride 1. The parameters of the convolutional neural network segmentation model with an attention mechanism are listed in Table 1 below.
Table 1 Parameter table of the convolutional neural network segmentation model with an attention mechanism
[Table 1 appears as an image in the original publication; its layer-by-layer parameters (layers numbered 1-31) are not reproduced here.]
In the above table, Padding = 3 is used with a convolution kernel size of 7 × 7, Padding = 1 with a kernel size of 3 × 3, and Padding = 0 with a kernel size of 1 × 1.
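As an illustrative sketch only (the patent provides no reference code, and details such as normalization or activation placement are not recoverable from Table 1, so those choices are our assumption), the convolution layer group and adjacent-block merging layer of step S12 might be expressed in PyTorch as follows; all module and variable names are our own:

```python
import torch
import torch.nn as nn

class ConvLayerGroup(nn.Module):
    """One convolution layer group: a 7x7 channel-by-channel (depthwise)
    convolution with stride 1 followed by a 1x1 point-by-point convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Padding=3 keeps the spatial size, matching the note under Table 1
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=7, stride=1,
                                   padding=3, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1)
        self.act = nn.ReLU(inplace=True)  # activation choice assumed, per step S23

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class AdjacentBlockMerging(nn.Module):
    """Adjacent-block merging (down-sampling): a 2x2 stride-2 convolution
    followed by a 1x1 stride-1 convolution, halving each spatial dimension."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.merge = nn.Conv2d(in_ch, in_ch, kernel_size=2, stride=2)
        self.project = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1)

    def forward(self, x):
        return self.project(self.merge(x))

x = torch.randn(1, 3, 512, 512)        # three-channel OCTA input
y = AdjacentBlockMerging(64, 128)(ConvLayerGroup(3, 64)(x))
print(y.shape)                          # torch.Size([1, 128, 256, 256])
```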
S2, model training and parameter optimization:
s21, initializing the convolution neural network segmentation model parameters with the attention mechanism built in the step S1 by adopting an Xavier method;
S22, preprocessing and online-enhancing the retinal vessel data set with retinal vessel segmentation labels, dividing the data set into a training set, a validation set and a test set in a 7:2:1 ratio, and pre-training the network segmentation model with 10-fold cross-validation. As a specific implementation, the inventors of the present application obtained data of 300 patients with pixel-level segmentation labels; specifically, the data set adopted is the segmentation data set in the OCTA_6M subset of the OCTA-500 retinal blood vessel data set publicly released on IEEE DataPort by its authors. The original images selected are whole-eye OCTA images, and the labels are the retinal vessel segmentation labels (manually segmented ground truth); there are 300 original images and 300 ground-truth labels, an example of which is shown in fig. 6. The data set is divided by random sampling into a training set, a validation set and a test set in a 7:2:1 ratio, containing 210, 60 and 30 images respectively. To unify the input sizes of the different networks in the subsequent comparative experiments, the images are uniformly resized to 512 × 512 pixels before being fed to the network, as shown in table 2 below:
Table 2 Data set distribution

Subset            Number of images
Training set      210
Validation set    60
Test set          30
Total             300
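A minimal sketch of the 7:2:1 random-sampling split described above (the file paths are hypothetical):

```python
import random

paths = [f"OCTA_6M/{i:03d}.png" for i in range(1, 301)]  # 300 hypothetical image paths
random.seed(0)
random.shuffle(paths)                                    # random sampling
train, val, test = paths[:210], paths[210:270], paths[270:]  # 210 / 60 / 30 split
```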
During training, a small amount of training data is one of the important factors causing model under-fitting. Because OCTA images are scarce and the OCTA-500 retinal vessel segmentation data set is small, online data enhancement with horizontal flipping, vertical flipping, cropping, and rotation by 45°, 90°, 135°, 180°, 225°, 270° and 315° is used to reduce the effect of under-fitting, increasing the training samples to ten times the original.
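This online enhancement could be sketched with torchvision functional transforms as follows (the function name is ours, and exactly how the cropping step is counted toward the tenfold increase is our assumption):

```python
import torchvision.transforms.functional as TF

def enhance_online(image, label):
    """Return the original plus augmented (image, label) pairs:
    horizontal flip, vertical flip, and rotations by 45..315 degrees,
    giving ten samples per original; the cropping step is omitted here."""
    pairs = [(image, label),
             (TF.hflip(image), TF.hflip(label)),
             (TF.vflip(image), TF.vflip(label))]
    for angle in (45, 90, 135, 180, 225, 270, 315):
        pairs.append((TF.rotate(image, angle), TF.rotate(label, angle)))
    return pairs
```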
S23, inputting the OCTA image of the same retinal vessel section into a network, and generating a retinal vessel segmentation result through network forward calculation, wherein the network forward calculation comprises the following steps:
Convolution operation: the output feature map Z_i corresponding to any convolution kernel in the network is calculated with the following formula:
$$Z_i = f\left(\sum_{r=1}^{k} W_{ir} * X_r + b_i\right)$$
where f denotes the nonlinear excitation function, r denotes the input channel index, k denotes the number of input channels, W_{ir} denotes the r-th channel weight matrix of the i-th convolution kernel, b_i is the convolution bias, and X_r denotes the r-th input channel image.
Nonlinear excitation: the rectified linear unit ReLU, the activation function of the network, is used as the nonlinear excitation function f to nonlinearly transform the output feature map Z_i generated by each convolution kernel; the rectified linear unit ReLU is defined as follows:
f(x) = max(0, x)
where f(x) is the rectified linear unit function, max takes the maximum value, and x is the input value.
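The per-kernel computation Z_i = f(Σ_r W_ir * X_r + b_i) with ReLU excitation can be made concrete with explicit tensors (all names are ours, dimensions arbitrary):

```python
import torch
import torch.nn.functional as F

k, H, W = 3, 8, 8
X = torch.randn(1, k, H, W)    # input channel images X_r, r = 1..k
W_i = torch.randn(1, k, 3, 3)  # channel weight matrices W_ir of kernel i
b_i = torch.zeros(1)           # convolution bias b_i

# Z_i = f( sum_r W_ir * X_r + b_i ), with f = ReLU
Z_i = F.relu(F.conv2d(X, W_i, bias=b_i, padding=1))
print(Z_i.shape)               # torch.Size([1, 1, 8, 8])
```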
Probability value conversion: the prediction scores output by the network are converted into a probability distribution using the softmax function, defined as follows:
$$Y_j = \frac{e^{O_j}}{\sum_{k=1}^{K} e^{O_k}}$$
where Y_j is the probability that a pixel belongs to class j, O_j is the prediction score of the pixel on the j-th class, and K is the number of classes.
Multi-head self-attention calculation: the attention function applies a weighting transformation to the output feature map, and compared with the ordinary self-attention mechanism, the multi-head self-attention mechanism can obtain multi-dimensional features. In the multi-head attention calculation, the mechanism performs its attention computation only once, in parallel across heads, and finally combines the results; it is implemented with the scaled dot-product attention operation (Attention), defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Attention is the attention function, Q, K and V are the input matrices of the scaled dot-product attention, and d_k is the dimensionality of the K matrix; Q is multiplied by the transpose of K, scaled, passed through the softmax function, and then multiplied by V to obtain the final self-attention output.
In the multi-head self-attention mechanism, h different linear mappings are applied, scaled dot-product attention is computed on the different mappings in parallel, the results are concatenated and fed into a linear mapping layer, and the output of the multi-head self-attention mechanism is finally obtained. Through the h linear mappings, the model can learn relevant information in different representation subspaces.
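A self-contained sketch of this h-head mechanism (layer sizes and names are ours, not the patent's):

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return scores.softmax(dim=-1) @ V

class MultiHeadSelfAttention(nn.Module):
    """h parallel heads: linear mappings of the input, scaled dot-product
    attention per head, concatenation, then a final linear mapping."""
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.d_k = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)   # the h linear mappings, fused
        self.out = nn.Linear(dim, dim)       # final linear mapping layer

    def forward(self, x):                    # x: (batch, tokens, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split into heads: (batch, heads, tokens, d_k)
        split = lambda t: t.view(b, n, self.heads, self.d_k).transpose(1, 2)
        z = scaled_dot_product_attention(split(q), split(k), split(v))
        z = z.transpose(1, 2).reshape(b, n, d)   # concatenate the heads
        return self.out(z)

x = torch.randn(2, 64, 128)
print(MultiHeadSelfAttention(128, 8)(x).shape)   # torch.Size([2, 64, 128])
```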
S24, adopting a classified cross entropy loss function as a segmentation network optimization target, wherein the loss function is designed as follows:
$$L(\theta') = -\frac{1}{S}\sum_{s=1}^{S}\sum_{c=1}^{C} Y'_{s,c}\log Y_{s,c}$$
where θ' is the classification network parameter, Y' is the segmentation label, Y is the predicted probability, S is the number of image pixels, and C is the number of pixel classes; in the experiments C = 2 and S = 512 × 512 = 262144;
to prevent overfitting, an L2 regularization term is added to the loss function to obtain the final objective function, defined as follows:
$$J(\theta') = L(\theta') + \frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$$
where $\frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$ is the regularization term, λ is the regularization factor, and Q is the number of model parameters.
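A sketch of this objective in PyTorch; the exact scaling of the L2 term is our assumption, as the patent only names λ and Q:

```python
import torch
import torch.nn.functional as F

def objective(logits, target, model, lam=1e-4):
    """Categorical cross-entropy averaged over the S pixels, plus an
    L2 regularization term lam/(2Q) * sum(theta'^2) over the Q parameters."""
    ce = F.cross_entropy(logits, target)  # logits: (N, 2, 512, 512), target: (N, 512, 512)
    params = [p for p in model.parameters() if p.requires_grad]
    Q = sum(p.numel() for p in params)    # number of model parameters
    l2 = sum((p ** 2).sum() for p in params)
    return ce + lam / (2 * Q) * l2
```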
S25, optimizing a function:
optimizing the objective function with a stochastic gradient descent algorithm and updating the retinal vessel segmentation network model parameters by error back-propagation; the specific optimization process is:
$$g_t = \nabla_{\theta} L(\theta_{t-1})$$
$$m_t = \mu\, m_{t-1} + \eta_t\, g_t$$
$$\theta_t = \theta_{t-1} - m_t$$
where t denotes the iteration number, $\nabla_{\theta}$ denotes the gradient operator, θ corresponds to θ' in the objective function of step S24, L(θ_{t-1}) is the loss function evaluated with the network parameters θ_{t-1}, and g_t, m_t and μ are the gradient, momentum and momentum coefficient respectively, e.g. μ = 0.99; η_t is the learning rate, initially set to 1e-3 and decreased by a factor of 10 every 50 iterations until it reaches 1e-5.
S3, fast positioning and accurate segmentation of the retinal vascular structure based on OCTA:
S31, sequentially performing data preprocessing and online enhancement on the OCTA fundus images to obtain processed images; for specifics, refer to the data preprocessing and online enhancement techniques of step S22;
S32, inputting the online-enhanced image as a three-channel input into the feature extractor consisting of the trunk feature extractors of different scales, the structural feature extractor and the enhanced feature extractor for feature extraction, automatically locating and outputting the reconstructed retinal vessel image feature map;
s33, inputting the reconstructed retinal blood vessel image feature map into a classifier, and predicting feature image pixels one by one in a sliding window mode to generate two pixel label prediction value maps with the same size as the original image;
s34, converting the prediction score into probability distribution by using a softmax function;
s35, taking the subscript component (0 or 1) where the maximum probability of each pixel is located as a pixel class label, and obtaining a retina blood vessel segmentation result binary image (a blood vessel region and a non-blood vessel region) while realizing rapid positioning of a blood vessel structure.
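Steps S33-S35 amount to a forward pass followed by softmax and a per-pixel argmax; a sketch under assumed tensor shapes (the network object is a placeholder):

```python
import torch

def segment(net, image):
    """image: (1, 3, 512, 512) enhanced OCTA input; returns the binary map."""
    with torch.no_grad():
        scores = net(image)                   # S33: two pixel-label prediction maps
        probs = torch.softmax(scores, dim=1)  # S34: scores -> probability distribution
        mask = probs.argmax(dim=1)            # S35: 0 = non-vessel, 1 = vessel
    return mask                               # (1, 512, 512) segmentation binary map
```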
Compared with the prior art, the OCTA image retinal vessel segmentation method based on the attention mechanism has the following beneficial effects:
1. Each convolution layer group in the trunk feature extractor and the enhanced feature extractor consists of a channel-by-channel convolution layer and a point-by-point convolution layer, so the retinal vessel feature extraction stage uses depthwise separable convolution based on large convolution kernels; this extracts vessel detail features better, reduces the number of vessel-end pixels misclassified as background, lowers the computational cost of the convolution operations, and increases segmentation speed;
2. In the down-sampling stage, the trunk feature extractor adopts adjacent-block merging, capturing detail information around the feature-map border while shrinking the feature map, reducing the loss of important information such as vessels at the border of feature maps of different scales, and improving segmentation accuracy;
3. A convolutional self-attention module (STAM) is introduced into the structural feature extractor to better learn the spatial and channel information of feature maps of different scales and to capture the complete vessel structure, avoiding vessel discontinuity and breakage in the retinal vessel segmentation result and improving segmentation precision.
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An OCTA image retinal vessel segmentation method based on an attention mechanism is characterized by comprising the following steps:
s1, building a convolutional neural network segmentation model with an attention mechanism:
S11, the convolutional neural network segmentation model with an attention mechanism consists of trunk feature extractors of different scales, a structural feature extractor, an enhanced feature extractor and a classifier, wherein the trunk feature extractors of different scales down-sample the retinal vessel feature map of the OCTA fundus image four times to obtain four vessel feature maps whose sizes are 1/4, 1/16, 1/64 and 1/256 of the input feature map, extract vessel detail features from the four feature maps of different scales, and connect them serially in order of feature-map size from large to small; the structural feature extractor accurately extracts vessel structure features from the 1/4 and 1/16 vessel feature maps produced by the trunk feature extractor before these are channel-concatenated with the corresponding-scale feature maps in the enhanced feature extractor; the enhanced feature extractor up-samples the 1/256 high-order feature map produced by the trunk feature extractor, gradually restoring the feature-map size through the ratios 1/64, 1/16, 1/4 and 1/1 while extracting vessel detail features at each scale; the classifier assigns a label to each pixel according to the vessel features extracted from the feature maps of different scales; the input of the segmentation network has three channels and the output has two channels, the input and output images are both 512 × 512 in size, and end-to-end semantic segmentation is realized;
S12, the trunk feature extractors of different scales comprise ten convolution layer groups, two convolution layers and four adjacent-block merging layers, with one adjacent-block merging layer after every two convolution layer groups and the two convolution layers located between the last two convolution layer groups; each convolution layer group consists of a channel-by-channel convolution layer with a 7 × 7 convolution kernel and stride 1 and a point-by-point convolution layer with a 1 × 1 convolution kernel and stride 1; the structural feature extractor comprises two attention layers, each containing a channel attention submodule and a spatial attention submodule; the enhanced feature extractor comprises eight convolution layer groups and four up-sampling layers, with two convolution layer groups after each up-sampling layer, each convolution layer group again consisting of a 7 × 7, stride-1 channel-by-channel convolution layer and a 1 × 1, stride-1 point-by-point convolution layer; the classifier consists of a class prediction layer and a softmax regression layer, the softmax regression layer converting class prediction scores into a probability distribution;
s2, model training and parameter optimization:
s21, initializing the convolution neural network segmentation model parameters with the attention mechanism built in the step S1 by adopting an Xavier method;
S22, preprocessing and online-enhancing the retinal vessel data set with retinal vessel segmentation labels, dividing the data set into a training set, a validation set and a test set in a 7:2:1 ratio, and pre-training the network segmentation model with 10-fold cross-validation;
s23, inputting the OCTA images of the same retinal vessel section into a network, and generating a retinal vessel segmentation result through network forward calculation, wherein the network forward calculation comprises convolution operation, nonlinear excitation, probability value conversion and multi-head attention calculation;
S24, adopting the categorical cross-entropy loss function as the optimization target of the segmentation network, the objective function being defined as follows:
$$J(\theta') = -\frac{1}{S}\sum_{s=1}^{S}\sum_{c=1}^{C} Y'_{s,c}\log Y_{s,c} + \frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$$
where θ' is the classification network parameter, Y' is the segmentation label, Y is the predicted probability, S is the number of image pixels, C is the number of pixel classes, $\frac{\lambda}{2Q}\sum_{q=1}^{Q}\theta_q'^{2}$ is the regularization term, λ is the regularization factor, and Q is the number of model parameters;
s25, optimizing a function:
optimizing an objective function by adopting a stochastic gradient descent algorithm, and updating parameters of the retinal vessel segmentation network model by using error back propagation, wherein the optimization process comprises the following steps:
$$g_t = \nabla_{\theta} L(\theta_{t-1})$$
$$m_t = \mu\, m_{t-1} + \eta_t\, g_t$$
$$\theta_t = \theta_{t-1} - m_t$$
where t denotes the iteration number, $\nabla_{\theta}$ denotes the gradient operator, θ corresponds to θ' in the objective function of step S24, L(θ_{t-1}) is the loss function evaluated with the network parameters θ_{t-1}, g_t, m_t and μ are the gradient, momentum and momentum coefficient respectively, and η_t is the learning rate;
s3, fast positioning and accurate segmentation of the retinal vascular structure based on OCTA:
s31, sequentially carrying out data preprocessing and online enhancement processing on the fundus images based on the OCTA to obtain processed images;
S32, inputting the online-enhanced image as a three-channel input into the feature extractor consisting of the trunk feature extractor, the structural feature extractor and the enhanced feature extractor for feature extraction, automatically locating and outputting the reconstructed retinal vessel image feature map;
s33, inputting the reconstructed retinal blood vessel image feature map into a classifier, and predicting feature image pixels one by one in a sliding window mode to generate two pixel label prediction value maps with the same size as the original image;
s34, converting the prediction score into probability distribution by using a softmax function;
s35, taking the subscript component where the maximum probability of each pixel is located as a pixel class label, and obtaining a retina blood vessel segmentation result binary image while realizing rapid positioning of a blood vessel structure.
2. The OCTA image retinal vessel segmentation method based on an attention mechanism as claimed in claim 1, wherein in the trunk feature extractors of different scales in step S12, each of the two convolution layers has a 1 × 1 convolution kernel and a stride of 1, with 1024 and 512 convolution kernels respectively, and each adjacent-block merging layer consists of a convolution layer with a 2 × 2 convolution kernel and a stride of 2 and a convolution layer with a 1 × 1 convolution kernel and a stride of 1.
3. The OCTA image retinal vessel segmentation method based on an attention mechanism as claimed in claim 1, wherein in the structural feature extractor of step S12, the channel attention submodule comprises, connected in sequence, a global max pooling layer, a global average pooling layer, a depthwise separable convolution layer with a 1 × 2 convolution kernel and two convolution layers with 1 × 1 convolution kernels; the spatial attention submodule comprises, connected in sequence, a convolution layer with a 1 × 1 convolution kernel, a multi-head self-attention calculation layer, a fully connected layer and a convolution layer with a 1 × 1 convolution kernel.
4. The method of claim 1, wherein each up-sampling layer in the enhanced feature extractor of step S12 is a deconvolution layer with a convolution kernel size of 3 × 3 and a stride of 2.
5. The method for retinal vessel segmentation based on an OCTA image of an attention mechanism as claimed in claim 1, wherein the size of the convolution kernel of the class prediction layer in the classifier of the step S12 is 1 x 1, the number of the convolution kernels is 2, and the step size is 1.
6. The method for retinal vessel segmentation based on OCTA images in an attention mechanism according to claim 1, wherein the data preprocessing in step S22 uniformly resizes the images to 512 × 512 pixels before they are fed to the network, and the online enhancement applies horizontal flipping, vertical flipping, cropping, and rotation by 45°, 90°, 135°, 180°, 225°, 270° and 315°, increasing the training samples to ten times the original.
7. The method for segmenting retinal blood vessels based on an OCTA image of claim 1, wherein the convolution operation in step S23 is: the output feature map Z_i corresponding to any convolution kernel in the network is calculated with the following formula:
$$Z_i = f\left(\sum_{r=1}^{k} W_{ir} * X_r + b_i\right)$$
where f denotes the nonlinear excitation function, r denotes the input channel index, k denotes the number of input channels, W_{ir} denotes the r-th channel weight matrix of the i-th convolution kernel, b_i is the convolution bias, and X_r denotes the r-th input channel image.
8. The method for segmenting retinal blood vessels based on an OCTA image of an attention mechanism as claimed in claim 1, wherein the nonlinear excitation in step S23 is: the rectified linear unit ReLU is used as the nonlinear excitation function to nonlinearly transform the output feature map Z_i generated by the convolution kernel; the rectified linear unit ReLU is defined as follows:
f(x) = max(0, x)
where f(x) is the rectified linear unit function, max takes the maximum value, and x is the input value.
9. The method for retinal vessel segmentation based on an OCTA image of an attention mechanism as claimed in claim 1, wherein the probability value conversion in step S23 is: the prediction scores output by the network are converted into a probability distribution using the softmax function, defined as follows:
$$Y_j = \frac{e^{O_j}}{\sum_{k=1}^{K} e^{O_k}}$$
where Y_j is the probability that a pixel belongs to class j, O_j is the prediction score of the pixel on the j-th class, and K is the number of classes.
10. The method for segmenting retinal vessels of an OCTA image based on an attention mechanism as claimed in claim 1, wherein in the multi-head attention calculation of step S23, the multi-head self-attention mechanism performs its attention computation only once, in parallel across heads, and finally combines the results; it is implemented with the scaled dot-product attention operation, defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Attention is the attention function, Q, K and V are the input matrices of the scaled dot-product attention, and d_k is the dimensionality of the K matrix; Q is multiplied by the transpose of K, scaled, passed through the softmax function, and then multiplied by V to obtain the final self-attention output.
CN202210960639.9A 2022-08-11 2022-08-11 OCTA image retinal vessel segmentation method based on attention mechanism Pending CN115294075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210960639.9A CN115294075A (en) 2022-08-11 2022-08-11 OCTA image retinal vessel segmentation method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210960639.9A CN115294075A (en) 2022-08-11 2022-08-11 OCTA image retinal vessel segmentation method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN115294075A true CN115294075A (en) 2022-11-04

Family

ID=83828949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210960639.9A Pending CN115294075A (en) 2022-08-11 2022-08-11 OCTA image retinal vessel segmentation method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN115294075A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012367A (en) * 2023-02-14 2023-04-25 山东省人工智能研究院 Deep learning-based stomach mucosa feature and position identification method
CN116012367B (en) * 2023-02-14 2023-09-12 山东省人工智能研究院 Deep learning-based stomach mucosa feature and position identification method
CN115984952A (en) * 2023-03-20 2023-04-18 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition
CN115984952B (en) * 2023-03-20 2023-11-24 杭州叶蓁科技有限公司 Eye movement tracking system and method based on bulbar conjunctiva blood vessel image recognition
CN117058380A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
CN117058380B (en) * 2023-08-15 2024-03-26 北京学图灵教育科技有限公司 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention

Similar Documents

Publication Publication Date Title
CN108021916B (en) Deep learning diabetic retinopathy sorting technique based on attention mechanism
CN109345538B (en) Retinal vessel segmentation method based on convolutional neural network
CN108898175B (en) Computer-aided model construction method based on deep learning gastric cancer pathological section
CN115294075A (en) OCTA image retinal vessel segmentation method based on attention mechanism
CN107977932A (en) It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method
CN107256550A (en) A kind of retinal image segmentation method based on efficient CNN CRF networks
CN107506761A (en) Brain image dividing method and system based on notable inquiry learning convolutional neural networks
CN115205300B (en) Fundus blood vessel image segmentation method and system based on cavity convolution and semantic fusion
WO2021115084A1 (en) Structural magnetic resonance image-based brain age deep learning prediction system
CN114038037B (en) Expression label correction and identification method based on separable residual error attention network
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN113223005B (en) Thyroid nodule automatic segmentation and grading intelligent system
CN112508864A (en) Retinal vessel image segmentation method based on improved UNet +
CN115223715A (en) Cancer prediction method and system based on multi-modal information fusion
CN113610859B (en) Automatic thyroid nodule segmentation method based on ultrasonic image
CN113012163A (en) Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network
CN110738660A (en) Spine CT image segmentation method and device based on improved U-net
CN111444844A (en) Liquid-based cell artificial intelligence detection method based on variational self-encoder
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN112288749A (en) Skull image segmentation method based on depth iterative fusion depth learning model
CN113361353A (en) Zebrafish morphological scoring method based on DeepLabV3Plus
CN114565628B (en) Image segmentation method and system based on boundary perception attention
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
CN113344933B (en) Glandular cell segmentation method based on multi-level feature fusion network
Sallam et al. Diabetic retinopathy grading using resnet convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination