CN115330807A - Choroidal neovascularization image segmentation method based on hybrid convolutional network - Google Patents

Choroidal neovascularization image segmentation method based on hybrid convolutional network

Info

Publication number
CN115330807A
Authority
CN
China
Prior art keywords
convolution
dimensional
network
feature
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210814858.6A
Other languages
Chinese (zh)
Inventor
叶中玉
刁东宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Original Assignee
Nari Technology Co Ltd
NARI Nanjing Control System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd, NARI Nanjing Control System Co Ltd filed Critical Nari Technology Co Ltd
Priority to CN202210814858.6A
Publication of CN115330807A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/10 — Segmentation; Edge detection
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 — using classification, e.g. of video objects
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10072 — Tomographic images
    • G06T 2207/10101 — Optical tomography; Optical coherence tomography [OCT]
    • G06T 2207/30 — Subject of image; Context of image processing
    • G06T 2207/30004 — Biomedical image processing
    • G06T 2207/30101 — Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention discloses a choroidal neovascularization image segmentation method based on a hybrid convolutional network. First, fundus OCT scan images are collected and annotated, and the annotated images form a data set. A deep space-time separation hybrid convolutional neural network incorporating an attention mechanism is then constructed: two-dimensional convolution extracts two-dimensional features, which are expanded to three dimensions; three-dimensional attention deep space-time separation convolution is then performed, and the two-dimensional and three-dimensional features are aligned and fused. The constructed hybrid neural network model is trained with the data set, and finally the trained model segments choroidal neovascularization from the fundus OCT scan images to obtain the segmentation result. The space-time attention mechanism extracts local features of the choroidal neovascularization image more effectively, and using deep space-time separation convolution together with a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently.

Description

Choroidal neovascularization image segmentation method based on hybrid convolutional network
Technical Field
The invention relates to retinal image segmentation, and in particular to a choroidal neovascularization image segmentation method based on a hybrid convolutional network.
Background
The tissue structure at the center of the retina is called the macula, and its strong photosensitivity largely determines the quality of visual function. Choroidal neovascularization refers to proliferative blood vessels originating from the choroidal capillaries, and it is commonly found in the macula.
Medical image processing methods are varied and complex. Medical imaging principles are not sufficiently refined, individual retinas are highly specific, inter-individual differences are large, and structures differ. Traditional methods estimate the choroidal neovascularization area manually and quantitatively; the whole process is time-consuming, depends on personal experience, and is prone to judgment errors.
With the development of deep learning, image segmentation techniques based on deep learning have become an important component of image segmentation, and deep learning methods have recently achieved success in choroidal neovascularization segmentation. However, choroidal neovascularization segmentation presents greater difficulties and challenges than the segmentation of other human organ images: 1) fundus OCT images contain artifacts and noise from ocular structures and tissue; 2) using only a two-dimensional convolutional network omits features, and inter-slice image information needs to be captured with three-dimensional convolution, but using only three-dimensional convolution has a high computational cost.
Disclosure of Invention
The purpose of the invention is as follows: in view of the above disadvantages, the present invention provides a hybrid convolutional network-based choroidal neovascularization image segmentation method with fine segmentation and low computational cost.
The technical scheme is as follows: in order to solve the above problems, the present invention provides a choroidal neovascularization image segmentation method based on a hybrid convolutional network, comprising the following steps:
(1) Constructing a data set: collecting fundus OCT scan images, labeling the choroidal neovascularization in the images, and forming a data set from the labeled images;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) Training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism with the data set to obtain a hybrid neural network model;
(4) Segmenting the choroidal neovascularization image from the fundus OCT scan images with the trained hybrid neural network model to obtain the segmentation result.
Further, in the step (2), four consecutive two-dimensional feature images are converted into a three-dimensional feature vector.
Further, in the step (2), a new attention map is formed by subjecting the obtained three-dimensional feature vector to a time-space attention mechanism, and then deep space-time separation convolution is performed.
Further, when the deep space-time separation convolution is performed, the three-dimensional convolution is divided into two separate convolutions, a 1 × Y × Z spatial convolution and an X × 1 × 1 temporal convolution, so that the three-dimensional deep space-time separation convolution DSTS is:

DSTS(F″) = K_P ∗ [ (K_S ∗_r F″) ∪ (K_T ∗_r F″) ]

where K_P denotes the convolution kernel of the point-wise convolution; K_S denotes the convolution kernel of the 1 × Y × Z spatial convolution; K_T denotes the convolution kernel of the X × 1 × 1 temporal convolution; ∪ denotes the splicing (concatenation) of the two feature maps produced by the spatial convolution and the temporal convolution; F″ denotes the final attention feature map produced by the space-time attention mechanism; and r denotes the hole (dilated) convolution operation.
Further, the final attention feature map F″ produced by the spatio-temporal attention mechanism is calculated as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where F denotes the input feature map fed to the temporal attention module, M_C(F) denotes the output feature map generated by the temporal attention module, F′ denotes the input feature map fed to the spatial attention module, M_S(F′) denotes the output feature map generated by the spatial attention module, and ⊗ denotes element-wise multiplication.
Further, after the three-dimensional feature vector is input into the temporal attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X; a shared network layer consisting of a multilayer perceptron with one hidden layer then receives the two pooled features and finally generates a channel attention map M_X ∈ R^(X×1×1). After the shared network layer, element-wise summation is used to merge the output feature vectors, and the calculation in the temporal attention module is:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

where σ denotes the sigmoid function, W_0 and W_1 are the weights of the multilayer perceptron (MLP), AvgPool(F) denotes average pooling of the input feature F, and MaxPool(F) denotes maximum pooling of the input feature F.
Further, after the three-dimensional feature vector is input into the spatial attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^S and the average-pooled feature F_avg^S, and the features are coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map; the calculation in the spatial attention module is:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

where σ denotes the sigmoid function and f denotes a convolution operation.
Further, the aligning and fusing of the features generated by the two-dimensional convolution and the features generated by the three-dimensional convolution in step (2) specifically comprises the following steps:
calculating the feature map and the associated pixel probability scores output from the two-dimensional convolutional network:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

where I_2d denotes the samples input to the two-dimensional convolutional network and n denotes the batch size of the input training samples;
aligning the feature map and the probability score of the two-dimensional convolutional network with the score map of the three-dimensional feature map according to:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

where T denotes the transformation that composes adjacent slices into three-dimensional data;
obtaining the context feature y′_2d from the two-dimensional convolutional network through a skip connection, the three-dimensional convolutional network being trained on the context pixels of the probability map generated by the two-dimensional convolutional network, and the probability map generated by the two-dimensional convolutional network feeding back into the training of the three-dimensional convolutional network:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

where X_3d denotes the output feature map of the three-dimensional convolutional network and Z denotes the two-dimensional/three-dimensional mixed feature map, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks.
Further, the two-dimensional/three-dimensional mixed feature Z is jointly learned and optimized according to:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

where H denotes the optimized mixed feature and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
Further, when training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism, the class imbalance of the three-dimensional choroidal neovascularization segmentation is addressed with a multi-class Dice loss, which is insensitive to class imbalance, combined with a cross-entropy loss; the Dice term takes the form:

L_Dice = 1 − (2/C) · Σ_{c=1}^{C} ( Σ_{i=1}^{V} p_i^c · g_i^c + ε ) / ( Σ_{i=1}^{V} p_i^c + Σ_{i=1}^{V} g_i^c + ε )

where C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, and g_i^c is the ground-truth label indicating that voxel i belongs to class c.
Has the advantages that: compared with the prior art, the invention has obvious advantages. The space-time attention mechanism extracts local features of the choroidal neovascularization image more effectively, and using deep space-time separation convolution together with a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently. Average pooling effectively fuses spatial information, while maximum pooling better matches the attention mechanism and can locate the pixel region in the feature map closest to the target features; combining maximum pooling and average pooling refines the feature map more effectively. Spatial attention focuses more on the positional information of the target in the image and effectively complements temporal attention. The spatial attention features are computed efficiently by applying average pooling and maximum pooling along the channel axis and concatenating the results to generate the spatial feature map.
Drawings
FIG. 1 is a schematic diagram of a hybrid convolutional network framework of the present invention;
FIG. 2 is a diagram of the operation of the time attention module in the hybrid convolutional network of the present invention;
FIG. 3 is a diagram of the operation of the spatial attention module in the hybrid convolutional network of the present invention.
Detailed Description
In this embodiment, a choroidal neovascularization image segmentation method based on a hybrid convolutional network includes the following steps:
(1) Constructing a data set: fundus OCT scan images are acquired and the choroidal neovascularization in the images is labeled, with the manually labeled choroidal neovascularization regions in the scan images serving as the reference standard; the labeled images form the data set;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) The constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism is trained with the data set to obtain the hybrid neural network model;
(4) The trained hybrid neural network model is used to segment the choroidal neovascularization from the fundus OCT scan images to obtain the segmentation result.
In step (2), the network structure is shown in fig. 1 and comprises two-dimensional convolution, two-dimensional depthwise convolution, three-dimensional point-wise convolution and three-dimensional attention deep space-time separation convolution, where the three-dimensional attention deep space-time separation convolution comprises a three-dimensional spatial pyramid module and a space-time attention module. The hybrid convolutional network adopts an encoder-decoder structure: the bottom is formed by two-dimensional depthwise convolution and the rest by three-dimensional depthwise space-time separation convolution. Spatial pyramid pooling is employed at the end of the encoder, capturing multi-scale information through parallel three-dimensional dilated (hole) space-time separation convolutions of different sizes.
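A minimal PyTorch sketch of such a spatial pyramid module built from parallel dilated three-dimensional convolutions; the channel counts, dilation rates and class name are illustrative assumptions, and the separable form described above is omitted here for brevity:

```python
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    """Parallel dilated 3D convolutions whose outputs are concatenated and fused."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        # one parallel branch per dilation rate; padding keeps D/H/W unchanged
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm3d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1x1 convolution fuses the concatenated branch outputs
        self.project = nn.Conv3d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):                      # x: (N, C, D, H, W)
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# usage: multi-scale features at the end of the encoder
# y = ASPP3D(64, 64)(torch.randn(1, 64, 4, 32, 32))
```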
In step (3), the labeled input image is first down-sampled by an ordinary 3 × 3 two-dimensional convolution to obtain two-dimensional feature maps; these then enter the two-dimensional depthwise convolution module, which expands the depthwise two-dimensional convolution to three dimensions by converting four consecutive two-dimensional feature maps into one three-dimensional feature vector, and convolving each input channel independently reduces computation and parameters.
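A minimal sketch of this slice-grouping step, assuming a channel-first (N·4, C, H, W) layout as used in PyTorch; the function name is an assumption:

```python
import torch

def slices_to_volume(feat2d: torch.Tensor, depth: int = 4) -> torch.Tensor:
    """Stack groups of `depth` consecutive 2D feature maps into 3D feature volumes."""
    n4, c, h, w = feat2d.shape
    assert n4 % depth == 0, "batch must contain whole groups of consecutive slices"
    n = n4 // depth
    # (N*4, C, H, W) -> (N, 4, C, H, W) -> (N, C, 4, H, W)
    return feat2d.reshape(n, depth, c, h, w).permute(0, 2, 1, 3, 4).contiguous()

# x = torch.randn(8, 64, 256, 256)   # 8 slices = 2 groups of 4
# v = slices_to_volume(x)            # v.shape == (2, 64, 4, 256, 256)
```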
Through the space-time attention mechanism, a feature map F′ of dimension T_F × W_F × H_F × M in the three-dimensional convolution layer is taken as input and a feature map G of dimension T_G × W_G × H_G × N is produced as output, where T, W and H denote the temporal dimension, spatial width and spatial height of the three-dimensional feature map respectively, and M and N denote the numbers of input and output channels respectively. A standard three-dimensional convolution layer is parameterized by a convolution kernel K_S (X × Y × Z × M × N), where X is the temporal dimension of the kernel and Y, Z are its spatial dimensions. The output of the three-dimensional convolution layer can be calculated as:

G_{t,w,h,n} = Σ_{x,y,z,m} K_S^{x,y,z,m,n} · F′_{t+r·x, w+r·y, h+r·z, m}

In the above equation, r represents the hole (dilated) convolution operation.
For a depthwise convolution with an X × Y × Z × M kernel K_D, the output can be calculated as:

Ĝ_{t,w,h,m} = Σ_{x,y,z} K_D^{x,y,z,m} · F′_{t+x, w+y, h+z, m}

A three-dimensional point-wise convolution with a kernel K_P of dimension 1 × 1 × 1 × M × N is then used to combine the outputs of the depthwise convolution and project them into a new channel space, which can be expressed as:

G_{t,w,h,n} = Σ_{m} K_P^{m,n} · Ĝ_{t,w,h,m}
the deep convolution can effectively reduce convolution parameters and calculation complexity, and is a powerful operation mode. For example, a tensor with 3 × 3 × 3 × 3 dimensions, c input channels and c output channels is subjected to convolution operation, and a standard convolution includes 27c 2 But the depth convolution has only 27c of parameters, which is c times less than the standard parameters.
The space-time attention mechanism better extracts local features of the choroidal neovascularization image. In the temporal attention module shown in fig. 2, any channel of a feature map can be regarded as a feature detector, and attention seeks the regional features in the input image that need to be learned. Performing a dimension-reduction operation on the input feature map effectively reduces the calculation parameters, thereby reducing the network computation and allowing the channel attention to be calculated more efficiently. Average pooling effectively fuses spatial information, while maximum pooling better matches the attention mechanism and can locate the pixel regions in the feature map closest to the target features; using maximum pooling and average pooling in combination refines the feature map more effectively. Applying average pooling and maximum pooling to the input features fuses the spatial information of the feature map and yields the attention-region features; the resulting feature maps are the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X. The structure adopts a shared network layer consisting of a multilayer perceptron with one hidden layer to receive the two pooled features, and finally generates a channel attention map M_X ∈ R^(X×1×1). After the shared network layer, element-wise summation is used to merge the output feature vectors, and the temporal attention module can be expressed as:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

where σ denotes the sigmoid function and W_0 and W_1 are the weights of the MLP.
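A minimal PyTorch sketch of such a module, assuming the attended axis is the channel axis of an (N, C, D, H, W) tensor; the class name and the reduction ratio of the hidden layer are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Shared one-hidden-layer MLP over average- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool3d(1)
        self.max_pool = nn.AdaptiveMaxPool3d(1)
        # shared MLP with one hidden layer (W0, W1 in the text)
        self.mlp = nn.Sequential(
            nn.Conv3d(channels, channels // reduction, kernel_size=1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1, bias=False),  # W1
        )

    def forward(self, x):                         # x: (N, C, D, H, W)
        att = self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x))
        return torch.sigmoid(att)                 # attention map: (N, C, 1, 1, 1)
```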
Unlike the temporal attention module, the spatial attention module, shown in fig. 3, focuses on the positional information of the target in the image and effectively complements the temporal attention. The spatial attention features are computed efficiently by applying average pooling and maximum pooling along the channel axis and concatenating the results to generate a spatial feature map. When the feature maps are concatenated, a convolution layer is used to generate the spatial feature map M_S ∈ R^(H×W). The maximum pooling and average pooling operations aggregate the input feature information effectively and produce two maps, the spatial maximum-pooled feature F_max^S and the spatial average-pooled feature F_avg^S. The features are then coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map. The spatial attention module can be expressed as:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

where σ denotes the sigmoid function and f denotes a convolution operation.
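A minimal PyTorch sketch of such a spatial attention module; the kernel size and class name are assumptions:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise average/max pooling, concatenation, convolution, sigmoid."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):                          # x: (N, C, D, H, W)
        avg_map = x.mean(dim=1, keepdim=True)      # channel-wise average pooling
        max_map = x.amax(dim=1, keepdim=True)      # channel-wise max pooling
        att = self.conv(torch.cat([avg_map, max_map], dim=1))
        return torch.sigmoid(att)                  # attention map: (N, 1, D, H, W)
```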
By learning the channel attention and the spatial attention as separate processes, the amount of computation is greatly reduced while the spatio-temporal features are still captured and learned effectively. The overall process of computing the final spatio-temporal attention map F″ can be expressed as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

where ⊗ denotes element-wise multiplication.
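A short sketch of how the two attention maps would be applied in sequence; `temporal_att` and `spatial_att` stand for callables such as the modules sketched above:

```python
import torch

def refine(f: torch.Tensor, temporal_att, spatial_att) -> torch.Tensor:
    """F'' = M_S(F') ⊗ F'  with  F' = M_C(F) ⊗ F."""
    f_prime = temporal_att(f) * f           # attention map broadcasts over D, H, W
    return spatial_att(f_prime) * f_prime   # attention map broadcasts over channels
```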
more computation is required for three-dimensional convolution than for two-dimensional convolution. To make model operations more efficient, a three-dimensional volume is integrated into two separate convolutions, one being a 1 × Y × Z spatial convolution and one X × 1 × 1 temporal convolution, for the purpose of temporal-spatial separation (STS). The emphasis of spatial convolution is on the learning of spatial features, and the emphasis of temporal convolution is on the learning of temporal features. The parallel space-time separation computing method is defined as follows:
Figure BDA0003741938260000069
in the above formula
Figure BDA0003741938260000071
Represents the convolution kernel of a 1 xyxz spatial convolution,
Figure BDA0003741938260000072
a convolution kernel representing an X × 1 × 1 time convolution, and u represents splicing two feature maps. In the parallel STS module, the two convolutions are performed in parallel in two branches, and then their outputs are connected, which is also more effective for retinal image anisotropy. To further reduce computational complexity and model parameters, three-dimensional depth space-time separation convolution is employed.
After the three-dimensional space-time separation convolution operation, the output channels are divided into a spatial branch and a temporal branch, which focus on learning spatial and temporal features respectively; in each branch, a spatial/temporal convolution is performed per channel. After this independent feature learning, the outputs of the spatial and temporal branches are connected and fed to a point-wise convolution for feature integration. The three-dimensional deep space-time separation convolution (DSTS) can be expressed as:

DSTS(F″) = K_P ∗ [ ( K_S ∗ F″ ) ∪ ( K_T ∗ F″ ) ]

where K_P is the convolution kernel of the point-wise convolution, K_S and K_T are the convolution kernels of the spatial and temporal depthwise convolutions respectively, and F″ is the final attention map produced by the spatio-temporal attention mechanism. All three-dimensional convolutions are replaced with deep space-time separation convolutions to better save computational cost.
A two-dimensional convolutional network with depthwise convolution can effectively learn high-level in-plane features, but it ignores spatial information along the Z dimension. A three-dimensional convolutional network can make up for this deficiency, but at a higher computational cost. Therefore, the two-dimensional and three-dimensional convolutional networks are combined, fused and jointly optimized, so that the intra-slice and inter-slice features of the CNV are learned better and the choroidal neovascularization image is segmented better.
The feature map and the associated pixel probability scores output from the two-dimensional convolutional network can be expressed as:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

where I_2d denotes the samples input to the two-dimensional convolutional network and n denotes the batch size of the input training samples.
To fuse the mixed features from the two-dimensional and three-dimensional convolutional networks, the feature sizes need to be aligned; the feature map and probability score of the two-dimensional convolutional network are aligned with the score map of the three-dimensional feature map according to:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

where T denotes the transformation that composes adjacent slices into three-dimensional data.
The three-dimensional convolutional network part extracts multi-scale features through spatial pyramid pooling and obtains the context feature y′_2d from the two-dimensional convolutional network through a skip connection. The two-dimensional network part generates a feature probability map, and the three-dimensional convolutional network is trained on its context pixels. The probability map generated by the two-dimensional convolutional network feeds back into the training of the three-dimensional convolutional network part, which avoids the excessive computational burden of the three-dimensional network learning and updating the best weights on its own and greatly accelerates the learning and computation of the three-dimensional network part. The learning process of the three-dimensional network part can be described as:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

where X_3d denotes the output feature map of the three-dimensional convolutional network part and Z denotes the two-dimensional/three-dimensional mixed feature, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks. The mixed feature is then jointly learned and optimized in the HFF (hybrid feature fusion) layer:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

where H denotes the optimized mixed feature and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
To address the severe class imbalance in three-dimensional choroidal neovascularization segmentation, the hybrid convolutional network uses a multi-class Dice loss, which is insensitive to class imbalance, together with a cross-entropy loss; the Dice term takes the form:

L_Dice = 1 − (2/C) · Σ_{c=1}^{C} ( Σ_{i=1}^{V} p_i^c · g_i^c + ε ) / ( Σ_{i=1}^{V} p_i^c + Σ_{i=1}^{V} g_i^c + ε )

where C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, and g_i^c is the ground-truth label indicating that voxel i belongs to class c.
A gradient constraint G is also introduced to better preserve the CNV boundary; it accumulates a gradient term g(n) over the boundary pixels:

G = Σ_{n∈N} g(n)

where N denotes the set of boundary pixels, g(n) denotes the gradient computed at pixel n, and ∇_x u_n and ∇_y u_n denote the gradients of pixel n in the x-direction and y-direction respectively.
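The exact form of G is not spelled out above, so the following is only an illustrative sketch of one common way to realise a boundary-gradient penalty: it compares finite-difference x/y gradients of the predicted probability map and the label map and penalises their difference.

```python
import torch

def gradient_constraint(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 1, H, W) probability / label maps for one slice."""
    def grads(u):
        gx = u[..., :, 1:] - u[..., :, :-1]   # gradient in the x-direction
        gy = u[..., 1:, :] - u[..., :-1, :]   # gradient in the y-direction
        return gx, gy

    px, py = grads(pred)
    tx, ty = grads(target)
    return (px - tx).abs().mean() + (py - ty).abs().mean()
```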
The final loss function combines the multi-class Dice loss, the cross-entropy loss and the gradient constraint G.

Claims (10)

1. A choroidal neovascularization image segmentation method based on a hybrid convolutional network, characterized by comprising the following steps:
(1) Constructing a data set: collecting fundus OCT scan images, labeling the choroidal neovascularization in the images, and forming a data set from the labeled images;
(2) Constructing a deep space-time separation hybrid convolutional neural network incorporating an attention mechanism: two-dimensional convolution is used to extract two-dimensional features, the two-dimensional depthwise convolution is expanded to three dimensions, three-dimensional attention deep space-time separation convolution is performed, and the features generated by the two-dimensional convolution and those generated by the three-dimensional convolution are aligned and fused;
(3) Training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism with the data set to obtain a hybrid neural network model;
(4) Segmenting the choroidal neovascularization image from the fundus OCT scan images with the trained hybrid neural network model to obtain the segmentation result.
2. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 1, wherein in step (2), four consecutive two-dimensional feature images are converted into a three-dimensional feature vector.
3. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 2, wherein in the step (2), a new attention map is formed by subjecting the obtained three-dimensional feature vectors to a time-space attention mechanism, and then deep space-time separation convolution is performed.
4. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 3, wherein when the deep space-time separation convolution is performed, the three-dimensional convolution is divided into two separate convolutions, a 1 × Y × Z spatial convolution and an X × 1 × 1 temporal convolution, so that the three-dimensional deep space-time separation convolution DSTS is:

DSTS(F″) = K_P ∗ [ (K_S ∗_r F″) ∪ (K_T ∗_r F″) ]

wherein K_P denotes the convolution kernel of the point-wise convolution; K_S denotes the convolution kernel of the 1 × Y × Z spatial convolution; K_T denotes the convolution kernel of the X × 1 × 1 temporal convolution; ∪ denotes the splicing of the two feature maps produced by the spatial convolution and the temporal convolution; F″ denotes the final attention feature map produced by the space-time attention mechanism; and r denotes the hole convolution operation.
5. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 4, wherein the final attention feature map F″ produced by the spatio-temporal attention mechanism is calculated as:

F′ = M_C(F) ⊗ F
F″ = M_S(F′) ⊗ F′

wherein F denotes the input feature map fed to the temporal attention module, M_C(F) denotes the output feature map generated by the temporal attention module, F′ denotes the input feature map fed to the spatial attention module, M_S(F′) denotes the output feature map generated by the spatial attention module, and ⊗ denotes the element-wise multiplication of two matrices.
6. The method as claimed in claim 5, wherein after the three-dimensional feature vector is input into the temporal attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^X and the average-pooled feature F_avg^X; a shared network layer consisting of a multilayer perceptron with one hidden layer then receives the two pooled features and finally generates a channel attention map M_X ∈ R^(X×1×1), wherein R^(X×1×1) denotes the set of feature maps with X channels, length 1 and width 1; after the shared network layer, element-wise summation is used to merge the output feature vectors, and the calculation in the temporal attention module is:

M_X(F) = σ( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = σ( W_1(W_0(F_avg^X)) + W_1(W_0(F_max^X)) )

wherein σ denotes the sigmoid function, W_0 and W_1 are the weights of the multilayer perceptron MLP, AvgPool(F) denotes average pooling of the input feature F, and MaxPool(F) denotes maximum pooling of the input feature F.
7. The method as claimed in claim 6, wherein after the three-dimensional feature vector is input into the spatial attention module, average pooling and maximum pooling are performed to obtain the maximum-pooled feature F_max^S ∈ R^(1×H×W) and the average-pooled feature F_avg^S ∈ R^(1×H×W), wherein R^(1×H×W) denotes the set of feature maps with 1 channel, length H and width W; the features are then coupled by a standard convolution operation in the convolution layer to generate the final spatial attention map, and the calculation in the spatial attention module is:

M_S(F) = σ( f([F_avg^S; F_max^S]) )

wherein σ denotes the sigmoid function and f denotes a convolution operation.
8. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 1, wherein the aligning and fusing of the features generated by the two-dimensional convolution and the features generated by the three-dimensional convolution in step (2) specifically comprises the following steps:
calculating the feature map and the associated pixel probability scores output from the two-dimensional convolutional network:

X_2d = f_2d(I_2d; θ_2d),  X_2d ∈ R^(4n×256×256×64)
y_2d = f_2dcls(X_2d; θ_2dcls),  y_2d ∈ R^(4n×256×256×3)

wherein I_2d denotes the samples input to the two-dimensional convolutional network; n denotes the batch size of the input training samples; θ_2d denotes the pixel probability score in the two-dimensional convolution;
aligning the feature map and the probability score of the two-dimensional convolutional network with the score map of the three-dimensional feature map by first performing a three-dimensional transformation on the two-dimensional convolutional feature map:

X′_2d = T(X_2d),  X′_2d ∈ R^(n×256×256×64)
y′_2d = T(y_2d),  y′_2d ∈ R^(n×256×256×3)

wherein T denotes the transformation that composes adjacent slices into three-dimensional data;
obtaining the context feature y′_2d from the two-dimensional convolutional network through a skip connection, the three-dimensional convolutional network being trained on the context pixels of the probability map generated by the two-dimensional convolutional network, and the probability map generated by the two-dimensional convolutional network feeding back into the training of the three-dimensional convolutional network:

X_3d = f_dsts(I, y′_2d; θ_3d)
Z = X_3d + X′_2d

wherein X_3d denotes the output feature map of the three-dimensional convolutional network, Z denotes the two-dimensional/three-dimensional mixed feature map, i.e. the sum of the intra-slice and inter-slice features from the two-dimensional and three-dimensional networks, and θ_3d denotes the pixel probability score in the three-dimensional convolution.
9. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 8, wherein the two-dimensional/three-dimensional mixed feature Z is jointly learned and optimized according to:

H = f_hff(Z; θ_hff)
y_h = f_hffcls(H; θ_hffcls)

wherein H denotes the optimized mixed feature, θ_hff denotes the pixel probability score in the HFF layer, and y_h denotes the pixel-level prediction probability of the hybrid feature fusion layer.
10. The hybrid convolution network-based choroidal neovascularization image segmentation method according to claim 9, wherein in training the constructed deep space-time separation hybrid convolutional neural network incorporating the attention mechanism, the segmentation class imbalance of the three-dimensional choroidal neovascularization is optimized using a multi-class Dice loss, which is insensitive to class imbalance, together with a cross-entropy loss, the loss function further incorporating the gradient constraint G, wherein C denotes the number of classes, V denotes the number of voxels, p_i^c denotes the prediction probability that voxel i belongs to class c, ε denotes a smoothing factor, g_i^c is the ground-truth label indicating that voxel i belongs to class c, G denotes the gradient constraint, and e(·) denotes the activation function in deep learning with respect to the variable μ.
CN202210814858.6A 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network Pending CN115330807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210814858.6A CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210814858.6A CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Publications (1)

Publication Number Publication Date
CN115330807A true CN115330807A (en) 2022-11-11

Family

ID=83916628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210814858.6A Pending CN115330807A (en) 2022-07-12 2022-07-12 Choroidal neovascularization image segmentation method based on hybrid convolutional network

Country Status (1)

Country Link
CN (1) CN115330807A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116559949A (en) * 2023-05-19 2023-08-08 北京宸宇金源科技有限公司 Carbonate reservoir prediction method, system and equipment based on deep learning
CN116485820A (en) * 2023-06-21 2023-07-25 杭州堃博生物科技有限公司 Method and device for extracting artery and vein image and nonvolatile storage medium
CN116485820B (en) * 2023-06-21 2023-09-22 杭州堃博生物科技有限公司 Method and device for extracting artery and vein image and nonvolatile storage medium
CN117635950A (en) * 2024-01-04 2024-03-01 广东欧谱曼迪科技股份有限公司 Method, device, electronic equipment and storage medium for vessel segmentation correction processing
CN117635950B (en) * 2024-01-04 2024-04-09 广东欧谱曼迪科技股份有限公司 Method, device, electronic equipment and storage medium for vessel segmentation correction processing
CN117582185A (en) * 2024-01-19 2024-02-23 季华实验室 Pulse force level prediction method based on CLLSR hybrid model
CN117582185B (en) * 2024-01-19 2024-05-07 季华实验室 Pulse force level prediction method based on CLLSR mixed model

Similar Documents

Publication Publication Date Title
CN115330807A (en) Choroidal neovascularization image segmentation method based on hybrid convolutional network
Yue et al. Auto-detection of Alzheimer's disease using deep convolutional neural networks
CN107609503A (en) Intelligent cancerous tumor cell identifying system and method, cloud platform, server, computer
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN111932529B (en) Image classification and segmentation method, device and system
Oliveira et al. Deep transfer learning for segmentation of anatomical structures in chest radiographs
CN112950644B (en) Neonatal brain image segmentation method and model construction method based on deep learning
CN115147600A (en) GBM multi-mode MR image segmentation method based on classifier weight converter
Chen et al. Skin lesion segmentation using recurrent attentional convolutional networks
Lei et al. Automated detection of retinopathy of prematurity by deep attention network
Zhang et al. Multi-scale neural networks for retinal blood vessels segmentation
Cai et al. Identifying architectural distortion in mammogram images via a se-densenet model and twice transfer learning
Liu et al. Integrated learning approach based on fused segmentation information for skeletal fluorosis diagnosis and severity grading
Yang et al. RADCU-Net: Residual attention and dual-supervision cascaded U-Net for retinal blood vessel segmentation
Banerjee et al. A CADe system for gliomas in brain MRI using convolutional neural networks
Wang et al. Application of artificial intelligence methods in carotid artery segmentation: a review
Kumar et al. Multi-class Brain Tumor Classification and Segmentation using Hybrid Deep Learning Network Model
Babu et al. Machine learning algorithms for MR brain image classification
Wen et al. A-PSPNet: A novel segmentation method of renal ultrasound image
CN111583192A (en) MRI (magnetic resonance imaging) image and deep learning breast cancer image processing method and early screening system
CN115471436A (en) Ocular fundus retinal OCTA image fusion method and system
CN111275720B (en) Full end-to-end small organ image identification method based on deep learning
Taş et al. Detection of retinal diseases from ophthalmological images based on convolutional neural network architecture.
Subramanian et al. Design and Evaluation of a Deep Learning Aided Approach for Kidney Stone Detection in CT scan Images
Vanmore et al. Liver Lesions Classification System using CNN with Improved Accuracy

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination