CN114663444A

CN114663444A - Fetal heart segmentation method based on self-supervision migration and global attention

Info

Publication number: CN114663444A
Application number: CN202210210353.9A
Authority: CN
Inventors: 曾宪华; 高歌
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Shenzhen Hengyuan Zhida Information Technology Co ltd
Priority date: 2022-03-04
Filing date: 2022-03-04
Publication date: 2022-06-24

Abstract

The invention claims a fetal heart segmentation method based on self-supervised migration and global attention. A fetal heart segmentation model for processing ultrasound images is constructed with a global attention module, which fuses the encoder-layer and decoder-layer features of the model along the channel and spatial directions and uses position attention to learn the relationships between distant feature points. The encoder of the fetal heart segmentation model is trained on unlabeled fetal heart ultrasound image data with a self-supervised method, so that the feature vectors it generates preserve the manifold characteristics of the ultrasound images, and the resulting network parameters are saved. The encoder of the fetal heart segmentation model is then initialized with these parameters, and the whole model is optimized on labeled fetal heart ultrasound image data. Finally, the trained fetal heart segmentation model is tested and validated on fetal ultrasound images.

Description

Fetal heart segmentation method based on self-supervision migration and global attention

Technical Field

The invention belongs to the field of medical image segmentation within artificial intelligence, and relates to a fetal heart segmentation method based on self-supervised migration and global attention. Feature information in unlabeled data is learned with a self-supervised method, and a global attention mechanism combines feature information from the channel domain, the spatial domain, and the position domain to obtain a high-quality heart segmentation result.

Background

Congenital heart disease can be fatal to newborns and has the highest incidence among birth defects. According to the Chinese birth defect prevention report, the incidence of congenital heart disease in Chinese fetuses is about 0.74%. In prenatal ultrasound examination, a doctor observes the state of the fetal ventricles, atria, and valves through ultrasound images in order to screen for congenital heart disease. Accurately segmenting the fetal heart region in the ultrasound image can help the physician quickly locate, identify, and even measure the heart region. Segmenting the fetal heart is challenging because of speckle noise, strong artifacts, and variation in the angle at which the heart region is imaged in two-dimensional ultrasound.

Deep learning has been applied successfully to medical image segmentation across modalities (CT, MRI, X-ray, and ultrasound), and neural networks built on an encoder-decoder structure with skip connections are among the effective methods. The U-Net model has been applied to the ultrasound segmentation task. U-Net performs well, but it fuses features inefficiently at its skip connections. The Attention U-Net model introduces attention to address this problem, but it cannot fuse features along the channel and spatial dimensions at the same time, and it lacks long-range attention.

During training of a medical image segmentation model, the neural network is optimized by repeatedly fitting the image labels. After many training epochs the model may overfit the training set, which harms its generalization to the test set. Ultrasound images are noisy, low in contrast, and rich in artifacts; the imaged tissues and organs look similar to one another, and their boundaries are unclear. Because organ boundaries are blurred, different physicians annotate segmentation labels with different tendencies, which can make the labels inconsistent.

Disclosure of Invention

To fuse encoder-layer features and decoder-layer information more efficiently at the skip connections, the invention develops a global attention module. The module first fuses the shallow semantics of the encoder layer with the deep semantics of the decoder layer, then fuses the features at the spatial scale, and finally applies an attention module with long-range dependence. To address overfitting and inconsistent segmentation labels, the method uses self-supervised migration: by contrasting the differences between ultrasound images in unlabeled data, the network learns parameters that benefit the ultrasound image segmentation task and the subsequent fine-tuning. In addition, the global attention module combines encoder-layer features and decoder-layer information from multiple perspectives and attends to the mutual information among channels, space, and positions, which further improves segmentation performance.

The present invention aims to solve the above problems of the prior art and provides a fetal heart segmentation method based on self-supervised migration and global attention, which includes the following steps:

1) Images in the fetal heart ultrasound image dataset are preprocessed.

2) A global attention module is constructed.

3) A fetal heart segmentation model is constructed using the global attention module obtained in step 2).

4) A network identical to the encoder of the fetal heart segmentation model in step 3) is constructed, and it is trained and optimized on unlabeled fetal heart ultrasound image data with a self-supervised method to obtain its network parameters.

5) The encoder of the fetal heart segmentation model is initialized with the network parameters obtained in step 4), and the whole fetal heart segmentation model is trained and optimized on labeled fetal heart ultrasound image data.

6) An ultrasound image is fed into the fetal heart segmentation model obtained in step 5), and the heart segmentation result is obtained from the model's computation.
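The hand-over of parameters between steps 4) and 5) can be illustrated with a minimal sketch. It assumes that model parameters are held in plain dictionaries keyed by name and that encoder parameters of the segmentation model carry an `encoder.` prefix; both the function name and the naming convention are hypothetical and are not specified by the invention.

```python
import numpy as np

def init_encoder_from_pretrained(seg_params, pretrained_params, prefix="encoder."):
    """Copy pretrained encoder weights into a segmentation model's parameters.

    Both arguments are dicts mapping parameter names to arrays. The prefix is a
    hypothetical naming convention for the encoder part of the model.
    """
    updated = dict(seg_params)
    for name, value in pretrained_params.items():
        key = prefix + name
        # Only overwrite parameters that exist in the model with the same shape.
        if key in updated and updated[key].shape == value.shape:
            updated[key] = value.copy()
    return updated

rng = np.random.default_rng(0)
pretrained = {"conv1.weight": rng.normal(size=(8, 1, 3, 3))}
seg_model = {
    "encoder.conv1.weight": np.zeros((8, 1, 3, 3)),
    "decoder.conv1.weight": np.zeros((1, 8, 3, 3)),
}
seg_model = init_encoder_from_pretrained(seg_model, pretrained)
```

Only the encoder entries are overwritten; the decoder keeps its own initialization and is trained together with the encoder in step 5).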

In summary, the invention has the following beneficial technical effects:

The invention provides a fetal heart segmentation method based on self-supervised migration and global attention. It enables the model to attend to the more important feature information from the channel, spatial, and position domains while preserving certain characteristics of the ultrasound image, which improves both the heart segmentation ability and the generalization ability of the model. Compared with four other methods, the method obtains the best scores on the Dice, HD, IoU, and Sensitivity metrics.
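For reference, three of the metrics named above can be computed from binary masks as in the following sketch, which uses the standard definitions of Dice, IoU, and Sensitivity. The function name and the example masks are illustrative only, and the Hausdorff distance (HD) is omitted for brevity.

```python
import numpy as np

def overlap_metrics(pred, target):
    """Dice, IoU and sensitivity for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # true positives
    fp = np.logical_and(pred, ~target).sum()   # false positives
    fn = np.logical_and(~pred, target).sum()   # false negatives
    eps = 1e-8  # guards against division by zero on empty masks
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)
    return float(dice), float(iou), float(sensitivity)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dice, iou, sens = overlap_metrics(pred, target)
```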

Drawings

Fig. 1 is a framework diagram of the fetal heart segmentation method based on self-supervised migration and global attention according to the present invention;

Fig. 2 is a diagram of the self-supervised training framework of the present invention;

Fig. 3 is a structural diagram of the global attention module of the present invention;

Fig. 4 shows a prototype system for fetal heart segmentation constructed in accordance with the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

As shown in fig. 1, a fetal heart segmentation method based on self-supervised migration and global attention includes:

1) Images in the fetal heart ultrasound image dataset are preprocessed. The preprocessing includes cropping: all ultrasound images are cropped to 224 × 224 under the same conditions.
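The cropping step can be sketched as follows. The description does not state how the crop window is chosen, so a centered window is assumed here purely for illustration; the function name and the placeholder frame are likewise hypothetical.

```python
import numpy as np

def center_crop(image, size=224):
    """Crop the central size x size window from an (H, W) or (H, W, C) image.

    Centering the window is an assumption; the source only fixes the output size.
    """
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

frame = np.zeros((480, 640), dtype=np.uint8)  # placeholder ultrasound frame
cropped = center_crop(frame)
```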

2) A global attention module is constructed. As shown in Fig. 3, the global attention module is composed of a channel attention network, a spatial attention network, and a position attention network. The three networks are built along different dimensions from global average pooling, global max pooling, and activation function operations, and together they form the global attention module. The global attention module processes the input feature map X as follows:

1. The feature map X is processed in the channel domain. An average pooling operation over width and height and a max pooling operation over width and height each reduce X to a c × 1 × 1 feature, where c is the number of channels of the feature map, giving features along the channel direction. Both are then passed through a shared fully connected MLP layer, which yields two c × 1 × 1 features and captures the relationships between different channels. The two are added and passed through the Sigmoid function to obtain the channel weight M_c(X), which gives attention over the different channels. The whole process is shown in formula (1):

$$M_c(X) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}_{w,h}(X)) + \mathrm{MLP}(\mathrm{MaxPool}_{w,h}(X))\big) \tag{1}$$

where σ is the Sigmoid activation function, MLP is the shared multilayer fully connected layer, AvgPool_{w,h} is average pooling of the input feature map over width and height, and MaxPool_{w,h} is max pooling of the feature map over width and height. M_c(X) is multiplied with the input feature map X to obtain the channel-domain fused feature map X′, as shown in formula (2):

$$X' = X \times M_c(X) \tag{2}$$
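Formulas (1) and (2) can be illustrated with a short NumPy sketch. It assumes a feature map of shape (c, h, w) and a two-layer shared MLP with a ReLU and a reduced hidden width; the hidden activation, the hidden width, and all names are assumptions made for illustration, since the description only states that the MLP is shared.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Apply formulas (1) and (2) to a feature map x of shape (c, h, w).

    w1 (c_hidden, c) and w2 (c, c_hidden) are the weights of a two-layer
    shared MLP; the hidden ReLU and bottleneck width are assumptions.
    """
    avg = x.mean(axis=(1, 2))  # AvgPool over width and height -> (c,)
    mx = x.max(axis=(1, 2))    # MaxPool over width and height -> (c,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    m_c = sigmoid(mlp(avg) + mlp(mx))     # formula (1), shape (c,)
    return x * m_c[:, None, None], m_c    # formula (2), broadcast over h and w

rng = np.random.default_rng(0)
c, h, w = 8, 16, 16
x = rng.normal(size=(c, h, w))
w1 = rng.normal(size=(c // 4, c))
w2 = rng.normal(size=(c, c // 4))
x_prime, m_c = channel_attention(x, w1, w2)
```

Each channel of X is scaled by a single weight between 0 and 1, so the spatial layout of the feature map is unchanged at this stage.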

2. The feature map X′ is processed in the spatial domain. X′ first undergoes a channel-wise max pooling operation and a channel-wise average pooling operation; the two results are concatenated along the channel direction, and a convolution then reduces them to a single channel. Finally, Sigmoid generates the spatial-domain weight M_s(X′), which gives attention in the spatial domain. The whole process is shown in formula (3):

$$M_s(X') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}_c(X');\ \mathrm{MaxPool}_c(X')])\big) \tag{3}$$

where f^{7×7} is a 7 × 7 convolution, [;] is concatenation along the channel direction, AvgPool_c is channel-wise average pooling of the feature map, and MaxPool_c is channel-wise max pooling of the feature map. The spatial-domain weight is then multiplied with the input feature map X′ to obtain the spatial-domain fused feature map, and a 3 × 3 convolution is applied to obtain the feature map X″, as shown in formula (4):

$$X'' = f^{3\times 3}\big(X' \times M_s(X')\big) \tag{4}$$
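Formulas (3) and (4) can be sketched in the same style. The helper `conv2d_same` is a plain NumPy stand-in for a stride-1, zero-padded convolution layer without bias; the padding scheme, the absence of bias, and the random kernels are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Stride-1, zero-padded convolution (cross-correlation, as in deep learning).

    x: (c_in, h, w); kernel: (c_out, c_in, k, k) with odd k; returns (c_out, h, w).
    """
    c_out, c_in, k, _ = kernel.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + w]  # input shifted by kernel offset (i, j)
            out += np.einsum("oc,chw->ohw", kernel[:, :, i, j], patch)
    return out

def spatial_attention(x_prime, k7, k3):
    """Apply formulas (3) and (4) to x_prime of shape (c, h, w)."""
    avg = x_prime.mean(axis=0, keepdims=True)     # AvgPool over channels -> (1, h, w)
    mx = x_prime.max(axis=0, keepdims=True)       # MaxPool over channels -> (1, h, w)
    stacked = np.concatenate([avg, mx], axis=0)   # [Avg; Max] -> (2, h, w)
    m_s = sigmoid(conv2d_same(stacked, k7))       # formula (3), shape (1, h, w)
    return conv2d_same(x_prime * m_s, k3), m_s    # formula (4)

rng = np.random.default_rng(0)
c, h, w = 4, 10, 12
x_prime = rng.normal(size=(c, h, w))
k7 = rng.normal(size=(1, 2, 7, 7)) * 0.1   # 7x7 kernel, 2 channels -> 1 channel
k3 = rng.normal(size=(c, c, 3, 3)) * 0.1   # 3x3 kernel, c channels -> c channels
x_dprime, m_s = spatial_attention(x_prime, k7, k3)
```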

3. The feature map X″ is processed in the position domain. X″ undergoes a height-wise max pooling operation and average pooling operation and, at the same time, a width-wise max pooling operation and average pooling operation. That is, the input feature map X″ is encoded along the horizontal coordinate with a pooling kernel (H, 1), where H is the set kernel size, and along the vertical coordinate with a pooling kernel (1, W), where W is the set kernel size. The 4 results are then concatenated along the height dimension. After a convolution, the merged result is split into 4 position-information features in the vertical and horizontal directions. The four features are passed through Sigmoid to generate the attention weights of the position domain. The whole process is shown in formula (5):

$$M_a(X'') = \sigma\Big(f^{1\times 1}\big(\mathrm{Split}\big(f^{1\times 1}([\mathrm{AvgPool}_h(X'');\ \mathrm{MaxPool}_h(X'');\ \mathrm{AvgPool}_w(X'');\ \mathrm{MaxPool}_w(X'')])\big)\big)\Big) \tag{5}$$

where Split is the splitting operation. The four position-domain weights M_a(X″) are then multiplied with the input feature map X″ to obtain the fused feature in the position direction, as shown in formula (6):

$$X''' = X'' \times M_a(X'') \tag{6}$$
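One possible reading of formulas (5) and (6) is sketched below. Several details are assumptions rather than statements of the source: the height-wise pooling is taken to keep one value per row and the width-wise pooling one value per column, the 1 × 1 convolutions are modeled as channel-mixing matrices, the second 1 × 1 convolution is shared across the four splits, and the four weights are combined with the feature map by broadcast multiplication.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def position_attention(x, w_mid, w_out):
    """Sketch of formulas (5) and (6) for a feature map x of shape (c, h, w).

    w_mid (c_mid, c) and w_out (c, c_mid) act as 1x1 convolutions, i.e. they
    only mix channels. Sharing w_out across the four splits is an assumption.
    """
    c, h, w = x.shape
    pooled = [
        x.mean(axis=2), x.max(axis=2),  # one value per row    -> (c, h)
        x.mean(axis=1), x.max(axis=1),  # one value per column -> (c, w)
    ]
    merged = np.concatenate(pooled, axis=1)                  # (c, 2h + 2w)
    mixed = w_mid @ merged                                   # first 1x1 conv
    parts = np.split(mixed, [h, 2 * h, 2 * h + w], axis=1)   # Split into 4 parts
    a_h1, a_h2, a_w1, a_w2 = (sigmoid(w_out @ p) for p in parts)
    m_rows = (a_h1 * a_h2)[:, :, None]                       # (c, h, 1)
    m_cols = (a_w1 * a_w2)[:, None, :]                       # (c, 1, w)
    return x * m_rows * m_cols                               # formula (6)

rng = np.random.default_rng(0)
c, h, w = 4, 6, 5
x_dprime = rng.normal(size=(c, h, w))
w_mid = rng.normal(size=(2, c))
w_out = rng.normal(size=(c, 2))
x_tprime = position_attention(x_dprime, w_mid, w_out)
```

Because each weight depends on a whole row or column, a pixel's output is influenced by features far from it along both axes, which is how this stage captures long-range relationships.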

3) A fetal heart segmentation model is constructed using the global attention module obtained in step 2). An encoder-decoder fetal heart segmentation model is built from the global attention module and skip connections. The model consists of an encoder with a downsampling path and a decoder with an upsampling path; symmetric network layers of the encoder and the decoder are linked by skip connections, and a global attention module is added after the concatenation operation of the skip connections.
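The placement of the global attention module at a skip connection can be sketched as follows. Nearest-neighbour 2× upsampling and the placeholder attention function are assumptions for illustration; in the model described above the callable would be the global attention module of step 2).

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (c, h, w) feature map (assumed scheme)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_with_attention(encoder_feat, decoder_feat, attention):
    """Concatenate an encoder feature with the upsampled decoder feature,
    then apply the attention callable to the concatenated result."""
    up = upsample2x(decoder_feat)
    fused = np.concatenate([encoder_feat, up], axis=0)  # skip-connection concatenation
    return attention(fused)

def placeholder_attention(feat):
    """Hypothetical stand-in for the global attention module."""
    return feat * 0.5

enc = np.ones((16, 28, 28))   # encoder feature at one resolution level
dec = np.ones((32, 14, 14))   # decoder feature from the level below
out = skip_with_attention(enc, dec, placeholder_attention)
```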

4) A network identical to the encoder of the fetal heart segmentation model in step 3) is constructed, and it is trained and optimized on the unlabeled fetal heart ultrasound image dataset with a self-supervised method to obtain its network parameters. A momentum contrast self-supervised method is used to optimize the parameters of the encoder network. As shown in Fig. 2, the momentum contrast method supplies a large number of negative sample encodings for the contrastive loss, mainly through a queue. Encoder Q and encoder K have the same network structure; encoder Q generates the sample encoding q, and encoder K generates the sample encoding k. To supply many negative sample encodings, a large queue of sample encodings is constructed: the ultrasound sample encodings of the current batch are enqueued, and the ultrasound sample encodings of the earliest batch in the queue are dequeued. This dynamic turnover decouples the number of negative samples from the batch size, so the encoder can be trained with smaller batches while the queue still provides a larger number of negative sample encodings k⁻. The current batch is treated as the positive sample k⁺ and, once enqueued, serves as negative samples k⁻. Because the sample encodings in the queue come from several batches, a moving-average update is used to make encoder K change slowly so that the ultrasound sample encodings in the queue stay consistent, and only encoder Q is updated by gradient descent.
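The queue and the moving-average update described above can be sketched as follows. The class and function names are hypothetical, parameters are represented as plain dictionaries of arrays, and the momentum value 0.999 is an assumed example; the description does not give a value.

```python
import numpy as np

class EncodingQueue:
    """Fixed-size FIFO of sample encodings used as negatives."""

    def __init__(self, size, dim):
        self.data = np.zeros((size, dim))
        self.ptr = 0      # next slot to write
        self.filled = 0   # number of valid rows

    def enqueue(self, batch):
        """Insert a batch of encodings, overwriting the oldest ones when full."""
        for row in batch:
            self.data[self.ptr] = row
            self.ptr = (self.ptr + 1) % len(self.data)
            self.filled = min(self.filled + 1, len(self.data))

    def negatives(self):
        return self.data[:self.filled]

def momentum_update(params_k, params_q, m=0.999):
    """Moving-average update: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return {name: m * params_k[name] + (1.0 - m) * params_q[name] for name in params_k}

queue = EncodingQueue(size=4, dim=2)
for value in (1.0, 2.0, 3.0):            # three batches of two encodings each
    queue.enqueue(np.full((2, 2), value))

params_q = {"w": np.ones(3)}             # encoder Q, updated by gradient descent
params_k = {"w": np.zeros(3)}            # encoder K, updated only by moving average
params_k = momentum_update(params_k, params_q)
```

With a queue of four slots and batches of two, the third batch displaces the first, so the negatives always come from the most recent batches.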

5) The encoder of the fetal heart segmentation model is initialized with the network parameters obtained in step 4), and the whole fetal heart segmentation model is trained and optimized on the labeled fetal heart dataset. The loss function used is the contrastive segmentation loss L_CS, given by formula (7). The loss function has two parts: the first measures the difference between feature vectors of different classes, and the second computes the difference between the segmentation prediction at each pixel and the segmentation label.

[Formula (7) is not reproduced in this text.]

where L_CS is the contrastive segmentation loss, q is the image sample encoding, k⁺ is the positive image sample encoding, k⁻ is the negative image sample encoding, and M is the size of the queue. τ is a temperature hyperparameter that controls how concentrated the distances are, making the distance distribution smoother. w and h are the width and height of the image. y_ij is the true label at position i, j, and y′_ij is the predicted class value at position i, j. Two views produced from the same ultrasound image by different data augmentations are regarded as a positive sample pair, while views of different ultrasound images are regarded as negative sample pairs. The sample encoding k⁺ is the positive encoding corresponding to q, that is, the compressed encodings that the underlying network generates for two views of the same ultrasound image. The sample encodings k⁻ are the M negative encodings corresponding to q, that is, the compressed encodings that the underlying network generates for different ultrasound images. The dot product uses cosine similarity to measure the distance between two sample encodings, and therefore the difference between ultrasound images: the more similar q and k⁺ are, the closer q·k⁺ is to 1 and the smaller the difference between the corresponding pair of ultrasound images; the more different q and k⁻ are, the closer q·k⁻ is to 0 and the greater the difference between the corresponding pair. The more similar q is to k⁺ and the more different q is from k⁻, the lower the value of the loss function.
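Because formula (7) itself is not available in this text, the following sketch shows only one form that is consistent with the symbol definitions above, not the formula of the invention. It assumes an InfoNCE-style contrastive term, as commonly used with momentum contrast, a pixel-wise binary cross-entropy as the segmentation term, an unweighted sum of the two, and a temperature of 0.07; all of these choices are assumptions.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def contrastive_term(q, k_pos, k_neg, tau=0.07):
    """InfoNCE-style term for a query q (d,), a positive k_pos (d,)
    and M negatives k_neg (M, d). Normalizing makes the dot product a cosine."""
    q, k_pos, k_neg = l2_normalize(q), l2_normalize(k_pos), l2_normalize(k_neg)
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / tau  # positive logit first
    logits -= logits.max()                                    # numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

def pixel_term(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the w x h pixels (an assumed choice)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def contrastive_segmentation_loss(q, k_pos, k_neg, y_true, y_pred, tau=0.07):
    """Unweighted sum of the two parts; the weighting is an assumption."""
    return contrastive_term(q, k_pos, k_neg, tau) + pixel_term(y_true, y_pred)

q = np.array([1.0, 0.0])
negatives = np.array([[0.0, 1.0], [0.0, -1.0]])
loss_similar = contrastive_term(q, np.array([1.0, 0.0]), negatives)
loss_dissimilar = contrastive_term(q, np.array([-1.0, 0.0]), negatives)
```

In this sketch the contrastive term falls as q approaches k⁺ and moves away from the negatives, matching the behaviour described above.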

6) An ultrasound image is fed into the fetal heart segmentation model obtained in step 5). After the model's encoder computation, skip connections, decoder computation, global attention computation, and activation function computation, a probability map is obtained, which yields the final heart segmentation result.
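Turning the probability map into a segmentation mask can be sketched as below. The description does not state how the final result is derived from the probability map, so thresholding at 0.5 is an assumed convention, and the example values are illustrative.

```python
import numpy as np

def probability_to_mask(prob_map, threshold=0.5):
    """Binarize a probability map; the 0.5 threshold is an assumed convention."""
    return (prob_map >= threshold).astype(np.uint8)

prob_map = np.array([[0.1, 0.7], [0.5, 0.49]])  # illustrative model output
mask = probability_to_mask(prob_map)
```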

Fig. 4 shows a fetal heart segmentation prototype system constructed by the method of the present invention, which shows the result of fetal heart segmentation.

The above examples are to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. Various changes or modifications equivalent to those made according to the present invention also fall within the scope of the present invention defined by the appended claims.

Claims

1. A fetal heart segmentation method based on self-supervision migration and global attention is characterized by comprising the following steps:

1) preprocessing images in the fetal heart ultrasound image dataset;

2) constructing a global attention module;

3) constructing a fetal heart segmentation model by using the global attention module obtained in the step 2);

4) constructing a network identical to the encoder of the fetal heart segmentation model in step 3), and training and optimizing the network on the unlabeled fetal heart ultrasound image dataset with a self-supervised method to obtain the network parameters of the network;

5) initializing the network parameters of the encoder of the fetal heart segmentation model with the network parameters obtained in step 4), and training and optimizing the whole fetal heart segmentation model with labeled fetal heart ultrasound image data;

6) feeding an ultrasound image into the fetal heart segmentation model obtained in step 5), and obtaining the heart segmentation result from the model's computation.

2. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 1, wherein: the preprocessing of step 1) includes cropping the ultrasound image to a uniform size.

3. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 1, wherein: constructing the global attention module in step 2) comprises building a channel attention network, a spatial attention network, and a position attention network along different dimensions from a global average pooling operation, a global max pooling operation, and an activation function operation, and forming the global attention module from the three networks.

4. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 3, wherein: the global attention module processes the input feature map X as follows:

(1) the feature map X is processed in the channel domain: the feature map X is reduced to two c × 1 × 1 features by an average pooling operation over width and height and a max pooling operation over width and height, respectively, giving features along the channel direction; the two are then passed through a shared fully connected MLP layer to obtain two c × 1 × 1 features; the two are then added and passed through the Sigmoid function to obtain the channel weight M_c(X), and M_c(X) is multiplied with the input feature map X to obtain the channel-domain fused feature map X′;

(2) the feature map X′ is processed in the spatial domain: the feature map X′ first undergoes a channel-wise max pooling operation and a channel-wise average pooling operation, the two results are concatenated along the channel direction, a convolution then reduces the channels to 1 channel, and Sigmoid finally generates the spatial-domain weight M_s(X′); the spatial-domain weight M_s(X′) is then multiplied with the input feature map X′ to obtain the spatial-domain fused feature map X″;

(3) the feature map X″ is processed in the position domain: the feature map X″ undergoes a height-wise max pooling operation and average pooling operation and, at the same time, a width-wise max pooling operation and average pooling operation, that is, the input feature map X″ is encoded along the horizontal coordinate with a pooling kernel (H, 1) and along the vertical coordinate with a pooling kernel (1, W); the 4 results are then concatenated along the height dimension, the merged result is split after a convolution into 4 position-information features in the vertical and horizontal directions, and the four features are passed through Sigmoid to generate the attention weights of the position domain.

5. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 1, wherein: the fetal heart segmentation model of step 3) consists of an encoder with a downsampling path and a decoder with an upsampling path, symmetric network layers of the encoder and the decoder are linked by skip connections, and a global attention module is added after the concatenation operation of the skip connections.

6. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 1, wherein: in the training and optimization of step 4), the parameters of the encoder network are optimized with a momentum contrast self-supervised method, in which encoder Q and encoder K have the same network structure, encoder Q generates the sample encoding q, and encoder K generates the sample encoding k; a sample encoding queue is constructed, the ultrasound sample encodings of the current batch are enqueued, and the ultrasound sample encodings of the earliest batch in the queue are dequeued; a moving-average update is used for encoder K and a gradient descent update is used for encoder Q.

7. The fetal heart segmentation method based on the self-supervised migration and global attention of claim 1, wherein: step 5) specifically comprises transferring the network parameters obtained in step 4) into the encoder of the fetal heart segmentation model and training and optimizing the whole fetal heart segmentation model on a labeled fetal heart dataset, the loss function being the contrastive segmentation loss L_CS given by the following formula:

[Formula not reproduced in this text.]

the loss function has two parts, the first measuring the difference between feature vectors of different classes and the second computing the difference between the segmentation prediction at each pixel and the segmentation label; L_CS is the contrastive segmentation loss, q is the image sample encoding, k⁺ is the positive image sample encoding, k⁻ is the negative image sample encoding, M is the size of the queue, τ is a temperature hyperparameter, w and h are the width and height of the image, y_ij is the true label at position i, j, and y′_ij is the predicted class value at position i, j.

8. A computer-readable storage medium storing a computer program, characterized in that: the program, when run, implements the fetal heart segmentation method based on self-supervised migration and global attention of any one of claims 1-7.