Background
Congenital heart disease is a disease that can be lethal to newborns and has the highest morbidity among birth defects. According to the Chinese birth defect prevention report, the incidence of congenital heart disease in Chinese fetuses is about 0.74%. During prenatal ultrasound examination, a doctor observes the ventricles, atria and valves of the fetal heart in ultrasound images in order to screen for congenital heart disease. Accurate segmentation of the fetal heart region in the ultrasound image helps the physician quickly locate, identify and even measure that region. Segmenting the fetal heart is challenging because of speckle noise, strong artifacts, and variation in the angle at which the heart is imaged in two-dimensional ultrasound.
Deep learning has been applied successfully to medical image segmentation across modalities (CT, MRI, X-ray and ultrasound), and neural network models based on an encoder-decoder structure with skip connections are among the effective methods. The U-Net model has been applied to ultrasound segmentation. U-Net has excellent properties, but its features are fused inefficiently at the skip connections. The Attention U-Net model introduces attention to address this problem, but it cannot fuse features across the channel and spatial scales simultaneously, and it lacks long-range attention.
When a medical image segmentation model is trained, the neural network is optimized by continually fitting the image labels. After many training epochs, the model may overfit the training set, which impairs its generalization to the test set. Ultrasound images are characterized by high noise, low contrast and many artifacts; the imaged tissues and organs look very similar, and their boundaries are unclear. Because organ boundaries are blurred, different physicians tend to draw segmentation labels differently, which can make the labels in an ultrasound segmentation task inconsistent.
Disclosure of Invention
To fuse encoder-layer features and decoder-layer information more efficiently at the skip connections, the invention develops a global attention module. The module first fuses the shallow semantics of the encoder layer with the deep semantics of the decoder layer, then fuses the features at the spatial scale, and finally applies an attention module with long-range dependence. To address overfitting and inconsistent segmentation labels, the fetal heart segmentation method based on self-supervised migration and global attention lets the network learn, by contrasting the differences between unlabeled ultrasound images, network parameters that benefit the ultrasound segmentation task and the subsequent fine-tuning. In addition, by incorporating the global attention module, the method combines encoder-layer features and decoder-layer information from multiple angles and attends to the mutual information among channels, spatial locations and positions, which further improves segmentation performance.
The present invention aims to solve the above problems of the prior art and provides a fetal heart segmentation method based on self-supervised migration and global attention, which includes the following steps:
1) The images in the fetal heart ultrasound image dataset are preprocessed.
2) A global attention module is constructed.
3) A fetal heart segmentation model is constructed using the global attention module obtained in step 2).
4) A network identical to the encoder of the fetal heart segmentation model in step 3) is constructed, and it is trained and optimized on unlabeled fetal heart ultrasound image data with a self-supervised method to obtain its network parameters.
5) The encoder of the fetal heart segmentation model is initialized with the network parameters obtained in step 4), and the whole fetal heart segmentation model is trained and optimized on labeled fetal heart ultrasound image data.
6) An ultrasound image is fed into the fetal heart segmentation model obtained in step 5), and the heart segmentation result is obtained from the model's computation.
In summary, the invention has the following beneficial technical effects:
The invention provides a fetal heart segmentation method based on self-supervised migration and global attention. The method enables the model to attend to the more important feature information in the channel, spatial and position domains while retaining certain characteristics of ultrasound images, which improves both the heart segmentation ability and the generalization ability of the model. Compared with four other methods, it obtains the best scores on the Dice, HD, IoU and Sensitivity indexes.
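For reference, the overlap-based indexes named above can be computed from binary masks roughly as in the following sketch. The function name `overlap_scores` and the flat 0/1 mask representation are assumptions made for illustration, and the Hausdorff distance (HD) is omitted for brevity.

```python
def overlap_scores(pred, label):
    """Return (Dice, IoU, sensitivity) for two flat binary masks (lists of 0/1)."""
    tp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 1)  # false negatives
    union = tp + fp + fn
    # Two empty masks are treated as a perfect match (an assumption of this sketch).
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, iou, sensitivity


pred = [1, 1, 0, 0, 1, 0]
label = [1, 0, 0, 1, 1, 0]
dice, iou, sens = overlap_scores(pred, label)
```

In this toy case two pixels overlap, one is a false positive and one is a false negative, so IoU is 0.5 while Dice and sensitivity are both 2/3.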
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
As shown in Fig. 1, a fetal heart segmentation method based on self-supervised migration and global attention includes the following steps:
1) The images in the fetal heart ultrasound image dataset are preprocessed. The preprocessing includes cropping: all ultrasound images are cropped to a size of 224 × 224 under uniform conditions.
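The cropping rule itself is not detailed here. The sketch below assumes a simple center crop; the function name `center_crop` and the list-of-rows image layout are illustrative assumptions, not part of the described method.

```python
def center_crop(image, size=224):
    """Crop a 2-D image (a list of pixel rows) to size x size around its center."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


# Dummy 300 x 400 grayscale frame whose pixel value encodes its position.
frame = [[r * 400 + c for c in range(400)] for r in range(300)]
patch = center_crop(frame)
```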
2) A global attention module is constructed. As shown in Fig. 2, the global attention module is composed of a channel attention network, a spatial attention network and a position attention network. The three networks are constructed along different dimensions using global average pooling, global max pooling and activation function operations, and together they form the global attention module. The global attention module processes the input feature map X as follows:
1. The feature map X is processed in the channel domain. An average pooling operation and a max pooling operation, both over width and height, each reduce X to a c × 1 × 1 feature, where c is the number of channels of the feature map, giving features along the channel direction. Both features are then passed through a shared fully connected MLP layer to obtain two c × 1 × 1 features that capture the relations between channels. The two are added and passed through the Sigmoid function to obtain the channel weight $M_c(X)$, that is, the attention over the channels. The whole process is shown in formula (1):
$$M_c(X)=\sigma\big(\mathrm{MLP}(\mathrm{AvgPool}_{(w,h)}(X))+\mathrm{MLP}(\mathrm{MaxPool}_{(w,h)}(X))\big) \tag{1}$$
where σ is the Sigmoid activation function, MLP is the shared multilayer fully connected layer, $\mathrm{AvgPool}_{(w,h)}$ is the average pooling operation over the width and height of the input feature map, and $\mathrm{MaxPool}_{(w,h)}$ is the max pooling operation over the width and height of the feature map. $M_c(X)$ is multiplied by the input feature map X to obtain the channel-domain fused feature map X′, as shown in formula (2):
$$X'=X\times M_c(X) \tag{2}$$
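Formulas (1) and (2) can be sketched in plain Python as follows. The hidden width of the MLP, its ReLU activation, and the random weights are assumptions for illustration; in practice the weights are learned.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU; the same weights serve both pooled vectors."""
    hidden = [max(0.0, sum(wt * v for wt, v in zip(row, vec))) for row in w1]
    return [sum(wt * hv for wt, hv in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """x is a feature map stored as [c][h][w]; returns (M_c, X')."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]  # AvgPool over (w, h)
    mx = [max(map(max, ch)) for ch in x]                            # MaxPool over (w, h)
    summed = [a + b for a, b in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    m_c = [sigmoid(s) for s in summed]                              # formula (1)
    x_prime = [[[v * m for v in row] for row in ch] for ch, m in zip(x, m_c)]  # formula (2)
    return m_c, x_prime


random.seed(0)
c, h, w, hidden = 4, 3, 3, 2  # toy sizes, chosen only for the example
x = [[[random.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]
w1 = [[random.uniform(-1, 1) for _ in range(c)] for _ in range(hidden)]
w2 = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(c)]
m_c, x_prime = channel_attention(x, w1, w2)
```

Each channel of X is scaled by a single weight in (0, 1), so the output keeps the shape of the input.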
2. The feature map X′ is processed in the spatial domain. Channel-wise max pooling and average pooling are applied to X′ separately, the two results are concatenated along the channel direction, and a convolution reduces the result to a single channel. Finally, Sigmoid generates the spatial-domain weight $M_s(X')$, that is, the attention in the spatial domain. The whole process is shown in formula (3):
$$M_s(X')=\sigma\big(f_{7\times 7}([\mathrm{AvgPool}_c(X');\mathrm{MaxPool}_c(X')])\big) \tag{3}$$
where $f_{7\times 7}$ is a 7 × 7 convolution, [;] is concatenation along the channel direction, $\mathrm{AvgPool}_c$ is channel-wise average pooling of the feature map, and $\mathrm{MaxPool}_c$ is channel-wise max pooling of the feature map. The spatial-domain weight is multiplied by the input feature map X′ to obtain the spatial-domain fused feature map, which is then passed through a 3 × 3 convolution to obtain the feature map X″, as shown in formula (4):
$$X''=f_{3\times 3}(X'\times M_s(X')) \tag{4}$$
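A plain-Python sketch of formulas (3) and (4) follows. Zero padding, stride 1, and the constant and identity kernels used in the example are assumptions for illustration only; real kernels are learned.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv2d_same(x, kernels):
    """Zero-padded, stride-1 convolution. x: [c_in][h][w], kernels: [c_out][c_in][k][k]."""
    h, w, k = len(x[0]), len(x[0][0]), len(kernels[0][0])
    pad = k // 2
    out = []
    for kernel in kernels:
        plane = [[0.0] * w for _ in range(h)]
        for ci, ch in enumerate(x):
            for i in range(h):
                for j in range(w):
                    for di in range(k):
                        for dj in range(k):
                            r, s = i + di - pad, j + dj - pad
                            if 0 <= r < h and 0 <= s < w:
                                plane[i][j] += kernel[ci][di][dj] * ch[r][s]
        out.append(plane)
    return out


def spatial_attention(x_prime, k7, k3):
    """Returns (M_s, X'') for a feature map X' stored as [c][h][w]."""
    c, h, w = len(x_prime), len(x_prime[0]), len(x_prime[0][0])
    avg = [[sum(x_prime[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x_prime[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    logits = conv2d_same([avg, mx], k7)[0]               # 2 channels -> 1 channel
    m_s = [[sigmoid(v) for v in row] for row in logits]  # formula (3)
    weighted = [[[x_prime[ch][i][j] * m_s[i][j] for j in range(w)]
                 for i in range(h)] for ch in range(c)]
    return m_s, conv2d_same(weighted, k3)                # formula (4)


x_prime = [[[float(ch + i + j) for j in range(4)] for i in range(4)] for ch in range(2)]
k7 = [[[[0.01] * 7 for _ in range(7)] for _ in range(2)]]  # shape [1][2][7][7]
identity3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero3 = [[0.0] * 3 for _ in range(3)]
k3 = [[identity3 if o == i else zero3 for i in range(2)] for o in range(2)]
m_s, x_dprime = spatial_attention(x_prime, k7, k3)
```

With the identity 3 × 3 kernel chosen here, X″ equals the spatially weighted X′, which makes the effect of the weight map easy to inspect.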
3. The feature map X″ is processed in the position domain. Max pooling and average pooling are applied to X″ along the height and, at the same time, along the width. That is, the input feature map X″ is encoded along the horizontal coordinate using a pooling kernel (H, 1), where H is the set kernel size, and along the vertical coordinate using a pooling kernel (1, W), where W is the set kernel size. The four results are concatenated in the height dimension. After a convolution, the merged result is split into four position-information features for the vertical and horizontal directions. The four features are passed through Sigmoid to generate the attention weights of the position domain. The whole process is shown in formula (5):
$$M_a(X'')=\sigma\big(f_{1\times 1}(\mathrm{Split}(f_{1\times 1}([\mathrm{AvgPool}_h(X'');\mathrm{MaxPool}_h(X'');\mathrm{AvgPool}_w(X'');\mathrm{MaxPool}_w(X'')])))\big) \tag{5}$$
where Split is the splitting operation. The four position-domain weights $M_a(X'')$ are then multiplied by the input feature map X″ to obtain the fused feature in the position direction, as shown in formula (6):
$$X'''=X''\times M_a(X'') \tag{6}$$
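Formulas (5) and (6) can be sketched as below. Several details are assumptions of this sketch rather than statements of the method: the four pooled descriptors are stored as per-row and per-column vectors, each split part gets its own 1 × 1 convolution, and the four weights are applied to X″ as a broadcast product.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(seq, weight):
    """Mix channels at every position. seq: [c_in][n], weight: [c_out][c_in]."""
    n = len(seq[0])
    return [[sum(wt * seq[ci][p] for ci, wt in enumerate(row)) for p in range(n)]
            for row in weight]


def position_attention(x, w_shared, w_heads):
    """x: [c][h][w]; w_shared: [c_mid][c]; w_heads: four matrices of shape [c][c_mid]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    seq = []
    for ch in x:
        columns = list(zip(*ch))
        # Concatenate: row averages, row maxima, column averages, column maxima.
        seq.append([sum(r) / w for r in ch] + [max(r) for r in ch]
                   + [sum(col) / h for col in columns] + [max(col) for col in columns])
    mixed = conv1x1(seq, w_shared)                                    # inner 1x1 convolution
    bounds = [0, h, 2 * h, 2 * h + w, 2 * h + 2 * w]
    parts = [[row[a:b] for row in mixed] for a, b in zip(bounds, bounds[1:])]  # Split
    gates = [[[sigmoid(v) for v in row] for row in conv1x1(part, head)]
             for part, head in zip(parts, w_heads)]                   # formula (5)
    row_avg, row_max, col_avg, col_max = gates
    # Formula (6), assuming the four weights are broadcast and multiplied together.
    return [[[x[k][i][j] * row_avg[k][i] * row_max[k][i] * col_avg[k][j] * col_max[k][j]
              for j in range(w)] for i in range(h)] for k in range(c)]


c, h, w = 2, 3, 4
x = [[[float(k + i * w + j) for j in range(w)] for i in range(h)] for k in range(c)]
w_shared = [[0.5, -0.5], [0.25, 0.25]]
w_heads = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
out = position_attention(x, w_shared, w_heads)
```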
3) A fetal heart segmentation model is constructed using the global attention module obtained in step 2). The model is an encoder-decoder network built from the global attention module and skip connections. It consists of an encoder with a downsampling path and a decoder with an upsampling path; symmetric layers of the encoder and decoder are joined by skip connections, and a global attention module is added after the concatenation operation of each skip connection.
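To make the placement of the global attention module concrete, the sketch below only tracks tensor shapes at each skip connection. The network depth, the base channel count, and the assumption that the upsampled decoder feature has as many channels as the encoder feature are all hypothetical choices, not values given in this description.

```python
def skip_connection_shapes(input_size=224, base_channels=64, depth=4):
    """List (encoder, decoder, fused) shapes as (c, h, w), deepest skip connection first."""
    encoder = [(base_channels * 2 ** d, input_size // 2 ** d, input_size // 2 ** d)
               for d in range(depth)]
    shapes = []
    for c, h, w in reversed(encoder):
        decoder = (c, h, w)    # decoder feature upsampled to the encoder resolution
        fused = (2 * c, h, w)  # concatenation on the channel axis
        # The global attention module is applied to `fused` and keeps its shape.
        shapes.append(((c, h, w), decoder, fused))
    return shapes


shapes = skip_connection_shapes()
```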
4) A network identical to the encoder of the fetal heart segmentation model in step 3) is constructed, and it is trained and optimized on the unlabeled fetal heart ultrasound image dataset with a self-supervised method to obtain its network parameters. A momentum contrast self-supervised method is used to optimize the parameters of the encoder network. As shown in Fig. 3, the momentum contrast method supplies a large number of negative sample encodings for the contrastive loss, mainly by means of a queue. Encoder Q and encoder K have the same network structure; encoder Q generates the sample encoding q, and encoder K generates the sample encoding k. To provide many negative sample encodings, a large queue of sample encodings is maintained: the ultrasound sample encodings of the current batch are enqueued, and those of the earliest batch in the queue are dequeued. This dynamic update decouples the number of negative samples from the batch size, so the encoder can be trained with small batches while the queue still provides a large number of negative sample encodings $k^-$. The encodings of the current batch serve as positive samples $k^+$ and, once enqueued, as negative samples $k^-$ for later batches. Because the sample encodings in the queue come from several batches, a moving-average update is used to make encoder K change slowly and thus keep the queued encodings consistent; only encoder Q is updated by gradient descent.
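The queue and the moving-average update described above can be sketched as follows. The class name `EncodingQueue`, the flat parameter lists, and the momentum value are illustrative assumptions.

```python
from collections import deque


def momentum_update(params_q, params_k, m=0.999):
    """Moving-average update: encoder K drifts slowly toward encoder Q."""
    return [m * pk + (1.0 - m) * pq for pq, pk in zip(params_q, params_k)]


class EncodingQueue:
    """First-in, first-out queue of sample encodings used as negatives."""

    def __init__(self, max_size):
        # With maxlen set, the oldest encodings are dropped automatically.
        self.items = deque(maxlen=max_size)

    def enqueue(self, batch_encodings):
        self.items.extend(batch_encodings)

    def negatives(self):
        return list(self.items)


queue = EncodingQueue(max_size=4)
queue.enqueue([[1.0, 0.0], [0.0, 1.0]])    # batch 1
queue.enqueue([[0.6, 0.8], [0.8, 0.6]])    # batch 2
queue.enqueue([[-1.0, 0.0], [0.0, -1.0]])  # batch 3 pushes batch 1 out
params_k = momentum_update(params_q=[1.0, 2.0], params_k=[0.0, 0.0], m=0.9)
```

The queue length is independent of the batch size, which is what allows small batches to be paired with many negatives.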
5) The encoder of the fetal heart segmentation model is initialized with the network parameters obtained in step 4), and the whole fetal heart segmentation model is trained and optimized on the labeled fetal heart dataset. The loss function is the contrast segmentation loss $L_{CS}$, as shown in formula (7). It consists of two parts: the first measures the difference between feature vectors of different classes, and the second measures the difference between the segmentation prediction and the segmentation label at each pixel.
where $L_{CS}$ is the contrast segmentation loss, $q$ is the encoding of an image sample, $k^+$ is the encoding of a positive image sample, and $k^-$ is the encoding of a negative image sample. M is the size of the queue. τ is a temperature hyperparameter that controls how concentrated the distances are, making the distance distribution smoother. w and h are the width and height of the image. $y_{ij}$ is the true label at position (i, j), and $y'_{ij}$ is the predicted class value at position (i, j). Two views of the same ultrasound image produced by different data augmentations are treated as a positive sample pair, whereas views of different ultrasound images are treated as negative sample pairs. The sample encoding $k^+$ is the positive encoding corresponding to $q$, that is, the compressed encodings that the underlying network generates for two views of the same ultrasound image. The sample encodings $k^-$ are the M negative encodings corresponding to $q$, that is, the compressed encodings that the underlying network generates for different ultrasound images. The dot product uses cosine similarity to measure the distance between two sample encodings, that is, the difference between ultrasound images. The more similar $q$ and $k^+$ are, the closer $q\cdot k^+$ is to 1 and the smaller the difference between the corresponding ultrasound images; likewise, the more $q$ and $k^-$ differ, the closer $q\cdot k^-$ is to 0 and the greater the difference between the corresponding ultrasound images. The more similar $q$ is to $k^+$ and the more it differs from $k^-$, the lower the loss value.
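The following is a minimal sketch of a loss with the two parts described above. It assumes an InfoNCE-style contrast term over L2-normalised encodings and a pixel-wise binary cross-entropy for the segmentation term; the exact form of formula (7) may differ, and the temperature value is likewise an assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrast_term(q, k_pos, k_negs, tau):
    """Assumed InfoNCE form: low when q is close to k_pos and far from every k_neg."""
    pos = math.exp(dot(q, k_pos) / tau)
    neg = sum(math.exp(dot(q, k) / tau) for k in k_negs)
    return -math.log(pos / (pos + neg))


def segmentation_term(y_true, y_pred, eps=1e-7):
    """Assumed pixel-wise binary cross-entropy averaged over the w x h image."""
    total, count = 0.0, 0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def contrast_segmentation_loss(q, k_pos, k_negs, y_true, y_pred, tau=0.07):
    return contrast_term(q, k_pos, k_negs, tau) + segmentation_term(y_true, y_pred)


y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.2], [0.1, 0.8]]
# q matches its positive and differs from the negatives: low loss.
good = contrast_segmentation_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], y_true, y_pred)
# q matches a negative instead of its positive: high loss.
bad = contrast_segmentation_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]], y_true, y_pred)
```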
6) The ultrasound image is fed into the fetal heart segmentation model obtained in step 5). After the encoder computation, the skip connections, the decoder computation, the global attention computation and the activation function, the model outputs a probability map, from which the final heart segmentation result is obtained.
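The last step, from probability map to segmentation result, might look like the sketch below. The Sigmoid activation and the 0.5 threshold are assumptions; the description only states that an activation function produces a probability map.

```python
import math


def probability_map(logits):
    """Apply a Sigmoid to every pixel of the decoder output."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


def to_mask(prob_map, threshold=0.5):
    """Mark a pixel as heart (1) when its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


logits = [[-2.0, 0.3], [1.5, -0.1]]
mask = to_mask(probability_map(logits))
```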
Fig. 4 shows a fetal heart segmentation prototype system constructed by the method of the present invention, which shows the result of fetal heart segmentation.
The above examples are to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. Various changes or modifications equivalent to those made according to the present invention also fall within the scope of the present invention defined by the appended claims.