CN114240966B - Self-supervision learning method for 3D medical image segmentation training feature extractor - Google Patents

Self-supervision learning method for 3D medical image segmentation training feature extractor

Info

Publication number
CN114240966B
CN114240966B
Authority
CN
China
Prior art keywords
image
augmentation
path module
module
image blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111523320.1A
Other languages
Chinese (zh)
Other versions
CN114240966A (en)
Inventor
夏勇 (Yong Xia)
廖泽慧 (Zehui Liao)
谢雨彤 (Yutong Xie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202111523320.1A priority Critical patent/CN114240966B/en
Publication of CN114240966A publication Critical patent/CN114240966A/en
Application granted granted Critical
Publication of CN114240966B publication Critical patent/CN114240966B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a self-supervised learning method for training a feature extractor for 3D medical image segmentation, which comprises three parts: data augmentation, an online path and a target path. Given an input image, two image blocks are first randomly extracted from it such that they overlap by a certain proportion; different data augmentations are then applied to the two image blocks, which are fed into the online path and the target path, respectively. Since the augmentations applied to the two image blocks are known, the pixel-level relative position correspondence between the two blocks in feature space is easy to obtain. Each pair of matched pixels in the two image blocks represents the same local region of the original image, and the model is required to regard the corresponding local features of the two blocks as consistent. Through this local consistency constraint, the model can learn a finer-grained feature extraction capability from 3D medical image data while preserving the spatial structural relationships inside the image.

Description

Self-supervised learning method for training a feature extractor for 3D medical image segmentation
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a self-supervised learning method.
Background
In recent years, methods based on deep convolutional neural networks have achieved great success in 3D image segmentation tasks. In deep learning, a large amount of annotated data is generally required to ensure that the trained model has high accuracy and good generalization. However, in the medical imaging field, segmentation annotation of 3D images is time-consuming and labor-intensive, and the amount of available labels is very limited. Unlabeled 3D medical image data are comparatively easier to obtain and far more plentiful. Self-supervised learning has been one of the hottest research directions in artificial intelligence in recent years: with large-scale unlabeled data and a self-defined supervision task, a deep learning model can automatically learn feature representation capability from massive data without any expert annotation. For example, the paper Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, published at the 34th Conference on Neural Information Processing Systems, presents a self-supervised learning method that requires the information obtained by the model to remain consistent even when the input image undergoes different transformations; feature extractors trained with the method show excellent performance in downstream tasks such as image segmentation, classification and detection.
However, current self-supervised learning methods are basically designed around global consistency constraints on images, i.e., the model is required to regard the information obtained from two different views of the same image as consistent. Because this consistency holds at the whole-image level, it damages to some extent the spatial structural relationships inside the image as represented by the model, which is unfavorable for downstream dense prediction tasks such as 3D medical image segmentation.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a self-supervised learning method for training a feature extractor for 3D medical image segmentation, which comprises three parts: data augmentation, an online path and a target path. Given an input image, two image blocks are first randomly extracted from it such that they overlap by a certain proportion; different data augmentations are then applied to the two image blocks, which are fed into the online path and the target path, respectively. Since the augmentations applied to the two image blocks are known, the pixel-level relative position correspondence between the two blocks in feature space is easy to obtain. Each pair of matched pixels in the two image blocks represents the same local region of the original image, and the model is required to regard the corresponding local features of the two blocks as consistent. Through this local consistency constraint, the model can learn a finer-grained feature extraction capability from 3D medical image data while preserving the spatial structural relationships inside the image.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: constructing a data augmentation model;
step 1-1: defining the data augmentation module;
the data augmentation module includes 6 augmentation modes: (1) flipping: each of the 3 axes of the 3D image is flipped randomly with 50% probability; (2) cropping and scaling: an image block is cropped randomly, and the cropped block is interpolated back to its size before cropping; (3) Gaussian noise is added with 10% probability, with the variance randomly sampled from [0, a]; (4) Gaussian blur is applied with 20% probability, with the σ of the Gaussian kernel randomly sampled from [b, 1]; (5) brightness and contrast are adjusted with 50% probability; (6) a voxel-by-voxel gamma transformation is applied with 50% probability, as follows:
i_new = (i_old)^λ
where i_new denotes the voxel value after the gamma transformation and i_old the voxel value before it; λ is a value randomly sampled from the range [0.7, 1.5]; the transformed voxel values are normalized back to [0, 1];
step 1-2: given an unlabeled 3D medical image training sample x, two image blocks containing an overlapping region are extracted from x; from the data augmentation module, two different combinations of augmentation modes are randomly selected and applied respectively to the two image blocks extracted from x, and the augmented image blocks are denoted x_1 and x_2, respectively; note that if the (2)nd augmentation mode is selected, the overlapping region of the two image blocks is fully preserved during cropping;
step 2: constructing an online path module and a target path module;
the online path module and the target path module are parallel, and each comprises, in sequence, an encoder, a mapper and a spatial position adjuster;
the encoders of the online path module and the target path module both adopt the 3D ResNet50 network structure and are used to extract the features of the augmented image blocks x_1 and x_2;
the mappers of the online path module and the target path module both have the structure convolution layer - batch normalization layer - activation function layer - convolution layer, and map the output of each encoder into a feature space; the outputs of the mappers of the online path module and the target path module are f_1 and f_2, respectively;
the spatial position adjusters of the online path module and the target path module first align the two features in sequence; the specific operations are as follows:
(1) Mirror adjustment: for the (1)st augmentation mode, the inverse operations of the flips τ_1 and τ_2 are applied to f_1 and f_2, respectively, so that the de-augmented f_1 and f_2, i.e. f̂_1 = τ_1^{-1}(f_1) and f̂_2 = τ_2^{-1}(f_2), are consistent with the positions in the original image, where τ_1^{-1} and τ_2^{-1} denote the inverses of the flips τ_1 and τ_2;
if the augmentations applied to the input image blocks in step 1-2 do not include the (1)st augmentation mode, the mirror adjustment is not performed;
(2) ROI adjustment: for the (2)nd augmentation mode, the features of the overlapping region are first extracted from f̂_1 and f̂_2 according to the known augmentation position information, and the overlapping regions are then resized by bilinear interpolation to the same size as f_1 and f_2, respectively; the resulting aligned ROI features are denoted f̂_1^{F,CS} and f̂_2^{F,CS}, where F denotes flipping and CS denotes cropping and scaling;
if the augmentations applied to the input image blocks in step 1-2 do not include the (2)nd augmentation mode, the ROI adjustment is not performed;
after the two features are aligned, the online path module further applies a mapper; the target path module has no such mapper;
step 3: pre-training the self-supervised learning model using the local consistency loss;
the self-supervised learning model comprises the data augmentation model, the online path module and the target path module;
a loss function is designed based on the local consistency constraint; when x_1 and x_2 are input into the online path and the target path, respectively:
L_local(x_1, x_2) = (1 / (N·H·W·D)) · Σ_{n,h,w,d} || norm(Y_θ(f̂_1^{F,CS}))_{n,h,w,d} - norm(f̂_2^{F,CS})_{n,h,w,d} ||_2^2   (1)
where Y_θ denotes the mapper of the online path, norm(·) denotes the l_2 normalization along the channel dimension of the features, N denotes the number of inputs, and H, W and D denote the height, width and depth of the features, respectively;
when x_1 and x_2 are input into the target path and the online path, respectively, the local consistency constraint (1) is likewise satisfied, namely L_local(x_2, x_1);
thus, the overall local consistency constraint satisfied is:
L_total(x_1, x_2) = L_local(x_1, x_2) + L_local(x_2, x_1)   (2)
the self-supervised learning model is pre-trained with the loss function of equation (2) until pre-training is complete;
step 4: constructing a 3D medical image segmentation network;
the 3D medical image segmentation network comprises three parts in sequence: an encoder with the 3D ResNet50 structure, a spatial pyramid pooling module and a decoder; three skip connections are also included between the encoder and the decoder;
the encoder with the 3D ResNet50 structure is the pre-trained encoder from the online path of the self-supervised learning model; the spatial pyramid pooling module comprises 3 separable convolution layers with dilation rates of 2, 4 and 8, respectively; the decoder comprises 4 deconvolution blocks and one convolution layer for segmentation;
step 5: verifying the effect of the self-supervised pre-training on 3D medical image segmentation;
the 3D medical image segmentation dataset is augmented;
the optimizer used in network training is a stochastic gradient descent optimizer, and the loss function is a combination of the Dice similarity coefficient loss and the cross-entropy loss;
after the 3D medical image segmentation network training is completed, 3D medical image segmentation is realized.
Preferably, a=0.1 and b=0.5.
The beneficial effects of the invention are as follows:
the existing self-supervision learning method is basically designed based on global consistency constraint of images, so that the feature extractor obtained through training can easily ignore the spatial structure relationship in the images and only pay attention to global image features, and is not suitable for downstream tasks of dense prediction such as 3D medical image segmentation. The invention provides a self-supervision learning method based on local consistency constraint, which can lead a model to pay attention to the spatial structure relation in an image and learn the feature extraction capacity with finer granularity, so that the feature extractor obtained by training is more suitable for downstream 3D medical image segmentation tasks.
Drawings
FIG. 1 is a schematic block diagram of the method of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
Existing self-supervised learning tasks are basically designed based on global consistency constraints on images. This guarantees consistency at the whole-image level, but damages to some extent the spatial structural relationships inside the image as represented by the model, so the learned feature representations do not fully fit dense prediction tasks such as 3D medical image segmentation.
To better learn the spatial structural relationships inside the image, the invention provides a self-supervised learning method based on a local consistency constraint. The method comprises three parts: data augmentation, an online path and a target path. Given an input image, two image blocks are first randomly extracted from it; to guarantee that the later constraint relationship exists, the two image blocks are required to overlap by a certain proportion. Different data augmentations, such as rotation, scaling and mirror symmetry, are then applied to the two image blocks, which are then fed into the online path and the target path, respectively. Because different augmentation modes are used, the two image blocks have no one-to-one spatial position correspondence, so the features obtained by the two paths are hard to match directly at the pixel level. However, since the augmentations applied to the two image blocks are known, the pixel-level relative position correspondence between the two blocks in feature space is readily obtained. Each pair of matched pixels in the two image blocks characterizes the same local region of the original image, so the model is required to keep the corresponding local features of the two blocks consistent. Through the local consistency constraint, the model can automatically learn a finer-grained feature extraction capability from 3D medical image data while preserving the spatial structural relationships inside the image.
Given an unlabeled training sample x, two image blocks with a certain overlapping region are first extracted from x and fed into the data augmentation module, which augments the two image blocks with two different augmentation modes; the augmented image blocks are denoted x_1 and x_2, respectively. Then x_1 and x_2 are input into the online path module and the target path module for feature extraction and feature alignment. The features extracted by the online path module and the target path module are denoted f_1 and f_2, respectively. Since the augmentation modes are known, f_1 and f_2 are easy to align at the pixel level. The model introduces the local consistency loss so that the aligned features become consistent during optimization. After training, only the encoder of the online path module (i.e., the feature extractor) is needed to build a segmentation model for the 3D medical image segmentation task.
A self-supervised learning method for training a feature extractor for 3D medical image segmentation, comprising the following steps:
step 1: constructing a data augmentation model;
step 1-1: defining the data augmentation module;
The data augmentation module includes 6 augmentation modes: (1) flipping: each of the 3 axes of the 3D image is flipped randomly with 50% probability; (2) cropping and scaling: an image block is cropped randomly, and the cropped block is interpolated back to its size before cropping; (3) Gaussian noise is added with 10% probability, with the variance randomly sampled from [0, 0.1]; (4) Gaussian blur is applied with 20% probability, with the σ of the Gaussian kernel randomly sampled from [0.5, 1]; (5) brightness and contrast are adjusted with 50% probability; (6) a voxel-by-voxel gamma transformation is applied with 50% probability, as follows:
i_new = (i_old)^λ
where i_new denotes the voxel value after the gamma transformation and i_old the voxel value before it; λ is a value randomly sampled from the range [0.7, 1.5]; the transformed voxel values are normalized back to [0, 1];
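For concreteness, the six augmentation modes can be sketched as follows in PyTorch, assuming a single volume tensor of shape (C, H, W, D) with values in [0, 1]; the function name augment, the crop fraction, the brightness/contrast jitter ranges and the omission of the blur kernel are illustrative assumptions rather than details from the patent record.

```python
import random
import torch
import torch.nn.functional as F

def augment(x, a=0.1, b=0.5):
    """Apply the six augmentation modes to x: (C, H, W, D), values in [0, 1]."""
    # (1) flipping: each of the 3 spatial axes is flipped with 50% probability
    flips = [ax for ax in (1, 2, 3) if random.random() < 0.5]
    if flips:
        x = torch.flip(x, dims=flips)
    # (2) cropping and scaling: random crop, then trilinear resize back
    if random.random() < 0.5:
        H, W, D = x.shape[1:]
        h, w, d = int(0.8 * H), int(0.8 * W), int(0.8 * D)  # assumed fraction
        i, j, k = (random.randint(0, H - h), random.randint(0, W - w),
                   random.randint(0, D - d))
        crop = x[:, i:i + h, j:j + w, k:k + d]
        x = F.interpolate(crop[None], size=(H, W, D), mode='trilinear',
                          align_corners=False)[0]
    # (3) Gaussian noise with 10% probability, variance drawn from [0, a]
    if random.random() < 0.1:
        x = x + torch.randn_like(x) * random.uniform(0.0, a) ** 0.5
    # (4) Gaussian blur with 20% probability, sigma drawn from [b, 1]
    #     (omitted here; e.g. convolve with a separable 3D Gaussian kernel)
    # (5) brightness and contrast adjustment with 50% probability
    if random.random() < 0.5:
        x = x * random.uniform(0.75, 1.25) + random.uniform(-0.1, 0.1)
    # (6) voxel-by-voxel gamma transform with 50% probability, lambda in [0.7, 1.5]
    if random.random() < 0.5:
        x = x.clamp(min=0.0) ** random.uniform(0.7, 1.5)
    # renormalize the transformed voxel values to [0, 1]
    return (x - x.min()) / (x.max() - x.min() + 1e-8)
```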
Step 1-2: given a label-free 3D medical image training sample x, extracting two image blocks containing overlapping areas from the x; slave data augmentation moduleTwo different combinations of augmentation modes are randomly selected, two different augmentation modes are respectively carried out on the two image blocks extracted in x, and the image blocks after augmentation are respectively expressed as x 1 And x 2 The method comprises the steps of carrying out a first treatment on the surface of the Note that if the (2) th augmentation mode is selected, the overlapping area of the two image blocks is completely reserved during clipping;
step 2: constructing an online path module and a target path module;
the online path module and the target path module are parallel, and each comprises, in sequence, an encoder, a mapper and a spatial position adjuster;
the encoders of the online path module and the target path module both adopt the 3D ResNet50 network structure and are used to extract the features of the augmented image blocks x_1 and x_2;
the mappers of the online path module and the target path module both have the structure convolution layer - batch normalization layer - activation function layer - convolution layer, and map the output of each encoder into a feature space; the outputs of the mappers of the online path module and the target path module are f_1 and f_2, respectively;
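One path (encoder followed by mapper) can be sketched as below; the encoder argument stands in for the 3D ResNet50 backbone, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class Mapper(nn.Module):
    """Voxel-wise projection: convolution - batch norm - activation - convolution."""
    def __init__(self, in_ch=2048, hid_ch=256, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, hid_ch, kernel_size=1),
            nn.BatchNorm3d(hid_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(hid_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

class Path(nn.Module):
    """One of the two parallel paths: encoder plus mapper."""
    def __init__(self, encoder, feat_ch=2048):
        super().__init__()
        self.encoder = encoder          # e.g. a 3D ResNet50 trunk
        self.mapper = Mapper(feat_ch)

    def forward(self, x):               # x: (N, C, H, W, D)
        return self.mapper(self.encoder(x))   # dense feature map f
```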
The spatial position adjusters of the online path module and the target path module firstly respectively and sequentially align two features, and the specific operation is as follows:
(1) Mirror image adjustment: for the (1) th augmentation mode, for f 1 And f 2 Respectively turn τ 1 And τ 2 To the inverse operation of (a) to make f after inverse amplification 1 And f 2 I.e.And->Consistent with the position in the original image:
wherein,and->Refers to turning tau 1 And τ 2 Is the reverse of the above;
if the mode of augmentation to the input image block in step 1-2 is not the (1) th mode of augmentation, then:
(2) ROI adjustment: for the (2) th augmentation mode, first, from the known augmented location informationAnd->Extracting features of the overlapping region, and then adjusting the overlapping region to be respectively matched with f by bilinear interpolation 1 And f 2 Is of the same size, the resulting aligned ROI is characterized as +.>And->Wherein F represents flip, CS represents clipping and scaling;
if the mode of augmentation to the input image block in step 1-2 is not the (2) th mode of augmentation, then:
the online path module is a mapper after the two features are aligned; the target path module has no mapper;
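The spatial position adjuster might be sketched as follows, assuming the flip axes and overlap boxes recorded during augmentation are passed along. Trilinear interpolation is used as the 3D counterpart of the bilinear interpolation named above, and resizing both overlap crops to one common grid is equivalent here because the two patches, and hence f_1 and f_2, share the same size.

```python
import torch
import torch.nn.functional as F

def align_features(f1, f2, flip_axes1, flip_axes2, roi1, roi2, size=(8, 8, 8)):
    """f1, f2: (N, C, H, W, D) mapper outputs.
    flip_axes*: spatial dims (2, 3, 4) flipped during augmentation.
    roi*: overlap box in each feature map, (h0, h1, w0, w1, d0, d1)."""
    # (1) mirror adjustment: a flip is its own inverse, so flipping again
    #     restores the original orientation
    if flip_axes1:
        f1 = torch.flip(f1, dims=flip_axes1)
    if flip_axes2:
        f2 = torch.flip(f2, dims=flip_axes2)
    # (2) ROI adjustment: crop the shared region from both feature maps and
    #     resize the crops to a common grid by trilinear interpolation
    h0, h1, w0, w1, d0, d1 = roi1
    r1 = f1[:, :, h0:h1, w0:w1, d0:d1]
    h0, h1, w0, w1, d0, d1 = roi2
    r2 = f2[:, :, h0:h1, w0:w1, d0:d1]
    r1 = F.interpolate(r1, size=size, mode='trilinear', align_corners=False)
    r2 = F.interpolate(r2, size=size, mode='trilinear', align_corners=False)
    return r1, r2   # voxel-aligned features covering the same anatomy
```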
step 3: pre-training the self-supervised learning model using the local consistency loss;
the self-supervised learning model comprises the data augmentation model, the online path module and the target path module;
a loss function is designed based on the local consistency constraint; when x_1 and x_2 are input into the online path and the target path, respectively:
L_local(x_1, x_2) = (1 / (N·H·W·D)) · Σ_{n,h,w,d} || norm(Y_θ(f̂_1^{F,CS}))_{n,h,w,d} - norm(f̂_2^{F,CS})_{n,h,w,d} ||_2^2   (1)
where Y_θ denotes the mapper of the online path, norm(·) denotes the l_2 normalization along the channel dimension of the features, N denotes the number of inputs, and H, W and D denote the height, width and depth of the features, respectively;
when x_1 and x_2 are input into the target path and the online path, respectively, the local consistency constraint (1) is likewise satisfied, namely L_local(x_2, x_1);
thus, the overall local consistency constraint satisfied is:
L_total(x_1, x_2) = L_local(x_1, x_2) + L_local(x_2, x_1)   (2)
the self-supervised learning model is pre-trained with the loss function of equation (2) until pre-training is complete;
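Under a BYOL-style reading of equations (1) and (2), the symmetric local consistency loss might be implemented as below. Since the exact functional form is not fully legible in this record, the l_2-normalized per-voxel squared distance is an assumption consistent with the stated channel-wise normalization and the averaging over N, H, W and D; the stop-gradient on the target features is likewise an assumption carried over from BYOL.

```python
import torch
import torch.nn.functional as F

def local_consistency(q, z):
    """q: online features after the extra mapper Y_theta; z: target features.
    Both (N, C, H, W, D), voxel-aligned by the spatial position adjuster."""
    q = F.normalize(q, dim=1)             # l2 normalization over channels
    z = F.normalize(z.detach(), dim=1)    # assumed stop-gradient on the target
    return ((q - z) ** 2).sum(dim=1).mean()   # average over N, H, W, D

def total_loss(r1_online, r2_target, r2_online, r1_target, predictor):
    # L_total(x1, x2) = L_local(x1, x2) + L_local(x2, x1), as in equation (2)
    return (local_consistency(predictor(r1_online), r2_target) +
            local_consistency(predictor(r2_online), r1_target))
```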
step 4: constructing a 3D medical image segmentation network;
the 3D medical image segmentation network comprises three parts in sequence: an encoder with the 3D ResNet50 structure, a spatial pyramid pooling module and a decoder; three skip connections are also included between the encoder and the decoder;
the encoder with the 3D ResNet50 structure is the pre-trained encoder from the online path of the self-supervised learning model; the spatial pyramid pooling module comprises 3 separable convolution layers with dilation rates of 2, 4 and 8, respectively; the decoder comprises 4 deconvolution blocks and one convolution layer for segmentation;
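A minimal sketch of this segmentation network is given below; the three encoder-decoder skip connections are omitted for brevity, and the channel widths, the depthwise realization of the separable convolution layers and the class count are assumptions.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: 3 separable convolutions with dilation 2, 4, 8."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(ch, ch, 3, padding=r, dilation=r, groups=ch)  # depthwise
            for r in (2, 4, 8)
        ])
        self.fuse = nn.Conv3d(3 * ch, ch, kernel_size=1)  # pointwise fusion

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class SegNet(nn.Module):
    def __init__(self, encoder, n_classes=2, ch=2048):
        super().__init__()
        self.encoder = encoder          # pre-trained online-path encoder
        self.spp = SPP(ch)
        self.decoder = nn.Sequential(*[  # 4 deconvolution blocks
            nn.Sequential(
                nn.ConvTranspose3d(ch // 2 ** i, ch // 2 ** (i + 1), 2, stride=2),
                nn.ReLU(inplace=True))
            for i in range(4)
        ])
        self.head = nn.Conv3d(ch // 16, n_classes, 1)  # segmentation layer

    def forward(self, x):               # skip connections omitted in this sketch
        return self.head(self.decoder(self.spp(self.encoder(x))))
```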
step 5: verifying the effect of the self-supervised pre-training on 3D medical image segmentation;
the 3D medical image segmentation dataset is augmented;
the optimizer used in network training is a stochastic gradient descent optimizer, and the loss function is a combination of the Dice similarity coefficient loss and the cross-entropy loss;
after the 3D medical image segmentation network training is completed, 3D medical image segmentation is realized.
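The fine-tuning objective named above, the Dice similarity coefficient loss combined with cross entropy under stochastic gradient descent, might look like this sketch; the equal weighting of the two terms and the smoothing constant are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps=1e-5):
    """logits: (N, K, H, W, D) network output; target: (N, H, W, D) int labels."""
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 4, 1, 2, 3).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3, 4))
    denom = probs.sum(dim=(0, 2, 3, 4)) + onehot.sum(dim=(0, 2, 3, 4))
    dice = (2 * inter + eps) / (denom + eps)       # per-class soft Dice
    return ce + (1 - dice.mean())

# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```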

Claims (2)

1. A self-supervised learning method for training a feature extractor for 3D medical image segmentation, comprising the following steps:
step 1: constructing a data augmentation model;
step 1-1: defining the data augmentation module;
the data augmentation module includes 6 augmentation modes: (1) flipping: each of the 3 axes of the 3D image is flipped randomly with 50% probability; (2) cropping and scaling: an image block is cropped randomly, and the cropped block is interpolated back to its size before cropping; (3) Gaussian noise is added with 10% probability, with the variance randomly sampled from [0, a]; (4) Gaussian blur is applied with 20% probability, with the σ of the Gaussian kernel randomly sampled from [b, 1]; (5) brightness and contrast are adjusted with 50% probability; (6) a voxel-by-voxel gamma transformation is applied with 50% probability, as follows:
i_new = (i_old)^λ
where i_new denotes the voxel value after the gamma transformation and i_old the voxel value before it; λ is a value randomly sampled from the range [0.7, 1.5]; the transformed voxel values are normalized back to [0, 1];
step 1-2: given an unlabeled 3D medical image training sample x, two image blocks containing an overlapping region are extracted from x; from the data augmentation module, two different combinations of augmentation modes are randomly selected and applied respectively to the two image blocks extracted from x, and the augmented image blocks are denoted x_1 and x_2, respectively; note that if the (2)nd augmentation mode is selected, the overlapping region of the two image blocks is fully preserved during cropping;
step 2: constructing an online path module and a target path module;
the online path module and the target path module are parallel, and each comprises, in sequence, an encoder, a mapper and a spatial position adjuster;
the encoders of the online path module and the target path module both adopt the 3D ResNet50 network structure and are used to extract the features of the augmented image blocks x_1 and x_2;
the mappers of the online path module and the target path module both have the structure convolution layer - batch normalization layer - activation function layer - convolution layer, and map the output of each encoder into a feature space; the outputs of the mappers of the online path module and the target path module are f_1 and f_2, respectively;
the spatial position adjusters of the online path module and the target path module first align the two features in sequence; the specific operations are as follows:
(1) Mirror adjustment: for the (1)st augmentation mode, the inverse operations of the flips τ_1 and τ_2 are applied to f_1 and f_2, respectively, so that the de-augmented f_1 and f_2, i.e. f̂_1 = τ_1^{-1}(f_1) and f̂_2 = τ_2^{-1}(f_2), are consistent with the positions in the original image, where τ_1^{-1} and τ_2^{-1} denote the inverses of the flips τ_1 and τ_2;
if the augmentations applied to the input image blocks in step 1-2 do not include the (1)st augmentation mode, the mirror adjustment is not performed;
(2) ROI adjustment: for the (2)nd augmentation mode, the features of the overlapping region are first extracted from f̂_1 and f̂_2 according to the known augmentation position information, and the overlapping regions are then resized by bilinear interpolation to the same size as f_1 and f_2, respectively; the resulting aligned ROI features are denoted f̂_1^{F,CS} and f̂_2^{F,CS}, where F denotes flipping and CS denotes cropping and scaling;
if the augmentations applied to the input image blocks in step 1-2 do not include the (2)nd augmentation mode, the ROI adjustment is not performed;
after the two features are aligned, the online path module further applies a mapper; the target path module has no such mapper;
step 3: pre-training the self-supervised learning model using the local consistency loss;
the self-supervised learning model comprises the data augmentation model, the online path module and the target path module;
a loss function is designed based on the local consistency constraint; when x_1 and x_2 are input into the online path and the target path, respectively:
L_local(x_1, x_2) = (1 / (N·H·W·D)) · Σ_{n,h,w,d} || norm(Y_θ(f̂_1^{F,CS}))_{n,h,w,d} - norm(f̂_2^{F,CS})_{n,h,w,d} ||_2^2   (1)
where Y_θ denotes the mapper of the online path, norm(·) denotes the l_2 normalization along the channel dimension of the features, N denotes the number of inputs, and H, W and D denote the height, width and depth of the features, respectively;
when x_1 and x_2 are input into the target path and the online path, respectively, the local consistency constraint (1) is likewise satisfied, namely L_local(x_2, x_1);
thus, the overall local consistency constraint satisfied is:
L_total(x_1, x_2) = L_local(x_1, x_2) + L_local(x_2, x_1)   (2)
the self-supervised learning model is pre-trained with the loss function of equation (2) until pre-training is complete;
step 4: constructing a 3D medical image segmentation network;
the 3D medical image segmentation network comprises three parts in sequence: an encoder with the 3D ResNet50 structure, a spatial pyramid pooling module and a decoder; three skip connections are also included between the encoder and the decoder;
the encoder with the 3D ResNet50 structure is the pre-trained encoder from the online path of the self-supervised learning model; the spatial pyramid pooling module comprises 3 separable convolution layers with dilation rates of 2, 4 and 8, respectively; the decoder comprises 4 deconvolution blocks and one convolution layer for segmentation;
step 5: verifying the effect of the self-supervised pre-training on 3D medical image segmentation;
the 3D medical image segmentation dataset is augmented;
the optimizer used in network training is a stochastic gradient descent optimizer, and the loss function is a combination of the Dice similarity coefficient loss and the cross-entropy loss;
after the 3D medical image segmentation network training is completed, 3D medical image segmentation is realized.
2. The self-supervised learning method for training a feature extractor for 3D medical image segmentation according to claim 1, wherein a = 0.1 and b = 0.5.
CN202111523320.1A 2021-12-13 2021-12-13 Self-supervision learning method for 3D medical image segmentation training feature extractor Active CN114240966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111523320.1A CN114240966B (en) 2021-12-13 2021-12-13 Self-supervision learning method for 3D medical image segmentation training feature extractor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111523320.1A CN114240966B (en) 2021-12-13 2021-12-13 Self-supervision learning method for 3D medical image segmentation training feature extractor

Publications (2)

Publication Number Publication Date
CN114240966A CN114240966A (en) 2022-03-25
CN114240966B true CN114240966B (en) 2024-03-15

Family

ID=80755575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111523320.1A Active CN114240966B (en) 2021-12-13 2021-12-13 Self-supervision learning method for 3D medical image segmentation training feature extractor

Country Status (1)

Country Link
CN (1) CN114240966B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112686898A (en) * 2021-03-15 2021-04-20 四川大学 Automatic radiotherapy target area segmentation method based on self-supervision learning
CN113011427A (en) * 2021-03-17 2021-06-22 中南大学 Remote sensing image semantic segmentation method based on self-supervision contrast learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11354778B2 (en) * 2020-04-13 2022-06-07 Google Llc Systems and methods for contrastive learning of visual representations

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112686898A (en) * 2021-03-15 2021-04-20 四川大学 Automatic radiotherapy target area segmentation method based on self-supervision learning
CN113011427A (en) * 2021-03-17 2021-06-22 中南大学 Remote sensing image semantic segmentation method based on self-supervision contrast learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Depth estimation method for tomato plant images based on self-supervised learning; Zhou Yuncheng; Xu Tongyu; Deng Hanbing; Miao Teng; Wu Qiong; Transactions of the Chinese Society of Agricultural Engineering (24); full text *
Hybrid-supervised dual-channel feedback U-Net for breast ultrasound image segmentation; Gong Ronglin; Shi Jun; Wang Jun; Journal of Image and Graphics (10); full text *

Also Published As

Publication number Publication date
CN114240966A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
Jiang et al. Edge-enhanced GAN for remote sensing image superresolution
Guo et al. Scene-driven multitask parallel attention network for building extraction in high-resolution remote sensing images
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
CN110706157B (en) Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN114202672A (en) Small target detection method based on attention mechanism
CN110276354B (en) High-resolution streetscape picture semantic segmentation training and real-time segmentation method
Guo et al. Super-resolution integrated building semantic segmentation for multi-source remote sensing imagery
CN111612008A (en) Image segmentation method based on convolution network
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN110751271B (en) Image traceability feature characterization method based on deep neural network
CN114519819B (en) Remote sensing image target detection method based on global context awareness
Liu et al. Image-free single-pixel segmentation
Huan et al. MAENet: multiple attention encoder–decoder network for farmland segmentation of remote sensing images
Neupane et al. Building footprint segmentation using transfer learning: a case study of the city of melbourne
Zhao et al. Squnet: An high-performance network for crater detection with dem data
CN112785629A (en) Aurora motion characterization method based on unsupervised deep optical flow network
CN114240966B (en) Self-supervision learning method for 3D medical image segmentation training feature extractor
CN116503602A (en) Unstructured environment three-dimensional point cloud semantic segmentation method based on multi-level edge enhancement
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115953312A (en) Joint defogging detection method and device based on single image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant