CN112686898B - Automatic radiotherapy target area segmentation method based on self-supervision learning - Google Patents
- Publication number: CN112686898B (application CN202110274005.3A)
- Authority: CN (China)
- Prior art keywords: segmentation, training, data, network, model
- Prior art date: 2021-03-15
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an automatic radiotherapy target area segmentation method based on self-supervised learning, relating to the technical field of image processing and comprising the following steps: 1) data preparation: collect original CT data, separate it into a labeled data set and an unlabeled data set, and delineate the labeled data set; 2) feature extraction: construct a pre-training network based on self-supervised learning, input the unlabeled data set into the pre-training network for iterative training, and select the optimal pre-training model; 3) segmentation model generation: construct a segmentation network, load the trained self-supervised pre-training model into the segmentation network, input the labeled data set into the segmentation network for iterative training to select the optimal model, and finally test and evaluate the segmentation performance of the model. The invention uses the coordinate labels of the CT data to pre-train the self-supervised task, so no new labels need to be designed; the pre-trained model already contains shallow features of the CT image and therefore converges faster on the segmentation task.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an automatic radiotherapy target area segmentation method based on self-supervised learning.
Background
Radiotherapy is one of the main means of treating malignant tumors today; treatment is achieved mainly by irradiating the tumor site with a controlled dose of radiation. Modern radiotherapy follows a series of standardized procedures whose main stages include target region delineation, plan making, plan implementation and inverse optimization. In target region delineation, a medical physicist delineates the target region and organs at risk on CT image data; for a typical case the delineation task takes a large share of the total time, and manual delineation often produces inconsistent results because delineation habits differ between practitioners.
To reduce the time cost of the delineation task in the radiotherapy workflow, and to further reduce the influence of individual delineation habits on the result, image processing algorithms are often introduced to automate delineation; among them, segmentation algorithms based on deep neural networks are one of the important means of medical image processing. A neural network is formed by connecting and combining many neurons and mainly comprises an input layer, hidden layers and an output layer. The neurons of the hidden layers carry weight information; during training, forward computation and backpropagation continually update the weights to approximate the highly nonlinear relation between the input data and its features. After many training iterations the network output converges, the optimal network model is selected according to an evaluation index, and the input image can finally be segmented automatically with that model.
In the prior art of automatic delineation based on deep neural networks, a medical physicist must first delineate a large amount of CT image data for training; the delineated data are divided into a training set and a test set in a certain proportion, a network structure is designed, the training set is fed into the network, and after many training iterations the optimal model is selected on the test set, determining the final model to be put into use.
However, the prior art has the following disadvantage: to reach an optimal delineation model, current automatic delineation based on deep neural networks requires a large amount of training data up front, which increases the time and equipment costs of training.
Disclosure of Invention
The invention aims to solve the above technical problem; to this end, the invention provides an automatic radiotherapy target area segmentation method based on self-supervised learning.
To realize this purpose, the invention specifically adopts the following technical scheme:
An automatic radiotherapy target area segmentation method based on self-supervised learning comprises the following steps:
step 1, data preparation: collecting original CT data, separating it into a labeled data set and an unlabeled data set, and delineating the labeled data set;
step 2, feature extraction: constructing a pre-training network based on self-supervised learning, inputting the unlabeled data set into the pre-training network for iterative training, and selecting the optimal pre-training model;
step 3, segmentation model generation: constructing a segmentation network, loading the trained self-supervised pre-training model into the segmentation network, inputting the labeled data set into the segmentation network for iterative training to select the optimal model, and finally testing and evaluating the segmentation performance of the model.
Further, in step 1, the data preparation specifically includes the following steps:
step 1a, data acquisition: a physicist scans the target region of a patient with a CT machine to obtain CT image data; one scan generates a number of consecutive CT images;
step 1b, data division: dividing the obtained original CT data into a non-labeled data set and a labeled data set according to the ratio of 6:4, and respectively corresponding to a pre-learning task and a segmentation task;
step 1c, data processing: a physicist delineates the target region on the labeled data set of step 1b according to the pathological features of the case, generating a data set containing target region segmentation labels.
Further, in step 2, feature extraction mainly generates a pre-training model through the pre-learning task, and specifically comprises the following steps:
step 2a, pre-learning network construction: the pre-learning task takes as input two different CT images belonging to the same case and outputs a single numerical value each time; the pre-learning network is therefore realized by combining two twin deep neural networks in a weight-sharing manner, and comprises two modules, an encoder and a fusion module;
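As an illustrative aid only (not part of the claimed method), a minimal PyTorch sketch of such a weight-sharing twin network with an encoder and a fusion head follows; all layer sizes and module names are assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolution + pooling stack standing in for the patent's encoder module."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # collapse the feature map to one vector
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, 32)

class PreLearningNet(nn.Module):
    """Twin encoders with shared weights plus a head that regresses
    the single relative-distance value Dis."""
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()      # one module applied twice = weight sharing
        self.head = nn.Linear(32, 1)  # fused vector -> scalar prediction

    def forward(self, slice1, slice2):
        b1 = self.encoder(slice1)
        b2 = self.encoder(slice2)
        fused = torch.maximum(b1, b2)  # channel-wise max fusion (fusion module)
        return self.head(fused).squeeze(1)
```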
step 2b, sample data set construction: the CT data acquired in step 1 contains a plurality of cases. Suppose the CT data to be processed contains N cases and each case contains M CT images; the constructed sample set then has size N × M(M−1)/2, one sample per pair of slices within a case. Each sample can be represented as a triplet (slice1, slice2, Dis), where slice1 and slice2 represent two CT images of the same case and Dis represents the relative distance between the two CT images, calculated as:
Dis = |Z_slice1 − Z_slice2|
where Z_slice1 and Z_slice2 are the Z-axis coordinate values of the two CT images, and Dis is the relative distance between them;
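For illustration, the triplet construction of step 2b can be sketched as follows (the data layout is an assumption made for the example):

```python
import itertools

def build_sample_set(cases):
    """cases[n] is the list of slices of case n; each slice is an
    (image, z_coordinate) pair. One sample is built per slice pair
    within a case, giving N * M * (M - 1) / 2 triplets in total."""
    samples = []
    for slices in cases:
        for (img1, z1), (img2, z2) in itertools.combinations(slices, 2):
            samples.append((img1, img2, abs(z1 - z2)))  # Dis = |Z_slice1 - Z_slice2|
    return samples
```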
step 2c, CT data preprocessing: perform data augmentation on the unlabeled CT image data obtained in step 1, implemented by random flipping, translation and rotation, so that the pre-learning model can learn more robust features;
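A minimal augmentation routine in this spirit (torchvision-based; the flip probability and the rotation/translation ranges are assumptions, since the patent gives no concrete values):

```python
import random
import torchvision.transforms.functional as TF

def augment(img):
    """Randomly flip, translate and rotate one CT slice tensor of shape (C, H, W)."""
    if random.random() < 0.5:
        img = TF.hflip(img)
    angle = random.uniform(-10.0, 10.0)                    # rotation in degrees
    dx, dy = random.randint(-8, 8), random.randint(-8, 8)  # translation in pixels
    return TF.affine(img, angle=angle, translate=[dx, dy], scale=1.0, shear=[0.0])
```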
step 2d, pre-learning model training: the augmented CT data are divided into a training set and a test set at a ratio of 8:2; the training set data are input into the pre-learning network designed in step 2a, and forward computation and backpropagation are repeated until the output loss value converges; training stops when the output on the test set is the minimum over the training runs, thereby completing feature extraction and obtaining the pre-learning model.
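A condensed sketch of this training loop, assuming the triplets hold torch tensors; the 8:2 split and minimum-test-loss model selection follow the text, while the MSE regression loss and the Adam optimizer are assumptions:

```python
import random
import torch
import torch.nn as nn

def train_prelearning(model, samples, epochs=50, lr=1e-4):
    random.shuffle(samples)
    split = int(0.8 * len(samples))               # 8:2 train/test split
    train_set, test_set = samples[:split], samples[split:]
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    best = float("inf")
    for _ in range(epochs):
        model.train()
        for s1, s2, dis in train_set:
            pred = model(s1.unsqueeze(0), s2.unsqueeze(0))
            loss = loss_fn(pred, torch.tensor([dis], dtype=torch.float32))
            opt.zero_grad()
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():                     # evaluate on the held-out 20%
            test_loss = sum(
                loss_fn(model(s1.unsqueeze(0), s2.unsqueeze(0)),
                        torch.tensor([d], dtype=torch.float32)).item()
                for s1, s2, d in test_set) / max(len(test_set), 1)
        if test_loss < best:                      # keep the minimum-test-loss model
            best = test_loss
            torch.save(model.state_dict(), "pretrain_best.pt")
```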
Further, in step 2a, the encoder mainly comprises a number of convolution layers and pooling layers, and extracts features from the two input CT images respectively to obtain two encoded feature vectors;
the fusion module computes over and fuses the feature vectors extracted by the two networks; the fusion method is channel-wise maximum selection, i.e., the maximum value on each channel of the two vectors is taken to form the fused vector, with the calculation formula:
β_combine = Combine(Max(β1), Max(β2))
where β_combine represents the fused output vector of the two networks, and β1 and β2 represent the output vectors of twin deep neural network 1 and twin deep neural network 2, respectively.
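In code, the channel-wise maximum fusion is a one-liner (a sketch of the formula above, not the patent's own implementation):

```python
import torch

def fuse_max(beta1: torch.Tensor, beta2: torch.Tensor) -> torch.Tensor:
    """beta_combine[c] = max(beta1[c], beta2[c]) on every channel."""
    return torch.maximum(beta1, beta2)

# fuse_max(torch.tensor([0.2, 0.9, 0.1]), torch.tensor([0.5, 0.3, 0.4]))
# -> tensor([0.5000, 0.9000, 0.4000])
```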
Further, in step 3, a pre-training model has been obtained after feature extraction; it carries shallow image features of the CT data. In this step, based on the pre-training model, a small amount of data containing segmentation labels is input into a segmentation network for training to obtain the segmentation model, with the following specific steps:
step 3a, segmentation network construction: the segmentation network is designed based on a deep neural network and comprises an encoder, a random multi-scale module and a decoder;
step 3b, loading the pre-learning model: because the encoder of the pre-learning network and the encoder of the segmentation network were designed with the same structure, the pre-training model obtained in step 2 is loaded directly into the segmentation network, so the segmentation network directly obtains CT image features and remains applicable in scenarios with very little labeled data;
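One plausible way to perform this loading step (a sketch: the module name "encoder" and the checkpoint file from step 2d are illustrative assumptions):

```python
import torch

def load_pretrained_encoder(seg_net, checkpoint_path="pretrain_best.pt"):
    """Copy the pre-learning encoder weights into the segmentation encoder.
    This works only because the two encoders share one structure by design."""
    state = torch.load(checkpoint_path, map_location="cpu")
    encoder_state = {k[len("encoder."):]: v
                     for k, v in state.items() if k.startswith("encoder.")}
    seg_net.encoder.load_state_dict(encoder_state)
    return seg_net
```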
step 3c, CT data preprocessing: considering that the amount of labeled CT data is small, the labeled CT data set obtained in step 1 is augmented to make the segmentation model more reliable, mainly by random flipping, translation and rotation;
step 3d, segmentation model training: model training is performed with a deep neural network for image segmentation; first, the augmented data are divided into a training set, a verification set and a test set at a ratio of 7:1:2; the training set data are input into the network designed in step 3a, and forward computation and backpropagation are repeated until the loss value output by the network converges; training stops when the output on the verification set is the minimum over the training runs, yielding the optimal model of the segmentation network;
step 3e, segmentation model testing: after training of the segmentation network is completed, the segmentation effect of the model on the test set of step 3d is evaluated quantitatively; commonly used evaluation indexes include TPVF, PPV and DSC, defined as follows:
TPVF = |V_S ∩ V_G| / |V_G|
PPV = |V_S ∩ V_G| / |V_S|
DSC = 2|V_S ∩ V_G| / (|V_S| + |V_G|)
where V_S and V_G denote the set of positive-sample pixels predicted by the model and the set of true positive-sample pixels, respectively; TPVF measures the proportion of true positive pixels that are predicted correctly, PPV measures the proportion of predicted positive pixels that are truly positive, and DSC is a common index for evaluating segmentation results that balances TPVF and PPV.
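All three indexes can be computed directly from binary masks; a sketch under the set definitions above:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: binary masks. Returns (TPVF, PPV, DSC)."""
    vs = pred.astype(bool)                   # predicted positive pixels V_S
    vg = gt.astype(bool)                     # ground-truth positive pixels V_G
    inter = np.logical_and(vs, vg).sum()
    tpvf = inter / max(vg.sum(), 1)          # max(..., 1) guards empty masks
    ppv = inter / max(vs.sum(), 1)
    dsc = 2 * inter / max(vs.sum() + vg.sum(), 1)
    return tpvf, ppv, dsc
```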
Further, in step 3a, the encoder is consistent with the structure of the encoder of the pre-training network, and the segmentation network encoder is composed of a series of convolution and pooling layers and is used for extracting abstract features from the input CT image.
Further, in step 3a, the random multi-scale module mainly comprises dilated (atrous) convolution modules with 4 different dilation rates and a global average pooling layer; it further processes the feature vector output by the encoder so that it carries features at different scales, improving the robustness of the model to features of different scales.
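A sketch of such a multi-scale module in PyTorch; the four dilation rates (1, 6, 12, 18) and the 1×1 projection are assumptions in the spirit of ASPP-style designs, as the patent does not list concrete rates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleModule(nn.Module):
    """Dilated 3x3 convolutions at four rates plus global average pooling."""
    def __init__(self, channels, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.project = nn.Conv2d(channels * (len(rates) + 1), channels, 1)

    def forward(self, x):
        outs = [branch(x) for branch in self.branches]
        g = F.interpolate(self.gap(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        return self.project(torch.cat(outs + [g], dim=1))  # fuse all scales
```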
Further, in step 3a, the decoder mainly restores the preceding features to the original input size and generates a class prediction for every pixel of the original image; it also fuses deep features with shallow features, which improves feature reuse in the model and accelerates model convergence.
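An illustrative decoder block combining upsampling, deep-shallow feature fusion and per-pixel classification (the layout and channel handling are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Upsample deep features, concatenate a shallow skip feature,
    refine, and predict a class score for every pixel."""
    def __init__(self, deep_ch, skip_ch, num_classes):
        super().__init__()
        self.refine = nn.Conv2d(deep_ch + skip_ch, deep_ch, 3, padding=1)
        self.classify = nn.Conv2d(deep_ch, num_classes, 1)

    def forward(self, deep, skip, out_size):
        deep = F.interpolate(deep, size=skip.shape[-2:],
                             mode="bilinear", align_corners=False)
        x = F.relu(self.refine(torch.cat([deep, skip], dim=1)))  # deep + shallow fusion
        logits = self.classify(x)
        return F.interpolate(logits, size=out_size,              # restore input size
                             mode="bilinear", align_corners=False)
```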
The invention has the following beneficial effects:
1. The invention uses the coordinate labels of the CT data to pre-train the self-supervised task, so no new labels need to be designed; the pre-training model generated in the pre-learning task contains shallow features of the CT image, so convergence is fast when the segmentation task is executed; and since the pre-trained model contains general features of CT data, segmentation models can also be trained for most organs at risk and target regions.
2. Traditional target region segmentation methods based on deep neural networks need a large amount of labeled CT data, yet labeled CT data are extremely scarce in practice, and a medical physicist must spend a great deal of time and cost preparing training data. The method can train a segmentation model with only a small amount of labeled data, greatly saving labor and time cost, and is of important guiding significance for future neural-network-based automatic target segmentation work.
Drawings
FIG. 1 is a schematic flow diagram of the system of the present invention;
FIG. 2 is a schematic diagram of "relative position" of a CT scan;
FIG. 3 is a schematic diagram of a pre-learning network architecture;
fig. 4 is a schematic diagram of the segmentation network architecture.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Example 1
As shown in fig. 1, the present embodiment provides a method for automatically segmenting a radiotherapy target region based on self-supervised learning, including the following steps:
step 1, data preparation: collecting original CT data, separating it into a labeled data set and an unlabeled data set, and delineating the labeled data set;
step 2, feature extraction: constructing a pre-training network based on self-supervised learning according to the characteristics of CT data, inputting the unlabeled data set into the pre-training network for iterative training, and selecting the optimal pre-training model;
step 3, segmentation model generation: constructing a segmentation network according to the segmentation task, loading the trained self-supervised pre-training model into the segmentation network, inputting the labeled data set into the segmentation network for iterative training to select the optimal model, and finally testing and evaluating the segmentation performance of the model.
In step 1, different from traditional image segmentation methods based on deep neural networks, the method trains models by combining a pre-learning task with a segmentation task on the basis of self-supervised learning: the pre-learning task extracts features from unlabeled raw data, and the segmentation task, building on the extracted features, trains on the labeled data to generate the segmentation model. For both tasks, this step first prepares the data to be processed; data preparation specifically includes the following steps:
step 1a, data acquisition: a physicist scans the target region of a patient with a CT machine to obtain CT image data; one scan generates a number of consecutive CT images;
step 1b, data division: dividing the obtained original CT data into a non-labeled data set and a labeled data set according to the ratio of 6:4, and respectively corresponding to a pre-learning task and a segmentation task;
step 1c, data processing: a physicist delineates the target region on the labeled data set of step 1b according to the pathological features of the case, generating a data set containing target region segmentation labels.
Example 2
The embodiment is further optimized on the basis of embodiment 1, and specifically comprises the following steps:
In step 2, feature extraction mainly generates a pre-training model through the pre-learning task, and the pre-learning task mainly uses features the data already carries as labels to design and train the network. The CT images generated in one scan are consecutive, so there is feature similarity between any two CT images of the same scan. The CT machine also generates a number of information labels when producing the CT images, including coordinate information indicating the scanning position of each image; as shown in fig. 2, along the coordinate Z pointing toward the head, any two CT images have a "relative distance" between them. The deep neural network can therefore extract image features of the CT data by learning this "relative distance" between CT images. The method specifically comprises the following steps:
step 2a, pre-learning network construction: the pre-learning task takes as input two different CT images belonging to the same case and outputs a single numerical value each time; the pre-learning network is therefore realized by combining two twin deep neural networks in a weight-sharing manner, and comprises two modules, an encoder and a fusion module, as shown in fig. 3.
Step 2b, sample data set construction: in this step the acquired CT data must be organized into a usable sample data set. The CT data acquired in step 1 comprises a plurality of cases; suppose the CT data to be processed contains N cases and each case contains M CT images, so the constructed sample set has size N × M(M−1)/2, one sample per pair of slices within a case. Each sample can be represented as a triplet (slice1, slice2, Dis), where slice1 and slice2 represent two CT images of the same case and Dis represents the relative distance between the two CT images, calculated as:
Dis = |Z_slice1 − Z_slice2|
where Z_slice1 and Z_slice2 are the Z-axis coordinate values of the two CT images, and Dis is the relative distance between them;
step 2c, CT data preprocessing: perform data augmentation on the unlabeled CT image data obtained in step 1, implemented by random flipping, translation and rotation, so that the pre-learning model can learn more robust features;
step 2d, pre-learning model training: the augmented CT data are divided into a training set and a test set at a ratio of 8:2; the training set data are input into the pre-learning network designed in step 2a, and forward computation and backpropagation are repeated until the output loss value converges; training stops when the output on the test set is the minimum over the training runs, thereby completing feature extraction and obtaining the pre-learning model.
In step 2a, the encoder mainly comprises a number of convolution layers and pooling layers, and extracts features from the two input CT images respectively to obtain two encoded feature vectors;
the fusion module computes over and fuses the feature vectors extracted by the two networks; the fusion method is channel-wise maximum selection, i.e., the maximum value on each channel of the two vectors is taken to form the fused vector, with the calculation formula:
β_combine = Combine(Max(β1), Max(β2))
where β_combine represents the fused output vector of the two networks, and β1 and β2 represent the output vectors of twin deep neural network 1 and twin deep neural network 2, respectively.
Example 3
The embodiment is further optimized on the basis of embodiment 1 or 2, and specifically comprises the following steps:
In step 3, a pre-training model has been obtained after feature extraction; it carries shallow image features of the CT data. Based on the pre-training model, a small amount of data containing segmentation labels is input into a segmentation network for training to obtain the segmentation model, specifically as follows:
step 3a, segmentation network construction: the segmentation network is designed based on a deep neural network and comprises an encoder, a random multi-scale module and a decoder; the encoder part of the segmentation network has the same structure as the encoder of the pre-learning network, and the main structure is shown in fig. 4;
step 3b, loading the pre-learning model: because the encoder of the pre-learning network and the encoder of the segmentation network were designed with the same structure, the pre-training model obtained in step 2 is loaded directly into the segmentation network, so the segmentation network directly obtains CT image features and remains applicable in scenarios with very little labeled data;
step 3c, CT data preprocessing: considering that the amount of labeled CT data is small, the labeled CT data set obtained in step 1 is augmented to make the segmentation model more reliable, mainly by random flipping, translation and rotation;
step 3d, segmentation model training: model training is performed with a deep neural network for image segmentation; first, the augmented data are divided into a training set, a verification set and a test set at a ratio of 7:1:2; the training set data are input into the network designed in step 3a, and forward computation and backpropagation are repeated until the loss value output by the network converges; training stops when the output on the verification set is the minimum over the training runs, yielding the optimal model of the segmentation network;
step 3e, segmentation model testing: after training of the segmentation network is completed, the segmentation effect of the model on the test set of step 3d is evaluated quantitatively; commonly used evaluation indexes include TPVF, PPV and DSC, defined as follows:
TPVF = |V_S ∩ V_G| / |V_G|
PPV = |V_S ∩ V_G| / |V_S|
DSC = 2|V_S ∩ V_G| / (|V_S| + |V_G|)
where V_S and V_G denote the set of positive-sample pixels predicted by the model and the set of true positive-sample pixels, respectively; TPVF measures the proportion of true positive pixels that are predicted correctly, PPV measures the proportion of predicted positive pixels that are truly positive, and DSC is a common index for evaluating segmentation results that balances TPVF and PPV.
In step 3a, the encoder is consistent with the encoder of the pre-training network in structure, and the segmentation network encoder is composed of a series of convolution and pooling layers and is used for extracting abstract features from the input CT image.
In step 3a, the random multi-scale module mainly comprises dilated (atrous) convolution modules with 4 different dilation rates and a global average pooling layer; it further processes the feature vector output by the encoder so that it carries features at different scales, improving the robustness of the model to features of different scales.
In step 3a, the decoder mainly restores the features of the previous step to the original input size and generates a class prediction for every pixel of the original image; the decoder also integrates deep features with shallow features, which improves feature reuse in the model and accelerates model convergence.
Claims (6)
1. A radiotherapy target area automatic segmentation method based on self-supervision learning is characterized by comprising the following steps:
step 1, data preparation: collecting original CT data, separating it into a labeled data set and an unlabeled data set, and delineating the labeled data set;
step 2, feature extraction: constructing a pre-training network based on self-supervised learning, inputting the unlabeled data set into the pre-training network for iterative training, and selecting the optimal pre-training model; in step 2, the feature extraction mainly generates a pre-training model through the pre-learning task, and specifically comprises the following steps:
step 2a, constructing a pre-learning network: the pre-learning network is realized by combining two twin deep neural networks in a weight-sharing manner, and comprises two modules, an encoder and a fusion module; two different CT images belonging to the same case are input into the pre-learning network each time, and the output is a single numerical value each time;
step 2b, sample data set construction: the CT data acquired in step 1 comprises a plurality of cases; suppose there are N cases in the CT data to be processed and each case comprises M CT images, so the constructed sample set has size N × M(M−1)/2; each sample can be represented as a triplet (slice1, slice2, Dis), where slice1 and slice2 represent two CT images of the same case, and Dis represents the relative distance between the two CT images, calculated as:
Dis = |Z_slice1 − Z_slice2|
wherein Z_slice1 and Z_slice2 are the Z-axis coordinate values of the two CT images, and Dis is the relative distance between the two CT images;
step 2c, CT data preprocessing: performing data augmentation on the unlabeled CT image data obtained in step 1 by random flipping, translation and rotation;
step 2d, pre-learning model training: dividing the augmented CT data into a training set and a test set at a ratio of 8:2, inputting the training set data into the pre-learning network designed in step 2a, repeating forward computation and backpropagation until the output loss value converges, and stopping training when the output on the test set is the minimum over the training runs, thereby completing feature extraction and obtaining the pre-learning model;
step 3, generating a segmentation model: constructing a segmentation network, loading the trained self-supervision pre-training model into the segmentation network, and inputting the labeled data set into the segmentation network for iterative training to select an optimal model;
in step 3, a pre-training model is obtained after feature extraction is completed; the pre-training model carries shallow image features of the CT data, and based on it a small amount of data containing segmentation labels is input into a segmentation network for training to obtain the segmentation model, specifically comprising the following steps:
step 3a, constructing a segmentation network, wherein the segmentation model network is designed based on a deep neural network and comprises an encoder, a random multi-scale module and a decoder;
step 3b, loading the pre-learning model: since the encoder of the pre-learning network and the encoder of the segmentation network are designed with the same structure, the pre-training model obtained in step 2 is loaded directly into the segmentation network, so that the segmentation network directly obtains the CT image features;
step 3c, CT data preprocessing: performing data augmentation on the labeled CT data set obtained in step 1 by random flipping, translation and rotation;
step 3d, segmentation model training: performing model training with a deep neural network for image segmentation; first, dividing the augmented data into a training set, a verification set and a test set at a ratio of 7:1:2, inputting the training set data into the network designed in step 3a, repeating forward computation and backpropagation until the loss value output by the network converges, and stopping training when the output on the verification set is the minimum over the training runs, thereby obtaining the optimal model of the segmentation network;
step 3e, segmentation model testing: after training of the segmentation network model is completed, the segmentation effect of the segmentation model on the test set of step 3d needs to be quantitatively evaluated; common evaluation indexes include TPVF, PPV and DSC, which are defined as follows:
TPVF = |V_S ∩ V_G| / |V_G|
PPV = |V_S ∩ V_G| / |V_S|
DSC = 2|V_S ∩ V_G| / (|V_S| + |V_G|)
wherein V_S and V_G respectively represent the set of positive-sample pixels predicted by the model and the set of real positive-sample pixels; TPVF represents the proportion of correct predictions among all real positive-sample pixels, PPV represents the proportion of real positive samples among all predicted positive-sample pixels, and DSC is a common index for evaluating segmentation results that balances TPVF and PPV;
step 4, inputting the acquired CT data into the segmentation model, and outputting the automatically segmented CT data.
2. The method of claim 1, wherein in step 1 the data preparation comprises the following steps:
step 1a, data acquisition: CT image data are obtained by scanning the target region of a patient with a CT machine, and one scan generates a number of consecutive CT images;
step 1b, data division: dividing the obtained original CT data into an unlabeled data set and a labeled data set at a ratio of 6:4, corresponding to the pre-learning task and the segmentation task respectively;
step 1c, data processing: delineating the target region on the labeled data set of step 1b according to the pathological features of the case, and generating a data set containing target region segmentation labels.
3. The automatic radiotherapy target area segmentation method based on self-supervised learning of claim 1, wherein in step 2a, the encoder is composed of a plurality of convolution layers and pooling layers, and extracts features from the two input CT images respectively to obtain two encoded feature vectors;
the fusion module computes over and fuses the feature vectors extracted by the two networks; the fusion method is channel-wise maximum selection, i.e., the maximum value on each channel of the two vectors is selected to form the fused vector, with the calculation formula:
β_combine = Combine(Max(β1), Max(β2))
wherein β_combine represents the fused output vector of the two networks, and β1 and β2 represent the output vectors of twin deep neural network 1 and twin deep neural network 2, respectively.
4. The method of claim 1, wherein in step 3a, the encoder is consistent with the structure of the encoder of the pre-training network, and the encoder of the segmentation network is composed of a series of convolution and pooling layers for extracting abstract features from the input CT image.
5. The method of claim 1, wherein in step 3a, the random multi-scale module comprises dilated convolution modules with 4 different dilation rates and a global average pooling layer, and further processes the feature vectors output by the encoder.
6. The method of claim 1, wherein in step 3a, the decoder restores the previous feature to the original size, generates a class prediction for the original picture at each pixel, and fuses the deep features and the shallow features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110274005.3A CN112686898B (en) | 2021-03-15 | 2021-03-15 | Automatic radiotherapy target area segmentation method based on self-supervision learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112686898A CN112686898A (en) | 2021-04-20 |
CN112686898B true CN112686898B (en) | 2021-08-13 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |