CN115861181A - Tumor segmentation method and system for CT image - Google Patents
- Publication number: CN115861181A (application number CN202211398539.8A)
- Authority: CN (China)
- Prior art keywords: image, training, segmentation, model, edge
- Prior art date: 2022-11-09
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Apparatus For Radiation Diagnosis (AREA)
Abstract
The invention relates to a tumor segmentation method and system for CT images, wherein the method comprises the following steps: acquiring a CT tumor image dataset; extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution; acquiring weak supervision labels based on adaptive flood filling and adaptive gradient selection from the extracted edges and the positions of the detection boxes; adopting a hybrid model of CNN and Transformer with an approximate edge decoder, first performing a first round of training with the weak supervision labels and correcting the result through a joint loss function; then taking the corrected result as the label for a second round of training to obtain the final segmentation model; and preprocessing the CT tumor image to be segmented, inputting it into the segmentation model, and outputting the corresponding segmentation result. Compared with the prior art, the invention can efficiently and accurately generate labels before training the model, which solves the time- and labor-consuming problem of fully supervised labeling and improves the efficiency and accuracy of the segmentation task.
Description
Technical Field
The invention relates to the technical field of computer vision processing, in particular to a tumor segmentation method and a tumor segmentation system for CT images.
Background
The image segmentation technology is an important research direction in the current artificial intelligence field, particularly in computer vision, and is also an important component for helping a machine to carry out semantic understanding.
Computer-aided diagnosis and treatment refers to analyzing and computing medical data by means of imaging analysis, physiological and biochemical means, computer image processing, machine learning modeling and other methods, which can assist in finding lesions or determining lesion properties to improve diagnostic accuracy. Accurately segmenting an organ or lesion site can provide an important reference for subsequent diagnosis. Most existing deep-learning medical image segmentation methods use fully supervised model training, i.e., segmentation labels of all lesion parts need to be annotated manually pixel by pixel; in practice, however, this annotation method is time-consuming and labor-intensive, and a large number of accurately annotated samples is difficult to obtain.
Since 2015, deep-learning-based methods have become the main approach to computer-processed medical image segmentation. Jonathan et al. proposed FCNs, which replace fully connected layers with fully convolutional ones; Olaf et al. proposed U-Net based on skip connections; Fausto et al. proposed the segmentation model V-Net for 3D medical data; Zhou et al. designed a more complete U-Net structure, U-Net++, aimed at the limitations of U-Net. The lung tumor segmentation problem for CT images also has high research value and practical significance: Reza et al. added an LSTM structure on the basis of U-Net and proposed BCDU-Net, and Li et al. proposed the model H-DenseUNet for jointly learning intra-slice and inter-slice features. However, as with all medical image segmentation tasks, most existing deep-learning methods adopt a fully supervised training mode, i.e., the segmentation information of the target lesion needs to be marked manually pixel by pixel, which consumes a lot of labor, has low efficiency, and cannot guarantee labeling accuracy.
Therefore, effectively reducing the large amount of manpower and material resources consumed by pixel-by-pixel labeling in deep-learning model training is of great practical value, not only for the CT-image-oriented lung tumor segmentation task but also for other medical image segmentation tasks. In addition, in general visual segmentation tasks, the natural images used usually contain rich semantic information, i.e., each instance object to be segmented has obvious and rich characteristics, the labeling difficulty is low, and no high threshold of domain knowledge is needed. In medical images, however, the lesion part to be segmented does not carry the rich and precise semantic information of natural images, so fully exploiting the features of medical images is also one of the keys of medical multi-modal tasks.
In summary, reducing the large amount of manual labeling by automatically generating labels, and training a model with more accurate segmentation precision from limited medical semantic information, are important problems of current computer-aided medical segmentation, with high research significance and practical clinical application value.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a tumor segmentation method and system for CT images which can efficiently and accurately generate labels before model training, thereby solving the time- and labor-consuming problem of fully supervised labeling and improving the efficiency and accuracy of the segmentation task.
The purpose of the invention can be realized by the following technical scheme: a tumor segmentation method facing CT images comprises the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak annotation on the 2D data, carrying out pretreatment on all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and correcting the result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
Further, the specific process of weakly labeling the 2D data in step S1 is as follows: for a lesion region on the 2D data, two points are marked on the data using annotation software to indicate the presence and absence of a tumor in the region, respectively, to obtain a corresponding json file.
Further, the specific process of the preprocessing is as follows: the images are adjusted to a uniform 352 × 352 size and normalized.
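A minimal sketch of this preprocessing step; the function name and the min-max normalization scheme are illustrative assumptions, since the text only specifies the 352 × 352 resize and that normalization is applied:

```python
import numpy as np
import cv2

def preprocess_slice(img: np.ndarray, size: int = 352) -> np.ndarray:
    """Resize a 2D CT slice to size x size and normalize it (illustrative sketch)."""
    img = cv2.resize(img.astype(np.float32), (size, size),
                     interpolation=cv2.INTER_LINEAR)
    lo, hi = float(img.min()), float(img.max())
    # Min-max normalization to [0, 1]; the exact scheme is an assumption.
    return (img - lo) / (hi - lo + 1e-8)
```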
Further, step S2 specifically comprises inputting the preprocessed 2D data into the multi-scale edge detection network model and outputting the edge feature map.
Further, the specific process of step S3 is:
taking the edge feature map and the json file as input, and obtaining a pseudo label by adopting adaptive flood filling, wherein the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

where $I$ is the input image, $r(I)$ is the mask radius corresponding to the input image $I$, $h_I$ and $w_I$ are respectively the length and width of the input image, and $\gamma$ is a set hyper-parameter;
in addition, the labeled ground truth is:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

in the formula, $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively;
the set of circular masks used for flood filling is defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$

in the formula, $C_a^b$ is the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius;
then, combining the edge feature map, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$

in the formula, $F(I)$ is the set of connected regions obtained after flood filling, $E(I)$ is the edge feature map, $E(\cdot)$ denotes the edge detector, and $I$ is the input image.
Further, the mixed model of CNN and Transformer in step S4 includes an Embedding part, an encoder part and a decoder part, where the Embedding part uses ResNet to extract feature maps of 3 stages and then performs Transformer-based patch embedding;
the encoder part comprises 12 Attention Encode blocks, and each block is set according to a Vision Transformer, namely an Attention module and an MLP module;
the decoder section consists of two components, a Vision Transformer (ViT) decoder and an approximate edge detector.
Further, the ViT decoder includes four cascaded convolutional layers, each layer having a Batch Normalization (BN) layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input, and the corresponding feature of each decoder layer is represented as $D = \{D_i \mid i = 1, 2, 3, 4\}$.
Further, the output of the approximate edge detector is:
$$f_e = \sigma\big(\mathrm{cat}(R_3, D_2)\big)$$
where σ represents a 3 × 3 convolutional layer, which includes BN and ReLU layers.
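A minimal PyTorch sketch of this decoder; the channel counts are assumptions the patent does not specify, DecoderLayer and ApproxEdgeHead are illustrative names, and R3 stands for the ResNet feature referenced in the formula above:

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3):
    """3 x 3 convolution followed by BN and ReLU, i.e. the sigma(.) above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DecoderLayer(nn.Module):
    """One of the four cascaded decoder layers: conv + BN + ReLU + 2x upsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = conv_bn_relu(in_ch, out_ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        return self.up(self.block(x))

class ApproxEdgeHead(nn.Module):
    """f_e = sigma(cat(R3, D2)): a 3 x 3 conv with BN + ReLU over the concatenation."""
    def __init__(self, r3_ch, d2_ch, out_ch=64):  # channel counts are assumptions
        super().__init__()
        self.sigma = conv_bn_relu(r3_ch + d2_ch, out_ch)

    def forward(self, r3, d2):
        return self.sigma(torch.cat([r3, d2], dim=1))
```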
Further, the training process in step S4 specifically includes:
the first round of training uses weakly supervised labels, and the training inputs are: the CT slice image after 2D preprocessing, the preprocessed edge image, and the weak label image aligned with the length and width of the CT image;
model training adopts an SGD optimizer; the learning rate is adaptively decayed according to the training epochs, with decay every 10 epochs at a ratio of 0.1; binary cross-entropy loss, local cross-entropy loss and gated CRF loss are adopted, and for the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image; the decoder branch uses the local cross-entropy loss and the gated CRF loss, and the local cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
wherein $J$ represents the marked region, $g$ represents the ground truth, and $s$ represents the predicted tumor map;
the gated CRF loss is:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, $|\cdot|$ represents the L1 distance, and $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

wherein $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters controlling the Gaussian kernel scale; the total loss function is thus defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are respectively the weights corresponding to the binary cross-entropy loss, the local cross-entropy loss and the gated CRF loss;
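A PyTorch sketch of this joint loss, under stated assumptions: $e$ and $s$ are sigmoid outputs in [0, 1], mask marks the labeled region $J$, and the gated CRF term is a simplified $k \times k$-neighborhood version of the formula above (zero-padding at the borders; the gating by labeled regions is omitted):

```python
import torch
import torch.nn.functional as F

def partial_bce(s, g, mask):
    """L_pbce: binary cross-entropy evaluated only inside the labeled region J."""
    loss = F.binary_cross_entropy(s, g, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)

def gated_crf(s, img, k=5, sigma_pt=3.0, sigma_i=0.1):
    """Simplified L_gcrf: sum over j in K_i of f(i, j) * |s_i - s_j|."""
    b, _, h, w = s.shape
    pad = k // 2
    s_n = F.unfold(s, k, padding=pad).view(b, 1, k * k, h * w)
    i_n = F.unfold(img, k, padding=pad).view(b, img.shape[1], k * k, h * w)
    d = (s_n - s.view(b, 1, 1, h * w)).abs()              # d(i, j) = |s_i - s_j|
    ys, xs = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
    pos = ((ys - pad) ** 2 + (xs - pad) ** 2).float()
    pos = pos.view(1, 1, k * k, 1).to(s.device)           # position term PT
    inten = ((i_n - img.view(b, img.shape[1], 1, h * w)) ** 2).sum(1, keepdim=True)
    f_ij = torch.exp(-pos / (2 * sigma_pt ** 2) - inten / (2 * sigma_i ** 2))
    f_ij = f_ij / f_ij.sum(2, keepdim=True)               # 1/w normalized weights
    return (f_ij * d).mean()

def joint_loss(e, y_edge, s, g, mask, img, alphas=(1.0, 1.0, 1.0)):
    """L_final = a1 * L_bce + a2 * L_pbce + a3 * L_gcrf."""
    l_bce = F.binary_cross_entropy(e, y_edge)
    return (alphas[0] * l_bce
            + alphas[1] * partial_bce(s, g, mask)
            + alphas[2] * gated_crf(s, img))
```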
the second round of training uses the new corrected labels generated by the first round as the ground truth to supervise the model and further optimize its segmentation capability; this self-supervised training mode can effectively enhance the model's understanding of medical image semantics and improve the segmentation precision of tumors, and the model obtained after the second round of training is used as the final segmentation model.
A tumor segmentation system for CT images comprises a medical image preprocessing module, an edge detection module, a weak label generation module and a CNN-ViT mixed segmentation module, wherein the image preprocessing module is used for preprocessing CT images, dividing the medical image data set into a test set and a training set, and converting data in 3D format into 2D data;
the edge detection module acquires the edge information of the image through multi-scale convolution and combination of set threshold parameters based on an RCF network model;
the weak label generating module adopts point marking as supervision signal input and obtains a weak label on the edge image through a self-adaptive flooding filling algorithm;
the CNN-ViT mixed segmentation module adopts a mixed Embedding mode, adds edges into the decoder, sets a combination of three loss functions for the first round of training, and then uses the labels generated in the first round as new supervision for a second round of training to obtain the final segmentation model.
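A sketch of how the four modules could be wired together for the two-round scheme; every name here (flood_fill_label, train, and the dataset fields) is a hypothetical placeholder rather than an API from the patent:

```python
import torch

def run_pipeline(dataset, edge_net, seg_model):
    # Edge detection module: edge maps for every preprocessed 2D slice.
    edges = [edge_net(img) for img in dataset.images]
    # Weak label generation module: adaptive flood filling from point annotations.
    weak = [flood_fill_label(e, pts) for e, pts in zip(edges, dataset.points)]
    # Round 1: train the CNN-ViT segmentation module on the weak labels.
    train(seg_model, dataset.images, edges, weak)
    # Round 2: round-1 predictions become corrected labels (self-supervision).
    with torch.no_grad():
        refined = [seg_model(img) for img in dataset.images]
    train(seg_model, dataset.images, edges, refined)
    return seg_model
```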
Compared with the prior art, the invention provides a weakly supervised deep-learning tumor segmentation scheme for CT images. Labels are generated automatically through simple weak annotation and an edge detection algorithm, and the segmentation model is trained with the image samples and the generated labels to segment lung tumors in CT images. The tumor target in the CT image does not need to be finely delineated manually pixel by pixel; instead, a coarse label can be generated automatically to assist in training the segmentation model, which saves time and labor. This solves the time- and labor-consuming problem of fully supervised labeling in existing methods, allows labels to be generated automatically, efficiently and accurately, and thus ensures the speed and accuracy of subsequent segmentation model training and improves the efficiency and accuracy of the segmentation task.
In the invention, the actual medical problems and the characteristics of tumor CT images are considered: a complete delineation label is not used as the supervision information for model training; instead, a point-labeling mode that is simpler and easier for a doctor is used. The generation of the weak supervision label is completed by acquiring edge information to a certain degree and obtaining the pseudo label through adaptive flood filling; this is not limited to CT tumor images and can be widely applied to various medical scene tasks.
The invention adopts a two-round training mode when training the segmentation model, using the labels generated in the first round as new supervision for the second round to obtain the final segmentation model, and proposes a joint loss function for the medical CT tumor segmentation task comprising binary cross-entropy loss, local cross-entropy loss and gated CRF loss, so that tumor parts and normal parts are more easily distinguished: the local cross-entropy loss further separates the difference between tumor and normal tissue, and the gated conditional random field also enables relatively coarse weak labels to train a better result, effectively improving the performance of the CT tumor segmentation task.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a flowchart illustrating a weakly supervised learning method for CT tumor image segmentation task according to an exemplary embodiment;
FIG. 3 is a diagram showing the image group-label composition used in the embodiment;
FIG. 4 is a block diagram of a Transformer coding block in the embodiment;
FIG. 5 is a block diagram of the approximate edge decoder incorporated in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Examples
As shown in fig. 1, a tumor segmentation method for CT images includes the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, preprocessing all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and performing a correction result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
In this embodiment, applying the above technical solution, when constructing the segmentation model, as shown in fig. 1, the following contents are mainly included:
(1) The method comprises the steps of obtaining a CT tumor image data set with a class label, dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, and preprocessing all images, wherein the class label is used in a testing part to evaluate a model after subsequent training.
The weak annotation includes: for a lesion region on the 2D data, simple labeling software such as labelme is used to mark two points on the data, respectively indicating the presence and absence of a tumor in the region, so as to obtain a corresponding json file.
In this example, 584 CT tumor images from the LUNA16 data set were used, together with the real labels used for testing. The primary image formats are mhd and raw. The images are then converted to 2D data, normalized, cropped to 352 × 352 size, and converted into feature vectors for use by the model.
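A minimal sketch of reading a LUNA16 mhd/raw volume and splitting it into 2D slices, assuming SimpleITK as the reader (the patent does not name the library):

```python
import numpy as np
import SimpleITK as sitk

def load_ct_slices(mhd_path: str) -> list:
    """Read an .mhd header (its paired .raw holds the voxels) and split into slices."""
    volume = sitk.ReadImage(mhd_path)
    array = sitk.GetArrayFromImage(volume)  # shape: (num_slices, H, W)
    return [array[i] for i in range(array.shape[0])]
```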
(2) An edge detection network model based on multi-scale convolution is used for extracting the edge image fusing multi-scale information.
In this embodiment, the RCF network is used as the edge detection model; it is a multi-scale network constructed based on VGG16. The convolution is divided into 5 stages, and between two adjacent stages down-sampling is realized through a pooling layer, as in a general neural network model, so as to fuse features of different scales. Each convolutional layer is accompanied by a convolution operation with a kernel size of 1 × 1 and a channel depth of 21, and the outputs within each stage are combined by element-wise addition to obtain a composite feature, followed by an up-sampling layer to enlarge the feature size. A cross-entropy loss and a sigmoid layer are used after each up-sampling layer; the outputs of all up-sampling layers are then concatenated, fused with a 1 × 1 convolution, and the same kind of layer is finally used to obtain the output.
The specific process is as follows: the public model of the RCF network is acquired, and the best-performing checkpoint (in pth format) is used for inference in a multi-scale mode. After multiple tests, [1.5, 2, 2.5] proves to be a good group of parameter thresholds, achieving a good edge-extraction effect without excessive machine requirements. After inference at each scale, the edge extraction results are superimposed to obtain the final edge map.
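A sketch of the multi-scale inference described here, under the assumption that [1.5, 2, 2.5] act as input scales whose outputs are resized back and averaged; the model is assumed to return a single-channel fused edge map:

```python
import cv2
import numpy as np
import torch

@torch.no_grad()
def multiscale_edges(model, img: np.ndarray, scales=(1.5, 2.0, 2.5)) -> np.ndarray:
    """Run the RCF-style edge model at several scales and average the outputs."""
    h, w = img.shape[:2]
    acc = np.zeros((h, w), dtype=np.float32)
    for s in scales:
        resized = cv2.resize(img.astype(np.float32), (int(w * s), int(h * s)))
        inp = torch.from_numpy(resized)[None, None]  # (1, 1, H, W)
        out = model(inp).squeeze().cpu().numpy()
        acc += cv2.resize(out, (w, h))               # superimpose per-scale results
    return acc / len(scales)
```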
(3) The weak supervision label is acquired based on adaptive flood filling and adaptive gradient selection from the extracted multi-scale edges and the positions of the detection boxes.
The json file with point-labeling information of each image is acquired in step (1); then the edge image and the json file information are taken as input, and the pseudo label is obtained by adaptive flood filling, where the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

wherein $I$ is the input image, and $r(I)$ is the mask radius corresponding to the input image $I$; $h_I$ and $w_I$ respectively represent the length and width of the input image, and $\gamma$ represents a hyper-parameter whose value can be set in different tasks.
The labeled ground truth of the method is expressed as:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

wherein $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively. The set of circular masks for flood filling can be defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$
where $C_a^b$ denotes the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius. The edge of the image detected by the edge detector used in step (2) is expressed as $E(I)$, where $E(\cdot)$ denotes the edge detector, $I$ denotes the input image, and $E$ denotes the generated edge. By means of the variables defined above, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$
where $F(I)$ represents the connected regions obtained after flood filling. For a point labeled as foreground in the json document, $\gamma = 20$ is used, i.e., 1/20 of the smaller of the length and width of the image is taken as the flood-filling radius in this embodiment; for a background point, $\gamma = 8$, i.e., 1/8 of the smaller of the length and width of the image is taken as the flood-filling radius.
The specific process is as follows: the edge image and the json file with point labels are read; the points labeled by labelme store the corresponding information in dictionary form, the "shapes" field represents the point information, the data in the field is a list, and each element in the list represents one point. The flood radius is then set according to the above formula, and the pixel-value range that stops the flooding is set on the three RGB channels; since the image is originally a gray-scale map, the three channels are set to the same value, and the flood truncation pixel value adopted in this embodiment is (10, 10, 10). Each data item also includes a "label" field, which is divided into two types in this embodiment: one type is the tumor part (for example, the field is defined as "tumor"), and the other is the normal part ("background"). Two types of images are obtained by performing flood filling twice: one is the pseudo-label map, i.e., a mask map containing only the lesion region; the other is a mask image containing both the lesion point regions and the normal point regions, which serves for the loss calculation of the local cross entropy.
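A sketch of this step built on OpenCV's flood fill; the label strings and the omission of the adaptive radius cap (the γ = 20 / γ = 8 rule above) are simplifying assumptions:

```python
import json
import cv2
import numpy as np

def pseudo_labels(image_path, json_path, tol=(10, 10, 10)):
    """Flood-fill masks from labelme point annotations (radius cap omitted)."""
    img = cv2.imread(image_path)             # gray-scale CT stored as 3 channels
    h, w = img.shape[:2]
    fg = np.zeros((h + 2, w + 2), np.uint8)  # floodFill requires a padded mask
    bg = np.zeros((h + 2, w + 2), np.uint8)
    for shape in json.load(open(json_path))["shapes"]:
        x, y = map(int, shape["points"][0])
        mask = bg if shape["label"] == "background" else fg  # labels are assumptions
        # FLOODFILL_MASK_ONLY writes into the mask and leaves the image untouched;
        # FIXED_RANGE compares against the seed pixel using the (10, 10, 10) tolerance.
        cv2.floodFill(img, mask, (x, y), (255, 255, 255), tol, tol,
                      4 | cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE)
    # Lesion-only pseudo-label map, and the lesion + normal mask for the local BCE.
    return fg[1:-1, 1:-1], (fg | bg)[1:-1, 1:-1]
```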
(4) Training a segmentation model, adopting a CNN and Transformer mixed model, and adjusting and using different Embedding models according to different preset parameters.
In the Embedding part, feature maps of 3 stages (stage 2, stage 3 and stage 5, respectively) are extracted using ResNetV2, and patch embedding is performed based on ViT.
Next, 12 Attention Encode blocks are designed in the encoder part, and each block is set according to ViT, namely an Attention module and an MLP module.
The decoder part consists of two components, a ViT decoder and an approximate edge detector. The ViT decoder is four cascaded convolutional layers, each layer having a batch normalization layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input; the corresponding feature of each decoder layer is denoted as $D = \{D_i \mid i = 1, 2, 3, 4\}$. Furthermore, due to the lack of structure and detail in weakly supervised labels, the present solution designs an approximate edge decoder as an approximate edge detector to generate structure and overcome this drawback. Specifically, the output of the approximate edge detector can be denoted as $f_e = \sigma(\mathrm{cat}(R_3, D_2))$, where $\sigma$ represents a 3 × 3 convolutional layer including normalization and ReLU layers. The edge feature map $e$ is obtained by adding a 3 × 3 convolutional layer after $f_e$; then $f_e$ and $D_3$ are merged as $\mathrm{cat}(f_e, D_3)$ and passed through two convolutional layers to obtain the multi-channel feature $f_s$. Similar to $e$, the final single-channel feature map $s$ can also be obtained in the same way.
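A sketch of the $e$ / $f_s$ / $s$ computation just described, with assumed channel counts; the helper mirrors the conv + BN + ReLU blocks used elsewhere in the decoder:

```python
import torch
import torch.nn as nn

def _cbr(in_ch, out_ch):
    """3 x 3 conv + BN + ReLU, as used throughout the decoder."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class SegHead(nn.Module):
    """e from f_e via one 3x3 conv; f_s from cat(f_e, D3) via two convs; s from f_s."""
    def __init__(self, fe_ch, d3_ch, mid_ch=64):  # channel counts are assumptions
        super().__init__()
        self.edge_out = nn.Conv2d(fe_ch, 1, 3, padding=1)
        self.fuse = nn.Sequential(_cbr(fe_ch + d3_ch, mid_ch), _cbr(mid_ch, mid_ch))
        self.seg_out = nn.Conv2d(mid_ch, 1, 3, padding=1)

    def forward(self, f_e, d3):                    # f_e and D3 assumed same size
        e = torch.sigmoid(self.edge_out(f_e))      # single-channel edge map e
        f_s = self.fuse(torch.cat([f_e, d3], dim=1))  # multi-channel feature f_s
        s = torch.sigmoid(self.seg_out(f_s))       # single-channel tumor map s
        return e, s
```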
The specific implementation is as follows: first, a bool variable is set as required to judge whether the hybrid model is used; if so, training of the edge reconstruction capability in the Transformer is required. The ResNetV2 pre-training model is then used to extract features before patch embedding; the extracted features are three features from stages 2, 3 and 5 of ResNetV2, serving as one of the decoder inputs. The configuration of this part remains the same as that of ResNetV2. In the ViT part, the patch size is 16 and the embedding dimension is 768. The encoder also basically follows a conventional ViT; this embodiment uses 12 blocks to form the encoder, which likewise conforms to the conventional ViT configuration.
(5) A first round of model training in step (4) is performed using the labels obtained in the weak supervision mode of step (3); the training inputs are: the CT slice image after 2D preprocessing, the preprocessed edge image, and the weak label image aligned with the length and width of the CT image. In this technical scheme, a joint loss function is constructed from binary cross-entropy loss, local cross-entropy loss and gated CRF loss. For the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image. The decoder branch uses the local cross-entropy loss and the gated CRF loss. The partial binary cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
where $J$ denotes the marked region, $g$ denotes the ground truth, and $s$ denotes the predicted tumor map. In order to learn better object structures as much as possible, a gated CRF is also used in the loss function:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, and $|\cdot|$ represents the L1 distance. $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

where $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters for controlling the scale of the Gaussian kernel. The total loss function is defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are the weights for the three loss functions; in this embodiment, all are set to 1.
The specific implementation is as follows: the original image, the weak label containing only the tumor site, the weak label containing both tumor and normal sites, and the edge map are input and converted into 352 × 352 tensor variables. After initializing the network, the parameter configuration of the first round of training is: batch size 32; SGD optimizer with an initial learning rate of 0.01 and momentum of 0.9; a decay rate of 0.1 applied every 10 epochs, with a minimum of 5 × 10⁻⁴; and a training length of 100 epochs.
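A sketch of this training configuration; build_model, loader and the joint_loss helper from the earlier loss sketch are assumed placeholders, and StepLR is used to express the 0.1 decay every 10 epochs:

```python
import torch

model = build_model()  # hypothetical constructor for the CNN-ViT hybrid
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(100):                           # training length: 100 epochs
    for img, weak_fg, weak_all, edge in loader:    # batch size 32, 352 x 352 tensors
        e, s = model(img)
        loss = joint_loss(e, edge, s, weak_fg, weak_all, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```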
FIG. 3 is a schematic diagram of the image group-label pair configuration in the data of this embodiment. The original image is mhd-format data based on the LUNA16 data set, whose visualization result is shown in the figure. The second image is the preprocessing diagram, which crops out the main part of the lung and discards the other redundant parts. The third image is a schematic diagram of the edge map, obtained by the edge detection network and used both for adaptive flood filling and for supervising the reconstruction process of the approximate edge decoder. The last image is a schematic diagram of the point supervision marking of this embodiment: points of the first color are marked on the part labeled as lesion or tumor, and points at random positions in normal, non-lesion areas are shown as points of the second color.
FIG. 4 is a structural diagram of the Transformer coding block in this embodiment. The feature after Embedding is denoted $F_0$ and is input to the Transformer of the hybrid model, as shown in FIG. 4. To inhibit overfitting on the downstream task, a normalization operation is added before the multi-head attention layer and the feed-forward network layer. The multi-head attention layer is designed so that multiple attention heads learn different projection methods in order to recognize different patterns. The feed-forward network layer (Feed Forward Network, FFN for short) has the following structure:

$$\mathrm{FFN}(h_i) = \mathrm{GeLU}(h_i W_1 + b_1) W_2 + b_2$$

wherein $h_i$ is the vector of the hidden layer, $W_1, b_1, W_2, b_2$ are the parameters of the FFN, and GeLU is the activation function. As shown in FIG. 4, this module is stacked 12 times in the whole model; $F_0$ after Embedding serves as the first input of the encoder, whose output is $F_1$, and so on, finally forming the encoder of the model.
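A sketch of one such pre-norm block; the 768-dimension and 12-head settings follow the ViT configuration mentioned above, while the 4× MLP width is a standard-ViT assumption:

```python
import torch.nn as nn

class AttentionEncodeBlock(nn.Module):
    """Pre-norm ViT block: LayerNorm -> multi-head attention, LayerNorm -> FFN."""
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(            # FFN(h) = GeLU(h W1 + b1) W2 + b2
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        return x + self.ffn(self.norm2(x))                 # residual FFN
```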
FIG. 5 is a block diagram of the approximate edge decoder incorporated in this embodiment. The output of the approximate edge detector can be expressed as $f_e = \sigma(\mathrm{cat}(R_3, D_2))$, where $\sigma$ represents a 3 × 3 convolutional layer comprising BN and ReLU layers, with the previously obtained edge map as supervision. The final edge feature map $e$ is obtained by adding a 3 × 3 convolutional layer after $f_e$; then $f_e$ and $D_3$ are merged as $\mathrm{cat}(f_e, D_3)$ and passed through two convolutional layers to obtain the multi-channel feature $f_s$. Similar to $e$, the corresponding convolutional layer is finally added to obtain the single-channel feature map $s$.
It can be seen that the technical scheme fully considers the actual medical problems and the characteristics of the tumor CT image: the complete delineation label is not used as the supervision information for model training; instead, point labels that are easier for doctors to annotate are used, at the cost of losing some fine structural information. However, after the approximate edge decoder is added, the relevant structure of the lesion position can be gradually refined. In addition, the technical scheme also provides a joint loss function for the medical tumor segmentation task, so that tumor parts and normal parts are more easily distinguished: the local cross-entropy loss further separates the difference between tumor and normal tissue, and the gated conditional random field allows relatively coarse weak labels to train better results, bringing a certain performance improvement to the CT tumor segmentation task.
The present embodiment further provides a tumor segmentation system for CT images, which includes:
a medical image preprocessing module, which performs medical image preprocessing by using the method in the step (1);
and (3) an edge detection module, which performs edge detection by using the method in the step (2), wherein the obtained result is used for adaptive flooding filling and supervision of the approximate edge decoder.
And (4) a weak label generating module, wherein by using the method in the step (3), only the key point label is used as a supervision signal input, and a weak label can be obtained on the edge image through a self-adaptive flooding filling algorithm and is used for the first training of the model.
And (3) a CNN-ViT mixed segmentation module, wherein the segmentation network model adopts a mixed embedding mode by using the methods in the steps (4) to (5), edges are added into a decoder, and three loss function combinations are set for training. And using the label generated in the first round as a new supervised training second round to obtain a final segmentation model.
In summary, in order to realize an efficient and accurate segmentation task, the technical scheme acquires a CT tumor image data set, divides each 3D data into a plurality of 2D data, weakly labels the data, preprocesses all images, and divides the CT image data set into a test set and a training set; then an edge detection network model based on multi-scale convolution is pre-trained for extracting multi-scale edges and obtaining the corresponding edges at the medical semantic level; next, the weak supervision label is acquired based on adaptive flood filling and adaptive gradient selection from the extracted multi-scale edges and the positions of the detection boxes; finally, the segmentation model is trained, adopting a mixed model of CNN and Transformer and adjusting to use different models according to different data. On the one hand, a first round of training is performed on the weak supervision labels and the results are corrected by the joint loss function including the conditional random field; on the other hand, the corrected result is used as the label of the second round of training to obtain the final segmentation model.
Therefore, the technical scheme is suitable for automatically generating trainable pseudo labels and establishing a CT-image-oriented tumor segmentation model in computer-aided diagnosis and treatment scenarios, and can be well used for clinical tumor segmentation tasks. The tumor target in the CT image does not need to be finely delineated manually pixel by pixel; the segmentation model is trained with automatically generated coarse labels, which saves time and labor. Meanwhile, the technical scheme has high adaptability and enables rapid deployment of the convolutional neural network and Transformer models popular at the present stage. It can reduce a large amount of manual labeling and train a model with more accurate segmentation precision from limited medical semantic information, thereby ensuring the efficiency and accuracy of the segmentation task.
Claims (10)
1. A tumor segmentation method for CT images is characterized by comprising the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, preprocessing all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and correcting the result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
2. The method of claim 1, wherein the specific process of weakly labeling the 2D data in step S1 is: for a lesion region on the 2D data, two points are marked on the data using annotation software to indicate the presence and absence of a tumor in the region, respectively, to obtain a corresponding json file.
3. The method of claim 1, wherein the preprocessing comprises: the images are adjusted to a uniform 352 × 352 size and normalized.
4. The method of claim 2, wherein step S2 comprises inputting the preprocessed 2D data into the multi-scale edge detection network model and outputting the edge feature map.
5. The method of claim 4, wherein the step S3 comprises the following steps:
taking the edge feature map and the json file as input, and obtaining a pseudo label by adopting adaptive flood filling, wherein the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

where $I$ is the input image, $r(I)$ is the mask radius corresponding to the input image $I$, $h_I$ and $w_I$ are respectively the length and width of the input image, and $\gamma$ is a set hyper-parameter;
in addition, the labeled ground truth is:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

in the formula, $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively;
the set of circular masks used for flood filling is defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$

wherein $C_a^b$ is the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius;
then, combining the edge feature map, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$

in the formula, $F(I)$ is the set of connected regions obtained after flood filling, $E(I)$ is the edge feature map, $E(\cdot)$ denotes the edge detector, and $I$ is the input image.
6. The method for tumor segmentation oriented to CT images of claim 1, wherein the mixed model of CNN and Transformer in step S4 includes an Embedding part, an encoder part and a decoder part, wherein the Embedding part uses ResNet to extract feature maps of 3 stages and then performs Transformer-based patch embedding;
the encoder part comprises 12 Attention Encode blocks, and each block is set according to a Vision Transformer, namely an Attention module and an MLP module;
the decoder section consists of two components, a ViT decoder and an approximate edge detector.
7. The method of claim 6, wherein the ViT decoder comprises four cascaded convolutional layers, each layer having a Batch Normalization (BN) layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input, and the corresponding feature of each decoder layer being represented as $D = \{D_i \mid i = 1, 2, 3, 4\}$.
8. The method of claim 7, wherein the output of the approximate edge detector is:
$$f_e = \sigma\big(\mathrm{cat}(R_3, D_2)\big)$$
where σ represents a 3 × 3 convolutional layer comprising BN and ReLU layers.
9. The method as claimed in claim 4, wherein the training process in step S4 comprises:
the first round of training uses weakly supervised labels, and the training inputs are: the CT slice image after 2D preprocessing, the edge image after preprocessing, and the weak label image aligned with the length and width of the CT image;
model training adopts an SGD optimizer; the learning rate is adaptively decayed according to the training epochs, with decay every 10 epochs at a ratio of 0.1; binary cross-entropy loss, local cross-entropy loss and gated CRF loss are adopted, and for the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image; the decoder branch uses the local cross-entropy loss and the gated CRF loss, and the local cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
wherein $J$ represents the marked region, $g$ represents the ground truth, and $s$ represents the predicted tumor map;
the gated CRF loss is:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, $|\cdot|$ represents the L1 distance, and $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

wherein $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters controlling the Gaussian kernel scale; the total loss function is thus defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are respectively the weights corresponding to the binary cross-entropy loss, the local cross-entropy loss and the gated CRF loss;
the second round of training uses the new corrected labels generated by the first round as the ground truth to supervise the model and further optimize its segmentation capability; this self-supervised training mode can effectively enhance the model's understanding of medical image semantics and improve the segmentation precision of tumors, and the model obtained after the second round of training is used as the final segmentation model.
10. A tumor segmentation system facing CT images is characterized by comprising a medical image preprocessing module, an edge detection module, a weak label generation module and a CNN-ViT mixed segmentation module, wherein the image preprocessing module is used for preprocessing CT image data, dividing a medical image data set into a test set and a training set, and converting data in a 3D format into 2D data;
the edge detection module acquires the edge information of the image through multi-scale convolution and combination of set threshold parameters based on an RCF network model;
the weak label generating module adopts point labeling as supervision signal input and obtains a weak label on the edge image through a self-adaptive flooding filling algorithm;
the CNN-ViT mixed segmentation module adopts a mixed Embedding mode, adds edges into the decoder, sets a combination of three loss functions for the first round of training, and then uses the labels generated in the first round as new supervision for a second round of training to obtain the final segmentation model.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211398539.8A | 2022-11-09 | 2022-11-09 | Tumor segmentation method and system for CT image |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211398539.8A | 2022-11-09 | 2022-11-09 | Tumor segmentation method and system for CT image |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN115861181A (en) | 2023-03-28 |
Family
- ID: 85662856

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202211398539.8A (Pending) | CN115861181A (en) | 2022-11-09 | 2022-11-09 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN (1) | CN115861181A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116758048A (en) * | 2023-07-06 | 2023-09-15 | 河北大学 | PET/CT tumor periphery feature extraction system and extraction method based on transducer |
CN116758048B (en) * | 2023-07-06 | 2024-02-27 | 河北大学 | PET/CT tumor periphery feature extraction system and extraction method based on transducer |
CN117079273A (en) * | 2023-07-17 | 2023-11-17 | 浙江工业大学 | Floating algae microorganism detection method based on deep learning |
CN117789206A (en) * | 2024-01-05 | 2024-03-29 | 中国矿业大学 | Bimodal coal rock micro-component identification method based on double decoder fusion UNetFormer architecture |
CN117541798A (en) * | 2024-01-09 | 2024-02-09 | 中国医学科学院北京协和医院 | Medical image tumor segmentation model training method, device and segmentation method |
CN117541798B (en) * | 2024-01-09 | 2024-03-29 | 中国医学科学院北京协和医院 | Medical image tumor segmentation model training method, device and segmentation method |
CN118470712A (en) * | 2024-07-11 | 2024-08-09 | 吉林农业大学 | Corn plant heart identification method |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |