CN115861181A - Tumor segmentation method and system for CT image - Google Patents
- Publication number: CN115861181A (application number CN202211398539.8A)
- Authority: CN (China)
- Prior art keywords: image, training, segmentation, model, edge
- Prior art date: 2022-11-09
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Apparatus For Radiation Diagnosis (AREA)
Abstract
The invention relates to a tumor segmentation method and system for CT images, wherein the method comprises the following steps: acquiring a CT tumor image dataset; extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution; acquiring weak supervision labels based on adaptive flood filling and adaptive gradient selection from the extracted edges and the positions of the detection boxes; adopting a hybrid model of CNN and Transformer with an approximate edge decoder, first performing a first round of training with the weak supervision labels and correcting the result through a joint loss function; then taking the corrected result as the label for a second round of training to obtain the final segmentation model; and preprocessing the CT tumor image to be segmented, inputting it into the segmentation model, and outputting the corresponding segmentation result. Compared with the prior art, the invention can efficiently and accurately generate labels before training the model, which solves the time- and labor-consuming problem of fully supervised labeling and improves the efficiency and accuracy of the segmentation task.
Description
Technical Field
The invention relates to the technical field of computer vision processing, in particular to a tumor segmentation method and a tumor segmentation system for CT images.
Background
The image segmentation technology is an important research direction in the current artificial intelligence field, particularly in computer vision, and is also an important component for helping a machine to carry out semantic understanding.
Computer-aided diagnosis and treatment refers to analyzing and computing medical data by means of imaging analysis, physiological and biochemical means, computer image processing, machine learning modeling and other methods, which can assist in finding lesions or determining lesion properties to improve diagnostic accuracy. Accurately segmenting an organ or lesion site can provide an important reference for subsequent diagnosis. Most existing deep-learning medical image segmentation methods use fully supervised model training, i.e., segmentation labels of all lesion parts need to be annotated manually pixel by pixel; in practice, however, this annotation method is time-consuming and labor-intensive, and a large number of accurately annotated samples is difficult to obtain.
Since 2015, deep-learning-based methods have become the main approach to computer-processed medical image segmentation. Jonathan et al. proposed FCNs, which replace fully connected layers with fully convolutional ones; Olaf et al. proposed U-Net based on skip connections; Fausto et al. proposed the segmentation model V-Net for 3D medical data; Zhou et al. designed a more complete U-Net structure, U-Net++, aimed at the limitations of U-Net. The lung tumor segmentation problem for CT images also has high research value and practical significance: Reza et al. added an LSTM structure on the basis of U-Net and proposed BCDU-Net, and Li et al. proposed the model H-DenseUNet for jointly learning intra-slice and inter-slice features. However, as with all medical image segmentation tasks, most existing deep-learning methods adopt a fully supervised training mode, i.e., the segmentation information of the target lesion needs to be marked manually pixel by pixel, which consumes a lot of labor, has low efficiency, and cannot guarantee labeling accuracy.
Therefore, effectively reducing the large amount of manpower and material resources consumed by pixel-by-pixel labeling in deep-learning model training is of great practical value, not only for the CT-image-oriented lung tumor segmentation task but also for other medical image segmentation tasks. In addition, in general visual segmentation tasks, the natural images used usually contain rich semantic information, i.e., each instance object to be segmented has obvious and rich characteristics, the labeling difficulty is low, and no high threshold of domain knowledge is needed. In medical images, however, the lesion part to be segmented does not carry the rich and precise semantic information of natural images, so fully exploiting the features of medical images is also one of the keys of medical multi-modal tasks.
In summary, reducing the large amount of manual labeling by automatically generating labels, and training a model with more accurate segmentation precision from limited medical semantic information, are important problems of current computer-aided medical segmentation, with high research significance and practical clinical application value.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a tumor segmentation method and system for CT images which can efficiently and accurately generate labels before model training, thereby solving the time- and labor-consuming problem of fully supervised labeling and improving the efficiency and accuracy of the segmentation task.
The purpose of the invention can be realized by the following technical scheme: a tumor segmentation method facing CT images comprises the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak annotation on the 2D data, carrying out pretreatment on all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and correcting the result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
Further, the specific process of weakly labeling the 2D data in step S1 is as follows: for a lesion region on the 2D data, two points are marked on the data using annotation software to indicate the presence and absence of a tumor in the region, respectively, to obtain a corresponding json file.
Further, the specific process of the preprocessing is as follows: the images are adjusted to a uniform 352 × 352 size and normalized.
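A minimal sketch of this preprocessing step; the function name and the min-max normalization scheme are illustrative assumptions, since the text only specifies the 352 × 352 resize and that normalization is applied:

```python
import numpy as np
import cv2

def preprocess_slice(img: np.ndarray, size: int = 352) -> np.ndarray:
    """Resize a 2D CT slice to size x size and normalize it (illustrative sketch)."""
    img = cv2.resize(img.astype(np.float32), (size, size),
                     interpolation=cv2.INTER_LINEAR)
    lo, hi = float(img.min()), float(img.max())
    # Min-max normalization to [0, 1]; the exact scheme is an assumption.
    return (img - lo) / (hi - lo + 1e-8)
```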
Further, step S2 specifically comprises inputting the preprocessed 2D data into the multi-scale edge detection network model and outputting the edge feature map.
Further, the specific process of step S3 is:
taking the edge feature map and the json file as input, and obtaining a pseudo label by adopting adaptive flood filling, wherein the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

where $I$ is the input image, $r(I)$ is the mask radius corresponding to the input image $I$, $h_I$ and $w_I$ are respectively the length and width of the input image, and $\gamma$ is a set hyper-parameter;
in addition, the labeled ground truth is:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

in the formula, $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively;
the set of circular masks used for flood filling is defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$

in the formula, $C_a^b$ is the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius;
then, combining the edge feature map, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$

in the formula, $F(I)$ is the set of connected regions obtained after flood filling, $E(I)$ is the edge feature map, $E(\cdot)$ denotes the edge detector, and $I$ is the input image.
Further, the mixed model of CNN and Transformer in step S4 includes an Embedding part, an encoder part and a decoder part, where the Embedding part uses ResNet to extract feature maps of 3 stages and then performs Transformer-based patch embedding;
the encoder part comprises 12 Attention Encode blocks, and each block is set according to a Vision Transformer, namely an Attention module and an MLP module;
the decoder section consists of two components, a Vision Transformer (ViT) decoder and an approximate edge detector.
Further, the ViT decoder includes four cascaded convolutional layers, each layer having a Batch Normalization (BN) layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input, and the corresponding feature of each decoder layer is represented as $D = \{D_i \mid i = 1, 2, 3, 4\}$.
Further, the output of the approximate edge detector is:
$$f_e = \sigma\big(\mathrm{cat}(R_3, D_2)\big)$$
where σ represents a 3 × 3 convolutional layer, which includes BN and ReLU layers.
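A minimal PyTorch sketch of this decoder; the channel counts are assumptions the patent does not specify, DecoderLayer and ApproxEdgeHead are illustrative names, and R3 stands for the ResNet feature referenced in the formula above:

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3):
    """3 x 3 convolution followed by BN and ReLU, i.e. the sigma(.) above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DecoderLayer(nn.Module):
    """One of the four cascaded decoder layers: conv + BN + ReLU + 2x upsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = conv_bn_relu(in_ch, out_ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        return self.up(self.block(x))

class ApproxEdgeHead(nn.Module):
    """f_e = sigma(cat(R3, D2)): a 3 x 3 conv with BN + ReLU over the concatenation."""
    def __init__(self, r3_ch, d2_ch, out_ch=64):  # channel counts are assumptions
        super().__init__()
        self.sigma = conv_bn_relu(r3_ch + d2_ch, out_ch)

    def forward(self, r3, d2):
        return self.sigma(torch.cat([r3, d2], dim=1))
```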
Further, the training process in step S4 specifically includes:
the first round of training uses weakly supervised labels, and the training inputs are: the CT slice image after 2D preprocessing, the preprocessed edge image, and the weak label image aligned with the length and width of the CT image;
model training adopts an SGD optimizer; the learning rate is adaptively decayed according to the training epochs, with decay every 10 epochs at a ratio of 0.1; binary cross-entropy loss, local cross-entropy loss and gated CRF loss are adopted, and for the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image; the decoder branch uses the local cross-entropy loss and the gated CRF loss, and the local cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
wherein $J$ represents the marked region, $g$ represents the ground truth, and $s$ represents the predicted tumor map;
the gated CRF loss is:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, $|\cdot|$ represents the L1 distance, and $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

wherein $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters controlling the Gaussian kernel scale; the total loss function is thus defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are respectively the weights corresponding to the binary cross-entropy loss, the local cross-entropy loss and the gated CRF loss;
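A PyTorch sketch of this joint loss, under stated assumptions: $e$ and $s$ are sigmoid outputs in [0, 1], mask marks the labeled region $J$, and the gated CRF term is a simplified $k \times k$-neighborhood version of the formula above (zero-padding at the borders; the gating by labeled regions is omitted):

```python
import torch
import torch.nn.functional as F

def partial_bce(s, g, mask):
    """L_pbce: binary cross-entropy evaluated only inside the labeled region J."""
    loss = F.binary_cross_entropy(s, g, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)

def gated_crf(s, img, k=5, sigma_pt=3.0, sigma_i=0.1):
    """Simplified L_gcrf: sum over j in K_i of f(i, j) * |s_i - s_j|."""
    b, _, h, w = s.shape
    pad = k // 2
    s_n = F.unfold(s, k, padding=pad).view(b, 1, k * k, h * w)
    i_n = F.unfold(img, k, padding=pad).view(b, img.shape[1], k * k, h * w)
    d = (s_n - s.view(b, 1, 1, h * w)).abs()              # d(i, j) = |s_i - s_j|
    ys, xs = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
    pos = ((ys - pad) ** 2 + (xs - pad) ** 2).float()
    pos = pos.view(1, 1, k * k, 1).to(s.device)           # position term PT
    inten = ((i_n - img.view(b, img.shape[1], 1, h * w)) ** 2).sum(1, keepdim=True)
    f_ij = torch.exp(-pos / (2 * sigma_pt ** 2) - inten / (2 * sigma_i ** 2))
    f_ij = f_ij / f_ij.sum(2, keepdim=True)               # 1/w normalized weights
    return (f_ij * d).mean()

def joint_loss(e, y_edge, s, g, mask, img, alphas=(1.0, 1.0, 1.0)):
    """L_final = a1 * L_bce + a2 * L_pbce + a3 * L_gcrf."""
    l_bce = F.binary_cross_entropy(e, y_edge)
    return (alphas[0] * l_bce
            + alphas[1] * partial_bce(s, g, mask)
            + alphas[2] * gated_crf(s, img))
```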
the second round of training uses the new corrected labels generated by the first round as the ground truth to supervise the model and further optimize its segmentation capability; this self-supervised training mode can effectively enhance the model's understanding of medical image semantics and improve the segmentation precision of tumors, and the model obtained after the second round of training is used as the final segmentation model.
A tumor segmentation system for CT images comprises a medical image preprocessing module, an edge detection module, a weak label generation module and a CNN-ViT mixed segmentation module, wherein the image preprocessing module is used for preprocessing CT images, dividing the medical image data set into a test set and a training set, and converting data in 3D format into 2D data;
the edge detection module acquires the edge information of the image through multi-scale convolution and combination of set threshold parameters based on an RCF network model;
the weak label generating module adopts point marking as supervision signal input and obtains a weak label on the edge image through a self-adaptive flooding filling algorithm;
the CNN-ViT mixed segmentation module adopts a mixed Embedding mode, adds edges into the decoder, sets a combination of three loss functions for the first round of training, and then uses the labels generated in the first round as new supervision for a second round of training to obtain the final segmentation model.
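A sketch of how the four modules could be wired together for the two-round scheme; every name here (flood_fill_label, train, and the dataset fields) is a hypothetical placeholder rather than an API from the patent:

```python
import torch

def run_pipeline(dataset, edge_net, seg_model):
    # Edge detection module: edge maps for every preprocessed 2D slice.
    edges = [edge_net(img) for img in dataset.images]
    # Weak label generation module: adaptive flood filling from point annotations.
    weak = [flood_fill_label(e, pts) for e, pts in zip(edges, dataset.points)]
    # Round 1: train the CNN-ViT segmentation module on the weak labels.
    train(seg_model, dataset.images, edges, weak)
    # Round 2: round-1 predictions become corrected labels (self-supervision).
    with torch.no_grad():
        refined = [seg_model(img) for img in dataset.images]
    train(seg_model, dataset.images, edges, refined)
    return seg_model
```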
Compared with the prior art, the invention provides a weakly supervised deep-learning tumor segmentation scheme for CT images. Labels are generated automatically through simple weak annotation and an edge detection algorithm, and the segmentation model is trained with the image samples and the generated labels to segment lung tumors in CT images. The tumor target in the CT image does not need to be finely delineated manually pixel by pixel; instead, a coarse label can be generated automatically to assist in training the segmentation model, which saves time and labor. This solves the time- and labor-consuming problem of fully supervised labeling in existing methods, allows labels to be generated automatically, efficiently and accurately, and thus ensures the speed and accuracy of subsequent segmentation model training and improves the efficiency and accuracy of the segmentation task.
In the invention, the actual medical problems and the characteristics of tumor CT images are considered: a complete delineation label is not used as the supervision information for model training; instead, a point-labeling mode that is simpler and easier for a doctor is used. The generation of the weak supervision label is completed by acquiring edge information to a certain degree and obtaining the pseudo label through adaptive flood filling; this is not limited to CT tumor images and can be widely applied to various medical scene tasks.
The invention adopts a two-round training mode when training the segmentation model, using the labels generated in the first round as new supervision for the second round to obtain the final segmentation model, and proposes a joint loss function for the medical CT tumor segmentation task comprising binary cross-entropy loss, local cross-entropy loss and gated CRF loss, so that tumor parts and normal parts are more easily distinguished: the local cross-entropy loss further separates the difference between tumor and normal tissue, and the gated conditional random field also enables relatively coarse weak labels to train a better result, effectively improving the performance of the CT tumor segmentation task.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a flowchart illustrating a weakly supervised learning method for CT tumor image segmentation task according to an exemplary embodiment;
FIG. 3 is a diagram showing the image group-label composition used in the embodiment;
FIG. 4 is a block diagram of a Transformer coding block in the embodiment;
FIG. 5 is a block diagram of the approximate edge decoder incorporated in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Examples
As shown in fig. 1, a tumor segmentation method for CT images includes the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, preprocessing all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and performing a correction result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
In this embodiment, applying the above technical solution, when constructing the segmentation model, as shown in fig. 1, the following contents are mainly included:
(1) The method comprises the steps of obtaining a CT tumor image data set with a class label, dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, and preprocessing all images, wherein the class label is used in a testing part to evaluate a model after subsequent training.
The weak annotation includes: for a lesion region on the 2D data, simple labeling software such as labelme is used to mark two points on the data, respectively indicating the presence and absence of a tumor in the region, so as to obtain a corresponding json file.
In this example, 584 CT tumor images from the LUNA16 data set were used, together with the real labels used for testing. The primary image formats are mhd and raw. The images are then converted to 2D data, normalized, cropped to 352 × 352 size, and converted into feature vectors for use by the model.
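A minimal sketch of reading a LUNA16 mhd/raw volume and splitting it into 2D slices, assuming SimpleITK as the reader (the patent does not name the library):

```python
import numpy as np
import SimpleITK as sitk

def load_ct_slices(mhd_path: str) -> list:
    """Read an .mhd header (its paired .raw holds the voxels) and split into slices."""
    volume = sitk.ReadImage(mhd_path)
    array = sitk.GetArrayFromImage(volume)  # shape: (num_slices, H, W)
    return [array[i] for i in range(array.shape[0])]
```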
(2) An edge detection network model based on multi-scale convolution is used for extracting the edge image fusing multi-scale information.
In this embodiment, the RCF network is used as the edge detection model; it is a multi-scale network constructed based on VGG16. The convolution is divided into 5 stages, and between two adjacent stages down-sampling is realized through a pooling layer, as in a general neural network model, so as to fuse features of different scales. Each convolutional layer is accompanied by a convolution operation with a kernel size of 1 × 1 and a channel depth of 21, and the outputs within each stage are combined by element-wise addition to obtain a composite feature, followed by an up-sampling layer to enlarge the feature size. A cross-entropy loss and a sigmoid layer are used after each up-sampling layer; the outputs of all up-sampling layers are then concatenated, fused with a 1 × 1 convolution, and the same kind of layer is finally used to obtain the output.
The specific process is as follows: the public model of the RCF network is acquired, and the best-performing checkpoint (in pth format) is used for inference in a multi-scale mode. After multiple tests, [1.5, 2, 2.5] proves to be a good group of parameter thresholds, achieving a good edge-extraction effect without excessive machine requirements. After inference at each scale, the edge extraction results are superimposed to obtain the final edge map.
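A sketch of the multi-scale inference described here, under the assumption that [1.5, 2, 2.5] act as input scales whose outputs are resized back and averaged; the model is assumed to return a single-channel fused edge map:

```python
import cv2
import numpy as np
import torch

@torch.no_grad()
def multiscale_edges(model, img: np.ndarray, scales=(1.5, 2.0, 2.5)) -> np.ndarray:
    """Run the RCF-style edge model at several scales and average the outputs."""
    h, w = img.shape[:2]
    acc = np.zeros((h, w), dtype=np.float32)
    for s in scales:
        resized = cv2.resize(img.astype(np.float32), (int(w * s), int(h * s)))
        inp = torch.from_numpy(resized)[None, None]  # (1, 1, H, W)
        out = model(inp).squeeze().cpu().numpy()
        acc += cv2.resize(out, (w, h))               # superimpose per-scale results
    return acc / len(scales)
```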
(3) The weak supervision label is acquired based on adaptive flood filling and adaptive gradient selection from the extracted multi-scale edges and the positions of the detection boxes.
The json file with point-labeling information of each image is acquired in step (1); then the edge image and the json file information are taken as input, and the pseudo label is obtained by adaptive flood filling, where the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

wherein $I$ is the input image, and $r(I)$ is the mask radius corresponding to the input image $I$; $h_I$ and $w_I$ respectively represent the length and width of the input image, and $\gamma$ represents a hyper-parameter whose value can be set in different tasks.
The labeled ground truth of the method is expressed as:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

wherein $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively. The set of circular masks for flood filling can be defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$
where $C_a^b$ denotes the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius. The edge of the image detected by the edge detector used in step (2) is expressed as $E(I)$, where $E(\cdot)$ denotes the edge detector, $I$ denotes the input image, and $E$ denotes the generated edge. By means of the variables defined above, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$
where $F(I)$ represents the connected regions obtained after flood filling. For a point labeled as foreground in the json document, $\gamma = 20$ is used, i.e., 1/20 of the smaller of the length and width of the image is taken as the flood-filling radius in this embodiment; for a background point, $\gamma = 8$, i.e., 1/8 of the smaller of the length and width of the image is taken as the flood-filling radius.
The specific process is as follows: the edge image and the json file with point labels are read; the points labeled by labelme store the corresponding information in dictionary form, the "shapes" field represents the point information, the data in the field is a list, and each element in the list represents one point. The flood radius is then set according to the above formula, and the pixel-value range that stops the flooding is set on the three RGB channels; since the image is originally a gray-scale map, the three channels are set to the same value, and the flood truncation pixel value adopted in this embodiment is (10, 10, 10). Each data item also includes a "label" field, which is divided into two types in this embodiment: one type is the tumor part (for example, the field is defined as "tumor"), and the other is the normal part ("background"). Two types of images are obtained by performing flood filling twice: one is the pseudo-label map, i.e., a mask map containing only the lesion region; the other is a mask image containing both the lesion point regions and the normal point regions, which serves for the loss calculation of the local cross entropy.
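A sketch of this step built on OpenCV's flood fill; the label strings and the omission of the adaptive radius cap (the γ = 20 / γ = 8 rule above) are simplifying assumptions:

```python
import json
import cv2
import numpy as np

def pseudo_labels(image_path, json_path, tol=(10, 10, 10)):
    """Flood-fill masks from labelme point annotations (radius cap omitted)."""
    img = cv2.imread(image_path)             # gray-scale CT stored as 3 channels
    h, w = img.shape[:2]
    fg = np.zeros((h + 2, w + 2), np.uint8)  # floodFill requires a padded mask
    bg = np.zeros((h + 2, w + 2), np.uint8)
    for shape in json.load(open(json_path))["shapes"]:
        x, y = map(int, shape["points"][0])
        mask = bg if shape["label"] == "background" else fg  # labels are assumptions
        # FLOODFILL_MASK_ONLY writes into the mask and leaves the image untouched;
        # FIXED_RANGE compares against the seed pixel using the (10, 10, 10) tolerance.
        cv2.floodFill(img, mask, (x, y), (255, 255, 255), tol, tol,
                      4 | cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE)
    # Lesion-only pseudo-label map, and the lesion + normal mask for the local BCE.
    return fg[1:-1, 1:-1], (fg | bg)[1:-1, 1:-1]
```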
(4) Training a segmentation model, adopting a CNN and Transformer mixed model, and adjusting and using different Embedding models according to different preset parameters.
In the Embedding part, feature maps of 3 stages (stage 2, stage 3 and stage 5, respectively) are extracted using ResNetV2, and patch embedding is performed based on ViT.
Next, 12 Attention Encode blocks are designed in the encoder part, and each block is set according to ViT, namely an Attention module and an MLP module.
The decoder part consists of two components, a ViT decoder and an approximate edge detector. The ViT decoder is four cascaded convolutional layers, each layer having a batch normalization layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input; the corresponding feature of each decoder layer is denoted as $D = \{D_i \mid i = 1, 2, 3, 4\}$. Furthermore, due to the lack of structure and detail in weakly supervised labels, the present solution designs an approximate edge decoder as an approximate edge detector to generate structure and overcome this drawback. Specifically, the output of the approximate edge detector can be denoted as $f_e = \sigma(\mathrm{cat}(R_3, D_2))$, where $\sigma$ represents a 3 × 3 convolutional layer including normalization and ReLU layers. The edge feature map $e$ is obtained by adding a 3 × 3 convolutional layer after $f_e$; then $f_e$ and $D_3$ are merged as $\mathrm{cat}(f_e, D_3)$ and passed through two convolutional layers to obtain the multi-channel feature $f_s$. Similar to $e$, the final single-channel feature map $s$ can also be obtained in the same way.
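A sketch of the $e$ / $f_s$ / $s$ computation just described, with assumed channel counts; the helper mirrors the conv + BN + ReLU blocks used elsewhere in the decoder:

```python
import torch
import torch.nn as nn

def _cbr(in_ch, out_ch):
    """3 x 3 conv + BN + ReLU, as used throughout the decoder."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class SegHead(nn.Module):
    """e from f_e via one 3x3 conv; f_s from cat(f_e, D3) via two convs; s from f_s."""
    def __init__(self, fe_ch, d3_ch, mid_ch=64):  # channel counts are assumptions
        super().__init__()
        self.edge_out = nn.Conv2d(fe_ch, 1, 3, padding=1)
        self.fuse = nn.Sequential(_cbr(fe_ch + d3_ch, mid_ch), _cbr(mid_ch, mid_ch))
        self.seg_out = nn.Conv2d(mid_ch, 1, 3, padding=1)

    def forward(self, f_e, d3):                    # f_e and D3 assumed same size
        e = torch.sigmoid(self.edge_out(f_e))      # single-channel edge map e
        f_s = self.fuse(torch.cat([f_e, d3], dim=1))  # multi-channel feature f_s
        s = torch.sigmoid(self.seg_out(f_s))       # single-channel tumor map s
        return e, s
```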
The specific implementation is as follows: first, a bool variable is set as required to judge whether the hybrid model is used; if so, training of the edge reconstruction capability in the Transformer is required. The ResNetV2 pre-training model is then used to extract features before patch embedding; the extracted features are three features from stages 2, 3 and 5 of ResNetV2, serving as one of the decoder inputs. The configuration of this part remains the same as that of ResNetV2. In the ViT part, the patch size is 16 and the embedding dimension is 768. The encoder also basically follows a conventional ViT; this embodiment uses 12 blocks to form the encoder, which likewise conforms to the conventional ViT configuration.
(5) A first round of model training in step (4) is performed using the labels obtained in the weak supervision mode of step (3); the training inputs are: the CT slice image after 2D preprocessing, the preprocessed edge image, and the weak label image aligned with the length and width of the CT image. In this technical scheme, a joint loss function is constructed from binary cross-entropy loss, local cross-entropy loss and gated CRF loss. For the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image. The decoder branch uses the local cross-entropy loss and the gated CRF loss. The partial binary cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
where $J$ denotes the marked region, $g$ denotes the ground truth, and $s$ denotes the predicted tumor map. In order to learn better object structures as much as possible, a gated CRF is also used in the loss function:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, and $|\cdot|$ represents the L1 distance. $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

where $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters for controlling the scale of the Gaussian kernel. The total loss function is defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are the weights for the three loss functions; in this embodiment, all are set to 1.
The specific implementation is as follows: the original image, the weak label containing only the tumor site, the weak label containing both tumor and normal sites, and the edge map are input and converted into 352 × 352 tensor variables. After initializing the network, the parameter configuration of the first round of training is: batch size 32; SGD optimizer with an initial learning rate of 0.01 and momentum of 0.9; a decay rate of 0.1 applied every 10 epochs, with a minimum of 5 × 10⁻⁴; and a training length of 100 epochs.
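A sketch of this training configuration; build_model, loader and the joint_loss helper from the earlier loss sketch are assumed placeholders, and StepLR is used to express the 0.1 decay every 10 epochs:

```python
import torch

model = build_model()  # hypothetical constructor for the CNN-ViT hybrid
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(100):                           # training length: 100 epochs
    for img, weak_fg, weak_all, edge in loader:    # batch size 32, 352 x 352 tensors
        e, s = model(img)
        loss = joint_loss(e, edge, s, weak_fg, weak_all, img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```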
FIG. 3 is a schematic diagram of the image group-label pair configuration in the data of this embodiment. The original image is mhd-format data based on the LUNA16 data set, whose visualization result is shown in the figure. The second image is the preprocessing diagram, which crops out the main part of the lung and discards the other redundant parts. The third image is a schematic diagram of the edge map, obtained by the edge detection network and used both for adaptive flood filling and for supervising the reconstruction process of the approximate edge decoder. The last image is a schematic diagram of the point supervision marking of this embodiment: points of the first color are marked on the part labeled as lesion or tumor, and points at random positions in normal, non-lesion areas are shown as points of the second color.
FIG. 4 is a structural diagram of the Transformer coding block in this embodiment. The feature after Embedding is denoted $F_0$ and is input to the Transformer of the hybrid model, as shown in FIG. 4. To inhibit overfitting on the downstream task, a normalization operation is added before the multi-head attention layer and the feed-forward network layer. The multi-head attention layer is designed so that multiple attention heads learn different projection methods in order to recognize different patterns. The feed-forward network layer (Feed Forward Network, FFN for short) has the following structure:

$$\mathrm{FFN}(h_i) = \mathrm{GeLU}(h_i W_1 + b_1) W_2 + b_2$$

wherein $h_i$ is the vector of the hidden layer, $W_1, b_1, W_2, b_2$ are the parameters of the FFN, and GeLU is the activation function. As shown in FIG. 4, this module is stacked 12 times in the whole model; $F_0$ after Embedding serves as the first input of the encoder, whose output is $F_1$, and so on, finally forming the encoder of the model.
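A sketch of one such pre-norm block; the 768-dimension and 12-head settings follow the ViT configuration mentioned above, while the 4× MLP width is a standard-ViT assumption:

```python
import torch.nn as nn

class AttentionEncodeBlock(nn.Module):
    """Pre-norm ViT block: LayerNorm -> multi-head attention, LayerNorm -> FFN."""
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(            # FFN(h) = GeLU(h W1 + b1) W2 + b2
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        return x + self.ffn(self.norm2(x))                 # residual FFN
```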
FIG. 5 is a block diagram of the approximate edge decoder incorporated in this embodiment. The output of the approximate edge detector can be expressed as $f_e = \sigma(\mathrm{cat}(R_3, D_2))$, where $\sigma$ represents a 3 × 3 convolutional layer comprising BN and ReLU layers, with the previously obtained edge map as supervision. The final edge feature map $e$ is obtained by adding a 3 × 3 convolutional layer after $f_e$; then $f_e$ and $D_3$ are merged as $\mathrm{cat}(f_e, D_3)$ and passed through two convolutional layers to obtain the multi-channel feature $f_s$. Similar to $e$, the corresponding convolutional layer is finally added to obtain the single-channel feature map $s$.
It can be seen that the technical scheme fully considers the actual medical problems and the characteristics of the tumor CT image: the complete delineation label is not used as the supervision information for model training; instead, point labels that are easier for doctors to annotate are used, at the cost of losing some fine structural information. However, after the approximate edge decoder is added, the relevant structure of the lesion position can be gradually refined. In addition, the technical scheme also provides a joint loss function for the medical tumor segmentation task, so that tumor parts and normal parts are more easily distinguished: the local cross-entropy loss further separates the difference between tumor and normal tissue, and the gated conditional random field allows relatively coarse weak labels to train better results, bringing a certain performance improvement to the CT tumor segmentation task.
The present embodiment further provides a tumor segmentation system for CT images, which includes:
a medical image preprocessing module, which performs medical image preprocessing by using the method in the step (1);
and (3) an edge detection module, which performs edge detection by using the method in the step (2), wherein the obtained result is used for adaptive flooding filling and supervision of the approximate edge decoder.
And (4) a weak label generating module, wherein by using the method in the step (3), only the key point label is used as a supervision signal input, and a weak label can be obtained on the edge image through a self-adaptive flooding filling algorithm and is used for the first training of the model.
And (3) a CNN-ViT mixed segmentation module, wherein the segmentation network model adopts a mixed embedding mode by using the methods in the steps (4) to (5), edges are added into a decoder, and three loss function combinations are set for training. And using the label generated in the first round as a new supervised training second round to obtain a final segmentation model.
In summary, in order to realize an efficient and accurate segmentation task, the technical scheme acquires a CT tumor image data set, divides each 3D data into a plurality of 2D data, weakly labels the data, preprocesses all images, and divides the CT image data set into a test set and a training set; then an edge detection network model based on multi-scale convolution is pre-trained for extracting multi-scale edges and obtaining the corresponding edges at the medical semantic level; next, the weak supervision label is acquired based on adaptive flood filling and adaptive gradient selection from the extracted multi-scale edges and the positions of the detection boxes; finally, the segmentation model is trained, adopting a mixed model of CNN and Transformer and adjusting to use different models according to different data. On the one hand, a first round of training is performed on the weak supervision labels and the results are corrected by the joint loss function including the conditional random field; on the other hand, the corrected result is used as the label of the second round of training to obtain the final segmentation model.
Therefore, the technical scheme is suitable for automatically generating trainable pseudo labels and establishing a CT-image-oriented tumor segmentation model in computer-aided diagnosis and treatment scenarios, and can be well used for clinical tumor segmentation tasks. The tumor target in the CT image does not need to be finely delineated manually pixel by pixel; the segmentation model is trained with automatically generated coarse labels, which saves time and labor. Meanwhile, the technical scheme has high adaptability and enables rapid deployment of the convolutional neural network and Transformer models popular at the present stage. It can reduce a large amount of manual labeling and train a model with more accurate segmentation precision from limited medical semantic information, thereby ensuring the efficiency and accuracy of the segmentation task.
Claims (10)
1. A tumor segmentation method for CT images is characterized by comprising the following steps:
s1, acquiring a CT tumor image data set: dividing each 3D data into a plurality of 2D data, carrying out weak labeling on the 2D data, preprocessing all images, and then dividing a CT tumor image data set into a training set and a test set with real labels;
s2, edge extraction: extracting image edges from the 2D data by adopting an edge detection network model based on multi-scale convolution;
s3, self-adaptive flushing filling generates a pseudo label: acquiring a weak supervision label based on adaptive flooding filling and adaptive gradient selection through the extracted edge and the position of the detection frame;
s4, training a segmentation model: adopting a mixed model of CNN and a Transformer, refining the structure by using an approximate edge decoder, firstly performing a first round of training by using a weak supervision label, and correcting the result by using a combined loss function of three loss functions; then, taking the correction result as a label of a second round of training, and training to obtain a final segmentation model;
and S5, preprocessing the CT tumor image to be segmented, inputting the preprocessed CT tumor image into the segmentation model, and outputting to obtain a corresponding segmentation result.
2. The method of claim 1, wherein the specific process of weakly labeling the 2D data in step S1 is: for a lesion region on the 2D data, two points are marked on the data using annotation software to indicate the presence and absence of a tumor in the region, respectively, to obtain a corresponding json file.
3. The method of claim 1, wherein the preprocessing comprises: the images are adjusted to a uniform 352 × 352 size and normalized.
4. The method of claim 2, wherein step S2 comprises inputting the preprocessed 2D data into the multi-scale edge detection network model and outputting the edge feature map.
5. The method of claim 4, wherein the step S3 comprises the following steps:
taking the edge feature map and the json file as input, and obtaining a pseudo label by adopting adaptive flood filling, wherein the flood radius is set as:

$$r(I) = \frac{\min(h_I, w_I)}{\gamma}$$

where $I$ is the input image, $r(I)$ is the mask radius corresponding to the input image $I$, $h_I$ and $w_I$ are respectively the length and width of the input image, and $\gamma$ is a set hyper-parameter;
in addition, the labeled ground truth is:

$$S = \{S_b\} \cup \{S_t^i \mid i = 1, \dots, n\}$$

in the formula, $S_b$ and $S_t^i$ are the position coordinates of the background pixel and the $i$-th labeled tumor object, respectively;
the set of circular masks used for flood filling is defined as:

$$M(I) = \{C_{S_b}^{r(I)}\} \cup \{C_{S_t^i}^{r(I)} \mid i = 1, \dots, n\}$$

wherein $C_a^b$ is the circle using the subscript variable $a$ as its center and the superscript variable $b$ as its radius;
then, combining the edge feature map, the image is divided into a plurality of connected regions:

$$F(I) = \mathrm{Flood}\big(M(I), E(I)\big)$$

in the formula, $F(I)$ is the set of connected regions obtained after flood filling, $E(I)$ is the edge feature map, $E(\cdot)$ denotes the edge detector, and $I$ is the input image.
6. The method for tumor segmentation oriented to CT images of claim 1, wherein the mixed model of CNN and Transformer in step S4 includes an Embedding part, an encoder part and a decoder part, wherein the Embedding part uses ResNet to extract feature maps of 3 stages and then performs Transformer-based patch embedding;
the encoder part comprises 12 Attention Encode blocks, and each block is set according to a Vision Transformer, namely an Attention module and an MLP module;
the decoder section consists of two components, a ViT decoder and an approximate edge detector.
7. The method of claim 6, wherein the ViT decoder comprises four cascaded convolutional layers, each layer having a Batch Normalization (BN) layer, a ReLU activation layer and an up-sampling layer, with the feature output of the encoder part as input, and the corresponding feature of each decoder layer being represented as $D = \{D_i \mid i = 1, 2, 3, 4\}$.
8. The method of claim 7, wherein the output of the approximate edge detector is:
$$f_e = \sigma\big(\mathrm{cat}(R_3, D_2)\big)$$
where σ represents a 3 × 3 convolutional layer comprising BN and ReLU layers.
9. The method as claimed in claim 4, wherein the training process in step S4 comprises:
the first round of training uses weakly supervised labels, and the training inputs are: the CT slice image after 2D preprocessing, the edge image after preprocessing, and the weak label image aligned with the length and width of the CT image;
model training adopts an SGD optimizer; the learning rate is adaptively decayed according to the training epochs, with decay every 10 epochs at a ratio of 0.1; binary cross-entropy loss, local cross-entropy loss and gated CRF loss are adopted, and for the edge decoder branch, the binary cross-entropy loss is used to constrain $e$:

$$L_{bce} = -\sum_{r,c}\big[y_{r,c}\log e_{r,c} + (1 - y_{r,c})\log(1 - e_{r,c})\big]$$

where $y$ is the true label, $e$ represents the edge map, and $r$ and $c$ represent the row and column coordinates of the image; the decoder branch uses the local cross-entropy loss and the gated CRF loss, and the local cross-entropy loss is designed to let the model focus only on the determined regions and ignore the uncertain regions:

$$L_{pbce} = -\sum_{(r,c)\in J}\big[g_{r,c}\log s_{r,c} + (1 - g_{r,c})\log(1 - s_{r,c})\big]$$
wherein $J$ represents the marked region, $g$ represents the ground truth, and $s$ represents the predicted tumor map;
the gated CRF loss is:

$$L_{gcrf} = \sum_i \sum_{j \in K_i} f(i, j)\, d(i, j)$$
wherein $K_i$ is the area covered by the $k \times k$ range around pixel $i$, and $d(i, j)$ is defined as:
$$d(i, j) = |s_i - s_j|$$
wherein $s_i$ and $s_j$ are the confidence values of $s$ at locations $i$ and $j$, $|\cdot|$ represents the L1 distance, and $f(i, j)$ is a Gaussian kernel bandwidth filter:

$$f(i, j) = \frac{1}{w}\exp\!\left(-\frac{\lVert PT(i) - PT(j)\rVert^2}{2\sigma_{PT}^2} - \frac{\lVert I(i) - I(j)\rVert^2}{2\sigma_I^2}\right)$$

wherein $1/w$ is the normalized weight, $I(\cdot)$ and $PT(\cdot)$ are the gray value of the pixel and the position of the pixel, and $\sigma_{PT}$ and $\sigma_I$ are hyper-parameters controlling the Gaussian kernel scale; the total loss function is thus defined as:
$$L_{final} = \alpha_1 L_{bce} + \alpha_2 L_{pbce} + \alpha_3 L_{gcrf}$$
wherein $\alpha_1, \alpha_2, \alpha_3$ are respectively the weights corresponding to the binary cross-entropy loss, the local cross-entropy loss and the gated CRF loss;
the second round of training uses the new corrected labels generated by the first round as the ground truth to supervise the model and further optimize its segmentation capability; this self-supervised training mode can effectively enhance the model's understanding of medical image semantics and improve the segmentation precision of tumors, and the model obtained after the second round of training is used as the final segmentation model.
10. A tumor segmentation system facing CT images is characterized by comprising a medical image preprocessing module, an edge detection module, a weak label generation module and a CNN-ViT mixed segmentation module, wherein the image preprocessing module is used for preprocessing CT image data, dividing a medical image data set into a test set and a training set, and converting data in a 3D format into 2D data;
the edge detection module acquires the edge information of the image through multi-scale convolution and combination of set threshold parameters based on an RCF network model;
the weak label generating module adopts point labeling as supervision signal input and obtains a weak label on the edge image through a self-adaptive flooding filling algorithm;
the CNN-ViT mixed segmentation module adopts a mixed Embedding mode, adds edges into the decoder, sets a combination of three loss functions for the first round of training, and then uses the labels generated in the first round as new supervision for a second round of training to obtain the final segmentation model.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211398539.8A | 2022-11-09 | 2022-11-09 | Tumor segmentation method and system for CT image |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211398539.8A | 2022-11-09 | 2022-11-09 | Tumor segmentation method and system for CT image |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN115861181A (en) | 2023-03-28 |
Family
- ID: 85662856

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202211398539.8A (Pending) | CN115861181A (en) | 2022-11-09 | 2022-11-09 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN (1) | CN115861181A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116758048A (en) * | 2023-07-06 | 2023-09-15 | 河北大学 | PET/CT tumor periphery feature extraction system and extraction method based on transducer |
CN116758048B (en) * | 2023-07-06 | 2024-02-27 | 河北大学 | PET/CT tumor periphery feature extraction system and extraction method based on transducer |
CN117079273A (en) * | 2023-07-17 | 2023-11-17 | 浙江工业大学 | Floating algae microorganism detection method based on deep learning |
CN117789206A (en) * | 2024-01-05 | 2024-03-29 | 中国矿业大学 | Bimodal coal rock micro-component identification method based on double decoder fusion UNetFormer architecture |
CN117541798A (en) * | 2024-01-09 | 2024-02-09 | 中国医学科学院北京协和医院 | Medical image tumor segmentation model training method, device and segmentation method |
CN117541798B (en) * | 2024-01-09 | 2024-03-29 | 中国医学科学院北京协和医院 | Medical image tumor segmentation model training method, device and segmentation method |
CN118470712A (en) * | 2024-07-11 | 2024-08-09 | 吉林农业大学 | Corn plant heart identification method |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |