CN118229981A - CT image tumor segmentation method, device and medium combining a convolutional network and a Transformer - Google Patents
CT image tumor segmentation method, device and medium combining a convolutional network and a Transformer
- Publication number
- CN118229981A CN118229981A CN202410641655.0A CN202410641655A CN118229981A CN 118229981 A CN118229981 A CN 118229981A CN 202410641655 A CN202410641655 A CN 202410641655A CN 118229981 A CN118229981 A CN 118229981A
- Authority
- CN
- China
- Prior art keywords
- network
- image
- convolution
- module
- Transformer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Apparatus For Radiation Diagnosis (AREA)
Abstract
The invention provides a CT image tumor segmentation method, device and medium combining a convolutional network and a Transformer, belonging to the technical field of image processing. The method comprises the following steps: collecting an original data set, preprocessing it, and dividing it into a training set, a validation set and a test set; establishing a CT image tumor segmentation network comprising a basic encoding module, an attention feature fusion module, a spatial pyramid pooling module and an attention gate module, where the basic encoding module consists of a convolutional network and a Transformer network; training the CT image tumor segmentation network with the training set data and optimizing the model parameters; and deploying the trained network, collecting CT images to be segmented, preprocessing them, and inputting them into the network to obtain segmentation results. Compared with existing schemes, the method achieves more accurate liver tumor segmentation and provides strong technical support for clinical diagnosis and treatment planning.
Description
Technical Field
The invention relates to a CT image tumor segmentation method, device and medium combining a convolutional network and a Transformer, belonging to the technical field of image processing.
Background
Automatic segmentation of liver tumors is important for the diagnosis, staging and treatment planning of liver cancer. Compared with manual delineation, automatic segmentation can markedly improve efficiency, reduce labor cost and lighten the workload of physicians. However, automatic tumor segmentation in liver CT images faces the following difficulties: the shape, size and location of liver tumors vary greatly, both across patients and within the same patient at different time points, which challenges the applicability of segmentation algorithms; and, compared with normal liver tissue, tumors show low image contrast and indistinct boundaries, which makes them hard for a segmentation algorithm to recognize.
Currently, liver CT tumor segmentation mostly relies on statistics-based methods, traditional machine learning methods or deep learning methods. The main approaches are as follows:
1. Image segmentation methods based on statistics, such as methods based on probability distribution modeling, methods based on cluster analysis, and the like. The method can capture the difference of tumor and normal tissue on image statistical characteristics (such as gray scale, texture and the like) for classification and segmentation.
2. Classification-based segmentation methods built on traditional machine learning first extract feature vectors that characterize the image statistics, commonly using shape, texture and spectral information. The image is then divided into small patches used as samples, a binary classifier such as a support vector machine or decision tree is trained on these samples, and finally the trained classifier predicts each patch of the test image, with the patch predictions combined into the final segmentation result.
3. Deep learning segmentation methods based on convolutional neural networks directly learn hierarchical semantic feature representations from image data through operations such as convolution and pooling, without hand-crafted features. Representative segmentation networks include the U-shaped UNet and encoder-decoder FCN architectures.
The existing segmentation method has the following technical defects:
1. Traditional segmentation methods based on simple statistical analysis mainly model statistical information such as image intensity distribution and cannot effectively characterize semantic concepts in complex cases. They are sensitive to changes in image quality and contrast, adapt poorly to the organ texture differences, lesion heterogeneity and noise present in test samples, and cannot effectively segment tumors with abnormal morphology.
2. Machine-learning-based segmentation methods rely on hand-designed low-level image features that struggle to capture high-level semantic concepts and have very limited power to express complex lesions. As a result, their segmentation results are structurally coarse, lack detail and have low accuracy, failing to meet the requirements of precise clinical diagnosis and treatment.
3. In existing deep learning segmentation methods, most network structures are convolutional neural networks. While such methods can learn hierarchical feature representations end to end, their core convolution operation focuses on local features. As a result, these networks express the global morphology of lesions weakly, adapt poorly to atypical samples, and localize edges imprecisely, which limits segmentation accuracy.
Disclosure of Invention
The invention aims to provide a CT image tumor segmentation method, device and medium combining a convolutional network and a Transformer, which can effectively alleviate the difficulties posed by varied tumor morphology and unclear boundaries.
This aim is achieved by the following technical scheme:
collecting an original data set, preprocessing the data set, and dividing the data set into a training set, a verification set and a test set;
A CT image tumor segmentation network is established, comprising a basic encoding module, an attention feature fusion module, a spatial pyramid pooling module and an attention gate module, where the basic encoding module consists of a convolutional network and a Transformer network. The CT image passes sequentially through the basic encoding module, the attention feature fusion module and the spatial pyramid pooling module, and is then convolved and upsampled to obtain a feature map F1; the output feature map of the convolutional network is sent to the attention gate module to obtain a feature map F2; feature maps F1 and F2 are concatenated and passed through a convolution layer to generate the segmentation prediction result;
training the CT image tumor segmentation network with the training set data, using a joint loss function combining binary cross-entropy loss and Dice loss as the optimization target to optimize the model parameters;
and deploying the trained CT image tumor segmentation network, collecting CT images to be segmented, preprocessing them, and inputting them into the network to obtain segmentation results.
Preferably, the convolutional network encoder is composed of 5 standard convolution blocks, each comprising a convolution layer, batch normalization (BN) and an activation layer; it encodes the input CT image and extracts local low-level visual features. The Transformer encoder consists of 3 standard Transformer encoder layers, which model the input CT image through a self-attention mechanism and capture long-range dependencies and global context information.
Preferably, the attention feature fusion module processes the feature maps as follows:

The output feature maps of the convolutional network and the Transformer network, Fc and Ft, are spliced in the channel dimension to obtain a feature map Fs;

The feature map Fs is fed into a series of attention weighting modules. Each attention weighting module comprises two paths: in one path, Fs passes sequentially through a convolution, batch normalization, ReLU block followed by a convolution and batch normalization block; in the other path, Fs passes sequentially through a pooling operation and a convolution, batch normalization, ReLU block. The output feature maps of the two paths are spliced, and a Sigmoid function produces the weights. These weights perform a point-wise multiplication on the output of the convolutional network in the basic encoding module, and the result of that weighting then performs a point-wise multiplication on the output of the Transformer network.
Preferably, the spatial pyramid pooling module includes 5 parallel convolution branch paths: the first branch convolves the input feature map with a standard 1×1 convolution kernel to obtain low-level fine-grained local features; the second branch uses dilated convolution with a dilation rate of 6; the third branch uses dilated convolution with a dilation rate of 12; the fourth branch uses dilated convolution with a dilation rate of 18; the fifth branch comprises, in sequence, a global average pooling layer, a 1×1 convolution and bilinear interpolation. The output feature maps of the five branches are spliced in the channel dimension to fuse multi-scale convolution features; the spliced feature map then undergoes channel dimension reduction through a 1×1 convolution, yielding the final multi-scale fused feature representation.
Preferably, the attention gate module takes a low-level feature map from a shallow layer of the convolutional network as the feature input and the deep multi-scale fusion feature map F1 as the gating signal input; each input passes through a convolution and batch normalization before the two are spliced. The spliced feature map passes through a convolution, batch normalization, ReLU block, and a Sigmoid function then generates an attention coefficient feature map, which adaptively weights the spatial regions of the shallow low-level features.
Preferably, the joint loss function L is as follows:

L = λ1·L_BCE + λ2·L_Dice,

L_BCE = −[y·log(p) + (1 − y)·log(1 − p)],

L_Dice = 1 − 2|X ∩ Y| / (|X| + |Y|),

where λ1 is the weight coefficient of the L_BCE loss, λ2 is the weight coefficient of the L_Dice loss, y is the true label, p is the probability output by the model, and X and Y are the pixels of the truly labeled tumor region and of the model prediction region, respectively.
Preferably, λ1 takes the value 0.6 and λ2 takes the value 0.4.
The invention has the following advantages: the fusion of a convolutional network and a Transformer encoder as the basic encoding module acquires local detail features and global context features simultaneously, greatly enhancing feature expression. The attention feature fusion module adaptively weights and fuses the convolutional and Transformer features from the two sources through an attention mechanism, selectively highlighting important features and capturing semantically rich feature representations. The spatial pyramid pooling module and the attention gate module respectively fuse context semantic features and detail features at different scales, giving the model strong adaptability so that tumors of various sizes and shapes can be segmented efficiently. The loss function is designed to balance local pixel-level and global supervision signals, guiding the model to attend to detail and consistency simultaneously and thereby produce fine, accurate segmentation results.
The invention can effectively fuse local and global characteristic information end to end, fully express multi-scale semantic information, improve adaptability to tumor morphological change, fineness of edge positioning and segmentation robustness to complex cases, realize more accurate liver tumor segmentation than the existing scheme, and provide powerful technical support for clinical diagnosis and treatment scheme formulation.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a schematic flow chart of the method of the invention.
Fig. 2 is a schematic diagram of a network structure according to the present invention.
Fig. 3 is a schematic diagram of the attention feature fusion module structure of the present invention.
FIG. 4 is a schematic diagram of a spatial pyramid pooling module structure according to the present invention.
Fig. 5 is a schematic diagram of the attention gate module structure of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in FIG. 1, a CT image tumor segmentation method combining a convolutional network and a Transformer comprises the steps of raw data collection and preprocessing, network construction, training with a joint loss function, and model application. The specific implementation is as follows:
S1: the method comprises the steps of collecting an original data set, preprocessing the data set, and dividing the data set into a training set, a verification set and a test set.
S2: a CT image tumor segmentation network is established, wherein the CT image tumor segmentation network comprises a basic coding module, an attention feature fusion module, a spatial pyramid pooling module and an attention gate module, wherein the basic coding module consists of a convolution network and a transducer network; the CT image sequentially passes through a basic coding module, an attention feature fusion module and a space pyramid pooling module, and then is rolled and upsampled to obtain a feature mapThe convolution network output feature map is sent to an attention gate module to obtain a feature map/>Feature map/>And/>The concatenation generates a segmentation prediction result through a convolution layer.
S3: training a CT image tumor segmentation network by using training set data, and using a joint loss function combining binary cross entropy loss and Dice loss as an optimization target optimization model parameter.
S4: and deploying a CT image tumor segmentation network after training, collecting CT images to be segmented, preprocessing, and inputting the CT image tumor segmentation network to obtain segmentation results.
As a refinement of step S1, the data used in the invention come from the LiTS (Liver Tumor Segmentation Benchmark) dataset, which collected data from 7 different medical centers and comprises 131 training cases and 70 test cases. Each patient has a gold-standard liver tumor segmentation annotation. These CT scans reflect many variations across patients, scanning devices and pathology types; all images are 512×512 in size.
In order to enhance the generalization capability and robustness of the model, the method performs data augmentation on all training images before training. Specifically, a series of image processing transformations are performed on the paired original liver CT images and the corresponding labeled segmented images in the same step, including multiple data augmentation modes such as horizontal and vertical flipping, rotation, scaling, translation, gaussian noise addition, brightness adjustment, contrast adjustment, and the like. Therefore, different scanning conditions and pathological changes in a real scene can be simulated, so that the model is contacted with more abundant and various data during training, and better generalization performance is obtained.
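A minimal numpy sketch of this paired augmentation follows. The particular transform set and noise level are illustrative assumptions, not the patent's exact parameters; the key point shown is that geometric transforms are applied identically to the CT slice and its segmentation mask, while intensity transforms touch only the image.

```python
import numpy as np

def paired_augment(image, mask, rng):
    """Apply the same random geometric transforms to a CT slice and its
    segmentation mask. Flip/rotation choices and the Gaussian noise level
    are illustrative, not values stated in the patent."""
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    k = rng.integers(0, 4)                      # rotation by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # intensity-only transforms go to the image, never the mask
    image = image + rng.normal(0.0, 0.01, image.shape)
    return image, mask
```

Because both arrays go through the same flips and rotations, tumor pixels in the mask stay aligned with the corresponding image pixels after augmentation.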
After data augmentation, a sufficiently large number of varied training samples is obtained. The invention takes the augmented paired CT images and segmentation annotation images as inputs and training targets, feeding them into the proposed network architecture for end-to-end training.
As a refinement of step S2, as shown in fig. 2 to 5, a specific module structure is schematically shown.
1. Basic encoding module
The convolutional network and the Transformer network form the basic encoding module. The convolutional encoder consists of 5 standard convolution blocks, each comprising a convolution layer, BN and an activation layer; through these basic operations it encodes the input CT image and extracts local low-level visual features. The Transformer encoder consists of 3 standard Transformer encoder layers, which model the input CT image through a self-attention mechanism, capturing long-range dependencies and global context information to form a higher-level feature representation. Feature maps from different levels of the convolutional encoder and the Transformer encoder are fed in parallel into the attention feature fusion module.
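The self-attention mechanism at the core of the Transformer branch can be illustrated with a minimal single-head version in numpy; the token count, embedding size and weight matrices below are toy stand-ins, since the patent does not specify head counts or dimensions. Every token attends to every other token, which is how long-range dependencies are captured.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of patch tokens x
    (tokens, dim): softmax(Q K^T / sqrt(d)) V."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])       # (tokens, tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)  # softmax over all tokens
    return attn @ v                                  #每 token: weighted mix of all tokens
```

Unlike a convolution, whose output at a pixel depends only on a local window, each output row here is a weighted combination of every input token.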
Specifically, the attention feature fusion module processes these features as follows:

S2-1: The attention feature fusion module first splices the feature maps Fc and Ft from the convolutional network branch and the Transformer branch in the channel dimension. The spliced features are then fed through a series of attention weighting modules to explicitly model the interdependencies between the feature channels.
S2-2: the structure of the attention weighting module is shown in fig. 3, the spliced characteristic diagram is divided into two paths, and the characteristic diagram in one pathSequentially passing through convolution, batch normalization, relu function blocks, convolution and batch normalization blocks, and feature map/>, in another pathAnd sequentially executing pooling operation, convolution, batch normalization and Relu function blocks, splicing the output feature graphs of the two paths, and then acquiring weights by using a Sigmoid function.
S2-3: the output of the convolution network in the basic coding module is weighted by the weight, and then the obtained output is used for further carrying out dot multiplication on the output of the transform network, so that the weighting operation is realized.
The attention mechanism module can adaptively allocate attention weights to different feature channels and selectively highlight or inhibit different features, thereby capturing multi-scale context semantics and enhancing the feature expression of a tumor region.
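A heavily simplified numpy sketch of this flow follows. The conv/BN/ReLU paths that produce the weights are collapsed into a channel mean here, as an assumption, to keep only the essential structure: splice the two branches, derive per-channel Sigmoid weights, point-multiply the convolutional features, then use that result to point-multiply the Transformer features.

```python
import numpy as np

def attention_fuse(f_conv, f_trans):
    """Sketch of the fusion step for feature maps shaped (C, H, W).
    A channel mean stands in for the module's conv/BN weight paths."""
    spliced = np.concatenate([f_conv, f_trans], axis=0)   # (2C, H, W)
    logits = spliced.mean(axis=(1, 2))                    # stand-in weight path
    weights = 1.0 / (1.0 + np.exp(-logits[: f_conv.shape[0]]))  # Sigmoid
    weighted_conv = weights[:, None, None] * f_conv       # weight conv branch
    return weighted_conv * f_trans                        # then weight Transformer branch
```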
2. Spatial pyramid pooling module
To accommodate a wide variety of tumor sizes and morphologies, the spatial pyramid pooling module receives the output of the attention feature fusion module, which includes 5 parallel convolution branch paths:
the first branch path convolves the input feature map using a standard 1 x1 convolution kernel to obtain low-level fine-grained local features.
The second branch uses dilated convolution with a dilation rate of 6, the third branch a dilation rate of 12, and the fourth branch a dilation rate of 18; progressively increasing the dilation rate gradually enlarges the receptive field, extracting medium-scale semantic features.
The fifth path sequentially comprises a global average pooling layer, 1×1 convolution and bilinear interpolation, and is restored to the original resolution to capture global profile features.
The output feature maps of the five paths are spliced in the channel dimension, fusing multi-scale convolution features; the spliced feature map then undergoes channel dimension reduction through a 1×1 convolution, outputting the final multi-scale fused feature representation. Through this module, the invention simultaneously integrates local detail, intermediate semantics and global context at different scales, giving the feature expression strong adaptability and robustness and overcoming the influence of size and shape variation on segmentation, thereby improving the accuracy and generalization of liver tumor segmentation.
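Why larger dilation rates see more context can be checked with simple arithmetic: a k×k kernel with dilation rate r covers k + (k − 1)(r − 1) pixels per axis. The rates 6, 12 and 18 used below are illustrative values typical of atrous pyramids.

```python
def dilated_receptive_field(kernel=3, rate=1):
    """Pixels spanned (per axis) by one dilated convolution:
    k + (k - 1) * (r - 1)."""
    return kernel + (kernel - 1) * (rate - 1)
```

A 3×3 kernel thus spans 3 pixels at rate 1 but 13, 25 and 37 pixels at rates 6, 12 and 18, which is how the parallel branches capture progressively larger context without extra parameters.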
3. Attention gate module
The attention gate module takes a low-level feature map from a shallow layer of the convolutional network as the feature input and the deep multi-scale fusion feature map F1 as the gating signal input; each input passes through a convolution and batch normalization before the two are spliced. The spliced feature map passes through a convolution, batch normalization, ReLU block, and a Sigmoid function then generates an attention coefficient feature map. This attention coefficient feature map adaptively weights the spatial regions of the shallow low-level features, retaining and strengthening the important detail information related to the gating signal (high-level semantics) while suppressing irrelevant background regions.
The output feature map of the attention gate module is concatenated with the feature maps of the other branches, and the segmentation prediction is generated through a convolution layer and an upsampling layer, completing the network. This mechanism addresses the problem of unclear edge details in segmentation results caused by poor image quality: after multiple convolution and downsampling operations in the network, some detailed semantic information in the original input image can be lost, affecting the fineness of the final segmentation contour. The attention gate module fuses shallow detail and deep semantic features, helping the segmentation network better preserve detail and thereby produce finer, more accurate segmentation contours.
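A minimal numpy sketch of the gating idea follows. The convolution and batch-normalization stacks are collapsed into a simple elementwise sum here as an assumption, keeping only the Sigmoid coefficient map and the spatial re-weighting of the shallow features.

```python
import numpy as np

def attention_gate(low_feat, gate):
    """Per-pixel gating: a Sigmoid of the combined signals yields an
    attention coefficient map in (0, 1) that re-weights the shallow
    low-level features spatially."""
    coeff = 1.0 / (1.0 + np.exp(-(low_feat + gate)))  # attention coefficients
    return coeff * low_feat
```

Regions where the deep gating signal agrees with strong shallow responses get coefficients near 1 and are kept; background regions are pushed toward 0.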
As a refinement of step S3, the joint loss function L is as follows:

L = λ1·L_BCE + λ2·L_Dice,

L_BCE = −[y·log(p) + (1 − y)·log(1 − p)],

L_Dice = 1 − 2|X ∩ Y| / (|X| + |Y|),

where λ1 is the weight coefficient of the L_BCE loss, λ2 is the weight coefficient of the L_Dice loss, y is the true label, p is the probability output by the model, and X and Y are the pixels of the truly labeled tumor region and of the model prediction region, respectively.
L_BCE is a standard binary cross-entropy loss that compares prediction and ground truth pixel by pixel, describing local detail differences; L_Dice computes the Dice similarity coefficient between the prediction mask and the truth mask, providing a global evaluation.
Combining the two provides complementary loss gradient information: the Dice term introduces moderate variation, while the presence of the BCE loss keeps the loss surface smooth and stable. Tailored to the characteristics of tumors in liver CT images, λ1 takes the value 0.6 and λ2 takes the value 0.4. Through this joint loss design, the segmentation network synthesizes supervision signals at the local pixel level and the global semantic level, helping to generate finer and more accurate segmentation results.
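The joint loss can be sketched in a few lines of numpy. The weights 0.6 / 0.4 follow the text; the smoothing constant `eps` is an implementation assumption (the patent does not specify one) added to avoid log(0) and division by zero.

```python
import numpy as np

def joint_loss(p, y, lam1=0.6, lam2=0.4, eps=1e-7):
    """Weighted sum of pixel-wise binary cross-entropy and a soft Dice
    loss over flattened probability map p and binary ground truth y."""
    p = np.clip(p, eps, 1.0 - eps)                               # avoid log(0)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))  # local term
    dice = 1.0 - 2.0 * np.sum(p * y) / (np.sum(p) + np.sum(y) + eps)  # global term
    return lam1 * bce + lam2 * dice
```

A perfect prediction drives both terms toward 0, while an inverted prediction is penalized by both the pixel-level and the overlap-based term.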
As a refinement of step S4, reasonable parameter settings such as batch size, optimizer and learning rate are adopted during network training; in this embodiment, the batch size is 4, the optimizer is Adam, and the learning rate is 0.001, with the joint loss function L as the optimization target so that the model attends to local detail and global similarity simultaneously. Training continues until the segmentation performance on the validation set no longer improves. Finally, the model parameters with optimal performance are saved, in preparation for subsequent clinical application.
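The stopping rule ("until validation performance no longer improves") can be sketched as a small tracker; the patience value is an assumption, since the text does not state how many stagnant epochs are tolerated.

```python
class EarlyStopper:
    """Tracks a validation metric (e.g. Dice score) and signals a stop
    after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's validation metric; return True to stop."""
        if val_metric > self.best:
            self.best = val_metric     # save point for best model weights
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

The epoch that updates `best` is where the model parameters would be checkpointed for later deployment.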
In practical application, the trained model is deployed and integrated to a server side of the medical image analysis system. After uploading new CT image data of the patient, the system automatically performs necessary preprocessing on the image, adjusts the image to be consistent with training data, and then inputs the image into a segmentation model for forward reasoning. The model outputs a high-quality segmentation probability map, generates a final binary segmentation contour based on the probability map, and outputs a visualized segmentation result after fusion with the original image data.
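The deployment-time preprocessing ("adjusted to be consistent with training data") might look like the sketch below for abdominal CT: clip to a Hounsfield-unit window and scale to [0, 1]. The window bounds are an assumption for illustration — the patent does not specify them — and any resizing to the 512×512 training resolution is omitted.

```python
import numpy as np

def preprocess_ct(slice_hu, window=(-100.0, 400.0)):
    """Clip a CT slice (in Hounsfield units) to a soft-tissue window and
    normalize to [0, 1]. The (-100, 400) window is an illustrative
    assumption, not a value from the patent."""
    lo, hi = window
    x = np.clip(slice_hu.astype(np.float64), lo, hi)
    return (x - lo) / (hi - lo)
```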
The segmentation results can be widely applied to clinical tumor detection, three-dimensional reconstruction, surgical navigation, efficacy evaluation, volume calculation and other tasks, greatly improving diagnosis and treatment efficiency. Moreover, the system can continuously optimize the segmentation model: new annotated samples collected during practical use can be added to the training set for periodic incremental learning or fine-tuning, and strategies such as multi-model ensembling can further improve segmentation performance; the attention module weights can be adjusted according to feedback, and techniques such as semi-supervised/weakly supervised learning can be introduced, giving the model stronger adaptability and robustness and maintaining high segmentation accuracy over the long term.
Example 2
The embodiment of the disclosure also provides a CT image tumor segmentation device combining a convolution network and a transducer, which comprises a processor and a memory. Optionally, the apparatus may further comprise a communication interface (CommunicationInterface) and a bus. The processor, the communication interface and the memory can complete communication with each other through the bus. The communication interface may be used for information transfer. The processor may invoke logic instructions in the memory to perform the CT image tumor segmentation method of the above-described embodiments that combines a convolutional network and a transducer.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory is used as a computer readable storage medium for storing a software program, a computer executable program, and program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor executes the program instructions/modules stored in the memory to perform the functional application and data processing, i.e., to implement the CT image tumor segmentation method in combination with the convolutional network and the transducer in the above embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the terminal device, etc. Further, the memory may include a high-speed random access memory, and may also include a nonvolatile memory.
Embodiments of the present disclosure provide a computer readable storage medium storing computer executable instructions configured to perform the above-described CT image tumor segmentation method combining a convolutional network and a transducer.
The computer readable storage medium may be a transitory computer readable storage medium or a non-transitory computer readable storage medium.
Embodiments of the present disclosure may be embodied in a software product stored on a storage medium, including one or more instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of a method according to embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium including: a plurality of media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RAM, randomAccessMemory), a magnetic disk, or an optical disk, or a transitory storage medium.
Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art, although the present invention has been described in detail with reference to the foregoing embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A CT image tumor segmentation method combining a convolutional network and a transducer, comprising:
collecting an original data set, preprocessing the data set, and dividing the data set into a training set, a verification set and a test set;
A CT image tumor segmentation network is established, wherein the CT image tumor segmentation network comprises a basic coding module, an attention feature fusion module, a spatial pyramid pooling module and an attention gate module, wherein the basic coding module consists of a convolution network and a transducer network; the CT image sequentially passes through a basic coding module, an attention feature fusion module and a space pyramid pooling module, and then is rolled and upsampled to obtain a feature map The convolution network output feature map is sent to an attention gate module to obtain a feature map/>Feature map/>And/>Generating a segmentation prediction result through a convolution layer in cascade connection;
training a CT image tumor segmentation network by using training set data, and using a combined loss function combining binary cross entropy loss and Dice loss as an optimization target optimization model parameter;
and deploying a CT image tumor segmentation network after training, collecting CT images to be segmented, preprocessing, and inputting the CT image tumor segmentation network to obtain segmentation results.
2. The method for segmenting a tumor of a CT image combining a convolutional network and a fransformer according to claim 1, wherein the convolutional network encoder is composed of 5 standard convolutional blocks, each block comprises a convolutional layer, a BN layer and an active layer, the input CT image is encoded, and local underlying visual features are extracted; the transducer encoder consists of 3 standard transducer encoder layers, models an input CT image through a self-attention mechanism, and captures long-range dependency and global context information.
3. The CT image tumor segmentation method combining convolutional network and transducer according to claim 1, wherein the attention feature fusion module processes the feature map as follows:
Feature map of outputs of convolutional and transform networks And/>Splicing in the channel dimension to obtain a feature map/>;
Map the characteristic mapAlternately inputting the attention weighting modules; the attention weighting module comprises two paths, wherein a feature map/>, in one pathSequentially passing through convolution, batch normalization, relu function blocks, convolution and batch normalization blocks, and feature map/>, in another pathSequentially executing pooling operation, convolution, batch normalization and Relu function blocks, splicing output feature graphs of two paths, and acquiring weights by using a Sigmoid function; and performing point multiplication weighting operation on the output of the convolution network in the basic coding module by using the weight, and performing point multiplication weighting operation on the output of the transform network by using the output obtained by the point multiplication weighting.
4. The CT image tumor segmentation method combining a convolutional network and a transducer of claim 1, wherein the spatial pyramid pooling module comprises 5 parallel convolutional branch paths: the first branch path uses a standard 1 multiplied by 1 convolution check input feature map to carry out convolution, and low-level fine-grained local features are obtained; the second branch path is convolved with a hole with an expansion rate of 6; the third branch path uses hole convolution with expansion rate of 6; the fourth branch path uses hole convolution with expansion rate of 6; the fifth path sequentially comprises a global average pooling layer, 1×1 convolution and bilinear interpolation; the output feature graphs of the five paths are spliced in the channel dimension, and multi-scale convolution features are fused; and carrying out channel dimension reduction on the spliced feature map through 1X 1 convolution, and outputting a final multi-scale fusion feature representation.
5. The CT image tumor segmentation method combining convolutional network and transducer as recited in claim 1, wherein the attention gate module takes a low-level feature map of a shallow layer of the convolutional network as a feature input and takes the feature map as a feature inputAs gating signal input, the signals are spliced after being respectively subjected to convolution and batch normalization; the spliced feature images are subjected to convolution, batch normalization and Relu function blocks, and then a Sigmoid function is used for generating an attention coefficient feature image; the spatial regions of the shallow low-level features are adaptively weighted by the attention coefficient feature map.
6. The CT image tumor segmentation method combining convolutional network and transducer of claim 1, wherein the joint loss functionThe following are provided:
,
,
,
Wherein, Representation/>Weight coefficient of loss,/>Representation/>Weight coefficient of loss,/>Is true mark,/>Output probability for model,/>And/>Pixels of the tumor region and the model prediction region are truly labeled.
7. The method for CT image segmentation as recited in claim 6, wherein the CT image segmentation method comprises the steps of,The value is 0.6,/>The value is 0.4.
8. A CT image tumor segmentation apparatus incorporating a convolutional network and a transducer comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the CT image tumor segmentation method incorporating a convolutional network and a transducer as claimed in any one of claims 1-7 when executing the program instructions.
9. A computer readable storage medium having stored thereon a computer program for execution by a processor of a CT image tumor segmentation method combining a convolutional network and a transducer according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410641655.0A CN118229981A (en) | 2024-05-23 | 2024-05-23 | CT image tumor segmentation method, device and medium combining convolutional network and transducer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410641655.0A CN118229981A (en) | 2024-05-23 | 2024-05-23 | CT image tumor segmentation method, device and medium combining convolutional network and transducer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118229981A true CN118229981A (en) | 2024-06-21 |
Family
ID=91508806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410641655.0A Pending CN118229981A (en) | 2024-05-23 | 2024-05-23 | CT image tumor segmentation method, device and medium combining convolutional network and transducer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118229981A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021104056A1 (en) * | 2019-11-27 | 2021-06-03 | 中国科学院深圳先进技术研究院 | Automatic tumor segmentation system and method, and electronic device |
CN113469119A (en) * | 2021-07-20 | 2021-10-01 | 合肥工业大学 | Cervical cell image classification method based on visual converter and graph convolution network |
CN113837193A (en) * | 2021-09-23 | 2021-12-24 | 中南大学 | Zinc flotation froth image segmentation algorithm based on improved U-Net network |
WO2022141723A1 (en) * | 2020-12-29 | 2022-07-07 | 江苏大学 | Image classification and segmentation apparatus and method based on feature guided network, and device and medium |
CN115641340A (en) * | 2022-09-07 | 2023-01-24 | 闽江学院 | Retina blood vessel image segmentation method based on multi-scale attention gating network |
CN116309278A (en) * | 2022-12-16 | 2023-06-23 | 安徽大学 | Medical image segmentation model and method based on multi-scale context awareness |
CN116309615A (en) * | 2023-01-09 | 2023-06-23 | 西南科技大学 | Multi-mode MRI brain tumor image segmentation method |
CN116739985A (en) * | 2023-05-10 | 2023-09-12 | 浙江医院 | Pulmonary CT image segmentation method based on transducer and convolutional neural network |
KR20230147492A (en) * | 2022-04-14 | 2023-10-23 | 한국교통대학교산학협력단 | Method and apparatus for segmenting brain tumor regions in brain magnetic resonance image based on deep learning |
WO2024000161A1 (en) * | 2022-06-28 | 2024-01-04 | 中国科学院深圳先进技术研究院 | Ct pancreatic tumor automatic segmentation method and system, terminal and storage medium |
CN117523204A (en) * | 2023-11-28 | 2024-02-06 | 辽宁科技大学 | Liver tumor image segmentation method and device oriented to medical scene and readable storage medium |
-
2024
- 2024-05-23 CN CN202410641655.0A patent/CN118229981A/en active Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021104056A1 (en) * | 2019-11-27 | 2021-06-03 | 中国科学院深圳先进技术研究院 | Automatic tumor segmentation system and method, and electronic device |
WO2022141723A1 (en) * | 2020-12-29 | 2022-07-07 | 江苏大学 | Image classification and segmentation apparatus and method based on feature guided network, and device and medium |
CN113469119A (en) * | 2021-07-20 | 2021-10-01 | 合肥工业大学 | Cervical cell image classification method based on visual converter and graph convolution network |
CN113837193A (en) * | 2021-09-23 | 2021-12-24 | 中南大学 | Zinc flotation froth image segmentation algorithm based on improved U-Net network |
KR20230147492A (en) * | 2022-04-14 | 2023-10-23 | 한국교통대학교산학협력단 | Method and apparatus for segmenting brain tumor regions in brain magnetic resonance image based on deep learning |
WO2024000161A1 (en) * | 2022-06-28 | 2024-01-04 | 中国科学院深圳先进技术研究院 | Ct pancreatic tumor automatic segmentation method and system, terminal and storage medium |
CN115641340A (en) * | 2022-09-07 | 2023-01-24 | 闽江学院 | Retina blood vessel image segmentation method based on multi-scale attention gating network |
CN116309278A (en) * | 2022-12-16 | 2023-06-23 | 安徽大学 | Medical image segmentation model and method based on multi-scale context awareness |
CN116309615A (en) * | 2023-01-09 | 2023-06-23 | 西南科技大学 | Multi-mode MRI brain tumor image segmentation method |
CN116739985A (en) * | 2023-05-10 | 2023-09-12 | 浙江医院 | Pulmonary CT image segmentation method based on transducer and convolutional neural network |
CN117523204A (en) * | 2023-11-28 | 2024-02-06 | 辽宁科技大学 | Liver tumor image segmentation method and device oriented to medical scene and readable storage medium |
Non-Patent Citations (1)
Title |
---|
郝晓宇;熊俊峰;薛旭东;石军;文可;韩文廷;李骁扬;赵俊;傅小龙;: "融合双注意力机制3D U-Net的肺肿瘤分割", 中国图象图形学报, no. 10, 16 October 2020 (2020-10-16) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113077471B (en) | Medical image segmentation method based on U-shaped network | |
CN111627019B (en) | Liver tumor segmentation method and system based on convolutional neural network | |
CN109598727B (en) | CT image lung parenchyma three-dimensional semantic segmentation method based on deep neural network | |
CN111798462B (en) | Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image | |
CN111784671B (en) | Pathological image focus region detection method based on multi-scale deep learning | |
CN111612754B (en) | MRI tumor optimization segmentation method and system based on multi-modal image fusion | |
CN111738363B (en) | Alzheimer disease classification method based on improved 3D CNN network | |
CN110889853A (en) | Tumor segmentation method based on residual error-attention deep neural network | |
CN110889852A (en) | Liver segmentation method based on residual error-attention deep neural network | |
CN113034505B (en) | Glandular cell image segmentation method and glandular cell image segmentation device based on edge perception network | |
CN112258488A (en) | Medical image focus segmentation method | |
Popescu et al. | Retinal blood vessel segmentation using pix2pix gan | |
CN111260667A (en) | Neurofibroma segmentation method combined with space guidance | |
CN115375711A (en) | Image segmentation method of global context attention network based on multi-scale fusion | |
CN115311194A (en) | Automatic CT liver image segmentation method based on transformer and SE block | |
CN112750137A (en) | Liver tumor segmentation method and system based on deep learning | |
CN116309806A (en) | CSAI-Grid RCNN-based thyroid ultrasound image region of interest positioning method | |
CN112465754A (en) | 3D medical image segmentation method and device based on layered perception fusion and storage medium | |
CN115457057A (en) | Multi-scale feature fusion gland segmentation method adopting deep supervision strategy | |
CN115578406A (en) | CBCT jaw bone region segmentation method and system based on context fusion mechanism | |
CN112489048B (en) | Automatic optic nerve segmentation method based on depth network | |
CN117649385A (en) | Lung CT image segmentation method based on global and local attention mechanisms | |
CN113256657A (en) | Efficient medical image segmentation method and system, terminal and medium | |
CN112488996A (en) | Inhomogeneous three-dimensional esophageal cancer energy spectrum CT (computed tomography) weak supervision automatic labeling method and system | |
CN116542924A (en) | Prostate focus area detection method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |