CN117745746A - Image segmentation method and device based on deformable nnUNet - Google Patents

Image segmentation method and device based on deformable nnUNet


Publication number
CN117745746A
CN117745746A (application CN202410183518.7A)
Authority
CN
China
Prior art keywords
deformable
image
convolution
feature
nnUNet
Prior art date
Legal status
Granted
Application number
CN202410183518.7A
Other languages
Chinese (zh)
Other versions
CN117745746B (en)
Inventor
赵晶鑫
句福娇
吴奕初
王雅
李原
Current Assignee
Beijing University of Technology
Fourth Medical Center General Hospital of Chinese PLA
Original Assignee
Beijing University of Technology
Fourth Medical Center General Hospital of Chinese PLA
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology, Fourth Medical Center General Hospital of Chinese PLA filed Critical Beijing University of Technology
Priority to CN202410183518.7A priority Critical patent/CN117745746B/en
Priority claimed from CN202410183518.7A external-priority patent/CN117745746B/en
Publication of CN117745746A publication Critical patent/CN117745746A/en
Application granted granted Critical
Publication of CN117745746B publication Critical patent/CN117745746B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing, and in particular discloses an image segmentation method and device based on a deformable nnUNet. A 3D pelvic image is acquired, and one or more target regions in it are identified with a deformable nnUNet comprising a deformable-convolution-based feature encoder, a feature fusion module, and a feature decoder connected in sequence. Reconstruction parameters are calculated from the 3D pelvic image; they include at least the radius of a disk region and are derived from the area of the largest connected component of the image. A segmented image of the one or more target regions is then obtained based on the reconstruction parameters. The accuracy of the segmentation and the quality of the image reconstruction are both improved.

Description

Image segmentation method and device based on deformable nnUNet
Technical Field
The invention relates to the field of image processing, in particular to an image segmentation method and device based on deformable nnUNet.
Background
Pelvic segmentation is an essential step in the clinical diagnosis and surgical planning of pelvic bone disease, and processing CT images is one of its fundamental tasks. CT images preserve the actual anatomy, including depth information, and give the surgeon detailed information about the lesion site; they are therefore often used for three-dimensional reconstruction in subsequent surgical planning and postoperative evaluation. Accurate segmentation of the pelvic bones from pelvic CT images is thus critical for assessing the severity of pelvic injury and for helping the surgeon make the correct judgment and select the appropriate surgical procedure.
The main task in segmenting the pelvic bones from a pelvic CT image is to obtain segmented representations of four important structures of the pelvis: the lumbar spine, the sacrum, the left hip and the right hip. Prior-art pelvic bone segmentation methods are either manual or semi-automatic, and their accuracy is limited when dealing with changes in image appearance caused by multi-site domain shifts, contrast-enhanced vessels, fecaliths and chyme, fractures, low-dose acquisition, metal artifacts, and the like. With the development of deep learning in image segmentation, classification and feature learning, segmenting and classifying pelvic CT images with computer vision and feature recognition and fusion techniques has become an effective solution.
In recent years, the wide and successful application of UNet to medical images has spawned many variants. nnUNet is one variant with very good performance: it focuses on a rationally designed data-processing pipeline and exploits the characteristics of the data set to train the model, outperforming other UNet variants that only adjust the architecture, and it has been adopted in a wide variety of medical image processing tasks. However, when nnUNet is used to segment pelvic CT images in the medical field, 3D pelvic CT images must be acquired to train the deep learning model, and training, validation and testing of the pelvic CT segmentation model require clearly annotated 3D CT images so as to improve the model's fit and its generalization to new samples. Pelvic CT images, however, contain noise such as fractures, blood vessels and metal artifacts, which reduces the accuracy of CT image segmentation.
Therefore, in segmenting and reconstructing 3D pelvic CT, the large size of pelvic data, the possible presence of metal and other noise, unclear boundaries, and the long-standing lack of an effectively annotated training data set have left pelvic segmentation and reconstruction understudied: models learn features inaccurately, and segmentation and reconstruction results are poor.
Disclosure of Invention
The invention aims to provide an image segmentation method and device based on a deformable nnUNet, in order to solve the technical problem of poor image segmentation performance described in the Background section.
The first aspect of the present invention provides an image segmentation method based on deformable nnUNet, which specifically includes:
acquiring a 3D pelvic image, identifying one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable convolution-based feature encoder, a feature fusion module, and a feature decoder connected in sequence;
calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disc region, the reconstruction parameters being calculated based on an area of a largest connected component of the 3D pelvic image;
a segmented image of the one or more target regions is obtained based on the reconstruction parameters.
Preferably, the deformable convolution-based feature encoder comprises:
convolution kernels of a plurality of convolution layers adjusted by offsets, the kernel offsets of the individual convolution layers not being exactly the same,
the plurality of offset convolution layers connected in a residual manner to form a convolution module,
the convolution module connected with an interpolation layer to form the deformable-convolution feature encoder,
the deformable convolution-based feature encoder being used to identify irregular pelvic medical features in the 3D pelvic image and to obtain an encoded 3D pelvic image feature map.
Preferably, the adjusting the convolution kernels of the plurality of convolution layers specifically includes:
y(e0) = Σ_{en ∈ L} w(en) · x(e0 + en)      (1)
y(e0) = Σ_{en ∈ L~} w(en) · x(e0 + en + Δen)      (2)
Preferably, the interpolation layer adjusts the output of the convolution module according to the following formulas (3), (4):
x(p) = Σ_{qj} B(p, qj) · x(qj)      (3)
B(p, qj) = max(0, 1 − |px − qjx|) · max(0, 1 − |py − qjy|)      (4)
Preferably, the feature fusion module consists of a Ladder-ASPP and specifically comprises:
an upper global pooling layer and a lower fusion layer, wherein the upper global pooling layer comprises at least a global pooling module whose output is mixed with the input image of the feature fusion module;
the lower fusion layer comprises at least a plurality of parallel atrous (hole) convolution layers with variable sampling rates and fusion layers; the levels of the parallel atrous convolution layers differ, and the input of each comprises at least the input image of the feature fusion module and the output of the preceding parallel atrous convolution layer.
Preferably, the output of the upper global pooling layer is passed through a 1×1 convolution and a Sigmoid and then mixed with the input image of the feature fusion module, the mixing being a weighted feature-channel multiplication.
Preferably, the upper path pooling layer is calculated in the following manner (formula 5):
y = B(C1(GPS(x))) ⊗ x      (5).
Preferably, the lower fusion layer is calculated according to the following formula (6):
x_K = G_{K,D}(x @ x_{K-1})      (6).
Preferably, the method further comprises training the deformable nnUNet before identifying one or more target regions in the 3D pelvic image based on the deformable nnUNet;
training the deformable nnUNet specifically comprises:
establishing a loss function and training the deformable nnUNet with back propagation;
the loss function being formed as the sum of a cross-entropy loss function and a Dice loss function.
A second aspect of the present invention provides an image segmentation apparatus based on a deformable nnUNet, comprising:
a segmentation module for acquiring a 3D pelvic image and identifying one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable convolution-based feature encoder, a feature fusion module and a feature decoder connected in sequence;
a calculation module for calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disk region and being calculated based on the area of the largest connected component of the 3D pelvic image;
and a post-processing module for obtaining segmented images of the one or more target regions based on the reconstruction parameters.
The invention provides an image segmentation method and device based on a deformable nnUNet. A deformable-convolution feature encoder extracts the features of the pelvic image and, compared with ordinary convolution, improves the effectiveness and accuracy with which the encoder recognizes and extracts features of an irregular pelvic image. Meanwhile, the reconstruction parameters are calculated adaptively from the image itself, and artifacts and hole regions are removed accurately, so that the morphological reconstruction matches the actual condition of the image. This improves the reconstruction from feature information back to the image, and thereby the accuracy and overall quality of the segmentation.
Drawings
Fig. 1 is a flowchart of an image segmentation method based on deformable nnUNet according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a deformable nnUNet model according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a deformable convolution-based feature encoder according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a feature fusion module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a segmented image according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image segmentation apparatus based on a deformable nnUNet according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of overall architecture of a model of an image segmentation method based on deformable nnUNet according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown.
The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The overall model framework of the method is as follows: the method proposes a deformable encoder-decoder network and applies it to pelvic segmentation. As can be seen from fig. 7, the model is an enhanced nnUNet consisting of four parts: an encoder, an intermediate processing module, a decoder and post-processing. Stage 1 is a low-resolution nnUNet, and stage 2 is a cascaded high- and low-resolution nnUNet. In both stages the encoder and decoder have 4 layers, each built from a deformable convolution with a residual structure. Each layer halves the length, width and height of its input and doubles the number of channels.
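The halve-spatial/double-channels rule above fixes the shape at every encoder level. A minimal sketch of that progression (not the patented implementation: the 4-layer depth follows the text, but the base channel width of 32 is an illustrative assumption):

```python
# Sketch: shapes produced by a 4-layer encoder that halves each spatial
# dimension and doubles the channel count, starting from a 128x128x128 volume.
def encoder_shapes(depth=4, spatial=(128, 128, 128), base_channels=32):
    """Return the (channels, D, H, W) shape after each encoder layer."""
    d, h, w = spatial
    shapes = []
    c = base_channels
    for _ in range(depth):
        d, h, w = d // 2, h // 2, w // 2   # stride-2 downsampling halves D, H, W
        shapes.append((c, d, h, w))
        c *= 2                              # channels double for the next layer
    return shapes

print(encoder_shapes())
```

With the assumed base width of 32, the deepest feature map is (256, 8, 8, 8), small enough for the fusion module to pool globally.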
An embodiment of the invention provides an image segmentation method based on a deformable nnUNet; as shown in fig. 1, the method comprises the following steps:
acquiring a 3D pelvic image, identifying one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable convolution-based feature encoder, a feature fusion module, and a feature decoder connected in sequence;
calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disc region, the reconstruction parameters being calculated based on an area of a largest connected component of the 3D pelvic image;
a segmented image of the one or more target regions is obtained based on the reconstruction parameters.
According to the image segmentation method based on the deformable nnUNet provided by the invention, the features of the pelvic image are extracted by a deformable-convolution feature encoder which, compared with ordinary convolution, improves the effectiveness and accuracy of identifying and extracting features of an irregular pelvic image. Meanwhile, the reconstruction parameters are calculated adaptively from the image, and artifacts and hole regions are removed accurately, so that the morphological reconstruction matches the actual condition of the image, the reconstruction from feature information to image is improved, and the accuracy and quality of the segmentation are further improved.
Specifically, the 3D pelvic image is a 3D pelvic CT image, and the method further comprises training the deformable nnUNet before identifying one or more target regions in the 3D pelvic image. As shown in fig. 2, the deformable nnUNet comprises a deformable convolution-based feature encoder, a feature fusion module and a feature decoder connected in sequence. The feature encoder encodes the 3D pelvic image during downsampling to extract its features. As shown in fig. 3, the deformable convolution-based feature encoder comprises convolution kernels of a plurality of convolution layers adjusted by offsets; the offset convolution layers are connected in a residual manner to form a convolution module, and the convolution module is connected with an interpolation layer to form the deformable-convolution feature encoder, which identifies irregular pelvic medical features in the 3D pelvic image and produces an encoded 3D pelvic image feature map.
As an alternative embodiment, the convolution kernels of the multiple convolution layers are adjusted using the following formulas (1), (2):
y(e0) = Σ_{en ∈ L} w(en) · x(e0 + en)      (1)
y(e0) = Σ_{en ∈ L~} w(en) · x(e0 + en + Δen)      (2)
where w denotes a weight, e0 the position of a pixel, and en the position of a neighboring pixel falling within the regular sampling grid L; L~ is the deformation result of L, i.e. an irregular grid obtained by adding the offsets Δen to the positions in L.
As an alternative embodiment, the interpolation layer is a bilinear interpolation layer, and it adjusts the output of the convolution module according to the following formulas (3), (4):
x(p) = Σ_{qj} B(p, qj) · x(qj)      (3)
B(p, qj) = max(0, 1 − |px − qjx|) · max(0, 1 − |py − qjy|)      (4)
where p is the (generally fractional) position at an irregular sampling location, qj enumerates the surrounding 4 pixel units participating in the calculation, B is the two-dimensional bilinear interpolation kernel, and wi = B(p, qj) is the corresponding weight.
In the prior art, a standard square convolution block is used for feature extraction, which performs poorly on objects with different scales or with deformations at different positions. Furthermore, residual connections increase the information carried forward: the information of the next layer includes not only the information x of the current layer but also the new information produced by the nonlinear transformation of that layer, so the information gradually increases rather than being lost as the network deepens. In the deformable-convolution encoder of the invention, residual deformable convolution is added in the downsampling process so that the feature encoder identifies irregular pelvic medical features more effectively and accurately: the offset operation converts the convolution layer into a deformable convolution that adapts to the edges of organs or lesions, and residual connections and bilinear interpolation further optimize it, making feature extraction more effective and more accurate.
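The sampling step behind formulas (1)-(4) can be sketched per pixel: a 3×3 kernel reads the input at the offset positions e0 + en + Δen, and non-integer positions are resolved by bilinear interpolation. The offsets and weights below are illustrative, not trained values, and this is not the patented network, only the core deformable-convolution idea:

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinear read x(p) = sum_q B(p, q) * x(q) over the 4 integer neighbors."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                # separable kernel b(a, c) = max(0, 1 - |a - c|) per axis
                val += max(0, 1 - abs(py - qy)) * max(0, 1 - abs(px - qx)) * x[qy, qx]
    return val

def deformable_sample(x, e0, weights, offsets):
    """y(e0) = sum_n w(en) * x(e0 + en + delta_en) over a 3x3 grid L."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (dy, dx), w_n, (oy, ox) in zip(grid, weights, offsets):
        out += w_n * bilinear(x, e0[0] + dy + oy, e0[1] + dx + ox)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
w = [1.0 / 9] * 9               # averaging kernel, for illustration only
zero_off = [(0.0, 0.0)] * 9     # no deformation -> ordinary convolution, formula (1)
half_off = [(0.5, 0.0)] * 9     # fractional offsets exercise B, formulas (3)-(4)
print(deformable_sample(img, (2, 2), w, zero_off))
print(deformable_sample(img, (2, 2), w, half_off))
```

With zero offsets the result reduces to a plain 3×3 convolution; with fractional offsets every sample lands between pixels and the bilinear kernel blends the four neighbors.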
The feature fusion module consists of a Ladder-ASPP and specifically comprises an upper global pooling layer and a lower fusion layer. The upper global pooling layer comprises at least a global pooling module whose output is mixed with the input image of the feature fusion module. The lower fusion layer comprises at least a plurality of parallel atrous convolution layers with variable sampling rates and fusion layers; the levels of these parallel layers differ, and the input of each comprises at least the input image of the feature fusion module and the output of the preceding parallel atrous convolution layer. ASPP is built from atrous convolution (also called hole or dilated convolution) and uses multiple parallel atrous convolution layers with different sampling rates. Illustratively, as shown in fig. 4, an exemplary Ladder-ASPP includes an upper Global pooling layer whose output is passed through a 1×1 convolution and a Sigmoid and then mixed with the input image of the feature fusion module by weighted feature channel multiplication (Weighted feature channel multiplication). The exemplary Ladder-ASPP further includes a plurality of parallel atrous convolution layers with variable sampling rates for feature fusion, e.g. several 3×3 convolution layers whose dilation rates may be 1, 2, 5 and 7, respectively; the features extracted at each sampling rate are further processed in a separate branch and fused via a concat (shown as C in fig. 4) to generate the input of the next branch.
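The branch structure above can be illustrated in one dimension. A 3-tap convolution whose taps are spread apart by a dilation (expansion) rate sees a wider context without extra weights; the rates 1, 2, 5 and 7 follow the text, while the chaining below, where each branch consumes the module input plus the previous branch output, is a simplified 1-D stand-in for the dense concat-and-fuse structure, with illustrative (untrained) taps:

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1-D convolution with dilated (atrous) taps."""
    n, k = len(x), len(kernel)
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(n)])

def ladder_branches(x, rates=(1, 2, 5, 7)):
    kernel = np.array([0.25, 0.5, 0.25])   # illustrative smoothing taps
    outputs, prev = [], x
    for r in rates:
        # each branch fuses the module input with the previous branch output
        fused = 0.5 * (x + prev)
        prev = dilated_conv1d(fused, kernel, r)
        outputs.append(prev)
    return outputs

signal = np.zeros(15)
signal[7] = 1.0                             # unit impulse probes the receptive field
outs = ladder_branches(signal)
for r, o in zip((1, 2, 5, 7), outs):
    print(f"rate {r}: nonzero span = {np.count_nonzero(o)}")
```

Feeding an impulse through the chain shows the receptive field growing with each branch, which is exactly why mixing several dilation rates captures multi-scale context.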
As an alternative embodiment, the upper global pooling layer is calculated according to the following formula (5):
y = B(C1(GPS(x))) ⊗ x      (5)
where x is the feature map obtained by the feature encoder, GPS(x) denotes its global pooling, C1 denotes the 1×1 convolution that weights each feature channel, and B is the normalization of the feature weights.
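A minimal sketch of formula (5): global pooling squeezes each channel to a scalar, a 1×1 convolution (here just a channel-mixing matrix) re-weights those scalars, a Sigmoid normalizes them to (0, 1), and the result rescales the input channels (weighted feature channel multiplication). The weight matrix is an illustrative assumption, not a trained value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_reweight(x, w1x1):
    """x: (C, H, W) feature map; w1x1: (C, C) weight matrix of the 1x1 conv."""
    gap = x.mean(axis=(1, 2))            # GPS(x): global average pooling -> (C,)
    attn = sigmoid(w1x1 @ gap)           # B(C1(GPS(x))): normalized channel weights
    return x * attn[:, None, None]       # channel-wise multiplication with x

x = np.ones((2, 4, 4))
x[1] *= 3.0                              # second channel has larger activations
out = channel_reweight(x, np.eye(2) * 2.0)
print(out[0, 0, 0], out[1, 0, 0])
```

Channels with larger pooled responses receive attention weights closer to 1, so the important features survive the fusion stage, which is the stated purpose of this path.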
The lower fusion layer is calculated according to the following formula (6):
x_K = G_{K,D}(x @ x_{K-1})      (6)
where G_{K,D} is the output of the densely connected pyramid, K is the pyramid level, D is the dilation rate, and @ denotes the concatenation operation; each intermediate x is the concatenation of the module input with the output of the previous branch.
the feature fusion module provided by the invention constructs convolution kernels of different receptive fields through different hole expansion rates, is used for acquiring multi-scale object information, further processes the extracted features of each sampling rate in separate branches, and fuses the features to generate a final result, so that the processing precision of the processing effect of feature fusion is improved. In order to keep the feature information, a global pooling and 1×1 convolution path is further added to realize that important features are not lost, so that the image processing precision of the feature fusion module is improved, and the quality of the fused image is improved.
Further, the feature decoder includes a plurality of decoding modules, and an input of each decoding module includes at least an output of a decoding module of a previous layer and image information input to the feature decoder.
Training the deformable nnUNet specifically comprises: establishing a loss function and training the deformable nnUNet with back propagation, the loss function being the sum of a cross-entropy loss function and a Dice loss function. Whereas the traditional Dice loss alone can fail to back-propagate gradients when learning fine detail, the method introduces the cross-entropy loss to alleviate the vanishing-gradient problem, so that the overall loss L_loss updates the parameters more accurately and the performance of the model is optimized.
Preferably, the loss function is specifically:
L_CE = −Σ_i yi · log(pi)      (7)
L_Dice = 1 − (2 Σ_i yi pi) / (Σ_i yi + Σ_i pi)      (8)
L_loss = L_CE + L_Dice      (9)
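The composite loss can be sketched directly, matching the text's description (the sum of a cross-entropy loss and a Dice loss). A voxel-wise binary form is used here for brevity, as an illustrative assumption; the actual model is multi-class:

```python
import numpy as np

def cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy, clipped for numerical stability."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def dice_loss(p, y, eps=1e-7):
    """Soft Dice loss: 1 - 2|P.Y| / (|P| + |Y|)."""
    inter = np.sum(p * y)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps)

def total_loss(p, y):
    return cross_entropy(p, y) + dice_loss(p, y)   # L_loss = L_CE + L_Dice

y = np.array([1.0, 1.0, 0.0, 0.0])        # ground-truth mask
good = np.array([0.9, 0.8, 0.1, 0.2])     # confident, mostly correct prediction
bad = np.array([0.2, 0.1, 0.9, 0.8])      # confidently wrong prediction
print(total_loss(good, y), total_loss(bad, y))
```

The cross-entropy term keeps the gradient alive on every voxel, while the Dice term directly rewards overlap, which is why the sum behaves better than either term alone.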
After pre-training of the deformable nnUNet is completed, a 3D pelvic image is acquired, and one or more target regions in the 3D pelvic image are identified based on the deformable nnUNet, the one or more target regions comprising at least the four important structures of the pelvis: the lumbar spine, the sacrum, the left hip and the right hip.
Calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disc region, the reconstruction parameters being calculated based on an area of a largest connected component of the 3D pelvic image;
calculating the proportion of the maximum connected component in the 3D pelvic image to the total area of the 3D pelvic image, and calculating the radius of the disc area based on the proportion value:
(10)
where R is the radius of the disk region, R is the area of the largest connected region in the segmentation result, and H and W are the height and width of the input image, respectively.
For the disk region, if the proportion value is larger, a larger disk radius is used; conversely, if the proportion value is smaller, a smaller disk radius is used.
Obtaining segmented images of the one or more target regions based on the reconstruction parameters comprises: constructing a target disk based on the reconstruction parameters, and obtaining segmented images of one or more target regions in the 3D pelvic image based on the target disk and the one or more target regions. As shown in fig. 5, fig. 5 presents a comparative example of the reconstructed medical image after morphological processing. The reconstruction is in fact a morphological reconstruction that produces segmented representations of the lumbar spine, sacrum, left hip and right hip. Here MCR denotes the largest connected region of the anatomy in the predicted result, SDF denotes the signed distance function based on the MCR, and GT denotes the ground-truth segmentation.
Morphological reconstruction filters the binary image while keeping large objects unchanged, and is used to remove artifacts and hole regions and to smooth boundaries. Compared with using a fixed disk size, the method determines the disk size flexibly from the characteristics of the image: a larger disk removes more small regions, whereas with a smaller reconstruction parameter fewer regions are removed and the key features of the image are preserved to the greatest extent, improving the segmentation result.
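The adaptive post-processing idea above can be sketched as follows: measure the area of the largest connected foreground component, take its proportion of the whole image, and scale the morphological disk radius with that proportion. The linear mapping and its constants below are illustrative assumptions; the patent only states that a larger proportion yields a larger radius:

```python
import numpy as np
from collections import deque

def largest_component_area(mask):
    """4-connected component labeling via BFS; returns the largest area."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                area, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                best = max(best, area)
    return best

def disk_radius(mask, r_min=1, r_max=7):
    """Assumed rule: radius grows linearly with the largest-component fraction."""
    frac = largest_component_area(mask) / mask.size
    return round(r_min + (r_max - r_min) * frac)

m = np.zeros((10, 10), dtype=bool)
m[2:8, 2:8] = True        # one 36-pixel block: the anatomy
m[0, 9] = True            # plus an isolated speck: an artifact to be removed
print(largest_component_area(m), disk_radius(m))
```

An opening with a disk of the chosen radius would then erase the speck while leaving the large block intact, which is the stated goal of the adaptive disk.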
The image segmentation based on the deformable nnUNet provided by the invention can be applied to pelvic segmentation. Compared with the prior-art nnUNet, the proposed deformable nnUNet uses deformable convolution with a residual structure to generate the feature maps, and the original image is concatenated with the outputs of different decoder layers to obtain a better feature representation. In addition, unlike a common pyramid pooling module, the deformable nnUNet contains an improved feature fusion module, Ladder-ASPP, which adopts dense connections yet requires only a small memory footprint thanks to depthwise separable convolutions. The discriminability of the picture is further improved in the morphological reconstruction process.
First, features are extracted by downsampling through the feature encoder with residual deformable convolution and fused by the Ladder-ASPP module; the fused features are decoded by a feature decoder with transposed convolution to identify the segmented image; back propagation with the cross-entropy and Dice loss functions optimizes performance, and the output segmentation is refined by morphological post-processing. Finally, three-dimensional CT images of size 128×128×128 are trained on the public pelvic dataset CTPelvic1K to obtain the final model.
The embodiment of the invention also provides an image segmentation device based on the deformable nnUNet, as shown in fig. 6, which comprises:
a segmentation module for acquiring a 3D pelvic image and identifying one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable convolution-based feature encoder, a feature fusion module and a feature decoder connected in sequence;
a calculation module for calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disk region and being calculated based on the area of the largest connected component of the 3D pelvic image;
and a post-processing module for obtaining segmented images of the one or more target regions based on the reconstruction parameters.
It is to be noted that this embodiment is an example of a device corresponding to the first embodiment, and can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and in order to reduce repetition, a detailed description is omitted here. Accordingly, the related art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units that are not so close to solving the technical problem presented by the present invention are not introduced in the present embodiment, but this does not indicate that other units are not present in the present embodiment.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (9)

1. An image segmentation method based on a deformable nnUNet, characterized by comprising the following steps:
acquiring a 3D pelvic image, identifying one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable convolution-based feature encoder, a feature fusion module, and a feature decoder connected in sequence;
calculating reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disc region, the reconstruction parameters being calculated based on an area of a largest connected component of the 3D pelvic image;
obtaining segmented images of the one or more target regions based on the reconstruction parameters;
the deformable convolution-based feature encoder includes:
the offset adjusts the convolution kernels of a plurality of convolution layers, the offset of the convolution kernels of each convolution layer is not exactly the same,
connecting a plurality of offset convolution layers in a residual connection mode to form a convolution module,
the convolution module and the difference layer are connected to form a deformable convolved feature encoder,
the feature encoder based on deformable convolution is used for identifying irregular pelvis medical features in the 3D pelvis image, and an encoded 3D pelvis image feature map is obtained.
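The reconstruction-parameter step above (a disc radius derived from the area of the largest connected component of the segmentation) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the use of `scipy.ndimage` labelling and the equivalent-disc formula r = sqrt(A/π) are assumptions, since the claim does not specify the exact radius formula.

```python
import numpy as np
from scipy import ndimage

def disc_radius_from_mask(mask: np.ndarray) -> float:
    """Keep the largest connected component of a binary segmentation mask
    and derive a disc radius from its area via r = sqrt(A / pi).
    (The exact radius formula used by the patent is not specified.)"""
    labeled, num = ndimage.label(mask)
    if num == 0:
        return 0.0
    # Component areas; label indices start at 1 (0 is background).
    sizes = ndimage.sum(mask, labeled, range(1, num + 1))
    area = float(np.max(sizes))
    return float(np.sqrt(area / np.pi))
```

The same call works unchanged on a 3D volume, since `ndimage.label` handles arbitrary dimensionality; for a true 3D disc region a slice-wise or sphere-equivalent formula might be preferred.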
2. The deformable nnUNet-based image segmentation method of claim 1, wherein adjusting the convolution kernels of the plurality of convolution layers specifically comprises:

y(e0) = Σ_{en ∈ L} w(en) · x(e0 + en)    (1)

y(e0) = Σ_{en ∈ L̃} w(en) · x(e0 + en + Δen)    (2)

where w denotes a weight, x the input feature map, y the output feature map, e0 the position of a pixel, and en the position of an adjacent pixel falling within the regular sampling grid L; L̃ is the deformation result of L, i.e., an irregular grid that, compared with L, contains the offset positions Δen.
3. The deformable nnUNet-based image segmentation method of claim 1, wherein the interpolation layer adjusts the output of the convolution module by the following modes (3) and (4):

x(wi) = Σ_{j} B(wi, qj) · x(qj)    (3)

B(wi, qj) = max(0, 1 − |wix − qjx|) · max(0, 1 − |wiy − qjy|)    (4)

where wi is the irregular sampling position with its corresponding weight, qj enumerates the 4 surrounding pixel units participating in the calculation at the irregular sampling position, and B(wi, qj) is the bilinear interpolation kernel.
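Equations (1)–(4) can be illustrated with a minimal NumPy sketch, assuming a square k×k kernel, zero values outside the image, and per-sample fractional offsets; the function names and array shapes are illustrative, not taken from the patent.

```python
import numpy as np

def deformable_conv_point(x, w, e0, offsets):
    """One output value of a deformable convolution (cf. Eq. (2)).
    x       : 2D feature map
    w       : k x k kernel weights
    e0      : (row, col) centre position
    offsets : k x k x 2 learned fractional offsets (the delta-e_n terms)
    """
    k = w.shape[0]
    r = k // 2
    y = 0.0
    for i in range(k):
        for j in range(k):
            # regular grid position e_n plus learned offset delta-e_n
            pr = e0[0] + (i - r) + offsets[i, j, 0]
            pc = e0[1] + (j - r) + offsets[i, j, 1]
            y += w[i, j] * bilinear(x, pr, pc)
    return y

def bilinear(x, pr, pc):
    """Bilinear interpolation over the 4 surrounding pixels (cf. Eqs. (3)-(4));
    out-of-range samples contribute zero."""
    r0, c0 = int(np.floor(pr)), int(np.floor(pc))
    val = 0.0
    for dr in (0, 1):
        for dc in (0, 1):
            rr, cc = r0 + dr, c0 + dc
            if 0 <= rr < x.shape[0] and 0 <= cc < x.shape[1]:
                wgt = (1 - abs(pr - rr)) * (1 - abs(pc - cc))
                val += wgt * x[rr, cc]
    return val
```

With all offsets at zero the computation reduces to an ordinary convolution, which is a convenient sanity check; a production implementation would instead use a vectorised operator such as `torchvision.ops.deform_conv2d`.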
4. The deformable nnUNet-based image segmentation method of claim 1, wherein the feature fusion module is composed of a Ladder-ASPP and specifically comprises:
an upper global pooling layer and a lower fusion layer, the upper global pooling layer comprising at least a global pooling module, the output of the global pooling module being mixed with the input image of the feature fusion module;
the lower fusion layer comprising at least a plurality of parallel dilated (atrous) convolution layers with variable sampling rates and a fusion layer, the levels of the parallel dilated convolution layers being different, and the input of each parallel dilated convolution layer comprising at least the input image of the feature fusion module and the output of the preceding parallel dilated convolution layer.
5. The deformable nnUNet-based image segmentation method of claim 4, wherein the output of the upper global pooling layer is processed by a 1×1 convolution and a Sigmoid and is then mixed with the input image of the feature fusion module, the mixing being a weighted feature-channel multiplication.
6. The deformable nnUNet-based image segmentation method of claim 4, wherein the upper global pooling layer is calculated as follows:

y = x ⊗ Sigmoid(B(C1(GPS(x))))    (5)

where x is the feature map obtained by the feature encoder, GPS(x) denotes the subsequent global pooling of x, C1 denotes the weight of each feature channel obtained by the 1×1 convolution, B is the normalization of the feature weights, and ⊗ denotes the weighted feature-channel multiplication.
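As a sketch of the computation in equation (5) — global pooling, a 1×1 convolution, normalisation, and a Sigmoid producing per-channel weights that re-scale the input — the following NumPy fragment may help. The (C, C) weight shape and the mean/std normalisation standing in for B are assumptions, as the patent does not fix them.

```python
import numpy as np

def upper_pooling_branch(x, c1_weight):
    """Channel-attention sketch of Eq. (5).
    x         : (C, H, W) feature map from the encoder
    c1_weight : (C, C) weights of the 1x1 convolution (illustrative shape)
    """
    gps = x.mean(axis=(1, 2))                 # GPS(x): global average pooling -> (C,)
    z = c1_weight @ gps                       # 1x1 convolution across channels
    z = (z - z.mean()) / (z.std() + 1e-5)     # B: normalisation of the feature weights
    s = 1.0 / (1.0 + np.exp(-z))              # Sigmoid
    return x * s[:, None, None]               # weighted feature-channel multiplication
```

When every channel carries the same statistics, the normalised weights collapse to zero and the Sigmoid leaves each channel scaled by 0.5, which makes the behaviour easy to verify.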
7. The deformable nnUNet-based image segmentation method of claim 4, wherein the lower fusion layer is calculated as follows:

y = GK,D(x ⊕ x1 ⊕ x2 ⊕ … ⊕ xK)    (6)

where GK,D is the pooled output of the densely connected pyramid, K is the level of the pyramid, D is the dilation rate, ⊕ denotes the cascading (concatenation) operation, and each xk is the output of the k-th pyramid level, xk = Gk,dk(x ⊕ x1 ⊕ … ⊕ x(k−1)).
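The dense connection pattern described for this fusion layer (each dilated branch consuming the cascade of the module input together with all earlier branch outputs) can be sketched in one dimension; the 1-D shapes, per-branch kernel layout, and zero padding here are illustrative assumptions, not the patented configuration.

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """Minimal 1-D dilated convolution with zero padding (odd kernel length)."""
    k = len(w)
    r = (k // 2) * d
    xp = np.pad(x, r)
    return np.array([sum(w[i] * xp[n + i * d] for i in range(k))
                     for n in range(len(x))])

def dense_dilated_pyramid(x, kernels, dilations):
    """Each branch convolves the cascade (channel concatenation) of the module
    input and every earlier branch output, then appends its own output.
    x        : (1, N) single-channel signal
    kernels  : branch l receives an (l, k) weight array, one row per channel
    dilations: one dilation rate per branch
    """
    feats = x
    for w, d in zip(kernels, dilations):
        out = sum(dilated_conv1d(feats[c], w[c], d) for c in range(feats.shape[0]))
        feats = np.vstack([feats, out[None, :]])
    return feats
```

Because later branches see every earlier output, receptive fields compose densely, which is the motivation for the pyramid of growing dilation rates.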
8. The deformable nnUNet-based image segmentation method of claim 1, wherein the deformable nnUNet is trained before the one or more target regions in the 3D pelvic image are identified based on the deformable nnUNet;
the training of the deformable nnUNet specifically comprises:
establishing a loss function and training the deformable nnUNet by back propagation;
the loss function being formed as the sum of a cross-entropy loss function and a Dice loss function.
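The compound loss above can be sketched as follows, assuming a binary (foreground/background) formulation over flattened probability and target arrays; the exact Dice variant and smoothing constant used by the patent are not specified, so the ones below are assumptions.

```python
import numpy as np

def combined_loss(probs, target, eps=1e-6):
    """Sum of a binary cross-entropy term and a soft Dice term.
    probs  : predicted foreground probabilities (flattened)
    target : binary ground-truth mask (flattened)
    """
    probs = np.clip(probs, eps, 1 - eps)
    # cross-entropy loss
    ce = -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))
    # soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|), smoothed by eps
    inter = np.sum(probs * target)
    dice = 1.0 - (2 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)
    return ce + dice
```

A perfect prediction drives both terms toward zero, while the Dice term keeps gradients informative when the foreground occupies only a small fraction of the volume, which is the usual motivation for this combination in medical segmentation.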
9. An image segmentation apparatus based on a deformable nnUNet, characterized in that the apparatus comprises:
a segmentation module, configured to acquire a 3D pelvic image and identify one or more target regions in the 3D pelvic image based on a deformable nnUNet comprising a deformable-convolution-based feature encoder, a feature fusion module, and a feature decoder connected in sequence;
a calculation module, configured to calculate reconstruction parameters based on the 3D pelvic image, the reconstruction parameters including at least a radius of a disc region and being calculated based on the area of the largest connected component of the 3D pelvic image;
and a post-processing module, configured to obtain segmented images of the one or more target regions based on the reconstruction parameters.
CN202410183518.7A 2024-02-19 Image segmentation method and device based on deformable nnUNet Active CN117745746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410183518.7A CN117745746B (en) 2024-02-19 Image segmentation method and device based on deformable nnUNet


Publications (2)

Publication Number Publication Date
CN117745746A true CN117745746A (en) 2024-03-22
CN117745746B CN117745746B (en) 2024-05-31


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657388A (en) * 2021-07-09 2021-11-16 北京科技大学 Image semantic segmentation method fusing image super-resolution reconstruction
CN114343687A (en) * 2022-01-18 2022-04-15 平安科技(深圳)有限公司 Adrenal gland functional tumor typing positioning equipment, device and storage medium
CN114913327A (en) * 2022-05-17 2022-08-16 河海大学常州校区 Lower limb skeleton CT image segmentation algorithm based on improved U-Net
CN114943721A (en) * 2022-06-07 2022-08-26 哈尔滨理工大学 Neck ultrasonic image segmentation method based on improved U-Net network
CN115457198A (en) * 2022-08-30 2022-12-09 先临三维科技股份有限公司 Tooth model generation method and device, electronic equipment and storage medium
CN115661144A (en) * 2022-12-15 2023-01-31 湖南工商大学 Self-adaptive medical image segmentation method based on deformable U-Net
US20230072400A1 (en) * 2021-09-07 2023-03-09 Arizona Board Of Regents On Behalf Of Arizona State University SYSTEMS, METHODS, AND APPARATUSES FOR GENERATING PRE-TRAINED MODELS FOR nnU-Net THROUGH THE USE OF IMPROVED TRANSFER LEARNING TECHNIQUES
CN116758100A (en) * 2023-08-17 2023-09-15 神州医疗科技股份有限公司 3D medical image segmentation system and method
CN116823853A (en) * 2023-06-16 2023-09-29 山东大学 Coronary calcified plaque image segmentation system based on improved UNet network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant