CN116416261A - CT image super-resolution segmentation method assisted by super-resolution reconstruction - Google Patents


Info

Publication number
CN116416261A
Authority
CN
China
Prior art keywords: super-resolution, reconstruction, segmentation, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310682299.2A
Other languages: Chinese (zh)
Other versions: CN116416261B (en)
Inventor
葛荣骏
徐颖
张道强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310682299.2A
Publication of CN116416261A
Application granted
Publication of CN116416261B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/003Reconstruction from projections, e.g. tomography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a CT image super-resolution segmentation method assisted by super-resolution reconstruction, which comprises the following steps: downsampling the original CT image by a factor of 4 using a bicubic algorithm; inputting the resulting low-resolution image I_lr into an encoder and performing super-resolution reconstruction and super-resolution segmentation through two independent decoder branches; extracting multi-scale features using a multi-scale fusion module (MSFB) and passing the intermediate features of the encoder to the decoder; fusing the intermediate features of the encoder and decoder using a dual-channel attention module (DCAB); and optimizing the above model by a loss function.

Description

CT image super-resolution segmentation method assisted by super-resolution reconstruction
Technical Field
The invention belongs to the technical field of medical image processing, relates to the technical field of CT image reconstruction and segmentation, and particularly relates to a CT image super-resolution segmentation method assisted by super-resolution reconstruction.
Background
In clinical diagnosis, it is important to perform super-resolution (SR) segmentation on medical scan images. Existing segmentation techniques aim at segmenting regions of interest, such as vital organs or infected regions, from medical images, thereby obtaining important information about the size, shape and location of the region. However, for some regions where the anatomy is complex, the segmentation mask of the original resolution may not accurately express the segmented region, and thus it is necessary to predict a segmentation mask of high resolution from a CT image of low resolution using the SR segmentation method. However, the low resolution image contains limited detailed information that is insufficient to support the prediction of a precise high resolution segmentation mask. Therefore, we consider predicting a corresponding high resolution CT from a low resolution CT using SR reconstruction techniques, with low-level features restored during reconstruction, such as texture and edges, to aid in predicting the high resolution segmentation mask.
Existing methods that use SR reconstruction to assist SR segmentation fall mainly into two categories. The first treats SR reconstruction as an image preprocessing step, which ignores the correlation and complementarity between the SR reconstruction and SR segmentation tasks: in fact, the detail information from the SR reconstruction process can help the SR segmentation process generate a more accurate segmentation mask, and the abstract semantic information provided by the SR segmentation process can in turn guide the SR reconstruction process to generate texture details that better conform to the real distribution. The second combines the SR reconstruction model and the SR segmentation model in a serial manner, which allows the two processes to interact and adjust to each other, but the interaction remains insufficient; moreover, the serial approach can lead to an accumulation of errors. There is thus still no method that effectively combines SR reconstruction and SR segmentation so that the two processes can interact.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a CT image super-resolution segmentation method assisted by super-resolution reconstruction. The method exploits the complementarity between the SR reconstruction and SR segmentation tasks: detail features such as textures and edges restored during SR reconstruction help the SR segmentation process predict a more accurate segmentation mask, while abstract semantic features extracted during SR segmentation guide the SR reconstruction process to generate texture details that better conform to the real distribution. In addition, considering that the size of the region of interest in CT images varies greatly, the method uses multi-scale large-kernel convolutions to extract multi-scale features, further improving reconstruction and segmentation performance.
The technical scheme is as follows: in order to achieve the above object, the present invention provides a method for CT image super-resolution segmentation with super-resolution reconstruction assistance, comprising the steps of:
S1: downsample the original CT image by a factor of 4 using a bicubic interpolation algorithm to obtain a low-resolution image I_lr, and use the original CT image I_hr and the segmentation label S_hr as the super-resolution reconstruction label and the super-resolution segmentation label, respectively;
S2: input the low-resolution image I_lr from step S1 into the encoder and perform super-resolution reconstruction and super-resolution segmentation through two independent decoder branches;
S3: extract multi-scale features using a multi-scale fusion module (MSFB) and pass the intermediate features generated during encoding in step S2 to the decoder;
S4: fuse the intermediate features of the encoder and decoder from step S2 using a dual-channel attention module (DCAB);
S5: optimize the above model by a loss function.
This scheme fully exploits the complementarity between the SR reconstruction and SR segmentation tasks: detail features such as textures and edges restored during SR reconstruction help the SR segmentation process predict a more accurate segmentation mask, and abstract semantic features extracted during SR segmentation guide the SR reconstruction process to generate texture details that better conform to the real distribution.
Further, in step S2 a common encoder is used to extract features, after which independent decoder branches extract features suited to their respective tasks. Specifically, the intermediate features of the reconstruction-branch decoder contain more low-semantic detail features such as edges and textures, while the intermediate features of the segmentation-branch decoder contain more abstract high-level features. The procedure is as follows: a given input I_lr is first encoded by three serial convolution modules, each comprising two convolution-ReLU-BN layers, with max pooling used for the downsampling layers. I_lr is processed by the first convolution module to obtain the first encoding feature; this feature passes through a downsampling layer and the second convolution module to generate the second encoding feature; and so on, yielding the third encoding feature and the bottleneck feature. The bottleneck feature is then fed into the two decoders, which have identical structures comprising three serial upsampling layers and convolution modules, with bilinear interpolation used for upsampling. Taking the SR segmentation-branch decoder as an example, after the bottleneck feature is input to the decoder, an upsampling layer and a convolution module produce the third segmentation feature; continuing in this way yields the second and first segmentation features, and the SR reconstruction-branch decoder likewise yields the third, second and first reconstruction features. The common encoder exploits the correlation and complementarity between the reconstruction and segmentation tasks to perform a preliminary fusion of the reconstruction and segmentation features, while the independent decoder branches account for the differences between the tasks and avoid adverse interaction between them.
Further, in step S3 the MSFB modules fuse the features from every layer of the encoder, extract multi-scale features and send the result to the decoder, each branch comprising three parallel MSFB modules. Specifically, the first MSFB module of the SR segmentation branch interpolates the first, second and third encoding features to the same size and concatenates them to obtain a concatenated feature. Three parallel large-kernel convolutions with kernel sizes of 3×3, 9×9 and 27×27 extract a first multi-scale segmentation residual, and a feed-forward neural network FFN further adjusts it to obtain a first segmentation residual, which is combined by concatenation with the first segmentation feature and input to the subsequent modules of the segmentation decoder. By analogy, the remaining MSFB modules extract the second and third segmentation residuals and the first, second and third reconstruction residuals from the encoding features and concatenate them with the corresponding decoder intermediate features. These residual features contain multi-scale information, which enables the model to better cope with the large variation in the size of different organ or lesion regions in medical images, and they replenish the information the encoder loses during downsampling.
Further, the MSFB module in step S3 uses large-kernel convolutions, and in this method we decompose each large-kernel convolution into three smaller serial convolutions. A large-kernel convolution is decomposed into three parts: a depthwise convolution (DWconv), a depthwise dilated convolution (DWDconv) and a pointwise convolution (PWconv). To realize the 9×9 large-kernel convolution, the input passes through a 3×3 DWconv, a 5×5 DWDconv and a PWconv in sequence to obtain the first-scale feature; to realize the 27×27 large-kernel convolution, the input passes through a 5×5 DWconv, a 7×7 DWDconv and a PWconv in sequence to obtain the second-scale feature; the 3×3 convolution is not decomposed and directly yields the third-scale feature. The three scale features are concatenated and fused, and the result is sent to the subsequent modules of the MSFB. Large-kernel convolutions give the model a larger receptive field and thereby improve performance, but their drawback is a high computational cost, which hinders deployment of the algorithm; by decomposing each large-kernel convolution into three serial convolutions, our method effectively reduces this overhead.
Further, in step S4 the DCAB module fuses the features of the reconstruction and segmentation branches using a cross-attention mechanism. The segmentation decoder feature, the reconstruction decoder feature and the corresponding encoder feature are each adjusted by a convolution module to obtain a segmentation fusion feature F_s, a reconstruction fusion feature F_r and an input fusion feature F_in. To supplement the segmentation fusion feature with detail information while avoiding a negative influence on the reconstruction feature, F_in and F_s are added to obtain a new segmentation fusion feature F_s'. F_s' and F_r are then separately mapped into a segmentation query feature Q_s, segmentation key feature K_s and segmentation value feature V_s, and a reconstruction query feature Q_r, reconstruction key feature K_r and reconstruction value feature V_r. These features are fused by a cross-attention operation in which each branch attends to the other, and the fusion results pass through a locally enhanced feed-forward network LEFF to obtain a new segmentation fusion feature F_s'' and a new reconstruction fusion feature F_r''. The process can be expressed as:

F_s'' = LEFF( softmax( Q_s · K_r^T / √d ) · V_r )
F_r'' = LEFF( softmax( Q_r · K_s^T / √d ) · V_s )

where d denotes the feature dimension and LEFF denotes the locally enhanced feed-forward network. To guarantee feature stability, F_s'' and F_r'' are added to F_s' and F_r respectively to obtain the final fused features. The module fully exploits the complementarity between the reconstruction and segmentation features: detail features such as edges and textures in the reconstruction features supplement the low-resolution input and help the segmentation branch accurately predict a high-resolution segmentation mask, while abstract semantic features generated during segmentation guide the reconstruction branch to generate more realistic detail features.
Further, in step S5 the first segmentation feature and the first reconstruction feature (the final outputs of the two decoder branches) are each upsampled by PixelShuffle, and a loss function is computed against the corresponding label. The loss function of the SR segmentation task comprises a cross-entropy loss and a Dice loss, and the loss function of the SR reconstruction task comprises an L1 loss; to balance the loss functions of the two tasks, a dynamic adjustment mechanism is used. The loss function is expressed as:

L_rec = L1(Y_rec, I_hr)
L_seg = CE(Y_seg, S_hr) + Dice(Y_seg, S_hr)
L_total = w_rec · L_rec + w_seg · L_seg

where Y_rec and Y_seg denote the final results of applying PixelShuffle upsampling to the reconstruction and segmentation outputs, I_hr and S_hr denote the real super-resolution reconstruction label and super-resolution segmentation label respectively, L1(·) denotes the L1 loss, CE(·) denotes the cross-entropy loss, Dice(·) denotes the Dice loss, and L_rec and L_seg denote the reconstruction loss and segmentation loss respectively. To balance the two tasks, dynamically varying scale factors w_rec and w_seg are computed from L_rec and L_seg and used to weight the two losses, giving the final loss function L_total.
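As a concrete illustration, the following is a minimal PyTorch sketch of this loss under stated assumptions: the Dice term is a standard multi-class soft Dice, and the dynamic scale factors are taken as normalized magnitudes of the opposite task's detached loss. The patent states only that dynamically varying factors are computed from the two losses, so the exact weighting rule here is an assumption.

```python
import torch
import torch.nn.functional as F

def total_loss(rec_pred, seg_logits, hr_image, seg_label):
    # L1 loss for the SR reconstruction branch.
    l_rec = F.l1_loss(rec_pred, hr_image)

    # Cross-entropy plus soft Dice for the SR segmentation branch.
    # seg_logits: (B, K, H, W); seg_label: (B, H, W), dtype long.
    ce = F.cross_entropy(seg_logits, seg_label)
    probs = torch.softmax(seg_logits, dim=1)
    one_hot = F.one_hot(seg_label, probs.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * one_hot).sum(dim=(2, 3))
    dice = 1 - ((2 * inter + 1e-5) /
                (probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3)) + 1e-5)).mean()
    l_seg = ce + dice

    # Assumed dynamic balancing rule: weight each task by the relative
    # magnitude of the other task's (detached) loss.
    denom = l_rec.detach() + l_seg.detach()
    w_rec, w_seg = l_seg.detach() / denom, l_rec.detach() / denom
    return w_rec * l_rec + w_seg * l_seg
```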
The beneficial effects are that: compared with the prior art, the invention has the following advantages:
1. the invention provides a CT image super-resolution segmentation method assisted by super-resolution reconstruction by utilizing complementarity between super-resolution reconstruction and super-resolution segmentation;
2. the invention uses parallel large-kernel convolutions to fuse the features of each encoder layer and extracts multi-scale features from them, so as to better handle the large variation in organ sizes in medical images;
3. the invention provides a DCAB module which utilizes cross-attention operation to effectively fuse SR reconstruction characteristics and SR segmentation characteristics, so that the performances of two tasks can be improved;
4. the present invention uses dynamic weights to balance the reconstruction and segmentation tasks, dynamically adjusting the loss function.
Drawings
FIG. 1 is a schematic flow chart of a CT image super-resolution segmentation model assisted by super-resolution reconstruction;
FIG. 2 is a general frame structure diagram of a CT image super-resolution segmentation model assisted by super-resolution reconstruction provided by the invention;
FIG. 3 is a schematic diagram of a dual channel attention module (DCAB) topology according to the present invention;
FIG. 4 is a graph showing the comparison of the results of super-resolution segmentation;
fig. 5 is a comparative graph of the results of super-resolution reconstruction.
Detailed Description
The present invention is further illustrated by the accompanying drawings and the detailed description below, which are to be understood as merely illustrative of the invention and not limiting of its scope; various equivalent modifications of the invention that occur to those skilled in the art upon reading this disclosure fall within the scope defined by the appended claims.
Examples: the super-resolution segmentation task and the super-resolution reconstruction task are strongly correlated and complementary. The super-resolution reconstruction task gradually restores detail features during reconstruction, supplementing the limited detail information in the input low-resolution CT image and helping the super-resolution segmentation task predict the segmentation mask more accurately; super-resolution segmentation extracts abstract semantic information and guides the reconstruction process to generate texture details that better conform to the real distribution. The method uses two parallel branches to handle the reconstruction and segmentation tasks simultaneously and designs dedicated fusion modules to effectively fuse the intermediate features of the different branches, so that the reconstruction and segmentation tasks promote each other.
The invention comprises the following steps:
S1: downsample the original CT image by a factor of 4 using a bicubic interpolation algorithm to obtain a low-resolution image I_lr, and use the original CT image I_hr and the segmentation label S_hr as the super-resolution reconstruction label and the super-resolution segmentation label, respectively;
The dataset used contains CT images and their corresponding segmentation labels. To meet the requirements of the method, we downsample each CT image by a factor of 4 with the bicubic interpolation algorithm from traditional image processing and take the result as the low-resolution input of the model.
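A minimal sketch of this preprocessing step, assuming PyTorch and a (B, 1, H, W) tensor layout (neither is specified by the patent):

```python
import torch
import torch.nn.functional as F

def make_lr_input(hr_ct: torch.Tensor) -> torch.Tensor:
    # Bicubic x4 downsampling of the HR CT image, as in step S1.
    return F.interpolate(hr_ct, scale_factor=0.25, mode='bicubic',
                         align_corners=False)
```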
S2: the low resolution image in step S1
Figure SMS_97
The encoder is input and super-resolution reconstruction and super-resolution segmentation are performed by two independent decoder branches, respectively.
As shown in fig. 2, the method uses a common encoder to extract features, performs preliminary fusion on SR reconstruction features and SR segmentation features, and then extracts features fitting respective tasks through independent decoder branches. Specifically, the intermediate features of the decoder of the reconstruction branch comprise more detail features of low semantic information such as edges, textures and the like, and the intermediate features of the decoder of the segmentation branch comprise more abstractAdvanced features. For a given input
Figure SMS_100
It is first encoded by three serial convolution modules comprising 2 layers +.>
Figure SMS_105
convolution-Relu activation function-BN normalization layer, downsampling layer is max-pooling downsampling, +.>
Figure SMS_109
Processing by a first convolution module to obtain a first coding feature +.>
Figure SMS_101
The feature generates a second coding feature via the downsampling layer and a second convolution module>
Figure SMS_103
And so on, respectively obtaining the third coding feature +.>
Figure SMS_107
And bottleneck characteristics->
Figure SMS_111
Will get->
Figure SMS_98
And the two decoders are identical in structure and comprise three serial up-sampling layers and a convolution module, and the up-sampling layers use a bilinear interpolation method. Taking the SR split branch decoder as an example, +.>
Figure SMS_102
After input to the decoder, the third segmentation feature is obtained by an upsampling layer and a convolution module>
Figure SMS_106
By doing so, the second segmentation feature +.>
Figure SMS_110
First segmentation feature->
Figure SMS_99
And third reconstruction feature +.>
Figure SMS_104
Second reconstruction feature->
Figure SMS_108
First reconstruction feature->
Figure SMS_112
. The common encoder utilizes the correlation and complementarity between reconstruction and segmentation tasks to perform preliminary fusion of reconstruction and segmentation features. The independent decoder branches take the difference between different tasks into consideration, so that the mutual side effect between the tasks is avoided.
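The shared-encoder/dual-decoder layout can be sketched as follows. This is an illustrative PyTorch skeleton only: the channel widths, the class count, and the omission of the MSFB/DCAB fusion paths and of the final x4 PixelShuffle heads are all assumptions or simplifications, not details stated by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # Two convolution-ReLU-BN layers, as described for each module.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(cout),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(cout))

class SharedEncoderDualDecoder(nn.Module):
    def __init__(self, cin=1, base=32, n_classes=5):
        super().__init__()
        self.enc1 = conv_block(cin, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.bottleneck = conv_block(base * 4, base * 8)
        self.pool = nn.MaxPool2d(2)  # max-pooling downsampling
        # Two structurally identical decoders: three upsample+conv stages each.
        self.dec_seg = nn.ModuleList([conv_block(base * 8, base * 4),
                                      conv_block(base * 4, base * 2),
                                      conv_block(base * 2, base)])
        self.dec_rec = nn.ModuleList([conv_block(base * 8, base * 4),
                                      conv_block(base * 4, base * 2),
                                      conv_block(base * 2, base)])
        self.head_seg = nn.Conv2d(base, n_classes, 1)
        self.head_rec = nn.Conv2d(base, cin, 1)

    def _decode(self, f, stages):
        for block in stages:
            f = F.interpolate(f, scale_factor=2, mode='bilinear', align_corners=False)
            f = block(f)  # third -> second -> first decoder feature
        return f

    def forward(self, x):
        f1 = self.enc1(x)                    # first encoding feature
        f2 = self.enc2(self.pool(f1))        # second encoding feature
        f3 = self.enc3(self.pool(f2))        # third encoding feature
        fb = self.bottleneck(self.pool(f3))  # bottleneck feature
        seg = self.head_seg(self._decode(fb, self.dec_seg))
        rec = self.head_rec(self._decode(fb, self.dec_rec))
        return seg, rec  # LR-resolution outputs, before x4 PixelShuffle
```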
S3: the multi-scale features are extracted using a multi-scale fusion Module (MSFB) and the intermediate features generated during the encoding in step S2 are passed to a decoder.
As shown in fig. 2, the method utilizes the MSFB module to fuse the features in each layer of the encoder, extract the multi-scale features, and send the result to the decoder. The first MFSB module of the SR partition branch, which module is to
Figure SMS_125
,/>
Figure SMS_115
,/>
Figure SMS_119
Interpolation to the same size, and splicing to obtain splicing characteristics->
Figure SMS_126
Three parallel large-kernel convolutions are utilized, the convolution kernels are respectively
Figure SMS_128
Extracting a first multi-scale segmentation residual +.>
Figure SMS_127
Feed-forward neural network FFN pair +.>
Figure SMS_129
Further adjusting to obtain a first split residual +.>
Figure SMS_118
,/>
Figure SMS_123
In a spliced manner with->
Figure SMS_114
Binding to obtain new->
Figure SMS_121
The subsequent modules input to the split decoder and so on, the remaining MSFB modules are according to +.>
Figure SMS_116
Extracting a second segmentation residual
Figure SMS_120
Third partition residual->
Figure SMS_117
First reconstruction residual->
Figure SMS_122
Second reconstructed residual->
Figure SMS_113
Third reconstruction residual
Figure SMS_124
And spliced with corresponding decoder intermediate features. These residual features contain multi-scale information, which enables the model to better cope with the problem of large changes in the size of different organs or lesion areas in the medical image, and supplements the information lost by the encoder during downsampling.
The MSFB module in step S3 uses large-kernel convolutions, which in this method we decompose into three smaller serial convolutions. A large-kernel convolution is decomposed into three parts: a depthwise convolution (DWconv), a depthwise dilated convolution (DWDconv) and a pointwise convolution (PWconv). To realize the 9×9 large-kernel convolution, the input passes through a 3×3 DWconv, a 5×5 DWDconv and a PWconv in sequence to obtain the first-scale feature; to realize the 27×27 large-kernel convolution, the input passes through a 5×5 DWconv, a 7×7 DWDconv and a PWconv in sequence to obtain the second-scale feature; the 3×3 convolution is not decomposed and directly yields the third-scale feature. The three scale features are concatenated and fused, and the result is sent to the subsequent modules of the MSFB. Large-kernel convolutions give the model a larger receptive field and thereby improve performance, but their drawback is a high computational cost, which hinders deployment of the algorithm; by decomposing each large-kernel convolution into three serial convolutions, our method effectively reduces this overhead.
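The decomposition and the surrounding MSFB can be sketched as below. The depthwise and depthwise-dilated kernel sizes follow the text, but the dilation rates, channel widths and the FFN form are assumptions, since the patent does not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedLKConv(nn.Module):
    # Large-kernel convolution decomposed into DWconv -> DWDconv -> PWconv.
    def __init__(self, c, dw_k, dwd_k, dilation):
        super().__init__()
        self.dw = nn.Conv2d(c, c, dw_k, padding=dw_k // 2, groups=c)
        self.dwd = nn.Conv2d(c, c, dwd_k, dilation=dilation,
                             padding=(dwd_k // 2) * dilation, groups=c)
        self.pw = nn.Conv2d(c, c, 1)

    def forward(self, x):
        return self.pw(self.dwd(self.dw(x)))

class MSFB(nn.Module):
    # Fuses the three encoder features and extracts a multi-scale residual.
    def __init__(self, enc_channels, c):
        super().__init__()
        self.proj = nn.Conv2d(sum(enc_channels), c, 1)
        self.branch_3 = nn.Conv2d(c, c, 3, padding=1)           # 3x3, not decomposed
        self.branch_9 = DecomposedLKConv(c, 3, 5, dilation=2)   # approximates 9x9
        self.branch_27 = DecomposedLKConv(c, 5, 7, dilation=3)  # approximates 27x27
        self.ffn = nn.Sequential(nn.Conv2d(3 * c, c, 1), nn.GELU(),
                                 nn.Conv2d(c, c, 1))

    def forward(self, enc_feats, size):
        # Interpolate all encoder features to a common size and concatenate.
        x = torch.cat([F.interpolate(f, size=size, mode='bilinear',
                                     align_corners=False) for f in enc_feats], dim=1)
        x = self.proj(x)
        ms = torch.cat([self.branch_3(x), self.branch_9(x), self.branch_27(x)], dim=1)
        return self.ffn(ms)  # residual, to be concatenated with a decoder feature
```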
S4: the intermediate features of the encoder and decoder in step S2 are fused using a two-channel attention module (DCAB).
As shown in fig. 3, the DCAB module fuses features of the reconstructed and split branches using a cross-attention mechanism.
Figure SMS_156
、/>
Figure SMS_145
、/>
Figure SMS_150
Feature adjustment is carried out through a convolution module respectively to obtain segmentation fusion features +.>
Figure SMS_155
Reconstruction of fusion characteristics->
Figure SMS_159
Input fusion feature->
Figure SMS_157
To supplement the segmentation fusion feature with detailed information, avoiding its negative influence on the reconstructed feature, will +.>
Figure SMS_160
And->
Figure SMS_142
Adding to obtain new segmentation fusion feature->
Figure SMS_148
Figure SMS_146
And->
Figure SMS_149
Is mapped separately into separate query features>
Figure SMS_144
Split key feature->
Figure SMS_153
Segmentation candidate feature->
Figure SMS_154
And rebuild query feature->
Figure SMS_158
Reconstruction of key value features->
Figure SMS_147
Reconstruction candidate feature->
Figure SMS_152
The above features are fused by using a cross-attention operation, and the fusion result is subjected to a local enhancement feed-forward network LEFF to obtain a new segmentation fusion feature +.>
Figure SMS_143
And reconstructing fusion characteristics->
Figure SMS_151
The process can be expressed as:
Figure SMS_161
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_162
representing feature dimensions, LEFF represents parameters of a locally enhanced feed forward network, < + > in order to guarantee feature stability>
Figure SMS_163
And->
Figure SMS_164
Respectively and->
Figure SMS_165
And->
Figure SMS_166
Adding to obtain new->
Figure SMS_167
And->
Figure SMS_168
. The module fully exploits complementarity between the reconstructed and segmented features: the detail features such as edges, textures and the like generated in the reconstructed features can be used as supplements of low-resolution input, so that the segmentation branches can be helped to accurately predict a high-resolution segmentation mask; abstract semantic features generated in the segmentation process can also guide the reconstruction branches to generate more real detail features.
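A token-level sketch of this cross-attention fusion follows. The single-head formulation, the linear projections and the plain MLP standing in for the LEFF are simplifying assumptions; spatial feature maps are assumed to be flattened to (B, N, C) token sequences beforehand.

```python
import torch
import torch.nn as nn

class DCAB(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_qkv_s = nn.Linear(dim, dim * 3)  # segmentation Q, K, V
        self.to_qkv_r = nn.Linear(dim, dim * 3)  # reconstruction Q, K, V
        self.leff_s = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                                    nn.Linear(dim * 2, dim))
        self.leff_r = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                                    nn.Linear(dim * 2, dim))
        self.scale = dim ** -0.5

    def forward(self, f_seg, f_rec):
        # f_seg: segmentation fusion feature (input feature already added);
        # f_rec: reconstruction fusion feature; both (B, N, C).
        q_s, k_s, v_s = self.to_qkv_s(f_seg).chunk(3, dim=-1)
        q_r, k_r, v_r = self.to_qkv_r(f_rec).chunk(3, dim=-1)
        # Each branch queries the other branch's keys and values.
        attn_s = torch.softmax(q_s @ k_r.transpose(-2, -1) * self.scale, dim=-1)
        attn_r = torch.softmax(q_r @ k_s.transpose(-2, -1) * self.scale, dim=-1)
        new_seg = f_seg + self.leff_s(attn_s @ v_r)  # residual add for stability
        new_rec = f_rec + self.leff_r(attn_r @ v_s)
        return new_seg, new_rec
```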
S5: the model described above is optimized by a loss function.
To demonstrate the effectiveness of the present invention, the present invention also provides the following comparative experiments:
Specifically, the invention uses the public SegTHOR dataset, a thoracic multi-organ segmentation dataset containing segmentation labels for 4 organs: heart, aorta, trachea and esophagus. The dataset contains CT scans of 40 patients, of which we randomly selected 28 for the training set, 4 for the validation set and 8 for the test set. Before formal training, we clip the HU values to [-128, 384] and preprocess the data as described in step S1. Training uses the AdamW optimizer with an initial learning rate of 0.001 for a total of 150 epochs.
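A one-function sketch of this intensity preprocessing (the rescaling to [0, 1] after clipping is an assumption; the patent states only the HU clipping range):

```python
import torch

def preprocess_hu(ct: torch.Tensor, lo: float = -128.0, hi: float = 384.0) -> torch.Tensor:
    # Clip HU values to [lo, hi], then rescale to [0, 1].
    ct = ct.clamp(lo, hi)
    return (ct - lo) / (hi - lo)
```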
In order to verify the effectiveness of the method in reconstruction and segmentation, we compare with the segmentation and reconstruction algorithm with the best current effect. On the segmentation task, the experimental results of the method are compared with CPFNet, kiU-Net, unet++, UCTransNet, and the evaluation indexes are universal dice and hd95, and the comparison results are shown in Table 1. It can be seen that the method has a more obvious improvement in segmentation performance compared with other methods; in the reconstruction task, the experimental result of the method is compared with RDN, EDSR, NSLA, the evaluation indexes are psnr and ssim, and the comparison result is shown in table 2.
Table 1. Results of the method compared with other algorithms on the segmentation task (Dice/HD95); bolded data represent the best performance. (Eso: esophagus; Hea: heart; Tra: trachea; Aor: aorta)
Table 2. Results of the method compared with other algorithms on the reconstruction task (PSNR/SSIM); bolded data represent the best performance.
To intuitively demonstrate the effectiveness of the present method, we compare the results of the present method with other methods in visual effect. FIG. 3 shows the segmentation results of the methods, and it can be seen that the method can segment the organs more accurately than other methods; FIG. 4 is a reconstruction of the various methods, which can accurately restore the unclear boundary between the esophagus and the aorta, as shown, thanks to the semantic information provided by the segmentation process.

Claims (6)

1. A CT image super-resolution segmentation method assisted by super-resolution reconstruction, characterized by comprising the following steps:
S1: downsampling the original CT image by a factor of 4 using a bicubic interpolation algorithm to obtain a low-resolution image I_lr, and using the original CT image I_hr and the segmentation label S_hr as the super-resolution reconstruction label and the super-resolution segmentation label, respectively;
S2: inputting the low-resolution image I_lr from step S1 into an encoder and performing super-resolution reconstruction and super-resolution segmentation through two independent decoder branches;
S3: extracting multi-scale features using a multi-scale fusion module MSFB and passing the intermediate features generated during encoding in step S2 to a decoder;
S4: fusing the intermediate features of the encoder and decoder in step S2 using a dual-channel attention module DCAB;
S5: optimizing the above model by a loss function.
2. The super-resolution reconstruction-assisted CT image super-resolution segmentation method according to claim 1, characterized in that: in step S2, features are extracted by a common encoder, the SR reconstruction features and SR segmentation features are preliminarily fused, and features matching the respective tasks are then extracted by independent decoder branches, as follows: a given input I_lr is first encoded by three serial convolution modules, each comprising two convolution-ReLU-BN layers, with max-pooling downsampling layers; I_lr is processed by the first convolution module to obtain the first encoding feature, which passes through a downsampling layer and the second convolution module to generate the second encoding feature, and so on, yielding the third encoding feature and the bottleneck feature; the bottleneck feature is fed into the two decoders, which have identical structures comprising three serial upsampling layers and convolution modules, the upsampling layers using bilinear interpolation.
3. The super-resolution reconstruction-assisted CT image super-resolution segmentation method according to claim 1, characterized in that: in step S3, the MSFB modules fuse the features from every layer of the encoder, extract multi-scale features and send the result to the decoder, each branch comprising three parallel MSFB modules; specifically, the first MSFB module of the SR segmentation branch interpolates the first, second and third encoding features to the same size and concatenates them to obtain a concatenated feature; three parallel large-kernel convolutions with kernel sizes of 3×3, 9×9 and 27×27 extract a first multi-scale segmentation residual; a feed-forward neural network FFN further adjusts it to obtain a first segmentation residual, which is combined by concatenation with the first segmentation feature and input to the subsequent modules of the segmentation decoder; by analogy, the remaining MSFB modules extract the second and third segmentation residuals and the first, second and third reconstruction residuals from the encoding features and concatenate them with the corresponding decoder intermediate features.
4. The super-resolution reconstruction-assisted CT image super-resolution segmentation method according to claim 1, characterized in that: the MSFB module in step S3 uses large-kernel convolutions, each decomposed into three parts: a depthwise convolution DWconv, a depthwise dilated convolution DWDconv and a pointwise convolution PWconv; to realize the 9×9 large-kernel convolution, the input passes through a 3×3 DWconv, a 5×5 DWDconv and a PWconv in sequence to obtain a first-scale feature; to realize the 27×27 large-kernel convolution, the input passes through a 5×5 DWconv, a 7×7 DWDconv and a PWconv in sequence to obtain a second-scale feature; the 3×3 convolution is not decomposed and directly yields a third-scale feature; the three scale features are concatenated and fused, and the result is sent to the subsequent modules of the MSFB.
5. The super-resolution reconstruction-assisted CT image super-resolution segmentation method according to claim 1, characterized in that: in step S4 the DCAB module fuses the features of the reconstruction and segmentation branches using a cross-attention mechanism; the segmentation decoder feature, the reconstruction decoder feature and the corresponding encoder feature are each adjusted by a convolution module to obtain a segmentation fusion feature F_s, a reconstruction fusion feature F_r and an input fusion feature F_in; F_in and F_s are added to obtain a new segmentation fusion feature F_s'; F_s' and F_r are separately mapped into a segmentation query feature Q_s, a segmentation key feature K_s and a segmentation value feature V_s, and a reconstruction query feature Q_r, a reconstruction key feature K_r and a reconstruction value feature V_r; the above features are fused by a cross-attention operation in which each branch attends to the other, and the fusion results pass through a locally enhanced feed-forward network LEFF to obtain a new segmentation fusion feature F_s'' and a new reconstruction fusion feature F_r''; the process can be expressed as:

F_s'' = LEFF( softmax( Q_s · K_r^T / √d ) · V_r )
F_r'' = LEFF( softmax( Q_r · K_s^T / √d ) · V_s )

wherein d denotes the feature dimension and LEFF denotes the locally enhanced feed-forward network; to guarantee feature stability, F_s'' and F_r'' are added to F_s' and F_r respectively to obtain the final fused features.
6. The super-resolution reconstruction-assisted CT image super-resolution segmentation method according to claim 1, characterized in that: in step S5, the final outputs of the reconstruction branch and the segmentation branch are each upsampled by PixelShuffle, and a loss function is computed against the corresponding label; the loss function of the SR segmentation task comprises a cross-entropy loss and a Dice loss, and the loss function of the SR reconstruction task comprises an L1 loss; to balance the loss functions of the two tasks, a dynamic adjustment mechanism is used, and the loss function is expressed as:

L_rec = L1(Y_rec, I_hr)
L_seg = CE(Y_seg, S_hr) + Dice(Y_seg, S_hr)
L_total = w_rec · L_rec + w_seg · L_seg

wherein Y_rec and Y_seg denote the final results of applying PixelShuffle upsampling to the reconstruction and segmentation outputs, I_hr and S_hr denote the real super-resolution reconstruction label and super-resolution segmentation label respectively, L1(·) denotes the L1 loss, CE(·) denotes the cross-entropy loss, Dice(·) denotes the Dice loss, and L_rec and L_seg denote the reconstruction loss and segmentation loss respectively; to balance the two tasks, dynamically varying scale factors w_rec and w_seg are computed from L_rec and L_seg and used to weight the two losses, giving the final loss function L_total.

Priority Applications (1)

Application CN202310682299.2A — priority date 2023-06-09, filing date 2023-06-09 — CT image super-resolution segmentation method assisted by super-resolution reconstruction (granted as CN116416261B).


Publications (2)

Publication Number Publication Date
CN116416261A — 2023-07-11
CN116416261B — 2023-09-12

Family

ID=87049598

Family Applications (1)

CN202310682299.2A — CT image super-resolution segmentation method assisted by super-resolution reconstruction — Active (granted as CN116416261B)

Country Status (1)

Country Link
CN (1) CN116416261B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657388A (en) * 2021-07-09 2021-11-16 北京科技大学 Image semantic segmentation method fusing image super-resolution reconstruction
WO2023098289A1 (en) * 2021-12-01 2023-06-08 浙江大学 Automatic unlabeled pancreas image segmentation system based on adversarial learning
CN114841859A (en) * 2022-04-28 2022-08-02 南京信息工程大学 Single-image super-resolution reconstruction method based on lightweight neural network and Transformer
CN115953494A (en) * 2023-03-09 2023-04-11 南京航空航天大学 Multi-task high-quality CT image reconstruction method based on low dose and super-resolution

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIA Zhaohong et al., "Two-Branch network for brain tumor segmentation using attention mechanism and super-resolution reconstruction", Computers in Biology and Medicine, vol. 157, pp. 1-11.
YANG Liutao et al., "Low-Dose CT Denoising via Sinogram Inner-Structure Transformer", IEEE Transactions on Medical Imaging, vol. 42, no. 4, pp. 910-921.
ZHANG Qian et al., "Collaborative Network for Super-Resolution and Semantic Segmentation of Remote Sensing Images", IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-12.
LIU Wei, "Deep-learning-based super-resolution reconstruction of 3D head MRI" (基于深度学习的三维头部MRI超分辨率重建), China Masters' Theses Full-text Database, Medicine & Health Sciences, no. 2.

Also Published As

Publication number Publication date
CN116416261B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN111145170B (en) Medical image segmentation method based on deep learning
CN115482241A (en) Cross-modal double-branch complementary fusion image segmentation method and device
CN110738660B (en) Vertebra CT image segmentation method and device based on improved U-net
CN116433914A (en) Two-dimensional medical image segmentation method and system
CN112785593A (en) Brain image segmentation method based on deep learning
CN112700460A (en) Image segmentation method and system
CN111260670B (en) Tubular structure segmentation graph fracture repairing method and system of three-dimensional image based on deep learning network
CN112150470A (en) Image segmentation method, image segmentation device, image segmentation medium, and electronic device
CN114219755A (en) Intelligent pulmonary tuberculosis detection method and system based on images and clinical data
CN115526829A (en) Honeycomb lung focus segmentation method and network based on ViT and context feature fusion
CN117058307A (en) Method, system, equipment and storage medium for generating heart three-dimensional nuclear magnetic resonance image
CN110599495B (en) Image segmentation method based on semantic information mining
KR102419270B1 (en) Apparatus and method for segmenting medical image using mlp based architecture
CN117078930A (en) Medical image segmentation method based on boundary sensing and attention mechanism
Wang et al. Automatic consecutive context perceived transformer GAN for serial sectioning image blind inpainting
CN116416261B (en) CT image super-resolution segmentation method assisted by super-resolution reconstruction
CN117292704A (en) Voice-driven gesture action generation method and device based on diffusion model
CN117152173A (en) Coronary artery segmentation method and system based on DUNetR model
CN115100731B (en) Quality evaluation model training method and device, electronic equipment and storage medium
CN115984560A (en) Image segmentation method based on CNN and Transformer
CN116309278A (en) Medical image segmentation model and method based on multi-scale context awareness
Hou et al. Lung nodule segmentation algorithm with SMR-UNet
Li et al. Image analysis and diagnosis of skin diseases-a review
CN116385720A (en) Breast cancer focus ultrasonic image segmentation algorithm
Chen et al. TMTrans: texture mixed transformers for medical image segmentation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant