CN113256500B

CN113256500B - Deep learning neural network model system for multi-modal image synthesis

Info

Publication number: CN113256500B
Application number: CN202110746839.XA
Authority: CN
Inventors: 武王将; 杨瑞杰; 庄洪卿; 王皓
Original assignee: Peking University Third Hospital Peking University Third Clinical Medical College
Current assignee: Peking University Third Hospital Peking University Third Clinical Medical College
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2021-10-01
Anticipated expiration: 2041-07-02
Also published as: CN113256500A

Abstract

The invention relates to a deep learning neural network model system for multi-modal image synthesis, comprising a multi-resolution residual deep neural network formed by combining a residual deep neural network (RDNN) with a multi-resolution optimization strategy. The RDNN includes A convolutional layers, B dropout layers, C batch normalization layers and D long-term residual connections, wherein the convolutional layers extract image features; the dropout layers prevent network overfitting; the batch normalization layers normalize the input of the corresponding convolution kernels; and the long-term residual connections preserve structural information in the input image. Each dropout layer has a convolutional layer on each side and is connected to both adjacent convolutional layers; a convolutional layer is arranged between each dropout layer and each batch normalization layer; the dropout layer, the convolutional layer and the batch normalization layer are connected in sequence; one end of each long-term residual connection is attached between a convolutional layer and a batch normalization layer, and the other end is attached between the convolutional layer and the batch normalization layer of another group.

Description

Deep learning neural network model system for multi-modal image synthesis

Technical Field

The invention relates to the technical field of medical image processing and image-guided treatment, and in particular to a deep learning neural network model system for multi-modal image synthesis.

Background

Adaptive radiotherapy (ART) based on cone-beam CT (CBCT) images can improve the delivery of radiation to tumors while protecting nearby organs at risk. However, in the prior art, CBCT images have low Hounsfield unit (HU) accuracy, low soft-tissue resolution and severe artifacts. Therefore, to implement ART, it is first necessary to generate, from a CBCT image, a synthetic CT (sCT) image with high HU accuracy and structural fidelity.

U-Net and other deep learning networks have been widely used for sCT image generation. Studies show that the HU accuracy of the generated sCT images is markedly improved and that dose calculation can be performed on them. However, these sCT images have low fidelity to the structures present in the CBCT images and exhibit image blur.

One important reason for the low structural fidelity of these sCT images is that, because medical data are difficult to obtain, previous studies did not optimize the network for a specific anatomical site. For example, when training a network to generate head sCT images, some studies trained it with data from the head, abdomen, pelvis and other sites. A second reason is that the networks lack a constraint that improves the structural fidelity of the sCT image.

Disclosure of Invention

The invention aims to provide a deep learning neural network model system for multi-modal image synthesis that solves the technical problem of how to make the generated sCT image not only have high HU accuracy but also have good fidelity to the structural information in the CBCT image.

To overcome the defects of the prior art, the invention provides a deep learning neural network model system for multi-modal image synthesis, comprising a multi-resolution residual deep learning network (multi-resolution RDNN) formed by combining a residual deep learning network (RDNN) with a multi-resolution optimization strategy. The residual deep learning network comprises A convolutional layers, B dropout layers, C batch normalization layers and D long-term residual connections, wherein the convolutional layers extract image features; the dropout layers prevent network overfitting; the batch normalization layers normalize the input of the corresponding convolution kernels, which stabilizes the network training process and improves learning efficiency; and the long-term residual connections preserve the structural information in the input image. Each dropout layer has a convolutional layer on each side and is connected to both adjacent convolutional layers; the convolutional layer feeds the extracted image features into the adjacent dropout layer;

a convolutional layer is arranged between each dropout layer and each batch normalization layer; the dropout layer, the convolutional layer and the batch normalization layer are connected in sequence;

one end of each long-term residual connection is attached between a convolutional layer and a batch normalization layer; the other end is attached between the convolutional layer and the batch normalization layer of another group;

the multi-resolution optimization strategy carries out the optimization sequentially at different resolutions, from low to high; at low resolution, the optimization focuses only on the overall outline of the image and need not be concerned with local details; the detail information in the image is learned at higher resolutions.

Preferably, the residual deep learning network includes 15 convolutional layers.

Each convolutional layer may be written in the form "(k×k) conv, n", where k is the size of the convolution kernels and n is the number of convolution kernels; the activation function used is ELU.

Preferably, the dropout rate in the residual deep learning network is 20%.

Preferably, there are 7 dropout layers.

Preferably, there are 7 batch normalization layers.

Preferably, there are 3 long-term residual connections.

Preferably, the residual deep learning network uses no pooling layers, to avoid loss of structural information in the image.
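The preferred layer counts above are consistent with a simple repeating pattern, which the following minimal sketch enumerates. It assumes seven blocks of convolution, dropout (the shedding layer), convolution, batch normalization, followed by one final output convolution; the final convolution, the function name `build_rdnn_layout` and the pairing of blocks joined by the long-term residual connections are illustrative assumptions, since the text does not specify them.

```python
from collections import Counter

def build_rdnn_layout(num_blocks=7, skip_pairs=((1, 7), (2, 6), (3, 5))):
    """Return (layers, skips) describing one possible RDNN layout.

    Each block is conv -> dropout -> conv -> batch_norm, matching the
    ordering stated in the text. The final output convolution and the
    skip pairing are assumptions, not taken from the patent figures.
    """
    layers = []
    tap_index = {}  # block number -> index of the conv that feeds batch norm
    for block in range(1, num_blocks + 1):
        layers.append(("conv", block))
        layers.append(("dropout", block))    # rate 0.2 in the preferred form
        layers.append(("conv", block))
        tap_index[block] = len(layers) - 1   # point between conv and batch norm
        layers.append(("batch_norm", block))
    layers.append(("conv", "output"))         # assumed final output convolution
    skips = [(tap_index[a], tap_index[b]) for a, b in skip_pairs]
    return layers, skips

layers, skips = build_rdnn_layout()
counts = Counter(kind for kind, _ in layers)
print(dict(counts))
print(skips)
```

Under these assumptions the layout contains 15 convolutional layers, 7 dropout layers and 7 batch normalization layers, and each end of the 3 long-term residual connections sits on a convolution whose output feeds a batch normalization layer.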

Advantageous effects

Compared with the prior art, the invention has the following beneficial effects:

The deep learning neural network model system for multi-modal image synthesis provided by the invention optimizes and tests a specially designed multi-resolution residual deep neural network using images of a specific anatomical site of the patient, and combines the residual deep neural network provided by the invention with a multi-resolution optimization strategy to form the multi-resolution residual deep neural network (see Fig. 2). The network combines the advantages of both, so that the generated sCT image has high HU accuracy and good fidelity to the structural information in the CBCT image.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

Fig. 1 is a schematic structural diagram of a residual deep neural network according to the present invention.

Fig. 2 is a schematic structural diagram of a multi-resolution residual deep neural network according to the present invention.

Detailed Description

The present invention is described in more detail below to facilitate an understanding of the present invention.

As shown in Fig. 1 and Fig. 2, the deep learning neural network model system for multi-modal image synthesis according to the present invention includes a multi-resolution residual deep neural network formed by combining a residual deep neural network (RDNN) with a multi-resolution optimization strategy. The residual deep neural network includes 15 convolutional layers; each convolutional layer may be written as "(k×k) conv, n", where k is the size of the convolution kernels and n is the number of convolution kernels. To avoid network overfitting, 7 dropout layers (dropout rate 20%) are added to the residual deep neural network; 7 batch normalization layers normalize the input of the corresponding convolution kernels, which stabilizes the network training process and improves learning efficiency. To better preserve the structural information in CBCT images, we add three long-term residual connections. Because pooling layers can lead to loss of structural information in the image, we do not use pooling layers in this network. Each dropout layer has a convolutional layer on each side and is connected to both adjacent convolutional layers; the convolutional layer feeds the extracted image features into the adjacent dropout layer; a convolutional layer is arranged between each dropout layer and each batch normalization layer; the dropout layer, the convolutional layer and the batch normalization layer are connected in sequence; one end of each long-term residual connection is attached between a convolutional layer and a batch normalization layer, and the other end is attached between the convolutional layer and the batch normalization layer of another group.
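To make the attachment point of a long-term residual connection concrete, the toy sketch below runs a one-dimensional forward pass in which features are tapped after a convolution and before batch normalization, then re-enter at the same position in a later group. The names `toy_conv` and `forward_with_long_skip`, the scalar weights, and merging by element-wise addition are assumptions made for illustration; dropout is omitted because it acts as the identity at inference time, and the batch normalization here has no learned scale or shift.

```python
import math

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def toy_conv(values, weight):
    """Stand-in for a convolution: scale each value, then apply ELU."""
    return [elu(weight * v) for v in values]

def batch_norm(values, eps=1e-5):
    """Normalize values to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def forward_with_long_skip(x):
    # Early group: the skip is tapped after the conv, before batch norm.
    early = toy_conv(x, weight=0.5)
    tap = early
    h = batch_norm(early)
    # Intermediate layers are omitted; a single toy conv stands in for them.
    h = toy_conv(h, weight=-1.0)
    # Late group: the tapped features re-enter after the conv, before batch norm.
    late = toy_conv(h, weight=2.0)
    merged = [a + b for a, b in zip(late, tap)]  # assumed merge: addition
    return batch_norm(merged)

out = forward_with_long_skip([1.0, 2.0, 3.0, 4.0])
print(out)
```

Because the tapped features bypass the intermediate layers, the late group receives a direct copy of early features, which is the mechanism the text relies on to preserve structural information.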

The multi-resolution optimization process resembles the way the human visual system observes a scene. The strategy carries out the optimization sequentially at different resolutions, from low to high. At low resolution, the optimization can focus only on the overall outline of the image and need not be concerned with local details. The detail information in the image is learned at higher resolutions.
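One way to obtain the same image at several resolutions is an image pyramid, sketched below on a small list-of-lists image. The downsampling method (block averaging) and the factors 4, 2 and 1 are assumptions for illustration; the text does not state how the lower-resolution images are produced. This averaging is applied to the training data only and is unrelated to pooling layers inside the network.

```python
def downsample(image, factor):
    """Average-pool a 2D list-of-lists image by an integer factor."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

def build_pyramid(image, factors=(4, 2, 1)):
    """Return the image at each resolution, ordered from low to high."""
    return [downsample(image, f) for f in factors]

# 8x8 test image whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image)
print([(len(level), len(level[0])) for level in pyramid])
```

The coarsest level keeps only broad intensity trends (the outline), while the last level is the original image with all local detail.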

In this application, the residual deep neural network is first trained with low-resolution images and then fine-tuned with medium-resolution and high-resolution images, so that the network progressively learns the detail information in the images until it synthesizes sCT images with high HU accuracy and high structural fidelity.
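The train-then-fine-tune sequence can be sketched as a loop that carries the same weights through each resolution stage. Everything named here is hypothetical: `train_coarse_to_fine`, the `train_stage` callback and the stage labels are illustrative, and the toy callback merely counts samples so that the example runs without a deep learning framework.

```python
def train_coarse_to_fine(model_state, datasets, train_stage,
                         stages=("low", "medium", "high")):
    """Run one training stage per resolution, carrying the weights forward.

    `train_stage(model_state, data, fine_tune)` is a hypothetical callback
    that trains on `data` and returns the updated state.
    """
    history = []
    for index, name in enumerate(stages):
        fine_tune = index > 0  # first stage trains from scratch, later ones fine-tune
        model_state = train_stage(model_state, datasets[name], fine_tune)
        history.append((name, fine_tune))
    return model_state, history

# Toy stand-in: the "model" is a counter of how many samples it has seen.
def fake_train_stage(state, data, fine_tune):
    return state + len(data)

datasets = {"low": [0] * 3, "medium": [0] * 3, "high": [0] * 3}
final_state, history = train_coarse_to_fine(0, datasets, fake_train_stage)
print(final_state, history)
```

The key design point is that the state returned by one stage is the starting point of the next, so the higher-resolution stages refine what was learned at low resolution rather than starting over.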

The applicant carried out comparison tests between prior-art deep learning neural network model systems for multi-modal image synthesis and the system of the present application. Because the HU accuracy of an sCT image and its fidelity to the structural information in the CBCT image lack precise quantitative indices of the kind available for physical or chemical tests, and depend largely on physicians' subjective judgment, the applicant recruited 50 physicians, each with more than 5 years of clinical oncology experience, to evaluate the sCT and CBCT images of 5 different patients. (In this test the network had 7 dropout layers, 7 batch normalization layers and 3 long-term residual connections.) The results were as follows:

For the HU accuracy of the sCT image of patient 1, 0 physicians judged the sCT image generated by the prior art to have the higher HU accuracy, and 50 physicians judged the sCT image generated by the present application to have the higher HU accuracy.

For the fidelity to the structural information in the CBCT image of patient 1, 0 physicians judged the prior art to have the better fidelity, 49 physicians judged the present application to have the better fidelity, and 1 physician judged the two to be about the same.

For the HU accuracy of the sCT image of patient 2, 0 physicians judged the sCT image generated by the prior art to have the higher HU accuracy, 48 physicians judged the sCT image generated by the present application to have the higher HU accuracy, and 2 physicians judged the two to be about the same.

For the fidelity to the structural information in the CBCT image of patient 2, 0 physicians judged the prior art to have the better fidelity, and 50 physicians judged the present application to have the better fidelity.

For the HU accuracy of the sCT image of patient 3, 0 physicians judged the sCT image generated by the prior art to have the higher HU accuracy, and 50 physicians judged the sCT image generated by the present application to have the higher HU accuracy.

For the fidelity to the structural information in the CBCT image of patient 3, 0 physicians judged the prior art to have the better fidelity, 47 physicians judged the present application to have the better fidelity, and 3 physicians judged the two to be about the same.

For the HU accuracy of the sCT image of patient 4, 0 physicians judged the sCT image generated by the prior art to have the higher HU accuracy, and 50 physicians judged the sCT image generated by the present application to have the higher HU accuracy.

For the fidelity to the structural information in the CBCT image of patient 4, 0 physicians judged the prior art to have the better fidelity, 49 physicians judged the present application to have the better fidelity, and 1 physician judged the two to be about the same.

For the HU accuracy of the sCT image of patient 5, 0 physicians judged the sCT image generated by the prior art to have the higher HU accuracy, and 50 physicians judged the sCT image generated by the present application to have the higher HU accuracy.

For the fidelity to the structural information in the CBCT image of patient 5, 0 physicians judged the prior art to have the better fidelity, 45 physicians judged the present application to have the better fidelity, and 5 physicians judged the two to be about the same.

The above test data show that, compared with the prior art, the sCT images generated by the present application have higher HU accuracy and better fidelity to the structural information in the CBCT images.
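The per-patient counts reported above can be totalled with a short tally. The vote tuples below are transcribed from the reported results; where no "about the same" count is reported, it is taken as 0 because the stated votes already sum to 50. The function name `summarize` is illustrative.

```python
# Votes per patient: (prefers prior art, prefers this application, about the same)
hu_votes = [(0, 50, 0), (0, 48, 2), (0, 50, 0), (0, 50, 0), (0, 50, 0)]
fidelity_votes = [(0, 49, 1), (0, 50, 0), (0, 47, 3), (0, 49, 1), (0, 45, 5)]

def summarize(votes):
    """Return column totals and the share of ratings favoring this application."""
    prior = sum(v[0] for v in votes)
    ours = sum(v[1] for v in votes)
    same = sum(v[2] for v in votes)
    total = prior + ours + same
    return (prior, ours, same), ours / total

hu_totals, hu_share = summarize(hu_votes)
fid_totals, fid_share = summarize(fidelity_votes)
print(hu_totals, fid_totals)
```

Across the 250 ratings per criterion, 248 favor the present application for HU accuracy and 240 favor it for structural fidelity, with the remainder rating the two as about the same.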

The foregoing describes preferred embodiments of the present invention, but is not intended to limit the invention thereto. Modifications and variations of the embodiments disclosed herein may be made by those skilled in the art without departing from the scope and spirit of the invention.

Claims

1. A deep learning neural network model system for multi-modal image synthesis, characterized by comprising a multi-resolution residual deep neural network formed by combining a residual deep neural network with a multi-resolution optimization strategy; the residual deep neural network is first trained with low-resolution images and then fine-tuned with medium-resolution and high-resolution images, so that the network progressively learns the detail information in the images until it synthesizes an sCT image with high HU accuracy and structural fidelity; the residual deep neural network comprises A convolutional layers, B dropout layers, C batch normalization layers and D long-term residual connections; wherein the convolutional layers are used for extracting image features; the dropout layers are used for avoiding network overfitting; the batch normalization layers are used for normalizing the input of the corresponding convolution kernels, so that the network training process is stabilized and the learning efficiency is improved; the long-term residual connections are used for preserving structural information in the input image;

each dropout layer has a convolutional layer on each side and is connected to both adjacent convolutional layers; the convolutional layer feeds the extracted image features into the adjacent dropout layer;

one end of each long-term residual connection is attached between a convolutional layer and a batch normalization layer; the other end of each long-term residual connection is attached between the convolutional layer and the batch normalization layer of another group.

2. The system of claim 1, wherein the residual deep neural network comprises 15 convolutional layers.

3. The system of claim 1, wherein the residual deep neural network has a dropout rate of 20%.

4. The system of claim 1, wherein there are 7 dropout layers.

5. The system of claim 1, wherein there are 7 batch normalization layers.

6. The system of claim 1, wherein there are 3 long-term residual connections.

7. The system of claim 1, wherein no pooling layer is used in the residual deep neural network to avoid loss of structural information in the image.