CN115100045A - Method and device for converting modality of image data - Google Patents

Method and device for converting modality of image data

Info

Publication number
CN115100045A
CN115100045A · Application CN202210573549.4A
Authority
CN
China
Prior art keywords
image
converted
main body
network
image data
Prior art date
Legal status
Pending
Application number
CN202210573549.4A
Other languages
Chinese (zh)
Inventor
蒋雪
徐晨阳
魏军
田孟秋
Current Assignee
Perception Vision Medical Technology Co ltd
Original Assignee
Perception Vision Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Perception Vision Medical Technology Co ltd filed Critical Perception Vision Medical Technology Co ltd
Priority to CN202210573549.4A
Publication of CN115100045A

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00 - Image enhancement or restoration
          • G06T 7/00 - Image analysis
            • G06T 7/0002 - Inspection of images, e.g. flaw detection
              • G06T 7/0012 - Biomedical image inspection
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method and a device for modality conversion of image data, relating to the technical field of medical image processing. The modality conversion method of image data comprises the following steps: acquiring an image to be converted; extracting a main body region contained in the image to be converted and performing gray-level transformation on the main body region to obtain a main body image corresponding to the main body region; inputting the main body image into a pre-trained image conversion model and outputting, through the image conversion model, a converted image corresponding to the main body image; and restoring the converted image into the image to be converted to obtain a target image corresponding to the image to be converted. With the method and device for modality conversion of image data, the converted image obtained through the generator network is of better quality with clearer texture; moreover, the modality conversion process requires no manual intervention, runs fast, and has a wide application range.

Description

Method and device for converting modality of image data
Technical Field
The present invention relates to the technical field of medical image processing, and in particular, to a method and an apparatus for modality conversion of image data.
Background
Radiation therapy is one of the most widely used cancer treatments. It mainly uses radiation of different forms of energy (X-rays, particles, etc.) to damage and destroy tumor tissue while trying to avoid irradiating normal tissue. During the radiation treatment procedure, the physician uses the linear accelerator to take CBCT (Cone-Beam Computed Tomography) images of the treatment site; these eliminate positioning errors during treatment and place the patient and the CT (Computed Tomography) images used during radiation treatment planning in the same coordinate system.
Because CBCT images captured by a conventional linear accelerator suffer from an insufficient imaging range, large artifacts and poor image resolution, their use is largely limited to eliminating positioning errors. Image quality is therefore generally improved by converting the CBCT image into a CT image, so as to open up more possibilities for subsequent radiotherapy research.
However, current image conversion of CBCT images produces images with unclear texture and poor realism, making it difficult to meet practical requirements.
Disclosure of Invention
Accordingly, the present invention is directed to a method and an apparatus for modality conversion of image data to alleviate the above-mentioned technical problems.
In a first aspect, an embodiment of the present invention provides a method for modality conversion of image data, including: acquiring an image to be converted, wherein the image to be converted is a CBCT image; extracting a main body region contained in the image to be converted, and performing gray-level transformation on the main body region to obtain a main body image corresponding to the main body region; inputting the main body image into a pre-trained image conversion model, and outputting, through the image conversion model, a converted image corresponding to the main body image, wherein the converted image is a CT image; the image conversion model is the generator network of a pre-trained generative adversarial network model and is used to perform modality conversion on the input image to be converted, the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected to the corresponding down-sampling layer through a skip connection; and restoring the converted image into the image to be converted to obtain a target image corresponding to the image to be converted, wherein the target image comprises the converted image.
Preferably, in a possible implementation, the step of extracting a main body region contained in the image to be converted and performing gray-level transformation on the main body region to obtain a main body image corresponding to the main body region includes: extracting a gray histogram of each layer of image contained in the CBCT image; and truncating the main body region contained in the gray histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain the main body image corresponding to the main body region.
Preferably, in a possible implementation, the step of restoring the converted image into the image to be converted includes: performing the reverse operation on the converted image based on the extraction rule of the gray histogram and the preset truncation rule, so as to restore the converted image to the original size matching the image to be converted.
Preferably, in a possible embodiment, the method further comprises: acquiring a preset training set, wherein the training set comprises at least one pair of image data, each pair of image data comprises first image data and second image data, the first image data comprises a positioning CT image, and the second image data is the CBCT image corresponding to the first image data; inputting the training set into a pre-constructed generative adversarial network model, and performing adversarial training on the generator network and the discriminator network included in the generative adversarial network model respectively, so as to update the parameters of the generator network and the discriminator network; and storing the updated parameters of the generator network and the discriminator network, and storing the generator network as the image conversion model for performing modality conversion on the input image to be converted.
Preferably, in a possible implementation, the loss function of the generator network comprises a weighted combination of an L1 loss function, a style loss function and a content loss function, and is used to perform style migration on the image to be converted.
Preferably, in a possible implementation, the step of inputting the training set into a pre-constructed generative adversarial network model and performing adversarial training on the generator network and the discriminator network included in the generative adversarial network model respectively includes: repeatedly executing the following steps until the generator network and the discriminator network satisfy a preset training condition, wherein the preset training condition comprises that the number of training iterations reaches a preset number, or that the loss value of the generator network is smaller than a preset loss threshold while the loss value of the discriminator network stays within a preset range: freezing the parameters of the generator network, and inputting the second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data; inputting the pseudo image and the first image data corresponding to the second image data into the discriminator network for discrimination, and calculating the loss value of the discriminator network; back-propagating the loss value of the discriminator network into the discriminator network to update the parameters of the discriminator network; freezing the parameters of the discriminator network, and inputting the second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data; and calculating the loss value of the generator network and back-propagating it into the generator network to update the parameters of the generator network.
Preferably, in one possible embodiment, the above L1 loss function is calculated as

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|$$

wherein n is the number of input image data, $y_i$ is the first image data in the training set, and $f(x_i)$ is the pseudo image corresponding to the second image data. The style loss function is calculated as

$$L_{style} = \left\|\Gamma(\mathrm{ct\_style}) - \Gamma(\mathrm{sct\_style})\right\|_2^2$$

wherein the $\Gamma$ (Gram) matrix is the matrix of vector inner products, ct_style denotes the style features of the positioning CT image included in the first image data, and sct_style denotes the style features of the pseudo image corresponding to the second image data, obtained after the second image data is input to the generator network. The content loss function is calculated as

$$L_{content} = \left\|\mathrm{cbct\_feat} - \mathrm{sct\_feat}\right\|_2^2$$

wherein cbct_feat denotes the content features of the CBCT image included in the second image data, and sct_feat denotes the content features of the pseudo image corresponding to the second image data, obtained after the second image data is input to the generator network.
Preferably, in a possible embodiment, the method further comprises: acquiring a pre-acquired image data set, wherein the image data set comprises a plurality of data pairs, and each data pair comprises a positioning CT image and the CBCT image corresponding to the positioning CT image; performing deformation registration on the data pairs, and preprocessing the registered data pairs; and performing data augmentation on the preprocessed data pairs according to a preset augmentation scheme to generate a training data set, wherein the training data set comprises the training set.
Preferably, in a possible implementation, the step of preprocessing the registered data pairs includes performing the following processing respectively on the positioning CT image included in each data pair and the CBCT image corresponding to it: extracting the gray histogram of each layer of image contained in the positioning CT image and in the CBCT image respectively; truncating the main body region contained in the gray histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain a main body image corresponding to the main body region; and storing the main body images as the preprocessed data pair corresponding to the positioning CT image and the CBCT image.
In a second aspect, an embodiment of the present invention further provides a device for modality conversion of image data, the device including: an acquisition module for acquiring an image to be converted, wherein the image to be converted is a CBCT image; a transformation module for extracting a main body region contained in the image to be converted and performing gray-level transformation on the main body region to obtain a main body image corresponding to the main body region; a conversion module for inputting the main body image into a pre-trained image conversion model and outputting, through the image conversion model, a converted image corresponding to the main body image, wherein the converted image is a CT image; the image conversion model is the generator network of a pre-trained generative adversarial network model and is used to perform modality conversion on the input image to be converted, the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected to the corresponding down-sampling layer through a skip connection; and a restoration module for restoring the converted image into the image to be converted to obtain a target image corresponding to the image to be converted, wherein the target image comprises the converted image.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method in the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the method in the first aspect.
The embodiment of the invention brings the following beneficial effects:
the method and the device for modality conversion of the image data provided by the embodiment of the invention can acquire the image to be converted; extracting a main body area contained in the image to be converted, and performing gray level conversion on the main body area to obtain a main body image corresponding to the main body area; then inputting the main body image into a pre-trained image conversion model, and outputting a conversion image corresponding to the main body image through the image conversion model; and then the converted image is restored to the image to be converted to obtain a target image corresponding to the image to be converted, wherein the target image comprises the converted image, the image conversion model used is a generator network in a pre-trained generation confrontation network model for performing mode conversion on the input image to be converted, and the generation confrontation network model is a deep learning network and needs to be trained by a large amount of data, so that the quality of the converted image obtained through the generator network is relatively better, the texture is clearer, meanwhile, no manual participation is needed in the mode conversion process, the operation speed is high, and the application range is wider.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a method for modality conversion of image data according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of an image transformation model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a generator network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a training process for generating a confrontation network model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a mode conversion apparatus for image data according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of medical image processing, CBCT images are often converted into CT images to improve image quality and provide more possibilities for radiotherapy research. Artificial intelligence has made great breakthroughs in medical image processing, and UNet-based neural networks achieve good results in image segmentation, classification and image generation. However, when UNet alone is used for image generation, the quality of the generated images still differs from that of real images. Consequently, some networks generate images in an adversarial manner: a generator produces the images while a discriminator classifies the generated results against real images, further improving the detail of the generated images.
At present, CBCT (Cone-Beam Computed Tomography) modality conversion techniques based on cycle-consistent generative adversarial networks (CycleGAN) are widely applied to the modality conversion of images. This technique trains the network on unpaired images, with the training loss consisting of a generator loss and a discriminator loss; owing to the semi-supervised form of this loss, the network is difficult to converge and the generation results are poor. In addition, in a traditional UNet-based generator, the deconvolution operation in the up-sampling process may lose some features, so that black-hole regions appear in the generated results, and relying solely on batch normalization makes the training results unstable.
In addition, in the prior art, the modality conversion of CBCT images mainly falls into two categories: methods based on conventional algorithms and methods based on deep learning. Conventional-algorithm methods mainly use statistical information to map the gray values of different tissues to the corresponding gray values of a CT image; such methods adapt poorly and the quality of the generated images is not realistic, since they perform only a gray-value mapping and cannot guarantee the authenticity of the texture structure in the image. The CycleGAN-based CBCT modality conversion technique belongs to the deep learning category, but it still suffers from unclear texture and poor fidelity, so that the generated image can hardly reproduce both the gray-level distribution of the CT image and the texture structure of the input CBCT image; it is therefore difficult to meet practical requirements.
Accordingly, embodiments of the present invention provide a method and an apparatus for modality conversion of image data to alleviate the above technical problems.
To facilitate understanding of the present embodiment, the modality conversion method of image data disclosed in the embodiment of the present invention is first described in detail.
In a possible implementation manner, an embodiment of the present invention provides a method for modality conversion of image data, and fig. 1 illustrates a flowchart of a method for modality conversion of image data, where the method includes the following steps:
step S102, acquiring an image to be converted;
step S104, extracting a main body area contained in the image to be converted, and performing gray level transformation on the main body area to obtain a main body image corresponding to the main body area;
step S106, inputting the main body image into a pre-trained image conversion model, and outputting a conversion image corresponding to the main body image through the image conversion model;
in the embodiment of the invention, the image conversion model is a generator network in a pre-trained generation confrontation network model and is used for carrying out mode conversion on an input image to be converted, the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected with the down-sampling layer in a sliding manner;
further, in the embodiment of the present invention, the image to be converted obtained in the step S102 is a CBCT image, and the converted image obtained in the step S106 is a CT image; therefore, the CBCT image can be converted into the CT image through the above steps in the embodiment of the present invention, thereby realizing the mode conversion of the image data. The image conversion model in the embodiment of the invention uses the generator network in the generation confrontation network model, belongs to the deep learning network, and therefore, before use, the quality of image generation is ensured through training of a large amount of data, so that the texture is clearer.
Step S108, restoring the converted image to an image to be converted to obtain a target image corresponding to the image to be converted.
In step S108, a restoration process is adopted to restore the main body region to the original image to be converted, so as to obtain a complete target image after modality conversion.
The modality conversion method of image data provided by the embodiment of the invention can acquire an image to be converted; extract the main body region contained in the image to be converted and perform gray-level transformation on it to obtain a corresponding main body image; input the main body image into a pre-trained image conversion model and output, through the image conversion model, the converted image corresponding to the main body image; and then restore the converted image into the image to be converted to obtain the target image corresponding to the image to be converted, the target image comprising the converted image. The image conversion model used is the generator network of a pre-trained generative adversarial network model for performing modality conversion on the input image to be converted; since the generative adversarial network model is a deep learning network trained with a large amount of data, the converted image obtained through the generator network is of better quality with clearer texture, while the modality conversion process requires no manual intervention, runs fast, and has a wide application range.
In practical use, the process of step S104 is actually a preprocessing of the image to be converted. Since the image to be converted is a CBCT image, the preprocessing specifically includes the following steps: extracting the gray histogram of each layer of image contained in the CBCT image; truncating the main body region contained in the gray histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain the main body image corresponding to the main body region.
In a specific implementation, a preset algorithm may be used to preprocess the gray values of the CBCT image. For example, the body threshold of the CBCT image is extracted by the Otsu method so as to segment the body region, which serves as the main body region in the embodiment of the present invention, and a corresponding crop is then made according to the body size of each layer of the CBCT image. Next, gray-level transformation of the CBCT image is performed by an individual gray-adjustment method: the gray histogram of each layer of the CBCT image is extracted and truncated within a certain range, for example by taking the median and width of the gray histogram to adjust the gray values of each layer, and finally the gray values are normalized to the range 0-1. The specific gray preprocessing algorithm may be set according to the actual use situation, which is not limited in the embodiment of the present invention; a sketch of this preprocessing is given below.
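As a concrete illustration of the preprocessing just described, the following is a minimal per-slice sketch in Python, assuming numpy and scikit-image as stand-ins (the patent names no libraries); the percentile-based truncation window is a hypothetical substitute for the median-and-width rule.

```python
import numpy as np
from skimage.filters import threshold_otsu

def preprocess_slice(slice_2d, window=(1.0, 99.0)):
    """Segment the body region of one CBCT slice with Otsu's method, crop to
    its bounding box, truncate the gray histogram and normalize to [0, 1]."""
    body_mask = slice_2d > threshold_otsu(slice_2d)   # body vs. background/air
    ys, xs = np.nonzero(body_mask)
    crop_box = (ys.min(), ys.max() + 1, xs.min(), xs.max() + 1)
    cropped = slice_2d[crop_box[0]:crop_box[1], crop_box[2]:crop_box[3]]
    # Truncate the gray histogram to a central range (percentiles here are
    # an assumption; the patent speaks of the histogram's median and width).
    lo, hi = np.percentile(cropped, window)
    clipped = np.clip(cropped, lo, hi)
    normalized = (clipped - lo) / max(hi - lo, 1e-8)  # normalize into [0, 1]
    return normalized, crop_box, (lo, hi)
```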
Further, since the image conversion model performs modality conversion only on the main body region, the converted image must be restored into the image to be converted after it is obtained. Specifically, during restoration, the reverse operation is performed on the converted image based on the gray-histogram extraction rule and the preset truncation rule, restoring the converted image to the original size matching the image to be converted, as sketched below.
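Continuing the sketch above, a hypothetical inverse of preprocess_slice might look as follows; the background fill value is an assumption, since the patent only states that the truncation and extraction rules are reversed.

```python
def restore_slice(converted, crop_box, gray_range, original_shape):
    """Undo the 0-1 normalization and paste the converted body region back
    into a full-size image matching the original image to be converted."""
    lo, hi = gray_range
    denormalized = converted * (hi - lo) + lo                # undo normalization
    restored = np.full(original_shape, lo, dtype=denormalized.dtype)  # assumed background fill
    y0, y1, x0, x1 = crop_box
    restored[y0:y1, x0:x1] = denormalized                    # paste body region back
    return restored
```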
Therefore, the modality conversion method of image data provided by the embodiment of the invention is a CBCT image modality conversion method based on a generative adversarial network model, which markedly improves the image quality of the generated CT images. Moreover, because the modality conversion is performed with a deep learning method, the running speed is high: compared with traditional image processing methods, the speed is improved by a factor of 3 to 5.
Further, in order to ensure that the CT image obtained after modality conversion in the embodiment of the present invention has clearer texture and higher realism, and that the generated CT image reproduces both the gray-level distribution of CT images and the texture structure of the input CBCT image, the image conversion model in the embodiment of the present invention further combines the content loss and the style loss from style migration, learning the tissue structure of the CBCT image and the gray range of the CT image simultaneously. To obtain this image conversion model, the embodiment of the present invention further includes a training method for the image conversion model. Specifically, fig. 2 shows a flowchart of the training method of the image conversion model, which includes the following steps:
step S202, acquiring a preset training set;
the training set comprises at least one pair of image data, each pair of image data comprises first image data and second image data, the first image data comprises a positioning CT image, and the second image data is a CBCT image corresponding to the first image data;
step S204, inputting the training set into a pre-constructed generative adversarial network model, and performing adversarial training on the generator network and the discriminator network included in the generative adversarial network model respectively, to update the parameters of the generator network and the discriminator network;
and step S206, storing the updated parameters of the generator network and the discriminator network, and storing the generator network as the image conversion model for performing modality conversion on the input image to be converted.
In practical use, in order to make the data in the training set better fit clinical scenarios and clinical data distributions, the data may be collected during the data preparation stage from datasets of patients undergoing radiotherapy in hospitals, and a series of data preparation and data registration steps is then performed to generate the training set. The training method of the image conversion model described in the embodiment of the present invention therefore further includes a data preparation process, which may specifically proceed as follows:
(1) acquiring a pre-acquired image data set;
the image data set in the embodiment of the invention comprises a plurality of data pairs, each comprising a positioning CT image and the CBCT image corresponding to it; for example, the positioning CT image acquired for positioning and radiotherapy planning during radiotherapy, together with the CBCT image scanned when the patient first receives radiotherapy, may form such a data pair.
(2) performing deformation registration on the data pairs, and preprocessing the registered data pairs;
(3) performing data augmentation on the preprocessed data pairs according to a preset augmentation scheme to generate a training data set, wherein the training data set comprises the training set.
Specifically, when the registered data pairs are preprocessed in (2), the following processing needs to be performed on the positioning CT image included in each data pair and on the CBCT image corresponding to it: extracting the gray histogram of each layer of image contained in the positioning CT image and in the CBCT image respectively; truncating the main body region contained in the gray histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain a main body image corresponding to the main body region; and storing the main body images as the preprocessed data pair corresponding to the positioning CT image and the CBCT image.
In a specific implementation, when each data pair includes a positioning CT image and a CBCT image, deformation registration is usually performed first. Methods of deformation registration include, but are not limited to, non-rigid registration based on feature extraction, non-rigid registration based on gray levels, non-rigid registration based on transform domains, and non-rigid image registration based on mixed features, gray levels and transform domains; the specific method may be chosen according to the actual use situation, which is not limited in the embodiment of the present invention. A sketch of one such registration follows.
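For illustration only, the following is a minimal deformable (B-spline) registration sketch using SimpleITK, one common choice for this task; the file names, grid size, metric and optimizer settings are all assumptions, since the patent does not prescribe a specific registration method.

```python
import SimpleITK as sitk

# Fixed image: the positioning CT; moving image: the corresponding CBCT.
fixed = sitk.ReadImage("planning_ct.nii.gz", sitk.sitkFloat32)   # hypothetical paths
moving = sitk.ReadImage("cbct.nii.gz", sitk.sitkFloat32)

# Initialize a B-spline free-form deformation over a coarse control grid.
transform = sitk.BSplineTransformInitializer(fixed, [8, 8, 8])

registration = sitk.ImageRegistrationMethod()
registration.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                                  numberOfIterations=100)
registration.SetInitialTransform(transform, inPlace=True)
registration.SetInterpolator(sitk.sitkLinear)

final_transform = registration.Execute(fixed, moving)

# Resample the CBCT into the coordinate frame of the positioning CT.
registered_cbct = sitk.Resample(moving, fixed, final_transform,
                                sitk.sitkLinear, 0.0, moving.GetPixelID())
```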
For the data pairs after deformation registration, after the preprocessing process, a training data set can be obtained, and the training data set can be divided into a part of data as a training set to train the image conversion model required in the embodiment of the invention.
For the above preprocessing process, the CBCT image preprocessing process is exemplified, and mainly includes the following processes:
(a) extracting the gray threshold of the CBCT image so as to segment the body region, and cropping correspondingly according to the body size of each layer of the image;
specifically, the step may use the Otsu method to extract the gray threshold of the CBCT image, and may be specifically set according to the actual use situation.
(b) performing gray-level transformation of the CBCT image by an individual gray-adjustment method, the specific steps being: extracting the gray histogram of each layer of the CBCT image, truncating it within a certain range (for example, taking the median and width of the gray histogram to adjust the gray values of each layer), and finally normalizing the gray values to the range 0-1.
The same preprocessing method as described above is applied to both the CBCT image and the positioning CT image included in each data pair.
Further, after the preprocessing, data augmentation may be performed to increase the data diversity of the training data set. Specific data augmentation methods include, but are not limited to, adding noise, rotation, flipping, scale transformation and the like, and may be set according to the actual use situation, which is not limited in the embodiment of the present invention; a sketch is given below.
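The following is a minimal paired-augmentation sketch under stated assumptions: the transform set (flip, 90-degree rotation, additive noise) is a subset of the options listed above, and the noise level and the choice to perturb only the CBCT input are illustrative.

```python
import numpy as np

def augment_pair(ct, cbct, rng):
    """Apply the same random geometric augmentation to a registered
    (positioning CT, CBCT) pair, plus noise on the CBCT input only."""
    if rng.random() < 0.5:                                   # random flip
        ct, cbct = np.flip(ct, axis=-1), np.flip(cbct, axis=-1)
    k = int(rng.integers(0, 4))                              # random 90-degree rotation
    ct = np.rot90(ct, k, axes=(-2, -1))
    cbct = np.rot90(cbct, k, axes=(-2, -1))
    cbct = cbct + rng.normal(0.0, 0.01, cbct.shape)          # additive Gaussian noise (assumed sigma)
    return ct.copy(), np.clip(cbct, 0.0, 1.0)

# Usage: rng = np.random.default_rng(0); ct_aug, cbct_aug = augment_pair(ct, cbct, rng)
```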
Further, the training set obtained with the above preprocessing may be fed into the constructed generative adversarial network model and used to execute the method shown in fig. 2 of the embodiment of the present invention, so as to obtain the image conversion model of the embodiment of the present invention.
Specifically, in the embodiment of the present invention, the generator network includes a preset number of down-sampling layers and up-sampling layers, each up-sampling layer being connected to the corresponding down-sampling layer through a skip connection; and the loss function of the generator network comprises the L1 loss function, the style loss function and the content loss function, used to perform style migration on the image to be converted.
For ease of understanding, fig. 3 shows a schematic structural diagram of the generator network. The generator network is composed of several down-sampling layers and several up-sampling layers; fig. 3 shows only a few convolution layers by way of example, and the specific number of layers may vary with the task, which is not limited in the embodiment of the present invention. Each up-sampling layer is connected to its corresponding down-sampling convolution layer by a skip connection (Skip Connection in fig. 3), the overall structure of the generator network being as illustrated in fig. 3. Compared with a traditional down-sampling layer, the down-sampling layer in the embodiment of the invention consists of several ConvBlocks, each specifically comprising a convolution layer, an activation layer and a pixel normalization layer. In addition, in the embodiment of the invention the traditional batch normalization layer is replaced by a pixel normalization layer, which accelerates network convergence and preserves the individual characteristics of each input image instance.
Further, in fig. 3, the up-sampling layer used in the embodiment of the present invention is composed of an Upsample layer and a ConvBlock layer. Compared with the deconvolution layer in the original up-sampling layer, the Upsample layer avoids the defect of interpolating with zero elements during image generation, improving the quality of the generated images, as sketched below.
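As a concrete reading of this description, the following is a minimal PyTorch sketch of the generator's building blocks: ConvBlock = convolution + activation + pixel normalization, an Upsample-based decoder, and skip connections. The depth, channel widths, activation choice and sigmoid output are assumptions; the patent leaves them task-dependent.

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    """Normalize each pixel's feature vector instead of batch statistics."""
    def forward(self, x, eps=1e-8):
        return x / torch.sqrt(x.pow(2).mean(dim=1, keepdim=True) + eps)

def conv_block(cin, cout):
    # ConvBlock: convolution layer + activation layer + pixel normalization layer.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.LeakyReLU(0.2), PixelNorm())

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(1, 32), conv_block(32, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        # Upsample layer instead of deconvolution, to avoid zero-filled interpolation.
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec2 = conv_block(128 + 64, 64)   # skip connection concatenates encoder features
        self.dec1 = conv_block(64 + 32, 32)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return torch.sigmoid(self.out(d1))     # [0, 1] output matching the normalized inputs
```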
In addition, the discriminator network used in training may directly adopt the discriminator network of the original generative adversarial network model.
Based on the generator network shown in fig. 3, the loss function of the generator network in the embodiment of the present invention is a weighted combination of the L1 loss function, the style loss function and the content loss function. The L1 loss function, also written L1_loss, is the least absolute error and mainly evaluates the absolute error between the true value and the predicted value; specifically, the L1 loss function is calculated as

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|$$
where n is the number of input image data, $y_i$ is the first image data in the training set, i.e. the preprocessed positioning CT image, and $f(x_i)$ is the pseudo image corresponding to the second image data, namely the pseudo-CT image predicted by the generator network.
The style loss function and the content loss function both come from style migration. In the specific calculation, a VGG-19 model pre-trained on the ImageNet dataset may be used to extract image features: the first image data, the second image data in the training set and the pseudo image predicted by the generator network may each be input into the pre-trained VGG-19 network, and content features and style features extracted from them, denoted respectively cbct_feat (content features of the CBCT image included in the second image data), cbct_style (style features of the CBCT image included in the second image data), ct_feat (content features of the positioning CT image included in the first image data), ct_style (style features of the positioning CT image included in the first image data), sct_feat (content features of the pseudo image predicted by the generator network), and sct_style (style features of the pseudo image predicted by the generator network).
Generally, layers closer to the network input extract the detail information of the image more easily, while deeper layers extract its global information more easily. Therefore, a layer of the VGG-19 network close to the output is selected to provide the content features, and several earlier layers, matching both local and global characteristics, are selected as the style features. For example, in the embodiment of the present invention, features output by shallow layers of the VGG-19 network may be selected as style features, and features output by deep layers as content features.
In order to further preserve the content information of the input CBCT images, but with a style close to CT images, the generator network in the present embodiment uses both a style loss function and a content loss function.
The formula for calculating the style loss function in the embodiment of the invention is as follows:
$$L_{style} = \left\|\Gamma(\mathrm{ct\_style}) - \Gamma(\mathrm{sct\_style})\right\|_2^2$$
the gamma matrix represents a matrix formed by vector inner products, combination among different features is mainly obtained, CT _ style represents style features of a positioning CT image included in first image data, and sct _ style represents style features of a pseudo image corresponding to second image data obtained after the second image data are input into a generator network; the formula mainly evaluates the style difference between the pseudo-image generated by the generator network and the real positioning CT image.
Further, the content loss function is calculated as:
$$L_{content} = \left\|\mathrm{cbct\_feat} - \mathrm{sct\_feat}\right\|_2^2$$
where cbct_feat denotes the content features of the CBCT image included in the second image data, and sct_feat denotes the content features of the pseudo image corresponding to the second image data, obtained after the second image data is input to the generator network. This formula mainly evaluates the content difference between the pseudo image generated by the generator network and the input CBCT image. With the style loss function and the content loss function added, the generator network provided by the embodiment of the invention greatly improves the quality of the generated images, making them closer to the style of CT images while retaining the content of the input image; a sketch of the combined loss follows.
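Purely as an illustration, the following sketch computes the combined generator loss (L1 + style + content) with a frozen, ImageNet-pre-trained VGG-19 from torchvision. The chosen layer indices (roughly relu1_2 through relu4_4), the loss weights and the MSE form of the style and content terms are assumptions consistent with common style-migration practice, not values fixed by the patent.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

_vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)          # VGG-19 parameters stay frozen throughout

def _features(x, layers=(3, 8, 17, 26)):
    """Collect intermediate VGG-19 activations; shallow ones serve as style
    features, the deepest as content features."""
    feats, h = [], x.repeat(1, 3, 1, 1)      # VGG expects 3-channel input
    for i, layer in enumerate(_vgg):
        h = layer(h)
        if i in layers:
            feats.append(h)
    return feats

def gram(f):
    # Gram matrix: inner products of vectorized feature maps across channels.
    b, c, h, w = f.shape
    v = f.view(b, c, h * w)
    return v @ v.transpose(1, 2) / (c * h * w)

def generator_loss(fake_ct, real_ct, cbct, w_style=1.0, w_content=1.0):
    l1 = F.l1_loss(fake_ct, real_ct)                        # L1_loss
    f_fake, f_ct, f_cbct = map(_features, (fake_ct, real_ct, cbct))
    style = sum(F.mse_loss(gram(a), gram(b))                # style vs. real positioning CT
                for a, b in zip(f_fake, f_ct))
    content = F.mse_loss(f_fake[-1], f_cbct[-1])            # content vs. input CBCT
    return l1 + w_style * style + w_content * content
```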
Further, based on the above generator network and loss function, when the training process of step S204 is executed in the embodiment of the present invention, the following steps may be executed repeatedly until the generator network and the discriminator network satisfy a preset training condition, where the preset training condition comprises that the number of training iterations reaches a preset number, or that the loss value of the generator network is smaller than a preset loss threshold while the loss value of the discriminator network stays within a preset range:
(1) freezing parameters of a generator network, and inputting second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data;
(2) inputting the pseudo image and first image data corresponding to the second image data into a discriminator network for judgment, and calculating a loss value of the discriminator network;
(3) back-propagating the loss value of the discriminator network into the discriminator network, so as to update the parameters of the discriminator network;
(4) freezing parameters of the discriminator network, and inputting second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data;
(5) and calculating loss values of the generator network, and back propagating the loss values of the generator network into the generator network to update the parameters of the generator network.
For ease of understanding, fig. 4 shows a training diagram of the generative adversarial network model, depicting the generator network and the discriminator network; the generator network contains up-sampling and down-sampling modules, and the discriminator network likewise contains down-sampling modules. As shown in fig. 4, since the calculation of the style loss function and the content loss function in the embodiment of the present invention uses the pre-trained VGG-19 model, all parameters of the VGG-19 network must be frozen after all network parameters are initialized when training the generative adversarial network model. The training process for each batch is then as follows:
First, all parameters of the generator network are frozen and the CBCT image in the second image data is input into the generator network to obtain its pseudo image; the pseudo image and the real positioning CT image included in the first image data are input simultaneously into the discriminator network for real/fake discrimination, the loss value of the discriminator network is calculated and back-propagated into the discriminator network, and the parameters of the discriminator network are updated. Then all parameters of the discriminator network are frozen, the CBCT image in the second image data is input into the generator network to obtain the predicted pseudo image, the loss value of the generator network at this point is calculated and propagated back into the generator network, and the weight parameters of the generator network are updated; this alternating scheme is sketched below.
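The following is a minimal sketch of one such training batch. The optimizers, the binary cross-entropy adversarial loss and the real/fake label convention are assumptions (the patent does not fix them); generator_loss refers to the combined-loss sketch above, and G and D stand for the generator and discriminator networks.

```python
import torch
import torch.nn.functional as F

def train_batch(G, D, opt_g, opt_d, cbct, ct):
    # --- discriminator step: generator parameters frozen ---
    for p in G.parameters(): p.requires_grad_(False)
    for p in D.parameters(): p.requires_grad_(True)
    fake = G(cbct).detach()                    # pseudo image from the CBCT input
    logits_real, logits_fake = D(ct), D(fake)
    d_loss = (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
              + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))) / 2
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- generator step: discriminator parameters frozen ---
    for p in D.parameters(): p.requires_grad_(False)
    for p in G.parameters(): p.requires_grad_(True)
    fake = G(cbct)
    logits = D(fake)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    g_loss = adv + generator_loss(fake, ct, cbct)   # L1 + style + content, from the sketch above
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```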
Further, the trained generator network may undergo a verification process. Specifically, a portion of the training data set may be split off as a verification set; after each training epoch of the generative adversarial network model, the CBCT images included in the second image data of the verification set are input into the generator network to obtain predicted pseudo images, and a loss value is calculated between the predicted pseudo images and the real positioning CT images in the first image data. When the loss value is at its minimum (smaller than a preset loss threshold), training may be stopped and the parameters of the generator network and the discriminator network at that point are stored.
In summary, the modality conversion method of image data provided in the embodiments of the present invention combines image generation with style migration, yielding a CBCT image modality conversion method based on a generative adversarial network model and style migration. Compared with other deep learning methods, the generative adversarial network model combined with style migration trains more stably, and the quality and texture detail of the generated images are greatly improved; compared with traditional methods, the modality conversion method of image data provided by the embodiment of the invention is more widely adaptable and runs faster.
Furthermore, in the embodiment of the present invention, the loss function used for the generative adversarial network model adds a style loss function and a content loss function to the common L1 loss function, which accelerates the convergence of the generator network and makes the generated image texture clearer. In addition, compared with the original generative adversarial network model, the model used in the embodiment of the invention replaces the batch normalization layer with a pixel normalization layer and the deconvolution layer with an Upsample layer, forming a new network structure that better extracts image pixel information and more faithfully restores image texture during generation.
Further, on the basis of the above embodiments, an embodiment of the present invention further provides a modality conversion apparatus of image data, and specifically, fig. 5 shows a schematic structural diagram of a modality conversion apparatus of image data, which includes the following structures:
the acquiring module 50 is configured to acquire an image to be converted, where the image to be converted is a CBCT image;
a transformation module 52, configured to extract a main body region included in the image to be transformed, and perform gray-scale transformation on the main body region to obtain a main body image corresponding to the main body region;
a conversion module 54, configured to input the main body image into a pre-trained image conversion model and output, through the image conversion model, the converted image corresponding to the main body image; the image conversion model is the generator network of a pre-trained generative adversarial network model and is used to perform modality conversion on the input image to be converted, the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected to the corresponding down-sampling layer through a skip connection;
the restoring module 56 is configured to restore the converted image to the image to be converted, so as to obtain a target image corresponding to the image to be converted, where the target image includes the converted image.
The modality conversion apparatus of image data provided by the embodiment of the present invention has the same technical features as the modality conversion method of image data provided by the above embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Further, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method shown in fig. 1 or fig. 2.
Embodiments of the present invention further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method shown in fig. 1 or fig. 2.
An embodiment of the present invention further provides a schematic structural diagram of an electronic device, as shown in fig. 6, which is the schematic structural diagram of the electronic device, where the electronic device includes a processor 61 and a memory 60, the memory 60 stores computer-executable instructions that can be executed by the processor 61, and the processor 61 executes the computer-executable instructions to implement the method shown in fig. 1 or fig. 2.
In the embodiment shown in fig. 6, the electronic device further comprises a bus 62 and a communication interface 63, wherein the processor 61, the communication interface 63 and the memory 60 are connected by the bus 62.
The Memory 60 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 63 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 62 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 62 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The processor 61 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 61. The Processor 61 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and the processor 61 reads information in the memory and, in conjunction with hardware thereof, performs the method shown in fig. 1 or fig. 2.
The computer program product of the method for modality conversion of image data provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments. For specific implementation, refer to the method embodiments, which are not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood in specific cases for those skilled in the art.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed by the present invention, still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall all fall within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for modality conversion of image data, comprising:
acquiring an image to be converted, wherein the image to be converted is a CBCT image;
extracting a main body area contained in the image to be converted, and carrying out gray level transformation on the main body area to obtain a main body image corresponding to the main body area;
inputting the main body image into a pre-trained image conversion model, and outputting, through the image conversion model, a converted image corresponding to the main body image; wherein the converted image is a CT image; the image conversion model is the generator network of a pre-trained generative adversarial network model and is used to perform modality conversion on the input image to be converted, the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected to the corresponding down-sampling layer through a skip connection;
and restoring the converted image according to the image to be converted, to obtain a target image corresponding to the image to be converted, wherein the target image comprises the converted image.
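For illustration, a minimal sketch of the claimed conversion pipeline follows, written in PyTorch-style Python. The slice-wise loop, the clipping bounds, and the function name are assumptions made for the example rather than part of the claim; the trained generator is assumed to be available as a torch.nn.Module.

```python
import numpy as np
import torch

def convert_modality(cbct_volume: np.ndarray, model: torch.nn.Module,
                     lo: float = -1000.0, hi: float = 2000.0) -> np.ndarray:
    """CBCT volume -> synthetic CT volume, converted slice by slice."""
    out = np.empty_like(cbct_volume, dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for i, sl in enumerate(cbct_volume):
            body = np.clip(sl, lo, hi)                      # truncate to the assumed body gray range
            norm = (body - lo) / (hi - lo)                  # normalize to [0, 1]
            x = torch.from_numpy(norm).float()[None, None]  # shape (1, 1, H, W)
            y = model(x)[0, 0].numpy()                      # generator output, same size
            out[i] = y * (hi - lo) + lo                     # restore the original gray range
    return out
```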
2. The method according to claim 1, wherein the step of extracting a main body region included in the image to be converted and performing gray-scale transformation on the main body region to obtain a main body image corresponding to the main body region comprises:
extracting a gray-level histogram of each slice contained in the CBCT image;
and truncating the main body region contained in the gray-level histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain a main body image corresponding to the main body region.
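Claim 2 leaves the preset truncation rule unspecified; one plausible instantiation is percentile-based clipping of the per-slice histogram, sketched below (the 99.5% retention fraction and the function name are hypothetical choices, not taken from the patent):

```python
import numpy as np

def extract_body_image(slice_2d: np.ndarray, keep: float = 0.995) -> np.ndarray:
    """Clip a slice to its dominant gray range via the histogram, then normalize to [0, 1]."""
    hist, edges = np.histogram(slice_2d, bins=256)
    cdf = np.cumsum(hist) / hist.sum()                 # cumulative gray-level distribution
    lo = edges[np.searchsorted(cdf, 1.0 - keep)]       # lower cut of the main body range
    hi = edges[np.searchsorted(cdf, keep)]             # upper cut of the main body range
    body = np.clip(slice_2d, lo, hi)
    return (body - lo) / max(hi - lo, 1e-6)            # normalized main body image
```

The inverse operation of claim 3 would then undo this mapping: multiply by (hi - lo), add back lo, and resize to the original geometry.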
3. The method of claim 2, wherein the step of restoring the converted image to the image to be converted comprises:
and performing an inverse operation on the converted image based on the extraction rule of the gray-level histogram and the preset truncation rule, so as to restore the converted image to the original size matching the image to be converted.
4. The method of claim 1, further comprising:
acquiring a preset training set, wherein the training set comprises at least one pair of image data, each pair of image data comprises first image data and second image data, the first image data comprises a positioning CT image, and the second image data is a CBCT image corresponding to the first image data;
inputting the training set into a pre-constructed generative adversarial network model, and respectively performing adversarial training on the generator network and the discriminator network included in the generative adversarial network model, so as to update parameters of the generator network and the discriminator network;
and storing the updated parameters of the generator network and the discriminator network, and storing the generator network as an image conversion model for carrying out modal conversion on the input image to be converted.
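A minimal sketch of how the paired training set of claim 4 might be organized for training, assuming per-slice numpy arrays that have already been registered and preprocessed (the class and argument names are hypothetical):

```python
import torch

class PairedCTDataset(torch.utils.data.Dataset):
    """Paired (positioning CT, CBCT) slices forming the claimed training set."""
    def __init__(self, ct_slices, cbct_slices):  # lists of (H, W) float32 numpy arrays
        assert len(ct_slices) == len(cbct_slices)
        self.ct, self.cbct = ct_slices, cbct_slices

    def __len__(self):
        return len(self.ct)

    def __getitem__(self, i):
        ct = torch.from_numpy(self.ct[i])[None]      # first image data, shape (1, H, W)
        cbct = torch.from_numpy(self.cbct[i])[None]  # second image data, the generator input
        return cbct, ct
```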
5. The method of claim 4, wherein the loss function of the generator network comprises an L1 loss function, a style loss function, and a content loss function for performing style transfer on the image to be converted.
6. The method of claim 5, wherein the step of inputting the training set into a pre-constructed generative adversarial network model and respectively performing adversarial training on the generator network and the discriminator network included in the generative adversarial network model comprises:
continuously executing the following steps until the generator network and the discriminator network meet a preset training condition, wherein the preset training condition comprises that the number of training iterations reaches a preset number, or that the loss value of the generator network is smaller than a preset loss threshold and the loss value of the discriminator network remains within a preset range:
freezing parameters of the generator network, and inputting second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data;
inputting the pseudo image and the first image data corresponding to the second image data into the discriminator network for discrimination, and calculating a loss value of the discriminator network;
back-propagating the loss value of the discriminator network into the discriminator network to update parameters of the discriminator network; and
freezing parameters of the discriminator network, and inputting second image data in the training set into the generator network to obtain a pseudo image corresponding to the second image data;
calculating a loss value of the generator network, and back-propagating the loss value of the generator network into the generator network to update parameters of the generator network.
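The alternating update of claim 6 can be sketched as follows. The binary cross-entropy adversarial loss is an assumption for the example (the claims fix only the generator-side L1/style/content terms), and gen, disc, and the two optimizers are assumed to be defined elsewhere:

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, cbct, ct):
    """One alternating update, mirroring the freeze/train order of claim 6."""
    # Step 1: freeze the generator and update the discriminator.
    for p in gen.parameters():
        p.requires_grad_(False)
    fake = gen(cbct).detach()                 # pseudo image from the second image data
    d_real, d_fake = disc(ct), disc(fake)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    for p in gen.parameters():
        p.requires_grad_(True)

    # Step 2: freeze the discriminator and update the generator.
    for p in disc.parameters():
        p.requires_grad_(False)
    fake = gen(cbct)
    d_fake = disc(fake)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + F.l1_loss(fake, ct))          # L1 term of claim 5; style/content terms omitted
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    for p in disc.parameters():
        p.requires_grad_(True)
    return loss_g.item(), loss_d.item()
```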
7. The method of claim 6, wherein the L1 loss function is calculated as
$$L_{1} = \frac{1}{n}\sum_{i=1}^{n}\left| y_{i} - f(x_{i}) \right|$$
wherein n is the number of input image data, y_i is the first image data in the training set, and f(x_i) is the pseudo image corresponding to the second image data x_i;
the style loss function is calculated as
$$L_{\mathrm{style}} = \left\| \mathrm{Gram}(CT\_style) - \mathrm{Gram}(sct\_style) \right\|_{2}^{2}$$
wherein the Gram matrix denotes the matrix formed by the pairwise vector inner products of the feature maps, CT_style denotes the style features of the positioning CT image included in the first image data, and sct_style denotes the style features of the pseudo image corresponding to the second image data obtained after the second image data is input to the generator network;
and the content loss function is calculated as
$$L_{\mathrm{content}} = \left\| CBCT\_feat - sct\_feat \right\|_{2}^{2}$$
wherein CBCT_feat denotes the content features of the CBCT image included in the second image data, and sct_feat denotes the content features of the pseudo image corresponding to the second image data obtained after the second image data is input to the generator network.
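In code, the three generator-side losses of claims 5 to 7 can be written as below. The mean-squared form of the style and content terms and the Gram normalization are conventional choices assumed for this sketch, since the claim text fixes only the quantities being compared:

```python
import torch

def gram(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map: pairwise inner products of channel vectors."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def l1_loss(fake_ct, real_ct):
    return (fake_ct - real_ct).abs().mean()

def style_loss(ct_style, sct_style):       # style features of CT vs. pseudo CT
    return ((gram(ct_style) - gram(sct_style)) ** 2).mean()

def content_loss(cbct_feat, sct_feat):     # content features of CBCT vs. pseudo CT
    return ((cbct_feat - sct_feat) ** 2).mean()
```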
8. The method of claim 4, further comprising:
acquiring a pre-acquired image data set, wherein the image data set comprises a plurality of data pairs, and each data pair comprises a positioning CT image and a CBCT image corresponding to the positioning CT image;
performing deformable registration on each data pair, and preprocessing the registered data pairs;
and performing data augmentation on the preprocessed data pairs according to a preset augmentation scheme to generate a training data set, wherein the training data set comprises the training set.
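The preset augmentation scheme is not specified in claim 8; a hypothetical sketch that applies the same random rotation and flip to both images of a registered pair might look like this:

```python
import numpy as np

def augment_pair(ct: np.ndarray, cbct: np.ndarray, rng=np.random.default_rng()):
    """Apply identical random geometric transforms to both images of a pair."""
    k = int(rng.integers(0, 4))                              # number of 90-degree rotations
    ct, cbct = np.rot90(ct, k, (-2, -1)), np.rot90(cbct, k, (-2, -1))
    if rng.random() < 0.5:                                   # random horizontal flip
        ct, cbct = np.flip(ct, -1), np.flip(cbct, -1)
    return ct.copy(), cbct.copy()                            # copies: make memory contiguous
```

Because the pairs are registered, any augmentation must be applied identically to both images, or the pixel-wise L1 supervision of claim 5 would break.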
9. The method of claim 8, wherein the step of preprocessing the registered data pairs comprises:
performing the following processing, respectively, on the positioning CT image included in the data pair and on the CBCT image corresponding to the positioning CT image:
extracting a gray-level histogram of each slice contained in the positioning CT image and in the CBCT image, respectively;
truncating the main body region contained in the gray-level histogram according to a preset truncation rule, and normalizing the gray values of the truncated main body region to obtain a main body image corresponding to the main body region;
and storing the main body image as a preprocessed data pair corresponding to the positioning CT image and the CBCT image.
10. A modality conversion apparatus of image data, characterized in that the apparatus comprises:
the acquisition module is used for acquiring an image to be converted, wherein the image to be converted is a CBCT image;
the transformation module is used for extracting a main body region contained in the image to be converted, and performing gray-scale transformation on the main body region to obtain a main body image corresponding to the main body region;
the conversion module is used for inputting the main body image into a pre-trained image conversion model, and outputting a converted image corresponding to the main body image through the image conversion model; wherein the converted image is a CT image; the image conversion model is the generator network of a pre-trained generative adversarial network model and is used for performing modality conversion on the input image to be converted; the generator network comprises a preset number of down-sampling layers and up-sampling layers, and each up-sampling layer is connected to the corresponding down-sampling layer by a skip connection;
and the restoring module is used for restoring the converted image according to the image to be converted, to obtain a target image corresponding to the image to be converted, wherein the target image comprises the converted image.
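Claims 1 and 10 both describe the generator as paired down-sampling and up-sampling layers joined by skip connections. A two-level sketch of such a generator follows; the channel widths, layer count, and class name are illustrative assumptions, not the patent's exact architecture:

```python
import torch

class SkipGenerator(torch.nn.Module):
    """Two-level encoder-decoder with a skip connection at each resolution."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.down1 = torch.nn.Sequential(torch.nn.Conv2d(1, ch, 3, 2, 1), torch.nn.ReLU())
        self.down2 = torch.nn.Sequential(torch.nn.Conv2d(ch, ch * 2, 3, 2, 1), torch.nn.ReLU())
        self.up1 = torch.nn.Sequential(torch.nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), torch.nn.ReLU())
        self.up2 = torch.nn.ConvTranspose2d(ch * 2, 1, 4, 2, 1)  # ch * 2 after skip concat

    def forward(self, x):                 # x: (N, 1, H, W), H and W divisible by 4
        d1 = self.down1(x)                # (N, ch, H/2, W/2)
        d2 = self.down2(d1)               # (N, 2ch, H/4, W/4)
        u1 = self.up1(d2)                 # (N, ch, H/2, W/2)
        u1 = torch.cat([u1, d1], dim=1)   # skip connection from the matching down layer
        return self.up2(u1)               # (N, 1, H, W)
```

For example, SkipGenerator()(torch.randn(1, 1, 64, 64)) returns a (1, 1, 64, 64) tensor, so the output slice has the same size as the input slice, as the claims require.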
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of claims 1-9 when executing the computer program.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN202210573549.4A 2022-05-24 2022-05-24 Method and device for converting modality of image data Pending CN115100045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210573549.4A CN115100045A (en) 2022-05-24 2022-05-24 Method and device for converting modality of image data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210573549.4A CN115100045A (en) 2022-05-24 2022-05-24 Method and device for converting modality of image data

Publications (1)

Publication Number Publication Date
CN115100045A true CN115100045A (en) 2022-09-23

Family

ID=83289524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210573549.4A Pending CN115100045A (en) 2022-05-24 2022-05-24 Method and device for converting modality of image data

Country Status (1)

Country Link
CN (1) CN115100045A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115914488A (en) * 2023-02-22 2023-04-04 江西财经大学 Medical image identity confusion sharing method and system, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
Zhang et al. Improving CBCT quality to CT level using deep learning with generative adversarial network
CN109146988B (en) Incomplete projection CT image reconstruction method based on VAEGAN
US20210374961A1 (en) Training a cnn with pseudo ground truth for ct artifact reduction
CN112424835B (en) System and method for image reconstruction
CN110728729B (en) Attention mechanism-based unsupervised CT projection domain data recovery method
JP2024056701A (en) Classification and 3D modeling of 3D dentofacial structures using deep learning methods
CN112348936B (en) Low-dose cone-beam CT image reconstruction method based on deep learning
US20200211186A1 (en) Systems and methods for generating image metric
CN112598649B (en) 2D/3D spine CT non-rigid registration method based on generation of countermeasure network
CN111325695B (en) Low-dose image enhancement method and system based on multi-dose grade and storage medium
US20220092787A1 (en) Systems and methods for processing x-ray images
CN108038840B (en) Image processing method and device, image processing equipment and storage medium
CN110599530B (en) MVCT image texture enhancement method based on double regular constraints
CN115100045A (en) Method and device for converting modality of image data
CN116309806A (en) CSAI-Grid RCNN-based thyroid ultrasound image region of interest positioning method
WO2022246677A1 (en) Method for reconstructing enhanced ct image
US20230177746A1 (en) Machine learning image reconstruction
EP3608872B1 (en) Image segmentation method and system
US11455755B2 (en) Methods and apparatus for neural network based image reconstruction
CN116894783A (en) Metal artifact removal method for countermeasure generation network model based on time-varying constraint
WO2023051344A1 (en) Ultra-high resolution ct reconstruction using gradient guidance
CN116563402A (en) Cross-modal MRI-CT image synthesis method, system, equipment and medium
Alam et al. Generalizable cone beam ct esophagus segmentation using in silico data augmentation
CN117115046B (en) Method, system and device for enhancing sparse sampling image of radiotherapy CBCT
US20240119567A1 (en) Image enhancement using texture matching generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination