CN114240753A - Cross-modal medical image synthesis method, system, terminal and storage medium - Google Patents

Cross-modal medical image synthesis method, system, terminal and storage medium

Info

Publication number
CN114240753A
CN114240753A
Authority
CN
China
Prior art keywords
medical image
image
modality medical
real
modality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111551447.4A
Other languages
Chinese (zh)
Inventor
张俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202111551447.4A
Publication of CN114240753A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10104 Positron emission tomography [PET]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10108 Single photon emission computed tomography [SPECT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to a cross-modality medical image synthesis method, system, terminal and storage medium. The method comprises constructing a generative adversarial network model comprising a generator and a discriminator. The generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to that relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair. The discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results. Different loss functions are constructed for the generator and the discriminator respectively, so as to train the generative adversarial network model for image synthesis. The method can generate highly reliable multi-modal data.

Description

Cross-modal medical image synthesis method, system, terminal and storage medium
Technical Field
The present application relates to the field of medical image processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for cross-modality medical image synthesis.
Background
With the development of science and technology, medical images can be acquired in many ways, and images of different modalities have different advantages and disadvantages. For example, Magnetic Resonance Imaging (MRI) exposes the patient to no ionizing radiation, renders soft tissue structures clearly and yields rich diagnostic information, but acquisition takes a long time and is prone to artifacts; Positron Emission Tomography (PET) can support early diagnosis of disease by revealing functional changes in lesion tissue, but is expensive and has low image resolution. Research shows that the morphological or functional abnormalities of the human body caused by disease are often expressed in multiple ways, and the information acquired by a single-modality imaging device cannot fully reflect the complex characteristics of a disease. Moreover, clinically acquiring medical images of different modalities at the same time costs considerable time and money. Therefore, how to use medical images of existing modalities to accurately synthesize images of a desired modality by computer technology has become a research direction in recent years.
Although existing cross-modality synthesis methods achieve good results, the complex spatial structure of medical images means that the synthesized results still cannot represent the edge information of human tissue well, suffering from low signal-to-noise ratio and blurred edges. As a result, the synthesis quality of multi-modal images of a specific subject degrades when paired data are limited.
Therefore, how to improve the synthesis quality of multi-modal images of a specific subject with limited paired data is an urgent problem to be solved.
Disclosure of Invention
In view of the above, it is necessary to provide a cross-modality medical image synthesis method, which includes:
constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the first and second image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
In one possible embodiment, the generator adopts a U-Net network structure, which comprises an encoder and a decoder with symmetrical network structures;
the generating of the composite second modality medical image includes:
outputting a real feature map of the first modality medical image through the feature extraction operation of the encoder multilayer convolution;
and the decoder performs multilayer deconvolution operation on the feature map output by the encoder, performs multiple splicing operations on the generated feature map and the feature map with the same size as the corresponding position of the encoder, and finally outputs a target reconstructed image, namely the synthesized second modality medical image.
In one possible embodiment, the synthesis method further comprises:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
In one possible embodiment, the encoder includes convolution module layers, batch normalization layers and activation layers;
there are seven convolution module layers, of which the second to the fifth are hybrid dilated convolution module layers and the rest are full convolution layers.
In one possible embodiment, the hybrid dilated convolution module layer comprises six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern;
the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 3, 2 and 5, and 4 and 6 are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
In one possible embodiment, the discriminator comprises six convolutional layers, a batch normalization layer and an activation layer; the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 4, and 2 and 6, are connected by residual structures.
In one possible embodiment, the real first modality medical image comprises a CT image or an MRI image, and the synthesized second modality medical image comprises a SPECT image or a PET image.
In one possible embodiment, the loss function MAE of the generator is set to:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
In one possible embodiment, the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
Based on the same inventive concept, the present application further provides a cross-modality medical image synthesis system, comprising:
a model building module configured to build a generative adversarial network model including a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
a model training module configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis system, the model building module constructs a generative adversarial network model comprising a generator and a discriminator; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and the model training module constructs different loss functions for the generator and the discriminator to train the generative adversarial model for image synthesis, making the final result more reliable. In other words, the synthesis system of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
Based on the same inventive concept, the present application further provides a terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, can perform the method of any of the foregoing embodiments.
Since the processor of the above terminal can be used to execute the above cross-modality medical image synthesis method, the beneficial effects produced by the method naturally also apply to the terminal of the present application.
Based on the same inventive concept, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any of the foregoing embodiments.
Since the computer program stored on the above computer-readable storage medium can be used, when executed by a processor, to carry out the above cross-modality medical image synthesis method, the beneficial effects produced by the method naturally also apply to the computer-readable storage medium of the present application.
Drawings
FIG. 1 is a schematic flow chart of a cross-modality medical image synthesis method according to an embodiment;
FIG. 2 is a diagram of a framework for generating a confrontation network model in one embodiment;
FIG. 3 is a model framework diagram of the generator portion of FIG. 2;
FIG. 4 is a schematic diagram of a substructure of structure 212 of FIG. 3;
FIG. 5 is a model frame diagram of the discriminator section of FIG. 2;
FIG. 6 is a block diagram of a cross-modality medical image synthesis system in an embodiment.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
With the development of science and technology, medical images can be acquired in many ways, and images of different modalities have different advantages and disadvantages. For example, Magnetic Resonance Imaging (MRI) exposes the patient to no ionizing radiation, renders soft tissue structures clearly and yields rich diagnostic information, but acquisition takes a long time and is prone to artifacts; Positron Emission Tomography (PET) can support early diagnosis of disease by revealing functional changes in lesion tissue, but is expensive and has low image resolution. Research shows that the morphological or functional abnormalities of the human body caused by disease are often expressed in multiple ways, and the information acquired by a single-modality imaging device cannot fully reflect the complex characteristics of a disease. Moreover, clinically acquiring medical images of different modalities at the same time costs considerable time and money. Therefore, how to use medical images of existing modalities to accurately synthesize images of a desired modality by computer technology has become a research direction in recent years.
Although existing cross-modality synthesis methods achieve good results, the complex spatial structure of medical images means that the synthesized results still cannot represent the edge information of human tissue well, suffering from low signal-to-noise ratio and blurred edges. As a result, the synthesis quality of multi-modal images of a specific subject degrades when paired data are limited.
In view of the above, the present application is intended to provide a new solution to the above-mentioned technical problem, and the specific structure thereof will be described in detail in the following embodiments.
According to a first aspect of the present invention, as shown in fig. 1, the present application provides a cross-modality medical image synthesis method, which may include steps S100-S300.
Step S100: constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair.
Step S200: the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results.
Step S300: constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the first and second image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
Specifically, referring to fig. 2, the real first modality medical image SP1 of the present application may be image data of size 256 × 3; for convenience of description, the real first modality medical image SP1 is hereinafter abbreviated as the first modality medical image SP1. The first modality medical image SP1 may be a CT image or an MRI image. It may be selected from a first modality medical image set, i.e., a collection of images acquired in the first modality for a plurality of reference objects. For example, the first modality may be MRI and the plurality of reference objects may be certain organs of a plurality of persons, such as their hearts, in which case the first modality medical image set is a set of cardiac MRI images acquired of those hearts using MRI. The first modality is described above taking MRI as an example and the reference objects taking human hearts as an example, but it should be understood that the present disclosure is not limited thereto: the first modality may also be various other modalities such as CT, PET or SPECT, and the reference objects may also be various other reference objects such as the kidneys or bones of a plurality of persons.
Further, the real second modality medical image SP2 of the present application may also be image data of size 256 × 3; for convenience of description, the real second modality medical image SP2 is hereinafter abbreviated as the second modality medical image SP2. The second modality medical image SP2 may be a SPECT image or a PET image. It may be selected from a second modality medical image set, i.e., a collection of images acquired in the second modality for a plurality of reference objects. For example, the second modality may be PET and the plurality of reference objects may be certain organs of a plurality of persons, such as their hearts, in which case the second modality medical image set is a set of cardiac PET images acquired of those hearts using PET. The second modality is described above taking PET as an example and the reference objects taking human hearts as an example, but it should be understood that the disclosure is not limited thereto: the second modality may also be various other modalities such as CT, MRI or SPECT, and the reference objects may also be various other reference objects such as the kidneys or bones of a plurality of persons. It will be appreciated that, whatever modalities are chosen, the first modality medical image SP1 and the second modality medical image SP2 should be image data obtained from the same sample.
For convenience of description, in the following embodiments, the first modality medical image SP1 is an MRI image, and the second modality medical image SP2 is a PET image.
In one possible embodiment, with continuing reference to figs. 2, 3 and 4, the generator 20 of the present application may employ a U-Net network structure in a 3D form. It may include an encoder 210 and a decoder 220 whose network structures are symmetrical. The U-Net model is designed as a fully convolutional network with skip connections; its main idea is an encoder and a decoder with symmetrical structures, so that they have feature maps of equal number and size, with the corresponding encoder and decoder feature maps combined through skip connections. Feature information from the down-sampling process is thereby retained to the maximum extent, improving the efficiency of feature expression. The MRI and PET images come from the same sample and share a large amount of low-level feature information, so the U-Net model is well suited to the complex feature mapping between the two modality images.
Further, the step of generating the synthesized second modality medical image SS may include the following sub-steps:
outputting a feature map of the real first modality medical image through the multi-layer convolutional feature extraction of the encoder; and
the decoder performing multi-layer deconvolution on the feature map output by the encoder, repeatedly splicing the generated feature maps with the same-size feature maps at the corresponding encoder positions, and finally outputting a target reconstructed image, namely the synthesized second modality medical image SS. A sketch of this skip-connected encoder/decoder structure is given below.
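For illustration only, the following minimal PyTorch sketch captures the skip-connected 3D encoder/decoder idea; it is not the claimed network, as the depth is reduced from the seven convolution module layers described herein to three levels, and the module names, channel widths, kernel sizes and strides are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

def enc_block(cin, cout):
    # stride-2 3D convolution: downsample and extract features
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
    )

def dec_block(cin, cout):
    # stride-2 transposed 3D convolution: upsample toward the target image
    return nn.Sequential(
        nn.ConvTranspose3d(cin, cout, kernel_size=2, stride=2),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
    )

class UNet3DGenerator(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, widths=(32, 64, 128)):
        super().__init__()
        self.enc1 = enc_block(in_ch, widths[0])
        self.enc2 = enc_block(widths[0], widths[1])
        self.enc3 = enc_block(widths[1], widths[2])
        self.dec3 = dec_block(widths[2], widths[1])
        # each decoder stage consumes the upsampled map spliced with the
        # same-size encoder map (the skip connection)
        self.dec2 = dec_block(widths[1] * 2, widths[0])
        self.dec1 = nn.ConvTranspose3d(widths[0] * 2, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))  # skip splice with encoder map
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return torch.tanh(d1)  # Tanh output, as in the text below
```

A call such as `UNet3DGenerator()(torch.randn(1, 1, 32, 64, 64))` returns a tensor of the same spatial size, illustrating how the decoder mirrors the encoder.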
In particular, the synthesized second modality medical image SS may be a SPECT image or a PET image. Meanwhile, the synthesized second modality medical image SS should be of the same type and subject as the real second modality medical image SP2.
Further, the aforementioned first image pair may be formed by splicing the first modality medical image SP1 with the second modality medical image SP2, and the second image pair by splicing the first modality medical image SP1 with the synthesized second modality medical image SS, for example as in the snippet below. It will be appreciated that the composition of the first and second image pairs may also be interchanged, which is not described in detail herein.
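For illustration only, assuming the images are held as five-dimensional tensors of shape (batch, channel, depth, height, width), the splicing can be realized as a channel-wise concatenation; the tensor sizes below are stand-ins rather than the sizes used by the application.

```python
import torch

# Stand-in data; real inputs would be preprocessed 3D medical volumes.
real_mri = torch.randn(1, 1, 32, 64, 64)    # real first modality image SP1
real_pet = torch.randn(1, 1, 32, 64, 64)    # real second modality image SP2
synth_pet = torch.randn(1, 1, 32, 64, 64)   # generator output SS

# Channel-wise splicing produces the two pairs fed to the discriminator.
first_pair = torch.cat([real_mri, real_pet], dim=1)
second_pair = torch.cat([real_mri, synth_pet], dim=1)
```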
In one possible embodiment, referring to fig. 3, the encoder 210 may include a convolution module layer, a batch normalization layer (not shown), and an activation layer (not shown); the batch normalization layer is also denoted as BN, and the active layer is also denoted as ReLU, and it is understood that reference may be made to the description of the prior art for the batch normalization layer BN and the active layer ReLU, which is not the focus of the present application, and further details are not described herein.
There are seven convolution module layers, denoted as convolution module layer 211, 212, 213, 214, 215, 216 and 217, respectively. The second to the fifth of them are hybrid dilated convolution module layers, i.e., the convolution module layers 212, 213, 214 and 215 drawn with bold borders in fig. 3, and the rest are ordinary 3 × 3 × 3 3D full convolution layers.
In one possible embodiment, referring also to fig. 4 and taking the hybrid dilated convolution module layer 212 as an example, it may include six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern.
Specifically, the convolutional layers are denoted as convolutional layer 1 (2121), convolutional layer 2 (2122), convolutional layer 3 (2123), convolutional layer 4 (2124), convolutional layer 5 (2125) and convolutional layer 6 (2126); convolutional layers 1 (2121) and 3 (2123), 2 (2122) and 5 (2125), and 4 (2124) and 6 (2126) are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
The hybrid dilated convolution module layers are arranged only in the middle four layers of the encoder 210, which avoids the gridding problem of dilated convolutions and, while feature information is fully extracted to improve generation quality, reduces the network parameters and training time to a certain extent. A sketch of such a module is given below.
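For illustration only, a minimal PyTorch sketch of one such hybrid dilated convolution module, with the sawtooth dilation schedule 1, 2, 5, 1, 2, 5 and the residual links between layers 1 and 3, 2 and 5, and 4 and 6 described above; the uniform channel width and the batch-normalization/ReLU placement are assumptions.

```python
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        rates = [1, 2, 5, 1, 2, 5]  # sawtooth dilation schedule
        self.convs = nn.ModuleList([
            nn.Sequential(
                # padding=rate keeps the spatial size constant for a 3x3x3 kernel
                nn.Conv3d(channels, channels, kernel_size=3, padding=r, dilation=r),
                nn.BatchNorm3d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out1 = self.convs[0](x)
        out2 = self.convs[1](out1)
        out3 = self.convs[2](out2) + out1   # residual link: layer 1 -> layer 3
        out4 = self.convs[3](out3)
        out5 = self.convs[4](out4) + out2   # residual link: layer 2 -> layer 5
        out6 = self.convs[5](out5) + out4   # residual link: layer 4 -> layer 6
        return out6
```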
In one possible embodiment, with continued reference to fig. 3, the decoder 220 essentially reconstructs the final output from the feature maps compressed by the encoder 210. As described above, the network structures of the decoder 220 and the encoder 210 are symmetrical. Therefore, as shown, the decoder 220 of the present application is likewise composed of seven deconvolution module layers, batch normalization layers (not shown) and activation layers (not shown). The batch normalization layer is also denoted BN and the activation layer ReLU; it is understood that the prior art may be consulted for BN and ReLU, which are not the focus of the present application and are not described further herein.
Further, the seven deconvolution module layers are denoted as deconvolution module layer 221, 222, 223, 224, 225, 226 and 227, respectively; each deconvolution module layer is composed of three 2 × 2 × 2 3D convolution layers, and the deconvolution module layer 221 and the deconvolution module layer 223 are connected by a residual structure.
In one possible embodiment, referring to fig. 5, the discriminator is an 8-layer 3D fully convolutional network, which may include six convolutional layers, a batch normalization layer (317) and an activation layer (318); the convolutional layers are denoted as convolutional layer 1 (311), convolutional layer 2 (312), convolutional layer 3 (313), convolutional layer 4 (314), convolutional layer 5 (315) and convolutional layer 6 (316); convolutional layers 1 (311) and 4 (314), and 2 (312) and 6 (316), are connected by residual structures.
Further, as shown in fig. 5, convolutional layer 6 is followed by global average pooling (GAP), the 7th layer uses a single 1 × 1 convolution kernel, and finally a Sigmoid activation function is used to judge whether the first image pair and the second image pair output via the generator belong to real images or generated images. It can be understood that the principle by which the discriminator judges an input image as real or fake is known from the prior art and is not described further herein. A sketch of this structure is given below.
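For illustration only, a minimal PyTorch sketch of a discriminator with this shape: six 3D convolutional layers with residual links between layers 1 and 4 and between 2 and 6, global average pooling, a single 1 × 1 × 1 convolution, and a Sigmoid output. The channel width, unit strides (which keep the residual additions shape-compatible) and LeakyReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class PairDiscriminator(nn.Module):
    """Judges a channel-spliced (input image, candidate image) pair."""

    def __init__(self, in_channels: int = 2, width: int = 64):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm3d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.l1 = block(in_channels, width)
        self.l2 = block(width, width)
        self.l3 = block(width, width)
        self.l4 = block(width, width)
        self.l5 = block(width, width)
        self.l6 = block(width, width)
        self.gap = nn.AdaptiveAvgPool3d(1)   # global average pooling after layer 6
        self.head = nn.Conv3d(width, 1, 1)   # layer 7: single 1x1x1 convolution

    def forward(self, pair: torch.Tensor) -> torch.Tensor:
        o1 = self.l1(pair)
        o2 = self.l2(o1)
        o3 = self.l3(o2)
        o4 = self.l4(o3) + o1                # residual link: layer 1 -> layer 4
        o5 = self.l5(o4)
        o6 = self.l6(o5) + o2                # residual link: layer 2 -> layer 6
        # layer 8: Sigmoid maps the score to a real/fake probability
        return torch.sigmoid(self.head(self.gap(o6))).flatten(1)
```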
Further, to prevent the network from over-fitting, the present application also adds a dropout operation to the ReLU activation layers in the generator 20, with the dropout rate set to 0.5, as sketched below. Finally, the synthesized second modality medical image (PET image) is obtained from the encoded and decoded feature information through a Tanh activation function.
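For illustration only, one way to realize this regularized activation in PyTorch; placing the dropout directly after each ReLU in a module layer is an assumption of this sketch.

```python
import torch.nn as nn

# ReLU activation followed by dropout at the stated rate of 0.5;
# Dropout3d zeroes whole feature channels of the 3D maps.
act_with_dropout = nn.Sequential(
    nn.ReLU(inplace=True),
    nn.Dropout3d(p=0.5),
)
```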
Specifically, the manner in which the generator 20 synthesizes the real MRI images into the corresponding PET images through the encoder 210 and the decoder 220 can be understood with reference to fig. 3 and 4, and the detailed description of the present application is omitted.
In one possible embodiment, to further improve the quality of the synthesis, the synthesis method of the present application may further include:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
That is, the present application introduces a self-attention mechanism between the encoder 210 and the decoder 220. The self-attention mechanism treats each pixel in the feature map as a random variable and calculates the pairwise covariance between all pixels; the value of each predicted pixel is then enhanced or weakened according to its similarity to the other pixels in the image. In other words, the more valuable feature channels are selectively amplified and the useless ones suppressed: the weights of relevant features are increased while those of irrelevant features are reduced. This further eliminates the interference of irrelevant features and noise in the skip connections and highlights the key features in the residual-structure connections, so that the key information of the MRI image is captured better. A sketch of such a mechanism is given below.
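For illustration only, a minimal PyTorch sketch of such a covariance-style self-attention over a 3D feature map. The mean-centring, the learned blend weight `gamma` and the softmax normalization are assumptions of this sketch, and the full N × N affinity matrix makes it memory-hungry on large volumes.

```python
import torch
import torch.nn as nn

class CovarianceSelfAttention3D(nn.Module):
    """Re-weights each voxel by its pairwise covariance with all other voxels."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend; starts as identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        feats = x.flatten(2)                                  # (B, C, N), N = D*H*W voxels
        centred = feats - feats.mean(dim=2, keepdim=True)     # centre -> covariance-like terms
        affinity = torch.bmm(centred.transpose(1, 2), centred)  # (B, N, N) pairwise covariance
        attn = torch.softmax(affinity, dim=-1)                # similarity weights per voxel
        out = torch.bmm(feats, attn.transpose(1, 2))          # enhance/weaken voxel values
        return x + self.gamma * out.view(b, c, d, h, w)
```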
In one possible embodiment, the loss function MAE of the generator is set to:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
In one possible embodiment, the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
The method trains the generative adversarial network model with these loss functions, which can improve the quality of the synthesis training. An illustrative training step is sketched below.
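For illustration only, one training iteration under these losses, reusing the generator, discriminator and pair-splicing shown in the sketches above; coupling the generator to the discriminator through an extra adversarial MSE term is an assumption of this sketch, since the text above names only MAE for the generator.

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, real_mri, real_pet):
    # Discriminator update: MSE pushes real pairs toward 1, synthesized pairs toward 0.
    fake_pet = generator(real_mri).detach()
    real_pair = torch.cat([real_mri, real_pet], dim=1)
    fake_pair = torch.cat([real_mri, fake_pet], dim=1)
    d_real = discriminator(real_pair)
    d_fake = discriminator(fake_pair)
    d_loss = F.mse_loss(d_real, torch.ones_like(d_real)) + \
             F.mse_loss(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: MAE (L1) against the real second modality image, plus an
    # adversarial MSE term (an assumption of this sketch, as noted above).
    fake_pet = generator(real_mri)
    d_fake = discriminator(torch.cat([real_mri, fake_pet], dim=1))
    g_loss = F.l1_loss(fake_pet, real_pet) + \
             F.mse_loss(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return float(d_loss), float(g_loss)
```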
According to a second aspect of the present application, referring to fig. 6, a cross-modality medical image synthesis system is further provided, which may include a model construction module 2 and a model training module 3. Image data input through the input unit 1 are passed to the model construction module 2 and the model training module 3 and then output via the output unit 4.
The model construction module 2 is configured to construct a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and outputs a real/fake medical image pair; the real/fake medical image pair is formed by splicing the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image;
the discriminator takes the real/fake medical image pair as input, judges it as real or fake, and outputs the discrimination result;
the model training module 3 is configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis system, the model construction module 2 constructs a generative adversarial network model comprising a generator and a discriminator; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator further judges the two image pairs as real or fake, and the model training module 3 constructs different loss functions for the generator and the discriminator to train the generative adversarial model for image synthesis, making the final result more reliable. In other words, the synthesis system of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to execute the method according to any of the above embodiments of the present invention when executing the program.
Optionally, a memory is provided for storing programs. The memory may comprise volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also comprise non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., applications or functional modules implementing the above methods), computer instructions and the like, which may be stored in partitions of one or more memories and may be called by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be an integrated structure integrated together. When the processor and the memory are separate structures, the memory, the processor may be coupled by a bus.
The above terminal includes a processor configured to execute the method of any of the foregoing embodiments. In the cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any of the above-mentioned embodiments of the present invention.
The computer program stored on the above computer-readable storage medium, when executed by a processor, can be used to perform the cross-modality medical image synthesis method described in any of the foregoing embodiments. That method first constructs a generative adversarial network model comprising a generator and a discriminator, then controls the generator to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
In the cross-modality medical image synthesis method and system provided by the above embodiments of the present invention, the system includes modules corresponding to the steps of the method. The method first constructs a generative adversarial network model comprising a generator and a discriminator, then controls the generator to learn the feature mapping relation between the first modality medical image and the second modality medical image, generates a synthesized second modality medical image according to that relation, and outputs a real/fake medical image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator further judges the real/fake medical image pairs, and different loss functions are constructed for the generator and the discriminator to train the adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A cross-modality medical image synthesis method, characterized by comprising the following steps:
constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
2. The cross-modality medical image synthesis method according to claim 1, wherein the generator employs a U-Net network structure including an encoder and a decoder whose network structures are symmetrical;
the generating of the synthesized second modality medical image includes:
outputting a feature map of the real first modality medical image through the multi-layer convolutional feature extraction of the encoder; and
the decoder performing multi-layer deconvolution on the feature map output by the encoder, repeatedly splicing the generated feature maps with the same-size feature maps at the corresponding encoder positions, and finally outputting a target reconstructed image, namely the synthesized second modality medical image.
3. The cross-modality medical image synthesis method according to claim 2, further comprising:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
4. The cross-modality medical image synthesis method according to claim 2, wherein the encoder comprises convolution module layers, batch normalization layers and activation layers;
there are seven convolution module layers, of which the second to the fifth are hybrid dilated convolution module layers and the rest are full convolution layers.
5. The cross-modality medical image synthesis method according to claim 4, wherein the hybrid dilated convolution module layer comprises six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern;
the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 3, 2 and 5, and 4 and 6 are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
6. The cross-modality medical image synthesis method according to claim 1, wherein the discriminator comprises six convolutional layers, a batch normalization layer and an activation layer; the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 4, and 2 and 6, are connected by residual structures.
7. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the real first modality medical image comprises a CT image or an MRI image, and the synthesized second modality medical image comprises a SPECT image or a PET image.
8. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the loss function MAE of the generator is set as:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
9. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
10. A cross-modality medical image synthesis system, comprising:
a model building module configured to build a generative adversarial network model including a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
a model training module configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
11. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor is operable to perform the method of any of claims 1-9 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1-9.
CN202111551447.4A 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium Pending CN114240753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551447.4A CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111551447.4A CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114240753A 2022-03-25

Family

Family ID: 80757877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111551447.4A Pending CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114240753A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024022485A1 * 2022-07-29 2024-02-01 中国人民解放军总医院第一医学中心 Computer angiography imaging synthesis method based on multi-scale discrimination
WO2024087218A1 * 2022-10-28 2024-05-02 深圳先进技术研究院 Cross-modal medical image generation method and apparatus
CN116129235A * 2023-04-14 2023-05-16 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical images from cerebral infarction CT to MRI conventional sequence
CN116152235A * 2023-04-18 2023-05-23 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical image from CT (computed tomography) to PET (positron emission tomography) of lung cancer
CN117853695A * 2024-03-07 2024-04-09 成都信息工程大学 3D perception image synthesis method and device based on local spatial self-attention
CN117853695B * 2024-03-07 2024-05-03 成都信息工程大学 3D perception image synthesis method and device based on local spatial self-attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220530

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.