CN114240753A - Cross-modal medical image synthesis method, system, terminal and storage medium - Google Patents

Cross-modal medical image synthesis method, system, terminal and storage medium

Info

Publication number
CN114240753A
CN114240753A
Authority
CN
China
Prior art keywords
medical image
image
modality medical
real
modality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111551447.4A
Other languages
Chinese (zh)
Inventor
张俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202111551447.4A
Publication of CN114240753A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10104 Positron emission tomography [PET]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10108 Single photon emission computed tomography [SPECT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to a cross-modality medical image synthesis method, system, terminal and storage medium. The method comprises constructing a generative adversarial network model comprising a generator and a discriminator. The generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to that relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair. The discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results. Different loss functions are constructed for the generator and the discriminator respectively, so as to train the generative adversarial network model for image synthesis. The method can generate highly reliable multi-modal data.

Description

Cross-modal medical image synthesis method, system, terminal and storage medium
Technical Field
The present application relates to the field of medical image processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for cross-modality medical image synthesis.
Background
With the development of science and technology, medical images can be acquired in many ways, and images of different modalities have different advantages and disadvantages. For example, Magnetic Resonance Imaging (MRI) exposes the patient to no ionizing radiation, renders soft tissue structures clearly and yields rich diagnostic information, but acquisition takes a long time and is prone to artifacts; Positron Emission Tomography (PET) can support early diagnosis of disease by revealing functional changes in lesion tissue, but is expensive and has low image resolution. Research shows that the morphological or functional abnormalities of the human body caused by disease are often expressed in multiple ways, and the information acquired by a single-modality imaging device cannot fully reflect the complex characteristics of a disease. Moreover, clinically acquiring medical images of different modalities at the same time costs considerable time and money. Therefore, how to use medical images of existing modalities to accurately synthesize images of a desired modality by computer technology has become a research direction in recent years.
Although existing cross-modality synthesis methods achieve good results, the complex spatial structure of medical images means that the synthesized results still cannot represent the edge information of human tissue well, suffering from low signal-to-noise ratio and blurred edges. As a result, the synthesis quality of multi-modal images of a specific subject degrades when paired data are limited.
Therefore, how to improve the synthesis quality of multi-modal images of a specific subject with limited paired data is an urgent problem to be solved.
Disclosure of Invention
In view of the above, it is necessary to provide a cross-modality medical image synthesis method, which includes:
constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the first and second image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
In one possible embodiment, the generator adopts a U-Net network structure, which comprises an encoder and a decoder with symmetrical network structures;
the generating of the composite second modality medical image includes:
outputting a real feature map of the first modality medical image through the feature extraction operation of the encoder multilayer convolution;
and the decoder performs multilayer deconvolution operation on the feature map output by the encoder, performs multiple splicing operations on the generated feature map and the feature map with the same size as the corresponding position of the encoder, and finally outputs a target reconstructed image, namely the synthesized second modality medical image.
In one possible embodiment, the synthesis method further comprises:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
In one possible embodiment, the encoder includes convolution module layers, batch normalization layers and activation layers;
there are seven convolution module layers, of which the second to the fifth are hybrid dilated convolution module layers and the rest are full convolution layers.
In one possible embodiment, the hybrid dilated convolution module layer comprises six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern;
the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 3, 2 and 5, and 4 and 6 are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
In one possible embodiment, the discriminator comprises six convolutional layers, a batch normalization layer and an activation layer; the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 4, and 2 and 6, are connected by residual structures.
In one possible embodiment, the real first modality medical image comprises a CT image or an MRI image, and the synthesized second modality medical image comprises a SPECT image or a PET image.
In one possible embodiment, the loss function MAE of the generator is set to:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
In one possible embodiment, the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
Based on the same inventive concept, the present application further provides a cross-modality medical image synthesis system, comprising:
a model building module configured to build a generative adversarial network model including a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
a model training module configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis system, the model building module constructs a generative adversarial network model comprising a generator and a discriminator; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and the model training module constructs different loss functions for the generator and the discriminator to train the generative adversarial model for image synthesis, making the final result more reliable. In other words, the synthesis system of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
Based on the same inventive concept, the present application further provides a terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, can perform the method of any of the foregoing embodiments.
Since the processor of the above terminal can be used to execute the above cross-modality medical image synthesis method, the beneficial effects produced by the method naturally also apply to the terminal of the present application.
Based on the same inventive concept, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any of the foregoing embodiments.
Since the computer program stored on the above computer-readable storage medium can be used, when executed by a processor, to carry out the above cross-modality medical image synthesis method, the beneficial effects produced by the method naturally also apply to the computer-readable storage medium of the present application.
Drawings
FIG. 1 is a schematic flow chart of a cross-modality medical image synthesis method according to an embodiment;
FIG. 2 is a diagram of a framework for generating a confrontation network model in one embodiment;
FIG. 3 is a model framework diagram of the generator portion of FIG. 2;
FIG. 4 is a schematic diagram of a substructure of structure 212 of FIG. 3;
FIG. 5 is a model frame diagram of the discriminator section of FIG. 2;
FIG. 6 is a block diagram of a cross-modality medical image synthesis system in an embodiment.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
With the development of science and technology, medical images can be acquired in many ways, and images of different modalities have different advantages and disadvantages. For example, Magnetic Resonance Imaging (MRI) exposes the patient to no ionizing radiation, renders soft tissue structures clearly and yields rich diagnostic information, but acquisition takes a long time and is prone to artifacts; Positron Emission Tomography (PET) can support early diagnosis of disease by revealing functional changes in lesion tissue, but is expensive and has low image resolution. Research shows that the morphological or functional abnormalities of the human body caused by disease are often expressed in multiple ways, and the information acquired by a single-modality imaging device cannot fully reflect the complex characteristics of a disease. Moreover, clinically acquiring medical images of different modalities at the same time costs considerable time and money. Therefore, how to use medical images of existing modalities to accurately synthesize images of a desired modality by computer technology has become a research direction in recent years.
Although existing cross-modality synthesis methods achieve good results, the complex spatial structure of medical images means that the synthesized results still cannot represent the edge information of human tissue well, suffering from low signal-to-noise ratio and blurred edges. As a result, the synthesis quality of multi-modal images of a specific subject degrades when paired data are limited.
In view of the above, the present application is intended to provide a new solution to the above-mentioned technical problem, and the specific structure thereof will be described in detail in the following embodiments.
According to a first aspect of the present invention, as shown in fig. 1, the present application provides a cross-modality medical image synthesis method, which may include steps S100-S300.
Step S100: constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair.
Step S200: the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results.
Step S300: constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the first and second image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
Specifically, referring to fig. 2, the real first modality medical image SP1 of the present application may be image data of size 256 × 3; for convenience of description, the real first modality medical image SP1 is hereinafter abbreviated as the first modality medical image SP1. The first modality medical image SP1 may be a CT image or an MRI image. It may be selected from a first modality medical image set, i.e., a collection of images acquired in the first modality for a plurality of reference objects. For example, the first modality may be MRI and the plurality of reference objects may be certain organs of a plurality of persons, such as their hearts, in which case the first modality medical image set is a set of cardiac MRI images acquired of those hearts using MRI. The first modality is described above taking MRI as an example and the reference objects taking human hearts as an example, but it should be understood that the present disclosure is not limited thereto: the first modality may also be various other modalities such as CT, PET or SPECT, and the reference objects may also be various other reference objects such as the kidneys or bones of a plurality of persons.
Further, the real second modality medical image SP2 of the present application may also be image data of size 256 × 3; for convenience of description, the real second modality medical image SP2 is hereinafter abbreviated as the second modality medical image SP2. The second modality medical image SP2 may be a SPECT image or a PET image. It may be selected from a second modality medical image set, i.e., a collection of images acquired in the second modality for a plurality of reference objects. For example, the second modality may be PET and the plurality of reference objects may be certain organs of a plurality of persons, such as their hearts, in which case the second modality medical image set is a set of cardiac PET images acquired of those hearts using PET. The second modality is described above taking PET as an example and the reference objects taking human hearts as an example, but it should be understood that the disclosure is not limited thereto: the second modality may also be various other modalities such as CT, MRI or SPECT, and the reference objects may also be various other reference objects such as the kidneys or bones of a plurality of persons. It will be appreciated that, whatever modalities are chosen, the first modality medical image SP1 and the second modality medical image SP2 should be image data obtained from the same sample.
For convenience of description, in the following embodiments, the first modality medical image SP1 is an MRI image, and the second modality medical image SP2 is a PET image.
In one possible embodiment, with continuing reference to figs. 2, 3 and 4, the generator 20 of the present application may employ a U-Net network structure in a 3D form. It may include an encoder 210 and a decoder 220 whose network structures are symmetrical. The U-Net model is designed as a fully convolutional network with skip connections; its main idea is an encoder and a decoder with symmetrical structures, so that they have feature maps of equal number and size, with the corresponding encoder and decoder feature maps combined through skip connections. Feature information from the down-sampling process is thereby retained to the maximum extent, improving the efficiency of feature expression. The MRI and PET images come from the same sample and share a large amount of low-level feature information, so the U-Net model is well suited to the complex feature mapping between the two modality images.
Further, the step of generating the synthesized second modality medical image SS may include the following sub-steps:
outputting a feature map of the real first modality medical image through the multi-layer convolutional feature extraction of the encoder; and
the decoder performing multi-layer deconvolution on the feature map output by the encoder, repeatedly splicing the generated feature maps with the same-size feature maps at the corresponding encoder positions, and finally outputting a target reconstructed image, namely the synthesized second modality medical image SS. A sketch of this skip-connected encoder/decoder structure is given below.
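For illustration only, the following minimal PyTorch sketch captures the skip-connected 3D encoder/decoder idea; it is not the claimed network, as the depth is reduced from the seven convolution module layers described herein to three levels, and the module names, channel widths, kernel sizes and strides are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

def enc_block(cin, cout):
    # stride-2 3D convolution: downsample and extract features
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
    )

def dec_block(cin, cout):
    # stride-2 transposed 3D convolution: upsample toward the target image
    return nn.Sequential(
        nn.ConvTranspose3d(cin, cout, kernel_size=2, stride=2),
        nn.BatchNorm3d(cout),
        nn.ReLU(inplace=True),
    )

class UNet3DGenerator(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, widths=(32, 64, 128)):
        super().__init__()
        self.enc1 = enc_block(in_ch, widths[0])
        self.enc2 = enc_block(widths[0], widths[1])
        self.enc3 = enc_block(widths[1], widths[2])
        self.dec3 = dec_block(widths[2], widths[1])
        # each decoder stage consumes the upsampled map spliced with the
        # same-size encoder map (the skip connection)
        self.dec2 = dec_block(widths[1] * 2, widths[0])
        self.dec1 = nn.ConvTranspose3d(widths[0] * 2, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))  # skip splice with encoder map
        d1 = self.dec1(torch.cat([d2, e1], dim=1))
        return torch.tanh(d1)  # Tanh output, as in the text below
```

A call such as `UNet3DGenerator()(torch.randn(1, 1, 32, 64, 64))` returns a tensor of the same spatial size, illustrating how the decoder mirrors the encoder.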
In particular, the synthesized second modality medical image SS may be a SPECT image or a PET image. Meanwhile, the synthesized second modality medical image SS should be of the same type and subject as the real second modality medical image SP2.
Further, the aforementioned first image pair may be formed by splicing the first modality medical image SP1 with the second modality medical image SP2, and the second image pair by splicing the first modality medical image SP1 with the synthesized second modality medical image SS, for example as in the snippet below. It will be appreciated that the composition of the first and second image pairs may also be interchanged, which is not described in detail herein.
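For illustration only, assuming the images are held as five-dimensional tensors of shape (batch, channel, depth, height, width), the splicing can be realized as a channel-wise concatenation; the tensor sizes below are stand-ins rather than the sizes used by the application.

```python
import torch

# Stand-in data; real inputs would be preprocessed 3D medical volumes.
real_mri = torch.randn(1, 1, 32, 64, 64)    # real first modality image SP1
real_pet = torch.randn(1, 1, 32, 64, 64)    # real second modality image SP2
synth_pet = torch.randn(1, 1, 32, 64, 64)   # generator output SS

# Channel-wise splicing produces the two pairs fed to the discriminator.
first_pair = torch.cat([real_mri, real_pet], dim=1)
second_pair = torch.cat([real_mri, synth_pet], dim=1)
```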
In one possible embodiment, referring to fig. 3, the encoder 210 may include a convolution module layer, a batch normalization layer (not shown), and an activation layer (not shown); the batch normalization layer is also denoted as BN, and the active layer is also denoted as ReLU, and it is understood that reference may be made to the description of the prior art for the batch normalization layer BN and the active layer ReLU, which is not the focus of the present application, and further details are not described herein.
There are seven convolution module layers, denoted as convolution module layer 211, 212, 213, 214, 215, 216 and 217, respectively. The second to the fifth of them are hybrid dilated convolution module layers, i.e., the convolution module layers 212, 213, 214 and 215 drawn with bold borders in fig. 3, and the rest are ordinary 3 × 3 × 3 3D full convolution layers.
In one possible embodiment, referring also to fig. 4 and taking the hybrid dilated convolution module layer 212 as an example, it may include six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern.
Specifically, the convolutional layers are denoted as convolutional layer 1 (2121), convolutional layer 2 (2122), convolutional layer 3 (2123), convolutional layer 4 (2124), convolutional layer 5 (2125) and convolutional layer 6 (2126); convolutional layers 1 (2121) and 3 (2123), 2 (2122) and 5 (2125), and 4 (2124) and 6 (2126) are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
The hybrid dilated convolution module layers are arranged only in the middle four layers of the encoder 210, which avoids the gridding problem of dilated convolutions and, while feature information is fully extracted to improve generation quality, reduces the network parameters and training time to a certain extent. A sketch of such a module is given below.
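For illustration only, a minimal PyTorch sketch of one such hybrid dilated convolution module, with the sawtooth dilation schedule 1, 2, 5, 1, 2, 5 and the residual links between layers 1 and 3, 2 and 5, and 4 and 6 described above; the uniform channel width and the batch-normalization/ReLU placement are assumptions.

```python
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        rates = [1, 2, 5, 1, 2, 5]  # sawtooth dilation schedule
        self.convs = nn.ModuleList([
            nn.Sequential(
                # padding=rate keeps the spatial size constant for a 3x3x3 kernel
                nn.Conv3d(channels, channels, kernel_size=3, padding=r, dilation=r),
                nn.BatchNorm3d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out1 = self.convs[0](x)
        out2 = self.convs[1](out1)
        out3 = self.convs[2](out2) + out1   # residual link: layer 1 -> layer 3
        out4 = self.convs[3](out3)
        out5 = self.convs[4](out4) + out2   # residual link: layer 2 -> layer 5
        out6 = self.convs[5](out5) + out4   # residual link: layer 4 -> layer 6
        return out6
```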
In one possible embodiment, with continued reference to fig. 3, the decoder 220 essentially reconstructs the final output from the feature maps compressed by the encoder 210. As described above, the network structures of the decoder 220 and the encoder 210 are symmetrical. Therefore, as shown, the decoder 220 of the present application is likewise composed of seven deconvolution module layers, batch normalization layers (not shown) and activation layers (not shown). The batch normalization layer is also denoted BN and the activation layer ReLU; it is understood that the prior art may be consulted for BN and ReLU, which are not the focus of the present application and are not described further herein.
Further, the seven deconvolution module layers are denoted as deconvolution module layer 221, 222, 223, 224, 225, 226 and 227, respectively; each deconvolution module layer is composed of three 2 × 2 × 2 3D convolution layers, and the deconvolution module layer 221 and the deconvolution module layer 223 are connected by a residual structure.
In one possible embodiment, referring to fig. 5, the discriminator is an 8-layer 3D fully convolutional network, which may include six convolutional layers, a batch normalization layer (317) and an activation layer (318); the convolutional layers are denoted as convolutional layer 1 (311), convolutional layer 2 (312), convolutional layer 3 (313), convolutional layer 4 (314), convolutional layer 5 (315) and convolutional layer 6 (316); convolutional layers 1 (311) and 4 (314), and 2 (312) and 6 (316), are connected by residual structures.
Further, as shown in fig. 5, convolutional layer 6 is followed by global average pooling (GAP), the 7th layer uses a single 1 × 1 convolution kernel, and finally a Sigmoid activation function is used to judge whether the first image pair and the second image pair output via the generator belong to real images or generated images. It can be understood that the principle by which the discriminator judges an input image as real or fake is known from the prior art and is not described further herein. A sketch of this structure is given below.
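For illustration only, a minimal PyTorch sketch of a discriminator with this shape: six 3D convolutional layers with residual links between layers 1 and 4 and between 2 and 6, global average pooling, a single 1 × 1 × 1 convolution, and a Sigmoid output. The channel width, unit strides (which keep the residual additions shape-compatible) and LeakyReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class PairDiscriminator(nn.Module):
    """Judges a channel-spliced (input image, candidate image) pair."""

    def __init__(self, in_channels: int = 2, width: int = 64):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm3d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.l1 = block(in_channels, width)
        self.l2 = block(width, width)
        self.l3 = block(width, width)
        self.l4 = block(width, width)
        self.l5 = block(width, width)
        self.l6 = block(width, width)
        self.gap = nn.AdaptiveAvgPool3d(1)   # global average pooling after layer 6
        self.head = nn.Conv3d(width, 1, 1)   # layer 7: single 1x1x1 convolution

    def forward(self, pair: torch.Tensor) -> torch.Tensor:
        o1 = self.l1(pair)
        o2 = self.l2(o1)
        o3 = self.l3(o2)
        o4 = self.l4(o3) + o1                # residual link: layer 1 -> layer 4
        o5 = self.l5(o4)
        o6 = self.l6(o5) + o2                # residual link: layer 2 -> layer 6
        # layer 8: Sigmoid maps the score to a real/fake probability
        return torch.sigmoid(self.head(self.gap(o6))).flatten(1)
```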
Further, to prevent the network from over-fitting, the present application also adds a dropout operation to the ReLU activation layers in the generator 20, with the dropout rate set to 0.5, as sketched below. Finally, the synthesized second modality medical image (PET image) is obtained from the encoded and decoded feature information through a Tanh activation function.
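For illustration only, one way to realize this regularized activation in PyTorch; placing the dropout directly after each ReLU in a module layer is an assumption of this sketch.

```python
import torch.nn as nn

# ReLU activation followed by dropout at the stated rate of 0.5;
# Dropout3d zeroes whole feature channels of the 3D maps.
act_with_dropout = nn.Sequential(
    nn.ReLU(inplace=True),
    nn.Dropout3d(p=0.5),
)
```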
Specifically, the manner in which the generator 20 synthesizes the real MRI images into the corresponding PET images through the encoder 210 and the decoder 220 can be understood with reference to fig. 3 and 4, and the detailed description of the present application is omitted.
In one possible embodiment, to further improve the quality of the synthesis, the synthesis method of the present application may further include:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
That is, the present application introduces a self-attention mechanism between the encoder 210 and the decoder 220. The self-attention mechanism treats each pixel in the feature map as a random variable and calculates the pairwise covariance between all pixels; the value of each predicted pixel is then enhanced or weakened according to its similarity to the other pixels in the image. In other words, the more valuable feature channels are selectively amplified and the useless ones suppressed: the weights of relevant features are increased while those of irrelevant features are reduced. This further eliminates the interference of irrelevant features and noise in the skip connections and highlights the key features in the residual-structure connections, so that the key information of the MRI image is captured better. A sketch of such a mechanism is given below.
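For illustration only, a minimal PyTorch sketch of such a covariance-style self-attention over a 3D feature map. The mean-centring, the learned blend weight `gamma` and the softmax normalization are assumptions of this sketch, and the full N × N affinity matrix makes it memory-hungry on large volumes.

```python
import torch
import torch.nn as nn

class CovarianceSelfAttention3D(nn.Module):
    """Re-weights each voxel by its pairwise covariance with all other voxels."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend; starts as identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        feats = x.flatten(2)                                  # (B, C, N), N = D*H*W voxels
        centred = feats - feats.mean(dim=2, keepdim=True)     # centre -> covariance-like terms
        affinity = torch.bmm(centred.transpose(1, 2), centred)  # (B, N, N) pairwise covariance
        attn = torch.softmax(affinity, dim=-1)                # similarity weights per voxel
        out = torch.bmm(feats, attn.transpose(1, 2))          # enhance/weaken voxel values
        return x + self.gamma * out.view(b, c, d, h, w)
```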
In one possible embodiment, the loss function MAE of the generator is set to:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
In one possible embodiment, the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
The method trains the generative adversarial network model with these loss functions, which can improve the quality of the synthesis training. An illustrative training step is sketched below.
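For illustration only, one training iteration under these losses, reusing the generator, discriminator and pair-splicing shown in the sketches above; coupling the generator to the discriminator through an extra adversarial MSE term is an assumption of this sketch, since the text above names only MAE for the generator.

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, real_mri, real_pet):
    # Discriminator update: MSE pushes real pairs toward 1, synthesized pairs toward 0.
    fake_pet = generator(real_mri).detach()
    real_pair = torch.cat([real_mri, real_pet], dim=1)
    fake_pair = torch.cat([real_mri, fake_pet], dim=1)
    d_real = discriminator(real_pair)
    d_fake = discriminator(fake_pair)
    d_loss = F.mse_loss(d_real, torch.ones_like(d_real)) + \
             F.mse_loss(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: MAE (L1) against the real second modality image, plus an
    # adversarial MSE term (an assumption of this sketch, as noted above).
    fake_pet = generator(real_mri)
    d_fake = discriminator(torch.cat([real_mri, fake_pet], dim=1))
    g_loss = F.l1_loss(fake_pet, real_pet) + \
             F.mse_loss(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return float(d_loss), float(g_loss)
```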
According to a second aspect of the present application, referring to fig. 6, a cross-modality medical image synthesis system is further provided, which may include a model construction module 2 and a model training module 3. Image data input through the input unit 1 are passed to the model construction module 2 and the model training module 3 and then output via the output unit 4.
The model construction module 2 is configured to construct a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and outputs a real/fake medical image pair; the real/fake medical image pair is formed by splicing the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image;
the discriminator takes the real/fake medical image pair as input, judges it as real or fake, and outputs the discrimination result;
the model training module 3 is configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
In the above cross-modality medical image synthesis system, the model construction module 2 constructs a generative adversarial network model comprising a generator and a discriminator; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator further judges the two image pairs as real or fake, and the model training module 3 constructs different loss functions for the generator and the discriminator to train the generative adversarial model for image synthesis, making the final result more reliable. In other words, the synthesis system of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to execute the method according to any of the above embodiments of the present invention when executing the program.
Optionally, a memory is provided for storing programs. The memory may comprise volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also comprise non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., applications or functional modules implementing the above methods), computer instructions and the like, which may be stored in partitions of one or more memories and may be called by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be an integrated structure integrated together. When the processor and the memory are separate structures, the memory, the processor may be coupled by a bus.
The above terminal includes a processor configured to execute the method of any of the foregoing embodiments. In the cross-modality medical image synthesis method, a generative adversarial network model comprising a generator and a discriminator is first constructed; the generator is then controlled to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any of the above-mentioned embodiments of the present invention.
The computer program stored on the above computer-readable storage medium, when executed by a processor, can be used to perform the cross-modality medical image synthesis method described in any of the foregoing embodiments. That method first constructs a generative adversarial network model comprising a generator and a discriminator, then controls the generator to learn the feature mapping relation between the first modality medical image and the second modality medical image and to generate a synthesized second modality medical image according to that relation; the synthesized second modality medical image and the real second modality medical image are each spliced with the real first modality medical image to output a first image pair and a second image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator judges the two image pairs as real or fake, and different loss functions are constructed for the generator and the discriminator to train the generative adversarial model, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
In the cross-modality medical image synthesis method and system provided by the above embodiments of the present invention, the system includes modules corresponding to the steps of the method. The method first constructs a generative adversarial network model comprising a generator and a discriminator, then controls the generator to learn the feature mapping relation between the first modality medical image and the second modality medical image, generates a synthesized second modality medical image according to that relation, and outputs a real/fake medical image pair. On one hand, this augments the paired data, which improves the synthesis effect; on the other hand, the synthesized second modality medical image is derived from the learned feature mapping relation, the discriminator further judges the real/fake medical image pairs, and different loss functions are constructed for the generator and the discriminator to train the adversarial model for image synthesis, which makes the final result more reliable. In other words, the synthesis method of the present application is built on a 3D CGAN and can make full use of the spatial structure information of multi-modal medical images to generate highly reliable multi-modal data, addressing the problems that existing synthesis results represent the edge information of human tissue poorly and suffer from low signal-to-noise ratio and blurred edges.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A cross-modality medical image synthesis method, characterized by comprising the following steps:
constructing a generative adversarial network model comprising a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
constructing different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
2. The cross-modality medical image synthesis method according to claim 1, wherein the generator employs a U-Net network structure including an encoder and a decoder whose network structures are symmetrical;
the generating of the synthesized second modality medical image includes:
outputting a feature map of the real first modality medical image through the multi-layer convolutional feature extraction of the encoder; and
the decoder performing multi-layer deconvolution on the feature map output by the encoder, repeatedly splicing the generated feature maps with the same-size feature maps at the corresponding encoder positions, and finally outputting a target reconstructed image, namely the synthesized second modality medical image.
3. The cross-modality medical image synthesis method according to claim 2, further comprising:
taking each pixel in the feature map as a random variable and calculating the pairwise covariance between pixels; and
selectively enhancing or weakening the value of each pixel according to the calculated pairwise covariance.
4. The cross-modality medical image synthesis method according to claim 2, wherein the encoder comprises convolution module layers, batch normalization layers and activation layers;
there are seven convolution module layers, of which the second to the fifth are hybrid dilated convolution module layers and the rest are full convolution layers.
5. The cross-modality medical image synthesis method according to claim 4, wherein the hybrid dilated convolution module layer comprises six 3 × 3 × 3 3D convolutional layers, and the dilation rates of the 3D convolutional layers are set in a sawtooth pattern;
the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 3, 2 and 5, and 4 and 6 are connected by residual structures;
the dilation rates of convolutional layers 1, 2, 3, 4, 5 and 6 are 1, 2, 5, 1, 2 and 5, respectively.
6. The cross-modality medical image synthesis method according to claim 1, wherein the discriminator comprises six convolutional layers, a batch normalization layer and an activation layer; the convolutional layers are denoted as convolutional layer 1, convolutional layer 2, convolutional layer 3, convolutional layer 4, convolutional layer 5 and convolutional layer 6, respectively; convolutional layers 1 and 4, and 2 and 6, are connected by residual structures.
7. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the real first modality medical image comprises a CT image or an MRI image, and the synthesized second modality medical image comprises a SPECT image or a PET image.
8. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the loss function MAE of the generator is set as:

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
9. The cross-modality medical image synthesis method according to any one of claims 1 to 6, wherein the loss function MSE of the discriminator is set as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where $m$ is the model batch size, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
10. A cross-modality medical image synthesis system, comprising:
a model building module configured to build a generative adversarial network model including a generator and a discriminator; the generator takes a real first modality medical image as input, learns the feature mapping relation between the real first modality medical image and a real second modality medical image, generates a synthesized second modality medical image according to the feature mapping relation, and then splices the synthesized second modality medical image and the real second modality medical image each with the real first modality medical image to output a first image pair and a second image pair;
the discriminator takes the first image pair and the second image pair as input, judges each pair as real or fake, and outputs the discrimination results; and
a model training module configured to construct different loss functions for the generator and the discriminator, respectively, so as to train the generative adversarial network model for image synthesis.
11. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor is operable to perform the method of any of claims 1-9 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1-9.
CN202111551447.4A 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium Pending CN114240753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551447.4A CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111551447.4A CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114240753A 2022-03-25

Family

Family ID: 80757877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111551447.4A Pending CN114240753A (en) 2021-12-17 2021-12-17 Cross-modal medical image synthesis method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114240753A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024022485A1 * 2022-07-29 2024-02-01 中国人民解放军总医院第一医学中心 Computer angiography imaging synthesis method based on multi-scale discrimination
WO2024087218A1 * 2022-10-28 2024-05-02 深圳先进技术研究院 Cross-modal medical image generation method and apparatus
CN116129235A * 2023-04-14 2023-05-16 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical images from cerebral infarction CT to MRI conventional sequence
CN116152235A * 2023-04-18 2023-05-23 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical image from CT (computed tomography) to PET (positron emission tomography) of lung cancer
CN117853695A * 2024-03-07 2024-04-09 成都信息工程大学 3D perception image synthesis method and device based on local spatial self-attention
CN117853695B * 2024-03-07 2024-05-03 成都信息工程大学 3D perception image synthesis method and device based on local spatial self-attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220530

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.