CN115457359A - PET-MRI image fusion method based on adaptive countermeasure generation network - Google Patents
- Publication number
- CN115457359A (application CN202211094448.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- fusion
- convolution
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10104—Positron emission tomography [PET]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention constructs an adaptive residual dense generative adversarial network combined with a YCbCr-based color space method. The method deepens the generation network with a regional residual learning module and output cascading to avoid feature loss, and an adaptive decision block dynamically guides the generator to produce a fused image with the same distribution as the source images. An adversarial game is played between the gradient map of the fused image and the joint gradient map of the input images, efficiently training the generator and discriminator so as to obtain a fused image with rich details and clear textures. The method is end-to-end and unsupervised: no manual intervention is required, no real data is needed as labels, and images of different resolutions can be fused without introducing a traditional framework. On the Harvard Medical School MRI/PET dataset, the peak signal-to-noise ratio and structural similarity reach 55.2124 and 0.4697 respectively, surpassing the current state-of-the-art algorithms and better supporting clinical diagnosis.
Description
Technical Field
The invention belongs to the technical field of medical imaging, and particularly relates to a PET-MRI image fusion method based on an adaptive generative adversarial network.
Background
Medical images are divided into structural and functional modalities, and the imaging mechanisms of different modalities capture different information about the same region. For example, magnetic resonance imaging (MRI) provides high-resolution structural information about brain soft tissue, while positron emission tomography (PET) reflects, in color, the metabolism and functional condition of tissue. Each modality has its limitations: MRI images lack functional information such as body metabolism, while PET images have low resolution and cannot accurately localize lesions. Because each modality has its own specific characteristics, the limited information in a single-modality image can hardly meet the information requirements of clinical diagnosis and treatment, so images from multiple imaging mechanisms must be fused. In recent years, the clinical success of PET-MRI fusion imaging has drawn great interest to combined non-invasive functional and anatomical imaging.
In the fusion process, the spatial information of the MRI image and the spectral information of the PET image need to be retained, or the spatial information existing in the MRI data needs to be introduced into the PET, so that the limitation of the single-mode medical image is overcome, the imaging quality is improved while the image characteristics are retained, and the clinical applicability of the image in diagnosing and evaluating medical problems is improved.
The most widely applied technique in traditional medical image fusion is pixel-level fusion, which can be divided into spatial-domain and transform-domain methods. The former applies fusion rules directly to pixels; the rules are simple but the fusion effect is poor. For example, in "He C T, Liu Q X, Li H L, et al. Multimodal medical image fusion based on IHS and PCA. Procedia Engineering, 2010, 7: 280-285", the image is converted into intensity, hue, and saturation (IHS) channels, and the IHS transformation causes spectral and spatial distortion. Transform-domain image fusion mostly adopts multi-scale transform (MST) techniques and is divided into three stages: decomposition, fusion, and reconstruction. The source images are first transformed to the frequency domain and fused according to a given rule, and the fused coefficients and the transform basis are then used to reconstruct the image. This approach preserves the detail information of the source images well but neglects spatial consistency, causing distortion of the brightness and color of the fused image. The rules of traditional fusion methods must be designed and selected by hand, and different filter parameters chosen for the MST yield widely varying fusion results. Owing to the diversity of feature extraction and the complexity of fusion rules, manually designing fusion methods becomes difficult, and model robustness is reduced.
With the rise of deep learning in recent years, neural networks have been used to address these problems; existing deep-learning image fusion mostly uses convolutional neural network (CNN) models. Research on deep learning in the image fusion field has gradually become active in recent years, and scholars have successively proposed many fusion methods, gradually forming an important branch. In some approaches, a deep learning framework extracts image features for reconstruction in an end-to-end manner. Typically, "Liu Y, Chen X, Ward R K, et al. Image fusion with convolutional sparse representation. IEEE Signal Processing Letters, 2016, 23(12): 1882-1886" applies convolutional sparse representation (CSR) to image fusion, extracting multi-layer features and generating the fused image from them. IFCNN adds convolutional neural networks to a transform-domain image fusion algorithm (Zhang Y, Liu Y, Sun P, et al. IFCNN: A general image fusion framework based on convolutional neural network. Information Fusion, 2020, 54: 99-118). "Yousif A S, Omar Z, Sheikh U U. An improved approach for medical image fusion using sparse representation and Siamese convolutional neural network. Biomedical Signal Processing and Control, 2022, 72: 103357" proposes a medical image fusion method based on sparse representation and a twin convolutional neural network. "Hou R, Zhou D, Nie R, et al. Brain CT and MRI medical image fusion using convolutional neural networks and a dual-channel spiking cortical model. Medical & Biological Engineering & Computing, 2019, 57(4): 887-900" adds a deep learning technique to a conventional image fusion scheme, applying a CNN to the high-frequency coefficient fusion. DenseFuse comprises convolutional layers and a fusion layer with dense blocks: an encoder provides input to the network, and after the network obtains the feature maps, a decoder reconstructs the fused image (Li H, Wu X J. DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 2018, 28(5): 2614-2623). GCF is an unsupervised multi-focus image fusion model based on gradients and connected regions. "Chen M, Zheng H, Lu C, et al. A spatial-temporal fusion method for segmentation in DCE-MRI. International Conference on Neural Information Processing. Springer, Cham, 2018: 358-368" extracts features by combining CNN and RNN and then fuses them for segmentation. Another line of work introduced the generative adversarial network into infrared and visible image fusion for the first time, where the generator aims to produce a fused image dominated by infrared information with a small amount of visible information, and the discriminator forces the fused image to carry more of the detail present in the visible image. DDcGAN constructs a dual-discriminator generative adversarial network (Ma J, Xu H, Jiang J, et al. DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Transactions on Image Processing, 2020, 29: 4980-4995). "Tang W, Liu Y, Zhang C, et al. Green fluorescent protein and phase-contrast image fusion via generative adversarial networks. Computational and Mathematical Methods in Medicine, 2019: 5450373" proposes fusing biological images with generative adversarial networks. PMGI extracts information using image gradient and intensity and reuses features on the same path (Zhang H, Xu H, Xiao Y, et al. Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12797-12804).
Deep-learning-based research has thus become an active topic in the image fusion field in recent years, and many deep-learning fusion methods have been proposed in succession, gradually forming an important branch. Although these methods have achieved good results, most fusion rules are still designed by hand, so the methods as a whole cannot escape the limitations of traditional fusion. The biggest obstacle to image fusion with deep learning is the lack of real label data: for the MRI-PET fusion task, a ground-truth fused image is difficult to obtain directly.
Thus, while these prior efforts have been successful, some disadvantages remain: (1) the deep learning framework is used only to compensate for certain shortcomings of traditional fusion methods, such as feature extraction, while the design of the overall method is still based on the traditional approach, and a traditional fusion framework requiring complex hand-crafted fusion rules cannot produce results end-to-end; (2) because label data is lacking, solutions that rely solely on loss function design are incomplete; since the limitations of the physical imaging process make a real fused image unobtainable as a label, existing deep learning methods rely heavily on artificial priors and hand-made pseudo labels, which greatly limits algorithm performance; (3) solutions based on the traditional generative adversarial network can only make the result resemble a source image, i.e., the generation network is trained only with a pixel-level L1 loss, and owing to the Nash equilibrium, part of the high-frequency detail contained in the source images is lost.
Disclosure of Invention
To avoid the loss of spatial information during image fusion, protect the spatial texture structure of the MRI and PET images, and thereby preserve the texture and detail information of the high-resolution image and the structural information of the low-resolution image simultaneously, a PET-MRI image fusion method based on an adaptive generative adversarial network is provided, comprising the following steps:
a) Mapping the PET image from RGB space to YCbCr space and extracting the Y component;
b) Inputting the Y component of the PET image and the MRI image into a generation network;
c) Respectively extracting the joint gradient map of the input images and the gradient map of the generation network's output using the Laplacian operator;
d) Inputting the two gradient maps into the discrimination network, with the label of the real input set to a probability of 0.7-1.2 (soft label) and the label of the generated result set to 0-0.3 (soft label);
e) Training a generation network and a discrimination network based on the confrontation generation strategy;
f) Optimizing by adopting an Adam optimizer;
g) Obtaining a trained generation network model;
h) And predicting by using the network model.
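The soft-label scheme of step d) can be sketched as follows — a minimal NumPy illustration; the function name and the use of a seeded random generator are illustrative choices, not taken from the patent:

```python
import numpy as np

def soft_labels(batch_size, real, rng=None):
    """Soft discriminator targets: real inputs (the joint gradient map) get
    labels drawn from [0.7, 1.2]; generated inputs (the fused image's
    gradient map) get labels drawn from [0.0, 0.3]."""
    if rng is None:
        rng = np.random.default_rng(0)
    if real:
        return rng.uniform(0.7, 1.2, size=batch_size)
    return rng.uniform(0.0, 0.3, size=batch_size)
```

Soft labels like these are a common trick to keep the discriminator from becoming overconfident early in adversarial training.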
In connection with step a), a decorrelated YCbCr color model is used, which divides the image information into three channels: the Y channel, the Cb channel, and the Cr channel, representing the luminance component of the color and the blue and red chrominance-offset components, respectively. The Y channel stores the luminance information of the image, while the Cb and Cr channels store its blue-difference and red-difference color information. Therefore, in each image fusion iteration, only the Y-channel component of the PET image and the MRI image need to be processed, and both are grayscale images. The forward and inverse transformation equations are shown in formulas (1) and (2), respectively:
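The exact transform matrices of formulas (1) and (2) are not reproduced in this text, so the sketch below uses the standard full-range BT.601 (JPEG) YCbCr coefficients as a stand-in assumption:

```python
import numpy as np

# Assumed full-range BT.601 coefficients; the patent's own formulas (1)/(2)
# may differ in scaling or offset.
def rgb_to_ycbcr(rgb):
    """Map an RGB image (H, W, 3), float in [0, 1], to YCbCr channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycbcr):
    """Inverse transform from YCbCr back to RGB."""
    y = ycbcr[..., 0]
    cb = ycbcr[..., 1] - 0.5
    cr = ycbcr[..., 2] - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)
```

After fusion, the fused grayscale result replaces the Y channel and the PET image's original Cb/Cr channels are inverse-transformed with it to recover a color image.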
the framework of the adaptive countermeasure generation network mainly comprises a generator, a discriminator and a regional residual error learning module, wherein the network can fuse a low-resolution Y component (PET _ Y) of a PET image with a grayscale image MRI with higher spatial resolution to obtain a fused image comprising abundant structural information and higher spatial resolution; in order to simultaneously store the texture and detail information of the high-resolution image and the structure information of the low-resolution image, a mechanism for adjusting a loss function is used for optimizing a prediction result; the general architecture of the adaptive countermeasure generation network is shown in fig. 1; wherein, the adaptation of the network is derived from a decision block shown in a preprocessing stage before the input of the discriminator network in fig. 1, and the decision block can also be referred to as a maximum matching algorithm shown in formula (3);
the maximum matching algorithm process is as follows: inputting a PET _ Y component Y and an MRI image M, extracting a Laplace gradient image of the PET _ Y component Y and the MRI image M through a Laplace operator, comparing pixel values pixel by pixel, taking the pixel value in each pixel of the two images as a gradient pixel after fusion, and finally calculating to obtain a combined gradient image; the decision block can guide the fusion result to approach the brightness and gradient distribution of the source image, and the principle is to evaluate the definition of each pixel so as to generate a screening image with effective information positions.
The structure of the generator is shown in FIG. 2. The generator is a two-branch fusion network that processes the Y component of the PET image and the MRI image along two separate paths. Each branch first uses a group of 3×3 convolutional layers for feature extraction, then deepens the network for feature processing, and finally uses a group of 1×1 convolutional layers for reconstruction. The first convolutional layer extracts shallow features by equation (4):

where H_conv denotes the convolution operation, with a 5×5 kernel, in the shallow feature extraction layer. The second-layer output can be obtained by equation (5):

where H_LRLP is a composite function of the LRLP layer operations, and F_pre makes full use of the convolution of each layer in the block to generate the local features. The third-layer output is shown in equation (6):

where H_RL denotes a residual connection and λ is the weight used when fusing residuals. The fourth layer follows the same principle as the third; its input is the cascade of the outputs of the first three layers. The fourth-layer output is shown in equation (7):

The output features of each layer are then concatenated and fused using a 3×3 convolution, with output given by equation (8):

where H_concat denotes the feature-map concatenation operation. After concatenating the feature maps, the last layer of the extraction module is a 1×1 convolution, and W is the weight matrix of the first four layers of the fusion extraction module. The outputs of the two paths, F_ext,1 and F_ext,2, then enter the fusion module; after the fusion operation, the final fused image is obtained, as shown in formula (9):

where H_fuse denotes the composite operation of the fusion module.
The structure of the discriminator is shown in FIG. 3. There are two input sources: the two input images are passed through the Laplacian operator to compute gradient maps, and then the joint gradient map obtained through the maximum function, together with the gradient map of the fused image computed with the Laplacian operator, serve as the two inputs of the discriminator. Four convolutional layers and one linear layer form the discriminator of the model; the convolution kernel sizes are all set to 3×3, the stride is set to 4, and ELU is used as the activation function. The last layer is a linear layer used to compute a probability for judging whether the generated data is real.
Region residual learning module (LRLP):
in the forward transmission process of the convolutional neural network, with the increase of the network depth, the information contained in the feature map obtained by convolution is gradually reduced, and in order to solve the problems, the method uses a regional residual error learning module, and the features contained in each layer are stored as much as possible through the direct mapping of information among different layers; the LRLP module is as shown in FIG. 4, and firstly obtains image features of different depths by connecting different convolutions c times in series, then performs weight splicing on the features of different depths after convolution, then performs compression reconstruction by using a 1 × 1 convolution layer, and finally performs activation by using an ELU (element-free unit), so that the features contained in each layer are saved as much as possible; if there are c convolutional layers, then its final output, as shown in equation (10):
where F_c is the output of the c-th convolutional layer; H_concat denotes the feature-map concatenation function; W denotes a joint function of each convolutional layer's weights during concatenation; and H_active applies ELU activation to the concatenated data. In the LRLP block, the output of each previous layer is used as the input of the next layer, which not only preserves the feed-forward character but also improves the utilization of the input data.
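A minimal PyTorch sketch of the LRLP idea — serial convolutions, concatenation of all intermediate feature maps, 1×1 compression, ELU activation. The channel count and the number of serial convolutions c are arbitrary assumptions; the patent does not fix them in this passage:

```python
import torch
import torch.nn as nn

class LRLP(nn.Module):
    """Regional residual learning sketch: c serial 3x3 convolutions whose
    intermediate outputs are all kept, concatenated, compressed by a 1x1
    convolution, and passed through ELU."""
    def __init__(self, channels=32, c=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(c))
        # 1x1 convolution compresses the concatenated maps back to `channels`
        self.compress = nn.Conv2d(channels * (c + 1), channels, kernel_size=1)
        self.act = nn.ELU()

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # each layer's output feeds the next layer (feed-forward kept)
            feats.append(conv(feats[-1]))
        return self.act(self.compress(torch.cat(feats, dim=1)))
```

Because the input and every intermediate map reach the 1×1 layer directly, shallow features are not lost as depth grows, which is the stated purpose of the module.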
In step e), the discriminator defines the joint gradient map as real data and carries out continual adversarial learning against the gradient map of the fused image, which is defined as fake data; the objective function of the GAN is defined as shown in formula (11):
in implementing antagonism learning in GAN, a group of distinguishable semi-supervised loss functions are designed, which are different from the fixed loss functions of the traditional deep learning.
Generator loss function
The loss function of the generator is built from the adversarial loss, the pixel-level Euclidean loss, and the texture loss, as shown in equation (12):

where the first term is the adversarial loss from the generator-discriminator network; the second term is the pixel-level Euclidean loss optimized using the screening maps; the third term represents the texture loss based on the gradient map; and the weights on the pixel-level loss and the texture loss ensure that the three loss terms have the same importance.
loss of antagonism
For the image generated by the generator to be closer to the ideal fused image, a loss must be established between the generator and the discriminator. The traditional adversarial loss reduces to a min-max problem, but at the beginning of the training phase it may saturate, so a maximization formulation is used to train the generator network. To provide a stronger gradient, a squaring operation is added on top of the maximization operation; the adversarial loss is defined as shown in formula (13):
where M is the number of images in a batch during training; c is the label with which the discriminator distinguishes true from false images; the gradient symbol denotes computing a gradient map with the Laplacian operator; and M, Y denote the input MRI image and the Y channel of the PET image.
Pixel-level Euclidean loss
The invention uses the Euclidean distance between the pixels of the fused image and of the source images to constrain their intensity distributions in the sharp regions; the pixel-level Euclidean loss can be formulated as shown in formula (14):
where h, w index the pixel value at row h and column w; H, W are the height and width of the image, respectively; and Map_1, Map_2 denote the screening maps generated by the decision block from the two input images.
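The masked pixel-level loss of formula (14) can be sketched as below. The exact expression appears only as an image in the patent, so this form — an L2 distance between the fused image and each source, restricted by that source's screening map — is an assumption, as is the pairing of Map_1/Map_2 with the sources:

```python
import numpy as np

def pixel_euclidean_loss(fused, src1, src2, map1, map2):
    """Mean squared distance between the fused image and each source,
    weighted by that source's screening map (1 where the source is the
    sharper one at that pixel, 0 elsewhere)."""
    err1 = map1 * (fused - src1) ** 2
    err2 = map2 * (fused - src2) ** 2
    return float(np.mean(err1 + err2))
```

With this masking, each pixel of the fused image is pulled only toward the source that the decision block judged sharp at that location.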
texture loss
The gradient of an image partially characterizes its texture details, especially for sharp MRI images; the fused image is therefore required to have gradients similar to the input images. Combined with the screening maps, the texture loss can be formulated as shown in equation (15):
discriminator loss function
The invention designs a gradient-map-based loss function for the discriminator, in which the "false data" is the gradient map of the fused image, formulated as shown in equation (16):
the "true data" required by the discriminator comes from the joint gradient map of the MRI and PET _ y constructs, formulated as shown in equation (17):
where abs denotes the absolute-value function and maximum denotes the maximization function; based on the above two gradient maps, the loss function is expressed as shown in equation (18):
where a is the label of the "false data", set to 0, and b is the label of the "true data", set to 1, so that the discriminator treats the joint gradient map of the images as true data and the gradient map of the fused image as false data. This constraint guides the generator to adjust Grad_fused toward Grad_union, enhancing the texture of the fused image through the adversarial process.
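The discriminator objective described around formula (18) can be sketched as a least-squares loss. Since the formula itself appears only as an image, the exact expression here is an assumption; the squared form mirrors the squaring added to the adversarial loss in formula (13):

```python
import numpy as np

def discriminator_loss(d_fused_scores, d_joint_scores, a=0.0, b=1.0):
    """Push the discriminator's scores on the fused image's gradient map
    toward the false label a = 0, and its scores on the joint gradient map
    toward the true label b = 1."""
    return float(np.mean((d_fused_scores - a) ** 2)
                 + np.mean((d_joint_scores - b) ** 2))
```

A perfect discriminator (scoring fused gradients 0 and joint gradients 1) incurs zero loss; a fully fooled one (scoring both 0.5) incurs 0.5.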
The invention has the following beneficial effects:
by constructing an adaptive residual dense generative adversarial network and combining it with a YCbCr-based color space method, the invention provides a novel image fusion method. The method allows the generation network to avoid gradient vanishing and gradient explosion and improves the feature-extraction performance of the network; an adversarial game is played between the gradient map of the fused image and the joint gradient map of the input images, and the designed adversarial loss, discriminator loss, pixel-level consistency loss and gradient consistency loss are combined to obtain a fused image with rich details and clear texture. No ground-truth data is required as labels for training, and images of different resolutions can be fused without introducing a traditional framework, which greatly simplifies the fusion-rule design of traditional methods and achieves adaptive fusion without manual intervention; the fused image contains more high-frequency details and largely retains the pseudo-color content information. In tests on the Harvard Medical School MRI/PET dataset, the method achieves a peak signal-to-noise ratio PSNR = 55.2124, structural similarity SSIM = 0.4697, RMSE = 0.1968, Q abf = 0.3635 and Q cv = 2009.348, outperforming the current state-of-the-art algorithms and better assisting clinical diagnosis.
Drawings
FIG. 1 is a schematic diagram of the adaptive residual dense adversarial generation network;
FIG. 2 is a schematic diagram of a generator network;
FIG. 3 is a schematic diagram of a network of discriminators;
FIG. 4 is a schematic diagram of a region residual learning module (LRLP);
fig. 5 is a schematic diagram of the qualitative comparison of the proposed method with other internationally leading methods.
Detailed Description
The invention is further illustrated by the following specific examples.
Example 1
The PET and MRI images used in this example are from the public dataset on the Harvard Medical School website; the MRI images are single-channel images of size 256 × 256, and the PET images are pseudo-color images of size 256 × 256 × 3;
the generator and discriminator are trained iteratively according to an adversarial process, with the batch size set to b, k steps per training iteration, M training iterations, and the ratio of discriminator updates to generator updates set to p; through repeated tests the following settings were obtained: b = 32, p = 2, M = 300, and the parameters of the ADRGAN are updated with the Adam optimizer; to make GAN training more stable, soft labels are used for the loss terms: labels that would normally be set to 1 are set to a random number between 0.7 and 1.2;
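The alternating update schedule described above (p discriminator steps per generator step, k steps per iteration, M iterations) can be sketched as follows; `train_discriminator` and `train_generator` are hypothetical stand-ins for the actual update functions, and k = 100 is an assumed value:

```python
def adversarial_training(train_discriminator, train_generator, M=300, k=100, p=2):
    """Alternate p discriminator updates with one generator update,
    k steps per iteration, for M iterations."""
    for iteration in range(M):
        for step in range(k):
            for _ in range(p):
                train_discriminator()  # discriminator updated p times as often
            train_generator()
```

With p = 2 the discriminator sees exactly twice as many updates as the generator, the ratio the embodiment reports as most stable.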
the images are preprocessed from RGB channels into the YCbCr color space; since the Y channel (luminance channel) represents structural details and brightness variations, only the Y channel needs to be fused; the Cb and Cr channels are fused using a color-space-based approach, and the fused components are then inverse-transformed back into RGB channels; the experimental environment of this example is: Windows 10, CPU AMD R5 5600X, 16 GB memory, GPU RTX 3060 (6G); the software environment is Python 3.7.6 and PyTorch 1.10.0; the training, validation and test sets of the dataset are divided in the ratio 2:1, and the specific training process is shown as Algorithm 1:
quantitative evaluation index
The proposed method and the comparison methods are objectively evaluated using five evaluation indexes: Q abf, Q cv, PSNR, SSIM and RMSE. Q abf uses local metrics to estimate how much of the important input information is preserved in the fused image; the higher the value, the better the quality of the fused image, as shown in equation (19):
wherein W is used to divide local regions; λ(w) represents the local-region weight; A, B and F are the two input images and the fused image, respectively;
Q cv obtains the quality of each local-region image by computing the mean square error of the weighted difference image between the fused region image and the source region image; the final fused-image quality is the weighted sum of the local-region quality measures, as shown in equation (20):
wherein D is the local-region similarity measure function;
the peak signal-to-noise ratio (PSNR) is the ratio of peak power to noise power in the fused image and reflects the distortion of the fused image; it is calculated according to equations (21) to (24):
wherein MSE represents the mean square error, computed over the pixel at row i and column j of the image; R represents the peak value of the fused image; the larger the peak signal-to-noise ratio, the closer the fused image is to the source images;
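Equations (21)–(26) are omitted from the text; the standard MSE/RMSE/PSNR definitions they presumably follow can be sketched with NumPy (R = 255 is assumed as the peak value):

```python
import numpy as np

def mse(src, fused):
    # mean square error over all pixels (i, j)
    return np.mean((src.astype(np.float64) - fused.astype(np.float64)) ** 2)

def rmse(src, fused):
    # root mean square error, as in equation (26)
    return np.sqrt(mse(src, fused))

def psnr(src, fused, peak=255.0):
    # peak signal-to-noise ratio in dB; larger means less distortion
    return 10.0 * np.log10(peak ** 2 / mse(src, fused))
```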
the structural similarity (SSIM) models the loss and distortion of images; the index consists of three parts: correlation loss, contrast loss and luminance loss; the product of these three components is the evaluation result of the fused image, defined as follows:
wherein x and f represent a block in the source image and in the fused image, respectively; σ_xf is the covariance between the two blocks; σ_x and σ_f are the standard deviations (SD); u_x and u_f represent the means of the two blocks; in addition, constants C1, C2 and C3 are added to make the measure more stable;
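A single-window SSIM consistent with the three-component description can be sketched as follows (the usual stabilizing constants C1 = (k1·R)², C2 = (k2·R)² with C3 = C2/2 folded in are an assumption; the patent's exact equation is not reproduced):

```python
import numpy as np

def ssim_global(x, f, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between source block x and fused block f."""
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    ux, uf = x.mean(), f.mean()
    vx, vf = x.var(), f.var()
    cov = ((x - ux) * (f - uf)).mean()
    # luminance * contrast * structure, with C3 = C2/2 folded into one factor
    return ((2 * ux * uf + c1) * (2 * cov + c2)) / ((ux ** 2 + uf ** 2 + c1) * (vx + vf + c2))
```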
the root mean square error (RMSE) is based on the MSE; the difference between the source image and the fused image is quantified by computing the mean square error between the source image and the result image, as shown in formula (26):
quantitative and qualitative comparison results
In order to verify the effect of the model on PET-MRI image fusion and to verify its robustness, five methods (DDcGAN, DenseFuse, GCF, IFCNN and PMGI) are selected for comparison with the present model; these methods all perform well in traditional medical image fusion;
the visual comparison of the six methods is shown in fig. 5. The result of DDcGAN (third column) suffers from spectral distortion, and its edges are blurred compared with the present model; DenseFuse (fourth column) loses the color intensity of the PET image, losing part of the functional information and making lesions harder to locate; GCF (fifth column) preserves color well, but several of its images contain large noise blocks that directly destroy structural information; such information loss can mislead clinical judgment, indicating poor robustness; IFCNN (sixth column) blurs and loses details near boundaries and is not clear enough where textures are dense; PMGI (seventh column) preserves color intensity and functional information completely, but its background is blurred, high-frequency information is lost and texture details are absent. The fused image of the present method exhibits none of these problems: structural and functional information is well preserved, details are clear with high contrast, especially at edges and in dense-texture regions, and the image contains sufficient information for clinical diagnosis. Since most of these methods attempt to sharpen edges by direct object enhancement and gradients, their fused images still differ considerably in naturalness and realism; in addition, almost all of these methods rely on large datasets, whereas the structural content loss function and the adversarial loss function proposed by the invention protect the high-frequency information and the content information respectively, improving the fusion effect through their respective nonlinear loss constraints.
Qualitative evaluation of the fusion effect relies solely on subjective human visual perception and is therefore limited; in order to objectively verify the superiority of the invention, an objective evaluation mode is selected to quantitatively evaluate the experimental results, as shown in Table 1:
Experimental method | PSNR↑ | SSIM↑ | RMSE↓ | Q abf ↑ | Q cv ↓ |
---|---|---|---|---|---|
DDcGAN | 54.8162 | 0.3000 | 0.2146 | 0.1602 | 2534.607 |
DenseFuse | 55.1830 | 0.3628 | 0.1986 | 0.1368 | 2242.367 |
GCF | 54.4163 | 0.3347 | 0.2367 | 0.3401 | 2521.672 |
IFCNN | 54.4163 | 0.4160 | 0.2083 | 0.3516 | 2226.219 |
PMGI | 54.0151 | 0.1022 | 0.2581 | 0.0460 | 3469.525 |
OURS | 55.2124 | 0.4697 | 0.1968 | 0.3635 | 2009.348 |
It can be seen that each of the five indexes of the proposed method is superior to those of the other five comparison methods. The Q cv index is based on the human visual system (HVS) and regional mean square error; benefiting from the adaptive module, the model adaptively weights pixels, improving regional similarity, and Q cv is reduced by 20.7% compared with DDcGAN, proving that the result agrees better with human perception and has higher regional similarity than the other methods. The adversarial game gives the model excellent denoising ability, which improves the PSNR; compared with the other methods, the proposed method introduces less noise and interference information. The SSIM index mainly verifies structural information and is increased by 11.4% compared with the best competitor, IFCNN; the higher value indicates that the texture structure is preserved completely with few blurred regions, whereas the structural similarity of PMGI is only 21% of that of the proposed method, indicating poor structure preservation. Consistent with the qualitative comparison, the pixel-scale control strategy keeps the pixel-level Euclidean distance well controlled, and the pixel-level fusion index Q abf shows that the visual information is well preserved and the pixel-level difference between the fused image and the source images is small; the smaller RMSE indicates that the fused image of the invention has less error and distortion.
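The relative improvements quoted above can be checked directly against Table 1; the denominators that reproduce the quoted percentages (DDcGAN's score for the Q cv reduction, the proposed method's score for the SSIM figures) are shown in the comments:

```python
ours_qcv, ddcgan_qcv = 2009.348, 2534.607
qcv_reduction = (ddcgan_qcv - ours_qcv) / ddcgan_qcv   # ~0.207, i.e. the quoted 20.7%

ours_ssim, ifcnn_ssim, pmgi_ssim = 0.4697, 0.4160, 0.1022
ssim_gain = (ours_ssim - ifcnn_ssim) / ours_ssim       # ~0.114, i.e. the quoted 11.4%
pmgi_ratio = pmgi_ssim / ours_ssim                      # ~0.218, the quoted "about 21%"
```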
The foregoing detailed description is intended to illustrate rather than limit the invention; any changes and modifications that fall within the true spirit and scope of the invention are intended to be covered by the appended claims.
Claims (1)
1. A PET-MRI image fusion method based on an adaptive countermeasure generation network is characterized by comprising the following steps:
a) Mapping the PET image from RGB space to YCbCr space and extracting the Y component; the transformation equation is shown in equation (1), and the inverse transformation equation is shown in equation (2):
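Equations (1)–(2) are not reproduced in the text; a standard full-range BT.601 RGB/YCbCr conversion, which transformations of this kind typically follow (the exact coefficients here are an assumption), can be sketched as:

```python
import numpy as np

# full-range BT.601 coefficients (assumed; the patent's exact
# equations (1)-(2) are not reproduced in the text)
_FWD = np.array([[ 0.299,     0.587,     0.114   ],
                 [-0.168736, -0.331264,  0.5     ],
                 [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    """rgb: (..., 3) float array in [0, 255]; returns stacked Y, Cb, Cr."""
    ycbcr = rgb @ _FWD.T
    ycbcr[..., 1:] += 128.0          # center the chroma channels
    return ycbcr

def ycbcr_to_rgb(ycbcr):
    """Inverse transformation back to RGB channels."""
    shifted = ycbcr.copy()
    shifted[..., 1:] -= 128.0
    return shifted @ np.linalg.inv(_FWD).T
```

Only the Y plane produced by `rgb_to_ycbcr` is fed to the generator; Cb and Cr are fused separately and the result is passed through `ycbcr_to_rgb`.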
b) Inputting the Y component of the PET image and the MRI image into the generator; the generator is a dual-branch fusion network whose inputs are the Y component of the PET image and the MRI grayscale image, respectively; each branch of the dual-branch framework first uses a group of 3 × 3 convolutional layers to extract features, then deepens the network for feature processing, and finally uses a group of 1 × 1 convolutional layers for reconstruction; during feature processing, the deepened network adopts the region residual learning module LRLP, which obtains different image features through c different convolutions on different network branches, performs weighted concatenation of the convolved features, and finally applies ELU activation, so that the features contained in each layer are preserved as much as possible; this process is shown in formula (3):
wherein F_c is the output of the c-th convolutional layer; H_concat represents the concatenation function of the feature maps; W is a set of joint functions representing the weight of each convolutional layer during concatenation; H_active performs ELU activation on the concatenated data. In an LRLP block, the output of each preceding layer serves as the input of the next layer, preserving the feed-forward property and improving the utilization of the input data; the feature-extraction stage is then entered, and the shallow features are extracted by the first convolutional layer through formula (4):
wherein H_conv is a convolution operation with a 5 × 5 kernel in the shallow feature-extraction layer; the extracted shallow features are then sent to the next layer, and the second-layer output is obtained by equation (5):
wherein H_LRLP is the composite function of the LRLP-layer operations; F_pre fully utilizes the convolutions generated by each layer in the block and treats them as local features, after which further convolution achieves deep feature extraction; in the subsequent convolutional layers, the input is the cascade of all previous layers and the output of the LRLP module; meanwhile, parameter sharing is set between the two branches; the third-layer output is shown in equation (6):
wherein H_RL represents a residual connection and λ is the weight used when fusing the residuals; the fourth layer is the same as the third, its input being the cascade of the outputs of the first three layers; its output is shown in formula (7):
the output features of each layer are then concatenated and fused using a 3 × 3 convolution; the output is shown in formula (8):
wherein H_concat represents the concatenation operation of the feature maps; after concatenation, the last layer of the extraction module is set as a 1 × 1 convolution, and W is the weight matrix of the first four layers of the fusion-extraction module; the two branch outputs F_ext,1 and F_ext,2 then enter the fusion module, and the fused image is obtained after the fusion operation, as shown in formula (9):
wherein H_fuse represents the composite operation of the fusion module;
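The LRLP pattern of formula (3), namely c parallel convolutions, weighted concatenation, then ELU activation, can be sketched for a single-channel feature map (treating H_concat as stacking and W as per-map mixing weights, both assumptions about the exact operators):

```python
import numpy as np
from scipy.ndimage import convolve

def elu(x, alpha=1.0):
    # ELU activation: identity for x > 0, alpha*(e^x - 1) otherwise
    return np.where(x > 0, x, alpha * np.expm1(x))

def lrlp_block(x, kernels, mix_weights):
    """Sketch of formula (3): F = H_active(W * H_concat(conv_1(x), ..., conv_c(x)))."""
    feats = [convolve(x, k, mode='nearest') for k in kernels]       # c parallel convolutions
    stacked = np.stack(feats, axis=0)                               # H_concat
    mixed = np.tensordot(np.asarray(mix_weights), stacked, axes=1)  # weighted splice W
    return elu(mixed)                                               # H_active
```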
c) Using the Laplacian operator, respectively extracting the gradient maps of the fused image obtained in step b) and of the actually input Y component of the PET image and the MRI image from step a), and processing the latter two gradient maps with the decision block; the processing result of the decision block is shown in formula (10);
wherein abs represents the absolute-value function and maximum represents the element-wise maximization function; the decision-block processing procedure comprises: inputting the PET_Y component Y and the MRI image M, extracting their Laplacian gradient maps with the Laplacian operator, comparing the pixel values pixel by pixel, taking the larger of the two values at each pixel as the fused gradient pixel, and finally obtaining the joint gradient map;
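The decision block of formula (10) can be sketched directly, using SciPy's Laplacian as the gradient operator described above:

```python
import numpy as np
from scipy.ndimage import laplace

def joint_gradient_map(mri, pet_y):
    """Formula (10): per-pixel maximum of the absolute Laplacian gradients."""
    grad_m = np.abs(laplace(mri.astype(np.float64)))    # abs of Laplacian of M
    grad_y = np.abs(laplace(pet_y.astype(np.float64)))  # abs of Laplacian of Y
    return np.maximum(grad_m, grad_y)                   # element-wise maximum
```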
d) Inputting the gradient map of the fused image extracted in step c) and the computed joint gradient map into the discriminator, with the label probability of real inputs set to 0.7-1.2 and that of generated results set to 0-0.3; the discriminator consists of four convolutional layers and a linear layer; the convolution kernels are all set to 3 × 3 with stride 4, ELU is used as the activation function, and the last layer is a linear layer that computes the probability used to judge whether the generated data is real;
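With 3 × 3 kernels and stride 4 (zero padding assumed, as the text does not state it), the four convolutional layers reduce a 256 × 256 gradient map to a single spatial position before the linear layer; this can be checked with the usual output-size formula:

```python
def conv_out(n, kernel=3, stride=4, padding=0):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - kernel) // stride + 1

size = 256
for layer in range(4):
    size = conv_out(size)  # 256 -> 64 -> 16 -> 4 -> 1
```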
e) Training the generation network and the discrimination network based on the adversarial generation strategy; the discriminator defines the joint gradient map as real data and performs continuous adversarial learning against the gradient map of the fused image, which is defined as fake data; the objective function of the GAN is defined as shown in formula (11):
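Formula (11) is not reproduced; the standard GAN objective it presumably instantiates, with the joint gradient map as real data and the fused image's gradient map as fake data, is:

```latex
\min_G \max_D \; \mathbb{E}\big[\log D(Grad_{union})\big]
             + \mathbb{E}\big[\log\big(1 - D(Grad_{fused})\big)\big]
```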
the loss function of the generator is shown in equation (12):
wherein L_adv is the adversarial loss from the generator-discriminator network; λ_1 and λ_2 are the weights of the pixel-level loss and the texture loss, ensuring that the three loss terms carry comparable importance;
the loss function of the discriminator is shown in equation (16):
wherein a is the label of the "fake data", set to 0-0.3; the "fake data" is the gradient map of the fused image, formulated as equation (17):
b is the label of the "real data", set to 0.7-1.2; the "real data" comes from the joint gradient map of the MRI image and PET_y constructed by the decision block, formulated as equation (10);
f) Optimizing by adopting an Adam optimizer;
g) Obtaining the trained adversarial generation network model;
h) Predicting with the trained network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211094448.5A CN115457359A (en) | 2022-09-08 | 2022-09-08 | PET-MRI image fusion method based on adaptive countermeasure generation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115457359A true CN115457359A (en) | 2022-12-09 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116630762A (en) * | 2023-06-25 | 2023-08-22 | 山东卓业医疗科技有限公司 | Multi-mode medical image fusion method based on deep learning |
CN116630762B (en) * | 2023-06-25 | 2023-12-22 | 山东卓业医疗科技有限公司 | Multi-mode medical image fusion method based on deep learning |
CN116862789A (en) * | 2023-06-29 | 2023-10-10 | 广州沙艾生物科技有限公司 | PET-MR image correction method |
CN116862789B (en) * | 2023-06-29 | 2024-04-23 | 广州沙艾生物科技有限公司 | PET-MR image correction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||