CN114565816B - Multi-mode medical image fusion method based on global information fusion - Google Patents

Multi-mode medical image fusion method based on global information fusion

Info

Publication number
CN114565816B
CN114565816B · CN202210202366.1A
Authority
CN
China
Prior art keywords
fusion
module
nth
convolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210202366.1A
Other languages
Chinese (zh)
Other versions
CN114565816A (en)
Inventor
陈勋
张静
刘爱萍
张勇东
吴枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210202366.1A priority Critical patent/CN114565816B/en
Publication of CN114565816A publication Critical patent/CN114565816A/en
Application granted granted Critical
Publication of CN114565816B publication Critical patent/CN114565816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-modal medical image fusion method based on global information fusion, which comprises the following steps: 1. preprocessing the original multi-modal medical images by color space conversion and image cropping; 2. establishing modal branch networks that interact at multiple scales through fusion modules, and building a Transformer-based fusion module to combine multi-modal feature information; 3. establishing a reconstruction module that synthesizes a fused image from the multi-scale multi-modal features; 4. training and evaluating the model on a public data set; 5. performing the medical image fusion task with the trained model. Through the Transformer fusion module and the interactive modal branch networks, the invention can fully fuse multi-modal semantic information and achieve a fine-grained fusion effect, preserving the structure and texture information of the original images well while also alleviating the mosaic artifacts caused by low-resolution medical images.

Description

Multi-mode medical image fusion method based on global information fusion
Technical Field
The invention relates to the technical field of image fusion, in particular to a medical image fusion technology based on deep learning.
Background
Medical images help doctors better understand human structures and tissues, and are widely used in clinical applications such as disease diagnosis, treatment planning and surgical guidance. Because of differences in imaging mechanisms, medical images of different modalities differ in which human organs and tissues they emphasize. A single-modality medical image often cannot provide comprehensive and sufficient information, so doctors frequently need to examine several images at the same time to judge a condition accurately, which complicates diagnosis. Owing to these limitations of single-modality medical images, multi-modality medical image fusion is a necessary area of research. Multi-modality medical image fusion refers to combining the important information of medical images of different modalities of the same scene into one synthesized image.
In general, medical images can be divided into anatomical images and functional images. Anatomical images have high spatial resolution and can image organ anatomy clearly, but they cannot display functional changes in human metabolism; examples are computed tomography (Computed Tomography, CT) and magnetic resonance (Magnetic Resonance, MR) imaging. Functional images, conversely, display function and metabolism well but have low resolution and cannot accurately describe the anatomical details of organs; examples are positron emission tomography (Positron Emission Tomography, PET) and single-photon emission computed tomography (Single-Photon Emission Computed Tomography, SPECT). Even though CT and MR are both anatomical modalities and PET and SPECT are both functional modalities, the information they emphasize is not the same. CT mainly reflects the position of human bones and implants, while MR mainly provides clear detailed information about soft tissues and other structures. MR comprises multiple sequences that focus on sub-regions with different properties; the common sequences are T1-weighted (denoted T1), contrast-enhanced T1-weighted (denoted T1c), T2-weighted (denoted T2), and fluid-attenuated inversion recovery (denoted FLAIR). PET mainly reflects tumor function and metabolic information, while SPECT mainly provides blood-flow information of organs and tissues.
Most multi-modality medical image fusion methods can be summarized as three stages: feature extraction, fusion and reconstruction. To achieve medical image fusion, researchers at home and abroad have proposed a variety of algorithms over the last three decades. Broadly, these methods fall into two main categories: traditional fusion methods and deep-learning-based fusion methods.
In the traditional medical image fusion framework, researchers have proposed many decomposition or transformation methods to extract features of the source images, then select a fusion strategy to fuse the features, and finally apply the inverse transform to the fused features to reconstruct a fused image. Traditional methods can be divided into four classes according to the feature extraction scheme: (1) sparse-representation-based methods; (2) methods based on multi-scale decomposition, such as pyramids and wavelets; (3) subspace-based methods, such as independent component analysis; (4) salient-feature-based methods. Traditional medical image fusion methods achieve good fusion results, but they have shortcomings that limit further improvement of fusion performance. First, the fusion performance of traditional approaches relies heavily on hand-crafted features, which limits the generalization of these approaches to other fusion tasks. Second, different features may require different fusion strategies to work well. Third, for sparse-representation-based fusion methods, dictionary learning is relatively time-consuming, so synthesizing one fused image takes considerable time.
In recent years, deep-learning-based methods have become a new research focus in the field of image fusion. Deep learning models represented by the convolutional neural network (Convolutional Neural Network, CNN) and the generative adversarial network (Generative Adversarial Network, GAN) have been successfully applied to multi-focus and infrared-visible image fusion problems; they do not require manually defined features and fusion strategies, and thus show advantages over traditional fusion methods. However, because no reference image of the fusion result can be constructed for supervised learning, and because the complexity and diversity of human structures and tissues make the imaging characteristics of each modality hard to describe quantitatively, research on deep-learning-based medical image fusion is still relatively scarce and remains at an early stage.
It has been found that existing medical image fusion methods typically use either a manually defined fusion strategy or a convolution-based network to fuse multi-modal image features. However, such fusion strategies cannot effectively extract the global semantic information of the multi-modal images. In addition, current deep-learning-based medical image fusion methods make insufficient and imprecise use of multi-modal image information. Most methods use the multi-modality images in a simplistic manner, most commonly by stacking the original images of different modalities (or their separately extracted shallow features) along the channel dimension and then feeding them directly into a network model for fusion.
Disclosure of Invention
The invention provides a medical image fusion method based on global information fusion, which aims to combine the global information of multi-modal features through a self-attention mechanism and to exploit the information of different modalities as fully as possible through interactive modal branch networks, thereby achieving a high-quality medical image fusion effect.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the invention discloses a multi-mode medical image fusion method based on global information fusion, which is characterized by comprising the following steps:
Step one, acquiring original medical images of M different modalities and performing YCbCr color space conversion to obtain the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities, where I_m denotes the Y-channel image of the m-th modality, m ∈ {1, 2, ..., M}; cropping the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities to obtain the image block sets {S_1, ..., S_m, ..., S_M} of all modalities, where S_m denotes the image block set of the m-th modality;
Step two, constructing a fusion network Transfusion comprising M modal branch networks, N fusion modules and a reconstruction module, and inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the fusion network Transfusion:
step 2.1, constructing M modal branch networks and N fusion modules:
step 2.1.1, constructing M modal branch networks:
The m-th modal branch network among the M modal branch networks consists of N convolution modules, denoted ConvBlock_{m1}, ..., ConvBlock_{mn}, ..., ConvBlock_{mN}, where ConvBlock_{mn} denotes the n-th convolution module of the m-th modal branch network, n ∈ {1, 2, ..., N};
When n = 1, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of X_{mn} two-dimensional convolution layers;
When n = 2, 3, ..., N, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of one max-pooling layer and X_{mn} two-dimensional convolution layers;
The x-th two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network has convolution kernels of size ks_{mnx} × ks_{mnx} and kn_{mnx} convolution kernels, x ∈ {1, 2, ..., X_{mn}};
Step 2.1.2, constructing N fusion modules:
Any n-th fusion module among the N fusion modules is a Transformer network composed of L self-attention modules; the l-th self-attention module among the L self-attention modules comprises 1 multi-head attention layer, 2 layer normalizations and 1 fully connected layer;
Step 2.2, inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the M modal branch networks and performing information fusion through the N fusion modules:
When n = 1, the image block set S_m of the m-th modality is input into the n-th convolution module ConvBlock_{mn} of the m-th modal branch network and, after its X_{mn} two-dimensional convolution layers, a feature map of size H_n × W_n × D_n is output, where H_n, W_n and D_n denote the height, width and number of channels of the feature map output by the m-th modal branch network at the n-th convolution module; the output feature maps of the M modal branch networks at the n-th convolution module are thus obtained;
The output feature maps of the M modal branch networks at the n-th convolution module are processed by the n-th fusion module, which outputs M fused feature maps, the m-th of which is the m-th feature map output by the n-th fusion module;
The m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network, yielding the feature map of the m-th modal branch network after interaction at the n-th convolution module; the feature maps of the M modal branch networks after interaction at the n-th convolution module are thus obtained;
When n = 2, 3, ..., N, the feature map of the m-th modal branch network after interaction at the (n-1)-th convolution module is downsampled by the max-pooling layer of the n-th convolution module of the m-th modal branch network, yielding the downsampled feature map of the n-th convolution module of the m-th modal branch network; the downsampled feature map is input into the first two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network and processed successively by the X_{mn} two-dimensional convolution layers, which output a feature map; the feature maps output by the n-th convolution modules of the M modal branch networks are thus obtained; these feature maps are processed by the n-th fusion module, which outputs M fused feature maps; the m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network to obtain the added feature map, so that M added feature maps are obtained; proceeding in this way up to n = N yields the feature maps output by the N-th fusion module;
Step 2.3, the reconstruction module is composed of an N-level convolution network; and outputting the characteristic graphs of the N fusion modulesInputting the primary fusion image F 'into the reconstruction module to obtain a primary fusion image F':
step 2.3.1, outputting all feature graphs output by the nth fusion moduleAdding to obtain a fusion characteristic diagram phi n The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining fusion characteristic diagrams { phi } of N fusion modules 1 ,...,Φ n ,...,Φ N };
Step 2.3.2, constructing a reconstruction module formed by an N-level convolution network, and fusing a feature map phi of an nth fusion module n N-th stage convolutional network input to reconstruction module:
when n=1, the nth stage convolutional network includes: b (B) n Each convolution module RConvBlock n1 ,...,RConvBlock nb ,...,
When n=2, 3, N, the nth level convolutional network includes: b (B) n Each convolution module RConvBlock n1 ,...,RConvBlock nb ,...,And B n +1 upsampling layers Upsample n1 ,...,UpSample nb ,...,The b-th convolution module RConvBlock of the nth level convolution network nb Consists of Y two-dimensional convolutional layers, B e {1, 2., B n };
When n=1 and b=1, the fusion profile Φ of the nth fusion module is calculated n A b-th convolution module RConvBlock input to the n-th level convolution network nb And outputs a characteristic map ΦR nb
When n=2, 3,..n and b=1, the fusion profile Φ of the nth fusion module is taken n B-th up-sampling layer Upsample input to nth stage convolutional network nb After the up-sampling process, an up-sampled characteristic diagram phi U is output nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the up-sampled characteristic diagram { phi U (phi) of the convolution network from the level 2 to the level N-1 2b ,...,ΦU nb ,...,ΦU Nb };
When n=2, 3, N and b=2, 3, B n When in use, the fusion characteristic diagram phi of the nth fusion module is obtained n Output characteristic diagram { ΦR of front b-1 convolution modules of nth-stage convolution network n1 ,...,ΦR n(b-1) The first b up-sampled feature maps { ΦU } of the n+1st level convolutional network (n+1)1 ,...,ΦU (n+1)b After splicing, obtaining a spliced characteristic diagram; inputting the spliced characteristic diagram to a b-th convolution module RConvBlock of an n-th level convolution network nb And outputs an output characteristic diagram ΦR of a b-th convolution module of the nth-stage convolution network nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the B-th of the level 1 convolutional network 1 Output feature map of each convolution module
B of the level 1 convolutional network 1 Output feature map of each convolution moduleAfter processing of a convolution layer, a primary fusion image F' is obtained;
Step three, constructing a loss function and training the network to obtain an optimal fusion model:
Step 3.1, computing the entropy of each image block set among the image block sets {S_1, ..., S_m, ..., S_M} of all modalities to obtain the corresponding entropy values {e_1, ..., e_m, ..., e_M}, where e_m denotes the entropy value of the image block set of the m-th modality;
Step 3.2, normalizing the entropy values {e_1, ..., e_m, ..., e_M} to obtain the weights {ω_1, ..., ω_m, ..., ω_M} of the image block sets {S_1, ..., S_m, ..., S_M} of all modalities, where ω_m denotes the weight of the image block set of the m-th modality;
Step 3.3, constructing the total loss function Loss by formula (1):
Loss = Σ_{m=1}^{M} ω_m · L_ssim(S_m, F′)   (1)
In formula (1), L_ssim(S_m, F′) denotes the structural similarity loss function between the image block set S_m of the m-th modality and the preliminary fused image F′;
Step 3.4, minimizing the total loss function Loss with an optimizer so as to optimize all parameters of the fusion network Transfusion and obtain the optimal fusion model;
Step four, processing the Y-channel images {I_1, I_2, ..., I_M} of all modalities with the optimal fusion model and outputting the preliminary fused image F′; the preliminary fused image F′ is converted into the RGB color space to obtain the final fused image F.
The multi-modal medical image fusion method based on global information fusion is further characterized in that the n-th fusion module in step 2.2 performs its processing as follows:
Step 2.2.1, the n-th fusion module concatenates and flattens the feature maps output by the n-th convolution modules of the M modal branch networks into a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the input feature vector of the n-th fusion module;
Step 2.2.2, when l = 1, the 1st self-attention module of the n-th fusion module linearly maps the input feature vector to obtain three matrices Q_{nl}, K_{nl}, V_{nl}, and then computes the multi-head attention result Z_{nl} among Q_{nl}, K_{nl}, V_{nl}; the multi-head attention result Z_{nl} is input into the fully connected layer of the l-th self-attention module of the n-th fusion module to obtain the output sequence vector of the l-th self-attention module of the n-th fusion module;
When l = 2, 3, ..., L, the output sequence vector of the (l-1)-th self-attention module of the n-th fusion module is input into the l-th self-attention module of the n-th fusion module to obtain the output sequence vector of the l-th self-attention module of the n-th fusion module; the output sequence vector of the L-th self-attention module of the n-th fusion module is thus obtained;
Step 2.2.3, the output sequence vector is split into M modalities, and each part is reshaped to size H_n × W_n × D_n to obtain the output feature maps of the n-th fusion module.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides an unsupervised anatomical and functional image fusion method. The method introduces a Transformer structure as the fusion strategy; the Transformer uses its self-attention mechanism to combine the global information of the multi-modal medical images and to fully fuse the multi-modal semantic information, thereby achieving a fine-grained fusion effect. The invention not only preserves the structure and texture information of the anatomical image well, but also alleviates the mosaic artifacts caused by the low resolution of the functional image.
2. The present invention proposes a modal branching network that interacts on multiple scales. The network can extract multi-scale complementary characteristics of each modal image and fully utilizes multi-modal image information. The interactive branching network enhances the anatomic and functional image fusion effect.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention, wherein "ks×ks, kn" represents one convolution layer with a kernel size of ks×ks and a kernel number of kn;
FIG. 2 is a block diagram of a modal branching network and a fusion module provided by an embodiment of the present invention;
fig. 3 is a structural diagram of a reconstruction module according to an embodiment of the present invention.
Detailed Description
In this embodiment, a multi-mode image fusion method based on global information fusion, as shown in fig. 1, includes the following steps:
Step one, acquiring original medical images of M different modalities and performing color space conversion and image cropping preprocessing to obtain the preprocessed image block sets {S_1, S_2, ..., S_M} of all modalities, where S_m denotes the image block set of the m-th modality, m ∈ {1, 2, ..., M}:
Step 1.1, acquiring the original medical images of the required modalities from the Harvard medical image dataset website (http://www.med.harvard.edu/AANLIB/home.html); this embodiment collects medical images of M = 2 modalities from the public dataset, comprising 279 pairs of MR-T1 and PET images and 318 pairs of MR-T2 and SPECT images, where MR-T1 and MR-T2 are gray-scale anatomical images with 1 channel, and PET and SPECT are functional images in the RGB color space with 3 channels;
Step 1.2, converting the images in the RGB color space into the YCbCr space according to formula (1):
In formula (1), R, G and B are the three channels of the RGB color space, Y is the luminance channel, and Cb and Cr are the two chrominance channels;
Step 1.3, in order to enlarge the number of samples, the gray-scale images and the Y-channel images are cropped into image blocks to obtain the image block sets {S_1, S_2, ..., S_M}; in this embodiment, the size of the cropped image blocks is 64 × 64;
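A minimal sketch of this preprocessing step is given below for reference only. The ITU-R BT.601 conversion coefficients and the non-overlapping patch extraction are assumptions, since the body of formula (1) and the cropping stride are not reproduced above.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image in [0, 255] to YCbCr.

    BT.601 coefficients are assumed; the patent only states that formula (1)
    maps R, G, B to a luminance channel Y and two chrominance channels Cb, Cr.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def extract_patches(img: np.ndarray, size: int = 64, stride: int = 64) -> np.ndarray:
    """Crop a 2-D image (Y channel or gray-scale) into size x size blocks."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size]
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return np.stack(patches)

# Example: a functional image (PET/SPECT) contributes only its Y channel,
# while a gray-scale anatomical image (MR) is used directly.
pet_rgb = np.random.randint(0, 256, (256, 256, 3)).astype(np.float32)
mr_gray = np.random.rand(256, 256).astype(np.float32) * 255
S_pet = extract_patches(rgb_to_ycbcr(pet_rgb)[..., 0])   # image block set S_1
S_mr  = extract_patches(mr_gray)                          # image block set S_2
print(S_pet.shape, S_mr.shape)                            # (16, 64, 64) each
```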
Step two, constructing a fusion network Transfusion comprising M modal branch networks, N fusion modules and a reconstruction module, and inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the fusion network Transfusion:
Step 2.1, constructing M modal branch networks and N fusion modules:
Step 2.1.1, constructing M modal branch networks:
The m-th modal branch network among the M modal branch networks consists of N convolution modules, denoted ConvBlock_{m1}, ..., ConvBlock_{mn}, ..., ConvBlock_{mN}, where ConvBlock_{mn} denotes the n-th convolution module of the m-th modal branch network, n ∈ {1, 2, ..., N};
When n = 1, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of X_{mn} two-dimensional convolution layers;
When n = 2, 3, ..., N, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of one max-pooling layer and X_{mn} two-dimensional convolution layers;
The x-th two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network has convolution kernels of size ks_{mnx} × ks_{mnx} and kn_{mnx} convolution kernels, x ∈ {1, 2, ..., X_{mn}};
In this embodiment, N = 4, the kernel size of all max-pooling layers is 2 × 2 with stride 2, and X_{mn}, ks_{mnx} and kn_{mnx} are as shown in Fig. 2;
Step 2.1.2, constructing N fusion modules:
Any n-th fusion module among the N fusion modules is a Transformer network composed of L self-attention modules; the l-th self-attention module among the L self-attention modules comprises 1 multi-head attention layer, 2 layer normalizations and 1 fully connected layer; in this embodiment, L = 1;
Step 2.2, inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the M modal branch networks and performing information fusion through the N fusion modules:
When n = 1, the image block set S_m of the m-th modality is input into the n-th convolution module ConvBlock_{mn} of the m-th modal branch network and, after its X_{mn} two-dimensional convolution layers, a feature map of size H_n × W_n × D_n is output, where H_n, W_n and D_n denote the height, width and number of channels of the feature map output by the m-th modal branch network at the n-th convolution module; the output feature maps of the M modal branch networks at the n-th convolution module are thus obtained. In this embodiment, (H_1, W_1, D_1) = (64, 64, 64), (H_2, W_2, D_2) = (32, 32, 128), (H_3, W_3, D_3) = (16, 16, 256), (H_4, W_4, D_4) = (8, 8, 512);
The output feature maps of the M modal branch networks at the n-th convolution module are processed by the n-th fusion module according to formula (2), which outputs M fused feature maps, the m-th of which is the m-th feature map output by the n-th fusion module:
In formula (2), Transformer_n denotes the n-th fusion module, which is realized by the following steps:
Step 2.2.1, the n-th fusion module concatenates and flattens the feature maps output by the n-th convolution modules of the M modal branch networks into a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the input feature vector of the n-th fusion module;
Step 2.2.2 the self-attention mechanism module of the nth fusion module takes the feature vectorLinear mapping to three matrices, Q n ,K n ,V n
In the formula (3), the amino acid sequence of the compound,is a trainable matrix with the size of D n ×D n
Step 2.2.3, Q n ,K n ,V n Respectively dividing into h heads to obtaini e {1, 2..h }, and then calculating the multi-head attention according to equation (4) -equation (6), resulting in the result Z:
in the formula (5), concat represents a splicing operation,is a trainable matrix; in formula (6), layerNorm represents layer normalization;
Step 2.2.4, the multi-head attention result Z_n is input into the fully connected layer according to formula (7) to obtain the output sequence vector of the n-th fusion module:
In formula (7), MLP denotes the fully connected layer;
Step 2.2.5, the output sequence vector is split into M modalities, and each part is reshaped to size H_n × W_n × D_n to obtain the output feature maps of the n-th fusion module, the m-th of which is the m-th feature map output by the n-th fusion module;
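Steps 2.2.1 through 2.2.5 correspond to a standard Transformer encoder layer applied to the concatenated, flattened multi-modal features. The PyTorch sketch below is an illustration under stated assumptions: the number of heads h, the MLP width, the GELU activation and the exact placement of the two layer normalizations are not specified in the text and are chosen here as common defaults.

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Transformer fusion module at one scale (L = 1 self-attention module).

    Flattens the M modal feature maps into a sequence of length M*H_n*W_n,
    adds a trainable position vector, applies multi-head self-attention with
    two layer normalizations and a fully connected (MLP) layer, and reshapes
    the result back into M modality-specific feature maps.
    """
    def __init__(self, dim, seq_len, heads=8, mlp_ratio=2):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, seq_len, dim))  # trainable vector of the same size
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim),
                                 nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))

    def forward(self, feats):                      # feats: list of M tensors, B x D x H x W
        B, D, H, W = feats[0].shape
        seq = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)  # B x (M*H*W) x D
        seq = seq + self.pos
        z = self.norm1(seq + self.attn(seq, seq, seq, need_weights=False)[0])  # attention + residual
        z = self.norm2(z + self.mlp(z))                                         # MLP + residual
        out = z.split(H * W, dim=1)                # back to M modality-specific sequences
        return [o.transpose(1, 2).reshape(B, D, H, W) for o in out]

if __name__ == "__main__":
    # scale n = 4 of the embodiment: (H_4, W_4, D_4) = (8, 8, 512), M = 2
    fm = FusionModule(dim=512, seq_len=2 * 8 * 8)
    f1, f2 = torch.randn(1, 512, 8, 8), torch.randn(1, 512, 8, 8)
    out1, out2 = fm([f1, f2])
    print(out1.shape, out2.shape)
```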
The m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network, yielding the feature map of the m-th modal branch network after interaction at the n-th convolution module; the feature maps of the M modal branch networks after interaction at the n-th convolution module are thus obtained;
When n = 2, 3, ..., N, the feature map of the m-th modal branch network after interaction at the (n-1)-th convolution module is downsampled by the max-pooling layer of the n-th convolution module of the m-th modal branch network, yielding the downsampled feature map of the n-th convolution module of the m-th modal branch network; the downsampled feature map is input into the first two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network and processed successively by the X_{mn} two-dimensional convolution layers, which output a feature map; the feature maps output by the n-th convolution modules of the M modal branch networks are thus obtained; these feature maps are processed by the n-th fusion module, which outputs M fused feature maps; the m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network to obtain the added feature map, so that M added feature maps are obtained; proceeding in this way up to n = N yields the feature maps output by the N-th fusion module;
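Putting step 2.2 together, the branch networks and the fusion modules can be made to interact as sketched below. This sketch is not self-contained: it composes the hypothetical ModalBranch and FusionModule classes from the earlier sketches, and the element-wise addition of the fused map back onto each branch feature is the interaction described in this step.

```python
import torch
import torch.nn as nn

class InteractiveEncoder(nn.Module):
    """M modal branches interacting through a shared fusion module at every scale.

    At scale n, each branch output is processed by the fusion module; the
    returned modality-specific fused map is added back to the branch feature
    before it enters the next (pooled) convolution module.
    """
    def __init__(self, branches, fusions):
        super().__init__()
        self.branches = nn.ModuleList(branches)   # M ModalBranch instances
        self.fusions = nn.ModuleList(fusions)     # N FusionModule instances

    def forward(self, images):                    # list of M tensors, B x 1 x H x W
        feats = images
        phis = []
        for n in range(len(self.fusions)):
            # branch convolution module n for every modality
            feats = [self.branches[m].blocks[n](feats[m]) for m in range(len(feats))]
            # global fusion across modalities at this scale
            fused = self.fusions[n](feats)
            # fused feature map of scale n (step 2.3.1): sum over modalities
            phis.append(torch.stack(fused, dim=0).sum(dim=0))
            # interaction: add the fused maps back onto the branch features
            feats = [f + g for f, g in zip(feats, fused)]
        return phis                               # inputs to the reconstruction module
```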
Step 2.3, constructing a reconstruction module formed by an N-level convolution network, and outputting characteristic graphs of the N fusion modulesInputting the primary fusion image F 'into a reconstruction module to obtain a primary fusion image F':
step 2.3.1, outputting all feature graphs output by the nth fusion moduleAdding to obtain a fusion characteristic diagram phi n The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining fusion characteristic diagrams { phi } of N fusion modules 1 ,...,Φ n ,...,Φ N };
Step 2.3.2, constructing a reconstruction module formed by an N-level convolution network, and fusing a feature map phi of an nth fusion module n N-th stage convolutional network input to reconstruction module:
when n=1, the nth stage convolutional network includes: b (B) n Multiple convolution modesBlock RConvBlock n1 ,RConvBlock n2 ,...,
When n=2, 3, N, the nth level convolutional network includes: b (B) n Each convolution module RConvBlock n1 ,RConvBlock n2 ,...,And B n +1 upsampling layers Upsample n1 ,UpSample n2 ,...,The b-th convolution module RConvBlock of the nth level convolution network nb Consists of Y two-dimensional convolutional layers, B e {1, 2., B n };
When n=1 and b=1, the fusion profile Φ of the nth fusion module is taken n The b-th convolution module RConvBlock input to the nth stage convolution network nb And outputs a characteristic map ΦR nb
When n=2, 3,..n and b=1, the fusion profile Φ of the nth fusion module is taken n B-th up-sampling layer Upsample input to nth stage convolutional network nb After the up-sampling process, an up-sampled characteristic diagram phi U is output nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the up-sampled characteristic diagram { phi U (phi) of the convolution network from the level 2 to the level N-1 2b ,...,ΦU nb ,...,ΦU Nb };
When n=2, 3, N and b=2, 3, B n When in use, the fusion characteristic diagram phi of the nth fusion module is obtained n Output characteristic diagram { ΦR of front b-1 convolution modules of nth-stage convolution network n1 ,...,ΦR n(b-1) The first b up-sampled feature maps { ΦU } of the n+1st level convolutional network (n+1)1 ,...,ΦU (n+1)b After splicing, obtaining a spliced characteristic diagram; inputting the spliced characteristic diagram to a b-th convolution module RConvBlock of an n-th level convolution network nb And outputs the b-th of the nth stage convolutional networkOutput characteristic diagram phi R of convolution module nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the B-th of the level 1 convolutional network 1 Output feature map of each convolution module
B of the level 1 convolutional network 1 Output feature map of each convolution moduleAfter processing of a convolution layer, a primary fusion image F' is obtained;
in this embodiment, the reconstruction module is shown in fig. 3, where y=2, b 1 =3,B 2 =2,B 3 =1,B 4 =0;
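For orientation, a deliberately simplified decoder sketch is given below: it only upsamples the deepest fused map and progressively concatenates it with the shallower fused maps, whereas the actual embodiment of Fig. 3 uses the denser multi-branch wiring with B_1 = 3, B_2 = 2, B_3 = 1, B_4 = 0. The bilinear upsampling and 3 × 3 kernels are assumptions.

```python
import torch
import torch.nn as nn

class RConvBlock(nn.Module):
    """One reconstruction convolution module (Y = 2 two-dimensional convolutions)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class SimpleReconstruction(nn.Module):
    """Simplified stand-in for the N-level reconstruction network of Fig. 3.

    Each deeper fused map is upsampled and concatenated with the next
    shallower fused map before a convolution module; a final convolution
    layer produces the preliminary fused image F'.
    """
    def __init__(self, widths=(64, 128, 256, 512)):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.blocks = nn.ModuleList(
            RConvBlock(widths[i] + widths[i + 1], widths[i]) for i in range(len(widths) - 1))
        self.head = nn.Conv2d(widths[0], 1, kernel_size=1)

    def forward(self, phis):                 # phis[n]: fused map of scale n+1, shallow -> deep
        x = phis[-1]
        for i in range(len(phis) - 2, -1, -1):
            x = self.blocks[i](torch.cat([phis[i], self.up(x)], dim=1))
        return self.head(x)                  # preliminary fused image F'

if __name__ == "__main__":
    phis = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32),
            torch.randn(1, 256, 16, 16), torch.randn(1, 512, 8, 8)]
    print(SimpleReconstruction()(phis).shape)   # torch.Size([1, 1, 64, 64])
```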
Step three, constructing a loss function and training a network to obtain an optimal fusion model:
Step 3.1, computing the entropy of each image block set among the image block sets {S_1, S_2, ..., S_M} of all modalities according to formulas (8)-(9) to obtain the corresponding entropy values {e_1, e_2, ..., e_M}, where e_m denotes the entropy value of the image block set of the m-th modality:
e_m = Entropy(S_m)   (8)
Entropy(S_m) = −Σ_l p_l · log2(p_l)   (9)
In formula (9), p_l is the probability of the l-th gray value;
Step 3.2, normalizing the entropy values {e_1, e_2, ..., e_M} according to formula (10) to obtain the weights {ω_1, ω_2, ..., ω_M} of the image block sets {S_1, S_2, ..., S_M} of all modalities, where ω_m denotes the weight of the image block set of the m-th modality:
In formula (10), η is a temperature parameter; in this embodiment, η = 1;
Step 3.3, constructing the total loss function Loss by formula (11):
Loss = ω_1 · L_ssim(S_1, F′) + ω_2 · L_ssim(S_2, F′)   (11)
L_ssim(S_j, F′) = 1 − SSIM(S_j, F′)   (12)
In formula (11), L_ssim(S_m, F′) denotes the structural similarity loss function between the image block set S_m of the m-th modality and the preliminary fused image F′; in formula (12), SSIM is the structural similarity function;
Step 3.4, minimizing the total loss function Loss with an AdamW optimizer so as to optimize all parameters of the fusion network Transfusion and obtain the optimal fusion model;
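The training objective of step three can be sketched as follows. This is an illustration under assumptions: the SSIM here is a simplified global (single-window) variant rather than the locally windowed SSIM of formula (12), and the softmax-with-temperature normalization stands in for formula (10), whose body is not reproduced above.

```python
import numpy as np
import torch

def entropy(patches: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of a set of image patches in [0, 1] (cf. formulas (8)-(9)), in bits."""
    hist, _ = np.histogram(patches, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def modality_weights(entropies, eta: float = 1.0) -> torch.Tensor:
    """Normalize entropies into weights ω_m; a softmax with temperature η is assumed."""
    return torch.softmax(torch.tensor(entropies) / eta, dim=0)

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global SSIM between two image batches in [0, 1] (simplified stand-in for formula (12))."""
    mx, my = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
    vx, vy = x.var(dim=(1, 2, 3)), y.var(dim=(1, 2, 3))
    cov = ((x - mx[:, None, None, None]) * (y - my[:, None, None, None])).mean(dim=(1, 2, 3))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def fusion_loss(sources, fused, weights):
    """Loss = Σ_m ω_m · (1 - SSIM(S_m, F')), cf. formulas (11)-(12)."""
    return sum(w * (1.0 - ssim(s, fused)).mean() for w, s in zip(weights, sources))

if __name__ == "__main__":
    s1, s2 = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
    w = modality_weights([entropy(s1.numpy()), entropy(s2.numpy())], eta=1.0)
    fused = torch.rand(4, 1, 64, 64, requires_grad=True)   # placeholder for the network output F'
    opt = torch.optim.AdamW([fused], lr=1e-4)
    loss = fusion_loss([s1, s2], fused, w)
    loss.backward()
    opt.step()
    print(float(loss))
```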
Step four, processing the Y-channel images or gray-scale images {I_1, I_2, ..., I_M} with the optimal fusion model and outputting the preliminary fused image F′; the preliminary fused image F′ is spliced with the Cb and Cr channels and converted into the RGB color space to obtain the final fused image F;
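For completeness, the inverse step can be sketched as below: the fused luminance F′ replaces the Y channel of the functional image before conversion back to RGB. The BT.601 inverse coefficients are again an assumption, matching the preprocessing sketch above.

```python
import numpy as np

def ycbcr_to_rgb(ycbcr: np.ndarray) -> np.ndarray:
    """Inverse of the BT.601-style conversion assumed in the preprocessing sketch."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 128.0, ycbcr[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

def merge_fused_luminance(fused_y: np.ndarray, functional_ycbcr: np.ndarray) -> np.ndarray:
    """Splice the preliminary fused image F' (new Y channel) with the Cb/Cr channels
    of the functional image, then convert back to RGB to obtain the final fused image F."""
    out = functional_ycbcr.copy()
    out[..., 0] = fused_y
    return ycbcr_to_rgb(out)
```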
step five, evaluating the performance of the invention:
In the specific implementation, the invention is compared with the traditional method CSMCA and the deep learning methods DDcGAN and EMFusion. In addition, to illustrate the effectiveness of the Transformer-based fusion module and the interactive modal branch networks of the invention, two comparison experiments are set up: the first experiment removes the Transformer fusion module, and the second experiment replaces the interactive modal branch networks with weight-shared modal branch networks. Mutual information, the average gradient, the edge-based similarity measure Q^AB/F and the visual perception index Q_CV are used as evaluation indices; the larger the mutual information, average gradient and Q^AB/F, and the smaller Q_CV, the higher the quality of the fused image. The average fusion quality over 30 pairs of MR-T1 and PET test images and 30 pairs of MR-T2 and SPECT test images is shown in the following table:
TABLE 1 fusion performance of different methods
The experimental results show that the invention is optimal on all four indices: mutual information, average gradient, Q^AB/F and Q_CV. The Transformer fusion module of the invention contributes an improvement of 5.10%-10.02% in mutual information, 2.59%-5.28% in average gradient, 3.04%-4.36% in Q^AB/F and 1.43%-12.66% in Q_CV; the interactive modal branch networks of the invention contribute an improvement of 18.39%-19.91% in mutual information, 1.06%-6.69% in average gradient, 7.68%-11.02% in Q^AB/F and 27.69%-62.22% in Q_CV.
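Two of the reported indices can be computed with short NumPy routines. The implementations below (histogram-based mutual information and the average gradient) are common formulations given only as a reference; the patent does not specify the exact estimators, and Q^AB/F and Q_CV require the longer reference implementations from the literature.

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 256) -> float:
    """Histogram-based mutual information between a source image and the fused image (bits)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def average_gradient(img: np.ndarray) -> float:
    """Average gradient of the fused image, a common sharpness/texture measure."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```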

Claims (2)

1. A multimode medical image fusion method based on global information fusion is characterized by comprising the following steps:
Step one, acquiring original medical images of M different modalities and performing YCbCr color space conversion to obtain the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities, where I_m denotes the Y-channel image of the m-th modality, m ∈ {1, 2, ..., M}; cropping the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities to obtain the image block sets {S_1, ..., S_m, ..., S_M} of all modalities, where S_m denotes the image block set of the m-th modality;
Step two, constructing a fusion network Transfusion comprising M modal branch networks, N fusion modules and a reconstruction module, and inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the fusion network Transfusion:
step 2.1, constructing M modal branch networks and N fusion modules:
step 2.1.1, constructing M modal branch networks:
The m-th modal branch network among the M modal branch networks consists of N convolution modules, denoted ConvBlock_{m1}, ..., ConvBlock_{mn}, ..., ConvBlock_{mN}, where ConvBlock_{mn} denotes the n-th convolution module of the m-th modal branch network, n ∈ {1, 2, ..., N};
When n = 1, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of X_{mn} two-dimensional convolution layers;
When n = 2, 3, ..., N, the n-th convolution module ConvBlock_{mn} of the m-th modal branch network consists of one max-pooling layer and X_{mn} two-dimensional convolution layers;
The x-th two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network has convolution kernels of size ks_{mnx} × ks_{mnx} and kn_{mnx} convolution kernels, x ∈ {1, 2, ..., X_{mn}};
Step 2.1.2, constructing N fusion modules:
Any n-th fusion module among the N fusion modules is a Transformer network composed of L self-attention modules; the l-th self-attention module among the L self-attention modules comprises 1 multi-head attention layer, 2 layer normalizations and 1 fully connected layer;
Step 2.2, inputting the image block sets {S_1, ..., S_m, ..., S_M} of all modalities into the M modal branch networks and performing information fusion through the N fusion modules:
When n = 1, the image block set S_m of the m-th modality is input into the n-th convolution module ConvBlock_{mn} of the m-th modal branch network and, after its X_{mn} two-dimensional convolution layers, a feature map of size H_n × W_n × D_n is output, where H_n, W_n and D_n denote the height, width and number of channels of the feature map output by the m-th modal branch network at the n-th convolution module; the output feature maps of the M modal branch networks at the n-th convolution module are thus obtained;
The output feature maps of the M modal branch networks at the n-th convolution module are processed by the n-th fusion module, which outputs M fused feature maps, the m-th of which is the m-th feature map output by the n-th fusion module;
The m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network, yielding the feature map of the m-th modal branch network after interaction at the n-th convolution module; the feature maps of the M modal branch networks after interaction at the n-th convolution module are thus obtained; when n = 2, 3, ..., N, the feature map of the m-th modal branch network after interaction at the (n-1)-th convolution module is downsampled by the max-pooling layer of the n-th convolution module of the m-th modal branch network, yielding the downsampled feature map of the n-th convolution module of the m-th modal branch network; the downsampled feature map is input into the first two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network and processed successively by the X_{mn} two-dimensional convolution layers, which output a feature map; the feature maps output by the n-th convolution modules of the M modal branch networks are thus obtained; these feature maps are processed by the n-th fusion module, which outputs M fused feature maps; the m-th feature map output by the n-th fusion module is added to the feature map output by the n-th convolution module of the m-th modal branch network to obtain the added feature map, so that M added feature maps are obtained; proceeding in this way up to n = N yields the feature maps output by the N-th fusion module;
Step 2.3, the reconstruction module is composed of an N-level convolution network; and outputting the characteristic graphs of the N fusion modulesInputting the primary fusion image F 'into the reconstruction module to obtain a primary fusion image F':
step 2.3.1, outputting all feature graphs output by the nth fusion moduleAdding to obtain a fusion characteristic diagram phi n The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining fusion characteristic diagrams { phi } of N fusion modules 1 ,…,Φ n ,…,Φ N };
Step 2.3.2, constructing a reconstruction module formed by an N-level convolution network, and fusing a feature map phi of an nth fusion module n N-th stage convolutional network input to reconstruction module:
when n=1, the nth stage convolutional network includes: b (B) n Each convolution module
When n=2, 3, …, N, the nth stage convolutional network comprises: b (B) n Each convolution module And B n +1 upsampling layers-> The b-th convolution module RConvBlock of the nth level convolution network nb Consists of Y two-dimensional convolution layers, B epsilon {1,2, …, B n };
When n=1 and b=1, the fusion profile Φ of the nth fusion module is calculated n A b-th convolution module RConvBlock input to the n-th level convolution network nb And outputs a characteristic map ΦR nb
When n=2, 3, …, N and b=1, the fusion profile Φ of the nth fusion module is calculated n B-th up-sampling layer Upsample input to nth stage convolutional network nb After the up-sampling process, an up-sampled characteristic diagram phi U is output nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the up-sampled characteristic diagram { phi U (phi) of the convolution network from the level 2 to the level N-1 2b ,…,ΦU nb ,…,ΦU Nb };
When n=2, 3, …, N and b=2, 3, …, B n When in use, the fusion characteristic diagram phi of the nth fusion module is obtained n Output characteristic diagram { ΦR of front b-1 convolution modules of nth-stage convolution network n1 ,…,ΦR n(b-1 ) The first b up-sampled feature maps { ΦU } of the n+1st level convolutional network (n+1)1 ,…,ΦU (n+1)b After splicing, obtaining a spliced characteristic diagram; inputting the spliced characteristic diagram to a b-th convolution module RConvBlock of an n-th level convolution network nb And outputs an output characteristic diagram ΦR of a b-th convolution module of the nth-stage convolution network nb The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining the B-th of the level 1 convolutional network 1 The outputs of the convolution modulesFeature drawing
B of the level 1 convolutional network 1 Output feature map of each convolution moduleAfter processing of a convolution layer, a primary fusion image F' is obtained;
Step three, constructing a loss function and training the network to obtain an optimal fusion model:
Step 3.1, computing the entropy of each image block set among the image block sets {S_1, ..., S_m, ..., S_M} of all modalities to obtain the corresponding entropy values {e_1, ..., e_m, ..., e_M}, where e_m denotes the entropy value of the image block set of the m-th modality;
Step 3.2, normalizing the entropy values {e_1, ..., e_m, ..., e_M} to obtain the weights {ω_1, ..., ω_m, ..., ω_M} of the image block sets {S_1, ..., S_m, ..., S_M} of all modalities, where ω_m denotes the weight of the image block set of the m-th modality;
Step 3.3, constructing the total loss function Loss by formula (1):
Loss = Σ_{m=1}^{M} ω_m · L_ssim(S_m, F′)   (1)
In formula (1), L_ssim(S_m, F′) denotes the structural similarity loss function between the image block set S_m of the m-th modality and the preliminary fused image F′;
Step 3.4, minimizing the total loss function Loss with an optimizer so as to optimize all parameters of the fusion network Transfusion and obtain the optimal fusion model;
Step four, processing the Y-channel images {I_1, I_2, ..., I_M} of all modalities with the optimal fusion model and outputting the preliminary fused image F′; the preliminary fused image F′ is converted into the RGB color space to obtain the final fused image F.
2. The multi-modal medical image fusion method based on global information fusion according to claim 1, wherein the nth fusion module in step 2.2 is processed according to the following procedures:
Step 2.2.1, the n-th fusion module concatenates and flattens the feature maps output by the n-th convolution modules of the M modal branch networks into a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the input feature vector of the n-th fusion module;
Step 2.2.2, when l = 1, the 1st self-attention module of the n-th fusion module linearly maps the input feature vector to obtain three matrices Q_{nl}, K_{nl}, V_{nl}, and then computes the multi-head attention result Z_{nl} among Q_{nl}, K_{nl}, V_{nl}; the multi-head attention result Z_{nl} is input into the fully connected layer of the l-th self-attention module of the n-th fusion module to obtain the output sequence vector of the l-th self-attention module of the n-th fusion module;
When l = 2, 3, ..., L, the output sequence vector of the (l-1)-th self-attention module of the n-th fusion module is input into the l-th self-attention module of the n-th fusion module to obtain the output sequence vector of the l-th self-attention module of the n-th fusion module; the output sequence vector of the L-th self-attention module of the n-th fusion module is thus obtained;
Step 2.2.3, the output sequence vector is split into M modalities, and each part is reshaped to size H_n × W_n × D_n to obtain the output feature maps of the n-th fusion module.
CN202210202366.1A 2022-03-03 2022-03-03 Multi-mode medical image fusion method based on global information fusion Active CN114565816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210202366.1A CN114565816B (en) 2022-03-03 2022-03-03 Multi-mode medical image fusion method based on global information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210202366.1A CN114565816B (en) 2022-03-03 2022-03-03 Multi-mode medical image fusion method based on global information fusion

Publications (2)

Publication Number Publication Date
CN114565816A CN114565816A (en) 2022-05-31
CN114565816B true CN114565816B (en) 2024-04-02

Family

ID=81717119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210202366.1A Active CN114565816B (en) 2022-03-03 2022-03-03 Multi-mode medical image fusion method based on global information fusion

Country Status (1)

Country Link
CN (1) CN114565816B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115523B (en) * 2022-08-26 2022-11-25 中加健康工程研究院(合肥)有限公司 CNN and Transformer fused medical image depth information extraction method
CN115134676B (en) * 2022-09-01 2022-12-23 有米科技股份有限公司 Video reconstruction method and device for audio-assisted video completion
CN115511767B (en) * 2022-11-07 2023-04-07 中国科学技术大学 Self-supervised learning multi-modal image fusion method and application thereof
CN117173525B (en) * 2023-09-05 2024-07-09 北京交通大学 Universal multi-mode image fusion method and device
CN117853856B (en) * 2024-01-09 2024-07-30 中国矿业大学 Low-light night vision scene understanding method based on multi-mode image fusion
CN118038222A (en) * 2024-01-19 2024-05-14 南京邮电大学 Image fusion model and method based on secondary image decomposition and attention mechanism

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469094A (en) * 2021-07-13 2021-10-01 上海中科辰新卫星技术有限公司 Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN114049408A (en) * 2021-11-15 2022-02-15 哈尔滨工业大学(深圳) Depth network model for accelerating multi-modality MR imaging

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240069802A (en) * 2018-11-16 2024-05-20 Snap Inc. Three-dimensional object reconstruction

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
CN113469094A (en) * 2021-07-13 2021-10-01 上海中科辰新卫星技术有限公司 Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN114049408A (en) * 2021-11-15 2022-02-15 哈尔滨工业大学(深圳) Depth network model for accelerating multi-modality MR imaging

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Indoor crowd detection network based on multi-level features and a hybrid attention mechanism; 沈文祥; 秦品乐; 曾建潮; Journal of Computer Applications (Issue 12); full text *
Deep tumor segmentation method for nasopharyngeal carcinoma MR images with multi-modal multi-dimensional information fusion; 洪炎佳; 孟铁豹; 黎浩江; 刘立志; 李立; 徐硕瑀; 郭圣文; Journal of Zhejiang University (Engineering Science) (Issue 03); full text *

Also Published As

Publication number Publication date
CN114565816A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN114565816B (en) Multi-mode medical image fusion method based on global information fusion
Lundervold et al. An overview of deep learning in medical imaging focusing on MRI
Li et al. VolumeNet: A lightweight parallel network for super-resolution of MR and CT volumetric data
CN111932529B (en) Image classification and segmentation method, device and system
CN113506222B (en) Multi-mode image super-resolution method based on convolutional neural network
Wang et al. SK-Unet: An improved U-Net model with selective kernel for the segmentation of multi-sequence cardiac MR
Liu et al. An automatic cardiac segmentation framework based on multi-sequence MR image
Hu et al. Recursive decomposition network for deformable image registration
Tawfik et al. Multimodal medical image fusion using stacked auto-encoder in NSCT domain
Chai et al. Synthetic augmentation for semantic segmentation of class imbalanced biomedical images: A data pair generative adversarial network approach
Wu et al. Slice imputation: Multiple intermediate slices interpolation for anisotropic 3D medical image segmentation
Atek et al. SwinT-Unet: hybrid architecture for medical image segmentation based on Swin transformer block and Dual-Scale Information
CN117853547A (en) Multi-mode medical image registration method
Qiao et al. Cheart: A conditional spatio-temporal generative model for cardiac anatomy
CN117475268A (en) Multimode medical image fusion method based on SGDD GAN
Yang et al. Hierarchical progressive network for multimodal medical image fusion in healthcare systems
CN116757982A (en) Multi-mode medical image fusion method based on multi-scale codec
Yang et al. Adaptive zero-learning medical image fusion
Zhu et al. A novel full-convolution UNet-transformer for medical image segmentation
CN116309754A (en) Brain medical image registration method and system based on local-global information collaboration
He et al. LRFNet: A real-time medical image fusion method guided by detail information
Wang et al. Multimodal parallel attention network for medical image segmentation
Shihabudeen et al. Autoencoder Network based CT and MRI Medical Image Fusion
Wu et al. Convolutional neural network with coarse-to-fine resolution fusion and residual learning structures for cross-modality image synthesis
Zhou et al. Balancing High-performance and Lightweight: HL-UNet for 3D Cardiac Medical Image Segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant