CN114565816A - Multi-modal medical image fusion method based on global information fusion - Google Patents
- Publication number
- CN114565816A (application CN202210202366.1A)
- Authority
- CN
- China
- Prior art keywords
- fusion
- module
- convolution
- nth
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a multi-modal medical image fusion method based on global information fusion, which comprises the following steps: 1. performing color space conversion and image cropping preprocessing on original medical images of a plurality of modalities; 2. establishing modal branch networks that interact through fusion modules at a plurality of scales, and establishing fusion modules consisting of Transformers to combine multi-modal feature information; 3. establishing a reconstruction module to synthesize a fused image from the multi-scale multi-modal features; 4. training and evaluating the model on a public data set; 5. performing the medical image fusion task with the trained model. Through the Transformer fusion modules and the interactive modal branch networks, the invention can fully fuse multi-modal semantic information, achieves a fine-grained fusion effect, well preserves the structure and texture information of the original images, and alleviates the mosaic artifacts caused by low-resolution medical images.
Description
Technical Field
The invention relates to the technical field of image fusion, in particular to a medical image fusion technology based on deep learning.
Background
Medical images help doctors better understand human structures and tissues and are widely used in clinical applications such as disease diagnosis, treatment planning, and surgical guidance. Because of differences in imaging mechanisms, medical images of different modalities attend to different human organs and tissues. A single-modality medical image often cannot provide comprehensive and sufficient information, so doctors frequently need to examine several images simultaneously to judge a condition accurately, which inevitably complicates diagnosis. Given the limitations of single-modality medical images, multi-modal medical image fusion is a field that merits research. Multi-modal medical image fusion synthesizes a single image from the important information of medical images of different modalities of the same scene.
Generally, medical images can be divided into anatomical images and functional images. Anatomical images have high spatial resolution and can clearly depict the anatomical structure of organs, but cannot display functional changes of human metabolism; examples are computed tomography (CT) and magnetic resonance imaging (MR). Functional images, on the contrary, display function and metabolism well, but have low resolution and cannot accurately describe the anatomical details of organs; examples are positron emission tomography (PET) and single-photon emission computed tomography (SPECT). Although CT and MR are both anatomical images and PET and SPECT are both functional images, the information each captures differs. CT mainly reflects the position of human bones and implants, while MR mainly provides clear detail of soft tissues and other parts. MR itself comprises multiple modalities focusing on sub-regions of different properties; the commonly used ones are T1-weighted (T1), contrast-enhanced T1-weighted (T1c), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR). PET primarily reflects tumor function and metabolic information, whereas SPECT primarily provides blood-flow information of organs and tissues.
Most multi-modal medical image fusion methods can be summarized as three stages: feature extraction, fusion, and reconstruction. To realize medical image fusion, scholars at home and abroad have proposed various algorithms over more than three decades. Broadly, these methods fall into two major categories: conventional fusion methods and fusion methods based on deep learning.
In the conventional medical image fusion framework, researchers propose various decomposition or transformation schemes to extract features of the source images, then select a fusion strategy to fuse the features, and finally apply the inverse transform to the fused features to reconstruct the fused image. According to the manner of feature extraction, conventional methods can be divided into four categories: (1) methods based on sparse representation; (2) methods based on multi-scale decomposition, such as pyramids and wavelets; (3) methods based on subspaces, such as independent component analysis; (4) methods based on salient features. Conventional medical image fusion methods have achieved good fusion results, but several shortcomings limit further improvement of fusion performance. First, the fusion performance of conventional approaches relies heavily on hand-crafted features, which may limit the generalization of an approach to other fusion tasks. Second, different features may require different fusion strategies to work well. Third, for fusion methods based on sparse representation, dictionary learning is relatively time-consuming, so synthesizing a fused image takes considerable time.
In recent years, methods based on deep learning have become a new research hotspot in the field of image fusion. Deep learning models represented by the convolutional neural network (CNN) and the generative adversarial network (GAN) have been successfully applied to image fusion problems such as multi-focus, infrared, and visible-light fusion; they require no hand-defined features or fusion strategies, an advantage over conventional fusion methods. However, because no reference image of the fusion result can be constructed for supervised learning, and because the complex diversity of human structures and tissues makes the imaging characteristics of each modality hard to describe quantitatively, research on deep-learning-based medical image fusion remains relatively scarce and is still at an early stage.
Investigation shows that existing medical image fusion methods typically use either a hand-defined fusion strategy or a convolution-based network to fuse multi-modal image features. However, such fusion strategies cannot efficiently extract the global semantic information of multi-modal images. In addition, current deep-learning-based medical image fusion methods exploit multi-modal image information insufficiently and imprecisely. Most approaches use multi-modal images in a simplistic way; the most common is to stack the original images of different modalities (or their separately extracted low-level features) along the channel dimension and feed them directly into the network model for fusion.
Summary of the Invention
To overcome the shortcomings of the prior art, the invention provides a medical image fusion method based on global information fusion, so that the global information of multi-modal features can be combined through a self-attention mechanism, the information of different modalities can be exploited to the greatest extent through interactive modal branch networks, and a high-quality medical image fusion effect is achieved.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a multi-modal medical image fusion method based on global information fusion, which is characterized by comprising the following steps:
Step 1, original medical images of M different modalities are obtained and converted to the YCbCr color space, yielding the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities, where I_m denotes the Y-channel image of the m-th modality, m ∈ {1, 2, ..., M}; the Y-channel images {I_1, ..., I_m, ..., I_M} of all modalities are cropped to obtain the image block sets {S_1, ..., S_m, ..., S_M} of all modalities, where S_m denotes the set of image blocks of the m-th modality;
Step 2, a fusion network TransFusion is constructed, comprising M modal branch networks, N fusion modules, and a reconstruction module; the image block sets of all modalities {S_1, ..., S_m, ..., S_M} are input into the fusion network TransFusion:
step 2.1, constructing the M modal branch networks and the N fusion modules:
step 2.1.1, M modal branch networks are constructed:
The m-th of the M modal branch networks consists of N convolution modules, denoted ConvBlock_m1, ..., ConvBlock_mn, ..., ConvBlock_mN, where ConvBlock_mn denotes the n-th convolution module of the m-th modal branch network, n ∈ {1, 2, ..., N};
when n = 1, the n-th convolution module ConvBlock_mn of the m-th modal branch network consists of X_mn two-dimensional convolution layers;
when n = 2, 3, ..., N, the n-th convolution module ConvBlock_mn of the m-th modal branch network consists of one maximum pooling layer and X_mn two-dimensional convolution layers;
the x-th two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network has convolution kernels of size ks_mnx × ks_mnx, with kn_mnx convolution kernels, x ∈ {1, 2, ..., X_mn};
Step 2.1.2, constructing N fusion modules:
Any n-th of the N fusion modules is a Transformer network consisting of L self-attention modules; the l-th of the L self-attention modules comprises: 1 multi-head attention layer, 2 layer-normalization layers, and 1 fully connected layer;
Step 2.2, the image block sets of all modalities {S_1, ..., S_m, ..., S_M} are input into the M modal branch networks, and information fusion is performed through the N fusion modules:

When n = 1, the image block set S_m of the m-th modality is input to the X_mn two-dimensional convolution layers of the n-th convolution module ConvBlock_mn of the m-th modal branch network, which output the feature map F_mn of size H_n × W_n × D_n, where H_n, W_n, and D_n denote the height, width, and number of channels of the feature map output by the m-th modal branch network at the n-th convolution module; the feature maps {F_1n, ..., F_mn, ..., F_Mn} output by the M modal branch networks at the n-th convolution module are thus obtained.

The feature maps {F_1n, ..., F_mn, ..., F_Mn} output by the M modal branch networks at the n-th convolution module are processed by the n-th fusion module, which outputs the feature maps {T_1n, ..., T_mn, ..., T_Mn}, where T_mn denotes the m-th feature map output by the n-th fusion module.

The m-th feature map T_mn output by the n-th fusion module is added to the feature map F_mn output by the n-th convolution module of the m-th modal branch network, giving the feature map A_mn of the m-th modal branch network after interaction at the n-th convolution module; the feature maps {A_1n, ..., A_mn, ..., A_Mn} of the M modal branch networks after interaction at the n-th convolution module are thus obtained.

When n = 2, 3, ..., N, the feature map A_m(n-1) of the m-th modal branch network after interaction at the (n-1)-th convolution module is input to the maximum pooling layer of the n-th convolution module of the m-th modal branch network for down-sampling, yielding the down-sampled feature map of the m-th modal branch network at the n-th convolution module; the down-sampled feature map is input to the 1st two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network and processed by the X_mn two-dimensional convolution layers in sequence, which output the feature map F_mn, thus giving the feature maps {F_1n, ..., F_mn, ..., F_Mn} output by the n-th convolution modules of the M modal branch networks. These feature maps are processed by the n-th fusion module, which outputs {T_1n, ..., T_mn, ..., T_Mn}; the m-th feature map T_mn output by the n-th fusion module is added to the feature map F_mn output by the n-th convolution module of the m-th modal branch network to obtain the added feature map A_mn, thus giving the M added feature maps {A_1n, ..., A_mn, ..., A_Mn} and, finally, the feature maps output by the N-th fusion module.
Step 2.3, the reconstruction module is composed of an N-level convolution network; and outputting the feature maps of the N fusion modulesInputting the image data into the reconstruction module to obtain a preliminary fusion image F':
step 2.3.1, all characteristic graphs output by the n fusion moduleAdding to obtain a fused feature map phin(ii) a Thereby obtaining a fusion characteristic diagram { phi ] of N fusion modules1,...,Φn,...,ΦN};
Step 2.3.2, a reconstruction module formed by N-level convolution networks is constructed, and the fusion characteristic diagram phi of the N-th fusion module is usednThe nth stage convolution network input to the reconstruction module:
when n is 1The nth stage convolutional network comprises: b isnConvolution module RConvBlockn1,...,RConvBlocknb,...,
When N is 2, 3.. times.n, the nth stage convolutional network includes: b isnConvolution module RConvBlockn1,...,RConvBlocknb,...,And Bn+1 upsampling layer UpSamplen1,...,UpSamplenb,...,The b convolution module RConvBlock of the nth stage convolution networknbConsists of Y two-dimensional convolution layers, B is in the form of {1,2n};
When n is 1 and b is 1, fusing the feature map phi of the n-th fusion modulenA b-th convolution module RConvBlock input to the n-th stage convolution networknbAnd outputting a characteristic diagram phi Rnb;
When N is 2,3,.., N and b is 1, the fusion feature map Φ of the N-th fusion module is comparednUpSample of the b-th upsampling layer input to the n-th convolutional networknbAfter up-sampling processing is carried out, an up-sampled characteristic diagram phi U is outputnb(ii) a Thereby obtaining the characteristic diagram { phi U after the up-sampling of the convolution networks from the 2 nd level to the N-1 th level2b,...,ΦUnb,...,ΦUNb};
When N is 2,3, thenThen, the fusion characteristic diagram phi of the n-th fusion module is combinednOutput characteristic diagram { phi R of first b-1 convolution modules of nth-level convolution networkn1,...,ΦRn(b-1)B characteristic graphs after up-sampling of the first b of the (n + 1) th level convolution network (phi U)(n+1)1,...,ΦU(n+1)bSplicing to obtain a spliced characteristic diagram; inputting the spliced feature map to the nth stageThe b-th convolution module RConvBlock of the convolution networknbAnd outputting the output characteristic diagram phi R of the b convolution module of the nth-level convolution networknb(ii) a Thereby obtaining the B-th of the 1 st convolution network1Output characteristic diagram of convolution module
The B-th of the 1 st level convolution network1Output characteristic diagram of convolution moduleAfter processing of a convolution layer, obtaining a primary fusion image F';
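As a concrete illustration of the splicing in step 2.3.2, the sketch below shows, in NumPy with purely illustrative dimensions and random data, how an up-sampled fused map from a deeper level is concatenated with a shallower one before entering an RConvBlock. The helper names `upsample2x` and `splice` are invented for the example, and nearest-neighbour interpolation is an assumption, since the patent does not specify the upsampling method:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of an (H, W, D) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def splice(maps):
    """'Splicing' here means concatenation along the channel dimension."""
    return np.concatenate(maps, axis=-1)

# Toy fused maps Phi_1, Phi_2 from two adjacent levels (dimensions illustrative only)
rng = np.random.default_rng(3)
phi1 = rng.normal(size=(64, 64, 64))      # level-1 fused feature map
phi2 = rng.normal(size=(32, 32, 128))     # level-2 fused feature map
phi2_up = upsample2x(phi2)                # an UpSample layer of level 2
x = splice([phi1, phi2_up])               # input to a level-1 RConvBlock
print(phi2_up.shape, x.shape)             # (64, 64, 128) (64, 64, 192)
```

The channel dimension grows with each spliced map, so each RConvBlock must accept the summed channel count of its inputs.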
step three, constructing a loss function and training a network to obtain an optimal fusion model:
Step 3.1, the entropy of each image block set in {S_1, ..., S_m, ..., S_M} is computed, obtaining the corresponding entropy values {e_1, ..., e_m, ..., e_M}, where e_m denotes the entropy of the image block set of the m-th modality;

Step 3.2, the entropy values {e_1, ..., e_m, ..., e_M} are normalized to obtain the weights {ω_1, ..., ω_m, ..., ω_M} of the image block sets of all modalities, where ω_m denotes the weight of the image block set of the m-th modality;
Step 3.3, the total loss function Loss is constructed by formula (1):

Loss = Σ_{m=1}^{M} ω_m · L_ssim(S_m, F')    (1)

In formula (1), L_ssim(S_m, F') denotes the structural similarity loss function between the image block set S_m of the m-th modality and the preliminary fused image F';
Step 3.4, an optimizer is used to minimize the total loss function Loss, optimizing all parameters in the fusion network TransFusion and yielding the optimal fusion model;
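Steps 3.1 and 3.2 can be sketched as follows. This is a minimal NumPy illustration, not the patent's exact procedure: the 256-bin histogram entropy and the sum-to-one normalization are assumptions, since the patent does not specify how the entropy is estimated or how the values are normalized, and the block sets are random toy data:

```python
import numpy as np

def entropy(img_blocks):
    """Shannon entropy (bits) of the intensity histogram of an image block set."""
    hist, _ = np.histogram(img_blocks, bins=256, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                            # drop empty bins (0*log 0 = 0)
    return float(-(p * np.log2(p)).sum())

def entropy_weights(block_sets):
    """Steps 3.1-3.2: entropy per modality, normalized to fusion weights."""
    e = np.array([entropy(s) for s in block_sets])
    return e / e.sum()                      # weights omega_m sum to 1 (assumed scheme)

rng = np.random.default_rng(1)
S1 = rng.random((10, 64, 64))                             # high-entropy toy modality
S2 = np.clip(rng.normal(0.5, 0.05, (10, 64, 64)), 0, 1)   # low-entropy toy modality
w = entropy_weights([S1, S2])
print(round(float(w.sum()), 6), w[0] > w[1])              # 1.0 True
```

A modality whose blocks carry more information (higher entropy) receives a larger weight in the SSIM loss, which matches the intent of weighting the structural similarity terms.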
Step 4, the Y-channel images {I_1, I_2, ..., I_M} of all modalities are processed with the optimal fusion model, which outputs the preliminary fused image F'; the preliminary fused image F' is converted to the RGB color space to obtain the final fused image F.
The multi-modal medical image fusion method based on global information fusion is further characterized in that the n-th fusion module in step 2.2 operates as follows:

Step 2.2.1, the n-th fusion module splices and flattens the feature maps output by the n-th convolution modules of the M modal branch networks to obtain a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the feature vector Z_n^0 of the n-th fusion module containing position information;

Step 2.2.2, when l = 1, the 1st self-attention module of the n-th fusion module linearly maps the feature vector Z_n^0 to three matrices Q_nl, K_nl, V_nl; the multi-head attention result Z_nl among Q_nl, K_nl, V_nl is then computed; the multi-head attention result Z_nl is input to the fully connected layer of the l-th self-attention module of the n-th fusion module to obtain the output sequence vector Z_n^1 of the l-th self-attention module of the n-th fusion module;

when l = 2, 3, ..., L, the output sequence vector Z_n^(l-1) of the (l-1)-th self-attention module of the n-th fusion module is input to the l-th self-attention module of the n-th fusion module to obtain its output sequence vector Z_n^l; the output sequence vector Z_n^L of the L-th self-attention module of the n-th fusion module is thus obtained;

Step 2.2.3, the output sequence vector Z_n^L is divided into M modality parts, and each part is reshaped to size H_n × W_n × D_n, obtaining the output feature maps {T_1n, ..., T_mn, ..., T_Mn}, where T_mn denotes the m-th feature map output by the n-th fusion module.
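The processing of steps 2.2.1-2.2.3 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: random matrices stand in for the trainable position vector and projection weights, the fully connected sublayer is omitted for brevity, and `fusion_module` and the toy dimensions are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def fusion_module(feature_maps, h=4):
    """One Transformer fusion step over M modality feature maps of shape (H, W, D)."""
    M = len(feature_maps)
    H, W, D = feature_maps[0].shape
    # Step 2.2.1: splice and flatten to (M*H*W, D), add a position vector
    tokens = np.concatenate([f.reshape(H * W, D) for f in feature_maps], axis=0)
    pos = rng.normal(scale=0.02, size=tokens.shape)      # stands in for learned positions
    z0 = tokens + pos
    # Step 2.2.2: linear maps to Q, K, V (random weights stand in for trained ones)
    Wq, Wk, Wv, Wo = (rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(4))
    Q, K, V = z0 @ Wq, z0 @ Wk, z0 @ Wv
    d = D // h
    heads = []
    for i in range(h):                                   # split into h heads
        q, k, v = (A[:, i * d:(i + 1) * d] for A in (Q, K, V))
        heads.append(softmax(q @ k.T / np.sqrt(d)) @ v)  # scaled dot-product attention
    Z = np.concatenate(heads, axis=1) @ Wo
    Z = layer_norm(Z + z0)                               # residual + layer normalization
    # Step 2.2.3: split back into M modalities, reshape each to (H, W, D)
    return [Z[m * H * W:(m + 1) * H * W].reshape(H, W, D) for m in range(M)]

maps = [rng.normal(size=(8, 8, 16)) for _ in range(2)]   # M = 2 toy feature maps
fused = fusion_module(maps)
print(len(fused), fused[0].shape)                        # 2 (8, 8, 16)
```

Because every token of every modality attends to all M*H*W tokens, each output feature map mixes global information from all modalities, which is the point of using the Transformer as the fusion strategy.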
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides an unsupervised anatomical and functional image fusion method. The method introduces a Transformer structure as the fusion strategy; by means of its self-attention mechanism, the Transformer combines the global information of the multi-modal medical images and fully fuses their semantic information, achieving a fine-grained fusion effect. The invention not only preserves the structure and texture information of the anatomical image well, but also alleviates the mosaic artifacts caused by the low resolution of the functional image.
2. The present invention proposes modal branching networks that interact on multiple scales. The network can extract multi-scale complementary features of each modal image, and fully utilizes multi-modal image information. The interactive branched network enhances the anatomical and functional image fusion effect.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention, wherein "ks × ks, kn" represents a convolution layer with a kernel size of ks × ks and a number of kernels of kn;
fig. 2 is a structural diagram of a modal branching network and a convergence module provided in an embodiment of the present invention;
fig. 3 is a structural diagram of a reconstruction module according to an embodiment of the present invention.
Detailed Description
In this embodiment, a multi-modal image fusion method based on global information fusion, as shown in fig. 1, includes the following steps:
Step 1, original medical images of M different modalities are obtained and preprocessed by color space conversion and image cropping, obtaining the preprocessed image block sets {S_1, S_2, ..., S_M} of all modalities, where S_m denotes the set of image blocks of the m-th modality, m ∈ {1, 2, ..., M}:
Step 1.1, the original medical images of the several modalities required by the experiment are acquired from the Harvard medical image dataset website (http://www.med.harvard.edu/AANLIB/home.html); this embodiment collects medical images of M = 2 modalities from this public dataset, comprising 279 pairs of MR-T1 and PET images and 318 pairs of MR-T2 and SPECT images, where MR-T1 and MR-T2 are grayscale anatomical images with 1 channel, and PET and SPECT are functional images in the RGB color space with 3 channels;
Step 1.2, images in the RGB color space are converted to the YCbCr space according to formula (1):

Y = 0.299·R + 0.587·G + 0.114·B
Cb = -0.169·R - 0.331·G + 0.500·B + 128
Cr = 0.500·R - 0.419·G - 0.081·B + 128    (1)

In formula (1), R, G, B are the three channels of the RGB color space, Y is the luminance channel, and Cb and Cr are the two color channels;
Step 1.3, to enlarge the number of samples, the grayscale images and the Y-channel images are cropped into image blocks, obtaining the image block sets {S_1, S_2, ..., S_M} of all modalities; in this embodiment, the size of the cropped image blocks is 64 × 64;
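Steps 1.2 and 1.3 can be sketched in NumPy as follows. The full-range BT.601 coefficients match formula (1) above, and the non-overlapping 64-pixel cropping stride is an assumption, since the embodiment does not state the stride; `rgb_to_ycbcr` and `crop_blocks` are illustrative helper names:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Full-range BT.601 RGB -> YCbCr; img is a float array in [0, 255], shape (H, W, 3)."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = -0.169 * R - 0.331 * G + 0.500 * B + 128.0
    Cr = 0.500 * R - 0.419 * G - 0.081 * B + 128.0
    return np.stack([Y, Cb, Cr], axis=-1)

def crop_blocks(channel, size=64, stride=64):
    """Step 1.3: cut a 2-D channel into size x size blocks to enlarge the sample set."""
    H, W = channel.shape
    return np.stack([channel[i:i + size, j:j + size]
                     for i in range(0, H - size + 1, stride)
                     for j in range(0, W - size + 1, stride)])

rng = np.random.default_rng(2)
pet = rng.integers(0, 256, (256, 256, 3)).astype(float)   # toy PET-like RGB image
ycbcr = rgb_to_ycbcr(pet)
blocks = crop_blocks(ycbcr[..., 0])                        # fuse on the Y channel only
print(ycbcr.shape, blocks.shape)                           # (256, 256, 3) (16, 64, 64)
```

Only the Y channel enters the fusion network; Cb and Cr are carried along and recombined in step 4 when the preliminary fused image is converted back to RGB.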
Step 2, a fusion network TransFusion is constructed, comprising M modal branch networks, N fusion modules, and a reconstruction module; the image block sets of all modalities {S_1, ..., S_m, ..., S_M} are input into the fusion network TransFusion:
step 2.1, constructing M modal branch networks and N fusion modules:
step 2.1.1, constructing M modal branch networks:
The m-th of the M modal branch networks consists of N convolution modules, denoted ConvBlock_m1, ..., ConvBlock_mn, ..., ConvBlock_mN, where ConvBlock_mn denotes the n-th convolution module of the m-th modal branch network, n ∈ {1, 2, ..., N};
when n = 1, the n-th convolution module ConvBlock_mn of the m-th modal branch network consists of X_mn two-dimensional convolution layers;
when n = 2, 3, ..., N, the n-th convolution module ConvBlock_mn of the m-th modal branch network consists of one maximum pooling layer and X_mn two-dimensional convolution layers;
the x-th two-dimensional convolution layer of the n-th convolution module of the m-th modal branch network has convolution kernels of size ks_mnx × ks_mnx, with kn_mnx convolution kernels, x ∈ {1, 2, ..., X_mn};
In this embodiment, N = 4, the kernel size of all maximum pooling layers is 2 × 2 with stride 2, and X_mn, ks_mnx, kn_mnx are as shown in FIG. 2;
step 2.1.2, constructing N fusion modules:
Any n-th of the N fusion modules is a Transformer network consisting of L self-attention modules; the l-th of the L self-attention modules comprises: 1 multi-head attention layer, 2 layer-normalization layers, and 1 fully connected layer; in the present embodiment, L = 1;
Step 2.2, the image block sets of all modalities {S_1, ..., S_m, ..., S_M} are input into the M modal branch networks, and information fusion is performed through the N fusion modules:

When n = 1, the image block set S_m of the m-th modality is input to the X_mn two-dimensional convolution layers of the n-th convolution module ConvBlock_mn of the m-th modal branch network, which output the feature map F_mn of size H_n × W_n × D_n, where H_n, W_n, and D_n denote the height, width, and number of channels of the feature map output by the m-th modal branch network at the n-th convolution module; the feature maps {F_1n, ..., F_mn, ..., F_Mn} output by the M modal branch networks at the n-th convolution module are thus obtained. In this embodiment, (H_1, W_1, D_1) = (64, 64, 64), (H_2, W_2, D_2) = (32, 32, 128), (H_3, W_3, D_3) = (16, 16, 256), and (H_4, W_4, D_4) = (8, 8, 512);
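The dimensions listed for this embodiment follow directly from the 2 × 2 stride-2 max pooling and the doubling of channel counts between scales; a quick consistency check:

```python
# (H_n, W_n, D_n) per convolution module in this embodiment
sizes = [(64, 64, 64), (32, 32, 128), (16, 16, 256), (8, 8, 512)]
for n in range(1, len(sizes)):
    h, w, d = sizes[n]
    hp, wp, dp = sizes[n - 1]
    assert (h, w) == (hp // 2, wp // 2)  # 2x2 max pooling with stride 2 halves H and W
    assert d == dp * 2                   # channel count doubles at each scale
print("consistent")                      # prints: consistent
```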
The feature maps {F_1n, ..., F_mn, ..., F_Mn} output by the M modal branch networks at the n-th convolution module are processed by the n-th fusion module according to formula (2), which outputs the feature maps {T_1n, ..., T_mn, ..., T_Mn}, where T_mn denotes the m-th feature map output by the n-th fusion module:

{T_1n, ..., T_mn, ..., T_Mn} = Transformer_n(F_1n, ..., F_mn, ..., F_Mn)    (2)

In formula (2), Transformer_n denotes the n-th fusion module, realized by the following steps:
step 2.2.1, the nth fusion module splices and flattens the feature maps output by the nth convolution modules of the M modal branch networks to obtain a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the feature vector of the nth fusion module that contains position information;
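The splice-and-flatten step can be sketched with numpy. This is a minimal illustration under assumed toy shapes (M = 2, H = W = 4, D = 8); in the real network the position embedding is a trainable parameter, not a random draw:

```python
import numpy as np

# Sketch of step 2.2.1: M modal feature maps of size H_n x W_n x D_n are
# concatenated and flattened into a (M*H_n*W_n) x D_n matrix, then a
# position embedding of the same size is added.
M, H, W, D = 2, 4, 4, 8
feature_maps = [np.random.randn(H, W, D) for _ in range(M)]

flat = np.concatenate([f.reshape(H * W, D) for f in feature_maps], axis=0)
pos_embedding = np.random.randn(M * H * W, D)  # trainable in the real network
tokens = flat + pos_embedding

print(tokens.shape)  # (M*H*W, D) = (32, 8)
```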
Step 2.2.2, the self-attention mechanism module of the nth fusion module linearly maps the feature vector to three matrices Q_n, K_n, V_n;
Step 2.2.3, Q_n, K_n, V_n are each split into h heads, giving the per-head matrices Q_n^i, K_n^i, V_n^i, i ∈ {1, 2, ..., h}; multi-head attention is then computed according to equations (4)-(6), giving the result Z_n:
in equation (5), Concat denotes the splicing operation, and the projection matrix applied after splicing is trainable; in equation (6), LayerNorm denotes layer normalization;
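A compact numpy sketch of the multi-head attention in steps 2.2.2-2.2.3 follows. It uses the common Transformer form (scaled dot-product attention per head, concatenation, output projection, then residual + LayerNorm); the patent's exact equations (4)-(6) are not reproduced on this page, so the details here are an assumption, and the random weight matrices stand in for the trained parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Per-row layer normalization (LayerNorm in equation (6))."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(tokens, h, rng):
    """Split Q, K, V into h heads, attend per head, concatenate, project,
    then layer-normalize with a residual connection (assumed form)."""
    n, d = tokens.shape
    dk = d // h
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    heads = []
    for i in range(h):
        q, k, v = (m[:, i * dk:(i + 1) * dk] for m in (Q, K, V))
        heads.append(softmax(q @ k.T / np.sqrt(dk)) @ v)   # per-head attention
    Z = np.concatenate(heads, axis=1) @ Wo                 # Concat + projection
    return layer_norm(Z + tokens)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((32, 8))
out = multi_head_attention(tokens, h=2, rng=rng)
print(out.shape)  # (32, 8): the token shape is preserved
```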
step 2.2.4, the multi-head attention result Z_n is input to a fully connected layer according to equation (7) to obtain the output sequence vector of the nth fusion module;
in equation (7), MLP denotes the fully connected layer;
step 2.2.5, the output sequence vector is divided into M modalities and each part is reshaped to size H_n × W_n × D_n, obtaining M output feature maps, wherein the mth feature map denotes the mth output of the nth fusion module;
the mth feature map output by the nth fusion module is added to the feature map output by the nth convolution module of the mth modal branch network, obtaining the feature map of the mth modal branch network after interaction at the nth convolution module; thereby the feature maps of the M modal branch networks after interaction at the nth convolution module are obtained.
When n = 2, 3, ..., N, the feature map of the mth modal branch network after interaction at the (n-1)th convolution module is input to the max pooling layer of the nth convolution module of the mth modal branch network for down-sampling, obtaining the down-sampled feature map of the mth modal branch network at the nth convolution module; the down-sampled feature map is input to the 1st two-dimensional convolution layer of the nth convolution module of the mth modal branch network and processed sequentially by the X_mn two-dimensional convolution layers, which output a feature map; thereby the feature maps output by the nth convolution modules of the M modal branch networks are obtained. These feature maps are processed by the nth fusion module, which outputs M feature maps; the mth feature map output by the nth fusion module is added to the feature map output by the nth convolution module of the mth modal branch network, obtaining an added feature map; thereby M added feature maps are obtained, and finally the feature maps output by the Nth fusion module are obtained;
Step 2.3, constructing a reconstruction module consisting of an N-level convolution network, and inputting the output feature maps of the N fusion modules into the reconstruction module to obtain a preliminary fused image F':
step 2.3.1, all feature maps output by the nth fusion module are added to obtain a fused feature map Φ_n; thereby the fused feature maps {Φ_1, ..., Φ_n, ..., Φ_N} of the N fusion modules are obtained;
Step 2.3.2, the fused feature map Φ_n of the nth fusion module is input to the nth-level convolution network of the reconstruction module:
when n = 1, the nth-level convolution network comprises B_n convolution modules RConvBlock_n1, RConvBlock_n2, ..., RConvBlock_nB_n;
when n = 2, 3, ..., N, the nth-level convolution network comprises B_n convolution modules RConvBlock_n1, RConvBlock_n2, ..., RConvBlock_nB_n and B_n + 1 up-sampling layers UpSample_n1, UpSample_n2, ..., UpSample_n(B_n+1); the bth convolution module RConvBlock_nb of the nth-level convolution network consists of Y two-dimensional convolution layers, b ∈ {1, 2, ..., B_n};
When n = 1 and b = 1, the fused feature map Φ_n of the nth fusion module is input to the bth convolution module RConvBlock_nb of the nth-level convolution network, which outputs a feature map ΦR_nb;
when n = 2, 3, ..., N and b = 1, the fused feature map Φ_n of the nth fusion module is input to the bth up-sampling layer UpSample_nb of the nth-level convolution network for up-sampling, which outputs an up-sampled feature map ΦU_nb; thereby the up-sampled feature maps {ΦU_2b, ..., ΦU_nb, ..., ΦU_Nb} of the 2nd- to Nth-level convolution networks are obtained;
When n = 2, 3, ..., N and b = 2, 3, ..., B_n, the fused feature map Φ_n of the nth fusion module, the output feature maps {ΦR_n1, ..., ΦR_n(b-1)} of the first b-1 convolution modules of the nth-level convolution network, and the first b up-sampled feature maps {ΦU_(n+1)1, ..., ΦU_(n+1)b} of the (n+1)th-level convolution network are spliced to obtain a spliced feature map; the spliced feature map is input to the bth convolution module RConvBlock_nb of the nth-level convolution network, which outputs the feature map ΦR_nb; thereby the output feature map of the B_1th convolution module of the 1st-level convolution network is obtained;
the output feature map of the B_1th convolution module of the 1st-level convolution network is processed by one convolution layer to obtain the preliminary fused image F';
in this embodiment, the reconstruction module is shown in Fig. 3, where Y = 2, B_1 = 3, B_2 = 2, B_3 = 1, B_4 = 0;
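The decoder's dense skip pattern reduces to two core operations: up-sampling a coarser fused map and splicing it channel-wise with finer-scale maps before the next convolution module. The sketch below illustrates only these two operations with numpy; nearest-neighbour up-sampling is an assumption (the patent does not specify the up-sampling method here), and the shapes follow the embodiment:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of an (H, W, D) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

fine   = np.zeros((32, 32, 128))  # fused map Phi_2 (level-2 scale)
coarse = np.zeros((16, 16, 256))  # fused map Phi_3 (level-3 scale)

up = upsample2x(coarse)                      # (32, 32, 256)
merged = np.concatenate([fine, up], axis=2)  # (32, 32, 384), fed to RConvBlock
print(merged.shape)
```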
Step three, constructing a loss function and training a network to obtain an optimal fusion model:
step 3.1, calculating the entropy of each image block set in {S_1, S_2, ..., S_M} according to equations (8)-(9) to obtain the corresponding entropy values {e_1, e_2, ..., e_M}, wherein e_m denotes the entropy value of the image block set of the mth modality:
e_m = Entropy(S_m) (8)
in equation (9), p_l is the probability of the lth gray value;
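Equation (9) is the standard Shannon entropy over the gray-level histogram, e = -Σ_l p_l·log2(p_l). A small numpy sketch (an illustration assuming 8-bit grayscale inputs, not the patent's code):

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy of a grayscale image: -sum_l p_l * log2(p_l),
    where p_l is the probability of gray level l (equations (8)-(9))."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((8, 8), dtype=np.uint8)         # constant image: entropy 0
print(image_entropy(flat))
checker = np.indices((8, 8)).sum(0) % 2 * 255   # two equally likely levels
print(image_entropy(checker.astype(np.uint8)))  # 1 bit of entropy
```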
step 3.2, normalizing the entropy values {e_1, e_2, ..., e_M} respectively to obtain the weights {ω_1, ω_2, ..., ω_M} of the image block sets {S_1, S_2, ..., S_M} of all modalities, wherein ω_m denotes the weight of the image block set of the mth modality:
in equation (10), η is a temperature parameter; in this embodiment, η = 1;
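Since equation (10) itself is not reproduced on this page, the sketch below assumes a temperature-scaled softmax over the entropies, which matches the description (normalization with a temperature parameter η); it is a plausible reading, not the patent's verified formula:

```python
import numpy as np

def modality_weights(entropies, eta=1.0):
    """Softmax-normalize per-modality entropies with temperature eta
    (assumed form of equation (10); eta = 1 in the embodiment)."""
    e = np.asarray(entropies, dtype=float) / eta
    w = np.exp(e - e.max())  # subtract the max for numerical stability
    return w / w.sum()

w = modality_weights([4.0, 4.0])
print(w)  # equal entropies -> equal weights
```

Higher-entropy modalities receive larger weights, so information-richer sources dominate the loss.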
step 3.3, constructing the total loss function Loss by equation (11):
Loss = ω_1·L_ssim(S_1, F') + ω_2·L_ssim(S_2, F') (11)
L_ssim(S_m, F') = 1 − SSIM(S_m, F') (12)
in equation (11), L_ssim(S_m, F') denotes the structural similarity loss function between the image block set S_m of the mth modality and the preliminary fused image F'; in equation (12), SSIM is the structural similarity function;
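Equation (12) turns SSIM into a loss. The sketch below uses a single-window (global-statistics) SSIM for brevity; the standard SSIM, and presumably the patent's, uses a sliding Gaussian window, so this is a simplified illustration:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """SSIM computed over the whole image as one window (simplification:
    the standard definition averages over local sliding windows)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_loss(s, f):
    """L_ssim(S, F') = 1 - SSIM(S, F'), as in equation (12)."""
    return 1.0 - ssim_global(s, f)

x = np.linspace(0, 1, 64).reshape(8, 8)
print(ssim_loss(x, x))      # effectively 0 for identical images
print(ssim_loss(x, 1 - x))  # large for anti-correlated images
```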
step 3.4, using an AdamW optimizer to minimize the loss function Loss, thereby optimizing all parameters in the fusion network TransFusion and obtaining the optimal fusion model;
step four, processing the Y-channel images or grayscale images {I_1, I_2, ..., I_M} of all modalities with the optimal fusion model and outputting the preliminary fused image F'; the preliminary fused image F' is spliced with the Cb and Cr channels and converted into the RGB color space to obtain the final fused image F;
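Step four's color handling can be sketched as: stack the fused Y channel with the retained Cb/Cr channels and convert back to RGB. The JPEG-style YCbCr transform used below is an assumption (the patent does not specify which YCbCr variant it uses), and channels are taken to lie in [0, 1] with chroma offset by 0.5:

```python
import numpy as np

def ycbcr_to_rgb(ycbcr):
    """Convert an (H, W, 3) YCbCr image in [0, 1] to RGB using the
    JPEG full-range transform (assumed variant)."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 0.5, ycbcr[..., 2] - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)

fused_y = np.full((4, 4), 0.6)  # preliminary fused image F'
cb = np.full((4, 4), 0.5)       # neutral chroma from a source image
cr = np.full((4, 4), 0.5)
rgb = ycbcr_to_rgb(np.stack([fused_y, cb, cr], axis=-1))
print(rgb.shape)  # (4, 4, 3); neutral chroma yields a gray image
```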
step five, evaluating the performance of the invention:
In a specific implementation, the invention is compared with the traditional method CSMCA and the deep learning methods DDcGAN and EMFusion. In addition, to assess the Transformer-based fusion module and the interactive modal branch network of the invention, two comparative experiments were set up: the first removes the Transformer fusion module, and the second replaces the interactive modal branch network with a weight-shared modal branch network. Mutual information, average gradient, the boundary-based similarity measure Q^AB/F, and the visual perception index Q^CV are used as evaluation indices; the larger the mutual information, average gradient, and Q^AB/F, and the smaller Q^CV, the better the quality of the fused image. The average fusion quality on 30 pairs of MR-T1 and PET test images and 30 pairs of MR-T2 and SPECT test images is as follows:
Table 1. Fusion performance of different methods
The experimental results show that the invention is optimal in all four indices: mutual information, average gradient, Q^AB/F, and Q^CV. The Transformer fusion module of the invention contributes improvements of 5.10%-10.02% in mutual information, 2.59%-5.28% in average gradient, 3.04%-4.36% in Q^AB/F, and 1.43%-12.66% in Q^CV; the interactive modal branch network of the invention contributes improvements of 18.39%-19.91% in mutual information, 1.06%-6.69% in average gradient, 7.68%-11.02% in Q^AB/F, and 27.69%-62.22% in Q^CV.
Claims (2)
1. A multi-modal medical image fusion method based on global information fusion, characterized by comprising the following steps:
step one, obtaining original medical images of M different modalities and converting them into the YCbCr color space to obtain the Y-channel images {I_1, …, I_m, …, I_M} of all modalities, wherein I_m denotes the Y-channel image of the mth modality, m ∈ {1, 2, …, M}; cutting the Y-channel images {I_1, …, I_m, …, I_M} of all modalities to obtain the image block sets {S_1, …, S_m, …, S_M} of all modalities, wherein S_m denotes the image block set of the mth modality;
step two, constructing a fusion network TransFusion comprising: M modal branch networks, N fusion modules, and a reconstruction module; and inputting the image block sets {S_1, …, S_m, …, S_M} of all modalities into the fusion network TransFusion:
step 2.1, constructing the M modal branch networks and the N fusion modules:
step 2.1.1, constructing M modal branch networks:
the mth modal branch network of the M modal branch networks consists of N convolution modules, respectively denoted ConvBlock_m1, …, ConvBlock_mn, …, ConvBlock_mN, wherein ConvBlock_mn denotes the nth convolution module of the mth modal branch network, n ∈ {1, 2, …, N};
when n = 1, the nth convolution module ConvBlock_mn of the mth modal branch network consists of X_mn two-dimensional convolution layers;
when n = 2, 3, …, N, the nth convolution module ConvBlock_mn of the mth modal branch network consists of one max pooling layer and X_mn two-dimensional convolution layers;
the convolution kernel size of the xth two-dimensional convolution layer of the nth convolution module of the mth modal branch network is ks_mnx × ks_mnx, and the number of convolution kernels is kn_mnx, x ∈ {1, 2, …, X_mn};
Step 2.1.2, constructing N fusion modules:
any nth fusion module of the N fusion modules is a Transformer network consisting of L self-attention mechanism modules; the lth of the L self-attention mechanism modules comprises: 1 multi-head attention layer, 2 layer normalizations, and 1 fully connected layer;
step 2.2, inputting the image block sets {S_1, …, S_m, …, S_M} of all modalities into the M modal branch networks and performing information fusion through the N fusion modules:
when n = 1, the image block set S_m of the mth modality is input to the X_mn two-dimensional convolution layers of the nth convolution module ConvBlock_mn of the mth modal branch network, which output a feature map of size H_n × W_n × D_n, wherein H_n, W_n, D_n respectively denote the height, width, and number of channels of the output feature map of the mth modal branch network at the nth convolution module; thereby the output feature maps of the M modal branch networks at the nth convolution module are obtained;
the output feature maps of the M modal branch networks at the nth convolution module are processed by the nth fusion module, which outputs M feature maps, wherein the mth feature map denotes the mth output of the nth fusion module;
the mth feature map output by the nth fusion module is added to the feature map output by the nth convolution module of the mth modal branch network, obtaining the feature map of the mth modal branch network after interaction at the nth convolution module; thereby the feature maps of the M modal branch networks after interaction at the nth convolution module are obtained. When n = 2, 3, …, N, the feature map of the mth modal branch network after interaction at the (n-1)th convolution module is input to the max pooling layer of the nth convolution module of the mth modal branch network for down-sampling, obtaining the down-sampled feature map of the mth modal branch network at the nth convolution module; the down-sampled feature map is input to the 1st two-dimensional convolution layer of the nth convolution module of the mth modal branch network and processed sequentially by the X_mn two-dimensional convolution layers, which output a feature map; thereby the feature maps output by the nth convolution modules of the M modal branch networks are obtained. These feature maps are processed by the nth fusion module, which outputs M feature maps; the mth feature map output by the nth fusion module is added to the feature map output by the nth convolution module of the mth modal branch network, obtaining an added feature map; thereby M added feature maps are obtained, and finally the feature maps output by the Nth fusion module are obtained;
Step 2.3, the reconstruction module consists of an N-level convolution network; the output feature maps of the N fusion modules are input to the reconstruction module to obtain a preliminary fused image F':
step 2.3.1, all feature maps output by the nth fusion module are added to obtain a fused feature map Φ_n; thereby the fused feature maps {Φ_1, …, Φ_n, …, Φ_N} of the N fusion modules are obtained;
Step 2.3.2, the fused feature map Φ_n of the nth fusion module is input to the nth-level convolution network of the reconstruction module:
when n = 1, the nth-level convolution network comprises B_n convolution modules RConvBlock_n1, RConvBlock_n2, …, RConvBlock_nB_n; when n = 2, 3, …, N, the nth-level convolution network comprises B_n convolution modules RConvBlock_n1, RConvBlock_n2, …, RConvBlock_nB_n and B_n + 1 up-sampling layers UpSample_n1, UpSample_n2, …, UpSample_n(B_n+1); the bth convolution module RConvBlock_nb of the nth-level convolution network consists of Y two-dimensional convolution layers, b ∈ {1, 2, …, B_n};
When n = 1 and b = 1, the fused feature map Φ_n of the nth fusion module is input to the bth convolution module RConvBlock_nb of the nth-level convolution network, which outputs a feature map ΦR_nb;
when n = 2, 3, …, N and b = 1, the fused feature map Φ_n of the nth fusion module is input to the bth up-sampling layer UpSample_nb of the nth-level convolution network for up-sampling, which outputs an up-sampled feature map ΦU_nb; thereby the up-sampled feature maps {ΦU_2b, …, ΦU_nb, …, ΦU_Nb} of the 2nd- to Nth-level convolution networks are obtained;
when n = 2, 3, …, N and b = 2, 3, …, B_n, the fused feature map Φ_n of the nth fusion module, the output feature maps {ΦR_n1, …, ΦR_n(b-1)} of the first b-1 convolution modules of the nth-level convolution network, and the first b up-sampled feature maps {ΦU_(n+1)1, …, ΦU_(n+1)b} of the (n+1)th-level convolution network are spliced to obtain a spliced feature map; the spliced feature map is input to the bth convolution module RConvBlock_nb of the nth-level convolution network, which outputs the feature map ΦR_nb; thereby the output feature map of the B_1th convolution module of the 1st-level convolution network is obtained;
the output feature map of the B_1th convolution module of the 1st-level convolution network is processed by one convolution layer to obtain the preliminary fused image F';
step three, constructing a loss function and training a network to obtain an optimal fusion model:
step 3.1, respectively calculating the entropy of each image block set in {S_1, …, S_m, …, S_M} to obtain the corresponding entropy values {e_1, …, e_m, …, e_M}, wherein e_m denotes the entropy value of the image block set of the mth modality;
step 3.2, normalizing the entropy values {e_1, …, e_m, …, e_M} respectively to obtain the weights {ω_1, …, ω_m, …, ω_M} of the image block sets {S_1, …, S_m, …, S_M} of all modalities, wherein ω_m denotes the weight of the image block set of the mth modality;
step 3.3, constructing the total loss function Loss by equation (1):
in equation (1), L_ssim(S_m, F') denotes the structural similarity loss function between the image block set S_m of the mth modality and the preliminary fused image F';
step 3.4, using an optimizer to minimize the total loss function Loss, thereby optimizing all parameters in the fusion network TransFusion and obtaining the optimal fusion model;
step four, processing the Y-channel images {I_1, I_2, …, I_M} of all modalities with the optimal fusion model and outputting the preliminary fused image F'; the preliminary fused image F' is then converted into the RGB color space to obtain the final fused image F.
2. The multi-modal medical image fusion method based on global information fusion according to claim 1, characterized in that the nth fusion module in step 2.2 is processed according to the following procedure:
step 2.2.1, the nth fusion module splices and flattens the feature maps output by the nth convolution modules of the M modal branch networks to obtain a flattened feature vector of size (M*H_n*W_n) × D_n; the flattened feature vector is added to a trainable vector of the same size to obtain the feature vector of the nth fusion module that contains position information;
step 2.2.2, when l = 1, the lth self-attention mechanism module of the nth fusion module linearly maps the feature vector to three matrices Q_nl, K_nl, V_nl; the multi-head attention result Z_nl among Q_nl, K_nl, V_nl is then computed; the multi-head attention result Z_nl is input to the fully connected layer of the lth self-attention mechanism module of the nth fusion module, obtaining the output sequence vector of the lth self-attention mechanism module of the nth fusion module;
when l = 2, 3, …, L, the output sequence vector of the (l-1)th self-attention mechanism module of the nth fusion module is input to the lth self-attention mechanism module of the nth fusion module, obtaining the output sequence vector of the lth self-attention mechanism module of the nth fusion module; thereby the output sequence vector of the Lth self-attention mechanism module of the nth fusion module is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210202366.1A CN114565816B (en) | 2022-03-03 | 2022-03-03 | Multi-mode medical image fusion method based on global information fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114565816A true CN114565816A (en) | 2022-05-31 |
CN114565816B CN114565816B (en) | 2024-04-02 |
Family
ID=81717119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210202366.1A Active CN114565816B (en) | 2022-03-03 | 2022-03-03 | Multi-mode medical image fusion method based on global information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114565816B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20210091276A (en) * | 2018-11-16 | 2021-07-21 | 아리엘 에이아이, 인크. | 3D object reconstruction |
CN113469094A (en) * | 2021-07-13 | 2021-10-01 | 上海中科辰新卫星技术有限公司 | Multi-mode remote sensing data depth fusion-based earth surface coverage classification method |
US11222217B1 (en) * | 2020-08-14 | 2022-01-11 | Tsinghua University | Detection method using fusion network based on attention mechanism, and terminal device |
CN114049408A (en) * | 2021-11-15 | 2022-02-15 | 哈尔滨工业大学(深圳) | Depth network model for accelerating multi-modality MR imaging |
Non-Patent Citations (2)
Title |
---|
Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications, No. 12 *
Hong Yanjia; Meng Tiebao; Li Haojiang; Liu Lizhi; Li Li; Xu Shuo; Guo Shengwen: "Multi-modal multi-dimensional information fusion based deep segmentation method for tumors in nasopharyngeal carcinoma MR images", Journal of Zhejiang University (Engineering Science), No. 03 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115115523A (en) * | 2022-08-26 | 2022-09-27 | 中加健康工程研究院(合肥)有限公司 | CNN and Transformer fused medical image depth information extraction method |
CN115115523B (en) * | 2022-08-26 | 2022-11-25 | 中加健康工程研究院(合肥)有限公司 | CNN and Transformer fused medical image depth information extraction method |
CN115134676A (en) * | 2022-09-01 | 2022-09-30 | 有米科技股份有限公司 | Video reconstruction method and device for audio-assisted video completion |
CN115134676B (en) * | 2022-09-01 | 2022-12-23 | 有米科技股份有限公司 | Video reconstruction method and device for audio-assisted video completion |
CN115511767A (en) * | 2022-11-07 | 2022-12-23 | 中国科学技术大学 | Self-supervised learning multi-modal image fusion method and application thereof |
Also Published As
Publication number | Publication date |
---|---|
CN114565816B (en) | 2024-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||