CN113506222A - Multi-modal image super-resolution method based on a convolutional neural network


Info

Publication number: CN113506222A (application CN202110870612.6A); granted as CN113506222B
Authority: CN (China)
Prior art keywords: resolution, image, super, network, low
Legal status: Granted; Active
Inventors: 刘羽; 朱文瑜; 成娟; 李畅; 宋仁成; 陈勋
Assignee (original and current): Hefei University of Technology
Application filed by Hefei University of Technology; priority to CN202110870612.6A, priority date 2021-07-30
Publication of CN113506222A: 2021-10-15; publication of CN113506222B (grant): 2024-03-01
Other languages: Chinese (zh)

Classifications

    • G06T 3/40: Geometric image transformations in the plane of the image; scaling of whole images or parts thereof, e.g. expanding or contracting
      • G06T 3/4007: based on interpolation, e.g. bilinear interpolation
      • G06T 3/4046: using neural networks
      • G06T 3/4053: based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/00: Computing arrangements based on biological models; neural networks
      • G06N 3/045: Combinations of networks
      • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a multi-modal image super-resolution method based on a convolutional neural network, comprising the following steps: first, data preparation; second, construction of a super-resolution network cascading several dense residual attention modules and comprising a shallow feature extraction part, a feature refinement part and an image reconstruction part; third, super-resolution of the input low-resolution images, including super-resolution network training and super-resolution image testing. The invention makes full use of the complementary and redundant information in medical images of different modalities to reconstruct high-resolution images of better quality, providing better images for human observation while supporting computer vision tasks on the images such as segmentation and classification.

Description

Multi-modal image super-resolution method based on a convolutional neural network
Technical Field
The invention relates to the technical field of image super-resolution, and in particular to a multi-modal image super-resolution method based on a convolutional neural network.
Background
Image super-resolution refers to the process of reconstructing a high-resolution image from one or more given low-resolution images, the high-resolution image being obtained purely by algorithm. Most current state-of-the-art super-resolution methods operate on images of a single modality; although they can produce high-resolution images with good effect, they necessarily ignore the complementary and redundant information among multi-modal images, which can be important to the reconstruction result. With the explosive growth of information technology, data resources take increasingly diverse forms, and multi-modal data has become the main form of data in use. Generally speaking, more information means stronger feature expression capability and tends to yield better reconstructed high-resolution images. Research on multi-modal learning methods that involve multiple input modalities, giving the super-resolution network more prior information, therefore has a huge application prospect and broad research value.
In the field of natural images, multi-modal image data are increasingly common, for example visible-light images and infrared images. Visible-light images have higher resolution, higher contrast and good visual effect, while infrared images are less affected by environmental factors and are therefore more generally applicable. In many computer vision tasks, such as pedestrian re-identification and face recognition, combining images of different modalities achieves better results. In the current super-resolution field, however, only a few methods combine images of multiple modalities, so the performance of many network structures cannot be further improved.
Medical imaging includes multiple image modalities, and multi-modal magnetic resonance imaging (MRI) is common among medical image data. The more common MRI modalities include T1-weighted imaging (T1) and T2-weighted imaging (T2). Generally, only one-sided medical information can be obtained from a single-modality MRI image; to obtain more complete and accurate information, the mutual complementarity of different MRI modalities plays a crucial role.
Meanwhile, medical image super-resolution has always been a major topic in the field of image super-resolution. Conventional image super-resolution methods, for example the nearest-neighbor algorithm, the bilinear algorithm and bicubic interpolation, run fast and are easy to implement, but their results often show edge blurring and loss of high-frequency detail, which is a fatal problem in the medical imaging field. Super-resolution methods based on deep learning can extract deep features through long training and thereby reconstruct high-resolution images of higher quality than conventional methods. However, the obtained high-resolution images still suffer from artifacts, loss of detail and similar problems, so reliable high-resolution medical images remain difficult to obtain.
Compared with natural images, the super-resolution problem for medical images is more complex and more demanding, with the following characteristics: 1) medical images generally require high accuracy, and the super-resolution result must faithfully reflect the actual situation; once deviation occurs, subsequent processing steps (such as high-level tasks like segmentation and classification) go seriously wrong; 2) the anatomical tissue structures and shapes of the human body are complex and show individual differences, which makes image super-resolution difficult; 3) the acquisition of medical image information is extremely susceptible to various factors, such as external noise, bias field effects, partial volume effects, unconscious movement of the subject and unavoidable tissue activity, which inevitably produce motion artifacts, non-uniformity and similar problems, bringing great difficulty to applying deep-learning-based image super-resolution methods to medical images. It is therefore necessary to study super-resolution methods more deeply with respect to the above characteristics of medical images, and to consider combining the information in medical images of different modalities to improve the performance of super-resolution methods.
Disclosure of Invention
To overcome the problems of existing image super-resolution techniques, the invention provides a multi-modal image super-resolution method based on a convolutional neural network, so as to make full use of the complementary and redundant information among images of different modalities, provide better image feature expression and reconstruct high-resolution images of higher quality, thereby providing better images for human observation while supporting computer vision tasks such as image segmentation and classification.
To solve the above problems, the invention adopts the following technical scheme:
the invention discloses a multi-modal image super-resolution method based on a convolutional neural network, which is characterized by comprising the following steps of:
step 1, data preparation:
acquiring any group of reference image sets {I_HR^(1), I_HR^(2), ..., I_HR^(S)} of S modalities with resolution K × L, wherein I_HR^(s) represents the reference image of the s-th modality, s = 1, 2, ..., S; obtaining the corresponding group of low-resolution image sets {I_LR^(1), I_LR^(2), ..., I_LR^(S)} with resolution ηK × ηL, wherein I_LR^(s) represents the low-resolution image of the s-th modality, η represents the scaling factor, and 0 < η < 1;
step 2, constructing a multi-modal image super-resolution network comprising: a shallow feature extraction part, a feature refinement part and an image reconstruction part;
step 2.1, the shallow feature extraction part comprises one convolutional layer with kernel size N × N and an activation function;
the low-resolution images {I_LR^(1), ..., I_LR^(S)} of the S modalities are concatenated to obtain a concatenated low-resolution image I_in of size ηK × ηL × S, which is input into the multi-modal image super-resolution network; after processing by the shallow feature extraction part, a shallow feature map F_init of size ηK × ηL × C is output, wherein C is the number of channels set by the network;
step 2.2, the feature refinement part consists of G dense residual attention modules, one M × M convolutional layer and one N × N convolutional layer;
the G dense residual attention modules are denoted DRAB_1, DRAB_2, ..., DRAB_g, ..., DRAB_G, wherein DRAB_g is the g-th dense residual attention module;
the g-th dense residual attention module DRAB_g is formed by cascading the g-th dense residual unit and the g-th channel attention unit;
the g-th dense residual unit consists of Y N × N convolutional layers and one M × M convolutional layer, the Y N × N convolutional layers being densely connected;
the g-th channel attention unit consists of the g-th global pooling layer P_G, the g-th one-dimensional convolutional layer whose kernel size is adaptively adjusted, and the g-th activation function F_A;
when g = 1, the shallow feature map F_init serves as the input feature of the g-th dense residual unit; within the unit, the output feature map of each N × N convolutional layer is concatenated with the input feature map F_init, and the concatenation is passed through the M × M convolutional layer to obtain the g-th intermediate feature F_mid^(g); the g-th intermediate feature F_mid^(g) is added to the input feature of the g-th dense residual unit to obtain the output feature F_DR^(g) of the g-th dense residual unit;
the output feature F_DR^(g) of the g-th dense residual unit serves as the input feature of the g-th channel attention unit, which produces a weight vector L_A^(g); the output of the g-th channel attention unit, which is also the output feature F_DRA^(g) of the g-th dense residual attention module DRAB_g, is then obtained by formula (1):
F_DRA^(g) = L_A^(g) × F_DR^(g)    (1)
when 2 ≤ g ≤ G, the input feature of the g-th dense residual attention module DRAB_g is the output feature F_DRA^(g-1) of the (g-1)-th dense residual attention module DRAB_{g-1}, so that the G dense residual attention modules yield the output features F_DRA^(1), ..., F_DRA^(G); these are concatenated and passed sequentially through the M × M convolutional layer and the N × N convolutional layer of the feature refinement part, outputting the intermediate feature map F'_fine of the feature refinement part with size ηK × ηL × C; the intermediate feature map F'_fine and the shallow feature map F_init are then added through a skip connection to obtain the final low-resolution-space feature map F_LR of size ηK × ηL × C;
Step 2.3, the image reconstruction part comprises an up-sampling layer and S image reconstruction branches, wherein the S-th image reconstruction branch comprises: h, NxN convolutional layers with activation functions and one NxN convolutional layer without activation functions;
final feature F of low resolution spaceLRInputting the image into the image reconstruction part, and obtaining a high-resolution spatial feature F through an up-sampling layerHRAnd outputting a residual error map of the s-th mode after passing through the s-th image reconstruction branch
Figure BDA0003188971810000041
Thus, residual error maps of all modes are obtained by the S image reconstruction branches
Figure BDA0003188971810000042
Step 2.4, for low resolution image set
Figure BDA0003188971810000043
Upsampling to obtain an interpolated low resolution image set
Figure BDA0003188971810000044
Wherein the content of the first and second substances,
Figure BDA0003188971810000045
low resolution image representing the s-th modality
Figure BDA0003188971810000046
Carrying out up-sampling to obtain an interpolated low-resolution image;
step 2.5, the residual map I_res^(s) of the s-th modality and the interpolated low-resolution image I_int^(s) of the s-th modality are added to obtain the super-resolution image I_SR^(s) of the s-th modality, so that the super-resolution images {I_SR^(1), ..., I_SR^(S)} of the S modalities are obtained;
Step 3, training the multi-mode image super-resolution network:
step 3.1, obtaining R groups of reference image sets, and R groups of low-resolution image sets corresponding to them, according to the process of step 1;
step 3.2, defining the current iteration number as t and initializing t = 0; defining the maximum number of iterations as T = ⌈R/X⌉ × Z, wherein Z is the preset maximum number of training epochs of the super-resolution network and X is the number of groups drawn per iteration;
step 3.3, randomly drawing X groups of low-resolution image sets from the R groups at the t-th iteration and inputting them into the multi-modal image super-resolution network for training, obtaining the super-resolution image sets {I_SR^(s,x)} output by the t-th iteration, wherein I_SR^(s,x) represents the super-resolution image of the s-th modality in the x-th group of multi-modal super-resolution images output at the t-th iteration, x = 1, 2, ..., X;
and correspondingly taking the X groups of images from the R groups of reference image sets at the t-th iteration, and constructing the loss function L_t(θ) of the t-th iteration shown in formula (2):
L_t(θ) = (1/X) Σ_{x=1}^{X} Σ_{s=1}^{S} ‖ I_SR^(s,x) − I_HR^(s,x) ‖_1    (2)
in formula (2), I_HR^(s,x) represents the reference image of the s-th modality in the x-th group of reference images; the loss function constructed by formula (2) is optimized by a back-propagation algorithm, thereby adjusting all parameters of the deep learning network;
step 3.4, after assigning t + 1 to t, judging whether t is greater than T; if so, the finally trained super-resolution network model is obtained; otherwise, returning to step 3.3 for sequential execution;
and step 3.5, inputting the low-resolution images to be tested into the trained super-resolution network model, so as to obtain the predicted super-resolution images.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention provides a unified network framework that performs the super-resolution tasks of images of different modalities simultaneously; it makes full use of the redundant and complementary information among images of different modalities, reconstructs high-resolution images of multiple modalities at the same time, and improves the super-resolution effect on any single modality. Whereas the prior art must train a separate network for each modality's super-resolution task, the invention achieves simultaneous super-resolution of multi-modal images with only one round of network training.
2. Compared with most existing deep-learning super-resolution networks, the invention designs a lightweight network for multi-modal image super-resolution, which improves computational efficiency, reduces storage cost and is more practical. In addition, upsampling is performed at the end of the feature extraction part, so that the bulk of the convolution operations run in the low-resolution space, and the image reconstruction stage adopts a residual-map reconstruction strategy, making the network easy to train and computationally efficient.
3. The invention designs a basic structure cascading multiple dense residual attention modules, which makes the information flow in the network more reasonable, reduces the loss of feature information, and allows the network to learn deeper features with complex hierarchical structure while preventing massive loss of the feature information of the original images, greatly improving the quality of the super-resolution results. In addition, by adopting both global and local residual learning, the invention further improves the flow of information through the network, preventing the loss of shallow features while deeper features are extracted, so that the features extracted by the network are more comprehensive.
Drawings
FIG. 1 is a flow chart of a multi-modal image super-resolution method based on a convolutional neural network of the present invention;
FIG. 2 is a schematic diagram of the specific framework of the present invention, taking S = 2 as an example;
FIG. 3 is a schematic diagram of a dense residual attention module according to the present invention;
FIG. 4 is a diagram illustrating a dense residual error unit structure according to the present invention;
FIG. 5 is a schematic diagram of a channel attention mechanism unit according to the present invention.
Detailed Description
In this embodiment, MRI of two different modalities is taken as an example; the specific network framework is shown in fig. 2, and the multi-modal image super-resolution method based on a convolutional neural network, shown in fig. 1, comprises the following steps:
step 1, data preparation:
acquiring any group of reference image sets {I_HR^(1), I_HR^(2), ..., I_HR^(S)} of S modalities with resolution K × L, wherein I_HR^(s) represents the reference image of the s-th modality, s = 1, 2, ..., S; obtaining the corresponding group of low-resolution image sets {I_LR^(1), I_LR^(2), ..., I_LR^(S)} with resolution ηK × ηL, wherein I_LR^(s) represents the low-resolution image of the s-th modality, η represents the scaling factor, and 0 < η < 1;
in this embodiment, the T1 and T2 MR images of MICCAI BraTS 2019, each of size 240 × 240 × 155, are used as raw data; the data set contains 457 3D MR images. The 3D MR images are sliced along the Z axis; starting from the 60th slice of each 3D image, one of every 5 slices is selected as training data, 10 slices being taken from each 3D image, yielding 4570 groups of 2D MR images of 2 modalities;
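By way of illustration, this slice selection can be sketched as follows; the nibabel dependency, the file paths and the function name are assumptions for illustration and are not part of the patent:

```python
# Minimal sketch of the slice selection described above (assumed tooling:
# nibabel for reading BraTS NIfTI volumes; paths are illustrative).
import nibabel as nib

def extract_slices(t1_path, t2_path, start=60, step=5, n_slices=10):
    """From the 60th slice on, take one of every 5 axial slices, 10 per volume."""
    t1 = nib.load(t1_path).get_fdata()   # 240 x 240 x 155 volume
    t2 = nib.load(t2_path).get_fdata()
    groups = []
    for i in range(n_slices):
        z = start + i * step             # slices 60, 65, ..., 105
        groups.append((t1[:, :, z], t2[:, :, z]))  # one 2-modality group
    return groups
```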
the obtained data are used as the reference image sets {I_HR^(1), I_HR^(2)}, and the reference images are bicubically downsampled with different scaling factors to obtain the low-resolution image sets {I_LR^(1), I_LR^(2)} at different scale factors; the scale factor adopted in this embodiment is 2 (i.e. η = 0.5), but other scale factors also achieve good results in this network;
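A minimal sketch of this bicubic downsampling, assuming PyTorch; note that PyTorch's bicubic kernel differs slightly from other implementations such as MATLAB's imresize:

```python
# Bicubic downsampling of one reference group with zoom factor eta = 0.5.
import torch
import torch.nn.functional as F

def make_lr(hr_group, eta=0.5):
    """hr_group: tensor (S, K, L) of reference images -> (S, eta*K, eta*L)."""
    x = hr_group.unsqueeze(0).float()    # add batch dimension: (1, S, K, L)
    lr = F.interpolate(x, scale_factor=eta, mode='bicubic', align_corners=False)
    return lr.squeeze(0)                 # low-resolution group (S, eta*K, eta*L)
```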
step 2, constructing a multi-modal image super-resolution network comprising: a shallow feature extraction part, a feature refinement part and an image reconstruction part;
step 2.1, the shallow feature extraction part comprises one convolutional layer with kernel size N × N and an activation function; in this embodiment, N = 3 and ReLU is adopted as the activation function, since a convolutional layer with kernel size 3 achieves good results without introducing so many parameters that network training slows down;
the low-resolution images {I_LR^(1), I_LR^(2)} of the S modalities are concatenated to obtain a concatenated low-resolution image I_in of size ηK × ηL × S; in this embodiment, 2 images of different modalities are used, each of size 120 × 120, i.e. S = 2, K = L = 240, and η = 0.5;
the concatenated low-resolution image I_in is input into the multi-modal image super-resolution network and, after processing by the shallow feature extraction part, the shallow feature map F_init of size ηK × ηL × C is output, wherein C is the number of channels set by the network; in this embodiment, C = 64;
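Under the stated settings (S = 2 concatenated modalities, 3 × 3 kernel, ReLU, C = 64), the shallow feature extraction part can be sketched as below; the module and variable names are illustrative:

```python
import torch
import torch.nn as nn

class ShallowExtractor(nn.Module):
    """One 3x3 convolution plus ReLU mapping I_in (B, S, hK, hL) to F_init."""
    def __init__(self, modalities=2, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(modalities, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, i_in):
        return self.act(self.conv(i_in))   # F_init: (B, C, hK, hL)
```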
step 2.2, the feature refinement part is composed of G dense residual attention modules, one M × M convolutional layer and one N × N convolutional layer; in this embodiment, N = 3, M = 1, and G = 3;
the G dense residual attention modules are denoted DRAB_1, DRAB_2, ..., DRAB_g, ..., DRAB_G, wherein DRAB_g is the g-th dense residual attention module; the specific structure of the dense residual attention module is shown in fig. 3;
the g-th dense residual attention module DRAB_g is formed by cascading the g-th dense residual unit and the g-th channel attention unit;
in this embodiment, Y = 6 and the growth rate of the dense connections is 32; the specific structure of the dense residual unit is shown in fig. 4. Dense connections allow feature maps to be reused continuously, so the network achieves good results at a shallower depth, which improves its computational efficiency;
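A sketch of one dense residual unit under these settings (Y = 6 densely connected 3 × 3 layers with growth rate 32, a 1 × 1 layer back to 64 channels, and the local residual connection); the ReLU after each dense layer is an assumption, as the patent does not specify the activation inside this unit:

```python
import torch
import torch.nn as nn

class DenseResidualUnit(nn.Module):
    def __init__(self, channels=64, growth=32, layers=6):
        super().__init__()
        self.convs = nn.ModuleList()
        c = channels
        for _ in range(layers):                # Y densely connected 3x3 layers
            self.convs.append(nn.Sequential(
                nn.Conv2d(c, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            c += growth                        # each layer sees all earlier maps
        self.fuse = nn.Conv2d(c, channels, kernel_size=1)   # 1x1 back to C

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        f_mid = self.fuse(torch.cat(feats, dim=1))  # intermediate feature F_mid
        return f_mid + x                            # local residual -> F_DR
```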
the g-th channel attention unit consists of the g-th global pooling layer P_G, the g-th one-dimensional convolutional layer whose kernel size is adaptively adjusted, and the g-th activation function F_A; in this embodiment, softmax is adopted as the activation function F_A; the specific structure of the channel attention unit is shown in fig. 5;
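A sketch of the channel attention unit follows; the adaptive kernel-size rule used below follows ECA-Net and is an assumption, while the global pooling and softmax weighting match this embodiment:

```python
import math
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, gamma=2, b=1):
        super().__init__()
        k = int((math.log2(channels) + b) / gamma)   # ECA-style adaptive size
        k = k if k % 2 else k + 1                    # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)          # global pooling P_G
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2)
        self.act = nn.Softmax(dim=-1)                # activation F_A

    def forward(self, f_dr):                         # f_dr: (B, C, H, W)
        w = self.pool(f_dr).squeeze(-1).transpose(1, 2)   # (B, 1, C)
        w = self.act(self.conv(w))                        # weight vector L_A
        w = w.transpose(1, 2).unsqueeze(-1)               # (B, C, 1, 1)
        return f_dr * w                                   # formula (1): F_DRA
```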
when g = 1, the shallow feature map F_init serves as the input feature of the g-th dense residual unit; the output feature map of each N × N convolutional layer is concatenated with the input feature map F_init, and the concatenation is passed through the M × M convolutional layer to obtain the g-th intermediate feature F_mid^(g). In this embodiment, each N × N convolutional layer outputs a 32-channel feature map and each M × M convolutional layer outputs a 64-channel feature map, so the intermediate feature F_mid^(g) has 64 channels. The g-th intermediate feature F_mid^(g) is added to the input feature of the g-th dense residual unit to obtain the output feature F_DR^(g) of the g-th dense residual unit;
with this local residual learning strategy, the input feature map is passed directly to the deeper layers through a skip connection, so the network keeps extracting deep features without losing shallow ones and obtains a more comprehensive feature map;
the output feature F_DR^(g) of the g-th dense residual unit serves as the input feature of the g-th channel attention unit, which produces a weight vector L_A^(g); the output of the g-th channel attention unit, which is also the output feature F_DRA^(g) of the g-th dense residual attention module DRAB_g, is then obtained by formula (1):
F_DRA^(g) = L_A^(g) × F_DR^(g)    (1)
when 2 ≤ g ≤ G, the input feature of the g-th dense residual attention module DRAB_g is the output feature F_DRA^(g-1) of the (g-1)-th dense residual attention module DRAB_{g-1}, so that the G dense residual attention modules yield the output features F_DRA^(1), ..., F_DRA^(G); these are concatenated and passed sequentially through the M × M convolutional layer and the N × N convolutional layer of the feature refinement part, outputting the intermediate feature map F'_fine of size ηK × ηL × C; the intermediate feature map F'_fine and the shallow feature map F_init are then added through a skip connection to obtain the final low-resolution-space feature F_LR of size ηK × ηL × C. Concatenating the outputs of the G dense residual attention modules gives a feature map of G × 64 channels, 192 channels in this embodiment; this feature map first passes through the 1 × 1 convolutional layer, which compresses it back to 64 channels while performing a degree of feature fusion, and is then further refined by the 3 × 3 convolutional layer to obtain the intermediate feature map F'_fine of size 120 × 120 × 64. Adding F'_fine to the shallow feature map F_init constitutes global residual learning, which further improves the representation capability of the network;
step 2.3, the image reconstruction part comprises an upsampling layer and S image reconstruction branches, the s-th image reconstruction branch comprising H N × N convolutional layers with activation functions and one N × N convolutional layer without an activation function; in this embodiment, the upsampling layer is an efficient sub-pixel convolutional layer and H = 1;
the final low-resolution-space feature F_LR is input into the image reconstruction part, where the upsampling layer produces the high-resolution-space feature F_HR; the s-th image reconstruction branch then outputs the residual map I_res^(s) of the s-th modality, so that the S branches yield the residual maps {I_res^(1), ..., I_res^(S)} of all modalities. In this embodiment there are 2 image reconstruction branches, reconstructing the residual maps of the 2 modalities; performing feature upsampling at the rear of the network allows most of the network's convolution operations to run in the low-resolution space, which also saves computational resources;
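A sketch of the reconstruction part under these settings (sub-pixel upsampling with scale 2, H = 1, and S = 2 single-channel branches); names are illustrative:

```python
import torch
import torch.nn as nn

class Reconstruction(nn.Module):
    def __init__(self, channels=64, scale=2, modalities=2):
        super().__init__()
        self.upsample = nn.Sequential(              # efficient sub-pixel layer
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))
        self.branches = nn.ModuleList(
            nn.Sequential(                          # H=1 activated 3x3 layer
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, 3, padding=1))  # final layer, no act
            for _ in range(modalities))

    def forward(self, f_lr):
        f_hr = self.upsample(f_lr)                  # F_HR in high-res space
        return [branch(f_hr) for branch in self.branches]  # [I_res^(s)]
```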
step 2.4, the low-resolution image set {I_LR^(1), I_LR^(2)} is upsampled to obtain the interpolated low-resolution image set {I_int^(1), I_int^(2)}, wherein I_int^(s) represents the interpolated low-resolution image obtained by upsampling the low-resolution image I_LR^(s) of the s-th modality; in this embodiment, bicubic interpolation is adopted for upsampling the low-resolution images;
step 2.5, the residual map I_res^(s) of the s-th modality and the interpolated low-resolution image I_int^(s) of the s-th modality are added to obtain the super-resolution image I_SR^(s) of the s-th modality, so that the super-resolution images {I_SR^(1), ..., I_SR^(S)} of the S modalities are obtained; by reconstructing residual maps, the network reduces the difficulty of image reconstruction and becomes easier to train;
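Steps 2.4 and 2.5 then reduce to an interpolation and an addition, as in this sketch (function name illustrative):

```python
import torch.nn.functional as F

def compose_sr(i_lr, residuals, scale=2):
    """i_lr: (B, S, hK, hL); residuals: list of S maps (B, 1, K, L) -> I_SR."""
    i_int = F.interpolate(i_lr, scale_factor=scale, mode='bicubic',
                          align_corners=False)      # interpolated I_int^(s)
    return [i_int[:, s:s + 1] + residuals[s]        # I_SR^(s) per modality
            for s in range(i_int.shape[1])]
```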
step 3, training the multi-modal image super-resolution network:
step 3.1, obtaining R groups of reference image sets, and R groups of low-resolution image sets corresponding to them, according to the process of step 1;
step 3.2, defining the current iteration number as t and initializing t = 0; defining the maximum number of iterations as T = ⌈R/X⌉ × Z, wherein Z is the maximum number of training epochs of the super-resolution network and X is the number of groups drawn per iteration; in this embodiment, X = 32 and Z = 200;
step 3.3, randomly drawing X groups of low-resolution image sets from the R groups at the t-th iteration and inputting them into the multi-modal image super-resolution network for training, obtaining the super-resolution image sets {I_SR^(s,x)} output by the t-th iteration, wherein I_SR^(s,x) represents the super-resolution image of the s-th modality in the x-th group of multi-modal super-resolution images output at the t-th iteration, x = 1, 2, ..., X;
and correspondingly taking the X groups of images from the R groups of reference image sets at the t-th iteration, and constructing the loss function L_t(θ) of the t-th iteration shown in formula (2):
L_t(θ) = (1/X) Σ_{x=1}^{X} Σ_{s=1}^{S} ‖ I_SR^(s,x) − I_HR^(s,x) ‖_1    (2)
in formula (2), I_HR^(s,x) represents the reference image of the s-th modality in the x-th group of reference images; the loss function L_t(θ) constructed by formula (2) is optimized by a back-propagation algorithm, thereby adjusting all parameters of the whole network;
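A single training step might look as follows; the L1 distance written in formula (2) and the Adam optimizer are assumptions where the source specifies only a loss over the X groups and S modalities solved by back-propagation, and the model is assumed to stack its per-modality outputs into one tensor:

```python
import torch

def train_step(model, optimizer, lr_batch, hr_batch):
    """lr_batch: (X, S, hK, hL) inputs; hr_batch: (X, S, K, L) references."""
    optimizer.zero_grad()
    sr_batch = model(lr_batch)                    # predictions, (X, S, K, L)
    loss = torch.abs(sr_batch - hr_batch).mean()  # L1 loss of formula (2)
    loss.backward()                               # back-propagation
    optimizer.step()                              # adjust all parameters
    return loss.item()
```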
step 3.4, after assigning t + 1 to t, judging whether t is greater than T; if so, the finally trained super-resolution network model is obtained; otherwise, returning to step 3.3 for sequential execution;
and step 3.5, inputting the low-resolution images to be tested into the trained super-resolution network model, so as to obtain the predicted super-resolution images.
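Finally, a training and test loop consistent with steps 3.1 to 3.5 (X = 32, Z = 200) might be sketched as follows; the dataset wrapper, the learning rate and the full model are assumed compositions of the modules sketched earlier rather than elements stated in the patent:

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=200, batch_size=32, lr=1e-4):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    for _ in range(epochs):                  # Z = 200 epochs
        for lr_batch, hr_batch in loader:    # X = 32 groups per iteration
            train_step(model, optimizer, lr_batch, hr_batch)
    return model

@torch.no_grad()
def test(model, lr_images):
    model.eval()
    return model(lr_images)                  # predicted super-resolution images
```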

Claims (1)

1. A multi-modal image super-resolution method based on a convolutional neural network, characterized by comprising the following steps:
step 1, data preparation:
acquiring any group of reference image sets {I_HR^(1), I_HR^(2), ..., I_HR^(S)} of S modalities with resolution K × L, wherein I_HR^(s) represents the reference image of the s-th modality, s = 1, 2, ..., S; obtaining the corresponding group of low-resolution image sets {I_LR^(1), I_LR^(2), ..., I_LR^(S)} with resolution ηK × ηL, wherein I_LR^(s) represents the low-resolution image of the s-th modality, η represents the scaling factor, and 0 < η < 1;
step 2, constructing a multi-modal image super-resolution network comprising: a shallow feature extraction part, a feature refinement part and an image reconstruction part;
step 2.1, the shallow feature extraction part comprises one convolutional layer with kernel size N × N and an activation function;
the low-resolution images {I_LR^(1), ..., I_LR^(S)} of the S modalities are concatenated to obtain a concatenated low-resolution image I_in of size ηK × ηL × S, which is input into the multi-modal image super-resolution network; after processing by the shallow feature extraction part, a shallow feature map F_init of size ηK × ηL × C is output, wherein C is the number of channels set by the network;
step 2.2, the feature refinement part consists of G dense residual attention modules, one M × M convolutional layer and one N × N convolutional layer;
the G dense residual attention modules are denoted DRAB_1, DRAB_2, ..., DRAB_g, ..., DRAB_G, wherein DRAB_g is the g-th dense residual attention module;
the g-th dense residual attention module DRAB_g is formed by cascading the g-th dense residual unit and the g-th channel attention unit;
the g-th dense residual unit consists of Y N × N convolutional layers and one M × M convolutional layer, the Y N × N convolutional layers being densely connected;
the g-th channel attention unit consists of the g-th global pooling layer P_G, the g-th one-dimensional convolutional layer whose kernel size is adaptively adjusted, and the g-th activation function F_A;
when g = 1, the shallow feature map F_init serves as the input feature of the g-th dense residual unit; within the unit, the output feature map of each N × N convolutional layer is concatenated with the input feature map F_init, and the concatenation is passed through the M × M convolutional layer to obtain the g-th intermediate feature F_mid^(g); the g-th intermediate feature F_mid^(g) is added to the input feature of the g-th dense residual unit to obtain the output feature F_DR^(g) of the g-th dense residual unit;
the output feature F_DR^(g) of the g-th dense residual unit serves as the input feature of the g-th channel attention unit, which produces a weight vector L_A^(g); the output of the g-th channel attention unit, which is also the output feature F_DRA^(g) of the g-th dense residual attention module DRAB_g, is then obtained by formula (1):
F_DRA^(g) = L_A^(g) × F_DR^(g)    (1)
when 2 ≤ g ≤ G, the input feature of the g-th dense residual attention module DRAB_g is the output feature F_DRA^(g-1) of the (g-1)-th dense residual attention module DRAB_{g-1}, so that the G dense residual attention modules yield the output features F_DRA^(1), ..., F_DRA^(G); these are concatenated and passed sequentially through the M × M convolutional layer and the N × N convolutional layer of the feature refinement part, outputting the intermediate feature map F'_fine of the feature refinement part with size ηK × ηL × C; the intermediate feature map F'_fine and the shallow feature map F_init are then added through a skip connection to obtain the final low-resolution-space feature map F_LR of size ηK × ηL × C;
Step 2.3, the image reconstruction part comprises an up-sampling layer and S image reconstruction branches, wherein the S-th image reconstruction branch comprises: h, NxN convolutional layers with activation functions and one NxN convolutional layer without activation functions;
final feature F of low resolution spaceLRInputting the image into the image reconstruction part, and obtaining a high-resolution spatial feature F through an up-sampling layerHRAnd outputting a residual error map of the s-th mode after passing through the s-th image reconstruction branch
Figure FDA0003188971800000028
Thus, residual error maps of all modes are obtained by the S image reconstruction branches
Figure FDA0003188971800000029
Step 2.4, for low resolution image set
Figure FDA00031889718000000210
Upsampling to obtain an interpolated low resolution image set
Figure FDA00031889718000000211
Wherein the content of the first and second substances,
Figure FDA00031889718000000212
low resolution image representing the s-th modality
Figure FDA00031889718000000213
Carrying out up-sampling to obtain an interpolated low-resolution image;
step 2.5, the residual map I_res^(s) of the s-th modality and the interpolated low-resolution image I_int^(s) of the s-th modality are added to obtain the super-resolution image I_SR^(s) of the s-th modality, so that the super-resolution images {I_SR^(1), ..., I_SR^(S)} of the S modalities are obtained;
Step 3, training the multi-mode image super-resolution network:
step 3.1, obtaining R groups of reference image sets, and R groups of low-resolution image sets corresponding to them, according to the process of step 1;
step 3.2, defining the current iteration number as t and initializing t = 0; defining the maximum number of iterations as T = ⌈R/X⌉ × Z, wherein Z is the preset maximum number of training epochs of the super-resolution network and X is the number of groups drawn per iteration;
step 3.3, randomly drawing X groups of low-resolution image sets from the R groups at the t-th iteration and inputting them into the multi-modal image super-resolution network for training, obtaining the super-resolution image sets {I_SR^(s,x)} output by the t-th iteration, wherein I_SR^(s,x) represents the super-resolution image of the s-th modality in the x-th group of multi-modal super-resolution images output at the t-th iteration, x = 1, 2, ..., X;
and correspondingly taking the X groups of images from the R groups of reference image sets at the t-th iteration, and constructing the loss function L_t(θ) of the t-th iteration shown in formula (2):
L_t(θ) = (1/X) Σ_{x=1}^{X} Σ_{s=1}^{S} ‖ I_SR^(s,x) − I_HR^(s,x) ‖_1    (2)
in formula (2), I_HR^(s,x) represents the reference image of the s-th modality in the x-th group of reference images; the loss function constructed by formula (2) is optimized by a back-propagation algorithm, thereby adjusting all parameters of the deep learning network;
step 3.4, after assigning t + 1 to t, judging whether t is greater than T; if so, the finally trained super-resolution network model is obtained; otherwise, returning to step 3.3 for sequential execution;
and step 3.5, inputting the low-resolution images to be tested into the trained super-resolution network model, so as to obtain the predicted super-resolution images.
CN202110870612.6A 2021-07-30 2021-07-30 Multi-modal image super-resolution method based on convolutional neural network Active CN113506222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110870612.6A 2021-07-30 2021-07-30 Multi-modal image super-resolution method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN113506222A 2021-10-15
CN113506222B 2024-03-01

Family ID: 78014561

Country Status (1)

Country Link
CN (1) CN113506222B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351935A1 (en) * 2016-06-01 2017-12-07 Mitsubishi Electric Research Laboratories, Inc Method and System for Generating Multimodal Digital Images
US20190325621A1 (en) * 2016-06-24 2019-10-24 Rensselaer Polytechnic Institute Tomographic image reconstruction via machine learning
US20200034948A1 (en) * 2018-07-27 2020-01-30 Washington University Ml-based methods for pseudo-ct and hr mr image estimation
CN109325931A (en) * 2018-08-22 2019-02-12 中北大学 Based on the multi-modality images fusion method for generating confrontation network and super-resolution network
CN110415170A (en) * 2019-06-24 2019-11-05 武汉大学 A kind of image super-resolution method based on multiple dimensioned attention convolutional neural networks
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
AU2020100200A4 (en) * 2020-02-08 2020-06-11 Huang, Shuying DR Content-guide Residual Network for Image Super-Resolution
CN111445390A (en) * 2020-02-28 2020-07-24 天津大学 Wide residual attention-based three-dimensional medical image super-resolution reconstruction method
CN111899165A (en) * 2020-06-16 2020-11-06 厦门大学 Multi-task image reconstruction convolution network model based on functional module
CN112200725A (en) * 2020-10-26 2021-01-08 深圳大学 Super-resolution reconstruction method and device, storage medium and electronic equipment
CN113096017A (en) * 2021-04-14 2021-07-09 南京林业大学 Image super-resolution reconstruction method based on depth coordinate attention network model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷鹏程; 刘丛; 唐坚刚; 彭敦陆: "Image super-resolution reconstruction via hierarchical feature fusion attention network" (分层特征融合注意力网络图像超分辨率重建), Journal of Image and Graphics (中国图象图形学报), no. 09

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049408A (en) * 2021-11-15 2022-02-15 哈尔滨工业大学(深圳) Depth network model for accelerating multi-modality MR imaging
WO2023109719A1 (en) * 2021-12-15 2023-06-22 深圳先进技术研究院 Terahertz single-pixel super-resolution imaging method and system
CN114331849A (en) * 2022-03-15 2022-04-12 之江实验室 Cross-mode nuclear magnetic resonance hyper-resolution network and image super-resolution method
CN114943650A (en) * 2022-04-14 2022-08-26 北京东软医疗设备有限公司 Image deblurring method and device, computer equipment and storage medium
CN117391938A (en) * 2023-12-13 2024-01-12 长春理工大学 Infrared image super-resolution reconstruction method, system, equipment and terminal
CN117391938B (en) * 2023-12-13 2024-02-20 长春理工大学 Infrared image super-resolution reconstruction method, system, equipment and terminal

Also Published As

Publication number Publication date
CN113506222B (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant