CN112070676B - Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network - Google Patents


Info

Publication number
CN112070676B
CN112070676B
Authority
CN
China
Prior art keywords: layer, picture, reconstruction, network, perception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010946074.XA
Other languages
Chinese (zh)
Other versions
CN112070676A (en)
Inventor
张淑芬
王鑫
吕艳霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN202010946074.XA priority Critical patent/CN112070676B/en
Publication of CN112070676A publication Critical patent/CN112070676A/en
Application granted
Publication of CN112070676B publication Critical patent/CN112070676B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 Scaling using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a picture super-resolution reconstruction method based on a double-channel multi-perception convolutional neural network (DMCN), and relates to the technical field of image processing. The method uses double convolution channels with different kernel sizes, combined with local dense connections, to obtain multiple kinds of perception of the picture's feature information, while an inter-layer fusion structure with a convolutional adjusting function restores more accurate fused information. The network is trained on the DIV2K dataset; using only 8 layers of DMRB modules, its test results on several benchmark datasets surpass the current state-of-the-art reconstruction algorithms such as MSRN and EDSR. The reconstructed images of the DMCN contain richer high-frequency detail and are closer to the original pictures, showing that the DMCN network structure perceives the various kinds of information in a picture more comprehensively and has stronger reconstruction capability.

Description

Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network.
Background
Picture super-resolution reconstruction aims at reconstructing blurred low-resolution (LR) pictures into clearer high-resolution (HR) pictures. It can alleviate problems such as image blurring and noise interference in fields including video surveillance, medicine and satellite imaging. Common picture super-resolution methods include interpolation, sparse-representation-based methods, local linear regression, and deep-learning-based methods.
Recent studies have shown that deep neural networks can significantly improve the quality of single-image super-resolution, and current research tends to use deeper convolutional neural networks to improve performance. However, blindly increasing network depth does not effectively improve results; worse, as depth grows, more problems occur during training and more training tricks are required.
Among existing picture super-resolution reconstruction techniques, the multi-scale residual network (MSRN) was proposed to fully exploit image features: multi-scale residual blocks (MSRBs) acquire image features at different scales (local multi-scale features), and the outputs of all MSRBs are combined for global feature fusion through a hierarchical feature fusion structure (HFFS) that uses a 1×1 convolution as a bottleneck layer. Combining the local multi-scale features with the global features exploits the LR image features to the greatest extent and resolves the problem of features vanishing during transmission, while a simple and efficient reconstruction structure makes multi-scale magnification easy to realize.
Structurally, EDSR differs from SRResNet mainly in that the batch normalization (BN) operations are removed. Because a BN layer consumes as much memory as the convolution layer before it, removing BN lets EDSR stack more network layers or extract more features per layer with the same computing resources, yielding better performance. EDSR optimizes the network model with a loss function in L1-norm form. During training, a low-magnification upsampling model is trained first, and its parameters are used to initialize the high-magnification upsampling model; this reduces the training time of the high-magnification model while producing better results. The middle part of MDSR resembles EDSR, except that different pre-trained modules are placed in front of the network to reduce the differences among input pictures at different magnifications, and the upsampling structures for the different magnifications are arranged in parallel at the end to produce output results at multiple scales.
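The L1-norm loss mentioned above has a very simple form: the mean absolute difference between the reconstruction and the ground truth. A minimal NumPy sketch, where the function name and array shapes are illustrative assumptions rather than details from the patent:

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error: the L1-norm loss used to optimize EDSR-style models."""
    return float(np.mean(np.abs(pred - target)))

# Example: a 2x2 "reconstruction" compared against its ground truth.
sr = np.array([[0.5, 1.0], [2.0, 3.0]])
hr = np.array([[0.0, 1.0], [2.0, 2.0]])
loss = l1_loss(sr, hr)  # mean of |0.5|, |0|, |0|, |1| = 0.375
```

Compared with an L2 loss, the L1 form penalizes large residuals less aggressively, which is one reason EDSR-style training favors it.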
The existing methods have the following problems. First, the performance of deep super-resolution networks depends heavily on network width and depth, and newer approaches tend to use wider networks and more layers to enhance the reconstruction effect. However, the ever-growing network scale raises the training difficulty accordingly: the network must be designed more carefully to avoid problems such as vanishing gradients during training, while the time and space complexity of computation multiplies and the dependence on GPU hardware becomes heavy. Second, most super-resolution networks use residual stacking structures similar to those in ResNet to improve training, but such simple residual structures cannot fully extract the image features inside the network. Although MSRN extracts picture features at multiple scales and enhances the reconstruction effect, its MSRB module still cannot extract the complete features of a picture, in particular the fused feature information obtained through dense channel connections, and its extraction of the global picture features is insufficient.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network.
In order to solve the technical problems, the invention adopts the following technical scheme:
a picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network comprises the following steps:
step 1: constructing a two-channel multi-perception residual error module DMRB as a basic module of a reconstruction network;
the reconstruction network comprises a shallow layer feature extraction layer, a deep layer feature extraction layer and an amplifying reconstruction layer;
step 2: low resolution picture I of shallow feature extraction layer to input network LR The dimension is increased from the 3 feature dimension of the RGB picture to the 64 feature dimension of the deep feature extraction layer, the preliminary feature information of the picture is obtained, and the feature value X is output 0 The process is as follows: x is X 0 =H SFE (I LR ) Wherein H is SFE (-) represents a shallow feature extraction function;
step 3: x is X 0 Input into deep feature extraction layer, transfer between multi-layer double-channel multi-perception residual error module DMRB, continuously extract feature information, output each layer through adjusting layer (1*1 convolution), input into interlayer fusion layer, and finally promote feature extraction efficiency through residual error structure, output feature value X d The process is: x is X d =H DFE (X 0 ) Wherein H is DFE (-) represents a depth feature extraction function;
the adjusting layer adjusts the proportion relation of each layer in the fusion process;
the interlayer fusion layer fuses the feature information output by each deep feature extraction layer;
step 4: amplifying the picture to a specific multiple through an amplifying reconstruction layer, wherein the process is as follows: i SR =H up_REC (X d ) Wherein H is up_REC () represents an amplifying and reconstructing function;
step 5: representing the entire network function as H DMCN (-), low resolution picture I LR Mapping to high resolution picture I SR :I SR =H up_REC (H DEF (H SFE (I LR )))。
The beneficial effects of adopting above-mentioned technical scheme to produce lie in:
the invention provides a picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network, which uses double-convolution channels with different convolution kernels and combines local dense connection to obtain multiple perception capacities on picture characteristic information, and an interlayer fusion structure with a convolution adjusting function restores more accurate fusion information. The network is trained by the DIV2K data set, and under the condition that only 8 layers of DMRB modules are used, the test result of a plurality of reference data sets is better than the current most advanced reconstruction algorithms such as MSRN, EDSR and the like. The reconstruction result graph of the DMCN contains richer high-frequency detail information, is closer to the original picture, can sense various information in the picture more comprehensively by the network structure of the DMCN, and has stronger reconstruction capability.
Drawings
FIG. 1 is a flow chart of a picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network;
FIG. 2 is a diagram of a reconstruction network architecture according to the present invention;
FIG. 3 is a block diagram of a dual-channel multi-perception residual module according to the present invention;
FIG. 4 is a schematic diagram of a quality comparison in 4-fold reconstruction according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention in detail with reference to the drawings.
A picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network, as shown in figure 1, comprises the following steps:
step 1: constructing a dual-channel multi-perception residual module DMRB as the basic module of the reconstruction network; this module maximizes the perception of picture features and provides a stronger ability to restore high-frequency information during reconstruction;
the reconstruction network is shown in fig. 2, and comprises a shallow layer feature extraction layer, a deep layer feature extraction layer and an amplifying reconstruction layer;
step 2: low resolution picture I of shallow feature extraction layer to input network LR The dimension is increased from the 3 feature dimension of the RGB picture to the 64 feature dimension of the deep feature extraction layer, the preliminary feature information of the picture is obtained, and the feature value X is output 0 The process is as follows: x is X 0 =H SFE (I LR ) Wherein H is SFE (-) represents a shallow feature extraction function;
step 3: x is X 0 Input into deep feature extraction layer, transfer between multi-layer double-channel multi-perception residual error module DMRB, continuously extract feature information, output each layer through adjusting layer (1*1 convolution), input into interlayer fusion layer, and finally promote feature extraction efficiency through residual error structure, output feature value X d The process is: x is X d =H DFE (X 0 ) Wherein H is DFE (-) represents a depth feature extraction function;
the adjusting layer adjusts the proportion relation of each layer in the fusion process;
the interlayer fusion layer fuses the feature information output by each deep feature extraction layer;
step 4: amplifying the picture to a specific multiple through an amplifying reconstruction layer, wherein the process is as follows: i SR =H up_REC (X d ) Wherein H is up_REC () represents an amplifying and reconstructing function;
this embodiment uses sub-pixel convolution up-sampling.
Step 5: representing the entire network function as H DMCN (-), low resolution picture I LR Mapping to high resolution picture I SR :I SR =H up_REC (H DEF (H SFE (I LR )))=H DMCN (I LR )
The network is trained on the DIV2K dataset; using only 8 layers of DMRB modules, its test results on several benchmark datasets surpass the current state-of-the-art reconstruction models.
The structure of the two-channel multi-perception residual module DMRB is shown in fig. 3. The feature extraction channels on the left and right sides adopt 3×3 and 5×5 convolution kernels respectively. Different convolution kernels let the convolution operations obtain picture feature information at different scales; fusing this information and performing further feature extraction effectively enhances the perception capability of a deep structure. This idea was successfully applied in the GoogLeNet network, and a similar structure is used in the later MSRN network. The DMRB differs from those structures in that, besides the feature values output by the two convolution operations, it also fuses local dense connection information.
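The dual-branch idea above can be sketched as two parallel convolutions with different kernel sizes whose outputs are concatenated, fused by a 1×1 convolution, and added back to the input through a residual connection. The following NumPy illustration is a simplification under stated assumptions (ReLU activations, a naive loop-based convolution, and no local dense connections, which the real DMRB additionally fuses):

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded convolution. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    co, ci, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((co, h, wd))
    for o in range(co):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + k, j:j + k])
    return out

def dmrb_sketch(x, w3, w5, w_fuse):
    """Dual-channel block: 3x3 and 5x5 branches, concat, 1x1 fuse, residual add."""
    b3 = np.maximum(conv2d_same(x, w3), 0)  # left branch, ReLU assumed
    b5 = np.maximum(conv2d_same(x, w5), 0)  # right branch, ReLU assumed
    cat = np.concatenate([b3, b5], axis=0)  # channel-wise fusion of both scales
    fused = np.einsum('oi,ihw->ohw', w_fuse, cat)  # 1x1 bottleneck
    return x + fused  # local residual connection
```

With all-zero fusion weights the block reduces to the identity, the usual sanity check for a residual module: stacking many such blocks cannot make the mapping worse than a pass-through.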
In this embodiment, 800 pictures from the DIV2K dataset are used to train the convolutional network. The input pictures are RGB images cropped into 48×48 patches and augmented by rotation, flipping and other transformations following the method of the EDSR network to enhance the training effect. The batch size is 16, for a total of 1000 iterations, and ×2, ×3 and ×4 reconstructions are trained separately. Training results are tested on the Set5, Set14, B100 and Urban100 benchmark datasets, with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as evaluation indices. Table 1 compares the method herein with several classical SR methods.
Table 1 benchmark dataset test results
The underlined data in the table are the best results in the test; DMCN+ uses the geometric self-ensemble method to improve the test effect. It can be seen that the proposed DMCN network obtains the best test data on most benchmark sets.
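Of the two evaluation indices used above, PSNR has a particularly simple closed form. A minimal NumPy sketch, assuming 8-bit pictures with a peak value of 255 (SSIM, which needs local statistics, is omitted for brevity):

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Images differing by 1 at every pixel give MSE = 1, i.e. about 48.13 dB.
a = np.zeros((8, 8), dtype=np.uint8)
b = np.ones((8, 8), dtype=np.uint8)
```

Note the cast to float64 before subtraction: differencing uint8 arrays directly would wrap around and corrupt the MSE.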
Fig. 4 compares the reconstruction effect of DMCN with several mainstream reconstruction algorithms such as VDSR and MSRN in the challenging 4-fold reconstruction. In terms of subjective visual experience, the reconstructed image of the DMCN clearly contains richer high-frequency detail and is closer to the original image. This result is mainly attributable to the DMCN network structure perceiving the various kinds of information in the picture more comprehensively, giving it stronger reconstruction capability.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example solutions in which the above features are interchanged with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (4)

1. A picture super-resolution reconstruction method of a double-channel multi-perception convolutional neural network, characterized by comprising the following steps:
step 1: constructing a two-channel multi-perception residual error module DMRB as a basic module of a reconstruction network;
step 2: low resolution picture I of shallow feature extraction layer to input network LR The dimension is increased from the 3 feature dimension of the RGB picture to the 64 feature dimension of the deep feature extraction layer, the preliminary feature information of the picture is obtained, and the feature value X is output 0 The process is as follows: x is X 0 =H SFE (I LR ) Wherein H is SFE (-) represents a shallow feature extraction function;
step 3: x is X 0 Input into deep feature extraction layer, transfer between multi-layer double-channel multi-perception residual error module DMRB, continuously extract feature information, output each layer through adjusting layer (1*1 convolution), input into interlayer fusion layer, and finally promote special through residual error structureThe sign extraction efficiency, output characteristic value X d The process is: x is X d =H DFE (X 0 ) Wherein H is DFE (-) represents a depth feature extraction function;
step 4: amplifying the picture to a specific multiple through an amplifying reconstruction layer, wherein the process is as follows: i SR =H up_REC (X d ) Wherein H is up_REC () represents an amplifying and reconstructing function;
step 5: representing the entire network function as H DMCN (-), low resolution picture I LR Mapping to high resolution picture I SR :I SR =H up_REC (H DEF (H SFE (I LR )))。
2. The method for reconstructing the super-resolution of the picture of the two-channel multi-perception convolutional neural network according to claim 1, wherein the reconstruction network in the step 1 comprises a shallow feature extraction layer, a deep feature extraction layer and an amplifying reconstruction layer.
3. The method for reconstructing the super-resolution of the picture of the double-channel multi-perception convolutional neural network according to claim 1, wherein in the step 3, the adjusting layer adjusts the proportional relation of each layer in the fusion process; and the interlayer fusion layer fuses the characteristic information output by each deep characteristic extraction layer.
4. The method for reconstructing super resolution of a picture of a two-channel multi-perception convolutional neural network according to claim 3, wherein in the inter-layer fusion module, except for the last layer's output X_n, the outputs X_0 to X_n-1 each pass through a 1×1 convolution layer when skip-connected to the fusion layer; the output of the last layer plays a fixed role in the fusion, while the other layers are dynamically adjusted by the convolution layers, so that accurate picture feature values are extracted.
CN202010946074.XA 2020-09-10 2020-09-10 Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network Active CN112070676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010946074.XA CN112070676B (en) 2020-09-10 2020-09-10 Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network


Publications (2)

Publication Number Publication Date
CN112070676A CN112070676A (en) 2020-12-11
CN112070676B true CN112070676B (en) 2023-10-27

Family

ID=73663606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010946074.XA Active CN112070676B (en) 2020-09-10 2020-09-10 Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network

Country Status (1)

Country Link
CN (1) CN112070676B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950470B (en) * 2021-02-26 2022-07-15 南开大学 Video super-resolution reconstruction method and system based on time domain feature fusion

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910161A (en) * 2017-01-24 2017-06-30 华南理工大学 A kind of single image super resolution ratio reconstruction method based on depth convolutional neural networks
EP3319039A1 (en) * 2016-11-07 2018-05-09 UMBO CV Inc. A method and system for providing high resolution image through super-resolution reconstruction
CN108921786A (en) * 2018-06-14 2018-11-30 天津大学 Image super-resolution reconstructing method based on residual error convolutional neural networks
WO2018221863A1 (en) * 2017-05-31 2018-12-06 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
CN109509149A (en) * 2018-10-15 2019-03-22 天津大学 A kind of super resolution ratio reconstruction method based on binary channels convolutional network Fusion Features
CN109903226A (en) * 2019-01-30 2019-06-18 天津城建大学 Image super-resolution rebuilding method based on symmetrical residual error convolutional neural networks
WO2019120110A1 (en) * 2017-12-20 2019-06-27 华为技术有限公司 Image reconstruction method and device
CN110276721A (en) * 2019-04-28 2019-09-24 天津大学 Image super-resolution rebuilding method based on cascade residual error convolutional neural networks
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Enhanced deep residual networks for single image super-resolution; Lim B et al.; IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); full text *
Image Super-resolution Algorithm Based on Dual-channel Convolutional Neural Networks; Yuantao Chen et al.; Applied Sciences; full text *
Depth image super-resolution reconstruction based on a pyramid-type dual-channel convolutional neural network; Yu Shuxia et al.; Journal of Computer Applications; full text *

Also Published As

Publication number Publication date
CN112070676A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
Liu et al. Video super-resolution based on deep learning: a comprehensive survey
Liang et al. High-resolution photorealistic image translation in real-time: A laplacian pyramid translation network
CN109886871B (en) Image super-resolution method based on channel attention mechanism and multi-layer feature fusion
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN109949223B (en) Image super-resolution reconstruction method based on deconvolution dense connection
CN111340744B (en) Attention double-flow depth network-based low-quality image down-sampling method and system
CN110322402B (en) Medical image super-resolution reconstruction method based on dense mixed attention network
CN111932461A (en) Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
Luo et al. Lattice network for lightweight image restoration
CN111986092B (en) Dual-network-based image super-resolution reconstruction method and system
CN113781308A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN112614061A (en) Low-illumination image brightness enhancement and super-resolution method based on double-channel coder-decoder
CN108460723B (en) Bilateral total variation image super-resolution reconstruction method based on neighborhood similarity
Wang et al. Underwater image super-resolution and enhancement via progressive frequency-interleaved network
Zhang et al. Med-SRNet: GAN-based medical image super-resolution via high-resolution representation learning
CN115526779A (en) Infrared image super-resolution reconstruction method based on dynamic attention mechanism
CN115797176A (en) Image super-resolution reconstruction method
CN115170392A (en) Single-image super-resolution algorithm based on attention mechanism
CN112070676B (en) Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network
CN112150356A (en) Single compressed image super-resolution reconstruction method based on cascade framework
Wu et al. Dcanet: Dual convolutional neural network with attention for image blind denoising
CN116071239B (en) CT image super-resolution method and device based on mixed attention model
Luo et al. A fast denoising fusion network using internal and external priors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant