CN114170079A - Depth map super-resolution method based on attention guide mechanism - Google Patents

Depth map super-resolution method based on attention guide mechanism

Info

Publication number
CN114170079A
CN114170079A
Authority
CN
China
Prior art keywords
layer
convolutional layer
module
resolution
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111400288.8A
Other languages
Chinese (zh)
Inventor
杨敬钰 (Yang Jingyu)
陈昶佚 (Chen Changyi)
岳焕景 (Yue Huanjing)
李坤 (Li Kun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202111400288.8A
Publication of CN114170079A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 - Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4007 - Scaling based on interpolation, e.g. bilinear interpolation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a depth map super-resolution method based on a guided attention mechanism, comprising the following steps: analyzing the characteristics of the low-resolution depth map and establishing the degradation model of the captured image; building a data set; designing a network framework, in which the whole network consists of a lightweight network HDSRnet-light and an optional edge-refinement network ERnet: the low-resolution depth map, the bicubic-interpolation-upsampled depth map, and the high-resolution color map are input into HDSRnet-light to obtain a high-resolution depth map, and the high-resolution depth map together with the high-resolution color map is input into ERnet for further reconstruction, finally yielding a fine high-resolution output, the whole network being HDSRnet; setting the network learning rate and the weight of each loss-function term, and training the convolutional neural network with the deep-learning framework PyTorch until the loss converges, producing a trained model.

Description

Depth map super-resolution method based on attention guide mechanism
Technical Field
The invention belongs to the field of image restoration and relates to a super-resolution method for depth maps.
Background
A depth map, also called a range image, is an image whose pixel values encode the distance from the camera to each point in the scene. Depth maps are currently acquired mainly with structured-light illumination and ToF (Time of Flight) technology. However, owing to the physical limitations of current devices, the depth maps captured by depth cameras have low resolution and are difficult to apply in scenarios that demand fine detail.
Current depth cameras typically pair one depth camera lens with one color camera lens, so the collected data are a low-resolution depth map and a paired high-resolution color map. To obtain a high-resolution depth map, the high-resolution color image is typically used to guide the super-resolution of the low-resolution depth image.
Depth map super-resolution methods based on this guidance idea fall mainly into two categories:
(1) Traditional methods, for example: super-resolution schemes based on image filtering; methods that introduce prior information and cast the super-resolution problem as an optimization problem using maximum a posteriori probability theory; and methods based on dictionary learning.
(2) Methods that use a convolutional neural network (CNN) to directly learn the mapping from low-resolution depth images to the corresponding high-resolution depth images. Such methods require a large number of pairs of low-resolution and high-resolution depth images as the network's training data set. However, most existing methods simply concatenate the high-resolution color features with the depth features, which uses the information inefficiently. Moreover, most current networks improve performance by stacking parameters, which makes the networks large, slow to run, and difficult to deploy on practical devices. Adopting a more efficient feature-fusion scheme to improve network performance, and thereby the practical value of the method, has therefore become the development trend.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a depth map super-resolution method that achieves high running efficiency and excellent performance through a guided attention mechanism. The technical scheme is as follows:
A depth map super-resolution method based on a guided attention mechanism comprises the following steps:
1) Analyzing the characteristics of the low-resolution depth map and establishing the degradation model of the captured image:
D_lr = H D_hr + n_0
where D_lr is the low-resolution depth image captured by the depth camera, D_hr is the high-resolution depth image, H is the downsampling matrix, and n_0 is noise.
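For illustration, a minimal PyTorch sketch of this degradation model; approximating the downsampling operator H by bicubic resampling and using Gaussian noise with an assumed level sigma (neither choice is fixed by the source):

```python
import torch
import torch.nn.functional as F

def degrade(d_hr: torch.Tensor, scale: int = 8, sigma: float = 0.01) -> torch.Tensor:
    """Simulate D_lr = H D_hr + n_0.

    d_hr: high-resolution depth map of shape (N, 1, H, W), values in [0, 1].
    H is approximated by bicubic downsampling; n_0 is Gaussian noise with an
    assumed standard deviation sigma (not specified in the source).
    """
    d_lr = F.interpolate(d_hr, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)
    return d_lr + sigma * torch.randn_like(d_lr)
```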
2) Building the data set
The high-resolution data of the training set are obtained by randomly cropping a number of RGB-D images from the MPI Sintel depth data set and a number of RGB-D images from the Middlebury data set; each RGB-D pair is cropped to a fixed size, and data diversity is increased by randomly rotating each pair by an angle θ ∈ {0°, 90°, 180°, 270°}. The low-resolution data are obtained from the high-resolution depth maps by bicubic-interpolation downsampling.
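A data-preparation sketch under stated assumptions: the crop size of 256 and the 8x scale factor are illustrative values (the source elides the exact crop size), and rotation is restricted to the four listed angles:

```python
import random
import torch
import torch.nn.functional as F

def make_training_pair(rgb: torch.Tensor, depth: torch.Tensor,
                       crop: int = 256, scale: int = 8):
    """Build one (low-res depth, bicubic-upsampled depth, high-res color) triple.

    rgb: (3, H, W), depth: (1, H, W). The crop size of 256 is an assumption.
    """
    # Random crop, applied identically to the RGB and depth maps.
    _, h, w = depth.shape
    top, left = random.randint(0, h - crop), random.randint(0, w - crop)
    rgb = rgb[:, top:top + crop, left:left + crop]
    depth = depth[:, top:top + crop, left:left + crop]

    # Random rotation by a multiple of 90 degrees for data diversity.
    k = random.choice([0, 1, 2, 3])
    rgb, depth = torch.rot90(rgb, k, dims=(1, 2)), torch.rot90(depth, k, dims=(1, 2))

    # Low-resolution depth via bicubic downsampling, plus its bicubic upsampling.
    d_lr = F.interpolate(depth[None], scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)[0]
    d_bic = F.interpolate(d_lr[None], scale_factor=scale,
                          mode="bicubic", align_corners=False)[0]
    return d_lr, d_bic, rgb
```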
3) Designing a network framework
31) The whole network structure consists of a lightweight network HDSRnet-light and an optional edge-refinement network ERnet. The low-resolution depth map, the bicubic-interpolation-upsampled depth map, and the high-resolution color map are input into HDSRnet-light to obtain a high-resolution depth map. The high-resolution depth map and the high-resolution color map are then input into ERnet for further reconstruction, finally yielding a fine high-resolution output; the whole network is HDSRnet;
32) The lightweight network HDSRnet-light consists of 3 branches: a main branch, a structure side branch, and a detail side branch. The two side branches are built from upsampling and downsampling modules, and the main branch is built from upsampling modules, downsampling modules, a guided attention module, and attention modules. The low-resolution depth map is input into the main branch, the bicubic-interpolation-upsampled depth map into the structure side branch, and the high-resolution color map into the detail side branch.
321) Downsampling module structure: convolutional layer 1 - pooling layer 1 - convolutional layer 2. Every convolutional layer in the module is followed by a ReLU activation function.
322) Upsampling module structure: deconvolution layer 1 - convolutional layer 1. Every convolutional layer and deconvolution layer in the module is followed by a ReLU activation function.
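A minimal PyTorch sketch of these two modules; the channel width, kernel sizes, and pooling type (average pooling with stride 2) are assumptions not fixed by the source:

```python
import torch.nn as nn

class DownsampleModule(nn.Module):
    """Convolutional layer 1 - pooling layer 1 - convolutional layer 2,
    each convolution followed by ReLU."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),  # pooling layer (assumed 2x average pooling)
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class UpsampleModule(nn.Module):
    """Deconvolution layer 1 - convolutional layer 1, each followed by ReLU."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)
```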
323) Guided attention module structure: convolutional layer 1 - convolutional layer 2 - multiplication layer - convolutional layer 3. The main-branch features enter convolutional layer 1 and the side-branch features enter convolutional layer 2; the outputs of convolutional layers 1 and 2 are multiplied element-wise in the multiplication layer; the output of convolutional layer 2 also enters convolutional layer 3, and the output of convolutional layer 3 is added to the output of the multiplication layer. Every convolutional layer is followed by a ReLU activation function.
324) Attention module structure: guided attention module 1 - guided attention module 2. The main-branch features and one set of side-branch features enter guided attention module 1; the output of guided attention module 1, together with the other set of side-branch features, enters guided attention module 2.
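A sketch of the guided attention module of 323) and the attention module of 324), under the same assumed hyperparameters; only the wiring described above is taken from the source:

```python
import torch.nn as nn

class GuidedAttentionModule(nn.Module):
    """conv1(main) * conv2(side) + conv3(conv2(side)), per the text: the
    conv-1 and conv-2 outputs are multiplied element-wise, and conv-3
    applied to the conv-2 output is added to that product. The shared
    channel width is an assumed hyperparameter."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, main, side):
        g = self.conv2(side)  # side-branch guidance features
        return self.conv1(main) * g + self.conv3(g)

class AttentionModule(nn.Module):
    """Two guided attention modules in series: the first fuses the main
    features with one side branch, the second fuses that result with the
    other side branch (the order of the two sides is an assumption)."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.ga1 = GuidedAttentionModule(ch)
        self.ga2 = GuidedAttentionModule(ch)

    def forward(self, main, side1, side2):
        return self.ga2(self.ga1(main, side1), side2)
```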
325) Main branch structure: convolutional layer 1 - fusion layer 1 - guided attention module 1 - upsampling module 1 - attention module 1 - upsampling module 2 - attention module 2 - convolutional layer 2. Fusion layer 1 concatenates the outputs of main-branch convolutional layer 1 and structure-side-branch downsampling module 2 along the channel dimension; guided attention module 1 fuses the output of detail-side-branch downsampling module 2 with that of main-branch fusion layer 1; attention module 1 fuses the outputs of structure-side-branch downsampling module 1, detail-side-branch downsampling module 1, and main-branch upsampling module 1; attention module 2 fuses the outputs of structure-side-branch convolutional layer 1, detail-side-branch convolutional layer 1, and main-branch upsampling module 2. The final output is the sum of the output of main-branch convolutional layer 2 and the bicubic-interpolation-upsampled input (see the branch-layout sketch after 327);
326) Detail side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2. This branch processes the information of the high-resolution color image; the outputs of downsampling module 1 and downsampling module 2 are sent to the attention modules of the main branch for feature fusion. Convolutional layer 1 is followed by a ReLU activation function;
327) Structure side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2. This branch processes the bicubic-interpolation-upsampled depth map; the outputs of downsampling module 1 and downsampling module 2 are sent to the attention modules of the main branch for feature fusion. Convolutional layer 1 is followed by a ReLU activation function.
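Putting the pieces together, a sketch of the three-branch HDSRnet-light layout, reusing the module sketches above. It is written for a 4x configuration in which every sampling module changes resolution by 2x; the per-module strides for the 8x case, the channel width, and the 1x1 convolution after the channel concatenation in fusion layer 1 are all assumptions:

```python
import torch
import torch.nn as nn

# Assumes DownsampleModule, UpsampleModule, GuidedAttentionModule and
# AttentionModule from the sketches above are in scope.

class HDSRnetLight(nn.Module):
    """Three-branch layout of 32)-327) for an assumed 4x scale factor."""
    def __init__(self, ch: int = 32):
        super().__init__()
        # Side branches: convolutional layer 1 -> downsampling module 1 -> 2.
        self.s_conv = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.s_down1, self.s_down2 = DownsampleModule(ch), DownsampleModule(ch)
        self.c_conv = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.c_down1, self.c_down2 = DownsampleModule(ch), DownsampleModule(ch)
        # Main branch.
        self.m_conv1 = nn.Conv2d(1, ch, 3, padding=1)
        self.fuse1 = nn.Conv2d(2 * ch, ch, 1)  # channel concat + assumed 1x1 reduction
        self.ga1 = GuidedAttentionModule(ch)
        self.up1, self.up2 = UpsampleModule(ch), UpsampleModule(ch)
        self.att1, self.att2 = AttentionModule(ch), AttentionModule(ch)
        self.m_conv2 = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, d_lr, d_bic, rgb):
        # Structure side branch (bicubic-upsampled depth) and detail side branch (color).
        s1 = self.s_conv(d_bic); s_d1 = self.s_down1(s1); s_d2 = self.s_down2(s_d1)
        c1 = self.c_conv(rgb);   c_d1 = self.c_down1(c1); c_d2 = self.c_down2(c_d1)
        # Main branch over the low-resolution depth map.
        x = self.m_conv1(d_lr)
        x = self.fuse1(torch.cat([x, s_d2], dim=1))  # fusion layer 1
        x = self.ga1(x, c_d2)                        # guided attention module 1
        x = self.att1(self.up1(x), s_d1, c_d1)       # attention module 1
        x = self.att2(self.up2(x), s1, c1)           # attention module 2
        return self.m_conv2(x) + d_bic               # residual over bicubic upsampling
```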
33) The edge-refinement network ERnet consists of two branches: a depth main branch and a detail side branch. The output of HDSRnet-light is input to the depth main branch, and the high-resolution color image is input to the detail side branch. Both branches are composed of convolutional layers and realize fine reconstruction of the edges.
331) Detail side branch structure: convolutional layers C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9. This branch processes the high-resolution color image information; all convolutional layers are connected in series and each is followed by a ReLU activation function.
332) Depth main branch structure: convolutional layer D1 - fusion layer F1 - D2 - F2 - D3 - F3 - D4 - F4 - D5 - F5 - D6 - F6 - D7 - F7 - D8 - F8 - D9 - F9 - D10. This branch processes the depth map information and fuses in the color-branch information; all convolutional layers are connected in series, and fusion layer Fi concatenates, along the channel dimension, the output of depth-branch convolutional layer Di with the output of color-branch convolutional layer Ci.
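A sketch of ERnet under the same conventions; the channel width is assumed, as is the absence of a ReLU after the final layer D10:

```python
import torch
import torch.nn as nn

class ERnet(nn.Module):
    """Edge refinement sketch: color branch C1..C9 and depth branch D1..D10,
    where fusion layer Fi concatenates Di's output with Ci's output along
    the channel dimension before the next depth convolution."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.color = nn.ModuleList(
            [nn.Conv2d(3, ch, 3, padding=1)] +
            [nn.Conv2d(ch, ch, 3, padding=1) for _ in range(8)])       # C1..C9
        self.depth = nn.ModuleList(
            [nn.Conv2d(1, ch, 3, padding=1)] +                         # D1
            [nn.Conv2d(2 * ch, ch, 3, padding=1) for _ in range(8)] +  # D2..D9 take fused input
            [nn.Conv2d(2 * ch, 1, 3, padding=1)])                      # D10
        self.relu = nn.ReLU(inplace=True)

    def forward(self, depth, color):
        c = color
        d = self.relu(self.depth[0](depth))          # D1
        for i in range(9):
            c = self.relu(self.color[i](c))          # Ci
            d = torch.cat([d, c], dim=1)             # fusion layer Fi
            if i < 8:
                d = self.relu(self.depth[i + 1](d))  # D2..D9
        return self.depth[9](d)                      # D10 (no ReLU, assumed)
```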
4) Setting the network learning rate and the weight of each loss-function term, and training the convolutional neural network with the deep-learning framework PyTorch until the loss converges, producing a trained model;
41) after determining the network structure, inputting training data into the network;
42) In the first stage of network training, the initial learning rate of the HDSRnet-light network is set to 0.0001 and is reduced to 0.1 times its value after each epoch; HDSRnet-light is trained with a norm-based loss function (the norm's formula appears only as an image in the source);
43) In the second stage of network training, after HDSRnet-light has been trained for 30 epochs, ERnet is appended to HDSRnet-light and the cascade is trained with a second norm-based loss function (likewise given as a formula image in the source);
44) Training proceeds: HDSRnet-light provides the mapping from the low-resolution depth map to the high-resolution depth map, and ERnet provides the mapping from the high-resolution depth map to the fine high-resolution depth map.
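A sketch of this two-stage schedule. The optimizer (Adam), the concrete norms (L2 for stage one and L1 for stage two, standing in for the formulas that appear only as images in the source), and the stage-two epoch count are assumptions; the learning-rate decay of 0.1x per epoch follows the text literally:

```python
import torch

def train_two_stage(hdsr_light, ernet, loader, device="cuda"):
    """loader yields (d_lr, d_bic, rgb, d_hr) batches as in the data sketch."""
    # Stage 1: HDSRnet-light alone, lr = 1e-4, decayed to 0.1x after each epoch.
    opt = torch.optim.Adam(hdsr_light.parameters(), lr=1e-4)  # optimizer assumed
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.1)
    stage1_loss = torch.nn.MSELoss()  # assumed norm loss for stage 1
    for epoch in range(30):
        for d_lr, d_bic, rgb, d_hr in loader:
            d_lr, d_bic, rgb, d_hr = (t.to(device) for t in (d_lr, d_bic, rgb, d_hr))
            loss = stage1_loss(hdsr_light(d_lr, d_bic, rgb), d_hr)
            opt.zero_grad(); loss.backward(); opt.step()
        sched.step()

    # Stage 2: append ERnet and train the cascade (restart lr assumed).
    params = list(hdsr_light.parameters()) + list(ernet.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    stage2_loss = torch.nn.L1Loss()  # assumed norm loss for stage 2
    for epoch in range(30):          # stage-2 length assumed
        for d_lr, d_bic, rgb, d_hr in loader:
            d_lr, d_bic, rgb, d_hr = (t.to(device) for t in (d_lr, d_bic, rgb, d_hr))
            coarse = hdsr_light(d_lr, d_bic, rgb)
            loss = stage2_loss(ernet(coarse, rgb), d_hr)
            opt.zero_grad(); loss.backward(); opt.step()
```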
Aiming at depth map super-resolution, the method is based on deep learning: it generates low-resolution data from public data sets and trains the convolutional neural network HDSRnet. The HDSRnet-light part achieves efficient depth map super-resolution, reaching excellent performance with a very small number of parameters, while the ERnet part realizes more precise edge reconstruction. The invention has the following characteristics:
1. An efficient lightweight depth map super-resolution network, HDSRnet-light, is proposed, in which the attention modules efficiently fuse the features of the high-resolution color image and the low-resolution depth image. The network can perform 8× super-resolution on a 135×240 depth map at 360 frames per second, exceeding related methods in both performance and speed.
2. An edge-refinement network, ERnet, is proposed that performs finer restoration on top of HDSRnet-light, correcting some edge deviations and achieving better performance.
3. According to the characteristics of different loss functions, a two-stage training strategy is proposed; exploiting the respective characteristics of the two norm losses (given as formula images in the source) accelerates convergence and improves performance.
Drawings
FIG. 1 is an algorithm flow diagram;
FIG. 2 is a network framework diagram;
FIG. 3 is an attention module frame diagram;
FIG. 4 is a comparison of various methods: (a) the original high-resolution depth map; (b) the result of bicubic-interpolation upsampling; (c) the result of the deep-learning method DepthSR; (d) the result of the deep-learning method PMBANet; (e) the result of the deep-learning method DEIN; (f) the super-resolution result of the proposed method.
Detailed Description
In order to overcome the defects of the prior art, the invention provides a depth map super-resolution method that achieves high running efficiency and excellent performance through a guided attention mechanism:
1) Analyzing the characteristics of the low-resolution depth map and establishing the degradation model of the captured image:
D_lr = H D_hr + n_0
where D_lr is the low-resolution depth image captured by the depth camera, D_hr is the high-resolution depth image, H is the downsampling matrix, and n_0 is noise.
2) Building the data set
The high-resolution data of the training set are obtained by randomly cropping 58 RGB-D images from the MPI Sintel depth data set and 34 RGB-D images from the Middlebury data set; each RGB-D pair is cropped to a fixed size, and data diversity is increased by randomly rotating each pair by an angle θ ∈ {0°, 90°, 180°, 270°}. The low-resolution data are obtained from the high-resolution depth maps by bicubic-interpolation downsampling.
The test set was generated in the same way using six images in the Middlebury 2005 dataset.
3) Designing a network framework
31) The whole network structure consists of a lightweight network HDSRnet-light and an optional edge-refinement network ERnet. The low-resolution depth map, the bicubic-interpolation-upsampled depth map, and the high-resolution color map are input into HDSRnet-light to obtain a high-resolution depth map. The high-resolution depth map and the high-resolution color map are then input into ERnet for further reconstruction, finally yielding a fine high-resolution output. The whole network is HDSRnet;
32) The lightweight network HDSRnet-light consists of 3 branches: a main branch, a structure side branch, and a detail side branch. The two side branches are built from upsampling and downsampling modules, and the main branch is built from upsampling modules, downsampling modules, a guided attention module, and attention modules. The low-resolution depth map is input into the main branch, the bicubic-interpolation-upsampled depth map into the structure side branch, and the high-resolution color map into the detail side branch.
321) Downsampling module structure: convolutional layer 1 - pooling layer 1 - convolutional layer 2. Every convolutional layer in the module is followed by a ReLU activation function.
322) Upsampling module structure: deconvolution layer 1 - convolutional layer 1. Every convolutional layer and deconvolution layer in the module is followed by a ReLU activation function.
323) Guided attention module structure: convolutional layer 1 - convolutional layer 2 - multiplication layer - convolutional layer 3. The main-branch features enter convolutional layer 1 and the side-branch features enter convolutional layer 2; the outputs of convolutional layers 1 and 2 are multiplied element-wise in the multiplication layer; the output of convolutional layer 2 also enters convolutional layer 3, and the output of convolutional layer 3 is added to the output of the multiplication layer. Every convolutional layer is followed by a ReLU activation function.
324) Attention module structure: guided attention module 1 - guided attention module 2. The main-branch features and one set of side-branch features enter guided attention module 1; the output of guided attention module 1, together with the other set of side-branch features, enters guided attention module 2.
325) Main branch structure: convolutional layer 1 - fusion layer 1 - guided attention module 1 - upsampling module 1 - attention module 1 - upsampling module 2 - attention module 2 - convolutional layer 2. Fusion layer 1 concatenates the outputs of main-branch convolutional layer 1 and structure-side-branch downsampling module 2 along the channel dimension; guided attention module 1 fuses the output of detail-side-branch downsampling module 2 with that of main-branch fusion layer 1; attention module 1 fuses the outputs of structure-side-branch downsampling module 1, detail-side-branch downsampling module 1, and main-branch upsampling module 1; attention module 2 fuses the outputs of structure-side-branch convolutional layer 1, detail-side-branch convolutional layer 1, and main-branch upsampling module 2. The final output is the sum of the output of main-branch convolutional layer 2 and the bicubic-interpolation-upsampled input;
326) Detail side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2. This branch processes the information of the high-resolution color image; the outputs of downsampling module 1 and downsampling module 2 are sent to the attention modules of the main branch for feature fusion. Convolutional layer 1 is followed by a ReLU activation function;
327) Structure side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2. This branch processes the bicubic-interpolation-upsampled depth map; the outputs of downsampling module 1 and downsampling module 2 are sent to the attention modules of the main branch for feature fusion. Convolutional layer 1 is followed by a ReLU activation function.
33) The edge-refinement network ERnet consists of two branches: a depth main branch and a detail side branch. The output of HDSRnet-light is input to the depth main branch, and the high-resolution color image is input to the detail side branch. Both branches are composed of convolutional layers and realize fine reconstruction of the edges.
331) Detail side branch structure: convolutional layers C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9. This branch processes the high-resolution color image information; all convolutional layers are connected in series and each is followed by a ReLU activation function.
332) Depth main branch structure: convolutional layer D1 - fusion layer F1 - D2 - F2 - D3 - F3 - D4 - F4 - D5 - F5 - D6 - F6 - D7 - F7 - D8 - F8 - D9 - F9 - D10. This branch processes the depth map information and fuses in the color-branch information; all convolutional layers are connected in series, and fusion layer Fi concatenates, along the channel dimension, the output of depth-branch convolutional layer Di with the output of color-branch convolutional layer Ci.
4) Setting the network learning rate and the weight of each loss-function term, and training the convolutional neural network with the deep-learning framework PyTorch until the loss converges, producing a trained model;
41) after determining the network structure, inputting training data into the network;
42) In the first stage of network training, the initial learning rate of the HDSRnet-light network is set to 0.0001 and is reduced to 0.1 times its value after each epoch; HDSRnet-light is trained with a norm-based loss function (the norm's formula appears only as an image in the source);
43) In the second stage of network training, after HDSRnet-light has been trained for 30 epochs, ERnet is appended to HDSRnet-light and the cascade is trained with a second norm-based loss function (likewise given as a formula image in the source);
44) Training proceeds: HDSRnet-light provides the mapping from the low-resolution depth map to the high-resolution depth map, and ERnet provides the mapping from the high-resolution depth map to the fine high-resolution depth map.
5) The RGB-D pairs of the test set are input into the network, finally obtaining the corresponding fine high-resolution depth images.
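A test-time usage sketch matching step 5), assuming module interfaces as in the sketches above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def super_resolve(hdsr_light, ernet, d_lr, rgb_hr, scale=8):
    """Run the trained cascade on one test pair.

    d_lr: (N, 1, h, w) low-resolution depth; rgb_hr: (N, 3, H, W) color guide.
    """
    d_bic = F.interpolate(d_lr, scale_factor=scale,
                          mode="bicubic", align_corners=False)
    coarse = hdsr_light(d_lr, d_bic, rgb_hr)  # HDSRnet-light output
    return ernet(coarse, rgb_hr)              # refined high-resolution depth
```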
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (1)

1. A depth map super-resolution method based on a guided attention mechanism, comprising the following steps:
1) analyzing the characteristics of the low-resolution depth map and establishing the degradation model of the captured image:
D_lr = H D_hr + n_0
where D_lr is the low-resolution depth image captured by the depth camera, D_hr is the high-resolution depth image, H is the downsampling matrix, and n_0 is noise;
2) building the data set
the high-resolution data of the training set are obtained by randomly cropping a number of RGB-D images from the MPI Sintel depth data set and a number of RGB-D images from the Middlebury data set; each RGB-D pair is cropped to a fixed size, and data diversity is increased by randomly rotating each pair by an angle θ ∈ {0°, 90°, 180°, 270°}; the low-resolution data are obtained from the high-resolution depth maps by bicubic-interpolation downsampling;
3) designing a network framework
31) the whole network structure consists of a lightweight network HDSRnet-light and an optional edge-refinement network ERnet; the low-resolution depth map, the bicubic-interpolation-upsampled depth map, and the high-resolution color map are input into HDSRnet-light to obtain a high-resolution depth map; the high-resolution depth map and the high-resolution color map are input into ERnet for further reconstruction, finally yielding a fine high-resolution output, the whole network being HDSRnet;
32) the lightweight network HDSRnet-light consists of 3 branches: a main branch, a structure side branch, and a detail side branch; the two side branches are built from upsampling and downsampling modules, and the main branch is built from upsampling modules, downsampling modules, a guided attention module, and attention modules; the low-resolution depth map is input into the main branch, the bicubic-interpolation-upsampled depth map into the structure side branch, and the high-resolution color map into the detail side branch;
321) downsampling module structure: convolutional layer 1 - pooling layer 1 - convolutional layer 2, every convolutional layer in the module being followed by a ReLU activation function;
322) upsampling module structure: deconvolution layer 1 - convolutional layer 1, every convolutional layer and deconvolution layer in the module being followed by a ReLU activation function;
323) guided attention module structure: convolutional layer 1 - convolutional layer 2 - multiplication layer - convolutional layer 3, wherein the main-branch features enter convolutional layer 1 and the side-branch features enter convolutional layer 2, the outputs of convolutional layers 1 and 2 are multiplied element-wise in the multiplication layer, the output of convolutional layer 2 also enters convolutional layer 3, the output of convolutional layer 3 is added to the output of the multiplication layer, and every convolutional layer is followed by a ReLU activation function;
324) attention module structure: guided attention module 1 - guided attention module 2, wherein the main-branch features and one set of side-branch features enter guided attention module 1, and the output of guided attention module 1, together with the other set of side-branch features, enters guided attention module 2;
325) main branch structure: convolutional layer 1 - fusion layer 1 - guided attention module 1 - upsampling module 1 - attention module 1 - upsampling module 2 - attention module 2 - convolutional layer 2, wherein fusion layer 1 concatenates the outputs of main-branch convolutional layer 1 and structure-side-branch downsampling module 2 along the channel dimension, guided attention module 1 fuses the output of detail-side-branch downsampling module 2 with that of main-branch fusion layer 1, attention module 1 fuses the outputs of structure-side-branch downsampling module 1, detail-side-branch downsampling module 1, and main-branch upsampling module 1, and attention module 2 fuses the outputs of structure-side-branch convolutional layer 1, detail-side-branch convolutional layer 1, and main-branch upsampling module 2; the final output is the sum of the output of main-branch convolutional layer 2 and the bicubic-interpolation-upsampled input;
326) detail side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2, this branch processing the information of the high-resolution color image, with the outputs of downsampling module 1 and downsampling module 2 all sent to the attention modules of the main branch for feature fusion; convolutional layer 1 is followed by a ReLU activation function;
327) structure side branch structure: convolutional layer 1 - downsampling module 1 - downsampling module 2, this branch processing the bicubic-interpolation-upsampled depth map, with the outputs of downsampling module 1 and downsampling module 2 sent to the attention modules of the main branch for feature fusion; convolutional layer 1 is followed by a ReLU activation function;
33) the edge-refinement network ERnet consists of two branches: a depth main branch and a detail side branch; the output of HDSRnet-light is input to the depth main branch, and the high-resolution color image is input to the detail side branch; both branches are composed of convolutional layers and realize fine reconstruction of the edges;
331) detail side branch structure: convolutional layers C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9, this branch processing the high-resolution color image information; all convolutional layers are connected in series and each is followed by a ReLU activation function;
332) depth main branch structure: convolutional layer D1 - fusion layer F1 - D2 - F2 - D3 - F3 - D4 - F4 - D5 - F5 - D6 - F6 - D7 - F7 - D8 - F8 - D9 - F9 - D10, this branch processing the depth map information and fusing in the color-branch information; all convolutional layers are connected in series, and fusion layer Fi concatenates, along the channel dimension, the output of depth-branch convolutional layer Di with the output of color-branch convolutional layer Ci;
4) setting the network learning rate and the weight of each loss-function term, and training the convolutional neural network with the deep-learning framework PyTorch until the loss converges, producing a trained model;
41) after determining the network structure, inputting training data into the network;
42) in the first stage of network training, the initial learning rate of the HDSRnet-light network is set to 0.0001 and is reduced to 0.1 times its value after each epoch; HDSRnet-light is trained with a norm-based loss function (the norm's formula appears only as an image in the source);
43) in the second stage of network training, after HDSRnet-light has been trained for 30 epochs, ERnet is appended to HDSRnet-light and the cascade is trained with a second norm-based loss function (likewise given as a formula image in the source);
44) training proceeds: HDSRnet-light provides the mapping from the low-resolution depth map to the high-resolution depth map, and ERnet provides the mapping from the high-resolution depth map to the fine high-resolution depth map.
CN202111400288.8A 2021-11-19 2021-11-19 Depth map super-resolution method based on attention guide mechanism Pending CN114170079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111400288.8A CN114170079A (en) 2021-11-19 2021-11-19 Depth map super-resolution method based on attention guide mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111400288.8A CN114170079A (en) 2021-11-19 2021-11-19 Depth map super-resolution method based on attention guide mechanism

Publications (1)

Publication Number Publication Date
CN114170079A (en) 2022-03-11

Family

ID=80480419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111400288.8A Pending CN114170079A (en) 2021-11-19 2021-11-19 Depth map super-resolution method based on attention guide mechanism

Country Status (1)

Country Link
CN (1) CN114170079A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998683A (en) * 2022-06-01 2022-09-02 北京理工大学 Attention mechanism-based ToF multipath interference removing method
CN114998683B (en) * 2022-06-01 2024-05-31 北京理工大学 Attention mechanism-based ToF multipath interference removal method

Similar Documents

Publication Publication Date Title
CN108765296B (en) Image super-resolution reconstruction method based on recursive residual attention network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN111275618A (en) Depth map super-resolution reconstruction network construction method based on double-branch perception
CN111179167B (en) Image super-resolution method based on multi-stage attention enhancement network
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN110930342B (en) Depth map super-resolution reconstruction network construction method based on color map guidance
CN112435191B (en) Low-illumination image enhancement method based on fusion of multiple neural network structures
CN112365403B (en) Video super-resolution recovery method based on deep learning and adjacent frames
CN105678728A (en) High-efficiency super-resolution imaging device and method with regional management
CN116071243A (en) Infrared image super-resolution reconstruction method based on edge enhancement
CN112419191B (en) Image motion blur removing method based on convolution neural network
CN111654621B (en) Dual-focus camera continuous digital zooming method based on convolutional neural network model
CN113409190B (en) Video super-resolution method based on multi-frame grouping and feedback network
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN115100039B (en) Lightweight image super-resolution reconstruction method based on deep learning
Zhang et al. Deformable and residual convolutional network for image super-resolution
CN113538243A (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN114170079A (en) Depth map super-resolution method based on attention guide mechanism
Wang et al. DDistill-SR: Reparameterized dynamic distillation network for lightweight image super-resolution
Gong et al. Learning deep resonant prior for hyperspectral image super-resolution
CN113610707A (en) Video super-resolution method based on time attention and cyclic feedback network
CN111861870B (en) End-to-end parallel generator network construction method for image translation
CN113362239A (en) Deep learning image restoration method based on feature interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination