CN116188273A - Uncertainty-oriented bimodal separable image super-resolution method - Google Patents

Uncertainty-oriented bimodal separable image super-resolution method

Info

Publication number
CN116188273A
CN116188273A (application CN202310261226.6A)
Authority
CN
China
Prior art keywords
image
depth
map
resolution
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310261226.6A
Other languages
Chinese (zh)
Inventor
Zhang Haopeng (张浩鹏)
Han Zhexin (韩喆鑫)
Jiang Zhiguo (姜志国)
Xie Fengying (谢凤英)
Zhao Danpei (赵丹培)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310261226.6A priority Critical patent/CN116188273A/en
Publication of CN116188273A publication Critical patent/CN116188273A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a dual-mode separable image super-resolution method based on uncertainty guidance, which comprises the following steps: obtaining a low-resolution depth image and a corresponding color image; performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image; obtaining, according to the sampling image, the randomness feature of the low-resolution depth image; inputting the low-resolution depth image into a depth encoder to obtain the deep features of the low-resolution depth image; obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness feature and the deep features; inputting the color image and the sampling image into a depth detail estimation network and outputting a second feature map and a corresponding second uncertainty estimation map; performing feature enhancement processing on the first feature map and the second feature map; and carrying out 3×3 convolution processing on the feature-enhanced first feature map and second feature map to obtain a super-resolution depth map. By the method, a high-quality super-resolution depth map can be obtained.

Description

Uncertainty-oriented bimodal separable image super-resolution method
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a dual-mode separable image super-resolution method based on uncertainty guidance.
Background
Depth information provides key information about a scene and is widely used in computer vision fields such as three-dimensional reconstruction, object detection and instance segmentation. However, due to technical limitations it is difficult to obtain high-quality depth maps, which limits the development of many more demanding visual tasks. Depth super-resolution techniques can address this problem well, and their implementation cost is low. Therefore, how to effectively improve the quality of a low-resolution depth image using depth super-resolution is an important research topic.
Compared with color images, depth images often contain abrupt discontinuities caused by occlusion or by differences in the positions of the actual objects. In addition, depth maps often suffer from large artifact regions due to technical limitations or external interference. Super-resolution methods that focus on reconstructing color-image detail are therefore often unsuitable for depth image reconstruction. Reconstructing the depth map from a single modality is very difficult, whereas color images typically have clearer textures that can effectively guide the reconstruction of high-quality depth maps. Previously, there have been three ways to fuse depth information and color information: feature fusion at the input stage, at the reconstruction stage and at the output stage. When fusion is performed at the input stage, the color map cannot effectively guide the depth map reconstruction; when fusion is performed at the reconstruction stage, the information of the two modalities becomes highly coupled; fusion at the output stage ensures effective guidance between the features while keeping the information separable.
When super-resolution reconstruction is performed on a depth map containing discontinuous regions, traditional algorithms such as bicubic interpolation cannot distinguish the discontinuous regions during reconstruction and are strongly affected by noise, so the reconstruction quality is poor. Moreover, traditional interpolation algorithms are designed purely from the positions of pixels in the image, which often fails to reflect the true content of the image, so the reliability of the result is poor.
Most deep-learning-based depth map super-resolution reconstruction methods make single-point predictions and have difficulty distinguishing holes from correct depth values. In addition, most existing color-guided, deep-learning-based depth super-resolution techniques perform information fusion in the reconstruction stage; these methods depend excessively on color images aligned with the depth images, and in practice the two modalities often cannot be separated, which does not suit real production requirements. Moreover, existing depth super-resolution reconstruction methods only consider the reconstruction performance of the model itself and neglect the study of how interpretable the reconstruction results are.
Therefore, how to achieve modality separation while improving the quality of low-resolution depth images has become a key problem in current research.
Disclosure of Invention
In view of the above problems, the present invention provides a dual-mode separable image super-resolution method based on uncertainty guidance, which at least solves some of the above technical problems, and by which a high-quality super-resolution depth map can be obtained.
The embodiment of the invention provides a dual-mode separable image super-resolution method based on uncertainty guiding, which comprises the following steps:
obtaining a low-resolution depth image and a color image corresponding to the low-resolution depth image;
performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image;
according to the sampling image, obtaining random features of the low-resolution depth image;
inputting the low-resolution depth image into a depth encoder, and obtaining deep features of the low-resolution depth image through a plurality of residual modules;
obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness features and the deep features;
inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map;
performing feature enhancement processing on the first feature map according to the first uncertainty estimation map;
performing feature enhancement processing on the second feature map according to the second uncertainty estimation map;
and carrying out 3×3 convolution processing on the first feature map and the second feature map after feature enhancement to obtain a super-resolution depth map.
Further, the size of the sampled image is consistent with the size of the super-resolution depth map.
Further, the obtaining random features of the low-resolution depth image according to the sampling image specifically includes:
modulating the sampled image into a prior mean and a prior variance of a Gaussian distribution by a plurality of 3×3 convolutions; and processing the prior mean and the prior variance through reparameterized sampling to obtain the randomness feature of the low-resolution depth image.
Further, the obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness feature and the deep feature specifically includes:
concatenating the randomness feature and the deep feature;
fusing the features after series connection by adopting 1×1 convolution to obtain a first fused feature;
enhancing the first fused feature through a plurality of residual modules to obtain a first feature map;
processing the first feature map by adopting 3×3 convolution to obtain a first depth map corresponding to the first feature map;
and processing the first feature map by adopting 3×3 convolution to obtain a first uncertainty estimation map corresponding to the first feature map.
Further, the inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map specifically includes:
the color image and the sampling image are connected in series and then used as the input of a depth detail estimation network;
in the depth detail estimation network, fusing the features after series connection by adopting 1×1 convolution to obtain a second fused feature;
enhancing the second fused feature through a plurality of residual modules to obtain a second feature map;
processing the second feature map by adopting 3×3 convolution to obtain a second depth map corresponding to the second feature map;
and processing the second feature map by adopting 3×3 convolution to obtain a second uncertainty estimation map corresponding to the second feature map.
Further, the method further comprises the following steps:
processing the color image by using a Laplace filter to obtain a depth texture region;
and performing feature emphasis processing on the depth texture region through a texture loss function.
Further, the texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the second depth map; y represents the high-resolution depth map; and t represents the depth texture region to be enhanced.
Further, the method further comprises the following steps:
training and optimizing the super-resolution depth map through a loss function;
the loss function is expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map. The high-resolution depth map y is the reference image, i.e. the ground truth; training is supervised, so labels are available during training.
Further, the method further comprises the following steps:
concatenating the sampled image with the final super-resolution depth map;
modulating the concatenated images into a posterior mean value and a posterior variance of Gaussian distribution through a plurality of 3×3 convolutions;
and constraining the prior mean value, the prior variance, the posterior mean value and the posterior variance through KL divergence.
Compared with the prior art, the dual-mode separable image super-resolution method based on uncertainty guidance provided by the invention has the following beneficial effects:
1. The depth super-resolution architecture provided by the invention can use the guiding modality as an input for feature extraction, and also supports inference without the guiding modality, so that a high-quality super-resolution depth map is obtained in either case.
2. The invention provides a cross-task learning scheme that encourages the depth detail estimation network to learn the discontinuities of the depth map and uses uncertainty to guide the fusion network in fusing the super-resolution depth results.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic flow chart of a bimodal separable image super-resolution method based on uncertainty guidance according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of feature enhancement of uncertainty guidance provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram showing comparison of visual results on an NYUv2 dataset according to an embodiment of the present invention.
Fig. 4 is a schematic diagram showing a comparison of visual results on the RGBDD dataset according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the visualized uncertainty maps generated during the learning of the conditional variational auto-encoder network and the depth detail estimation network according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, an embodiment of the present invention provides a dual-mode separable image super-resolution method based on uncertainty guidance. The method provided by the invention is described in detail below in terms of the conditional variational auto-encoder network (CVAENet), the depth detail estimation network together with the uncertainty-guided fusion module, and the training optimization.
1. Conditional variational auto-encoder network (CVAENet)
The conditional variational auto-encoder network (CVAENet) includes a probability network, a depth encoder and a prediction network; the specific operations are as follows:
A low-resolution depth image and the color image corresponding to the low-resolution depth image are obtained. Bicubic interpolation up-sampling is performed on the low-resolution depth image to obtain a sampled image; the size of the sampled image is consistent with the size of the super-resolution depth map ŷ.
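For illustration only, a minimal PyTorch sketch of this pre-processing step is given below; the function name, tensor shapes and the ×4 scale factor are assumptions of this sketch and are not taken from the patent.

```python
import torch
import torch.nn.functional as F

def bicubic_upsample(depth_lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Up-sample a low-resolution depth map (B, 1, h, w) by `scale` with
    bicubic interpolation so that it matches the super-resolution size."""
    return F.interpolate(depth_lr, scale_factor=scale,
                         mode="bicubic", align_corners=False)

# Example: x4 up-sampling of a 64x64 depth map to 256x256.
depth_lr = torch.rand(1, 1, 64, 64)
depth_up = bicubic_upsample(depth_lr, scale=4)
print(depth_up.shape)  # torch.Size([1, 1, 256, 256])
```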
The sampled image is taken as the input of the probability network. In the probability network, the input sampled image is modulated into the prior mean and the prior variance of a Gaussian distribution through a plurality of 3×3 convolutions (4 in the embodiment of the invention); the prior mean and the prior variance are then processed through reparameterized sampling (Sample) to obtain the randomness feature of the low-resolution depth image, expressed by the formula:
z = μ + σ · ε,  ε ~ N(0, I)
wherein μ represents the prior mean; σ represents the prior variance; the latent variable z carries the randomness feature of the image; σ ∈ R^(64×h×w), and h and w represent the height and width of the sampled image, respectively.
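A minimal sketch of the probability (prior) network and the reparameterized sampling described above is shown below, assuming PyTorch; the four 3×3 convolutions and the 64-channel latent follow the text, while the activation functions and the log-variance parameterization are assumptions.

```python
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """Probability (prior) network sketch: four 3x3 convolutions map the
    bicubic-up-sampled depth map to the prior mean and log-variance of a
    Gaussian; the randomness feature z is then drawn by reparameterization."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, padding=1),   # last conv emits mean and log-variance
        )

    def forward(self, depth_up: torch.Tensor) -> torch.Tensor:
        mu, log_var = self.body(depth_up).chunk(2, dim=1)
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mu + sigma * eps                    # z = mu + sigma * eps (reparameterization)
```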
Because the deep features of the low-resolution image play a very important role in modeling the super-resolution depth image, a depth encoder is designed in the embodiment of the invention to extract the deep features of the low-resolution depth image. Specifically, the low-resolution depth image is input into the depth encoder and the deep features are obtained through a plurality of residual modules (16 in the embodiment of the invention); by stacking the 16 residual modules, progressively more complex features can be extracted from the image.
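As a sketch of what such an encoder could look like (assuming PyTorch; the internal design of the residual module is not specified in the patent, so the two-convolution block below is an assumption):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Assumed residual module: two 3x3 convolutions with an identity skip."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class DepthEncoder(nn.Module):
    """Depth encoder sketch: a stem convolution followed by 16 stacked
    residual modules extracting deep features from the low-resolution depth image."""
    def __init__(self, ch: int = 64, n_blocks: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, depth_lr):
        return self.blocks(self.stem(depth_lr))
```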
A first feature map and a corresponding first uncertainty estimation map are obtained according to the randomness feature and the deep feature. The randomness feature and the deep feature are concatenated along the feature dimension and used as the input of the prediction network; the purpose of the concatenation is to fuse the two different types of features for subsequent processing. In the prediction network, the concatenated features are fused by a 1×1 convolution so that the randomness feature and the deep feature have the same dimension, giving a first fused feature. Because the fused feature still cannot accurately guide the generation of the super-resolution depth map, a plurality of residual modules (3 in the embodiment of the invention) are adopted in the prediction network to enhance the first fused feature, giving a first feature map F_d. The first feature map is processed by a 3×3 convolution to obtain the corresponding first depth map y_d; at the same time, one additional 3×3 convolution is applied to the first feature map to obtain the corresponding first uncertainty estimation map u_d.
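The following sketch of the prediction network reuses the ResBlock class from the encoder sketch above; the channel widths, and the assumption that the randomness feature and the deep feature share the same spatial size at this point, are illustrative only.

```python
import torch
import torch.nn as nn

class PredictionNet(nn.Module):
    """Prediction-network sketch: concatenate z and the deep feature, fuse with
    a 1x1 convolution, enhance with 3 residual modules, then two 3x3 heads
    produce the first depth map y_d and the first uncertainty map u_d."""
    def __init__(self, ch: int = 64, n_blocks: int = 3):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 1)               # 1x1 fusion of the concatenation
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.to_depth = nn.Conv2d(ch, 1, 3, padding=1)     # head for y_d
        self.to_uncert = nn.Conv2d(ch, 1, 3, padding=1)    # head for u_d

    def forward(self, z, deep_feat):
        f_d = self.blocks(self.fuse(torch.cat([z, deep_feat], dim=1)))
        return f_d, self.to_depth(f_d), self.to_uncert(f_d)   # F_d, y_d, u_d
```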
2. Depth detail estimation network and uncertainty-guided fusion module
Because the color image can provide a great deal of detail for the depth map, a depth detail estimation network is designed in the embodiment of the invention to fuse the color image features with the depth image features and guide the reconstruction of the depth map. Specifically, a high-resolution color image of the same scene as the original low-resolution depth image is acquired. The concatenation of the color image and the sampled image obtained after bicubic interpolation up-sampling is taken as the input of the depth detail estimation network. In the depth detail estimation network, a 1×1 convolution performs information fusion on the concatenated image features to obtain a second fused feature; a plurality of residual modules (5 in the embodiment of the invention) enhance the second fused feature and convert it into a second feature map F_te rich in texture estimation. Meanwhile, a 3×3 convolution is additionally added to the last layer of the depth detail estimation network and applied to the second feature map to obtain the corresponding second depth map y_te; another 3×3 convolution processes the second feature map to obtain the corresponding second uncertainty estimation map u_te.
The conditional variational auto-encoder network (CVAENet) and the depth detail estimation network carry different types of depth information, so an uncertainty-guided fusion module is designed to fuse the two reconstruction results; at the same time the module provides additional regularization for the two original networks, helping them reconstruct super-resolution depth images more effectively. On the other hand, RGB images and depth images have serious inconsistencies in texture detail: when occlusion occurs or the distance between adjacent objects differs greatly, the color image has smoother texture transitions than the depth image, so effectively fusing the information of the two images alleviates the problem of RGB-D structural inconsistency. Referring to fig. 2, the module operates as follows: the first feature map F_d and the first uncertainty estimation map u_d output by the prediction network, together with the second feature map F_te and the second uncertainty estimation map u_te output by the depth detail estimation network, all serve as inputs to the uncertainty-guided fusion module. In the uncertainty-guided fusion module, feature enhancement processing is performed on the first feature map according to the first uncertainty estimation map, and on the second feature map according to the second uncertainty estimation map; a 3×3 convolution is then applied to the enhanced first and second feature maps to obtain the super-resolution depth map ŷ.
The feature map after feature enhancement processing based on uncertainty is expressed as:
F′ = F · (1 + SoftMax(Conv_3×3(u)))
wherein F′ represents the feature map after feature enhancement; F represents the first feature map or the second feature map to be enhanced; and u represents the corresponding first or second uncertainty estimation map.
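A minimal sketch of this enhancement step is given below (assuming PyTorch; lifting the single-channel uncertainty map to the feature width with the 3×3 convolution and applying the softmax over spatial positions are assumptions, since the patent does not state the channel count or the softmax axis).

```python
import torch
import torch.nn as nn

class UncertaintyGuidedEnhance(nn.Module):
    """Feature enhancement F' = F * (1 + SoftMax(Conv_3x3(u))): a 3x3
    convolution maps the uncertainty map to the feature width and a softmax
    over spatial positions turns it into attention-like weights."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(1, ch, 3, padding=1)

    def forward(self, feat: torch.Tensor, uncert: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        attn = self.conv(uncert).view(b, c, h * w)
        attn = torch.softmax(attn, dim=-1).view(b, c, h, w)   # softmax over h*w positions
        return feat * (1 + attn)
```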
3. Training optimization
1. In order to give the output of the depth detail estimation network a richer texture representation, in the embodiment of the invention a Laplacian filter is applied to the color image in the depth detail estimation network to obtain a depth texture region, and feature emphasis processing is performed on the depth texture region through a texture loss function. The texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the super-resolution result corresponding to the second feature map (the second depth map); y represents the high-resolution depth map; and t represents the depth texture region to be emphasized.
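A sketch of the texture loss and of obtaining the texture region with a Laplacian filter is shown below (assuming PyTorch; the grayscale averaging, the 3×3 Laplacian kernel, the binarization threshold and the mean reduction of the L1 norm are assumptions of this sketch).

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 Laplacian kernel used to expose depth texture (edge) regions.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def texture_region(color: torch.Tensor, thresh: float = 0.1) -> torch.Tensor:
    """Laplacian-filter the (grayscale-averaged) color image and binarize the
    response to obtain the texture region t."""
    gray = color.mean(dim=1, keepdim=True)
    edges = F.conv2d(gray, LAPLACIAN.to(color.device), padding=1).abs()
    return (edges > thresh).float()

def texture_loss(y_te: torch.Tensor, y: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L = ||(y_te - y) + (y_te - y) * t||_1: the residual inside the texture
    region is counted twice, emphasizing texture details."""
    diff = y_te - y
    return (diff + diff * t).abs().mean()
```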
2. In the embodiment of the invention, the super-resolution depth map is trained and optimized through a loss function;
the loss function is expressed as:
L = ||ŷ - y||_1 / σ² + log σ²
To avoid the instability caused by division by zero, we design u = log σ² in the network, so the loss can be further expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map. The high-resolution depth map y is the reference image, i.e. the ground truth; training is supervised, so labels are available during training.
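Under the form above (with u = log σ²), the loss can be sketched as follows; the mean reduction over pixels is an assumption.

```python
import torch

def uncertainty_loss(sr: torch.Tensor, hr: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """exp(-u) down-weights the residual where the predicted uncertainty is
    high, while the +u term prevents the network from predicting unbounded
    uncertainty everywhere."""
    return (torch.exp(-u) * (sr - hr).abs() + u).mean()
```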
Thus, the loss is instantiated in each module: in the conditional variational auto-encoder network it is computed on the first depth map y_d and the first uncertainty estimation map u_d; in the depth detail estimation network it is computed on the second depth map y_te and the second uncertainty estimation map u_te; and in the uncertainty fusion network it is computed on the final super-resolution depth map ŷ.
it is further noted here that the probability network in the conditional variable encoder network (CVAENet) includes an a priori network and a posterior network; the parts relating to the a priori network have been described above; in the embodiment of the invention, the structure of the posterior network is consistent with that of the prior network, except that the inputs of the prior network and the posterior network are different; in the embodiment of the invention, the serial connection between the sampled image and the super-resolution depth map trained by the loss function is used as the input of a posterior network, and in the posterior network, the serial connection image is modulated into a prior mean value and a prior variance of Gaussian distribution by a plurality of 3X 3 convolutions (4 in the embodiment of the invention); KL divergence is additionally introduced to constrain the gap between a priori and a posterior networks.
The effectiveness of a dual modality separable image super resolution method based on uncertainty steering provided by the present invention is described next by way of a specific embodiment.
The present invention uses two datasets to verify the effectiveness of the method: the NYUv2 dataset and the real-world RGBDD dataset. The evaluation metric is RMSE; the lower the RMSE, the higher the reconstruction quality of the image. The comparison of the method provided by the embodiments of the present invention with other methods on these two datasets can be seen in Tables 1 and 2 below:
TABLE 1 Comparison of the method provided by the embodiments of the invention with other methods on the NYUv2 dataset (the table is reproduced as an image in the original publication)
TABLE 2 Comparison of the method provided by the embodiments of the invention with other methods on the RGBDD dataset (the table is reproduced as an image in the original publication)
As shown in Tables 1 and 2 above, the embodiment of the present invention is compared with the most advanced depth image super-resolution methods on the two test sets at three different scale factors (×4, ×8, ×16). The method provided by the embodiments of the present invention clearly achieves the best performance across multiple datasets and multiple scale factors, which demonstrates its strong advantage over the prior art. At a scale factor of ×16 the performance of the proposed model is clearly superior to the second-best model on both datasets, showing that the method can recover richer results from the information available in smaller images. In addition, the performance of the model is tested on the real-world dataset to verify its generalization ability; compared with other methods, the proposed method obtains better performance on the real-world dataset. The visual comparison results can be seen in Figs. 3 and 4.
In addition, the embodiment of the present invention also visualizes the uncertainty results produced during the learning of the two backbone networks; the results are shown in Fig. 5. As can be seen from the figure, the depth detail estimation network pays more attention to texture details in the reconstructed image and provides a good supplement to the reconstruction results of the conditional variational auto-encoder network. Experimental results on multiple datasets demonstrate the excellent performance and generality of the method provided by the embodiments of the present invention; compared with the comparison methods, it achieves competitive results.
The embodiment of the invention provides a dual-mode separable image super-resolution method based on uncertainty guidance. A depth reconstruction network based on a conditional variational auto-encoder is designed first; unlike common depth super-resolution reconstruction methods, the method introduces label information into the network through the mutual constraint of the prior and the posterior, so that a more reliable super-resolution result is obtained. Furthermore, this structure can easily enhance the depth encoder network and improve its performance. To improve the fusion result, the invention introduces uncertainty learning into the depth super-resolution task for the first time, so that the color information provides a more effective supplement for the fusion network and a more reliable fusion result is achieved. The uncertainty-guided dual-mode separable image super-resolution method can realize modality separation while improving the quality of the low-resolution depth image; the modality separation is embodied in the training and inference processes, in that the color map and the depth reconstruction process can be separated, i.e. whether the color map is used can be selected during training, and whether it is used can likewise be selected during testing.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A bi-modal separable image super-resolution method based on uncertainty steering, comprising:
obtaining a low-resolution depth image and a color image corresponding to the low-resolution depth image;
performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image;
according to the sampling image, obtaining random features of the low-resolution depth image;
inputting the low-resolution depth image into a depth encoder, and obtaining deep features of the low-resolution depth image through a plurality of residual modules;
obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness features and the deep features;
inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map;
performing feature enhancement processing on the first feature map according to the first uncertainty estimation map;
performing feature enhancement processing on the second feature map according to the second uncertainty estimation map;
and carrying out 3×3 convolution processing on the first feature map and the second feature map after feature enhancement to obtain a super-resolution depth map.
2. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, wherein the sampled image is of a size consistent with the super-resolution depth map.
3. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, wherein said obtaining random features of said low resolution depth image from said sampled image comprises:
modulating the sampled image into a prior mean and a prior variance of a Gaussian distribution by a plurality of 3×3 convolutions; and processing the prior mean and the prior variance through reparameterized sampling to obtain the randomness feature of the low-resolution depth image.
4. The method for super-resolution of a bimodal separable image based on uncertainty guiding according to claim 1, wherein said obtaining a first feature map and a corresponding first uncertainty estimation map based on said randomness features and said deep features specifically comprises:
concatenating the randomness feature and the deep feature;
fusing the features after series connection by adopting 1×1 convolution to obtain a first fused feature;
enhancing the first fused feature through a plurality of residual modules to obtain a first feature map;
processing the first feature map by adopting 3×3 convolution to obtain a first depth map corresponding to the first feature map;
and processing the first feature map by adopting 3×3 convolution to obtain a first uncertainty estimation map corresponding to the first feature map.
5. The method for super-resolution of a bimodal separable image based on uncertainty guiding according to claim 1, wherein the steps of inputting the color image and the sampled image into a depth detail estimation network and outputting a second feature map and a corresponding second uncertainty estimation map specifically comprise:
the color image and the sampling image are connected in series and then used as the input of a depth detail estimation network;
in the depth detail estimation network, fusing the features after series connection by adopting 1×1 convolution to obtain a second fused feature;
enhancing the second fused feature through a plurality of residual modules to obtain a second feature map;
processing the second feature map by adopting 3×3 convolution to obtain a second depth map corresponding to the second feature map;
and processing the second feature map by adopting 3×3 convolution to obtain a second uncertainty estimation map corresponding to the second feature map.
6. A bi-modal separable image super-resolution method as recited in claim 5, further comprising:
processing the color image by using a Laplace filter to obtain a depth texture region;
and performing feature emphasis processing on the depth texture region through a texture loss function.
7. A bi-modal separable image super-resolution method as recited in claim 6, wherein the texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the second depth map; y represents the high-resolution depth map; and t represents the depth texture region to be reinforced.
8. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, further comprising:
training and optimizing the super-resolution depth map through a loss function;
the loss function is expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map.
9. A bi-modal separable image super-resolution method as recited in claim 8, further comprising:
concatenating the sampled image with the final super-resolution depth map;
modulating the concatenated images into a posterior mean value and a posterior variance of Gaussian distribution through a plurality of 3×3 convolutions;
and constraining the prior mean value, the prior variance, the posterior mean value and the posterior variance through KL divergence.
CN202310261226.6A 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method Pending CN116188273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310261226.6A CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310261226.6A CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Publications (1)

Publication Number Publication Date
CN116188273A true CN116188273A (en) 2023-05-30

Family

ID=86432850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310261226.6A Pending CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Country Status (1)

Country Link
CN (1) CN116188273A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649343A (en) * 2024-01-29 2024-03-05 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder
CN117649343B (en) * 2024-01-29 2024-04-12 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder

Similar Documents

Publication Publication Date Title
Bashir et al. A comprehensive review of deep learning-based single image super-resolution
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN109035146B (en) Low-quality image super-resolution method based on deep learning
CN111242238B (en) RGB-D image saliency target acquisition method
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
Luo et al. Lattice network for lightweight image restoration
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN112381716B (en) Image enhancement method based on generation type countermeasure network
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN108171654B (en) Chinese character image super-resolution reconstruction method with interference suppression
Yue et al. IENet: Internal and external patch matching ConvNet for web image guided denoising
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN116188273A (en) Uncertainty-oriented bimodal separable image super-resolution method
Yu et al. Semantic-driven face hallucination based on residual network
Yao et al. Depth super-resolution by texture-depth transformer
CN116563100A (en) Blind super-resolution reconstruction method based on kernel guided network
Zuo et al. MIG-net: Multi-scale network alternatively guided by intensity and gradient features for depth map super-resolution
CN110288529B (en) Single image super-resolution reconstruction method based on recursive local synthesis network
Shen et al. Mutual information-driven triple interaction network for efficient image dehazing
Chen et al. Dynamic degradation intensity estimation for adaptive blind super-resolution: A novel approach and benchmark dataset
CN114283058A (en) Image super-resolution reconstruction method based on countermeasure network and maximum mutual information optimization
Xu et al. Depth map super-resolution via joint local gradient and nonlocal structural regularizations
CN115661340B (en) Three-dimensional point cloud up-sampling method and system based on source information fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination