CN116188273A - Uncertainty-oriented bimodal separable image super-resolution method - Google Patents

Uncertainty-oriented bimodal separable image super-resolution method

Info

Publication number
CN116188273A
CN116188273A (application CN202310261226.6A)
Authority
CN
China
Prior art keywords
image
depth
map
resolution
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310261226.6A
Other languages
Chinese (zh)
Inventor
Zhang Haopeng (张浩鹏)
Han Zhexin (韩喆鑫)
Jiang Zhiguo (姜志国)
Xie Fengying (谢凤英)
Zhao Danpei (赵丹培)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310261226.6A priority Critical patent/CN116188273A/en
Publication of CN116188273A publication Critical patent/CN116188273A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a dual-mode separable image super-resolution method based on uncertainty guidance, which comprises the following steps: obtaining a low-resolution depth image and a corresponding color image; performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image; obtaining, according to the sampling image, the randomness feature of the low-resolution depth image; inputting the low-resolution depth image into a depth encoder to obtain the deep features of the low-resolution depth image; obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness feature and the deep features; inputting the color image and the sampling image into a depth detail estimation network and outputting a second feature map and a corresponding second uncertainty estimation map; performing feature enhancement processing on the first feature map and the second feature map; and carrying out 3×3 convolution processing on the feature-enhanced first feature map and second feature map to obtain a super-resolution depth map. By the method, a high-quality super-resolution depth map can be obtained.

Description

Uncertainty-oriented bimodal separable image super-resolution method
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a dual-mode separable image super-resolution method based on uncertainty guidance.
Background
Depth information provides key information about a scene and is widely used in computer vision fields such as three-dimensional reconstruction, object detection and instance segmentation. However, due to technical limitations it is difficult to obtain high-quality depth maps, which limits the development of many more demanding visual tasks. Depth super-resolution techniques can address this problem well, and their implementation cost is low. Therefore, how to effectively improve the quality of a low-resolution depth image using depth super-resolution is an important research topic.
Compared with color images, depth images often contain abrupt discontinuities caused by occlusion or by differences in the positions of the actual objects. In addition, depth maps often suffer from large artifact regions due to technical limitations or external interference. Super-resolution methods that focus on reconstructing color-image detail are therefore often unsuitable for depth image reconstruction. Reconstructing the depth map from a single modality is very difficult, whereas color images typically have clearer textures that can effectively guide the reconstruction of high-quality depth maps. Previously, there have been three ways to fuse depth information and color information: feature fusion at the input stage, at the reconstruction stage and at the output stage. When fusion is performed at the input stage, the color map cannot effectively guide the depth map reconstruction; when fusion is performed at the reconstruction stage, the information of the two modalities becomes highly coupled; fusion at the output stage ensures effective guidance between the features while keeping the information separable.
When super-resolution reconstruction is performed on a depth map containing discontinuous regions, traditional algorithms such as bicubic interpolation cannot distinguish the discontinuous regions during reconstruction and are strongly affected by noise, so the reconstruction quality is poor. Moreover, traditional interpolation algorithms are designed purely from the positions of pixels in the image, which often fails to reflect the true content of the image, so the reliability of the result is poor.
Most deep-learning-based depth map super-resolution reconstruction methods make single-point predictions and have difficulty distinguishing holes from correct depth values. In addition, most existing color-guided, deep-learning-based depth super-resolution techniques perform information fusion in the reconstruction stage; these methods depend excessively on color images aligned with the depth images, and in practice the two modalities often cannot be separated, which does not suit real production requirements. Moreover, existing depth super-resolution reconstruction methods only consider the reconstruction performance of the model itself and neglect the study of how interpretable the reconstruction results are.
Therefore, how to achieve modality separation while improving the quality of low-resolution depth images has become a key problem in current research.
Disclosure of Invention
In view of the above problems, the present invention provides a dual-mode separable image super-resolution method based on uncertainty guidance, which at least solves some of the above technical problems, and by which a high-quality super-resolution depth map can be obtained.
The embodiment of the invention provides a dual-mode separable image super-resolution method based on uncertainty guiding, which comprises the following steps:
obtaining a low-resolution depth image and a color image corresponding to the low-resolution depth image;
performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image;
according to the sampling image, obtaining random features of the low-resolution depth image;
inputting the low-resolution depth image into a depth encoder, and obtaining deep features of the low-resolution depth image through a plurality of residual modules;
obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness features and the deep features;
inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map;
performing feature enhancement processing on the first feature map according to the first uncertainty estimation map;
performing feature enhancement processing on the second feature map according to the second uncertainty estimation map;
and carrying out 3×3 convolution processing on the first feature map and the second feature map after feature enhancement to obtain a super-resolution depth map.
Further, the size of the sampled image is consistent with the size of the super-resolution depth map.
Further, the obtaining random features of the low-resolution depth image according to the sampling image specifically includes:
modulating the sampled image into a prior mean and a prior variance of a Gaussian distribution by a plurality of 3×3 convolutions; and processing the prior mean and the prior variance through reparameterized sampling to obtain the randomness feature of the low-resolution depth image.
Further, the obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness feature and the deep feature specifically includes:
concatenating the randomness feature and the deep feature;
fusing the features after series connection by adopting 1×1 convolution to obtain a first fused feature;
enhancing the first fused feature through a plurality of residual modules to obtain a first feature map;
processing the first feature map by adopting 3×3 convolution to obtain a first depth map corresponding to the first feature map;
and processing the first feature map by adopting 3×3 convolution to obtain a first uncertainty estimation map corresponding to the first feature map.
Further, the inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map specifically includes:
the color image and the sampling image are connected in series and then used as the input of a depth detail estimation network;
in the depth detail estimation network, fusing the features after series connection by adopting 1×1 convolution to obtain a second fused feature;
enhancing the second fused feature through a plurality of residual modules to obtain a second feature map;
processing the second feature map by adopting 3×3 convolution to obtain a second depth map corresponding to the second feature map;
and processing the second feature map by adopting 3×3 convolution to obtain a second uncertainty estimation map corresponding to the second feature map.
Further, the method further comprises the following steps:
processing the color image by using a Laplace filter to obtain a depth texture region;
and performing feature emphasis processing on the depth texture region through a texture loss function.
Further, the texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the second depth map; y represents the high-resolution depth map; and t represents the depth texture region to be enhanced.
Further, the method further comprises the following steps:
training and optimizing the super-resolution depth map through a loss function;
the loss function is expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map. The high-resolution depth map y is the reference image, i.e. the ground truth; training is supervised, so labels are available during training.
Further, the method further comprises the following steps:
concatenating the sampled image with the final super-resolution depth map;
modulating the concatenated images into a posterior mean value and a posterior variance of Gaussian distribution through a plurality of 3×3 convolutions;
and constraining the prior mean value, the prior variance, the posterior mean value and the posterior variance through KL divergence.
Compared with the prior art, the dual-mode separable image super-resolution method based on uncertainty guidance provided by the invention has the following beneficial effects:
1. The depth super-resolution architecture provided by the invention can use the guiding modality as an input for feature extraction, and also supports inference without the guiding modality, so that a high-quality super-resolution depth map is obtained in either case.
2. The invention provides a cross-task learning scheme that encourages the depth detail estimation network to learn the discontinuities of the depth map and uses uncertainty to guide the fusion network in fusing the super-resolution depth results.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic flow chart of a bimodal separable image super-resolution method based on uncertainty guidance according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of feature enhancement of uncertainty guidance provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram showing comparison of visual results on an NYUv2 dataset according to an embodiment of the present invention.
Fig. 4 is a schematic diagram showing a comparison of visual results on the RGBDD dataset according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the visualized uncertainty maps generated during the learning of the conditional variational auto-encoder network and the depth detail estimation network according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, an embodiment of the present invention provides a dual-mode separable image super-resolution method based on uncertainty guidance. The method provided by the invention is described in detail below in terms of the conditional variational auto-encoder network (CVAENet), the depth detail estimation network together with the uncertainty-guided fusion module, and the training optimization.
1. Conditional variational auto-encoder network (CVAENet)
The conditional variational auto-encoder network (CVAENet) includes a probability network, a depth encoder and a prediction network; the specific operations are as follows:
A low-resolution depth image and the color image corresponding to the low-resolution depth image are obtained. Bicubic interpolation up-sampling is performed on the low-resolution depth image to obtain a sampled image; the size of the sampled image is consistent with the size of the super-resolution depth map ŷ.
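For illustration only, a minimal PyTorch sketch of this pre-processing step is given below; the function name, tensor shapes and the ×4 scale factor are assumptions of this sketch and are not taken from the patent.

```python
import torch
import torch.nn.functional as F

def bicubic_upsample(depth_lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Up-sample a low-resolution depth map (B, 1, h, w) by `scale` with
    bicubic interpolation so that it matches the super-resolution size."""
    return F.interpolate(depth_lr, scale_factor=scale,
                         mode="bicubic", align_corners=False)

# Example: x4 up-sampling of a 64x64 depth map to 256x256.
depth_lr = torch.rand(1, 1, 64, 64)
depth_up = bicubic_upsample(depth_lr, scale=4)
print(depth_up.shape)  # torch.Size([1, 1, 256, 256])
```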
The sampled image is taken as the input of the probability network. In the probability network, the input sampled image is modulated into the prior mean and the prior variance of a Gaussian distribution through a plurality of 3×3 convolutions (4 in the embodiment of the invention); the prior mean and the prior variance are then processed through reparameterized sampling (Sample) to obtain the randomness feature of the low-resolution depth image, expressed by the formula:
z = μ + σ · ε,  ε ~ N(0, I)
wherein μ represents the prior mean; σ represents the prior variance; the latent variable z carries the randomness feature of the image; σ ∈ R^(64×h×w), and h and w represent the height and width of the sampled image, respectively.
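A minimal sketch of the probability (prior) network and the reparameterized sampling described above is shown below, assuming PyTorch; the four 3×3 convolutions and the 64-channel latent follow the text, while the activation functions and the log-variance parameterization are assumptions.

```python
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """Probability (prior) network sketch: four 3x3 convolutions map the
    bicubic-up-sampled depth map to the prior mean and log-variance of a
    Gaussian; the randomness feature z is then drawn by reparameterization."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, padding=1),   # last conv emits mean and log-variance
        )

    def forward(self, depth_up: torch.Tensor) -> torch.Tensor:
        mu, log_var = self.body(depth_up).chunk(2, dim=1)
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mu + sigma * eps                    # z = mu + sigma * eps (reparameterization)
```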
Because the deep features of the low-resolution image play a very important role in modeling the super-resolution depth image, a depth encoder is designed in the embodiment of the invention to extract the deep features of the low-resolution depth image. Specifically, the low-resolution depth image is input into the depth encoder and the deep features are obtained through a plurality of residual modules (16 in the embodiment of the invention); by stacking the 16 residual modules, progressively more complex features can be extracted from the image.
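As a sketch of what such an encoder could look like (assuming PyTorch; the internal design of the residual module is not specified in the patent, so the two-convolution block below is an assumption):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Assumed residual module: two 3x3 convolutions with an identity skip."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class DepthEncoder(nn.Module):
    """Depth encoder sketch: a stem convolution followed by 16 stacked
    residual modules extracting deep features from the low-resolution depth image."""
    def __init__(self, ch: int = 64, n_blocks: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, depth_lr):
        return self.blocks(self.stem(depth_lr))
```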
A first feature map and a corresponding first uncertainty estimation map are obtained according to the randomness feature and the deep feature. The randomness feature and the deep feature are concatenated along the feature dimension and used as the input of the prediction network; the purpose of the concatenation is to fuse the two different types of features for subsequent processing. In the prediction network, the concatenated features are fused by a 1×1 convolution so that the randomness feature and the deep feature have the same dimension, giving a first fused feature. Because the fused feature still cannot accurately guide the generation of the super-resolution depth map, a plurality of residual modules (3 in the embodiment of the invention) are adopted in the prediction network to enhance the first fused feature, giving a first feature map F_d. The first feature map is processed by a 3×3 convolution to obtain the corresponding first depth map y_d; at the same time, one additional 3×3 convolution is applied to the first feature map to obtain the corresponding first uncertainty estimation map u_d.
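The following sketch of the prediction network reuses the ResBlock class from the encoder sketch above; the channel widths, and the assumption that the randomness feature and the deep feature share the same spatial size at this point, are illustrative only.

```python
import torch
import torch.nn as nn

class PredictionNet(nn.Module):
    """Prediction-network sketch: concatenate z and the deep feature, fuse with
    a 1x1 convolution, enhance with 3 residual modules, then two 3x3 heads
    produce the first depth map y_d and the first uncertainty map u_d."""
    def __init__(self, ch: int = 64, n_blocks: int = 3):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 1)               # 1x1 fusion of the concatenation
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.to_depth = nn.Conv2d(ch, 1, 3, padding=1)     # head for y_d
        self.to_uncert = nn.Conv2d(ch, 1, 3, padding=1)    # head for u_d

    def forward(self, z, deep_feat):
        f_d = self.blocks(self.fuse(torch.cat([z, deep_feat], dim=1)))
        return f_d, self.to_depth(f_d), self.to_uncert(f_d)   # F_d, y_d, u_d
```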
2. Depth detail estimation network and uncertainty-guided fusion module
Because the color image can provide a great deal of detail for the depth map, a depth detail estimation network is designed in the embodiment of the invention to fuse the color image features with the depth image features and guide the reconstruction of the depth map. Specifically, a high-resolution color image of the same scene as the original low-resolution depth image is acquired. The concatenation of the color image and the sampled image obtained after bicubic interpolation up-sampling is taken as the input of the depth detail estimation network. In the depth detail estimation network, a 1×1 convolution performs information fusion on the concatenated image features to obtain a second fused feature; a plurality of residual modules (5 in the embodiment of the invention) enhance the second fused feature and convert it into a second feature map F_te rich in texture estimation. Meanwhile, a 3×3 convolution is additionally added to the last layer of the depth detail estimation network and applied to the second feature map to obtain the corresponding second depth map y_te; another 3×3 convolution processes the second feature map to obtain the corresponding second uncertainty estimation map u_te.
The conditional variational auto-encoder network (CVAENet) and the depth detail estimation network carry different types of depth information, so an uncertainty-guided fusion module is designed to fuse the two reconstruction results; at the same time the module provides additional regularization for the two original networks, helping them reconstruct super-resolution depth images more effectively. On the other hand, RGB images and depth images have serious inconsistencies in texture detail: when occlusion occurs or the distance between adjacent objects differs greatly, the color image has smoother texture transitions than the depth image, so effectively fusing the information of the two images alleviates the problem of RGB-D structural inconsistency. Referring to fig. 2, the module operates as follows: the first feature map F_d and the first uncertainty estimation map u_d output by the prediction network, together with the second feature map F_te and the second uncertainty estimation map u_te output by the depth detail estimation network, all serve as inputs to the uncertainty-guided fusion module. In the uncertainty-guided fusion module, feature enhancement processing is performed on the first feature map according to the first uncertainty estimation map, and on the second feature map according to the second uncertainty estimation map; a 3×3 convolution is then applied to the enhanced first and second feature maps to obtain the super-resolution depth map ŷ.
The feature map after feature enhancement processing based on uncertainty is expressed as:
F′ = F · (1 + SoftMax(Conv_3×3(u)))
wherein F′ represents the feature map after feature enhancement; F represents the first feature map or the second feature map to be enhanced; and u represents the corresponding first or second uncertainty estimation map.
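A minimal sketch of this enhancement step is given below (assuming PyTorch; lifting the single-channel uncertainty map to the feature width with the 3×3 convolution and applying the softmax over spatial positions are assumptions, since the patent does not state the channel count or the softmax axis).

```python
import torch
import torch.nn as nn

class UncertaintyGuidedEnhance(nn.Module):
    """Feature enhancement F' = F * (1 + SoftMax(Conv_3x3(u))): a 3x3
    convolution maps the uncertainty map to the feature width and a softmax
    over spatial positions turns it into attention-like weights."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(1, ch, 3, padding=1)

    def forward(self, feat: torch.Tensor, uncert: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        attn = self.conv(uncert).view(b, c, h * w)
        attn = torch.softmax(attn, dim=-1).view(b, c, h, w)   # softmax over h*w positions
        return feat * (1 + attn)
```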
3. Training optimization
1. In order to give the output of the depth detail estimation network a richer texture representation, in the embodiment of the invention a Laplacian filter is applied to the color image in the depth detail estimation network to obtain a depth texture region, and feature emphasis processing is performed on the depth texture region through a texture loss function. The texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the super-resolution result corresponding to the second feature map (the second depth map); y represents the high-resolution depth map; and t represents the depth texture region to be emphasized.
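A sketch of the texture loss and of obtaining the texture region with a Laplacian filter is shown below (assuming PyTorch; the grayscale averaging, the 3×3 Laplacian kernel, the binarization threshold and the mean reduction of the L1 norm are assumptions of this sketch).

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 Laplacian kernel used to expose depth texture (edge) regions.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def texture_region(color: torch.Tensor, thresh: float = 0.1) -> torch.Tensor:
    """Laplacian-filter the (grayscale-averaged) color image and binarize the
    response to obtain the texture region t."""
    gray = color.mean(dim=1, keepdim=True)
    edges = F.conv2d(gray, LAPLACIAN.to(color.device), padding=1).abs()
    return (edges > thresh).float()

def texture_loss(y_te: torch.Tensor, y: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """L = ||(y_te - y) + (y_te - y) * t||_1: the residual inside the texture
    region is counted twice, emphasizing texture details."""
    diff = y_te - y
    return (diff + diff * t).abs().mean()
```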
2. In the embodiment of the invention, the super-resolution depth map is trained and optimized through a loss function;
the loss function is expressed as:
L = ||ŷ - y||_1 / σ² + log σ²
To avoid the instability caused by division by zero, we design u = log σ² in the network, so the loss can be further expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map. The high-resolution depth map y is the reference image, i.e. the ground truth; training is supervised, so labels are available during training.
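Under the form above (with u = log σ²), the loss can be sketched as follows; the mean reduction over pixels is an assumption.

```python
import torch

def uncertainty_loss(sr: torch.Tensor, hr: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """exp(-u) down-weights the residual where the predicted uncertainty is
    high, while the +u term prevents the network from predicting unbounded
    uncertainty everywhere."""
    return (torch.exp(-u) * (sr - hr).abs() + u).mean()
```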
Thus, the loss is instantiated in each module: in the conditional variational auto-encoder network it is computed on the first depth map y_d and the first uncertainty estimation map u_d; in the depth detail estimation network it is computed on the second depth map y_te and the second uncertainty estimation map u_te; and in the uncertainty fusion network it is computed on the final super-resolution depth map ŷ.
it is further noted here that the probability network in the conditional variable encoder network (CVAENet) includes an a priori network and a posterior network; the parts relating to the a priori network have been described above; in the embodiment of the invention, the structure of the posterior network is consistent with that of the prior network, except that the inputs of the prior network and the posterior network are different; in the embodiment of the invention, the serial connection between the sampled image and the super-resolution depth map trained by the loss function is used as the input of a posterior network, and in the posterior network, the serial connection image is modulated into a prior mean value and a prior variance of Gaussian distribution by a plurality of 3X 3 convolutions (4 in the embodiment of the invention); KL divergence is additionally introduced to constrain the gap between a priori and a posterior networks.
The effectiveness of a dual modality separable image super resolution method based on uncertainty steering provided by the present invention is described next by way of a specific embodiment.
The present invention uses two datasets to verify the effectiveness of the method: the NYUv2 dataset and the real-world RGBDD dataset. The evaluation metric is RMSE; the lower the RMSE, the higher the reconstruction quality of the image. The comparison of the method provided by the embodiments of the present invention with other methods on these two datasets can be seen in Tables 1 and 2 below:
TABLE 1 Comparison of the method provided by the embodiments of the invention with other methods on the NYUv2 dataset (the table is reproduced as an image in the original publication)
TABLE 2 Comparison of the method provided by the embodiments of the invention with other methods on the RGBDD dataset (the table is reproduced as an image in the original publication)
As shown in Tables 1 and 2 above, the embodiment of the present invention is compared with the most advanced depth image super-resolution methods on the two test sets at three different scale factors (×4, ×8, ×16). The method provided by the embodiments of the present invention clearly achieves the best performance across multiple datasets and multiple scale factors, which demonstrates its strong advantage over the prior art. At a scale factor of ×16 the performance of the proposed model is clearly superior to the second-best model on both datasets, showing that the method can recover richer results from the information available in smaller images. In addition, the performance of the model is tested on the real-world dataset to verify its generalization ability; compared with other methods, the proposed method obtains better performance on the real-world dataset. The visual comparison results can be seen in Figs. 3 and 4.
In addition, the embodiment of the present invention also visualizes the uncertainty results produced during the learning of the two backbone networks; the results are shown in Fig. 5. As can be seen from the figure, the depth detail estimation network pays more attention to texture details in the reconstructed image and provides a good supplement to the reconstruction results of the conditional variational auto-encoder network. Experimental results on multiple datasets demonstrate the excellent performance and generality of the method provided by the embodiments of the present invention; compared with the comparison methods, it achieves competitive results.
The embodiment of the invention provides a dual-mode separable image super-resolution method based on uncertainty guidance. A depth reconstruction network based on a conditional variational auto-encoder is designed first; unlike common depth super-resolution reconstruction methods, the method introduces label information into the network through the mutual constraint of the prior and the posterior, so that a more reliable super-resolution result is obtained. Furthermore, this structure can easily enhance the depth encoder network and improve its performance. To improve the fusion result, the invention introduces uncertainty learning into the depth super-resolution task for the first time, so that the color information provides a more effective supplement for the fusion network and a more reliable fusion result is achieved. The uncertainty-guided dual-mode separable image super-resolution method can realize modality separation while improving the quality of the low-resolution depth image; the modality separation is embodied in the training and inference processes, in that the color map and the depth reconstruction process can be separated, i.e. whether the color map is used can be selected during training, and whether it is used can likewise be selected during testing.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A bi-modal separable image super-resolution method based on uncertainty steering, comprising:
obtaining a low-resolution depth image and a color image corresponding to the low-resolution depth image;
performing bicubic interpolation up-sampling on the low-resolution depth image to obtain a sampling image;
according to the sampling image, obtaining random features of the low-resolution depth image;
inputting the low-resolution depth image into a depth encoder, and obtaining deep features of the low-resolution depth image through a plurality of residual modules;
obtaining a first feature map and a corresponding first uncertainty estimation map according to the randomness features and the deep features;
inputting the color image and the sampling image into a depth detail estimation network, and outputting a second feature map and a corresponding second uncertainty estimation map;
performing feature enhancement processing on the first feature map according to the first uncertainty estimation map;
performing feature enhancement processing on the second feature map according to the second uncertainty estimation map;
and carrying out 3×3 convolution processing on the first feature map and the second feature map after feature enhancement to obtain a super-resolution depth map.
2. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, wherein the sampled image is of a size consistent with the super-resolution depth map.
3. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, wherein said obtaining random features of said low resolution depth image from said sampled image comprises:
modulating the sampled image into a prior mean and a prior variance of a Gaussian distribution by a plurality of 3×3 convolutions; and processing the prior mean and the prior variance through reparameterized sampling to obtain the randomness feature of the low-resolution depth image.
4. The method for super-resolution of a bimodal separable image based on uncertainty guiding according to claim 1, wherein said obtaining a first feature map and a corresponding first uncertainty estimation map based on said randomness features and said deep features specifically comprises:
concatenating the randomness feature and the deep feature;
fusing the features after series connection by adopting 1×1 convolution to obtain a first fused feature;
enhancing the first fused feature through a plurality of residual modules to obtain a first feature map;
processing the first feature map by adopting 3×3 convolution to obtain a first depth map corresponding to the first feature map;
and processing the first feature map by adopting 3×3 convolution to obtain a first uncertainty estimation map corresponding to the first feature map.
5. The method for super-resolution of a bimodal separable image based on uncertainty guiding according to claim 1, wherein the steps of inputting the color image and the sampled image into a depth detail estimation network and outputting a second feature map and a corresponding second uncertainty estimation map specifically comprise:
the color image and the sampling image are connected in series and then used as the input of a depth detail estimation network;
in the depth detail estimation network, fusing the features after series connection by adopting 1×1 convolution to obtain a second fused feature;
enhancing the second fused feature through a plurality of residual modules to obtain a second feature map;
processing the second feature map by adopting 3×3 convolution to obtain a second depth map corresponding to the second feature map;
and processing the second feature map by adopting 3×3 convolution to obtain a second uncertainty estimation map corresponding to the second feature map.
6. A bi-modal separable image super-resolution method as recited in claim 5, further comprising:
processing the color image by using a Laplace filter to obtain a depth texture region;
and performing feature emphasis processing on the depth texture region through a texture loss function.
7. A bi-modal separable image super-resolution method as recited in claim 6, wherein the texture loss function is expressed as:
L = ||(y_te - y) + (y_te - y) · t||_1
wherein y_te represents the second depth map; y represents the high-resolution depth map; and t represents the depth texture region to be reinforced.
8. A bi-modal separable image super-resolution method based on uncertainty steering as claimed in claim 1, further comprising:
training and optimizing the super-resolution depth map through a loss function;
the loss function is expressed as:
L = exp(-u) · ||ŷ - y||_1 + u
wherein ŷ represents the super-resolution depth map; y represents the high-resolution depth map; and u represents the uncertainty estimation map.
9. A bi-modal separable image super-resolution method as recited in claim 8, further comprising:
concatenating the sampled image with the final super-resolution depth map;
modulating the concatenated images into a posterior mean value and a posterior variance of Gaussian distribution through a plurality of 3×3 convolutions;
and constraining the prior mean value, the prior variance, the posterior mean value and the posterior variance through KL divergence.
CN202310261226.6A 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method Pending CN116188273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310261226.6A CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310261226.6A CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Publications (1)

Publication Number Publication Date
CN116188273A true CN116188273A (en) 2023-05-30

Family

ID=86432850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310261226.6A Pending CN116188273A (en) 2023-03-17 2023-03-17 Uncertainty-oriented bimodal separable image super-resolution method

Country Status (1)

Country Link
CN (1) CN116188273A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649343A (en) * 2024-01-29 2024-03-05 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder
CN117649343B (en) * 2024-01-29 2024-04-12 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder

Similar Documents

Publication Publication Date Title
Bashir et al. A comprehensive review of deep learning-based single image super-resolution
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN109035146B (en) Low-quality image super-resolution method based on deep learning
CN111242238B (en) RGB-D image saliency target acquisition method
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
Luo et al. Lattice network for lightweight image restoration
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN112381716B (en) Image enhancement method based on generation type countermeasure network
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN108171654B (en) Chinese character image super-resolution reconstruction method with interference suppression
Yue et al. IENet: Internal and external patch matching ConvNet for web image guided denoising
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN116188273A (en) Uncertainty-oriented bimodal separable image super-resolution method
Yu et al. Semantic-driven face hallucination based on residual network
Yao et al. Depth super-resolution by texture-depth transformer
CN116563100A (en) Blind super-resolution reconstruction method based on kernel guided network
Zuo et al. MIG-net: Multi-scale network alternatively guided by intensity and gradient features for depth map super-resolution
CN110288529B (en) Single image super-resolution reconstruction method based on recursive local synthesis network
Shen et al. Mutual information-driven triple interaction network for efficient image dehazing
Chen et al. Dynamic degradation intensity estimation for adaptive blind super-resolution: A novel approach and benchmark dataset
CN114283058A (en) Image super-resolution reconstruction method based on countermeasure network and maximum mutual information optimization
Xu et al. Depth map super-resolution via joint local gradient and nonlocal structural regularizations
CN115661340B (en) Three-dimensional point cloud up-sampling method and system based on source information fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination