CN113920014A - Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method - Google Patents

Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Info

Publication number
CN113920014A
CN113920014A (application number CN202111240795.XA)
Authority
CN
China
Prior art keywords
resolution
depth
combined
color
depth map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240795.XA
Other languages
Chinese (zh)
Inventor
左一帆
王皓
姜文晖
夏雪
方玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Finance and Economics filed Critical Jiangxi University of Finance and Economics
Priority to CN202111240795.XA priority Critical patent/CN113920014A/en
Publication of CN113920014A publication Critical patent/CN113920014A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/60Rotation of a whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention provides a neural-network-based combined trilateral filter depth map super-resolution reconstruction method. The method comprises: obtaining a low-resolution depth map and a high-resolution color map, constructing a neural network model in a progressive up-sampling manner, and extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively; fusing the color domain features and the depth domain features in a content-aware manner with a combined trilateral filtering module; realizing, based on variants of the combined trilateral filtering module, bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updating the depth domain features of the high-resolution depth map with the fusion result; and applying the updated depth domain features to reconstruct a high-quality depth map. The method has good robustness and superiority, reduces reconstruction errors, and improves the quality of the obtained high-resolution depth map.

Description

Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method
Technical Field
The invention relates to the technical field of image processing, and in particular to a neural-network-based combined trilateral filter method for depth map super-resolution reconstruction.
Background
With the wide application of RGB-D data, consisting of RGB color images and depth images, in fields such as virtual reality, three-dimensional reconstruction and SLAM, depth maps can now be acquired in real time by consumer-grade depth sensors. However, such sensors are costly to produce, and the original depth maps they capture have low resolution and strong noise interference, which cannot meet application requirements. Therefore, reconstruction and enhancement of low-quality depth maps has become an essential part of the depth map application pipeline.
In practical applications, a low-resolution original depth map often needs to be upsampled at a large scale, for example eight times or more. However, at a large up-sampling scale, single-depth-map super-resolution reconstruction tends to distort or lose details and local structures in the reconstruction result. To improve algorithm performance, researchers have proposed extracting information from a high-resolution color image or intensity image to guide the depth map super-resolution reconstruction task. In existing studies, the joint bilateral filter and its variants, such as the joint trilateral filter, use an exponential function to compute the convolution kernel weight of each neighborhood pixel, based on the assumption that color map boundaries and the corresponding depth map boundaries are consistent. Because the convolution kernel weights change with the pixel position, the color guidance information can be fused adaptively and high-quality depth map boundaries can be reconstructed.
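To make the classical exponential weighting idea concrete, the following is a minimal Python sketch of joint bilateral upsampling of a depth map guided by a color image. It is illustrative only: the parameter names (sigma_s, sigma_r), the nearest-neighbor pre-upsampling step, the assumption that the color image is normalized to [0, 1] and that the high-resolution size is an exact multiple of the low-resolution size are all choices made for this sketch, not details of the patent's specific filter.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, k=5, sigma_s=2.0, sigma_r=0.1):
    """Upsample a low-resolution depth map guided by a high-resolution color image.

    Weights follow the classical exponential form:
    exp(-spatial_dist / 2*sigma_s^2) * exp(-color_diff / 2*sigma_r^2).
    """
    H, W = color_hr.shape[:2]
    # Coarse nearest-neighbour upsampling of the low-resolution depth map.
    depth_up = np.repeat(np.repeat(depth_lr, scale, axis=0), scale, axis=1)[:H, :W]
    depth_up = depth_up.astype(np.float64)
    pad = k // 2
    depth_pad = np.pad(depth_up, pad, mode="edge")
    color_pad = np.pad(color_hr, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Spatial (domain) kernel, shared by every pixel position.
    ys, xs = np.mgrid[-pad:pad + 1, -pad:pad + 1]
    w_spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    out = np.zeros_like(depth_up)
    for y in range(H):
        for x in range(W):
            d_patch = depth_pad[y:y + k, x:x + k]
            c_patch = color_pad[y:y + k, x:x + k]
            # Range kernel from colour differences (guidance domain).
            diff = c_patch - color_hr[y, x]
            w_range = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma_r**2))
            w = w_spatial * w_range
            out[y, x] = np.sum(w * d_patch) / (np.sum(w) + 1e-8)
    return out
```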
Traditional filtering-based depth map super-resolution reconstruction methods were developed from the classical filtering theory of digital image processing, and the depth value of each pixel is computed independently according to a local smoothness prior. Based on the assumption that color boundaries and depth boundaries are consistent, researchers have explicitly designed various predefined functions to compute convolution kernel weights within a local window. Representative approaches include: (1) taking the low-resolution depth map and the corresponding high-resolution color map as the target domain and the guidance domain, and introducing a nonlinear joint bilateral filter; (2) computing convolution kernel weights based on a geodesic distance that combines image coordinates and color, which performs better at edge preservation; (3) in view of situations where the above assumption does not hold, jointly using depth-map cues such as gradient and spatial information to reduce texture-copy artifacts; (4) establishing a linear relationship between the color gradient and the depth gradient within a local image block; (5) selecting the optimal depth candidate value according to the minimum loss value to refine the depth map; (6) determining the optimal depth value from the maximum of a joint histogram optimized under the L1 norm. Although filter-based approaches can adaptively adjust the color guidance information at each pixel location, shallow models based on predefined convolution kernels cannot describe the fine-grained correlations between color images and the corresponding depth images.
In recent years, owing to its strong model expression capability, the deep neural network has made remarkable progress in the color-guided depth map super-resolution reconstruction task and now occupies a dominant position. Under the guidance of color domain features, a deep neural network implicitly learns the mapping function from the low-resolution depth map to the high-resolution depth map through supervised learning. Compared with traditional methods, the performance of deep-neural-network-based methods is greatly improved. Researchers have proposed a variety of implementations, including: (1) fusing multi-scale color guidance features to progressively up-sample the low-resolution depth domain features; (2) obtaining multi-scale convolution kernels by learning; (3) adopting local and global residual learning techniques to improve training robustness; (4) learning the fidelity and regularization priors of a Markov random field through a deep neural network; (5) providing a special module to adaptively decompose the high-frequency components in the RGB image to guide depth map reconstruction; (6) mitigating texture-copy artifacts through an affine transform layer. However, the deep neural network methods described above always fuse the color guidance features through channel concatenation in the test phase, and the convolution kernel weights are shared by all locations in each channel and are independent of the input.
In order to adaptively adjust the color guidance information during testing, researchers have proposed various implementations, including: (1) a progressive multi-branch aggregation network that introduces a channel attention mechanism to fuse the concatenated cross-domain features along the channel dimension; (2) a deep neural network designed to simulate the traditional combined trilateral filter, in which two sub-networks extract color and depth domain features respectively and fusion is realized through channel concatenation; (3) generating the attention of a conventional convolution kernel with a function predefined on its features.
In the latest deep-neural-network-based methods, feature reuse is usually achieved by channel feature concatenation. However, in contrast to the conventional combined trilateral filter, channel feature concatenation has no content-dependent parameter readjustment during the test phase, and the effect of the color guidance feature cannot be adaptively adjusted at every pixel position. In addition, since mainstream deep neural network methods only consider forward depth domain feature reconstruction from low resolution to high resolution, a large up-sampling scale easily causes accumulation of multi-scale reconstruction errors, which poses a challenge to depth map reconstruction.
Disclosure of Invention
Aiming at the defects of the prior art, and drawing on the advantages of the traditional trilateral filter and the deep neural network, the invention provides a neural-network-based combined trilateral filter depth map super-resolution reconstruction method.
In order to achieve the purpose, the invention is realized by the following technical scheme:
A neural-network-based combined trilateral filter depth map super-resolution reconstruction method comprises: obtaining a low-resolution depth map and a high-resolution color map, constructing a neural network model in a progressive up-sampling manner, and extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively; fusing the color domain features and the depth domain features in a content-aware manner with a combined trilateral filtering module; realizing, based on variants of the combined trilateral filtering module, bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updating the depth domain features of the high-resolution depth map with the fusion result; and applying the updated depth domain features to reconstruct a high-quality depth map.
Further, acquiring the low-resolution depth map and the high-resolution color map comprises: dividing a training set and a test set based on a synthetic data set and a real data set, acquiring image pairs consisting of the high-resolution color map, the high-resolution depth map and the corresponding low-resolution depth map, extracting sub-images according to a preset size, and performing random enhancement to obtain training data.
Further, randomly enhancing the data of the sub-image comprises: the data of the sub-image is rotated by 90 degrees, rotated by 180 degrees, vertically flipped, or horizontally flipped.
Further, after the data of the sub-images are randomly enhanced, the randomly enhanced data are normalized.
Further, constructing the neural network model in a progressive up-sampling manner and respectively extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map comprises the following steps: performing two-fold super-resolution processing on the input features in each stage; and constructing a color guidance branch and a depth reconstruction branch respectively, wherein the color guidance branch extracts color domain features from the high-resolution color map and then performs progressive down-sampling to generate multi-scale color domain features, and the depth reconstruction branch extracts depth domain features from the low-resolution depth map and then performs progressive up-sampling to reconstruct multi-scale depth features.
Further, the color guide branch comprises a shallow feature extraction module and a multi-scale guide feature generation module consisting of a plurality of guide feature extraction units with specific scales.
Further, fusing the color domain features and the depth domain features in a content-aware manner with the combined trilateral filtering module comprises: designing a convolution kernel generation sub-network inside the combined trilateral filtering module and obtaining the convolution kernel by learning, wherein the sub-network only learns a per-domain convolution kernel generation function, analogous to the exponential function in the traditional combined trilateral filter; and combining the result with the initial depth domain features to obtain the color-feature-guided depth domain features.
Further, the bi-directional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution includes: inputting the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages into a bidirectional depth feature fusion unit, wherein the bidirectional depth feature fusion unit comprises two uplink combined trilateral filtering modules and a downlink combined trilateral filtering module; and the cross-scale depth domain features are fused in a bidirectional mode by using the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and the depth domain features of the current scale are updated by using the uplink combined trilateral filtering module.
Further, variants of the combined trilateral filtering module generate the convolution kernel directly at the resolution of the target feature domain.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a neural-networking-based combined trilateral filter used for a depth map super-resolution reconstruction method, which adopts a variant of a combined trilateral filter module to realize bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updates the depth domain features of a high-resolution depth map by using a fusion result.
Drawings
FIG. 1 is a flowchart of an embodiment of the neural-network-based combined trilateral filter depth map super-resolution reconstruction method of the present invention.
FIG. 2 is a network topology diagram of an embodiment of the method.
FIG. 3 is a topology diagram of the bidirectional depth feature fusion unit used in an embodiment of the method.
FIG. 4 is a topology diagram of the combined trilateral filtering module and its variants used in an embodiment of the method.
FIG. 5 shows depth map reconstruction results of an embodiment of the method on synthetic data and real data.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the neural-network-based combined trilateral filter depth map super-resolution reconstruction method first executes step S1: obtain a low-resolution depth map and a high-resolution color map, construct a neural network model in a progressive up-sampling manner, and then extract the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively. Specifically, this embodiment divides a training set and a test set based on a synthetic data set and a real data set, obtains image pairs consisting of a high-resolution color map, a high-resolution depth map and the corresponding low-resolution depth map, extracts sub-images according to a predetermined size, and performs random enhancement to obtain training data.
The neural network model is built in a progressive up-sampling manner, and the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map are extracted respectively. Each stage performs two-fold super-resolution processing on the input features. A color guidance branch and a depth reconstruction branch are constructed respectively: the color guidance branch extracts color domain features from the high-resolution color map and then performs progressive down-sampling to generate multi-scale color domain features, and the depth reconstruction branch extracts depth domain features from the low-resolution depth map and then performs progressive up-sampling to reconstruct multi-scale depth features. Preferably, the color guidance branch comprises a shallow feature extraction module and a multi-scale guidance feature generation module consisting of a plurality of guidance feature extraction units at specific scales.
In addition, the data of the sub-images can be randomly enhanced by rotating them by 90 degrees, rotating them by 180 degrees, flipping them vertically, or flipping them horizontally. After the random enhancement, the data are normalized; that is, all data need to be normalized before being fed into the convolution model.
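As a concrete illustration of this data preparation step, the following is a minimal Python (PyTorch) sketch, assuming the depth and color sub-images are tensors of shape (C, H, W); the choice of one random operation per sample and the normalization constants are assumptions made for this sketch rather than details taken from the patent.

```python
import random
import torch

def augment_pair(depth: torch.Tensor, color: torch.Tensor):
    """Apply one of the four random enhancements (rotate 90, rotate 180,
    vertical flip, horizontal flip) consistently to a depth/color sub-image pair."""
    op = random.choice(["rot90", "rot180", "vflip", "hflip"])
    if op == "rot90":
        depth = torch.rot90(depth, 1, dims=(-2, -1))
        color = torch.rot90(color, 1, dims=(-2, -1))
    elif op == "rot180":
        depth = torch.rot90(depth, 2, dims=(-2, -1))
        color = torch.rot90(color, 2, dims=(-2, -1))
    elif op == "vflip":
        depth = torch.flip(depth, dims=(-2,))
        color = torch.flip(color, dims=(-2,))
    else:  # hflip
        depth = torch.flip(depth, dims=(-1,))
        color = torch.flip(color, dims=(-1,))
    return depth, color

def normalize_pair(depth: torch.Tensor, color: torch.Tensor, max_depth: float = 255.0):
    """Normalize both inputs to [0, 1] before they are fed into the convolution model."""
    return depth / max_depth, color / 255.0
```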
Then, step S2 is executed: the combined trilateral filtering module fuses the color domain features and the depth domain features in a content-aware manner. Specifically, a convolution kernel generation sub-network is first designed inside the combined trilateral filtering module, and the convolution kernel is obtained by learning; the sub-network only learns a per-domain convolution kernel generation function, analogous to the exponential function in the traditional combined trilateral filter, instead of independently learning each individual weight of the convolution kernel. Finally, the color-feature-guided depth domain features are obtained by combining the result with the initial depth domain features.
Next, step S3 is performed to achieve bidirectional fusion of the cross-scale depth domain features, from low resolution to high resolution and from high resolution to low resolution, based on variants of the combined trilateral filtering module. Specifically, the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages are input into a bidirectional depth feature fusion unit, which comprises two uplink combined trilateral filtering modules and one downlink combined trilateral filtering module. The cross-scale depth domain features are first fused bidirectionally with an uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and the depth domain features at the current scale are then updated with the other uplink combined trilateral filtering module. Moreover, the variants of the combined trilateral filtering module generate the convolution kernel directly at the resolution of the target feature domain.
Preferably, the variants of the combined trilateral filtering module in this embodiment directly generate the convolution kernel function at the resolution of the target feature domain; depending on whether the target domain of the depth domain features is the high-resolution or the low-resolution feature domain among the inputs, the corresponding variants are referred to as the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, respectively.
Then, step S4 is performed to update the depth domain features of the high-resolution depth map with the fusion result. Finally, step S5 is executed to apply the updated depth domain features to reconstruct a high-quality depth map. Specifically, for a model with a reconstruction scale of 2, the super-resolution reconstructed depth map can be obtained directly from the result output by the up-sampling stage; for a model with a reconstruction scale larger than 2, the corresponding depth domain features are first updated by the multi-scale bidirectional depth feature fusion module, and the super-resolution reconstructed depth map is then obtained.
The topology of the neural-network-based combined trilateral filter depth map super-resolution reconstruction method of this embodiment is described below with reference to FIG. 2. This embodiment takes a low-resolution depth map D_LR and a high-resolution color map I_HR as input and, with the high-resolution depth map D_HR as supervision, learns a generation function O(θ | D_LR, I_HR) with parameters θ to predict the corresponding high-resolution depth map D_SR. Preferably, the topology used in this embodiment comprises a color guidance branch and a depth reconstruction branch, which respectively extract multi-scale color guidance features and refine the depth domain features from coarse to fine.
Specifically, the color guide branch includes a shallow feature extraction module 20 and a multi-scale guide feature generation module 15 composed of a plurality of guide feature extraction units 30, 40, 50 of a specific scale.
The high-resolution color map 2 is input into the shallow feature extraction module 20, which comprises two convolution layers 21 and 22, both activated by PReLU. Color domain features are extracted from the high-resolution color map 2 through the convolution layers 21 and 22, and the extracted color domain features are used as the input of the multi-scale guidance feature generation module 15. For example, the color domain features of the high-resolution color map 2 can be extracted with a known shallow feature extraction method.
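As an illustration of such a shallow feature extraction module, the following PyTorch sketch stacks two PReLU-activated convolution layers; the channel width (64) and the 3x3 kernel size are assumptions made for illustration and are not specified in the text.

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Two PReLU-activated convolution layers (e.g. layers 21/22 or 11/12)."""
    def __init__(self, in_channels: int, feat_channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.PReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

# Usage: the colour branch takes a 3-channel image, the depth branch a 1-channel map.
color_shallow = ShallowFeatureExtractor(in_channels=3)
depth_shallow = ShallowFeatureExtractor(in_channels=1)
```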
Each of the guidance feature extraction units 30, 40, 50 has the same structure; taking unit 30 as an example, it includes a convolution layer 31 and a pooling layer 32. For an overall magnification factor l, i.e. both the width and the height of the image are enlarged l times, this embodiment progressively up-samples the depth domain features by a factor of two at each stage using transposed convolution layers, and the color guidance features are progressively down-sampled to match the resolution of the corresponding depth domain features. The calculation of a guidance feature extraction unit at a specific scale is as follows:
F_G^m = f_G^m(F_G^(m-1))    (Formula 1)
In Formula 1, f_G^m and F_G^m respectively denote the function and the output of the m-th guidance feature extraction unit.
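A minimal PyTorch sketch of one such guidance feature extraction unit is given below; the use of average pooling for the pooling layer 32, the channel width and the kernel size are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GuideFeatureExtractionUnit(nn.Module):
    """One scale of the multi-scale guidance feature generation module:
    a convolution layer (31) followed by a pooling layer (32) that
    down-samples the colour guidance features by a factor of two."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.AvgPool2d(kernel_size=2)  # halves H and W

    def forward(self, f_prev: torch.Tensor) -> torch.Tensor:
        # F_G^m = f_G^m(F_G^(m-1))  (Formula 1)
        return self.pool(self.conv(f_prev))
```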
The depth reconstruction branch comprises a shallow feature extraction module 10, a multi-scale guidance feature fusion module 16, a cross-scale depth feature fusion unit 90 and a depth super-resolution reconstruction module 94. The low-resolution depth map 1 is input to the shallow feature extraction module 10, which comprises two convolution layers 11 and 12, both activated by PReLU. Depth domain features are extracted from the low-resolution depth map 1 through the convolution layers 11 and 12, and the extracted depth domain features are used as the input of the multi-scale guidance feature fusion module 16. For example, the depth domain features of the low-resolution depth map 1 can be extracted with a known shallow feature extraction method.
The multi-scale guidance feature fusion module 16 comprises a plurality of guidance feature fusion units, for example units 60, 70 and 80, each of which has the same structure. Taking unit 60 as an example, it comprises a residual dense block (RDB) 61, a joint trilateral filtering fusion module (JTF Fusion) 62, a convolution layer (Conv) 63 and a PReLU-activated transposed convolution layer (Trans Conv) 64.
Each guidance feature fusion unit receives a feature output by the corresponding guidance feature extraction unit. After the shallow feature extraction module 20 extracts the color domain features, they are passed through the cascade of guidance feature extraction units 30, 40 and 50, each of which further processes the features output by the previous stage. Therefore, the shallow color domain feature, i.e. the input F_G^0 of the first guidance feature extraction unit, first needs to be extracted by the two convolution layers 21 and 22 of the shallow feature extraction module 20.
The combined trilateral filtering fusion module 62 is composed of n combined trilateral filtering modules, and the calculation process of the combined trilateral filtering fusion is as follows:
F_D^m = Tconv_m(Conv_m(JTF_m(RDB_m(F_D^(m-1)), F_G^m)))    (Formula 2)
In Formula 2, Tconv_m, Conv_m, JTF_m, RDB_m and F_D^m respectively denote the PReLU-activated transposed convolution layer 64, the convolution layer 63, the combined trilateral filtering fusion module 62, the residual dense connection module 61, and the output of the m-th guidance feature fusion unit.
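The composition in Formula 2 can be sketched in PyTorch as below. The residual dense block and the joint trilateral filtering fusion step are passed in as sub-modules, since their internals are described later in the text (see the filtering and kernel-generation sketches further below); the channel width, kernel sizes and up-sampling parameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GuideFeatureFusionUnit(nn.Module):
    """RDB -> JTF fusion -> Conv -> PReLU-activated transposed conv (x2 up-sampling),
    i.e. F_D^m = Tconv_m(Conv_m(JTF_m(RDB_m(F_D^(m-1)), F_G^m)))  (Formula 2)."""
    def __init__(self, rdb: nn.Module, jtf_fusion: nn.Module, channels: int = 64):
        super().__init__()
        self.rdb = rdb            # residual dense block 61
        self.jtf = jtf_fusion     # joint trilateral filtering fusion module 62
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # layer 63
        self.up = nn.Sequential(                                             # layer 64
            nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1),
            nn.PReLU(),
        )

    def forward(self, f_depth: torch.Tensor, f_guide: torch.Tensor) -> torch.Tensor:
        fused = self.jtf(self.rdb(f_depth), f_guide)
        return self.up(self.conv(fused))
```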
The cross-scale feature fusion unit 90 includes two bidirectional depth feature fusion units 91 and 92, and in order to introduce bidirectional fusion of features in the cross-scale depth domain, the output results of two adjacent guidance feature fusion units are input to one bidirectional depth feature fusion unit in the present embodiment, for example, the output results of the guidance feature fusion units 60 and 70 are input to the bidirectional depth feature fusion unit 92, and the output results of the guidance feature fusion units 70 and 80 are input to the bidirectional depth feature fusion unit 91.
The bidirectional depth feature fusion unit performs bidirectional depth feature fusion calculation as follows:
F̂_D^m = f_B^m(F_D^m, F_D^(m-1))    (Formula 3)
In Formula 3, f_B^m denotes the function of the m-th bidirectional depth feature fusion unit, and F̂_D^m is the updated depth domain feature, which serves as the input of the next guidance feature fusion unit. The output of the cross-scale feature fusion unit 90 is input to the depth super-resolution reconstruction module 94, which comprises a convolution layer 93.
In addition, the low-resolution depth map 1 is also fed to a bicubic interpolation (Bicubic) module 95, which produces a coarsely up-sampled depth map D_BIC. The results of the bicubic interpolation module 95 and the depth super-resolution reconstruction module 94 then pass through a convolution layer to generate the high-resolution depth map 3, i.e. the high-resolution depth image D_SR, which is computed as:
D_SR = DRB_DSR(F̂_D) + D_BIC    (Formula 4)
In Formula 4, DRB_DSR denotes the function of the depth super-resolution reconstruction module 94, and D_BIC and D_SR denote the coarsely up-sampled depth map and the predicted high-resolution depth map, respectively.
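Assuming Formula 4 is a residual formulation (the reconstruction module's output, mapped back to a one-channel depth map by a final convolution, added to the bicubically up-sampled depth map), a minimal PyTorch sketch of this last step is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthSRReconstruction(nn.Module):
    """Final reconstruction: D_SR = DRB_DSR(F_D) + D_BIC (Formula 4, residual form assumed)."""
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.to_depth = nn.Conv2d(feat_channels, 1, kernel_size=3, padding=1)  # layer 93

    def forward(self, depth_feat: torch.Tensor, depth_lr: torch.Tensor, scale: int) -> torch.Tensor:
        # Coarse bicubic up-sampling of the low-resolution depth map (module 95).
        d_bic = F.interpolate(depth_lr, scale_factor=scale, mode="bicubic", align_corners=False)
        return self.to_depth(depth_feat) + d_bic
```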
Because the convolution kernel of the conventional combined trilateral filter computes the differences of depth pixel position, depth gradient and color value through an exponential function, this embodiment builds on that advantage and extends the definition domain of the combined trilateral filter from the original pixel domain to the feature domain, as follows:
F̃_D^c(p) = Σ_{q ∈ N_K(p)} δ_(c,q) · γ_(c,q) · F_D^c(q)    (Formula 5)
In Formula 5, N_K(p) denotes a local window of size K × K centered at p in the c-th channel of the feature F_D, δ_(c,q) and γ_(c,q) denote the learned convolution kernel weights of the depth domain features and the color guidance features at position q of channel c, and F̃_D corresponds to the fused depth domain features.
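Formula 5 is a per-position, per-channel weighted sum over a K × K window. Assuming the combined kernel φ = δ ⊙ γ has already been generated with K² weights per feature position (see the kernel-generation sketch further below), one way to apply it in PyTorch is with torch.nn.functional.unfold:

```python
import torch
import torch.nn.functional as F

def apply_joint_trilateral_kernel(feat: torch.Tensor, phi: torch.Tensor, k: int = 3) -> torch.Tensor:
    """feat: (B, C, H, W) depth-domain features.
    phi:  (B, C*k*k, H, W) position-adaptive kernel (delta * gamma).
    Returns the fused features of Formula 5: a weighted sum of phi * feat
    over the k x k window around each position."""
    b, c, h, w = feat.shape
    patches = F.unfold(feat, kernel_size=k, padding=k // 2)   # (B, C*k*k, H*W)
    patches = patches.view(b, c, k * k, h * w)
    weights = phi.view(b, c, k * k, h * w)
    fused = (patches * weights).sum(dim=2)                    # weighted sum per position
    return fused.view(b, c, h, w)
```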
The combined trilateral filtering module 62 learns two convolution kernels δ and γ from the depth domain features and the color domain features respectively, and then multiplies δ and γ element-wise to obtain the final combined trilateral filtering kernel φ. It follows that the elements of δ and γ vary with the feature position, so the role of the color guidance feature can be adaptively adjusted in all feature dimensions.
Although the designs incorporating the combined trilateral filtering modules 62, 72 and 82 significantly improve the model representation capability, directly and independently learning all convolution kernel weights in δ and γ would cause a sharp increase in parameters and make the model prone to overfitting. To solve this problem, this embodiment focuses only on fitting the convolution kernel generation function. Let B, C, H and W respectively denote the sample number, channel number, height and width of the feature; this embodiment designs two lightweight sub-networks with the same structure to realize the mapping from the feature F ∈ R^(B×C×H×W) to the kernels δ, γ ∈ R^(B×CK²×H×W). The final combined trilateral filtering kernel φ thus generates K² weights for each feature position (c, h, w), c ∈ C, h ∈ H, w ∈ W. In a particular implementation, this embodiment fits the convolution kernel generation function with a plurality of convolution layers. To further compress the model size, this embodiment divides the channels into groups, with r channels in a group sharing one convolution kernel, so that the kernel of the final combined trilateral filtering module lies in R^(B×(C/r)K²×H×W).
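A lightweight kernel-generation sub-network of this kind can be sketched as a few convolution layers mapping a feature map of shape (B, C, H, W) to (C/r)·K² kernel weights per position, with r channels in a group sharing one kernel; the depth and width of the layers below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class KernelGenerator(nn.Module):
    """Maps features (B, C, H, W) to per-position kernels (B, (C//r)*K*K, H, W)."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, (channels // r) * k * k, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

# The combined trilateral kernel is the element-wise product of the two sub-networks'
# outputs: delta from the depth-domain features, gamma from the colour guidance features.
def combined_kernel(gen_d: KernelGenerator, gen_g: KernelGenerator,
                    feat_depth: torch.Tensor, feat_guide: torch.Tensor) -> torch.Tensor:
    return gen_d(feat_depth) * gen_g(feat_guide)
```

To use such a grouped kernel with the unfold-based routine above, the (C/r)·K² weights would first be repeated r times along the channel-group dimension so that every channel has its K² weights.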
Furthermore, this embodiment proposes a novel bidirectional depth feature fusion unit to fuse the cross-scale depth domain features in a bidirectional manner, i.e. from low resolution to high resolution and from high resolution to low resolution. Specifically, referring to FIG. 3, the bidirectional depth feature fusion unit 91 of this embodiment consists of two uplink combined trilateral filtering modules 111 and 113 and one downlink combined trilateral filtering module 112, all of which are variants of the combined trilateral filtering module. The cross-scale depth domain features are first fused bidirectionally using the uplink combined trilateral filtering module 111 and the downlink combined trilateral filtering module 112, and the depth domain features at the current scale are then further updated using the uplink combined trilateral filtering module 113.
Since the resolutions of the cross-scale depth domain features differ, the combined trilateral filtering module cannot be used directly; this embodiment therefore proposes variants of the combined trilateral filtering module that generate the convolution kernel directly at the resolution of the target feature domain. According to whether the target domain features are at high or low resolution, the corresponding variants are called the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, respectively. In a specific implementation, the variants differ from the combined trilateral filtering module in that the second convolution layer of the convolution kernel generation sub-network is replaced with a stride-2 transposed convolution layer or a stride-2 convolution layer, respectively.
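The pieces described above can be combined into a sketch of the up/downlink variants and the bidirectional depth feature fusion unit. In the sketch below, the variant is expressed by giving the kernel-generation sub-network a stride-2 second layer (a transposed convolution for the uplink case, a convolution for the downlink case), as described in the text; the exact wiring of the three filtering modules inside the fusion unit is not fully specified, so the forward pass shown is one plausible arrangement, and all layer widths are assumptions.

```python
import torch
import torch.nn as nn

class UpKernelGenerator(nn.Module):
    """Uplink variant: the second layer of the kernel-generation sub-network is a
    stride-2 transposed convolution, so the kernel is produced at the (2x larger)
    target resolution."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.PReLU(),
            nn.ConvTranspose2d(channels, (channels // r) * k * k, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

class DownKernelGenerator(nn.Module):
    """Downlink variant: the second layer is a stride-2 convolution, producing the
    kernel at the (2x smaller) target resolution."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.PReLU(),
            nn.Conv2d(channels, (channels // r) * k * k, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

class BidirectionalDepthFusionUnit(nn.Module):
    """Two uplink JTF modules (111, 113) and one downlink JTF module (112):
    first fuse the cross-scale features in both directions, then update the
    current-scale (high-resolution) depth features with the second uplink module.
    The injected modules are assumed to take (source_features, target_features)."""
    def __init__(self, up_jtf_1: nn.Module, down_jtf: nn.Module, up_jtf_2: nn.Module):
        super().__init__()
        self.up1, self.down, self.up2 = up_jtf_1, down_jtf, up_jtf_2

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        f_low_to_high = self.up1(f_low, f_high)    # low -> high fusion (module 111)
        f_high_to_low = self.down(f_high, f_low)   # high -> low fusion (module 112)
        # Update the high-resolution depth features with the uplink module 113.
        return self.up2(f_high_to_low, f_low_to_high)
```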
Referring to FIG. 4, the combined trilateral filtering module and its variants include a kernel generation sub-network 120; a convolution layer 121, a combined trilateral filter module 122 and a convolution layer 123, as well as convolution layers 131, 132 and 133, are disposed in the sub-network 120. After the guidance domain features are input to the convolution layer 121, their convolution result is obtained through the combined trilateral filter module 122 and the convolution layer 123 in sequence; after the target domain features are input to the convolution layer 131, their convolution result is obtained through the convolution layers 132 and 133 in sequence. The target domain features then pass through the combined trilateral filtering kernel 140, where the combined trilateral filtering convolution is performed with the features from the image block extraction 145, and the combined trilateral filtering result is obtained through the combined trilateral filtering kernel 142, the convolution layer 143 and the convolution layer 144. FIG. 5 shows depth map reconstruction results on synthetic data and real data.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A neural-network-based combined trilateral filter depth map super-resolution reconstruction method, characterized by comprising the following steps:
acquiring a low-resolution depth map and a high-resolution color map, constructing a neural network model by adopting a progressive up-sampling mode, and respectively extracting the depth domain characteristics of the low-resolution depth map and the color domain characteristics of the high-resolution color map;
fusing the color domain features and the depth domain features in a content perception mode by adopting a combined trilateral filtering module;
based on the variation of the combined trilateral filtering module, bidirectional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution is realized, and the depth domain features of the high-resolution depth map are updated by using the fusion result;
and applying the updated depth domain features to reconstruct and form a high-quality depth map.
2. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 1, wherein:
acquiring the low resolution depth map and the high resolution color map comprises: and dividing a training set and a test set based on the synthetic data set and the real data set, acquiring an image pair of the high-resolution color image, the high-resolution depth image and the corresponding low-resolution depth image, extracting sub-images according to a preset size, and performing random enhancement to obtain training data.
3. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 2, wherein:
randomly enhancing the data of the sub-image comprises: and performing 90-degree rotation, 180-degree rotation, vertical turning or horizontal turning on the data of the sub-image.
4. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 3, wherein:
and after the data of the sub-images are randomly enhanced, normalizing the randomly enhanced data.
5. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
adopting a progressive up-sampling mode to construct a neural network model, and respectively extracting the depth domain characteristics of the low-resolution depth map and the color domain characteristics of the high-resolution color map comprises the following steps:
performing double super-resolution processing on input features in one stage;
respectively constructing a color guide branch and a depth reconstruction branch, wherein the color guide branch is used for extracting color domain features of the high-resolution color image and then gradually performing down-sampling to generate multi-scale color domain features; and the depth reconstruction branch is used for extracting the depth domain features of the low-resolution depth map and gradually up-sampling to reconstruct the multi-scale depth features.
6. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 5, wherein:
the color guide branch comprises a shallow layer feature extraction module and a multi-scale guide feature generation module consisting of a plurality of guide feature extraction units with specific scales.
7. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
Fusing the color domain features and the depth domain features in a content-aware manner with the combined trilateral filtering module comprises:
designing a convolution kernel generation sub-network inside a combined trilateral filtering module, and obtaining a convolution kernel in a learning mode, wherein the convolution kernel only learns a corresponding domain convolution kernel generation function similar to an exponential function in a traditional combined trilateral filter; and obtaining the color feature guided depth domain features in combination with the initial depth domain features.
8. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
the bi-directional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution includes:
inputting the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages into a bidirectional depth feature fusion unit, wherein the bidirectional depth feature fusion unit comprises two uplink combined trilateral filtering modules and a downlink combined trilateral filtering module;
and fusing the cross-scale depth domain features in a bidirectional mode by using the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and updating the depth domain features of the current scale by using the uplink combined trilateral filtering module.
9. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 8, wherein:
a variation of the joint trilateration filtering module generates convolution kernels directly at the resolution of the target feature domain.
CN202111240795.XA 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method Pending CN113920014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240795.XA CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240795.XA CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Publications (1)

Publication Number Publication Date
CN113920014A true CN113920014A (en) 2022-01-11

Family

ID=79242665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240795.XA Pending CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Country Status (1)

Country Link
CN (1) CN113920014A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972043A (en) * 2022-08-03 2022-08-30 江西财经大学 Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN114972043B (en) * 2022-08-03 2022-10-25 江西财经大学 Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN116523759A (en) * 2023-07-04 2023-08-01 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN116523759B (en) * 2023-07-04 2023-09-05 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism

Similar Documents

Publication Publication Date Title
CN111062872B (en) Image super-resolution reconstruction method and system based on edge detection
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
CN109389556B (en) Multi-scale cavity convolutional neural network super-resolution reconstruction method and device
CN110427968B (en) Binocular stereo matching method based on detail enhancement
CN112634137B (en) Hyperspectral and panchromatic image fusion method for extracting multiscale spatial spectrum features based on AE
WO2021022929A1 (en) Single-frame image super-resolution reconstruction method
CN109462747B (en) DIBR system cavity filling method based on generation countermeasure network
Cheng et al. Zero-shot image super-resolution with depth guided internal degradation learning
CN113920014A (en) Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method
Zuo et al. Residual dense network for intensity-guided depth map enhancement
CN114972043B (en) Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN113793286B (en) Media image watermark removing method based on multi-order attention neural network
CN113837946B (en) Lightweight image super-resolution reconstruction method based on progressive distillation network
CN112734644A (en) Video super-resolution model and method combining multiple attention with optical flow
CN114663552B (en) Virtual fitting method based on 2D image
CN113077545B (en) Method for reconstructing clothing human body model from image based on graph convolution
CN112529776A (en) Training method of image processing model, image processing method and device
Bastanfard et al. Toward image super-resolution based on local regression and nonlocal means
CN114049420B (en) Model training method, image rendering method, device and electronic equipment
CN112200719B (en) Image processing method, electronic device, and readable storage medium
CN113724134A (en) Aerial image blind super-resolution reconstruction method based on residual distillation network
CN109272450A (en) A kind of image oversubscription method based on convolutional neural networks
CN116342377A (en) Self-adaptive generation method and system for camouflage target image in degraded scene
CN114494022B (en) Model training method, super-resolution reconstruction method, device, equipment and medium
CN115705616A (en) True image style migration method based on structure consistency statistical mapping framework

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination