CN113920014A - Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method - Google Patents

Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Info

Publication number
CN113920014A
CN113920014A (application number CN202111240795.XA)
Authority
CN
China
Prior art keywords
resolution
depth
combined
color
depth map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240795.XA
Other languages
Chinese (zh)
Inventor
左一帆
王皓
姜文晖
夏雪
方玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Finance and Economics filed Critical Jiangxi University of Finance and Economics
Priority to CN202111240795.XA priority Critical patent/CN113920014A/en
Publication of CN113920014A publication Critical patent/CN113920014A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/60Rotation of a whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention provides a neural-network-based combined trilateral filter depth map super-resolution reconstruction method. The method comprises: obtaining a low-resolution depth map and a high-resolution color map, constructing a neural network model in a progressive up-sampling manner, and extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively; fusing the color domain features and the depth domain features in a content-aware manner with a combined trilateral filtering module; realizing, based on variants of the combined trilateral filtering module, bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updating the depth domain features of the high-resolution depth map with the fusion result; and applying the updated depth domain features to reconstruct a high-quality depth map. The method has good robustness and superiority, reduces reconstruction errors, and improves the quality of the obtained high-resolution depth map.

Description

Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method
Technical Field
The invention relates to the technical field of image processing, and in particular to a neural-network-based combined trilateral filter method for depth map super-resolution reconstruction.
Background
With the wide application of RGB-D data, consisting of RGB color images and depth images, in fields such as virtual reality, three-dimensional reconstruction and SLAM, depth maps can now be acquired in real time by consumer-grade depth sensors. However, such sensors are costly to produce, and the original depth maps they capture have low resolution and strong noise interference, which cannot meet application requirements. Therefore, reconstruction and enhancement of low-quality depth maps has become an essential part of the depth map application pipeline.
In practical applications, a low-resolution original depth map often needs to be upsampled at a large scale, for example eight times or more. However, at a large up-sampling scale, single-depth-map super-resolution reconstruction tends to distort or lose details and local structures in the reconstruction result. To improve algorithm performance, researchers have proposed extracting information from a high-resolution color image or intensity image to guide the depth map super-resolution reconstruction task. In existing studies, the joint bilateral filter and its variants, such as the joint trilateral filter, use an exponential function to compute the convolution kernel weight of each neighborhood pixel, based on the assumption that color map boundaries and the corresponding depth map boundaries are consistent. Because the convolution kernel weights change with the pixel position, the color guidance information can be fused adaptively and high-quality depth map boundaries can be reconstructed.
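To make the classical exponential weighting idea concrete, the following is a minimal Python sketch of joint bilateral upsampling of a depth map guided by a color image. It is illustrative only: the parameter names (sigma_s, sigma_r), the nearest-neighbor pre-upsampling step, the assumption that the color image is normalized to [0, 1] and that the high-resolution size is an exact multiple of the low-resolution size are all choices made for this sketch, not details of the patent's specific filter.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, k=5, sigma_s=2.0, sigma_r=0.1):
    """Upsample a low-resolution depth map guided by a high-resolution color image.

    Weights follow the classical exponential form:
    exp(-spatial_dist / 2*sigma_s^2) * exp(-color_diff / 2*sigma_r^2).
    """
    H, W = color_hr.shape[:2]
    # Coarse nearest-neighbour upsampling of the low-resolution depth map.
    depth_up = np.repeat(np.repeat(depth_lr, scale, axis=0), scale, axis=1)[:H, :W]
    depth_up = depth_up.astype(np.float64)
    pad = k // 2
    depth_pad = np.pad(depth_up, pad, mode="edge")
    color_pad = np.pad(color_hr, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Spatial (domain) kernel, shared by every pixel position.
    ys, xs = np.mgrid[-pad:pad + 1, -pad:pad + 1]
    w_spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    out = np.zeros_like(depth_up)
    for y in range(H):
        for x in range(W):
            d_patch = depth_pad[y:y + k, x:x + k]
            c_patch = color_pad[y:y + k, x:x + k]
            # Range kernel from colour differences (guidance domain).
            diff = c_patch - color_hr[y, x]
            w_range = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma_r**2))
            w = w_spatial * w_range
            out[y, x] = np.sum(w * d_patch) / (np.sum(w) + 1e-8)
    return out
```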
Traditional filtering-based depth map super-resolution reconstruction methods were developed from the classical filtering theory of digital image processing, and the depth value of each pixel is computed independently according to a local smoothness prior. Based on the assumption that color boundaries and depth boundaries are consistent, researchers have explicitly designed various predefined functions to compute convolution kernel weights within a local window. Representative approaches include: (1) taking the low-resolution depth map and the corresponding high-resolution color map as the target domain and the guidance domain, and introducing a nonlinear joint bilateral filter; (2) computing convolution kernel weights based on a geodesic distance that combines image coordinates and color, which performs better at edge preservation; (3) in view of situations where the above assumption does not hold, jointly using depth-map cues such as gradient and spatial information to reduce texture-copy artifacts; (4) establishing a linear relationship between the color gradient and the depth gradient within a local image block; (5) selecting the optimal depth candidate value according to the minimum loss value to refine the depth map; (6) determining the optimal depth value from the maximum of a joint histogram optimized under the L1 norm. Although filter-based approaches can adaptively adjust the color guidance information at each pixel location, shallow models based on predefined convolution kernels cannot describe the fine-grained correlations between color images and the corresponding depth images.
In recent years, owing to its strong model expression capability, the deep neural network has made remarkable progress in the color-guided depth map super-resolution reconstruction task and now occupies a dominant position. Under the guidance of color domain features, a deep neural network implicitly learns the mapping function from the low-resolution depth map to the high-resolution depth map through supervised learning. Compared with traditional methods, the performance of deep-neural-network-based methods is greatly improved. Researchers have proposed a variety of implementations, including: (1) fusing multi-scale color guidance features to progressively up-sample the low-resolution depth domain features; (2) obtaining multi-scale convolution kernels by learning; (3) adopting local and global residual learning techniques to improve training robustness; (4) learning the fidelity and regularization priors of a Markov random field through a deep neural network; (5) providing a special module to adaptively decompose the high-frequency components in the RGB image to guide depth map reconstruction; (6) mitigating texture-copy artifacts through an affine transform layer. However, the deep neural network methods described above always fuse the color guidance features through channel concatenation in the test phase, and the convolution kernel weights are shared by all locations in each channel and are independent of the input.
In order to adaptively adjust the color guidance information during testing, researchers have proposed various implementations, including: (1) a progressive multi-branch aggregation network that introduces a channel attention mechanism to fuse the concatenated cross-domain features along the channel dimension; (2) a deep neural network designed to simulate the traditional combined trilateral filter, in which two sub-networks extract color and depth domain features respectively and fusion is realized through channel concatenation; (3) generating the attention of a conventional convolution kernel with a function predefined on its features.
In the latest deep-neural-network-based methods, feature reuse is usually achieved by channel feature concatenation. However, in contrast to the conventional combined trilateral filter, channel feature concatenation has no content-dependent parameter readjustment during the test phase, and the effect of the color guidance feature cannot be adaptively adjusted at every pixel position. In addition, since mainstream deep neural network methods only consider forward depth domain feature reconstruction from low resolution to high resolution, a large up-sampling scale easily causes accumulation of multi-scale reconstruction errors, which poses a challenge to depth map reconstruction.
Disclosure of Invention
Aiming at the defects of the prior art, and drawing on the advantages of the traditional trilateral filter and the deep neural network, the invention provides a neural-network-based combined trilateral filter depth map super-resolution reconstruction method.
In order to achieve the purpose, the invention is realized by the following technical scheme:
A neural-network-based combined trilateral filter depth map super-resolution reconstruction method comprises: obtaining a low-resolution depth map and a high-resolution color map, constructing a neural network model in a progressive up-sampling manner, and extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively; fusing the color domain features and the depth domain features in a content-aware manner with a combined trilateral filtering module; realizing, based on variants of the combined trilateral filtering module, bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updating the depth domain features of the high-resolution depth map with the fusion result; and applying the updated depth domain features to reconstruct a high-quality depth map.
Further, acquiring the low-resolution depth map and the high-resolution color map comprises: dividing a training set and a test set based on a synthetic data set and a real data set, acquiring image pairs consisting of the high-resolution color map, the high-resolution depth map and the corresponding low-resolution depth map, extracting sub-images according to a preset size, and performing random enhancement to obtain training data.
Further, randomly enhancing the data of the sub-image comprises: the data of the sub-image is rotated by 90 degrees, rotated by 180 degrees, vertically flipped, or horizontally flipped.
Further, after the data of the sub-images are randomly enhanced, the randomly enhanced data are normalized.
Further, constructing the neural network model in a progressive up-sampling manner and respectively extracting the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map comprises the following steps: performing two-fold super-resolution processing on the input features in each stage; and constructing a color guidance branch and a depth reconstruction branch respectively, wherein the color guidance branch extracts color domain features from the high-resolution color map and then performs progressive down-sampling to generate multi-scale color domain features, and the depth reconstruction branch extracts depth domain features from the low-resolution depth map and then performs progressive up-sampling to reconstruct multi-scale depth features.
Further, the color guide branch comprises a shallow feature extraction module and a multi-scale guide feature generation module consisting of a plurality of guide feature extraction units with specific scales.
Further, fusing the color domain features and the depth domain features in a content-aware manner with the combined trilateral filtering module comprises: designing a convolution kernel generation sub-network inside the combined trilateral filtering module and obtaining the convolution kernel by learning, wherein the sub-network only learns a per-domain convolution kernel generation function, analogous to the exponential function in the traditional combined trilateral filter; and combining the result with the initial depth domain features to obtain the color-feature-guided depth domain features.
Further, the bi-directional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution includes: inputting the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages into a bidirectional depth feature fusion unit, wherein the bidirectional depth feature fusion unit comprises two uplink combined trilateral filtering modules and a downlink combined trilateral filtering module; and the cross-scale depth domain features are fused in a bidirectional mode by using the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and the depth domain features of the current scale are updated by using the uplink combined trilateral filtering module.
Further, variants of the combined trilateral filtering module generate the convolution kernel directly at the resolution of the target feature domain.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a neural-networking-based combined trilateral filter used for a depth map super-resolution reconstruction method, which adopts a variant of a combined trilateral filter module to realize bidirectional fusion of cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution, and updates the depth domain features of a high-resolution depth map by using a fusion result.
Drawings
FIG. 1 is a flowchart of an embodiment of the neural-network-based combined trilateral filter depth map super-resolution reconstruction method of the present invention.
FIG. 2 is a network topology diagram of an embodiment of the method.
FIG. 3 is a topology diagram of the bidirectional depth feature fusion unit used in an embodiment of the method.
FIG. 4 is a topology diagram of the combined trilateral filtering module and its variants used in an embodiment of the method.
FIG. 5 shows depth map reconstruction results of an embodiment of the method on synthetic data and real data.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the neural-network-based combined trilateral filter depth map super-resolution reconstruction method first executes step S1: obtain a low-resolution depth map and a high-resolution color map, construct a neural network model in a progressive up-sampling manner, and then extract the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map respectively. Specifically, this embodiment divides a training set and a test set based on a synthetic data set and a real data set, obtains image pairs consisting of a high-resolution color map, a high-resolution depth map and the corresponding low-resolution depth map, extracts sub-images according to a predetermined size, and performs random enhancement to obtain training data.
The neural network model is built in a progressive up-sampling manner, and the depth domain features of the low-resolution depth map and the color domain features of the high-resolution color map are extracted respectively. Each stage performs two-fold super-resolution processing on the input features. A color guidance branch and a depth reconstruction branch are constructed respectively: the color guidance branch extracts color domain features from the high-resolution color map and then performs progressive down-sampling to generate multi-scale color domain features, and the depth reconstruction branch extracts depth domain features from the low-resolution depth map and then performs progressive up-sampling to reconstruct multi-scale depth features. Preferably, the color guidance branch comprises a shallow feature extraction module and a multi-scale guidance feature generation module consisting of a plurality of guidance feature extraction units at specific scales.
In addition, the data of the sub-images can be randomly enhanced by rotating them by 90 degrees, rotating them by 180 degrees, flipping them vertically, or flipping them horizontally. After the random enhancement, the data are normalized; that is, all data need to be normalized before being fed into the convolution model.
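As a concrete illustration of this data preparation step, the following is a minimal Python (PyTorch) sketch, assuming the depth and color sub-images are tensors of shape (C, H, W); the choice of one random operation per sample and the normalization constants are assumptions made for this sketch rather than details taken from the patent.

```python
import random
import torch

def augment_pair(depth: torch.Tensor, color: torch.Tensor):
    """Apply one of the four random enhancements (rotate 90, rotate 180,
    vertical flip, horizontal flip) consistently to a depth/color sub-image pair."""
    op = random.choice(["rot90", "rot180", "vflip", "hflip"])
    if op == "rot90":
        depth = torch.rot90(depth, 1, dims=(-2, -1))
        color = torch.rot90(color, 1, dims=(-2, -1))
    elif op == "rot180":
        depth = torch.rot90(depth, 2, dims=(-2, -1))
        color = torch.rot90(color, 2, dims=(-2, -1))
    elif op == "vflip":
        depth = torch.flip(depth, dims=(-2,))
        color = torch.flip(color, dims=(-2,))
    else:  # hflip
        depth = torch.flip(depth, dims=(-1,))
        color = torch.flip(color, dims=(-1,))
    return depth, color

def normalize_pair(depth: torch.Tensor, color: torch.Tensor, max_depth: float = 255.0):
    """Normalize both inputs to [0, 1] before they are fed into the convolution model."""
    return depth / max_depth, color / 255.0
```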
Then, step S2 is executed: the combined trilateral filtering module fuses the color domain features and the depth domain features in a content-aware manner. Specifically, a convolution kernel generation sub-network is first designed inside the combined trilateral filtering module, and the convolution kernel is obtained by learning; the sub-network only learns a per-domain convolution kernel generation function, analogous to the exponential function in the traditional combined trilateral filter, instead of independently learning each individual weight of the convolution kernel. Finally, the color-feature-guided depth domain features are obtained by combining the result with the initial depth domain features.
Next, step S3 is performed to achieve bidirectional fusion of the cross-scale depth domain features, from low resolution to high resolution and from high resolution to low resolution, based on variants of the combined trilateral filtering module. Specifically, the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages are input into a bidirectional depth feature fusion unit, which comprises two uplink combined trilateral filtering modules and one downlink combined trilateral filtering module. The cross-scale depth domain features are first fused bidirectionally with an uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and the depth domain features at the current scale are then updated with the other uplink combined trilateral filtering module. Moreover, the variants of the combined trilateral filtering module generate the convolution kernel directly at the resolution of the target feature domain.
Preferably, the variants of the combined trilateral filtering module in this embodiment directly generate the convolution kernel function at the resolution of the target feature domain; depending on whether the target domain of the depth domain features is the high-resolution or the low-resolution feature domain among the inputs, the corresponding variants are referred to as the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, respectively.
Then, step S4 is performed to update the depth domain features of the high-resolution depth map with the fusion result. Finally, step S5 is executed to apply the updated depth domain features to reconstruct a high-quality depth map. Specifically, for a model with a reconstruction scale of 2, the super-resolution reconstructed depth map can be obtained directly from the result output by the up-sampling stage; for a model with a reconstruction scale larger than 2, the corresponding depth domain features are first updated by the multi-scale bidirectional depth feature fusion module, and the super-resolution reconstructed depth map is then obtained.
The topology of the neural-network-based combined trilateral filter depth map super-resolution reconstruction method of this embodiment is described below with reference to FIG. 2. This embodiment takes a low-resolution depth map D_LR and a high-resolution color map I_HR as input and, with the high-resolution depth map D_HR as supervision, learns a generation function O(θ | D_LR, I_HR) with parameters θ to predict the corresponding high-resolution depth map D_SR. Preferably, the topology used in this embodiment comprises a color guidance branch and a depth reconstruction branch, which respectively extract multi-scale color guidance features and refine the depth domain features from coarse to fine.
Specifically, the color guide branch includes a shallow feature extraction module 20 and a multi-scale guide feature generation module 15 composed of a plurality of guide feature extraction units 30, 40, 50 of a specific scale.
The high-resolution color map 2 is input into the shallow feature extraction module 20, which comprises two convolution layers 21 and 22, both activated by PReLU. Color domain features are extracted from the high-resolution color map 2 through the convolution layers 21 and 22, and the extracted color domain features are used as the input of the multi-scale guidance feature generation module 15. For example, the color domain features of the high-resolution color map 2 can be extracted with a known shallow feature extraction method.
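As an illustration of such a shallow feature extraction module, the following PyTorch sketch stacks two PReLU-activated convolution layers; the channel width (64) and the 3x3 kernel size are assumptions made for illustration and are not specified in the text.

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Two PReLU-activated convolution layers (e.g. layers 21/22 or 11/12)."""
    def __init__(self, in_channels: int, feat_channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.PReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

# Usage: the colour branch takes a 3-channel image, the depth branch a 1-channel map.
color_shallow = ShallowFeatureExtractor(in_channels=3)
depth_shallow = ShallowFeatureExtractor(in_channels=1)
```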
Each of the guidance feature extraction units 30, 40, 50 has the same structure; taking unit 30 as an example, it includes a convolution layer 31 and a pooling layer 32. For an overall magnification factor l, i.e. both the width and the height of the image are enlarged l times, this embodiment progressively up-samples the depth domain features by a factor of two at each stage using transposed convolution layers, and the color guidance features are progressively down-sampled to match the resolution of the corresponding depth domain features. The calculation of a guidance feature extraction unit at a specific scale is as follows:
F_G^m = f_G^m(F_G^(m-1))    (Formula 1)
In Formula 1, f_G^m and F_G^m respectively denote the function and the output of the m-th guidance feature extraction unit.
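A minimal PyTorch sketch of one such guidance feature extraction unit is given below; the use of average pooling for the pooling layer 32, the channel width and the kernel size are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GuideFeatureExtractionUnit(nn.Module):
    """One scale of the multi-scale guidance feature generation module:
    a convolution layer (31) followed by a pooling layer (32) that
    down-samples the colour guidance features by a factor of two."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.AvgPool2d(kernel_size=2)  # halves H and W

    def forward(self, f_prev: torch.Tensor) -> torch.Tensor:
        # F_G^m = f_G^m(F_G^(m-1))  (Formula 1)
        return self.pool(self.conv(f_prev))
```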
The depth reconstruction branch comprises a shallow feature extraction module 10, a multi-scale guidance feature fusion module 16, a cross-scale depth feature fusion unit 90 and a depth super-resolution reconstruction module 94. The low-resolution depth map 1 is input to the shallow feature extraction module 10, which comprises two convolution layers 11 and 12, both activated by PReLU. Depth domain features are extracted from the low-resolution depth map 1 through the convolution layers 11 and 12, and the extracted depth domain features are used as the input of the multi-scale guidance feature fusion module 16. For example, the depth domain features of the low-resolution depth map 1 can be extracted with a known shallow feature extraction method.
The multi-scale guidance feature fusion module 16 comprises a plurality of guidance feature fusion units, for example units 60, 70 and 80, each of which has the same structure. Taking unit 60 as an example, it comprises a residual dense block (RDB) 61, a joint trilateral filtering fusion module (JTF Fusion) 62, a convolution layer (Conv) 63 and a PReLU-activated transposed convolution layer (Trans Conv) 64.
Each guidance feature fusion unit receives a feature output by the corresponding guidance feature extraction unit. After the shallow feature extraction module 20 extracts the color domain features, they are passed through the cascade of guidance feature extraction units 30, 40 and 50, each of which further processes the features output by the previous stage. Therefore, the shallow color domain feature, i.e. the input F_G^0 of the first guidance feature extraction unit, first needs to be extracted by the two convolution layers 21 and 22 of the shallow feature extraction module 20.
The combined trilateral filtering fusion module 62 is composed of n combined trilateral filtering modules, and the calculation process of the combined trilateral filtering fusion is as follows:
F_D^m = Tconv_m(Conv_m(JTF_m(RDB_m(F_D^(m-1)), F_G^m)))    (Formula 2)
In Formula 2, Tconv_m, Conv_m, JTF_m, RDB_m and F_D^m respectively denote the PReLU-activated transposed convolution layer 64, the convolution layer 63, the combined trilateral filtering fusion module 62, the residual dense connection module 61, and the output of the m-th guidance feature fusion unit.
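The composition in Formula 2 can be sketched in PyTorch as below. The residual dense block and the joint trilateral filtering fusion step are passed in as sub-modules, since their internals are described later in the text (see the filtering and kernel-generation sketches further below); the channel width, kernel sizes and up-sampling parameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GuideFeatureFusionUnit(nn.Module):
    """RDB -> JTF fusion -> Conv -> PReLU-activated transposed conv (x2 up-sampling),
    i.e. F_D^m = Tconv_m(Conv_m(JTF_m(RDB_m(F_D^(m-1)), F_G^m)))  (Formula 2)."""
    def __init__(self, rdb: nn.Module, jtf_fusion: nn.Module, channels: int = 64):
        super().__init__()
        self.rdb = rdb            # residual dense block 61
        self.jtf = jtf_fusion     # joint trilateral filtering fusion module 62
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # layer 63
        self.up = nn.Sequential(                                             # layer 64
            nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1),
            nn.PReLU(),
        )

    def forward(self, f_depth: torch.Tensor, f_guide: torch.Tensor) -> torch.Tensor:
        fused = self.jtf(self.rdb(f_depth), f_guide)
        return self.up(self.conv(fused))
```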
The cross-scale feature fusion unit 90 includes two bidirectional depth feature fusion units 91 and 92, and in order to introduce bidirectional fusion of features in the cross-scale depth domain, the output results of two adjacent guidance feature fusion units are input to one bidirectional depth feature fusion unit in the present embodiment, for example, the output results of the guidance feature fusion units 60 and 70 are input to the bidirectional depth feature fusion unit 92, and the output results of the guidance feature fusion units 70 and 80 are input to the bidirectional depth feature fusion unit 91.
The bidirectional depth feature fusion unit performs bidirectional depth feature fusion calculation as follows:
F̂_D^m = f_B^m(F_D^m, F_D^(m-1))    (Formula 3)
In Formula 3, f_B^m denotes the function of the m-th bidirectional depth feature fusion unit, and F̂_D^m is the updated depth domain feature, which serves as the input of the next guidance feature fusion unit. The output of the cross-scale feature fusion unit 90 is input to the depth super-resolution reconstruction module 94, which comprises a convolution layer 93.
In addition, the low-resolution depth map 1 is also fed to a bicubic interpolation (Bicubic) module 95, which produces a coarsely up-sampled depth map D_BIC. The results of the bicubic interpolation module 95 and the depth super-resolution reconstruction module 94 then pass through a convolution layer to generate the high-resolution depth map 3, i.e. the high-resolution depth image D_SR, which is computed as:
D_SR = DRB_DSR(F̂_D) + D_BIC    (Formula 4)
In Formula 4, DRB_DSR denotes the function of the depth super-resolution reconstruction module 94, and D_BIC and D_SR denote the coarsely up-sampled depth map and the predicted high-resolution depth map, respectively.
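Assuming Formula 4 is a residual formulation (the reconstruction module's output, mapped back to a one-channel depth map by a final convolution, added to the bicubically up-sampled depth map), a minimal PyTorch sketch of this last step is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthSRReconstruction(nn.Module):
    """Final reconstruction: D_SR = DRB_DSR(F_D) + D_BIC (Formula 4, residual form assumed)."""
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.to_depth = nn.Conv2d(feat_channels, 1, kernel_size=3, padding=1)  # layer 93

    def forward(self, depth_feat: torch.Tensor, depth_lr: torch.Tensor, scale: int) -> torch.Tensor:
        # Coarse bicubic up-sampling of the low-resolution depth map (module 95).
        d_bic = F.interpolate(depth_lr, scale_factor=scale, mode="bicubic", align_corners=False)
        return self.to_depth(depth_feat) + d_bic
```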
Because the convolution kernel of the conventional combined trilateral filter computes the differences of depth pixel position, depth gradient and color value through an exponential function, this embodiment builds on that advantage and extends the definition domain of the combined trilateral filter from the original pixel domain to the feature domain, as follows:
F̃_D^c(p) = Σ_{q ∈ N_K(p)} δ_(c,q) · γ_(c,q) · F_D^c(q)    (Formula 5)
In Formula 5, N_K(p) denotes a local window of size K × K centered at p in the c-th channel of the feature F_D, δ_(c,q) and γ_(c,q) denote the learned convolution kernel weights of the depth domain features and the color guidance features at position q of channel c, and F̃_D corresponds to the fused depth domain features.
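Formula 5 is a per-position, per-channel weighted sum over a K × K window. Assuming the combined kernel φ = δ ⊙ γ has already been generated with K² weights per feature position (see the kernel-generation sketch further below), one way to apply it in PyTorch is with torch.nn.functional.unfold:

```python
import torch
import torch.nn.functional as F

def apply_joint_trilateral_kernel(feat: torch.Tensor, phi: torch.Tensor, k: int = 3) -> torch.Tensor:
    """feat: (B, C, H, W) depth-domain features.
    phi:  (B, C*k*k, H, W) position-adaptive kernel (delta * gamma).
    Returns the fused features of Formula 5: a weighted sum of phi * feat
    over the k x k window around each position."""
    b, c, h, w = feat.shape
    patches = F.unfold(feat, kernel_size=k, padding=k // 2)   # (B, C*k*k, H*W)
    patches = patches.view(b, c, k * k, h * w)
    weights = phi.view(b, c, k * k, h * w)
    fused = (patches * weights).sum(dim=2)                    # weighted sum per position
    return fused.view(b, c, h, w)
```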
The combined trilateral filtering module 62 learns two convolution kernels δ and γ from the depth domain features and the color domain features respectively, and then multiplies δ and γ element-wise to obtain the final combined trilateral filtering kernel φ. It follows that the elements of δ and γ vary with the feature position, so the role of the color guidance feature can be adaptively adjusted in all feature dimensions.
Although the designs incorporating the combined trilateral filtering modules 62, 72 and 82 significantly improve the model representation capability, directly and independently learning all convolution kernel weights in δ and γ would cause a sharp increase in parameters and make the model prone to overfitting. To solve this problem, this embodiment focuses only on fitting the convolution kernel generation function. Let B, C, H and W respectively denote the sample number, channel number, height and width of the feature; this embodiment designs two lightweight sub-networks with the same structure to realize the mapping from the feature F ∈ R^(B×C×H×W) to the kernels δ, γ ∈ R^(B×CK²×H×W). The final combined trilateral filtering kernel φ thus generates K² weights for each feature position (c, h, w), c ∈ C, h ∈ H, w ∈ W. In a particular implementation, this embodiment fits the convolution kernel generation function with a plurality of convolution layers. To further compress the model size, this embodiment divides the channels into groups, with r channels in a group sharing one convolution kernel, so that the kernel of the final combined trilateral filtering module lies in R^(B×(C/r)K²×H×W).
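A lightweight kernel-generation sub-network of this kind can be sketched as a few convolution layers mapping a feature map of shape (B, C, H, W) to (C/r)·K² kernel weights per position, with r channels in a group sharing one kernel; the depth and width of the layers below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class KernelGenerator(nn.Module):
    """Maps features (B, C, H, W) to per-position kernels (B, (C//r)*K*K, H, W)."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, (channels // r) * k * k, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

# The combined trilateral kernel is the element-wise product of the two sub-networks'
# outputs: delta from the depth-domain features, gamma from the colour guidance features.
def combined_kernel(gen_d: KernelGenerator, gen_g: KernelGenerator,
                    feat_depth: torch.Tensor, feat_guide: torch.Tensor) -> torch.Tensor:
    return gen_d(feat_depth) * gen_g(feat_guide)
```

To use such a grouped kernel with the unfold-based routine above, the (C/r)·K² weights would first be repeated r times along the channel-group dimension so that every channel has its K² weights.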
Furthermore, this embodiment proposes a novel bidirectional depth feature fusion unit to fuse the cross-scale depth domain features in a bidirectional manner, i.e. from low resolution to high resolution and from high resolution to low resolution. Specifically, referring to FIG. 3, the bidirectional depth feature fusion unit 91 of this embodiment consists of two uplink combined trilateral filtering modules 111 and 113 and one downlink combined trilateral filtering module 112, all of which are variants of the combined trilateral filtering module. The cross-scale depth domain features are first fused bidirectionally using the uplink combined trilateral filtering module 111 and the downlink combined trilateral filtering module 112, and the depth domain features at the current scale are then further updated using the uplink combined trilateral filtering module 113.
Since the resolutions of the cross-scale depth domain features differ, the combined trilateral filtering module cannot be used directly; this embodiment therefore proposes variants of the combined trilateral filtering module that generate the convolution kernel directly at the resolution of the target feature domain. According to whether the target domain features are at high or low resolution, the corresponding variants are called the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, respectively. In a specific implementation, the variants differ from the combined trilateral filtering module in that the second convolution layer of the convolution kernel generation sub-network is replaced with a stride-2 transposed convolution layer or a stride-2 convolution layer, respectively.
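The pieces described above can be combined into a sketch of the up/downlink variants and the bidirectional depth feature fusion unit. In the sketch below, the variant is expressed by giving the kernel-generation sub-network a stride-2 second layer (a transposed convolution for the uplink case, a convolution for the downlink case), as described in the text; the exact wiring of the three filtering modules inside the fusion unit is not fully specified, so the forward pass shown is one plausible arrangement, and all layer widths are assumptions.

```python
import torch
import torch.nn as nn

class UpKernelGenerator(nn.Module):
    """Uplink variant: the second layer of the kernel-generation sub-network is a
    stride-2 transposed convolution, so the kernel is produced at the (2x larger)
    target resolution."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.PReLU(),
            nn.ConvTranspose2d(channels, (channels // r) * k * k, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

class DownKernelGenerator(nn.Module):
    """Downlink variant: the second layer is a stride-2 convolution, producing the
    kernel at the (2x smaller) target resolution."""
    def __init__(self, channels: int = 64, k: int = 3, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.PReLU(),
            nn.Conv2d(channels, (channels // r) * k * k, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.body(feat)

class BidirectionalDepthFusionUnit(nn.Module):
    """Two uplink JTF modules (111, 113) and one downlink JTF module (112):
    first fuse the cross-scale features in both directions, then update the
    current-scale (high-resolution) depth features with the second uplink module.
    The injected modules are assumed to take (source_features, target_features)."""
    def __init__(self, up_jtf_1: nn.Module, down_jtf: nn.Module, up_jtf_2: nn.Module):
        super().__init__()
        self.up1, self.down, self.up2 = up_jtf_1, down_jtf, up_jtf_2

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        f_low_to_high = self.up1(f_low, f_high)    # low -> high fusion (module 111)
        f_high_to_low = self.down(f_high, f_low)   # high -> low fusion (module 112)
        # Update the high-resolution depth features with the uplink module 113.
        return self.up2(f_high_to_low, f_low_to_high)
```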
Referring to FIG. 4, the combined trilateral filtering module and its variants include a kernel generation sub-network 120; a convolution layer 121, a combined trilateral filter module 122 and a convolution layer 123, as well as convolution layers 131, 132 and 133, are disposed in the sub-network 120. After the guidance domain features are input to the convolution layer 121, their convolution result is obtained through the combined trilateral filter module 122 and the convolution layer 123 in sequence; after the target domain features are input to the convolution layer 131, their convolution result is obtained through the convolution layers 132 and 133 in sequence. The target domain features then pass through the combined trilateral filtering kernel 140, where the combined trilateral filtering convolution is performed with the features from the image block extraction 145, and the combined trilateral filtering result is obtained through the combined trilateral filtering kernel 142, the convolution layer 143 and the convolution layer 144. FIG. 5 shows depth map reconstruction results on synthetic data and real data.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A neural-network-based combined trilateral filter depth map super-resolution reconstruction method, characterized by comprising the following steps:
acquiring a low-resolution depth map and a high-resolution color map, constructing a neural network model by adopting a progressive up-sampling mode, and respectively extracting the depth domain characteristics of the low-resolution depth map and the color domain characteristics of the high-resolution color map;
fusing the color domain features and the depth domain features in a content perception mode by adopting a combined trilateral filtering module;
based on the variation of the combined trilateral filtering module, bidirectional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution is realized, and the depth domain features of the high-resolution depth map are updated by using the fusion result;
and applying the updated depth domain features to reconstruct and form a high-quality depth map.
2. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 1, wherein:
acquiring the low resolution depth map and the high resolution color map comprises: and dividing a training set and a test set based on the synthetic data set and the real data set, acquiring an image pair of the high-resolution color image, the high-resolution depth image and the corresponding low-resolution depth image, extracting sub-images according to a preset size, and performing random enhancement to obtain training data.
3. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 2, wherein:
randomly enhancing the data of the sub-image comprises: and performing 90-degree rotation, 180-degree rotation, vertical turning or horizontal turning on the data of the sub-image.
4. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 3, wherein:
and after the data of the sub-images are randomly enhanced, normalizing the randomly enhanced data.
5. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
adopting a progressive up-sampling mode to construct a neural network model, and respectively extracting the depth domain characteristics of the low-resolution depth map and the color domain characteristics of the high-resolution color map comprises the following steps:
performing double super-resolution processing on input features in one stage;
respectively constructing a color guide branch and a depth reconstruction branch, wherein the color guide branch is used for extracting color domain features of the high-resolution color image and then gradually performing down-sampling to generate multi-scale color domain features; and the depth reconstruction branch is used for extracting the depth domain features of the low-resolution depth map and gradually up-sampling to reconstruct the multi-scale depth features.
6. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 5, wherein:
the color guide branch comprises a shallow layer feature extraction module and a multi-scale guide feature generation module consisting of a plurality of guide feature extraction units with specific scales.
7. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
Fusing the color domain features and the depth domain features in a content-aware manner with the combined trilateral filtering module comprises:
designing a convolution kernel generation sub-network inside a combined trilateral filtering module, and obtaining a convolution kernel in a learning mode, wherein the convolution kernel only learns a corresponding domain convolution kernel generation function similar to an exponential function in a traditional combined trilateral filter; and obtaining the color feature guided depth domain features in combination with the initial depth domain features.
8. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in any one of claims 1 to 4, wherein:
the bi-directional fusion of the cross-scale depth domain features from low resolution to high resolution and from high resolution to low resolution includes:
inputting the high-resolution depth features and the low-resolution depth features obtained in two adjacent stages into a bidirectional depth feature fusion unit, wherein the bidirectional depth feature fusion unit comprises two uplink combined trilateral filtering modules and a downlink combined trilateral filtering module;
and fusing the cross-scale depth domain features in a bidirectional mode by using the uplink combined trilateral filtering module and the downlink combined trilateral filtering module, and updating the depth domain features of the current scale by using the uplink combined trilateral filtering module.
9. The neural-network-based combined trilateral filter depth map super-resolution reconstruction method as claimed in claim 8, wherein:
a variation of the joint trilateration filtering module generates convolution kernels directly at the resolution of the target feature domain.
CN202111240795.XA 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method Pending CN113920014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240795.XA CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240795.XA CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Publications (1)

Publication Number Publication Date
CN113920014A true CN113920014A (en) 2022-01-11

Family

ID=79242665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240795.XA Pending CN113920014A (en) 2021-10-25 2021-10-25 Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method

Country Status (1)

Country Link
CN (1) CN113920014A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972043A (en) * 2022-08-03 2022-08-30 江西财经大学 Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN114972043B (en) * 2022-08-03 2022-10-25 江西财经大学 Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN116523759A (en) * 2023-07-04 2023-08-01 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN116523759B (en) * 2023-07-04 2023-09-05 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism

Similar Documents

Publication Publication Date Title
CN111062872B (en) Image super-resolution reconstruction method and system based on edge detection
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
CN109389556B (en) Multi-scale cavity convolutional neural network super-resolution reconstruction method and device
CN110427968B (en) Binocular stereo matching method based on detail enhancement
CN112634137B (en) Hyperspectral and panchromatic image fusion method for extracting multiscale spatial spectrum features based on AE
WO2021022929A1 (en) Single-frame image super-resolution reconstruction method
CN109462747B (en) DIBR system cavity filling method based on generation countermeasure network
Cheng et al. Zero-shot image super-resolution with depth guided internal degradation learning
CN113920014A (en) Neural-networking-based combined trilateral filter depth map super-resolution reconstruction method
Zuo et al. Residual dense network for intensity-guided depth map enhancement
CN114972043B (en) Image super-resolution reconstruction method and system based on combined trilateral feature filtering
CN113793286B (en) Media image watermark removing method based on multi-order attention neural network
CN113837946B (en) Lightweight image super-resolution reconstruction method based on progressive distillation network
CN112734644A (en) Video super-resolution model and method combining multiple attention with optical flow
CN114663552B (en) Virtual fitting method based on 2D image
CN113077545B (en) Method for reconstructing clothing human body model from image based on graph convolution
CN112529776A (en) Training method of image processing model, image processing method and device
Bastanfard et al. Toward image super-resolution based on local regression and nonlocal means
CN114049420B (en) Model training method, image rendering method, device and electronic equipment
CN112200719B (en) Image processing method, electronic device, and readable storage medium
CN113724134A (en) Aerial image blind super-resolution reconstruction method based on residual distillation network
CN109272450A (en) A kind of image oversubscription method based on convolutional neural networks
CN116342377A (en) Self-adaptive generation method and system for camouflage target image in degraded scene
CN114494022B (en) Model training method, super-resolution reconstruction method, device, equipment and medium
CN115705616A (en) True image style migration method based on structure consistency statistical mapping framework

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination