CN114820344A - Depth map enhancement method and device - Google Patents

Depth map enhancement method and device Download PDF

Info

Publication number
CN114820344A
CN114820344A (application CN202210295510.0A)
Authority
CN
China
Prior art keywords
feature
depth map
map
depth
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210295510.0A
Other languages
Chinese (zh)
Inventor
高跃
徐阳
别林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210295510.0A priority Critical patent/CN114820344A/en
Publication of CN114820344A publication Critical patent/CN114820344A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a depth map enhancement method and device. The method comprises the following steps: acquiring an initial depth map from original visual data; performing multi-scale feature extraction on the initial depth map with alternating convolution and deconvolution modules to obtain a feature view, and sequentially obtaining two stages of feature vectors through scale compression and convolution; performing spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhancing the depth map features based on the two stages of feature vectors, recovering the depth structure to generate a feature map, and mapping it through a multilayer perceptron to obtain the final depth map. This solves the technical problems in the related art that, because the target object must satisfy certain specific geometric symmetry before feature extraction and prediction can be performed, it is difficult to use a general enhancement model to complete and densify geometric features during depth acquisition, so the acquired data suffer from missing information and low accuracy.

Description

Depth map enhancement method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a depth map enhancement method and apparatus.
Background
As technology advances, acquiring three-dimensional object models has become easier; depth maps in particular have attracted increasing research attention due to their applications in fields such as autonomous driving and robotics.
Due to factors such as low device resolution and occlusion, the depth data directly acquired by depth scanning devices such as lidar and depth cameras is generally of poor quality; the low quality mainly manifests as an incomplete depth spatial structure and information redundancy. Methods and systems for enhancing depth maps have therefore become increasingly important in practical engineering applications.
In the related art, completion and densification methods based on geometric features require the target object to satisfy certain specific geometric symmetries, and feature extraction and prediction are performed on the acquired image. However, the geometric differences among real-world objects are large, and it is difficult to use a general enhancement model to complete and densify geometric features during depth acquisition, so the acquired data suffer from missing information and low accuracy; improvement is urgently needed.
Disclosure of Invention
The application provides a depth map enhancement method and device, which are used to solve the technical problems in the related art that, because the target object must satisfy certain specific geometric symmetry before feature extraction and prediction can be performed, it is difficult to use a general enhancement model to complete and densify geometric features during depth acquisition, so the acquired data suffer from missing information and low accuracy.
An embodiment of a first aspect of the present application provides a depth map enhancement method, including the following steps: acquiring an initial depth map from original visual data; performing multi-scale feature extraction on the initial depth map with alternating convolution and deconvolution modules, the deconvolution modules gradually recovering resolution to obtain a feature view that takes the three-dimensional coordinates of the depth map as three channels; performing scale compression and convolution on the feature view to sequentially obtain two stages of feature vectors; performing spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhancing the depth map features based on the two stages of feature vectors, and recovering a depth structure to generate a first three-channel feature map; and mapping the feature view through a multilayer perceptron to obtain a second three-channel feature map, fusing the first three-channel feature map and the second three-channel feature map, and mapping the result through a multilayer perceptron to obtain a final depth map.
Optionally, in an embodiment of the present application, obtaining an initial depth map from raw visual data includes: obtaining a low-quality depth map satisfying a preset condition from the original visual data; and preprocessing and extracting features from the low-quality depth map together with the raw infrared and depth-map data to generate the initial depth map.
Optionally, in an embodiment of the present application, obtaining a low-quality depth map satisfying a preset condition from the original visual data includes: collecting the original visual data based on a preset acquisition view-angle threshold to obtain the low-quality depth map.
Optionally, in an embodiment of the present application, performing spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhancing the depth map features based on the two-stage feature vectors, and recovering a depth structure to generate a first three-channel feature map includes: performing a spatial three-dimensional transformation on the initial depth map to obtain the transformed depth; extracting a first feature based on the transformed depth, and fusing the first feature with a view feature vector of the feature view to obtain a first-stage fusion feature; extracting a second feature based on the first-stage fusion feature, and fusing the second feature with the view feature vector to obtain a second-stage fusion feature; and mapping the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
An embodiment of a second aspect of the present application provides a depth map enhancement apparatus, including: an acquisition module, configured to acquire an initial depth map from original visual data; a feature extraction module, configured to perform multi-scale feature extraction on the initial depth map with alternating convolution and deconvolution modules, the deconvolution modules gradually recovering resolution to obtain a feature view that takes the three-dimensional coordinates of the depth map as three channels; a calculation module, configured to perform scale compression and convolution on the feature view to sequentially obtain two stages of feature vectors; an enhancement module, configured to perform spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhance the depth map features based on the two stages of feature vectors, and recover a depth structure to generate a first three-channel feature map; and a fusion module, configured to map the feature view through a multilayer perceptron to obtain a second three-channel feature map, fuse the first three-channel feature map and the second three-channel feature map, and map the result through a multilayer perceptron to obtain a final depth map.
Optionally, in an embodiment of the present application, the obtaining module includes: the acquisition unit is used for obtaining a low-quality depth map meeting preset conditions from the original visual data; and the preprocessing unit is used for preprocessing and extracting features of the low-quality depth map and the infrared and depth map original data to generate the initial depth map.
Optionally, in an embodiment of the present application, the obtaining unit is further configured to collect the original visual data based on a preset collection perspective threshold, so as to obtain the low-quality depth map.
Optionally, in an embodiment of the present application, the enhancement module includes: a three-dimensional transformation unit, configured to perform a spatial three-dimensional transformation on the initial depth map to obtain the transformed depth; a first fusion unit, configured to extract a first feature based on the transformed depth and fuse it with the view feature vector of the feature view to obtain a first-stage fusion feature; a second fusion unit, configured to extract a second feature based on the first-stage fusion feature and fuse it with the view feature vector to obtain a second-stage fusion feature; and a mapping unit, configured to map the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the depth map enhancement method as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor to implement the depth map enhancement method described in the above embodiments.
The embodiment of the application can obtain an initial depth map from the original visual data, extract two stages of feature vectors and depth map features from it and enhance the depth map features, recover the depth structure, and obtain the final depth map through multilayer-perceptron mapping. While preserving the original depth geometry, it can complete shape-missing regions in the original data and densify the whole depth map; it adapts to shape missingness and sparsity of different degrees and from different angles, can effectively improve the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition. This solves the technical problems in the related art that, because the target object must satisfy certain specific geometric symmetry before feature extraction and prediction can be performed, it is difficult to use a general enhancement model to complete and densify geometric features during depth acquisition, so the acquired data suffer from missing information and low accuracy.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a depth map enhancement method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a depth map enhancement method according to one embodiment of the present application;
FIG. 3 is a schematic diagram of an enhancement network of a depth map enhancement method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a generator of a depth map enhancement method according to one embodiment of the present application;
FIG. 5 is a flow diagram of a depth map enhancement method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a depth map enhancement apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The depth map enhancement method and apparatus according to embodiments of the present application are described below with reference to the drawings. They address the technical problems mentioned in the background: because the target object must satisfy certain specific geometric symmetry before feature extraction and prediction can be performed, it is difficult to use a general enhancement model to complete and densify geometric features during depth acquisition, so the acquired data suffer from missing information and low accuracy. The proposed method obtains an initial depth map from the original visual data, extracts two stages of feature vectors and depth map features, enhances the depth map features, recovers the depth structure, and obtains the final depth map through multilayer-perceptron mapping; while preserving the original depth geometry it can complete shape-missing regions in the original data and densify the whole depth map, adapts to shape missingness and sparsity of different degrees and from different angles, effectively improves the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition.
Specifically, fig. 1 is a schematic flow chart of a depth map enhancement method according to an embodiment of the present disclosure.
As shown in fig. 1, the depth map enhancement method includes the following steps:
in step S101, an initial depth map is acquired from the original visual data.
In an actual implementation process, a depth camera and a camera can be used together for acquisition to obtain the original visual data; after a view and a sparse, incomplete low-quality depth map are obtained from the original visual data, the initial depth map is obtained after data processing, including but not limited to the preprocessing and feature extraction described below.
Optionally, in an embodiment of the present application, obtaining an initial depth map from raw visual data includes: obtaining a low-quality depth map meeting preset conditions from the original visual data; and preprocessing and characteristic extraction are carried out on the low-quality depth map and the infrared and depth map original data to generate an initial depth map.
As a possible implementation manner, in the embodiment of the present application, a low-quality depth map satisfying a preset condition may be obtained from the original visual data, and the view, infrared and depth-map raw data obtained from the original data are preprocessed and subjected to feature extraction to generate an initial depth map for the subsequent steps. The embodiment of the application uses infrared data as an aid to enhance the data output of low-power, low-performance depth cameras and lidar, as described in detail below.
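The preprocessing itself is not spelled out above; the following is a minimal sketch, assuming the low-quality depth map and an aligned infrared frame are simply normalized and stacked into the tensor consumed by later steps. The function and parameter names (preprocess, max_depth) are illustrative, not taken from the patent.

```python
# Minimal preprocessing sketch (hypothetical, not the patented implementation):
# normalize a low-quality depth map, stack it with an aligned infrared frame,
# and produce the tensor used as the initial input downstream.
import numpy as np

def preprocess(depth_raw: np.ndarray, infrared_raw: np.ndarray,
               max_depth: float = 10.0) -> np.ndarray:
    """depth_raw, infrared_raw: HxW arrays from the depth camera / IR sensor."""
    depth = np.clip(depth_raw, 0.0, max_depth) / max_depth      # scale depth to [0, 1]
    depth[depth_raw <= 0] = 0.0                                  # mark invalid pixels as holes
    ir = (infrared_raw - infrared_raw.min()) / (np.ptp(infrared_raw) + 1e-8)
    return np.stack([depth, ir], axis=0).astype(np.float32)     # 2 x H x W input tensor
```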
It should be noted that the preset condition of the low quality depth map may be set by a person skilled in the art according to practical situations, and is not limited in particular here.
Optionally, in an embodiment of the present application, obtaining a low quality depth map satisfying a preset condition from original visual data includes: and acquiring the original visual data based on a preset acquisition visual angle threshold to obtain a low-quality depth map.
Specifically, a depth camera and a camera can be used together for acquisition, and the original visual data are collected based on a preset acquisition view-angle threshold to obtain the low-quality depth map. Collecting the original visual data based on a preset acquisition view-angle threshold facilitates the subsequent enhancement and densification of the low-quality depth map and lays a foundation for generating a high-quality image.
It should be noted that the preset view angle threshold may change correspondingly according to different acquisition targets, and the view angle threshold may be set by a person skilled in the art according to actual situations, which is not limited herein.
In step S102, multi-scale feature extraction with alternating convolution and deconvolution modules is performed on the initial depth map, and the deconvolution modules gradually recover resolution to obtain a feature view that takes the three-dimensional coordinates of the depth map as three channels.
Further, the embodiment of the application can perform multi-scale feature extraction of the alternating convolution and deconvolution module on the initial depth map obtained through the steps, and the deconvolution module performs gradual recovery to obtain the feature view with three-dimensional coordinates of the depth map as three channels.
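By way of a hedged illustration only, the alternating convolution / deconvolution idea could be realized as in the following PyTorch-style sketch; layer counts, channel widths and kernel sizes are assumptions, and only the output convention (a 3-channel feature view holding the depth map's three-dimensional coordinates) follows the description above.

```python
# Minimal sketch of alternating convolution / deconvolution feature extraction:
# convolutions extract multi-scale features, transposed convolutions gradually
# restore resolution, and the last layer emits a 3-channel "feature view".
import torch
import torch.nn as nn

class FeatureViewNet(nn.Module):
    def __init__(self, in_ch: int = 2):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # x, y, z channels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))
```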
In step S103, scale compression and convolution are performed on the feature view, and two stages of feature vectors are obtained in sequence.
In the actual implementation process, the embodiment of the application can perform scale compression and convolution on the feature view and the infrared features, so that two stages of feature vectors are obtained, extraction of multi-modal features is realized, and subsequent feature fusion and recovery are facilitated to generate a high-precision image.
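A minimal sketch of the scale compression and convolution step is given below, assuming each compression stage is a strided convolution followed by global average pooling into a feature vector; the channel widths are illustrative, not taken from the patent.

```python
# Minimal sketch of "scale compression and convolution": the feature view is
# progressively downsampled and each stage is pooled into a global feature
# vector; the two vectors are later injected into the depth/point branch.
import torch
import torch.nn as nn

class TwoStageVectors(nn.Module):
    def __init__(self, in_ch: int = 3, dims=(128, 256)):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, dims[0], 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dims[0], dims[1], 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, feat_view: torch.Tensor):
        f1 = self.stage1(feat_view)
        f2 = self.stage2(f1)
        v1 = self.pool(f1).flatten(1)   # first-stage feature vector  (B, dims[0])
        v2 = self.pool(f2).flatten(1)   # second-stage feature vector (B, dims[1])
        return v1, v2
```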
In step S104, the initial depth map is subjected to spatial transformation and feature extraction to obtain depth map features, the depth map features are enhanced based on the two-stage feature vectors, and the depth structure is restored to generate a first three-channel feature map.
Specifically, the embodiment of the application may perform spatial transformation and feature extraction on the initial depth map obtained in step S101, inject the two-stage feature vectors obtained in step S103 to enhance the depth map features, and recover the depth structure, thereby generating the first three-channel feature map. Enhancing the depth map features and recovering the depth structure in this way helps to complete shape-missing regions in the original data while preserving the original depth geometry, and to densify the whole depth map; the method adapts to shape missingness and sparsity of different degrees and from different angles, can effectively improve the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition.
Optionally, in an embodiment of the present application, performing spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhancing the depth map features based on the two-stage feature vectors, and recovering a depth structure to generate a first three-channel feature map includes: performing a spatial three-dimensional transformation on the initial depth map to obtain the transformed depth; extracting a first feature based on the transformed depth, and fusing it with a view feature vector of the feature view to obtain a first-stage fusion feature; extracting a second feature based on the first-stage fusion feature, and fusing it with the view feature vector to obtain a second-stage fusion feature; and mapping the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
For example, in the actual implementation process, the depth feature enhancement in the embodiment of the present application includes the following steps:
1. carrying out space three-dimensional transformation on the initial depth map to obtain the depth after three-dimensional transformation;
2. extracting a first feature according to the depth after three-dimensional transformation, and fusing the first feature with a view feature vector of a feature view to obtain a first-stage fusion feature;
3. extracting a second feature through the first-stage fusion feature, and fusing the second feature with the view feature vector to obtain a second-stage fusion feature;
4. mapping the second-stage fusion features through a multilayer perceptron to obtain the first three-channel feature map (a minimal sketch of this flow follows the list).
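The following sketch strings steps 1-4 together, assuming batched point coordinates, a precomputed 3 x 3 transform, and placeholder modules for the feature extractors, the fusion operations and the final multilayer perceptron; it illustrates the data flow only, not the patented network.

```python
# Two-stage enhancement data flow (placeholders for all sub-modules).
import torch
import torch.nn as nn

class TwoStageEnhance(nn.Module):
    def __init__(self, extract1: nn.Module, fuse1: nn.Module,
                 extract2: nn.Module, fuse2: nn.Module, mlp_head: nn.Module):
        super().__init__()
        self.extract1, self.fuse1 = extract1, fuse1
        self.extract2, self.fuse2 = extract2, fuse2
        self.mlp_head = mlp_head

    def forward(self, points: torch.Tensor, view_vec: torch.Tensor, transform: torch.Tensor):
        aligned = torch.bmm(points, transform)     # step 1: spatial 3D transform, (B, N, 3)
        f1 = self.extract1(aligned)                # step 2: first feature extraction
        f1 = self.fuse1(f1, view_vec)              #         fused with the view feature vector
        f2 = self.extract2(f1)                     # step 3: second feature extraction
        f2 = self.fuse2(f2, view_vec)              #         fused again with the view vector
        return self.mlp_head(f2)                   # step 4: MLP -> first three-channel map
```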
For example, as shown in fig. 2, the depth feature enhancement method includes:
the point branch extracts features from the points and generates enhanced point features using attention masks generated from the image features in the image branch;
the enhanced point features are then forwarded to fully connected layers to reconstruct a set of points representing the global geometry, which contributes another subset of the final predicted depth.
Specifically, the embodiment of the present application may convert the original N input points represented in 3D in euclidean space (N × 3) into a fixed dimension C in feature space (N × C), and extract point features using the EdgeConv learning module proposed in DGCNN.
In the spatial transform layer, the embodiment of the present application may align the input point set to a canonical space using an estimated 3 × 3 matrix; to estimate this matrix, the coordinates of each point are concatenated with the coordinate differences of its k adjacent points into a tensor. The resulting point features, together with the global image feature, feed the feature enhancement module in the point branch of Fig. 2.
It will be appreciated that the feature enhancement module fuses, through an attention mechanism, the global features from the view modality with the geometric local features extracted from the point modality. Specifically, the K-dimensional feature vector from the view branch of the first enhancement unit (top row of Fig. 2) is repeated N times, concatenated with the point features, and compressed by an MLP into an N × 1 vector; this vector is normalized to [0, 1] by a sigmoid function to obtain an attention mask m_a (N × 1). Element-by-element multiplication of m_a with the point feature F' then yields the enhanced local point feature F_p.
Further, the local point features enhanced by the global image features can be written as:

F_p = m_a ⊙ F'
The final enhanced point feature F_e (N × 2C) can then be obtained by cascading the enhanced local point feature with the global point feature obtained by average pooling F_p and repeating it N times.
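The attention-based fusion just described can be sketched as follows; the dimensions K and C, the depth of the mask MLP, and the use of mean pooling for the global point feature are assumptions consistent with, but not dictated by, the text above.

```python
# Attention-based point feature enhancement: a view vector is tiled over the N
# points, an MLP + sigmoid produces an Nx1 mask, the mask re-weights the point
# features, and the result is concatenated with its tiled global average.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, view_dim: int, point_dim: int):
        super().__init__()
        self.mask_mlp = nn.Sequential(
            nn.Linear(view_dim + point_dim, point_dim), nn.ReLU(),
            nn.Linear(point_dim, 1),
        )

    def forward(self, view_vec: torch.Tensor, point_feat: torch.Tensor) -> torch.Tensor:
        # view_vec: (B, K); point_feat F': (B, N, C)
        B, N, C = point_feat.shape
        tiled = view_vec.unsqueeze(1).expand(B, N, view_vec.size(1))
        m_a = torch.sigmoid(self.mask_mlp(torch.cat([tiled, point_feat], dim=-1)))  # (B, N, 1)
        f_p = m_a * point_feat                                     # enhanced local feature F_p
        f_global = f_p.mean(dim=1, keepdim=True).expand(B, N, C)   # average-pooled, tiled N times
        return torch.cat([f_p, f_global], dim=-1)                  # F_e: (B, N, 2C)
```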
In step S105, the feature view is mapped through a multilayer perceptron to obtain a second three-channel feature map; the first three-channel feature map and the second three-channel feature map are fused, and the fused result is mapped through a multilayer perceptron to obtain the final depth map.
As a possible implementation manner, in the embodiment of the present application, the compressed feature vectors obtained in the above steps may be subjected to multilayer perceptron mapping to obtain a second three-channel feature map, and the first three-channel feature map and the second three-channel feature map obtained in the above steps are fused, and then subjected to multilayer perceptron mapping to obtain a recovered dense and complete depth map. The embodiment of the application can complement the shape missing region in the original data while keeping the original depth geometric structure, and can perform densification on the whole depth map, so that the method has the capacity of adapting to shape missing and sparseness at different degrees and different angles, the effective output precision of low-precision acquisition equipment can be effectively improved, and the quality of data acquisition is improved.
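A minimal sketch of this final fusion and mapping step is shown below; treating the multilayer perceptron as per-pixel 1 x 1 convolutions and producing a single-channel depth output are assumptions made for illustration only.

```python
# Final fusion head: concatenate the two 3-channel feature maps per pixel and
# map the fused features to the output depth with a small per-pixel MLP.
import torch
import torch.nn as nn

class DepthFusionHead(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(6, hidden, 1), nn.ReLU(),   # 3 + 3 fused channels
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),              # single-channel recovered depth
        )

    def forward(self, map_a: torch.Tensor, map_b: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([map_a, map_b], dim=1))
```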
The following describes embodiments of the present application in detail with reference to fig. 2 to 5.
The embodiment of the application comprises the following steps:
step S501: and collecting original visual data. In an actual execution process, the depth camera and the camera can be used, the original visual data are acquired in a matched mode based on a preset acquisition visual angle threshold, the original visual data are acquired, and a view and a sparse and incomplete low-quality depth map are acquired from the original visual data.
It should be noted that the preset view angle threshold may change correspondingly according to different acquisition targets, and the view angle threshold may be set by a person skilled in the art according to actual situations, which is not limited herein.
Step S502: preprocessing of raw data and feature extraction. As a possible implementation manner, in the embodiment of the present application, a low-quality depth map satisfying a preset condition may be obtained from original visual data, and the view, infrared and depth map original data obtained from the original data is subjected to preprocessing and feature extraction to generate an initial depth map for processing in subsequent steps.
It should be noted that the preset condition of the low quality depth map may be set by a person skilled in the art according to practical situations, and is not limited in particular here.
Step S503: and acquiring a characteristic view taking the three-dimensional coordinates of the depth map as three channels. Further, the embodiment of the application can perform multi-scale feature extraction of the alternating convolution and deconvolution module on the initial depth map obtained through the steps, and the deconvolution module performs gradual recovery to obtain the feature view with three-dimensional coordinates of the depth map as three channels.
Step S504: and acquiring the feature vectors of the two stages. In the actual implementation process, the embodiment of the application can perform scale compression and convolution on the feature view and the infrared features, so that two stages of feature vectors are obtained, extraction of multi-modal features is realized, and subsequent feature fusion and recovery are facilitated to generate a high-precision image.
Step S505: performing depth map feature enhancement and recovering the depth structure. Specifically, the embodiment of the application may perform spatial transformation and feature extraction on the initial depth map obtained in step S101, inject the two-stage feature vectors obtained in step S103 to enhance the depth map features, and recover the depth structure, thereby generating the first three-channel feature map. Enhancing the depth map features and recovering the depth structure in this way helps to complete shape-missing regions in the original data while preserving the original depth geometry, and to densify the whole depth map; the method adapts to shape missingness and sparsity of different degrees and from different angles, can effectively improve the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition.
In an actual implementation process, the depth feature enhancement performed in the embodiment of the present application includes the following steps:
1. carrying out space three-dimensional transformation on the initial depth map to obtain the depth after three-dimensional transformation;
2. extracting a first feature according to the depth after three-dimensional transformation, and fusing the first feature with a view feature vector of a feature view to obtain a first-stage fusion feature;
3. extracting a second feature through the first-stage fusion feature, and fusing the second feature with the view feature vector to obtain a second-stage fusion feature;
4. mapping the second-stage fusion features through a multilayer perceptron to obtain the first three-channel feature map.
For example, as shown in fig. 2, the depth feature enhancement method includes:
the point branch extracts features from the points and generates enhanced point features using attention masks generated from the image features in the image branch;
the enhanced point features are then forwarded to fully connected layers to reconstruct a set of points representing the global geometry, which contributes another subset of the final predicted depth.
Specifically, the embodiment of the present application may convert the original N input points represented in 3D in euclidean space (N × 3) into a fixed dimension C in feature space (N × C), and extract point features using the EdgeConv learning module proposed in DGCNN.
In the spatial transform layer, the embodiment of the present application may align the input point set to a canonical space using an estimated 3 × 3 matrix; to estimate this matrix, the coordinates of each point are concatenated with the coordinate differences of its k adjacent points into a tensor. The resulting point features, together with the global image feature, feed the feature enhancement module in the point branch of Fig. 2.
It will be appreciated that the feature enhancement module fuses, through an attention mechanism, the global features from the view modality with the geometric local features extracted from the point modality. Specifically, the K-dimensional feature vector from the view branch of the first enhancement unit (top row of Fig. 2) is repeated N times, concatenated with the point features, and compressed by an MLP into an N × 1 vector; this vector is normalized to [0, 1] by a sigmoid function to obtain an attention mask m_a (N × 1). Element-by-element multiplication of m_a with the point feature F' then yields the enhanced local point feature F_p.
Further, the local point features enhanced by the global image features can be written as:

F_p = m_a ⊙ F'
The final enhanced point feature F_e (N × 2C) can then be obtained by cascading the enhanced local point feature with the global point feature obtained by average pooling F_p and repeating it N times.
Step S506: and fusing the three-channel characteristic diagram, and mapping by a multilayer perceptron to obtain a recovered dense and complete depth diagram. As a possible implementation manner, in the embodiment of the present application, the compressed feature vectors obtained in the above steps may be subjected to multilayer perceptron mapping to obtain a second three-channel feature map, and the first three-channel feature map and the second three-channel feature map obtained in the above steps are fused, and then subjected to multilayer perceptron mapping to obtain a recovered dense and complete depth map. The embodiment of the application can complement the shape missing region in the original data while keeping the original depth geometric structure, and can perform densification on the whole depth map, so that the method has the capacity of adapting to shape missing and sparseness at different degrees and different angles, the effective output precision of low-precision acquisition equipment can be effectively improved, and the quality of data acquisition is improved.
Further, the depth map enhancement method of the embodiment of the present application mainly includes the following aspects:
1. and (4) multi-modal feature extraction, fusion and recovery generation. The deep enhancement network is a countermeasure architecture including a generator and a discriminator, and as shown in fig. 3, the enhancement network is a countermeasure architecture including a generator and a discriminator. As shown in fig. 4, the generator is a cascaded structure of two enhancement units with similar structure, each enhancement unit consisting of three parallel functional branches, with a low quality depth X and a single view reference image I as inputs: view branches, point branches and blend branches that predict point sets by processing view, infrared features, depth features and multimodal image point blend features, respectively.
As shown in Fig. 4, the depth generator has two cascaded enhancement units. The first unit takes the original image I and the depth map X directly as input, embeds them into a latent space, and forwards the resulting features to the next unit; the second unit is functionally similar to the first, but its output is a three-dimensional point set. From a dataflow perspective, the generator includes three parallel branches that predict depth from the view features, the depth features and the multi-modal image-point features respectively, and the output depth is the union of the point sets predicted by the three branches.
The view branch takes the image and the infrared image as input, and extracts geometric features of different levels step by step.
The point branch extracts features from the points and generates enhanced point features using attention masks derived from the image features in the image branch; the enhanced point features are forwarded to fully connected layers to reconstruct a point set representing the global geometry, which contributes another subset of the final predicted depth.
The fusion branch mainly merges the two feature streams.
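The three-branch, two-unit generator described above can be summarized by the following skeleton; every sub-module is a placeholder and the call signatures are assumptions, so this only illustrates how the branches and the cascade are wired.

```python
# Generator skeleton: two cascaded enhancement units, each with parallel
# view / point / fusion branches passed in as placeholder modules.
import torch
import torch.nn as nn

class EnhancementUnit(nn.Module):
    def __init__(self, view_branch: nn.Module, point_branch: nn.Module, fusion_branch: nn.Module):
        super().__init__()
        self.view_branch, self.point_branch, self.fusion_branch = view_branch, point_branch, fusion_branch

    def forward(self, image: torch.Tensor, infrared: torch.Tensor, depth_pts: torch.Tensor):
        view_feat = self.view_branch(image, infrared)          # view branch: image + IR features
        point_feat = self.point_branch(depth_pts, view_feat)   # point branch: attention-enhanced points
        return self.fusion_branch(view_feat, point_feat)       # fusion branch: predicted point set

class Generator(nn.Module):
    def __init__(self, unit1: EnhancementUnit, unit2: EnhancementUnit):
        super().__init__()
        self.unit1, self.unit2 = unit1, unit2

    def forward(self, image, infrared, depth_pts):
        coarse = self.unit1(image, infrared, depth_pts)
        return self.unit2(image, infrared, coarse)              # cascaded refinement
```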
2. Generating a depth discriminator. Adversarial training has driven progress in image representation and generation, but little prior work applies this architecture to depth completion, so the embodiment of the present application adds a discriminator to the framework for adversarial training, with the goal of distinguishing real depth from generated depth. The network can first be trained with a joint loss function that considers completeness and distribution uniformity to obtain a 'coarse' complete depth, and then fine-tuned with an adversarial loss to achieve a 'fine' complete depth; PointNet is used as the binary classification network of the discriminator to judge whether a prediction comes from the generated set X or the actual set Y. Training can use the adversarial loss of the improved Wasserstein GAN (with gradient penalty):
L_adv = E_{x~P_X}[D(x)] - E_{y~P_Y}[D(y)] + λ · E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 - 1)^2]

where D ranges over the set of 1-Lipschitz functions, x ~ P_X and y ~ P_Y are point-set samples drawn from the generated data and the actual data respectively, x̂ denotes samples interpolated between them, and the last term is the gradient penalty.
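Since the text states that training uses the improved Wasserstein GAN adversarial loss, a standard WGAN-GP critic loss is sketched below for reference; here critic stands in for the PointNet discriminator D, and the penalty weight lam = 10 is the common default rather than a value taken from the patent.

```python
# Standard WGAN-GP critic loss matching the reconstructed formula above.
import torch

def critic_loss(critic, real_pts: torch.Tensor, fake_pts: torch.Tensor, lam: float = 10.0):
    # real_pts, fake_pts: (B, N, 3) point sets sampled from Y and X
    d_real = critic(real_pts).mean()
    d_fake = critic(fake_pts).mean()
    eps = torch.rand(real_pts.size(0), 1, 1, device=real_pts.device)
    interp = (eps * real_pts + (1 - eps) * fake_pts).requires_grad_(True)
    grad = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()    # gradient penalty term
    return d_fake - d_real + lam * penalty
```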
According to the depth map enhancement method provided by the embodiment of the application, an initial depth map can be obtained from the original visual data, two stages of feature vectors and depth map features can be extracted from it and the depth map features enhanced, the depth structure recovered, and the final depth map obtained through multilayer-perceptron mapping. While preserving the original depth geometry, the method can complete shape-missing regions in the original data and densify the whole depth map; it adapts to shape missingness and sparsity of different degrees and from different angles, effectively improves the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition. This solves the technical problems in the related art that, when data are acquired, the large geometric differences among objects make it difficult to complete features through geometric symmetry, so the acquired data suffer from missing information and low precision and high-precision image output is difficult to achieve.
Next, a depth map enhancing apparatus proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 6 is a block diagram of a depth map enhancement device according to an embodiment of the present application.
As shown in fig. 6, the depth map enhancing apparatus 10 includes: an acquisition module 100, a feature extraction module 200, a calculation module 300, an enhancement module 400 and a fusion module 500.
In particular, the acquisition module 100 is configured to acquire an initial depth map from the raw visual data.
And the feature extraction module 200 is configured to perform multi-scale feature extraction of the alternating convolution and deconvolution modules on the initial depth map, and perform gradual recovery by the deconvolution module to obtain a feature view with three-dimensional coordinates of the depth map as three channels.
And the calculating module 300 is configured to perform scale compression and convolution on the feature view to sequentially obtain two stages of feature vectors.
And the enhancing module 400 is configured to perform spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhance the depth map features based on the two-stage feature vectors, recover the depth structure, and generate a first three-channel feature map.
And the fusion module 500 is used for mapping the feature view through a multilayer perceptron to obtain a second three-channel feature map, fusing the first three-channel feature map and the second three-channel feature map, and mapping the result through a multilayer perceptron to obtain the final depth map.
Optionally, in an embodiment of the present application, the obtaining module 100 includes: the device comprises an acquisition unit and a preprocessing unit.
The acquisition unit is used for obtaining a low-quality depth map meeting preset conditions from the original visual data.
And the preprocessing unit is used for preprocessing the low-quality depth map and the original data of the infrared and depth maps and extracting characteristics to generate an initial depth map.
Optionally, in an embodiment of the present application, the obtaining unit is further configured to collect the original visual data based on a preset collection perspective threshold, so as to obtain a low-quality depth map.
Optionally, in an embodiment of the present application, the enhancement module 400 includes: a three-dimensional transformation unit, a first fusion unit, a second fusion unit and a mapping unit.
And the three-dimensional transformation unit is used for carrying out space three-dimensional transformation on the initial depth map to obtain the depth after three-dimensional transformation.
And the first fusion unit is used for extracting a first feature based on the depth after three-dimensional transformation, and fusing the first feature and the view feature vector of the feature view to obtain a first-stage fusion feature.
And the second fusion unit is used for extracting second features based on the first-stage fusion features and fusing the second features and the view feature vectors to obtain second-stage fusion features.
And the mapping unit is used for mapping the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
It should be noted that the foregoing explanation of the depth map enhancement method embodiment is also applicable to the depth map enhancement apparatus of this embodiment, and is not repeated here.
According to the depth map enhancement apparatus provided by the embodiment of the application, an initial depth map can be obtained from the original visual data, two stages of feature vectors and depth map features can be extracted from it and the depth map features enhanced, the depth structure recovered, and the final depth map obtained through multilayer-perceptron mapping. While preserving the original depth geometry, the apparatus can complete shape-missing regions in the original data and densify the whole depth map; it adapts to shape missingness and sparsity of different degrees and from different angles, effectively improves the effective output precision of low-precision acquisition devices, and improves the quality of data acquisition. This solves the technical problems in the related art that, when data are acquired, the large geometric differences among objects make it difficult to complete features through geometric symmetry, so the acquired data suffer from missing information and low precision and high-precision image output is difficult to achieve.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 701, processor 702, and a computer program stored on memory 701 and executable on processor 702.
The processor 702, when executing the program, implements the depth map enhancement method provided in the embodiments described above.
Further, the electronic device further includes:
a communication interface 703 for communication between the memory 701 and the processor 702.
A memory 701 for storing computer programs operable on the processor 702.
The memory 701 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 701, the processor 702 and the communication interface 703 are implemented independently, the communication interface 703, the memory 701 and the processor 702 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may complete mutual communication through an internal interface.
The processor 702 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the depth map enhancement method as above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A depth map enhancement method is characterized by comprising the following steps:
acquiring an initial depth map from original visual data;
carrying out multi-scale feature extraction of an alternating convolution and deconvolution module on the initial depth map, and gradually recovering by the deconvolution module to obtain a feature view taking three-dimensional coordinates of the depth map as three channels;
carrying out scale compression and convolution on the characteristic view to sequentially obtain characteristic vectors of two stages;
carrying out spatial transformation and feature extraction on the initial depth map to obtain depth map features, strengthening the depth map features based on the feature vectors of the two stages, recovering a depth structure and generating a first three-channel feature map; and
and mapping the feature view through a multilayer perceptron to obtain a second three-channel feature map, fusing the first three-channel feature map and the second three-channel feature map, and mapping the result through a multilayer perceptron to obtain a final depth map.
2. The method of claim 1, wherein the obtaining an initial depth map from raw visual data comprises:
obtaining a low-quality depth map meeting a preset condition from the original visual data;
and preprocessing and feature extraction are carried out on the low-quality depth map and the infrared and depth map original data to generate the initial depth map.
3. The method according to claim 1, wherein said deriving a low quality depth map from said raw visual data, which satisfies a predetermined condition, comprises:
and acquiring the original visual data based on a preset acquisition visual angle threshold value to obtain the low-quality depth map.
4. The method of claim 1, wherein the performing spatial transformation and feature extraction on the initial depth map to obtain depth map features, and performing enhancement on the depth map features and recovering a depth structure based on the two-stage feature vectors to generate a first three-channel feature map comprises:
carrying out space three-dimensional transformation on the initial depth map to obtain the depth after three-dimensional transformation;
extracting a first feature based on the depth after the three-dimensional transformation, and fusing the first feature with a view feature vector of the feature view to obtain a first-stage fusion feature;
extracting a second feature based on the first-stage fusion feature, and fusing the second feature with the view feature vector to obtain a second-stage fusion feature;
and mapping the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
5. A depth map enhancement apparatus, comprising:
the acquisition module is used for acquiring an initial depth map from the original visual data;
the feature extraction module is used for carrying out multi-scale feature extraction of alternating convolution and deconvolution modules on the initial depth map, and the deconvolution module carries out gradual recovery to obtain a feature view taking the three-dimensional coordinates of the depth map as three channels;
the calculation module is used for carrying out scale compression and convolution on the characteristic view to sequentially obtain two stages of characteristic vectors;
the enhancement module is used for carrying out spatial transformation and feature extraction on the initial depth map to obtain depth map features, enhancing the depth map features based on the feature vectors of the two stages, recovering a depth structure and generating a first three-channel feature map; and
and the fusion module is used for obtaining a second three-channel feature map by mapping the feature view multilayer perceptron, fusing the first three-channel feature map and the second three-channel feature map, and mapping the multilayer perceptron to obtain a final depth map.
6. The apparatus of claim 5, wherein the acquisition module comprises:
an acquisition unit configured to obtain a low-quality depth map meeting a preset condition from the original visual data; and
a preprocessing unit configured to perform preprocessing and feature extraction on the low-quality depth map together with the raw infrared and depth map data to generate the initial depth map.
7. The apparatus of claim 6, wherein the acquisition unit is further configured to acquire the original visual data based on a preset acquisition view-angle threshold to obtain the low-quality depth map.
8. The apparatus of claim 5, wherein the enhancement module comprises:
a three-dimensional transformation unit configured to perform spatial three-dimensional transformation on the initial depth map to obtain a three-dimensionally transformed depth;
a first fusion unit configured to extract a first feature based on the three-dimensionally transformed depth and fuse the first feature with a view feature vector of the feature view to obtain a first-stage fusion feature;
a second fusion unit configured to extract a second feature based on the first-stage fusion feature and fuse the second feature with the view feature vector to obtain a second-stage fusion feature; and
a mapping unit configured to map the second-stage fusion feature through a multilayer perceptron to obtain the first three-channel feature map.
9. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the depth map enhancement method according to any one of claims 1 to 4.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the depth map enhancement method according to any one of claims 1 to 4.
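
For readers who want a concrete picture of how the steps of claim 1 fit together, the following PyTorch sketch wires the pipeline end to end: an alternating convolution/deconvolution encoder that recovers a three-channel feature view, scale compression and convolution that produce the two stage-wise feature vectors, two-stage enhancement of the depth map features with those vectors, and a final fusion of the two three-channel feature maps. This is a minimal sketch under assumed hyperparameters; the patent does not fix layer widths, kernel sizes, or the exact form of the multilayer perceptron (realized here as 1x1 convolutions), and the class names FeatureViewEncoder, TwoStageVectors, and DepthMapEnhancer are hypothetical.

import torch
import torch.nn as nn


class FeatureViewEncoder(nn.Module):
    """Alternating convolution / deconvolution; the deconvolutions progressively
    recover resolution so the output has three channels (assumed x, y, z)."""

    def __init__(self, width=32):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(1, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(width, 3, 4, stride=2, padding=1),
        )

    def forward(self, depth):                 # depth: (B, 1, H, W)
        return self.up(self.down(depth))      # feature view: (B, 3, H, W)


class TwoStageVectors(nn.Module):
    """Scale compression plus convolution, yielding two global feature vectors."""

    def __init__(self, dims=(64, 128)):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, dims[0], 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dims[0], dims[1], 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, view):
        f1 = self.stage1(view)
        f2 = self.stage2(f1)
        return self.pool(f1).flatten(1), self.pool(f2).flatten(1)   # (B, d1), (B, d2)


class DepthMapEnhancer(nn.Module):
    def __init__(self, dims=(64, 128)):
        super().__init__()
        self.encoder = FeatureViewEncoder()
        self.vectors = TwoStageVectors(dims)
        self.depth_feat = nn.Sequential(nn.Conv2d(1, dims[0], 3, padding=1), nn.ReLU())
        self.enhance1 = nn.Sequential(nn.Conv2d(dims[0] + dims[0], dims[0], 1), nn.ReLU())
        self.enhance2 = nn.Sequential(nn.Conv2d(dims[0] + dims[1], dims[1], 1), nn.ReLU())
        self.to_map1 = nn.Conv2d(dims[1], 3, 1)       # first three-channel feature map
        self.to_map2 = nn.Conv2d(3, 3, 1)             # second three-channel feature map
        self.head = nn.Sequential(nn.Conv2d(6, 32, 1), nn.ReLU(), nn.Conv2d(32, 1, 1))

    def forward(self, depth):
        view = self.encoder(depth)                     # three-channel feature view
        v1, v2 = self.vectors(view)                    # feature vectors of two stages
        b, _, h, w = depth.shape
        feat = self.depth_feat(depth)                  # depth map features
        # broadcast the stage vectors over the spatial grid and fuse them in turn
        feat = self.enhance1(torch.cat([feat, v1[:, :, None, None].expand(-1, -1, h, w)], 1))
        feat = self.enhance2(torch.cat([feat, v2[:, :, None, None].expand(-1, -1, h, w)], 1))
        map1 = self.to_map1(feat)                      # first three-channel feature map
        map2 = self.to_map2(view)                      # second three-channel feature map
        return self.head(torch.cat([map1, map2], 1))   # final depth map: (B, 1, H, W)


if __name__ == "__main__":
    out = DepthMapEnhancer()(torch.rand(2, 1, 64, 64))
    print(out.shape)   # torch.Size([2, 1, 64, 64])

Running the module on a random (2, 1, 64, 64) depth batch returns an enhanced depth map of the same spatial size, which is the input/output contract implied by the claims.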
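
The "spatial three-dimensional transformation" recited in claim 4 is not defined further in the claims. One plausible reading, consistent with a feature view whose three channels are the three-dimensional coordinates of the depth map, is a pinhole back-projection of each depth pixel into camera-space (x, y, z) coordinates. The sketch below illustrates that reading only; the intrinsics fx, fy, cx, cy are assumed inputs and are not part of the disclosure.

import torch


def backproject_depth(depth, fx, fy, cx, cy):
    """depth: (B, 1, H, W) -> (B, 3, H, W) tensor of per-pixel 3D coordinates."""
    b, _, h, w = depth.shape
    # pixel grid: v indexes rows, u indexes columns
    v, u = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    z = depth[:, 0]                    # (B, H, W)
    x = (u - cx) / fx * z              # pixel column -> camera x
    y = (v - cy) / fy * z              # pixel row    -> camera y
    return torch.stack([x, y, z], dim=1)


if __name__ == "__main__":
    d = torch.rand(2, 1, 64, 64)
    xyz = backproject_depth(d, fx=60.0, fy=60.0, cx=32.0, cy=32.0)
    print(xyz.shape)   # torch.Size([2, 3, 64, 64])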
CN202210295510.0A 2022-03-23 2022-03-23 Depth map enhancement method and device Pending CN114820344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210295510.0A CN114820344A (en) 2022-03-23 2022-03-23 Depth map enhancement method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210295510.0A CN114820344A (en) 2022-03-23 2022-03-23 Depth map enhancement method and device

Publications (1)

Publication Number Publication Date
CN114820344A true CN114820344A (en) 2022-07-29

Family

ID=82530110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210295510.0A Pending CN114820344A (en) 2022-03-23 2022-03-23 Depth map enhancement method and device

Country Status (1)

Country Link
CN (1) CN114820344A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190362511A1 (en) * 2018-05-23 2019-11-28 Apple Inc. Efficient scene depth map enhancement for low power devices
US20210150726A1 (en) * 2019-11-14 2021-05-20 Samsung Electronics Co., Ltd. Image processing apparatus and method
CN111080688A (en) * 2019-12-25 2020-04-28 左一帆 Depth map enhancement method based on depth convolution neural network
CN113436220A (en) * 2021-05-28 2021-09-24 华东师范大学 Image background estimation method based on depth map segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SARAH XU: "Light Field Depth Estimation With Multi-Layer Perceptron", HTTPS://GITHUB.COM/YSX001/EE367-LIGHTFIELD-DEPTH, 31 December 2021 (2021-12-31), pages 1 - 5 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102691078B1 (en) * 2023-02-27 2024-08-05 고려대학교 산학협력단 Method and apparatus for generating image based on denoising diffusion model reflecting geometric information

Similar Documents

Publication Publication Date Title
Sindagi et al. Mvx-net: Multimodal voxelnet for 3d object detection
Chen et al. Multi-view 3d object detection network for autonomous driving
Qi et al. Review of multi-view 3D object recognition methods based on deep learning
CN113205466B (en) Incomplete point cloud completion method based on hidden space topological structure constraint
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
WO2023040247A1 (en) Road area image recognition method based on image and point cloud fusion network
CN110929736A (en) Multi-feature cascade RGB-D significance target detection method
Yang et al. Road detection via deep residual dense u-net
JP2019159940A (en) Point group feature extraction device, point group feature extraction method, and program
CN114663514B (en) Object 6D attitude estimation method based on multi-mode dense fusion network
Bhattacharya et al. Interleaved deep artifacts-aware attention mechanism for concrete structural defect classification
Tao et al. Pseudo-mono for monocular 3d object detection in autonomous driving
Wan et al. Mffnet: Multi-modal feature fusion network for vdt salient object detection
CN114820344A (en) Depth map enhancement method and device
Ge et al. WGI-Net: A weighted group integration network for RGB-D salient object detection
Meng et al. Kgnet: Knowledge-guided networks for category-level 6d object pose and size estimation
CN118379537A (en) Universal topological neural network layer method based on continuous coherent theory
Zeng et al. Deep superpixel convolutional network for image recognition
Zou et al. Gpt-cope: A graph-guided point transformer for category-level object pose estimation
CN110334237B (en) Multi-mode data-based three-dimensional object retrieval method and system
CN117351078A (en) Target size and 6D gesture estimation method based on shape priori
CN111402429A (en) Scale reduction and three-dimensional reconstruction method, system, storage medium and equipment
CN114913330B (en) Point cloud component segmentation method and device, electronic equipment and storage medium
CN111047571B (en) Image salient target detection method with self-adaptive selection training process
CN112733934A (en) Multi-modal feature fusion road scene semantic segmentation method in complex environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination