WO2022045495A1 - Methods for depth map reconstruction and electronic computing device for implementing the same - Google Patents

Methods for depth map reconstruction and electronic computing device for implementing the same

Info

Publication number
WO2022045495A1
WO2022045495A1 PCT/KR2021/000997 KR2021000997W WO2022045495A1 WO 2022045495 A1 WO2022045495 A1 WO 2022045495A1 KR 2021000997 W KR2021000997 W KR 2021000997W WO 2022045495 A1 WO2022045495 A1 WO 2022045495A1
Authority
WO
WIPO (PCT)
Prior art keywords
tensor
decoder
convolver
layer
depth map
Prior art date
Application number
PCT/KR2021/000997
Other languages
English (en)
Inventor
Sergey Stanislavovich ZAVALISHIN
Maksim Alexandrovich PENKIN
Aleksei Mikhailovich GRUZDEV
Evgeny Andreevich DOROKHOV
Artur Andreevich BEGAEV
Original Assignee
Samsung Electronics Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2022045495A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/50 - Depth or shape recovery
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Definitions

  • The present invention relates, in general, to the fields of computer vision and depth map completion using artificial intelligence and machine learning for depth map reconstruction, and in particular to methods for depth map reconstruction and an electronic computing device for implementing the same.
  • Some methods use a high-precision multi-beam depth sensor and an image to reconstruct the depth map from said image.
  • a high-precision depth sensor provides high quality depth data that densely covers substantially the whole image field and from which a high quality depth map can be obtained.
  • The use of a high-precision depth sensor in devices such as mobile phones, smartphones, tablet computers, robot vacuum cleaners, autonomous vehicle control devices, etc. makes the device too expensive and can significantly increase its size.
  • Other methods use a simple low-precision depth sensor and an image to reconstruct a depth map for that image. Such depth sensors provide low quality depth data with low data density, which is then processed by an artificial intelligence tool to obtain a depth map of the required quality.
  • Obtaining a high quality depth map from low quality depth data requires a device with high processing power, such as a computing device with a high performance processor, high quality graphics card, and large memory.
  • Devices such as mobile phones, smartphones, tablet computers, robot vacuum cleaners, autonomous vehicle control devices, etc. do not have the characteristics of devices with high processing power, so it is difficult to obtain a high-quality depth map on them.
  • In addition, the process of depth map reconstruction from low quality depth data is too slow to obtain a depth map of acceptable quality in real time.
  • Patent application CN102938142A published on 20.02.2013 and entitled “METHOD FOR FILLING INDOOR LIGHT DETECTION AND RANGING (LIDAR) MISSING DATA BASED ON KINECT", offers a technical solution to supplement the sparse depth data acquired by a laser depth sensor such as a Lidar with Kinect depth data.
  • the Kinect device is expensive and integration of the Kinect device into devices such as a mobile phone, smartphone, tablet computer, robot vacuum cleaner, autonomous vehicle control device, etc. will lead to a significant increase in the size of such devices.
  • In addition, this technical solution uses a multi-channel Lidar as well as an iterative closest point algorithm, which reduces the performance of the device and does not allow real-time depth map reconstruction.
  • Patent application US20190004535A1 published on 03.01.2019 and entitled "HIGH RESOLUTION 3D POINT CLOUDS GENERATION BASED ON CNN AND CRF MODELS", offers a technical solution for depth map reconstruction using a convolutional neural network that converts an image acquired by a camera and the corresponding sparse point cloud acquired by a Lidar into a dense depth map.
  • the proposed technical solution uses a ResNet-like architecture, which is too slow to implement this approach in real time.
  • the described architecture is not designed to use pre-trained image features, which reduces its ability to be trained on a small dataset.
  • this technical solution uses a high-performance graphics card and the post-processing of the proposed algorithm uses iterative processes that do not allow real-time calculations.
  • Patent Application CN108961390A published on 07.12.2018 and entitled "REAL-TIME 3D RECONSTRUCTION METHOD BASED ON DEPTH MAP” offers a technical solution that uses an RGB-D camera to extract sparse depth points that are converted into a dense depth map using texture information received by the RGB camera.
  • Using an RGB-D camera in a device makes it more complicated and expensive, as an RGB-D camera requires the installation of an additional powerful light-emitting diode, and the system must be calibrated for the entire device to work correctly.
  • the proposed technical solution uses an algorithm without the use of artificial intelligence, which is adapted to the specific configuration of the device and is poorly generalized for input data from other depth sensors.
  • Patent application CN106651925A published on 10.05.2017 and entitled “COLOR DEPTH IMAGE OBTAINING METHOD AND DEVICE”, offers a technical solution in which the data, being a point cloud covering the whole image, is received by a high quality depth sensor such as an RGB-D camera, and is converted into a dense depth map.
  • the method mainly focuses on RGB and depth data alignment. Using an RGB-D camera in the device makes it more complex and expensive.
  • The currently existing methods for depth map reconstruction have the following drawbacks.
  • Most of the methods use depth data that covers the whole image.
  • Non-artificial intelligence methods, i.e. methods that do not use neural networks, are designed for only particular sensor types and require calibration of the device implementing the method.
  • Artificial intelligence methods use depth data distributed uniformly across the image as input.
  • Most artificial intelligence tools require a large amount of training data.
  • Most trained artificial intelligence tools contain a large number of parameters, such as weights, which requires a lot of memory.
  • Highly accurate depth maps require a complex architecture of the processing algorithm, which makes it impossible to obtain highly accurate depth maps in real time or substantially in real time.
  • the present invention has been created to overcome at least one of the above-described drawbacks and to provide at least one of the advantages described below.
  • One aspect of the present invention provides a method for depth map reconstruction, the method comprising: acquiring (S101) an image and a sparse depth map containing depth data being the depths in no more than two planes perpendicular to an image plane in the acquired image; calculating (S102) an image feature tensor by a trained image encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers by extracting image features from the acquired image; calculating (S103) a depth feature tensor by a trained depth encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers, by extracting depth features from the acquired sparse depth map; calculating (S104) a concatenated feature tensor by a converter by concatenating the image feature tensor and the depth feature tensor, and converting the concatenated feature tensor by the converter using a predetermined parameter specifying the size of the converted concatenated feature tensor; predicting (S105) a gradient depth map by a trained decoder using the converted concatenated feature tensor and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder, wherein the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and a final upsampling layer after the last upsampling unit; and predicting (S106) a dense depth map by an iterative spatial distribution unit using the gradient depth map, the sparse depth map and the tensor obtained in the last upsampling unit of the decoder, wherein the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the image encoder is one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted concatenated feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • The decoder further comprises at least one convolver before each of the at least one upsampling unit, wherein each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, wherein the step (S105) further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • each of the activation layers in the image encoder, depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Another aspect of the present invention provides a method for depth map reconstruction, the method comprising: acquiring (S201) a sparse depth map containing depth data in an image; calculating (S203) a depth feature tensor by a trained depth encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers by extracting depth features from the acquired sparse depth map; converting (S204) the depth feature tensor by a converter using a predetermined parameter specifying the size of the converted depth feature tensor; predicting (S205) a gradient depth map by a trained decoder using the converted depth feature tensor as an input tensor, wherein the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and a final upsampling layer after the last upsampling unit; and predicting (S206) a dense depth map by an iterative spatial distribution unit using the gradient depth map, the sparse depth map and the tensor obtained in the last upsampling unit of the decoder, wherein the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the predetermined parameter specifying the size of the converted depth feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • The decoder further comprises at least one convolver before each of the at least one upsampling unit, wherein each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, wherein the step (S205) further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • each of the activation layers in the depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Another aspect provides a method for depth map reconstruction, the method comprising: acquiring (S301) an image; calculating (S302) an image feature tensor by a trained image encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers by extracting image features from the acquired image; converting (S304) the image feature tensor by a converter using a predetermined parameter specifying the size of the converted image feature tensor; predicting (S305) a gradient depth map by a trained decoder using the converted image feature tensor and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder, wherein the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and a final upsampling layer after the last upsampling unit; and predicting (S306) a dense depth map by an iterative spatial distribution unit using the gradient depth map and the tensor obtained in the last upsampling unit of the decoder, wherein the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the image encoder is one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted image feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • The decoder further comprises at least one convolver before each of the at least one upsampling unit, wherein each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, wherein the step (S305) further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • each of the activation layers in the image encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Another aspect provides a method for depth map reconstruction, the method comprising: acquiring (S401) an image and a sparse depth map containing depth data being the depths of objects set by the user in the acquired image; calculating (S402) an image feature tensor by a trained image encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers by extracting image features from the acquired image; calculating (S403) a depth feature tensor by a trained depth encoder comprising at least one convolutional layer, at least one activation layer and at least two downsampling layers, by extracting depth features from the acquired sparse depth map; calculating (S404) the concatenated feature tensor by a converter by concatenating the image feature tensor and the depth feature tensor, and converting the concatenated feature tensor by the converter using a predetermined parameter specifying the size of the converted concatenated feature tensor; predicting (S405) a gradient depth map by a trained decoder using the converted concatenated feature tensor and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder, wherein the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and a final upsampling layer after the last upsampling unit; and predicting (S406) a dense depth map by an iterative spatial distribution unit using the gradient depth map, the sparse depth map and the tensor obtained in the last upsampling unit of the decoder, wherein the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the image encoder is one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted concatenated feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • The decoder further comprises at least one convolver before each of the at least one upsampling unit, wherein each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer, wherein the step (S405) further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • each of the activation layers in the image encoder, depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Another aspect provides an electronic computing device, comprising: at least one processor; and a memory storing numerical parameters of a trained image encoder, a trained depth encoder, a converter, a trained decoder, an iterative spatial distribution unit, and instructions that, when executed by the at least one processor, cause the at least one processor to perform any of the above methods for depth map reconstruction.
  • the electronic computing device further comprises at least one of a camera, a depth sensor, and a user input device.
  • the aim of the present invention is to provide methods for depth map reconstruction and an electronic computing device for implementing the same, which allow obtaining at least one of the following advantages:
  • the above advantages provide implementation of methods for depth map reconstruction on devices with low computational power to obtain a high quality depth map from sparse depth data that does not cover the whole image, in real time or substantially in real time with very low delay.
  • Fig. 1 is a block diagram of an artificial intelligence tool for implementing a first embodiment of a method 100 for depth map reconstruction.
  • Fig. 2 is a block diagram of an artificial intelligence tool for implementing a second embodiment of a method 200 for depth map reconstruction.
  • Fig. 3 is a block diagram of an artificial intelligence tool for implementing a third embodiment of a method 300 for depth map reconstruction.
  • Fig. 4 is a block diagram of an artificial intelligence tool for implementing a fourth embodiment of a method 400 for depth map reconstruction.
  • Fig. 5 is a schematic diagram of an interleaving operation using slicing a tensor into sub-tensors in a decoder.
  • Fig. 6 is a flowchart of the first embodiment of the method 100 for depth map reconstruction.
  • Fig. 7 is a flowchart of the second embodiment of the method 200 for depth map reconstruction.
  • Fig. 8 is a flowchart of the third embodiment of the method 300 for depth map reconstruction.
  • Fig. 9 is a flowchart of the fourth embodiment of the method 400 for depth map reconstruction.
  • Fig. 10 is a block diagram of an electronic computing device 500 for performing any of the claimed embodiments of the method for depth map reconstruction.
  • the proposed methods may be performed on an electronic computing device 500, which may be, for example, a mobile phone, a smartphone, a tablet computer, a robot vacuum cleaner, an autonomous vehicle control device, and so on.
  • the electronic computing device 500 comprises at least one or more processors 501 and memory 502.
  • The electronic computing device 500 comprises one or more modules for performing method steps. At least one of the plurality of modules may be implemented by an artificial intelligence (AI) model. AI-related functions can be performed using non-volatile memory, volatile memory and a processor.
  • One or more processors may include a general purpose processor such as a central processing unit (CPU), an application processor (AP), or the like, a graphics processor such as a graphics processing unit (GPU) or a visual processing unit (VPU), and/or a specialized AI processor such as a neural processing unit (NPU).
  • One or more processors control the processing of input data in accordance with a specified rule of operation or AI model stored in non-volatile memory and volatile memory.
  • the specified rule of operation or AI model is provided through training.
  • Here, training means that the specified operation rule or an AI model with the required characteristics is created by applying a training algorithm to a set of training data.
  • The training can be performed in the device itself that contains the AI model in accordance with the embodiment, and/or can be implemented via a separate server/system.
  • The AI model can be composed of multiple layers of a neural network. Each layer has a plurality of weights and performs a layer operation using the calculation result of the previous layer and the plurality of weights.
  • Examples of neural networks include, but are not limited to, Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), Generative Adversarial Networks (GAN) and deep Q-networks.
  • a training algorithm is a method for training a predetermined target device using a plurality of training data to cause, enable or control the target device to perform determination or prediction.
  • Examples of the training algorithms include, but are not limited to, supervised training, unsupervised training, partially supervised training or reinforcement training.
  • The present invention is intended for depth map reconstruction based on depth and/or image data.
  • the present invention can be implemented on an electronic computing device with low computational power such as a mobile phone, smartphone, tablet computer, robot cleaner, autonomous vehicle control device, etc.
  • the present invention provides acquisition of a high quality dense depth map from sparse depth data, that does not cover the whole image, in real time or substantially in real time with very low delay.
  • a dense high quality depth map with a size of 512x256 pixels can be obtained on a Samsung smartphone with an Exynos 8895 processor in 10 milliseconds.
  • Fig. 1 shows a block diagram of an artificial intelligence tool for implementing a method 100 for depth map reconstruction.
  • the artificial intelligence tool comprises an image encoder, a depth encoder, a converter, a decoder and an iterative spatial distribution unit.
  • the artificial intelligence tool is contained in the electronic computing device 500 that is configured to implement the method 100 for depth map reconstruction.
  • the artificial intelligence tool is pre-trained by inputting training data containing different images and sparse depth maps of these images. Such training is known in the art and therefore will not be described in detail in this disclosure.
  • the image encoder is designed to extract image features from the image inputted into it and outputs an image feature tensor.
  • the image can be color or black and white.
  • the image can be captured by a camera additionally contained in the electronic computing device 500, or retrieved from any available source such as the Internet, external storage, or the like.
  • the image encoder comprises at least one convolutional layer, at least one activation layer and at least two downsampling layers and can be any standard pre-trained classification network, for example, MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
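  • As an illustration of the image encoder described above, the following is a minimal sketch (not taken from the patent) that wraps a pre-trained MobileNetV2 backbone from torchvision and records the tensor produced at each downsampling layer for later use in the decoder; the choice of backbone and the weight identifier are assumptions.

```python
# Hypothetical sketch of an image encoder built on a pre-trained MobileNetV2;
# the backbone choice and the weight name are assumptions, not the patent's own code.
import torch
import torchvision


class ImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.features = backbone.features  # stack of conv / activation / downsampling blocks

    def forward(self, image):
        skips = []                          # tensors taken at each downsampling layer
        x = image
        prev_height = image.shape[-2]
        for block in self.features:
            x = block(x)
            if x.shape[-2] < prev_height:   # spatial resolution dropped -> downsampling layer
                skips.append(x)
                prev_height = x.shape[-2]
        return x, skips                     # final image feature tensor + per-scale skip tensors
```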
  • the depth encoder is designed to extract depth features from a sparse depth map inputted into it and outputs a depth feature tensor.
  • the sparse depth map can be acquired from a single-beam or dual-beam depth sensor additionally contained in the electronic computing device 500, or retrieved from any available source such as the Internet, external storage, and the like.
  • The sparse depth map comprises depth data being the depths in no more than two planes perpendicular to the image plane of the image inputted into the image encoder.
  • the converter is designed to obtain a converted concatenated feature tensor.
  • the converted concatenated feature tensor is obtained by concatenating the image feature tensor obtained by the image encoder and the depth feature tensor obtained by the depth encoder, and by converting the concatenated tensor using a predetermined parameter specifying the size of the converted concatenated feature tensor.
  • the parameter specifying the size of the converted concatenated feature tensor is a multiple of 2.
  • The value of the parameter specifying the size of the converted concatenated feature tensor is selected based on the processing speed and the quality requirements for the dense depth map at the output of the artificial intelligence tool for depth map reconstruction.
  • the value of the parameter specifying the size of the converted concatenated feature tensor is selected in the range of 256-1024.
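  • The patent does not fix how the conversion is realized; the following minimal sketch shows one plausible converter in which the image and depth feature tensors are concatenated channel-wise and projected by a 1x1 convolution to the predetermined size (a multiple of 2, typically 256-1024). The use of a 1x1 convolution and the default size are assumptions.

```python
# Hypothetical converter sketch: channel-wise concatenation followed by a
# 1x1 convolution to the predetermined converted size (an assumed realization).
import torch


class Converter(torch.nn.Module):
    def __init__(self, image_channels, depth_channels, converted_size=512):
        super().__init__()
        assert converted_size % 2 == 0, "converted size is expected to be a multiple of 2"
        self.project = torch.nn.Conv2d(image_channels + depth_channels, converted_size, kernel_size=1)

    def forward(self, image_features, depth_features):
        concatenated = torch.cat([image_features, depth_features], dim=1)  # concatenated feature tensor
        return self.project(concatenated)                                  # converted concatenated feature tensor
```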
  • the decoder is intended to predict a gradient depth map using the converted concatenated feature tensor and the tensor obtained on each of at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder.
  • the decoder comprises at least one upsampling unit and a final upsampling layer after the last upsampling unit.
  • Each upsampling unit comprises layers in the following order: a first convolutional layer with a 1x1 kernel (pointwise convolution), a first activation layer, an upsampling layer, a per-channel 2D convolutional layer (depthwise convolution), a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • the number of upsampling layers of the decoder corresponds to the number of downsampling layers of the image encoder. Such a decoder structure makes it possible to increase the accuracy of data processing and reduce the processing time.
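  • A minimal sketch of one upsampling unit with the layer order given above is shown below; the channel sizes, the nearest-neighbour upsampling mode and the use of ReLU6 as the activation are illustrative assumptions.

```python
# Hypothetical upsampling unit: pointwise 1x1 convolution, activation, upsampling,
# depthwise (per-channel) convolution, pointwise 1x1 convolution, activation.
import torch


class UpsamplingUnit(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, out_channels, kernel_size=1),   # first 1x1 (pointwise) convolution
            torch.nn.ReLU6(inplace=True),                                # first activation layer
            torch.nn.Upsample(scale_factor=2, mode="nearest"),           # upsampling layer
            torch.nn.Conv2d(out_channels, out_channels, kernel_size=3,
                            padding=1, groups=out_channels),             # per-channel (depthwise) 2D convolution
            torch.nn.Conv2d(out_channels, out_channels, kernel_size=1),  # second 1x1 (pointwise) convolution
            torch.nn.ReLU6(inplace=True),                                # second activation layer
        )

    def forward(self, x):
        return self.block(x)
```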
  • The tensor obtained in the corresponding downsampling layer of the image encoder, starting from the last downsampling layer of the image encoder, is used in the corresponding upsampling unit of the decoder, starting from the first upsampling unit of the decoder among the at least one upsampling unit of the decoder; before inputting into the first upsampling unit of the decoder, the input tensor is obtained by concatenating the converted concatenated feature tensor with the tensor obtained in the last downsampling layer of the image encoder, and before inputting into each subsequent upsampling unit of the decoder, the input tensor is obtained by concatenating the tensor obtained in the previous upsampling unit of the decoder with the tensor obtained in the corresponding downsampling layer of the image encoder.
  • the converted concatenated feature tensor obtained in the converter is concatenated with the tensor obtained in the third downsampling layer of the image encoder and said concatenated tensor is processed by the first upsampling unit.
  • the tensor obtained in the first upsampling unit is concatenated with the tensor obtained in the second downsampling layer of the image encoder, and said concatenated tensor is processed by the second upsampling unit.
  • the tensor obtained in the second upsampling unit is processed with the final upsampling layer to obtain a gradient depth map.
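  • The two-unit example above can be wired up as in the following sketch, which reuses the UpsamplingUnit class sketched earlier; the channel counts are assumptions, and how the gradient depth map channels are finally reduced is not specified in the source.

```python
# Hypothetical decoder wiring for an image encoder with three downsampling layers:
# the converted concatenated feature tensor is fused with the third downsampling
# output, upsampled, fused with the second downsampling output, upsampled again,
# and finally passed through the final upsampling layer.
import torch


class Decoder(torch.nn.Module):
    def __init__(self, converted_size=512, skip3_channels=64, skip2_channels=32):
        super().__init__()
        self.unit1 = UpsamplingUnit(converted_size + skip3_channels, 128)
        self.unit2 = UpsamplingUnit(128 + skip2_channels, 64)
        self.final_upsample = torch.nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, converted, skip2, skip3):
        x = self.unit1(torch.cat([converted, skip3], dim=1))          # fuse with third downsampling layer output
        last_unit_output = self.unit2(torch.cat([x, skip2], dim=1))   # fuse with second downsampling layer output
        gradient_depth_map = self.final_upsample(last_unit_output)    # final upsampling layer
        return gradient_depth_map, last_unit_output                   # last unit output also feeds the spatial distribution unit
```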
  • the decoder may further comprise at least one convolver before each of the at least one upsampling unit.
  • Each convolver applies interleaving operations using tensor slicing into sub-tensors and comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • The following interleaving operations are performed using slicing of the tensor into sub-tensors: the input tensor is sliced into a set of n+1 sub-tensors, where n is the number of convolvers; convolution of one sub-tensor is performed by the first convolver; before each subsequent convolver, the tensor obtained in the previous convolver is sliced into a set of k+1 sub-tensors, where k is the number of remaining convolvers, and a concatenated tensor is obtained by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with a sub-tensor obtained by slicing the tensor before said subsequent convolver; this concatenated tensor is then convolved by said subsequent convolver.
  • After the last convolver, a concatenated tensor is obtained by concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers, and this concatenated tensor is processed by the upsampling unit.
  • Fig. 5 shows an example in which the decoder further comprises two convolvers before the upsampling unit. This example, shown in Fig. 5 is only intended to illustrate the operation of the decoder and is not intended to limit the invention, since the decoder may comprise any number of convolvers before each upsampling unit.
  • the input tensor is sliced into three sub-tensors 1, 2, 3.
  • Sub-tensor 3 is processed by the first convolver.
  • the obtained tensor is sliced into two sub-tensors 3.1 and 3.2.
  • Sub-tensor 1 is concatenated with sub-tensor 3.1 and processed with the second convolver.
  • the obtained tensor 3.1.1 is concatenated with sub-tensor 3.2 and sub-tensor 2 and processed by the upsampling unit.
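  • The Fig. 5 example can be expressed in code as in the following sketch; slicing along the channel dimension, the split sizes and the concrete channel counts are assumptions (the Convolver class mirrors the layer order given above, and UpsamplingUnit is the class sketched earlier).

```python
# Hypothetical sketch of the interleaving operation with two convolvers (Fig. 5).
import torch


class Convolver(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, out_channels, kernel_size=1),   # first 1x1 convolution
            torch.nn.ReLU6(inplace=True),                                 # first activation layer
            torch.nn.Conv2d(out_channels, out_channels, kernel_size=3,
                            padding=1, groups=out_channels),              # per-channel (depthwise) 2D convolution
            torch.nn.Conv2d(out_channels, out_channels, kernel_size=1),   # second 1x1 convolution
            torch.nn.ReLU6(inplace=True),                                 # second activation layer
        )

    def forward(self, x):
        return self.block(x)


def interleaved_convolvers(input_tensor, convolver1, convolver2, upsampling_unit):
    # Slice the input tensor into n + 1 = 3 sub-tensors (n = 2 convolvers).
    sub1, sub2, sub3 = torch.chunk(input_tensor, 3, dim=1)
    # The first convolver processes one sub-tensor (sub-tensor 3 in Fig. 5).
    out3 = convolver1(sub3)
    # Its output is sliced into k + 1 = 2 sub-tensors (k = 1 remaining convolver).
    sub31, sub32 = torch.chunk(out3, 2, dim=1)
    # Sub-tensor 1 (not yet convolved) is concatenated with sub-tensor 3.1 and convolved.
    out311 = convolver2(torch.cat([sub1, sub31], dim=1))
    # The result is concatenated with the remaining unconvolved sub-tensors (3.2 and 2)
    # and processed by the upsampling unit.
    return upsampling_unit(torch.cat([out311, sub32, sub2], dim=1))


# Example wiring for a 96-channel input split into three 32-channel sub-tensors:
# convolver1 = Convolver(32, 32); convolver2 = Convolver(48, 48)
# upsampling_unit = UpsamplingUnit(96, 64)
```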
  • Each of the activation layers in the artificial intelligence tool is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • An iterative spatial distribution unit predicts a dense depth map using the gradient depth map, the sparse depth map, and the tensor obtained in the last upsampling unit of the decoder.
  • the dense depth map prediction is performed iteratively with a preset number of iterations.
  • The number of iterations is set by the user and can be any number. Preferably, the number of iterations is 24.
  • the iterative spatial distribution unit can be any spatial distribution network (SPN).
  • The known SPN is supplemented in that, during the iterations of updating the dense depth map, the depth values in the dense depth map are replaced with the known depth values from the sparse depth map input to the depth encoder. Replacing the depth values in the dense depth map with the known depth values from the sparse depth map provides a more accurate representation of the depth values in the dense depth map output by the iterative spatial distribution unit.
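  • A minimal sketch of this replacement step is shown below; the propagation update itself is represented by a generic propagate callable (standing in for whatever spatial distribution network is used), and the valid-pixel test (depth greater than zero) is an assumption about how missing depths are encoded.

```python
# Hypothetical sketch of iterative refinement in which known sparse depths
# overwrite the predicted values after every propagation iteration.
import torch


def refine_dense_depth(gradient_depth_map, sparse_depth_map, guidance_features,
                       propagate, num_iterations=24):
    known_mask = sparse_depth_map > 0                    # pixels with measured depth (assumed encoding)
    dense_depth_map = gradient_depth_map
    for _ in range(num_iterations):
        dense_depth_map = propagate(dense_depth_map, guidance_features)               # one SPN update
        dense_depth_map = torch.where(known_mask, sparse_depth_map, dense_depth_map)  # keep known depths
    return dense_depth_map
```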
  • Fig. 2 shows a block diagram of an artificial intelligence tool for implementing a method 200 for depth map reconstruction.
  • the artificial intelligence tool comprises a depth encoder, a converter, a decoder and an iterative spatial distribution unit.
  • the artificial intelligence tool is contained in an electronic computing device 500 that is configured to implement a method 200 for depth map reconstruction.
  • the artificial intelligence tool for implementing the method 200 is pre-trained in the same way as the artificial intelligence tool for implementing the method 100.
  • The structures of the depth encoder, converter, decoder and iterative spatial distribution unit of the artificial intelligence tool for implementing the method 200 correspond to the structures of the depth encoder, converter, decoder and iterative spatial distribution unit of the artificial intelligence tool for implementing the method 100.
  • the operations, input and output of the AI depth encoder for implementing the method 200 are the same as the operations, input and output of the AI depth encoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • the converter is designed to obtain the converted depth feature tensor from the depth feature tensor obtained in the depth encoder using a predetermined parameter specifying the size of the converted depth feature tensor.
  • the conversion operation in the AI converter for implementing the method 200 is the same as the conversion operation in the AI converter for implementation of the method 100. Therefore, a detailed description thereof will be omitted.
  • the decoder is intended to predict the gradient depth map using the converted depth feature tensor as an input tensor.
  • the AI decoder for implementing the method 200 has no concatenation operations with the tensor obtained on each of the at least two downsampling layers of the image encoder. The remaining operations of the AI decoder for implementing the method 200 are the same as those of the AI decoder for implementing the method 100, so their detailed description will be omitted.
  • the operations, input and output of the iterative AI spatial distribution unit for implementing the method 200 are the same as the operations, input and output of the iterative artificial intelligence spatial distribution unit for implementing the method 100. Therefore, their detailed description will be omitted.
  • Fig. 3 shows a block diagram of an artificial intelligence tool for implementing a method 300 for depth map reconstruction.
  • the artificial intelligence tool comprises an image encoder, a converter, a decoder and an iterative spatial distribution unit.
  • the artificial intelligence tool is contained in an electronic computing device 500 that is configured to implement the method 300 for depth map reconstruction.
  • the artificial intelligence tool for implementing the method 300 is pre-trained in the same way as the artificial intelligence tool for implementing the method 100.
  • the structures of the image encoder, converter, decoder and iterative spatial distribution unit of the artificial intelligence tool for implementing the method 300 correspond to the structures of the image encoder, converter, decoder, and iterative spatial distributor unit of the artificial intelligence tool for implementing the method 100.
  • the operations, input and output of the AI image encoder for implementing the method 300 are the same as the operations, input and output of the AI image encoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • The converter is intended to obtain the converted image feature tensor from the image feature tensor obtained in the image encoder using a predetermined parameter specifying the size of the converted image feature tensor.
  • the conversion operation in the AI converter for implementing the method 300 is the same as the conversion operation in the AI converter for implementing the method 100. Therefore, detailed description thereof will be omitted.
  • the decoder is intended to predict a gradient depth map using the converted image feature tensor and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder.
  • the operations and output of the AI decoder for implementing the method 300 are the same as the operations and output of the AI decoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • In the AI iterative spatial distribution unit for implementing the method 300, there is no step of replacing depth values in the dense depth map with the known depth values from the sparse depth map during iterations of updating the dense depth map.
  • the rest of the operations of the AI iterative spatial distribution unit for implementing the method 300 are the same as those of the AI iterative spatial distribution unit for implementing the method 100, so their detailed description will be omitted.
  • Fig. 4 shows a block diagram of an artificial intelligence tool for implementing a method 400 for depth map reconstruction.
  • the artificial intelligence tool comprises an image encoder, a depth encoder, a converter, a decoder, and an iterative spatial distribution unit.
  • the artificial intelligence tool is contained in an electronic computing device 500 that is configured to implement a method 400 for depth map reconstruction.
  • the artificial intelligence tool for implementing the method 400 is pre-trained in the same way as the artificial intelligence tool for implementing the method 100.
  • the structures of the image encoder, the depth encoder, the converter, the decoder and the iterative spatial distribution unit of the artificial intelligence tool for implementing the method 400 correspond to the structures of the image encoder, the depth encoder, the converter, the decoder, and the iterative spatial distribution unit of the artificial intelligence tool for implementing the method 100.
  • the operations, input and output of the AI image encoder for implementing the method 400 are the same as the operations, input and output of the AI image encoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • the depth encoder is designed to extract depth features from a sparse depth map inputted into it and outputs a depth feature tensor.
  • the sparse depth map comprises depth data, being the depths of objects set by the user in the image inputted into the image encoder.
  • the sparse depth map can be generated by the user on any remote device containing a user input device and inputted into the depth encoder from said remote device or from any available source such as the Internet, external storage, and the like.
  • the sparse depth map may also be generated on the electronic computing device 500 if the electronic computing device 500 further comprises a user input device.
  • the operations and output of the AI depth encoder for implementing the method 400 are the same as the operations and output of the AI depth encoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • the operations, input and output of the AI converter for implementing the method 400 are the same as the operations, input and output of the AI converter for implementing the method 100. Therefore, their detailed description will be omitted.
  • the operations, input and output of the AI decoder for implementing the method 400 are the same as the operations, input and output of the AI decoder for implementing the method 100. Therefore, their detailed description will be omitted.
  • the operations, input and output of the AI iterative spatial distribution unit for implementing the method 400 are the same as the operations, input and output of the AI iterative spatial distribution unit for implementing the method 100. Therefore, their detailed description will be omitted.
  • Fig. 6 is a flowchart illustrating a first embodiment of a method 100 for depth map reconstruction.
  • the method 100 for depth map reconstruction is performed by the electronic computing device 500.
  • In step S101, an image and a sparse depth map are acquired.
  • The sparse depth map comprises depth data being the depths in no more than two planes perpendicular to the image plane in the acquired image.
  • the trained image encoder calculates an image feature tensor by extracting image features from the image acquired in step S101.
  • the trained image encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • the trained depth encoder calculates a depth feature tensor by extracting depth features from the sparse depth map acquired in step S101.
  • the trained depth encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • In step S104, the converter calculates the concatenated feature tensor by concatenating the image feature tensor calculated in step S102 and the depth feature tensor calculated in step S103, and converts the concatenated feature tensor using a predetermined parameter specifying the size of the converted concatenated feature tensor.
  • the trained decoder predicts the gradient depth map using the converted concatenated feature tensor obtained in step S104 and the tensor obtained on each of at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder.
  • the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and the final upsampling layer after the last upsampling unit.
  • the number of upsampling layers of the decoder corresponds to the number of downsampling layers of the image encoder.
  • the tensor obtained in the corresponding downsampling layer of the image encoder starting from the last sampling layer of the image encoder is used in the corresponding upsampling unit of the decoder starting from the first upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • Before inputting into the first upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the converted concatenated feature tensor with the tensor obtained in the last downsampling layer of the image encoder.
  • Before inputting into each subsequent upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the tensor obtained in the previous upsampling unit of the decoder with the tensor obtained in the corresponding downsampling layer of the image encoder.
  • In step S106, the iterative spatial distribution unit predicts a dense depth map using the gradient depth map predicted in step S105, the sparse depth map acquired in step S101, and the tensor obtained in the last upsampling unit of the decoder among the at least one upsampling unit of the decoder, wherein the dense depth map prediction is performed iteratively with a preset number of iterations.
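  • As a summary, the following sketch shows how steps S101-S106 could fit together, reusing the hypothetical ImageEncoder, Converter, Decoder and refine_dense_depth sketches given earlier; the depth encoder is assumed to mirror the image encoder interface, and an image encoder with exactly three downsampling layers is assumed so that the decoder receives the second and third skip tensors.

```python
# Hypothetical end-to-end pipeline for method 100 (steps S101-S106).
def reconstruct_depth_map(image, sparse_depth_map, image_encoder, depth_encoder,
                          converter, decoder, propagate, num_iterations=24):
    image_features, image_skips = image_encoder(image)                  # S102: image feature tensor + skip tensors
    depth_features, _ = depth_encoder(sparse_depth_map)                 # S103: depth feature tensor
    converted = converter(image_features, depth_features)               # S104: converted concatenated feature tensor
    gradient_depth_map, last_unit_output = decoder(                     # S105: all skips except the first one
        converted, image_skips[1], image_skips[2])
    return refine_dense_depth(gradient_depth_map, sparse_depth_map,     # S106: iterative refinement
                              last_unit_output, propagate, num_iterations)
```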
  • the image encoder can be one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted concatenated feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • the decoder may further comprise at least one convolver before each of the at least one upsampling unit.
  • Each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • In this case, step S105 further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • Each of the activation layers in the image encoder, depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Fig. 7 is a flowchart illustrating a second embodiment of a method 200 for depth map reconstruction.
  • the method 200 for depth map reconstruction is performed by the electronic computing device 500.
  • In step S201, a sparse depth map containing depth data in the image is acquired.
  • the trained depth encoder calculates a depth feature tensor by extracting depth features from the sparse depth map acquired in step S201.
  • the trained depth encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • In step S204, the converter converts the depth feature tensor calculated in step S203 using a predetermined parameter specifying the size of the converted depth feature tensor.
  • the trained decoder predicts the gradient depth map using the converted depth feature tensor obtained in step S204 as an input tensor.
  • the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and the final upsampling layer after the last upsampling unit.
  • In step S206, the iterative spatial distribution unit predicts a dense depth map using the gradient depth map predicted in step S205, the sparse depth map acquired in step S201, and the tensor obtained in the last upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the predetermined parameter specifying the size of the converted depth feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • the decoder may further comprise at least one convolver before each of the at least one upsampling unit.
  • Each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • In this case, step S205 further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • Each of the activation layers in the depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Fig. 8 is a flowchart illustrating a third embodiment of a method 300 for depth map reconstruction.
  • the method 300 for depth map reconstruction is performed by the electronic computing device 500.
  • In step S301, an image is acquired.
  • the trained image encoder calculates an image feature tensor by extracting image features from the image acquired in step S301.
  • the trained image encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • In step S304, the converter converts the image feature tensor calculated in step S302 using a predetermined parameter specifying the size of the converted image feature tensor.
  • The trained decoder predicts the gradient depth map using the converted image feature tensor obtained in step S304 and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder.
  • the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and the final upsampling layer after the last upsampling unit.
  • the number of upsampling layers of the decoder corresponds to the number of downsampling layers of the image encoder.
  • the tensor obtained in the corresponding downsampling layer of the image encoder starting from the last sampling layer of the image encoder is used in the corresponding upsampling unit of the decoder starting from the first upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • Before inputting into the first upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the converted image feature tensor with the tensor obtained in the last downsampling layer of the image encoder.
  • Before inputting into each subsequent upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the tensor obtained in the previous upsampling unit of the decoder with the tensor obtained in the corresponding downsampling layer of the image encoder.
  • In step S306, the iterative spatial distribution unit predicts a dense depth map using the gradient depth map predicted in step S305 and the tensor obtained in the last upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the image encoder can be one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted image feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • the decoder may further comprise at least one convolver before each of the at least one upsampling unit.
  • Each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • In this case, step S305 further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • Each of the activation layers in the image encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Fig. 9 is a flowchart illustrating a fourth embodiment of a method 400 for depth map reconstruction.
  • the method 400 for depth map reconstruction is performed by an electronic computing device 500.
  • In step S401, an image and a sparse depth map are acquired.
  • the sparse depth map comprises depth data, being the depths of objects set by the user in the captured image.
  • the trained image encoder calculates an image feature tensor by extracting image features from the image acquired in step S401.
  • the trained image encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • the trained depth encoder calculates a depth feature tensor by extracting depth features from the sparse depth map acquired in step S401.
  • the trained depth encoder comprises at least one convolutional layer, at least one activation layer, and at least two downsampling layers.
  • In step S404, the converter calculates the concatenated feature tensor by concatenating the image feature tensor calculated in step S402 and the depth feature tensor calculated in step S403, and converts the concatenated feature tensor using a predetermined parameter specifying the size of the converted concatenated feature tensor.
  • the trained decoder predicts a gradient depth map using the converted concatenated feature tensor obtained in step S404 and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder.
  • the decoder comprises at least one upsampling unit comprising layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, an upsampling layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel and a second activation layer, and the final upsampling layer after the last upsampling unit.
  • the number of upsampling layers of the decoder corresponds to the number of downsampling layers of the image encoder.
  • the tensor obtained in the corresponding downsampling layer of the image encoder starting from the last sampling layer of the image encoder is used in the corresponding upsampling unit of the decoder starting from the first upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • Before inputting into the first upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the converted concatenated feature tensor with the tensor obtained in the last downsampling layer of the image encoder.
  • Before inputting into each subsequent upsampling unit of the decoder among the at least one upsampling unit of the decoder, the input tensor is obtained by concatenating the tensor obtained in the previous upsampling unit of the decoder with the tensor obtained in the corresponding downsampling layer of the image encoder.
  • In step S406, the iterative spatial distribution unit predicts a dense depth map using the gradient depth map predicted in step S405, the sparse depth map acquired in step S401, and the tensor obtained in the last upsampling unit of the decoder among the at least one upsampling unit of the decoder.
  • the dense depth map prediction is performed iteratively with a preset number of iterations.
  • the image encoder can be one of the standard pre-trained classification networks MobileNet v1, MobileNet v2, InceptionNet, ResNet, R-CNN.
  • the predetermined parameter specifying the size of the converted concatenated feature tensor is a multiple of 2 and preferably selected in the range of 256-1024.
  • the decoder may further comprise at least one convolver before each of the at least one upsampling unit.
  • Each convolver of the at least one convolver comprises layers in the following order: a first convolutional layer with a 1x1 kernel, a first activation layer, a per-channel 2D convolutional layer, a second convolutional layer with a 1x1 kernel, and a second activation layer.
  • In this case, step S405 further comprises the steps of: slicing the input tensor into a set of n+1 sub-tensors, where n is the number of convolvers; performing convolution of one sub-tensor by the first convolver of the at least one convolver; before each subsequent convolver of the at least one convolver, slicing the tensor obtained in the previous convolver of the at least one convolver into a set of k+1 sub-tensors, where k is the number of remaining convolvers of the at least one convolver, and obtaining a concatenated tensor by concatenating different n-k sub-tensors, taken one from each of the n-k sets of sub-tensors obtained by slicing the tensors before each of the n-k previous convolvers, to which no convolution has been applied in the convolvers, with the sub-tensor obtained by slicing the tensor before said subsequent convolver; performing convolution of the obtained concatenated tensor by said subsequent convolver; concatenating the tensor obtained in the last convolver with the remaining sub-tensors to which no convolution has been applied in the convolvers; and processing the resulting concatenated tensor by the upsampling unit.
  • Each of the activation layers in the image encoder, depth encoder and decoder is one of the ReLU, leaky ReLU, ReLU6, ELU layers.
  • Fig. 10 is a block diagram illustrating an electronic computing device 500 configured to perform any of methods 100, 200, 300, 400 for depth map reconstruction.
  • the electronic computing device 500 comprises at least one processor 501 and a memory 502.
  • the memory 502 stores the numerical parameters of a trained image encoder, a trained depth encoder, a converter, a trained decoder, an iterative spatial distribution unit, and instructions that, when executed by the at least one processor 501, cause the at least one processor 501 to perform any of the depth map reconstruction methods 100, 200, 300, 400.
  • the electronic computing device may further comprise at least one of a camera, a depth sensor, and a user input device.
  • the camera is designed at least for capturing an image inputted into the image encoder.
  • the depth sensor is designed to at least acquire a sparse depth map inputted into the depth encoder.
  • the depth sensor can be any depth sensor, such as a single-beam or double-beam depth sensor.
  • the user input device is designed at least for user input of depth data by indicating the depths of objects in the image when generating a sparse depth map.
  • the methods disclosed herein may be implemented by means of a computer-readable medium that stores the numerical parameters of the trained artificial intelligence tools and computer-executable instructions that, when executed by a computer processor, cause the computer to perform the methods of the invention.
  • the trained artificial intelligence tools and instructions for implementing the present methods can be downloaded to an electronic computing device via a network or from a medium.
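The layer orders listed above for the decoder's upsampling units and convolvers translate directly into small network blocks. The following is a minimal sketch only, assuming PyTorch; the ReLU6 activation is one of the options named in the list, while the nearest-neighbour x2 upsampling mode, the 3x3 per-channel kernel, the class names and all channel widths are illustrative assumptions rather than values fixed by the description.

```python
import torch
import torch.nn as nn

class UpsamplingUnit(nn.Module):
    """Decoder upsampling unit: 1x1 conv -> activation -> upsampling ->
    per-channel (depthwise) 2D conv -> 1x1 conv -> activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),       # first 1x1 convolution
            nn.ReLU6(inplace=True),                        # first activation layer
            nn.Upsample(scale_factor=2, mode='nearest'),   # upsampling layer (mode assumed)
            nn.Conv2d(out_ch, out_ch, kernel_size=3,
                      padding=1, groups=out_ch),           # per-channel 2D convolution
            nn.Conv2d(out_ch, out_ch, kernel_size=1),      # second 1x1 convolution
            nn.ReLU6(inplace=True),                        # second activation layer
        )

    def forward(self, x):
        return self.block(x)

class Convolver(nn.Module):
    """Convolver that may precede an upsampling unit: 1x1 conv -> activation ->
    per-channel 2D conv -> 1x1 conv -> activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.ReLU6(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, groups=out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

if __name__ == "__main__":
    # Skip connection as described above: the tensor from the matching encoder
    # downsampling layer is concatenated with the previous decoder output.
    prev_decoder_out = torch.randn(1, 64, 32, 32)   # from the previous upsampling unit
    encoder_skip = torch.randn(1, 32, 32, 32)       # from the matching encoder layer
    unit = UpsamplingUnit(in_ch=64 + 32, out_ch=64)
    out = unit(torch.cat([prev_decoder_out, encoder_skip], dim=1))
    print(out.shape)  # torch.Size([1, 64, 64, 64])
```

The per-channel (depthwise) convolutions keep each unit lightweight, which is consistent with the stated goal of running the reconstruction on devices with low computing power.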
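The iterative prediction of the dense depth map in step S406 can be pictured as a learned spatial propagation that repeatedly mixes each pixel's depth with that of its neighbours and re-anchors the result at pixels where sparse measurements exist. The sketch below illustrates that general idea only, assuming PyTorch and a generic CSPN-style neighbourhood update as a stand-in; the affinity head, kernel size, number of iterations and channel widths are assumptions and do not reproduce the claimed iterative spatial distribution unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeSpatialPropagation(nn.Module):
    def __init__(self, feat_ch, num_iters=6, kernel_size=3):
        super().__init__()
        self.num_iters = num_iters
        self.kernel_size = kernel_size
        # Per-neighbour affinity weights predicted from the tensor produced
        # by the last upsampling unit of the decoder.
        self.affinity = nn.Conv2d(feat_ch, kernel_size * kernel_size, kernel_size=1)

    def forward(self, init_depth, sparse_depth, decoder_feat):
        k = self.kernel_size
        pad = k // 2
        # Normalised per-pixel affinities over the k*k neighbourhood.
        w = torch.softmax(self.affinity(decoder_feat), dim=1)
        valid = (sparse_depth > 0).float()           # mask of measured pixels
        depth = init_depth
        for _ in range(self.num_iters):              # preset number of iterations
            # Gather the k*k neighbourhood of every pixel and mix it with the affinities.
            patches = F.unfold(depth, k, padding=pad)
            patches = patches.view(depth.shape[0], k * k, *depth.shape[2:])
            depth = (w * patches).sum(dim=1, keepdim=True)
            # Re-anchor the prediction at pixels where sparse depth is known.
            depth = valid * sparse_depth + (1.0 - valid) * depth
        return depth

if __name__ == "__main__":
    b, h, w_ = 1, 64, 64
    init = torch.rand(b, 1, h, w_)                  # e.g. derived from the gradient depth map
    sparse = torch.where(torch.rand(b, 1, h, w_) > 0.95,
                         torch.rand(b, 1, h, w_), torch.zeros(b, 1, h, w_))
    feats = torch.randn(b, 32, h, w_)               # last decoder upsampling unit output
    module = IterativeSpatialPropagation(feat_ch=32)
    print(module(init, sparse, feats).shape)        # torch.Size([1, 1, 64, 64])
```

Re-anchoring at the measured pixels after every iteration keeps the sparse depth data exact while the learned affinities spread that information over the rest of the image.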

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates, in general, to the fields of computer vision and depth map completion using artificial intelligence and machine learning for depth map reconstruction, and in particular to methods for depth map reconstruction and an electronic computing device for implementing them. The technical result is to enable the depth map reconstruction methods to be implemented on devices with low computing power so as to acquire a high-quality depth map from sparse depth data that do not cover the entire image, in real time or substantially in real time with a very small delay. The technical result is achieved by a depth map reconstruction method comprising: acquiring an image and a sparse depth map containing depth data representing the depths in at most two planes perpendicular to an image plane in the acquired image; calculating an image feature tensor by a trained image encoder; calculating a depth feature tensor by a trained depth encoder; calculating a concatenated feature tensor by a converter by concatenating the image feature tensor and the depth feature tensor and converting the concatenated feature tensor by the converter using a predetermined parameter specifying the size of the converted concatenated feature tensor; predicting a gradient depth map by a trained decoder using the converted concatenated feature tensor and the tensor obtained on each of the at least two downsampling layers of the image encoder except for the first downsampling layer of the image encoder; and predicting a dense depth map by an iterative spatial distribution unit using the gradient depth map, the sparse depth map and the tensor obtained in the last upsampling unit of the decoder, the dense depth map prediction being performed iteratively with a preset number of iterations.
PCT/KR2021/000997 2020-08-25 2021-01-26 Procédés de reconstruction de carte de profondeur et dispositif informatique électronique permettant de les implémenter WO2022045495A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2020128195A RU2745010C1 (ru) 2020-08-25 2020-08-25 Способы реконструкции карты глубины и электронное вычислительное устройство для их реализации
RU2020128195 2020-08-25

Publications (1)

Publication Number Publication Date
WO2022045495A1 true WO2022045495A1 (fr) 2022-03-03

Family

ID=74874429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/000997 WO2022045495A1 (fr) 2020-08-25 2021-01-26 Procédés de reconstruction de carte de profondeur et dispositif informatique électronique permettant de les implémenter

Country Status (2)

Country Link
RU (1) RU2745010C1 (fr)
WO (1) WO2022045495A1 (fr)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030022304A (ko) * 2001-05-23 2003-03-15 코닌클리케 필립스 일렉트로닉스 엔.브이. 깊이 지도 계산
KR101484487B1 (ko) * 2007-10-11 2015-01-28 코닌클리케 필립스 엔.브이. 깊이-맵을 프로세싱하는 방법 및 디바이스
US9294662B2 (en) * 2013-10-16 2016-03-22 Broadcom Corporation Depth map generation and post-capture focusing
CN104660900B (zh) * 2013-10-30 2018-03-02 株式会社摩如富 图像处理装置及图像处理方法
US9514537B2 (en) * 2013-12-27 2016-12-06 Xerox Corporation System and method for adaptive depth map reconstruction

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200258248A1 (en) * 2018-08-31 2020-08-13 Snap Inc. Active image depth prediction
CN110097589A (zh) * 2019-04-29 2019-08-06 广东工业大学 一种应用于稀疏地图稠密化的深度补全方法

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DIANA WOFK; FANGCHANG MA; TIEN-JU YANG; SERTAC KARAMAN; VIVIENNE SZE: "FastDepth: Fast Monocular Depth Estimation on Embedded Systems", ARXIV.ORG, 8 March 2019 (2019-03-08), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081130984 *
JIE TANG; FEI-PENG TIAN; WEI FENG; JIAN LI; PING TAN: "Learning Guided Convolutional Network for Depth Completion", ARXIV.ORG, 4 August 2019 (2019-08-04), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081455409 *
YANCHAO YANG; ALEX WONG; STEFANO SOATTO: "Dense Depth Posterior (DDP) from Single Image and Sparse Range", ARXIV.ORG, 29 January 2019 (2019-01-29), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081009350 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119889A (zh) * 2021-11-12 2022-03-01 杭州师范大学 基于跨模态融合的360度环境深度补全和地图重建方法
CN114677315A (zh) * 2022-04-11 2022-06-28 探维科技(北京)有限公司 基于图像与激光点云的图像融合方法、装置、设备和介质
CN114677315B (zh) * 2022-04-11 2022-11-29 探维科技(北京)有限公司 基于图像与激光点云的图像融合方法、装置、设备和介质
US11954835B2 (en) 2022-04-11 2024-04-09 Tanway Technology (beijing) Co., Ltd. Methods, devices, apparatuses, and media for image fusion utilizing images and LiDAR point clouds
CN116468768A (zh) * 2023-04-20 2023-07-21 南京航空航天大学 基于条件变分自编码器和几何引导的场景深度补全方法
CN116468768B (zh) * 2023-04-20 2023-10-17 南京航空航天大学 基于条件变分自编码器和几何引导的场景深度补全方法

Also Published As

Publication number Publication date
RU2745010C1 (ru) 2021-03-18

Similar Documents

Publication Publication Date Title
WO2022045495A1 (fr) Procédés de reconstruction de carte de profondeur et dispositif informatique électronique permettant de les implémenter
US10970864B2 (en) Method and apparatus for recovering point cloud data
KR102037893B1 (ko) 디지털 외관조사망도 구축 시스템 및 방법
WO2021201422A1 (fr) Procédé et système de segmentation sémantique applicables à l'ar
WO2021107610A1 (fr) Procédé et système de production d'une carte triple pour un matage d'image
WO2020139009A1 (fr) Dispositif d'apprentissage de maladies cérébrovasculaires, dispositif de détection de maladies cérébrovasculaires, procédé d'apprentissage de maladies cérébrovasculaires et procédé de détection de maladies cérébrovasculaires
CN112200057B (zh) 人脸活体检测方法、装置、电子设备及存储介质
WO2020231226A1 (fr) Procédé de réalisation, par un dispositif électronique, d'une opération de convolution au niveau d'une couche donnée dans un réseau neuronal, et dispositif électronique associé
WO2022050532A1 (fr) Procédé et système d'obtention et d'analyse de carte de source sonore haute résolution utilisant un réseau neuronal à intelligence artificielle
CN114972085B (zh) 一种基于对比学习的细粒度噪声估计方法和系统
WO2022005157A1 (fr) Dispositif électronique et procédé de commande de dispositif électronique
WO2019127049A1 (fr) Procédé de mise en correspondance d'images, dispositif et support d'enregistrement
Jin et al. Embedded real-time pedestrian detection system using YOLO optimized by LNN
CN111696044B (zh) 一种大场景动态视觉观测方法及装置
CN112508099A (zh) 一种实时目标检测的方法和装置
EP4374316A1 (fr) Procédé et dispositif électronique de segmentation d'objets dans une scène
US20240161254A1 (en) Information processing apparatus, information processing method, and program
US11379692B2 (en) Learning method, storage medium and image processing device
CN116168393B (zh) 基于点云神经辐射场的语义标注数据自动生成方法、装置
WO2022004970A1 (fr) Appareil et procédé d'entraînement de points clés basés sur un réseau de neurones artificiels
WO2021137415A1 (fr) Procédé et appareil de traitement d'image basé sur l'apprentissage automatique
WO2019107624A1 (fr) Procédé de traduction de séquence à séquence et appareil associé
WO2022098164A1 (fr) Dispositif électronique et son procédé de commande
CN111126454B (zh) 图像处理方法、装置、存储介质及电子设备
WO2024090989A1 (fr) Segmentation multi-vues et inpainting perceptuel à champs de rayonnement neuronal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21861799

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21861799

Country of ref document: EP

Kind code of ref document: A1