WO2019223262A1 - Depth recovery method and apparatus for monocular image, and computer device - Google Patents

Depth recovery method and apparatus for monocular image, and computer device

Info

Publication number
WO2019223262A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
monocular
gradient
feature image
Prior art date
Application number
PCT/CN2018/116276
Other languages
English (en)
French (fr)
Inventor
鲍虎军
章国锋
蒋沁宏
石建萍
Original Assignee
浙江商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江商汤科技开发有限公司
Priority to JP2020520708A priority Critical patent/JP6850399B2/ja
Priority to SG11201912423WA priority patent/SG11201912423WA/en
Publication of WO2019223262A1 publication Critical patent/WO2019223262A1/zh
Priority to US16/724,287 priority patent/US11004221B2/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/50 - Depth or shape recovery
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/90 - Dynamic range modification of images or parts thereof
    • G06T 5/94 - Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10004 - Still image; Photographic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20221 - Image fusion; Image merging

Definitions

  • the present application relates to the field of computer vision, and in particular, to a method and an apparatus for depth restoration of a monocular image, a computer device, a computer-readable storage medium, and a computer program.
  • Depth restoration of a monocular image refers to recovering depth information from the monocular image. It is an important problem in the field of computer vision and has important applications in many fields, such as 3D reconstruction, real-time robot localization, and visual obstacle avoidance.
  • embodiments of the present application provide a method and apparatus for depth restoration of a monocular image, a computer device, a computer-readable storage medium, and a computer program.
  • Depth estimation is performed according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • performing feature extraction on the monocular image to obtain a feature image of the monocular image includes:
  • the monocular image is input to a first neural network for feature extraction to obtain a feature image of the monocular image.
  • the feature extraction performed by the first neural network includes:
  • Feature fusion is performed on the adjusted multi-scale feature information to obtain the feature image.
  • the decoupling the feature image to obtain a scene structure diagram of the feature image includes:
  • the feature image is input to a second neural network for decoupling to obtain a scene structure diagram of the feature image.
  • before the feature image is input into the second neural network, the method further includes:
  • the second neural network is established in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
  • performing the gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image includes:
  • the feature image and the scene structure map are input to a third neural network for gradient sensing processing to obtain a region-enhanced feature image.
  • the gradient sensing processing performed by the third neural network includes:
  • performing a similarity analysis based on the actual gradient information and the predicted gradient information to obtain a mask includes:
  • An actual gradient image with a similarity greater than a preset threshold is used as the mask.
  • performing the residual filtering process on the mask and the feature image includes:
  • Pre-processing the fused image to obtain a pre-processed image includes, in order, convolution calculation, linear rectification calculation, and convolution calculation;
  • the feature image and the pre-processed image are superimposed to obtain a region-enhanced feature image.
  • performing the depth estimation based on the region-enhanced feature image to obtain the depth image of the monocular image includes:
  • a convolution calculation is performed on the region-enhanced feature image to obtain a depth image of the monocular image.
  • a feature extraction module configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image
  • a scene structure estimation module configured to decouple the feature image to obtain a scene structure diagram of the feature image
  • a gradient perception module configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image
  • a depth estimation module is configured to perform a depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • the feature extraction module is configured to input the monocular image into a first neural network for feature extraction to obtain a feature image of the monocular image.
  • the feature extraction performed by the first neural network includes:
  • Feature fusion is performed on the adjusted multi-scale feature information to obtain the feature image.
  • the scene structure estimation module is configured to input the feature image into a second neural network for decoupling to obtain a scene structure diagram of the feature image.
  • the device further includes:
  • a building module is configured to build the second neural network in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
  • the gradient sensing module is configured to input the feature image and the scene structure map to a third neural network for gradient sensing processing to obtain a region-enhanced feature image.
  • the gradient sensing processing performed by the third neural network includes:
  • performing a similarity analysis based on the actual gradient information and the predicted gradient information to obtain a mask includes:
  • An actual gradient image with a similarity greater than a preset threshold is used as the mask.
  • performing the residual filtering process on the mask and the feature image includes:
  • Pre-processing the fused image to obtain a pre-processed image includes, in order, convolution calculation, linear rectification calculation, and convolution calculation;
  • the feature image and the pre-processed image are superimposed to obtain a region-enhanced feature image.
  • the depth estimation module is configured to perform a convolution calculation on the region-enhanced feature image to obtain a depth image of the monocular image.
  • An embodiment of the present application provides a computer device.
  • the computer device includes a memory and a processor.
  • the memory stores computer-executable instructions.
  • when the processor runs the computer-executable instructions stored in the memory, the depth restoration method for a monocular image provided by the embodiments of the present application is implemented.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the method for depth restoration of a monocular image provided by the embodiments of the present application is implemented.
  • An embodiment of the present application provides a computer program including computer instructions, and when the computer instructions are run in a processor of a device, the method for depth restoration of a monocular image provided in the embodiment of the present application is implemented.
  • feature extraction is performed on the monocular image to obtain a feature image of the monocular image; the feature image is decoupled to obtain a scene structure map of the feature image; the feature image and the scene structure map are subjected to gradient perception processing to obtain a region-enhanced feature image; and a depth estimation is performed according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • the above-mentioned method and device for monocular image depth restoration can not only obtain better depth estimation results with a small amount of data, but can also recover more depth details through gradient perception processing.
  • FIG. 1 is a schematic flowchart of a monocular image depth restoration method according to an embodiment of the present application
  • FIG. 2 is a diagram of a neural network architecture according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of feature extraction of a monocular image according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a gradient sensing process according to an embodiment of the present application.
  • FIG. 5 is a first schematic structural composition diagram of a depth recovery device for a monocular image according to an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of the structure and composition of a monocular image depth restoration device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the embodiments of the present application can be applied to electronic devices such as computer systems / servers, which can operate with many other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with electronic devices such as computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
  • Electronic devices such as computer systems / servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types.
  • the computer system / server can be implemented in a distributed cloud computing environment.
  • tasks are performed by remote processing devices linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including a storage device.
  • FIG. 1 is a schematic flowchart of a monocular image depth restoration method according to an embodiment of the present application. As shown in FIG. 1, the monocular image depth restoration method includes the following steps:
  • Step 101 Perform feature extraction on the monocular image to obtain a feature image of the monocular image.
  • the monocular image is also referred to as a two-dimensional image.
  • the information of the monocular image only includes color information of each pixel, such as RGB information, and does not include depth information of each pixel.
  • the embodiment of the present application aims to estimate the corresponding depth image from the monocular image. Assuming that (x, y) are the coordinates of any pixel in the monocular image, the RGB information corresponding to the coordinates (x, y) can be determined from the monocular image, whereas the depth information corresponding to the coordinates (x, y) cannot. To determine the depth information corresponding to the coordinates (x, y), the depth of the monocular image needs to be restored.
  • Feature extraction is first required on the monocular image to obtain a characteristic image of the monocular image.
  • Feature extraction refers to performing depth-perceptual feature extraction on the monocular image to provide basic features for subsequent estimation of the depth image.
  • a convolution operation may be used to implement feature extraction on the monocular image.
  • Step 102 Decoupling the feature image to obtain a scene structure diagram of the feature image.
  • the scene structure map contains the scene structure information of the monocular image, where the scene structure information includes the structure information of each object in the monocular image and the relative positional relationships (such as front-rear relationships) between the objects; this scene structure information essentially reflects the relative depth information of the objects.
  • the feature image includes two types of information, one is scene structure information and the other is depth scale information. It is very difficult to estimate these two types of information at the same time.
  • the feature image is decoupled, and the scene structure information of the feature image is estimated to obtain a scene structure map.
  • the scene structure information of the feature image can be estimated by using a convolution operation.
  • Step 103 Perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image.
  • the embodiment of the present application estimates the enhanced regions of the feature image through gradient perception processing, so as to obtain a region-enhanced feature image.
  • the strong correlation between gradients and geometric details is used to estimate the gradient information of the obtained feature image and of the scene structure map respectively.
  • by comparing the two estimation results, the enhanced regions can be determined, thereby obtaining a region-enhanced feature image.
  • the geometric details are enhanced, which provides a basic guarantee for subsequent high-quality depth images.
  • Step 104 Perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • a depth estimation is performed on the region-enhanced feature image through a convolution operation, so as to obtain a depth image of the monocular image.
  • a convolution calculation is performed on the region-enhanced feature image to obtain a depth image of the monocular image.
  • the technical solution of the embodiment of the present application decouples the depth estimation into two parts, namely 1) the estimation of the scene structure, and 2) the estimation of the depth, which can significantly accelerate the convergence of the neural network and improve the accuracy of the depth estimation.
  • the local details of the depth image are further improved through gradient perception processing to obtain high-quality depth images, thereby providing a high-quality data source for applications that require fine geometric details and accurate 3D reconstruction of object boundaries.
  • an embodiment of the present application further provides a neural network architecture (referred to as DCNet).
  • DCNet is composed of three parts, namely: 1) a feature extraction module, 2) a decoupling module, and 3) a gradient perception module, wherein the decoupling module includes two parts, namely 2.1) a scene structure estimation module and 2.2) a depth estimation module.
  • the DCNet shown in FIG. 2 may be trained by using a Euclidean loss function.
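  • For illustration only, one way such a Euclidean loss could be written is sketched below in PyTorch; the exact normalization used by the embodiment is not stated in this text, so the 0.5 factor and the batch averaging are assumptions.

```python
import torch

def euclidean_loss(pred_depth: torch.Tensor, gt_depth: torch.Tensor) -> torch.Tensor:
    # One common reading of a Euclidean loss: half the sum of squared
    # per-pixel differences, averaged over the batch (an assumption here).
    diff = (pred_depth - gt_depth).flatten(start_dim=1)
    return 0.5 * (diff ** 2).sum(dim=1).mean()
```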
  • feature extraction is performed on the monocular image through the network in part (a) to obtain the feature image of the monocular image; the feature image is decoupled through the network in part (b) to obtain the scene structure map of the feature image; gradient perception processing is performed on the feature image and the scene structure map through the network in part (c) to obtain a region-enhanced feature image; and depth estimation is performed on the region-enhanced feature image through the network in part (d) to obtain the depth image of the monocular image.
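  • As a reading aid, the data flow through parts (a) to (d) can be sketched as follows; every stage is left as a placeholder callable rather than an implementation of the networks described here.

```python
def restore_depth(image, extract_features, estimate_structure, gradient_gate, estimate_depth):
    """Chain parts (a)-(d) described above; all stage arguments are placeholder callables."""
    feature_image = extract_features(image)                 # part (a): feature image I
    structure_map = estimate_structure(feature_image)       # part (b): scene structure map R
    enhanced = gradient_gate(feature_image, structure_map)  # part (c): region-enhanced features
    return estimate_depth(enhanced)                         # part (d): depth image
```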
  • feature extraction of a monocular image may be performed by inputting the monocular image into a first neural network for feature extraction, thereby obtaining a feature image of the monocular image.
  • the following describes how to perform feature extraction through the first neural network with reference to FIGS. 2 and 3.
  • FIG. 3 is a schematic diagram of a feature extraction process of a monocular image according to an embodiment of the present application. As shown in FIG. 3, the process includes the following steps:
  • Step 301 Perform multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image.
  • the first neural network is the network of part (a).
  • a column of convolution layers on the left is used to perform multi-scale feature extraction on the monocular image to obtain the multi-scale feature information of the monocular image.
  • Step 302 Perform residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information.
  • the multi-scale feature information refers to extracted feature information of different scales: some feature information has a larger scale and some has a smaller scale. Because the scales of the multi-scale feature information are inconsistent, small-scale feature information may disappear after fusion; therefore, residual adjustment needs to be performed on the multi-scale feature information through the residual-like adjustment modules (Residual like adjustment; refer to (e) in FIG. 2) in the middle column to obtain the adjusted multi-scale feature information.
  • the purpose of the residual adjustment is to adjust the scale of each piece of feature information in the multi-scale feature information to obtain a better fusion effect.
  • Step 303 Perform feature fusion on the adjusted multi-scale feature information to obtain the feature image.
  • part of the feature information is selected from the adjusted multi-scale feature information, upsampled to half the size of the input image, and input to the fusion module (Concat) on the right for feature fusion to obtain the feature image.
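  • The backbone itself is not specified in this text, so the following PyTorch sketch only illustrates the three operations just described, namely multi-scale extraction, residual-like adjustment of each scale, and Concat fusion at half the input resolution; the number of stages, channel widths, and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractorSketch(nn.Module):
    """Illustrative stand-in for part (a): multi-scale features, a residual-like
    adjustment per scale, then Concat fusion at half the input resolution."""
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        # Left column: convolutions producing progressively smaller feature maps.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch if i == 0 else width, width, 3,
                                    stride=2, padding=1), nn.ReLU(inplace=True))
            for i in range(3)])
        # Middle column: residual-like adjustment applied to each scale.
        self.adjust = nn.ModuleList([
            nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
                          nn.Conv2d(width, width, 3, padding=1))
            for _ in range(3)])

    def forward(self, x):
        half = (x.shape[-2] // 2, x.shape[-1] // 2)      # half of the input image size
        feats, y = [], x
        for stage, adjust in zip(self.stages, self.adjust):
            y = stage(y)
            adjusted = y + adjust(y)                     # residual-like adjustment
            feats.append(F.interpolate(adjusted, size=half, mode='bilinear',
                                       align_corners=False))
        return torch.cat(feats, dim=1)                   # Concat fusion -> feature image I
```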
  • the feature image is input to a second neural network for decoupling to obtain a scene structure diagram of the feature image.
  • the following describes how to estimate the scene structure graph through the second neural network in conjunction with FIG. 2.
  • the second neural network is the network of part (b).
  • the network of part (a) extracts the feature image I from the monocular image, and then inputs the feature image I to the network of part (b).
  • the scene structure graph R is predicted by the network in part (b).
  • the second neural network is established in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
  • the convolution layer may be a 512-channel convolution layer
  • the linear rectification function is implemented by a rectified linear unit (ReLU, Rectified Linear Unit).
  • the relationship between I and R can be expressed by the following formula: R = F1(I).
  • F1 represents the mapping from the feature image I to the scene structure map R; F1 corresponds to the network in part (b) of FIG. 2, and the network in part (b) is used to learn the scene structure map R.
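  • The text states only that the second network contains at least a convolution layer (possibly 512 channels) and a ReLU, and that it learns the mapping R = F1(I); a minimal sketch under that description follows, where the final 1-channel projection is an added assumption needed to produce a single-channel structure map.

```python
import torch.nn as nn

class StructureHeadSketch(nn.Module):
    """Minimal stand-in for part (b), R = F1(I): a 512-channel convolution and a
    ReLU as stated in the text; the 1-channel output convolution is assumed."""
    def __init__(self, in_ch=512):
        super().__init__()
        self.f1 = nn.Sequential(
            nn.Conv2d(in_ch, 512, 3, padding=1),   # 512-channel convolution layer
            nn.ReLU(inplace=True),                 # linear rectification function (ReLU)
            nn.Conv2d(512, 1, 3, padding=1),       # assumed projection to the structure map R
        )

    def forward(self, feature_image):
        return self.f1(feature_image)              # scene structure map R
```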
  • the embodiment of the present application performs gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image.
  • the gradient perception processing may be performed by inputting the feature image and the scene structure map into a third neural network, so as to obtain the region-enhanced feature image.
  • the following describes how to perform gradient sensing processing through a third neural network in conjunction with FIG. 2 and FIG. 4.
  • FIG. 4 is a schematic flowchart of a gradient sensing process according to an embodiment of the present application. As shown in FIG. 4, the process includes the following steps:
  • Step 401 Obtain an actual gradient image of the scene structure map according to the scene structure map.
  • Step 402 Obtain a predicted gradient image corresponding to the feature image according to the feature image.
  • the third neural network is the network of part (c).
  • the actual gradient image is estimated based on the scene structure map R, and the predicted gradient image is estimated based on the feature image I; ideally the two are identical, but since one is the actual gradient image and the other is a predicted gradient image, they may differ.
  • Step 403 Perform a similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask.
  • the similarity between the actual gradient image and the predicted gradient image is calculated (for example, the similarity is calculated by a cosine function); an actual gradient image with a similarity greater than a preset threshold is used as the mask.
  • referring to the network in part (c) of FIG. 2, the similarity between the actual gradient image and the predicted gradient image is calculated, and the actual gradient image with a similarity greater than δ is used as the mask.
  • the mask corresponds to an area of the actual gradient image.
  • the characteristics of this area are conducive to further optimizing the details of the depth image, so that the depth image can be used for high-precision applications such as 3D modeling.
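  • Neither the gradient operator nor the exact form of the mask is specified in this text; the sketch below uses simple finite differences as a stand-in, computes the cosine similarity described above, and returns a binary mask where the similarity exceeds the threshold δ (one possible reading). The predicted gradient image, which the embodiment derives from the feature image, is simply taken here as a given input.

```python
import torch
import torch.nn.functional as F

def finite_difference_gradient(img: torch.Tensor) -> torch.Tensor:
    """Stack horizontal and vertical finite differences as a 2-channel gradient image
    (a stand-in; the gradient operator is not specified in the text)."""
    dx = F.pad(img[..., :, 1:] - img[..., :, :-1], (0, 1, 0, 0))
    dy = F.pad(img[..., 1:, :] - img[..., :-1, :], (0, 0, 0, 1))
    return torch.cat([dx, dy], dim=1)

def gradient_mask(structure_map: torch.Tensor,
                  predicted_gradient: torch.Tensor,
                  delta: float = 0.5) -> torch.Tensor:
    """Steps 401-403 sketch: actual gradient of R, cosine similarity against the
    predicted gradient, then a binary mask where the similarity exceeds delta."""
    actual_gradient = finite_difference_gradient(structure_map)
    similarity = F.cosine_similarity(actual_gradient, predicted_gradient, dim=1, eps=1e-6)
    return (similarity > delta).float().unsqueeze(1)    # mask over the enhanced region
```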
  • Step 404 Perform a residual filtering process on the mask and the feature image to obtain a region-enhanced feature image.
  • the region-enhanced feature image is obtained by: calculating a product of the mask and the feature image to obtain a fused image; and pre-processing the fused image to obtain a pre-processed image, where The preprocessing includes, in order, convolution calculation, linear rectification calculation, and convolution calculation; superimposing the feature image and the preprocessed image to obtain a region-enhanced feature image.
  • the Multiply module is used to calculate the product of the mask and the feature image to obtain a fused image.
  • the fused image is input into the Conv module, the ReLU module, and the Conv module in sequence.
  • the corresponding convolution calculation, linear rectification calculation, and convolution calculation are thereby realized, and the final result is then superimposed on the original feature image through the Sum module to output a region-enhanced feature image.
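  • The residual-like filtering just described (Multiply, Conv, ReLU, Conv, Sum) might be sketched as follows; the kernel size and channel count are assumptions.

```python
import torch.nn as nn

class ResidualLikeFilterSketch(nn.Module):
    """Sketch of part (f): multiply the features by the mask, pass the fused image
    through Conv -> ReLU -> Conv, then add the result back onto the features."""
    def __init__(self, channels=512):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feature_image, mask):
        fused = feature_image * mask                          # Multiply: fused image
        filtered = self.conv2(self.relu(self.conv1(fused)))   # Conv -> ReLU -> Conv
        return feature_image + filtered                       # Sum: region-enhanced feature image
```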
  • the region-enhanced feature image is subjected to convolution calculation through the network in part (d) of FIG. 2 to obtain a depth image of the monocular image.
  • the convolution layer performing the convolution calculation may be a 64-channel convolution layer.
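  • A minimal sketch of this final step, assuming 512 input channels and adding an intermediate ReLU and a 1-channel output convolution that the text does not mention:

```python
import torch.nn as nn

# Part (d) sketch: the 64-channel convolution mentioned in the text, followed by
# an assumed ReLU and 1-channel convolution that produce the depth image.
depth_head_sketch = nn.Sequential(
    nn.Conv2d(512, 64, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, 3, padding=1),
)
```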
  • the scene structure estimation module (the network in part (b) in FIG. 2) and the depth estimation module (the network in (d) in FIG. 2) are used to estimate the depth image in a divide-and-conquer strategy.
  • the estimation of the depth image is decomposed into an estimation of the scene structure and an estimation of the depth, so as to accelerate the convergence speed of DCNet and obtain more accurate results.
  • a gradient perception module is established between the scene structure estimation module and the depth estimation module.
  • the depth estimation module can obtain a region-enhanced feature image, so that the deeper layers of the neural network (such as the network in part (d) of FIG. 2) can focus more on the enhanced regions and recover depth images with better boundaries and details.
  • the recovered high-precision depth images provide a high-quality data source for applications such as 3D reconstruction.
  • FIG. 5 is a first schematic diagram of the structure and composition of a depth recovery device for a monocular image according to an embodiment of the present application. As shown in FIG. 5, the depth recovery device for a monocular image includes:
  • a feature extraction module 501 configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image
  • the scene structure estimation module 502 is configured to decouple the feature image to obtain a scene structure diagram of the feature image
  • the gradient perception module 503 is configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image
  • the depth estimation module 504 is configured to perform a depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • the implementation functions of the units in the depth restoration device of the monocular image shown in FIG. 5 can be understood by referring to the related description of the depth restoration method of the monocular image.
  • the functions of the units in the depth restoration device of the monocular image shown in FIG. 5 may be implemented by a program running on a processor, or may be implemented by a specific logic circuit.
  • FIG. 6 is a second schematic diagram of the structure and composition of a depth restoration device for a monocular image according to an embodiment of the present application. As shown in FIG. 6, the depth restoration device for a monocular image includes:
  • a feature extraction module 501 configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image
  • the scene structure estimation module 502 is configured to decouple the feature image to obtain a scene structure diagram of the feature image
  • the gradient perception module 503 is configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image
  • the depth estimation module 504 is configured to perform a depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • the feature extraction module 501 is configured to input the monocular image into a first neural network for feature extraction to obtain a feature image of the monocular image.
  • the feature extraction performed by the first neural network includes:
  • Feature fusion is performed on the adjusted multi-scale feature information to obtain the feature image.
  • the scene structure estimation module 502 is configured to input the feature image into a second neural network for decoupling to obtain a scene structure diagram of the feature image.
  • the device further includes:
  • the establishing module 505 is configured to establish the second neural network in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
  • the gradient sensing module 503 is configured to input the feature image and the scene structure map to a third neural network for gradient sensing processing to obtain a region-enhanced feature image.
  • the gradient sensing processing performed by the third neural network includes:
  • performing a similarity analysis based on the actual gradient information and the predicted gradient information to obtain a mask includes:
  • An actual gradient image with a similarity greater than a preset threshold is used as the mask.
  • performing the residual filtering process on the mask and the feature image includes:
  • Pre-processing the fused image to obtain a pre-processed image includes, in order, convolution calculation, linear rectification calculation, and convolution calculation;
  • the feature image and the pre-processed image are superimposed to obtain a region-enhanced feature image.
  • the depth estimation module 504 is configured to perform a convolution calculation on the region-enhanced feature image to obtain a depth image of the monocular image.
  • the implementation functions of the units in the depth restoration device of the monocular image shown in FIG. 6 can be understood by referring to the related description of the depth restoration method of the monocular image.
  • the functions of the units in the depth restoration device for the monocular image shown in FIG. 6 may be implemented by a program running on a processor, or may be implemented by a specific logic circuit.
  • if the depth restoration device for a monocular image is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk, or an optical disc.
  • an embodiment of the present application further provides a computer storage medium in which computer-executable instructions are stored.
  • the computer-executable instructions are executed by a processor, the method for depth restoration of the monocular image in the embodiment of the present application is implemented.
  • FIG. 7 is a schematic structural composition diagram of a computer device according to an embodiment of the present application.
  • the computer device 100 may include one or more processors 1002 (only one is shown in the figure; the processor 1002 may include, but is not limited to, a processing device such as a microcontroller unit (MCU, Micro Controller Unit) or a programmable logic device (FPGA, Field Programmable Gate Array)), a memory 1004 for storing data, and a transmission device 1006 for a communication function.
  • FIG. 7 is only for illustration, and it does not limit the structure of the electronic device.
  • the computer device 100 may also include more or fewer components than those shown in FIG. 7, or have a different configuration from that shown in FIG.
  • the memory 1004 may be used to store software programs and modules of application software, such as program instructions / modules corresponding to the methods in the embodiments of the present application.
  • the processor 1002 executes various functional applications and data processing by running the software programs and modules stored in the memory 1004, thereby implementing the method described above.
  • the memory 1004 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory.
  • the memory 1004 may further include memory remotely set with respect to the processor 1002, and these remote memories may be connected to the computer device 100 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 1006 is used for receiving or transmitting data via a network.
  • a specific example of the network described above may include a wireless network provided by a communication provider of the computer device 100.
  • the transmission device 1006 includes a network adapter (NIC, Network Interface Controller), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 1006 may be a radio frequency (RF, Radio Frequency) module, which is used to communicate with the Internet in a wireless manner.
  • the disclosed method and smart device may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division; in actual implementation, there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connections between the displayed or discussed components may be implemented through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, which may be located in one place or distributed across multiple network units; Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into a second processing unit, or each unit may separately serve as one unit, or two or more units may be integrated into one unit;
  • the above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • feature extraction is performed on the monocular image to obtain a feature image of the monocular image; the feature image is decoupled to obtain a scene structure map of the feature image; the feature image and the scene structure map are subjected to gradient perception processing to obtain a region-enhanced feature image; and a depth estimation is performed according to the region-enhanced feature image to obtain a depth image of the monocular image.
  • the above-mentioned method and device for monocular image depth restoration can not only obtain better depth estimation results with a small amount of data, but can also recover more depth details through gradient perception processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A depth recovery method and apparatus for a monocular image, a computer device, a computer-readable storage medium, and a computer program. The method includes: performing feature extraction on the monocular image to obtain a feature image of the monocular image (101); decoupling the feature image to obtain a scene structure map of the feature image (102); performing gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image (103); and performing depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image (104).

Description

Depth recovery method and apparatus for monocular image, and computer device
Cross-Reference to Related Applications
This application is filed on the basis of, and claims priority to, Chinese Patent Application No. 201810502947.0, filed on May 23, 2018 and entitled "Depth recovery method and apparatus for monocular image, and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer vision, and in particular to a depth recovery method and apparatus for a monocular image, a computer device, a computer-readable storage medium, and a computer program.
Background
Depth recovery of a monocular image refers to recovering depth information from the monocular image. It is an important problem in the field of computer vision and has important applications in many fields, such as 3D reconstruction, real-time robot localization, and visual obstacle avoidance.
However, owing to the uncertainty of the depth scale, depth recovery of a monocular image is a complex problem with multiple solutions. Many existing depth estimation schemes simply couple scene structure estimation and depth scale estimation and solve them simultaneously, which makes the solving process relatively difficult, usually requires more data and training time, and yields low accuracy.
Summary
To solve the above technical problem, embodiments of the present application provide a depth recovery method and apparatus for a monocular image, a computer device, a computer-readable storage medium, and a computer program.
The depth recovery method for a monocular image provided by the embodiments of the present application includes:
performing feature extraction on the monocular image to obtain a feature image of the monocular image;
decoupling the feature image to obtain a scene structure map of the feature image;
performing gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
performing depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
In an embodiment of the present application, performing feature extraction on the monocular image to obtain a feature image of the monocular image includes:
inputting the monocular image into a first neural network for feature extraction to obtain the feature image of the monocular image.
In an embodiment of the present application, the feature extraction performed by the first neural network includes:
performing multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image;
performing residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information;
performing feature fusion on the adjusted multi-scale feature information to obtain the feature image.
In an embodiment of the present application, decoupling the feature image to obtain a scene structure map of the feature image includes:
inputting the feature image into a second neural network for decoupling to obtain the scene structure map of the feature image.
In an embodiment of the present application, before the feature image is input into the second neural network, the method further includes:
establishing the second neural network in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
In an embodiment of the present application, performing gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image includes:
inputting the feature image and the scene structure map into a third neural network for gradient perception processing to obtain the region-enhanced feature image.
In an embodiment of the present application, the gradient perception processing performed by the third neural network includes:
obtaining an actual gradient image of the scene structure map according to the scene structure map;
obtaining a predicted gradient image corresponding to the feature image according to the feature image;
performing similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask;
performing residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
In an embodiment of the present application, performing similarity analysis according to the actual gradient information and the predicted gradient information to obtain a mask includes:
calculating the similarity between the actual gradient image and the predicted gradient image;
using the actual gradient image with a similarity greater than a preset threshold as the mask.
In an embodiment of the present application, performing residual filtering processing on the mask and the feature image includes:
calculating the product of the mask and the feature image to obtain a fused image;
pre-processing the fused image to obtain a pre-processed image, wherein the pre-processing includes, in order: convolution calculation, linear rectification calculation, and convolution calculation;
superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image.
In an embodiment of the present application, performing depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image includes:
performing convolution calculation on the region-enhanced feature image to obtain the depth image of the monocular image.
The depth recovery apparatus for a monocular image provided by the embodiments of the present application includes:
a feature extraction module configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image;
a scene structure estimation module configured to decouple the feature image to obtain a scene structure map of the feature image;
a gradient perception module configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
a depth estimation module configured to perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
In an embodiment of the present application, the feature extraction module is configured to input the monocular image into a first neural network for feature extraction to obtain the feature image of the monocular image.
In an embodiment of the present application, the feature extraction performed by the first neural network includes:
performing multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image;
performing residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information;
performing feature fusion on the adjusted multi-scale feature information to obtain the feature image.
In an embodiment of the present application, the scene structure estimation module is configured to input the feature image into a second neural network for decoupling to obtain the scene structure map of the feature image.
In an embodiment of the present application, the apparatus further includes:
an establishing module configured to establish the second neural network in advance, wherein the second neural network includes at least a convolution layer and a linear rectification function.
In an embodiment of the present application, the gradient perception module is configured to input the feature image and the scene structure map into a third neural network for gradient perception processing to obtain the region-enhanced feature image.
In an embodiment of the present application, the gradient perception processing performed by the third neural network includes:
obtaining an actual gradient image of the scene structure map according to the scene structure map;
obtaining a predicted gradient image corresponding to the feature image according to the feature image;
performing similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask;
performing residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
In an embodiment of the present application, performing similarity analysis according to the actual gradient information and the predicted gradient information to obtain a mask includes:
calculating the similarity between the actual gradient image and the predicted gradient image;
using the actual gradient image with a similarity greater than a preset threshold as the mask.
In an embodiment of the present application, performing residual filtering processing on the mask and the feature image includes:
calculating the product of the mask and the feature image to obtain a fused image;
pre-processing the fused image to obtain a pre-processed image, wherein the pre-processing includes, in order: convolution calculation, linear rectification calculation, and convolution calculation;
superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image.
In an embodiment of the present application, the depth estimation module is configured to perform convolution calculation on the region-enhanced feature image to obtain the depth image of the monocular image.
An embodiment of the present application provides a computer device. The computer device includes a memory and a processor, computer-executable instructions are stored on the memory, and when the processor runs the computer-executable instructions on the memory, the depth recovery method for a monocular image provided by the embodiments of the present application is implemented.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the depth recovery method for a monocular image provided by the embodiments of the present application is implemented.
An embodiment of the present application provides a computer program including computer instructions; when the computer instructions are run in a processor of a device, the depth recovery method for a monocular image provided by the embodiments of the present application is implemented.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
In the technical solutions of the embodiments of the present application, feature extraction is performed on the monocular image to obtain a feature image of the monocular image; the feature image is decoupled to obtain a scene structure map of the feature image; gradient perception processing is performed on the feature image and the scene structure map to obtain a region-enhanced feature image; and depth estimation is performed according to the region-enhanced feature image to obtain a depth image of the monocular image. The above depth recovery method and apparatus for a monocular image can not only obtain better depth estimation results with a small amount of data, but can also recover more depth details through gradient perception processing.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the present application.
FIG. 1 is a schematic flowchart of a depth recovery method for a monocular image according to an embodiment of the present application;
FIG. 2 is a neural network architecture diagram according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of feature extraction of a monocular image according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of gradient perception processing according to an embodiment of the present application;
FIG. 5 is a first schematic structural diagram of a depth recovery apparatus for a monocular image according to an embodiment of the present application;
FIG. 6 is a second schematic structural diagram of a depth recovery apparatus for a monocular image according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as computer systems/servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with electronic devices such as computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as computer systems/servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a schematic flowchart of a depth recovery method for a monocular image according to an embodiment of the present application. As shown in FIG. 1, the depth recovery method for a monocular image includes the following steps:
Step 101: Perform feature extraction on a monocular image to obtain a feature image of the monocular image.
Here, a monocular image is also referred to as a two-dimensional image. The information of a monocular image contains only the color information of each pixel, such as RGB information, and does not contain the depth information of each pixel. The embodiments of the present application aim to estimate the corresponding depth image from the monocular image. Assuming that (x, y) are the coordinates of any pixel in the monocular image, the RGB information corresponding to the coordinates (x, y) can be determined from the monocular image, whereas the depth information corresponding to the coordinates (x, y) cannot. To determine the depth corresponding to the coordinates (x, y), the depth of the monocular image needs to be recovered.
To achieve depth recovery of the monocular image, feature extraction first needs to be performed on the monocular image to obtain the feature image of the monocular image. Feature extraction here refers to depth-perceptual feature extraction performed on the monocular image, so as to provide basic features for the subsequent estimation of the depth image.
In the embodiments of the present application, a convolution operation may be used to implement the feature extraction on the monocular image.
Step 102: Decouple the feature image to obtain a scene structure map of the feature image.
Here, the scene structure map contains the scene structure information of the monocular image, where the scene structure information includes the structure information of each object in the monocular image and the relative positional relationships (for example, front-rear relationships) between objects; this scene structure information essentially reflects the relative depth information of the objects.
In the embodiments of the present application, the feature image contains two kinds of information: scene structure information and depth scale information. Estimating both kinds of information at the same time is very difficult. Therefore, the embodiments of the present application decouple the feature image and first estimate the scene structure information of the feature image to obtain the scene structure map.
In the embodiments of the present application, a convolution operation may be used to estimate the scene structure information of the feature image.
Step 103: Perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image.
Considering that the geometric details of the depth image are critical for applications such as 3D reconstruction, the embodiments of the present application estimate the enhanced regions of the feature image through gradient perception processing, thereby obtaining a region-enhanced feature image.
In the embodiments of the present application, the strong correlation between gradients and geometric details is used to estimate the gradient information of the obtained feature image and of the scene structure map respectively; by comparing the two estimation results, the enhanced regions can be determined, thereby obtaining the region-enhanced feature image. In the region-enhanced feature image, the geometric details are enhanced, which provides a basic guarantee for subsequently obtaining a high-quality depth image.
Step 104: Perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
In the embodiments of the present application, depth estimation is performed on the region-enhanced feature image through a convolution operation, thereby obtaining the depth image of the monocular image. Specifically, convolution calculation is performed on the region-enhanced feature image to obtain the depth image of the monocular image.
The technical solution of the embodiments of the present application decouples depth estimation into two parts, namely 1) estimation of the scene structure and 2) estimation of the depth, which can significantly accelerate the convergence of the neural network and improve the accuracy of depth estimation. In addition, the local details of the depth image are further improved through gradient perception processing, so that a high-quality depth image can be obtained, thereby providing a high-quality data source for applications such as 3D reconstruction that require fine geometric details and accurate object boundaries.
To implement the above depth recovery method for a monocular image, an embodiment of the present application further provides a neural network architecture (referred to as DCNet). As shown in FIG. 2, DCNet is composed of three parts, namely: 1) a feature extraction module, 2) a decoupling module, and 3) a gradient perception module, where the decoupling module includes two parts, namely 2.1) a scene structure estimation module and 2.2) a depth estimation module. Referring to FIG. 2, the network in part (a) corresponds to the feature extraction module, the network in part (b) corresponds to the scene structure estimation module, the network in part (d) corresponds to the depth estimation module, and the network in part (c) corresponds to the gradient perception module. In the embodiments of the present application, the DCNet shown in FIG. 2 may be trained using a Euclidean loss function. In the DCNet shown in FIG. 2, feature extraction is performed on the monocular image through the network in part (a) to obtain the feature image of the monocular image; the feature image is decoupled through the network in part (b) to obtain the scene structure map of the feature image; gradient perception processing is performed on the feature image and the scene structure map through the network in part (c) to obtain the region-enhanced feature image; and depth estimation is performed on the region-enhanced feature image through the network in part (d) to obtain the depth image of the monocular image.
In the solution shown in FIG. 1, feature extraction of the monocular image may be performed by inputting the monocular image into a first neural network for feature extraction, thereby obtaining the feature image of the monocular image. How feature extraction is performed by the first neural network is described below with reference to FIG. 2 and FIG. 3.
FIG. 3 is a schematic flowchart of feature extraction of a monocular image according to an embodiment of the present application. As shown in FIG. 3, the flow includes the following steps:
Step 301: Perform multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image.
Based on the DCNet shown in FIG. 2, the first neural network is the network in part (a). Referring to the network in part (a) of FIG. 2, the column of convolution layers on the left is used to perform multi-scale feature extraction on the monocular image to obtain the multi-scale feature information of the monocular image.
Step 302: Perform residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information.
Here, multi-scale feature information means that the extracted feature information has different scales: some feature information has a larger scale and some has a smaller scale. Because the scales of the multi-scale feature information are inconsistent, small-scale feature information may vanish after fusion. Therefore, residual adjustment needs to be performed on the multi-scale feature information through the residual-like adjustment modules (Residual like adjustment, abbreviated as adjust; refer to (e) in FIG. 2) in the middle column, to obtain the adjusted multi-scale feature information. Here, the purpose of the residual adjustment is to adjust the scale of each piece of feature information in the multi-scale feature information so as to obtain a better fusion effect.
Step 303: Perform feature fusion on the adjusted multi-scale feature information to obtain the feature image.
In an implementation, part of the feature information is selected from the adjusted multi-scale feature information, upsampled to half the size of the input image, and input to the fusion module (Concat) on the right for feature fusion to obtain the feature image.
After the feature image is obtained through the above steps, the feature image is input into a second neural network for decoupling to obtain the scene structure map of the feature image. How the scene structure map is estimated by the second neural network is described below with reference to FIG. 2.
Based on the DCNet shown in FIG. 2, the second neural network is the network in part (b). After the network in part (a) extracts the feature image I from the monocular image, the feature image I is input to the network in part (b), and the scene structure map R is predicted by the network in part (b). In the embodiments of the present application, the second neural network is established in advance, where the second neural network includes at least a convolution layer and a linear rectification function. Here, the convolution layer may be a 512-channel convolution layer, and the linear rectification function is implemented by a rectified linear unit (ReLU). The relationship between I and R can be expressed by the following formula: R = F1(I), where F1 represents the mapping from the feature image I to the scene structure map R; F1 corresponds to the network in part (b) of FIG. 2, and the network in part (b) is used to learn the scene structure map R.
To refine the layout details of the image, the embodiments of the present application perform gradient perception processing on the feature image and the scene structure map, thereby obtaining the region-enhanced feature image. Here, the gradient perception processing may be performed by inputting the feature image and the scene structure map into a third neural network, to obtain the region-enhanced feature image. How gradient perception processing is performed by the third neural network is described below with reference to FIG. 2 and FIG. 4.
FIG. 4 is a schematic flowchart of gradient perception processing according to an embodiment of the present application. As shown in FIG. 4, the flow includes the following steps:
Step 401: Obtain an actual gradient image of the scene structure map according to the scene structure map.
Step 402: Obtain a predicted gradient image corresponding to the feature image according to the feature image.
Based on the DCNet shown in FIG. 2, the third neural network is the network in part (c). Referring to the network in part (c) of FIG. 2, the actual gradient image is estimated according to the scene structure map R, and the predicted gradient image is estimated according to the feature image I. Ideally, the two are identical; however, since one is the actual gradient image and the other is the predicted gradient image, they may differ.
Step 403: Perform similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask.
In this embodiment, the similarity between the actual gradient image and the predicted gradient image is calculated (for example, the similarity is calculated by a cosine function), and the actual gradient image with a similarity greater than a preset threshold is used as the mask. Referring to the network in part (c) of FIG. 2, the similarity between the actual gradient image and the predicted gradient image is calculated, and the actual gradient image with a similarity greater than δ is used as the mask.
Here, the mask corresponds to a region of the actual gradient image; the features of this region are conducive to further optimizing the details of the depth image, so that the depth image can be used for high-precision applications such as 3D modeling.
Step 404: Perform residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
In an implementation, the region-enhanced feature image is obtained as follows: calculating the product of the mask and the feature image to obtain a fused image; pre-processing the fused image to obtain a pre-processed image, where the pre-processing includes, in order: convolution calculation, linear rectification calculation, and convolution calculation; and superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image. For example, referring to the network in part (f) of FIG. 2 (Residual like filtering), the Multiply module is used to calculate the product of the mask and the feature image to obtain the fused image; the fused image is input into the Conv module, the ReLU module, and the Conv module in sequence, thereby implementing the corresponding convolution calculation, linear rectification calculation, and convolution calculation; and the final result is then superimposed on the original feature image through the Sum module to output the region-enhanced feature image.
After the region-enhanced feature image is obtained through the above solution, convolution calculation is performed on the region-enhanced feature image through the network in part (d) of FIG. 2 to obtain the depth image of the monocular image. Here, the convolution layer performing the convolution calculation may be a 64-channel convolution layer.
In the embodiments of the present application, through the scene structure estimation module (the network in part (b) of FIG. 2) and the depth estimation module (the network in part (d) of FIG. 2), the estimation of the depth image is decomposed, in a divide-and-conquer strategy, into an estimation of the scene structure and an estimation of the depth, so that the convergence speed of DCNet can be accelerated and more accurate results obtained. Moreover, a gradient perception module is established between the scene structure estimation module and the depth estimation module, through which the depth estimation module can obtain the region-enhanced feature image, so that the deeper layers of the neural network (for example, the network in part (d) of FIG. 2) can focus more on the enhanced regions and recover a depth image with better boundaries and details. The recovered high-precision depth image provides a high-quality data source for applications such as 3D reconstruction.
FIG. 5 is a first schematic structural diagram of a depth recovery apparatus for a monocular image according to an embodiment of the present application. As shown in FIG. 5, the depth recovery apparatus for a monocular image includes:
a feature extraction module 501 configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image;
a scene structure estimation module 502 configured to decouple the feature image to obtain a scene structure map of the feature image;
a gradient perception module 503 configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
a depth estimation module 504 configured to perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
Those skilled in the art should understand that the implementation functions of the units in the depth recovery apparatus for a monocular image shown in FIG. 5 can be understood with reference to the foregoing description of the depth recovery method for a monocular image. The functions of the units in the depth recovery apparatus for a monocular image shown in FIG. 5 may be implemented by a program running on a processor, or may be implemented by a specific logic circuit.
FIG. 6 is a second schematic structural diagram of a depth recovery apparatus for a monocular image according to an embodiment of the present application. As shown in FIG. 6, the depth recovery apparatus for a monocular image includes:
a feature extraction module 501 configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image;
a scene structure estimation module 502 configured to decouple the feature image to obtain a scene structure map of the feature image;
a gradient perception module 503 configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
a depth estimation module 504 configured to perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
In an implementation, the feature extraction module 501 is configured to input the monocular image into a first neural network for feature extraction to obtain the feature image of the monocular image.
In an implementation, the feature extraction performed by the first neural network includes:
performing multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image;
performing residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information;
performing feature fusion on the adjusted multi-scale feature information to obtain the feature image.
In an implementation, the scene structure estimation module 502 is configured to input the feature image into a second neural network for decoupling to obtain the scene structure map of the feature image.
In an implementation, the apparatus further includes:
an establishing module 505 configured to establish the second neural network in advance, where the second neural network includes at least a convolution layer and a linear rectification function.
In an implementation, the gradient perception module 503 is configured to input the feature image and the scene structure map into a third neural network for gradient perception processing to obtain the region-enhanced feature image.
In an implementation, the gradient perception processing performed by the third neural network includes:
obtaining an actual gradient image of the scene structure map according to the scene structure map;
obtaining a predicted gradient image corresponding to the feature image according to the feature image;
performing similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask;
performing residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
In an implementation, performing similarity analysis according to the actual gradient information and the predicted gradient information to obtain a mask includes:
calculating the similarity between the actual gradient image and the predicted gradient image;
using the actual gradient image with a similarity greater than a preset threshold as the mask.
In an implementation, performing residual filtering processing on the mask and the feature image includes:
calculating the product of the mask and the feature image to obtain a fused image;
pre-processing the fused image to obtain a pre-processed image, where the pre-processing includes, in order: convolution calculation, linear rectification calculation, and convolution calculation;
superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image.
In an implementation, the depth estimation module 504 is configured to perform convolution calculation on the region-enhanced feature image to obtain the depth image of the monocular image.
Those skilled in the art should understand that the implementation functions of the units in the depth recovery apparatus for a monocular image shown in FIG. 6 can be understood with reference to the foregoing description of the depth recovery method for a monocular image. The functions of the units in the depth recovery apparatus for a monocular image shown in FIG. 6 may be implemented by a program running on a processor, or may be implemented by a specific logic circuit.
If the above depth recovery apparatus for a monocular image of the embodiments of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application further provides a computer storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the above depth recovery method for a monocular image of the embodiments of the present application is implemented.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in FIG. 7, the computer device 100 may include one or more processors 1002 (only one is shown in the figure; the processor 1002 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as a field programmable gate array (FPGA)), a memory 1004 for storing data, and a transmission device 1006 for communication functions. Those of ordinary skill in the art can understand that the structure shown in FIG. 7 is merely illustrative and does not limit the structure of the above electronic device. For example, the computer device 100 may also include more or fewer components than those shown in FIG. 7, or have a different configuration from that shown in FIG. 7.
The memory 1004 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the methods in the embodiments of the present application; by running the software programs and modules stored in the memory 1004, the processor 1002 executes various functional applications and data processing, that is, implements the above method. The memory 1004 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memories remotely located relative to the processor 1002, and these remote memories may be connected to the computer device 100 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1006 is used to receive or send data via a network. A specific example of the above network may include a wireless network provided by a communication provider of the computer device 100. In one example, the transmission device 1006 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 1006 may be a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
The technical solutions described in the embodiments of the present application may be combined with one another arbitrarily, provided that no conflict arises.
In the several embodiments provided in the present application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the coupling, direct coupling, or communication connections between the displayed or discussed components may be implemented through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the objective of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into a second processing unit, or each unit may separately serve as one unit, or two or more units may be integrated into one unit; the above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the present application.
Industrial Applicability
In the technical solutions of the embodiments of the present application, feature extraction is performed on the monocular image to obtain a feature image of the monocular image; the feature image is decoupled to obtain a scene structure map of the feature image; gradient perception processing is performed on the feature image and the scene structure map to obtain a region-enhanced feature image; and depth estimation is performed according to the region-enhanced feature image to obtain a depth image of the monocular image. The above depth recovery method and apparatus for a monocular image can not only obtain better depth estimation results with a small amount of data, but can also recover more depth details through gradient perception processing.

Claims (23)

  1. A depth recovery method for a monocular image, the method comprising:
    performing feature extraction on the monocular image to obtain a feature image of the monocular image;
    decoupling the feature image to obtain a scene structure map of the feature image;
    performing gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
    performing depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  2. The method according to claim 1, wherein performing feature extraction on the monocular image to obtain a feature image of the monocular image comprises:
    inputting the monocular image into a first neural network for feature extraction to obtain the feature image of the monocular image.
  3. The method according to claim 2, wherein the feature extraction performed by the first neural network comprises:
    performing multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image;
    performing residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information;
    performing feature fusion on the adjusted multi-scale feature information to obtain the feature image.
  4. The method according to claim 1, wherein decoupling the feature image to obtain a scene structure map of the feature image comprises:
    inputting the feature image into a second neural network for decoupling to obtain the scene structure map of the feature image.
  5. The method according to claim 4, wherein before the feature image is input into the second neural network, the method further comprises:
    establishing the second neural network in advance, wherein the second neural network comprises at least a convolution layer and a linear rectification function.
  6. The method according to claim 1, wherein performing gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image comprises:
    inputting the feature image and the scene structure map into a third neural network for gradient perception processing to obtain the region-enhanced feature image.
  7. The method according to claim 6, wherein the gradient perception processing performed by the third neural network comprises:
    obtaining an actual gradient image of the scene structure map according to the scene structure map;
    obtaining a predicted gradient image corresponding to the feature image according to the feature image;
    performing similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask;
    performing residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
  8. The method according to claim 7, wherein performing similarity analysis according to the actual gradient information and the predicted gradient information to obtain a mask comprises:
    calculating a similarity between the actual gradient image and the predicted gradient image;
    using the actual gradient image with a similarity greater than a preset threshold as the mask.
  9. The method according to claim 7, wherein performing residual filtering processing on the mask and the feature image comprises:
    calculating a product of the mask and the feature image to obtain a fused image;
    pre-processing the fused image to obtain a pre-processed image, wherein the pre-processing comprises, in order: convolution calculation, linear rectification calculation, and convolution calculation;
    superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image.
  10. The method according to claim 1, wherein performing depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image comprises:
    performing convolution calculation on the region-enhanced feature image to obtain the depth image of the monocular image.
  11. A depth recovery apparatus for a monocular image, the apparatus comprising:
    a feature extraction module configured to perform feature extraction on the monocular image to obtain a feature image of the monocular image;
    a scene structure estimation module configured to decouple the feature image to obtain a scene structure map of the feature image;
    a gradient perception module configured to perform gradient perception processing on the feature image and the scene structure map to obtain a region-enhanced feature image;
    a depth estimation module configured to perform depth estimation according to the region-enhanced feature image to obtain a depth image of the monocular image.
  12. The apparatus according to claim 11, wherein the feature extraction module is configured to input the monocular image into a first neural network for feature extraction to obtain the feature image of the monocular image.
  13. The apparatus according to claim 12, wherein the feature extraction performed by the first neural network comprises:
    performing multi-scale feature extraction on the monocular image to obtain multi-scale feature information of the monocular image;
    performing residual adjustment on the multi-scale feature information to obtain adjusted multi-scale feature information;
    performing feature fusion on the adjusted multi-scale feature information to obtain the feature image.
  14. The apparatus according to claim 11, wherein the scene structure estimation module is configured to input the feature image into a second neural network for decoupling to obtain the scene structure map of the feature image.
  15. The apparatus according to claim 14, wherein the apparatus further comprises:
    an establishing module configured to establish the second neural network in advance, wherein the second neural network comprises at least a convolution layer and a linear rectification function.
  16. The apparatus according to claim 11, wherein the gradient perception module is configured to input the feature image and the scene structure map into a third neural network for gradient perception processing to obtain the region-enhanced feature image.
  17. The apparatus according to claim 16, wherein the gradient perception processing performed by the third neural network comprises:
    obtaining an actual gradient image of the scene structure map according to the scene structure map;
    obtaining a predicted gradient image corresponding to the feature image according to the feature image;
    performing similarity analysis according to the actual gradient image and the predicted gradient image to obtain a mask;
    performing residual filtering processing on the mask and the feature image to obtain the region-enhanced feature image.
  18. The apparatus according to claim 17, wherein performing similarity analysis according to the actual gradient information and the predicted gradient information to obtain a mask comprises:
    calculating a similarity between the actual gradient image and the predicted gradient image;
    using the actual gradient image with a similarity greater than a preset threshold as the mask.
  19. The apparatus according to claim 17, wherein performing residual filtering processing on the mask and the feature image comprises:
    calculating a product of the mask and the feature image to obtain a fused image;
    pre-processing the fused image to obtain a pre-processed image, wherein the pre-processing comprises, in order: convolution calculation, linear rectification calculation, and convolution calculation;
    superimposing the feature image and the pre-processed image to obtain the region-enhanced feature image.
  20. The apparatus according to claim 11, wherein the depth estimation module is configured to perform convolution calculation on the region-enhanced feature image to obtain the depth image of the monocular image.
  21. A computer device, comprising a memory and a processor, wherein computer-executable instructions are stored on the memory, and the processor implements the method steps of any one of claims 1 to 10 when running the computer-executable instructions on the memory.
  22. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the method of any one of claims 1 to 10 is implemented.
  23. A computer program, comprising computer instructions, wherein when the computer instructions are run in a processor of a device, the method of any one of claims 1 to 10 is implemented.
PCT/CN2018/116276 2018-05-23 2018-11-19 单目图像的深度恢复方法及装置、计算机设备 WO2019223262A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020520708A JP6850399B2 (ja) 2018-05-23 2018-11-19 単眼画像の深度回復方法及び装置、コンピュータ機器
SG11201912423WA SG11201912423WA (en) 2018-05-23 2018-11-19 Depth recovery methods and apparatuses for monocular image, and computer devices
US16/724,287 US11004221B2 (en) 2018-05-23 2019-12-21 Depth recovery methods and apparatuses for monocular image, and computer devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810502947.0 2018-05-23
CN201810502947.0A CN108932734B (zh) 2018-05-23 2018-05-23 单目图像的深度恢复方法及装置、计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/724,287 Continuation US11004221B2 (en) 2018-05-23 2019-12-21 Depth recovery methods and apparatuses for monocular image, and computer devices

Publications (1)

Publication Number Publication Date
WO2019223262A1 true WO2019223262A1 (zh) 2019-11-28

Family

ID=64449119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116276 WO2019223262A1 (zh) 2018-05-23 2018-11-19 单目图像的深度恢复方法及装置、计算机设备

Country Status (5)

Country Link
US (1) US11004221B2 (zh)
JP (1) JP6850399B2 (zh)
CN (1) CN108932734B (zh)
SG (1) SG11201912423WA (zh)
WO (1) WO2019223262A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6971934B2 (ja) * 2018-08-10 2021-11-24 株式会社東芝 画像処理装置
CN109948689B (zh) * 2019-03-13 2022-06-03 北京达佳互联信息技术有限公司 一种视频生成方法、装置、电子设备及存储介质
CN110515463B (zh) * 2019-08-29 2023-02-28 南京泛在地理信息产业研究院有限公司 一种手势交互式视频场景中基于单目视觉的3d模型嵌入方法
CN112446909B (zh) * 2019-08-30 2022-02-01 上海商汤临港智能科技有限公司 一种深度图像补全方法及装置、计算机可读存储介质
CN110992304B (zh) * 2019-10-30 2023-07-07 浙江力邦合信智能制动系统股份有限公司 二维图像深度测量方法及其在车辆安全监测中的应用
US20210366139A1 (en) * 2020-05-21 2021-11-25 Samsung Electronics Co., Ltd. Method and apparatus for generating depth image
US12014507B2 (en) 2021-06-10 2024-06-18 Toyota Motor Engineering & Manufacturing North America, Inc. Systems and methods for training a prediction system
CN114143517A (zh) * 2021-10-26 2022-03-04 深圳华侨城卡乐技术有限公司 一种基于重叠区域的融合蒙板计算方法、系统及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120002871A1 (en) * 2010-07-01 2012-01-05 Miao Hu Method of Estimating Depths from a Single Image Displayed on Display
CN103413347A (zh) * 2013-07-05 2013-11-27 南京邮电大学 基于前景背景融合的单目图像深度图提取方法
CN107204010A (zh) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 一种单目图像深度估计方法与系统

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009157895A1 (en) * 2008-06-24 2009-12-30 Thomson Licensing System and method for depth extraction of images with motion compensation
CN102413756B (zh) * 2009-04-29 2015-04-01 皇家飞利浦电子股份有限公司 从单目内窥镜图像估计实时深度
US8982187B2 (en) * 2011-09-19 2015-03-17 Himax Technologies Limited System and method of rendering stereoscopic images
US9471988B2 (en) * 2011-11-02 2016-10-18 Google Inc. Depth-map generation for an input image using an example approximate depth-map associated with an example similar image
CN105374039B (zh) * 2015-11-16 2018-09-21 辽宁大学 基于轮廓锐度的单目图像深度信息估计方法
CN106157307B (zh) * 2016-06-27 2018-09-11 浙江工商大学 一种基于多尺度cnn和连续crf的单目图像深度估计方法
CN106768325A (zh) * 2016-11-21 2017-05-31 清华大学 多光谱光场视频采集装置
WO2018160998A1 (en) * 2017-03-02 2018-09-07 Arizona Board Of Regents On Behalf Of Arizona State University Live-cell computed tomography
CN107578436B (zh) * 2017-08-02 2020-06-12 南京邮电大学 一种基于全卷积神经网络fcn的单目图像深度估计方法
US10504282B2 (en) * 2018-03-21 2019-12-10 Zoox, Inc. Generating maps without shadows using geometry

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120002871A1 (en) * 2010-07-01 2012-01-05 Miao Hu Method of Estimating Depths from a Single Image Displayed on Display
CN103413347A (zh) * 2013-07-05 2013-11-27 南京邮电大学 基于前景背景融合的单目图像深度图提取方法
CN107204010A (zh) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 一种单目图像深度估计方法与系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, YAOYU ET AL.: "Structured Deep Learning Based Depth Estimation from a Monocular Image", ROBOT, vol. 39, no. 6, 1 November 2017 (2017-11-01), pages 812 - 819, XP055656924, DOI: 10.13973/j.cnki.robot.2017.0812 *

Also Published As

Publication number Publication date
JP2020524355A (ja) 2020-08-13
US11004221B2 (en) 2021-05-11
US20200143552A1 (en) 2020-05-07
JP6850399B2 (ja) 2021-03-31
CN108932734A (zh) 2018-12-04
SG11201912423WA (en) 2020-01-30
CN108932734B (zh) 2021-03-09

Similar Documents

Publication Publication Date Title
WO2019223262A1 (zh) 单目图像的深度恢复方法及装置、计算机设备
US11145083B2 (en) Image-based localization
KR102647351B1 (ko) 3차원의 포인트 클라우드를 이용한 모델링 방법 및 모델링 장치
Zeng et al. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions
JP2020524355A5 (zh)
US11315313B2 (en) Methods, devices and computer program products for generating 3D models
CN110109535A (zh) 增强现实生成方法及装置
US11948310B2 (en) Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator
EP3506149A1 (en) Method, system and computer program product for eye gaze direction estimation
Ye et al. Keypoint-based LiDAR-camera online calibration with robust geometric network
US11188787B1 (en) End-to-end room layout estimation
KR20230049969A (ko) 글로벌 측위 장치 및 방법
Chang et al. YOLOv4‐tiny‐based robust RGB‐D SLAM approach with point and surface feature fusion in complex indoor environments
Ailani et al. Self localization with edge detection in 3D space
Liu et al. Depth estimation of traffic scenes from image sequence using deep learning
US20220198707A1 (en) Method and apparatus with object pose estimation
Wan et al. View consistency aware holistic triangulation for 3D human pose estimation
Lin et al. 6D object pose estimation with pairwise compatible geometric features
Chen et al. Depth recovery with face priors
WO2020021238A1 (en) Method of model alignment through localisation usage
Li et al. Virtual reality realization technology and its application based on augmented reality
Liu et al. DOE: a dynamic object elimination scheme based on geometric and semantic constraints
Kim et al. Robust 3D Hand Tracking with Multi-View Videos
Jin et al. SLAM Fusion Optimization Based on Monocular Vision and Inertial Sensor
Yang et al. Intelligent Robotics and Applications: 16th International Conference, ICIRA 2023, Hangzhou, China, July 5–7, 2023, Proceedings, Part II

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020520708

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919889

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18919889

Country of ref document: EP

Kind code of ref document: A1