WO2019149206A1 - Depth estimation method and apparatus, electronic device, program, and medium - Google Patents

Depth estimation method and apparatus, electronic device, program, and medium

Info

Publication number
WO2019149206A1
WO2019149206A1 · PCT/CN2019/073820 · CN2019073820W
Authority
WO
WIPO (PCT)
Prior art keywords
image
neural network
binocular
sample
disparity
Prior art date
Application number
PCT/CN2019/073820
Other languages
English (en)
French (fr)
Inventor
罗越
任思捷
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 filed Critical 深圳市商汤科技有限公司
Priority to SG11202003141PA priority Critical patent/SG11202003141PA/en
Priority to KR1020207009470A priority patent/KR102295403B1/ko
Priority to JP2020517931A priority patent/JP6951565B2/ja
Publication of WO2019149206A1 publication Critical patent/WO2019149206A1/zh
Priority to US16/835,418 priority patent/US11308638B2/en

Links

Images

Classifications

    • G06T 7/50: Image analysis; Depth or shape recovery
    • G06T 7/593: Depth or shape recovery from multiple images, from stereo images
    • G06N 3/045: Neural networks; Architecture; Combinations of networks
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/85: Stereo camera calibration
    • G06T 2207/10004: Still image; Photographic image
    • G06T 2207/10012: Stereo images
    • G06T 2207/20076: Probabilistic image processing
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to computer vision technology, and more particularly to a depth estimation method and apparatus, an electronic device, a computer program, and a computer readable storage medium.
  • Depth estimation is an important issue in the field of computer vision. Accurate depth estimation methods have important value in many fields, such as autonomous driving, 3D scene reconstruction and augmented reality.
  • Embodiments of the present disclosure provide a depth estimation technical solution.
  • a depth estimation method including:
  • a depth estimating apparatus including:
  • An image acquisition module configured to use a single image as the first image in the binocular image, and acquire the second image in the binocular image based on the first image via the first neural network;
  • a stereo matching module configured to acquire, by the second neural network, the depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
  • an electronic device including:
  • a memory for storing executable instructions
  • a processor for communicating with the memory to execute the executable instructions to perform the operations of the method of any of the above-described embodiments of the present disclosure.
  • a computer program comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the operations in the method of any of the above embodiments of the present disclosure.
  • a computer readable storage medium for storing computer readable instructions that, when executed, implement operations in the method of any of the above embodiments of the present disclosure .
  • According to the depth estimation method and apparatus, electronic device, computer program, and computer readable storage medium provided by the above embodiments of the present disclosure, a single picture is used as the first image in a binocular image; the second image in the binocular image is acquired from the first image via a first neural network; and depth information corresponding to the first image is acquired via a second neural network by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in the single picture is thereby achieved from a single picture, without a binocular camera, which avoids the extra hardware overhead of a binocular camera and reduces cost; moreover, depth information errors caused by an inaccurate binocular camera setup can be avoided, improving the accuracy of the depth estimate.
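  • As a rough, non-authoritative sketch of the two-stage pipeline summarized above (the names `synthesis_net`, `matching_net`, `focal_length`, and `baseline` are hypothetical placeholders, not part of the disclosure), the flow could look like this in Python:

```python
import torch

def estimate_depth(single_image, synthesis_net, matching_net, focal_length, baseline):
    """Sketch: synthesize the missing view, then recover depth by binocular stereo matching."""
    left = single_image                          # treat the single picture as the left view
    with torch.no_grad():
        right = synthesis_net(left)              # first neural network: synthesize the second image
        disparity = matching_net(left, right)    # second neural network: binocular stereo matching
    # convert disparity to depth with the usual stereo relation Z = f * B / Disp
    depth = focal_length * baseline / disparity.clamp(min=1e-6)
    return depth
```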
  • FIG. 1 is a flow chart of an embodiment of a depth estimation method according to the present disclosure
  • FIG. 2 is a flow chart of another embodiment of a depth estimation method of the present disclosure.
  • FIG. 3 is a flow chart of an application embodiment of a depth estimation method according to the present disclosure.
  • FIG. 4 is an exemplary block diagram corresponding to the embodiment shown in FIG. 3;
  • FIG. 5 is a schematic structural diagram of an embodiment of a depth estimating apparatus according to the present disclosure.
  • FIG. 6 is a schematic structural view of another embodiment of a depth estimating apparatus according to the present disclosure.
  • Figure 7 is a schematic structural view of still another embodiment of the depth estimating apparatus of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an application embodiment of an electronic device according to the present disclosure.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the depth estimation method of this embodiment includes:
  • the single image is used as the first image in the binocular image, and the second image in the binocular image is acquired based on the first image via the first neural network.
  • the binocular image is two images taken by a binocular camera or two images of a plurality of images taken by a multi-head camera, which may be referred to as a left image and a right image. Wherein, when the first image is the left image, the second image is the right image; or, when the first image is the right image, the second image is the left image.
  • the binocular image may also be referred to as a main image and a secondary image; when either one of the binocular images is used as the main image, the other image is used as the secondary image.
  • the operation 102 may be performed by a processor invoking a corresponding instruction stored in a memory or by an image acquisition module executed by the processor.
  • the first neural network and the second neural network may each be a multi-layer neural network (i.e., a deep neural network), for example a multi-layer convolutional neural network such as LeNet, AlexNet, GoogLeNet, VGG, ResNet, or any other neural network.
  • the first neural network and the second neural network may employ a neural network of the same type and structure, or a neural network of different types and structures.
  • the operation 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a stereo matching module executed by the processor.
  • the inventors have found through research that the current depth estimation methods can be mainly divided into two categories.
  • One category uses a large number of pixel-level depth labels to supervise a neural network and obtains depth estimates from the trained neural network; however, obtaining depth labels is not only extremely expensive, but existing technology also cannot obtain high-quality, dense depth labels.
  • The second category comprises depth estimation methods based on binocular stereo matching: two images captured from different orientations are taken as input, and based on the rules of geometric space, depth can be obtained by computing the disparity of corresponding pixels of the two images.
  • However, the accuracy of this kind of prediction method is limited by the setup of the binocular camera, and such methods incur additional hardware overhead because they require a binocular camera.
  • In contrast, with the depth estimation method of the above embodiment, a single picture is used as the first image in a binocular image, and the second image in the binocular image is acquired from the first image via the first neural network.
  • The second neural network then obtains depth information by performing binocular stereo matching on the first image and the second image, thereby achieving depth estimation of the scene in the single picture from a single picture alone, without a binocular camera. This avoids the additional hardware overhead produced by a binocular camera and reduces cost; moreover, depth information errors caused by an inaccurate binocular camera setup can be avoided, improving the accuracy of depth estimation.
  • the depth estimation method of this embodiment includes:
  • 202 Taking a single picture as the first image in the binocular image, processing the first image by using the first neural network, and outputting a disparity probability map of the N channels.
  • Among the N channels, different channels correspond to different disparities: the disparity probability map of the i-th channel indicates the probability that a pixel on the first image is shifted by i disparities in the first horizontal direction (i = 0, 1, ..., N-1, with N an integer greater than 1), and after normalization the probabilities of the same pixel over all channels sum to 1. When the first image is the left image, the first horizontal direction is horizontally to the left; when the first image is the right image, the first horizontal direction is horizontally to the right. That is, when the first image is the left image, the disparity probability map of the i-th channel indicates the probability that a pixel on the left image is shifted horizontally to the left by i disparities.
  • For example, assuming N = 5 and the first image is the left image, the disparity probability maps of the first, second, third, fourth, and fifth channels respectively indicate the probabilities that a pixel on the left image is shifted to the left by 0, 1, 2, 3, and 4 disparities.
  • For a given pixel on the left image, the probabilities of being shifted to the left by 0, 1, 2, 3, and 4 disparities may be, for example, 0.3, 0.4, 0.2, 0.1, and 0, respectively.
  • the operation 202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first sub-neural network unit of an image acquisition module operated by the processor.
  • 204: According to the disparity probability maps of the N channels, the first image is shifted by i pixels in the first horizontal direction, respectively, to obtain N shifted maps (i = 0, 1, ..., N-1).
  • the operation 204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an offset unit of an image acquisition module that is executed by the processor.
  • 206: Each of the N shifted maps is multiplied point-wise by the disparity probability map of the corresponding channel, to obtain N point-multiplication results.
  • the operation 206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a point multiply unit of an image acquisition module operated by the processor.
  • 208: The N point-multiplication results are superimposed pixel-wise to obtain the second image in the binocular image.
  • Since the left and right images captured by a binocular camera obey basic spatial rules, when the first image in the binocular image is the left image captured by the binocular camera, the second image is the right image captured by the binocular camera; or, when the first image is the right image captured by the binocular camera, the second image is the left image captured by the binocular camera. Therefore, the disparities between corresponding pixels of the second image obtained in this embodiment and the first image obey the laws of spatial geometry.
  • In one implementation, a pixel corresponding to a foreground object in the first image has, at its corresponding pixel position in the disparity probability maps, a larger probability value in the disparity probability map of the channel corresponding to a larger disparity;
  • a pixel corresponding to a background object in the first image has, at its corresponding pixel position in the disparity probability maps, a larger probability value in the disparity probability map of the channel corresponding to a smaller disparity.
  • For example, if the first image includes a background and a human face as a foreground object, the probability value of a pixel corresponding to the face may be 0.8 in the disparity probability map of the channel corresponding to a larger disparity among the N channels and 0.1 in the disparity probability map of the channel corresponding to a smaller disparity, while the probability value of a pixel corresponding to the background may be 0.9 in the disparity probability map of the channel corresponding to a smaller disparity and 0 in the disparity probability map of the channel corresponding to a larger disparity.
  • the operation 208 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an addition unit of an image acquisition module operated by the processor.
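  • The shift, point-multiplication, and addition steps of operations 204-208 can be illustrated with the following hedged sketch (assuming the first image is the left image and the disparity probability maps are already normalized; the tensor shapes and the zero-padding at the border are assumptions made for illustration):

```python
import torch
import torch.nn.functional as F

def synthesize_second_image(first_image, disparity_prob):
    """first_image: (B, 3, H, W); disparity_prob: (B, N, H, W), normalized over the N channels.
    Shift the first image left by i pixels (i = 0..N-1), weight each shifted copy by the
    i-th channel's disparity probability map, and sum pixel-wise (operations 204-208)."""
    _, N, _, _ = disparity_prob.shape
    second = torch.zeros_like(first_image)
    for i in range(N):
        shifted = F.pad(first_image[..., i:], (0, i))            # shift left, zero-pad the right border
        second = second + shifted * disparity_prob[:, i:i + 1]   # point-multiply and accumulate
    return second
```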
  • Since the pixels of the second image can be obtained by shifting the pixels of the first image in the first horizontal direction, each pixel position in the first image can be regarded as a variable whose value is the disparity corresponding to the disparity probability maps, i.e., one of 0, 1, ..., N-1.
  • The correlation coefficients between the variable of each pixel position in one image and the variables of the adjacent d pixel positions in the other image are acquired and stored at the corresponding pixel positions and channels, where d is an integer, for example in the range of -40 to +40. The correlation coefficients of all pixel positions constitute the correlation coefficient of the positional relationship of one image of the binocular image with respect to the pixels in the other image, and may be expressed as a W*H*N correlation coefficient map or a correlation coefficient matrix. Since the left and right images are aligned in the vertical direction, computing these correlations along the horizontal direction helps match the corresponding pixels of the two images.
  • W, H, and N respectively represent the width, height, and number of channels of an image, and the values of W, H, and N are integers greater than 0.
  • one image in the binocular image includes the first image or the second image, and the other image corresponds to the second image or the first image in the binocular image.
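  • A common way to realize such a correlation map is a 1D correlation layer over horizontal displacements; the sketch below is an assumption about the concrete form (feature dot products, simple wrap-around at the border) and is not taken from the disclosure:

```python
import torch

def horizontal_correlation(feat_one, feat_other, max_disp=40):
    """feat_one, feat_other: (B, C, H, W) feature maps of the two images.
    For each pixel of feat_one, correlate its feature vector with the features of
    horizontally neighbouring pixels of feat_other, for displacements d = -max_disp..+max_disp.
    Returns a (B, 2*max_disp+1, H, W) correlation map."""
    corrs = []
    for d in range(-max_disp, max_disp + 1):
        shifted = torch.roll(feat_other, shifts=d, dims=3)            # horizontal shift (wraps at border)
        corrs.append((feat_one * shifted).mean(dim=1, keepdim=True))  # normalized per-pixel dot product
    return torch.cat(corrs, dim=1)
```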
  • the operation 210 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition unit of a stereo matching module operated by the processor.
  • The value of each of at least one pixel in the disparity map represents the disparity of a point in the scene captured in the first image, i.e., the difference between the coordinates of that point in the coordinate system of the first image and its coordinates in the coordinate system of the second image.
  • the operation 212 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a generating unit of a stereo matching module operated by the processor.
  • the depth information corresponding to the first image may be acquired based on the disparity map and camera parameters; for example, it may be acquired based on the disparity map, the focal length of the camera that captured the first image, and the distance between the binocular cameras corresponding to the binocular image.
  • For example, the depth information corresponding to the first image may be acquired by, but is not limited to, the formula Z = f × B / Disp, where:
  • Disp represents the predicted disparity map;
  • f is the focal length of the camera that captures the first image;
  • B is the distance between the binocular cameras;
  • Z is the monocular global depth map to be predicted.
  • the operation 214 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a third acquisition unit of a stereo matching module operated by the processor.
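  • A minimal sketch of the disparity-to-depth conversion follows (the units are assumptions: a focal length in pixels and a baseline in metres give a depth in metres):

```python
import numpy as np

def disparity_to_depth(disp, focal_length, baseline, eps=1e-6):
    """Element-wise Z = f * B / Disp on the predicted disparity map."""
    return focal_length * baseline / np.maximum(disp, eps)

# Example: a disparity of 4 px with f = 1000 px and B = 0.12 m gives a depth of 30 m.
```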
  • the problem of the monocular depth estimation is transformed into the binocular stereo matching problem.
  • In this way, the relatively difficult depth estimation problem is converted into the problem of matching similar pixel points of two images; this matching no longer requires inferring the geometric relationship between at least two pixels within a single image, which reduces computational complexity.
  • Using deep learning, the embodiments of the present disclosure can better implement the two operations of synthesizing the second image and performing binocular stereo matching, and improve the accuracy of the results by explicitly encoding the geometric transformation in the first neural network and the second neural network.
  • Because the embodiments of the present disclosure synthesize the corresponding right image from a single picture and then perform binocular stereo matching, training the first and second neural networks no longer requires a large number of precise depth labels for supervision: it is only necessary to train the first neural network (also called the image synthesis network) with easily obtained rectified binocular images, and to train the second neural network (also called the binocular stereo matching network) with a large number of computer-rendered binocular images and depth maps, which reduces training data overhead compared with large numbers of precise depth labels.
  • processing the first image by using the first neural network to output a disparity probability map may include:
  • Feature extraction is performed on the first image through network layers of two or more network depths in the first neural network, respectively, to obtain feature maps of two or more scales (i.e., sizes); in the present disclosure, "at least two" means two or more.
  • Based on the feature maps of the two or more scales, preliminary disparity probability maps of N channels at two or more resolutions are obtained respectively; then, for each channel, the preliminary disparity probability maps of the two or more resolutions are enlarged to the resolution of the first image and superimposed, to obtain the disparity probability maps of the N channels.
  • Owing to the pooling layers in the neural network, feature maps of different sizes and resolutions are produced at different stages of the neural network; feature maps of different sizes and resolutions can produce preliminary disparity probability maps of different sizes and resolutions, which helps provide different local and global information for predicting the depth information.
  • For example, suppose the first image is a red-green-blue (RGB) image whose W*H*N is 200*200*3; a 100*100*64 feature map is obtained through a network layer of a certain network depth of the first neural network, and a 50*50*128 feature map is then obtained through a network layer of another network depth.
  • Based on these two feature maps of different sizes, preliminary disparity probability maps of different sizes and resolutions can be obtained, for example 100*100*N and 50*50*N preliminary disparity probability maps.
  • The resolution of the first image is 200*200, and the resolutions of the two preliminary disparity probability maps are 100*100 and 50*50, i.e., 1/2*1/2 and 1/4*1/4 of the resolution of the first image, respectively.
  • The features extracted by network layers of two or more network depths in the first neural network have different sizes: features extracted by layers with a shallow network depth have smaller receptive fields and reflect information of smaller regions of the first image, while features extracted by deeper layers have larger receptive fields and can reflect information of larger regions of the first image, or even global information. Using feature maps of different resolutions to simultaneously provide information of different fields of view can produce a more accurate disparity probability map.
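  • The multi-scale prediction described above might be sketched as follows (the layer choices, channel counts, and the use of a softmax for normalization are illustrative assumptions, not the patented architecture):

```python
import torch.nn as nn
import torch.nn.functional as F

class DisparityProbabilityHead(nn.Module):
    """Predict preliminary N-channel disparity probability maps from feature maps at two
    network depths, enlarge them to the input resolution, superimpose, and normalize."""
    def __init__(self, n_disp=5, c_shallow=64, c_deep=128):
        super().__init__()
        self.head_shallow = nn.Conv2d(c_shallow, n_disp, kernel_size=3, padding=1)
        self.head_deep = nn.Conv2d(c_deep, n_disp, kernel_size=3, padding=1)

    def forward(self, feat_shallow, feat_deep, out_size):
        p1 = self.head_shallow(feat_shallow)      # e.g. 100x100xN preliminary map
        p2 = self.head_deep(feat_deep)            # e.g. 50x50xN preliminary map
        p1 = F.interpolate(p1, size=out_size, mode='bilinear', align_corners=False)
        p2 = F.interpolate(p2, size=out_size, mode='bilinear', align_corners=False)
        return F.softmax(p1 + p2, dim=1)          # per-pixel probabilities sum to 1 over the N channels
```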
  • the operation 210 may include:
  • Feature extraction is performed on one image and another image, respectively.
  • feature extraction can be performed on one image and another image through a convolutional neural network
  • Via the second neural network, based on the extracted features of the one image and the features of the other image, the positional relationship between the one image and the pixels in the other image is acquired, and the correlation coefficient is output.
  • the operation 212 may include superimposing features of another image and correlation coefficients to generate a disparity map of the first image and the second image.
  • In this way, the disparity map of the scene captured in a single image within a binocular image can be obtained from that single image, and the relatively difficult depth estimation problem is converted into the problem of matching similar pixel points of two images; this matching no longer requires inferring the geometric relationship between at least two pixels within a single image, which reduces computational complexity.
  • the embodiment uses the deep learning method to explicitly set the geometric transformation in the second neural network, thereby improving the accuracy of the operation result.
  • the operation 212 may include: superimposing the features of the other image with the correlation coefficient to obtain a superposition result, which may be, for example, a feature map;
  • extracting features of the superposition result, and fusing the extracted features of the superposition result with the superposition result to obtain the disparity map of the first image and the second image.
  • The features of the superposition result may be extracted by a convolutional neural network, which may illustratively include, but is not limited to, a convolution layer and an activation layer (ReLU).
  • The convolutional neural network may, for example, be implemented by an encoding-decoding model: features of the superposition result are extracted by the convolution layer to obtain a feature map of the same size as the superposition result, and this feature map is fused (concatenated) with the superposition result to obtain the disparity map of the first image and the second image.
  • In this way, the range of the receptive field can be increased, and fusing the extracted features with the superposition result allows more information, including more global information, to be incorporated into the disparity map, thereby helping to improve the subsequently predicted depth information corresponding to the first image.
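  • Operation 212 could be sketched as below; the channel counts and the single conv + ReLU block are assumptions made only to show the superpose, extract, and fuse pattern:

```python
import torch
import torch.nn as nn

class DisparityDecoder(nn.Module):
    """Concatenate the other image's features with the correlation map, extract features of
    the superposition with a conv + ReLU block, fuse them back by concatenation, and
    regress a single-channel disparity map."""
    def __init__(self, c_feat=64, c_corr=81, c_mid=128):
        super().__init__()
        c_in = c_feat + c_corr
        self.extract = nn.Sequential(nn.Conv2d(c_in, c_mid, 3, padding=1), nn.ReLU(inplace=True))
        self.predict = nn.Conv2d(c_in + c_mid, 1, 3, padding=1)

    def forward(self, feat_other, corr):
        superposed = torch.cat([feat_other, corr], dim=1)   # superposition result
        extracted = self.extract(superposed)                # same spatial size as the superposition
        fused = torch.cat([superposed, extracted], dim=1)   # fuse (concat) extracted features back
        return self.predict(fused)
```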
  • FIG. 3 is a flow chart of an application embodiment of the depth estimation method of the present disclosure.
  • FIG. 4 is an exemplary block diagram corresponding to the embodiment shown in FIG. 3.
  • In this application embodiment, the first image and the second image of the above at least one embodiment of the present disclosure are described as the left image and the right image, respectively.
  • the application embodiment includes:
  • the single picture is used as the left picture in the binocular image, and the left picture is processed by the first neural network, and the disparity probability map of the N channels is output.
  • the operation 302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first sub-neural network element of an image acquisition module operated by the processor.
  • the operation 304 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an offset unit of an image acquisition module that is executed by the processor.
  • the operation 306 may be performed by a processor invoking a corresponding instruction stored in a memory, or by a point multiply unit of an image acquisition module being executed by the processor.
  • the operation 308 may be performed by a processor invoking a corresponding instruction stored in a memory, or by an addition unit of an image acquisition module being executed by the processor.
  • the operation 310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second sub-neural network unit of a first acquisition unit of a stereo matching module operated by the processor.
  • the operation 312 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition sub-unit of a first acquisition unit of a stereo matching module operated by the processor.
  • the superposition result may be, for example, a feature map.
  • the feature map obtained from the left image may be further extracted by the neural network, and the extracted features and the correlation coefficients are superimposed.
  • The neural network may illustratively consist of a convolution layer and an activation layer; further feature extraction is performed on the feature map obtained from the left image, which increases the range of the receptive field, and the further processed feature map is then superimposed with the correlation coefficient, so that the superposition result can include more global information and improve the accuracy of the subsequently obtained disparity map and depth information.
  • the operation 314 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an overlay subunit of a generating unit of a stereo matching module operated by the processor.
  • the operation 316 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a fusion sub-unit of a generating unit of a stereo matching module operated by the processor.
  • the operation 318 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a third acquisition unit of a stereo matching module operated by the processor.
  • the three-dimensional spatial scene of the scene in the first image may also be acquired based on the depth information corresponding to the first image and the second image.
  • Embodiments of the present disclosure may be applied, for example, but not limited to the following:
  • Embodiments of the present disclosure may be applied to three-dimensional scene reconstruction: the predicted depth information corresponding to the first image (also referred to as a global depth map) can be applied to various scenarios, such as autonomous driving, three-dimensional scene restoration, and 3D movie production. With the embodiments of the present disclosure, only a single picture is needed to obtain a good result while reducing cost.
  • The three-dimensional spatial scene of the entire scene in the original image (i.e., the first image) can thus be restored.
  • The restored three-dimensional spatial scene has many application scenarios, for example 3D movies, autonomous driving, and so on.
  • the first neural network may be pre-trained using the sample binocular images in a first sample set, the first sample set including at least one set of first sample binocular images; and/or the second neural network may be pre-trained using the sample binocular images in a second sample set.
  • the method may further include:
  • the first neural network is trained using the sample binocular image in the first sample set
  • the second neural network is trained using the sample binocular image and the depth map in the second sample set.
  • the first sample set includes at least one set of first sample binocular images
  • each set of first sample binocular images includes a first image and a second image
  • the second sample set includes at least one set of second sample binocular images and disparity map labels.
  • training the first neural network with the sample binocular image in the first sample set may include:
  • a first difference between the second image output by the first neural network and the second image in the at least one set of first sample binocular images is acquired, and the first neural network is trained based on the first difference until a first training completion condition is met, which may include:
  • the parameter values of the network parameters in the first neural network are adjusted based on the first difference until the first training completion condition is satisfied.
  • the first training completion condition may include, but is not limited to, the first difference is less than the first preset threshold, and/or the number of trainings for the first neural network reaches the first preset number of times.
  • training the second neural network with the sample binocular images and the disparity map labels in the second sample set may include:
  • a second difference between the disparity map output by the second neural network and the disparity map label of the at least one set of second sample binocular images is acquired, and the second neural network is trained based on the second difference until a second training completion condition is met, which may include:
  • the parameter values of the network parameters in the second neural network are adjusted based on the second difference until the second training completion condition is satisfied.
  • the second training completion condition may include, but is not limited to, the second difference is less than the second preset threshold, and/or the number of times of training the second neural network reaches the second preset number of times.
  • Computer-rendered left images, right images, and depth map labels may be used as the second sample binocular images and the corresponding labels in the second sample set to train the second neural network.
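  • The two supervised training stages might look like the following hedged sketch (the L1 form of the pixel-wise differences and the optimizer handling are assumptions; training would stop once the difference falls below the preset threshold or the preset number of iterations is reached):

```python
import torch.nn.functional as F

def train_step_synthesis(synthesis_net, optimizer, left, right):
    """First network: supervise the synthesized second image with the real second image
    of a rectified binocular pair (the 'first difference')."""
    loss = F.l1_loss(synthesis_net(left), right)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def train_step_matching(matching_net, optimizer, left, right, disp_label):
    """Second network: supervise the predicted disparity map with a rendered
    disparity-map label (the 'second difference')."""
    loss = F.l1_loss(matching_net(left, right), disp_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```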
  • the method may further include:
  • the first neural network and the second neural network are trained using the sample binocular image in the third sample set and the depth map tag corresponding to the third sample image.
  • the third sample set includes at least one set of third sample binocular images and a depth map label corresponding to the third sample image.
  • training the first neural network and the second neural network by using the sample binocular images in the third sample set and the depth map labels corresponding to the third sample images may include:
  • acquiring at least one set of depth information based on the disparity map of the at least one set of third sample binocular images; acquiring a third difference between the at least one set of depth information and the depth map labels of the at least one set of third sample binocular images; and
  • adjusting the parameter values of the network parameters in the first neural network and the second neural network based on the third difference until a third training completion condition is satisfied.
  • the third training completion condition may include, but is not limited to, the third difference is less than the third preset threshold, and/or the number of times of training the first neural network and the second neural network reaches a third preset number of times.
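  • Joint training with the third difference could be sketched as follows (the optimizer is assumed to hold the parameters of both networks, and the L1 loss form is an assumption):

```python
import torch.nn.functional as F

def joint_finetune_step(synthesis_net, matching_net, optimizer, image, depth_label, f, B):
    """Run both networks end to end, convert the predicted disparity to depth with
    Z = f * B / Disp, and penalize the difference to the depth-map label (the 'third difference')."""
    right = synthesis_net(image)
    disp = matching_net(image, right)
    depth = f * B / disp.clamp(min=1e-6)
    loss = F.l1_loss(depth, depth_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```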
  • any of the depth estimation methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • any depth estimation method provided by an embodiment of the present disclosure may be performed by a processor, such as the processor performing any of the depth estimation methods mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in the memory. This will not be repeated below.
  • the foregoing programs may be stored in a computer readable storage medium; when the program is executed, the operations of the foregoing method embodiments are performed; and the foregoing storage medium includes at least one medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 5 is a schematic structural diagram of an embodiment of a depth estimating apparatus according to the present disclosure.
  • the depth estimation apparatus of this embodiment can be used to implement the above-described at least one depth estimation method embodiment of the present disclosure.
  • the depth estimating apparatus of this embodiment includes an image acquiring module and a stereo matching module. among them:
  • an image obtaining module configured to use the single image as the first image in the binocular image, and acquire the second image in the binocular image based on the first image via the first neural network.
  • the stereo matching module is configured to obtain, by the second neural network, the depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
  • In the depth estimation apparatus, a single picture is used as the first image in a binocular image, the second image in the binocular image is acquired from the first image via the first neural network, and the second neural network obtains depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in the single image is thereby achieved from the single image alone, without a binocular camera, which avoids the additional hardware overhead produced by a binocular camera and reduces cost; moreover, depth information errors caused by an inaccurate binocular camera setup can be avoided, improving the accuracy of depth estimation.
  • FIG. 6 is a schematic structural view of another embodiment of a depth estimating apparatus according to the present disclosure.
  • an image acquisition module includes: a first sub-neural network unit, an offset unit, a point multiplication unit, and an addition unit. among them:
  • a first sub-neural network unit configured to process the first image and output disparity probability maps of N channels, wherein the disparity probability map of each channel indicates the probability that a pixel on the first image is offset by i disparities in the first horizontal direction.
  • an offset unit configured to offset the first image by i pixels in the first horizontal direction according to the disparity probability map of the N channels, to obtain N offset maps.
  • the point multiplication unit is configured to multiply each of the offset maps in the N offset maps by the disparity probability map of the corresponding channel to obtain N multiplication results.
  • An adding unit is configured to superimpose the N point multiplication results based on the pixels to obtain a second image.
  • the first sub-neural network unit includes network layers of two or more network depths, and is configured to: perform feature extraction on the first image through the network layers of two or more network depths, respectively, to obtain feature maps of two or more scales; obtain, based on the feature maps of the two or more scales, preliminary disparity probability maps of N channels at two or more resolutions; and, for each channel, enlarge the preliminary disparity probability maps of the two or more resolutions to the resolution of the first image and superimpose them, to obtain the disparity probability maps of N channels.
  • Pixels corresponding to foreground objects in the first image have, at their corresponding pixel positions in the disparity probability maps of the N channels, larger probability values in the disparity probability map of the channel corresponding to a larger disparity; pixels corresponding to background objects in the first image have, at their corresponding pixel positions in the disparity probability maps of the N channels, larger probability values in the disparity probability map of the channel corresponding to a smaller disparity.
  • the stereo matching module may include: a first acquiring unit, a generating unit, and a third acquiring unit. among them:
  • a first acquiring unit configured to acquire a correlation coefficient for indicating the positional relationship of one image of the binocular image with respect to the pixels in another image; the one image of the binocular image is the first image or the second image, and the other image is correspondingly the second image or the first image.
  • a generating unit configured to generate a disparity map of the first image and the second image based on another image and the correlation coefficient.
  • the third acquiring unit acquires depth information corresponding to the first image based on the disparity map.
  • the first acquiring unit may include: a second sub-neural network unit configured to perform feature extraction on the one image and on the other image, respectively; and an acquisition subunit configured to acquire, based on the extracted features of the one image and the extracted features of the other image, the positional relationship of the one image with respect to the pixels in the other image, and to output the correlation coefficient for indicating that positional relationship.
  • the generating unit is configured to superimpose the features of the other image with the correlation coefficients to generate a disparity map of the first image and the second image.
  • the generating unit may include: an overlay subunit for superimposing features of another image with correlation coefficients to obtain a superposition result; a fusion subunit for extracting features of the superposition result, and extracting The feature of the superimposed result is fused with the superimposed result to obtain a disparity map of the first image and the second image.
  • the third obtaining unit is configured to acquire depth information corresponding to the first image based on the disparity map, the focal length of the camera that captures the first image, and the distance between the binocular cameras corresponding to the binocular image.
  • the depth estimating apparatus of the at least one embodiment of the present disclosure may further include: an acquiring module, configured to acquire a three-dimensional spatial scene of the scene in the first image based on the depth information corresponding to the first image and the second image.
  • FIG. 7 is a schematic structural diagram of still another embodiment of the depth estimating apparatus of the present disclosure.
  • the image acquisition module and the stereo matching module may be selectively implemented by using the structure of any of the embodiments shown in FIG. 6, or may be implemented by other structures.
  • the first neural network may be pre-trained using the sample binocular images in the first sample set, the first sample set including at least one set of first sample binocular images.
  • the second neural network may be pre-trained using the sample binocular image in the second set of samples; the second set of samples includes at least one set of second sample binocular images and disparity map labels.
  • a first training module is further included.
  • the first neural network is configured to acquire and output the second image in the at least one set of first sample binocular images from the first image in the at least one set of first sample binocular images.
  • a first training module configured to acquire a first difference between the second image output by the first neural network and the second image of the at least one set of first sample binocular images, and to the first neural network based on the first difference Train until the first training completion condition is met.
  • the first training module is configured to: acquire a first difference in pixels between the second image output by the first neural network and the second image in the at least one set of first sample binocular images;
  • the parameter values of the network parameters in the first neural network are adjusted based on the first difference until the first training completion condition is satisfied.
  • the first training completion condition may include, but is not limited to, the first difference is less than the first preset threshold, and/or the number of trainings for the first neural network reaches the first preset number of times.
  • a second training module may be further included.
  • the second neural network is configured to acquire and output a disparity map of at least one set of second sample binocular images.
  • a second training module configured to acquire a second difference between the disparity map output by the second neural network and the disparity map label of the at least one second sample binocular image, and train the second neural network based on the second difference, Until the second training completion condition is met.
  • the second training module is specifically configured to: acquire a second difference in pixels between the disparity map output by the second neural network and the disparity map label of the at least one second sample binocular image;
  • the parameter values of the network parameters in the second neural network are adjusted based on the second difference until the second training completion condition is satisfied.
  • the second training completion condition may include, but is not limited to, the second difference is less than the second preset threshold, and/or the number of times of training the second neural network reaches the second preset number of times.
  • a third training module may be further included, configured to train the first neural network and the second neural network using the sample binocular images in the third sample set and the depth map labels corresponding to the third sample images.
  • the third sample set includes at least one set of third sample binocular images and a depth map label corresponding to the third sample image.
  • the first neural network is configured to acquire a second image of the at least one set of third sample binocular images from the first image of the at least one set of third sample binocular images; the second neural network, A disparity map for acquiring at least one set of third sample binocular images.
  • the third training module is configured to: acquire at least one set of depth information based on the disparity map of the at least one set of third sample binocular images; and obtain between the at least one set of depth information and the depth map label of the at least one set of third sample binocular images a third difference; adjusting parameter values of the network parameters in the first neural network and the second neural network based on the third difference until the third training completion condition is satisfied.
  • the third training completion condition may include, but is not limited to, the third difference is less than the third preset threshold, and/or the number of times of training the first neural network and the second neural network reaches a third preset number of times.
  • an electronic device provided by an embodiment of the present disclosure includes:
  • a memory for storing executable instructions
  • a processor for communicating with the memory to execute the executable instructions to perform the operations of the depth estimation method of any of the above-described embodiments of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an application embodiment of an electronic device according to the present disclosure. Referring now to FIG. 8, there is shown a block diagram of an electronic device suitable for implementing a terminal device or server of an embodiment of the present disclosure. As shown in FIG. 8,
  • the electronic device includes one or more processors, a communication unit, and the like, such as one or more central processing units (CPUs) 801 and/or one or more acceleration units (GPUs) 813.
  • acceleration unit 813 may include, but is not limited to, a GPU, an FPGA, other types of dedicated processors, etc.
  • the processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded from a storage portion 808 into a random access memory (RAM) 803.
  • The communication portion 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card, and the processor may communicate with the read-only memory 802 and/or the random access memory 803 via the bus 804 to execute the executable instructions.
  • By executing the executable instructions, the processor completes operations corresponding to the depth estimation method, for example: taking a single picture as the first image in a binocular image and acquiring, via a first neural network, a second image in the binocular image based on the first image; and acquiring, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
  • In addition, the RAM 803 may also store various programs and data required for the operation of the device.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • ROM 802 is an optional module.
  • the RAM 803 stores executable instructions, or writes executable instructions to the ROM 802 at runtime, the executable instructions causing the processor to perform operations corresponding to any of the methods described above.
  • An input/output (I/O) interface 805 is also coupled to bus 804.
  • the communication unit 812 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 808 including a hard disk or the like. And a communication portion 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet.
  • Driver 810 is also coupled to I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage portion 808 as needed.
  • FIG. 8 is only an optional implementation manner.
  • the number and types of the components in FIG. 8 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be provided separately or in an integrated manner.
  • the acceleration unit 813 and the CPU 801 may be separately disposed or the acceleration unit 813 may be integrated on the CPU 801.
  • the communication unit 812 may be separately configured or integrated.
  • Such alternative implementations, for example separately providing or integrating the acceleration unit 813 and the CPU 801, all fall within the scope of the present disclosure.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, where the program code may include instructions corresponding to executing the operations of the depth estimation method provided by the embodiments of the present disclosure.
  • the computer program can be downloaded and installed from the network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the CPU.
  • embodiments of the present disclosure also provide a computer program comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the operations in the depth estimation method of any embodiment of the present disclosure.
  • an embodiment of the present disclosure further provides a computer readable storage medium for storing computer readable instructions, wherein when the instructions are executed, the operations in the depth estimation method according to any embodiment of the present disclosure are implemented. .
  • the foregoing programs may be stored in a computer readable storage medium; when the program is executed, the operations of the foregoing method embodiments are performed; and the foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the methods and apparatus of the present disclosure may be implemented in a number of ways.
  • the methods and apparatus of the present disclosure may be implemented in software, hardware, firmware or any combination of software, hardware, firmware.
  • the above-described sequence of operations for the method is for illustrative purposes only, and the operation of the method of the present disclosure is not limited to the order specifically described above unless otherwise specifically stated.
  • the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine readable instructions for implementing a method in accordance with the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure disclose a depth estimation method and apparatus, an electronic device, a program, and a medium. The method includes: taking a single picture as the first image in a binocular image, and acquiring, via a first neural network, the second image in the binocular image based on the first image; and acquiring, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image. The embodiments of the present disclosure achieve depth estimation based on a single picture without requiring a binocular camera, which avoids the extra hardware overhead of a binocular camera and reduces cost; moreover, depth errors caused by inaccurate binocular camera setup can be avoided, improving the accuracy of depth estimation.

Description

Depth estimation method and apparatus, electronic device, program, and medium
The present disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on February 1, 2018, with application number CN 201810103195.0 and invention title "Depth estimation method and apparatus, electronic device, program, and medium", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to computer vision technology, and in particular to a depth estimation method and apparatus, an electronic device, a computer program, and a computer readable storage medium.
Background
Depth estimation is an important problem in the field of computer vision. Accurate depth estimation methods have important value in many fields, such as autonomous driving, three-dimensional scene reconstruction, and augmented reality.
Driven by the development of convolutional neural networks, technologies related to depth estimation have developed rapidly.
Summary
Embodiments of the present disclosure provide a depth estimation technical solution.
According to one aspect of the embodiments of the present disclosure, a depth estimation method is provided, including:
taking a single picture as the first image in a binocular image, and acquiring, via a first neural network, the second image in the binocular image based on the first image; and
acquiring, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
According to another aspect of the embodiments of the present disclosure, a depth estimation apparatus is provided, including:
an image acquisition module configured to take a single picture as the first image in a binocular image and acquire, via a first neural network, the second image in the binocular image based on the first image; and
a stereo matching module configured to acquire, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
According to yet another aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a memory for storing executable instructions; and
a processor configured to communicate with the memory to execute the executable instructions so as to complete the operations of the method of any of the above embodiments of the present disclosure.
According to still another aspect of the embodiments of the present disclosure, a computer program is provided, comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the operations in the method of any of the above embodiments of the present disclosure.
According to still another aspect of the embodiments of the present disclosure, a computer readable storage medium is provided for storing computer readable instructions which, when executed, implement the operations in the method of any of the above embodiments of the present disclosure.
Based on the depth estimation method and apparatus, electronic device, computer program, and computer readable storage medium provided by the above embodiments of the present disclosure, a single picture is taken as the first image in a binocular image; the second image in the binocular image is acquired from the first image via a first neural network; and depth information corresponding to the first image is acquired via a second neural network by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in the single picture is thereby achieved from the single picture alone, without a binocular camera, which avoids the extra hardware overhead of a binocular camera and reduces cost; moreover, depth errors caused by inaccurate binocular camera setup can be avoided, improving the accuracy of depth estimation.
The technical solution of the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, describe the embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an embodiment of the depth estimation method of the present disclosure;
FIG. 2 is a flowchart of another embodiment of the depth estimation method of the present disclosure;
FIG. 3 is a flowchart of an application embodiment of the depth estimation method of the present disclosure;
FIG. 4 is an exemplary block diagram corresponding to the embodiment shown in FIG. 3;
FIG. 5 is a schematic structural diagram of an embodiment of the depth estimation apparatus of the present disclosure;
FIG. 6 is a schematic structural diagram of another embodiment of the depth estimation apparatus of the present disclosure;
FIG. 7 is a schematic structural diagram of yet another embodiment of the depth estimation apparatus of the present disclosure;
FIG. 8 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and operations, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Meanwhile, it should be understood that, for convenience of description, the dimensions of the parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended as a limitation of the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a flowchart of an embodiment of the depth estimation method of the present disclosure. As shown in FIG. 1, the depth estimation method of this embodiment includes:
102: A single picture is taken as the first image in a binocular image, and the second image in the binocular image is acquired based on the first image via a first neural network.
A binocular image consists of two images captured by a binocular camera, or two of multiple images captured by a multi-view camera, which may be called the left image and the right image. When the first image is the left image, the second image is the right image; or, when the first image is the right image, the second image is the left image. A binocular image may also be called a main image and a secondary image: when either image of the binocular image is taken as the main image, the other image serves as the secondary image.
In an optional example, operation 102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an image acquisition module run by the processor.
104: Via a second neural network, depth information corresponding to the first image is acquired by performing binocular stereo matching on the first image and the second image.
In at least one embodiment of the present disclosure, the first neural network and the second neural network may each be a multi-layer neural network (i.e., a deep neural network), for example a multi-layer convolutional neural network such as LeNet, AlexNet, GoogLeNet, VGG, ResNet, or any other neural network. The first neural network and the second neural network may be of the same type and structure, or of different types and structures.
In an optional example, operation 104 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a stereo matching module run by the processor.
In the course of implementing the present disclosure, the inventors found through research that current depth estimation methods can mainly be divided into two categories. One category uses a large number of pixel-level depth labels to supervise a neural network and obtains depth estimates from the trained neural network; however, obtaining depth labels is extremely expensive, and existing technology cannot obtain high-quality, dense depth labels. The second category comprises depth estimation methods based on binocular stereo matching: two images captured from different orientations are taken as input, and based on the rules of geometric space, depth can be obtained by computing the disparity of corresponding pixels of the two images. However, the accuracy of this kind of prediction method is limited by the setup of the binocular camera, and such methods incur additional hardware overhead because they require a binocular camera. In contrast, with the depth estimation method provided by the above embodiment of the present disclosure, a single picture is taken as the first image in a binocular image, the second image in the binocular image is acquired from the first image via the first neural network, and depth information is acquired via the second neural network by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in the single picture is thereby achieved from the single picture alone, without a binocular camera, which avoids the extra hardware overhead of a binocular camera and reduces cost; moreover, depth errors caused by inaccurate binocular camera setup can be avoided, improving the accuracy of depth estimation.
图2为本公开深度估计方法另一个实施例的流程图。如图2所示,该实施例的深度估计方法包括:
202,以单张图片作为双目图像中的第一图像,经第一神经网络对该第一图像进行处理,输出N个通道的视差概率图。
其中,N个通道中不同的通道对应不同的视差,每个通道的视差概率图的表示第一图像上像素向第一水平方向偏移i个视差的概率,概率归一化后,同一个像素在所有通道上的概率值之和为1;i=0,1,…,N-1,N的取值为大于1的整数。第一图像为左图时,第一水平方向为水平向左的方向;第一图像为右图时,第一水平方向为水平向右的方向。即,第一图像为左图时,第i个通道的视差概率图表示该左图上像素水平向左偏移某i个视差的概率,例如,假设N的取值为5、第一图像为左图,第1、2、3、4、5个通道的视差概率图分别表示该左图上像素水平向左偏移0、1、2、3、4个视差的概率,该左图上某一个像素水平向左偏移0、1、2、3、4个视差的概率例如可以分别为0.3、0.4、0.2、0.1和0。
在一个可选示例中,该操作202可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的第一子神经网络单元执行。
204,根据上述N个通道的视差概率图,将第一图像分别向第一水平方向偏移i个像素,得到N张偏移图。
其中,i=0,1,…,N-1,N的取值为大于1的整数。
在一个可选示例中,该操作204可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的偏移单元执行。
206，将上述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图，得到N个点乘结果。
在一个可选示例中,该操作206可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的点乘单元执行。
208,将上述N个点乘结果基于像素进行叠加,得到上述双目图像中的第二图像。
由于双目相机拍摄的左、右图像遵守基本的空间规则,当双目图像中的第一图像为双目相机拍摄的左图时,双目图像中的第二图像为双目相机拍摄的右图;或者,当双目图像中的第一图像为双目相机拍摄的右图时,双目图像中的第二图像为双目相机拍摄的左图,因此本公开实施例获取到的第二图像与第一图像相对应像素的视差遵守空间几何规律。在本公开实施例的其中一个实施方式中,第一图像中对应前景对象的像素在视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;第一图像中对应背景对象的像素在视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。例如,第一图像中包括背景和作为前景对象的人脸,对应人脸的像素在N个通道的视差概率图中对应较大视差的通道的视差概率图中的概率值为0.8,在N个通道的视差概率图中对应较小视差的通道的视差概率图中的概率值为0.1;对应背景的像素在N个通道的视差概率图中对应较小视差的通道的视差概率图中的概率值为0.9,在N个通道的视差概率图中对应较大视差的通道的视差概率图中的概率值为0。
在一个可选示例中,该操作208可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的加法单元执行。
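为便于理解操作202~208的数据流，下面给出一个示意性的代码草图（基于PyTorch张量的一种可能实现，其中left、prob等名称、张量形状约定以及空出位置补0的方式均为本文的示例性假设，并非本公开实施例限定的实现方式）：

```python
import torch

def synthesize_second_image(left: torch.Tensor, prob: torch.Tensor) -> torch.Tensor:
    """left: [B, 3, H, W] 的第一图像；prob: [B, N, H, W] 的N个通道视差概率图，
    假设已在通道维归一化，使同一像素在所有通道上的概率值之和为1。"""
    W = left.shape[-1]
    N = prob.shape[1]
    out = torch.zeros_like(left)
    for i in range(N):
        if i == 0:
            shifted = left
        else:
            # 将第一图像向第一水平方向（此处假设为水平向左）偏移i个像素，空出的位置补0
            shifted = torch.zeros_like(left)
            shifted[..., :W - i] = left[..., i:]
        # 各张偏移图分别点乘对应通道的视差概率图，并基于像素叠加
        out = out + shifted * prob[:, i:i + 1]
    return out
```

其中prob可由第一神经网络的输出经softmax在通道维归一化得到；叠加后的结果即为合成的第二图像。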
210,通过第二神经网络,获取用于表示双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数。
因为将第一图像中的像素在第一水平方向上移动即可得到第二图像中的像素,可以将第一图像中的每个像素位置分别视为一个变量,该变量的取值为视差概率图中对应的视差的取值,可以是0,1,…,N-1共N个。获取第一图像中每个像素位置的变量与第二图像中相邻d个像素位置的变量的相关系数并保存在对应的像素位置和通道,由第一图像中所有像素位置的相关系数即得到第一图像相对于第二图像中像素的位置关系的相关系数。其中,d的取值为整数,例如可以取值-40~+40。双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数包括一个图像中所有像素位置的变量与第二图像中相邻d个像素位置的变量的相关系数,其可以表示为一个W*H*N的相关系数图或者一个相关系数矩阵。其中,W、H、N分别表示一个图像的宽度、高度和通道数,W、H、N的取值均为大于0的整数。
假设上述一个图像为左图,另一个图像为右图,由于左图和右图在竖直方向上是对齐的,将左图的像素在水平方向上向左移动即可得到右图的像素。因此,在水平方向上计算至少一个像素位置的相关系数有助于在水平方向上更好的匹配左右图对应的像素。
其中,双目图像中一个图像包括第一图像或第二图像,另一个图像对应为该双目图像中的第二图像或第一图像。
在一个可选示例中,该操作210可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的第一获取单元执行。
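作为参考，下面给出操作210中相关系数的一种可能计算方式的代码草图（与常见的相关层做法类似，其中feat_a、feat_b、max_disp等名称以及沿特征通道求均值的归一化方式均为示例性假设，并非本公开限定的实现）：

```python
import torch
import torch.nn.functional as F

def horizontal_correlation(feat_a: torch.Tensor, feat_b: torch.Tensor, max_disp: int = 40) -> torch.Tensor:
    """feat_a、feat_b: [B, C, H, W]，分别为双目图像中一个图像与另一个图像提取的特征；
    返回 [B, 2*max_disp+1, H, W] 的相关系数图，第k个通道对应水平偏移 d = k - max_disp。"""
    W = feat_a.shape[-1]
    padded_b = F.pad(feat_b, (max_disp, max_disp))      # 在宽度方向两侧补零，便于取相邻d个像素位置
    corrs = []
    for k in range(2 * max_disp + 1):
        shifted_b = padded_b[..., k:k + W]              # 另一个图像特征在水平方向偏移后的结果
        corrs.append((feat_a * shifted_b).mean(dim=1))  # 沿特征通道求内积均值，作为该偏移下的相关系数
    return torch.stack(corrs, dim=1)
```

这样得到的相关系数即按像素位置和通道保存，对应上文所述的相关系数图。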
212,基于另一个图像与相关系数,生成第一图像与第二图像的视差图(disparity)。
其中，视差图中至少一个像素的取值分别表示第一图像拍摄场景中的某一点的视差，即：该点在第一图像坐标系中的坐标与在第二图像坐标系中的坐标之间的差值。
在一个可选示例中,该操作212可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的生成单元执行。
214,基于上述视差图获取第一图像对应的深度信息。
在本公开实施例的又一个实施方式中，可以基于上述视差图和相机参数获取上述第一图像对应的深度信息，例如，可以基于上述视差图、拍摄第一图像的相机的焦距和双目图像对应的双目相机之间的距离，获取上述第一图像对应的深度信息。
例如,可以通过但不限于公式:Z=f×B/Disp,获取上述第一图像对应的深度信息。
其中,Disp代表预测的视差图,f为拍摄第一图像的相机的焦距,B为双目相机之间的距离,Z即为所要预测的单目全局深度图。
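作为上述公式的一个简单示例（其中焦距与基线的具体数值仅为示意性假设），可以按如下方式由视差图换算深度：

```python
import numpy as np

def disparity_to_depth(disp: np.ndarray, focal_px: float, baseline: float, eps: float = 1e-6) -> np.ndarray:
    """disp: 预测得到的视差图Disp（单位：像素）；focal_px: 拍摄第一图像的相机焦距f（以像素计）；
    baseline: 双目相机之间的距离B。返回深度图 Z = f * B / Disp。"""
    return focal_px * baseline / np.maximum(disp, eps)  # 对视差设置下限，避免除零

# 例如：f = 720像素、B = 0.54米、某像素视差为36像素时，该像素深度约为 720*0.54/36 = 10.8 米。
```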
在一个可选示例中,该操作214可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的第三获取单元执行。
本公开实施例,将单目深度估计的问题转化为双目立体匹配的问题,通过这种方式,将较为困难的深度估计问题转化为了匹配两张图像相似像素点的问题,这种匹配不再需要推测单张图像中至少两个像素间的几何关系,降低了计算的复杂度。另外,本公开实施例利用深度学习方法,可以更好的实现合成第二图像和双目立体匹配两个操作,并通过将几何的变换显性地设置在第一神经网络和第二神经网络中,提高了操作结果的准确性。
本公开实施例通过使用单张图片合成相对应的右图再进行双目立体匹配,在用于对第一神经网络和第二神经网络进行训练时,不再需要大量精密的深度标签作为监督,只需要使用容易得到的矫正好的双目图像训练第一神经网络(也称为图像合成网络),使用大量计算机渲染的双目图像和深度图训练第二神经网络(也称为:双目立体匹配网络),相比于大量精密的深度标签,降低了训练数据开销。
在本公开实施例的其中一个实施方式中,操作202中,经第一神经网络对第一图像进行处理,输出视差概率图,可以包括:
分别通过第一神经网络中的两个或以上网络深度的网络层对第一图像进行特征提取,得到两个或以上尺度(即:大小)的特征图;在本公开中,至少两个为两个或者两个以上。
基于上述两个或以上尺度的特征图分别获取两个或以上分辨率的、N个通道的初步视差概率图;
分别针对每个通道,将上述两个或以上分辨率的初步视差概率图放大至第一图像的分辨率下进行叠加,得到N个通道的视差概率图。
因为神经网络中的池化层效果,在神经网络的不同阶段会产生不同大小、分辨率的特征图,基于不同大小、分辨率的特征图能够产生不同大小、分辨率的初步视差概率图,从而有助于为预测深度信息提供不同的局部信息和全局信息。
例如,第一图像是W*H*N为200*200*3的红绿蓝(RGB)图,通过第一神经网络的某一网络深度的网络层得到100*100*64的特征图,再继续经过另一网络深度的网络层得到50*50*128的特征图,基于这两个不同大小的特征图,可以获得不同大小、分辨率的初步视差概率图,例如得到100*100*N以及50*50*N的初步视差概率图。其中,第一图像的分辨率为200*200,两个初步视差概率图的分辨率分别为100*100和50*50,两个初步视差概率图的分辨率分别为第一图像的分辨率大小的1/2*1/2、1/4*1/4。
基于本实施例,第一神经网络中的两个或以上网络深度的网络层提取的特征的尺寸不同,其中,网络深度较浅的网络层提取的特征感受野较小,体现第一图像中较小区域的信息,网络深度较深的网络层提取的特征感受野较大,可以体现第一图像中较大区域的信息、甚至全局信息,使用不同分辨率的特征图同时提供不同视野域的信息,可以产生更准确的概率视差图。
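下面的代码草图示意了上述"将多尺度初步视差概率图放大叠加后得到N个通道视差概率图"的一种可能实现（其中1x1卷积头、双线性上采样与softmax归一化均为示例性假设，并非本公开限定的网络结构）：

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityProbabilityHead(nn.Module):
    def __init__(self, in_channels=(64, 128), num_disp: int = 5):
        super().__init__()
        # 对每个尺度的特征图各接一个1x1卷积，得到该分辨率下N个通道的初步视差概率图（未归一化）
        self.heads = nn.ModuleList([nn.Conv2d(c, num_disp, kernel_size=1) for c in in_channels])

    def forward(self, feats, out_size):
        """feats: 不同网络深度提取的特征图列表（例如 [B,64,100,100] 与 [B,128,50,50]）；
        out_size: 第一图像的分辨率 (H, W)。"""
        logits = 0
        for feat, head in zip(feats, self.heads):
            prelim = head(feat)
            # 将初步视差概率图放大至第一图像的分辨率并逐通道叠加
            logits = logits + F.interpolate(prelim, size=out_size, mode='bilinear', align_corners=False)
        # 归一化后，同一个像素在所有通道上的概率值之和为1
        return torch.softmax(logits, dim=1)
```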
在本公开实施例的其中一个实施方式中,操作210可以包括:
分别对一个图像与另一个图像进行特征提取。例如,可以通过一段卷积神经网络,分别对一个图像与另一个图像进行特征提取;
通过第二神经网络,基于提取的一个图像的特征与另一个图像的特征,获取一个图像 与另一个图像中像素的位置关系,并输出相关系数。
相应地,在本公开实施例的另一个实施方式中,操作212可以包括:将另一个图像的特征与相关系数进行叠加,生成第一图像与第二图像的视差图。这样,基于单张图像即可获得该单张图像拍摄场景在双目图像中的视差图,将较为困难的深度估计问题转化为了匹配两张图像相似像素点的问题,这种匹配不再需要推测单张图像中至少两个像素间的几何关系,降低了计算的复杂度。另外,本实施例利用深度学习方法,将几何变换显性地设置在第二神经网络中,提高了操作结果的准确性。
可选地,在本公开实施例的又一个实施方式中,该操作212可以包括:将另一个图像的特征与相关系数进行叠加,得到叠加结果,该叠加结果例如可以是一个特征图;提取叠加结果的特征,并将提取的叠加结果的特征与叠加结果进行融合,获得第一图像与第二图像的视差图。
在其中一个实施方式中，可以通过一段卷积神经网络提取叠加结果的特征，该卷积神经网络例如可以包括但不限于一层卷积层和一层激活层(ReLU)。该卷积神经网络例如可以通过一个编码-解码模型实现，通过卷积层对叠加结果进行特征提取，得到一个与叠加结果相同大小的特征图，将该特征图与叠加结果进行融合(concat)，获得第一图像与第二图像的视差图。
本公开实施例通过对叠加结果进行进一步特征提取,可以加大感受野的范围,再将提取的叠加结果的特征与叠加结果进行融合,获得第一图像与第二图像的视差图,使得视差图可以融合较多的信息,能够获取更多的全局信息,从而有助于提升后续预测的第一图像对应的深度信息。
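下面给出"对叠加结果提取特征并与叠加结果融合(concat)后回归视差图"的一种可能写法的代码草图（层数、通道数等均为示例性假设）：

```python
import torch
import torch.nn as nn

class FuseAndPredictDisparity(nn.Module):
    def __init__(self, in_channels: int, mid_channels: int = 64):
        super().__init__()
        # 一层卷积层加一层激活层(ReLU)，用于提取叠加结果的特征、加大感受野
        self.extract = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 融合(concat)之后回归单通道的视差图
        self.predict = nn.Conv2d(in_channels + mid_channels, 1, kernel_size=3, padding=1)

    def forward(self, stacked: torch.Tensor) -> torch.Tensor:
        feat = self.extract(stacked)               # 提取叠加结果的特征
        fused = torch.cat([stacked, feat], dim=1)  # 将提取的特征与叠加结果进行融合
        return self.predict(fused)                 # 输出视差图
```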
图3为本公开深度估计方法一个应用实施例的流程图。图4为图3所示实施例对应的示例性框图。该应用实施例中,分别以左图、右图作为本公开上述至少一个实施例中的第一图像和第二图像进行说明。参见图3和图4,该应用实施例包括:
302,以单张图片作为双目图像中的左图,经第一神经网络对该左图进行处理,输出N个通道的视差概率图。
其中，每个通道的视差概率图分别表示左图上像素水平向左偏移i个视差的概率；i=0,1,…,N-1，N的取值为大于1的整数。
在一个可选示例中,该操作302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的第一子神经网络单元执行。
304,根据上述N个通道的视差概率图,将左图分别向第一水平方向偏移i个像素,得到N张偏移图。
在一个可选示例中,该操作304可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的偏移单元执行。
306,将上述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图,得到N个点乘结果。
在一个可选示例中,该操作306可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的点乘单元执行。
308,将上述N个点乘结果基于像素进行叠加,得到双目图像中的右图。
在一个可选示例中,该操作308可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的图像获取模块的加法单元执行。
310,通过第二神经网络,分别对左图与右图进行特征提取。
在一个可选示例中,该操作310可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的第一获取单元的第二子神经网络单元执行。
312,通过第二神经网络,基于提取的左图的特征与右图的特征,获取左图与右图中 像素的位置关系,并输出相关系数。
在一个可选示例中,该操作312可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的第一获取单元的获取子单元执行。
314,将左图的特征与相关系数进行叠加,得到叠加结果,该叠加结果例如可以是一个特征图。
可选地,为了得到与左图像素对齐的视差图,可以先将从左图得到的特征图通过神经网络进行进一步特征提取,再将提取到的特征与相关系数进行叠加。其中,该神经网络可以示例性地由一层卷积层和一层激活层组成,对从左图得到的特征图进行进一步特征提取,可以加大感受野的范围,得到进一步处理后的特征图(feature),再与相关系数进行叠加,从而使得叠加结果可以包括更多的全局信息,提高后续获得的视差图和深度信息的准确性。
在一个可选示例中,该操作314可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的生成单元的叠加子单元执行。
316,通过第二神经网络,提取叠加结果的特征,并将提取的叠加结果的特征与叠加结果进行融合,获得第一图像与第二图像的视差图。
在一个可选示例中,该操作316可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的生成单元的融合子单元执行。
318,基于上述视差图、拍摄第一图像的相机的焦距和双目图像对应的双目相机之间的距离,获取第一图像对应的深度信息。
在一个可选示例中,该操作318可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的立体匹配模块的第三获取单元执行。
基于本公开上述至少一个实施例,获得深度信息后,还可以基于第一图像对应的深度信息与第二图像,获取该第一图像中场景的三维空间场景。
本公开实施例例如可以应用但不限于以下方面:
1,对单张图片进行全局深度估计;
2，本公开实施例可以被应用于三维场景重建，使用预测得到的第一图像对应的深度信息(也称为：全局深度图)可被应用到多种场景中，例如自动驾驶、三维场景恢复、3D电影制作等等。使用本公开实施例，只需要单张图片即可得到较好的效果，降低了成本。
使用预测得到的第一图像对应的深度信息、结合合成的右图,可以将原图(即:第一图像)中的整个场景的三维空间场景恢复出来,恢复出来的三维空间场景有许多应用场景,例如3D电影、自动驾驶等等。
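作为参考，基于深度信息恢复三维空间场景时，常见做法是按针孔相机模型将像素反投影到三维空间；下面是一个示意性示例（相机内参fx、fy、cx、cy为假设输入，并非本公开限定的做法）：

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """depth: [H, W] 的深度图；返回 [H*W, 3] 的三维点坐标 (X, Y, Z)。"""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # u为列坐标，v为行坐标
    Z = depth
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
```

将各点与原图对应像素的颜色组合，即可得到带纹理的三维点云。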
可选的,本公开上述至少一个实施例中,第一神经网络可以利用第一样本集中的样本双目图像预先训练而得,该第一样本集包括至少一组第一样本双目图像;和/或,第二神经网络可以利用第二样本集中的样本双目图像预先训练而得。
相应地,在本公开上述至少一个实施例之前,还可以包括:
利用第一样本集中的样本双目图像对第一神经网络进行训练，以及利用第二样本集中的样本双目图像和视差图标签对第二神经网络进行训练。其中，第一样本集包括至少一组第一样本双目图像，每组第一样本双目图像包括第一图像和第二图像；第二样本集包括至少一组第二样本双目图像和视差图标签。
在其中一个实施方式中,利用第一样本集中的样本双目图像对第一神经网络进行训练,可以包括:
经第一神经网络,由至少一组第一样本双目图像中的第一图像,获取至少一组第一样本双目图像中的第二图像并输出;
获取第一神经网络输出的第二图像与至少一组第一样本双目图像中的第二图像之间的第一差异,并基于第一差异对第一神经网络进行训练,直至满足第一训练完成条件。
在其中一个可选示例中,获取第一神经网络输出的第二图像与至少一组第一样本双目图像中的第二图像之间的第一差异,并基于第一差异对第一神经网络进行训练,直至满足第一训练完成条件,可以包括:
获取第一神经网络输出的第二图像与至少一组第一样本双目图像中的第二图像之间在像素上的第一差异,例如,在像素上差值的绝对值之和;
基于第一差异调整第一神经网络中网络参数的参数值,直至满足第一训练完成条件。
其中,第一训练完成条件例如可以包括但不限于:第一差异小于第一预设阈值,和/或,对第一神经网络的训练次数达到第一预设次数。
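下面给出利用第一差异训练第一神经网络的一个示意性训练循环草图（其中first_net、dataloader、优化器以及阈值、最大训练次数等均为示例性假设）：

```python
import torch

def train_first_network(first_net, dataloader, optimizer,
                        diff_threshold: float = 0.01, max_steps: int = 100000):
    step = 0
    while step < max_steps:
        for left, right_gt in dataloader:    # 每组第一样本双目图像中的第一图像与第二图像
            right_pred = first_net(left)     # 经第一神经网络由第一图像获取第二图像
            # 第一差异：输出的第二图像与样本第二图像在像素上差值的绝对值（此处取均值形式）
            first_diff = torch.mean(torch.abs(right_pred - right_gt))
            optimizer.zero_grad()
            first_diff.backward()
            optimizer.step()
            step += 1
            # 第一训练完成条件：第一差异小于第一预设阈值，和/或训练次数达到第一预设次数
            if first_diff.item() < diff_threshold or step >= max_steps:
                return
```

第二神经网络基于视差图标签的训练过程与之类似，仅将第一差异替换为视差图上的第二差异。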
在另一个实施方式中,利用第二样本集中的样本双目图像和视差图标签对第二神经网络进行训练,可以包括:
经第二神经网络,获取至少一组第二样本双目图像的视差图并输出;
获取第二神经网络输出的视差图与至少一组第二样本双目图像的视差图标签之间的第二差异,并基于第二差异对第二神经网络进行训练,直至满足第二训练完成条件。
在其中一个可选示例中,获取第二神经网络输出的视差图与至少一组第二样本双目图像的视差图标签之间的第二差异,并基于第二差异对第二神经网络进行训练,直至满足第二训练完成条件,可以包括:
获取第二神经网络输出的视差图与至少一组第二样本双目图像的视差图标签之间在像素上的第二差异,例如在像素上差值的绝对值之和;
基于第二差异调整第二神经网络中网络参数的参数值,直至满足第二训练完成条件。
其中,第二训练完成条件例如可以包括但不限于:第二差异小于第二预设阈值,和/或,对第二神经网络的训练次数达到第二预设次数。
由于真实采集的深度图标签不容易获取,基于本公开实施例在实际应用中,可以使用计算机合成的左图、右图和深度图标签作为第二样本集中的第二样本双目图像和第二样本图像对应的深度图标签,训练第二神经网络。
另外,通过本公开上述实施例对第一神经网络和第二神经网络分阶段训练完成后,还可以包括:
利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对第一神经网络和第二神经网络进行训练。
其中,第三样本集包括至少一组第三样本双目图像和第三样本图像对应的深度图标签。
在其中一个实施方式中,利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对第一神经网络和第二神经网络进行训练,可以包括:
经第一神经网络,由至少一组第三样本双目图像中的第一图像,获取至少一组第三样本双目图像中的第二图像;
经第二神经网络,获取至少一组第三样本双目图像的视差图;
基于至少一组第三样本双目图像的视差图获取至少一组深度信息;
获取至少一组深度信息与至少一组第三样本双目图像的深度图标签之间的第三差异,该第三差异例如可以是在像素上差值的绝对值之和;
基于第三差异调整第一神经网络和第二神经网络中网络参数的参数值,直至满足第三训练完成条件。
其中,第三训练完成条件例如可以包括但不限于:第三差异小于第三预设阈值,和/或,对第一神经网络和第二神经网络的训练次数达到第三预设次数。
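下面给出利用第三样本集对第一神经网络和第二神经网络联合训练的一个示意性代码草图（其中由视差换算深度的方式、各名称与超参数均为示例性假设）：

```python
import torch

def finetune_jointly(first_net, second_net, dataloader, optimizer,
                     focal_px: float, baseline: float,
                     diff_threshold: float = 0.05, max_steps: int = 50000):
    step = 0
    while step < max_steps:
        for left, depth_gt in dataloader:                # 第三样本双目图像中的第一图像及其深度图标签
            right = first_net(left)                      # 经第一神经网络获取第二图像
            disp = second_net(left, right)               # 经第二神经网络获取视差图
            depth_pred = focal_px * baseline / disp.clamp(min=1e-6)  # 基于视差图获取深度信息
            # 第三差异：深度信息与深度图标签在像素上差值的绝对值（此处取均值形式）
            third_diff = torch.mean(torch.abs(depth_pred - depth_gt))
            optimizer.zero_grad()
            third_diff.backward()
            optimizer.step()
            step += 1
            if third_diff.item() < diff_threshold or step >= max_steps:
                return
```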
本公开实施例提供的任一种深度估计方法可以由任意适当的具有数据处理能力的设备执行，包括但不限于：终端设备和服务器等。或者，本公开实施例提供的任一种深度估计方法可以由处理器执行，如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种深度估计方法。下文不再赘述。
本领域普通技术人员可以理解：实现上述方法实施例的全部或部分操作可以通过程序指令相关的硬件来完成，前述的程序可以存储于一计算机可读取存储介质中，该程序在执行时，执行包括上述方法实施例的操作；而前述的存储介质包括：ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
图5为本公开深度估计装置一个实施例的结构示意图。该实施例的深度估计装置可用于实现本公开上述至少一个深度估计方法实施例。如图5所示,该实施例的深度估计装置包括:图像获取模块和立体匹配模块。其中:
图像获取模块,用于以单张图片作为双目图像中的第一图像,经第一神经网络,基于第一图像获取双目图像中的第二图像。
立体匹配模块,用于经第二神经网络,通过对第一图像与第二图像进行双目立体匹配,获取第一图像对应的深度信息。
基于本公开上述实施例提供的深度估计装置,以单张图片作为双目图像中的第一图像,经第一神经网络,基于第一图像获取双目图像中的第二图像,经第二神经网络,基于对第一图像与第二图像进行双目立体匹配,获取第一图像对应的深度信息,由此基于单张图片实现了该单张图片中场景的深度估计,而不需要双目相机,避免了双目相机产生的额外硬件开销,降低了成本;并且,可以避免双目相机设定不准确导致获得的深度信息错误,提高了深度估计的准确性。
图6为本公开深度估计装置另一个实施例的结构示意图。如图6所示,在本公开至少一个实施例的其中一个实施方式中,图像获取模块包括:第一子神经网络单元,偏移单元,点乘单元和加法单元。其中:
第一子神经网络单元,用于对第一图像进行处理,输出N个通道的视差概率图;其中,每个通道的视差概率图表示第一图像上像素向第一水平方向偏移i个视差的概率,i=0,1,…,N-1,N的取值为大于1的整数。
偏移单元,用于根据N个通道的视差概率图,将第一图像分别向第一水平方向偏移i个像素,得到N张偏移图。
点乘单元，用于将N张偏移图中的各张偏移图分别点乘对应通道的视差概率图，得到N个点乘结果。
加法单元,用于将N个点乘结果基于像素进行叠加,得到第二图像。
在其中一个可选示例中,第一子神经网络单元,包括两个或以上网络深度的网络层,用于:分别通过两个或以上网络深度的网络层对第一图像进行特征提取,获得两个或以上尺度的特征图;基于两个或以上尺度的特征图分别获取两个或以上分辨率的、N个通道的初步视差概率图;以及分别针对每个通道,将两个或以上分辨率的初步视差概率图放大至第一图像的分辨率下进行叠加,得到N个通道的视差概率图。
可选地,第一图像中对应前景对象的像素在N个通道的视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;第一图像中对应背景对象的像素在N个通道的视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。
另外,再参见图6,在本公开至少一个实施例的另一个实施方式中,立体匹配模块可以包括:第一获取单元,生成单元和第三获取单元。其中:
第一获取单元,用于获取用于表示双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数;双目图像中一个图像包括第一图像或第二图像,另一个图像对应包括第二图像或第一图像。
生成单元,用于基于另一个图像与相关系数,生成第一图像与第二图像的视差图。
第三获取单元,基于视差图获取第一图像对应的深度信息。
在其中一个可选示例中,第一获取单元可以包括:第二子神经网络单元,用于分别对一个图像与另一个图像进行特征提取;获取子单元,用于基于提取的一个图像的特征与另一个图像的特征,获取一个图像与另一个图像中像素的位置关系,并输出用于表示双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数。
相应地,在另一个可选示例中,生成单元用于将另一个图像的特征与相关系数进行叠加,生成第一图像与第二图像的视差图。
在另一个可选示例中,生成单元可以包括:叠加子单元,用于将另一个图像的特征与相关系数进行叠加,得到叠加结果;融合子单元,用于提取叠加结果的特征,并将提取的叠加结果的特征与叠加结果进行融合,获得第一图像与第二图像的视差图。
在又一个可选示例中,第三获取单元用于基于视差图、拍摄第一图像的相机的焦距和双目图像对应的双目相机之间的距离,获取第一图像对应的深度信息。
另外,本公开上述至少一个实施例的深度估计装置中,还可以包括:获取模块,用于基于第一图像对应的深度信息与第二图像,获取第一图像中场景的三维空间场景。
如图7所示,为本公开深度估计装置又一个实施例的结构示意图。图7中,图像获取模块和立体匹配模块可以选择性地采用图6所示任一实施例的结构实现,也可以采用其他结构实现。
另外,本公开上述至少一个实施例的深度估计装置中,第一神经网络可以利用第一样本集中的样本双目图像预先训练而得,第一样本集包括至少一组第一样本双目图像。和/或,第二神经网络可以利用第二样本集中的样本双目图像预先训练而得;第二样本集包括至少一组第二样本双目图像和视差图标签。
再参见图7,在本公开深度估计装置的再一个实施例中,还包括第一训练模块。该实施例中,第一神经网络,用于由至少一组第一样本双目图像中的第一图像,获取至少一组第一样本双目图像中的第二图像并输出。第一训练模块,用于获取第一神经网络输出的第二图像与至少一组第一样本双目图像中的第二图像之间的第一差异,并基于第一差异对第一神经网络进行训练,直至满足第一训练完成条件。
在其中一个实施方式中,第一训练模块用于:获取第一神经网络输出的第二图像与至少一组第一样本双目图像中的第二图像之间在像素上的第一差异;
基于第一差异调整第一神经网络中网络参数的参数值,直至满足第一训练完成条件。其中,第一训练完成条件例如可以包括但不限于:第一差异小于第一预设阈值,和/或,对第一神经网络的训练次数达到第一预设次数。
再参见图7,在本公开深度估计装置的再一个实施例中,还可以包括第二训练模块。该实施例中,第二神经网络,用于获取至少一组第二样本双目图像的视差图并输出。第二训练模块,用于获取第二神经网络输出的视差图与至少一组第二样本双目图像的视差图标签之间的第二差异,并基于第二差异对第二神经网络进行训练,直至满足第二训练完成条件。
在其中一个实施方式中,第二训练模块具体用于:获取第二神经网络输出的视差图与至少一组第二样本双目图像的视差图标签之间在像素上的第二差异;
基于第二差异调整第二神经网络中网络参数的参数值,直至满足第二训练完成条件。其中,第二训练完成条件例如可以包括但不限于:第二差异小于第二预设阈值,和/或,对第二神经网络的训练次数达到第二预设次数。
进一步地，再参见图7，在本公开深度估计装置的再一个实施例中，还可以包括第三训练模块，用于利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对第一神经网络和第二神经网络进行训练。其中，第三样本集包括至少一组第三样本双目图像和第三样本图像对应的深度图标签。
在其中一个实施方式中,第一神经网络用于由至少一组第三样本双目图像中的第一图像,获取至少一组第三样本双目图像中的第二图像;第二神经网络,用于获取至少一组第三样本双目图像的视差图。
第三训练模块用于：基于至少一组第三样本双目图像的视差图获取至少一组深度信息；获取至少一组深度信息与至少一组第三样本双目图像的深度图标签之间的第三差异；基于第三差异调整第一神经网络和第二神经网络中网络参数的参数值，直至满足第三训练完成条件。其中，第三训练完成条件例如可以包括但不限于：第三差异小于第三预设阈值，和/或，对第一神经网络和第二神经网络的训练次数达到第三预设次数。本公开深度估计装置实施例中至少一个方案的技术效果，可参见相应方法实施例中的相应描述，在此不再赘述。
另外,本公开实施例提供的一种电子设备,包括:
存储器,用于存储可执行指令;以及
处理器,用于与所述存储器通信以执行所述可执行指令从而完成本公开上述任一实施例所述深度估计方法的操作。
图8为本公开电子设备一个应用实施例的结构示意图。下面参考图8，其示出了适于用来实现本公开实施例的终端设备或服务器的电子设备的结构示意图。如图8所示，该电子设备包括一个或多个处理器、通信部等，所述一个或多个处理器例如：一个或多个中央处理单元(CPU)801，和/或一个或多个加速单元(GPU)813等，加速单元813可包括但不限于GPU、FPGA、其他类型的专用处理器等，处理器可以根据存储在只读存储器(ROM)802中的可执行指令或者从存储部分808加载到随机访问存储器(RAM)803中的可执行指令而执行各种适当的动作和处理。通信部812可包括但不限于网卡，所述网卡可包括但不限于IB(Infiniband)网卡，处理器可与只读存储器802和/或随机访问存储器803通信以执行可执行指令，通过总线804与通信部812相连、并经通信部812与其他目标设备通信，从而完成本公开实施例提供的任一方法对应的操作，例如，以单张图片作为双目图像中的第一图像，经第一神经网络，基于所述第一图像获取所述双目图像中的第二图像；经第二神经网络，通过对所述第一图像与所述第二图像进行双目立体匹配，获取所述第一图像对应的深度信息。
本公开电子设备中各方案的技术效果，可参见相应方法实施例中的相应描述，在此不再赘述。
此外,在RAM803中,还可存储有装置操作所需的各种程序和数据。CPU801、ROM802以及RAM803通过总线804彼此相连。在有RAM803的情况下,ROM802为可选模块。RAM803存储可执行指令,或在运行时向ROM802中写入可执行指令,可执行指令使处理器执行本公开上述任一方法对应的操作。输入/输出(I/O)接口805也连接至总线804。通信部812可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口805:包括键盘、鼠标等的输入部分806;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。
需要说明的是，如图8所示的架构仅为一种可选实现方式，在具体实践过程中，可根据实际需要对上述图8的部件数量和类型进行选择、删减、增加或替换；在不同功能部件设置上，也可采用分离设置或集成设置等实现方式，例如加速单元813和CPU801可分离设置或者可将加速单元813集成在CPU801上，通信部812可分离设置，也可集成设置在CPU801或加速单元813上，等等。这些可替换的实施方式均落入本公开的保护范围。
特别地，根据本公开的实施例，上文参考流程图描述的过程可以被实现为计算机软件程序。例如，本公开的实施例包括一种计算机程序产品，其包括有形地包含在机器可读介质上的计算机程序，计算机程序包含用于执行流程图所示的方法的程序代码，程序代码可包括与执行本公开实施例提供的深度估计方法的操作相对应的指令。在这样的实施例中，该计算机程序可以通过通信部分从网络上被下载和安装，和/或从可拆卸介质被安装。在该计算机程序被CPU执行时，执行本公开的方法中限定的上述功能。
另外,本公开实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本公开任一实施例所述深度估计方法中操作的指令。
另外,本公开实施例还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,指令被执行时实现本公开任一实施例所述深度估计方法中的操作。
本领域普通技术人员可以理解：实现上述方法实施例的全部或部分操作可以通过程序指令相关的硬件来完成，前述的程序可以存储于一计算机可读取存储介质中，该程序在执行时，执行包括上述方法实施例的操作；而前述的存储介质包括：ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本公开的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的操作的上述顺序仅是为了进行说明,本公开的方法的操作不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述是为了示例和描述起见而给出的，而并非详尽无遗，也不旨在将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用，并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。

Claims (37)

  1. 一种深度估计方法,其特征在于,包括:
    以单张图片作为双目图像中的第一图像,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像;
    经第二神经网络,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息。
  2. 根据权利要求1所述的方法,其特征在于,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像,包括:
    经第一神经网络对所述第一图像进行处理,输出N个通道的视差概率图;其中,每个通道的视差概率图表示所述第一图像上像素向第一水平方向偏移i个视差的概率,i=0,1,…,N-1,N的取值为大于1的整数;
    根据所述N个通道的视差概率图,将所述第一图像分别向第一水平方向偏移i个像素,得到N张偏移图;
    将所述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图,得到N个点乘结果;
    将所述N个点乘结果基于像素进行叠加,得到所述第二图像。
  3. 根据权利要求2所述的方法,其特征在于,所述经第一神经网络对所述第一图像进行处理,输出N个通道的视差概率图,包括:
    分别通过第一神经网络中至少两个网络深度的网络层对所述第一图像进行特征提取,获得至少两个尺度的特征图;
    基于所述至少两个尺度的特征图分别获取至少两个分辨率的、N个通道的初步视差概率图;
    分别针对每个通道,将所述至少两个分辨率的初步视差概率图放大至所述第一图像的分辨率下进行叠加,得到所述N个通道的视差概率图。
  4. 根据权利要求2或3所述的方法,其特征在于,所述第一图像中对应前景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;所述第一图像中对应背景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。
  5. 根据权利要求1-4任一所述的方法,其特征在于,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息,包括:
    获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数;所述双目图像中一个图像包括所述第一图像或所述第二图像,所述另一个图像对应包括所述第二图像或所述第一图像;
    基于所述另一个图像与所述相关系数,生成所述第一图像与所述第二图像的视差图;
    基于所述视差图获取所述第一图像对应的深度信息。
  6. 根据权利要求5所述的方法,其特征在于,所述获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数,包括:
    分别对所述一个图像与所述另一个图像进行特征提取;
    基于提取的所述一个图像的特征与所述另一个图像的特征,获取所述一个图像与所述另一个图像中像素的位置关系,并输出用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的所述相关系数。
  7. 根据权利要求6所述的方法,其特征在于,基于所述另一个图像与所述相关系数,生成第一图像与所述第二图像的视差图,包括:
    将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图。
  8. 根据权利要求7所述的方法,其特征在于,将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图,包括:
    将所述另一个图像的特征与所述相关系数进行叠加,得到叠加结果;
    提取所述叠加结果的特征,并将提取的所述叠加结果的特征与所述叠加结果进行融合,获得所述第一图像与所述第二图像的视差图。
  9. 根据权利要求6-8任一所述的方法,其特征在于,所述基于所述视差图获取所述第一图像对应的深度信息,包括:
    基于所述视差图、拍摄所述第一图像的相机的焦距和所述双目图像对应的双目相机之间的距离,获取所述第一图像对应的深度信息。
  10. 根据权利要求1-9任一所述的方法,其特征在于,还包括:
    基于所述第一图像对应的所述深度信息与所述第二图像,获取所述第一图像中场景的三维空间场景。
  11. 根据权利要求5-10任一所述的方法,其特征在于,所述第一神经网络利用第一样本集中的样本双目图像预先训练而得,所述第一样本集包括至少一组第一样本双目图像;和/或,
    所述第二神经网络利用第二样本集中的样本双目图像预先训练而得;所述第二样本集包括至少一组第二样本双目图像和视差图标签。
  12. 根据权利要求11所述的方法,其特征在于,所述第一神经网络的训练包括:
    经所述第一神经网络,由所述至少一组第一样本双目图像中的第一图像,获取所述至少一组第一样本双目图像中的第二图像并输出;
    获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件。
  13. 根据权利要求12所述的方法,其特征在于,获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件,包括:
    获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间在像素上的第一差异;
    基于所述第一差异调整所述第一神经网络中网络参数的参数值,直至满足第一训练完成条件;
    所述第一训练完成条件包括:所述第一差异小于第一预设阈值,和/或,对所述第一神经网络的训练次数达到第一预设次数。
  14. 根据权利要求11-13任一所述的方法,其特征在于,所述第二神经网络的训练包括:
    经所述第二神经网络,获取所述至少一组第二样本双目图像的视差图并输出;
    获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件。
  15. 根据权利要求14所述的方法,其特征在于,获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件,包括:
    获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间在像素上的第二差异；
    基于所述第二差异调整所述第二神经网络中网络参数的参数值,直至满足第二训练完成条件;
    所述第二训练完成条件包括:所述第二差异小于第二预设阈值,和/或,对所述第二神经网络的训练次数达到第二预设次数。
  16. 根据权利要求11-15任一所述的方法,其特征在于,所述第一神经网络和所述第二神经网络的训练还包括:
    利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对所述第一神经网络和所述第二神经网络进行训练;
    其中,所述第三样本集包括至少一组第三样本双目图像和第三样本图像对应的深度图标签。
  17. 根据权利要求16所述的方法,其特征在于,所述利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对所述第一神经网络和所述第二神经网络进行训练,包括:
    经所述第一神经网络,由所述至少一组第三样本双目图像中的第一图像,获取所述至少一组第三样本双目图像中的第二图像;
    经所述第二神经网络,获取所述至少一组第三样本双目图像的视差图;
    基于所述至少一组第三样本双目图像的视差图获取至少一组深度信息;
    获取所述至少一组深度信息与所述至少一组第三样本双目图像的深度图标签之间的第三差异;
    基于所述第三差异调整所述第一神经网络和所述第二神经网络中网络参数的参数值,直至满足第三训练完成条件;
    所述第三训练完成条件包括:所述第三差异小于第三预设阈值,和/或,对所述第一神经网络和所述第二神经网络的训练次数达到第三预设次数。
  18. 一种深度估计装置,其特征在于,包括:
    图像获取模块,用于以单张图片作为双目图像中的第一图像,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像;
    立体匹配模块,用于经第二神经网络,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息。
  19. 根据权利要求18所述的装置,其特征在于,所述图像获取模块包括:
    第一子神经网络单元,用于对所述第一图像进行处理,输出N个通道的视差概率图;其中,每个通道的视差概率图表示所述第一图像上像素向第一水平方向偏移i个视差的概率,i=0,1,…,N-1,N的取值为大于1的整数;
    偏移单元,用于根据所述N个通道的视差概率图,将所述第一图像分别向第一水平方向偏移i个像素,得到N张偏移图;
    点乘单元，用于将所述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图，得到N个点乘结果；
    加法单元,用于将所述N个点乘结果基于像素进行叠加,得到所述第二图像。
  20. 根据权利要求19所述的装置,其特征在于,所述第一子神经网络单元,包括至少两个网络深度的网络层,用于:
    分别通过至少两个网络深度的网络层对所述第一图像进行特征提取,获得至少两个尺度的特征图;
    基于所述至少两个尺度的特征图分别获取至少两个分辨率的、N个通道的初步视差概率图;
    分别针对每个通道,将所述至少两个分辨率的初步视差概率图放大至所述第一图像的分辨率下进行叠加,得到所述N个通道的视差概率图。
  21. 根据权利要求19或20所述的装置,其特征在于,所述第一图像中对应前景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;所述第一图像中对应背景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。
  22. 根据权利要求18-21任一所述的装置,其特征在于,所述立体匹配模块包括:
    第一获取单元,用于获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数;所述双目图像中一个图像包括所述第一图像或所述第二图像,所述另一个图像对应包括所述第二图像或所述第一图像;
    生成单元,用于基于所述另一个图像与所述相关系数,生成所述第一图像与所述第二图像的视差图;
    第三获取单元,基于所述视差图获取所述第一图像对应的深度信息。
  23. 根据权利要求22所述的装置,其特征在于,所述第一获取单元,包括:
    第二子神经网络单元,用于分别对所述一个图像与所述另一个图像进行特征提取;
    获取子单元,用于基于提取的所述一个图像的特征与所述另一个图像的特征,获取所述一个图像与所述另一个图像中像素的位置关系,并输出用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的所述相关系数。
  24. 根据权利要求23所述的装置,其特征在于,所述生成单元,用于将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图。
  25. 根据权利要求24所述的装置,其特征在于,所述生成单元,包括:
    叠加子单元,用于将所述另一个图像的特征与所述相关系数进行叠加,得到叠加结果;
    融合子单元,用于提取所述叠加结果的特征,并将提取的所述叠加结果的特征与所述叠加结果进行融合,获得所述第一图像与所述第二图像的视差图。
  26. 根据权利要求23-25任一所述的装置,其特征在于,所述第三获取单元,用于基于所述视差图、拍摄所述第一图像的相机的焦距和所述双目图像对应的双目相机之间的距离,获取所述第一图像对应的深度信息。
  27. 根据权利要求18-26任一所述的装置,其特征在于,还包括:
    获取模块,用于基于所述第一图像对应的所述深度信息与所述第二图像,获取所述第一图像中场景的三维空间场景。
  28. 根据权利要求22-27任一所述的装置,其特征在于,所述第一神经网络利用第一样本集中的样本双目图像预先训练而得,所述第一样本集包括至少一组第一样本双目图像;和/或,
    所述第二神经网络利用第二样本集中的样本双目图像预先训练而得;所述第二样本集包括至少一组第二样本双目图像和视差图标签。
  29. 根据权利要求28所述的装置,其特征在于,所述第一神经网络,用于由所述至少一组第一样本双目图像中的第一图像,获取所述至少一组第一样本双目图像中的第二图像并输出;
    所述装置还包括:
    第一训练模块,用于获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件。
  30. 根据权利要求29所述的装置,其特征在于,所述第一训练模块,用于:
    获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间在像素上的第一差异；
    基于所述第一差异调整所述第一神经网络中网络参数的参数值,直至满足第一训练完成条件;
    所述第一训练完成条件包括:所述第一差异小于第一预设阈值,和/或,对所述第一神经网络的训练次数达到第一预设次数。
  31. 根据权利要求28-30任一所述的装置,其特征在于,所述第二神经网络,用于获取所述至少一组第二样本双目图像的视差图并输出;
    所述装置还包括:
    第二训练模块,用于获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件。
  32. 根据权利要求31所述的装置,其特征在于,所述第二训练模块,用于:
    获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间在像素上的第二差异;
    基于所述第二差异调整所述第二神经网络中网络参数的参数值,直至满足第二训练完成条件;
    所述第二训练完成条件包括:所述第二差异小于第二预设阈值,和/或,对所述第二神经网络的训练次数达到第二预设次数。
  33. 根据权利要求28-32任一所述的装置，其特征在于，还包括：
    第三训练模块,用于利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对所述第一神经网络和所述第二神经网络进行训练;
    其中,所述第三样本集包括至少一组第三样本双目图像和第三样本图像对应的深度图标签。
  34. 根据权利要求33所述的装置,其特征在于,所述第一神经网络,用于由所述至少一组第三样本双目图像中的第一图像,获取所述至少一组第三样本双目图像中的第二图像;
    所述第二神经网络,用于获取所述至少一组第三样本双目图像的视差图;
    所述第三训练模块,用于:
    基于所述至少一组第三样本双目图像的视差图获取至少一组深度信息;
    获取所述至少一组深度信息与所述至少一组第三样本双目图像的深度图标签之间的第三差异;
    基于所述第三差异调整所述第一神经网络和所述第二神经网络中网络参数的参数值,直至满足第三训练完成条件;
    所述第三训练完成条件包括:所述第三差异小于第三预设阈值,和/或,对所述第一神经网络和所述第二神经网络的训练次数达到第三预设次数。
  35. 一种电子设备,其特征在于,包括:
    存储器,用于存储可执行指令;以及
    处理器,用于与所述存储器通信以执行所述可执行指令从而完成权利要求1-17任一所述方法的操作。
  36. 一种计算机程序,包括计算机可读代码,其特征在于,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1-17任一所述方法中操作的指令。
  37. 一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现权利要求1-17任一所述方法中的操作。
PCT/CN2019/073820 2018-02-01 2019-01-30 深度估计方法和装置、电子设备、程序和介质 WO2019149206A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11202003141PA SG11202003141PA (en) 2018-02-01 2019-01-30 Depth estimation method and apparatus, electronic device, program, and medium
KR1020207009470A KR102295403B1 (ko) 2018-02-01 2019-01-30 깊이 추정 방법 및 장치, 전자 기기, 프로그램 및 매체
JP2020517931A JP6951565B2 (ja) 2018-02-01 2019-01-30 深度推定方法及び装置、電子機器並びに媒体
US16/835,418 US11308638B2 (en) 2018-02-01 2020-03-31 Depth estimation method and apparatus, electronic device, program, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810103195.0 2018-02-01
CN201810103195.0A CN108335322B (zh) 2018-02-01 2018-02-01 深度估计方法和装置、电子设备、程序和介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/835,418 Continuation US11308638B2 (en) 2018-02-01 2020-03-31 Depth estimation method and apparatus, electronic device, program, and medium

Publications (1)

Publication Number Publication Date
WO2019149206A1 true WO2019149206A1 (zh) 2019-08-08

Family

ID=62928066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073820 WO2019149206A1 (zh) 2018-02-01 2019-01-30 深度估计方法和装置、电子设备、程序和介质

Country Status (6)

Country Link
US (1) US11308638B2 (zh)
JP (1) JP6951565B2 (zh)
KR (1) KR102295403B1 (zh)
CN (1) CN108335322B (zh)
SG (1) SG11202003141PA (zh)
WO (1) WO2019149206A1 (zh)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335322B (zh) * 2018-02-01 2021-02-12 深圳市商汤科技有限公司 深度估计方法和装置、电子设备、程序和介质
CN110622213B (zh) * 2018-02-09 2022-11-15 百度时代网络技术(北京)有限公司 利用3d语义地图进行深度定位和分段的系统和方法
CN109299656B (zh) * 2018-08-13 2021-10-22 浙江零跑科技股份有限公司 一种车载视觉系统场景视深确定方法
CN109598754B (zh) * 2018-09-29 2020-03-17 天津大学 一种基于深度卷积网络的双目深度估计方法
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
CN111209770B (zh) * 2018-11-21 2024-04-23 北京三星通信技术研究有限公司 一种车道线识别方法及装置
CN111210467A (zh) * 2018-12-27 2020-05-29 上海商汤智能科技有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN111383256B (zh) * 2018-12-29 2024-05-17 北京市商汤科技开发有限公司 图像处理方法、电子设备及计算机可读存储介质
CN109741388B (zh) * 2019-01-29 2020-02-28 北京字节跳动网络技术有限公司 用于生成双目深度估计模型的方法和装置
CN109840500B (zh) * 2019-01-31 2021-07-02 深圳市商汤科技有限公司 一种三维人体姿态信息检测方法及装置
CN110223334B (zh) * 2019-05-07 2021-09-14 深圳云天励飞技术有限公司 一种景深图获取方法及装置
CN109934307B (zh) * 2019-05-08 2021-04-09 北京奇艺世纪科技有限公司 视差图预测模型训练方法、预测方法、装置及电子设备
US20200364442A1 (en) * 2019-05-15 2020-11-19 Getac Technology Corporation System for detecting surface pattern of object and artificial neural network-based method for detecting surface pattern of object
CN112434702A (zh) * 2019-08-26 2021-03-02 阿里巴巴集团控股有限公司 图像处理方法、装置、计算机设备、存储介质
US11294996B2 (en) 2019-10-15 2022-04-05 Assa Abloy Ab Systems and methods for using machine learning for image-based spoof detection
US11348375B2 (en) 2019-10-15 2022-05-31 Assa Abloy Ab Systems and methods for using focal stacks for image-based spoof detection
CN111047634B (zh) * 2019-11-13 2023-08-08 杭州飞步科技有限公司 场景深度的确定方法、装置、设备及存储介质
CN112991254A (zh) * 2019-12-13 2021-06-18 上海肇观电子科技有限公司 视差估计系统、方法、电子设备及计算机可读存储介质
CN113034568B (zh) * 2019-12-25 2024-03-29 杭州海康机器人股份有限公司 一种机器视觉深度估计方法、装置、系统
CN111652922B (zh) * 2020-06-04 2023-09-08 江苏天宏机械工业有限公司 一种基于双目视觉的单目视频深度估计方法
US11275959B2 (en) * 2020-07-07 2022-03-15 Assa Abloy Ab Systems and methods for enrollment in a multispectral stereo facial recognition system
US11836965B2 (en) 2020-08-12 2023-12-05 Niantic, Inc. Determining visual overlap of images by using box embeddings
CN112489103B (zh) * 2020-11-19 2022-03-08 北京的卢深视科技有限公司 一种高分辨率深度图获取方法及系统
CN112446328B (zh) * 2020-11-27 2023-11-17 汇纳科技股份有限公司 单目深度的估计系统、方法、设备及计算机可读存储介质
CN112903952B (zh) * 2021-01-21 2022-05-27 北京航空航天大学 一种金属板结构损伤评价系统和方法
CN112861940A (zh) * 2021-01-26 2021-05-28 上海西井信息科技有限公司 双目视差估计方法、模型训练方法以及相关设备
CN112949504B (zh) * 2021-03-05 2024-03-19 深圳市爱培科技术股份有限公司 立体匹配方法、装置、设备及存储介质
CN112967332B (zh) * 2021-03-16 2023-06-16 清华大学 基于门控成像的双目深度估计方法、装置及计算机设备
US11823402B2 (en) 2021-05-03 2023-11-21 Electronics And Telecommunications Research Institute Method and apparatus for correcting error in depth information estimated from 2D image
KR102641108B1 (ko) * 2021-08-03 2024-02-27 연세대학교 산학협력단 깊이맵 완성 장치 및 방법
CN113928282A (zh) * 2021-11-24 2022-01-14 扬州大学江都高端装备工程技术研究所 融合路面环境和车辆安全模型的辅助巡航主动刹车方法
CN114627535B (zh) * 2022-03-15 2024-05-10 平安科技(深圳)有限公司 基于双目摄像头的坐标匹配方法、装置、设备及介质
CN114615507B (zh) * 2022-05-11 2022-09-13 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) 一种图像编码方法、解码方法及相关装置
CN115937290B (zh) * 2022-09-14 2024-03-22 北京字跳网络技术有限公司 一种图像深度估计方法、装置、电子设备及存储介质
CN116129036B (zh) * 2022-12-02 2023-08-29 中国传媒大学 一种深度信息引导的全方向图像三维结构自动恢复方法
CN117726666B (zh) * 2024-02-08 2024-06-04 北京邮电大学 跨相机单目图片度量深度估计方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523464A (zh) * 2011-12-12 2012-06-27 上海大学 一种双目立体视频的深度图像估计方法
WO2018006296A1 (en) * 2016-07-06 2018-01-11 SZ DJI Technology Co., Ltd. Systems and methods for stereoscopic imaging
CN107578435A (zh) * 2017-09-11 2018-01-12 清华-伯克利深圳学院筹备办公室 一种图像深度预测方法及装置
CN108335322A (zh) * 2018-02-01 2018-07-27 深圳市商汤科技有限公司 深度估计方法和装置、电子设备、程序和介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02101584A (ja) * 1988-10-11 1990-04-13 Nippon Telegr & Teleph Corp <Ntt> ステレオ画像処理方式
CN101907448B (zh) * 2010-07-23 2013-07-03 华南理工大学 一种基于双目三维视觉的深度测量方法
KR101691034B1 (ko) * 2010-08-03 2016-12-29 삼성전자주식회사 3차원 그래픽 기반 단말기에서 객체 렌더링 시 부가정보 합성 장치 및 방법
JP6210483B2 (ja) * 2012-04-26 2017-10-11 国立大学法人山口大学 立体内視鏡画像からの3次元形状取得装置
CN102750702B (zh) * 2012-06-21 2014-10-15 东华大学 基于优化bp神经网络模型的单目红外图像深度估计方法
WO2016105541A1 (en) * 2014-12-24 2016-06-30 Reald Inc. Adjustment of perceived roundness in stereoscopic image of a head
US9811756B2 (en) * 2015-02-23 2017-11-07 Mitsubishi Electric Research Laboratories, Inc. Method for labeling images of street scenes
GB2553782B (en) * 2016-09-12 2021-10-20 Niantic Inc Predicting depth from image data using a statistical model
CN106355570B (zh) * 2016-10-21 2019-03-19 昆明理工大学 一种结合深度特征的双目立体视觉匹配方法
CN106612427B (zh) * 2016-12-29 2018-07-06 浙江工商大学 一种基于卷积神经网络的时空一致性深度图序列的生成方法
CN106504190B (zh) * 2016-12-29 2019-09-13 浙江工商大学 一种基于3d卷积神经网络的立体视频生成方法
RU2698402C1 (ru) * 2018-08-30 2019-08-26 Самсунг Электроникс Ко., Лтд. Способ обучения сверточной нейронной сети для восстановления изображения и система для формирования карты глубины изображения (варианты)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021084530A1 (en) * 2019-10-27 2021-05-06 Ramot At Tel-Aviv University Ltd. Method and system for generating a depth map
CN112862877A (zh) * 2021-04-09 2021-05-28 北京百度网讯科技有限公司 用于训练图像处理网络和图像处理的方法和装置
CN112862877B (zh) * 2021-04-09 2024-05-17 北京百度网讯科技有限公司 用于训练图像处理网络和图像处理的方法和装置

Also Published As

Publication number Publication date
CN108335322A (zh) 2018-07-27
JP6951565B2 (ja) 2021-10-20
JP2020535547A (ja) 2020-12-03
CN108335322B (zh) 2021-02-12
US11308638B2 (en) 2022-04-19
KR102295403B1 (ko) 2021-08-31
KR20200049833A (ko) 2020-05-08
US20200226777A1 (en) 2020-07-16
SG11202003141PA (en) 2020-05-28

Similar Documents

Publication Publication Date Title
WO2019149206A1 (zh) 深度估计方法和装置、电子设备、程序和介质
EP3698323B1 (en) Depth from motion for augmented reality for handheld user devices
US11468585B2 (en) Pseudo RGB-D for self-improving monocular slam and depth prediction
TWI766175B (zh) 單目圖像深度估計方法、設備及儲存介質
Luo et al. Single view stereo matching
EP2992508B1 (en) Diminished and mediated reality effects from reconstruction
US9237330B2 (en) Forming a stereoscopic video
CN111899282B (zh) 基于双目摄像机标定的行人轨迹跟踪方法及装置
Guizilini et al. Full surround monodepth from multiple cameras
TW202117611A (zh) 電腦視覺訓練系統及訓練電腦視覺系統的方法
KR100560464B1 (ko) 관찰자의 시점에 적응적인 다시점 영상 디스플레이 시스템을 구성하는 방법
JP2020523703A (ja) ダブル視野角画像較正および画像処理方法、装置、記憶媒体ならびに電子機器
JP7184748B2 (ja) 場面の積層深度データを生成するための方法
US20170064279A1 (en) Multi-view 3d video method and system
US9483836B2 (en) Method and apparatus for real-time conversion of 2-dimensional content to 3-dimensional content
JPWO2021076757A5 (zh)
CN111598927B (zh) 一种定位重建方法和装置
US11810308B2 (en) Vertical disparity detection in stereoscopic images using a deep neural network
CN111260544B (zh) 数据处理方法及装置、电子设备和计算机存储介质
Chantara et al. Initial depth estimation using EPIs and structure tensor
San et al. Early experience of depth estimation on intricate objects using generative adversarial networks
Xian et al. ViTA: Video Transformer Adaptor for Robust Video Depth Estimation
KR20220071935A (ko) 광학 흐름을 이용한 고해상도 깊이 영상 추정 방법 및 장치
CN116402878A (zh) 光场图像处理方法及装置
Takaya et al. Interactive 3D Contents Generation for Auto-stereoscopic Display based on Depth Camera

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19747562; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020517931; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20207009470; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2020))
122 Ep: pct application non-entry in european phase (Ref document number: 19747562; Country of ref document: EP; Kind code of ref document: A1)