WO2019149206A1 - Depth estimation method and apparatus, electronic device, program, and medium - Google Patents
Depth estimation method and apparatus, electronic device, program, and medium
- Publication number
- WO2019149206A1 (PCT/CN2019/073820)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- neural network
- binocular
- sample
- disparity
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/85—Stereo camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to computer vision technology, and more particularly to a depth estimation method and apparatus, an electronic device, a computer program, and a computer readable storage medium.
- Depth estimation is an important issue in the field of computer vision. Accurate depth estimation methods have important value in many fields, such as autonomous driving, 3D scene reconstruction and augmented reality.
- Embodiments of the present disclosure provide a depth estimation technical solution.
- a depth estimation method, including: using a single image as the first image in a binocular image pair, and acquiring, via a first neural network, the second image in the binocular image pair based on the first image; and acquiring, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
- a depth estimating apparatus including:
- An image acquisition module configured to use a single image as the first image in the binocular image, and acquire the second image in the binocular image based on the first image via the first neural network;
- a stereo matching module configured to acquire, by the second neural network, the depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
- an electronic device including:
- a memory for storing executable instructions
- a processor for communicating with the memory to execute the executable instructions to perform the operations of the method of any of the above-described embodiments of the present disclosure.
- a computer program comprising computer readable code that, when run on a device, causes a processor in the device to execute instructions for implementing the operations of the method of any of the above embodiments of the present disclosure.
- a computer readable storage medium for storing computer readable instructions that, when executed, implement the operations of the method of any of the above embodiments of the present disclosure.
- In the depth estimation method and apparatus, the electronic device, the computer program, and the computer readable storage medium provided by the above embodiments of the present disclosure, a single image is used as the first image in a binocular image pair, the second image in the pair is acquired based on the first image via the first neural network, and depth information corresponding to the first image is acquired by performing binocular stereo matching on the first image and the second image via the second neural network. Depth estimation of the scene in a single image is thereby achieved without a binocular camera, avoiding the extra hardware overhead of a binocular camera and reducing cost, and avoiding depth errors caused by inaccurate binocular camera calibration, which improves the accuracy of the depth estimate.
- FIG. 1 is a flowchart of an embodiment of a depth estimation method according to the present disclosure.
- FIG. 2 is a flowchart of another embodiment of a depth estimation method of the present disclosure.
- FIG. 3 is a flowchart of an application embodiment of a depth estimation method according to the present disclosure.
- FIG. 4 is an exemplary block diagram corresponding to the embodiment shown in FIG. 3.
- FIG. 5 is a schematic structural diagram of an embodiment of a depth estimating apparatus according to the present disclosure.
- FIG. 6 is a schematic structural diagram of another embodiment of a depth estimating apparatus according to the present disclosure.
- FIG. 7 is a schematic structural diagram of still another embodiment of the depth estimating apparatus of the present disclosure.
- FIG. 8 is a schematic structural diagram of an application embodiment of an electronic device according to the present disclosure.
- Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above, and the like.
- Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
- program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
- program modules may be located on a local or remote computing system storage medium including storage devices.
- the depth estimation method of this embodiment includes:
- the single image is used as the first image in the binocular image, and the second image in the binocular image is acquired based on the first image via the first neural network.
- The binocular images are two images taken by a binocular camera, or two images out of multiple images taken by a multi-camera rig, and may be referred to as a left image and a right image. When the first image is the left image, the second image is the right image; when the first image is the right image, the second image is the left image.
- The binocular images may also be referred to as a primary image and a secondary image: when either image of the binocular pair is used as the primary image, the other serves as the secondary image.
- the operation 102 may be performed by a processor invoking a corresponding instruction stored in a memory or by an image acquisition module executed by the processor.
- The first neural network and the second neural network may each be a multi-layer neural network (i.e., a deep neural network), for example a multi-layer convolutional neural network such as LeNet, AlexNet, GoogLeNet, VGG, or ResNet.
- the first neural network and the second neural network may employ a neural network of the same type and structure, or a neural network of different types and structures.
- the operation 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a stereo matching module executed by the processor.
- The inventors have found through research that current depth estimation methods fall mainly into two categories.
- The first supervises a neural network with a large number of pixel-level depth labels and obtains depth estimates from the trained network; however, obtaining depth labels is expensive, and existing technology cannot produce high-quality, dense depth labels.
- The second is depth estimation based on binocular stereo matching: two images taken from different viewpoints are used as input and, based on geometric rules, depth is obtained by computing the disparity of corresponding pixels between the two images.
- The accuracy of this second category is limited by the calibration of the binocular camera, and it incurs additional hardware overhead because a binocular camera is required.
- In the embodiments of the present disclosure, a single image is used as the first image in a binocular image pair, the second image in the pair is acquired based on the first image via the first neural network, and the second neural network obtains depth information by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in a single image is thereby realized without a binocular camera, avoiding the extra hardware overhead of a binocular camera and reducing cost; depth errors caused by inaccurate binocular camera calibration are also avoided, improving the accuracy of the depth estimate.
- the depth estimation method of this embodiment includes:
- 202: Take a single image as the first image in the binocular pair, process the first image with the first neural network, and output disparity probability maps of N channels. The disparity probability map of the i-th channel (i = 0, 1, ..., N-1, where N is an integer greater than 1) indicates the probability that a pixel on the first image is offset by i disparities in the first horizontal direction.
- When the first image is the left image, the first horizontal direction is horizontally to the left; when the first image is the right image, the first horizontal direction is horizontally to the right. That is, when the first image is the left image, the disparity probability map of the i-th channel indicates the probability that a pixel on the left image is shifted horizontally to the left by i disparities.
- For example, with N = 5, the disparity probability maps of the first through fifth channels indicate the probabilities that a pixel on the left image is shifted left by 0, 1, 2, 3, and 4 disparities, respectively.
- For a given pixel, the probabilities of being shifted left by 0, 1, 2, 3, and 4 disparities may be, for example, 0.3, 0.4, 0.2, 0.1, and 0, respectively.
- the operation 202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first sub-neural network unit of an image acquisition module operated by the processor.
- the operation 204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an offset unit of an image acquisition module that is executed by the processor.
- the operation 206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a point multiply unit of an image acquisition module operated by the processor.
- When the first image in the binocular pair is the left image taken by the binocular camera, the second image is the right image taken by the binocular camera; when the first image is the right image taken by the binocular camera, the second image is the left image taken by the binocular camera.
- A pixel corresponding to a foreground object in the first image has, at its corresponding pixel position in the disparity probability maps, a larger probability value in the disparity probability map of the channel corresponding to the larger disparity;
- a pixel corresponding to a background object in the first image has, at its corresponding pixel position in the disparity probability maps, a larger probability value in the disparity probability map of the channel corresponding to the smaller disparity.
- For example, suppose the first image includes a background and a face as the foreground object. A pixel corresponding to the face might have a probability value of 0.8 in the disparity probability map of the channel (among the N channels) corresponding to the larger disparity and 0.1 in the channel corresponding to the smaller disparity, whereas a pixel corresponding to the background would have a larger probability value in the channel corresponding to the smaller disparity and a probability value of 0 in the channel corresponding to the larger disparity.
- the operation 208 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an addition unit of an image acquisition module operated by the processor.
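- To make operations 202-208 concrete, the following is a minimal PyTorch sketch of the view synthesis step: the N offset maps are formed by shifting the first image by i pixels in the first horizontal direction, each offset map is point-multiplied by the disparity probability map of its channel, and the N results are superimposed pixel-wise. The function name, the zero padding at the image border, and the assumption that the N probabilities at each pixel sum to 1 are illustrative choices, not specified by this disclosure.

```python
import torch

def synthesize_second_image(first, prob):
    """Synthesize the second view from the first image (operations 204-208).

    first: (B, C, H, W) first image of the binocular pair.
    prob:  (B, N, H, W) disparity probability maps; channel i holds the
           probability that a pixel is offset by i disparities in the first
           horizontal direction (assumed normalized so the N values at each
           pixel sum to 1, e.g. by a softmax).
    """
    _, _, _, W = first.shape
    N = prob.shape[1]
    out = torch.zeros_like(first)
    for i in range(N):
        # Offset map i: shift the first image by i pixels in the first
        # horizontal direction (to the left when the first image is the
        # left image); the vacated border is zero-padded here.
        if i == 0:
            shifted = first
        else:
            shifted = torch.zeros_like(first)
            shifted[:, :, :, : W - i] = first[:, :, :, i:]
        # Point-multiply the offset map by the disparity probability map of
        # the corresponding channel, then superimpose pixel-wise.
        out = out + shifted * prob[:, i : i + 1]
    return out
```

- Because every step above is differentiable, a loss on the synthesized second image can propagate gradients back into the network that predicts the disparity probability maps, which is what makes the training described later possible.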
- Each pixel position in the first image can be regarded as a random variable whose value is the disparity taken in the disparity probability maps, i.e., one of 0, 1, ..., N-1.
- The correlation coefficients indicating the positional relationship of one image of the binocular pair relative to pixels in the other image comprise the correlation coefficients between the variable at every pixel position in the one image and the variables at the adjacent d pixel positions in the other image, and may be expressed as a W*H*N correlation coefficient map or a correlation coefficient matrix, where W, H, and N respectively denote the width, height, and number of channels of an image, each an integer greater than 0.
- The one image of the binocular pair includes the first image or the second image, and the other image correspondingly includes the second image or the first image.
- one image in the binocular image includes the first image or the second image, and the other image corresponds to the second image or the first image in the binocular image.
- the operation 210 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition unit of a stereo matching module operated by the processor.
- The value of each pixel in the disparity map represents the disparity of a point in the scene captured by the first image, i.e., the difference between that point's coordinates in the first image coordinate system and its coordinates in the second image coordinate system.
- the operation 212 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a generating unit of a stereo matching module operated by the processor.
- The depth information corresponding to the first image may be acquired based on the disparity map and the camera parameters, for example based on the disparity map, the focal length of the camera that captured the first image, and the distance between the binocular cameras corresponding to the binocular images, according to Z = f * B / Disp, where:
- Disp is the predicted disparity map;
- f is the focal length of the camera that captured the first image;
- B is the distance (baseline) between the binocular cameras;
- Z is the monocular global depth map to be predicted.
- the operation 214 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a third acquisition unit of a stereo matching module operated by the processor.
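- A minimal sketch of operation 214, converting the predicted disparity map to depth with the relation Z = f * B / Disp; the epsilon clamp guarding against zero disparity is an added safeguard, not part of the disclosure.

```python
import torch

def disparity_to_depth(disp, f, B, eps=1e-6):
    """Operation 214 as a one-liner: Z = f * B / Disp.

    disp: (..., H, W) predicted disparity map in pixels.
    f:    focal length of the camera that captured the first image (pixels).
    B:    distance (baseline) between the binocular cameras, e.g. in meters.
    The eps clamp avoids division by zero where the disparity is 0
    (points at infinity).
    """
    return f * B / disp.clamp(min=eps)
```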
- In this way, the monocular depth estimation problem is transformed into a binocular stereo matching problem.
- The harder depth estimation problem is converted into the problem of matching similar pixel points between two images, and matching no longer requires inferring the geometric relationship between pixels within a single image, which reduces computational complexity.
- By using deep learning and explicitly encoding the geometric transformation in the first neural network and the second neural network, the embodiments of the present disclosure can better implement the two operations of synthesizing the second image and binocular stereo matching, improving the accuracy of the results.
- The embodiments of the present disclosure perform binocular stereo matching by synthesizing the corresponding right image from a single image; when training the first and second neural networks, large numbers of precise depth labels are no longer needed for supervision. It is only necessary to train the first neural network (also called the image synthesis network) with easily rectified binocular images, and to train the second neural network (also called the binocular stereo matching network) with a large number of computer-rendered binocular images and depth maps, which reduces training-data overhead compared with large numbers of precise depth labels.
- Processing the first image with the first neural network to output the disparity probability maps may include:
- performing feature extraction on the first image via network layers at two or more network depths in the first neural network, obtaining feature maps of two or more scales (i.e., sizes) — in the present disclosure, "at least two" means two or more;
- acquiring preliminary disparity probability maps of N channels at two or more resolutions based on the feature maps of the two or more scales; and
- for each channel, enlarging the preliminary disparity probability maps of the two or more resolutions to the resolution of the first image and superimposing them to obtain the disparity probability maps of the N channels.
- Feature maps of different sizes and resolutions are generated at different stages of the neural network, and these can produce preliminary disparity probability maps of different sizes and resolutions, thereby helping to provide different local and global information for predicting depth information.
- For example, the first image is a red-green-blue (RGB) image of size 200*200*3. A network layer at one depth of the first neural network produces a 100*100*64 feature map, and a network layer at another, greater depth produces a 50*50*128 feature map.
- From these, preliminary disparity probability maps of different sizes and resolutions can be obtained, for example a 100*100*N map and a 50*50*N map.
- The resolution of the first image is 200*200, and the resolutions of the two preliminary disparity probability maps are 100*100 and 50*50, i.e., 1/2*1/2 and 1/4*1/4 of the first image's resolution, respectively.
- The features extracted by network layers at different depths in the first neural network have different receptive fields: features extracted by shallow layers have smaller receptive fields and reflect information of smaller regions in the first image, while features extracted by deeper layers have larger receptive fields and reflect information of larger regions, or even global information. Using feature maps of different resolutions to simultaneously provide information from different fields of view can produce a more accurate disparity probability map.
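- The following is a hedged PyTorch sketch of such a multi-scale first sub-network: two stages of different depths produce feature maps at 1/2 and 1/4 resolution, per-scale heads predict preliminary N-channel maps, and the maps are enlarged to the input resolution and superimposed. The layer widths, the two-stage depth, and the softmax normalization are illustrative assumptions; the disclosure fixes only the overall multi-scale scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityProbabilityNet(nn.Module):
    """Toy first sub-network: predicts N-channel disparity probability maps
    from feature maps at two network depths (operation 202)."""

    def __init__(self, n_disp=33):
        super().__init__()
        # Shallow stage: small receptive field, local information.
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        # Deeper stage: larger receptive field, more global information.
        self.stage2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        # Per-scale heads producing preliminary N-channel maps.
        self.head1 = nn.Conv2d(64, n_disp, 3, padding=1)
        self.head2 = nn.Conv2d(128, n_disp, 3, padding=1)

    def forward(self, image):
        h, w = image.shape[-2:]
        f1 = self.stage1(image)   # 1/2-resolution feature map
        f2 = self.stage2(f1)      # 1/4-resolution feature map
        p1 = self.head1(f1)       # preliminary map at 1/2 resolution
        p2 = self.head2(f2)       # preliminary map at 1/4 resolution
        # Enlarge both preliminary maps to the first image's resolution
        # and superimpose them.
        p1 = F.interpolate(p1, size=(h, w), mode="bilinear", align_corners=False)
        p2 = F.interpolate(p2, size=(h, w), mode="bilinear", align_corners=False)
        # Normalize over the N channels so each pixel carries a probability
        # distribution over disparities 0..N-1.
        return torch.softmax(p1 + p2, dim=1)
```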
- the operation 210 may include:
- Feature extraction is performed on the one image and the other image, respectively.
- For example, feature extraction can be performed on the one image and the other image through a convolutional neural network.
- The second neural network acquires, based on the extracted features of the one image and the features of the other image, the positional relationship between pixels of the one image and the other image, and outputs the correlation coefficients.
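- The disclosure does not give a formula for the correlation coefficients; a common construction for this kind of W*H*N correlation map (used, for example, in DispNetC-style stereo networks) is a 1-D correlation that, for each candidate horizontal offset d, averages the product of the two feature vectors. A sketch under that assumption:

```python
import torch

def correlation_1d(feat_a, feat_b, max_disp=40):
    """1-D correlation between the feature maps of the two views.

    feat_a, feat_b: (B, C, H, W) features of the one image and the other image.
    Returns a (B, max_disp + 1, H, W) correlation volume: channel d holds, at
    each pixel, the mean product of feat_a with feat_b shifted d pixels
    horizontally (out-of-range positions are left at zero).
    """
    B, C, H, W = feat_a.shape
    volume = feat_a.new_zeros(B, max_disp + 1, H, W)
    for d in range(max_disp + 1):
        if d == 0:
            volume[:, 0] = (feat_a * feat_b).mean(dim=1)
        else:
            volume[:, d, :, d:] = (feat_a[..., d:] * feat_b[..., :-d]).mean(dim=1)
    return volume
```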
- the operation 212 may include superimposing features of another image and correlation coefficients to generate a disparity map of the first image and the second image.
- In this way, the disparity map of the scene captured by the single image in the binocular pair can be obtained based on the single image, and the harder depth estimation problem is converted into the problem of matching similar pixel points between two images; matching no longer requires inferring the geometric relationship between pixels within a single image, which reduces computational complexity.
- the embodiment uses the deep learning method to explicitly set the geometric transformation in the second neural network, thereby improving the accuracy of the operation result.
- The operation 212 may include: superimposing the features of the other image with the correlation coefficients to obtain a superposition result, which may be, for example, a feature map; and
- extracting features of the superposition result and fusing the extracted features with the superposition result to obtain the disparity map of the first image and the second image.
- The features of the superposition result may be extracted by a convolutional neural network, which may, for instance, include, but is not limited to, a convolutional layer and an activation layer (ReLU).
- The convolutional neural network can be implemented, for example, as an encoding-decoding model: the convolutional layer extracts features of the superposition result to obtain a feature map of the same size as the superposition result, and this feature map is fused (concatenated, "concat") with the superposition result to obtain the disparity map of the first image and the second image.
- In this way, the range of the receptive field can be increased, and fusing the extracted features of the superposition result with the superposition result itself allows the disparity map to incorporate more information, including more global information, thereby helping to improve the subsequently predicted depth information corresponding to the first image.
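- A sketch of the generating unit under the description above: the other image's features are superimposed (concatenated) with the correlation coefficients, one convolutional layer plus one activation layer extracts features of the superposition result, and the extracted features are fused back with the superposition result before a 1-channel disparity regression. The channel widths and the single-layer extractor are illustrative; the disclosure also allows an encoding-decoding model here.

```python
import torch
import torch.nn as nn

class DisparityGenerator(nn.Module):
    """Generating-unit sketch: superimpose the other image's features with the
    correlation coefficients, extract features of the superposition result,
    fuse them back, and regress the disparity map."""

    def __init__(self, feat_ch=64, corr_ch=41):
        super().__init__()
        in_ch = feat_ch + corr_ch
        # One convolutional layer plus one activation layer (ReLU) extracts
        # features of the superposition result, enlarging the receptive field.
        self.extract = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        # The extracted features are fused (concatenated) with the
        # superposition result before a 1-channel disparity regression.
        self.predict = nn.Conv2d(in_ch * 2, 1, 3, padding=1)

    def forward(self, feat, corr):
        superposed = torch.cat([feat, corr], dim=1)        # superposition result
        extracted = self.extract(superposed)               # its features
        fused = torch.cat([extracted, superposed], dim=1)  # fusion (concat)
        return self.predict(fused)                         # disparity map
```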
- FIG. 3 is a flow chart of an application embodiment of the depth estimation method of the present disclosure.
- 4 is an exemplary block diagram corresponding to the embodiment shown in FIG. 3.
- In this application embodiment, the first image and the second image of the above at least one embodiment of the present disclosure are described as the left image and the right image, respectively.
- the application embodiment includes:
- A single image is used as the left image of the binocular pair; the left image is processed by the first neural network, and the disparity probability maps of N channels are output.
- the operation 302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first sub-neural network element of an image acquisition module operated by the processor.
- the operation 304 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an offset unit of an image acquisition module that is executed by the processor.
- the operation 306 may be performed by a processor invoking a corresponding instruction stored in a memory, or by a point multiply unit of an image acquisition module being executed by the processor.
- the operation 308 may be performed by a processor invoking a corresponding instruction stored in a memory, or by an addition unit of an image acquisition module being executed by the processor.
- the operation 310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second sub-neural network unit of a first acquisition unit of a stereo matching module operated by the processor.
- the operation 312 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition sub-unit of a first acquisition unit of a stereo matching module operated by the processor.
- The superposition result may be, for example, a feature map.
- The feature map obtained from the left image may be further processed by a neural network before its features are superimposed with the correlation coefficients.
- This neural network may, for example, consist of a convolutional layer and an activation layer; performing further feature extraction on the feature map obtained from the left image increases the range of the receptive field, and superimposing the further-processed feature map with the correlation coefficients lets the superposition result contain more global information, improving the accuracy of the subsequently obtained disparity map and depth information.
- the operation 314 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an overlay subunit of a generating unit of a stereo matching module operated by the processor.
- the operation 316 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a fusion sub-unit of a generating unit of a stereo matching module operated by the processor.
- the operation 318 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a third acquisition unit of a stereo matching module operated by the processor.
- the three-dimensional spatial scene of the scene in the first image may also be acquired based on the depth information corresponding to the first image and the second image.
- Embodiments of the present disclosure may be applied, for example, but not limited to, the following:
- Embodiments of the present disclosure may be applied to three-dimensional scene reconstruction: the predicted depth information corresponding to the first image (also referred to as a global depth map) can be applied in various scenarios, such as autonomous driving, three-dimensional scene restoration, 3D movie production, and more. With the embodiments of the present disclosure, only a single image is needed to obtain good results at reduced cost.
- The three-dimensional spatial scene of the entire scene in the original image (i.e., the first image) can be restored.
- The restored three-dimensional spatial scene has many application scenarios, for example 3D movies, autonomous driving, and the like.
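- For illustration, restoring the three-dimensional spatial scene from the predicted depth map can be done by back-projecting each pixel through a pinhole camera model; the principal point (cx, cy) is an assumed calibration input not discussed in this disclosure.

```python
import torch

def depth_to_point_cloud(depth, f, cx, cy):
    """Back-project a depth map into a 3-D point cloud (pinhole model).

    depth: (H, W) depth map Z produced by the method above.
    f:     focal length in pixels; (cx, cy): principal point.
    Returns an (H*W, 3) tensor of (X, Y, Z) camera-frame coordinates.
    """
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij")
    X = (u - cx) * depth / f
    Y = (v - cy) * depth / f
    return torch.stack([X, Y, depth], dim=-1).reshape(-1, 3)
```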
- The first neural network may be pre-trained using sample binocular images in a first sample set, the first sample set including at least one set of first sample binocular images; and/or the second neural network may be pre-trained using sample binocular images in a second sample set.
- the method may further include:
- the first neural network is trained using the sample binocular image in the first sample set
- the second neural network is trained using the sample binocular image and the depth map in the second sample set.
- The first sample set includes at least one set of first sample binocular images;
- each set of first sample binocular images includes a first image and a second image;
- the second sample set includes at least one set of second sample binocular images and disparity map labels.
- Training the first neural network with the sample binocular images in the first sample set may include:
- acquiring a first difference between the second image output by the first neural network and the second image of the at least one set of first sample binocular images, and training the first neural network based on the first difference until a first training completion condition is met, which may include:
- adjusting the parameter values of the network parameters in the first neural network based on the first difference until the first training completion condition is satisfied.
- The first training completion condition may include, but is not limited to: the first difference is less than a first preset threshold, and/or the number of training iterations of the first neural network reaches a first preset number.
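- A hedged sketch of this first-stage training: the pixel-wise L1 distance between the synthesized and ground-truth second images serves as the first difference (the disclosure does not fix a particular loss), and the loss threshold and step budget stand in for the first training completion condition.

```python
import torch.nn.functional as F

def train_first_network(first_net, loader, optimizer,
                        threshold=0.05, max_steps=100_000):
    """Train the image synthesis (first) network on rectified binocular pairs.

    loader yields (first_image, second_image) sample binocular pairs. The
    pixel-wise L1 distance plays the role of the 'first difference', and the
    threshold / step budget play the role of the first training completion
    condition.
    """
    step = 0
    for first_image, second_image in loader:
        pred_second = first_net(first_image)         # synthesized second image
        first_difference = F.l1_loss(pred_second, second_image)
        optimizer.zero_grad()
        first_difference.backward()
        optimizer.step()                             # adjust network parameters
        step += 1
        if first_difference.item() < threshold or step >= max_steps:
            break                                    # completion condition met
```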
- Training the second neural network with the sample binocular images and disparity map labels in the second sample set may include:
- acquiring a second difference between the disparity map output by the second neural network and the disparity map labels of the at least one set of second sample binocular images, and training the second neural network based on the second difference until a second training completion condition is met, which may include:
- adjusting the parameter values of the network parameters in the second neural network based on the second difference until the second training completion condition is satisfied.
- The second training completion condition may include, but is not limited to: the second difference is less than a second preset threshold, and/or the number of training iterations of the second neural network reaches a second preset number.
- Computer-rendered left images, right images, and depth map labels may be used as the second sample binocular images and the depth map labels corresponding to the second sample images in the second sample set to train the second neural network.
- the method may further include:
- the first neural network and the second neural network are trained using the sample binocular image in the third sample set and the depth map tag corresponding to the third sample image.
- the third sample set includes at least one set of third sample binocular images and a depth map label corresponding to the third sample image.
- Training the first neural network and the second neural network with the sample binocular images in the third sample set and the depth map labels corresponding to the third sample images may include: acquiring, via the first neural network, the second image in the at least one set of third sample binocular images from the first image in that set; acquiring, via the second neural network, the disparity map of the at least one set of third sample binocular images; acquiring at least one set of depth information based on the disparity map; acquiring a third difference between the at least one set of depth information and the depth map labels of the at least one set of third sample binocular images; and
- adjusting the parameter values of the network parameters in the first neural network and the second neural network based on the third difference until a third training completion condition is satisfied.
- the third training completion condition may include, but is not limited to, the third difference is less than the third preset threshold, and/or the number of times of training the first neural network and the second neural network reaches a third preset number of times.
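- A hedged sketch of one joint fine-tuning step on a third sample: the synthesized second image is fed to the stereo matching network, the predicted disparity is converted to depth via Z = f * B / Disp, and the third difference against the depth map label trains both networks end to end. The L1 loss and the second network's call signature are assumptions.

```python
import torch.nn.functional as F

def joint_finetune_step(first_net, second_net, first_image, depth_label, f, B):
    """One joint training step on a third-sample pair with a depth map label.

    The caller is assumed to zero gradients beforehand and to step one
    optimizer over the parameters of both networks afterwards.
    """
    pred_second = first_net(first_image)         # first neural network output
    disp = second_net(first_image, pred_second)  # second neural network output
    depth = f * B / disp.clamp(min=1e-6)         # depth from disparity
    third_difference = F.l1_loss(depth, depth_label)
    third_difference.backward()                  # gradients reach both networks
    return third_difference
```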
- any of the depth estimation methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
- any depth estimation method provided by an embodiment of the present disclosure may be performed by a processor, such as the processor performing any of the depth estimation methods mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in the memory. This will not be repeated below.
- The foregoing program may be stored in a computer readable storage medium; when executed, the program performs the operations of the foregoing method embodiments. The foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- FIG. 5 is a schematic structural diagram of an embodiment of a depth estimating apparatus according to the present disclosure.
- the depth estimation apparatus of this embodiment can be used to implement the above-described at least one depth estimation method embodiment of the present disclosure.
- The depth estimating apparatus of this embodiment includes an image acquisition module and a stereo matching module, wherein:
- an image obtaining module configured to use the single image as the first image in the binocular image, and acquire the second image in the binocular image based on the first image via the first neural network.
- the stereo matching module is configured to obtain, by the second neural network, the depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
- In the depth estimation apparatus, a single image is used as the first image in a binocular image pair, the second image in the pair is acquired based on the first image via the first neural network, and the second neural network obtains depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image. Depth estimation of the scene in a single image is thereby implemented without a binocular camera, avoiding the extra hardware overhead of a binocular camera and reducing cost; depth errors caused by inaccurate binocular camera calibration are also avoided, improving the accuracy of the depth estimation.
- FIG. 6 is a schematic structural view of another embodiment of a depth estimating apparatus according to the present disclosure.
- The image acquisition module includes a first sub-neural network unit, an offset unit, a point-multiplication unit, and an addition unit, wherein:
- the first sub-neural network unit is configured to process the first image and output disparity probability maps of N channels, where the disparity probability map of each channel indicates the probability that pixels on the first image are offset by i disparities in the first horizontal direction, i = 0, 1, ..., N-1, and N is an integer greater than 1;
- an offset unit configured to offset the first image by i pixels in the first horizontal direction according to the disparity probability map of the N channels, to obtain N offset maps.
- The point-multiplication unit is configured to point-multiply each of the N offset maps by the disparity probability map of the corresponding channel to obtain N point-multiplication results.
- The addition unit is configured to superimpose the N point-multiplication results pixel-wise to obtain the second image.
- The first sub-neural network unit includes network layers of two or more network depths and is configured to: perform feature extraction on the first image via the network layers of the two or more network depths to obtain feature maps of two or more scales; acquire preliminary disparity probability maps of N channels at two or more resolutions based on the feature maps of the two or more scales; and, for each channel, enlarge the preliminary disparity probability maps of the two or more resolutions to the resolution of the first image and superimpose them to obtain the disparity probability maps of the N channels.
- Pixels corresponding to foreground objects in the first image have, at their corresponding pixel positions in the disparity probability maps of the N channels, larger probability values in the disparity probability map of the channel corresponding to the larger disparity; pixels corresponding to background objects in the first image have, at their corresponding pixel positions in the disparity probability maps of the N channels, larger probability values in the disparity probability map of the channel corresponding to the smaller disparity.
- The stereo matching module may include a first acquisition unit, a generating unit, and a third acquisition unit, wherein:
- the first acquisition unit is configured to acquire correlation coefficients indicating the positional relationship of one image of the binocular images relative to pixels in the other image, where the one image of the binocular images includes the first image or the second image, and the other image correspondingly includes the second image or the first image;
- a generating unit configured to generate a disparity map of the first image and the second image based on another image and the correlation coefficient.
- the third acquisition unit is configured to acquire depth information corresponding to the first image based on the disparity map.
- The first acquisition unit may include: a second sub-neural network unit configured to perform feature extraction on the one image and the other image respectively; and an acquisition subunit configured to acquire, based on the extracted features of the one image and the features of the other image, the positional relationship between pixels of the one image and the other image, and to output the correlation coefficients indicating the positional relationship of one image of the binocular images relative to pixels in the other image.
- the generating unit is configured to superimpose the features of the other image with the correlation coefficients to generate a disparity map of the first image and the second image.
- The generating unit may include: a superposition subunit configured to superimpose the features of the other image with the correlation coefficients to obtain a superposition result; and a fusion subunit configured to extract features of the superposition result and to fuse the extracted features with the superposition result to obtain the disparity map of the first image and the second image.
- the third obtaining unit is configured to acquire depth information corresponding to the first image based on the disparity map, the focal length of the camera that captures the first image, and the distance between the binocular cameras corresponding to the binocular image.
- the depth estimating apparatus of the at least one embodiment of the present disclosure may further include: an acquiring module, configured to acquire a three-dimensional spatial scene of the scene in the first image based on the depth information corresponding to the first image and the second image.
- FIG. 7 is a schematic structural diagram of still another embodiment of the depth estimating apparatus of the present disclosure.
- the image acquisition module and the stereo matching module may be selectively implemented by using the structure of any of the embodiments shown in FIG. 6, or may be implemented by other structures.
- The first neural network may be pre-trained using the sample binocular images in the first sample set, the first sample set including at least one set of first sample binocular images.
- The second neural network may be pre-trained using the sample binocular images in the second sample set; the second sample set includes at least one set of second sample binocular images and disparity map labels.
- a first training module is further included.
- the first neural network is configured to acquire and output the second image in the at least one set of first sample binocular images from the first image in the at least one set of first sample binocular images.
- a first training module configured to acquire a first difference between the second image output by the first neural network and the second image of the at least one set of first sample binocular images, and to the first neural network based on the first difference Train until the first training completion condition is met.
- the first training module is configured to: acquire a first difference in pixels between the second image output by the first neural network and the second image in the at least one set of first sample binocular images;
- the parameter values of the network parameters in the first neural network are adjusted based on the first difference until the first training completion condition is satisfied.
- the first training completion condition may include, but is not limited to, the first difference is less than the first preset threshold, and/or the number of trainings for the first neural network reaches the first preset number of times.
- a second training module may be further included.
- the second neural network is configured to acquire and output a disparity map of at least one set of second sample binocular images.
- a second training module configured to acquire a second difference between the disparity map output by the second neural network and the disparity map label of the at least one second sample binocular image, and train the second neural network based on the second difference, Until the second training completion condition is met.
- the second training module is specifically configured to: acquire a second difference in pixels between the disparity map output by the second neural network and the disparity map label of the at least one second sample binocular image;
- the parameter values of the network parameters in the second neural network are adjusted based on the second difference until the second training completion condition is satisfied.
- the second training completion condition may include, but is not limited to, the second difference is less than the second preset threshold, and/or the number of times of training the second neural network reaches the second preset number of times.
- A third training module may further be included, configured to train the first neural network and the second neural network using the sample binocular images in the third sample set and the depth map labels corresponding to the third sample images.
- the third sample set includes at least one set of third sample binocular images and a depth map label corresponding to the third sample image.
- The first neural network is configured to acquire the second image in the at least one set of third sample binocular images from the first image in the at least one set of third sample binocular images; the second neural network is configured to acquire the disparity map of the at least one set of third sample binocular images.
- the third training module is configured to: acquire at least one set of depth information based on the disparity map of the at least one set of third sample binocular images; and obtain between the at least one set of depth information and the depth map label of the at least one set of third sample binocular images a third difference; adjusting parameter values of the network parameters in the first neural network and the second neural network based on the third difference until the third training completion condition is satisfied.
- the third training completion condition may include, but is not limited to, the third difference is less than the third preset threshold, and/or the number of times of training the first neural network and the second neural network reaches a third preset number of times.
- an electronic device provided by an embodiment of the present disclosure includes:
- a memory for storing executable instructions
- a processor for communicating with the memory to execute the executable instructions to perform the operations of the depth estimation method of any of the above-described embodiments of the present disclosure.
- FIG. 8 is a schematic structural diagram of an application embodiment of an electronic device according to the present disclosure. Referring to FIG. 8, there is shown a block diagram of an electronic device suitable for implementing a terminal device or a server of an embodiment of the present disclosure.
- As shown in FIG. 8, the electronic device includes one or more processors, a communication unit, and the like, for example one or more central processing units (CPUs) 801 and/or one or more acceleration units 813.
- The acceleration unit 813 may include, but is not limited to, a GPU, an FPGA, or other types of dedicated processors.
- The processor may perform various appropriate actions and processes in accordance with executable instructions stored in read-only memory (ROM) 802 or loaded from a storage portion 808 into random access memory (RAM) 803.
- The communication portion 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card; the processor may communicate with the read-only memory 802 and/or the random access memory 803 over the bus 804 to execute the executable instructions.
- For example, the executable instructions cause the processor to perform: using a single image as the first image in a binocular image pair and acquiring, via a first neural network, the second image in the binocular pair based on the first image; and acquiring, via a second neural network, depth information corresponding to the first image by performing binocular stereo matching on the first image and the second image.
- In the RAM 803, various programs and data required for the operation of the device can be stored.
- the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
- ROM 802 is an optional module.
- the RAM 803 stores executable instructions, or writes executable instructions to the ROM 802 at runtime, the executable instructions causing the processor to perform operations corresponding to any of the methods described above.
- An input/output (I/O) interface 805 is also coupled to bus 804.
- The communication unit 812 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards), each on the bus link.
- The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet.
- Driver 810 is also coupled to I/O interface 805 as needed.
- a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage portion 808 as needed.
- FIG. 8 is only an optional implementation manner.
- The number and types of the components in FIG. 8 may be selected, deleted, added, or replaced according to actual needs; different functional components may also be implemented separately or in an integrated manner.
- For example, the acceleration unit 813 and the CPU 801 may be separately disposed, or the acceleration unit 813 may be integrated on the CPU 801; the communication unit 812 may be separately disposed, or may be integrated on the CPU 801 or the acceleration unit 813. These alternative embodiments all fall within the scope of the present disclosure.
- An embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, including instructions corresponding to the operations of the depth estimation method provided by the embodiments of the present disclosure.
- the computer program can be downloaded and installed from the network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the CPU.
- Embodiments of the present disclosure also provide a computer program comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the operations of the depth estimation method of any embodiment of the present disclosure.
- An embodiment of the present disclosure further provides a computer readable storage medium for storing computer readable instructions; when the instructions are executed, the operations of the depth estimation method according to any embodiment of the present disclosure are implemented.
- The foregoing program may be stored in a computer readable storage medium; when executed, the program performs the operations of the foregoing method embodiments. The foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- the methods and apparatus of the present disclosure may be implemented in a number of ways.
- the methods and apparatus of the present disclosure may be implemented in software, hardware, firmware or any combination of software, hardware, firmware.
- the above-described sequence of operations for the method is for illustrative purposes only, and the operation of the method of the present disclosure is not limited to the order specifically described above unless otherwise specifically stated.
- the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine readable instructions for implementing a method in accordance with the present disclosure.
- the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (37)
- 一种深度估计方法,其特征在于,包括:以单张图片作为双目图像中的第一图像,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像;经第二神经网络,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息。
- 根据权利要求1所述的方法,其特征在于,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像,包括:经第一神经网络对所述第一图像进行处理,输出N个通道的视差概率图;其中,每个通道的视差概率图表示所述第一图像上像素向第一水平方向偏移i个视差的概率,i=0,1,…,N-1,N的取值为大于1的整数;根据所述N个通道的视差概率图,将所述第一图像分别向第一水平方向偏移i个像素,得到N张偏移图;将所述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图,得到N个点乘结果;将所述N个点乘结果基于像素进行叠加,得到所述第二图像。
- 根据权利要求2所述的方法,其特征在于,所述经第一神经网络对所述第一图像进行处理,输出N个通道的视差概率图,包括:分别通过第一神经网络中至少两个网络深度的网络层对所述第一图像进行特征提取,获得至少两个尺度的特征图;基于所述至少两个尺度的特征图分别获取至少两个分辨率的、N个通道的初步视差概率图;分别针对每个通道,将所述至少两个分辨率的初步视差概率图放大至所述第一图像的分辨率下进行叠加,得到所述N个通道的视差概率图。
- 根据权利要求2或3所述的方法,其特征在于,所述第一图像中对应前景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;所述第一图像中对应背景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。
- 根据权利要求1-4任一所述的方法,其特征在于,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息,包括:获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数;所述双目图像中一个图像包括所述第一图像或所述第二图像,所述另一个图像对应包括所述第二图像或所述第一图像;基于所述另一个图像与所述相关系数,生成所述第一图像与所述第二图像的视差图;基于所述视差图获取所述第一图像对应的深度信息。
- 根据权利要求5所述的方法,其特征在于,所述获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数,包括:分别对所述一个图像与所述另一个图像进行特征提取;基于提取的所述一个图像的特征与所述另一个图像的特征,获取所述一个图像与所述另一个图像中像素的位置关系,并输出用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的所述相关系数。
- 根据权利要求6所述的方法,其特征在于,基于所述另一个图像与所述相关系数,生成第一图像与所述第二图像的视差图,包括:将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图。
- 根据权利要求7所述的方法,其特征在于,将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图,包括:将所述另一个图像的特征与所述相关系数进行叠加,得到叠加结果;提取所述叠加结果的特征,并将提取的所述叠加结果的特征与所述叠加结果进行融合,获得所述第一图像与所述第二图像的视差图。
- 根据权利要求6-8任一所述的方法,其特征在于,所述基于所述视差图获取所述第一图像对应的深度信息,包括:基于所述视差图、拍摄所述第一图像的相机的焦距和所述双目图像对应的双目相机之间的距离,获取所述第一图像对应的深度信息。
- 根据权利要求1-9任一所述的方法,其特征在于,还包括:基于所述第一图像对应的所述深度信息与所述第二图像,获取所述第一图像中场景的三维空间场景。
- 根据权利要求5-10任一所述的方法,其特征在于,所述第一神经网络利用第一样本集中的样本双目图像预先训练而得,所述第一样本集包括至少一组第一样本双目图像;和/或,所述第二神经网络利用第二样本集中的样本双目图像预先训练而得;所述第二样本集包括至少一组第二样本双目图像和视差图标签。
- 根据权利要求11所述的方法,其特征在于,所述第一神经网络的训练包括:经所述第一神经网络,由所述至少一组第一样本双目图像中的第一图像,获取所述至少一组第一样本双目图像中的第二图像并输出;获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件。
- 根据权利要求12所述的方法,其特征在于,获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件,包括:获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间在像素上的第一差异;基于所述第一差异调整所述第一神经网络中网络参数的参数值,直至满足第一训练完成条件;所述第一训练完成条件包括:所述第一差异小于第一预设阈值,和/或,对所述第一神经网络的训练次数达到第一预设次数。
- 根据权利要求11-13任一所述的方法,其特征在于,所述第二神经网络的训练包括:经所述第二神经网络,获取所述至少一组第二样本双目图像的视差图并输出;获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件。
- 根据权利要求14所述的方法,其特征在于,获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件,包括:获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标 签之间在像素上的第二差异;基于所述第二差异调整所述第二神经网络中网络参数的参数值,直至满足第二训练完成条件;所述第二训练完成条件包括:所述第二差异小于第二预设阈值,和/或,对所述第二神经网络的训练次数达到第二预设次数。
- 根据权利要求11-15任一所述的方法,其特征在于,所述第一神经网络和所述第二神经网络的训练还包括:利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对所述第一神经网络和所述第二神经网络进行训练;其中,所述第三样本集包括至少一组第三样本双目图像和第三样本图像对应的深度图标签。
- 根据权利要求16所述的方法,其特征在于,所述利用第三样本集中的样本双目图像和第三样本图像对应的深度图标签对所述第一神经网络和所述第二神经网络进行训练,包括:经所述第一神经网络,由所述至少一组第三样本双目图像中的第一图像,获取所述至少一组第三样本双目图像中的第二图像;经所述第二神经网络,获取所述至少一组第三样本双目图像的视差图;基于所述至少一组第三样本双目图像的视差图获取至少一组深度信息;获取所述至少一组深度信息与所述至少一组第三样本双目图像的深度图标签之间的第三差异;基于所述第三差异调整所述第一神经网络和所述第二神经网络中网络参数的参数值,直至满足第三训练完成条件;所述第三训练完成条件包括:所述第三差异小于第三预设阈值,和/或,对所述第一神经网络和所述第二神经网络的训练次数达到第三预设次数。
- 一种深度估计装置,其特征在于,包括:图像获取模块,用于以单张图片作为双目图像中的第一图像,经第一神经网络,基于所述第一图像获取所述双目图像中的第二图像;立体匹配模块,用于经第二神经网络,通过对所述第一图像与所述第二图像进行双目立体匹配,获取所述第一图像对应的深度信息。
- 根据权利要求18所述的装置,其特征在于,所述图像获取模块包括:第一子神经网络单元,用于对所述第一图像进行处理,输出N个通道的视差概率图;其中,每个通道的视差概率图表示所述第一图像上像素向第一水平方向偏移i个视差的概率,i=0,1,…,N-1,N的取值为大于1的整数;偏移单元,用于根据所述N个通道的视差概率图,将所述第一图像分别向第一水平方向偏移i个像素,得到N张偏移图;点乘单元,用于将所述N张偏移图中的各张偏移图分别点乘对应通道的视差概率图中,得到N个点乘结果;加法单元,用于将所述N个点乘结果基于像素进行叠加,得到所述第二图像。
- 根据权利要求19所述的装置,其特征在于,所述第一子神经网络单元,包括至少两个网络深度的网络层,用于:分别通过至少两个网络深度的网络层对所述第一图像进行特征提取,获得至少两个尺度的特征图;基于所述至少两个尺度的特征图分别获取至少两个分辨率的、N个通道的初步视差概率图;分别针对每个通道,将所述至少两个分辨率的初步视差概率图放大至所述第一图像的分辨率下进行叠加,得到所述N个通道的视差概率图。
- 根据权利要求19或20所述的装置,其特征在于,所述第一图像中对应前景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较大视差的通道的视差概率图中具有较大的概率值;所述第一图像中对应背景对象的像素在所述N个通道的视差概率图中的对应像素位置,在对应较小视差的通道的视差概率图中具有较大的概率值。
- 根据权利要求18-21任一所述的装置,其特征在于,所述立体匹配模块包括:第一获取单元,用于获取用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的相关系数;所述双目图像中一个图像包括所述第一图像或所述第二图像,所述另一个图像对应包括所述第二图像或所述第一图像;生成单元,用于基于所述另一个图像与所述相关系数,生成所述第一图像与所述第二图像的视差图;第三获取单元,基于所述视差图获取所述第一图像对应的深度信息。
- 根据权利要求22所述的装置,其特征在于,所述第一获取单元,包括:第二子神经网络单元,用于分别对所述一个图像与所述另一个图像进行特征提取;获取子单元,用于基于提取的所述一个图像的特征与所述另一个图像的特征,获取所述一个图像与所述另一个图像中像素的位置关系,并输出用于表示所述双目图像中一个图像相对于另一个图像中像素的位置关系的所述相关系数。
- 根据权利要求23所述的装置,其特征在于,所述生成单元,用于将所述另一个图像的特征与所述相关系数进行叠加,生成所述第一图像与所述第二图像的视差图。
- 根据权利要求24所述的装置,其特征在于,所述生成单元,包括:叠加子单元,用于将所述另一个图像的特征与所述相关系数进行叠加,得到叠加结果;融合子单元,用于提取所述叠加结果的特征,并将提取的所述叠加结果的特征与所述叠加结果进行融合,获得所述第一图像与所述第二图像的视差图。
- 根据权利要求23-25任一所述的装置,其特征在于,所述第三获取单元,用于基于所述视差图、拍摄所述第一图像的相机的焦距和所述双目图像对应的双目相机之间的距离,获取所述第一图像对应的深度信息。
- 根据权利要求18-26任一所述的装置,其特征在于,还包括:获取模块,用于基于所述第一图像对应的所述深度信息与所述第二图像,获取所述第一图像中场景的三维空间场景。
- 根据权利要求22-27任一所述的装置,其特征在于,所述第一神经网络利用第一样本集中的样本双目图像预先训练而得,所述第一样本集包括至少一组第一样本双目图像;和/或,所述第二神经网络利用第二样本集中的样本双目图像预先训练而得;所述第二样本集包括至少一组第二样本双目图像和视差图标签。
- 根据权利要求28所述的装置,其特征在于,所述第一神经网络,用于由所述至少一组第一样本双目图像中的第一图像,获取所述至少一组第一样本双目图像中的第二图像并输出;所述装置还包括:第一训练模块,用于获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二图像之间的第一差异,并基于所述第一差异对所述第一神经网络进行训练,直至满足第一训练完成条件。
- 根据权利要求29所述的装置,其特征在于,所述第一训练模块,用于:获取所述第一神经网络输出的第二图像与所述至少一组第一样本双目图像中的第二 图像之间在像素上的第一差异;基于所述第一差异调整所述第一神经网络中网络参数的参数值,直至满足第一训练完成条件;所述第一训练完成条件包括:所述第一差异小于第一预设阈值,和/或,对所述第一神经网络的训练次数达到第一预设次数。
- 根据权利要求28-30任一所述的装置,其特征在于,所述第二神经网络,用于获取所述至少一组第二样本双目图像的视差图并输出;所述装置还包括:第二训练模块,用于获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间的第二差异,并基于所述第二差异对所述第二神经网络进行训练,直至满足第二训练完成条件。
- 根据权利要求31所述的装置,其特征在于,所述第二训练模块,用于:获取所述第二神经网络输出的视差图与所述至少一组第二样本双目图像的视差图标签之间在像素上的第二差异;基于所述第二差异调整所述第二神经网络中网络参数的参数值,直至满足第二训练完成条件;所述第二训练完成条件包括:所述第二差异小于第二预设阈值,和/或,对所述第二神经网络的训练次数达到第二预设次数。
- The apparatus according to any one of claims 28 to 32, further comprising: a third training module configured to train the first neural network and the second neural network using sample binocular images in a third sample set and corresponding depth map labels, wherein the third sample set comprises at least one set of third sample binocular images and corresponding depth map labels.
- The apparatus according to claim 33, wherein the first neural network is configured to obtain a second image of the at least one set of third sample binocular images from a first image of the at least one set of third sample binocular images; the second neural network is configured to obtain a disparity map of the at least one set of third sample binocular images; and the third training module is configured to: obtain at least one set of depth information based on the disparity map of the at least one set of third sample binocular images; obtain a third difference between the at least one set of depth information and the depth map labels of the at least one set of third sample binocular images; and adjust parameter values of network parameters in the first neural network and the second neural network based on the third difference until a third training completion condition is satisfied, the third training completion condition comprising: the third difference being less than a third preset threshold, and/or the number of training iterations of the first neural network and the second neural network reaching a third preset number.
- An electronic device, comprising: a memory configured to store executable instructions; and a processor configured to communicate with the memory to execute the executable instructions so as to perform the operations of the method according to any one of claims 1 to 17.
- A computer program, comprising computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the operations of the method according to any one of claims 1 to 17.
- A computer-readable storage medium configured to store computer-readable instructions, wherein, when the instructions are executed, the operations of the method according to any one of claims 1 to 17 are implemented.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11202003141PA SG11202003141PA (en) | 2018-02-01 | 2019-01-30 | Depth estimation method and apparatus, electronic device, program, and medium |
KR1020207009470A KR102295403B1 (ko) | 2018-02-01 | 2019-01-30 | 깊이 추정 방법 및 장치, 전자 기기, 프로그램 및 매체 |
JP2020517931A JP6951565B2 (ja) | 2018-02-01 | 2019-01-30 | 深度推定方法及び装置、電子機器並びに媒体 |
US16/835,418 US11308638B2 (en) | 2018-02-01 | 2020-03-31 | Depth estimation method and apparatus, electronic device, program, and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810103195.0 | 2018-02-01 | ||
CN201810103195.0A CN108335322B (zh) | 2018-02-01 | 2018-02-01 | 深度估计方法和装置、电子设备、程序和介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/835,418 Continuation US11308638B2 (en) | 2018-02-01 | 2020-03-31 | Depth estimation method and apparatus, electronic device, program, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019149206A1 (zh) | 2019-08-08 |
Family
ID=62928066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/073820 WO2019149206A1 (zh) | 2018-02-01 | 2019-01-30 | 深度估计方法和装置、电子设备、程序和介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11308638B2 (zh) |
JP (1) | JP6951565B2 (zh) |
KR (1) | KR102295403B1 (zh) |
CN (1) | CN108335322B (zh) |
SG (1) | SG11202003141PA (zh) |
WO (1) | WO2019149206A1 (zh) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108335322B (zh) | 2018-02-01 | 2021-02-12 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
CN110622213B (zh) | 2018-02-09 | 2022-11-15 | 百度时代网络技术(北京)有限公司 | System and method for depth localization and segmentation using a 3D semantic map |
CN109299656B (zh) | 2018-08-13 | 2021-10-22 | 浙江零跑科技股份有限公司 | Method for determining scene depth of view of a vehicle-mounted vision system |
CN109598754B (zh) | 2018-09-29 | 2020-03-17 | 天津大学 | Binocular depth estimation method based on a deep convolutional network |
US10503966B1 (en) * | 2018-10-11 | 2019-12-10 | Tindei Network Technology (Shanghai) Co., Ltd. | Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same |
CN111209770B (zh) | 2018-11-21 | 2024-04-23 | 北京三星通信技术研究有限公司 | Lane line recognition method and apparatus |
CN111210467A (zh) | 2018-12-27 | 2020-05-29 | 上海商汤智能科技有限公司 | Image processing method and apparatus, electronic device, and computer-readable storage medium |
CN111383256B (zh) | 2018-12-29 | 2024-05-17 | 北京市商汤科技开发有限公司 | Image processing method, electronic device, and computer-readable storage medium |
CN109741388B (zh) | 2019-01-29 | 2020-02-28 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a binocular depth estimation model |
CN109840500B (zh) | 2019-01-31 | 2021-07-02 | 深圳市商汤科技有限公司 | Three-dimensional human pose information detection method and apparatus |
CN110223334B (zh) | 2019-05-07 | 2021-09-14 | 深圳云天励飞技术有限公司 | Depth-of-field map acquisition method and apparatus |
CN109934307B (zh) | 2019-05-08 | 2021-04-09 | 北京奇艺世纪科技有限公司 | Disparity map prediction model training method, prediction method, apparatus, and electronic device |
US20200364442A1 (en) * | 2019-05-15 | 2020-11-19 | Getac Technology Corporation | System for detecting surface pattern of object and artificial neural network-based method for detecting surface pattern of object |
CN112434702A (zh) | 2019-08-26 | 2021-03-02 | 阿里巴巴集团控股有限公司 | Image processing method and apparatus, computer device, and storage medium |
US11294996B2 (en) | 2019-10-15 | 2022-04-05 | Assa Abloy Ab | Systems and methods for using machine learning for image-based spoof detection |
US11348375B2 (en) | 2019-10-15 | 2022-05-31 | Assa Abloy Ab | Systems and methods for using focal stacks for image-based spoof detection |
CN111047634B (zh) | 2019-11-13 | 2023-08-08 | 杭州飞步科技有限公司 | Scene depth determination method, apparatus, device, and storage medium |
CN112991254A (zh) | 2019-12-13 | 2021-06-18 | 上海肇观电子科技有限公司 | Disparity estimation system and method, electronic device, and computer-readable storage medium |
CN113034568B (zh) | 2019-12-25 | 2024-03-29 | 杭州海康机器人股份有限公司 | Machine vision depth estimation method, apparatus, and system |
CN111652922B (zh) | 2020-06-04 | 2023-09-08 | 江苏天宏机械工业有限公司 | Monocular video depth estimation method based on binocular vision |
US11275959B2 (en) * | 2020-07-07 | 2022-03-15 | Assa Abloy Ab | Systems and methods for enrollment in a multispectral stereo facial recognition system |
US11836965B2 (en) | 2020-08-12 | 2023-12-05 | Niantic, Inc. | Determining visual overlap of images by using box embeddings |
CN112489103B (zh) | 2020-11-19 | 2022-03-08 | 北京的卢深视科技有限公司 | High-resolution depth map acquisition method and system |
CN112446328B (zh) | 2020-11-27 | 2023-11-17 | 汇纳科技股份有限公司 | Monocular depth estimation system, method, device, and computer-readable storage medium |
CN112903952B (zh) | 2021-01-21 | 2022-05-27 | 北京航空航天大学 | Metal plate structural damage evaluation system and method |
CN112861940A (zh) | 2021-01-26 | 2021-05-28 | 上海西井信息科技有限公司 | Binocular disparity estimation method, model training method, and related devices |
CN112949504B (zh) | 2021-03-05 | 2024-03-19 | 深圳市爱培科技术股份有限公司 | Stereo matching method, apparatus, device, and storage medium |
CN112967332B (zh) | 2021-03-16 | 2023-06-16 | 清华大学 | Binocular depth estimation method and apparatus based on gated imaging, and computer device |
US11823402B2 (en) | 2021-05-03 | 2023-11-21 | Electronics And Telecommunications Research Institute | Method and apparatus for correcting error in depth information estimated from 2D image |
KR102641108B1 (ko) | 2021-08-03 | 2024-02-27 | 연세대학교 산학협력단 | Depth map completion apparatus and method |
CN113928282A (zh) | 2021-11-24 | 2022-01-14 | 扬州大学江都高端装备工程技术研究所 | Assisted-cruise active braking method fusing the road surface environment and a vehicle safety model |
CN114627535B (zh) | 2022-03-15 | 2024-05-10 | 平安科技(深圳)有限公司 | Coordinate matching method, apparatus, device, and medium based on a binocular camera |
CN114615507B (zh) | 2022-05-11 | 2022-09-13 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Image encoding method, decoding method, and related apparatus |
CN115937290B (zh) | 2022-09-14 | 2024-03-22 | 北京字跳网络技术有限公司 | Image depth estimation method and apparatus, electronic device, and storage medium |
CN116129036B (zh) | 2022-12-02 | 2023-08-29 | 中国传媒大学 | Depth-information-guided method for automatically recovering the three-dimensional structure of omnidirectional images |
CN117726666B (zh) | 2024-02-08 | 2024-06-04 | 北京邮电大学 | Cross-camera monocular image metric depth estimation method, apparatus, device, and medium |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH02101584A (ja) | 1988-10-11 | 1990-04-13 | Nippon Telegr & Teleph Corp <Ntt> | Stereo image processing system |
CN101907448B (zh) | 2010-07-23 | 2013-07-03 | 华南理工大学 | Depth measurement method based on binocular three-dimensional vision |
KR101691034B1 (ko) | 2010-08-03 | 2016-12-29 | 삼성전자주식회사 | Apparatus and method for synthesizing additional information when rendering objects in a 3D-graphics-based terminal |
JP6210483B2 (ja) | 2012-04-26 | 2017-10-11 | 国立大学法人山口大学 | Device for acquiring three-dimensional shape from stereoscopic endoscope images |
CN102750702B (zh) | 2012-06-21 | 2014-10-15 | 东华大学 | Monocular infrared image depth estimation method based on an optimized BP neural network model |
WO2016105541A1 (en) * | 2014-12-24 | 2016-06-30 | Reald Inc. | Adjustment of perceived roundness in stereoscopic image of a head |
US9811756B2 (en) * | 2015-02-23 | 2017-11-07 | Mitsubishi Electric Research Laboratories, Inc. | Method for labeling images of street scenes |
GB2553782B (en) * | 2016-09-12 | 2021-10-20 | Niantic Inc | Predicting depth from image data using a statistical model |
CN106355570B (zh) | 2016-10-21 | 2019-03-19 | 昆明理工大学 | Binocular stereo vision matching method combining deep features |
CN106612427B (zh) | 2016-12-29 | 2018-07-06 | 浙江工商大学 | Method for generating temporally and spatially consistent depth map sequences based on a convolutional neural network |
CN106504190B (zh) | 2016-12-29 | 2019-09-13 | 浙江工商大学 | Stereoscopic video generation method based on a 3D convolutional neural network |
RU2698402C1 (ru) | 2018-08-30 | 2019-08-26 | Самсунг Электроникс Ко., Лтд. | Method for training a convolutional neural network for image restoration and system for generating an image depth map (variants) |
2018
- 2018-02-01 CN CN201810103195.0A patent/CN108335322B/zh active Active

2019
- 2019-01-30 SG SG11202003141PA patent/SG11202003141PA/en unknown
- 2019-01-30 KR KR1020207009470A patent/KR102295403B1/ko active IP Right Grant
- 2019-01-30 WO PCT/CN2019/073820 patent/WO2019149206A1/zh active Application Filing
- 2019-01-30 JP JP2020517931A patent/JP6951565B2/ja active Active

2020
- 2020-03-31 US US16/835,418 patent/US11308638B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102523464A (zh) | 2011-12-12 | 2012-06-27 | 上海大学 | Depth image estimation method for binocular stereoscopic video |
WO2018006296A1 (en) * | 2016-07-06 | 2018-01-11 | SZ DJI Technology Co., Ltd. | Systems and methods for stereoscopic imaging |
CN107578435A (zh) | 2017-09-11 | 2018-01-12 | 清华-伯克利深圳学院筹备办公室 | Image depth prediction method and apparatus |
CN108335322A (zh) | 2018-02-01 | 2018-07-27 | 深圳市商汤科技有限公司 | Depth estimation method and apparatus, electronic device, program, and medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021084530A1 (en) * | 2019-10-27 | 2021-05-06 | Ramot At Tel-Aviv University Ltd. | Method and system for generating a depth map |
CN112862877A (zh) | 2021-04-09 | 2021-05-28 | 北京百度网讯科技有限公司 | Method and apparatus for training an image processing network and for image processing |
CN112862877B (zh) | 2021-04-09 | 2024-05-17 | 北京百度网讯科技有限公司 | Method and apparatus for training an image processing network and for image processing |
Also Published As
Publication number | Publication date |
---|---|
CN108335322A (zh) | 2018-07-27 |
JP6951565B2 (ja) | 2021-10-20 |
JP2020535547A (ja) | 2020-12-03 |
CN108335322B (zh) | 2021-02-12 |
US11308638B2 (en) | 2022-04-19 |
KR102295403B1 (ko) | 2021-08-31 |
KR20200049833A (ko) | 2020-05-08 |
US20200226777A1 (en) | 2020-07-16 |
SG11202003141PA (en) | 2020-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019149206A1 (zh) | Depth estimation method and apparatus, electronic device, program, and medium | |
EP3698323B1 (en) | Depth from motion for augmented reality for handheld user devices | |
US11468585B2 (en) | Pseudo RGB-D for self-improving monocular slam and depth prediction | |
TWI766175B (zh) | Monocular image depth estimation method, device, and storage medium | |
Luo et al. | Single view stereo matching | |
EP2992508B1 (en) | Diminished and mediated reality effects from reconstruction | |
US9237330B2 (en) | Forming a stereoscopic video | |
CN111899282B (zh) | Pedestrian trajectory tracking method and apparatus based on binocular camera calibration | |
Guizilini et al. | Full surround monodepth from multiple cameras | |
TW202117611A (zh) | 電腦視覺訓練系統及訓練電腦視覺系統的方法 | |
KR100560464B1 (ko) | Method for configuring a multi-view image display system adaptive to the observer's viewpoint | |
JP2020523703A (ja) | Dual-viewing-angle image calibration and image processing method, apparatus, storage medium, and electronic device | |
JP7184748B2 (ja) | Method for generating layered depth data of a scene | |
US20170064279A1 (en) | Multi-view 3d video method and system | |
US9483836B2 (en) | Method and apparatus for real-time conversion of 2-dimensional content to 3-dimensional content | |
JPWO2021076757A5 (zh) | ||
CN111598927B (zh) | Positioning and reconstruction method and apparatus | |
US11810308B2 (en) | Vertical disparity detection in stereoscopic images using a deep neural network | |
CN111260544B (zh) | Data processing method and apparatus, electronic device, and computer storage medium | |
Chantara et al. | Initial depth estimation using EPIs and structure tensor | |
San et al. | Early experience of depth estimation on intricate objects using generative adversarial networks | |
Xian et al. | ViTA: Video Transformer Adaptor for Robust Video Depth Estimation | |
KR20220071935A (ko) | 광학 흐름을 이용한 고해상도 깊이 영상 추정 방법 및 장치 | |
CN116402878A (zh) | Light field image processing method and apparatus | |
Takaya et al. | Interactive 3D Contents Generation for Auto-stereoscopic Display based on Depth Camera |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19747562; Country of ref document: EP; Kind code of ref document: A1
 | ENP | Entry into the national phase | Ref document number: 2020517931; Country of ref document: JP; Kind code of ref document: A
 | ENP | Entry into the national phase | Ref document number: 20207009470; Country of ref document: KR; Kind code of ref document: A
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2020)
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19747562; Country of ref document: EP; Kind code of ref document: A1