WO2019192290A1 - Method for determining depth information and related apparatus - Google Patents

Method for determining depth information and related apparatus

Info

Publication number
WO2019192290A1
WO2019192290A1 (PCT/CN2019/077669)
Authority
WO
WIPO (PCT)
Prior art keywords
eye
disparity
map
depth information
disparity map
Prior art date
Application number
PCT/CN2019/077669
Other languages
English (en)
French (fr)
Inventor
揭泽群
凌永根
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to EP19781105.2A priority Critical patent/EP3779881A4/en
Publication of WO2019192290A1 publication Critical patent/WO2019192290A1/zh
Priority to US16/899,287 priority patent/US11145078B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering

Definitions

  • the present application relates to the field of computer processing, and more particularly to depth information determination.
  • Parallax is the difference in the apparent direction of the same object when it is observed from two different positions. For example, hold a finger in front of you: close the right eye and look at it with the left eye, then close the left eye and look at it with the right eye. The apparent position of the finger relative to more distant objects changes, because the same point is being viewed from two different angles.
  • the embodiment of the present application provides a method for determining depth information and related devices.
  • The recursive learning method can fully exploit the complementary information of the two views and continuously correct the binocular disparity maps, so that the error of the depth information can be effectively reduced in regions that are difficult to match between the two views.
  • a first aspect of the embodiments of the present application provides a method for determining depth information, including:
  • acquiring a t-th left-eye matching similarity from a left-eye image to a right-eye image, and a t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1;
  • processing the t-th left-eye matching similarity and the t-1th left-eye attention map by a neural network model to obtain a t-th left-eye disparity map;
  • processing the t-th right-eye matching similarity and the t-1th right-eye attention map by the neural network model to obtain a t-th right-eye disparity map;
  • determining first depth information according to the t-th left-eye disparity map, and determining second depth information according to the t-th right-eye disparity map.
  • a second aspect of the embodiments of the present application provides a depth information determining apparatus, including:
  • an acquisition module configured to acquire a t-th left-eye matching similarity from the left-eye image to the right-eye image, and a t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1;
  • a processing module configured to process, by using a neural network model, the t-th left-eye matching similarity acquired by the acquiring module and the t-1th left-eye attention map to obtain a t-th left-eye disparity map;
  • the processing module is further configured to process, by using the neural network model, the t-th right-eye matching similarity and the t-1th right-eye attention map acquired by the acquiring module, to obtain a t-th right-eye disparity map;
  • a determining module configured to determine first depth information according to the t-th left-eye disparity map processed by the processing module, and determine second depth information according to the t-th right-eye disparity map processed by the processing module.
  • a third aspect of the embodiments of the present application provides a depth information determining apparatus, including: a memory, a processor, and a bus system;
  • the memory is used to store a program
  • the processor is configured to execute the program in the memory, and specifically performs the following steps:
  • acquiring a t-th left-eye matching similarity from a left-eye image to a right-eye image, and a t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1;
  • processing the t-th left-eye matching similarity and the t-1th left-eye attention map by a neural network model to obtain a t-th left-eye disparity map;
  • processing the t-th right-eye matching similarity and the t-1th right-eye attention map by the neural network model to obtain a t-th right-eye disparity map;
  • determining first depth information according to the t-th left-eye disparity map, and determining second depth information according to the t-th right-eye disparity map;
  • the bus system is for connecting the memory and the processor to cause the memory and the processor to communicate.
  • a fourth aspect of an embodiment of the present application provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • a method for determining depth information is provided: the t-th left-eye matching similarity from the left-eye image to the right-eye image and the t-th right-eye matching similarity from the right-eye image to the left-eye image are acquired; the neural network model then processes the t-th left-eye matching similarity and the t-1th left-eye attention map to obtain the t-th left-eye disparity map, and processes the t-th right-eye matching similarity and the t-1th right-eye attention map to obtain the t-th right-eye disparity map; finally, the first depth information is determined according to the t-th left-eye disparity map, and the second depth information is determined according to the t-th right-eye disparity map.
  • In this way, the binocular disparity maps are obtained by the neural network model together with the binocular attention maps learned in the previous pass, and the binocular disparity maps obtained in the current pass are in turn used to learn the binocular attention maps that guide the next pass. This recursive learning makes full use of the complementary information of the two views and continuously corrects the binocular disparity maps, thereby effectively reducing the error of the depth information in regions that are difficult to match between the two views.
  • FIG. 1A is a schematic diagram of binocular parallax based on recursive learning in an embodiment of the present application
  • FIG. 1B is a schematic structural diagram of a depth information determining apparatus according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an embodiment of a method for determining depth information according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of comparison between an original image and a model predicted depth map in the embodiment of the present application
  • FIG. 4 is a schematic diagram of generating a binocular attention map in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a recursive binocular parallax network according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a convolutional long short-term memory (ConvLSTM) network in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an embodiment of a depth information determining apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another embodiment of a depth information determining apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a depth information determining apparatus according to an embodiment of the present application.
  • the embodiment of the present application provides a method for determining depth information and related devices.
  • The recursive learning method can fully exploit the complementary information of the two views and continuously correct the binocular disparity maps, so that the error of the depth information can be effectively reduced in regions that are difficult to match between the two views.
  • the present application can be applied to devices equipped with binocular cameras (such as binocular robots and unmanned vehicles) for object depth estimation.
  • the present application mainly obtains the parallax of the binocular visual image through the deep neural network, and divides the product of the distance between the two cameras and the focal length by the predicted parallax to obtain the depth value.
  • a convolutional neural network is first used to predict the matching similarity of one view to the other view (i.e., left eye to right eye, and right eye to left eye) at different disparities, and the matching similarities at the different disparities are then fed to a convolutional long short-term memory (ConvLSTM) network, which recursively performs the cycle of "binocular disparity prediction - binocular disparity map comparison".
  • In this way, the complementary information of the left and right views can be fully utilized, and regions that are difficult to match between the left and right views (such as repetitive regions, texture-less regions or complex object edges) can be automatically detected, so that the binocular disparity prediction values are corrected and updated, continuously improving the accuracy of the disparity prediction and hence of the depth.
  • the convolutional neural network is used to predict the matching similarity from the left-eye image to the right-eye image and from the right-eye image to the left-eye image at different disparities, and the subsequent processing is then based on the predicted matching similarities.
  • FIG. 1A is a schematic diagram of binocular parallax based on recursive learning in the embodiment of the present application; as shown in the figure, it is assumed that the images captured by the left and right cameras both have a resolution of H*W (H is the height, W is the width).
  • the convolutional neural network is used to extract the features of the binocular image at the pixel level.
  • a feature map of size H*W*C is obtained for each view (C is the feature dimension). The two H*W*C feature maps are then combined, at every candidate disparity along the horizontal direction, up to a maximum disparity of D max , into a feature volume of dimension H*W*2C*D max . Another convolutional neural network with 1*1 convolution kernels then predicts the matching similarity of every pixel at each disparity, producing one matching similarity value from each 2C-dimensional input feature.
  • In this way, a matching similarity volume of size H*W*D max can be predicted for the left-eye image to the right-eye image, and another for the right-eye image to the left-eye image (an illustrative sketch of this construction is given below).
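  • As an illustration only (not code from the original disclosure), the following PyTorch-style sketch shows one way such a left-to-right matching-similarity volume could be assembled from per-pixel features; the tensor names, the feature dimension of 64 and the small 1*1-convolution head are assumptions, and the right-to-left volume would be built analogously with the shift applied in the opposite direction.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def matching_similarity(left_feat, right_feat, max_disp, head):
        """left_feat, right_feat: [B, C, H, W] pixel-level CNN features.
        Returns a [B, max_disp, H, W] left-to-right matching-similarity volume.
        `head` is a small network of 1x1 convolutions mapping 2C channels to 1."""
        b, c, h, w = left_feat.shape
        sims = []
        for d in range(max_disp):
            # shift the right-view features d pixels to the right so that
            # left pixel x is compared with right pixel x - d
            shifted = F.pad(right_feat, (d, 0))[:, :, :, :w]
            pair = torch.cat([left_feat, shifted], dim=1)   # [B, 2C, H, W]
            sims.append(head(pair))                         # [B, 1, H, W]
        return torch.cat(sims, dim=1)                       # [B, max_disp, H, W]

    # hypothetical 1x1-convolution head producing one similarity value per pixel
    head = nn.Sequential(nn.Conv2d(2 * 64, 32, kernel_size=1), nn.ReLU(),
                         nn.Conv2d(32, 1, kernel_size=1))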
  • FIG. 1B is a schematic structural diagram of a depth information determining apparatus according to an embodiment of the present application.
  • the depth information determining apparatus provided by the application may be deployed on a server, in which case the processing result is transmitted by the server to the target device; alternatively, the depth information determining apparatus may be deployed directly on the target device.
  • the target device includes, but is not limited to, (unmanned) vehicles, robots, (unmanned) aircraft, and intelligent terminals that have binocular stereo vision: based on the principle of parallax, an imaging device acquires two images of the measured object from different positions, and the three-dimensional geometric information of the object is obtained by calculating the positional deviation between corresponding points in the two images.
  • Binocular stereoscopic vision fuses the images obtained by the two eyes and observes the differences between them, so that a clear sense of depth is obtained; correspondences between features are established, and the projections of the same physical point in space are related across the different images. The resulting difference is called the disparity (parallax) image.
  • Binocular parallax is sometimes referred to as stereo disparity and is a depth cue. The closer an object is to the observer, the greater the difference between what the two eyes see, which creates binocular parallax; the brain can use this measure of parallax to estimate the distance from the object to the eyes.
  • an embodiment of the method for determining depth information in the embodiment of the present application includes:
  • the depth information determining device acquires the left-eye image and the right-eye image through the binocular camera, and then calculates the t-th left-eye matching similarity from the left-eye image to the right-eye image and the t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1; these can be regarded as the matching similarities obtained in the t-th pass.
  • The matching similarity can be computed in several ways. The first type is the Mean Absolute Differences (MAD) algorithm: taking each position (i, j) of the search image S as the upper-left corner, a sub-image of size M*N is extracted and its similarity to the template is computed; the entire search image S is traversed, and among all the sub-images so obtained, the one most similar to the template is taken as the final match (a minimal sketch of this template search is given after this list).
  • Other options include the Sum of Squared Differences (SSD) algorithm and the Normalized Cross Correlation (NCC) algorithm.
  • The fifth type is the Sequential Similarity Detection Algorithm (SSDA), an improvement on the traditional template matching algorithm that is tens to hundreds of times faster than the MAD algorithm.
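  • Purely as an illustration of the MAD-style template search described above (not an implementation taken from this disclosure), the following Python sketch slides an M*N template over a search image and keeps the position with the smallest mean absolute difference; the function and variable names are assumptions.

    import numpy as np

    def mad_match(search_img, template):
        """Return the top-left corner (i, j) of the sub-image of `search_img`
        that best matches `template` under the mean-absolute-difference score."""
        template = template.astype(np.float64)
        M, N = template.shape
        H, W = search_img.shape
        best_pos, best_score = None, np.inf
        for i in range(H - M + 1):
            for j in range(W - N + 1):
                patch = search_img[i:i + M, j:j + N].astype(np.float64)
                score = np.mean(np.abs(patch - template))
                if score < best_score:
                    best_score, best_pos = score, (i, j)
        return best_pos, best_score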
  • the t-th left-eye matching similarity and the t-1th left-eye attention map are processed by a neural network model to obtain a t-th left-eye disparity map;
  • the depth information determining device inputs the left-eye matching similarity obtained this time (the t-th) and the left-eye attention map generated last time (the t-1th) into the neural network model; the neural network model is usually pre-trained, and it outputs the current (t-th) left-eye disparity map.
  • the t-th right-eye matching similarity and the t-1th right-eye attention map are processed by a neural network model to obtain a t-th right-eye disparity map;
  • the depth information determining device inputs the right-eye matching similarity obtained this time (the t-th) and the right-eye attention map generated last time (the t-1th) into the neural network model; the neural network model is usually pre-trained, and it outputs the current (t-th) right-eye disparity map.
  • the order of step 102 and step 103 may be: performing step 102 and then step 103, performing step 103 and then step 102, or performing step 102 and step 103 simultaneously; no restriction is imposed here.
  • the depth information determining device determines the depth information (ie, the first depth information) of the tth left-eye disparity map according to the t-th left-eye disparity map output by the neural network model.
  • the depth information determining means determines the depth information (i.e., the second depth information) of the tth right-eye disparity map based on the t-th right-eye disparity map output by the neural network model.
  • FIG. 3 is a schematic diagram of a comparison between the original image and the model prediction depth map in the embodiment of the present application.
  • the neural network model provided by the present application can predict a high-quality depth map.
  • the application can improve the accuracy of binocular depth estimation, which is critical for the autonomous driving and operation of robots and unmanned vehicles equipped with binocular cameras, and has potential economic benefits.
  • a method for determining depth information is provided: the t-th left-eye matching similarity from the left-eye image to the right-eye image and the t-th right-eye matching similarity from the right-eye image to the left-eye image are acquired; the neural network model then processes the t-th left-eye matching similarity and the t-1th left-eye attention map to obtain the t-th left-eye disparity map, and processes the t-th right-eye matching similarity and the t-1th right-eye attention map to obtain the t-th right-eye disparity map; finally, the first depth information is determined according to the t-th left-eye disparity map, and the second depth information is determined according to the t-th right-eye disparity map.
  • In this way, the binocular disparity maps are obtained by the neural network model together with the binocular attention maps learned in the previous pass, and the binocular disparity maps obtained in the current pass are in turn used to learn the binocular attention maps that guide the next pass. This recursive learning makes full use of the complementary information of the two views and continuously corrects the binocular disparity maps, thereby effectively reducing the error of the depth information in regions that are difficult to match between the two views.
  • the method may further include:
  • mapping the t-th right-eye disparity map to the left-eye coordinate system to obtain a t-th left-eye mapping disparity map, and generating the t-th left-eye attention map based on the t-th left-eye mapping disparity map and the t-th left-eye disparity map;
  • mapping the t-th left-eye disparity map to the right-eye coordinate system to obtain a t-th right-eye mapping disparity map, and generating the t-th right-eye attention map based on the t-th right-eye mapping disparity map and the t-th right-eye disparity map.
  • the depth information determining apparatus generates an attention map by using a disparity map together with the mapped disparity map of the other view.
  • FIG. 4 is a schematic diagram of generating a binocular attention map according to an embodiment of the present application, as shown in the figure. After the neural network model generates the t-th right-eye disparity map and the t-th left-eye disparity map, the t-th right-eye disparity map can be mapped to the left-eye coordinate system to obtain the t-th left-eye mapping disparity map, and the t-th left-eye disparity map can be mapped to the right-eye coordinate system to obtain the t-th right-eye mapping disparity map.
  • Mapping here means converting each disparity map into the coordinate system of the opposite view.
  • the original t-th left-eye disparity map and the converted t-th left-eye mapping disparity map are concatenated and input into a model consisting of several simple convolution layers and transform layers to obtain the t-th left-eye attention map.
  • the original t-th right-eye disparity map and the converted t-th right-eye mapping disparity map are concatenated and input into a model consisting of several simple convolution layers and transform layers to obtain the t-th right-eye attention map.
  • the attention map reflects the confidence of the disparity prediction of different regions after the left and right images are compared with each other.
  • low confidence means that the network is not sufficiently confident in the disparity prediction value of a pixel; the low-confidence pixel regions automatically detected by comparing the left and right disparities are often regions that are difficult to match, such as repetitive regions, texture-less regions, and complex object edges. Therefore, the attention map learned in the t-th recursion can be used to guide the disparity prediction of the t+1th recursion, and the network can accordingly correct the disparity values of the low-confidence pixel regions automatically detected in the t-th recursion. In other words, the attention map marks the regions on which the model should focus in the next pass (an illustrative sketch of this comparison is given below).
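  • As a hedged illustration of this comparison step (not the patent's own code), the following PyTorch-style sketch warps the right-eye disparity map into the left-eye coordinate system with a backward warp and feeds the pair through a few convolution layers; the exact warping used by the disclosure, the layer sizes and all names here are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def warp_horizontally(src, disp):
        """Sample `src` [B, 1, H, W] at x - disp(x): a backward warp of the
        source map into the coordinate system in which `disp` is defined."""
        b, _, h, w = src.shape
        xs = torch.arange(w, device=src.device).float().view(1, 1, 1, w).expand(b, 1, h, w)
        ys = torch.arange(h, device=src.device).float().view(1, 1, h, 1).expand(b, 1, h, w)
        grid = torch.stack([2 * (xs - disp) / (w - 1) - 1,   # normalized x
                            2 * ys / (h - 1) - 1], dim=-1).squeeze(1)
        return F.grid_sample(src, grid, align_corners=True)

    # hypothetical attention head: a few simple convolution layers ending in a sigmoid
    attention_head = nn.Sequential(
        nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def left_attention_map(left_disp, right_disp):
        # t-th left-eye mapping disparity map: the right-eye disparities brought
        # into the left-eye coordinate system
        mapped = warp_horizontally(right_disp, left_disp)
        return attention_head(torch.cat([left_disp, mapped], dim=1))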
  • the depth information determining apparatus maps the t-th right-eye disparity map to the left-eye coordinate system to obtain the t-th left-eye mapping disparity map, and generates the t-th left-eye attention map according to the t-th left-eye mapping disparity map and the t-th left-eye disparity map; similarly, the t-th right-eye attention map can also be obtained.
  • the attention map learned in this recursion can be used to guide the disparity prediction of the next recursion, and the network can accordingly correct and update the disparity values of the low-confidence pixel regions automatically detected in this recursion, thereby improving the reliability of the binocular attention maps.
  • after the first depth information is determined according to the t-th left-eye disparity map and the second depth information is determined according to the t-th right-eye disparity map, the method may further include:
  • acquiring the t+1th left-eye matching similarity from the left-eye image to the right-eye image, and the t+1th right-eye matching similarity from the right-eye image to the left-eye image;
  • the t+1th left-eye matching similarity and the t-th left-eye attention map are processed by the neural network model to obtain the t+1th left-eye disparity map;
  • the t+1th right-eye matching similarity and the tth right-eye attention map are processed by the neural network model to obtain the t+1th right-eye disparity map;
  • the third depth information is determined according to the t+1th left-eye disparity map, and the fourth depth information is determined according to the t+1th right-eye disparity map.
  • FIG. 5 is a schematic diagram of a recursive binocular parallax network according to an embodiment of the present application.
  • the recursive binocular parallax network may also be referred to as a Left-Right Comparative Recurrent (LRCR) model.
  • the LRCR model contains two parallel neural network models.
  • the left neural network model uses X't to generate the tth left-eye disparity map, where X't represents the connection result of the t-th left-eye matching similarity and the t-1th left-eye attention map.
  • the right neural network model uses X" t to generate the t-th right-eye disparity map, where X" t represents the connection result of the t-th right-eye matching similarity and the t-1th right-eye attention map.
  • the t-th left-eye attention map and the t-th right-eye attention map can then be predicted using the t-th left-eye disparity map and the t-th right-eye disparity map.
  • the left neural network model uses X't+1 to generate the t+1th left-eye disparity map, where X't+1 represents the connection result of the t+1th left-eye matching similarity and the t-th left-eye attention map.
  • the right neural network model uses X" t+1 to generate the t+1th right-eye disparity map, where X" t+1 represents the connection result of the t+1th right-eye matching similarity and the t-th right-eye attention map.
  • the t+1th left-eye disparity map and the t+1th right-eye disparity map can then be used to predict the t+1th left-eye attention map and the t+1th right-eye attention map.
  • the depth information determining apparatus may further obtain the next binocular depth information.
  • a convolution layer and a convergence layer can be added to the neural network model to generate the binocular attention maps, and the binocular attention maps are used as part of the next input of the LRCR model, so that in the next step more attention is paid to the regions where the left and right views do not match, thereby improving the accuracy of the prediction. The recurrent loop is outlined in the sketch that follows.
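  • The following Python-style sketch, given only as an outline and not as the patent's implementation, shows how the recursion described above could be organized: the disparity network, the attention network and the per-step matching-similarity volumes are passed in as callables, and each step alternates disparity prediction with the left-right comparison that produces the attention maps for the next step.

    import torch

    def lrcr(num_steps, left_sim_fn, right_sim_fn, disp_net, att_net, init_att):
        """Minimal outline of the LRCR recursion.
        left_sim_fn(t)/right_sim_fn(t): matching-similarity volumes for step t.
        disp_net(x, state): ConvLSTM-based disparity network, returns (disparity, state).
        att_net(own_disp, other_disp): attention network comparing the two views."""
        left_att, right_att = init_att, init_att
        left_state = right_state = None
        results = []
        for t in range(1, num_steps + 1):
            x_left = torch.cat([left_sim_fn(t), left_att], dim=1)     # X'_t
            x_right = torch.cat([right_sim_fn(t), right_att], dim=1)  # X"_t
            left_disp, left_state = disp_net(x_left, left_state)
            right_disp, right_state = disp_net(x_right, right_state)
            # compare the two disparity maps to obtain the attention maps
            # that guide the next recursion
            left_att = att_net(left_disp, right_disp)
            right_att = att_net(right_disp, left_disp)
            results.append((left_disp, right_disp))
        return results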
  • In a third optional embodiment of the method for determining depth information provided by the embodiment of the present application, processing the t-th left-eye matching similarity and the t-1th left-eye attention map by the neural network model to obtain the t-th left-eye disparity map may include:
  • the t-th left-eye hidden variable is calculated by using ConvLSTM;
  • the t-th right-eye matching similarity and the t-1th right-eye attention map are processed by the neural network model to obtain the t-th right-eye disparity map, including:
  • the t-th right-eye hidden variable is calculated by using ConvLSTM;
  • the t-th right-eye disparity prediction value is calculated according to the t-th right-eye disparity cost, wherein the t-th right-eye disparity prediction value is used to generate the t-th right-eye disparity map.
  • the t-th left-eye matching similarity and the t-1th left-eye attention map need to be input to ConvLSTM, so that the t-th left-eye hidden variable is calculated.
  • the t-th left-eye disparity cost is obtained according to the t-th left-eye hidden variable.
  • the t-th left-eye disparity prediction value is then calculated according to the t-th left-eye disparity cost; once the t-th left-eye disparity prediction value is obtained, the t-th left-eye disparity map can be generated. The manner of generating the t-th right-eye disparity map is similar to that of generating the t-th left-eye disparity map and is not described here again.
  • FIG. 6 is a schematic diagram of a convolutional long and short memory network according to an embodiment of the present application.
  • each black line carries an entire vector, from the output of one node to the inputs of other nodes.
  • Circles represent point-wise operations, such as vector addition, while the boxes represent learned neural network layers.
  • lines that merge represent concatenation of vectors, and lines that split indicate that the content is copied and then sent to different locations. The top horizontal line by itself cannot add or remove information; instead, information is passed selectively through structures called gates, which are mainly implemented by a sigmoid neural layer followed by a point-wise multiplication operation.
  • Each element of the sigmoid neural layer's output (which is a vector) is a real number between 0 and 1, representing the weight (or proportion) of the corresponding information that is let through. For example, 0 means "no information is passed" and 1 means "let all information pass".
  • the tanh layer represents a repeating structural module.
  • ConvLSTM protects and controls information through the three gate structures shown in FIG. 6: the input gate, the forget gate, and the output gate.
  • the t-th left-eye matching similarity and the t-1th left-eye attention map are processed by using ConvLSTM to obtain the t-th left-eye disparity map, and the ConvLSTM pair t-th right-eye matching similarity is adopted. And the t-1th right eye attention map is processed to obtain the tth right eye disparity map.
  • ConvLSTM is used to recursively predict the binocular disparity maps. ConvLSTM not only has the powerful sequence modeling and information processing capability of conventional recurrent neural networks, but also effectively extracts the information in the spatial neighborhood of each pixel, thereby integrating spatial context information.
  • In a fourth optional embodiment of the method for determining depth information provided by the embodiment of the present application, based on the third embodiment corresponding to FIG. 2, calculating the t-th left-eye hidden variable by using ConvLSTM according to the t-th left-eye matching similarity and the t-1th left-eye attention map may include:
  • the t-th left-eye hidden variable is calculated as follows:
  • i' t denotes the network input gate of the t-th left-eye recursion;
  • * denotes vector multiplication;
  • ° denotes a convolution operation;
  • σ denotes a sigmoid function;
  • W xi , W hi , W ci and b i denote the model parameters of the network input gate;
  • X' t denotes the connection of the t-th left-eye matching similarity and the t-1th left-eye attention map;
  • f' t denotes the forget gate of the t-th left-eye recursion;
  • W xf , W hf , W cf and b f denote the model parameters of the forget gate;
  • o' t denotes the output gate of the t-th left-eye recursion;
  • W xo , W ho , W co and b o denote the model parameters of the output gate;
  • C' t denotes the memory unit of the t-th left-eye recursion, and C' t-1 denotes the memory unit of the t-1th left-eye recursion;
  • H' t-1 denotes the t-1th left-eye hidden variable, and H' t denotes the t-th left-eye hidden variable.
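  • The gate equations themselves are not reproduced in this text. For reference only, a standard ConvLSTM update consistent with the variables listed above would read as follows, taking ° to be the convolution and * to be the element-wise (vector) multiplication defined above; the candidate-memory parameters W xc , W hc and b c belong to this conventional formulation and are an assumption here, since they are not listed above, and the exact expression in the patent drawings may differ.

    \begin{aligned}
    i'_t &= \sigma\left(W_{xi} \circ X'_t + W_{hi} \circ H'_{t-1} + W_{ci} \ast C'_{t-1} + b_i\right)\\
    f'_t &= \sigma\left(W_{xf} \circ X'_t + W_{hf} \circ H'_{t-1} + W_{cf} \ast C'_{t-1} + b_f\right)\\
    C'_t &= f'_t \ast C'_{t-1} + i'_t \ast \tanh\left(W_{xc} \circ X'_t + W_{hc} \circ H'_{t-1} + b_c\right)\\
    o'_t &= \sigma\left(W_{xo} \circ X'_t + W_{ho} \circ H'_{t-1} + W_{co} \ast C'_t + b_o\right)\\
    H'_t &= o'_t \ast \tanh\left(C'_t\right)
    \end{aligned}

  • The right-eye hidden variable H" t would be computed with the same form of equations applied to X" t , H" t-1 and C" t-1 .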
  • using the ConvLSTM calculation to obtain the t-th right-eye hidden variable may include:
  • the t-th right hidden variable is calculated as follows:
  • i" t denotes the network input gate of the t-th right-eye recursion;
  • X" t denotes the connection of the t-th right-eye matching similarity and the t-1th right-eye attention map;
  • f" t denotes the forget gate of the t-th right-eye recursion;
  • o" t denotes the output gate of the t-th right-eye recursion;
  • C" t denotes the memory unit of the t-th right-eye recursion;
  • C" t-1 denotes the memory unit of the t-1th right-eye recursion;
  • H" t-1 denotes the t-1th right-eye hidden variable;
  • H" t denotes the t-th right-eye hidden variable.
  • the calculation of the binocular hidden variables is described here in combination with the formulas; ConvLSTM controls the flow of information through the input gate, the forget gate and the output gate.
  • the first step in ConvLSTM is to decide what information to discard; this decision is made through a structure called the forget gate.
  • the forget gate reads H' t-1 (or H" t-1 ) and X' t (or X" t ) and outputs, for each entry of the cell state C' t-1 (or C" t-1 ), a number between 0 and 1, where 1 means "keep completely" and 0 means "discard completely"; here H' t-1 (or H" t-1 ) represents the output of the previous cell, X' t (or X" t ) represents the input of the current cell, and σ represents the sigmoid function.
  • the next step is to decide how much new information to add to the cell state.
  • implementing this involves two steps: a sigmoid layer called the "input gate layer" determines which information needs to be updated, and a tanh layer generates a vector of candidate content for the update.
  • the old cell state C' t-1 (or C" t-1 ) is then updated to the new cell state C' t (or C" t ).
  • the hidden variables of the two views can thus be obtained by using the computation defined by ConvLSTM.
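  • As a hedged illustration of such a cell (a standard ConvLSTM formulation with the peephole connections omitted and arbitrary layer sizes; not the exact network of the present disclosure), a minimal PyTorch module might look like this:

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        """Minimal ConvLSTM cell: every gate is computed with a convolution, so the
        spatial neighbourhood of each pixel is taken into account."""
        def __init__(self, in_ch, hidden_ch, k=3):
            super().__init__()
            # one convolution produces the input, forget and output gates
            # plus the candidate memory content
            self.conv = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)
            self.hidden_ch = hidden_ch

        def forward(self, x, state=None):
            if state is None:
                b, _, h, w = x.shape
                zeros = x.new_zeros(b, self.hidden_ch, h, w)
                state = (zeros, zeros)                      # (H_{t-1}, C_{t-1})
            h_prev, c_prev = state
            gates = self.conv(torch.cat([x, h_prev], dim=1))
            i, f, o, g = torch.chunk(gates, 4, dim=1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            c = f * c_prev + i * torch.tanh(g)              # new memory unit C_t
            h = o * torch.tanh(c)                           # new hidden variable H_t
            return h, (h, c)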
  • obtaining the t-th left-eye disparity cost according to the t-th left-eye hidden variable may include:
  • the t-th left-eye hidden variable is processed by at least two layers of the fully connected layer to obtain a t-th left-eye disparity cost
  • Obtaining the t-th right-eye disparity cost according to the t-th right-eye hidden variable may include:
  • the t-th right-eye hidden variable is processed by at least two layers of fully connected layers to obtain a t-th right-eye disparity cost.
  • the tth left-eye hidden variable may be input to at least two layers of the fully connected layer, and the t-th left-eye disparity cost is outputted by the at least two layers of the fully connected layer.
  • the tth right-eye hidden variable is input to at least two layers of the fully connected layer, and the t-th right-eye disparity cost is output from the at least two-layer fully connected layer.
  • each node of a fully connected layer is connected to all nodes of the previous layer in order to integrate the features extracted earlier; because of this fully connected nature, the fully connected layers usually contain the largest number of parameters.
  • in practice, a fully connected layer can be implemented as a convolution operation in which the size of the convolution kernel equals the feature size of the previous layer; the result of the convolution is a single value, which corresponds to one node of the fully connected layer.
  • for example, when such a fully connected layer is converted into a convolutional layer, there are 4096 sets of filters, each set containing 512 convolution kernels of size 7×7, and the output is 1×1×4096; if another 1×1×4096 fully connected layer follows, the corresponding converted convolution layer has 4096 sets of filters, each set containing 4096 convolution kernels of size 1×1, with an output of 1×1×4096. This is equivalent to combining the features to compute 4096 classification scores, the highest of which gives the predicted category.
  • the method for obtaining the binocular parallax cost may be: inputting a binocular hidden variable to at least two layers of the fully connected layer, and outputting a binocular disparity cost by the two layers of the fully connected layer.
  • the binocular parallax cost can be obtained by using the fully connected layer, thereby improving the feasibility and operability of the scheme.
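  • A minimal sketch of this step, assuming the "fully connected" layers are applied at every pixel as 1*1 convolutions on the ConvLSTM hidden state (the layer widths and names below are assumptions, not the disclosure's values):

    import torch.nn as nn

    # hypothetical cost head: two per-pixel "fully connected" layers (1x1 convolutions)
    # mapping the hidden state of size hidden_ch to a D_max-dimensional disparity cost
    def make_cost_head(hidden_ch, d_max):
        return nn.Sequential(
            nn.Conv2d(hidden_ch, 128, kernel_size=1), nn.ReLU(),
            nn.Conv2d(128, d_max, kernel_size=1))

    # cost = make_cost_head(hidden_ch=64, d_max=192)(hidden)   # [B, D_max, H, W]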
  • calculating the t-th left-eye disparity prediction value according to the t-th left-eye disparity cost may include:
  • the t-th left-eye disparity prediction value is calculated as follows:
  • d'* denotes the tth left-eye disparity prediction value
  • Dmax denotes the maximum number of different disparity maps
  • d' denotes the tth left-eye disparity value
  • σ denotes a sigmoid function
  • c' d represents the tth left-eye parallax cost
  • Calculating the t-th right-eye disparity prediction value according to the t-th right-eye disparity cost including:
  • the t-th right-eye disparity prediction value is calculated as follows:
  • d"* represents the tth right-eye disparity prediction value
  • c" d represents the tth right-eye disparity cost
  • d" represents the tth right-eye disparity value
  • c" d represents the t-th right-eye disparity cost
  • the binocular disparity costs of size H*W*D max are obtained through the preceding layers. Taking the disparity cost in tensor form, a softmax normalization is applied over the disparity dimension, so that the resulting probability tensor reflects, for every pixel, the probability of each candidate disparity. Finally, a differentiable argmin layer generates the disparity prediction value as the sum of all candidate disparities weighted by their probabilities. Mathematically, the formula above describes how the disparity prediction value d'* (or d"*) of a particular pixel is obtained from the cost c' d (or c" d ) of each candidate disparity d' (or d").
  • the binocular disparity prediction values can thus be calculated by using the maximum number of different disparities together with the binocular disparity costs.
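  • The differentiable argmin described above can be sketched as follows (a minimal illustration, assuming lower cost means a better match so that the softmax is taken over the negated cost; the sign convention and names are assumptions):

    import torch
    import torch.nn.functional as F

    def soft_argmin_disparity(cost):
        """cost: [B, D_max, H, W] disparity cost per pixel.
        Returns [B, 1, H, W]: each candidate disparity weighted by its probability
        and summed, i.e. a differentiable argmin over the disparity dimension."""
        prob = F.softmax(-cost, dim=1)
        d_max = cost.shape[1]
        disparities = torch.arange(d_max, device=cost.device, dtype=cost.dtype)
        return (prob * disparities.view(1, d_max, 1, 1)).sum(dim=1, keepdim=True)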
  • determining the first depth information according to the t-th left-eye disparity map may include calculating the first depth information in the following manner: Z' = B·f / d'*, where
  • Z' represents the first depth information
  • d'* represents the tth left-eye disparity prediction value
  • B represents the binocular camera spacing
  • f represents the focal length
  • determining the second depth information according to the t-th right-eye disparity map may include calculating the second depth information in the following manner: Z" = B·f / d"*, where
  • Z" represents the second depth information
  • d"* represents the tth right-eye disparity prediction value
  • the binocular disparity maps can thus be used to calculate the depth information of the two views. Taking the first depth information of the left view as an example, the binocular camera spacing and the focal length are obtained, and the product of the binocular camera spacing and the focal length is divided by the calculated left-eye disparity prediction value to obtain the first depth information of the left view.
  • the depth is inversely proportional to the disparity, which is consistent with the finger experiment described earlier and explains why near objects appear to move faster than distant ones.
  • the manner of calculating the depth information is thus introduced: the predicted disparity, the binocular camera spacing, and the focal length are used to obtain the depth information of the two views.
  • the left-eye depth information and the right-eye depth information can be simultaneously calculated, and the required depth information is selected according to actual needs, thereby improving the practicability and feasibility of the scheme.
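  • A minimal sketch of this depth computation (illustrative only; the epsilon guard and the example numbers are assumptions):

    def depth_from_disparity(disparity, baseline, focal_length, eps=1e-6):
        """Depth Z = B * f / d: binocular camera spacing B times focal length f,
        divided by the predicted disparity d (eps avoids division by zero)."""
        return baseline * focal_length / (disparity + eps)

    # e.g. a 0.54 m baseline, a 721-pixel focal length and a 30-pixel disparity
    # give a depth of roughly 13 m (illustrative numbers only)
    # depth_from_disparity(30.0, 0.54, 721.0)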
  • FIG. 7 is a schematic diagram of an embodiment of a depth information determining apparatus configured with a binocular camera according to an embodiment of the present application.
  • the depth information determining apparatus 20 includes:
  • the obtaining module 201 is configured to acquire a t-th left-eye matching similarity from the left-eye image to the right-eye image, and a t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1;
  • the processing module 202 is configured to process, by using the neural network model, the t-th left-eye matching similarity and the t-1th left-eye attention map acquired by the acquiring module 201 to obtain a t-th left-eye disparity map;
  • the processing module 202 is further configured to process, by using the neural network model, the t-th right-eye matching similarity and the t-1th right-eye attention map acquired by the acquiring module 201 to obtain a t-th right-eye disparity map;
  • a determining module 203 configured to determine first depth information according to the t-th left-eye disparity map processed by the processing module 202, and determine a second depth according to the t-th right-eye disparity map processed by the processing module information.
  • the obtaining module 201 acquires the t-th left-eye matching similarity from the left-eye image to the right-eye image, and the t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1; the processing module 202 processes, through the neural network model, the t-th left-eye matching similarity and the t-1th left-eye attention map acquired by the obtaining module 201 to obtain the t-th left-eye disparity map, and processes the acquired t-th right-eye matching similarity and the t-1th right-eye attention map to obtain the t-th right-eye disparity map; the determining module 203 determines the first depth information according to the t-th left-eye disparity map processed by the processing module 202, and determines the second depth information according to the t-th right-eye disparity map processed by the processing module 202.
  • a depth information determining apparatus is provided which obtains the binocular disparity maps by using the neural network model together with the binocular attention maps obtained in the previous pass, and uses the binocular disparity maps obtained in the current pass to learn the binocular attention maps that guide the next pass, so that the recursive learning makes full use of the complementary information of the two views and continuously corrects the binocular disparity maps, thereby effectively reducing the error of the depth information for regions that are difficult to match between the two views.
  • the depth information determining apparatus 20 further includes a mapping module 204 and a generating module 205.
  • the mapping module 204 is configured to map the tth right-eye disparity map to the left-eye coordinate system to obtain a t-th left-eye mapping disparity map;
  • the generating module 205 is configured to generate a t-th left-eye attention map according to the t-th left-eye mapping disparity map and the t-th left-eye disparity map obtained by the mapping module 204;
  • the mapping module 204 is further configured to map the t-th left-eye disparity map to the right-eye coordinate system to obtain a t-th right-eye mapping disparity map;
  • the generating module 205 is further configured to generate a t-th right-eye attention map according to the t-th right-eye mapping disparity map and the t-th right-eye disparity map obtained by the mapping module 204.
  • the depth information determining apparatus maps the t-th right-eye disparity map to the left-eye coordinate system to obtain the t-th left-eye mapping disparity map, and generates the t-th left-eye attention map according to the t-th left-eye mapping disparity map and the t-th left-eye disparity map; similarly, the t-th right-eye attention map can also be obtained.
  • the attention map learned in this recursion can be used to guide the disparity prediction of the next recursion, and the network can accordingly correct and update the disparity values of the low-confidence pixel regions automatically detected in this recursion, thereby improving the reliability of the binocular attention maps.
  • the obtaining module 201 is further configured to, after the determining module 203 determines the first depth information according to the t-th left-eye disparity map and determines the second depth information according to the t-th right-eye disparity map, obtain a t+1th left-eye matching similarity from the left-eye image to the right-eye image and a t+1th right-eye matching similarity from the right-eye image to the left-eye image;
  • the processing module 202 is further configured to process the t+1th left-eye matching similarity and the t-th left-eye attention map by using the neural network model to obtain a t+1th left-eye disparity map;
  • the processing module 202 is further configured to process, by using the neural network model, the (t+1) right-eye matching similarity and the t-th right-eye attention map to obtain a t+1th right-eye disparity map;
  • the determining module 203 is further configured to determine third depth information according to the t+1th left-eye disparity map processed by the processing module 202, and determine fourth depth information according to the t+1th right-eye disparity map processed by the processing module 202.
  • the depth information determining apparatus may further obtain the next binocular depth information.
  • a convolution layer and a convergence layer can be added to the neural network model to generate the binocular attention maps, and the binocular attention maps are used as part of the next input of the LRCR model, so that in the next step more attention is paid to the regions where the left and right views do not match, thereby improving the accuracy of the prediction.
  • the processing module 202 is configured to calculate a t-th left-eye hidden variable by using ConvLSTM according to the t-th left-eye matching similarity and the t-1th left-eye attention graph;
  • the processing module 202 is configured to calculate, by using the ConvLSTM, the t-th right-eye hidden variable according to the t-th right-eye matching similarity and the t-1th right-eye attention map;
  • the t-th right-eye disparity prediction value is calculated according to the t-th right-eye disparity cost, wherein the t-th right-eye disparity prediction value is used to generate the t-th right-eye disparity map.
  • the t-th left-eye matching similarity and the t-1th left-eye attention map are processed by using ConvLSTM to obtain the t-th left-eye disparity map, and the ConvLSTM pair t-th right-eye matching similarity is adopted. And the t-1th right eye attention map is processed to obtain the tth right eye disparity map.
  • ConvLSTM is used to recursively predict the binocular disparity maps. ConvLSTM not only has the powerful sequence modeling and information processing capability of conventional recurrent neural networks, but also effectively extracts the information in the spatial neighborhood of each pixel, thereby integrating spatial context information.
  • the processing module 202 is specifically configured to calculate the t-th left-eye hidden variable in the following manner:
  • the i' t represents the network input gate of the t-th left-eye recursion;
  • the * indicates a vector multiplication;
  • the ° represents a convolution operation;
  • the σ represents a sigmoid function;
  • the W xi , the W hi , the W ci and the b i represent the model parameters of the network input gate;
  • the X' t represents the connection of the t-th left-eye matching similarity and the t-1th left-eye attention map;
  • the f' t represents the forget gate of the t-th left-eye recursion;
  • the W xf , the W hf , the W cf and the b f represent the model parameters of the forget gate;
  • the o' t represents the output gate of the t-th left-eye recursion;
  • the W xo , the W ho , the W co and the b o represent the model parameters of the output gate;
  • the C' t represents the memory unit of the t-th left-eye recursion, and the C' t-1 represents the memory unit of the t-1th left-eye recursion.
  • the processing module 202 is specifically configured to calculate the tth right hidden variable in the following manner:
  • the i" t represents a t-th right-hand recursive network input gate
  • the X" t represents the t-th right-eye matching similarity and the t-1th right-eye attentional graph
  • the f" t represents a t-th right-eye recursive forgotten door
  • said o 't represent the t-th right-eye recursive output gates
  • the C "t denotes a memory unit of the t-th right-eye recursion
  • the C" t-1 represents the t -1 right-hand recursive memory unit
  • said H" t-1 represents a t-1th right-eye hidden variable
  • said H" t represents said t-th right-eye hidden variable.
  • the hidden variables of the two views can thus be obtained by using the computation defined by ConvLSTM.
  • the processing module 202 is configured to process the t-th left-eye hidden variable by using at least two layers of the full connection layer to obtain the t-th left-eye disparity cost;
  • the processing module 202 is configured to process the t-th right-eye hidden variable by using the at least two layers of the full-connection layer to obtain the t-th right-eye disparity cost.
  • the method for obtaining the binocular parallax cost may be: inputting a binocular hidden variable to at least two layers of the fully connected layer, and outputting a binocular disparity cost by the two layers of the fully connected layer.
  • the binocular parallax cost can be obtained by using the fully connected layer, thereby improving the feasibility and operability of the scheme.
  • the processing module 202 is specifically configured to calculate the t-th left-eye disparity prediction value in the following manner:
  • the d′* represents the t-th left-eye disparity prediction value
  • the D max represents a maximum number of different disparity maps
  • the d′ represents a t-th left-eye disparity value
  • the σ represents a sigmoid function;
  • the c' d represents the tth left-eye disparity cost
  • the processing module 202 is specifically configured to calculate the t-th right-eye disparity prediction value in the following manner:
  • the d"* represents the tth right-eye disparity prediction value
  • the c" d represents the t-th right-eye disparity cost
  • the d" represents the t-th right-eye disparity value
  • the c" d Represents the tth right eye disparity cost.
  • the binocular disparity prediction values can thus be calculated by using the maximum number of different disparities together with the binocular disparity costs.
  • the determining module 203 is specifically configured to calculate the first depth information in the following manner:
  • the Z′ represents the first depth information
  • the d′* represents the t-th left-eye disparity prediction value
  • the B represents a binocular camera spacing
  • the f represents a focal length
  • determining the second depth information according to the t-th right-eye disparity map may include:
  • the determining module 203 is specifically configured to calculate the second depth information by:
  • the Z" represents the second depth information
  • the d′′* represents the tth right-eye disparity prediction value
  • FIG. 9 is a schematic structural diagram of a depth information determining apparatus according to an embodiment of the present application.
  • the depth information determining apparatus 300 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing an application 342 or data 344.
  • the memory 332 and the storage medium 330 may be short-term storage or persistent storage.
  • the program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations in the depth information determining device.
  • the central processor 322 can be configured to communicate with the storage medium 330 to perform a series of instruction operations in the storage medium 330 on the depth information determining device 300.
  • Depth information determining apparatus 300 may also include one or more power sources 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or one or more operating systems 341, such as Windows ServerTM , Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and more.
  • the steps performed by the depth information determining apparatus in the above embodiments may be based on the structure of the depth information determining apparatus shown in FIG. 9.
  • the CPU 322 is configured to perform the following steps:
  • the t-th left-eye matching similarity and the t-1th left-eye attention map are processed by a neural network model to obtain a t-th left-eye disparity map;
  • the t-th right-eye matching similarity and the t-1th right-eye attention map are processed by the neural network model to obtain a t-th right-eye disparity map;
  • the CPU 322 is further configured to perform the following steps:
  • a t-th right-eye attention map is generated based on the t-th right-eye mapping disparity map and the t-th right-right disparity map.
  • the CPU 322 is further configured to perform the following steps:
  • the t+1th left-eye matching similarity and the t-th left-eye attention map are processed by the neural network model to obtain a t+1th left-eye disparity map;
  • the t+1th right-eye matching similarity and the t-th right-eye attention map are processed by the neural network model to obtain a t+1th right-eye disparity map;
  • the CPU 322 is specifically configured to perform the following steps:
  • the t-th right-eye disparity prediction value is calculated according to the t-th right-eye disparity cost, wherein the t-th right-eye disparity prediction value is used to generate the t-th right-eye disparity map.
  • the CPU 322 is specifically configured to perform the following steps:
  • the t-th left-eye hidden variable is calculated as follows:
  • the i' t represents the network input gate of the t-th left-eye recursion;
  • the * indicates a vector multiplication;
  • the ° represents a convolution operation;
  • the σ represents a sigmoid function;
  • the W xi , the W hi , the W ci and the b i represent the model parameters of the network input gate;
  • the X' t represents the connection of the t-th left-eye matching similarity and the t-1th left-eye attention map;
  • the f' t represents the forget gate of the t-th left-eye recursion;
  • the W xf , the W hf , the W cf and the b f represent the model parameters of the forget gate;
  • the o' t represents the output gate of the t-th left-eye recursion;
  • the W xo , the W ho , the W co and the b o represent the model parameters of the output gate;
  • the C' t represents the memory unit of the t-th left-eye recursion, and the C' t-1 represents the memory unit of the t-1th left-eye recursion.
  • the t-th right hidden variable is calculated as follows:
  • the i" t represents a t-th right-hand recursive network input gate
  • the X" t represents the t-th right-eye matching similarity and the t-1th right-eye attentional graph
  • the f" t represents a t-th right-eye recursive forgotten door
  • said o 't represent the t-th right-eye recursive output gates
  • the C "t denotes a memory unit of the t-th right-eye recursion
  • the C" t-1 represents the t -1 right-hand recursive memory unit
  • said H" t-1 represents a t-1th right-eye hidden variable
  • said H" t represents said t-th right-eye hidden variable.
  • the CPU 322 is specifically configured to perform the following steps:
  • the t-th left-eye hidden variable is processed by at least two layers of the full connection layer to obtain the t-th left-eye disparity cost
  • the t-th right-eye hidden variable is processed by the at least two layers of fully connected layers to obtain the t-th right-eye disparity cost.
  • the CPU 322 is specifically configured to perform the following steps:
  • the t-th left-eye disparity prediction value is calculated as follows:
  • the d′* represents the t-th left-eye disparity prediction value
  • the D_max represents the maximum number of different disparity maps
  • the d′ represents the t-th left-eye disparity value
  • the σ represents a sigmoid function
  • the c′_d represents the t-th left-eye disparity cost
  • the t-th right-eye disparity prediction value is calculated as follows:
  • the d″* represents the t-th right-eye disparity prediction value
  • the c″_d represents the t-th right-eye disparity cost
  • the d″ represents the t-th right-eye disparity value.
  • the CPU 322 is specifically configured to perform the following steps:
  • the first depth information is calculated in the following manner:
  • the Z′ represents the first depth information
  • the d′* represents the t-th left-eye disparity prediction value
  • the B represents a binocular camera spacing
  • the f represents a focal length
  • the second depth information is calculated in the following manner:
  • the Z″ represents the second depth information
  • the d″* represents the t-th right-eye disparity prediction value
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division.
  • in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • An integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, can be stored in a computer readable storage medium.
  • the technical solution of the present application in essence or the contribution to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the various embodiments of the present application.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or another medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application discloses a depth information determining method, including: obtaining a t-th left-eye matching similarity from a left-eye image to a right-eye image, and a t-th right-eye matching similarity from the right-eye image to the left-eye image, where t is an integer greater than 1; processing the t-th left-eye matching similarity and the (t-1)-th left-eye attention map by using a neural network model to obtain a t-th left-eye disparity map; processing the t-th right-eye matching similarity and the (t-1)-th right-eye attention map by using the neural network model to obtain a t-th right-eye disparity map; and determining first depth information according to the t-th left-eye disparity map, and determining second depth information according to the t-th right-eye disparity map. The present application further discloses a depth information determining apparatus. The present application uses recursive learning to fully exploit the complementary information of the two views and continually correct the binocular disparity maps, so that the error of the depth information can be effectively reduced for regions that are hard to match between the two views.

Description

一种深度信息确定的方法及相关装置
本申请要求于2018年04月04日提交中国专利局、申请号为201810301988.3、申请名称为“一种深度信息确定的方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机处理领域,尤其涉及深度信息确定。
背景技术
视差是观测者在两个不同位置看同一物体的方向之差。比如,当你伸出一个手指放在眼前,先闭上右眼,用左眼看它,再闭上左眼,用右眼看它,会发现手指相对远方的物体的位置有了变化,这就是从不同角度去看同一点的视差。
目前,在预测物体深度信息的过程中,需要先预测左眼到右眼在不同视差时的匹配相似度,然后利用左眼到右眼在不同视差时的匹配相似度,对左眼图像进行视差预测,由此确定物体的深度信息。
然而,对于双目难匹配的区域(如重复性区域、纹理缺失区域以及复杂物体边缘)而言,只利用左眼到右眼在不同视差时的匹配相似度,容易导致深度信息误差较大。
发明内容
本申请实施例提供了一种深度信息确定的方法及相关装置,利用递归式的学习可充分考虑到双目的互补信息,不断修正双目视差图,从而对于双目难匹配的区域而言,能够有效地降低深度信息的误差。
本申请实施例的第一方面提供了一种深度信息确定的方法,包括:
获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息。
本申请实施例的第二方面提供了一种深度信息确定装置,包括:
获取模块,用于获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
处理模块,用于通过神经网络模型对所述获取模块获取的所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
所述处理模块,还用于通过所述神经网络模型对所述获取模块获取的所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
确定模块,用于根据所述处理模块处理得到的所述第t次左目视差图确定第一深度信息,并根据所述处理模块处理得到的所述第t次右目视差图确定第二深度信息。
本申请实施例的第三方面提供了一种深度信息确定装置,包括:存储器、处理器以及总线系统;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,具体包括如下步骤:
获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息;
所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及 所述处理器进行通信。
本申请实施例的第四方面提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。从以上技术方案可以看出,本申请实施例具有以下优点:
本申请实施例中,提供了一种深度信息确定的方法,获取从左目图像至右目图像的第t次左目匹配相似度,以及从右目图像到左目图像的第t次右目匹配相似度,然后通过神经网络模型对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,并且通过神经网络模型对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图,最后可以根据第t次左目视差图确定第一深度信息,并根据第t次右目视差图确定第二深度信息。通过上述方式,利用神经网络模型以及上一次学习得到的双目注意力图,可以得到双目视差图,并且根据本次得到的双目视差图用于学习出双目注意力图,再指导下一次的双目视差图,这样递归式的学习可以充分利用双目的互补信息,不断修正双目视差图,从而对于双目难匹配的区域而言,能够有效地降低深度信息的误差。
附图说明
图1A为本申请实施例中基于递归学习的双目视差示意图;
图1B为本申请实施例中深度信息确定装置的一个架构示意图;
图2为本申请实施例中深度信息确定的方法一个实施例示意图;
图3为本申请实施例中原图和模型预测深度图的一个对比示意图;
图4为本申请实施例中生成双目注意力图的一个示意图;
图5为本申请实施例中递归双目视差网络的一个示意图;
图6为本申请实施例中卷积长短记忆网络的一个示意图;
图7为本申请实施例中深度信息确定装置的一个实施例示意图;
图8为本申请实施例中深度信息确定装置的另一个实施例示意图;
图9为本申请实施例中深度信息确定装置的一个结构示意图。
具体实施方式
本申请实施例提供了一种深度信息确定的方法及相关装置,利用递归式的 学习可充分考虑到双目的互补信息,不断修正双目视差图,从而对于双目难匹配的区域而言,能够有效地降低深度信息的误差。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应理解,本申请可以应用于配备双目摄像头的设施(如双目机器人以及无人车等)进行物体深度估计。本申请主要通过深度神经网络获取双目视觉图像的视差,再用双摄像头间距离和焦距的积除以预测出的视差,得到深度值。具体而言,先利用一个卷积神经网络预测出一目图像到另一目图像(即左目到右目,以及右目到左目)在不同视差时匹配的相似度,得不同视差时的匹配相似度,再利用一个卷积长短记忆网络(Convolutional Long Short-Term Memory,ConvLSTM)递归地进行“双目视差预测——双目视差图对比”的循环。在这个循环中,通过不断进行双目视差图对比,能充分利用左右视觉的互补信息,自动检测出左右视觉中的难匹配区域(如重复性区域、纹理缺失区域或者复杂物体边缘),达到双目视差预测值的修正更新,不断提高视差预测的准确度,也即深度的准确度。
对于双目摄像头拍摄的左右视角图像,先利用卷积神经网络对左目图像到右目图像,和右目图像到左目图像在不同视差时的匹配相似度进行预测,然后,基于上述预测出的匹配相似度,利用ConvLSTM对双目视差进行递归预测,整个流程图如图1A所示,图1A为本申请实施例中基于递归学习的双目视差示意图,如图所示,假设左右双目拍摄到的图像都为H*W(H为高度,W为宽度)分辨率大小,先利用卷积神经网络对双目图像进行像素级别的特征提取,对两图分别得到H*W*C(C为特征维度)的特征图。然后把两个H*W*C的特征图进行水平方向不同视差的特征组合,得到最多D max种不同视差时的特征图(维 数为H*W*2C*D max),再利用另一个卷积核为1*1的卷积神经网络,对所有像素在不同视差时的匹配相似度进行预测,基于2C的输入特征得到一个匹配相似度值。将H*W个像素在所有D max可能视差时的相似度值写成张量形式,则左目图像到右目图像和右目图像到左目图像都能预测出一个H*W*D max的匹配相似度。
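As a rough illustration of the matching-similarity volume described above, the sketch below (NumPy; the function name and the per-pixel dot-product scoring are illustrative stand-ins for the learned 1×1 convolutional scorer) combines two H×W×C feature maps at each candidate disparity into an H×W×D_max similarity tensor.

```python
import numpy as np

def build_matching_similarity(feat_left, feat_right, max_disp):
    """Illustrative sketch: score left/right feature alignment at each
    candidate disparity with a per-pixel dot product (a stand-in for the
    1x1 convolutional scorer described above)."""
    H, W, C = feat_left.shape
    similarity = np.zeros((H, W, max_disp), dtype=np.float32)
    for d in range(max_disp):
        # align feat_right[y, x - d] with feat_left[y, x]
        shifted = np.zeros_like(feat_right)
        shifted[:, d:, :] = feat_right[:, : W - d, :]
        similarity[:, :, d] = np.sum(feat_left * shifted, axis=2)
    return similarity

# toy usage
fl = np.random.rand(4, 8, 16).astype(np.float32)
fr = np.random.rand(4, 8, 16).astype(np.float32)
print(build_matching_similarity(fl, fr, max_disp=3).shape)  # (4, 8, 3)
```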
基于上述预测出的双目匹配相似度张量,我们利用ConvLSTM对双目视差进行递归预测,从而得到左目视差图和右目视差图。
过几十年来的发展,立体视觉在机器人视觉、航空测绘、反求工程、军事运用、医学成像和工业检测等领域中的运用越来越广。请参阅图1B,图1B为本申请实施例中深度信息确定装置的一个架构示意图,如图所示,本申请所提供的深度信息确定装置可部署与服务器上,由服务器将处理结果传输至目标设备,也可以直接将深度信息确定装置部署在目标设备上。其中,目标设备包含但不仅限于(无人驾驶)汽车、机器人、(无人驾驶)飞机以及智能终端等,这些目标设备都具有双目立体视觉,能够基于视差原理并利用成像设备从不同位置获取被测物体的两幅图像,通过计算图像对应点间的位置偏差,来获取物体三维几何信息。双目立体视觉融合两只眼睛获得的图像并观察它们之间的差别,使我们可以获得明显的深度感,建立特征间的对应关系,将同一空间物理点在不同图像中的映像点对应起来,这个差别称作视差图像。
双眼视差有时候也被称为立体视差,是一种深度线索。物体离观察者越近,两只眼睛所看到物体的差别也越大,这就形成了双眼视差。大脑可以利用对这种视差的测量,估计出物体到眼睛的距离。
下面将对本申请中深度信息确定的方法进行介绍,请参阅图2,本申请实施例中深度信息确定的方法一个实施例包括:
101、获取从左目图像至右目图像的第t次左目匹配相似度,以及从右目图像到左目图像的第t次右目匹配相似度,其中,t为大于1的整数;
本实施例中,首先由深度信息确定装置通过双目摄像头获取左目图像和右目图像,然后计算从左目图像至右目图像的第t次左目匹配相似度,以及从右目图像到左目图像的第t次右目匹配相似度,t是一个大于1的整数,可认为是第t次获取到的匹配相似度。下面将介绍几种计算匹配相似度的算法,在实际应 用中包含但不仅限于以下列举的算法。
第一种,平均绝对差算法(Mean Absolute Differences,MAD),该算法的思想简单,具有较高的匹配精度,广泛用于图像匹配。在搜索图S中,可以将(i,j)作为左上角,取大小为M*N的子图,计算其与模板的相似度,遍历整个搜索图S,在所有能够取到的子图中,找到与模板图最相似的子图作为最终匹配结果。
第二种,绝对误差和算法(Sum of Absolute Differences,SAD)。SAD算法与MAD算法思想几乎是一致的,只是其相似度测量公式有一点改动,这里不再赘述。
第三种,误差平方和算法(Sum of Squared Differences,SSD),也叫差方和算法。SSD算法与SAD算法如出一辙,只是其相似度测量公式有一点改动,这里不再赘述。
第四种,归一化积相关算法(Normalized Cross Correlation,NCC),与上面算法相似,依然是利用子图与模板图的灰度,通过归一化的相关性度量公式来计算二者之间的匹配程度。
第五种,序贯相似性检测算法(Sequential Similiarity Detection Algorithm,SSDA),它是对传统模板匹配算法的改进,比MAD算法快几十到几百倍。
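For concreteness, here is a minimal sketch of two of the similarity measures listed above, SAD and NCC, applied to grayscale patches (the exhaustive sliding-window search is simplified and all names are illustrative):

```python
import numpy as np

def sad(patch, template):
    """Sum of Absolute Differences: smaller value = better match."""
    return float(np.abs(patch.astype(np.float64) - template.astype(np.float64)).sum())

def ncc(patch, template):
    """Normalized Cross Correlation: value close to 1 = better match."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
template = rng.integers(0, 256, (5, 5))
search = rng.integers(0, 256, (20, 20))
# exhaustive search: position of the sub-image most similar to the template
best = min(((i, j, sad(search[i:i + 5, j:j + 5], template))
            for i in range(16) for j in range(16)), key=lambda x: x[2])
print(best)
```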
102、通过神经网络模型对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
本实施例中,深度信息确定装置将本次(第t次)得到的左目匹配相似度和上一次(第t-1次)生成的左目注意力输入至神经网络模型,该神经网络模型通常是预先训练得到的,由该神经网络模型输出本次(第t次)的左目视差图。
103、通过神经网络模型对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
本实施例中,类似地,深度信息确定装置将本次(第t次)得到的右目匹配相似度和上一次(第t-1次)生成的右目注意力输入至神经网络模型,该神经网络模型通常是预先训练得到的,由该神经网络模型输出本次(第t次)的右目视差图。
可以理解的是,步骤102和步骤103之间的执行顺序可以是,先执行步骤102 再执行步骤103,也可以先执行步骤103再执行步骤102,还可以是同时执行步骤102和步骤103,此处不做限定。
104、根据第t次左目视差图确定第一深度信息,并根据第t次右目视差图确定第二深度信息。
本实施例中,深度信息确定装置根据神经网络模型输出的第t次左目视差图,确定第t次左目视差图的深度信息(即第一深度信息)。类似地,深度信息确定装置根据神经网络模型输出的第t次右目视差图,确定第t次右目视差图的深度信息(即第二深度信息)。
为了便于介绍,请参阅图3,图3为本申请实施例中原图和模型预测深度图的一个对比示意图,如图所示,利用本申请所提供的神经网络模型可预测得到高质量的深度图。本申请能够提高双目物体深度估计的准确率,对配备双目摄像头的机器人和无人车等设施的自动驾驶和工作具有决定性作用,具有潜在的经济效益。
本申请实施例中,提供了一种深度信息确定的方法,获取从左目图像至右目图像的第t次左目匹配相似度,以及从右目图像到左目图像的第t次右目匹配相似度,然后通过神经网络模型对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,并且通过神经网络模型对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图,最后可以根据第t次左目视差图确定第一深度信息,并根据第t次右目视差图确定第二深度信息。通过上述方式,利用神经网络模型以及上一次学习得到的双目注意力图,可以得到双目视差图,并且根据本次得到的双目视差图用于学习出双目注意力图,再指导下一次的双目视差图,这样递归式的学习可以充分利用双目的互补信息,不断修正双目视差图,从而对于双目难匹配的区域而言,能够有效地降低深度信息的误差。
可选地,在上述图2对应的实施例的基础上,本申请实施例提供的深度信息确定的方法第一个可选实施例中,还可以包括:
将第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图;
根据第t次左目映射视差图以及第t次左目视差图,生成第t次左目注意力图;
将第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图;
根据第t次右目映射视差图以及第t次右目视差图,生成第t次右目注意力图。
本实施例中,深度信息确定装置采用映射视差图和视差图生成注意力图,具体地,请参阅图4,图4为本申请实施例中生成双目注意力图的一个示意图,如图所示,在通过神经网络模型生成第t次右目视差图和第t次左目视差图之后,可以将第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图,并且将将第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图。所谓的映射,也就是将两个视差图转换到相反视差图的坐标上。接下来,将原始的第t次左目视差图和转换得到的第t次左目映射视差图连接起来,输入到由几个简单卷积层和变换层组成的模型中,以得到第t次左目注意力图。类似地,将原始的第t次右目视差图和转换得到的第t次右目映射视差图连接起来,输入到由几个简单卷积层和变换层组成的模型中,以得到第t次右目注意力图。
注意力图反映了左右图像对比彼此后,不同区域的视差预测的置信度,低置信度意味着网络该像素的视差预测值不够确信,这些左右目视差对比后自动检测出的低置信度像素区域往往是左右目难匹配区域,如重复性区域、纹理缺失区域和复杂的物体边缘。因此第t次递归学习到的注意力图能作为第t+1次递归视差预测的指导,网络能依此有针对性地修正更新第t次递归自动检测出的低置信度区域像素的视差值,也就是可以将注意力图用作下一步指导模型的聚焦区域。
其次,本申请实施例中,深度信息确定装置将第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图,根据第t次左目映射视差图以及第t次左目视差图,生成第t次左目注意力图,类似地,也可以得到第t次右目注意力图。通过上述方式,本次递归学习到的注意力图能作为第下一次递归视差预测的指导,网络能依此有针对性地修正更新本次递归自动检测出的低置信度区域像素的视差值,从而提升双目注意力图的可靠性。
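Below is a minimal sketch of the mapping step just described: the right-eye disparity map is forward-warped into the left-eye coordinate system, and the left/right disagreement is used as a crude stand-in for the learned attention (in the scheme above the attention map is produced by a few convolution and transformation layers; the pixel correspondence x_left = x_right + d is an assumed convention).

```python
import numpy as np

def map_right_disparity_to_left(disp_right):
    """Forward-warp a right-view disparity map into left-view coordinates,
    assuming a right pixel x_r with disparity d corresponds to left pixel x_r + d."""
    H, W = disp_right.shape
    mapped = np.zeros_like(disp_right)
    for y in range(H):
        for x_r in range(W):
            x_l = x_r + int(round(float(disp_right[y, x_r])))
            if 0 <= x_l < W:
                mapped[y, x_l] = disp_right[y, x_r]
    return mapped

def disagreement(disp_left, disp_right):
    """Large values mark pixels the next recursion should pay attention to."""
    return np.abs(disp_left - map_right_disparity_to_left(disp_right))

disp_l = np.full((4, 8), 2.0)
disp_r = np.full((4, 8), 2.0)
print(disagreement(disp_l, disp_r).max())
```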
可选地,在上述图2对应的第一个实施例的基础上,本申请实施例提供的深度信息确定的方法第二个可选实施例中,根据第t次左目视差图确定第一深度信息,并根据第t次右目视差图确定第二深度信息之后,还可以包括:
获取从左目图像至右目图像的第t+1次左目匹配相似度,以及从右目图像到左目图像的第t+1次右目匹配相似度;
通过神经网络模型对第t+1次左目匹配相似度以及第t次左目注意力图进行处理,得到第t+1次左目视差图;
通过神经网络模型对第t+1次右目匹配相似度以及第t次右目注意力图进行处理,得到第t+1次右目视差图;
根据第t+1次左目视差图确定第三深度信息,并根据第t+1次右目视差图确定第四深度信息。
本实施例中,将介绍预测下一次深度信息的方式。请参阅图5,图5为本申请实施例中递归双目视差网络的一个示意图,如图所示,该递归双目视差网络又可以称为左右循环比较(Left-Right Comparative Recurrent,LRCR)模型,LRCR模型包含两个并行的神经网络模型。左侧神经网络模型采用X' t生成第t次左目视差图,其中,X' t表示第t次左目匹配相似度和第t-1次左目注意力图的连接结果。类似地,右侧神经网络模型采用X” t生成第t次右目视差图,其中,X” t表示第t次右目匹配相似度和第t-1次右目注意力图的连接结果。接下来,采用第t次左目视差图和第t次右目视差图可以预测出第t次左目注意力图和第t次右目注意力图。
于是可进行下一次循环,即将左侧神经网络模型采用X' t+1生成第t+1次左目视差图,其中,X' t+1表示第t+1次左目匹配相似度和第t次左目注意力图的连接结果。类似地,右侧神经网络模型采用X” t+1生成第t+1次右目视差图,其中,X” t+1表示第t+1次右目匹配相似度和第t次右目注意力图的连接结果。接下来,采用第t+1次左目视差图和第t+1次右目视差图可以预测出第t+1次左目注意力图和第t+1次右目注意力图。以此类推,此处不作赘述。
再次,本申请实施例中,深度信息确定装置在得到本次的双目深度信息之后,还可以继续得到下一次的双目深度信息。通过上述方式,为了进行左右双目的比较,可在神经网络模型中添加卷积层和汇聚层,从而生成双目注意力图,将双目注意力图作为下一步的输入,启动LRCR模型,在下一步可更多地关注左右失配区域,由此提升预测的准确度。
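Schematically, the recursion just described can be written as the loop below (every callable name here is a placeholder, not the actual model interface):

```python
def run_lrcr(cost_left, cost_right, steps, predict_left, predict_right, make_attention):
    """Left-Right Comparative Recurrent loop: each step feeds the matching
    similarities plus the previous attention maps to the two branches, then
    compares the new disparity maps to produce the next attention maps."""
    att_left = att_right = None
    disp_left = disp_right = None
    for _ in range(steps):
        disp_left = predict_left(cost_left, att_left)
        disp_right = predict_right(cost_right, att_right)
        att_left, att_right = make_attention(disp_left, disp_right)
    return disp_left, disp_right

# toy usage with stand-in callables
identity = lambda cost, att: cost
compare = lambda dl, dr: (abs(dl - dr), abs(dr - dl))
print(run_lrcr(3.0, 2.0, 2, identity, identity, compare))  # (3.0, 2.0)
```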
可选地,在上述图2、图2对应的第一个或第二个实施例的基础上,本申请 实施例提供的深度信息确定的方法第三个可选实施例中,通过神经网络模型对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,可以包括:
根据第t次左目匹配相似度以及第t-1次左目注意力图,利用ConvLSTM计算得到第t次左目隐变量;
根据第t次左目隐变量获取第t次左目视差代价;
根据第t次左目视差代价计算第t次左目视差预测值,其中,第t次左目视差预测值用于生成第t次左目视差图;
通过神经网络模型对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图,包括:
根据第t次右目匹配相似度以及第t-1次右目注意力图,利用ConvLSTM计算得到第t次右目隐变量;
根据第t次右目隐变量获取第t次右目视差代价;
根据第t次右目视差代价计算第t次右目视差预测值,其中,第t次右目视差预测值用于生成第t次右目视差图。
本实施例中,在得到第t次左目视差图的过程中,首先需要将第t次左目匹配相似度以及第t-1次左目注意力图输入至ConvLSTM,由此计算得到第t次左目隐变量。然后根据第t次左目隐变量获取第t次左目视差代价,最后,根据第t次左目视差代价计算第t次左目视差预测值,得到第t次左目视差预测值也就意味着可以生成第t次左目视差图。类似地,生成第t次右目视差图的方式与生成第t次左目视差图的方式类似,此处不作赘述。
为了便于理解,请参阅图6,图6为本申请实施例中卷积长短记忆网络的一个示意图,如图所示,每一条黑线传输一整个向量,从一个节点的输出到其他节点的输入。圆圈代表逐点操作,诸如向量的和,而矩阵就是学习到的神经网络层。合在一起的线表示向量的连接,分开的线表示内容被复制,然后分发到不同的位置。若只有上面的那条水平线是没办法实现添加或者删除信息的,而是通过一种叫做门(gates)的结构来实现的,gates可以实现选择性地让信息通过,主要是通过一个sigmoid的神经层和一个逐点相乘的操作来实现的。sigmoid的神经层输出(是一个向量)的每个元素都是一个在0和1之间的实数, 表示让对应信息通过的权重(或者占比)。比如,0表示“不让任何信息通过”,1表示“让所有信息通过”。tanh层表示重复的结构模块。
ConvLSTM通过图6所示的结构来实现信息的保护和控制。这三个门分别输入门、遗忘门和输出门。
进一步地,本申请实施例中,采用ConvLSTM对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,并且采用ConvLSTM对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图。通过上述方式,基于预测出的双目匹配相似度,利用ConvLSTM对双目视差图进行递归预测,这种ConvLSTM能不仅具有常规递归神经网络的强大序列建模和信息处理能力,还能有效提取每个像素空间邻域内的信息,达到空间上下文信息整合的目的。
可选地,在上述图2对应的第三个实施例的基础上,本申请实施例提供的深度信息确定的方法第四个可选实施例中,根据第t次左目匹配相似度以及第t-1次左目注意力图,利用ConvLSTM计算得到第t次左目隐变量,可以包括:
采用如下方式计算第t次左目隐变量:
$$i'_t = \sigma\left(W_{xi} \circ X'_t + W_{hi} \circ H'_{t-1} + W_{ci} * C'_{t-1} + b_i\right)$$
$$f'_t = \sigma\left(W_{xf} \circ X'_t + W_{hf} \circ H'_{t-1} + W_{cf} * C'_{t-1} + b_f\right)$$
$$C'_t = f'_t * C'_{t-1} + i'_t * \tanh\left(W_{xc} \circ X'_t + W_{hc} \circ H'_{t-1} + b_c\right)$$
$$o'_t = \sigma\left(W_{xo} \circ X'_t + W_{ho} \circ H'_{t-1} + W_{co} * C'_t + b_o\right)$$
$$H'_t = o'_t * \tanh\left(C'_t\right)$$
其中,i' t表示第t次左目递归的网络输入门,*表示向量相乘,°表示卷积操作,σ表示sigmoid函数,W xi、W hi、W ci以及b i表示网络输入门的模型参数,X' t表示第t次左目匹配相似度以及第t-1次左目注意力图,f' t表示第t次左目递归的遗忘门,W xf、W hf、W cf以及b f表示遗忘门的模型参数,o' t表示第t次左目递归的输出门,W xo、W ho、W co以及b o表示输出门的模型参数,C' t表示第t次左目递归的记忆单元,C' t-1表示第t-1次左目递归的记忆单元,tanh表示双曲正切,H' t-1表示第t-1次左目隐变量,H' t表示第t次左目隐变量;
根据第t次右目匹配相似度以及第t-1次右目注意力图,利用ConvLSTM计算得到第t次右目隐变量,可以包括:
采用如下方式计算第t次右目隐变量:
$$i''_t = \sigma\left(W_{xi} \circ X''_t + W_{hi} \circ H''_{t-1} + W_{ci} * C''_{t-1} + b_i\right)$$
$$f''_t = \sigma\left(W_{xf} \circ X''_t + W_{hf} \circ H''_{t-1} + W_{cf} * C''_{t-1} + b_f\right)$$
$$C''_t = f''_t * C''_{t-1} + i''_t * \tanh\left(W_{xc} \circ X''_t + W_{hc} \circ H''_{t-1} + b_c\right)$$
$$o''_t = \sigma\left(W_{xo} \circ X''_t + W_{ho} \circ H''_{t-1} + W_{co} * C''_t + b_o\right)$$
$$H''_t = o''_t * \tanh\left(C''_t\right)$$
其中,i” t表示第t次右目递归的网络输入门,X” t表示第t次右目匹配相似度以及第t-1次右目注意力图,f” t表示第t次右目递归的遗忘门,o' t表示第t次右目递归的输出门,C” t表示第t次右目递归的记忆单元,C” t-1表示第t-1次右目递归的记忆单元,H” t-1表示第t-1次右目隐变量,H” t表示第t次右目隐变量。
本实施例中,结合公式对双目隐变量的计算进行具体说明,ConvLSTM通过输入门、遗忘门和输出门实现信息的获取。
在ConvLSTM中的第一步是决定丢弃什么信息。这个决定通过一个称为遗忘门完成。该门会读取H' t-1(或H” t-1)和X' t(或X” t),输出一个在0到1之间的数值给每个在细胞状态C' t-1(C” t-1或)中的数字。1表示“完全保留”,0表示“完全舍弃”。其中H' t-1(或H” t-1)表示的是上一个细胞的输出,X' t(或X” t)表示的是当前细胞的输入,σ表示sigmod函数。
下一步是决定让多少新的信息加入到细胞状态中来。实现这个需要包括两个步骤,首先,一个叫做“输入门层”的sigmoid层决定哪些信息需要更新,一个tanh层生成一个向量,也就是备选的用来更新的内容,在下一步,我们把这两部分联合起来,对细胞状态进行一个更新,C' t-1(或C” t-1)更新为C' t(或C” t)。把旧状态与f' t(或f” t)相乘,丢弃掉我们确定需要丢弃的信息。
最终,我们需要确定输出什么值。这个输出将会基于我们的细胞状态,但是也是一个过滤后的版本。首先,我们运行一个sigmoid层来确定细胞状态的哪个部分将输出出去。接着,我们把细胞状态通过tanh进行处理(得到一个在-1到1之间的值)并将它和sigmoid门的输出相乘,最终我们仅仅会输出我们确定输出的那部分。
更进一步地,本申请实施例中,介绍了一种计算第t次左目隐变量和第t 次右目隐变量的具体方式,采用ConvLSTM所提供的计算关系,能够得到双目的隐变量。通过上述方式,能够有效地提升隐变量计算的可靠性,并且为方案的实现提供了可操作的依据。
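For reference, a compact ConvLSTM cell with the gate structure discussed above can be sketched in PyTorch as follows (simplified: the peephole terms W_ci, W_cf, W_co are dropped, and the channel sizes are illustrative rather than those of the patent):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: input, forget and output gates plus a memory cell,
    all computed with one shared convolution over [input, previous hidden state]."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# toy usage: 8 input channels (e.g. matching similarity + attention), 16x16 spatial size
cell = ConvLSTMCell(8, 32)
x = torch.randn(1, 8, 16, 16)
h = c = torch.zeros(1, 32, 16, 16)
h, c = cell(x, (h, c))
print(h.shape)  # torch.Size([1, 32, 16, 16])
```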
可选地,在上述图2对应的第三个实施例的基础上,本申请实施例提供的深度信息确定的方法第五个可选实施例中,根据第t次左目隐变量获取第t次左目视差代价,可以包括:
通过至少两层全连接层对第t次左目隐变量进行处理,得到第t次左目视差代价;
根据第t次右目隐变量获取第t次右目视差代价,可以包括:
通过至少两层全连接层对第t次右目隐变量进行处理,得到第t次右目视差代价。
本实施例中,可以将第t次左目隐变量输入至至少两层全连接层,由该至少两层全连接层输出第t次左目视差代价。类似地,将第t次右目隐变量输入至至少两层全连接层,由该至少两层全连接层输出第t次右目视差代价。
具体地,全连接层的每一个结点都与上一层的所有结点相连,用来把前边提取到的特征综合起来。由于其全相连的特性,一般全连接层的参数也是最多的。全连接层的参数的确很多。在前向计算过程,也就是一个线性的加权求和的过程,全连接层的每一个输出都可以看成前一层的每一个结点乘以一个权重系数W,最后加上一个偏置值b得到。假设输入有50×4×4个神经元结点,输出有500个结点,则一共需要50×4×4×500=400000个权值参数W和500个偏置参数b。
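A quick check of the parameter count quoted above (a fully connected layer from 50×4×4 input nodes to 500 output nodes):

```python
# weights = in_nodes * out_nodes, plus one bias per output node
in_nodes = 50 * 4 * 4
out_nodes = 500
print(in_nodes * out_nodes, out_nodes)  # 400000 500
```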
连接层实际就是卷积核大小为上层特征大小的卷积运算,卷积后的结果为一个节点,就对应全连接层的一个点。假设最后一个卷积层的输出为7×7×512,连接此卷积层的全连接层为1×1×4096。连接层实际就是卷积核大小为上层特征大小的卷积运算,卷积后的结果为一个节点,就对应全连接层的一个点。如果将这个全连接层转化为卷积层,则共有4096组滤波器,每组滤波器含有512个卷积核,每个卷积核的大小为7×7则输出为1×1×4096。若后面再连接一个1×1×4096全连接层。则其对应的转换后的卷积层的参数为,共有4096组滤波器,每组滤波器含有4096个卷积核,每个卷积核的大小为1×1, 输出为1×1×4096,相当于就是将特征组合起来进行4096个分类分数的计算,得分最高的就是划到的正确的类别。
更进一步地,本申请实施例中,获取双目视差代价的方法可以是,将双目隐变量输入至至少两层全连接层,由两层全连接层输出双目视差代价。通过上述方式,可以利用全连接层得到双目视差代价,从而提升方案的可行性和可操作性。
可选地,在上述图2对应的第三个实施例的基础上,本申请实施例提供的深度信息确定的方法第六个可选实施例中,根据第t次左目视差代价计算第t次左目视差预测值,可以包括:
采用如下方式计算第t次左目视差预测值:
$$d'^{*} = \sum_{d'=0}^{D_{\max}-1} d' \times \sigma\left(-c'_{d'}\right)$$
其中,d'*表示第t次左目视差预测值,D max表示不同视差图的数量最大值,d'表示第t次左目视差值,σ表示sigmoid函数。c' d表示第t次左目视差代价;
根据第t次右目视差代价计算第t次右目视差预测值,包括:
采用如下方式计算第t次右目视差预测值:
$$d''^{*} = \sum_{d''=0}^{D_{\max}-1} d'' \times \sigma\left(-c''_{d''}\right)$$
d”*表示第t次右目视差预测值,c” d表示第t次右目视差代价,d”表示第t次右目视差值,c” d表示第t次右目视差代价。
本实施例中,通过卷积层来获得大小为H*W*D max的双目视差代价。取双目视差代价的张量形式,将softmax标准化应用于张量,使得概率张量反映所有像素的每个可用差异的概率。最后,可使用微分argmin层来将所有通过其概率加权的差异来生成视差预测值。在数学上,如上公式描述了如何在给定每个可用视差c' d(或c” d)代价的情况下通过特定像素的代价张量获得双目视差预测值d'*(或d”*)。
更进一步地,本申请实施例中,提供了一种计算双目视差预测值的具体方式,即利用不同视差图的数量最大值和左目视差值,就能够计算出双目视差预测值。通过上述方式,为方案的实现提供了具体的依据,从而提升方案的实用性和可操作性。
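The probability-weighted readout described above can be sketched as follows (NumPy; the cost tensor is assumed to have shape H×W×D_max with lower cost meaning a better match):

```python
import numpy as np

def soft_argmin_disparity(cost):
    """Differentiable disparity readout: softmax over -cost per pixel, then a
    probability-weighted sum of the candidate disparity values."""
    s = -cost
    e = np.exp(s - s.max(axis=-1, keepdims=True))      # numerically stable softmax
    prob = e / e.sum(axis=-1, keepdims=True)
    disparities = np.arange(cost.shape[-1], dtype=np.float64)
    return (prob * disparities).sum(axis=-1)

cost = np.random.rand(4, 8, 16)
print(soft_argmin_disparity(cost).shape)  # (4, 8)
```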
可选地,在上述图2对应的第四个至第六个实施例中任一项的基础上,本申请实施例提供的深度信息确定的方法第七个可选实施例中,根据第t次左目视差图确定第一深度信息,可以包括:
采用如下方式计算第一深度信息:
$$Z' = \frac{B \times f}{d'^{*}}$$
其中,Z'表示第一深度信息,d'*表示第t次左目视差预测值,B表示双目摄像头间距,f表示焦距;
根据第t次右目视差图确定第二深度信息,可以包括:
采用如下方式计算所述第二深度信息:
$$Z'' = \frac{B \times f}{d''^{*}}$$
其中,Z”表示第二深度信息,d”*表示第t次右目视差预测值。
本实施例中,在得到双目视差图之后,可利用双目视差图分别计算出双目的深度信息。以计算左视图的第一深度信息为例,需要获取双目摄像头间距和焦距,然后将双目摄像头间距和焦距的乘积结果,除以计算得到的左目视差预测值,即可得到左视图的第一深度信息。
下面将介绍上述公式的推导方式,假设两个相机的内部参数一致,如焦距和镜头,为了数学描述的方便需引入坐标,由于坐标是人为引入的,因此客观世界中的事物可以处于不同的坐标系中。假设两个相机的X轴方向一致,像平面重叠,坐标系以左相机为准,右相机相对于左相机是简单的平移,用坐标表示为(T x,0,0)。T x一般称为基线,根据三角形相似关系,很容易得出空间中的一点P(X,Y,Z)分别在左右像平面上的投影坐标。因此,能够得到视差的计算方式为:
$$x_{left} = \frac{f\,X}{Z}, \qquad x_{right} = \frac{f\,(X - T_x)}{Z}$$
$$d = x_{left} - x_{right} = \frac{f\,T_x}{Z}$$
从而推导得出:
$$Z = \frac{f\,T_x}{d}$$
显然,深度信息和视差成反比,这与我们用手指做试验是相吻合的,这也 是为什么近的物体看起来比远的物体移动得快。
再进一步地,本申请实施例中,介绍了计算深度信息的方式,利用预测得到的视差预测值、双目摄像头间距和焦距就能预测出双目的深度信息。通过上述方式,可同时计算得到左目深度信息和右目深度信息,根据实际需求,选择所需的深度信息,从而提升方案的实用性和可行性。
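Putting the relation Z = B·f/d into code (the baseline and focal-length numbers below are purely illustrative):

```python
def disparity_to_depth(disparity, baseline, focal_length):
    """Z = B * f / d; the disparity must be positive."""
    return baseline * focal_length / disparity

# example: baseline 0.54 m, focal length 721 px, predicted disparity 30 px -> about 13 m
print(disparity_to_depth(30.0, 0.54, 721.0))
```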
下面对本申请中的深度信息确定装置进行详细描述,请参阅图7,图7为本申请实施例中一种配置双目摄像头的深度信息确定装置一个实施例示意图,深度信息确定装置20包括:
获取模块201,用于获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
处理模块202,用于通过神经网络模型对所述获取模块201获取的所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
所述处理模块202,还用于通过所述神经网络模型对所述获取模块201获取的所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
确定模块203,用于根据所述处理模块202处理得到的所述第t次左目视差图确定第一深度信息,并根据所述处理模块处理得到的所述第t次右目视差图确定第二深度信息。
本实施例中,获取模块201获取从左目图像至右目图像的第t次左目匹配相似度,以及从右目图像到左目图像的第t次右目匹配相似度,其中,t为大于1的整数,处理模块202通过神经网络模型对获取模块201获取的第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,处理模块202通过神经网络模型对获取模块201获取的第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图,确定模块203根据处理模块202处理得到的第t次左目视差图确定第一深度信息,并根据处理模块处理得到的第t次右目视差图确定第二深度信息。
本申请实施例中,提供了一种深度信息确定装置,可以利用神经网络模型以及上一次学习得到的双目注意力图,得到双目视差图,并且根据本次得到的 双目视差图用于学习出双目注意力图,再指导下一次的双目视差图,这样递归式的学习可以充分利用双目的互补信息,不断修正双目视差图,从而对于双目难匹配的区域而言,能够有效地降低深度信息的误差。
可选地,在上述图7所对应的实施例的基础上,请参阅图8,本申请实施例提供的深度信息确定装置20的另一实施例中,深度信息确定装置20还包括映射模块204和生成模块205;
所述映射模块204,用于将第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图;
所述生成模块205,用于根据所述映射模块204映射得到的第t次左目映射视差图以及第t次左目视差图,生成第t次左目注意力图;
所述映射模块204,还用于将第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图;
所述生成模块205,还用于根据所述映射模块204映射得到的第t次右目映射视差图以及第t次右目视差图,生成第t次右目注意力图。
其次,本申请实施例中,深度信息确定装置将第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图,根据第t次左目映射视差图以及第t次左目视差图,生成第t次左目注意力图,类似地,也可以得到第t次右目注意力图。通过上述方式,本次递归学习到的注意力图能作为第下一次递归视差预测的指导,网络能依此有针对性地修正更新本次递归自动检测出的低置信度区域像素的视差值,从而提升双目注意力图的可靠性。
可选地,在上述图8所对应的实施例的基础上,本申请实施例提供的深度信息确定装置20的另一实施例中,
所述获取模块201,还用于所述确定模块203根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息之后,获取从左目图像至右目图像的第t+1次左目匹配相似度,以及从所述右目图像到所述左目图像的第t+1次右目匹配相似度;
所述处理模块202,还用于通过所述神经网络模型对所述第t+1次左目匹配相似度以及第t次左目注意力图进行处理,得到第t+1次左目视差图;
所述处理模块202,还用于通过所述神经网络模型对所述第t+1次右目匹配 相似度以及第t次右目注意力图进行处理,得到第t+1次右目视差图;
所述确定模块203,还用于根据所述处理模块202处理得到的所述第t+1次左目视差图确定第三深度信息,并根据所述处理模块202处理得到的所述第t+1次右目视差图确定第四深度信息。
再次,本申请实施例中,深度信息确定装置在得到本次的双目深度信息之后,还可以继续得到下一次的双目深度信息。通过上述方式,为了进行左右双目的比较,可在神经网络模型中添加卷积层和汇聚层,从而生成双目注意力图,将双目注意力图作为下一步的输入,启动LRCR模型,在下一步可更多地关注左右失配区域,由此提升预测的准确度。
可选地,在上述图7或图8所对应的实施例的基础上,本申请实施例提供的深度信息确定装置20的另一实施例中:
所述处理模块202,具体用于根据所述第t次左目匹配相似度以及所述第t-1次左目注意力图,利用ConvLSTM计算得到第t次左目隐变量;
根据所述第t次左目隐变量获取第t次左目视差代价;
根据所述第t次左目视差代价计算第t次左目视差预测值,其中,所述第t次左目视差预测值用于生成所述第t次左目视差图;
所述处理模块202,具体用于根据所述第t次右目匹配相似度以及所述第t-1次右目注意力图,利用所述ConvLSTM计算得到第t次右目隐变量;
根据所述第t次右目隐变量获取第t次右目视差代价;
根据所述第t次右目视差代价计算第t次右目视差预测值,其中,所述第t次右目视差预测值用于生成所述第t次右目视差图。
进一步地,本申请实施例中,采用ConvLSTM对第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,并且采用ConvLSTM对第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图。通过上述方式,基于预测出的双目匹配相似度,利用ConvLSTM对双目视差图进行递归预测,这种ConvLSTM能不仅具有常规递归神经网络的强大序列建模和信息处理能力,还能有效提取每个像素空间邻域内的信息,达到空间上下文信息整合的目的。
可选地,在上述图7或图8所对应的实施例的基础上,本申请实施例提供的 深度信息确定装置20的另一实施例中:
所述处理模块202,具体用于采用如下方式计算所述第t次左目隐变量:
$$i'_t = \sigma\left(W_{xi} \circ X'_t + W_{hi} \circ H'_{t-1} + W_{ci} * C'_{t-1} + b_i\right)$$
$$f'_t = \sigma\left(W_{xf} \circ X'_t + W_{hf} \circ H'_{t-1} + W_{cf} * C'_{t-1} + b_f\right)$$
$$C'_t = f'_t * C'_{t-1} + i'_t * \tanh\left(W_{xc} \circ X'_t + W_{hc} \circ H'_{t-1} + b_c\right)$$
$$o'_t = \sigma\left(W_{xo} \circ X'_t + W_{ho} \circ H'_{t-1} + W_{co} * C'_t + b_o\right)$$
$$H'_t = o'_t * \tanh\left(C'_t\right)$$
其中,所述i' t表示第t次左目递归的网络输入门,所述*表示向量相乘,所述°表示卷积操作,所述σ表示sigmoid函数,所述W xi、所述W hi、所述W ci以及所述b i表示所述网络输入门的模型参数,所述X' t表示所述第t次左目匹配相似度以及所述第t-1次左目注意力图,所述f' t表示第t次左目递归的遗忘门,所述W xf、所述W hf、所述W cf以及所述b f表示所述遗忘门的模型参数,所述o' t表示第t次左目递归的输出门,所述W xo、所述W ho、所述W co以及所述b o表示所述输出门的模型参数,所述C' t表示第t次左目递归的记忆单元,所述C' t-1表示第t-1次左目递归的记忆单元,所述tanh表示双曲正切,所述H' t-1表示第t-1次左目隐变量,所述H' t表示所述第t次左目隐变量;
所述处理模块202,具体用于采用如下方式计算所述第t次右目隐变量:
$$i''_t = \sigma\left(W_{xi} \circ X''_t + W_{hi} \circ H''_{t-1} + W_{ci} * C''_{t-1} + b_i\right)$$
$$f''_t = \sigma\left(W_{xf} \circ X''_t + W_{hf} \circ H''_{t-1} + W_{cf} * C''_{t-1} + b_f\right)$$
$$C''_t = f''_t * C''_{t-1} + i''_t * \tanh\left(W_{xc} \circ X''_t + W_{hc} \circ H''_{t-1} + b_c\right)$$
$$o''_t = \sigma\left(W_{xo} \circ X''_t + W_{ho} \circ H''_{t-1} + W_{co} * C''_t + b_o\right)$$
$$H''_t = o''_t * \tanh\left(C''_t\right)$$
其中,所述i” t表示第t次右目递归的网络输入门,所述X” t表示所述第t次右目匹配相似度以及所述第t-1次右目注意力图,所述f” t表示第t次右目递归的遗忘门,所述o' t表示第t次右目递归的输出门,所述C” t表示第t次右目递归的记忆单元,所述C” t-1表示第t-1次右目递归的记忆单元,所述H” t-1表示第t-1次右目隐变量,所述H” t表示所述第t次右目隐变量。
更进一步地,本申请实施例中,介绍了一种计算第t次左目隐变量和第t次右目隐变量的具体方式,采用ConvLSTM所提供的计算关系,能够得到双目的 隐变量。通过上述方式,能够有效地提升隐变量计算的可靠性,并且为方案的实现提供了可操作的依据。
可选地,在上述图7或图8所对应的实施例的基础上,本申请实施例提供的深度信息确定装置20的另一实施例中,
所述处理模块202,具体用于通过至少两层全连接层对所述第t次左目隐变量进行处理,得到所述第t次左目视差代价;
所述处理模块202,具体用于通过所述至少两层全连接层对所述第t次右目隐变量进行处理,得到所述第t次右目视差代价。
更进一步地,本申请实施例中,获取双目视差代价的方法可以是,将双目隐变量输入至至少两层全连接层,由两层全连接层输出双目视差代价。通过上述方式,可以利用全连接层得到双目视差代价,从而提升方案的可行性和可操作性。
可选地,在上述图7或图8所对应的实施例的基础上,本申请实施例提供的深度信息确定装置20的另一实施例中,
所述处理模块202,具体用于采用如下方式计算所述第t次左目视差预测值:
$$d'^{*} = \sum_{d'=0}^{D_{\max}-1} d' \times \sigma\left(-c'_{d'}\right)$$
其中,所述d'*表示所述第t次左目视差预测值,所述D max表示不同视差图的数量最大值,所述d'表示第t次左目视差值,所述σ表示sigmoid函数。所述c' d表示第t次左目视差代价;
所述处理模块202,具体用于采用如下方式计算所述第t次右目视差预测值:
$$d''^{*} = \sum_{d''=0}^{D_{\max}-1} d'' \times \sigma\left(-c''_{d''}\right)$$
所述d”*表示所述第t次右目视差预测值,所述c” d表示所述第t次右目视差代价,所述d”表示第t次右目视差值,所述c” d表示第t次右目视差代价。
更进一步地,本申请实施例中,提供了一种计算双目视差预测值的具体方式,即利用不同视差图的数量最大值和左目视差值,就能够计算出双目视差预测值。通过上述方式,为方案的实现提供了具体的依据,从而提升方案的实用性和可操作性。
可选地,在上述图7或图8所对应的实施例的基础上,本申请实施例提供的深度信息确定装置20的另一实施例中,
所述确定模块203,具体用于采用如下方式计算所述第一深度信息:
$$Z' = \frac{B \times f}{d'^{*}}$$
其中,所述Z'表示所述第一深度信息,所述d'*表示所述第t次左目视差预测值,所述B表示双目摄像头间距,所述f表示焦距;
所述根据所述第t次右目视差图确定第二深度信息,包括:
所述确定模块203,具体用于采用如下方式计算所述第二深度信息:
$$Z'' = \frac{B \times f}{d''^{*}}$$
其中,所述Z”表示所述第二深度信息,所述d”*表示所述第t次右目视差预测值。
再进一步地,本申请实施例中,介绍了计算深度信息的方式,利用预测得到的视差预测值、双目摄像头间距和焦距就能预测出双目的深度信息。通过上述方式,可同时计算得到左目深度信息和右目深度信息,根据实际需求,选择所需的深度信息,从而提升方案的实用性和可行性。
图9是本申请实施例提供的一种深度信息确定装置结构示意图,该深度信息确定装置300可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)322(例如,一个或一个以上处理器)和存储器332,一个或一个以上存储应用程序342或数据344的存储介质330(例如一个或一个以上海量存储设备)。其中,存储器332和存储介质330可以是短暂存储或持久存储。存储在存储介质330的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对深度信息确定装置中的一系列指令操作。更进一步地,中央处理器322可以设置为与存储介质330通信,在深度信息确定装置300上执行存储介质330中的一系列指令操作。
深度信息确定装置300还可以包括一个或一个以上电源326,一个或一个以上有线或无线网络接口350,一个或一个以上输入输出接口358,和/或,一个或一个以上操作系统341,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由深度信息确定装置所执行的步骤可以基于该图9所示的深度信息确定装置结构。
CPU 322用于执行如下步骤:
获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息。
可选地,CPU 322还用于执行如下步骤:
将所述第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图;
根据所述第t次左目映射视差图以及所述第t次左目视差图,生成第t次左目注意力图;
将所述第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图;
根据所述第t次右目映射视差图以及所述第t次右目视差图,生成第t次右目注意力图。
可选地,CPU 322还用于执行如下步骤:
获取从左目图像至右目图像的第t+1次左目匹配相似度,以及从所述右目图像到所述左目图像的第t+1次右目匹配相似度;
通过所述神经网络模型对所述第t+1次左目匹配相似度以及第t次左目注意力图进行处理,得到第t+1次左目视差图;
通过所述神经网络模型对所述第t+1次右目匹配相似度以及第t次右目注意力图进行处理,得到第t+1次右目视差图;
根据所述第t+1次左目视差图确定第三深度信息,并根据所述第t+1次右目视差图确定第四深度信息。
可选地,CPU 322具体用于执行如下步骤:
根据所述第t次左目匹配相似度以及所述第t-1次左目注意力图,利用 ConvLSTM计算得到第t次左目隐变量;
根据所述第t次左目隐变量获取第t次左目视差代价;
根据所述第t次左目视差代价计算第t次左目视差预测值,其中,所述第t次左目视差预测值用于生成所述第t次左目视差图;
根据所述第t次右目匹配相似度以及所述第t-1次右目注意力图,利用所述ConvLSTM计算得到第t次右目隐变量;
根据所述第t次右目隐变量获取第t次右目视差代价;
根据所述第t次右目视差代价计算第t次右目视差预测值,其中,所述第t次右目视差预测值用于生成所述第t次右目视差图。
可选地,CPU 322具体用于执行如下步骤:
采用如下方式计算所述第t次左目隐变量:
$$i'_t = \sigma\left(W_{xi} \circ X'_t + W_{hi} \circ H'_{t-1} + W_{ci} * C'_{t-1} + b_i\right)$$
$$f'_t = \sigma\left(W_{xf} \circ X'_t + W_{hf} \circ H'_{t-1} + W_{cf} * C'_{t-1} + b_f\right)$$
$$C'_t = f'_t * C'_{t-1} + i'_t * \tanh\left(W_{xc} \circ X'_t + W_{hc} \circ H'_{t-1} + b_c\right)$$
$$o'_t = \sigma\left(W_{xo} \circ X'_t + W_{ho} \circ H'_{t-1} + W_{co} * C'_t + b_o\right)$$
$$H'_t = o'_t * \tanh\left(C'_t\right)$$
其中,所述i' t表示第t次左目递归的网络输入门,所述*表示向量相乘,所述°表示卷积操作,所述σ表示sigmoid函数,所述W xi、所述W hi、所述W ci以及所述b i表示所述网络输入门的模型参数,所述X' t表示所述第t次左目匹配相似度以及所述第t-1次左目注意力图,所述f' t表示第t次左目递归的遗忘门,所述W xf、所述W hf、所述W cf以及所述b f表示所述遗忘门的模型参数,所述o' t表示第t次左目递归的输出门,所述W xo、所述W ho、所述W co以及所述b o表示所述输出门的模型参数,所述C' t表示第t次左目递归的记忆单元,所述C' t-1表示第t-1次左目递归的记忆单元,所述tanh表示双曲正切,所述H' t-1表示第t-1次左目隐变量,所述H' t表示所述第t次左目隐变量;
采用如下方式计算所述第t次右目隐变量:
$$i''_t = \sigma\left(W_{xi} \circ X''_t + W_{hi} \circ H''_{t-1} + W_{ci} * C''_{t-1} + b_i\right)$$
$$f''_t = \sigma\left(W_{xf} \circ X''_t + W_{hf} \circ H''_{t-1} + W_{cf} * C''_{t-1} + b_f\right)$$
$$C''_t = f''_t * C''_{t-1} + i''_t * \tanh\left(W_{xc} \circ X''_t + W_{hc} \circ H''_{t-1} + b_c\right)$$
$$o''_t = \sigma\left(W_{xo} \circ X''_t + W_{ho} \circ H''_{t-1} + W_{co} * C''_t + b_o\right)$$
$$H''_t = o''_t * \tanh\left(C''_t\right)$$
其中,所述i” t表示第t次右目递归的网络输入门,所述X” t表示所述第t次右目匹配相似度以及所述第t-1次右目注意力图,所述f” t表示第t次右目递归的遗忘门,所述o' t表示第t次右目递归的输出门,所述C” t表示第t次右目递归的记忆单元,所述C” t-1表示第t-1次右目递归的记忆单元,所述H” t-1表示第t-1次右目隐变量,所述H” t表示所述第t次右目隐变量。
可选地,CPU 322具体用于执行如下步骤:
通过至少两层全连接层对所述第t次左目隐变量进行处理,得到所述第t次左目视差代价;
通过所述至少两层全连接层对所述第t次右目隐变量进行处理,得到所述第t次右目视差代价。
可选地,CPU 322具体用于执行如下步骤:
采用如下方式计算所述第t次左目视差预测值:
$$d'^{*} = \sum_{d'=0}^{D_{\max}-1} d' \times \sigma\left(-c'_{d'}\right)$$
其中,所述d'*表示所述第t次左目视差预测值,所述D max表示不同视差图的数量最大值,所述d'表示第t次左目视差值,所述σ表示sigmoid函数。所述c' d表示第t次左目视差代价;
采用如下方式计算所述第t次右目视差预测值:
$$d''^{*} = \sum_{d''=0}^{D_{\max}-1} d'' \times \sigma\left(-c''_{d''}\right)$$
所述d”*表示所述第t次右目视差预测值,所述c” d表示所述第t次右目视差代价,所述d”表示第t次右目视差值,所述c” d表示第t次右目视差代价。
可选地,CPU 322具体用于执行如下步骤:
采用如下方式计算所述第一深度信息:
$$Z' = \frac{B \times f}{d'^{*}}$$
其中,所述Z'表示所述第一深度信息,所述d'*表示所述第t次左目视差预测值,所述B表示双目摄像头间距,所述f表示焦距;
采用如下方式计算所述第二深度信息:
$$Z'' = \frac{B \times f}{d''^{*}}$$
其中,所述Z”表示所述第二深度信息,所述d”*表示所述第t次右目视差预测值。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程 序代码的介质。
以上,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (13)

  1. 一种深度信息确定的方法,其特征在于,应用于配备双目摄像头的设施,所述方法包括:
    获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
    通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
    通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
    根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    将所述第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图;
    根据所述第t次左目映射视差图以及所述第t次左目视差图,生成第t次左目注意力图;
    将所述第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图;
    根据所述第t次右目映射视差图以及所述第t次右目视差图,生成第t次右目注意力图。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息之后,所述方法还包括:
    获取从左目图像至右目图像的第t+1次左目匹配相似度,以及从所述右目图像到所述左目图像的第t+1次右目匹配相似度;
    通过所述神经网络模型对所述第t+1次左目匹配相似度以及第t次左目注意力图进行处理,得到第t+1次左目视差图;
    通过所述神经网络模型对所述第t+1次右目匹配相似度以及第t次右目注意力图进行处理,得到第t+1次右目视差图;
    根据所述第t+1次左目视差图确定第三深度信息,并根据所述第t+1次右目视差图确定第四深度信息。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图,包括:
    根据所述第t次左目匹配相似度以及所述第t-1次左目注意力图,利用卷积长短记忆网络ConvLSTM计算得到第t次左目隐变量;
    根据所述第t次左目隐变量获取第t次左目视差代价;
    根据所述第t次左目视差代价计算第t次左目视差预测值,其中,所述第t次左目视差预测值用于生成所述第t次左目视差图;
    所述通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图,包括:
    根据所述第t次右目匹配相似度以及所述第t-1次右目注意力图,利用所述ConvLSTM计算得到第t次右目隐变量;
    根据所述第t次右目隐变量获取第t次右目视差代价;
    根据所述第t次右目视差代价计算第t次右目视差预测值,其中,所述第t次右目视差预测值用于生成所述第t次右目视差图。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述第t次左目匹配相似度以及所述第t-1次左目注意力图,利用卷积长短记忆网络ConvLSTM计算得到第t次左目隐变量,包括:
    采用如下方式计算所述第t次左目隐变量:
    $$i'_t = \sigma\left(W_{xi} \circ X'_t + W_{hi} \circ H'_{t-1} + W_{ci} * C'_{t-1} + b_i\right)$$
    $$f'_t = \sigma\left(W_{xf} \circ X'_t + W_{hf} \circ H'_{t-1} + W_{cf} * C'_{t-1} + b_f\right)$$
    $$C'_t = f'_t * C'_{t-1} + i'_t * \tanh\left(W_{xc} \circ X'_t + W_{hc} \circ H'_{t-1} + b_c\right)$$
    $$o'_t = \sigma\left(W_{xo} \circ X'_t + W_{ho} \circ H'_{t-1} + W_{co} * C'_t + b_o\right)$$
    $$H'_t = o'_t * \tanh\left(C'_t\right)$$
    其中，所述i' t表示第t次左目递归的网络输入门，所述*表示向量相乘，所述°表示卷积操作，所述σ表示sigmoid函数，所述W xi、所述W hi、所述W ci以及所述b i表示所述网络输入门的模型参数，所述X' t表示所述第t次左目匹配相似度以及所述第t-1次左目注意力图，所述f' t表示第t次左目递归的遗忘门，所述W xf、所述W hf、所述W cf以及所述b f表示所述遗忘门的模型参数，所述o' t表示第t次左目递归的输出门，所述W xo、所述W ho、所述W co以及所述b o表示所述输出门的模型参数，所述C' t表示第t次左目递归的记忆单元，所述C' t-1表示第t-1次左目递归的记忆单元，所述tanh表示双曲正切，所述H' t-1表示第t-1次左目隐变量，所述H' t表示所述第t次左目隐变量；
    所述根据所述第t次右目匹配相似度以及所述第t-1次右目注意力图,利用所述ConvLSTM计算得到第t次右目隐变量,包括:
    采用如下方式计算所述第t次右目隐变量:
    $$i''_t = \sigma\left(W_{xi} \circ X''_t + W_{hi} \circ H''_{t-1} + W_{ci} * C''_{t-1} + b_i\right)$$
    $$f''_t = \sigma\left(W_{xf} \circ X''_t + W_{hf} \circ H''_{t-1} + W_{cf} * C''_{t-1} + b_f\right)$$
    $$C''_t = f''_t * C''_{t-1} + i''_t * \tanh\left(W_{xc} \circ X''_t + W_{hc} \circ H''_{t-1} + b_c\right)$$
    $$o''_t = \sigma\left(W_{xo} \circ X''_t + W_{ho} \circ H''_{t-1} + W_{co} * C''_t + b_o\right)$$
    $$H''_t = o''_t * \tanh\left(C''_t\right)$$
    其中,所述i” t表示第t次右目递归的网络输入门,所述X” t表示所述第t次右目匹配相似度以及所述第t-1次右目注意力图,所述f” t表示第t次右目递归的遗忘门,所述o' t表示第t次右目递归的输出门,所述C” t表示第t次右目递归的记忆单元,所述C” t-1表示第t-1次右目递归的记忆单元,所述H” t-1表示第t-1次右目隐变量,所述H” t表示所述第t次右目隐变量。
  6. 根据权利要求4所述的方法,其特征在于,所述根据所述第t次左目隐变量获取第t次左目视差代价,包括:
    通过至少两层全连接层对所述第t次左目隐变量进行处理,得到所述第t次左目视差代价;
    所述根据所述第t次右目隐变量获取第t次右目视差代价,包括:
    通过所述至少两层全连接层对所述第t次右目隐变量进行处理,得到所述第t次右目视差代价。
  7. 根据权利要求4所述的方法,其特征在于,所述根据所述第t次左目视差代价计算第t次左目视差预测值,包括:
    采用如下方式计算所述第t次左目视差预测值:
    $$d'^{*} = \sum_{d'=0}^{D_{\max}-1} d' \times \sigma\left(-c'_{d'}\right)$$
    其中,所述d'*表示所述第t次左目视差预测值,所述D max表示不同视差 图的数量最大值,所述d'表示第t次左目视差值,所述σ表示sigmoid函数。所述c' d表示第t次左目视差代价;
    所述根据所述第t次右目视差代价计算第t次右目视差预测值,包括:
    采用如下方式计算所述第t次右目视差预测值:
    $$d''^{*} = \sum_{d''=0}^{D_{\max}-1} d'' \times \sigma\left(-c''_{d''}\right)$$
    所述d”*表示所述第t次右目视差预测值,所述c” d表示所述第t次右目视差代价,所述d”表示第t次右目视差值,所述c” d表示第t次右目视差代价。
  8. 根据权利要求5至7中任一项所述的方法,其特征在于,所述根据所述第t次左目视差图确定第一深度信息,包括:
    采用如下方式计算所述第一深度信息:
    $$Z' = \frac{B \times f}{d'^{*}}$$
    其中,所述Z'表示所述第一深度信息,所述d'*表示所述第t次左目视差预测值,所述B表示双目摄像头间距,所述f表示焦距;
    所述根据所述第t次右目视差图确定第二深度信息,包括:
    采用如下方式计算所述第二深度信息:
    $$Z'' = \frac{B \times f}{d''^{*}}$$
    其中,所述Z”表示所述第二深度信息,所述d”*表示所述第t次右目视差预测值。
  9. 一种配备双目摄像头的深度信息确定装置,其特征在于,所述装置包括:存储器、处理器以及总线系统;
    其中,所述存储器用于存储程序;
    所述处理器用于执行所述存储器中的程序,具体包括如下步骤:
    获取从左目图像至右目图像的第t次左目匹配相似度,以及从所述右目图像到所述左目图像的第t次右目匹配相似度,其中,所述t为大于1的整数;
    通过神经网络模型对所述第t次左目匹配相似度以及第t-1次左目注意力图进行处理,得到第t次左目视差图;
    通过所述神经网络模型对所述第t次右目匹配相似度以及第t-1次右目注意力图进行处理,得到第t次右目视差图;
    根据所述第t次左目视差图确定第一深度信息,并根据所述第t次右目视差图确定第二深度信息;
    所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
  10. 根据权利要求9所述的深度信息确定装置,其特征在于,
    所述处理器还用于将所述第t次右目视差图映射至左目坐标系,得到第t次左目映射视差图;
    根据所述第t次左目映射视差图以及所述第t次左目视差图,生成第t次左目注意力图;
    将所述第t次左目视差图映射至右目坐标系,得到第t次右目映射视差图;
    根据所述第t次右目映射视差图以及所述第t次右目视差图,生成第t次右目注意力图。
  11. 根据权利要求9所述的深度信息确定装置,其特征在于,
    所述处理器还用于获取从左目图像至右目图像的第t+1次左目匹配相似度,以及从所述右目图像到所述左目图像的第t+1次右目匹配相似度;
    通过所述神经网络模型对所述第t+1次左目匹配相似度以及第t次左目注意力图进行处理,得到第t+1次左目视差图;
    通过所述神经网络模型对所述第t+1次右目匹配相似度以及第t次右目注意力图进行处理,得到第t+1次右目视差图;
    根据所述第t+1次左目视差图确定第三深度信息,并根据所述第t+1次右目视差图确定第四深度信息。
  12. 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1-8任意一项所述的方法。
  13. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1-8任意一项所述的方法。
PCT/CN2019/077669 2018-04-04 2019-03-11 一种深度信息确定的方法及相关装置 WO2019192290A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19781105.2A EP3779881A4 (en) 2018-04-04 2019-03-11 PROCEDURE FOR DETERMINING DEPTH INFORMATION AND ASSOCIATED DEVICE
US16/899,287 US11145078B2 (en) 2018-04-04 2020-06-11 Depth information determining method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810301988.3A CN108537837B (zh) 2018-04-04 2018-04-04 一种深度信息确定的方法及相关装置
CN201810301988.3 2018-04-04

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/899,287 Continuation US11145078B2 (en) 2018-04-04 2020-06-11 Depth information determining method and related apparatus

Publications (1)

Publication Number Publication Date
WO2019192290A1 true WO2019192290A1 (zh) 2019-10-10

Family

ID=63483242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077669 WO2019192290A1 (zh) 2018-04-04 2019-03-11 一种深度信息确定的方法及相关装置

Country Status (4)

Country Link
US (1) US11145078B2 (zh)
EP (1) EP3779881A4 (zh)
CN (1) CN108537837B (zh)
WO (1) WO2019192290A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956195A (zh) * 2019-10-11 2020-04-03 平安科技(深圳)有限公司 图像匹配方法、装置、计算机设备及存储介质
CN111985551A (zh) * 2020-08-14 2020-11-24 湖南理工学院 一种基于多重注意力网络的立体匹配算法
CN112365586A (zh) * 2020-11-25 2021-02-12 厦门瑞为信息技术有限公司 3d人脸建模与立体判断方法及嵌入式平台的双目3d人脸建模与立体判断方法
CN113014899A (zh) * 2019-12-20 2021-06-22 杭州海康威视数字技术股份有限公司 一种双目图像的视差确定方法、装置及系统
CN117422750A (zh) * 2023-10-30 2024-01-19 河南送变电建设有限公司 一种场景距离实时感知方法、装置、电子设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537837B (zh) * 2018-04-04 2023-05-05 腾讯科技(深圳)有限公司 一种深度信息确定的方法及相关装置
US10503966B1 (en) * 2018-10-11 2019-12-10 Tindei Network Technology (Shanghai) Co., Ltd. Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
CN109919993B (zh) * 2019-03-12 2023-11-07 腾讯科技(深圳)有限公司 视差图获取方法、装置和设备及控制系统
CN110334749B (zh) * 2019-06-20 2021-08-03 浙江工业大学 基于注意力机制的对抗攻击防御模型、构建方法及应用
CN110427968B (zh) * 2019-06-28 2021-11-02 武汉大学 一种基于细节增强的双目立体匹配方法
US11763433B2 (en) * 2019-11-14 2023-09-19 Samsung Electronics Co., Ltd. Depth image generation method and device
KR102316960B1 (ko) * 2019-11-28 2021-10-22 광운대학교 산학협력단 무인 항공기 영상 내 실시간 객체 검출 방법 및 장치
US11694341B2 (en) * 2019-12-23 2023-07-04 Texas Instruments Incorporated Cascaded architecture for disparity and motion prediction with block matching and convolutional neural network (CNN)
CN111464814B (zh) * 2020-03-12 2022-01-04 天津大学 一种基于视差引导融合的虚拟参考帧生成方法
CN111447429B (zh) * 2020-04-02 2021-03-19 深圳普捷利科技有限公司 一种基于双目摄像头摄像的车载裸眼3d显示方法及系统
CN112766151B (zh) * 2021-01-19 2022-07-12 北京深睿博联科技有限责任公司 一种用于导盲眼镜的双目目标检测方法和系统
CN113033174B (zh) * 2021-03-23 2022-06-10 哈尔滨工业大学 一种基于输出型相似门的案件分类方法、装置及存储介质
US11967096B2 (en) 2021-03-23 2024-04-23 Mediatek Inc. Methods and apparatuses of depth estimation from focus information
CN115375665B (zh) * 2022-08-31 2024-04-16 河南大学 一种基于深度学习策略的早期阿尔兹海默症发展预测方法
CN115294375B (zh) * 2022-10-10 2022-12-13 南昌虚拟现实研究院股份有限公司 一种散斑深度估算方法、系统、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130266211A1 (en) * 2012-04-06 2013-10-10 Brigham Young University Stereo vision apparatus and method
CN106600650A (zh) * 2016-12-12 2017-04-26 杭州蓝芯科技有限公司 一种基于深度学习的双目视觉深度信息获取方法
CN107590831A (zh) * 2017-08-30 2018-01-16 电子科技大学 一种基于深度学习的立体匹配方法
CN108537837A (zh) * 2018-04-04 2018-09-14 腾讯科技(深圳)有限公司 一种深度信息确定的方法及相关装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI489418B (zh) * 2011-12-30 2015-06-21 Nat Univ Chung Cheng Parallax Estimation Depth Generation
CN102750731B (zh) * 2012-07-05 2016-03-23 北京大学 基于左右单眼感受野和双目融合的立体视觉显著计算方法
CN103617608B (zh) * 2013-10-24 2016-07-06 四川长虹电器股份有限公司 通过双目图像获得深度图的方法
JP2015154101A (ja) * 2014-02-10 2015-08-24 ソニー株式会社 画像処理方法、画像処理装置及び電子機器
NZ773834A (en) * 2015-03-16 2022-07-01 Magic Leap Inc Methods and systems for diagnosing and treating health ailments
CN107209854A (zh) * 2015-09-15 2017-09-26 深圳市大疆创新科技有限公司 用于支持顺畅的目标跟随的系统和方法
CN107027019B (zh) * 2016-01-29 2019-11-08 北京三星通信技术研究有限公司 图像视差获取方法及装置
CN106447718B (zh) * 2016-08-31 2019-06-04 天津大学 一种2d转3d深度估计方法
US10466714B2 (en) * 2016-09-01 2019-11-05 Ford Global Technologies, Llc Depth map estimation with stereo images
GB2553782B (en) * 2016-09-12 2021-10-20 Niantic Inc Predicting depth from image data using a statistical model
CN106355570B (zh) * 2016-10-21 2019-03-19 昆明理工大学 一种结合深度特征的双目立体视觉匹配方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130266211A1 (en) * 2012-04-06 2013-10-10 Brigham Young University Stereo vision apparatus and method
CN106600650A (zh) * 2016-12-12 2017-04-26 杭州蓝芯科技有限公司 一种基于深度学习的双目视觉深度信息获取方法
CN107590831A (zh) * 2017-08-30 2018-01-16 电子科技大学 一种基于深度学习的立体匹配方法
CN108537837A (zh) * 2018-04-04 2018-09-14 腾讯科技(深圳)有限公司 一种深度信息确定的方法及相关装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3779881A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956195A (zh) * 2019-10-11 2020-04-03 平安科技(深圳)有限公司 图像匹配方法、装置、计算机设备及存储介质
CN110956195B (zh) * 2019-10-11 2023-06-02 平安科技(深圳)有限公司 图像匹配方法、装置、计算机设备及存储介质
CN113014899A (zh) * 2019-12-20 2021-06-22 杭州海康威视数字技术股份有限公司 一种双目图像的视差确定方法、装置及系统
CN113014899B (zh) * 2019-12-20 2023-02-03 杭州海康威视数字技术股份有限公司 一种双目图像的视差确定方法、装置及系统
CN111985551A (zh) * 2020-08-14 2020-11-24 湖南理工学院 一种基于多重注意力网络的立体匹配算法
CN111985551B (zh) * 2020-08-14 2023-10-27 湖南理工学院 一种基于多重注意力网络的立体匹配算法
CN112365586A (zh) * 2020-11-25 2021-02-12 厦门瑞为信息技术有限公司 3d人脸建模与立体判断方法及嵌入式平台的双目3d人脸建模与立体判断方法
CN112365586B (zh) * 2020-11-25 2023-07-18 厦门瑞为信息技术有限公司 3d人脸建模与立体判断方法及嵌入式平台的双目3d人脸建模与立体判断方法
CN117422750A (zh) * 2023-10-30 2024-01-19 河南送变电建设有限公司 一种场景距离实时感知方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN108537837B (zh) 2023-05-05
EP3779881A1 (en) 2021-02-17
US11145078B2 (en) 2021-10-12
US20200302629A1 (en) 2020-09-24
EP3779881A4 (en) 2022-01-05
CN108537837A (zh) 2018-09-14

Similar Documents

Publication Publication Date Title
WO2019192290A1 (zh) 一种深度信息确定的方法及相关装置
Zhan et al. Visual odometry revisited: What should be learnt?
AU2017324923B2 (en) Predicting depth from image data using a statistical model
WO2020182117A1 (zh) 视差图获取方法、装置和设备及控制系统和存储介质
CN111582207B (zh) 图像处理方法、装置、电子设备及存储介质
Zhou et al. Artificial neural networks for computer vision
KR20180087994A (ko) 스테레오 매칭 방법 및 영상 처리 장치
EP3326156B1 (en) Consistent tessellation via topology-aware surface tracking
EP3769265A1 (en) Localisation, mapping and network training
Jellal et al. LS-ELAS: Line segment based efficient large scale stereo matching
CN110243390B (zh) 位姿的确定方法、装置及里程计
EP3665651B1 (en) Hierarchical disparity hypothesis generation with slanted support windows
Laskowski A novel hybrid-maximum neural network in stereo-matching process
CN111127522A (zh) 基于单目相机的深度光流预测方法、装置、设备及介质
KR20220014678A (ko) 영상의 깊이를 추정하는 방법 및 장치
CN114494395A (zh) 基于平面先验的深度图生成方法、装置、设备及存储介质
Cheung et al. Optimization-based automatic parameter tuning for stereo vision
Joglekar et al. Area based stereo image matching technique using Hausdorff distance and texture analysis
US20180001821A1 (en) Environment perception using a surrounding monitoring system
Sundararajan et al. A Combined Approach for Stereoscopic 3D Reconstruction Model based on Improved Semi Global Matching.
US20220351399A1 (en) Apparatus and method for generating depth map using monocular image
Qadir A Large Scale Inertial Aided Visual Simultaneous Localization and Mapping (SLAM) System for Small Mobile Platforms
Kerkaou et al. Omnidirectional spatio-temporal matching based on machine learning
Zheng Toward 3D reconstruction of static and dynamic objects
Laskov et al. Comparison of 3D Algorithms for Non-rigid Motion and Correspondence Estimation.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19781105

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019781105

Country of ref document: EP

Effective date: 20201104