CN110517306A - Method and system for binocular depth vision estimation based on deep learning - Google Patents
Method and system for binocular depth vision estimation based on deep learning
- Publication number
- CN110517306A CN110517306A CN201910814513.9A CN201910814513A CN110517306A CN 110517306 A CN110517306 A CN 110517306A CN 201910814513 A CN201910814513 A CN 201910814513A CN 110517306 A CN110517306 A CN 110517306A
- Authority
- CN
- China
- Prior art keywords
- depth
- picture
- binocular
- deep learning
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and system for binocular depth vision estimation based on deep learning, comprising the following steps: collecting training data; a depth generation module generating the depth distance of the corresponding picture positions and saving it as a depth picture of the same size as the original picture, carrying depth information, in which each pixel value corresponds to a relative distance; training a neural network model; and depth estimation, yielding the estimated depth distance map. Beneficial effects of the invention: the proposed deep-learning binocular vision depth estimation method achieves high precision and strong generalization ability, supports transfer learning, and can be applied under difficult environmental conditions; in terms of speed, it greatly improves computation speed compared with the running time of traditional algorithms.
Description
Technical field
The present invention relates to the technical field of measuring depth distance with a binocular camera, and in particular to a method and system for binocular depth vision estimation based on deep learning.
Background technique
Obtaining the distance of environmental objects through depth estimation has in recent years become an important field in computer vision. Similar to the two eyes of a human, a binocular camera reconstructs the three-dimensional information of the environment and yields an estimate of how far away environmental objects are. Binocular depth estimation methods based on traditional computer vision, such as the SGM algorithm, suffer from low precision and slow speed; the algorithms also depend heavily on conditions and are not robust in complex scenes, making it difficult to meet the requirements of commercial deployment. In contrast, binocular depth estimation methods based on deep learning offer high precision, strong generalization ability, and fast speed.
Summary of the invention
The purpose of this section is to summarize some aspects of embodiments of the present invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract of the description, and in the title of the invention to avoid obscuring their purpose; such simplifications or omissions cannot be used to limit the scope of the invention.
In view of the above existing problems, the present invention is proposed.
The technical problem solved by the present invention is therefore: proposing a method of binocular depth vision estimation based on deep learning that obtains the distance of environmental objects more accurately.
In order to solve the above technical problem, the invention provides the following technical scheme: a method of binocular depth vision estimation based on deep learning, comprising the following steps. Collecting training data: a photographing module obtains initial pictures from two different perspectives. Depth generation: a depth generation module generates the depth distance of the corresponding picture positions and saves it as a depth picture of the same size as the original picture, carrying depth information, each pixel value corresponding to a relative distance. Neural network model training: the depth pictures are input into the neural network model for training; through repeated training, trained neural network parameters are obtained and saved. Depth estimation: the photographing module collects actual pictures, which are input into the trained neural network model for calculation, yielding the estimated depth distance map.
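The four steps above (acquire pictures, generate depth labels, train, estimate) can be sketched as a minimal pipeline. Every function body below is a placeholder stand-in, not the patented network; the sketch only fixes the data flow and the key invariant that the depth label and the estimated depth map share the input picture size:

```python
import numpy as np

def acquire_pair(h=375, w=1242):
    """Step 1: stand-in for grabbing a left/right picture pair from the binocular camera."""
    rng = np.random.default_rng(0)
    return rng.random((h, w, 3)), rng.random((h, w, 3))

def generate_depth_label(left, right):
    """Step 2: stand-in for the depth generation module; returns a depth picture
    the same size as the input, each pixel value a relative distance."""
    return np.full(left.shape[:2], 10.0)  # dummy constant depth

def train(model_params, images, depth_label):
    """Step 3: one stand-in iteration of supervised training (a fake parameter update)."""
    return model_params - 0.01

def estimate(model_params, left, right):
    """Step 4: stand-in for running the trained network on a new pair."""
    return np.zeros(left.shape[:2])

left, right = acquire_pair()
label = generate_depth_label(left, right)
params = train(np.float64(1.0), (left, right), label)
depth = estimate(params, left, right)
print(depth.shape)  # (375, 1242), same size as the input pictures
```

The 1242×375 size matches the KITTI picture size cited later in the embodiment; any real implementation would replace each placeholder with the corresponding module.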
In a preferred embodiment of the method of binocular depth vision estimation based on deep learning according to the present invention: the neural network model comprises a convolutional neural network, a feature fusion network layer, and a 3D convolutional neural network layer, and training comprises the following steps: the pictures obtained by the photographing module serve as input; the convolutional neural network produces the feature maps of the two pictures; the output of the convolutional neural network layer serves as input to the feature fusion network layer, which extracts fused features; these are fed into the 3D convolutional neural network layer, which extracts the depth map.
In a preferred embodiment of the method according to the present invention: the collected depth pictures are fed into the neural network model, and picture features are extracted by the convolutional neural network; these are input to the feature fusion network layer for feature fusion, which matches correlated features and fuses them into a 3D feature map.
In a preferred embodiment of the method according to the present invention: the model training further comprises the following steps. A 3D convolution with kernel size 3 × 3 × 3 is performed on the 3D feature map, yielding features that fuse position and depth on the 3D feature map. Because the feature map is 1/4 of the original size, the picture is upsampled to the original size to obtain a depth picture consistent with the picture size. Each pixel in the picture corresponds to a group of D = 48 depth signals, and this group of signals is normalized with the softmax function. With V the depth signal and S the normalized depth signal:

S_i = exp(V_i) / Σ_j exp(V_j)

The normalized signals are multiplied by the corresponding signal weights to obtain the depth disparity information of the corresponding position.
In a preferred embodiment of the method according to the present invention, the method further comprises a network update stage: the obtained disparity map is compared with the true disparity map, i.e., the depth map collected by the depth camera, and the smooth L1 loss function is used to obtain the loss value of the network. With x the data difference at the corresponding position, the loss function is:

loss(x) = 0.5 x²  if |x| < 1;  |x| − 0.5  otherwise

The loss value is back-propagated to update the parameters of the entire neural network iteratively. The above process is repeated until the parameter updates become small and multiple further rounds of training no longer yield better test results, i.e., training is judged to have saturated and is finished.
In a preferred embodiment of the method according to the present invention: the convolutional neural network comprises the following training steps: the two depth pictures are fed simultaneously into the residual network layers of the convolutional neural network to extract picture feature maps; the feature maps are fed into a spatial pyramid pooling layer for feature enhancement, obtaining richer feature information.
In a preferred embodiment of the method according to the present invention: the feature fusion network layer comprises the following steps: the features extracted by the convolutional layers of the convolutional neural network serve as input; they are fed into the convolutional depth fusion layer to enrich the feature information content; the depth information fusion layer then generates an information layer in which depth information is matched.
In a preferred embodiment of the method according to the present invention: the 3D convolutional neural network layer comprises the following steps: the output of the feature fusion network layer serves as input; it is fed into an Hourglass module to extract richer deep high-dimensional information; through an upsampling layer, a depth module of the original picture size is obtained. Its size is D*W*H, meaning there are D maps of size W*H. Assuming the pixel value of the i-th map at position (W_j, H_k) is A_ijk, the output on the corresponding depth map is:

D_jk = Σ_i A_ijk · i,  i = 0, 1, 2, …, D−1
Another technical problem solved by the present invention: proposing a system of binocular depth vision estimation based on deep learning, on which the above method can rely for its realization.
In order to solve the above technical problem, the invention provides the following technical scheme: the system comprises a photographing module, a depth generation module, and a neural network model. The photographing module consists of the cameras fixedly installed on a binocular camera, used to collect pictures from two different perspectives. The depth generation module generates a depth picture from the collected pictures, the relative distance from the camera corresponding to the pixel values of the depth picture. The neural network model performs deep learning using the obtained pictures and saves the neural network parameters, used to generate the estimated depth distance map.
In a preferred embodiment of the system of binocular depth vision estimation based on deep learning according to the present invention: the photographing module comprises two groups of cameras, a color binocular camera and a grayscale binocular camera. The color binocular camera collects pictures used as the input of the neural network for training; the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
Beneficial effects of the present invention: the proposed deep-learning binocular vision depth estimation method achieves high precision and strong generalization ability, supports transfer learning, and can be applied under difficult environmental conditions; in terms of speed, it greatly improves computation speed compared with the running time of traditional algorithms.
Detailed description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without any creative labor. In the drawings:
Fig. 1 is an overall flow diagram of the method of binocular depth vision estimation based on deep learning described in the first embodiment of the invention;
Fig. 2 is an architecture diagram of the binocular depth estimation described in the first embodiment of the invention;
Fig. 3 is a structural diagram of the convolutional neural network module layer described in the first embodiment of the invention;
Fig. 4 is a structural diagram of the feature fusion network layer described in the first embodiment of the invention;
Fig. 5 is a structural diagram of the 3D convolutional neural network layer described in the first embodiment of the invention;
Fig. 6 is a structural diagram of the system of binocular depth vision estimation based on deep learning described in the second embodiment of the invention.
Specific embodiments
In order to make the above objects, features, and advantages of the present invention clearer and more comprehensible, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by a person of ordinary skill in the art without creative labor shall fall within the scope of protection of the invention.
In the following description, numerous specific details are set forth to facilitate a full understanding of the invention, but the invention can also be implemented in ways other than those described here; those skilled in the art can make similar generalizations without departing from the intent of the invention, so the invention is not limited by the specific embodiments disclosed below.
Secondly, "one embodiment" or "an embodiment" referred to herein means a particular feature, structure, or characteristic that may be included in at least one implementation of the invention. "In one embodiment" appearing in different places in this specification does not always refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive with other embodiments.
The present invention is described in detail with reference to schematic diagrams. When describing the embodiments, for convenience of explanation, sectional views showing the device structure may be partially enlarged out of general proportion; the diagrams are examples and should not limit the scope of protection of the invention here. In addition, the three dimensions of length, width, and depth should be included in actual fabrication.
In the description of the invention, it should also be noted that orientations or positional relationships indicated by terms such as "upper, lower, inner, outer" are based on the orientations or positional relationships shown in the drawings; they merely facilitate and simplify the description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a specific orientation, so they are not to be construed as limiting the invention. In addition, the terms "first, second, or third" are used for description purposes only and cannot be understood as indicating or implying relative importance.
In the present invention, unless otherwise clearly specified and limited, the terms "installed, connected, coupled" shall be understood broadly; for example, a connection may be fixed, detachable, or integral; it may likewise be mechanical, electrical, or direct, or indirect through an intermediary, or an internal connection between two elements. For a person of ordinary skill in the art, the specific meanings of the above terms in the invention can be understood according to the specific situation.
Embodiment 1
Referring to Figs. 1-5: obtaining the distance of environmental objects through depth estimation is an important field in computer vision. Similar to the two eyes of a human, a binocular camera reconstructs the three-dimensional information of the environment and yields an estimate of the distance of environmental objects.
By calculating the disparity between the two images, the scenery in front (within the range captured by the image) can be measured directly, without judging what kind of obstacle appears ahead. Thus, for any kind of obstacle, necessary early warning or braking can be carried out according to changes in the distance information. The principle of a binocular camera is similar to the human eye. The human eye can perceive the distance of an object because the images the two eyes present of the same object differ, a difference also called "parallax". The farther the object, the smaller the parallax; conversely, the larger the parallax. The size of the parallax corresponds to the distance between the object and the eyes, which is also why 3D films can give people a sense of stereoscopic depth.
Binocular depth estimation methods based on traditional computer vision, such as the SGM algorithm, suffer from low precision and slow speed; the algorithms depend heavily on conditions and are not robust in complex scenes, making it difficult to meet the requirements of commercial deployment. Therefore this embodiment proposes a binocular depth estimation method based on deep learning, which offers high precision, strong generalization ability, and fast speed.
Specifically, the method comprises the following steps.
S1: collecting training data; the photographing module 100 obtains initial pictures 101 from two different perspectives.
S2: the depth generation module 200 generates the depth distance of the corresponding picture positions and saves it as a depth picture 201 of the same size as the original picture, carrying depth information; each pixel value corresponds to a relative distance.
S3: training the neural network model 300; the depth pictures 201 are input into the neural network model 300 for training, and through repeated training the trained neural network parameters are obtained and saved. The neural network model 300 comprises a convolutional neural network 301, a feature fusion network layer 302, and a 3D convolutional neural network layer 303, and training comprises the following steps: the pictures obtained by the photographing module 100 serve as input; the convolutional neural network 301 produces the feature maps of the two pictures; the output of the convolutional neural network layer 301 serves as input to the feature fusion network layer 302, which extracts fused features; these are fed into the 3D convolutional neural network layer 303, which extracts the depth map.
More specifically:
The collected depth pictures 201 are fed into the neural network model 300, and picture features are extracted by the convolutional neural network 301; these are input to the feature fusion network layer 302 for feature fusion, which matches correlated features and fuses them into a 3D feature map.
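The fusion step, matching correlated left/right features into a 3D feature volume, can be sketched with NumPy by concatenating the left features with the right features shifted over candidate disparities. The patent does not spell out the fusion layer's internals, so this particular cost-volume construction (common in binocular networks) is an assumption used purely for illustration:

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """Fuse left/right feature maps of shape (C, H, W) into a 4D volume of
    shape (2C, max_disp, H, W): at disparity d the right features are shifted
    d pixels so that potentially matching features line up along the channel axis."""
    c, h, w = feat_l.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = feat_l[:, :, d:]        # left features, unshifted
        vol[c:, d, :, d:] = feat_r[:, :, : w - d]   # right features, shifted by d
    return vol

rng = np.random.default_rng(1)
fl = rng.random((8, 16, 32))   # toy left feature map (C=8, H=16, W=32)
fr = rng.random((8, 16, 32))   # toy right feature map
vol = build_cost_volume(fl, fr, max_disp=12)
print(vol.shape)  # (16, 12, 16, 32)
```

At disparity 0 the volume is just the two feature maps stacked; increasing disparity slides the right map across the left, which is what lets the later 3D convolutions compare features at every candidate depth.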
A 3D convolution with kernel size 3 × 3 × 3 is then performed on the 3D feature map, yielding features that fuse position and depth on the 3D feature map. The feature map (here meaning the output of the 3D convolution) is 1/4 of the original size (the original size being the full picture size obtained by the camera; the 1/4 compression reduces the amount of calculation, without which the network would struggle to produce a result in under 1 s). The picture is therefore upsampled to the original size (e.g., by bilinear interpolation) to obtain a depth picture consistent with the picture size. Each pixel in the picture corresponds to a group of D = 48 depth signals, and this group of signals is normalized with the softmax function. With V the depth signal and S the normalized depth signal:

S_i = exp(V_i) / Σ_j exp(V_j)

The normalized signals are multiplied by the corresponding signal weights to obtain the depth disparity information of the corresponding position; this directly yields the depth information, with no further post-processing needed.
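Taking the normalization above to be the usual softmax over the D = 48 depth signals (an assumption consistent with the weighted sum that follows it), the per-pixel regression can be sketched as:

```python
import numpy as np

def normalize(v):
    """Softmax over the D depth signals of one pixel: S_i = exp(v_i) / sum_j exp(v_j)."""
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()

def regress_depth(v, weights=None):
    """Multiply the normalized signals by their signal weights (depth indices
    0..D-1 by default) and sum, giving the expected depth for the pixel."""
    s = normalize(v)
    if weights is None:
        weights = np.arange(len(v), dtype=float)
    return float((s * weights).sum())

v = np.zeros(48)   # D = 48 depth signals for one pixel
v[10] = 50.0       # one strongly activated depth bin
print(round(regress_depth(v), 3))  # 10.0, the index of the dominant bin
```

Because the output is a probability-weighted average rather than an argmax, it is differentiable, which is what lets the loss in the network update stage be back-propagated through this step.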
The obtained disparity map is compared with the true disparity map, i.e., the depth map collected by the depth camera, and the smooth L1 loss function is used to obtain the loss value of the network. With x the data difference at the corresponding position, the loss function is:

loss(x) = 0.5 x²  if |x| < 1;  |x| − 0.5  otherwise

The loss value is back-propagated to update the parameters of the entire neural network iteratively.
The above process is repeated until the network parameter updates become small (the final depth information map changes little between trainings; for example, if the depth of some corresponding point is 100 and over many subsequent trainings it stays around 100, the network is no longer learning) and multiple further rounds of training no longer yield better test results, i.e., training is judged to have saturated and is finished.
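The smooth L1 loss named above has a standard piecewise form (quadratic near zero, linear for large errors, which makes it robust to outlier pixels); a small sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss: 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise.
    x is the per-position difference between predicted and true disparity."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

# toy disparity differences at five positions of the map
diff = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(smooth_l1(diff).tolist())           # [1.5, 0.125, 0.0, 0.125, 2.5]
print(float(smooth_l1(diff).mean()))      # mean loss over the map: 0.85
```

The mean over the map is the scalar loss value whose gradient is back-propagated to update the network parameters.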
S4: depth estimation; the photographing module 100 collects actual pictures, which are input into the trained neural network model 300 for calculation, yielding the estimated depth distance map.
This embodiment comprises, in order, the training steps of the convolutional neural network 301, the feature fusion network layer 302, and the 3D convolutional neural network layer 303. Specifically:
The convolutional neural network 301 comprises the following training steps: the two depth pictures 201 are fed simultaneously into the residual network layers of the convolutional neural network 301 to extract picture feature maps; the feature maps are fed into a spatial pyramid pooling layer for feature enhancement, obtaining richer feature information.
The feature fusion network layer 302 comprises the following steps: the features extracted by the convolutional layers of the convolutional neural network 301 serve as input; they are fed into the convolutional depth fusion layer to enrich the feature information content; the depth information fusion layer then generates an information layer in which depth information is matched.
The 3D convolutional neural network layer 303 comprises the following steps: the output of the feature fusion network layer 302 serves as input; it is fed into an Hourglass module to extract richer deep high-dimensional information; through an upsampling layer, a depth module of the original picture size is obtained. Its size is D*W*H, meaning there are D maps of size W*H. Assuming the pixel value of the i-th map at position (W_j, H_k) is A_ijk, the output on the corresponding depth map is:

D_jk = Σ_i A_ijk · i,  i = 0, 1, 2, …, D−1
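The output relation D_jk = Σ_i A_ijk · i is a weighted sum over the D maps at every pixel; with the shapes taken from the D*W*H description above, it is a single tensordot in NumPy:

```python
import numpy as np

def depth_from_volume(a):
    """a has shape (D, W, H): D maps of size W*H. The output map is
    D_jk = sum_i a[i, j, k] * i, the expected depth index at each pixel."""
    d = a.shape[0]
    idx = np.arange(d, dtype=float)           # the weights i = 0 .. D-1
    return np.tensordot(idx, a, axes=(0, 0))  # contracts the D axis -> (W, H)

# toy volume with D = 48: every pixel's mass sits entirely on map 5
a = np.zeros((48, 4, 6))
a[5] = 1.0
out = depth_from_volume(a)
print(out.shape, float(out[0, 0]))  # (4, 6) 5.0
```

With a one-hot volume the expected index is exactly the active map's index; with a softmax-normalized volume it becomes a sub-integer estimate, which is how the network achieves sub-pixel disparity precision.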
It should be noted that the present application provides a method of binocular depth vision estimation based on deep learning. In practice, pictures from two different perspectives are obtained by the two cameras at fixed positions on the binocular camera; the two pictures are put simultaneously into the residual network of the convolutional neural network to extract picture feature maps, which are then put into the spatial pyramid pooling layer for feature enhancement. The two correlated picture features are then fused. The purpose of feature fusion: the feature maps produced from pictures of two different perspectives are correlated, because the only difference between the two pictures is the viewing angle, so the feature maps contain many identical or similar matching features; these features are fused together for subsequent depth extraction.
Using the fusion-layer neural network, supervised learning converges on suitable feature matching. Unlike traditional binocular ranging algorithms, the feature matching here is learned by the neural network itself: poor feature matching, for example, leads to poor results, and according to the quality of the results, backward supervision adjusts the neural network toward better feature matching.
Finally, only the effective distance range of the camera needs to be given, for example a maximum of 200 meters; the neural network then provides depth distance information within 0-200 meters on the picture. Beyond the effective distance range the confidence is not high, so everything is set according to the 200-meter limit.
This embodiment proposes a binocular vision depth estimation method based on deep learning with high precision: on the KITTI data set, an average error pixel percentage of 0.83% can be achieved, versus an error rate of 3.57% for the traditional binocular depth estimation method. Moreover, the method of this patent for estimating binocular depth generalizes strongly and supports transfer learning. The meaning of transfer learning here: the same neural network architecture can be applied to different types of binocular cameras without completely retraining the entire neural network; it is only necessary to set the parameters of the binocular camera on the original basis and update the output parameters of the trained part of the neural network layers, which reduces development time and difficulty. The method also applies under difficult environmental conditions. In terms of speed, on a picture of size 1242*375 the average computation time is 0.32 seconds, compared with 3.7 seconds for the traditional algorithm, greatly improving computation speed and substantially meeting the requirements of commercial deployment. It solves the practical problems of common traditional binocular depth estimation schemes and has great application prospects in related fields such as automatic driving and indoor positioning.
Scenario 1:
In this embodiment, a test vehicle deploying this method is compared against a vehicle deploying conventional methods, and a simulation of this method and the conventional methods is implemented in Python and tested on the KITTI data set; simulation data are obtained from the experimental results. The conventional methods used in the experiment are the SGM and SDM algorithms. The test environment is the binocular depth test set of the public KITTI data set, with both the conventional methods and this method run as Python simulations.
Performance comparisons are carried out for each algorithm: the running speed of each algorithm is tested and its error values are calculated, with the estimation errors averaged over the KITTI test set. The comparison test uses the two conventional methods above and this method; the test results are shown in Table 1.
Table 1: test results.

| Algorithm | Speed | Average error pixel ratio | Mean disparity error |
| --- | --- | --- | --- |
| This patent | 0.32 s | 1.32% | 0.5 px |
| SGM | 3.7 s | 5.76% | 1.3 px |
| SDM | ~1 min | 10.95% | 2.0 px |
From the data in Table 1 it can be observed that the algorithm of this implementation is significantly better than the traditional algorithms in both speed and precision.
Embodiment 2
Referring to Fig. 6, this embodiment proposes a system of binocular depth vision estimation based on deep learning; the method of the above embodiment can rely on this system for its implementation, and the above method or system can be applied to depth vision estimation for vehicles. For example, a binocular camera installed on the vehicle body shoots the image information around the vehicle, the deep learning network algorithm is written into the on-board host, the shot images are input into the algorithm module to compute an estimate of the distance of objects in the environment, and the result is shown on the display screen of the on-board host, thereby alerting the driver for safe driving.
Specifically, the system includes a photographing module 100, a depth generation module 200 and a neural network model 300. The photographing module 100 consists of the cameras fixedly installed as a binocular camera and acquires pictures from two different perspectives. The depth generation module 200 generates a depth picture 201 from the acquired pictures, in which the pixel values correspond to the relative distance from the camera. The neural network model 300 performs deep learning on the acquired pictures and saves the neural network parameters, for generating the estimated depth distance map. The photographing module 100 includes two groups of cameras, a color binocular camera and a grayscale binocular camera: the color binocular camera acquires the pictures used as the training input of the neural network, while the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
It should be noted that the two corresponding stereo grayscale cameras can generate the depth distance of the corresponding picture positions: matching points are found between the two binocular pictures, and the depth information is computed from the pixel difference between the matched points as depth = f*b/d, where f is the focal length of the camera, b is the distance between the two binocular cameras (the baseline), and d is the pixel difference (disparity) of the two matched points. The result is saved as a picture of the same size as the original carrying the depth information; generally only the depth information map corresponding to the left view needs to be saved. The picture's pixel values correspond to relative distance: a grayscale picture is made up of values from 0 to 255, e.g. white is 255 and black is 1, so if the effective range of the camera is 1 to 200 meters, the depth information can be expressed with the pixel values of the picture, i.e. a value of 155 indicates that this point of the picture is 155 meters from the camera.
In the untrained state, the network does not know how to perform matched feature fusion; through continuous training, the network can arrive at a suitable feature matching method that associates related features. Here "related" means, for example, that a car seen in the left picture and the same car seen in the right picture carry related car information on the feature maps; the role of the feature fusion layer is precisely to match up this related car information and fuse it into a 3D feature map.
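One common way to realize such a fusion layer is a disparity-indexed cost volume (as in PSMNet-style networks; the shapes below are made up for illustration): each left-image feature is paired with the right-image feature shifted by every candidate disparity, so features of the same object end up stacked at the slice of their true disparity.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Fuse (C, H, W) left/right feature maps into a (2C, max_disp, H, W)
    volume: slice d holds each left feature next to the right feature
    shifted d pixels, so matching features line up at the true disparity."""
    C, H, W = left_feat.shape
    volume = np.zeros((2 * C, max_disp, H, W), dtype=left_feat.dtype)
    for d in range(max_disp):
        volume[:C, d, :, d:] = left_feat[:, :, d:]          # left features in place
        volume[C:, d, :, d:] = right_feat[:, :, :W - d]     # right features shifted by d
    return volume
```

The 3D convolutions of the later stage then operate over this disparity axis to decide which slice matches best at each pixel.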
In this embodiment the depth generation module 200 may be the computing module inside a stereo grayscale camera, for example a depth camera running RGBD-SLAM, whose detection specifications include range, accuracy, angle and frame rate; the module has low power consumption, the depth information is obtained by a pure software algorithm, and the processing chip has very high computing performance, so the depth information of surrounding objects can be obtained. In this embodiment it serves only for the acquisition of training data; moreover, since such a processing chip has the drawbacks of requiring very high computing performance while running slowly, it is also possible to shoot images with ordinary cameras and then run a binocular imaging algorithm directly on a computer to acquire the depth information. After the training of the neural network model 300 is completed, only ordinary cameras need to be installed on the vehicle body and their pictures input into the neural network model 300 to obtain the depth information, which requires less performance and computes fast. The neural network model 300 is written into a deep learning algorithm chip set in the on-board host, for example a mainstream GPU deep learning chip, which is essentially one huge computing matrix: a GPU has thousands of computing cores, can achieve 10-100 times the application throughput, and supports the computing capability vital to deep learning, running much faster than conventional processors and greatly accelerating the training process. The GPU is currently one of the most widely used deep learning computing units.
It should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention, and all such modifications shall be covered by the scope of the claims of the present invention.
Claims (10)
1. A method of binocular depth vision estimation based on deep learning, characterized by comprising the following steps:
collecting training data, wherein a photographing module (100) obtains initial pictures (101) from two different perspectives;
a depth generation module (200) generating the depth distance of the corresponding picture positions and saving it as a depth picture (201) of the same size as the original picture, the picture's pixel values corresponding to relative distance;
neural network model (300) training, wherein the depth picture (201) is input into the neural network model (300) for training, and trained neural network parameters are obtained and saved through repeated training;
depth estimation, wherein the photographing module (100) collects actual pictures, which are input into the trained neural network model (300) for computation to obtain the estimated depth distance map.
2. The method of binocular depth vision estimation based on deep learning as claimed in claim 1, characterized in that: the neural network model (300) comprises a convolutional neural network (301), a feature fusion network layer (302) and a 3D convolutional neural network layer (303), and includes the following training steps:
taking the pictures obtained by the photographing module (100) as input;
obtaining the feature maps of the two pictures through the convolutional neural network (301);
feeding the output of the convolutional neural network (301) into the feature fusion network layer (302) as input to extract fused features;
putting the result into the 3D convolutional neural network layer (303) to extract the depth map.
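The steps above amount to a simple forward pipeline; a minimal sketch with the three stages passed in as hypothetical callables standing in for units 301, 302 and 303, whose internals are specified in the later claims:

```python
def estimate_depth(left_img, right_img, cnn, fuse, cnn3d):
    """Forward pass of claim 2: a shared 2D CNN on both views,
    feature fusion, then a 3D CNN that extracts the depth map."""
    feat_left = cnn(left_img)            # feature map of the left picture
    feat_right = cnn(right_img)          # same network weights for the right picture
    fused = fuse(feat_left, feat_right)  # feature fusion layer (302)
    return cnn3d(fused)                  # 3D CNN layer (303) extracts the depth map
```

With toy callables, e.g. `estimate_depth(1.0, 2.0, lambda x: x, lambda a, b: a + b, lambda v: v)`, the pipeline simply returns `3.0`, which makes the data flow easy to check before plugging in real network modules.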
3. The method of binocular depth vision estimation based on deep learning as claimed in claim 2, characterized in that: the acquired depth pictures (201) are put into the neural network model (300), and picture features are extracted by the convolutional neural network (301); the features are input into the feature fusion network layer (302) for feature fusion, where related features are matched and fused to generate a 3D feature map.
4. The method of binocular depth vision estimation based on deep learning as claimed in claim 3, characterized in that: the model training further comprises the following steps:
performing a 3D convolution on the 3D feature map, with a convolution kernel of size 3 × 3 × 3, to obtain features on the 3D feature map that fuse position and depth;
since the feature map is 1/4 of the original size, upsampling the picture to the original size to obtain a depth picture consistent with the picture size, in which each pixel corresponds to a group of D = 48 depth signals;
normalizing this group of signals with the following function, where V is the corresponding depth signal and S is the depth signal after normalization:
S_i = exp(V_i) / Σ_{j=1}^{D} exp(V_j)
multiplying the obtained normalized signal by the corresponding signal weight to obtain the depth disparity information of the corresponding position.
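The normalization of one pixel's D = 48 depth signals followed by the weighted average can be sketched as below; treating the normalization as a softmax and the signal weights as the candidate indices 0..D-1 are assumptions consistent with the regression formula of claim 8, not statements of the patent's exact choice:

```python
import numpy as np

def soft_disparity(signals):
    """Softmax-normalize one pixel's D depth signals V into S,
    then take the average of candidate indices 0..D-1 weighted by S."""
    v = np.asarray(signals, dtype=np.float64)
    s = np.exp(v - v.max())   # subtract the max for numerical stability
    s /= s.sum()              # normalized signal S, sums to 1
    return float(np.dot(s, np.arange(v.size)))
```

Because the result is an expectation rather than an argmax, it is differentiable and can produce sub-pixel disparity values.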
5. The method of binocular depth vision estimation based on deep learning as claimed in claim 4, characterized by further comprising a network update stage:
that is, the obtained disparity map is compared with the true disparity map collected by the depth camera, and the loss value of the network is obtained with the smooth L1 loss function; the loss function formula is as follows, where x is the data difference at the corresponding position:
smooth_L1(x) = 0.5 * x^2 if |x| < 1, and |x| - 0.5 otherwise;
the loss value is back-propagated to iteratively update the parameters of the entire neural network;
the above process is repeated until the network parameter updates become small and multiple further rounds of training yield no better test results, at which point training is judged to have saturated and is finished.
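The smooth L1 loss referred to here has a standard form, quadratic near zero and linear beyond |x| = 1, so large outlier differences do not dominate the gradient; a NumPy sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 on the per-position difference x between the obtained
    and the true disparity map: 0.5*x**2 when |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) < 1.0, 0.5 * x * x, np.abs(x) - 0.5)

# The network's loss value would be the mean over the whole map, e.g.:
# loss = smooth_l1(pred_disparity - true_disparity).mean()
```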
6. The method of binocular depth vision estimation based on deep learning as claimed in any one of claims 2 to 5, characterized in that: the convolutional neural network (301) includes the following training steps:
putting the two depth pictures (201) simultaneously into the residual network layer of the convolutional neural network (301) to extract the picture feature maps;
putting the feature maps into the spatial pyramid pooling layer for feature enhancement to obtain richer feature information.
7. The method of binocular depth vision estimation based on deep learning as claimed in claim 6, characterized in that: the feature fusion network layer (302) comprises the following steps:
taking the features extracted by the convolutional layers of the convolutional neural network (301) as input;
inputting them into the convolutional depth fusion layer to enrich the feature information content;
generating, in the depth information fusion layer, the information layer matched by depth information.
8. The method of binocular depth vision estimation based on deep learning as claimed in claim 7, characterized in that: the 3D convolutional neural network layer (303) comprises the following steps:
taking the output of the feature fusion network layer (302) as input;
inputting it into an Hourglass module to extract richer deep high-dimensional information;
obtaining, through an upsampling layer, a depth module of the original picture size;
its size is D*W*H, meaning D pictures of size W × H; assuming the pixel value of the i-th picture at (W_j, H_k) is A_ijk, the output on the corresponding depth map is:
D_jk = Σ_i A_ijk * i  (i = 0, 1, 2, ..., D).
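The regression formula above can be written directly over the output volume; the (D, H, W) axis order is my assumption, and the D values at each pixel are assumed already normalized to sum to one (e.g. by the softmax of claim 4):

```python
import numpy as np

def regress_depth(A):
    """Claim 8's D_jk = sum_i A_ijk * i over a (D, H, W) volume A:
    the expectation of the candidate index i weighted by A's values."""
    D = A.shape[0]
    weights = np.arange(D, dtype=A.dtype).reshape(D, 1, 1)  # i broadcast over (H, W)
    return (A * weights).sum(axis=0)
```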
9. A system of binocular depth vision estimation based on deep learning, characterized by comprising a photographing module (100), a depth generation module (200) and a neural network model (300);
the photographing module (100) consisting of the cameras fixedly installed as a binocular camera, for acquiring pictures from two different perspectives;
the depth generation module (200) generating a depth picture (201) from the acquired pictures, the relative distance to the camera corresponding to the pixel values of the picture;
and the neural network model (300) performing deep learning on the acquired pictures and saving the neural network parameters, for generating the estimated depth distance map.
10. The system of binocular depth vision estimation based on deep learning as claimed in claim 9, characterized in that: the photographing module (100) includes two groups of cameras, a color binocular camera and a grayscale binocular camera; the color binocular camera acquires the pictures used as the training input of the neural network, and the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910814513.9A CN110517306B (en) | 2019-08-30 | 2019-08-30 | Binocular depth vision estimation method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110517306A true CN110517306A (en) | 2019-11-29 |
CN110517306B CN110517306B (en) | 2023-07-28 |
Family
ID=68629476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910814513.9A Active CN110517306B (en) | 2019-08-30 | 2019-08-30 | Binocular depth vision estimation method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517306B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111179330A (en) * | 2019-12-27 | 2020-05-19 | 福建(泉州)哈工大工程技术研究院 | Binocular vision scene depth estimation method based on convolutional neural network |
CN111310916A (en) * | 2020-01-22 | 2020-06-19 | 浙江省北大信息技术高等研究院 | Depth system training method and system for distinguishing left and right eye pictures |
CN112446822A (en) * | 2021-01-29 | 2021-03-05 | 聚时科技(江苏)有限公司 | Method for generating contaminated container number picture |
CN112967332A (en) * | 2021-03-16 | 2021-06-15 | 清华大学 | Binocular depth estimation method and device based on gated imaging and computer equipment |
CN113344997A (en) * | 2021-06-11 | 2021-09-03 | 山西方天圣华数字科技有限公司 | Method and system for rapidly acquiring high-definition foreground image only containing target object |
CN113763447A (en) * | 2021-08-24 | 2021-12-07 | 北京的卢深视科技有限公司 | Method for completing depth map, electronic device and storage medium |
CN114035871A (en) * | 2021-10-28 | 2022-02-11 | 深圳市优聚显示技术有限公司 | Display method and system of 3D display screen based on artificial intelligence and computer equipment |
CN114789870A (en) * | 2022-05-20 | 2022-07-26 | 深圳市信成医疗科技有限公司 | Innovative modular drug storage management implementation mode |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190122373A1 (en) * | 2018-12-10 | 2019-04-25 | Intel Corporation | Depth and motion estimations in machine learning environments |
CN110060290A (en) * | 2019-03-14 | 2019-07-26 | 中山大学 | A kind of binocular parallax calculation method based on 3D convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
深视 (Shenshi): "Reading notes on the paper 'Pyramid Stereo Matching Network'", CSDN * |
Also Published As
Publication number | Publication date |
---|---|
CN110517306B (en) | 2023-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110517306A (en) | A kind of method and system of the binocular depth vision estimation based on deep learning | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
Lee et al. | From big to small: Multi-scale local planar guidance for monocular depth estimation | |
Quan et al. | Deep learning for seeing through window with raindrops | |
Kuznietsov et al. | Semi-supervised deep learning for monocular depth map prediction | |
CN110032278B (en) | Pose identification method, device and system for human eye interested object | |
CN113052835B (en) | Medicine box detection method and system based on three-dimensional point cloud and image data fusion | |
CN110427917B (en) | Method and device for detecting key points | |
CN104036488B (en) | Binocular vision-based human body posture and action research method | |
CN112634341B (en) | Method for constructing depth estimation model of multi-vision task cooperation | |
CN110378838B (en) | Variable-view-angle image generation method and device, storage medium and electronic equipment | |
Tian et al. | Depth estimation using a self-supervised network based on cross-layer feature fusion and the quadtree constraint | |
CN101443817B (en) | Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene | |
CN111343367B (en) | Billion-pixel virtual reality video acquisition device, system and method | |
CN106157307A (en) | A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF | |
CN111524233B (en) | Three-dimensional reconstruction method of static scene dynamic target | |
CN105740775A (en) | Three-dimensional face living body recognition method and device | |
CN105631859B (en) | Three-degree-of-freedom bionic stereo visual system | |
CN111462128A (en) | Pixel-level image segmentation system and method based on multi-modal spectral image | |
CN105184857A (en) | Scale factor determination method in monocular vision reconstruction based on dot structured optical ranging | |
CN114049434A (en) | 3D modeling method and system based on full convolution neural network | |
CN111325782A (en) | Unsupervised monocular view depth estimation method based on multi-scale unification | |
CN106295657A (en) | A kind of method extracting human height's feature during video data structure | |
CN114419568A (en) | Multi-view pedestrian detection method based on feature fusion | |
CN115375581A (en) | Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | ||
Address after: 11th Floor, Building A1, Huizhi Science and Technology Park, No. 8 Hengtai Road, Nanjing Economic and Technological Development Zone, Jiangsu Province, 211000 Patentee after: DILU TECHNOLOGY Co.,Ltd. Address before: Building C4, No.55 Liyuan South Road, moling street, Jiangning District, Nanjing City, Jiangsu Province Patentee before: DILU TECHNOLOGY Co.,Ltd. |