CN110517306A - Method and system for binocular depth vision estimation based on deep learning - Google Patents
Method and system for binocular depth vision estimation based on deep learning
- Publication number
- CN110517306A CN110517306A CN201910814513.9A CN201910814513A CN110517306A CN 110517306 A CN110517306 A CN 110517306A CN 201910814513 A CN201910814513 A CN 201910814513A CN 110517306 A CN110517306 A CN 110517306A
- Authority
- CN
- China
- Prior art keywords
- depth
- picture
- binocular
- deep learning
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and system for binocular depth vision estimation based on deep learning, comprising the following steps: collecting training data; a depth generation module generating the depth distance of the corresponding picture positions and saving it as a depth picture of the same size as the original picture, carrying depth information, in which each pixel value corresponds to a relative distance; training a neural network model; and depth estimation, yielding the estimated depth distance map. Beneficial effects of the invention: the proposed deep-learning binocular vision depth estimation method achieves high precision and strong generalization ability, supports transfer learning, and can be applied under difficult environmental conditions; in terms of speed, it greatly improves computation speed compared with the running time of traditional algorithms.
Description
Technical field
The present invention relates to the technical field of measuring depth distance with a binocular camera, and in particular to a method and system for binocular depth vision estimation based on deep learning.
Background technique
Obtaining the distance of environmental objects through depth estimation has in recent years become an important field in computer vision. Similar to the two eyes of a human, a binocular camera reconstructs the three-dimensional information of the environment and yields an estimate of how far away environmental objects are. Binocular depth estimation methods based on traditional computer vision, such as the SGM algorithm, suffer from low precision and slow speed; the algorithms also depend heavily on conditions and are not robust in complex scenes, making it difficult to meet the requirements of commercial deployment. In contrast, binocular depth estimation methods based on deep learning offer high precision, strong generalization ability, and fast speed.
Summary of the invention
The purpose of this section is to summarize some aspects of embodiments of the present invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract of the description, and in the title of the invention to avoid obscuring their purpose; such simplifications or omissions cannot be used to limit the scope of the invention.
In view of the above existing problems, the present invention is proposed.
The technical problem solved by the present invention is therefore: proposing a method of binocular depth vision estimation based on deep learning that obtains the distance of environmental objects more accurately.
In order to solve the above technical problem, the invention provides the following technical scheme: a method of binocular depth vision estimation based on deep learning, comprising the following steps. Collecting training data: a photographing module obtains initial pictures from two different perspectives. Depth generation: a depth generation module generates the depth distance of the corresponding picture positions and saves it as a depth picture of the same size as the original picture, carrying depth information, each pixel value corresponding to a relative distance. Neural network model training: the depth pictures are input into the neural network model for training; through repeated training, trained neural network parameters are obtained and saved. Depth estimation: the photographing module collects actual pictures, which are input into the trained neural network model for calculation, yielding the estimated depth distance map.
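The four steps above (acquire pictures, generate depth labels, train, estimate) can be sketched as a minimal pipeline. Every function body below is a placeholder stand-in, not the patented network; the sketch only fixes the data flow and the key invariant that the depth label and the estimated depth map share the input picture size:

```python
import numpy as np

def acquire_pair(h=375, w=1242):
    """Step 1: stand-in for grabbing a left/right picture pair from the binocular camera."""
    rng = np.random.default_rng(0)
    return rng.random((h, w, 3)), rng.random((h, w, 3))

def generate_depth_label(left, right):
    """Step 2: stand-in for the depth generation module; returns a depth picture
    the same size as the input, each pixel value a relative distance."""
    return np.full(left.shape[:2], 10.0)  # dummy constant depth

def train(model_params, images, depth_label):
    """Step 3: one stand-in iteration of supervised training (a fake parameter update)."""
    return model_params - 0.01

def estimate(model_params, left, right):
    """Step 4: stand-in for running the trained network on a new pair."""
    return np.zeros(left.shape[:2])

left, right = acquire_pair()
label = generate_depth_label(left, right)
params = train(np.float64(1.0), (left, right), label)
depth = estimate(params, left, right)
print(depth.shape)  # (375, 1242), same size as the input pictures
```

The 1242×375 size matches the KITTI picture size cited later in the embodiment; any real implementation would replace each placeholder with the corresponding module.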
In a preferred embodiment of the method of binocular depth vision estimation based on deep learning according to the present invention: the neural network model comprises a convolutional neural network, a feature fusion network layer, and a 3D convolutional neural network layer, and training comprises the following steps: the pictures obtained by the photographing module serve as input; the convolutional neural network produces the feature maps of the two pictures; the output of the convolutional neural network layer serves as input to the feature fusion network layer, which extracts fused features; these are fed into the 3D convolutional neural network layer, which extracts the depth map.
In a preferred embodiment of the method according to the present invention: the collected depth pictures are fed into the neural network model, and picture features are extracted by the convolutional neural network; these are input to the feature fusion network layer for feature fusion, which matches correlated features and fuses them into a 3D feature map.
In a preferred embodiment of the method according to the present invention: the model training further comprises the following steps. A 3D convolution with kernel size 3 × 3 × 3 is performed on the 3D feature map, yielding features that fuse position and depth on the 3D feature map. Because the feature map is 1/4 of the original size, the picture is upsampled to the original size to obtain a depth picture consistent with the picture size. Each pixel in the picture corresponds to a group of D = 48 depth signals, and this group of signals is normalized with the softmax function. With V the depth signal and S the normalized depth signal:

S_i = exp(V_i) / Σ_j exp(V_j)

The normalized signals are multiplied by the corresponding signal weights to obtain the depth disparity information of the corresponding position.
In a preferred embodiment of the method according to the present invention, the method further comprises a network update stage: the obtained disparity map is compared with the true disparity map, i.e., the depth map collected by the depth camera, and the smooth L1 loss function is used to obtain the loss value of the network. With x the data difference at the corresponding position, the loss function is:

loss(x) = 0.5 x²  if |x| < 1;  |x| − 0.5  otherwise

The loss value is back-propagated to update the parameters of the entire neural network iteratively. The above process is repeated until the parameter updates become small and multiple further rounds of training no longer yield better test results, i.e., training is judged to have saturated and is finished.
In a preferred embodiment of the method according to the present invention: the convolutional neural network comprises the following training steps: the two depth pictures are fed simultaneously into the residual network layers of the convolutional neural network to extract picture feature maps; the feature maps are fed into a spatial pyramid pooling layer for feature enhancement, obtaining richer feature information.
In a preferred embodiment of the method according to the present invention: the feature fusion network layer comprises the following steps: the features extracted by the convolutional layers of the convolutional neural network serve as input; they are fed into the convolutional depth fusion layer to enrich the feature information content; the depth information fusion layer then generates an information layer in which depth information is matched.
In a preferred embodiment of the method according to the present invention: the 3D convolutional neural network layer comprises the following steps: the output of the feature fusion network layer serves as input; it is fed into an Hourglass module to extract richer deep high-dimensional information; through an upsampling layer, a depth module of the original picture size is obtained. Its size is D*W*H, meaning there are D maps of size W*H. Assuming the pixel value of the i-th map at position (W_j, H_k) is A_ijk, the output on the corresponding depth map is:

D_jk = Σ_i A_ijk · i,  i = 0, 1, 2, …, D−1
Another technical problem solved by the present invention: proposing a system of binocular depth vision estimation based on deep learning, on which the above method can rely for its realization.
In order to solve the above technical problem, the invention provides the following technical scheme: the system comprises a photographing module, a depth generation module, and a neural network model. The photographing module consists of the cameras fixedly installed on a binocular camera, used to collect pictures from two different perspectives. The depth generation module generates a depth picture from the collected pictures, the relative distance from the camera corresponding to the pixel values of the depth picture. The neural network model performs deep learning using the obtained pictures and saves the neural network parameters, used to generate the estimated depth distance map.
In a preferred embodiment of the system of binocular depth vision estimation based on deep learning according to the present invention: the photographing module comprises two groups of cameras, a color binocular camera and a grayscale binocular camera. The color binocular camera collects pictures used as the input of the neural network for training; the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
Beneficial effects of the present invention: the proposed deep-learning binocular vision depth estimation method achieves high precision and strong generalization ability, supports transfer learning, and can be applied under difficult environmental conditions; in terms of speed, it greatly improves computation speed compared with the running time of traditional algorithms.
Detailed description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without any creative labor. In the drawings:
Fig. 1 is an overall flow diagram of the method of binocular depth vision estimation based on deep learning described in the first embodiment of the invention;
Fig. 2 is an architecture diagram of the binocular depth estimation described in the first embodiment of the invention;
Fig. 3 is a structural diagram of the convolutional neural network module layer described in the first embodiment of the invention;
Fig. 4 is a structural diagram of the feature fusion network layer described in the first embodiment of the invention;
Fig. 5 is a structural diagram of the 3D convolutional neural network layer described in the first embodiment of the invention;
Fig. 6 is a structural diagram of the system of binocular depth vision estimation based on deep learning described in the second embodiment of the invention.
Specific embodiments
In order to make the above objects, features, and advantages of the present invention clearer and more comprehensible, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by a person of ordinary skill in the art without creative labor shall fall within the scope of protection of the invention.
In the following description, numerous specific details are set forth to facilitate a full understanding of the invention, but the invention can also be implemented in ways other than those described here; those skilled in the art can make similar generalizations without departing from the intent of the invention, so the invention is not limited by the specific embodiments disclosed below.
Secondly, "one embodiment" or "an embodiment" referred to herein means a particular feature, structure, or characteristic that may be included in at least one implementation of the invention. "In one embodiment" appearing in different places in this specification does not always refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive with other embodiments.
The present invention is described in detail with reference to schematic diagrams. When describing the embodiments, for convenience of explanation, sectional views showing the device structure may be partially enlarged out of general proportion; the diagrams are examples and should not limit the scope of protection of the invention here. In addition, the three dimensions of length, width, and depth should be included in actual fabrication.
In the description of the invention, it should also be noted that orientations or positional relationships indicated by terms such as "upper, lower, inner, outer" are based on the orientations or positional relationships shown in the drawings; they merely facilitate and simplify the description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a specific orientation, so they are not to be construed as limiting the invention. In addition, the terms "first, second, or third" are used for description purposes only and cannot be understood as indicating or implying relative importance.
In the present invention, unless otherwise clearly specified and limited, the terms "installed, connected, coupled" shall be understood broadly; for example, a connection may be fixed, detachable, or integral; it may likewise be mechanical, electrical, or direct, or indirect through an intermediary, or an internal connection between two elements. For a person of ordinary skill in the art, the specific meanings of the above terms in the invention can be understood according to the specific situation.
Embodiment 1
Referring to Figs. 1-5: obtaining the distance of environmental objects through depth estimation is an important field in computer vision. Similar to the two eyes of a human, a binocular camera reconstructs the three-dimensional information of the environment and yields an estimate of the distance of environmental objects.
By calculating the disparity between the two images, the scenery in front (within the range captured by the image) can be measured directly, without judging what kind of obstacle appears ahead. Thus, for any kind of obstacle, necessary early warning or braking can be carried out according to changes in the distance information. The principle of a binocular camera is similar to the human eye. The human eye can perceive the distance of an object because the images the two eyes present of the same object differ, a difference also called "parallax". The farther the object, the smaller the parallax; conversely, the larger the parallax. The size of the parallax corresponds to the distance between the object and the eyes, which is also why 3D films can give people a sense of stereoscopic depth.
Binocular depth estimation methods based on traditional computer vision, such as the SGM algorithm, suffer from low precision and slow speed; the algorithms depend heavily on conditions and are not robust in complex scenes, making it difficult to meet the requirements of commercial deployment. Therefore this embodiment proposes a binocular depth estimation method based on deep learning, which offers high precision, strong generalization ability, and fast speed.
Specifically, the method comprises the following steps.
S1: collecting training data; the photographing module 100 obtains initial pictures 101 from two different perspectives.
S2: the depth generation module 200 generates the depth distance of the corresponding picture positions and saves it as a depth picture 201 of the same size as the original picture, carrying depth information; each pixel value corresponds to a relative distance.
S3: training the neural network model 300; the depth pictures 201 are input into the neural network model 300 for training, and through repeated training the trained neural network parameters are obtained and saved. The neural network model 300 comprises a convolutional neural network 301, a feature fusion network layer 302, and a 3D convolutional neural network layer 303, and training comprises the following steps: the pictures obtained by the photographing module 100 serve as input; the convolutional neural network 301 produces the feature maps of the two pictures; the output of the convolutional neural network layer 301 serves as input to the feature fusion network layer 302, which extracts fused features; these are fed into the 3D convolutional neural network layer 303, which extracts the depth map.
More specifically:
The collected depth pictures 201 are fed into the neural network model 300, and picture features are extracted by the convolutional neural network 301; these are input to the feature fusion network layer 302 for feature fusion, which matches correlated features and fuses them into a 3D feature map.
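The fusion step, matching correlated left/right features into a 3D feature volume, can be sketched with NumPy by concatenating the left features with the right features shifted over candidate disparities. The patent does not spell out the fusion layer's internals, so this particular cost-volume construction (common in binocular networks) is an assumption used purely for illustration:

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """Fuse left/right feature maps of shape (C, H, W) into a 4D volume of
    shape (2C, max_disp, H, W): at disparity d the right features are shifted
    d pixels so that potentially matching features line up along the channel axis."""
    c, h, w = feat_l.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = feat_l[:, :, d:]        # left features, unshifted
        vol[c:, d, :, d:] = feat_r[:, :, : w - d]   # right features, shifted by d
    return vol

rng = np.random.default_rng(1)
fl = rng.random((8, 16, 32))   # toy left feature map (C=8, H=16, W=32)
fr = rng.random((8, 16, 32))   # toy right feature map
vol = build_cost_volume(fl, fr, max_disp=12)
print(vol.shape)  # (16, 12, 16, 32)
```

At disparity 0 the volume is just the two feature maps stacked; increasing disparity slides the right map across the left, which is what lets the later 3D convolutions compare features at every candidate depth.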
A 3D convolution with kernel size 3 × 3 × 3 is then performed on the 3D feature map, yielding features that fuse position and depth on the 3D feature map. The feature map (here meaning the output of the 3D convolution) is 1/4 of the original size (the original size being the full picture size obtained by the camera; the 1/4 compression reduces the amount of calculation, without which the network would struggle to produce a result in under 1 s). The picture is therefore upsampled to the original size (e.g., by bilinear interpolation) to obtain a depth picture consistent with the picture size. Each pixel in the picture corresponds to a group of D = 48 depth signals, and this group of signals is normalized with the softmax function. With V the depth signal and S the normalized depth signal:

S_i = exp(V_i) / Σ_j exp(V_j)

The normalized signals are multiplied by the corresponding signal weights to obtain the depth disparity information of the corresponding position; this directly yields the depth information, with no further post-processing needed.
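Taking the normalization above to be the usual softmax over the D = 48 depth signals (an assumption consistent with the weighted sum that follows it), the per-pixel regression can be sketched as:

```python
import numpy as np

def normalize(v):
    """Softmax over the D depth signals of one pixel: S_i = exp(v_i) / sum_j exp(v_j)."""
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()

def regress_depth(v, weights=None):
    """Multiply the normalized signals by their signal weights (depth indices
    0..D-1 by default) and sum, giving the expected depth for the pixel."""
    s = normalize(v)
    if weights is None:
        weights = np.arange(len(v), dtype=float)
    return float((s * weights).sum())

v = np.zeros(48)   # D = 48 depth signals for one pixel
v[10] = 50.0       # one strongly activated depth bin
print(round(regress_depth(v), 3))  # 10.0, the index of the dominant bin
```

Because the output is a probability-weighted average rather than an argmax, it is differentiable, which is what lets the loss in the network update stage be back-propagated through this step.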
The obtained disparity map is compared with the true disparity map, i.e., the depth map collected by the depth camera, and the smooth L1 loss function is used to obtain the loss value of the network. With x the data difference at the corresponding position, the loss function is:

loss(x) = 0.5 x²  if |x| < 1;  |x| − 0.5  otherwise

The loss value is back-propagated to update the parameters of the entire neural network iteratively.
The above process is repeated until the network parameter updates become small (the final depth information map changes little between trainings; for example, if the depth of some corresponding point is 100 and over many subsequent trainings it stays around 100, the network is no longer learning) and multiple further rounds of training no longer yield better test results, i.e., training is judged to have saturated and is finished.
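The smooth L1 loss named above has a standard piecewise form (quadratic near zero, linear for large errors, which makes it robust to outlier pixels); a small sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss: 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise.
    x is the per-position difference between predicted and true disparity."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

# toy disparity differences at five positions of the map
diff = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print(smooth_l1(diff).tolist())           # [1.5, 0.125, 0.0, 0.125, 2.5]
print(float(smooth_l1(diff).mean()))      # mean loss over the map: 0.85
```

The mean over the map is the scalar loss value whose gradient is back-propagated to update the network parameters.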
S4: depth estimation; the photographing module 100 collects actual pictures, which are input into the trained neural network model 300 for calculation, yielding the estimated depth distance map.
This embodiment comprises, in order, the training steps of the convolutional neural network 301, the feature fusion network layer 302, and the 3D convolutional neural network layer 303. Specifically:
The convolutional neural network 301 comprises the following training steps: the two depth pictures 201 are fed simultaneously into the residual network layers of the convolutional neural network 301 to extract picture feature maps; the feature maps are fed into a spatial pyramid pooling layer for feature enhancement, obtaining richer feature information.
The feature fusion network layer 302 comprises the following steps: the features extracted by the convolutional layers of the convolutional neural network 301 serve as input; they are fed into the convolutional depth fusion layer to enrich the feature information content; the depth information fusion layer then generates an information layer in which depth information is matched.
The 3D convolutional neural network layer 303 comprises the following steps: the output of the feature fusion network layer 302 serves as input; it is fed into an Hourglass module to extract richer deep high-dimensional information; through an upsampling layer, a depth module of the original picture size is obtained. Its size is D*W*H, meaning there are D maps of size W*H. Assuming the pixel value of the i-th map at position (W_j, H_k) is A_ijk, the output on the corresponding depth map is:

D_jk = Σ_i A_ijk · i,  i = 0, 1, 2, …, D−1
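The output relation D_jk = Σ_i A_ijk · i is a weighted sum over the D maps at every pixel; with the shapes taken from the D*W*H description above, it is a single tensordot in NumPy:

```python
import numpy as np

def depth_from_volume(a):
    """a has shape (D, W, H): D maps of size W*H. The output map is
    D_jk = sum_i a[i, j, k] * i, the expected depth index at each pixel."""
    d = a.shape[0]
    idx = np.arange(d, dtype=float)           # the weights i = 0 .. D-1
    return np.tensordot(idx, a, axes=(0, 0))  # contracts the D axis -> (W, H)

# toy volume with D = 48: every pixel's mass sits entirely on map 5
a = np.zeros((48, 4, 6))
a[5] = 1.0
out = depth_from_volume(a)
print(out.shape, float(out[0, 0]))  # (4, 6) 5.0
```

With a one-hot volume the expected index is exactly the active map's index; with a softmax-normalized volume it becomes a sub-integer estimate, which is how the network achieves sub-pixel disparity precision.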
It should be noted that the present application provides a method of binocular depth vision estimation based on deep learning. In practice, pictures from two different perspectives are obtained by the two cameras at fixed positions on the binocular camera; the two pictures are put simultaneously into the residual network of the convolutional neural network to extract picture feature maps, which are then put into the spatial pyramid pooling layer for feature enhancement. The two correlated picture features are then fused. The purpose of feature fusion: the feature maps produced from pictures of two different perspectives are correlated, because the only difference between the two pictures is the viewing angle, so the feature maps contain many identical or similar matching features; these features are fused together for subsequent depth extraction.
Using the fusion-layer neural network, supervised learning converges on suitable feature matching. Unlike traditional binocular ranging algorithms, the feature matching here is learned by the neural network itself: poor feature matching, for example, leads to poor results, and according to the quality of the results, backward supervision adjusts the neural network toward better feature matching.
Finally, only the effective distance range of the camera needs to be given, for example a maximum of 200 meters; the neural network then provides depth distance information within 0-200 meters on the picture. Beyond the effective distance range the confidence is not high, so everything is set according to the 200-meter limit.
This embodiment proposes a binocular vision depth estimation method based on deep learning with high precision: on the KITTI data set, an average error pixel percentage of 0.83% can be achieved, versus an error rate of 3.57% for the traditional binocular depth estimation method. Moreover, the method of this patent for estimating binocular depth generalizes strongly and supports transfer learning. The meaning of transfer learning here: the same neural network architecture can be applied to different types of binocular cameras without completely retraining the entire neural network; it is only necessary to set the parameters of the binocular camera on the original basis and update the output parameters of the trained part of the neural network layers, which reduces development time and difficulty. The method also applies under difficult environmental conditions. In terms of speed, on a picture of size 1242*375 the average computation time is 0.32 seconds, compared with 3.7 seconds for the traditional algorithm, greatly improving computation speed and substantially meeting the requirements of commercial deployment. It solves the practical problems of common traditional binocular depth estimation schemes and has great application prospects in related fields such as automatic driving and indoor positioning.
Scenario 1:
In this embodiment, a test vehicle deploying this method is compared against a vehicle deploying conventional methods, and a simulation of this method and the conventional methods is implemented in Python and tested on the KITTI data set; simulation data are obtained from the experimental results. The conventional methods used in the experiment are the SGM and SDM algorithms. The test environment is the binocular depth test set of the public KITTI data set, with both the conventional methods and this method run as Python simulations.
Performance comparisons are carried out for each algorithm: the running speed of each algorithm is tested and its error values are calculated, with the estimation errors averaged over the KITTI test set. The comparison test uses the two conventional methods above and this method; the test results are shown in Table 1.
Table 1: test results.

| Algorithm | Speed | Average error pixel ratio | Mean disparity error |
| --- | --- | --- | --- |
| This patent | 0.32 s | 1.32% | 0.5 px |
| SGM | 3.7 s | 5.76% | 1.3 px |
| SDM | ~1 min | 10.95% | 2.0 px |
From the data in Table 1 it can be observed that the algorithm of this implementation is significantly better than the traditional algorithms in both speed and precision.
Embodiment 2
Referring to Fig. 6, this embodiment proposes a system of binocular depth vision estimation based on deep learning; the method of the above embodiment can rely on this system for its implementation, and the above method or system can be applied to depth vision estimation for vehicles. For example, a binocular camera installed on the vehicle body shoots the image information around the vehicle, the deep learning network algorithm is written into the on-board host, the shot images are input into the algorithm module to compute an estimate of the distance of objects in the environment, and the result is shown on the display screen of the on-board host, thereby alerting the driver for safe driving.
Specifically, the system includes a photographing module 100, a depth generation module 200 and a neural network model 300. The photographing module 100 consists of the cameras fixedly installed as a binocular camera and acquires pictures from two different perspectives. The depth generation module 200 generates a depth picture 201 from the acquired pictures, in which the pixel values correspond to the relative distance from the camera. The neural network model 300 performs deep learning on the acquired pictures and saves the neural network parameters, for generating the estimated depth distance map. The photographing module 100 includes two groups of cameras, a color binocular camera and a grayscale binocular camera: the color binocular camera acquires the pictures used as the training input of the neural network, while the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
It should be noted that the two corresponding stereo grayscale cameras can generate the depth distance of the corresponding picture positions: matching points are found between the two binocular pictures, and the depth information is computed from the pixel difference between the matched points as depth = f*b/d, where f is the focal length of the camera, b is the distance between the two binocular cameras (the baseline), and d is the pixel difference (disparity) of the two matched points. The result is saved as a picture of the same size as the original carrying the depth information; generally only the depth information map corresponding to the left view needs to be saved. The picture's pixel values correspond to relative distance: a grayscale picture is made up of values from 0 to 255, e.g. white is 255 and black is 1, so if the effective range of the camera is 1 to 200 meters, the depth information can be expressed with the pixel values of the picture, i.e. a value of 155 indicates that this point of the picture is 155 meters from the camera.
In the untrained state, the network does not know how to perform matched feature fusion; through continuous training, the network can arrive at a suitable feature matching method that associates related features. Here "related" means, for example, that a car seen in the left picture and the same car seen in the right picture carry related car information on the feature maps; the role of the feature fusion layer is precisely to match up this related car information and fuse it into a 3D feature map.
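One common way to realize such a fusion layer is a disparity-indexed cost volume (as in PSMNet-style networks; the shapes below are made up for illustration): each left-image feature is paired with the right-image feature shifted by every candidate disparity, so features of the same object end up stacked at the slice of their true disparity.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Fuse (C, H, W) left/right feature maps into a (2C, max_disp, H, W)
    volume: slice d holds each left feature next to the right feature
    shifted d pixels, so matching features line up at the true disparity."""
    C, H, W = left_feat.shape
    volume = np.zeros((2 * C, max_disp, H, W), dtype=left_feat.dtype)
    for d in range(max_disp):
        volume[:C, d, :, d:] = left_feat[:, :, d:]          # left features in place
        volume[C:, d, :, d:] = right_feat[:, :, :W - d]     # right features shifted by d
    return volume
```

The 3D convolutions of the later stage then operate over this disparity axis to decide which slice matches best at each pixel.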
In this embodiment the depth generation module 200 may be the computing module inside a stereo grayscale camera, for example a depth camera running RGBD-SLAM, whose detection specifications include range, accuracy, angle and frame rate; the module has low power consumption, the depth information is obtained by a pure software algorithm, and the processing chip has very high computing performance, so the depth information of surrounding objects can be obtained. In this embodiment it serves only for the acquisition of training data; moreover, since such a processing chip has the drawbacks of requiring very high computing performance while running slowly, it is also possible to shoot images with ordinary cameras and then run a binocular imaging algorithm directly on a computer to acquire the depth information. After the training of the neural network model 300 is completed, only ordinary cameras need to be installed on the vehicle body and their pictures input into the neural network model 300 to obtain the depth information, which requires less performance and computes fast. The neural network model 300 is written into a deep learning algorithm chip set in the on-board host, for example a mainstream GPU deep learning chip, which is essentially one huge computing matrix: a GPU has thousands of computing cores, can achieve 10-100 times the application throughput, and supports the computing capability vital to deep learning, running much faster than conventional processors and greatly accelerating the training process. The GPU is currently one of the most widely used deep learning computing units.
It should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention, and all such modifications shall be covered by the scope of the claims of the present invention.
Claims (10)
1. A method of binocular depth vision estimation based on deep learning, characterized by comprising the following steps:
collecting training data, wherein a photographing module (100) obtains initial pictures (101) from two different perspectives;
a depth generation module (200) generating the depth distance of the corresponding picture positions and saving it as a depth picture (201) of the same size as the original picture, the picture's pixel values corresponding to relative distance;
neural network model (300) training, wherein the depth picture (201) is input into the neural network model (300) for training, and trained neural network parameters are obtained and saved through repeated training;
depth estimation, wherein the photographing module (100) collects actual pictures, which are input into the trained neural network model (300) for computation to obtain the estimated depth distance map.
2. The method of binocular depth vision estimation based on deep learning as claimed in claim 1, characterized in that: the neural network model (300) comprises a convolutional neural network (301), a feature fusion network layer (302) and a 3D convolutional neural network layer (303), and includes the following training steps:
taking the pictures obtained by the photographing module (100) as input;
obtaining the feature maps of the two pictures through the convolutional neural network (301);
feeding the output of the convolutional neural network (301) into the feature fusion network layer (302) as input to extract fused features;
putting the result into the 3D convolutional neural network layer (303) to extract the depth map.
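The steps above amount to a simple forward pipeline; a minimal sketch with the three stages passed in as hypothetical callables standing in for units 301, 302 and 303, whose internals are specified in the later claims:

```python
def estimate_depth(left_img, right_img, cnn, fuse, cnn3d):
    """Forward pass of claim 2: a shared 2D CNN on both views,
    feature fusion, then a 3D CNN that extracts the depth map."""
    feat_left = cnn(left_img)            # feature map of the left picture
    feat_right = cnn(right_img)          # same network weights for the right picture
    fused = fuse(feat_left, feat_right)  # feature fusion layer (302)
    return cnn3d(fused)                  # 3D CNN layer (303) extracts the depth map
```

With toy callables, e.g. `estimate_depth(1.0, 2.0, lambda x: x, lambda a, b: a + b, lambda v: v)`, the pipeline simply returns `3.0`, which makes the data flow easy to check before plugging in real network modules.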
3. The method of binocular depth vision estimation based on deep learning as claimed in claim 2, characterized in that: the acquired depth pictures (201) are put into the neural network model (300), and picture features are extracted by the convolutional neural network (301); the features are input into the feature fusion network layer (302) for feature fusion, where related features are matched and fused to generate a 3D feature map.
4. The method of binocular depth vision estimation based on deep learning as claimed in claim 3, characterized in that: the model training further comprises the following steps:
performing a 3D convolution on the 3D feature map, with a convolution kernel of size 3 × 3 × 3, to obtain features on the 3D feature map that fuse position and depth;
since the feature map is 1/4 of the original size, upsampling the picture to the original size to obtain a depth picture consistent with the picture size, in which each pixel corresponds to a group of D = 48 depth signals;
normalizing this group of signals with the following function, where V is the corresponding depth signal and S is the depth signal after normalization:
S_i = exp(V_i) / Σ_{j=1}^{D} exp(V_j)
multiplying the obtained normalized signal by the corresponding signal weight to obtain the depth disparity information of the corresponding position.
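The normalization of one pixel's D = 48 depth signals followed by the weighted average can be sketched as below; treating the normalization as a softmax and the signal weights as the candidate indices 0..D-1 are assumptions consistent with the regression formula of claim 8, not statements of the patent's exact choice:

```python
import numpy as np

def soft_disparity(signals):
    """Softmax-normalize one pixel's D depth signals V into S,
    then take the average of candidate indices 0..D-1 weighted by S."""
    v = np.asarray(signals, dtype=np.float64)
    s = np.exp(v - v.max())   # subtract the max for numerical stability
    s /= s.sum()              # normalized signal S, sums to 1
    return float(np.dot(s, np.arange(v.size)))
```

Because the result is an expectation rather than an argmax, it is differentiable and can produce sub-pixel disparity values.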
5. The method of binocular depth vision estimation based on deep learning as claimed in claim 4, characterized by further comprising a network update stage:
that is, the obtained disparity map is compared with the true disparity map collected by the depth camera, and the loss value of the network is obtained with the smooth L1 loss function; the loss function formula is as follows, where x is the data difference at the corresponding position:
smooth_L1(x) = 0.5 * x^2 if |x| < 1, and |x| - 0.5 otherwise;
the loss value is back-propagated to iteratively update the parameters of the entire neural network;
the above process is repeated until the network parameter updates become small and multiple further rounds of training yield no better test results, at which point training is judged to have saturated and is finished.
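The smooth L1 loss referred to here has a standard form, quadratic near zero and linear beyond |x| = 1, so large outlier differences do not dominate the gradient; a NumPy sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 on the per-position difference x between the obtained
    and the true disparity map: 0.5*x**2 when |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) < 1.0, 0.5 * x * x, np.abs(x) - 0.5)

# The network's loss value would be the mean over the whole map, e.g.:
# loss = smooth_l1(pred_disparity - true_disparity).mean()
```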
6. The method of binocular depth vision estimation based on deep learning as claimed in any one of claims 2 to 5, characterized in that: the convolutional neural network (301) includes the following training steps:
putting the two depth pictures (201) simultaneously into the residual network layer of the convolutional neural network (301) to extract the picture feature maps;
putting the feature maps into the spatial pyramid pooling layer for feature enhancement to obtain richer feature information.
7. The method of binocular depth vision estimation based on deep learning as claimed in claim 6, characterized in that: the feature fusion network layer (302) comprises the following steps:
taking the features extracted by the convolutional layers of the convolutional neural network (301) as input;
inputting them into the convolutional depth fusion layer to enrich the feature information content;
generating, in the depth information fusion layer, the information layer matched by depth information.
8. The method of binocular depth vision estimation based on deep learning as claimed in claim 7, characterized in that: the 3D convolutional neural network layer (303) comprises the following steps:
taking the output of the feature fusion network layer (302) as input;
inputting it into an Hourglass module to extract richer deep high-dimensional information;
obtaining, through an upsampling layer, a depth module of the original picture size;
its size is D*W*H, meaning D pictures of size W × H; assuming the pixel value of the i-th picture at (W_j, H_k) is A_ijk, the output on the corresponding depth map is:
D_jk = Σ_i A_ijk * i  (i = 0, 1, 2, ..., D).
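The regression formula above can be written directly over the output volume; the (D, H, W) axis order is my assumption, and the D values at each pixel are assumed already normalized to sum to one (e.g. by the softmax of claim 4):

```python
import numpy as np

def regress_depth(A):
    """Claim 8's D_jk = sum_i A_ijk * i over a (D, H, W) volume A:
    the expectation of the candidate index i weighted by A's values."""
    D = A.shape[0]
    weights = np.arange(D, dtype=A.dtype).reshape(D, 1, 1)  # i broadcast over (H, W)
    return (A * weights).sum(axis=0)
```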
9. A system of binocular depth vision estimation based on deep learning, characterized by comprising a photographing module (100), a depth generation module (200) and a neural network model (300);
the photographing module (100) consisting of the cameras fixedly installed as a binocular camera, for acquiring pictures from two different perspectives;
the depth generation module (200) generating a depth picture (201) from the acquired pictures, the relative distance to the camera corresponding to the pixel values of the picture;
and the neural network model (300) performing deep learning on the acquired pictures and saving the neural network parameters, for generating the estimated depth distance map.
10. The system of binocular depth vision estimation based on deep learning as claimed in claim 9, characterized in that: the photographing module (100) includes two groups of cameras, a color binocular camera and a grayscale binocular camera; the color binocular camera acquires the pictures used as the training input of the neural network, and the grayscale binocular camera, having better contrast and resolution, is used to generate the depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910814513.9A CN110517306B (en) | 2019-08-30 | 2019-08-30 | Binocular depth vision estimation method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110517306A true CN110517306A (en) | 2019-11-29 |
CN110517306B CN110517306B (en) | 2023-07-28 |
Family
ID=68629476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910814513.9A Active CN110517306B (en) | 2019-08-30 | 2019-08-30 | Binocular depth vision estimation method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517306B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111179330A (en) * | 2019-12-27 | 2020-05-19 | 福建(泉州)哈工大工程技术研究院 | Binocular vision scene depth estimation method based on convolutional neural network |
CN111310916A (en) * | 2020-01-22 | 2020-06-19 | 浙江省北大信息技术高等研究院 | Depth system training method and system for distinguishing left and right eye pictures |
CN112446822A (en) * | 2021-01-29 | 2021-03-05 | 聚时科技(江苏)有限公司 | Method for generating contaminated container number picture |
CN112967332A (en) * | 2021-03-16 | 2021-06-15 | 清华大学 | Binocular depth estimation method and device based on gated imaging and computer equipment |
CN113344997A (en) * | 2021-06-11 | 2021-09-03 | 山西方天圣华数字科技有限公司 | Method and system for rapidly acquiring high-definition foreground image only containing target object |
CN113763447A (en) * | 2021-08-24 | 2021-12-07 | 北京的卢深视科技有限公司 | Method for completing depth map, electronic device and storage medium |
CN114035871A (en) * | 2021-10-28 | 2022-02-11 | 深圳市优聚显示技术有限公司 | Display method and system of 3D display screen based on artificial intelligence and computer equipment |
CN114789870A (en) * | 2022-05-20 | 2022-07-26 | 深圳市信成医疗科技有限公司 | Innovative modular drug storage management implementation mode |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190122373A1 (en) * | 2018-12-10 | 2019-04-25 | Intel Corporation | Depth and motion estimations in machine learning environments |
CN110060290A (en) * | 2019-03-14 | 2019-07-26 | 中山大学 | A kind of binocular parallax calculation method based on 3D convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
深视 (Shenshi): "Reading notes on the paper 'Pyramid Stereo Matching Network'", CSDN * |
Also Published As
Publication number | Publication date |
---|---|
CN110517306B (en) | 2023-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110517306A (en) | A kind of method and system of the binocular depth vision estimation based on deep learning | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
Lee et al. | From big to small: Multi-scale local planar guidance for monocular depth estimation | |
Quan et al. | Deep learning for seeing through window with raindrops | |
Kuznietsov et al. | Semi-supervised deep learning for monocular depth map prediction | |
CN110032278B (en) | Pose identification method, device and system for human eye interested object | |
CN113052835B (en) | Medicine box detection method and system based on three-dimensional point cloud and image data fusion | |
CN110427917B (en) | Method and device for detecting key points | |
CN104036488B (en) | Binocular vision-based human body posture and action research method | |
CN112634341B (en) | Method for constructing depth estimation model of multi-vision task cooperation | |
CN110378838B (en) | Variable-view-angle image generation method and device, storage medium and electronic equipment | |
Tian et al. | Depth estimation using a self-supervised network based on cross-layer feature fusion and the quadtree constraint | |
CN101443817B (en) | Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene | |
CN111343367B (en) | Billion-pixel virtual reality video acquisition device, system and method | |
CN106157307A (en) | A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF | |
CN111524233B (en) | Three-dimensional reconstruction method of static scene dynamic target | |
CN105740775A (en) | Three-dimensional face living body recognition method and device | |
CN105631859B (en) | Three-degree-of-freedom bionic stereo visual system | |
CN111462128A (en) | Pixel-level image segmentation system and method based on multi-modal spectral image | |
CN105184857A (en) | Scale factor determination method in monocular vision reconstruction based on dot structured optical ranging | |
CN114049434A (en) | 3D modeling method and system based on full convolution neural network | |
CN111325782A (en) | Unsupervised monocular view depth estimation method based on multi-scale unification | |
CN106295657A (en) | A kind of method extracting human height's feature during video data structure | |
CN114419568A (en) | Multi-view pedestrian detection method based on feature fusion | |
CN115375581A (en) | Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | ||
Address after: 11th Floor, Building A1, Huizhi Science and Technology Park, No. 8 Hengtai Road, Nanjing Economic and Technological Development Zone, Jiangsu Province, 211000 Patentee after: DILU TECHNOLOGY Co.,Ltd. Address before: Building C4, No.55 Liyuan South Road, moling street, Jiangning District, Nanjing City, Jiangsu Province Patentee before: DILU TECHNOLOGY Co.,Ltd. |