CN107945265B - Real-time dense monocular SLAM method and system based on an online-learning depth prediction network - Google Patents
Real-time dense monocular SLAM method and system based on an online-learning depth prediction network
- Publication number: CN107945265B
- Application number: CN201711227295.6A
- Authority: CN (China)
- Prior art keywords: depth, picture, training, CNN, prediction
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T17/00 - Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/045 - Combinations of networks (under G06N3/00 computing arrangements based on biological models; G06N3/02 neural networks; G06N3/04 architecture, e.g. interconnection topology)
- G06T7/70 - Determining position or orientation of objects or cameras (under G06T7/00 image analysis)
- G06T2200/08 - Indexing scheme for image data processing or generation involving all processing steps from image acquisition to 3D model generation
Abstract
The invention discloses a real-time dense monocular SLAM method based on an online-learning depth prediction network: the camera pose of each keyframe is obtained by minimizing the photometric error of high-gradient points, and a semi-dense map of the current frame is obtained by predicting the depth of the high-gradient points through triangulation; online training picture pairs are selected, the CNN model is updated by online training with a block-wise stochastic gradient descent method, and the trained CNN model performs depth prediction on the current frame to obtain a dense map; depth-scale regression is carried out between the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the current frame's depth information; an NCC-score voting method then selects, from the two projection results, a depth prediction for each pixel of the current frame to form a preliminary depth map, and Gaussian fusion of the preliminary depth map yields the final depth map. The invention also provides a corresponding real-time dense monocular SLAM system based on an online-learning depth prediction network.
Description
Technical field
The invention belongs to the field of computerized 3D visual reconstruction, and more particularly relates to a real-time dense monocular SLAM method and system based on an online-learning depth prediction network.
Background art
Simultaneous Localization And Mapping (SLAM) can estimate the pose of a sensor in real time and reconstruct a 3D map of the surrounding environment, and therefore plays an important role in fields such as UAV obstacle avoidance and augmented reality. A SLAM system that relies solely on a single camera as its input sensor is called a monocular SLAM system. Monocular SLAM has low power consumption, a low hardware threshold and simple operation, and is widely used by researchers. However, the popular existing monocular SLAM systems, whether feature-based, such as PTAM (Parallel Tracking and Mapping for Small AR Workspaces) and ORB-SLAM (ORB-SLAM: A Versatile and Accurate Monocular SLAM System), or direct, such as LSD-SLAM (LSD-SLAM: Large-Scale Direct Monocular SLAM), all suffer from two main problems: (1) they can only build a sparse or semi-dense map of the scene, because only the depth of a small number of keypoints or high-gradient points can be calculated; (2) they have scale ambiguity, and scale drift occurs.
In recent years, deep convolutional neural networks (CNNs) for monocular depth estimation have made enormous progress. The underlying principle is to learn, from a large amount of training data, the inner links between an object's depth and its shape, texture, scene semantics and scene context, so as to accurately predict the depth of a picture fed into the network. Combining a CNN with monocular SLAM not only improves the completeness of the reconstructed map but also recovers absolute scale, thereby compensating for the shortcomings of monocular SLAM. At present, the most successful system combining the two is CNN-SLAM (CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction). That system takes the CNN depth prediction as the initial depth of each SLAM keyframe, and then refines the depth of the high-gradient points in the keyframe by pixel matching, triangulation and graph optimization, thereby obtaining a dense 3D reconstruction whose scale is closer to the true scale. Although it achieves a certain effect, the system still has the following problems: (1) only the depth values of a small number of high-gradient pixels are optimized, while the depth of most low-gradient pixels never changes, so the reconstruction is unsatisfactory, especially for unknown scenes; (2) predicting scale from the depth of the high-gradient pixels in the CNN output is not accurate enough, so the initialization is insufficient and the mapping and tracking errors of the SLAM system increase.
Summary of the invention
Aiming at the above defects or improvement requirements of the prior art, the present invention provides a method and system that combine an online-learning depth prediction network with monocular SLAM. Its object is to make full use of the advantages of deep convolutional neural networks to achieve dense depth estimation for the keyframes of a monocular SLAM system, and to restore the true scale of the scene from the result, thereby solving the technical problems that traditional monocular SLAM lacks scale information and cannot achieve dense mapping.
To achieve the above object, according to one aspect of the present invention, a real-time dense monocular SLAM method based on an online-learning depth prediction network is provided, comprising:
(1) selecting keyframes from the picture sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtaining the camera pose of each keyframe by minimizing the photometric error of high-gradient points, and obtaining a semi-dense map of the current frame by predicting the depth of the high-gradient points through triangulation;
(2) selecting online training picture pairs according to the keyframes, performing online training on those pairs with a block-wise stochastic gradient descent method to update the CNN model, and using the trained CNN model to predict the depth of the current frame and obtain a dense map;
(3) performing depth-scale regression between the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the current frame's depth information;
(4) projecting the predicted dense map into the previous keyframe through the pose transformation given by the camera pose, projecting the semi-dense map into the previous keyframe according to the absolute scale factor, selecting a depth prediction for each pixel of the current frame from the two projection results with an NCC-score voting method to obtain a preliminary depth map, and performing Gaussian fusion on the preliminary depth map to obtain the final depth map.
In one embodiment of the present invention, selecting online training picture pairs according to the keyframes specifically means screening, with the following constraints, the picture frames before and after a keyframe so that they form picture pairs with that keyframe:
First, camera motion constraint: the horizontal displacement between the two pictures satisfies |t_x| > 0.9*T, where T represents the baseline distance between the two pictures;
Second, picture disparity constraint: for every pair of pictures, the mean vertical disparity Dis_avg is calculated with an optical-flow method, and the pair is saved as a candidate training pair only when Dis_avg is less than a preset threshold δ;
Third, diversity constraint: the same keyframe can generate only one training pair;
Fourth, training-pool capacity constraint: when the number of training pairs reaches a preset threshold V, the pictures in the training pool are sent to the network, the network is trained online, the resulting model is saved, and the training pool is emptied so that the screening of training data can continue.
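As a minimal sketch, the four screening constraints above can be expressed as one gating function. All names and the concrete threshold values are assumptions for illustration; the patent only fixes the form of the constraints:

```python
import numpy as np

def screen_training_pair(tx, baseline, vert_flow, key_id, used_keys,
                         pool, pool_capacity=20, delta=1.5):
    """Apply the four screening constraints; thresholds are assumed values."""
    # 1) camera motion constraint: |tx| > 0.9 * T
    if abs(tx) <= 0.9 * baseline:
        return False
    # 2) disparity constraint: mean vertical flow must stay below delta
    if np.mean(vert_flow) >= delta:
        return False
    # 3) diversity constraint: at most one training pair per keyframe
    if key_id in used_keys:
        return False
    used_keys.add(key_id)
    pool.append(key_id)
    # 4) capacity constraint: when the pool is full, training would be
    #    triggered and the pool emptied for further screening
    if len(pool) >= pool_capacity:
        pool.clear()
    return True
```

A pair is accepted only when all four conditions hold; in the full system, emptying the pool would coincide with one round of online training.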
In one embodiment of the present invention, performing online training with a block-wise stochastic gradient descent method on the online training pictures to update the CNN model specifically means:
The convolutional layers of ResNet-50 are divided into 5 blocks, namely conv1, conv2_x, conv3_x, conv4_x and conv5_x. Conv1 consists of a single 7×7 convolutional layer; conv2_x consists of a 3×3 layer and 3 bottleneck blocks, 10 layers in total; conv3_x consists of 4 bottleneck blocks, 12 layers in total; conv4_x consists of 6 bottleneck blocks, 18 layers in total; conv5_x consists of 3 bottleneck blocks, 9 layers in total. The five parts together constitute the 50-layer structure of ResNet-50.
During each online learning and update process, each iteration updates only the parameters W_i (i = 1, 2, 3, 4, 5) of one block while keeping the parameters of the other 4 blocks unchanged; the next iteration updates block i, where i = (i + 1) % 5, and the other layers again remain unchanged. The iterations of online learning and updating continue until a preset stop condition is satisfied.
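The cyclic one-block-at-a-time update can be sketched as follows. The dict of per-block parameters is a simplified stand-in for the five ResNet-50 blocks; the learning rate and gradients are placeholders:

```python
import numpy as np

def blockwise_sgd_step(params, grads, active_block, lr=0.01):
    """Block-wise SGD: update only the active block's parameters and
    freeze the other four; return the index of the next block to update."""
    blocks = sorted(params.keys())          # e.g. conv1 ... conv5
    name = blocks[active_block]
    params[name] = params[name] - lr * grads[name]
    # the next iteration updates block (i + 1) % 5, as in the text
    return (active_block + 1) % len(blocks)
```

Over five iterations every block is visited once, so the whole network is still updated, just never all at once, which keeps each step cheap for online use.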
In one embodiment of the present invention, the online training that updates the CNN model is a selective update, specifically:
The training loss of every batch of pictures fed into the CNN model is calculated. Once the losses of all pictures in a batch are greater than a preset threshold L_high, the online learning and update process starts, and it continues until the loss of the training pictures drops below a threshold L_low or the number of iterations reaches a preset limit.
In one embodiment of the present invention, the depth-scale regression method is the RANSAC algorithm or a least-squares algorithm.
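For the least-squares variant, the scale factor has a closed form: it is the s minimizing the squared difference between the scaled CNN depths and the semi-dense depths over the pixels where both exist. This is a sketch under that assumption (the patent also allows RANSAC for outlier robustness):

```python
import numpy as np

def regress_scale(d_semidense, d_cnn, valid):
    """Closed-form least-squares scale s minimizing
    sum((s * d_cnn - d_semidense)^2) over valid (semi-dense) pixels."""
    x = d_cnn[valid]
    y = d_semidense[valid]
    return float(np.dot(x, y) / np.dot(x, x))
```

In a RANSAC variant, s would instead be estimated repeatedly from random pixel subsets and the value with the largest inlier set kept.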
In one embodiment of the present invention, projecting the predicted dense map into the previous keyframe through the pose transformation, projecting the semi-dense map into the previous keyframe according to the absolute scale factor, and selecting a depth prediction for each pixel of the current frame from the two projection results with the NCC-score voting method specifically means:
Each pixel p of keyframe i is projected into the nearest keyframe i-1 according to the CNN-predicted dense map D_cnn(p) and the pose transformation; the result of this projection is denoted p'_cnn. The pixel p of keyframe i is projected a second time into keyframe i-1, this projection being based on the semi-dense result D_sd(p) and the absolute scale factor; it is denoted p'_sd.
Small regions are chosen around the projected points p'_cnn and p'_sd in keyframe i-1, and the normalized cross-correlation NCC_cnn between regions R(p) and R_cnn(p'), and NCC_sd between regions R(p) and R_sd(p'), are calculated separately. If NCC_cnn is less than NCC_sd, the depth prediction of the semi-dense map is better than that of the CNN, and D_sd(p) is selected as the final depth of pixel p; otherwise D_cnn(p) is selected. If a point only has a CNN prediction, D_cnn(p) is used as its final depth.
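The voting step above reduces to comparing two patch correlations. A minimal sketch (patch extraction around the projected points is assumed to have happened already):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def vote_depth(patch_ref, patch_cnn, patch_sd, d_cnn, d_sd):
    """Pick the per-pixel depth whose reprojected patch better matches
    the reference patch; fall back to the CNN depth when no semi-dense
    prediction exists."""
    if patch_sd is None or d_sd is None:
        return d_cnn
    if ncc(patch_ref, patch_cnn) < ncc(patch_ref, patch_sd):
        return d_sd
    return d_cnn
```

The correlation is computed on mean-centered patches, so the vote is insensitive to local brightness offsets between the two keyframes.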
In one embodiment of the present invention, performing Gaussian fusion on the preliminary depth map to obtain the final depth map specifically means:
The depth map obtained by the NCC-score voting method is further processed: a joint optimization is carried out according to the contextual relations between keyframes, combined with the uncertainty maps of the keyframe depth maps, and the final depth map is obtained from this joint optimization.
In one embodiment of the present invention, using the trained CNN model to predict the depth of the current frame and obtain the dense map further comprises:
multiplying the depth value of each pixel in the depth map by a scale coefficient,
where f_adapted is the focal length of the monocular camera that acquires the online training data, B_adapted is the baseline of the binocular training pictures, and f_pre-train and B_pre-train are respectively the focal length and baseline of the pictures used to train the original CNN model.
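The coefficient itself is not reproduced in this text (it appears as a formula in the original). Since the depth recovered from a stereo-trained disparity network is proportional to focal length times baseline, a plausible reconstruction, stated here only as an assumption, is the ratio of the two products:

```python
def depth_scale_coefficient(f_adapted, b_adapted, f_pretrain, b_pretrain):
    """Assumed form of the per-pixel depth correction when the online
    camera differs from the pre-training stereo rig: depth from a
    disparity network scales with focal length x baseline, so the
    correction is (f_adapted * B_adapted) / (f_pretrain * B_pretrain)."""
    return (f_adapted * b_adapted) / (f_pretrain * b_pretrain)
```

With identical baselines the coefficient reduces to the focal-length ratio, which matches the listed variables.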
In one embodiment of the present invention, the keyframes are defined as follows: the first picture of the whole image sequence, or the first picture the camera obtains in real time, is defined as a keyframe; besides the first frame, some later picture frames are also defined as keyframes, the criterion being whether the translation and rotation between the current frame and the previous nearest keyframe have reached preset thresholds.
According to another aspect of the present invention, a real-time dense monocular SLAM system based on an online-learning depth prediction network is also provided, comprising a direct-method monocular SLAM module, an online adaptive CNN prediction module, an absolute-scale regression module and a depth-map fusion module, in which:
the direct-method monocular SLAM module selects keyframes from the picture sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtains the camera pose of each keyframe by minimizing the photometric error of high-gradient points, and obtains a semi-dense map of the current frame by predicting the depth of the high-gradient points through triangulation;
the online adaptive CNN prediction module selects online training picture pairs according to the keyframes, performs online training on those pairs with a block-wise stochastic gradient descent method to update the CNN model, and uses the trained CNN model to predict the depth of the current frame and obtain a dense map;
the absolute-scale regression module performs depth-scale regression between the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the current frame's depth information;
the depth-map fusion module projects the predicted dense map into the previous keyframe through the pose transformation given by the camera pose, projects the semi-dense map into the previous keyframe according to the absolute scale factor, selects a depth prediction for each pixel of the current frame from the two projection results with an NCC-score voting method to obtain a preliminary depth map, and performs Gaussian fusion on the preliminary depth map to obtain the final depth map.
In general, compared with the prior art, the above technical scheme conceived by the present invention has the following beneficial effects: the present invention uses monocular SLAM based on the direct method and obtains the semi-dense map and camera pose of the scene by optimization; the online adaptive CNN uses a weakly supervised depth prediction network and updates it online according to scene information, so that the network performs well in unknown scenes; depth-scale regression yields the scale of the depth values and improves the accuracy of 3D reconstruction; data fusion combines region voting with Gaussian fusion, improving the precision of the result while preserving completeness.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the real-time dense monocular SLAM method based on an online-learning depth prediction network in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the triangulation model in an embodiment of the present invention;
Fig. 3 shows the constraint relations used to screen training pictures in an embodiment of the present invention, in which figure (a) is an image pair with the first pixel correspondence and figure (b) is an image pair with the second pixel correspondence;
Fig. 4 is a schematic diagram of the scale-coefficient adjustment in an embodiment of the present invention, in which the upper half is the original network structure and the lower half is the improvement of the present invention to the network;
Fig. 5 is a schematic diagram of the block-wise gradient descent method (block-wise SGD) in an embodiment of the present invention;
Fig. 6 shows the scale regression and its effect in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the structure of the real-time dense monocular SLAM system based on an online-learning depth prediction network in an embodiment of the present invention.
Specific embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments described below can be combined with each other as long as they do not conflict.
The problem to be solved by the present invention is to realize a real-time dense monocular mapping SLAM system. The system combines an adaptive online CNN depth prediction network with a direct-method monocular SLAM system; it can not only significantly improve the accuracy and robustness of depth prediction for unknown scenes, but also solve the scale-ambiguity problem of monocular SLAM systems.
To achieve the above goals, the present invention combines a CNN with SLAM and, aiming at the problems of monocular SLAM, proposes an algorithm with better accuracy and stronger robustness. The main innovations of the scheme include:
(1) an adaptive online CNN depth prediction network is used; this is also the first time in the entire field that this type of network is combined with a monocular SLAM system, which greatly improves the accuracy of the system's depth prediction in unknown scenes;
(2) a "block-wise stochastic gradient descent" (block-wise SGD) method and a selective-update strategy are proposed, allowing the CNN to obtain better depth prediction results under the condition of limited training data;
(3) an absolute-scale regression method based on the adaptive network is designed, which greatly improves the accuracy of depth prediction and makes the tracking and mapping of the whole system more precise.
The system mainly consists of four components: direct-method monocular SLAM, online adaptive CNN, depth-scale regression and data fusion; the block diagram of the method is shown in Fig. 1. The monocular SLAM uses the direct method and obtains the semi-dense map and camera pose of the scene by optimization; the online adaptive CNN uses a weakly supervised depth prediction network and updates it online according to scene information, so that the network performs well in unknown scenes; depth-scale regression yields the scale of the depth values and is used to improve the accuracy of 3D reconstruction; data fusion combines region voting with Gaussian fusion, improving the precision of the result while preserving completeness.
Specifically, the method includes the following processes:
(1) Direct-method monocular SLAM: this part is a modification made on the basis of LSD-SLAM. By minimizing the photometric error of high-gradient points, the camera pose of each frame is obtained through optimization, and the depth of the high-gradient points is predicted by triangulation to obtain a semi-dense map.
Picture acquisition: the method is based on a monocular vision sensor. When acquiring pictures, the monocular camera is required to undergo both rotation and translation, with the amplitude of the translation suitably increased. There are two main reasons for this: first, if only static or purely rotational motion occurs, the initialization of this part or the picture tracking is likely to fail, which in turn makes the whole system malfunction; second, suitably increasing the amplitude of the translation helps the system select suitable training pictures, thereby ensuring that the online training and CNN-update processes proceed normally.
Keyframe definition: the monocular SLAM part defines the first picture of the whole sequence, or the first picture the camera obtains in real time, as a keyframe; besides the first frame, some later picture frames are also defined as keyframes, the criterion being whether the translation and rotation between the current frame and the previous nearest keyframe have reached preset thresholds. The keyframe-based composition of the algorithm is the basis of the back-end optimization of direct-method monocular SLAM and is also an important framework of the network part, and therefore needs to be introduced in particular.
Camera pose tracking: the motion of the camera in three-dimensional space has six degrees of freedom, and the motion within a time Δt can be represented by a six-dimensional vector ξ = [ν(1) ν(2) ν(3) ψ(1) ψ(2) ψ(3)]^T, where [ν(1) ν(2) ν(3)]^T ∈ R^3 represents the translational components of the rigid-body motion along the three coordinate axes and is a vector in Euclidean space, and [ψ(1) ψ(2) ψ(3)]^T ∈ SO(3) represents the rotational components of the rigid motion around the three axes and is a vector in the non-Euclidean three-dimensional rotation group SO(3). Vision-based camera tracking is exactly the process of solving ξ from visual information. The monocular SLAM of the present invention tracks the camera pose with the direct method: all points with depth information in picture A are projected into picture B to obtain a new picture B', and the change in position of B relative to A is obtained by optimizing the sum of the gray-value differences (photometric error) over all positions between B' and B. The direct method copes well with viewpoint changes, illumination changes and sparse scene texture, and is a popular method at present, so this project realizes camera pose tracking based on the direct method.
Specifically, the key idea of the direct method for camera pose tracking is to find an optimal camera pose ξ* between the current frame n and the nearest keyframe k such that the photometric error between frame n and keyframe k is minimal. Textureless regions are likely to cause inaccurate inter-frame pixel matching, because different camera poses ξ may yield similar photometric errors. In order to obtain highly robust tracking results and reduce the time spent on optimization, the photometric error r is calculated only over the high-gradient points {p} of keyframe k, as follows:
r(p, ξ) = I_k(p) - I_n(π(T_ξ π^(-1)(p, D(p))))     (1)
where D(p) is the depth value of the high-gradient pixel p, and π is the projection model that maps a 3D point P_c in the camera coordinate system to a 2D image-plane pixel p; π is determined by the camera intrinsics K. Similarly, π^(-1) is the back-projection model, which maps a pixel of the 2D plane into 3D space. The optimized camera pose ξ* is calculated by minimizing the photometric error r over all high-gradient pixels, as follows:
ξ* = argmin_ξ Σ_p w_p ||r(p, ξ)||     (2)
where w_p is a weight on pixel p used to improve robustness and reduce the influence of outliers. The problem of equation (2) can be solved with a standard Gauss-Newton optimization algorithm. Of course, camera pose tracking by this method drifts because of error accumulation, but the drift can be eliminated by additionally adding loop-closure detection; this project adopts a loop-closure detection method based on bag-of-words to solve the drift caused by accumulated error.
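The photometric error over the high-gradient set can be sketched directly from the definitions of π and π^(-1) above. This is a simplified sketch (nearest-neighbour intensity lookup instead of interpolation, pose given as an explicit R, t rather than a Lie-algebra element ξ):

```python
import numpy as np

def photometric_error(img_k, img_n, points, depths, K, R, t):
    """Sum of squared photometric residuals over high-gradient pixels {p}
    of keyframe k, reprojected into frame n with the candidate pose."""
    K_inv = np.linalg.inv(K)
    err = 0.0
    for (u, v), d in zip(points, depths):
        P = d * (K_inv @ np.array([u, v, 1.0]))   # back-projection pi^-1
        q = K @ (R @ P + t)                        # projection pi
        u2, v2 = q[0] / q[2], q[1] / q[2]
        i, j = int(round(v2)), int(round(u2))
        if 0 <= i < img_n.shape[0] and 0 <= j < img_n.shape[1]:
            r = img_k[int(v), int(u)] - img_n[i, j]
            err += r * r
    return err
```

A Gauss-Newton tracker would repeatedly evaluate this cost (with robust weights w_p and bilinear interpolation) while updating the pose increment.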
Semi-dense depth estimation: in the mapping thread of a direct-method monocular SLAM system, the depth values of high-gradient pixels are estimated by small-baseline stereo comparison, that is, by pixel matching and triangulation. Specifically, the model of feature matching and triangulation is shown in Fig. 2: C and C' are the camera-coordinate origins of the keyframe and the reference frame respectively, X is the 3D point whose depth is to be calculated, and m and m' are the projections of the point X on the projection planes of cameras C and C' respectively. Because in monocular vision the keyframe and the reference frame come from the same camera, their projection intrinsics are identical. If the rotation and translation [R, t] between the two camera coordinate systems have been obtained by the visual method, the projection equations follow, where f_x, f_y, c_x, c_y, s are the intrinsics of the camera; R and t are a 3×3 matrix and a 3×1 vector respectively, representing the rotation and translation of the C' coordinate system relative to the C coordinate system; (x_c, y_c, z_c)^T and (x_c', y_c', z_c')^T are the homogeneous coordinates of the point X in the camera coordinate systems C and C'; and (u, v)^T and (u', v')^T are the pixel coordinates of the projections of X on the projection planes of C and C'. Since the intrinsic matrix is known after camera calibration and [R, t] is obtained from the preceding localization, the coefficients (m11 ... m34) are known data, and the equations simplify to A (x_c, y_c, z_c)^T = b. The system contains 3 unknowns and 4 equations and is therefore overdetermined; its least-squares solution X̂ is sought so that ||A X̂ - b|| is minimized.
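The overdetermined system above (4 equations, 3 unknowns, solved by least squares) can be sketched as linear triangulation; the keyframe camera is taken as the origin and the reference camera at [R, t], as in the text:

```python
import numpy as np

def triangulate(K, R, t, uv1, uv2):
    """Linear triangulation of a 3D point from its pixel coordinates in
    the keyframe (camera C, identity pose) and the reference frame
    (camera C', pose [R, t]); 4 equations in 3 unknowns via lstsq."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    rows = []
    for (u, v), P in [(uv1, P1), (uv2, P2)]:
        rows.append(u * P[2] - P[0])   # u * (3rd row) - (1st row)
        rows.append(v * P[2] - P[1])   # v * (3rd row) - (2nd row)
    A = np.array(rows)
    # A[:, :3] (xc, yc, zc)^T = -A[:, 3], least-squares as in the text
    x, *_ = np.linalg.lstsq(A[:, :3], -A[:, 3], rcond=None)
    return x
```

With noise-free correspondences the least-squares solution recovers the point exactly; with noisy matches it minimizes ||A X̂ - b||.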
Once a new keyframe k is created, its depth map D_pri and the uncertainty U_pri of its depth prediction are first initialized by projecting the depth D_(k-1) and uncertainty U_(k-1) of the (k-1)-th keyframe into the current keyframe, as follows:
D_pri = D_(k-1) - t_z     (6)
U_pri = U_(k-1) + σ²     (7)
where t_z is the translation of the camera along the optical axis and σ² denotes the initialization noise. The initialized depth map is continually refined from later picture frames: the refinement first searches along the epipolar line in the current frame for the pixel matching each high-gradient pixel p of keyframe k, the search interval on the epipolar line being determined by the depth uncertainty of pixel p; once the matching pixel is found, the depth value of p is calculated by triangulation. The present invention denotes the whole process of pixel matching and triangulation by a function F. Based on F, the observed depth D_obs obtained by the present invention is expressed as
D_obs = F(I_k, I_cur, ξ, K)
where I_k and I_cur are keyframe k and the current picture frame, ξ is the camera motion between keyframe k and the current frame, and K is the intrinsic matrix of the camera. The uncertainty U_obs of the depth observation D_obs is generated by noise present in the pixel matching between I_k and I_cur and in the estimation of the camera motion ξ. The refined depth map and its corresponding uncertainty are in fact a fusion of the initial depth information and the observed depth information, and satisfy the following distribution:
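The fusion distribution referred to above is, under the usual Gaussian assumption, the inverse-variance weighted combination of the prior and the observation; the exact formula is not reproduced in this text, so the following is a sketch of that assumed form:

```python
def fuse_depth(d_pri, u_pri, d_obs, u_obs):
    """Gaussian fusion of the initialized depth (prior) with the
    triangulated observation: inverse-variance weighted mean and the
    correspondingly reduced uncertainty."""
    d = (u_obs * d_pri + u_pri * d_obs) / (u_pri + u_obs)
    u = (u_pri * u_obs) / (u_pri + u_obs)
    return d, u
```

The fused uncertainty is always smaller than either input, so repeated observations of the same high-gradient point steadily tighten its depth estimate.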
(2) Online adaptive training of the CNN, and prediction on key frames using the CNN to obtain a dense map (dense depth map):
Online adaptive CNN: firstly, the present invention adopts a state-of-the-art weakly supervised method for single-picture depth estimation. The network architecture of this weakly supervised method mainly consists of two parts: the first part is the fully convolutional layers (ConvLayers) based on ResNet-50; the second part replaces the pooling layer and fully connected layers (FCLayers) at the back of ResNet-50 with a series of up-sampling regions composed of deconvolution layers and skip connections. Training the entire CNN requires pairs of rectified stereo pictures, whose baseline B_pre-train and camera focal length f_pre-train are fixed. The output of the network is a disparity map, from which a reconstruction of the source picture can be generated; the photometric error between the source picture and the reconstructed picture, plus a smoothness term, constitutes the loss function of the whole network. In the experiments, a series of key frames {I_1, ..., I_j}, inter-picture translations {T_1,2, ..., T_i-1,i, ..., T_j-1,j} and rotation changes {R_1,2, ..., R_i-1,i, ..., R_j-1,j} can be obtained through the monocular SLAM system. Using this information as ground truth, the present invention learns depth maps of two-dimensional pictures that minimize the re-projection error between any two key frames.
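The loss just described (photometric reconstruction error plus a smoothness term) can be sketched as below. This is a minimal illustration with nearest-neighbour warping and L1 terms; the actual network uses differentiable sampling, and all names here are hypothetical.

```python
import numpy as np

def warp_by_disparity(right, disparity):
    # Reconstruct the left view by sampling the right view at x - d(x)
    # (nearest-neighbour sampling for simplicity).
    h, w = right.shape
    xs = np.tile(np.arange(w), (h, 1))
    src = np.clip(np.rint(xs - disparity).astype(int), 0, w - 1)
    return np.take_along_axis(right, src, axis=1)

def photometric_smoothness_loss(left, right, disparity, alpha=0.1):
    # L1 photometric error between the source picture and its
    # reconstruction, plus an L1 smoothness term on the disparity map.
    recon = warp_by_disparity(right, disparity)
    photometric = np.abs(left - recon).mean()
    smoothness = np.abs(np.diff(disparity, axis=1)).mean()
    return photometric + alpha * smoothness
```

A perfectly rectified pair with correct disparity drives the photometric term to zero, so the loss reduces to the weighted smoothness penalty.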
Pre-training the CNN: the entire online adaptive CNN is partly based on a pre-trained CNN network model. When pre-training the network model, the present invention follows the traditional CNN training method, using 6510 picture pairs from the CMU dataset (the Wean Hall dataset) plus 35588 picture pairs of our own recorded laboratory scene, 42098 pairs in total, as the training set. The baseline of the training set pictures is 0.12 meters; the laboratory scene pictures were shot with a ZED stereoscopic camera, and random color, scale and mirroring variations were applied to the training set for data augmentation. After processing, all training set pictures are input to the network for iteration, with 40000 iterations and a learning rate of 0.0001, finally yielding the desired pre-trained model; the whole system performs online learning and updating based on this model.
Within a single video scene sequence, every online training of the CNN saves and updates the network model, and depth maps are generated with the new model. The online adaptive CNN strategy mainly consists of the following four parts:
1) Online training picture screening: the depth prediction network needs pairs of pictures shot by a stereoscopic camera as training pictures, and these stereo pictures have a fixed baseline B_pre-train. To train and update the CNN network model in real time, the present invention collects pairs of monocular pictures to simulate stereo pictures, following the rules of a binocular camera, while the monocular camera is in motion. The present invention uses strict requirements to collect reliable training pictures, so as to reduce over-fitting of the CNN network model to erroneous samples caused by noise. The present invention designs four main screening conditions. First, the camera motion constraint: the displacement in the horizontal direction between the two frame pictures satisfies |t_x| > 0.9*T, where T represents the baseline distance between the two frame pictures. Second, the disparity constraint: for every pair of pictures, the mean vertical disparity Dis_avg between the pictures is computed using optical flow, and only when Dis_avg is less than a threshold δ (taken as 5 in the experiments) is the pair saved as a candidate training pair. The effect is shown in Fig. 3: (a) and (b) are two pairs of pictures; when the pixel relationship within a pair is as shown in (a), the pair is screened as a candidate training pair, and when the relationship is as shown in (b), the pair is discarded. Third, the diversity constraint: each screened training pair corresponds uniquely to a key frame picture, that is, the same key frame can generate at most one training pair. Fourth, the training pool capacity constraint: whenever the number of training pairs reaches the threshold V (taken as 4 in the experiments), the pictures in the training pool are sent to the network for online training, the trained network model is saved, and the training pool is emptied to continue screening training data;
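The four screening conditions above can be sketched as follows. All function and parameter names are illustrative, not from the patent.

```python
def passes_screening(tx, T, dis_avg, key_id, used_keys, delta=5.0):
    """Decide whether a candidate pair (keyframe + nearby frame) may
    enter the training pool.

    tx       : horizontal displacement between the two frames
    T        : baseline distance between the two frames
    dis_avg  : mean vertical disparity measured by optical flow
    key_id   : keyframe that generated the pair
    used_keys: keyframes that already produced a training pair
    """
    if abs(tx) <= 0.9 * T:      # 1) camera-motion constraint
        return False
    if dis_avg >= delta:        # 2) vertical-disparity constraint
        return False
    if key_id in used_keys:     # 3) diversity: one pair per keyframe
        return False
    return True

def add_to_pool(pair, key_id, used_keys, pool, V=4):
    """4) pool-capacity constraint: flush the pool as one training
    batch once it holds V pairs, then empty it."""
    pool.append(pair)
    used_keys.add(key_id)
    if len(pool) >= V:
        batch, pool[:] = list(pool), []
        return batch            # would be handed to the trainer here
    return None
```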
2) Camera parameter adjustment: the focal length f_adapted of the monocular camera used to obtain training data online and the baseline B_adapted of the binocular training pictures are very likely different from the focal length f_pre-train and baseline B_pre-train of the pictures used to train the original CNN network model. The relationship between the camera parameters and the scene depth values has been implicitly absorbed into the network structure, so if pictures with a different focal length are fed into the network for testing, the absolute scale of the resulting 3D reconstruction may be inaccurate. The whole network would therefore need to be adjusted to accommodate the change of camera parameters, but doing so would slow down the update speed of every round of online learning. To solve this problem, a new approach of adjusting the output depth map is proposed; the basic concept, shown in Fig. 4, is to multiply the depth value of each pixel in the depth map by a scale coefficient to guarantee the accuracy of the depth map;
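The scale coefficient itself appears only as a figure in the patent. Assuming the standard rectified-stereo relation depth = focal_length × baseline / disparity, a plausible form of the adjustment is:

```python
def adjust_depth(depth_pred, f_adapted, B_adapted, f_pretrain, B_pretrain):
    """Rescale a depth prediction from the pre-training camera geometry
    to the online camera geometry.  Assumed formula, derived from
    depth = f * B / disparity; the patent elides the exact expression."""
    scale = (f_adapted * B_adapted) / (f_pretrain * B_pretrain)
    return depth_pred * scale
```

Under this assumption, doubling the focal length at a fixed baseline doubles the recovered depth for the same network output.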
3) Block-wise SGD method: stochastic gradient descent (SGD) is currently one of the most widely used optimization algorithms in mainstream deep learning. Its main idea is to first divide the training dataset into n batches, each batch containing m samples; every update of the network parameters uses only the data of one batch, rather than the whole training set.
Its advantages are: when there is a lot of training data, using batches reduces the load on the machine and allows fast convergence; when the training set contains much redundancy (similar samples appearing multiple times), the batch method converges even faster.
Its disadvantage is that it easily converges to a local optimum rather than the global optimum.
The block-wise gradient descent method (block-wise SGD) proposed here is an innovative improvement on stochastic gradient descent (SGD).
The present invention uses ResNet-50 to extract feature information of different levels from the picture; this feature information is subsequently encoded into the disparity map through a series of down-sampling operations. To reduce the risk of over-fitting the CNN caused by the limited training pictures, the invention proposes a new method called "block-wise stochastic gradient descent" (block-wise SGD), which divides the convolutional layers of ResNet-50 into 5 blocks, as shown in Fig. 5, where the blocks are conv1, conv2_x, conv3_x, conv4_x and conv5_x. Conv1 consists of a single 7x7 convolutional layer; conv2_x consists of a 3x3 convolutional layer and 3 bottleneck blocks (each bottleneck block being 1x1 64, 3x3 64, 1x1 256), 10 layers in total; conv3_x consists of 4 bottleneck blocks (each being 1x1 128, 3x3 128, 1x1 512), 12 layers in total; conv4_x consists of 6 bottleneck blocks (each being 1x1 256, 3x3 256, 1x1 1024), 18 layers in total; conv5_x consists of 3 bottleneck blocks (each being 1x1 512, 3x3 512, 1x1 2048), 9 layers in total. The five parts together constitute the 50-layer structure of ResNet-50. In each process of online learning and updating, each iteration k updates the parameters W_i of only one block (i = 1, 2, 3, 4, 5), keeping the parameters of the remaining 4 partial network layers unchanged; in the next iteration, the parameters of the i-th block with i = (k+1)%5 are updated while the other layer parameters remain unchanged, thereby reducing the complexity of each network update. The iterations of online learning and updating continue until a stop condition is satisfied (such as an iteration count limit, or the training loss function reaching a preset threshold);
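The block rotation above can be sketched as a single update step; `grad_fn` stands in for the back-propagated gradient of one block, and the interface is illustrative rather than the patented implementation.

```python
def blockwise_sgd_step(params, grad_fn, k, lr=1e-4):
    """One iteration of block-wise SGD: iteration k updates only one of
    the five ResNet-50 blocks, leaving the other four untouched.
    `params` maps block name -> parameter value; `grad_fn(params, name)`
    returns the gradient for that block."""
    blocks = ["conv1", "conv2_x", "conv3_x", "conv4_x", "conv5_x"]
    name = blocks[k % len(blocks)]   # cycle through the 5 blocks
    params[name] = params[name] - lr * grad_fn(params, name)
    return params
```

Because only one fifth of the parameters moves per iteration, each update is cheaper and the frozen blocks act as a regularizer against over-fitting the small online training pool.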
4) Selective updating: performing online learning and updating the CNN network model whenever suitable training data is generated easily causes unnecessary computational overhead. As long as the current CNN network model can provide sufficiently accurate depth prediction results for the current scene, the current CNN network model is kept in use, and the network model is adjusted only when forced. Based on this idea, the present invention designs an operating mode of "selective system updating": the training loss of every batch of pictures input to the CNN network model is computed, and once the loss functions of all pictures of a batch are greater than a preset threshold L_high, the process of online learning and updating is started. This process runs until the loss function of the training pictures drops below L_low or the number of iterations reaches a preset threshold. This strategy not only reduces the amount of computation to a large extent, but also satisfies the accuracy requirement on the network's depth prediction results.
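The selective-updating trigger and stop conditions can be sketched as follows; `train_step` stands in for one online training pass and returns the new loss (an illustrative interface, not the patented implementation).

```python
def selective_update(batch_losses, L_high, L_low, train_step, max_iters):
    """Start online training only if every picture in the batch has a
    loss above L_high; then iterate until the loss drops below L_low or
    the iteration budget runs out.  Returns the iterations performed."""
    if not all(loss > L_high for loss in batch_losses):
        return 0                      # current model is accurate enough
    loss, iters = max(batch_losses), 0
    while loss > L_low and iters < max_iters:
        loss = train_step(loss)       # one online training pass
        iters += 1
    return iters
```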
(3) Depth scale regression: a camera pose with accurate scale information is of great significance for selecting suitable training pictures and directly influences the output of the network. Since the monocular SLAM system cannot obtain absolute scale, the present invention proposes a method of "accurate scale regression based on the adaptive CNN". We plot the relationship between D_sd(p) and D_gt(p), as shown in Fig. 6, where the black line in figure (b) is the ground-truth camera pose of the scene, the blue line is the camera pose obtained by monocular SLAM, and the red line is the result after using the RANSAC algorithm to regress the scale and applying it to the camera pose. We find that the ratio of D_sd(p) (the depth of a high-gradient point p obtained by monocular SLAM) to D_gt(p) (the true depth value of pixel p) represents the absolute scale information of point p. Based on this, the present invention proposes to regress the absolute scale information from the depth relationship of all high-gradient points; but in practical applications the true depth information is unknown, so the present invention uses the prediction result of the CNN for the scale regression. Considering that the CNN-predicted depth contains some adverse outliers, we tested scale regression with both the RANSAC algorithm and least squares; the experimental results, shown as the green and red lines in Fig. 6(a), prove that the RANSAC algorithm yields a more accurate fit, so the embodiment of the present invention adopts the RANSAC method. After the absolute scale of the depth information is computed with this method, the scale information of the pose can in turn be obtained according to the mapping relationship, which improves the tracking accuracy of the monocular SLAM system. As shown in Fig. 6(b), the present invention was tested on two scenes of the TUM dataset, where the blue part is the pose tracked by monocular SLAM, the black part is the ground truth, and the red part is the result of adding the scale information into the monocular SLAM tracking, showing that this method fits the tracking scale well.
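The RANSAC-based scale regression can be sketched as a one-parameter hypothesis-and-verify loop over high-gradient points. This is a minimal sketch with illustrative names; the patented implementation is not reproduced here.

```python
import random

def ransac_scale(d_slam, d_cnn, iters=200, rel_thresh=0.1, seed=0):
    """Regress the absolute scale s minimising |s*d_slam - d_cnn|,
    robust to outliers in the CNN prediction: each hypothesis
    s = d_cnn[i] / d_slam[i] comes from one sampled high-gradient
    point, and the hypothesis with the most inliers wins."""
    rng = random.Random(seed)
    best_s, best_inliers = 1.0, -1
    for _ in range(iters):
        i = rng.randrange(len(d_slam))
        if d_slam[i] == 0:
            continue
        s = d_cnn[i] / d_slam[i]
        inliers = sum(abs(s * a - b) < rel_thresh * b
                      for a, b in zip(d_slam, d_cnn))
        if inliers > best_inliers:
            best_s, best_inliers = s, inliers
    return best_s
```

A least-squares fit would instead minimise the summed squared residual over all points, which is why a single large CNN outlier can drag its estimate away from the true scale.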
(4) Data fusion: for each key frame, two depth maps are available: one is the optimized result D_sd of monocular SLAM, and the other is the prediction result D_cnn of the CNN. The present invention designs a mode combining "NCC score voting and Gaussian fusion" to achieve the best combination effect. The process consists of two parts. The first part is NCC score voting. NCC (Normalized Cross Correlation) is used to compute the correlation between two picture regions A and B. For each pixel p of key frame i, the pixel is projected into the nearest key frame i-1 according to the CNN-predicted depth map D_cnn(p) and the pose transformation; the result of the projection is denoted p'_cnn. Similarly, the pixel p of key frame i is projected again into key frame i-1 and denoted p'_sd, but this projection is based on the result D_sd(p) of the semi-dense map and the absolute scale factor. Small regions are chosen near the projected points p'_cnn and p'_sd in key frame i-1, and the normalized cross-correlation coefficient NCC_cnn between region R(p) and R(p'_cnn) and the normalized cross-correlation coefficient NCC_sd between region R(p) and R(p'_sd) are computed separately. If NCC_cnn is less than NCC_sd, the depth prediction result of the semi-dense depth map is better than that of the CNN, and D_sd(p) is selected as the final depth prediction value of pixel p; otherwise R_cnn(p') is selected. If some point has only the prediction result of the CNN, R_cnn(p') is used as the final depth of pixel p. The second part is Gaussian fusion. The depth map obtained in the previous step is further processed: joint optimization is performed according to the context relationship between key frames, combined with the uncertainty map of the key frame depth maps; this is the so-called Gaussian fusion. The final depth map is obtained through the joint optimization. In the experiments we tested on the scene sequences of multiple datasets and achieved relatively good results.
Because a CNN is used, our dense monocular SLAM system needs GPU acceleration to achieve good real-time performance. Our algorithm was tested on the TUM dataset and the ICL-NUIM dataset. Compared with LSD-SLAM, the current state-of-the-art direct-method monocular SLAM system, the absolute trajectory error of our pose tracking is reduced from 0.622 m to 0.231 m. The accuracy of the key frame depth maps (the percentage of points over the whole picture whose depth error is within 10%) is increased from 0.61% to 26.47%; compared with using the weakly supervised depth prediction network alone, the accuracy of the key frame depth maps is increased from 21.05% to 26.47%. In addition, the running speed of the whole system also achieves real-time performance.
Further, as shown in Fig. 7, the present invention also provides a real-time dense monocular SLAM system based on an online-learning depth prediction network, including a direct-method monocular SLAM module 1, an online adaptive CNN prediction module 2, an absolute scale regression module 3 and a depth map fusion module 4, in which:
the direct-method monocular SLAM module 1 is used for selecting key frames from the picture sequence acquired by a monocular vision sensor through rotation and translation motion, optimizing the photometric error of high-gradient points to obtain the camera pose of the key frames, and obtaining the semi-dense map of the current frame by predicting the depth of the high-gradient points through triangulation;
the online adaptive CNN prediction module 2 is used for selecting online training picture pairs according to the key frames, performing online training on the CNN network model with the block-wise stochastic gradient descent method according to the online training picture pairs to update the model, and performing depth prediction on the current frame picture with the trained CNN network model to obtain a dense map;
the absolute scale regression module 3 is used for performing depth scale regression according to the semi-dense map of the current frame and the predicted dense map, obtaining the absolute scale factor of the depth information of the current frame;
the depth map fusion module 4 is used for projecting the predicted dense map into the previous key frame through pose transformation according to the camera pose, projecting the semi-dense map into the previous key frame according to the absolute scale factor, selecting the depth prediction value of each pixel of the current frame from the two projection results with the NCC score voting method to obtain a predicted depth map, and performing Gaussian fusion on the predicted depth map to obtain the final depth map.
Those skilled in the art will readily understand that the above description covers only preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (9)
1. A real-time dense monocular SLAM method based on an online-learning depth prediction network, characterized by comprising the following steps:
(1) selecting key frames from the picture sequence acquired by a monocular vision sensor through rotation and translation motion, optimizing the photometric error of high-gradient points to obtain the camera pose of the key frames, and obtaining the semi-dense map of the current frame by predicting the depth of the high-gradient points through triangulation;
(2) selecting online training picture pairs according to the key frames, performing online training on the CNN network model with the block-wise stochastic gradient descent method according to the online training picture pairs to update the model, and performing depth prediction on the current frame picture with the trained CNN network model to obtain a dense map; wherein performing online training with the block-wise stochastic gradient descent method according to the online training picture pairs to update the CNN network model is specifically:
dividing the convolutional layers of ResNet-50 into 5 blocks, the blocks being conv1, conv2_x, conv3_x, conv4_x and conv5_x; conv1 consisting of a single 7x7 convolutional layer; conv2_x consisting of a 3x3 convolutional layer and 3 bottleneck blocks, 10 layers in total; conv3_x consisting of 4 bottleneck blocks, 12 layers in total; conv4_x consisting of 6 bottleneck blocks, 18 layers in total; conv5_x consisting of 3 bottleneck blocks, 9 layers in total; the five parts together constituting the 50-layer structure of ResNet-50;
in each process of online learning and updating, each iteration k updating only the parameters W_i of one block, wherein i = 1, 2, 3, 4, 5, keeping the parameters of the remaining 4 partial network layers unchanged; in the next iteration, updating the parameters of the i-th block, wherein i = (k+1)%5, the other layer parameters remaining unchanged; the iterations of online learning and updating continuing until a preset stop condition is satisfied;
(3) performing depth scale regression according to the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the depth information of the current frame;
(4) projecting the predicted dense map into the previous key frame through pose transformation according to the camera pose, projecting the semi-dense map into the previous key frame according to the absolute scale factor, selecting the depth prediction value of each pixel of the current frame from the two projection results with the NCC score voting method to obtain a predicted depth map, and performing Gaussian fusion on the predicted depth map to obtain the final depth map.
2. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1, characterized in that selecting online training picture pairs according to the key frames is specifically: screening picture frames from the frames before and after the key frame with the following constraint conditions, each screened frame forming a picture pair with the key frame:
first, the camera motion constraint: the displacement in the horizontal direction between the two frame pictures satisfies |t_x| > 0.9*T, wherein T represents the baseline distance between the two frame pictures;
second, the disparity constraint: for every pair of pictures, the mean vertical disparity Dis_avg between the pictures is computed using optical flow, and only when Dis_avg is less than a preset threshold δ is the pair saved as a candidate training pair;
third, the diversity constraint: the same key frame can generate only one training pair;
fourth, the training pool capacity constraint: when the number of training pairs reaches a set threshold V, the pictures in the training pool are sent to the network for online training, the trained network model is saved, and the training pool is emptied to continue the screening of training data.
3. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that updating the CNN network model by online training is selective updating, specifically: the training loss of every batch of pictures input to the CNN network model is computed, and once the loss functions of all pictures of a batch are greater than a preset threshold L_high, the process of online learning and updating is started; the process runs until the loss function of the training pictures drops below a threshold L_low or the number of iterations reaches a preset threshold.
4. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that the depth scale regression method is the RANSAC algorithm or the least-squares algorithm.
5. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that projecting the predicted dense map into the previous key frame through pose transformation, projecting the semi-dense map into the previous key frame according to the absolute scale factor, and selecting the depth prediction value of each pixel of the current frame from the two projection results with the NCC score voting method to obtain the predicted depth map is specifically:
projecting each pixel p of key frame i into the nearest key frame i-1 according to the CNN-predicted dense map D_cnn(p) and the pose transformation, the result of the projection being denoted p'_cnn;
projecting the pixel p of key frame i again into key frame i-1, denoted p'_sd, this projection being based on the result D_sd(p) of the semi-dense map and the absolute scale factor;
choosing small regions near the projected points p'_cnn and p'_sd in key frame i-1, and separately computing the normalized cross-correlation coefficient NCC_cnn between region R(p) and R(p'_cnn) and the normalized cross-correlation coefficient NCC_sd between region R(p) and R(p'_sd); if NCC_cnn is less than NCC_sd, the depth prediction result of the semi-dense depth map is better than that of the CNN, and D_sd(p) is selected as the final depth prediction value of pixel p; otherwise R_cnn(p') is selected; if some point has only the prediction result of the CNN, R_cnn(p') is used as the final depth of pixel p.
6. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that performing Gaussian fusion on the predicted depth map to obtain the final depth map is specifically: the depth map obtained by the NCC score voting method is further processed, and joint optimization is performed according to the context relationship between key frames, combined with the uncertainty map of the key frame depth maps; the final depth map is obtained through the joint optimization.
7. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that performing depth prediction on the current frame picture with the trained CNN network model to obtain the dense map further comprises:
multiplying the depth value of each pixel in the depth map by a scale coefficient,
wherein f_adapted is the focal length of the monocular camera used for obtaining training data online, B_adapted is the baseline of the binocular training pictures, and f_pre-train and B_pre-train are respectively the focal length and baseline of the pictures used for training the original CNN network model.
8. The real-time dense monocular SLAM method based on an online-learning depth prediction network according to claim 1 or 2, characterized in that the key frames are: the first picture of the whole image sequence, or the first picture obtained by the camera in real time, is defined as a key frame; apart from the first frame, some of the subsequent picture frames are also defined as key frames, wherein the principle for defining a key frame is to monitor whether the translation and rotation between the current frame and its nearest previous key frame have reached preset thresholds.
9. A real-time dense monocular SLAM system based on an online-learning depth prediction network, characterized by comprising a direct-method monocular SLAM module, an online adaptive CNN prediction module, an absolute scale regression module and a depth map fusion module, in which:
the direct-method monocular SLAM module is used for selecting key frames from the picture sequence acquired by a monocular vision sensor through rotation and translation motion, optimizing the photometric error of high-gradient points to obtain the camera pose of the key frames, and obtaining the semi-dense map of the current frame by predicting the depth of the high-gradient points through triangulation;
the online adaptive CNN prediction module is used for selecting online training picture pairs according to the key frames, performing online training on the CNN network model with the block-wise stochastic gradient descent method according to the online training picture pairs to update the model, and performing depth prediction on the current frame picture with the trained CNN network model to obtain a dense map; wherein performing online training with the block-wise stochastic gradient descent method according to the online training picture pairs to update the CNN network model is specifically:
dividing the convolutional layers of ResNet-50 into 5 blocks, the blocks being conv1, conv2_x, conv3_x, conv4_x and conv5_x; conv1 consisting of a single 7x7 convolutional layer; conv2_x consisting of a 3x3 convolutional layer and 3 bottleneck blocks, 10 layers in total; conv3_x consisting of 4 bottleneck blocks, 12 layers in total; conv4_x consisting of 6 bottleneck blocks, 18 layers in total; conv5_x consisting of 3 bottleneck blocks, 9 layers in total; the five parts together constituting the 50-layer structure of ResNet-50;
in each process of online learning and updating, each iteration k updating only the parameters W_i of one block, wherein i = 1, 2, 3, 4, 5, keeping the parameters of the remaining 4 partial network layers unchanged; in the next iteration, updating the parameters of the i-th block, wherein i = (k+1)%5, the other layer parameters remaining unchanged; the iterations of online learning and updating continuing until a preset stop condition is satisfied;
the absolute scale regression module is used for performing depth scale regression according to the semi-dense map of the current frame and the predicted dense map, obtaining the absolute scale factor of the depth information of the current frame;
the depth map fusion module is used for projecting the predicted dense map into the previous key frame through pose transformation according to the camera pose, projecting the semi-dense map into the previous key frame according to the absolute scale factor, selecting the depth prediction value of each pixel of the current frame from the two projection results with the NCC score voting method to obtain a predicted depth map, and performing Gaussian fusion on the predicted depth map to obtain the final depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711227295.6A CN107945265B (en) | 2017-11-29 | 2017-11-29 | Real-time dense monocular SLAM method and system based on on-line study depth prediction network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711227295.6A CN107945265B (en) | 2017-11-29 | 2017-11-29 | Real-time dense monocular SLAM method and system based on on-line study depth prediction network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107945265A CN107945265A (en) | 2018-04-20 |
CN107945265B true CN107945265B (en) | 2019-09-20 |
Family
ID=61947685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711227295.6A Active CN107945265B (en) | 2017-11-29 | 2017-11-29 | Real-time dense monocular SLAM method and system based on on-line study depth prediction network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107945265B (en) |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921893B (en) * | 2018-04-24 | 2022-03-25 | 华南理工大学 | Image cloud computing method and system based on online deep learning SLAM |
CN110634150B (en) * | 2018-06-25 | 2023-08-11 | 上海汽车集团股份有限公司 | Method, system and device for generating instant positioning and map construction |
CN109300151B (en) * | 2018-07-02 | 2021-02-12 | 浙江商汤科技开发有限公司 | Image processing method and device and electronic equipment |
CN109087349B (en) * | 2018-07-18 | 2021-01-26 | 亮风台(上海)信息科技有限公司 | Monocular depth estimation method, device, terminal and storage medium |
CN109034237B (en) * | 2018-07-20 | 2021-09-17 | 杭州电子科技大学 | Loop detection method based on convolutional neural network signposts and sequence search |
CN110766737B (en) * | 2018-07-26 | 2023-08-04 | 富士通株式会社 | Method and apparatus for training depth estimation model and storage medium |
CN109035319B (en) | 2018-07-27 | 2021-04-30 | 深圳市商汤科技有限公司 | Monocular image depth estimation method, monocular image depth estimation device, monocular image depth estimation apparatus, monocular image depth estimation program, and storage medium |
CN109241856A (en) * | 2018-08-13 | 2019-01-18 | 浙江零跑科技有限公司 | A kind of vehicle-mounted vision system solid object detection method of monocular |
CN109087346B (en) * | 2018-09-21 | 2020-08-11 | 北京地平线机器人技术研发有限公司 | Monocular depth model training method and device and electronic equipment |
CN111089579B (en) * | 2018-10-22 | 2022-02-01 | 北京地平线机器人技术研发有限公司 | Heterogeneous binocular SLAM method and device and electronic equipment |
CN109640068A (en) * | 2018-10-31 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Information forecasting method, device, equipment and the storage medium of video frame |
CN109341694A (en) * | 2018-11-12 | 2019-02-15 | 哈尔滨理工大学 | A kind of autonomous positioning air navigation aid of mobile sniffing robot |
CN109544630B (en) * | 2018-11-30 | 2021-02-02 | 南京人工智能高等研究院有限公司 | Pose information determination method and device and visual point cloud construction method and device |
CN111382613B (en) * | 2018-12-28 | 2024-05-07 | 中国移动通信集团辽宁有限公司 | Image processing method, device, equipment and medium |
CN113711276A (en) * | 2019-04-30 | 2021-11-26 | 华为技术有限公司 | Scale-aware monocular positioning and mapping |
CN112085842B (en) * | 2019-06-14 | 2024-04-09 | 北京京东乾石科技有限公司 | Depth value determining method and device, electronic equipment and storage medium |
CN112150529B (en) * | 2019-06-28 | 2023-09-01 | 北京地平线机器人技术研发有限公司 | Depth information determination method and device for image feature points |
CN110428461B (en) * | 2019-07-30 | 2022-07-05 | 清华大学 | Monocular SLAM method and device combined with deep learning |
CN110569877A (en) * | 2019-08-07 | 2019-12-13 | 武汉中原电子信息有限公司 | Non-invasive load identification method and device and computing equipment |
CN110610486B (en) * | 2019-08-28 | 2022-07-19 | 清华大学 | Monocular image depth estimation method and device |
CN110599542A (en) * | 2019-08-30 | 2019-12-20 | 北京影谱科技股份有限公司 | Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area |
CN110717917B (en) * | 2019-09-30 | 2022-08-09 | 北京影谱科技股份有限公司 | CNN-based semantic segmentation depth prediction method and device |
CN110738699A (en) * | 2019-10-12 | 2020-01-31 | 浙江省北大信息技术高等研究院 | unsupervised absolute scale calculation method and system |
CN111062981B (en) * | 2019-12-13 | 2023-05-05 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111179326B (en) * | 2019-12-27 | 2020-12-29 | 精英数智科技股份有限公司 | Monocular depth estimation method, system, equipment and storage medium |
CN111127522B (en) * | 2019-12-30 | 2024-02-06 | 亮风台(上海)信息科技有限公司 | Depth optical flow prediction method, device, equipment and medium based on monocular camera |
CN111260706B (en) * | 2020-02-13 | 2023-04-25 | 青岛联合创智科技有限公司 | Dense depth map calculation method based on monocular camera |
CN111462329B (en) * | 2020-03-24 | 2023-09-29 | 南京航空航天大学 | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning |
CN111783968B (en) * | 2020-06-30 | 2024-05-31 | 山东信通电子股份有限公司 | Power transmission line monitoring method and system based on cloud edge cooperation |
CN111784757B (en) * | 2020-06-30 | 2024-01-23 | 北京百度网讯科技有限公司 | Training method of depth estimation model, depth estimation method, device and equipment |
CN112308911A (en) * | 2020-10-26 | 2021-02-02 | 中国科学院自动化研究所 | End-to-end visual positioning method and system |
CN112149645A (en) * | 2020-11-10 | 2020-12-29 | 西北工业大学 | Human body pose keypoint recognition method based on generative adversarial learning and graph neural networks |
CN112612476A (en) * | 2020-12-28 | 2021-04-06 | 吉林大学 | SLAM control method, equipment and storage medium based on GPU |
CN112767480A (en) * | 2021-01-19 | 2021-05-07 | 中国科学技术大学 | Monocular vision SLAM positioning method based on deep learning |
CN112862959B (en) * | 2021-03-23 | 2022-07-12 | 清华大学 | Real-time probability monocular dense reconstruction method and system based on semantic prior |
CN113971760B (en) * | 2021-10-26 | 2024-02-06 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
CN114820755B (en) * | 2022-06-24 | 2022-10-04 | 武汉图科智能科技有限公司 | Depth map estimation method and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358624A (en) * | 2017-06-06 | 2017-11-17 | 武汉几古几古科技有限公司 | Monocular dense simultaneous localization and map reconstruction method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9674507B2 (en) * | 2013-04-30 | 2017-06-06 | Qualcomm Incorporated | Monocular visual SLAM with general and panorama camera movements |
- 2017
  - 2017-11-29: CN application CN201711227295.6A, granted as patent CN107945265B (status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358624A (en) * | 2017-06-06 | 2017-11-17 | 武汉几古几古科技有限公司 | Monocular dense simultaneous localization and map reconstruction method |
Also Published As
Publication number | Publication date |
---|---|
CN107945265A (en) | 2018-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945265B (en) | Real-time dense monocular SLAM method and system based on online-learning depth prediction network | |
CN109387204B (en) | Mobile robot simultaneous localization and mapping method for indoor dynamic environments | |
CN107392964B (en) | Indoor SLAM method combining indoor feature points and structural lines | |
CN108596974B (en) | Dynamic scene robot positioning and mapping system and method | |
CN105809687B (en) | Monocular visual ranging method based on edge point information in images | |
Bozic et al. | Neural deformation graphs for globally-consistent non-rigid reconstruction | |
US9613420B2 (en) | Method for locating a camera and for 3D reconstruction in a partially known environment | |
CN112505065B (en) | Method for detecting surface defects of large part by indoor unmanned aerial vehicle | |
CN104537709B (en) | Real-time three-dimensional reconstruction keyframe determination method based on pose changes | |
CN109558879A (en) | Visual SLAM method and apparatus based on point-line features | |
CN109974693A (en) | Unmanned aerial vehicle localization method, device, computer equipment and storage medium | |
CN108090958A (en) | Robot simultaneous localization and mapping method and system | |
CN107204010A (en) | Monocular image depth estimation method and system | |
CN110945565A (en) | Dense visual SLAM using probabilistic bin maps | |
CN107909150B (en) | Method and system for online CNN training based on block-wise stochastic gradient descent | |
CN106846417A (en) | Monocular infrared video three-dimensional reconstruction method based on visual odometry | |
CN110378997A (en) | Dynamic-scene mapping and localization method based on ORB-SLAM2 | |
CN109087329A (en) | Human body three-dimensional joint point estimation framework and localization method based on deep networks | |
CN106875482A (en) | Simultaneous localization and dense three-dimensional reconstruction method | |
CN106934827A (en) | Three-dimensional scene reconstruction method and device | |
CN103400409A (en) | 3D (three-dimensional) visualization method for coverage range based on fast camera pose estimation | |
Tang et al. | Joint multi-view people tracking and pose estimation for 3D scene reconstruction | |
CN113256698B (en) | Monocular 3D reconstruction method with depth prediction | |
CN109859266A (en) | Simultaneous visual localization and mapping method under large viewpoint changes based on pre-transformation | |
KR20210058686A (en) | Device and method of implementing simultaneous localization and mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
EE01 | Entry into force of recordation of patent licensing contract ||
Application publication date: 20180420
Assignee: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.
Assignor: HUAZHONG University OF SCIENCE AND TECHNOLOGY
Contract record no.: X2023990000439
Denomination of invention: Real-time dense monocular SLAM method and system based on online-learning depth prediction network
Granted publication date: 20190920
License type: Exclusive License
Record date: 20230428