CN107945265A - Real-time dense monocular SLAM method and system based on an online-learning depth prediction network - Google Patents
- Publication number
- CN107945265A CN201711227295.6A
- Authority
- CN
- China
- Prior art keywords
- depth
- picture
- cnn
- training
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
Abstract
The invention discloses a real-time dense monocular SLAM method based on an online-learned depth prediction network. The camera pose of each keyframe is obtained by minimizing the photometric error of high-gradient points, and a semi-dense map of the current frame is obtained by predicting the depth of the high-gradient points via triangulation. Online training image pairs are selected, the CNN model is updated by online training with a block-wise stochastic gradient descent method, and the updated CNN model performs depth prediction on the current frame to obtain a dense map. A depth-scale regression between the semi-dense map of the current frame and the predicted dense map yields the absolute scale factor of the current frame's depth. An NCC-score voting method then selects, from the two projections, a depth prediction for each pixel of the current frame to obtain a predicted depth map, and Gaussian fusion of the predicted depth map produces the final depth map. The invention also provides a corresponding real-time dense monocular SLAM system based on an online-learned depth prediction network.
Description
Technical field
The invention belongs to the technical field of computerized 3D visual reconstruction, and more particularly relates to a real-time dense monocular SLAM method and system based on an online-learned depth prediction network.
Background art
Simultaneous Localization And Mapping (SLAM) can estimate the pose of a sensor in real time and reconstruct a 3D map of the surrounding environment, and therefore plays an important role in fields such as UAV obstacle avoidance and augmented reality. A SLAM system that relies on a single camera as its only input sensor is called a monocular SLAM system. Monocular SLAM features low power consumption, a low hardware threshold and easy operation, and is therefore widely used by researchers. However, existing popular monocular SLAM systems, whether feature-based, such as PTAM (Parallel Tracking And Mapping For Small AR Workspaces) and ORB-SLAM (ORB-SLAM: A Versatile And Accurate Monocular SLAM System), or direct, such as LSD-SLAM (LSD-SLAM: Large-Scale Direct Monocular SLAM), suffer from two main problems: (1) only a sparse or semi-dense map of the scene can be constructed, because depth can be computed only for a small number of keypoints or high-gradient points; (2) the scale is unobservable, so scale drift occurs.
In recent years, deep convolutional neural networks (Convolutional Neural Networks, CNN) for monocular depth estimation have made great progress. Their principle is to learn, from a large amount of training data, the intrinsic relations between depth and object shape, texture, scene semantics, scene context and so on, and thereby accurately predict the depth of an image input to the network. Combining a CNN with monocular SLAM not only improves the completeness of the reconstructed map, but also recovers absolute scale information, compensating for the shortcomings of monocular SLAM. At present, the most successful combination of the two is CNN-SLAM (CNN-SLAM: Real-Time Dense Monocular SLAM With Learned Depth Prediction). That system uses the CNN depth prediction as the initial depth of each SLAM keyframe, and then refines the depth of the high-gradient points in the keyframe by pixel matching, triangulation and graph optimization, thereby obtaining a dense 3D reconstruction whose scale is closer to the true one. Although it achieves a certain effect, the system still has the following problems: (1) only the depth values of a few high-gradient pixels are optimized, while the depth values of most low-gradient pixels remain unchanged, leading to unsatisfactory reconstruction, especially in unknown scenes; (2) predicting the scale from the CNN depth of high-gradient pixels alone is not accurate enough, so the initialization is insufficient, which increases the mapping and tracking error of the SLAM system.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method and system combining an online-learned depth prediction network with monocular SLAM. Its purpose is to make full use of the advantages of deep convolutional neural networks to achieve dense depth estimation for monocular SLAM keyframes and to recover the true scale of the scene from the result, thereby solving the technical problems that traditional monocular SLAM lacks scale information and cannot achieve dense mapping.
To achieve the above object, according to one aspect of the invention, a real-time dense monocular SLAM method based on an online-learned depth prediction network is provided, including:
(1) selecting keyframes from the image sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtaining the camera pose of each keyframe by minimizing the photometric error of high-gradient points, and obtaining a semi-dense map of the current frame by predicting the depth of the high-gradient points via triangulation;
(2) selecting online training image pairs according to the keyframes, performing online training on the training pairs with a block-wise stochastic gradient descent method to update the CNN model, and performing depth prediction on the current frame with the updated CNN model to obtain a dense map;
(3) performing depth-scale regression between the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the current frame's depth;
(4) projecting the predicted dense map into the previous keyframe through the pose transformation given by the camera pose, projecting the semi-dense map into the previous keyframe according to the absolute scale factor, selecting a depth prediction for each pixel of the current frame from the two projections with an NCC-score voting method to obtain a predicted depth map, and performing Gaussian fusion on the predicted depth map to obtain the final depth map.
In an embodiment of the invention, selecting online training image pairs according to the keyframes specifically comprises screening, among the frames before and after a keyframe, an image frame that forms a pair with the keyframe under the following constraints:
First, camera-motion constraint: the horizontal displacement between the two frames satisfies |tx| > 0.9*T, where T denotes the baseline distance between the two frames;
Second, disparity constraint: for each candidate pair, the mean vertical disparity Dis_avg between the two images is computed with an optical-flow method, and the pair is saved as a candidate training pair only when Dis_avg is below a preset threshold δ;
Third, diversity constraint: each keyframe can produce at most one training pair;
Fourth, training-pool capacity constraint: when the number of training pairs reaches a preset threshold V, the images in the training pool are sent to the network, the network is trained online, the resulting network model is saved, and the pool is emptied so that the screening of training data continues.
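The four screening constraints above can be sketched as follows; the function names, the threshold values, and the exact representation of a "pair" are illustrative assumptions, not taken from the patent.

```python
def select_training_pair(tx, T, dis_avg, key_id, used_keys, delta=1.0):
    """Apply the camera-motion, disparity and diversity constraints to one
    candidate pair formed by a keyframe and a nearby frame."""
    if abs(tx) <= 0.9 * T:      # camera-motion constraint: |tx| > 0.9*T
        return False
    if dis_avg >= delta:        # vertical-disparity constraint: Dis_avg < delta
        return False
    if key_id in used_keys:     # diversity constraint: one pair per keyframe
        return False
    used_keys.add(key_id)
    return True

class TrainingPool:
    """Pool that triggers one online-training round when V pairs are collected
    (capacity constraint), then empties itself and keeps screening."""
    def __init__(self, V):
        self.V, self.pairs = V, []

    def add(self, pair):
        self.pairs.append(pair)
        if len(self.pairs) >= self.V:
            batch, self.pairs = self.pairs, []  # send batch to the network, empty pool
            return batch                        # caller runs online training on it
        return None
```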
In an embodiment of the invention, updating the CNN model by online training with the block-wise stochastic gradient descent method specifically comprises:
The convolutional layers of ResNet-50 are divided into 5 blocks, denoted conv1, conv2_x, conv3_x, conv4_x and conv5_x. conv1 consists of a single full 7x7 convolutional layer; conv2_x consists of a 3x3 convolutional layer and 3 bottleneck blocks, 10 layers in total; conv3_x consists of 4 bottleneck blocks, 12 layers in total; conv4_x consists of 6 bottleneck blocks, 18 layers in total; conv5_x consists of 3 bottleneck blocks, 9 layers in total. Together the five parts constitute the 50-layer structure of ResNet-50.
During each round of online learning and updating, each iteration k updates the parameters W_i (i = 1, 2, 3, 4, 5) of only one block while keeping the parameters of the remaining 4 blocks fixed; the next iteration updates the i-th block with i = (i + 1) % 5, the other layer parameters remaining unchanged. The iterations of online learning and updating continue until a preset stopping condition is satisfied.
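The update cycle above can be sketched in isolation; no actual network is built here, and the scheduling function is illustrative (only the block names and the i = (i + 1) % 5 rule come from the text).

```python
RESNET50_BLOCKS = ["conv1", "conv2_x", "conv3_x", "conv4_x", "conv5_x"]

def blockwise_sgd_schedule(num_iters):
    """For each iteration k, return the name of the single block whose
    parameters W_i would be updated; the other four blocks stay frozen."""
    schedule, i = [], 0
    for k in range(num_iters):
        schedule.append(RESNET50_BLOCKS[i])   # only block i is trainable this step
        i = (i + 1) % len(RESNET50_BLOCKS)    # next iteration moves to the next block
    return schedule
```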
In an embodiment of the invention, the online training update of the CNN model is selective, specifically:
The training loss is computed for every batch of images input to the CNN model. Once the losses of all images in a batch exceed a preset threshold L_high, the online learning and updating process is started; the process continues until the training loss drops below a threshold L_low, or the number of iterations reaches a preset limit.
In an embodiment of the invention, the depth-scale regression method is the RANSAC algorithm or a least-squares algorithm.
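For a single multiplicative scale factor, both options mentioned above have very small implementations; the closed-form least-squares solution and the one-point RANSAC hypothesis used here are illustrative choices, not details given in the patent.

```python
import random

def lsq_scale(d_semi, d_cnn):
    """Closed-form least-squares scale s minimizing sum (d_semi - s*d_cnn)^2
    over pixels where both the semi-dense and the CNN depths are available."""
    num = sum(a * b for a, b in zip(d_semi, d_cnn))
    den = sum(b * b for b in d_cnn)
    return num / den

def ransac_scale(d_semi, d_cnn, iters=100, thresh=0.1, seed=0):
    """RANSAC variant: each hypothesis is the ratio at one sampled pixel;
    keep the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_s, best_inl, n = 1.0, -1, len(d_cnn)
    for _ in range(iters):
        j = rng.randrange(n)
        s = d_semi[j] / d_cnn[j]                              # 1-point hypothesis
        inl = sum(abs(d_semi[i] - s * d_cnn[i]) < thresh for i in range(n))
        if inl > best_inl:
            best_s, best_inl = s, inl
    return best_s
```

With clean data both give the same scale; RANSAC additionally tolerates gross outliers in the semi-dense depths.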
In an embodiment of the invention, projecting the predicted dense map into the previous keyframe through the pose transformation, projecting the semi-dense map into the previous keyframe according to the absolute scale factor, and selecting a depth prediction for each pixel of the current frame from the two projections with the NCC-score voting method specifically comprises:
Each pixel p in keyframe i is projected, according to the dense map D_cnn(p) predicted by the CNN and the pose transformation, into the nearest keyframe i-1; the projection result is denoted p'_cnn. The pixel p in keyframe i is also projected a second time into keyframe i-1, denoted p'_sd; this projection is based on the semi-dense result D_sp(p) and the absolute scale factor.
Small regions are chosen around the projected points p'_cnn and p'_sd in keyframe i-1, and the normalized cross-correlation NCC_cnn between region R(p) and R_cnn(p'), and the normalized cross-correlation NCC_sd between region R(p) and R_sd(p'), are computed. If NCC_cnn is smaller than NCC_sd, the depth prediction of the semi-dense depth map is better than that of the CNN, and D_sp(p) is selected as the final depth prediction of pixel p; otherwise the CNN prediction is selected. If a point only has a CNN prediction, the CNN value is used as the final depth of pixel p.
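The per-pixel vote can be sketched as follows; patches are represented as flat lists of intensities, and the function names are illustrative.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def vote_depth(ref_patch, cnn_patch, sd_patch, d_cnn, d_sd):
    """Keep the semi-dense depth when its projected patch correlates better
    with the reference patch than the CNN projection does (NCC_cnn < NCC_sd);
    otherwise keep the CNN depth. If only the CNN prediction exists
    (d_sd is None), use it directly."""
    if d_sd is None:
        return d_cnn
    return d_sd if ncc(ref_patch, cnn_patch) < ncc(ref_patch, sd_patch) else d_cnn
```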
In an embodiment of the invention, performing Gaussian fusion on the predicted depth map to obtain the final depth map specifically comprises:
The depth map obtained by the NCC-score voting method is further processed: according to the contextual relations between keyframes, and combining the uncertainty maps of the keyframe depth maps, a joint optimization is performed, and the final depth map is obtained by this joint optimization.
In an embodiment of the invention, performing depth prediction on the current frame with the updated CNN model to obtain the dense map further comprises:
multiplying the depth value of each pixel in the depth map by a scale coefficient,
where f_adapted is the focal length of the monocular camera used to acquire the online training data, B_adapted is the baseline of the binocular training images, and f_pre-train and B_pre-train are, respectively, the focal length and baseline of the images used to train the original CNN model.
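The patent's formula for this scale coefficient is given as an image that is not reproduced in this text. Under the usual stereo relation depth = f * B / disparity, a network trained with geometry (f_pre-train, B_pre-train) and deployed with (f_adapted, B_adapted) would plausibly need the ratio of the two products; the sketch below assumes exactly that and should be read as an illustration, not as the patent's formula.

```python
def adapt_scale(depth, f_adapted, B_adapted, f_pretrain, B_pretrain):
    """Rescale a predicted depth map from the pre-training camera geometry to
    the online camera geometry, assuming depth = f * B / disparity."""
    k = (f_adapted * B_adapted) / (f_pretrain * B_pretrain)
    return [d * k for d in depth]
```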
In an embodiment of the invention, the keyframes are defined as follows: the first picture of the whole image sequence, or the first picture acquired by the camera in real time, is a keyframe; apart from the first frame, some later image frames are also defined as keyframes. The principle for defining a keyframe is to monitor whether the translation and rotation between the current frame and the nearest previous keyframe have reached preset thresholds.
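The keyframe rule above can be sketched as follows. The sketch accumulates per-frame translation and rotation magnitudes since the last keyframe as a stand-in for the motion relative to that keyframe, which is an approximation; the thresholds and the "or" combination of the two tests are also assumptions.

```python
def keyframe_indices(motions, trans_thresh, rot_thresh):
    """Return the indices of keyframes in a sequence. The first frame is
    always a keyframe; a later frame becomes one when the accumulated
    translation or rotation since the last keyframe crosses its threshold."""
    keys = [0]                       # the first picture is always a keyframe
    acc_t = acc_r = 0.0
    for i, (t, r) in enumerate(motions[1:], start=1):
        acc_t += t
        acc_r += r
        if acc_t > trans_thresh or acc_r > rot_thresh:
            keys.append(i)
            acc_t = acc_r = 0.0      # restart monitoring from the new keyframe
    return keys
```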
According to another aspect of the invention, a real-time dense monocular SLAM system based on an online-learned depth prediction network is also provided, comprising a direct-method monocular SLAM module, an online adaptive CNN prediction module, an absolute-scale regression module and a depth-map fusion module, wherein:
the direct-method monocular SLAM module selects keyframes from the image sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtains the camera pose of each keyframe by minimizing the photometric error of high-gradient points, and obtains a semi-dense map of the current frame by predicting the depth of the high-gradient points via triangulation;
the online adaptive CNN prediction module selects online training image pairs according to the keyframes, performs online training on the training pairs with a block-wise stochastic gradient descent method to update the CNN model, and performs depth prediction on the current frame with the updated CNN model to obtain a dense map;
the absolute-scale regression module performs depth-scale regression between the semi-dense map of the current frame and the predicted dense map to obtain the absolute scale factor of the current frame's depth;
the depth-map fusion module projects the predicted dense map into the previous keyframe through the pose transformation given by the camera pose, projects the semi-dense map into the previous keyframe according to the absolute scale factor, selects a depth prediction for each pixel of the current frame from the two projections with an NCC-score voting method to obtain a predicted depth map, and performs Gaussian fusion on the predicted depth map to obtain the final depth map.
In general, compared with the prior art, the above technical scheme of the present invention has the following beneficial effects: the monocular SLAM uses a direct method and obtains the semi-dense map of the scene and the camera pose by optimization; the online adaptive CNN employs a weakly supervised depth prediction network and updates it online according to the scene information, so that the network also performs well in unknown scenes; the depth-scale regression recovers the scale of the depth values, improving the accuracy of the 3D reconstruction; the data fusion employs region voting and Gaussian fusion, improving the precision of the result while guaranteeing the completeness of the map.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the real-time dense monocular SLAM method based on an online-learned depth prediction network in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the triangulation model in an embodiment of the present invention;
Fig. 3 shows the constraints used to screen training images in an embodiment of the present invention, wherein (a) shows the image pair of the first pixel correspondence and (b) the image pair of the second pixel correspondence;
Fig. 4 is a schematic diagram of the scale-coefficient adjustment in an embodiment of the present invention, the upper half showing the original network structure and the lower half the improvement of the present invention to the network;
Fig. 5 is a schematic diagram of the block-wise gradient descent method (block-wise SGD) in an embodiment of the present invention;
Fig. 6 shows the scale regression and its effect in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the real-time dense monocular SLAM system based on an online-learned depth prediction network in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical schemes and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.
The problem to be solved by the present invention is to realize a real-time dense monocular mapping SLAM system. By combining an adaptive online CNN depth prediction network with a direct-method monocular SLAM system, the system not only significantly improves the accuracy and robustness of depth prediction in unknown scenes, but also solves the scale-ambiguity problem of monocular SLAM systems.
To achieve these goals, the present invention combines a CNN with SLAM and, addressing the problems of monocular SLAM, proposes an algorithm with better accuracy and stronger robustness. The main innovations of the scheme include:
(1) An adaptive online CNN depth prediction network is employed; this is also the first time in the whole field that this type of network is combined with a monocular SLAM system, which greatly improves the accuracy of the system's depth prediction in unknown scenes;
(2) A "block-wise stochastic gradient descent" (block-wise SGD) method and a selective-update strategy are proposed, which let the CNN obtain better depth prediction results under limited training data;
(3) An absolute-scale regression method based on the adaptive network is designed, which greatly improves the accuracy of the depth prediction and the precision of the tracking and mapping of the whole system.
The system mainly consists of four parts: direct-method monocular SLAM, online adaptive CNN, depth-scale regression and data fusion; the block diagram of the method is shown in Fig. 1. The monocular SLAM uses a direct method and, based on it, obtains the semi-dense map and the camera pose of the scene by optimization; the online adaptive CNN employs a weakly supervised depth prediction network and updates it online according to the scene information, so that the network also performs well in unknown scenes; the depth-scale regression recovers the scale of the depth values, improving the accuracy of the 3D reconstruction; the data fusion employs region voting and Gaussian fusion, improving the precision of the result while guaranteeing the completeness of the map.
Specifically, the method includes the following processes:
(1) Direct-method monocular SLAM: this part is a modification built on LSD-SLAM. By minimizing the photometric error of the high-gradient points, the camera pose of each frame is obtained by optimization, and the depth of the high-gradient points is predicted by triangulation, thereby obtaining a semi-dense map.
Image acquisition: the method is based on a monocular vision sensor. When acquiring images, the monocular camera is required to undergo both rotation and translation, with the amplitude of the translation suitably increased. There are two main reasons for this: first, if there were only static and purely rotational motion, the initialization of this part or the image tracking would likely fail, causing the whole system to malfunction; second, suitably increasing the amplitude of the translation helps the system select suitable training images, ensuring that the online training and CNN update proceed normally.
Keyframe definition: the monocular SLAM part defines the first picture of the whole sequence, or of the real-time camera stream, as a keyframe; apart from the first frame, some later image frames are also defined as keyframes. The principle for defining a keyframe is to monitor whether the translation and rotation between the current frame and the nearest previous keyframe have reached a preset threshold. The keyframe-based algorithm structure is the basis of the back-end optimization of direct-method monocular SLAM and also an important framework of the network part, and therefore deserves special introduction.
Camera pose tracking: the motion of the camera in three-dimensional space has six degrees of freedom, and the motion within a time Δt can be represented by a six-dimensional vector ξ = [ν(1) ν(2) ν(3) ψ(1) ψ(2) ψ(3)]^T, where [ν(1) ν(2) ν(3)]^T represents the translation components of the rigid motion along the three coordinate axes and is a vector in the Euclidean space R^3, and [ψ(1) ψ(2) ψ(3)]^T represents the rotation components of the rigid motion along the three axes and is a vector in the non-Euclidean three-dimensional rotation group SO(3). Vision-based camera tracking is the process of solving for ξ from the visual information. The monocular SLAM of the present invention tracks the camera pose with a direct method: all points of image A that have depth information are projected into image B to obtain a new picture B', and the position change of B relative to A is obtained by optimizing the sum of the differences of the gray values at all positions between B' and B (the photometric error). The direct method copes well with viewpoint changes, illumination changes and sparsely textured scenes, and is currently a popular approach, so this project realizes camera pose tracking based on a direct method.
Specifically, the key idea of using the direct method for camera pose tracking is to find, between the current frame n and the nearest keyframe k, an optimal camera pose ξ* such that the photometric error between the current frame n and the keyframe k is minimal. Uniform regions are likely to cause inaccurate pixel matching between frames, because different camera poses ξ may produce similar photometric errors. To obtain robust tracking results and reduce the time spent on optimization, the photometric error r is computed only at the high-gradient points {p} of keyframe k, as follows:
r(p, ξ) = I_k(p) − I_n(π(T_ξ · π^(−1)(p, D(p))))    (1)
where D(p) represents the depth value of the high-gradient pixel p, and π is the projection model, which projects a 3D point P_c in the camera coordinate system to a 2D image-plane pixel p; π is determined by the camera intrinsics K. Similarly, π^(−1) is the back-projection model, which projects a pixel of the 2D plane into 3D space. The optimized camera pose ξ* is computed by minimizing the photometric error r of all high-gradient pixels, as follows:
ξ* = argmin_ξ Σ_p w_p · r(p, ξ)^2    (2)
where w_p is a weight for pixel p used to improve robustness and reduce the influence of outliers.
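The text does not specify the weighting function w_p; direct-method systems commonly use a Huber-style influence weight that leaves small residuals untouched and down-weights large ones. A minimal sketch under that assumption:

```python
def huber_weight(r, delta):
    """Huber influence weight for residual r: 1 inside the inlier band
    |r| <= delta, decaying as delta/|r| outside, which limits the pull of
    outliers on the optimization."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a

def weighted_photometric_cost(residuals, delta):
    """Robustified cost sum of w_p * r_p^2 over the high-gradient pixels."""
    return sum(huber_weight(r, delta) * r * r for r in residuals)
```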
The problem of equation (2) can be solved by a standard Gauss-Newton optimization algorithm. Of course, camera pose tracking by the above method drifts because of the accumulation of error, but this drift can be eliminated by additionally adding loop-closure detection; this project intends to use a loop-closure detection method based on a bag-of-words model to solve the drift problem caused by accumulated error.
Semi-dense depth estimation: in the mapping thread of monocular direct-method SLAM systems, the depth values of the high-gradient pixels are estimated by a small-baseline stereo method, that is, by pixel matching followed by triangulation. Specifically, the model of feature matching and triangulation is illustrated in Fig. 2: C and C' are the camera-coordinate origins of the keyframe and the reference frame, respectively, X is the 3D point whose depth is to be computed, and m and m' are the projections of X on the projection planes of cameras C and C'. Because, in monocular vision, the keyframe and the reference frame come from the same camera, their projection intrinsics are identical. If the rotation and translation [R, t] between the two camera coordinate systems obtained by the visual method are given, the following holds:
z_c · (u, v, 1)^T = K · [I | 0] · (x_c, y_c, z_c, 1)^T
z_c' · (u', v', 1)^T = K · [R | t] · (x_c, y_c, z_c, 1)^T
where f_x, f_y, c_x, c_y, s are the intrinsics of the camera (the entries of K); R and t are a 3x3 and a 3x1 matrix, respectively, representing the rotation and translation of the camera C' coordinate system relative to the camera C coordinate system; (x_c, y_c, z_c)^T and (x_c', y_c', z_c')^T are the coordinates of point X in camera coordinate systems C and C'; and (u, v)^T and (u', v')^T are the pixel coordinates of the projections of X on the projection planes of cameras C and C'. Since the intrinsic matrix is a known quantity after camera calibration, and [R, t] is obtained by the localization above, the entries (m11 ... m34) of the projection matrices are known quantities, so the formula simplifies to:
A · (x_c, y_c, z_c)^T = b
The system contains 3 unknowns and 4 equations and is overdetermined; its least-squares solution X̂ is sought such that ‖A · X̂ − b‖^2 is minimized.
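The least-squares triangulation of the overdetermined system A(x_c, y_c, z_c)^T = b can be sketched as follows; the function builds two rows per view from the projection matrix P = K[R|t] and the observed pixel, then solves with NumPy. The test geometry below is illustrative.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Least-squares triangulation of one 3D point from two 3x4 projection
    matrices P = K[R|t] and the observed pixels (u, v) in each view: the
    rows u*P[2]-P[0] and v*P[2]-P[1] give 4 equations in 3 unknowns."""
    A, b = [], []
    for P, (u, v) in ((np.asarray(P1, float), uv1), (np.asarray(P2, float), uv2)):
        for row, coord in ((P[0], u), (P[1], v)):
            A.append(coord * P[2, :3] - row[:3])   # coefficients of (xc, yc, zc)
            b.append(row[3] - coord * P[2, 3])     # constant term moved to the rhs
    X, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return X
```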
Once a new keyframe k is created, its depth map D_pri and the uncertainty U_pri of the depth prediction are first initialized by projecting the depth D_{k-1} and the uncertainty U_{k-1} of the (k-1)-th keyframe into the current keyframe, as follows:
D_pri = D_{k-1} − t_z    (6)
where t_z is the translation of the camera along the optical axis and σ^2 represents the variance of the initialization noise. The initialized depth map is continually refined from later image frames. The refinement first searches along the epipolar line for the pixel in the current frame that matches each high-gradient pixel p of keyframe k, where the search interval on the epipolar line is determined by the depth uncertainty of pixel p; once the matching pixel is found, the depth value of p can be computed by triangulation. The present invention denotes the whole process of pixel matching and triangulation by a function F; based on the depth value obtained from F, the observation D_obs of the present invention can be expressed as:
D_obs = F(I_k, I_cur, T_{k→cur}, K)
where I_k and I_cur represent keyframe k and the current image frame, T_{k→cur} represents the camera motion from keyframe k to the current frame, and K represents the intrinsic matrix of the camera. The uncertainty U_obs of the depth observation D_obs is produced by noise, which is present in the pixel matching between I_k and I_cur and in the estimation of the camera motion T_{k→cur}. The refined depth map and its corresponding uncertainty are in fact a fusion of the initial depth information and the observed depth information, obeying the following distribution:
N( (U_obs · D_pri + U_pri · D_obs) / (U_pri + U_obs), (U_pri · U_obs) / (U_pri + U_obs) )
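The fusion of the prior and observed depths described above is the usual product of two Gaussians used for depth filtering in semi-dense SLAM; a minimal sketch, assuming U_pri and U_obs denote variances:

```python
def fuse_gaussian(d_pri, u_pri, d_obs, u_obs):
    """Fuse a prior depth and an observed depth, each modeled as a Gaussian
    with mean d and variance u, into the posterior mean and variance."""
    d = (u_obs * d_pri + u_pri * d_obs) / (u_pri + u_obs)  # inverse-variance weighting
    u = (u_pri * u_obs) / (u_pri + u_obs)                  # fused variance shrinks
    return d, u
```

Note that the less certain estimate contributes less: with a very large prior variance, the fused depth essentially equals the observation.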
(2) Online adaptive training of the CNN, and prediction of dense depth maps for keyframes with the CNN:
Online adaptive CNN: first, the present invention performs single-image depth estimation based on a state-of-the-art weakly supervised method. The network architecture of this weakly supervised method mainly consists of two parts: the first part is the full convolutional layers (ConvLayers) of ResNet-50; in the second part, the rear pooling layer and fully connected layers (FCLayers) of ResNet-50 are replaced by a series of up-sampling blocks composed of deconvolution layers and skip connections. Training the whole CNN requires rectified stereo image pairs, whose baseline B_pre-train and camera focal length f_pre-train are fixed. The output of the network is a disparity map, from which a reconstruction of the source image can be generated; the photometric error between the source image and the reconstructed image, plus a smoothness term, constitutes the loss function of the whole network. In the experiments, a series of keyframes {I_1, ..., I_j}, with inter-image translations {T_{1,2}, ..., T_{i-1,i}, ..., T_{j-1,j}} and rotation changes {R_{1,2}, ..., R_{i-1,i}, ..., R_{j-1,j}}, is obtained from the monocular SLAM system; using this information as ground truth, the present invention learns depth maps of the two-dimensional images that minimize the reprojection error between any two keyframes.
Pre-training the CNN: The online adaptive CNN of the invention is partly based on a pre-trained CNN network model. When pre-training this model, the invention follows the conventional CNN training procedure and uses 6510 picture pairs from the CMU dataset (the Wean Hall dataset) plus 35588 picture pairs recorded in our own laboratory scene, 42098 pairs in total, as the training set. The baseline of the training-set pictures is 0.12 meters; the laboratory-scene pictures were captured with a ZED stereo camera, and random color, scale and mirror transformations were applied to the training set for data augmentation. After processing, all training pictures were fed into the network for 40000 iterations at a learning rate of 0.0001, yielding the desired pre-trained model; the whole system then performs online learning and updating on top of this model.
Under a single video scene sequence, every CNN training run saves and updates the network model, and depth maps are generated with the new model. The online adaptive CNN strategy consists of the following four parts:
1) Online screening of training pictures: The depth prediction network needs pairs of pictures taken by a stereo camera as training pictures, and these stereo pictures have a fixed baseline B_pre-train. To train and update the CNN network model in real time, the invention collects paired monocular pictures during monocular camera motion to simulate stereo pictures according to the geometry of a binocular camera. Reliable training pictures are collected under strict requirements, in order to reduce over-fitting of the CNN network model to erroneous samples caused by noise. The invention designs four main screening conditions. First, the camera-motion constraint: the horizontal displacement between two picture frames must satisfy |t_x| > 0.9*T, where T denotes the baseline distance between the two frames. Second, the parallax constraint: for every picture pair, the mean vertical disparity Dis_avg between the pictures is computed with an optical-flow method, and the pair is saved as a candidate training picture only when Dis_avg is below a threshold δ (taken as 5 in the experiments). The effect is shown in Fig. 3: (a) and (b) are two picture pairs; when the pixel correspondences within a pair follow the relation shown in (a), the pair is kept as a candidate training pair, whereas a pair whose correspondences follow (b) is discarded. Third, the diversity constraint: each screened training pair corresponds uniquely to a key-frame picture, i.e. the same key frame can produce at most one training pair. Fourth, the training-pool capacity constraint: whenever the number of training pairs reaches a threshold V (taken as 4 in the experiments), the pictures in the training pool are sent to the network for online training, the network model obtained by training is saved, and the pool is emptied so that screening of training data can continue;
2) Camera-parameter adjustment: The focal length f_adapted of the monocular camera used to acquire training data online and the baseline B_adapted of the simulated binocular training pictures may differ considerably from the focal length f_pre-train and baseline B_pre-train of the pictures used to train the original CNN network model. The relation between the camera parameters and the scene depth values is implicitly absorbed into the network structure, so if pictures with a different focal length are fed to the network at test time, the absolute scale of the resulting 3D reconstruction may be inaccurate. The whole network would therefore need to be re-adjusted to accommodate each change of camera parameters, which would slow down every online-learning update. To solve this problem, a new approach of adjusting the output depth map is proposed; the basic idea, illustrated in Fig. 4, is to multiply the depth value of each pixel in the depth map by a scale coefficient computed from f_adapted, B_adapted, f_pre-train and B_pre-train, so as to guarantee the accuracy of the depth map;
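Assuming the standard stereo relation depth = f·B/disparity, the per-pixel correction reduces to one multiplicative factor. The patent's exact coefficient (shown in its Fig. 4) is not reproduced in this text, so the form (f_adapted·B_adapted)/(f_pre-train·B_pre-train) below is our assumption, consistent with the four quantities the patent names:

```python
def rescale_depth(depth, f_adapted, b_adapted, f_pretrain, b_pretrain):
    """Rescale a depth map predicted under (f_pretrain, b_pretrain) to the
    online camera (f_adapted, b_adapted).

    Assumed form: since depth = f * B / disparity and the network encodes the
    training-time f and B implicitly, the correction is a single multiplier.
    """
    s = (f_adapted * b_adapted) / (f_pretrain * b_pretrain)
    return [d * s for d in depth]
```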
3) Block-wise SGD method: Stochastic gradient descent (SGD) is one of the most widely used optimization algorithms in mainstream deep learning. Its main idea is to divide the training dataset into n batches of m samples each; every update of the network parameters then uses only the data of one batch rather than the whole training set.
Advantages: when there is a great deal of training data, using batches reduces the load on the machine and allows fast convergence; when the training set contains much redundancy (similar samples appearing multiple times), batch methods converge faster.
Disadvantage: it easily converges to a local optimum rather than the global optimum.
The proposed block-wise gradient descent method (block-wise SGD) is an innovative improvement over stochastic gradient descent (SGD).
The invention uses ResNet-50 to extract feature information at different levels from the picture; this feature information is subsequently encoded into the disparity map through a series of down-sampling operations. To reduce the risk of the CNN over-fitting due to the limited training pictures, the invention proposes a new method called "block-wise stochastic gradient descent" (block-wise SGD), which divides the convolutional layers of ResNet-50 into 5 blocks, denoted conv1, conv2_x, conv3_x, conv4_x and conv5_x, as shown in Fig. 5. Conv1 consists of a single 7x7 full convolutional layer; conv2_x consists of a 3x3 convolutional layer and 3 bottleneck blocks (each bottleneck block being 1x1 64, 3x3 64, 1x1 256), 10 layers in total; conv3_x consists of 4 bottleneck blocks (each 1x1 128, 3x3 128, 1x1 512), 12 layers in total; conv4_x consists of 6 bottleneck blocks (each 1x1 256, 3x3 256, 1x1 1024), 18 layers in total; conv5_x consists of 3 bottleneck blocks (each 1x1 512, 3x3 512, 1x1 2048), 9 layers in total. The five parts together make up the 50-layer structure of ResNet-50. During each online learning and update process, each iteration k updates only the parameters W_i of one block (i = 1, 2, 3, 4, 5), keeping the parameters of the remaining 4 partial network layers unchanged; the next iteration updates the parameters of block i = (k+1) % 5 while the other layer parameters remain unchanged, thereby reducing the complexity of each network update. The online learning and update iterations continue until a stopping condition is satisfied (e.g. an iteration-count limit, or the training loss reaching a preset threshold);
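The block-wise update rule (only block i = (k+1) % 5 changes at iteration k) can be sketched framework-free as follows; `params` stands in for the parameter tensors of the five ResNet-50 blocks (an illustrative sketch, not the patent's implementation):

```python
def blockwise_sgd_step(params, grads, k, lr=0.0001, num_blocks=5):
    """One block-wise SGD step.

    `params` and `grads` are lists of `num_blocks` flat parameter/gradient
    lists. Only block i = (k+1) % num_blocks is updated; the other blocks'
    parameters are copied unchanged, reducing per-iteration update cost.
    """
    i = (k + 1) % num_blocks
    new_params = [list(block) for block in params]  # leave inputs untouched
    new_params[i] = [w - lr * g for w, g in zip(params[i], grads[i])]
    return new_params
```

In a framework such as PyTorch the same effect could presumably be obtained by toggling `requires_grad` on one block per iteration, but the pure-Python form above makes the rule explicit.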
4) Selective updating: Performing online learning and updating the CNN network model whenever suitable training data are produced easily incurs unnecessary computational cost. As long as the current CNN network model can provide sufficiently accurate depth predictions for the current scene, the current model should be kept in use, until an adjustment of the network model is forced. Based on this idea, the invention designs a "selective update" mode of operation: the training loss is computed for every batch of pictures fed into the CNN network model, and once the losses of all pictures in a batch exceed a preset threshold L_high, the online learning and update process is started. This process continues until the loss of the training pictures drops below L_low, or the number of iterations reaches a preset threshold. This strategy not only greatly reduces the amount of computation but also satisfies the accuracy requirements on the network's depth prediction results.
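The trigger and stop conditions of selective updating can be sketched as two predicates (illustrative; the patent does not specify whether the stop test uses all pictures' losses or an aggregate, so `all(...)` below is an assumption):

```python
def should_start_update(batch_losses, l_high):
    """Trigger online learning only when every picture in the batch has a
    training loss above L_high, i.e. the current model has clearly drifted."""
    return all(l > l_high for l in batch_losses)


def should_stop_update(batch_losses, l_low, iters, max_iters):
    """Stop once the losses fall below L_low or the iteration budget is spent
    (assumed: 'loss drops below L_low' is tested per picture)."""
    return all(l < l_low for l in batch_losses) or iters >= max_iters
```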
(3) Depth-scale regression: A camera pose with accurate scale information is important for selecting suitable training pictures and directly influences the output of the network. Since a monocular SLAM system cannot obtain the absolute scale, the invention proposes a method of "accurate scale regression based on the adaptive CNN". We plot the relation between D_sd(p) and D_gt(p), as shown in Fig. 6, where the black line in panel (b) is the ground-truth camera pose in the scene, the blue line is the camera pose obtained by monocular SLAM, and the red line is the result of applying the scale regressed by the RANSAC algorithm to the camera pose. We find that the ratio of D_sd(p) (the depth of a high-gradient point p obtained by monocular SLAM) to D_gt(p) (the true depth value of pixel p) represents the absolute-scale information of point p. Based on this, the invention proposes regressing the absolute-scale information from the depth relations of all high-gradient points; but since the true depth information is unknown in practical applications, the invention uses the prediction results of the CNN for the scale regression. Considering the adverse influence of the outliers present in the depths predicted by the CNN, we tested both the RANSAC algorithm and the least-squares algorithm for the scale regression; the experimental results, shown as the green and red lines in Fig. 6(a), demonstrate that the RANSAC algorithm achieves the more accurate fit, so the embodiment of the invention adopts the RANSAC method. After the absolute scale of the depth information has been computed in this way, the scale information of the pose can in turn be obtained through the mapping relation, which improves the tracking accuracy of the monocular SLAM system. As shown in Fig. 6(b), the invention was tested on two scenes of the TUM dataset, where the blue part is the pose tracked by monocular SLAM, the black part is the ground truth, and the red part is the result of applying the scale information to the monocular SLAM tracking; this shows that the method fits the tracked scale well.
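A minimal RANSAC scale regression consistent with this description might look as follows (an illustrative sketch; the patent does not specify the hypothesis sampling or the inlier criterion, so a 1-point hypothesis and a relative-error inlier test are our assumptions):

```python
import random


def ransac_scale(d_slam, d_cnn, iters=100, thresh=0.1, seed=0):
    """Estimate the absolute-scale factor s with D_cnn(p) ~ s * D_slam(p).

    Each RANSAC hypothesis is the depth ratio at one sampled high-gradient
    point; it is scored by how many other points agree within a relative
    threshold, which makes the estimate robust to CNN depth outliers.
    """
    rng = random.Random(seed)
    best_s, best_inliers = 1.0, -1
    idx = list(range(len(d_slam)))
    for _ in range(iters):
        j = rng.choice(idx)
        s = d_cnn[j] / d_slam[j]  # 1-point scale hypothesis
        inliers = sum(abs(c - s * m) / c < thresh
                      for m, c in zip(d_slam, d_cnn))
        if inliers > best_inliers:
            best_s, best_inliers = s, inliers
    return best_s
```

A least-squares fit of the same ratio would be pulled toward the outliers, which matches the patent's observation that RANSAC gives the more accurate fit.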
(4) Data fusion: For each key frame, we obtain two depth maps: one is the optimized result D_sd from monocular SLAM, and the other is the prediction result D_cnn of the CNN. The invention designs a fusion scheme that combines "NCC-score voting with Gaussian fusion" to achieve the best combination. The process consists of two parts. The first part is NCC-score voting. NCC (Normalized Cross Correlation) measures the correlation between two picture regions A and B and is computed as NCC(A, B) = Σ(A − Ā)(B − B̄) / sqrt(Σ(A − Ā)² · Σ(B − B̄)²). Each pixel p of key frame i is projected into the nearest key frame i−1 according to the depth map D_cnn(p) predicted by the CNN and the pose transformation; the projection result is denoted p′_cnn. Similarly, the pixel p of key frame i is projected a second time into key frame i−1, denoted p′_sd, but this projection is based on the semi-dense map result D_sp(p) and the absolute-scale factor. Small regions are chosen around the projected points p′_cnn and p′_sd in key frame i−1, and the normalized cross-correlation coefficient NCC_cnn between regions R(p) and R_cnn(p′) and the normalized cross-correlation coefficient NCC_sd between regions R(p) and R_sd(p′) are computed. If NCC_cnn is less than NCC_sd, the depth prediction of the semi-dense depth map is better than that of the CNN, and D_sp(p) is selected as the final depth prediction value for pixel p; otherwise R_cnn(p′) is selected. If a point only has a CNN prediction result, R_cnn(p′) is used as the final depth of pixel p. The second part is Gaussian fusion. The depth map obtained in the previous step is further processed: joint optimization is performed according to the contextual relations between key frames, combined with the uncertainty maps of the key-frame depth maps; this is the so-called Gaussian fusion, and the final depth map is obtained from the joint optimization. In the experiments we tested on scene sequences of multiple datasets and achieved relatively good results.
Owing to the use of the CNN, our dense monocular SLAM system needs GPU acceleration to achieve good real-time performance. We tested our algorithm on the TUM and ICL-NUIM datasets. Compared with LSD-SLAM, the current state-of-the-art direct-method monocular SLAM system, the absolute trajectory error of our pose tracking is reduced from 0.622 m to 0.231 m, and the accuracy of the key-frame depth maps (the proportion of points in the whole picture whose depth error is within 10%) is improved from 0.61% to 26.47%. Compared with using the weakly supervised depth prediction network alone, the key-frame depth-map accuracy is improved from 21.05% to 26.47%. In addition, the running speed of the whole system achieves real-time performance.
Further, as shown in Fig. 7, the invention also provides a real-time dense monocular SLAM system based on an online-learned depth prediction network, comprising a direct-method monocular SLAM module 1, an online adaptive CNN prediction module 2, an absolute-scale regression module 3 and a depth-map fusion module 4, wherein:
the direct-method monocular SLAM module 1 selects key frames from the picture sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtains the camera pose of each key frame by minimizing the photometric error of high-gradient points, and obtains the semi-dense map of the current frame by predicting the depth of high-gradient points through triangulation;
the online adaptive CNN prediction module 2 selects online training picture pairs according to the key frames, performs online training on these pairs with the block-wise stochastic gradient descent method to update the CNN network model, and uses the trained CNN network model to perform depth prediction on the current frame picture to obtain a dense map;
the absolute-scale regression module 3 performs depth-scale regression from the semi-dense map of the current frame and the predicted dense map, obtaining the absolute-scale factor of the current frame's depth information;
the depth-map fusion module 4 projects the predicted dense map into the previous key frame through the pose transformation according to the camera pose, projects the semi-dense map into the previous key frame according to the absolute-scale factor, selects the depth prediction value of each pixel of the current frame from the two projection results with the NCC-score voting method to obtain a predicted depth map, and performs Gaussian fusion on the predicted depth map to obtain the final depth map.
Those skilled in the art will readily appreciate that the above is merely a description of preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention shall all fall within the protection scope of the invention.
Claims (10)
- 1. A real-time dense monocular SLAM method based on an online-learned depth prediction network, characterized by comprising the following steps: (1) selecting key frames from the picture sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtaining the camera pose of each key frame by minimizing the photometric error of high-gradient points, and obtaining the semi-dense map of the current frame by predicting the depth of high-gradient points through triangulation; (2) selecting online training picture pairs according to the key frames, performing online training on the training pairs with a block-wise stochastic gradient descent method to update the CNN network model, and performing depth prediction on the current frame picture with the trained CNN network model to obtain a dense map; (3) performing depth-scale regression from the semi-dense map of the current frame and the predicted dense map to obtain the absolute-scale factor of the current frame's depth information; (4) projecting the predicted dense map into the previous key frame through the pose transformation according to the camera pose, projecting the semi-dense map into the previous key frame according to the absolute-scale factor, selecting the depth prediction value of each pixel of the current frame from the two projection results with an NCC-score voting method to obtain a predicted depth map, and performing Gaussian fusion on the predicted depth map to obtain the final depth map.
- 2. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1, characterized in that selecting online training picture pairs according to the key frames is specifically: screening picture frames among the pictures before and after a key frame and forming picture pairs with the key frame under the following constraints: first, the camera-motion constraint: the horizontal displacement between two picture frames satisfies |t_x| > 0.9*T, where T denotes the baseline distance between the two frames; second, the parallax constraint: for every picture pair, the mean vertical disparity Dis_avg between the pictures is computed with an optical-flow method, and the pair is saved as a candidate training picture only when Dis_avg is below a preset threshold δ; third, the diversity constraint: the same key frame can produce at most one training pair; fourth, the training-pool capacity constraint: when the number of training picture pairs reaches a preset threshold V, the pictures in the training pool are sent to the network for online training, the network model obtained by training is saved, and the training pool is emptied so that screening of training data can continue.
- 3. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that updating the CNN network model by online training with the block-wise stochastic gradient descent method according to the online training pictures is specifically: dividing the convolutional layers of ResNet-50 into 5 blocks, denoted conv1, conv2_x, conv3_x, conv4_x and conv5_x; conv1 consists of a single 7x7 full convolutional layer; conv2_x consists of a 3x3 convolutional layer and 3 bottleneck blocks, 10 layers in total; conv3_x consists of 4 bottleneck blocks, 12 layers in total; conv4_x consists of 6 bottleneck blocks, 18 layers in total; conv5_x consists of 3 bottleneck blocks, 9 layers in total; the five parts together make up the 50-layer structure of ResNet-50; during each online learning and update process, each iteration k updates only the parameters W_i of one block (i = 1, 2, 3, 4, 5), keeping the parameters of the remaining 4 partial network layers unchanged, and the next iteration updates the parameters of block i = (k+1) % 5 while the other layer parameters remain unchanged; the online learning and update iterations continue until a preset stopping condition is satisfied.
- 4. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that the online training and updating of the CNN network model is selective updating, specifically: computing the training loss for every batch of pictures fed into the CNN network model; once the losses of all pictures in a batch exceed a preset threshold L_high, starting the online learning and update process, which continues until the loss of the training pictures drops below a threshold L_low or the number of iterations reaches a preset threshold.
- 5. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that the depth-scale regression method is the RANSAC algorithm or a least-squares algorithm.
- 6. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that projecting the predicted dense map into the previous key frame through the pose transformation, projecting the semi-dense map into the previous key frame according to the absolute-scale factor, and selecting the depth prediction value of each pixel of the current frame from the two projection results with the NCC-score voting method to obtain the predicted depth map is specifically: projecting each pixel p of key frame i into the nearest key frame i−1 according to the dense map D_cnn(p) predicted by the CNN and the pose transformation, the projection result being denoted p′_cnn; projecting the pixel p of key frame i a second time into key frame i−1, denoted p′_sd, this projection being based on the semi-dense map result D_sp(p) and the absolute-scale factor; choosing small regions around the projected points p′_cnn and p′_sd in key frame i−1, and computing the normalized cross-correlation coefficient NCC_cnn between regions R(p) and R_cnn(p′) and the normalized cross-correlation coefficient NCC_sd between regions R(p) and R_sd(p′); if NCC_cnn is less than NCC_sd, the depth prediction of the semi-dense depth map is better than that of the CNN, and D_sp(p) is selected as the final depth prediction value for pixel p, otherwise R_cnn(p′) is selected; if a point only has a CNN prediction result, R_cnn(p′) is used as the final depth of pixel p.
- 7. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that performing Gaussian fusion on the predicted depth map to obtain the final depth map is specifically: further processing the depth map obtained by the NCC-score voting method, performing joint optimization according to the contextual relations between key frames combined with the uncertainty maps of the key-frame depth maps, and obtaining the final depth map through the joint optimization.
- 8. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that performing depth prediction on the current frame picture with the trained CNN network model to obtain the dense map further comprises: multiplying the depth value of each pixel in the depth map by a scale coefficient, where f_adapted is the focal length of the monocular camera acquiring training data online, B_adapted is the baseline of the binocular training pictures, and f_pre-train and B_pre-train are respectively the focal length and baseline of the pictures used to train the original CNN network model.
- 9. The real-time dense monocular SLAM method based on an online-learned depth prediction network of claim 1 or 2, characterized in that the key frames are defined as follows: the first picture of the whole image sequence, or the first picture obtained by the camera in real time, is defined as a key frame; besides the first frame, some subsequent picture frames are also defined as key frames, the defining principle being to monitor whether the translation and rotation between the current frame and its nearest preceding key frame have reached preset thresholds.
- 10. A real-time dense monocular SLAM system based on an online-learned depth prediction network, characterized by comprising a direct-method monocular SLAM module, an online adaptive CNN prediction module, an absolute-scale regression module and a depth-map fusion module, wherein: the direct-method monocular SLAM module selects key frames from the picture sequence acquired by a monocular vision sensor undergoing rotational and translational motion, obtains the camera pose of each key frame by minimizing the photometric error of high-gradient points, and obtains the semi-dense map of the current frame by predicting the depth of high-gradient points through triangulation; the online adaptive CNN prediction module selects online training picture pairs according to the key frames, performs online training on the training pairs with a block-wise stochastic gradient descent method to update the CNN network model, and performs depth prediction on the current frame picture with the trained CNN network model to obtain a dense map; the absolute-scale regression module performs depth-scale regression from the semi-dense map of the current frame and the predicted dense map to obtain the absolute-scale factor of the current frame's depth information; the depth-map fusion module projects the predicted dense map into the previous key frame through the pose transformation according to the camera pose, projects the semi-dense map into the previous key frame according to the absolute-scale factor, selects the depth prediction value of each pixel of the current frame from the two projection results with the NCC-score voting method to obtain a predicted depth map, and performs Gaussian fusion on the predicted depth map to obtain the final depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711227295.6A CN107945265B (en) | 2017-11-29 | 2017-11-29 | Real-time dense monocular SLAM method and system based on on-line study depth prediction network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107945265A true CN107945265A (en) | 2018-04-20 |
CN107945265B CN107945265B (en) | 2019-09-20 |
Family
ID=61947685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711227295.6A Active CN107945265B (en) | 2017-11-29 | 2017-11-29 | Real-time dense monocular SLAM method and system based on on-line study depth prediction network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107945265B (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921893A (en) * | 2018-04-24 | 2018-11-30 | 华南理工大学 | A kind of image cloud computing method and system based on online deep learning SLAM |
CN109034237A (en) * | 2018-07-20 | 2018-12-18 | 杭州电子科技大学 | Winding detection method based on convolutional Neural metanetwork road sign and sequence search |
CN109087349A (en) * | 2018-07-18 | 2018-12-25 | 亮风台(上海)信息科技有限公司 | A kind of monocular depth estimation method, device, terminal and storage medium |
CN109087346A (en) * | 2018-09-21 | 2018-12-25 | 北京地平线机器人技术研发有限公司 | Training method, training device and the electronic equipment of monocular depth model |
CN109241856A (en) * | 2018-08-13 | 2019-01-18 | 浙江零跑科技有限公司 | A kind of vehicle-mounted vision system solid object detection method of monocular |
CN109300151A (en) * | 2018-07-02 | 2019-02-01 | 浙江商汤科技开发有限公司 | Image processing method and device, electronic equipment |
CN109341694A (en) * | 2018-11-12 | 2019-02-15 | 哈尔滨理工大学 | A kind of autonomous positioning air navigation aid of mobile sniffing robot |
CN109544630A (en) * | 2018-11-30 | 2019-03-29 | 南京人工智能高等研究院有限公司 | Posture information determines method and apparatus, vision point cloud construction method and device |
CN109640068A (en) * | 2018-10-31 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Information forecasting method, device, equipment and the storage medium of video frame |
CN110428461A (en) * | 2019-07-30 | 2019-11-08 | 清华大学 | In conjunction with the monocular SLAM method and device of deep learning |
CN110569877A (en) * | 2019-08-07 | 2019-12-13 | 武汉中原电子信息有限公司 | Non-invasive load identification method and device and computing equipment |
CN110599542A (en) * | 2019-08-30 | 2019-12-20 | 北京影谱科技股份有限公司 | Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area |
CN110610486A (en) * | 2019-08-28 | 2019-12-24 | 清华大学 | Monocular image depth estimation method and device |
CN110634150A (en) * | 2018-06-25 | 2019-12-31 | 上海汽车集团股份有限公司 | Method, system and device for generating instant positioning and map construction |
CN110717917A (en) * | 2019-09-30 | 2020-01-21 | 北京影谱科技股份有限公司 | CNN-based semantic segmentation depth prediction method and device |
CN110766737A (en) * | 2018-07-26 | 2020-02-07 | 富士通株式会社 | Method and apparatus for training depth estimation model and storage medium |
CN111062981A (en) * | 2019-12-13 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111089579A (en) * | 2018-10-22 | 2020-05-01 | 北京地平线机器人技术研发有限公司 | Heterogeneous binocular SLAM method and device and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140320593A1 (en) * | 2013-04-30 | 2014-10-30 | Qualcomm Incorporated | Monocular visual SLAM with general and panorama camera movements |
CN107358624A (en) * | 2017-06-06 | 2017-11-17 | 武汉几古几古科技有限公司 | Monocular dense simultaneous localization and map reconstruction method |
2017-11-29: Application CN201711227295.6A filed (CN); granted as patent CN107945265B, status Active
Cited By (65)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921893B (en) * | 2018-04-24 | 2022-03-25 | 华南理工大学 | Image cloud computing method and system based on online deep learning SLAM |
CN108921893A (en) * | 2018-04-24 | 2018-11-30 | 华南理工大学 | Image cloud computing method and system based on online deep learning SLAM |
CN110634150A (en) * | 2018-06-25 | 2019-12-31 | 上海汽车集团股份有限公司 | Method, system and device for generating instant positioning and map construction |
CN110634150B (en) * | 2018-06-25 | 2023-08-11 | 上海汽车集团股份有限公司 | Method, system and device for generating instant positioning and map construction |
CN109300151A (en) * | 2018-07-02 | 2019-02-01 | 浙江商汤科技开发有限公司 | Image processing method and device, electronic equipment |
CN109300151B (en) * | 2018-07-02 | 2021-02-12 | 浙江商汤科技开发有限公司 | Image processing method and device and electronic equipment |
CN109087349A (en) * | 2018-07-18 | 2018-12-25 | 亮风台(上海)信息科技有限公司 | Monocular depth estimation method, device, terminal and storage medium |
CN109087349B (en) * | 2018-07-18 | 2021-01-26 | 亮风台(上海)信息科技有限公司 | Monocular depth estimation method, device, terminal and storage medium |
CN109034237B (en) * | 2018-07-20 | 2021-09-17 | 杭州电子科技大学 | Loop detection method based on convolutional neural network signposts and sequence search |
CN109034237A (en) * | 2018-07-20 | 2018-12-18 | 杭州电子科技大学 | Loop detection method based on convolutional neural network signposts and sequence search |
CN110766737B (en) * | 2018-07-26 | 2023-08-04 | 富士通株式会社 | Method and apparatus for training depth estimation model and storage medium |
CN110766737A (en) * | 2018-07-26 | 2020-02-07 | 富士通株式会社 | Method and apparatus for training depth estimation model and storage medium |
US11443445B2 (en) | 2018-07-27 | 2022-09-13 | Shenzhen Sensetime Technology Co., Ltd. | Method and apparatus for depth estimation of monocular image, and storage medium |
CN109241856A (en) * | 2018-08-13 | 2019-01-18 | 浙江零跑科技有限公司 | Three-dimensional object detection method for a monocular vehicle-mounted vision system |
CN109087346B (en) * | 2018-09-21 | 2020-08-11 | 北京地平线机器人技术研发有限公司 | Monocular depth model training method and device and electronic equipment |
CN109087346A (en) * | 2018-09-21 | 2018-12-25 | 北京地平线机器人技术研发有限公司 | Monocular depth model training method and device and electronic equipment |
CN111089579B (en) * | 2018-10-22 | 2022-02-01 | 北京地平线机器人技术研发有限公司 | Heterogeneous binocular SLAM method and device and electronic equipment |
CN111089579A (en) * | 2018-10-22 | 2020-05-01 | 北京地平线机器人技术研发有限公司 | Heterogeneous binocular SLAM method and device and electronic equipment |
CN109640068A (en) * | 2018-10-31 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Video frame information prediction method, device, equipment and storage medium |
CN109341694A (en) * | 2018-11-12 | 2019-02-15 | 哈尔滨理工大学 | Autonomous positioning and navigation method for a mobile detection robot |
CN109544630A (en) * | 2018-11-30 | 2019-03-29 | 南京人工智能高等研究院有限公司 | Pose information determination method and apparatus, and visual point cloud construction method and device |
CN111382613B (en) * | 2018-12-28 | 2024-05-07 | 中国移动通信集团辽宁有限公司 | Image processing method, device, equipment and medium |
CN111382613A (en) * | 2018-12-28 | 2020-07-07 | 中国移动通信集团辽宁有限公司 | Image processing method, apparatus, device and medium |
WO2020221443A1 (en) | 2019-04-30 | 2020-11-05 | Huawei Technologies Co., Ltd. | Scale-aware monocular localization and mapping |
CN113711276A (en) * | 2019-04-30 | 2021-11-26 | 华为技术有限公司 | Scale-aware monocular localization and mapping |
CN112085842B (en) * | 2019-06-14 | 2024-04-09 | 北京京东乾石科技有限公司 | Depth value determining method and device, electronic equipment and storage medium |
CN112085842A (en) * | 2019-06-14 | 2020-12-15 | 北京京东尚科信息技术有限公司 | Depth value determination method and device, electronic equipment and storage medium |
CN112150529A (en) * | 2019-06-28 | 2020-12-29 | 北京地平线机器人技术研发有限公司 | Method and device for determining depth information of image feature points |
CN112150529B (en) * | 2019-06-28 | 2023-09-01 | 北京地平线机器人技术研发有限公司 | Depth information determination method and device for image feature points |
CN110428461A (en) * | 2019-07-30 | 2019-11-08 | 清华大学 | In conjunction with the monocular SLAM method and device of deep learning |
CN110428461B (en) * | 2019-07-30 | 2022-07-05 | 清华大学 | Monocular SLAM method and device combined with deep learning |
CN110569877A (en) * | 2019-08-07 | 2019-12-13 | 武汉中原电子信息有限公司 | Non-invasive load identification method and device and computing equipment |
CN110610486B (en) * | 2019-08-28 | 2022-07-19 | 清华大学 | Monocular image depth estimation method and device |
CN110610486A (en) * | 2019-08-28 | 2019-12-24 | 清华大学 | Monocular image depth estimation method and device |
CN110599542A (en) * | 2019-08-30 | 2019-12-20 | 北京影谱科技股份有限公司 | Geometric-region-oriented adaptive VSLAM local mapping method and device |
CN110717917B (en) * | 2019-09-30 | 2022-08-09 | 北京影谱科技股份有限公司 | CNN-based semantic segmentation depth prediction method and device |
CN110717917A (en) * | 2019-09-30 | 2020-01-21 | 北京影谱科技股份有限公司 | CNN-based semantic segmentation depth prediction method and device |
CN111275751B (en) * | 2019-10-12 | 2022-10-25 | 浙江省北大信息技术高等研究院 | Unsupervised absolute scale calculation method and system |
CN111275751A (en) * | 2019-10-12 | 2020-06-12 | 浙江省北大信息技术高等研究院 | Unsupervised absolute scale calculation method and system |
CN111062981B (en) * | 2019-12-13 | 2023-05-05 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111062981A (en) * | 2019-12-13 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111179326A (en) * | 2019-12-27 | 2020-05-19 | 精英数智科技股份有限公司 | Monocular depth estimation algorithm, system, equipment and storage medium |
CN111179326B (en) * | 2019-12-27 | 2020-12-29 | 精英数智科技股份有限公司 | Monocular depth estimation method, system, equipment and storage medium |
CN111127522A (en) * | 2019-12-30 | 2020-05-08 | 亮风台(上海)信息科技有限公司 | Monocular camera-based depth optical flow prediction method, device, equipment and medium |
CN111127522B (en) * | 2019-12-30 | 2024-02-06 | 亮风台(上海)信息科技有限公司 | Depth optical flow prediction method, device, equipment and medium based on monocular camera |
CN111260706A (en) * | 2020-02-13 | 2020-06-09 | 青岛联合创智科技有限公司 | Dense depth map calculation method based on monocular camera |
CN111462329B (en) * | 2020-03-24 | 2023-09-29 | 南京航空航天大学 | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning |
CN111462329A (en) * | 2020-03-24 | 2020-07-28 | 南京航空航天大学 | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning |
CN111783968A (en) * | 2020-06-30 | 2020-10-16 | 山东信通电子股份有限公司 | Power transmission line monitoring method and system based on cloud edge cooperation |
CN111784757B (en) * | 2020-06-30 | 2024-01-23 | 北京百度网讯科技有限公司 | Training method of depth estimation model, depth estimation method, device and equipment |
CN111783968B (en) * | 2020-06-30 | 2024-05-31 | 山东信通电子股份有限公司 | Power transmission line monitoring method and system based on cloud edge cooperation |
CN111784757A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Training method of depth estimation model, depth estimation method, device and equipment |
CN112308911A (en) * | 2020-10-26 | 2021-02-02 | 中国科学院自动化研究所 | End-to-end visual positioning method and system |
CN112149645A (en) * | 2020-11-10 | 2020-12-29 | 西北工业大学 | Human pose keypoint recognition method based on generative adversarial learning and graph neural networks |
CN112612476A (en) * | 2020-12-28 | 2021-04-06 | 吉林大学 | SLAM control method, equipment and storage medium based on GPU |
CN112767480A (en) * | 2021-01-19 | 2021-05-07 | 中国科学技术大学 | Monocular vision SLAM positioning method based on deep learning |
CN112862959B (en) * | 2021-03-23 | 2022-07-12 | 清华大学 | Real-time probability monocular dense reconstruction method and system based on semantic prior |
CN112862959A (en) * | 2021-03-23 | 2021-05-28 | 清华大学 | Real-time probability monocular dense reconstruction method and system based on semantic prior |
CN114119424A (en) * | 2021-08-27 | 2022-03-01 | 上海大学 | Video restoration method based on optical flow and multi-view scenes |
CN114119424B (en) * | 2021-08-27 | 2024-08-06 | 上海大学 | Video restoration method based on optical flow and multi-view scenes |
CN113971760B (en) * | 2021-10-26 | 2024-02-06 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
CN113971760A (en) * | 2021-10-26 | 2022-01-25 | 山东建筑大学 | High-quality quasi-dense complementary feature extraction method based on deep learning |
CN114820755A (en) * | 2022-06-24 | 2022-07-29 | 武汉图科智能科技有限公司 | Depth map estimation method and system |
CN118279770A (en) * | 2024-06-03 | 2024-07-02 | 南京信息工程大学 | Unmanned aerial vehicle follow-up shooting method based on SLAM algorithm |
CN118279770B (en) * | 2024-06-03 | 2024-09-20 | 南京信息工程大学 | Unmanned aerial vehicle follow-up shooting method based on SLAM algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN107945265B (en) | 2019-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945265B (en) | Real-time dense monocular SLAM method and system based on online learning depth prediction network | |
CN109387204B (en) | Simultaneous localization and mapping method for mobile robots in indoor dynamic environments | |
CN107392964B (en) | Indoor SLAM method based on combining indoor feature points and structure lines | |
Bozic et al. | Neural deformation graphs for globally-consistent non-rigid reconstruction | |
CN112505065B (en) | Method for detecting surface defects of large part by indoor unmanned aerial vehicle | |
CN110084304B (en) | Target detection method based on synthetic data set | |
CN104537709B (en) | Real-time three-dimensional reconstruction keyframe determination method based on pose changes | |
CN112785702A (en) | SLAM method based on tight coupling of 2D laser radar and binocular camera | |
Gallego et al. | Event-based camera pose tracking using a generative event model | |
CN109974693A (en) | Unmanned aerial vehicle localization method, device, computer equipment and storage medium | |
CN105184857B (en) | Scale factor determination method for monocular vision reconstruction based on structured-light ranging | |
CN110945565A (en) | Dense visual SLAM using probabilistic bin maps | |
CN112184757B (en) | Method and device for determining motion trail, storage medium and electronic device | |
CN113674416B (en) | Three-dimensional map construction method and device, electronic equipment and storage medium | |
CN110378997A (en) | Dynamic scene mapping and localization method based on ORB-SLAM2 | |
Kiciroglu et al. | Activemocap: Optimized viewpoint selection for active human motion capture | |
Tang et al. | Joint multi-view people tracking and pose estimation for 3D scene reconstruction | |
Sartipi et al. | Deep depth estimation from visual-inertial slam | |
CN110136202A (en) | Multi-target recognition and localization method based on SSD and dual cameras | |
CN112233179A (en) | Visual odometry measurement method | |
KR20210058686A (en) | Device and method of implementing simultaneous localization and mapping | |
Tao et al. | LiDAR-NeRF: Novel lidar view synthesis via neural radiance fields | |
CN114627491A (en) | Single-view three-dimensional pose estimation method based on epipolar line convergence | |
Zhou et al. | Evaluating modern approaches in 3d scene reconstruction: Nerf vs gaussian-based methods | |
Fan et al. | RS-DPSNet: Deep plane sweep network for rolling shutter stereo images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |

Application publication date: 2018-04-20
Assignee: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.
Assignor: Huazhong University of Science and Technology
Contract record no.: X2023990000439
Denomination of invention: Real-time dense monocular SLAM method and system based on online learning depth prediction network
Granted publication date: 2019-09-20
License type: Exclusive License
Record date: 2023-04-28