CN110533720A - Semantic SLAM system and method based on joint constraint - Google Patents
- Publication number
- CN110533720A (application CN201910768052.6A)
- Authority
- CN
- China
- Prior art keywords
- point
- module
- depth
- characteristic
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention proposes a semantic SLAM system and method based on joint constraints. It aims to solve two problems: inaccurate camera pose computation when pixel depth values are unstable, and failure to compute the camera pose when a dynamic object occupies most of the camera's field of view. A depth-constraint method improves the accuracy of camera pose estimation, and an epipolar-constraint method improves the completeness of the camera trajectory. The method is implemented as follows: the data acquisition module obtains the image sequences; the neural network module obtains detection images and instance segmentation images; the joint constraint module obtains the different feature-point category sets; the data fusion module obtains static-target and dynamic-target instance segmentation images; the visual front-end module obtains the depth camera poses and the sets of landmark points in three-dimensional space; the back-end optimization module obtains the globally optimal depth camera poses and landmark points; the semantic mapping module obtains the semantic point cloud map.
Description
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a semantic SLAM system and method based on joint constraints, which can be used for camera pose estimation and semantic map construction in complex, highly dynamic environments.
Background art
Simultaneous localization and mapping (SLAM) plays an important role in the autonomous navigation and obstacle avoidance of unmanned systems. SLAM has developed rapidly over the past thirty years; its main goal is to let an unmanned system build a map of an unknown environment while accurately localizing itself during autonomous exploration. However, the maps built by traditional SLAM systems contain only low-level geometric features such as the points, lines and planes in the environment, and for future unmanned systems a map containing only such simple spatial information can hardly meet their development needs. The distinctive feature of a semantic map is that it contains the semantic information of the objects in the environment. A three-dimensional semantic map lets an unmanned system perceive its surroundings correctly; through this cognitive understanding of the environment, it can also improve the localization accuracy of the SLAM system to a certain extent and make up for the deficiencies of existing unmanned systems in environment perception and understanding. While building the map, a semantic SLAM system not only obtains the geometric information of the objects in the environment but also identifies them, obtaining semantic information such as their position, posture and functional attributes, so it can effectively cope with complex scenes and complete more complex tasks.
In October 2018, Berta Bescos et al. of the University of Zaragoza, Spain, published the article "DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes" in IEEE Robotics and Automation Letters, volume 3, issue 4, proposing an instance-segmentation-based SLAM system and method. Building on ORB-SLAM2, it adds a dynamic-target detection function: RGB-D image data is fed into a Mask R-CNN network, which segments, pixel by pixel, all targets with a priori dynamic properties to obtain dynamic-object instances, and a multi-view geometry method then detects truly moving objects that are not among the classes output by the CNN. The camera pose is computed from the feature matches that belong neither to these dynamic-object instances nor to the truly moving objects, which solves ORB-SLAM2's problem of inaccurate camera pose estimation when dynamic objects are present in the environment. Meanwhile, all target instances with a priori dynamic properties are segmented away during instance segmentation, yielding images containing only the static scene, from which a point cloud map is built.
However, DynaSLAM removes all objects with a priori dynamic properties; when such a target is actually static in the environment, the static scene map that is built will lack its information, so the map construction is not accurate enough. On the other hand, when depth values are unstable, computing the camera pose from feature matches with unstable depth leads to large pose estimation errors; and when a dynamic object occupies most of the camera's field of view, the matched points in the environment are insufficient, so DynaSLAM cannot compute the camera pose, frames are lost, and the camera trajectory is incomplete.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above prior art by proposing a semantic SLAM system and method based on joint constraints. It solves the problems of inaccurate camera pose computation when pixel depth values are unstable and of failure to compute the camera pose when a dynamic object occupies most of the camera's field of view, thereby improving the accuracy of the camera pose and the completeness of the camera trajectory; at the same time it solves the problem that objects with dynamic properties cannot be built into the point cloud map while they are actually static, yielding a more accurate point cloud map.
To achieve the above object, the technical scheme adopted by the invention is as follows:
A semantic SLAM system based on joint constraints comprises a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, in which:

the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;

the neural network module performs forward propagation, frame by frame, on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-target boxes and instance segmentation images with potential-dynamic-target instances;

the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each matched feature pair, and builds an epipolar constraint on the feature points inside the potential-dynamic-target box regions, thereby classifying all feature points of the color image and obtaining the feature-point set of each category;

the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-target instance segmentation images and dynamic-target instance segmentation images;

the visual front-end module computes the depth camera pose from the static feature points;

the back-end optimization module builds a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function, obtaining the globally optimal camera poses and landmark points;

the semantic mapping module builds a point cloud map from the optimal depth camera poses and maps the semantically labeled pixels of the static-target instance segmentation images onto the point cloud map, obtaining a semantic point cloud map.
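The data flow through the seven modules above can be sketched in a few lines. Everything in this sketch — the function names, argument shapes and dictionary keys — is an illustrative stand-in, since the patent defines modules, not a software interface:

```python
def run_pipeline(frames, modules):
    """Drive the seven-module chain over consecutive frame pairs.

    'frames' is a list of (color, depth) tuples; 'modules' maps a stage name
    to a callable standing in for that module. All names are stand-ins chosen
    for this sketch, not identifiers from the patent.
    """
    poses, fused_frames = [], []
    for (c_prev, d_prev), (c_cur, d_cur) in zip(frames, frames[1:]):
        det, seg = modules["neural_net"](c_cur)            # boxes + instances
        feat_sets = modules["joint_constraint"](c_prev, c_cur,
                                                d_prev, d_cur, det)
        fused_frames.append(modules["data_fusion"](seg, feat_sets))
        poses.append(modules["vision_frontend"](feat_sets["static"]))
    poses = modules["backend_opt"](poses)                  # global refinement
    return modules["semantic_mapping"](fused_frames, poses)
```

With trivial stubs substituted for each module, the function simply threads data from acquisition through back-end optimization, mirroring steps (1) to (7) below.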
An implementation method of semantic SLAM based on joint constraints comprises the following steps:

(1) The data acquisition module obtains the image sequences:

The data acquisition module shoots the indoor environment N times continuously, obtaining N color image frames and N depth image frames, sorts each set by shooting time from earliest to latest, and obtains the color image sequence C_1, C_2, ..., C_i, ..., C_N and the depth image sequence D_1, D_2, ..., D_i, ..., D_N, where i = 1, 2, ..., N and N >= 100;
(2) The neural network module obtains detection images and instance segmentation images:

The neural network module uses a BlitzNet network model whose parameters were trained on the COCO data set to perform forward propagation, frame by frame, on the N color image frames, obtaining the detection images with potential-dynamic-target boxes CD_1, CD_2, ..., CD_i, ..., CD_N and the instance segmentation images with potential-dynamic-target instances CS_1, CS_2, ..., CS_i, ..., CS_N;
(3) The joint constraint module obtains the feature-point category sets DSP_2, EP_2, SP_2, DP_2 and S_2:

(3a) The joint constraint module performs ORB feature extraction on C_1 and C_2, obtaining the feature sets P_1 and P_2, matches P_1 against P_2 to obtain multiple matched feature pairs, and then uses the depth-constraint method to classify all feature points of P_2 that satisfy the depth constraint into the depth-stable feature-point set DSP_2;

(3b) The joint constraint module classifies the feature points of DSP_2 that lie inside a dynamic-target box of the detection image CD_2 into the potential-dynamic feature-point set PP_2, and the feature points of DSP_2 that lie outside the potential-dynamic-target boxes of CD_2 into the environment feature-point set EP_2;

(3c) The joint constraint module computes the fundamental matrix F from EP_2, then uses the epipolar-constraint method to classify the feature points of PP_2 that satisfy the epipolar constraint into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, and merges EP_2 and SP_2 into the static feature-point set S_2;
(4) The data fusion module obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2:

The data fusion module computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio of C_2, classifies the instances of the instance segmentation image CS_2 whose dynamic feature-point ratio and potential-dynamic feature-point ratio are both below the preset ratio thresholds as static-target instances and the remaining instances as dynamic-target instances, and obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2;
(5) The visual front-end module obtains the depth camera pose ξ_2 and the landmark point set L_2 in three-dimensional space:

(5a) The visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ_2 of C_2 from the static feature points S_2 of C_2 and their corresponding matched points in C_1;

(5b) The visual front-end module converts the pixel coordinates of S_2 into three-dimensional space coordinates through the camera intrinsics and ξ_2, obtaining the landmark point set L_2 in three-dimensional space;

(5c) Following the method used to obtain ξ_2 and L_2, the visual front-end module obtains the depth camera poses ξ_3, ξ_4, ..., ξ_i, ..., ξ_N and the landmark point sets L_3, L_4, ..., L_i, ..., L_N of C_3, C_4, ..., C_i, ..., C_N;
(6) The back-end optimization module obtains the globally optimal depth camera poses and landmark points:

The back-end optimization module merges L_2, L_3, ..., L_i, ..., L_N into the landmark point set L, containing the landmark points p_1, p_2, ..., p_j, ..., p_M, builds the cost function Loss with the depth camera poses ξ_2, ξ_3, ..., ξ_i, ..., ξ_N and the landmark points p_1, p_2, ..., p_j, ..., p_M as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ_2', ξ_3', ..., ξ_i', ..., ξ_N' and landmark points p_1', p_2', ..., p_j', ..., p_M' in three-dimensional space;
(7) The semantic mapping module obtains the semantic point cloud map:

(7a) The semantic mapping module processes the color image sequence C_2, C_3, ..., C_i, ..., C_N frame by frame: the pixels of the i-th color image frame C_i whose depth value is not 0 are classified into the pixel set YP_i, and, using the dynamic-target instance information of the CDS_i obtained by the data fusion module, the pixels of YP_i that do not belong to a dynamic-target instance are classified into the pixel set CP_i;

(7b) The semantic mapping module computes the three-dimensional coordinates of CP_i through the camera intrinsics and ξ_i, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated points into the point cloud PL_i;

(7c) Using the semantic information of the static-target instance segmentation image CSS_i obtained by the data fusion module, the semantic mapping module semantically labels the points corresponding to the pixels of the static-target instances of CSS_i, obtaining the semantic point cloud PL_i';

(7d) The semantic mapping module splices the semantic point clouds PL_2', PL_3', ..., PL_i', ..., PL_N' together, obtaining the global semantic point cloud map PL.
Compared with the prior art, the invention has the following advantages:

First, the invention uses the depth-constraint method to constrain the depth-value distance of each feature match, obtains the depth-stable feature-point set, and computes the camera pose from the depth-stable feature points and their matched points. Compared with the prior art, which computes the camera pose from all feature points and their matched points in the environment, this improves the accuracy of camera pose estimation.

Second, the invention uses the epipolar-constraint method to constrain the epipolar-line distance of each feature match, obtains the static and dynamic feature-point sets, and computes the camera pose jointly from the static feature-point set and the environment feature-point set. Compared with the prior art, which computes the camera pose only from the environment feature-point set, this solves the problem that the camera pose cannot be computed when a dynamic object occupies most of the camera's field of view, so a more complete camera trajectory can be drawn.

Third, the data fusion module of the invention computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio, divides the potential-dynamic-target instances into dynamic-target instances and static-target instances, and the semantic mapping module maps the static-target instances into the point cloud map. Compared with the prior art, which classifies all potential-dynamic-target instances as dynamic and does not use them when building the point cloud map, this yields a more accurate semantic point cloud map.
Description of the drawings
Fig. 1 is a structural schematic diagram of the semantic SLAM system of the invention;
Fig. 2 is an implementation flow chart of the semantic SLAM method of the invention.
Specific embodiment
Below, the invention is described in further detail in conjunction with the drawings and specific embodiments.
Referring to Fig. 1, the semantic SLAM system of the invention based on joint constraints comprises a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, in which:

the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;

the neural network module performs forward propagation, frame by frame, on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-target boxes and instance segmentation images with potential-dynamic-target instances;

the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each matched feature pair, and builds an epipolar constraint on the feature points inside the potential-dynamic-target box regions, thereby classifying all feature points of the color image and obtaining the feature-point set of each category;

the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-target instance segmentation images and dynamic-target instance segmentation images;

the visual front-end module computes the depth camera pose from the static feature points;

the back-end optimization module builds a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function, obtaining the globally optimal camera poses and landmark points;

the semantic mapping module builds a point cloud map from the optimal depth camera poses and maps the semantically labeled pixels of the static-target instance segmentation images onto the point cloud map, obtaining a semantic point cloud map.
Referring to Fig. 2, the semantic SLAM method of the invention based on joint constraints comprises the following steps:

Step (1): the data acquisition module obtains the image sequences.

The data acquisition module shoots the indoor environment N times continuously, obtaining N color image frames and N depth image frames, sorts each set by shooting time from earliest to latest, and obtains the color image sequence C_1, C_2, ..., C_i, ..., C_N and the depth image sequence D_1, D_2, ..., D_i, ..., D_N, where i = 1, 2, ..., N and N >= 100.
Step (2): the neural network module obtains detection images and instance segmentation images.

The neural network module uses a BlitzNet network model whose parameters were trained on the COCO data set to perform forward propagation, frame by frame, on the N color image frames, obtaining the detection images with potential-dynamic-target boxes CD_1, CD_2, ..., CD_i, ..., CD_N and the instance segmentation images with potential-dynamic-target instances CS_1, CS_2, ..., CS_i, ..., CS_N.
Step (3): the joint constraint module obtains the feature-point category sets DSP_2, EP_2, SP_2, DP_2 and S_2.

Step (3a): the joint constraint module performs ORB feature extraction on C_1 and C_2, obtaining the feature sets P_1 and P_2, matches P_1 against P_2 to obtain multiple matched feature pairs, and then uses the depth-constraint method to classify all feature points of P_2 that satisfy the depth constraint into the depth-stable feature-point set DSP_2.
The depth-constraint method is realized as follows:

Step (3a1): for each feature point of P_2, build a 3 × 3 image block centered on its pixel coordinate, and compute the average depth value of the block as the mean of depth(x, y) over its nine pixels, where (x, y) is a pixel coordinate inside the block and depth(x, y) is its depth value;

Step (3a2): compute the feature-match depth-value distance D_d between the average depth value of the feature point and that of its matching feature point in C_1;

Step (3a3): set a threshold θ and classify the feature points of P_2 whose D_d is less than θ into the depth-stable feature-point set DSP_2, realizing the depth constraint on D_d. This is done because, on the one hand, removing depth-unstable feature points reduces the amount of outlier data and thereby improves the solution efficiency; on the other hand, feature points with abrupt depth changes produce large errors when the cost function is evaluated, strongly affecting the nonlinear optimization result and making the obtained globally optimal camera pose insufficiently accurate.
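Steps (3a1) to (3a3) can be sketched as follows. The original formulas are not reproduced in the source text, so two details here are assumptions: the 3 × 3 patch is clipped at the image border, and D_d is taken as the absolute difference of the two patch means; the threshold value is purely illustrative.

```python
import numpy as np

def patch_mean_depth(depth, x, y):
    """Mean depth of the 3x3 block centred on pixel (x, y), as in step (3a1).
    Clipping the block at the image border is an assumption; the patent does
    not specify border handling."""
    h, w = depth.shape
    x0, x1 = max(x - 1, 0), min(x + 2, w)
    y0, y1 = max(y - 1, 0), min(y + 2, h)
    return float(depth[y0:y1, x0:x1].mean())

def depth_consistent_matches(depth1, depth2, matches, theta=0.2):
    """Keep the matches whose patch-depth distance D_d stays below theta
    (the depth constraint of step (3a3)). D_d as an absolute difference and
    theta's value are assumptions for this sketch."""
    kept = []
    for (x1, y1), (x2, y2) in matches:
        d1 = patch_mean_depth(depth1, x1, y1)
        d2 = patch_mean_depth(depth2, x2, y2)
        if abs(d2 - d1) < theta:          # D_d < theta -> depth-stable
            kept.append(((x1, y1), (x2, y2)))
    return kept
```

The surviving matches correspond to the depth-stable set DSP_2 of the patent.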
Step (3b): the joint constraint module classifies the feature points of DSP_2 that lie inside a dynamic-target box of the detection image CD_2 into the potential-dynamic feature-point set PP_2, and the feature points of DSP_2 that lie outside the potential-dynamic-target boxes of CD_2 into the environment feature-point set EP_2; computing the fundamental matrix from the environment feature-point set in the subsequent step yields an accurate result.
Step (3c): the joint constraint module computes the fundamental matrix F from EP_2, then uses the epipolar-constraint method to classify the feature points of PP_2 that satisfy the epipolar constraint into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, and merges EP_2 and SP_2 into the static feature-point set S_2.
The epipolar-constraint method is realized as follows:

Step (3c1): through the camera intrinsics — the x-axis scale factor f_x, the y-axis scale factor f_y, the x-axis offset c_x and the y-axis offset c_y — convert the pixel coordinate [u_s, v_s]^T of each feature point of PP_2 into the normalized coordinate [u_c, v_c, 1]^T, with u_c = (u_s − c_x) / f_x and v_c = (v_s − c_y) / f_y;

Step (3c2): choose eight feature points of EP_2 with the RANSAC method, compute the fundamental matrix F from these eight feature matches with the eight-point algorithm, and then compute the epipolar line l of the feature point from F and [u_c, v_c, 1]^T;

Step (3c3): compute the feature-match epipolar-line distance D_e from F, l, the feature point and its matching feature point in C_1;

Step (3c4): set a threshold η and classify the feature points of PP_2 whose D_e is less than η into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, realizing the epipolar constraint on D_e. This is because, when a feature point is a static scene point, its matching point falls on the epipolar line l; within the error tolerance, a matching point near l means the feature point is a static feature point, while a D_e above the threshold means the matching point is far from l and the feature point is a dynamic feature point on a moving object. The subsequent steps compute the camera pose from the static feature points rather than the dynamic feature points, improving the accuracy of camera pose estimation.
On the other hand, when a dynamic object occupies most of the camera's field of view, the feature matches of the static scene alone are too few to compute the camera pose, which causes the frame-loss problem. The static feature-point set SP_2 provides sufficient feature points; computing the camera pose jointly from the static feature-point set and the environment feature-point set therefore solves the frame-loss problem and improves the accuracy of camera pose estimation.
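Steps (3c2) and (3c3) amount to estimating F and measuring point-to-epipolar-line distances. Below is a minimal sketch in normalized coordinates; the RANSAC selection over EP_2 that the patent prescribes is omitted for brevity, so this is the plain eight-point estimate, not the full method.

```python
import numpy as np

def eight_point_F(pts1, pts2):
    """Estimate the fundamental matrix from N >= 8 correspondences given in
    normalised image coordinates (step (3c2) without the RANSAC loop)."""
    A = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(pts1, pts2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)        # enforce the rank-2 property of F
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def epiline_distance(F, p1, p2):
    """D_e of step (3c3): distance from the matched point p2 to the epipolar
    line l = F @ [p1, 1]^T (both points in normalised coordinates)."""
    x1 = np.array([p1[0], p1[1], 1.0])
    x2 = np.array([p2[0], p2[1], 1.0])
    l = F @ x1
    return abs(x2 @ l) / np.hypot(l[0], l[1])
```

Thresholding `epiline_distance` at η reproduces the SP_2 / DP_2 split of step (3c4).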
Step (4): the data fusion module obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2.

The data fusion module computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio of C_2, classifies the instances of the instance segmentation image CS_2 whose dynamic feature-point ratio and potential-dynamic feature-point ratio are both below the preset ratio thresholds as static-target instances and the remaining instances as dynamic-target instances, and obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2.
The static-target and dynamic-target instances are obtained as follows:

Step (4a): count the numbers of feature points in the environment feature-point set EP_2, the static feature-point set SP_2 and the dynamic feature-point set DP_2, and compute the dynamic feature-point ratio τ_d and the potential-dynamic feature-point ratio τ_r from these counts;

Step (4b): set the threshold of τ_d to 0.5 and that of τ_r to 0.15; when τ_d ≤ 0.5 and τ_r ≤ 0.15, the instance segmentation target inside the detection box is classified as a static-target instance, and the remaining instance segmentation targets are classified as dynamic-target instances. This is done because the neural network module detects all objects with dynamic properties; when such an object is actually static in the environment, it can still be classified into the static scene and its information built into the point cloud map, without which the map construction would be neither accurate nor complete enough.
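The decision rule of step (4b) can be sketched per instance. The exact formulas for τ_d and τ_r are not legible in the source text, so the definitions below — dynamic points as a fraction of the in-box points, and in-box points as a fraction of all classified points — are plausible assumptions for illustration, not the patent's equations.

```python
def classify_instance(n_env, n_static, n_dynamic,
                      tau_d_max=0.5, tau_r_max=0.15):
    """Step (4b): label an instance static when both ratios stay at or below
    their thresholds (0.5 and 0.15 in the patent). The ratio definitions used
    here are assumptions, as the source formulas are not reproduced."""
    total = n_env + n_static + n_dynamic
    pot = n_static + n_dynamic                  # points inside the target box
    tau_d = n_dynamic / pot if pot else 0.0     # dynamic feature-point ratio
    tau_r = pot / total if total else 0.0       # potential-dynamic ratio
    return "static" if (tau_d <= tau_d_max and tau_r <= tau_r_max) else "dynamic"
```

An instance whose box contains mostly epipolar-consistent points in a feature-rich frame thus stays in the static scene.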
Step (5): the visual front-end module obtains the depth camera pose ξ_2 and the landmark point set L_2 in three-dimensional space.

Step (5a): the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ_2 of C_2 from the static feature points S_2 of C_2 and their corresponding matched points in C_1;

Step (5b): the visual front-end module converts the pixel coordinates of S_2 into three-dimensional space coordinates through the camera intrinsics and ξ_2, obtaining the landmark point set L_2 in three-dimensional space.
The landmark point set L_2 in three-dimensional space is obtained as follows:

Step (5b1): through the camera intrinsics, convert the pixel coordinate [u_s, v_s]^T of each feature point of S_2 into the normalized coordinate [u_c, v_c, 1]^T;

Step (5b2): compute the camera coordinate P' = [X', Y', Z']^T from the normalized coordinate;

Step (5b3): through the rotation matrix R and translation vector t of the camera pose ξ_2, convert the camera coordinate P' into the world coordinate P_w:

P_w = R^(-1)(P' − t) = [X, Y, Z]^T (10)

Step (5b4): define the three-dimensional space point located at P_w as the landmark point p, and classify p into the landmark point set L_2.
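Steps (5b1) to (5b4) form a standard back-projection. A minimal sketch follows, under the assumption — common for depth cameras, though the lost formula of step (5b2) is not reproduced in the source — that the measured depth supplies Z', after which equation (10) maps camera to world coordinates:

```python
import numpy as np

def pixel_to_world(us, vs, depth, K, R, t):
    """Back-project pixel (us, vs) with measured depth into a world-frame
    landmark: normalise (5b1), scale by depth (5b2), then apply
    P_w = R^{-1} (P' - t), equation (10)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    uc = (us - cx) / fx                      # (5b1) normalised coordinates
    vc = (vs - cy) / fy
    P_cam = depth * np.array([uc, vc, 1.0])  # (5b2) camera coordinates
    return np.linalg.inv(R) @ (P_cam - t)    # (5b3) world coordinates
```

Each returned point is one landmark p of the set L_2.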
Step (5c): following the method used to obtain ξ_2 and L_2, the visual front-end module obtains the depth camera poses ξ_3, ξ_4, ..., ξ_i, ..., ξ_N and the landmark point sets L_3, L_4, ..., L_i, ..., L_N of C_3, C_4, ..., C_i, ..., C_N. Concretely, C_2 and C_3 are processed with the method of steps (3a) to (5b4) to obtain the depth camera pose ξ_3 and the landmark point set L_3; processing C_3 and C_4, C_4 and C_5, ..., C_{i-1} and C_i, ..., C_{N-1} and C_N identically then yields ξ_4 and L_4, ..., ξ_i and L_i, ..., ξ_N and L_N.
Step (6): the back-end optimization module obtains the globally optimal depth camera poses and landmark points.

The back-end optimization module merges L_2, L_3, ..., L_i, ..., L_N into the landmark point set L, containing the landmark points p_1, p_2, ..., p_j, ..., p_M, builds the cost function Loss with the depth camera poses ξ_2, ξ_3, ..., ξ_i, ..., ξ_N and the landmark points p_1, p_2, ..., p_j, ..., p_M as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ_2', ξ_3', ..., ξ_i', ..., ξ_N' and landmark points p_1', p_2', ..., p_j', ..., p_M' in three-dimensional space.
The cost function is built as follows:

Step (6a): through the rotation matrix R and translation vector t of the camera pose ξ_2, convert the three-dimensional coordinate [X, Y, Z]^T of the landmark point p_j of L into the camera coordinate p_j':

p_j' = R p_j + t = [X', Y', Z']^T (11)

Step (6b): compute the normalized coordinate [u_c, v_c, 1]^T from the camera coordinate [X', Y', Z']^T;

Step (6c): compute the pixel coordinate P_j = [u_s, v_s]^T through the camera intrinsics;

Step (6d): compute the error e_2 from P_j and the pixel coordinate P_j' of the feature point of S_2 corresponding to p_j;

Step (6e): following the method of steps (6a) to (6d), successively perform the same operations for ξ_3, ξ_4, ..., ξ_i, ..., ξ_N, obtaining e_3, e_4, ..., e_i, ..., e_N;

Step (6f): sum e_2, e_3, ..., e_i, ..., e_N to obtain the cost function Loss.
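Steps (6a) to (6f) form a standard reprojection-error cost. A minimal sketch follows; the squared-pixel-error form of e and the data association between landmarks and observed pixels are assumptions, since the patent's error formula is not reproduced in the text.

```python
import numpy as np

def reprojection_error(landmark, observed_px, K, R, t):
    """Steps (6a)-(6d): project world landmark p_j with pose (R, t) via
    equation (11), then compare with the observed pixel; returns the squared
    pixel error (an assumed form of e)."""
    P_cam = R @ landmark + t                           # (11)
    uc, vc = P_cam[0] / P_cam[2], P_cam[1] / P_cam[2]  # (6b) normalise
    us = K[0, 0] * uc + K[0, 2]                        # (6c) pixel coords
    vs = K[1, 1] * vc + K[1, 2]
    diff = np.array([us, vs]) - np.asarray(observed_px)
    return float(diff @ diff)

def cost(landmarks, observations, K, poses):
    """Step (6f): Loss = sum of the errors over all frames. 'observations[i]'
    maps landmark index j to the pixel at which frame i observed it; this
    data association is assumed to be given."""
    return sum(reprojection_error(landmarks[j], obs, K, R, t)
               for i, (R, t) in enumerate(poses)
               for j, obs in observations[i].items())
```

A Levenberg-Marquardt solver would then minimize `cost` over the poses and landmarks jointly, as the patent's back-end optimization module does.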
Step (7): the semantic mapping module obtains the semantic point cloud map:
Step (7a): the semantic mapping module processes the color image sequence C2,C3,...,Ci,...,CN frame by frame, classifying the pixels in the i-th color image frame Ci whose depth value is not 0 into the pixel point set YPi, and, using the dynamic object instance information in CDSi obtained by the data fusion module, classifying the pixels in YPi that do not belong to a dynamic object instance into the pixel point set CPi;
Step (7b): the semantic mapping module computes the three-dimensional coordinates of CPi in three-dimensional space through the camera intrinsics and ξi, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
Step (7c): the semantic mapping module obtains semantic information from the static object instance segmented image CSSi of the data fusion module, and semantically labels the point cloud corresponding to the pixels of the static object instances in CSSi, obtaining the semantic point cloud PLi';
Step (7d): the semantic mapping module splices the semantic point clouds PL2',PL3',...,PLi',...,PLN' to obtain the global semantic point cloud map PL.
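The back-projection in steps (7a)-(7b) can be sketched without the PCL dependency using NumPy only. The intrinsics and the depth scale below are assumptions for illustration, not values from the patent.

```python
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # hypothetical intrinsics
DEPTH_SCALE = 5000.0                          # assumed depth units per metre

def frame_to_world_cloud(depth, dynamic_mask, R, t):
    """Steps (7a)-(7b): keep pixels with non-zero depth (YP_i), drop pixels on
    a dynamic object instance (leaving CP_i), back-project the rest and move
    them into world coordinates with pose (R, t), as in Pw = R^-1 (P' - t)."""
    rows, cols = np.nonzero((depth > 0) & ~dynamic_mask)
    z = depth[rows, cols] / DEPTH_SCALE
    x = (cols - CX) / FX * z                  # invert the pinhole projection
    y = (rows - CY) / FY * z
    cam = np.stack([x, y, z], axis=1)         # camera-frame points P'
    return (cam - t) @ R                      # rows of R^T (P' - t); R orthonormal
```

Semantic labelling (step 7c) would then attach the instance class of each surviving pixel to its generated point before the per-frame clouds are spliced together.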
Claims (7)
1. A semantic SLAM system based on joint constraint, characterized by comprising a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, wherein:
the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;
the neural network module performs frame-by-frame forward-propagation processing on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential dynamic object boxes and instance segmented images with potential dynamic object instances;
the joint constraint module performs feature matching between each color image frame and its previous color image frame, constructs a depth constraint from the depth values of the obtained feature matching pairs, constructs an epipolar constraint for the feature points within the potential dynamic object box regions, and thereby classifies all feature points of the color image to obtain the feature point sets of each category;
the data fusion module fuses the instance segmented images with the feature point set data, obtaining static object instance segmented images and dynamic object instance segmented images;
the visual front-end module computes the depth camera pose from the static feature points;
the back-end optimization module constructs a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function to obtain the globally optimal camera poses and landmark points;
the semantic mapping module builds a point cloud map according to the optimal poses of the depth camera, and maps the pixels with semantics in the static object instance segmented images onto the point cloud map, obtaining a semantic point cloud map.
2. An implementation method of semantic SLAM based on joint constraint, characterized by comprising the following steps:
(1) the data acquisition module obtains image sequences:
the data acquisition module continuously shoots the indoor environment N times, obtaining N color image frames and N depth image frames, and sorts the N color image frames and the N depth image frames respectively in order of shooting time from front to back, obtaining the color image sequence C1,C2,...,Ci,...,CN and the depth image sequence D1,D2,...,Di,...,DN, where i=1,2,...,N, N≥100;
(2) the neural network module obtains detection images and instance segmented images:
the neural network module uses a BlitzNet network model whose parameters have been trained on the COCO data set to perform frame-by-frame forward-propagation processing on the N color image frames in the color image sequence, obtaining the detection images CD1,CD2,...,CDi,...,CDN with potential dynamic object boxes and the instance segmented images CS1,CS2,...,CSi,...,CSN with potential dynamic object instances;
(3) the joint constraint module obtains the feature point category sets DSP2, EP2, SP2, DP2 and S2:
(3a) the joint constraint module performs ORB feature extraction on C1 and C2 respectively to obtain the feature sets P1 and P2, matches P1 and P2 to obtain multiple feature matching pairs, and then uses the depth constraint method to classify all feature points in P2 that satisfy the depth constraint into the depth-stable feature point set DSP2;
(3b) the joint constraint module classifies the feature points in DSP2 that lie inside the dynamic object boxes of the detection image CD2 into the potential dynamic feature point set PP2, and the feature points in DSP2 that lie outside the potential dynamic object boxes of CD2 into the environment feature point set EP2;
(3c) the joint constraint module computes the fundamental matrix F from EP2, then uses the epipolar constraint method to classify the feature points in PP2 that satisfy the epipolar constraint into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, and merges EP2 and SP2 into the static feature point set S2;
(4) the data fusion module obtains the static object instance segmented image CSS2 and the dynamic object instance segmented image CDS2:
the data fusion module computes the dynamic feature point ratio and the potential dynamic feature point ratio of C2, classifies the instances in the instance segmented image CS2 whose dynamic feature point ratio and potential dynamic feature point ratio are both smaller than the preset ratio thresholds as static object instances and the remaining instances as dynamic object instances, obtaining the static object instance segmented image CSS2 and the dynamic object instance segmented image CDS2;
(5) the visual front-end module obtains the depth camera pose ξ2 and the landmark point set L2 in three-dimensional space:
(5a) the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ2 of C2 from the static feature points S2 of C2 and their corresponding matching points in C1;
(5b) the visual front-end module converts the pixel coordinates of S2 into three-dimensional space coordinates through the camera intrinsics and ξ2, obtaining the landmark point set L2 in three-dimensional space;
(5c) the visual front-end module obtains the depth camera poses ξ3,ξ4,...,ξi,...,ξN and the landmark point sets L3,L4,...,Li,...,LN of C3,C4,...,Ci,...,CN according to the method used to obtain ξ2 and L2;
(6) the back-end optimization module obtains the globally optimal depth camera poses and landmark points:
the back-end optimization module merges L2,L3,...,Li,...,LN into the landmark point set L, comprising the landmark points p1,p2,...,pj,...,pM, constructs the cost function Loss with the depth camera poses ξ2,ξ3,...,ξi,...,ξN and the landmark points p1,p2,...,pj,...,pM as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ2',ξ3',...,ξi',...,ξN' and landmark points p1',p2',...,pj',...,pM' in three-dimensional space;
(7) the semantic mapping module obtains the semantic point cloud map:
(7a) the semantic mapping module processes the color image sequence C2,C3,...,Ci,...,CN frame by frame, classifying the pixels in the i-th color image frame Ci whose depth value is not 0 into the pixel point set YPi, and, using the dynamic object instance information in the CDSi obtained by the data fusion module, classifying the pixels in YPi that do not belong to a dynamic object instance into the pixel point set CPi;
(7b) the semantic mapping module computes the three-dimensional coordinates of CPi in three-dimensional space through the camera intrinsics and ξi, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
(7c) the semantic mapping module obtains semantic information from the static object instance segmented image CSSi of the data fusion module, and semantically labels the point cloud corresponding to the pixels of the static object instances in CSSi, obtaining the semantic point cloud PLi';
(7d) the semantic mapping module splices the semantic point clouds PL2',PL3',...,PLi',...,PLN' to obtain the global semantic point cloud map PL.
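Step (5a) estimates the pose from feature points that are already matched between frames, so each ICP iteration reduces to the closed-form SVD (Kabsch) alignment sketched below. This is a simplified stand-in, assuming known correspondences, for a full ICP with repeated re-association:

```python
import numpy as np

def align_matched_points(src, dst):
    """Rigid alignment of matched 3D points (the inner SVD step of ICP, as used
    in step (5a)): returns R, t such that dst ≈ R @ src + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

With noisy depth, this step would be wrapped in an outer loop that re-selects closest points and re-solves until convergence.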
3. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the depth constraint method in step (3a) is realized by the following steps:
(3a1) for each feature point in P2, construct an image block of size 3×3 centred on its pixel coordinates, and compute the average depth value of each image block:
where (x, y) denotes the pixel coordinates of the feature point and depth(x, y) denotes its depth value;
(3a2) compute the feature-match depth-value distance Dd from the feature point and its matching feature point in C1;
(3a3) set a threshold θ, and classify the feature points in P2 whose Dd is smaller than θ into the depth-stable feature point set DSP2, realizing the depth constraint on Dd.
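A sketch of the depth constraint of claim 3 in Python. Since the formula images are not reproduced in this text, Dd is assumed here to be the absolute difference of the two 3×3 block mean depths; that definition is an assumption, not the patent's stated formula.

```python
import numpy as np

def block_mean_depth(depth, x, y):
    """Step (3a1): average depth of the 3x3 image block centred on pixel (x, y).
    depth is indexed as depth[row, col], i.e. depth[y, x]; the block is assumed
    to lie fully inside the image."""
    return float(depth[y - 1:y + 2, x - 1:x + 2].mean())

def satisfies_depth_constraint(depth1, pt1, depth2, pt2, theta):
    """Steps (3a2)-(3a3): a feature match is depth-stable when the distance D_d
    between the block mean depths in the two frames is below threshold theta.
    (Absolute difference is an assumed reading of D_d.)"""
    d_d = abs(block_mean_depth(depth2, *pt2) - block_mean_depth(depth1, *pt1))
    return d_d < theta
```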
4. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the epipolar constraint method in step (3c) is realized by the following steps:
(3c1) through the camera intrinsics, namely the x-axis scale factor fx, y-axis scale factor fy, x-axis offset factor cx and y-axis offset factor cy, convert the pixel coordinates [us,vs]T of each feature point in PP2 into normalized coordinates [uc,vc,1]T;
(3c2) select eight feature points in EP2 with the RANSAC method, compute the fundamental matrix F from the eight feature matching pairs using the eight-point method, and then compute the epipolar line l of the feature point from F and [uc,vc,1]T;
(3c3) compute the feature-match epipolar distance De from F, l, the feature point and its matching feature point in C1;
(3c4) set a threshold η, classify the feature points in PP2 whose De is smaller than η into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, realizing the epipolar constraint on De.
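The epipolar test of claim 4 can be sketched as below. The point-to-epiline distance is the standard geometric formula, used here as an assumed reading since the patent's formula image is not reproduced; F would in practice come from RANSAC plus the eight-point method over EP2 matches.

```python
import numpy as np

def epipolar_distance(F, p1, p2):
    """Steps (3c2)-(3c3): the epiline of p1 in the second view is l = F p1;
    D_e is the perpendicular distance from the matched point p2 to l.
    p1, p2 are normalised homogeneous coordinates [u_c, v_c, 1]."""
    l = F @ p1                                # epiline coefficients (a, b, c)
    return abs(p2 @ l) / np.hypot(l[0], l[1])

def split_by_epipolar_constraint(F, matches, eta):
    """Step (3c4): matches with D_e < eta go to the static set SP2,
    the rest to the dynamic set DP2."""
    static, dynamic = [], []
    for p1, p2 in matches:
        (static if epipolar_distance(F, p1, p2) < eta else dynamic).append((p1, p2))
    return static, dynamic
```

A point on a moving object violates the epipolar geometry induced by the camera motion alone, which is why a large De flags it as dynamic.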
5. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the static object instances and dynamic object instances in step (4) are obtained as follows:
(4a) count the number of points in the environment feature point set EP2, in the static feature point set SP2 and in the dynamic feature point set DP2, and compute the dynamic feature point ratio τd and the potential dynamic feature point ratio τr;
(4b) set the threshold of τd to 0.5 and that of τr to 0.15; when τd ≤ 0.5 and τr ≤ 0.15, the instance segmentation targets inside the detection box are classified as static object instances, and the remaining instance segmentation targets are classified as dynamic object instances.
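The decision rule of claim 5 can be sketched as below. The ratio formulas themselves are not reproduced in this text, so plausible definitions are assumed: τd as the share of dynamic points among the potential dynamic points, and τr as the share of potential dynamic points among all classified points; both definitions are assumptions for illustration.

```python
def is_static_instance(n_ep, n_sp, n_dp, tau_d_thr=0.5, tau_r_thr=0.15):
    """Step (4b): an instance is kept as static when both ratios stay within
    their thresholds (0.5 for tau_d, 0.15 for tau_r, as in the claim).
    n_ep, n_sp, n_dp: point counts of EP2, SP2 and DP2 for the instance.
    The ratio definitions below are assumed, not quoted from the patent."""
    potential = n_sp + n_dp
    total = n_ep + potential
    tau_d = n_dp / potential if potential else 0.0
    tau_r = potential / total if total else 0.0
    return tau_d <= tau_d_thr and tau_r <= tau_r_thr
```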
6. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the landmark point set L2 in three-dimensional space in step (5b) is obtained as follows:
(5b1) through the camera intrinsics, convert the pixel coordinates [us,vs]T of each feature point in S2 into normalized coordinates [uc,vc,1]T;
(5b2) compute the camera coordinates P' = [X',Y',Z']T from the normalized coordinates;
(5b3) through the rotation matrix R and translation vector t in camera pose ξ2, convert the camera coordinates P' into the world coordinates Pw:
Pw = R-1(P' - t) = [X,Y,Z]T (10)
(5b4) define the three-dimensional space point located at Pw as the landmark point p, and classify p into the landmark point set L2.
7. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the cost function in step (6) is constructed by the following steps:
(6a) according to the rotation matrix R and translation vector t in camera pose ξ2, convert the three-dimensional coordinates [X,Y,Z]T of landmark point pj in L into camera coordinates pj':
pj' = Rpj + t = [X',Y',Z']T (11)
(6b) compute the normalized coordinates [uc,vc,1]T from the camera coordinates [X',Y',Z']T, i.e. uc = X'/Z' and vc = Y'/Z';
(6c) compute the pixel coordinates Pj = [us,vs]T through the camera intrinsics, i.e. us = fx·uc + cx and vs = fy·vc + cy;
(6d) compute the error e2 between Pj and the pixel coordinates Pj' of the feature point in S2 corresponding to pj;
(6e) perform the same operations on ξ3, ξ4, ..., ξi, ..., ξN in turn according to the method of steps (6a)-(6d), obtaining e3, e4, ..., ei, ..., eN;
(6f) sum e2, e3, ..., ei, ..., eN to obtain the cost function Loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910768052.6A CN110533720B (en) | 2019-08-20 | 2019-08-20 | Semantic SLAM system and method based on joint constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110533720A true CN110533720A (en) | 2019-12-03 |
CN110533720B CN110533720B (en) | 2023-05-02 |
Family
ID=68663703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910768052.6A Active CN110533720B (en) | 2019-08-20 | 2019-08-20 | Semantic SLAM system and method based on joint constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533720B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956651A (en) * | 2019-12-16 | 2020-04-03 | 哈尔滨工业大学 | Terrain semantic perception method based on fusion of vision and vibrotactile sense |
CN111242954A (en) * | 2020-01-20 | 2020-06-05 | 浙江大学 | Panorama segmentation method with bidirectional connection and shielding processing |
CN111402336A (en) * | 2020-03-23 | 2020-07-10 | 中国科学院自动化研究所 | Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method |
CN111559314A (en) * | 2020-04-27 | 2020-08-21 | 长沙立中汽车设计开发股份有限公司 | Depth and image information fused 3D enhanced panoramic looking-around system and implementation method |
CN111709982A (en) * | 2020-05-22 | 2020-09-25 | 浙江四点灵机器人股份有限公司 | Three-dimensional reconstruction method for dynamic environment |
CN111882611A (en) * | 2020-07-17 | 2020-11-03 | 北京三快在线科技有限公司 | Map construction method and device |
CN112308921A (en) * | 2020-11-09 | 2021-02-02 | 重庆大学 | Semantic and geometric based joint optimization dynamic SLAM method |
WO2021114776A1 (en) * | 2019-12-12 | 2021-06-17 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Object detection method, object detection device, terminal device, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201805688D0 (en) * | 2018-04-05 | 2018-05-23 | Imagination Tech Ltd | Matching local image feature descriptors |
CN108596974A (en) * | 2018-04-04 | 2018-09-28 | 清华大学 | Dynamic scene robot localization builds drawing system and method |
CN109815847A (en) * | 2018-12-30 | 2019-05-28 | 中国电子科技集团公司信息科学研究院 | A kind of vision SLAM method based on semantic constraint |
WO2019153245A1 (en) * | 2018-02-09 | 2019-08-15 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Systems and methods for deep localization and segmentation with 3d semantic map |
Also Published As
Publication number | Publication date |
---|---|
CN110533720B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110533720A (en) | Semantic SLAM system and method based on joint constraint | |
CN111462135B (en) | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation | |
CN110264416B (en) | Sparse point cloud segmentation method and device | |
CN111179324B (en) | Object six-degree-of-freedom pose estimation method based on color and depth information fusion | |
CN110135485A (en) | The object identification and localization method and system that monocular camera is merged with millimetre-wave radar | |
CN109816725A (en) | A kind of monocular camera object pose estimation method and device based on deep learning | |
CN110097553A (en) | The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system | |
CN111161317A (en) | Single-target tracking method based on multiple networks | |
CN110533716B (en) | Semantic SLAM system and method based on 3D constraint | |
CN110827395A (en) | Instant positioning and map construction method suitable for dynamic environment | |
CN112801074B (en) | Depth map estimation method based on traffic camera | |
CN105719352B (en) | Face three-dimensional point cloud super-resolution fusion method and apply its data processing equipment | |
CN112418288B (en) | GMS and motion detection-based dynamic vision SLAM method | |
CN110827312B (en) | Learning method based on cooperative visual attention neural network | |
CN113408584B (en) | RGB-D multi-modal feature fusion 3D target detection method | |
Shreyas et al. | 3D object detection and tracking methods using deep learning for computer vision applications | |
CN109508673A (en) | It is a kind of based on the traffic scene obstacle detection of rodlike pixel and recognition methods | |
Cui et al. | Dense depth-map estimation based on fusion of event camera and sparse LiDAR | |
CN111915651A (en) | Visual pose real-time estimation method based on digital image map and feature point tracking | |
CN112396655B (en) | Point cloud data-based ship target 6D pose estimation method | |
CN114140527A (en) | Dynamic environment binocular vision SLAM method based on semantic segmentation | |
CN112801928A (en) | Attention mechanism-based millimeter wave radar and visual sensor fusion method | |
Yang et al. | Ground plane matters: Picking up ground plane prior in monocular 3d object detection | |
CN115908508A (en) | Coastline ship real-time tracking method based on array camera | |
Tao et al. | 3d semantic vslam of indoor environment based on mask scoring rcnn |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||