CN110533720A - Semantic SLAM system and method based on joint constraint - Google Patents

Semantic SLAM system and method based on joint constraint

Info

Publication number
CN110533720A
Authority
CN
China
Prior art keywords
point
module
depth
characteristic
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910768052.6A
Other languages
Chinese (zh)
Other versions
CN110533720B (en)
Inventor
韩红
王毅飞
张齐驰
唐裕亮
迟勇欣
范迎春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology filed Critical Xian University of Electronic Science and Technology
Priority to CN201910768052.6A priority Critical patent/CN110533720B/en
Publication of CN110533720A publication Critical patent/CN110533720A/en
Application granted granted Critical
Publication of CN110533720B publication Critical patent/CN110533720B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10024: Color image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention proposes a semantic SLAM system and method based on joint constraint, aiming to solve the problem that the camera pose is computed inaccurately when pixel depth values are unstable and the problem that the camera pose cannot be computed when a dynamic object occupies most of the camera's field of view. The accuracy of camera pose estimation is improved by a depth constraint method, and the completeness of the camera trajectory is improved by an epipolar constraint method. The implementation is as follows: the data acquisition module obtains the image sequences; the neural network module obtains detection images and instance segmentation images; the joint constraint module obtains the feature-point category sets; the data fusion module obtains static-object instance segmentation images and dynamic-object instance segmentation images; the visual front-end module obtains the depth camera poses and the landmark point sets in three-dimensional space; the back-end optimization module obtains the globally optimal depth camera poses and landmark points; and the semantic mapping module obtains the semantic point cloud map.

Description

Semantic SLAM system and method based on joint constraint
Technical field
The invention belongs to the technical field of computer vision, and further relates to a semantic SLAM system and method based on joint constraint, which can be used for camera pose estimation and semantic map construction in complex, highly dynamic environments.
Background technique
Simultaneous localization and mapping (SLAM) plays an important role in the autonomous navigation and obstacle avoidance of unmanned systems. Over the past thirty years SLAM has developed rapidly; its main goal is to enable an unmanned system exploring an unknown environment to build a map of that environment while accurately localizing itself. However, the maps built by traditional SLAM systems contain only low-level geometric features such as the points, lines and planes in the environment, and for future unmanned systems a map containing only simple spatial information can hardly meet their development needs. The distinctive feature of a semantic map is that it contains the semantic information of the objects in the environment. A three-dimensional semantic map lets an unmanned system correctly perceive its surroundings; through this cognitive understanding of the environment, the positioning accuracy of the SLAM system can be improved to a certain extent, making up for the deficiencies of existing unmanned systems in environment perception and understanding. While building the map, a semantic SLAM system not only obtains the geometric information of the objects in the environment and identifies them, but also obtains semantic information such as their position, posture and functional attributes, so that it can effectively cope with complex scenes and complete more complex tasks.
In October 2018, Berta Bescos et al. of the University of Zaragoza, Spain, published an article entitled "DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes" in IEEE Robotics and Automation Letters, volume 3, issue 4, proposing an instance-segmentation-based SLAM system and method. On the basis of ORB-SLAM2 it adds a dynamic-target detection function: RGB-D image data are input to a Mask R-CNN network, which performs pixel-wise segmentation of all targets with a priori dynamic properties to obtain dynamic-object instances, and a multi-view geometry method detects truly moving objects that are not covered by the CNN output classes. The camera pose is then computed from feature matches that do not belong to these dynamic-object instances or truly moving objects, which solves ORB-SLAM2's problem of inaccurate camera pose estimation when dynamic objects are present in the environment. At the same time, all targets with a priori dynamic properties are segmented out in the instance segmentation, yielding images that contain only the static scene, and a point cloud map is built from these static-scene images.
However, DynaSLAM removes all objects with a priori dynamic properties; when such objects are actually static in the environment, the static-scene map it builds will lack their information, so the map is not accurate enough. On the other hand, when depth values are unstable, computing the camera pose from feature matches with unstable depth leads to large pose estimation errors; and when a dynamic object occupies most of the camera's field of view, the matched points in the environment are insufficient, so DynaSLAM cannot compute the camera pose, frames are dropped, and the camera trajectory is incomplete.
Summary of the invention
The purpose of the invention is to overcome the above shortcomings of the prior art by proposing a semantic SLAM system and method based on joint constraint, which solve the problem of inaccurate camera pose computation when pixel depth values are unstable and the problem of being unable to compute the camera pose when a dynamic object occupies most of the camera's field of view, thereby improving the accuracy of the camera pose and the completeness of the camera trajectory; at the same time they solve the problem that objects with dynamic properties cannot be built into the point cloud map when they are static, so that a more accurate point cloud map is obtained.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
A semantic SLAM system based on joint constraint, including a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, wherein:
the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;
the neural network module performs forward propagation frame by frame on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-object boxes and instance segmentation images with potential-dynamic-object instances;
the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each obtained feature-match pair and an epipolar constraint on the feature points inside the potential-dynamic-object box regions, and thereby classifies all feature points of the color image into feature-point sets of each category;
the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-object instance segmentation images and dynamic-object instance segmentation images;
the visual front-end module computes the depth camera poses from the stable feature points;
the back-end optimization module constructs a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function to obtain the globally optimal camera poses and landmark points;
the semantic mapping module builds a point cloud map according to the optimal depth camera poses, and maps the pixels carrying semantics in the static-object instance segmentation images onto the point cloud map to obtain the semantic point cloud map.
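As a reading aid only, the following Python sketch shows one way the seven modules described above could be chained per frame; every class and method name in it is a hypothetical placeholder rather than part of the patented system.

```python
# Hypothetical pipeline sketch: module objects and method signatures are illustrative only.
def run_semantic_slam(camera, segmenter, joint_constraint, fusion, frontend, backend, mapper):
    colors, depths = camera.capture_sequence()            # data acquisition module
    detections, instances = segmenter.infer(colors)       # neural network module
    poses, landmarks, clouds = [], [], []
    for i in range(1, len(colors)):
        # classify feature points via the depth constraint and the epipolar constraint
        feat_sets = joint_constraint.classify(colors[i - 1], colors[i],
                                              depths[i - 1], depths[i], detections[i])
        static_seg, dynamic_seg = fusion.fuse(instances[i], feat_sets)   # data fusion module
        pose, lm = frontend.estimate(feat_sets.stable, depths[i])        # visual front end
        poses.append(pose); landmarks.append(lm)
        clouds.append(mapper.build_cloud(colors[i], depths[i], pose, static_seg, dynamic_seg))
    poses, landmarks = backend.optimize(poses, landmarks)  # global nonlinear optimization
    return mapper.stitch(clouds)                           # global semantic point cloud map
```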
An implementation method of semantic SLAM based on joint constraint includes the following steps:
(1) The data acquisition module obtains the image sequences:
The data acquisition module shoots the indoor environment N times continuously to obtain N color image frames and N depth image frames, and sorts the color frames and the depth frames separately in order of shooting time from earliest to latest, obtaining the color image sequence C1, C2, ..., Ci, ..., CN and the depth image sequence D1, D2, ..., Di, ..., DN, where i = 1, 2, ..., N and N ≥ 100;
(2) The neural network module obtains detection images and instance segmentation images:
The neural network module uses a BlitzNet network model whose parameters have been trained on the COCO dataset to perform forward propagation frame by frame on the N color image frames, obtaining the detection images with potential-dynamic-object boxes CD1, CD2, ..., CDi, ..., CDN and the instance segmentation images with potential-dynamic-object instances CS1, CS2, ..., CSi, ..., CSN;
(3) The joint constraint module obtains the feature-point category sets DSP2, EP2, SP2, DP2 and S2:
(3a) The joint constraint module performs ORB feature extraction on C1 and C2 to obtain the feature sets P1 and P2, matches P1 against P2 to obtain multiple feature-match pairs, and then uses the depth constraint method to classify all feature points of P2 that satisfy the depth constraint into the depth-stable feature point set DSP2;
(3b) The joint constraint module classifies the feature points of DSP2 that lie inside the dynamic-object boxes of the detection image CD2 into the potential-dynamic feature point set PP2, and classifies the feature points of DSP2 that lie outside the potential-dynamic-object boxes of CD2 into the environment feature point set EP2;
(3c) The joint constraint module computes the fundamental matrix F from EP2, then uses the epipolar constraint method to classify the feature points of PP2 that satisfy the epipolar constraint into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, and merges EP2 and SP2 into the stable feature point set S2;
(4) The data fusion module obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2:
The data fusion module computes the dynamic feature point ratio and the potential-dynamic feature point ratio of C2, classifies the instances of the instance segmentation image CS2 whose dynamic feature point ratio and potential-dynamic feature point ratio are both below the preset ratio thresholds as static-object instances and the remaining instances as dynamic-object instances, and obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2;
(5) The visual front-end module obtains the depth camera pose ξ2 and the landmark point set L2 in three-dimensional space:
(5a) The visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ2 of C2 from the stable feature points S2 of C2 and their corresponding matched points in C1;
(5b) The visual front-end module converts the pixel coordinates of S2 into three-dimensional space coordinates through the camera intrinsics and ξ2, obtaining the landmark point set L2 in three-dimensional space;
(5c) Following the method used to obtain ξ2 and L2, the visual front-end module obtains the depth camera poses ξ3, ξ4, ..., ξi, ..., ξN and the landmark point sets L3, L4, ..., Li, ..., LN of C3, C4, ..., Ci, ..., CN;
(6) The back-end optimization module obtains the globally optimal depth camera poses and landmark points:
The back-end optimization module merges L2, L3, ..., Li, ..., LN into the landmark point set L, which contains the landmark points p1, p2, ..., pj, ..., pM, constructs the cost function Loss with the depth camera poses ξ2, ξ3, ..., ξi, ..., ξN and the landmark points p1, p2, ..., pj, ..., pM as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ2', ξ3', ..., ξi', ..., ξN' and the landmark points p1', p2', ..., pj', ..., pM' in three-dimensional space;
(7) The semantic mapping module obtains the semantic point cloud map:
(7a) The semantic mapping module processes the color image sequence C2, C3, ..., Ci, ..., CN frame by frame, classifies the pixels of the i-th color image frame Ci whose depth value is not 0 into the pixel set YPi, and, using the dynamic-object instance information of the image CDSi obtained by the data fusion module, classifies the pixels of YPi that do not belong to any dynamic-object instance into the pixel set CPi;
(7b) The semantic mapping module computes the three-dimensional coordinates of CPi through the camera intrinsics and ξi, generates three-dimensional space points using the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
(7c) The semantic mapping module uses the semantic information of the static-object instance segmentation image CSSi obtained by the data fusion module to semantically annotate the point cloud corresponding to the pixels of the static-object instances in CSSi, obtaining the semantic point cloud PLi';
(7d) The semantic mapping module stitches the semantic point clouds PL2', PL3', ..., PLi', ..., PLN' together, obtaining the global semantic point cloud map PL.
Compared with the prior art, the invention has the following advantages:
First, the invention uses the depth constraint method to constrain the depth-value distance of each feature match, obtains the depth-stable feature point set, and computes the camera pose from the depth-stable feature points and their matched points. Compared with the prior art, which computes the camera pose from all feature points in the environment and their matched points, this improves the accuracy of camera pose estimation.
Second, the invention uses the epipolar constraint method to constrain the epipolar-line distance of each feature match, obtains the static feature point set and the dynamic feature point set, and computes the camera pose jointly from the static feature point set and the environment feature point set. Compared with the prior art, which computes the camera pose only from the environment feature point set, this solves the problem of being unable to compute the camera pose when a dynamic object occupies most of the camera's field of view, so a more complete camera trajectory can be drawn.
Third, the data fusion module of the invention computes the dynamic feature point ratio and the potential-dynamic feature point ratio and divides the potential-dynamic-object instances into dynamic-object instances and static-object instances, and the semantic mapping module maps the static-object instances into the point cloud map. Compared with the prior art, which classifies all potential-dynamic-object instances as dynamic-object instances and does not use them when building the point cloud map, this yields a more accurate semantic point cloud map.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the semantic SLAM system of the invention;
Fig. 2 is an implementation flow chart of the semantic SLAM method of the invention.
Detailed description of the embodiments
The invention is described in further detail below in conjunction with the drawings and specific embodiments.
Referring to Fig. 1, the semantic SLAM system based on joint constraint of the invention includes a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, wherein:
the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;
the neural network module performs forward propagation frame by frame on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-object boxes and instance segmentation images with potential-dynamic-object instances;
the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each obtained feature-match pair and an epipolar constraint on the feature points inside the potential-dynamic-object box regions, and thereby classifies all feature points of the color image into feature-point sets of each category;
the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-object instance segmentation images and dynamic-object instance segmentation images;
the visual front-end module computes the depth camera poses from the stable feature points;
the back-end optimization module constructs a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function to obtain the globally optimal camera poses and landmark points;
the semantic mapping module builds a point cloud map according to the optimal depth camera poses, and maps the pixels carrying semantics in the static-object instance segmentation images onto the point cloud map to obtain the semantic point cloud map.
Referring to Fig. 2, the semantic SLAM method based on joint constraint of the invention includes the following steps:
Step (1): the data acquisition module obtains the image sequences:
The data acquisition module shoots the indoor environment N times continuously to obtain N color image frames and N depth image frames, and sorts the color frames and the depth frames separately in order of shooting time from earliest to latest, obtaining the color image sequence C1, C2, ..., Ci, ..., CN and the depth image sequence D1, D2, ..., Di, ..., DN, where i = 1, 2, ..., N and N ≥ 100;
Step (2): the neural network module obtains detection images and instance segmentation images:
The neural network module uses a BlitzNet network model whose parameters have been trained on the COCO dataset to perform forward propagation frame by frame on the N color image frames, obtaining the detection images with potential-dynamic-object boxes CD1, CD2, ..., CDi, ..., CDN and the instance segmentation images with potential-dynamic-object instances CS1, CS2, ..., CSi, ..., CSN;
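A rough illustration of step (2): the loop below runs a generic instance-segmentation model frame by frame and keeps only the classes with a priori dynamic properties. The `model` callable, its `(boxes, labels, masks)` output format and the class list are assumptions of this sketch; the patent specifies a BlitzNet model trained on COCO but no programming interface.

```python
# Hypothetical per-frame inference loop for step (2).
PRIOR_DYNAMIC_CLASSES = {"person", "car", "bicycle", "dog", "cat"}   # illustrative subset only

def segment_sequence(model, color_frames):
    detections, instance_masks = [], []
    for frame in color_frames:
        boxes, labels, masks = model(frame)      # assumed output format of the trained network
        keep = [k for k, lab in enumerate(labels) if lab in PRIOR_DYNAMIC_CLASSES]
        detections.append([boxes[k] for k in keep])        # CD_i: potential-dynamic-object boxes
        instance_masks.append([masks[k] for k in keep])    # CS_i: potential-dynamic-object instances
    return detections, instance_masks
```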
Step (3): the joint constraint module obtains the feature-point category sets DSP2, EP2, SP2, DP2 and S2:
Step (3a): the joint constraint module performs ORB feature extraction on C1 and C2 to obtain the feature sets P1 and P2, matches P1 against P2 to obtain multiple feature-match pairs, and then uses the depth constraint method to classify all feature points of P2 that satisfy the depth constraint into the depth-stable feature point set DSP2;
The depth constraint method is realized as follows:
Step (3a1): for each feature point in P2, construct a 3 × 3 image block centered on the feature point's pixel coordinates (x, y), and compute the average depth value of each image block from the depth values depth(x, y) of its pixels;
Step (3a2): compute the depth-value distance Dd of each feature match from the average depth of the feature point in C2 and the average depth of its matched feature point in C1;
Step (3a3): set a threshold θ and classify the feature points of P2 whose Dd is less than θ into the depth-stable feature point set DSP2, realizing the depth constraint on Dd. Removing feature points with unstable depth reduces the amount of outlier data and thus improves solving efficiency; moreover, feature points whose depth changes abruptly produce large errors when the cost function is evaluated, which strongly affects the nonlinear optimization result and makes the obtained globally optimal camera pose less accurate.
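A minimal numerical sketch of steps (3a1)-(3a3). The patent's formulas are given as figures and not reproduced above, so two details here are assumptions: the average depth is taken over the full 3 × 3 window, and Dd is the absolute difference of the two window averages.

```python
import numpy as np

def window_mean_depth(depth_img, x, y):
    """Average depth of the 3x3 block centered on pixel (x, y); assumes (x, y) is not on the image border."""
    block = depth_img[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
    return block.mean()

def depth_stable_matches(matches, depth_prev, depth_curr, theta):
    """Keep matches whose depth-value distance Dd is below theta (Dd assumed to be |mean2 - mean1|)."""
    stable = []
    for (x1, y1), (x2, y2) in matches:          # (pixel in C1, pixel in C2) for one ORB match
        d1 = window_mean_depth(depth_prev, x1, y1)
        d2 = window_mean_depth(depth_curr, x2, y2)
        if abs(d2 - d1) < theta:                 # depth constraint on Dd
            stable.append(((x1, y1), (x2, y2)))
    return stable                                # the kept C2 feature points form DSP2
```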
Step (3b): the joint constraint module classifies the feature points of DSP2 that lie inside the dynamic-object boxes of the detection image CD2 into the potential-dynamic feature point set PP2, and classifies the feature points of DSP2 that lie outside the potential-dynamic-object boxes of CD2 into the environment feature point set EP2; computing the fundamental matrix from the environment feature point set in the subsequent step then yields an accurate result;
Step (3c): the joint constraint module computes the fundamental matrix F from EP2, then uses the epipolar constraint method to classify the feature points of PP2 that satisfy the epipolar constraint into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, and merges EP2 and SP2 into the stable feature point set S2;
The epipolar constraint method is realized as follows:
Step (3c1): convert the pixel coordinates [us, vs]T of each feature point in PP2 into the normalized coordinates [uc, vc, 1]T through the x-axis scale factor fx, the y-axis scale factor fy, the x-axis offset cx and the y-axis offset cy of the camera intrinsics;
Step (3c2): select eight feature points of EP2 using the RANSAC method, compute the fundamental matrix F from the eight feature matches using the eight-point method, and then compute the epipolar line l of the feature point through F and [uc, vc, 1]T;
Step (3c3): compute the epipolar-line distance De of each feature match from F, l, the feature point and its matched feature point in C1;
Step (3c4): set a threshold η, classify the feature points of PP2 whose De is less than η into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, realizing the epipolar constraint on De. When a feature point corresponds to a static scene point, its match lies on the epipolar line l; within the allowed error range, a point near l is therefore regarded as a static feature point, while a point whose De exceeds the threshold lies far from l and is regarded as a dynamic feature point on a moving object. The subsequent steps compute the camera pose from the static feature points rather than the dynamic feature points, which improves the accuracy of camera pose estimation.
On the other hand, when a dynamic object occupies most of the camera's field of view, the feature matches in the static scene are too few to compute the camera pose, which causes the frame-dropping problem; the static feature point set SP2 provides sufficient feature points, and computing the camera pose jointly from the static feature point set and the environment feature point set solves the frame-dropping problem and improves the accuracy of camera pose estimation.
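A sketch of steps (3c1)-(3c4) using OpenCV and NumPy. Estimating F from the environment matches with RANSAC and thresholding the point-to-epipolar-line distance follows the text above, but the exact form of the distance De is an assumption, and cv2.findFundamentalMat is used in place of a hand-written eight-point solver.

```python
import numpy as np
import cv2

def normalize(pts, fx, fy, cx, cy):
    """Pixel coordinates -> normalized homogeneous coordinates [uc, vc, 1]."""
    pts = np.asarray(pts, dtype=np.float64)
    return np.column_stack([(pts[:, 0] - cx) / fx, (pts[:, 1] - cy) / fy, np.ones(len(pts))])

def split_static_dynamic(env_prev, env_curr, pp_prev, pp_curr, intrinsics, eta):
    """F from environment matches (RANSAC + 8-point inside OpenCV), then threshold De."""
    fx, fy, cx, cy = intrinsics
    e1, e2 = normalize(env_prev, fx, fy, cx, cy), normalize(env_curr, fx, fy, cx, cy)
    F, _ = cv2.findFundamentalMat(e1[:, :2], e2[:, :2], cv2.FM_RANSAC)   # fundamental matrix F
    p1, p2 = normalize(pp_prev, fx, fy, cx, cy), normalize(pp_curr, fx, fy, cx, cy)
    static_idx, dynamic_idx = [], []
    for i, (x1, x2) in enumerate(zip(p1, p2)):
        l = F @ x1                                       # epipolar line of the matched point in C1
        De = abs(x2 @ l) / np.hypot(l[0], l[1])          # point-to-line distance (assumed form of De)
        (static_idx if De < eta else dynamic_idx).append(i)
    return static_idx, dynamic_idx                       # indices into PP2: SP2 versus DP2
```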
Step (4): the data fusion module obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2:
The data fusion module computes the dynamic feature point ratio and the potential-dynamic feature point ratio of C2, classifies the instances of the instance segmentation image CS2 whose dynamic feature point ratio and potential-dynamic feature point ratio are both below the preset ratio thresholds as static-object instances and the remaining instances as dynamic-object instances, and obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2;
The static-object instances and dynamic-object instances are obtained as follows:
Step (4a): count the number of points in the environment feature point set EP2, in the static feature point set SP2 and in the dynamic feature point set DP2, and compute the dynamic feature point ratio τd and the potential-dynamic feature point ratio τr;
Step (4b): set the threshold of τd to 0.5 and the threshold of τr to 0.15; when τd ≤ 0.5 and τr ≤ 0.15, classify the instance segmentation targets inside the detection box as static-object instances, and classify the remaining instance segmentation targets as dynamic-object instances. This is because the neural network module detects every object that has dynamic properties; when such an object is actually static in the environment, it can still be treated as part of the static scene, and its information should be built into the point cloud map, otherwise the map would not be accurate and complete enough.
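A sketch of the decision rule of step (4b). How τd and τr are computed from the point counts, and whether they are evaluated per frame or per detection box, is specified only in the patent's figures, so the two ratios are taken here as given per-instance inputs.

```python
def classify_instances(instances, tau_d, tau_r, thr_d=0.5, thr_r=0.15):
    """Split potential-dynamic instances into static and dynamic ones.

    tau_d, tau_r: dicts mapping instance id -> dynamic feature point ratio / potential-dynamic
    feature point ratio for that instance (their exact formulas are given in the patent figures).
    """
    static_ids, dynamic_ids = [], []
    for inst in instances:
        if tau_d[inst] <= thr_d and tau_r[inst] <= thr_r:
            static_ids.append(inst)      # kept in the static-object segmentation CSS
        else:
            dynamic_ids.append(inst)     # moved to the dynamic-object segmentation CDS
    return static_ids, dynamic_ids
```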
Step (5): the visual front-end module obtains the depth camera pose ξ2 and the landmark point set L2 in three-dimensional space:
Step (5a): the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ2 of C2 from the stable feature points S2 of C2 and their corresponding matched points in C1;
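Because the correspondences are already known from feature matching, step (5a) can be realized with the closed-form SVD alignment that ICP uses internally (the Kabsch/Umeyama solution). The sketch below is that standard solution under a pose convention chosen for illustration, not necessarily the exact solver of the patent.

```python
import numpy as np

def icp_known_correspondences(pts_prev, pts_curr):
    """Closed-form rigid alignment of matched 3D points.

    pts_prev, pts_curr: (N, 3) arrays of corresponding 3D points from C1 and C2.
    Returns R, t such that pts_prev ~ R @ pts_curr + t (one possible pose convention).
    """
    c_prev, c_curr = pts_prev.mean(axis=0), pts_curr.mean(axis=0)
    q_prev, q_curr = pts_prev - c_prev, pts_curr - c_curr
    H = q_curr.T @ q_prev                      # 3x3 cross-covariance of the centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # correct a possible reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_prev - R @ c_curr
    return R, t
```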
Step (5b): the visual front-end module converts the pixel coordinates of S2 into three-dimensional space coordinates through the camera intrinsics and ξ2, obtaining the landmark point set L2 in three-dimensional space;
The landmark point set L2 in three-dimensional space is obtained as follows:
Step (5b1): convert the pixel coordinates [us, vs]T of each feature point in S2 into the normalized coordinates [uc, vc, 1]T through the camera intrinsics;
Step (5b2): compute the camera coordinates P' = [X', Y', Z']T from the normalized coordinates;
Step (5b3): convert the camera coordinates P' into the world coordinates Pw through the rotation matrix R and the translation vector t of the camera pose ξ2:
Pw = R^(-1)(P' - t) = [X, Y, Z]T   (10)
Step (5b4): define the three-dimensional space point located at Pw as the landmark point p, and classify p into the landmark point set L2;
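A sketch of steps (5b1)-(5b4). The conversion from normalized coordinates to camera coordinates P' is not reproduced above; scaling the normalized coordinates by the measured depth is an assumption of this sketch, while the world-coordinate conversion follows equation (10).

```python
import numpy as np

def backproject_to_world(pixels, depths, intrinsics, R, t):
    """Pixel + depth -> normalized coords -> camera coords P' -> world coords Pw = R^-1 (P' - t)."""
    fx, fy, cx, cy = intrinsics
    landmarks = []
    for (us, vs), z in zip(pixels, depths):
        uc, vc = (us - cx) / fx, (vs - cy) / fy      # normalized coordinates [uc, vc, 1]
        P_cam = np.array([uc * z, vc * z, z])        # camera coordinates (assumed depth scaling)
        Pw = np.linalg.inv(R) @ (P_cam - t)          # equation (10)
        landmarks.append(Pw)                         # landmark point p added to L2
    return np.array(landmarks)
```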
Step (5c): following the method used to obtain ξ2 and L2, the visual front-end module obtains the depth camera poses ξ3, ξ4, ..., ξi, ..., ξN and the landmark point sets L3, L4, ..., Li, ..., LN of C3, C4, ..., Ci, ..., CN. Concretely, C2 and C3 are processed with the method of steps (3a)-(5b4) to obtain the depth camera pose ξ3 and the landmark point set L3, and then C3 and C4, C4 and C5, ..., Ci-1 and Ci, ..., CN-1 and CN are processed in the same way, obtaining ξ4 and L4, ..., ξi and Li, ..., ξN and LN.
Step (6): the back-end optimization module obtains the globally optimal depth camera poses and landmark points:
The back-end optimization module merges L2, L3, ..., Li, ..., LN into the landmark point set L, which contains the landmark points p1, p2, ..., pj, ..., pM, constructs the cost function Loss with the depth camera poses ξ2, ξ3, ..., ξi, ..., ξN and the landmark points p1, p2, ..., pj, ..., pM as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ2', ξ3', ..., ξi', ..., ξN' and the landmark points p1', p2', ..., pj', ..., pM' in three-dimensional space;
The cost function is constructed as follows:
Step (6a): according to the rotation matrix R and the translation vector t of the camera pose ξ2, convert the three-dimensional coordinates [X, Y, Z]T of the landmark point pj in L into the camera coordinates pj':
pj' = R pj + t = [X', Y', Z']T   (11)
Step (6b): compute the normalized coordinates [uc, vc, 1]T from the camera coordinates [X', Y', Z']T;
Step (6c): compute the pixel coordinates Pj = [us, vs] through the camera intrinsics;
Step (6d): compute the error e2 from Pj and the pixel coordinates Pj' of the feature point in S2 that corresponds to pj;
Step (6e): apply the operations of steps (6a)-(6d) to ξ3, ξ4, ..., ξi, ..., ξN in turn, obtaining e3, e4, ..., ei, ..., eN;
Step (6f): sum e2, e3, ..., ei, ..., eN, obtaining the cost function Loss.
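The error terms and the cost function are given as figures in the original patent; a standard reprojection-error form consistent with steps (6a)-(6f) is sketched below, where π(ξi, pj) denotes the projection of landmark pj through pose ξi by steps (6a)-(6c). The squared norm and the Levenberg-Marquardt normal equation are assumptions, not quotations from the patent.

```latex
e_i = \sum_{j} \bigl\lVert P'_j - \pi(\xi_i, p_j) \bigr\rVert^{2}, \qquad
\mathrm{Loss} = \sum_{i=2}^{N} e_i, \qquad
\bigl(J^{\top} J + \lambda I\bigr)\,\Delta x = -J^{\top} e ,
```

where J is the Jacobian of the stacked errors with respect to the poses and landmark coordinates and λ is the Levenberg-Marquardt damping factor.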
Step (7): the semantic mapping module obtains the semantic point cloud map:
Step (7a): the semantic mapping module processes the color image sequence C2, C3, ..., Ci, ..., CN frame by frame, classifies the pixels of the i-th color image frame Ci whose depth value is not 0 into the pixel set YPi, and, using the dynamic-object instance information of the image CDSi obtained by the data fusion module, classifies the pixels of YPi that do not belong to any dynamic-object instance into the pixel set CPi;
Step (7b): the semantic mapping module computes the three-dimensional coordinates of CPi through the camera intrinsics and ξi, generates three-dimensional space points using the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
Step (7c): the semantic mapping module uses the semantic information of the static-object instance segmentation image CSSi obtained by the data fusion module to semantically annotate the point cloud corresponding to the pixels of the static-object instances in CSSi, obtaining the semantic point cloud PLi';
Step (7d): the semantic mapping module stitches the semantic point clouds PL2', PL3', ..., PLi', ..., PLN' together, obtaining the global semantic point cloud map PL.
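A sketch of steps (7a)-(7d) using NumPy only (the patent itself uses the Point Cloud Library); packing color and label into one array is purely illustrative.

```python
import numpy as np

def build_semantic_cloud(color, depth, dynamic_mask, static_label_img, intrinsics, R, t):
    """Back-project valid, non-dynamic pixels and attach each point's static-instance label."""
    fx, fy, cx, cy = intrinsics
    h, w = depth.shape
    vs, us = np.mgrid[0:h, 0:w]                          # pixel row/column grids
    keep = (depth > 0) & (~dynamic_mask)                 # YP_i minus dynamic-object pixels -> CP_i
    z = depth[keep].astype(np.float64)
    uc, vc = (us[keep] - cx) / fx, (vs[keep] - cy) / fy
    P_cam = np.stack([uc * z, vc * z, z], axis=1)        # camera coordinates
    Pw = (np.linalg.inv(R) @ (P_cam - t).T).T            # world coordinates, as in equation (10)
    rgb = color[keep]                                    # per-point color
    labels = static_label_img[keep]                      # static-instance semantic label (0 = none)
    return np.concatenate([Pw, rgb, labels[:, None]], axis=1)   # one semantic point cloud PL_i'

# Stitching PL_2', ..., PL_N' into the global map PL is then a simple concatenation:
# PL = np.concatenate(list_of_clouds, axis=0)
```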

Claims (7)

1. A semantic SLAM system based on joint constraint, characterized by including a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, wherein:
the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;
the neural network module performs forward propagation frame by frame on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-object boxes and instance segmentation images with potential-dynamic-object instances;
the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each obtained feature-match pair and an epipolar constraint on the feature points inside the potential-dynamic-object box regions, and thereby classifies all feature points of the color image into feature-point sets of each category;
the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-object instance segmentation images and dynamic-object instance segmentation images;
the visual front-end module computes the depth camera poses from the stable feature points;
the back-end optimization module constructs a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function to obtain the globally optimal camera poses and landmark points;
the semantic mapping module builds a point cloud map according to the optimal depth camera poses, and maps the pixels carrying semantics in the static-object instance segmentation images onto the point cloud map to obtain the semantic point cloud map.
2. An implementation method of semantic SLAM based on joint constraint, characterized by including the following steps:
(1) the data acquisition module obtains the image sequences:
the data acquisition module shoots the indoor environment N times continuously to obtain N color image frames and N depth image frames, and sorts the color frames and the depth frames separately in order of shooting time from earliest to latest, obtaining the color image sequence C1, C2, ..., Ci, ..., CN and the depth image sequence D1, D2, ..., Di, ..., DN, where i = 1, 2, ..., N and N ≥ 100;
(2) the neural network module obtains detection images and instance segmentation images:
the neural network module uses a BlitzNet network model whose parameters have been trained on the COCO dataset to perform forward propagation frame by frame on the N color image frames, obtaining the detection images with potential-dynamic-object boxes CD1, CD2, ..., CDi, ..., CDN and the instance segmentation images with potential-dynamic-object instances CS1, CS2, ..., CSi, ..., CSN;
(3) the joint constraint module obtains the feature-point category sets DSP2, EP2, SP2, DP2 and S2:
(3a) the joint constraint module performs ORB feature extraction on C1 and C2 to obtain the feature sets P1 and P2, matches P1 against P2 to obtain multiple feature-match pairs, and then uses the depth constraint method to classify all feature points of P2 that satisfy the depth constraint into the depth-stable feature point set DSP2;
(3b) the joint constraint module classifies the feature points of DSP2 that lie inside the dynamic-object boxes of the detection image CD2 into the potential-dynamic feature point set PP2, and classifies the feature points of DSP2 that lie outside the potential-dynamic-object boxes of CD2 into the environment feature point set EP2;
(3c) the joint constraint module computes the fundamental matrix F from EP2, then uses the epipolar constraint method to classify the feature points of PP2 that satisfy the epipolar constraint into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, and merges EP2 and SP2 into the stable feature point set S2;
(4) the data fusion module obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2:
the data fusion module computes the dynamic feature point ratio and the potential-dynamic feature point ratio of C2, classifies the instances of the instance segmentation image CS2 whose dynamic feature point ratio and potential-dynamic feature point ratio are both below the preset ratio thresholds as static-object instances and the remaining instances as dynamic-object instances, and obtains the static-object instance segmentation image CSS2 and the dynamic-object instance segmentation image CDS2;
(5) the visual front-end module obtains the depth camera pose ξ2 and the landmark point set L2 in three-dimensional space:
(5a) the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ2 of C2 from the stable feature points S2 of C2 and their corresponding matched points in C1;
(5b) the visual front-end module converts the pixel coordinates of S2 into three-dimensional space coordinates through the camera intrinsics and ξ2, obtaining the landmark point set L2 in three-dimensional space;
(5c) following the method used to obtain ξ2 and L2, the visual front-end module obtains the depth camera poses ξ3, ξ4, ..., ξi, ..., ξN and the landmark point sets L3, L4, ..., Li, ..., LN of C3, C4, ..., Ci, ..., CN;
(6) the back-end optimization module obtains the globally optimal depth camera poses and landmark points:
the back-end optimization module merges L2, L3, ..., Li, ..., LN into the landmark point set L, which contains the landmark points p1, p2, ..., pj, ..., pM, constructs the cost function Loss with the depth camera poses ξ2, ξ3, ..., ξi, ..., ξN and the landmark points p1, p2, ..., pj, ..., pM as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ2', ξ3', ..., ξi', ..., ξN' and the landmark points p1', p2', ..., pj', ..., pM' in three-dimensional space;
(7) the semantic mapping module obtains the semantic point cloud map:
(7a) the semantic mapping module processes the color image sequence C2, C3, ..., Ci, ..., CN frame by frame, classifies the pixels of the i-th color image frame Ci whose depth value is not 0 into the pixel set YPi, and, using the dynamic-object instance information of the image CDSi obtained by the data fusion module, classifies the pixels of YPi that do not belong to any dynamic-object instance into the pixel set CPi;
(7b) the semantic mapping module computes the three-dimensional coordinates of CPi through the camera intrinsics and ξi, generates three-dimensional space points using the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
(7c) the semantic mapping module uses the semantic information of the static-object instance segmentation image CSSi obtained by the data fusion module to semantically annotate the point cloud corresponding to the pixels of the static-object instances in CSSi, obtaining the semantic point cloud PLi';
(7d) the semantic mapping module stitches the semantic point clouds PL2', PL3', ..., PLi', ..., PLN' together, obtaining the global semantic point cloud map PL.
3. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the depth constraint method in step (3a) is realized as follows:
(3a1) for each feature point in P2, construct a 3 × 3 image block centered on the feature point's pixel coordinates (x, y), and compute the average depth value of each image block from the depth values depth(x, y) of its pixels;
(3a2) compute the depth-value distance Dd of each feature match from the average depth of the feature point in C2 and the average depth of its matched feature point in C1;
(3a3) set a threshold θ, and classify the feature points of P2 whose Dd is less than θ into the depth-stable feature point set DSP2, realizing the depth constraint on Dd.
4. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the epipolar constraint method in step (3c) is realized as follows:
(3c1) convert the pixel coordinates [us, vs]T of each feature point in PP2 into the normalized coordinates [uc, vc, 1]T through the x-axis scale factor fx, the y-axis scale factor fy, the x-axis offset cx and the y-axis offset cy of the camera intrinsics;
(3c2) select eight feature points of EP2 using the RANSAC method, compute the fundamental matrix F from the eight feature matches using the eight-point method, and then compute the epipolar line l of the feature point through F and [uc, vc, 1]T;
(3c3) compute the epipolar-line distance De of each feature match from F, l, the feature point and its matched feature point in C1;
(3c4) set a threshold η, classify the feature points of PP2 whose De is less than η into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, realizing the epipolar constraint on De.
5. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the static-object instances and dynamic-object instances in step (4) are obtained as follows:
(4a) count the number of points in the environment feature point set EP2, in the static feature point set SP2 and in the dynamic feature point set DP2, and compute the dynamic feature point ratio τd and the potential-dynamic feature point ratio τr;
(4b) set the threshold of τd to 0.5 and the threshold of τr to 0.15; when τd ≤ 0.5 and τr ≤ 0.15, classify the instance segmentation targets inside the detection box as static-object instances, and classify the remaining instance segmentation targets as dynamic-object instances.
6. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the landmark point set L2 in three-dimensional space in step (5b) is obtained as follows:
(5b1) convert the pixel coordinates [us, vs]T of each feature point in S2 into the normalized coordinates [uc, vc, 1]T through the camera intrinsics;
(5b2) compute the camera coordinates P' = [X', Y', Z']T from the normalized coordinates;
(5b3) convert the camera coordinates P' into the world coordinates Pw through the rotation matrix R and the translation vector t of the camera pose ξ2:
Pw = R^(-1)(P' - t) = [X, Y, Z]T   (10)
(5b4) define the three-dimensional space point located at Pw as the landmark point p, and classify p into the landmark point set L2.
7. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the cost function in step (6) is constructed as follows:
(6a) according to the rotation matrix R and the translation vector t of the camera pose ξ2, convert the three-dimensional coordinates [X, Y, Z]T of the landmark point pj in L into the camera coordinates pj':
pj' = R pj + t = [X', Y', Z']T   (11)
(6b) compute the normalized coordinates [uc, vc, 1]T from the camera coordinates [X', Y', Z']T;
(6c) compute the pixel coordinates Pj = [us, vs] through the camera intrinsics;
(6d) compute the error e2 from Pj and the pixel coordinates Pj' of the feature point in S2 that corresponds to pj;
(6e) apply the operations of steps (6a)-(6d) to ξ3, ξ4, ..., ξi, ..., ξN in turn, obtaining e3, e4, ..., ei, ..., eN;
(6f) sum e2, e3, ..., ei, ..., eN, obtaining the cost function Loss.
CN201910768052.6A 2019-08-20 2019-08-20 Semantic SLAM system and method based on joint constraint Active CN110533720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910768052.6A CN110533720B (en) 2019-08-20 2019-08-20 Semantic SLAM system and method based on joint constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910768052.6A CN110533720B (en) 2019-08-20 2019-08-20 Semantic SLAM system and method based on joint constraint

Publications (2)

Publication Number Publication Date
CN110533720A true CN110533720A (en) 2019-12-03
CN110533720B CN110533720B (en) 2023-05-02

Family

ID=68663703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910768052.6A Active CN110533720B (en) 2019-08-20 2019-08-20 Semantic SLAM system and method based on joint constraint

Country Status (1)

Country Link
CN (1) CN110533720B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153245A1 (en) * 2018-02-09 2019-08-15 Baidu.Com Times Technology (Beijing) Co., Ltd. Systems and methods for deep localization and segmentation with 3d semantic map
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
GB201805688D0 (en) * 2018-04-05 2018-05-23 Imagination Tech Ltd Matching local image feature descriptors
CN109815847A (en) * 2018-12-30 2019-05-28 中国电子科技集团公司信息科学研究院 A kind of vision SLAM method based on semantic constraint

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021114776A1 (en) * 2019-12-12 2021-06-17 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Object detection method, object detection device, terminal device, and medium
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN111242954A (en) * 2020-01-20 2020-06-05 浙江大学 Panorama segmentation method with bidirectional connection and shielding processing
CN111242954B (en) * 2020-01-20 2022-05-13 浙江大学 Panorama segmentation method with bidirectional connection and shielding processing
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN111402336B (en) * 2020-03-23 2024-03-12 中国科学院自动化研究所 Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111559314A (en) * 2020-04-27 2020-08-21 长沙立中汽车设计开发股份有限公司 Depth and image information fused 3D enhanced panoramic looking-around system and implementation method
CN111709982A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN111882611A (en) * 2020-07-17 2020-11-03 北京三快在线科技有限公司 Map construction method and device
CN111882611B (en) * 2020-07-17 2023-11-24 北京三快在线科技有限公司 Map construction method and device
CN112308921A (en) * 2020-11-09 2021-02-02 重庆大学 Semantic and geometric based joint optimization dynamic SLAM method
CN112308921B (en) * 2020-11-09 2024-01-12 重庆大学 Combined optimization dynamic SLAM method based on semantics and geometry

Also Published As

Publication number Publication date
CN110533720B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110533720A (en) Semantic SLAM system and method based on joint constraint
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN110264416B (en) Sparse point cloud segmentation method and device
CN111179324B (en) Object six-degree-of-freedom pose estimation method based on color and depth information fusion
CN110135485A (en) The object identification and localization method and system that monocular camera is merged with millimetre-wave radar
CN109816725A (en) A kind of monocular camera object pose estimation method and device based on deep learning
CN110097553A (en) The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN111161317A (en) Single-target tracking method based on multiple networks
CN110533716B (en) Semantic SLAM system and method based on 3D constraint
CN110827395A (en) Instant positioning and map construction method suitable for dynamic environment
CN112801074B (en) Depth map estimation method based on traffic camera
CN105719352B (en) Face three-dimensional point cloud super-resolution fusion method and apply its data processing equipment
CN112418288B (en) GMS and motion detection-based dynamic vision SLAM method
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
Shreyas et al. 3D object detection and tracking methods using deep learning for computer vision applications
CN109508673A (en) It is a kind of based on the traffic scene obstacle detection of rodlike pixel and recognition methods
Cui et al. Dense depth-map estimation based on fusion of event camera and sparse LiDAR
CN111915651A (en) Visual pose real-time estimation method based on digital image map and feature point tracking
CN112396655B (en) Point cloud data-based ship target 6D pose estimation method
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112801928A (en) Attention mechanism-based millimeter wave radar and visual sensor fusion method
Yang et al. Ground plane matters: Picking up ground plane prior in monocular 3d object detection
CN115908508A (en) Coastline ship real-time tracking method based on array camera
Tao et al. 3d semantic vslam of indoor environment based on mask scoring rcnn

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant