CN110533720A - Semantic SLAM system and method based on joint constraint - Google Patents
- Publication number
- CN110533720A (application CN201910768052.6A)
- Authority
- CN
- China
- Prior art keywords
- point
- module
- depth
- characteristic
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention proposes a semantic SLAM system and method based on joint constraints. It aims to solve two problems: inaccurate camera pose computation when pixel depth values are unstable, and failure to compute the camera pose when a dynamic object occupies most of the camera's field of view. A depth-constraint method improves the accuracy of camera pose estimation, and an epipolar-constraint method improves the completeness of the camera trajectory. The method is implemented as follows: the data acquisition module obtains the image sequences; the neural network module obtains detection images and instance segmentation images; the joint constraint module obtains the different feature-point category sets; the data fusion module obtains static-target and dynamic-target instance segmentation images; the visual front-end module obtains the depth camera poses and the sets of landmark points in three-dimensional space; the back-end optimization module obtains the globally optimal depth camera poses and landmark points; the semantic mapping module obtains the semantic point cloud map.
Description
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a semantic SLAM system and method based on joint constraints, which can be used for camera pose estimation and semantic map construction in complex, highly dynamic environments.
Background art
Simultaneous localization and mapping (SLAM) plays an important role in the autonomous navigation and obstacle avoidance of unmanned systems. SLAM has developed rapidly over the past thirty years; its main goal is to let an unmanned system build a map of an unknown environment while accurately localizing itself during autonomous exploration. However, the maps built by traditional SLAM systems contain only low-level geometric features such as the points, lines and planes in the environment, and for future unmanned systems a map containing only such simple spatial information can hardly meet their development needs. The distinctive feature of a semantic map is that it contains the semantic information of the objects in the environment. A three-dimensional semantic map lets an unmanned system perceive its surroundings correctly; through this cognitive understanding of the environment, it can also improve the localization accuracy of the SLAM system to a certain extent and make up for the deficiencies of existing unmanned systems in environment perception and understanding. While building the map, a semantic SLAM system not only obtains the geometric information of the objects in the environment but also identifies them, obtaining semantic information such as their position, posture and functional attributes, so it can effectively cope with complex scenes and complete more complex tasks.
In October 2018, Berta Bescos et al. of the University of Zaragoza, Spain, published the article "DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes" in IEEE Robotics and Automation Letters, volume 3, issue 4, proposing an instance-segmentation-based SLAM system and method. Building on ORB-SLAM2, it adds a dynamic-target detection function: RGB-D image data is fed into a Mask R-CNN network, which segments, pixel by pixel, all targets with a priori dynamic properties to obtain dynamic-object instances, and a multi-view geometry method then detects truly moving objects that are not among the classes output by the CNN. The camera pose is computed from the feature matches that belong neither to these dynamic-object instances nor to the truly moving objects, which solves ORB-SLAM2's problem of inaccurate camera pose estimation when dynamic objects are present in the environment. Meanwhile, all target instances with a priori dynamic properties are segmented away during instance segmentation, yielding images containing only the static scene, from which a point cloud map is built.
However, DynaSLAM removes all objects with a priori dynamic properties; when such a target is actually static in the environment, the static scene map that is built will lack its information, so the map construction is not accurate enough. On the other hand, when depth values are unstable, computing the camera pose from feature matches with unstable depth leads to large pose estimation errors; and when a dynamic object occupies most of the camera's field of view, the matched points in the environment are insufficient, so DynaSLAM cannot compute the camera pose, frames are lost, and the camera trajectory is incomplete.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above prior art by proposing a semantic SLAM system and method based on joint constraints. It solves the problems of inaccurate camera pose computation when pixel depth values are unstable and of failure to compute the camera pose when a dynamic object occupies most of the camera's field of view, thereby improving the accuracy of the camera pose and the completeness of the camera trajectory; at the same time it solves the problem that objects with dynamic properties cannot be built into the point cloud map while they are actually static, yielding a more accurate point cloud map.
To achieve the above object, the technical scheme adopted by the invention is as follows:
A semantic SLAM system based on joint constraints comprises a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, in which:

the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;

the neural network module performs forward propagation, frame by frame, on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-target boxes and instance segmentation images with potential-dynamic-target instances;

the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each matched feature pair, and builds an epipolar constraint on the feature points inside the potential-dynamic-target box regions, thereby classifying all feature points of the color image and obtaining the feature-point set of each category;

the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-target instance segmentation images and dynamic-target instance segmentation images;

the visual front-end module computes the depth camera pose from the static feature points;

the back-end optimization module builds a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function, obtaining the globally optimal camera poses and landmark points;

the semantic mapping module builds a point cloud map from the optimal depth camera poses and maps the semantically labeled pixels of the static-target instance segmentation images onto the point cloud map, obtaining a semantic point cloud map.
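The data flow through the seven modules above can be sketched in a few lines. Everything in this sketch — the function names, argument shapes and dictionary keys — is an illustrative stand-in, since the patent defines modules, not a software interface:

```python
def run_pipeline(frames, modules):
    """Drive the seven-module chain over consecutive frame pairs.

    'frames' is a list of (color, depth) tuples; 'modules' maps a stage name
    to a callable standing in for that module. All names are stand-ins chosen
    for this sketch, not identifiers from the patent.
    """
    poses, fused_frames = [], []
    for (c_prev, d_prev), (c_cur, d_cur) in zip(frames, frames[1:]):
        det, seg = modules["neural_net"](c_cur)            # boxes + instances
        feat_sets = modules["joint_constraint"](c_prev, c_cur,
                                                d_prev, d_cur, det)
        fused_frames.append(modules["data_fusion"](seg, feat_sets))
        poses.append(modules["vision_frontend"](feat_sets["static"]))
    poses = modules["backend_opt"](poses)                  # global refinement
    return modules["semantic_mapping"](fused_frames, poses)
```

With trivial stubs substituted for each module, the function simply threads data from acquisition through back-end optimization, mirroring steps (1) to (7) below.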
An implementation method of semantic SLAM based on joint constraints comprises the following steps:

(1) The data acquisition module obtains the image sequences:

The data acquisition module shoots the indoor environment N times continuously, obtaining N color image frames and N depth image frames, sorts each set by shooting time from earliest to latest, and obtains the color image sequence C_1, C_2, ..., C_i, ..., C_N and the depth image sequence D_1, D_2, ..., D_i, ..., D_N, where i = 1, 2, ..., N and N >= 100;
(2) The neural network module obtains detection images and instance segmentation images:

The neural network module uses a BlitzNet network model whose parameters were trained on the COCO data set to perform forward propagation, frame by frame, on the N color image frames, obtaining the detection images with potential-dynamic-target boxes CD_1, CD_2, ..., CD_i, ..., CD_N and the instance segmentation images with potential-dynamic-target instances CS_1, CS_2, ..., CS_i, ..., CS_N;
(3) The joint constraint module obtains the feature-point category sets DSP_2, EP_2, SP_2, DP_2 and S_2:

(3a) The joint constraint module performs ORB feature extraction on C_1 and C_2, obtaining the feature sets P_1 and P_2, matches P_1 against P_2 to obtain multiple matched feature pairs, and then uses the depth-constraint method to classify all feature points of P_2 that satisfy the depth constraint into the depth-stable feature-point set DSP_2;

(3b) The joint constraint module classifies the feature points of DSP_2 that lie inside a dynamic-target box of the detection image CD_2 into the potential-dynamic feature-point set PP_2, and the feature points of DSP_2 that lie outside the potential-dynamic-target boxes of CD_2 into the environment feature-point set EP_2;

(3c) The joint constraint module computes the fundamental matrix F from EP_2, then uses the epipolar-constraint method to classify the feature points of PP_2 that satisfy the epipolar constraint into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, and merges EP_2 and SP_2 into the static feature-point set S_2;
(4) The data fusion module obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2:

The data fusion module computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio of C_2, classifies the instances of the instance segmentation image CS_2 whose dynamic feature-point ratio and potential-dynamic feature-point ratio are both below the preset ratio thresholds as static-target instances and the remaining instances as dynamic-target instances, and obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2;
(5) The visual front-end module obtains the depth camera pose ξ_2 and the landmark point set L_2 in three-dimensional space:

(5a) The visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ_2 of C_2 from the static feature points S_2 of C_2 and their corresponding matched points in C_1;

(5b) The visual front-end module converts the pixel coordinates of S_2 into three-dimensional space coordinates through the camera intrinsics and ξ_2, obtaining the landmark point set L_2 in three-dimensional space;

(5c) Following the method used to obtain ξ_2 and L_2, the visual front-end module obtains the depth camera poses ξ_3, ξ_4, ..., ξ_i, ..., ξ_N and the landmark point sets L_3, L_4, ..., L_i, ..., L_N of C_3, C_4, ..., C_i, ..., C_N;
(6) The back-end optimization module obtains the globally optimal depth camera poses and landmark points:

The back-end optimization module merges L_2, L_3, ..., L_i, ..., L_N into the landmark point set L, containing the landmark points p_1, p_2, ..., p_j, ..., p_M, builds the cost function Loss with the depth camera poses ξ_2, ξ_3, ..., ξ_i, ..., ξ_N and the landmark points p_1, p_2, ..., p_j, ..., p_M as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ_2', ξ_3', ..., ξ_i', ..., ξ_N' and landmark points p_1', p_2', ..., p_j', ..., p_M' in three-dimensional space;
(7) The semantic mapping module obtains the semantic point cloud map:

(7a) The semantic mapping module processes the color image sequence C_2, C_3, ..., C_i, ..., C_N frame by frame: the pixels of the i-th color image frame C_i whose depth value is not 0 are classified into the pixel set YP_i, and, using the dynamic-target instance information of the CDS_i obtained by the data fusion module, the pixels of YP_i that do not belong to a dynamic-target instance are classified into the pixel set CP_i;

(7b) The semantic mapping module computes the three-dimensional coordinates of CP_i through the camera intrinsics and ξ_i, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated points into the point cloud PL_i;

(7c) Using the semantic information of the static-target instance segmentation image CSS_i obtained by the data fusion module, the semantic mapping module semantically labels the points corresponding to the pixels of the static-target instances of CSS_i, obtaining the semantic point cloud PL_i';

(7d) The semantic mapping module splices the semantic point clouds PL_2', PL_3', ..., PL_i', ..., PL_N' together, obtaining the global semantic point cloud map PL.
Compared with the prior art, the invention has the following advantages:

First, the invention uses the depth-constraint method to constrain the depth-value distance of each feature match, obtains the depth-stable feature-point set, and computes the camera pose from the depth-stable feature points and their matched points. Compared with the prior art, which computes the camera pose from all feature points and their matched points in the environment, this improves the accuracy of camera pose estimation.

Second, the invention uses the epipolar-constraint method to constrain the epipolar-line distance of each feature match, obtains the static and dynamic feature-point sets, and computes the camera pose jointly from the static feature-point set and the environment feature-point set. Compared with the prior art, which computes the camera pose only from the environment feature-point set, this solves the problem that the camera pose cannot be computed when a dynamic object occupies most of the camera's field of view, so a more complete camera trajectory can be drawn.

Third, the data fusion module of the invention computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio, divides the potential-dynamic-target instances into dynamic-target instances and static-target instances, and the semantic mapping module maps the static-target instances into the point cloud map. Compared with the prior art, which classifies all potential-dynamic-target instances as dynamic and does not use them when building the point cloud map, this yields a more accurate semantic point cloud map.
Description of the drawings
Fig. 1 is a structural schematic diagram of the semantic SLAM system of the invention;
Fig. 2 is an implementation flow chart of the semantic SLAM method of the invention.
Specific embodiment
Below, the invention is described in further detail in conjunction with the drawings and specific embodiments.
Referring to Fig. 1, the semantic SLAM system of the invention based on joint constraints comprises a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, in which:

the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;

the neural network module performs forward propagation, frame by frame, on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential-dynamic-target boxes and instance segmentation images with potential-dynamic-target instances;

the joint constraint module performs feature matching between each color image frame and the previous color image frame, builds a depth constraint on the depth values of each matched feature pair, and builds an epipolar constraint on the feature points inside the potential-dynamic-target box regions, thereby classifying all feature points of the color image and obtaining the feature-point set of each category;

the data fusion module fuses the instance segmentation images with the feature-point set data, obtaining static-target instance segmentation images and dynamic-target instance segmentation images;

the visual front-end module computes the depth camera pose from the static feature points;

the back-end optimization module builds a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function, obtaining the globally optimal camera poses and landmark points;

the semantic mapping module builds a point cloud map from the optimal depth camera poses and maps the semantically labeled pixels of the static-target instance segmentation images onto the point cloud map, obtaining a semantic point cloud map.
Referring to Fig. 2, the semantic SLAM method of the invention based on joint constraints comprises the following steps:

Step (1): the data acquisition module obtains the image sequences.

The data acquisition module shoots the indoor environment N times continuously, obtaining N color image frames and N depth image frames, sorts each set by shooting time from earliest to latest, and obtains the color image sequence C_1, C_2, ..., C_i, ..., C_N and the depth image sequence D_1, D_2, ..., D_i, ..., D_N, where i = 1, 2, ..., N and N >= 100.
Step (2): the neural network module obtains detection images and instance segmentation images.

The neural network module uses a BlitzNet network model whose parameters were trained on the COCO data set to perform forward propagation, frame by frame, on the N color image frames, obtaining the detection images with potential-dynamic-target boxes CD_1, CD_2, ..., CD_i, ..., CD_N and the instance segmentation images with potential-dynamic-target instances CS_1, CS_2, ..., CS_i, ..., CS_N.
Step (3): the joint constraint module obtains the feature-point category sets DSP_2, EP_2, SP_2, DP_2 and S_2.

Step (3a): the joint constraint module performs ORB feature extraction on C_1 and C_2, obtaining the feature sets P_1 and P_2, matches P_1 against P_2 to obtain multiple matched feature pairs, and then uses the depth-constraint method to classify all feature points of P_2 that satisfy the depth constraint into the depth-stable feature-point set DSP_2.
The depth-constraint method is realized as follows:

Step (3a1): for each feature point of P_2, build a 3 × 3 image block centered on its pixel coordinate, and compute the average depth value of the block as the mean of depth(x, y) over its nine pixels, where (x, y) is a pixel coordinate inside the block and depth(x, y) is its depth value;

Step (3a2): compute the feature-match depth-value distance D_d between the average depth value of the feature point and that of its matching feature point in C_1;

Step (3a3): set a threshold θ and classify the feature points of P_2 whose D_d is less than θ into the depth-stable feature-point set DSP_2, realizing the depth constraint on D_d. This is done because, on the one hand, removing depth-unstable feature points reduces the amount of outlier data and thereby improves the solution efficiency; on the other hand, feature points with abrupt depth changes produce large errors when the cost function is evaluated, strongly affecting the nonlinear optimization result and making the obtained globally optimal camera pose insufficiently accurate.
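Steps (3a1) to (3a3) can be sketched as follows. The original formulas are not reproduced in the source text, so two details here are assumptions: the 3 × 3 patch is clipped at the image border, and D_d is taken as the absolute difference of the two patch means; the threshold value is purely illustrative.

```python
import numpy as np

def patch_mean_depth(depth, x, y):
    """Mean depth of the 3x3 block centred on pixel (x, y), as in step (3a1).
    Clipping the block at the image border is an assumption; the patent does
    not specify border handling."""
    h, w = depth.shape
    x0, x1 = max(x - 1, 0), min(x + 2, w)
    y0, y1 = max(y - 1, 0), min(y + 2, h)
    return float(depth[y0:y1, x0:x1].mean())

def depth_consistent_matches(depth1, depth2, matches, theta=0.2):
    """Keep the matches whose patch-depth distance D_d stays below theta
    (the depth constraint of step (3a3)). D_d as an absolute difference and
    theta's value are assumptions for this sketch."""
    kept = []
    for (x1, y1), (x2, y2) in matches:
        d1 = patch_mean_depth(depth1, x1, y1)
        d2 = patch_mean_depth(depth2, x2, y2)
        if abs(d2 - d1) < theta:          # D_d < theta -> depth-stable
            kept.append(((x1, y1), (x2, y2)))
    return kept
```

The surviving matches correspond to the depth-stable set DSP_2 of the patent.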
Step (3b): the joint constraint module classifies the feature points of DSP_2 that lie inside a dynamic-target box of the detection image CD_2 into the potential-dynamic feature-point set PP_2, and the feature points of DSP_2 that lie outside the potential-dynamic-target boxes of CD_2 into the environment feature-point set EP_2; computing the fundamental matrix from the environment feature-point set in the subsequent step yields an accurate result.
Step (3c): the joint constraint module computes the fundamental matrix F from EP_2, then uses the epipolar-constraint method to classify the feature points of PP_2 that satisfy the epipolar constraint into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, and merges EP_2 and SP_2 into the static feature-point set S_2.
The epipolar-constraint method is realized as follows:

Step (3c1): through the camera intrinsics — the x-axis scale factor f_x, the y-axis scale factor f_y, the x-axis offset c_x and the y-axis offset c_y — convert the pixel coordinate [u_s, v_s]^T of each feature point of PP_2 into the normalized coordinate [u_c, v_c, 1]^T, with u_c = (u_s − c_x) / f_x and v_c = (v_s − c_y) / f_y;

Step (3c2): choose eight feature points of EP_2 with the RANSAC method, compute the fundamental matrix F from these eight feature matches with the eight-point algorithm, and then compute the epipolar line l of the feature point from F and [u_c, v_c, 1]^T;

Step (3c3): compute the feature-match epipolar-line distance D_e from F, l, the feature point and its matching feature point in C_1;

Step (3c4): set a threshold η and classify the feature points of PP_2 whose D_e is less than η into the static feature-point set SP_2 and the remaining feature points into the dynamic feature-point set DP_2, realizing the epipolar constraint on D_e. This is because, when a feature point is a static scene point, its matching point falls on the epipolar line l; within the error tolerance, a matching point near l means the feature point is a static feature point, while a D_e above the threshold means the matching point is far from l and the feature point is a dynamic feature point on a moving object. The subsequent steps compute the camera pose from the static feature points rather than the dynamic feature points, improving the accuracy of camera pose estimation.
On the other hand, when a dynamic object occupies most of the camera's field of view, the feature matches of the static scene alone are too few to compute the camera pose, which causes the frame-loss problem. The static feature-point set SP_2 provides sufficient feature points; computing the camera pose jointly from the static feature-point set and the environment feature-point set therefore solves the frame-loss problem and improves the accuracy of camera pose estimation.
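Steps (3c2) and (3c3) amount to estimating F and measuring point-to-epipolar-line distances. Below is a minimal sketch in normalized coordinates; the RANSAC selection over EP_2 that the patent prescribes is omitted for brevity, so this is the plain eight-point estimate, not the full method.

```python
import numpy as np

def eight_point_F(pts1, pts2):
    """Estimate the fundamental matrix from N >= 8 correspondences given in
    normalised image coordinates (step (3c2) without the RANSAC loop)."""
    A = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(pts1, pts2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)        # enforce the rank-2 property of F
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def epiline_distance(F, p1, p2):
    """D_e of step (3c3): distance from the matched point p2 to the epipolar
    line l = F @ [p1, 1]^T (both points in normalised coordinates)."""
    x1 = np.array([p1[0], p1[1], 1.0])
    x2 = np.array([p2[0], p2[1], 1.0])
    l = F @ x1
    return abs(x2 @ l) / np.hypot(l[0], l[1])
```

Thresholding `epiline_distance` at η reproduces the SP_2 / DP_2 split of step (3c4).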
Step (4): the data fusion module obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2.

The data fusion module computes the dynamic feature-point ratio and the potential-dynamic feature-point ratio of C_2, classifies the instances of the instance segmentation image CS_2 whose dynamic feature-point ratio and potential-dynamic feature-point ratio are both below the preset ratio thresholds as static-target instances and the remaining instances as dynamic-target instances, and obtains the static-target instance segmentation image CSS_2 and the dynamic-target instance segmentation image CDS_2.
The static-target and dynamic-target instances are obtained as follows:

Step (4a): count the numbers of feature points in the environment feature-point set EP_2, the static feature-point set SP_2 and the dynamic feature-point set DP_2, and compute the dynamic feature-point ratio τ_d and the potential-dynamic feature-point ratio τ_r from these counts;

Step (4b): set the threshold of τ_d to 0.5 and that of τ_r to 0.15; when τ_d ≤ 0.5 and τ_r ≤ 0.15, the instance segmentation target inside the detection box is classified as a static-target instance, and the remaining instance segmentation targets are classified as dynamic-target instances. This is done because the neural network module detects all objects with dynamic properties; when such an object is actually static in the environment, it can still be classified into the static scene and its information built into the point cloud map, without which the map construction would be neither accurate nor complete enough.
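The decision rule of step (4b) can be sketched per instance. The exact formulas for τ_d and τ_r are not legible in the source text, so the definitions below — dynamic points as a fraction of the in-box points, and in-box points as a fraction of all classified points — are plausible assumptions for illustration, not the patent's equations.

```python
def classify_instance(n_env, n_static, n_dynamic,
                      tau_d_max=0.5, tau_r_max=0.15):
    """Step (4b): label an instance static when both ratios stay at or below
    their thresholds (0.5 and 0.15 in the patent). The ratio definitions used
    here are assumptions, as the source formulas are not reproduced."""
    total = n_env + n_static + n_dynamic
    pot = n_static + n_dynamic                  # points inside the target box
    tau_d = n_dynamic / pot if pot else 0.0     # dynamic feature-point ratio
    tau_r = pot / total if total else 0.0       # potential-dynamic ratio
    return "static" if (tau_d <= tau_d_max and tau_r <= tau_r_max) else "dynamic"
```

An instance whose box contains mostly epipolar-consistent points in a feature-rich frame thus stays in the static scene.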
Step (5): the visual front-end module obtains the depth camera pose ξ_2 and the landmark point set L_2 in three-dimensional space.

Step (5a): the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ_2 of C_2 from the static feature points S_2 of C_2 and their corresponding matched points in C_1;

Step (5b): the visual front-end module converts the pixel coordinates of S_2 into three-dimensional space coordinates through the camera intrinsics and ξ_2, obtaining the landmark point set L_2 in three-dimensional space.
The landmark point set L_2 in three-dimensional space is obtained as follows:

Step (5b1): through the camera intrinsics, convert the pixel coordinate [u_s, v_s]^T of each feature point of S_2 into the normalized coordinate [u_c, v_c, 1]^T;

Step (5b2): compute the camera coordinate P' = [X', Y', Z']^T from the normalized coordinate;

Step (5b3): through the rotation matrix R and translation vector t of the camera pose ξ_2, convert the camera coordinate P' into the world coordinate P_w:

P_w = R^(-1)(P' − t) = [X, Y, Z]^T (10)

Step (5b4): define the three-dimensional space point located at P_w as the landmark point p, and classify p into the landmark point set L_2.
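Steps (5b1) to (5b4) form a standard back-projection. A minimal sketch follows, under the assumption — common for depth cameras, though the lost formula of step (5b2) is not reproduced in the source — that the measured depth supplies Z', after which equation (10) maps camera to world coordinates:

```python
import numpy as np

def pixel_to_world(us, vs, depth, K, R, t):
    """Back-project pixel (us, vs) with measured depth into a world-frame
    landmark: normalise (5b1), scale by depth (5b2), then apply
    P_w = R^{-1} (P' - t), equation (10)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    uc = (us - cx) / fx                      # (5b1) normalised coordinates
    vc = (vs - cy) / fy
    P_cam = depth * np.array([uc, vc, 1.0])  # (5b2) camera coordinates
    return np.linalg.inv(R) @ (P_cam - t)    # (5b3) world coordinates
```

Each returned point is one landmark p of the set L_2.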
Step (5c): following the method used to obtain ξ_2 and L_2, the visual front-end module obtains the depth camera poses ξ_3, ξ_4, ..., ξ_i, ..., ξ_N and the landmark point sets L_3, L_4, ..., L_i, ..., L_N of C_3, C_4, ..., C_i, ..., C_N. Concretely, C_2 and C_3 are processed with the method of steps (3a) to (5b4) to obtain the depth camera pose ξ_3 and the landmark point set L_3; processing C_3 and C_4, C_4 and C_5, ..., C_{i-1} and C_i, ..., C_{N-1} and C_N identically then yields ξ_4 and L_4, ..., ξ_i and L_i, ..., ξ_N and L_N.
Step (6): the back-end optimization module obtains the globally optimal depth camera poses and landmark points.

The back-end optimization module merges L_2, L_3, ..., L_i, ..., L_N into the landmark point set L, containing the landmark points p_1, p_2, ..., p_j, ..., p_M, builds the cost function Loss with the depth camera poses ξ_2, ξ_3, ..., ξ_i, ..., ξ_N and the landmark points p_1, p_2, ..., p_j, ..., p_M as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ_2', ξ_3', ..., ξ_i', ..., ξ_N' and landmark points p_1', p_2', ..., p_j', ..., p_M' in three-dimensional space.
The cost function is built as follows:

Step (6a): through the rotation matrix R and translation vector t of the camera pose ξ_2, convert the three-dimensional coordinate [X, Y, Z]^T of the landmark point p_j of L into the camera coordinate p_j':

p_j' = R p_j + t = [X', Y', Z']^T (11)

Step (6b): compute the normalized coordinate [u_c, v_c, 1]^T from the camera coordinate [X', Y', Z']^T;

Step (6c): compute the pixel coordinate P_j = [u_s, v_s]^T through the camera intrinsics;

Step (6d): compute the error e_2 from P_j and the pixel coordinate P_j' of the feature point of S_2 corresponding to p_j;

Step (6e): following the method of steps (6a) to (6d), successively perform the same operations for ξ_3, ξ_4, ..., ξ_i, ..., ξ_N, obtaining e_3, e_4, ..., e_i, ..., e_N;

Step (6f): sum e_2, e_3, ..., e_i, ..., e_N to obtain the cost function Loss.
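Steps (6a) to (6f) form a standard reprojection-error cost. A minimal sketch follows; the squared-pixel-error form of e and the data association between landmarks and observed pixels are assumptions, since the patent's error formula is not reproduced in the text.

```python
import numpy as np

def reprojection_error(landmark, observed_px, K, R, t):
    """Steps (6a)-(6d): project world landmark p_j with pose (R, t) via
    equation (11), then compare with the observed pixel; returns the squared
    pixel error (an assumed form of e)."""
    P_cam = R @ landmark + t                           # (11)
    uc, vc = P_cam[0] / P_cam[2], P_cam[1] / P_cam[2]  # (6b) normalise
    us = K[0, 0] * uc + K[0, 2]                        # (6c) pixel coords
    vs = K[1, 1] * vc + K[1, 2]
    diff = np.array([us, vs]) - np.asarray(observed_px)
    return float(diff @ diff)

def cost(landmarks, observations, K, poses):
    """Step (6f): Loss = sum of the errors over all frames. 'observations[i]'
    maps landmark index j to the pixel at which frame i observed it; this
    data association is assumed to be given."""
    return sum(reprojection_error(landmarks[j], obs, K, R, t)
               for i, (R, t) in enumerate(poses)
               for j, obs in observations[i].items())
```

A Levenberg-Marquardt solver would then minimize `cost` over the poses and landmarks jointly, as the patent's back-end optimization module does.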
Step (7): the semantic mapping module obtains the semantic point cloud map:
Step (7a): the semantic mapping module processes the color image sequence C2,C3,...,Ci,...,CN frame by frame, classifying the pixels in the i-th color image frame Ci whose depth value is not 0 into the pixel point set YPi, and, using the dynamic object instance information in CDSi obtained by the data fusion module, classifying the pixels in YPi that do not belong to a dynamic object instance into the pixel point set CPi;
Step (7b): the semantic mapping module computes the three-dimensional coordinates of CPi in three-dimensional space through the camera intrinsics and ξi, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
Step (7c): the semantic mapping module obtains semantic information from the static object instance segmented image CSSi of the data fusion module, and semantically labels the point cloud corresponding to the pixels of the static object instances in CSSi, obtaining the semantic point cloud PLi';
Step (7d): the semantic mapping module splices the semantic point clouds PL2',PL3',...,PLi',...,PLN' to obtain the global semantic point cloud map PL.
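The back-projection in steps (7a)-(7b) can be sketched without the PCL dependency using NumPy only. The intrinsics and the depth scale below are assumptions for illustration, not values from the patent.

```python
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # hypothetical intrinsics
DEPTH_SCALE = 5000.0                          # assumed depth units per metre

def frame_to_world_cloud(depth, dynamic_mask, R, t):
    """Steps (7a)-(7b): keep pixels with non-zero depth (YP_i), drop pixels on
    a dynamic object instance (leaving CP_i), back-project the rest and move
    them into world coordinates with pose (R, t), as in Pw = R^-1 (P' - t)."""
    rows, cols = np.nonzero((depth > 0) & ~dynamic_mask)
    z = depth[rows, cols] / DEPTH_SCALE
    x = (cols - CX) / FX * z                  # invert the pinhole projection
    y = (rows - CY) / FY * z
    cam = np.stack([x, y, z], axis=1)         # camera-frame points P'
    return (cam - t) @ R                      # rows of R^T (P' - t); R orthonormal
```

Semantic labelling (step 7c) would then attach the instance class of each surviving pixel to its generated point before the per-frame clouds are spliced together.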
Claims (7)
1. A semantic SLAM system based on joint constraint, characterized by comprising a data acquisition module, a neural network module, a joint constraint module, a data fusion module, a visual front-end module, a back-end optimization module and a semantic mapping module, wherein:
the data acquisition module uses a depth camera to acquire multiple frames of depth images and color images of an indoor environment, obtaining a depth image sequence and a color image sequence;
the neural network module performs frame-by-frame forward-propagation processing on the color image sequence through a trained BlitzNet network model, obtaining detection images with potential dynamic object boxes and instance segmented images with potential dynamic object instances;
the joint constraint module performs feature matching between each color image frame and its previous color image frame, constructs a depth constraint from the depth values of the obtained feature matching pairs, constructs an epipolar constraint for the feature points within the potential dynamic object box regions, and thereby classifies all feature points of the color image to obtain the feature point sets of each category;
the data fusion module fuses the instance segmented images with the feature point set data, obtaining static object instance segmented images and dynamic object instance segmented images;
the visual front-end module computes the depth camera pose from the static feature points;
the back-end optimization module constructs a cost function from the depth camera poses and the three-dimensional landmark points corresponding to the feature points, and performs nonlinear optimization on the cost function to obtain the globally optimal camera poses and landmark points;
the semantic mapping module builds a point cloud map according to the optimal poses of the depth camera, and maps the pixels with semantics in the static object instance segmented images onto the point cloud map, obtaining a semantic point cloud map.
2. An implementation method of semantic SLAM based on joint constraint, characterized by comprising the following steps:
(1) the data acquisition module obtains image sequences:
the data acquisition module continuously shoots the indoor environment N times, obtaining N color image frames and N depth image frames, and sorts the N color image frames and the N depth image frames respectively in order of shooting time from front to back, obtaining the color image sequence C1,C2,...,Ci,...,CN and the depth image sequence D1,D2,...,Di,...,DN, where i=1,2,...,N, N≥100;
(2) the neural network module obtains detection images and instance segmented images:
the neural network module uses a BlitzNet network model whose parameters have been trained on the COCO data set to perform frame-by-frame forward-propagation processing on the N color image frames in the color image sequence, obtaining the detection images CD1,CD2,...,CDi,...,CDN with potential dynamic object boxes and the instance segmented images CS1,CS2,...,CSi,...,CSN with potential dynamic object instances;
(3) the joint constraint module obtains the feature point category sets DSP2, EP2, SP2, DP2 and S2:
(3a) the joint constraint module performs ORB feature extraction on C1 and C2 respectively to obtain the feature sets P1 and P2, matches P1 and P2 to obtain multiple feature matching pairs, and then uses the depth constraint method to classify all feature points in P2 that satisfy the depth constraint into the depth-stable feature point set DSP2;
(3b) the joint constraint module classifies the feature points in DSP2 that lie inside the dynamic object boxes of the detection image CD2 into the potential dynamic feature point set PP2, and the feature points in DSP2 that lie outside the potential dynamic object boxes of CD2 into the environment feature point set EP2;
(3c) the joint constraint module computes the fundamental matrix F from EP2, then uses the epipolar constraint method to classify the feature points in PP2 that satisfy the epipolar constraint into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, and merges EP2 and SP2 into the static feature point set S2;
(4) the data fusion module obtains the static object instance segmented image CSS2 and the dynamic object instance segmented image CDS2:
the data fusion module computes the dynamic feature point ratio and the potential dynamic feature point ratio of C2, classifies the instances in the instance segmented image CS2 whose dynamic feature point ratio and potential dynamic feature point ratio are both smaller than the preset ratio thresholds as static object instances and the remaining instances as dynamic object instances, obtaining the static object instance segmented image CSS2 and the dynamic object instance segmented image CDS2;
(5) the visual front-end module obtains the depth camera pose ξ2 and the landmark point set L2 in three-dimensional space:
(5a) the visual front-end module uses the iterative closest point (ICP) method to compute the depth camera pose ξ2 of C2 from the static feature points S2 of C2 and their corresponding matching points in C1;
(5b) the visual front-end module converts the pixel coordinates of S2 into three-dimensional space coordinates through the camera intrinsics and ξ2, obtaining the landmark point set L2 in three-dimensional space;
(5c) the visual front-end module obtains the depth camera poses ξ3,ξ4,...,ξi,...,ξN and the landmark point sets L3,L4,...,Li,...,LN of C3,C4,...,Ci,...,CN according to the method used to obtain ξ2 and L2;
(6) the back-end optimization module obtains the globally optimal depth camera poses and landmark points:
the back-end optimization module merges L2,L3,...,Li,...,LN into the landmark point set L, comprising the landmark points p1,p2,...,pj,...,pM, constructs the cost function Loss with the depth camera poses ξ2,ξ3,...,ξi,...,ξN and the landmark points p1,p2,...,pj,...,pM as variables, and performs nonlinear optimization on Loss using the Levenberg-Marquardt method, obtaining the globally optimal depth camera poses ξ2',ξ3',...,ξi',...,ξN' and landmark points p1',p2',...,pj',...,pM' in three-dimensional space;
(7) the semantic mapping module obtains the semantic point cloud map:
(7a) the semantic mapping module processes the color image sequence C2,C3,...,Ci,...,CN frame by frame, classifying the pixels in the i-th color image frame Ci whose depth value is not 0 into the pixel point set YPi, and, using the dynamic object instance information in the CDSi obtained by the data fusion module, classifying the pixels in YPi that do not belong to a dynamic object instance into the pixel point set CPi;
(7b) the semantic mapping module computes the three-dimensional coordinates of CPi in three-dimensional space through the camera intrinsics and ξi, generates three-dimensional space points with the Point Cloud Library (PCL), and merges all generated three-dimensional space points into the point cloud PLi;
(7c) the semantic mapping module obtains semantic information from the static object instance segmented image CSSi of the data fusion module, and semantically labels the point cloud corresponding to the pixels of the static object instances in CSSi, obtaining the semantic point cloud PLi';
(7d) the semantic mapping module splices the semantic point clouds PL2',PL3',...,PLi',...,PLN' to obtain the global semantic point cloud map PL.
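Step (5a) estimates the pose from feature points that are already matched between frames, so each ICP iteration reduces to the closed-form SVD (Kabsch) alignment sketched below. This is a simplified stand-in, assuming known correspondences, for a full ICP with repeated re-association:

```python
import numpy as np

def align_matched_points(src, dst):
    """Rigid alignment of matched 3D points (the inner SVD step of ICP, as used
    in step (5a)): returns R, t such that dst ≈ R @ src + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

With noisy depth, this step would be wrapped in an outer loop that re-selects closest points and re-solves until convergence.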
3. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the depth constraint method in step (3a) is realized by the following steps:
(3a1) for each feature point in P2, construct an image block of size 3×3 centred on its pixel coordinates, and compute the average depth value of each image block:
where (x, y) denotes the pixel coordinates of the feature point and depth(x, y) denotes its depth value;
(3a2) compute the feature-match depth-value distance Dd from the feature point and its matching feature point in C1;
(3a3) set a threshold θ, and classify the feature points in P2 whose Dd is smaller than θ into the depth-stable feature point set DSP2, realizing the depth constraint on Dd.
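A sketch of the depth constraint of claim 3 in Python. Since the formula images are not reproduced in this text, Dd is assumed here to be the absolute difference of the two 3×3 block mean depths; that definition is an assumption, not the patent's stated formula.

```python
import numpy as np

def block_mean_depth(depth, x, y):
    """Step (3a1): average depth of the 3x3 image block centred on pixel (x, y).
    depth is indexed as depth[row, col], i.e. depth[y, x]; the block is assumed
    to lie fully inside the image."""
    return float(depth[y - 1:y + 2, x - 1:x + 2].mean())

def satisfies_depth_constraint(depth1, pt1, depth2, pt2, theta):
    """Steps (3a2)-(3a3): a feature match is depth-stable when the distance D_d
    between the block mean depths in the two frames is below threshold theta.
    (Absolute difference is an assumed reading of D_d.)"""
    d_d = abs(block_mean_depth(depth2, *pt2) - block_mean_depth(depth1, *pt1))
    return d_d < theta
```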
4. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the epipolar constraint method in step (3c) is realized by the following steps:
(3c1) through the camera intrinsics, namely the x-axis scale factor fx, y-axis scale factor fy, x-axis offset factor cx and y-axis offset factor cy, convert the pixel coordinates [us,vs]T of each feature point in PP2 into normalized coordinates [uc,vc,1]T;
(3c2) select eight feature points in EP2 with the RANSAC method, compute the fundamental matrix F from the eight feature matching pairs using the eight-point method, and then compute the epipolar line l of the feature point from F and [uc,vc,1]T;
(3c3) compute the feature-match epipolar distance De from F, l, the feature point and its matching feature point in C1;
(3c4) set a threshold η, classify the feature points in PP2 whose De is smaller than η into the static feature point set SP2 and the remaining feature points into the dynamic feature point set DP2, realizing the epipolar constraint on De.
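The epipolar test of claim 4 can be sketched as below. The point-to-epiline distance is the standard geometric formula, used here as an assumed reading since the patent's formula image is not reproduced; F would in practice come from RANSAC plus the eight-point method over EP2 matches.

```python
import numpy as np

def epipolar_distance(F, p1, p2):
    """Steps (3c2)-(3c3): the epiline of p1 in the second view is l = F p1;
    D_e is the perpendicular distance from the matched point p2 to l.
    p1, p2 are normalised homogeneous coordinates [u_c, v_c, 1]."""
    l = F @ p1                                # epiline coefficients (a, b, c)
    return abs(p2 @ l) / np.hypot(l[0], l[1])

def split_by_epipolar_constraint(F, matches, eta):
    """Step (3c4): matches with D_e < eta go to the static set SP2,
    the rest to the dynamic set DP2."""
    static, dynamic = [], []
    for p1, p2 in matches:
        (static if epipolar_distance(F, p1, p2) < eta else dynamic).append((p1, p2))
    return static, dynamic
```

A point on a moving object violates the epipolar geometry induced by the camera motion alone, which is why a large De flags it as dynamic.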
5. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the static object instances and dynamic object instances in step (4) are obtained as follows:
(4a) count the number of points in the environment feature point set EP2, in the static feature point set SP2 and in the dynamic feature point set DP2, and compute the dynamic feature point ratio τd and the potential dynamic feature point ratio τr;
(4b) set the threshold of τd to 0.5 and that of τr to 0.15; when τd ≤ 0.5 and τr ≤ 0.15, the instance segmentation targets inside the detection box are classified as static object instances, and the remaining instance segmentation targets are classified as dynamic object instances.
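The decision rule of claim 5 can be sketched as below. The ratio formulas themselves are not reproduced in this text, so plausible definitions are assumed: τd as the share of dynamic points among the potential dynamic points, and τr as the share of potential dynamic points among all classified points; both definitions are assumptions for illustration.

```python
def is_static_instance(n_ep, n_sp, n_dp, tau_d_thr=0.5, tau_r_thr=0.15):
    """Step (4b): an instance is kept as static when both ratios stay within
    their thresholds (0.5 for tau_d, 0.15 for tau_r, as in the claim).
    n_ep, n_sp, n_dp: point counts of EP2, SP2 and DP2 for the instance.
    The ratio definitions below are assumed, not quoted from the patent."""
    potential = n_sp + n_dp
    total = n_ep + potential
    tau_d = n_dp / potential if potential else 0.0
    tau_r = potential / total if total else 0.0
    return tau_d <= tau_d_thr and tau_r <= tau_r_thr
```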
6. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the landmark point set L2 in three-dimensional space in step (5b) is obtained as follows:
(5b1) through the camera intrinsics, convert the pixel coordinates [us,vs]T of each feature point in S2 into normalized coordinates [uc,vc,1]T;
(5b2) compute the camera coordinates P' = [X',Y',Z']T from the normalized coordinates;
(5b3) through the rotation matrix R and translation vector t in camera pose ξ2, convert the camera coordinates P' into the world coordinates Pw:
Pw = R-1(P' - t) = [X,Y,Z]T (10)
(5b4) define the three-dimensional space point located at Pw as the landmark point p, and classify p into the landmark point set L2.
7. The semantic SLAM method based on joint constraint according to claim 2, characterized in that the cost function in step (6) is constructed by the following steps:
(6a) according to the rotation matrix R and translation vector t in camera pose ξ2, convert the three-dimensional coordinates [X,Y,Z]T of landmark point pj in L into camera coordinates pj':
pj' = Rpj + t = [X',Y',Z']T (11)
(6b) compute the normalized coordinates [uc,vc,1]T from the camera coordinates [X',Y',Z']T, i.e. uc = X'/Z' and vc = Y'/Z';
(6c) compute the pixel coordinates Pj = [us,vs]T through the camera intrinsics, i.e. us = fx·uc + cx and vs = fy·vc + cy;
(6d) compute the error e2 between Pj and the pixel coordinates Pj' of the feature point in S2 corresponding to pj;
(6e) perform the same operations on ξ3, ξ4, ..., ξi, ..., ξN in turn according to the method of steps (6a)-(6d), obtaining e3, e4, ..., ei, ..., eN;
(6f) sum e2, e3, ..., ei, ..., eN to obtain the cost function Loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910768052.6A CN110533720B (en) | 2019-08-20 | 2019-08-20 | Semantic SLAM system and method based on joint constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110533720A true CN110533720A (en) | 2019-12-03 |
CN110533720B CN110533720B (en) | 2023-05-02 |
Family
ID=68663703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910768052.6A Active CN110533720B (en) | 2019-08-20 | 2019-08-20 | Semantic SLAM system and method based on joint constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533720B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956651A (en) * | 2019-12-16 | 2020-04-03 | 哈尔滨工业大学 | Terrain semantic perception method based on fusion of vision and vibrotactile sense |
CN111242954A (en) * | 2020-01-20 | 2020-06-05 | 浙江大学 | Panorama segmentation method with bidirectional connection and shielding processing |
CN111402336A (en) * | 2020-03-23 | 2020-07-10 | 中国科学院自动化研究所 | Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method |
CN111559314A (en) * | 2020-04-27 | 2020-08-21 | 长沙立中汽车设计开发股份有限公司 | Depth and image information fused 3D enhanced panoramic looking-around system and implementation method |
CN111709982A (en) * | 2020-05-22 | 2020-09-25 | 浙江四点灵机器人股份有限公司 | Three-dimensional reconstruction method for dynamic environment |
CN111882611A (en) * | 2020-07-17 | 2020-11-03 | 北京三快在线科技有限公司 | Map construction method and device |
CN112308921A (en) * | 2020-11-09 | 2021-02-02 | 重庆大学 | Semantic and geometric based joint optimization dynamic SLAM method |
WO2021114776A1 (en) * | 2019-12-12 | 2021-06-17 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Object detection method, object detection device, terminal device, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201805688D0 (en) * | 2018-04-05 | 2018-05-23 | Imagination Tech Ltd | Matching local image feature descriptors |
CN108596974A (en) * | 2018-04-04 | 2018-09-28 | 清华大学 | Dynamic scene robot localization builds drawing system and method |
CN109815847A (en) * | 2018-12-30 | 2019-05-28 | 中国电子科技集团公司信息科学研究院 | A kind of vision SLAM method based on semantic constraint |
WO2019153245A1 (en) * | 2018-02-09 | 2019-08-15 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Systems and methods for deep localization and segmentation with 3d semantic map |
Also Published As
Publication number | Publication date |
---|---|
CN110533720B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110533720A (en) | Semantic SLAM system and method based on joint constraint | |
CN111462135B (en) | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation | |
CN110264416B (en) | Sparse point cloud segmentation method and device | |
CN111179324B (en) | Object six-degree-of-freedom pose estimation method based on color and depth information fusion | |
CN110135485A (en) | The object identification and localization method and system that monocular camera is merged with millimetre-wave radar | |
CN109816725A (en) | A kind of monocular camera object pose estimation method and device based on deep learning | |
CN110097553A (en) | The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system | |
CN111161317A (en) | Single-target tracking method based on multiple networks | |
CN110533716B (en) | Semantic SLAM system and method based on 3D constraint | |
CN110827395A (en) | Instant positioning and map construction method suitable for dynamic environment | |
CN112801074B (en) | Depth map estimation method based on traffic camera | |
CN105719352B (en) | Face three-dimensional point cloud super-resolution fusion method and apply its data processing equipment | |
CN112418288B (en) | GMS and motion detection-based dynamic vision SLAM method | |
CN110827312B (en) | Learning method based on cooperative visual attention neural network | |
CN113408584B (en) | RGB-D multi-modal feature fusion 3D target detection method | |
Shreyas et al. | 3D object detection and tracking methods using deep learning for computer vision applications | |
CN109508673A (en) | It is a kind of based on the traffic scene obstacle detection of rodlike pixel and recognition methods | |
Cui et al. | Dense depth-map estimation based on fusion of event camera and sparse LiDAR | |
CN111915651A (en) | Visual pose real-time estimation method based on digital image map and feature point tracking | |
CN112396655B (en) | Point cloud data-based ship target 6D pose estimation method | |
CN114140527A (en) | Dynamic environment binocular vision SLAM method based on semantic segmentation | |
CN112801928A (en) | Attention mechanism-based millimeter wave radar and visual sensor fusion method | |
Yang et al. | Ground plane matters: Picking up ground plane prior in monocular 3d object detection | |
CN115908508A (en) | Coastline ship real-time tracking method based on array camera | |
Tao et al. | 3d semantic vslam of indoor environment based on mask scoring rcnn |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||