CN109815847A - Visual SLAM method based on semantic constraints - Google Patents

Visual SLAM method based on semantic constraints (Download PDF)

Info

Publication number
CN109815847A
CN109815847A (application CN201811648994.2A)
Authority
CN
China
Prior art keywords
semantic
map
point
key frame
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811648994.2A
Other languages
Chinese (zh)
Other versions
CN109815847B (en)
Inventor
王蓉
查文中
葛建军
孟繁乐
孟祥瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC Information Science Research Institute
Original Assignee
CETC Information Science Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC Information Science Research Institute filed Critical CETC Information Science Research Institute
Priority to CN201811648994.2A priority Critical patent/CN109815847B/en
Publication of CN109815847A publication Critical patent/CN109815847A/en
Application granted granted Critical
Publication of CN109815847B publication Critical patent/CN109815847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a visual SLAM method based on semantic constraints, comprising: continuously acquiring an image sequence of the surrounding environment with a depth camera; processing the key frames of the image sequence with a visual SLAM method to reconstruct a map; performing semantic segmentation on the key frames and formulating semantic constraint parameters from the segmentation results; and applying the semantic constraint parameters to the reconstructed map and fusing the segmentation results to obtain a semantic map. The semantic constraint parameters and the constraint points are then jointly updated each time a key frame is detected, and the semantic map is used to estimate the pose of the depth camera for non-key frames, both when texture features are abundant and when they are lacking. By imposing semantic constraints, the invention obtains a more accurate semantic map; by combining camera pose estimation for the texture-rich and texture-poor cases, it improves the accuracy of camera pose estimation.

Description

Visual SLAM method based on semantic constraints
Technical field
The invention belongs to the fields of computer vision and artificial intelligence, and in particular relates to a visual SLAM method based on semantic constraints.
Background technique
Simultaneous Localization and Mapping (SLAM) is a technique for real-time localization and map construction: as a sensor moves through an unknown environment, the sensor is localized in real time while the three-dimensional structure of the environment is recovered. Depending on the sensor used, SLAM can be broadly divided into laser SLAM and visual SLAM. Visual SLAM has attracted growing attention thanks to the outstanding advantages of the color or depth cameras it employs in terms of price, convenience, and the richness of the information acquired, and it has broad application prospects in fields such as robotics, augmented reality, and autonomous driving.
Conventional visual SLAM tends to fail under conditions such as weak texture and fast motion. With the continuous development of deep learning and its excellent performance in classification and recognition tasks, combining deep learning with visual SLAM shows broad application prospects and great potential value; semantic SLAM is one important direction. Conventional visual SLAM exploits only the color and geometric information present in the scene, leaving the rich semantic information of the space unused.
Summary of the invention
The object of the present invention is achieved by the following technical solution, a visual SLAM method based on semantic constraints comprising: continuously acquiring an image sequence of the surrounding environment with a depth camera; performing semantic segmentation on the key frames of the image sequence, and obtaining semantic constraint parameters from the segmentation results; processing the segmentation results with a visual SLAM method to reconstruct a map; and applying the semantic constraint parameters to the reconstructed map and fusing the segmentation results to obtain a semantic map.
Further, performing semantic segmentation on a key frame comprises: segmenting the image according to the texture features of the specific objects in the key frame, thereby dividing the frame into multiple regions and identifying the real-world semantics of each region from its texture features.
Further, obtaining semantic constraint parameters from the segmentation results comprises: obtaining the depth information of all feature points in the key frame; obtaining, from the depth information of the feature points of the ground region in the segmentation results, the corresponding three-dimensional points; and estimating the optimal plane parameters from these three-dimensional points with the random sample consensus (RANSAC) algorithm. The optimal plane parameters serve as the semantic constraint parameters and are updated each time a key frame is detected.
Further, applying the semantic constraint parameters to the reconstructed map comprises: connecting the three-dimensional points of the ground region in the reconstructed map to the corresponding feature points of the ground region in the segmentation results, obtaining multiple line parameters; computing the intersections of these lines with the optimal plane from the line parameters and the optimal plane parameters; and taking the intersections as the constraint points, thereby imposing the semantic constraint on the reconstructed map. The constraint points are updated jointly with the semantic constraint parameters.
Further, obtaining the semantic map comprises: fusing the constraint points obtained by applying the semantic constraint parameters to the reconstructed map with the three-dimensional points corresponding to the feature points of the regions in the segmentation results and their real-world semantics, thereby obtaining the semantic map.
Further, the visual SLAM method based on semantic constraints also comprises: estimating the pose of the depth camera by analyzing the non-key frames of the image sequence in combination with the semantic map. Estimating the pose of the depth camera comprises pose estimation for non-key frames with abundant texture features and pose estimation for non-key frames lacking texture features.
Further, pose estimation for a non-key frame with abundant texture features comprises: identifying the texture features of the non-key frame of the image sequence and determining the ground region; extracting the feature points of the ground region, and obtaining the projections of the constraint points of the semantic map into the non-key frame; constructing a first energy function from the Euclidean distances between the feature points and the projected points by least squares; and solving the first energy function to estimate the pose of the depth camera.
Further, solving the first energy function comprises: solving it by singular value decomposition to obtain a transformation matrix, the transformation matrix giving the pose estimate of the depth camera.
Further, pose estimation for a non-key frame lacking texture features comprises: obtaining the corresponding three-dimensional points from the depth information of the pixels of the non-key frame; judging from the distance between each three-dimensional point and the plane formed by the constraint points of the current semantic map whether the pixel belongs to the ground region of the image; constructing, for the three-dimensional points judged to belong to the ground region, a second energy function from their distances to the plane formed by the constraint points of the current semantic map by least squares; and solving the second energy function to estimate the pose of the depth camera.
Further, solving the second energy function comprises: decomposing the transformation matrix in the second energy function into a rotation matrix and a translation vector; and taking the partial derivatives with respect to the unknown parameters of the rotation matrix and translation vector by gradient descent, the partial derivatives yielding the pose estimate of the depth camera.
The present invention has the following advantages:
(1) Addressing the problem that visual SLAM typically obtains only the color and geometric information of a scene, this patent combines visual SLAM with image semantic segmentation to construct a semantic map of the scene, thereby obtaining high-level cognitive information about the scene and providing a more natural mode of human-computer interaction for application fields including robot navigation, augmented reality, and autonomous driving.
(2) Addressing the problem that scene semantic information and geometric information are mutually independent and unrelated, this patent proposes converting semantic information into geometric constraints within SLAM. Focusing first on the ground regions produced by semantic segmentation, the constraint is imposed that all ground regions lie in the same spatial plane. The advantage is that constraints derived from semantic information are also considered during SLAM, improving the performance of the SLAM algorithm, with wide applicability to indoor scenes. Furthermore, semantic information can be used not only to constrain the ground region; it can also be generalized naturally to constraints at the level of arbitrary objects.
(3) Addressing the generation and update of semantic constraints, this patent proposes a key-frame-based method for generating and updating the ground plane parameters in SLAM, aiming to obtain accurate global semantic ground parameters incrementally. To avoid introducing the noise of the input depth map during the generation and optimization of map points, the three-dimensional coordinates of feature points in the ground region are recovered directly from the current plane parameters.
(4) Addressing the problem that traditional feature-based pose estimation uses only the texture-salient regions of the image, this patent proposes correcting the camera pose estimate with the constraint of the semantic ground region. The energy function of formula (4) is designed and solved by gradient descent. The advantage is that texture-poor regions such as the ground, which are crucial in practical indoor applications, are considered during pose estimation, improving its accuracy.
Detailed description of the invention
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Figure 1 shows the semantic map construction flow chart of the visual SLAM method based on semantic constraints according to an embodiment of the present invention.
Figure 2 shows the key-frame processing flow chart of the visual SLAM method based on semantic constraints according to an embodiment of the present invention.
Figure 3 shows the workflow of the visual SLAM method based on semantic constraints according to an embodiment of the present invention.
Figure 4 shows the effect of semantic segmentation.
Specific embodiment
The illustrative embodiments of the disclosure are described more fully below with reference to the drawings. Although the drawings show illustrative embodiments of the disclosure, it should be understood that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the disclosure will be thoroughly understood and its scope fully conveyed to those skilled in the art.
The present invention proposes a Simultaneous Localization and Mapping (SLAM) method based on semantic constraints, hereinafter referred to as the visual SLAM method based on semantic constraints. By formulating semantic constraints from the semantic segmentation within the visual SLAM method and updating the global plane parameters, the construction of the SLAM semantic map and the correction of the camera pose are realized. The present invention is explained in more detail below with reference to the drawings.
As shown in Figure 1, which is the semantic map construction flow chart of the visual SLAM method based on semantic constraints according to an embodiment of the present invention, the construction of the semantic map comprises: continuously acquiring an image sequence of the surrounding environment with a depth camera; performing semantic segmentation on the key frames of the image sequence, and obtaining semantic constraint parameters from the segmentation results; processing the segmentation results with a visual SLAM method to reconstruct a map; and applying the semantic constraint parameters to the reconstructed map and fusing the segmentation results to obtain a semantic map.
Specifically, the present invention performs image-level semantic segmentation on the key frames of the image sequence with a fully convolutional neural network, dividing each frame into multiple regions according to the texture features of the specific objects it contains, identifying the real-world semantics of each region from its texture features, and extracting the feature points of those regions; the texture features and their corresponding real-world semantics form a semantic point cloud used for the self-training of the fully convolutional network. Next, the depth information of all feature points in the key frame is obtained, and the three-dimensional points corresponding to the feature points of the ground region in the segmentation results are computed from their depth information; from these three-dimensional points, the optimal plane parameters are estimated with the random sample consensus (RANSAC) algorithm. The optimal plane parameters serve as the semantic constraint parameters and are updated each time a key frame is detected. Next, lines are drawn connecting the three-dimensional points of the ground region in the reconstructed map to the corresponding feature points of the ground region in the segmentation results, yielding multiple line parameters; the intersections of these lines with the optimal plane are computed and taken as the constraint points, imposing the semantic constraint on the reconstructed map. The constraint points are updated jointly with the semantic constraint parameters, and they become the three-dimensional map points of the ground region in the semantic map generated subsequently. This avoids the input noise introduced when three-dimensional map points are obtained directly from the depth information of the feature points, and, by incorporating the semantic constraint, enables the visual SLAM method of the invention to obtain more accurate and robust localization and reconstruction results. Finally, the constraint points, the three-dimensional points corresponding to the feature points of the regions, and their real-world semantics are fused to obtain the semantic map.
More specifically, key frames in the image sequence are identified by setting a check frequency and judging, from the texture features of each candidate frame and the pose change of the depth camera, whether it is a key frame. The semantic map is a three-dimensional map comprising dense map points and their corresponding semantics. The significance of the semantic constraint is that, in most cases, features such as ORB are determined by the pixels at particular spatial positions; however, because of factors such as the viewing angle, the features of pixels lying in the same spatial plane still differ across the image. The invention therefore proposes the semantic constraint, which ties the three-dimensional points corresponding to the feature points of the ground region to the same three-dimensional plane, constructing a more accurate three-dimensional map. In this way the machine obtains high-level cognitive information about the scene, providing a more natural mode of human-computer interaction for application fields including robot navigation, augmented reality, and autonomous driving.
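The plane constraint described here, tying the three-dimensional points of the ground region to a single plane, amounts to projecting each point onto the fitted plane. A minimal sketch, using the plane convention n·x = d with unit normal n; the sample points are illustrative, not taken from the patent:

```python
def project_onto_plane(point, normal, d):
    """Project a 3D point onto the plane n.x = d (normal assumed unit length)."""
    dist = sum(n * p for n, p in zip(normal, point)) - d   # signed distance
    return [p - dist * n for p, n in zip(point, normal)]

# Hypothetical noisy ground points snapped to the floor plane z = 0
# (normal (0, 0, 1), d = 0):
noisy_ground = [[1.0, 2.0, 0.05], [3.0, -1.0, -0.02]]
constrained = [project_onto_plane(p, (0.0, 0.0, 1.0), 0.0) for p in noisy_ground]
```

After projection the z-coordinates vanish while the in-plane coordinates are preserved, which is exactly the effect of tying ground points to a shared plane.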
The processing of key frames involved in the above semantic map construction is shown in Figure 2.
As shown in Figure 2, which is the key-frame processing flow chart of the visual SLAM method based on semantic constraints according to an embodiment of the present invention, the processing of a key frame comprises: judging that the frame is a key frame, and performing image-level semantic segmentation on it; obtaining the three-dimensional points identified as the ground region in the segmentation map; estimating the optimal plane parameters from these three-dimensional points with the random sample consensus algorithm; then fitting the three-dimensional points corresponding to the feature points of the ground region against the optimal plane parameters to obtain the constraint points, i.e., the three-dimensional map points of the ground region in the semantic map; and finally, with each detected key frame, jointly updating the optimal plane parameters, the three-dimensional points, and the constraint points.
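The random sample consensus step above can be sketched as a generic RANSAC plane fit; this is not the patent's exact implementation, and the iteration count, inlier threshold, and sample points are assumed values:

```python
import random

def ransac_plane(points, iters=200, thresh=0.02, seed=0):
    """Estimate (n, d) of the plane n.x = d with the most inliers (RANSAC)."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iters):
        p1, p2, p3 = rng.sample(points, 3)
        u = [b - a for a, b in zip(p1, p2)]
        v = [b - a for a, b in zip(p1, p3)]
        n = [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
        norm = sum(c * c for c in n) ** 0.5
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = [c / norm for c in n]
        d = sum(c * p for c, p in zip(n, p1))
        count = sum(abs(sum(c * q for c, q in zip(n, pt)) - d) < thresh
                    for pt in points)
        if count > best_count:
            best, best_count = (n, d), count
    return best

# Hypothetical ground points (z = 0) plus two off-plane outliers:
floor = [[0.1 * i, 0.1 * j, 0.0] for i in range(5) for j in range(5)]
n, d = ransac_plane(floor + [[0.0, 0.0, 1.0], [1.0, 1.0, 2.0]])
```

The recovered plane ignores the outliers and returns the floor's normal and offset, which is the behavior relied on when the optimal plane parameters are refreshed at each key frame.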
During the construction and update of the semantic map, the pose of the depth camera is estimated in real time from the non-key frames of the image sequence and the constraint points of the ground region in the semantic map under construction. The pose of the depth camera can be expressed as the three-dimensional transformation matrix T_wc from the local to the global coordinate system, or the transformation matrix T_cw from the global to the local coordinate system; the two are inverses of each other.
Here the subscript c denotes the local (camera) frame and the subscript w the global (world) frame. Let a three-dimensional point be X = [X, Y, Z]^T and a plane vector be π = (π_1, π_2, π_3, π_4)^T = (n^T, d)^T, where n is the normal vector of the plane and d is the distance of the plane from the origin of the world coordinate system. Through the pose, i.e., the transformation matrix T_wc, a local point X_c can be transformed into the global coordinate system by X_w = T_wc X_c; likewise a local plane converts to a world plane by π_w = T_cw^T π_c. Strictly, X here should be the homogeneous coordinates [X, Y, Z, 1]^T, but for simplicity of exposition X and its homogeneous form are not distinguished and are converted automatically as the computation requires. The key to estimating the depth camera pose is therefore to solve for the transformation matrix T_wc or T_cw; in the present invention this is done by constructing an energy function and solving it. Furthermore, the invention considers both the texture-rich and the texture-sparse cases of the acquired images. When texture features are abundant, pose estimation comprises: identifying the texture features of the non-key frame of the image sequence and determining the ground region; extracting the feature points of the ground region, and obtaining the projections of the constraint points of the semantic map into the non-key frame; constructing the first energy function from the Euclidean distances between the feature points and the projected points by least squares; and solving the first energy function to estimate the pose of the depth camera. The relationship between a projected point and its constraint point can be expressed as:

X_c = d_c (K^-1 ũ) / (n_c^T K^-1 ũ)    (1)
where X_c is the constraint point, d_c the distance from the origin of the local coordinate system to the local plane, n_c^T the normal vector of the local plane in which the feature point currently lies, K the calibration matrix, and ũ the homogeneous coordinates of the projected point in the local plane. The calibration matrix K is:

K = [ f_x   0    c_x
       0   f_y   c_y
       0    0     1  ]    (2)
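One consistent reading of formula (1) is a ray-plane intersection: the pixel is back-projected through K and scaled so that the resulting point lies on the local plane. A minimal sketch; the calibration values and plane are hypothetical:

```python
def backproject_to_plane(u, v, K, n, d):
    """Intersect the viewing ray of pixel (u, v) with the local plane n.X = d."""
    fx, fy, cx, cy = K
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]    # K^-1 applied to (u, v, 1)
    s = d / sum(nc * rc for nc, rc in zip(n, ray))
    return [s * rc for rc in ray]

# Hypothetical intrinsics (fx, fy, cx, cy) and a plane z = 2 in camera coords:
K = (500.0, 500.0, 320.0, 240.0)
X = backproject_to_plane(420.0, 240.0, K, (0.0, 0.0, 1.0), 2.0)
```

This is the sense in which the three-dimensional coordinates of ground feature points can be recovered directly from the plane parameters rather than from noisy depth readings.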
where f_x and f_y are the focal lengths along the x and y dimensions of the image plane, and c_x and c_y are the corresponding principal point coordinates. The projections of the three-dimensional map points into the current frame are then obtained, and the first energy function is constructed from the Euclidean distances between the feature points and the projected points by least squares:

E_1 = Σ_i || u_i - π(K T_cw X_w,i) ||²    (3)
where π(K T_cw X_w) denotes the projected point, K is the calibration matrix, T_cw the transformation matrix, X_w a global constraint point, and u a feature point. The feature points u are ORB features obtained with the ORB feature descriptor, which are invariant to scale, rotation, and illumination.
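The first energy function, the sum of squared pixel distances between observed features and projected constraint points, can be sketched as follows. The pose here is the identity and the calibration is hypothetical; a real solver would minimize E_1 over T_cw, e.g. by the singular value decomposition described earlier:

```python
def project(K, T, Xw):
    """pi(K . T_cw . X_w): map a world point into the frame, project to pixels."""
    fx, fy, cx, cy = K
    R, t = T
    Xc = [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy)

def first_energy(K, T, points_w, pixels):
    """E1 = sum_i || u_i - pi(K T_cw X_w,i) ||^2 over matched features."""
    e = 0.0
    for Xw, u in zip(points_w, pixels):
        p = project(K, T, Xw)
        e += (p[0] - u[0]) ** 2 + (p[1] - u[1]) ** 2
    return e

# Identity pose (R = I, t = 0) and hypothetical intrinsics:
K = (500.0, 500.0, 320.0, 240.0)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E1 = first_energy(K, (I3, [0.0, 0.0, 0.0]),
                  [[0.0, 0.0, 2.0], [0.4, 0.0, 2.0]],
                  [(320.0, 240.0), (421.0, 240.0)])
```

The first point reprojects exactly onto its observation, and the second is one pixel off, so E_1 measures exactly the squared reprojection residual formula (3) describes.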
In the above pose estimation process, the global constraint points are jointly updated at each semantic map update (key-frame update), after which the pose of the depth camera is corrected again through the processes of formulas (1), (2), and (3).
Since visual SLAM methods are commonly used in settings such as robot navigation, augmented reality, and autonomous driving, the texture features of the images acquired by the depth camera are not constant; the present invention therefore also specifically considers the case of insufficient texture. For a non-key frame lacking texture features, the ground region cannot be identified from texture. The invention therefore proposes obtaining the corresponding three-dimensional points from the depth information of the pixels of the non-key frame; judging from the distance between each three-dimensional point and the ground of the current semantic map (i.e., the plane formed by the constraint points, the optimal plane parameters) whether the pixel belongs to the ground region of the image; and constructing, for the three-dimensional points judged to belong to the ground region, the second energy function from their distances to the ground of the current semantic map by least squares. The second energy function is:

E_2 = Σ_X̃c ( n^T T_wc X̃_c - d )²    (4)
where X̃_c ranges over all three-dimensional points of the current frame whose distance, in the global coordinate system, to the semantic map ground is less than a given threshold, with X̃_w = T_wc X̃_c, X̃_c being the point in the local coordinate system and T_wc the transformation matrix. The z-direction component of X̃_c is obtained directly from the depth map of the frame. The unknown variable is the transformation matrix T_wc, which can be decomposed into a 3 × 3 rotation matrix R and a 3 × 1 translation vector t. Unlike the solution of the first energy function, because noise is introduced when the corresponding three-dimensional points are obtained from the pixel depths of the depth map, the present invention solves the second energy function by taking partial derivatives with gradient descent, as follows, for the pose correction during pose estimation:

λ_k ← λ_k - t · ∂E_2/∂λ_k    (5)
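A minimal sketch of the second energy function and a gradient-descent step on it; for brevity only the translation is optimized here, whereas the patent iterates over all rotation and translation parameters, and the plane, points, and step size are illustrative:

```python
def second_energy(points_c, R, t, n, d):
    """E2 = sum (n.(R Xc + t) - d)^2 over near-ground points (plane n.X = d)."""
    e = 0.0
    for Xc in points_c:
        Xw = [sum(R[i][j] * Xc[j] for j in range(3)) + t[i] for i in range(3)]
        e += (sum(nc * xc for nc, xc in zip(n, Xw)) - d) ** 2
    return e

def descend_translation(points_c, R, t, n, d, step=0.05, iters=300):
    """Gradient descent on t alone: dE2/dt_k = sum 2 (n.Xw - d) n_k."""
    t = list(t)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for Xc in points_c:
            Xw = [sum(R[i][j] * Xc[j] for j in range(3)) + t[i] for i in range(3)]
            r = sum(nc * xc for nc, xc in zip(n, Xw)) - d
            for k in range(3):
                grad[k] += 2.0 * r * n[k]
        t = [tk - step * g for tk, g in zip(t, grad)]
    return t

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # ground points
# A biased initial translation (z offset 0.3) is pulled back onto the floor:
t_opt = descend_translation(pts, I3, [0.0, 0.0, 0.3], (0.0, 0.0, 1.0), 0.0)
```

The descent drives the residual point-to-plane distances, and hence E_2, toward zero, which is the pose correction the texture-poor branch performs.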
where λ_k are the parameters to be solved and t is the iteration step size. For convenience of solution, the R and t of the transformation to be solved can be expressed as:

R = [ 1-2(q_y²+q_z²)       2(q_x q_y - q_z q_w)   2(q_x q_z + q_y q_w)
      2(q_x q_y + q_z q_w)  1-2(q_x²+q_z²)        2(q_y q_z - q_x q_w)
      2(q_x q_z - q_y q_w)  2(q_y q_z + q_x q_w)  1-2(q_x²+q_y²)      ]    (6)
t = [t_x  t_y  t_z]^T    (7)
where q_x, q_y, q_z, q_w denote the rotation quaternion and t_x, t_y, t_z the translations along the three axes.
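The parameterization of R in formula (6) is the standard unit-quaternion-to-rotation-matrix conversion, which can be sketched as follows (assuming the (q_x, q_y, q_z, q_w) ordering given above):

```python
import math

def quat_to_rot(qx, qy, qz, qw):
    """Rotation matrix R(q) for a unit quaternion (qx, qy, qz, qw)."""
    return [
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ]

q = math.sqrt(0.5)
R = quat_to_rot(0.0, 0.0, q, q)          # 90-degree rotation about the z axis
x_axis = [1.0, 0.0, 0.0]
v = [sum(R[i][j] * x_axis[j] for j in range(3)) for i in range(3)]
```

Rotating the x axis by 90 degrees about z yields the y axis, confirming the sign conventions of the matrix entries; optimizing over (q_x, q_y, q_z, q_w, t_x, t_y, t_z) gives the seven pose parameters the gradient descent of formula (5) iterates on.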
Taking the partial derivative of the summand of formula (4) with respect to a parameter λ to be solved gives:

∂/∂λ (n^T X_w - d)² = 2 (n^T X_w - d) · n^T (∂X_w/∂λ)
Through formulas (6) and (7), the partial derivative of a three-dimensional point in the world coordinate system with respect to each parameter to be solved can be expressed as a function of the point in the current camera coordinate system and the current pose parameter values, as shown in Table 1.
Table 1. Partial derivatives of a three-dimensional point in the world coordinate system with respect to each pose parameter
For the pose estimation of the depth camera, the present invention combines estimation in the texture-rich and texture-poor cases and corrects the pose when texture features are lacking, so that its camera pose estimation is more accurate than traditional pose estimation. This gain in accuracy is, of course, inseparable from the improvements to the constraint points of the semantic map.
As shown in Figure 3, which is the workflow of the visual SLAM method based on semantic constraints according to an embodiment of the present invention: since the image sequence is acquired by a depth camera, the input of the visual SLAM method based on semantic constraints of the present invention is imagery with color and depth-map information. Each non-key frame is used for estimating or correcting the pose of the depth camera. Each key frame is inserted for semantic segmentation, semantic point cloud generation, three-dimensional map point update, bundle optimization, global semantic update, and related processes, yielding the semantic map. While the global semantic constraint is updated, the map points of the ground region in the semantic map are updated simultaneously and the camera pose is corrected. Images before and after semantic segmentation are shown in Figure 4.
As shown in Figure 4, which illustrates the effect of semantic segmentation: the left side is the input image and the right side the corresponding image after semantic segmentation. Regions identified with different semantics would be displayed in different colors in the actual segmentation map; in this figure they are rendered instead in different gray levels.
Finally, it should be noted that the above description of the method of the present invention is directed at scenes with a planar floor: indoors and in similar scenes the ground is a plane, so constraining the feature points of the semantically identified ground region to lie in the same plane is reasonable. It must be emphasized, however, that the method of the present invention is not limited to constraining the ground region; it also generalizes naturally to constraints at the level of arbitrary objects. For example, if a spherical object is recognized in the scene, then once the region of that object is determined in the image, the feature points of that region should likewise conform to the features of a sphere: when those feature points are converted to three-dimensional points, the three-dimensional points should be equidistant from some spatial point.
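The spherical generalization can be illustrated with a simple equidistance residual; the points, center, and radius below are hypothetical:

```python
def sphere_residuals(points, center, radius):
    """Residual | ||p - c|| - r | for each point of a hypothesized sphere region."""
    res = []
    for p in points:
        dist = sum((a - b) ** 2 for a, b in zip(p, center)) ** 0.5
        res.append(abs(dist - radius))
    return res

# Points sampled from a unit sphere centered at the origin:
residuals = sphere_residuals([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)],
                             (0.0, 0.0, 0.0), 1.0)
```

Points belonging to the detected sphere region produce near-zero residuals, so the same residual that constrained ground points to a plane here constrains object points to a sphere.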
The above are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.

Claims (10)

1. A visual SLAM method based on semantic constraints, characterized by comprising:
continuously acquiring an image sequence of the surrounding environment with a depth camera;
performing semantic segmentation on the key frames of the image sequence, and obtaining semantic constraint parameters from the segmentation results;
processing the segmentation results with a visual SLAM method to reconstruct a map;
applying the semantic constraint parameters to the reconstructed map, and fusing the segmentation results to obtain a semantic map.
2. The visual SLAM method based on semantic constraints according to claim 1, characterized in that performing semantic segmentation on a key frame comprises:
segmenting the image according to the texture features of the specific objects in the key frame, thereby dividing the frame into multiple regions and identifying the real-world semantics of each region from its texture features.
3. The visual SLAM method based on semantic constraints according to claim 1, characterized in that obtaining semantic constraint parameters from the segmentation results comprises:
obtaining the depth information of all feature points in the key frame;
obtaining, from the depth information of the feature points of the ground region in the segmentation results, the corresponding three-dimensional points;
estimating the optimal plane parameters from the three-dimensional points with the random sample consensus algorithm; wherein
the optimal plane parameters serve as the semantic constraint parameters and are updated each time a key frame is detected.
4. The semantic-constraint-based visual SLAM method according to claim 1, wherein applying semantic constraints to the reconstructed map using the semantic constraint parameters comprises:
constructing connecting lines from the multiple three-dimensional points of the ground region in the reconstructed map to the multiple feature points of the ground region in the segmentation result, obtaining multiple line parameters;
obtaining multiple intersection points from the multiple line parameters and the optimal plane parameters, the multiple intersection points serving as the obtained constraint points, so as to apply semantic constraints to the reconstructed map; wherein
the constraint points are updated jointly with the semantic constraint parameters.
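The constraint-point construction of claim 4 reduces to standard line-plane intersection: each connecting line is intersected with the optimal plane n·x + d = 0. A minimal sketch, with all variable names my own rather than the patent's:

```python
import numpy as np

def line_plane_intersection(p, q, normal, d):
    """Intersect the line through 3D points p and q with the plane n·x + d = 0.

    Returns the intersection point, or None when the line is parallel to the plane."""
    direction = q - p
    denom = normal @ direction
    if abs(denom) < 1e-9:            # line parallel to the plane, no intersection
        return None
    t = -(normal @ p + d) / denom
    return p + t * direction
```

Applied per map-point/feature-point pair, the resulting intersection points are the constraint points that snap the reconstructed ground region onto the fitted plane.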
5. The semantic-constraint-based visual SLAM method according to claim 1, wherein obtaining the semantic map comprises:
fusing the constraint points, obtained by applying semantic constraints to the reconstructed map using the semantic constraint parameters, with the three-dimensional points corresponding to the feature points of the multiple regions in the semantic segmentation result and their real semantics, so as to obtain the semantic map.
6. The semantic-constraint-based visual SLAM method according to claim 1, further comprising:
estimating the pose of the depth camera by analyzing the non-key frames in the image sequence in combination with the semantic map; wherein
estimating the pose of the depth camera comprises: pose estimation for non-key frames rich in texture features, and pose estimation for non-key frames lacking texture features.
7. The semantic-constraint-based visual SLAM method according to claim 6, wherein the pose estimation for non-key frames rich in texture features comprises:
identifying the texture features in the non-key frames of the image sequence to determine the ground region;
extracting the feature points in the ground region, and obtaining the projection points of the constraint points of the semantic map in the non-key frame;
constructing a first energy function from the Euclidean distances between the feature points and the projection points using the least squares method;
solving the first energy function to estimate the pose of the depth camera.
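The first energy function of claim 7 can be read as E(R, t) = Σᵢ ‖π(R·Pᵢ + t) − qᵢ‖², where π is the pinhole projection, Pᵢ are the map constraint points and qᵢ the matched ground feature points in the non-key frame. An illustrative sketch (the intrinsics and all names are assumptions, not taken from the patent):

```python
import numpy as np

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return np.array([fx * x / z + cx, fy * y / z + cy])

def first_energy(R, t, constraint_points, feature_pixels, fx, fy, cx, cy):
    """Sum of squared Euclidean distances between projected constraint points
    and their matched feature points in the non-key frame."""
    return sum(
        np.sum((project(R @ P + t, fx, fy, cx, cy) - q) ** 2)
        for P, q in zip(constraint_points, feature_pixels)
    )
```

Minimizing this energy over (R, t) yields the camera pose; the patent solves it in closed form via singular value decomposition (claim 8).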
8. The semantic-constraint-based visual SLAM method according to claim 7, wherein solving the first energy function comprises:
solving the first energy function by singular value decomposition to obtain a transformation matrix; wherein
the transformation matrix is used to estimate the pose of the depth camera.
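Claim 8's SVD solution corresponds to the classical closed-form least-squares alignment of two matched 3D point sets (often called the Kabsch or Umeyama method). The sketch below is my reconstruction of that standard technique, not the patent's exact formulation:

```python
import numpy as np

def svd_rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) with Q ≈ R @ P + t, for two matched
    (N, 3) point sets, via SVD of the 3x3 cross-covariance of the centered sets."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper solution (reflection) so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

Given exact correspondences, this recovers the transformation matrix minimizing the sum of squared Euclidean distances, i.e. the first energy function.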
9. The semantic-constraint-based visual SLAM method according to claim 6, wherein the pose estimation for non-key frames lacking texture features comprises:
obtaining the corresponding three-dimensional points from the depth information of the pixels in the non-key frame;
judging, from the distance between each three-dimensional point and the plane formed by the constraint points in the current semantic map, whether the corresponding pixel belongs to the ground region of the image;
for the three-dimensional points judged to belong to the ground region, constructing a second energy function from the distances between the three-dimensional points and the plane formed by the constraint points in the current semantic map, using the least squares method;
solving the second energy function to estimate the pose of the depth camera.
10. The semantic-constraint-based visual SLAM method according to claim 9, wherein solving the second energy function comprises:
decomposing the transformation matrix in the second energy function into a rotation matrix and a translation vector;
taking partial derivatives with respect to the unknown parameters of the rotation matrix and the translation vector using the gradient descent algorithm; wherein the partial derivatives of the unknown parameters are used to estimate the pose of the depth camera.
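Claims 9 and 10 can be sketched together: the second energy is the sum of squared point-to-plane distances, the transform is split into a rotation (here a small-angle approximation) and a translation, and gradient descent updates the six pose parameters. For brevity this sketch uses numerical forward-difference partial derivatives rather than the analytic ones the claim describes; the step size, iteration count, and all names are hypothetical.

```python
import numpy as np

def second_energy(params, points, normal, d):
    """Sum of squared distances from the transformed points to the plane n·x + d = 0.

    params = (rx, ry, rz, tx, ty, tz); the rotation uses a small-angle
    approximation R ≈ I + [ω]x with ω = (rx, ry, rz)."""
    rx, ry, rz, tx, ty, tz = params
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    transformed = points @ R.T + np.array([tx, ty, tz])
    return np.sum((transformed @ normal + d) ** 2)

def estimate_pose(points, normal, d, lr=0.05, iters=500, eps=1e-6):
    """Gradient descent on the six pose parameters with forward differences."""
    params = np.zeros(6)
    for _ in range(iters):
        e0 = second_energy(params, points, normal, d)
        grad = np.zeros(6)
        for i in range(6):
            bumped = params.copy()
            bumped[i] += eps
            grad[i] = (second_energy(bumped, points, normal, d) - e0) / eps
        params -= lr * grad
    return params
```

Only the pose components observable from a single plane (here, height and the two tilt angles) are constrained by this energy; in the patent's pipeline it complements the feature-based estimator of claims 7 and 8 when texture is scarce.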
CN201811648994.2A 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint Active CN109815847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811648994.2A CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811648994.2A CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Publications (2)

Publication Number Publication Date
CN109815847A true CN109815847A (en) 2019-05-28
CN109815847B CN109815847B (en) 2020-12-01

Family

ID=66603833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811648994.2A Active CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Country Status (1)

Country Link
CN (1) CN109815847B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732518A (en) * 2015-01-19 2015-06-24 北京工业大学 PTAM improvement method based on ground characteristics of intelligent robot
CN105045263A (en) * 2015-07-06 2015-11-11 杭州南江机器人股份有限公司 Kinect-based robot self-positioning method
US20180322640A1 (en) * 2017-05-02 2018-11-08 Hrl Laboratories, Llc System and method for detecting moving obstacles based on sensory prediction from ego-motion
CN108229416A (en) * 2018-01-17 2018-06-29 苏州科技大学 Robot SLAM methods based on semantic segmentation technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHICHAO YANG ET.AL: "Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments", 《ARXIV:1703.07334V1 [CS.CV]》 *
XUANPENG LI ET.AL: "Semi-Dense 3D Semantic Mapping from Monocular SLAM", 《ARXIV:1611.04144V1 [CS.CV]》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335319A (en) * 2019-06-26 2019-10-15 华中科技大学 Camera positioning and the map reconstruction method and system of a kind of semantics-driven
CN110335319B (en) * 2019-06-26 2022-03-18 华中科技大学 Semantic-driven camera positioning and map reconstruction method and system
CN110298921A (en) * 2019-07-05 2019-10-01 青岛中科智保科技有限公司 The construction method and processing equipment of three-dimensional map with personage's semantic information
CN110533720B (en) * 2019-08-20 2023-05-02 西安电子科技大学 Semantic SLAM system and method based on joint constraint
CN110533720A (en) * 2019-08-20 2019-12-03 西安电子科技大学 Semantic SLAM system and method based on joint constraint
CN111210518A (en) * 2020-01-15 2020-05-29 西安交通大学 Topological map generation method based on visual fusion landmark
CN111210518B (en) * 2020-01-15 2022-04-05 西安交通大学 Topological map generation method based on visual fusion landmark
CN111427373A (en) * 2020-03-24 2020-07-17 上海商汤临港智能科技有限公司 Pose determination method, device, medium and equipment
CN111427373B (en) * 2020-03-24 2023-11-24 上海商汤临港智能科技有限公司 Pose determining method, pose determining device, medium and pose determining equipment
CN111707275A (en) * 2020-05-12 2020-09-25 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
US20220205804A1 (en) * 2020-12-28 2022-06-30 Zenseact Ab Vehicle localisation
CN113674416A (en) * 2021-08-26 2021-11-19 中国电子科技集团公司信息科学研究院 Three-dimensional map construction method and device, electronic equipment and storage medium
CN113674416B (en) * 2021-08-26 2024-04-26 中国电子科技集团公司信息科学研究院 Three-dimensional map construction method and device, electronic equipment and storage medium
CN114387351A (en) * 2021-12-21 2022-04-22 国家管网集团川气东送天然气管道有限公司 Monocular vision calibration method and computer readable storage medium

Also Published As

Publication number Publication date
CN109815847B (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN109815847A (en) A kind of vision SLAM method based on semantic constraint
CN110458939B (en) Indoor scene modeling method based on visual angle generation
Yu et al. DS-SLAM: A semantic visual SLAM towards dynamic environments
CN108830150B (en) One kind being based on 3 D human body Attitude estimation method and device
CN106327532B (en) A kind of three-dimensional registration method of single image
Sun et al. Aerial 3D building detection and modeling from airborne LiDAR point clouds
CN106023298B (en) Point cloud Rigid Registration method based on local Poisson curve reestablishing
CN110363858A (en) A kind of three-dimensional facial reconstruction method and system
CN109410321A (en) Three-dimensional rebuilding method based on convolutional neural networks
CN110335337A (en) A method of based on the end-to-end semi-supervised visual odometry for generating confrontation network
CN100543775C (en) The method of following the tracks of based on the 3 d human motion of many orders camera
CN105389569B (en) A kind of estimation method of human posture
CN109034077A (en) A kind of three-dimensional point cloud labeling method and device based on Analysis On Multi-scale Features study
CN108416840A (en) A kind of dense method for reconstructing of three-dimensional scenic based on monocular camera
CN104408760B (en) A kind of high-precision virtual assembly system algorithm based on binocular vision
CN108734737A (en) The method that view-based access control model SLAM estimation spaces rotate noncooperative target shaft
CN104346824A (en) Method and device for automatically synthesizing three-dimensional expression based on single facial image
CN105740798A (en) Structure analysis based identification method for object in point cloud scene
CN109000655B (en) Bionic indoor positioning and navigation method for robot
CN106503170B (en) It is a kind of based on the image base construction method for blocking dimension
CN102768767B (en) Online three-dimensional reconstructing and locating method for rigid body
CN109242959A (en) Method for reconstructing three-dimensional scene and system
Pan et al. Establishing point correspondence of 3d faces via sparse facial deformable model
CN110379003A (en) Three-dimensional head method for reconstructing based on single image
CN109345570A (en) A kind of multichannel three-dimensional colour point clouds method for registering based on geometry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant