CN109815847B - Visual SLAM method based on semantic constraint - Google Patents

Visual SLAM method based on semantic constraint

Info

Publication number
CN109815847B
CN109815847B (application CN201811648994.2A)
Authority
CN
China
Prior art keywords
semantic
constraint
points
map
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811648994.2A
Other languages
Chinese (zh)
Other versions
CN109815847A (en)
Inventor
王蓉
查文中
葛建军
孟繁乐
孟祥瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC Information Science Research Institute
Original Assignee
CETC Information Science Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC Information Science Research Institute filed Critical CETC Information Science Research Institute
Priority to CN201811648994.2A priority Critical patent/CN109815847B/en
Publication of CN109815847A publication Critical patent/CN109815847A/en
Application granted granted Critical
Publication of CN109815847B publication Critical patent/CN109815847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a visual SLAM method based on semantic constraints, comprising the following steps: continuously acquiring an image sequence of the surrounding environment with a depth camera; reconstructing a map from the key frames of the image sequence by a visual SLAM method; performing semantic segmentation on the key frames and deriving semantic constraint parameters from the segmentation results; and applying the semantic constraint parameters to the reconstructed map and fusing the segmentation results to obtain a semantic map. The semantic constraint parameters and the constraint points are then bound and updated each time a new key frame is detected, and the semantic map is used to estimate the pose of the depth camera for non-key frames both when texture features are abundant and when they are lacking. The semantic constraints yield a more accurate semantic map, and combining pose estimation for the texture-rich and texture-poor cases improves the accuracy of camera pose estimation.

Description

Visual SLAM method based on semantic constraint
Technical Field
The invention belongs to the technical field of computer vision and artificial intelligence, and particularly relates to a visual SLAM method based on semantic constraint.
Background
Simultaneous Localization and Mapping (SLAM) localizes a sensor in real time from its motion through an unknown environment while recovering the three-dimensional structure of that environment. SLAM methods can be broadly classified into laser SLAM and visual SLAM according to the sensor used. Visual SLAM, which uses color or depth cameras, is gaining increasing attention due to its advantages in cost, convenience, and versatility, and has broad application prospects in the fields of robotics, augmented reality, automatic driving, and others.
Conventional visual SLAM techniques are prone to failure under conditions of weak texture, fast motion, and the like. With the continuous development of deep learning and its excellent performance in classification and recognition tasks, combining deep learning with visual SLAM presents broad application prospects and great potential value, and semantic SLAM is one of the important directions. Conventional visual SLAM utilizes and presents only low-level information such as color and geometric structure, and does not exploit the rich semantic information present in the scene.
Disclosure of Invention
The invention aims to provide a visual SLAM method based on semantic constraints, realized by the following technical scheme: continuously acquiring an image sequence of the surrounding environment with a depth camera; performing semantic segmentation on the key frames in the image sequence and obtaining semantic constraint parameters from the segmentation results; processing the segmentation results with a visual SLAM method to reconstruct a map; and applying the semantic constraints to the reconstructed map through the semantic constraint parameters and fusing the segmentation results to obtain a semantic map.
Further, the semantic segmentation of the key frame includes: segmenting the image according to the texture features of specific objects in the key frame, thereby dividing one frame of image into a plurality of regions, and identifying the real-world semantics of each corresponding region according to the texture features.
Further, the obtaining of semantic constraint parameters according to the semantic segmentation result includes: obtaining depth information of all feature points in the key frame; obtaining a plurality of three-dimensional points corresponding to the feature points according to the depth information of the feature points of the ground area in the semantic segmentation result; obtaining optimal plane parameters from these three-dimensional points by using the random sample consensus algorithm; and updating the semantic constraint parameters each time a key frame is detected.
Further, the semantic constraining of the reconstructed map through the semantic constraint parameters includes: drawing connecting straight lines from a plurality of three-dimensional points of the ground area in the reconstructed map to a plurality of feature points of the ground area in the segmentation result, obtaining a plurality of straight-line parameters; and obtaining a plurality of intersection points from the straight-line parameters and the optimal plane parameters, the intersection points serving as the constraint points through which semantic constraint is imposed on the reconstructed map; wherein the constraint points are updated in binding with the semantic constraint parameters.
Further, the obtaining of the semantic map comprises: fusing the constraint points obtained by semantically constraining the reconstructed map through the semantic constraint parameters, the three-dimensional points corresponding to the feature points of the plurality of regions in the semantic segmentation result, and their real-world semantics, thereby obtaining the semantic map.
Further, the semantic-constraint-based visual SLAM method further includes: estimating the pose of the depth camera by analyzing non-key frames in the image sequence in combination with the semantic map; wherein estimating the pose of the depth camera comprises: pose estimation for non-key frames with abundant texture features and pose estimation for non-key frames lacking texture features.
Further, the pose estimation for texture-feature-rich non-key frames comprises: identifying texture features in non-key frames of the image sequence and determining the ground area; extracting feature points in the ground area and obtaining the projections of the constraint points of the semantic map into the non-key frame; constructing a first energy function from the Euclidean distances between the feature points and the projected points by the least-squares method; and solving the first energy function to estimate the pose of the depth camera.
Further, the solving the first energy function includes: solving the first energy function by using a singular value decomposition method to obtain a transformation matrix; wherein the transformation matrix is used for pose estimation of the depth camera.
Further, the pose estimation for the non-key frames lacking texture features comprises: acquiring corresponding three-dimensional points by using the depth information of the pixel points in the non-key frame; judging whether the pixel point belongs to a ground area in the image or not according to the distance from the three-dimensional point to a plane formed by constraint points in the current semantic map; for the three-dimensional points which belong to the ground area, a second energy function is constructed according to the distance from the three-dimensional points to a plane formed by constraint points in the current semantic map by using a least square method; solving the second energy function to estimate the pose of the depth camera.
Further, the solving of the second energy function includes: decomposing the transformation matrix in the second energy function into a rotation matrix and a translation vector; computing, with a gradient descent algorithm, the partial derivatives with respect to the parameters to be solved in the rotation matrix and the translation vector; wherein the partial derivatives with respect to the parameters to be solved are used for estimating the pose of the depth camera.
The invention has the advantages that:
(1) Aiming at the problem that visual SLAM generally obtains only the color and geometric information of a scene, this patent combines visual SLAM with image semantic segmentation to construct a semantic map of the scene, thereby obtaining high-level cognitive information about the scene and providing a more natural human-machine interaction mode for application fields including robot navigation, augmented reality, and automatic driving.
(2) Aiming at the problem that scene semantic information and geometric information are treated as independent and unrelated, this patent proposes converting semantic information into a geometric-structure constraint within SLAM: it focuses on the ground area obtained by semantic segmentation and imposes the constraint that all ground areas should lie on the same spatial plane. Taking this semantically constructed constraint into account during the SLAM process improves the performance of the SLAM algorithm, and the approach applies widely to indoor scenes. Furthermore, the semantic information is not limited to constraining the ground area and extends naturally to constraints at the level of arbitrary objects.
(3) Aiming at the problems of generating and updating semantic constraints, this patent provides a key-frame-based method for generating and updating the ground parameters in SLAM, obtaining accurate global semantic ground parameters incrementally. To avoid introducing the noise of the input depth map when generating and optimizing map points, the three-dimensional coordinates of the feature points in the ground area are recovered directly from the current plane parameters.
(4) Aiming at the problem that traditional feature-point-based pose estimation exploits only the texture-salient regions of the image, the invention corrects the camera pose-estimation result by applying the constraint of the semantic ground area: the energy function of equation (4) is designed and solved with the gradient descent method. The advantage is that texture-poor regions such as the ground, which cannot be neglected in practical indoor applications, are taken into account during pose estimation, improving its accuracy.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating semantic map construction in a visual SLAM method based on semantic constraints according to an embodiment of the present invention.
Fig. 2 is a flow chart illustrating the processing of key frames in the visual SLAM method based on semantic constraints according to an embodiment of the present invention.
FIG. 3 is a flow chart illustrating the operation of a visual SLAM method based on semantic constraints according to an embodiment of the present invention.
Fig. 4 shows a schematic diagram of the effect of semantic segmentation.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The invention provides a semantic-constraint-based visual Simultaneous Localization and Mapping (SLAM) method, hereinafter referred to as the semantic-constraint-based visual SLAM method. Semantic constraints are derived from the semantic segmentation performed within the visual SLAM pipeline, and the global plane parameters are updated accordingly, so that the SLAM semantic map is constructed and the camera pose is corrected. The invention is explained in more detail below with reference to the figures.
Fig. 1 is a flow chart illustrating the construction of the semantic map in the semantic-constraint-based visual SLAM method according to an embodiment of the present invention. The semantic map is constructed by: continuously acquiring an image sequence of the surrounding environment with a depth camera; performing semantic segmentation on the key frames in the image sequence and obtaining semantic constraint parameters from the segmentation results; processing the segmentation results with a visual SLAM method to reconstruct a map; and applying the semantic constraints to the reconstructed map through the semantic constraint parameters and fusing the segmentation results to obtain the semantic map.
Specifically, the method adopts a fully convolutional neural network to perform image-level semantic segmentation on each key frame of the image sequence: one frame of image is segmented into a plurality of regions according to the texture features of the specific objects it contains, the real-world semantics of each region are identified from those texture features, and feature points are extracted within the regions. The texture features and their corresponding semantics form a semantic point cloud that is used for self-learning of the network. Next, the depth information of all feature points in the key frame is obtained, and the three-dimensional points corresponding to the feature points of the ground area in the segmentation result are computed from their depth values; the optimal plane parameters are then obtained from these three-dimensional points with the random sample consensus (RANSAC) algorithm. The optimal plane parameters serve as the semantic constraint parameters and are updated each time a new key frame is detected. Then, connecting straight lines are drawn from the three-dimensional points of the ground area in the reconstructed map to the corresponding feature points of the ground area in the segmentation result, yielding a set of straight-line parameters; the intersections of these lines with the optimal plane are taken as the constraint points, through which the semantic constraint is imposed on the reconstructed map. The constraint points are bound to the semantic constraint parameters and updated together with them, and they become the three-dimensional map points of the ground area in the semantic map being generated. In this way the input noise that would be introduced by recovering three-dimensional map points directly from the depth values of the feature points is avoided, and the visual SLAM method, combined with the semantic constraint, obtains more accurate and robust localization and reconstruction results. Finally, the constraint points, the three-dimensional points corresponding to the feature points of the other regions, and their real-world semantics are fused to obtain the semantic map.
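The plane-fitting and constraint-point steps can be made concrete with a short sketch. The following Python snippet is a minimal illustration only: the function names and inlier threshold are assumptions, and the connecting lines are cast here as rays through the camera center, which is one reading of the line/plane intersection step rather than the patent's exact construction. It fits the ground plane with RANSAC and then recovers constraint points by intersecting pixel rays with the fitted plane, so that no depth-map noise enters the constraint points:

```python
import numpy as np

def ransac_ground_plane(points, iters=200, inlier_thresh=0.02):
    """Fit a plane n.X + d = 0 (||n|| = 1) to Nx3 ground points with
    RANSAC, keeping the hypothesis with the most inliers."""
    best_inliers, best_plane = 0, None
    rng = np.random.default_rng()
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:              # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ sample[0]
        inliers = np.sum(np.abs(points @ n + d) < inlier_thresh)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    return best_plane

def constraint_points(K, pixels, plane):
    """Back-project ground-area pixels onto the fitted plane: each
    pixel defines a ray through the camera center, and the constraint
    point is the ray/plane intersection (no depth values are used)."""
    n, d = plane
    uv1 = np.hstack([pixels, np.ones((len(pixels), 1))])  # homogeneous
    rays = (np.linalg.inv(K) @ uv1.T).T                   # ray directions
    t = -d / (rays @ n)                                   # ray parameter at plane
    return rays * t[:, None]
```

In practice, rays nearly parallel to the plane (|rays @ n| close to zero) should be rejected before the division.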
More specifically, key frames in the image sequence are identified by setting an inspection frequency and judging whether a given frame is a key frame according to its texture features and the pose change of the depth camera. The semantic map is a three-dimensional map comprising dense map points and their corresponding semantics. The meaning of the semantic constraint is the following: in most cases the spatial position of a pixel is identified from its features, such as ORB features; however, because of factors such as lighting angle, the features of pixels lying on the same spatial plane still differ across the image. The invention therefore proposes to constrain, through the semantic constraint, the three-dimensional points corresponding to the feature points of the ground area to lie on the same three-dimensional plane, so that a more accurate three-dimensional map is constructed. This enables the machine to obtain high-level cognitive information about the scene and provides a more natural human-machine interaction mode for application fields including robot navigation, augmented reality, and automatic driving.
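For intuition only, the constraint that all ground points share one plane can be visualized by snapping labeled points onto that plane; note that the patent itself generates constraint points by line/plane intersection rather than by this orthogonal projection, so the sketch below (with assumed names) is merely the simplest illustration of the constraint:

```python
import numpy as np

def snap_to_plane(points, n, d):
    """Orthogonally project Nx3 points onto the plane n.X + d = 0
    (||n|| = 1), illustrating the 'same spatial plane' constraint."""
    signed_dist = points @ n + d          # signed distances to the plane
    return points - np.outer(signed_dist, n)
```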
The processing of key frames involved in the semantic map building process described above is shown in fig. 2.
Fig. 2 is a flowchart illustrating the processing of a key frame in the semantic-constraint-based visual SLAM method according to an embodiment of the present invention. The processing of a key frame comprises: judging that the frame is a key frame and performing image-level semantic segmentation on it; selecting the three-dimensional points identified as belonging to the ground area in the segmentation map; obtaining the optimal plane parameters from these three-dimensional points with the random sample consensus algorithm; then fitting the three-dimensional points corresponding to the feature points of the ground area to the optimal plane to obtain the constraint points, i.e., the three-dimensional map points of the ground area in the semantic map; and finally binding and updating the optimal plane parameters, the three-dimensional points, and the constraint points each time a new key frame is detected.
In the process of building and updating the semantic map, the pose of the depth camera is estimated in real time from the non-key frames of the image sequence and the constraint points of the ground area in the semantic map built so far. The pose of the depth camera can be expressed as the transformation matrix $T_{wc}$ from the local to the global coordinate system, or as the transformation matrix $T_{cw}$ from the global to the local coordinate system; the two are inverses of each other.
Here the subscript $c$ denotes the local (camera) frame and the subscript $w$ denotes the global (world) frame, and a three-dimensional point is written $X = [X, Y, Z]^T$. A plane is written $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)^T = (n^T, d)^T$, where $n$ is the normal vector of the plane and $d$ is the distance of the plane from the origin of the world coordinate system. Through the pose, i.e. the transformation matrix $T_{wc}$, a local point $X_c$ is transformed into the global coordinate system by $X_w = T_{wc} X_c$. Likewise, a local plane is converted to the global coordinate system by

$$\pi_w = T_{cw}^{T}\, \pi_c \qquad (1)$$
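As a quick check of equation (1), the following sketch (illustrative names, assuming numpy) transforms a point and a plane and verifies that a point lying on the local plane still lies on the transformed plane:

```python
import numpy as np

def transform_point(T_wc, X_c):
    """X_w = T_wc X_c, lifting to homogeneous coordinates internally."""
    return (T_wc @ np.append(X_c, 1.0))[:3]

def transform_plane(T_cw, pi_c):
    """Eq. (1): pi_w = T_cw^T pi_c, with T_cw = inv(T_wc)."""
    return T_cw.T @ pi_c

# A point on the local plane satisfies pi_c . [X_c, 1] = 0; after the
# transform it satisfies pi_w . [X_w, 1] = 0 as well:
T_wc = np.eye(4); T_wc[:3, 3] = [0.5, 0.0, 1.0]   # a pure translation
pi_c = np.array([0.0, 0.0, 1.0, -2.0])            # local plane z = 2
X_c = np.array([3.0, 4.0, 2.0])                   # lies on that plane
pi_w = transform_plane(np.linalg.inv(T_wc), pi_c)
assert abs(pi_w @ np.append(transform_point(T_wc, X_c), 1.0)) < 1e-9
```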
In the above, $X$ is in essence the homogeneous coordinate $[X, Y, Z, 1]^T$, but for simplicity of presentation $X$ is no longer distinguished herein from its homogeneous form, which is used automatically as computation requires. The key to estimating the pose of the depth camera is therefore to find the transformation matrix $T_{wc}$ or $T_{cw}$; the transformation matrix is obtained by constructing an energy function and solving it. The method considers two situations in the acquired images: sufficient texture features and sparse texture features.

When the texture features are sufficient, estimating the pose of the depth camera comprises: identifying texture features in the non-key frames of the image sequence and determining the ground area; extracting feature points in the ground area and obtaining the projections of the constraint points of the semantic map into the non-key frame; constructing a first energy function from the Euclidean distances between the feature points and the projected points by the least-squares method; and solving the first energy function to estimate the pose of the depth camera. The conversion relationship between a projected point and its constraint point can be expressed as

$$X_c = \frac{-\,d_c\, K^{-1} \tilde{u}}{n_c^{T} K^{-1} \tilde{u}} \qquad (2)$$

where $X_c$ is the constraint point, $d_c$ is the distance from the origin of the local coordinate system to the local plane, $n_c$ is the normal vector of the local plane on which the current feature point lies, $K$ is the calibration matrix, and $\tilde{u}$ is the homogeneous coordinate of the projected point in the image plane. The calibration matrix $K$ is specifically

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths along the two image axes and $c_x$ and $c_y$ are the corresponding optical-center coordinates. The projections of the three-dimensional map points into the current frame are then obtained, and the first energy function is constructed by the least-squares method from the Euclidean distances between feature points and projected points:

$$E_1 = \sum_i \bigl\| u_i - \pi\!\left(K\, T_{cw}\, X_w^{i}\right) \bigr\|^2 \qquad (3)$$

where $\pi(K T_{cw} X_w)$ denotes the projected point, $K$ is the calibration matrix, $T_{cw}$ is the transformation matrix, $X_w$ is a global constraint point, and $u$ is the matched feature point. The feature points $u$ are ORB features obtained with the ORB feature descriptor, which has scale, rotation, and illumination invariance.
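A minimal sketch of this texture-rich branch, assuming numpy: local 3D points recovered for the matched ground features (e.g. via equation (2)) are aligned to their global constraint points in closed form with SVD. This is the Kabsch/Umeyama alignment, one standard way to realize the singular-value-decomposition solution the method calls for; the function name is illustrative:

```python
import numpy as np

def solve_pose_svd(X_local, X_world):
    """Closed-form rigid alignment of matched Nx3 point sets,
    minimizing sum ||X_world - (R X_local + t)||^2 via SVD."""
    mu_l, mu_w = X_local.mean(0), X_world.mean(0)
    H = (X_local - mu_l).T @ (X_world - mu_w)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: keep det(R) = +1
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_w - R @ mu_l
    T_wc = np.eye(4)
    T_wc[:3, :3], T_wc[:3, 3] = R, t
    return T_wc
```

In practice the matches come from ORB descriptors, and one would typically also reject outlier matches, e.g. with RANSAC.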
In the pose-estimation process, the global constraint points are bound and updated during semantic-map updating (i.e., key-frame updating), and the pose correction of the depth camera is completed through formulas (1), (2), and (3).
Since the visual SLAM method is generally used in applications such as robot navigation, augmented reality, and automatic driving, the texture features in the images acquired by the depth camera are not uniform, and the invention therefore gives particular consideration to the case of insufficient texture. When texture features are lacking in a non-key frame, the ground area cannot be identified from texture. The invention therefore obtains the corresponding three-dimensional points from the depth information of the pixels in the non-key frame; judges whether each pixel belongs to the ground area of the image according to the distance from its three-dimensional point to the ground of the current semantic map (i.e., the plane formed by the constraint points, given by the optimal plane parameters); and, for the three-dimensional points judged to belong to the ground area, constructs a second energy function from their distances to the ground of the current semantic map by the least-squares method. The second energy function is as follows:
$$E_2 = \sum_i \left( n_w^{T} X_w^{i} + d_w \right)^2 \qquad (4)$$

where $n_w$ and $d_w$ are the normal vector and origin distance of the semantic-map ground plane, and the $X_w^{i}$ are all the three-dimensional points, in the global coordinate system, for which the distance from the current frame's three-dimensional map point to the semantic-map ground is less than a set threshold, with

$$X_w = T_{wc}\, X_c$$

where $X_c$ is the corresponding three-dimensional point in the local coordinate system and $T_{wc}$ is the transformation matrix.
The $Z$ (depth) component of $X_c$ is obtained directly from the depth map corresponding to the frame. The variable to be solved is the transformation matrix $T_{wc}$, which can be decomposed into a $3 \times 3$ rotation matrix $R$ and a $3 \times 1$ translation vector $t$. Unlike the solution of the first energy function, since noise from the depth map is introduced when the three-dimensional points are recovered from the pixel depth values, the second energy function is solved by gradient descent on the partial derivatives, and its solution is used to correct the pose:

$$\lambda_k \leftarrow \lambda_k - t\, \frac{\partial E_2}{\partial \lambda_k} \qquad (5)$$

where $\lambda_k$ is a parameter to be solved and $t$ here is the iteration step length (not to be confused with the translation vector). For ease of solution, the $R$ and $t$ of the transform to be solved can be expressed as

$$R = \begin{bmatrix} 1 - 2(q_y^2 + q_z^2) & 2(q_x q_y - q_z q_w) & 2(q_x q_z + q_y q_w) \\ 2(q_x q_y + q_z q_w) & 1 - 2(q_x^2 + q_z^2) & 2(q_y q_z - q_x q_w) \\ 2(q_x q_z - q_y q_w) & 2(q_y q_z + q_x q_w) & 1 - 2(q_x^2 + q_y^2) \end{bmatrix} \qquad (6)$$

$$t = [t_x \;\; t_y \;\; t_z]^T \qquad (7)$$

where $q_x, q_y, q_z, q_w$ are the components of a rotation quaternion and $t_x, t_y, t_z$ are the translations along the three coordinate axes.
The partial derivative of each summand of equation (4) with respect to a parameter to be solved can be expressed as

$$\frac{\partial}{\partial \lambda_k} \left( n_w^{T} X_w + d_w \right)^2 = 2 \left( n_w^{T} X_w + d_w \right) n_w^{T}\, \frac{\partial X_w}{\partial \lambda_k}$$
through the formula (6) and the formula (7), the partial derivatives of the three-dimensional points in the world coordinate system to each parameter to be solved can be expressed as functions of the three-dimensional points in the current camera coordinate system and the current pose parameter values, as shown in table 1.
Table 1. Partial derivatives of a three-dimensional point in the world coordinate system with respect to each pose parameter

(The table entries are not reproduced here; for the translations they are simply $\partial X_w / \partial t_x = [1, 0, 0]^T$, $\partial X_w / \partial t_y = [0, 1, 0]^T$, $\partial X_w / \partial t_z = [0, 0, 1]^T$, and for the quaternion components $\partial X_w / \partial q_k = (\partial R / \partial q_k)\, X_c$, with $\partial R / \partial q_k$ obtained by differentiating equation (6).)
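A minimal numerical sketch of this texture-poor branch, assuming numpy. The patent uses the analytic Table 1 derivatives; for brevity the quaternion gradient below is computed by finite differences, and the quaternion is re-normalized each step, an implementation choice not stated in the patent:

```python
import numpy as np

def quat_to_R(q):
    """Rotation matrix of eq. (6) from quaternion (qx, qy, qz, qw)."""
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

def refine_pose_on_plane(X_c, n_w, d_w, q0, t0, step=1e-4, iters=100):
    """Gradient descent on E2 = sum (n_w . (R X_c + t) + d_w)^2 of
    eq. (4), over the 7 parameters (qx, qy, qz, qw, tx, ty, tz)."""
    q, t = q0.astype(float), t0.astype(float)
    for _ in range(iters):
        R = quat_to_R(q)
        r = (X_c @ R.T + t) @ n_w + d_w        # per-point plane residuals
        g_t = 2.0 * r.sum() * n_w              # dE2/dt = sum 2 r_i n_w
        g_q = np.zeros(4)                      # quaternion gradient
        eps = 1e-6
        for k in range(4):
            dq = q.copy(); dq[k] += eps
            r2 = (X_c @ quat_to_R(dq).T + t) @ n_w + d_w
            g_q[k] = ((r2**2).sum() - (r**2).sum()) / eps
        q -= step * g_q
        t -= step * g_t
        q /= np.linalg.norm(q)                 # keep q a unit quaternion
    return q, t
```

Only points already classified as ground, i.e., whose distance to the current plane is below the set threshold, should enter `X_c`.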
For the pose estimation of the depth camera, the estimates obtained under the two conditions of rich texture features and lacking texture features are combined, and the pose is corrected in the texture-lacking case, so that the pose estimation attains higher accuracy than traditional pose estimation. This gain in accuracy is, of course, also owed to the refinement of the constraint points in the semantic map.
Fig. 3 is a flowchart illustrating the semantic-constraint-based visual SLAM method according to an embodiment of the present invention. Since the image sequence is acquired by a depth camera, the input of the method is images carrying both color and depth-map information. Each non-key frame is used to estimate or correct the depth-camera pose. The semantic map is obtained by carrying out semantic segmentation, semantic point-cloud generation, three-dimensional map-point updating, binding optimization, and global semantic updating upon the insertion of each key frame. When the global semantic constraints are updated, the map points of the ground area in the semantic map are updated and the camera pose is corrected at the same time. Images before and after semantic segmentation are shown in fig. 4.
Fig. 4 is a schematic diagram illustrating the effect of semantic segmentation. The left side shows an input image and the right side the corresponding image after semantic segmentation; regions recognized as having different semantics are displayed in different colors with their specific segmentation maps (rendered here in different gray levels).
Finally, it should be noted that the above description of the method is directed to scenes such as indoor floors: in an indoor scene the ground is a plane, so it is reasonable, conversely, to constrain the feature points of the semantic "ground area" to lie on the same plane. It should be emphasized, however, that the method of the present invention is not limited to constraining a ground area and extends naturally to object-level constraints of any kind. For example, if a spherical object is recognized in the scene, then once its region in the image is determined, the feature points in that region should likewise conform to the geometry of a sphere: when converted to three-dimensional points, they should be equidistant from a single spatial point.
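The patent does not spell out the sphere case; as one hedged reading, the plane residual of equation (4) would simply be swapped for a sphere residual, e.g.:

```python
import numpy as np

def sphere_residuals(points, center, radius):
    """Object-level analogue of the ground constraint: for a region
    recognized as a sphere, each 3D point should be equidistant from
    the sphere center, so the residual is |X - c| - r."""
    return np.linalg.norm(points - center, axis=1) - radius
```

Summing the squares of these residuals in place of the plane distances of equation (4) yields the corresponding object-level energy.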
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A visual SLAM method based on semantic constraints is characterized by comprising the following steps:
continuously acquiring a sequence of images of the surrounding environment by a depth camera;
performing semantic segmentation on the key frames in the image sequence, and obtaining semantic constraint parameters according to semantic segmentation results;
processing a semantic segmentation result by a visual SLAM method, and reconstructing a map;
performing semantic constraint on the reconstructed map through the semantic constraint parameters, and fusing semantic segmentation results to obtain a semantic map;
the obtaining of the semantic constraint parameters according to the semantic segmentation result comprises:
obtaining depth information of all feature points in the key frame;
obtaining a plurality of three-dimensional points corresponding to a plurality of feature points according to depth information of the feature points of the ground area in the semantic segmentation result;
obtaining optimal plane parameters by utilizing a random sample consensus algorithm according to the three-dimensional points; wherein,
the optimal plane parameter is used as a semantic constraint parameter, and the semantic constraint parameter is updated after a key frame is detected each time;
the semantically constraining the reconstructed map by the semantically constraining parameters comprises:
drawing connecting straight lines from a plurality of three-dimensional points of the ground area in the reconstructed map to a plurality of feature points of the ground area in the segmentation result, to obtain a plurality of straight-line parameters;
obtaining a plurality of intersection points according to the plurality of straight-line parameters and the optimal plane parameters, the plurality of intersection points serving as the obtained constraint points, whereby semantic constraint is imposed on the reconstructed map; wherein,
and the constraint points and the semantic constraint parameters are updated in a binding mode.
2. The visual SLAM method based on semantic constraints of claim 1 wherein said semantically segmenting key frames comprises:
segmenting the image according to the texture features of specific objects in the key frame, thereby segmenting one frame of image into a plurality of regions, and identifying the real-world semantics of each corresponding region according to the texture features.
3. The visual SLAM method based on semantic constraints of claim 1 wherein the obtaining a semantic map comprises:
fusing the constraint points obtained by semantically constraining the reconstructed map through the semantic constraint parameters, the three-dimensional points corresponding to the feature points of the plurality of regions in the semantic segmentation result, and their real-world semantics, thereby obtaining the semantic map.
4. The visual SLAM method based on semantic constraints of claim 1 further comprising:
estimating the pose of the depth camera by analyzing non-key frames in the image sequence in combination with the semantic map; wherein,
estimating the pose of the depth camera comprises: pose estimation for non-key frames with abundant texture features and pose estimation for non-key frames lacking texture features.
5. The visual SLAM method based on semantic constraints of claim 4, wherein the pose estimation for texture-feature-rich non-key frames comprises:
identifying texture features in non-key frames of the image sequence and determining a ground area;
extracting feature points in the ground area, and obtaining projection points of constraint points in the semantic map in the non-key frame;
constructing a first energy function according to the Euclidean distance from the characteristic point to the projection point by using a least square method;
solving the first energy function to estimate a pose of the depth camera.
6. The visual SLAM method based on semantic constraints of claim 5 wherein said solving a first energy function comprises:
solving the first energy function by using a singular value decomposition method to obtain a transformation matrix; wherein,
the transformation matrix is used for pose estimation of the depth camera.
7. The visual SLAM method based on semantic constraints of claim 4, wherein the pose estimation for non-key frames lacking texture features comprises:
acquiring corresponding three-dimensional points by using the depth information of the pixel points in the non-key frame;
judging whether the pixel point belongs to a ground area in the image or not according to the distance from the three-dimensional point to a plane formed by constraint points in the current semantic map;
for the three-dimensional points which belong to the ground area, a second energy function is constructed according to the distance from the three-dimensional points to a plane formed by constraint points in the current semantic map by using a least square method;
solving the second energy function to estimate the pose of the depth camera.
8. The visual SLAM method based on semantic constraints of claim 7 wherein said solving a second energy function comprises:
decomposing the transformation matrix in the second energy function into a rotation matrix and a translation vector;
computing, by using a gradient descent algorithm, the partial derivatives with respect to the parameters to be solved in the rotation matrix and the translation vector; wherein the partial derivatives with respect to the parameters to be solved are used for estimating the pose of the depth camera.
CN201811648994.2A 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint Active CN109815847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811648994.2A CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811648994.2A CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Publications (2)

Publication Number Publication Date
CN109815847A CN109815847A (en) 2019-05-28
CN109815847B true CN109815847B (en) 2020-12-01

Family

ID=66603833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811648994.2A Active CN109815847B (en) 2018-12-30 2018-12-30 Visual SLAM method based on semantic constraint

Country Status (1)

Country Link
CN (1) CN109815847B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335319B (en) * 2019-06-26 2022-03-18 华中科技大学 Semantic-driven camera positioning and map reconstruction method and system
CN110298921B (en) * 2019-07-05 2023-07-07 青岛中科智保科技有限公司 Method for constructing three-dimensional map with character semantic information and processing equipment
CN110533720B (en) * 2019-08-20 2023-05-02 西安电子科技大学 Semantic SLAM system and method based on joint constraint
CN111210518B (en) * 2020-01-15 2022-04-05 西安交通大学 Topological map generation method based on visual fusion landmark
CN111427373B (en) * 2020-03-24 2023-11-24 上海商汤临港智能科技有限公司 Pose determining method, pose determining device, medium and pose determining equipment
CN111707275B (en) * 2020-05-12 2022-04-29 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
EP4020111B1 (en) * 2020-12-28 2023-11-15 Zenseact AB Vehicle localisation
CN113674416A (en) * 2021-08-26 2021-11-19 中国电子科技集团公司信息科学研究院 Three-dimensional map construction method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732518A (en) * 2015-01-19 2015-06-24 北京工业大学 PTAM improvement method based on ground characteristics of intelligent robot
CN105045263A (en) * 2015-07-06 2015-11-11 杭州南江机器人股份有限公司 Kinect-based robot self-positioning method
CN108229416A (en) * 2018-01-17 2018-06-29 苏州科技大学 Robot SLAM methods based on semantic segmentation technology

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679355B2 (en) * 2017-05-02 2020-06-09 Hrl Laboratories, Llc System and method for detecting moving obstacles based on sensory prediction from ego-motion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732518A (en) * 2015-01-19 2015-06-24 北京工业大学 PTAM improvement method based on ground characteristics of intelligent robot
CN105045263A (en) * 2015-07-06 2015-11-11 杭州南江机器人股份有限公司 Kinect-based robot self-positioning method
CN108229416A (en) * 2018-01-17 2018-06-29 苏州科技大学 Robot SLAM methods based on semantic segmentation technology

Also Published As

Publication number Publication date
CN109815847A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
CN109815847B (en) Visual SLAM method based on semantic constraint
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
WO2021233029A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
Park et al. High-precision depth estimation using uncalibrated LiDAR and stereo fusion
CN106780592A (en) Kinect depth reconstruction algorithms based on camera motion and image light and shade
CN111862201A (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN110223382B (en) Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning
CN111797692B (en) Depth image gesture estimation method based on semi-supervised learning
CN111998862A (en) Dense binocular SLAM method based on BNN
Gao et al. Pose refinement with joint optimization of visual points and lines
CN113160275A (en) Automatic target tracking and track calculating method based on multiple videos
Zhu et al. Fusing panoptic segmentation and geometry information for robust visual slam in dynamic environments
CN113822996A (en) Pose estimation method and device for robot, electronic device and storage medium
Dang et al. Real-time semantic plane reconstruction on a monocular drone using sparse fusion
CN116612235A (en) Multi-view geometric unmanned aerial vehicle image three-dimensional reconstruction method and storage medium
Lai et al. 3D semantic map construction system based on visual SLAM and CNNs
Wang et al. A Visual SLAM Algorithm Based on Image Semantic Segmentation in Dynamic Environment
CN113570713A (en) Semantic map construction method and device for dynamic environment
Xu et al. DOS-SLAM: A real-time dynamic object segmentation visual SLAM system
CN111915632A (en) Poor texture target object truth value database construction method based on machine learning
Su et al. Omnidirectional Depth Estimation With Hierarchical Deep Network for Multi-Fisheye Navigation Systems
CN110930519A (en) Semantic ORB-SLAM sensing method and device based on environment understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant