CN115205560A - Monocular camera-based prior map-assisted indoor positioning method - Google Patents
Monocular camera-based prior map-assisted indoor positioning method
- Publication number
- CN115205560A CN115205560A CN202210846173.XA CN202210846173A CN115205560A CN 115205560 A CN115205560 A CN 115205560A CN 202210846173 A CN202210846173 A CN 202210846173A CN 115205560 A CN115205560 A CN 115205560A
- Authority
- CN
- China
- Prior art keywords
- dimensional
- semantic
- line
- map
- visual field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a prior-map-assisted indoor positioning method based on a monocular camera. The method comprises the following steps: S1, acquiring a prior point cloud map of a non-closed-loop indoor scene; S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds that carry environment semantic information, and forming a simplified semantic line-feature positioning map; S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, and attaching semantic labels to the two-dimensional lines; S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels; S5, constructing an optimization model that matches against the prior map in real time, thereby improving the positioning accuracy of the system. The method solves the problem that, in large-scale indoor scenes, closed-loop correction of accumulated errors is difficult to rely on as a basis for pose optimization, and it effectively improves the accuracy of the large-scale indoor positioning results of a monocular visual positioning system.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a prior-map-assisted indoor positioning method based on a monocular camera.
Background
Among the many means of indoor localization, visual SLAM plays an increasingly important role. As is well known, SLAM technology generally relies on closed loops to correct accumulated errors; however, in large-scale indoor scenes such as airports, shopping malls and museums, a closed loop may not occur for a long time. To achieve continuous and robust visual SLAM positioning over such large indoor areas, another basis for pose optimization must therefore be found.
The indoor environment contains a large amount of environment semantic information, such as doors and windows, together with clear structural line features. Under the same scene, line features are more descriptive than point features. In theory, provided a reasonably accurate three-dimensional environment map exists, continuous high-precision positioning can be achieved in a large indoor scene without closed loops by matching the environment reconstructed by SLAM against existing map model data (such as a BIM model) in real time.
Disclosure of Invention
In order to solve the problem that, in large-scale indoor scenes, closed-loop correction of accumulated errors is difficult to rely on as a basis for pose optimization, the invention provides a prior-map-assisted indoor positioning method based on a monocular camera, which can effectively improve the accuracy of large-scale indoor positioning results.
To achieve this purpose, the solution of the invention is a prior-map-assisted indoor positioning method based on a monocular camera, comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging the two-dimensional line segments;
S4, for each single-frame image, judging the visibility of the three-dimensional line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system;
S6, comparing the performance of the algorithms under non-closed and loop trajectories, taking the absolute position error and the closure error as the evaluation criteria of each algorithm's performance.
In step S1, RGB images and depth maps of the scene are captured by a handheld Kinect v2.0 camera, and dense point cloud information is generated by ORB-SLAM2 and stored as the prior point cloud map.
In step S2, the efficient semantic segmentation model RandLA-Net is used: a simple and fast random sampling method greatly reduces the point density, while a redesigned local feature aggregator retains the salient features.
In step S2, in order to extract three-dimensional straight lines with good robustness, they are extracted with an image-based three-dimensional line segment detection method; combined with the semantic segmentation model RandLA-Net, this effectively generates a prior map composed of line segments carrying environment semantic information.
In step S3, the M-LSD algorithm is used to detect the two-dimensional line features, and a semantic detection thread is added in parallel: a SegNet network performs semantic segmentation, and if an extracted line segment lies inside the color block corresponding to a door, the corresponding semantic category label is assigned to it. The line segments whose label category is "door" are then screened: the straight lines close to a semantic boundary curve ξ are retained, and two-dimensional line segments that may belong to the same frame line are merged.
In step S4, the visibility of a three-dimensional door frame line is judged with different strategies in three cases, depending on the visibility of its end points.
In step S4, after screening, the two end points of each door frame line L are projected onto the image plane; all three-dimensional door frame lines within the field of view are traversed, and the included angle, the length difference and the distance between each of them and the two-dimensional door frame line are computed. The two-dimensional-three-dimensional matching pairs for which all three parameters are smaller than their thresholds are retained, completing the matching against the prior map.
In step S5, the motion between two consecutive frames is iteratively estimated with the Ceres Solver, and the matching optimization model is constructed, improving the positioning accuracy of the system.
The invention has the beneficial effects that:
Firstly, semantic segmentation is performed on the prior map in advance, the line features carrying semantic information are extracted from the map, and a simplified semantic line-feature positioning map is formed. During actual positioning, the line features extracted on-line by the monocular camera are then processed: the door frame lines are screened out and matched against the pre-stored map, the cost function of the system optimization is constructed, and the camera pose is effectively optimized in the absence of closed loops. Finally, a measured data set is used to verify the performance of the algorithm. Experiments show that the method effectively improves the accuracy of large-scale indoor positioning results.
Drawings
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention.
Detailed Description
The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments, which should be understood as merely illustrative and not limiting the scope of the invention. It should be noted that, as used in the following description, the terms "front", "back", "left", "right", "upper" and "lower" refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from the geometric center of a particular component, respectively.
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention;
as shown in fig. 1, the prior map-assisted indoor positioning method based on a monocular camera provided by the present invention specifically includes the following steps:
s1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
Specifically, in this example the prior map data are acquired as follows: RGB images and depth maps of the scene are captured with a handheld Kinect v2.0 camera while walking through the corridor environment of the experiment, a dense point cloud is computed by ORB-SLAM2, and after the ORB-SLAM2 run the dense point cloud information is saved in .ply format to serve as the prior map for the subsequent experiments.
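The dense cloud produced by the ORB-SLAM2 run is persisted to disk in .ply format. A minimal sketch of such an export, under the assumption that points are plain (x, y, z, r, g, b) tuples and the ASCII variant of the PLY format is written by hand:

```python
# Illustrative sketch: persist a dense point cloud to an ASCII .ply file,
# as the prior map is stored after the ORB-SLAM2 run. The (x, y, z, r, g, b)
# point layout is an assumption for this example.

def save_ply(path, points):
    """points: iterable of (x, y, z, r, g, b) with r, g, b in 0..255."""
    points = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x", "property float y", "property float z",
        "property uchar red", "property uchar green", "property uchar blue",
        "end_header",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z, r, g, b in points:
            f.write(f"{x} {y} {z} {r} {g} {b}\n")
```

Any PLY-aware viewer (e.g. for inspecting the prior map) can then load the resulting file.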
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
specifically, a high-efficiency semantic segmentation model RandLA-Net is utilized, a simple and quick random sampling method is used for greatly reducing the point density, and a redesigned Localfeature aggregator is applied to retain remarkable characteristics.
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging them.
Specifically, the M-LSD algorithm performs two-dimensional line feature detection on the RGB image while a SegNet network performs semantic segmentation on it, the two threads processing the image simultaneously. The extracted two-dimensional line features are then given the corresponding semantic category labels, all line segments labeled "door" are screened, and the segments whose length is below a threshold l_d, as well as those far from the boundary of the semantic category, are removed. The distances dist(p, ξ) from the two end points and the middle point of each line segment to the semantic boundary ξ are computed; the maximum is discarded and the two smaller values are summed, the result being denoted σ_i. When σ_i is smaller than the threshold Σ_d, the line segment is considered to belong to the sought set of two-dimensional door frame lines V_d.
Finally, similar line segments in V_d are merged. If the overlap o of two line segments along the X-axis or Y-axis direction is larger than a threshold O_d, the angles α and β that the two segments form with the coordinate axis are compared; if |α − β| is smaller than a threshold γ_d, the two segments are considered mergeable, and their two farthest end points are taken to represent a new line segment V.
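A sketch of this merge rule, with illustrative values for the thresholds O_d and γ_d:

```python
import math

# Sketch of the merge rule: if two segments overlap along the X or Y axis
# by more than O_d and their axis angles differ by less than gamma_d,
# fuse them into one segment spanning the two farthest end points.

def axis_overlap(s1, s2, axis):
    a = sorted(p[axis] for p in s1)
    b = sorted(p[axis] for p in s2)
    return min(a[1], b[1]) - max(a[0], b[0])

def angle(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi   # direction mod 180 deg

def try_merge(s1, s2, o_d=2.0, gamma_d=math.radians(5)):
    if max(axis_overlap(s1, s2, 0), axis_overlap(s1, s2, 1)) <= o_d:
        return None
    if abs(angle(s1) - angle(s2)) >= gamma_d:
        return None
    pts = list(s1) + list(s2)
    # the new segment V: the pair of end points that are farthest apart
    return max(((p, q) for p in pts for q in pts), key=lambda pq: math.dist(*pq))
```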
S4, for each single-frame image, judging the visibility of the required line segments in the three-dimensional map, and matching the two-dimensional line features with the retained three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
specifically, the three-dimensional door frame line is screened according to the visibility of the end point, and the line segment which is not in the visual field range in the frame is removed. The end points of the door frame lines are extracted, and then the end points are processed by adopting different strategies in three cases:
(1) If both end points are within the field of view, the door frame line is considered to be within the field of view;
(2) If one end point is within the field of view and the other is outside it, the point X_i remaining in the field of view is kept and the midpoint X_m of the two end points is taken. If the new segment X_i X_m lies within the field of view, it is retained; if not, the process is repeated on the newly generated segment until the length of the segment is smaller than the set threshold.
(3) If neither end point is within the field of view, the door frame line is not within the field of view.
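The three cases above can be sketched as follows; leaving the `in_view` predicate and the minimum length abstract is an assumption of this example:

```python
import math

# Sketch of the three-case visibility test. `in_view` is any predicate
# deciding whether a point lies inside the current field of view.

def visible_part(p_in, p_out, in_view, min_len=0.05):
    """Case (2): p_in is visible, p_out is not. Bisect toward the field-of-view
    boundary; return the retained segment, or None once it is too short."""
    while math.dist(p_in, p_out) >= min_len:
        mid = tuple((a + b) / 2.0 for a, b in zip(p_in, p_out))
        if in_view(mid):
            return (p_in, mid)       # midpoint visible: keep this part
        p_out = mid                  # otherwise repeat on the new segment
    return None

def clip_segment(p1, p2, in_view, min_len=0.05):
    if in_view(p1) and in_view(p2):
        return (p1, p2)              # case (1): wholly visible
    if not in_view(p1) and not in_view(p2):
        return None                  # case (3): wholly invisible
    p_in, p_out = (p1, p2) if in_view(p1) else (p2, p1)
    return visible_part(p_in, p_out, in_view, min_len)
```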
The two end points of each retained three-dimensional door frame line L are then projected into the image plane; all three-dimensional door frame lines within the field of view are traversed, and the included angle θ, the length difference Δl and the distance d between each of them and the two-dimensional door frame line are computed. The two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0 and d < d_0 are found, completing the matching against the prior map.
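A sketch of this threshold test, assuming the 3-D end points have already been projected into the image plane and taking the midpoint distance as d (the patent does not specify which distance measure is used); the threshold values are illustrative:

```python
import math

# Sketch of the 2-D/3-D matching criterion: compare each projected 3-D
# door-frame line with each detected 2-D line via the angle difference
# theta, the length difference delta_l, and the midpoint distance d.

def seg_angle(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def seg_len(seg):
    return math.dist(*seg)

def seg_mid(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def match_lines(projected_3d, lines_2d, theta0=0.1, dl0=10.0, d0=15.0):
    """Return (index_3d, index_2d) pairs passing all three thresholds."""
    pairs = []
    for i, s3 in enumerate(projected_3d):
        for j, s2 in enumerate(lines_2d):
            theta = abs(seg_angle(s3) - seg_angle(s2))
            theta = min(theta, math.pi - theta)       # wrap around 180 deg
            dl = abs(seg_len(s3) - seg_len(s2))
            d = math.dist(seg_mid(s3), seg_mid(s2))
            if theta < theta0 and dl < dl0 and d < d0:
                pairs.append((i, j))
    return pairs
```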
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system.
Specifically, the error terms of the two-dimensional-three-dimensional line segment matches are defined, all error terms are summed in least-squares form under the constraint of the prior point cloud map, and the resulting cost function is solved.
For the constraint of a door frame line in the prior map, the two end points p_s and p_e of the three-dimensional semantic line feature projected onto the two-dimensional plane are considered, and the sum of their distances to the straight line Ax + By + C = 0 supporting the matched 2-D line segment is defined as a new error term. Then, for a single-frame image, the residual of the i-th spatial line L_i observed in the k-th camera frame c_k is:
r_L(c_k, L_i) = d(p_s) + d(p_e), with d(p) = |Ax_p + By_p + C| / √(A² + B²).
The final cost function can be expressed as:
min_χ { ‖r_p − H_p χ‖² + Σ_k ‖r_B(ẑ_{b_k b_{k+1}}, χ)‖²_{P_{b_k b_{k+1}}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}} + Σ_{(i,k)} ‖r_L(c_k, L_i)‖²_{P_L} }
The first term is the marginalization (Marg) residual, the second term is the IMU residual between adjacent frames in the sliding window, and the third term is the visual reprojection residual of the feature points in the sliding window under the camera. χ denotes all state quantities in the sliding window, P_{b_k b_{k+1}} is the covariance matrix of the IMU pre-integration noise term, P_l^{c_j} is the covariance matrix of the observation noise of the visual feature points, and P_L is the covariance matrix of the observation noise of the two-dimensional-three-dimensional semantic line features.
The technical means disclosed in the solution of the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.
Claims (7)
1. A prior-map-assisted indoor positioning method based on a monocular camera, characterized by comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging them;
S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system.
2. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S2 the efficient semantic segmentation model RandLA-Net is used: a simple and fast random sampling method greatly reduces the point density, while a redesigned local feature aggregator retains the salient features.
3. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S2, in order to extract three-dimensional straight lines with good robustness, the line features are extracted with an image-based three-dimensional line segment detection method.
4. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S3 the two-dimensional line feature detection is performed with the M-LSD algorithm while a semantic detection thread is added, semantic labels are attached to the two-dimensional line segments in the image, and the two threads process the RGB image simultaneously.
5. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S4 the three-dimensional door frame lines are screened with different strategies in three cases depending on the visibility of their end points; specifically, the line segments that are not within the field of view of the frame are removed, the end points of the door frame lines are extracted, and the end points are then handled with different strategies in three cases:
(1) if both end points are within the field of view, the door frame line is considered to be within the field of view;
(2) if one end point is within the field of view and the other is outside it, the point X_i remaining in the field of view is kept and the midpoint X_m of the two end points is taken; if the new segment X_i X_m lies within the field of view, it is retained; if not, the process is repeated on the newly generated segment until the length of the segment is smaller than the set threshold;
(3) if neither end point is within the field of view, the door frame line is not within the field of view.
6. The monocular camera-based prior-map-assisted indoor positioning method according to claim 5, wherein in step S4, after screening, the two end points of each door frame line L are projected into the image plane; all three-dimensional door frame lines within the field of view are traversed, the included angle θ, the length difference Δl and the distance d with respect to the two-dimensional door frame line are computed, and the two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0 and d < d_0 are found, completing the matching against the prior map.
7. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S5 the motion between two consecutive frames is iteratively estimated with the Ceres Solver and the matching optimization model is constructed, improving the positioning accuracy of the system, specifically as follows:
the error model is minimized, i.e. an optimal solution is sought for its objective function, which is expressed as:
min_χ { ‖r_p − H_p χ‖² + Σ_k ‖r_B(ẑ_{b_k b_{k+1}}, χ)‖²_{P_{b_k b_{k+1}}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}} + Σ_{(i,k)} ‖r_L(c_k, L_i)‖²_{P_L} }
where the first term is the marginalization (Marg) residual, the second term is the IMU residual between adjacent frames in the sliding window, and the third term is the visual reprojection residual of the feature points in the sliding window under the camera; χ denotes all state quantities in the sliding window, P_{b_k b_{k+1}} is the covariance matrix of the IMU pre-integration noise term, P_l^{c_j} is the covariance matrix of the observation noise of the visual feature points, and P_L is the covariance matrix of the observation noise of the two-dimensional-three-dimensional semantic line features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210846173.XA CN115205560A (en) | 2022-07-19 | 2022-07-19 | Monocular camera-based prior map-assisted indoor positioning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205560A true CN115205560A (en) | 2022-10-18 |
Family
ID=83581244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210846173.XA Pending CN115205560A (en) | 2022-07-19 | 2022-07-19 | Monocular camera-based prior map-assisted indoor positioning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205560A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116246038A (en) * | 2023-05-11 | 2023-06-09 | 西南交通大学 | Multi-view three-dimensional line segment reconstruction method, system, electronic equipment and medium |
CN118089753A (en) * | 2024-04-26 | 2024-05-28 | 江苏集萃清联智控科技有限公司 | Monocular semantic SLAM positioning method and system based on three-dimensional target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||