CN115205560A - Monocular camera-based prior map-assisted indoor positioning method - Google Patents
Monocular camera-based prior map-assisted indoor positioning method
- Publication number
- CN115205560A CN115205560A CN202210846173.XA CN202210846173A CN115205560A CN 115205560 A CN115205560 A CN 115205560A CN 202210846173 A CN202210846173 A CN 202210846173A CN 115205560 A CN115205560 A CN 115205560A
- Authority
- CN
- China
- Prior art keywords
- dimensional
- semantic
- line
- map
- visual field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a prior-map-assisted indoor positioning method based on a monocular camera. The method comprises the following steps: S1, acquiring a prior point cloud map of a non-closed-loop indoor scene; S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds that carry environment semantic information, and forming a simplified semantic line-feature positioning map; S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, and attaching semantic labels to the two-dimensional lines; S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels; S5, constructing an optimization model that matches against the prior map in real time, thereby improving the positioning accuracy of the system. The method solves the problem that, in large-scale indoor scenes, closed-loop correction of accumulated errors is difficult to rely on as a basis for pose optimization, and it effectively improves the accuracy of the large-scale indoor positioning results of a monocular visual positioning system.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a prior-map-assisted indoor positioning method based on a monocular camera.
Background
Among the many means of indoor localization, visual SLAM plays an increasingly important role. As is well known, SLAM technology generally relies on closed loops to correct accumulated errors; however, in large-scale indoor scenes such as airports, shopping malls and museums, a closed loop may not occur for a long time. To achieve continuous and robust visual SLAM positioning over such large indoor areas, another basis for pose optimization must therefore be found.
The indoor environment contains a large amount of environment semantic information, such as doors and windows, together with clear structural line features. Under the same scene, line features are more descriptive than point features. In theory, provided a reasonably accurate three-dimensional environment map exists, continuous high-precision positioning can be achieved in a large indoor scene without closed loops by matching the environment reconstructed by SLAM against existing map model data (such as a BIM model) in real time.
Disclosure of Invention
In order to solve the problem that, in large-scale indoor scenes, closed-loop correction of accumulated errors is difficult to rely on as a basis for pose optimization, the invention provides a prior-map-assisted indoor positioning method based on a monocular camera, which can effectively improve the accuracy of large-scale indoor positioning results.
To achieve this purpose, the solution of the invention is a prior-map-assisted indoor positioning method based on a monocular camera, comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging the two-dimensional line segments;
S4, for each single-frame image, judging the visibility of the three-dimensional line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system;
S6, comparing the performance of the algorithms under non-closed and loop trajectories, taking the absolute position error and the closure error as the evaluation criteria of each algorithm's performance.
In step S1, RGB images and depth maps of the scene are captured by a handheld Kinect v2.0 camera, and dense point cloud information is generated by ORB-SLAM2 and stored as the prior point cloud map.
In step S2, the efficient semantic segmentation model RandLA-Net is used: a simple and fast random sampling method greatly reduces the point density, while a redesigned local feature aggregator retains the salient features.
In step S2, in order to extract three-dimensional straight lines with good robustness, they are extracted with an image-based three-dimensional line segment detection method; combined with the semantic segmentation model RandLA-Net, this effectively generates a prior map composed of line segments carrying environment semantic information.
In step S3, the M-LSD algorithm is used to detect the two-dimensional line features, and a semantic detection thread is added in parallel: a SegNet network performs semantic segmentation, and if an extracted line segment lies inside the color block corresponding to a door, the corresponding semantic category label is assigned to it. The line segments whose label category is "door" are then screened: the straight lines close to a semantic boundary curve ξ are retained, and two-dimensional line segments that may belong to the same frame line are merged.
In step S4, the visibility of a three-dimensional door frame line is judged with different strategies in three cases, depending on the visibility of its end points.
In step S4, after screening, the two end points of each door frame line L are projected onto the image plane; all three-dimensional door frame lines within the field of view are traversed, and the included angle, the length difference and the distance between each of them and the two-dimensional door frame line are computed. The two-dimensional-three-dimensional matching pairs for which all three parameters are smaller than their thresholds are retained, completing the matching against the prior map.
In step S5, the motion between two consecutive frames is iteratively estimated with the Ceres Solver, and the matching optimization model is constructed, improving the positioning accuracy of the system.
The invention has the beneficial effects that:
Firstly, semantic segmentation is performed on the prior map in advance, the line features carrying semantic information are extracted from the map, and a simplified semantic line-feature positioning map is formed. During actual positioning, the line features extracted on-line by the monocular camera are then processed: the door frame lines are screened out and matched against the pre-stored map, the cost function of the system optimization is constructed, and the camera pose is effectively optimized in the absence of closed loops. Finally, a measured data set is used to verify the performance of the algorithm. Experiments show that the method effectively improves the accuracy of large-scale indoor positioning results.
Drawings
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention.
Detailed Description
The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments, which should be understood as merely illustrative and not limiting the scope of the invention. It should be noted that, as used in the following description, the terms "front", "back", "left", "right", "upper" and "lower" refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from the geometric center of a particular component, respectively.
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention;
as shown in fig. 1, the prior map-assisted indoor positioning method based on a monocular camera provided by the present invention specifically includes the following steps:
s1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
Specifically, in this example the prior map data are acquired as follows: RGB images and depth maps of the scene are captured with a handheld Kinect v2.0 camera while walking through the corridor environment of the experiment, a dense point cloud is computed by ORB-SLAM2, and after the ORB-SLAM2 run the dense point cloud information is saved in .ply format to serve as the prior map for the subsequent experiments.
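The dense cloud produced by the ORB-SLAM2 run is persisted to disk in .ply format. A minimal sketch of such an export, under the assumption that points are plain (x, y, z, r, g, b) tuples and the ASCII variant of the PLY format is written by hand:

```python
# Illustrative sketch: persist a dense point cloud to an ASCII .ply file,
# as the prior map is stored after the ORB-SLAM2 run. The (x, y, z, r, g, b)
# point layout is an assumption for this example.

def save_ply(path, points):
    """points: iterable of (x, y, z, r, g, b) with r, g, b in 0..255."""
    points = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x", "property float y", "property float z",
        "property uchar red", "property uchar green", "property uchar blue",
        "end_header",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z, r, g, b in points:
            f.write(f"{x} {y} {z} {r} {g} {b}\n")
```

Any PLY-aware viewer (e.g. for inspecting the prior map) can then load the resulting file.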
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
specifically, a high-efficiency semantic segmentation model RandLA-Net is utilized, a simple and quick random sampling method is used for greatly reducing the point density, and a redesigned Localfeature aggregator is applied to retain remarkable characteristics.
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging them.
Specifically, the M-LSD algorithm performs two-dimensional line feature detection on the RGB image while a SegNet network performs semantic segmentation on it, the two threads processing the image simultaneously. The extracted two-dimensional line features are then given the corresponding semantic category labels, all line segments labeled "door" are screened, and the segments whose length is below a threshold l_d, as well as those far from the boundary of the semantic category, are removed. The distances dist(p, ξ) from the two end points and the middle point of each line segment to the semantic boundary ξ are computed; the maximum is discarded and the two smaller values are summed, the result being denoted σ_i. When σ_i is smaller than the threshold Σ_d, the line segment is considered to belong to the sought set of two-dimensional door frame lines V_d.
Finally, similar line segments in V_d are merged. If the overlap o of two line segments along the X-axis or Y-axis direction is larger than a threshold O_d, the angles α and β that the two segments form with the coordinate axis are compared; if |α − β| is smaller than a threshold γ_d, the two segments are considered mergeable, and their two farthest end points are taken to represent a new line segment V.
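A sketch of this merge rule, with illustrative values for the thresholds O_d and γ_d:

```python
import math

# Sketch of the merge rule: if two segments overlap along the X or Y axis
# by more than O_d and their axis angles differ by less than gamma_d,
# fuse them into one segment spanning the two farthest end points.

def axis_overlap(s1, s2, axis):
    a = sorted(p[axis] for p in s1)
    b = sorted(p[axis] for p in s2)
    return min(a[1], b[1]) - max(a[0], b[0])

def angle(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi   # direction mod 180 deg

def try_merge(s1, s2, o_d=2.0, gamma_d=math.radians(5)):
    if max(axis_overlap(s1, s2, 0), axis_overlap(s1, s2, 1)) <= o_d:
        return None
    if abs(angle(s1) - angle(s2)) >= gamma_d:
        return None
    pts = list(s1) + list(s2)
    # the new segment V: the pair of end points that are farthest apart
    return max(((p, q) for p in pts for q in pts), key=lambda pq: math.dist(*pq))
```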
S4, for each single-frame image, judging the visibility of the required line segments in the three-dimensional map, and matching the two-dimensional line features with the retained three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
specifically, the three-dimensional door frame line is screened according to the visibility of the end point, and the line segment which is not in the visual field range in the frame is removed. The end points of the door frame lines are extracted, and then the end points are processed by adopting different strategies in three cases:
(1) If both end points are within the field of view, the door frame line is considered to be within the field of view;
(2) If one end point is within the field of view and the other is outside it, the point X_i remaining in the field of view is kept and the midpoint X_m of the two end points is taken. If the new segment X_i X_m lies within the field of view, it is retained; if not, the process is repeated on the newly generated segment until the length of the segment is smaller than the set threshold.
(3) If neither end point is within the field of view, the door frame line is not within the field of view.
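The three cases above can be sketched as follows; leaving the `in_view` predicate and the minimum length abstract is an assumption of this example:

```python
import math

# Sketch of the three-case visibility test. `in_view` is any predicate
# deciding whether a point lies inside the current field of view.

def visible_part(p_in, p_out, in_view, min_len=0.05):
    """Case (2): p_in is visible, p_out is not. Bisect toward the field-of-view
    boundary; return the retained segment, or None once it is too short."""
    while math.dist(p_in, p_out) >= min_len:
        mid = tuple((a + b) / 2.0 for a, b in zip(p_in, p_out))
        if in_view(mid):
            return (p_in, mid)       # midpoint visible: keep this part
        p_out = mid                  # otherwise repeat on the new segment
    return None

def clip_segment(p1, p2, in_view, min_len=0.05):
    if in_view(p1) and in_view(p2):
        return (p1, p2)              # case (1): wholly visible
    if not in_view(p1) and not in_view(p2):
        return None                  # case (3): wholly invisible
    p_in, p_out = (p1, p2) if in_view(p1) else (p2, p1)
    return visible_part(p_in, p_out, in_view, min_len)
```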
The two end points of each retained three-dimensional door frame line L are then projected into the image plane; all three-dimensional door frame lines within the field of view are traversed, and the included angle θ, the length difference Δl and the distance d between each of them and the two-dimensional door frame line are computed. The two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0 and d < d_0 are found, completing the matching against the prior map.
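A sketch of this threshold test, assuming the 3-D end points have already been projected into the image plane and taking the midpoint distance as d (the patent does not specify which distance measure is used); the threshold values are illustrative:

```python
import math

# Sketch of the 2-D/3-D matching criterion: compare each projected 3-D
# door-frame line with each detected 2-D line via the angle difference
# theta, the length difference delta_l, and the midpoint distance d.

def seg_angle(seg):
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def seg_len(seg):
    return math.dist(*seg)

def seg_mid(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def match_lines(projected_3d, lines_2d, theta0=0.1, dl0=10.0, d0=15.0):
    """Return (index_3d, index_2d) pairs passing all three thresholds."""
    pairs = []
    for i, s3 in enumerate(projected_3d):
        for j, s2 in enumerate(lines_2d):
            theta = abs(seg_angle(s3) - seg_angle(s2))
            theta = min(theta, math.pi - theta)       # wrap around 180 deg
            dl = abs(seg_len(s3) - seg_len(s2))
            d = math.dist(seg_mid(s3), seg_mid(s2))
            if theta < theta0 and dl < dl0 and d < d0:
                pairs.append((i, j))
    return pairs
```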
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system.
Specifically, the error terms of the two-dimensional-three-dimensional line segment matches are defined, all error terms are summed in least-squares form under the constraint of the prior point cloud map, and the resulting cost function is solved.
For the constraint of a door frame line in the prior map, the two end points p_s and p_e of the three-dimensional semantic line feature projected onto the two-dimensional plane are considered, and the sum of their distances to the straight line Ax + By + C = 0 supporting the matched 2-D line segment is defined as a new error term. Then, for a single-frame image, the residual of the i-th spatial line L_i observed in the k-th camera frame c_k is:
r_L(c_k, L_i) = d(p_s) + d(p_e), with d(p) = |Ax_p + By_p + C| / √(A² + B²).
The final cost function can be expressed as:
min_χ { ‖r_p − H_p χ‖² + Σ_k ‖r_B(ẑ_{b_k b_{k+1}}, χ)‖²_{P_{b_k b_{k+1}}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}} + Σ_{(i,k)} ‖r_L(c_k, L_i)‖²_{P_L} }
The first term is the marginalization (Marg) residual, the second term is the IMU residual between adjacent frames in the sliding window, and the third term is the visual reprojection residual of the feature points in the sliding window under the camera. χ denotes all state quantities in the sliding window, P_{b_k b_{k+1}} is the covariance matrix of the IMU pre-integration noise term, P_l^{c_j} is the covariance matrix of the observation noise of the visual feature points, and P_L is the covariance matrix of the observation noise of the two-dimensional-three-dimensional semantic line features.
The technical means disclosed in the solution of the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.
Claims (7)
1. A prior-map-assisted indoor positioning method based on a monocular camera, characterized by comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds with environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line-feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging them;
S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features with the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model that matches the real-time map against the prior map, thereby improving the positioning accuracy of the system.
2. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S2 the efficient semantic segmentation model RandLA-Net is used: a simple and fast random sampling method greatly reduces the point density, while a redesigned local feature aggregator retains the salient features.
3. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S2, in order to extract three-dimensional straight lines with good robustness, the line features are extracted with an image-based three-dimensional line segment detection method.
4. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S3 the two-dimensional line feature detection is performed with the M-LSD algorithm while a semantic detection thread is added, semantic labels are attached to the two-dimensional line segments in the image, and the two threads process the RGB image simultaneously.
5. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S4 the three-dimensional door frame lines are screened with different strategies in three cases depending on the visibility of their end points; specifically, the line segments that are not within the field of view of the frame are removed, the end points of the door frame lines are extracted, and the end points are then handled with different strategies in three cases:
(1) if both end points are within the field of view, the door frame line is considered to be within the field of view;
(2) if one end point is within the field of view and the other is outside it, the point X_i remaining in the field of view is kept and the midpoint X_m of the two end points is taken; if the new segment X_i X_m lies within the field of view, it is retained; if not, the process is repeated on the newly generated segment until the length of the segment is smaller than the set threshold;
(3) if neither end point is within the field of view, the door frame line is not within the field of view.
6. The monocular camera-based prior-map-assisted indoor positioning method according to claim 5, wherein in step S4, after screening, the two end points of each door frame line L are projected into the image plane; all three-dimensional door frame lines within the field of view are traversed, the included angle θ, the length difference Δl and the distance d with respect to the two-dimensional door frame line are computed, and the two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0 and d < d_0 are found, completing the matching against the prior map.
7. The monocular camera-based prior-map-assisted indoor positioning method according to claim 1, wherein in step S5 the motion between two consecutive frames is iteratively estimated with the Ceres Solver and the matching optimization model is constructed, improving the positioning accuracy of the system, specifically as follows:
the error model is minimized, i.e. an optimal solution is sought for its objective function, which is expressed as:
min_χ { ‖r_p − H_p χ‖² + Σ_k ‖r_B(ẑ_{b_k b_{k+1}}, χ)‖²_{P_{b_k b_{k+1}}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}} + Σ_{(i,k)} ‖r_L(c_k, L_i)‖²_{P_L} }
where the first term is the marginalization (Marg) residual, the second term is the IMU residual between adjacent frames in the sliding window, and the third term is the visual reprojection residual of the feature points in the sliding window under the camera; χ denotes all state quantities in the sliding window, P_{b_k b_{k+1}} is the covariance matrix of the IMU pre-integration noise term, P_l^{c_j} is the covariance matrix of the observation noise of the visual feature points, and P_L is the covariance matrix of the observation noise of the two-dimensional-three-dimensional semantic line features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210846173.XA CN115205560A (en) | 2022-07-19 | 2022-07-19 | Monocular camera-based prior map-assisted indoor positioning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205560A true CN115205560A (en) | 2022-10-18 |
Family
ID=83581244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210846173.XA Pending CN115205560A (en) | 2022-07-19 | 2022-07-19 | Monocular camera-based prior map-assisted indoor positioning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205560A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116246038A (en) * | 2023-05-11 | 2023-06-09 | 西南交通大学 | Multi-view three-dimensional line segment reconstruction method, system, electronic equipment and medium |
CN118089753A (en) * | 2024-04-26 | 2024-05-28 | 江苏集萃清联智控科技有限公司 | Monocular semantic SLAM positioning method and system based on three-dimensional target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||