CN115205560A - Monocular camera-based prior map-assisted indoor positioning method - Google Patents

Monocular camera-based prior map-assisted indoor positioning method

Info

Publication number
CN115205560A
Authority
CN
China
Prior art keywords
dimensional
semantic
line
map
visual field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210846173.XA
Other languages
Chinese (zh)
Inventor
张小国
鲁添祎
蒋琬琪
王慧青
邓奎刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210846173.XA
Publication of CN115205560A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a prior map-assisted indoor positioning method based on a monocular camera. The method comprises the following steps: S1, acquiring a prior point cloud map of a non-closed-loop indoor scene; S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds carrying environment semantic information to form a simplified semantic line feature positioning map; S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, and attaching semantic labels to the two-dimensional lines; S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features against the three-dimensional line segments that lie within the field of view and carry consistent semantic labels; S5, constructing an optimization model for real-time matching against the prior map, improving the positioning accuracy of the system. The method addresses the difficulty that, in large indoor scenes, loop-closure correction of accumulated error cannot be relied on as a basis for pose optimization, and it effectively improves the accuracy of large-scale indoor positioning results from a monocular visual positioning system.

Description

Monocular camera-based prior map-assisted indoor positioning method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a prior map-assisted indoor positioning method based on a monocular camera.
Background
Among the many means of indoor localization, visual SLAM plays an increasingly important role. SLAM generally relies on loop closure to correct accumulated error; however, in large indoor scenes such as airports, shopping malls, and museums, a loop closure may not occur for a long time. Achieving continuous and robust visual SLAM positioning across such large indoor spaces therefore requires another basis for pose optimization.
The indoor environment contains abundant environment semantic information, such as doors and windows, as well as clear structural line features. In the same scene, line features are more descriptive than point features. In theory, given a reasonably accurate three-dimensional environment map, continuous and high-precision positioning can be achieved even when a large indoor scene offers no loop closure, by matching the environment reconstructed by SLAM against existing map model data (such as a BIM model) in real time.
Disclosure of Invention
To solve the problem that, in large indoor scenes, loop-closure correction of accumulated error is difficult to rely on as a basis for pose optimization, the invention provides a prior map-assisted indoor positioning method based on a monocular camera that effectively improves the accuracy of large-scale indoor positioning results.
To achieve this purpose, the solution of the invention is a prior map-assisted indoor positioning method based on a monocular camera, comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds carrying environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging the two-dimensional line segments;
S4, for each single-frame image, judging the visibility of the three-dimensional line segments in the semantic positioning map, and matching the two-dimensional line features against the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model for matching the real-time map against the prior map, improving the positioning accuracy of the system;
S6, comparing the performance of each algorithm under both non-closed and ring-shaped (closed-loop) trajectories, taking the absolute position error and the closure error as the evaluation criteria for each algorithm's performance.
In step S1, RGB images and depth maps of the scene are captured by a handheld Kinect-v2.0 camera, and dense point cloud information is generated by ORB-SLAM2 and stored as the prior point cloud map.
In step S2, the efficient semantic segmentation model RandLA-Net is used: a simple and fast random sampling method greatly reduces the point density, while a redesigned local feature aggregator preserves the salient features.
In step S2, to extract three-dimensional straight lines with good robustness, an image-based three-dimensional line segment detection method is used; combined with the RandLA-Net semantic segmentation model, this effectively generates a prior map composed of line segments carrying environment semantic information.
In step S3, the M-LSD algorithm is used to detect two-dimensional line features, and a semantic detection thread is added in parallel: a SegNet network performs the semantic segmentation, and if an extracted line segment falls within the color block corresponding to a door, the corresponding semantic category label is assigned. The line segments whose label category is 'door' are screened, the group of straight lines close to the semantic boundary curve ξ is retained, and two-dimensional line segments that may belong to the same frame line are merged.
In step S4, depending on the visibility of the door frame line endpoints, the visibility of each three-dimensional door frame line is judged using different strategies in three cases.
In step S4, after screening, the two endpoints of each door frame line L are projected into the image plane, the three-dimensional door frame lines within the field of view are traversed, and the included angle, the length difference, and the distance between each of them and the two-dimensional door frame line are calculated; the two-dimensional-three-dimensional matching pairs for which all three parameters are below their thresholds are retained, completing the matching against the prior map.
In step S5, the motion between two consecutive frames is iteratively estimated with the Ceres Solver, the matching optimization model is constructed, and the positioning accuracy of the system is improved.
The invention has the beneficial effects that:
First, semantic segmentation is performed on the prior map in advance, and the line features carrying semantic information in the map are extracted to form a simplified semantic line feature positioning map. Then, during actual positioning, the line features extracted online by the monocular camera are processed: the door frame lines are screened out and matched against the pre-stored map, a cost function for system optimization is constructed, and the camera pose is effectively optimized in the absence of loop closure. Finally, a measured data set is used to verify the performance of the algorithm. Experiments show that the method effectively improves the accuracy of large-scale indoor positioning results.
Drawings
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and detailed description, which will be understood as being illustrative only and not limiting in scope. It should be noted that as used in the following description, the terms "front," "back," "left," "right," "upper" and "lower" refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
Fig. 1 is a flowchart of a prior map-assisted indoor positioning method based on a monocular camera in an embodiment of the present invention;
as shown in fig. 1, the prior map-assisted indoor positioning method based on a monocular camera provided by the present invention specifically includes the following steps:
s1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
specifically, the prior map data of the example acquires an RGB image and a depth map of a scene captured by a Kinect-v2.0 camera, the hand-held Kinect-v2.0 camera walks in a corridor environment where an experiment is located to obtain scene information, dense point cloud is obtained by calculation by ORB-SLAM2, and the dense point cloud information is stored in a ply format after running an ORB-SLAM2 algorithm to serve as a prior map of a subsequent experiment.
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds carrying environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line feature positioning map;
specifically, a high-efficiency semantic segmentation model RandLA-Net is utilized, a simple and quick random sampling method is used for greatly reducing the point density, and a redesigned Localfeature aggregator is applied to retain remarkable characteristics.
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging the two-dimensional line segments.
Specifically, the M-LSD algorithm performs two-dimensional line feature detection on the RGB image while a SegNet network performs semantic segmentation on it; the two threads process the RGB image simultaneously. Corresponding semantic category labels are then assigned to the extracted two-dimensional line features, all line segments labeled 'door' are screened, and segments shorter than a threshold l_d, as well as segments far from the boundary of their semantic category, are removed. For each remaining segment, the distances dist(p, ξ) from its two endpoints and its midpoint to the semantic boundary ξ are computed; the maximum is discarded and the two smaller values are summed, denoted σ_i. When σ_i is smaller than the threshold Σ_d, the segment is considered to belong to the sought set of two-dimensional door frame lines V_d, as sketched below.
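The screening rule can be sketched as follows, with the semantic boundary ξ represented as an (N, 2) array of boundary pixels; the threshold values l_d and sigma_d are illustrative:

```python
import numpy as np

def screen_door_lines(segments, boundary_pts, l_d=20.0, sigma_d=6.0):
    """Keep 2-D segments labeled 'door' that are long enough and hug the
    semantic boundary xi (given as an (N, 2) array of boundary pixels)."""
    kept = []
    for p1, p2 in segments:
        if np.linalg.norm(p2 - p1) < l_d:  # too short: discard
            continue
        mid = 0.5 * (p1 + p2)
        # distance from each probe point to its nearest boundary pixel
        d = sorted(np.min(np.linalg.norm(boundary_pts - q, axis=1))
                   for q in (p1, p2, mid))
        sigma_i = d[0] + d[1]   # drop the maximum, sum the two smaller values
        if sigma_i < sigma_d:
            kept.append((p1, p2))  # belongs to the door-frame set V_d
    return kept
```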
Finally, the similar line segments in V_d are merged, as sketched after this paragraph. If the overlap O of two segments along the X-axis or Y-axis direction is larger than a threshold O_d, the included angles α and β that the two segments form with the coordinate axis are compared; if |α − β| is smaller than a threshold γ_d, the two segments are merged, and their two farthest endpoints are taken to represent the new segment V.
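A sketch of this merging; the thresholds o_d and gamma_d are illustrative, and the greedy one-pass strategy is a choice of the sketch:

```python
import numpy as np

def merge_segments(segments, o_d=5.0, gamma_d=np.deg2rad(3.0)):
    """Greedily merge segments of V_d that likely depict the same frame line:
    axis overlap above O_d and inclination difference below gamma_d."""
    def angle(p1, p2):
        d = p2 - p1
        return np.arctan2(d[1], d[0])

    def overlap(a1, a2, b1, b2, ax):  # overlap of the projections on one axis
        lo = max(min(a1[ax], a2[ax]), min(b1[ax], b2[ax]))
        hi = min(max(a1[ax], a2[ax]), max(b1[ax], b2[ax]))
        return hi - lo

    merged, used = [], [False] * len(segments)
    for i in range(len(segments)):
        if used[i]:
            continue
        a1, a2 = segments[i]
        for j in range(i + 1, len(segments)):
            if used[j]:
                continue
            b1, b2 = segments[j]
            o = max(overlap(a1, a2, b1, b2, 0), overlap(a1, a2, b1, b2, 1))
            da = abs(angle(a1, a2) - angle(b1, b2)) % np.pi  # undirected lines
            if o > o_d and min(da, np.pi - da) < gamma_d:
                pts = [a1, a2, b1, b2]  # represent the union by the two
                dist = [(np.linalg.norm(p - q), k, m)  # farthest endpoints
                        for k, p in enumerate(pts) for m, q in enumerate(pts)]
                _, k, m = max(dist)
                a1, a2, used[j] = pts[k], pts[m], True
        merged.append((a1, a2))
    return merged
```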
S4, for each single-frame image, judging the visibility of the required line segments in the three-dimensional map, and matching the two-dimensional line features carrying consistent semantic labels against the three-dimensional line segments retained within the field of view;
specifically, the three-dimensional door frame line is screened according to the visibility of the end point, and the line segment which is not in the visual field range in the frame is removed. The end points of the door frame lines are extracted, and then the end points are processed by adopting different strategies in three cases:
(1) If the two end points are both in the visual field range, the door frame line is considered to be in the visual field range;
(2) One of the two endpoints is within the field of view and one is outside the field of view. Points X remaining in the field of view i . Take the midpoint of the two
Figure BDA0003752868460000051
If the line segment is within the visual field, the line segment is reserved
Figure BDA0003752868460000052
If not, the process is repeated for the newly generated segment until the length of the segment is less than the set threshold.
(3) If both ends are not in the visual field, the door frame line is not in the visual field.
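The midpoint bisection of case (2) can be sketched as follows; the intrinsic matrix K, the world-to-camera pose T_cw, the image size (w, h), and the stopping threshold min_len are assumptions of the sketch:

```python
import numpy as np

def in_fov(X, K, T_cw, w, h):
    """True if world point X projects in front of the camera and inside the image."""
    Xc = T_cw[:3, :3] @ X + T_cw[:3, 3]
    if Xc[2] <= 0.0:
        return False
    u, v, s = K @ Xc
    return 0.0 <= u / s < w and 0.0 <= v / s < h

def visible_subsegment(Xi, Xo, K, T_cw, w, h, min_len=0.05):
    """Case (2): endpoint Xi is inside the field of view, Xo is outside.
    Bisect until the midpoint falls inside; return (Xi, Xm) or None."""
    while np.linalg.norm(Xo - Xi) >= min_len:
        Xm = 0.5 * (Xi + Xo)
        if in_fov(Xm, K, T_cw, w, h):
            return Xi, Xm  # keep the visible portion of the line
        Xo = Xm            # repeat on the newly generated segment
    return None            # shorter than the threshold: discard
```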
The two endpoints of each three-dimensional door frame line L are projected into the image plane, the three-dimensional door frame lines within the field of view are traversed, and the included angle θ, the length difference Δl, and the distance d between each of them and the two-dimensional door frame line are calculated; the two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0, and d < d_0 are found, completing the matching against the prior map. A sketch of this matching step follows.
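A sketch of the traversal and threshold test, assuming the three-dimensional door frame lines have already been projected into the image plane; the values of theta_0, dl_0, and d_0 are illustrative:

```python
import numpy as np

def match_2d_3d(lines_2d, lines_3d_proj,
                theta_0=np.deg2rad(5.0), dl_0=15.0, d_0=10.0):
    """Pair 2-D door-frame lines with projected 3-D ones whose angle difference,
    length difference, and midpoint distance all clear the thresholds."""
    def props(p1, p2):
        v = p2 - p1
        return np.arctan2(v[1], v[0]), np.linalg.norm(v), 0.5 * (p1 + p2)

    matches = []
    for i, (a1, a2) in enumerate(lines_2d):
        ta, la, ma = props(a1, a2)
        for j, (b1, b2) in enumerate(lines_3d_proj):
            tb, lb, mb = props(b1, b2)
            dth = abs(ta - tb) % np.pi  # undirected angle difference
            if (min(dth, np.pi - dth) < theta_0
                    and abs(la - lb) < dl_0
                    and np.linalg.norm(ma - mb) < d_0):
                matches.append((i, j))
    return matches
```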
S5, constructing an optimization model for matching the real-time map against the prior map, improving the positioning accuracy of the system.
Specifically, the error terms for the two-dimensional-three-dimensional line segment matches are defined; under the constraint of the prior point cloud map, all error terms are summed in least-squares form, and a cost function is constructed and solved.
For the constraint of a door frame line in the prior map, the error term is defined as the sum of the distances from the two endpoints p_s = (u_s, v_s) and p_e = (u_e, v_e) of the three-dimensional semantic line feature, projected onto the two-dimensional image plane, to the line Ax + By + C = 0 on which the matched 2D segment lies:

$$e_L = \frac{|A u_s + B v_s + C| + |A u_e + B v_e + C|}{\sqrt{A^2 + B^2}}$$
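This error term translates directly into code; a one-function sketch:

```python
import numpy as np

def line_error(A, B, C, p_s, p_e):
    """e_L: sum of distances of the projected endpoints p_s, p_e of a prior
    3-D door-frame line to the matched 2-D line Ax + By + C = 0."""
    n = np.hypot(A, B)
    return (abs(A * p_s[0] + B * p_s[1] + C)
            + abs(A * p_e[0] + B * p_e[1] + C)) / n
```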
Then, for a single-frame image, the residual of the i-th spatial line L_i observed in the k-th camera frame c_k is this error term evaluated on the endpoints of L_i projected into frame c_k:

$$r_{\mathcal{L}}(L_i, c_k) = e_L\big(\pi(R_{c_k} P_s + t_{c_k}),\ \pi(R_{c_k} P_e + t_{c_k})\big)$$

where P_s and P_e are the three-dimensional endpoints of L_i, (R_{c_k}, t_{c_k}) is the pose of frame c_k, and π(·) denotes the pinhole projection onto the image plane.
the final cost function can be expressed as:
Figure BDA0003752868460000071
the first item is a Marg marginalized residual error part, the second item is an IMU residual error part between adjacent frames in the sliding window, and the third item is a visual reprojection residual error of the feature points in the sliding window under the camera. χ is all state quantities in the sliding window,
Figure BDA0003752868460000072
the covariance matrix of the noise term is pre-integrated for the IMU,
Figure BDA0003752868460000073
a covariance matrix of the observed noise for the visual feature points,
Figure BDA0003752868460000074
observation of features as two-dimensional-three-dimensional semantic linesCovariance matrix of noise.
The technical means disclosed by the invention are not limited to those disclosed in the embodiments above; they also include the technical schemes formed by any combination of the above technical features.

Claims (7)

1. A prior map-assisted indoor positioning method based on a monocular camera, characterized by comprising the following steps:
S1, acquiring a prior point cloud map of a non-closed-loop indoor scene;
S2, performing semantic segmentation on the known prior map, extracting and retaining local point clouds carrying environment semantic information, and extracting three-dimensional line segments to form a simplified semantic line feature positioning map;
S3, extracting two-dimensional line features from the camera image, adding a semantic detection process, attaching semantic labels to the two-dimensional line segments, and screening and merging the two-dimensional line segments;
S4, for each single-frame image, judging the visibility of the line segments in the semantic positioning map, and matching the two-dimensional line features against the three-dimensional line segments that lie within the field of view and carry consistent semantic labels;
S5, constructing an optimization model for matching the real-time map against the prior map, improving the positioning accuracy of the system.
2. The monocular camera-based prior map-assisted indoor positioning method according to claim 1, characterized in that in step S2, the efficient semantic segmentation model RandLA-Net is utilized: a simple and fast random sampling method greatly reduces the point density, and a redesigned local feature aggregator is applied to preserve the salient features.
3. The monocular camera-based prior map-assisted indoor positioning method according to claim 1, characterized in that in step S2, to extract three-dimensional straight lines with good robustness, the line features are extracted using an image-based three-dimensional line segment detection method.
4. The monocular camera-based prior map-assisted indoor positioning method according to claim 1, characterized in that in step S3, two-dimensional line feature detection is performed using the M-LSD algorithm while a semantic detection thread is added in parallel; semantic labels are attached to the two-dimensional line segments in the image, and the two threads process the RGB image simultaneously.
5. The monocular camera-based prior map-assisted indoor positioning method according to claim 1, wherein in step S4, depending on the visibility of the door frame line endpoints, the three-dimensional door frame lines are screened using different strategies in three cases; specifically, the line segments not within the field of view of the current frame are removed; the endpoints of each door frame line are extracted and then handled as follows:
(1) If both endpoints are within the field of view, the door frame line is considered to be within the field of view;
(2) If one endpoint is within the field of view and the other is outside, denote by X_i the endpoint that remains within the field of view; take the midpoint X_m of the two endpoints: if X_m is within the field of view, the sub-segment (X_i, X_m) is retained; if not, the process is repeated on the newly generated segment until its length is smaller than the set threshold;
(3) If neither endpoint is within the field of view, the door frame line is not within the field of view.
6. The monocular camera-based prior map-assisted indoor positioning method of claim 5, wherein in step S4, after screening, the two endpoints of the door frame line L are projected into the image plane; the three-dimensional door frame lines within the field of view are traversed; the included angle θ, the length difference Δl, and the distance d between each of them and the two-dimensional door frame line are calculated; and the two-dimensional-three-dimensional matching pairs satisfying θ < θ_0, Δl < Δl_0, and d < d_0 are retained, completing the matching with the prior map.
7. The monocular camera-based prior map-assisted indoor positioning method of claim 1, wherein in step S5, the motion between two consecutive frames is iteratively estimated with the Ceres Solver, the matching optimization model is constructed, and the positioning accuracy of the system is improved, specifically as follows:
the error model is minimized, i.e., an optimal solution is sought for the objective function in the error model, which is expressed as follows:
$$\min_{\chi}\left\{\left\|r_p - H_p\,\chi\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\big(\hat{z}^{b_k}_{b_{k+1}},\,\chi\big)\right\|^2_{P^{b_k}_{b_{k+1}}} + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\big(\hat{z}^{c_j}_{l},\,\chi\big)\right\|^2_{P^{c_j}_{l}} + \sum_{(i,k)\in\mathcal{L}}\left\|r_{\mathcal{L}}(L_i, c_k)\right\|^2_{P_{\mathcal{L}}}\right\}$$

wherein the first term is the marginalization (Marg) residual, the second is the IMU residual between adjacent frames in the sliding window, and the third is the visual reprojection residual of the feature points in the sliding window under the camera; χ denotes all state quantities in the sliding window; P^{b_k}_{b_{k+1}} is the covariance matrix of the IMU pre-integration noise; P^{c_j}_l is the covariance matrix of the visual feature observation noise; and P_L is the covariance matrix of the two-dimensional-three-dimensional semantic line observation noise;
the error term e_L corresponding to a two-dimensional-three-dimensional matching pair can be expressed as:

$$e_L = \frac{|A u_s + B v_s + C| + |A u_e + B v_e + C|}{\sqrt{A^2 + B^2}}$$

wherein Ax + By + C = 0 is the equation of the line on which the two-dimensional door frame line lies, and (u_s, v_s) and (u_e, v_e) respectively represent the coordinates of the endpoints of the prior three-dimensional door frame line after projection onto the image plane.
CN202210846173.XA 2022-07-19 2022-07-19 Monocular camera-based prior map-assisted indoor positioning method Pending CN115205560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210846173.XA CN115205560A (en) 2022-07-19 2022-07-19 Monocular camera-based prior map-assisted indoor positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210846173.XA CN115205560A (en) 2022-07-19 2022-07-19 Monocular camera-based prior map-assisted indoor positioning method

Publications (1)

Publication Number Publication Date
CN115205560A true CN115205560A (en) 2022-10-18

Family

ID=83581244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210846173.XA Pending CN115205560A (en) 2022-07-19 2022-07-19 Monocular camera-based prior map-assisted indoor positioning method

Country Status (1)

Country Link
CN (1) CN115205560A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246038A (en) * 2023-05-11 2023-06-09 西南交通大学 Multi-view three-dimensional line segment reconstruction method, system, electronic equipment and medium
CN118089753A (en) * 2024-04-26 2024-05-28 江苏集萃清联智控科技有限公司 Monocular semantic SLAM positioning method and system based on three-dimensional target



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination