CN111161318A - Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching - Google Patents
- Publication number
- Publication number: CN111161318A; Application number: CN201911394459.3A
- Authority
- CN
- China
- Prior art keywords
- matching
- image
- points
- feature points
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
Abstract
The invention discloses a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching, which comprises the following steps: an RGB image of the first frame is read through a vision sensor, ORB feature points are extracted from the image and counted; if the number of feature points is greater than a threshold, the image is taken as the reference frame and the system is initialized; if the number of feature points is less than the threshold, the next RGB frame is read and its ORB feature points are extracted, until the number of feature points exceeds the threshold and system initialization is carried out. The method combines a deep convolutional neural network model with image feature point matching, makes full use of the image information, eliminates the dynamic object pixels in the image and keeps the static pixels; it improves the matching precision of image feature points in a dynamic scene, reduces the accumulated error of pose estimation, and improves system robustness.
Description
Technical Field
The invention relates to the technical field of computer vision and mobile robot positioning, in particular to a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching.
Background
SLAM (simultaneous localization and mapping) refers to incrementally building and expanding a map of a completely unknown environment while the robot's own position is uncertain, and simultaneously using that map for autonomous localization and navigation.
SLAM finds application in many fields, from indoor robots to outdoor, underwater and aerial systems, as well as AR (augmented reality). On a theoretical and conceptual level, SLAM can now be considered a solved problem. In practice, however, implementing a general-purpose SLAM solution, particularly one that builds and uses a perceptually rich map as part of the SLAM algorithm, still presents many problems. In recent years, visual SLAM has become a hot spot of research because vision sensors are structurally simple yet technically challenging.
The traditional SLAM system assumes in advance that the scene is static, but real environments contain many uncertain factors, such as changes in illumination intensity, walking pedestrians or animals, and moving automobiles. In a dynamic environment, conventional visual SLAM is easily mismatched and produces large errors. Traditional feature-point-based visual SLAM algorithms deal with simple dynamic scenes by detecting dynamic feature points and labeling them as noise. ORB-SLAM reduces the influence of moving objects on localization and mapping precision through RANSAC (random sample consensus), key frames and local map optimization. Direct-method visual SLAM algorithms handle the occlusion caused by dynamic objects by optimizing a cost function. However, these methods still have large errors and limitations when handling dynamic objects.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching. The method combines a deep convolutional neural network model with image feature point matching, makes full use of the image information, eliminates dynamic object pixels in the image and retains the static pixels; it improves the matching precision of image feature points in a dynamic scene, reduces the accumulated error of pose estimation, and improves system robustness.
The purpose of the invention is realized by the following technical scheme:
a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching comprises the following steps:
step one, reading the first frame RGB image through a vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than a threshold, taking the image as the reference frame and initializing the system; if the number of feature points is less than the threshold, reading the next RGB frame and extracting its ORB feature points, until the number of feature points exceeds the threshold and system initialization is carried out;
step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene;
step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained;
step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm; the GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2; based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell and α is a scale factor;
step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them;
step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map;
step seven, optimizing the pose through loop closure detection and correcting the drift error.
Compared with the prior art, the invention has the following beneficial effects:
the method integrates a deep convolutional neural network model and a traditional image feature point matching technology, fully utilizes the information of the image, eliminates the dynamic object pixels in the image and keeps static pixels; the SLAM technology based on the feature points only uses the feature point information in the image, and can not distinguish whether the feature points are the feature points of moving objects such as pedestrians, automobiles and the like, but the invention fully utilizes the information provided by the image, carries out object identification on the image and carries out semantic segmentation on the image, and can more effectively and fully utilize the information of the image; moreover, the GMS algorithm is utilized to effectively and quickly eliminate the mismatching for the problem of residual dynamic characteristic points which are not removed in the recognition and semantic segmentation; the method has stronger accuracy and robustness in a dynamic environment.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram illustrating the matching result of GMS algorithm of the present invention;
fig. 3 is a schematic diagram of the GMS algorithm of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The invention discloses a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching, which detects potential moving objects in a dynamic scene, performs semantic segmentation and motion detection of the scene, and then eliminates the pixel points of the potential moving objects. Because object recognition and semantic segmentation are strongly affected by the neural network training model, some dynamic object pixel points may remain unremoved. The core idea of GMS is that, owing to motion smoothness, a correctly matched feature point should have more correct matches in its neighbourhood than an incorrectly matched one. GMS is therefore used to eliminate mismatches between the reference frame and the frame whose dynamic object pixels have been removed by semantic segmentation.
YOLO is a target detection method characterized by fast detection with high accuracy. It treats the detection task as a single regression problem of predicting target regions and class probabilities: one neural network directly predicts the bounding boxes and class probabilities of objects, realizing end-to-end detection. It is also very fast: the base version achieves real-time detection at 45 frames/s, and Fast YOLO reaches 155 frames/s.
The GMS (grid-based motion statistics) algorithm is a simple way to encapsulate motion smoothness as a statistic, namely the number of matches in a neighbourhood region. GMS converts a large quantity of matches into high-quality matches, which enables a real-time, highly robust matching system.
Specifically, as shown in fig. 1 to 3, a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching includes the following steps:
Step one, reading the first frame RGB image through the vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than the threshold, the image is taken as the reference frame and the system is initialized; if the number of feature points is less than the threshold, the next RGB frame is read and its ORB feature points are extracted, until the number of feature points exceeds the threshold and system initialization is carried out.
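The initialization loop of step one can be sketched as follows. This is a minimal illustration with stand-in names: the patent does not state the threshold value, so `min_features` and the `count_features` callable are hypothetical.

```python
def initialize(frames, count_features, min_features=100):
    """Scan frames until one yields enough ORB feature points.

    frames: iterable of images from the vision sensor.
    count_features: callable returning the number of ORB feature points
        extracted from a frame (stand-in for a real ORB detector).
    Returns the first frame whose feature count exceeds the threshold,
    to be used as the reference frame, or None if no frame qualifies.
    """
    for frame in frames:
        if count_features(frame) > min_features:
            return frame  # reference frame: system initializes here
    return None  # no frame had enough features
```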
Step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene.
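YOLO-style detectors are normally post-processed with confidence filtering and non-maximum suppression before the boxes are used. The patent does not detail this step, so the following is a generic NumPy sketch of greedy NMS over hypothetical `[x1, y1, x2, y2]` boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep
```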
Step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained.
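Step three can be sketched as combining the detector boxes with per-pixel semantic labels. The class ids below are hypothetical placeholders for dynamic categories such as "person" and "car"; real ids depend on the trained FCN/SegNet model.

```python
import numpy as np

# Hypothetical label ids for dynamic classes; model-dependent in practice.
DYNAMIC_CLASSES = (7, 15)

def static_mask(seg_labels, boxes):
    """Return a boolean mask that is True where a pixel is kept as static.

    seg_labels: HxW integer array of semantic class ids.
    boxes: iterable of (x1, y1, x2, y2) YOLO detection boxes.
    A pixel is discarded only if it lies inside a detection box AND
    carries a dynamic semantic label, so static pixels inside the box
    (background, other objects) are preserved.
    """
    mask = np.ones(seg_labels.shape, dtype=bool)
    for (x1, y1, x2, y2) in boxes:
        roi = seg_labels[y1:y2, x1:x2]
        mask[y1:y2, x1:x2] &= ~np.isin(roi, DYNAMIC_CLASSES)
    return mask
```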
Step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm. The GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2. Based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell, and the α value in this scheme is 5.
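Under the stated assumption, the neighbourhood score and the adaptive threshold of step four can be sketched as follows. This is a simplified illustration: edge cells are handled only by a bounds check, ignoring row wrap-around, which a full GMS implementation would treat properly.

```python
import math
import numpy as np

def gms_score(cell_matches, i, j, grid_w):
    """Score S_ij: total matches over the 3x3 neighbourhoods of cells i and j.

    cell_matches[a, b] = number of putative matches whose endpoints fall in
    cell a of image 1 and cell b of image 2; cells are indexed row-major
    on a grid that is grid_w cells wide.
    """
    offsets = (-grid_w - 1, -grid_w, -grid_w + 1,
               -1, 0, 1,
               grid_w - 1, grid_w, grid_w + 1)
    score = 0
    for d in offsets:
        a, b = i + d, j + d  # same relative neighbour cell in both grids
        if 0 <= a < cell_matches.shape[0] and 0 <= b < cell_matches.shape[1]:
            score += int(cell_matches[a, b])
    return score

def is_correct(score, n, alpha=5):
    """Adaptive threshold T = alpha * sqrt(n); the patent uses alpha = 5,
    with n the average number of feature points per grid cell."""
    return score > alpha * math.sqrt(n)
```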
Step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them.
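The three-dimensional coordinates in step five come from triangulating each matched point pair. A minimal linear (DLT) triangulation sketch with NumPy, assuming known 3x4 projection matrices for the two views:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    P1, P2: 3x4 projection matrices of the two views.
    x1, x2: (u, v) observations of the point in each image.
    Returns the 3-D point as the null vector of the stacked constraints.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]          # homogeneous solution (smallest singular value)
    return X[:3] / X[3]  # dehomogenize
```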
Step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map.
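The quantity that bundle adjustment minimizes in step six can be sketched as follows, for one camera under a pinhole model (the intrinsic matrix K here is an assumed input):

```python
import numpy as np

def reprojection_error(K, R, t, points3d, observations):
    """Mean squared pixel error for one camera: project each 3-D point
    through the pinhole model and compare with its observed pixel.

    K: 3x3 intrinsic matrix; R, t: world-to-camera rotation and translation.
    """
    errs = []
    for X, uv in zip(points3d, observations):
        x_cam = R @ X + t            # world -> camera coordinates
        u = K @ (x_cam / x_cam[2])   # normalize depth, apply intrinsics
        errs.append(np.sum((u[:2] - uv) ** 2))
    return float(np.mean(errs))
```

Bundle adjustment then iteratively adjusts R, t and the 3-D points to drive this error down.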
Step seven, optimizing the pose through loop closure detection and correcting the drift error.
The key points of the invention are as follows: dynamic objects in the image are identified with the YOLO algorithm, and dynamic pixels are then separated from static pixels by semantic segmentation; feature point matching is performed on the static pixels; the GMS algorithm then eliminates the mismatches caused by residual dynamic pixels that YOLO and semantic segmentation failed to remove; and the result is applied to an ORB-SLAM system.
The method integrates a deep convolutional neural network model with traditional image feature point matching, makes full use of the image information, eliminates the dynamic object pixels in the image and keeps the static pixels. Feature-point-based SLAM uses only the feature point information in the image and cannot distinguish whether a feature point belongs to a moving object such as a pedestrian or an automobile; the invention, by contrast, fully utilizes the information the image provides, performing object recognition and semantic segmentation on the image, and can thus exploit the image information more effectively. Moreover, for the residual dynamic feature points that recognition and semantic segmentation fail to remove, the GMS algorithm eliminates the mismatches effectively and quickly. The method therefore has stronger accuracy and robustness in a dynamic environment.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.
Claims (1)
1. A dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching is characterized by comprising the following steps:
step one, reading the first frame RGB image through a vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than a threshold, taking the image as the reference frame and initializing the system; if the number of feature points is less than the threshold, reading the next RGB frame and extracting its ORB feature points, until the number of feature points exceeds the threshold and system initialization is carried out;
step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene;
step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained;
step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm; the GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2; based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell and α is a scale factor;
step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them;
step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map;
step seven, optimizing the pose through loop closure detection and correcting the drift error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911394459.3A CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911394459.3A CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111161318A true CN111161318A (en) | 2020-05-15 |
Family
ID=70559054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911394459.3A Pending CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111161318A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112258453A (en) * | 2020-09-27 | 2021-01-22 | 南京一起康讯智能科技有限公司 | Positioning landmark detection method for industrial fault inspection robot |
CN112349096A (en) * | 2020-10-28 | 2021-02-09 | 厦门博海中天信息科技有限公司 | Method, system, medium and equipment for intelligently identifying pedestrians on road |
CN112381841A (en) * | 2020-11-27 | 2021-02-19 | 广东电网有限责任公司肇庆供电局 | Semantic SLAM method based on GMS feature matching in dynamic scene |
CN112418288A (en) * | 2020-11-17 | 2021-02-26 | 武汉大学 | GMS and motion detection-based dynamic vision SLAM method |
CN113990101A (en) * | 2021-11-19 | 2022-01-28 | 深圳市捷顺科技实业股份有限公司 | Method, system and processing device for detecting vehicles in no-parking area |
CN112258453B (en) * | 2020-09-27 | 2024-04-26 | 南京一起康讯智能科技有限公司 | Industrial fault inspection robot positioning landmark detection method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109974743A (en) * | 2019-03-14 | 2019-07-05 | 中山大学 | A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure |
CN110009732A (en) * | 2019-04-11 | 2019-07-12 | 司岚光电科技(苏州)有限公司 | Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method |
CN110349250A (en) * | 2019-06-28 | 2019-10-18 | 浙江大学 | A kind of three-dimensional rebuilding method of the indoor dynamic scene based on RGBD camera |
CN110378345A (en) * | 2019-06-04 | 2019-10-25 | 广东工业大学 | Dynamic scene SLAM method based on YOLACT example parted pattern |
2019-12-30: CN application CN201911394459.3A filed; published as CN111161318A (status: pending)
Non-Patent Citations (2)

Title |
---|
YONGKANG ZHANG et al.: "Bilateral Grid Statistics Combined with BRISK for Robust Matching", 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), 5 August 2019, pages 263-269 |
王星尧 (WANG Xingyao): "Research on key algorithms for perception and autonomous planning of mobile robots in unknown environments", China Master's Theses Full-text Database, Information Science and Technology, 15 May 2019 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109544636B (en) | Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method | |
CN111462200B (en) | Cross-video pedestrian positioning and tracking method, system and equipment | |
CN110378345B (en) | Dynamic scene SLAM method based on YOLACT instance segmentation model | |
CN111161318A (en) | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching | |
CN110738673A (en) | Visual SLAM method based on example segmentation | |
Yuan et al. | Robust lane detection for complicated road environment based on normal map | |
Delmerico et al. | Building facade detection, segmentation, and parameter estimation for mobile robot localization and guidance | |
CN112396595B (en) | Semantic SLAM method based on point-line characteristics in dynamic environment | |
CN111696118A (en) | Visual loopback detection method based on semantic segmentation and image restoration in dynamic scene | |
CN103886107A (en) | Robot locating and map building system based on ceiling image information | |
CN112484746B (en) | Monocular vision auxiliary laser radar odometer method based on ground plane | |
CN110021029B (en) | Real-time dynamic registration method and storage medium suitable for RGBD-SLAM | |
CN111797688A (en) | Visual SLAM method based on optical flow and semantic segmentation | |
CN111242985A (en) | Video multi-pedestrian tracking method based on Markov model | |
Dornaika et al. | A new framework for stereo sensor pose through road segmentation and registration | |
CN111914832B (en) | SLAM method of RGB-D camera under dynamic scene | |
CN117315547A (en) | Visual SLAM method for solving large duty ratio of dynamic object | |
Zhuang et al. | Amos-SLAM: An Anti-Dynamics Two-stage SLAM Approach | |
CN114283199A (en) | Dynamic scene-oriented dotted line fusion semantic SLAM method | |
Che et al. | Traffic light recognition for real scenes based on image processing and deep learning | |
CN113837243A (en) | RGB-D camera dynamic visual odometer method based on edge information | |
CN113592947A (en) | Visual odometer implementation method of semi-direct method | |
CN112614161A (en) | Three-dimensional object tracking method based on edge confidence | |
Tao et al. | A sky region segmentation method for outdoor visual-inertial SLAM | |
Ji et al. | Robust RGB-D SLAM in Dynamic Environments for Autonomous Vehicles |
Legal Events
Date | Code | Title | Description
---|---|---|---
2020-05-15 | PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200515 |