CN111161318A - Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching - Google Patents
- Publication number
- Publication number: CN111161318A; Application number: CN201911394459.3A
- Authority
- CN
- China
- Prior art keywords
- matching
- image
- points
- feature points
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
Abstract
The invention discloses a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching, which comprises the following steps: an RGB image of the first frame is read through a vision sensor, ORB feature points are extracted from the image and counted; if the number of feature points is greater than a threshold, the image is taken as the reference frame and the system is initialized; if the number of feature points is less than the threshold, the next RGB frame is read and its ORB feature points are extracted, until the number of feature points exceeds the threshold and system initialization is carried out. The method combines a deep convolutional neural network model with image feature point matching, makes full use of the image information, eliminates the dynamic object pixels in the image and keeps the static pixels; it improves the matching precision of image feature points in a dynamic scene, reduces the accumulated error of pose estimation, and improves system robustness.
Description
Technical Field
The invention relates to the technical field of computer vision and mobile robot positioning, in particular to a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching.
Background
SLAM (simultaneous localization and mapping) refers to incrementally building and expanding a map of a completely unknown environment while the robot's own position is uncertain, and simultaneously using that map for autonomous localization and navigation.
SLAM finds application in many fields, from indoor robots to outdoor, underwater and aerial systems, as well as AR (augmented reality). On a theoretical and conceptual level, SLAM can now be considered a solved problem. In practice, however, implementing a general-purpose SLAM solution, particularly one that builds and uses a perceptually rich map as part of the SLAM algorithm, still presents many problems. In recent years, visual SLAM has become a hot spot of research because vision sensors are structurally simple yet technically challenging.
The traditional SLAM system assumes in advance that the scene is static, but real environments contain many uncertain factors, such as changes in illumination intensity, walking pedestrians or animals, and moving automobiles. In a dynamic environment, conventional visual SLAM is easily mismatched and produces large errors. Traditional feature-point-based visual SLAM algorithms deal with simple dynamic scenes by detecting dynamic feature points and labeling them as noise. ORB-SLAM reduces the influence of moving objects on localization and mapping precision through RANSAC (random sample consensus), key frames and local map optimization. Direct-method visual SLAM algorithms handle the occlusion caused by dynamic objects by optimizing a cost function. However, these methods still have large errors and limitations when handling dynamic objects.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching. The method combines a deep convolutional neural network model with image feature point matching, makes full use of the image information, eliminates dynamic object pixels in the image and retains the static pixels; it improves the matching precision of image feature points in a dynamic scene, reduces the accumulated error of pose estimation, and improves system robustness.
The purpose of the invention is realized by the following technical scheme:
a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching comprises the following steps:
step one, reading the first frame RGB image through a vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than a threshold, taking the image as the reference frame and initializing the system; if the number of feature points is less than the threshold, reading the next RGB frame and extracting its ORB feature points, until the number of feature points exceeds the threshold and system initialization is carried out;
step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene;
step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained;
step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm; the GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2; based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell and α is a scale factor;
step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them;
step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map;
step seven, optimizing the pose through loop closure detection and correcting the drift error.
Compared with the prior art, the invention has the following beneficial effects:
the method integrates a deep convolutional neural network model and a traditional image feature point matching technology, fully utilizes the information of the image, eliminates the dynamic object pixels in the image and keeps static pixels; the SLAM technology based on the feature points only uses the feature point information in the image, and can not distinguish whether the feature points are the feature points of moving objects such as pedestrians, automobiles and the like, but the invention fully utilizes the information provided by the image, carries out object identification on the image and carries out semantic segmentation on the image, and can more effectively and fully utilize the information of the image; moreover, the GMS algorithm is utilized to effectively and quickly eliminate the mismatching for the problem of residual dynamic characteristic points which are not removed in the recognition and semantic segmentation; the method has stronger accuracy and robustness in a dynamic environment.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram illustrating the matching result of GMS algorithm of the present invention;
fig. 3 is a schematic diagram of the GMS algorithm of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The invention discloses a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching, which detects potential moving objects in a dynamic scene, performs semantic segmentation and motion detection of the scene, and then eliminates the pixel points of the potential moving objects. Because object recognition and semantic segmentation are strongly affected by the neural network training model, some dynamic object pixel points may remain unremoved. The core idea of GMS is that, owing to motion smoothness, a correctly matched feature point should have more correct matches in its neighbourhood than an incorrectly matched one. GMS is therefore used to eliminate mismatches between the reference frame and the frame whose dynamic object pixels have been removed by semantic segmentation.
YOLO is a target detection method characterized by fast detection with high accuracy. It treats the detection task as a single regression problem of predicting target regions and class probabilities: one neural network directly predicts the bounding boxes and class probabilities of objects, realizing end-to-end detection. It is also very fast: the base version achieves real-time detection at 45 frames/s, and Fast YOLO reaches 155 frames/s.
The GMS (grid-based motion statistics) algorithm is a simple way to encapsulate motion smoothness as a statistic, namely the number of matches in a neighbourhood region. GMS converts a large quantity of matches into high-quality matches, which enables a real-time, highly robust matching system.
Specifically, as shown in fig. 1 to 3, a dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching includes the following steps:
Step one, reading the first frame RGB image through the vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than the threshold, the image is taken as the reference frame and the system is initialized; if the number of feature points is less than the threshold, the next RGB frame is read and its ORB feature points are extracted, until the number of feature points exceeds the threshold and system initialization is carried out.
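The initialization loop of step one can be sketched as follows. This is a minimal illustration with stand-in names: the patent does not state the threshold value, so `min_features` and the `count_features` callable are hypothetical.

```python
def initialize(frames, count_features, min_features=100):
    """Scan frames until one yields enough ORB feature points.

    frames: iterable of images from the vision sensor.
    count_features: callable returning the number of ORB feature points
        extracted from a frame (stand-in for a real ORB detector).
    Returns the first frame whose feature count exceeds the threshold,
    to be used as the reference frame, or None if no frame qualifies.
    """
    for frame in frames:
        if count_features(frame) > min_features:
            return frame  # reference frame: system initializes here
    return None  # no frame had enough features
```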
Step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene.
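YOLO-style detectors are normally post-processed with confidence filtering and non-maximum suppression before the boxes are used. The patent does not detail this step, so the following is a generic NumPy sketch of greedy NMS over hypothetical `[x1, y1, x2, y2]` boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep
```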
Step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained.
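Step three can be sketched as combining the detector boxes with per-pixel semantic labels. The class ids below are hypothetical placeholders for dynamic categories such as "person" and "car"; real ids depend on the trained FCN/SegNet model.

```python
import numpy as np

# Hypothetical label ids for dynamic classes; model-dependent in practice.
DYNAMIC_CLASSES = (7, 15)

def static_mask(seg_labels, boxes):
    """Return a boolean mask that is True where a pixel is kept as static.

    seg_labels: HxW integer array of semantic class ids.
    boxes: iterable of (x1, y1, x2, y2) YOLO detection boxes.
    A pixel is discarded only if it lies inside a detection box AND
    carries a dynamic semantic label, so static pixels inside the box
    (background, other objects) are preserved.
    """
    mask = np.ones(seg_labels.shape, dtype=bool)
    for (x1, y1, x2, y2) in boxes:
        roi = seg_labels[y1:y2, x1:x2]
        mask[y1:y2, x1:x2] &= ~np.isin(roi, DYNAMIC_CLASSES)
    return mask
```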
Step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm. The GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2. Based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell, and the α value in this scheme is 5.
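Under the stated assumption, the neighbourhood score and the adaptive threshold of step four can be sketched as follows. This is a simplified illustration: edge cells are handled only by a bounds check, ignoring row wrap-around, which a full GMS implementation would treat properly.

```python
import math
import numpy as np

def gms_score(cell_matches, i, j, grid_w):
    """Score S_ij: total matches over the 3x3 neighbourhoods of cells i and j.

    cell_matches[a, b] = number of putative matches whose endpoints fall in
    cell a of image 1 and cell b of image 2; cells are indexed row-major
    on a grid that is grid_w cells wide.
    """
    offsets = (-grid_w - 1, -grid_w, -grid_w + 1,
               -1, 0, 1,
               grid_w - 1, grid_w, grid_w + 1)
    score = 0
    for d in offsets:
        a, b = i + d, j + d  # same relative neighbour cell in both grids
        if 0 <= a < cell_matches.shape[0] and 0 <= b < cell_matches.shape[1]:
            score += int(cell_matches[a, b])
    return score

def is_correct(score, n, alpha=5):
    """Adaptive threshold T = alpha * sqrt(n); the patent uses alpha = 5,
    with n the average number of feature points per grid cell."""
    return score > alpha * math.sqrt(n)
```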
Step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them.
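The three-dimensional coordinates in step five come from triangulating each matched point pair. A minimal linear (DLT) triangulation sketch with NumPy, assuming known 3x4 projection matrices for the two views:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    P1, P2: 3x4 projection matrices of the two views.
    x1, x2: (u, v) observations of the point in each image.
    Returns the 3-D point as the null vector of the stacked constraints.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]          # homogeneous solution (smallest singular value)
    return X[:3] / X[3]  # dehomogenize
```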
Step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map.
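The quantity that bundle adjustment minimizes in step six can be sketched as follows, for one camera under a pinhole model (the intrinsic matrix K here is an assumed input):

```python
import numpy as np

def reprojection_error(K, R, t, points3d, observations):
    """Mean squared pixel error for one camera: project each 3-D point
    through the pinhole model and compare with its observed pixel.

    K: 3x3 intrinsic matrix; R, t: world-to-camera rotation and translation.
    """
    errs = []
    for X, uv in zip(points3d, observations):
        x_cam = R @ X + t            # world -> camera coordinates
        u = K @ (x_cam / x_cam[2])   # normalize depth, apply intrinsics
        errs.append(np.sum((u[:2] - uv) ** 2))
    return float(np.mean(errs))
```

Bundle adjustment then iteratively adjusts R, t and the 3-D points to drive this error down.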
Step seven, optimizing the pose through loop closure detection and correcting the drift error.
The key points of the invention are as follows: dynamic objects in the image are identified with the YOLO algorithm, and dynamic pixels are then separated from static pixels by semantic segmentation; feature point matching is performed on the static pixels; the GMS algorithm then eliminates the mismatches caused by residual dynamic pixels that YOLO and semantic segmentation failed to remove; and the result is applied to an ORB-SLAM system.
The method integrates a deep convolutional neural network model with traditional image feature point matching, makes full use of the image information, eliminates the dynamic object pixels in the image and keeps the static pixels. Feature-point-based SLAM uses only the feature point information in the image and cannot distinguish whether a feature point belongs to a moving object such as a pedestrian or an automobile; the invention, by contrast, fully utilizes the information the image provides, performing object recognition and semantic segmentation on the image, and can thus exploit the image information more effectively. Moreover, for the residual dynamic feature points that recognition and semantic segmentation fail to remove, the GMS algorithm eliminates the mismatches effectively and quickly. The method therefore has stronger accuracy and robustness in a dynamic environment.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.
Claims (1)
1. A dynamic scene SLAM method based on a YOLO algorithm and GMS feature matching is characterized by comprising the following steps:
step one, reading the first frame RGB image through a vision sensor, extracting ORB feature points from the image and counting them; if the number of feature points is greater than a threshold, taking the image as the reference frame and initializing the system; if the number of feature points is less than the threshold, reading the next RGB frame and extracting its ORB feature points, until the number of feature points exceeds the threshold and system initialization is carried out;
step two, reading the RGB image of the next frame as the current frame, detecting moving objects in the current frame by using the YOLO algorithm, and marking the detected objects with bounding boxes in the scene;
step three, wherein, besides the target object, the target boxes in the YOLO detection result also contain pixel points of other objects; in order to fully utilize the pixel information of the image, the scene is semantically segmented by the FCN and SegNet algorithms, so that the dynamic pixels and static pixels of the image are obtained;
step four, performing feature point matching between the static pixels of step three and the reference frame, and eliminating erroneous matches among the obtained feature matching points by using the Grid Motion Statistics (GMS) algorithm; the GMS algorithm makes an assumption based on motion smoothness: let the matching point on the second frame image of a feature point p1 on the first frame image be p2; if the match is correct, then with high probability the matching points of the feature points in the 3 x 3 grid centered on p1 all fall in the 3 x 3 grid centered on p2; based on this assumption, the two frames are divided into grids and the matches in corresponding grid regions are scored, the score being defined as:

S_{ij} = \sum_{k=1}^{K} |X_{i_k j_k}|

wherein |X_{i_k j_k}| represents the number of matched feature points on the corresponding grid cell pair {i_k, j_k}, and K is the number of cell pairs in the 3 x 3 neighbourhood;
because the number of feature matches between images with good motion continuity is larger than between images with poor continuity, the corresponding score is also higher; an adaptive threshold T is therefore set to ensure generality across different scenes: if the score of a grid pair is greater than the threshold T, its matches are considered correct, otherwise incorrect, where T is calculated as:

T = \alpha \sqrt{n}

wherein n is the average number of feature points in each grid cell and α is a scale factor;
step five, calculating the three-dimensional coordinates of the feature points matched in step four, and starting the ORB-SLAM tracking thread to track them;
step six, minimizing the reprojection error by bundle adjustment (BA) and optimizing the local map;
step seven, optimizing the pose through loop closure detection and correcting the drift error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911394459.3A CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911394459.3A CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111161318A true CN111161318A (en) | 2020-05-15 |
Family
ID=70559054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911394459.3A Pending CN111161318A (en) | 2019-12-30 | 2019-12-30 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111161318A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112258453A (en) * | 2020-09-27 | 2021-01-22 | 南京一起康讯智能科技有限公司 | Positioning landmark detection method for industrial fault inspection robot |
CN112349096A (en) * | 2020-10-28 | 2021-02-09 | 厦门博海中天信息科技有限公司 | Method, system, medium and equipment for intelligently identifying pedestrians on road |
CN112381841A (en) * | 2020-11-27 | 2021-02-19 | 广东电网有限责任公司肇庆供电局 | Semantic SLAM method based on GMS feature matching in dynamic scene |
CN112418288A (en) * | 2020-11-17 | 2021-02-26 | 武汉大学 | GMS and motion detection-based dynamic vision SLAM method |
CN113990101A (en) * | 2021-11-19 | 2022-01-28 | 深圳市捷顺科技实业股份有限公司 | Method, system and processing device for detecting vehicles in no-parking area |
CN112258453B (en) * | 2020-09-27 | 2024-04-26 | 南京一起康讯智能科技有限公司 | Industrial fault inspection robot positioning landmark detection method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109974743A (en) * | 2019-03-14 | 2019-07-05 | 中山大学 | A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure |
CN110009732A (en) * | 2019-04-11 | 2019-07-12 | 司岚光电科技(苏州)有限公司 | Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method |
CN110349250A (en) * | 2019-06-28 | 2019-10-18 | 浙江大学 | A kind of three-dimensional rebuilding method of the indoor dynamic scene based on RGBD camera |
CN110378345A (en) * | 2019-06-04 | 2019-10-25 | 广东工业大学 | Dynamic scene SLAM method based on YOLACT example parted pattern |
2019-12-30: CN application CN201911394459.3A filed; published as CN111161318A (status: pending)
Non-Patent Citations (2)

Title |
---|
YONGKANG ZHANG et al.: "Bilateral Grid Statistics Combined with BRISK for Robust Matching", 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), 5 August 2019, pages 263-269 |
王星尧 (WANG Xingyao): "Research on key algorithms for perception and autonomous planning of mobile robots in unknown environments", China Master's Theses Full-text Database, Information Science and Technology, 15 May 2019 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109544636B (en) | Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method | |
CN111462200B (en) | Cross-video pedestrian positioning and tracking method, system and equipment | |
CN110378345B (en) | Dynamic scene SLAM method based on YOLACT instance segmentation model | |
CN111161318A (en) | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching | |
CN110738673A (en) | Visual SLAM method based on example segmentation | |
Yuan et al. | Robust lane detection for complicated road environment based on normal map | |
Delmerico et al. | Building facade detection, segmentation, and parameter estimation for mobile robot localization and guidance | |
CN112396595B (en) | Semantic SLAM method based on point-line characteristics in dynamic environment | |
CN111696118A (en) | Visual loopback detection method based on semantic segmentation and image restoration in dynamic scene | |
CN103886107A (en) | Robot locating and map building system based on ceiling image information | |
CN112484746B (en) | Monocular vision auxiliary laser radar odometer method based on ground plane | |
CN110021029B (en) | Real-time dynamic registration method and storage medium suitable for RGBD-SLAM | |
CN111797688A (en) | Visual SLAM method based on optical flow and semantic segmentation | |
CN111242985A (en) | Video multi-pedestrian tracking method based on Markov model | |
Dornaika et al. | A new framework for stereo sensor pose through road segmentation and registration | |
CN111914832B (en) | SLAM method of RGB-D camera under dynamic scene | |
CN117315547A (en) | Visual SLAM method for solving large duty ratio of dynamic object | |
Zhuang et al. | Amos-SLAM: An Anti-Dynamics Two-stage SLAM Approach | |
CN114283199A (en) | Dynamic scene-oriented dotted line fusion semantic SLAM method | |
Che et al. | Traffic light recognition for real scenes based on image processing and deep learning | |
CN113837243A (en) | RGB-D camera dynamic visual odometer method based on edge information | |
CN113592947A (en) | Visual odometer implementation method of semi-direct method | |
CN112614161A (en) | Three-dimensional object tracking method based on edge confidence | |
Tao et al. | A sky region segmentation method for outdoor visual-inertial SLAM | |
Ji et al. | Robust RGB-D SLAM in Dynamic Environments for Autonomous Vehicles |
Legal Events
Date | Code | Title | Description
---|---|---|---
2020-05-15 | PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200515 |