CN116958595A - Visual SLAM loop detection improvement method based on image block region feature points - Google Patents


Info

Publication number: CN116958595A
Application number: CN202310960132.8A
Authority: CN (China)
Prior art keywords: image, image block, feature points, weight, gradient
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 肖震东, 魏武, 杨姗, 柳雄顶
Current Assignee: South China University of Technology (SCUT)
Original Assignee: South China University of Technology (SCUT)
Priority date: 2023-08-01
Filing date: 2023-08-01
Publication date: 2023-10-27
Application filed by South China University of Technology (SCUT); priority to CN202310960132.8A; publication of CN116958595A

Classifications

    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/772 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries

Abstract

The invention belongs to the field of image processing, in particular visual SLAM technology, and discloses a visual SLAM loop detection improvement method based on image block region feature points. First, the input image is converted into a grayscale image and divided into image blocks by a grid, and the gradient direction histogram of the feature points in each block is calculated to obtain the gradient vectors of the feature points. The gradient vector weight of the feature points in each image block is then judged: a block whose gradient vector weight does not reach the set threshold is called an invalid image block, and a block whose gradient vector weight reaches the set threshold is called a valid image block. A vocabulary tree model is constructed from the valid image blocks in the current frame and the gradient vectors of the corresponding feature points, and the content information of the vocabulary tree is continuously iterated and updated. The similarity between two frames of images is measured by searching the updated vocabulary tree model, so that better image retrieval and matching results are obtained and more accurate loop detection is realized. In this way, the error accumulation and camera trajectory drift that arise during long-term camera localization and three-dimensional map construction are alleviated.

Description

Visual SLAM loop detection improvement method based on image block region feature points
Technical Field
The invention belongs to the technical field of image processing and visual simultaneous localization and mapping (SLAM), and particularly relates to an improved visual SLAM loop detection method based on image block region feature points.
Background Art
During long-term operation of a visual SLAM system, the visual sensor is easily affected by environmental noise, which causes errors to accumulate continuously; this accumulation of errors eventually leads to serious distortions in the localization and mapping results. To solve this key problem, a loop detection module is added to the visual SLAM framework: the camera acquires environment information and judges whether it has returned to a previously visited position. An accurate loop detection module can therefore provide a reliable basis for the robustness of the visual SLAM system, and is of great significance for the reasonable optimization of the camera's overall motion trajectory and of the map.
In a visual SLAM system, the loop detection module plays a key role in camera localization and map construction. According to the implementation principle, loop detection methods can be broadly classified into loop detection algorithms based on geometric information and loop detection methods based on appearance information. Loop detection based on appearance information mainly performs loop detection through image matching; it relies on the color and texture changes of the images and is easily affected by ambient illumination. The loop detection algorithm based on geometric information judges whether the current position is near some past historical position; it is simple and easy to implement, and thanks to properties such as rotation invariance it can overcome the noise produced when the camera rotates. In addition, according to the geometric feature information, loop detection based on feature points can extract key points and descriptors from the image to help complete data association.
The feature point method using geometric information completes data association between image frames mainly by extracting local regions of the input image that are rich in texture information, and then computes the camera pose and the coordinates of the corresponding 3D space points. The feature point method based on geometric information is not easily affected by changes in camera pose. To retain the advantages of descriptors, the input image information is divided into image blocks, and point feature extraction and description are carried out in the regions where the image block features are salient. However, for ordinary global image feature points, descriptor extraction and matching are computationally expensive and easily affected by noise changes, so it is significant to alleviate the accumulation of camera pose errors by means of image block information.
Disclosure of Invention
Aiming at the above problems, the method extracts feature points in image block regions: the gradient direction histogram is calculated per image block, valid image blocks are judged using weights, and loop detection is performed through the valid image blocks, so that a great amount of time is prevented from being wasted in weak-texture regions whose matching features are not obvious.
Therefore, in order to overcome the error accumulation and camera trajectory drift caused during long-term camera localization and three-dimensional map construction, the method first converts the input image into a grayscale image, divides it into image blocks with a grid, and calculates the gradient direction histogram of the feature points in each image block to obtain the gradient vectors of the feature points. The gradient vector weight of the feature points in each image block is then judged: an image block whose gradient vector weight does not reach the set threshold is called an invalid image block, and an image block whose gradient vector weight reaches the set threshold is called a valid image block. Finally, a vocabulary tree model is constructed from the valid image blocks in the current frame and the gradient vectors of the corresponding feature points, and the content information of the vocabulary tree is continuously iterated and updated. In the loop detection stage, the similarity between two frames of images is measured by searching the updated vocabulary tree model, so that better image retrieval and matching results are obtained and more accurate loop detection is realized. On this basis, the present invention has been completed. The general technical scheme is as follows: the image is divided into blocks by a grid and the gradient histogram of each image block is calculated; feature extraction is performed on the valid image blocks of the current frame through the gradient direction histograms; a bag-of-words model and a vocabulary tree are constructed; and finally the similarity of two frames of images in the bag-of-words model is judged to complete loop detection.
The technical scheme provided by the invention is as follows:
an improved visual SLAM loop detection method based on image block region feature points comprises the following steps:
Step one, acquiring image data of a current frame, converting the image data of the current frame into a grayscale image, and dividing the grayscale image of the current frame into different sub-image blocks.
Step two, dividing the whole image into a grid according to the size of the input image of the current frame. Preferably, in order to limit the size of the image blocks while ensuring that the gradient direction histogram information of the feature points in each block is reflected, the grid size is adapted to the size of the input image; that is, for an image of size h × w, it is partitioned using a grid of h/10 × w/10.
Step three, extracting feature points on the divided grayscale image blocks and calculating the gradient direction histogram, with the gradient magnitude set to p(x, y) and the gradient direction set to θ(x, y). The gradient vector of the feature points is as follows:
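The formula referenced above appears only as an image in the original publication and is not reproduced in this text. A standard definition consistent with the gradient magnitude p(x, y) and gradient direction θ(x, y) used for gradient direction histograms, given as an assumption rather than the patent's verbatim formula, is:

    p(x, y) = \sqrt{\bigl(I(x+1, y) - I(x-1, y)\bigr)^{2} + \bigl(I(x, y+1) - I(x, y-1)\bigr)^{2}}

    \theta(x, y) = \arctan\frac{I(x, y+1) - I(x, y-1)}{I(x+1, y) - I(x-1, y)}

where I(x, y) denotes the grayscale value at pixel (x, y).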
and fourthly, obtaining h/10 Xw/10 image blocks through grid division, calculating gradient direction histograms of the feature points in each image block, and counting gradient vectors of the feature points in each image block.
Step five, through the calculation of the gradient vectors of the feature points in the image block, a weight set W is used to represent the total weight value of the feature points in the image block, and the gradient direction histogram vector set W = {w_1, w_2, ..., w_n} is set, with mean weight w_m.
Step six, calculating the weights of the feature point gradient vectors in any image block W_i in the current frame and in the image blocks adjacent to it (the upper, lower, left and right adjacent image blocks). If the gradient vector weight W_i reaches the mean weight w_m, the block is regarded as a valid image block; otherwise it is an invalid image block.
Step seven, starting the search from the valid image blocks, searching for valid image blocks whose weight can meet the mean weight w_m and adding their weight values to the weight set; for each newly added valid weight value, the mean weight w_m is updated, until no new valid image block can be added.
Step eight, setting the corresponding image sequence and image block set numbers contained in the current frame, the image blocks and the feature points, and then normalizing the gradient direction histogram vectors of the valid image blocks in the current frame into unit vectors to obtain the feature vectors of the valid image blocks.
Step nine, adding the valid image block information of the current frame into the bag-of-words model to construct the vocabulary tree.
Step ten, when the next frame image arrives, repeating the above steps, judging whether the similarity between the current valid image blocks and the image frames in the bag-of-words model reaches the weight threshold, and if the condition is met, the loop detection is considered valid.
Preferably, in step seven, starting from the valid image blocks, the valid image blocks that can satisfy the mean weight w_m are searched for; the mean weight w_m serves as a dynamic weight, and whenever another valid image block is added, the mean weight w_m is dynamically updated.
Preferably, in step eight, the feature vector of a valid image block contains the magnitude and direction of the features, serves as the descriptor of the region features, and is an important factor for loop detection.
Preferably, in step ten, the weight threshold is a set weight parameter; when the similarity between the two frames of images is greater than or equal to the set weight parameter, the loop detection is considered valid.
The invention has the beneficial effects that:
(1) The invention extracts and describes feature points only in the valid image blocks, which prevents the algorithm from wasting a large amount of time on invalid image blocks whose features are not salient and greatly improves the real-time efficiency of the algorithm;
(2) The similarity between images is measured through the bag-of-words model, and the vocabulary tree in the bag-of-words model is continuously updated as the algorithm iterates, so that loop detection can be realized accurately and efficiently;
(3) In addition, the invention can also be applied to loop detection in fields such as disinfection and epidemic-prevention robots, warehouse logistics robots, unmanned autonomous driving, AR/VR, and military rescue.
Drawings
In order to present the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly described below.
FIG. 1 is a block diagram of an improved method for visual SLAM loop detection based on image block region feature points;
FIG. 2 is a schematic illustration of meshing an input image;
FIG. 3 is a schematic diagram of gradient locations and direction vectors of feature points in an image block;
FIG. 4 is a schematic diagram of valid image blocks and invalid image blocks;
FIG. 5 shows the weight values of the valid image blocks of each frame image;
FIG. 6 is a schematic diagram of a vocabulary tree structure in a bag of words model.
Detailed Description
In order to further understand and appreciate the method of the present invention, the technical solution of the embodiments of the present invention is described below with reference to the accompanying drawings.
Example 1
Referring to fig. 1, the invention provides an improved visual SLAM loop detection method based on image block region feature points, comprising the following steps:
Preparation: as shown in the schematic diagram of FIG. 2, when the camera collects one frame of picture data, the image data is in RGB format, and the RGB image is converted into a grayscale image. Each pixel in the grayscale image has a corresponding value in the range 0-255, where 0 represents black and 255 represents white; the pixel value at a specific location can be obtained by indexing the row and column coordinates of the pixel grid.
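A minimal sketch of this preparation step, assuming OpenCV and NumPy are used (the patent does not prescribe a library, and the random frame below is only a stand-in for real camera data):

    import cv2
    import numpy as np

    # Stand-in for one frame of RGB picture data from the camera (OpenCV uses BGR order);
    # a real frame would come from cv2.VideoCapture or cv2.imread instead.
    frame_bgr = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

    # Convert to a single-channel grayscale image with pixel values in the range 0-255.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # The pixel value at a specific location is read by indexing the row (y) and column (x).
    y, x = 120, 240
    print(gray.shape, gray.dtype, int(gray[y, x]))   # (480, 640) uint8 <value in 0-255>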
Referring to fig. 2, step one, obtaining image data of a current frame, converting the image data of the current frame into a gray scale image, and dividing the gray scale image of the current frame into different sub-image blocks.
Step two, dividing the whole image into a grid according to the size of the input image of the current frame. In order to limit the size of the image blocks and ensure that the gradient direction histogram information of the feature points in each block is reflected, the grid size is adapted to the size of the input image; that is, for an image of size h × w, it is partitioned using a grid of h/10 × w/10.
Specifically, the input image size is 600 × 800, and it is divided using a 60 × 80 mesh.
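A minimal NumPy sketch of the grid division, under one reading of the text in which h/10 × w/10 is the block size, so that the 600 × 800 example with a 60 × 80 mesh yields a 10 × 10 grid of blocks; the text could also be read as a 60 × 80 grid of 10 × 10-pixel blocks, so treat this only as an illustration:

    import numpy as np

    def divide_into_blocks(gray: np.ndarray, grid: int = 10):
        """Split a grayscale image into a grid x grid arrangement of sub-image blocks.

        Leftover border pixels (when h or w is not divisible by the grid count) are
        cropped; the patent does not specify border handling, so that is an assumption.
        """
        h, w = gray.shape
        bh, bw = h // grid, w // grid                  # block size, e.g. 60 x 80 pixels
        blocks = (gray[:grid * bh, :grid * bw]
                  .reshape(grid, bh, grid, bw)
                  .swapaxes(1, 2))                     # shape: (grid, grid, bh, bw)
        return blocks

    gray = np.random.randint(0, 256, (600, 800), dtype=np.uint8)   # stand-in image
    print(divide_into_blocks(gray).shape)                          # (10, 10, 60, 80)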
Referring to FIG. 3, step three, feature point extraction is performed on the divided grayscale image blocks and the gradient direction histogram is calculated, with the gradient magnitude set to p(x, y) and the gradient direction set to θ(x, y). The gradient vector of the feature points is as follows:
specifically, the green area is a divided image block, and the gradient direction histogram is calculated for the feature points in the image block, including the position and direction of the feature points, so as to obtain gradient vectors of the feature points.
Step four, obtaining the h/10 × w/10 image blocks through grid division, calculating the gradient direction histogram of the feature points in each image block, and counting the gradient vectors of the feature points in each image block.
Referring to FIG. 4, step five, through the calculation of the gradient vectors of the feature points in the image block, a weight set W is used to represent the total weight value of the feature points in the image block, and the gradient direction histogram vector set W = {w_1, w_2, ..., w_n} is set, with mean weight w_m.
Step six, calculating the weights of the feature point gradient vectors in any image block W_i in the current frame and in the image blocks adjacent to it (the upper, lower, left and right adjacent image blocks). If the gradient vector weight W_i reaches the mean weight w_m, the block is regarded as a valid image block; otherwise it is an invalid image block.
Specifically, the green regions are valid image blocks because their weight W reaches the mean weight w_m, while the blue regions do not reach the mean weight and are regarded as invalid image blocks.
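A minimal sketch of the weight computation and valid/invalid classification, assuming the block weight w_i is the sum of its gradient direction histogram (the patent only states that the weight is derived from the feature point gradient vectors):

    import numpy as np

    def classify_blocks(histograms: np.ndarray):
        """Given one gradient direction histogram per block (shape: n_blocks x n_bins),
        take the sum of each histogram as the block weight w_i, collect the weight set
        W = {w_1, ..., w_n}, compute the mean weight w_m, and mark blocks whose weight
        reaches w_m as valid."""
        weights = histograms.sum(axis=1)          # w_i for each block
        w_m = weights.mean()                      # mean weight w_m
        valid = weights >= w_m                    # True = valid block, False = invalid block
        return weights, w_m, valid

    histograms = np.abs(np.random.randn(100, 8))  # stand-in: 100 blocks, 8-bin histograms
    weights, w_m, valid = classify_blocks(histograms)
    print(round(float(w_m), 3), int(valid.sum()), "valid blocks out of", len(valid))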
Referring to FIG. 5, step seven, starting the search from the valid image blocks, valid image blocks whose weight meets the mean weight w_m are searched for and their weight values are added to the weight set; for each newly added valid weight value, the mean weight w_m is updated, until no new valid image block can be added.
Step eight, setting the corresponding image sequence and image block set numbers contained in the current frame, the image blocks and the feature points, and then normalizing the gradient direction histogram vectors of the valid image blocks in the current frame into unit vectors to obtain the feature vectors of the valid image blocks.
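A brief sketch of the unit-vector normalization of step eight (the bin count and values below are illustrative only):

    import numpy as np

    def block_descriptor(hist: np.ndarray):
        """Normalize the gradient direction histogram vector of a valid image block
        to a unit vector, which then serves as the block's feature vector."""
        norm = np.linalg.norm(hist)
        return hist / norm if norm > 0 else hist       # guard against an all-zero histogram

    hist = np.array([3.0, 1.0, 0.5, 2.0, 0.0, 4.0, 1.5, 0.5])   # stand-in 8-bin histogram
    desc = block_descriptor(hist)
    print(desc, np.linalg.norm(desc))                            # unit-length feature vector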
Referring to FIG. 6, step nine, adding the valid image block information of the current frame into the bag-of-words model to construct the vocabulary tree.
Specifically, the image frames acquired by the camera continuously construct and update the node information of the vocabulary tree, and the node information of the vocabulary tree is constructed and updated from the features of the valid image blocks.
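A minimal bag-of-words sketch, assuming scikit-learn's k-means for the visual words. The patent builds a multi-level vocabulary tree; for brevity this example uses a flat, one-level vocabulary, and a real vocabulary tree would apply k-means recursively at each node (as in DBoW-style implementations):

    import numpy as np
    from sklearn.cluster import KMeans

    class FlatVocabulary:
        """Flat (one-level) stand-in for the vocabulary tree built from the
        valid-block feature vectors."""

        def __init__(self, n_words: int = 50):
            self.n_words = n_words
            self.kmeans = None

        def build(self, descriptors: np.ndarray):
            # descriptors: one row per valid image block (unit-normalized histograms)
            self.kmeans = KMeans(n_clusters=self.n_words, n_init=10, random_state=0)
            self.kmeans.fit(descriptors)

        def frame_vector(self, descriptors: np.ndarray) -> np.ndarray:
            # Represent a frame by the normalized histogram of its visual words.
            words = self.kmeans.predict(descriptors)
            hist = np.bincount(words, minlength=self.n_words).astype(float)
            return hist / (np.linalg.norm(hist) + 1e-12)

    # Stand-in data: 500 valid-block descriptors with 8 bins each.
    rng = np.random.default_rng(0)
    vocab = FlatVocabulary(n_words=20)
    vocab.build(rng.random((500, 8)))
    print(vocab.frame_vector(rng.random((80, 8))).shape)    # (20,)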
Step ten, when the next frame image arrives, the above steps are repeated, and whether the similarity between the current valid image blocks and the image frames in the bag-of-words model reaches the weight threshold is judged; if the condition is met, the loop detection is considered valid.
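A minimal sketch of the similarity judgement of step ten, assuming cosine similarity between bag-of-words frame vectors and an illustrative weight threshold of 0.8 (the patent only states that a set weight parameter is used):

    import numpy as np

    def detect_loop(current_vec: np.ndarray, past_vecs: list, weight_threshold: float = 0.8):
        """Score the current frame vector against past frame vectors and report a loop
        when the best score reaches the set weight threshold."""
        if not past_vecs:
            return False, -1, 0.0
        scores = np.array([float(np.dot(current_vec, v)) for v in past_vecs])  # unit vectors -> cosine
        best = int(np.argmax(scores))
        return bool(scores[best] >= weight_threshold), best, float(scores[best])

    # Usage sketch: past_vecs would hold the frame vectors accumulated so far.
    v1 = np.array([0.6, 0.8, 0.0])
    v2 = np.array([0.0, 0.6, 0.8])
    print(detect_loop(np.array([0.6, 0.8, 0.0]), [v1, v2]))   # (True, 0, 1.0)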
Example 2
The embodiment is implemented with a handheld Kinect RGB-D camera on the basis of the above technical scheme; the resolution is 640 × 480, and 1700 groups of data were collected in an indoor environment for model training and verification. Through a comparison experiment with the traditional ORB algorithm (Oriented FAST and Rotated BRIEF), the effectiveness and accuracy of the whole method are demonstrated by comparing the model training results with the ground-truth loop similarity scores. Because current program algorithms cannot, like a human brain, accurately judge whether two images are similar or were shot from the same place and at the same angle, perception deviation and perception variation can occur. Therefore, recall and precision are compared at the same time to evaluate the effectiveness of the model, calculated as follows:
Precision = number of correct loop image frames extracted by the algorithm / number of loop image frames extracted
Recall = number of correct loop image frames extracted by the algorithm / number of loop image frames in the sample
Precision describes the probability that the loops extracted by the algorithm are truly loops. Recall refers to the probability that real loops are correctly detected among all real loops. The comparison results are as follows:
table 1 algorithm comparison
Method Precision Recall
Orb(Baseline) 0.683 0.772
PatchUp(ours) 0.718 0.816
In Table 1, compared with the traditional ORB algorithm: because the ORB algorithm adopts binary descriptors, it is sensitive to indoor image noise and changes in environmental brightness, and its loop detection precision is lower than that of the PatchUp method based on image block region feature points proposed by the invention; the probability that a detected loop is a true loop is improved to 0.718, which fully demonstrates the accuracy of the method. In addition, the ORB algorithm is weaker at handling scale changes in indoor environment images and is not as efficient and stable as the proposed PatchUp method when the handheld RGB-D camera undergoes large rotational changes during feature point extraction; the recall of 0.816 fully demonstrates the better efficiency of the method of the invention.
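A small illustration of how the precision and recall figures in Table 1 are defined; the frame indices below are hypothetical and not the data of this embodiment:

    def precision_recall(detected_frames, true_loop_frames):
        """Compute the precision and recall defined above.
        detected_frames: loop image frames extracted by the algorithm.
        true_loop_frames: loop image frames actually present in the sample."""
        detected = set(detected_frames)
        truth = set(true_loop_frames)
        correct = len(detected & truth)                     # correct loop frames extracted
        precision = correct / len(detected) if detected else 0.0
        recall = correct / len(truth) if truth else 0.0
        return precision, recall

    # Toy usage with hypothetical frame indices:
    print(precision_recall([10, 42, 77, 90], [10, 42, 55, 90, 120]))   # (0.75, 0.6)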

Claims (9)

1. A visual SLAM loop detection improvement method based on image block region feature points, characterized by comprising the following steps:
step one, acquiring image data of a current frame, converting the image data of the current frame into a gray image, and dividing the gray image of the current frame into different sub-image blocks;
step two, dividing grids of the whole image according to the size of the input image of the current frame;
step three, extracting feature points on the divided grayscale image blocks and calculating a gradient direction histogram, with the gradient magnitude of the gradient direction histogram set to p(x, y) and the gradient direction set to θ(x, y);
step four, obtaining h/10 × w/10 image blocks through grid division, calculating the gradient direction histogram of the feature points in each image block, and counting the gradient vectors of the feature points in each image block;
step five, through the calculation of the gradient vectors of the feature points in the image block, using a weight set W to represent the total weight value of the feature points in the image block, and setting the gradient direction histogram vector set W = {w_1, w_2, ..., w_n}, with the mean weight being w_m;
step six, calculating the weights of the feature point gradient vectors in any image block W_i in the current frame and in its upper, lower, left and right adjacent image blocks, as well as the positions of the image blocks; if the gradient vector weight W_i reaches the mean weight w_m, the block is regarded as a valid image block, otherwise it is an invalid image block;
step seven, starting from the valid image blocks, searching for valid image blocks whose weight can meet or exceed the mean weight w_m and adding their weight values to the weight set, and updating the mean weight w_m for each newly added valid weight value, until no new valid image block can be added;
step eight, setting the corresponding image sequence and image block set numbers contained in the current frame, the image blocks and the feature points, and then normalizing the gradient direction histogram vectors of the valid image blocks in the current frame into unit vectors to obtain the feature vectors of the valid image blocks;
step nine, adding the valid image block information of the current frame into the bag-of-words model to construct the vocabulary tree;
and step ten, when the next frame image arrives, repeating the above steps, judging whether the similarity between the current valid image blocks and the image frames in the bag-of-words model reaches the weight threshold, and if the condition is met, considering the loop detection valid.
2. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step one and step three, the grayscale image is converted from an RGB image, the image data acquired by the camera being an RGB image, and the local feature information can be enhanced to a certain extent through graying.
3. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step two, the grid size is adapted to the size of the input image, i.e. for an image of size h × w, a grid of h/10 × w/10 is used to divide it.
4. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step three, the gradient vector of the feature points is as follows:
5. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step seven, starting from the valid image blocks, valid image blocks that satisfy the mean weight w_m are searched for; the mean weight w_m serves as a dynamic weight, and when another valid image block is added, the mean weight w_m is dynamically updated.
6. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step eight, the feature vector of the valid image block contains the magnitude and direction of the features, serves as the descriptor of the region features, and is an important factor for loop detection.
7. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step nine, the bag-of-words model is composed of each valid image block extracted from the current frame image and its corresponding feature vector, and the vocabulary tree is composed of each corresponding frame image sequence and image block set number.
8. The visual SLAM loop detection improvement method based on image block region feature points according to claim 7, wherein in step nine, the node information of the vocabulary tree is continuously constructed and updated on the basis of the acquired image frames, and the node information of the vocabulary tree is constructed and updated as the features of the valid image blocks.
9. The visual SLAM loop detection improvement method based on image block region feature points according to claim 1, wherein in step ten, the weight threshold is a set weight parameter, and when the similarity between the two frames of images is greater than or equal to the set weight parameter, the loop detection is considered valid.
CN202310960132.8A 2023-08-01 2023-08-01 Visual SLAM loop detection improvement method based on image block region feature points Pending CN116958595A (en)

Priority Applications (1)

Application Number: CN202310960132.8A. Publication: CN116958595A (en). Priority Date: 2023-08-01. Filing Date: 2023-08-01. Title: Visual SLAM loop detection improvement method based on image block region feature points.

Publications (1)

Publication Number: CN116958595A (en). Publication Date: 2023-10-27.

Family

ID=88456428

Family Applications (1)

Application Number: CN202310960132.8A (Pending). Publication: CN116958595A (en). Priority Date: 2023-08-01. Filing Date: 2023-08-01. Title: Visual SLAM loop detection improvement method based on image block region feature points.

Country Status (1)

Country Link
CN (1) CN116958595A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315274A (en) * 2023-11-28 2023-12-29 淄博纽氏达特机器人系统技术有限公司 Visual SLAM method based on self-adaptive feature extraction
CN117315274B (en) * 2023-11-28 2024-03-19 淄博纽氏达特机器人系统技术有限公司 Visual SLAM method based on self-adaptive feature extraction

Similar Documents

Publication Publication Date Title
Lee et al. Simultaneous traffic sign detection and boundary estimation using convolutional neural network
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
CN111640157B (en) Checkerboard corner detection method based on neural network and application thereof
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN111862213A (en) Positioning method and device, electronic equipment and computer readable storage medium
CN110991444B (en) License plate recognition method and device for complex scene
CN110472625B (en) Chinese chess piece visual identification method based on Fourier descriptor
CN109101981B (en) Loop detection method based on global image stripe code in streetscape scene
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN116958595A (en) Visual SLAM loop detection improvement method based on image block region feature points
CN112734844B (en) Monocular 6D pose estimation method based on octahedron
CN114882222A (en) Improved YOLOv5 target detection model construction method and tea tender shoot identification and picking point positioning method
CN110659637A (en) Electric energy meter number and label automatic identification method combining deep neural network and SIFT features
CN111998862A (en) Dense binocular SLAM method based on BNN
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
Wu et al. Location recognition algorithm for vision-based industrial sorting robot via deep learning
CN113205023B (en) High-resolution image building extraction fine processing method based on prior vector guidance
Zhang et al. Scale-adaptive NN-based similarity for robust template matching
CN102324043B (en) Image matching method based on DCT (Discrete Cosine Transformation) through feature description operator and optimization space quantization
CN113762278A (en) Asphalt pavement damage identification method based on target detection
CN112784869A (en) Fine-grained image identification method based on attention perception and counterstudy
CN107291813B (en) Example searching method based on semantic segmentation scene
CN111368637A (en) Multi-mask convolution neural network-based object recognition method for transfer robot
CN111353509B (en) Key point extractor generation method of visual SLAM system
CN115187614A (en) Real-time simultaneous positioning and mapping method based on STDC semantic segmentation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination