CN114372510A

CN114372510A - Inter-frame matching SLAM method based on image region segmentation

Info

Publication number: CN114372510A
Application number: CN202111540149.5A
Authority: CN
Inventors: 阮晓钢; 谭晨硕; 朱晓庆
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2022-04-19

Abstract

The invention discloses an inter-frame matching SLAM method based on image region segmentation. The image is first divided into regions; an evaluation scheme is then added to assess each region, and low-contrast regions, from which strongly distinctive feature points are difficult to extract, are excluded so that more robust feature points are obtained. Finally, the method is verified on a data set. The experimental results show that the improved ORB raises feature point matching by 4.1% and 5.8% compared with ORB-SLAM and ORB-SLAM2, respectively, and improves initialization by 111 frames and 65.4 frames, respectively. The method also achieves better results in pose estimation and real-time positioning, and improves the accuracy of feature point matching and positioning without loss of speed.

Description

Inter-frame matching SLAM method based on image region segmentation

Technical Field

The invention belongs to the field of artificial intelligence, and particularly relates to an inter-frame matching SLAM method based on image region segmentation for unknown indoor and outdoor environments.

Background

Positioning technologies are currently flourishing in great variety, including WIFI positioning, RFID positioning, infrared positioning, ultrasonic positioning, and visual SLAM. The particularities and limitations of indoor and outdoor positioning place special requirements on the positioning technology. WIFI positioning is mature, but it is strongly affected by the environment, which causes serious positioning drift, and it is easily disturbed by co-frequency signals. RFID positioning has a short operating range and no communication capability, is hard to integrate into other systems, and struggles to achieve accurate positioning. Infrared positioning is expensive to manufacture and strongly affected by noise, which makes it hard to popularize. Ultrasonic waves attenuate during transmission, which degrades positioning accuracy. Visual SLAM is gradually becoming the mainstream positioning technology thanks to its small size, low manufacturing cost, broad applicability, and rich information acquisition.

SLAM (simultaneous localization and mapping) aims to place a robot at an unknown position in a completely unknown environment and, as it moves, have it localize itself and sense the environment through a sensor (a camera), finally completing a map of the unknown environment and achieving autonomous localization and navigation. Visual SLAM is divided into three types according to the camera: monocular SLAM, binocular (stereo) SLAM, and RGB-D SLAM. Monocular SLAM is widely used because it is inexpensive, flexible, and convenient.

Extracting and matching image feature points is an important part of SLAM: the results directly determine the quality of inter-frame matching and whether pose estimation succeeds, and ultimately affect tracking and positioning. Existing methods that localize a robot by extracting and matching key points between frames have three problems: computing the key-point descriptors is time-consuming, so real-time performance can hardly be guaranteed; the extracted feature points are too concentrated, so most of the other potentially useful image information in the scene is ignored; and when the camera moves to a place lacking feature points, it cannot obtain enough scene information, so matching is interrupted and positioning fails.

Disclosure of Invention

The invention aims to overcome the above defects of the prior art. An improved FAST algorithm (Features from Accelerated Segment Test) and the BRIEF algorithm (Binary Robust Independent Elementary Features) are adopted to extract and describe the feature points. The technical implementation of the method is as follows:

S1.1 feature point extraction

The idea of the FAST algorithm is that if a pixel is markedly brighter or darker than most of the contiguous pixels in its neighborhood, the pixel can be regarded as a corner point to be extracted, i.e., a feature point. While traversing the image pixels, the current pixel P is selected first and its pixel value is denoted Ip. A circle of radius r = 3 centered on P passes through 16 pixels on its boundary; a threshold t is set and these 16 pixels are tested. If the pixel values of more than 12 contiguous pixels (FAST-12) are all greater than Ip + t or all less than Ip - t, the pixel P can be regarded as a feature point.
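The segment test described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `CIRCLE16`, `longest_circular_run`, `is_fast_corner`, and `min_run` are hypothetical, the image is assumed to be a list of rows of gray values, and the default `min_run=12` follows the conventional FAST-12 reading (at least 12 contiguous pixels), whereas the text says "more than 12".

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3,
# listed in order around the circle.
CIRCLE16 = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as circular."""
    n = len(flags)
    if all(flags):
        return n
    best = run = 0
    # Walking the list twice lets a run wrap around the end.
    for i in range(2 * n):
        if flags[i % n]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def is_fast_corner(img, x, y, t, min_run=12):
    """Segment test at pixel (x, y); (x, y) must be at least 3 pixels from the border."""
    ip = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE16]
    brighter = [v > ip + t for v in ring]
    darker = [v < ip - t for v in ring]
    return (longest_circular_run(brighter) >= min_run
            or longest_circular_run(darker) >= min_run)
```

Passing `min_run=13` gives the stricter "more than 12" reading of the text.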

A fixed number of feature points to extract is preset for each frame. If the extracted feature points are too concentrated, inter-frame matching can fail and positioning is interrupted. To avoid this, the method provides a region evaluation scheme based on image region segmentation, so that the feature points are distributed over the whole frame as far as possible. Without reducing the number of feature points extracted per frame or the number of successful inter-frame matches, and without affecting the per-frame extraction time, the image is first divided into a number of small square blocks, and the number of feature points to be extracted per frame is distributed evenly among the blocks so that the feature points are evenly distributed over the image. Then blocks with low contrast, which would not benefit matching even after feature points were extracted from them, are removed by thresholding, and pixel traversal is carried out only in the remaining blocks. In this way, more representative and more robust feature points are obtained without increasing the extraction time.
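As a small worked example of distributing the per-frame feature count evenly among blocks, the sketch below splits a total across a list of block indices. The function name `distribute_quota` is hypothetical, and the handling of the remainder (the first few blocks receive one extra point) is an assumption; the text does not say how a non-divisible total is handled.

```python
def distribute_quota(total, blocks):
    """Split `total` feature points as evenly as possible among `blocks`.

    Returns a dict mapping each block to its quota. Remainder handling
    (one extra point for the first blocks) is an assumption.
    """
    if not blocks:
        return {}
    base, extra = divmod(total, len(blocks))
    return {b: base + (1 if i < extra else 0) for i, b in enumerate(blocks)}
```

For 1000 feature points and a 15 × 15 grid (225 blocks), each block receives 4 points and the first 100 blocks receive a fifth, since 1000 = 225 × 4 + 100.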

Compared with traditional algorithms such as SIFT and SURF, the FAST algorithm greatly improves the efficiency of feature point extraction, but the extracted feature points lack scale invariance and rotation invariance.

For scale invariance, an image pyramid (Image Pyramid) is introduced. The idea is to down-sample the input image by a factor of 1.2, repeat the sampling seven times, and add the original image to obtain 8 layers of images with the same window and different scales. FAST corner detection is performed on each of the 8 layers, and the feature points retained after screening across all layers therefore cover 8 different scales.
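The layer sizes implied by a factor of 1.2 and 8 layers can be worked out with a short sketch. The function name `pyramid_sizes` and the rounding to the nearest pixel are assumptions made for illustration; the text does not specify how fractional sizes are rounded.

```python
def pyramid_sizes(width, height, scale=1.2, levels=8):
    """Return (level, scale_factor, width, height) for each pyramid layer."""
    layers = []
    for level in range(levels):
        factor = scale ** level  # level 0 is the original image
        layers.append((level, factor,
                       round(width / factor), round(height / factor)))
    return layers
```

For a 640 × 480 input, the coarsest layer is scaled by 1.2^7 ≈ 3.58, giving roughly 179 × 134 pixels. A feature point found at a given layer maps back to the original image by multiplying its coordinates by that layer's scale factor.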

For rotation invariance, the gray centroid method (Intensity Centroid) is introduced. The idea is to take the line connecting the feature point O with the gray-value centroid of the circular region centered on O with radius R as the direction of the feature point, which yields rotation invariance. First, take the region B centered on the feature point with radius R; the moment M_pq of region B is defined as

$$M_{pq} = \sum_{(x,y) \in B} x^{p} y^{q} I(x,y), \quad p, q \in \{0, 1\} \tag{1}$$

where (x, y) is a point in region B and I(x, y) is the gray value at (x, y). The centroid C of region B is then

$$C = \left( \frac{M_{10}}{M_{00}}, \frac{M_{01}}{M_{00}} \right) \tag{2}$$

Connecting the feature point O with the centroid C gives a direction vector, and the feature point direction θ is defined as

$$\theta = \arctan\left( \frac{M_{01}}{M_{10}} \right) \tag{3}$$

The method enables the feature points to have rotation invariance, and improves the robustness of expression of the feature points among different images.
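The gray centroid computation can be illustrated with the following sketch. It assumes a square patch of side 2R + 1 centered on the feature point, given as a list of rows of gray values, and the function name `orientation` is hypothetical. It uses `math.atan2(M01, M10)` rather than a plain arctangent of the ratio; this is a substitution made here so that all four quadrants are resolved and M10 = 0 does not cause a division by zero.

```python
import math


def orientation(patch, radius):
    """Gray centroid and direction of a patch centered on the feature point O.

    Only pixels inside the circle of the given radius contribute.
    Coordinates are relative to O, with y increasing downward as in image rows.
    """
    m00 = m10 = m01 = 0.0
    for row, line in enumerate(patch):
        for col, value in enumerate(line):
            x, y = col - radius, row - radius
            if x * x + y * y <= radius * radius:
                m00 += value
                m10 += x * value
                m01 += y * value
    centroid = (m10 / m00, m01 / m00) if m00 else (0.0, 0.0)
    # atan2 replaces arctan(m01 / m10): an assumption made for robustness.
    return centroid, math.atan2(m01, m10)
```

A patch whose only bright pixel lies directly to the right of O has its centroid on the positive x axis and a direction of 0; moving the bright pixel directly below O gives a direction of π/2 in image coordinates.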

S1.2 feature point description

Since ORB presets 1000 feature points to be extracted from each image and every one of them must be described, the descriptor should capture as much detail about the feature point as possible without occupying a large amount of memory.

The BRIEF descriptor is a binary string; it greatly reduces the number of pixel comparisons while keeping the descriptor efficient.

The idea of the BRIEF algorithm is to randomly select n pairs of points in the square neighborhood of the feature point and compare their gray values:

$$\tau(p; x, y) = \begin{cases} 1, & p(x) < p(y) \\ 0, & \text{otherwise} \end{cases}$$

where p(x) and p(y) are the gray values of the selected point pair. Each point pair contributes one value of the descriptor.

After the n point pairs have been compared, a binary descriptor of length n is obtained. n is the feature dimension, taken as 256 in the ORB algorithm. To make the descriptor more discriminative, both points x and y of each pair are sampled from a Gaussian distribution.
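A minimal sketch of the binary test follows, assuming the image is a list of rows of gray values. The names `make_pairs` and `brief_descriptor`, the neighborhood size of 31 pixels, and the Gaussian spread of one fifth of the neighborhood size are all assumptions made for illustration; the text states only that n = 256 and that the points follow a Gaussian distribution.

```python
import random


def make_pairs(n=256, patch_size=31, seed=0):
    """Draw n point pairs (x, y) from a Gaussian centered on the feature point."""
    rng = random.Random(seed)
    half = patch_size // 2
    sigma = patch_size / 5.0  # assumed spread, not taken from the text

    def sample():
        # Clamp so that every offset stays inside the square neighborhood.
        return tuple(max(-half, min(half, round(rng.gauss(0.0, sigma))))
                     for _ in range(2))

    return [(sample(), sample()) for _ in range(n)]


def brief_descriptor(img, cx, cy, pairs):
    """Return the descriptor as a list of bits: 1 if p(x) < p(y), else 0."""
    bits = []
    for (x1, y1), (x2, y2) in pairs:
        bits.append(1 if img[cy + y1][cx + x1] < img[cy + y2][cx + x2] else 0)
    return bits
```

The same list of pairs must be reused for every feature point, otherwise descriptors from different points are not comparable bit by bit.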

When feature points are paired, the descriptors of the two candidate points are compared using the Hamming distance. If the matching similarity exceeds the threshold, the two feature points are considered a match.

Compared with the prior art, the method reduces the amount of computation by cutting the redundant extraction of key points between frames while preserving matching quality, which improves real-time performance. It also distributes the feature points to be extracted uniformly over the whole frame, so that most of the potentially useful image information in the scene is captured and the continuity of feature point matching across the video stream is ensured.
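Hamming-distance matching of binary descriptors can be sketched as below. The function names, the brute-force nearest-neighbor search, and the `max_distance` value of 50 are assumptions for illustration; the text specifies only that a threshold on matching similarity is used. A small Hamming distance corresponds to a high similarity, so the similarity threshold is expressed here as an upper bound on the distance.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def match(desc_a, desc_b, max_distance=50):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    Returns (index_a, index_b, distance) tuples, keeping only pairs whose
    distance does not exceed max_distance. desc_b must not be empty.
    """
    matches = []
    for i, a in enumerate(desc_a):
        j, d = min(((j, hamming(a, b)) for j, b in enumerate(desc_b)),
                   key=lambda item: item[1])
        if d <= max_distance:
            matches.append((i, j, d))
    return matches
```

For 256-bit descriptors stored as integers instead of lists, the distance could equally be computed as the number of set bits in the XOR of the two values.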

Drawings

FIG. 1 is a schematic diagram of FAST feature points.

FIG. 2 is a diagram of the BRIEF descriptor point-pair sampling patterns.

FIG. 3 is a flowchart of the improved ORB method.

Detailed Description

FIG. 1 is a schematic diagram of FAST feature point extraction. Referring to FIG. 1, the idea of the FAST algorithm is that if a pixel is markedly brighter or darker than most of the contiguous pixels in its neighborhood, the pixel can be regarded as a corner point to be extracted, i.e., a feature point. While traversing the image pixels, the current pixel P is selected first and its pixel value is denoted Ip. A circle of radius r = 3 centered on P passes through 16 pixels on its boundary; a threshold t is set and these 16 pixels are tested. If the pixel values of more than 12 contiguous pixels (FAST-12) are all greater than Ip + t or all less than Ip - t, the pixel P can be regarded as a feature point.

FIG. 2 illustrates the BRIEF descriptor point-pair sampling patterns. Referring to FIG. 2, the point pairs (p, q) can be selected in roughly five ways:

1. p and q are both sampled uniformly within the image block;

2. p and q both follow the same Gaussian distribution;

3. p follows one Gaussian distribution and q follows a second Gaussian distribution;

4. p and q are sampled randomly from discrete positions on a spatially quantized polar grid;

5. p is fixed at (0, 0) and q is sampled uniformly around it.

The algorithm uses the second sampling pattern, i.e., both p and q follow the same Gaussian distribution.

FIG. 3 is the overall flowchart of the algorithm. The algorithm simplifies inter-frame feature point matching and distributes the feature points uniformly over the whole frame, thereby achieving self-localization of a robot that receives scene information through a monocular camera. The specific steps are as follows:

(1) Convert the current input frame to a grayscale image and smooth it with Gaussian filtering (a weighted average).

(2) Divide the processed image into 15 × 15 small blocks, each similar in shape to the original image.

(3) Calculate the gray-level standard deviation σ of each block.

(4) Set a lower threshold t₁ and directly discard all blocks with σ ≤ t₁.

(5) Sort the remaining blocks in each row by σ and find the block S with the smallest standard deviation in each row:

σ_min＝min{σ₁,σ₂,…,σ_n}

(6) Discard the block S to which σ_min belongs.

(7) Define the remaining blocks as the search domain for FAST feature point extraction.
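Steps (2) to (7) above can be sketched as follows, assuming the input is an already smoothed grayscale image given as a list of rows. The function name `search_domain`, the default threshold value, the use of the population standard deviation, and ignoring leftover edge pixels when the image size is not a multiple of the grid are all assumptions; the text does not give a value for t₁.

```python
from statistics import pstdev


def search_domain(gray, grid=15, t1=5.0):
    """Return the (row, col) indices of the blocks kept for FAST extraction."""
    h, w = len(gray), len(gray[0])
    bh, bw = h // grid, w // grid  # block size; edge remainder is ignored
    kept = []
    for r in range(grid):
        row_blocks = []
        for c in range(grid):
            pixels = [gray[y][x]
                      for y in range(r * bh, (r + 1) * bh)
                      for x in range(c * bw, (c + 1) * bw)]
            sigma = pstdev(pixels)  # step (3)
            if sigma > t1:          # step (4): drop blocks with sigma <= t1
                row_blocks.append((sigma, c))
        if row_blocks:
            # Steps (5)-(6): drop the remaining block with the smallest sigma.
            row_blocks.remove(min(row_blocks))
        kept.extend((r, c) for _, c in row_blocks)  # step (7)
    return kept
```

Note that, read literally, a row with only one block left after thresholding loses that block in step (6); the sketch follows that literal reading.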

Claims

1. An inter-frame matching SLAM method based on image region segmentation, characterized in that the method is implemented by the following steps:

S1.1 feature point extraction;

while traversing the image pixels, first selecting a current pixel P and denoting its pixel value as Ip; taking a circle of radius r = 3 centered on P, whose boundary passes through 16 pixels, setting a threshold t, and testing these 16 pixels; if the pixel values of more than 12 contiguous pixels (FAST-12) are all greater than Ip + t or all less than Ip - t, the pixel P is regarded as a feature point;

presetting a number of feature points to extract for each frame; without reducing the number of feature points extracted per frame or the number of successful inter-frame matches, and without affecting the per-frame extraction time, first dividing the image into a plurality of small square blocks and distributing the number of feature points to be extracted per frame evenly among the blocks, so that the feature points are evenly distributed over the image; then removing by thresholding the blocks with low contrast, which would not benefit matching even after feature points were extracted from them, and carrying out pixel traversal in the remaining blocks, so as to obtain more representative and more robust feature points without increasing the extraction time;

introducing an image pyramid, down-sampling the input image by a factor of 1.2, repeating seven times, and adding the original image to obtain 8 layers of images with the same window and different scales; performing FAST corner detection on each of the 8 layers, so that the feature points retained after screening across all layers cover 8 different scales;

for rotation invariance, introducing a gray centroid method, and taking the line connecting the feature point O with the gray-value centroid of the circular region centered on O with radius R as the direction of the feature point, to obtain rotation invariance; first, taking the region B centered on the feature point with radius R, the moment M_pq of region B being defined as

$$M_{pq} = \sum_{(x,y) \in B} x^{p} y^{q} I(x,y), \quad p, q \in \{0, 1\} \tag{1}$$

wherein (x, y) is a point in region B, and I(x, y) is the gray value at (x, y); the centroid C of region B is then

$$C = \left( \frac{M_{10}}{M_{00}}, \frac{M_{01}}{M_{00}} \right) \tag{2}$$

and the feature point direction θ is defined as

$$\theta = \arctan\left( \frac{M_{01}}{M_{10}} \right) \tag{3}$$

S1.2 feature point description

since ORB presets 1000 feature points to be extracted from each image, each feature point is described;

the BRIEF algorithm randomly selects n pairs of points in the square neighborhood of the feature point and compares their gray values:

$$\tau(p; x, y) = \begin{cases} 1, & p(x) < p(y) \\ 0, & \text{otherwise} \end{cases}$$

wherein p(x) and p(y) are the gray values of the selected point pair; each point pair contributes one value of the descriptor;

after the n point pairs have been compared, a binary descriptor of length n is obtained; n is the feature dimension, taken as 256 in the ORB algorithm; both points x and y of each pair follow a Gaussian distribution;

when matching feature points, the descriptors of the two candidate points are compared using the Hamming distance; if the matching similarity exceeds the threshold, the two feature points are considered a match.