CN112396593A - Closed loop detection method based on key frame selection and local features - Google Patents


Info

Publication number
CN112396593A
Authority
CN
China
Prior art keywords: image, input image, current input, closed, key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011360902.8A
Other languages
Chinese (zh)
Other versions
CN112396593B (en)
Inventor
宋海龙
游林辉
胡峰
孙仝
陈政
张谨立
黄达文
王伟光
梁铭聪
黄志就
何彧
陈景尚
谭子毅
尤德柱
区嘉亮
陈宇婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202011360902.8A priority Critical patent/CN112396593B/en
Publication of CN112396593A publication Critical patent/CN112396593A/en
Application granted granted Critical
Publication of CN112396593B publication Critical patent/CN112396593B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20164 Salient point detection; Corner detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)

Abstract

The invention relates to a closed-loop detection method based on key frame selection and local features. Key frames are selected through KLT sparse optical flow tracking, so the motion speed of the mobile robot need not be considered; images captured at turns are handled well, and the selected key frames are more representative. Selecting key frames also reduces the computation required during matching, which increases the detection speed of the whole method.

Description

Closed loop detection method based on key frame selection and local features
Technical Field
The invention relates to the field of vision-based positioning and navigation in autonomous inspection by unmanned aerial vehicles, and in particular to a closed-loop detection method based on key frame selection and local features.
Background
During intelligent UAV inspection, the unmanned aerial vehicle must autonomously decide which operations to perform based on environmental information. Autonomous positioning and the sensing and construction of an environment map are therefore key links in autonomous UAV inspection. In recent years, the development of visual SLAM (simultaneous localization and mapping) technology has improved the autonomous localization and mapping capability of mobile robots. Closed-loop detection is an important component of a visual SLAM system: it detects whether the mobile robot has returned to a previously visited place, and plays an extremely important role in reducing the robot's positioning error and constructing a globally consistent environment map. Closed-loop detection matches the current frame against key frames and judges whether a loop is closed according to the matching degree, so correct key frame selection is crucial to closed-loop detection.
The Chinese patent application with publication number CN109902619A, published on June 18, 2019, discloses an image closed-loop detection method and system comprising the following steps: extract FAST corners from each frame and compute BRIEF descriptors; look the BRIEF descriptors up in a pre-built bag-of-words model to obtain the corresponding visual words, which are used to build a vector description of the image; judge with a tracking prediction algorithm whether the current image is likely to produce a closed loop, and predict the likely position of the closed loop to obtain a closed-loop candidate set; evaluate the similarity between the current image and each image in the closed-loop candidate set through the visual word vectors, taking the most similar image in the candidate set as the candidate image; normalize the candidate image; and compute an ORB global descriptor of the normalized image to complete the structural check of the candidate image. That invention can effectively accelerate the detection algorithm and provide more accurate closed-loop detection performance.
That method belongs to the family of closed-loop detection methods based on the visual bag-of-words model: local feature points and descriptors are extracted from the input image, a BoW vector representation of the input image is obtained with the help of a visual dictionary, and a tracking prediction algorithm judges whether a loop is closed. Closed-loop detection based on the visual bag-of-words model is fairly robust to changes of viewing angle, but has difficulty coping with appearance changes. The method also lacks key frame selection and takes candidate images on similarity alone, so the computational load is large, which hurts the final detection speed.
Disclosure of Invention
The invention aims to solve the problem of slow detection speed in the prior art, and provides a closed-loop detection method based on key frame selection and local features.
In order to solve the technical problems, the invention adopts the technical scheme that: a closed loop detection method based on key frame selection and local features comprises the following steps:
Step one: an input image is acquired by the mobile robot; the first frame of the input image sequence is determined to be a key frame; the Shi-Tomasi corners of the key frame preceding the current input image are extracted and tracked iteratively in the current input image with a sparse optical flow tracking algorithm; if the number of corners that cannot be tracked is greater than a threshold, the current input image is determined to be a new key frame;
Step two: global features are extracted from the current input image with a convolutional neural network trained on an image classification dataset; if the current input image is a key frame, the extracted global features are inserted into the hierarchical navigable small world (HNSW) graph of an approximate nearest neighbor retrieval algorithm;
Step three: within the retrieval range of the current input image, the key frame most similar to the current input image is retrieved through HNSW as the closed-loop candidate key frame of the current image, and all images between the closed-loop candidate key frame and the next key frame are taken as a closed-loop candidate image queue;
Step four: a geometric consistency check is introduced; ORB feature points and the corresponding local difference binary (LDB) descriptors are extracted from the input image and from the retrieved closed-loop candidate images, and the descriptors of the input image are matched against those of each image in the closed-loop candidate image queue;
Step five: the closed-loop candidate image whose LDB descriptors best match those of the current input image is taken as the optimal closed-loop candidate image, and the matched feature points of the two images are input into a random sample consensus (RANSAC) algorithm to further eliminate mismatches and solve for the fundamental matrix; if the number of inliers between the two images is less than a threshold, the two images do not form a closed loop; if it is greater than the threshold, the two images may form a closed loop;
Step six: a temporal consistency check is introduced; if the 2 consecutive frames following the current input image both satisfy the threshold condition of step five, the input image and the closed-loop candidate image are considered to form a closed loop.
Preferably, in step one, the corners are iteratively tracked in the current input image using the sparse optical flow tracking algorithm KLT, specifically:
The previous key frame of the current input image I_i is denoted I_{k-1}. Images I_i and I_{k-1} are converted to grayscale, giving G_i and G_{k-1}. The Shi-Tomasi corners of image G_{k-1} are extracted; assuming the brightness of a pixel in I_i and I_{k-1} stays constant before and after its motion, the position P(x + dx, y + dy) and the optical flow d = (dx, dy) in image G_i of each corner P(x, y) of G_{k-1} are computed.
The specific calculation steps are as follows. The current input image I_i is converted to grayscale, giving G_i; the grayscale image of its previous key frame is G_{k-1}, from which the Shi-Tomasi corners are extracted. Gaussian pyramid transforms are applied to G_{k-1} and G_i, producing L layers of images at different resolutions. In layer L_m, suppose the corner P(x, y) of G_{k-1} moves to the point P(x + dx, y + dy) in G_i, taking a time dt. Because the brightness of a pixel is unchanged before and after its motion between the two images:

I(x, y, t) = I(x + dx, y + dy, t + dt)    (1)

where I(x, y, t) is the brightness of pixel P(x, y) at time t, and I(x + dx, y + dy, t + dt) is the brightness of the shifted pixel P(x + dx, y + dy) in image G_i. By Taylor's formula, I(x + dx, y + dy, t + dt) can be expanded as:

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt + ε    (2)

where ε is a higher-order infinitesimal and can be neglected. Equation (1) can therefore be simplified to:

I(x, y, t) = I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt    (3)

(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0    (4)

Dividing both sides by dt:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0    (5)

Let u and v be the velocity components of the optical flow along the X axis and Y axis respectively, i.e.

u = dx/dt,  v = dy/dt    (6)

In addition, denote

I_x = ∂I/∂x,  I_y = ∂I/∂y,  I_t = ∂I/∂t    (7)

Equation (5) can then be written as:

I_x u + I_y v + I_t = 0    (8)

Assuming the pixels around P(x, y) move the same distance as P(x, y), a window of size (5, 5) is taken around P(x, y); for the 25 pixels p_1, ..., p_25 in the window:

[I_x(p_1) I_y(p_1); ... ; I_x(p_25) I_y(p_25)] (u, v)^T = −(I_t(p_1), ..., I_t(p_25))^T    (9)

The least-squares method is used to solve this over-determined system so that the sum of matching errors within the window is minimized. Equation (9) can be abbreviated as:

A d = b    (10)

Multiplying both sides by A^T:

(A^T A) d = A^T b    (11)

The velocity components u and v of the optical flow along the X axis and Y axis are then obtained as:

d = (u, v)^T = (A^T A)^{−1} A^T b    (12)

From the solved u and v, the position P(x + dx, y + dy) and the optical flow d = (dx, dy) of the corner P(x, y) in layer L_m of image G_i can be calculated. The optical flow obtained at layer L_m is taken as the initial value at layer L_{m-1}, where a refined value is computed, and so on down to the lowest layer L_0 (the original image), giving the final optical flow and the tracked corner P(x + dx, y + dy).
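For concreteness, the per-window least-squares solve of equations (9)-(12) can be sketched in Python with numpy; the function name and the gradient inputs below are illustrative assumptions, not code from the patent:

    import numpy as np

    def solve_window_flow(Ix, Iy, It):
        # Solve I_x*u + I_y*v = -I_t over a 5x5 window by least squares.
        # Ix, Iy, It: (5, 5) arrays of spatial/temporal gradients sampled
        # around the tracked corner. Returns d = (u, v), i.e. equation (12).
        A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # 25 x 2 matrix A
        b = -It.ravel()                                 # 25-vector b
        # Solving the normal equations (A^T A) d = A^T b; lstsq is the
        # numerically safer equivalent of (A^T A)^{-1} A^T b.
        d, *_ = np.linalg.lstsq(A, b, rcond=None)
        return d                                        # d = [u, v]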
Preferably, the current input image is determined to be a new key frame when the number of corners that cannot be tracked is greater than the threshold, specifically:
When the corners of key frame image G_{k-1} are tracked in the current input image G_i by KLT sparse optical flow, tracking is considered to have failed if:
(1) the corner P(x, y) moves out of the image range of G_i;
(2) the sum of matching errors in the neighborhood of the matched corner is greater than a threshold.
If the number of corners whose tracking failed is greater than the set threshold, the current input image I_i is considered a new key frame.
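A minimal sketch of this key frame rule, using OpenCV's built-in Shi-Tomasi detector and pyramidal KLT tracker, is given below; the detector parameters, error bound, and lost-corner threshold are assumed values for illustration:

    import cv2

    def is_new_keyframe(prev_key_gray, cur_gray, lost_thresh=50):
        # Track Shi-Tomasi corners of the last key frame into the current
        # frame with pyramidal KLT; declare a new key frame when too many
        # corners are lost (out of frame, or tracking error too large).
        corners = cv2.goodFeaturesToTrack(prev_key_gray, maxCorners=300,
                                          qualityLevel=0.01, minDistance=7)
        if corners is None:
            return True
        pts, status, err = cv2.calcOpticalFlowPyrLK(
            prev_key_gray, cur_gray, corners, None,
            winSize=(5, 5), maxLevel=3)
        h, w = cur_gray.shape
        inside = ((pts[:, 0, 0] >= 0) & (pts[:, 0, 0] < w) &
                  (pts[:, 0, 1] >= 0) & (pts[:, 0, 1] < h))
        ok = (status.ravel() == 1) & inside & (err.ravel() < 20.0)
        lost = len(corners) - int(ok.sum())
        return lost > lost_thresh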
Preferably, in step two, extracting global features from the current input image with a convolutional neural network trained on an image classification dataset specifically comprises: the current input image I_i is preprocessed by resizing it to the input size required by the convolutional neural network, and the output of the network's penultimate fully connected layer is taken as the global feature of the image.
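A sketch of this global feature extractor, built on torchvision's VGG16, follows. The patent trains on the Places365-standard dataset; torchvision only ships ImageNet weights, so the weight choice here is a stand-in assumption:

    import torch
    from torchvision import models, transforms

    # VGG16 backbone (assumption: ImageNet weights stand in for the
    # Places365-standard weights described in the patent).
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
    # Keep the classifier up to the penultimate fully connected layer,
    # whose 4096-d output serves as the global image feature.
    fc_head = torch.nn.Sequential(*list(vgg.classifier.children())[:-3])

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def global_feature(pil_image):
        x = preprocess(pil_image).unsqueeze(0)   # 1 x 3 x 224 x 224
        x = vgg.features(x)                      # convolutional features
        x = vgg.avgpool(x).flatten(1)            # 1 x 25088
        return fc_head(x).squeeze(0)             # 4096-d global feature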
Preferably, in step two, if the current input image is a key frame, the specific process of inserting the extracted global features into the hierarchical navigable small world (HNSW) graph of the approximate nearest neighbor retrieval algorithm is: if the current input image I_i is selected as a key frame, the highest layer number l_max of the feature node of image I_i in the HNSW structure is randomly assigned by an exponentially decaying probability distribution function, and the feature node is inserted into every layer from l_max down to the bottom layer l_0. In each of these layers, the M nodes nearest to the new feature node are found and connected to it.
Preferably, in step three, the retrieval range of the current input image is specifically:

U_sa = U_before − U_{fr×ct}

where U_sa denotes the retrieval range of the input image; U_before denotes the set of all images preceding the current input image; fr is the frame rate of the camera; ct is a time constant; and U_{fr×ct} is the set of the fr × ct frames immediately preceding the current input image.
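Key frame insertion and the range-restricted query can be sketched with the hnswlib library as follows; the index parameters, k, and the frame-rate values are assumptions for illustration. hnswlib itself draws each node's top layer from an exponentially decaying distribution, matching the insertion scheme described above:

    import numpy as np
    import hnswlib

    dim = 4096                                  # VGG16 global feature size
    index = hnswlib.Index(space='l2', dim=dim)
    index.init_index(max_elements=20000, ef_construction=200, M=16)

    def insert_keyframe(feature, frame_id):
        # Only key frames are inserted into the HNSW graph.
        index.add_items(feature[np.newaxis, :], np.array([frame_id]))

    def query_loop_candidate(feature, cur_frame_id, fr=30, ct=2.0):
        # Retrieval range U_sa: skip the fr*ct most recent frames, since
        # images adjacent to the query are trivially similar to it.
        labels, dists = index.knn_query(feature[np.newaxis, :], k=10)
        horizon = cur_frame_id - int(fr * ct)
        for label, dist in zip(labels[0], dists[0]):
            if label < horizon:
                return int(label), float(dist)  # nearest admissible key frame
        return None, None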
Preferably, in step four, the specific process of extracting ORB feature points and the corresponding local difference binary (LDB) descriptors from the current input image and the retrieved closed-loop candidate image queue is:
ORB feature points are extracted from the current input image and from the images in the closed-loop candidate image queue. For each ORB feature point k_ij, a patch S_ij of size s × s centered on k_ij is cropped, and S_ij is divided into c × c grid cells of equal size. The average intensity I_avg and the gradients d_x, d_y of each grid cell are computed. For any two grid cells m and n of S_ij, a binary test is executed; the resulting binary code is the binary LDB descriptor corresponding to feature point k_ij.
Preferably, the binary test executed on any two grid cells m and n of S_ij is specifically:

τ(m, n) = 1 if f(m) > f(n), and 0 otherwise

where f(m) and f(n) denote, in turn, the average intensity I_avg and gradient d_x, d_y values of grid cells m and n, respectively.
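A simplified numpy sketch of this LDB construction is given below. It assumes the keypoint lies at least s/2 pixels inside the image border, and it omits the rotation compensation and multi-scale grids of the full LDB descriptor:

    import numpy as np

    def ldb_descriptor(gray, kp, s=32, c=4):
        # Crop an s x s patch around keypoint kp, split it into c x c
        # cells, and binary-test every cell pair on average intensity
        # and on the gradients d_x and d_y.
        x, y = int(kp[0]), int(kp[1])
        half = s // 2
        patch = gray[y - half:y + half, x - half:x + half].astype(np.float32)
        gy, gx = np.gradient(patch)              # d_y, d_x of the patch
        cell = s // c
        feats = []                               # f(m) for every grid cell
        for img in (patch, gx, gy):              # I_avg, d_x, d_y in turn
            cells = img.reshape(c, cell, c, cell).mean(axis=(1, 3))
            feats.append(cells.ravel())
        bits = []
        for f in feats:
            for m in range(len(f)):
                for n in range(m + 1, len(f)):
                    bits.append(1 if f[m] > f[n] else 0)   # tau(m, n)
        return np.packbits(np.array(bits, dtype=np.uint8))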
Preferably, in step four, the input image is matched against the descriptors of the images in the closed-loop candidate image queue as follows. The Hamming distance is used to match the LDB descriptors of the input image I_i and a closed-loop candidate image I_n: for each LDB descriptor a of the input image I_i, the two descriptors b_1, b_2 of the candidate image I_n closest to a are searched for. The pair (a, b_1) is considered a satisfactory feature match if the following condition is satisfied:

D(a, b_1) < ε_d · D(a, b_2)

where D(a, b_1) and D(a, b_2) respectively denote the Hamming distances between the feature descriptors, and ε_d is a distance scaling factor whose value is usually less than 1.
Preferably, matching the LDB descriptors of the input image I_i and the closed-loop candidate image I_n with the Hamming distance is specifically:

D(d_1, d_2) = Σ_i ( d_1(i) ⊕ d_2(i) )

where d_1, d_2 are two LDB descriptors, d_1(i) and d_2(i) denote bit i of each descriptor, and ⊕ is the XOR operation.
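This nearest/second-nearest Hamming test maps directly onto OpenCV's brute-force matcher; a short sketch, with an assumed ε_d of 0.7, follows:

    import cv2

    def match_ldb(des_query, des_cand, eps_d=0.7):
        # Match binary descriptors by Hamming distance and keep a pair
        # only if the nearest neighbour beats the second-nearest by the
        # distance scaling factor eps_d (< 1), as in the condition above.
        bf = cv2.BFMatcher(cv2.NORM_HAMMING)
        good = []
        for pair in bf.knnMatch(des_query, des_cand, k=2):
            if len(pair) < 2:
                continue
            first, second = pair
            if first.distance < eps_d * second.distance:
                good.append(first)
        return good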
Compared with the prior art, the invention has the beneficial effects that:
1. Key frames are selected through KLT sparse optical flow tracking, so the motion speed of the mobile robot need not be considered; images captured at turns are handled well, and the selected key frames are more representative. Selecting key frames also reduces the computation required during matching, which increases the detection speed of the whole method.
2. The invention verifies whether two images form a closed loop through the local difference binary (LDB) descriptor, which captures the geometric topological relation between the two images and improves the precision of closed-loop detection.
3. The invention extracts the global features of the image with a convolutional neural network trained on an image classification dataset and uses them for nearest neighbor image retrieval, so scenes with appearance changes are handled better.
Drawings
FIG. 1 is a flow chart of a closed loop detection method based on key frame selection and local features of the present invention;
FIG. 2 is a flowchart of key frame selection for a closed loop detection method based on key frame selection and local features according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting this patent. For the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting this patent.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, terms such as "upper", "lower", "left", "right", "long", and "short" that indicate orientations or positional relationships based on the drawings are used only for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and are therefore not to be construed as limitations of this patent. The specific meanings of these terms may be understood by those skilled in the art according to the specific situation.
The technical scheme of the invention is further described in detail by the following specific embodiments in combination with the attached drawings:
Embodiment
FIGS. 1-2 show an embodiment of the closed-loop detection method based on key frame selection and local features, which comprises the following steps:
Step one: the first frame of the input image sequence is taken as a key frame. The previous key frame of the current input image I_i is denoted I_{k-1}; images I_i and I_{k-1} are converted to grayscale, giving G_i and G_{k-1}. The Shi-Tomasi corners of image G_{k-1} are extracted, and Gaussian pyramid transforms of G_{k-1} and G_i produce L layers of images at different resolutions.
Since the brightness of a pixel in G_{k-1} and G_i stays constant before and after its motion, solving for the velocity components u and v of the optical flow along the X and Y axes gives the position P(x + dx, y + dy) and the optical flow of corner P(x, y) in layer L_m of image G_i. The optical flow obtained at layer L_m is taken as the initial value at layer L_{m-1}, where a refined value is computed, and so on down to the lowest layer L_0 (the original image), giving the final optical flow and the tracked corner P(x + dx, y + dy).
When the corners of image G_{k-1} are tracked in image G_i by KLT sparse optical flow, tracking is considered to have failed if:
(1) the corner P(x, y) moves out of the image range of G_i;
(2) the sum of matching errors in the neighborhood of the matched corner in the two images is greater than a threshold.
If the number of corners whose tracking failed is greater than the set threshold, the current input image I_i is considered a new key frame.
Step two: the current input image I_i is preprocessed by resizing it to 224 × 224 pixels. The convolutional neural network VGG16, trained on the Places365-standard dataset, extracts the features of image I_i; the output of the penultimate fully connected layer of the VGG16 network is taken as the global feature f_glo,i of image I_i. If the current input image is a key frame, the extracted global feature is inserted into the hierarchical navigable small world (HNSW) graph of the approximate nearest neighbor retrieval algorithm.
Step three: within the retrieval range of the current input image I_i, the key frame most similar to the current input image is retrieved through HNSW as the closed-loop candidate key frame of the current image, and all images between the closed-loop candidate key frame and the next key frame are taken as the closed-loop candidate image queue. Since adjacent images in the sequence transmitted by the mobile robot are highly similar, the retrieval range of the current input image is restricted to all key frames within U_sa:

U_sa = U_before − U_{fr×ct}

where U_before is the set of all images preceding the current input image I_i, fr is the frame rate of the camera, ct is a time constant, and U_{fr×ct} is the set of the fr × ct frames immediately preceding the current input image.
Step four: a geometric consistency check is introduced. ORB feature points are extracted from the current input image I_i and from the retrieved closed-loop candidate image queue. For each ORB feature point k_ij, a patch S_ij of size s × s centered on k_ij is cropped, and S_ij is divided into c × c grid cells of equal size. The average intensity I_avg and the gradients d_x, d_y of each grid cell are computed. For any two grid cells m and n of S_ij, the following binary test is performed:

τ(m, n) = 1 if f(m) > f(n), and 0 otherwise

where f(m) and f(n) denote the average intensity I_avg and gradient d_x, d_y values of grid cells m and n, respectively. After the binary test has been executed over the c × c grid cells of S_ij, the resulting binary code is the binary LDB descriptor corresponding to feature point k_ij.
After the LDB descriptors of the current input image I_i and of the closed-loop candidate image queue are obtained, the Hamming distance is used to match the LDB descriptors of the input image I_i against those of each image I_{q,n} in the closed-loop candidate image queue. For each LDB descriptor a of I_i, the two LDB descriptors b_1, b_2 of I_{q,n} closest to a are searched for; a and b_1 are considered a good feature match if the following condition is satisfied:

D(a, b_1) < ε_d · D(a, b_2)

where D(a, b_1) and D(a, b_2) denote the Hamming distances between the descriptors, and ε_d is a distance scaling factor whose value is usually less than 1.
The Hamming distance used to match the LDB descriptors of the input image I_i and a closed-loop candidate image I_n is specifically:

D(d_1, d_2) = Σ_i ( d_1(i) ⊕ d_2(i) )

where d_1, d_2 are two LDB descriptors, d_1(i) and d_2(i) denote bit i of each descriptor, and ⊕ is the XOR operation.
Step five: the closed-loop candidate image whose LDB descriptors best match those of the current input image I_i is taken as the optimal closed-loop candidate image, and the matched feature points of the two images are input into the random sample consensus algorithm (RANSAC) to further eliminate mismatches and solve for the fundamental matrix. If the number of inliers between the two images is less than the threshold, the two images do not form a closed loop; if the number of inliers is not less than the threshold, the two images may form a closed loop.
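A sketch of this geometric check with OpenCV's RANSAC-based fundamental matrix estimation follows; the inlier threshold and the RANSAC parameters are assumed values:

    import numpy as np
    import cv2

    def geometric_check(pts_query, pts_cand, inlier_thresh=20):
        # Estimate the fundamental matrix from matched points with RANSAC
        # and accept the loop closure hypothesis only if enough inliers
        # survive the model fit.
        if len(pts_query) < 8:                   # 8-point minimum for F
            return False, None
        F, mask = cv2.findFundamentalMat(
            np.float32(pts_query), np.float32(pts_cand),
            cv2.FM_RANSAC, 1.0, 0.99)
        if F is None:
            return False, None
        return int(mask.sum()) >= inlier_thresh, F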
Step six: a temporal consistency check is introduced. If the 2 consecutive frames following the current input image I_i both satisfy the threshold condition of step five, the current input image and the optimal closed-loop candidate image are considered to form a closed loop.
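A minimal sketch of this temporal consistency rule, buffering the step-five result for the current frame and the 2 frames that follow it, might look like this:

    from collections import deque

    class TemporalCheck:
        # A loop closure is accepted only when three consecutive frames
        # (the current input and the next 2) all pass the step-five
        # geometric check against the same closed-loop candidate.
        def __init__(self, needed=3):
            self.recent = deque(maxlen=needed)

        def update(self, passed_geometric_check):
            self.recent.append(bool(passed_geometric_check))
            return (len(self.recent) == self.recent.maxlen
                    and all(self.recent))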
The beneficial effects of this embodiment: 1. Key frames are selected through KLT sparse optical flow tracking, so the motion speed of the mobile robot need not be considered; images captured at turns are handled well, and the selected key frames are more representative. Selecting key frames also reduces the computation required during matching, which increases the detection speed of the whole method. 2. Whether two images form a closed loop is verified through the local difference binary (LDB) descriptor, which captures the geometric topological relation between the two images and improves the precision of closed-loop detection. 3. The global features of the image are extracted with a convolutional neural network trained on an image classification dataset and used for nearest neighbor image retrieval, so scenes with appearance changes are handled better.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A closed loop detection method based on key frame selection and local features is characterized by comprising the following steps:
step one: acquiring an input image with a mobile robot; determining a first frame of the input image sequence as a key frame, extracting Shi-Tomasi corners of the key frame preceding the current input image, iteratively tracking the corners in the current input image with a sparse optical flow tracking algorithm, and, if the number of corners that cannot be tracked is greater than a threshold, determining the current input image to be a new key frame;
step two: extracting global features from the current input image with a convolutional neural network trained on an image classification dataset and, if the current input image is a key frame, inserting the extracted global features into a hierarchical navigable small world (HNSW) graph of an approximate nearest neighbor retrieval algorithm;
step three: within the retrieval range of the current input image, retrieving through HNSW the key frame most similar to the current input image as the closed-loop candidate key frame of the current image, and taking all images between the closed-loop candidate key frame and the next key frame as a closed-loop candidate image queue;
step four: introducing a geometric consistency check, extracting ORB feature points and corresponding local difference binary (LDB) descriptors from the input image and the retrieved closed-loop candidate images, and matching the descriptors of the input image against those of each image in the closed-loop candidate image queue;
step five: taking the closed-loop candidate image whose LDB descriptors best match those of the current input image as the optimal closed-loop candidate image, and inputting the matched feature points of the two images into a random sample consensus algorithm to further eliminate mismatches and solve for the fundamental matrix; if the number of inliers between the two images is less than a threshold, the two images do not form a closed loop; if the number of inliers is greater than the threshold, the two images may form a closed loop;
step six: introducing a temporal consistency check; if the 2 consecutive frames following the current input image both satisfy the threshold condition of step five, the input image and the closed-loop candidate image are considered to form a closed loop.
2. The closed-loop detection method based on key frame selection and local features as claimed in claim 1, characterized in that, in step one, the corners are iteratively tracked in the current input image using the sparse optical flow tracking algorithm KLT, specifically: the previous key frame of the current input image I_i is denoted I_{k-1}; images I_i and I_{k-1} are converted to grayscale, giving G_i and G_{k-1}; the Shi-Tomasi corners of image G_{k-1} are extracted; assuming the brightness of a pixel in I_i and I_{k-1} stays constant before and after its motion, the position P(x + dx, y + dy) and the optical flow d = (dx, dy) in image G_i of each corner P(x, y) of G_{k-1} are computed.
3. The method according to claim 2, characterized in that determining the current input image to be a new key frame when the number of corners that cannot be tracked is greater than the threshold is specifically:
when the corners of key frame image G_{k-1} are tracked in the current input image G_i by KLT sparse optical flow, tracking is considered to have failed if:
(1) the corner P(x, y) moves out of the image range of G_i;
(2) the sum of matching errors in the neighborhood of the matched corner is greater than a threshold;
if the number of corners whose tracking failed is greater than the set threshold, the current input image I_i is considered a new key frame.
4. The method according to claim 3, characterized in that, in step two, extracting global features from the current input image with a convolutional neural network trained on an image classification dataset is specifically: the current input image I_i is preprocessed by resizing it to the input size required by the convolutional neural network, and the output of the network's penultimate fully connected layer is taken as the global feature of the image.
5. The method according to claim 3, characterized in that, in step two, if the current input image is a key frame, the specific process of inserting the extracted global features into the hierarchical navigable small world (HNSW) graph of the approximate nearest neighbor retrieval algorithm is: if the current input image I_i is selected as a key frame, the highest layer number l_max of the feature node of image I_i in the HNSW structure is randomly assigned by an exponentially decaying probability distribution function, and the feature node is inserted into every layer from l_max down to the bottom layer l_0; in each of these layers, the M nodes nearest to the new feature node are found and connected to it.
6. The method according to claim 1, characterized in that, in step three, the retrieval range of the current input image is specifically:

U_sa = U_before − U_{fr×ct}

where U_sa denotes the retrieval range of the input image; U_before denotes the set of all images preceding the current input image; fr is the frame rate of the camera; ct is a time constant; and U_{fr×ct} is the set of the fr × ct frames immediately preceding the current input image.
7. The method as claimed in claim 1, characterized in that, in step four, the specific process of extracting ORB feature points and the corresponding local difference binary (LDB) descriptors from the current input image and the retrieved closed-loop candidate image queue is: extracting ORB feature points from the current input image and the closed-loop candidate image queue; for each ORB feature point k_ij, cropping a patch S_ij of size s × s centered on k_ij and dividing S_ij into c × c grid cells of equal size; computing the average intensity I_avg and the gradients d_x, d_y of each grid cell; and, for any two grid cells m and n of S_ij, executing a binary test, the resulting binary code being the binary LDB descriptor corresponding to feature point k_ij.
8. The method of claim 7, characterized in that executing the binary test on any two grid cells m and n of S_ij is specifically:

τ(m, n) = 1 if f(m) > f(n), and 0 otherwise

where f(m) and f(n) respectively denote the average intensity I_avg and gradient d_x, d_y values of grid cells m and n.
9. The method according to claim 8, characterized in that, in step four, the input image is matched against the descriptors of the images in the closed-loop candidate image queue as follows: the Hamming distance is used to match the LDB descriptors of the input image I_i and a closed-loop candidate image I_n; for each LDB descriptor a of the input image I_i, the two descriptors b_1, b_2 closest to a are searched for in the candidate image I_n; the pair (a, b_1) is considered a satisfactory feature match if the following condition is satisfied:

D(a, b_1) < ε_d · D(a, b_2)

where D(a, b_1) and D(a, b_2) respectively denote the Hamming distances between the feature descriptors, and ε_d is a distance scaling factor whose value is usually less than 1.
10. The method as claimed in claim 9, characterized in that matching the LDB descriptors of the input image I_i and the closed-loop candidate image I_n with the Hamming distance is specifically:

D(d_1, d_2) = Σ_i ( d_1(i) ⊕ d_2(i) )

where d_1, d_2 are two LDB descriptors, d_1(i) and d_2(i) denote bit i of each descriptor, and ⊕ is the XOR operation.
CN202011360902.8A 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features Active CN112396593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011360902.8A CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011360902.8A CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Publications (2)

Publication Number Publication Date
CN112396593A true CN112396593A (en) 2021-02-23
CN112396593B CN112396593B (en) 2023-01-24

Family

ID=74604695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011360902.8A Active CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Country Status (1)

Country Link
CN (1) CN112396593B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN109902619A (en) * 2019-02-26 2019-06-18 上海大学 Image closed loop detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AI Qinglin et al.: "Robot SLAM implementation based on the ORB key frame matching algorithm", Journal of Mechanical & Electrical Engineering *

Also Published As

Publication number Publication date
CN112396593B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
Ding et al. Object detection in aerial images: A large-scale benchmark and challenges
CN111507271B (en) Airborne photoelectric video target intelligent detection and identification method
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN111126359B (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN114202672A (en) Small target detection method based on attention mechanism
CN111563442A (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN110781262B (en) Semantic map construction method based on visual SLAM
CN112258618A (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN109785298B (en) Multi-angle object detection method and system
CN111462210B (en) Monocular line feature map construction method based on epipolar constraint
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN111914720B (en) Method and device for identifying insulator burst of power transmission line
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
CN109886159B (en) Face detection method under non-limited condition
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
Dong et al. Learning a robust CNN-based rotation insensitive model for ship detection in VHR remote sensing images
Saleem et al. Neural network-based recent research developments in SLAM for autonomous ground vehicles: A review
CN114067128A (en) SLAM loop detection method based on semantic features
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
Ali et al. A life-long SLAM approach using adaptable local maps based on rasterized LIDAR images
CN111932612A (en) Intelligent vehicle vision positioning method and device based on second-order hidden Markov model
CN116721206A (en) Real-time indoor scene vision synchronous positioning and mapping method
CN112396593B (en) Closed loop detection method based on key frame selection and local features
CN115187614A (en) Real-time simultaneous positioning and mapping method based on STDC semantic segmentation network
CN112396596A (en) Closed loop detection method based on semantic segmentation and image feature description

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant