CN112396593B - Closed loop detection method based on key frame selection and local features - Google Patents

Closed loop detection method based on key frame selection and local features

Info

Publication number
CN112396593B
CN112396593B
Authority
CN
China
Prior art keywords
image
input image
current input
closed
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011360902.8A
Other languages
Chinese (zh)
Other versions
CN112396593A (en)
Inventor
宋海龙
游林辉
胡峰
孙仝
陈政
张谨立
黄达文
王伟光
梁铭聪
黄志就
何彧
陈景尚
谭子毅
尤德柱
区嘉亮
陈宇婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Zhaoqing Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202011360902.8A priority Critical patent/CN112396593B/en
Publication of CN112396593A publication Critical patent/CN112396593A/en
Application granted granted Critical
Publication of CN112396593B publication Critical patent/CN112396593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection

Abstract

The invention relates to a closed loop detection method based on key frame selection and local features. Key frames are selected through KLT sparse optical flow tracking, so the motion speed of the mobile robot does not need to be considered, images captured at turns are handled well, and the selected key frames are more representative. At the same time, selecting key frames reduces the amount of computation in the matching process and improves the detection speed of the whole method.

Description

Closed loop detection method based on key frame selection and local features
Technical Field
The invention relates to the field of positioning and navigation based on vision in autonomous inspection of unmanned aerial vehicles, in particular to a closed-loop detection method based on key frame selection and local features.
Background
In the intelligent inspection process of an unmanned aerial vehicle, the unmanned aerial vehicle needs to autonomously determine the operations to be performed according to environmental information. Therefore, autonomous positioning and the sensing and construction of an environment map are key links in autonomous inspection by unmanned aerial vehicles. In recent years, the development of visual SLAM (simultaneous localization and mapping) technology has improved the autonomous localization and mapping capability of mobile robots. Closed-loop detection is an important component of a visual SLAM system; it detects whether the mobile robot has returned to a previously visited place, and plays an extremely important role in reducing the positioning error of the mobile robot and constructing a globally consistent environment map. Closed-loop detection matches the current frame with key frames and judges whether a closed loop is formed according to the degree of matching, so the correct selection of key frames is crucial to closed-loop detection.
The Chinese patent application with publication number CN109902619A, published on June 18, 2019, discloses an image closed-loop detection method and system comprising the following steps: FAST corner points are extracted from each frame image and BRIEF descriptors are calculated; the BRIEF descriptors are mapped through a pre-established bag-of-words model to obtain the corresponding visual words; the visual words are used to establish a vector description of the image; a tracking prediction algorithm is used to judge whether the current image is likely to produce a closed loop and to predict the position where the closed loop is likely to occur, giving a closed-loop candidate set; the similarity between the current image and each image in the closed-loop candidate set is evaluated through the visual word vectors, and the image with the highest similarity in the closed-loop candidate set is taken as the candidate image; the candidate image is normalized to obtain a normalized image; and an ORB global descriptor of the normalized image is calculated to complete the structural check of the candidate image. According to that application, the method can effectively accelerate the detection algorithm and provide more accurate closed-loop detection performance.
That method belongs to closed-loop detection based on a visual bag-of-words model: local feature points and descriptors of the input image are extracted, a BoW vector representation of the input image is obtained by means of a visual dictionary, and whether a closed loop is formed is judged through a tracking prediction algorithm. Closed-loop detection based on the visual bag-of-words model is robust to changes of viewpoint, but has difficulty handling changes of appearance. Moreover, the method lacks key frame selection and only selects candidate images by similarity, so the amount of computation is large and the final detection speed is affected.
Disclosure of Invention
The invention aims to solve the problem of slow detection speed in the prior art, and provides a closed-loop detection method based on key frame selection and local features.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a closed loop detection method based on key frame selection and local features comprises the following steps:
Step one: an input image is acquired by the mobile robot; the first frame of the input image sequence is determined to be a key frame, the Shi-Tomasi corner points of the previous key frame of the current input image are extracted, the corner points are iteratively tracked in the current input image by a sparse optical flow tracking algorithm, and if the number of corner points that cannot be tracked is greater than a threshold, the current input image is determined to be a new key frame;
Step two: global features are extracted from the current input image by a convolutional neural network trained on an image classification dataset, and if the current input image is a key frame, the extracted global features are inserted into the hierarchical navigable small world (HNSW) map of an approximate nearest neighbor retrieval algorithm;
Step three: within the retrieval range of the current input image, the key frame most similar to the current input image is retrieved through HNSW as the closed-loop candidate key frame of the current image, and all images between the closed-loop candidate key frame and the key frame following it are taken as the closed-loop candidate image queue;
Step four: a geometric consistency check is introduced; ORB feature points and the corresponding local difference binary (LDB) descriptors are extracted from the input image and from the retrieved closed-loop candidate images respectively, and the descriptors of the input image are matched against those of the images in the closed-loop candidate image queue;
Step five: the closed-loop candidate image whose LDB descriptors best match those of the current input image is taken as the optimal closed-loop candidate image, and the matched feature points of the two images are input into a random sample consensus algorithm to further eliminate mismatches and solve the fundamental matrix; if the number of inliers between the two images is less than a threshold, the two images do not form a closed loop; if the number of inliers between the two images is greater than the threshold, the two images may form a closed loop;
Step six: a temporal consistency check is introduced; if the 2 consecutive frames of images following the current input image all satisfy the threshold condition of step five, the input image and the closed-loop candidate image are considered to form a closed loop.
Preferably, in the first step, the corner points are iteratively tracked in the current input image by using a sparse optical flow tracking algorithm KLT, specifically:
the previous key frame of the current input image I_i is denoted I_{k-1}; the images I_i and I_{k-1} are converted to grayscale to obtain the images G_i and G_{k-1}; the Shi-Tomasi corner points of the image G_{k-1} are extracted; assuming that the brightness of a pixel point in I_i and I_{k-1} remains constant before and after the motion, the position P(x+dx, y+dy) in the image G_i of a corner point P(x, y) of the image G_{k-1} and the optical flow (dx/dt, dy/dt) are calculated.
The specific calculation steps are as follows. The current input image I_i is converted to grayscale to obtain the image G_i; the grayscale image of the previous key frame of I_i is G_{k-1}, and the Shi-Tomasi corner points of G_{k-1} are extracted. Gaussian pyramid transforms are applied to G_{k-1} and G_i respectively to obtain L layers of images of different resolutions. In layer L_m, assume that a corner point P(x, y) of G_{k-1} moves to the point P(x+dx, y+dy) in the image G_i, taking time dt. Because the brightness of the pixel remains constant before and after the move between the two images:
I(x, y, t) = I(x+dx, y+dy, t+dt)   (1)
where I(x, y, t) represents the brightness of the pixel P(x, y) at time t, and I(x+dx, y+dy, t+dt) represents the brightness at the pixel point P(x+dx, y+dy) in the shifted image G_i. I(x+dx, y+dy, t+dt) can be expanded as a Taylor series:
I(x+dx, y+dy, t+dt) = I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt + ε   (2)
where ε is an infinitesimal higher-order term and can be ignored. Equation (1) can therefore be simplified to:
I(x, y, t) = I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt   (3)
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0   (4)
Dividing both sides by dt:
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0   (5)
Let u and v be the velocity components of the optical flow along the X axis and the Y axis respectively, i.e.
u = dx/dt,  v = dy/dt   (6)
In addition, denote
I_x = ∂I/∂x,  I_y = ∂I/∂y,  I_t = ∂I/∂t   (7)
Equation (5) can then be written as:
I_x·u + I_y·v + I_t = 0   (8)
Assuming that the pixel points around P(x, y) move by the same distance as P(x, y), a window of size (5, 5) is taken around P(x, y), and for the pixel points in the window:
[I_x1 I_y1; I_x2 I_y2; …; I_x25 I_y25]·[u; v] = -[I_t1; I_t2; …; I_t25]   (9)
The least squares method is used to find the optimal solution of this system of equations so as to minimize the sum of matching errors within the window. Equation (9) can be abbreviated as:
A·d = b   (10)
Multiplying both sides by A^T:
(A^T A)·d = A^T b   (11)
The velocity components u and v of the optical flow along the X axis and the Y axis are then obtained as:
d = [u, v]^T = (A^T A)^(-1) A^T b   (12)
By solving for u and v, the position P(x+dx, y+dy) in the image G_i of the corner point P(x, y) in layer L_m and the optical flow (u, v) can be calculated.
The optical flow value obtained from layer L_m is taken as the initial value for layer L_{m-1}, and the refined optical flow of layer L_{m-1} is calculated, continuing until the optical flow of the original image at the lowest layer L_0 and the tracked corner point P(x+dx, y+dy) are obtained.
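For illustration only, the per-window least-squares step of equations (9)-(12) can be written as the following minimal numerical sketch; the use of numpy.gradient for the spatial derivatives, the window size and the function name are assumptions made for this example and are not prescribed by the method above.

    import numpy as np

    def lk_flow_for_window(prev_win, curr_win):
        """Estimate the optical flow (u, v) of one 5x5 window via equations (9)-(12).

        prev_win, curr_win: 5x5 float arrays of grayscale intensities around the
        same corner point in G_{k-1} and G_i.
        """
        # Spatial gradients I_x, I_y of the previous window and temporal gradient I_t.
        I_y, I_x = np.gradient(prev_win)
        I_t = curr_win - prev_win

        # One row per pixel: A d = b with b = -I_t, as in equation (9).
        A = np.stack([I_x.ravel(), I_y.ravel()], axis=1)   # shape (25, 2)
        b = -I_t.ravel()                                    # shape (25,)

        # Least-squares solution d = (A^T A)^(-1) A^T b, equations (10)-(12).
        d, *_ = np.linalg.lstsq(A, b, rcond=None)
        u, v = d
        return u, v

In practice this per-window solve is repeated from the top pyramid layer L_m down to L_0, with each layer's result used as the initial value for the next.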
Preferably, if the number of corner points that cannot be tracked is greater than the threshold, the current input image is considered to be a new key frame, specifically:
when KLT sparse optical flow tracking is performed from the key frame image G_{k-1} to the current input image G_i, tracking of a corner point is considered to have failed if either of the following occurs:
(1) the corner point P(x, y) falls outside the image range of G_i;
(2) the sum of the matching errors in the neighborhood of the matched corner points is greater than a threshold value.
If the number of corner points that fail to be tracked is greater than the set threshold, the current input image I_i is considered to be a new key frame.
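As a non-authoritative illustration of this key frame rule, the sketch below uses OpenCV's Shi-Tomasi detector and pyramidal KLT tracker; the maximum corner count, tracking window size and failure threshold are assumed example values rather than values fixed by the method.

    import cv2
    import numpy as np

    def is_new_key_frame(keyframe_gray, current_gray, max_corners=300, fail_thresh=150):
        """Return True if the current frame should become a new key frame.

        keyframe_gray, current_gray: grayscale images G_{k-1} and G_i.
        A corner counts as a tracking failure when KLT reports status 0
        (lost or error too large) or the tracked point leaves the image.
        """
        corners = cv2.goodFeaturesToTrack(keyframe_gray, maxCorners=max_corners,
                                          qualityLevel=0.01, minDistance=7)
        if corners is None:
            return True

        tracked, status, err = cv2.calcOpticalFlowPyrLK(
            keyframe_gray, current_gray, corners, None,
            winSize=(21, 21), maxLevel=3)

        h, w = current_gray.shape
        inside = ((tracked[:, 0, 0] >= 0) & (tracked[:, 0, 0] < w) &
                  (tracked[:, 0, 1] >= 0) & (tracked[:, 0, 1] < h))
        failed = np.count_nonzero((status.ravel() == 0) | ~inside)
        return failed > fail_thresh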
Preferably, in the second step, extracting the global features of the current input image with the convolutional neural network trained on an image classification dataset is specifically: the current input image I_i is preprocessed by resizing it to the input size required by the convolutional neural network, and the output of the penultimate fully connected layer of the convolutional neural network is taken as the global feature of the image.
Preferably, in the third step, if the current input image is a key frame, the specific process of inserting the extracted global features into the hierarchical navigable small world (HNSW) map of the approximate nearest neighbor retrieval algorithm is as follows: if the current input image I_i is selected as a key frame, the highest layer number l_max of the feature node of the image I_i in the HNSW structure is randomly assigned by an exponentially decaying probability distribution function, and the feature node is inserted into every layer from l_max down to the bottom layer l_0. In each of these layers, the M nodes nearest to the new feature node are searched, and the new feature node is connected to its M nearest nodes.
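Purely as an illustrative sketch of maintaining such an HNSW index, the snippet below uses the hnswlib library, which performs the exponentially decaying layer assignment and the M nearest-neighbor linking internally; the feature dimension, M and ef parameters are assumed example values.

    import hnswlib
    import numpy as np

    DIM = 4096                      # assumed global-feature dimension
    index = hnswlib.Index(space='l2', dim=DIM)
    index.init_index(max_elements=10000, M=16, ef_construction=200)
    index.set_ef(50)                # search-time breadth

    def insert_keyframe_feature(frame_id, feature):
        """Insert the global feature of a new key frame into the HNSW graph."""
        index.add_items(np.asarray(feature, dtype=np.float32).reshape(1, -1),
                        ids=np.array([frame_id]))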
Preferably, in the second step, the retrieval range of the current input image is specifically:
U_sa = U_before - U_{fr×ct}
where U_sa denotes the retrieval range of the input image; U_before denotes the set of all images preceding the current input image; fr is the frame rate of the camera; ct is a time constant; and U_{fr×ct} is the set of the fr×ct frames of images immediately preceding the current input image.
Preferably, in the fourth step, the specific process of extracting the ORB feature points and the corresponding local difference binary (LDB) descriptors from the current input image and from the retrieved closed-loop candidate image queue is as follows:
ORB feature points are extracted from the current input image and from the images in the closed-loop candidate image queue respectively. For each ORB feature point k_ij, an image patch S_ij of size s×s centered on k_ij is cropped, and S_ij is divided into c×c grid cells of equal size. The average intensity I_avg and the gradients d_x, d_y of each grid cell are calculated. A binary test is performed on every pair of grid cells of S_ij, and the resulting binary code is the binary LDB descriptor corresponding to the feature point k_ij.
Preferably, performing the binary test on any two grid cells m and n of S_ij is specifically:
τ(m, n) = 1 if f(m) > f(n), and τ(m, n) = 0 otherwise
where f(m) and f(n) respectively represent the average intensity I_avg or gradient d_x, d_y values of the grid cells m and n.
Preferably, in the fourth step, matching the input image with the descriptors of the images in the closed-loop candidate image queue is specifically:
the Hamming distance is used to match the LDB descriptors of the input image I_i and a closed-loop candidate image I_n. For an LDB descriptor d_a of the input image I_i, the two descriptors d_b1 and d_b2 closest to d_a are searched in the candidate image I_n. If d_b1 and d_b2 satisfy the following condition, d_a and d_b1 are considered to be a pair of satisfactory feature matches:
D(d_a, d_b1) < ε_d × D(d_a, d_b2)
where D(d_a, d_b1) and D(d_a, d_b2) respectively represent the Hamming distances between the feature descriptor d_a and the descriptors d_b1 and d_b2, and ε_d is the distance scaling factor, whose value is usually less than 1.
Preferably, using the Hamming distance to match the LDB descriptors of the input image I_i and the closed-loop candidate image I_n is specifically:
D(d_1, d_2) = Σ_i (d_1,i ⊕ d_2,i)
where d_1 and d_2 represent two LDB descriptors, d_1,i and d_2,i denote the i-th bit of each descriptor, and ⊕ denotes the XOR operation.
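As a small illustrative sketch (not taken from the patent text), the Hamming distance between two binary LDB descriptors packed as byte arrays can be computed as follows; storing the descriptors as uint8 arrays is an assumption made for the example.

    import numpy as np

    def hamming_distance(d1: np.ndarray, d2: np.ndarray) -> int:
        """Number of differing bits between two descriptors stored as uint8 arrays."""
        xor = np.bitwise_xor(d1, d2)
        # unpackbits expands every byte into its 8 bits so they can be summed.
        return int(np.unpackbits(xor).sum())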
Compared with the prior art, the invention has the beneficial effects that:
1. Key frames are selected through KLT sparse optical flow tracking, so the movement speed of the mobile robot does not need to be considered, images captured at turns are handled better, and the selected key frames are more representative. In addition, selecting key frames reduces the amount of computation in the matching process and improves the detection speed of the whole method.
2. The invention checks whether two images form a closed loop through the local difference binary (LDB) descriptor, which both captures the geometric topological relation between the two images and verifies whether they form a closed loop, improving the precision of closed-loop detection.
3. The invention extracts the global features of the image with a convolutional neural network trained on an image classification dataset and uses them for nearest neighbor image retrieval, and can therefore better cope with scenes whose appearance changes.
Drawings
FIG. 1 is a flow chart of a closed loop detection method based on key frame selection and local features of the present invention;
FIG. 2 is a flowchart of key frame selection for a closed loop detection method based on key frame selection and local features according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there are terms such as "upper", "lower", "left", "right", "long", "short", etc., indicating orientations or positional relationships based on the orientations or positional relationships shown in the drawings, it is only for convenience of description and simplicity of description, but does not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationships in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The technical scheme of the invention is further described in detail by the specific embodiments and the accompanying drawings:
Examples
Fig. 1-2 show an embodiment of a closed loop detection method based on key frame selection and local features, which includes the following steps:
Step one: the first frame of the input image sequence is identified as a key frame. The previous key frame of the current input image I_i is denoted I_{k-1}; the images I_i and I_{k-1} are converted to grayscale to obtain the images G_i and G_{k-1}. The Shi-Tomasi corner points of the image G_{k-1} are extracted, and Gaussian pyramid transforms are applied to the images G_{k-1} and G_i respectively to obtain L layers of images of different resolutions.
Because the brightness of a pixel point in the images G_{k-1} and G_i remains constant before and after the motion, the position P(x+dx, y+dy) in the image G_i of a corner point P(x, y) in layer L_m and the optical flow (u, v) are calculated by solving for the velocity components u and v of the optical flow along the X axis and the Y axis.
The optical flow value obtained from layer L_m is taken as the initial value for layer L_{m-1}, and the refined optical flow of layer L_{m-1} is calculated, continuing until the optical flow of the original image at the lowest layer L_0 and the tracked corner point P(x+dx, y+dy) are obtained.
When KLT sparse optical flow tracking is performed from the image G_{k-1} to the image G_i, tracking of a corner point is considered to have failed if either of the following occurs:
(1) the corner point P(x, y) falls outside the image range of G_i;
(2) the sum of the matching errors in the neighborhood of the matched corner points in the two images is greater than a threshold.
If the number of corner points that fail to be tracked is greater than the set threshold, the current input image I_i is considered to be a new key frame.
Step two: the current input image I_i is preprocessed by resizing it to 224 × 224 pixels. The convolutional neural network VGG16 trained on the Places365-Standard dataset is used to extract features from the image I_i, and the output of the penultimate fully connected layer of the VGG16 network is taken as the global feature f_{glo,i} of the image I_i. If the current input image is a key frame, the extracted global feature is inserted into the hierarchical navigable small world (HNSW) map of the approximate nearest neighbor retrieval algorithm.
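For illustration, a minimal sketch of this global-feature extraction with torchvision is given below; because Places365 weights are not bundled with torchvision, the example loads ImageNet weights as a stand-in, which is an assumption rather than the trained model described above.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # The VGG16 classifier head has three fully connected layers; index 3 is the
    # second (penultimate) one, so truncating after it yields a 4096-dim feature.
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:4])
    vgg.eval()

    preprocess = T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def global_feature(image_path: str) -> torch.Tensor:
        """Return the 4096-dim global feature f_glo,i of one image."""
        img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return vgg(img).squeeze(0)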
Step three: within the retrieval range of the current input image I_i, the key frame most similar to the current input image is retrieved through HNSW as the closed-loop candidate key frame of the current image, and all images between the closed-loop candidate key frame and the key frame of the next frame are taken as the closed-loop candidate image queue. Since adjacent images in the image sequence transmitted by the mobile robot are highly similar, the retrieval range of the current input image is all key frames within U_sa:
U_sa = U_before - U_{fr×ct}
where U_before is the set of all images preceding the current input image I_i, fr is the frame rate of the camera, ct is a time constant, and U_{fr×ct} is the set of the fr×ct frames of images immediately preceding the current input image.
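A hedged sketch of this retrieval step, reusing the hnswlib index from the earlier snippet (key frames are assumed to have been added with their frame index as id), might look as follows; the frame rate, time constant and k are example values.

    import numpy as np

    def query_loop_candidate(feature, current_frame_id, fr=30, ct=10, k=5):
        """Return the most similar key frame outside the recent fr*ct-frame window.

        `index` is the hnswlib.Index built in the earlier HNSW insertion snippet;
        discarding ids newer than current_frame_id - fr * ct enforces the
        retrieval range U_sa.
        """
        labels, dists = index.knn_query(
            np.asarray(feature, dtype=np.float32).reshape(1, -1), k=k)
        for frame_id, dist in zip(labels[0], dists[0]):
            if frame_id <= current_frame_id - fr * ct:
                return int(frame_id), float(dist)
        return None, None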
Step four: a geometric consistency check is introduced, and ORB feature points are extracted from the current input image I_i and from the images in the retrieved closed-loop candidate image queue respectively. For each ORB feature point k_ij, an image patch S_ij of size s×s centered on k_ij is cropped. Next, S_ij is divided into c×c grid cells of equal size, and the average intensity I_avg and the gradients d_x, d_y of each grid cell are calculated. For any two grid cells m and n of S_ij, the binary test is performed as follows:
τ(m, n) = 1 if f(m) > f(n), and τ(m, n) = 0 otherwise
where f(m) represents the average intensity I_avg or gradient d_x, d_y value of the grid cell m, and f(n) represents the average intensity I_avg or gradient d_x, d_y value of the grid cell n. After the binary test has been performed on all pairs of the c×c grid cells of S_ij, the obtained binary code is the binary LDB descriptor corresponding to the feature point k_ij.
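The following sketch illustrates one plausible reading of this LDB construction; the patch size, grid size and the ordering of the intensity and gradient bits are assumptions made for the example and are not values fixed by the method.

    import numpy as np

    def ldb_descriptor(gray, kp_x, kp_y, s=48, c=4):
        """Binary LDB-style descriptor of the s x s patch centered on (kp_x, kp_y).

        Assumes the keypoint lies at least s/2 pixels from the image border.
        The patch is split into c x c cells; each cell contributes its mean
        intensity and mean gradients d_x, d_y, and every pair of cells (m, n)
        contributes three bits via the test f(m) > f(n).
        """
        half = s // 2
        patch = gray[kp_y - half: kp_y + half, kp_x - half: kp_x + half].astype(np.float32)
        gy, gx = np.gradient(patch)
        cell = s // c

        feats = []  # one (I_avg, d_x, d_y) triple per grid cell
        for r in range(c):
            for q in range(c):
                sl = (slice(r * cell, (r + 1) * cell), slice(q * cell, (q + 1) * cell))
                feats.append((patch[sl].mean(), gx[sl].mean(), gy[sl].mean()))

        bits = []
        for m in range(len(feats)):
            for n in range(m + 1, len(feats)):
                for fm, fn in zip(feats[m], feats[n]):
                    bits.append(1 if fm > fn else 0)
        return np.packbits(np.array(bits, dtype=np.uint8))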
After the LDB descriptors of the current input image I_i and of the closed-loop candidate image queue have been obtained, the Hamming distance is used to match the LDB descriptors of the input image I_i with those of each image I_{q,n} in the closed-loop candidate image queue. For an LDB descriptor d_a of I_i, the two LDB descriptors d_b1 and d_b2 closest to d_a are searched in I_{q,n}. If d_b1 and d_b2 satisfy the following condition, d_a and d_b1 are considered to be a good feature match:
D(d_a, d_b1) < ε_d × D(d_a, d_b2)
where D(d_a, d_b1) denotes the Hamming distance between the descriptors d_a and d_b1, D(d_a, d_b2) denotes the Hamming distance between the descriptors d_a and d_b2, and ε_d is the distance scaling factor, usually taking a value smaller than 1.
The Hamming distance used to match the LDB descriptors of the input image I_i and the closed-loop candidate image I_n is specifically:
D(d_1, d_2) = Σ_i (d_1,i ⊕ d_2,i)
where d_1 and d_2 represent two LDB descriptors, d_1,i and d_2,i denote the i-th bit of each descriptor, and ⊕ denotes the XOR operation.
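A hedged sketch of this ratio-test matching, built on the hamming_distance helper from the earlier snippet, is shown below; the value of the distance scaling factor is an assumed example.

    def match_descriptors(desc_query, desc_candidate, eps_d=0.7):
        """Return index pairs (i, j) that pass the Hamming ratio test D1 < eps_d * D2.

        desc_query, desc_candidate: lists of packed uint8 descriptor arrays for
        the input image I_i and one closed-loop candidate image.
        """
        matches = []
        for i, da in enumerate(desc_query):
            dists = sorted((hamming_distance(da, db), j)
                           for j, db in enumerate(desc_candidate))
            if len(dists) >= 2 and dists[0][0] < eps_d * dists[1][0]:
                matches.append((i, dists[0][1]))
        return matches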
Step five: the closed-loop candidate image whose LDB descriptors best match those of the current input image I_i is taken as the optimal closed-loop candidate image, and the matched feature points of the two images are input into the random sample consensus (RANSAC) algorithm to further eliminate mismatches and solve the fundamental matrix. If the number of inliers between the two images is less than the threshold, the two images do not form a closed loop; if the number of inliers between the two images is not less than the threshold, the two images may form a closed loop.
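As an illustrative sketch of this geometric check with OpenCV, where the RANSAC reprojection threshold and the minimum inlier count are assumed example values rather than the thresholds specified by the method:

    import cv2
    import numpy as np

    def passes_geometric_check(pts_query, pts_candidate, min_inliers=20):
        """Estimate the fundamental matrix with RANSAC and count the inliers.

        pts_query, pts_candidate: Nx2 float32 arrays of matched point coordinates.
        Returns (possible_loop, fundamental_matrix).
        """
        if len(pts_query) < 8:            # the 8-point algorithm needs 8 matches
            return False, None
        F, mask = cv2.findFundamentalMat(pts_query, pts_candidate,
                                         cv2.FM_RANSAC, 1.0, 0.99)
        inliers = int(mask.sum()) if mask is not None else 0
        return inliers >= min_inliers, F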
Step six: a temporal consistency check is introduced: if the 2 consecutive frames of images following the current input image I_i all satisfy the threshold condition of step five, the current input image and the optimal closed-loop candidate image are considered to form a closed loop.
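A minimal sketch of this temporal check is given below; keeping a per-candidate run counter, and requiring the current frame plus the 2 following frames (3 in total) to pass, are assumptions made for the example.

    from collections import defaultdict

    consecutive_hits = defaultdict(int)   # candidate key-frame id -> run length

    def temporal_check(candidate_id, passed_step_five, required_run=3):
        """Accept a closed loop only after 3 consecutive frames pass step five."""
        if passed_step_five:
            consecutive_hits[candidate_id] += 1
        else:
            consecutive_hits[candidate_id] = 0
        return consecutive_hits[candidate_id] >= required_run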
The beneficial effects of this example are as follows: 1. Key frames are selected through KLT sparse optical flow tracking, so the movement speed of the mobile robot does not need to be considered, images captured at turns are handled better, and the selected key frames are more representative. In addition, selecting key frames reduces the amount of computation in the matching process and improves the detection speed of the whole method. 2. Whether two images form a closed loop is checked through the local difference binary (LDB) descriptor, which both captures the geometric topological relation between the two images and verifies whether they form a closed loop, improving the precision of closed-loop detection. 3. The global features of the image are extracted with a convolutional neural network trained on an image classification dataset and used for nearest neighbor image retrieval, so scenes whose appearance changes can be handled better.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A closed loop detection method based on key frame selection and local features is characterized by comprising the following steps:
step one: an input image is acquired by a mobile robot; the first frame of the input image sequence is determined to be a key frame, the Shi-Tomasi corner points of the previous key frame of the current input image are extracted, the corner points are iteratively tracked in the current input image by a sparse optical flow tracking algorithm, and if the number of corner points that cannot be tracked is greater than a threshold, the current input image is determined to be a new key frame;
step two: global features of the current input image are extracted by a convolutional neural network trained on an image classification dataset, and if the current input image is a key frame, the extracted global features are inserted into a hierarchical navigable small world (HNSW) map of an approximate nearest neighbor retrieval algorithm;
step three: within the retrieval range of the current input image, the key frame most similar to the current input image is retrieved through HNSW as the closed-loop candidate key frame of the current image, and all images between the closed-loop candidate key frame and the key frame following it are taken as a closed-loop candidate image queue;
step four: a geometric consistency check is introduced, ORB feature points and the corresponding local difference binary (LDB) descriptors are extracted from the input image and from the retrieved closed-loop candidate images respectively, and the descriptors of the input image are matched against those of the images in the closed-loop candidate image queue;
step five: the closed-loop candidate image whose LDB descriptors best match those of the current input image is taken as the optimal closed-loop candidate image, and the matched feature points of the two images are input into a random sample consensus algorithm to further eliminate mismatches and solve the fundamental matrix; if the number of inliers between the two images is less than a threshold, the two images do not form a closed loop; if the number of inliers between the two images is greater than the threshold, the two images may form a closed loop;
step six: a temporal consistency check is introduced; if the 2 consecutive frames of images following the current input image all satisfy the threshold condition of step five, the input image and the closed-loop candidate image are considered to form a closed loop.
2. A closed-loop detection method based on key-frame selection and local features as claimed in claim 1, characterized in that in said step one, the corner points are iteratively tracked in the current input image using a sparse optical flow tracking algorithm KLT, specifically:
the previous key frame of the current input image I_i is denoted I_{k-1}; the images I_i and I_{k-1} are converted to grayscale to obtain the images G_i and G_{k-1}; the Shi-Tomasi corner points of the image G_{k-1} are extracted; assuming that the brightness of a pixel point in I_i and I_{k-1} remains constant before and after the motion, the position P(x+dx, y+dy) in the image G_i of a corner point P(x, y) of the image G_{k-1} and the optical flow (dx/dt, dy/dt) are calculated.
3. The method according to claim 2, wherein, if the number of corner points that cannot be tracked is greater than the threshold, the current input image is considered to be a new key frame, specifically:
when KLT sparse optical flow tracking is performed from the key frame image G_{k-1} to the current input image G_i, tracking of a corner point is considered to have failed if either of the following occurs:
(1) the corner point P(x, y) falls outside the image range of G_i;
(2) the sum of the matching errors in the neighborhood of the matched corner points is greater than a threshold value;
if the number of corner points that fail to be tracked is greater than the set threshold, the current input image I_i is considered to be a new key frame.
4. The method as claimed in claim 3, wherein, in the second step, extracting the global features of the current input image with the convolutional neural network trained on the image classification dataset is specifically: the current input image I_i is preprocessed by resizing it to the input size required by the convolutional neural network, and the output of the penultimate fully connected layer of the convolutional neural network is taken as the global feature of the image.
5. The method according to claim 3, wherein, in the third step, if the current input image is a key frame, the specific process of inserting the extracted global features into the hierarchical navigable small world (HNSW) map of the approximate nearest neighbor retrieval algorithm is as follows: if the current input image I_i is selected as a key frame, the highest layer number l_max of the feature node of the image I_i in the HNSW structure is randomly assigned by an exponentially decaying probability distribution function, and the feature node is inserted into every layer from l_max down to the bottom layer l_0; in each of these layers, the M nodes nearest to the new feature node are searched, and the new feature node is connected to its M nearest nodes.
6. The method according to claim 1, wherein, in the second step, the retrieval range of the current input image is specifically:
U_sa = U_before - U_{fr×ct}
where U_sa denotes the retrieval range of the input image; U_before denotes the set of all images preceding the current input image; fr is the frame rate of the camera; ct is a time constant; and U_{fr×ct} is the set of the fr×ct frames of images immediately preceding the current input image.
7. The method as claimed in claim 1, wherein, in the fourth step, the specific process of extracting the ORB feature points and the corresponding local difference binary (LDB) descriptors from the current input image and from the retrieved closed-loop candidate image queue is as follows:
ORB feature points are extracted from the current input image and from the images in the closed-loop candidate image queue respectively; for each ORB feature point k_ij, an image patch S_ij of size s×s centered on k_ij is cropped, and S_ij is divided into c×c grid cells of equal size; the average intensity I_avg and the gradients d_x, d_y of each grid cell are calculated; a binary test is performed on any two grid cells of S_ij, and the obtained binary code is the binary LDB descriptor corresponding to the feature point k_ij.
8. The method of claim 7, wherein performing the binary test on any two grid cells m and n of S_ij is specifically:
τ(m, n) = 1 if f(m) > f(n), and τ(m, n) = 0 otherwise
where f(m) and f(n) respectively represent the average intensity I_avg or gradient d_x, d_y values of the grid cells m and n.
9. The method according to claim 8, wherein, in the fourth step, matching the input image with the descriptors of the images in the closed-loop candidate image queue is specifically:
the Hamming distance is used to match the LDB descriptors of the input image I_i and the closed-loop candidate image I_n; for an LDB descriptor d_a of the input image I_i, the two descriptors d_b1 and d_b2 closest to d_a are searched in the candidate image I_n; if d_b1 and d_b2 satisfy the following condition, d_a and d_b1 are considered to be a pair of satisfactory feature matches:
D(d_a, d_b1) < ε_d × D(d_a, d_b2)
where D(d_a, d_b1) and D(d_a, d_b2) respectively represent the Hamming distances between the feature descriptor d_a and the descriptors d_b1 and d_b2, and ε_d is the distance scaling factor, whose value is usually less than 1.
10. The method of claim 9, wherein using the Hamming distance to match the LDB descriptors of the input image I_i and the closed-loop candidate image I_n is specifically:
D(d_1, d_2) = Σ_i (d_1,i ⊕ d_2,i)
where d_1 and d_2 represent two LDB descriptors, d_1,i and d_2,i denote the i-th bit of each descriptor, and ⊕ denotes the XOR operation.
CN202011360902.8A 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features Active CN112396593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011360902.8A CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011360902.8A CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Publications (2)

Publication Number Publication Date
CN112396593A CN112396593A (en) 2021-02-23
CN112396593B true CN112396593B (en) 2023-01-24

Family

ID=74604695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011360902.8A Active CN112396593B (en) 2020-11-27 2020-11-27 Closed loop detection method based on key frame selection and local features

Country Status (1)

Country Link
CN (1) CN112396593B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN109902619A (en) * 2019-02-26 2019-06-18 上海大学 Image closed loop detection method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN109902619A (en) * 2019-02-26 2019-06-18 上海大学 Image closed loop detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Robot SLAM implementation based on ORB key frame matching algorithm (基于ORB关键帧匹配算法的机器人SLAM实现); 艾青林 (Ai Qinglin) et al.; 《机电工程》 (Journal of Mechanical & Electrical Engineering); 2016-05-20 (No. 05); full text *

Also Published As

Publication number Publication date
CN112396593A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
Ding et al. Object detection in aerial images: A large-scale benchmark and challenges
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN111126359B (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN114202672A (en) Small target detection method based on attention mechanism
CN110781262B (en) Semantic map construction method based on visual SLAM
CN110287826B (en) Video target detection method based on attention mechanism
CN110738673A (en) Visual SLAM method based on example segmentation
CN109785298B (en) Multi-angle object detection method and system
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
Dong et al. Learning a robust CNN-based rotation insensitive model for ship detection in VHR remote sensing images
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN111723660A (en) Detection method for long ground target detection network
Saleem et al. Neural network-based recent research developments in SLAM for autonomous ground vehicles: A review
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
Ali et al. A life-long SLAM approach using adaptable local maps based on rasterized LIDAR images
CN112651294A (en) Method for recognizing human body shielding posture based on multi-scale fusion
CN111932612A (en) Intelligent vehicle vision positioning method and device based on second-order hidden Markov model
CN116721206A (en) Real-time indoor scene vision synchronous positioning and mapping method
CN112396593B (en) Closed loop detection method based on key frame selection and local features
CN115187614A (en) Real-time simultaneous positioning and mapping method based on STDC semantic segmentation network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant