CN113191435B - Image closed-loop detection method based on improved visual dictionary tree - Google Patents

Image closed-loop detection method based on improved visual dictionary tree Download PDF

Info

Publication number
CN113191435B
CN113191435B CN202110493714.0A CN202110493714A CN113191435B CN 113191435 B CN113191435 B CN 113191435B CN 202110493714 A CN202110493714 A CN 202110493714A CN 113191435 B CN113191435 B CN 113191435B
Authority
CN
China
Prior art keywords
image
score
dictionary tree
visual dictionary
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110493714.0A
Other languages
Chinese (zh)
Other versions
CN113191435A (en
Inventor
赵勃
杭程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110493714.0A priority Critical patent/CN113191435B/en
Publication of CN113191435A publication Critical patent/CN113191435A/en
Application granted granted Critical
Publication of CN113191435B publication Critical patent/CN113191435B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image closed-loop detection method based on an improved visual dictionary tree, which comprises the following steps: step 1, establishing a visual dictionary tree by utilizing a layered K-means clustering scheme, and describing at least two images of a preset frame by using each node in the visual dictionary tree and a score vector obtained by TF-IDF entropy of each node; step 2, carrying out similarity calculation on the images in the step 1; and 3, processing the miscorrect closed loop by using the constraint relation of the image on time and space. The method effectively reduces the problem of perception ambiguity in closed-loop detection and can effectively improve the recall rate of closed-loop detection.

Description

Image closed-loop detection method based on improved visual dictionary tree
Technical Field
The invention relates to an image closed-loop detection method based on an improved visual dictionary tree, and belongs to the technical field of mobile robot navigation.
Background
With the rapid development of modern high-tech technologies, the research on mobile robot technology is also continuously advanced. The Simultaneous positioning and Mapping technology (SLAM) which receives much attention in automatic driving, unmanned aerial vehicle autonomous navigation and scene three-dimensional reconstruction is a key technology for realizing that a mobile robot can autonomously complete tasks. The simultaneous positioning and mapping technology means that when a mobile robot enters an unknown environment, a 3D environment map of the current environment of the mobile robot is constructed through sensors such as a camera, a laser radar and an IMU carried by the mobile robot, and the position of the robot in the map can be determined simultaneously. In the visual SLAM problem with a camera as a sensor, errors are accumulated continuously by a system along with the increase of tracking time, and the closed-loop detection technology has good effects on eliminating the accumulated errors of the position and posture estimation of a robot and keeping the accuracy of a map for a long time by judging whether the robot returns to an environment area visited before, and is a key link and a basic problem in the SLAM technology.
The existing closed loop detection scheme has the problems of perception ambiguity under the condition of similar scenes and low recall rate under the condition of ensuring the accuracy.
In view of the above, it is necessary to provide an improved visual dictionary tree-based image closed-loop detection method to solve the above problems.
Disclosure of Invention
The invention aims to provide an image closed-loop detection method based on an improved visual dictionary tree, so that the problem of perception ambiguity existing in closed-loop detection is effectively reduced, and the recall rate of closed-loop detection can be effectively improved.
In order to achieve the above object, the present invention provides an image closed-loop detection method based on an improved visual dictionary tree, comprising the following steps:
the image closed-loop detection method based on the improved visual dictionary tree comprises the following steps:
step 1, establishing a visual dictionary tree by utilizing a layered K-means clustering scheme, and describing at least two images of a preset frame by using each node in the visual dictionary tree and a score vector obtained by TF-IDF entropy of each node;
step 2, carrying out similarity calculation on the images in the step 1;
and 3, processing the misadjustment closed loop by using the constraint relation of the image on time and space.
As a further improvement of the present invention, step 1 comprises:
step 1-1, establishing a visual dictionary tree by adopting a hierarchical K-means clustering scheme: creating a tree with kappa branches and l layers, recursively calling a K mean value clustering method for each branch of each layer to obtain kappa finer branches of the next layer, and calling the kappa finer branches to the l layer to finish;
step 1-2, extracting image features from the collected image, and projecting the image features to a visual dictionary tree to obtain a description vector corresponding to the image;
1-3, expressing the scoring weight of the image at different nodes by using TF-IDF entropy of different nodes in the visual dictionary tree:
Figure GDA0003685499480000021
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000022
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image feature and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with the existing image features projected on the node i, lambda i Representing the coefficient of variation; TF represents the frequency of a certain word appearing in an image, and the higher the frequency is, the higher the word discrimination is; the IDF represents the frequency of a certain word in all words, and the lower the frequency is, the more distinguishing the classification of the image is;
1-4, expressing the score vector of the image in the whole visual dictionary tree by using the score weights of the image at different nodes as follows:
W(P)=(W 1 (P),W 2 (P),…,W L (P))
wherein W (P) represents a score vector of the image P, W 1 (P) a score vector, W, for image P at the first level 2 (P) a score vector, W, for image P on the second layer L (P) a score vector representing picture P on level L;
1-5, the score vector of the image P at the l-th layer is represented as:
Figure GDA0003685499480000031
wherein, W l (P) represents a score vector of the picture P on the l-th layer,
Figure GDA0003685499480000032
a score vector representing the 1 st node of the image P on the l-th level,
Figure GDA0003685499480000033
a score vector representing the 2 nd node of image P on the l-th level,
Figure GDA0003685499480000034
indicating that the picture P is on the l-th layer l Score vectors for individual nodes.
As a further improvement of the invention, in step 1, the TF-IDF entropy of each node is expressed as:
Figure GDA0003685499480000035
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000036
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image feature and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with image features projected to the node i, lambda i The coefficient of variation is indicated.
As a further development of the invention, the lambda i The calculation formula of (c) is:
Figure GDA0003685499480000037
wherein CV is i Coefficient of variation representing the number of words of the ith node, alpha being the coefficient of variation average scale factor, k ι Representing the number of words.
As a further improvement of the invention, the CV is i The calculation formula of (c) is:
Figure GDA0003685499480000038
where σ denotes a standard deviation of the number of occurrences of the word of the ith node, and μ denotes an average value of the number of occurrences of the word of the ith node.
As a further improvement of the present invention, step 2 comprises:
step 2-1, representing similarity scores of words by using the minimum value of the score weights of the image P and the image Q in the same word;
step 2-2, when the image M, the image P and the image Q exist, if the similarity score of the image M and the image Q is the same as the similarity score between the image P and the image Q, the step 2-3 is carried out;
step 2-3, improving a calculation formula of the similarity score, which is as follows:
Figure GDA0003685499480000041
wherein the content of the first and second substances,
Figure GDA0003685499480000042
representing the similarity score of the ith node of image P and image Q at the ith level in the visual dictionary tree,
Figure GDA0003685499480000043
a score vector representing the ith node of the image P at the ith level in the visual dictionary tree,
Figure GDA0003685499480000044
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-4, when the number of words existing in each image is far less than the number of all words in the visual dictionary tree, namely the scoring weight of many words in the image is 0, the calculation formula of the improved similarity score is as follows:
Figure GDA0003685499480000045
wherein the content of the first and second substances,
Figure GDA0003685499480000046
representing the similarity score of the ith node at the ith level in the visual dictionary tree for image P and image Q,
Figure GDA0003685499480000047
a score vector representing the ith node of the image P at the ith level in the visual dictionary tree,
Figure GDA0003685499480000048
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-5, based on the calculation formula of the similarity score of the ith node of the image P and the image Q on the ith layer in the visual dictionary tree, defining the calculation formula of the similarity score of the image P and the image Q on the ith layer as follows:
Figure GDA0003685499480000049
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA00036854994800000410
a similarity score representing the ith node of the image P and the image Q on the ith layer in the visual dictionary tree;
step 2-6, defining the increment of the similarity score of the image P and the image Q on the l level based on the function of the similarity score of the image P and the image Q on the l level to avoid the repeated accumulation of the similarity of the image P and the image Q in the visual dictionary tree from top to bottom, wherein the increment of the similarity score of the image P and the image Q on the l level is defined as follows:
Figure GDA0003685499480000051
wherein S is l (P, Q) represents a similarity calculation score between the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score of the image P and the image Q at the l +1 th layer;
step 2-7, based on the increment of the similarity score in step 2-6, the formula defining the similarity calculation score between P, Q two images is:
Figure GDA0003685499480000052
where K (P, Q) represents a similarity calculation score of the image P and the image Q, S l (P, Q) represents a similarity calculation score between the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score between the image P and the image Q at the l +1 th layer, S L (P, Q) represents a similarity calculation score of the image P and the image Q at the L-th layer,
Figure GDA0003685499480000053
representing the matching strength factor of the visual dictionary tree.
As a further improvement of the present invention, in step 2-1, the similarity score is calculated by the formula:
Figure GDA0003685499480000054
wherein the content of the first and second substances,
Figure GDA0003685499480000055
representing the similarity score of the ith node at the ith level in the visual dictionary tree for image P and image Q,
Figure GDA0003685499480000056
a score vector representing the ith node of the image P on the ith layer,
Figure GDA0003685499480000057
the score vector of the ith node on the ith layer of the image Q.
As a further improvement of the present invention, in step 2-2,
Figure GDA0003685499480000058
wherein the content of the first and second substances,
Figure GDA0003685499480000059
the score weight of the ith node representing the image M on the l-th level of the visual dictionary tree,
Figure GDA00036854994800000510
the scoring weight of the ith node representing the image P on the ith level of the visual dictionary tree,
Figure GDA0003685499480000061
represents the scoring weight of the ith node of image Q on the ith level of the visual dictionary tree.
As a further improvement of the invention, the step 3 comprises the following steps:
3-1, eliminating a false positive closed loop by using time consistency constraint of the image;
and 3-2, eliminating the false closed loop by using the space consistency constraint of the image.
The invention has the beneficial effects that: the invention improves the TF-IDF entropy expression method of each node by adjusting the visual dictionary tree; and a similarity calculation method between the images is improved, so that the problem of perception ambiguity existing in closed-loop detection is effectively reduced, and the recall rate of the closed-loop detection can be effectively improved.
Drawings
FIG. 1 is a flow chart of the image closed-loop detection method based on the improved visual dictionary tree according to the present invention.
Fig. 2 (a) is a graph of accuracy versus recall in a data set fr3_ long _ office _ house according to the improved visual dictionary tree-based image closed-loop detection method and IAB-MAP detection method, FAB-MAP detection method, and RTAB-MAP detection method of the present invention.
Fig. 2 (b) is a graph of accuracy versus recall in the data set fr2_ pioneer _ slam2 of the improved visual dictionary tree-based image closed-loop detection method and IAB-MAP detection method, FAB-MAP detection method, and RTAB-MAP detection method of the present invention. .
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The closed loop detection has the effects that the current position information of the mobile robot and the observed map point information are combined by utilizing a scene recognition algorithm to judge whether the mobile robot returns to a scene which is experienced once, the constraint is added between the current pose and the previous pose, and the accumulated error brought by the system is reduced. The method aims to reduce the problem of perception ambiguity existing in a traditional closed-loop detection algorithm based on a visual bag-of-words model (BOVW).
As shown in fig. 1, to solve the problems in the prior art, the present invention provides an improved image closed-loop detection method based on an improved visual dictionary tree, which includes the following steps:
step 1, establishing a visual dictionary tree by utilizing a layered K-means clustering scheme, and describing at least two images of a preset frame by using each node in the visual dictionary tree and a score vector obtained by TF-IDF entropy of each node; the TF-IDF entropy of each node is expressed as:
Figure GDA0003685499480000071
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000072
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image features and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with the existing image features projected on the node i, lambda i Represents an improved coefficient of variation;
step 2, similarity calculation is carried out on the images in the step 1;
and 3, processing the misadjustment closed loop by using the constraint relation of the image on time and space.
In other words, the above steps include three aspects: 1. a vector description of the improved image; 2. improved methods of similarity calculation between images; 3. closed loop posterior.
1. Improved vector description of images.
The closed-loop detection algorithm of the classic visual bag-of-word model uses a large number of image features to train to obtain a visual dictionary tree, then extracts the image features of the collected images, and projects the image features to the visual dictionary tree, so that the description vector of the images is obtained.
In the invention, the real-time calculation requirement of a large number of images in an actual scene is considered, and the visual dictionary tree is established by adopting a layered K mean value clustering scheme: creating a tree with kappa branches and l layers, recursively calling a K-means clustering method for each branch of each layer to obtain kappa finer branches of the next layer, and calling the branches to the l layer to finish.
In the prior art, the scoring weight of the defined image at different tree nodes can be represented by TF-IDF entropy of each tree node. TF represents the frequency of a certain word appearing in an image, and the higher the frequency is, the higher the word discrimination is; IDF represents the frequency with which a word appears in all words, the lower the frequency, the more discriminative the classification of the image.
Define TF-IDF entropy as:
Figure GDA0003685499480000073
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000081
representing the scoring weight, n, of the ith node of the image P on the l-th level of the tree i And N represents the number of feature points projected to the node i and the total number of feature points, N and N, respectively i Representing the total number of images that need to be processed and the number of images that present the projection of the feature onto node i, respectively.
However, consider a case for the TF-IDF entropy defined by the above formula: as shown in Table 1,. omega. 1 ,ω 2 ,ω 3 The IDF values calculated according to the above formula for these three words are the same, ω from TF point of view 1 The most frequent word occurs, with the highest scoring weight, ω 2 Sub, omega 3 Minimum; but actually fromFrom the word discrimination view, omega 3 The number of times a word appears in each image is relative to ω 1 And omega 2 Said large span, ω 3 The word should get the highest weight. Clearly the two are contradictory.
TABLE 1 number of words in database image
Figure GDA0003685499480000082
Based on this problem, the present invention introduces an improved coefficient of variation to assist the calculation of scoring weights for words. The coefficient of variation defining the number of words is:
Figure GDA0003685499480000083
in the above equation, σ represents the standard deviation of the number of occurrences of the word, and μ represents the average of the number of occurrences of the word. For ω in Table 1 1 In the case of the standard deviation of 0, the coefficient of variation is improved, and the improved coefficient of variation is defined as lambda i
Figure GDA0003685499480000084
Wherein CV is i Representing the coefficient of variation of the word represented by the ith node, alpha being the coefficient of variation average scale factor, kappa ι Representing the number of visual words.
Thus, the improved TF-IDF entropy is expressed as:
Figure GDA0003685499480000085
and using the improved TF-IDF entropy of different nodes in the visual dictionary tree as the score weight of different visual words, thereby obtaining a score vector of words in the image for describing the scene. The score vector of the image P in the whole visual dictionary tree is represented as: w: (W:P)=(W 1 (P),W 2 (P),…,W L (P)), wherein W L (P) represents the score vector of the image on level L, expressed as:
Figure GDA0003685499480000091
specifically, the step 1 includes:
step 1-1, establishing a visual dictionary tree by adopting a hierarchical K-means clustering scheme: creating a tree with kappa branches and l layers, recursively calling a K mean value clustering method for each branch of each layer to obtain kappa finer branches of the next layer, and calling the kappa finer branches to the l layer to finish;
step 1-2, extracting image features from the collected image, and projecting the image features to a visual dictionary tree to obtain a description vector corresponding to the image;
1-3, expressing the scoring weight of the image at different nodes by using TF-IDF entropy of different nodes in the visual dictionary tree:
Figure GDA0003685499480000092
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000093
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image feature and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with image features projected to the node i, lambda i Represents an improved coefficient of variation; TF represents the frequency of a certain word appearing in an image, and the higher the frequency is, the higher the word discrimination is; the IDF represents the frequency of a certain word in all words, and the lower the frequency is, the more distinguishing the classification of the image is;
1-4, expressing the score vector of the image in the whole visual dictionary tree by using the score weights of the image at different nodes as follows:
W(P)=(W 1 (P),W 2 (P),…,W L (P))
wherein W (P) represents a score vector of the image P, W 1 (P) a score vector, W, for image P at the first level 2 (P) a score vector, W, for image P on the second layer L (P) a score vector representing picture P on level L;
in steps 1-5, the score vector of the image P at the l-th layer is represented as:
Figure GDA0003685499480000101
wherein, W l (P) represents a score vector of the picture P on the l-th layer,
Figure GDA0003685499480000102
score vectors representing the 1 st node of the image P on the l-th level,
Figure GDA0003685499480000103
a score vector representing the 2 nd node of image P on the l-th level,
Figure GDA0003685499480000104
indicating that the picture P is on the l-th layer l Score vectors for individual nodes.
2. Improved similarity score algorithm
For the algorithm for calculating the similarity score between images, in the prior art, the BOVW scheme is a calculation formula for expressing the similarity score of a word by using P, Q minimum values of the two images in the same word score weight:
Figure GDA0003685499480000105
such a similarity score is expressed by using the minimum value of the score weight of the same word in the image, and although the similarity degree of a single node can be judged well, there is still a problem. According to the above formulaExpressed, if there are three images: image M, image P and image Q satisfying
Figure GDA0003685499480000106
Then, there is a similarity score between the image M and the image Q which is the same as the similarity score between the image P and the image Q. However, contrary to our cognitive scope, we would consider that the closer the similarity is, the higher the similarity is between the two images, and that image P should be more similar to image Q than to image M.
In order to avoid the above problem of perception ambiguity, and at the same time, considering that the number of words existing in each image is far less than the number of all words in the visual dictionary tree, the score of many words existing in the image is 0, and in order to improve the calculation efficiency of the whole algorithm, the calculation formula of the similarity score is improved as follows:
Figure GDA0003685499480000107
based on the improved calculation formula of the similarity score of the new single node, the similarity score function of the image at the l-th layer is defined as follows:
Figure GDA0003685499480000108
considering that the visual dictionary tree is built layer by layer from top to bottom, a certain layer of space of the visual dictionary tree will inevitably contain a part of the image similarity of the next layer of the layer. Therefore, in order to avoid repeated accumulation of similarity, a scheme of calculating the similarity score increment from the bottom to the top of the lowest layer is adopted, and then the similarity score increment of the ith layer can be expressed as:
Figure GDA0003685499480000111
therefore, the similarity calculation between two images is defined as:
Figure GDA0003685499480000112
wherein, the first and the second end of the pipe are connected with each other,
Figure GDA0003685499480000113
and representing the matching strength coefficient, and constraining the matching difference between different levels in the dictionary tree.
Specifically, step 2 includes:
step 2-1, representing the similarity score of the words by using the minimum value of the score weights of the images P and Q on the same word;
step 2-2, when the conditions are satisfied
Figure GDA0003685499480000114
If so, namely the similarity score of the image M and the image Q is the same as the similarity score between the image P and the image Q, the step 2-3 is carried out;
step 2-3, improving a calculation formula of the similarity score, as follows:
Figure GDA0003685499480000115
wherein the content of the first and second substances,
Figure GDA0003685499480000116
representing the similarity score of the ith node of image P and image Q at the ith level in the visual dictionary tree,
Figure GDA0003685499480000117
a score vector representing the ith node of image P at the l-th level in the visual dictionary tree,
Figure GDA0003685499480000118
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-4, when the number of words existing in each image is far less than the number of all words in the visual dictionary tree, namely the scoring weight of many words in the image is 0, the calculation formula of the improved similarity score is as follows:
Figure GDA0003685499480000119
wherein, the first and the second end of the pipe are connected with each other,
Figure GDA00036854994800001110
representing the similarity score of the ith node at the ith level in the visual dictionary tree for image P and image Q,
Figure GDA0003685499480000121
a score vector representing the ith node of the image P at the ith level in the visual dictionary tree,
Figure GDA0003685499480000122
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-5, based on the calculation formula of the similarity score of the ith node of the image P and the image Q on the ith layer in the visual dictionary tree, defining the calculation formula of the similarity score of the image P and the image Q on the ith layer as follows:
Figure GDA0003685499480000123
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure GDA0003685499480000124
a similarity score representing the ith node of image P and image Q on the l-th level in the visual dictionary tree;
step 2-6, defining the increment of the similarity score of the image P and the image Q on the l level based on the function of the similarity score of the image P and the image Q on the l level to avoid the repeated accumulation of the similarity of the image P and the image Q in the visual dictionary tree from top to bottom, wherein the increment of the similarity score of the image P and the image Q on the l level is defined as follows:
Figure GDA0003685499480000125
wherein S is l (P, Q) represents a similarity calculation score of the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score of the image P and the image Q at the l +1 th layer;
step 2-7, based on the increment of the similarity score in step 2-6, the formula defining the similarity calculation score between P, Q two images is:
Figure GDA0003685499480000126
where K (P, Q) represents a similarity calculation score of the image P and the image Q, S l (P, Q) represents a similarity calculation score between the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score between the image P and the image Q at the l +1 th layer, S L (P, Q) represents a similarity calculation score between the image P and the image Q at the L-th layer,
Figure GDA0003685499480000127
a matching strength coefficient representing a visual dictionary tree.
3. Closed loop posterior test
Due to the fact that the spatial position relation and semantic correlation among different features of the image are omitted, some closed loops obtained through similarity calculation can be mistaken for correct closed loops. The invention utilizes the constraint relation of images in time and space to process the miscorrect closed loop.
Specifically, step 3 comprises:
and 3-1, eliminating the error closed loop by using time consistency constraint. Because the robot runs in a continuous time, images collected by the robot should be continuous in time, and adjacent images should correspond to the continuous change of the same scene, so that when a closed loop exists at a certain moment, closed loops also exist at the later moments, and if the candidate closed loop does not meet the constraint of the time consistency, the candidate closed loop is rejected.
And 3-2, eliminating the error closed loop by using space consistency constraint. When a closed-loop phenomenon occurs, two images for generating the closed loop should correspond to the same scene, and only the imaging angles of the images are different, so that the two images can eliminate the error closed loop by utilizing space consistency constraint. And calculating a basic matrix by matching the pose of the two images with the feature points, comparing the number of inner points of the basic matrix with a set threshold, and if the number of inner points exceeds the threshold, keeping the two images to form a closed loop.
A plurality of experiments using the improved visual dictionary tree-based image closed-loop detection method of the present invention will be described in detail below. In the closed-loop detection problem, an important index for evaluating the closed-loop detection performance is an accuracy-recall curve, and the accuracy represents that: the proportion of the real closed loop in the closed loops detected by all algorithms, and the recall rate represents: the closed loops that the algorithm correctly detects are a percentage of all the actual closed loops. The calculation formulas of the accuracy rate and the recall rate are respectively shown as the following formulas:
Figure GDA0003685499480000131
Figure GDA0003685499480000132
wherein, TP represents the number of correct closed loops in the closed loops detected by the algorithm, FP represents the number of closed loops which are not actually detected by the algorithm, FN represents the number of closed loops which are not detected by the algorithm and the actual result is the number of closed loops.
As shown in fig. 2 (a) and fig. 2 (b), in order to demonstrate the effect of the formula for calculating the corresponding description vector and the similarity score of the image proposed by the present invention, two data sets were selected from the TUM RGB-D data set for experimental verification. The first is the data set fr3_ long _ office _ house in a complex indoor environment; the second is the data set fr2_ pioneer _ slam2, which is extremely similar in context, and prone to perceptual ambiguities. The image closed-loop detection method based on the improved visual dictionary tree is compared with a plurality of classical image closed-loop detection algorithms including IAB-MAP, FAB-MAP and RTAB-MAP in a data set fr3_ long _ office _ house hold and a data set fr2_ pioneer _ slam2 respectively. As can be seen from fig. 2 (a) and fig. 2 (b), the improved visual dictionary tree-based image closed-loop detection method of the present invention has a higher recall rate with an accuracy of 100%, so that the influence of perceptual ambiguity can be effectively reduced.
In conclusion, each node in the visual dictionary tree and TF-IDF entropy of the node are used for forming a vector to describe at least two images of a preset frame, similarity calculation is carried out on the images represented by the vector, and a constraint relation of the images on time and space is used for processing a miscorrect closed loop; further, the establishment of the visual dictionary tree is adjusted, and a TF-IDF entropy expression method of each node is improved; improving the similarity calculation method to reduce the perception ambiguity problem existing in closed loop detection; and respectively processing the error closed loops by utilizing time consistency and space consistency, eliminating the error closed loops as correct closed loops, and finally determining the correct closed loops. The method can effectively reduce the problem of perception ambiguity in closed-loop detection and can effectively improve the recall rate of the closed-loop detection.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (8)

1. An improved visual dictionary tree-based image closed-loop detection method is characterized by comprising the following steps:
step 1, establishing a visual dictionary tree by utilizing a layered K-means clustering scheme, and describing at least two images of a preset frame by using each node in the visual dictionary tree and a score vector obtained by TF-IDF entropy of each node;
step 2, carrying out similarity calculation on the images in the step 1;
step 2-1, representing similarity scores of words by using the minimum value of the score weights of the image P and the image Q in the same word;
step 2-2, when the image M, the image P and the image Q exist, if the similarity score of the image M and the image Q is the same as the similarity score between the image P and the image Q, the step 2-3 is carried out;
step 2-3, improving a calculation formula of the similarity score, as follows:
Figure FDA0003685499470000011
wherein the content of the first and second substances,
Figure FDA0003685499470000012
representing the similarity score of the ith node of image P and image Q at the ith level in the visual dictionary tree,
Figure FDA0003685499470000013
a score vector representing the ith node of image P at the l-th level in the visual dictionary tree,
Figure FDA0003685499470000014
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-4, when the number of words existing in each image is far less than the number of all words in the visual dictionary tree, namely the scoring weight of many words in the image is 0, the calculation formula of the improved similarity score is as follows:
Figure FDA0003685499470000015
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003685499470000016
representing the phase of the ith node at the l-th level in the visual dictionary tree for image P and image QThe similarity score is obtained by the similarity analysis method,
Figure FDA0003685499470000017
a score vector representing the ith node of the image P at the ith level in the visual dictionary tree,
Figure FDA0003685499470000018
the score vector of the ith node of the image Q on the ith layer in the visual dictionary tree;
step 2-5, based on the calculation formula of the similarity score of the ith node of the image P and the image Q on the ith layer in the visual dictionary tree, defining the calculation formula of the similarity score of the image P and the image Q on the ith layer as follows:
Figure FDA0003685499470000021
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes at the l-th level,
Figure FDA0003685499470000022
a similarity score representing the ith node of the image P and the image Q on the ith layer in the visual dictionary tree;
step 2-6, defining the increment of the similarity score of the image P and the image Q on the l level based on the function of the similarity score of the image P and the image Q on the l level to avoid the repeated accumulation of the similarity of the image P and the image Q in the visual dictionary tree from top to bottom, wherein the increment of the similarity score of the image P and the image Q on the l level is defined as follows:
Figure FDA0003685499470000023
wherein S is l (P, Q) represents a similarity calculation score between the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score of the image P and the image Q at the l +1 th layer;
step 2-7, based on the increment of the similarity score in step 2-6, defining P, Q the formula for the similarity calculation score between the two images as:
Figure FDA0003685499470000024
where K (P, Q) represents a similarity calculation score of the image P and the image Q, S l (P, Q) represents a similarity calculation score between the image P and the image Q at the l-th layer, S l+1 (P, Q) represents a similarity calculation score between the image P and the image Q at the l +1 th layer, S L (P, Q) represents a similarity calculation score between the image P and the image Q at the L-th layer,
Figure FDA0003685499470000025
a matching strength coefficient representing a visual dictionary tree;
and 3, processing the misadjustment closed loop by using the constraint relation of the image on time and space.
2. The improved visual dictionary tree-based image closed-loop detection method according to claim 1, wherein the step 1 comprises:
step 1-1, establishing a visual dictionary tree by adopting a hierarchical K-means clustering scheme: creating a tree with kappa branches and l layers, recursively calling a K mean value clustering method for each branch of each layer to obtain kappa finer branches of the next layer, and calling the kappa finer branches to the l layer to finish;
step 1-2, extracting image features from the collected image, and projecting the image features to a visual dictionary tree to obtain a description vector corresponding to the image;
1-3, expressing the scoring weight of the image at different nodes by using TF-IDF entropy of different nodes in the visual dictionary tree:
Figure FDA0003685499470000031
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes of the l-th level,
Figure FDA0003685499470000032
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image features and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with the existing image features projected on the node i, lambda i Representing the coefficient of variation; TF represents the frequency of a certain word appearing in an image, and the higher the frequency is, the higher the word discrimination is; the IDF represents the frequency of a certain word in all words, and the lower the frequency is, the more distinguishing the classification of the image is;
1-4, expressing the score vector of the image in the whole visual dictionary tree by using the score weights of the image at different nodes as follows:
W(P)=(W 1 (P),W 2 (P),…,W L (P))
wherein W (P) represents a score vector of the image P, W 1 (P) a score vector, W, for image P at the first level 2 (P) score vector, W, on the second layer representing image P L (P) a score vector representing picture P on level L;
1-5, the score vector of the image P at the l-th layer is represented as:
Figure FDA0003685499470000033
wherein, W l (P) represents a score vector of the picture P on the l-th layer,
Figure FDA0003685499470000034
a score vector representing the 1 st node of the image P on the l-th level,
Figure FDA0003685499470000035
a score vector representing the 2 nd node of the image P on the l-th layer,
Figure FDA0003685499470000036
indicating that the picture P is on the l-th layer l Score vectors for individual nodes.
3. The improved visual dictionary tree-based image closed-loop detection method according to claim 1, wherein in step 1, the TF-IDF entropy of each node is expressed as:
Figure FDA0003685499470000037
wherein l represents the number of levels of the visual dictionary tree, i represents the number of nodes of the l-th level,
Figure FDA0003685499470000041
representing the scoring weight, n, of the ith node of the image P on the l-th level of the visual dictionary tree i And N represents the number of feature points projected to the node i by the image feature and the total number of the feature points, N and N i Respectively representing the total number of images to be processed and the number of images with the existing image features projected on the node i, lambda i The coefficient of variation is indicated.
4. The improved visual dictionary tree-based image closed-loop detection method according to claim 3, wherein λ i The calculation formula of (2) is as follows:
Figure FDA0003685499470000042
wherein CV is i Coefficient of variation representing the number of words of the ith node, alpha being the coefficient of variation average scale factor, k ι Representing the number of words.
5. The improved visual dictionary tree-based image closed-loop detection method according to claim 4, wherein the CV is a function of the image closed-loop detection method i The calculation formula of (c) is:
Figure FDA0003685499470000043
where σ denotes a standard deviation of the number of occurrences of the word of the ith node, and μ denotes an average value of the number of occurrences of the word of the ith node.
6. The improved visual dictionary tree-based image closed-loop detection method according to claim 1, wherein in step 2-1, the similarity score is calculated by the formula:
Figure FDA0003685499470000044
wherein the content of the first and second substances,
Figure FDA0003685499470000045
representing the similarity score of the ith node at the ith level in the visual dictionary tree for image P and image Q,
Figure FDA0003685499470000046
a score vector representing the ith node of image P on the ith layer,
Figure FDA0003685499470000047
the score vector of the ith node on the ith layer of the image Q.
7. The improved visual dictionary tree-based image closed-loop detection method according to claim 1, wherein, in step 2-2,
Figure FDA0003685499470000048
wherein the content of the first and second substances,
Figure FDA0003685499470000051
the score weight of the ith node representing the image M on the l-th level of the visual dictionary tree,
Figure FDA0003685499470000052
the scoring weight of the ith node representing the image P on the ith level of the visual dictionary tree,
Figure FDA0003685499470000053
represents the scoring weight of the ith node of image Q on the ith level of the visual dictionary tree.
8. The improved visual dictionary tree-based image closed-loop detection method according to claim 1, wherein step 3 comprises:
3-1, eliminating a false closed loop by using time consistency constraint of the image;
and 3-2, eliminating the false closed loop by using the space consistency constraint of the image.
CN202110493714.0A 2021-05-07 2021-05-07 Image closed-loop detection method based on improved visual dictionary tree Active CN113191435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110493714.0A CN113191435B (en) 2021-05-07 2021-05-07 Image closed-loop detection method based on improved visual dictionary tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110493714.0A CN113191435B (en) 2021-05-07 2021-05-07 Image closed-loop detection method based on improved visual dictionary tree

Publications (2)

Publication Number Publication Date
CN113191435A CN113191435A (en) 2021-07-30
CN113191435B true CN113191435B (en) 2022-08-23

Family

ID=76983907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110493714.0A Active CN113191435B (en) 2021-05-07 2021-05-07 Image closed-loop detection method based on improved visual dictionary tree

Country Status (1)

Country Link
CN (1) CN113191435B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
CN107886129A (en) * 2017-11-13 2018-04-06 湖南大学 A kind of mobile robot map closed loop detection method of view-based access control model bag of words
CN110472585A (en) * 2019-08-16 2019-11-19 中南大学 A kind of VI-SLAM closed loop detection method based on inertial navigation posture trace information auxiliary

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
CN107886129A (en) * 2017-11-13 2018-04-06 湖南大学 A kind of mobile robot map closed loop detection method of view-based access control model bag of words
CN110472585A (en) * 2019-08-16 2019-11-19 中南大学 A kind of VI-SLAM closed loop detection method based on inertial navigation posture trace information auxiliary

Also Published As

Publication number Publication date
CN113191435A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN110223324B (en) Target tracking method of twin matching network based on robust feature representation
Cieslewski et al. Point cloud descriptors for place recognition using sparse visual information
CN109871803B (en) Robot loop detection method and device
CN110781790A (en) Visual SLAM closed loop detection method based on convolutional neural network and VLAD
CN111767847B (en) Pedestrian multi-target tracking method integrating target detection and association
Yue et al. Robust loop closure detection based on bag of superpoints and graph verification
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
CN104794219A (en) Scene retrieval method based on geographical position information
CN111723600B (en) Pedestrian re-recognition feature descriptor based on multi-task learning
CN112287906B (en) Template matching tracking method and system based on depth feature fusion
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN110569706A (en) Deep integration target tracking algorithm based on time and space network
CN109284668A (en) A kind of pedestrian's weight recognizer based on apart from regularization projection and dictionary learning
CN115359366A (en) Remote sensing image target detection method based on parameter optimization
CN109697727A (en) Method for tracking target, system and storage medium based on correlation filtering and metric learning
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN108805032A (en) Fall detection method based on depth convolutional network
CN117213470B (en) Multi-machine fragment map aggregation updating method and system
CN114764870A (en) Object positioning model processing method, object positioning device and computer equipment
CN114067128A (en) SLAM loop detection method based on semantic features
CN114046790A (en) Factor graph double-loop detection method
Wang et al. Object detection algorithm based on improved Yolov3-tiny network in traffic scenes
CN113191435B (en) Image closed-loop detection method based on improved visual dictionary tree
CN111291785A (en) Target detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant