CN110781841A

CN110781841A - Closed loop detection method and device based on SLAM space invariant information

Info

Publication number: CN110781841A
Application number: CN201911039054.8A
Authority: CN
Inventors: 吉长江
Original assignee: Beijing Yingpu Technology Co Ltd
Current assignee: Beijing Yingpu Technology Co Ltd
Priority date: 2019-10-29
Filing date: 2019-10-29
Publication date: 2020-02-11

Abstract

The application discloses a closed-loop detection method and device based on SLAM space invariant information, and relates to the field of closed-loop detection. The method comprises: acquiring a current image in real time and establishing a bag of words for the current image; analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information); and using the ISI as a query condition to perform a nearest neighbor search within the bag of words to complete matching. The device comprises a processing module, a conversion module and a matching module. In the method and the device, the ISI participates as additional data in the BoW search during closed-loop detection, which improves detection accuracy.

Description

Closed loop detection method and device based on SLAM space invariant information

Technical Field

The present invention relates to the field of closed-loop detection, and in particular, to a closed-loop detection method and apparatus based on SLAM space invariant information.

Background

LCD (Loop-Closure Detection) is an important process in visual SLAM (Simultaneous Localization and Mapping). In brief, closed-loop detection determines whether the scene currently being scanned has been encountered before. Even with a front end and a back end, the poses and landmarks cannot be guaranteed to be fully correct: because of noise, when the user returns to the same place, the landmark positions obtained will almost certainly differ from the earlier ones. Without closed-loop detection, therefore, errors and ghosting appear in the map, and positioning is affected.

BoW (Bag-of-Words) is the de facto standard solution for LCD. The basic idea is to represent each mapped (or reference) image by an unordered set of local features, called visual words, and then to index and retrieve images efficiently through an inverted index. The bag-of-words model requires a dictionary. In short, the pixels belonging to the things that appear in the scanned scenes are classified into categories, much like entries in a dictionary. A bag of words is then established for each scanned frame by checking which dictionary words appear in the frame. Specifically, 0 and 1 can indicate whether a given "word" appears, or a number greater than 1 can indicate how many times it appears in the frame; this yields a vector for each frame, and comparing the vectors compares the similarity of the pictures. One might imagine the dictionary containing tables, chairs and the like, but building the dictionary is in fact a clustering problem: with an unsupervised algorithm such as k-means, the "words" in the dictionary are the resulting clusters, and each category is generally identified simply by a number. To speed up dictionary lookup, the dictionary may be built as a multi-way tree, i.e. by multi-layer clustering.
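The frame vectors described above can be illustrated with a short sketch. It assumes that each feature has already been assigned an integer word ID by some clustering step, and it uses cosine similarity as the comparison; the background does not name a specific similarity measure, so that choice and the function names `bow_vector` and `cosine_similarity` are illustrative only.

```python
import math
from collections import Counter

def bow_vector(word_ids, vocab_size):
    """Count how often each visual word (an integer cluster ID) occurs in a frame."""
    counts = Counter(word_ids)
    return [counts.get(i, 0) for i in range(vocab_size)]

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical frames over a 5-word dictionary.
frame_a = bow_vector([0, 2, 2, 3], vocab_size=5)  # [1, 0, 2, 1, 0]
frame_b = bow_vector([0, 2, 3, 3], vocab_size=5)  # [1, 0, 1, 2, 0]
frame_c = bow_vector([1, 4, 4], vocab_size=5)     # [0, 1, 0, 0, 2]

# frame_a and frame_b share most words; frame_a and frame_c share none.
sim_ab = cosine_similarity(frame_a, frame_b)
sim_ac = cosine_similarity(frame_a, frame_c)
```

Replacing the counts with 0/1 flags gives the binary variant mentioned in the text.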

For example, "Assigning Visual Words to Places for Loop Closure Detection" (ICRA 2018) proposes the following closed-loop detection method. An image-to-sequence strategy gathers images that are close in time and content into an image sequence, which is defined as a place. For each image $I$, the $v$ most salient SURF feature points are detected ($v$ is fixed for the later matching between VWs and features), and the descriptor set of each image is denoted $d_I$. To avoid building places from scenes that hardly change, for example when the camera is stationary or decelerating, an image is discarded and takes no part in place generation when the number of its feature points is less than $\epsilon$. VWs (visual words) are then generated by a GNG (Growing Neural Gas) network, with the local descriptor database $D_s$ as its input. GNG has no preset number of clusters: it incrementally adds new nodes (new VWs) until the minimum error requirement is met. Next, candidates are found by a voting scheme: the descriptors of the query image are projected onto the VWs already generated in the database, and each such projection counts as a vote. Candidate closed loops are selected by a binomial probability function and must satisfy two conditions: the similarity score meets a threshold, and the number of VWs of the pre-candidate place is greater than the spread value of the distribution. Using a KNN classifier, the descriptor subset $d_Q$ of the query image is matched against $D_{S(m)}$, the descriptors belonging to $S(m)$. The image $I^S$ with the largest number of matches is considered the candidate closed-loop frame, and a subsequent consistency check is then carried out.

However, existing closed-loop methods ignore the spatial relations between features and rely only on the 2D projection of an object onto the image plane, so they are easily affected by viewpoint changes and occlusion. The projection also has a large quantization error, which causes perceptual ambiguity: the same word is projected to different regions while different words are projected to the same region, and this is more likely to happen for edge words.

Disclosure of Invention

It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.

According to an aspect of the present application, there is provided a closed loop detection method based on SLAM space invariant information, including:

acquiring a current image in real time, and establishing a bag of words for the current image;

analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information);

and using the ISI as a query condition, performing a nearest neighbor search within the bag of words to complete matching.

Optionally, creating a bag of words for the current image, including:

extracting a set of BRIEF descriptors from the current image, treating each BRIEF descriptor as a visual word, checking whether each visual word is a member of a dictionary, and if not, inserting it into the dictionary.

Optionally, analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information), including:

analyzing the spatial structure of the current image, extracting local feature keypoints from the current image, and converting the local feature keypoints into an invariant coordinate system by using the ICP (Iterative Closest Point) method to obtain the ISI.

Optionally, performing a nearest neighbor search within the bag of words using the ISI as a query condition to complete matching, including:

using the ISI as a query condition, performing a nearest neighbor search within the bag of words to obtain matching pairs of feature points and visual words, calculating the Hamming distance between the feature point and the visual word of each pair, and filtering out the matching pairs whose Hamming distance exceeds a preset ratio threshold.

Optionally, the method further comprises:

after matching is completed, calculating, for the obtained matching pairs, the TF-IDF (term frequency-inverse document frequency) scores of the ISI in the bag of words, taking the N matching pairs with the highest TF-IDF scores, obtaining the number of inliers through RANSAC (Random Sample Consensus) verification, and scoring the N matching pairs according to the number of inliers.

According to another aspect of the present application, there is provided a closed loop detection apparatus based on SLAM space invariant information, including:

a processing module configured to obtain a current image in real time and establish a bag of words for the current image;

a conversion module configured to analyze the spatial structure of the current image to extract spatial information, and convert the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information);

a matching module configured to perform a nearest neighbor search within the bag of words using the ISI as a query condition to complete matching.

Optionally, the processing module is specifically configured to:

Optionally, the conversion module is specifically configured to:

Optionally, the matching module is specifically configured to:

Optionally, the apparatus further comprises:

a scoring module configured to, after matching is completed, calculate, for the obtained matching pairs, the TF-IDF (term frequency-inverse document frequency) scores of the ISI in the bag of words, take the N matching pairs with the highest TF-IDF scores, obtain the number of inliers through RANSAC (Random Sample Consensus) verification, and score the N matching pairs according to the number of inliers.

According to yet another aspect of the application, there is provided a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.

According to yet another aspect of the application, a computer-readable storage medium, preferably a non-volatile readable storage medium, is provided, having stored therein a computer program which, when executed by a processor, implements a method as described above.

According to yet another aspect of the application, there is provided a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform the method described above.

According to the technical scheme of the application, a current image is acquired in real time and a bag of words is established for it; the spatial structure of the current image is analyzed to extract spatial information, which is converted into an invariant coordinate system to obtain the ISI; and the ISI is used as a query condition for a nearest neighbor search within the bag of words to complete matching. Closed-loop detection based on ISI is thereby realized: useful spatial information is extracted before the 3D map is compressed into a compact BoW representation, the keypoint positions of the visual words, once converted into the invariant coordinate system, serve as the ISI, and the ISI participates as additional data in BoW retrieval during closed-loop detection, which improves detection accuracy.

The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flowchart of a closed-loop detection method based on SLAM spatial invariant information according to an embodiment of the present application;

Fig. 2 is a flowchart of a closed-loop detection method based on SLAM spatial invariant information according to another embodiment of the present application;

Fig. 3 is a block diagram of a closed-loop detection apparatus based on SLAM spatial invariant information according to another embodiment of the present application;

Fig. 4 is a block diagram of a computing device according to another embodiment of the present application;

Fig. 5 is a structural diagram of a computer-readable storage medium according to another embodiment of the present application.

Detailed Description

Fig. 1 is a flowchart of a closed-loop detection method based on SLAM spatial invariant information according to an embodiment of the present application. Referring to fig. 1, the method includes:

101: acquiring a current image in real time, and establishing a bag of words for the current image;

102: analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information);

103: using the ISI as a query condition, performing a nearest neighbor search within the bag of words to complete matching.

In this embodiment, optionally, establishing a bag of words for the current image includes:

a set of BRIEF descriptors is extracted from the current image, each BRIEF descriptor is treated as a visual word, and each visual word is checked for membership in the dictionary and, if it is not a member, inserted into the dictionary.

In this embodiment, optionally, analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information) includes:

analyzing the spatial structure of the current image, extracting local feature keypoints from the current image, and converting the local feature keypoints into an invariant coordinate system by using the ICP (Iterative Closest Point) method to obtain the ISI.

In this embodiment, optionally, using the ISI as a query condition to perform a nearest neighbor search within the bag of words and complete matching includes:

using the ISI as a query condition, performing a nearest neighbor search within the bag of words to obtain matching pairs of feature points and visual words, calculating the Hamming distance between the feature point and the visual word of each pair, and filtering out the matching pairs whose Hamming distance exceeds a preset ratio threshold.

In this embodiment, optionally, the method further includes:

after matching is completed, calculating, for the obtained matching pairs, the TF-IDF (term frequency-inverse document frequency) scores of the ISI in the bag of words, taking the N matching pairs with the highest TF-IDF scores, obtaining the number of inliers through RANSAC (Random Sample Consensus) verification, and scoring the N matching pairs according to the number of inliers.

According to the method, a current image is acquired in real time and a bag of words is established for it; the spatial structure of the current image is analyzed to extract spatial information, which is converted into an invariant coordinate system to obtain the ISI; and the ISI is used as a query condition for a nearest neighbor search within the bag of words to complete matching. Closed-loop detection based on ISI is thereby realized: useful spatial information is extracted before the 3D map is compressed into a compact BoW representation, the keypoint positions of the visual words, once converted into the invariant coordinate system, serve as the ISI, and the ISI participates as additional data in BoW retrieval during closed-loop detection, which improves detection accuracy.

Fig. 2 is a flowchart of a closed-loop detection method based on SLAM spatial invariant information according to another embodiment of the present application. Referring to fig. 2, the method includes:

201: acquiring a current image in real time;

In this embodiment, the current image may be acquired in real time from a dataset. Preferably, the experimental dataset is the KITTI dataset (created jointly by the Karlsruhe Institute of Technology, Germany, and the Toyota Technological Institute at Chicago, USA), currently the largest international dataset for evaluating computer vision algorithms in autonomous driving scenes. The KITTI acquisition platform includes 2 grayscale cameras, 2 color cameras, one Velodyne 3D lidar, 4 optical lenses and 1 GPS navigation system. The entire dataset consists of 389 pairs of stereo and optical flow images, 39.2 km of visual odometry sequences and over 200,000 3D labeled objects; each image contains up to 15 vehicles and 30 pedestrians, with varying degrees of occlusion.

202: extracting a set of BRIEF descriptors from the current image, treating each BRIEF descriptor as a visual word, checking whether each visual word is a member of the dictionary, and if not, inserting it into the dictionary;

In this embodiment, each binary BRIEF descriptor is treated as a visual word to build the bag of words, and the inverted index is implemented as an inverted file.
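Step 202 can be sketched as follows, assuming the binary descriptors have already been extracted and are available as `bytes` objects. The class name `BinaryBoW` and its methods are hypothetical, and the one-descriptor-per-word policy simply follows the wording of this step; a practical system would more likely quantize descriptors into clusters.

```python
from collections import defaultdict

class BinaryBoW:
    """Toy bag of words in which every distinct binary descriptor is its own visual word."""

    def __init__(self):
        self.word_ids = {}                     # descriptor bytes -> word ID
        self.inverted_file = defaultdict(set)  # word ID -> IDs of frames containing it

    def add_frame(self, frame_id, descriptors):
        """Insert unseen descriptors into the dictionary and index the frame."""
        for desc in descriptors:
            # setdefault returns the existing ID, or assigns the next free one.
            word = self.word_ids.setdefault(desc, len(self.word_ids))
            self.inverted_file[word].add(frame_id)

    def candidates(self, descriptors):
        """Map each frame sharing a word with the query to its shared-word count."""
        votes = defaultdict(int)
        for desc in set(descriptors):
            word = self.word_ids.get(desc)
            if word is not None:
                for frame_id in self.inverted_file[word]:
                    votes[frame_id] += 1
        return dict(votes)

# Hypothetical one-byte descriptors standing in for real BRIEF output.
bow = BinaryBoW()
bow.add_frame(0, [b"\x0f", b"\xf0"])
bow.add_frame(1, [b"\xf0", b"\xaa"])
shared = bow.candidates([b"\xf0", b"\x0f"])
```

The inverted file lets a query touch only the frames that contain at least one of its words, instead of comparing against every stored frame.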

203: analyzing the spatial structure of the current image, extracting local feature keypoints from the current image, and converting the local feature keypoints into an invariant coordinate system by using the ICP (Iterative Closest Point) method to obtain the ISI (Invariant Spatial Information);

In this embodiment, the invariant coordinate system may optionally be represented by a single dominant invariant 2D landmark point on the uv image plane, called the scene center (CoS). The displacement Δu of this landmark u from the center of the current image is computed along the horizontal axis.
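Step 203 names ICP without giving details, so the following is only a generic 2D point-to-point ICP sketch, not the procedure of this embodiment. It assumes a set of reference keypoints is already available in the target coordinate system; the function names and the fixed iteration count are illustrative.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired 2D points src -> dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform_2d(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(src, ref, iterations=20):
    """Iteratively align src keypoints to ref keypoints; returns the aligned points."""
    current = list(src)
    for _ in range(iterations):
        # Pair every point with its nearest reference point (brute force).
        paired = [min(ref, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                  for p in current]
        theta, tx, ty = best_rigid_2d(current, paired)
        current = transform_2d(current, theta, tx, ty)
    return current

# Hypothetical reference keypoints and a slightly rotated, shifted observation of them.
ref = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 3.0)]
observed = transform_2d(ref, 0.1, 0.3, -0.2)
aligned = icp_2d(observed, ref)
```

ICP only converges to the right alignment when the initial misalignment is small relative to the spacing between points, as it is in this example.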

204: using the ISI as a query condition, performing a nearest neighbor search within the bag of words to obtain matching pairs of feature points and visual words, calculating the Hamming distance between the feature point and the visual word of each pair, and filtering out the matching pairs whose Hamming distance exceeds a preset ratio threshold;

In this embodiment, optionally, any keypoint whose Hamming distance exceeds 90% of the maximum range of the distance function may be marked as unreliable and excluded from matching.
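A minimal sketch of the filtering in step 204, assuming binary descriptors stored as equal-length `bytes` and a brute-force nearest neighbor search. It takes the "maximum range of the distance function" to be the descriptor length in bits, which is an interpretation; the function names are hypothetical.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def nearest_word(descriptor, words):
    """Brute-force nearest neighbor: index and distance of the closest visual word."""
    distances = [hamming(descriptor, w) for w in words]
    best = min(range(len(words)), key=distances.__getitem__)
    return best, distances[best]

def match_and_filter(descriptors, words, ratio=0.9):
    """Keep (feature index, word index, distance) triples within ratio * max distance."""
    matches = []
    for i, desc in enumerate(descriptors):
        j, dist = nearest_word(desc, words)
        max_dist = 8 * len(desc)  # distance when every bit differs
        if dist <= ratio * max_dist:
            matches.append((i, j, dist))
    return matches

# Hypothetical 16-bit descriptors: the first is 1 bit from the word, the second 16 bits.
words = [b"\x00\x00"]
kept = match_and_filter([b"\x01\x00", b"\xff\xff"], words)
```

With `ratio=0.9` and 16-bit descriptors the cut-off is 14.4 bits, so only the first feature survives.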

205: after matching is completed, calculating, for the obtained matching pairs, the TF-IDF (term frequency-inverse document frequency) scores of the ISI in the bag of words, taking the N matching pairs with the highest TF-IDF scores, obtaining the number of inliers through RANSAC (Random Sample Consensus) verification, and scoring the N matching pairs according to the number of inliers.

The number of inliers is the number of correctly matched feature points that remain after RANSAC.
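The two-stage ranking of step 205 can be sketched as below. The sketch treats each candidate as a frame described by a list of word IDs, uses the plain `tf * log(N / df)` weighting, and takes the RANSAC inlier counts as given inputs rather than computing them; all of these are assumptions for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_score(query_words, frame_words, doc_freq, n_frames):
    """Sum of TF-IDF weights of the query's words inside one candidate frame."""
    counts = Counter(frame_words)
    total = len(frame_words)
    score = 0.0
    for word in set(query_words):
        if counts[word] == 0:
            continue  # the word does not occur in this frame
        tf = counts[word] / total
        idf = math.log(n_frames / doc_freq[word])
        score += tf * idf
    return score

def rank_candidates(query_words, frames, inlier_counts, top_n=2):
    """Take the top_n frames by TF-IDF, then order them by RANSAC inlier count."""
    n_frames = len(frames)
    # Document frequency: in how many frames each word occurs at least once.
    doc_freq = Counter(w for words in frames.values() for w in set(words))
    by_tf_idf = sorted(
        frames,
        key=lambda f: tf_idf_score(query_words, frames[f], doc_freq, n_frames),
        reverse=True,
    )
    shortlist = by_tf_idf[:top_n]
    return sorted(shortlist, key=lambda f: inlier_counts.get(f, 0), reverse=True)

# Hypothetical database of three frames and inlier counts from a separate RANSAC step.
frames = {0: [1, 2, 3], 1: [1, 1, 4], 2: [5, 6, 7]}
ranking = rank_candidates([1, 4], frames, inlier_counts={0: 25, 1: 12})
```

Frame 1 has the higher TF-IDF score here, but frame 0 ends up first because geometric verification left it with more inliers.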

Fig. 3 is a block diagram of a closed loop detection apparatus based on SLAM spatial invariant information according to another embodiment of the present application. Referring to fig. 3, the apparatus includes:

a processing module 301 configured to obtain a current image in real time and create a bag of words for the current image;

a conversion module 302 configured to analyze the spatial structure of the current image to extract spatial information, and convert the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information);

a matching module 303 configured to perform a nearest neighbor search within a bag of words using ISI as a query condition to complete the matching.

In this embodiment, optionally, the processing module is specifically configured to:

In this embodiment, optionally, the conversion module is specifically configured to:

In this embodiment, optionally, the matching module is specifically configured to:

In this embodiment, optionally, the apparatus further includes:

a scoring module configured to, after matching is completed, calculate, for the obtained matching pairs, the TF-IDF (term frequency-inverse document frequency) scores of the ISI in the bag of words, take the N matching pairs with the highest TF-IDF scores, obtain the number of inliers through RANSAC (Random Sample Consensus) verification, and score the N matching pairs according to the number of inliers.

The apparatus provided in this embodiment may perform the method provided in any of the above method embodiments, and details of the process are described in the method embodiments and are not described herein again.

According to the device, a current image is acquired in real time and a bag of words is established for it; the spatial structure of the current image is analyzed to extract spatial information, which is converted into an invariant coordinate system to obtain the ISI; and the ISI is used as a query condition for a nearest neighbor search within the bag of words to complete matching. Closed-loop detection based on ISI is thereby realized: useful spatial information is extracted before the 3D map is compressed into a compact BoW representation, the keypoint positions of the visual words, once converted into the invariant coordinate system, serve as the ISI, and the ISI participates as additional data in BoW retrieval during closed-loop detection, which improves detection accuracy.

Embodiments also provide a computing device, referring to fig. 4, comprising a memory 1120, a processor 1110 and a computer program stored in said memory 1120 and executable by said processor 1110, the computer program being stored in a space 1130 for program code in the memory 1120, the computer program, when executed by the processor 1110, implementing the method steps 1131 for performing any of the methods according to the invention.

The embodiment of the application also provides a computer readable storage medium. Referring to fig. 5, the computer readable storage medium comprises a storage unit for program code provided with a program 1131' for performing the steps of the method according to the invention, which program is executed by a processor.

The embodiment of the application also provides a computer program product containing instructions. Which, when run on a computer, causes the computer to carry out the steps of the method according to the invention.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed by a computer, the procedures or functions described in the embodiments of the application are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.

The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A closed loop detection method based on SLAM space invariant information comprises the following steps:

analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI (Invariant Spatial Information);

2. The method of claim 1, wherein creating a bag of words for the current image comprises:

3. The method of claim 1, wherein analyzing the spatial structure of the current image to extract spatial information, and converting the spatial information into an invariant coordinate system to obtain ISI spatial invariant information comprises:

4. The method of claim 1, wherein performing a nearest neighbor search within the bag of words using the ISI as a query condition to complete matching comprises:

5. The method according to any one of claims 1-4, further comprising:

6. A closed loop detection device based on SLAM space invariant information comprises:

7. The apparatus of claim 6, wherein the processing module is specifically configured to:

8. The apparatus of claim 6, wherein the conversion module is specifically configured to:

9. The apparatus of claim 6, wherein the matching module is specifically configured to:

10. The apparatus according to any one of claims 6-9, further comprising: