CN112699266A - Visual map positioning method and system based on key frame correlation - Google Patents

Visual map positioning method and system based on key frame correlation Download PDF

Info

Publication number
CN112699266A
CN112699266A (application CN202011606121.2A)
Authority
CN
China
Prior art keywords
key frame
key
image retrieval
correlation
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011606121.2A
Other languages
Chinese (zh)
Inventor
李中源 (Li Zhongyuan)
张小军 (Zhang Xiaojun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shichen Information Technology Shanghai Co ltd
Original Assignee
Shichen Information Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shichen Information Technology Shanghai Co ltd filed Critical Shichen Information Technology Shanghai Co ltd
Priority to CN202011606121.2A
Publication of CN112699266A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A visual map positioning method and system based on key frame correlation performs image retrieval on a visual map according to the global descriptor of a positioning frame, processes the image retrieval result using the key frame correlation, and positions the positioning frame according to the processing result. By setting a threshold on the key frame correlation coefficient, or by weighted screening of the image retrieval results using the key frame correlation, the method and system filter out aggregations in the image retrieval results, ensure a high image retrieval recall rate under a limited total computation budget, and improve the success rate of visual map positioning.

Description

Visual map positioning method and system based on key frame correlation
Technical Field
The invention belongs to the fields of visual maps, visual positioning and image retrieval, and particularly relates to a visual map positioning method and system based on key frame correlation.
Background
In image retrieval, retrieval is generally performed by nearest-neighbour search (K-Nearest Neighbours, KNN) over the global descriptors of images. KNN returns K candidates, and in general, the more candidates returned (the larger K), the higher the recall rate (the recall rate being the probability that the returned key frame results contain the correct candidate). In practical applications, image retrieval itself is not particularly time-consuming, but the subsequent processing of the retrieved candidate frames, such as the sim3 transformation and pose solving, is. In view of real-time requirements, K is therefore generally limited to a relatively small value; but when K is small, relying directly on the image retrieval results for the subsequent processing easily leads to positioning failure.
Specifically, the results returned by KNN are sorted by descriptor distance: the key frames ranked first have the smallest global-descriptor distance (Euclidean/Hamming/cosine distance; a descriptor is generally a one-dimensional vector of fixed length, the same length for descriptors of the same kind) to the positioning frame. Real application scenes often contain many sources of interference, such as similar-looking corners, which can cause the top K results to contain many such interference items. In addition, since the key frames of the same place describe similar content, their descriptors are also similar, so the returned top K may contain several key frame candidates from the same place; these candidates may be correct or wrong, and multiple key frame candidates from the same place do not significantly help the subsequent positioning (if the candidates are correct, one matching key frame in the subsequent processing is enough for the flow to succeed). For example, suppose there are three candidate places A, B and C with similar global descriptors, whose key frames are A1, A2, A3, A4 …, B1, B2, B3 …, and C1, C2, C3 … respectively, and suppose C is the correct candidate. In the first failure case, the top-K result may be A1, B1, A2, B2, C1 …; with K = 4 the correct result is excluded. In the second case, the returned result may be A1, A2, A3, A4, A5, C1: A1–A5 all describe the same candidate place, so it would suffice to judge whether A1 is the correct candidate, yet because A1–A5 are similar they crowd the front of the ranking together. Increasing the value of K avoids both failure cases, but noticeably increases the subsequent processing time and thus hinders real-time application.
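The two failure cases above can be reproduced with a small numerical sketch (not part of the patent; the descriptor dimension, cluster spreads and helper names are illustrative assumptions):

```python
import numpy as np

# Illustrative simulation: key frames of the same place have near-identical
# global descriptors, so plain top-K nearest-neighbour retrieval can fill
# every slot with frames of one wrong place.
rng = np.random.default_rng(0)

def make_place(center, n):
    # n key-frame descriptors clustered tightly around one place descriptor
    return [center + 0.01 * rng.standard_normal(center.shape) for _ in range(n)]

query = np.zeros(8)                    # global descriptor of the positioning frame
place_a = make_place(query + 0.05, 5)  # interference place A: closest descriptors
place_c = make_place(query + 0.10, 3)  # correct place C: slightly farther away

frames = [(f"A{i}", d) for i, d in enumerate(place_a, 1)] \
       + [(f"C{i}", d) for i, d in enumerate(place_c, 1)]
ranked = sorted(frames, key=lambda f: float(np.linalg.norm(f[1] - query)))

top4 = [name for name, _ in ranked[:4]]
print(top4)  # with K = 4, every returned candidate belongs to place A
```

Raising K to six or more would recover a C frame, at the cost of the extra subsequent processing described above; the invention instead keeps K small and filters the same-place aggregation using key frame correlation.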
Therefore, a method and system are needed that improve the positioning success rate while the time for subsequent processing, such as the sim3 transformation and pose solving, remains unchanged, that is, without changing the number of key frames passed to the subsequent processing.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a visual map positioning method and system based on key frame correlation. In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
one aspect of the invention provides a visual map positioning method based on key frame correlation, which carries out image retrieval on the visual map according to a global descriptor of a positioning frame and positions the positioning frame according to a retrieval result.
Preferably, the processing of the image retrieval result by using the key frame correlation filters the image retrieval result by setting a threshold of a key frame correlation coefficient.
Preferably, the processing the result of the image retrieval by using the key frame correlation comprises the following steps:
setting the number of key frames in the processing result; setting the number of the key frames returned by the image retrieval to be larger than the number of the key frames in the set processing result; setting a threshold value of the key frame correlation coefficient;
traversing the key frames returned by the image retrieval, if the correlation coefficient of the current key frame and any key frame in the processing result is larger than the threshold value, skipping the current key frame and inquiring the next key frame; otherwise, adding the current key frame into the processing result; until the number of key frames in the processing result reaches the set number of key frames in the processing result.
Preferably, the processing of the result of the image retrieval performs weighted screening on the result of the image retrieval using the key frame correlation.
Preferably, the processing the result of the image retrieval by using the key frame correlation includes:
setting the number of key frames in the processing result; setting the number of the key frames returned by the image retrieval to be larger than the number of the key frames in the set processing result; setting a threshold value of the key frame correlation coefficient;
dividing the key frames returned by the image retrieval into different places by using the key frame correlation coefficient, sorting the places according to the number of the key frames contained in the places from large to small, and calculating the weight of each key frame in the place;
traversing the key frames in the places according to the sequence, selecting the key frame with the minimum weight from the current places to be added into the processing result, and removing the key frame with the minimum weight from the current places until the number of the key frames in the processing result reaches the number of the key frames in the set processing result.
Preferably, the dividing of the key frames into different places starts from the key frame with the smallest descriptor distance to the positioning frame; the key frames returned by the image retrieval are traversed in ranked order, and if the key frame correlation coefficient between the current key frame and any key frame at an existing place is greater than the key frame correlation coefficient threshold, the current key frame is inserted into that place; otherwise, a new place is created for the current key frame.
Preferably, the method for calculating the weight of each key frame in the location includes:
the weight of the current key frame equals the descriptor distance between the current key frame and the positioning frame divided by the average of the correlation coefficients between the current key frame and the other key frames at the current place.
The invention also provides a visual map positioning system based on key frame correlation, comprising a positioning frame acquisition module, a global descriptor extraction module, an image retrieval module and a positioning module, and further comprising a processing module; the processing module processes the result of the image retrieval using the key frame correlation, and the positioning module positions the positioning frame according to the processing result.
Preferably, the processing module filters the result of the image retrieval by setting a threshold of a key frame correlation coefficient.
Preferably, the processing module performs weighted filtering on the result of the image retrieval by using the key frame correlation.
According to the visual map positioning method and system based on the key frame correlation, the threshold value of the key frame correlation coefficient is set or the correlation of the key frame is utilized to carry out weighted screening on the image retrieval result, the aggregation of the returned result of the image retrieval is filtered, the higher image retrieval recall rate is ensured under the condition that the total calculation power is limited, and the success rate of visual map positioning is improved.
Description of reference numerals: 10: a visual map positioning system based on key frame correlation; 11: a positioning frame acquisition module; 12: a global descriptor extraction module; 13: an image retrieval module; 14: a processing module; 15: a positioning module.
Drawings
The various aspects of the present invention will become more apparent to the reader after reading the detailed description of the invention with reference to the attached drawings, wherein:
FIG. 1 is a general flow diagram of a method for keyframe correlation based visual mapping according to one embodiment of the present invention;
FIG. 2 is a flowchart of processing the results of the image retrieval using keyframe correlations, in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of processing the results of the image retrieval using key frame correlation according to another embodiment of the present invention.
FIG. 4 is a system block diagram of a keyframe correlation based visual mapping system in accordance with an embodiment of the present invention.
Detailed Description
In order to make the disclosure clear and complete, reference is made to the appended drawings and the following detailed description of the invention. However, it should be understood by those skilled in the art that the examples provided below are not intended to limit the scope of the present invention. In addition, the drawings are only for illustrative purposes and are not drawn to scale.
Specific embodiments of various aspects of the present invention are described in further detail below with reference to the accompanying drawings.
One aspect of the invention provides a visual map positioning method based on key frame correlation, which carries out image retrieval on the visual map according to a global descriptor of a positioning frame, processes the result of the image retrieval by using the key frame correlation, and positions the positioning frame according to the processing result. Please refer to fig. 1, which is a general flowchart of a method for positioning a visual map based on keyframe correlation according to an embodiment of the present invention, specifically, the method includes positioning frame acquisition S1, extracting a global descriptor S2, image retrieval S3, processing the result of the image retrieval using the keyframe correlation S4, and positioning S5, wherein:
a picture is taken by a camera, and the positioning frame acquisition step S1 obtains the picture of the positioning frame (an RGB or grayscale image) and the parameters of the camera that took it (focal length, optical centre, etc., mainly used for the subsequent pose solving);
the global descriptor extraction step S2 computes the descriptor of the positioning frame;
the image retrieval step S3 compares the descriptor of the positioning frame with the descriptors of the key frames in the visual map and finds the candidates with the smallest distance to the descriptor of the current positioning frame; the image retrieval S3 generally adopts the KNN (K-Nearest-Neighbour) algorithm to return the K results closest to the positioning frame, and the results returned by KNN are generally sorted by distance from small to large, i.e. the TOP1 key frame has the smallest descriptor distance to the positioning frame and is the most similar to it;
the step S4 of processing the result of the image retrieval using the key frame correlation re-screens the results returned by KNN;
the positioning step S5 solves the pose of the positioning frame from the result of the re-screening and positions the positioning frame.
Specifically, the correlation of the key frame is constructed in the process of building the visual map, and the processing of the result of the image retrieval using the key frame correlation S4 calls the key frame correlation to process the result of the image retrieval S3.
Please refer to fig. 2, which is a flowchart illustrating an embodiment of processing the image retrieval result by utilizing the key frame correlation.
In this embodiment, the processing of the result of the image retrieval using the key frame correlation S4 filters the result of the image retrieval S3 by setting a threshold of a key frame correlation coefficient.
In this embodiment, the processing S4 of the result of the image retrieval using the key frame correlation includes:
setting the number of key frames in the processing result; setting the number of key frames returned by the image retrieval S3 to be greater than the number of key frames in the set processing result; setting a threshold value of the key frame correlation coefficient;
traversing the key frames returned by the image retrieval S3, if the correlation coefficient between the current key frame and any key frame in the processing result is greater than the threshold value, skipping the current key frame and inquiring the next key frame; otherwise, adding the current key frame into the processing result; until the number of key frames in the processing result reaches the set number of key frames in the processing result.
Specifically, the step of processing the result of the image retrieval with the key frame correlation S4 is as follows:
1) setting initial data S41:
setting the number of key frames in a processing result returned after processing the result of the image retrieval S3, specifically, assuming that the requirement of the positioning S5 processing is k frames, the number of key frames in the set processing result is k;
using the image retrieval S3 to return a set of key frames, where the number of key frames returned by the image retrieval S3 is greater than the number of key frames in the set processing result; specifically, since the result of the image retrieval S3 is to be filtered and sorted, returning only k key frames would bring no improvement to the subsequent result; therefore the image retrieval S3 returns more than k key frames, denoted K1 to Kl (l > k);
Presetting an initial value of the processing result to be null, and specifically, assuming that a key frame set of the processing result is Q;
presetting a threshold value of the key frame correlation coefficient, wherein the threshold value of the correlation coefficient is d;
2) traverse the key frames returned by the image retrieval S42:
if the correlation coefficient of the current key frame and any key frame in the processing result is larger than the threshold value, skipping the current key frame and inquiring the next key frame;
otherwise, adding the current key frame into the processing result;
3) determining whether the number of key frames in the processing result reaches the set number of key frames in the processing result S43:
if the number of key frames in the processing result reaches the number of key frames in the set processing result, sending the processing result to the positioning S5;
if the number of key frames in the processing result does not reach the number of key frames in the set processing result, return to the traversal step S42.
The specific algorithm of this embodiment is as follows:
1) let Q be empty and i = 1;
2) screen Ki: if the correlation coefficient between Ki and any key frame in Q is greater than d, skip Ki and query Ki+1; otherwise, add Ki to Q;
3) i = i + 1;
4) if the number of key frames in Q reaches k, or i > l, output Q to the positioning step; otherwise return to 2).
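The algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the function name is assumed, and the corr callback stands in for the key frame correlation coefficients stored with the map.

```python
def filter_by_correlation(candidates, corr, d, k):
    """Keep at most k key frames such that no kept pair has correlation > d.

    candidates: key frames K1..Kl returned by image retrieval, already
                sorted by descriptor distance to the positioning frame.
    corr:       callable (frame_a, frame_b) -> stored correlation coefficient.
    d:          correlation-coefficient threshold.
    k:          number of key frames required by the positioning step.
    """
    q = []                                 # processing result Q, initially empty
    for frame in candidates:               # traverse K1 ... Kl in ranked order
        if any(corr(frame, kept) > d for kept in q):
            continue                       # same place already represented in Q
        q.append(frame)
        if len(q) == k:                    # processing result is full
            break
    return q

# Toy usage: frames sharing a letter belong to the same place; correlation is
# 1.0 within a place and 0.0 across places (hypothetical values).
place = {"A1": "A", "A2": "A", "B1": "B", "C1": "C"}
toy_corr = lambda a, b: 1.0 if place[a] == place[b] else 0.0
print(filter_by_correlation(["A1", "A2", "B1", "C1"], toy_corr, d=0.5, k=3))
# → ['A1', 'B1', 'C1']
```

Note how A2 is skipped: it is highly correlated with A1, which is already in Q, so the slot goes to a frame from a different place.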
In this embodiment, the relevance of the key frames is used to filter out the aggregations of the returned key frames from the image retrieval S3, so as to improve the recall rate of subsequent positioning.
In this embodiment, the key frame correlation is calculated from the number of 3D point clouds shared between two key frames. The key frame correlation coefficients are calculated and stored while the visual three-dimensional map is created. Specifically, the key frame correlation coefficients are represented by a set C, where a single element C1n (C12, C13, … C1n) represents the correlation coefficient between the first key frame and the nth key frame; the number of 3D point clouds contained in each key frame is represented by a set P, where a single element Pn (P1, P2, P3, … Pn) represents the number of 3D point clouds contained in the nth key frame; and S1n denotes the number of 3D points jointly observed by the first key frame and the nth key frame (S1n = Sn1). The correlation coefficient between the first key frame and the nth key frame is then calculated as C1n = S1n / P1, and so on for the other key frames.
Alternatively, in this embodiment, the key frame correlation is calculated from the fraction of overlap between two key frames. The key frame correlation coefficients are calculated and stored while the visual three-dimensional map is created. Specifically, key frame A1 is divided into N grids; if a single grid contains a 3D point jointly observed with key frame A2, the grid is set to 1, otherwise it is set to zero. Assuming the number of grids finally set to 1 is M, the correlation coefficient C12 between key frame A1 and key frame A2 is M/N, and so on. Compared with simply counting shared 3D point clouds, this calculation method better measures the overlap relation between key frames.
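Both correlation-coefficient definitions above can be sketched as follows. This is an illustrative sketch under assumed data structures: a key frame is represented simply as a set of observed 3D point ids, and for the grid variant the grid cell into which each point of frame A falls is taken as given.

```python
def corr_by_shared_points(points_a, points_b):
    # C_ab = S_ab / P_a: jointly observed 3D points over the number of
    # 3D points contained in key frame a (note the asymmetric denominator)
    shared = len(points_a & points_b)
    return shared / len(points_a)

def corr_by_grid_overlap(cell_of_point_a, points_b, n_cells):
    # Key frame A is divided into n_cells grids; a grid is set to 1 if it
    # contains at least one 3D point also observed by key frame B.
    # C_ab = M / N, with M the number of grids set to 1.
    hit = {cell for pt, cell in cell_of_point_a.items() if pt in points_b}
    return len(hit) / n_cells

a = {1, 2, 3, 4}                 # 3D point ids observed by key frame A
b = {3, 4, 5}                    # 3D point ids observed by key frame B
print(corr_by_shared_points(a, b))          # 2 shared / 4 points = 0.5

cells = {1: 0, 2: 0, 3: 1, 4: 2}            # point id -> grid cell within A
print(corr_by_grid_overlap(cells, b, n_cells=4))   # grids {1, 2} hit -> 0.5
```

The grid variant rewards overlap that is spread across the frame rather than concentrated in one region, which is the sense in which it "better measures the overlap relation".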
In this embodiment, the positioning step S5 positions the positioning frame by sim3 solving or by pose solving with local feature points.
Please refer to fig. 3, which is a flowchart illustrating a processing of the image retrieval result by utilizing the key frame correlation according to another embodiment of the present invention.
In this embodiment, the processing of the result of the image retrieval using the key frame correlation S4 performs weighted screening on the result of the image retrieval S3 using the key frame correlation.
In this embodiment, the processing S4 of the result of the image retrieval using the key frame correlation includes:
setting the number of key frames in the processing result; setting the number of key frames returned by the image retrieval S3 to be greater than the number of key frames in the set processing result; setting a threshold value of the key frame correlation coefficient;
dividing the keyframes returned by the image retrieval S3 into different places by using the keyframe correlation coefficient, sorting the places according to the number of the keyframes contained in the places from large to small, and calculating the weight of each keyframe in the place;
traversing the key frames in the places according to the sequence, selecting the key frame with the minimum weight from the current places to be added into the processing result, and removing the key frame with the minimum weight from the current places until the number of the key frames in the processing result reaches the number of the key frames in the set processing result.
Specifically, the step of processing the result of the image retrieval with the key frame correlation S4 is as follows:
1) setting initial data S41:
setting the number of key frames in a processing result returned after processing the result of the image retrieval S3, specifically, assuming that the requirement of the positioning S5 processing is k frames, the number of key frames in the set processing result is k;
using the image retrieval S3 to return a group of key frames, where the number of key frames returned by the image retrieval S3 is greater than the number of key frames in the set processing result; specifically, since the result of the image retrieval S3 is to be filtered and sorted, returning only k key frames would bring no improvement to the subsequent result; therefore the image retrieval S3 returns more than k key frames, denoted K1 to Kl (l > k);
Presetting an initial value of the processing result to be null, and specifically, assuming that a key frame set of the processing result is Q;
presetting a threshold value of the key frame correlation coefficient, wherein the threshold value of the correlation coefficient is d;
2) sorting the key frames returned by the image retrieval using the key frame correlation coefficients S44: the key frames returned by the image retrieval are divided into different places, and the places are sorted by the number of key frames they contain, places with more key frames ranking higher. Specifically, assuming the set of places is P, the key frames are divided into places P1 to Pj (j ≤ l), i.e. the returned l key frames are divided into j candidate places;
3) respectively calculating the weight of each key frame in each place S45, and assuming the weight to be E;
4) traversing the places S46: the key frames at each place are traversed one by one in the sorted order of the places; the key frame with the smallest weight at the current place is selected, added to the processing result, and removed from the current place. Specifically, starting in order from P1, the key frames in P1 are traversed, the key frame with the smallest weight among them is selected and added to the processing result Q, and that key frame is removed from P1; after P1 has been traversed, P2 is traversed in the same way, and so on;
5) judging the number of key frames in the processing result S43:
if the number of the key frames in the processing result reaches the number of the key frames in the set processing result, sending the processing result to the positioning step S5;
if the number of key frames in the processing result does not reach the number of key frames in the setting processing result, returning to the traverse the place S46.
In this embodiment, the dividing of the key frames into different places starts from the key frame with the smallest descriptor distance to the positioning frame; the key frames returned by the image retrieval S3 are traversed in ranked order, and if the key frame correlation coefficient between the current key frame and any key frame at an existing place is greater than the key frame correlation coefficient threshold d, the current key frame is inserted into that place; otherwise, a new place is created for the current key frame.
Specifically, the step S44 of sorting the key frames returned by the image retrieval using the key frame correlation coefficients is as follows:
1) presetting that the current place is empty;
2) sorting the key frames returned by the image retrieval S3 by the descriptor distance between each key frame and the positioning frame;
3) traversing the key frames returned by the image retrieval S3 in ranked order, starting from the key frame with the smallest descriptor distance to the positioning frame:
if the key frame correlation coefficient between the current key frame and any key frame at an existing place is greater than the key frame correlation coefficient threshold d, the current key frame is inserted into that place; otherwise, a new place is created for the current key frame;
4) sorting the places by the number of key frames they contain; if two or more places contain an equal number of key frames, they are sorted by the average descriptor distance between their key frames and the positioning frame, places with a smaller average descriptor distance ranking higher.
In this embodiment, the method for calculating the weight of each key frame in the location includes:
the weight of the current key frame is equal to the descriptor distance between the current key frame and the positioning frame/the average value of the correlation coefficients of the current key frame and other key frames in the current position.
Specifically, the step of calculating the weight S45 of each key frame in each location respectively is as follows:
1) calculating the average value of the correlation coefficients of the current key frame in the current location and other key frames in the current location, wherein if the current location only contains one key frame, the average value of the correlation coefficients of the current key frame and other key frames in the current location is 1;
2) calculating the weight of the current key frame:
the weight of the current key frame equals the descriptor distance between the current key frame and the positioning frame divided by the average correlation coefficient.
The specific algorithm of this embodiment is as follows:
1) let Q be empty and i = 1;
2) divide the l key frames into places using the key frame correlation coefficients, obtaining P1 to Pj (j ≤ l):
(1) let P be empty;
(2) traverse sequentially from K1 to Kl; if the correlation coefficient between the current frame Kc and a key frame at any place in P is greater than the threshold d, add Kc to the corresponding place; otherwise, create a new place for Kc, numbered in sequence. For example, if P is currently empty, create P1 and insert Kc into P1; if P1 already exists but P2 does not, create P2 and insert Kc into P2, and so on;
(3) sort by the number of key frames per place in P, so that P1 contains no fewer key frames than P2, and P2 no fewer than P3, and so on; if the numbers of key frames of the places to be ranked are equal, sort by the average descriptor distance between the key frames at the place and the positioning frame, the smaller the average descriptor distance, the higher the rank;
3) calculate the weights E for P1 to Pj:
(1) take P1 as an example, and assume P1 contains key frames K1 to Kg;
(2) calculate the average Caverage1 of all correlation coefficients between K1 and K2 … Kg;
(3) calculate the weight of K1 as E1 = K1 / Caverage1, where the value K1 here denotes the descriptor distance between that key frame and the positioning frame;
(4) calculate the average of the correlation coefficients between K2 and K1, K3 … Kg, and from it the weight of K2 as E2 = K2 / Caverage2, and so on;
(5) if the current set P1 contains only one key frame, then Caverage1 = 1;
4) select from Pi the key frame Km with the smallest weight E, insert it into Q, and remove Km from Pi;
5) if i equals j, set i = 1; otherwise, i = i + 1;
6) if the number of key frames in Q reaches k, output Q to the positioning step S5; otherwise return to 4).
The idea behind the above algorithm is as follows. Among all candidate locations P, P1 contains the most frames, which means the largest number of key frame descriptors are similar to the descriptor of the positioning frame; P1 therefore has the highest probability of being the correct location and is given priority. Within P1, the smaller the descriptor distance between a key frame and the positioning frame, the more similar the two are; and the larger a key frame's correlation coefficients with the other key frames, the more representative it is of the current candidate location. By the weight calculation formula, such a key frame ends up with a smaller weight E and ranks earlier, so it is preferentially selected into the resulting key frame set Q.
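Steps 1)-6) above can be sketched as follows. All identifiers are illustrative; the correlation function, descriptor distances, threshold d, and target count k are assumed to be supplied by the caller, and the input frames are assumed already sorted by descriptor distance to the positioning frame (K1..Kl):

```python
def screen_keyframes(frames, dist, corr, d, k):
    # step 2): group key frames into candidate locations P1..Pj
    locations = []
    for f in frames:
        for loc in locations:
            if any(corr(f, g) > d for g in loc):
                loc.append(f)
                break
        else:
            locations.append([f])

    # step 2)(3): more key frames first; ties broken by smaller
    # mean descriptor distance to the positioning frame
    locations.sort(key=lambda loc: (-len(loc),
                                    sum(dist[f] for f in loc) / len(loc)))

    # step 3): E = descriptor distance / mean correlation with the
    # other key frames of the same location (mean taken as 1 if alone)
    weight = {}
    for loc in locations:
        for f in loc:
            others = [g for g in loc if g != f]
            avg = (sum(corr(f, g) for g in others) / len(others)
                   if others else 1.0)
            weight[f] = dist[f] / avg

    # steps 4)-6): cycle over the locations, each time moving the
    # minimum-weight frame of the current location into Q
    q, i = [], 0
    while len(q) < k and any(locations):
        loc = locations[i % len(locations)]
        if loc:
            best = min(loc, key=weight.get)
            loc.remove(best)
            q.append(best)
        i += 1
    return q
```

The round-robin over locations mirrors steps 4)-5): one frame is drawn per location per pass, so Q spreads over distinct candidate locations rather than clustering in one.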
In this embodiment, the key frame correlation is calculated from the number of 3D points shared between two key frames. The key frame correlation coefficients are calculated and stored while the visual three-dimensional map is created. Specifically, the coefficients are represented by a set C, where the elements C12, C14, C16 ... C1n are the correlation coefficients between the first key frame and the 2nd, 4th, 6th ... nth key frames. The number of 3D points contained in each key frame is represented by a set P, where the elements P1, P2, P3 ... Pn are the numbers of 3D points contained in the 1st, 2nd, 3rd ... nth key frames. S12, S14, S16 ... S1n denote the numbers of 3D points observed jointly by the first key frame and the 2nd, 4th, 6th ... nth key frames (S1n = Sn1). The correlation coefficient between the first key frame and the nth key frame is then C1n = S1n / P1, and so on.
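A minimal sketch of this computation, assuming each key frame's observed 3D points are available as a set of point ids (function and variable names are mine). Note that C1n = S1n / P1 is not symmetric, since it divides the shared count by the first frame's own point count:

```python
def point_cloud_correlations(observed):
    # observed: dict mapping key frame id -> set of 3D point ids it sees.
    # Returns c[(i, n)] = S_in / P_i, the fraction of frame i's 3D
    # points that are also observed by frame n.
    c = {}
    for i, pts_i in observed.items():
        for n, pts_n in observed.items():
            if n != i and pts_i:
                c[(i, n)] = len(pts_i & pts_n) / len(pts_i)
    return c
```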
In another embodiment, the key frame correlation is calculated from the fraction of overlap between two key frames. The key frame correlation coefficients are likewise calculated and stored while the visual three-dimensional map is created. Specifically, key frame A1 is divided into N grid cells; if a 3D point observed jointly with key frame A2 falls in a cell, that cell is set to 1, otherwise to 0. If M cells end up set to 1, the correlation coefficient between key frames A1 and A2 is C12 = M / N, and so on. Compared with simply counting shared 3D points, this method better measures the degree of overlap between key frames.
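A rough sketch of the grid-based coefficient, under the assumption that the pixel projections in key frame A1 of the 3D points co-observed with A2 are known (identifiers are illustrative, not from the patent):

```python
def grid_overlap_correlation(width, height, n_cols, n_rows, proj_a, shared):
    # C12 = M / N: A1's image is divided into N = n_cols * n_rows cells;
    # a cell counts as 1 if at least one co-observed 3D point projects
    # into it.
    #   proj_a : dict 3D-point id -> (x, y) pixel position in A1
    #   shared : set of 3D-point ids observed by both A1 and A2
    hit = set()
    for pid in shared:
        x, y = proj_a[pid]
        col = min(int(x * n_cols / width), n_cols - 1)
        row = min(int(y * n_rows / height), n_rows - 1)
        hit.add((row, col))
    return len(hit) / (n_cols * n_rows)
```

Because each cell contributes at most once, many shared points crowded in one corner yield a low coefficient, which is exactly why this measure captures overlap better than a raw point count.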
In this embodiment, the positioning step S5 solves the pose of the positioning frame either by Sim3 solving or by pose solving with local feature points.
Please refer to fig. 4, which is a block diagram of a visual map positioning system based on key frame correlation according to an embodiment of the present invention. Another aspect of the present invention provides a visual map positioning system 10 based on key frame correlation, which includes a positioning frame acquisition module 11, a global descriptor extraction module 12, an image retrieval module 13, and a positioning module 15. The system further includes a processing module 14; the processing module 14 processes the result of the image retrieval module 13 using the key frame correlation, and the positioning module 15 positions the positioning frame according to the processing result.
In this embodiment, the processing module 14 filters the result of the image retrieval module 13 by setting a threshold on the key frame correlation coefficient.
In this embodiment, the processing module 14 performs weighted screening on the result of the image retrieval module 13 using the key frame correlation.
According to the visual map positioning method and system based on key frame correlation described above, the image retrieval results are screened either by setting a threshold on the key frame correlation coefficient or by weighting with the key frame correlation. This filters out clustering in the returned retrieval results, ensures a high image retrieval recall rate under a limited total computation budget, and improves the success rate of visual map positioning.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not to be construed as limiting the embodiments of the present invention, and that various other changes and modifications may be made by those skilled in the art based on the above description. All documents mentioned in this application are incorporated by reference into this application as if each were individually incorporated by reference.

Claims (10)

1. A visual map positioning method based on key frame correlation, characterized in that the method processes the image retrieval result using the key frame correlation and positions the positioning frame according to the processing result.
2. The method according to claim 1, wherein the processing of the image retrieval result using the key frame correlation filters the image retrieval result by setting a threshold on the key frame correlation coefficient.
3. The method according to claim 2, wherein processing the result of the image retrieval using the key frame correlation comprises:
setting the number of key frames in the processing result; setting the number of key frames returned by the image retrieval to be larger than the set number of key frames in the processing result; and setting a threshold for the key frame correlation coefficient;
traversing the key frames returned by the image retrieval: if the correlation coefficient between the current key frame and any key frame in the processing result is greater than the threshold, skipping the current key frame and examining the next key frame; otherwise, adding the current key frame to the processing result; until the number of key frames in the processing result reaches the set number.
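The threshold-based screening described in this claim can be sketched as follows (identifiers are illustrative; the correlation function is assumed to be supplied by the caller):

```python
def threshold_filter(frames, corr, threshold, k):
    # frames: key frames returned by image retrieval, already ranked.
    # A frame is skipped if it correlates too strongly with any frame
    # already kept, so the result spreads over distinct locations.
    result = []
    for f in frames:
        if any(corr(f, g) > threshold for g in result):
            continue
        result.append(f)
        if len(result) == k:
            break
    return result
```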
4. The method according to claim 1, wherein the processing of the image retrieval result using the key frame correlation performs weighted screening of the image retrieval result.
5. The method according to claim 4, wherein processing the image retrieval result using the key frame correlation comprises:
setting the number of key frames in the processing result; setting the number of key frames returned by the image retrieval to be larger than the set number of key frames in the processing result; and setting a threshold for the key frame correlation coefficient;
dividing the key frames returned by the image retrieval into different locations using the key frame correlation coefficients, sorting the locations from largest to smallest by the number of key frames they contain, and calculating the weight of each key frame within its location;
traversing the locations one by one in that order: selecting the key frame with the smallest weight from the current location, adding it to the processing result, and removing it from the current location, until the number of key frames in the processing result reaches the set number.
6. The method according to claim 5, wherein the division of the key frames into different locations starts from the key frame with the smallest descriptor distance to the positioning frame and traverses the key frames returned by the image retrieval in that order; if the correlation coefficient between the current key frame and any key frame in an existing location is greater than the key frame correlation coefficient threshold, the current key frame is inserted into that location; otherwise, a new location is created for the current key frame.
7. The method according to claim 5, wherein the weight of each key frame within its location is calculated as:
weight of the current key frame = descriptor distance between the current key frame and the positioning frame / average of the correlation coefficients between the current key frame and the other key frames in the current location.
8. A visual map positioning system based on key frame correlation, comprising a positioning frame acquisition module, a global descriptor extraction module, an image retrieval module, and a positioning module, characterized by further comprising a processing module, wherein the processing module processes the result of the image retrieval using the key frame correlation, and the positioning module positions the positioning frame according to the processing result.
9. The system of claim 8, wherein the processing module filters the results of the image retrieval by setting a threshold for key frame correlation coefficients.
10. The system of claim 8, wherein the processing module performs weighted screening on the result of the image retrieval using the key frame correlation.
CN202011606121.2A 2020-12-30 2020-12-30 Visual map positioning method and system based on key frame correlation Pending CN112699266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011606121.2A CN112699266A (en) 2020-12-30 2020-12-30 Visual map positioning method and system based on key frame correlation

Publications (1)

Publication Number Publication Date
CN112699266A true CN112699266A (en) 2021-04-23

Family

ID=75512318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011606121.2A Pending CN112699266A (en) 2020-12-30 2020-12-30 Visual map positioning method and system based on key frame correlation

Country Status (1)

Country Link
CN (1) CN112699266A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN110727265A (en) * 2018-06-28 2020-01-24 深圳市优必选科技有限公司 Robot repositioning method and device and storage device
CN111046125A (en) * 2019-12-16 2020-04-21 视辰信息科技(上海)有限公司 Visual positioning method, system and computer readable storage medium
US20200240793A1 (en) * 2019-01-28 2020-07-30 Qfeeltech (Beijing) Co., Ltd. Methods, apparatus, and systems for localization and mapping
CN111489393A (en) * 2019-01-28 2020-08-04 速感科技(北京)有限公司 VS L AM method, controller and mobile device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination