CN116701700A - Method executed by electronic equipment, electronic equipment and storage medium - Google Patents

Method executed by electronic equipment, electronic equipment and storage medium

Info

Publication number
CN116701700A
CN116701700A CN202210178991.7A
Authority
CN
China
Prior art keywords
image
feature matching
global map
query image
relative pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210178991.7A
Other languages
Chinese (zh)
Inventor
彭雄峰
刘志花
王强
金允泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN202210178991.7A priority Critical patent/CN116701700A/en
Priority to KR1020220072356A priority patent/KR20230127830A/en
Priority to US18/106,184 priority patent/US20230281867A1/en
Publication of CN116701700A publication Critical patent/CN116701700A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The embodiments of the present application provide a method performed by an electronic device, an electronic device, and a computer-readable storage medium, relating to the technical field of simultaneous localization and mapping. The method comprises the following steps: acquiring a search image for a query image; acquiring the spatial features of the query image and the spatial features of the search image, respectively; and estimating the relative pose between the query image and the search image based on the spatial features. The method performed by the electronic device can determine the relative pose in an artificial-intelligence manner and optimize the global map more accurately.

Description

Method executed by electronic equipment, electronic equipment and storage medium
Technical Field
The present application relates to the field of simultaneous localization and mapping (SLAM), and more particularly, to a method performed by an electronic device, an electronic device, and a computer-readable storage medium.
Background
SLAM refers to the technology in which a three-dimensional map describing the space in which a device is located is constructed in real time using the camera, laser radar, and other sensors on the device, while the pose (position and attitude) of the device is determined. Due to errors in camera calibration and limited feature-matching accuracy, visual SLAM produces unavoidable accumulated errors during mapping and positioning. To solve this problem, a loop closure (LC) module is added to the SLAM system; it is responsible for identifying the co-visibility relationship between the current frame and earlier key frames and then optimizing the global map to reduce the accumulated error, thereby achieving drift-free positioning.
In the prior art, visual constraints are generally established through methods such as feature matching, and the relative pose between a query image and a search image is then estimated in order to optimize the global map. This approach cannot cope with large viewpoint changes and requires a long time to optimize the global map, so the current SLAM loop closure module needs to be improved.
Disclosure of Invention
The application provides a method executed by electronic equipment, the electronic equipment and a computer readable storage medium, wherein the technical scheme is as follows:
in a first aspect, there is provided a method performed by an electronic device, the method comprising:
acquiring a search image of the query image;
respectively acquiring the spatial characteristics of the query image and the spatial characteristics of the search image;
the relative pose between the query image and the retrieved image is estimated based on the spatial features.
In a second aspect, there is provided an electronic device comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform operations corresponding to the method performed by the electronic device according to the first aspect.
In a third aspect, a computer readable storage medium is provided, the storage medium storing at least one instruction, at least one program, code set, or instruction set, the at least one instruction, at least one program, code set, or instruction set being loaded and executed by a processor to implement a method performed by an electronic device as described in the first aspect.
The technical solution provided by the present application has the following beneficial effects:
compared with the prior art, the method, the electronic device, and the computer-readable storage medium estimate the relative pose between the query image and the search image from the spatial features of the query image and the search image; the spatial features carry hierarchical and spatial information with a larger receptive field, so the global map can be optimized more accurately.
Further, the three-dimensional point set is estimated by densely and uniformly extracting key points and ORB descriptors from the image and then completing stereo matching and triangulation using the epipolar constraint, so that the three-dimensional point set is distributed in space more uniformly and densely than the global map, and the relative pose used to optimize the global map is determined more accurately.
Furthermore, by combining incremental bundle adjustment and full bundle adjustment, the global map is optimized effectively with shorter running time and higher accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a flowchart of a method performed by an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating an example of a scheme of clustering three-dimensional point sets according to the present application;
FIG. 3 is a schematic diagram of a scheme for generating a first feature matching pair and a second feature matching pair in an example of the application;
FIG. 4 is a schematic diagram of an aspect of generating a third feature matching pair in an example of the application;
FIG. 5 is a schematic diagram of a scheme for generating an optimized global map in an example of the application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an aspect of the present application implemented by an electronic device;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
A three-dimensional map of the space in which the device is located is constructed in real time using the camera, inertial measurement unit, and other sensors on the device, while the position and attitude of the device in the map are determined in real time; this is called SLAM. Because cameras and inertial measurement units are cheaper than laser radar sensors and are already standard components of mobile phones, smart glasses, indoor robots, and similar devices, they can be used in a wide range of scenarios; accordingly, the main research focus of existing SLAM technology is real-time map construction and device pose acquisition using a camera and an inertial measurement unit as the sensors. Compared with a monocular camera, a three-dimensional map built with a binocular camera has a true physical scale, so in practical applications the on-device visual sensor is often a binocular camera.
Existing SLAM systems mainly acquire the device pose (the spatial three-dimensional position and orientation of the device) and three-dimensional environment information by tracking and matching point features in images according to multi-view geometry. Specifically, point features of temporally related video images are tracked and matched according to the multi-view geometry principle, point features of the binocular image pair are matched according to the epipolar constraint, and the resulting matches establish geometric constraints between the device pose and the three-dimensional map points; the device pose and the three-dimensional map points can then be solved by filtering or bundle adjustment.
Due to errors in camera calibration and feature matching, visual SLAM produces unavoidable accumulated errors during mapping and positioning, which makes drift-free positioning and accurate global mapping challenging. To solve this problem, a loop closure (LC) module is added to the SLAM system; it is responsible for identifying the co-visibility relationship between the current frame and earlier key frames and then optimizing the global map to reduce the accumulated error, thereby achieving drift-free positioning. LC is therefore a key module of a SLAM system and can significantly improve SLAM performance.
LC is generally divided into three steps. The first step is similar to an image retrieval task and aims to find semantically similar images for the query image; a proper image representation is indispensable, and most methods are based on the bag-of-words (BoW) model. The second step is to establish visual constraints by means of feature matching, such as BoW and ORB (Oriented FAST and Rotated BRIEF) feature matching or projection matching, and then estimate the relative pose between the query image and the search image. The third step is to optimize the global map to achieve drift-free positioning. In recent years, research on LC has progressed along several lines. Some related techniques propose four-degree-of-freedom (4DOF) pose graph optimization to optimize the global consistency of the key frame poses and the current frame pose in the global map; this optimizes the global consistency of key frame poses with low time cost, but it does not maintain a global map, so the optimization accuracy is insufficient. Other related techniques increase the LC recall rate by replacing the temporal consistency check over three key frames with a local consistency check between the query key frame and three co-visible key frames. However, when the camera viewpoint change is large and there is perceptual aliasing in the scene, the relative pose estimated between the query and search key frames has few inliers and LC fails; in addition, the full bundle adjustment (FBA) method is very time-consuming for optimizing the global map. Still other related techniques propose feature re-recognition methods, in which a pose prior helps a spatio-temporally sensitive global sub-map to quickly re-recognize existing features; when the pose prior is unreliable, LC and feature re-recognition are combined to obtain a drift-free camera pose. When the camera drift is large, feature re-recognition does not work, and LC is also prone to failure because of large camera viewpoint changes and perceptual aliasing in the scene. In addition, when the camera drift is large, the incremental bundle adjustment (IBA) method is insufficient for optimizing the global map.
In summary, accurate and stable LC faces several challenges. First, feature matching considers local features in image patches, such as ORB, BRIEF (Binary Robust Independent Elementary Features), SURF (Speeded Up Robust Features), and SIFT (Scale-Invariant Feature Transform), rather than hierarchical and spatial information with a larger receptive field, which makes LC unstable when the camera viewpoint changes greatly and there is perceptual aliasing in the scene. Deep-learning-based feature matching typically focuses on using convolutional neural networks (CNNs) to learn better sparse detectors and local descriptors from data. Some recent work along this direction matches two sets of local features with a neural network by jointly finding correspondences and rejecting unmatched points, which can solve various multi-view geometry problems that require high-quality feature correspondences; however, such deep-learning methods require substantial computing resources. Second, the global map optimization problem cannot be solved robustly by a single optimization method: IBA optimization is insufficient when the camera drift is large, optimizing the global map with the full bundle adjustment (FBA) method is very time-consuming, and pose optimization methods do not maintain an accurate global map.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
In one possible implementation manner provided in the embodiment of the present application, as shown in fig. 1, a method performed by an electronic device is provided, and may include the following steps:
step S101, a search image of the query image is acquired.
Wherein the query image is an image (e.g., a current frame scene image) acquired by the electronic device during the positioning and mapping process. But may also be images received from other devices.
In some possible embodiments, the query image may be acquired in real time, may be acquired at periodic intervals, may be acquired through event triggering, and the process of acquiring the query image is not limited.
In some possible embodiments, during localization and map construction, the electronic device builds an image dataset from the key frames; it acquires the image dataset corresponding to the query image, the image dataset comprising a plurality of candidate images, and searches the candidate images for images semantically similar to the query image to obtain the search image.
The number of the search images can be one or more, and the number of the search images is not limited in the application.
For example, a search image may be searched for from candidate images based on the bag of words model.
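As a rough illustration of this retrieval step, the sketch below scores candidate images against the query image using precomputed bag-of-words histograms and cosine similarity; the function names, the histogram representation, and the similarity threshold are assumptions for illustration and are not taken from the application.

```python
import numpy as np

def retrieve_similar_images(query_hist, candidate_hists, top_k=3, min_score=0.3):
    """Return indices of candidate images whose bag-of-words histograms are
    most similar to the query histogram (cosine similarity).

    query_hist:      (V,) visual-word histogram of the query image
    candidate_hists: (N, V) visual-word histograms of the candidate images
    """
    q = query_hist / (np.linalg.norm(query_hist) + 1e-12)
    C = candidate_hists / (np.linalg.norm(candidate_hists, axis=1, keepdims=True) + 1e-12)
    scores = C @ q                        # cosine similarity per candidate
    order = np.argsort(-scores)[:top_k]   # best candidates first
    return [int(i) for i in order if scores[i] >= min_score]

# Usage idea: the histograms could come from quantizing ORB descriptors against
# a visual vocabulary, e.g. query_hist = np.bincount(word_ids, minlength=V).
```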
Step S102, spatial features of the query image and spatial features of the search image are acquired respectively.
Wherein the spatial feature may comprise a three-dimensional set of points.
In some possible embodiments, step S102 of respectively acquiring the spatial features of the query image and of the search image may include the following:
for either the query image or the search image, acquiring the spatial features includes:
(1) extracting image feature points, where the image feature points comprise image key points and feature descriptors;
(2) estimating a three-dimensional point set by performing stereo matching on the image feature points.
The feature descriptors may be, for example, ORB descriptors.
In a specific implementation, epipolar constraints may be used for stereo matching and triangulation to estimate a three-dimensional set of points for the query image and a three-dimensional set of points for the search image, respectively.
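A minimal sketch of this step is given below, assuming a rectified stereo pair with known focal length and baseline: matched keypoints on the same scanline (the epipolar constraint for rectified images) are triangulated into camera-frame 3D points from their disparity. All function and parameter names are illustrative, not taken from the application.

```python
import numpy as np

def triangulate_rectified(kps_left, kps_right, fx, fy, cx, cy, baseline):
    """Triangulate matched keypoints of a rectified stereo pair.

    kps_left, kps_right: (N, 2) pixel coordinates of matched keypoints; for a
    rectified pair the epipolar constraint means matches lie on the same row.
    Returns an (N, 3) array of 3D points in the left-camera frame.
    """
    disparity = kps_left[:, 0] - kps_right[:, 0]   # u_left - u_right
    disparity = np.maximum(disparity, 1e-6)        # guard against zero/negative disparity
    z = fx * baseline / disparity                  # depth from disparity
    x = (kps_left[:, 0] - cx) * z / fx
    y = (kps_left[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```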
Step S103, estimating the relative pose between the query image and the retrieval image based on the spatial features.
In some possible embodiments, the spatial features of the query image and the spatial features of the search image may be matched at least once to obtain a feature matching result, and the relative pose may be determined according to the feature matching result.
In the implementation process, the three-dimensional point set of the query image and the three-dimensional point set of the search image may be matched at multiple levels, from coarse to fine, and the relative pose is then determined according to the final matching result; the determination of the relative pose is elaborated below.
In any of the above embodiments, the relative pose between the query image and the search image is estimated from the spatial features of the query image and the search image; the spatial features carry hierarchical and spatial information with a larger receptive field, so the global map can be optimized more accurately.
In some possible implementations, key points and ORB descriptors are extracted densely and uniformly from the image, and stereo matching and triangulation are then completed using the epipolar constraint to estimate a three-dimensional point set; the three-dimensional point set is distributed in space more uniformly and densely than the global map, so the relative pose used to optimize the global map is determined more accurately.
The specific process of determining the relative pose will be further described in connection with the embodiments below.
In some possible implementations, the feature matching result includes a first feature matching pair;
Matching the spatial features of the query image with the spatial features of the search image at least once to obtain a feature matching result may include:
and clustering the three-dimensional point sets of the query image and the search image respectively to generate a first feature matching pair between the clustering result of the query image and the clustering result of the search image.
In some possible implementations, the points of the three-dimensional point set are aggregated into cubes according to their spatial distribution. The descriptor D_C of each cluster center is obtained from the descriptors of all three-dimensional points in the cube through a voting function V(·), which takes the spatial information of a larger receptive field into account:

D_C = V({D_X | X ∈ C})

where D_C is the cluster center descriptor of cube C, D_X is the descriptor of a three-dimensional point X in the cube, and V(·) is the voting function.
In some possible embodiments, clustering the three-dimensional point sets of the query image and the search image respectively, generating a first feature matching pair between the clustering result of the query image and the clustering result of the search image includes:
determining at least one first cube formed by gathering three-dimensional point sets of the query image;
determining at least one second cube formed by gathering three-dimensional point sets of the search image;
determining a first clustering center of each first cube, and determining a second clustering center of each second cube;
and respectively determining second clustering centers matched with the first clustering centers, and forming the first feature matching pairs based on the matched first and second clustering centers (a sketch of the clustering and voting is given below).
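The sketch below illustrates one way such cube clustering and voting could look: 3D points are grouped into axis-aligned cubes of a fixed side length, and each cube's center descriptor is obtained by bitwise majority voting over the binary (ORB-style) descriptors of the points it contains. The cube size, the 0/1 unpacked descriptor layout, and all names are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def cluster_points_into_cubes(points, descriptors, cube_size=0.5):
    """Group 3D points into cubes and vote a center descriptor per cube.

    points:      (N, 3) 3D points of one image
    descriptors: (N, D) binary descriptors (0/1 per bit), e.g. unpacked ORB
    Returns {cube_index (3-tuple): (center_descriptor (D,), member point ids)}.
    """
    cubes = defaultdict(list)
    for idx, p in enumerate(points):
        cube_idx = tuple(np.floor(p / cube_size).astype(int))   # spatial bin
        cubes[cube_idx].append(idx)

    clusters = {}
    for cube_idx, member_ids in cubes.items():
        member_desc = descriptors[member_ids]                   # (M, D)
        # bitwise majority vote: a bit is 1 if at least half of the points vote 1
        center = (member_desc.mean(axis=0) >= 0.5).astype(np.uint8)
        clusters[cube_idx] = (center, member_ids)
    return clusters
```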
In the example shown in fig. 2, the majority of the three-dimensional points in the cube have a 1 in the first dimension of their ORB descriptors, so the first dimension of the cluster center descriptor is 1. After clustering the three-dimensional point sets, a cluster center descriptor is obtained for each cube; denote a cluster center descriptor of the query image by D_i^q and a cluster center descriptor of the search image by D_j^r. Then, coarse matching pairs between the cubes of the query image and of the search image, i.e., the first feature matching pairs, are obtained by nearest neighbor search and mutual verification.
A cube pair (i, j) is accepted as a first feature matching pair when the Hamming distance H(D_i^q, D_j^r) is below a threshold μ, cube j is the nearest neighbor of D_i^q among the cluster centers of the search image, and cube i is in turn the nearest neighbor of D_j^r among the cluster centers of the query image; this mutual verification keeps only the matches found in both search directions.
In fig. 3, the cubes connected by double-headed dashed arrows are the coarse matching pairs between the query image and the search image.
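A sketch of the coarse (cube-level) matching just described is shown next: cluster-center descriptors of the query image and of the search image are matched by nearest-neighbor search under the Hamming distance, and a pair is kept only if the distance is below a threshold and the match is mutual. The threshold value and helper names are assumptions.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two 0/1 descriptor vectors."""
    return int(np.count_nonzero(a != b))

def mutual_nn_matches(desc_q, desc_r, max_dist=64):
    """Mutual nearest-neighbor matching with a Hamming-distance threshold.

    desc_q: (Nq, D) query cluster-center descriptors
    desc_r: (Nr, D) search-image cluster-center descriptors
    Returns a list of index pairs (i, j) forming first feature matching pairs.
    """
    dist = np.array([[hamming(q, r) for r in desc_r] for q in desc_q])
    nn_q2r = dist.argmin(axis=1)   # best search-image center for each query center
    nn_r2q = dist.argmin(axis=0)   # best query center for each search-image center
    matches = []
    for i, j in enumerate(nn_q2r):
        if nn_r2q[j] == i and dist[i, j] < max_dist:   # mutual check + threshold
            matches.append((i, int(j)))
    return matches
```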
In some possible embodiments, the first feature matching pair between the cubes is obtained by performing rough matching on the query image and the search image, and the relative pose between the query image and the search image can be estimated directly according to the first feature matching pair.
In some possible embodiments, after the rough matching is performed on the query image and the search image to obtain a first feature matching pair between cubes, fine matching may also be performed again to obtain a second feature matching pair between three-dimensional point sets.
In the implementation process, the feature matching result may further include a second feature matching pair, and the matching is performed on the spatial feature of the query image and the spatial feature of the search image at least once to obtain the feature matching result, and may further include:
performing nearest neighbor search and mutual verification on the neighborhood three-dimensional points of the first feature matching pair to obtain a second feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image;
determining the relative pose based on the feature matching result may include:
The relative pose is estimated based on the second feature matching pair.
Specifically, for each first feature matching pair (i, j), nearest neighbor search and mutual verification are performed on all three-dimensional points in the neighborhoods N(C_i^q) and N(C_j^r), where N(C_i^q) and N(C_j^r) denote the set of 27 cubes in the spatial neighborhood of the i-th cube of the query image and the set of 27 cubes in the spatial neighborhood of the j-th cube of the search image, respectively. The coarse relative pose ΔT between the query image and the search image is then estimated based on the resulting second feature matching pairs.
As in the coarse matching stage, the nearest neighbor search is performed from the three-dimensional point features of the query image to those of the search image and vice versa, and mutual verification keeps only the point correspondences found in both directions.
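Continuing the previous sketches, the fine-matching step could look like the code below: for every coarsely matched cube pair, the 3D points of the 27-cube spatial neighborhoods around the two cubes are gathered and matched by mutual nearest-neighbor search on their descriptors. It reuses the cube dictionary from the clustering sketch and the mutual_nn_matches helper from the coarse-matching sketch above; the neighborhood enumeration and all names are illustrative assumptions.

```python
import numpy as np
from itertools import product

def neighborhood_point_ids(clusters, cube_idx):
    """Ids of all 3D points inside the 27 cubes around (and including) cube_idx."""
    ids = []
    for offset in product((-1, 0, 1), repeat=3):
        key = tuple(np.add(cube_idx, offset))
        if key in clusters:
            ids.extend(clusters[key][1])        # clusters maps cube -> (center, ids)
    return ids

def fine_matches(coarse_pairs, clusters_q, clusters_r, desc_q, desc_r, max_dist=64):
    """Second feature matching pairs between the two 3D point sets."""
    pairs = []
    for cube_q, cube_r in coarse_pairs:
        ids_q = neighborhood_point_ids(clusters_q, cube_q)
        ids_r = neighborhood_point_ids(clusters_r, cube_r)
        if not ids_q or not ids_r:
            continue
        # mutual_nn_matches is the helper from the coarse-matching sketch above
        local = mutual_nn_matches(desc_q[ids_q], desc_r[ids_r], max_dist)
        pairs.extend((ids_q[i], ids_r[j]) for i, j in local)
    return pairs
```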
In some possible embodiments, after performing rough matching on the query image and the search image to obtain a first feature matching pair between cubes and performing fine matching again to obtain a second feature matching pair between three-dimensional point sets, the rough relative pose between the query image and the search image may be estimated directly based on the second feature matching pair, and the rough relative pose may be set as the relative pose between the query image and the search image.
In some possible embodiments, after rough matching is performed on the query image and the search image to obtain a first feature matching pair between cubes, fine matching is performed again to obtain a second feature matching pair between three-dimensional point sets, pose guidance matching may be performed again to obtain a third feature matching pair.
Specifically, the feature matching result further includes a third feature matching pair;
matching the spatial features of the query image with the spatial features of the search image at least once to obtain a feature matching result, and may further include:
estimating a coarse relative pose between the query image and the retrieved image based on the second feature matching pair;
and projecting the three-dimensional point set of the search image to the coordinate system of the query image through the coarse relative pose, and determining a third feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image.
Determining the relative pose based on the feature matching results may include:
a relative pose is determined based on the third feature matching pair.
In the embodiment shown in fig. 4, the three-dimensional points of the search image are projected to the query image coordinate system using the coarse relative pose ΔT; then, similarly to the fine matching stage, nearest neighbor search and mutual verification are performed according to the point position distance and the ORB descriptor Hamming distance to obtain the third matching pairs between the three-dimensional point set of the query image and the three-dimensional point set of the search image; finally, the initial relative pose between the query image and the search image is estimated based on the third matching pairs. As shown in fig. 4, corresponding three-dimensional points of the query image and the search image that overlap can be regarded as third matching pairs, while three-dimensional points that do not overlap at all are outliers.
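The pose-guided matching described above might be sketched as follows: the search image's 3D points are transformed into the query coordinate system with the coarse relative pose ΔT, and a query/search point pair is kept when the points are mutually nearest in 3D position and close in descriptor (Hamming) distance. The threshold values and names are assumptions.

```python
import numpy as np

def pose_guided_matches(points_q, desc_q, points_r, desc_r, delta_T,
                        max_point_dist=0.2, max_hamming=64):
    """Third feature matching pairs via projection with the coarse relative pose.

    points_*: (N, 3) 3D points, desc_*: (N, D) 0/1 descriptors,
    delta_T:  (4, 4) coarse relative pose taking search-image points
              into the query-image coordinate system.
    """
    R, t = delta_T[:3, :3], delta_T[:3, 3]
    proj_r = points_r @ R.T + t                        # search points in query frame
    matches = []
    for i, (p, d) in enumerate(zip(points_q, desc_q)):
        dists = np.linalg.norm(proj_r - p, axis=1)     # point-position distance
        j = int(dists.argmin())
        if dists[j] > max_point_dist:
            continue
        ham = np.count_nonzero(desc_r[j] != d)         # descriptor distance
        if ham > max_hamming:
            continue
        # mutual check: p must also be the nearest query point to proj_r[j]
        back = int(np.linalg.norm(points_q - proj_r[j], axis=1).argmin())
        if back == i:
            matches.append((i, j))
    return matches
```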
The above embodiments illustrate specific processes for determining feature matching results, and the process for determining relative pose will be further illustrated in conjunction with the drawings and embodiments.
In some possible embodiments, determining the relative pose based on the feature matching result may include:
estimating an initial relative pose between the query image and the search image based on the feature matching result;
determining local points corresponding to key points of the query image in the search image based on the initial relative pose, and forming point matching pairs based on the key points and the corresponding local points;
estimating the relative pose based on the pair of point matches.
In a specific implementation process, a projection search matching method can be used to determine the local map points in the search image that correspond to the key points of the query image, point matching pairs are formed from the key points and the corresponding local points, and a PnP (Perspective-n-Point) algorithm is then used to estimate the relative pose between the query image and the search image.
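For the final relative-pose estimate, an off-the-shelf PnP solver can be used; the sketch below runs OpenCV's RANSAC PnP on the key-point/local-map-point matches. The camera intrinsics, the inlier threshold, and the variable names are placeholders, not the application's own implementation.

```python
import numpy as np
import cv2

def estimate_relative_pose_pnp(map_points_3d, keypoints_2d, K):
    """Estimate the pose from matched 3D local map points and 2D query key points.

    map_points_3d: (N, 3) local map points associated with the search image
    keypoints_2d:  (N, 2) matched key points of the query image (pixels)
    K:             (3, 3) camera intrinsic matrix
    Returns a 4x4 transform from the map-point frame to the query camera frame, or None.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=3.0)
    if not ok or inliers is None or len(inliers) < 10:
        return None
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T
```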
The above embodiments illustrate specific processes of the relative pose, and after the relative pose is acquired, an optimized global map may be acquired according to the relative pose.
In some possible embodiments, the method performed by the electronic device may further include:
and acquiring an optimized global map based on the relative pose.
In some possible embodiments, the optimized global map may be obtained by combining incremental bundle adjustment (IBA) and full bundle adjustment (FBA), with the choice between incremental and full bundle adjustment made according to the relative pose, so as to improve the optimization accuracy.
In some possible embodiments, obtaining an optimized global map based on the relative pose may include:
and optimizing the current global map based on the relative pose to obtain the optimized global map.
In some possible embodiments, the global map is optimized continuously during localization and map construction; the global map obtained by the previous optimization, i.e., the current global map, can be optimized again based on the relative pose to obtain the optimized global map.
In some possible embodiments, optimizing the current global map based on the relative pose to obtain the optimized global map may include:
determining pose drift information based on the relative pose;
And determining an optimization strategy based on the pose drift information, and optimizing the current global map through the optimization strategy to obtain the optimized global map.
Wherein the pose drift information includes at least one of a drift angle, a drift distance, and a number of closed loops with consistent drift.
Wherein the optimization strategy may include incremental bundle adjustment and/or full bundle adjustment.
The specific process of determining pose drift information will be elaborated upon by some embodiments below.
In some possible embodiments, when a loop is successfully detected, the pose drift T_drift is calculated from the relative pose ΔT_SLAM estimated by the SLAM method and the relative pose ΔT_loop estimated by the loop closure module, where ΔT_SLAM represents the relative pose between the query image and the search image estimated by the SLAM method, R_drift represents the rotational drift, and t_drift represents the translational drift of T_drift. The drift angle A_drift and the drift distance D_drift can then be calculated from R_drift and t_drift.
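A small sketch of how the drift angle and drift distance could be computed from a 4x4 drift transform is given below. The exact composition of T_drift from ΔT_SLAM and ΔT_loop is not reproduced in the text above, so the discrepancy expression used here is an assumption for illustration.

```python
import numpy as np

def pose_drift(delta_T_slam, delta_T_loop):
    """Drift angle (degrees) and drift distance between two relative-pose estimates.

    delta_T_slam: 4x4 relative pose estimated by the SLAM front end
    delta_T_loop: 4x4 relative pose estimated by the loop-closure module
    Assumption: the drift is taken as the discrepancy inv(delta_T_loop) @ delta_T_slam.
    """
    T_drift = np.linalg.inv(delta_T_loop) @ delta_T_slam
    R_drift, t_drift = T_drift[:3, :3], T_drift[:3, 3]
    cos_angle = np.clip((np.trace(R_drift) - 1.0) / 2.0, -1.0, 1.0)
    A_drift = float(np.degrees(np.arccos(cos_angle)))   # rotational drift angle
    D_drift = float(np.linalg.norm(t_drift))             # translational drift distance
    return A_drift, D_drift
```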
In some possible implementations, to determine the accuracy of the relative pose estimated by the loop closure (LC) module, the closed-loop drift error between each query image with index k ∈ [q_th − 10, q_th) within a time window and the current query image is calculated, where q_th represents the index of the current query image, R_error represents the rotational drift error, and t_error represents the translational drift error.
In some possible embodiments, the error angle A_error and the error distance D_error can be calculated from R_error and t_error. Finally, the number N_TCL of time-consistent loops within the time window is counted; if N_TCL is greater than or equal to a given threshold η, the estimated relative pose ΔT_loop satisfies time consistency and is sufficiently accurate. N_TCL is obtained by counting, over the time window, the loops whose error angle and error distance are below the given thresholds.
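The time-consistency count could be computed as in the short sketch below: over the query images in the time window, a loop is counted as time-consistent when its drift-error angle and distance stay below given thresholds. The window size and threshold values are placeholders.

```python
def count_time_consistent_loops(error_angles, error_dists,
                                max_angle=5.0, max_dist=0.1):
    """N_TCL: number of loops in the time window whose drift errors are small.

    error_angles, error_dists: per-loop drift-error angle/distance for the
    query images with index k in [q_th - 10, q_th).
    """
    return sum(1 for a, d in zip(error_angles, error_dists)
               if a < max_angle and d < max_dist)

# The estimated relative pose is accepted as time-consistent when
# count_time_consistent_loops(...) >= eta for a given threshold eta.
```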
The specific process of determining pose drift information is set forth above, and the process of acquiring an optimized global map based on the pose drift information will be specifically set forth below by some embodiments.
In some possible embodiments, determining an optimization strategy based on the pose drift information and optimizing the current global map through the optimization strategy to obtain the optimized global map may include:
if the pose drift information meets a preset error condition, adjusting the initial global map through incremental bundle adjustment to obtain the optimized global map; or
if the pose drift information does not meet the error condition, adjusting the initial global map through full bundle adjustment to obtain the optimized global map.
Specifically, if the pose drift information meets the preset error condition, the initial global map is adjusted through incremental bundle adjustment based on the point matching pairs, and the optimized global map is obtained;
if the pose drift information does not meet the preset error condition, the initial global map is adjusted through full bundle adjustment based on the relative pose and the point matching pairs, and the optimized global map is obtained.
In some possible embodiments, the following optimization strategy is executed according to the drift angle A_drift, the drift distance D_drift, and the number N_TCL of time-consistent loops, where IBA denotes incremental bundle adjustment and FBA denotes full (global) bundle adjustment:
if the camera drift is small (A_drift and D_drift are less than given thresholds β and τ, respectively) or the estimated relative pose ΔT_loop has not yet passed the time-consistency check (N_TCL is less than a given threshold η), only the point matching pair constraints are added, and the poses and map points of the relevant key frames are then optimized by incremental bundle adjustment. Otherwise, the accumulated error of the current SLAM system is large and the estimated relative pose ΔT_loop satisfies time consistency and is sufficiently accurate; in that case, both the estimated relative pose ΔT_loop and the point matching pair constraints are added, and all key frame poses and all map points are optimized by global bundle adjustment.
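The optimization strategy above amounts to a simple branch, sketched below; beta, tau, and eta stand for the thresholds mentioned in the text, and the optimizer calls on the map object are placeholders for incremental and full bundle adjustment routines that are not specified here.

```python
def hybrid_bundle_adjustment(A_drift, D_drift, N_TCL, point_matches, delta_T_loop,
                             slam_map, beta=5.0, tau=0.1, eta=3):
    """Choose IBA or FBA according to the drift and time-consistency check.

    slam_map is assumed to expose add_point_match_constraints, add_relative_pose_constraint,
    run_incremental_ba, and run_full_ba; these are illustrative placeholders, not a real API.
    """
    small_drift = A_drift < beta and D_drift < tau
    time_consistent = N_TCL >= eta

    if small_drift or not time_consistent:
        # add only the point matching pair constraints, optimize the relevant key frames
        slam_map.add_point_match_constraints(point_matches)
        slam_map.run_incremental_ba()
    else:
        # large accumulated error and a trustworthy loop pose: global optimization
        slam_map.add_point_match_constraints(point_matches)
        slam_map.add_relative_pose_constraint(delta_T_loop)
        slam_map.run_full_ba()
```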
In some possible embodiments, adjusting the initial global map through full bundle adjustment to obtain the optimized global map may include:
optimizing the multi-degree-of-freedom poses of the key frames of the initial global map based on the relative pose to obtain a first global map;
and optimizing the key frame poses and map points of the first global map through global bundle adjustment to obtain the optimized global map.
As shown in fig. 5, the 6-degree-of-freedom poses of all key frames may be optimized first, and then all key frame poses and all map points may be optimized by full bundle adjustment (FBA).
In some possible embodiments, a method performed by an electronic device is provided, which may include:
acquiring a search image of the query image;
determining a relative pose between the query image and the retrieved image;
determining pose drift information based on the relative pose;
and determining an optimization strategy based on the pose drift information, and optimizing the current global map through the optimization strategy to obtain an optimized global map.
In some possible implementations, determining the relative pose of the query image and the retrieved image may include:
and establishing visual constraints between the query image and the retrieval image through feature matching, and estimating the relative pose between the query image and the retrieval image.
In some possible implementations, the feature matches may specifically be a BOW and ORB feature match or a projection match, or the like.
In some possible implementations, determining the relative pose of the query image and the retrieved image may include:
respectively acquiring the spatial characteristics of the query image and the spatial characteristics of the search image;
estimating a relative pose between the query image and the search image based on the spatial features.
In some possible embodiments, the spatial features of the query image and the spatial features between the search images may be matched at least once to obtain feature matching results, and the relative pose may be determined according to the feature matching results.
In the specific implementation process, the three-dimensional point set of the query image and the three-dimensional point set of the search image can be matched at multiple levels, from coarse to fine, and the relative pose is then determined according to the final matching result; the specific method for determining the relative pose is described above and is not repeated here.
In some possible embodiments, if the pose drift information meets a preset error condition, the initial global map is adjusted through incremental bundle adjustment to obtain the optimized global map;
and if the pose drift information does not meet the error condition, the initial global map is adjusted through full bundle adjustment to obtain the optimized global map.
The scheme of the present application performed by the electronic device will be described below with reference to specific examples.
In one example, as shown in fig. 6, the electronic device of the present application may include:
an image retrieval module: searching search images with similar semanteme for the query images in the image data sets corresponding to the key frames;
an initial relative pose estimation module: estimating an initial relative pose between the query image and the retrieved image;
an accurate relative pose estimation module: accurately estimating the relative pose constraint between the query image and the search image and establishing the constraint between the key points of the query image and the corresponding local map points of the search image;
and an optimization module: and further optimizing the global map according to the newly added constraint of accurate estimation.
The method of the present application performed by an electronic device will be further described below in connection with specific examples.
As shown in fig. 7, the method performed by the electronic device of the present application may include:
searching the image dataset, through the bag-of-words model, for search images (i.e., the retrieval images shown in the figure) that are semantically similar to the query image (i.e., the query image shown in the figure);
Generating a three-dimensional point set of the query image and generating a three-dimensional point set of the search image;
clustering a three-dimensional point set of the query image to form at least one first cube; clustering the three-dimensional point set of the search image to form at least one second cube;
determining a first clustering center of each first cube, and determining a second clustering center of each second cube;
respectively determining second clustering centers matched with the first clustering centers, and forming first feature matching pairs, namely rough matching shown in the figure, based on the matched first clustering centers and the second clustering centers;
generating a second feature matching pair between the three-dimensional point set of the query image and the search image based on the first feature matching pair, namely, fine matching shown in the figure;
generating a third feature matching pair between the three-dimensional point set of the query image and the search image through pose guidance matching, and generating an initial relative pose;
determining point matching pairs between key points of the query image and corresponding local points of the search image based on the initial relative pose estimation, and estimating the relative pose;
based on the relative pose and the point matching pairs, determining an optimization strategy for the initial global map; full bundle adjustment or incremental bundle adjustment can be selected to optimize the initial global map obtained by simultaneous localization and mapping.
In the above example, a new loop closure (LC) method is proposed, called the hierarchical and hybrid loop closure (DH-LC) method. The process of generating the three-dimensional point sets, clustering them, and performing coarse matching, fine matching, and pose-guided matching is called hierarchy-based spatial feature matching (HSFM); it estimates the initial relative pose between the query image and the search image. The optimization method combining IBA and FBA is called hybrid bundle adjustment (HBA) and produces the optimized global map.
For each query image, a search image is obtained from the candidate image set by means of the bag-of-words (BoW) model, and HSFM then estimates the relative pose between the query image and the search image in a coarse-to-fine hierarchy. Next, a projection search matching method is used to complete the matching between the key points of the query image and the corresponding local map points of the search image, and the PnP algorithm is used to estimate the accurate relative pose between the query image and the search image. Finally, according to the proposed optimization strategy, HBA can adaptively select the IBA or FBA method to optimize the current global map more effectively.
To improve the inlier rate and the efficiency of feature matching, the invention proposes HSFM. Unlike existing methods based on direct local feature matching or feature-cluster-accelerated matching, key points and ORB descriptors are extracted densely and uniformly from the query image and the search image, the three-dimensional points corresponding to the query image and the search image are estimated by stereo matching and triangulation using the epipolar constraint, the three-dimensional points are then clustered into cubes according to their spatial distribution, each cluster center descriptor is obtained by voting over all three-dimensional point descriptors in the cube so that it covers a larger receptive field, and finally the initial relative pose between the query image key frame and the search image key frame is estimated in a coarse-to-fine manner. After robust pose estimation and point matching, the next step is to optimize the global map effectively. A single optimization method cannot balance accuracy and efficiency, so the invention proposes HBA, which combines IBA and FBA, optimizes the global map effectively, and has shorter running time and higher accuracy.
1) HSFM estimates the initial relative pose between the query image and the search image based on coarse-to-fine hierarchical matching and spatial clustering of the three-dimensional points generated with the epipolar constraint. Compared with the prior art, the proposed method improves the inlier rate and the efficiency of feature matching.
2) HBA combines IBA and FBA, so that the global map is optimized with shorter running time and higher accuracy.
3) The DH-LC method combines HSFM and HBA, which improves the recall rate and efficiency of loop closure, reduces the accumulated error, and further improves the positioning accuracy.
According to the method performed by the electronic device, the relative pose between the query image and the search image is estimated from the spatial features of the query image and the search image; the spatial features carry hierarchical and spatial information with a larger receptive field, so the global map can be optimized more accurately.
Further, the three-dimensional point set is estimated by densely and uniformly extracting key points and ORB descriptors from the image and then completing stereo matching and triangulation using the epipolar constraint, so that the three-dimensional point set is distributed in space more uniformly and densely than the global map, and the relative pose used to optimize the global map is determined more accurately.
Furthermore, by combining incremental bundle adjustment and full bundle adjustment, the optimized global map is obtained with shorter running time and higher accuracy.
The method executed by the electronic device is described in the above embodiment through the angle of the method flow, and the following description is described through the angle of the virtual module, which is specifically shown as follows:
in an example, as shown in fig. 8, an electronic device 80 provided by an embodiment of the present application, the electronic device 80 may include a first obtaining module 801, a second obtaining module 802, an estimating module 803, and an optimizing module 804, where:
a first obtaining module 801, configured to obtain a search image of a query image;
a second obtaining module 802, configured to obtain spatial features of the query image and spatial features of the search image respectively;
an estimating module 803, configured to estimate the relative pose between the query image and the search image based on the spatial features.
In one possible implementation, the spatial features include a three-dimensional set of points;
for any one of the query image and the search image, the second acquisition module 802 is specifically configured to, when acquiring the spatial feature:
extracting image feature points; the image feature points comprise image key points and feature descriptors;
And estimating to obtain a three-dimensional point set by carrying out three-dimensional matching on the image characteristic points.
In one possible implementation, the estimating module 803 is specifically configured to, when estimating the relative pose between the query image and the search image based on the spatial features:
matching the spatial features of the query image with the spatial features of the search image at least once to obtain a feature matching result;
the relative pose is determined based on the feature matching results.
In one possible implementation, the feature matching result includes a first feature matching pair;
the estimating module 803 is specifically configured to, when performing at least one matching on the spatial feature of the query image and the spatial feature of the search image to obtain a feature matching result:
and clustering the three-dimensional point sets of the query image and the search image respectively to generate a first feature matching pair between the clustering result of the query image and the clustering result of the search image.
In one possible implementation, the estimating module 803 is specifically configured to, when clustering three-dimensional point sets of the query image and the search image, generate a first feature matching pair between a clustering result of the query image and a clustering result of the search image, where the first feature matching pair is specifically configured to:
determining at least one first cube formed by gathering three-dimensional point sets of the query image;
Determining at least one second cube formed by gathering three-dimensional point sets of the search image;
determining a first clustering center of each first cube, and determining a second clustering center of each second cube;
and respectively determining second clustering centers matched with the first clustering centers, and forming the first feature matching pair based on the matched first clustering centers and the second clustering centers.
In one possible implementation, the feature matching result further includes a second feature matching pair;
the estimating module 803 is further specifically configured to, when performing at least one matching on the spatial feature of the query image and the spatial feature of the search image to obtain a feature matching result:
and carrying out nearest neighbor search and mutual verification on the neighborhood three-dimensional points of the first feature matching pair to obtain a second feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image.
In one possible implementation, the feature matching result further includes a third feature matching pair;
the estimating module 803 is further specifically configured to, when performing at least one matching on the spatial feature of the query image and the spatial feature of the search image to obtain a feature matching result:
Estimating a coarse relative pose between the query image and the retrieved image based on the second feature matching pair;
and projecting the three-dimensional point set of the search image to the coordinate system of the query image through the coarse relative pose, and determining a third feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image.
In a possible implementation manner, the estimation module 803 is specifically configured to, when determining the relative pose based on the feature matching result:
estimating an initial relative pose between the query image and the search image based on the feature matching result;
determining local points corresponding to key points of the query image in the search image based on the initial relative pose, and forming point matching pairs based on the key points and the corresponding local points;
estimating the relative pose based on the pair of point matches.
In a possible implementation manner, the device further comprises an optimization module, specifically configured to:
and optimizing the current global map based on the relative pose to obtain an optimized global map.
In one possible implementation manner, the optimization module is specifically configured to, when optimizing the current global map based on the relative pose and obtaining the optimized global map:
Determining pose drift information based on the relative poses;
and determining an optimization strategy based on the pose drift information, and optimizing the current global map through the optimization strategy to obtain the optimized global map.
In a possible implementation manner, when determining an optimization strategy based on the pose drift information, the optimization module is specifically configured to, when optimizing the current global map through the optimization strategy to obtain the optimized global map:
if the pose drift information meets a preset error condition, adjusting the initial global map through incremental bundle adjustment to obtain the optimized global map; or
if the pose drift information does not meet the error condition, adjusting the initial global map through full bundle adjustment to obtain the optimized global map.
In one possible implementation manner, when the initial global map is adjusted through full bundle adjustment to obtain the optimized global map, the optimization module is specifically configured to:
optimize the multi-degree-of-freedom poses of the key frames of the initial global map based on the relative pose to obtain a first global map;
and optimize the key frame poses and map points of the first global map through global bundle adjustment to obtain the optimized global map.
According to the electronic device, the relative pose between the query image and the search image is estimated from the spatial features of the query image and the search image; the spatial features carry hierarchical and spatial information with a larger receptive field, so the global map can be optimized more accurately.
Further, the three-dimensional point set is estimated by densely and uniformly extracting key points and ORB descriptors from the image and then completing stereo matching and triangulation using the epipolar constraint, so that the three-dimensional point set is distributed in space more uniformly and densely than the global map, and the relative pose used to optimize the global map is determined more accurately.
Furthermore, by combining incremental bundle adjustment and full bundle adjustment, the optimized global map is obtained with shorter running time and higher accuracy.
The electronic device of the embodiments of the present disclosure may perform the method provided by any of the foregoing method embodiments of the present disclosure, and implementation principles of the method are similar, and actions performed by each module in the device of each embodiment of the present disclosure correspond to steps in the method performed by the electronic device in each embodiment of the present disclosure, and detailed functional descriptions of each module in the device may be specifically referred to descriptions in the corresponding method shown in the foregoing, which are not repeated herein.
The device provided in the embodiments of the present application may implement at least one module of the plurality of modules through an AI (Artificial Intelligence ) model. The functions associated with the AI may be performed by a non-volatile memory, a volatile memory, and a processor.
The processor may include one or more processors. In this case, the one or more processors may be general-purpose processors such as a central processing unit (CPU) or an application processor (AP), graphics-only processors such as a graphics processing unit (GPU) or a vision processing unit (VPU), and/or AI-dedicated processors such as a neural processing unit (NPU).
The one or more processors control the processing of input data according to predefined operating rules or an artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rules or the AI model are provided through training or learning.
Here, providing through learning means deriving a predefined operating rule or an AI model having desired characteristics by applying a learning algorithm to a plurality of pieces of learning data. The learning may be performed in the device itself in which the AI according to the embodiment is executed, and/or may be implemented by a separate server or system.
The AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and the calculation of one layer is performed using the calculation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), bidirectional recurrent deep neural networks (BRDNNs), generative adversarial networks (GANs), and deep Q-networks.
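As a purely illustrative sketch of that layer-by-layer computation, assuming dense layers with a ReLU nonlinearity (neither of which is prescribed by this disclosure):

import numpy as np

def forward(x, layers):
    # Each entry in `layers` is a (weights, bias) pair: the output of one layer is
    # computed from the previous layer's result and the current layer's weights.
    for W, b in layers:
        x = np.maximum(W @ x + b, 0.0)   # linear step followed by a ReLU nonlinearity
    return x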
A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of pieces of learning data to make, allow, or control the target device to make a determination or prediction. Examples of such learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
The device provided by the embodiments of the present application has been described above from the perspective of functional modules; next, the electronic device provided by the embodiments of the present application is described from the perspective of hardware implementation, together with its computing system.
Based on the same principles as the methods shown in the embodiments of the present disclosure, the embodiments of the present disclosure also provide an electronic device, which may include, but is not limited to, a processor and a memory; the memory is configured to store computer operating instructions, and the processor is configured to execute any of the methods described in the above embodiments by invoking the computer operating instructions. Compared with the prior art, the method executed by the electronic device of the present application optimizes the global map more accurately.
In an alternative embodiment, an electronic device is provided. As shown in fig. 9, the electronic device 1000 includes a processor 1001 and a memory 1003, the processor 1001 being coupled to the memory 1003, for example via a bus 1002. Optionally, the electronic device 1000 may also include a transceiver 1004. It should be noted that, in practical applications, the number of transceivers 1004 is not limited to one, and the structure of the electronic device 1000 does not constitute a limitation on the embodiments of the present application.
The processor 1001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 1001 may also be a combination that implements computing functionality, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 1002 may include a path for transferring information between the above components. Bus 1002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
The memory 1003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 1003 is configured to store application program code for executing the solutions of the present application, and its execution is controlled by the processor 1001. The processor 1001 is configured to execute the application program code stored in the memory 1003 to implement the content shown in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals); fixed terminals such as digital TVs and desktop computers; and intelligent robots. The electronic device shown in fig. 9 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the methods of the corresponding embodiments described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to electrical wires, optical fiber cables, RF (radio frequency), or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the second acquisition module may also be described as "a module that acquires a spatial feature".
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims (14)

1. A method performed by an electronic device, comprising:
acquiring a search image of the query image;
respectively acquiring the spatial characteristics of the query image and the spatial characteristics of the search image;
estimating a relative pose between the query image and the search image based on the spatial features.
2. The method of claim 1, wherein the spatial features comprise a three-dimensional point set;
for either of the query image and the search image, acquiring the spatial features comprises:
extracting image feature points, wherein the image feature points comprise image key points and feature descriptors; and
estimating the three-dimensional point set by performing stereo matching on the image feature points.
3. The method of claim 2, wherein the estimating the relative pose between the query image and the search image based on the spatial features comprises:
matching the spatial features of the query image with the spatial features of the search image at least once to obtain a feature matching result; and
determining the relative pose based on the feature matching result.
4. The method according to claim 3, wherein the feature matching result comprises a first feature matching pair;
the matching the spatial features of the query image with the spatial features of the search image at least once to obtain the feature matching result comprises:
clustering the three-dimensional point sets of the query image and the search image respectively, and generating the first feature matching pair between the clustering result of the query image and the clustering result of the search image.
5. The method of claim 4, wherein the clustering the three-dimensional point sets of the query image and the search image respectively, and generating the first feature matching pair between the clustering result of the query image and the clustering result of the search image, comprises:
determining at least one first cube formed by clustering the three-dimensional point set of the query image;
determining at least one second cube formed by clustering the three-dimensional point set of the search image;
determining a first cluster center of each first cube and a second cluster center of each second cube; and
determining, for each first cluster center, a matching second cluster center, and forming the first feature matching pair based on the matched first cluster center and second cluster center.
6. The method of claim 4, wherein the feature matching result further comprises a second feature matching pair;
the matching the spatial features of the query image with the spatial features of the search image at least once to obtain the feature matching result further comprises: performing nearest neighbor search and mutual verification on the neighborhood three-dimensional points of the first feature matching pair to obtain the second feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image; and
the determining the relative pose based on the feature matching result comprises:
determining the relative pose based on the second feature matching pair.
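One possible, purely illustrative reading of the nearest neighbor search with mutual verification in claim 6 is sketched below; the search radius and the center-relative coordinates are assumptions, and the disclosure may instead match in another way:

import numpy as np
from scipy.spatial import cKDTree

def second_feature_matching_pairs(query_points, search_points, first_pairs, radius=1.0):
    # For each first feature matching pair, gather the neighborhood 3-D points around
    # both cluster centers, express them relative to their own center, and keep only
    # mutual nearest neighbors; radius and the center-relative matching are assumptions.
    q_tree, s_tree = cKDTree(query_points), cKDTree(search_points)
    pairs = []
    for q_center, s_center in first_pairs:
        q_idx = q_tree.query_ball_point(q_center, radius)
        s_idx = s_tree.query_ball_point(s_center, radius)
        if not q_idx or not s_idx:
            continue
        q_nbhd, s_nbhd = query_points[q_idx], search_points[s_idx]
        q_local, s_local = q_nbhd - q_center, s_nbhd - s_center   # center-relative coordinates
        _, fwd = cKDTree(s_local).query(q_local)    # query -> search nearest neighbors
        _, bwd = cKDTree(q_local).query(s_local)    # search -> query nearest neighbors
        for i, j in enumerate(fwd):
            if bwd[j] == i:                          # mutual verification
                pairs.append((q_nbhd[i], s_nbhd[j]))
    return pairs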
7. The method of claim 6, wherein the feature matching result further comprises a third feature matching pair;
the matching the spatial features of the query image with the spatial features of the search image at least once to obtain the feature matching result further comprises:
estimating a coarse relative pose between the query image and the search image based on the second feature matching pair; and projecting the three-dimensional point set of the search image to the coordinate system of the query image through the coarse relative pose, and determining the third feature matching pair between the three-dimensional point set of the query image and the three-dimensional point set of the search image; and
the determining the relative pose based on the feature matching result comprises:
determining the relative pose based on the third feature matching pair.
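An illustrative sketch of this projection step follows, under the assumptions that the coarse relative pose is obtained from the second feature matching pair by a Kabsch rigid alignment and that projected matches are accepted within an assumed distance threshold:

import numpy as np
from scipy.spatial import cKDTree

def rigid_transform_from_pairs(pairs):
    # Kabsch estimate of a rigid transform (R, t) mapping search-image points to query-image points.
    q = np.array([p[0] for p in pairs])
    s = np.array([p[1] for p in pairs])
    q_c, s_c = q.mean(axis=0), s.mean(axis=0)
    H = (s - s_c).T @ (q - q_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # reflection guard
    R = Vt.T @ D @ U.T
    return R, q_c - R @ s_c

def third_feature_matching_pairs(query_points, search_points, second_pairs, max_dist=0.2):
    # Project the search-image point set into the query frame with the coarse pose,
    # then pair each projected point with its nearest query point; max_dist is assumed.
    R, t = rigid_transform_from_pairs(second_pairs)      # coarse relative pose
    projected = search_points @ R.T + t
    tree = cKDTree(query_points)
    dists, idx = tree.query(projected)
    keep = dists <= max_dist
    return list(zip(query_points[idx[keep]], search_points[keep]))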
8. The method of any one of claims 3 to 7, wherein the determining the relative pose based on the feature matching result comprises:
estimating an initial relative pose between the query image and the search image based on the feature matching result;
determining, based on the initial relative pose, local points in the search image corresponding to the key points of the query image, and forming point matching pairs based on the key points and the corresponding local points; and
estimating the relative pose based on the point matching pairs.
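A hedged sketch of one way the guided matching and pose refinement of claim 8 could be realized with OpenCV is given below; the pixel search window, the PnP-based refinement, and the parameter values are assumptions rather than the method of this disclosure:

import cv2
import numpy as np

def refine_relative_pose(query_keypoints_uv, search_points3d, K, rvec0, tvec0, window=8.0):
    # Project the search-image 3-D points into the query image with the initial pose,
    # pair each query key point with the nearest projection inside a pixel window
    # (the "local point"), and re-estimate the pose from those 2-D/3-D pairs.
    proj, _ = cv2.projectPoints(search_points3d, rvec0, tvec0, K, None)
    proj = proj.reshape(-1, 2)

    obj_pts, img_pts = [], []
    for uv in query_keypoints_uv:
        d = np.linalg.norm(proj - uv, axis=1)
        j = int(np.argmin(d))
        if d[j] <= window:
            obj_pts.append(search_points3d[j])
            img_pts.append(uv)

    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(obj_pts, dtype=np.float64),
        np.asarray(img_pts, dtype=np.float64),
        K, None, rvec=rvec0, tvec=tvec0, useExtrinsicGuess=True)
    return (rvec, tvec) if ok else (rvec0, tvec0)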
9. The method as recited in claim 1, further comprising:
and optimizing the current global map based on the relative pose to obtain an optimized global map.
10. The method of claim 9, wherein the optimizing the current global map based on the relative pose to obtain an optimized global map comprises:
determining pose drift information based on the relative pose; and
determining an optimization strategy based on the pose drift information, and optimizing the current global map through the optimization strategy to obtain the optimized global map.
11. The method of claim 10, wherein the determining an optimization strategy based on the pose drift information, and optimizing the current global map through the optimization strategy to obtain the optimized global map, comprises:
if the pose drift information meets a preset error condition, adjusting the initial global map through incremental bundle adjustment to obtain the optimized global map; or
if the pose drift information does not meet the error condition, adjusting the initial global map through full bundle adjustment to obtain the optimized global map.
12. The method of claim 11, wherein the adjusting the initial global map through full bundle adjustment to obtain the optimized global map comprises:
optimizing the multi-degree-of-freedom poses of the key frames of the initial global map based on the relative pose to obtain a first global map; and
optimizing the key frame poses and the map points of the first global map through global bundle adjustment to obtain the optimized global map.
13. An electronic device, comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the method performed by an electronic device according to any one of claims 1 to 12.
14. A computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method performed by an electronic device according to any one of claims 1 to 12.