CN116342826B - AR map construction method and device


Publication number: CN116342826B
Authority: CN (China)
Prior art keywords: feature, sub, image, target, map
Legal status: Active
Application number: CN202310598206.8A
Other versions: CN116342826A (in Chinese)
Inventors: 张飞, 李�城, 陈柏君
Assignee (current and original): Shanghai Weizhi Zhuoxin Information Technology Co ltd
Application filed by Shanghai Weizhi Zhuoxin Information Technology Co ltd
Priority to CN202310598206.8A
Publication of CN116342826A (application)
Application granted
Publication of CN116342826B (grant)


Classifications

    • G06T17/05: Three-dimensional [3D] modelling; geographic models
    • G06V10/751: Image or video pattern matching; comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an AR map construction method and device. The method comprises the following steps: extracting feature data from a plurality of original scene images, the feature data comprising feature points and the feature attributes of each feature point; performing feature matching processing on the feature points of all original scene images according to the feature data corresponding to each original scene image to obtain a feature matching result; and constructing a target AR map based on the feature matching result and the feature image corresponding to each original scene image, where the feature image corresponding to an original scene image comprises the target feature points in that image that have a matching relationship with other original scene images. By extracting and matching feature data from arbitrary original scene images and constructing the AR map from the matching result and the feature images generated on its basis, the invention reduces the dependence of the AR map construction process on the time sequence of the image data and improves the reusability of the AR map.

Description

AR map construction method and device
Technical Field
The invention relates to the technical field of visual positioning, in particular to a method and a device for constructing an AR map.
Background
AR (Augmented Reality) technology computes the position and angle of a camera in real time and inserts corresponding imagery into the camera image according to that position and angle. Its core technology is visual positioning: a scene image is matched against a set of reference images, and the pose information corresponding to the reference image matched with the scene image (i.e., the pose of the image acquisition device when it acquired that reference image) is determined as the pose information corresponding to the scene image. Obtaining pose information for a query image in this way requires acquiring massive numbers of reference images, so the preparatory workload of visual positioning is high; the large number of reference images also makes the visual positioning process complex and reduces visual positioning efficiency.
In order to reduce the preparatory workload of visual positioning and improve its efficiency, AR-map-based relocalization technology has been developed: an AR map of the current scene is constructed from a plurality of acquired scene images, and the pose information corresponding to a scene image is then re-determined from the AR map. At present, AR maps are mainly constructed using SLAM (Simultaneous Localization and Mapping) technology. In practice, however, SLAM generally builds the AR map from multiple frames of consecutive scene images, so it depends on the time sequence of the image data and cannot construct an AR map from unordered images; moreover, SLAM is used only for real-time positioning, so the reusability of the resulting AR map is low.
Therefore, it is important to reduce the dependence on the time sequence of the image data during AR map construction and to improve the reusability of the AR map.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an AR map construction method and device that reduce the dependence of the AR map construction process on the time sequence of the image data and improve the reusability of the AR map.
In order to solve the technical problem, the first aspect of the present invention discloses a method for constructing an AR map, which includes:
performing feature extraction processing on a plurality of original scene images acquired from a target scene to obtain feature data of each original scene image, wherein the feature data of each original scene image comprises feature points in the original scene image and feature attributes of each feature point in the original scene image;
performing feature matching processing on feature points in all original scene images according to feature data corresponding to each original scene image to obtain feature matching results, wherein the feature matching processing is used for determining target feature point pairs with matching relations between any two original scene images;
and constructing a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, wherein the feature image corresponding to each original scene image comprises the target feature points in that original scene image that have a matching relationship with other original scene images, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene.
As an optional implementation manner, in the first aspect of the present invention, the constructing, based on the feature matching result and the feature image corresponding to each original scene image, a target AR map corresponding to the target scene includes:
based on a preset segmentation algorithm and the characteristic attribute of each target characteristic point in the characteristic image corresponding to each original scene image, carrying out segmentation processing on the characteristic image corresponding to each original scene image to obtain one or more sub-characteristic images corresponding to the original scene image;
classifying all the sub-feature images according to the feature matching result to obtain a plurality of sub-feature image sets, wherein each sub-feature image set comprises at least 2 sub-feature images with matching relations, and the matching relations among the 2 sub-feature images represent that at least one pair of target feature points with the matching relations are contained in the 2 sub-feature images;
constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result;
and merging all the AR sub-maps into a target AR map corresponding to the target scene.
In an optional implementation manner, in a first aspect of the present invention, the constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result includes:
according to the feature matching result, determining the 2 sub-feature images with the highest matching degree in each sub-feature image set as the initialized images of the sub-feature image set;
and constructing an AR sub-map corresponding to each sub-feature image set based on the initialized image of the sub-feature image set.
In a first aspect of the present invention, for each of the sub-feature image sets, the constructing an AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set includes:
determining relative pose information corresponding to 2 initialized images based on first target feature points with matching relations in the 2 initialized images of the sub-feature image set, wherein the relative pose information comprises a first rotation matrix and a translation matrix between the 2 initialized images;
calculating the three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the predetermined camera parameters corresponding to each initialized image, the predetermined weight of each first target feature point in the 2 initialized images, and the coordinates of each first target feature point in the 2 initialized images;
and constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
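As a rough sketch of this initialization step (Python with OpenCV; the pinhole intrinsic matrix K is assumed known, and the predetermined per-point weights described above are replaced by uniform weights for brevity), the relative pose can be recovered from the matched first target feature points and the three-dimensional point coordinates triangulated:

    import cv2
    import numpy as np

    def initialize_submap(pts1, pts2, K):
        # pts1, pts2: Nx2 float arrays of matched first target feature points
        # in the 2 initialized images.
        E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
        # Relative pose: first rotation matrix R and translation t between
        # the 2 initialized images.
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
        # Projection matrices: first image at the origin, second at (R, t).
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        # Triangulate homogeneous 3D points and convert to Euclidean
        # coordinates: the three-dimensional point coordinates corresponding
        # to each first target feature point.
        Xh = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
        return R, t, (Xh[:3] / Xh[3]).T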
As an optional implementation manner, in the first aspect of the present invention, before the merging all the AR sub-maps into the target AR map corresponding to the target scene, the method further includes:
for each sub-feature image set, if the sub-feature image set includes at least 3 sub-feature images, updating an AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set, and triggering and executing the operation of merging all the AR sub-maps into a target AR map corresponding to the target scene;
for each sub-feature image set, updating the AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set includes:
determining the three-dimensional point coordinates corresponding to each second target feature point based on the second rotation matrix corresponding to each sub-feature image in the sub-feature image set, the optical center coordinates corresponding to the sub-feature image, the coordinates of each second target feature point in the corresponding sub-feature image, the three-dimensional point coordinates corresponding to each first target feature point, and a weighted iterative least squares algorithm, wherein the second rotation matrix corresponding to each sub-feature image comprises the rotation matrix between the sub-feature image and the AR sub-map, the optical center coordinates corresponding to each sub-feature image comprise the coordinates, in the AR sub-map, of the optical center corresponding to the sub-feature image, and the second target feature points comprise those target feature points, among all the target feature points of each sub-feature image in the sub-feature image set, that have a matching relationship with any other target feature point in the sub-feature image set;
and updating the AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each second target feature point.
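The weighted iterative least-squares update can be sketched as follows (Python/NumPy). This is an illustrative formulation rather than the patent's exact one: each observing sub-feature image contributes a viewing ray built from its second rotation matrix and optical center coordinates, the point is re-solved in closed form under the current weights, and the weights are re-estimated from the point-to-ray residuals on each iteration (the Huber-style constant delta is an assumption):

    import numpy as np

    def refine_point_wls(centers, rays, x0, n_iter=10, delta=0.01):
        # centers: optical center coordinates C_i of each sub-feature image
        #          in the AR sub-map.
        # rays:    unit viewing directions d_i, e.g. R_i.T @ inv(K) @ [u, v, 1]
        #          normalized, with R_i the second rotation matrix of image i
        #          and (u, v) the second target feature point coordinates.
        # x0:      initial three-dimensional point coordinates.
        X = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            A = np.zeros((3, 3))
            b = np.zeros(3)
            for C, d in zip(centers, rays):
                P = np.eye(3) - np.outer(d, d)       # projector orthogonal to the ray
                r = np.linalg.norm(P @ (X - C))      # point-to-ray distance (residual)
                w = 1.0 if r < delta else delta / r  # Huber-style down-weighting
                A += w * P
                b += w * (P @ C)
            X = np.linalg.solve(A, b)                # closed-form WLS solution
        return X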
As an optional implementation manner, in the first aspect of the present invention, before the merging all the AR sub-maps into the target AR map corresponding to the target scene, the method further includes:
for the AR sub-map corresponding to each sub-feature image set, judging whether the AR sub-map meets a preset constraint condition based on radar detection data of the target scene, wherein the radar detection data is based at least on a radar detection result obtained by a radar detection device detecting the target scene;
for the AR sub-map corresponding to each sub-feature image set, if the AR sub-map does not meet the preset constraint condition, correcting the AR sub-map, and re-executing the operation of judging, based on the radar detection data of the target scene, whether the AR sub-map meets the preset constraint condition;
and if all the AR sub-maps corresponding to the sub-feature image sets meet the preset constraint conditions, triggering and executing the operation of combining all the AR sub-maps into the target AR map corresponding to the target scene.
As an optional implementation manner, in the first aspect of the present invention, before the constructing the target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, the method further includes:
determining one or more feature image sets from feature images corresponding to all the original scene images according to the feature matching result;
judging whether each feature image set meets the condition corresponding to a preset pruning principle;
for each feature image set, if the feature image set does not meet the condition corresponding to the preset pruning principle, pruning at least one feature image in the feature image set to obtain a new feature image set, and re-executing the operation of judging whether the feature image set meets the condition corresponding to the preset pruning principle;
and if each feature image set meets the condition corresponding to the preset pruning principle, triggering and executing the operation of constructing the target AR map corresponding to the target scene based on the feature matching result and the feature images corresponding to each original scene image.
As an optional implementation manner, in the first aspect of the present invention, the preset pruning principle includes a loop consistency principle, each of the feature image sets includes a plurality of feature images arranged randomly or based on a preset rule, and the number of images in each of the feature image sets is greater than or equal to 3;
for each feature image set, the judging whether the feature image set meets the condition corresponding to the preset pruning principle includes:
according to the feature matching result, determining the rotation relation between any 2 adjacent feature images in the feature image set and the rotation relation between the last feature image and the first feature image, to obtain a plurality of third rotation matrices;
calculating a loop value corresponding to the feature image set based on all the third rotation matrices, wherein the loop value is used for representing the matching degree between a feature image in the set and the image obtained after rotating that feature image by all the third rotation matrices in turn;
judging whether the difference between the loop value and a preset loop standard value is larger than a preset difference threshold;
if the difference is larger than the preset difference threshold, determining that the feature image set does not meet the condition corresponding to the preset pruning principle;
and if the difference is not larger than the preset difference threshold, determining that the feature image set meets the condition corresponding to the preset pruning principle.
As an optional implementation manner, in the first aspect of the present invention, the feature points in each of the original scene images include at least a first feature point extracted based on a conventional feature extraction network;
and performing feature matching processing on feature points in all the original scene images according to feature data corresponding to each original scene image to obtain feature matching results, wherein the feature matching processing comprises the following steps:
screening all the original scene images based on radar detection data of the target scene and feature data corresponding to each original scene image to obtain a matching candidate image set;
and carrying out feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain feature matching results.
The second aspect of the present invention discloses an AR map construction apparatus, which includes:
the feature extraction module is used for carrying out feature extraction processing on a plurality of original scene images acquired from a target scene to obtain feature data of each original scene image, wherein the feature data of each original scene image comprises feature points in the original scene image and feature attributes of each feature point in the original scene image;
the feature matching module is used for carrying out feature matching processing on feature points in all the original scene images according to the feature data corresponding to each original scene image to obtain feature matching results, and the feature matching processing is used for determining target feature point pairs with matching relations between any two original scene images;
and the map construction module is used for constructing a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, wherein the feature image corresponding to each original scene image comprises the target feature points in that original scene image that have matching relations with other original scene images, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene.
As an optional implementation manner, in the second aspect of the present invention, the map building module includes:
the clustering segmentation sub-module is used for carrying out segmentation processing on the feature images corresponding to each original scene image based on a preset segmentation algorithm and the feature attribute of each target feature point in the feature images corresponding to each original scene image to obtain one or more sub-feature images corresponding to the original scene image, classifying all the sub-feature images according to the feature matching result to obtain a plurality of sub-feature image sets, wherein each sub-feature image set comprises at least 2 sub-feature images with matching relations, and the matching relations among the 2 sub-feature images represent that at least one pair of target feature points with the matching relations exist in the 2 sub-feature images;
the map construction sub-module is used for constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result;
and the map merging sub-module is used for merging all the AR sub-maps into a target AR map corresponding to the target scene.
In a second aspect of the present invention, as an optional implementation manner, the map construction sub-module constructs an AR sub-map corresponding to each sub-feature image set based on the feature matching result, where the specific manner includes:
according to the feature matching result, determining the 2 sub-feature images with the highest matching degree in each sub-feature image set as the initialized images of the sub-feature image set;
and constructing an AR sub-map corresponding to each sub-feature image set based on the initialized image of the sub-feature image set.
In a second aspect of the present invention, for each sub-feature image set, the specific manner in which the map construction sub-module constructs the AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set includes:
determining relative pose information corresponding to the 2 initialized images based on the first target feature points with matching relations in the 2 initialized images of the sub-feature image set, wherein the relative pose information comprises a first rotation matrix and a translation matrix between the 2 initialized images;
calculating the three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the predetermined camera parameters corresponding to each initialized image, the predetermined weight of each first target feature point in the 2 initialized images, and the coordinates of each first target feature point in the 2 initialized images;
and constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
In a second aspect of the present invention, the map construction sub-module is further configured to, for each of the sub-feature image sets, if the sub-feature image set includes at least 3 sub-feature images, update an AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set before the map merging sub-module merges all the AR sub-maps into a target AR map corresponding to the target scene, and trigger the map merging sub-module to perform the operation of merging all the AR sub-maps into the target AR map corresponding to the target scene;
For each sub-feature image set, the specific way for the map construction sub-module to update the AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set includes:
determining the three-dimensional point coordinates corresponding to each second target feature point based on the second rotation matrix corresponding to each sub-feature image in the sub-feature image set, the optical center coordinates corresponding to the sub-feature image, the coordinates of each second target feature point in the corresponding sub-feature image, the three-dimensional point coordinates corresponding to each first target feature point, and a weighted iterative least squares algorithm, wherein the second rotation matrix corresponding to each sub-feature image comprises the rotation matrix between the sub-feature image and the AR sub-map, the optical center coordinates corresponding to each sub-feature image comprise the coordinates, in the AR sub-map, of the optical center corresponding to the sub-feature image, and the second target feature points comprise those target feature points, among all the target feature points of each sub-feature image in the sub-feature image set, that have a matching relationship with any other target feature point in the sub-feature image set;
and updating the AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each second target feature point.
In a second aspect of the present invention, the map construction sub-module is further configured to, before the map merging sub-module merges all the AR sub-maps into a target AR map corresponding to the target scene, determine, for each AR sub-map corresponding to the set of sub-feature images, whether the AR sub-map meets a preset constraint condition based on radar detection data of the target scene, correct the AR sub-map if the AR sub-map does not meet the preset constraint condition, and re-execute the operation of determining, based on the radar detection data of the target scene, whether the AR sub-map meets the preset constraint condition, where the radar detection data is based at least on a radar detection result obtained by detecting the target scene by a radar detection device; and if all the AR sub-maps corresponding to the sub-feature image sets meet the preset constraint conditions, triggering the map merging sub-module to execute the operation of merging all the AR sub-maps into a target AR map corresponding to the target scene.
As an alternative embodiment, in the second aspect of the present invention, the apparatus further includes a pruning module, where the pruning module includes:
the determining sub-module is used for determining one or more feature image sets from the feature images corresponding to all the original scene images according to the feature matching result before the map building module builds the target AR map corresponding to the target scene based on the feature matching result and the feature images corresponding to each original scene image;
the judging sub-module is used for judging, for each feature image set, whether the feature image set meets the condition corresponding to the preset pruning principle, and, if each feature image set meets the condition corresponding to the preset pruning principle, triggering the map construction module to execute the operation of constructing the target AR map corresponding to the target scene based on the feature matching result and the feature images corresponding to each original scene image;
and the pruning sub-module is used for pruning, for each feature image set that does not meet the condition corresponding to the preset pruning principle, at least one feature image in the feature image set to obtain a new feature image set, and triggering the judging sub-module to re-execute the operation of judging whether the feature image set meets the condition corresponding to the preset pruning principle.
As an optional implementation manner, in the second aspect of the present invention, the preset pruning principle includes a loop consistency principle, each of the feature image sets includes a plurality of feature images arranged randomly or based on a preset rule, and the number of images in each of the feature image sets is greater than or equal to 3;
for each feature image set, the specific mode of judging whether the feature image set meets the condition corresponding to the preset pruning principle by the judging submodule comprises the following steps:
according to the feature matching result, determining the rotation relation between any 2 adjacent feature images in the feature image set and the rotation relation between the last feature image and the first feature image, to obtain a plurality of third rotation matrices;
calculating a loop value corresponding to the feature image set based on all the third rotation matrices, wherein the loop value is used for representing the matching degree between a feature image in the set and the image obtained after rotating that feature image by all the third rotation matrices in turn;
judging whether the difference between the loop value and a preset loop standard value is larger than a preset difference threshold;
if the difference is larger than the preset difference threshold, determining that the feature image set does not meet the condition corresponding to the preset pruning principle;
and if the difference is not larger than the preset difference threshold, determining that the feature image set meets the condition corresponding to the preset pruning principle.
As an optional implementation manner, in the second aspect of the present invention, the feature points in each of the original scene images include at least a first feature point extracted based on a conventional feature extraction network;
the specific manner in which the feature matching module performs feature matching processing on the feature points in all the original scene images according to the feature data corresponding to each original scene image to obtain the feature matching result comprises the following steps:
screening all the original scene images based on radar detection data of the target scene and feature data corresponding to each original scene image to obtain a matching candidate image set;
and carrying out feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain feature matching results.
The third aspect of the present invention discloses another AR map construction apparatus, which includes:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to execute the method for constructing the AR map disclosed in the first aspect of the present invention.
A fourth aspect of the present invention discloses a computer storage medium storing computer instructions for performing the method of constructing an AR map disclosed in the first aspect of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, feature extraction processing is performed on a plurality of original scene images acquired from a target scene to obtain feature data of each original scene image, wherein the feature data of each original scene image comprises the feature points in the original scene image and the feature attributes of each feature point in the original scene image; feature matching processing is performed on the feature points in all original scene images according to the feature data corresponding to each original scene image to obtain a feature matching result, the feature matching processing being used for determining target feature point pairs with matching relations between any two original scene images; and a target AR map corresponding to the target scene is constructed based on the feature matching result and the feature image corresponding to each original scene image, wherein the feature image corresponding to each original scene image comprises the target feature points in that image that have matching relations with other original scene images, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene. Thus, by extracting and matching the feature data of arbitrary original scene images, the invention establishes correspondences between feature points in different original scene images, and constructs the AR map of the scene from the matching result and the feature images generated on its basis, thereby reducing the dependence of the AR map construction process on the time sequence of the image data and improving the reusability of the AR map.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an AR map construction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for constructing an AR map according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an AR map construction apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another AR map construction apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an AR map construction apparatus according to still another embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The terms "first", "second" and the like in the description, the claims and the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, or article that comprises a list of steps or elements is not limited to only those listed, but may optionally include other steps or elements not listed or inherent to such process, method, apparatus, or article.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses a method and a device for constructing an AR map, which establish correspondences between feature points in different original scene images by extracting and matching the feature data of arbitrary original scene images, and construct the AR map of the scene from the matching result and the feature images generated on its basis, thereby reducing the dependence of the AR map construction process on the time sequence of the image data and improving the reusability of the AR map. The following will describe this in detail.
Example 1
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for constructing an AR map according to an embodiment of the present invention. The method for constructing the AR map described in fig. 1 may be applied to a server to construct a target AR map based on an original scene image uploaded by a terminal device, or may be applied to a terminal device to construct a target AR map based on an original scene image acquired by the terminal device, where the terminal device may be an image acquisition device (such as a camera), a smart phone, a wearable device, or the like, and the embodiment of the present invention is not limited. As shown in fig. 1, the construction method of the AR map may include the following operations:
101. and carrying out feature extraction processing on a plurality of original scene images acquired from the target scene to obtain feature data of each original scene image.
The feature data of each original scene image may include the feature points in the original scene image and the feature attributes of each feature point in the original scene image. Optionally, the feature data of an original scene image includes first feature data extracted based on a traditional feature extraction network and/or second feature data extracted based on a learning feature extraction network, where the first feature data includes first feature points and the feature attributes of each first feature point, and the second feature data includes second feature points and the feature attributes of each second feature point. Further optionally, the traditional feature extraction network may be a SIFT (Scale-Invariant Feature Transform)-GPU network, and the learning feature extraction network may be a SuperPoint network.
As an optional implementation manner, performing feature extraction processing on a plurality of original scene images acquired from a target scene to obtain feature data in each original scene image may include:
inputting a plurality of original scene images acquired from a target scene into a traditional feature extraction network, so that the traditional feature extraction network performs feature extraction processing on each original scene image to obtain the first feature data of each original scene image; and/or,
inputting a plurality of original scene images acquired from a target scene into a learning feature extraction network, so that the learning feature extraction network performs feature extraction processing on each original scene image to obtain second feature data of each original scene image.
Thus, extracting traditional features with the traditional feature extraction network improves feature extraction accuracy for general scene images, while extracting learned features with the learning feature extraction network enriches the feature data and improves feature extraction accuracy for difficult scene images, such as those with illumination changes.
In this optional embodiment, optionally, the feature extraction processing is performed on each original scene image by using a conventional feature extraction network to obtain first feature data of each original scene image, which may include:
convolving, by the traditional feature extraction network, each original scene image with multi-layer Gaussian kernels to obtain a Gaussian pyramid image corresponding to each original scene image;
determining extreme points in a local area in a Gaussian pyramid image corresponding to each original scene image by a traditional feature extraction network, and taking the extreme points as first feature points of the original scene image;
assigning, by the traditional feature extraction network, a main direction to each first feature point;
and calculating, by the traditional feature extraction network, based on the main direction of each first feature point, the feature description of the first feature point over the gradient directions as the feature attribute of the first feature point.
In this optional embodiment, optionally, the extremum point may include a maximum value or a minimum value, when determining the extremum point in each gaussian pyramid image, the conventional feature extraction network needs to compare each sampling point in the gaussian pyramid image with 8 adjacent sampling points in the same layer of scale space and 18 adjacent sampling points corresponding to the upper and lower adjacent scale spaces, and if the sampling point is the extremum point in all three scale spaces, the sampling point is the first feature point. Thus, the accuracy and reliability of extreme point positioning can be improved.
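A minimal sketch of this 26-neighbour comparison (Python/NumPy; it assumes the difference-of-Gaussian responses of one octave are stacked into an array dog[scale, y, x] and that the queried sample is not on a border):

    import numpy as np

    def is_extremum(dog, s, y, x):
        # 3x3x3 neighbourhood: 8 neighbours in the same scale layer plus
        # 2 x 9 = 18 samples in the two adjacent scale layers.
        patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
        v = dog[s, y, x]
        # The sampling point is a first feature point candidate only if it is
        # the maximum or the minimum of all samples in the neighbourhood.
        return v == patch.max() or v == patch.min()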
In this optional embodiment, optionally, when the main direction is assigned to each first feature point, the gradient direction angle and gradient magnitude of each pixel in the neighborhood centered on the first feature point with a preset length as radius are calculated; according to the gradient direction angle and gradient magnitude of each pixel in the neighborhood, the sum of the gradient magnitudes of the pixels falling into each preset direction-angle interval is determined, giving the total gradient magnitude corresponding to that interval; and the direction angle corresponding to the interval with the largest total gradient magnitude among all intervals is determined as the main direction of the first feature point. Further alternatively, the preset radius may be 3×1.5σ, where σ is the scale of the Gaussian pyramid image, and 0 to 360° may be divided into 8 direction-angle intervals of 45° each.
In this optional embodiment, optionally, a window of a preset size may be taken centered on the first feature point, a gradient direction histogram over the gradient directions computed on each small patch of the window, and the accumulated value of each gradient direction drawn to obtain the feature description of the first feature point. For example, if there are 8 direction-angle intervals, there are 8 gradient directions; a 16×16 window may be centered on the first feature point, a gradient direction histogram of the 8 gradient directions computed on each 4×4 patch, and the accumulated value of each gradient direction drawn (each cell serving as one dimension of the descriptor), so as to obtain a descriptor of 4×4×8 = 128 dimensions.
Therefore, after the Gaussian pyramid image corresponding to the original scene image is constructed, extreme points are screened from the Gaussian pyramid, so that feature points can be located across fields of view of different sizes, improving the accuracy and reliability of feature point positioning; assigning a main direction to each feature point and determining its feature description further improve the accuracy and reliability of traditional feature extraction from the original scene image.
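In practice, this traditional extraction step can be sketched with OpenCV's CPU SIFT implementation standing in for the SIFT-GPU network named above (the file name is illustrative):

    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # Each keypoint carries its scale-space location and assigned main
    # direction (kp.angle); each descriptor is the 4 x 4 x 8 = 128-dimensional
    # gradient-direction histogram used as the feature attribute.
    keypoints, descriptors = sift.detectAndCompute(img, None)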
In this optional embodiment, optionally, the learning feature extraction network may include an encoding network, a feature point detection network, and a descriptor generating network, where the learning feature extraction network performs feature extraction processing on each original scene image to obtain second feature data of each original scene image, and may include:
encoding each original scene image by an encoding network to obtain an encoding result of each original scene image;
calculating, by the feature point detection network, a probability value corresponding to each pixel point in each original scene image according to the encoding result of the original scene image, and determining the second feature points in the original scene image according to the probability values corresponding to the pixel points;
learning, by the descriptor generating network, the encoding result of each original scene image to obtain a plurality of semi-dense descriptors; computing the complete descriptors from the semi-dense descriptors based on a bicubic interpolation algorithm; and finally normalizing each complete descriptor based on a normalization algorithm to obtain unit-length descriptors as the feature attributes of the second feature points.
In this alternative embodiment, further optionally, there is no ordering constraint between the step performed by the feature point detection network and the step performed by the descriptor generating network; that is, the second feature points and their descriptors may be generated in either order. Optionally, the encoding network may adopt a VGG-like network structure to reduce the dimensionality of the original scene image and the computation of the subsequent networks, and the normalization algorithm may be the L2 normalization algorithm. When training the learning feature extraction network, two loss functions are needed to compute the loss of the feature point detection network and the loss of the descriptor generating network respectively, and the two losses are added to obtain the loss of the whole learning feature extraction network.
Therefore, after the original scene image is encoded, the second feature points and their feature descriptions can be determined simultaneously from the encoding result, which improves the accuracy and reliability of learned feature extraction; and since the extraction order of the second feature points and their feature descriptions is unconstrained, the efficiency of learned feature extraction is improved.
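A skeleton of such a learning feature extraction network is sketched below in PyTorch; the layer widths, the 8x downsampling and the 65-channel detector head (64 positions per 8x8 cell plus a no-keypoint bin) are illustrative assumptions in the style of SuperPoint, not the patent's exact architecture:

    import torch.nn as nn
    import torch.nn.functional as F

    class LearnedExtractor(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            # Encoding network: VGG-like, downsamples the image by 8 to
            # reduce the computation of the subsequent networks.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.detector = nn.Conv2d(128, 65, 1)     # feature point detection head
            self.descriptor = nn.Conv2d(128, dim, 1)  # descriptor generating head

        def forward(self, x):
            enc = self.encoder(x)
            # Probability value per pixel: softmax over each cell, drop the
            # no-keypoint bin, then re-tile the 8x8 cells to full resolution.
            prob = F.softmax(self.detector(enc), dim=1)[:, :-1]
            prob = F.pixel_shuffle(prob, 8)
            # Semi-dense descriptors -> complete descriptors via bicubic
            # interpolation, then L2 normalization to unit length.
            desc = self.descriptor(enc)
            desc = F.interpolate(desc, scale_factor=8, mode="bicubic",
                                 align_corners=False)
            desc = F.normalize(desc, p=2, dim=1)
            return prob, desc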
102. And carrying out feature matching processing on feature points in all original scene images according to the feature data corresponding to each original scene image to obtain feature matching results.
The feature matching process is used for determining target feature point pairs with matching relations between any two original scene images.
As an optional implementation manner, if the feature points in the original scene images include the first feature points, performing feature matching processing on the feature points in all the original scene images according to the feature data corresponding to each original scene image to obtain feature matching results, may include:
screening all original scene images based on radar detection data of a target scene and feature data corresponding to each original scene image to obtain a matching candidate image set;
and carrying out feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain feature matching results.
Preferably, the radar detection data is lidar detection data.
Therefore, implementing this alternative embodiment screens the original scene images using the radar detection data of the target scene as prior position information, so that global matching over all original scene images is unnecessary, which reduces the computational workload of image matching.
In this optional embodiment, optionally, performing feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attribute corresponding to the first feature point of each original scene image in the matching candidate image set to obtain a feature matching result may include:
for each first feature point of each original scene image in the matching candidate set, calculating, according to the feature attribute corresponding to the first feature point, the cosine similarity between the first feature point and each first feature point of the other original scene images in the matching candidate set, to obtain the nearest-neighbor feature point and the second-nearest-neighbor feature point corresponding to the first feature point, where the nearest-neighbor feature point is the first feature point with the highest cosine similarity to the first feature point, and the second-nearest-neighbor feature point is the first feature point with the second-highest cosine similarity to the first feature point;
and for each first feature point of each original scene image in the matching candidate set, calculating the ratio between the cosine similarity of the first feature point with its second-nearest-neighbor feature point and the cosine similarity of the first feature point with its nearest-neighbor feature point, and if the ratio is smaller than a preset ratio threshold, determining the nearest-neighbor feature point corresponding to the first feature point as a first feature point having a matching relationship with the first feature point.
Therefore, the matching degree between different feature points can be calculated by utilizing the cosine similarity, and the accuracy and the reliability of feature matching are improved.
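A compact sketch of this nearest/second-nearest cosine-similarity test (Python/NumPy; desc_a and desc_b are the descriptor matrices of two images, and the 0.8 ratio threshold is an illustrative value):

    import numpy as np

    def ratio_test_matches(desc_a, desc_b, ratio=0.8):
        # Unit-normalize rows so that a dot product equals cosine similarity.
        a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
        b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
        sim = a @ b.T                           # pairwise cosine similarities
        order = np.argsort(-sim, axis=1)        # most similar first
        nearest, second = order[:, 0], order[:, 1]
        idx = np.arange(len(a))
        # Accept a match only when the second-highest similarity is clearly
        # below the highest one (ratio of next-highest to highest < threshold).
        keep = sim[idx, second] / sim[idx, nearest] < ratio
        return list(zip(idx[keep], nearest[keep]))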
In this optional embodiment, optionally, after the feature matching processing is performed on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain the feature matching result, the method may further include: filtering the feature matching result based on a preset screening algorithm, which may be the LO-RANSAC (Locally Optimized Random Sample Consensus) algorithm. In this way, mismatched feature point pairs can be removed, further improving the accuracy of feature matching.
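For this filtering step, OpenCV's USAC estimators, which incorporate local optimization, can serve as a stand-in for LO-RANSAC; kps_a, kps_b and matches below are hypothetical variables carried over from the matching sketch above:

    import cv2
    import numpy as np

    pts1 = np.float32([kps_a[i].pt for i, j in matches])
    pts2 = np.float32([kps_b[j].pt for i, j in matches])
    # Estimate a fundamental matrix with a locally optimized RANSAC variant
    # and keep only the feature point pairs marked as inliers.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.USAC_ACCURATE, 1.0, 0.999)
    filtered = [m for m, ok in zip(matches, inlier_mask.ravel()) if ok]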
As another optional implementation manner, if the feature points in the original scene images include the second feature points, performing feature matching processing on the feature points in all the original scene images according to the feature data corresponding to each original scene image to obtain feature matching results, may include:
inputting the feature data corresponding to each original scene image into a learning feature association network, so that the learning feature association network performs feature matching processing on the second feature points in all original scene images according to the feature data corresponding to each original scene image to obtain the feature matching result;
the learning feature association network may include an attention mechanism network and an optimal matching layer network, and the specific manner in which the learning feature association network performs feature matching processing on the second feature points in all original scene images according to the feature data corresponding to each original scene image to obtain the feature matching result includes:
encoding each second feature point in each original scene image and the feature attribute corresponding to the second feature point into a feature matching vector by an attention mechanism network, and carrying out attention distribution on each feature matching vector based on a preset attention mechanism to obtain a new feature matching vector;
and calculating the inner product of the feature matching vector corresponding to the second feature point and the feature matching vector corresponding to each second feature point in other original scene images by the optimal matching layer network for each second feature point in each original scene image to obtain a matching degree score matrix, and solving the matching degree score matrix based on a preset solving algorithm to obtain a feature matching result.
In this alternative embodiment, optionally, the learning feature association network may be a SuperGlue network, the preset attention mechanism may include a cross attention mechanism and/or a self attention mechanism, and the preset solving algorithm may be a Sinkhorn algorithm.
Therefore, in this implementation, encoding a feature matching vector for each second feature point simplifies the feature data and facilitates the subsequent comparison of different second feature points, and allocating attention to each feature matching vector improves the accuracy of subsequent feature matching; in addition, the feature matching result of the original scene images is obtained by solving the matching degree score matrix, so that feature matching and outlier filtering can be performed simultaneously, improving feature matching efficiency.
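The solving step can be sketched as plain Sinkhorn normalization of the matching degree score matrix (Python/NumPy with SciPy; SuperGlue's extra dustbin row and column for unmatched points are omitted for brevity); mutual best matches in the resulting assignment matrix are then taken as the feature matching result:

    import numpy as np
    from scipy.special import logsumexp

    def sinkhorn(scores, n_iters=50):
        # scores: M x N matrix of inner products between feature matching vectors.
        log_p = np.asarray(scores, dtype=float)
        for _ in range(n_iters):
            # Alternately normalize rows and columns in log space so the
            # result approaches a doubly stochastic assignment matrix.
            log_p = log_p - logsumexp(log_p, axis=1, keepdims=True)
            log_p = log_p - logsumexp(log_p, axis=0, keepdims=True)
        return np.exp(log_p)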
103. And constructing a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image.
In the embodiment of the invention, the feature image corresponding to each original scene image may include target feature points having a matching relationship with other original scene images in the original scene image, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene.
Therefore, by extracting and matching the feature data of arbitrary original scene images, the embodiment of the invention establishes correspondences between feature points in different original scene images, and constructs the AR map of the scene from the matching result and the feature images generated on its basis, thereby reducing the dependence on the time sequence of the image data during AR map construction and improving the reusability of the AR map.
In an alternative embodiment, before constructing the target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, the method may further include:
according to the feature matching result, one or more feature image sets are determined from feature images corresponding to all original scene images;
judging whether each feature image set meets the condition corresponding to a preset pruning principle;
for each feature image set, if the feature image set does not meet the condition corresponding to the preset pruning principle, pruning at least one feature image in the feature image set to obtain a new feature image set, and re-executing the operation of judging whether the feature image set meets the condition corresponding to the preset pruning principle;
and if each feature image set meets the condition corresponding to the preset pruning principle, triggering and executing the operation of constructing the target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image.
Therefore, by implementing this alternative embodiment, the feature images constructed from the key feature points of the original scene images can be pruned, which reduces the feature points in the feature images, reduces the workload and redundancy of map construction, and improves map construction efficiency.
In this optional embodiment, as an optional implementation manner, the preset pruning principle may include a loop consistency principle, and each feature image set may include a plurality of feature images arranged randomly or based on a preset rule, where the number of images in each feature image set is greater than or equal to 3;
for each feature image set, determining whether the feature image set meets a condition corresponding to a preset pruning principle may include:
according to the feature matching result, determining the rotation relation between any 2 adjacent feature images in the feature image set and the rotation relation between the last feature image and the first feature image, to obtain a plurality of third rotation matrices;
calculating a loop value corresponding to the feature image set based on all the third rotation matrices, wherein the loop value is used for representing the matching degree between a feature image in the set and the image obtained after rotating that feature image by all the third rotation matrices in turn;
judging whether the difference between the loop value and a preset loop standard value is larger than a preset difference threshold;
if the difference is larger than the preset difference threshold, determining that the feature image set does not meet the condition corresponding to the preset pruning principle;
and if the difference is not larger than the preset difference threshold, determining that the feature image set meets the condition corresponding to the preset pruning principle.
Optionally, the feature matching result may be analyzed with a motion averaging (rotation averaging) algorithm to obtain the third rotation matrix between two feature images.
For example, if the number of images in a certain feature image set is equal to 3, based on all the third rotation matrices, the loop value corresponding to the feature image set may be calculated as follows:
$\varepsilon = \left\lVert R_{31}\,R_{23}\,R_{12} - I \right\rVert_F$

where $\varepsilon$ denotes the loop value, $R_{ij}$ denotes the third rotation matrix between the i-th and the j-th feature image, and $I$ is the 3×3 identity matrix; composing the three relative rotations around the loop should return approximately to the identity, and the deviation from the identity (the Frobenius norm is one natural choice) measures the mismatch.
Therefore, implementing this alternative implementation prunes the feature images according to the loop consistency principle, which improves the quality of the initial views, helps reduce the triangulation uncertainty caused by short baselines among the feature images during mapping, and thereby reduces the reprojection error of the AR map.
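For intuition, the loop check can be sketched as follows in Python; the composition order of the rotations, the Frobenius-norm deviation from the identity, and the threshold values are illustrative assumptions consistent with the reconstructed formula above, not prescribed by the embodiment.

```python
import numpy as np

def loop_value(rotations):
    """Compose the chain of third rotation matrices around a closed loop of
    feature images; for a consistent loop the product is close to the
    identity. Returns the Frobenius-norm deviation, used as the loop value."""
    acc = np.eye(3)
    for R in rotations:  # rotations ordered around the loop: R_12, R_23, ..., R_n1
        acc = R @ acc
    return np.linalg.norm(acc - np.eye(3), ord="fro")

def meets_pruning_condition(rotations, loop_standard=0.0, diff_threshold=0.1):
    """Pruning test from the text: keep the feature image set only when the
    loop value deviates from the preset loop standard value by no more than
    the preset difference threshold (values here are placeholders)."""
    return abs(loop_value(rotations) - loop_standard) <= diff_threshold
```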
Example Two
Referring to fig. 2, fig. 2 is a flowchart illustrating another method for constructing an AR map according to an embodiment of the present invention. The method for constructing the AR map described in fig. 2 may be applied to a server to construct a target AR map based on an original scene image uploaded by a terminal device, or may be applied to a terminal device to construct a target AR map based on an original scene image acquired by the terminal device, where the terminal device may be an image acquisition device (such as a camera), a smart phone, a wearable device, or the like, and the embodiment of the present invention is not limited. As shown in fig. 2, the construction method of the AR map may include the following operations:
201. Perform feature extraction processing on a plurality of original scene images acquired from the target scene to obtain feature data of each original scene image.
The feature data of each original scene image may include feature points in the original scene image and feature attributes of each feature point in the original scene image;
202. Perform feature matching processing on the feature points in all original scene images according to the feature data corresponding to each original scene image to obtain a feature matching result.
The feature matching process is used for determining target feature point pairs with matching relations between any two original scene images;
203. Segment the feature image corresponding to each original scene image based on a preset segmentation algorithm and the feature attribute of each target feature point in that feature image, obtaining one or more sub-feature images corresponding to each original scene image.
Optionally, the preset segmentation algorithm may be a clustering algorithm, such as a spectral clustering algorithm, the NCUT (normalized cut) algorithm, a community detection and clustering algorithm, or the K-Means algorithm; the NCUT algorithm is preferred. Clustering raises the similarity between feature points inside each segmented sub-feature image, which in turn improves the AR mapping result.
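As a concrete illustration of this segmentation step, the sketch below uses scikit-learn's SpectralClustering (a normalized-cut-style graph clustering) as a stand-in for the preset segmentation algorithm; stacking pixel coordinates with feature attributes and the choice of n_sub_images are assumptions made for the example, not requirements of the embodiment.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def split_feature_image(points, attributes, n_sub_images=2):
    """Split one feature image into sub-feature images by clustering its
    target feature points. points: (N, 2) pixel coordinates; attributes:
    (N, D) feature attributes (e.g. descriptors). Returns one index array
    per sub-feature image."""
    data = np.hstack([points, attributes])  # the two parts may need rescaling in practice
    labels = SpectralClustering(
        n_clusters=n_sub_images,
        affinity="nearest_neighbors",  # graph affinity, in the spirit of NCUT
        assign_labels="discretize",
        random_state=0,
    ).fit_predict(data)
    return [np.where(labels == k)[0] for k in range(n_sub_images)]
```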
204. Classify all sub-feature images according to the feature matching result to obtain a plurality of sub-feature image sets.
Each sub-feature image set may include at least 2 sub-feature images having a matching relationship, where a matching relationship between 2 sub-feature images means that they contain at least one pair of mutually matching target feature points.
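One straightforward way to realize this classification is a disjoint-set union over the matching graph, sketched below under the assumption that sub-feature images are indexed 0..n-1 and that matched_pairs lists index pairs derived from the feature matching result.

```python
def group_sub_images(n_sub_images, matched_pairs):
    """Group sub-feature images into sub-feature image sets: two sub-images
    belong to the same set when they share at least one pair of matching
    target feature points."""
    parent = list(range(n_sub_images))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in matched_pairs:
        parent[find(i)] = find(j)  # union the two components

    sets = {}
    for idx in range(n_sub_images):
        sets.setdefault(find(idx), []).append(idx)
    # Per the text, a sub-feature image set holds at least 2 matched sub-images.
    return [s for s in sets.values() if len(s) >= 2]
```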
205. Construct an AR sub-map corresponding to each sub-feature image set based on the feature matching result.
As an optional implementation manner, constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result may include:
according to the feature matching result, determining the 2 sub-feature images with the highest matching degree in each sub-feature image set and taking them as the initialization images of that sub-feature image set;
for each sub-feature image set, constructing an AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set.
In this optional embodiment, optionally, for each sub-feature image set, constructing an AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set may include:
Determining relative pose information corresponding to the 2 initialized images based on first target feature points with matching relations in the 2 initialized images of the sub-feature image set, wherein the relative pose information can comprise a first rotation matrix and a translation matrix between the 2 initialized images;
calculating three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the camera parameters corresponding to each pre-determined initialized image, the weight corresponding to each pre-determined first target feature point in 2 initialized images and the coordinates of each first target feature point in 2 initialized images;
and constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
In this alternative embodiment, optionally, a DLT (Direct Linear Transform) algorithm, a midpoint algorithm, or an optimization-based algorithm may be used to calculate the three-dimensional point coordinates corresponding to each first target feature point. If the midpoint algorithm is adopted, the camera parameters include at least the camera depths; preferably, an improved midpoint algorithm is adopted to calculate the three-dimensional point coordinates corresponding to the first target feature points:
$X = \dfrac{w_1\, d_1 x_1 + w_2\, R^{\top}\!\left(d_2 x_2 - T\right)}{w_1 + w_2}$

where $X$ denotes the three-dimensional point coordinates corresponding to the first target feature point; $R$ and $T$ denote the first rotation matrix and the translation matrix between the 2 initialization images; $d_1$ and $d_2$ denote the camera depths corresponding to the 2 initialization images; $x_1$ and $x_2$ denote the coordinates of the first target feature point in the 2 initialization images; and $w_1$ and $w_2$ denote the weights of the first target feature point in the 2 initialization images (the weighted-midpoint form is a reconstruction consistent with these symbol definitions).
Therefore, implementing this alternative implementation triangulates the two initialization images with the highest matching degree to obtain per-segment AR sub-maps; introducing per-image weights for the same feature point during two-view triangulation reduces the reprojection error when locating distant points in the initialization images, and building the AR sub-map with a midpoint algorithm relaxes the pose-accuracy requirements of two-view triangulation and further reduces negative-depth cases.
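A minimal numpy sketch of this weighted midpoint triangulation is given below; it assumes normalized image coordinates (3-vectors with z = 1) and the convention that (R, T) maps camera-1 coordinates into camera-2 coordinates, matching the reconstructed formula above.

```python
import numpy as np

def weighted_midpoint(x1, x2, R, T, d1, d2, w1, w2):
    """Weighted midpoint triangulation of one first target feature point.
    x1, x2: normalized coordinates in the 2 initialization images;
    (R, T): first rotation matrix and translation matrix between them;
    d1, d2: camera depths; w1, w2: per-image weights."""
    p1 = d1 * x1               # point along ray 1, expressed in camera-1 frame
    p2 = R.T @ (d2 * x2 - T)   # point along ray 2, mapped into camera-1 frame
    return (w1 * p1 + w2 * p2) / (w1 + w2)
```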
206. Merge all AR sub-maps into a target AR map corresponding to the target scene.
As an optional implementation manner, merging all AR sub-maps into a target AR map corresponding to the target scene may include:
determining a construction error corresponding to each AR sub-map in an AR sub-map set formed by all AR sub-maps, and determining the AR sub-map with the minimum construction error as a reference AR sub-map;
Based on a similarity transformation principle, aligning other AR sub-maps except the reference AR sub-map in the AR sub-map set with the reference AR sub-map to obtain an alignment result, and merging all AR sub-maps in the AR sub-map set according to the alignment result to obtain a target AR map corresponding to the target scene.
In this optional embodiment, optionally, before the other AR sub-maps in the AR sub-map set are aligned with the reference AR sub-map based on the similarity transformation principle, the method may further include: deleting from the AR sub-map set any AR sub-map whose construction error is greater than a preset error threshold, which improves the accuracy and stability of the AR map.
It can be seen that implementing this alternative embodiment determines the AR sub-map with the smallest construction error as the reference AR sub-map, and merges all the AR sub-maps on the basis of the reference AR sub-map, thereby improving the accuracy and stability of AR map construction.
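The embodiment leaves the alignment method open beyond a similarity transformation principle; one common concrete choice, sketched below as an assumption, is the closed-form Umeyama fit of scale, rotation, and translation over three-dimensional points shared between a sub-map and the reference sub-map.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate a similarity transform (scale s, rotation R, translation t)
    mapping 3D points src (N, 3) onto dst (N, 3), e.g. shared points of an
    AR sub-map and the reference AR sub-map. Classic Umeyama closed form."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1  # keep R a proper rotation
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```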
Therefore, by extracting and matching the feature data of arbitrary original scene images, the embodiment of the invention places feature points in different original scene images into correspondence and constructs the AR map of the scene from the matching result and the feature images generated on its basis. This reduces the dependence of the AR map construction process on the time sequence of the image data and improves the reusability of the AR map; in addition, the strategy of block-wise parallel positioning and mapping helps improve mapping efficiency and reduce the trajectory drift caused by accumulated mapping errors in large scenes.
In an alternative embodiment, before merging all AR sub-maps into a target AR map corresponding to the target scene, the method may further include:
for each sub-feature image set, if the sub-feature image set comprises at least 3 sub-feature images, updating an AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set, and triggering and executing the operation of merging all the AR sub-maps into a target AR map corresponding to a target scene;
for each sub-feature image set, updating the AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set may include:
determining the three-dimensional point coordinates corresponding to each second target feature point based on the second rotation matrix corresponding to each sub-feature image in the set, the optical center coordinates corresponding to each sub-feature image, the coordinates of each second target feature point in its sub-feature image, the three-dimensional point coordinates corresponding to each first target feature point, and a weight-selection iterative least squares algorithm, wherein the second rotation matrix corresponding to each sub-feature image comprises the rotation matrix between the sub-feature image and the AR sub-map, the optical center coordinates corresponding to each sub-feature image comprise the coordinates of the optical center of the sub-feature image in the AR sub-map, and the second target feature points comprise those target feature points of each sub-feature image in the sub-feature image set that have a matching relationship with a target feature point of another sub-feature image in the set;
And updating the AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each second target feature point.
In this alternative embodiment, optionally, the weight-selection iterative least squares algorithm may be:
$X=\arg\min_{X}\sum_{i} w_i \left\lVert \left(I - v_i v_i^{\top}\right)\left(X - C_i\right) \right\rVert^{2},\qquad v_i=\frac{R_i^{\top} x_i}{\lVert R_i^{\top} x_i\rVert}$

where $X$ denotes the three-dimensional point coordinates of the second target feature point; $C_i$ denotes the optical center coordinates corresponding to the i-th sub-feature image; $R_i$ denotes the second rotation matrix corresponding to the i-th sub-feature image; $x_i$ denotes the coordinates of the second target feature point in the i-th sub-feature image; $w_i$ denotes the weight assigned to the i-th image during the iterations of the weight-selection iterative least squares algorithm; and $v_i$ is the unit ray direction derived from $R_i$ and $x_i$ (this point-to-ray least squares form is a reconstruction consistent with these symbol definitions).
For each second target feature point, $X$ and the weights $w_i$ are updated iteratively by the weight-selection iterative least squares algorithm until the weighted residual falls below a preset value; when the $X$ obtained in one iteration differs from the $X$ of the previous iteration by less than a preset difference threshold, the iteration terminates and the current $X$ is determined as the three-dimensional point coordinates of the second target feature point.
Therefore, implementing this alternative implementation builds a more complete AR sub-map by multi-view triangulation over the remaining feature images on top of the initial AR sub-map built by two-view triangulation, achieving incremental positioning and mapping; solving the multi-view triangulation with the weight-selection iterative least squares method improves its robustness to outliers, suppresses the noise caused by short baselines, and further improves the accuracy and stability of the AR sub-map.
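Under the point-to-ray objective reconstructed above, one plausible weight-selection iteration looks as follows; the down-weighting rule and all constants are assumptions made for illustration.

```python
import numpy as np

def triangulate_irls(Cs, Rs, xs, X0, n_iter=20, eps=1e-6, delta=1e-2):
    """Weight-selection iterative least squares for one second target feature
    point. View i contributes a ray from optical center Cs[i] with direction
    Rs[i].T @ xs[i]; X is re-estimated as the weighted least-squares point
    closest to all rays, with weights shrunk for views whose point-to-ray
    residual is large. X0 is the two-view initial estimate."""
    X = np.asarray(X0, dtype=float)
    for _ in range(n_iter):
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for C, R, x in zip(Cs, Rs, xs):
            v = R.T @ x
            v = v / np.linalg.norm(v)
            P = np.eye(3) - np.outer(v, v)   # projector orthogonal to the ray
            r = np.linalg.norm(P @ (X - C))  # point-to-ray distance
            w = 1.0 / max(r, delta)          # weight selection: damp bad rays
            A += w * P
            b += w * (P @ C)
        X_new = np.linalg.solve(A, b)        # requires >= 2 non-parallel rays
        if np.linalg.norm(X_new - X) < eps:  # termination test from the text
            return X_new
        X = X_new
    return X
```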
In another alternative embodiment, before merging all AR sub-maps into a target AR map corresponding to the target scene, the method may further include:
for the AR sub-map corresponding to each sub-feature image set, judging, based on radar detection data of the target scene, whether the AR sub-map meets a preset constraint condition, where the radar detection data is based at least on a radar detection result obtained by a radar detection device detecting the target scene;
for the AR sub-map corresponding to each sub-feature image set, if the AR sub-map does not meet the preset constraint condition, correcting the AR sub-map and re-executing the operation of judging, based on the radar detection data of the target scene, whether the AR sub-map meets the preset constraint condition;
and if the AR sub-maps corresponding to all the sub-feature image sets meet the preset constraint conditions, triggering and executing the operation of combining all the AR sub-maps into the target AR map corresponding to the target scene.
Optionally, the radar detection device may be a lidar detection device, and the radar detection data may further include real-time radar device information when the radar detection device detects the target scene.
In this optional embodiment, as an optional implementation manner, the preset constraint condition may include a bundle adjustment constraint condition, and for the AR sub-map corresponding to each sub-feature image set, judging whether the AR sub-map meets the preset constraint condition may include:
determining an error value corresponding to the AR sub-map based on the radar detection data of the target scene and a preset bundle adjustment formula;
judging whether the error value is larger than a preset error value or not;
if the error value is larger than the preset error value, determining that the AR sub-map does not meet the preset constraint condition;
if the error value is not greater than the preset error value, determining that the AR sub-map meets the preset constraint condition.
In this optional embodiment, optionally, the radar detection result may include radar detection coordinates corresponding to each target feature point in the radar detection space, and the real-time radar device information may include a radar real-time position of the radar detection device and an optical center corresponding to the radar detection result.
In this alternative embodiment, optionally, the bundle adjustment formula may be:
$e=\sum_{j}\left\lVert u_j - P\,X_j \right\rVert^{2} + \left\lVert C_L - O \right\rVert^{2}$

where $e$ denotes the error value; $u_j$ denotes the radar detection coordinates of the j-th target feature point in the radar detection space; $P$ denotes the projection matrix projecting the AR sub-map into the radar detection space; $X_j$ denotes the three-dimensional point coordinates of the j-th target feature point in the AR sub-map; $C_L$ denotes the coordinates of the radar real-time position in the AR sub-map; and $O$ denotes the coordinates of the optical center corresponding to the radar detection result in the AR sub-map (the additive form combining the reprojection term with the position-consistency term is a reconstruction consistent with these symbol definitions).
Therefore, by implementing this optional embodiment, a bundle adjustment constraint can be applied to the constructed AR sub-maps based on the radar detection data of the target scene, which reduces inaccurate AR map construction in indoor scenes lacking GPS information, mitigates the trajectory drift of incremental mapping in large scenes, and further improves the accuracy and stability of the AR map.
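A sketch of the constraint check is given below; realizing the projection as a 3x4 matrix with perspective division and weighting the two terms equally are assumptions layered on the reconstructed formula, not details fixed by the embodiment.

```python
import numpy as np

def submap_error(P, Xs, us, C_L, O):
    """Error value for one AR sub-map under the reconstructed formula:
    map points Xs (N, 3) are projected into the radar detection space via
    P (3x4) and compared with radar detection coordinates us (N, 2), plus a
    consistency term between the radar real-time position C_L and the
    optical center O (both expressed in the AR sub-map frame)."""
    err = 0.0
    for X, u in zip(Xs, us):
        ph = P @ np.append(X, 1.0)                # homogeneous projection
        err += np.sum((u - ph[:2] / ph[2]) ** 2)  # reprojection residual
    return err + np.sum((np.asarray(C_L) - np.asarray(O)) ** 2)

def meets_constraint(err, preset_error=1.0):
    """The sub-map passes when the error value does not exceed the preset
    error value (the threshold here is a placeholder)."""
    return err <= preset_error
```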
Example Three
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for constructing an AR map according to an embodiment of the present invention. The device for constructing the AR map described in fig. 3 may be applied to a server to construct a target AR map based on an original scene image uploaded by a terminal device, or may be applied to a terminal device to construct a target AR map based on an original scene image acquired by the terminal device, where the terminal device may be an image acquisition device (such as a camera), a smart phone, a wearable device, or the like, and the embodiment of the present invention is not limited. As shown in fig. 3, the AR map construction apparatus may include:
The feature extraction module 301 is configured to perform feature extraction processing on a plurality of original scene images acquired from a target scene, so as to obtain feature data of each original scene image, where the feature data of each original scene image may include feature points in the original scene image and feature attributes of each feature point in the original scene image;
the feature matching module 302 is configured to perform feature matching processing on feature points in all original scene images according to feature data corresponding to each original scene image to obtain a feature matching result, where the feature matching processing is used to determine a target feature point pair with a matching relationship between any two original scene images;
the map construction module 303 is configured to construct a target AR map corresponding to a target scene based on the feature matching result and feature images corresponding to each original scene image, where the feature image corresponding to each original scene image may include target feature points in the original scene image that have a matching relationship with other original scene images, and the target AR map is used to determine pose information corresponding to any scene image acquired from the target scene.
Therefore, implementing the device described in fig. 3 extracts and matches the feature data of arbitrary original scene images so that feature points in different original scene images are placed into correspondence, and constructs the AR map of the scene from the matching result and the feature images generated on its basis, thereby reducing the dependence of the AR map construction process on the time sequence of the image data and improving the reusability of the AR map.
In an alternative embodiment, as shown in fig. 4, the map construction module 303 may include:
the cluster segmentation sub-module 3031 is configured to segment the feature image corresponding to each original scene image based on a preset segmentation algorithm and a feature attribute of each target feature point in the feature image corresponding to each original scene image to obtain one or more sub-feature images corresponding to each original scene image, and classify all the sub-feature images according to a feature matching result to obtain a plurality of sub-feature image sets, where each sub-feature image set may include at least 2 sub-feature images with a matching relationship, and the presence of the matching relationship between the 2 sub-feature images indicates that at least one pair of target feature points with the matching relationship are included in the 2 sub-feature images;
the map construction sub-module 3032 is configured to construct an AR sub-map corresponding to each sub-feature image set based on the feature matching result;
the map merging sub-module 3033 is configured to merge all the AR sub-maps into a target AR map corresponding to the target scene.
It can be seen that implementing the apparatus described in fig. 4 is advantageous for improving mapping efficiency and reducing trajectory drift caused by large scene mapping error accumulation by employing the strategy of blocking parallel positioning and mapping to construct an AR map.
In another alternative embodiment, as shown in fig. 4, the specific manner of constructing the AR sub-map corresponding to each sub-feature image set by the map construction sub-module 3032 based on the feature matching result may include:
according to the feature matching result, determining the 2 sub-feature images with the highest matching degree in each sub-feature image set and taking them as the initialization images of that sub-feature image set;
for each sub-feature image set, constructing an AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set.
Therefore, implementing the device described in fig. 4 performs triangulation on the two initialization images with the highest matching degree to obtain per-segment AR sub-maps, which improves the accuracy of AR sub-map construction.
In yet another alternative embodiment, as shown in fig. 4, for each sub-feature image set, the specific manner of constructing the AR sub-map corresponding to the sub-feature image set by the map construction sub-module 3032 based on the initialized image of the sub-feature image set may include:
determining relative pose information corresponding to the 2 initialized images based on first target feature points with matching relations in the 2 initialized images of the sub-feature image set, wherein the relative pose information can comprise a first rotation matrix and a translation matrix between the 2 initialized images;
Calculating three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the camera parameters corresponding to each pre-determined initialized image, the weight corresponding to each pre-determined first target feature point in 2 initialized images and the coordinates of each first target feature point in 2 initialized images;
and constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
It can be seen that implementing the apparatus described in fig. 4 introduces the weight of the same feature point in different initialized images during the two-view triangulation process, thereby reducing the re-projection error when locating the distant point in the initialized image.
In yet another alternative embodiment, as shown in fig. 4, the map construction submodule 3032 is further configured to, for each sub-feature image set, update, based on all sub-feature images in the sub-feature image set, an AR sub-map corresponding to the sub-feature image set before the map merging submodule 3033 merges all AR sub-maps into a target AR map corresponding to the target scene, and trigger the map merging submodule 3033 to execute the above operation of merging all AR sub-maps into the target AR map corresponding to the target scene, if the sub-feature image set includes at least 3 sub-feature images;
For each sub-feature image set, the specific manner of updating the AR sub-map corresponding to the sub-feature image set by the map construction sub-module 3032 based on all the sub-feature images in the sub-feature image set may include:
determining the three-dimensional point coordinates corresponding to each second target feature point based on the second rotation matrix corresponding to each sub-feature image in the set, the optical center coordinates corresponding to each sub-feature image, the coordinates of each second target feature point in its sub-feature image, the three-dimensional point coordinates corresponding to each first target feature point, and a weight-selection iterative least squares algorithm, wherein the second rotation matrix corresponding to each sub-feature image comprises the rotation matrix between the sub-feature image and the AR sub-map, the optical center coordinates corresponding to each sub-feature image comprise the coordinates of the optical center of the sub-feature image in the AR sub-map, and the second target feature points comprise those target feature points of each sub-feature image in the sub-feature image set that have a matching relationship with a target feature point of another sub-feature image in the set;
And updating the AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each second target feature point.
Therefore, implementing the device described in fig. 4 builds a more complete AR sub-map by multi-view triangulation over the remaining feature images on top of the initial AR sub-map built by two-view triangulation, achieving incremental positioning and mapping; solving the multi-view triangulation with the weight-selection iterative least squares method improves its robustness to outliers, suppresses the noise caused by short baselines, and further improves the accuracy and stability of the AR sub-map.
In yet another alternative embodiment, as shown in fig. 4, the map construction sub-module 3032 is further configured to, before the map merging sub-module 3033 merges all the AR sub-maps into a target AR map corresponding to the target scene, determine, for each AR sub-map corresponding to the set of sub-feature images, whether the AR sub-map meets a preset constraint condition based on radar detection data of the target scene, correct the AR sub-map if the AR sub-map does not meet the preset constraint condition, and re-execute the above operation of determining whether the AR sub-map meets the preset constraint condition based on the radar detection data of the target scene, where the radar detection data is at least based on a radar detection result obtained by detecting the target scene by the radar detection device; if all the AR sub-maps corresponding to the sub-feature image sets meet the preset constraint condition, the map merging sub-module 3033 is triggered to execute the above operation of merging all the AR sub-maps into the target AR map corresponding to the target scene.
In yet another alternative embodiment, as shown in fig. 4, the preset constraint condition may include a bundle adjustment constraint condition, and for the AR sub-map corresponding to each sub-feature image set, the specific manner in which the map construction sub-module 3032 determines whether the AR sub-map meets the preset constraint condition may include:
determining an error value corresponding to the AR sub-map based on the radar detection data of the target scene and the preset bundle adjustment formula;
judging whether the error value is larger than a preset error value or not;
if the error value is larger than the preset error value, determining that the AR sub-map does not meet the preset constraint condition;
if the error value is not greater than the preset error value, determining that the AR sub-map meets the preset constraint condition.
Therefore, the device described in fig. 4 can also apply a bundle adjustment constraint to the constructed AR sub-maps based on the radar detection data of the target scene, which reduces inaccurate AR map construction in indoor scenes lacking GPS information, mitigates the trajectory drift of incremental mapping in large scenes, and further improves the accuracy and stability of the AR map.
In yet another alternative embodiment, as shown in fig. 4, the apparatus may further include a pruning module 304, and the pruning module 304 may include:
A determining submodule 3041, configured to determine, before the map building module 303 builds the target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, one or more feature image sets from the feature images corresponding to all the original scene images according to the feature matching result;
the judging submodule 3042 is configured to judge, for each feature image set, whether the feature image set meets a condition corresponding to a preset pruning principle, and if each feature image set meets the condition corresponding to the preset pruning principle, trigger the map building module 303 to execute the above feature image corresponding to each original scene image based on the feature matching result, and build a target AR map corresponding to a target scene;
the pruning submodule 3043 is configured to, for each feature image set, if the feature image set does not meet the condition corresponding to the preset pruning principle, prune at least one feature image in the feature image set to obtain a new feature image set, and trigger the judging submodule 3042 to re-execute the above operation of judging whether the feature image set meets the condition corresponding to the preset pruning principle.
Therefore, the device described in fig. 4 can also perform pruning processing on the feature images constructed based on the key feature points in the original scene images, so that the feature points in the feature images are reduced, the workload and redundancy of image construction are reduced, and the image construction efficiency is improved.
In yet another alternative embodiment, as shown in fig. 4, the preset pruning principle may include a loop consistency principle, and each feature image set may include a plurality of feature images arranged randomly or based on a preset rule, where the number of images in each feature image set is greater than or equal to 3;
for each feature image set, the specific manner of determining whether the feature image set meets the condition corresponding to the preset pruning principle by using the determining sub-module 3042 may include:
according to the feature matching result, determining the rotation relation between any 2 adjacent feature images in the feature image set and the rotation relation between the last feature image and the first feature image to obtain a plurality of third rotation matrixes;
calculating a loop value corresponding to the feature image set based on all the third rotation matrixes, wherein the loop value is used for representing the matching degree between an image obtained after any feature image in the feature image set rotates based on all the third rotation matrixes and the feature image;
Judging whether the difference between the loop value and a preset loop standard value is larger than a preset difference threshold value or not;
if the difference degree is larger than the preset difference degree, determining that the characteristic image set does not meet the condition corresponding to the preset pruning principle;
and if the difference degree is not greater than the preset difference degree, determining that the characteristic image set meets the condition corresponding to the preset pruning principle.
Therefore, implementing the device described in fig. 4 prunes the feature images according to the loop consistency principle, improving the quality of the initial views and helping reduce the triangulation uncertainty caused by short baselines among the feature images during mapping, thereby reducing the reprojection error of the AR map.
In yet another alternative embodiment, as shown in fig. 4, the feature points in each original scene image include at least first feature points extracted based on a conventional feature extraction network;
the specific way for the feature matching module 302 to perform feature matching processing on feature points in all original scene images according to feature data corresponding to each original scene image to obtain feature matching results may include:
screening all original scene images based on radar detection data of a target scene and feature data corresponding to each original scene image to obtain a matching candidate image set;
And carrying out feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain feature matching results.
It can be seen that the device described in fig. 4 can also screen the original scene images using the radar detection data of the target scene as prior position information, so that global matching over all original scene images is unnecessary and the computational workload of image matching is reduced.
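As an illustration of this screening, the sketch below assumes each original scene image carries a radar-derived acquisition position and keeps only nearby image pairs as matching candidates; the distance threshold is a placeholder.

```python
import numpy as np

def candidate_pairs(image_positions, radius=10.0):
    """Screen original scene images using radar prior positions: only image
    pairs acquired within `radius` of each other become matching candidates,
    avoiding global matching over all pairs. image_positions: (N, 3)."""
    pos = np.asarray(image_positions)
    pairs = []
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if np.linalg.norm(pos[i] - pos[j]) <= radius:
                pairs.append((i, j))
    return pairs
```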
Example Four
Referring to fig. 5, fig. 5 is a schematic structural diagram of an AR map construction apparatus according to another embodiment of the present invention. As shown in fig. 5, the AR map construction apparatus may include:
a memory 401 storing executable program codes;
a processor 402 coupled with the memory 401;
the processor 402 invokes executable program codes stored in the memory 401 to perform the steps in the AR map construction method described in the first or second embodiment of the present invention.
Example Five
The embodiment of the invention discloses a computer storage medium which stores computer instructions for executing the steps in the AR map construction method described in the first or second embodiment of the invention when the computer instructions are called.
Example Six
An embodiment of the present invention discloses a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the steps in the AR map construction method described in the first or second embodiment.
The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product that may be stored in a computer-readable storage medium including Read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable Read-Only Memory (OTPROM), electrically erasable programmable Read-Only Memory (EEPROM), compact disc Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc Memory, magnetic disc Memory, tape Memory, or any other medium that can be used for computer-readable carrying or storing data.
Finally, it should be noted that the method and device for constructing an AR map disclosed in the embodiments of the present invention are merely preferred embodiments, intended only to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (8)

1. A method of constructing an AR map, the method comprising:
performing feature extraction processing on a plurality of original scene images acquired from a target scene to obtain feature data of each original scene image, wherein the feature data of each original scene image comprises feature points in the original scene image and feature attributes of each feature point in the original scene image;
performing feature matching processing on feature points in all original scene images according to feature data corresponding to each original scene image to obtain feature matching results, wherein the feature matching processing is used for determining target feature point pairs with matching relations between any two original scene images;
Constructing a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image, wherein the feature image corresponding to each original scene image comprises target feature points with matching relation with other original scene images in the original scene image, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene;
the constructing a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each original scene image includes:
based on a preset segmentation algorithm and the characteristic attribute of each target characteristic point in the characteristic image corresponding to each original scene image, carrying out segmentation processing on the characteristic image corresponding to each original scene image to obtain one or more sub-characteristic images corresponding to the original scene image;
classifying all the sub-feature images according to the feature matching result to obtain a plurality of sub-feature image sets, wherein each sub-feature image set comprises at least 2 sub-feature images with matching relations, and the matching relations among the 2 sub-feature images represent that at least one pair of target feature points with the matching relations are contained in the 2 sub-feature images;
Constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result;
combining all the AR sub-maps into a target AR map corresponding to the target scene;
the constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result includes:
according to the feature matching result, determining 2 sub-feature images with highest matching degree in each sub-feature image set as an initialization image of the sub-feature image set;
for each sub-feature image set, constructing an AR sub-map corresponding to the sub-feature image set based on an initialized image of the sub-feature image set;
for each sub-feature image set, the constructing an AR sub-map corresponding to the sub-feature image set based on the initialized image of the sub-feature image set includes:
determining relative pose information corresponding to 2 initialized images based on first target feature points with matching relations in the 2 initialized images of the sub-feature image set;
calculating three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the camera parameters corresponding to each initialized image which are determined in advance, the weight corresponding to each first target feature point in 2 initialized images which are determined in advance and the coordinates of each first target feature point in 2 initialized images;
And constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
2. The AR map construction method according to claim 1, wherein the relative pose information includes a first rotation matrix and a translation matrix between 2 of the initialized images.
3. The method for constructing an AR map according to claim 2, wherein before said merging all the AR sub-maps into a target AR map corresponding to the target scene, the method further comprises:
for each sub-feature image set, if the sub-feature image set includes at least 3 sub-feature images, updating an AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set, and triggering and executing the operation of merging all the AR sub-maps into a target AR map corresponding to the target scene;
for each sub-feature image set, updating the AR sub-map corresponding to the sub-feature image set based on all the sub-feature images in the sub-feature image set includes:
determining three-dimensional point coordinates corresponding to each second target feature point based on a second rotation matrix corresponding to each sub-feature image in the sub-feature image, optical center coordinates corresponding to the sub-feature image, coordinates of each second target feature point in the corresponding sub-feature image, three-dimensional point coordinates corresponding to each first target feature point and a weighting iterative least square algorithm, wherein the second rotation matrix corresponding to each sub-feature image comprises a rotation matrix between the sub-feature image and the AR sub-map, the optical center coordinates corresponding to each sub-feature image comprises coordinates of an optical center corresponding to the sub-feature image in the AR sub-map, and the second target feature points comprise target feature points with matching relations between all target feature points of each sub-feature image in the sub-feature image set and any other target feature points in the sub-feature image set;
And updating the AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each second target feature point.
4. The method of constructing an AR map according to any one of claims 1-3, wherein before said merging all the AR sub-maps into a target AR map corresponding to the target scene, the method further comprises:
for the AR sub-map corresponding to each sub-feature image set, judging whether the AR sub-map meets a preset constraint condition or not based on radar detection data of the target scene, wherein the radar detection data is at least based on a radar detection result obtained by detecting the target scene by radar detection equipment;
for the AR sub-map corresponding to each sub-feature image set, if the AR sub-map does not meet the preset constraint condition, correcting the AR sub-map, and re-executing the radar detection data based on the target scene to judge whether the AR sub-map meets the preset constraint condition or not;
and if all the AR sub-maps corresponding to the sub-feature image sets meet the preset constraint conditions, triggering and executing the operation of combining all the AR sub-maps into the target AR map corresponding to the target scene.
5. The method for constructing an AR map according to any one of claims 1 to 3, wherein before the constructing of a target AR map corresponding to the target scene based on the feature matching result and the feature image corresponding to each of the original scene images, the method further comprises:
determining one or more feature image sets from feature images corresponding to all the original scene images according to the feature matching result;
judging whether each feature image set meets the condition corresponding to the preset pruning principle;
for each feature image set, if the feature image set does not meet the condition corresponding to the preset pruning principle, pruning at least one feature image in the feature image set to obtain a new feature image set, and re-executing the operation of judging whether the feature image set meets the condition corresponding to the preset pruning principle;
and if each feature image set meets the condition corresponding to the preset pruning principle, triggering and executing the operation of constructing the target AR map corresponding to the target scene based on the feature matching result and the feature images corresponding to each original scene image.
6. The AR map construction method according to claim 5, wherein the preset pruning principle includes a loop consistency principle, each of the feature image sets includes a plurality of the feature images arranged randomly or based on a preset rule, and the number of images in each of the feature image sets is 3 or more;
for each feature image set, the judging whether the feature image set meets the condition corresponding to the preset pruning principle includes:
according to the feature matching result, determining the rotation relation between any 2 adjacent feature images in the feature image set and the rotation relation between the last feature image and the first feature image to obtain a plurality of third rotation matrixes;
calculating a loop value corresponding to the feature image set based on all the third rotation matrixes, wherein the loop value is used for representing the matching degree between an image obtained after any feature image in the feature image set rotates based on all the third rotation matrixes and the feature image;
judging whether the difference between the loop value and a preset loop standard value is larger than a preset difference threshold value or not;
if the difference is greater than the preset difference threshold, determining that the feature image set does not meet the condition corresponding to the preset pruning principle;
and if the difference is not greater than the preset difference threshold, determining that the feature image set meets the condition corresponding to the preset pruning principle.
7. The method of constructing an AR map according to any one of claims 1, 2, 3, and 6, wherein the feature points in each of the original scene images include at least a first feature point extracted based on a conventional feature extraction network;
and performing feature matching processing on feature points in all the original scene images according to feature data corresponding to each original scene image to obtain feature matching results, wherein the feature matching processing comprises the following steps:
screening all the original scene images based on radar detection data of the target scene and feature data corresponding to each original scene image to obtain a matching candidate image set;
and carrying out feature matching processing on the first feature points of all the original scene images in the matching candidate image set according to the feature attributes corresponding to the first feature points of each original scene image in the matching candidate image set to obtain feature matching results.
8. An apparatus for constructing an AR map, the apparatus comprising:
the feature extraction module is used for carrying out feature extraction processing on a plurality of original scene images acquired from a target scene to obtain feature data of each original scene image, wherein the feature data of each original scene image comprises feature points in the original scene image and feature attributes of each feature point in the original scene image;
the feature matching module is used for carrying out feature matching processing on feature points in all the original scene images according to the feature data corresponding to each original scene image to obtain feature matching results, and the feature matching processing is used for determining target feature point pairs with matching relations between any two original scene images;
the map construction module is used for constructing a target AR map corresponding to the target scene based on the feature matching result and the feature images corresponding to each original scene image, wherein each feature image corresponding to the original scene image comprises target feature points with matching relations with other original scene images in the original scene image, and the target AR map is used for determining pose information corresponding to any scene image acquired from the target scene;
The map construction module comprises:
the clustering segmentation sub-module is used for carrying out segmentation processing on the feature images corresponding to each original scene image based on a preset segmentation algorithm and the feature attribute of each target feature point in the feature images corresponding to each original scene image to obtain one or more sub-feature images corresponding to the original scene image, classifying all the sub-feature images according to the feature matching result to obtain a plurality of sub-feature image sets, wherein each sub-feature image set comprises at least 2 sub-feature images with matching relations, and the matching relations among the 2 sub-feature images represent that at least one pair of target feature points with the matching relations exist in the 2 sub-feature images;
the map construction sub-module is used for constructing an AR sub-map corresponding to each sub-feature image set based on the feature matching result;
the map merging sub-module is used for merging all the AR sub-maps into a target AR map corresponding to the target scene;
the specific mode of constructing the AR sub-map corresponding to each sub-feature image set by the map construction sub-module based on the feature matching result comprises the following steps:
According to the feature matching result, determining 2 sub-feature images with highest matching degree in each sub-feature image set as an initialization image of the sub-feature image set;
for each sub-feature image set, constructing an AR sub-map corresponding to the sub-feature image set based on an initialized image of the sub-feature image set;
for each sub-feature image set, the specific manner of constructing the AR sub-map corresponding to the sub-feature image set by the map construction sub-module based on the initialized image of the sub-feature image set includes:
determining relative pose information corresponding to 2 initialized images based on first target feature points with matching relations in the 2 initialized images of the sub-feature image set;
calculating three-dimensional point coordinates corresponding to each first target feature point based on the relative pose information, the camera parameters corresponding to each initialized image which are determined in advance, the weight corresponding to each first target feature point in 2 initialized images which are determined in advance and the coordinates of each first target feature point in 2 initialized images;
And constructing an AR sub-map corresponding to the sub-feature image set based on the three-dimensional point coordinates corresponding to each first target feature point.
GR01 Patent grant