CN117746005A - Spatial scene positioning method, device, electronic equipment and storage medium

Info

Publication number: CN117746005A
Application number: CN202311828681.6A
Authority: CN (China)
Prior art keywords: image, pairs, matching, determining, dimensional
Other languages: Chinese (zh)
Inventors: 饶童 (Rao Tong), 周杰 (Zhou Jie), 胡洋 (Hu Yang), 潘慈辉 (Pan Cihui)
Current assignee: You Can See (Beijing) Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: You Can See (Beijing) Technology Co., Ltd.
Events: application filed by You Can See (Beijing) Technology Co., Ltd.; priority to CN202311828681.6A; publication of CN117746005A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classification: Image Analysis (AREA)
Abstract

The embodiments of the present disclosure disclose a spatial scene positioning method, an apparatus, an electronic device, and a storage medium. The method includes: acquiring images in a target space by using an image acquisition device to obtain a first video stream; determining, from a model image library, a plurality of second images matching a first image in the first video stream based on the first image, wherein the model image library stores a plurality of three-dimensional models corresponding to the target space and a plurality of spatial two-dimensional images with color information, and each three-dimensional model corresponds to at least one spatial two-dimensional image; determining at least one three-dimensional model corresponding to the first image based on the plurality of second images, and determining at least one set of first pose information corresponding to the image acquisition device based on the at least one three-dimensional model and the first image; and verifying and screening the at least one set of first pose information to determine target pose information of the image acquisition device.

Description

Spatial scene positioning method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to computer vision technology, and in particular, to a method, an apparatus, an electronic device, and a storage medium for positioning a spatial scene.
Background
When a user arrives in an unfamiliar place, it is easy to lose one's bearings without navigation. In large and complex spaces such as factories, shopping malls, and parking lots, a user often cannot determine his or her own position or locate a destination within the complex space. Determining the user's location and conveniently finding, for example, a particular merchant is therefore troublesome: asking for directions can be misunderstood, and indoor guideposts often give complicated or ambiguous directions.
Disclosure of Invention
The present disclosure has been made to solve the above technical problems. Embodiments of the present disclosure provide a spatial scene positioning method and apparatus, an electronic device, and a storage medium.
According to an aspect of the embodiments of the present disclosure, there is provided a spatial scene positioning method, including:
acquiring images in a target space by using an image acquisition device to obtain a first video stream, wherein the first video stream comprises at least one frame of first image;
determining, from a model image library, a plurality of second images matching a first image in the first video stream based on the first image, wherein the model image library stores a three-dimensional model corresponding to the target space and a plurality of spatial two-dimensional images with color information;
determining a plurality of pairs of first matching points based on the plurality of second images and the first image, wherein each pair of first matching points includes a feature point in the first image and a feature point in one second image;
determining at least one set of first pose information corresponding to the image acquisition device based on the plurality of pairs of first matching points and the three-dimensional model;
and verifying and screening the at least one set of first pose information to determine target pose information of the image acquisition device.
Optionally, before the determining the plurality of second images matching the first image from the model image library based on one frame of first image in the first video stream, the method further includes:
preprocessing the three-dimensional model corresponding to the target space and the plurality of spatial two-dimensional images, and determining description feature information and three-dimensional coordinate information corresponding to a plurality of feature points.
Optionally, the preprocessing the three-dimensional model corresponding to the target space and the plurality of spatial two-dimensional images to determine the description feature information and three-dimensional coordinate information corresponding to the plurality of feature points includes:
performing feature extraction on the plurality of spatial two-dimensional images by using at least one feature extraction network to obtain description feature information corresponding to each of the plurality of feature points;
and determining point cloud data corresponding to the target space based on the three-dimensional model, and determining three-dimensional spatial information corresponding to each of the feature points based on the point cloud data.
Optionally, the determining, based on one frame of first image in the first video stream, a plurality of second images matching the first image from the model image library includes:
performing feature extraction on the first image to obtain first image features;
and matching the first image features with spatial image features corresponding to the plurality of spatial two-dimensional images prestored in the model image library, and determining the plurality of second images based on the matching result.
Optionally, the determining a plurality of pairs of first matching points based on the plurality of second images and the first image includes:
performing feature point matching on the first image and each of the plurality of second images to obtain a plurality of pairs of second matching points;
and screening the plurality of pairs of second matching points to obtain the plurality of pairs of first matching points.
Optionally, the screening the plurality of pairs of second matching points to obtain the plurality of pairs of first matching points includes:
determining, for each of the plurality of pairs of second matching points, second pose information corresponding to each second image based on the two-dimensional plane information of the feature point in the first image and the three-dimensional spatial information of the feature point in the second image;
screening the plurality of pieces of second pose information based on the gravity direction information corresponding to the image acquisition device to obtain at least one piece of screened second pose information;
and determining the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one piece of screened second pose information.
Optionally, the screening the plurality of pairs of second matching points to obtain the plurality of pairs of first matching points includes:
determining, for each of the plurality of pairs of second matching points, second pose information corresponding to each second image based on the two-dimensional plane information of the feature point in the first image and the three-dimensional spatial information of the feature point in the second image;
screening the plurality of pieces of second pose information by using a random sample consensus (RANSAC) algorithm based on at least one kind of prior information to obtain at least one piece of screened second pose information;
and determining the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one piece of screened second pose information.
Optionally, the determining, based on the plurality of pairs of first matching points and the three-dimensional model, at least one set of first pose information corresponding to the image acquisition device includes:
determining, based on each pair of first matching points, a plurality of pairs of third matching points corresponding to that pair;
judging, by a bundle adjustment method, whether the plurality of pairs of first matching points and the plurality of pairs of third matching points are correct, and determining a plurality of pairs of correctly matched fourth matching points according to the judging result;
and determining the at least one set of first pose information corresponding to the image acquisition device based on the pairs of fourth matching points.
Optionally, the verifying and screening the at least one set of first pose information to determine the target pose information of the image acquisition device includes:
verifying and screening the at least one set of first pose information based on at least one kind of prior information, and determining the target pose information of the image acquisition device, wherein the prior information includes at least one of: the gravity direction of the image acquisition device, position information of preset positioning devices, and third pose information of the image acquisition device determined based on the first image by using a visual odometer.
Optionally, the method further includes:
determining a navigation route in the target space based on the target pose information of the image acquisition device and a destination input by a user.
According to another aspect of the embodiments of the present disclosure, there is provided a spatial scene positioning device, including:
an image acquisition module, configured to acquire images in a target space by using an image acquisition device to obtain a first video stream, wherein the first video stream comprises at least one frame of first image;
an image matching module, configured to determine, from a model image library, a plurality of second images matching a first image in the first video stream based on the first image, wherein the model image library stores a three-dimensional model corresponding to the target space and a plurality of spatial two-dimensional images with color information;
a point pair matching module, configured to determine a plurality of pairs of first matching points based on the plurality of second images and the first image, wherein each second image corresponds to at least one pair of first matching points, and each pair of first matching points comprises a feature point in the first image and a feature point in one second image;
a pose estimation module, configured to determine multiple sets of first pose information corresponding to the image acquisition device based on the plurality of pairs of first matching points and the three-dimensional model;
and a verification screening module, configured to verify and screen the multiple sets of first pose information and determine target pose information of the image acquisition device.
Optionally, the apparatus further comprises:
the preprocessing module is used for preprocessing the three-dimensional model corresponding to the target space and the plurality of space two-dimensional images and determining description characteristic information and three-dimensional coordinate information corresponding to the plurality of characteristic points.
Optionally, the preprocessing module is specifically configured to perform feature extraction on the plurality of spatial two-dimensional images by using at least one feature extraction network, so as to obtain descriptive feature information corresponding to each feature point in the plurality of feature points; and determining point cloud data corresponding to the target space based on the three-dimensional model, and determining three-dimensional space information corresponding to each of the characteristic points based on the point cloud data.
Optionally, the image matching module is specifically configured to perform feature extraction on the first image to obtain a first image feature; and matching the first image features with the spatial image features corresponding to the plurality of spatial two-dimensional images prestored in the model image library, and determining the plurality of second images based on the matching result.
Optionally, the point pair matching module includes:
the initial matching unit is used for performing feature point matching on the first image and each of the plurality of second images to obtain a plurality of pairs of second matching point pairs;
and the screening processing unit is used for screening the plurality of pairs of second matching point pairs to obtain a plurality of pairs of first matching point pairs.
Optionally, the screening processing unit is specifically configured to: determine, for each of the plurality of pairs of second matching points, second pose information corresponding to each second image based on the two-dimensional plane information of the feature point in the first image and the three-dimensional spatial information of the feature point in the second image; screen the plurality of pieces of second pose information based on the gravity direction information corresponding to the image acquisition device to obtain at least one piece of screened second pose information; and determine the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one piece of screened second pose information.
Optionally, the screening processing unit is specifically configured to: determine, for each of the plurality of pairs of second matching points, second pose information corresponding to each second image based on the two-dimensional plane information of the feature point in the first image and the three-dimensional spatial information of the feature point in the second image; screen the plurality of pieces of second pose information by using a random sample consensus (RANSAC) algorithm based on at least one kind of prior information to obtain at least one piece of screened second pose information; and determine the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one piece of screened second pose information.
Optionally, the pose estimation module is specifically configured to: determine, based on each pair of first matching points, a plurality of pairs of third matching points corresponding to that pair; judge, by a bundle adjustment method, whether the plurality of pairs of first matching points and the plurality of pairs of third matching points are correct, and determine a plurality of pairs of correctly matched fourth matching points according to the judging result; and determine the at least one set of first pose information corresponding to the image acquisition device based on the pairs of fourth matching points.
Optionally, the verification screening module is specifically configured to perform verification screening on the at least one set of first pose information based on at least one priori information, and determine target pose information of the image acquisition device; wherein the a priori information includes at least one of: the gravity direction of the image acquisition equipment, the position information of the preset positioning equipment and the third pose information of the image acquisition equipment, which is determined by utilizing a visual odometer based on the first image.
Optionally, the apparatus further comprises:
and the navigation module is used for determining a navigation route in the target space based on the target pose information of the image acquisition equipment and the destination input by the user.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including:
a memory for storing a computer program product;
a processor configured to execute the computer program product stored in the memory, and when executed, implement the method according to any one of the embodiments.
According to a further aspect of the disclosed embodiments, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method according to any of the above embodiments.
According to a further aspect of the disclosed embodiments, there is provided a computer program product comprising computer program instructions which, when executed by a processor, implement the method of any of the embodiments described above.
The embodiments of the present disclosure provide a spatial scene positioning method and apparatus, an electronic device, and a storage medium. The method includes: acquiring images in a target space by using an image acquisition device to obtain a first video stream; determining, from a model image library, a plurality of second images matching a first image in the first video stream based on the first image, wherein the model image library stores a plurality of three-dimensional models corresponding to the target space and a plurality of spatial two-dimensional images with color information, and each three-dimensional model corresponds to at least one spatial two-dimensional image; determining at least one three-dimensional model corresponding to the first image based on the plurality of second images, and determining at least one set of first pose information corresponding to the image acquisition device based on the at least one three-dimensional model and the first image; and verifying and screening the at least one set of first pose information to determine target pose information of the image acquisition device. By obtaining the first image in the target space and combining image matching and feature point matching with the three-dimensional information in the known three-dimensional model, the method can determine the three-dimensional information corresponding to the feature points in the two-dimensional first image, determine at least one set of first pose information, and then determine the target pose information through verification and screening, which improves the accuracy of the determined target pose information.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a method for spatially scene localization according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of step 104 in the embodiment of FIG. 1 of the present disclosure;
FIG. 3 is a schematic flow chart of step 106 in the embodiment of FIG. 1 of the present disclosure;
FIG. 4 is a schematic flow chart of step 108 in the embodiment of FIG. 1 of the present disclosure;
FIG. 5 is a schematic diagram of a spatial scene locating device according to an exemplary embodiment of the present disclosure;
fig. 6 illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the front and rear association objects are an or relationship. The data referred to in this disclosure may include unstructured data, such as text, images, video, and the like, as well as structured data.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart of a spatial scene positioning method according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, as shown in fig. 1, and includes the following steps:
Step 102: perform image acquisition in a target space by using an image acquisition device to obtain a first video stream.
Wherein the first video stream comprises at least one frame of the first image.
In this embodiment, the target space may be any scene space, for example, a parking lot, a shopping mall, a factory, or a hospital; all that is needed is a corresponding three-dimensional space model, and the embodiment does not limit the application scene. The image acquisition device may be any device having an image acquisition function, for example, a mobile phone, a video camera, or a camera. Optionally, the user carries the image acquisition device into the target space and captures images of the current scene; as the user moves through the target space, images can be captured continuously to obtain a first video stream. Since the first video stream consists of at least one frame of first image, any frame in the video stream can be taken as the first image, and the pose of the image acquisition device at the moment the first image was captured is then determined, that is, the device is positioned within the target space. This embodiment is explained by taking one first image as an example; it will be understood that other frame images are processed similarly.
Step 104: determine, based on one frame of first image in the first video stream, a plurality of second images matching the first image from a model image library.
The model image library stores a three-dimensional model corresponding to the target space and a plurality of space two-dimensional images with color information.
In this embodiment, the target space is a known space. Before the embodiment is implemented, three-dimensional information and two-dimensional color information are acquired in the target space; by making the acquisition points of the three-dimensional information and the two-dimensional information coincide, the three-dimensional information of each point in the target space can be associated with its two-dimensional information, that is, both the three-dimensional information and the two-dimensional information of each point are known in the three-dimensional model. To facilitate image retrieval, the spatial two-dimensional images are stored independently in a model image library, together with the correspondence between the spatial two-dimensional images and the three-dimensional model, so that the three-dimensional information of each point in a spatial two-dimensional image can be determined in the corresponding three-dimensional model.
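As an illustration of this correspondence, the following is a minimal sketch of how one entry of such a model image library might be organized; the names and structure are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LibraryEntry:
    """One spatial 2D image registered against the 3D model (illustrative)."""
    image_id: str
    global_feature: np.ndarray   # whole-image descriptor used for retrieval
    keypoints_2d: np.ndarray     # (N, 2) pixel coordinates of feature points
    descriptors: np.ndarray      # (N, D) per-feature description information
    points_3d: np.ndarray        # (N, 3) coordinates of the same points in the model

# Every 2D feature point carries its known 3D position in the model, so a
# 2D-2D match against a query image immediately yields 2D-3D correspondences.
model_image_library = {}  # image_id -> LibraryEntry
```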
Step 106: determine a plurality of pairs of first matching points based on the plurality of second images and the first image.
Wherein each pair of first matching points includes a feature point in one first image and a feature point in one second image.
Optionally, after the two-dimensional image retrieval, some of the obtained second images will inevitably be mismatches. In this embodiment, a plurality of pairs of first matching points are therefore obtained by matching feature points between the second images and the first image; a second image that yields no first matching point pair is thereby revealed as a mismatch and is filtered out. Screening the plurality of second images through feature point matching in this way improves the accuracy of the retained second images.
Step 108: determine multiple sets of first pose information corresponding to the image acquisition device based on the plurality of pairs of first matching points and the three-dimensional model.
Optionally, based on the three-dimensional information of the three-dimensional model, the two-dimensional information and the three-dimensional information corresponding to each point in the first image can be determined. From the two-dimensional and three-dimensional information of these points, at least one set of first pose information can be determined using an existing pose estimation method. Each set of first pose information contains 6 degrees of freedom (3 translational and 3 rotational, corresponding to the x, y, and z axes), so the position of the image acquisition device can be determined from the translational degrees of freedom, achieving an initial positioning. Because the pose information is determined from the two-dimensional information and the three-dimensional spatial information of the screened first matching point pairs, the accuracy of the first pose information is effectively improved.
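As a concrete example of such a pose estimation method, here is a minimal sketch using OpenCV's RANSAC PnP solver; the function name and parameters are assumptions for illustration, not the patent's prescribed implementation.

```python
import cv2
import numpy as np

def estimate_pose(points_2d, points_3d, K):
    """Recover the 6-DoF camera pose from matched 2D-3D points (a PnP sketch).

    points_2d: (N, 2) float64 pixel coordinates in the first image.
    points_3d: (N, 3) float64 model coordinates of the matched feature points.
    K: (3, 3) camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
    if not ok:
        raise RuntimeError("PnP failed: not enough consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)               # 3 rotational degrees of freedom
    camera_position = (-R.T @ tvec).ravel()  # position from the 3 translational DoF
    return R, tvec, camera_position
```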
Step 110: verify and screen the at least one set of first pose information to determine the target pose information of the image acquisition device.
In this embodiment, prior information with a verification function may be obtained in at least one way, and the at least one set of first pose information is verified and screened based on that prior information; this improves the accuracy of the target pose information, so that the final positioning result achieves a better positioning effect.
The spatial scene positioning method provided by the above embodiment of the present disclosure includes: acquiring images in a target space by using an image acquisition device to obtain a first video stream; determining, from a model image library, a plurality of second images matching a first image in the first video stream based on the first image, wherein the model image library stores a plurality of three-dimensional models corresponding to the target space and a plurality of spatial two-dimensional images with color information, and each three-dimensional model corresponds to at least one spatial two-dimensional image; determining at least one three-dimensional model corresponding to the first image based on the plurality of second images, and determining at least one set of first pose information corresponding to the image acquisition device based on the at least one three-dimensional model and the first image; and verifying and screening the at least one set of first pose information to determine target pose information of the image acquisition device. By obtaining the first image in the target space and combining image matching and feature point matching with the three-dimensional information in the known three-dimensional model, the method can determine the three-dimensional information corresponding to the feature points in the two-dimensional first image, determine at least one set of first pose information, and then determine the target pose information through verification and screening, which improves the accuracy of the determined target pose information.
In some alternative embodiments, before performing step 104, it may further include:
preprocessing a three-dimensional model corresponding to the target space and a plurality of space two-dimensional images, and determining description characteristic information and three-dimensional coordinate information corresponding to a plurality of characteristic points.
The three-dimensional model provided in this embodiment mainly includes color information and geometric information. The color information helps match against the video stream used for positioning, while the geometric information provides spatial positions. Data preprocessing mainly extracts, from the redundant data, the color features of feature points that can be matched efficiently (description feature information extracted from the two-dimensional plane images) and their corresponding spatial positions (three-dimensional point coordinates, i.e., the coordinates of each feature point along the x, y, and z axes of the three-dimensional model). This embodiment sparsifies the points in the three-dimensional model and extracts only a subset of representative points as feature points (for example, corner points, or feature points of different categories obtained from a neural network classification).
Optionally, the preprocessing of the three-dimensional model corresponding to the target space and the plurality of spatial two-dimensional images to determine the description feature information and three-dimensional coordinate information corresponding to the plurality of feature points includes:
performing feature extraction on the plurality of spatial two-dimensional images by using at least one feature extraction network to obtain description feature information corresponding to each of the plurality of feature points, where the description feature information may include, but is not limited to, high-dimensional non-interpretable features (for example, feature vectors output by a certain layer of a neural network that describe an image, similar to a feature fingerprint), low-dimensional manually defined features, category features (information such as a table, a sofa, stairs, or a KFC logo), semantic features, and the like;
and determining point cloud data corresponding to the target space based on the three-dimensional model, and determining the three-dimensional spatial information corresponding to each of the feature points based on the point cloud data.
When the point cloud data is recovered from the three-dimensional model in this embodiment, interpolation and outlier-rejection processing ensure the accuracy of the point cloud data, which improves the accuracy of the three-dimensional spatial information of the feature points and avoids cases where the three-dimensional spatial information cannot be acquired. Optionally, the spatial position of a feature point can be obtained from the model surface of the three-dimensional model; however, for special objects (identified by semantic segmentation, such as mirror surfaces or leaves: mirrors can cause semantic segmentation errors, and leaves can occlude objects), labeling in advance improves the accuracy of the three-dimensional spatial information.
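A hedged sketch of this preprocessing step follows, with ORB standing in for the learned feature extraction network and a depth map rendered from the three-dimensional model standing in for the 2D-to-3D registration; both stand-ins are assumptions for illustration.

```python
import cv2
import numpy as np

def preprocess_image(image_bgr, depth, K):
    """Detect sparse feature points in one spatial 2D image, compute their
    descriptors, and recover each point's 3D coordinates from a depth map
    rendered from the 3D model (an assumed intermediate representation)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)      # sparse, representative points only
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    kept, points_3d = [], []
    for i, kp in enumerate(keypoints):
        u, v = kp.pt
        z = float(depth[int(v), int(u)])      # model depth at this pixel
        if z <= 0:                            # no model surface here; skip
            continue
        # Back-project into the camera frame; the known camera-to-model
        # transform from this image's registration maps these coordinates
        # into model/world coordinates.
        points_3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        kept.append(i)
    return ([keypoints[i] for i in kept],
            descriptors[kept],
            np.asarray(points_3d))
```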
As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, step 104 may include the following steps:
Step 1041: perform feature extraction on the first image to obtain first image features.
Optionally, feature extraction may be performed on the first image through at least one neural network corresponding to the networks used to extract features from the spatial two-dimensional images during preprocessing, for example a classification network or a segmentation network; the obtained first image features enable fast image retrieval.
Step 1042: match the first image features with the spatial image features corresponding to the plurality of spatial two-dimensional images prestored in the model image library, and determine the plurality of second images based on the matching result.
This embodiment likewise describes an image as a feature vector (corresponding to the first image features). A plurality of second images are determined according to the similarity between the first image features and the spatial image features stored in the model image library as feature vectors (the similarity may be computed from the distance between feature vectors). Optionally, the spatial two-dimensional images with the greatest similarity are taken as the second images after sorting by similarity, or a similarity threshold is set (its specific value depends on the actual application scene) and every spatial two-dimensional image whose similarity exceeds the threshold is taken as a second image.
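A minimal retrieval sketch under the assumptions above (one global feature vector per image, cosine similarity as the distance measure, and both selection rules: top-k ranking with an optional similarity threshold):

```python
import numpy as np

def retrieve_second_images(query_feature, library_features, image_ids,
                           top_k=5, min_similarity=None):
    """Rank library images by cosine similarity to the query's feature vector."""
    q = query_feature / np.linalg.norm(query_feature)
    lib = library_features / np.linalg.norm(library_features, axis=1, keepdims=True)
    similarity = lib @ q                      # cosine similarity per library image
    order = np.argsort(-similarity)[:top_k]   # most similar first
    if min_similarity is not None:            # optional threshold rule
        order = [i for i in order if similarity[i] >= min_similarity]
    return [image_ids[i] for i in order]
```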
As shown in Fig. 3, on the basis of the embodiment shown in Fig. 1, step 106 may include the following steps:
Step 1061: perform feature point matching on the first image and each of the plurality of second images to obtain a plurality of pairs of second matching points.
Optionally, to improve the matching accuracy, this embodiment performs feature point matching between each second image and the first image and uses the matched feature point pairs to determine the pose information of the image acquisition device more accurately; this is possible because the description feature information and three-dimensional coordinate information of the feature points of every spatial two-dimensional image were determined during preprocessing.
Step 1062: screen the plurality of pairs of second matching points to obtain the plurality of pairs of first matching points.
In this embodiment, each feature point is represented by its position in the image together with a feature descriptor (computed from the image points within a set range around the feature point); matching feature points using both the positions and the descriptors improves the accuracy of feature point matching. Through feature point matching, this embodiment obtains a large set of second matching point pairs between the first image and the second images. To improve the accuracy of the matched feature point pairs, the second matching point pairs are then screened, for example by removing a portion of the erroneous matches using prior information.
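A descriptor-matching sketch assuming binary descriptors; the ratio test shown here is one common way to remove a portion of the erroneous matches, and it precedes (not replaces) the prior-information screening described next.

```python
import cv2

def match_feature_points(desc_first, desc_second, ratio=0.75):
    """Match feature points between the first image and one second image,
    keeping a candidate only when its best match is clearly better than its
    second-best match (Lowe's ratio test)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)   # Hamming norm for binary descriptors
    candidates = matcher.knnMatch(desc_first, desc_second, k=2)
    second_matching_pairs = []
    for pair in candidates:
        if len(pair) < 2:
            continue
        best, second_best = pair
        if best.distance < ratio * second_best.distance:
            second_matching_pairs.append((best.queryIdx, best.trainIdx))
    return second_matching_pairs
```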
Optionally, in some alternative embodiments, step 1062 may include:
determining second pose information corresponding to each second image based on the two-dimensional plane information of the feature points corresponding to the first image and the three-dimensional space information corresponding to the feature points of the second image for each of the plurality of pairs of second matching points;
screening the plurality of second pose information based on the gravity direction information corresponding to the image acquisition equipment to obtain at least one screened second pose information;
a plurality of pairs of first matching points are determined based on at least one pair of second matching points corresponding to the second pose information.
In this embodiment, a second matching point pair consists of a feature point in the first image and a feature point in the second image: the feature point in the first image provides two-dimensional information, and the feature point in the second image provides the corresponding three-dimensional information through the three-dimensional model associated with that second image. Based on the two-dimensional and three-dimensional information of the matched feature points, the second pose information corresponding to each second image can be determined by an existing extrinsic calibration method (for example, a PnP algorithm). An image acquisition device is usually equipped with a gravity meter, so its gravity direction can be determined. In this embodiment the gravity direction serves as prior information for screening the computed second pose information: the degree of freedom representing the gravity direction in each second pose is matched against the measured gravity direction information, and only the second pose information consistent with the gravity direction is retained. Relatively accurate second pose information is thus obtained through the prior information of the image acquisition device.
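A sketch of the gravity-direction screening under assumed conventions: the model's "down" axis is -z, each pose (R, t) maps model coordinates to camera coordinates, and the 5-degree tolerance is an illustrative choice.

```python
import numpy as np

def filter_poses_by_gravity(poses, gravity_device,
                            gravity_model=np.array([0.0, 0.0, -1.0]),
                            max_angle_deg=5.0):
    """Keep only pose hypotheses whose rotation maps the model's known 'down'
    axis onto the gravity direction measured by the device's gravity meter."""
    kept = []
    g_dev = gravity_device / np.linalg.norm(gravity_device)
    for R, t in poses:                  # R, t: model-to-camera transform
        g_pred = R @ gravity_model      # where this pose says gravity should point
        cos_angle = np.clip(np.dot(g_pred, g_dev), -1.0, 1.0)
        if np.degrees(np.arccos(cos_angle)) <= max_angle_deg:
            kept.append((R, t))
    return kept
```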
Optionally, in other alternative embodiments, step 1062 may include:
determining, for each of the plurality of pairs of second matching points, second pose information corresponding to each second image based on the two-dimensional plane information of the feature point in the first image and the three-dimensional spatial information of the feature point in the second image;
screening the plurality of pieces of second pose information by using a random sample consensus (RANSAC) algorithm based on at least one kind of prior information to obtain at least one piece of screened second pose information;
a plurality of pairs of first matching points are determined based on at least one pair of second matching points corresponding to the second pose information.
The method for determining the plurality of pieces of second pose information in this embodiment is the same as in the above embodiment and is not repeated here. For screening the second pose information, this embodiment adopts a random sample consensus (RANSAC) algorithm: several groups of matches are randomly extracted, a solution of the mathematical model is computed (here, the spatial pose), the inliers and outliers over the whole set are checked, and the process is repeated many times to obtain the best solution. Because RANSAC depends on this repeated sampling and consensus checking, it requires a large amount of computation. To accelerate the screening and improve its accuracy, this embodiment proposes a RANSAC scheme based on at least one kind of prior information, which may include, but is not limited to:
1. Gravity direction. After RANSAC extracts a subset, the dimensionality of the mathematical model to be solved decreases from six degrees of freedom (three-dimensional position plus three rotational components) to four degrees of freedom (three-dimensional position plus one rotation around the gravity direction), because the gravity direction is known. The fewer the degrees of freedom, the fewer the variables to solve, so the solution is more stable and faster.
2. Auxiliary positioning tools, such as Bluetooth beacons and/or WiFi beacons, that stabilize the translation (the tools are set at preset positions in the target space in advance, and the position of each tool is known). After RANSAC extracts a subset, the mathematical model still has six degrees of freedom, but since Bluetooth and WiFi beacons provide a roughly accurate spatial position, the result of each round of RANSAC solving must pass two verifications: a) there are enough inliers; b) the solved spatial position agrees with the prior to a certain extent (the error of Bluetooth and WiFi beacons is relatively large, generally about 0.5 meters). A more stable solving result is finally obtained.
3. Visual odometry as a prior (the pose information of the image acquisition device estimated from the captured first images). Pose estimation by a visual odometer over a short time is usually accurate (even if visual information is lost, short-term IMU integration is not a big problem). If the pose of a previous frame has been solved, then previous pose + visual odometry = pose of the current frame. Accepting a certain deviation, the current matching result can be filtered within a threshold: if covariance matrices of multiple sensors such as vision and IMU are available, the filtering threshold is defined based on the covariance information; otherwise, the threshold is defined by the timestamp distance (the closer in time, the smaller the odometry drift error) to filter out outliers.
The prior information in this embodiment can be acquired by devices built into or arranged around the image acquisition device, such as a laser, a gravity meter, or an inertial sensor, enabling preliminary positioning in an unknown space.
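A sketch of the prior-based RANSAC variant using the beacon prior (item 2 above). The 0.5 m tolerance follows the text; the subset size, iteration count, and reprojection threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def ransac_pnp_with_beacon_prior(points_2d, points_3d, K, beacon_position,
                                 beacon_tolerance=0.5, iterations=200,
                                 reproj_threshold=3.0):
    """RANSAC loop where every candidate pose must (a) explain enough inliers
    and (b) place the camera within a tolerance of the Bluetooth/WiFi beacon
    position. Requires at least 6 correspondences (float64 arrays)."""
    rng = np.random.default_rng(0)
    best, best_inliers = None, 0
    n = len(points_2d)
    for _ in range(iterations):
        idx = rng.choice(n, size=6, replace=False)    # small random subset
        ok, rvec, tvec = cv2.solvePnP(points_3d[idx], points_2d[idx], K, None)
        if not ok:
            continue
        # Verification (b): camera position must agree with the beacon prior.
        R, _ = cv2.Rodrigues(rvec)
        cam_pos = (-R.T @ tvec).ravel()
        if np.linalg.norm(cam_pos - beacon_position) > beacon_tolerance:
            continue
        # Verification (a): count inliers by reprojection error over the whole set.
        proj, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
        err = np.linalg.norm(proj.reshape(-1, 2) - points_2d, axis=1)
        inliers = int((err < reproj_threshold).sum())
        if inliers > best_inliers:
            best, best_inliers = (rvec, tvec), inliers
    return best, best_inliers
```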
As shown in Fig. 4, on the basis of the embodiment shown in Fig. 1, step 108 may include the following steps:
Step 1081: determine, based on each pair of first matching points, a plurality of pairs of third matching points corresponding to that pair.
Step 1082: judge, by a bundle adjustment method, whether the pairs of first matching points and the pairs of third matching points are correct, and determine the pairs of correctly matched fourth matching points according to the judging result.
Step 1083: determine at least one set of first pose information corresponding to the image acquisition device based on the pairs of fourth matching points.
In this embodiment, to improve the accuracy of the first pose information, a preset number of point pairs are collected around each first matching point pair as third matching point pairs; aggregating the third matching point pairs enables a stable and accurate determination of the first pose information. The bundle adjustment method treats the camera pose and the three-dimensional coordinates of the measured points as unknown parameters and uses the coordinates of the feature points detected in the images, which serve as observations for forward intersection, to optimally adjust the camera parameters and the world point coordinates. What this embodiment solves is whether the first and third matching point pairs across the multiple images are correct, together with the position of the images in the spatial model.
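A simplified single-pose sketch of this judgment: refine the pose against all first/third point pairs by minimizing reprojection error, then keep as fourth matching point pairs only those with small residuals. A full bundle adjustment additionally refines the 3D points over several images; this reduction is an assumption for illustration.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def verify_pairs_bundle_style(points_2d, points_3d, K, rvec0, tvec0,
                              reproj_threshold=2.0):
    """Refine the camera pose over all point pairs by minimizing reprojection
    error, then mark each pair correct or incorrect from its residual."""
    def residuals(x):
        rvec, tvec = x[:3].reshape(3, 1), x[3:].reshape(3, 1)
        proj, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
        return (proj.reshape(-1, 2) - points_2d).ravel()

    x0 = np.concatenate([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0)            # nonlinear least-squares refinement
    err = np.linalg.norm(sol.fun.reshape(-1, 2), axis=1)
    correct = err < reproj_threshold              # per-pair judging result
    return sol.x[:3], sol.x[3:], correct
```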
In some alternative embodiments, step 110 may include:
verifying and screening the at least one set of first pose information based on at least one kind of prior information, and determining the target pose information of the image acquisition device.
The prior information includes, but is not limited to, at least one of: the gravity direction of the image acquisition device, the position information of preset positioning devices, and the third pose information of the image acquisition device determined based on the first image by using a visual odometer.
In this embodiment, after at least one set of first pose information has been solved, each set is verified and screened against prior information in order to obtain more accurate target pose information. The prior information can be obtained from known hardware: the gravity direction of the image acquisition device is read from its built-in gravity meter; several Bluetooth beacons and/or WiFi beacons placed at known positions assist in localizing the spatial position; and the third pose information estimated by a visual odometer verifies whether the first pose information is accurate. The first pose information that passes verification by at least one kind of prior information is taken as the target pose information, completing the positioning of the image acquisition device in the target space. This embodiment thereby achieves efficient positioning, including positioning-correction and relocalization functions: relocalization runs during initialization of the whole system and whenever positioning is lost, while positioning correction repairs the drift of the front-end system during long-distance operation in navigation.
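A compact verification sketch combining the three kinds of prior information; the tolerances and the callback interface are illustrative assumptions.

```python
import numpy as np

def verify_first_pose(pose, gravity_check, beacon_position=None,
                      vo_pose=None, beacon_tol=0.5, vo_tol=0.3):
    """Accept or reject one set of first pose information against the priors.

    gravity_check: callable implementing the gravity test sketched earlier.
    beacon_position: beacon-derived position of the device, if available.
    vo_pose: pose predicted as previous pose + visual odometry, if available.
    """
    R, t = pose
    cam_pos = (-R.T @ t).ravel()
    if not gravity_check(R):
        return False
    if beacon_position is not None and \
            np.linalg.norm(cam_pos - beacon_position) > beacon_tol:
        return False
    if vo_pose is not None:
        R_vo, t_vo = vo_pose
        if np.linalg.norm(cam_pos - (-R_vo.T @ t_vo).ravel()) > vo_tol:
            return False
    return True
```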
In some optional embodiments, the method provided in this embodiment further includes:
a navigation route in the target space is determined based on the target pose information of the image capturing device and the destination input by the user.
After the image acquisition device has been positioned, its location in the three-dimensional model is known, and this location can be used in AR positioning and navigation to guide the user. Besides navigating to a destination entered by the user, a route can also be provided toward an object the user specifies. This overcomes the problem of users failing to reach a destination because the target space is unfamiliar, and the embodiment can be applied in many scenes (for example, shopping malls, factories, and hospitals).
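A route-planning sketch assuming a walkable graph has been extracted from the three-dimensional model; node placement, edge weights, and the use of networkx are assumptions, since the patent does not specify a particular planner.

```python
import networkx as nx

def plan_route(walkable_graph, current_position, destination):
    """Shortest-path navigation over a walkable graph whose nodes are model
    coordinates; destination is a node of the graph (e.g., the model
    coordinates of a merchant)."""
    # Snap the positioned device to the nearest graph node.
    start = min(walkable_graph.nodes,
                key=lambda n: sum((a - b) ** 2 for a, b in zip(n, current_position)))
    return nx.shortest_path(walkable_graph, source=start, target=destination,
                            weight="weight")
```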
Any of the spatial scene positioning methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including, but not limited to, terminal devices and servers. Alternatively, any of the spatial scene positioning methods provided by the embodiments of the present disclosure may be executed by a processor, for example by the processor invoking corresponding instructions stored in a memory. This will not be repeated below.
Exemplary apparatus
Fig. 5 is a schematic structural diagram of a spatial scene positioning device according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, the device provided in this embodiment includes:
the image acquisition module 51 is configured to acquire an image by using an image acquisition device in a target space, and obtain a first video stream. Wherein the first video stream comprises at least one frame of the first image.
An image matching module 52 for determining a plurality of second images matching the first image from the model image library based on a frame of the first image in the first video stream.
The model image library stores a three-dimensional model corresponding to the target space and a plurality of space two-dimensional images with color information.
The point pair matching module 53 is configured to determine a plurality of pairs of first matching point pairs based on the plurality of second images and the first image.
Each second image corresponds to at least one pair of first matching points, and each pair of first matching points comprises a characteristic point in one first image and a characteristic point in one second image.
The pose estimation module 54 is configured to determine multiple sets of first pose information corresponding to the image capturing device based on the multiple pairs of first matching point pairs and the three-dimensional model.
And the verification and screening module 55 is used for carrying out verification and screening on the multiple groups of first pose information and determining target pose information of the image acquisition equipment.
With the spatial scene positioning device provided by the above embodiment of the present disclosure: images are acquired in a target space by an image acquisition device to obtain a first video stream; a plurality of second images matching a first image in the first video stream are determined from a model image library based on the first image, wherein the model image library stores a plurality of three-dimensional models corresponding to the target space and a plurality of spatial two-dimensional images with color information, and each three-dimensional model corresponds to at least one spatial two-dimensional image; at least one three-dimensional model corresponding to the first image is determined based on the plurality of second images, and at least one set of first pose information corresponding to the image acquisition device is determined based on the at least one three-dimensional model and the first image; and the at least one set of first pose information is verified and screened to determine the target pose information of the image acquisition device. By obtaining the first image in the target space and combining image matching and feature point matching with the three-dimensional information in the known three-dimensional model, the three-dimensional information corresponding to the feature points in the two-dimensional first image can be determined, at least one set of first pose information can be determined, and the target pose information is then determined through verification and screening, which improves the accuracy of the determined target pose information.
In some optional embodiments, the apparatus provided in this embodiment further includes:
the preprocessing module is used for preprocessing the three-dimensional model corresponding to the target space and the plurality of space two-dimensional images and determining description characteristic information and three-dimensional coordinate information corresponding to the plurality of characteristic points.
Optionally, the preprocessing module is specifically configured to perform feature extraction on the plurality of spatial two-dimensional images by using at least one feature extraction network, so as to obtain descriptive feature information corresponding to each feature point in the plurality of feature points; and determining point cloud data corresponding to the target space based on the three-dimensional model, and determining three-dimensional space information corresponding to each of the plurality of characteristic points based on the point cloud data.
In some alternative embodiments, the image matching module 52 is specifically configured to perform feature extraction on the first image to obtain a first image feature; and matching the first image features with the spatial image features corresponding to the plurality of spatial two-dimensional images prestored in the model image library, and determining a plurality of second images based on the matching result.
In some alternative embodiments, the point pair matching module 53 includes:
the initial matching unit is used for performing characteristic point matching on the first image and each of the plurality of second images to obtain a plurality of pairs of second matching point pairs;
And the screening processing unit is used for screening the plurality of pairs of second matching point pairs to obtain a plurality of pairs of first matching point pairs.
In some optional embodiments, the screening processing unit is specifically configured to determine, for each of the plurality of pairs of second matching points, second pose information corresponding to each of the second images based on two-dimensional plane information of the feature points corresponding to the first image and three-dimensional space information corresponding to the feature points of the second image; screening the plurality of second pose information based on the gravity direction information corresponding to the image acquisition equipment to obtain at least one screened second pose information; a plurality of pairs of first matching points are determined based on at least one pair of second matching points corresponding to the second pose information.
In other optional embodiments, the screening processing unit is specifically configured to: determine, for the second matching point pairs corresponding to each second image, second pose information corresponding to that second image based on the two-dimensional plane information of the feature points in the first image and the three-dimensional space information of the feature points in the second image; screen the plurality of second pose information by using a random sample consensus (RANSAC) algorithm based on at least one item of a priori information to obtain at least one screened second pose information; and determine the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one screened second pose information.
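The RANSAC variant can be sketched with OpenCV's built-in robust PnP solver, which hypothesizes poses from random minimal subsets and keeps only the point pairs consistent with the best hypothesis; the reprojection threshold and iteration count here are illustrative assumptions.

```python
import cv2
import numpy as np

def screen_with_ransac(pts_2d, pts_3d, K, reproj_err_px=3.0):
    """Robustly estimate a pose and return it together with the indices of
    the matching point pairs that survive as inliers."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, None,
        iterationsCount=200, reprojectionError=reproj_err_px)
    if not ok or inliers is None:
        return None, None
    # Surviving indices correspond to the retained first matching point pairs.
    return (rvec, tvec), inliers.ravel()
```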
In some alternative embodiments, the pose estimation module 54 is specifically configured to: determine, based on each pair of the plurality of pairs of first matching points, a plurality of pairs of third matching points corresponding to that pair; judge, by a bundle adjustment method, whether the plurality of pairs of first matching point pairs and the plurality of pairs of third matching point pairs are correctly matched, and determine a plurality of pairs of correctly matched fourth matching point pairs according to the judging result; and determine at least one set of first pose information corresponding to the image acquisition device based on the plurality of pairs of fourth matching point pairs.
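One way to read this bundle adjustment step is as a single-camera pose refinement that minimizes reprojection error and then classifies point pairs by their residuals. The sketch below, using SciPy's Levenberg-Marquardt solver, is an interpretation under that assumption, not the disclosed implementation; the 2-pixel residual threshold is likewise an assumption.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def refine_and_filter(pts_2d, pts_3d, K, rvec0, tvec0, max_err_px=2.0):
    """Refine a pose by minimizing reprojection error, then keep as 'fourth
    matching point pairs' only the pairs whose residual stays small."""
    def residuals(x):
        proj, _ = cv2.projectPoints(pts_3d, x[:3], x[3:], K, None)
        return (proj.reshape(-1, 2) - pts_2d).ravel()

    x0 = np.hstack([rvec0.ravel(), tvec0.ravel()])
    result = least_squares(residuals, x0, method="lm")
    err = np.linalg.norm(residuals(result.x).reshape(-1, 2), axis=1)
    keep = err < max_err_px  # per-pair correctness verdict
    return result.x[:3], result.x[3:], keep
```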
In some optional embodiments, the verification screening module 55 is specifically configured to perform verification screening on the at least one set of first pose information based on at least one item of a priori information, and determine target pose information of the image acquisition device; wherein the a priori information includes at least one of: the gravity direction of the image acquisition device, the position information of a preset positioning device, and third pose information of the image acquisition device determined from the first image by using a visual odometer.
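A hedged sketch of this verification screening, assuming candidate poses arrive as (rvec, tvec) pairs from PnP, the positioning-device prior is a world-frame position (e.g. a Bluetooth beacon), and the visual-odometry prior is a position; the 1-metre tolerance and the nearest-to-odometry tie-break are assumptions.

```python
import cv2
import numpy as np

def select_target_pose(candidates, beacon_position=None, vo_position=None,
                       max_pos_err_m=1.0):
    """Discard candidate first-pose estimates that disagree with available
    priors and return the survivor closest to the odometry estimate."""
    survivors = []
    for rvec, tvec in candidates:
        R, _ = cv2.Rodrigues(rvec)
        center = (-R.T @ tvec).ravel()  # camera position in world coordinates
        if beacon_position is not None and \
                np.linalg.norm(center - beacon_position) > max_pos_err_m:
            continue  # inconsistent with the preset positioning device
        survivors.append((rvec, tvec, center))
    if not survivors:
        return None
    if vo_position is not None:
        survivors.sort(key=lambda s: np.linalg.norm(s[2] - vo_position))
    rvec, tvec, _ = survivors[0]
    return rvec, tvec
```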
In some optional embodiments, the apparatus provided in this embodiment further includes:
and the navigation module is used for determining a navigation route in the target space based on the target pose information of the image acquisition device and the destination input by the user.
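As a toy illustration of the navigation module, assuming the walkable area of the target space has been discretized into an adjacency graph (this discretization is not described in the disclosure), a breadth-first search can recover a route from the localized pose to the destination:

```python
from collections import deque

def navigation_route(graph, start_node, destination):
    """Breadth-first search over a walkable-area graph; `graph` maps each
    node to its neighbours, and start_node is derived from the target pose."""
    queue, came_from = deque([start_node]), {start_node: None}
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:  # walk the parent links back to the start
                route.append(node)
                node = came_from[node]
            return route[::-1]
        for neighbour in graph[node]:
            if neighbour not in came_from:
                came_from[neighbour] = node
                queue.append(neighbour)
    return None  # destination unreachable from the current pose
```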
Exemplary electronic device
Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 6. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.
Fig. 6 illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 6, the electronic device includes one or more processors and memory.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.
The memory may store one or more computer program products, and may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program products may be stored on the computer-readable storage medium and run by the processor to implement the spatial scene positioning methods of the various embodiments of the present disclosure described above and/or other desired functions.
In one example, the electronic device may further include: input devices and output devices, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input device may include, for example, a keyboard, a mouse, and the like.
The output device may output various information including the determined distance information, direction information, etc., to the outside. The output device may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 6, with components such as buses, input/output interfaces, etc. omitted for simplicity. In addition, the electronic device may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the spatial scene positioning method according to the various embodiments of the present disclosure described above in this specification.
The computer program product may include program code for performing the operations of embodiments of the present disclosure, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in the spatial scene positioning method according to the various embodiments of the present disclosure described above in this specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, the embodiments are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments; for the same or similar parts among the embodiments, reference may be made to one another. Since the system embodiments essentially correspond to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the description of the method embodiments.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and may be used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (13)

1. A spatial scene positioning method, comprising:
performing image acquisition in a target space by using image acquisition equipment to obtain a first video stream; wherein the first video stream comprises at least one frame of a first image;
determining a plurality of second images matched with a first image in the first video stream from a model image library based on the first image; the model image library stores a three-dimensional model corresponding to the target space and a plurality of space two-dimensional images with color information;
determining a plurality of pairs of first matching points based on the plurality of second images and the first image; wherein each pair of the first matching points includes a feature point in the first image and a feature point in the second image;
determining at least one group of first pose information corresponding to the image acquisition equipment based on the plurality of pairs of first matching point pairs and the three-dimensional model;
and verifying and screening the at least one group of first pose information to determine target pose information of the image acquisition equipment.
2. The method of claim 1, wherein before the determining a plurality of second images matching the first image from a model image library based on a frame of the first image in the first video stream, the method further comprises:
preprocessing the three-dimensional model corresponding to the target space and the plurality of spatial two-dimensional images, and determining descriptive feature information and three-dimensional coordinate information corresponding to a plurality of feature points.
3. The method according to claim 2, wherein preprocessing the three-dimensional model corresponding to the target space and the plurality of spatial two-dimensional images to determine descriptive feature information and three-dimensional coordinate information corresponding to a plurality of feature points includes:
performing feature extraction on a plurality of the space two-dimensional images by using at least one feature extraction network to obtain descriptive feature information corresponding to each feature point in the plurality of feature points;
and determining point cloud data corresponding to the target space based on the three-dimensional model, and determining three-dimensional space information corresponding to each of the feature points based on the point cloud data.
4. A method according to any of claims 1-3, wherein said determining a plurality of second images from a model image library that match said first image based on a frame of the first image in said first video stream comprises:
extracting features of the first image to obtain first image features;
and matching the first image features with the spatial image features corresponding to a plurality of spatial two-dimensional images prestored in the model image library, and determining the plurality of second images based on the matching result.
5. The method of any of claims 1-4, wherein the determining a plurality of pairs of first matching points based on the plurality of second images and the first image comprises:
performing feature point matching on the first image and each of the plurality of second images to obtain a plurality of pairs of second matching point pairs;
and screening the plurality of pairs of second matching point pairs to obtain a plurality of pairs of first matching point pairs.
6. The method of claim 5, wherein the screening the plurality of pairs of second matching point pairs to obtain a plurality of pairs of first matching point pairs comprises:
determining, for the second matching point pairs corresponding to each second image, second pose information corresponding to that second image based on two-dimensional plane information of the feature points in the first image and three-dimensional space information of the feature points in the second image;
screening the plurality of second pose information based on gravity direction information corresponding to the image acquisition equipment to obtain at least one screened second pose information;
and determining the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one screened second pose information.
7. The method of claim 5, wherein the screening the plurality of pairs of second matching point pairs to obtain a plurality of pairs of first matching point pairs comprises:
determining, for the second matching point pairs corresponding to each second image, second pose information corresponding to that second image based on two-dimensional plane information of the feature points in the first image and three-dimensional space information of the feature points in the second image;
screening the plurality of second pose information by using a random sample consensus (RANSAC) algorithm based on at least one item of a priori information to obtain at least one screened second pose information;
and determining the plurality of pairs of first matching points based on the second matching point pairs corresponding to the at least one screened second pose information.
8. The method of any of claims 1-7, wherein determining at least one set of first pose information corresponding to the image acquisition device based on the plurality of pairs of first matching points and the three-dimensional model comprises:
determining, based on each pair of the first matching point pairs, a plurality of third matching point pairs corresponding to that pair;
judging, by a bundle adjustment method, whether the plurality of pairs of first matching point pairs and the plurality of pairs of third matching point pairs are correctly matched, and determining a plurality of pairs of correctly matched fourth matching point pairs according to the judging result;
and determining at least one group of first pose information corresponding to the image acquisition equipment based on the fourth matching point pairs.
9. The method according to any one of claims 1-8, wherein the performing verification screening on the at least one set of first pose information to determine target pose information of the image acquisition device includes:
verifying and screening the at least one group of first pose information based on at least one item of a priori information, and determining target pose information of the image acquisition equipment; wherein the a priori information includes at least one of: the gravity direction of the image acquisition equipment, the position information of a preset positioning device, and third pose information of the image acquisition equipment determined from the first image by using a visual odometer.
10. The method according to any one of claims 1-9, further comprising:
determining a navigation route in the target space based on the target pose information of the image acquisition equipment and a destination input by a user.
11. A spatial scene positioning device, comprising:
the image acquisition module is used for acquiring images in a target space by utilizing image acquisition equipment to obtain a first image;
an image matching module for determining a plurality of second images matched with the first image from a model image library based on the first image; the model image library stores a three-dimensional model corresponding to the target space and a plurality of space two-dimensional images with color information;
a point pair matching module for determining a plurality of pairs of first matching point pairs based on the plurality of second images and the first image; wherein each second image corresponds to at least one pair of first matching points, and each pair of first matching points comprises a feature point in the first image and a feature point in one second image;
the pose estimation module is used for determining a plurality of groups of first pose information corresponding to the image acquisition equipment based on the plurality of pairs of first matching point pairs and the three-dimensional model;
and the verification screening module is used for verifying and screening the multiple groups of first pose information and determining target pose information of the image acquisition equipment.
12. An electronic device, comprising:
a memory for storing a computer program product;
a processor for executing a computer program product stored in said memory, which, when executed, implements the method of any of the preceding claims 1-10.
13. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of the preceding claims 1-10.
CN202311828681.6A 2023-12-27 2023-12-27 Spatial scene positioning method, device, electronic equipment and storage medium Pending CN117746005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311828681.6A CN117746005A (en) 2023-12-27 2023-12-27 Spatial scene positioning method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117746005A true CN117746005A (en) 2024-03-22

Family

ID=90250822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311828681.6A Pending CN117746005A (en) 2023-12-27 2023-12-27 Spatial scene positioning method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117746005A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination